<a href="https://colab.research.google.com/github/cosraj/learning_keras_with_tensorflow/blob/main/MyKeras.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import tensorflow as tf

In [2]:
from tensorflow import keras

In [3]:
tf.__version__

'2.3.0'

In [4]:
keras.__version__

'2.4.0'

In [5]:
fashion_mnist = keras.datasets.fashion_mnist

In [6]:
(X_train_full, y_train_full), ( X_test, y_test) = fashion_mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz


In [7]:
X_train_full.shape

(60000, 28, 28)

In [8]:
X_valid, X_train = X_train_full[:5000] / 255.0, X_train_full[5000:] / 255.0

In [9]:
X_train.shape

(55000, 28, 28)

In [10]:
y_valid, y_train = y_train_full[:5000] , y_train_full[5000:] 

In [11]:
y_valid.shape

(5000,)

We divide by 255.0 because each pixel is an integer intensity from 0 to 255. Dividing scales every pixel to a float in the range 0 to 1 (it does not turn the pixels into binary 0 or 1 values). This is a standard feature-scaling step.
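To see what the division does to the pixel values, here is a minimal NumPy sketch. The tiny 2x2 array is a made-up stand-in for a 28x28 image, not data from the notebook.

```python
import numpy as np

# Hypothetical 2x2 "image" with uint8 pixel intensities (stand-in for a 28x28 one).
pixels = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# Dividing by 255.0 produces floats between 0 and 1, not just the values 0 and 1.
scaled = pixels / 255.0

print(scaled.dtype)
print(scaled.min(), scaled.max())
print(scaled)
```

Intermediate intensities such as 64 and 128 end up strictly between 0 and 1, so the relative brightness of the pixels is preserved.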

In [12]:
X_test = X_test/255.0

In [13]:
class_names = ["T-Shirt/top", "Trouser", "Pullover", "Dress", "Coat", "Sandal", "Shirt", "Sneaker", "Bag", "Ankle Boot"]

In [14]:
class_names[y_train[0]]

'Coat'

In [15]:
model = keras.models.Sequential()


We start building the model here.
First, we add a Flatten input layer, which takes each input image (28 x 28 pixels) and flattens it into a 1D array of 784 features.

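We then add a Dense layer with 300 neurons using the ReLU activation function. Each Dense layer manages its own weight matrix, holding the connection weights between its inputs and its neurons, plus a vector of bias terms (one per neuron). For a batch of inputs X, the layer computes activation(XW + b). These weights and biases are the parameters that backpropagation adjusts during training.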

Next we add a Dense layer with 100 neurons, followed by a Dense output layer with 10 neurons (one per class) using the softmax activation, since this is a multiclass classification problem.

By specifying input_shape=[28, 28], we tell the model the shape of each input instance, which lets Keras determine the shape of the first layer's weight matrix. If you don't specify it explicitly, the layers' weights are not created until the model is built, which happens the first time it sees data. So, if you know the input shape beforehand, you should set input_shape.

In [16]:
model.add(keras.layers.Flatten(input_shape=[28,28]))
model.add(keras.layers.Dense(300, activation="relu"))
model.add(keras.layers.Dense(100,activation="relu"))
model.add(keras.layers.Dense(10,activation="softmax"))

model.summary() shows 235,500 parameters for the first Dense layer: each of its 300 neurons receives 784 input features, which gives 784 x 300 = 235,200 weights plus 300 biases. The model as a whole has 266,610 parameters.
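The same arithmetic can be applied to every Dense layer. The sketch below is plain Python; the helper name `dense_param_count` is made up for illustration and is not part of Keras.

```python
# Layer sizes taken from the model above: 784 inputs -> 300 -> 100 -> 10.
layer_sizes = [784, 300, 100, 10]

def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Pair each layer's input size with its number of units.
counts = [dense_param_count(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(counts)       # [235500, 30100, 1010]
print(sum(counts))  # 266610
```

The Flatten layer contributes no parameters, since it only reshapes its input.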

In [17]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten (Flatten)            (None, 784)               0         
_________________________________________________________________
dense (Dense)                (None, 300)               235500    
_________________________________________________________________
dense_1 (Dense)              (None, 100)               30100     
_________________________________________________________________
dense_2 (Dense)              (None, 10)                1010      
Total params: 266,610
Trainable params: 266,610
Non-trainable params: 0
_________________________________________________________________


In [18]:
model.layers
hidden1 = model.layers[1]
hidden1.name
weights, biases = hidden1.get_weights()

In [19]:
weights
weights.shape

(784, 300)
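The (784, 300) shape means one column of 784 weights per neuron. As a rough sketch of what such a layer computes, the NumPy code below applies ReLU to XW + b. The input batch and the weight values are random stand-ins, not the values held by `hidden1`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in values: a batch of 2 flattened images and a randomly filled weight matrix.
X = rng.random((2, 784))
W = rng.normal(scale=0.05, size=(784, 300))  # same shape as `weights` above
b = np.zeros(300)                            # one bias per neuron

# A Dense layer with ReLU activation computes max(0, X @ W + b).
hidden = np.maximum(0.0, X @ W + b)

print(hidden.shape)
```

Each of the 2 input rows is mapped to 300 non-negative activations, one per neuron.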

Here the loss is set to "sparse_categorical_crossentropy", as in the Fashion MNIST example from the book, because the output layer uses the softmax activation and the labels are sparse (for each instance there is just a target class index, from 0 to 9). If the labels were one-hot vectors, we would use "categorical_crossentropy" instead. For binary classification, we would use the "binary_crossentropy" loss with a sigmoid (logistic) activation in the output layer.
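To make the "sparse" part concrete, here is a small NumPy sketch of the cross-entropy computation, written once with class indices and once with one-hot labels. It illustrates the formula only and is not the Keras implementation; the probabilities are made up.

```python
import numpy as np

# Hypothetical softmax outputs for 2 instances over 3 classes (each row sums to 1).
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
y_sparse = np.array([0, 2])      # sparse labels: one class index per instance
y_onehot = np.eye(3)[y_sparse]   # the same labels, one-hot encoded

# Sparse form: take -log of the probability predicted for the true class.
sparse_loss = -np.log(probs[np.arange(len(y_sparse)), y_sparse]).mean()

# One-hot form: the same quantity written as a sum over all classes.
onehot_loss = -(y_onehot * np.log(probs)).sum(axis=1).mean()

print(np.isclose(sparse_loss, onehot_loss))  # True
```

Both forms give the same loss; the sparse form simply avoids building the one-hot vectors.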

In [20]:
model.compile(loss="sparse_categorical_crossentropy", optimizer="sgd", metrics=["accuracy"])

If fit() reports markedly better accuracy on the training data than on the validation data, the model is overfitting the training set. We are not seeing that here, which suggests there is no significant overfitting and the model generalizes well.
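One way to make this comparison explicit is to look at the per-epoch metrics that the History object returned by fit() stores in its `history` dictionary, under the keys "accuracy" and "val_accuracy" when metrics=["accuracy"] is used. The sketch below uses made-up numbers shaped like that dictionary, and `accuracy_gap` is a hypothetical helper, not a Keras function.

```python
# Hypothetical per-epoch metrics, shaped like the `history` dict of a History object.
# These numbers are invented for illustration; they are not from this notebook's run.
history_dict = {
    "accuracy":     [0.76, 0.83, 0.85, 0.86],
    "val_accuracy": [0.82, 0.84, 0.85, 0.86],
}

def accuracy_gap(metrics):
    """Training accuracy minus validation accuracy at the last epoch."""
    return metrics["accuracy"][-1] - metrics["val_accuracy"][-1]

gap = accuracy_gap(history_dict)

# A large positive gap hints at overfitting; a gap near zero does not.
print(f"final gap: {gap:.3f}")
```

What counts as a "large" gap is a judgment call that depends on the task, so no fixed threshold is assumed here.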

In [21]:
model.fit(X_train, y_train, epochs=30, validation_data=(X_valid, y_valid))

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


<tensorflow.python.keras.callbacks.History at 0x7f0eede53588>

In [22]:
model.evaluate(X_test, y_test)



[0.32507285475730896, 0.8848000168800354]

Use the model to make predictions

In [23]:
X_new = X_test[:3]
y_proba = model.predict(X_new)
y_proba.round(2)

array([[0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.02, 0.  , 0.98],
       [0.  , 0.  , 1.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ],
       [0.  , 1.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ]],
      dtype=float32)

In [24]:
y_proba.shape

(3, 10)
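Each row of y_proba holds one probability per class, so the predicted class is the index of the largest value in the row. The sketch below copies the rounded probabilities from the output above into a NumPy array and maps the winning indices to class names. It defines its own copy of the class names so that it runs on its own.

```python
import numpy as np

# Rounded probabilities copied from the model.predict() output above.
y_proba = np.array([
    [0, 0, 0, 0, 0, 0, 0, 0.02, 0, 0.98],
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
])
class_names = ["T-Shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle Boot"]

# Index of the highest probability in each row, then the matching class name.
y_pred = np.argmax(y_proba, axis=1)
predicted = [class_names[i] for i in y_pred]

print(y_pred)     # [9 2 1]
print(predicted)  # ['Ankle Boot', 'Pullover', 'Trouser']
```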