<a href="https://colab.research.google.com/github/babai95/Fashion-Mnist-Classification-using-Tensorflow/blob/master/Fashion_mnist_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Fashion MNIST classification


In [0]:
#import tensorflow and keras
import tensorflow as tf
from tensorflow import keras

import matplotlib.pyplot as plt
import numpy as np

print(tf.__version__)

In [0]:
#load the dataset
fashion_mnist = keras.datasets.fashion_mnist

(train_images,train_labels),(test_images,test_labels) = fashion_mnist.load_data()

In [0]:
class_names = ['T-Shirt/Top','Trouser','PullOver','Dress','Coat','Sandal','Shirt','Sneaker','Bag','Ankle boot']

class_names is a list of the ten class names. Each entry in train_labels is an integer from 0 to 9 that indexes into this list:
0: T-shirt/Top
1: Trouser
2: Pullover
3: Dress
4: Coat
5: Sandal
6: Shirt
7: Sneaker
8: Bag
9: Ankle boot


In [0]:
train_images.shape

The shape is (60000, 28, 28): there are 60,000 training images, and each image is 28*28 pixels.


In [0]:
len(train_labels)

In [0]:
train_labels.shape

In [0]:
train_labels

train_labels is a one-dimensional array of 60,000 labels. The 1st value is 9, which means that the 1st image in train_images is an ankle boot.


In [0]:
#print the 1st image
plt.figure()
plt.imshow(train_images[0])
plt.colorbar()
plt.gca().grid(False)

In [0]:
print("Image is: ")
print(class_names[train_labels[0]])


In [0]:
#Preprocess the data
#Pixel values are integers in the range 0 to 255. Scale them to the range 0 to 1 before
#feeding them to the network, since small, uniformly scaled inputs generally make training more stable.

train_images = train_images/255.0
test_images = test_images/255.0
#The training set and the test set must be scaled the same way
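
The effect of the division can be checked on a tiny array. This is a minimal sketch: tiny_image is a hypothetical 2*2 stand-in for a real 28*28 image, not data from the dataset.

```python
import numpy as np

# Hypothetical 2x2 "image" with uint8 pixel values (assumption, not dataset data).
tiny_image = np.array([[0, 51], [102, 255]], dtype=np.uint8)

# Dividing by 255.0 converts the integers to floats and maps 0..255 onto 0.0..1.0.
scaled = tiny_image / 255.0

print(scaled.dtype)
print(scaled.min(), scaled.max())
```

The same division is applied to both the training and the test images so that the model sees inputs on the same scale at training and evaluation time.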

In [0]:
#Display the 1st 25 images from the training set and the class_names of each image
plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    plt.xlabel(class_names[train_labels[i]])
plt.show()

Now we build the neural network model. We use the Keras Sequential API, which defines a model as a linear stack of layers. It suits cases with a single input and a single output: here the input is one image and the output is its predicted label.


In [0]:
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])

In the Sequential API, the model is a stack of layers.

1st Layer: Flatten layer. It reshapes each 2D image of 28*28 pixels into a 1D vector of 784 values. This is only reshaping; the layer has no weights to learn.

2nd and 3rd Layers: Dense layers. These are the two hidden (middle) layers. Each has 128 nodes or neurons and uses ReLU as its activation function, a common default for hidden layers in simple classification problems like this one.

4th Layer: Dense layer. This is the last layer, the output layer. It has 10 nodes, one for each label from 0 to 9. Softmax turns the outputs into probabilities that sum to 1, and the node with the highest probability gives the final predicted label.
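
What the Flatten layer and the softmax activation do can be sketched with NumPy alone. The softmax helper and the three-value logits vector are illustrative assumptions, not the Keras internals.

```python
import numpy as np

def softmax(logits):
    # Subtracting the max keeps exp() from overflowing; the result is unchanged.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Flatten: a 28x28 image becomes a vector of 784 values.
image = np.zeros((28, 28))
flat = image.reshape(-1)
print(flat.shape)

# Softmax: hypothetical raw scores for 3 classes become probabilities.
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs, probs.sum())
```

The largest raw score always receives the largest probability, so taking the most probable class after softmax picks the same class as taking the largest raw score.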

In neural networks, adding layers lets the network learn more levels of patterns, and adding units or nodes to a layer lets it learn more variants of the patterns at that level. For example, in a convolutional neural network the early layers typically learn edges and the later layers learn larger structures such as faces; with more nodes in an early layer, the network can detect more types of edges. A larger network therefore has more capacity to learn, but it also needs more data and computation and is more prone to overfitting.
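
To see how layer width translates into model size, the number of trainable parameters in the model above can be counted by hand. This is a small sketch; dense_params is a hypothetical helper, and the layer sizes are taken from the model definition (784 inputs, then 128, 128 and 10 units).

```python
def dense_params(n_inputs, n_units):
    # One weight per input-unit pair, plus one bias per unit.
    return n_inputs * n_units + n_units

# Flatten output size followed by the three Dense layer widths.
layer_sizes = [28 * 28, 128, 128, 10]

counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(counts)

print(counts)  # [100480, 16512, 1290]
print(total)   # 118282
```

Most of the parameters sit in the first Dense layer, because it connects all 784 pixels to every one of its 128 units.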

In [0]:
#Compile the model
#'adam' selects the Adam optimizer by name; tf.train.AdamOptimizer() only exists in TensorFlow 1.x
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

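Compiling takes three settings: an optimizer, a loss function and the metrics to report. We use the Adam optimizer, sparse_categorical_crossentropy as the loss function (it works directly with integer labels such as 0 to 9) and accuracy as the metric.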
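
For a single image, sparse categorical cross-entropy is the negative log of the probability that the model assigns to the true label. The sketch below shows only that per-sample idea; the function and the probability vectors are illustrative assumptions, and Keras additionally averages the loss over each batch.

```python
import math

def sparse_categorical_crossentropy(true_label, probs):
    # The label is a plain integer index, not a one-hot vector.
    return -math.log(probs[true_label])

# Hypothetical softmax outputs for a 3-class problem where the true label is 2.
confident = [0.05, 0.05, 0.90]
unsure = [0.40, 0.30, 0.30]

loss_confident = sparse_categorical_crossentropy(2, confident)
loss_unsure = sparse_categorical_crossentropy(2, unsure)

print(round(loss_confident, 4))
print(round(loss_unsure, 4))
```

The loss is small when the model puts high probability on the correct label and grows as that probability shrinks, which is the quantity the optimizer tries to reduce.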

In [0]:
#Fit or train the model for training set. Fit is another word for train
model.fit(train_images,train_labels,epochs=5)

An epoch is one full pass over the training set. With too few epochs, the model does not learn the patterns properly. With too many, it can overfit: it memorizes the training examples so closely that it predicts poorly on the test set, where those exact examples do not appear. So the number of epochs needs to be balanced.
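
One epoch is made up of many weight updates, one per batch. As a rough sketch, assuming the Keras default batch size of 32 (the fit call above does not set batch_size), the number of updates works out as follows.

```python
import math

num_train = 60000   # training images in Fashion MNIST
batch_size = 32     # Keras default when fit() is called without batch_size
epochs = 5          # as in the fit() call above

# Each epoch processes the whole training set once, one batch at a time.
steps_per_epoch = math.ceil(num_train / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch)  # 1875
print(total_updates)    # 9375
```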

Training reports two things for each epoch: the loss and the accuracy on the training set. Lower loss and higher accuracy indicate that training went well. The overall loss and accuracy are the ones from the last epoch. Here the training set accuracy is 92.03%.

In [0]:
#Evaluate the model on test set to know how the model performs
test_loss,test_acc = model.evaluate(test_images,test_labels)
print("Test accuracy: ", test_acc)


If the test set accuracy is close to the training set accuracy, the number of epochs is well balanced. Here the test set accuracy is 87.48%, about 4.5 percentage points below the training set accuracy of 92.03%, so the two are reasonably close, with a small gap that hints at slight overfitting. If test set accuracy << training set accuracy, the model is overfitting: it performs much worse on the test set than on the training set. If both accuracies are low, the model is underfitting and needs more training or a more capable model.
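
The comparison above can be written as a small check. This is only a sketch: diagnose_fit is a hypothetical helper, and the thresholds max_gap and min_acc are arbitrary assumptions chosen for illustration, not established rules.

```python
def diagnose_fit(train_acc, test_acc, max_gap=0.05, min_acc=0.80):
    # max_gap and min_acc are illustrative thresholds (assumptions).
    gap = train_acc - test_acc
    if train_acc < min_acc and test_acc < min_acc:
        return "underfitting", gap
    if gap > max_gap:
        return "overfitting", gap
    return "balanced", gap

# Accuracies reported in this notebook.
label, gap = diagnose_fit(0.9203, 0.8748)
print(label, round(gap, 4))
```

With these thresholds the notebook's numbers fall in the balanced range, but a stricter max_gap would flag the same gap as overfitting, so the labels are a guide and not a verdict.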

In [0]:
#Make predictions
predictions = model.predict(test_images)

In [0]:
predictions[0]

predictions is an array with one row per test image. predictions[0] is the row for the 1st test image: it holds 10 probabilities, one for each label from 0 to 9. The probability is highest for label 9, so the final predicted label is 9.

In [0]:
for i in range(25):
  print("prediction : ", np.argmax(predictions[i]))

So, here np.argmax(predictions[0]) returns the index of the largest value in predictions[0], not the value itself. That index is the predicted label.
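
The difference between np.argmax and np.max, and how predicted labels turn into an accuracy figure, can be seen on a few made-up rows. sample_predictions and sample_labels are hypothetical values for illustration, not outputs of the model above.

```python
import numpy as np

# Hypothetical softmax outputs for 3 images over the 10 classes.
sample_predictions = np.array([
    [0.01, 0.01, 0.01, 0.01, 0.01, 0.02, 0.01, 0.05, 0.02, 0.85],
    [0.02, 0.01, 0.80, 0.02, 0.05, 0.01, 0.06, 0.01, 0.01, 0.01],
    [0.55, 0.01, 0.02, 0.03, 0.02, 0.01, 0.33, 0.01, 0.01, 0.01],
])
# Hypothetical true labels: Ankle boot, Pullover, Shirt.
sample_labels = np.array([9, 2, 6])

# argmax gives the position of the largest probability, i.e. the label.
predicted_labels = np.argmax(sample_predictions, axis=1)
# max gives the largest probability itself, i.e. the model's confidence.
highest_probs = np.max(sample_predictions, axis=1)

# Accuracy is the fraction of predicted labels that match the true labels.
accuracy = np.mean(predicted_labels == sample_labels)

print(predicted_labels)
print(highest_probs)
print(accuracy)
```

In this made-up example the third image is a Shirt (label 6) predicted as T-Shirt/Top (label 0), so two of the three predictions are correct.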