### Image classification using Fashion MNIST data set
#### This notebook investigates whether multiple CNN models can achieve higher classification accuracy than any individual model. Two simple strategies for combining models are examined:

> 1.   Classification based on the average class probabilities of models

> 2.   Using the mode class for prediction

In [3]:
# Import
import os # for file handling
import pandas as pd # for data handling
import numpy as np # for linear algebra
import time # to time runs


In [4]:
import matplotlib.pyplot as plt # to display images
from sklearn import metrics # to evaluate classification accuracy
import tensorflow as tf # for neural networks

In [5]:
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.layers import BatchNormalization, Dropout

#### Get data

In [6]:
# get fashion mnist data
(x_train,y_train), (x_test,y_test) = tf.keras.datasets.fashion_mnist.load_data()

# show shapes of tensors
print("x_train shape:", x_train.shape, ", y_train shape:", y_train.shape)
print("x_test shape:", x_test.shape, ", y_test shape:", y_test.shape)

# get number of classes
nClasses = len(np.unique(y_train)) # number of output classes
print("Number of classes: ", nClasses)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz
x_train shape: (60000, 28, 28) , y_train shape: (60000,)
x_test shape: (10000, 28, 28) , y_test shape: (10000,)
Number of classes:  10


#### Pre-process data

In [7]:
# normalize grayscale pixel values (0-255) to (0,1)
x_train = x_train.astype('float32')/255 # normalized training inputs
x_test = x_test.astype('float32')/255 # normalized test inputs

# reshape to needed input shape for network
x_train, x_test = x_train.reshape((-1,28,28,1)), x_test.reshape((-1,28,28,1))
input_shape = x_train.shape[1:] # input shape for network

# show shapes of re-shaped tensors
print("x_train shape:", x_train.shape, ", y_train shape:", y_train.shape)
print("x_test shape:", x_test.shape, ", y_test shape:", y_test.shape)

# get image dimensions
img_h, img_w, img_channels = x_train.shape[1:] # size of image
print("Image height = %d, image width = %d, number of channels = %d" 
      %(img_h, img_w, img_channels))

x_train shape: (60000, 28, 28, 1) , y_train shape: (60000,)
x_test shape: (10000, 28, 28, 1) , y_test shape: (10000,)
Image height = 28, image width = 28, number of channels = 1


#### Define function to create Convolution Neural Network

In [8]:
def convNN(model, ch1, ch2, kernel, pool, nDense, drop, dropDense):
    
    model = tf.keras.models.Sequential() # create model                
    
    # first CONV => RELU => CONV => RELU => POOL layer set
    model.add(Conv2D(ch1, kernel, padding="same", 
                     activation="relu", input_shape=input_shape))
    model.add(BatchNormalization())
    model.add(Conv2D(ch1, kernel, padding="same", activation="relu"))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=pool))
    model.add(tf.keras.layers.Dropout(drop))
 
    # second CONV => RELU => CONV => RELU => POOL layer set
    model.add(Conv2D(ch2, kernel, padding="same", 
                     activation="relu", input_shape=input_shape))
    model.add(BatchNormalization())
    model.add(Conv2D(ch2, kernel, padding="same", activation="relu"))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=pool))
    model.add(tf.keras.layers.Dropout(drop))
 
    model.add(tf.keras.layers.Flatten()) 
    
    # FC => RELU layers
    model.add(tf.keras.layers.Dense(nDense, activation='relu'))
    model.add(BatchNormalization())
    model.add(tf.keras.layers.Dropout(dropDense))
    
    # output softmax layer
    model.add(tf.keras.layers.Dense(nClasses, activation='softmax'))
    
    model.compile(loss=tf.keras.losses.categorical_crossentropy,
                  optimizer=tf.keras.optimizers.Adadelta(),
                  metrics=['accuracy'])
 
    return model

#### Specify parameters for convolution network

In [9]:
# Parameters for CNN models (change as desired)
ch1, ch2 = 32, 64 # number of output channels
kernel = (3,3) # filter shape
pool = (2,2) # max pool size
nDense = 512 # dense layer size
drop, dropDense = 0.25, 0.5

# create model
mod = convNN('model', ch1, ch2, kernel, pool, nDense, drop, dropDense)
mod.summary() # show model summary

Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 28, 28, 32)        320       
_________________________________________________________________
batch_normalization_v1 (Batc (None, 28, 28, 32)        128       
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 28, 28, 32)        9248      
_________________________________________________________________
batch_normalization_v1_1 (Ba (None, 28, 28, 32)        128       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 32)        0         
_________________________________________________________________
dropout (Dropout)    

#### Define function to plot accuracy with training and validation data 

In [10]:
def plotHistorty(model, history):
    # plot history for accuracy
    plt.plot(history.history['acc'])
    plt.plot(history.history['val_acc'])
    plt.title(model+' accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['train', 'test'], loc='upper left')
    plt.show()

#### Specify parameters for training

In [11]:
batchSize = 32 # batch size for training
epochs = 60 # number of training epochs

#### Train CNN models

In [12]:
nModels = 3 # number of models to train
probs = np.zeros((len(y_test), nClasses)) # mean probabilities for classes
result = [] # for results

# Train models
for i in range(nModels):
    model = 'model_'+str(i+1) # model name
    print('\nTraining Model: '+ model + '\n')
    bestWts = model+".weights.hdf5" # best weights file
  
  # checkpoint to save best model
    checkpoint = tf.keras.callbacks.ModelCheckpoint(bestWts,
                                                  monitor='val_acc',
                                                  verbose=1,
                                                  save_best_only=True,
                                                  mode='max')
  
  # create model
    mod = convNN(model, ch1, ch2, kernel, pool, nDense, drop, dropDense)
  
    st = time.time() # start time for training
  
  # train models and maintain history
    hist = mod.fit(x_train,
                 tf.keras.utils.to_categorical(y_train, num_classes=nClasses), 
                 batch_size=batchSize,
                 epochs=epochs, 
                 validation_split = 1/6,
                 callbacks=[checkpoint])
    t = time.time() - st # time to train model
  
    print("\nTime to train classifier: %4.2f seconds\n" %(t))
  
    mod.save_weights(bestWts) # save best weights

    prob = mod.predict(x_test) # predict probabilities for test examples
    predicted = prob.argmax(axis=1) # most likely class
    acc = metrics.accuracy_score(y_test, predicted) # best accuracy 
    print('%s Test accuracy = %4.2f%%\n\n' %(model, acc*100.0))
  
    probs += prob
    predicted = probs.argmax(axis=1) # most likely class
    accCum = metrics.accuracy_score(y_test, predicted) # best accuracy 
    print('%s Gestalt Test accuracy = %4.2f%%\n\n' %(model,accCum*100.0))
  
    result.append([model,acc,accCum,t])
    plotHistorty(model, hist) # display training and test accuracy


Training Model: model_1

Train on 50000 samples, validate on 10000 samples
Instructions for updating:
Use tf.cast instead.
Epoch 1/60
Epoch 00001: val_acc improved from -inf to 0.87740, saving model to model_1.weights.hdf5
Epoch 2/60

KeyboardInterrupt: 

#### Show results on model accuracy, cumulative accuracy, and training time

In [11]:
result = pd.DataFrame(result, 
                      columns=['model', 'accuracy', 'cum. accuracy', 'time'])
result

Unnamed: 0,model,accuracy,cum. accuracy,time
0,model_1,0.9374,0.9374,954.793582
1,model_2,0.9356,0.9411,962.929622
2,model_3,0.9411,0.9447,967.744258


#### specify label for classes

In [0]:
items = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 
         'Shirt', 'Sneaker', 'Bag', 'Ankle boot'] # labels
item = dict(zip(range(10), items)) # create dictionary mapping class to labels

#### Show confusion matrix

In [13]:
cm = metrics.confusion_matrix(y_test, predicted)
cm = pd.DataFrame(cm, columns=items)
cm.insert(0,"True",items)
cm

Unnamed: 0,True,T-shirt/top,Trouser,Pullover,Dress,Coat,Sandal,Shirt,Sneaker,Bag,Ankle boot
0,T-shirt/top,900,0,14,5,3,1,74,0,3,0
1,Trouser,0,989,0,8,0,0,1,0,2,0
2,Pullover,18,1,919,6,29,0,27,0,0,0
3,Dress,9,0,7,950,17,0,16,0,1,0
4,Coat,1,0,14,20,935,0,30,0,0,0
5,Sandal,0,0,0,0,0,989,0,9,0,2
6,Shirt,72,0,26,23,53,0,821,0,5,0
7,Sneaker,0,0,0,0,0,4,0,985,0,11
8,Bag,1,1,1,5,0,1,1,1,989,0
9,Ankle boot,0,0,0,0,0,3,0,26,1,970


#### Show classification report

In [14]:
print(metrics.classification_report([items[i] for i in y_test], 
                                    [items[i] for i in predicted]))

              precision    recall  f1-score   support

  Ankle boot       0.99      0.97      0.98      1000
         Bag       0.99      0.99      0.99      1000
        Coat       0.90      0.94      0.92      1000
       Dress       0.93      0.95      0.94      1000
    Pullover       0.94      0.92      0.93      1000
      Sandal       0.99      0.99      0.99      1000
       Shirt       0.85      0.82      0.83      1000
     Sneaker       0.96      0.98      0.97      1000
 T-shirt/top       0.90      0.90      0.90      1000
     Trouser       1.00      0.99      0.99      1000

   micro avg       0.94      0.94      0.94     10000
   macro avg       0.94      0.94      0.94     10000
weighted avg       0.94      0.94      0.94     10000



#### Use mode class for prediction

In [15]:
predClass = pd.DataFrame()
for i in range(nModels):
  model = 'model_'+str(i+1)
  print("processing " + model)
  bestWts = model+".weights.hdf5" # best weights file
  mod.load_weights(bestWts) # load best weights
  prob = mod.predict(x_test) # predict probabilities for test examples
  predClass[model] = prob.argmax(axis=1) # mst likely class
  
modeClass = predClass.mode(axis=1)
modeAcc = metrics.accuracy_score(y_test, modeClass[0])
print("Accuracy based on mode class = %4.2f%%" %(100.0*modeAcc))

processing model_1
processing model_2
processing model_3
Accuracy based on mode class = 94.37%


#### Compare accuracy for easy and hard classes
- Note that 'T-shirt/top', 'Pullover', 'Coat', and  'Shirt' are harder to classify than other items

In [18]:
hardClasses = [0,2,4,6] # classes with low classification accuracy
hardIndxTest = [i for i in range(len(y_test)) if y_test[i] in hardClasses]
easyIndxTest = [i for i in range(len(y_test)) if y_test[i] not in hardClasses]
print('Test data contains %d hard and %d easy examples' 
      %(len(hardIndxTest), len(easyIndxTest)))

accEasy = metrics.accuracy_score(y_test[easyIndxTest], predicted[easyIndxTest])
accHard = metrics.accuracy_score(y_test[hardIndxTest], predicted[hardIndxTest])

print('Test accuracy with easy examples = %4.2f%%' %(100.0*accEasy)) 
print('Test accuracy with hard examples = %4.2f%%' %(100.0*accHard)) 

Test data contains 4000 hard and 6000 easy examples
Test accuracy with easy examples = 97.87%
Test accuracy with hard examples = 89.38%


#### Observations:

1.   A simple CNN can achieve classification accuracy of over 93%
2.   Combining 3 models improves accuracy around 94.4%

1. It takes around 16 seconds per epoch using Colaboratory GPU accelerator and Test accuracy does not improve significantly after the first 20 epochs

1.   Combining a few more models trained over 20 epochs may further improve classification accuracy in a resonable amount of time.

1.   Classification accuracy is significantly lower for 4 classes: 'T-shirt/top', 'Pullover', 'Coat', and 'Shirt' 



#### Opportunities for improvement:


1.   Devise alternate methods for combining models
2.   Increase the diversity of constituent models

1.   Introduce regularization methods that prevent over-fitting beyond 20 epochs
2.   Develop a two-phased approach:  Predict using a combination of models in the first phase and use a separate model to re-classify examples predicted as 'T-shirt/top', 'Pullover', 'Coat', or 'Shirt












