# Introduction

I applied 3 approaches to this image classification problem, of which the second, a simple CNN, worked best. I also tried data augmentation using Keras's ImageDataGenerator, but the results were almost the same as those obtained without it. Since data augmentation increases training time considerably, I did not find the results motivating enough to justify it, and hence I have not used data augmentation in this notebook.

There are 4 sections in total including this introduction; the other 3 are:


*   Approach-1 (Using ANNs)
*   Approach-2 (Using CNNs)
*   Approach-3 (Using pretrained CNNs)




# Approach-1 (Using ANNs)

**Importing Required Libraries**

In [0]:
import keras 
from keras.models import Sequential
from keras.layers import Dense, Flatten, Dropout, Activation
import numpy as np
import pandas as pd
import pickle
from keras.utils import np_utils
from keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau, Callback
from sklearn.preprocessing import StandardScaler

**Reading Files**

Using Python's pickle library to read pickle files.

*   Training image data is present in train_images
*   Training Labels are present in train_labels
*   Test image data is present in test_images





In [0]:
with open('train_image.pkl', 'rb') as f:
    train_images = pickle.load(f)

with open('train_label.pkl', 'rb') as f:
    train_labels = pickle.load(f)
    
with open('test_image.pkl', 'rb') as f:
    test_images = pickle.load(f)

**Going Through the data and preprocessing**

First we convert all the loaded data into NumPy arrays, then check their shapes to understand the structure of the data.

In [0]:
train_images = np.array(train_images)
train_labels = np.array(train_labels)
test_images = np.array(test_images)

print(train_images.shape)
print(test_images.shape)
print(train_labels.shape)

**Feature Scaling**

We use sklearn's StandardScaler to transform the data so that each feature has a mean of 0 and a standard deviation of 1. The scaler is fitted on the training images only and then applied to the test images. This helps the model fit better in less time.

In [0]:
sc = StandardScaler()
train_images = sc.fit_transform(train_images)
test_images = sc.transform(test_images)
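To make the scaling step concrete, here is a minimal NumPy-only sketch of the same computation on a toy matrix. The array values are made up for illustration and are not taken from the dataset; the zero-variance handling mirrors what I understand StandardScaler to do.

```python
import numpy as np

# Toy stand-in for the flattened image matrix: 4 samples, 3 pixel features.
X_train = np.array([[0.0, 10.0, 255.0],
                    [50.0, 20.0, 255.0],
                    [100.0, 30.0, 0.0],
                    [150.0, 40.0, 0.0]])
X_test = np.array([[75.0, 25.0, 127.5]])

# Statistics are computed on the training data only ...
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
std[std == 0] = 1.0  # constant columns are left unscaled to avoid dividing by zero

# ... and reused for the test data, mirroring fit_transform / transform.
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std

print(X_train_scaled.mean(axis=0))  # approximately 0 per column
print(X_train_scaled.std(axis=0))   # approximately 1 per column
```

Reusing the training statistics for the test set keeps both sets on the same scale.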

**One Hot Encoding**

Since this is a multi-class classification problem, we first convert the labels to one-hot encoded form using the to_categorical function from the Keras library. By default, the function returns one-hot vectors whose length is the maximum label value in train_labels plus one. Since the labels present in train_labels are {0, 2, 3, 6}, we map them to consecutive integers as follows:

0 -> 0
2 -> 1
3 -> 2
6 -> 3

Before final submission, we will transform the results back to the original labels.

In [0]:
print(set(train_labels))
    
for i in range(len(train_labels)):
  if train_labels[i] == 2:
    train_labels[i] = 1
  elif train_labels[i] == 3:
    train_labels[i] = 2
  elif train_labels[i] == 6:
    train_labels[i] = 3
    
print(set(train_labels))

train_labels = np_utils.to_categorical(np.array(train_labels))
print(train_labels.shape)
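The effect of the remapping can be seen in a small NumPy-only sketch. The label array below is a made-up toy example, and `np.eye` stands in for to_categorical here; it is meant as an illustration, not as the notebook's actual encoding path.

```python
import numpy as np

raw_labels = np.array([0, 2, 3, 6, 6, 2])  # toy labels with the same values as the dataset

# Without remapping, the one-hot width is max label + 1 = 7, with unused columns 1, 4 and 5.
wide = np.eye(raw_labels.max() + 1)[raw_labels]

# Remap {0, 2, 3, 6} -> {0, 1, 2, 3} so that only 4 columns are needed.
label_map = {0: 0, 2: 1, 3: 2, 6: 3}
mapped = np.array([label_map[v] for v in raw_labels])
one_hot = np.eye(len(label_map))[mapped]

print(wide.shape)     # (6, 7)
print(one_hot.shape)  # (6, 4)
```

With 4 columns, the one-hot vectors line up with the 4 softmax outputs of the model.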

**Building the model**

Here we build our ANN model with 8 Dense layers. All the layers except the last use ReLU activation to introduce non-linearity. The last layer uses softmax activation to output the probability of each class.

In [0]:
model = Sequential()
model.add(Dense(512, input_shape = (784,), activation = 'relu', kernel_initializer = 'uniform'))
model.add(Dense(256, activation = 'relu', kernel_initializer = 'uniform'))
model.add(Dense(128, activation = 'relu', kernel_initializer = 'uniform'))
model.add(Dense(64, activation = 'relu', kernel_initializer = 'uniform'))
model.add(Dense(32, activation = 'relu', kernel_initializer = 'uniform'))
model.add(Dense(16, activation = 'relu', kernel_initializer = 'uniform'))
model.add(Dense(8, activation = 'relu', kernel_initializer = 'uniform'))
model.add(Dense(4, activation = 'softmax'))

model.summary()
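As a cross-check for the summary, the parameter count of this fully connected stack can be computed by hand: each Dense layer has one weight per input-output pair plus one bias per output unit. A short sketch using the layer widths from the model above:

```python
# Layer widths of the network above: input, seven hidden layers, output.
widths = [784, 512, 256, 128, 64, 32, 16, 8, 4]

total = 0
for n_in, n_out in zip(widths[:-1], widths[1:]):
    params = n_in * n_out + n_out  # weight matrix plus one bias per unit
    total += params
    print(f"{n_in:>4} -> {n_out:<4} {params:>8,} parameters")

print(f"total: {total:,}")  # total: 577,180
```

Most of the parameters sit in the first layer, which connects all 784 pixels to 512 units.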

**Some Tuning before finally fitting the model**

To save our model weights we use ModelCheckpoint; its parameters are set to save the weights to the current working directory every time the validation accuracy improves.

We use EarlyStopping to stop training once the validation accuracy has not improved for a set number of epochs (the patience value).

We use ReduceLROnPlateau to reduce the learning rate by a set factor each time the validation accuracy has not improved for a set number of epochs (the patience value).

In [0]:
checkpoint = ModelCheckpoint("model_ANN-{val_acc:.2f}.h5", monitor="val_acc", verbose=1, save_best_only=True,
                                 save_weights_only=True, mode="max", period=1)

stop = EarlyStopping(monitor="val_acc", patience=50, mode="max")
learning_rate_reduction = ReduceLROnPlateau(monitor='val_acc',
                                            patience = 3,
                                            #patience=5, 
                                            verbose=1, 
                                            factor = 0.5,
                                            #factor=0.25, 
                                            min_lr=0.00001)
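To illustrate how the learning rate evolves under these settings, here is a simplified pure-Python simulation of the plateau rule. It is a sketch, not the Keras implementation: it ignores min_delta and cooldown, the function name `simulate_reduce_lr` is made up, the starting rate of 0.001 assumes Adam's default, and the accuracy values are invented.

```python
def simulate_reduce_lr(val_accs, lr=1e-3, factor=0.5, patience=3, min_lr=1e-5):
    """Return the learning rate in effect after each epoch."""
    best = float("-inf")
    wait = 0
    lrs = []
    for acc in val_accs:
        if acc > best:
            best = acc  # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)  # halve, but never go below min_lr
                wait = 0
        lrs.append(lr)
    return lrs

# Toy validation accuracies: three epochs without improvement trigger one reduction.
print(simulate_reduce_lr([0.5, 0.6, 0.6, 0.6, 0.6, 0.7]))
```

In this toy run the rate drops from 0.001 to 0.0005 at the fifth epoch, after three epochs without a new best accuracy.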

**Training The model**

Finally, we compile and fit the model. We use categorical cross-entropy as the loss function and accuracy as the performance metric.
Training runs for around 100 epochs, depending on when early stopping halts it.

In [0]:
model.compile(loss="categorical_crossentropy", optimizer='adam', metrics=["accuracy"])
model.fit(train_images, train_labels, validation_split = 0.2, epochs = 500, callbacks = [checkpoint, stop, learning_rate_reduction])
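A note on validation_split: as far as the Keras documentation describes it, the validation samples are taken from the end of the arrays, before any shuffling. The sketch below shows the resulting index ranges, assuming a training set of 8000 images (the 6400 + 1600 used in Approach-2).

```python
n_samples = 8000        # assumed training set size (6400 + 1600 in Approach-2)
validation_split = 0.2

n_val = int(n_samples * validation_split)
split_at = n_samples - n_val

train_idx = range(0, split_at)         # first 80% of the rows are used for training
val_idx = range(split_at, n_samples)   # last 20% of the rows are held out

print(len(train_idx), len(val_idx))  # 6400 1600
```

If the rows were ordered by class, the held-out tail would not be representative, which is one reason to shuffle or use an explicit split.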

**Loading Weights**

After training, several weight files will have been saved in the working directory. Each file is named according to its validation accuracy (e.g. weights with 90% validation accuracy are saved as model_ANN-0.90.h5).
We then manually load the weights with the best validation accuracy and make the predictions on the test data.

In my case the best weights had a validation accuracy of 53.3%, so I load the weights named model_ANN-0.53.h5.

In [0]:
model.load_weights('model_ANN-0.53.h5')
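Instead of reading the file names by eye, the best checkpoint could also be picked programmatically. A minimal sketch, assuming the naming pattern used above; the file list and the helper `accuracy_from_name` are hypothetical, and in practice the list could come from `glob.glob`.

```python
import re

# Hypothetical checkpoint names following the ModelCheckpoint pattern used above.
checkpoint_files = ["model_ANN-0.41.h5", "model_ANN-0.53.h5", "model_ANN-0.48.h5"]

def accuracy_from_name(filename):
    """Extract the validation accuracy encoded in a checkpoint file name."""
    match = re.search(r"-(\d+\.\d+)\.h5$", filename)
    if match is None:
        raise ValueError(f"unexpected checkpoint name: {filename}")
    return float(match.group(1))

best_file = max(checkpoint_files, key=accuracy_from_name)
print(best_file)  # model_ANN-0.53.h5
```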

Since the model's performance is not satisfactory, we will try a different approach.

# Approach-2 (Using CNNs)

**Importing Required Libraries**

In [0]:
import pickle
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.utils import shuffle
from keras.models import Sequential
from keras.layers.normalization import BatchNormalization
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.layers.core import Activation
from keras.layers.core import Flatten
from keras.layers.core import Dropout
from keras.layers.core import Dense
from keras import backend as K
from keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau, Callback
from keras.utils import np_utils
import os as os
from tqdm import tqdm
from keras.preprocessing.image import ImageDataGenerator
from sklearn.model_selection import train_test_split

**Reading Files**

Using Python's pickle library to read pickle files.

*   Training image data is present in train_images
*   Training Labels are present in train_labels
*   Test image data is present in test_images





In [0]:
with open('train_image.pkl', 'rb') as f:
    train_images = pickle.load(f)

with open('train_label.pkl', 'rb') as f:
    train_labels = pickle.load(f)
    
with open('test_image.pkl', 'rb') as f:
    test_images = pickle.load(f)

**Data Preprocessing**

We will use the to_categorical function from the Keras library to transform the labels into one-hot encoded vectors. By default, the function returns one-hot vectors whose length is the maximum label value in train_labels plus one. Since the labels present in train_labels are {0, 2, 3, 6}, we map them to consecutive integers as follows:

0 -> 0
2 -> 1
3 -> 2
6 -> 3

Before final submission, we will transform the results back to the original labels.

We then split the training set into training and validation sets using sklearn's train_test_split function. Following common practice, we put 80% of the data into the training set and 20% into the validation set.

In [0]:
print(set(train_labels))
    
for i in range(len(train_labels)):
  if train_labels[i] == 2:
    train_labels[i] = 1
  elif train_labels[i] == 3:
    train_labels[i] = 2
  elif train_labels[i] == 6:
    train_labels[i] = 3
    
print(set(train_labels))

train_complete_images, train_complete_labels = train_images, train_labels

train_images, val_images, train_labels, val_labels = train_test_split(train_images, train_labels, test_size = 0.2, random_state = 20)
print(np.array(train_images).shape, np.array(train_labels).shape)
print(np.array(val_images).shape, np.array(val_labels).shape)

Since the pixel values of each image in the training, validation and test sets are stored as a single vector, we need to reshape each vector into an image to be able to run a CNN on it.

Since each image has 784 pixels, an obvious guess is to reshape it to 28x28, because 28x28 = 784. We then check whether the reshaped images make sense by plotting them using Python's PIL library.

We then perform **Feature Scaling** by dividing the training, validation and test images by 255 (the maximum value of a pixel). This helps the model fit better in less time.

In [0]:
train_images = np.array(train_images).reshape((6400, 28, 28, 1)).astype('uint8')
img = Image.fromarray(train_images[5000].reshape((28, 28)).astype('uint8'), 'L')
plt.imshow(img)
plt.show()  # draw each sample in its own figure instead of overwriting the previous one

val_images = np.array(val_images).reshape((1600, 28, 28, 1)).astype('uint8')
img = Image.fromarray(val_images[50].reshape((28, 28)).astype('uint8'), 'L')
plt.imshow(img)
plt.show()

test_images = np.array(test_images).reshape((2000, 28, 28, 1)).astype('uint8')
img = Image.fromarray(test_images[1500].reshape((28, 28)).astype('uint8'), 'L')
plt.imshow(img)
plt.show()

train_images = train_images/255.
val_images = val_images/255.
test_images = test_images/255.
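The reshape and scaling steps can be checked on a small toy batch. This is a sketch with random pixel values, not the real data; it uses -1 so that NumPy infers the number of images rather than relying on a hard-coded count.

```python
import numpy as np

# Toy batch of 2 flattened "images" with 784 pixel values each (random, for illustration only).
rng = np.random.default_rng(0)
flat = rng.integers(0, 256, size=(2, 784))

# -1 lets NumPy infer the number of images instead of hard-coding it.
images = flat.reshape(-1, 28, 28, 1).astype("uint8")
scaled = images / 255.0

print(images.shape)                # (2, 28, 28, 1)
print(scaled.min(), scaled.max())  # both values lie within [0, 1]

# Row-major layout: pixel 28 of the flat vector becomes row 1, column 0 of the image.
print(images[0, 1, 0, 0] == flat[0, 28])  # True
```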

The plotted images look sensible, so our guess for reshaping the pixel arrays was correct.

Next, we transform the training and validation labels into one-hot encoded vectors.

In [0]:
train_labels = np_utils.to_categorical(np.array(train_labels))
val_labels = np_utils.to_categorical(np.array(val_labels))

**Building the model**

We use convolutional, max pooling, dropout, batch normalization and flatten layers to build the CNN model.
The convolutional layers use padding='same' so that the convolutions do not reduce the spatial dimensions of the image; the max pooling layers downsample the feature maps while keeping the strongest activations from the previous layers.
Batch normalization normalizes each layer's activations over a mini-batch to roughly zero mean and unit variance, which helps the network learn faster.
We use dropout in between to reduce overfitting.
Finally, we flatten all the activations into a one-dimensional vector and pass it through dense layers to eventually output 4 values. The last layer uses softmax activation to output the probabilities of the 4 classes.

In [0]:
model = Sequential()
model.add(Conv2D(32, (3, 3), padding="same", input_shape=(28, 28, 1)))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=-1))
model.add(Conv2D(32, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=-1))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
 
# second CONV => RELU => CONV => RELU => POOL layer set
model.add(Conv2D(64, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=-1))
model.add(Conv2D(64, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=-1))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

# first (and only) set of FC => RELU layers
model.add(Flatten())
model.add(Dense(512))
model.add(Activation("relu"))
model.add(BatchNormalization())
model.add(Dropout(0.5))
 
# softmax classifier
model.add(Dense(4))
model.add(Activation("softmax"))
model.summary()
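The shapes reported by the summary can be traced by hand. The sketch below uses two hypothetical helpers, `conv_same` and `max_pool_2x2`, that encode only the shape rules ('same' padding with stride 1 keeps height and width; a 2x2 pool with stride 2 halves them).

```python
def conv_same(h, w, c, filters):
    """A 'same'-padded, stride-1 convolution keeps height and width."""
    return h, w, filters

def max_pool_2x2(h, w, c):
    """A 2x2 max pool with stride 2 halves height and width (floor division)."""
    return h // 2, w // 2, c

shape = (28, 28, 1)
shape = conv_same(*shape, filters=32)   # (28, 28, 32)
shape = conv_same(*shape, filters=32)   # (28, 28, 32)
shape = max_pool_2x2(*shape)            # (14, 14, 32)
shape = conv_same(*shape, filters=64)   # (14, 14, 64)
shape = conv_same(*shape, filters=64)   # (14, 14, 64)
shape = max_pool_2x2(*shape)            # (7, 7, 64)

flattened = shape[0] * shape[1] * shape[2]
print(shape, flattened)  # (7, 7, 64) 3136
```

The Flatten layer therefore feeds a vector of 3136 values into the Dense(512) layer.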

**Some Tuning before finally fitting the model**

To save our model weights we use ModelCheckpoint; its parameters are set to save the weights to the current working directory every time the validation accuracy improves.

We use EarlyStopping to stop training once the validation accuracy has not improved for a set number of epochs (the patience value).

We use ReduceLROnPlateau to reduce the learning rate by a set factor each time the validation accuracy has not improved for a set number of epochs (the patience value).

In [0]:
checkpoint = ModelCheckpoint("model_CNN-{val_acc:.2f}.h5", monitor="val_acc", verbose=1, save_best_only=True,
                                 save_weights_only=True, mode="max", period=1)

stop = EarlyStopping(monitor="val_acc", patience=50, mode="max")
learning_rate_reduction = ReduceLROnPlateau(monitor='val_acc',
                                            patience = 3,
                                            #patience=5, 
                                            verbose=1, 
                                            factor = 0.5,
                                            #factor=0.25, 
                                            min_lr=0.00001)

**Training The model**

Finally, we compile and fit the model. We use categorical cross-entropy as the loss function and accuracy as the performance metric.
Training runs for around 120-130 epochs, depending on when early stopping halts it.

In [0]:
model.compile(loss="categorical_crossentropy", optimizer='adam', metrics=["accuracy"])

history = model.fit(train_images, train_labels, validation_data = (val_images, val_labels), epochs=500, batch_size = 256, callbacks = [checkpoint, stop, learning_rate_reduction])

**Loading Weights**

After training, several weight files will have been saved in the working directory. Each file is named according to its validation accuracy (e.g. weights with 90% validation accuracy are saved as model_CNN-0.90.h5).
We then manually load the weights with the best validation accuracy and make the predictions on the test data.

In my case the best weights had a validation accuracy of 88.375%, so I load the weights named model_CNN-0.89.h5.

In [0]:
model.load_weights('model_CNN-0.89.h5')

**Making Predictions**

Here we predict the class probabilities for the test images and then use NumPy's argmax function to find the index of the highest of the 4 predicted probabilities. This index is the model's prediction of the class a test image belongs to.

We then transform the remapped labels back to the original ones, using the reverse of the mapping discussed previously.

In [0]:
pred = model.predict(test_images, verbose = 1)

predictions=np.argmax(pred,axis=1)
print(predictions)

In [0]:
# Reverse of the label mapping applied before training.
inverse_label_map = {0: 0, 1: 2, 2: 3, 3: 6}
predictions_transformed = [inverse_label_map[int(p)] for p in predictions]

print(predictions_transformed)

In [0]:
print(np.sum(np.array(predictions_transformed) == 0))
print(np.sum(np.array(predictions_transformed) == 2))
print(np.sum(np.array(predictions_transformed) == 3))
print(np.sum(np.array(predictions_transformed) == 6))
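The argmax step and the reverse mapping can also be written in a vectorized form with a lookup array. This is an alternative sketch on made-up probabilities, not the code that produced the submission.

```python
import numpy as np

# Toy softmax outputs for 3 images over the 4 remapped classes.
pred = np.array([[0.7, 0.1, 0.1, 0.1],
                 [0.1, 0.2, 0.1, 0.6],
                 [0.2, 0.5, 0.2, 0.1]])

class_index = np.argmax(pred, axis=1)      # [0, 3, 1]

# Position i holds the original label for remapped class i.
original_labels = np.array([0, 2, 3, 6])
restored = original_labels[class_index]    # [0, 6, 2]
print(restored)
```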

**Making The prediction File**

After going through all 3 approaches, we conclude that this approach works best, so the submission file is built from this model's predictions.

In [0]:
sub = []
for i in range(len(predictions_transformed)):
  temp = []
  temp.append(i)
  temp.append(predictions_transformed[i])
  sub.append(temp)
  
df = pd.DataFrame(sub, columns =['image_index', 'class'])
df.to_csv('Dhruv Agarwal.csv', index = False)

# Approach-3 (Using Pre-Trained CNNs)

In this section I use transfer learning with ImageNet weights to classify the test images.

In [0]:
import pickle
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.utils import shuffle
from keras.models import Sequential
from keras.layers.normalization import BatchNormalization
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.layers.core import Activation
from keras.layers.core import Flatten
from keras.layers.core import Dropout
from keras.layers.core import Dense
from keras.layers import GlobalAveragePooling2D
from keras import backend as K
from keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau, Callback
from keras.utils import np_utils
import os as os
from tqdm import tqdm
from keras.preprocessing.image import ImageDataGenerator
from sklearn.model_selection import train_test_split
from keras.applications.vgg19 import VGG19, preprocess_input
from keras.models import Model

We apply the same data preprocessing as in the second approach.

In [0]:
with open('train_image.pkl', 'rb') as f:
    train_images = pickle.load(f)

with open('train_label.pkl', 'rb') as f:
    train_labels = pickle.load(f)
    
for i in range(len(train_labels)):
  if train_labels[i] == 2:
    train_labels[i] = 1
  elif train_labels[i] == 3:
    train_labels[i] = 2
  elif train_labels[i] == 6:
    train_labels[i] = 3
    
print(set(train_labels))

train_images, val_images, train_labels, val_labels = train_test_split(train_images, train_labels, test_size = 0.2, random_state = 42)
print(np.array(train_images).shape, np.array(train_labels).shape)
print(np.array(val_images).shape, np.array(val_labels).shape)

with open('test_image.pkl', 'rb') as f:
    test_images = pickle.load(f)

**Gray scale equivalent of RGB for Pre-Trained CNN model**

Since any model pretrained on ImageNet expects 3 color channels (RGB) instead of 1 (grayscale), I used a trick to make the grayscale images compatible with such models: copy the same 28x28 image into 3 channels, i.e. convert it to a size of 28x28x3, where every channel holds the same 28x28 pixels.


In [0]:
train_images = np.dstack([train_images] * 3)
test_images = np.dstack([test_images]*3)
val_images = np.dstack([val_images]*3)
                        
print(train_images.shape, test_images.shape, val_images.shape)

In [0]:
train_images = train_images.reshape(-1, 28,28,3)
test_images= test_images.reshape (-1,28,28,3)
val_images= val_images.reshape (-1,28,28,3)

print(train_images.shape, test_images.shape, val_images.shape)
print(np.array(train_labels).shape, np.array(val_labels).shape)

train_labels = np_utils.to_categorical(train_labels)
val_labels = np_utils.to_categorical(val_labels)
print(np.array(train_labels).shape, np.array(val_labels).shape)
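To confirm that stacking followed by reshaping really yields three identical channels, here is a small NumPy sketch on toy data (the values are just a counter, not real pixels):

```python
import numpy as np

# Toy batch: 5 flattened grayscale images with 784 values each.
flat = np.arange(5 * 784).reshape(5, 784)

stacked = np.dstack([flat] * 3)            # shape (5, 784, 3)
rgb_like = stacked.reshape(-1, 28, 28, 3)  # shape (5, 28, 28, 3)

# Every channel should equal the original 28x28 image.
gray = flat.reshape(-1, 28, 28)
same = all(np.array_equal(rgb_like[..., c], gray) for c in range(3))
print(stacked.shape, rgb_like.shape, same)  # (5, 784, 3) (5, 28, 28, 3) True
```

This works because the channel axis is the last one in both arrays, so the reshape only splits the 784 axis into 28x28.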

**preprocess_input : Keras**

We use Keras's preprocess_input for VGG19, which converts the images from RGB to BGR and zero-centers each color channel using the ImageNet channel means, without scaling. This matches the preprocessing the pretrained weights were trained with.

Since the VGG19 model expects an input size of at least (32, 32, 3), we need to resize our images from (28, 28, 3) to a larger size.
I tried various sizes and found that (96, 96, 3) works best, so that is the size used in this notebook.

In [0]:
train_images = preprocess_input(train_images)
test_images = preprocess_input(test_images)
val_images = preprocess_input(val_images)

In [0]:
from keras.preprocessing.image import img_to_array, array_to_img

train_images = np.asarray([img_to_array(array_to_img(im, scale=False).resize((96,96))) for im in train_images])
test_images = np.asarray([img_to_array(array_to_img(im, scale=False).resize((96,96))) for im in test_images])
val_images = np.asarray([img_to_array(array_to_img(im, scale=False).resize((96,96))) for im in val_images])

print(train_images.shape, test_images.shape, val_images.shape)

Below we build a model that passes an image through VGG19 with its top (fully connected) layers removed and its weights initialized from ImageNet. The output is then passed through a global average pooling layer and two dense layers, the last of which uses softmax activation to output the probabilities of the 4 classes.

In [0]:
model1 = VGG19(include_top = False, input_shape = (96, 96, 3), weights = 'imagenet')
x = model1.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(4, activation='softmax')(x)
model = Model(inputs = model1.input, outputs = predictions)
model.summary()
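What GlobalAveragePooling2D does to the VGG19 output can be sketched in NumPy. The feature values below are random placeholders; the 3x3 spatial size follows from VGG19's five 2x2 pooling stages applied to a 96x96 input.

```python
import numpy as np

# VGG19 has five 2x2 pooling stages, so a 96x96 input shrinks by 2**5 = 32 to 3x3.
side = 96 // 2 ** 5
rng = np.random.default_rng(0)
features = rng.random((2, side, side, 512))  # toy feature maps for a batch of 2

# Global average pooling: mean over the two spatial axes, one value per channel.
pooled = features.mean(axis=(1, 2))
print(features.shape, "->", pooled.shape)  # (2, 3, 3, 512) -> (2, 512)
```

Each image is thus reduced to a 512-dimensional vector before the Dense(1024) layer.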

In [0]:
checkpoint = ModelCheckpoint("model_CNN_Pretrained-{val_loss:.2f}.h5", monitor="val_loss", verbose=1, save_best_only=True,
                                 save_weights_only=True, mode="min", period=1)

stop = EarlyStopping(monitor="val_loss", patience=10, mode="min")
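The stopping rule with patience=10 on the validation loss can be illustrated with a simplified pure-Python simulation. It is a sketch under stated assumptions: any decrease counts as an improvement (no min_delta), the helper name `early_stopping_epoch` is made up, and the loss curve is invented.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return the 1-based epoch at which training would stop, or None if it never stops.

    Simplified: any decrease counts as an improvement (no min_delta).
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Toy curve: improves for 5 epochs, then plateaus.
losses = [1.0, 0.8, 0.7, 0.65, 0.6] + [0.62] * 20
print(early_stopping_epoch(losses))  # 15
```

With the best loss reached at epoch 5, training stops 10 epochs later, well before the 100-epoch limit.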

In [0]:
model.compile(loss="categorical_crossentropy", optimizer='adam', metrics=["accuracy"])

In [0]:
history = model.fit(train_images, train_labels, validation_data = (val_images, val_labels),
                    epochs=100, callbacks = [checkpoint, stop], batch_size = 32
)

We see that the training and validation accuracies are quite unsatisfactory, hence we go with Approach-2.