## EIP Project


### Problem Statement: Use only Winograd Conv and convert this into Tutorial (train as well)
### https://github.com/zlpure/Facial-Expression-Recognition 

Before getting to Facial Expression Recognition, let's discuss Winograd convolutions.

### Winograd Convolutions:

Despite reading many papers on the Winograd algorithm, I still don't fully understand how it works. These are the sources I referred to:
- https://arxiv.org/pdf/1509.09308.pdf
- https://www.scribd.com/doc/55802885/Winograd-algorithm

What I've gathered from these sources is summarized here:

#### What are Winograd Convolutions:
Winograd convolutions use the Winograd minimal filtering algorithm. The key idea is to perform the convolution in a transformed domain. This reduces the number of multiplications at the expense of additional additions and multiplications by constants.
According to the article https://ai.intel.com/winograd-2/, the Winograd algorithm works on small tiles of the input image. In a nutshell, the input tile and the filter are transformed, the outputs of the transforms are multiplied together element-wise, and the result is transformed back to obtain the outputs of the convolution.
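
To make the transform, multiply, transform-back idea concrete, here is a minimal sketch of the smallest 1D case, F(2,3), described in the arXiv paper listed above (1509.09308): two outputs of a 3-tap filter computed with 4 multiplications instead of 6. The function names are illustrative, and this is only a toy version of the idea, not how cuDNN or Neon implement it.

```python
def direct_f23(d, g):
    """Reference: 2 outputs of a 3-tap filter slid over 4 inputs (6 multiplications)."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Same 2 outputs using only 4 element-wise multiplications."""
    # Filter transform (can be precomputed once per filter)
    u = [g[0], (g[0] + g[1] + g[2]) / 2.0, (g[0] - g[1] + g[2]) / 2.0, g[2]]
    # Input tile transform (additions and subtractions only)
    v = [d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]]
    # Element-wise product in the transformed domain
    m = [u[i] * v[i] for i in range(4)]
    # Inverse transform back to the 2 outputs
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]


d = [1.0, 2.0, 3.0, 4.0]   # input tile
g = [1.0, 0.0, -1.0]       # filter taps
print(direct_f23(d, g))
print(winograd_f23(d, g))
```

Both calls should print the same two values. The 2D version used for 3x3 convolutions nests this same scheme over rows and columns of a tile.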

#### Why do we need Winograd Conv? 
In many applications of deep neural networks, such as self-driving cars, speed takes priority over precision. Hence fast algorithms like Winograd convolutions are used.

Other references:
- http://cs231n.stanford.edu/reports/2016/pdfs/117_Report.pdf
- https://arxiv.org/pdf/1803.09004.pdf
- https://www.encyclopediaofmath.org/index.php/Winograd_small_convolution_algorithm

#### Where to find the Winograd Conv?
Since I was not confident in my understanding of the Winograd algorithm, I did not implement it myself. A few implementations exist, and I found these options, which were directly usable:
- Nervana Neon: The implementation of Winograd Conv in this framework seemed straightforward, but I did not find a way to compare the speed with and without Winograd Conv using it.
https://github.com/NervanaSystems/neon/tree/master/neon

- cuDNN: It also implements Winograd Conv, which is enabled by default in versions higher than 5. TensorFlow additionally provides an environment variable, TF_ENABLE_WINOGRAD_NONFUSED, that can be used to enable or disable it. So I chose this route to find out how Winograd Conv affects the performance of deep neural networks.
https://docs.nvidia.com/deeplearning/dgx/tensorflow-user-guide/index.html#tf_enable_winograd_nonfused


### Using Winograd Convolutions in CuDNN 
I used Google Colaboratory, which did not have CUDA and cuDNN installed. Hence every time I logged in to Colab I had to install CUDA and cuDNN, which took a lot of time.
I tested the speed for just one epoch, assuming that it would be enough to measure the difference between Wino-Conv switched on and off.
I did not find much difference between the two cases.

This notebook installs CUDA 9 and CuDNN 7 in colaboratory.
https://github.com/Curiousss/InkerIntern/blob/master/CUda9Cudnn7.ipynb

Both notebook versions re-install tensorflow-gpu to make sure that cuDNN is used.
The model was tested with the environment variable TF_ENABLE_WINOGRAD_NONFUSED set to "1" and to "0". The speed was almost the same in both cases. The small differences in speed did not appear to be related to the flag: sometimes enabling it was faster, sometimes disabling it was faster.
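
Because single timings were this noisy, repeating each measurement and comparing medians may give a clearer picture. Below is a minimal sketch of such a harness. The helper names are hypothetical, the workload is a stand-in for one training epoch, and it assumes the flag has to be set before TensorFlow is imported, so running each setting in a fresh runtime is the safest approach.

```python
import os
import statistics
import time


def set_winograd_flag(enabled):
    """Set the flag; assumed to be needed before TensorFlow is imported."""
    os.environ["TF_ENABLE_WINOGRAD_NONFUSED"] = "1" if enabled else "0"


def median_seconds(run_once, repeats=3):
    """Call run_once() several times and return the median wall-clock time."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


set_winograd_flag(True)
# Stand-in workload; in the notebook this would be one training epoch
print(median_seconds(lambda: sum(range(100000))))
```

If the medians for "1" and "0" still overlap across several runs, that supports the conclusion that the flag made no measurable difference for this model.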


This notebook installs CUDA 8 and CuDNN 6 in colaboratory.
https://github.com/Curiousss/InkerIntern/blob/master/FER_WINO.ipynb

I think I spent almost 80-90% of the project time on understanding and testing Winograd convolutions, but I had to let go of them since they did not seem to improve the speed. Hence they are not used in the Facial Expression Recognition model.


### Model for Facial Emotion Recognition
First I ran the implementation given in https://github.com/zlpure/Facial-Expression-Recognition using the model.json and weights provided. The accuracy was 64%, and with image augmentation it went up to 66-67%. The speed was around 77s per epoch.
Find the implementation here: https://github.com/Curiousss/InkerIntern/blob/master/FacialEmotion.ipynb

I first re-implemented the same model, then enhanced it with these features:

- Separable Convolutions: To speed up the model, the regular convolutions were replaced with SeparableConv2D. The speed almost doubled. Separable convolutions have fewer parameters than regular convolutional layers and are thus less prone to overfitting. With fewer parameters, they also require fewer operations to compute, and are thus cheaper and faster.


- 3x3 Convolutions only: The 7x7 and 5x5 layers did not bring any improvement, so only 3x3 convolutions were retained. Two stacked 3x3 conv layers have a receptive field of 5x5 while requiring fewer mathematical operations and adding more non-linearities. So they should be faster and able to represent more complex functions.


- Global Average Pooling: The fully-connected layers were replaced with Global Average Pooling. This increased both the accuracy and the speed. GAP helps minimize overfitting by reducing the total number of parameters in the model. To match the Global Average Pooling I tried using average pooling in all other layers as well, but that did not improve the performance, so I retained Max Pooling for those layers.


- Image Augmentation: Applying random transformations to the images might actually hinder the training process. Hence each transformation was tested for its effectiveness before being chosen for the final model.
    - Applying horizontal flip did improve the accuracy. Applying vertical flip was not very helpful.
    - Image shearing and zooming were applied.
    - Width and height shifts were applied.
    - Normalization was applied outside the augmentation function, so none of the normalization options of the Image Generator were enabled.
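
The parameter savings claimed for separable and 3x3 convolutions can be checked with a little arithmetic. This is a rough sketch with illustrative helper names. It assumes a depth multiplier of 1 for the separable layer and counts bias terms, and the channel sizes are only examples.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a regular kernel x kernel convolution."""
    return kernel * kernel * c_in * c_out + c_out


def separable_conv_params(kernel, c_in, c_out):
    """One depthwise kernel per input channel, a 1x1 pointwise convolution, plus biases."""
    return kernel * kernel * c_in + c_in * c_out + c_out


# A 3x3 layer going from 64 to 128 channels: regular versus separable
print(conv_params(3, 64, 128))
print(separable_conv_params(3, 64, 128))

# One 5x5 layer versus two stacked 3x3 layers (same 5x5 receptive field), 64 channels
print(conv_params(5, 64, 64))
print(2 * conv_params(3, 64, 64))
```

For the 64 to 128 channel case the separable layer needs roughly an eighth of the parameters, and the two stacked 3x3 layers need about 28% fewer parameters than the single 5x5 layer.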


While trying to display the images read from the data, I noticed that some of them were just zeros. While looking for ways to clean the input data I found this list of bad data indices and used it: https://github.com/LamUong/FacialExpressionRecognition/blob/master/badtrainingdata.txt
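
As a complement to that list, blank images can also be detected directly from the pixel strings. The sketch below is only an illustration: the helper name and the toy rows are made up, and treating any constant-valued image (not just all zeros) as blank is an assumption.

```python
def is_blank(pixel_string):
    """True when every pixel in a space-separated pixel string has the same value."""
    pixels = pixel_string.split()
    return len(set(pixels)) <= 1


# Toy rows standing in for the pixel column of fer2013.csv
rows = ["0 0 0 0", "12 200 35 90", "255 255 255 255"]
blank_indices = [i for i, pixels in enumerate(rows) if is_blank(pixels)]
print(blank_indices)
```

A check like this only catches empty images, so it would not replace a curated list that may also flag other kinds of bad samples.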

### The accuracy achieved was 67%

#### For neater code please refer to the code in my github profile: https://github.com/Curiousss/InkerIntern/blob/master/FER_WINO_SEPARABLE_NO_CUDNN.ipynb

### Data
https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data

In [None]:
from google.colab import files
uploaded = files.upload()

In [None]:
!ls

In [None]:
!tar xvf fer2013.tar
!ls

In [None]:

import csv
import numpy as np

import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten, GlobalAveragePooling2D, InputLayer
from keras.layers import Convolution2D, SeparableConv2D, MaxPooling2D, BatchNormalization 
from keras.layers.advanced_activations import LeakyReLU
from keras.utils import np_utils
from keras.preprocessing.image import ImageDataGenerator

In [None]:
img_rows, img_cols = 48, 48
batch_size = 64
classes = 7
epoch = 100
img_channels = 1

In [None]:
import csv
f = open('fer2013/fer2013.csv')
csv_f = csv.reader(f)


In [None]:
train_x = []
train_y = []
val_x =[]
val_y =[]

In [None]:
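# Line numbers of bad samples, stored in a set for fast membership tests
ToBeRemovedTrainingData = set()
with open("baddata.txt", "r") as text:
  for line in text:
    if line.strip():  # ignore empty lines
      ToBeRemovedTrainingData.add(int(line))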

In [None]:
num = 0
for row in csv_f:
  num = num + 1
  # Skip the CSV header (line 1) and the rows listed as bad data
  if num == 1 or num in ToBeRemovedTrainingData:
    continue
  # Column 1 holds the 48x48 pixel values as a space-separated string
  temp_list = [int(pixel) for pixel in row[1].split()]

  # Column 2 names the split; rows from other splits are not used here
  if row[2] == "Training":
    train_y.append(int(row[0]))
    train_x.append(temp_list)
  elif row[2] == "PublicTest":
    val_y.append(int(row[0]))
    val_x.append(temp_list)

In [None]:
train_x = np.asarray(train_x)
train_y = np.asarray(train_y)
val_x = np.asarray(val_x)
val_y = np.asarray(val_y)

In [None]:
# Reshape the flat pixel rows to (samples, 48, 48, 1) and one-hot encode the labels
train_x = train_x.reshape(train_x.shape[0], 48, 48, 1)
train_y = np_utils.to_categorical(train_y, 7)

In [None]:
# Same reshaping and one-hot encoding for the validation data
val_x = val_x.reshape(val_x.shape[0], 48, 48, 1)
val_y = np_utils.to_categorical(val_y, 7)

In [None]:
from PIL import Image

#print(train_x.shape)

showimg = train_x[1].reshape(48,48)
img = Image.fromarray(showimg.astype('uint8'))
from IPython.display import display
display(img)

In [None]:
# Normalization: scale pixel values from [0, 255] to [-1, 1]
train_x = train_x.astype('float32')
train_x = train_x / 255.0
val_x = val_x.astype('float32')
val_x = val_x / 255.0
train_x = train_x - 0.5
train_x = train_x * 2
val_x = val_x - 0.5
val_x = val_x * 2


In [None]:
input_shape = (img_rows, img_cols, img_channels)
model = Sequential()
model.add(SeparableConv2D(filters=64, kernel_size=(3, 3), padding='same',
                            name='image_array', input_shape=input_shape))
model.add(BatchNormalization())
model.add(LeakyReLU())
model.add(SeparableConv2D(filters=64, kernel_size=(3, 3), padding='same'))
model.add(BatchNormalization())
model.add(LeakyReLU())
model.add(MaxPooling2D(pool_size=(2, 2), padding='same'))
#model.add(Dropout(.3))

model.add(SeparableConv2D(filters=128, kernel_size=(3, 3), padding='same'))
model.add(BatchNormalization())
model.add(LeakyReLU())
model.add(SeparableConv2D(filters=128, kernel_size=(3, 3), padding='same'))
model.add(BatchNormalization())
model.add(LeakyReLU())
model.add(MaxPooling2D(pool_size=(2, 2), padding='same'))
#model.add(Dropout(.3))

model.add(SeparableConv2D(filters=256, kernel_size=(3, 3), padding='same'))
model.add(BatchNormalization())
model.add(LeakyReLU())
model.add(SeparableConv2D(filters=256, kernel_size=(3, 3), padding='same'))
model.add(BatchNormalization())
model.add(LeakyReLU())
model.add(MaxPooling2D(pool_size=(2, 2), padding='same'))
#model.add(Dropout(.3))

model.add(SeparableConv2D(filters=512, kernel_size=(3, 3), padding='same'))
model.add(BatchNormalization())
model.add(LeakyReLU())
model.add(SeparableConv2D(filters=512, kernel_size=(3, 3), padding='same'))
model.add(BatchNormalization())
model.add(LeakyReLU())
model.add(MaxPooling2D(pool_size=(2, 2), padding='same'))

# After four 2x2 poolings the 48x48 input has become a 3x3 feature map with 512 channels
model.add(GlobalAveragePooling2D())

model.add(Dense(7))
model.add(Activation('softmax'))
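
As a quick sanity check on the shapes in the model above, the spatial size after the four pooling layers can be worked out by hand. The sketch below uses a hypothetical helper and assumes the Keras default of stride equal to pool size, for which `padding='same'` gives an output size of ceil(size / pool).

```python
import math


def same_pool_output(size, pool=2, layers=4):
    """Spatial size after repeated pool x pool max pooling with padding='same'."""
    for _ in range(layers):
        size = math.ceil(size / pool)
    return size


side = same_pool_output(48)
# The last convolution block has 512 filters
print((side, side, 512))
```

So the feature map reaching GlobalAveragePooling2D is 3x3 with 512 channels, and the pooled vector fed to Dense(7) has 512 values.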

In [None]:
model.compile(optimizer='Adam',
                 loss='categorical_crossentropy',
                 metrics=['accuracy'])
filepath='Model.best.hdf5'
checkpointer = keras.callbacks.ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='auto')


In [None]:
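# Optional: resume from a previous checkpoint. Skip this cell on the first run, when the file does not exist yet.
model.load_weights('Model.best.hdf5')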

In [None]:

import time
start_time = time.time()

datagen = ImageDataGenerator(
    featurewise_center=False,  # set input mean to 0 over the dataset
    samplewise_center=False,  # set each sample mean to 0
    featurewise_std_normalization=False,  # divide inputs by std of the dataset
    samplewise_std_normalization=False,  # divide each input by its std
    zca_whitening=False,  # apply ZCA whitening
    rotation_range=30,  # randomly rotate images in the range (degrees, 0 to 180)
    width_shift_range=0.2,  # randomly shift images horizontally (fraction of total width)
    height_shift_range=0.2,  # randomly shift images vertically (fraction of total height)
    horizontal_flip=True,  # randomly flip images horizontally
    vertical_flip=False,  # vertical flips did not help, so they stay off
    shear_range=0.2,  # randomly shear images
    zoom_range=0.2)  # randomly zoom images in or out

datagen.fit(train_x)

model.fit_generator(datagen.flow(train_x, train_y,
                    batch_size=batch_size),
                    steps_per_epoch=train_x.shape[0] // batch_size,  # must be a whole number of batches
                    epochs=50,
                    validation_data=(val_x, val_y),
                    callbacks=[checkpointer])
print("--- %s seconds ---" % (time.time() - start_time))

In [None]:

import time
start_time  = time.time()
model.fit(train_x, train_y, epochs=150, batch_size=batch_size, validation_data=(val_x, val_y),
             callbacks=[checkpointer])
print("--- %s seconds ---" % (time.time() - start_time))

In [None]:
def predict_emotion(model, pic):
  pic = pic.convert('L')
  pic = pic.resize((48,48))
  
  from IPython.display import display
  display(pic)
  pic_np = np.asarray(pic)
  pic_np = pic_np.reshape(1, 48, 48, 1)
  # Apply the same normalization as for the training data: scale to [-1, 1]
  pic_np = pic_np / 255.0
  pic_np = pic_np - 0.5
  pic_np = pic_np * 2

  print(pic_np.shape)
  y = model.predict(pic_np)
  print("0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral")
  print(y)
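
The function above prints the raw probability vector. A small helper could turn it into a readable label. This is a sketch with illustrative names that reuses the label order printed above, and the scores in the example are made up.

```python
EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]


def top_emotion(probabilities):
    """Return (label, probability) for the highest-scoring class."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return EMOTIONS[best], probabilities[best]


# Made-up scores for illustration; in the notebook this would be y[0]
print(top_emotion([0.05, 0.01, 0.04, 0.70, 0.10, 0.05, 0.05]))
```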

In [None]:
from google.colab import files
uploaded = files.upload()

In [None]:
celebanger = Image.open("celeb_fer1.jpg")
predict_emotion(model, celebanger)