# Batch Normalisation 

Inspiration from the [Keras-Resnet](https://github.com/keras-team/keras-applications/blob/master/keras_applications/resnet50.py) implementation.

```python
x = layers.Conv2D(filters1, (1, 1), strides=strides,
                  kernel_initializer='he_normal',
                  name=conv_name_base + '2a')(input_tensor)
x = layers.BatchNormalization(axis=bn_axis, name=bn_name_base + '2a')(x)
x = layers.Activation('relu')(x)
```

Here we see a Conv2D layer followed by batch normalisation and then a ReLU activation.

[Keras Normalisation](https://keras.io/layers/normalization/): the batch normalisation layer normalises the activations of the previous layer at every batch. The normalisation keeps the mean activation close to 0 and the activation standard deviation close to 1.


According to the paper [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167), this method addresses what is known as internal covariate shift: the distribution of each layer's inputs changes during training. The paper reports that, applied to a state-of-the-art image classification model, batch normalisation achieves the same accuracy with 14 times fewer training steps.
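
To make the "mean close to 0, standard deviation close to 1" statement concrete, here is a minimal NumPy sketch of the training-time forward pass. It is not the Keras implementation: the function name `batch_norm_forward` is made up for illustration, `eps=1e-3` is assumed to mirror the Keras default, and the moving averages that Keras uses at inference time are left out.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-3):
    """Normalise each feature over the batch axis, then scale and shift."""
    mean = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                       # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)   # roughly zero mean, unit variance
    return gamma * x_hat + beta               # learnable scale (gamma) and shift (beta)

rng = np.random.default_rng(0)
# Stand-in activations: a batch of 64 samples with 4 features, far from mean 0 / std 1
activations = rng.normal(loc=5.0, scale=3.0, size=(64, 4))
normalised = batch_norm_forward(activations, gamma=np.ones(4), beta=np.zeros(4))

print(normalised.mean(axis=0).round(3))  # each entry is close to 0
print(normalised.std(axis=0).round(3))   # each entry is close to 1
```

With `gamma=1` and `beta=0` the output is just the normalised activations; during training the network is free to learn other values for both.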
    

# Data flow


[Keras Data Generators](https://medium.com/@vijayabhaskar96/tutorial-image-classification-with-keras-flow-from-directory-and-generators-95f75ebe5720)


# ResNet

A good article on using ResNet: [Understanding and Coding a ResNet in Keras](https://towardsdatascience.com/understanding-and-coding-a-resnet-in-keras-446d7ff84d33)

Source code for [ResNet50 implementation on github](https://github.com/keras-team/keras-applications/blob/master/keras_applications/resnet50.py). 

In this sample, it is good to see how ResNet implements the conv_block, which serves as the building block for ResNet. conv_block contains four Conv2D layers, each using he_normal initialisation and followed by batch normalisation; the activation is always ReLU. The last conv layer is actually a shortcut that takes the initial input of the building block. So, essentially, the input goes through three layers, and then at the end the (convolved) input is added back in; that is the shortcut layer.

identity_block, in turn, has three Conv2D layers with he_normal initialisation, batch normalisation and ReLU activation. There is no conv layer on the shortcut.
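
The difference between the two shortcuts can be shown with a toy NumPy sketch. This is a simplification, not the Keras code: dense matrix products stand in for Conv2D, batch normalisation is omitted, and the helper names `main_path` and `w_shortcut` are made up for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def main_path(x, weights):
    """Three stacked layers; ReLU follows the first two only."""
    w1, w2, w3 = weights
    h = relu(x @ w1)
    h = relu(h @ w2)
    return h @ w3

def identity_block(x, weights):
    # The shortcut is the unchanged input, so input and output widths must match.
    return relu(main_path(x, weights) + x)

def conv_block(x, weights, w_shortcut):
    # The shortcut has its own projection, so the output width may differ.
    return relu(main_path(x, weights) + x @ w_shortcut)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8))  # batch of 2, width 8
same_width = [rng.normal(size=(8, 8)) for _ in range(3)]
wider = [rng.normal(size=(8, 8)), rng.normal(size=(8, 8)), rng.normal(size=(8, 16))]

print(identity_block(x, same_width).shape)                   # prints (2, 8)
print(conv_block(x, wider, rng.normal(size=(8, 16))).shape)  # prints (2, 16)
```

The projection on the shortcut is what lets a conv_block change the number of channels (and, in the real network, the spatial size) at the start of a stage, while identity blocks keep the shape fixed.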

The ResNet network then does the following:
- zero-pad the input image
- Conv2D (valid padding), batch normalisation, ReLU activation
- zero-pad again
- max pooling

It then repeats conv blocks and identity blocks in stages.
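
The spatial size after the four stem steps listed above can be worked out by hand. The sketch below assumes the kernel sizes, strides and padding from the linked Keras source (padding 3 with a 7x7 stride-2 convolution, then padding 1 with a 3x3 stride-2 max pool) and a 224x224 input; the helper name `conv_output_size` is made up for illustration.

```python
def conv_output_size(size, kernel, stride, padding=0):
    """Output length along one axis for a 'valid' conv or pool after explicit zero padding."""
    return (size + 2 * padding - kernel) // stride + 1

size = 224
# zero pad by 3, then 7x7 convolution with stride 2
after_conv = conv_output_size(size, kernel=7, stride=2, padding=3)
# zero pad by 1, then 3x3 max pooling with stride 2
after_pool = conv_output_size(after_conv, kernel=3, stride=2, padding=1)

print(after_conv, after_pool)  # prints 112 56
```

So, under these assumptions, the stacked blocks start from a 56x56 feature map.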


Link to help page for [resnet50](https://keras.io/applications/#resnet).

API call:
```python
keras.applications.resnet.ResNet50(
    include_top=True, 
    weights='imagenet', 
    input_tensor=None, 
    input_shape=None, 
    pooling=None, 
    classes=1000)
```

## Predict using ResNet50

At first I tried to use ResNet50 without any modification to predict on images from Fashion-MNIST.

```python
...
train_images, train_labels = load_images_train_32_32_rgb()
plt.imshow(train_images[6])
plt.show()
...
new_model = ResNet50()
new_image = skimage.transform.resize(
    train_images[6], 
    (224,224), 
    mode='constant'
)
# some additional transforms to put the image into the expected shape
imgs_in = []
imgs_in.append(new_image)  # now this is an array inside a list
# turn the list into a single array with a leading batch dimension
imgs_in = np.array(imgs_in)
print(imgs_in.shape)
# this prints (1, 224, 224, 3)
...
pred_output = new_model.predict(resnet50.preprocess_input(imgs_in))
# pred_output shape is (1,1000)
# decode the results into a list of tuples (class, description, probability)
# (one such list for each sample in the batch)
print('Predicted:', resnet50.decode_predictions(pred_output, top=3)[0])
## output is Predicted: [('n02504458', 'African_elephant', 0.5607367), ('n01871265', 'tusker', 0.3651714), ('n02504013', 'Indian_elephant', 0.073968664)]
```

The result is pretty bad. The test is to predict this image again after retraining ResNet on Fashion-MNIST and see how it performs.
Also note that the predicted output has 1000 classes; in our case we will retrain to predict the 10 Fashion-MNIST classes instead.

The code and output are in the [resnet predict example notebook](output_notebooks/resnet_predict_example.ipynb).
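
One thing that may be worth checking in the prediction code above is the pixel value range. As far as I know, `skimage.transform.resize` converts integer images to floats in [0, 1] unless `preserve_range=True` is passed, while `resnet50.preprocess_input` only flips RGB to BGR and subtracts per-channel ImageNet means, which assumes 0-255 inputs. Whether this applies here depends on what `load_images_train_32_32_rgb()` returns, which is not shown. The NumPy sketch below mimics that preprocessing (it is not the Keras function; `caffe_style_preprocess` and `channel_spread` are made-up names) to show how little contrast survives when the input is in [0, 1].

```python
import numpy as np

# Per-channel ImageNet means in BGR order, as used by 'caffe'-style preprocessing.
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68])

def caffe_style_preprocess(rgb_batch):
    """Flip RGB to BGR and subtract the channel means, with no rescaling."""
    bgr = np.asarray(rgb_batch, dtype="float64")[..., ::-1]
    return bgr - IMAGENET_BGR_MEAN

def channel_spread(batch):
    """Difference between the brightest and darkest value in the first channel."""
    return float(batch[..., 0].max() - batch[..., 0].min())

rng = np.random.default_rng(0)
pixels_0_255 = rng.integers(0, 256, size=(1, 224, 224, 3))  # stand-in image, 0-255
pixels_0_1 = pixels_0_255 / 255.0                            # same image rescaled to [0, 1]

spread_0_255 = channel_spread(caffe_style_preprocess(pixels_0_255))
spread_0_1 = channel_spread(caffe_style_preprocess(pixels_0_1))
print(spread_0_255, spread_0_1)  # the second spread is at most 1.0
```

If the inputs do turn out to be in [0, 1], scaling them back to 0-255 before calling `preprocess_input` would be one thing to try.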

## Transfer Learning (1)

In the first attempt at transfer learning I removed the top layers and added my own fully connected layers.


```python
# `added_layers` is defined earlier (not shown); presumably it starts from base_model.output

# The initial run showed very high variance (training accuracy was good,
# but validation accuracy was very poor), so I added regularisation.
added_layers = GlobalAveragePooling2D()(added_layers)  # instead of Flatten()(added_layers)
added_layers = Dropout(0.7)(added_layers)
added_layers = Dense(128, kernel_regularizer=l2(0.01), bias_regularizer=l2(0.01))(added_layers)
added_layers = Activation('relu')(added_layers)
# added_layers = BatchNormalization()(added_layers)

preds = Dense(10, activation='softmax')(added_layers)

# `inputs=` is the Keras 2 keyword; the older `input=` spelling is deprecated
final_model = Model(inputs=base_model.input, outputs=preds)
```
All base_model layers were frozen, meaning that the new model takes the final output of the base, which does all the initial feature detection as well as the more complex feature detection (this is a problem), before applying my dense layers. The final result was very bad, with val_acc == 0.1.

The full results are in [this notebook](output_notebooks/mnist_resnet_1.ipynb).

## Transfer Learning (2) full training

Since the previous attempt was bad, I decided to redo the experiment, but this time without freezing any layers. I did try freezing all but the last two layers, but that did not help. I ran this one on a smaller data set (just 1000 samples) to see results quickly.


In this case about 23.8 million parameters were being trained:
```
Total params: 23,851,274
Trainable params: 23,798,154
Non-trainable params: 53,120
```
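
These totals can be reproduced by hand, which is a useful sanity check on what is being trained. The sketch below assumes the same head as in attempt (1) (global average pooling over 2048 channels, then Dense(128) and Dense(10)) and takes the base counts that Keras typically reports for ResNet50 with `include_top=False`; both are assumptions, not values taken from the notebook.

```python
def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units

base_total = 23_587_712      # assumed: ResNet50 without the top, as reported by Keras
base_non_trainable = 53_120  # assumed: batch-norm moving means and variances

# Head from attempt (1): 2048 pooled features -> Dense(128) -> Dense(10)
head = dense_params(2048, 128) + dense_params(128, 10)
total = base_total + head
trainable = total - base_non_trainable

print(head, total, trainable)  # prints 263562 23851274 23798154
```

The non-trainable parameters are not frozen weights here: batch-norm moving statistics are updated during training but not by gradient descent, so Keras counts them as non-trainable even when nothing is frozen.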

The results are in [mnist_resnet_2_full_resnet_training](output_notebooks/mnist_resnet_2_full_resnet_training.ipynb).

## Transfer Learning (3) - freeze some and keep some 

I tried different settings: freezing the first 60, 20 and 10 layers.
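
Freezing the first N layers comes down to flipping each layer's `trainable` flag. The helper below is a hypothetical sketch, not code from the notebook: it works on any objects with a `trainable` attribute, and plain stand-in objects are used here so that it runs without Keras.

```python
from types import SimpleNamespace

def freeze_first_n(layers, n):
    """Mark the first n layers as non-trainable and the rest as trainable."""
    for index, layer in enumerate(layers):
        layer.trainable = index >= n
    return layers

# Stand-in layers; with Keras this list would be something like `final_model.layers`.
dummy_layers = [SimpleNamespace(name=f"layer_{i}", trainable=True) for i in range(5)]
freeze_first_n(dummy_layers, 2)

print([layer.trainable for layer in dummy_layers])
# prints [False, False, True, True, True]
```

With a real Keras model, the model needs to be compiled again after changing `trainable` for the change to take effect.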

The results are very depressing 

I even added another FC layer at the end, trained on the full data set to deal with the variance issue, and trained for longer (150 epochs).

Training accuracy plateaued quickly, after 20 epochs, while validation accuracy remained very low, around 0.1.

Validation loss did not drop below 3.8, compared to a training loss of 0.012.
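
To put these numbers in context, it helps to compare them with what pure guessing would give on 10 balanced classes. The short calculation below assumes categorical cross-entropy as the loss; note that the loss Keras reports also includes any L2 penalty terms, so the comparison is only approximate.

```python
import math

num_classes = 10

# Accuracy of guessing at random on balanced classes
chance_accuracy = 1 / num_classes
# Cross-entropy of predicting a uniform distribution over the classes
uniform_loss = -math.log(1 / num_classes)

print(chance_accuracy, round(uniform_loss, 3))  # prints 0.1 2.303
```

A validation accuracy of about 0.1 is therefore at chance level, and a validation loss of 3.8 is worse than a uniform guess, which suggests the model is making confident but wrong predictions on the validation set rather than merely being unsure.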

So far, it seems that the ResNet weights are heavily tuned to recognise features of the ImageNet data set, and our data set seems to be too different for transfer learning to adapt to it.


The explanation in [A Comprehensive Hands-on Guide to Transfer Learning with Real-World Applications in Deep Learning](https://towardsdatascience.com/a-comprehensive-hands-on-guide-to-transfer-learning-with-real-world-applications-in-deep-learning-212bf3b2f27a) may, however, suggest that transfer learning is appropriate in this case.

The output is in [mnist_resnet_3_freeze_some](output_notebooks/mnist_resnet_3_freeze_some.ipynb).
