# Introduction to CNNs with Keras

* **1. Introduction**
* **2. Data preparation**
  * 2.1 Loading data
  * 2.2 Normalization
  * 2.3 Reshape
  * 2.4 Label encoding
  * 2.5 Splitting into training and validation data
* **3. CNN architecture**
  * 3.1 Define model
  * 3.2 Data augmentation
  * 3.3 Optimizer and learning rate scheduler
* **4. Model evaluation**
  * 4.1 Plotting training and validation loss
  * 4.2 Plotting training and validation accuracy
  * 4.3 Confusion matrix




<center> <h1> 1. Introduction to Convolutional Neural Networks </h1></center>

![](https://adeshpande3.github.io/assets/Cover.png)
<br>
<br>
<p style="font-size:120%;"> A CNN is a neural network that typically combines several types of layers: convolutional layers, pooling layers, and activation layers. </p>

<h2> Convolutional layer </h2>
<br>
<p style="font-size:120%;">The convolutional (Conv) layer is the core building block of a convolutional network and does most of the computational heavy lifting.</p>


<img src="https://cdn-images-1.medium.com/max/1600/0*1PSMTM8Brk0hsJuF.">

<p style="font-size:120%;"> Imagine you have an image represented as a 5x5 matrix of values, and you slide a 3x3 window across it. At each position the window visits, you multiply the values in the window elementwise by the image values it currently covers and sum the products. The result is a single number that represents all the values in that window of the image; it is then passed through a ReLU activation. </p>



<img src="https://cdn-images-1.medium.com/max/1600/1*ZCjPUFrB6eHPRi4eyP6aaA.gif" />


<p style="font-size:120%;"> The “window” that moves over the image is called a <b>kernel</b>. The weights of the kernel are initialized randomly and then learned during training.<br> The distance the window moves at each step is called the <b>stride</b>.</p>
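
To make the sliding-window description concrete, here is a minimal NumPy sketch of a single-channel convolution followed by ReLU. The function name `conv2d_single`, the example image, and the kernel values are illustrative assumptions; the sketch assumes a square image, a square kernel, and no padding, and it is not meant to mirror how Keras implements `Conv2D` internally.

```python
import numpy as np

def conv2d_single(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and apply ReLU to the result."""
    k = kernel.shape[0]
    out_size = (image.shape[0] - k) // stride + 1
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            r, c = i * stride, j * stride
            window = image[r:r + k, c:c + k]
            out[i, j] = np.sum(window * kernel)  # elementwise multiply, then sum
    return np.maximum(out, 0.0)                  # ReLU keeps only positive responses

# 5x5 image whose values increase from left to right and top to bottom
image = np.arange(25, dtype=float).reshape(5, 5)
# 3x3 kernel that responds to left-to-right increases in intensity
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

feature_map = conv2d_single(image, kernel, stride=1)
feature_map_s2 = conv2d_single(image, kernel, stride=2)
print(feature_map.shape)     # (3, 3)
print(feature_map_s2.shape)  # (2, 2)
```

With stride 1 the 3x3 window fits in three positions along each axis, so the 5x5 image becomes a 3x3 feature map; with stride 2 it only fits in two.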

<h2> Pooling layer </h2>

<p style="font-size:120%;"> Convolutional networks may include local or global pooling layers, which combine the outputs of a cluster of neurons in one layer into a single neuron in the next layer. For example, <b>max pooling</b> uses the maximum value from each cluster of neurons in the prior layer. Another option is <b>average pooling</b>, which uses the average value from each cluster.</p>

![](https://www.embedded-vision.com/sites/default/files/technical-articles/CadenceCNN/Figure7.jpg)
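
As a rough illustration of max and average pooling, the sketch below pools a 4x4 feature map with non-overlapping 2x2 windows. The helper name `pool2d` and the example values are assumptions made for this illustration only.

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Pool a 2D array with non-overlapping size x size windows."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size            # drop rows/columns that do not fill a window
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))           # largest value in each window
    return blocks.mean(axis=(1, 3))              # average value in each window

fm = np.array([[1.0, 3.0, 2.0, 4.0],
               [5.0, 6.0, 7.0, 8.0],
               [3.0, 2.0, 1.0, 0.0],
               [1.0, 2.0, 3.0, 4.0]])

max_pooled = pool2d(fm, mode="max")
avg_pooled = pool2d(fm, mode="average")
print(max_pooled)  # [[6. 8.] [3. 4.]]
print(avg_pooled)  # [[3.75 5.25] [2. 2.]]
```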

For more detail, see [Convolutional Neural Networks](http://cs231n.github.io/convolutional-networks/). <br>

<br>

<h2> Activation layer </h2>

<p style="font-size:120%;"> Activation functions allow an artificial neural network to learn complex patterns. Their main purpose is to introduce non-linearity into the network. A neuron computes the weighted sum of its inputs, adds a bias, and the activation function then decides how strongly the neuron “fires”. There are several kinds of non-linear activation functions, such as sigmoid, tanh, ReLU, and leaky ReLU. Without a non-linear activation, a stack of layers would collapse into a single linear transformation and could not model complex relationships.</p>

![](https://i.stack.imgur.com/iIcbq.gif)

For more detail, see [Types Of Activation Functions In Neural Networks And Rationale Behind It](https://i.stack.imgur.com/iIcbq.gif)
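
The activation functions named above can be sketched in a few lines of NumPy. The input vector `x`, weights `w`, and bias `b` are made-up values for illustration, and the leaky ReLU slope `alpha=0.01` is an assumed, commonly used default rather than something fixed by this notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))       # squashes values into (0, 1)

def relu(z):
    return np.maximum(0.0, z)             # zero for negative inputs, identity otherwise

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)  # small slope instead of zero for negative inputs

# a single neuron: weighted sum of the inputs plus a bias
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.4, 0.3, -0.2])
b = 0.1
z = np.dot(w, x) + b                      # 0.2 - 0.3 - 0.4 + 0.1 = -0.4

print("sigmoid   :", sigmoid(z))
print("tanh      :", np.tanh(z))
print("relu      :", relu(z))
print("leaky relu:", leaky_relu(z))
```

Because the pre-activation `z` is negative here, ReLU outputs zero while leaky ReLU lets a small negative value through.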




In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline




## 2. Data preparation

### 2.1 Loading data

In [2]:

train = pd.read_csv("../input/train.csv")
test = pd.read_csv("../input/test.csv")


In [None]:
print(train.shape)
print(test.shape)

In [None]:
train.head(5)

In [None]:
test.head(5)

In [None]:
X_train=train.drop(labels = ["label"],axis = 1) 
Y_train=train['label']
print(X_train.shape)
print(Y_train.shape)

In [None]:
Y_train.value_counts()

### Visualizing the number of samples per label in the training data

In [None]:
import seaborn as sns
plt.figure(figsize=(8,4))
sns.countplot(x='label', data=train);

### 2.2 Normalizing data

In [None]:
X_train=X_train.astype('float32')/255
test=test.astype('float32')/255

### 2.3 Reshape

Reshape each flattened row of 784 pixels into a 28x28x1 array (height, width, channels).

In [None]:
X_train = X_train.values.reshape(-1,28,28,1)
test = test.values.reshape(-1,28,28,1)

In [None]:
X_train.shape

In [None]:
test.shape

### 2.4 Label encoding

In [None]:
from keras.utils import to_categorical  # the np_utils submodule is gone in newer Keras releases
Y_train = to_categorical(Y_train, num_classes = 10)

In [None]:
Y_train.shape

In [None]:
print(Y_train[:5])
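
To see what one-hot label encoding produces, here is a small NumPy sketch that builds the same kind of matrix by hand. The `labels` array is made-up example data, and indexing an identity matrix is just one simple way to get the encoding, not necessarily how `to_categorical` is implemented.

```python
import numpy as np

labels = np.array([1, 0, 4, 7])      # example digit labels
num_classes = 10

# row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes, dtype="float32")[labels]
print(one_hot.shape)                 # (4, 10)
print(one_hot[0])                    # 1.0 at index 1, zeros elsewhere

# argmax reverses the encoding and recovers the class indices
recovered = one_hot.argmax(axis=1)
print(recovered)                     # [1 0 4 7]
```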

### 2.5 Splitting the training data into training and validation sets

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_valid, Y_train, Y_valid = train_test_split(X_train, Y_train, test_size = 0.1, random_state=42)

In [None]:
plt.figure(figsize=(6,6))
plt.imshow(X_train[1][:,:,0])
plt.title(Y_train[1].argmax());

## 3. Building the CNN architecture using Keras

### 3.1 Defining the CNN model

In [None]:
from keras.layers import Input,InputLayer, Dense, Activation, ZeroPadding2D, BatchNormalization, Flatten, Conv2D
from keras.layers import AveragePooling2D, MaxPooling2D, Dropout
from keras.models import Sequential,Model
from keras.optimizers import SGD
from keras.callbacks import ModelCheckpoint,LearningRateScheduler
import keras
from keras import backend as K

In [None]:
inputShape = (28, 28, 1)
inputs = Input(inputShape)  # named `inputs` to avoid shadowing the built-in input()

# convolution block 1: 64 filters
x = Conv2D(64, (3,3), strides=(1,1), name='layer_conv1', padding='same')(inputs)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = MaxPooling2D((2,2), name='maxPool1')(x)

# convolution block 2: 64 filters
x = Conv2D(64, (3,3), strides=(1,1), name='layer_conv2', padding='same')(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = MaxPooling2D((2,2), name='maxPool2')(x)

# convolution block 3: 32 filters
x = Conv2D(32, (3,3), strides=(1,1), name='conv3', padding='same')(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = MaxPooling2D((2,2), name='maxPool3')(x)

# classifier head: two dense layers with dropout, then a 10-way softmax
x = Flatten()(x)
x = Dense(64, activation='relu', name='fc0')(x)
x = Dropout(0.25)(x)
x = Dense(32, activation='relu', name='fc1')(x)
x = Dropout(0.25)(x)
x = Dense(10, activation='softmax', name='fc2')(x)

model = Model(inputs=inputs, outputs=x, name='Predict')


In [None]:
model.summary()
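
The shapes and parameter counts reported by `model.summary()` can be checked by hand. The sketch below traces the spatial size through the three convolution blocks, assuming `'same'` padding with stride 1 for the convolutions and the default non-overlapping 2x2 max pooling. The helper names `conv_same`, `max_pool`, and `conv_params` are illustrative, not Keras functions.

```python
import math

def conv_same(size, stride=1):
    # with 'same' padding the output size is ceil(size / stride)
    return math.ceil(size / stride)

def max_pool(size, pool=2):
    # non-overlapping pooling without padding: floor(size / pool)
    return size // pool

def conv_params(kernel, in_channels, filters):
    # each filter has kernel*kernel*in_channels weights plus one bias
    return (kernel * kernel * in_channels + 1) * filters

size, channels = 28, 1
shapes, params = [], []
for filters in (64, 64, 32):
    params.append(conv_params(3, channels, filters))
    size = max_pool(conv_same(size))
    channels = filters
    shapes.append((size, size, channels))

flattened = size * size * channels
print(shapes)     # [(14, 14, 64), (7, 7, 64), (3, 3, 32)]
print(params)     # [640, 36928, 18464]
print(flattened)  # 288 values go into the first dense layer
```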

### 3.2 Data augmentation

```
from keras.preprocessing.image import ImageDataGenerator

datagen_train = ImageDataGenerator(
    width_shift_range=0.2,   # randomly shift images horizontally by up to 20% of the width
    height_shift_range=0.2,  # randomly shift images vertically by up to 20% of the height
    horizontal_flip=True)    # randomly flip images horizontally (mirrored digits may not be valid samples)

# fit augmented image generator on data
datagen_train.fit(X_train)
```
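
To give a feel for what a random shift does to a single image, here is a simplified NumPy sketch. The helpers `shift_image` and `random_shift` are hypothetical names introduced for this illustration; unlike `ImageDataGenerator`, whose default fill mode is `'nearest'`, this version fills the uncovered pixels with zeros.

```python
import numpy as np

def shift_image(image, dy, dx):
    """Shift an (H, W, C) image by dy rows and dx columns, filling gaps with zeros."""
    h, w = image.shape[:2]
    shifted = np.zeros_like(image)
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    shifted[dst_y, dst_x] = image[src_y, src_x]
    return shifted

def random_shift(image, max_fraction=0.2, rng=None):
    """Shift by a random offset of at most max_fraction of the image size."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    max_dy, max_dx = int(h * max_fraction), int(w * max_fraction)
    dy = int(rng.integers(-max_dy, max_dy + 1))
    dx = int(rng.integers(-max_dx, max_dx + 1))
    return shift_image(image, dy, dx)

# a blank 28x28x1 image with a single bright pixel in the top-left corner
image = np.zeros((28, 28, 1), dtype="float32")
image[0, 0, 0] = 1.0

moved = shift_image(image, dy=2, dx=3)            # bright pixel moves to row 2, column 3
augmented = random_shift(image, rng=np.random.default_rng(0))
print(np.argwhere(moved == 1.0))
print(augmented.shape)
```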

### 3.3 Optimizer

In [None]:
# define SGD optimizer
momentum = 0.5
sgd = SGD(lr=0.01, momentum=momentum, decay=0.0, nesterov=False) 

# compile the model
model.compile(loss='categorical_crossentropy',optimizer=sgd, metrics=['accuracy'])

### Learning rate schedules

Learning rate schedules adjust the learning rate during training by reducing it according to a predefined schedule. Common schedules include time-based decay, step decay, and exponential decay.

**Here we will implement step decay.**

A step decay schedule drops the learning rate by a fixed factor every few epochs. Its mathematical form is:
```
lr = lr0 * drop^floor(epoch / epochs_drop)
```
**We will drop the learning rate every 3 epochs.**

In [None]:
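import math

def step_decay(epoch):
    """Return the learning rate for a given epoch (Keras passes a 0-based index)."""
    initial_lrate = 0.1   # the scheduler sets this rate each epoch, overriding lr=0.01 passed to SGD
    drop = 0.6            # factor applied to the rate at each drop
    epochs_drop = 3.0     # number of epochs between drops
    return initial_lrate * math.pow(drop, math.floor((1 + epoch) / epochs_drop))

lrate = LearningRateScheduler(step_decay)
callbacks_list = [lrate]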
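
It can help to print the schedule before training. The sketch below evaluates the same step decay formula for the first few epochs; the keyword arguments are only a convenience added for this illustration, with defaults matching the values used in this notebook.

```python
import math

def step_decay(epoch, initial_lrate=0.1, drop=0.6, epochs_drop=3.0):
    # same formula as the training callback; `epoch` is the 0-based index Keras passes in
    return initial_lrate * math.pow(drop, math.floor((1 + epoch) / epochs_drop))

schedule = [round(step_decay(epoch), 4) for epoch in range(7)]
print(schedule)  # [0.1, 0.1, 0.06, 0.06, 0.06, 0.036, 0.036]
```

Because the formula uses `1 + epoch`, the first drop happens at the third epoch (index 2), and the rate then drops again every three epochs.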


In [None]:
history=model.fit(X_train, Y_train, validation_data=(X_valid, Y_valid),
                          epochs=35,callbacks=callbacks_list,verbose=1)

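### Above, we did not train the model with augmented data
**We can train the model with augmented data as shown below.**
```
# callbacks_list holds the learning rate scheduler defined above;
# a ModelCheckpoint callback can be appended to it to save weights during training
model.fit_generator(datagen_train.flow(X_train, Y_train, batch_size=16),
                    validation_data=(X_valid, Y_valid), epochs=10,
                    steps_per_epoch=X_train.shape[0] // 16,  # one pass over the data per epoch
                    callbacks=callbacks_list, verbose=1)
```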

## 4. Model evaluation

### 4.1 Plotting training and validation loss

In [None]:
import matplotlib.pyplot as plt
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(loss) + 1)
plt.plot(epochs, loss, color='red', label='Training loss')
plt.plot(epochs, val_loss, color='green', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

### 4.2 Plotting training and validation accuracy

In [None]:
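# the history key is 'acc' in older Keras releases and 'accuracy' in newer ones
acc = history.history.get('acc', history.history.get('accuracy'))
val_acc = history.history.get('val_acc', history.history.get('val_accuracy'))
plt.plot(epochs, acc, color='red', label='Training acc')
plt.plot(epochs, val_acc, color='green', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()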

In [None]:
print("On validation data")
pred1 = model.evaluate(X_valid, Y_valid)
print("Accuracy (%):", pred1[1] * 100)
print("Total loss:", pred1[0])  # the loss is not a percentage, so it is reported unscaled

## Visualize CNN Layers

In [None]:
from keras.models import Model
layer_outputs = [layer.output for layer in model.layers]
activation_model = Model(inputs=model.input, outputs=layer_outputs)
activations = activation_model.predict(X_train[10].reshape(1,28,28,1))
 
def display_activation(activations, col_size, row_size, act_index):
    """Plot the first row_size * col_size channels of one layer's activation."""
    activation = activations[act_index]
    activation_index = 0
    # figsize is (width, height): width scales with columns, height with rows
    fig, ax = plt.subplots(row_size, col_size, figsize=(col_size*2.5, row_size*1.5))
    for row in range(0, row_size):
        for col in range(0, col_size):
            ax[row][col].imshow(activation[0, :, :, activation_index], cmap='gray')
            activation_index += 1


### Displaying original Image

In [None]:
plt.imshow(X_train[10][:,:,0]);

### Displaying the image above after layer 2
**Layer 1 is the input layer.**

In [None]:
display_activation(activations, 8, 8, 1)

### Displaying output of layer 4

In [None]:
display_activation(activations, 8, 8, 3)

### Displaying output of layer 8

In [None]:
display_activation(activations, 8, 8, 7)

In [None]:
from sklearn.metrics import confusion_matrix
Y_prediction = model.predict(X_valid)
# convert predicted class probabilities to class indices
Y_pred_classes = np.argmax(Y_prediction, axis=1)
# convert one-hot validation labels back to class indices
Y_true = np.argmax(Y_valid, axis=1)
# compute the confusion matrix (rows: true classes, columns: predicted classes)
confusion_mtx = confusion_matrix(Y_true, Y_pred_classes)

### 4.3 Confusion matrix

In [None]:
plt.figure(figsize=(10,8))
sns.heatmap(confusion_mtx, annot=True, fmt="d");
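
To read the heatmap, it helps to see how a confusion matrix is built on a tiny example. The sketch below counts (true, predicted) pairs by hand with NumPy, using the same layout as scikit-learn (rows are true classes, columns are predicted classes). The function name `simple_confusion_matrix` and the toy labels are assumptions for illustration.

```python
import numpy as np

def simple_confusion_matrix(y_true, y_pred, num_classes):
    """Count how often each true class (row) is predicted as each class (column)."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)   # add 1 for every (true, predicted) pair
    return cm

y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 2, 2, 2, 1, 0])

cm = simple_confusion_matrix(y_true, y_pred, num_classes=3)
print(cm)
# [[2 0 0]
#  [0 1 1]
#  [0 0 2]]

# diagonal entries are correct predictions; dividing by row sums gives per-class recall
per_class_recall = cm.diagonal() / cm.sum(axis=1)
print(per_class_recall)  # [1.  0.5 1. ]
```

Off-diagonal entries show which classes get confused: here one sample of class 1 was predicted as class 2.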

In [None]:
# predict results
results = model.predict(test)

# select the index with the maximum probability
results = np.argmax(results,axis = 1)


In [None]:
submissions=pd.DataFrame({"ImageId": list(range(1,len(results)+1)),
                         "Label": results})
submissions.to_csv("re2-submission.csv", index=False, header=True)

References

1. [visualize-convolutional-neural-network](http://www.codeastar.com/visualize-convolutional-neural-network/)
2. [learning-rate-schedules-and-adaptive-learning-rate-methods-for-deep-learning](https://towardsdatascience.com/learning-rate-schedules-and-adaptive-learning-rate-methods-for-deep-learning-2c8f433990d1)
 