# Tutorial NN-CNN

![](neural.jpg)

# 1- ¿Qué son las redes neuronales?

Las redes neuronales son un tipo de modelo de machine learning que están diseñadas para operar de manera similar a las neuronas biológicas y el sistema nervioso humano. Estos modelos se usan para reconocer patrones complejos y relaciones que existen en un dataset con etiquetas. Tienen las siguientes propiedades:

    La arquitectura de un modelo de red neuronal está compuesta de un gran numero de nodos simples de procesamiento, llamados neuronas,  las cuales están interconectadas y organizadas en diferentes capas.

    Un nodo individual en una capa está conectado a muchos otros nodos de la capa anterior y de la capa siguiente. The inputs form one layer are received and processed to generate the output which is passed to the next layer.

    The first layer of this architecture is often named as input layer which accepts the inputs, the last layer is named as the output layer which produces the output and every other layer between input and output layer is named is hidden layers.

Key concepts in a Neural Network


A. Neuron:

A Neuron is a single processing unit of a Neural Network which are connected to different other neurons in the network. These connections repersents inputs and ouputs from a neuron. To each of its connections, the neuron assigns a “weight” (W) which signifies the importance the input and adds a bias (b) term.



B. Activation Functions

The activation functions are used to apply non-linear transformation on input to map it to output. The aim of activation functions is to predict the right class of the target variable based on the input combination of variables. Some of the popular activation functions are Relu, Sigmoid, and TanH.
C. Forward Propagation

Neural Network model goes through the process called forward propagation in which it passes the computed activation outputs in the forward direction.

Z = W*X + b
A = g(Z)

    g is the activation function
    A is the activation using the input
    W is the weight associated with the input
    B is the bias associated with the node

D. Error Computation:

The neural network learns by improving the values of weights and bias. The model computes the error in the predicted output in the final layer which is then used to make small adjustments the weights and bias. The adjustments are made such that the total error is minimized. Loss function measures the error in the final layer and cost function measures the total error of the network.

Loss = Actual_Value - Predicted_Value

Cost = Summation (Loss)
E. Backward Propagation:

Neural Network model undergoes the process called backpropagation in which the error is passed to backward layers so that those layers can also improve the associated values of weights and bias. It uses the algorithm called Gradient Descent in which the error is minimized and optimal values of weights and bias are obtained. This weights and bias adjustment is done by computing the derivative of error, derivative of weights, bias and subtracting them from the original values.





# 2-Implementando una red neuronal, clasificación binaria

Lets implement a basic neural network in python for binary classification which is used to classify if a given image is 0 or 1.

In [1]:
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
from keras.models import Sequential
import pandas as pd 
import numpy as np 
import keras

Using TensorFlow backend.


2.1 Dataset Preparation

First step is to load and prepare the dataset

In [2]:
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')

# include only the rows having label = 0 or 1 (binary classification)
X = train[train['label'].isin([0, 1])]

# target variable
Y = train[train['label'].isin([0, 1])]['label']

# remove the label from X
X = X.drop(['label'], axis = 1)

2.2 Implementing a Activation Function

We will use sigmoid activation function because it outputs the values between 0 and 1 so its a good choice for a binary classification problem

In [3]:
# implementing a sigmoid activation function
def sigmoid(z):
    s = 1.0/ (1 + np.exp(-z))    
    return s

2.3 Define Neural Network Architecture

Create a model with three layers - Input, Hidden, Output.

In [4]:
def network_architecture(X, Y):
    # nodes in input layer
    n_x = X.shape[0] 
    # nodes in hidden layer
    n_h = 10          
    # nodes in output layer
    n_y = Y.shape[0] 
    return (n_x, n_h, n_y)

2.4 Define Neural Network Parameters

Neural Network parameters are weights and bias which we need to initialze with zero values. The first layer only contains inputs so there are no weights and bias, but the hidden layer and the output layer have a weight and bias term. (W1, b1 and W2, b2)

In [5]:
def define_network_parameters(n_x, n_h, n_y):
    W1 = np.random.randn(n_h,n_x) * 0.01 # random initialization
    b1 = np.zeros((n_h, 1)) # zero initialization
    W2 = np.random.randn(n_y,n_h) * 0.01 
    b2 = np.zeros((n_y, 1)) 
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}    

2.5 Implement Forward Propagation

The hidden layer and output layer will compute the activations using sigmoid activation function and will pass it in the forward direction. While computing this activation, the input is multiplied with weight and added with bias before passing it to the function.

In [6]:
def forward_propagation(X, params):
    Z1 = np.dot(params['W1'], X)+params['b1']
    A1 = sigmoid(Z1)

    Z2 = np.dot(params['W2'], A1)+params['b2']
    A2 = sigmoid(Z2)
    return {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}    


2.6 Compute the Network Error

To compute the cost, one straight forward approach is to compute the absolute error among prediction and actual value. But a better loss function is the log loss function which is defines as :

-Summ ( Log (Pred) Actual + Log (1 - Pred ) Actual ) / m


In [7]:
def compute_error(Predicted, Actual):
    logprobs = np.multiply(np.log(Predicted), Actual)+ np.multiply(np.log(1-Predicted), 1-Actual)
    cost = -np.sum(logprobs) / Actual.shape[1] 
    return np.squeeze(cost)


2.7 Implement Backward Propagation

In backward propagation function, the error is passed backward to previous layers and the derivatives of weights and bias are computed. The weights and bias are then updated using the derivatives.


In [8]:
def backward_propagation(params, activations, X, Y):
    m = X.shape[1]
    
    # output layer
    dZ2 = activations['A2'] - Y # compute the error derivative 
    dW2 = np.dot(dZ2, activations['A1'].T) / m # compute the weight derivative 
    db2 = np.sum(dZ2, axis=1, keepdims=True)/m # compute the bias derivative
    
    # hidden layer
    dZ1 = np.dot(params['W2'].T, dZ2)*(1-np.power(activations['A1'], 2))
    dW1 = np.dot(dZ1, X.T)/m
    db1 = np.sum(dZ1, axis=1,keepdims=True)/m
    
    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}

def update_parameters(params, derivatives, alpha = 1.2):
    # alpha is the model's learning rate 
    
    params['W1'] = params['W1'] - alpha * derivatives['dW1']
    params['b1'] = params['b1'] - alpha * derivatives['db1']
    params['W2'] = params['W2'] - alpha * derivatives['dW2']
    params['b2'] = params['b2'] - alpha * derivatives['db2']
    return params


2.8 Compile and Train the Model

Create a function which compiles all the key functions and creates a neural network model.


In [9]:
def neural_network(X, Y, n_h, num_iterations=100):
    n_x = network_architecture(X, Y)[0]
    n_y = network_architecture(X, Y)[2]
    
    params = define_network_parameters(n_x, n_h, n_y)
    for i in range(0, num_iterations):
        results = forward_propagation(X, params)
        error = compute_error(results['A2'], Y)
        derivatives = backward_propagation(params, results, X, Y) 
        params = update_parameters(params, derivatives)    
    return params

In [10]:
y = Y.values.reshape(1, Y.size)
x = X.T.as_matrix()
model = neural_network(x, y, n_h = 10, num_iterations = 10)

  
  This is separate from the ipykernel package so we can avoid doing imports until


2.9 Predictions

In [11]:
def predict(parameters, X):
    results = forward_propagation(X, parameters)
    print (results['A2'][0])
    predictions = np.around(results['A2'])    
    return predictions

predictions = predict(model, x)
print ('Accuracy: %d' % float((np.dot(y,predictions.T) + np.dot(1-y,1-predictions.T))/float(y.size)*100) + '%')

[0.80217027 0.06472329 0.89967995 ... 0.89967995 0.06472329 0.89967995]
Accuracy: 97%


  This is separate from the ipykernel package so we can avoid doing imports until


# 3-Implementando una red neuronal, multiclasificación 

In the previous step, I discussed about how to implement a NN for binary classification in python from scratch. Python's libraries such as sklearn provides an excellent implementation of efficient neural networks which can be used to directly implement neural networks on a dataset. In this section, lets implement a multi class neural network to classify the digit shown in an image from 0 to 9
3.1 Dataset Preparation

Slice the train dataset into train and validation set

In [12]:
from sklearn.model_selection import train_test_split
from sklearn import neural_network
from sklearn import  metrics
import tensorflow as tf              # quitar texto de tensorflow
tf.logging.set_verbosity(tf.logging.ERROR)

Y = train['label'][:10000] # use more number of rows for more training 
X = train.drop(['label'], axis = 1)[:10000] # use more number of rows for more training 
x_train, x_val, y_train, y_val = train_test_split(X, Y, test_size=0.20, random_state=42)


3.2 Train the Model

Train a neural network model with 10 hidden layers.


In [13]:
model = neural_network.MLPClassifier(alpha=1e-5, hidden_layer_sizes=(5,), solver='lbfgs', random_state=18)
model.fit(x_train, y_train)


MLPClassifier(activation='relu', alpha=1e-05, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(5,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
       random_state=18, shuffle=True, solver='lbfgs', tol=0.0001,
       validation_fraction=0.1, verbose=False, warm_start=False)

3.3 Predictions

In [14]:
predicted = model.predict(x_val)
print("Classification Report:\n %s:" % (metrics.classification_report(y_val, predicted)))

Classification Report:
               precision    recall  f1-score   support

           0       0.00      0.00      0.00       186
           1       0.96      0.92      0.94       210
           2       0.12      0.99      0.22       220
           3       0.00      0.00      0.00       190
           4       0.00      0.00      0.00       188
           5       0.00      0.00      0.00       194
           6       0.00      0.00      0.00       190
           7       0.00      0.00      0.00       233
           8       0.00      0.00      0.00       197
           9       0.00      0.00      0.00       192

   micro avg       0.20      0.20      0.20      2000
   macro avg       0.11      0.19      0.12      2000
weighted avg       0.11      0.20      0.12      2000
:


  'precision', 'predicted', average, warn_for)


# 4-DNN-CNN

Deep Neural Networks are composed of complex and many number of hidden layers which tries to extract low level features from the images. Some examples of complex deep neural networks are convolutional neural networks and Recurrent Neural Networks.
Convolutional Neural Networks

In Convolutional Neural Networks, every image input is treated as a a matrix of pixel values which represents the amount of darkness at a given pixel in the image. Unlike, tradational neural networks which treats an image as a one dimentional network, CNNs considers the location of pixels and the neighbours for classification.

![](neural2.webp)

Key components of Convolutional Neural Network.

A. Convolutional layer: In this layer, a kernel (or weight) matrix is used to extract low level features from the images. The kernel with its weights rotates over the image matrix in a sliding window fashion in order to obtained the convolved output. The kernel matrix behaves like a filter in an image extracting particular information from the original image matrix. During the colvolution process, The weights are learnt such that the loss function is minimized.

B. Stride: Stride is defined as the number of steps the kernel or the weight matrix takes while moving across the entire image moving N pixel at a time. If the weight matrix moves N pixel at a time, it is called stride of N.

![](neural3.gif)

Image Credits - www.deeplearning.net

C. Pooling Layer: Pooling layers are used to extract the most informative features from the generated convolved output.

![](neural4.png)


D. Output Layer: To generate the final output, a dense or a fully connected layer is applied with the softmax activation function. Softmax function is used to generate the probabilities for each class of the target variable.




# 5-Implementando una red neuronal convolucional CNN


5.1 Dataset Preparation

In the first step lets prepare the dataset and slice it into train and validation sets. For the modelling and training purpose, we can use python's library - Keras.



In [15]:
Y = train['label']
X = train.drop(['label'], axis=1)

x_train, x_val, y_train, y_val = train_test_split(X.as_matrix(), Y.as_matrix(), test_size=0.10, random_state=42)


  after removing the cwd from sys.path.



5.2 Define the Network Parameters

Network Parameters are :

Batch Size - Number of rows from the input data to use it one iteratation from the training purpose
Num Classes - Total number of possible classes in the target variable
Epochs - Total number of iterations for which cnn model will run.


In [16]:
# network parameters 
batch_size = 128
num_classes = 10
epochs = 5 # Further Fine Tuning can be done

# input image dimensions
img_rows, img_cols = 28, 28


5.3 Preprocess the Inputs

In the preprocessing step the corresponding image data vectors are reshaped into a 4 dimentional vector : total batch size, width of the image, height of the image, and the channel. In our case, channel = 1 as we will only use single channel instead of three channels (R,G,B). The next step is to normalize the inputs by dividing them by max pixel value ie. 255.


In [17]:
# preprocess the train data 
x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
x_train = x_train.astype('float32')
x_train /= 255

# preprocess the validation data
x_val = x_val.reshape(x_val.shape[0], img_rows, img_cols, 1)
x_val = x_val.astype('float32')
x_val /= 255

input_shape = (img_rows, img_cols, 1)

# convert the target variable 
y_train = keras.utils.to_categorical(y_train, num_classes)
y_val = keras.utils.to_categorical(y_val, num_classes)

# preprocess the test data
Xtest = test.as_matrix()
Xtest = Xtest.reshape(Xtest.shape[0], img_rows, img_cols, 1)





5.4 Create the CNN Model Architecture

In this step, create the convolutional neural network architecture with following layers:

    Convolutional Layer with kernel size = 3*3, 32 convolutional units, and RelU activation function
    
    Convolutional Layer with kernel size = 3*3, 64 convolutional units, and RelU activation function
    
    Max Pooling Layer with pooling matrix size = 2*2
    
    Dropout Layer : A dropout layer is used for regularization and reducing the overfitting
    Flatten Layer : A layer to convert the output in one dimentional array
    
    Dense Layer : A dense layer is a fully connected layer in which every node is connected to every other node in the previous and next layers. In our network, it contains 128 neurons but this number can be changed for further experiments.
    
    Another Dropout Layer for regularization
    
    Final output layer : A dense layer with 10 neurons for generating the output class

In the simple neural network that we implemented in step 1, the loss function was LogLoss function and the optimizing Algorithm was Gradient Descent, In this neural network, we will use categorical_crossentropy as this is a multi class classification as the loss function and Adadelta as the optimizing function.


In [18]:
model = Sequential()

# add first convolutional layer
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))

# add second convolutional layer
model.add(Conv2D(64, (3, 3), activation='relu'))

# add one max pooling layer 
model.add(MaxPooling2D(pool_size=(2, 2)))

# add one dropout layer
model.add(Dropout(0.25))

# add flatten layer
model.add(Flatten())

# add dense layer
model.add(Dense(128, activation='relu'))

# add another dropout layer
model.add(Dropout(0.5))

# add dense layer
model.add(Dense(num_classes, activation='softmax'))

# complile the model and view its architecur
model.compile(loss=keras.losses.categorical_crossentropy,  optimizer=keras.optimizers.Adadelta(), metrics=['accuracy'])

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 24, 24, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 12, 12, 64)        0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 12, 12, 64)        0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 9216)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 128)               1179776   
_________________________________________________________________
dropout_2 (Dropout)          (None, 128)               0         
__________

5.5 Train the Model

In [19]:
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1, validation_data=(x_val, y_val))
accuracy = model.evaluate(x_val, y_val, verbose=0)
print('Test accuracy:', accuracy[1])

Train on 37800 samples, validate on 4200 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Test accuracy: 0.9876190476190476


5.6 Generate Predictions

In [20]:
pred = model.predict(Xtest)
y_classes = pred.argmax(axis=-1)
res = pd.DataFrame()
res['ImageId'] = list(range(1,28001))
res['Label'] = y_classes
res.to_csv("output.csv", index = False)

Gracias a https://www.kaggle.com/kernels/svzip/4153864