<a href="https://colab.research.google.com/github/hussain0048/Machine-Learning/blob/master/Sklearn/supervised%20algorithm/Neural_Network.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Neural network (NN) 

**Introduction:**

The most common type of neural network, the Multi-Layer Perceptron (MLP), is a function that maps input to output. An MLP has a single input layer and a single output layer, with one or more hidden layers in between. The input layer has as many neurons as there are features, and each hidden layer can have any number of neurons. Each neuron computes a linear function of its inputs and then applies an activation function, which lets the network model non-linear relationships. The output of each layer is passed as input to every neuron of the next layer.
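
The description above can be sketched in a few lines of NumPy. This is only an illustration of the idea, not how sklearn implements it: the layer sizes, the random weights, and the helper names `relu` and `mlp_forward` are hypothetical and chosen for the example.

```python
import numpy as np

def relu(z):
    # Rectified linear unit: max(0, z) applied element-wise.
    return np.maximum(0.0, z)

def mlp_forward(x, weights, biases):
    """Forward pass: ReLU on the hidden layers, identity on the output layer."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)              # linear function followed by activation
    return a @ weights[-1] + biases[-1]  # output layer, no activation here

# Hypothetical toy network: 3 input features -> 4 hidden neurons -> 2 outputs.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 4)), rng.normal(size=(4, 2))]
biases = [np.zeros(4), np.zeros(2)]

x = rng.normal(size=(5, 3))              # 5 samples with 3 features each
out = mlp_forward(x, weights, biases)
print(out.shape)
```

Every hidden neuron receives all outputs of the previous layer, which is why each layer reduces to a single matrix product.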

**sklearn provides two estimators, MLPClassifier and MLPRegressor, for classification and regression problems respectively.**

In [None]:
!git clone https://github.com/hussain0048/Machine-Learning.git

## 1 - Importing necessary libraries ##

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn
import itertools
import warnings
warnings.filterwarnings('ignore')
np.set_printoptions(precision=2)
%matplotlib inline

## 2 - Load Datasets ##

We'll load the two datasets below:
    - **Digits Dataset**: The digits dataset contains 8x8 images of the digits 0-9. We'll use it for the classification tasks below.
    - **Housing Dataset**: The Boston housing dataset contains information about house properties such as the average number of rooms, per capita crime rate in town, etc. We'll use it for the regression task.

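Sklearn provides both of these datasets as part of the datasets module. We can load them by calling load_digits() and load_boston(). Each call returns a dictionary-like Bunch object from which the features and target can be retrieved. Note that load_boston() was deprecated in scikit-learn 1.0 and removed in 1.2, so the Boston cells in this notebook require an older release.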


In [None]:
# Loading the data 
from sklearn.datasets import load_digits, load_boston
digits = load_digits()
X_digits, Y_digits = digits.data, digits.target
print('Dataset Sizes : ', X_digits.shape, Y_digits.shape)

Dataset Sizes :  (1797, 64) (1797,)
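
As a small sketch of how the Bunch object can be inspected, the snippet below reads the same data by key and by attribute, and checks that each 64-feature row is a flattened 8x8 image. The variable names are chosen for this example only.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

# A Bunch supports both dictionary-style and attribute-style access.
print(sorted(digits.keys()))
same_object = digits["data"] is digits.data

# Each row of `data` is one 8x8 image flattened to 64 values.
print(digits.images.shape, digits.data.shape)
flattened = digits.images.reshape(len(digits.images), -1)
print(np.array_equal(flattened, digits.data))
```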


In [None]:
boston = load_boston()
X_boston, Y_boston = boston.data, boston.target
print('Dataset Sizes : ', X_boston.shape, Y_boston.shape)

**MLPClassifier** 

MLPClassifier is an estimator available in the neural_network module of sklearn for performing classification tasks using a multi-layer perceptron.

## 3 - Splitting Data Into Train/Test Sets ##
We'll split the dataset into two parts:
- Training data, which will be used for training the model.

- Test data, against which the accuracy of the trained model will be checked.

The train_test_split function from sklearn's model_selection module splits the data into two sets, with 80% for training and 20% for testing. We pass stratify=Y_digits so that both sets keep the same class proportions, and a seed (random_state=123) so that we always get the same split and can reproduce results in the future.

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X_digits, Y_digits, train_size=0.80, test_size=0.20, stratify=Y_digits, random_state=123)
print('Train/Test Sizes : ', X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
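
To see what the stratify argument does, the following sketch compares the per-digit class fractions of the full dataset with those of the two subsets. It reloads the data so that it runs on its own; the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, train_size=0.80, test_size=0.20, stratify=y, random_state=123)

# Fraction of samples belonging to each digit in every subset.
full_frac = np.bincount(y) / len(y)
train_frac = np.bincount(y_tr) / len(y_tr)
test_frac = np.bincount(y_te) / len(y_te)

for digit in range(10):
    print(digit, round(full_frac[digit], 3),
          round(train_frac[digit], 3), round(test_frac[digit], 3))
```

With stratification, the three columns should be nearly identical for every digit.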

## 4 - Fitting Default Model To Train Data ##

We'll first fit the MLPClassifier model with default parameters to our train data.

In [None]:
from sklearn.neural_network import MLPClassifier

mlp_classifier  = MLPClassifier(random_state=123)
mlp_classifier.fit(X_train, Y_train)


## 5 - Evaluating Trained Model On Test Data ##


Almost all models in the Scikit-Learn API provide a predict() method, which predicts the target variable for the test set passed to it. Classifiers also provide a score() method, which returns the accuracy on the given data.


In [None]:
Y_preds = mlp_classifier.predict(X_test)
print(Y_preds[:15])
print(Y_test[:15])
print('Test Accuracy : %.3f'%mlp_classifier.score(X_test, Y_test)) ## Score method also evaluates accuracy for classification models.
print('Training Accuracy : %.3f'%mlp_classifier.score(X_train, Y_train))
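
As a quick check of what score() reports for a classifier, the sketch below computes accuracy by hand and compares it with accuracy_score and score(). It trains its own small model with max_iter=50, a value assumed here only to keep the run short, so the numbers will differ from the cell above.

```python
import warnings
import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=123)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # a short run may not converge
    clf = MLPClassifier(random_state=123, max_iter=50).fit(X_tr, y_tr)

preds = clf.predict(X_te)
manual_acc = np.mean(preds == y_te)  # fraction of correct predictions
print(manual_acc, accuracy_score(y_te, preds), clf.score(X_te, y_te))
```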

## 6 - Plotting Confusion Matrix ##

Below we have created a function named plot_confusion_matrix() which accepts the original labels of the data and the labels predicted by the model. It then plots a confusion matrix using matplotlib. We'll reuse this function later to plot the confusion matrix of the tuned model.


In [None]:
from sklearn.metrics import confusion_matrix

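def plot_confusion_matrix(Y_test, Y_preds):
    """Plot the confusion matrix of true labels vs. predicted labels."""
    conf_mat = confusion_matrix(Y_test, Y_preds)
    n_classes = conf_mat.shape[0]
    plt.figure(figsize=(6,6))
    plt.matshow(conf_mat, cmap=plt.cm.Blues, fignum=1)
    plt.yticks(range(n_classes), range(n_classes))
    plt.xticks(range(n_classes), range(n_classes))
    plt.colorbar()
    # Annotate each cell: x is the predicted label (column), y is the true label (row).
    for i in range(n_classes):
        for j in range(n_classes):
            plt.text(i-0.2, j+0.1, str(conf_mat[j, i]), color='tab:red')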

In [None]:
plot_confusion_matrix(Y_test, mlp_classifier.predict(X_test))
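
To make the layout of the matrix concrete, here is a tiny example with made-up labels (they are not taken from the digits data). In the array returned by confusion_matrix, rows correspond to true labels and columns to predicted labels, so the diagonal counts the correct predictions.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels for a 3-class problem.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)

# Correct predictions sit on the diagonal.
accuracy = cm.trace() / cm.sum()
print(accuracy)
```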


## 7 - Important Attributes of MLPClassifier ##

Below is a list of important attributes of MLPClassifier which provide meaningful insights once the model is trained:
1. loss_ - The loss computed at the end of the training process.
2. coefs_ - A list of length n_layers-1 where the i-th element is the weight matrix connecting layer i to layer i+1.
3. intercepts_ - A list of length n_layers-1 where the i-th element is the intercept (bias) vector of layer i+1.
4. n_iter_ - The number of iterations for which the solver ran.
5. out_activation_ - The name of the output layer activation function.

In [None]:
print("Loss : ", mlp_classifier.loss_)

In [None]:
print("Number of Coefs : ", len(mlp_classifier.coefs_))

In [None]:
print("Number of Intercepts : ", len(mlp_classifier.intercepts_))

In [None]:
print("Number of Iterations for Which Estimator Ran : ", mlp_classifier.n_iter_)

In [None]:
print("Name of Output Layer Activation Function : ", mlp_classifier.out_activation_)
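
The attribute lengths printed above are easier to interpret next to the shapes of the individual arrays. The sketch below fits a default-sized network on the full digits data and lists those shapes. It uses max_iter=20, an assumption made only to keep the example fast; the shapes do not depend on how long the model trains.

```python
import warnings
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # the short run is not expected to converge
    clf = MLPClassifier(random_state=123, max_iter=20).fit(X, y)

# 64 inputs -> 100 hidden neurons (the default) -> 10 classes.
coef_shapes = [w.shape for w in clf.coefs_]
intercept_shapes = [b.shape for b in clf.intercepts_]
print(clf.n_layers_, coef_shapes, intercept_shapes, clf.out_activation_)
```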


## 8 - Fine-tuning Model By Doing Grid Search On Various Hyperparameters ##

Below is a list of common hyperparameters that need tuning to get the best fit for our data. We'll try various hyperparameter settings on various cross-validation splits of the data to find the best fit, meaning a model whose accuracy is almost the same on the train and test datasets.
1. hidden_layer_sizes - It accepts a tuple of integers specifying the sizes of the hidden layers. The length of the tuple is the number of hidden layers, and the i-th integer is the number of neurons in the i-th hidden layer. default=(100,)
2. activation - It specifies the activation function for the hidden layers. It accepts one of the strings below. default='relu'
  - 'identity' - No activation. f(x) = x
  - 'logistic' - Logistic sigmoid function. f(x) = 1 / (1 + exp(-x))
  - 'tanh' - Hyperbolic tangent function. f(x) = tanh(x)
  - 'relu' - Rectified linear unit function. f(x) = max(0, x)
3. solver - It accepts one of the strings below specifying which optimizer to use for updating the weights of the network. default='adam'
  - 'lbfgs' - A quasi-Newton optimizer.
  - 'sgd' - Stochastic gradient descent.
  - 'adam' - A stochastic gradient-based optimizer.
4. learning_rate - It specifies the learning rate schedule used for training. It accepts one of the strings below and is only applicable when solver='sgd'. default='constant'
  - 'constant' - Keeps the learning rate constant at the value set in learning_rate_init throughout training.
  - 'invscaling' - Gradually decreases the learning rate at each time step t. effective_learning_rate = learning_rate_init / pow(t, power_t)
  - 'adaptive' - Keeps the learning rate at learning_rate_init as long as the training loss keeps decreasing. Each time two consecutive epochs fail to decrease the training loss by at least tol (or fail to increase the validation score by at least tol when early_stopping is on), the current learning rate is divided by 5.
5. batch_size - It accepts an integer specifying the size of the minibatches used by the stochastic solvers. default='auto', which sets the batch size to min(200, n_samples). It is not used when solver='lbfgs'.
6. tol - It accepts a float specifying the tolerance for optimization. When the training loss or score does not improve by at least tol for n_iter_no_change consecutive iterations, training stops, unless learning_rate is 'adaptive', in which case the learning rate is decreased instead. default=0.0001
7. alpha - It specifies the strength of the L2 regularization term. default=0.0001
8. momentum - It specifies the momentum for gradient descent updates and accepts a float between 0 and 1. It is only used when solver='sgd'. default=0.9
9. early_stopping - It accepts a boolean specifying whether to set aside part of the training data as a validation set and stop training when the validation score stops improving. It is only effective when solver is 'sgd' or 'adam'. default=False
10. validation_fraction - It accepts a float between 0 and 1 specifying the fraction of training data to set aside as the validation set when early_stopping is True. default=0.1
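
The formulas in the list above can be written out directly in NumPy. This is a sketch of the math only, not sklearn's internal code; the dictionary `activations` and the function `invscaling` are names made up for this example, and power_t=0.5 is assumed to match sklearn's default.

```python
import numpy as np

# The four hidden-layer activation functions, as defined in the list above.
activations = {
    "identity": lambda x: x,
    "logistic": lambda x: 1.0 / (1.0 + np.exp(-x)),
    "tanh": np.tanh,
    "relu": lambda x: np.maximum(0.0, x),
}

x = np.array([-2.0, 0.0, 2.0])
for name, fn in activations.items():
    print(name, fn(x))

def invscaling(learning_rate_init, t, power_t=0.5):
    # effective_learning_rate = learning_rate_init / pow(t, power_t)
    return learning_rate_init / np.power(t, power_t)

# The learning rate shrinks as the time step t grows.
for t in (1, 4, 100):
    print(t, invscaling(0.001, t))
```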

## 9 - GridSearchCV ##


It's a wrapper class provided by sklearn which loops through all combinations of the values given in the param_grid parameter, evaluates each combination with the number of cross-validation folds given in the cv parameter, and stores all results in the cv_results_ attribute. It also stores the model with the best mean cross-validation score in the best_estimator_ attribute and that score in the best_score_ attribute.

Below we'll try various values for the above-mentioned hyperparameters to find the best estimator for our dataset using 5-fold cross-validation.

In [None]:
from sklearn.model_selection import GridSearchCV

params = {'activation': ['relu', 'tanh', 'logistic', 'identity'],
          'hidden_layer_sizes': [(100,), (50,100,), (50,75,100,)],
          'solver': ['adam', 'sgd', 'lbfgs'],
          'learning_rate' : ['constant', 'adaptive', 'invscaling']
         }

mlp_classif_grid = GridSearchCV(MLPClassifier(random_state=123), param_grid=params, n_jobs=-1, cv=5, verbose=5)
mlp_classif_grid.fit(X_train,Y_train)

print('Train Accuracy : %.3f'%mlp_classif_grid.best_estimator_.score(X_train, Y_train))
print('Test Accuracy : %.3f'%mlp_classif_grid.best_estimator_.score(X_test, Y_test))
print('Best Accuracy Through Grid Search : %.3f'%mlp_classif_grid.best_score_)
print('Best Parameters : ',mlp_classif_grid.best_params_)
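
The cv_results_ attribute mentioned above is a dictionary of arrays, so it loads naturally into a pandas DataFrame. The sketch below uses a deliberately tiny grid, 3 folds, and max_iter=30, all assumed here only to keep the run short, so its scores are not comparable with the full search above.

```python
import warnings
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

# Hypothetical small grid: 2 x 2 = 4 candidates.
small_grid = {"activation": ["relu", "tanh"],
              "hidden_layer_sizes": [(20,), (20, 20)]}

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # short runs may not converge
    search = GridSearchCV(MLPClassifier(random_state=123, max_iter=30),
                          param_grid=small_grid, cv=3)
    search.fit(X, y)

# One row per candidate, with per-fold and aggregated scores.
results = pd.DataFrame(search.cv_results_)
cols = ["params", "mean_test_score", "std_test_score", "rank_test_score"]
print(results[cols].sort_values("rank_test_score"))
```

The best_score_ attribute is the mean_test_score of the row ranked first.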

## 10 - Plotting Confusion Matrix ##

In [None]:
plot_confusion_matrix(Y_test, mlp_classif_grid.best_estimator_.predict(X_test))

## 11 - MLPRegressor ##

MLPRegressor is an estimator available in the neural_network module of sklearn for performing regression tasks using a multi-layer perceptron.

### 11.1 Splitting Data Into Train/Test Set
We'll split the dataset into two parts:

  - Train data (80%), which will be used for training the model.
  - Test data (20%), against which the performance of the trained model will be checked.

In [None]:
X_train, X_test, Y_train, Y_test = train_test_split(X_boston, Y_boston, train_size=0.80, test_size=0.20, random_state=123)
print('Train/Test Sizes : ', X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

In [None]:
from sklearn.neural_network import MLPRegressor

mlp_regressor  = MLPRegressor(random_state=123)
mlp_regressor.fit(X_train, Y_train)

In [None]:
Y_preds = mlp_regressor.predict(X_test)

print(Y_preds[:10])
print(Y_test[:10])

print('Test R^2 Score : %.3f'%mlp_regressor.score(X_test, Y_test)) ## Score method returns the R^2 score for regression models.
print('Training R^2 Score : %.3f'%mlp_regressor.score(X_train, Y_train))
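
For regressors, score() returns the coefficient of determination R^2. The sketch below computes it by hand on made-up numbers (not Boston data) and compares the result with sklearn's r2_score.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical targets and predictions.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 8.0])

ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2_manual = 1.0 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_pred))
```

A score of 1.0 means perfect predictions, while 0.0 means the model does no better than always predicting the mean of the targets.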

### 11.2 Important Attributes of MLPRegressor
MLPRegressor exposes the same attributes as MLPClassifier.

In [None]:
print("Loss : ", mlp_regressor.loss_)

In [None]:
print("Number of Coefs : ", len(mlp_regressor.coefs_))

In [None]:
print("Number of Intercepts : ", len(mlp_regressor.intercepts_))


In [None]:
print("Number of Iterations for Which Estimator Ran : ", mlp_regressor.n_iter_)


In [None]:
print("Name of Output Layer Activation Function : ", mlp_regressor.out_activation_)


### 11.3 Fine-tuning Model By Doing Grid Search On Various Hyperparameters

MLPRegressor has almost the same parameters as MLPClassifier.

Below we'll try various values for the above-mentioned hyperparameters to find the best estimator for our dataset using 5-fold cross-validation.
    

In [None]:
params = {'activation': ['relu', 'tanh', 'logistic', 'identity'],
          'hidden_layer_sizes': list(itertools.permutations([50,100,150],2)) + list(itertools.permutations([50,100,150],3)) + [50,100,150],
          'solver': ['adam', 'lbfgs'],
          'learning_rate' : ['constant', 'adaptive', 'invscaling']
         }

mlp_regressor_grid = GridSearchCV(MLPRegressor(random_state=123), param_grid=params, n_jobs=-1, cv=5, verbose=5)
mlp_regressor_grid.fit(X_train,Y_train)

print('Train R^2 Score : %.3f'%mlp_regressor_grid.best_estimator_.score(X_train, Y_train))
print('Test R^2 Score : %.3f'%mlp_regressor_grid.best_estimator_.score(X_test, Y_test))
print('Best R^2 Score Through Grid Search : %.3f'%mlp_regressor_grid.best_score_)
print('Best Parameters : ',mlp_regressor_grid.best_params_)

Fitting 5 folds for each of 360 candidates, totalling 1800 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done  14 tasks      | elapsed:    6.2s
[Parallel(n_jobs=-1)]: Done  68 tasks      | elapsed:   30.8s
[Parallel(n_jobs=-1)]: Done 158 tasks      | elapsed:  1.4min
[Parallel(n_jobs=-1)]: Done 284 tasks      | elapsed:  3.2min
[Parallel(n_jobs=-1)]: Done 446 tasks      | elapsed:  4.6min
[Parallel(n_jobs=-1)]: Done 644 tasks      | elapsed:  7.1min
[Parallel(n_jobs=-1)]: Done 878 tasks      | elapsed: 10.8min
[Parallel(n_jobs=-1)]: Done 1148 tasks      | elapsed: 14.7min
[Parallel(n_jobs=-1)]: Done 1454 tasks      | elapsed: 18.1min
[Parallel(n_jobs=-1)]: Done 1796 tasks      | elapsed: 21.2min
[Parallel(n_jobs=-1)]: Done 1800 out of 1800 | elapsed: 21.2min finished


Train R^2 Score : 0.734
Test R^2 Score : 0.592
Best R^2 Score Through Grid Search : 0.722
Best Parameters :  {'activation': 'identity', 'hidden_layer_sizes': (150, 50), 'learning_rate': 'constant', 'solver': 'lbfgs'}
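
The log above reports 360 candidates and 1800 fits. That count can be reproduced without training anything by expanding the same grid with sklearn's ParameterGrid, as in this small sketch.

```python
import itertools
from sklearn.model_selection import ParameterGrid

# Same grid as in the search above.
params = {
    "activation": ["relu", "tanh", "logistic", "identity"],
    "hidden_layer_sizes": list(itertools.permutations([50, 100, 150], 2))
                          + list(itertools.permutations([50, 100, 150], 3))
                          + [50, 100, 150],
    "solver": ["adam", "lbfgs"],
    "learning_rate": ["constant", "adaptive", "invscaling"],
}

n_sizes = len(params["hidden_layer_sizes"])  # 6 pairs + 6 triples + 3 single sizes
n_candidates = len(ParameterGrid(params))    # product of all list lengths
n_fits = n_candidates * 5                    # 5-fold cross-validation
print(n_sizes, n_candidates, n_fits)
```

Because the grid grows multiplicatively, adding one more value to any list adds a whole slice of candidates, which is worth checking before launching a long search.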


References:
- Scikit-Learn - Neural Network: https://coderzcolumn.com/tutorials/machine-learning/scikit-learn-sklearn-neural-network