# Multi-layer Perceptron

<ol>
    <li> Components: Input layer, Hidden layer(s) and Output layer. </li>
    <li> Fully connected: every neuron in one layer feeds every neuron in the next layer. </li>
</ol>


### Steps:

For every epoch,

<ol>
    <li> Training data is fed to the MLP through the input layer. It passes through the hidden layers, if any, each of which forwards the outputs of its activation function to the next layer. Finally, the output layer applies its activation function to produce the prediction. </li>
    <li> The predicted output is compared with the actual output to compute the error. </li>
    <li> If error > 0, apply backpropagation to update the weights, starting from the output layer and moving towards the input layer. </li>
    <li> Check the accuracy score. If it is satisfactory, stop; otherwise, go to step 1. </li>
</ol>
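
The four steps above can be sketched in plain NumPy. This is a minimal illustration, not the scikit-learn implementation used later in this notebook: the toy data, the layer sizes, the learning rate `lr`, and the 0.95 accuracy threshold are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (hypothetical): 2 features, 3 classes, one Gaussian blob per class.
centers = np.array([[0.0, 0.0], [3.0, 3.0], [0.0, 3.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])
y = np.repeat(np.arange(3), 50)
Y = np.eye(3)[y]  # one-hot targets

# One hidden layer with 8 ReLU units and a softmax output layer.
W1 = rng.normal(scale=0.1, size=(2, 8))
b1 = np.zeros(8)
W2 = rng.normal(scale=0.1, size=(8, 3))
b2 = np.zeros(3)
lr = 0.1  # assumed learning rate
losses = []

for epoch in range(500):
    # Step 1: forward pass through the hidden layer and the output layer.
    z1 = X @ W1 + b1
    h = np.maximum(0.0, z1)
    logits = h @ W2 + b2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)

    # Step 2: compare the prediction with the target (cross-entropy error).
    loss = -np.mean(np.sum(Y * np.log(probs + 1e-12), axis=1))
    losses.append(loss)

    # Step 3: backpropagate from the output layer towards the input layer.
    d_logits = (probs - Y) / len(X)
    dW2 = h.T @ d_logits
    db2 = d_logits.sum(axis=0)
    d_z1 = (d_logits @ W2.T) * (z1 > 0)
    dW1 = X.T @ d_z1
    db1 = d_z1.sum(axis=0)
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

    # Step 4: check the accuracy of this epoch's forward pass; stop if satisfied.
    accuracy = np.mean(probs.argmax(axis=1) == y)
    if accuracy >= 0.95:
        break

print(f"stopped after {epoch + 1} epochs, loss={loss:.3f}, accuracy={accuracy:.2f}")
```

`MLPClassifier` wraps the same loop, with mini-batches, the Adam optimizer, and a loss-based stopping rule rather than an accuracy threshold.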


### Import dataset

In [28]:
import pandas as pd
import numpy as np

dataset= pd.read_csv("allfaultdatasetfiltered5.csv")
#dataset= pd.read_csv("allfaultdatasetfiltered3.csv")
print(dataset.head())
print(dataset['class'].unique())

      speed  vibration  class
0  0.036017   0.019833      1
1  0.036346   0.026410      1
2  0.036346   0.033316      1
3  0.025825   0.034960      1
4  4.093751   0.029041      1
[1 2 4]


In [29]:
dataset.shape

(59844, 3)

In [30]:
dataset.tail()

          speed  vibration  class
59839  4.223946   0.016874      4
59840  0.040620   0.015887      4
59841  0.038976   0.016216      4
59842  0.039305   0.014901      4
59843  0.038976   0.015887      4

In [31]:
dataset.describe(include = 'all')

          speed  vibration     class
count   59844.0    59844.0   59844.0
mean   2.104554   0.056508  2.331997
std    2.052571   0.090963  1.247318
min    0.013003   -0.32447       1.0
25%    0.037661  -0.005817       1.0
50%    2.021317   0.012928       2.0
75%    4.197644   0.139534       4.0
max    4.318962   0.448979       4.0


In [5]:
X = dataset['vibration']
y = dataset['class']
print(X.head(2))
print(y.head(2))
print(y.head())

0   -0.002199
1   -0.001871
Name: vibration, dtype: float64
0    0
1    0
Name: class, dtype: int64
0    0
1    0
2    0
3    0
4    0
Name: class, dtype: int64


In [32]:
X = dataset['vibration'].values.reshape(-1, 1)
y = dataset['class'].values

### Splitting to training and testing


In [33]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y,test_size=0.3)
print(X_train.shape)
print(y_test.shape)

(41890, 1)
(17954,)


Normalized input X train
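
The notebook does not show the normalization step itself. One common way to do it, sketched here as an assumption rather than as the method actually used, is scikit-learn's `StandardScaler`, fitted on the training split only and then reused for the test split. The synthetic `X_demo` and `y_demo` arrays are hypothetical stand-ins for the `vibration` feature and the class labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical stand-in data: one feature column and labels drawn from {1, 2, 4}.
X_demo = rng.normal(loc=0.05, scale=0.09, size=(1000, 1))
y_demo = rng.choice([1, 2, 4], size=1000)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

scaler = StandardScaler()
X_tr_scaled = scaler.fit_transform(X_tr)  # learn mean and std from training data only
X_te_scaled = scaler.transform(X_te)      # reuse the training statistics on test data

print("train mean:", X_tr_scaled.mean(), "train std:", X_tr_scaled.std())
```

Fitting the scaler on the training split alone keeps information about the test set out of the model.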

## Train the model

Import the MLP classifier model from sklearn

In [34]:
from sklearn.neural_network import MLPClassifier


In [42]:
mlp = MLPClassifier(solver='adam',max_iter=5000, activation='relu')
mlp

MLPClassifier(max_iter=5000)

### About parameters 

1. hidden_layer_sizes : tuple, length = n_layers - 2, default (100,)

The ith element represents the number of neurons in the ith hidden layer.

2. activation : {‘identity’, ‘logistic’, ‘tanh’, ‘relu’}, default ‘relu’

Activation function for the hidden layer.

- ‘identity’: no-op activation, useful to implement a linear bottleneck; returns f(x) = x.
- ‘logistic’: the logistic sigmoid function; returns f(x) = 1 / (1 + exp(-x)).
- ‘tanh’: the hyperbolic tangent function; returns f(x) = tanh(x).
- ‘relu’: the rectified linear unit function; returns f(x) = max(0, x).

3. learning_rate : {‘constant’, ‘invscaling’, ‘adaptive’}, default ‘constant’

4. learning_rate_init : double, optional, default 0.001

5. max_iter : int, optional, default 200

Maximum number of iterations. The solver iterates until convergence (determined by ‘tol’) or this number of iterations. For stochastic solvers (‘sgd’, ‘adam’), note that this determines the number of epochs (how many times each data point will be used), not the number of gradient steps.

6. shuffle : bool, optional, default True

Whether to shuffle samples in each iteration. Only used when solver=’sgd’ or ‘adam’.

7. momentum : float, default 0.9

Momentum for gradient descent update. Should be between 0 and 1. Only used when solver=’sgd’.

8. early_stopping : bool, default False

Whether to use early stopping to terminate training when the validation score is not improving. If set to True, it automatically sets aside 10% of the training data as a validation set (controlled by validation_fraction) and terminates training when the validation score does not improve by at least tol for n_iter_no_change consecutive epochs (default 10). Only effective when solver=’sgd’ or ‘adam’.
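
As a quick illustration of the `activation` options listed above, the four functions can be written directly in NumPy. The function names here are chosen for the example; scikit-learn applies its own internal implementations.

```python
import numpy as np

def identity(x):
    # f(x) = x
    return x

def logistic(x):
    # f(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # f(x) = tanh(x)
    return np.tanh(x)

def relu(x):
    # f(x) = max(0, x), applied element-wise
    return np.maximum(0.0, x)

x = np.array([-2.0, 0.0, 2.0])
for f in (identity, logistic, tanh, relu):
    print(f.__name__, f(x))
```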


### Training

In [43]:
mlp.fit(X_train,y_train)

MLPClassifier(max_iter=5000)
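
After fitting, the estimator records how training went. The sketch below shows how that can be inspected, and how `early_stopping` interacts with `max_iter`. It runs on synthetic data from `make_classification` (an assumption for the example, not the fault dataset), and the name `mlp_demo` is hypothetical.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Synthetic 3-class problem standing in for the real data.
X_demo, y_demo = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=0,
    n_clusters_per_class=1, n_classes=3, random_state=0,
)

mlp_demo = MLPClassifier(
    solver="adam",
    max_iter=300,             # upper bound on the number of epochs
    early_stopping=True,      # hold out part of the training data for validation
    validation_fraction=0.1,  # the 10% mentioned above (also the default)
    n_iter_no_change=10,      # epochs without improvement before stopping
    random_state=0,
)

with warnings.catch_warnings():
    # Silence the warning raised if max_iter is reached before convergence.
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    mlp_demo.fit(X_demo, y_demo)

print("epochs run:", mlp_demo.n_iter_)
print("final training loss:", mlp_demo.loss_curve_[-1])
print("best validation score:", mlp_demo.best_validation_score_)
```

If `n_iter_` equals `max_iter`, training hit the cap rather than converging, which is a hint to raise `max_iter` or rescale the inputs.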

### Testing

In [37]:
pred = mlp.predict(X_test)
pred

array([1, 1, 4, ..., 2, 2, 1], dtype=int64)

## Evaluation metrics: confusion matrix and F1 score

In [67]:
from sklearn.metrics import classification_report,confusion_matrix

confusion_matrix(y_test,pred)

array([[4834,    0, 1177],
       [  72, 5607,  297],
       [3128,  171, 2668]], dtype=int64)

In [68]:
print(classification_report(y_test,pred))

              precision    recall  f1-score   support

           1       0.60      0.80      0.69      6011
           2       0.97      0.94      0.95      5976
           4       0.64      0.45      0.53      5967

    accuracy                           0.73     17954
   macro avg       0.74      0.73      0.72     17954
weighted avg       0.74      0.73      0.72     17954
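
The per-class numbers in the report can be recomputed by hand from the confusion matrix above, which makes the link between the two outputs explicit. Rows are true classes and columns are predicted classes, both in the order 1, 2, 4. This is a small NumPy sketch; the variable names are chosen for the example.

```python
import numpy as np

labels = [1, 2, 4]
cm = np.array([[4834,    0, 1177],
               [  72, 5607,  297],
               [3128,  171, 2668]])

true_positives = np.diag(cm)
precision = true_positives / cm.sum(axis=0)  # column sums = predicted counts
recall = true_positives / cm.sum(axis=1)     # row sums = true counts (support)
f1 = 2 * precision * recall / (precision + recall)
accuracy = true_positives.sum() / cm.sum()

for label, p, r, f in zip(labels, precision, recall, f1):
    print(f"class {label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print(f"accuracy={accuracy:.2f}")
```

The first and third rows show where the errors are: more than half of the class 4 samples (3128 of 5967) are predicted as class 1, which is why recall for class 4 is only 0.45.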



In [44]:
# another approach 

from sklearn.neural_network import MLPClassifier
mlp = MLPClassifier(max_iter=100)

In [77]:
#https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html
#https://datascience.stackexchange.com/questions/19768/how-to-implement-pythons-mlpclassifier-with-gridsearchcv
parameter_space = {
    'hidden_layer_sizes': [(50,50,50), (50,100,50),(10,50,10), (100,)],
    'activation': ['identity','tanh', 'relu', 'logistic'],
    'solver': ['sgd', 'adam', 'lbfgs'],
    'alpha': [0.0001, 0.0002 , 0.0004 ,0.0009,0.002 ,0.006 , 0.01 , 0.05 ],
    'learning_rate': ['invscaling','constant','adaptive'],
    #'max_iter': [100],
}

#{'activation': 'tanh', 'alpha': 0.0001, 'hidden_layer_sizes': (50, 100, 50), 'learning_rate': 'adaptive', 'solver': 'adam'}

In [78]:
from sklearn.model_selection import GridSearchCV

clf = GridSearchCV(mlp, parameter_space, n_jobs=-1, cv=5,verbose=4)
clf.fit(X_train,y_train)

Fitting 5 folds for each of 1152 candidates, totalling 5760 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done  66 tasks      | elapsed:   46.1s
[Parallel(n_jobs=-1)]: Done 189 tasks      | elapsed:  1.6min
[Parallel(n_jobs=-1)]: Done 360 tasks      | elapsed:  3.1min
[Parallel(n_jobs=-1)]: Done 581 tasks      | elapsed:  4.9min
[Parallel(n_jobs=-1)]: Done 850 tasks      | elapsed:  7.2min
[Parallel(n_jobs=-1)]: Done 1169 tasks      | elapsed: 10.4min
[Parallel(n_jobs=-1)]: Done 1536 tasks      | elapsed: 18.0min
[Parallel(n_jobs=-1)]: Done 1953 tasks      | elapsed: 32.0min
[Parallel(n_jobs=-1)]: Done 2418 tasks      | elapsed: 47.5min
[Parallel(n_jobs=-1)]: Done 2933 tasks      | elapsed: 63.5min
[Parallel(n_jobs=-1)]: Done 3496 tasks      | elapsed: 83.4min
[Parallel(n_jobs=-1)]: Done 4109 tasks      | elapsed: 105.1min
[Parallel(n_jobs=-1)]: Done 4770 tasks      | elapsed: 120.1min
[Parallel(n_jobs=-1)]: Done 5481 tasks      | elapsed: 131.8min
[Parallel(n_jobs=-1)]: Done 5760 out of 576

GridSearchCV(cv=5, estimator=MLPClassifier(max_iter=100), n_jobs=-1,
             param_grid={'activation': ['identity', 'tanh', 'relu', 'logistic'],
                         'alpha': [0.0001, 0.0002, 0.0004, 0.0009, 0.002, 0.006,
                                   0.01, 0.05],
                         'hidden_layer_sizes': [(50, 50, 50), (50, 100, 50),
                                                (10, 50, 10), (100,)],
                         'learning_rate': ['invscaling', 'constant',
                                           'adaptive'],
                         'solver': ['sgd', 'adam', 'lbfgs']},
             verbose=4)
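
The "1152 candidates, totalling 5760 fits" line in the log follows directly from the grid: the candidate count is the product of the list lengths, and each candidate is fitted once per fold. A short check, using only the standard library:

```python
from math import prod

parameter_space = {
    'hidden_layer_sizes': [(50, 50, 50), (50, 100, 50), (10, 50, 10), (100,)],
    'activation': ['identity', 'tanh', 'relu', 'logistic'],
    'solver': ['sgd', 'adam', 'lbfgs'],
    'alpha': [0.0001, 0.0002, 0.0004, 0.0009, 0.002, 0.006, 0.01, 0.05],
    'learning_rate': ['invscaling', 'constant', 'adaptive'],
}
cv_folds = 5

# 4 * 4 * 3 * 8 * 3 parameter combinations, each fitted once per fold.
n_candidates = prod(len(values) for values in parameter_space.values())
n_fits = n_candidates * cv_folds
print(n_candidates, n_fits)
```

Because `learning_rate` is only used when solver='sgd', the 'adam' and 'lbfgs' candidates that differ only in `learning_rate` repeat the same configuration. Passing `param_grid` as a list of two dictionaries, one for 'sgd' with `learning_rate` and one for the other solvers without it, would avoid those repeats.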

In [79]:
# Best parameter set
print('Best parameters found:\n', clf.best_params_)

# All results
means = clf.cv_results_['mean_test_score']
stds = clf.cv_results_['std_test_score']
for mean, std, params in zip(means, stds, clf.cv_results_['params']):
    print("%0.3f (+/-%0.03f) for %r" % (mean, std * 2, params))

Best parameters found:
 {'activation': 'relu', 'alpha': 0.006, 'hidden_layer_sizes': (10, 50, 10), 'learning_rate': 'adaptive', 'solver': 'adam'}
0.637 (+/-0.037) for {'activation': 'identity', 'alpha': 0.0001, 'hidden_layer_sizes': (50, 50, 50), 'learning_rate': 'invscaling', 'solver': 'sgd'}
0.658 (+/-0.065) for {'activation': 'identity', 'alpha': 0.0001, 'hidden_layer_sizes': (50, 50, 50), 'learning_rate': 'invscaling', 'solver': 'adam'}
0.655 (+/-0.010) for {'activation': 'identity', 'alpha': 0.0001, 'hidden_layer_sizes': (50, 50, 50), 'learning_rate': 'invscaling', 'solver': 'lbfgs'}
0.657 (+/-0.041) for {'activation': 'identity', 'alpha': 0.0001, 'hidden_layer_sizes': (50, 50, 50), 'learning_rate': 'constant', 'solver': 'sgd'}
0.647 (+/-0.064) for {'activation': 'identity', 'alpha': 0.0001, 'hidden_layer_sizes': (50, 50, 50), 'learning_rate': 'constant', 'solver': 'adam'}
0.655 (+/-0.010) for {'activation': 'identity', 'alpha': 0.0001, 'hidden_layer_sizes': (50, 50, 50), 'learnin
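
With more than a thousand candidates, printing every line is hard to read. A possible alternative, sketched here on a deliberately tiny grid and synthetic data (both assumptions for the example), is to load `cv_results_` into a pandas DataFrame and sort by `rank_test_score`. The names `search` and `small_grid` are hypothetical.

```python
import warnings

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic 3-class problem standing in for the real data.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=0,
    n_clusters_per_class=1, n_classes=3, random_state=0,
)

# A deliberately small grid so the example runs in seconds.
small_grid = {
    'hidden_layer_sizes': [(10,), (20,)],
    'alpha': [0.0001, 0.01],
}

search = GridSearchCV(MLPClassifier(max_iter=200, random_state=0), small_grid, cv=3)
with warnings.catch_warnings():
    # Short runs may stop at max_iter; that is fine for this illustration.
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    search.fit(X_demo, y_demo)

# One row per candidate, best-ranked first.
results = pd.DataFrame(search.cv_results_)
columns = ['rank_test_score', 'mean_test_score', 'std_test_score', 'params']
print(results[columns].sort_values('rank_test_score').to_string(index=False))
```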

In [80]:
y_true, y_pred = y_test , clf.predict(X_test)

from sklearn.metrics import classification_report
print('Results on the test set:')
print(classification_report(y_true, y_pred))

Results on the test set:
              precision    recall  f1-score   support

           1       0.60      0.83      0.70      6011
           2       0.97      0.94      0.95      5976
           4       0.66      0.43      0.52      5967

    accuracy                           0.73     17954
   macro avg       0.74      0.73      0.72     17954
weighted avg       0.74      0.73      0.72     17954



In [81]:
predicted_values = clf.predict(X_test)

from sklearn.metrics import accuracy_score
score = accuracy_score(y_test,predicted_values)

print(score)

0.731146262671271


### Retesting:

Retest the parameters obtained from the grid search with a different number of iterations (max_iter=5000 instead of 100).


In [123]:
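#retest best parameters from grid search with a different number of iterations
#note: hidden_layer_sizes is not passed below, so the default (100,) is used instead of (10, 50, 10)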
#{'activation': 'relu', 'alpha': 0.006, 'hidden_layer_sizes': (10, 50, 10), 'learning_rate': 'adaptive', 'solver': 'adam'}
mlp = MLPClassifier(solver='adam',learning_rate = 'adaptive',alpha = 0.006, max_iter=5000, activation='relu')
mlp

MLPClassifier(alpha=0.006, learning_rate='adaptive', max_iter=5000)

In [124]:
mlp.fit(X_train,y_train)

MLPClassifier(alpha=0.006, learning_rate='adaptive', max_iter=5000)

In [125]:
pred = mlp.predict(X_test)
pred

array([1, 1, 4, ..., 2, 2, 1], dtype=int64)

In [126]:
confusion_matrix(y_test,pred)

array([[4997,    0, 1014],
       [  80, 5614,  282],
       [3284,  181, 2502]], dtype=int64)

In [127]:
print(classification_report(y_test,pred))

              precision    recall  f1-score   support

           1       0.60      0.83      0.70      6011
           2       0.97      0.94      0.95      5976
           4       0.66      0.42      0.51      5967

    accuracy                           0.73     17954
   macro avg       0.74      0.73      0.72     17954
weighted avg       0.74      0.73      0.72     17954
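
The retest above passes the grid-search values by hand and, as the printed estimator shows, leaves `hidden_layer_sizes` at its default. One way to make sure every tuned parameter is carried over is to unpack the dictionary into the constructor. This is a sketch on synthetic data (an assumption for the example); `best_params` is copied from the grid-search output above, and `mlp_best` is a hypothetical name.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Best parameters as printed by the grid search.
best_params = {
    'activation': 'relu',
    'alpha': 0.006,
    'hidden_layer_sizes': (10, 50, 10),
    'learning_rate': 'adaptive',
    'solver': 'adam',
}

# Unpacking the dict passes every tuned value, including hidden_layer_sizes.
mlp_best = MLPClassifier(max_iter=5000, random_state=0, **best_params)

# Synthetic 3-class problem standing in for the real data.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=0,
    n_clusters_per_class=1, n_classes=3, random_state=0,
)
mlp_best.fit(X_demo, y_demo)

print("hidden layers:", mlp_best.get_params()['hidden_layer_sizes'])
print("epochs run:", mlp_best.n_iter_)
```

In the notebook itself, `MLPClassifier(max_iter=5000, **clf.best_params_)` would do the same without retyping the values.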

