# Neural Networks with Python and SciKit Learn!



### Importing data

```python
import pandas as pd
import numpy as np
wine = pd.read_csv('wine_data.csv')
```

### Data exploration

```python
wine.head()
#wine.describe()
#wine.shape
```

|   | Cultivator | Alchol | Acid | Ash | Alcalinit | Magnesium | Total_phenols | Falvanoids | Nonflavanoid_phenols | Proanthocyanins | Color_intensity | Hue | OD280 | Proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 14.23 | 1.71 | 2.43 | 15.6 | 127 | 2.8 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065 |
| 1 | 1 | 13.2 | 1.78 | 2.14 | 11.2 | 100 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4 | 1050 |
| 2 | 1 | 13.16 | 2.36 | 2.67 | 18.6 | 101 | 2.8 | 3.24 | 0.3 | 2.81 | 5.68 | 1.03 | 3.17 | 1185 |
| 3 | 1 | 14.37 | 1.95 | 2.5 | 16.8 | 113 | 3.85 | 3.49 | 0.24 | 2.18 | 7.8 | 0.86 | 3.45 | 1480 |
| 4 | 1 | 13.24 | 2.59 | 2.87 | 21.0 | 118 | 2.8 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735 |


### Selection of data and labels

```python
X = wine.drop('Cultivator', axis=1)
y = wine['Cultivator']
type(X)
X.shape
X.describe()
X.head()
```

|   | Alchol | Acid | Ash | Alcalinit | Magnesium | Total_phenols | Falvanoids | Nonflavanoid_phenols | Proanthocyanins | Color_intensity | Hue | OD280 | Proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14.23 | 1.71 | 2.43 | 15.6 | 127 | 2.8 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065 |
| 1 | 13.2 | 1.78 | 2.14 | 11.2 | 100 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4 | 1050 |
| 2 | 13.16 | 2.36 | 2.67 | 18.6 | 101 | 2.8 | 3.24 | 0.3 | 2.81 | 5.68 | 1.03 | 3.17 | 1185 |
| 3 | 14.37 | 1.95 | 2.5 | 16.8 | 113 | 3.85 | 3.49 | 0.24 | 2.18 | 7.8 | 0.86 | 3.45 | 1480 |
| 4 | 13.24 | 2.59 | 2.87 | 21.0 | 118 | 2.8 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735 |


### Train Test Split
As a fundamental step, you need to split the data into a training set and a test set. To do so, you can leverage SciKit-Learn's `train_test_split` function from `model_selection`. Feel free to check it out here:
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y)
```
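By default, `train_test_split` shuffles the rows and holds out 25% of them for testing, so every run produces a different split. The sketch below makes the split explicit and reproducible with `test_size`, `stratify` and `random_state`. It assumes SciKit-Learn's bundled copy of the UCI wine data (`load_wine`) as a stand-in for `wine_data.csv`; note that its class labels are 0, 1, 2 rather than 1, 2, 3.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

# Assumption: load_wine() stands in for wine_data.csv (178 samples, 13 features, 3 classes)
X, y = load_wine(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.25,   # same as the default: hold out a quarter of the rows
    stratify=y,       # keep the class proportions similar in both splits
    random_state=42,  # fixed seed so the split is reproducible
)

print(X_train.shape, X_test.shape)  # (133, 13) (45, 13)
print(np.bincount(y), np.bincount(y_test))  # class counts: full data vs. test split
```

Stratifying is a judgment call: with only 178 samples it keeps each cultivator represented in the test set, but the unstratified default also works.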

### Data Preprocessing
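Standardize the features by removing the mean and scaling to unit variance.

The standard score of a sample $x$ is calculated as $z = (x - u) / s$, where $u$ is the mean and $s$ the standard deviation of each feature, both computed on the training samples.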

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html 
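The formula is easy to check by hand. This sketch uses a small made-up matrix (not the wine data) and NumPy only. It computes $u$ and $s$ per column, i.e. per feature, with the population standard deviation (`ddof=0`), which is the estimator `StandardScaler` uses.

```python
import numpy as np

# Toy data: 4 samples, 2 features (values are illustrative only)
X_train = np.array([[1.0, 10.0],
                    [2.0, 20.0],
                    [3.0, 30.0],
                    [4.0, 40.0]])

u = X_train.mean(axis=0)          # per-feature mean
s = X_train.std(axis=0, ddof=0)   # per-feature population std, as StandardScaler uses
z = (X_train - u) / s             # standard scores, one column per feature

print(u)
print(z)
```

After the transformation every column has mean 0 and standard deviation 1, regardless of its original units.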

```python
from sklearn.preprocessing import StandardScaler
help(StandardScaler)
```

```
Help on class StandardScaler in module sklearn.preprocessing._data:

class StandardScaler(sklearn.base.TransformerMixin, sklearn.base.BaseEstimator)
 |  Standardize features by removing the mean and scaling to unit variance
 |  
 |  The standard score of a sample `x` is calculated as:
 |  
 |      z = (x - u) / s
 |  
 |  where `u` is the mean of the training samples or zero if `with_mean=False`,
 |  and `s` is the standard deviation of the training samples or one if
 |  `with_std=False`.
 |  
 |  Centering and scaling happen independently on each feature by computing
 |  the relevant statistics on the samples in the training set. Mean and
 |  standard deviation are then stored to be used on later data using
 |  :meth:`transform`.
 |  
 |  Standardization of a dataset is a common requirement for many
 |  machine learning estimators: they might behave badly if the
 |  individual features do not more or less look like standard normally
 |  distributed data (e.g. Gaussian with 0 mean and 
```

```python
scaler = StandardScaler()
```

```python
# Fit only to the training data
scaler.fit(X_train)
help(scaler.fit)
```

```
Help on method fit in module sklearn.preprocessing._data:

fit(X, y=None) method of sklearn.preprocessing._data.StandardScaler instance
    Compute the mean and std to be used for later scaling.
    
    Parameters
    ----------
    X : {array-like, sparse matrix}, shape [n_samples, n_features]
        The data used to compute the mean and standard deviation
        used for later scaling along the features axis.
    
    y
        Ignored
```

The fitted scaler stores the per-feature training means in `scaler.mean_` (a NumPy array); wrapping it in a pandas Series labels each value with its column name:

```python
print(pd.Series(scaler.mean_, index=X_train.columns))
```

```
Alchol                   12.959699
Acid                      2.351805
Ash                       2.374211
Alcalinit                19.463158
Magnesium               100.894737
Total_phenols             2.282256
Falvanoids                2.030677
Nonflavanoid_phenols      0.364662
Proanthocyanins           1.601654
Color_intensity           4.848722
Hue                       0.968617
OD280                     2.643083
Proline                 753.601504
dtype: float64
```

```python
# Now apply the transformations to the data:
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
```
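Fitting the scaler on the training split only keeps the test set unseen: if its statistics leaked into $u$ and $s$, the evaluation could come out optimistic. One common way to enforce this automatically (not what this notebook does, but equivalent here) is to chain the scaler and the classifier in a SciKit-Learn `Pipeline`, so that `fit` learns the scaling from the training data and `score`/`predict` only apply it. A minimal sketch, assuming `load_wine` as a stand-in for `wine_data.csv` and a fixed `random_state` added for repeatability:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)  # assumption: stand-in for wine_data.csv
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# The scaler inside the pipeline is fitted on X_train only;
# X_test is transformed with the training mean and std.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(activation='tanh', hidden_layer_sizes=(20, 20, 3),
                  max_iter=1500, random_state=0),
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)  # mean accuracy on the held-out rows
print(f"test accuracy: {accuracy:.3f}")
```

The pipeline also makes it harder to forget the scaling step when the model is later applied to new data.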

### Model Training

Now it is time to train our model. Import the estimator (the Multi-Layer Perceptron classifier, `MLPClassifier`) from SciKit-Learn's `neural_network` module:

```python
from sklearn.neural_network import MLPClassifier
help(MLPClassifier)
```

```
Help on class MLPClassifier in module sklearn.neural_network._multilayer_perceptron:

class MLPClassifier(sklearn.base.ClassifierMixin, BaseMultilayerPerceptron)
 |  Multi-layer Perceptron classifier.
 |  
 |  This model optimizes the log-loss function using LBFGS or stochastic
 |  gradient descent.
 |  
 |  .. versionadded:: 0.18
 |  
 |  Parameters
 |  ----------
 |  hidden_layer_sizes : tuple, length = n_layers - 2, default=(100,)
 |      The ith element represents the number of neurons in the ith
 |      hidden layer.
 |  
 |  activation : {'identity', 'logistic', 'tanh', 'relu'}, default='relu'
 |      Activation function for the hidden layer.
 |  
 |      - 'identity', no-op activation, useful to implement linear bottleneck,
 |        returns f(x) = x
 |  
 |      - 'logistic', the logistic sigmoid function,
 |        returns f(x) = 1 / (1 + exp(-x)).
 |  
 |      - 'tanh', the hyperbolic tan function,
 |        returns f(x) = tanh(x).
 |  
 |      - 'relu', the rectified linear 
```

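Next we create an instance of the model. The main parameter is `hidden_layer_sizes`, a tuple with the number of neurons in each hidden layer (the input and output layers are sized automatically from the data). Here we use three hidden layers with 20, 20 and 3 neurons, switch the activation to `tanh`, and raise `max_iter` to 1500. Note that `learning_rate='adaptive'` only takes effect with `solver='sgd'`; with the default `adam` solver it is ignored.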

```python
mlp = MLPClassifier(activation='tanh', hidden_layer_sizes=(20, 20, 3), max_iter=1500, learning_rate='adaptive')
#max_iter=500
```
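Because `hidden_layer_sizes` only lists the hidden layers, the full network is the input layer (one neuron per feature), the hidden layers, and the output layer (one neuron per class for a multi-class problem). The sketch below checks this on a fitted model through `n_layers_`, `n_outputs_` and the shapes of `coefs_`. It assumes `load_wine` as a stand-in for `wine_data.csv` and adds a fixed `random_state`; for brevity it scales the whole dataset at once instead of fitting the scaler on a training split.

```python
from sklearn.datasets import load_wine
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)      # assumption: stand-in for wine_data.csv
X = StandardScaler().fit_transform(X)  # whole dataset scaled, for illustration only

mlp = MLPClassifier(activation='tanh', hidden_layer_sizes=(20, 20, 3),
                    max_iter=1500, random_state=0)
mlp.fit(X, y)

# Input width + hidden widths + output width
layer_sizes = [X.shape[1], *mlp.hidden_layer_sizes, mlp.n_outputs_]
print(mlp.n_layers_)                  # 5
print(layer_sizes)                    # [13, 20, 20, 3, 3]
print([W.shape for W in mlp.coefs_])  # [(13, 20), (20, 20), (20, 3), (3, 3)]
```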

Now that the model has been created, we can fit it to the training data. Remember that this data has already been preprocessed and scaled:

```python
mlp.fit(X_train, y_train)
```

```
MLPClassifier(activation='tanh', alpha=0.0001, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=False, epsilon=1e-08,
              hidden_layer_sizes=(20, 20, 3), learning_rate='adaptive',
              learning_rate_init=0.001, max_fun=15000, max_iter=1500,
              momentum=0.9, n_iter_no_change=10, nesterovs_momentum=True,
              power_t=0.5, random_state=None, shuffle=True, solver='adam',
              tol=0.0001, validation_fraction=0.1, verbose=False,
              warm_start=False)
```

Play around with parameters and discover what effects they have on your model!
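One way to do this systematically is to loop over a few settings and compare test accuracy. The configurations below are arbitrary examples, `load_wine` is again assumed as a stand-in for `wine_data.csv`, and with only about 45 test samples small differences in accuracy should not be over-interpreted.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)  # assumption: stand-in for wine_data.csv
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)
scaler = StandardScaler().fit(X_train)  # fit on the training split only
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Illustrative settings to compare; add or change entries freely
configs = [
    {"hidden_layer_sizes": (10,), "activation": "relu"},
    {"hidden_layer_sizes": (20, 20, 3), "activation": "tanh"},
    {"hidden_layer_sizes": (50, 50), "activation": "logistic"},
]

results = {}
for params in configs:
    mlp = MLPClassifier(max_iter=1500, random_state=0, **params)
    mlp.fit(X_train, y_train)
    results[str(params)] = mlp.score(X_test, y_test)  # test accuracy

for name, acc in results.items():
    print(f"{acc:.3f}  {name}")
```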

### Predictions and Evaluation

Now that we have a trained model, we can use it to get predictions with the `predict()` method of the fitted model:

```python
predictions = mlp.predict(X_test)
```
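`predict()` returns one class label per row. Internally the classifier first computes a probability for each class (softmax outputs when there are more than two classes), which `predict_proba()` exposes, and `predict()` picks the most probable class. A small sketch checking that relationship, assuming `load_wine` as a stand-in for `wine_data.csv` and a single hidden layer chosen only for brevity:

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)      # assumption: stand-in for wine_data.csv
X = StandardScaler().fit_transform(X)
mlp = MLPClassifier(hidden_layer_sizes=(20,), max_iter=1500, random_state=0).fit(X, y)

proba = mlp.predict_proba(X[:5])                 # shape (5, n_classes); each row sums to 1
from_proba = mlp.classes_[proba.argmax(axis=1)]  # most probable class label per row
print(proba.round(3))
print(from_proba, mlp.predict(X[:5]))
```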

Now we can use SciKit-Learn's built-in metrics, such as a classification report and a confusion matrix, to evaluate how well our model performed:

```python
from sklearn.metrics import classification_report, confusion_matrix
```

```python
print(confusion_matrix(y_test, predictions))
#help(confusion_matrix)
```

```
[[15  0  0]
 [ 0 16  0]
 [ 0  0 14]]
```


```python
print(classification_report(y_test, predictions))
```

```
              precision    recall  f1-score   support

           1       1.00      1.00      1.00        15
           2       1.00      1.00      1.00        16
           3       1.00      1.00      1.00        14

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45
```
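Both outputs come from the same counts. In SciKit-Learn's confusion matrix, row i is the true class and column j the predicted class, so a class's precision is its diagonal entry divided by its column sum and its recall is the diagonal entry divided by its row sum. The sketch below recomputes the report's columns with NumPy from a hypothetical 3-class matrix (made up for illustration; the notebook's own result has every test sample on the diagonal).

```python
import numpy as np

# Hypothetical confusion matrix: rows = true class, columns = predicted class
cm = np.array([[14,  1,  0],
               [ 0, 15,  1],
               [ 0,  2, 12]])

tp = np.diag(cm)                 # correctly classified samples per class
precision = tp / cm.sum(axis=0)  # TP / (TP + FP), column-wise
recall = tp / cm.sum(axis=1)     # TP / (TP + FN), row-wise
f1 = 2 * precision * recall / (precision + recall)
support = cm.sum(axis=1)         # number of true samples per class
accuracy = tp.sum() / cm.sum()   # 41 correct out of 45

print(precision.round(2), recall.round(2), f1.round(2), support, round(accuracy, 2))
```

The "macro avg" row is the plain mean of the per-class values, while "weighted avg" weights each class by its support.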



```python
len(mlp.coefs_)
```

```
4
```

```python
len(mlp.coefs_[0])
#mlp.coefs_
#mlp.coefs_[0]
```

```
13
```

```python
len(mlp.intercepts_[0])
mlp.intercepts_
```

```
[array([ 0.54330364, -0.47692472, -0.35900682,  0.20153473,  0.22844264,
         0.15490929,  0.19586948, -0.48382469,  0.03894546,  0.14399384,
         0.38000248,  0.16610923, -0.05500069,  0.00155287, -0.46534669,
        -0.38767574, -0.36971464,  0.23168506, -0.51983115, -0.65394608]),
 array([-0.11486412, -0.11992477, -0.40090293, -0.08529271,  0.18102686,
         0.16567977,  0.29798168,  0.42618423, -0.06912531,  0.02125276,
        -0.03499656,  0.13306689, -0.17942612,  0.29902932,  0.15369106,
         0.42951696,  0.21212466,  0.44535375,  0.4869664 ,  0.29267826]),
 array([ 0.51045127, -0.15208679,  0.20899313]),
 array([-0.593127  , -0.90213922, -1.01240905])]
```


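To extract the MLP weights and biases after training your model, use its public attributes **coefs_** and **intercepts_**.

**coefs_** is a list of weight matrices, where the weight matrix at index i holds the weights between layer i and layer i+1. Our network has five layers (13 input features, hidden layers of 20, 20 and 3 neurons, and an output layer with one neuron per class), so `len(mlp.coefs_)` is 4 and `coefs_[0]` has 13 rows, one per feature.

**intercepts_** is a list of bias vectors, where the vector at index i holds the bias values added to layer i+1. This is why the four arrays above have 20, 20, 3 and 3 entries.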
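To see how these attributes fit together, the sketch below reproduces `predict_proba` by hand: each layer computes `a @ coefs_[i] + intercepts_[i]`, the hidden layers then apply the chosen activation (here `tanh`), and for a multi-class problem the output layer applies softmax. It assumes `load_wine` as a stand-in for `wine_data.csv` and a fixed `random_state`; the helper `forward` is our own illustration, not part of SciKit-Learn.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)      # assumption: stand-in for wine_data.csv
X = StandardScaler().fit_transform(X)
mlp = MLPClassifier(activation='tanh', hidden_layer_sizes=(20, 20, 3),
                    max_iter=1500, random_state=0).fit(X, y)

def forward(mlp, X):
    """Hypothetical helper: manual forward pass using coefs_ and intercepts_."""
    a = X
    last = len(mlp.coefs_) - 1
    for i, (W, b) in enumerate(zip(mlp.coefs_, mlp.intercepts_)):
        a = a @ W + b       # weights between layer i and i+1, plus the bias of layer i+1
        if i < last:
            a = np.tanh(a)  # hidden activation, matching activation='tanh'
    a = a - a.max(axis=1, keepdims=True)      # shift logits for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)   # softmax over the 3 classes

manual = forward(mlp, X)
print(np.allclose(manual, mlp.predict_proba(X)))
```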