# Multilayer Perceptron using the scikit-learn library for Python

## Implementing a Multilayer Perceptron in Python is straightforward with scikit-learn, a Python library for data mining and data analysis.

## The documentation for the MLPClassifier can be found <a href="http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html" target="_blank">here</a>.

# Part 1: Importing the data set

### Import your data as a pandas DataFrame.

In [43]:
import pandas as pd

data = pd.read_csv('animals.csv')
data

Unnamed: 0,hair,feathers,eggs,milk,airborne,aquatic,predator,toothed,backbone,breathes,...,fins,2 legs,4 legs,5 legs,6 legs,8 legs,tail,domestic,catsize,class
0,0,1,1,0,1,0,0,0,1,1,...,0,1,0,0,0,0,1,1,0,birds
1,0,1,1,0,1,0,1,0,1,1,...,0,1,0,0,0,0,1,0,0,birds
2,0,1,1,0,1,0,0,0,1,1,...,0,1,0,0,0,0,1,1,0,birds
3,0,1,1,0,1,1,0,0,1,1,...,0,1,0,0,0,0,1,0,0,birds
4,0,1,1,0,1,0,0,0,1,1,...,0,1,0,0,0,0,1,0,1,birds
5,0,1,1,0,1,1,1,0,1,1,...,0,1,0,0,0,0,1,0,0,birds
6,0,1,1,0,1,0,1,0,1,1,...,0,1,0,0,0,0,1,0,0,birds
7,0,1,1,0,0,0,1,0,1,1,...,0,1,0,0,0,0,1,0,0,birds
8,0,1,1,0,1,0,0,0,1,1,...,0,1,0,0,0,0,1,0,0,birds
9,0,1,1,0,0,0,0,0,1,1,...,0,1,0,0,0,0,1,0,1,birds


# Part 2: Preparing the data set for training and testing

### The classifier requires a training set and a test set. For each set, the column containing the class label must be separated from the actual data. Doing this step first also prevents the class label from being affected by normalization.

### The <code>drop</code> method removes the column with the given header name. Pass <code>axis=1</code> so that the label is looked up among the columns. Without it, <code>drop</code> defaults to <code>axis=0</code> and looks for a row labelled <code>class</code> instead, which raises a <code>KeyError</code> here.
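
### As a minimal standalone sketch of this behaviour, the cell below uses a made-up three-column frame (the names <code>toy</code>, <code>features</code> and <code>row_lookup_failed</code> are illustrative and not part of the animals data set).

```python
import pandas as pd

# Made-up toy frame, not the animals data set.
toy = pd.DataFrame({
    "hair": [0, 1],
    "feathers": [1, 0],
    "class": ["birds", "mammals"],
})

# axis=1 tells drop to look for the label among the columns.
features = toy.drop("class", axis=1)

# Equivalent spelling using the columns keyword.
same_features = toy.drop(columns="class")

# Without axis=1, drop searches the row index, finds no row
# labelled "class", and raises a KeyError.
try:
    toy.drop("class")
    row_lookup_failed = False
except KeyError:
    row_lookup_failed = True

print(list(features.columns))
print(row_lookup_failed)
```

### Note that <code>drop</code> returns a new DataFrame; the original frame still contains the <code>class</code> column.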

In [44]:
x = data.drop('class',axis=1)
x

Unnamed: 0,hair,feathers,eggs,milk,airborne,aquatic,predator,toothed,backbone,breathes,venomous,fins,2 legs,4 legs,5 legs,6 legs,8 legs,tail,domestic,catsize
0,0,1,1,0,1,0,0,0,1,1,0,0,1,0,0,0,0,1,1,0
1,0,1,1,0,1,0,1,0,1,1,0,0,1,0,0,0,0,1,0,0
2,0,1,1,0,1,0,0,0,1,1,0,0,1,0,0,0,0,1,1,0
3,0,1,1,0,1,1,0,0,1,1,0,0,1,0,0,0,0,1,0,0
4,0,1,1,0,1,0,0,0,1,1,0,0,1,0,0,0,0,1,0,1
5,0,1,1,0,1,1,1,0,1,1,0,0,1,0,0,0,0,1,0,0
6,0,1,1,0,1,0,1,0,1,1,0,0,1,0,0,0,0,1,0,0
7,0,1,1,0,0,0,1,0,1,1,0,0,1,0,0,0,0,1,0,0
8,0,1,1,0,1,0,0,0,1,1,0,0,1,0,0,0,0,1,0,0
9,0,1,1,0,0,0,0,0,1,1,0,0,1,0,0,0,0,1,0,1


### To separate the class labels, select the <code>class</code> column and assign it to a new variable. Selecting a single column returns a pandas Series.

In [45]:
y = data['class']
y

0           birds
1           birds
2           birds
3           birds
4           birds
5           birds
6           birds
7           birds
8           birds
9           birds
10          birds
11          birds
12          birds
13          birds
14          birds
15          birds
16          birds
17          birds
18          birds
19          birds
20     crustacean
21     crustacean
22     crustacean
23     crustacean
24     crustacean
25     crustacean
26     crustacean
27     crustacean
28     crustacean
29     crustacean
          ...    
71        mammals
72        mammals
73        mammals
74        mammals
75        mammals
76        mammals
77        mammals
78        mammals
79        mammals
80        mammals
81        mammals
82        mammals
83        mammals
84        mammals
85        mammals
86        mammals
87        mammals
88        mammals
89        mammals
90        mammals
91        mammals
92        saurian
93        saurian
94        saurian
95        

### scikit-learn also provides a function that splits your data set into a training set and a test set, so that you don't have to do it manually: <code>train_test_split()</code>.

### To control how much of the data goes into each set, the following parameters are available:
<ul>
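<li><code>train_size</code>: a float between 0 and 1 (fraction of the data) or an integer (number of samples) for the training set</li>
<li><code>test_size</code>: a float between 0 and 1 (fraction of the data) or an integer (number of samples) for the test set</li>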
</ul>
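
### The two parameters can be sketched on made-up toy data as follows. The arrays <code>X</code> and <code>y</code> are illustrative, and <code>random_state=0</code> is an addition not used in this notebook; it only makes the shuffle repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up toy data: 10 samples, 2 features.
X = np.arange(20).reshape(10, 2)
y = np.array(["a"] * 5 + ["b"] * 5)

# train_size as a fraction: 70% of the rows go to the training set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=0
)

# test_size as an absolute count: exactly 3 rows go to the test set.
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X, y, test_size=3, random_state=0
)

print(X_train.shape, X_test.shape)
print(X_train2.shape, X_test2.shape)
```

### When only one of the two parameters is given, the other set receives the remaining samples.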

In [46]:
from sklearn.model_selection import train_test_split

#70% training and 30% testing
x_train, x_test, y_train, y_test = train_test_split(x,y,train_size=0.7)
x_train

Unnamed: 0,hair,feathers,eggs,milk,airborne,aquatic,predator,toothed,backbone,breathes,venomous,fins,2 legs,4 legs,5 legs,6 legs,8 legs,tail,domestic,catsize
33,0,0,1,0,0,1,1,1,1,0,0,1,0,0,0,0,0,1,0,0
40,0,0,1,0,0,1,0,1,1,0,0,1,0,0,0,0,0,1,0,0
71,1,0,0,1,0,0,1,1,1,1,0,0,0,1,0,0,0,1,0,1
99,0,0,1,0,0,1,0,1,1,1,0,0,0,1,0,0,0,0,0,0
38,0,0,1,0,0,1,1,1,1,0,0,1,0,0,0,0,0,1,0,0
25,0,0,0,0,0,0,1,0,0,1,1,0,0,0,0,0,1,1,0,0
24,0,0,1,0,0,1,1,0,0,0,0,0,0,0,0,0,1,0,0,1
23,0,0,1,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0
81,1,0,0,1,0,0,1,1,1,1,0,0,0,1,0,0,0,1,0,1
18,0,1,1,0,1,0,1,0,1,1,0,0,1,0,0,0,0,1,0,1


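### Before training, normalize the data. An MLP generally trains best when every feature is scaled to a common range such as [0,1] or [-1,1]. scikit-learn's <code>MinMaxScaler</code> rescales each feature to a chosen range. Fit the scaler on the training set only and reuse it to transform the test set, so that no information from the test set leaks into training.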
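
### A minimal sketch of min-max scaling on made-up numbers (the arrays <code>train</code> and <code>test</code> are illustrative): the scaler learns each column's minimum and maximum from the training data, then applies the same mapping to the test data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up toy data: column 0 spans 0..1, column 1 spans 2..6 in training.
train = np.array([[0.0, 2.0], [1.0, 4.0], [0.0, 6.0]])
test = np.array([[1.0, 8.0]])

scaler = MinMaxScaler(feature_range=(-1, 1))

# fit_transform learns the per-column min and max from the training data.
train_scaled = scaler.fit_transform(train)

# transform reuses those training statistics; nothing is re-fitted.
test_scaled = scaler.transform(test)

print(train_scaled)
print(test_scaled)
```

### Because the test value 8 lies above the training maximum of 6, it is mapped to 2, outside the [-1,1] range. This is expected when the scaler is fitted on the training set only.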

In [47]:
from sklearn.preprocessing import MinMaxScaler

# For sigmoid (logistic) activation, scale to [0, 1]
#scaler = MinMaxScaler(feature_range=(0,1))

# For tanh activation, scale to [-1, 1]
scaler = MinMaxScaler(feature_range=(-1,1))

x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)
x_train

array([[-1., -1.,  1., ...,  1., -1., -1.],
       [-1., -1.,  1., ...,  1., -1., -1.],
       [ 1., -1., -1., ...,  1., -1.,  1.],
       ..., 
       [ 1., -1., -1., ...,  1., -1.,  1.],
       [ 1., -1., -1., ..., -1., -1.,  1.],
       [ 1., -1., -1., ...,  1., -1., -1.]])

### Training the MLP takes a single call to <code>fit</code>. Import <code>MLPClassifier</code> from scikit-learn and adjust its parameters to configure the model.

### Some of the parameters are:
<ul>
<li><code>hidden_layer_sizes=(x, y, z, ...)</code>: a tuple in which each number gives the number of hidden <em>nodes</em> in one hidden <em>layer</em></li>
<li><code>activation</code>: the activation function of the hidden layers, one of 'identity', 'logistic', 'tanh', 'relu'</li>
<li><code>max_iter</code>: an integer giving the maximum number of training iterations</li>
<li><code>learning_rate_init</code>: a float giving the initial learning rate</li>
</ul>
</ul>
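
### To illustrate how <code>hidden_layer_sizes</code> maps to the network's structure, here is a minimal sketch on made-up binary data. The data, the two-class labels, the <code>(8, 4)</code> layout and <code>random_state=0</code> are assumptions chosen for illustration, not the settings of this notebook.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# (5) is just the integer 5; a one-element tuple needs a trailing comma.
print(type((5)).__name__, type((5,)).__name__)

# Made-up toy data: all 64 combinations of 6 binary features.
X = np.array([[(i >> k) & 1 for k in range(6)] for i in range(64)], dtype=float)
# Two-class label: are the first two features both 1?
y = np.where(X[:, 0] * X[:, 1] == 1, "both", "not_both")

# Two hidden layers: 8 nodes, then 4 nodes.
model = MLPClassifier(
    hidden_layer_sizes=(8, 4),
    activation="tanh",
    max_iter=500,
    learning_rate_init=0.001,
    random_state=0,
)
model.fit(X, y)

# One weight matrix per connection between consecutive layers.
# With two classes, the output layer has a single unit.
layer_shapes = [w.shape for w in model.coefs_]
print(layer_shapes)
```

### The weight matrices go from 6 inputs to 8 nodes, from 8 to 4 nodes, and from 4 nodes to the output, so each entry in the tuple adds one hidden layer.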

In [64]:
from sklearn.neural_network import MLPClassifier

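# Note: (5) is the plain integer 5, i.e. one hidden layer of 5 nodes; (5,) is the explicit one-element tuple
mlp = MLPClassifier(hidden_layer_sizes=(5),max_iter=10000,learning_rate_init=0.001,activation='tanh')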
mlp.fit(x_train,y_train)

MLPClassifier(activation='tanh', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=5, learning_rate='constant',
       learning_rate_init=0.001, max_iter=10000, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)

In [65]:
predictions = mlp.predict(x_test)

In [66]:
from sklearn.metrics import classification_report,confusion_matrix

print(confusion_matrix(y_test,predictions))

[[ 3  0  0  0  0  0  0]
 [ 0  3  0  0  0  0  0]
 [ 0  0  5  0  0  0  0]
 [ 0  0  0  2  0  0  0]
 [ 0  0  0  0 16  0  0]
 [ 0  0  1  0  0  0  0]
 [ 0  0  0  0  0  0  1]]


In [67]:
print(classification_report(y_test,predictions))

             precision    recall  f1-score   support

      birds       1.00      1.00      1.00         3
 crustacean       1.00      1.00      1.00         3
       fish       0.83      1.00      0.91         5
    insects       1.00      1.00      1.00         2
    mammals       1.00      1.00      1.00        16
    saurian       0.00      0.00      0.00         1
      toads       1.00      1.00      1.00         1

avg / total       0.94      0.97      0.95        31
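
### To help read these outputs, here is a minimal sketch on made-up labels. In scikit-learn's confusion matrix, rows are the true classes and columns are the predicted classes. The lists below are illustrative, and the <code>zero_division</code> argument assumes a scikit-learn version that supports it (0.22 or later).

```python
from sklearn.metrics import classification_report, confusion_matrix

# Made-up labels: the single saurian sample is misclassified as fish.
y_true = ["birds", "fish", "fish", "saurian"]
y_pred = ["birds", "fish", "fish", "fish"]
labels = ["birds", "fish", "saurian"]

# Rows follow the true class, columns the predicted class, in `labels` order.
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)

# No sample is predicted as saurian, so its precision is undefined;
# zero_division=0 reports it as 0 without emitting a warning.
report = classification_report(
    y_true, y_pred, labels=labels, zero_division=0, output_dict=True
)
print(report["fish"]["precision"], report["saurian"]["precision"])
```

### The same pattern appears in the results above: one saurian sample was predicted as fish, which lowers the precision of fish and leaves saurian with a precision and recall of 0.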



