# 19. Multi-Layer Perceptron (MLP) - Part 1
Since I couldn't get the CNN to work in the last notebook, I decided to try out another neural network. This time I picked one that is already available in the scikit-learn library: the MLP (`MLPClassifier`). An example of its use: https://www.kaggle.com/ahmethamzaemra/mlpclassifier-example

## Preprocessing

In [12]:
import pandas as pd
from preprocessing import PreProcessor

pp = PreProcessor()

df = pd.read_csv('Structured_DataFrame_Sample_500.csv', index_col=0)
df['Item Description'] = df['Item Description'].apply(lambda d: pp.preprocess(str(d)))
df

| Unnamed: 0 | Category | Item Description | category_id |
|---|---|---|---|
| 40127 | Counterfeits/Watches | emporio armani ar shell case ceram bracelet re... | 0 |
| 40126 | Counterfeits/Watches | cartiertank ladi brand cartier seri tank gende... | 0 |
| 40125 | Counterfeits/Watches | patek philipp watch box patek philipp watch bo... | 0 |
| 40130 | Counterfeits/Watches | breitl navitim cosmonaut replica watch inform ... | 0 |
| 40129 | Counterfeits/Watches | emporio armani men ar dial color gari watch re... | 0 |
| ... | ... | ... | ... |
| 15401 | Services/Money | canada cc get card number cvv expiri date name... | 29 |
| 15402 | Services/Money | uk debit card take chanc buy uk visa debit car... | 29 |
| 15403 | Services/Money | itali card detail high valid fresh itali card ... | 29 |
| 15404 | Services/Money | centurionblack cc get us centurion cc card num... | 29 |


## Vectorizing

In [13]:
from sklearn.feature_extraction.text import TfidfVectorizer

tfidf = TfidfVectorizer(sublinear_tf=True, min_df=5, norm='l2', encoding='latin-1', ngram_range=(1, 2))
features = tfidf.fit_transform(df['Item Description'])
labels = df.Category

features

<15000x16833 sparse matrix of type '<class 'numpy.float64'>'
	with 390081 stored elements in Compressed Sparse Row format>
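
To make the vectorizer settings above more concrete, here is a minimal sketch on a made-up four-document corpus. The documents and the lower `min_df=2` are assumptions, chosen so the effect is visible on so little data. `ngram_range=(1, 2)` adds word pairs next to single words, `min_df` drops terms that occur in too few documents, and `norm='l2'` scales every row to unit length.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical mini corpus standing in for df['Item Description']
docs = [
    "replica watch",
    "replica watch box",
    "card number cvv",
    "debit card number",
]

# Same settings as above, except min_df=2 instead of 5 (the toy corpus is tiny)
vec = TfidfVectorizer(sublinear_tf=True, min_df=2, norm='l2', ngram_range=(1, 2))
X = vec.fit_transform(docs)

# Terms that survive: unigrams and bigrams occurring in at least 2 documents
vocab = sorted(vec.vocabulary_)
print(vocab)

# With norm='l2' every non-empty row has Euclidean length 1
row_norms = np.sqrt(np.asarray(X.multiply(X).sum(axis=1)).ravel())
print(X.shape, row_norms)
```

Terms such as "box" or "debit card" occur in only one document here, so they are dropped. On the real data the same mechanism is presumably what keeps the feature count at 16833.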

## Splitting

In [14]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test, indices_train, indices_test = train_test_split(features, labels, df.index, test_size=0.33, random_state=0)

X_train

<10050x16833 sparse matrix of type '<class 'numpy.float64'>'
	with 261715 stored elements in Compressed Sparse Row format>
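
One thing to keep in mind with this order of steps: the TF-IDF vocabulary and IDF weights are fitted on all rows before the split, so the test rows have some influence on the features. A possible alternative, sketched below on hypothetical toy data, is to split the raw text first and let a scikit-learn `Pipeline` fit the vectorizer on the training part only. The toy texts, the small hidden layer and the missing `min_df` are assumptions for the sake of a quick example, not the settings used in this notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for df['Item Description'] and df.Category
texts = ["replica watch box", "luxuri watch replica", "watch bracelet replica",
         "visa debit card", "fresh card number cvv", "card cvv expiri date"] * 5
cats = (["Counterfeits/Watches"] * 3 + ["Services/Money"] * 3) * 5

# Split the raw text first, so the test rows stay unseen by the vectorizer
text_train, text_test, cat_train, cat_test = train_test_split(
    texts, cats, test_size=0.33, random_state=0, stratify=cats)

# The pipeline fits TF-IDF on the training text only, then trains the MLP
pipe = make_pipeline(
    TfidfVectorizer(sublinear_tf=True, norm='l2', ngram_range=(1, 2)),
    MLPClassifier(hidden_layer_sizes=(10,), max_iter=500, random_state=21),
)
pipe.fit(text_train, cat_train)

preds = pipe.predict(text_test)
test_score = pipe.score(text_test, cat_test)
print(test_score)
```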

## Creating the model

In [15]:
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix

clf = MLPClassifier(hidden_layer_sizes=(100,100,100), max_iter=500, learning_rate='constant', 
                    learning_rate_init=0.00001, alpha=0.001, solver='adam', verbose=1, 
                    random_state=21, tol=0.000001)
clf

MLPClassifier(activation='relu', alpha=0.001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100, 100, 100), learning_rate='constant',
       learning_rate_init=1e-05, max_iter=500, momentum=0.9,
       n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
       random_state=21, shuffle=True, solver='adam', tol=1e-06,
       validation_fraction=0.1, verbose=1, warm_start=False)

## Training the model

In [16]:
clf.fit(X_train, y_train)

Iteration 1, loss = 3.42113749
Iteration 2, loss = 3.41865994
Iteration 3, loss = 3.41621672
Iteration 4, loss = 3.41375455
Iteration 5, loss = 3.41120135
Iteration 6, loss = 3.40855421
Iteration 7, loss = 3.40579565
Iteration 8, loss = 3.40287692
Iteration 9, loss = 3.39983792
Iteration 10, loss = 3.39665785
Iteration 11, loss = 3.39332126
Iteration 12, loss = 3.38981745
Iteration 13, loss = 3.38609759
Iteration 14, loss = 3.38217820
Iteration 15, loss = 3.37801807
Iteration 16, loss = 3.37362137
Iteration 17, loss = 3.36896821
Iteration 18, loss = 3.36403671
Iteration 19, loss = 3.35879068
Iteration 20, loss = 3.35323236
Iteration 21, loss = 3.34738021
Iteration 22, loss = 3.34121173
Iteration 23, loss = 3.33472789
Iteration 24, loss = 3.32798818
Iteration 25, loss = 3.32101264
Iteration 26, loss = 3.31377140
Iteration 27, loss = 3.30626940
Iteration 28, loss = 3.29839691
Iteration 29, loss = 3.29014049
Iteration 30, loss = 3.28153784
Iteration 31, loss = 3.27264257
...

Iteration 253, loss = 0.25252436
Iteration 254, loss = 0.25003675
Iteration 255, loss = 0.24758915
Iteration 256, loss = 0.24518045
Iteration 257, loss = 0.24280163
Iteration 258, loss = 0.24045084
Iteration 259, loss = 0.23814067
Iteration 260, loss = 0.23583910
Iteration 261, loss = 0.23362260
Iteration 262, loss = 0.23136761
Iteration 263, loss = 0.22915456
Iteration 264, loss = 0.22699930
Iteration 265, loss = 0.22484462
Iteration 266, loss = 0.22274859
Iteration 267, loss = 0.22064996
Iteration 268, loss = 0.21861120
Iteration 269, loss = 0.21657733
Iteration 270, loss = 0.21459794
Iteration 271, loss = 0.21258125
Iteration 272, loss = 0.21066219
Iteration 273, loss = 0.20874434
Iteration 274, loss = 0.20681151
Iteration 275, loss = 0.20491606
Iteration 276, loss = 0.20307715
Iteration 277, loss = 0.20124976
Iteration 278, loss = 0.19944015
Iteration 279, loss = 0.19765304
Iteration 280, loss = 0.19587317
Iteration 281, loss = 0.19410711
Iteration 282, loss = 0.19237425
...

MLPClassifier(activation='relu', alpha=0.001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100, 100, 100), learning_rate='constant',
       learning_rate_init=1e-05, max_iter=500, momentum=0.9,
       n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
       random_state=21, shuffle=True, solver='adam', tol=1e-06,
       validation_fraction=0.1, verbose=1, warm_start=False)
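
A quick sanity check on the log above: the first loss value (about 3.42) is close to what a classifier that knows nothing yet should produce. Assuming there are 30 categories (`category_id` runs from 0 to 29 in the table above), predicting every class with equal probability gives a cross-entropy of ln(30).

```python
import math

# Assumption: 30 categories, based on category_id running from 0 to 29
n_classes = 30

# Cross-entropy of a uniform prediction: -ln(1 / n_classes) = ln(n_classes)
uniform_loss = -math.log(1.0 / n_classes)
print(round(uniform_loss, 4))  # 3.4012
```

So the network starts out roughly at chance level and only slowly moves away from it, which fits the very small `learning_rate_init` of 0.00001.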

## Validating the model

In [17]:
y_pred = clf.predict(X_test)

In [18]:
accuracy_score(y_test, y_pred)

0.908080808080808
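
Accuracy is a single number and hides which categories get confused with each other. The `confusion_matrix` imported earlier can show that. Below is a minimal sketch on made-up labels (the two category names and the predictions are assumptions for illustration). On the real data the same calls would take `y_test` and `y_pred`.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels for six items
y_true_demo = ["Watches", "Watches", "Watches", "Money", "Money", "Money"]
y_pred_demo = ["Watches", "Watches", "Money", "Money", "Money", "Money"]

# Rows are true classes, columns are predicted classes, in the order of `labels`
labels_demo = ["Money", "Watches"]
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=labels_demo)
print(cm)

acc = accuracy_score(y_true_demo, y_pred_demo)
print(acc)
```

Here one "Watches" item is predicted as "Money", which shows up as the off-diagonal entry in the second row.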

## Save the model

In [62]:
import pickle

filename = 'MLP_500_Model.sav'

# save the model to disk; the with-block closes the file handle afterwards
with open(filename, 'wb') as f:
    pickle.dump(clf, f)

# load the model from disk
with open(filename, 'rb') as f:
    loaded_model = pickle.load(f)
result = loaded_model.score(X_test, y_test)
print(result)

0.908080808080808
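
Note that the saved classifier only accepts TF-IDF features, so classifying new raw text later also requires the fitted vectorizer. One possible way to handle this, sketched here on hypothetical toy data with a hypothetical filename, is to pickle both objects together.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier

# Hypothetical toy data standing in for the real descriptions and categories
texts = ["replica watch box", "luxuri watch replica", "watch bracelet replica",
         "visa debit card", "fresh card number cvv", "card cvv expiri date"]
cats = ["Counterfeits/Watches"] * 3 + ["Services/Money"] * 3

vec = TfidfVectorizer()
clf_demo = MLPClassifier(hidden_layer_sizes=(10,), max_iter=300, random_state=21)
clf_demo.fit(vec.fit_transform(texts), cats)

# Store vectorizer and model in one file (filename is an assumption)
path = os.path.join(tempfile.mkdtemp(), "mlp_with_vectorizer.sav")
with open(path, "wb") as f:
    pickle.dump({"vectorizer": vec, "model": clf_demo}, f)

# Load both back and classify raw text with the restored pair
with open(path, "rb") as f:
    bundle = pickle.load(f)

new_text = ["replica watch bracelet"]
restored_pred = bundle["model"].predict(bundle["vectorizer"].transform(new_text))
original_pred = clf_demo.predict(vec.transform(new_text))
print(restored_pred)
```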


## Conclusion
The model has a pretty good accuracy score (about 0.908), just a few percent lower than linear SVC, which does not use a neural network. The way the training loss develops concerns me though. It keeps declining, more and more slowly, toward zero. This may mean the model is overfitting, although the training loss alone cannot confirm that. I tried to solve this by changing the tolerance (`tol`) so the model would stop training sooner. This did not really help, which makes sense in hindsight: I decreased `tol` to 1e-06, below the scikit-learn default of 1e-4, and a lower tolerance makes training run longer rather than stop sooner.

The other option is to constrain either the number of weights or their magnitude. The most common approach is to limit the magnitude of the weights by altering the L2 penalty (`alpha` in this case). This is exactly what I tried. The result was that the training loss no longer converged toward zero: it leveled off around 0.2 instead of 0.00. This is a good sign. In theory, the model should generalize better now.

https://machinelearningmastery.com/introduction-to-regularization-to-reduce-overfitting-and-improve-generalization-error/
https://scikit-learn.org/stable/auto_examples/neural_networks/plot_mlp_alpha.html
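
For stopping training sooner, `MLPClassifier` also offers `early_stopping=True`. It holds out `validation_fraction` of the training data and stops once the validation score has not improved by at least `tol` for `n_iter_no_change` consecutive epochs. The sketch below uses synthetic data and a smaller network (both assumptions) just to show the mechanics. It is not the configuration used in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data (assumption): 3 classes, 20 features
X_demo, y_demo = make_classification(n_samples=600, n_features=20, n_informative=10,
                                     n_classes=3, random_state=0)

# Stop when the held-out validation score stalls for 10 epochs in a row
clf_es = MLPClassifier(hidden_layer_sizes=(50,), max_iter=500,
                       early_stopping=True, validation_fraction=0.1,
                       n_iter_no_change=10, tol=1e-4, random_state=21)
clf_es.fit(X_demo, y_demo)

print(clf_es.n_iter_)                 # number of epochs actually run
print(clf_es.best_validation_score_)  # best accuracy on the held-out part
```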

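When I increased the penalty further, though, the model got weaker: the higher the final training loss, the worse it performed. I don't know for certain why this is. One plausible explanation is that too large a penalty keeps the weights so small that the model underfits.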
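
One way to look into this is to record both the training and the test accuracy for a range of `alpha` values. A large gap between the two points toward overfitting, while both dropping together points toward underfitting. The sketch below does this on synthetic data with a small network. The data, the network size and the `alpha` grid are all assumptions, so the numbers it prints say nothing about the model in this notebook.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Hide "did not converge" warnings; this is only a quick illustration
warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Synthetic stand-in data (assumption): 3 classes, 20 features
X_syn, y_syn = make_classification(n_samples=400, n_features=20, n_informative=10,
                                   n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.33, random_state=0)

# Train one model per penalty strength and keep (train accuracy, test accuracy)
scores = {}
for alpha in [0.0001, 0.01, 1.0, 100.0]:
    model = MLPClassifier(hidden_layer_sizes=(50,), alpha=alpha,
                          max_iter=300, random_state=21)
    model.fit(X_tr, y_tr)
    scores[alpha] = (model.score(X_tr, y_tr), model.score(X_te, y_te))
    print(alpha, scores[alpha])
```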

In notebook 21 I will compare this score to other models.