# Making an MLP Classifier

MLP = multi-layer perceptron

This is a feed-forward neural net (an input layer, one or more hidden layers, and an output layer) that is used for classifying data.

Link that I used for reference:

https://scikit-learn.org/stable/modules/neural_networks_supervised.html#classification

In [1]:
# import libraries 
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.neural_network import MLPClassifier


_`MLPClassifier` trains on two arrays: the first is the training data (one row per sample), the second is the labels (one per sample)_
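As a minimal sketch of the "two arrays" idea, here is a fit on a tiny made-up data set. `X_toy`, `y_toy`, and `toy_clf` are hypothetical names and the values are invented for illustration; only the `MLPClassifier` settings mirror the ones used later in this notebook (`max_iter = 500` is an added assumption).

```python
from sklearn.neural_network import MLPClassifier

# Toy data (made up for illustration): 4 samples, 2 features each
X_toy = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]]
y_toy = [0, 0, 1, 1]  # one label per row of X_toy

toy_clf = MLPClassifier(solver='lbfgs',
                        hidden_layer_sizes=(5, 2),
                        random_state=1,
                        max_iter=500)
toy_clf.fit(X_toy, y_toy)  # first array: data, second array: labels

# predict returns one label per input row
print(toy_clf.predict(X_toy).shape)
```

The number of rows in the first array has to match the number of labels in the second.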

In [49]:
data = (pd.read_csv('nisei_set1_labelled_matrix.csv', 
                    index_col = 0))
data = data.fillna(0) # replace any NA values with 0 -- just a precaution, the data should be fine
data.head()

Unnamed: 0,。,目,｜,此の,一,地,為る,把,日,中,...,本,二十,七,占い,部,ぁ,間,‘,学,三十
0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,1.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [53]:
# data = data.drop(['。', '｜'], axis = 1) # tokens I know aren't good

data.shape[1]

94

In [187]:
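prop_split = 0.8
# NOTE: data.shape[1] is the number of columns (94), so n_split = int(94 * 0.8) = 75;
# data.shape[0] would split on the number of rows instead
n_split = int(data.shape[1] * prop_split)
X = data.values
X_train = X[0:n_split] # train on the first n_split rows
X_test = X[n_split:] # test on the remaining rows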

Y = pd.read_csv("C:\\Users\\alica\\Documents\\URAP\\data\\nisei_set1_labelled.csv")['label'].values
Y_train = Y[0:n_split]
Y_test = Y[n_split:]
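An alternative to slicing by position is scikit-learn's `train_test_split`, which shuffles the rows and can keep the class proportions similar in both splits. This is only a sketch on synthetic data: `X_demo` and `y_demo` are made-up stand-ins, not the notebook's matrix. Note that `stratify` needs at least two samples per class, so it would not work as-is on labels where a class appears only once.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data: 100 rows, 94 binary features, imbalanced labels
X_demo = rng.integers(0, 2, size=(100, 94)).astype(float)
y_demo = np.array([1] * 88 + [-1] * 12)

# 80/20 split by rows; stratify keeps the label proportions similar in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=1)

print(X_tr.shape, X_te.shape)
print(np.mean(y_tr == -1), np.mean(y_te == -1))
```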

In [188]:
X_train, Y_train

(array([[1., 1., 1., ..., 0., 0., 0.],
        [0., 0., 1., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 1., ..., 0., 0., 0.],
        [0., 0., 1., ..., 0., 0., 0.],
        [1., 1., 1., ..., 0., 1., 0.]]),
 array([ 1,  1, -1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
         1,  1,  1,  1, -1,  1,  1,  1,  1,  1,  1,  1,  1, -1,  1,  1,  1,
         1,  1,  1,  1, -1,  1,  1,  1,  1,  1, -1,  0,  1,  1,  1,  1,  1,
        -1,  1,  1,  1,  1,  1, -1,  1,  1,  1,  1,  1, -1,  1,  1,  1,  1,
         1,  1,  1,  1,  1,  1,  1], dtype=int64))

In [189]:
clf = (MLPClassifier(solver = 'lbfgs', 
                     alpha = 1e-5, 
                     hidden_layer_sizes = (5,2), 
                     random_state = 1))

clf.fit(X_train,Y_train)

In [190]:
Y_pred = clf.predict(X_test) # predicted labels for the test rows

In [191]:
np.mean(Y_test == Y_pred) # accuracy of the model on the test set

0.72

_this model has an accuracy of 72% on the test set. We should expand this to more data points that have a more diverse spread of variables, as we have run into the issue that most of the data points fall under one label_

In [192]:
np.mean(Y) # mean label; heavily leaning towards class 1

0.77
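The mean label only hints at the imbalance, so here is a sketch of how the per-class proportions and the "always predict the majority class" baseline could be checked. `y_all` is a hypothetical array built to match the counts reported in this notebook (11 labelled -1, one labelled 0, mean 0.77), which assumes 100 rows in total with 88 labelled 1.

```python
import numpy as np

# Hypothetical labels matching the counts reported in this notebook
y_all = np.array([1] * 88 + [-1] * 11 + [0] * 1)

# Proportion of each class
labels, counts = np.unique(y_all, return_counts=True)
for label, count in zip(labels, counts):
    print(label, count / len(y_all))

# Accuracy of a model that always predicts the most common label
baseline = counts.max() / counts.sum()
print(baseline)
```

Any accuracy figure is easier to judge when it is compared against this baseline.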

## Finding which group has the highest error: 

Sensitivity, specificity, precision, and negative predictive value

In [193]:
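def sensitivity(TP, FN):
    '''return the proportion of actual positives that are predicted positive'''
    return TP / (TP + FN)

def specificity(TN, FP):
    '''return the proportion of actual negatives that are predicted negative'''
    return TN / (TN + FP)

def precision(TP, FP):
    '''return the proportion of predicted positives that are actually positive'''
    return TP / (TP + FP)

def negative_predictive_value(TN, FN):
    '''return the proportion of predicted negatives that are actually negative'''
    return TN / (FN + TN)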
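These functions need the four counts as input. One way to get them, sketched here on made-up labels, is `sklearn.metrics.confusion_matrix`. `y_true_demo` and `y_pred_demo` are hypothetical, and the example assumes only the two labels -1 (negative) and 1 (positive). The ratios are written out inline so the block runs on its own.

```python
from sklearn.metrics import confusion_matrix

# Made-up true labels and predictions, for illustration only
y_true_demo = [1, 1, 1, -1, -1, 1, 1, -1]
y_pred_demo = [1, 1, -1, -1, 1, 1, 1, -1]

# With labels=[-1, 1], rows are true labels and columns are predictions,
# so the flattened 2x2 matrix is ordered TN, FP, FN, TP
TN, FP, FN, TP = confusion_matrix(y_true_demo, y_pred_demo, labels=[-1, 1]).ravel()

print('sensitivity:', TP / (TP + FN))
print('specificity:', TN / (TN + FP))
print('precision:', TP / (TP + FP))
print('negative predictive value:', TN / (TN + FN))
```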

## Fitting for the best parameters: Size of Layers

_`hidden_layer_sizes` is a tuple of length n_layers - 2 (one entry per hidden layer); each entry is the number of neurons in that hidden layer._

solver = lbfgs, sgd (stochastic gradient descent, what I'm used to), or adam
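To make the relationship between `hidden_layer_sizes` and the number of layers concrete, here is a small sketch on synthetic data (`X_demo`, `y_demo`, and `demo` are hypothetical names). After fitting, the `n_layers_` and `coefs_` attributes show how the tuple turns into layers and weight matrices.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Synthetic data: 40 samples, 3 features, binary label
X_demo = rng.random((40, 3))
y_demo = (X_demo[:, 0] > 0.5).astype(int)

demo = MLPClassifier(solver='lbfgs',
                     hidden_layer_sizes=(5, 2),  # two hidden layers: 5 neurons, then 2
                     random_state=1,
                     max_iter=500)
demo.fit(X_demo, y_demo)

print(demo.n_layers_)  # 4 (input + 2 hidden + output)
print([w.shape for w in demo.coefs_])  # one weight matrix between each pair of layers
```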


In [194]:
def get_accuracy(solver = 'lbfgs', alpha = 1e-5, layer_sizes = (5,2)):
    '''fit an MLP on the training split and return its accuracy on the test split'''
    clf = (MLPClassifier(solver = solver, 
                     alpha = alpha, 
                     hidden_layer_sizes = layer_sizes, 
                     random_state = 1))

    clf.fit(X_train,Y_train)

    return clf.score(X_test, Y_test) # mean accuracy on the test split


In [195]:
alpha_values = np.linspace(start = 1e-6, stop = 0.005, num = 20) # alpha = L2 regularization strength
accuracies = []

for a in alpha_values:
    acc = get_accuracy(alpha = a)
    # print(f'Alpha = {a} \nAccuracy = {acc}\n')
    accuracies.append(acc)

alpha_df = pd.DataFrame(data = {'Alpha':alpha_values, 'Accuracy':accuracies})

In [196]:
alpha_train, alpha_accuracy = alpha_df.sort_values('Accuracy', ascending= False).iloc[0].values

In [197]:
alpha_train, alpha_accuracy * 100

(0.0026320526315789477, 84.0)

_So we have determined the best alpha value (with the other parameters held fixed), so we can go ahead and look at other things as well, like layer sizes_
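Another way to pick alpha, sketched here as an option rather than what this notebook did, is cross-validation with `GridSearchCV`, so the choice does not depend on a single test split. The data below is synthetic (`X_demo`, `y_demo` are hypothetical), and the grid is a shortened version of the alpha range above. `max_iter = 500` and `cv = 3` are assumptions.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in data: 60 samples, 10 binary features, labels in {-1, 1}
X_demo = rng.integers(0, 2, size=(60, 10)).astype(float)
y_demo = np.where(X_demo[:, 0] + X_demo[:, 1] > 0, 1, -1)

param_grid = {'alpha': np.linspace(1e-6, 0.005, num=5)}

search = GridSearchCV(
    MLPClassifier(solver='lbfgs', hidden_layer_sizes=(5, 2),
                  random_state=1, max_iter=500),
    param_grid,
    cv=3,                 # 3-fold cross-validation
    scoring='accuracy')
search.fit(X_demo, y_demo)

# best alpha and its mean cross-validated accuracy
print(search.best_params_['alpha'], search.best_score_)
```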

## Making a neural net to filter out the mis-translations

There aren't many mis-translations, but they could be more prevalent in later translations

In [198]:
np.sum(Y == 0) # there is only one lollll

1

In [199]:
np.sum(Y == -1) # incorrect usages

11

_there are only 11 negatives, and the data set is biased towards positives, so we should minimize the errors on those!_

this means maximizing the **negative predictive value**

In [200]:
def train_model(solver = 'lbfgs', alpha = 1e-5, layer_sizes = (5,2)):
    '''fit an MLP on the training split and return the fitted model'''
    clf = (MLPClassifier(solver = solver, 
                     alpha = alpha, 
                     hidden_layer_sizes = layer_sizes, 
                     random_state = 1))

    clf.fit(X_train,Y_train)
    return clf

In [201]:
model = train_model(alpha = alpha_train)

In [202]:
predictions = model.predict(X_test)

In [203]:
model.score(X_test,Y_test)

0.84

In [204]:
np.sum(Y_test == 1)/len(Y_test)

0.88

In [205]:
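def get_negative_accuracy(predictions, labels):
    '''return the proportion of predicted negatives (-1) whose true label is also -1'''
    TN, FN = 0,0
    for i,p in enumerate(predictions):
        if p == -1: # predicted a negative
            TN += (p == labels[i]) # correct negative prediction
            FN += (p != labels[i]) # wrong negative prediction
    if TN + FN > 0:
        return negative_predictive_value(TN, FN)
    else: 
        return 'no negatives found'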
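The same quantity can also be computed without a loop. This is a sketch using NumPy boolean masks; `negative_predictive_value_for` is a hypothetical helper, and returning `nan` when nothing is predicted negative is a design choice made here so the result can always be compared as a number.

```python
import numpy as np

def negative_predictive_value_for(predictions, labels, negative_label=-1):
    '''proportion of predicted negatives whose true label is also negative'''
    predictions = np.asarray(predictions)
    labels = np.asarray(labels)
    predicted_negative = predictions == negative_label
    if not predicted_negative.any():
        return float('nan')  # no negative predictions to evaluate
    return float(np.mean(labels[predicted_negative] == negative_label))

# Made-up example: three negative predictions, one of them correct
preds_demo = [1, -1, -1, 1, -1]
labels_demo = [1, -1, 1, 1, 0]
print(negative_predictive_value_for(preds_demo, labels_demo))
```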


In [206]:
# now let's find the best alpha (regularization strength) for getting negatives: 

for a in alpha_values:
    model = train_model(alpha = a)
    preds = model.predict(X_test)
    print(get_negative_accuracy(preds, Y_test))

0.3333333333333333
0.3333333333333333
0.3333333333333333
0.3333333333333333
0.3333333333333333
0.3333333333333333
0.3333333333333333
0.3333333333333333
0.3333333333333333
0.3333333333333333
0.3333333333333333
0.3333333333333333
0.3333333333333333
0.3333333333333333
0.3333333333333333
0.3333333333333333
0.3333333333333333
0.3333333333333333
0.3333333333333333
0.3333333333333333


_seems alpha doesn't affect this, let's try changing the layer sizes?_

In [222]:
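n = 20

# try three hidden layers with decreasing sizes i > j > k
for i in range(8,n):
    for j in range(4, i-2):
        for k in range(2,j-2):
            model = train_model(layer_sizes = (i,j,k))
            preds = model.predict(X_test)
            score = get_negative_accuracy(preds, Y_test)
            # skip the 'no negatives found' string before comparing
            if not isinstance(score, str) and score > 1/3:
                print(f'Layer sizes: {(i,j,k)}, score: {score}')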

ABNORMAL: .

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)


_okay so no layer configuration scored above 1/3, sadly!_
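The warning printed by the search suggests increasing `max_iter` or scaling the data. A sketch of doing both with a scikit-learn pipeline is below. The data is synthetic (`X_demo`, `y_demo` are hypothetical), `max_iter = 1000` is an arbitrary choice, and whether this removes the warning on the real matrix has not been tested here.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data: 80 samples, 10 binary features, labels in {-1, 1}
X_demo = rng.integers(0, 2, size=(80, 10)).astype(float)
y_demo = np.where(X_demo[:, 0] > 0, 1, -1)

# Scale the features first, then fit the MLP with a higher iteration limit
scaled_clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 2),
                  random_state=1, max_iter=1000),
)
scaled_clf.fit(X_demo, y_demo)

print(scaled_clf.score(X_demo, y_demo))  # accuracy on the data it was fitted on
```

Because the scaler sits inside the pipeline, it is fitted only on whatever data is passed to `fit`, and the same scaling is reused in `predict` and `score`.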