## Model Pipeline 
### When to use multilayer perceptron 
A Multilayer Perceptron (MLP) is useful when you have complex data and need a model that can learn non-linear relationships. 


In [1]:
import pandas as pd
import sklearn.model_selection
import sklearn.metrics

### Read the file

In [3]:
df = pd.read_csv("mnist.csv")
df.head()

Unnamed: 0,id,class,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,...,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783,pixel784
0,31953,5,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,34452,8,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,60897,5,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,36953,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,1981,3,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### Data Exploration, Checking for missing values

In [41]:
df.isna().sum()

id          0
class       0
pixel1      0
pixel2      0
pixel3      0
           ..
pixel780    0
pixel781    0
pixel782    0
pixel783    0
pixel784    0
Length: 786, dtype: int64

### Confirmimg data info

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4000 entries, 0 to 3999
Columns: 786 entries, id to pixel784
dtypes: int64(786)
memory usage: 24.0 MB


### Data Cleaning, Drop ID column 
If we proceed with the ID column this will introduce noice to the data set, the ID numbers have no significant weight to the final output

In [None]:
df.drop('id', axis=1)

### Split data into training and test sets

In [17]:
x = df.drop('class', axis=1)
y = df['class']

In [18]:
x_train,x_test,y_train,y_test = sklearn.model_selection.train_test_split(x,y)

### 

In [38]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier

pipeline = Pipeline([('scaler', StandardScaler()),
                     ('mlp', MLPClassifier(
                         hidden_layer_sizes=(20,20),
                         activation='relu',
                         solver='adam',
                         learning_rate_init=0.001,
                         max_iter= 1000,
                         early_stopping=True,
                         random_state=42))
                    ])

model_1 = pipeline.fit(x_train,y_train)

In [39]:
accuracy = pipeline.score(x_train, y_train)
print(f"Accuracy: {accuracy * 100:.2f}%")

Accuracy: 97.70%


In [40]:
y_predict = model_1.predict(x_test)
accuracy = sklearn.metrics.accuracy_score(y_test, y_predict)
print("Test accuracy:", accuracy)

Test accuracy: 0.895
