## Week 3 (13/05/24)

# Perceptrons
You should build an end-to-end machine learning pipeline using a perceptron model. In particular, you should do the following:
- Load the `mnist` dataset using [Pandas](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html). You can find this dataset in the datasets folder.
- Split the dataset into training and test sets using [Scikit-Learn](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html).
- Build an end-to-end machine learning pipeline, including a [perceptron](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Perceptron.html) model.
- Optimize your pipeline by validating your design decisions.
- Test the best pipeline on the test set and report various [evaluation metrics](https://scikit-learn.org/0.15/modules/model_evaluation.html).  
- Check the documentation to identify the most important hyperparameters, attributes, and methods of the model. Use them in practice.
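
The steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the expected solution: scikit-learn's bundled `load_digits` data (8x8 images) stands in for `mnist.csv` so the block runs without the dataset, and the hyperparameter values (`penalty=None`, `max_iter=1000`, `tol=1e-3`, `random_state=0`) are illustrative choices.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: load_digits (8x8 images, 10 classes) stands in for mnist.csv.
X, y = load_digits(return_X_y=True)

# Hold out 25% of the rows for the final test; stratify keeps the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale the pixels, then fit a perceptron (hyperparameter values are illustrative).
pipeline = make_pipeline(
    StandardScaler(),
    Perceptron(penalty=None, max_iter=1000, tol=1e-3, random_state=0),
)
pipeline.fit(X_train, y_train)

# Fitted attributes: one weight vector and one bias per class (one-vs-rest).
perceptron = pipeline[-1]
print("coef_ shape:", perceptron.coef_.shape)
print("intercept_ shape:", perceptron.intercept_.shape)
print("epochs run (n_iter_):", perceptron.n_iter_)

# Method in practice: predict on the held-out rows and score the result.
test_accuracy = accuracy_score(y_test, pipeline.predict(X_test))
print("test accuracy:", round(test_accuracy, 3))
```

The attributes `coef_`, `intercept_`, and `n_iter_` are only available after `fit`, which is why the sketch reads them from the fitted last step of the pipeline.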

In [None]:
import pandas as pd
from sklearn.linear_model import Perceptron, SGDClassifier
from sklearn.metrics import accuracy_score, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

In [None]:
# Load the MNIST sample and preview the first 10 rows.
df = pd.read_csv('https://raw.githubusercontent.com/m-mahdavi/teaching/main/datasets/mnist.csv')
df.head(10)

Unnamed: 0,id,class,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,...,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783,pixel784
0,31953,5,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,34452,8,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,60897,5,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,36953,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,1981,3,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,61207,2,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,33799,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,5414,5,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,61377,6,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,1875,9,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [None]:
# train_test_split returns the training set first, then the test set
# (default proportions: 75% train, 25% test).
df_train, df_test = train_test_split(df)

print('df size: ', df.shape)
print('df_train size: ', df_train.shape)
print('df_test size: ', df_test.shape)

df size:  (4000, 786)
df_train size:  (3000, 786)
df_test size:  (1000, 786)


In [None]:
# Separate the pixel features from the label; 'id' is only a row identifier.
x_train = df_train.drop(['id', 'class'], axis=1)
y_train = df_train['class']

x_test = df_test.drop(['id', 'class'], axis=1)
y_test = df_test['class']

# Sanity-check the shapes of the feature matrices and label vectors.
print('x_train size: ', x_train.shape)
print('y_train size: ', y_train.shape)

print('x_test size: ', x_test.shape)
print('y_test size: ', y_test.shape)

x_train size:  (3000, 784)
y_train size:  (3000,)
x_test size:  (1000, 784)
y_test size:  (1000,)
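
The split above relies on the defaults of `train_test_split`. A seeded, stratified variant might look like the sketch below. The `toy` frame is a hypothetical stand-in for the `mnist` frame (same `id` / `class` / `pixel…` layout, far fewer columns), and `test_size=0.25`, `random_state=0`, and stratifying on `class` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the mnist frame: an id, a class label, 4 "pixels".
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    rng.integers(0, 256, size=(200, 4)),
    columns=[f"pixel{i}" for i in range(1, 5)],
)
toy.insert(0, "class", np.arange(200) % 10)  # 20 rows per class
toy.insert(0, "id", np.arange(200))

# train_test_split returns the training part first, then the test part.
# stratify keeps the class proportions equal in both parts;
# random_state makes the shuffle reproducible.
toy_train, toy_test = train_test_split(
    toy, test_size=0.25, stratify=toy["class"], random_state=0
)

print("train:", toy_train.shape, "test:", toy_test.shape)
print(toy_test["class"].value_counts().sort_index().tolist())
```

Stratifying matters most when some digits are rare: without it, a small test set can end up with very few examples of a class, which makes per-class metrics noisy.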


In [None]:
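# Hyperparameter grid for SGDClassifier; loss='perceptron' gives the perceptron update rule.
param_grid = {
    'loss': ['hinge', 'perceptron', 'log_loss'],
    'learning_rate': ['constant', 'optimal', 'invscaling', 'adaptive'],
    # 'constant', 'invscaling' and 'adaptive' use eta0 as the initial learning
    # rate and need it to be positive; 0.01 is an assumed starting value.
    'eta0': [0.01],
    'max_iter': [100, 1000, 10000],
    'penalty': ['l2', 'l1', 'elasticnet'],
}

# Cross-validated grid search over the learning rate schedule, epochs, and penalty.
Tunning_SGD = GridSearchCV(estimator=SGDClassifier(), param_grid=param_grid)

# Standardize the pixel features, then run the grid search on the scaled data.
clf = make_pipeline(StandardScaler(), Tunning_SGD)
clf.fit(x_train, y_train)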
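
Once a search has been fitted, the remaining tasks (inspecting the winning configuration and reporting test metrics) could be sketched as below. This is an illustration under assumptions rather than the notebook's own evaluation: `load_digits` stands in for `mnist.csv`, the grid is deliberately tiny, and the search wraps the whole pipeline (so the scaler is refitted inside each fold), which differs from the cell above.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: load_digits stands in for mnist.csv; the grid is deliberately tiny.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Step names from make_pipeline are the lowercased class names,
# so SGDClassifier parameters are addressed as 'sgdclassifier__<name>'.
search = GridSearchCV(
    make_pipeline(StandardScaler(), SGDClassifier(random_state=0)),
    param_grid={
        "sgdclassifier__loss": ["hinge", "perceptron"],
        "sgdclassifier__penalty": ["l2", "l1"],
    },
    cv=3,
)
search.fit(X_train, y_train)

# Validation results: the best configuration and its mean cross-validated accuracy.
print("best params:", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))

# Final check on the held-out test set, using the refitted best pipeline.
y_pred = search.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print("test accuracy:", round(test_accuracy, 3))
print(cm)
print(classification_report(y_test, y_pred))
```

Accuracy, the confusion matrix, and the per-class precision/recall in `classification_report` are classification metrics; regression metrics such as mean squared error are not meaningful for digit labels.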