# Perceptrons
You should build an end-to-end machine learning pipeline using a perceptron model. In particular, you should do the following:
- Load the `mnist` dataset using [Pandas](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html). You can find this dataset in the datasets folder.
- Split the dataset into training and test sets using [Scikit-Learn](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html).
- Build an end-to-end machine learning pipeline, including a [perceptron](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Perceptron.html) model.
- Optimize your pipeline by validating your design decisions.
- Test the best pipeline on the test set and report various [evaluation metrics](https://scikit-learn.org/stable/modules/model_evaluation.html).
- Check the documentation to identify the most important hyperparameters, attributes, and methods of the model. Use them in practice.

In [5]:
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Perceptron
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report, accuracy_score


In [None]:
# Load dataset from the provided link
df = pd.read_csv('https://raw.githubusercontent.com/m-mahdavi/teaching/refs/heads/main/datasets/mnist.csv')

# Separate features and target
X = df.iloc[:, 1:].values  # Features (pixel values)
y = df.iloc[:, 0].values   # Target (digit labels)

In [None]:
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
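
One way to validate the `stratify=y` choice is to compare the class proportions of the two splits. The sketch below is illustrative only: it assumes scikit-learn's bundled `load_digits` data as a stand-in for `mnist.csv`, and the names `X_demo`, `y_demo`, and `class_shares` are hypothetical helpers, not part of the assignment.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): scikit-learn's bundled 8x8 digits, not the notebook's mnist.csv
X_demo, y_demo = load_digits(return_X_y=True)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

def class_shares(labels):
    """Return the fraction of samples per digit class, ordered by label 0..9."""
    counts = np.bincount(labels, minlength=10)
    return counts / counts.sum()

# With stratify=y, the train and test class shares should be nearly identical
max_gap = np.abs(class_shares(y_tr) - class_shares(y_te)).max()
print(f"Train size: {len(y_tr)}, test size: {len(y_te)}")
print(f"Largest per-class share difference: {max_gap:.4f}")
```

A small gap suggests that both splits see roughly the same digit distribution, so test accuracy is less likely to be skewed by an unlucky split.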

In [None]:
# Create a pipeline with scaling and Perceptron model
pipeline = Pipeline([
    ('scaler', StandardScaler()),  # Standardize the features
    ('perceptron', Perceptron(random_state=42))  # Perceptron classifier
])

In [None]:
# Hyperparameter grid for tuning the Perceptron step of the pipeline
param_grid = {
    'perceptron__penalty': [None, 'l2', 'l1', 'elasticnet'],  # Regularization type
    'perceptron__alpha': [0.0001, 0.001, 0.01],  # Regularization strength (no effect when penalty is None)
    'perceptron__max_iter': [1000, 2000],  # Maximum number of passes over the training data (epochs)
    'perceptron__eta0': [1.0, 0.1, 0.01]  # Constant that multiplies each update (learning rate)
}

# Initialize GridSearchCV
grid = GridSearchCV(pipeline, param_grid, cv=3, scoring='accuracy', n_jobs=-1)

# Fit GridSearchCV
grid.fit(X_train, y_train)

# Output the best parameters found
print("Best parameters:", grid.best_params_)
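
Beyond `best_params_`, the fitted search object exposes `cv_results_`, which can help validate design decisions by showing how close the candidates are to each other. The following is a minimal sketch, assuming the bundled `load_digits` data as a stand-in for `mnist.csv` and a deliberately tiny grid so it runs quickly; `demo_pipeline`, `demo_grid`, and `summary` are hypothetical names.

```python
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.linear_model import Perceptron
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): scikit-learn's bundled digits, not the notebook's mnist.csv
X_demo, y_demo = load_digits(return_X_y=True)

demo_pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("perceptron", Perceptron(random_state=42)),
])

# Small illustrative grid: 2 penalties x 2 learning rates = 4 candidates
demo_grid = GridSearchCV(
    demo_pipeline,
    {"perceptron__penalty": [None, "l2"], "perceptron__eta0": [1.0, 0.1]},
    cv=3,
    scoring="accuracy",
)
demo_grid.fit(X_demo, y_demo)

# cv_results_ is a dict of arrays; a DataFrame makes it easy to sort and compare
results = pd.DataFrame(demo_grid.cv_results_)
summary = results[
    ["param_perceptron__penalty", "param_perceptron__eta0",
     "mean_test_score", "std_test_score", "rank_test_score"]
].sort_values("rank_test_score")
print(summary.to_string(index=False))
```

If the top candidates differ by less than their `std_test_score`, the choice between them is probably not significant, and the simpler configuration may be preferable.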

In [None]:
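# Predict on the test set; GridSearchCV delegates to the best pipeline,
# which was refit on the full training set (refit=True is the default)
y_pred = grid.predict(X_test)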

# Calculate accuracy
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")

# Classification Report
print("Classification Report:")
print(classification_report(y_test, y_pred))
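
The task list also asks for further evaluation metrics and for the model's attributes and methods to be used in practice. The sketch below shows one possible way to do that, assuming the bundled `load_digits` data as a stand-in for `mnist.csv` and a default, untuned `Perceptron`; all `demo`-style names are hypothetical.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import Perceptron
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): scikit-learn's bundled digits, not the notebook's mnist.csv
X_demo, y_demo = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

demo_pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("perceptron", Perceptron(random_state=42)),
]).fit(X_tr, y_tr)

y_hat = demo_pipeline.predict(X_te)

# Confusion matrix: rows are true digits, columns are predicted digits
cm = confusion_matrix(y_te, y_hat)
print(cm)
print(f"Macro F1: {f1_score(y_te, y_hat, average='macro'):.4f}")

# Fitted attributes of the Perceptron step
model = demo_pipeline.named_steps["perceptron"]
print("classes_:", model.classes_)
print("coef_ shape:", model.coef_.shape)  # (n_classes, n_features)
print("n_iter_:", model.n_iter_)          # epochs actually run before stopping

# decision_function returns one score per class; predict picks the largest
scores = demo_pipeline.decision_function(X_te[:1])
print("scores shape:", scores.shape)
```

The off-diagonal cells of the confusion matrix show which digits are confused with each other, which the single accuracy figure hides.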