# Multi-Layer Perceptron (MLP): Advanced Tutorial

This notebook shows how to use multi-layer perceptrons (MLPs), which are fully connected feed-forward neural networks, for classification on synthetic and real-world data.
We cover:
- Theory and architecture
- Using `sklearn.neural_network.MLPClassifier`
- Training, tuning, and evaluation
- Real-world dataset: handwritten digit recognition
- Activation functions and hidden layers

## 1. Import Required Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.datasets import make_moons, load_digits
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay

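import warnings
from sklearn.exceptions import ConvergenceWarning

# Silence only MLP convergence warnings; other warnings stay visible
warnings.filterwarnings("ignore", category=ConvergenceWarning)
sns.set_theme(style="whitegrid")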


## 2. What is an MLP?

A Multi-Layer Perceptron (MLP) is a feed-forward artificial neural network that consists of:
- An input layer
- One or more hidden layers (fully connected)
- An output layer

Each neuron computes a weighted sum of its inputs plus a bias term, then applies a nonlinear **activation function** such as ReLU or tanh.  
This nonlinearity is what lets MLPs model complex, nonlinear decision boundaries.

We use `MLPClassifier` from `sklearn.neural_network` for classification tasks.
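
To make the "weighted sum followed by an activation" description concrete, here is a minimal NumPy sketch of a forward pass. The helper names (`relu`, `softmax`, `mlp_forward`), the layer sizes, and the random weights are illustrative assumptions, not part of scikit-learn; `MLPClassifier` learns its weights during training rather than drawing them at random.

```python
import numpy as np

def relu(z):
    # ReLU activation: element-wise max(0, z)
    return np.maximum(0.0, z)

def softmax(z):
    # Row-wise softmax; subtracting the row max keeps exp() numerically stable
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mlp_forward(X, weights, biases):
    # Hidden layers: weighted sum plus bias, then ReLU
    a = X
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)
    # Output layer: weighted sum plus bias, then softmax over classes
    return softmax(a @ weights[-1] + biases[-1])

# Hypothetical network: 2 inputs -> 4 hidden units -> 3 classes
rng = np.random.default_rng(0)
sizes = [2, 4, 3]
weights = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

X_demo = rng.normal(size=(5, 2))   # 5 samples, 2 features
probs = mlp_forward(X_demo, weights, biases)
print(probs.shape)                 # one row of class probabilities per sample
print(probs.sum(axis=1))           # each row sums to 1
```

Each row of `probs` is a probability distribution over the three hypothetical classes.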


## 3. Synthetic Data: Two-Class Moons Dataset

In [None]:
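# Generate two interleaving half-moons with Gaussian noise
X, y = make_moons(n_samples=1000, noise=0.2, random_state=42)

# Hold out 30% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)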

plt.figure(figsize=(8, 6))
sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=y, palette="coolwarm")
plt.title("Synthetic Moon-Shaped Dataset")
plt.show()
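
The scaler above is fitted on the training split and only applied to the test split. The short check below is a sketch of what that implies; the variable names (`X_m`, `scaler_check`, and so on) are chosen for this example. Training features end up with mean 0 and standard deviation 1, while the test features are only approximately standardized because they reuse the training statistics.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Same data and split settings as the cell above
X_m, y_m = make_moons(n_samples=1000, noise=0.2, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_m, y_m, test_size=0.3, random_state=42)

scaler_check = StandardScaler().fit(X_tr)   # statistics come from the training split only
X_tr_s = scaler_check.transform(X_tr)
X_te_s = scaler_check.transform(X_te)

train_mean, train_std = X_tr_s.mean(axis=0), X_tr_s.std(axis=0)
test_mean, test_std = X_te_s.mean(axis=0), X_te_s.std(axis=0)

print("train mean/std:", train_mean, train_std)  # mean 0, std 1 up to rounding
print("test mean/std: ", test_mean, test_std)    # close to, but not exactly, 0 and 1
```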


## 4. Train an MLP Classifier

In [None]:
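# One hidden layer of 100 ReLU units; allow up to 1000 training iterations
mlp = MLPClassifier(hidden_layer_sizes=(100,), activation='relu', max_iter=1000, random_state=42)
mlp.fit(X_train_scaled, y_train)

# Predict labels for the held-out test split
y_pred = mlp.predict(X_test_scaled)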

print(classification_report(y_test, y_pred))
ConfusionMatrixDisplay.from_estimator(mlp, X_test_scaled, y_test, cmap='Blues')
plt.title("Confusion Matrix - MLP on Moon Data")
plt.show()
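
One way to see what the hidden layer contributes on this dataset is to compare the MLP with a linear baseline trained on the same split. The sketch below uses `LogisticRegression` as that baseline. This comparison is an addition for illustration, not part of the original workflow, and the exact accuracies depend on the random seed and the scikit-learn version.

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Recreate the moons data and split used above
X_m, y_m = make_moons(n_samples=1000, noise=0.2, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_m, y_m, test_size=0.3, random_state=42)

scaler_demo = StandardScaler().fit(X_tr)
X_tr_s, X_te_s = scaler_demo.transform(X_tr), scaler_demo.transform(X_te)

# Linear baseline: can only draw a straight decision boundary
linear = LogisticRegression().fit(X_tr_s, y_tr)

# MLP with one hidden layer: can bend the boundary around the moons
mlp_moons = MLPClassifier(hidden_layer_sizes=(100,), activation="relu",
                          max_iter=1000, random_state=42).fit(X_tr_s, y_tr)

linear_acc = linear.score(X_te_s, y_te)
mlp_acc = mlp_moons.score(X_te_s, y_te)
print(f"Logistic regression test accuracy: {linear_acc:.3f}")
print(f"MLP test accuracy:                 {mlp_acc:.3f}")
```

Because the two moons interleave, a straight line cannot separate them cleanly, so the MLP would be expected to score higher here.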


## 5. Real Dataset: Digits Classification

In [None]:
# 1,797 grayscale images of handwritten digits (8x8 pixels), flattened to 64 features
digits = load_digits()
X_digits = digits.data
y_digits = digits.target

X_train_d, X_test_d, y_train_d, y_test_d = train_test_split(X_digits, y_digits, test_size=0.3, random_state=42)

scaler_d = StandardScaler()
X_train_d_scaled = scaler_d.fit_transform(X_train_d)
X_test_d_scaled = scaler_d.transform(X_test_d)

mlp_digits = MLPClassifier(hidden_layer_sizes=(100,), activation='relu', max_iter=1000, random_state=42)
mlp_digits.fit(X_train_d_scaled, y_train_d)

y_pred_d = mlp_digits.predict(X_test_d_scaled)
print(classification_report(y_test_d, y_pred_d))
ConfusionMatrixDisplay.from_estimator(mlp_digits, X_test_d_scaled, y_test_d, cmap='Purples')
plt.title("Confusion Matrix - Digits MLP")
plt.show()
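
After fitting, the network's structure can be read off the estimator's attributes. The sketch below retrains the same configuration so that it runs on its own (the names `clf_d` and `layer_shapes` are chosen for this example) and inspects `coefs_`, `n_layers_`, `n_iter_`, `out_activation_`, and `loss_curve_`.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Same data, split, and model settings as the cell above
X_dg, y_dg = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_dg, y_dg, test_size=0.3, random_state=42)
X_tr_s = StandardScaler().fit_transform(X_tr)

clf_d = MLPClassifier(hidden_layer_sizes=(100,), activation="relu",
                      max_iter=1000, random_state=42).fit(X_tr_s, y_tr)

# One weight matrix per pair of adjacent layers: input->hidden, hidden->output
layer_shapes = [W.shape for W in clf_d.coefs_]
print("Weight shapes:", layer_shapes)

print("Layers (incl. input and output):", clf_d.n_layers_)
print("Output activation:", clf_d.out_activation_)
print("Iterations run:", clf_d.n_iter_)
print("Final training loss:", clf_d.loss_curve_[-1])
```

The 64 input features and 10 digit classes fix the outer dimensions of the weight matrices, and `hidden_layer_sizes` sets the inner one.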


## 6. Hyperparameter Tuning

In [None]:
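# Candidate architectures, activations, and L2 penalty strengths (alpha)
param_grid = {
    'hidden_layer_sizes': [(50,), (100,), (50, 50), (100, 50)],
    'activation': ['relu', 'tanh'],
    'alpha': [0.0001, 0.001],
}

# 3-fold cross-validation over all 16 combinations, using every available CPU core
grid_search = GridSearchCV(MLPClassifier(max_iter=1000, random_state=42), param_grid, cv=3, n_jobs=-1)
grid_search.fit(X_train_d_scaled, y_train_d)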

print("Best parameters:", grid_search.best_params_)
print("Best cross-val score:", grid_search.best_score_)
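
The search above runs on data that was scaled before cross-validation, so each validation fold has already influenced the scaler's statistics. The effect is probably small on this dataset, but a common way to avoid it is to put the scaler and the classifier in a `Pipeline` and search over that. The sketch below assumes a reduced grid to keep the run short, and it also scores the refitted best model on the held-out test split, a step the cell above leaves out.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_dg, y_dg = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_dg, y_dg, test_size=0.3, random_state=42)

# Scaling happens inside the pipeline, so it is refit on each training fold
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("mlp", MLPClassifier(max_iter=1000, random_state=42)),
])

# Reduced grid (an assumption for speed); step-name prefixes route params to the MLP
pipe_grid = {
    "mlp__hidden_layer_sizes": [(50,), (100,)],
    "mlp__alpha": [0.0001, 0.001],
}

search = GridSearchCV(pipe, pipe_grid, cv=3)
search.fit(X_tr, y_tr)   # raw features go in; the pipeline scales them

# GridSearchCV refits the best configuration on the full training split by default
test_acc = search.score(X_te, y_te)
print("Best parameters:", search.best_params_)
print("Best cross-val score:", search.best_score_)
print("Held-out test accuracy:", test_acc)
```

Reporting the test accuracy separately matters because `best_score_` comes from the same folds that were used to pick the hyperparameters.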


## 7. Summary

- MLPs model nonlinear patterns by stacking hidden layers with nonlinear activation functions
- MLP training is sensitive to feature scale, so standardize inputs before fitting
- `GridSearchCV` helps tune hidden-layer depth and width, the activation function, and the L2 penalty `alpha`
- MLPs are a flexible option for classification on structured (tabular) data