# DEEP LEARNING

# Multi-Layer Perceptron in Sklearn

## Objectives
1. Understand **Feature Scaling** and its importance in deep learning.
2. Learn **Model Selection** strategies for tuning an MLP model.
3. Implement an MLP classifier using **Scikit-Learn**.

In [1]:
# Import necessary libraries
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

In [2]:
# Load the Breast Cancer dataset
cancer_data = load_breast_cancer()
X, y = cancer_data.data, cancer_data.target

In [3]:
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
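
This is a quick sanity check of the data and the split, written as a minimal sketch. The Breast Cancer dataset has 569 samples with 30 numeric features, and `test_size=0.2` leaves 114 test samples. That count matches the `support` total in the classification report. The `stratify=y` variant at the end is not what this notebook uses. It is shown only as an option that keeps the class ratio equal in both splits.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X, y = data.data, data.target

print(X.shape)            # (569, 30): samples x features
print(data.target_names)  # ['malignant' 'benign']: label 0 = malignant, 1 = benign
print(np.bincount(y))     # [212 357]: samples per class

# Same split as the notebook
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)  # (455, 30) (114, 30)

# Variant not used in this notebook: preserve the class ratio in both splits
X_tr_s, X_te_s, y_tr_s, y_te_s = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
print(np.bincount(y_te_s))
```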

In [4]:
# Standardize features by removing the mean and scaling to unit variance
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
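
Here is what "removing the mean and scaling to unit variance" means concretely. `StandardScaler` learns a per-feature mean μ and standard deviation σ from the training split. It then maps each value to z = (x - μ) / σ. The sketch below recomputes this by hand with NumPy and checks the result against a fitted scaler. It assumes the same split as above.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

scaler = StandardScaler().fit(X_train)

# Statistics come from the training split only
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)  # population std (ddof=0), matching StandardScaler

manual_train = (X_train - mu) / sigma
manual_test = (X_test - mu) / sigma  # test data reuses the training mu and sigma

print(np.allclose(manual_train, scaler.transform(X_train)))
print(np.allclose(manual_test, scaler.transform(X_test)))

# Scaled training features have mean ~0 and std ~1 (test features only approximately)
print(np.allclose(manual_train.mean(axis=0), 0), np.allclose(manual_train.std(axis=0), 1))
```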

# Feature Scaling Analysis

### Why is Feature Scaling Important?
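Feature scaling puts all features on a comparable numerical range, so that no feature dominates learning merely because of its units. In this dataset, for example, `mean area` takes values in the hundreds to thousands, while `mean fractal dimension` stays below 0.1. `StandardScaler` transforms each feature to zero mean and unit variance, using statistics computed on the training set only. Neural networks are sensitive to feature magnitudes, and unscaled data can lead to:
- **Slower or unstable convergence**: Features with large values produce large gradients for their weights, so a single step size can be too large for some weights and too small for others.
- **Suboptimal performance**: Features with larger numerical ranges can dominate the weighted sums in the first layer and drown out informative small-scale features.
- **Saturation and vanishing or exploding gradients**: Large inputs push `logistic` and `tanh` units into their flat regions, where gradients are close to zero. They can also produce very large gradients. Both effects prevent the network from learning effectively.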
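
These claims can be checked directly. The following sketch assumes the notebook's split and model settings. It trains the same `MLPClassifier` once on raw features and once inside a `make_pipeline(StandardScaler(), ...)` pipeline, then reports test accuracy and the number of epochs each run took. No particular numbers are claimed here; run it to see the effect on this dataset. The pipeline form also guarantees that the scaler is fit on training data only.

```python
import warnings
from sklearn.datasets import load_breast_cancer
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# The unscaled run may hit max_iter before converging; hide that warning for readability
warnings.filterwarnings("ignore", category=ConvergenceWarning)

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

def make_mlp():
    # Same architecture and seed as the notebook's model
    return MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=42)

unscaled = make_mlp().fit(X_train, y_train)                          # raw features
scaled = make_pipeline(StandardScaler(), make_mlp()).fit(X_train, y_train)

results = {
    "unscaled": unscaled.score(X_test, y_test),
    "scaled": scaled.score(X_test, y_test),
}
for name, acc in results.items():
    print(f"{name:>8}: test accuracy = {acc:.3f}")

# Epochs each optimizer ran before stopping (capped at max_iter)
print("unscaled epochs:", unscaled.n_iter_)
print("scaled epochs:  ", scaled[-1].n_iter_)
```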

In [5]:
# Create an MLPClassifier model
mlp = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=42)

# Model Selection Analysis

### Key Hyperparameters:
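1. **hidden_layer_sizes**: Sets the number and size of the hidden layers. The i-th entry is the number of neurons in the i-th hidden layer (e.g., `(10, 5)` means two hidden layers with 10 and 5 neurons). The model above uses `(64, 32)`.
2. **max_iter**: Maximum number of training iterations (epochs, for the `adam` and `sgd` solvers). With these solvers, training stops earlier once the loss fails to improve by at least `tol` for `n_iter_no_change` consecutive epochs. Raising `max_iter` helps only when the optimizer hits the limit before converging.
3. **activation**: Activation function of the hidden layers: `identity`, `logistic`, `tanh`, or `relu` (the default).
4. **solver**: Optimization algorithm: `adam` (the default), `sgd`, or `lbfgs`.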
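
The sketch below shows how `hidden_layer_sizes` maps to the network's shape. It sets all four hyperparameters explicitly (the values are illustrative, not tuned) and inspects the fitted weight matrices. With 30 input features and a binary target, scikit-learn uses a single logistic output unit. So `(10, 5)` yields weight matrices of shape 30×10, 10×5, and 5×1.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Illustrative (untuned) values for the four hyperparameters discussed above
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(
        hidden_layer_sizes=(10, 5),  # two hidden layers: 10 and 5 neurons
        activation="tanh",           # "identity", "logistic", "tanh" or "relu"
        solver="adam",               # "lbfgs", "sgd" or "adam"
        max_iter=500,                # epoch budget for adam/sgd
        random_state=42,
    ),
)
model.fit(X_train, y_train)
fitted = model[-1]  # the MLPClassifier step of the pipeline

# One weight matrix per connection between consecutive layers: 30 -> 10 -> 5 -> 1
print([w.shape for w in fitted.coefs_])  # [(30, 10), (10, 5), (5, 1)]
print(fitted.n_layers_)                  # 4 (input, two hidden, output)
print(fitted.out_activation_)            # logistic (binary target)
print(fitted.n_iter_, "epochs run")
```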

In [6]:
# Train the model on the training data
mlp.fit(X_train, y_train)

# Make predictions on the test data
y_pred = mlp.predict(X_test)

# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

Accuracy: 0.97
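
Because `max_iter` is only an upper bound, it is worth checking whether training actually converged. The sketch below reproduces the model above. `n_iter_` is the number of epochs the `adam` solver ran, and `loss_curve_` records the training loss after each epoch. If `n_iter_` reaches `max_iter`, scikit-learn also emits a `ConvergenceWarning`.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

mlp = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=42)
mlp.fit(X_train, y_train)

# Stopping before max_iter means the tol / n_iter_no_change criterion ended training
print(f"epochs run: {mlp.n_iter_} of max_iter={mlp.max_iter}")
print("stopped before max_iter:", mlp.n_iter_ < mlp.max_iter)

# loss_curve_ has one training-loss value per epoch (adam and sgd solvers only)
print(f"loss after first epoch: {mlp.loss_curve_[0]:.4f}")
print(f"loss after last epoch:  {mlp.loss_curve_[-1]:.4f}")
```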


In [7]:
# Generate a classification report
class_report = classification_report(y_test, y_pred)
print("Classification Report:\n", class_report)


Classification Report:
               precision    recall  f1-score   support

           0       0.98      0.95      0.96        43
           1       0.97      0.99      0.98        71

    accuracy                           0.97       114
   macro avg       0.97      0.97      0.97       114
weighted avg       0.97      0.97      0.97       114
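
`confusion_matrix` is imported in the first cell but never used. The sketch below, which reproduces the model above, shows how it complements the report. Rows are true labels and columns are predicted labels. In this dataset, label 0 is malignant and label 1 is benign. The off-diagonal cells therefore show which kind of error the model makes. For example, `cm[0, 1]` counts malignant cases predicted as benign.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

mlp = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=42)
mlp.fit(X_train, y_train)
y_pred = mlp.predict(X_test)

cm = confusion_matrix(y_test, y_pred)  # rows: true label, columns: predicted label
print(cm)

# Label each cell with the class names
for i, true_name in enumerate(data.target_names):
    for j, pred_name in enumerate(data.target_names):
        print(f"true={true_name:<9} predicted={pred_name:<9} count={cm[i, j]}")

# The diagonal holds correct predictions, so trace / total equals accuracy
print("accuracy from matrix:", np.trace(cm) / cm.sum())
```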



# Remarks, Comments, and Questions

### Observations:
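1. **Feature Scaling Improves Performance**: The model performed significantly worse without feature scaling. The unscaled run is not shown in the cells above.
2. **Model Selection Matters**: Increasing the number of hidden layers improved accuracy slightly; these comparison runs are also not shown. With only 114 test samples, a single prediction changes accuracy by about 0.9 percentage points, so small differences should be confirmed with cross-validation.
3. **Computational Cost**: More layers and neurons increased training time.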
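
Observations 2 and 3 can be reproduced with a small loop. The sketch below assumes the notebook's split. The three architectures are illustrative choices, not necessarily the ones behind the observations. It times each fit with `time.perf_counter` and scores each model on the test set. Timings vary between machines and runs.

```python
import time
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

results = []
# Illustrative architectures with one, two and three hidden layers
for hidden in [(32,), (64, 32), (64, 32, 16)]:
    model = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=hidden, max_iter=1000, random_state=42),
    )
    start = time.perf_counter()
    model.fit(X_train, y_train)
    elapsed = time.perf_counter() - start  # wall-clock fit time in seconds
    acc = model.score(X_test, y_test)
    results.append((hidden, acc, elapsed))
    print(f"{str(hidden):<14} test accuracy={acc:.3f}  "
          f"fit time={elapsed:.2f}s  epochs={model[-1].n_iter_}")
```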

### Questions:
1. How can we determine the optimal number of hidden layers?
2. How does the choice of activation functions impact performance?
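3. What other techniques (e.g., dropout, batch normalization) could further improve the model? Scikit-learn's `MLPClassifier` supports neither. Its built-in regularization options are the L2 penalty `alpha` and `early_stopping`.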
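
`MLPClassifier` has no dropout or batch normalization layers. The sketch below therefore tries the regularization options it does provide. The first is the L2 penalty `alpha` (default `0.0001`). The second is `early_stopping`, which holds out `validation_fraction` of the training data and stops when the validation score stops improving. The comparison uses five-fold cross-validation on the training split rather than the single test split, which is also one way to approach question 1. The `alpha` values are an arbitrary illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

cv_means = {}
for alpha in [1e-4, 1e-2, 1.0]:  # illustrative L2 penalty strengths
    model = make_pipeline(
        StandardScaler(),
        MLPClassifier(
            hidden_layer_sizes=(64, 32),
            alpha=alpha,
            early_stopping=True,      # stop on a held-out validation score
            validation_fraction=0.1,  # 10% of each training fold used for validation
            max_iter=1000,
            random_state=42,
        ),
    )
    # 5-fold cross-validation on the training split only (stratified for classifiers)
    scores = cross_val_score(model, X_train, y_train, cv=5)
    cv_means[alpha] = scores.mean()
    print(f"alpha={alpha:g}: mean CV accuracy = {scores.mean():.3f} (+/- {scores.std():.3f})")
```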

### Next Steps:
- Experiment with different activation functions (`relu`, `tanh`).
- Try different solvers (`adam`, `sgd`).
- Implement a **grid search** for hyperparameter tuning.
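
Here is a minimal grid-search sketch for these next steps, assuming the notebook's split. `StandardScaler` sits inside a `Pipeline`, so it is refit on each cross-validation training fold and validation folds never leak into the scaling statistics. The grid covers the activations and solvers listed above plus two illustrative architectures. That gives 8 combinations × 5 folds = 40 fits plus a final refit, so expect it to take a while. Passing `n_jobs=-1` to `GridSearchCV` parallelizes the fits.

```python
import warnings
from sklearn.datasets import load_breast_cancer
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Some sgd runs may not converge within max_iter; hide those warnings
warnings.filterwarnings("ignore", category=ConvergenceWarning)

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("mlp", MLPClassifier(max_iter=1000, random_state=42)),
])

# Parameter names use the "<step name>__<parameter>" convention
param_grid = {
    "mlp__hidden_layer_sizes": [(32,), (64, 32)],
    "mlp__activation": ["relu", "tanh"],
    "mlp__solver": ["adam", "sgd"],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)  # cross-validates on the training split only

print("best parameters:", search.best_params_)
print(f"best mean CV accuracy: {search.best_score_:.3f}")
# The refit best pipeline is scored once on the untouched test set
print(f"held-out test accuracy: {search.score(X_test, y_test):.3f}")
```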
