# Tutorial 2: Multi-Layer Perceptron

**Step 1: Importing the Necessary Libraries and Packages**

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report, accuracy_score

**Step 2: Loading and Splitting the Dataset**

In [None]:
# Load the Iris dataset
iris = load_iris()
X = iris.data # Features
y = iris.target # Labels

# Split the data into training and testing sets (70% training, 30% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
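The split above is random rather than stratified, so the three species need not be equally represented among the 45 test samples. The sketch below is an optional check, not part of the tutorial's pipeline. It inspects the dataset and compares test-set class counts with and without stratify=y, a real train_test_split parameter that this tutorial does not use. Variable names such as y_test_strat are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the data and check its shape and class balance
iris = load_iris()
X, y = iris.data, iris.target
print("Feature matrix shape:", X.shape)          # (samples, features)
print("Class names:", list(iris.target_names))
print("Samples per class:", np.bincount(y))

# The tutorial's split: random, not stratified
_, _, _, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Illustrative alternative: stratify on y so each class keeps its share in the test set
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print("Test-set class counts (random):    ", np.bincount(y_test, minlength=3))
print("Test-set class counts (stratified):", np.bincount(y_test_strat, minlength=3))
```

Iris has 50 samples per class, so a stratified 30% split puts 15 of each species in the test set.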

**Step 3: Data Scaling**

In [None]:
# Standardize the features to have a mean of 0 and a standard deviation of 1
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train) # Fit on training data and transform
X_test_scaled = scaler.transform(X_test) # Transform test data (using same scaling)
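StandardScaler learns one mean and one standard deviation per feature from the training data (stored in its mean_ and scale_ attributes) and applies z = (x - mean) / std to whatever is passed to transform. A minimal check, assuming the same split as above:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Per-feature statistics learned from the training data only
mu, sigma = scaler.mean_, scaler.scale_

# Manual z-score of the test data using the training statistics
X_test_manual = (X_test - mu) / sigma

print("Train means after scaling:", X_train_scaled.mean(axis=0).round(6))
print("Train stds after scaling: ", X_train_scaled.std(axis=0).round(6))
print("Manual transform matches scaler:", np.allclose(X_test_manual, X_test_scaled))
print("Test means after scaling: ", X_test_scaled.mean(axis=0).round(3))
```

The test-set means come out close to, but not exactly, 0 because the test data are scaled with the training statistics. That is intended: fitting the scaler on the test set would leak information from it into the model.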

**Step 4: Creating and Training the MLP Classifier**

In [None]:
# Create an MLP classifier with two hidden layers of 10 neurons each
mlp = MLPClassifier(hidden_layer_sizes=(10, 10), max_iter=1000, random_state=42, learning_rate_init=0.001)

# Train the MLP classifier on the scaled training data
mlp.fit(X_train_scaled, y_train)
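Each layer of the trained network computes an affine map followed by an activation. The sketch below reproduces MLPClassifier.predict_proba by hand from the fitted coefs_ (weight matrices) and intercepts_ (bias vectors). It assumes the default relu hidden activation and the softmax output that scikit-learn uses for multiclass problems. The helper forward is hypothetical, written only for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

mlp = MLPClassifier(hidden_layer_sizes=(10, 10), max_iter=1000, random_state=42, learning_rate_init=0.001)
mlp.fit(X_train_scaled, y_train)

def forward(X, weights, biases):
    """Hypothetical helper: manual forward pass with ReLU hidden layers and a softmax output."""
    a = X
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.maximum(0.0, a @ W + b)       # hidden layer: affine map, then ReLU
    z = a @ weights[-1] + biases[-1]         # output logits, one column per class
    z = z - z.max(axis=1, keepdims=True)     # subtract the row maximum for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)  # softmax: each row sums to 1

proba_manual = forward(X_test_scaled, mlp.coefs_, mlp.intercepts_)
print("Matches predict_proba:", np.allclose(proba_manual, mlp.predict_proba(X_test_scaled)))
print("Same predicted labels:", np.array_equal(proba_manual.argmax(axis=1), mlp.predict(X_test_scaled)))
```

Because the Iris labels are 0, 1 and 2, the argmax of each probability row is directly the predicted class.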

**Step 5: Making Predictions and Evaluating the Model**

In [None]:
# Predict the test set results
y_pred = mlp.predict(X_test_scaled)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

# Print detailed classification report
print("Classification Report:\n", classification_report(y_test, y_pred))
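The classification report summarises per-class precision and recall. A confusion matrix shows exactly which species are confused with which. The short sketch below uses sklearn.metrics.confusion_matrix, which the original tutorial does not use, and recovers accuracy as the fraction of counts on the diagonal. With 45 test samples, each misclassified flower lowers accuracy by about 0.02.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

mlp = MLPClassifier(hidden_layer_sizes=(10, 10), max_iter=1000, random_state=42, learning_rate_init=0.001)
mlp.fit(X_train_scaled, y_train)
y_pred = mlp.predict(X_test_scaled)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print("Classes:", list(iris.target_names))
print(cm)

# Correct predictions sit on the diagonal
acc_from_cm = np.trace(cm) / cm.sum()
print(f"Accuracy from confusion matrix: {acc_from_cm:.2f}")
```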

**Step 6: Displaying the MLP Structure and Training Information**

In [None]:
# Display the structure of the MLP classifier
print("\nMLP Structure:")
print(f"Number of layers (input + hidden + output): {mlp.n_layers_}")
print(f"Number of outputs: {mlp.n_outputs_}")
print(f"Activation function: {mlp.activation}")
print(f"Output activation function: {mlp.out_activation_}")
print(f"Number of epochs: {mlp.n_iter_}")
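The attribute n_layers_ counts the input and output layers as well as the hidden ones, so the (10, 10) network reports 4 layers. The weight shapes and the total number of trainable parameters can be read from coefs_ and intercepts_. A sketch:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
X_train_scaled = StandardScaler().fit_transform(X_train)

mlp = MLPClassifier(hidden_layer_sizes=(10, 10), max_iter=1000, random_state=42, learning_rate_init=0.001)
mlp.fit(X_train_scaled, y_train)

# coefs_[i] has shape (fan_in, fan_out); intercepts_[i] has shape (fan_out,)
for i, (W, b) in enumerate(zip(mlp.coefs_, mlp.intercepts_)):
    print(f"Layer {i} -> {i + 1}: weights {W.shape}, biases {b.shape}")

# Every weight and every bias is a trainable parameter
n_params = sum(W.size + b.size for W, b in zip(mlp.coefs_, mlp.intercepts_))
print("Trainable parameters:", n_params)
print("n_layers_ =", mlp.n_layers_, "(1 input + 2 hidden + 1 output)")
```

For this network that is (4*10 + 10) + (10*10 + 10) + (10*3 + 3) = 193 parameters.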

**Step 7: Visualizing the Learning Curve**

In [None]:
# Plot the loss curve
plt.figure(figsize=(8, 6))
plt.plot(mlp.loss_curve_, label='Training Loss')
plt.title('MLP Classifier Learning Curve')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.grid()
plt.show()
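The loss curve above tracks training loss only. One common addition is MLPClassifier's early_stopping option. It holds out validation_fraction of the training data, records a validation accuracy per epoch in validation_scores_, and stops once that score has not improved for n_iter_no_change epochs. The values below are scikit-learn's defaults made explicit. This run is an illustrative sketch, not part of the original tutorial, and its results are not reported here.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
X_train_scaled = StandardScaler().fit_transform(X_train)

mlp_es = MLPClassifier(
    hidden_layer_sizes=(10, 10), max_iter=1000, random_state=42,
    learning_rate_init=0.001,
    early_stopping=True,       # hold out part of the training data for validation
    validation_fraction=0.1,   # 10% of the training samples form the validation set
    n_iter_no_change=10,       # stop after 10 epochs without validation improvement
)
mlp_es.fit(X_train_scaled, y_train)

print("Epochs run:", mlp_es.n_iter_)
print("Best validation accuracy:", mlp_es.best_validation_score_)

# loss_curve_ holds one training loss per epoch; validation_scores_ one validation accuracy per epoch
for epoch in range(0, mlp_es.n_iter_, 10):
    print(f"epoch {epoch + 1:>4}: train loss {mlp_es.loss_curve_[epoch]:.4f}, "
          f"val accuracy {mlp_es.validation_scores_[epoch]:.2f}")
```

With only about 11 validation samples, validation accuracy moves in steps of roughly 0.09, so training may stop earlier than expected. Raising n_iter_no_change is one way to make the criterion less strict.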

**Task 1: Experiment with Different Architectures**

In [None]:
# Task 1: Experiment with Different Architectures

# Define a function to train and evaluate a model with a given architecture
def train_and_evaluate(hidden_layer_sizes, activation='relu'):
    print(f"\n--- Training with hidden layers: {hidden_layer_sizes}, activation: {activation} ---")
    mlp = MLPClassifier(hidden_layer_sizes=hidden_layer_sizes, activation=activation, max_iter=1000, random_state=42, learning_rate_init=0.001)
    mlp.fit(X_train_scaled, y_train)
    y_pred = mlp.predict(X_test_scaled)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Accuracy: {accuracy:.2f}")
    print("Classification Report:\n", classification_report(y_test, y_pred))

    plt.figure(figsize=(8, 6))
    plt.plot(mlp.loss_curve_, label='Training Loss')
    plt.title(f'Learning Curve with Hidden Layers {hidden_layer_sizes} and {activation} Activation')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.grid()
    plt.show()
    return accuracy

# Experiment with different numbers of neurons and layers
architectures = [(5, ), (20, 10), (50, 20, 10)]
for arch in architectures:
    train_and_evaluate(arch)

# Experiment with different activation functions
train_and_evaluate(hidden_layer_sizes=(10, 10), activation='tanh')
train_and_evaluate(hidden_layer_sizes=(10, 10), activation='logistic')

How does increasing the number of layers or neurons affect the accuracy and learning curve?
What configuration gives the best performance?

Based on the output from the experiment with different architectures:

Number of layers and neurons: Increasing the number of layers or neurons did not change the test accuracy in this case. The architectures went from a single hidden layer of 5 neurons, to two layers of 20 and 10 neurons, to three layers of 50, 20 and 10 neurons, and all three reached 1.00. All three models converged, although the smallest network (one hidden layer of 5 neurons) took slightly longer to converge and had a less smooth loss curve.

Activation functions: With the (10, 10) architecture, tanh also reached 1.00 accuracy, matching the relu networks above. The logistic activation reached a slightly lower accuracy of 0.98. It also raised a ConvergenceWarning because it did not converge within max_iter=1000 iterations, which suggests it trains more slowly on this dataset with these settings.
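One plausible reason for the slower logistic run is not verified by the experiment above. The derivative of the logistic function never exceeds 0.25 and shrinks toward 0 for large |z|, which tends to shrink gradients during backpropagation. tanh and relu have derivatives up to 1. The sketch below defines the three activations that MLPClassifier accepts under these names, plus their derivatives, in plain NumPy. The function names are illustrative, not scikit-learn's internals.

```python
import numpy as np

z = np.linspace(-6, 6, 13)   # pre-activation values -6, -5, ..., 6

def relu(z):
    return np.maximum(0.0, z)

def tanh(z):
    return np.tanh(z)

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# Derivatives: the factor each layer contributes to the gradient in backpropagation
def relu_grad(z):
    return (z > 0).astype(float)

def tanh_grad(z):
    return 1.0 - np.tanh(z) ** 2

def logistic_grad(z):
    s = logistic(z)
    return s * (1.0 - s)

for name, grad in [("relu", relu_grad), ("tanh", tanh_grad), ("logistic", logistic_grad)]:
    edge = grad(np.array([-6.0, 6.0])).round(4)
    print(f"{name:8s} max derivative = {grad(z).max():.3f}, derivative at z = -6 and 6: {edge}")
```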

In this experiment, all three relu architectures and the (10, 10) tanh network achieved perfect accuracy on the test set. Even the simplest architecture, one hidden layer of 5 neurons, reached 1.00 accuracy despite a slightly less smooth learning curve. This suggests that a large model is not necessary for this dataset. Keep in mind that the test set has only 45 samples (30% of 150), so a single misclassification changes accuracy by about 0.02; the 0.98 obtained with logistic corresponds to one error.

Given the test-set results, hidden_layer_sizes=(5,) with activation='relu' is among the best-performing configurations and also the simplest. Because tanh was only tested with (10, 10), this experiment does not show whether (5,) with tanh performs equally well.
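To compare the Task 1 configurations side by side, the sketch below reruns them without plots. For each run it tabulates parameter count, epochs (n_iter_), test accuracy, and whether scikit-learn raised a ConvergenceWarning. The warning is captured with Python's warnings module instead of being printed. The sketch also adds the (5,) network with tanh, which the experiment above did not try; its outcome is not reported in this tutorial. The name results is illustrative.

```python
import warnings
from sklearn.datasets import load_iris
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

configs = [((5,), "relu"), ((20, 10), "relu"), ((50, 20, 10), "relu"),
           ((5,), "tanh"), ((10, 10), "tanh"), ((10, 10), "logistic")]

results = []
for hidden, activation in configs:
    mlp = MLPClassifier(hidden_layer_sizes=hidden, activation=activation,
                        max_iter=1000, random_state=42, learning_rate_init=0.001)
    # Record warnings instead of printing them so non-convergence can be tabulated
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        mlp.fit(X_train_scaled, y_train)
    converged = not any(issubclass(w.category, ConvergenceWarning) for w in caught)
    n_params = sum(W.size + b.size for W, b in zip(mlp.coefs_, mlp.intercepts_))
    acc = accuracy_score(y_test, mlp.predict(X_test_scaled))
    results.append((hidden, activation, n_params, mlp.n_iter_, acc, converged))

print(f"{'hidden layers':<14}{'activation':<11}{'params':>7}{'epochs':>8}{'accuracy':>10}  converged")
for hidden, activation, n_params, n_iter, acc, converged in results:
    print(f"{str(hidden):<14}{activation:<11}{n_params:>7}{n_iter:>8}{acc:>10.2f}  {converged}")
```

The parameter column makes the size difference concrete. The three relu networks have 43, 343 and 1513 trainable parameters, roughly a 35-fold range, for the same reported test accuracy.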

**Task 2: Change the Learning Rate**

In [None]:
# Task 2: Change the Learning Rate

# Define a function to train and evaluate a model with a given learning rate
def train_with_learning_rate(learning_rate):
    print(f"\n--- Training with learning rate: {learning_rate} ---")
    mlp = MLPClassifier(hidden_layer_sizes=(10, 10), max_iter=1000, random_state=42, learning_rate_init=learning_rate)
    mlp.fit(X_train_scaled, y_train)
    y_pred = mlp.predict(X_test_scaled)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Accuracy: {accuracy:.2f}")

    plt.figure(figsize=(8, 6))
    plt.plot(mlp.loss_curve_, label='Training Loss')
    plt.title(f'Learning Curve with Learning Rate {learning_rate}')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.grid()
    plt.show()

# Experiment with different learning rates
learning_rates = [0.1, 0.01, 0.0001]
for lr in learning_rates:
    train_with_learning_rate(lr)
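The loop above draws one plot per learning rate. A variant could overlay all training-loss curves on one log-scale axis and record the number of epochs, final loss, accuracy and convergence status for each run. It might look like the sketch below. It adds 0.001, the value used in Step 4, as a reference point. The summary dictionary is a hypothetical name for illustration.

```python
import warnings
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

summary = {}
plt.figure(figsize=(8, 6))
for lr in [0.1, 0.01, 0.001, 0.0001]:
    mlp = MLPClassifier(hidden_layer_sizes=(10, 10), max_iter=1000,
                        random_state=42, learning_rate_init=lr)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        mlp.fit(X_train_scaled, y_train)
    summary[lr] = {
        "epochs": mlp.n_iter_,
        "final_loss": mlp.loss_curve_[-1],
        "accuracy": accuracy_score(y_test, mlp.predict(X_test_scaled)),
        "converged": not any(issubclass(w.category, ConvergenceWarning) for w in caught),
    }
    plt.plot(mlp.loss_curve_, label=f"lr = {lr}")

plt.yscale("log")   # a log scale keeps fast and slow curves readable on one axis
plt.title("Training Loss for Different Learning Rates")
plt.xlabel("Epochs")
plt.ylabel("Loss (log scale)")
plt.legend()
plt.grid()
plt.show()

for lr, s in summary.items():
    print(f"lr={lr:<7} epochs={s['epochs']:<5} final loss={s['final_loss']:.4f} "
          f"accuracy={s['accuracy']:.2f} converged={s['converged']}")
```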

How does changing the learning rate affect the loss curve and the number of epochs? What learning rate provides the best balance between convergence speed and model performance?

Based on the output from the experiment with different learning rates:

Effect on loss curve and epochs:

- A learning rate of 0.1 gives a rapid initial drop in loss, but the curve is less smooth, which suggests that the updates may overshoot the minimum. Training also stopped after fewer epochs.
- A learning rate of 0.01 gives a smoother, more gradual decrease in loss, converging reliably and reaching high accuracy.
- A learning rate of 0.0001 decreases the loss very slowly and triggers a ConvergenceWarning because training did not converge within max_iter=1000 iterations; many more epochs would be needed to reach a comparable loss.

Best balance:

A learning rate of 0.01 appears to give the best balance between convergence speed and model performance in this experiment. It reached the highest accuracy among the tested learning rates (1.00), and its loss curve shows stable convergence without an excessive number of epochs. A learning rate of 0.1 also achieved good accuracy (0.96), but its less smooth loss curve points to some instability during training. The very low learning rate of 0.0001 converged slowly and gave poor performance.
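The overshooting and slow-convergence behaviour described above can be seen on a toy problem. The sketch below runs plain gradient descent on f(w) = w², whose gradient is 2w, so each step multiplies w by (1 - 2·lr). This is an illustrative substitute, not the Adam optimizer that MLPClassifier uses by default (solver='adam'). The values 1.1 and 0.9 are chosen only to make overshooting visible on this toy function and do not correspond to the tutorial's runs.

```python
import numpy as np

def gradient_descent(lr, w0=1.0, steps=50):
    """Plain gradient descent on f(w) = w**2 (minimum at w = 0); returns every iterate."""
    w = w0
    path = [w]
    for _ in range(steps):
        w = w - lr * 2.0 * w   # gradient of w**2 is 2*w
        path.append(w)
    return np.array(path)

for lr in [1.1, 0.9, 0.1, 0.01, 0.0001]:
    path = gradient_descent(lr)
    sign_changes = int(np.sum(np.diff(np.sign(path)) != 0))   # jumps across the minimum
    print(f"lr={lr:<7} |w| after 50 steps = {abs(path[-1]):.3e}, overshoots = {sign_changes}")
```

With lr = 1.1 the iterates jump across the minimum and grow (divergence). With lr = 0.9 they jump across it on every step but still shrink. With lr = 0.1 they approach 0 smoothly. With lr = 0.0001, |w| has barely moved from 1 after 50 steps, which mirrors the slow decrease seen with the smallest learning rate above.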
