## Exercise 1: Understanding MLPs and Network Architecture

Complete the code to accomplish the following:

1. Train each architecture for 50 epochs
2. Record training and validation accuracy for each
3. Plot the learning curves
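
For task 2, a Keras `fit` call returns a `History` object whose `history` attribute is a dict of per-epoch lists. With `metrics=['accuracy']` the keys include `accuracy` and, when validation data is supplied, `val_accuracy`. The sketch below shows one possible way to summarize such a dict. The helper name `summarize_history` and the toy numbers are assumptions for illustration only, not results from this exercise.

```python
def summarize_history(history_dict):
    """Return final and best accuracies from a Keras-style history dict."""
    train_acc = history_dict["accuracy"]
    val_acc = history_dict["val_accuracy"]
    # Index of the epoch with the highest validation accuracy (first one on ties)
    best_index = max(range(len(val_acc)), key=val_acc.__getitem__)
    return {
        "final_train_acc": train_acc[-1],
        "final_val_acc": val_acc[-1],
        "best_val_acc": val_acc[best_index],
        "best_epoch": best_index + 1,  # reported 1-based
    }


# Toy placeholder values (NOT real training results)
toy_history = {"accuracy": [0.5, 0.75, 1.0], "val_accuracy": [0.5, 1.0, 0.75]}
summary = summarize_history(toy_history)
print(summary)
```

In the real exercise, the dict would come from `model.fit(...).history`, with one summary per architecture.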

In [None]:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Load and prepare data
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

def create_model(layer_sizes):
    model = Sequential()
    model.add(Dense(layer_sizes[0], activation='relu', input_shape=(4,)))
    for size in layer_sizes[1:-1]:
        model.add(Dense(size, activation='relu'))
    model.add(Dense(3, activation='softmax'))
    
    model.compile(optimizer='adam',
                 loss='sparse_categorical_crossentropy',
                 metrics=['accuracy'])
    return model

# Test different architectures
architectures = [
    [10, 3],
    [20, 10, 3],
    [30, 20, 10, 3]
]

# TODO: Replace with your code (fill)
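
When interpreting the results, it may help to know how many trainable parameters each architecture has. The sketch below mirrors the layer logic of `create_model` (the last entry of each list is skipped in favor of the fixed 3-unit output layer) and counts weights plus biases for each `Dense` layer. The helper name `count_dense_params` is hypothetical.

```python
def count_dense_params(layer_sizes, n_inputs=4, n_outputs=3):
    """Count weights and biases for the Dense stack built by create_model."""
    # create_model uses every entry except the last as a hidden layer,
    # then appends a fixed output layer with n_outputs units.
    widths = list(layer_sizes[:-1]) + [n_outputs]
    total = 0
    fan_in = n_inputs
    for width in widths:
        total += fan_in * width + width  # weight matrix + bias vector
        fan_in = width
    return total


architectures = [[10, 3], [20, 10, 3], [30, 20, 10, 3]]
param_counts = [count_dense_params(arch) for arch in architectures]
print(param_counts)  # [83, 343, 1013]
```

The largest network has more than ten times the parameters of the smallest, while the training split contains only 120 samples, which is worth keeping in mind when comparing validation curves.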

How did the different architectures perform? Why do you think that is?

*Enter your answer in this cell*

## Exercise 2: Impact of Batch Size and Learning Rate


Using the best architecture from Exercise 1, write code to explore how batch size and learning rate affect training:

Tasks:
1. Create a grid of plots, one for each combination of batch size and learning rate that you test
2. Each plot should show training and validation curves
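
One possible way to organize the experiments is to enumerate every combination up front and record its position in the plot grid. The batch sizes and learning rates below are hypothetical choices (the exercise does not prescribe any), and `n_train = 120` assumes the 80/20 split of the 150 Iris samples from Exercise 1. The sketch also computes how many weight updates one epoch involves, which is relevant to training speed.

```python
import math
from itertools import product

# Hypothetical values; substitute whatever you want to test.
batch_sizes = [8, 32, 120]
learning_rates = [0.1, 0.01, 0.001]
n_train = 120  # assumed: 80% of the 150 Iris samples

experiments = []
for (row, batch_size), (col, lr) in product(
    enumerate(batch_sizes), enumerate(learning_rates)
):
    experiments.append({
        "row": row,                # subplot row index
        "col": col,                # subplot column index
        "batch_size": batch_size,  # would be passed to model.fit(batch_size=...)
        "learning_rate": lr,       # would be passed to Adam(learning_rate=...)
        # Number of gradient updates per epoch for this batch size
        "steps_per_epoch": math.ceil(n_train / batch_size),
    })

for exp in experiments:
    print(exp)
```

Each entry can then drive one training run, with the curves drawn on the subplot at `(row, col)` of a `plt.subplots(len(batch_sizes), len(learning_rates))` grid.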

In [None]:
# Your code here


Analyze how batch size and learning rate affected:
   - Training speed
   - Final accuracy
   - Stability of training

*Enter your answer in this cell*

## Exercise 3: Comparing MLPs with Traditional Models

<!-- @q -->

1. Complete the MLP comparison code, starting with at least two hidden layers. Use ReLU (`relu`) activations for the hidden layers and Adam (`adam`) as the optimizer.
2. Run comparisons on both datasets

In [None]:

from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
import pandas as pd

import numpy as np

from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Load datasets
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
X_c, y_c = cancer.data, cancer.target

from tensorflow.keras.datasets import fashion_mnist
(X_train_full, y_train_full), (X_test_full, y_test_full) = fashion_mnist.load_data()
X_f = X_train_full[:1000].reshape(1000, -1) / 255.0
y_f = y_train_full[:1000]

def compare_models(X, y, name="Dataset"):
    # Split and scale data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    
    # Random Forest
    rf = RandomForestClassifier(n_estimators=100)
    rf.fit(X_train_scaled, y_train)
    rf_score = rf.score(X_test_scaled, y_test)
    
    # TODO: Replace with your code (fill)

    return rf_score, mlp_score

# Compare on both datasets
cancer_results = compare_models(X_c, y_c, "Cancer Dataset")
fashion_results = compare_models(X_f, y_f, "Fashion MNIST Subset")
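
Because `compare_models` is called on a two-class dataset and on a ten-class one, the MLP's output layer and loss need to adapt to the labels. The sketch below shows one way to pick them from `y`. The helper name `output_spec` is hypothetical, and the binary/multiclass split is only one reasonable convention: a softmax layer with one unit per class and `sparse_categorical_crossentropy` would also work for the two-class case.

```python
def output_spec(y):
    """Suggest (units, activation, loss) for the final Dense layer."""
    # Count distinct integer labels; works for lists and 1-D NumPy arrays
    n_classes = len(set(int(label) for label in y))
    if n_classes == 2:
        # One sigmoid unit outputs P(class 1)
        return 1, "sigmoid", "binary_crossentropy"
    # One softmax unit per class, with integer (non one-hot) labels
    return n_classes, "softmax", "sparse_categorical_crossentropy"


binary_spec = output_spec([0, 1, 1, 0])          # e.g. breast cancer labels
multiclass_spec = output_spec(list(range(10)))   # e.g. Fashion MNIST labels
print(binary_spec)
print(multiclass_spec)
```

Inside `compare_models`, the returned values could be passed to the last `Dense(units, activation=...)` layer and to `model.compile(loss=...)`. With a sigmoid output, predictions are probabilities and need thresholding (for example at 0.5) before computing accuracy by hand.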



Try varying the network architecture (number and width of layers), the number of epochs, and the learning rate. Search online to find out more about these two datasets. What do you observe? Does one dataset seem "easier" for one of the two methods? If so, why? If not, why not?

*Enter your answer in this cell*

## Exercise 4: Early Stopping and Overfitting
<!-- @q -->

In the following, complete the code to compare the impact of early stopping for a complex network.  Use the same model parameters (by calling `create_complex_model()`) for both tests.

1. Train the model without early stopping for 400 epochs
2. Implement early stopping with appropriate parameters
3. Visualize training and validation curves
4. Compare final test performance
5. Visualize decision boundaries for both models
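
For step 2, it can help to see the patience rule in isolation before choosing parameters. The following is a simplified pure-Python illustration of that rule, not Keras's actual implementation: training stops once the monitored validation loss has failed to improve for `patience` consecutive epochs. The function name `early_stopping_epoch` and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 0-based, under a patience rule."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Patience never ran out: training ran to the last epoch
    return len(val_losses) - 1, best_epoch


# Made-up validation losses (NOT real results): improve, then drift upward
toy_val_losses = [0.9, 0.7, 0.6, 0.62, 0.61, 0.65, 0.7]
stopped = early_stopping_epoch(toy_val_losses, patience=3)
print(stopped)  # (5, 2)
```

In Keras, the corresponding callback is `EarlyStopping(monitor='val_loss', patience=..., restore_best_weights=True)`, passed to `model.fit` through `callbacks=[...]`. With `restore_best_weights=True` the model ends up with the weights from the best epoch rather than the epoch at which training stopped.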

In [None]:

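from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt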

# Generate dataset
X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

def create_complex_model():
    model = Sequential([
        Dense(100, activation='relu', input_shape=(2,)),
        Dense(100, activation='relu'),
        Dense(100, activation='relu'),
        Dense(1, activation='sigmoid')
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

# Plot decision boundaries
def plot_decision_boundary(model, X, y):
    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                        np.arange(y_min, y_max, 0.02))
    
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    
    plt.contourf(xx, yy, Z, alpha=0.4)
    plt.scatter(X[:, 0], X[:, 1], c=y, alpha=0.3, s=2)

# TODO: Replace with your code (fill)


What is the impact of early stopping on accuracy?  What do you learn from looking at the decision boundaries?

*Enter your answer in this cell*