In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("ws12.ipynb")

In [None]:
rng_seed = 70

In [None]:
# Imports
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
import scipy as sp
import pandas as pd
import sklearn
# The line below makes matplotlib plots appear in the cell output
%matplotlib inline

# **Question 1**: Binary Classification on the Two Moons Dataset

In this question, you'll explore **binary classification** using machine learning methods on a classic synthetic dataset: the **two moons dataset**. This dataset consists of two interleaving half-moon shapes, making it a challenging problem for linear classifiers but ideal for demonstrating the power of non-linear methods.

## Background: Binary Classification

**Binary classification** is a supervised learning task where we want to predict which of two classes a data point belongs to. Given:
- **Features**: $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ - the input variables describing each data point
- **Labels**: $y \in \{0, 1\}$ - the class each data point belongs to

We want to learn a function $f(\mathbf{x})$ that predicts the correct label for new, unseen data points.

## The Two Moons Dataset

The two moons dataset is a 2D dataset where:
- Each point has 2 features: $(x_1, x_2)$ representing coordinates in the plane
- Points belong to one of two classes (0 or 1), forming two interleaving half-moon shapes
- The dataset can include noise, making the classification boundary less clear

This dataset is particularly useful for testing classification algorithms because:
1. It's **not linearly separable** - you can't draw a straight line to perfectly separate the classes
2. It's **visually interpretable** - we can plot the data and decision boundaries in 2D
3. It tests an algorithm's ability to learn **non-linear decision boundaries**
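
As a quick, non-authoritative illustration of the dataset's structure, the sketch below calls `make_moons` directly and inspects what comes back. The names `demo_seed`, `X_demo`, and `y_demo`, and the noise value 0.1, are assumptions chosen for this example only; they are not part of the assignment.

```python
import numpy as np
from sklearn.datasets import make_moons

# Hypothetical settings for illustration only
demo_seed = 0
X_demo, y_demo = make_moons(n_samples=200, noise=0.1, random_state=demo_seed)

print(X_demo.shape)          # one row per point, two coordinates per row
print(y_demo.shape)          # one label per point
print(np.bincount(y_demo))   # number of points in class 0 and in class 1
```

With an even `n_samples`, the points are split evenly between the two moons.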

## **Part A**: Generate Two Moons Dataset

Implement `generate_two_moons(n_samples, noise, show_plot=False)` that generates the two moons dataset using scikit-learn.

**Requirements:**
- Use `sklearn.datasets.make_moons(n_samples=n_samples, noise=noise, random_state=rng_seed)`
- Return features `X` as a numpy array of shape `(n_samples, 2)` and labels `y` as a numpy array of shape `(n_samples,)`
- If `show_plot=True`, create a scatter plot showing the two classes with different colors
- Plot configuration (if `show_plot=True`):
  - Figure size: (10, 8)
  - Use `plt.scatter()` to plot each class separately
  - Class 0: color='blue', label='Class 0', s=50, alpha=0.6
  - Class 1: color='red', label='Class 1', s=50, alpha=0.6
  - X-axis label: "Feature 1"
  - Y-axis label: "Feature 2"
  - Title: "Two Moons Dataset"
  - Grid with alpha=0.3
  - Legend
  - Equal aspect ratio using `ax.set_aspect('equal')`
- Only call `plt.show()` if `show_plot=True`
- Return the figure object (or None if `show_plot=False`)

**Parameters:**
- `n_samples`: int, total number of data points to generate
- `noise`: float, standard deviation of Gaussian noise added to the data
- `show_plot`: bool, default False. If True, display the plot

**Returns:**
- `X`: numpy array of shape (n_samples, 2), the feature matrix
- `y`: numpy array of shape (n_samples,), the label vector
- `fig`: matplotlib figure object (or None if show_plot=False)

In [None]:
def generate_two_moons(n_samples, noise, show_plot=False):
    from sklearn.datasets import make_moons
    
    # Generate the two moons dataset using make_moons
    # Use random_state=rng_seed for reproducibility
    
    
    # Create plot if show_plot=True (otherwise set fig = None):
    #   - Plot Class 0 points in blue
    #   - Plot Class 1 points in red
    #   - Set appropriate labels, title, grid, legend
    #   - Set equal aspect ratio
    
    if show_plot:
        # Show plot if requested
        plt.show()
    
    return X, y, fig

In [None]:
# Example: Generate and visualize the two moons dataset with different noise levels

# Example 1: Low noise
print("Example 1: Two moons with low noise (0.05)")
X1, y1, fig1 = generate_two_moons(n_samples=200, noise=0.05, show_plot=True)
print(f"Data shape: X={X1.shape}, y={y1.shape}")
print()

# Example 2: Medium noise
print("Example 2: Two moons with medium noise (0.15)")
X2, y2, fig2 = generate_two_moons(n_samples=200, noise=0.15, show_plot=True)
print(f"Data shape: X={X2.shape}, y={y2.shape}")
print()

# Example 3: High noise
print("Example 3: Two moons with high noise (0.25)")
X3, y3, fig3 = generate_two_moons(n_samples=200, noise=0.25, show_plot=True)
print(f"Data shape: X={X3.shape}, y={y3.shape}")

In [None]:
grader.check("q1a")

## **Part B**: Linear Support Vector Machine Classification

Implement `classify_with_svm(X, y, show_plot=False)` that trains a **Linear Support Vector Machine (SVM)** classifier and visualizes the decision boundary.

### Background: Support Vector Machines

A **Support Vector Machine (SVM)** is a powerful classification algorithm that finds the optimal hyperplane (a line in 2D, a plane in 3D, etc.) that separates the classes with the maximum margin.

**Linear SVM** finds a linear decision boundary:
$$f(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + b$$

where:
- $\mathbf{w}$ is the weight vector (perpendicular to the decision boundary)
- $b$ is the bias term
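- Points are classified by the sign of $f(\mathbf{x})$: one class where $f(\mathbf{x}) > 0$, the other where $f(\mathbf{x}) < 0$, with the decision boundary at $f(\mathbf{x}) = 0$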

**Key Limitation**: Linear SVMs can only learn linear decision boundaries. This works well for linearly separable data but struggles with datasets like the two moons that require non-linear boundaries.
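
To make the formula concrete, here is a minimal sketch that recomputes $f(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + b$ by hand from a fitted `LinearSVC` and compares it with the model's own output. The toy blob data and the names `X_toy`, `y_toy`, and `svm_demo` are assumptions made for this illustration, not part of the assignment.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

# Hypothetical toy data: two Gaussian blobs in 2D, labelled 0 and 1
X_toy, y_toy = make_blobs(n_samples=100, centers=2, n_features=2, random_state=0)

svm_demo = LinearSVC(random_state=0, max_iter=10000).fit(X_toy, y_toy)

w = svm_demo.coef_[0]        # weight vector, shape (2,)
b = svm_demo.intercept_[0]   # bias term (scalar)

f_manual = X_toy @ w + b                    # f(x) = w^T x + b for every row
pred_manual = (f_manual > 0).astype(int)    # positive side of the boundary -> class 1

# Both comparisons should print True
print(np.allclose(f_manual, svm_demo.decision_function(X_toy)))
print(np.array_equal(pred_manual, svm_demo.predict(X_toy)))
```

Because the prediction depends only on the sign of a linear function, the boundary between the two predicted regions is always a straight line in 2D.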

### Your Task

**Requirements:**

1. **Train the classifier**:
   - Use `sklearn.svm.LinearSVC(random_state=rng_seed, max_iter=10000)`
   - Fit the model using `classifier.fit(X, y)`

2. **Create visualization meshgrid**:
   - Determine the range: `x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5`
   - Determine the range: `y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5`
   - Create meshgrid with step size 0.02: `xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02), np.arange(y_min, y_max, 0.02))`
   - Flatten and stack: `mesh_points = np.c_[xx.ravel(), yy.ravel()]`
   - Predict on mesh: `Z = classifier.predict(mesh_points)`
   - Reshape predictions: `Z = Z.reshape(xx.shape)`

3. **Plot configuration** (if `show_plot=True`):
   - Figure size: (10, 8)
   - Use `ax.contourf(xx, yy, Z, alpha=0.3, cmap='RdBu_r')` to show decision regions
   - Scatter plot training data with Class 0 in blue and Class 1 in red (same as Part A)
   - X-axis label: "Feature 1"
   - Y-axis label: "Feature 2"
   - Title: "Linear SVM Decision Boundary"
   - Grid with alpha=0.3
   - Legend
   - Equal aspect ratio using `ax.set_aspect('equal')`
   - Only call `plt.show()` if `show_plot=True`
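
The meshgrid steps in item 2 can be sketched on a tiny grid to see how the shapes fit together. This is only an illustration: it uses a coarse step of 0.5 so the arrays are readable, and a simple rule (`x > y`) stands in for `classifier.predict`. The `_demo` names are hypothetical.

```python
import numpy as np

# Coarse step so the arrays are small enough to read (the task uses 0.02)
xx_demo, yy_demo = np.meshgrid(np.arange(0.0, 2.0, 0.5), np.arange(0.0, 1.0, 0.5))

# One row per grid point: column 0 is the x-coordinate, column 1 is the y-coordinate
mesh_demo = np.c_[xx_demo.ravel(), yy_demo.ravel()]

# Stand-in for classifier.predict: label 1 where x > y, else 0
Z_flat = (mesh_demo[:, 0] > mesh_demo[:, 1]).astype(int)

# Reshape the flat predictions back to the grid layout expected by contourf
Z_demo = Z_flat.reshape(xx_demo.shape)

print(xx_demo.shape)    # (number of y values, number of x values)
print(mesh_demo.shape)  # (number of grid points, 2)
print(Z_demo)
```

The reshape works because `ravel()` and `reshape()` use the same row-major order, so each prediction returns to the grid cell it came from.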

**Parameters:**
- `X`: numpy array of shape (n_samples, 2), feature matrix from Part A
- `y`: numpy array of shape (n_samples,), label vector from Part A
- `show_plot`: bool, default False. If True, display the plot

**Returns:**
- `classifier`: trained LinearSVC object
- `fig`: matplotlib figure object (or None if show_plot=False)

In [None]:
def classify_with_svm(X, y, show_plot=False):
    from sklearn.svm import LinearSVC
    
    # Train the Linear SVM classifier
    # Use: LinearSVC(random_state=rng_seed, max_iter=10000)
    # Fit using: classifier.fit(X, y)
    
    # Create plot if show_plot=True:
    #   1. Create meshgrid covering data range (with 0.5 padding)
    #      Use step size 0.02 for smooth visualization
    #   2. Predict class for all meshgrid points
    #   3. Use contourf to show decision regions with alpha=0.3, cmap='RdBu_r'
    #   4. Scatter plot training data (same colors as Part A)
    #   5. Set labels, title, grid, legend, and equal aspect ratio
    #   6. Call plt.show()
    # Otherwise, set fig = None
    
    return classifier, fig

In [None]:
# Example: Train Linear SVM on two moons dataset with different noise levels

# Example 1: Low noise - SVM struggles with non-linear boundary
print("Example 1: Linear SVM on low noise two moons")
X1, y1, mfig1 = generate_two_moons(n_samples=200, noise=0.05, show_plot=False)
plt.close(mfig1)
clf1, fig1 = classify_with_svm(X1, y1, show_plot=True)
print(f"Training accuracy: {clf1.score(X1, y1):.3f}")
print("Note: Linear boundary cannot perfectly separate the moons")
print()

# Example 2: Medium noise
print("Example 2: Linear SVM on medium noise two moons")
X2, y2, mfig2 = generate_two_moons(n_samples=200, noise=0.15, show_plot=False)
plt.close(mfig2)
clf2, fig2 = classify_with_svm(X2, y2, show_plot=True)
print(f"Training accuracy: {clf2.score(X2, y2):.3f}")
print()

# Example 3: High noise
print("Example 3: Linear SVM on high noise two moons")
X3, y3, mfig3 = generate_two_moons(n_samples=200, noise=0.25, show_plot=False)
plt.close(mfig3)
clf3, fig3 = classify_with_svm(X3, y3, show_plot=True)
print(f"Training accuracy: {clf3.score(X3, y3):.3f}")

In [None]:
grader.check("q1b")

## **Part C**: Neural Network Classification

Implement `classify_with_mlp(X, y, show_plot=False)` that trains a **Multi-Layer Perceptron (MLP)** classifier and visualizes the decision boundary.

### Background: Multi-Layer Perceptron

A **Multi-Layer Perceptron (MLP)** is a type of artificial neural network with multiple layers of neurons. Unlike linear SVMs, MLPs can learn **non-linear decision boundaries** by using:

1. **Hidden layers** with non-linear activation functions
2. **Multiple neurons** that can combine features in complex ways
3. **Backpropagation** to adjust weights and learn from data

**Architecture**:
- **Input layer**: Receives the features (2 in our case)
- **Hidden layer(s)**: Transforms features with non-linear activations
- **Output layer**: Produces class predictions

**Key Advantage**: MLPs can learn complex, non-linear decision boundaries that adapt to the data structure, making them ideal for datasets like the two moons.
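
To see why a hidden layer with a non-linear activation matters, consider the following sketch of a forward pass through a tiny 2-2-1 network written in plain NumPy. The weights are hand-picked for illustration (not trained, and not how `MLPClassifier` would initialize them) so that the network reproduces XOR, a pattern no single linear boundary can separate.

```python
import numpy as np

def relu(z):
    """ReLU activation: keep positive values, zero out the rest."""
    return np.maximum(z, 0.0)

# Hand-picked (not trained) weights for a 2-2-1 network
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])      # input -> hidden weights
b1 = np.array([0.0, -1.0])       # hidden biases
W2 = np.array([1.0, -2.0])       # hidden -> output weights

# The four XOR inputs
X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

hidden = relu(X_xor @ W1 + b1)   # non-linear hidden representation
output = hidden @ W2             # linear read-out of the hidden units

print(output)
```

The output is 1 for the inputs `(0, 1)` and `(1, 0)` and 0 for `(0, 0)` and `(1, 1)`. Without the ReLU, the two layers would collapse into a single linear map and this would be impossible.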

### Your Task

This part is very similar to Part B, but uses an MLP instead of a linear SVM.

**Requirements:**

1. **Train the classifier**:
   - Use `sklearn.neural_network.MLPClassifier(hidden_layer_sizes=(100, 50), max_iter=1000, random_state=rng_seed)`
     - `hidden_layer_sizes=(100, 50)`: Two hidden layers with 100 and 50 neurons
     - `max_iter=1000`: Maximum number of training iterations
   - Fit the model using `classifier.fit(X, y)`

2. **Create visualization meshgrid** (same as Part B):
   - Create meshgrid covering the data range with 0.5 padding and 0.02 step size
   - Predict on all meshgrid points
   - Reshape predictions to match meshgrid shape

3. **Plot configuration** (if `show_plot=True`):
   - Figure size: (10, 8)
   - Use `ax.contourf(xx, yy, Z, alpha=0.3, cmap='RdBu_r')` to show decision regions
   - Scatter plot training data with Class 0 in blue and Class 1 in red
   - X-axis label: "Feature 1"
   - Y-axis label: "Feature 2"
   - Title: "Neural Network (MLP) Decision Boundary"
   - Grid with alpha=0.3
   - Legend
   - Equal aspect ratio using `ax.set_aspect('equal')`
   - Only call `plt.show()` if `show_plot=True`

**Parameters:**
- `X`: numpy array of shape (n_samples, 2), feature matrix from Part A
- `y`: numpy array of shape (n_samples,), label vector from Part A
- `show_plot`: bool, default False. If True, display the plot

**Returns:**
- `classifier`: trained MLPClassifier object
- `fig`: matplotlib figure object (or None if show_plot=False)

In [None]:
def classify_with_mlp(X, y, show_plot=False):
    from sklearn.neural_network import MLPClassifier
    
    # Train the MLP classifier
    # Use: MLPClassifier(hidden_layer_sizes=(100, 50), max_iter=1000, random_state=rng_seed)
    # Fit using: classifier.fit(X, y)
    
    # Create plot if show_plot=True (otherwise set fig = None):
    #   1. Create meshgrid (same as Part B)
    #   2. Predict class for all meshgrid points
    #   3. Use contourf to show decision regions
    #   4. Scatter plot training data
    #   5. Set labels, title (use "Neural Network (MLP) Decision Boundary"), grid, legend,
    #      and equal aspect ratio

    if show_plot:    
        # Show plot if requested
        plt.show()

    return classifier, fig

In [None]:
# Example: Train Neural Network on two moons dataset and compare with SVM

# Example 1: Low noise - MLP learns non-linear boundary
print("Example 1: Neural Network on low noise two moons")
X1, y1, mfig = generate_two_moons(n_samples=200, noise=0.05, show_plot=False)
plt.close(mfig)
mlp1, fig1 = classify_with_mlp(X1, y1, show_plot=True)
print(f"Training accuracy: {mlp1.score(X1, y1):.3f}")
print("Note: Non-linear boundary fits the moon shapes much better!")
print()

# Example 2: Medium noise
print("Example 2: Neural Network on medium noise two moons")
X2, y2, mfig = generate_two_moons(n_samples=200, noise=0.15, show_plot=False)
plt.close(mfig)
mlp2, fig2 = classify_with_mlp(X2, y2, show_plot=True)
print(f"Training accuracy: {mlp2.score(X2, y2):.3f}")
print()

# Example 3: High noise
print("Example 3: Neural Network on high noise two moons")
X3, y3, mfig = generate_two_moons(n_samples=200, noise=0.25, show_plot=False)
plt.close(mfig)
mlp3, fig3 = classify_with_mlp(X3, y3, show_plot=True)
print(f"Training accuracy: {mlp3.score(X3, y3):.3f}")
print()

In [None]:
grader.check("q1c")

# **Question 2**: Handwritten Digit Classification with Neural Networks

In this question, you'll build a **multi-layer perceptron (MLP)** to classify handwritten digits from the classic **MNIST-style digits dataset**. This is a fundamental problem in machine learning and computer vision.

## Background: Handwritten Digit Recognition

**Handwritten digit recognition** is a multi-class classification problem where we want to identify which digit (0-9) is shown in an image. This task has practical applications in:
- Reading ZIP codes on mail
- Processing bank checks
- Digitizing handwritten documents
- Automatic form processing

## The Digits Dataset

The digits dataset contains:
- **1797 samples** of 8×8 pixel grayscale images of handwritten digits (0-9)
- Each image is represented as 64 features (flattened 8×8 pixels)
- 10 classes (digits 0 through 9)
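- Pixel values are integers from 0 (background, no ink) to 16 (darkest ink). Note that `cmap='gray'` draws low values as black, so the digits appear light on a dark background when plotted this way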

Unlike the two moons dataset, this is a **multi-class classification** problem with real-world image data.
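
The numbers above can be checked directly. The short sketch below loads the dataset and prints its basic properties; the variable name `digits_demo` is an assumption for this example only.

```python
from sklearn.datasets import load_digits

digits_demo = load_digits()

print(digits_demo.data.shape)                           # flattened images: (samples, 64)
print(digits_demo.images.shape)                         # same data as 8x8 arrays
print(digits_demo.data.min(), digits_demo.data.max())   # pixel value range
print(sorted(set(digits_demo.target.tolist())))         # the class labels
```

`digits.data` and `digits.images` hold the same pixel values; only the layout differs, which is why reshaping a row of `data` to `(8, 8)` recovers the image.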

## **Part A**: Load and Split the Dataset

Implement `load_digits_dataset(train_fraction, show_samples=False)` that loads the handwritten digits dataset and splits it into training and validation sets.

**Requirements:**
- Use `sklearn.datasets.load_digits()` to load the dataset
- Access features as `X = digits.data` and labels as `y = digits.target`
- Use `sklearn.model_selection.train_test_split()` to split the data:
  - `train_size=train_fraction`
  - `random_state=rng_seed`
  - `stratify=y` (ensures balanced class distribution in both sets)
- If `show_samples=True`, create a figure showing 10 random samples from the training set:
  - Figure size: (12, 4)
  - Use `plt.subplot(2, 5, i+1)` for 2 rows and 5 columns
  - Display each image using `ax.imshow(image.reshape(8, 8), cmap='gray')`
  - Title each subplot with its label: `f'Label: {label}'`
  - Turn off axis: `ax.axis('off')`
  - Only call `plt.show()` if `show_samples=True`
- Return the figure object (or None if `show_samples=False`)
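
The effect of `stratify=y` is easiest to see on deliberately imbalanced labels. The following is a small sketch on made-up data (the arrays `X_imb` and `y_imb` and the seed 0 are assumptions for illustration): the 80/20 class ratio is preserved in both the training and the validation split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up imbalanced data: 80 samples of class 0, 20 samples of class 1
X_imb = np.arange(100).reshape(-1, 1)
y_imb = np.array([0] * 80 + [1] * 20)

X_tr, X_va, y_tr, y_va = train_test_split(
    X_imb, y_imb, train_size=0.75, random_state=0, stratify=y_imb
)

print(np.bincount(y_tr))   # class counts in the training split
print(np.bincount(y_va))   # class counts in the validation split
```

Without `stratify`, the split is purely random, so a rare class can end up over- or under-represented in the validation set.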

**Parameters:**
- `train_fraction`: float, fraction of data to use for training (rest is validation)
- `show_samples`: bool, default False. If True, display sample images

**Returns:**
- `X_train`: numpy array, training features
- `X_val`: numpy array, validation features
- `y_train`: numpy array, training labels
- `y_val`: numpy array, validation labels
- `fig`: matplotlib figure object (or None if show_samples=False)

In [None]:
def load_digits_dataset(train_fraction, show_samples=False):
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    
    # Load the digits dataset
    # Access features as digits.data and labels as digits.target
    
    # Split into training and validation sets
    # Use train_test_split with train_size=train_fraction, random_state=rng_seed, stratify=y
    
    # If show_samples=True:
    #   - Select 10 random samples from training set
    #   - Create 2x5 subplot grid
    #   - Display each image (reshaped to 8x8) with gray colormap
    #   - Set title to show label
    #   - Turn off axes
    #   - Call plt.show()
    
    return X_train, X_val, y_train, y_val, fig

In [None]:
# Example: Load and visualize the digits dataset

print("Loading digits dataset with 75% training / 25% validation split")
X_train, X_val, y_train, y_val, fig = load_digits_dataset(train_fraction=0.75, show_samples=True)

print(f"\nDataset shapes:")
print(f"  Training: X_train={X_train.shape}, y_train={y_train.shape}")
print(f"  Validation: X_val={X_val.shape}, y_val={y_val.shape}")
print(f"\nNumber of features: {X_train.shape[1]}")
print(f"Number of classes: {len(np.unique(y_train))}")
print(f"Classes: {np.unique(y_train)}")

In [None]:
grader.check("q2a")

## **Part B**: Train Neural Network and Visualize Learning

Implement `train_digit_classifier(X_train, y_train, X_val, y_val, show_plot=False, print_acc=False)` that trains an MLP classifier and visualizes the training process.

### Background: Training Neural Networks

When training a neural network, it's important to monitor:
- **Training loss**: How well the model fits the training data (should decrease over time)
- **Validation loss**: How well the model generalizes to unseen data (should also decrease)
- **Overfitting**: When training loss decreases but validation loss increases, indicating the model memorizes rather than learns

### Your Task: Design Your Architecture

You'll need to experiment with the `MLPClassifier` parameters to achieve good performance. Here are the key hyperparameters to explore:

**Architecture Parameters:**
- `hidden_layer_sizes`: Tuple specifying number of neurons in each hidden layer
  - Example: `(100,)` = 1 layer with 100 neurons
  - Example: `(128, 64)` = 2 layers with 128 and 64 neurons
  - Try different sizes and depths!

**Training Parameters:**
- `max_iter`: Maximum number of training epochs (iterations)
  - More iterations = more training time
  - Try 100-500 for reasonable training time
- `alpha`: L2 regularization strength (helps reduce overfitting)
  - Smaller = less regularization
  - Try values like 0.0001, 0.001, 0.01
- `learning_rate_init`: Initial learning rate
  - Controls how fast the model learns
  - Try values like 0.001, 0.01

**Other Important Parameters:**
- `activation`: Activation function for hidden layers
  - Options: `'relu'`, `'tanh'`, `'logistic'`
  - ReLU is often a good default
- `solver`: Optimization algorithm
  - Options: `'adam'`, `'sgd'`
  - Adam is usually recommended
- `random_state=rng_seed`: For reproducibility

**Goal**: Achieve **>95% validation accuracy** through experimentation!

**Requirements:**
1. **Train the classifier**:
   - Create `MLPClassifier` with your chosen parameters
   - Fit using `classifier.fit(X_train, y_train)`

2. **Extract training history**:
   - After training, `classifier.loss_curve_` contains the training loss at each epoch

3. **Plot configuration** (if `show_plot=True`):
   - Figure size: (10, 6)
   - Plot training loss curve: `ax.plot(epochs, training_losses, label='Training Loss', linewidth=2)`
   - X-axis label: "Epoch"
   - Y-axis label: "Loss"
   - Title: "Training Loss"
   - Grid with alpha=0.3
   - Legend
   - Only call `plt.show()` if `show_plot=True`

4. **Print accuracy** (if `print_acc=True`):
   - Print training accuracy: `classifier.score(X_train, y_train)`
   - Print validation accuracy: `classifier.score(X_val, y_val)`

**Parameters:**
- `X_train`: numpy array, training features from Part A
- `y_train`: numpy array, training labels from Part A
- `X_val`: numpy array, validation features from Part A
- `y_val`: numpy array, validation labels from Part A
- `show_plot`: bool, default False. If True, display the plot
- `print_acc`: bool, default False. If True, print the training and validation accuracy

**Returns:**
- `classifier`: trained MLPClassifier object
- `fig`: matplotlib figure object (or None if show_plot=False)

**Note**: For simplicity, you can train once and plot just the training loss curve. Computing validation loss at every epoch requires iterative training with `warm_start=True`, which is optional.
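
For readers curious about the optional per-epoch validation loss, here is one possible sketch. Note the swap: instead of `warm_start=True`, it uses `MLPClassifier.partial_fit`, which runs one pass over the training data per call and plays the same role. The architecture `(64,)`, the 20 epochs, the seed 0, and all variable names are assumptions for illustration, not recommended settings.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X_all, y_all = load_digits(return_X_y=True)
X_tr, X_va, y_tr, y_va = train_test_split(
    X_all, y_all, train_size=0.75, random_state=0, stratify=y_all
)

# Hypothetical small network; partial_fit requires the 'adam' or 'sgd' solver
mlp_demo = MLPClassifier(hidden_layer_sizes=(64,), random_state=0)
classes = np.unique(y_tr)

n_epochs = 20
train_losses, val_losses = [], []
for _ in range(n_epochs):
    # One pass over the training data, continuing from the current weights
    mlp_demo.partial_fit(X_tr, y_tr, classes=classes)
    # Cross-entropy (log loss) on both splits after this epoch
    train_losses.append(log_loss(y_tr, mlp_demo.predict_proba(X_tr), labels=classes))
    val_losses.append(log_loss(y_va, mlp_demo.predict_proba(X_va), labels=classes))

print(f"First epoch: train={train_losses[0]:.3f}, val={val_losses[0]:.3f}")
print(f"Last epoch:  train={train_losses[-1]:.3f}, val={val_losses[-1]:.3f}")
```

Plotting `train_losses` and `val_losses` against the epoch number gives the two curves described in the background section. A validation curve that turns upward while the training curve keeps falling is the overfitting signal to watch for.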

In [None]:
def train_digit_classifier(X_train, y_train, X_val, y_val, show_plot=False, print_acc = False):
    from sklearn.neural_network import MLPClassifier
    
    # Create and train the MLP classifier
    # Experiment with parameters to achieve >95% validation accuracy:
    
    classifier = MLPClassifier(
        # Your parameters here
        random_state=rng_seed
    )
    
    classifier.fit(X_train, y_train)
    
    # If print_acc=True, calculate and print the training and validation accuracies
    
    # Create plot if show_plot=True:
    #   - Extract training loss from classifier.loss_curve_
    #   - Plot loss vs epoch
    #   - Add labels, title, grid, legend
    #   - Set fig = None if show_plot=False
    
    return classifier, fig

In [None]:
# Example: Train the digit classifier

print("Training neural network classifier on handwritten digits...")
print("="*60)

# Load the dataset
X_train, X_val, y_train, y_val, _ = load_digits_dataset(train_fraction=0.75, show_samples=False)

# Train the classifier
classifier, fig = train_digit_classifier(X_train, y_train, X_val, y_val, show_plot=True, print_acc=True)

In [None]:
# Example: Test the trained model on individual samples

print("\nTesting classifier on individual digit samples")
print("="*60)

# Select 12 random samples from validation set
np.random.seed(rng_seed + 1)
test_indices = np.random.choice(len(X_val), size=12, replace=False)

# Make predictions
predictions = classifier.predict(X_val[test_indices])
true_labels = y_val[test_indices]

# Visualize predictions
fig, axes = plt.subplots(3, 4, figsize=(12, 9))
axes = axes.ravel()

for i, idx in enumerate(test_indices):
    image = X_val[idx].reshape(8, 8)
    pred = predictions[i]
    true = true_labels[i]
    
    axes[i].imshow(image, cmap='gray')
    
    # Color the title: green if correct, red if wrong
    color = 'green' if pred == true else 'red'
    axes[i].set_title(f'True: {true}, Pred: {pred}', color=color, fontweight='bold')
    axes[i].axis('off')

plt.tight_layout()
plt.show()

# Calculate accuracy on these samples
correct = np.sum(predictions == true_labels)
print(f"\nAccuracy on these {len(test_indices)} samples: {correct}/{len(test_indices)} = {correct/len(test_indices)*100:.1f}%")

In [None]:
grader.check("q2b")

## Required disclosure of use of AI technology

Please indicate whether you used AI to complete this homework. If you did, explain how you used it in the Python cell below.

In [None]:
"""
# write ai disclosure here:

"""

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit.

Upload the .zip file to Gradescope!

In [None]:
grader.export(pdf=False, force_save=True, run_tests=True)