# Lesson 1 Homework — Exploring Layers and Activations

Goals:
- Recreate the TensorFlow Playground experiments locally
- Observe how depth, width, activation, and learning rate affect results
- Summarize your findings concisely

## Setup
If any packages are missing, uncomment the `pip install` line in the cell below.

In [None]:
# !pip install -q tensorflow scikit-learn matplotlib
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras
from tensorflow.keras.layers import Dense
from sklearn.datasets import make_circles

X, y = make_circles(n_samples=1500, factor=0.5, noise=0.15, random_state=1)
X = X.astype('float32'); y = y.astype('float32')
plt.figure(figsize=(4.5,4))
plt.scatter(X[:,0], X[:,1], c=y, cmap='RdBu', s=10, edgecolor='k')
plt.title('Dataset (make_circles)')
plt.show()


## Helper: build and train model
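Adjust the number of layers, units per layer, activation, learning rate, and epochs. The optimizer is fixed to Adam inside `build_model`; edit that line if you want to try a different one.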

In [None]:
def build_model(units=(8,8), activation='relu', lr=1e-3):
    """Binary classifier: one Dense hidden layer per entry in `units`, then a sigmoid output."""
    model = keras.Sequential()
    model.add(keras.Input(shape=(2,)))  # two input features per point
    for u in units:
        model.add(Dense(u, activation=activation))
    model.add(Dense(1, activation='sigmoid'))  # probability of class 1
    opt = keras.optimizers.Adam(learning_rate=lr)
    model.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])
    return model

def train_and_plot(model, X, y, epochs=20, title=''):
    """Fit `model`, plot train/val loss and accuracy per epoch, and return the History."""
    # validation_split holds out the last 20% of the samples (taken before shuffling)
    h = model.fit(X, y, batch_size=32, epochs=epochs, validation_split=0.2, verbose=0)
    fig, ax = plt.subplots(1,2, figsize=(9,3.8))
    # left panel: loss curves
    ax[0].plot(h.history['loss'], label='train')
    ax[0].plot(h.history['val_loss'], label='val')
    ax[0].set_title('Loss'); ax[0].set_xlabel('epoch'); ax[0].legend()
    ax[0].grid(True, alpha=0.3)
    # right panel: accuracy curves
    ax[1].plot(h.history['accuracy'], label='train')
    ax[1].plot(h.history['val_accuracy'], label='val')
    ax[1].set_title('Accuracy'); ax[1].set_xlabel('epoch'); ax[1].legend()
    ax[1].grid(True, alpha=0.3)
    fig.suptitle(title)
    plt.show()
    return h


## Task 1 — Layers and Units
Try 1 hidden layer vs 2 hidden layers; vary hidden units (e.g., 4, 8, 16).
- Record your validation accuracy and observations.

In [None]:
m1 = build_model(units=(8,), activation='relu', lr=1e-3)
h1 = train_and_plot(m1, X, y, epochs=20, title='1 layer, 8 units, ReLU')

m2 = build_model(units=(8,8), activation='relu', lr=1e-3)
h2 = train_and_plot(m2, X, y, epochs=20, title='2 layers, 8-8 units, ReLU')
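To make the "record your validation accuracy" step easier, the histories returned by `train_and_plot` can be collected into a small table. The sketch below is one possible way to do it; `summarize_runs` and `print_summary` are hypothetical helper names (not part of the notebook), and they assume the history dict contains the `accuracy` and `val_accuracy` keys that Keras records when the model is compiled with `metrics=['accuracy']`.

```python
def summarize_runs(histories):
    """Return one row per run: (name, final train acc, final val acc, best val acc).

    `histories` maps a run label to a Keras-style history dict, i.e. the
    `.history` attribute of the object returned by `model.fit`.
    """
    rows = []
    for name, hist in histories.items():
        rows.append((
            name,
            hist['accuracy'][-1],       # training accuracy after the last epoch
            hist['val_accuracy'][-1],   # validation accuracy after the last epoch
            max(hist['val_accuracy']),  # best validation accuracy over all epochs
        ))
    return rows

def print_summary(rows):
    """Print the rows from summarize_runs as an aligned text table."""
    print(f"{'run':<28}{'train':>8}{'val':>8}{'best val':>10}")
    for name, train_acc, val_acc, best_val in rows:
        print(f"{name:<28}{train_acc:>8.3f}{val_acc:>8.3f}{best_val:>10.3f}")

# Usage with the runs from the cell above (assumes h1 and h2 exist):
# print_summary(summarize_runs({
#     '1 layer, 8 units': h1.history,
#     '2 layers, 8-8 units': h2.history,
# }))
```

Reporting the best validation accuracy next to the final one is a design choice: it shows whether a run peaked early and then degraded.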


## Task 2 — Activation Choices
Compare ReLU vs Tanh vs Sigmoid for hidden layers.
- Which converges fastest? Do you see any saturation issues?

In [None]:
for act in ['relu','tanh','sigmoid']:
    m = build_model(units=(16,16), activation=act, lr=1e-3)
    _ = train_and_plot(m, X, y, epochs=25, title=f'2 layers, 16-16, activation={act}')
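For the saturation question, it may help to look at the derivatives of the three activations directly. The NumPy sketch below is an illustration separate from the Keras models: `activation_grads` is a hypothetical helper, and the sample pre-activation values in `z` are arbitrary. It evaluates each derivative at a few points so you can see where the gradient shrinks toward zero.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def activation_grads(z):
    """Derivative of each hidden activation, evaluated at pre-activation z."""
    z = np.asarray(z, dtype='float64')
    s = sigmoid(z)
    return {
        'relu': (z > 0).astype('float64'),  # 1 for positive inputs, 0 otherwise
        'tanh': 1.0 - np.tanh(z) ** 2,      # peaks at 1 when z = 0
        'sigmoid': s * (1.0 - s),           # peaks at 0.25 when z = 0
    }

# Arbitrary sample points, chosen only to show small and large |z|
z = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])
for name, g in activation_grads(z).items():
    print(f"{name:>8}: {np.round(g, 4)}")
```

The tanh and sigmoid derivatives fall toward zero as |z| grows, which is what "saturation" refers to; ReLU keeps a derivative of 1 for any positive input but passes no gradient for negative inputs.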


## Task 3 — Learning Rate Sensitivity
Try a range of learning rates (e.g., 1e-4, 1e-3, 1e-2, 1e-1).
- What happens for very large values?

In [None]:
for lr in [1e-4, 1e-3, 1e-2, 1e-1]:
    m = build_model(units=(8,8), activation='relu', lr=lr)
    _ = train_and_plot(m, X, y, epochs=20, title=f'LR={lr}')
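As intuition for why very large learning rates can fail, here is a toy sketch of plain gradient descent on f(w) = w², where each step multiplies w by (1 - 2·lr). This is a deliberate simplification: the notebook trains with Adam, which rescales its steps, so the exact threshold seen here does not carry over to the Keras runs. The function name `gd_on_quadratic` and the learning rates tried are assumptions for illustration only.

```python
def gd_on_quadratic(lr, w0=1.0, steps=20):
    """Plain gradient descent on f(w) = w**2; returns the list of w values visited."""
    w = w0
    path = [w]
    for _ in range(steps):
        grad = 2.0 * w     # derivative of w**2
        w = w - lr * grad  # equivalent to w * (1 - 2 * lr)
        path.append(w)
    return path

for lr in [0.01, 0.1, 0.9, 1.1]:
    final = gd_on_quadratic(lr)[-1]
    print(f"lr={lr:<5} |w| after 20 steps: {abs(final):.3e}")
```

In this toy setting, rates below 0.5 shrink w steadily, rates between 0.5 and 1 overshoot the minimum on every step but still converge, and rates above 1 make |w| grow without bound.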


## Optional — Decision Regions
Visualize learned decision boundaries for your favorite model.

In [None]:
def plot_regions(model, X, y):
    """Shade the model's predicted probability over a 220x220 grid and overlay the data."""
    xx, yy = np.meshgrid(
        np.linspace(X[:,0].min()-0.5, X[:,0].max()+0.5, 220),
        np.linspace(X[:,1].min()-0.5, X[:,1].max()+0.5, 220)
    )
    grid = np.c_[xx.ravel(), yy.ravel()].astype('float32')
    probs = model.predict(grid, verbose=0).reshape(xx.shape)
    plt.figure(figsize=(5,4))
    plt.contourf(xx, yy, probs, levels=20, cmap='RdBu', alpha=0.6)
    plt.scatter(X[:,0], X[:,1], c=y, cmap='RdBu', edgecolor='k', s=10)
    plt.title('Decision regions')
    plt.show()

# Example usage:
m_best = build_model(units=(16,16), activation='relu', lr=1e-3)
_ = m_best.fit(X, y, batch_size=32, epochs=25, validation_split=0.2, verbose=0)
plot_regions(m_best, X, y)


## Short Reflection
Answer in your own words (2–4 sentences each):
1. How does adding a second hidden layer change what the model can represent?
2. When did you observe over- or under-fitting? What signaled it?
3. Which activation worked best here, and why do you think that is?
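For question 2, one rough way to back up what you see in the curves is to compute a couple of numbers from a training history. The sketch below is only a starting point: `fit_diagnostics` is a hypothetical helper, it assumes a Keras-style history dict with `loss` and `val_loss` keys, and the numbers it returns still need your own interpretation.

```python
def fit_diagnostics(hist):
    """Summarize a history dict: best validation epoch and final val/train loss gap."""
    val_loss = hist['val_loss']
    # index of the epoch with the lowest validation loss
    best_idx = min(range(len(val_loss)), key=val_loss.__getitem__)
    return {
        'best_val_epoch': best_idx + 1,                  # 1-based epoch number
        'epochs_run': len(val_loss),
        'final_gap': val_loss[-1] - hist['loss'][-1],    # positive: val loss above train loss
    }

# Usage with a run from Task 1 (assumes h2 exists):
# print(fit_diagnostics(h2.history))
```

A best epoch well before the last one together with a growing positive gap is a common sign of overfitting, while training and validation losses that both stay high point toward underfitting.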

## To-Do (stubs only — do not implement)
These are optional extensions meant for practice. Leave them as TODOs.

In [None]:
# TODO: Implement a manual NumPy forward pass for a 2-layer network
# (ReLU hidden, sigmoid output) on the same dataset.
# Compare predictions qualitatively to the Keras model.

In [None]:
# TODO: Write a function plot_regions_threshold(model, X, y, threshold=0.5)
# that visualizes decision regions for different probability thresholds.
# Try thresholds: 0.3, 0.5, 0.7 and note changes.

In [None]:
# TODO: Swap loss to mean-squared error (MSE) while keeping sigmoid output.
# Train briefly and record any differences in convergence/accuracy vs BCE.

In [None]:
# TODO: Add L2 weight decay (kernel_regularizer) or a Dropout layer
# and observe effects on training/validation curves.
# Keep other hyperparameters the same for a fair comparison.