# Neural Networks: Inside the Black Box
## A Complete Tutorial from Concept to Code

**Based on the StatQuest Neural Networks tutorial**

---

### Learning Objectives:
1. Understand what neural networks are and how they work
2. Learn about activation functions (Softplus, ReLU, Sigmoid)
3. Build intuition with real-world examples
4. Implement neural networks using libraries (TensorFlow/Keras)
5. Implement neural networks from scratch (NumPy)
6. Visualize neural network operations

---

## Slide 1: What is a Neural Network?

### Key Concept:
**Neural networks are NOT black boxes - they are "Big Fancy Squiggle Fitting Machines"!**

### Components:
1. **Nodes** (neurons)
2. **Connections** (synapses)
3. **Weights** (parameters that multiply inputs)
4. **Biases** (parameters that shift results)
5. **Activation Functions** (curved/bent lines that create non-linearity)

### Architecture:
- **Input Layer**: Where we feed data
- **Hidden Layer(s)**: Where transformations happen
- **Output Layer**: Where we get predictions

### The Magic:
Neural networks can fit complex patterns (squiggles) to data that simple straight lines cannot!
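
As a minimal sketch of how these components fit together, the snippet below computes the output of a single node: multiply the input by a weight, add a bias, and pass the result through an activation function. The helper name `node_output` and the example numbers are illustrative assumptions, not values from the video.

```python
import math

def softplus(x):
    """Softplus activation: ln(1 + e^x)."""
    return math.log1p(math.exp(x))

def node_output(x, weight, bias, activation=softplus):
    """One node: scale the input, shift it, then bend it with the activation."""
    return activation(x * weight + bias)

# Hypothetical example values chosen only for illustration
value = node_output(0.5, weight=2.0, bias=-1.0)  # softplus(0.5 * 2.0 - 1.0)
print(f"node output: {value:.4f}")
```

A full network is several of these nodes wired together, with each node's output feeding the next layer.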

In [None]:
# Install required packages (uncomment if needed)
# !pip install numpy matplotlib tensorflow scikit-learn

# Import all necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import FancyBboxPatch
import warnings
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
np.random.seed(42)

print("✓ All libraries imported successfully!")

## Slide 2: Real-World Problem - Drug Dosage Effectiveness

### The Dataset:
We tested a drug at different dosages and measured effectiveness:

| Dosage Level | Normalized Dosage | Effective? |
|--------------|-------------------|------------|
| Low          | 0.0 - 0.3        | No (0)     |
| Medium       | 0.4 - 0.6        | Yes (1)    |
| High         | 0.7 - 1.0        | No (0)     |

### The Challenge:
❌ A straight line cannot fit this data accurately!  
✓ We need a **curved line (squiggle)** to make good predictions

### Goal:
Build a neural network that predicts whether a given dosage will be effective.
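
To see why a straight line struggles here, the sketch below fits an ordinary least-squares line to three illustrative points (one low, one medium, one high dosage). The points and the helper `fit_line` are assumptions made for this example; it uses only the standard library.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Illustrative points: low and high dosages fail, a medium dosage works
dosages = [0.0, 0.5, 1.0]
effective = [0.0, 1.0, 0.0]

slope, intercept = fit_line(dosages, effective)
for d in dosages:
    print(f"dosage {d:.1f}: line predicts {slope * d + intercept:.2f}")
```

For these symmetric points the best-fitting line is flat, so it predicts the same effectiveness (about one third) for every dosage and never rises above the 0.5 threshold.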

In [None]:
# Slide 2 Code: Create the drug dosage dataset

# Generate synthetic data based on the problem description
np.random.seed(42)

# Low dosage (0-0.3): Not effective
low_dosage = np.random.uniform(0.0, 0.3, 15)
low_effectiveness = np.random.uniform(0.0, 0.2, 15)  # Close to 0

# Medium dosage (0.4-0.6): Effective
medium_dosage = np.random.uniform(0.4, 0.6, 15)
medium_effectiveness = np.random.uniform(0.8, 1.0, 15)  # Close to 1

# High dosage (0.7-1.0): Not effective
high_dosage = np.random.uniform(0.7, 1.0, 15)
high_effectiveness = np.random.uniform(0.0, 0.2, 15)  # Close to 0

# Combine all data
X_data = np.concatenate([low_dosage, medium_dosage, high_dosage])
y_data = np.concatenate([low_effectiveness, medium_effectiveness, high_effectiveness])

# Visualize the data
plt.figure(figsize=(10, 6))
plt.scatter(X_data, y_data, c='blue', s=100, alpha=0.6, edgecolors='black', linewidth=1.5)
plt.xlabel('Dosage (normalized 0-1)', fontsize=12, fontweight='bold')
plt.ylabel('Effectiveness (0=No, 1=Yes)', fontsize=12, fontweight='bold')
plt.title('Drug Dosage Effectiveness Dataset\n(Notice: No straight line can fit this!)', 
          fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.ylim(-0.1, 1.1)
plt.axhline(y=0.5, color='red', linestyle='--', alpha=0.5, label='Decision Boundary')
plt.legend()
plt.tight_layout()
plt.show()

print(f"Dataset created:")
print(f"  - Total samples: {len(X_data)}")
print(f"  - Dosage range: [{X_data.min():.2f}, {X_data.max():.2f}]")
print(f"  - Effectiveness range: [{y_data.min():.2f}, {y_data.max():.2f}]")

## Slide 3: Activation Functions - The Building Blocks

### What are Activation Functions?
Activation functions are **curved or bent lines** that allow neural networks to learn complex patterns.

### Three Common Activation Functions:

#### 1. **Softplus** (used in our example)
- Formula: \\( f(x) = \\ln(1 + e^x) \\)
- Smooth, differentiable approximation of ReLU
- Never returns exactly 0

#### 2. **ReLU** (Rectified Linear Unit)
- Formula: \\( f(x) = \\max(0, x) \\)
- A very common default for hidden layers in practice
- Simple: returns 0 for negative inputs, x for positive inputs

#### 3. **Sigmoid**
- Formula: \\( f(x) = \\frac{1}{1 + e^{-x}} \\)
- S-shaped curve
- Outputs between 0 and 1
- Often used in output layers for binary classification
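
The following standard-library sketch tabulates the three functions at a few inputs. It illustrates two of the properties listed above: Softplus stays strictly positive, and the gap between Softplus and ReLU is largest at 0 and shrinks as the input moves away from 0. The stable Softplus form used here is a choice made for this example.

```python
import math

def softplus(x):
    # Numerically stable form: max(x, 0) + ln(1 + e^(-|x|))
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

for x in [-10.0, -1.0, 0.0, 1.0, 10.0]:
    gap = softplus(x) - relu(x)  # how far softplus sits above ReLU
    print(f"x={x:6.1f}  softplus={softplus(x):.5f}  relu={relu(x):.1f}  "
          f"gap={gap:.5f}  sigmoid={sigmoid(x):.5f}")
```

At `x = 0` the gap equals `ln(2)`, and at `x = ±10` it is already smaller than `0.0001`.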

In [None]:
# Slide 3 Code: Implement and visualize activation functions

# Define activation functions
def softplus(x):
    """Softplus activation: ln(1 + e^x)"""
    return np.logaddexp(0, x)  # ln(e^0 + e^x) = ln(1 + e^x), without overflow

def relu(x):
    """ReLU activation: max(0, x)"""
    return np.maximum(0, x)

def sigmoid(x):
    """Sigmoid activation: 1 / (1 + e^(-x))"""
    return 1 / (1 + np.exp(-np.clip(x, -500, 500)))  # Clip to prevent overflow

# Create x values for plotting
x = np.linspace(-5, 5, 1000)

# Calculate activation function outputs
y_softplus = softplus(x)
y_relu = relu(x)
y_sigmoid = sigmoid(x)

# Create visualization
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Softplus
axes[0].plot(x, y_softplus, 'b-', linewidth=3, label='Softplus')
axes[0].grid(True, alpha=0.3)
axes[0].set_xlabel('Input (x)', fontweight='bold')
axes[0].set_ylabel('Output f(x)', fontweight='bold')
axes[0].set_title('Softplus Activation\nf(x) = ln(1 + e^x)', fontweight='bold')
axes[0].axhline(y=0, color='k', linestyle='-', linewidth=0.5)
axes[0].axvline(x=0, color='k', linestyle='-', linewidth=0.5)
axes[0].legend()

# ReLU
axes[1].plot(x, y_relu, 'r-', linewidth=3, label='ReLU')
axes[1].grid(True, alpha=0.3)
axes[1].set_xlabel('Input (x)', fontweight='bold')
axes[1].set_ylabel('Output f(x)', fontweight='bold')
axes[1].set_title('ReLU Activation\nf(x) = max(0, x)', fontweight='bold')
axes[1].axhline(y=0, color='k', linestyle='-', linewidth=0.5)
axes[1].axvline(x=0, color='k', linestyle='-', linewidth=0.5)
axes[1].legend()

# Sigmoid
axes[2].plot(x, y_sigmoid, 'g-', linewidth=3, label='Sigmoid')
axes[2].grid(True, alpha=0.3)
axes[2].set_xlabel('Input (x)', fontweight='bold')
axes[2].set_ylabel('Output f(x)', fontweight='bold')
axes[2].set_title('Sigmoid Activation\nf(x) = 1/(1 + e^(-x))', fontweight='bold')
axes[2].axhline(y=0, color='k', linestyle='-', linewidth=0.5)
axes[2].axvline(x=0, color='k', linestyle='-', linewidth=0.5)
axes[2].legend()

plt.tight_layout()
plt.show()

print("Key Properties:")
print("  Softplus: Smooth, always positive, approximates ReLU")
print("  ReLU: Simple, fast, most popular in practice")
print("  Sigmoid: S-shaped, outputs between 0 and 1")

## Slide 4: How Neural Networks Create Squiggles

### The Process (Step-by-Step):

1. **Start with identical activation functions** in the hidden layer

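2. **Transform each activation function:**
   - **Slice**: The input weight and bias decide which portion of the curve the dosage range 0 to 1 lands on
   - **Flip**: A negative output weight flips the curve upside down
   - **Stretch**: The output weight multiplies the y-values to change the amplitude
   - **Shift**: The hidden-node bias slides the curve left or right along the dosage axis

3. **Add transformed curves together** to create new shapes

4. **Final adjustment**: add the output bias to move the summed curve up or down so it fits the data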

### The Neural Network from the Video:
```
Input (dosage)
    ↓
Hidden Layer:
  Node 1: dosage × (-34.4) + 2.14 → softplus() → × (-1.3)
  Node 2: dosage × (-2.52) + 1.29 → softplus() → × 2.28
    ↓
Add both nodes + (-0.58) → Output (effectiveness)
```
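
One way to make the "slice" idea concrete is to compute which Softplus inputs each hidden node actually sees as the dosage goes from 0 to 1. The sketch below uses the weights and biases from the diagram; the helper name `pre_activation_range` is an assumption for this example.

```python
# Input weights and biases of the two hidden nodes, as listed in the diagram
hidden_nodes = {
    "Node 1": (-34.4, 2.14),
    "Node 2": (-2.52, 1.29),
}

def pre_activation_range(weight, bias, lo=0.0, hi=1.0):
    """Return the softplus inputs reached at the lowest and highest dosage."""
    return lo * weight + bias, hi * weight + bias

for name, (weight, bias) in hidden_nodes.items():
    z_at_0, z_at_1 = pre_activation_range(weight, bias)
    print(f"{name}: softplus input runs from {z_at_0:.2f} to {z_at_1:.2f}")
```

Node 1 sweeps across a long stretch of the curve, so its output falls to nearly zero almost immediately. Node 2 covers only a short stretch around the bend, so its output changes gently across the whole dosage range.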

In [None]:
# Slide 4 Code: Demonstrate how neural networks create squiggles

# Parameters from the StatQuest video
# Hidden Layer Node 1
w1_hidden1 = -34.4  # weight from input to hidden node 1
b1_hidden1 = 2.14   # bias for hidden node 1
w2_hidden1 = -1.3   # weight from hidden node 1 to output

# Hidden Layer Node 2
w1_hidden2 = -2.52  # weight from input to hidden node 2
b1_hidden2 = 1.29   # bias for hidden node 2
w2_hidden2 = 2.28   # weight from hidden node 2 to output

# Output bias
b_output = -0.58

# Create dosage range for visualization
dosage_range = np.linspace(0, 1, 1000)

# STEP 1: Calculate hidden layer node 1 outputs
z1 = dosage_range * w1_hidden1 + b1_hidden1  # Linear transformation
a1_before_scale = softplus(z1)                 # Apply activation
a1 = a1_before_scale * w2_hidden1             # Scale by output weight

# STEP 2: Calculate hidden layer node 2 outputs
z2 = dosage_range * w1_hidden2 + b1_hidden2  # Linear transformation
a2_before_scale = softplus(z2)                 # Apply activation
a2 = a2_before_scale * w2_hidden2             # Scale by output weight

# STEP 3: Combine both nodes
combined = a1 + a2

# STEP 4: Add output bias to get final prediction
final_output = combined + b_output

# Visualize the transformation process
fig, axes = plt.subplots(2, 3, figsize=(16, 10))

# Row 1: Node 1 before and after scaling, then Node 2 before scaling
axes[0, 0].plot(dosage_range, a1_before_scale, 'b-', linewidth=2)
axes[0, 0].set_title('Node 1: After Softplus\n(before scaling)', fontweight='bold')
axes[0, 0].set_xlabel('Dosage')
axes[0, 0].set_ylabel('Activation')
axes[0, 0].grid(True, alpha=0.3)

axes[0, 1].plot(dosage_range, a1, 'b-', linewidth=2)
axes[0, 1].set_title(f'Node 1: After Scaling\n(× {w2_hidden1})', fontweight='bold', color='blue')
axes[0, 1].set_xlabel('Dosage')
axes[0, 1].set_ylabel('Scaled Activation')
axes[0, 1].grid(True, alpha=0.3)

axes[0, 2].plot(dosage_range, a2_before_scale, 'orange', linewidth=2)
axes[0, 2].set_title('Node 2: After Softplus\n(before scaling)', fontweight='bold')
axes[0, 2].set_xlabel('Dosage')
axes[0, 2].set_ylabel('Activation')
axes[0, 2].grid(True, alpha=0.3)

# Row 2: Node 2 after scaling, the sum of both nodes, and the final output
axes[1, 0].plot(dosage_range, a2, 'orange', linewidth=2)
axes[1, 0].set_title(f'Node 2: After Scaling\n(× {w2_hidden2})', fontweight='bold', color='orange')
axes[1, 0].set_xlabel('Dosage')
axes[1, 0].set_ylabel('Scaled Activation')
axes[1, 0].grid(True, alpha=0.3)

axes[1, 1].plot(dosage_range, a1, 'b-', linewidth=2, label='Node 1', alpha=0.6)
axes[1, 1].plot(dosage_range, a2, 'orange', linewidth=2, label='Node 2', alpha=0.6)
axes[1, 1].plot(dosage_range, combined, 'purple', linewidth=3, label='Combined')
axes[1, 1].set_title('Adding Both Nodes', fontweight='bold')
axes[1, 1].set_xlabel('Dosage')
axes[1, 1].set_ylabel('Output')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

axes[1, 2].scatter(X_data, y_data, c='gray', s=50, alpha=0.5, label='Data', zorder=1)
axes[1, 2].plot(dosage_range, final_output, 'green', linewidth=4, label='Final Squiggle', zorder=2)
axes[1, 2].set_title('Final Output: The Green Squiggle!', fontweight='bold', color='green', fontsize=12)
axes[1, 2].set_xlabel('Dosage')
axes[1, 2].set_ylabel('Effectiveness')
axes[1, 2].legend()
axes[1, 2].grid(True, alpha=0.3)
axes[1, 2].set_ylim(-0.1, 1.1)

plt.tight_layout()
plt.show()

print("\n🎉 This is how neural networks create squiggles from curved activation functions!")
print("\nKey Insight:")
print("  - Each node transforms the activation function differently")
print("  - When added together, they create complex patterns")
print("  - This allows fitting to non-linear data!")

## Slide 5: Making Predictions with Neural Networks

### Using the Trained Network:

Once we have the trained weights and biases, making predictions is straightforward!

### Example Prediction:
**Question:** Will a dosage of 0.5 be effective?

**Step-by-step calculation:**
1. Node 1: `0.5 × (-34.4) + 2.14 = -15.06` → `softplus(-15.06) = 0.00` → `0.00 × (-1.3) = 0.00`
2. Node 2: `0.5 × (-2.52) + 1.29 = 0.03` → `softplus(0.03) = 0.708` → `0.708 × 2.28 = 1.61`
3. Output: `0.00 + 1.61 + (-0.58) = 1.03` → **Effective!** (above the 0.5 threshold)

### Decision Rule:
- Output > 0.5 → **Effective**
- Output ≤ 0.5 → **Not Effective**
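
The hand calculation for a dosage of 0.5 can be checked with a few lines of plain Python. This is a sketch that hard-codes the parameters quoted in this tutorial and applies the decision rule above; it uses only the standard library.

```python
import math

def softplus(x):
    return math.log1p(math.exp(x))

dosage = 0.5

# Hidden node 1: input weight -34.4, bias 2.14, output weight -1.3
z1 = dosage * -34.4 + 2.14
node1 = softplus(z1) * -1.3

# Hidden node 2: input weight -2.52, bias 1.29, output weight 2.28
z2 = dosage * -2.52 + 1.29
node2 = softplus(z2) * 2.28

# Output: sum of both nodes plus the output bias -0.58
output = node1 + node2 + -0.58
decision = "Effective" if output > 0.5 else "Not Effective"

print(f"z1 = {z1:.2f}, z2 = {z2:.2f}")
print(f"output = {output:.2f} -> {decision}")
```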

In [None]:
# Slide 5 Code: Make predictions with the neural network

def predict_effectiveness(dosage):
    """
    Predict drug effectiveness for a given dosage using our neural network.
    
    Parameters:
    -----------
    dosage : float
        Drug dosage (normalized between 0 and 1)
    
    Returns:
    --------
    prediction : float
        Raw network output; values above 0.5 are treated as effective
    """
    # Hidden layer node 1
    z1 = dosage * w1_hidden1 + b1_hidden1
    a1 = softplus(z1) * w2_hidden1
    
    # Hidden layer node 2
    z2 = dosage * w1_hidden2 + b1_hidden2
    a2 = softplus(z2) * w2_hidden2
    
    # Output
    output = a1 + a2 + b_output
    
    return output

# Test predictions at different dosages
test_dosages = [0.1, 0.3, 0.5, 0.7, 0.9]

print("\n" + "="*60)
print(" "*15 + "PREDICTION RESULTS")
print("="*60)
print(f"{'Dosage':<15} {'Prediction':<15} {'Decision':<20}")
print("-"*60)

for dosage in test_dosages:
    pred = predict_effectiveness(dosage)
    decision = "✓ EFFECTIVE" if pred > 0.5 else "✗ NOT EFFECTIVE"
    print(f"{dosage:<15.2f} {pred:<15.3f} {decision:<20}")

print("="*60)

# Visualize predictions
fig, ax = plt.subplots(figsize=(12, 6))

# Plot the neural network prediction curve
dosage_fine = np.linspace(0, 1, 1000)
predictions = [predict_effectiveness(d) for d in dosage_fine]

ax.plot(dosage_fine, predictions, 'green', linewidth=4, label='NN Prediction', zorder=3)
ax.scatter(X_data, y_data, c='lightblue', s=100, alpha=0.6, 
           edgecolors='black', linewidth=1, label='Training Data', zorder=2)

# Mark test predictions
test_predictions = [predict_effectiveness(d) for d in test_dosages]
ax.scatter(test_dosages, test_predictions, c='red', s=200, marker='*', 
           edgecolors='black', linewidth=2, label='Test Predictions', zorder=4)

# Add decision boundary
ax.axhline(y=0.5, color='red', linestyle='--', linewidth=2, alpha=0.5, label='Decision Boundary')
ax.fill_between(dosage_fine, 0.5, 1.1, alpha=0.1, color='green', label='Effective Zone')
ax.fill_between(dosage_fine, -0.1, 0.5, alpha=0.1, color='red', label='Not Effective Zone')

ax.set_xlabel('Dosage (normalized)', fontsize=12, fontweight='bold')
ax.set_ylabel('Predicted Effectiveness', fontsize=12, fontweight='bold')
ax.set_title('Neural Network Predictions for Drug Effectiveness', fontsize=14, fontweight='bold')
ax.legend(loc='upper right')
ax.grid(True, alpha=0.3)
ax.set_ylim(-0.1, 1.2)
ax.set_xlim(-0.05, 1.05)

plt.tight_layout()
plt.show()

print("\n✓ Neural network successfully predicts drug effectiveness!")

## Slide 6: Implementation with Libraries (TensorFlow/Keras)

### Why Use Libraries?
- **Faster development**: Pre-built optimized functions
- **GPU acceleration**: Uses a GPU automatically when a supported one is available
- **Automatic differentiation**: No manual gradient calculation
- **Production-ready**: Battle-tested code

### Our Network in Keras:
```python
model = Sequential([
    Dense(2, activation='softplus', input_shape=(1,)),  # Hidden layer with 2 nodes
    Dense(1)                                             # Output layer
])
```

### Training Process:
1. Define the model architecture
2. Compile with loss function and optimizer
3. Fit to training data
4. Evaluate and predict
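
Before training, it can help to count how many parameters the architecture has. The sketch below assumes the usual fully connected layer formula (one weight per input-unit pair plus one bias per unit); the helper `dense_param_count` is hypothetical and not part of Keras.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus biases (n_units) of a fully connected layer."""
    return n_inputs * n_units + n_units

hidden = dense_param_count(n_inputs=1, n_units=2)   # 2 weights + 2 biases
output = dense_param_count(n_inputs=2, n_units=1)   # 2 weights + 1 bias

print(f"hidden layer: {hidden} parameters")
print(f"output layer: {output} parameters")
print(f"total: {hidden + output} parameters")
```

The total matches the seven numbers in the StatQuest network: two input weights, two hidden biases, two output weights, and one output bias.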

In [None]:
# Slide 6 Code: Build neural network with TensorFlow/Keras

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

# Set random seeds for reproducibility
tf.random.set_seed(42)
np.random.seed(42)

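# Prepare data
# Shuffle first: validation_split takes the LAST 20% of rows, and X_data is
# ordered low -> medium -> high, so an unshuffled split would hold out only high dosages
shuffle_idx = np.random.permutation(len(X_data))
X_train = X_data[shuffle_idx].reshape(-1, 1)  # Reshape for Keras (samples, features)
y_train = y_data[shuffle_idx].reshape(-1, 1)  # Reshape for Keras (samples, outputs)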

print("Building Neural Network with Keras...\n")

# Define the model
model = Sequential([
    Dense(2, activation='softplus', input_shape=(1,), name='hidden_layer'),
    Dense(1, activation='linear', name='output_layer')
], name='Drug_Effectiveness_NN')

# Display model architecture
print("Model Architecture:")
model.summary()

# Compile the model
model.compile(
    optimizer=Adam(learning_rate=0.01),
    loss='mean_squared_error',
    metrics=['mae']  # Mean Absolute Error
)

print("\nTraining the neural network...")

# Train the model
history = model.fit(
    X_train, 
    y_train,
    epochs=500,
    batch_size=5,
    verbose=0,  # Silent training
    validation_split=0.2
)

print("✓ Training complete!\n")

# Evaluate the model on all samples (training and validation rows together)
loss, mae = model.evaluate(X_train, y_train, verbose=0)
print(f"Final Loss (all samples): {loss:.4f}")
print(f"Final Mean Absolute Error (all samples): {mae:.4f}")

# Make predictions
X_test_range = np.linspace(0, 1, 1000).reshape(-1, 1)
predictions_keras = model.predict(X_test_range, verbose=0)

# Visualize results
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))

# Plot 1: Training history
ax1.plot(history.history['loss'], label='Training Loss', linewidth=2)
ax1.plot(history.history['val_loss'], label='Validation Loss', linewidth=2)
ax1.set_xlabel('Epoch', fontweight='bold')
ax1.set_ylabel('Loss (MSE)', fontweight='bold')
ax1.set_title('Training History', fontweight='bold', fontsize=12)
ax1.legend()
ax1.grid(True, alpha=0.3)

# Plot 2: Predictions
ax2.scatter(X_data, y_data, c='blue', s=100, alpha=0.6, 
            edgecolors='black', linewidth=1.5, label='Training Data', zorder=2)
ax2.plot(X_test_range, predictions_keras, 'green', linewidth=3, 
         label='Keras NN Prediction', zorder=3)
ax2.axhline(y=0.5, color='red', linestyle='--', linewidth=1.5, alpha=0.5)
ax2.set_xlabel('Dosage', fontweight='bold')
ax2.set_ylabel('Effectiveness', fontweight='bold')
ax2.set_title('Keras Neural Network Fit', fontweight='bold', fontsize=12)
ax2.legend()
ax2.grid(True, alpha=0.3)
ax2.set_ylim(-0.1, 1.1)

plt.tight_layout()
plt.show()

print("\n🎉 Successfully trained neural network using Keras!")

## Slide 7: Implementation from Scratch (NumPy)

### Why Build from Scratch?
- **Deep understanding**: Learn exactly how neural networks work
- **Debugging skills**: Better at troubleshooting issues
- **Customization**: Complete control over every aspect
- **Interview prep**: Common technical interview topic

### Key Components to Implement:
1. **Forward Propagation**: Pass data through the network
2. **Loss Calculation**: Measure prediction error
3. **Backward Propagation**: Calculate gradients
4. **Weight Update**: Adjust parameters to reduce error

### Mathematical Foundation:
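- **Forward pass**: \\( \\hat{y} = f(W_2 \\cdot \\sigma(W_1 \\cdot x + b_1) + b_2) \\), where \\( \\sigma \\) is the hidden activation (softplus here) and \\( f \\) is the output activation (the identity here)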
- **Loss**: \\( L = \\frac{1}{n} \\sum (y - \\hat{y})^2 \\)
- **Backpropagation**: Chain rule to compute \\( \\frac{\\partial L}{\\partial W} \\) and \\( \\frac{\\partial L}{\\partial b} \\)
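
A common way to check a hand-derived gradient is to compare it with a finite-difference estimate. The sketch below does this for one input weight of the 1-2-1 network, using scalars and the standard library only. The parameter names (`w11`, `b11`, and so on), the three data points, and the starting values are assumptions chosen for illustration.

```python
import math

def softplus(z):
    return math.log1p(math.exp(z))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """1-2-1 network: two softplus hidden nodes and a linear output."""
    w11, w12, b11, b12, w21, w22, b2 = params
    a1 = softplus(w11 * x + b11)
    a2 = softplus(w12 * x + b12)
    return w21 * a1 + w22 * a2 + b2

def mse(params, xs, ys):
    return sum((forward(params, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad_w11(params, xs, ys):
    """Analytic dL/dw11 obtained with the chain rule."""
    w11, w12, b11, b12, w21, w22, b2 = params
    total = 0.0
    for x, y in zip(xs, ys):
        d_loss = 2.0 * (forward(params, x) - y) / len(xs)  # dL/dy_hat
        d_a1 = d_loss * w21                                 # times dy_hat/da1
        d_z1 = d_a1 * sigmoid(w11 * x + b11)                # times da1/dz1 (softplus' = sigmoid)
        total += d_z1 * x                                   # times dz1/dw11
    return total

# Illustrative data and arbitrary starting parameters (assumptions)
xs = [0.0, 0.5, 1.0]
ys = [0.0, 1.0, 0.0]
params = [0.3, -0.2, 0.1, 0.4, 0.5, -0.6, 0.05]

# Central finite difference on w11 (index 0 in params)
eps = 1e-6
plus = params.copy()
plus[0] += eps
minus = params.copy()
minus[0] -= eps
numeric = (mse(plus, xs, ys) - mse(minus, xs, ys)) / (2 * eps)
analytic = grad_w11(params, xs, ys)

print(f"analytic: {analytic:.8f}")
print(f"numeric:  {numeric:.8f}")
```

If the two numbers agree to several decimal places, the chain-rule derivation for that weight is very likely correct. The same check can be repeated for any other parameter by perturbing a different index.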

In [None]:
# Slide 7 Code: Neural Network from Scratch using NumPy

class NeuralNetworkFromScratch:
    """
    A simple neural network with:
    - 1 input node
    - 2 hidden nodes (softplus activation)
    - 1 output node (linear activation)
    """
    
    def __init__(self, learning_rate=0.01):
        """Initialize the neural network with random weights and biases."""
        self.lr = learning_rate
        
        # Initialize weights and biases with small random values
        np.random.seed(42)
        self.W1 = np.random.randn(1, 2) * 0.5  # Input to hidden (1x2)
        self.b1 = np.random.randn(1, 2) * 0.5  # Hidden layer bias (1x2)
        self.W2 = np.random.randn(2, 1) * 0.5  # Hidden to output (2x1)
        self.b2 = np.random.randn(1, 1) * 0.5  # Output bias (1x1)
        
        # Store training history
        self.loss_history = []
    
    def softplus(self, x):
        """Softplus activation function."""
        return np.logaddexp(0, x)  # Stable ln(1 + e^x)
    
    def softplus_derivative(self, x):
        """Derivative of softplus (sigmoid function)."""
        return 1 / (1 + np.exp(-np.clip(x, -500, 500)))
    
    def forward(self, X):
        """
        Forward propagation.
        
        Parameters:
        -----------
        X : numpy array, shape (n_samples, 1)
            Input data
        
        Returns:
        --------
        output : numpy array, shape (n_samples, 1)
            Network predictions
        """
        # Hidden layer
        self.z1 = np.dot(X, self.W1) + self.b1  # Linear combination
        self.a1 = self.softplus(self.z1)        # Activation
        
        # Output layer
        self.z2 = np.dot(self.a1, self.W2) + self.b2  # Linear combination
        self.output = self.z2                          # No activation (linear)
        
        return self.output
    
    def backward(self, X, y):
        """
        Backward propagation: compute gradients, then apply one gradient descent step.
        
        Parameters:
        -----------
        X : numpy array, shape (n_samples, 1)
            Input data
        y : numpy array, shape (n_samples, 1)
            True labels
        """
        n_samples = X.shape[0]
        
        # Output layer gradients
        d_loss = 2 * (self.output - y) / n_samples  # Derivative of MSE loss
        d_W2 = np.dot(self.a1.T, d_loss)             # Gradient for W2
        d_b2 = np.sum(d_loss, axis=0, keepdims=True) # Gradient for b2
        
        # Hidden layer gradients
        d_a1 = np.dot(d_loss, self.W2.T)                        # Backprop to hidden layer
        d_z1 = d_a1 * self.softplus_derivative(self.z1)        # Apply activation derivative
        d_W1 = np.dot(X.T, d_z1)                                 # Gradient for W1
        d_b1 = np.sum(d_z1, axis=0, keepdims=True)              # Gradient for b1
        
        # Update weights and biases using gradient descent
        self.W2 -= self.lr * d_W2
        self.b2 -= self.lr * d_b2
        self.W1 -= self.lr * d_W1
        self.b1 -= self.lr * d_b1
    
    def train(self, X, y, epochs=1000, verbose=True):
        """
        Train the neural network.
        
        Parameters:
        -----------
        X : numpy array, shape (n_samples, 1)
            Training data
        y : numpy array, shape (n_samples, 1)
            True labels
        epochs : int
            Number of training iterations
        verbose : bool
            Print training progress
        """
        for epoch in range(epochs):
            # Forward propagation
            predictions = self.forward(X)
            
            # Calculate loss (Mean Squared Error)
            loss = np.mean((predictions - y) ** 2)
            self.loss_history.append(loss)
            
            # Backward propagation
            self.backward(X, y)
            
            # Print progress
            if verbose and (epoch % 100 == 0 or epoch == epochs - 1):
                print(f"Epoch {epoch:4d}/{epochs}: Loss = {loss:.6f}")
    
    def predict(self, X):
        """Make predictions on new data."""
        return self.forward(X)

print("="*60)
print(" "*15 + "TRAINING FROM SCRATCH")
print("="*60)

# Create and train the neural network
nn_scratch = NeuralNetworkFromScratch(learning_rate=0.05)

# Prepare data
X_train_scratch = X_data.reshape(-1, 1)
y_train_scratch = y_data.reshape(-1, 1)

# Train the network
nn_scratch.train(X_train_scratch, y_train_scratch, epochs=1000, verbose=True)

print("\n✓ Training complete!")
print("\nFinal Parameters:")
print(f"  W1 (input to hidden): \n{nn_scratch.W1}")
print(f"  b1 (hidden bias): \n{nn_scratch.b1}")
print(f"  W2 (hidden to output): \n{nn_scratch.W2}")
print(f"  b2 (output bias): \n{nn_scratch.b2}")

In [None]:
# Visualize the from-scratch implementation results

# Generate predictions
X_test_scratch = np.linspace(0, 1, 1000).reshape(-1, 1)
predictions_scratch = nn_scratch.predict(X_test_scratch)

# Create comprehensive visualization
fig = plt.figure(figsize=(16, 10))
gs = fig.add_gridspec(2, 2, hspace=0.3, wspace=0.3)

# Plot 1: Loss curve
ax1 = fig.add_subplot(gs[0, 0])
ax1.plot(nn_scratch.loss_history, 'purple', linewidth=2)
ax1.set_xlabel('Epoch', fontweight='bold')
ax1.set_ylabel('Loss (MSE)', fontweight='bold')
ax1.set_title('Training Loss Over Time', fontweight='bold', fontsize=12)
ax1.grid(True, alpha=0.3)
ax1.set_yscale('log')

# Plot 2: Predictions vs True Data
ax2 = fig.add_subplot(gs[0, 1])
ax2.scatter(X_data, y_data, c='blue', s=100, alpha=0.6, 
            edgecolors='black', linewidth=1.5, label='True Data', zorder=2)
ax2.plot(X_test_scratch, predictions_scratch, 'red', linewidth=3, 
         label='From-Scratch Prediction', zorder=3)
ax2.axhline(y=0.5, color='gray', linestyle='--', linewidth=1.5, alpha=0.5)
ax2.set_xlabel('Dosage', fontweight='bold')
ax2.set_ylabel('Effectiveness', fontweight='bold')
ax2.set_title('Neural Network Fit (From Scratch)', fontweight='bold', fontsize=12)
ax2.legend()
ax2.grid(True, alpha=0.3)
ax2.set_ylim(-0.1, 1.1)

# Plot 3: Hidden Layer Activations
ax3 = fig.add_subplot(gs[1, 0])
# Forward pass to get activations
_ = nn_scratch.forward(X_test_scratch)
ax3.plot(X_test_scratch, nn_scratch.a1[:, 0], 'blue', linewidth=2, label='Hidden Node 1', alpha=0.7)
ax3.plot(X_test_scratch, nn_scratch.a1[:, 1], 'orange', linewidth=2, label='Hidden Node 2', alpha=0.7)
ax3.set_xlabel('Dosage', fontweight='bold')
ax3.set_ylabel('Activation Value', fontweight='bold')
ax3.set_title('Hidden Layer Activations', fontweight='bold', fontsize=12)
ax3.legend()
ax3.grid(True, alpha=0.3)

# Plot 4: Comparison with manual calculation
ax4 = fig.add_subplot(gs[1, 1])
# Use the original StatQuest parameters for comparison
manual_predictions = np.array([predict_effectiveness(x[0]) for x in X_test_scratch])
ax4.plot(X_test_scratch, manual_predictions, 'green', linewidth=3, 
         label='StatQuest Manual', alpha=0.7, linestyle='--')
ax4.plot(X_test_scratch, predictions_scratch, 'red', linewidth=2, 
         label='From-Scratch Trained', alpha=0.7)
ax4.scatter(X_data, y_data, c='blue', s=50, alpha=0.4, 
            edgecolors='black', linewidth=1, label='Data', zorder=1)
ax4.set_xlabel('Dosage', fontweight='bold')
ax4.set_ylabel('Effectiveness', fontweight='bold')
ax4.set_title('Comparison: Manual vs Trained', fontweight='bold', fontsize=12)
ax4.legend()
ax4.grid(True, alpha=0.3)
ax4.set_ylim(-0.1, 1.2)

plt.suptitle('From-Scratch Neural Network: Complete Analysis', 
             fontsize=16, fontweight='bold', y=0.995)
plt.show()

print("\n🎉 Successfully implemented and trained neural network from scratch!")
print("\nKey Achievement:")
print("  - Implemented forward propagation")
print("  - Implemented backward propagation (backprop)")
print("  - Trained using gradient descent")
print("  - Compare the red curve with the data above to judge the fit")

## Slide 8: Understanding Backpropagation

### What is Backpropagation?
**Backpropagation** is the algorithm that efficiently computes the gradient of the loss with respect to every weight and bias. Gradient descent then uses those gradients to update the parameters.

### The Process:
1. **Forward Pass**: Calculate predictions
2. **Calculate Loss**: Measure error between predictions and true values
3. **Backward Pass**: Calculate gradients using the chain rule
4. **Update Weights**: Adjust parameters to reduce loss

### Mathematical Foundation:
For each weight \\(w\\), we want to know: "How does changing \\(w\\) affect the loss?"

This is the partial derivative: \\( \\frac{\\partial L}{\\partial w} \\)

**Chain Rule** allows us to compute this by breaking it into steps. For a weight \\(w\\) that feeds a hidden node with pre-activation \\(z\\) and activation \\(a\\):
\\[ \\frac{\\partial L}{\\partial w} = \\frac{\\partial L}{\\partial \\hat{y}} \\times \\frac{\\partial \\hat{y}}{\\partial a} \\times \\frac{\\partial a}{\\partial z} \\times \\frac{\\partial z}{\\partial w} \\]

### Gradient Descent Update:
\\[ w_{\\text{new}} = w_{\\text{old}} - \\alpha \\frac{\\partial L}{\\partial w} \\]

where \\(\\alpha\\) is the learning rate.
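
To get a feel for the role of the learning rate, the sketch below applies the update rule to a toy loss, `L(w) = (w - 3)^2`, with several learning rates. The toy loss and the helper `gradient_descent` are assumptions for illustration; real network losses are not this simple.

```python
def gradient_descent(w, learning_rate, steps):
    """Minimise L(w) = (w - 3)^2, whose gradient is 2 * (w - 3)."""
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)
        w = w - learning_rate * grad
    return w

for lr in [0.1, 0.5, 1.0, 1.1]:
    w_final = gradient_descent(w=0.0, learning_rate=lr, steps=20)
    print(f"learning rate {lr}: w after 20 steps = {w_final:.4f}")
```

For this loss each step multiplies the distance to the minimum by `1 - 2 * learning_rate`. A rate of 0.1 shrinks it steadily, 0.5 lands on the minimum in one step, 1.0 bounces between two points forever, and 1.1 moves further away on every step.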

In [None]:
# Slide 8 Code: Visualize gradient descent on a simple loss function

def visualize_gradient_descent():
    """
    Visualize how gradient descent updates weights to minimize loss.
    """
    # Simple 1D example: minimize (w - 3)^2
    def loss_function(w):
        return (w - 3) ** 2
    
    def gradient(w):
        return 2 * (w - 3)
    
    # Initialize weight and learning rate
    w = 0.0
    learning_rate = 0.1
    
    # Track optimization path
    w_history = [w]
    loss_history = [loss_function(w)]
    
    # Gradient descent iterations
    for i in range(20):
        grad = gradient(w)
        w = w - learning_rate * grad
        w_history.append(w)
        loss_history.append(loss_function(w))
    
    # Create visualization
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))
    
    # Plot 1: Loss surface with gradient descent path
    w_range = np.linspace(-1, 5, 300)
    loss_range = loss_function(w_range)
    
    ax1.plot(w_range, loss_range, 'b-', linewidth=2, label='Loss Function')
    ax1.plot(w_history, loss_history, 'ro-', markersize=8, linewidth=2, 
             label='Gradient Descent Path', alpha=0.7)
    ax1.plot(w_history[0], loss_history[0], 'go', markersize=15, 
             label='Start', zorder=5)
    ax1.plot(w_history[-1], loss_history[-1], 'r*', markersize=20, 
             label='End (near optimum)', zorder=5)
    
    # Add arrows to show direction
    for i in range(0, len(w_history)-1, 2):
        ax1.annotate('', xy=(w_history[i+1], loss_history[i+1]), 
                     xytext=(w_history[i], loss_history[i]),
                     arrowprops=dict(arrowstyle='->', color='red', lw=1.5, alpha=0.6))
    
    ax1.set_xlabel('Weight (w)', fontweight='bold', fontsize=12)
    ax1.set_ylabel('Loss', fontweight='bold', fontsize=12)
    ax1.set_title('Gradient Descent Optimization Path', fontweight='bold', fontsize=13)
    ax1.legend(loc='upper right')
    ax1.grid(True, alpha=0.3)
    
    # Plot 2: Loss over iterations
    ax2.plot(loss_history, 'purple', linewidth=3, marker='o', markersize=6)
    ax2.set_xlabel('Iteration', fontweight='bold', fontsize=12)
    ax2.set_ylabel('Loss', fontweight='bold', fontsize=12)
    ax2.set_title('Loss Reduction Over Iterations', fontweight='bold', fontsize=13)
    ax2.grid(True, alpha=0.3)
    ax2.set_yscale('log')
    
    plt.tight_layout()
    plt.show()
    
    print("\nGradient Descent Summary:")
    print(f"  Starting weight: {w_history[0]:.4f}")
    print(f"  Final weight: {w_history[-1]:.4f}")
    print(f"  Starting loss: {loss_history[0]:.4f}")
    print(f"  Final loss: {loss_history[-1]:.6f}")
    print(f"  Iterations: {len(w_history) - 1}")

# Visualize gradient descent
visualize_gradient_descent()

print("\n" + "="*60)
print("Key Insights:")
print("="*60)
print("1. Gradients tell us the direction to move weights")
print("2. Learning rate controls the step size")
print("3. We iteratively move towards the minimum loss")
print("4. This process is called 'training' the neural network")
print("="*60)

## Slide 9: Comparing Implementations

### Three Approaches:

| Aspect | Manual (StatQuest) | Library (Keras) | From Scratch (NumPy) |
|--------|-------------------|-----------------|----------------------|
| **Difficulty** | Easy (given params) | Very Easy | Hard |
| **Flexibility** | None | Medium | Complete |
| **Speed** | Fast (one fixed formula) | Fast at scale (optimized kernels, optional GPU) | Slow at scale (unoptimized) |
| **Learning Value** | High (concepts) | Low | Very High |
| **Production Use** | No | Yes | No |
| **Understanding** | Conceptual | Black box | Deep |

### When to Use Each:
- **Manual**: Learning concepts, understanding mechanics
- **Library**: Production systems, rapid prototyping, research
- **From Scratch**: Education, interviews, custom algorithms

### Best Practice:
1. Learn from scratch first (understand fundamentals)
2. Use libraries for real applications (efficiency)
3. Keep manual calculations for debugging (verification)
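
As a sketch of the third practice, the snippet below checks a layer-by-layer forward pass against the scalar formula with the StatQuest parameters quoted in this tutorial. The function names `manual_forward` and `layered_forward` are assumptions for this example, and plain Python lists stand in for the weight matrices.

```python
import math

def softplus(z):
    return math.log1p(math.exp(z))

def manual_forward(dosage):
    """Scalar formula with the parameters quoted in this tutorial."""
    node1 = softplus(dosage * -34.4 + 2.14) * -1.3
    node2 = softplus(dosage * -2.52 + 1.29) * 2.28
    return node1 + node2 - 0.58

def layered_forward(dosage, W1, b1, W2, b2):
    """Same network written layer by layer, as a library or NumPy version would."""
    hidden = [softplus(dosage * w + b) for w, b in zip(W1, b1)]
    return sum(h * w for h, w in zip(hidden, W2)) + b2

W1, b1 = [-34.4, -2.52], [2.14, 1.29]
W2, b2 = [-1.3, 2.28], -0.58

for dosage in [0.0, 0.25, 0.5, 0.75, 1.0]:
    manual = manual_forward(dosage)
    layered = layered_forward(dosage, W1, b1, W2, b2)
    assert math.isclose(manual, layered, abs_tol=1e-12)
    print(f"dosage {dosage:.2f}: manual={manual:.4f}  layered={layered:.4f}")
```

If a refactored or library-based forward pass ever disagrees with the manual formula for the same parameters, the assertion fails and points to a wiring mistake rather than a training problem.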

In [None]:
# Slide 9 Code: Compare all three implementations

# Generate test data
X_compare = np.linspace(0, 1, 500).reshape(-1, 1)

# Get predictions from all three methods
pred_manual = np.array([predict_effectiveness(x[0]) for x in X_compare])
pred_keras = model.predict(X_compare, verbose=0).flatten()
pred_scratch = nn_scratch.predict(X_compare).flatten()

# Create comprehensive comparison visualization
fig = plt.figure(figsize=(16, 10))
gs = fig.add_gridspec(2, 2, hspace=0.3, wspace=0.3)

# Plot 1: All predictions together
ax1 = fig.add_subplot(gs[0, :])
ax1.scatter(X_data, y_data, c='lightblue', s=100, alpha=0.6, 
            edgecolors='black', linewidth=1.5, label='Training Data', zorder=1)
ax1.plot(X_compare, pred_manual, 'green', linewidth=3, 
         label='Manual (StatQuest)', linestyle='--', alpha=0.8)
ax1.plot(X_compare, pred_keras, 'red', linewidth=2.5, 
         label='Keras (Library)', alpha=0.7)
ax1.plot(X_compare, pred_scratch, 'purple', linewidth=2, 
         label='NumPy (From Scratch)', alpha=0.7, linestyle=':')
ax1.axhline(y=0.5, color='gray', linestyle='--', linewidth=1.5, alpha=0.5)
ax1.set_xlabel('Dosage', fontweight='bold', fontsize=12)
ax1.set_ylabel('Effectiveness', fontweight='bold', fontsize=12)
ax1.set_title('Comparison of All Three Implementations', fontweight='bold', fontsize=14)
ax1.legend(loc='upper right', fontsize=11)
ax1.grid(True, alpha=0.3)
ax1.set_ylim(-0.1, 1.2)

# Plot 2: Prediction differences from manual
ax2 = fig.add_subplot(gs[1, 0])
diff_keras = np.abs(pred_keras - pred_manual)
diff_scratch = np.abs(pred_scratch - pred_manual)
ax2.plot(X_compare, diff_keras, 'red', linewidth=2, label='Keras vs Manual')
ax2.plot(X_compare, diff_scratch, 'purple', linewidth=2, label='Scratch vs Manual')
ax2.set_xlabel('Dosage', fontweight='bold')
ax2.set_ylabel('Absolute Difference', fontweight='bold')
ax2.set_title('Prediction Differences from Manual Method', fontweight='bold')
ax2.legend()
ax2.grid(True, alpha=0.3)

# Plot 3: Performance metrics
ax3 = fig.add_subplot(gs[1, 1])

# Calculate MSE for each method on training data
X_train_col = X_data.reshape(-1, 1)  # column vector: one sample per row
pred_manual_train = np.array([predict_effectiveness(x[0]) for x in X_train_col])
pred_keras_train = model.predict(X_train_col, verbose=0).flatten()
pred_scratch_train = nn_scratch.predict(X_train_col).flatten()

mse_manual = np.mean((pred_manual_train - y_data) ** 2)
mse_keras = np.mean((pred_keras_train - y_data) ** 2)
mse_scratch = np.mean((pred_scratch_train - y_data) ** 2)

methods = ['Manual\n(StatQuest)', 'Keras\n(Library)', 'NumPy\n(Scratch)']
mse_values = [mse_manual, mse_keras, mse_scratch]
colors = ['green', 'red', 'purple']

bars = ax3.bar(methods, mse_values, color=colors, alpha=0.7, edgecolor='black', linewidth=2)
ax3.set_ylabel('Mean Squared Error', fontweight='bold', fontsize=11)
ax3.set_title('Training Performance Comparison', fontweight='bold', fontsize=12)
ax3.grid(True, alpha=0.3, axis='y')

# Add value labels on bars
for bar, value in zip(bars, mse_values):
    height = bar.get_height()
    ax3.text(bar.get_x() + bar.get_width()/2., height,
             f'{value:.4f}',
             ha='center', va='bottom', fontweight='bold')

plt.suptitle('Neural Network Implementation Comparison', 
             fontsize=16, fontweight='bold', y=0.995)
plt.show()

# Print detailed comparison
print("\n" + "="*70)
print(" "*20 + "IMPLEMENTATION COMPARISON")
print("="*70)
print(f"{'Method':<20} {'MSE':<15} {'Max Error':<15} {'Mean Error':<15}")
print("-"*70)

max_err_manual = np.max(np.abs(pred_manual_train - y_data))
max_err_keras = np.max(np.abs(pred_keras_train - y_data))
max_err_scratch = np.max(np.abs(pred_scratch_train - y_data))

mean_err_manual = np.mean(np.abs(pred_manual_train - y_data))
mean_err_keras = np.mean(np.abs(pred_keras_train - y_data))
mean_err_scratch = np.mean(np.abs(pred_scratch_train - y_data))

print(f"{'Manual (StatQuest)':<20} {mse_manual:<15.6f} {max_err_manual:<15.6f} {mean_err_manual:<15.6f}")
print(f"{'Keras (Library)':<20} {mse_keras:<15.6f} {max_err_keras:<15.6f} {mean_err_keras:<15.6f}")
print(f"{'NumPy (Scratch)':<20} {mse_scratch:<15.6f} {max_err_scratch:<15.6f} {mean_err_scratch:<15.6f}")
print("="*70)

print("\n✓ Comparison complete. Lower MSE means a closer fit to the training data.")

## Slide 10: Key Takeaways and Next Steps

### What We Learned:

#### 1. **Neural Networks are Squiggle Fitting Machines**
   - They combine simple activation curves, scaled by weights and shifted by biases, into complex patterns
   - Multiple nodes work together to fit non-linear data
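
A minimal sketch of how two nodes combine into one squiggle. The parameter values are illustrative assumptions, picked so that the output rises and then falls over dosages 0 to 1; they are not read from the cells above, and `squiggle` is a hypothetical helper name.

```python
import numpy as np

def softplus(z):
    """Softplus activation, log(1 + e^z)."""
    return np.logaddexp(0.0, z)

def squiggle(dosage):
    """Sum two scaled, shifted softplus curves into one bump-shaped output."""
    # Each hidden node bends the input in its own way (assumed weights and biases)
    node_1 = softplus(-34.4 * dosage + 2.14)
    node_2 = softplus(-2.52 * dosage + 1.29)
    # The output node scales each curve, adds them together, and shifts the sum
    return -1.30 * node_1 + 2.28 * node_2 - 0.58

for d in (0.0, 0.5, 1.0):
    print(f"dosage={d:.1f} -> output={squiggle(d):.2f}")
```

Each hidden node on its own only decreases as dosage grows. With these values, the weighted sum is close to 0 at both ends and close to 1 in the middle, which no single straight line or single curve could do.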

#### 2. **Key Components**
   - **Nodes**: Process information
   - **Weights**: Control strength of connections
   - **Biases**: Shift outputs
   - **Activation Functions**: Introduce non-linearity
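
The roles of these components can be seen on a single node. This is a sketch under assumed values; `node_output` is a hypothetical helper, not a function defined earlier in the notebook.

```python
import numpy as np

def softplus(z):
    """Softplus activation, log(1 + e^z)."""
    return np.logaddexp(0.0, z)

def node_output(x, weight, bias, activation=softplus):
    """One node: multiply by the weight, add the bias, apply the activation."""
    return activation(weight * x + bias)

x = np.linspace(-2.0, 2.0, 5)   # [-2, -1, 0, 1, 2]

base = node_output(x, weight=1.0, bias=0.0)
steeper = node_output(x, weight=3.0, bias=0.0)   # larger weight: sharper bend
shifted = node_output(x, weight=1.0, bias=1.0)   # bias: the bend moves one unit to the left
linear = node_output(x, weight=1.0, bias=0.0, activation=lambda z: z)  # identity: stays a straight line

for name, values in [("base", base), ("steeper", steeper),
                     ("shifted", shifted), ("linear", linear)]:
    print(f"{name:>8}: {np.round(values, 3)}")
```

Without an activation function the node can only produce a straight line, however the weight and bias are set; the bend comes entirely from the activation.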

#### 3. **Training Process**
   - Forward propagation: Make predictions
   - Calculate loss: Measure error
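   - Backpropagation: Calculate the gradient of the loss with respect to each weight and bias
   - Update weights: Step each parameter against its gradient, scaled by the learning rate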
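
The four steps above can be sketched as a short NumPy loop. Everything specific here is an assumption made for illustration: the five toy data points, the three hidden nodes, the learning rate, and the number of epochs. It is not the notebook's own from-scratch class.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed): effectiveness peaks at medium dosage
X = np.array([[0.0], [0.25], [0.5], [0.75], [1.0]])
y = np.array([[0.0], [0.5], [1.0], [0.5], [0.0]])

def softplus(z):
    return np.logaddexp(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # the derivative of softplus

# 1 input -> 3 softplus hidden nodes -> 1 linear output
W1 = rng.normal(size=(1, 3))
b1 = np.zeros(3)
W2 = rng.normal(size=(3, 1))
b2 = np.zeros(1)
learning_rate = 0.05
losses = []

for epoch in range(2000):
    # 1. Forward propagation: make predictions
    z1 = X @ W1 + b1
    a1 = softplus(z1)
    y_hat = a1 @ W2 + b2

    # 2. Calculate loss: mean squared error
    error = y_hat - y
    losses.append(np.mean(error ** 2))

    # 3. Backpropagation: chain rule from the loss back to every parameter
    d_yhat = 2.0 * error / len(X)
    dW2 = a1.T @ d_yhat
    db2 = d_yhat.sum(axis=0)
    d_z1 = (d_yhat @ W2.T) * sigmoid(z1)
    dW1 = X.T @ d_z1
    db1 = d_z1.sum(axis=0)

    # 4. Update weights: step against the gradient
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

With so few steps and plain gradient descent the fit may stay rough; the point of the sketch is the order of the four steps, not the final accuracy.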

#### 4. **Implementation Approaches**
   - Libraries (Keras): Fast, production-ready
   - From scratch (NumPy): Deep understanding
   - Manual calculation: Conceptual learning

### Next Steps:
1. **Deep Learning**: Networks with many hidden layers
2. **Convolutional Neural Networks (CNNs)**: For images
3. **Recurrent Neural Networks (RNNs)**: For sequences
4. **Advanced optimization**: Adam, RMSprop, etc.
5. **Regularization**: Preventing overfitting

### Resources:
- StatQuest YouTube Channel
- Deep Learning Specialization (Coursera)
- Neural Networks and Deep Learning (Nielsen)
- TensorFlow and PyTorch documentation

In [None]:
# Slide 10 Code: Final summary visualization

# Create a comprehensive final summary
fig = plt.figure(figsize=(16, 12))
gs = fig.add_gridspec(3, 2, hspace=0.4, wspace=0.3)

# Plot 1: Neural Network Architecture Diagram
ax1 = fig.add_subplot(gs[0, :])
ax1.text(0.5, 0.9, 'Neural Network Architecture', 
         ha='center', va='top', fontsize=16, fontweight='bold')

# Draw network structure
# Input node
circle1 = plt.Circle((0.2, 0.5), 0.08, color='lightblue', ec='black', linewidth=2, zorder=3)
ax1.add_patch(circle1)
ax1.text(0.2, 0.5, 'Input\n(Dosage)', ha='center', va='center', fontsize=9, fontweight='bold')

# Hidden layer nodes
circle2 = plt.Circle((0.5, 0.65), 0.08, color='lightgreen', ec='black', linewidth=2, zorder=3)
circle3 = plt.Circle((0.5, 0.35), 0.08, color='lightgreen', ec='black', linewidth=2, zorder=3)
ax1.add_patch(circle2)
ax1.add_patch(circle3)
ax1.text(0.5, 0.65, 'Hidden\nNode 1', ha='center', va='center', fontsize=8, fontweight='bold')
ax1.text(0.5, 0.35, 'Hidden\nNode 2', ha='center', va='center', fontsize=8, fontweight='bold')

# Output node
circle4 = plt.Circle((0.8, 0.5), 0.08, color='lightcoral', ec='black', linewidth=2, zorder=3)
ax1.add_patch(circle4)
ax1.text(0.8, 0.5, 'Output\n(Effect.)', ha='center', va='center', fontsize=8, fontweight='bold')

# Draw connections
connections = [
    ((0.2, 0.5), (0.5, 0.65)),
    ((0.2, 0.5), (0.5, 0.35)),
    ((0.5, 0.65), (0.8, 0.5)),
    ((0.5, 0.35), (0.8, 0.5))
]
for start, end in connections:
    ax1.plot([start[0], end[0]], [start[1], end[1]], 'k-', linewidth=2, alpha=0.5, zorder=1)

# Add labels
ax1.text(0.35, 0.73, 'W, b', ha='center', fontsize=9, style='italic', color='red')
ax1.text(0.35, 0.43, 'W, b', ha='center', fontsize=9, style='italic', color='red')
ax1.text(0.65, 0.6, 'W', ha='center', fontsize=9, style='italic', color='red')
ax1.text(0.65, 0.45, 'W', ha='center', fontsize=9, style='italic', color='red')

ax1.set_xlim(0, 1)
ax1.set_ylim(0.2, 1)
ax1.axis('off')

# Plot 2: Activation functions
ax2 = fig.add_subplot(gs[1, 0])
x_act = np.linspace(-3, 3, 200)
ax2.plot(x_act, softplus(x_act), 'b-', linewidth=2, label='Softplus')
ax2.plot(x_act, relu(x_act), 'r-', linewidth=2, label='ReLU')
ax2.plot(x_act, sigmoid(x_act), 'g-', linewidth=2, label='Sigmoid')
ax2.set_xlabel('Input', fontweight='bold')
ax2.set_ylabel('Output', fontweight='bold')
ax2.set_title('Activation Functions', fontweight='bold')
ax2.legend()
ax2.grid(True, alpha=0.3)
ax2.axhline(y=0, color='k', linestyle='-', linewidth=0.5)
ax2.axvline(x=0, color='k', linestyle='-', linewidth=0.5)

# Plot 3: Final fit
ax3 = fig.add_subplot(gs[1, 1])
ax3.scatter(X_data, y_data, c='blue', s=100, alpha=0.6, 
            edgecolors='black', linewidth=1.5, label='Data', zorder=2)
ax3.plot(X_compare, pred_keras, 'green', linewidth=3, 
         label='NN Prediction', zorder=3)
ax3.axhline(y=0.5, color='red', linestyle='--', linewidth=1.5, alpha=0.5)
ax3.fill_between(X_compare.flatten(), 0.5, 1.2, alpha=0.1, color='green')
ax3.fill_between(X_compare.flatten(), -0.1, 0.5, alpha=0.1, color='red')
ax3.set_xlabel('Dosage', fontweight='bold')
ax3.set_ylabel('Effectiveness', fontweight='bold')
ax3.set_title('Final Result: The Fitted Squiggle', fontweight='bold')
ax3.legend()
ax3.grid(True, alpha=0.3)
ax3.set_ylim(-0.1, 1.2)

# Plot 4: Training process
ax4 = fig.add_subplot(gs[2, 0])
ax4.text(0.5, 0.9, 'Training Process (Backpropagation)', 
         ha='center', va='top', fontsize=12, fontweight='bold', transform=ax4.transAxes)
steps = ['1. Forward\nPass', '2. Calculate\nLoss', '3. Backward\nPass', '4. Update\nWeights']
x_pos = [0.15, 0.38, 0.62, 0.85]
for i, (step, x) in enumerate(zip(steps, x_pos)):
    rect = FancyBboxPatch((x-0.08, 0.4), 0.16, 0.3, 
                          boxstyle="round,pad=0.01", 
                          edgecolor='black', facecolor=plt.cm.viridis(i/4), 
                          linewidth=2, transform=ax4.transAxes, zorder=2)
    ax4.add_patch(rect)
    ax4.text(x, 0.55, step, ha='center', va='center', fontsize=9, 
             fontweight='bold', transform=ax4.transAxes, zorder=3)
    if i < len(steps) - 1:
        # annotate() positions points via xycoords, not transform
        ax4.annotate('', xy=(x_pos[i+1]-0.08, 0.55), xytext=(x+0.08, 0.55),
                     arrowprops=dict(arrowstyle='->', lw=2, color='black'),
                     xycoords='axes fraction')
# Dashed arrow from the last step back to the first: the loop repeats
ax4.annotate('', xy=(0.15-0.08, 0.45), xytext=(0.85+0.08, 0.65),
             arrowprops=dict(arrowstyle='->', lw=2, color='red', linestyle='--'),
             xycoords='axes fraction')
ax4.text(0.5, 0.2, 'Repeat until convergence', ha='center', va='center', 
         fontsize=10, style='italic', color='red', transform=ax4.transAxes)
ax4.set_xlim(0, 1)
ax4.set_ylim(0, 1)
ax4.axis('off')

# Plot 5: Key concepts
ax5 = fig.add_subplot(gs[2, 1])
ax5.text(0.5, 0.95, 'Key Concepts', ha='center', va='top', 
         fontsize=12, fontweight='bold', transform=ax5.transAxes)

concepts = [
    '✓ Neural Networks = Squiggle Fitters',
    '✓ Activation Functions = Building Blocks',
    '✓ Weights & Biases = Learned Parameters',
    '✓ Backpropagation = Computes the Gradients',
    '✓ Non-linear Problems = Perfect Use Case'
]

for i, concept in enumerate(concepts):
    y_pos = 0.75 - i * 0.12
    ax5.text(0.1, y_pos, concept, ha='left', va='center', 
             fontsize=10, transform=ax5.transAxes,
             bbox=dict(boxstyle='round', facecolor='lightyellow', 
                      edgecolor='black', linewidth=1.5))

ax5.set_xlim(0, 1)
ax5.set_ylim(0, 1)
ax5.axis('off')

plt.suptitle('Neural Networks: Complete Summary', 
             fontsize=18, fontweight='bold', y=0.98)
plt.show()

print("\n" + "="*70)
print(" "*15 + "🎉 CONGRATULATIONS! 🎉")
print("="*70)
print("\nYou have successfully completed the Neural Networks tutorial!")
print("\nYou now understand:")
print("  ✓ What neural networks are and how they work")
print("  ✓ How activation functions create non-linear patterns")
print("  ✓ How to implement NNs with libraries (Keras)")
print("  ✓ How to implement NNs from scratch (NumPy)")
print("  ✓ How backpropagation trains neural networks")
print("\nNext steps: Explore deep learning, CNNs, and RNNs!")
print("="*70)
print("\n💡 Remember: Neural Networks are just Big Fancy Squiggle Fitting Machines!")
print("\n📚 Keep learning and Quest On!")
print("="*70)

## Bonus: Interactive Exploration

### Experiment with Parameters!

Try modifying these parameters to see how they affect the neural network:

1. **Number of hidden nodes**: Change from 2 to 3, 4, or more
2. **Activation functions**: Try ReLU or sigmoid instead of softplus
3. **Learning rate**: Increase or decrease to see effect on training
4. **Network depth**: Add more hidden layers
5. **Dataset**: Create different patterns of data
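
For item 5, one possible way to create different data patterns is sketched below. `make_pattern` and its pattern names are hypothetical helpers invented for this example; only the "bump" ranges mirror the dosage table from Slide 2.

```python
import numpy as np

def make_pattern(kind="bump", n_points=30, noise=0.0, seed=0):
    """Return (X, y): dosages in [0, 1] and a chosen effectiveness pattern."""
    rng = np.random.default_rng(seed)
    X = np.linspace(0.0, 1.0, n_points)
    if kind == "bump":          # effective only at medium dosage
        y = ((X >= 0.4) & (X <= 0.6)).astype(float)
    elif kind == "step":        # effective above a threshold
        y = (X >= 0.5).astype(float)
    elif kind == "two_bumps":   # effective in two separate ranges
        in_first = (X >= 0.15) & (X <= 0.35)
        in_second = (X >= 0.65) & (X <= 0.85)
        y = (in_first | in_second).astype(float)
    else:
        raise ValueError(f"unknown pattern: {kind!r}")
    # Optional Gaussian noise makes the labels less clean
    y = y + rng.normal(scale=noise, size=n_points)
    return X, y

X_new, y_new = make_pattern("two_bumps", n_points=40, noise=0.05)
print(X_new.shape, y_new.shape)
```

Assuming `X_data` and `y_data` are 1-D arrays, as they appear to be in this notebook, assigning the result to them before running an experiment would train on the new pattern. A two-bump pattern will probably need more than two hidden nodes.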

Use the code cell below to experiment!

In [None]:
# Bonus Code: Experiment with different architectures

# Train and plot a one-hidden-layer network with a configurable size, activation, and learning rate
def experiment_with_architecture(n_hidden_nodes=2, activation='softplus', learning_rate=0.01, epochs=500):
    """
    Experiment with different neural network architectures.
    
    Parameters:
    -----------
    n_hidden_nodes : int
        Number of nodes in hidden layer
    activation : str
        Activation function ('softplus', 'relu', or 'sigmoid')
    learning_rate : float
        Learning rate for training
    epochs : int
        Number of training epochs
    """
    print(f"\nExperiment: {n_hidden_nodes} hidden nodes, {activation} activation, lr={learning_rate}")
    print("="*70)
    
    # Create model
    model_exp = Sequential([
        Dense(n_hidden_nodes, activation=activation, input_shape=(1,)),
        Dense(1, activation='linear')
    ])
    
    model_exp.compile(
        optimizer=Adam(learning_rate=learning_rate),
        loss='mean_squared_error'
    )
    
    # Train
    history_exp = model_exp.fit(
        X_data.reshape(-1, 1), 
        y_data.reshape(-1, 1),
        epochs=epochs,
        batch_size=5,
        verbose=0
    )
    
    # Evaluate
    final_loss = history_exp.history['loss'][-1]
    print(f"Final Loss: {final_loss:.6f}")
    
    # Visualize
    X_test = np.linspace(0, 1, 1000).reshape(-1, 1)
    predictions = model_exp.predict(X_test, verbose=0)
    
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
    
    # Plot predictions
    ax1.scatter(X_data, y_data, c='blue', s=100, alpha=0.6, 
                edgecolors='black', linewidth=1.5, label='Data')
    ax1.plot(X_test, predictions, 'red', linewidth=3, label='Prediction')
    ax1.axhline(y=0.5, color='gray', linestyle='--', linewidth=1.5, alpha=0.5)
    ax1.set_xlabel('Dosage', fontweight='bold')
    ax1.set_ylabel('Effectiveness', fontweight='bold')
    ax1.set_title(f'Fit: {n_hidden_nodes} nodes, {activation}', fontweight='bold')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    ax1.set_ylim(-0.1, 1.1)
    
    # Plot training history
    ax2.plot(history_exp.history['loss'], 'purple', linewidth=2)
    ax2.set_xlabel('Epoch', fontweight='bold')
    ax2.set_ylabel('Loss', fontweight='bold')
    ax2.set_title('Training History', fontweight='bold')
    ax2.grid(True, alpha=0.3)
    ax2.set_yscale('log')
    
    plt.tight_layout()
    plt.show()
    
    return model_exp, final_loss

# Example experiments: the first runs as-is; uncomment the others to try them
print("Run these experiments to see how architecture affects performance:\n")

# Experiment 1: Different hidden layer sizes
model_2, loss_2 = experiment_with_architecture(n_hidden_nodes=2, activation='softplus')

# Uncomment to try more:
# model_4, loss_4 = experiment_with_architecture(n_hidden_nodes=4, activation='softplus')
# model_relu, loss_relu = experiment_with_architecture(n_hidden_nodes=2, activation='relu')
# model_sigmoid, loss_sigmoid = experiment_with_architecture(n_hidden_nodes=2, activation='sigmoid')

print("\n✓ Experiment complete! Try changing the parameters above to explore more.")

## Summary and References

### What We Covered:
This tutorial provided a complete introduction to neural networks, covering:
- Fundamental concepts and terminology
- Real-world application (drug dosage effectiveness)
- Activation functions and their role
- How neural networks create complex patterns
- Implementation using TensorFlow/Keras
- Implementation from scratch using NumPy
- Backpropagation and training process
- Practical comparisons and experimentation

### Additional Resources:
- **StatQuest YouTube**: Original neural networks series
- **Deep Learning Book** (Goodfellow, Bengio, Courville)
- **Neural Networks and Deep Learning** (Michael Nielsen)
- **TensorFlow Documentation**: https://www.tensorflow.org/
- **PyTorch Documentation**: https://pytorch.org/

### Practice Exercises:
1. Modify the dataset to have different patterns
2. Implement a neural network with multiple hidden layers
3. Try different activation functions and compare results
4. Apply neural networks to a real-world dataset (e.g., Iris, MNIST)
5. Implement additional features like dropout or batch normalization
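
As a possible starting point for exercise 2, here is a sketch of a forward pass through several hidden layers in NumPy. The helper names `init_layers` and `forward`, the layer sizes, and the initialization scale are all assumptions; training is left out on purpose.

```python
import numpy as np

def relu(z):
    """ReLU activation: max(0, z)."""
    return np.maximum(0.0, z)

def init_layers(sizes, seed=0):
    """Create one (W, b) pair per pair of consecutive layer sizes, e.g. [1, 4, 4, 1]."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(scale=0.5, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(X, layers, activation=relu):
    """Apply the activation after every hidden layer; keep the output layer linear."""
    a = X
    for W, b in layers[:-1]:
        a = activation(a @ W + b)
    W_out, b_out = layers[-1]
    return a @ W_out + b_out

layers = init_layers([1, 4, 4, 1])            # two hidden layers of 4 nodes each
X_demo = np.linspace(0, 1, 5).reshape(-1, 1)  # five dosages as a column vector
print(forward(X_demo, layers).shape)
```

Training this network means extending backpropagation so the gradient flows through every layer in reverse order, which is the core of the exercise.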

---

**Remember**: Neural networks are powerful tools, but they're not magic.  
Understanding how they work makes you a better machine learning practitioner!

**Quest On!** 🚀