# Neural Networks Fundamentals - Practice Exercises

This notebook contains hands-on exercises to reinforce concepts from the 10 core tutorial notebooks. Work through these exercises to deepen your understanding of neural networks from first principles.

**Instructions:**
- Each exercise corresponds to one or more tutorial notebooks
- Try to solve problems independently before checking hints
- Solutions are available in `solutions.ipynb`
- Experiment and explore beyond the basic requirements

---

In [None]:
# Import required libraries
import numpy as np
import matplotlib.pyplot as plt
from utils import *
from viz_utils import *

# Set random seed for reproducibility
np.random.seed(42)

print("✓ Libraries imported successfully!")

---

## Exercise 1: Single Neuron Variations (Notebooks 1-2)

**Objective:** Implement different types of single neurons to understand basic building blocks.

### Task 1.1: Implement a Linear Neuron

Create a neuron that performs only linear transformation (no activation function):
$$y = w_1x_1 + w_2x_2 + \dots + w_nx_n + b$$

In [None]:
def linear_neuron(inputs, weights, bias):
    """
    Implement a linear neuron.
    
    Parameters:
    -----------
    inputs : np.ndarray, shape (n_features,)
        Input features
    weights : np.ndarray, shape (n_features,)
        Neuron weights
    bias : float
        Neuron bias
    
    Returns:
    --------
    output : float
        Linear combination of inputs
    """
    # YOUR CODE HERE
    pass

# Test your implementation
test_inputs = np.array([1.0, 2.0, 3.0])
test_weights = np.array([0.5, -0.3, 0.8])
test_bias = 0.1

result = linear_neuron(test_inputs, test_weights, test_bias)
print(f"Linear neuron output: {result}")
print(f"Expected output: {np.dot(test_inputs, test_weights) + test_bias}")

<details>
<summary><b>💡 Hint (Click to expand)</b></summary>

Use `np.dot()` for the weighted sum of inputs, then add the bias.

</details>

### Task 1.2: Multi-Input Neuron with Sigmoid

Create a neuron that takes 5 inputs and uses sigmoid activation.

In [None]:
def sigmoid_neuron(inputs, weights, bias):
    """
    Implement a sigmoid-activated neuron.
    
    Parameters:
    -----------
    inputs : np.ndarray
        Input features
    weights : np.ndarray
        Neuron weights (same shape as inputs)
    bias : float
        Neuron bias
    
    Returns:
    --------
    output : float
        Sigmoid-activated output in [0, 1]
    """
    # YOUR CODE HERE
    pass

# Test with 5 inputs
test_inputs = np.random.randn(5)
test_weights = np.random.randn(5) * 0.1
test_bias = 0.0

output = sigmoid_neuron(test_inputs, test_weights, test_bias)
print(f"Sigmoid neuron output: {output:.4f}")
print(f"Output should be in [0, 1]: {0 <= output <= 1}")

---

## Exercise 2: Implement New Activation Function (Notebook 3)

**Objective:** Understand activation functions by implementing new ones.

### Task 2.1: Implement ELU (Exponential Linear Unit)

ELU is defined as:
$$\text{ELU}(x) = \begin{cases} 
x & \text{if } x > 0 \\
\alpha(e^x - 1) & \text{if } x \leq 0
\end{cases}$$

where $\alpha$ is typically 1.0.

In [None]:
def elu(x, alpha=1.0):
    """
    Implement the ELU activation function.
    
    Parameters:
    -----------
    x : np.ndarray
        Input values
    alpha : float, default=1.0
        ELU hyperparameter
    
    Returns:
    --------
    output : np.ndarray
        ELU-activated values
    """
    # YOUR CODE HERE
    pass

def elu_derivative(x, alpha=1.0):
    """
    Implement the derivative of ELU.
    
    Derivative is:
    - 1 if x > 0
    - alpha * exp(x) if x <= 0
    """
    # YOUR CODE HERE
    pass

# Test implementation
x = np.linspace(-3, 3, 100)
y = elu(x)
dy = elu_derivative(x)

# Visualize
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

ax1.plot(x, y, 'b-', linewidth=2, label='ELU')
ax1.axhline(y=0, color='k', linestyle='--', alpha=0.3)
ax1.axvline(x=0, color='k', linestyle='--', alpha=0.3)
ax1.set_title('ELU Activation Function')
ax1.set_xlabel('Input')
ax1.set_ylabel('Output')
ax1.grid(True, alpha=0.3)
ax1.legend()

ax2.plot(x, dy, 'r-', linewidth=2, label='ELU Derivative')
ax2.axhline(y=0, color='k', linestyle='--', alpha=0.3)
ax2.axvline(x=0, color='k', linestyle='--', alpha=0.3)
ax2.set_title('ELU Derivative')
ax2.set_xlabel('Input')
ax2.set_ylabel('Gradient')
ax2.grid(True, alpha=0.3)
ax2.legend()

plt.tight_layout()
plt.show()

<details>
<summary><b>💡 Hint (Click to expand)</b></summary>

Use `np.where()` to apply different formulas for positive and negative values. For negative values, use `np.exp(x)` to compute the exponential.
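
A finite-difference check is one way to test a hand-written derivative without looking at the solution. The sketch below is an illustration only: `numerical_derivative`, `x_check`, and the choice of `np.tanh` as the demonstration function are assumptions introduced here, not part of the tutorial code.

```python
import numpy as np

def numerical_derivative(f, x, eps=1e-5):
    """Central-difference estimate of df/dx, evaluated elementwise."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# Demonstrated on tanh, whose derivative 1 - tanh(x)^2 is known in closed form.
x_check = np.linspace(-3, 3, 7)
approx = numerical_derivative(np.tanh, x_check)
exact = 1 - np.tanh(x_check) ** 2
max_error = np.max(np.abs(approx - exact))
print(f"Max abs error vs. analytic derivative: {max_error:.2e}")
```

Once `elu` and `elu_derivative` are written, `numerical_derivative(elu, x)` should agree closely with `elu_derivative(x)`, except possibly right at `x = 0` when `alpha` differs from 1, where ELU has a kink.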

</details>

### Task 2.2: Compare ELU with ReLU

Discuss the advantages of ELU over ReLU based on your plots.

**Your Answer:**

*(Write your observations here about how ELU handles negative values differently from ReLU and why this might be beneficial)*

---

## Exercise 3: Build 3-Layer Network Manually (Notebook 4)

**Objective:** Understand layer composition by building a network from scratch.

### Task 3.1: Implement a Dense Layer Class

Create a reusable `DenseLayer` class that can be stacked.

In [None]:
class DenseLayer:
    """
    A fully connected (dense) neural network layer.
    """
    
    def __init__(self, n_inputs, n_neurons, activation='relu'):
        """
        Initialize the layer.
        
        Parameters:
        -----------
        n_inputs : int
            Number of input features
        n_neurons : int
            Number of neurons in this layer
        activation : str
            Activation function ('relu', 'sigmoid', or 'linear')
        """
        # Initialize weights with small random values
        # YOUR CODE HERE
        
        # Initialize biases to zero
        # YOUR CODE HERE
        
        self.activation = activation
    
    def forward(self, inputs):
        """
        Forward pass through the layer.
        
        Parameters:
        -----------
        inputs : np.ndarray, shape (batch_size, n_inputs)
            Input data
        
        Returns:
        --------
        output : np.ndarray, shape (batch_size, n_neurons)
            Layer output after activation
        """
        # Compute linear transformation: Z = X @ W + b
        # YOUR CODE HERE
        
        # Apply activation function
        # YOUR CODE HERE
        
        pass

# Test the layer
layer = DenseLayer(n_inputs=10, n_neurons=5, activation='relu')
test_input = np.random.randn(3, 10)  # 3 samples, 10 features
output = layer.forward(test_input)

print(f"Input shape: {test_input.shape}")
print(f"Output shape: {output.shape}")
print(f"Expected output shape: (3, 5)")

<details>
<summary><b>💡 Hint (Click to expand)</b></summary>

- Initialize weights using `np.random.randn() * 0.01` for small random values
- Initialize biases using `np.zeros()`
- For matrix multiplication, use `@` operator or `np.dot()`
- For ReLU: `np.maximum(0, x)`
- For sigmoid: `1 / (1 + np.exp(-x))`
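
The shape bookkeeping behind `Z = X @ W + b` can be sketched in isolation. The names `X_demo`, `W_demo`, and `b_demo` are hypothetical, and the `(1, n_neurons)` bias shape is one reasonable choice rather than a requirement of the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)                 # local generator; the global seed is untouched
X_demo = rng.standard_normal((3, 10))          # 3 samples, 10 features
W_demo = rng.standard_normal((10, 5)) * 0.01   # shape (n_inputs, n_neurons)
b_demo = np.zeros((1, 5))                      # one bias per neuron

Z_demo = X_demo @ W_demo + b_demo              # b_demo broadcasts across the 3 rows
print(Z_demo.shape)  # (3, 5)
```

If the printed shape is not `(batch_size, n_neurons)`, the weight matrix is most likely transposed.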

</details>

### Task 3.2: Build a 3-Layer Network

Stack three dense layers: 784 → 128 → 64 → 10

In [None]:
class ThreeLayerNetwork:
    """
    A 3-layer neural network for MNIST classification.
    """
    
    def __init__(self):
        # Create three layers
        # YOUR CODE HERE
        pass
    
    def forward(self, X):
        """
        Forward pass through all layers.
        
        Parameters:
        -----------
        X : np.ndarray, shape (batch_size, 784)
            Flattened MNIST images
        
        Returns:
        --------
        output : np.ndarray, shape (batch_size, 10)
            Class scores, one per digit (probabilities only if a softmax is applied)
        """
        # Pass through each layer sequentially
        # YOUR CODE HERE
        pass

# Test the network
network = ThreeLayerNetwork()
test_batch = np.random.randn(5, 784)  # 5 images
predictions = network.forward(test_batch)

print(f"Input shape: {test_batch.shape}")
print(f"Output shape: {predictions.shape}")
print(f"First prediction (10 class scores): {predictions[0]}")

---

## Exercise 4: Trace Forward Propagation (Notebook 5)

**Objective:** Understand information flow by manually tracing computations.

### Task 4.1: Manual Computation

Given this tiny network:
- Input: [2.0, -1.0]
- Layer 1: 2 neurons, weights = [[0.5, -0.3], [0.2, 0.8]] (one row per neuron), biases = [0.1, -0.1]
- Activation: ReLU

Compute the output **by hand** (showing all steps), then verify with code.

**Your Manual Calculation:**

```
Input: x = [2.0, -1.0]

Neuron 1:
  z1 = (0.5)(2.0) + (-0.3)(-1.0) + 0.1 = ?
  a1 = ReLU(z1) = ?

Neuron 2:
  z2 = (0.2)(2.0) + (0.8)(-1.0) + (-0.1) = ?
  a2 = ReLU(z2) = ?

Output: [a1, a2] = ?
```
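
The template above treats each row of the weight matrix as one neuron, which differs from the `X @ W + b` layout where each column is a neuron. The sketch below shows the difference on hypothetical values (`x_demo` and `W_rows` are made up and deliberately differ from the exercise numbers).

```python
import numpy as np

# Hypothetical values, deliberately different from the exercise.
x_demo = np.array([1.0, 2.0])
W_rows = np.array([[1.0, 0.0],    # row 0: weights of neuron 1
                   [3.0, -1.0]])  # row 1: weights of neuron 2

# One explicit dot product per neuron, as in a hand calculation.
per_neuron = np.array([W_rows[0] @ x_demo, W_rows[1] @ x_demo])

print(per_neuron)        # [1. 1.]
print(W_rows @ x_demo)   # [1. 1.]   rows act as neurons
print(x_demo @ W_rows)   # [ 7. -2.] columns act as neurons instead
```

Both conventions are valid; what matters is that the code uses the same one as the hand calculation.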

In [None]:
# Verify your calculation with code
x = np.array([2.0, -1.0])
W = np.array([[0.5, -0.3], [0.2, 0.8]])
b = np.array([0.1, -0.1])

# Compute z (pre-activation); each row of W holds one neuron's weights, so use W @ x + b
# YOUR CODE HERE

# Apply ReLU and store the result in `output`
# YOUR CODE HERE

print(f"Output: {output}")
print("Does this match your hand calculation?")

---

## Exercise 5: Implement Alternative Loss Function (Notebook 6)

**Objective:** Understand loss functions by implementing Mean Absolute Error.

### Task 5.1: Implement MAE Loss

Mean Absolute Error (MAE) is defined as:
$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|$$

In [None]:
def mae_loss(y_true, y_pred):
    """
    Compute Mean Absolute Error loss.
    
    Parameters:
    -----------
    y_true : np.ndarray
        True labels
    y_pred : np.ndarray
        Predicted labels
    
    Returns:
    --------
    loss : float
        MAE loss value
    """
    # YOUR CODE HERE
    pass

def mae_derivative(y_true, y_pred):
    """
    Compute derivative of MAE with respect to predictions.
    
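    Derivative for each element i, where n is the number of elements
    (the 1/n factor comes from the mean in the MAE definition):
    - -1/n if y_pred[i] < y_true[i]
    - +1/n if y_pred[i] > y_true[i]
    - 0 if y_pred[i] == y_true[i] (a subgradient; MAE is not differentiable there)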
    """
    # YOUR CODE HERE
    pass

# Test implementation
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.2, 1.8, 3.1, 4.5])

mae = mae_loss(y_true, y_pred)
grad = mae_derivative(y_true, y_pred)

print(f"MAE Loss: {mae:.4f}")
print(f"Expected: {np.mean(np.abs(y_true - y_pred)):.4f}")
print(f"\nGradient: {grad}")

### Task 5.2: Compare MSE and MAE

When would you prefer MAE over MSE (Mean Squared Error)?

**Your Answer:**

*(Discuss robustness to outliers, gradient behavior, and use cases)*
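
A small numeric experiment can help ground this discussion. The sketch below uses made-up predictions, not output from any trained model, and the helper names `demo_mse` and `demo_mae` are hypothetical. It compares both losses with and without a single large error.

```python
import numpy as np

y_true_demo = np.array([1.0, 2.0, 3.0, 4.0])
y_pred_clean = np.array([1.5, 2.5, 3.5, 4.5])      # every error is 0.5
y_pred_outlier = np.array([1.5, 2.5, 3.5, 14.0])   # the last error is 10.0

def demo_mse(y_true, y_pred):
    """Mean of squared errors."""
    return np.mean((y_true - y_pred) ** 2)

def demo_mae(y_true, y_pred):
    """Mean of absolute errors."""
    return np.mean(np.abs(y_true - y_pred))

for name, pred in [("clean", y_pred_clean), ("outlier", y_pred_outlier)]:
    print(f"{name:8s} MSE={demo_mse(y_true_demo, pred):.4f}  "
          f"MAE={demo_mae(y_true_demo, pred):.4f}")
```

Comparing how much each loss grows when the outlier appears is a useful starting point for your answer.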

---

## Exercise 6: Manual Gradient Computation (Notebook 7)

**Objective:** Understand backpropagation by computing gradients manually.

### Task 6.1: Compute Gradients for Simple Network

Given:
- Input: x = 3.0
- Weight: w = 0.5
- Bias: b = 0.2
- Activation: Linear (no activation)
- True label: y = 2.0
- Loss: MSE = (y - ŷ)²

Compute:
1. Forward pass: ŷ = wx + b
2. Loss: L
3. ∂L/∂ŷ
4. ∂L/∂w (using chain rule)
5. ∂L/∂b

**Your Manual Calculation:**

```
Step 1: Forward pass
  ŷ = w*x + b = 0.5*3.0 + 0.2 = ?

Step 2: Compute loss
  L = (y - ŷ)² = (2.0 - ?)² = ?

Step 3: dL/dŷ
  dL/dŷ = 2(ŷ - y) = ?

Step 4: dL/dw (chain rule: dL/dw = dL/dŷ * dŷ/dw)
  dŷ/dw = x = 3.0
  dL/dw = dL/dŷ * dŷ/dw = ? * 3.0 = ?

Step 5: dL/db
  dŷ/db = 1.0
  dL/db = dL/dŷ * dŷ/db = ? * 1.0 = ?
```
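
The chain-rule results in the template above can be checked independently with finite differences, which only need the loss itself. This is a minimal sketch: `loss_fn`, `num_dL_dw`, and `num_dL_db` are names introduced here for illustration.

```python
def loss_fn(w, b, x=3.0, y=2.0):
    """Squared error of the one-weight linear model from this task."""
    return (y - (w * x + b)) ** 2

eps = 1e-6
w0, b0 = 0.5, 0.2

# Nudge one parameter at a time and measure how the loss changes.
num_dL_dw = (loss_fn(w0 + eps, b0) - loss_fn(w0 - eps, b0)) / (2 * eps)
num_dL_db = (loss_fn(w0, b0 + eps) - loss_fn(w0, b0 - eps)) / (2 * eps)

print(f"numerical dL/dw ≈ {num_dL_dw:.4f}")
print(f"numerical dL/db ≈ {num_dL_db:.4f}")
```

If these numbers disagree with your hand calculation, the sign of dL/dŷ is a good first thing to recheck.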

In [None]:
# Verify with code
x = 3.0
w = 0.5
b = 0.2
y_true = 2.0

# Forward pass
y_pred = w * x + b
loss = (y_true - y_pred) ** 2

# Gradients
dL_dyhat = 2 * (y_pred - y_true)
dL_dw = dL_dyhat * x  # Chain rule
dL_db = dL_dyhat * 1.0  # Chain rule

print(f"Prediction: {y_pred}")
print(f"Loss: {loss}")
print(f"dL/dw: {dL_dw}")
print(f"dL/db: {dL_db}")

---

## Exercise 7: Debug Broken Training Loop (Notebook 8)

**Objective:** Develop debugging skills by fixing a broken implementation.

### Task 7.1: Find and Fix the Bugs

The following training loop has **3 bugs**. Find and fix them!

In [None]:
def buggy_training_loop(X, y, epochs=10, learning_rate=0.01):
    """
    A training loop with bugs to fix.
    """
    n_samples, n_features = X.shape
    n_classes = y.shape[1]
    
    # Initialize weights and biases
    W = np.random.randn(n_features, n_classes) * 0.01
    b = np.zeros((1, n_classes))
    
    losses = []
    
    for epoch in range(epochs):
        # Forward pass
        z = X @ W - b  # BUG #1: Should this be minus?
        
        # Softmax activation
        exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
        predictions = exp_z / np.sum(exp_z, axis=1, keepdims=True)
        
        # Cross-entropy loss
        loss = -np.mean(np.sum(y * np.log(predictions + 1e-8), axis=1))
        losses.append(loss)
        
        # Backward pass
        dz = predictions - y
        dW = (X.T @ dz)  # BUG #2: Missing division by batch size
        db = np.sum(dz, axis=0, keepdims=True) / n_samples
        
        # Update parameters
        W = W - learning_rate * dW
        b = b + learning_rate * db  # BUG #3: Should this be plus?
        
        if (epoch + 1) % 2 == 0:
            print(f"Epoch {epoch+1}/{epochs}, Loss: {loss:.4f}")
    
    return W, b, losses

# Test with small dataset
X_small = np.random.randn(100, 10)
y_small = np.eye(3)[np.random.randint(0, 3, 100)]  # 3 classes

W, b, losses = buggy_training_loop(X_small, y_small, epochs=10)

# Plot losses - should decrease!
plt.plot(losses)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training Loss (Should Decrease!)')
plt.grid(True, alpha=0.3)
plt.show()

<details>
<summary><b>💡 Hints (Click to expand)</b></summary>

**Bug #1:** Check the forward pass formula. Should bias be added or subtracted?

**Bug #2:** Gradient computation should average over the batch. Are you dividing by `n_samples`?

**Bug #3:** Gradient descent updates go in the direction that reduces loss. Check the sign!
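
A numerical gradient check is one way to confirm a fix instead of relying on the loss curve alone. The sketch below is an assumption-laden illustration: `softmax_ce_loss`, `numerical_grad_W`, and the small random arrays are introduced here and are not part of the notebook's helper modules.

```python
import numpy as np

def softmax_ce_loss(W, b, X, Y):
    """Mean cross-entropy of a softmax-regression model with one-hot labels Y."""
    z = X @ W + b
    z = z - np.max(z, axis=1, keepdims=True)   # shift for numerical stability
    log_probs = z - np.log(np.sum(np.exp(z), axis=1, keepdims=True))
    return -np.mean(np.sum(Y * log_probs, axis=1))

def numerical_grad_W(W, b, X, Y, eps=1e-6):
    """Central-difference gradient of the loss with respect to each entry of W."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[idx] += eps
        W_minus[idx] -= eps
        grad[idx] = (softmax_ce_loss(W_plus, b, X, Y)
                     - softmax_ce_loss(W_minus, b, X, Y)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X_chk = rng.standard_normal((8, 4))             # 8 samples, 4 features
Y_chk = np.eye(3)[rng.integers(0, 3, size=8)]   # one-hot labels, 3 classes
W_chk = rng.standard_normal((4, 3)) * 0.1
b_chk = np.zeros((1, 3))

grad_num = numerical_grad_W(W_chk, b_chk, X_chk, Y_chk)
print(grad_num.shape)  # (4, 3)
```

Comparing `grad_num` with the `dW` your loop computes on the same arrays should expose errors in scale or sign.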

</details>

---

## Exercise 8: Hyperparameter Tuning for MNIST (Notebook 9)

**Objective:** Improve model performance through systematic experimentation.

### Task 8.1: Tune Learning Rate

Train models with different learning rates and find the best one.
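
To build intuition before running the full experiment, it can help to see how the learning rate behaves on a toy problem. The sketch below runs gradient descent on $f(w) = w^2$, which is an assumption chosen for simplicity and not the MNIST model. The helper `run_gd` is hypothetical, and the thresholds it reveals do not transfer directly to a real network.

```python
def run_gd(lr, w0=1.0, steps=20):
    """Gradient descent on f(w) = w**2, whose gradient is 2*w."""
    w = w0
    for _ in range(steps):
        w = w - lr * 2 * w
    return w

# The minimum is at w = 0; compare how close each setting gets in 20 steps.
for demo_lr in [0.001, 0.01, 0.1, 0.5, 1.0, 1.5]:
    print(f"lr={demo_lr:<5} final w = {run_gd(demo_lr):.4g}")
```

Small rates barely move, moderate rates converge, and rates that are too large oscillate or diverge. A similar pattern is worth looking for in your MNIST results.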

In [None]:
# Load MNIST (load_mnist is assumed to come from utils; a subset is taken below for speed)
(X_train, y_train), (X_test, y_test) = load_mnist()

# Use only 10,000 samples for faster experimentation
X_train_small = X_train[:10000]
y_train_small = y_train[:10000]

print(f"Training set: {X_train_small.shape}")
print(f"Test set: {X_test.shape}")

In [None]:
# TODO: Try different learning rates
learning_rates = [0.001, 0.01, 0.1, 0.5, 1.0]

results = {}

for lr in learning_rates:
    print(f"\nTrying learning rate: {lr}")
    
    # YOUR CODE HERE:
    # 1. Create and train a simple network
    # 2. Track final training and validation accuracy (hold out part of X_train_small for validation)
    # 3. Store results
    
    pass

# Plot results
# YOUR CODE HERE: Create a bar plot comparing accuracies

### Task 8.2: Experiment with Network Architecture

Try different hidden layer sizes and depths.
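
Counting parameters before training gives a rough sense of each candidate's size. The sketch below assumes fully connected layers with one bias per neuron; `count_parameters` and `candidate_sizes` are names introduced here for illustration.

```python
def count_parameters(layer_sizes):
    """Weights plus biases for a fully connected network with the given layer sizes."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

candidate_sizes = [
    [784, 64, 10],
    [784, 128, 10],
    [784, 256, 10],
    [784, 128, 64, 10],
    [784, 256, 128, 10],
]

for sizes in candidate_sizes:
    print(f"{str(sizes):24s} {count_parameters(sizes):>9,d} parameters")
```

Recording these counts next to accuracy and training time makes the size versus performance trade-off easier to discuss.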

In [None]:
# TODO: Try different architectures
architectures = [
    [784, 64, 10],          # Small
    [784, 128, 10],         # Medium
    [784, 256, 10],         # Large
    [784, 128, 64, 10],     # Two hidden layers
    [784, 256, 128, 10],    # Larger with two hidden layers
]

# YOUR CODE HERE:
# For each architecture:
#   1. Build and train the network
#   2. Record accuracy and training time
#   3. Compare results

**Question:** What did you learn about the trade-off between model size and performance?

**Your Answer:**

---

## Exercise 9: PyTorch Translation (Notebook 10)

**Objective:** Learn to translate between NumPy and PyTorch implementations.

### Task 9.1: Convert NumPy Model to PyTorch

Implement the same 2-layer network using PyTorch.

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim

class MNISTNet(nn.Module):
    """
    PyTorch implementation of MNIST classifier.
    
    Architecture: 784 -> 128 -> 10
    """
    
    def __init__(self):
        super().__init__()
        # YOUR CODE HERE:
        # Define layers using nn.Linear
        pass
    
    def forward(self, x):
        # YOUR CODE HERE:
        # Implement forward pass with ReLU and softmax
        pass

# Test the model
model = MNISTNet()
test_input = torch.randn(5, 784)
output = model(test_input)

print(f"Model output shape: {output.shape}")
print(f"Output probabilities sum to 1: {torch.allclose(output.sum(dim=1), torch.ones(5))}")

<details>
<summary><b>💡 Hint (Click to expand)</b></summary>

```python
# Define layers
self.fc1 = nn.Linear(784, 128)
self.fc2 = nn.Linear(128, 10)

# Forward pass
x = torch.relu(self.fc1(x))
x = torch.softmax(self.fc2(x), dim=1)
```

</details>

### Task 9.2: Train PyTorch Model

Implement a training loop using PyTorch's autograd.

In [None]:
def train_pytorch_model(model, X_train, y_train, epochs=10, lr=0.01, batch_size=32):
    """
    Train PyTorch model.
    """
    # Convert NumPy arrays to PyTorch tensors
    # YOUR CODE HERE
    
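    # Define loss function and optimizer
    # Note: MNISTNet.forward already applies softmax, while nn.CrossEntropyLoss
    # expects raw logits; with probabilities, nn.NLLLoss on torch.log(output) is one option
    # YOUR CODE HERE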
    
    # Training loop
    # YOUR CODE HERE
    
    pass

# Train and compare with NumPy implementation
# YOUR CODE HERE

---

## Exercise 10: Bonus Challenge 🌟

**Objective:** Apply everything you've learned to a new dataset.

### Task 10.1: Fashion-MNIST Classification

Fashion-MNIST is similar to MNIST but with clothing items instead of digits.

**Challenge:** Build a neural network from scratch (using NumPy) that achieves >85% accuracy on Fashion-MNIST.
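
Validation-based early stopping is one technique that may help with this challenge. The sketch below shows only the stopping logic, under the assumption that you record one validation loss per epoch. The function name `best_epoch_with_patience` is hypothetical, and the loss values are made up for illustration.

```python
def best_epoch_with_patience(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 0-indexed, for a list of validation losses."""
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch   # no improvement for `patience` epochs
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses, made up purely for illustration.
made_up_losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.65, 0.64]
stop_epoch, best_epoch = best_epoch_with_patience(made_up_losses, patience=3)
print(stop_epoch, best_epoch)  # 5 2
```

In a real training loop you would also keep a copy of the weights from `best_epoch` so they can be restored after stopping.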

In [None]:
# Fashion-MNIST can be loaded similarly to MNIST
# For this exercise, you can use MNIST as a substitute or
# download Fashion-MNIST from: https://github.com/zalandoresearch/fashion-mnist

# YOUR CODE HERE:
# 1. Load Fashion-MNIST data
# 2. Design your network architecture
# 3. Implement training loop with proper validation
# 4. Tune hyperparameters
# 5. Visualize results and misclassifications

print("Good luck! Remember to:")
print("- Try different architectures")
print("- Tune learning rate and batch size")
print("- Use validation set for early stopping")
print("- Visualize training progress")
print("- Analyze mistakes to improve")

---

## Reflection Questions

After completing these exercises, reflect on your learning:

1. **What concept was most challenging to implement from scratch?**
   
   *Your answer:*

2. **How does implementing neural networks from scratch help you understand PyTorch/TensorFlow better?**
   
   *Your answer:*

3. **What was the most surprising thing you learned about neural networks?**
   
   *Your answer:*

4. **What would you like to explore next?**
   
   *Your answer:*

---

## Next Steps

Congratulations on completing the exercises! Here are some suggestions:

1. 📚 **Review** the solutions in `solutions.ipynb`
2. 🔬 **Experiment** with variations on these exercises
3. 🏗️ **Build** your own project using these fundamentals
4. 📖 **Study** more advanced topics (CNNs, RNNs, Transformers)
5. 🤝 **Share** what you've learned with others

Remember: The best way to learn is by doing. Keep experimenting!

---