# Notebook 4: Neural Network Layers

## From Single Neurons to Powerful Teams

Welcome back! In our previous notebooks, we learned about:
- What a neural network is (Notebook 1)
- How a single neuron works (Notebook 2)
- Different activation functions (Notebook 3)

Now it's time to scale up! 🚀

### 💡 Key Question: Why Do We Need Multiple Neurons?

Think about it this way:
- **One neuron** is like one person trying to solve a complex problem
- **Multiple neurons (a layer)** are like a team of experts, each focusing on a different aspect

### 🏭 The Assembly Line Analogy

Imagine a car manufacturing plant:
- **Single neuron**: One person building an entire car (slow, limited)
- **Layer of neurons**: An assembly line where multiple workers process the car simultaneously
  - Worker 1: Checks the engine
  - Worker 2: Inspects the wheels  
  - Worker 3: Tests the electronics
  - Worker 4: Examines the body

Each neuron in a layer looks at the same input data but learns to detect **different features**!

Let's build this step by step! 🔧

In [None]:
# Import our tools - these are libraries that help us work with numbers and create visualizations
import numpy as np  # NumPy: for mathematical operations and arrays
import matplotlib.pyplot as plt  # Matplotlib: for creating graphs and visualizations

# This makes our plots appear directly in the notebook
%matplotlib inline

# Set random seed so we get the same random numbers every time (makes our experiments reproducible)
np.random.seed(42)  # 42 is just a popular choice (from "Hitchhiker's Guide to the Galaxy")

print("✅ Libraries imported successfully!")
print("📦 NumPy version:", np.__version__)  # Show which version of NumPy we're using

## Part 1: From One Neuron to Many

### 🧠 Recap: Single Neuron

Remember, a single neuron does this:
1. Takes inputs: `x1, x2, x3, ...`
2. Multiplies each by a weight: `w1*x1, w2*x2, w3*x3, ...`
3. Adds them up: `sum = w1*x1 + w2*x2 + w3*x3 + ...`
4. Adds a bias: `sum + bias`
5. Applies activation function: `activation(sum + bias)`

Let's code a single neuron as a refresher:

In [None]:
# Define a simple ReLU activation function (from Notebook 3)
def relu(x):
    """
    ReLU activation function: Returns max(0, x)
    - If x is positive, return x
    - If x is negative or zero, return 0
    """
    return np.maximum(0, x)  # Element-wise maximum between 0 and x

# Single neuron function
def single_neuron(inputs, weights, bias):
    """
    Compute the output of a single neuron.
    
    Parameters:
    - inputs: array of input values [x1, x2, x3, ...]
    - weights: array of weights [w1, w2, w3, ...]
    - bias: single number (the bias term)
    
    Returns:
    - output: the neuron's output after activation
    """
    # Step 1: Calculate weighted sum (multiply inputs by weights and sum)
    weighted_sum = np.dot(inputs, weights)  # Same as: inputs[0]*weights[0] + inputs[1]*weights[1] + ...
    
    # Step 2: Add the bias
    z = weighted_sum + bias
    
    # Step 3: Apply activation function (ReLU)
    output = relu(z)
    
    return output

# Example: Let's test our single neuron
example_inputs = np.array([1.0, 2.0, 3.0])  # Three input values
example_weights = np.array([0.5, -0.3, 0.8])  # Three weights (one for each input)
example_bias = 0.1  # One bias value

# Run the neuron
neuron_output = single_neuron(example_inputs, example_weights, example_bias)

print("🔢 Input values:", example_inputs)
print("⚖️  Weights:", example_weights)
print("➕ Bias:", example_bias)
print("\n📊 Calculation:")
print(f"   Weighted sum: {example_inputs[0]}*{example_weights[0]} + {example_inputs[1]}*{example_weights[1]} + {example_inputs[2]}*{example_weights[2]} = {np.dot(example_inputs, example_weights):.2f}")
print(f"   Add bias: {np.dot(example_inputs, example_weights):.2f} + {example_bias} = {np.dot(example_inputs, example_weights) + example_bias:.2f}")
print(f"   Apply ReLU: max(0, {np.dot(example_inputs, example_weights) + example_bias:.2f}) = {neuron_output:.2f}")
print(f"\n✨ Final output: {neuron_output:.2f}")

## Part 2: Creating a Layer of Neurons

### 👥 The Committee Analogy

Now imagine we have **4 neurons** all looking at the **same 3 inputs**:
- **Neuron 1**: Might learn to detect "vertical edges"
- **Neuron 2**: Might learn to detect "horizontal edges"
- **Neuron 3**: Might learn to detect "curves"
- **Neuron 4**: Might learn to detect "brightness"

Each neuron has its **own weights and bias** - they're all specialists!

### 📊 Layer Structure

If we have:
- **3 inputs** (x1, x2, x3)
- **4 neurons** in our layer

Then we need:
- **12 weights** (3 weights per neuron × 4 neurons)
- **4 biases** (1 bias per neuron)
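
As a quick sanity check on this arithmetic, the sketch below builds zero-filled placeholder arrays of the right shapes and counts their entries. The names `num_inputs`, `num_neurons`, `W`, and `b` are illustrative only and are not used elsewhere in the notebook.

```python
import numpy as np

num_inputs = 3   # x1, x2, x3
num_neurons = 4  # neurons in the layer

# One column of weights per neuron, one bias per neuron (placeholder zeros)
W = np.zeros((num_inputs, num_neurons))
b = np.zeros(num_neurons)

print("Weights shape:", W.shape, "->", W.size, "weights")
print("Biases shape:", b.shape, "->", b.size, "biases")
print("Total parameters:", W.size + b.size)
```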

Let's implement this the **simple way first** (with a loop):

In [None]:
# Method 1: Using a loop (easy to understand, but slower)
def layer_with_loop(inputs, weights, biases):
    """
    Compute the output of a layer using a loop.
    
    Parameters:
    - inputs: array of shape (num_inputs,) - e.g., [x1, x2, x3]
    - weights: 2D array of shape (num_inputs, num_neurons) - weights for all neurons
    - biases: array of shape (num_neurons,) - one bias per neuron
    
    Returns:
    - outputs: array of shape (num_neurons,) - outputs from all neurons
    """
    # Get the number of neurons in this layer
    num_neurons = weights.shape[1]  # weights.shape[1] tells us how many columns (neurons) we have
    
    # Create an empty list to store outputs from each neuron
    outputs = []  # We'll append each neuron's output to this list
    
    # Loop through each neuron
    for i in range(num_neurons):  # i goes from 0 to num_neurons-1
        # Get the weights for this specific neuron (column i from weights matrix)
        neuron_weights = weights[:, i]  # The : means "all rows", i means "column i"
        
        # Get the bias for this specific neuron
        neuron_bias = biases[i]  # Get the i-th bias value
        
        # Calculate this neuron's output using our single_neuron function
        neuron_output = single_neuron(inputs, neuron_weights, neuron_bias)
        
        # Add this neuron's output to our list
        outputs.append(neuron_output)
        
        # Print what this neuron is doing (for learning purposes)
        print(f"Neuron {i+1}: weights={neuron_weights}, bias={neuron_bias:.2f} → output={neuron_output:.2f}")
    
    # Convert the list to a NumPy array and return
    return np.array(outputs)

# Create example data for a layer
layer_inputs = np.array([1.0, 2.0, 3.0])  # 3 input values

# Create weights for 4 neurons (each neuron needs 3 weights for 3 inputs)
# Shape: (3 inputs, 4 neurons)
layer_weights = np.array([
    [0.2, -0.3, 0.5, 0.1],  # Weights from input 1 to each of the 4 neurons
    [0.8, 0.4, -0.2, 0.6],  # Weights from input 2 to each of the 4 neurons
    [-0.5, 0.7, 0.3, -0.1]  # Weights from input 3 to each of the 4 neurons
])

# Create biases (one per neuron)
layer_biases = np.array([0.1, -0.2, 0.3, 0.15])  # 4 biases for 4 neurons

print("🔄 Computing layer output using a loop:\n")
print("📥 Inputs:", layer_inputs)
print("⚖️  Weights shape:", layer_weights.shape, "(3 inputs × 4 neurons)")
print("➕ Biases:", layer_biases)
print("\n" + "="*60)

# Compute the layer output
layer_outputs_loop = layer_with_loop(layer_inputs, layer_weights, layer_biases)

print("="*60)
print("\n✨ Final layer outputs:", layer_outputs_loop)
print("📊 We went from 3 inputs → 4 outputs (one from each neuron)!")

### 💡 Key Insight: What Just Happened?

Each neuron in the layer:
1. Looked at the **same 3 inputs**: [1.0, 2.0, 3.0]
2. Used **different weights** (its own set of 3 weights)
3. Had **its own bias**
4. Produced **its own output**

So we transformed:
- **3 input values** → **4 output values**

This is what a **layer** does! It's like having multiple specialists all analyzing the same data simultaneously.

---

## Part 3: Matrix Multiplication - The Fast Way! 🚀

### 📦 The Batch Processing Analogy

Using a loop is like:
- Processing emails one by one (slow!)

Using matrix multiplication is like:
- Batch processing all emails at once (fast!)
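
How big the speed-up is depends on the layer size and on your machine, so it is worth measuring rather than assuming. The sketch below is a rough comparison on a hypothetical layer with 512 inputs and 2,048 neurons (sizes chosen only for illustration). It times both versions with `time.perf_counter` and makes no promise about the numbers you will see.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(512)           # hypothetical input vector
W = rng.standard_normal((512, 2048))   # hypothetical weights: (inputs, neurons)
b = rng.standard_normal(2048)          # one bias per neuron

# Loop version: one dot product per neuron, then ReLU
start = time.perf_counter()
out_loop = np.array([max(0.0, np.dot(x, W[:, i]) + b[i]) for i in range(W.shape[1])])
loop_seconds = time.perf_counter() - start

# Matrix version: all neurons in a single call, then ReLU
start = time.perf_counter()
out_matrix = np.maximum(0, np.dot(x, W) + b)
matrix_seconds = time.perf_counter() - start

print(f"Loop:   {loop_seconds:.6f} s")
print(f"Matrix: {matrix_seconds:.6f} s")
print("Same result?", np.allclose(out_loop, out_matrix))
```

For a tiny layer like the 3-input, 4-neuron example in this notebook, the difference may be too small to notice.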

### 🎯 How Matrix Multiplication Works for Neural Networks

Instead of looping through each neuron, we can do **all calculations at once** using matrix multiplication!

Here's the magic formula:
```
outputs = activation(inputs @ weights + biases)
```

Where `@` is NumPy's matrix multiplication operator. For the 1-D and 2-D arrays used in this notebook, `np.dot()` gives the same result.
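
This equivalence can be checked directly. The sketch below uses made-up values for `x` and `W` and compares `@`, `np.matmul`, and `np.dot`. Note that `np.dot` and `np.matmul` behave differently for arrays with more than two dimensions, which this notebook does not use.

```python
import numpy as np

x = np.array([1.0, 2.0])            # 2 inputs (illustrative values)
W = np.array([[1.0, 0.0, -1.0],     # shape (2 inputs, 3 neurons)
              [0.5, 2.0,  1.0]])

via_operator = x @ W
via_matmul = np.matmul(x, W)
via_dot = np.dot(x, W)

print(via_operator)  # [2. 4. 1.]
print(np.array_equal(via_operator, via_matmul))  # True
print(np.array_equal(via_operator, via_dot))     # True
```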

Let's see this in action:

In [None]:
# Method 2: Using matrix multiplication (fast and efficient!)
def layer_with_matrix(inputs, weights, biases):
    """
    Compute the output of a layer using matrix multiplication.
    
    This does the SAME thing as layer_with_loop, but in one vectorized step (much faster for large layers)!
    
    Parameters:
    - inputs: array of shape (num_inputs,)
    - weights: 2D array of shape (num_inputs, num_neurons)
    - biases: array of shape (num_neurons,)
    
    Returns:
    - outputs: array of shape (num_neurons,)
    """
    # Step 1: Matrix multiplication - this computes weighted sums for ALL neurons at once!
    # inputs @ weights means: multiply inputs by weights using matrix multiplication
    weighted_sums = np.dot(inputs, weights)  # Same as inputs @ weights
    
    # Step 2: Add biases - NumPy automatically adds the bias to each weighted sum
    z = weighted_sums + biases  # This adds each bias to the corresponding weighted sum
    
    # Step 3: Apply activation function to ALL outputs at once
    outputs = relu(z)  # ReLU works element-wise on the entire array
    
    return outputs

# Use the SAME inputs, weights, and biases as before
print("⚡ Computing layer output using matrix multiplication:\n")
print("📥 Inputs:", layer_inputs)
print("⚖️  Weights shape:", layer_weights.shape)
print("➕ Biases:", layer_biases)

# Compute the layer output using the fast method
layer_outputs_matrix = layer_with_matrix(layer_inputs, layer_weights, layer_biases)

print("\n✨ Final layer outputs:", layer_outputs_matrix)
print("\n🔍 Verification: Are both methods the same?")
print("   Loop method outputs:  ", layer_outputs_loop)
print("   Matrix method outputs:", layer_outputs_matrix)
print("   Are they equal?", np.allclose(layer_outputs_loop, layer_outputs_matrix))  # allclose checks if arrays are nearly equal
print("\n🎉 Both methods should give the same result, and the matrix version scales much better to large layers!")

### 🧮 Understanding Matrix Multiplication Visually

Let's break down what happened with **actual numbers**:

In [None]:
# Let's manually show what matrix multiplication does
print("📊 DETAILED CALCULATION BREAKDOWN:")
print("="*70)
print("\nInputs:", layer_inputs)
print("\nWeights matrix (each column = one neuron's weights):")
print(layer_weights)
print("\nBiases (one per neuron):", layer_biases)

print("\n" + "="*70)
print("NEURON-BY-NEURON CALCULATION:")
print("="*70)

# Show calculation for each neuron
for i in range(4):  # We have 4 neurons
    print(f"\nNeuron {i+1}:")
    
    # Get weights for this neuron (column i)
    weights_i = layer_weights[:, i]
    
    # Calculate weighted sum step by step
    print(f"  Weights: {weights_i}")
    print(f"  Calculation: ({layer_inputs[0]}×{weights_i[0]}) + ({layer_inputs[1]}×{weights_i[1]}) + ({layer_inputs[2]}×{weights_i[2]})")
    
    weighted_sum = np.dot(layer_inputs, weights_i)
    print(f"  Weighted sum: {weighted_sum:.2f}")
    
    with_bias = weighted_sum + layer_biases[i]
    print(f"  Add bias ({layer_biases[i]}): {with_bias:.2f}")
    
    after_relu = relu(with_bias)
    print(f"  After ReLU: {after_relu:.2f}")

print("\n" + "="*70)
print(f"\n✅ FINAL OUTPUT: {layer_outputs_matrix}")
print("\nThis is what the layer computed in ONE STEP using matrix multiplication!")

### 📏 Understanding Weight Matrix Shape

The shape of the weight matrix is **SUPER IMPORTANT**!

**Rule of thumb:**
```
Weights shape: (number of inputs, number of neurons in layer)
```

In our example:
- **3 inputs** → number of rows
- **4 neurons** → number of columns
- **Weights shape**: (3, 4)
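
The same rule also predicts the output shape. A minimal sketch, using placeholder arrays of ones and a hypothetical batch of 5 samples, shows a single input of shape `(3,)` producing `(4,)` and a batch of shape `(5, 3)` producing `(5, 4)`:

```python
import numpy as np

W = np.ones((3, 4))      # (number of inputs, number of neurons)
b = np.ones(4)           # one bias per neuron

single = np.ones(3)      # one sample with 3 inputs
batch = np.ones((5, 3))  # hypothetical batch: 5 samples, 3 inputs each

print((single @ W + b).shape)  # (4,)
print((batch @ W + b).shape)   # (5, 4); b is broadcast across the 5 rows
```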

Let's visualize this:

In [None]:
# Visualize the weight matrix structure
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left plot: Show the weight matrix as a heatmap
im = axes[0].imshow(layer_weights, cmap='RdBu', aspect='auto', vmin=-1, vmax=1)
axes[0].set_xlabel('Neurons (outputs)', fontsize=12, fontweight='bold')
axes[0].set_ylabel('Inputs', fontsize=12, fontweight='bold')
axes[0].set_title('Weight Matrix Visualization\n(red = negative, blue = positive)', fontsize=13, fontweight='bold')
axes[0].set_xticks(range(4))
axes[0].set_xticklabels(['Neuron 1', 'Neuron 2', 'Neuron 3', 'Neuron 4'])
axes[0].set_yticks(range(3))
axes[0].set_yticklabels(['Input 1', 'Input 2', 'Input 3'])

# Add weight values as text
for i in range(3):
    for j in range(4):
        text = axes[0].text(j, i, f'{layer_weights[i, j]:.2f}',
                           ha="center", va="center", color="black", fontsize=10, fontweight='bold')

plt.colorbar(im, ax=axes[0], label='Weight value')

# Right plot: Show the connections as a network diagram
axes[1].set_xlim(-0.5, 3.5)
axes[1].set_ylim(-0.5, 4.5)
axes[1].axis('off')
axes[1].set_title('Layer Connection Diagram\n(3 inputs → 4 neurons)', fontsize=13, fontweight='bold')

# Draw input nodes (left side)
input_positions = [3.5, 2.5, 1.5]  # y-positions for 3 inputs
for i, y in enumerate(input_positions):
    circle = plt.Circle((0.5, y), 0.3, color='lightblue', ec='black', linewidth=2, zorder=5)
    axes[1].add_patch(circle)
    axes[1].text(0.5, y, f'x{i+1}', ha='center', va='center', fontsize=11, fontweight='bold', zorder=6)
    axes[1].text(-0.3, y, f'Input {i+1}', ha='right', va='center', fontsize=9)

# Draw output nodes (right side)
output_positions = [4, 3, 2, 1]  # y-positions for 4 neurons
for i, y in enumerate(output_positions):
    circle = plt.Circle((2.5, y), 0.3, color='lightcoral', ec='black', linewidth=2, zorder=5)
    axes[1].add_patch(circle)
    axes[1].text(2.5, y, f'n{i+1}', ha='center', va='center', fontsize=11, fontweight='bold', zorder=6)
    axes[1].text(3.3, y, f'Neuron {i+1}', ha='left', va='center', fontsize=9)

# Draw connections (lines from each input to each neuron)
for i, input_y in enumerate(input_positions):
    for j, output_y in enumerate(output_positions):
        # Line thickness based on weight magnitude
        weight = layer_weights[i, j]
        linewidth = abs(weight) * 3  # Thicker lines for larger weights
        color = 'red' if weight < 0 else 'green'
        alpha = min(abs(weight), 0.7)  # Transparency based on weight
        axes[1].plot([0.8, 2.2], [input_y, output_y], color=color, linewidth=linewidth, alpha=alpha, zorder=1)

# Add legend
axes[1].plot([], [], color='green', linewidth=3, label='Positive weight')
axes[1].plot([], [], color='red', linewidth=3, label='Negative weight')
axes[1].legend(loc='upper left', fontsize=9)

plt.tight_layout()
plt.savefig('layer_structure.png', dpi=150, bbox_inches='tight')
plt.show()

print("\n💡 Key Observations:")
print("   • Each INPUT connects to ALL NEURONS (fully connected)")
print("   • Green lines = positive weights (excitatory)")
print("   • Red lines = negative weights (inhibitory)")
print("   • Thicker lines = larger weight magnitudes (stronger connections)")
print(f"   • Total connections: 3 inputs × 4 neurons = {3*4} weights")

## Part 4: Hidden Layers - The Secret Feature Detectors 🕵️

### 🔍 What Are Hidden Layers?

In a neural network:
- **Input layer**: The raw data (what we feed in)
- **Hidden layer(s)**: Intermediate processing (the "black box" where magic happens)
- **Output layer**: The final answer (predictions)

### 🎨 The Feature Detector Analogy

Think of hidden layers as a stack of increasingly sophisticated filters. In an image model, for example:
- **Layer 1**: Detects basic features (edges, colors, textures)
- **Layer 2**: Combines basic features into patterns (shapes, faces)
- **Layer 3**: Combines patterns into complex concepts (objects, scenes)

Each layer learns to detect increasingly **abstract features**!

Let's visualize how a layer transforms data:

In [None]:
# Create a simple example showing how a layer transforms data
# We'll use 2D inputs so we can visualize them

# Generate some random 2D data points
num_points = 100  # Number of data points
input_data = np.random.randn(num_points, 2)  # 100 points with 2 features each

# Create a layer with 2 inputs and 3 neurons
layer_weights_2d = np.array([
    [1.5, -0.8, 0.3],   # Weights from input 1 to each neuron
    [-0.5, 1.2, 0.9]    # Weights from input 2 to each neuron
])

layer_biases_2d = np.array([0.5, -0.3, 0.2])  # Biases for 3 neurons

# Transform all data points through the layer
def transform_batch(inputs, weights, biases):
    """
    Transform multiple data points through a layer.
    
    Parameters:
    - inputs: shape (num_samples, num_inputs)
    - weights: shape (num_inputs, num_neurons)
    - biases: shape (num_neurons,)
    
    Returns:
    - outputs: shape (num_samples, num_neurons)
    """
    # Matrix multiplication handles all samples at once!
    z = np.dot(inputs, weights) + biases  # Compute for ALL samples simultaneously
    outputs = relu(z)  # Apply activation to all outputs
    return outputs

# Transform the data
output_data = transform_batch(input_data, layer_weights_2d, layer_biases_2d)

print("📊 Data Transformation:")
print(f"   Input shape: {input_data.shape} (100 samples, 2 features each)")
print(f"   Output shape: {output_data.shape} (100 samples, 3 features each)")
print("\n   ✨ The layer transformed 2D data into 3D data!")

# Visualize the transformation
fig = plt.figure(figsize=(16, 5))

# Plot 1: Original input data (2D)
ax1 = fig.add_subplot(131)
ax1.scatter(input_data[:, 0], input_data[:, 1], alpha=0.6, s=50, c='blue', edgecolors='black')
ax1.set_xlabel('Input Feature 1', fontsize=11, fontweight='bold')
ax1.set_ylabel('Input Feature 2', fontsize=11, fontweight='bold')
ax1.set_title('Original Input Data (2D)\n100 points with 2 features', fontsize=12, fontweight='bold')
ax1.grid(True, alpha=0.3)
ax1.axhline(y=0, color='k', linewidth=0.5)
ax1.axvline(x=0, color='k', linewidth=0.5)

# Plot 2: Output data projected to 2D (neurons 1 and 2)
ax2 = fig.add_subplot(132)
ax2.scatter(output_data[:, 0], output_data[:, 1], alpha=0.6, s=50, c='red', edgecolors='black')
ax2.set_xlabel('Neuron 1 Output', fontsize=11, fontweight='bold')
ax2.set_ylabel('Neuron 2 Output', fontsize=11, fontweight='bold')
ax2.set_title('Transformed Data (neurons 1 & 2)\nAfter passing through layer', fontsize=12, fontweight='bold')
ax2.grid(True, alpha=0.3)
ax2.axhline(y=0, color='k', linewidth=0.5)
ax2.axvline(x=0, color='k', linewidth=0.5)

# Plot 3: 3D visualization of all outputs
ax3 = fig.add_subplot(133, projection='3d')
ax3.scatter(output_data[:, 0], output_data[:, 1], output_data[:, 2], 
           alpha=0.6, s=50, c='green', edgecolors='black')
ax3.set_xlabel('Neuron 1', fontsize=10, fontweight='bold')
ax3.set_ylabel('Neuron 2', fontsize=10, fontweight='bold')
ax3.set_zlabel('Neuron 3', fontsize=10, fontweight='bold')
ax3.set_title('Full Output (3D)\nAll 3 neurons', fontsize=12, fontweight='bold')

plt.tight_layout()
plt.savefig('layer_transformation.png', dpi=150, bbox_inches='tight')
plt.show()

print("\n💡 What Just Happened?")
print("   • The layer acted as a FEATURE EXTRACTOR")
print("   • It transformed the data into a new representation")
print("   • Each neuron detected different patterns in the data")
print("   • Each data point is now described by 3 features instead of 2!")

## Part 5: Interactive Experimentation 🧪

### Try It Yourself!

Let's play with different layer configurations and see what happens:

In [None]:
# Interactive exploration: Build your own layer!

def create_and_test_layer(num_inputs, num_neurons, use_random_weights=True):
    """
    Create a layer and test it with random input.
    
    Parameters:
    - num_inputs: how many input features
    - num_neurons: how many neurons in the layer
    - use_random_weights: if True, use random weights; if False, use small positive weights
    """
    print("="*70)
    print(f"🏗️  BUILDING A LAYER: {num_inputs} inputs → {num_neurons} neurons")
    print("="*70)
    
    # Create random input data
    test_input = np.random.randn(num_inputs)  # Random values around 0
    
    # Create weights
    if use_random_weights:
        # Normally distributed weights (mean 0, standard deviation 0.5); positive or negative
        weights = np.random.randn(num_inputs, num_neurons) * 0.5
    else:
        # Small positive weights
        weights = np.random.rand(num_inputs, num_neurons) * 0.3
    
    # Create biases
    biases = np.random.randn(num_neurons) * 0.1  # Small random biases
    
    # Compute output
    output = layer_with_matrix(test_input, weights, biases)
    
    # Display results
    print(f"\n📥 Input: {test_input}")
    print(f"⚖️  Weights shape: {weights.shape}")
    print(f"➕ Biases: {biases}")
    print(f"\n📤 Output: {output}")
    print(f"\n📊 Statistics:")
    print(f"   • Total parameters: {weights.size + biases.size} ({weights.size} weights + {biases.size} biases)")
    print(f"   • Active neurons (output > 0): {np.sum(output > 0)} out of {num_neurons}")
    print(f"   • Output mean: {np.mean(output):.4f}")
    print(f"   • Output std: {np.std(output):.4f}")
    
    return test_input, weights, biases, output

# Experiment 1: Small layer
print("\n🔬 EXPERIMENT 1: Small Layer")
create_and_test_layer(num_inputs=2, num_neurons=3)

# Experiment 2: Larger layer
print("\n\n🔬 EXPERIMENT 2: Larger Layer")
create_and_test_layer(num_inputs=5, num_neurons=10)

# Experiment 3: Wide layer (many neurons)
print("\n\n🔬 EXPERIMENT 3: Wide Layer")
create_and_test_layer(num_inputs=3, num_neurons=20)

print("\n" + "="*70)
print("💡 Observations:")
print("   • More neurons = more outputs (more feature detectors)")
print("   • More inputs = more parameters (more to learn)")
print("   • Parameters = (num_inputs × num_neurons) + num_neurons")
print("   • ReLU makes some outputs zero (inactive neurons)")
print("="*70)

## Part 6: Common Mistakes ⚠️

Let's learn from common errors so you can avoid them!
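
Shape mismatches are among the most common of these errors, and one way to catch them early is to check shapes before doing any arithmetic. The helper below, `check_layer_shapes`, is a hypothetical name introduced only for this sketch (it is not part of NumPy), and it assumes a single 1-D input.

```python
import numpy as np

def check_layer_shapes(inputs, weights, biases):
    """Raise a ValueError with a readable message if the layer shapes disagree."""
    if weights.ndim != 2:
        raise ValueError(f"weights must be 2-D, got shape {weights.shape}")
    num_inputs, num_neurons = weights.shape
    if inputs.shape != (num_inputs,):
        raise ValueError(
            f"inputs have shape {inputs.shape}, but weights expect ({num_inputs},)"
        )
    if biases.shape != (num_neurons,):
        raise ValueError(
            f"biases have shape {biases.shape}, but the layer has {num_neurons} neurons"
        )

# Passes silently: 3 inputs, weights (3, 2), 2 biases
check_layer_shapes(np.ones(3), np.ones((3, 2)), np.ones(2))

# Fails: the weights are transposed
try:
    check_layer_shapes(np.ones(3), np.ones((2, 3)), np.ones(2))
except ValueError as error:
    print("Caught:", error)
```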

In [None]:
# Common Mistake #1: Dimension Mismatch
print("⚠️  COMMON MISTAKE #1: Wrong Weight Dimensions\n")

# Correct way
correct_inputs = np.array([1.0, 2.0, 3.0])  # 3 inputs
correct_weights = np.array([[0.5, 0.2],     # 3 rows (one per input)
                           [0.3, 0.4],
                           [0.1, 0.6]])     # 2 columns (one per neuron)
correct_biases = np.array([0.1, 0.2])       # 2 biases (one per neuron)

print("✅ CORRECT:")
print(f"   Inputs shape: {correct_inputs.shape} (3 inputs)")
print(f"   Weights shape: {correct_weights.shape} (3 inputs × 2 neurons)")
print(f"   Biases shape: {correct_biases.shape} (2 neurons)")
result = layer_with_matrix(correct_inputs, correct_weights, correct_biases)
print(f"   Output: {result} ✓")

print("\n❌ WRONG:")
# Wrong way - transposed weights
wrong_weights = np.array([[0.5, 0.3, 0.1],  # 2 rows (should be 3!)
                         [0.2, 0.4, 0.6]])  # Neurons as rows instead of columns

print(f"   Inputs shape: {correct_inputs.shape} (3 inputs)")
print(f"   Weights shape: {wrong_weights.shape} (2×3 - WRONG!)")

try:
    layer_with_matrix(correct_inputs, wrong_weights, correct_biases)
except ValueError as e:
    print(f"   ERROR: {e}")
    print("   ❗ The shapes don't match for matrix multiplication!")

print("\n💡 Remember: Weights shape must be (num_inputs, num_neurons)")

print("\n" + "="*70)

# Common Mistake #2: Wrong number of biases
print("\n⚠️  COMMON MISTAKE #2: Wrong Number of Biases\n")

print("✅ CORRECT: One bias per neuron")
print(f"   Neurons: 2")
print(f"   Biases: {correct_biases} (length 2) ✓")

print("\n❌ WRONG: Biases don't match neurons")
wrong_biases = np.array([0.1, 0.2, 0.3])  # 3 biases for 2 neurons!
print(f"   Neurons: 2")
print(f"   Biases: {wrong_biases} (length 3 - WRONG!)")

try:
    layer_with_matrix(correct_inputs, correct_weights, wrong_biases)
except ValueError as e:
    print(f"   ERROR: {e}")
    print("   ❗ Number of biases must equal number of neurons!")

print("\n💡 Remember: You need exactly ONE bias per neuron")

print("\n" + "="*70)

# Common Mistake #3: Forgetting activation function
print("\n⚠️  COMMON MISTAKE #3: Forgetting Activation Function\n")

test_inputs = np.array([1.0, -2.0])
test_weights = np.array([[0.5], [0.5]])  # Chosen so the weighted sum is negative (-0.5) and ReLU changes it
test_bias = np.array([0.0])

# Without activation (LINEAR)
linear_output = np.dot(test_inputs, test_weights) + test_bias
print("❌ WITHOUT activation (linear):")
print(f"   Input: {test_inputs}")
print(f"   Output: {linear_output[0]:.2f}")
print("   Problem: The output can be any real number, and stacking such layers still gives just one linear function")

# With activation (NON-LINEAR)
nonlinear_output = relu(linear_output)
print("\n✅ WITH activation (ReLU):")
print(f"   Input: {test_inputs}")
print(f"   Output: {nonlinear_output[0]:.2f}")
print("   Benefit: Non-linearity allows learning complex patterns!")

print("\n💡 Remember: Use a non-linear activation function in your hidden layers!")
print("   Without one, stacked layers collapse into a single linear transformation (limited)")

## Part 7: Visualizing Layer Architecture 📐

Let's create a comprehensive visualization of how layers connect in a network:

In [None]:
# Create a visualization of a multi-layer architecture
def visualize_network_architecture(layer_sizes):
    """
    Visualize a neural network architecture.
    
    Parameters:
    - layer_sizes: list of integers representing neurons in each layer
                  e.g., [3, 4, 2] means 3 inputs, 4 hidden neurons, 2 outputs
    """
    fig, ax = plt.subplots(figsize=(14, 8))
    ax.set_xlim(-1, len(layer_sizes))
    ax.set_ylim(-1, max(layer_sizes) + 1)
    ax.axis('off')
    
    # Layer names
    layer_names = ['Input Layer'] + [f'Hidden Layer {i}' for i in range(1, len(layer_sizes)-1)] + ['Output Layer']
    
    ax.set_title('Neural Network Architecture\n' + ' → '.join([f'{size} neurons' for size in layer_sizes]), 
                fontsize=14, fontweight='bold', pad=20)
    
    # Draw each layer
    neuron_positions = []  # Store positions for drawing connections
    
    for layer_idx, num_neurons in enumerate(layer_sizes):
        layer_x = layer_idx  # x-position of this layer
        
        # Calculate y-positions to center the neurons vertically
        start_y = (max(layer_sizes) - num_neurons) / 2
        
        layer_neuron_positions = []
        
        # Draw neurons in this layer
        for neuron_idx in range(num_neurons):
            neuron_y = start_y + neuron_idx
            layer_neuron_positions.append((layer_x, neuron_y))
            
            # Choose color based on layer type
            if layer_idx == 0:
                color = 'lightblue'  # Input layer
            elif layer_idx == len(layer_sizes) - 1:
                color = 'lightcoral'  # Output layer
            else:
                color = 'lightgreen'  # Hidden layers
            
            # Draw neuron circle
            circle = plt.Circle((layer_x, neuron_y), 0.2, color=color, ec='black', linewidth=2, zorder=5)
            ax.add_patch(circle)
        
        neuron_positions.append(layer_neuron_positions)
        
        # Add layer label
        ax.text(layer_x, max(layer_sizes) + 0.5, layer_names[layer_idx], 
               ha='center', va='bottom', fontsize=11, fontweight='bold')
    
    # Draw connections between layers
    for layer_idx in range(len(layer_sizes) - 1):
        current_layer = neuron_positions[layer_idx]
        next_layer = neuron_positions[layer_idx + 1]
        
        # Connect each neuron in current layer to each neuron in next layer
        for x1, y1 in current_layer:
            for x2, y2 in next_layer:
                ax.plot([x1 + 0.2, x2 - 0.2], [y1, y2], 'gray', linewidth=0.5, alpha=0.3, zorder=1)
    
    # Add connection count annotations
    for layer_idx in range(len(layer_sizes) - 1):
        num_connections = layer_sizes[layer_idx] * layer_sizes[layer_idx + 1]
        mid_x = layer_idx + 0.5  # Halfway between this layer and the next
        
        ax.text(mid_x, -0.7, f'{num_connections} weights', 
               ha='center', va='top', fontsize=9, style='italic',
               bbox=dict(boxstyle='round,pad=0.5', facecolor='yellow', alpha=0.3))
    
    plt.tight_layout()
    plt.savefig('network_architecture.png', dpi=150, bbox_inches='tight')
    plt.show()
    
    # Print statistics
    total_params = 0
    print("\n📊 NETWORK STATISTICS:")
    print("="*60)
    
    for i in range(len(layer_sizes) - 1):
        num_weights = layer_sizes[i] * layer_sizes[i+1]
        num_biases = layer_sizes[i+1]
        layer_params = num_weights + num_biases
        total_params += layer_params
        
        print(f"\nLayer {i} → Layer {i+1}:")
        print(f"  • Neurons: {layer_sizes[i]} → {layer_sizes[i+1]}")
        print(f"  • Weights: {num_weights} ({layer_sizes[i]} × {layer_sizes[i+1]})")
        print(f"  • Biases: {num_biases}")
        print(f"  • Total parameters: {layer_params}")
    
    print("\n" + "="*60)
    print(f"🎯 TOTAL PARAMETERS IN NETWORK: {total_params}")
    print("="*60)

# Example: Visualize a 3-layer network
print("🎨 Visualizing a 3-layer network: [3, 4, 2]")
visualize_network_architecture([3, 4, 2])

# Example: Visualize a deeper network
print("\n\n🎨 Visualizing a deeper network: [5, 8, 6, 3]")
visualize_network_architecture([5, 8, 6, 3])

## Summary and Key Takeaways 📚

### 🎉 What We Learned Today:

1. **Layers are Teams**: Multiple neurons working in parallel, each detecting different features

2. **Two Ways to Compute**:
   - Loop method: Easy to understand, but slow
   - Matrix multiplication: Fast and efficient!

3. **Weight Matrix Shape**: In this notebook's convention, always `(num_inputs, num_neurons)`

4. **Hidden Layers**: Act as feature detectors, transforming data into new representations

5. **Important Formula**:
   ```
   Output = Activation(Inputs @ Weights + Biases)
   ```

6. **Common Mistakes**:
   - Wrong weight dimensions
   - Wrong number of biases
   - Forgetting activation functions
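
To see why the last mistake in the list matters, the sketch below stacks two layers with no activation and shows that they collapse into one equivalent linear layer. The weights are random placeholders, and `W_combined` and `b_combined` are names introduced only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(3)

# Two layers WITHOUT activation: 3 -> 4 -> 2
W1, b1 = rng.standard_normal((3, 4)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((4, 2)), rng.standard_normal(2)
two_linear_layers = (x @ W1 + b1) @ W2 + b2

# The same mapping written as ONE linear layer: 3 -> 2
W_combined = W1 @ W2
b_combined = b1 @ W2 + b2
one_linear_layer = x @ W_combined + b_combined

print(np.allclose(two_linear_layers, one_linear_layer))  # True
```

With a non-linear activation such as ReLU between the two layers, this algebraic shortcut generally no longer applies, which is what lets deeper networks represent more than a single linear map.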

### 🔮 What's Next?

In **Notebook 5: Forward Propagation**, we'll learn:
- How to chain multiple layers together
- How data flows through an entire network
- Building a complete neural network from scratch!
- Making predictions with our network

### 💪 Practice Challenge:

Before moving on, try modifying the code above to:
1. Create a layer with 5 inputs and 7 neurons
2. Test it with your own input data
3. Visualize the weight matrix
4. Count the total number of parameters

---

**Remember**: Each layer is like adding a team of specialists to your network. Adding layers (with non-linear activations) generally lets your network represent more complex patterns! 🚀

Ready to move forward? Let's learn how to connect these layers together in the next notebook! 🎯