# Module 00: Introduction to Neural Networks and Deep Learning

**Difficulty**: ⭐⭐ (Intermediate)
**Estimated Time**: 30-40 minutes
**Prerequisites**: 
- Basic Python programming
- Understanding of linear algebra (vectors, matrices)
- Basic calculus (derivatives)
- Familiarity with NumPy and Matplotlib

## Learning Objectives

By the end of this notebook, you will be able to:

1. **Explain** the biological inspiration behind artificial neural networks
2. **Describe** the historical evolution of deep learning from perceptrons to modern architectures
3. **Identify** key applications of deep learning across different domains
4. **Understand** the mathematical notation and basic concepts used in neural networks
5. **Visualize** simple neural network structures and their components

## 1. Setup and Imports

Let's start by importing the necessary libraries for this notebook.

In [None]:
# Standard scientific computing libraries
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set random seeds for reproducibility
np.random.seed(42)

# Configure matplotlib for better-looking plots
%matplotlib inline
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

# Display settings
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 11

print("Setup complete! NumPy version:", np.__version__)

## 2. What Are Neural Networks?

### 2.1 Biological Inspiration

Neural networks are computing systems loosely inspired by the biological neural networks in animal brains. Let's look at the key components:

**Biological Neuron Components:**
- **Dendrites**: Receive signals from other neurons
- **Cell Body (Soma)**: Processes incoming signals
- **Axon**: Transmits output signals to other neurons
- **Synapses**: Connections between neurons that can strengthen or weaken

**Artificial Neuron (Perceptron) Components:**
- **Inputs**: Analogous to dendrites; receive the data ($x_1, x_2, \ldots, x_n$)
- **Weights**: Analogous to synaptic strengths ($w_1, w_2, \ldots, w_n$)
- **Summation**: Weighted sum of the inputs plus a bias ($\sum w_i x_i + b$)
- **Activation Function**: Transforms the sum into the output; in the original perceptron this is a step function that decides whether the neuron "fires"
- **Output**: Signal sent to the next layer ($y$)

The mathematical model of a single neuron:

$$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$$

Where:
- $x_i$ are input features
- $w_i$ are weights (learnable parameters)
- $b$ is bias (learnable parameter)
- $f$ is the activation function
- $y$ is the output
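
The formula translates almost line for line into NumPy. The sketch below is illustrative: the helper names `neuron_forward`, `step` and `relu` are our own (hypothetical, not from any library), the input values are arbitrary, and the step function stands in for the classic perceptron's threshold, with ReLU shown as a common modern alternative.

```python
import numpy as np

def step(z):
    """Threshold activation of the classic perceptron: 1 if z > 0, else 0."""
    return np.where(z > 0, 1.0, 0.0)

def relu(z):
    """Rectified linear unit: max(0, z), applied element-wise."""
    return np.maximum(0.0, z)

def neuron_forward(x, w, b, activation):
    """Compute y = f(sum_i w_i * x_i + b) for a single neuron."""
    z = np.dot(w, x) + b          # weighted sum of the inputs plus the bias
    return activation(z)          # nonlinearity f applied to the sum

x = np.array([1.0, 2.0, 3.0])     # inputs x_1..x_3 (illustrative values)
w = np.array([0.5, -1.0, 0.8])    # weights w_1..w_3
b = 0.3                           # bias

print("step output:", neuron_forward(x, w, b, step))
print("ReLU output:", neuron_forward(x, w, b, relu))
```

Swapping `activation` changes only the final nonlinearity; the weighted sum $\sum w_i x_i + b$ is computed the same way in both cases.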

In [None]:
def visualize_neuron():
    """
    Visualize a simple artificial neuron structure.
    This shows the flow from inputs through weights to output.
    """
    fig, ax = plt.subplots(1, 1, figsize=(12, 6))
    
    # Define positions for components
    input_x = 0.1
    neuron_x = 0.5
    output_x = 0.9
    
    # Draw inputs
    num_inputs = 3
    input_y = np.linspace(0.2, 0.8, num_inputs)
    
    for i, y in enumerate(input_y):
        # Input nodes
        circle = plt.Circle((input_x, y), 0.04, color='lightblue', ec='black', zorder=3)
        ax.add_patch(circle)
        ax.text(input_x - 0.08, y, f'$x_{i+1}$', fontsize=14, ha='center', va='center')
        
        # Connections to neuron
        ax.plot([input_x + 0.04, neuron_x - 0.08], [y, 0.5], 'k-', alpha=0.5, linewidth=2)
        ax.text((input_x + neuron_x) / 2, (y + 0.5) / 2 + 0.05, f'$w_{i+1}$', 
                fontsize=11, ha='center', style='italic')
    
    # Draw neuron (cell body)
    neuron = plt.Circle((neuron_x, 0.5), 0.08, color='coral', ec='black', linewidth=2, zorder=3)
    ax.add_patch(neuron)
    ax.text(neuron_x, 0.5, '$\\Sigma$', fontsize=20, ha='center', va='center', weight='bold')
    ax.text(neuron_x, 0.35, '$f(\\cdot)$', fontsize=12, ha='center', va='center')
    
    # Draw bias
    bias_circle = plt.Circle((neuron_x, 0.15), 0.03, color='lightyellow', ec='black', zorder=3)
    ax.add_patch(bias_circle)
    ax.text(neuron_x, 0.05, '$b$', fontsize=12, ha='center', va='center')
    ax.plot([neuron_x, neuron_x], [0.18, 0.42], 'k-', alpha=0.5, linewidth=2)
    
    # Draw output
    ax.plot([neuron_x + 0.08, output_x - 0.04], [0.5, 0.5], 'k-', linewidth=3)
    output_circle = plt.Circle((output_x, 0.5), 0.04, color='lightgreen', ec='black', zorder=3)
    ax.add_patch(output_circle)
    ax.text(output_x + 0.08, 0.5, '$y$', fontsize=14, ha='center', va='center')
    
    # Labels
    ax.text(input_x, 0.95, 'Inputs', fontsize=13, ha='center', weight='bold')
    ax.text(neuron_x, 0.95, 'Neuron', fontsize=13, ha='center', weight='bold')
    ax.text(output_x, 0.95, 'Output', fontsize=13, ha='center', weight='bold')
    
    ax.set_xlim(-0.1, 1.1)
    ax.set_ylim(0, 1)
    ax.axis('off')
    ax.set_title('Structure of an Artificial Neuron', fontsize=15, weight='bold', pad=20)
    
    plt.tight_layout()
    plt.show()

# Visualize the neuron structure
visualize_neuron()

## 3. History of Deep Learning

Deep learning has evolved significantly over the past 70+ years. Here's a timeline of major milestones:

### Historical Timeline

**1943 - McCulloch-Pitts Neuron**
- First mathematical model of a biological neuron
- Binary threshold unit (outputs 0 or 1)

**1958 - The Perceptron (Frank Rosenblatt)**
- One of the first neural network models that learned its weights from data
- Could learn simple linear decision boundaries
- Limitation: a single-layer perceptron cannot solve the XOR problem, because XOR is not linearly separable

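**1969 - Minsky & Papert's *Perceptrons***
- Showed the limitations of single-layer perceptrons (e.g., they cannot represent XOR)
- Contributed to a sharp drop in neural network research funding, often associated with the first "AI winter" of the 1970s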

**1986 - Backpropagation Algorithm**
- Rumelhart, Hinton, and Williams popularized backpropagation
- Enabled training of multi-layer networks
- Multi-layer networks trained this way can solve XOR and learn more complex patterns

**1998 - LeNet (Convolutional Neural Networks)**
- Yann LeCun and colleagues developed LeNet-5 for handwritten digit recognition
- Used by banks to read checks
- Foundation for modern CNNs

**2006 - Deep Learning Renaissance**
- Geoffrey Hinton introduced "Deep Belief Networks"
- Showed deep networks could be trained effectively
- Term "Deep Learning" gained popularity

**2012 - AlexNet (ImageNet Revolution)**
- Krizhevsky, Sutskever, and Hinton's CNN won the ImageNet (ILSVRC) competition by a large margin
- Trained on GPUs, which made training a network of that size practical
- Demonstrated that deep CNNs could substantially outperform prior computer vision methods

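**2014-2017 - Transformers and Attention**
- Attention mechanisms (introduced for neural machine translation in 2014) reshape NLP
- The Transformer architecture (2017, "Attention Is All You Need") relies on attention alone, without recurrence
- Foundation for BERT, GPT, and modern language models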

**2020s - Foundation Models Era**
- GPT-3 (2020), GPT-4 (2023): Large language models whose capabilities grew markedly with scale
- DALL-E, Stable Diffusion: Text-to-image generation
- AlphaFold: Protein structure prediction
- Multimodal models combining vision, language, and more
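
The 1958 and 1969 entries hinge on one technical claim: a single-layer perceptron can learn linearly separable patterns but not XOR. The sketch below checks this on the four Boolean inputs. It is a simplified, Rosenblatt-style perceptron learning rule (step activation, labels in {0, 1}, learning rate 1, weights initialized to zero); `train_perceptron` and `accuracy` are hypothetical helper names, not a library API.

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    """Perceptron learning rule: nudge w and b only when a sample is misclassified."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, target in zip(X, y):
            pred = 1 if np.dot(w, xi) + b > 0 else 0   # step activation
            update = lr * (target - pred)              # 0 when the prediction is correct
            w += update * xi
            b += update
            errors += int(update != 0)
        if errors == 0:                                # a full pass with no mistakes
            break
    return w, b

def accuracy(X, y, w, b):
    """Fraction of samples the trained perceptron classifies correctly."""
    preds = (X @ w + b > 0).astype(int)
    return float(np.mean(preds == y))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([0, 0, 0, 1])   # linearly separable
y_xor = np.array([0, 1, 1, 0])   # not linearly separable

for name, labels in [("AND", y_and), ("XOR", y_xor)]:
    w, b = train_perceptron(X, labels)
    print(f"{name}: training accuracy = {accuracy(X, labels, w, b):.2f}")
```

AND reaches perfect accuracy. For XOR, no straight line separates the two classes, so no choice of `w` and `b` classifies all four points and the rule can never reach 100%.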

In [None]:
def plot_dl_history_timeline():
    """
    Visualize the timeline of deep learning milestones.
    Shows the progression from early neural networks to modern architectures.
    """
    # Define milestones
    milestones = [
        (1958, "Perceptron", 1),
        (1986, "Backpropagation", 2),
        (1998, "LeNet (CNN)", 3),
        (2006, "Deep Learning\nRevival", 4),
        (2012, "AlexNet", 5),
        (2017, "Transformers", 6),
        (2020, "GPT-3", 7),
        (2023, "GPT-4\nMultimodal", 8)
    ]
    
    fig, ax = plt.subplots(figsize=(14, 6))
    
    # Draw timeline
    years = [m[0] for m in milestones]
    ax.plot(years, [0] * len(years), 'k-', linewidth=2, zorder=1)
    
    # Add milestones
    colors = plt.cm.viridis(np.linspace(0.2, 0.9, len(milestones)))
    
    for i, (year, label, importance) in enumerate(milestones):
        # Alternate positions for readability
        y_pos = 0.3 if i % 2 == 0 else -0.3
        
        # Draw milestone point
        ax.scatter(year, 0, s=200 + importance * 50, c=[colors[i]], 
                  edgecolors='black', linewidth=2, zorder=3)
        
        # Draw connecting line and label
        ax.plot([year, year], [0, y_pos], 'k--', alpha=0.3, linewidth=1)
        ax.text(year, y_pos + (0.1 if y_pos > 0 else -0.1), label, 
               ha='center', va='bottom' if y_pos > 0 else 'top',
               fontsize=10, weight='bold')
        
        # Add year labels
        ax.text(year, -0.05, str(year), ha='center', va='top', fontsize=9)
    
    ax.set_ylim(-0.6, 0.6)
    ax.set_xlim(1950, 2030)
    ax.axis('off')
    ax.set_title('Deep Learning Historical Milestones', fontsize=16, weight='bold', pad=20)
    
    plt.tight_layout()
    plt.show()

# Plot the timeline
plot_dl_history_timeline()

## 4. Applications of Deep Learning

Deep learning has revolutionized numerous fields. Here are the major application domains:

### 4.1 Computer Vision
- **Image Classification**: Identifying objects in images (cats vs dogs, disease detection)
- **Object Detection**: Locating and classifying multiple objects (self-driving cars, surveillance)
- **Semantic Segmentation**: Pixel-level classification (medical imaging, satellite imagery)
- **Face Recognition**: Identity verification, photo organization
- **Image Generation**: Creating realistic images (DALL-E, Stable Diffusion, Midjourney)

### 4.2 Natural Language Processing (NLP)
- **Machine Translation**: Google Translate, DeepL
- **Text Generation**: GPT models, chatbots, content creation
- **Sentiment Analysis**: Understanding opinions in reviews, social media
- **Question Answering**: Search engines, virtual assistants
- **Text Summarization**: Automatic document summarization

### 4.3 Speech and Audio
- **Speech Recognition**: Siri, Alexa, Google Assistant
- **Text-to-Speech**: Natural-sounding voice synthesis
- **Music Generation**: AI composers (Amper, AIVA)
- **Audio Enhancement**: Noise reduction, voice separation

### 4.4 Healthcare and Medicine
- **Medical Image Analysis**: Tumor detection, X-ray interpretation
- **Drug Discovery**: Predicting molecular properties
- **Protein Folding**: AlphaFold's major advance on the 50-year-old protein structure prediction problem
- **Disease Prediction**: Early detection from symptoms and biomarkers

### 4.5 Autonomous Systems
- **Self-Driving Cars**: Tesla, Waymo perception systems
- **Robotics**: Robot manipulation and navigation
- **Drones**: Autonomous flight and obstacle avoidance

### 4.6 Gaming and Entertainment
- **Game AI**: AlphaGo defeating top Go professionals; AlphaStar reaching Grandmaster level in StarCraft II
- **Recommendation Systems**: Netflix, YouTube, Spotify
- **Content Moderation**: Detecting inappropriate content

### 4.7 Finance and Business
- **Fraud Detection**: Identifying suspicious transactions
- **Algorithmic Trading**: Predicting market movements
- **Credit Scoring**: Assessing loan risk
- **Customer Service**: Chatbots and virtual assistants

In [None]:
def visualize_applications():
    """
    Create a bar chart showing the impact/adoption of deep learning across domains.
    Note: These are illustrative values representing relative maturity and adoption.
    """
    domains = [
        'Computer\nVision',
        'Natural\nLanguage',
        'Speech\n& Audio',
        'Healthcare',
        'Autonomous\nSystems',
        'Gaming',
        'Finance'
    ]
    
    # Illustrative maturity/adoption scores on a 0-100 scale (not measured data)
    adoption = [95, 90, 85, 70, 65, 88, 75]
    
    fig, ax = plt.subplots(figsize=(12, 6))
    
    colors = plt.cm.viridis(np.linspace(0.2, 0.9, len(domains)))
    ax.barh(domains, adoption, color=colors, edgecolor='black', linewidth=1.5)
    
    # Add value labels inside the bars
    for i, value in enumerate(adoption):
        ax.text(value - 5, i, str(value), va='center', ha='right', 
               fontsize=11, weight='bold', color='white')
    
    ax.set_xlabel('Relative Maturity & Adoption (illustrative, 0-100)', fontsize=12, weight='bold')
    ax.set_title('Deep Learning Impact Across Different Domains (Illustrative)', 
                fontsize=14, weight='bold', pad=20)
    ax.set_xlim(0, 100)
    ax.grid(axis='x', alpha=0.3, linestyle='--')
    
    plt.tight_layout()
    plt.show()

# Visualize applications
visualize_applications()

## 5. Mathematical Notation and Prerequisites

Before diving deeper, let's review the mathematical notation commonly used in deep learning.

### 5.1 Vectors and Matrices

**Scalar** (single number):
$$x \in \mathbb{R}$$

**Vector** (1D array):
$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \in \mathbb{R}^n$$

**Matrix** (2D array):
$$\mathbf{W} = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1n} \\ w_{21} & w_{22} & \cdots & w_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ w_{m1} & w_{m2} & \cdots & w_{mn} \end{bmatrix} \in \mathbb{R}^{m \times n}$$

**Tensor** (multi-dimensional array):
$$\mathcal{T} \in \mathbb{R}^{d_1 \times d_2 \times \cdots \times d_n}$$
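
In NumPy, scalars, vectors, matrices and tensors differ only in their number of axes (`ndim`) and their `shape`. The short sketch below makes that concrete; the image-batch layout `(batch, height, width, channels)` is an illustrative assumption, since frameworks differ on whether channels come first or last.

```python
import numpy as np

scalar = np.float64(3.5)              # x in R: 0 axes
vector = np.array([1.0, 2.0, 3.0])    # x in R^3: 1 axis
matrix = np.zeros((2, 3))             # W in R^{2x3}: 2 axes
# A rank-4 tensor, e.g. a batch of 8 RGB images of 32x32 pixels,
# stored here as (batch, height, width, channels); layouts vary by framework.
tensor = np.zeros((8, 32, 32, 3))

for name, arr in [("scalar", scalar), ("vector", vector),
                  ("matrix", matrix), ("tensor", tensor)]:
    print(f"{name:>6}: ndim={np.ndim(arr)}, shape={np.shape(arr)}")
```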

### 5.2 Common Operations

**Dot Product** (inner product of vectors):
$$\mathbf{x} \cdot \mathbf{w} = \sum_{i=1}^{n} x_i w_i$$

**Matrix Multiplication**:
$$\mathbf{Y} = \mathbf{X} \mathbf{W}$$
where if $\mathbf{X} \in \mathbb{R}^{m \times n}$ and $\mathbf{W} \in \mathbb{R}^{n \times p}$, then $\mathbf{Y} \in \mathbb{R}^{m \times p}$

**Element-wise Operations** (Hadamard product):
$$\mathbf{C} = \mathbf{A} \odot \mathbf{B}$$
where $c_{ij} = a_{ij} \times b_{ij}$

### 5.3 Derivatives and Gradients

**Derivative** (rate of change):
$$\frac{df}{dx} = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$

**Partial Derivative** (derivative with respect to one variable):
$$\frac{\partial f}{\partial x_i}$$

**Gradient** (vector of all partial derivatives):
$$\nabla f(\mathbf{x}) = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix}$$

**Chain Rule** (fundamental for backpropagation): if $z$ depends on $y$ and $y$ depends on $x$, then
$$\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx}$$
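
To see the chain rule and gradients working together, the sketch below computes the gradient of a single neuron's loss two ways: analytically with the chain rule, and numerically by perturbing each parameter. The sigmoid activation, the squared-error loss $L = (\sigma(\mathbf{w} \cdot \mathbf{x} + b) - t)^2$ and the target $t$ are assumptions chosen for illustration. The numerical check uses a central difference $\frac{f(x+h) - f(x-h)}{2h}$, a more accurate variant of the one-sided limit in the derivative definition.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b, x, t):
    """Squared error of one sigmoid neuron: L = (sigmoid(w.x + b) - t)^2 (assumed loss)."""
    return (sigmoid(np.dot(w, x) + b) - t) ** 2

def analytic_grad(w, b, x, t):
    """Chain rule: dL/dw = dL/dy * dy/dz * dz/dw, with dz/dw = x and dz/db = 1."""
    z = np.dot(w, x) + b
    y = sigmoid(z)
    dL_dy = 2.0 * (y - t)
    dy_dz = y * (1.0 - y)          # derivative of the sigmoid
    dL_dz = dL_dy * dy_dz
    return dL_dz * x, dL_dz

def numerical_grad(w, b, x, t, h=1e-6):
    """Central finite differences, one parameter at a time."""
    grad_w = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = h
        grad_w[i] = (loss(w + e, b, x, t) - loss(w - e, b, x, t)) / (2 * h)
    grad_b = (loss(w, b + h, x, t) - loss(w, b - h, x, t)) / (2 * h)
    return grad_w, grad_b

x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -1.0, 0.8])
b, t = 0.3, 0.0                    # t is an arbitrary target for illustration

gw_a, gb_a = analytic_grad(w, b, x, t)
gw_n, gb_n = numerical_grad(w, b, x, t)
print("chain rule :", gw_a, gb_a)
print("finite diff:", gw_n, gb_n)
print("max abs difference:", np.max(np.abs(gw_a - gw_n)))
```

The two gradients agree to many decimal places. Backpropagation applies the same chain-rule bookkeeping layer by layer, which is far cheaper than perturbing every parameter separately.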

In [None]:
# Demonstrate basic mathematical operations used in neural networks

print("=" * 60)
print("MATHEMATICAL OPERATIONS IN NEURAL NETWORKS")
print("=" * 60)

# 1. Vector operations
print("\n1. VECTOR OPERATIONS")
print("-" * 60)
x = np.array([1.0, 2.0, 3.0])  # Input vector
w = np.array([0.5, -1.0, 0.8])  # Weight vector
b = 0.3  # Bias

print(f"Input vector x: {x}")
print(f"Weight vector w: {w}")
print(f"Bias b: {b}")

# Dot product (weighted sum)
dot_product = np.dot(x, w)
print(f"\nDot product (x · w): {dot_product:.4f}")
print(f"With bias (x · w + b): {dot_product + b:.4f}")

# 2. Matrix multiplication (multiple neurons)
print("\n2. MATRIX MULTIPLICATION (Layer with 3 inputs, 2 neurons)")
print("-" * 60)
X = np.array([[1.0, 2.0, 3.0],      # Sample 1
              [4.0, 5.0, 6.0]])     # Sample 2
W = np.array([[0.5, -1.0],          # Weights from input 1 to neurons 1 and 2
              [-0.2, 0.8],          # Weights from input 2 to neurons 1 and 2
              [0.3, -0.5]])         # Weights from input 3 to neurons 1 and 2
B = np.array([0.1, -0.2])           # Biases for 2 neurons

print(f"Input matrix X (2 samples, 3 features):\n{X}")
print(f"\nWeight matrix W (3 inputs, 2 neurons):\n{W}")
print(f"\nBias vector B: {B}")

# Compute layer output
output = np.dot(X, W) + B
print(f"\nLayer output (X @ W + B):\n{output}")
print(f"Shape: {output.shape} (2 samples, 2 neurons)")

# 3. Element-wise operations
print("\n3. ELEMENT-WISE OPERATIONS")
print("-" * 60)
a = np.array([1, 2, 3, 4])
b = np.array([2, 2, 2, 2])

print(f"Array a: {a}")
print(f"Array b: {b}")
print(f"Element-wise multiplication (a * b): {a * b}")
print(f"Element-wise power (a ** 2): {a ** 2}")

print("\n" + "=" * 60)

## 6. Simple Neural Network Architecture

A typical neural network consists of multiple layers:

1. **Input Layer**: Receives the raw data (features)
2. **Hidden Layers**: Process the information (can have multiple layers)
3. **Output Layer**: Produces the final prediction

### Network Terminology

- **Depth**: Number of layers (deep = many layers)
- **Width**: Number of neurons per layer
- **Parameters**: Weights and biases (learnable)
- **Hyperparameters**: Learning rate, number of layers, neurons per layer (set by user)
- **Forward Propagation**: Data flows from input to output
- **Backward Propagation**: Gradients flow from output to input (for learning)

### Example Architecture

A simple 3-layer network (counting the input layer, so it has two layers of weights):
- Input: 4 features
- Hidden: 8 neurons
- Output: 3 classes

Total parameters:
- Layer 1: $(4 \times 8) + 8 = 40$ (weights + biases)
- Layer 2: $(8 \times 3) + 3 = 27$ (weights + biases)
- **Total: 67 parameters to learn**
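
The parameter count above can be checked by building the 4 → 8 → 3 network in NumPy and running one forward pass. This is a minimal sketch under stated assumptions: the small random initial weights, ReLU in the hidden layer and raw scores at the output are illustrative choices (a softmax would typically follow for 3-class classification), and `forward` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes from the example: 4 inputs -> 8 hidden neurons -> 3 outputs
sizes = [4, 8, 3]

# One (weights, biases) pair per layer connection. Small random weights are
# an illustrative choice here, not a recommended initialization scheme.
params = [(rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

n_params = sum(W.size + b.size for W, b in params)
print("Total parameters:", n_params)        # → Total parameters: 67

def forward(X, params):
    """Forward propagation: ReLU on hidden layers, raw scores at the output."""
    a = X
    for i, (W, b) in enumerate(params):
        z = a @ W + b                        # (batch, n_in) @ (n_in, n_out) + (n_out,)
        a = np.maximum(0.0, z) if i < len(params) - 1 else z
    return a

X = rng.normal(size=(5, 4))                  # a batch of 5 samples, 4 features each
scores = forward(X, params)
print("Output shape:", scores.shape)         # → Output shape: (5, 3)
```

Storing each weight matrix as `(inputs, outputs)` matches the `X @ W + B` convention used in the math demo earlier, so a whole batch flows through each layer with a single matrix multiplication.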

In [None]:
def visualize_network_architecture(layer_sizes=(4, 8, 6, 3)):
    """
    Visualize a multi-layer neural network architecture.
    
    Parameters:
    -----------
    layer_sizes : list or tuple of int
        Number of neurons in each layer (input, hidden1, hidden2, ..., output)
    """
    fig, ax = plt.subplots(figsize=(14, 8))
    
    num_layers = len(layer_sizes)
    layer_spacing = 0.8 / (num_layers - 1) if num_layers > 1 else 0.5
    
    # Calculate total parameters
    total_params = 0
    for i in range(len(layer_sizes) - 1):
        total_params += (layer_sizes[i] * layer_sizes[i+1]) + layer_sizes[i+1]
    
    # Draw layers
    for layer_idx, size in enumerate(layer_sizes):
        x = 0.1 + layer_idx * layer_spacing
        neuron_spacing = 0.8 / (size + 1)
        
        # Determine layer color
        if layer_idx == 0:
            color = 'lightblue'
            label = 'Input'
        elif layer_idx == num_layers - 1:
            color = 'lightgreen'
            label = 'Output'
        else:
            color = 'coral'
            label = f'Hidden {layer_idx}'
        
        # Draw neurons in this layer
        for neuron_idx in range(size):
            y = 0.1 + (neuron_idx + 1) * neuron_spacing
            
            # Draw neuron
            circle = plt.Circle((x, y), 0.025, color=color, ec='black', linewidth=1.5, zorder=3)
            ax.add_patch(circle)
            
            # Draw connections to next layer
            if layer_idx < num_layers - 1:
                next_x = x + layer_spacing
                next_size = layer_sizes[layer_idx + 1]
                next_neuron_spacing = 0.8 / (next_size + 1)
                
                for next_neuron_idx in range(next_size):
                    next_y = 0.1 + (next_neuron_idx + 1) * next_neuron_spacing
                    # Only draw a subset of connections for clarity
                    if neuron_idx == 0 or neuron_idx == size - 1 or next_neuron_idx % 2 == 0:
                        ax.plot([x + 0.025, next_x - 0.025], [y, next_y], 
                               'gray', alpha=0.2, linewidth=0.5, zorder=1)
        
        # Add layer label
        ax.text(x, 0.05, f"{label}\n{size} neurons", ha='center', va='top', 
               fontsize=10, weight='bold')
    
    # Add parameter count
    param_text = f"Total Trainable Parameters: {total_params:,}"
    ax.text(0.5, 0.95, param_text, ha='center', va='bottom', 
           fontsize=12, weight='bold', 
           bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))
    
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
    ax.axis('off')
    ax.set_title(f'Neural Network Architecture: {layer_sizes}', 
                fontsize=14, weight='bold', pad=20)
    
    plt.tight_layout()
    plt.show()
    
    # Print parameter breakdown
    print("\nParameter Breakdown:")
    print("=" * 50)
    total = 0
    for i in range(len(layer_sizes) - 1):
        weights = layer_sizes[i] * layer_sizes[i+1]
        biases = layer_sizes[i+1]
        layer_total = weights + biases
        total += layer_total
        print(f"Layer {i+1} ({layer_sizes[i]} → {layer_sizes[i+1]}):")
        print(f"  Weights: {weights:,} | Biases: {biases} | Total: {layer_total:,}")
    print("=" * 50)
    print(f"Total Parameters: {total:,}")

# Visualize a sample architecture
visualize_network_architecture([4, 8, 6, 3])

## 7. Exercises

Now it's your turn to practice! Complete the following exercises to reinforce your understanding.

### Exercise 1: Calculate Neuron Output

Given a neuron with:
- Inputs: $x = [2.0, 3.0, -1.0]$
- Weights: $w = [0.5, -0.3, 0.8]$
- Bias: $b = 0.2$
- Activation function: $f(z) = \max(0, z)$ (ReLU)

Calculate the output of this neuron manually, then verify with NumPy.

**Steps:**
1. Calculate weighted sum: $z = \sum w_i x_i + b$
2. Apply activation: $y = f(z)$

In [None]:
# Exercise 1: Your solution here

# Given values
x = np.array([2.0, 3.0, -1.0])
w = np.array([0.5, -0.3, 0.8])
b = 0.2

# TODO: Calculate weighted sum (z)
z = None  # Replace with your calculation

# TODO: Apply ReLU activation
# ReLU(z) = max(0, z)
y = None  # Replace with your calculation

# Print results
print(f"Weighted sum (z): {z}")
print(f"Output after ReLU (y): {y}")

In [None]:
# Solution to Exercise 1

x = np.array([2.0, 3.0, -1.0])
w = np.array([0.5, -0.3, 0.8])
b = 0.2

# Calculate weighted sum
z = np.dot(x, w) + b
print(f"Step 1 - Weighted sum calculation:")
print(f"  z = (2.0 × 0.5) + (3.0 × -0.3) + (-1.0 × 0.8) + 0.2")
print(f"  z = 1.0 + (-0.9) + (-0.8) + 0.2")
print(f"  z = {z:.4f}")

# Apply ReLU activation
y = np.maximum(0, z)
print(f"\nStep 2 - Apply ReLU activation:")
print(f"  y = max(0, {z:.4f})")
print(f"  y = {y:.4f}")

if z <= 0:
    print("\nSince z is not positive, ReLU outputs 0 (the neuron is not activated)")
else:
    print("\nSince z is positive, ReLU outputs z (the neuron is activated)")

### Exercise 2: Count Network Parameters

For a neural network with the following architecture:
- Input layer: 10 features
- Hidden layer 1: 20 neurons
- Hidden layer 2: 15 neurons
- Output layer: 5 classes

Calculate:
1. Total number of weights
2. Total number of biases
3. Total number of trainable parameters

**Hint**: For each layer connection, parameters = (input_size × output_size) + output_size

In [None]:
# Exercise 2: Your solution here

# Network architecture
layer_sizes = [10, 20, 15, 5]

# TODO: Calculate parameters for each layer
# Layer 1: Input (10) -> Hidden 1 (20)
layer1_weights = None  # Replace with calculation
layer1_biases = None   # Replace with calculation
layer1_total = None    # Replace with calculation

# Layer 2: Hidden 1 (20) -> Hidden 2 (15)
layer2_weights = None
layer2_biases = None
layer2_total = None

# Layer 3: Hidden 2 (15) -> Output (5)
layer3_weights = None
layer3_biases = None
layer3_total = None

# TODO: Calculate totals
total_weights = None
total_biases = None
total_parameters = None

print("Parameter Counts:")
print(f"Total weights: {total_weights}")
print(f"Total biases: {total_biases}")
print(f"Total parameters: {total_parameters}")

In [None]:
# Solution to Exercise 2

layer_sizes = [10, 20, 15, 5]

print("Calculating Parameters for Each Layer:")
print("=" * 60)

# Layer 1: Input (10) -> Hidden 1 (20)
layer1_weights = layer_sizes[0] * layer_sizes[1]
layer1_biases = layer_sizes[1]
layer1_total = layer1_weights + layer1_biases
print(f"Layer 1 (10 → 20):")
print(f"  Weights: 10 × 20 = {layer1_weights}")
print(f"  Biases: {layer1_biases}")
print(f"  Total: {layer1_total}")

# Layer 2: Hidden 1 (20) -> Hidden 2 (15)
layer2_weights = layer_sizes[1] * layer_sizes[2]
layer2_biases = layer_sizes[2]
layer2_total = layer2_weights + layer2_biases
print(f"\nLayer 2 (20 → 15):")
print(f"  Weights: 20 × 15 = {layer2_weights}")
print(f"  Biases: {layer2_biases}")
print(f"  Total: {layer2_total}")

# Layer 3: Hidden 2 (15) -> Output (5)
layer3_weights = layer_sizes[2] * layer_sizes[3]
layer3_biases = layer_sizes[3]
layer3_total = layer3_weights + layer3_biases
print(f"\nLayer 3 (15 → 5):")
print(f"  Weights: 15 × 5 = {layer3_weights}")
print(f"  Biases: {layer3_biases}")
print(f"  Total: {layer3_total}")

# Calculate totals
total_weights = layer1_weights + layer2_weights + layer3_weights
total_biases = layer1_biases + layer2_biases + layer3_biases
total_parameters = total_weights + total_biases

print("\n" + "=" * 60)
print(f"Total Weights: {total_weights:,}")
print(f"Total Biases: {total_biases}")
print(f"Total Trainable Parameters: {total_parameters:,}")

# Verify using the visualization function
print("\nVerification using our visualization function:")
visualize_network_architecture(layer_sizes)

### Exercise 3: Deep Learning Application Research

Research and describe ONE real-world deep learning application that interests you. Include:

1. **Application name and domain**
2. **Problem it solves**
3. **Type of neural network used** (if known)
4. **Impact or results achieved**
5. **Why this application interests you**

Write your answer in the markdown cell below.

**Your Answer to Exercise 3:**

*(Double-click to edit this cell and write your response)*

1. Application name and domain:

2. Problem it solves:

3. Type of neural network used:

4. Impact or results:

5. Why it interests me:


## 8. Summary

Congratulations! You've completed the introduction to neural networks and deep learning. Let's recap what we covered:

### Key Concepts

1. **Biological Inspiration**
   - Neural networks are loosely inspired by neurons in the brain
   - Artificial neurons use weights, biases, and activation functions
   - Mathematical model: $y = f(\sum w_i x_i + b)$

2. **Historical Evolution**
   - 1958: Perceptron (single-layer learning)
   - 1986: Backpropagation (multi-layer training)
   - 2012: AlexNet (deep learning revolution)
   - 2020s: Foundation models (GPT, DALL-E, etc.)

3. **Applications**
   - Computer Vision: Image recognition, object detection, generation
   - NLP: Translation, chatbots, text generation
   - Healthcare: Disease detection, drug discovery
   - Many other domains transforming industries

4. **Mathematical Foundations**
   - Vectors and matrices represent data and parameters
   - Dot products compute weighted sums
   - Gradients enable learning through optimization
   - Chain rule powers backpropagation

5. **Network Architecture**
   - Input layer → Hidden layers → Output layer
   - Parameters = weights + biases (learnable)
   - Depth (layers) and width (neurons) largely determine a network's capacity

### What's Next?

In the upcoming notebooks, we'll dive deeper into:

- **Module 01**: Perceptrons and activation functions in detail
- **Module 02**: Backpropagation and gradient descent algorithms
- **Module 03**: Building neural networks from scratch with NumPy
- **Module 04**: Introduction to TensorFlow and Keras
- **Module 05+**: Advanced architectures (CNNs, RNNs, Transformers)

### Additional Resources

**Books:**
- "Deep Learning" by Goodfellow, Bengio, and Courville (free online)
- "Neural Networks and Deep Learning" by Michael Nielsen (free online)
- "Hands-On Machine Learning" by Aurélien Géron

**Online Courses:**
- Andrew Ng's Deep Learning Specialization (Coursera)
- fast.ai Practical Deep Learning for Coders
- MIT 6.S191: Introduction to Deep Learning

**Interactive Resources:**
- TensorFlow Playground (visual neural network experimentation)
- Distill.pub (beautiful visual explanations)
- Papers with Code (latest research with implementations)

---

**Ready to continue?** Proceed to **Module 01: Perceptrons and Activation Functions** to start building your first neural network components!