# Module 1 - Exercise 3: First Step with MLP

## Learning Objectives
- Understand the structure of nn.Linear layers (input and output dimensions)
- Learn how to use basic activation functions (ReLU, Sigmoid, Tanh)
- Build simple neural networks using nn.Sequential
- Calculate the number of parameters in a neural network
- Perform forward pass operations through the network

## Test Framework Setup

In [1]:
# Clone the test repository
!git clone https://github.com/racousin/data_science_practice.git /tmp/tests 2>/dev/null || true

# Import required modules
import sys
sys.path.append('/tmp/tests/tests/python_deep_learning')

# Import the improved test utilities
from test_utils import NotebookTestRunner, create_inline_test
from module1.test_exercise3 import Exercise3Validator, EXERCISE3_SECTIONS

# Create test runner and validator
test_runner = NotebookTestRunner("module1", 3)
validator = Exercise3Validator()

## Environment Setup

In [19]:
import torch
import torch.nn as nn
import numpy as np

# Set random seed for reproducibility
torch.manual_seed(42)
np.random.seed(42)

# Check if CUDA is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

Using device: cpu


## Section 1: Understanding nn.Linear

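The `nn.Linear` layer is the fundamental building block of MLPs. It performs a linear (affine) transformation, `y = xW^T + b`,
where `x` is the input of shape `(batch_size, in_features)`, `W` is the weight matrix of shape `(out_features, in_features)`, and `b` is the bias vector of shape `(out_features,)`.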
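
As a quick check of the formula above, the sketch below reproduces the output of an `nn.Linear` layer by hand. The layer sizes (4 inputs, 3 outputs) and the variable names `layer`, `manual` and `builtin` are illustrative choices, not part of the exercise.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(4, 3)   # illustrative sizes: 4 input features -> 3 output features
x = torch.randn(2, 4)     # batch of 2 samples

# nn.Linear stores W with shape (out_features, in_features), hence the transpose
manual = x @ layer.weight.T + layer.bias
builtin = layer(x)

print(manual.shape)                                  # torch.Size([2, 3])
print(torch.allclose(manual, builtin, atol=1e-6))    # True
```

The transpose is the part that usually surprises people: the weight shape printed by PyTorch is `(out_features, in_features)`, not the other way around.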

In [21]:
# TODO: Create a linear layer that transforms input from 10 features to 5 features
linear_layer_1 = nn.Linear(10, 5)

# Display layer information
if linear_layer_1 is not None:
    print(f"Linear layer: {linear_layer_1}")
    print(f"Weight shape: {linear_layer_1.weight.shape}")
    print(f"Bias shape: {linear_layer_1.bias.shape}")

Linear layer: Linear(in_features=10, out_features=5, bias=True)
Weight shape: torch.Size([5, 10])
Bias shape: torch.Size([5])


In [26]:
# TODO: Create a linear layer that transforms 5 features to 3 features
linear_layer_2 = nn.Linear(5, 3)

# TODO: Calculate the total number of parameters in linear_layer_2
# Remember: parameters = (input_size * output_size) + bias_size
# numel() returns a plain Python int, so no tensor round-trip is needed
num_params_layer2 = linear_layer_2.weight.numel() + linear_layer_2.bias.numel()

if linear_layer_2 is not None and num_params_layer2 is not None:
    print(f"Linear layer 2: {linear_layer_2}")
    print(f"Calculated parameters: {num_params_layer2}")
    actual_params = sum(p.numel() for p in linear_layer_2.parameters())
    print(f"Actual parameters: {actual_params}")

Linear layer 2: Linear(in_features=5, out_features=3, bias=True)
Calculated parameters: 18
Actual parameters: 18


In [27]:
# Test Section 1: Understanding nn.Linear
section_tests = [(getattr(validator, name), desc) for name, desc in EXERCISE3_SECTIONS["Section 1: Understanding nn.Linear"]]
test_runner.test_section("Section 1: Understanding nn.Linear", validator, section_tests, locals())


Testing: Section 1: Understanding nn.Linear
❌ Linear layer with 10 inputs and 5 outputs: Unexpected error: Exercise3Validator.test_linear_layer_1() missing 1 required positional argument: 'variables'
❌ Linear layer with 5 inputs and 3 outputs: Unexpected error: Exercise3Validator.test_linear_layer_2() missing 1 required positional argument: 'variables'
❌ Correct parameter count for linear_layer_2: Unexpected error: Exercise3Validator.test_num_params_layer2() missing 1 required positional argument: 'variables'

❌ Section 1: Understanding nn.Linear - Some tests failed. Review the errors above.


False

## Section 2: Activation Functions

Activation functions introduce non-linearity into neural networks, allowing them to learn complex patterns.
- **ReLU**: f(x) = max(0, x) - Most commonly used
- **Sigmoid**: f(x) = 1/(1+e^(-x)) - Outputs between 0 and 1
- **Tanh**: f(x) = (e^x - e^(-x))/(e^x + e^(-x)) - Outputs between -1 and 1
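
The three formulas above can be checked directly against PyTorch's modules. This is a minimal sketch; the `*_manual` names are illustrative, and the hand-written versions are for understanding only (the built-in modules are the numerically safer choice for large inputs).

```python
import torch
import torch.nn as nn

x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])

# Hand-written versions of the formulas listed above
relu_manual = torch.clamp(x, min=0)                      # max(0, x)
sigmoid_manual = 1 / (1 + torch.exp(-x))                 # 1 / (1 + e^(-x))
tanh_manual = (torch.exp(x) - torch.exp(-x)) / (torch.exp(x) + torch.exp(-x))

print(torch.allclose(relu_manual, nn.ReLU()(x)))                   # True
print(torch.allclose(sigmoid_manual, nn.Sigmoid()(x), atol=1e-6))  # True
print(torch.allclose(tanh_manual, nn.Tanh()(x), atol=1e-6))        # True
```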

In [28]:
# TODO: Create instances of the three main activation functions
relu_activation = nn.ReLU()
sigmoid_activation = nn.Sigmoid()
tanh_activation = nn.Tanh()

# Test the activations with sample input
test_input = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])

if relu_activation is not None:
    print(f"Input: {test_input}")
    print(f"ReLU output: {relu_activation(test_input)}")
if sigmoid_activation is not None:
    print(f"Sigmoid output: {sigmoid_activation(test_input)}")
if tanh_activation is not None:
    print(f"Tanh output: {tanh_activation(test_input)}")

Input: tensor([-2., -1.,  0.,  1.,  2.])
ReLU output: tensor([0., 0., 0., 1., 2.])
Sigmoid output: tensor([0.1192, 0.2689, 0.5000, 0.7311, 0.8808])
Tanh output: tensor([-0.9640, -0.7616,  0.0000,  0.7616,  0.9640])


In [31]:
# TODO: Apply ReLU activation to the output of linear_layer_1
# First create some input data
input_data = torch.randn(2, 10)  # Batch size 2, 10 features

# TODO: Pass input_data through linear_layer_1
linear_output = linear_layer_1(input_data)

# TODO: Apply ReLU activation to linear_output
activated_output = relu_activation(linear_output)

if linear_output is not None and activated_output is not None:
    print(f"Input shape: {input_data.shape}")
    print(f"Linear output shape: {linear_output.shape}")
    print(f"Activated output shape: {activated_output.shape}")
    print(f"Number of negative values before ReLU: {(linear_output < 0).sum().item()}")
    print(f"Number of negative values after ReLU: {(activated_output < 0).sum().item()}")

Input shape: torch.Size([2, 10])
Linear output shape: torch.Size([2, 5])
Activated output shape: torch.Size([2, 5])
Number of negative values before ReLU: 2
Number of negative values after ReLU: 0


In [32]:
# Test Section 2: Activation Functions
section_tests = [(getattr(validator, name), desc) for name, desc in EXERCISE3_SECTIONS["Section 2: Activation Functions"]]
test_runner.test_section("Section 2: Activation Functions", validator, section_tests, locals())


Testing: Section 2: Activation Functions
❌ ReLU activation function: Unexpected error: Exercise3Validator.test_relu_activation() missing 1 required positional argument: 'variables'
❌ Sigmoid activation function: Unexpected error: Exercise3Validator.test_sigmoid_activation() missing 1 required positional argument: 'variables'
❌ Tanh activation function: Unexpected error: Exercise3Validator.test_tanh_activation() missing 1 required positional argument: 'variables'
❌ Linear layer output computation: Unexpected error: Exercise3Validator.test_linear_output() missing 1 required positional argument: 'variables'
❌ ReLU activation applied correctly: Unexpected error: Exercise3Validator.test_activated_output() missing 1 required positional argument: 'variables'

❌ Section 2: Activation Functions - Some tests failed. Review the errors above.


False

## Section 3: Building Networks with nn.Sequential

`nn.Sequential` allows us to stack layers and create a neural network pipeline. The output of each layer becomes the input to the next.
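
To make the "output becomes input" idea concrete, the sketch below applies the layers of a small `nn.Sequential` one by one and compares the result with calling the container directly. The sizes (3 -> 5 -> 1) and the names `model` and `h` are illustrative and deliberately differ from the exercise.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(
    nn.Linear(3, 5),
    nn.Tanh(),
    nn.Linear(5, 1),
)

x = torch.randn(4, 3)

# nn.Sequential is iterable: feed each layer's output into the next one
h = x
for layer in model:
    h = layer(h)

print(torch.allclose(h, model(x)))  # True
print(len(model))                   # 3 (activations count as layers too)
print(model[0])                     # layers can be accessed by index
```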

In [34]:
# TODO: Create a simple 2-layer MLP using nn.Sequential
# Input: 8 features -> Hidden: 4 neurons with ReLU -> Output: 2 neurons
simple_mlp = nn.Sequential(
    nn.Linear(8, 4),   # input (8 features) -> hidden (4 neurons)
    nn.ReLU(),
    nn.Linear(4, 2),   # hidden (4 neurons) -> output (2 neurons)
)

if simple_mlp is not None:
    print("Simple MLP architecture:")
    print(simple_mlp)

    # Count parameters
    total_params = sum(p.numel() for p in simple_mlp.parameters())
    print(f"\nTotal parameters: {total_params}")

Simple MLP architecture:
Sequential(
  (0): Linear(in_features=8, out_features=4, bias=True)
  (1): ReLU()
  (2): Linear(in_features=4, out_features=2, bias=True)
)

Total parameters: 46


In [10]:
# TODO: Create a deeper MLP with 3 hidden layers
# Input: 10 -> Hidden1: 8 (ReLU) -> Hidden2: 6 (ReLU) -> Hidden3: 4 (ReLU) -> Output: 2
deep_mlp = None
# TODO: Calculate the total number of parameters in deep_mlp
# Parameters per layer: (input_size * output_size) + output_size
# Layer 1: (10 * 8) + 8 = 88
# Layer 2: (8 * 6) + 6 = 54
# Layer 3: (6 * 4) + 4 = 28
# Layer 4: (4 * 2) + 2 = 10
deep_mlp_params = None  # Calculate the sum

if deep_mlp is not None and deep_mlp_params is not None:
    print("Deep MLP architecture:")
    print(deep_mlp)
    print(f"\nCalculated parameters: {deep_mlp_params}")
    actual_params = sum(p.numel() for p in deep_mlp.parameters())
    print(f"Actual parameters: {actual_params}")

In [11]:
# Test Section 3: Building Networks with nn.Sequential
section_tests = [(getattr(validator, name), desc) for name, desc in EXERCISE3_SECTIONS["Section 3: Building Networks with nn.Sequential"]]
test_runner.test_section("Section 3: Building Networks with nn.Sequential", validator, section_tests, locals())


Testing: Section 3: Building Networks with nn.Sequential
❌ Simple 2-layer MLP structure: Unexpected error: Exercise3Validator.test_simple_mlp() missing 1 required positional argument: 'variables'
❌ Deep 4-layer MLP structure: Unexpected error: Exercise3Validator.test_deep_mlp() missing 1 required positional argument: 'variables'
❌ Correct parameter count for deep_mlp: Unexpected error: Exercise3Validator.test_deep_mlp_params() missing 1 required positional argument: 'variables'

❌ Section 3: Building Networks with nn.Sequential - Some tests failed. Review the errors above.


False

## Section 4: Forward Pass

The forward pass is the process of passing input data through the network to get predictions. Each layer transforms the data sequentially.
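
The sketch below traces tensor shapes through a forward pass and shows what happens when the input has the wrong number of features. The network sizes (12 -> 7 -> 3) and the names `net`, `batch` and `shapes` are illustrative assumptions, not part of the exercise.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

net = nn.Sequential(
    nn.Linear(12, 7),
    nn.ReLU(),
    nn.Linear(7, 3),
)

batch = torch.randn(5, 12)   # batch size 5, 12 features

# Record the shape after every layer; no_grad() skips gradient tracking,
# which is enough when we only want predictions
shapes = [tuple(batch.shape)]
with torch.no_grad():
    h = batch
    for layer in net:
        h = layer(h)
        shapes.append(tuple(h.shape))

print(shapes)  # [(5, 12), (5, 7), (5, 7), (5, 3)]

# The last dimension of the input must match in_features of the first layer
try:
    net(torch.randn(5, 8))
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True
print(mismatch_raised)  # True
```

The batch dimension passes through untouched; only the feature dimension changes, and only at the `nn.Linear` layers.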

In [12]:
# TODO: Perform a forward pass through simple_mlp
# Create input data with batch size 3 and 8 features
forward_input = torch.randn(3, 8)

# TODO: Pass the input through simple_mlp
simple_output = None

if simple_output is not None:
    print(f"Input shape: {forward_input.shape}")
    print(f"Output shape: {simple_output.shape}")
    print(f"Output values:\n{simple_output}")

In [13]:
# TODO: Create a network with mixed activation functions
# Input: 6 -> Hidden1: 4 (ReLU) -> Hidden2: 3 (Tanh) -> Output: 1 (Sigmoid)
mixed_activation_mlp = None

# TODO: Perform forward pass with batch size 5
mixed_input = torch.randn(5, 6)
mixed_output = None  # Pass mixed_input through mixed_activation_mlp

if mixed_activation_mlp is not None and mixed_output is not None:
    print("Mixed activation MLP:")
    print(mixed_activation_mlp)
    print(f"\nInput shape: {mixed_input.shape}")
    print(f"Output shape: {mixed_output.shape}")
    print(f"Output range: [{mixed_output.min().item():.4f}, {mixed_output.max().item():.4f}]")
    print("(Note: Sigmoid ensures output is between 0 and 1)")

In [14]:
# Test Section 4: Forward Pass
section_tests = [(getattr(validator, name), desc) for name, desc in EXERCISE3_SECTIONS["Section 4: Forward Pass"]]
test_runner.test_section("Section 4: Forward Pass", validator, section_tests, locals())


Testing: Section 4: Forward Pass
❌ Forward pass through simple_mlp: Unexpected error: Exercise3Validator.test_simple_output() missing 1 required positional argument: 'variables'
❌ MLP with mixed activations: Unexpected error: Exercise3Validator.test_mixed_activation_mlp() missing 1 required positional argument: 'variables'
❌ Forward pass through mixed_activation_mlp: Unexpected error: Exercise3Validator.test_mixed_output() missing 1 required positional argument: 'variables'

❌ Section 4: Forward Pass - Some tests failed. Review the errors above.


False

## Section 5: Understanding Parameter Counting

Knowing how many parameters a network has matters for two reasons: the count is a rough measure of model capacity, and it determines how much memory the weights occupy.
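
As an illustration, the sketch below lists the parameters of a small network layer by layer and estimates the memory taken by its weights. The sizes (20 -> 16 -> 4) and the names `model`, `total` and `n_bytes` are illustrative, and the byte count assumes PyTorch's default `float32` dtype (4 bytes per value).

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 16),
    nn.ReLU(),            # activations have no parameters
    nn.Linear(16, 4),
)

# Per-tensor breakdown: names are "<layer index>.weight" / "<layer index>.bias"
for name, p in model.named_parameters():
    print(f"{name:10s} shape={tuple(p.shape)} count={p.numel()}")

total = sum(p.numel() for p in model.parameters())
n_bytes = sum(p.numel() * p.element_size() for p in model.parameters())

print(total)    # 404  = (20*16 + 16) + (16*4 + 4)
print(n_bytes)  # 1616 = 404 * 4 bytes (float32)
```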

In [15]:
# TODO: Create a function to count parameters in any model
def count_parameters(model):
    """
    Count the total number of trainable parameters in a model.

    Args:
        model: A PyTorch nn.Module

    Returns:
        Total number of parameters
    """
    # TODO: Complete this function
    return None

# Test your function
if count_parameters is not None and simple_mlp is not None:
    param_count = count_parameters(simple_mlp)
    if param_count is not None:
        print(f"Simple MLP parameters: {param_count}")

In [16]:
# TODO: Create a large MLP and calculate its parameters
# Input: 100 -> Hidden1: 64 -> Hidden2: 32 -> Hidden3: 16 -> Output: 10
# Use ReLU activation between layers (except after output)
large_mlp = None

# TODO: Calculate expected number of parameters manually
# Layer 1: (100 * 64) + 64 = ?
# Layer 2: (64 * 32) + 32 = ?
# Layer 3: (32 * 16) + 16 = ?
# Layer 4: (16 * 10) + 10 = ?
expected_params = None  # Sum all layer parameters

if large_mlp is not None and expected_params is not None and count_parameters is not None:
    actual_params = count_parameters(large_mlp)
    if actual_params is not None:
        print(f"Expected parameters: {expected_params}")
        print(f"Actual parameters: {actual_params}")
        print(f"Match: {expected_params == actual_params}")

In [17]:
# Test Section 5: Understanding Parameter Counting
section_tests = [(getattr(validator, name), desc) for name, desc in EXERCISE3_SECTIONS["Section 5: Understanding Parameter Counting"]]
test_runner.test_section("Section 5: Understanding Parameter Counting", validator, section_tests, locals())


Testing: Section 5: Understanding Parameter Counting
❌ Parameter counting function: Unexpected error: Exercise3Validator.test_count_parameters_function() missing 1 required positional argument: 'variables'
❌ Large MLP structure: Unexpected error: Exercise3Validator.test_large_mlp() missing 1 required positional argument: 'variables'
❌ Correct manual parameter calculation: Unexpected error: Exercise3Validator.test_expected_params() missing 1 required positional argument: 'variables'

❌ Section 5: Understanding Parameter Counting - Some tests failed. Review the errors above.


False

## Final Summary

In [18]:
# Display final summary of all tests
test_runner.final_summary()


FINAL TEST SUMMARY
❌ Section 1: Understanding nn.Linear: 0/3 tests passed
❌ Section 2: Activation Functions: 0/5 tests passed
❌ Section 3: Building Networks with nn.Sequential: 0/3 tests passed
❌ Section 4: Forward Pass: 0/3 tests passed
❌ Section 5: Understanding Parameter Counting: 0/3 tests passed
❌ Some tests are still failing. Please review and complete the TODOs.


False