# Day 16: Building Neural Networks with PyTorch - From Theory to Practice

**Welcome to Day 16 of your ML journey!** Today we transition from understanding neural network theory to building them in practice using **PyTorch**, one of the most popular and powerful deep learning frameworks. You'll learn to create, train, and evaluate neural networks for real-world problems.

---

**Goal:** Master PyTorch fundamentals and build your first neural network from scratch, understanding the complete workflow from data preparation to model deployment.

**Topics Covered:**
- PyTorch fundamentals: tensors, autograd, and neural network modules
- Building neural networks with nn.Module and nn.Sequential
- Training loops: forward pass, loss calculation, backpropagation
- Data handling with DataLoader and Dataset classes
- Model evaluation and visualization techniques
- Best practices for neural network development
- Real-world classification and regression examples


---

## 1. Concept Overview

### What is PyTorch?

**PyTorch** is an open-source machine learning library originally developed by Facebook's AI Research lab (FAIR, now part of Meta AI). It's designed to be intuitive, flexible, and efficient for both research and production use. PyTorch has become the framework of choice for many researchers and practitioners due to its dynamic computation graph and Pythonic interface.

**Key Advantages of PyTorch:**
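1. **Dynamic Computation Graphs**: The graph is rebuilt on every forward pass, so ordinary Python control flow (loops, conditionals) can change what the network computes at runtime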
2. **Pythonic Design**: Feels natural to Python developers
3. **Strong Community**: Extensive ecosystem and pre-trained models
4. **Production Ready**: Deployment tooling such as TorchScript and, in PyTorch 2.x, `torch.compile` and `torch.export`
5. **Research Friendly**: Easy experimentation and rapid prototyping
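
To make the "dynamic computation graph" point concrete, here is a minimal sketch. The function `repeated_scale` and its stopping threshold are hypothetical, chosen only for illustration. The number of loop iterations depends on the runtime value of the input, and autograd still differentiates through whatever path was actually taken.

```python
import torch

def repeated_scale(x, weight):
    # Hypothetical helper: keep scaling x until its norm reaches 10.
    # How many times the loop runs depends on the data, so the graph
    # that autograd records can differ from one call to the next.
    steps = 0
    while x.norm() < 10:
        x = x * weight
        steps += 1
    return x.sum(), steps

weight = torch.tensor(2.0, requires_grad=True)
x = torch.ones(3)

out, steps = repeated_scale(x, weight)
out.backward()  # differentiates through the three multiplications that ran

print(f"Loop ran {steps} times")
print(f"out = {out.item()}, d(out)/d(weight) = {weight.grad.item()}")
```

With these inputs the loop runs three times, so `out` equals `3 * weight**3` and the gradient equals `9 * weight**2`. A different starting `x` would produce a different number of steps and a different graph, with no change to the code.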

### Core PyTorch Components

| Component | Purpose | Key Features |
|-----------|---------|--------------|
| **Tensors** | Multi-dimensional arrays | GPU acceleration, automatic differentiation |
| **Autograd** | Automatic differentiation | Gradient computation, backpropagation |
| **nn.Module** | Neural network building blocks | Reusable layers, parameter management |
| **Optimizers** | Training algorithms | SGD, Adam, RMSprop, and more |
| **Loss Functions** | Training objectives | Cross-entropy, MSE, custom losses |

### Neural Network Architecture in PyTorch

**Building Blocks:**
- **Input Layer**: Receives your data
- **Hidden Layers**: Learn complex patterns (fully connected, convolutional, etc.)
- **Activation Functions**: Introduce non-linearity (ReLU, sigmoid, tanh)
- **Output Layer**: Produces predictions
- **Loss Function**: Measures prediction quality
- **Optimizer**: Updates model parameters
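
The building blocks above fit together roughly as follows. This is a minimal sketch on made-up toy data (the tensor sizes, layer widths, and learning rate are assumptions for illustration, not values used elsewhere in this lesson). Each line of the loop is annotated with the block it corresponds to.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Toy data (hypothetical): 8 samples, 4 features, binary labels
inputs = torch.randn(8, 4)
targets = torch.randint(0, 2, (8, 1)).float()

net = nn.Sequential(
    nn.Linear(4, 16),   # input layer -> hidden layer
    nn.ReLU(),          # activation function
    nn.Linear(16, 1),   # hidden layer -> output layer
    nn.Sigmoid(),       # squashes the output to a probability
)
criterion = nn.BCELoss()                          # loss function
optimizer = optim.SGD(net.parameters(), lr=0.1)   # optimizer

losses = []
for step in range(20):
    optimizer.zero_grad()                   # clear gradients from the last step
    predictions = net(inputs)               # forward pass
    loss = criterion(predictions, targets)  # measure prediction quality
    loss.backward()                         # backpropagation via autograd
    optimizer.step()                        # update model parameters
    losses.append(loss.item())

print(f"First loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

The order inside the loop is the part worth remembering: zero the gradients, run the forward pass, compute the loss, backpropagate, then step the optimizer.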

**Real-World Applications:**
- **Image Classification**: Recognizing objects in photos
- **Natural Language Processing**: Sentiment analysis, text generation
- **Time Series Forecasting**: Stock prices, weather prediction
- **Recommendation Systems**: Personalized content suggestions


---

## 2. Code Demo: PyTorch Fundamentals

Let's start by exploring PyTorch's core components and building our first neural network.


### 2.1 Environment Setup and Imports


In [5]:
# Import essential libraries
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_classification, make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report, mean_squared_error

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

# Configure matplotlib
plt.style.use('default')
sns.set_palette("husl")

# Check PyTorch version and device availability
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")
    device = torch.device('cuda')
else:
    device = torch.device('cpu')
    print("Using CPU")

print(f"Device: {device}")


PyTorch version: 2.9.0+cpu
CUDA available: False
Using CPU
Device: cpu
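
Selecting a `device` only matters once tensors and models are actually moved to it. The following sketch shows one common pattern; the layer and batch sizes are arbitrary placeholders. It runs on CPU-only machines as well, since it falls back to `'cpu'` when CUDA is unavailable.

```python
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# nn.Module.to() moves the module's parameters and returns the module itself
layer = nn.Linear(4, 2).to(device)

# Tensor.to() returns a tensor on the target device, so keep the result
batch = torch.randn(8, 4).to(device)

# Model and data must be on the same device for this call to work
output = layer(batch)
print(f"Output shape: {tuple(output.shape)}, device: {output.device}")
```

A frequent source of runtime errors is moving the model but not the input batch (or the reverse), so it helps to move both in the same place in your code.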


### 2.2 Understanding PyTorch Tensors

Tensors are PyTorch's fundamental data structure: multi-dimensional arrays, similar to NumPy arrays, that can also live on a GPU and track the operations applied to them for automatic differentiation.


In [7]:
# Create tensors from different sources
print("=== Creating Tensors ===")

# From Python list
tensor_from_list = torch.tensor([1, 2, 3, 4, 5])
print(f"From list: {tensor_from_list}")

# From NumPy array
numpy_array = np.array([[1, 2], [3, 4]])
tensor_from_numpy = torch.from_numpy(numpy_array)
print(f"From NumPy:\n{tensor_from_numpy}")

# Random tensors
random_tensor = torch.randn(3, 4)  # Normal distribution
uniform_tensor = torch.rand(2, 3)  # Uniform distribution on [0, 1)
print(f"Random normal (3x4):\n{random_tensor}")
print(f"Random uniform (2x3):\n{uniform_tensor}")

# Tensor operations
print("\n=== Tensor Operations ===")
a = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
b = torch.tensor([[5, 6], [7, 8]], dtype=torch.float32)

print(f"Matrix A:\n{a}")
print(f"Matrix B:\n{b}")
print(f"A + B:\n{a + b}")
print(f"A * B (element-wise):\n{a * b}")
print(f"A @ B (matrix multiplication):\n{a @ b}")
print(f"A.shape: {a.shape}")
print(f"A.dtype: {a.dtype}")


=== Creating Tensors ===
From list: tensor([1, 2, 3, 4, 5])
From NumPy:
tensor([[1, 2],
        [3, 4]], dtype=torch.int32)
Random normal (3x4):
tensor([[-0.3267, -0.2788, -0.4220, -1.3323],
        [-0.3639,  0.1513, -0.3514, -0.7906],
        [-0.0915,  0.2352,  2.2440,  0.5817]])
Random uniform (2x3):
tensor([[0.6440, 0.7071, 0.6581],
        [0.4913, 0.8913, 0.1447]])

=== Tensor Operations ===
Matrix A:
tensor([[1., 2.],
        [3., 4.]])
Matrix B:
tensor([[5., 6.],
        [7., 8.]])
A + B:
tensor([[ 6.,  8.],
        [10., 12.]])
A * B (element-wise):
tensor([[ 5., 12.],
        [21., 32.]])
A @ B (matrix multiplication):
tensor([[19., 22.],
        [43., 50.]])
A.shape: torch.Size([2, 2])
A.dtype: torch.float32
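
One detail of the NumPy conversion above is easy to miss: `torch.from_numpy` shares memory with the source array and keeps its dtype, while `torch.tensor` makes a copy. The `dtype=torch.int32` in the output above suggests that NumPy's default integer was 32-bit on the machine that ran the cell; on other platforms the same code may yield `int64`. The sketch below fixes the dtype explicitly to avoid that ambiguity.

```python
import numpy as np
import torch

array = np.array([[1, 2], [3, 4]], dtype=np.int64)

shared = torch.from_numpy(array)  # shares memory with `array`
copied = torch.tensor(array)      # independent copy of the data

# Modify the NumPy array in place after both tensors were created
array[0, 0] = 100

print(f"shared[0, 0] = {shared[0, 0].item()}")  # reflects the change
print(f"copied[0, 0] = {copied[0, 0].item()}")  # still the original value

# Layers such as nn.Linear expect floating-point input, so convert explicitly
as_float = shared.to(torch.float32)
print(f"dtype after conversion: {as_float.dtype}")
```

If you do not want later changes to the array to leak into the tensor, prefer `torch.tensor(array)` or call `.clone()` on the result of `torch.from_numpy`.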


### 2.3 Automatic Differentiation with Autograd

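PyTorch's autograd system records the operations applied to tensors created with `requires_grad=True`. Calling `.backward()` on a scalar result then computes the gradient with respect to each of those tensors and stores it in their `.grad` attribute, which is what makes training neural networks possible.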


In [8]:
# Understanding autograd
print("=== Automatic Differentiation ===")

# Create tensors with requires_grad=True for gradient computation
x = torch.tensor(2.0, requires_grad=True)
w = torch.tensor(3.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)

print(f"x = {x}, w = {w}, b = {b}")

# Define a simple function: y = w*x + b
y = w * x + b
print(f"y = w*x + b = {y}")

# Compute gradients
y.backward()

print("\nGradients:")
print(f"dy/dx = {x.grad}")  # Should be 3.0 (w)
print(f"dy/dw = {w.grad}")  # Should be 2.0 (x)
print(f"dy/db = {b.grad}")  # Should be 1.0

# More complex example: y = x^2 + 2x + 1
print("\n=== Complex Function Example ===")
x2 = torch.tensor(3.0, requires_grad=True)
y2 = x2**2 + 2*x2 + 1
print(f"y = x² + 2x + 1 = {y2}")

y2.backward()
print(f"dy/dx = {x2.grad}")  # Should be 2*3 + 2 = 8

=== Automatic Differentiation ===
x = 2.0, w = 3.0, b = 1.0
y = w*x + b = 7.0

Gradients:
dy/dx = 3.0
dy/dw = 2.0
dy/db = 1.0

=== Complex Function Example ===
y = x² + 2x + 1 = 16.0
dy/dx = 8.0
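
The examples above call `.backward()` once per tensor. When it is called more than once, PyTorch adds each new gradient to the existing `.grad` value instead of replacing it. This is why training loops reset gradients on every iteration. The sketch below demonstrates this, along with `torch.no_grad()`, which turns tracking off entirely.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)

# First backward pass: d(x^2)/dx = 2x = 6
(x ** 2).backward()
first = x.grad.item()

# Second backward pass without resetting: the new 6 is added to the old 6
(x ** 2).backward()
accumulated = x.grad.item()

# Reset the gradient in place, then recompute
x.grad.zero_()
(x ** 2).backward()
after_reset = x.grad.item()

print(f"first = {first}, accumulated = {accumulated}, after reset = {after_reset}")

# Inside torch.no_grad(), results are not tracked by autograd
with torch.no_grad():
    z = x ** 2
print(f"z.requires_grad inside no_grad: {z.requires_grad}")
```

In a real training loop the reset is usually done through the optimizer (`optimizer.zero_grad()`) rather than on individual tensors.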


### 2.4 Building Your First Neural Network

Now let's build a small fully connected (feed-forward) network for binary classification. We start by generating a synthetic dataset, then define the model in two equivalent ways.


In [10]:
# Generate synthetic dataset for binary classification
print("=== Dataset Preparation ===")

X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=10,
    n_redundant=5,
    n_clusters_per_class=1,
    random_state=42
)

# Convert to PyTorch tensors (float32 is the default dtype of nn.Linear weights)
X_tensor = torch.tensor(X, dtype=torch.float32)
y_tensor = torch.tensor(y, dtype=torch.float32).unsqueeze(1)  # Shape (N,) -> (N, 1) to match the model's output

print(f"Dataset shape: {X_tensor.shape}")
print(f"Target shape: {y_tensor.shape}")
print(f"Class distribution: {np.bincount(y)}")

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X_tensor, y_tensor, test_size=0.2, random_state=42, stratify=y
)

print(f"\nTraining set: {X_train.shape[0]} samples")
print(f"Test set: {X_test.shape[0]} samples")


=== Dataset Preparation ===
Dataset shape: torch.Size([1000, 20])
Target shape: torch.Size([1000, 1])
Class distribution: [497 503]

Training set: 800 samples
Test set: 200 samples
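
Training usually consumes the data in mini-batches instead of all 800 samples at once. The `TensorDataset` and `DataLoader` classes imported in the setup cell handle that. The sketch below uses random stand-in tensors with the same shapes as `X_train` and `y_train`; the batch size of 32 is an assumption, not a value prescribed by this lesson.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(42)

# Stand-in tensors shaped like X_train (800, 20) and y_train (800, 1)
features = torch.randn(800, 20)
labels = torch.randint(0, 2, (800, 1)).float()

# TensorDataset pairs each feature row with its label
dataset = TensorDataset(features, labels)

# DataLoader shuffles the samples and yields them in batches
loader = DataLoader(dataset, batch_size=32, shuffle=True)

batch_features, batch_labels = next(iter(loader))
num_batches = len(loader)

print(f"Batch features: {tuple(batch_features.shape)}")
print(f"Batch labels:   {tuple(batch_labels.shape)}")
print(f"Batches per epoch: {num_batches}")
```

Shuffling is typically enabled for the training loader and disabled for the test loader, where sample order does not affect the result.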


In [11]:
# Method 1: Building Neural Network using nn.Module (Recommended)
class SimpleNeuralNetwork(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        
        # Define layers
        self.fc1 = nn.Linear(input_size, hidden_size)  # Input to hidden
        self.fc2 = nn.Linear(hidden_size, hidden_size)  # Hidden to hidden
        self.fc3 = nn.Linear(hidden_size, output_size)  # Hidden to output
        
        # Dropout for regularization
        self.dropout = nn.Dropout(0.2)
        
    def forward(self, x):
        # Forward pass through the network
        x = F.relu(self.fc1(x))  # First hidden layer with ReLU
        x = self.dropout(x)      # Apply dropout
        x = F.relu(self.fc2(x))  # Second hidden layer with ReLU
        x = self.dropout(x)      # Apply dropout
        x = torch.sigmoid(self.fc3(x))  # Output layer with sigmoid
        return x

# Create model instance
model = SimpleNeuralNetwork(
    input_size=20,    # Number of features
    hidden_size=64,   # Hidden layer size
    output_size=1     # Binary classification
)

print("=== Neural Network Architecture ===")
print(model)

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"\nTotal parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")


=== Neural Network Architecture ===
SimpleNeuralNetwork(
  (fc1): Linear(in_features=20, out_features=64, bias=True)
  (fc2): Linear(in_features=64, out_features=64, bias=True)
  (fc3): Linear(in_features=64, out_features=1, bias=True)
  (dropout): Dropout(p=0.2, inplace=False)
)

Total parameters: 5,569
Trainable parameters: 5,569
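
The total of 5,569 can be checked by hand. Each `nn.Linear(in_features, out_features)` layer holds a weight matrix with `in_features * out_features` entries plus one bias per output, and dropout has no parameters. The helper `linear_param_count` below is a hypothetical name introduced only for this check.

```python
import torch.nn as nn

def linear_param_count(in_features, out_features):
    # weights (in_features x out_features) plus one bias per output unit
    return in_features * out_features + out_features

# Layer sizes of SimpleNeuralNetwork: fc1, fc2, fc3
layer_sizes = [(20, 64), (64, 64), (64, 1)]

per_layer = [linear_param_count(i, o) for i, o in layer_sizes]
expected_total = sum(per_layer)

# Cross-check the formula against what PyTorch actually allocates
layers = [nn.Linear(i, o) for i, o in layer_sizes]
actual_total = sum(p.numel() for layer in layers for p in layer.parameters())

for (i, o), count in zip(layer_sizes, per_layer):
    print(f"Linear({i}, {o}): {count:,} parameters")
print(f"Expected total: {expected_total:,}, actual total: {actual_total:,}")
```

The breakdown is 1,344 + 4,160 + 65 = 5,569, which shows that the hidden-to-hidden layer accounts for most of the model's parameters.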


In [12]:
# Method 2: Building Neural Network using nn.Sequential (Alternative)
sequential_model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(64, 1),
    nn.Sigmoid()
)

print("=== Sequential Model ===")
print(sequential_model)

# Both models have the same architecture
print(f"\nSequential model parameters: {sum(p.numel() for p in sequential_model.parameters()):,}")


=== Sequential Model ===
Sequential(
  (0): Linear(in_features=20, out_features=64, bias=True)
  (1): ReLU()
  (2): Dropout(p=0.2, inplace=False)
  (3): Linear(in_features=64, out_features=64, bias=True)
  (4): ReLU()
  (5): Dropout(p=0.2, inplace=False)
  (6): Linear(in_features=64, out_features=1, bias=True)
  (7): Sigmoid()
)

Sequential model parameters: 5,569
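
Both model definitions contain `nn.Dropout`, which behaves differently depending on the module's mode: in training mode it randomly zeroes activations, and in evaluation mode it passes them through unchanged. The sketch below uses a smaller stand-in network (an assumption for brevity) to show why `model.eval()` should be called before making predictions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Smaller stand-in for the models above, with the same kinds of layers
net = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(64, 1), nn.Sigmoid(),
)
batch = torch.randn(5, 20)

# Evaluation mode: dropout is disabled, so repeated calls agree exactly
net.eval()
with torch.no_grad():
    eval_a = net(batch)
    eval_b = net(batch)

# Training mode: dropout draws a fresh random mask on every call
net.train()
with torch.no_grad():
    train_a = net(batch)
    train_b = net(batch)

eval_matches = torch.equal(eval_a, eval_b)
train_matches = torch.equal(train_a, train_b)

print(f"Eval-mode outputs identical:  {eval_matches}")
print(f"Train-mode outputs identical: {train_matches}")
```

Modules start in training mode by default, so forgetting `model.eval()` can make test-time predictions noisy even though the weights are unchanged.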
