# PyTorch Deep Dive: Practical Example - Regression

We have learned the theory. Now let's solve a real problem.

In this notebook, we will predict a **continuous value** (e.g., a house price). This is called **Regression**. Our running example is a noisy sine wave.

## Learning Objectives
- **The Vocabulary**: What are "Regression", "Overfitting", and "Underfitting"?
- **The Intuition**: The "Goldilocks" analogy for model complexity.
- **The Practice**: Building a model to predict non-linear data.
- **The Visual**: Seeing the model fit the curve.


In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt

torch.manual_seed(42)

## Part 1: The Vocabulary (Definitions First)

Before we code, let's define the problem type and the pitfalls.

### 1. Regression vs Classification
- **Regression**: Predicting a quantity (How much?).
    - Example: Price, Temperature, Height.
    - Output: A single number (e.g., 345.2).
- **Classification**: Predicting a category (Which one?).
    - Example: Cat vs Dog, Spam vs Not Spam.
    - Output: A probability (e.g., 80% Cat).
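
To make the difference in outputs concrete, here is a minimal sketch. The 3-feature input, the layer sizes, and the names `reg_head` and `clf_head` are illustrative assumptions, not part of the model built later in this notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

features = torch.randn(4, 3)  # hypothetical batch: 4 samples, 3 features each

# Regression head: one unbounded number per sample
reg_head = nn.Linear(3, 1)
reg_out = reg_head(features)

# Classification head: one score per class, turned into probabilities by softmax
clf_head = nn.Linear(3, 2)  # 2 classes, e.g. cat vs dog
clf_probs = torch.softmax(clf_head(features), dim=1)

print(reg_out.shape)         # one number per sample
print(clf_probs.sum(dim=1))  # each row of probabilities sums to 1
```

The regression output can be any real number, while each row of the classification output is a set of non-negative values that sum to 1.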

### 2. Overfitting (Memorization)
- When the model learns the *noise* instead of the *pattern*.
- It does great on training data but fails on new data.
- Analogy: Memorizing the answers to the practice test but failing the real exam.

### 3. Underfitting (Oversimplification)
- When the model is too simple to capture the pattern.
- It does poorly on everything.
- Analogy: Trying to explain Quantum Physics using only addition.
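
One common way to tell these two failure modes apart is to compare the error on the training points with the error on points the model never saw. The sketch below assumes a toy noisy sine wave and uses polynomial fits of degree 1, 5, and 15 as stand-ins for "too simple", "reasonable", and "too complex"; the data, the split, and the degrees are all illustrative choices.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

# Hypothetical toy data: a noisy sine wave, split into training and held-out points
x_all = np.linspace(0, 10, 60)
y_all = np.sin(x_all) + rng.normal(scale=0.3, size=x_all.shape)
train_idx = np.arange(0, 60, 2)  # even indices are used for training
test_idx = np.arange(1, 60, 2)   # odd indices are held out

def mean_squared_error(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

train_mse, test_mse = {}, {}
for degree in (1, 5, 15):
    # Polynomial.fit rescales x internally, which keeps high degrees numerically stable
    fit = Polynomial.fit(x_all[train_idx], y_all[train_idx], degree)
    train_mse[degree] = mean_squared_error(y_all[train_idx], fit(x_all[train_idx]))
    test_mse[degree] = mean_squared_error(y_all[test_idx], fit(x_all[test_idx]))
    print(f"degree {degree:2d}: train MSE {train_mse[degree]:.3f}, "
          f"held-out MSE {test_mse[degree]:.3f}")
```

Training error can only go down as the degree grows, so a low training error alone says little. High error on both sets points to underfitting; a large gap between training and held-out error points to overfitting.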

## Part 2: The Intuition (Goldilocks Principle)

Building a model is like fitting a bed for Goldilocks.

- **Too Hard (Underfitting)**: A straight line trying to fit a curve. It misses the point.
- **Too Soft (Overfitting)**: A squiggly line that touches every single dot. It's too sensitive to noise.
- **Just Right (Generalization)**: A smooth curve that captures the trend.

Our goal is to find the "Just Right" model.

In [None]:
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Generate sample data
np.random.seed(42)
x_demo = np.linspace(0, 10, 50)
y_demo = 2 * np.sin(x_demo) + 0.5 * x_demo + np.random.randn(50) * 0.5

fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# 1. UNDERFITTING - Linear model on non-linear data
ax = axes[0]
ax.scatter(x_demo, y_demo, alpha=0.6, s=50, label='Data', color='blue')

# Fit simple linear model
poly = PolynomialFeatures(degree=1)
x_poly = poly.fit_transform(x_demo.reshape(-1, 1))
model_under = LinearRegression()
model_under.fit(x_poly, y_demo)

x_line = np.linspace(0, 10, 100).reshape(-1, 1)
y_line = model_under.predict(poly.transform(x_line))
ax.plot(x_line, y_line, 'r-', linewidth=3, label='Model (too simple!)')

ax.set_xlabel('X', fontsize=12)
ax.set_ylabel('y', fontsize=12)
ax.set_title('UNDERFITTING\n(Model too simple - just a line)', fontsize=13, 
            fontweight='bold', color='red')
ax.legend()
ax.grid(True, alpha=0.3)
ax.text(5, -3, 'High Bias\nMisses the pattern!', ha='center', fontsize=11,
       bbox=dict(boxstyle='round', facecolor='red', alpha=0.3))

# 2. JUST RIGHT - Appropriate complexity
ax = axes[1]
ax.scatter(x_demo, y_demo, alpha=0.6, s=50, label='Data', color='blue')

poly = PolynomialFeatures(degree=3)
x_poly = poly.fit_transform(x_demo.reshape(-1, 1))
model_just = LinearRegression()
model_just.fit(x_poly, y_demo)

y_line = model_just.predict(poly.transform(x_line))
ax.plot(x_line, y_line, 'g-', linewidth=3, label='Model (just right!)')

ax.set_xlabel('X', fontsize=12)
ax.set_ylabel('y', fontsize=12)
ax.set_title('JUST RIGHT\n(Captures the trend)', fontsize=13, 
            fontweight='bold', color='green')
ax.legend()
ax.grid(True, alpha=0.3)
ax.text(5, -3, 'Good Balance\nCaptures pattern!', ha='center', fontsize=11,
       bbox=dict(boxstyle='round', facecolor='green', alpha=0.3))

# 3. OVERFITTING - Too complex
ax = axes[2]
ax.scatter(x_demo, y_demo, alpha=0.6, s=50, label='Data', color='blue')

poly = PolynomialFeatures(degree=15)
x_poly = poly.fit_transform(x_demo.reshape(-1, 1))
model_over = LinearRegression()
model_over.fit(x_poly, y_demo)

y_line = model_over.predict(poly.transform(x_line))
ax.plot(x_line, y_line, 'orange', linewidth=3, label='Model (too complex!)')

ax.set_xlabel('X', fontsize=12)
ax.set_ylabel('y', fontsize=12)
ax.set_title('OVERFITTING\n(Memorizes noise)', fontsize=13, 
            fontweight='bold', color='orange')
ax.legend()
ax.grid(True, alpha=0.3)
ax.text(5, -8, 'High Variance\nWiggles too much!', ha='center', fontsize=11,
       bbox=dict(boxstyle='round', facecolor='orange', alpha=0.3))

plt.tight_layout()
plt.show()

print("The Goldilocks Principle:")
print("• Left:   Underfitting  - Model is too simple (straight line for curved data)")
print("• Middle: Just Right    - Model captures the true pattern")
print("• Right:  Overfitting   - Model memorizes noise (will fail on new data)")

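### Visualization: Underfitting vs Just Right vs Overfitting

The three panels above show what these scenarios look like: a straight line that misses the curve, a degree-3 polynomial that follows the overall trend, and a degree-15 polynomial that chases the noise.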

## Part 3: The Data (Non-Linear)

Let's create some data that isn't a straight line. Let's use a sine wave with some noise.

In [None]:
# Create data: y = sin(x) plus Gaussian noise (std 0.1)
x = torch.linspace(-5, 5, 100).view(-1, 1)
y = torch.sin(x) + 0.1 * torch.randn(x.size())

plt.scatter(x.numpy(), y.numpy())
plt.title("Noisy Sine Wave")
plt.show()

## Part 4: The Model (Going Deeper)

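A single Linear Layer cannot learn a sine wave. It can only learn a straight line. Stacking several linear layers without an activation does not help, because a composition of linear maps is still a straight line.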
To learn curves, we need **Hidden Layers** and **Activation Functions**.
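
To see why the activation matters, the sketch below checks that two linear layers stacked without an activation compute exactly the same function as one linear map. The names `lin1`, `lin2`, and `x_check` are illustrative and separate from the model defined in the next cell.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

lin1 = nn.Linear(1, 50)
lin2 = nn.Linear(50, 1)
x_check = torch.linspace(-5, 5, 7).view(-1, 1)

with torch.no_grad():
    # Two linear layers applied back to back, with no activation in between
    stacked = lin2(lin1(x_check))

    # The same function written as a single line: y = w * x + b
    w = lin2.weight @ lin1.weight             # shape (1, 1)
    b = lin2.weight @ lin1.bias + lin2.bias   # shape (1,)
    collapsed = x_check @ w.T + b

print(torch.allclose(stacked, collapsed, atol=1e-4))
```

Putting a ReLU between the two layers breaks this equivalence, which is what lets the network bend.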

In [None]:
class SineNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Layer 1: Expand to 50 neurons (Add complexity)
        self.hidden1 = nn.Linear(1, 50)
        # Layer 2: Another 50 neurons (More complexity)
        self.hidden2 = nn.Linear(50, 50)
        # Output Layer: Back to 1 number
        self.output = nn.Linear(50, 1)
        # Activation: ReLU (The bendy part)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.hidden1(x))
        x = self.relu(self.hidden2(x))
        x = self.output(x)
        return x

model = SineNet()
optimizer = optim.Adam(model.parameters(), lr=0.01) # Adam adapts the step size per parameter; it often converges faster than plain SGD
criterion = nn.MSELoss()
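
As a quick sanity check on model size, the parameter count can be worked out by hand. The sketch below rebuilds the same layer sizes with `nn.Sequential` (a shorthand chosen here for brevity, not the class defined above) and compares the hand count with what PyTorch reports.

```python
import torch.nn as nn

# Same layer sizes as SineNet, written with nn.Sequential for brevity
net = nn.Sequential(
    nn.Linear(1, 50), nn.ReLU(),
    nn.Linear(50, 50), nn.ReLU(),
    nn.Linear(50, 1),
)

# Each Linear(in, out) layer has in*out weights plus out biases
by_hand = (1 * 50 + 50) + (50 * 50 + 50) + (50 * 1 + 1)
counted = sum(p.numel() for p in net.parameters())
print(by_hand, counted)  # 2701 2701
```

With 2,701 parameters and only 100 data points, the model has more than enough capacity to memorize noise, so the fit is worth inspecting rather than trusting the loss alone.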

## Part 5: Training (The Loop)

We use the same 5-step loop as before.
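
The loop in the next cell trains on every data point. A possible variation, sketched here under the assumption of a random 80/20 split, holds some points out and tracks a validation loss next to the training loss. The names `x_data`, `net`, `opt`, and `history` are illustrative, and the data is regenerated so the block runs on its own.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical data with the same shape as this notebook's: noisy sin(x) on [-5, 5]
x_data = torch.linspace(-5, 5, 100).view(-1, 1)
y_data = torch.sin(x_data) + 0.1 * torch.randn(x_data.size())

# Random 80/20 split into training and validation points
perm = torch.randperm(100)
train_idx, val_idx = perm[:80], perm[80:]

net = nn.Sequential(
    nn.Linear(1, 50), nn.ReLU(),
    nn.Linear(50, 50), nn.ReLU(),
    nn.Linear(50, 1),
)
opt = optim.Adam(net.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

history = []
for epoch in range(300):
    pred = net(x_data[train_idx])            # 1. Forward
    loss = loss_fn(pred, y_data[train_idx])  # 2. Loss
    opt.zero_grad()                          # 3. Zero
    loss.backward()                          # 4. Backward
    opt.step()                               # 5. Step

    with torch.no_grad():                    # validation needs no gradients
        val_loss = loss_fn(net(x_data[val_idx]), y_data[val_idx])
    history.append((loss.item(), val_loss.item()))

print(f"train: {history[0][0]:.4f} -> {history[-1][0]:.4f}")
print(f"val:   {history[0][1]:.4f} -> {history[-1][1]:.4f}")
```

If the training loss keeps falling while the validation loss starts to rise, the model has likely begun to overfit.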

In [None]:
epochs = 1000
for epoch in range(epochs):
    # 1. Forward
    pred = model(x)
    # 2. Loss
    loss = criterion(pred, y)
    # 3. Zero
    optimizer.zero_grad()
    # 4. Backward
    loss.backward()
    # 5. Step
    optimizer.step()
    
    if epoch % 100 == 0:
        print(f"Epoch {epoch}: Loss {loss.item():.4f}")

## Part 6: Visualization (The Moment of Truth)

Did our model learn the curve?
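
The dashboard in the next cell reports MSE, MAE, and R². As a small worked example of how those numbers are computed, here they are on four made-up values (the tensors `y_true` and `y_pred` are hypothetical):

```python
import torch

y_true = torch.tensor([1.0, 2.0, 3.0, 4.0])
y_pred = torch.tensor([1.1, 1.9, 3.2, 3.8])

# Mean squared error: (0.01 + 0.01 + 0.04 + 0.04) / 4 = 0.025
mse_val = torch.mean((y_true - y_pred) ** 2).item()

# Mean absolute error: (0.1 + 0.1 + 0.2 + 0.2) / 4 = 0.15
mae_val = torch.mean(torch.abs(y_true - y_pred)).item()

# R² = 1 - SS_res / SS_tot = 1 - 0.10 / 5.0 = 0.98
ss_res = torch.sum((y_true - y_pred) ** 2)
ss_tot = torch.sum((y_true - y_true.mean()) ** 2)
r2_val = (1 - ss_res / ss_tot).item()

print(f"MSE {mse_val:.3f}, MAE {mae_val:.3f}, R² {r2_val:.3f}")
```

R² compares the model's squared error with that of always predicting the mean of `y`, so a value near 1 means the model does far better than that baseline.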

In [None]:
# Comprehensive visualization of regression results
fig = plt.figure(figsize=(16, 10))

# Create grid for subplots
gs = fig.add_gridspec(3, 3, hspace=0.3, wspace=0.3)

# 1. Main prediction plot
ax1 = fig.add_subplot(gs[0:2, 0:2])
ax1.scatter(x.numpy(), y.numpy(), alpha=0.6, s=50, label='Training Data', color='blue', edgecolors='black')
with torch.no_grad():
    predictions = model(x)
    ax1.plot(x.numpy(), predictions.numpy(), color='red', label='Neural Network Fit', linewidth=3)
    # Also plot the true function (without noise)
    true_y = torch.sin(x)
    ax1.plot(x.numpy(), true_y.numpy(), color='green', linestyle='--', 
            linewidth=2, alpha=0.7, label='True Function: sin(x)')

ax1.set_xlabel('X', fontsize=13)
ax1.set_ylabel('y', fontsize=13)
ax1.set_title('Neural Network Regression: Fitting sin(x) with Noise', 
             fontsize=14, fontweight='bold')
ax1.legend(fontsize=11)
ax1.grid(True, alpha=0.3)

# 2. Prediction error (residuals)
ax2 = fig.add_subplot(gs[0, 2])
with torch.no_grad():
    residuals = (y - predictions).numpy().flatten()
ax2.hist(residuals, bins=20, color='purple', alpha=0.7, edgecolor='black')
ax2.axvline(x=0, color='red', linestyle='--', linewidth=2)
ax2.set_xlabel('Prediction Error', fontsize=11)
ax2.set_ylabel('Frequency', fontsize=11)
ax2.set_title('Error Distribution\n(Should be centered at 0)', fontsize=11, fontweight='bold')
ax2.grid(True, alpha=0.3, axis='y')

# 3. Residuals vs predictions
ax3 = fig.add_subplot(gs[1, 2])
ax3.scatter(predictions.numpy(), residuals, alpha=0.6, s=30, color='orange')
ax3.axhline(y=0, color='red', linestyle='--', linewidth=2)
ax3.set_xlabel('Predicted Value', fontsize=11)
ax3.set_ylabel('Residual', fontsize=11)
ax3.set_title('Residual Plot\n(Should be random)', fontsize=11, fontweight='bold')
ax3.grid(True, alpha=0.3)

# 4. Model architecture visualization
ax4 = fig.add_subplot(gs[2, 0])
ax4.axis('off')
arch_text = """
Model Architecture:
━━━━━━━━━━━━━━━━━━━━━
Input Layer:     1 neuron
    ↓ (Linear + ReLU)
Hidden Layer 1: 50 neurons
    ↓ (Linear + ReLU)
Hidden Layer 2: 50 neurons
    ↓ (Linear)
Output Layer:    1 neuron
━━━━━━━━━━━━━━━━━━━━━
Total Parameters: {}
""".format(sum(p.numel() for p in model.parameters()))

ax4.text(0.1, 0.5, arch_text, fontsize=10, family='monospace',
        bbox=dict(boxstyle='round', facecolor='lightblue', alpha=0.8),
        verticalalignment='center')
ax4.set_title('Network Structure', fontsize=11, fontweight='bold')

# 5. Training metrics
ax5 = fig.add_subplot(gs[2, 1])
with torch.no_grad():
    final_loss = criterion(predictions, y).item()
    mae = torch.mean(torch.abs(y - predictions)).item()
    r_squared = 1 - (torch.sum((y - predictions)**2) / torch.sum((y - torch.mean(y))**2)).item()

metrics_text = f"""
Performance Metrics:
━━━━━━━━━━━━━━━━━━━━━
MSE Loss:  {final_loss:.4f}
MAE:       {mae:.4f}
R² Score:  {r_squared:.4f}

R² Interpretation:
{r_squared*100:.1f}% of variance explained
"""

ax5.axis('off')
ax5.text(0.1, 0.5, metrics_text, fontsize=10, family='monospace',
        bbox=dict(boxstyle='round', facecolor='lightgreen', alpha=0.8),
        verticalalignment='center')
ax5.set_title('Model Performance', fontsize=11, fontweight='bold')

# 6. Extrapolation test (what happens outside training range?)
ax6 = fig.add_subplot(gs[2, 2])
x_extended = torch.linspace(-7, 7, 200).view(-1, 1)
with torch.no_grad():
    y_extended_pred = model(x_extended)
    y_extended_true = torch.sin(x_extended)

ax6.plot(x_extended.numpy(), y_extended_true.numpy(), 'g--', linewidth=2, 
        alpha=0.7, label='True Function')
ax6.plot(x_extended.numpy(), y_extended_pred.numpy(), 'r-', linewidth=2, 
        label='Model Prediction')
ax6.axvspan(-7, -5, alpha=0.2, color='yellow', label='Extrapolation')
ax6.axvspan(5, 7, alpha=0.2, color='yellow')
ax6.scatter(x.numpy(), y.numpy(), alpha=0.3, s=20, color='blue')
ax6.set_xlabel('X', fontsize=11)
ax6.set_ylabel('y', fontsize=11)
ax6.set_title('Extrapolation Test\n(Yellow = unseen range)', fontsize=11, fontweight='bold')
ax6.legend(fontsize=8)
ax6.grid(True, alpha=0.3)

plt.suptitle('Complete Regression Analysis Dashboard', fontsize=16, fontweight='bold', y=0.995)
plt.show()

print("Analysis Summary:")
print(f"• Final training MSE = {final_loss:.4f} (the noise variance is about 0.01)")
print(f"• R² = {r_squared:.4f} (closer to 1.0 is better)")
print("• Check the residual plot: no visible pattern is a good sign")
print("• Check the extrapolation panel: a ReLU network is piecewise linear, so outside [-5, 5] it continues in straight lines and drifts away from sin(x)")

## Summary Checklist

1. **Regression** = Predicting a continuous number.
2. **Hidden Layers** = Allow the model to learn complex, non-linear patterns.
3. **Overfitting** = Memorizing noise (Bad).
4. **Underfitting** = Failing to capture the pattern (Bad).

Next, we will tackle the other main type of problem: **Classification**.