# Linear Regression From Scratch

The simplest ML model: fit a line through data points.

**What you'll learn:**
- How to implement linear regression using only NumPy
- The math behind finding the best-fit line
- What a loss function is and why it matters
- How to visualize your model's predictions

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Make plots look nice
plt.style.use('seaborn-v0_8-darkgrid')
%matplotlib inline

## 1. Generate Some Data

Let's create some fake data that roughly follows a line, with some noise added.

In [None]:
# Set random seed for reproducibility
np.random.seed(42)

# Generate 50 random x values between 0 and 10
X = np.random.uniform(0, 10, 50)

# The "true" relationship: y = 2x + 1 (plus some noise)
true_slope = 2
true_intercept = 1
noise = np.random.normal(0, 2, 50)  # Gaussian noise

y = true_slope * X + true_intercept + noise

# Plot it
plt.figure(figsize=(10, 6))
plt.scatter(X, y, alpha=0.7, label='Data points')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Our Dataset (true line: y = 2x + 1)')
plt.legend()
plt.show()

print(f"We have {len(X)} data points")

## 2. The Goal

We want to find the line `y = mx + b` that best fits our data.

- **m** is the slope (how steep the line is)
- **b** is the intercept (where the line crosses the y-axis)

But what does "best" mean?

## 3. The Loss Function: Mean Squared Error

We need a way to measure how "wrong" our line is. The most common way is **Mean Squared Error (MSE)**:

$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Where:
- $n$ is the number of data points
- $y_i$ is the actual value
- $\hat{y}_i$ is our prediction, $m x_i + b$
- We square the errors so negatives don't cancel out positives, and so large errors count more than small ones
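To see why the squaring matters, here is a tiny worked example. The three values are made up for illustration (they are not part of the dataset), and the names `y_hat`, `errors`, `mean_error`, and `mse_toy` are introduced here only for this sketch.

```python
import numpy as np

# Hypothetical toy values, chosen so the raw errors cancel exactly
y_true = np.array([1.0, 2.0, 3.0])
y_hat = np.array([3.0, 2.0, 1.0])

errors = y_true - y_hat           # [-2, 0, 2]
mean_error = errors.mean()        # 0.0: looks perfect, but two predictions are off by 2
mse_toy = (errors ** 2).mean()    # (4 + 0 + 4) / 3

print(f"Mean error: {mean_error:.3f}")
print(f"MSE:        {mse_toy:.3f}")
```

The plain average of the errors is zero even though two of the three predictions are wrong, while the MSE is clearly positive.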

In [None]:
def predict(X, m, b):
    """Make predictions using y = mx + b"""
    return m * X + b

def mean_squared_error(y_true, y_pred):
    """Calculate the mean squared error"""
    return np.mean((y_true - y_pred) ** 2)

# Let's try an arbitrary guess: m=1, b=0
m_guess, b_guess = 1, 0
y_pred = predict(X, m_guess, b_guess)
mse = mean_squared_error(y, y_pred)

print(f"With m={m_guess}, b={b_guess}: MSE = {mse:.2f}")

# Visualize how bad this guess is
plt.figure(figsize=(10, 6))
plt.scatter(X, y, alpha=0.7, label='Data')
plt.plot([0, 10], [predict(0, m_guess, b_guess), predict(10, m_guess, b_guess)], 
         'r-', linewidth=2, label=f'Guess: y = {m_guess}x + {b_guess}')
plt.xlabel('X')
plt.ylabel('y')
plt.title(f'Our First Guess (MSE = {mse:.2f})')
plt.legend()
plt.show()

## 4. Finding the Best Line: The Closed-Form Solution

There's actually a formula that gives us the optimal m and b directly!

$$m = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

$$b = \bar{y} - m\bar{x}$$

Where $\bar{x}$ and $\bar{y}$ are the means of X and y.
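
For readers who want to see where these formulas come from, here is a sketch of the derivation. It uses the symbols already defined, plus $L(m, b)$, a label introduced here for the MSE written as a function of the two parameters:

$$L(m, b) = \frac{1}{n}\sum_{i=1}^{n}(y_i - m x_i - b)^2$$

Setting the derivative with respect to $b$ to zero gives the intercept:

$$\frac{\partial L}{\partial b} = -\frac{2}{n}\sum_{i=1}^{n}(y_i - m x_i - b) = 0 \quad\Longrightarrow\quad b = \bar{y} - m\bar{x}$$

Setting the derivative with respect to $m$ to zero and substituting $b = \bar{y} - m\bar{x}$:

$$\frac{\partial L}{\partial m} = -\frac{2}{n}\sum_{i=1}^{n} x_i\,(y_i - m x_i - b) = 0 \quad\Longrightarrow\quad \sum_{i=1}^{n} x_i\left[(y_i - \bar{y}) - m(x_i - \bar{x})\right] = 0$$

The bracketed terms sum to zero on their own, so replacing $x_i$ with $x_i - \bar{x}$ in front of the bracket does not change the sum:

$$\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = m\sum_{i=1}^{n}(x_i - \bar{x})^2$$

Dividing both sides by $\sum_{i=1}^{n}(x_i - \bar{x})^2$ gives the slope formula.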

In [None]:
def fit_linear_regression(X, y):
    """
    Find the optimal slope and intercept using the closed-form solution.
    
    Where the formulas come from:
    - We want the m and b that minimize the sum of squared errors
    - Setting the partial derivatives with respect to m and b to zero
      and solving gives the formulas above
    """
    x_mean = np.mean(X)
    y_mean = np.mean(y)
    
    # Numerator: sum of (x - x_mean) * (y - y_mean)
    numerator = np.sum((X - x_mean) * (y - y_mean))
    
    # Denominator: sum of (x - x_mean)^2
    denominator = np.sum((X - x_mean) ** 2)
    
    # Calculate slope and intercept
    m = numerator / denominator
    b = y_mean - m * x_mean
    
    return m, b

# Fit our model
m_optimal, b_optimal = fit_linear_regression(X, y)
print(f"Optimal parameters: m = {m_optimal:.4f}, b = {b_optimal:.4f}")
print(f"True parameters:    m = {true_slope}, b = {true_intercept}")
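
As a sanity check, the closed-form result can be compared against NumPy's built-in least-squares polynomial fit. This is a minimal sketch that rebuilds the same dataset so it runs on its own; the names `X_chk`, `y_chk`, `m_closed`, `b_closed`, `m_np`, and `b_np` are introduced here only for the comparison.

```python
import numpy as np

# Recreate the notebook's dataset so this cell runs on its own
np.random.seed(42)
X_chk = np.random.uniform(0, 10, 50)
y_chk = 2 * X_chk + 1 + np.random.normal(0, 2, 50)

# Closed-form slope and intercept, same formulas as fit_linear_regression
x_bar, y_bar = X_chk.mean(), y_chk.mean()
m_closed = np.sum((X_chk - x_bar) * (y_chk - y_bar)) / np.sum((X_chk - x_bar) ** 2)
b_closed = y_bar - m_closed * x_bar

# np.polyfit with deg=1 returns coefficients highest power first: [slope, intercept]
m_np, b_np = np.polyfit(X_chk, y_chk, deg=1)

print(f"Closed form: m = {m_closed:.4f}, b = {b_closed:.4f}")
print(f"np.polyfit:  m = {m_np:.4f}, b = {b_np:.4f}")
```

Both approaches minimize the same squared error, so the two lines should agree up to floating-point rounding.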

In [None]:
# Let's see how well our optimal line fits
y_pred_optimal = predict(X, m_optimal, b_optimal)
mse_optimal = mean_squared_error(y, y_pred_optimal)

plt.figure(figsize=(10, 6))
plt.scatter(X, y, alpha=0.7, label='Data')
plt.plot([0, 10], [predict(0, m_optimal, b_optimal), predict(10, m_optimal, b_optimal)], 
         'g-', linewidth=2, label=f'Fitted: y = {m_optimal:.2f}x + {b_optimal:.2f}')
plt.plot([0, 10], [predict(0, true_slope, true_intercept), predict(10, true_slope, true_intercept)], 
         'r--', linewidth=2, alpha=0.5, label=f'True: y = {true_slope}x + {true_intercept}')
plt.xlabel('X')
plt.ylabel('y')
plt.title(f'Fitted Line (MSE = {mse_optimal:.2f})')
plt.legend()
plt.show()

print(f"Our MSE: {mse_optimal:.4f}")
print(f"This should be close to the variance of our noise (2**2 = {2**2})")
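
The MSE of a fitted line on its own training data tends to land slightly below the noise variance, because two parameters were tuned to that particular sample. For a least-squares line through $n$ points with noise variance $\sigma^2$, the expected training MSE is $\sigma^2 (n - 2) / n$. The simulation below is a rough sketch of that effect. The seed and the number of trials are arbitrary choices, and it uses `np.polyfit` for the fit to stay self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)   # separate generator; the seed is arbitrary
n, sigma, trials = 50, 2.0, 2000

mses = np.empty(trials)
for t in range(trials):
    # Fresh dataset from the same process as the notebook: y = 2x + 1 + noise
    x = rng.uniform(0, 10, n)
    y_sim = 2 * x + 1 + rng.normal(0, sigma, n)
    m_fit, b_fit = np.polyfit(x, y_sim, deg=1)   # least-squares line
    mses[t] = np.mean((y_sim - (m_fit * x + b_fit)) ** 2)

expected = sigma**2 * (n - 2) / n
print(f"Average training MSE over {trials} datasets: {mses.mean():.3f}")
print(f"sigma^2 * (n - 2) / n:                      {expected:.3f}")
```

Any single dataset can land above or below this value; only the average over many datasets settles near it.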

## 5. Visualizing the Error Surface

Let's see what the MSE looks like for different values of m and b. The surface is a bowl with a single lowest point, and our closed-form solution sits at the bottom of it.

In [None]:
# Create a grid of m and b values
m_range = np.linspace(0, 4, 100)
b_range = np.linspace(-3, 5, 100)
M, B = np.meshgrid(m_range, b_range)

# Calculate MSE for each combination
MSE_grid = np.zeros_like(M)
for i in range(M.shape[0]):
    for j in range(M.shape[1]):
        y_pred = predict(X, M[i,j], B[i,j])
        MSE_grid[i,j] = mean_squared_error(y, y_pred)

# Plot as a contour
plt.figure(figsize=(10, 8))
contour = plt.contour(M, B, MSE_grid, levels=20)
plt.clabel(contour, inline=True, fontsize=8)
plt.plot(m_optimal, b_optimal, 'r*', markersize=15, label='Our solution')
plt.plot(true_slope, true_intercept, 'g^', markersize=12, label='True values')
plt.xlabel('Slope (m)')
plt.ylabel('Intercept (b)')
plt.title('Error Surface: MSE for different m and b values')
plt.legend()
plt.colorbar(contour, label='MSE')
plt.show()
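
The double loop over the grid can also be written with NumPy broadcasting, and the lowest grid value can be located with `np.argmin`. This is a sketch of one way to do it. It rebuilds the dataset so it runs on its own, and the `_g` names are introduced here only for illustration.

```python
import numpy as np

# Recreate the notebook's dataset so this cell runs on its own
np.random.seed(42)
X_g = np.random.uniform(0, 10, 50)
y_g = 2 * X_g + 1 + np.random.normal(0, 2, 50)

m_vals = np.linspace(0, 4, 100)
b_vals = np.linspace(-3, 5, 100)
M_g, B_g = np.meshgrid(m_vals, b_vals)            # each has shape (100, 100)

# Broadcast to shape (100, 100, 50): one prediction per (b, m, data point)
preds = M_g[..., None] * X_g + B_g[..., None]
mse_grid = np.mean((y_g - preds) ** 2, axis=-1)   # back to shape (100, 100)

# Locate the smallest value on the grid
i, j = np.unravel_index(np.argmin(mse_grid), mse_grid.shape)
print(f"Grid minimum near m = {M_g[i, j]:.3f}, b = {B_g[i, j]:.3f}")
```

The grid minimum only approximates the closed-form solution, since it can do no better than the grid spacing allows.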

## 6. The Matrix Form (Optional but Powerful)

The same solution can be written with matrices. This generalizes to multiple variables:

$$\theta = (X^TX)^{-1}X^Ty$$

Where $\theta = [b, m]$ and $X$ is now a matrix: our x values with a column of 1s added in front for the intercept.

In [None]:
def fit_linear_regression_matrix(X, y):
    """
    Fit using the normal equation: theta = (X'X)^(-1) X'y
    
    This is mathematically equivalent to our earlier solution
    (results agree up to floating-point rounding), but it also
    works for multiple features.
    """
    # Add column of 1s for intercept
    X_with_bias = np.column_stack([np.ones(len(X)), X])
    
    # Normal equation, solved as a linear system (more stable than forming the inverse)
    theta = np.linalg.solve(X_with_bias.T @ X_with_bias, X_with_bias.T @ y)
    
    b, m = theta[0], theta[1]
    return m, b

m_matrix, b_matrix = fit_linear_regression_matrix(X, y)
print(f"Matrix solution: m = {m_matrix:.4f}, b = {b_matrix:.4f}")
print(f"Our solution:    m = {m_optimal:.4f}, b = {b_optimal:.4f}")
print(f"\nThey match: {np.allclose([m_matrix, b_matrix], [m_optimal, b_optimal])}")
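
To illustrate the "multiple features" claim, here is a sketch with two input variables. The data is synthetic and hypothetical: the coefficients, seed, and sample size are arbitrary choices, and names such as `features`, `design`, and `theta_normal` are introduced here. It solves the normal equation as a linear system and compares the result with `np.linalg.lstsq`, which minimizes the same squared error without forming $X^TX$ explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed
n = 200

# Hypothetical data: y = 3 + 2*x1 - 1.5*x2 + noise
features = rng.uniform(0, 10, size=(n, 2))
true_theta = np.array([3.0, 2.0, -1.5])   # [intercept, w1, w2]
design = np.column_stack([np.ones(n), features])   # column of 1s for the intercept
y_multi = design @ true_theta + rng.normal(0, 1, n)

# Normal equation, solved as a linear system rather than with an explicit inverse
theta_normal = np.linalg.solve(design.T @ design, design.T @ y_multi)

# Least-squares solver applied directly to the design matrix
theta_lstsq, *_ = np.linalg.lstsq(design, y_multi, rcond=None)

print("True theta:     ", true_theta)
print("Normal equation:", np.round(theta_normal, 3))
print("np.linalg.lstsq:", np.round(theta_lstsq, 3))
```

The only change from the one-variable case is that the design matrix has more columns; the formula itself is the same.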

## Key Takeaways

1. **Linear regression finds the best line** by minimizing the sum of squared errors
2. **MSE is our loss function** - it tells us how wrong our predictions are
3. **The closed-form solution** gives us the optimal m and b directly (no iteration needed)
4. **The matrix form** generalizes to multiple variables

## What's Next?

In the next notebook, we'll learn about **gradient descent**, an iterative method for finding the minimum. It matters because:
- It works when no closed-form solution exists
- It's how neural networks learn
- It scales better to huge datasets, where solving the normal equation gets expensive

In [None]:
# Quick summary
print("=" * 50)
print("LINEAR REGRESSION SUMMARY")
print("=" * 50)
print(f"Data points: {len(X)}")
print(f"True parameters: m={true_slope}, b={true_intercept}")
print(f"Fitted parameters: m={m_optimal:.4f}, b={b_optimal:.4f}")
print(f"Mean Squared Error: {mse_optimal:.4f}")
print("=" * 50)