# Linear Regression: From Theory to Implementation

Linear regression is one of the fundamental algorithms in machine learning and statistics. In this notebook, we'll explore the mathematical foundations and implement it from scratch using NumPy.

## Mathematical Foundation

Linear regression assumes a linear relationship between input features $X$ and target variable $y$:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \epsilon$$

In matrix form: $y = X\beta + \epsilon$

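Where:
- $X$ is the design matrix, with a leading column of ones so that $\beta_0$ acts as the intercept
- $\beta$ is the vector of parameters we want to learn
- $\epsilon$ represents the error term

Our goal is to choose $\beta$ so that the sum of squared errors (SSE), $\sum_i (y_i - \hat{y}_i)^2$, is as small as possible.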
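
To make the objective concrete, here is a minimal sketch that evaluates the SSE in matrix form for a candidate parameter vector. The helper name `sum_squared_errors` and the toy data are assumptions made for illustration; they are not part of the dataset used later in this notebook.

```python
import numpy as np

def sum_squared_errors(X_design, y, beta):
    """Return ||y - X beta||^2 for a design matrix that already contains a column of ones."""
    residuals = y - X_design @ beta
    return float(residuals @ residuals)

# Hypothetical toy data generated exactly by y = 1 + 2x (no noise)
x_toy = np.array([0.0, 1.0, 2.0, 3.0])
X_toy = np.column_stack([np.ones_like(x_toy), x_toy])  # first column carries the intercept
y_toy = 1.0 + 2.0 * x_toy

sse_true = sum_squared_errors(X_toy, y_toy, np.array([1.0, 2.0]))  # the generating parameters
sse_off = sum_squared_errors(X_toy, y_toy, np.array([0.0, 2.0]))   # intercept off by 1

print(f"SSE at the generating parameters: {sse_true}")
print(f"SSE with the intercept off by 1:  {sse_off}")
```

The generating parameters give an SSE of zero, while shifting the intercept by 1 adds a squared error of 1 for each of the four points.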

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Set random seed for reproducibility
np.random.seed(42)

print("Libraries imported successfully!")

## Generate Sample Data

Let's create a simple dataset to work with:

In [None]:
# Generate synthetic data
X, y = make_regression(n_samples=100, n_features=1, noise=10, random_state=42)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"Training data shape: {X_train.shape}")
print(f"Test data shape: {X_test.shape}")

# Visualize the data
plt.figure(figsize=(10, 6))
plt.scatter(X_train, y_train, alpha=0.7, label='Training data')
plt.scatter(X_test, y_test, alpha=0.7, label='Test data', color='red')
plt.xlabel('Feature (X)')
plt.ylabel('Target (y)')
plt.title('Linear Regression Dataset')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

## Implement Linear Regression from Scratch

The normal equation for linear regression is:

$$\beta = (X^T X)^{-1} X^T y$$

This closed-form solution gives the parameters that minimize the sum of squared errors (and therefore also the mean squared error), provided $X^T X$ is invertible.
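
In practice, we rarely need to form the inverse explicitly. As a minimal sketch on synthetic data (the data and the names `X_demo`, `A`, and `true_beta` are assumptions for illustration), the snippet below solves the same system with `np.linalg.solve` and cross-checks the result against NumPy's least-squares routine `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 50 samples, 2 features, known coefficients plus small noise
X_demo = rng.normal(size=(50, 2))
true_beta = np.array([3.0, -1.5, 0.5])  # [intercept, beta_1, beta_2]
A = np.column_stack([np.ones(len(X_demo)), X_demo])
y_demo = A @ true_beta + rng.normal(scale=0.1, size=50)

# Normal equation written as a linear system: (A^T A) beta = A^T y
beta_normal = np.linalg.solve(A.T @ A, A.T @ y_demo)

# Least-squares solver that works on A directly, without forming A^T A
beta_lstsq, *_ = np.linalg.lstsq(A, y_demo, rcond=None)

print("Solutions agree:", np.allclose(beta_normal, beta_lstsq))
```

Both routes should give the same parameters on well-conditioned data like this. When features are nearly collinear, the least-squares solver is generally the safer choice.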

In [None]:
class LinearRegressionFromScratch:
    def __init__(self):
        self.weights = None
        self.bias = None
    
    def fit(self, X, y):
        """Fit the model using the normal equation"""
        # Add bias term (intercept) by adding a column of ones
        X_with_bias = np.c_[np.ones(X.shape[0]), X]
        
        # Normal equation: β = (X^T X)^(-1) X^T y
        # Solve the linear system (X^T X) β = X^T y instead of forming the inverse explicitly
        params = np.linalg.solve(X_with_bias.T @ X_with_bias, X_with_bias.T @ y)
        
        self.bias = params[0]
        self.weights = params[1:]
        
        return self
    
    def predict(self, X):
        """Make predictions"""
        return self.bias + X @ self.weights
    
    def mse(self, X, y):
        """Calculate Mean Squared Error"""
        predictions = self.predict(X)
        return np.mean((y - predictions) ** 2)
    
    def r_squared(self, X, y):
        """Calculate R-squared (coefficient of determination)"""
        predictions = self.predict(X)
        ss_res = np.sum((y - predictions) ** 2)  # Sum of squared residuals
        ss_tot = np.sum((y - np.mean(y)) ** 2)   # Total sum of squares
        return 1 - (ss_res / ss_tot)

# Create and train the model
model = LinearRegressionFromScratch()
model.fit(X_train, y_train)

print("Learned parameters:")
print(f"Bias (β₀): {model.bias:.2f}")
print(f"Weight (β₁): {model.weights[0]:.2f}")

## Evaluate Model Performance

In [None]:
# Make predictions
train_predictions = model.predict(X_train)
test_predictions = model.predict(X_test)

# Calculate metrics
train_mse = model.mse(X_train, y_train)
test_mse = model.mse(X_test, y_test)
train_r2 = model.r_squared(X_train, y_train)
test_r2 = model.r_squared(X_test, y_test)

print("Model Performance:")
print(f"Training MSE: {train_mse:.2f}")
print(f"Test MSE: {test_mse:.2f}")
print(f"Training R²: {train_r2:.3f}")
print(f"Test R²: {test_r2:.3f}")

# Plot results
plt.figure(figsize=(12, 5))

# Plot 1: Training data with regression line
plt.subplot(1, 2, 1)
plt.scatter(X_train, y_train, alpha=0.7, label='Training data')
X_line = np.linspace(X_train.min(), X_train.max(), 100).reshape(-1, 1)
y_line = model.predict(X_line)
plt.plot(X_line, y_line, color='red', linewidth=2, label=f'Regression line (R² = {train_r2:.3f})')
plt.xlabel('Feature (X)')
plt.ylabel('Target (y)')
plt.title('Training Data with Regression Line')
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 2: Predictions vs Actual
plt.subplot(1, 2, 2)
plt.scatter(y_test, test_predictions, alpha=0.7)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', linewidth=2)
plt.xlabel('Actual Values')
plt.ylabel('Predicted Values')
plt.title('Predictions vs Actual (Test Set)')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
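
A short aside on reading these metrics: R² compares the model's squared error with the error of always predicting the mean of `y`, so it can drop below 0 when a model does worse than that baseline. The helper `r2_score_manual` and the toy numbers below are illustrative assumptions, chosen only to show the three cases.

```python
import numpy as np

def r2_score_manual(y_true, y_pred):
    """R-squared computed the same way as in the class above: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0])

r2_perfect = r2_score_manual(y_true, y_true)                              # exact predictions
r2_mean = r2_score_manual(y_true, np.full_like(y_true, y_true.mean()))    # always predict the mean
r2_bad = r2_score_manual(y_true, y_true[::-1])                            # predictions in reverse order

print(f"Perfect predictions: {r2_perfect}")
print(f"Mean baseline:       {r2_mean}")
print(f"Reversed order:      {r2_bad}")
```

The perfect model scores 1, the mean baseline scores 0, and the reversed predictions score below 0.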

## Key Takeaways

1. **Linear Regression Assumptions**:
   - Linear relationship between features and target
   - Independence of observations
   - Homoscedasticity (constant variance of errors)
   - Normality of residuals (matters mainly for inference, such as confidence intervals, rather than for the fit itself)

2. **Normal Equation vs Gradient Descent**:
   - Normal equation: Exact solution, but solving it costs O(n³) in the number of features $n$, so it becomes expensive for high-dimensional data
   - Gradient descent: Iterative approach, scales better to large datasets

3. **Performance Metrics**:
   - **MSE**: Lower is better, sensitive to outliers
   - **R²**: Proportion of variance explained; at most 1, and it can be negative on held-out data when the model fits worse than predicting the mean

4. **When to Use Linear Regression**:
   - When you need interpretable models
   - As a baseline for more complex models
   - When the relationship is approximately linear
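
The gradient descent alternative from point 2 can be sketched as follows. This is a minimal batch gradient descent on the MSE under assumed settings: the function name `fit_gradient_descent`, the learning rate, and the iteration count are illustrative choices, not tuned values. On well-scaled data it should approach the normal-equation solution.

```python
import numpy as np

def fit_gradient_descent(X, y, learning_rate=0.1, n_iters=2000):
    """Batch gradient descent on the MSE; returns [bias, w_1, ..., w_p]."""
    A = np.column_stack([np.ones(X.shape[0]), X])  # add the intercept column
    beta = np.zeros(A.shape[1])
    n_samples = A.shape[0]
    for _ in range(n_iters):
        # Gradient of mean((A @ beta - y)^2) with respect to beta
        grad = (2.0 / n_samples) * A.T @ (A @ beta - y)
        beta -= learning_rate * grad
    return beta

# Hypothetical data: one standardized feature, known slope and intercept plus noise
rng = np.random.default_rng(1)
X_gd = rng.normal(size=(200, 1))
y_gd = 4.0 + 2.5 * X_gd[:, 0] + rng.normal(scale=0.5, size=200)

beta_gd = fit_gradient_descent(X_gd, y_gd)

# Closed-form solution on the same data for comparison
A_gd = np.column_stack([np.ones(len(X_gd)), X_gd])
beta_exact = np.linalg.solve(A_gd.T @ A_gd, A_gd.T @ y_gd)

print("Gradient descent matches closed form:", np.allclose(beta_gd, beta_exact, atol=1e-6))
```

The fixed learning rate works here because the feature is roughly standardized. With unscaled features, a smaller step size or feature scaling would likely be needed for the iterations to converge.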

This implementation demonstrates the core concepts of linear regression and provides a foundation for understanding more advanced machine learning algorithms!