# Linear Regression - Hands-On Tutorial

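This notebook fits simple and multiple linear regression models to the California Housing dataset with scikit-learn, evaluates and visualizes them, and then reimplements the model from scratch with NumPy.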

## Contents
1. Import Libraries
2. Load and Explore Dataset
3. Data Preprocessing
4. Simple Linear Regression
5. Multiple Linear Regression
6. Model Evaluation
7. Visualization
8. Implementation from Scratch
9. Exercises

## 1. Import Libraries

In [None]:
# Data manipulation
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Machine Learning
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler  # not used in the main walkthrough; useful for exercise 2
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

# Settings
plt.style.use('seaborn-v0_8')
sns.set_palette('husl')
%matplotlib inline

## 2. Load and Explore Dataset

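We use the California Housing dataset that ships with scikit-learn (it is downloaded and cached on first use). To use your own data instead, load it into a DataFrame of numeric features with the target in a column named `Target`.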

In [None]:
# Load dataset (example: California Housing or your own dataset)
from sklearn.datasets import fetch_california_housing

data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['Target'] = data.target  # median house value, in units of $100,000

print("Dataset Shape:", df.shape)
df.head()

In [None]:
# Dataset information
df.info()

In [None]:
# Statistical summary
df.describe()

In [None]:
# Check for missing values
df.isnull().sum()

## 3. Data Preprocessing

In [None]:
# Separate features and target
X = df.drop('Target', axis=1)
y = df['Target']

# Hold out 20% of the rows as a test set (fixed seed for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"Training set: {X_train.shape}")
print(f"Test set: {X_test.shape}")
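The `StandardScaler` imported in section 1 is not applied here, and ordinary least squares does not need it: with an intercept, scaling the features changes the coefficients but not the predictions. If you do scale (for exercise 2, or before regularization or gradient descent), fit the scaler on the training split only and reuse its statistics on the test split so no test information leaks into preprocessing. A minimal sketch on synthetic data (the shapes, scales, and seed are arbitrary assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 200 samples, 3 features on very different scales
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([1.0, 10.0, 100.0])
y = X @ np.array([2.0, 0.5, 0.01]) + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training rows only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics

print(X_train_scaled.mean(axis=0).round(6))  # approximately 0 for every column
print(X_train_scaled.std(axis=0).round(6))   # approximately 1 for every column
```

Calling `fit_transform` on the full dataset before splitting would let the test rows influence the mean and standard deviation used for training.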

## 4. Simple Linear Regression

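Simple linear regression models the target as a straight-line function of a single feature. Here we use `MedInc`, the median income of each block group.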
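With a single feature $x$, ordinary least squares has a closed-form solution: the slope is $\hat\beta_1 = \mathrm{cov}(x, y) / \mathrm{var}(x)$ and the intercept is $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$. The sketch below checks this against `LinearRegression` on synthetic data (the true slope 3 and intercept 1 are arbitrary choices):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y = 1 + 3x + noise (coefficients chosen arbitrarily)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 1.0 + 3.0 * x + rng.normal(scale=0.5, size=100)

# Closed-form OLS estimates for one feature
slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
intercept = y.mean() - slope * x.mean()

# Same fit via scikit-learn, which expects a 2-D feature matrix
model = LinearRegression().fit(x.reshape(-1, 1), y)

print(f"closed form: slope={slope:.4f}, intercept={intercept:.4f}")
print(f"sklearn:     slope={model.coef_[0]:.4f}, intercept={model.intercept_:.4f}")
```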

In [None]:
# Select one feature for simple linear regression
feature = 'MedInc'  # Median Income
X_simple = X_train[[feature]]
X_test_simple = X_test[[feature]]

# Train model
simple_model = LinearRegression()
simple_model.fit(X_simple, y_train)

# Make predictions
y_pred_simple = simple_model.predict(X_test_simple)

# Model parameters
print(f"Coefficient (slope): {simple_model.coef_[0]:.4f}")
print(f"Intercept: {simple_model.intercept_:.4f}")

In [None]:
# Visualize simple linear regression
plt.figure(figsize=(10, 6))
plt.scatter(X_test_simple, y_test, alpha=0.5, label='Actual')
plt.plot(X_test_simple, y_pred_simple, color='red', linewidth=2, label='Predicted')
plt.xlabel(feature)
plt.ylabel('Target')
plt.title('Simple Linear Regression')
plt.legend()
plt.show()

## 5. Multiple Linear Regression

Multiple linear regression fits one coefficient per feature plus a single intercept, here using all eight features.
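Once fitted, a multiple regression model predicts with an affine function: the intercept plus a weighted sum of the features, $\hat y = \beta_0 + X\beta$. This sketch, on synthetic data with arbitrarily chosen true coefficients, confirms that `intercept_` and `coef_` reproduce `predict()`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with 3 features (true coefficients are arbitrary)
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 3))
y = 0.5 + X @ np.array([1.5, -2.0, 0.3]) + rng.normal(scale=0.1, size=150)

model = LinearRegression().fit(X, y)

# Rebuild the predictions by hand from the fitted parameters
manual = model.intercept_ + X @ model.coef_
print(np.allclose(manual, model.predict(X)))
print(model.coef_.round(2), round(model.intercept_, 2))
```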

In [None]:
# Train model with all features
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Model coefficients
coefficients = pd.DataFrame({
    'Feature': X.columns,
    'Coefficient': model.coef_
}).sort_values('Coefficient', ascending=False)

print("\nFeature Coefficients:")
print(coefficients)

## 6. Model Evaluation

In [None]:
# Calculate metrics
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Model Performance:")
print(f"R² Score: {r2:.4f}")
print(f"Mean Squared Error: {mse:.4f}")
print(f"Root Mean Squared Error: {rmse:.4f}")
print(f"Mean Absolute Error: {mae:.4f}")
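Each metric is a short formula over the residuals $y_i - \hat y_i$: MSE averages the squared residuals, RMSE is its square root (so it is in the target's units), MAE averages the absolute residuals, and $R^2 = 1 - \mathrm{SS}_\mathrm{res} / \mathrm{SS}_\mathrm{tot}$ compares the model's squared error with that of always predicting the mean. A sketch computing them by hand on made-up arrays:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Made-up true values and predictions, for illustration only
y_true = np.array([3.0, -0.5, 2.0, 7.0, 4.2])
y_hat = np.array([2.5, 0.0, 2.1, 7.8, 3.9])

residuals = y_true - y_hat
mse = np.mean(residuals ** 2)
rmse = np.sqrt(mse)
mae = np.mean(np.abs(residuals))
ss_res = np.sum(residuals ** 2)                  # error left after the model
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # error of predicting the mean
r2 = 1 - ss_res / ss_tot

print(f"MSE={mse:.4f} RMSE={rmse:.4f} MAE={mae:.4f} R²={r2:.4f}")
```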

## 7. Visualization

In [None]:
# Actual vs Predicted
plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred, alpha=0.5)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=2)
plt.xlabel('Actual Values')
plt.ylabel('Predicted Values')
plt.title('Actual vs Predicted Values')
plt.show()

In [None]:
# Residual plot
residuals = y_test - y_pred

plt.figure(figsize=(10, 6))
plt.scatter(y_pred, residuals, alpha=0.5)
plt.axhline(y=0, color='r', linestyle='--')
plt.xlabel('Predicted Values')
plt.ylabel('Residuals')
plt.title('Residual Plot')
plt.show()
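For ordinary least squares with an intercept, the residuals on the training data always sum to zero and are uncorrelated with every feature (they are orthogonal to each column of the design matrix). Test-set residuals like those plotted above carry no such guarantee, so a visible trend or funnel shape there suggests a missing non-linearity or non-constant variance. A sketch verifying the training-set property on synthetic data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with 2 features (coefficients are arbitrary)
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 2))
y = 2.0 + X @ np.array([1.0, -1.0]) + rng.normal(scale=0.3, size=300)

model = LinearRegression().fit(X, y)
train_residuals = y - model.predict(X)

# Consequences of the normal equation: zero-mean residuals
# and a zero dot product with each feature column (up to rounding)
print(abs(train_residuals.mean()))
print(np.abs(X.T @ train_residuals).max())
```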

In [None]:
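# Coefficient values. Features are on different scales, so raw magnitudes are not a measure of importance.
# DataFrame.plot creates its own figure here, so no separate plt.figure() call is needed.
ax = coefficients.plot(x='Feature', y='Coefficient', kind='barh', figsize=(10, 6), legend=False)
ax.set_xlabel('Coefficient Value')
ax.set_title('Linear Regression Coefficients (unscaled features)')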
plt.show()
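Raw coefficients depend on each feature's units (for example, `Population` is a head count while `AveRooms` is an average per household), so their magnitudes are not directly comparable. One common remedy, sketched here on synthetic data, is to standardize the features before fitting so that each coefficient measures the change in the target per one-standard-deviation change in that feature:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Two features with equal effect per standard deviation but very different units
rng = np.random.default_rng(3)
small = rng.normal(scale=1.0, size=500)     # std about 1
large = rng.normal(scale=1000.0, size=500)  # std about 1000
X = np.column_stack([small, large])
y = 2.0 * small + 0.002 * large + rng.normal(scale=0.1, size=500)

raw = LinearRegression().fit(X, y)
scaled = make_pipeline(StandardScaler(), LinearRegression()).fit(X, y)

print("raw coefficients:         ", raw.coef_.round(4))        # roughly [2, 0.002]
print("standardized coefficients:", scaled[-1].coef_.round(4))  # roughly [2, 2]
```

The standardized coefficients equal the raw ones multiplied by each feature's standard deviation, so this rescales the bar chart rather than refitting a different model. Even then, correlated features can share or trade off weight, so treat the result as a rough guide.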

## 8. Implementation from Scratch

To see what `LinearRegression` computes, we solve the least-squares problem directly with the normal equation.
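The `LinearRegressionScratch` class below minimizes the sum of squared residuals. Write the feature matrix with a leading column of ones as $X_b$ (`X_b` in the code) and the parameter vector, intercept first, as $\theta$. The loss is convex, so setting its gradient to zero gives the minimizer:

$$
\begin{aligned}
J(\theta) &= \lVert y - X_b\theta \rVert_2^2 \\
\nabla_\theta J(\theta) &= -2\,X_b^\top (y - X_b\theta) = 0 \\
X_b^\top X_b\,\theta &= X_b^\top y \\
\theta &= (X_b^\top X_b)^{-1} X_b^\top y
\end{aligned}
$$

The last step assumes $X_b^\top X_b$ is invertible, that is, no feature is an exact linear combination of the others.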

In [None]:
class LinearRegressionScratch:
    def __init__(self):
        self.coefficients = None
        self.intercept = None
    
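    def fit(self, X, y):
        # Add bias term: prepend a column of ones so theta[0] is the intercept
        X_b = np.c_[np.ones((X.shape[0], 1)), X]
        
        # Normal equation: θ = (X^T X)^-1 X^T y
        # Solving the system (X^T X) θ = X^T y is more numerically stable
        # than computing the inverse explicitly
        theta = np.linalg.solve(X_b.T @ X_b, X_b.T @ y)
        
        self.intercept = theta[0]
        self.coefficients = theta[1:]
        return self
    
    def predict(self, X):
        # Same affine form as sklearn: X · coefficients + intercept
        return X @ self.coefficients + self.intercept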

# Test custom implementation
custom_model = LinearRegressionScratch()
custom_model.fit(X_train.values, y_train.values)
y_pred_custom = custom_model.predict(X_test.values)

# Compare with sklearn
print("Custom R² Score:", r2_score(y_test, y_pred_custom))
print("Sklearn R² Score:", r2_score(y_test, y_pred))
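Besides comparing $R^2$, you can compare the fitted parameters directly. This sketch uses `np.linalg.lstsq`, which solves the least-squares problem without forming $X_b^\top X_b$ and also copes with rank-deficient inputs, and checks the result against scikit-learn on synthetic data (the true coefficients are arbitrary):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 4))
y = -1.0 + X @ np.array([0.5, 2.0, -3.0, 1.0]) + rng.normal(scale=0.2, size=200)

# Prepend a column of ones so theta[0] is the intercept
X_b = np.column_stack([np.ones(len(X)), X])
theta, *_ = np.linalg.lstsq(X_b, y, rcond=None)

sk = LinearRegression().fit(X, y)
print(np.allclose(theta[0], sk.intercept_), np.allclose(theta[1:], sk.coef_))
```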

## 9. Exercises

1. Try different train-test split ratios
2. Add feature scaling and compare results
3. Implement gradient descent for Linear Regression
4. Try polynomial features
5. Add regularization (Ridge, Lasso)
6. Use a real-world dataset from Kaggle
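As a possible starting point for exercise 3, here is a minimal batch gradient descent sketch on the mean squared error. The helper name `fit_gradient_descent`, the learning rate, the iteration count, and the synthetic data are all illustrative assumptions. Gradient descent converges far more reliably on standardized features, which ties in with exercise 2.

```python
import numpy as np

def fit_gradient_descent(X, y, lr=0.1, n_iters=2000):
    """Hypothetical helper: batch gradient descent on MSE, returns (intercept, coefficients)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        error = X @ w + b - y                     # prediction minus target
        grad_w = (2 / n_samples) * (X.T @ error)  # d(MSE)/dw
        grad_b = (2 / n_samples) * error.sum()    # d(MSE)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return b, w

# Synthetic data whose features are already roughly standardized
rng = np.random.default_rng(5)
X = rng.normal(size=(500, 3))
y = 4.0 + X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=500)

b, w = fit_gradient_descent(X, y)

# Compare against the closed-form least-squares solution
X_b = np.column_stack([np.ones(len(X)), X])
theta, *_ = np.linalg.lstsq(X_b, y, rcond=None)
print(b, w)
print(theta)
```

On unscaled features such as the raw California Housing columns, a learning rate this large would likely diverge; scaling first or lowering `lr` is the usual fix.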

## Conclusion

You've learned:
- ✅ Simple and Multiple Linear Regression
- ✅ Model evaluation metrics
- ✅ Visualization techniques
- ✅ Implementation from scratch

**Next Steps:** Try with your own dataset!