# Linear Regression

This tutorial covers the foundational concepts of machine learning and then works through a practical example using simple linear regression. The aim is to give a clear picture of the machine learning process, from preparing data to evaluating a model.

## Table of Contents
1. Introduction to Machine Learning
2. Simple Linear Regression
   - Theory and Mathematical Formulation
   - Practical Example with Python
3. Summary and Key Takeaways

## 1. Introduction to Machine Learning

Machine learning is a subset of artificial intelligence (AI) that focuses on building systems that can learn from data. Instead of being explicitly programmed to perform a task, a machine learning model uses patterns in data to make decisions and predictions.

### Types of Machine Learning:
- **Supervised Learning**: The model is trained on a labeled dataset, meaning the training data includes the desired solution, known as a label. The goal is to learn a mapping from inputs to outputs. Examples include regression and classification problems.
- **Unsupervised Learning**: The model is trained on an unlabeled dataset, meaning the training data does not include labels. The goal is to find patterns or relationships in the data. Examples include clustering and association.
- **Reinforcement Learning**: The model learns by interacting with an environment and receiving feedback in the form of rewards or penalties. The goal is to learn a strategy to maximize the cumulative reward over time.

### Key Concepts:
- **Training Data**: The data on which the model is trained. In supervised learning, it includes both the inputs and the corresponding desired outputs.
- **Testing Data**: Data that is separate from the training data and is used to evaluate the performance of the model.
- **Features**: Input variables that the model uses to make predictions.
- **Target/Label**: The output variable that the model aims to predict (in supervised learning).
- **Model**: A mathematical representation of a real-world process. In machine learning, a model is what a training algorithm produces, and it is then used to make predictions.
- **Training**: The process of adjusting a model's parameters to fit the training data.
- **Prediction**: Using the trained model to make predictions on new, unseen data.
- **Evaluation**: Assessing the performance of a trained model using certain metrics (e.g., accuracy, mean squared error).
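
The terms above can be mapped onto a few lines of code. The following is a minimal sketch, not part of the original example: the toy data, the variable names, and the use of `np.polyfit` as the training step are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Features (inputs) and target (label) for a toy supervised-learning problem.
features = rng.uniform(0, 10, size=50)
target = 3.0 * features + 1.0 + rng.normal(0, 1, size=50)

# Training data: the first 40 examples. Testing data: the 10 held-out examples.
X_train, X_test = features[:40], features[40:]
y_train, y_test = target[:40], target[40:]

# Training: fit a straight line (the model) by least squares.
# np.polyfit returns coefficients from the highest degree down: [slope, intercept].
slope, intercept = np.polyfit(X_train, y_train, deg=1)

# Prediction: apply the trained model to data it has not seen.
y_pred = slope * X_test + intercept

# Evaluation: mean squared error on the testing data.
mse = np.mean((y_test - y_pred) ** 2)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, test MSE={mse:.2f}")
```

The fitted slope and intercept should land near the values used to generate the data (3 and 1), with the gap coming from the added noise.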

With this foundational knowledge, we can now delve into a practical example using simple linear regression.

## 2. Simple Linear Regression

Simple linear regression models the relationship between a dependent variable and a single independent variable as a straight line. The goal is to find the line that best fits the data in the least-squares sense.

### Theory and Mathematical Formulation:

The relationship between the dependent variable y and the independent variable x is represented as:

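$$y = \beta_0 + \beta_1 x + \varepsilon$$

Where:
- $y$ is the dependent variable (what we're trying to predict).
- $x$ is the independent variable (the input).
- $\beta_0$ is the y-intercept.
- $\beta_1$ is the slope of the line.
- $\varepsilon$ is the error term: the part of $y$ that the line does not explain. For a fitted line, it is estimated by the residual, the difference between the observed and predicted values.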

The goal of simple linear regression is to find the values of β0 and β1 that minimize the sum of the squared differences between the observed values (actual values) and the values predicted by the model.
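
For a single feature, this minimization has a well-known closed-form solution, where $\bar{x}$ and $\bar{y}$ denote the sample means:

$$\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

The sketch below applies these formulas with NumPy. The function name `fit_simple_linear_regression` and the four data points are hypothetical, chosen so that the points lie exactly on a known line.

```python
import numpy as np


def fit_simple_linear_regression(x, y):
    """Return (beta0, beta1) minimizing the sum of squared residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: co-variation of x and y divided by the variation of x.
    beta1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: forces the line through the point (x_mean, y_mean).
    beta0 = y_mean - beta1 * x_mean
    return beta0, beta1


# Points that lie exactly on y = 1 + 2x, so the fit should recover that line.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])
beta0, beta1 = fit_simple_linear_regression(x, y)
print(beta0, beta1)
```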

In the next section, we'll implement a practical example of simple linear regression using Python.

### Practical Example with Python

For this example, we'll generate a small synthetic dataset relating the number of hours studied to the score obtained in an exam. Our goal is to predict the exam score from the number of hours studied.

#### Data Generation and Visualization

Let's start by importing the necessary libraries and generating the data.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Generate a synthetic dataset
np.random.seed(42)
hours_studied = np.random.rand(100) * 10
exam_scores = 5 + 2.5 * hours_studied + np.random.randn(100) * 2

# Convert to DataFrame for easier handling
data = pd.DataFrame({'Hours_Studied': hours_studied, 'Exam_Score': exam_scores})

# Visualize the data
plt.figure(figsize=(10, 6))
plt.scatter(data['Hours_Studied'], data['Exam_Score'], color='blue', label='Data points')
plt.title('Relationship between Hours Studied and Exam Score')
plt.xlabel('Hours Studied')
plt.ylabel('Exam Score')
plt.legend()
plt.grid(True)
plt.show()

In [None]:
# Splitting the data into training and testing sets
X = data[['Hours_Studied']]
y = data['Exam_Score']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Training the linear regression model
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Making predictions
y_pred = regressor.predict(X_test)

# Visualizing the training data, the testing data, and the fitted regression line
plt.figure(figsize=(10, 6))
plt.scatter(X_train, y_train, color='blue', label='Training data')
plt.scatter(X_test, y_test, color='red', label='Testing data')
plt.plot(X_train, regressor.predict(X_train), color='green', label='Regression line')
plt.title('Relationship between Hours Studied and Exam Score (with fitted line)')
plt.xlabel('Hours Studied')
plt.ylabel('Exam Score')
plt.legend()
plt.grid(True)
plt.show()

In [None]:
# Evaluating the model
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

evaluation_metrics = pd.DataFrame({'Metric': ['Mean Absolute Error', 'Mean Squared Error', 'Root Mean Squared Error', 'R-squared'],
                                  'Value': [mae, mse, rmse, r2]})
evaluation_metrics
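
To make the four metrics concrete, the sketch below computes them by hand with NumPy, using the standard definitions. The arrays `y_true` and `y_hat` are made-up values for illustration, not output from the model above.

```python
import numpy as np

# Hypothetical observed and predicted values.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.0, 5.0, 8.0, 11.0])

errors = y_true - y_hat                # [1, 0, -1, -2]

mae = np.mean(np.abs(errors))          # average absolute error
mse = np.mean(errors ** 2)             # average squared error
rmse = np.sqrt(mse)                    # same units as the target

# R-squared: 1 minus (residual sum of squares / total sum of squares).
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"MAE={mae}, MSE={mse}, RMSE={rmse:.4f}, R2={r2}")
```

MAE and RMSE are expressed in the units of the target (exam points here), while R-squared is unitless and compares the model against always predicting the mean.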

## Gradient Descent for Linear Regression

Gradient Descent is an optimization algorithm used to minimize the loss function by iteratively adjusting the model's parameters. The idea is to take repeated steps in the opposite direction of the gradient (or slope) of the function at the current point, because this is the direction of steepest descent.

For linear regression, our goal is to minimize the Mean Squared Error (MSE) loss function. The gradient descent algorithm updates the parameters (weights and bias) using the following formula:

$$\theta_{\text{new}} = \theta_{\text{old}} - \alpha \nabla_{\theta} J(\theta)$$

Where:
- $\theta$ are the parameters (weights and bias).
- $\alpha$ is the learning rate, a hyperparameter that determines the step size at each iteration while moving towards a minimum of the loss function.
- $\nabla_{\theta} J(\theta)$ is the gradient of the loss function with respect to the parameters.
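
For the single-feature model used in this tutorial, the update rule can be written out explicitly. The following is a short derivation under two assumptions: the parameters are written as $\theta_0$ (intercept) and $\theta_1$ (slope), playing the roles of $\beta_0$ and $\beta_1$ from the previous section, and $J$ is the MSE over $m$ training examples:

$$J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( y_i - (\theta_0 + \theta_1 x_i) \right)^2$$

Differentiating with respect to each parameter gives:

$$\frac{\partial J}{\partial \theta_0} = -\frac{2}{m} \sum_{i=1}^{m} \left( y_i - (\theta_0 + \theta_1 x_i) \right), \qquad \frac{\partial J}{\partial \theta_1} = -\frac{2}{m} \sum_{i=1}^{m} x_i \left( y_i - (\theta_0 + \theta_1 x_i) \right)$$

Each iteration then applies $\theta_0 \leftarrow \theta_0 - \alpha \, \partial J / \partial \theta_0$ and $\theta_1 \leftarrow \theta_1 - \alpha \, \partial J / \partial \theta_1$.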

Next, we'll implement gradient descent from scratch and train our linear regression model using this algorithm.

In [None]:
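def compute_gradient(X, y, theta0, theta1):
    """Return the gradients of the MSE loss with respect to theta0 and theta1."""
    m = len(y)
    residuals = y - (theta0 + theta1 * X)  # observed minus predicted
    theta0_gradient = -(2 / m) * np.sum(residuals)
    theta1_gradient = -(2 / m) * np.sum(X * residuals)
    return theta0_gradient, theta1_gradient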

def gradient_descent(X, y, learning_rate=0.01, iterations=1000):
    theta0 = 0
    theta1 = 0
    for _ in range(iterations):
        theta0_gradient, theta1_gradient = compute_gradient(X, y, theta0, theta1)
        theta0 = theta0 - learning_rate * theta0_gradient
        theta1 = theta1 - learning_rate * theta1_gradient
    return theta0, theta1

# Train the model using gradient descent
theta0_gd, theta1_gd = gradient_descent(X_train['Hours_Studied'], y_train)
theta0_gd, theta1_gd
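
The choice of learning rate matters a great deal for this unscaled feature. The sketch below is an illustration rather than part of the original notebook: it regenerates similar synthetic data with its own random generator, and the helper names `mse_loss` and `run_gradient_descent` are hypothetical. It records the loss after each update for two learning rates.

```python
import numpy as np

# Synthetic data in the same spirit as the tutorial (assumed, regenerated here
# so the example is self-contained).
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=100)
y = 5 + 2.5 * x + rng.normal(0, 2, size=100)


def mse_loss(theta0, theta1):
    """Mean squared error of the line theta0 + theta1 * x on the data."""
    return np.mean((y - (theta0 + theta1 * x)) ** 2)


def run_gradient_descent(learning_rate, iterations=50):
    """Return the loss after each update, starting from theta0 = theta1 = 0."""
    theta0, theta1 = 0.0, 0.0
    losses = []
    for _ in range(iterations):
        residuals = y - (theta0 + theta1 * x)
        grad0 = -(2 / len(y)) * np.sum(residuals)
        grad1 = -(2 / len(y)) * np.sum(x * residuals)
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
        losses.append(mse_loss(theta0, theta1))
    return losses


initial_loss = mse_loss(0.0, 0.0)
stable = run_gradient_descent(learning_rate=0.01)
unstable = run_gradient_descent(learning_rate=0.05)

print(f"initial loss:                   {initial_loss:.1f}")
print(f"lr=0.01, loss after 50 updates: {stable[-1]:.2f}")
print(f"lr=0.05, loss after 50 updates: {unstable[-1]:.3g}")
```

With a rate of 0.01 the loss falls steadily, while with 0.05 each step overshoots the minimum and the loss grows. The largest stable rate depends on the scale of the feature, so rescaling the inputs would change which rates work.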

In [None]:
# Making predictions using the model trained with gradient descent
y_pred_gd = theta0_gd + theta1_gd * X_test['Hours_Studied']

# Evaluating the model trained with gradient descent
mae_gd = mean_absolute_error(y_test, y_pred_gd)
mse_gd = mean_squared_error(y_test, y_pred_gd)
rmse_gd = np.sqrt(mse_gd)
r2_gd = r2_score(y_test, y_pred_gd)

evaluation_metrics_gd = pd.DataFrame({'Metric': ['Mean Absolute Error', 'Mean Squared Error', 'Root Mean Squared Error', 'R-squared'],
                                     'Value (Gradient Descent)': [mae_gd, mse_gd, rmse_gd, r2_gd],
                                     'Value (Analytical Solution)': [mae, mse, rmse, r2]})
evaluation_metrics_gd

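## Checkpointing Strategy

A training run can be interrupted before it finishes, so it helps to save the parameters periodically. The functions below write `theta0` and `theta1` to disk every `checkpoint_frequency` iterations and can read them back.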

In [None]:
import pickle
import os

def save_checkpoint(iteration, theta0, theta1, path='checkpoints'):
    """Save model parameters as a checkpoint."""
    os.makedirs(path, exist_ok=True)  # create the directory if it does not already exist
    checkpoint = {
        'iteration': iteration,
        'theta0': theta0,
        'theta1': theta1
    }
    checkpoint_path = os.path.join(path, f'checkpoint_{iteration}.pkl')
    with open(checkpoint_path, 'wb') as f:
        pickle.dump(checkpoint, f)

def load_checkpoint(path):
    """Load model parameters from a checkpoint."""
    with open(path, 'rb') as f:
        checkpoint = pickle.load(f)
    return checkpoint['iteration'], checkpoint['theta0'], checkpoint['theta1']

def gradient_descent_with_checkpointing(X, y, learning_rate=0.01, iterations=1000, checkpoint_frequency=100):
    theta0 = 0
    theta1 = 0
    for i in range(iterations):
        theta0_gradient, theta1_gradient = compute_gradient(X, y, theta0, theta1)
        theta0 = theta0 - learning_rate * theta0_gradient
        theta1 = theta1 - learning_rate * theta1_gradient
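        # Save after every `checkpoint_frequency` completed updates; the stored
        # iteration count is the number of updates applied so far
        if (i + 1) % checkpoint_frequency == 0:
            save_checkpoint(i + 1, theta0, theta1)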
    return theta0, theta1

# Train the model using gradient descent with checkpointing
theta0_gd_checkpointed, theta1_gd_checkpointed = gradient_descent_with_checkpointing(X_train['Hours_Studied'], y_train)
theta0_gd_checkpointed, theta1_gd_checkpointed
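
Saved checkpoints become useful once training can restart from them. The sketch below shows one possible way to resume, under stated assumptions: the helpers `latest_checkpoint` and `resumable_gradient_descent` are hypothetical and not part of the notebook, the data is regenerated so the example is self-contained, and each checkpoint stores the number of updates already applied.

```python
import os
import pickle
import tempfile

import numpy as np


def compute_gradient(X, y, theta0, theta1):
    """Gradients of the MSE loss with respect to theta0 and theta1."""
    m = len(y)
    residuals = y - (theta0 + theta1 * X)
    return -(2 / m) * np.sum(residuals), -(2 / m) * np.sum(X * residuals)


def save_checkpoint(iteration, theta0, theta1, path):
    """Write the parameters and the number of completed updates to disk."""
    os.makedirs(path, exist_ok=True)
    state = {"iteration": iteration, "theta0": theta0, "theta1": theta1}
    with open(os.path.join(path, f"checkpoint_{iteration}.pkl"), "wb") as f:
        pickle.dump(state, f)


def latest_checkpoint(path):
    """Hypothetical helper: path of the highest-numbered checkpoint, or None."""
    prefix, suffix = "checkpoint_", ".pkl"
    if not os.path.isdir(path):
        return None
    files = [f for f in os.listdir(path) if f.startswith(prefix) and f.endswith(suffix)]
    if not files:
        return None
    newest = max(files, key=lambda name: int(name[len(prefix):-len(suffix)]))
    return os.path.join(path, newest)


def resumable_gradient_descent(X, y, path, learning_rate=0.01,
                               iterations=1000, checkpoint_frequency=100):
    """Gradient descent that continues from the newest checkpoint in `path`."""
    start, theta0, theta1 = 0, 0.0, 0.0
    checkpoint_file = latest_checkpoint(path)
    if checkpoint_file is not None:
        with open(checkpoint_file, "rb") as f:
            state = pickle.load(f)
        start, theta0, theta1 = state["iteration"], state["theta0"], state["theta1"]
    for i in range(start, iterations):
        grad0, grad1 = compute_gradient(X, y, theta0, theta1)
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
        if (i + 1) % checkpoint_frequency == 0:
            save_checkpoint(i + 1, theta0, theta1, path)
    return theta0, theta1


# Assumed synthetic data for the demonstration.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=50)
y = 5 + 2.5 * X + rng.normal(0, 2, size=50)

with tempfile.TemporaryDirectory() as interrupted_dir, \
        tempfile.TemporaryDirectory() as full_dir:
    # First run stops after 500 updates, as if the process had been interrupted.
    resumable_gradient_descent(X, y, interrupted_dir, iterations=500)
    # Second run picks up checkpoint_500.pkl and continues to 1000 updates.
    resumed = resumable_gradient_descent(X, y, interrupted_dir, iterations=1000)
    # Reference run with no interruption.
    uninterrupted = resumable_gradient_descent(X, y, full_dir, iterations=1000)

print(resumed)
print(uninterrupted)
```

Because the checkpoint records exactly how many updates were applied, the resumed run performs the same sequence of updates as the uninterrupted one, so the two parameter pairs should agree.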