In [None]:
from google.colab import drive
drive.mount('/content/gdrive')


Mounted at /content/gdrive


# Table of Contents

## Introduction

## What is Linear Regression?
- Importance and Applications

## Mathematical Foundation
- The Linear Regression Equation
- Cost Function (Mean Squared Error - MSE)
- Optimization using Gradient Descent
- Analytical Solution: Normal Equation

## Dataset Preparation
- Selecting a Dataset (e.g., Synthetic Data or Real-world Data)
- Data Preprocessing (Handling Missing Values, Normalization, etc.)
- Splitting Data into Training and Testing Sets

## Implementation from Scratch
- Implementing Linear Regression using NumPy
- Computing Cost Function
- Implementing Gradient Descent
- Making Predictions

## Implementation using Scikit-Learn
- Using `LinearRegression` from `sklearn`
- Training the Model
- Evaluating Model Performance

## Model Evaluation Metrics
- Mean Squared Error (MSE)
- Mean Absolute Error (MAE)
- R² Score

## Visualizing Results
- Plotting Regression Line on Training Data
- Residual Analysis

## Regularization Techniques (Optional)
- Ridge Regression (L2 Regularization)
- Lasso Regression (L1 Regularization)

## Comparison of Approaches
- Gradient Descent vs. Normal Equation
- When to Use Each Method

## Conclusion & Future Scope
- Summary of Key Learnings
- Possible Enhancements (Polynomial Regression, Multiple Linear Regression)


## What is Linear Regression?

Linear Regression is a fundamental statistical and machine learning technique used to model the relationship between a dependent variable and one or more independent variables. It assumes a linear relationship between input features and the target variable. The goal is to find the best-fitting straight line (or hyperplane, when there are several features), typically by minimizing the sum of squared differences between the actual and predicted values.

### Importance and Applications

- **Predictive Modeling**: Used for forecasting and trend analysis, such as predicting house prices or sales revenue.
- **Interpretability**: Provides a clear understanding of the impact of each feature on the target variable.
- **Foundation for Advanced Models**: Serves as a building block for more complex regression techniques and machine learning algorithms.
- **Ease of Implementation**: Computationally efficient and easy to implement using various tools like NumPy and Scikit-Learn.


## Mathematical Foundation

Linear Regression is based on a strong mathematical foundation that defines how a model learns and makes predictions. This section covers the key mathematical components.

### The Linear Regression Equation

The fundamental equation for simple linear regression is:

$$
y = mx + b
$$

For multiple linear regression:

$$
y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n
$$

where:
- $y$ is the predicted output.
- $x_i$ are the input features ($i$th feature of the dataset).
- $θ_i$ are the model parameters (weights associated with each feature).
- $θ_0$ is the bias term (intercept).
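
As a small illustration of the multiple regression equation, the sketch below evaluates $\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2$ with NumPy. The parameter values and the two sample rows are made up for the example and are not taken from the dataset used later.

```python
import numpy as np

# Hypothetical parameters: theta[0] is the intercept, theta[1:] are the feature weights.
theta = np.array([1.0, 2.0, -0.5])

# Two samples with two features each (illustrative values only).
X = np.array([[3.0, 4.0],
              [0.0, 2.0]])

# y_hat = theta_0 + theta_1 * x_1 + theta_2 * x_2, evaluated for every row of X.
y_hat = theta[0] + X @ theta[1:]
print(y_hat)  # [5. 0.]
```

The matrix product `X @ theta[1:]` computes the weighted sum of features for all rows at once, which is the same vectorized form the class implementation uses further down.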

### Cost Function (Mean Squared Error - MSE)

To measure how well our model fits the data, we use the **Mean Squared Error (MSE)**:

$$
MSE = \frac{1}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2
$$

where:
- $m$ is the number of training examples (not to be confused with the slope $m$ in $y = mx + b$).
- $y_i$ is the actual value of the $i$th sample.
- $\hat{y}_i$ is the predicted value of the $i$th sample.

The goal is to minimize this cost function to improve model accuracy.
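
The MSE formula can be checked by hand on a tiny example. The helper name `mean_squared_error` and the three sample values below are assumptions chosen for illustration.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]

# Squared errors are 1, 0 and 4, so the mean is 5 / 3.
mse = mean_squared_error(y_true, y_pred)
print(mse)
```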

### Optimization using Gradient Descent

Gradient Descent is an optimization algorithm used to minimize the cost function by updating model parameters iteratively:

$$
\theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j}
$$

where:
- $\alpha$ is the learning rate (step size in parameter updates).
- $J(\theta)$ is the cost function.

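With a suitably small learning rate, Gradient Descent converges towards the optimal values of $\theta$. Because the MSE cost of linear regression is convex, there are no local minima to get trapped in.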
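
How much the learning rate matters can be seen on a one-parameter toy problem. The sketch below applies the update rule to the assumed cost $J(\theta) = (\theta - 3)^2$, which is not the regression cost but has the same bowl shape. The function name and the two values of `alpha` are illustrative.

```python
def gradient_descent_1d(alpha, steps=50, theta=0.0):
    """Minimize J(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3)."""
    for _ in range(steps):
        grad = 2.0 * (theta - 3.0)
        theta = theta - alpha * grad  # theta := theta - alpha * dJ/dtheta
    return theta

theta_small = gradient_descent_1d(alpha=0.1)  # moves steadily towards the minimizer 3
theta_large = gradient_descent_1d(alpha=1.1)  # overshoots further on every step

print(theta_small, theta_large)
```

With `alpha=0.1` the distance to the minimum shrinks by a factor of 0.8 per step. With `alpha=1.1` it grows by a factor of 1.2 per step, so the iterates diverge.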

### Analytical Solution: Normal Equation

Instead of using Gradient Descent, we can directly compute the optimal parameters using the **Normal Equation**:

$$
\theta = (X^T X)^{-1} X^T y
$$

where:
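- $X$ is the matrix of input features, with a leading column of ones so that $\theta_0$ acts as the intercept.
- $y$ is the output vector.

This method needs no learning rate and no iterations, and it works well when the number of features is small. It becomes slow when the number of features $n$ is large, because inverting the $n \times n$ matrix $X^T X$ costs roughly $O(n^3)$, and the inverse does not exist when $X^T X$ is singular (for example, with perfectly collinear features).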
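
A minimal sketch of the Normal Equation in NumPy follows, assuming noise-free synthetic data with intercept 4.0 and slope 2.5 (values chosen only for the example). It solves the linear system $(X^T X)\theta = X^T y$ with `np.linalg.solve` instead of forming the inverse explicitly, which is usually the more stable choice.

```python
import numpy as np

# Synthetic, noise-free data: y = 4.0 + 2.5 * x (illustrative values).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 4.0 + 2.5 * x

# Design matrix with a leading column of ones for the intercept theta_0.
X = np.column_stack([np.ones_like(x), x])

# Solve (X^T X) theta = X^T y rather than computing the inverse of X^T X.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # approximately [4.0, 2.5]
```

For ill-conditioned or rank-deficient problems, `np.linalg.lstsq(X, y, rcond=None)` is a common alternative.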


### Import Necessary Libraries


In [None]:
import pandas as pd
import numpy as np

### Load Dataset

In [None]:
data = pd.read_csv('gdrive/My Drive/Datasets/data_for_LR.csv')
data

Unnamed: 0,x,y
0,24.0,21.549452
1,50.0,47.464463
2,15.0,17.218656
3,38.0,36.586398
4,87.0,87.288984
...,...,...
695,58.0,58.595006
696,93.0,
697,82.0,88.603770
698,66.0,63.648685


In [None]:
data.dropna(inplace=True)
data

Unnamed: 0,x,y
0,24.0,21.549452
1,50.0,47.464463
2,15.0,17.218656
3,38.0,36.586398
4,87.0,87.288984
...,...,...
694,81.0,81.455447
695,58.0,58.595006
697,82.0,88.603770
698,66.0,63.648685


In [None]:
# Features: every column except the last. Target: the last column.
X = np.array(data.iloc[:,:-1])
Y = np.array(data.iloc[:,-1])

In [None]:
def splitData(X, Y, testPercentage=0.2):
    """Randomly split X and Y into training and test sets."""
    trainPercentage = 1 - testPercentage
    nTrain = round(trainPercentage * len(X))

    # Shuffle the row indices, then use the first nTrain rows for training
    # and the remaining rows for testing.
    shuffledIndices = np.arange(0, len(X))
    np.random.shuffle(shuffledIndices)
    trainIndices = shuffledIndices[:nTrain]
    testIndices = shuffledIndices[nTrain:]

    Xtrain, Ytrain = X[trainIndices], Y[trainIndices]
    Xtest, Ytest = X[testIndices], Y[testIndices]

    return Xtrain, Ytrain, Xtest, Ytest


In [None]:
# Split the dataset into training and test sets
Xtrain,Ytrain,Xtest,Ytest = splitData(X,Y)
print (" Training Data Set Dimensions=", Xtrain.shape, "Training True Class labels dimensions", Ytrain.shape)
print (" Test Data Set Dimensions=", Xtest.shape, "Test True Class labels dimensions", Ytest.shape)


 Training Data Set Dimensions= (551, 1) Training True Class labels dimensions (551,)
 Test Data Set Dimensions= (138, 1) Test True Class labels dimensions (138,)
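
Because the shuffle is unseeded, the split differs on every run. The sketch below is one possible way to make it reproducible with a seeded NumPy generator. The function name `split_data_seeded`, the `seed` parameter, and the ten-row demo arrays are assumptions for illustration and are not part of the notebook's own code.

```python
import numpy as np

def split_data_seeded(X, Y, test_fraction=0.2, seed=0):
    """Shuffle row indices with a seeded generator, then split into train and test."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    n_train = round((1 - test_fraction) * len(X))
    train_idx, test_idx = indices[:n_train], indices[n_train:]
    return X[train_idx], Y[train_idx], X[test_idx], Y[test_idx]

# Small synthetic example: 10 rows, one feature, target = 2 * feature.
X_demo = np.arange(10, dtype=float).reshape(-1, 1)
Y_demo = 2.0 * X_demo.ravel()

Xtr, Ytr, Xte, Yte = split_data_seeded(X_demo, Y_demo)
print(Xtr.shape, Xte.shape)  # (8, 1) (2, 1)
```

Calling the function again with the same `seed` returns the same rows, which makes later loss values comparable between runs.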


# **Gradient Descent for Linear Regression**

A **Linear Regression** model can be trained using **Gradient Descent**, an optimization algorithm that iteratively updates the model's parameters to reduce the **Mean Squared Error (MSE)**. For a single feature the prediction is $\hat{y}_i = \theta_1 + \theta_2 x_i$, so the goal is to update the intercept $\theta_1$ and the slope $\theta_2$ to minimize the cost function and obtain the best-fit line.

The idea is:
1. Start with initial values of $\theta_1$ and $\theta_2$ (random values or zeros).
2. Iteratively update these values to reach the minimum cost.

## **Gradient and Derivatives**
A **gradient** is the vector of partial derivatives of a function. Each component tells us how the output changes with a small change in the corresponding input.

### **Computing Partial Derivatives of Cost Function**
The cost function is:

$$
J(\theta_1, \theta_2) = \frac{1}{n} \sum_{i=1}^{n} ( \hat{y}_i - y_i )^2
$$

#### **Derivative w.r.t. $\theta_1$**

$$
J'_{\theta_1} = \frac{\partial J(\theta_1, \theta_2)}{\partial \theta_1}
$$

$$
= \frac{\partial}{\partial \theta_1} \left[ \frac{1}{n} \sum_{i=1}^{n} ( \hat{y}_i - y_i )^2 \right]
$$

$$
= \frac{1}{n} \sum_{i=1}^{n} 2( \hat{y}_i - y_i ) \frac{\partial}{\partial \theta_1} ( \theta_1 + \theta_2 x_i - y_i )
$$

$$
= \frac{1}{n} \sum_{i=1}^{n} 2( \hat{y}_i - y_i ) (1)
$$

$$
= \frac{2}{n} \sum_{i=1}^{n} ( \hat{y}_i - y_i )
$$

#### **Derivative w.r.t. $\theta_2$**

$$
J'_{\theta_2} = \frac{\partial J(\theta_1, \theta_2)}{\partial \theta_2}
$$

$$
= \frac{\partial}{\partial \theta_2} \left[ \frac{1}{n} \sum_{i=1}^{n} ( \hat{y}_i - y_i )^2 \right]
$$

$$
= \frac{1}{n} \sum_{i=1}^{n} 2( \hat{y}_i - y_i ) \frac{\partial}{\partial \theta_2} ( \theta_1 + \theta_2 x_i - y_i )
$$

$$
= \frac{1}{n} \sum_{i=1}^{n} 2( \hat{y}_i - y_i ) x_i
$$

$$
= \frac{2}{n} \sum_{i=1}^{n} ( \hat{y}_i - y_i ) \cdot x_i
$$
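
The two derivatives can be sanity-checked numerically. The sketch below compares the analytic expressions with central finite differences of the cost. The data points, the parameter values and the helper names are assumptions for illustration.

```python
import numpy as np

def cost(theta1, theta2, x, y):
    """J(theta1, theta2) = (1/n) * sum((y_hat - y)^2) with y_hat = theta1 + theta2 * x."""
    y_hat = theta1 + theta2 * x
    return np.mean((y_hat - y) ** 2)

def analytic_gradients(theta1, theta2, x, y):
    """Partial derivatives of J from the derivation: (2/n)*sum(r) and (2/n)*sum(r * x)."""
    residual = (theta1 + theta2 * x) - y
    return 2.0 * np.mean(residual), 2.0 * np.mean(residual * x)

# Illustrative data and parameter values.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 5.9, 8.2])
theta1, theta2 = 0.5, 1.5
eps = 1e-6

# Central finite differences approximate each partial derivative.
num_d1 = (cost(theta1 + eps, theta2, x, y) - cost(theta1 - eps, theta2, x, y)) / (2 * eps)
num_d2 = (cost(theta1, theta2 + eps, x, y) - cost(theta1, theta2 - eps, x, y)) / (2 * eps)

d1, d2 = analytic_gradients(theta1, theta2, x, y)
print(d1, num_d1)
print(d2, num_d2)
```

If the analytic and numerical values agree to several decimal places, the derivation and its implementation are consistent with each other.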

---

## **Updating Parameters Using Gradient Descent**
The objective of **Linear Regression** is to find the best coefficients. We update parameters by taking steps in the **negative direction** of the gradient:

$$
\theta_1 = \theta_1 - \alpha \cdot J'_{\theta_1}
$$

$$
= \theta_1 - \alpha \cdot \frac{2}{n} \sum_{i=1}^{n} ( \hat{y}_i - y_i )
$$

$$
\theta_2 = \theta_2 - \alpha \cdot J'_{\theta_2}
$$

$$
= \theta_2 - \alpha \cdot \frac{2}{n} \sum_{i=1}^{n} ( \hat{y}_i - y_i ) \cdot x_i
$$

where:
- $\alpha$ is the **learning rate** that determines the step size.
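
The update rules translate almost line by line into a loop. This is a minimal sketch for a single feature, assuming noise-free synthetic data with intercept 1.0 and slope 3.0. The function name `fit_line` and the chosen `alpha` and `epochs` are illustrative and not tuned for the notebook's dataset.

```python
import numpy as np

def fit_line(x, y, alpha=0.05, epochs=2000):
    """Fit y_hat = theta1 + theta2 * x by gradient descent on the MSE."""
    theta1, theta2 = 0.0, 0.0
    for _ in range(epochs):
        residual = (theta1 + theta2 * x) - y   # y_hat - y
        grad1 = 2.0 * np.mean(residual)        # dJ/dtheta1
        grad2 = 2.0 * np.mean(residual * x)    # dJ/dtheta2
        # Both gradients are computed before either parameter changes,
        # so the two updates are simultaneous.
        theta1 -= alpha * grad1
        theta2 -= alpha * grad2
    return theta1, theta2

# Synthetic, noise-free data (illustrative values).
x = np.linspace(-1.0, 1.0, 50)
y = 1.0 + 3.0 * x

theta1, theta2 = fit_line(x, y)
print(theta1, theta2)  # close to 1.0 and 3.0
```

The inputs here are already centered and on a small scale. Features on a larger scale generally need standardization or a smaller learning rate for the same loop to converge.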

---

### **Conclusion**
- **Gradient Descent** helps find the optimal values of $\theta_1$ and $\theta_2$.
- We move in the **negative gradient direction** to minimize the cost function.
- The choice of **learning rate** $\alpha$ affects convergence speed, and a value that is too large can prevent convergence altogether.

This is the core idea of **Gradient Descent in Linear Regression**! 🚀


In [None]:


class LinearRegression:
    """
    A simple implementation of Linear Regression using Gradient Descent with optional Regularization.

    Attributes:
        epochs (int): Number of iterations for gradient descent.
        lr (float): Learning rate.
        lambda_ (float): Regularization strength.
        loss (str): Type of loss function ("ridge" for L2, "lasso" for L1, or None for normal MSE).
        weights (np.ndarray): Model coefficients.
        bias (float): Model bias term.
    """

    def __init__(self, epochs=500, lr=0.01, lambda_=0.1, loss=None):
        """
        Initializes the Linear Regression model.

        Args:
            epochs (int): Number of training iterations.
            lr (float): Learning rate.
            lambda_ (float): Regularization strength.
            loss (str): "ridge" for Ridge Regression, "lasso" for Lasso Regression, or None for normal MSE.
        """
        self.epochs = epochs
        self.lr = lr
        self.lambda_ = lambda_
        self.loss = loss
        self.weights = None
        self.bias = None
        self.nexamples = None

    def hypothesis(self):
        """
        Computes predictions using the current weights and bias.

        Returns:
            np.ndarray: Predicted values.
        """
        return np.dot(self.X, self.weights) + self.bias

    def loss_function(self):
        """
        Computes Mean Squared Error (MSE) loss with optional regularization.

        Returns:
            float: The computed loss value.
        """
        y_pred = self.hypothesis().reshape(self.nexamples,1)
        loss = np.mean((self.Y - y_pred) ** 2)

        if self.loss == "ridge":
            return loss + self.Ridge_Regression()
        elif self.loss == "lasso":
            return loss + self.Lasso_Regression()
        return loss  # No regularization

    def gradient_descent(self):
        """
        Computes the gradients for weights and bias.

        Returns:
            tuple: Gradients for weights and bias.
        """
        y_pred = self.hypothesis().reshape(self.nexamples,1)
        error = (self.Y - y_pred)
        print("error shape : ",error.shape)
        print("After error * self.X  Shape : ",(self.X).shape)
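        # Gradient of the MSE with respect to the weights: -(2/n) * sum(error * x).
        new_thetas = -2 / self.nexamples * np.sum(error * self.X, axis=0)
        # Note: the derivation gives -(2/n) * sum(error) for the bias. The factor
        # of 2 is omitted here, which halves the effective learning rate for the bias.
        new_bias = -np.mean(error)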

        return new_thetas, new_bias

    def Ridge_Regression(self):
        """
        Computes L2 (Ridge) Regularization loss.

        Returns:
            float: L2 regularization loss.
        """
        return self.lambda_ * np.sum(self.weights ** 2)

    def Lasso_Regression(self):
        """
        Computes L1 (Lasso) Regularization loss.

        Returns:
            float: L1 regularization loss.
        """
        return self.lambda_ * np.sum(abs(self.weights))

    def train(self, X, Y):
        """
        Trains the model using Gradient Descent.

        Args:
            X (np.ndarray): Training features (num_samples, num_features).
            Y (np.ndarray): Target values (num_samples,).
        """
        # Standardize the features and keep the statistics so that predict()
        # can apply the same scaling to new data.
        self.mean_ = np.mean(X, axis=0)
        self.std_ = np.std(X, axis=0)
        X = (X - self.mean_) / self.std_
        self.nexamples, nfeatures = X.shape
        self.weights = np.zeros(nfeatures)
        self.bias = 0
        self.X = X
        self.Y = Y.reshape(self.nexamples,1)
        print(self.X.shape ,"     ",self.Y.shape)
        prev_loss = float('inf')
        tol = 1e-3
        self.losses = []  # loss per epoch, usable with plot_learning_curve()
        for epoch in range(self.epochs):
            new_thetas, new_bias = self.gradient_descent()

            # Update weights based on regularization type
            if self.loss == "ridge":
                self.weights -= self.lr * (new_thetas + (2 * self.lambda_ * self.weights))  # L2 Regularization
            elif self.loss == "lasso":
                self.weights -= self.lr * (new_thetas + (self.lambda_ * np.sign(self.weights)))  # L1 Regularization
            else:
                self.weights -= self.lr * new_thetas  # No Regularization

            # Bias is updated normally (no regularization)
            self.bias -= self.lr * new_bias

            loss = self.loss_function()
            self.losses.append(loss)
            print("Loss : ",loss)
            # Early stopping (currently disabled)
            # if abs(prev_loss - loss) < tol:
            #     print(f"Early stopping at epoch {epoch+1}")
            #     break

            # prev_loss = loss

    def normalize_features(self, X):
        """
        Standardizes the features using mean and standard deviation.
        Args:
            X (np.ndarray): Input features.
        Returns:
            np.ndarray: Normalized features.
        """
        return (X - np.mean(X, axis=0)) / np.std(X, axis=0)

    def r2_score(self, Y_true, Y_pred):
        """
        Computes the R-squared (coefficient of determination) score.
        Args:
            Y_true (np.ndarray): Actual values.
            Y_pred (np.ndarray): Predicted values.
        Returns:
            float: R² score.
        """
        ss_total = np.sum((Y_true - np.mean(Y_true)) ** 2)
        ss_residual = np.sum((Y_true - Y_pred) ** 2)
        return 1 - (ss_residual / ss_total)

    def plot_learning_curve(self, losses):
        """
        Plots the training loss over epochs.
        Args:
            losses (list): Loss values recorded over iterations.
        """
        import matplotlib.pyplot as plt
        plt.plot(range(len(losses)), losses, label="Training Loss")
        plt.xlabel("Epochs")
        plt.ylabel("Loss")
        plt.title("Learning Curve")
        plt.legend()
        plt.show()

    def get_params(self):
        """
        Returns the model parameters (weights & bias).
        """
        return {"weights": self.weights, "bias": self.bias}


    def predict(self, X):
        """
        Predicts output values for given input features.

        Args:
            X (np.ndarray): Raw (unscaled) input features for prediction.

        Returns:
            np.ndarray: Predicted values.
        """
        # train() fits the weights on standardized features, so new inputs
        # must be scaled with the training mean and standard deviation.
        X = (X - self.mean_) / self.std_
        return np.dot(X, self.weights) + self.bias


In [None]:
LR = LinearRegression(epochs = 500,lr=0.01)
LR.train(Xtrain,Ytrain)
LR.loss_function()
LR.get_params()

(551, 1)       (551, 1)
error shape :  (551, 1)
After error * self.X  Shape :  (551, 1)
Loss :  3213.5913968620152
error shape :  (551, 1)
After error * self.X  Shape :  (551, 1)
Loss :  3133.9424853878113
error shape :  (551, 1)
After error * self.X  Shape :  (551, 1)
Loss :  3056.5064198344353
error shape :  (551, 1)
After error * self.X  Shape :  (551, 1)
Loss :  2981.2143023977937
error shape :  (551, 1)
After error * self.X  Shape :  (551, 1)
Loss :  2907.999590882041
error shape :  (551, 1)
After error * self.X  Shape :  (551, 1)
Loss :  2836.798012835108
error shape :  (551, 1)
After error * self.X  Shape :  (551, 1)
Loss :  2767.5474829368636
error shape :  (551, 1)
After error * self.X  Shape :  (551, 1)
Loss :  2700.188023514024
error shape :  (551, 1)
After error * self.X  Shape :  (551, 1)
Loss :  2634.661688060885
error shape :  (551, 1)
After error * self.X  Shape :  (551, 1)
Loss :  2570.912487649652
error shape :  (551, 1)
After error * self.X  Shape :  (551, 1)
Loss : 

{'weights': array([28.94658446]), 'bias': 49.16940274610614}
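
The weight and bias reported above refer to the standardized feature, not to the raw `x` column, because `train` rescales its input. The sketch below shows one way to map such parameters back to the original scale. It uses noise-free synthetic data and a least-squares fit as a stand-in for the trained model, and all names in it are assumptions for illustration. If $x_{std} = (x - \mu) / \sigma$, then $w \cdot x_{std} + b = (w / \sigma)\, x + (b - w \mu / \sigma)$.

```python
import numpy as np

# Synthetic, noise-free data: y = 2.0 + 0.9 * x (illustrative values).
rng = np.random.default_rng(0)
x = rng.uniform(0, 100, size=200)
y = 2.0 + 0.9 * x

# Standardize the feature, as the training step does.
mu, sigma = x.mean(), x.std()
x_std = (x - mu) / sigma

# Fit weight w and bias b on the standardized feature (least squares stand-in).
A = np.column_stack([x_std, np.ones_like(x_std)])
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)

# Convert back to the scale of the raw feature.
slope = w / sigma
intercept = b - w * mu / sigma
print(slope, intercept)  # approximately 0.9 and 2.0
```

On standardized data the bias equals the mean of the targets at the optimum, which is why a bias near the middle of the `y` range is expected rather than a value near zero.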