<h1 style="text-align: center;">Implementing</h1>
<h2 style="text-align: center;">A Simple Linear Regression Model</h2>
<h3 style="text-align: center;">with an Object-Oriented Programming Class</h3>

I implement simple linear regression from scratch, without using the scikit-learn (sklearn) library. I first show the whole code, then explain each part line by line, and finally answer one question about the code.

## Contents:

Import the NumPy library.
1. Class Definition
2. __init__ Method (Constructor)
3. Fit Method
4. Predict Method
5. Rsquared Method
6. Example Usage
7. Summary
8. Question
9. Update the fit method
10. Update the predict method

In [8]:
import numpy as np

class LinearRegression:
    def __init__(self):
        # Initialize the coefficients and intercept, 
        # these will be calculated after fitting the model
        self.intercept = None
        self.coefficient = None

    def fit(self, X, y):
        # Add a column of ones for the intercept (bias term)
        ones = np.ones((len(X),1))
        X = np.concatenate((ones, X), axis=1)

        # Calculate the Normal Equation (no need to memorize):
        # (X^T * X)^−1 * X^T * y
        XT = X.T # Transpose X
        XTX = XT.dot(X) # X^T * X
        XTX_inv = np.linalg.inv(XTX) # Inverse of (X^T * X)
        XTy = XT.dot(y) # X^T * y
        self.coefficient = XTX_inv.dot(XTy) # Calculate the coefficients
        
    def predict(self, X):
        # Add a column of ones to match the structure used in fitting the model
        ones = np.ones((len(X),1))
        X = np.concatenate((ones, X), axis = 1)

        # Use the calculated coefficients to make predictions
        return X.dot(self.coefficient)
        
    def Rsquared(self, X, y):
        # R^2 = 1 - (SS_residual / SS_total)
        # Calculate R-squared to evaluate model performance
        ypred = self.predict(X)
        # SS_total = sum((y - mean(y))**2)
        ss_total = np.sum((y - np.mean(y))**2) # Total sum of squares
        # SS_residual = sum((y - ypred)**2)
        ss_residual = np.sum((y - ypred)**2) # Residual sum of squares

        return 1 - (ss_residual/ss_total) # R-squared formula


# Sample data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1, 2, 3, 4, 5])

# Initialize and fit the model
model = LinearRegression()
model.fit(X, y)

# Make predictions and calculate R-squared
ypred = model.predict(X)
print(ypred)  # Predicted values

print(model.Rsquared(X, y))  # R-squared value

[1. 2. 3. 4. 5.]
1.0


### Import numpy library for array manipulation

In [9]:
import numpy as np

The class above implements a basic linear regression model from scratch in Python using NumPy. Each part of the class and its methods is explained in detail below.


## 1. Class Definition:

In [None]:
class LinearRegression:

This line defines a new class called LinearRegression. This class will have methods for fitting a linear regression model, making predictions, and calculating the R-squared value.

## 2. __init__ Method (Constructor):

In [None]:
def __init__(self):
    # Initialize the coefficients and intercept, 
    # these will be calculated after fitting the model
    self.intercept = None
    self.coefficient = None

This is the constructor method, which is called when an instance of the class is created. It initializes two instance variables:

self.intercept: Intended to store the intercept (or bias term) after the model is trained.

self.coefficient: Stores the model parameters (weights) after the model is trained.

Both are initially set to None because they are calculated only when the model is fitted.

## 3. fit Method:

In [None]:
def fit(self, X, y):
    # Add a column of ones for the intercept (bias term)
    ones = np.ones((len(X),1))
    X = np.concatenate((ones, X), axis=1)

    # Calculate the Normal Equation (no need to memorize):
    # (X^T * X)^−1 * X^T * y
    XT = X.T # Transpose X
    XTX = XT.dot(X) # X^T * X
    XTX_inv = np.linalg.inv(XTX) # Inverse of (X^T * X)
    XTy = XT.dot(y) # X^T * y
    self.coefficient = XTX_inv.dot(XTy) # Calculate the coefficients

The fit method calculates the coefficients of the linear regression model using the Normal Equation:

Steps:

Adding Ones for Intercept:

A column of ones is added to the input matrix X to account for the intercept (bias term) in the linear regression equation.

np.ones((len(X), 1)) creates a column vector of ones with the same number of rows as X.

The ones are concatenated to X using np.concatenate().

Normal Equation:

The coefficients are calculated using the Normal Equation:

$$\theta = (X^T X)^{-1} X^T y$$

XT = X.T: This is the transpose of matrix X.

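XTX = XT.dot(X): This computes $X^T X$.

XTX_inv = np.linalg.inv(XTX): This calculates the inverse of $X^T X$.

XTy = XT.dot(y): This computes $X^T y$.

Finally, the coefficients are computed as XTX_inv.dot(XTy).

The result is stored in self.coefficient. Because the column of ones comes first in X, the first entry of this array is the intercept and the remaining entries are the feature weights.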
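
As a minimal sketch of the steps above, the snippet below applies the Normal Equation to a small toy dataset. The data (generated from y = 3 + 2x) and the variable names are my own assumptions for illustration and are not part of the class. It also solves the same system with np.linalg.solve and np.linalg.lstsq, which avoid forming the inverse explicitly; these are alternatives, not what the class does.

```python
import numpy as np

# Toy data generated from y = 3 + 2x (no noise), chosen for illustration
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

# Design matrix: a leading column of ones, then the feature column
X_design = np.concatenate((np.ones((len(X), 1)), X), axis=1)

# Normal Equation as in fit: theta = (X^T X)^-1 X^T y
XTX = X_design.T.dot(X_design)
XTy = X_design.T.dot(y)
theta_inv = np.linalg.inv(XTX).dot(XTy)

# Same linear system, solved without computing the inverse
theta_solve = np.linalg.solve(XTX, XTy)

# Least-squares solver applied directly to the design matrix
theta_lstsq, *_ = np.linalg.lstsq(X_design, y, rcond=None)

print(theta_inv)  # approximately [3. 2.]: intercept first, then slope
```

All three approaches should agree on this data. np.linalg.inv can raise a LinAlgError when $X^T X$ is singular (for example, when two feature columns are identical), which is one reason a least-squares solver may be preferable in practice.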

## 4. predict Method:

In [None]:
def predict(self, X):
    # Add a column of ones to match the structure used in fitting the model
    ones = np.ones((len(X),1))
    X = np.concatenate((ones, X), axis = 1)

    # Use the calculated coefficients to make predictions
    return X.dot(self.coefficient)

The predict method takes in new input data X and makes predictions using the coefficients calculated by the fit method.

Steps:
Adding Ones for Intercept:

Just like in the fit method, it adds a column of ones to the input X to account for the intercept term.

Prediction:

It multiplies X with the calculated coefficients using matrix multiplication (X.dot(self.coefficient)).

This gives the predicted values (outputs) for the input X.
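
A tiny sketch of this multiplication, assuming hypothetical fitted values (intercept 3, slope 2) rather than a real fit, shows that the matrix product is the same as computing intercept + slope * x for each row:

```python
import numpy as np

# Hypothetical fitted parameters: intercept 3 first, then slope 2
coefficient = np.array([3.0, 2.0])

# New inputs, with the column of ones added as in predict
X_new = np.array([[10.0], [20.0]])
X_design = np.concatenate((np.ones((len(X_new), 1)), X_new), axis=1)

# Matrix form used by predict
pred_matrix = X_design.dot(coefficient)

# The same calculation written out by hand
pred_manual = coefficient[0] + coefficient[1] * X_new[:, 0]

print(pred_matrix)  # [23. 43.]
```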

## 5. Rsquared Method:

In [None]:
def Rsquared(self, X, y):
    # R^2 = 1 - (SS_residual / SS_total)
    # Calculate R-squared to evaluate model performance
    ypred = self.predict(X)
    # SS_total = sum((y - mean(y))**2)
    ss_total = np.sum((y - np.mean(y))**2) # Total sum of squares
    # SS_residual = sum((y - ypred)**2)
    ss_residual = np.sum((y - ypred)**2) # Residual sum of squares

    return 1 - (ss_residual/ss_total) # R-squared formula

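The Rsquared method calculates the R-squared value, which measures how well the model fits the data. A value of 1 means a perfect fit, and a value of 0 means the model does no better than always predicting the mean of y. On the data used for fitting, the value lies between 0 and 1; on other data it can be negative if the predictions are worse than the mean.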

Steps:

Prediction:

It first makes predictions using the predict method and stores the predicted values in ypred.

Total Sum of Squares (ss_total):

This measures the total variance in the dependent variable y.

It's calculated as the sum of squared differences between y and the mean of y:
$$SS_{total} = \sum (y - \operatorname{mean}(y))^2$$

Residual Sum of Squares (ss_residual):

This measures the error (residuals) between the true values and the predicted values.

It's calculated as the sum of squared differences between y and ypred:
$$SS_{residual} = \sum (y - \hat{y})^2$$

R-squared Formula:

The R-squared value is computed as:
$$R^2 = 1 - \frac{SS_{residual}}{SS_{total}}$$

It returns this value as the output.
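
The formula can be checked by hand on a few numbers. The sketch below uses a standalone helper, r_squared (a name I made up for illustration, not part of the class), with three sets of hypothetical predictions: a perfect fit, always predicting the mean, and predictions that are worse than the mean.

```python
import numpy as np

def r_squared(y, ypred):
    # Same formula as the Rsquared method, written as a plain function
    ss_total = np.sum((y - np.mean(y)) ** 2)
    ss_residual = np.sum((y - ypred) ** 2)
    return 1 - ss_residual / ss_total

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # ss_total = 4 + 1 + 0 + 1 + 4 = 10

perfect = r_squared(y, y)                             # residuals are all zero
mean_only = r_squared(y, np.full_like(y, y.mean()))   # ss_residual equals ss_total
bad = r_squared(y, y[::-1])                           # ss_residual = 40

print(perfect)    # 1.0
print(mean_only)  # 0.0
print(bad)        # -3.0
```

The last case shows that R-squared is not bounded below by 0 when the predictions do not come from a least-squares fit on the same data.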

## 6. Example Usage:

The bottom part of the code demonstrates how to use the class:

In [None]:
import numpy as np

# Sample data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1, 2, 3, 4, 5])

# Initialize and fit the model
model = LinearRegression()
model.fit(X, y)

# Make predictions and calculate R-squared
ypred = model.predict(X)
print(ypred)  # Predicted values

print(model.Rsquared(X, y))  # R-squared value

Data: The input X is a set of feature values, and y is the target variable.

Model Initialization: A LinearRegression model is instantiated.

Model Fitting: The model is trained using the fit method with the provided X and y.

Predictions: The predict method is used to generate predictions for X.

R-squared: Finally, the Rsquared method calculates how well the model fits the data.

## 7. Summary:

__init__: Initializes intercept and coefficient to None.

fit: Fits the linear regression model by computing the coefficients using the Normal Equation.

predict: Predicts values using the calculated coefficients.

Rsquared: Calculates the R-squared value to measure the model's performance.

This is a simple implementation of linear regression, which manually calculates the coefficients using matrix algebra rather than relying on machine learning libraries like scikit-learn.

## 8. Question:

In the code, the intercept is initialized but never used. Why?

The intercept is initialized in the __init__ method:

In [6]:
self.intercept = None

However, the code does not explicitly calculate or store the intercept separately. Instead, it includes the intercept (bias term) in the coefficients array by adding a column of ones to the feature matrix X in both the fit and predict methods.

Why the intercept isn't used directly:

Intercept as Part of Coefficients:

In linear regression, the intercept (bias term) is just a constant that shifts the regression line. The code accounts for this by adding a column of ones to X before calculating the coefficients.

The first coefficient in the self.coefficient array represents the intercept.
    
No Need to Store Separately:

Since the intercept is implicitly included as the first value in the coefficients array, there's no need to store it separately as self.intercept.
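
To see this concretely, here is a small sketch (the data and variable names are assumptions chosen for illustration). It fits a line with the same Normal Equation and compares the result with np.polyfit, which returns the slope and intercept as separate values.

```python
import numpy as np

# Illustrative data generated from y = 2.5 + 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([4.5, 6.5, 8.5, 10.5])

# Design matrix with the column of ones first, as in fit
X_design = np.column_stack((np.ones(len(x)), x))
theta = np.linalg.inv(X_design.T.dot(X_design)).dot(X_design.T.dot(y))

# np.polyfit with deg=1 returns [slope, intercept] (highest power first)
slope, intercept = np.polyfit(x, y, deg=1)

print(theta[0], intercept)  # both approximately 2.5
print(theta[1], slope)      # both approximately 2.0
```

The first entry of theta matches the intercept, which is why the class does not need a separate self.intercept to make correct predictions.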

To make use of the intercept explicitly:

If you want to make the intercept more explicit, you can modify the code to store it separately:

## Modified Code:

## 9. Update the fit method:

In [None]:
def fit(self, X, y):
    # Add a column of ones for the intercept (bias term)
    ones = np.ones((len(X), 1))
    X = np.concatenate((ones, X), axis=1)

    # Calculate the Normal Equation
    XT = X.T  # Transpose of X
    XTX = XT.dot(X)  # X^T * X
    XTX_inv = np.linalg.inv(XTX)  # Inverse of (X^T * X)
    XTy = XT.dot(y)  # X^T * y
    self.coefficients = XTX_inv.dot(XTy)  # Calculate the coefficients
    
    # Store the intercept separately
    self.intercept = self.coefficients[0]  # First coefficient is the intercept
    self.coefficients = self.coefficients[1:]  # Remaining coefficients are for the features

## 10. Update the predict method:

In [None]:
def predict(self, X):
    # No need to add a column of ones, intercept is now separate
    return X.dot(self.coefficients) + self.intercept

## What changed:
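In fit: The first coefficient is now stored in self.intercept. The remaining coefficients for the features are stored in self.coefficients. Note that this version uses the attribute name self.coefficients, whereas __init__ initializes self.coefficient, so __init__ should be updated to use the same name.

In predict: The intercept is added separately when making predictions, so the column of ones is no longer needed.

Now the intercept is explicitly stored and used during prediction!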
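
Putting the two updates together, one possible version of the full class might look like the sketch below. It assumes __init__ initializes self.coefficients so that the attribute names match, and the two-feature data is made up for illustration.

```python
import numpy as np

class LinearRegression:
    def __init__(self):
        # Both are filled in by fit
        self.intercept = None
        self.coefficients = None

    def fit(self, X, y):
        # Add a column of ones so the first parameter is the intercept
        ones = np.ones((len(X), 1))
        X_design = np.concatenate((ones, X), axis=1)

        # Normal Equation: theta = (X^T X)^-1 X^T y
        theta = np.linalg.inv(X_design.T.dot(X_design)).dot(X_design.T.dot(y))

        # Split the parameters into intercept and feature weights
        self.intercept = theta[0]
        self.coefficients = theta[1:]

    def predict(self, X):
        # The intercept is added explicitly, so no column of ones is needed
        return X.dot(self.coefficients) + self.intercept

    def Rsquared(self, X, y):
        ypred = self.predict(X)
        ss_total = np.sum((y - np.mean(y)) ** 2)
        ss_residual = np.sum((y - ypred) ** 2)
        return 1 - ss_residual / ss_total

# Illustrative data generated from y = 1 + 2*x1 + 3*x2 (no noise)
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [1.0, 3.0]])
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]

model = LinearRegression()
model.fit(X, y)

print(model.intercept)       # approximately 1.0
print(model.coefficients)    # approximately [2. 3.]
print(model.Rsquared(X, y))  # approximately 1.0
```

The Rsquared method needs no change, because it only calls predict.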