<h1><i> CREATING A LINEAR REGRESSION MODEL FROM SCRATCH </i></h1>

<h3>What is Linear Regression?</h3>
<p>
  Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables. The goal is to find the best-fitting line that represents the relationship between these variables. Mathematically, linear regression aims to minimize the sum of squared differences between the observed data points and the predicted values on the line.
</p>

<p>
  Let's start with simple linear regression, which has one independent variable and one dependent variable. We'll denote the independent variable as "X" and the dependent variable as "Y." We want to find a linear equation of the form:
</p>

<p><b>
  Y = β₀ + β₁X
</b></p>

<p>
  where β₀ is the y-intercept (the value of Y when X is 0) and β₁ is the slope of the line.
</p>

<p>
  To find the best-fitting line, we need to estimate the values of β₀ and β₁ that minimize the sum of squared differences. This process is commonly known as the method of least squares. Here's how we can do it step by step:
</p>

<ol>
  <li>Calculate the means of X and Y, denoted as X̄ and Ȳ, respectively.</li>
  <li>Calculate the difference between each X value and X̄, denoted as <b>(Xᵢ - X̄)</b>.</li>
  <li>Calculate the difference between each Y value and Ȳ, denoted as <b>(Yᵢ - Ȳ)</b>.</li>
  <li>Calculate the sum of the products of the differences calculated in steps 2 and 3, denoted as <b>Σ((Xᵢ - X̄)(Yᵢ - Ȳ))</b>.</li>
  <li>Calculate the sum of the squared differences between each X value and X̄, denoted as <b>Σ((Xᵢ - X̄)²)</b>.</li>
  <li>Now, we can calculate the slope β₁ using the following formula:</li>
</ol>

<p><b>
  β₁ = Σ((Xᵢ - X̄)(Yᵢ - Ȳ)) / Σ((Xᵢ - X̄)²)
</b></p>

<p>
  With β₁ known, we can calculate the y-intercept β₀ using the formula:
</p>

<p><b>
  β₀ = Ȳ - β₁X̄
</b></p>

<p>
  Once we have obtained the values of β₀ and β₁, we can write the final equation for the best-fitting line:
</p>

<p><b>
  Y = β₀ + β₁X
</b></p>
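
<p>
  The two formulas above can be derived by setting the partial derivatives of the sum of squared differences to zero. The sketch below assumes n data points and uses S for the sum of squared differences; both symbols are introduced here for illustration and do not appear elsewhere in this notebook.
</p>

```latex
\begin{aligned}
S(\beta_0, \beta_1) &= \sum_{i=1}^{n} \left(Y_i - \beta_0 - \beta_1 X_i\right)^2 \\[4pt]
\frac{\partial S}{\partial \beta_0} &= -2 \sum_{i=1}^{n} \left(Y_i - \beta_0 - \beta_1 X_i\right) = 0
  \;\Longrightarrow\; \beta_0 = \bar{Y} - \beta_1 \bar{X} \\[4pt]
\frac{\partial S}{\partial \beta_1} &= -2 \sum_{i=1}^{n} X_i \left(Y_i - \beta_0 - \beta_1 X_i\right) = 0
  \;\Longrightarrow\; \sum_{i=1}^{n} X_i \left[(Y_i - \bar{Y}) - \beta_1 (X_i - \bar{X})\right] = 0 \\[4pt]
\beta_1 &= \frac{\sum_{i=1}^{n} X_i (Y_i - \bar{Y})}{\sum_{i=1}^{n} X_i (X_i - \bar{X})}
         = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
\end{aligned}
```

<p>
  The last equality holds because Σ(Yᵢ - Ȳ) = 0 and Σ(Xᵢ - X̄) = 0, so subtracting X̄ inside each sum leaves its value unchanged.
</p>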


<h3><i>Single-Input Linear Regression</i></h3>

In [1]:
import numpy as np

In [2]:
class LinearRegression:
    
    def __init__(self):
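        self.coefficients = None  # slope β₁ (a single value in the one-feature case)
        self.intercept = None     # y-intercept β₀, set by fit()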

    def fit(self, X, y):
        """
        Fits the linear regression model to the given training data.

        Args:
            X (array-like): The input features.
            y (array-like): The target values.
        """
        
        X = np.array(X)
        print('X: ', X)
        y = np.array(y)
        print('y: ', y)

        X_mean = np.mean(X)
        print('X_mean: ', X_mean)
        y_mean = np.mean(y)
        print('y_mean: ', y_mean)

        numerator = np.sum((X - X_mean) * (y - y_mean))
        print('Numerator: ', numerator)
        denominator = np.sum((X - X_mean) ** 2)
        print('Denominator: ', denominator)

        self.coefficients = numerator / denominator
        print('Coefficients: ', self.coefficients)
        self.intercept = y_mean - self.coefficients * X_mean
        print('Intercept: ', self.intercept)

    def predict(self, X):
        """
        Predicts the target values for the given input features.

        Args:
            X (array-like): The input features.

        Returns:
            array-like: The predicted target values.
        """
        
        X = np.array(X)
        predicted = self.intercept + self.coefficients * X
        return predicted


In [3]:
# Create an instance of LinearRegression
regression = LinearRegression()

# Generate some example data
X = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Fit the model to the data
regression.fit(X, y)

# Predict using the trained model
X_test = [6, 7, 8]
predictions = regression.predict(X_test)
print('Predicted: ', predictions)

X:  [1 2 3 4 5]
y:  [2 4 5 4 5]
X_mean:  3.0
y_mean:  4.0
Numerator:  6.0
Denominator:  10.0
Coefficients:  0.6
Intercept:  2.2
Predicted:  [5.8 6.4 7. ]
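
<p>
  As a sanity check, the closed-form estimates can be compared against NumPy's own degree-1 polynomial fit. This is a minimal sketch on the same example data; the variable names <code>slope</code>, <code>intercept</code>, <code>ref_slope</code> and <code>ref_intercept</code> are chosen here for illustration.
</p>

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

# Closed-form least-squares estimates for one input variable
slope = np.sum((X - X.mean()) * (y - y.mean())) / np.sum((X - X.mean()) ** 2)
intercept = y.mean() - slope * X.mean()

# Independent check: np.polyfit with deg=1 returns [slope, intercept]
ref_slope, ref_intercept = np.polyfit(X, y, deg=1)

print('Slope matches:    ', np.isclose(slope, ref_slope))
print('Intercept matches:', np.isclose(intercept, ref_intercept))
```

<p>
  If both checks print <code>True</code>, the hand-written formulas agree with the library fit up to floating-point rounding.
</p>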


<h3><i>Multi-Input Linear Regression</i></h3>

<p>With more than one independent variable, the single-input formulas no longer apply, and the problem is known as multiple linear regression. The mathematical formulation changes slightly to accommodate multiple X values.</p>

<p>The multiple linear regression equation can be written as:</p>

<p><b>Y = β₀ + β₁X₁ + β₂X₂ + ... + βₚXₚ</b></p>

<p>where p is the number of independent variables, and X₁, X₂, ..., Xₚ are the values of the independent variables.</p>
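
<p>
  It can help to write the same model in matrix form, since that is what a from-scratch solver typically works with. The sketch below assumes n observations and a design matrix <b>X</b> whose first column is all ones (so that β₀ is treated like any other coefficient); the symbols n, S and ε are introduced here for illustration.
</p>

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},
\qquad
\mathbf{X} =
\begin{bmatrix}
1 & X_{11} & \cdots & X_{1p} \\
\vdots & \vdots & & \vdots \\
1 & X_{n1} & \cdots & X_{np}
\end{bmatrix},
\qquad
\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_p)^{\top} \\[4pt]
S(\boldsymbol{\beta}) &= \lVert \mathbf{y} - \mathbf{X}\boldsymbol{\beta} \rVert^{2} \\[4pt]
\nabla S(\boldsymbol{\beta}) &= -2\,\mathbf{X}^{\top}\left(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\right) = \mathbf{0}
\;\Longrightarrow\;
\mathbf{X}^{\top}\mathbf{X}\,\boldsymbol{\beta} = \mathbf{X}^{\top}\mathbf{y}
\end{aligned}
```

<p>
  The final system is usually called the normal equations. It has a unique solution only when the columns of <b>X</b> are linearly independent, so that XᵀX is invertible.
</p>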


In [4]:
class LinearRegression:
    
    def __init__(self):
        self.coefficients = None

    def fit(self, X, y):
        """
        Fits the linear regression model to the training data.

        Args:
            X (array-like): The feature matrix.
            y (array-like): The target values.
        """
        
        X = np.array(X)
        print('X:', X)
        y = np.array(y)
        print('y:', y)

        ones_column = np.ones((X.shape[0], 1))
        print('ones_column:', ones_column)
        X = np.concatenate((ones_column, X), axis=1)
        print('X:', X)

        X_transpose_X = X.T.dot(X)
        print('X_transpose_X:', X_transpose_X)
        X_transpose_y = X.T.dot(y)
        print('X_transpose_y:', X_transpose_y)
        # Solve the equation X^T * X * coefficients = X^T * y for coefficients
        coefficients = self.solve_equation(X_transpose_X, X_transpose_y)

        self.coefficients = coefficients
        print('Coefficients:', self.coefficients)

    def solve_equation(self, A, b):
        """
        Solves the equation Ax = b for x.

        Args:
            A (array-like): The coefficient matrix.
            b (array-like): The target values.

        Returns:
            array-like: The solution vector x.
        """
        
        # Perform Gaussian elimination (deliberately avoiding np.linalg.inv).
        # Note: no row swaps are done, so a zero pivot is simply skipped.
        n = A.shape[0]
        augmented = np.concatenate((A, b.reshape(n, 1)), axis=1)
        for i in range(n):
            pivot_row = augmented[i]
            pivot = pivot_row[i]

            if pivot != 0:
                pivot_row /= pivot

                for j in range(i + 1, n):
                    row = augmented[j]
                    multiplier = row[i]
                    row -= pivot_row * multiplier

        # Back-substitution
        x = np.zeros(n)
        for i in range(n - 1, -1, -1):
            row = augmented[i]
            x[i] = row[-1]
            for j in range(i + 1, n):
                x[i] -= row[j] * x[j]

        return x

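    def predict(self, X):
        """
        Predicts the target values for the given feature matrix.

        Args:
            X (array-like): The feature matrix, one row per sample.

        Returns:
            array-like: The predicted target values.
        """
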
        X = np.array(X)
        ones_column = np.ones((X.shape[0], 1))
        X = np.concatenate((ones_column, X), axis=1)
        return X.dot(self.coefficients)
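

<p>
  The elimination above never swaps rows, so a zero on the diagonal is skipped rather than handled. A common remedy is partial pivoting: before eliminating column i, swap in the row with the largest absolute entry in that column. The following is only a sketch of that idea; the function name <code>solve_with_pivoting</code> and the tolerance <code>tol</code> are hypothetical choices and are not part of the class above.
</p>

```python
import numpy as np

def solve_with_pivoting(A, b, tol=1e-12):
    """Solve Ax = b by Gaussian elimination with partial pivoting (illustrative)."""
    A = np.array(A, dtype=float)
    b = np.array(b, dtype=float)
    n = A.shape[0]
    aug = np.hstack([A, b.reshape(n, 1)])

    for i in range(n):
        # Choose the row (at or below i) with the largest absolute entry in column i
        p = i + np.argmax(np.abs(aug[i:, i]))
        if abs(aug[p, i]) < tol:
            raise ValueError('matrix is singular or nearly singular')
        if p != i:
            aug[[i, p]] = aug[[p, i]]  # swap rows i and p

        aug[i] /= aug[i, i]            # scale the pivot row so the pivot is 1
        for j in range(i + 1, n):
            aug[j] -= aug[j, i] * aug[i]

    # Back-substitution on the unit upper-triangular system
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = aug[i, -1] - aug[i, i + 1:n] @ x[i + 1:]
    return x

# The first pivot of this system is zero, so a row swap is required
A = np.array([[0.0, 2.0, 1.0],
              [1.0, 1.0, 1.0],
              [2.0, 1.0, 3.0]])
b = np.array([5.0, 4.0, 7.0])

x = solve_with_pivoting(A, b)
print('Matches np.linalg.solve:', np.allclose(x, np.linalg.solve(A, b)))
```

<p>
  Raising an error on a near-zero pivot is one possible design choice; it makes a singular system visible instead of silently returning one of many possible solutions.
</p>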


In [5]:
# Create an instance of LinearRegression
regression = LinearRegression()

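# Generate some example data
# (the second feature is always the first plus 1, so the two columns are
# collinear, X^T X is singular, and the fitted coefficients are not unique)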
X = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]]
y = [3, 5, 7, 8, 10]

# Fit the model to the data
regression.fit(X, y)

# Predict using the trained model
X_test = [[6, 7], [7, 8]]
predictions = regression.predict(X_test)

print('Predicted: ', predictions)


X: [[1 2]
 [2 3]
 [3 4]
 [4 5]
 [5 6]]
y: [ 3  5  7  8 10]
ones_column: [[1.]
 [1.]
 [1.]
 [1.]
 [1.]]
X: [[1. 1. 2.]
 [1. 2. 3.]
 [1. 3. 4.]
 [1. 4. 5.]
 [1. 5. 6.]]
X_transpose_X: [[ 5. 15. 20.]
 [15. 55. 70.]
 [20. 70. 90.]]
X_transpose_y: [ 33. 116. 149.]
Coefficients: [1.5 1.7 0. ]
Predicted:  [11.7 13.4]
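
<p>
  In this example the second feature always equals the first plus 1, so the coefficients are not uniquely determined and the 0 for the last coefficient is only one of many valid choices. The sketch below cross-checks the predictions with <code>np.linalg.lstsq</code>, which accepts rank-deficient systems and returns the minimum-norm solution. The names <code>A</code>, <code>beta</code> and <code>A_test</code> are chosen here for illustration.
</p>

```python
import numpy as np

X = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]], dtype=float)
y = np.array([3, 5, 7, 8, 10], dtype=float)

# Design matrix with a leading column of ones for the intercept
A = np.column_stack([np.ones(len(X)), X])

# lstsq also reports the numerical rank of the design matrix
beta, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)
print('Rank of design matrix:', rank, 'of', A.shape[1], 'columns')
print('Coefficients:', beta)

# Predict for the same test points as above
X_test = np.array([[6, 7], [7, 8]], dtype=float)
A_test = np.column_stack([np.ones(len(X_test)), X_test])
predictions = A_test @ beta
print('Predicted: ', predictions)
```

<p>
  The coefficients differ from the ones printed above, yet the predictions agree, because the test points follow the same "second feature = first feature + 1" pattern as the training data.
</p>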
