# Least Squares Regression

Least squares regression is a deterministic model. Unlike stochastic models, whose fitted weights can change from run to run, the weights it calculates do not depend on any internal state of the algorithm; they depend solely on the input data.

## Curve Fitting

The least squares method is easier to understand once we understand the term *curve fitting*. In curve fitting, we try to "fit" a curve onto a set of data points given to us as input. The curve can be of any kind: a straight line, a quadratic or cubic curve, or even a non-standard curve.

<img src="https://machinelearningmastery.com/wp-content/uploads/2020/10/Plot-of-Straight-Line-Fit-to-Economic-Dataset-1024x768.png" width="500">

In the least squares method, we fit a straight line (or, with several features, a hyperplane) to the data points by minimizing the sum of squared differences between the predicted and observed values of the dependent variable.
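To make the idea concrete, here is a small sketch on made-up data (the data, the seed, and the `sse` helper are illustrative assumptions, not part of the original notebook). NumPy's `np.polyfit` with `deg=1` returns the least squares straight line, so nudging its slope in either direction should only increase the sum of squared residuals.

```python
import numpy as np

# Made-up noisy data scattered around the line y = 2x + 1 (illustrative only)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(scale=1.5, size=x.size)

# np.polyfit with deg=1 returns [slope, intercept] of the least squares line
slope, intercept = np.polyfit(x, y, deg=1)

def sse(m, c):
    """Sum of squared differences between observed y and the line m*x + c."""
    residuals = y - (m * x + c)
    return float(np.sum(residuals ** 2))

best = sse(slope, intercept)

# Moving the slope away from the fitted value (same intercept) increases the error
for delta in (-0.1, 0.1):
    assert sse(slope + delta, intercept) > best

print(f"slope={slope:.3f}, intercept={intercept:.3f}, SSE={best:.2f}")
```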

# Math Behind Least Squares Regression (Multi-Linear)

The model is fairly simple: it relies only on basic matrix calculations (plus some partial derivatives, which won't be detailed in this notebook).

## The Model

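To simplify the derivations, we can omit the intercept term by centering the data. Averaging the first equation below over all $n$ observations gives the second, and subtracting the second from the first eliminates $\beta_0$:

$$\begin{aligned} y_i &= \beta_0 + \beta_1 x_i \\ \bar{y} &= \beta_0 + \beta_1 \bar{x} \\ y_i - \bar{y} &= 0 + \beta_1 \left( x_i - \bar{x} \right) \end{aligned}$$

Using this fact, we continue the analysis without the intercept.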
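A quick numerical check of the centering argument, as a sketch on hypothetical single-feature data (the seed and the true values 3.0 and 2.0 are arbitrary): the slope fitted with an explicit intercept should equal the slope fitted on centered data with no intercept, and $\beta_0$ can then be recovered as $\bar{y} - \beta_1 \bar{x}$.

```python
import numpy as np

# Hypothetical single-feature data with a non-zero intercept (illustrative only)
rng = np.random.default_rng(1)
x = rng.uniform(-5, 5, size=100)
y = 3.0 + 2.0 * x + rng.normal(scale=0.5, size=100)

# Fit WITH an explicit intercept: design matrix columns are [1, x]
X_with_ones = np.column_stack([np.ones_like(x), x])
beta0, beta1 = np.linalg.lstsq(X_with_ones, y, rcond=None)[0]

# Fit on centered data WITHOUT an intercept: slope = sum(x_c * y_c) / sum(x_c^2)
x_c = x - x.mean()
y_c = y - y.mean()
slope_centered = np.sum(x_c * y_c) / np.sum(x_c ** 2)

# Same slope, and the intercept is recovered from the means afterwards
assert np.isclose(beta1, slope_centered)
assert np.isclose(beta0, y.mean() - slope_centered * x.mean())
```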

The general multi-linear model:



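$$y_i = \beta_1 x_{i,1} + \beta_2 x_{i,2} + \dots + \beta_k x_{i,k} + \epsilon_i$$

$$y_i = \left[ x_{i,1}, x_{i,2}, \dots, x_{i,k} \right] \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix} + \epsilon_i$$

$$y_i = x_i^T \beta + \epsilon_i$$

where $x_i = \left[ x_{i,1}, x_{i,2}, \dots, x_{i,k} \right]^T$ holds the $k$ feature values of observation $i$.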



We can now write this equation once for each of the $n$ observations in the data:



$$\begin{split}\begin{bmatrix} y_1\\ y_2\\ \vdots \\ y_n \end{bmatrix} &=
\begin{bmatrix} x_{1,1} & x_{1,2} & \ldots & x_{1,k}\\
                x_{2,1} & x_{2,2} & \ldots & x_{2,k}\\
                \vdots  & \vdots  & \vdots & \vdots\\
                x_{n,1} & x_{n,2} & \ldots & x_{n,k}\\
\end{bmatrix}
\begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix} +
\begin{bmatrix} \epsilon_1\\ \epsilon_2\\ \vdots \\ \epsilon_n \end{bmatrix}\end{split}$$



In matrix notation, the model becomes:



$$y = Xb + \epsilon$$  

Where:

$y \rightarrow$ Vector of observed target values $(n\times 1)$

$X \rightarrow$ Matrix of feature values $(n\times k)$

$b \rightarrow$ Vector of coefficients $\beta_1, \dots, \beta_k$ $(k\times 1)$

$\epsilon \rightarrow$ Vector of error terms $(n\times 1)$
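The shapes above can be illustrated with NumPy on a tiny hypothetical example (the sizes, seed, and coefficient values are arbitrary choices). NumPy stores the $(n \times 1)$ and $(k \times 1)$ column vectors as 1-D arrays of shape `(n,)` and `(k,)`.

```python
import numpy as np

n, k = 6, 3                          # hypothetical sizes: 6 observations, 3 features
rng = np.random.default_rng(2)

X = rng.normal(size=(n, k))          # feature matrix, (n x k)
b = np.array([1.5, -2.0, 0.5])       # coefficient vector, (k x 1) stored as shape (k,)
eps = rng.normal(scale=0.1, size=n)  # error terms, (n x 1) stored as shape (n,)

y = X @ b + eps                      # observed targets, (n x 1) stored as shape (n,)

# Row i of the matrix equation is the single-observation model y_i = x_i^T b + eps_i
i = 4
assert np.isclose(y[i], X[i] @ b + eps[i])

print(X.shape, b.shape, eps.shape, y.shape)  # (6, 3) (3,) (6,) (6,)
```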

## The "Least Squares" in Least Squares

In a least squares model, we aim to minimize the sum of the squared errors in the vector $\epsilon$. This least squares objective function can be written as:

$$\begin{aligned} f(b) &= \epsilon^T\epsilon \\ &= \left( y - Xb \right)^T \left( y - Xb \right) \\ &= y^Ty - 2y^TXb + b^TX^TXb \end{aligned}$$

The last step uses the fact that $b^TX^Ty$ is a scalar and therefore equals its transpose $y^TXb$.
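The expansion can be sanity-checked numerically for an arbitrary, non-optimal $b$; this sketch uses random data purely for illustration.

```python
import numpy as np

# Random data and an arbitrary (not optimal) coefficient vector b, for illustration
rng = np.random.default_rng(3)
X = rng.normal(size=(20, 4))
y = rng.normal(size=20)
b = rng.normal(size=4)

eps = y - X @ b
f_direct = eps @ eps                                      # epsilon^T epsilon
f_expanded = y @ y - 2 * (y @ X @ b) + b @ X.T @ X @ b    # y^Ty - 2y^TXb + b^TX^TXb

assert np.isclose(f_direct, f_expanded)
```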



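Taking the partial derivatives with respect to $b$ gives the gradient $\nabla f(b) = -2X^Ty + 2X^TXb$. Setting it equal to a vector of zeros gives the minimizing condition $X^TXb = X^Ty$ (the *normal equations*), and solving for $b$ we get: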



$$b = (X^TX)^{-1}X^Ty$$



And bingo: this equation is the crux of the least squares model. It gives all the required coefficients in closed form, as long as the inverse of $(X^TX)$ exists, i.e. as long as the columns of $X$ are linearly independent.
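A sketch of the closed-form solution on synthetic data (the data and seed are assumptions for illustration). It uses `np.linalg.solve` on the normal equations instead of forming $(X^TX)^{-1}$ explicitly, which yields the same $b$; it then checks that the gradient vanishes, compares against `np.linalg.lstsq`, and shows that a duplicated column leaves $X$ rank-deficient, so $X^TX$ has no inverse.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

# Solve the normal equations X^T X b = X^T y (same b as (X^T X)^{-1} X^T y)
XtX = X.T @ X
b = np.linalg.solve(XtX, X.T @ y)

# At the minimum, the gradient -2 X^T y + 2 X^T X b is (numerically) zero
gradient = -2 * X.T @ y + 2 * XtX @ b
assert np.allclose(gradient, 0, atol=1e-8)

# NumPy's general least squares solver agrees
b_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]
assert np.allclose(b, b_lstsq)

# Duplicating a column makes the columns linearly dependent, so X^T X is singular
X_bad = np.column_stack([X, X[:, 0]])
print(np.linalg.matrix_rank(X_bad))  # prints 3: four columns, only three independent
```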

### Note: 
When implementing the model, we just need to prepend a column of ones with shape $(n \times 1)$ to the data. The coefficient fitted for this column is the intercept, so the model calculates the intercept term directly (and $b$ grows to shape $((k+1) \times 1)$).



$$\begin{bmatrix} x_{1,1} & x_{1,2} & \ldots & x_{1,k}\\
                x_{2,1} & x_{2,2} & \ldots & x_{2,k}\\
                \vdots  & \vdots  & \vdots & \vdots\\
                x_{n,1} & x_{n,2} & \ldots & x_{n,k}\\
\end{bmatrix} \Rightarrow \begin{bmatrix} 
                1 & x_{1,1} & x_{1,2} & \ldots & x_{1,k}\\
                1 & x_{2,1} & x_{2,2} & \ldots & x_{2,k}\\
                \vdots & \vdots  & \vdots  & \vdots & \vdots\\
                1 & x_{n,1} & x_{n,2} & \ldots & x_{n,k}\\
\end{bmatrix}$$
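The augmentation step can be sketched as follows, on hypothetical data generated with a known intercept of 4.0 (an illustrative value, like the seed and coefficients). After prepending the column of ones, the first fitted coefficient should recover the intercept.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 2))
true_intercept, true_coef = 4.0, np.array([1.5, -3.0])   # illustrative values
y = true_intercept + X @ true_coef + rng.normal(scale=0.05, size=100)

# Prepend a column of ones: shape (n, k) -> (n, k + 1)
X_aug = np.hstack([np.ones((X.shape[0], 1)), X])

b = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)
intercept, coef = b[0], b[1:]   # the first entry pairs with the ones column

print(round(float(intercept), 2), np.round(coef, 2))
```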

# Implementing Least Squares Regression from scratch.

In [1]:
```python
# First we need a sample dataset on which we can test our algorithm.
# Using sklearn to create a random regression problem
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200,
                       n_features=5,
                       n_targets=1,
                       noise=10,
                       random_state=42)

# Looking at the generated data
X[:5, :]  # first five rows
```

```
array([[-0.3853136 ,  0.1990597 , -0.60021688,  0.46210347,  0.06980208],
       [ 0.13074058,  1.6324113 , -1.43014138, -1.24778318, -0.44004449],
       [-0.77300978,  0.22409248,  0.0125924 , -0.40122047,  0.0976761 ],
       [-0.57677133, -0.05023811, -0.23894805,  0.27045683, -0.90756366],
       [-0.57581824,  0.6141667 ,  0.75750771, -0.2209696 , -0.53050115]])
```

In [2]:
```python
# Creating the regression algorithm
import numpy as np
from numpy.linalg import inv

class MyLeastSquares():

    def __init__(self):
        self.coef = None       # will hold the fitted coefficients
        self.intercept = None  # will hold the fitted intercept

    def _concat_ones(self, X):
        # Prepend a column of ones so the first fitted coefficient is the intercept
        ones = np.ones(shape=X.shape[0]).reshape(-1, 1)
        return np.concatenate((ones, X), axis=1)

    def fit(self, X, y):  # fit the model to the given data
        X = np.asarray(X)
        if X.ndim == 1:   # a single feature passed as a 1-D array
            X = X.reshape(-1, 1)

        X = self._concat_ones(X)
        # Closed-form solution b = (X^T X)^{-1} X^T y
        self.coef = inv(X.transpose().dot(X)).dot(X.transpose()).dot(y)
        self.intercept = self.coef[0]
        self.coef = self.coef[1:]

    def predict(self, x):  # predict for one sample (1-D) or many samples (2-D)
        # Convert lists or tuples to an array (no copy if x is already an ndarray)
        x = np.asarray(x)
        return self.intercept + x.dot(self.coef)
```
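Explicitly inverting $X^TX$ mirrors the formula, but it can lose precision when $X^TX$ is close to singular. Below is a sketch of an alternative that swaps in `np.linalg.lstsq`, NumPy's SVD-based least squares solver; the class name `LstsqLeastSquares` and the demo data are hypothetical and not part of the original notebook.

```python
import numpy as np

class LstsqLeastSquares:
    """Hypothetical variant of MyLeastSquares that avoids forming (X^T X)^{-1}."""

    def __init__(self):
        self.coef = None
        self.intercept = None

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        if X.ndim == 1:
            X = X.reshape(-1, 1)
        # Prepend the column of ones so the first coefficient is the intercept
        X = np.hstack([np.ones((X.shape[0], 1)), X])
        # lstsq minimizes ||y - Xb||^2 directly and also handles rank-deficient X
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        self.intercept, self.coef = b[0], b[1:]
        return self

    def predict(self, X):
        return self.intercept + np.asarray(X, dtype=float) @ self.coef


# Usage on made-up, noise-free data
rng = np.random.default_rng(6)
X_demo = rng.normal(size=(30, 2))
y_demo = 1.0 + X_demo @ np.array([2.0, -0.5])
model_demo = LstsqLeastSquares().fit(X_demo, y_demo)
print(model_demo.intercept, model_demo.coef)  # close to 1.0 and [2.0, -0.5]
```

Returning `self` from `fit` lets the fit and the attribute access be chained in one expression, similar in spirit to scikit-learn estimators.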

In [3]:
```python
# Now we can test our linear regression
model = MyLeastSquares()  # create an instance
model.fit(X, y)           # fit the model

# Let's see the coefficients and the intercept
print('The Calculated Model:')
print('The Coefficients = {}'.format(model.coef.round(3)))
print('The Intercept {:.3f}'.format(model.intercept))
```

```
The Calculated Model:
The Coefficients = [ 3.326 10.661 64.132 17.723 70.294]
The Intercept 0.615
```


In [4]:
```python
# Now let's try to predict
test = X[42, :]  # take the row at index 42 of the data
print('Test Entry = {}'.format(test.round(3)))
print('The Actual Value = {}'.format(y.round(3)[42]))
y_pred = model.predict(test)
print('The Predicted Value = {}'.format(y_pred))
```

```
Test Entry = [ 0.622 -0.562  0.632  0.708  0.973]
The Actual Value = 105.326
The Predicted Value = 118.1528761494485
```


In [5]:
```python
# Let's see the predicted values for 10 randomly selected entries
# (no random seed is set here, so a re-run draws different rows)
import pandas as pd

choices = np.random.choice(200, size=10, replace=False)
batch_test = X[choices, :]
y_actual = y[choices].reshape(-1, 1)

y_preds = []
for test in batch_test:
    y_preds.append(round(model.predict(test), 3))

# Making a DataFrame of actual and predicted results
y_preds = np.array(y_preds).reshape(-1, 1)
df = pd.DataFrame(np.concatenate((y_actual, y_preds), axis=1), columns=['Actual', 'Predicted'])
df['Error(Residual)'] = df['Actual'] - df['Predicted']
df
```

|   | Actual | Predicted | Error(Residual) |
|--:|-------:|----------:|----------------:|
| 0 | 13.842149 | 6.374 | 7.468149 |
| 1 | -185.594177 | -189.141 | 3.546823 |
| 2 | -19.674477 | -21.835 | 2.160523 |
| 3 | -6.146519 | -5.846 | -0.300519 |
| 4 | 26.59679 | 20.682 | 5.91479 |
| 5 | 41.386697 | 31.078 | 10.308697 |
| 6 | -79.724308 | -81.383 | 1.658692 |
| 7 | -26.445052 | -32.852 | 6.406948 |
| 8 | -10.795434 | -16.033 | 5.237566 |
| 9 | 64.733314 | 64.82 | -0.086686 |


We have successfully implemented the least squares algorithm. To check that it works as intended, we can compare the model it calculates with the one calculated by scikit-learn's `LinearRegression`, since `LinearRegression` is also an implementation of ordinary least squares.

In [6]:
```python
# Calculating the least squares model using sklearn
from sklearn.linear_model import LinearRegression
model2 = LinearRegression()
model2.fit(X, y)

# Let's see the coefficients and the intercept
print('The Model Using Sklearn:')
print('The Coefficients = {}'.format(model2.coef_.round(3)))
print('The Intercept {:.3f}'.format(model2.intercept_))
```

```
The Model Using Sklearn:
The Coefficients = [ 3.326 10.661 64.132 17.723 70.294]
The Intercept 0.615
```


As we can see, these values are exactly the same as those calculated by our implementation (to the three decimal places shown).