## Problem statement:
> How gradient descent works for a linear regression problem

## Objectives:
- Implement Linear Regression with Gradient Descent.

- Use the necessary visualisations and plots to show that the model's loss converges to the minimum.

- Check whether the number of iterations (epochs) or the learning rate changes the way the algorithm trains.

- Compare the trained model with an OLS-based linear regression model and comment on the learnings.

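Mean squared error (the loss minimised here) over $n$ samples $(x_i, y_i)$:

$$\text{MSE}(w, b) = \frac{1}{n}\sum_{i=1}^{n}\left((w x_i + b) - y_i\right)^2$$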
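Averaging the squared error over the $n$ samples (the MSE) and differentiating with respect to each parameter gives the gradients that `derivative_w` and `derivative_b` compute further down, together with the gradient-descent update for a learning rate $\alpha$:

$$\frac{\partial\,\text{MSE}}{\partial w} = -\frac{2}{n}\sum_{i=1}^{n}\left(y_i - (w x_i + b)\right)x_i, \qquad \frac{\partial\,\text{MSE}}{\partial b} = -\frac{2}{n}\sum_{i=1}^{n}\left(y_i - (w x_i + b)\right)$$

$$w \leftarrow w - \alpha\,\frac{\partial\,\text{MSE}}{\partial w}, \qquad b \leftarrow b - \alpha\,\frac{\partial\,\text{MSE}}{\partial b}$$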


In [2]:
!pip install scikit-learn statsmodels



In [3]:
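import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LinearRegression

In [None]:
# load_boston was removed in scikit-learn 1.2; load the raw data as its deprecation notice suggests
raw_df = pd.read_csv("http://lib.stat.cmu.edu/datasets/boston", sep=r"\s+", skiprows=22, header=None)
# Each record spans two raw lines: 11 values, then B, LSTAT and MEDV
data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
boston = pd.DataFrame(data, columns=['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT'])
boston['MEDV'] = raw_df.values[1::2, 2]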

In [None]:
def y_cap(w, b, x):
  # Model prediction: y_hat = w*x + b
  return w*x + b

In [None]:
def derivative_w(n, w, b, X, Y):
  # dMSE/dw = -(2/n) * sum((y_i - y_hat_i) * x_i)
  delW = 0
  for i in range(n):
    residual = Y[i] - y_cap(w, b, X[i])
    delW += -2 * residual * X[i]
  return delW / n


In [None]:
def derivative_b(n, w, b, X, Y):
  # dMSE/db = -(2/n) * sum(y_i - y_hat_i)
  delB = 0
  for i in range(n):
    residual = Y[i] - y_cap(w, b, X[i])
    delB += -2 * residual
  return delB / n

In [None]:
def loss_function(n, w, b, X, Y):
  # Mean squared error over the n samples
  loss = 0
  for i in range(n):
    loss += (Y[i] - y_cap(w, b, X[i]))**2
  return loss / n
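The three functions above loop over samples one at a time. A minimal vectorised sketch of the same loss and gradients follows, together with a central finite-difference check that the analytic gradients are right. `mse_and_grads` and `numeric_grads` are hypothetical helper names, the data is synthetic, and the sketch assumes 1-D arrays (the notebook's `X` and `Y` have shape `(n, 1)`, so they would be passed as `X.ravel()` and `Y.ravel()`).

```python
import numpy as np

def mse_and_grads(w, b, x, y):
    """Vectorised MSE and its gradients for y_hat = w*x + b; x and y are 1-D arrays."""
    residual = y - (w * x + b)            # y_i - y_hat_i for every sample at once
    loss = np.mean(residual ** 2)
    dw = -2.0 * np.mean(residual * x)     # same quantity as derivative_w
    db = -2.0 * np.mean(residual)         # same quantity as derivative_b
    return loss, dw, db

def numeric_grads(w, b, x, y, eps=1e-6):
    """Central finite differences of the MSE, an independent check on the analytic gradients."""
    mse = lambda w_, b_: np.mean((y - (w_ * x + b_)) ** 2)
    dw = (mse(w + eps, b) - mse(w - eps, b)) / (2 * eps)
    db = (mse(w, b + eps) - mse(w, b - eps)) / (2 * eps)
    return dw, db

rng = np.random.default_rng(0)
x_demo = rng.uniform(0, 1, size=50)
y_demo = 0.7 * x_demo + 0.2 + rng.normal(0, 0.05, size=50)

demo_loss, dw, db = mse_and_grads(0.1, -0.3, x_demo, y_demo)
num_dw, num_db = numeric_grads(0.1, -0.3, x_demo, y_demo)
print(demo_loss, dw, num_dw, db, num_db)
```

Because the MSE is quadratic in each parameter, the central difference agrees with the analytic gradient up to floating-point rounding, which makes it a cheap sanity check before training.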

In [None]:
# Scale the feature and the target to [0, 1] with separate scalers,
# so each keeps its own min/max for inverse_transform
x_scaler = MinMaxScaler()
y_scaler = MinMaxScaler()
X = x_scaler.fit_transform(boston[['INDUS']])
Y = y_scaler.fit_transform(boston[['MEDV']])
n = len(boston['INDUS'])
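With its default `feature_range=(0, 1)`, `MinMaxScaler` maps each column as $x' = (x - \min)/(\max - \min)$, so the losses and parameters learned here are in scaled units. The plain NumPy sketch below shows that mapping and its inverse, which is what turns a scaled prediction back into MEDV units (`MinMaxScaler.inverse_transform` performs the same inversion). The helper names and the three-value column are hypothetical.

```python
import numpy as np

def min_max_scale(col):
    """Map values to [0, 1]: x' = (x - min) / (max - min)."""
    col_min, col_max = col.min(), col.max()
    return (col - col_min) / (col_max - col_min), col_min, col_max

def min_max_unscale(scaled, col_min, col_max):
    """Invert the mapping: x = x' * (max - min) + min."""
    return scaled * (col_max - col_min) + col_min

col = np.array([2.0, 5.0, 11.0])        # hypothetical raw values
scaled, col_min, col_max = min_max_scale(col)
print(scaled)                            # approximately [0, 0.3333, 1]
print(min_max_unscale(scaled, col_min, col_max))   # recovers the original values
```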

In [None]:

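# Epochs and alpha (the learning rate) are hyperparameters
epochs = [10, 50, 100]
alphas = [0.01, 0.1, 1]

for epoch in epochs:
    for alpha in alphas:
        # Plain floats, so each appended value is a snapshot of that epoch
        w, b = 0.0, 0.0
        list_of_loss = []
        list_of_w = []
        list_of_b = []
        for i in range(epoch):
            list_of_loss.append(loss_function(n, w, b, X, Y).item())
            list_of_w.append(w)
            list_of_b.append(b)

            # w is updated first, so b's gradient is evaluated at the new w;
            # textbook batch gradient descent computes both gradients at the
            # same (w, b) before updating either parameter
            w -= derivative_w(n, w, b, X, Y).item() * alpha
            b -= derivative_b(n, w, b, X, Y).item() * alpha

        final_loss = loss_function(n, w, b, X, Y).item()

        fig, axs = plt.subplots(2, 2, figsize=(10, 8))

        # Data as points (a line plot of unsorted data zigzags) plus the fitted line
        axs[0,0].scatter(X, Y, s=8)
        axs[0,0].plot(X, w*X + b, 'r')
        axs[0,0].set_title('Actual Values and Fitted Line')

        axs[0,1].plot(list_of_w)
        axs[0,1].set_title('Weight w per Epoch')

        axs[1,0].plot(list_of_b)
        axs[1,0].set_title('Bias b per Epoch')

        axs[1,1].plot(list_of_loss)
        axs[1,1].set_title('Loss (MSE) per Epoch')

        fig.suptitle(f'Gradient Descent for Linear Regression (epochs={epoch}, alpha={alpha})')

        print(f"\nFinal loss: {final_loss}  Epochs: {epoch}  Alpha: {alpha}\n")
        plt.show()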
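For comparison, here is a vectorised sketch of standard batch gradient descent, in which both gradients are evaluated at the same `(w, b)` before either parameter moves. `gradient_descent` is a hypothetical helper and the data is a synthetic noise-free line, so the optimum is known to be `w = 0.5`, `b = 0.25`. Because it flattens its inputs, it could also be called on the notebook's arrays as `gradient_descent(X, Y, alpha, epoch)`.

```python
import numpy as np

def gradient_descent(x, y, alpha, epochs, w=0.0, b=0.0):
    """Batch gradient descent for y_hat = w*x + b with simultaneous updates.

    Returns the final (w, b) and the MSE recorded before each update.
    """
    x = np.ravel(x)
    y = np.ravel(y)
    history = []
    for _ in range(epochs):
        residual = y - (w * x + b)
        history.append(np.mean(residual ** 2))
        # Both gradients use the same (w, b); then both parameters are updated
        dw = -2.0 * np.mean(residual * x)
        db = -2.0 * np.mean(residual)
        w, b = w - alpha * dw, b - alpha * db
    return w, b, history

rng = np.random.default_rng(1)
x_demo = rng.uniform(0, 1, size=200)
y_demo = 0.5 * x_demo + 0.25      # noise-free line: optimum is w = 0.5, b = 0.25
w_gd, b_gd, hist = gradient_descent(x_demo, y_demo, alpha=0.5, epochs=2000)
print(w_gd, b_gd, hist[0], hist[-1])
```

Replacing the per-sample Python loop with NumPy array operations computes the same sums, but in compiled code, which matters once the epoch count grows.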



## Inference:
> As the number of epochs increases and alpha decreases, the loss curve shows a sharper elbow point.

> With alpha = 0.01 (for epochs = 10, 50 or 100):
- the loss curve changes significantly between the epoch settings,
- while for the other alpha values (0.1 and 1), the loss curve has a sharper elbow.
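Why the learning rate matters can be made concrete: the MSE is quadratic in $(w, b)$ with a constant Hessian $H = 2\begin{pmatrix}\overline{x^2} & \bar{x}\\ \bar{x} & 1\end{pmatrix}$, and plain simultaneous-update gradient descent converges only when $\alpha < 2/\lambda_{\max}(H)$; above that limit the loss oscillates and grows. The sketch below computes that limit and runs GD just below and just above it on synthetic data. `max_stable_alpha` and `gd_final_loss` are hypothetical helpers, and since the notebook's loop updates `b` using the already-updated `w`, its exact threshold may differ from this bound. Calling `max_stable_alpha(X)` on the scaled INDUS column would give the bound for that data.

```python
import numpy as np

def max_stable_alpha(x):
    """Step-size limit 2 / lambda_max(H) for batch GD on the MSE of y_hat = w*x + b.

    The MSE is quadratic in (w, b); its Hessian is constant:
    H = 2 * [[mean(x^2), mean(x)], [mean(x), 1]].
    """
    x = np.ravel(x)
    H = 2.0 * np.array([[np.mean(x * x), np.mean(x)],
                        [np.mean(x), 1.0]])
    return 2.0 / np.linalg.eigvalsh(H).max()

def gd_final_loss(x, y, alpha, epochs=200):
    """MSE after `epochs` simultaneous batch-GD updates starting from w = b = 0."""
    w = b = 0.0
    for _ in range(epochs):
        r = y - (w * x + b)                       # residuals at the current (w, b)
        w, b = w + 2 * alpha * np.mean(r * x), b + 2 * alpha * np.mean(r)
    return np.mean((y - (w * x + b)) ** 2)

rng = np.random.default_rng(2)
x_demo = rng.uniform(0, 1, size=100)
y_demo = 0.8 * x_demo + 0.1 + rng.normal(0, 0.05, size=100)

limit = max_stable_alpha(x_demo)
loss_below = gd_final_loss(x_demo, y_demo, 0.9 * limit)   # converges
loss_above = gd_final_loss(x_demo, y_demo, 1.1 * limit)   # oscillates and diverges
print(limit, loss_below, loss_above)
```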

In [None]:
import statsmodels.api as sps
import numpy as np


In [None]:
lr = LinearRegression()
lr.fit(X, Y)
print("Intercept:", lr.intercept_)
print("Coefficient:", lr.coef_)
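`LinearRegression` and statsmodels `OLS` both minimise the same sum of squared residuals. For a single feature the minimiser has a closed form, $w = \operatorname{cov}(x, y)/\operatorname{var}(x)$ and $b = \bar{y} - w\bar{x}$. The sketch below, using a hypothetical helper `closed_form_fit` and synthetic data, computes it and cross-checks it against NumPy's least-squares solver `np.linalg.lstsq` on the design matrix `[1, x]`, the same column layout `sps.add_constant` builds.

```python
import numpy as np

def closed_form_fit(x, y):
    """Least-squares slope and intercept for one feature: w = cov(x, y) / var(x), b = mean(y) - w*mean(x)."""
    x = np.ravel(x)
    y = np.ravel(y)
    x_mean, y_mean = x.mean(), y.mean()
    w = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    b = y_mean - w * x_mean
    return w, b

rng = np.random.default_rng(3)
x_demo = rng.uniform(0, 1, size=80)
y_demo = -1.5 * x_demo + 4.0 + rng.normal(0, 0.1, size=80)

w_cf, b_cf = closed_form_fit(x_demo, y_demo)

# Same problem through a general least-squares solver; column 0 is the intercept
A = np.column_stack([np.ones_like(x_demo), x_demo])
(b_ls, w_ls), *_ = np.linalg.lstsq(A, y_demo, rcond=None)
print(w_cf, b_cf, w_ls, b_ls)
```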

In [None]:
# Fit OLS with statsmodels; add_constant prepends a column of ones for the intercept
x = sps.add_constant(X)
model = sps.OLS(Y.ravel(), x).fit()
print("statsmodels params [intercept, slope]:", model.params)

# Flatten Y so the (n,) predictions do not broadcast against (n, 1) into an (n, n) matrix
ols_residuals = Y.ravel() - model.predict(x)
ols_mse = np.mean(np.square(ols_residuals))

# Same fit with scikit-learn (X is already 2-D, shape (n, 1))
reg = LinearRegression().fit(X, Y)
sklearn_mse = np.mean(np.square(reg.predict(X) - Y))

# Print the results
print("statsmodels OLS mean squared error:", ols_mse)
print("sklearn mean squared error:", sklearn_mse)

In [None]:
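print("Gradient descent MSE (last run):", list_of_loss[-1])

From the values above, the mean squared error reached by gradient descent in the last run (epochs = 100, alpha = 1) is essentially the same as the MSE of the OLS fit from statsmodels and scikit-learn.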
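The agreement can be checked more directly: the OLS loss is the minimum achievable MSE, so with a learning rate below the stability limit the gradient-descent loss should approach it from above as the number of epochs grows. The sketch below measures that gap on synthetic data, using simultaneous updates with `alpha = 0.5`; `gd_mse` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(4)
x_demo = rng.uniform(0, 1, size=150)
y_demo = 0.6 * x_demo + 0.3 + rng.normal(0, 0.05, size=150)

# Closed-form least squares: the minimum achievable MSE
A = np.column_stack([np.ones_like(x_demo), x_demo])
coef, *_ = np.linalg.lstsq(A, y_demo, rcond=None)
ols_mse = np.mean((y_demo - A @ coef) ** 2)

def gd_mse(alpha, epochs):
    """MSE after `epochs` simultaneous batch-GD updates starting from w = b = 0."""
    w = b = 0.0
    for _ in range(epochs):
        r = y_demo - (w * x_demo + b)
        w, b = w + 2 * alpha * np.mean(r * x_demo), b + 2 * alpha * np.mean(r)
    return np.mean((y_demo - (w * x_demo + b)) ** 2)

# Gap between the GD loss and the OLS minimum for increasing epoch counts
gaps = {e: gd_mse(0.5, e) - ols_mse for e in (10, 100, 1000)}
print(ols_mse, gaps)
```

A gap that shrinks towards zero supports the "same MSE" conclusion; a gap that stalls would suggest too few epochs or too small a learning rate.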

In [None]:

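def pred_y(w, b, X):
  # Predict y for every sample with one (w, b) pair: y_hat = w*x + b
  return [y_cap(w, b, x) for x in X]

# Parameters from the last recorded epoch of the last run
var = pred_y(list_of_w[-1], list_of_b[-1], X)
plt.scatter(X, Y, s=8, label='Actual')
plt.plot(X, var, 'r', label='Gradient descent fit')
plt.legend()
plt.show()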