<h1>Gradient Descent</h1>
<p>
    Gradient Descent is an optimization algorithm used to minimize a cost function when training machine learning models. It works by iteratively adjusting the parameters (weights) of the model in the direction opposite to the gradient of the cost function with respect to those parameters. The goal is to find the parameter values that minimize the cost function, so that the model's predictions fit the training data as closely as the cost function measures.
</p>
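
<p>
    In symbols, the update described above can be sketched as follows. Here $\theta$ denotes the parameter vector, $J$ the cost function, and $\eta$ a positive step size; these symbol names are notation assumed for illustration.
</p>

$$
\theta \leftarrow \theta - \eta \, \nabla_{\theta} J(\theta)
$$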

<h2>Batch Gradient Descent</h2>
<p>
    Batch Gradient Descent is a specific type of Gradient Descent where the entire dataset is used to compute the gradient of the cost function in each iteration. This method involves the following steps:
</p>
<ul>
    <li>
        <strong>Step 1: Initialization:</strong> Start with an initial set of weights or parameters, often chosen randomly or set to fixed values such as zeros or ones.
    </li>
    <li>
        <strong>Step 2: Compute Gradient:</strong> Calculate the gradient of the cost function with respect to each parameter using the entire dataset. The gradient points in the direction in which the cost function increases most steeply, and its magnitude gives the rate of that increase.
    </li>
    <li>
        <strong>Step 3: Update Parameters:</strong> Update the parameters by moving in the opposite direction of the gradient. The magnitude of the step is controlled by the learning rate, a hyperparameter that determines how quickly or slowly the model learns.
    </li>
    <li>
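        <strong>Step 4: Repeat:</strong> Repeat steps 2 and 3 until convergence is reached, i.e., until the change in the cost function between iterations becomes negligible. This indicates that the parameters are at or near a minimum of the cost function, which is the global minimum when the cost is convex, as with squared error in linear regression.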
    </li>
</ul>
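
<p>
    The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the cost is taken to be mean squared error for a linear model, and the function name <code>batch_gradient_descent</code>, the tolerance <code>tol</code>, and the synthetic data are all hypothetical choices made for this example.
</p>

```python
import numpy as np

def batch_gradient_descent(X, y, learning_rate=0.1, max_epochs=10_000, tol=1e-10):
    """Minimize mean squared error for a linear model with full-batch gradient descent."""
    rng = np.random.default_rng(0)
    # Step 1: initialization (small random weights, zero intercept)
    weights = rng.normal(scale=0.01, size=X.shape[1])
    intercept = 0.0
    previous_cost = np.inf

    for epoch in range(max_epochs):
        # Step 2: gradient of the cost, computed on the entire dataset
        residuals = y - (X @ weights + intercept)
        cost = np.mean(residuals ** 2)
        grad_weights = -2.0 * (X.T @ residuals) / X.shape[0]
        grad_intercept = -2.0 * np.mean(residuals)

        # Step 4: stop once the cost no longer changes appreciably
        if abs(previous_cost - cost) < tol:
            break
        previous_cost = cost

        # Step 3: move against the gradient, scaled by the learning rate
        weights -= learning_rate * grad_weights
        intercept -= learning_rate * grad_intercept

    return weights, intercept, epoch + 1


# Synthetic, noise-free data (an assumption for the demo): y = 2*x0 - 3*x1 + 5
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 2))
y_demo = X_demo @ np.array([2.0, -3.0]) + 5.0

weights, intercept, epochs_used = batch_gradient_descent(X_demo, y_demo)
print(weights, intercept, epochs_used)
```

<p>
    On this noise-free data the recovered weights and intercept should land close to 2, -3, and 5. The stopping rule compares the cost between consecutive epochs, which is one common way to express "negligible change"; other criteria, such as a threshold on the gradient norm, are equally reasonable.
</p>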
<p>
    Because each gradient is computed from the entire dataset, Batch Gradient Descent is stable: with a suitable learning rate, the cost decreases smoothly and consistently toward a minimum. However, this also means that each iteration can be computationally expensive and time-consuming, especially for large datasets. Additionally, a straightforward implementation of Batch Gradient Descent holds the entire dataset in memory, which may not be feasible for very large datasets.
</p>
<p>
    Despite its computational cost, Batch Gradient Descent is often preferred in scenarios where the dataset is relatively small, or when a high level of accuracy and stability is required in the model training process.
</p>


# Import libraries

In [1]:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Load the dataset
from sklearn.datasets import load_diabetes


## Splitting Of Dependent And Independent Columns

In [2]:
x,y=load_diabetes(return_X_y=True)


In [3]:
print("shape of x: ",x.shape)
print("shape of y: ",y.shape)

shape of x:  (442, 10)
shape of y:  (442,)


## Train and Test Split of dataset

In [4]:
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.20,random_state=2)

# Applying the LinearRegression algorithm

## Fit data to the model

In [5]:
model=LinearRegression()
model.fit(x_train,y_train)

In [6]:
# coefficient and intercept values of the model
print(model.coef_)
print(model.intercept_)

[  -9.15865318 -205.45432163  516.69374454  340.61999905 -895.5520019
  561.22067904  153.89310954  126.73139688  861.12700152   52.42112238]
151.88331005254167


In [7]:
# prediction from the model
y_pred=model.predict(x_test)

## To print the R2 score of the model

In [8]:
print("r2 score of this model is : ",r2_score(y_test,y_pred))

r2 score of this model is :  0.4399338661568969
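
For reference, the R2 score is the coefficient of determination, 1 - SS_res / SS_tot, which is the quantity `r2_score` computes for a single target. The sketch below spells that formula out in NumPy; the helper name `r2_manual` and the toy values are illustrative assumptions and are not part of the notebook.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # error left after prediction
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # error of always predicting the mean
    return 1.0 - ss_res / ss_tot

# Toy values chosen only for illustration
r2_demo = r2_manual([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
print(round(r2_demo, 4))
```

A score of 1.0 means the predictions match the targets exactly, and a score of 0.0 means the model does no better than always predicting the mean of the targets.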


In [9]:
x_train.shape

(353, 10)

# Applying Batch Gradient Descent

## Class And Function To Handle The Batch Gradient Descent

In [10]:
class GDRegressor():
    def __init__(self,learning_rate=0.01,epochs=100):
        self.coef_=None
        self.intercept_=None
        self.lr=learning_rate
        self.epochs=epochs

    def fit(self,x_train,y_train):
        # Start from a zero intercept and all-ones coefficients
        self.intercept_=0
        self.coef_=np.ones(x_train.shape[1])

        for i in range(self.epochs):
            # Predictions for the whole training set with the current parameters
            y_hat=np.dot(x_train,self.coef_) +self.intercept_

            # Gradient of the mean squared error with respect to the intercept
            intercept_der=-2 * np.mean(y_train - y_hat)
            self.intercept_=self.intercept_ -(self.lr * intercept_der)

            # Gradient of the mean squared error with respect to each coefficient
            coef_der=-2 * np.dot((y_train - y_hat),x_train)/x_train.shape[0]
            self.coef_ = self.coef_  - (self.lr * coef_der)
        # Report the fitted intercept and coefficients
        print(self.intercept_,self.coef_)

    def predict(self,x_test):
        return np.dot(x_test,self.coef_) + self.intercept_
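
The two derivative lines in `fit` follow from taking mean squared error as the cost. Writing $n$ for the number of training rows, $\beta_0$ for the intercept, and $\beta$ for the coefficient vector (symbol names assumed here for illustration), the derivation can be sketched as:

$$
J(\beta_0, \beta) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \qquad \hat{y}_i = x_i^{\top}\beta + \beta_0
$$

$$
\frac{\partial J}{\partial \beta_0} = -\frac{2}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right), \qquad
\frac{\partial J}{\partial \beta} = -\frac{2}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)x_i
$$

These two expressions correspond to `intercept_der` and `coef_der`, respectively.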
    
        

In [11]:
# Adjust learning_rate and epochs to tune how well the fitted model matches the data
GDR=GDRegressor(epochs=1000,learning_rate=0.5)
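
To illustrate why the learning rate matters, here is a small sketch on a one-dimensional cost f(w) = w**2 rather than on the diabetes data. The function name `descend` and the specific rates are illustrative assumptions.

```python
def descend(learning_rate, steps=50, start=1.0):
    """Run gradient descent on f(w) = w**2, whose gradient is 2*w."""
    w = start
    for _ in range(steps):
        w -= learning_rate * 2 * w   # step against the gradient
    return w

# Compare a small, a well-chosen, and an overly large learning rate
for lr in (0.01, 0.5, 1.1):
    print(f"learning_rate={lr}: w after 50 steps = {descend(lr):.4g}")
```

For this particular cost, a rate of 0.01 approaches the minimum at w = 0 slowly, a rate of 0.5 reaches it in a single step, and any rate above 1.0 makes the magnitude of w grow at every step. The thresholds depend on the curvature of the cost function, so the rates that work for the diabetes data will differ.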

# Fit data to the model

In [12]:
GDR.fit(x_train,y_train)

152.01351687661833 [  14.38990585 -173.7235727   491.54898524  323.91524824  -39.32648042
 -116.01061213 -194.04077415  103.38135565  451.63448787   97.57218278]


# Prediction from Model

In [13]:
y_pred=GDR.predict(x_test)
y_pred

array([152.26392304, 198.96222354, 127.66111541, 104.59596478,
       265.23062371, 252.09467525, 112.76592254, 115.72549839,
        96.37765691, 187.64845451, 144.9482918 , 172.110596  ,
       178.81497695, 136.51444368, 292.15564227,  87.25795061,
       202.18473262, 149.11155912, 132.30895031, 128.70828962,
       148.38757935, 171.81318343, 150.93593445, 174.47559507,
       127.76388814, 221.82234243, 199.96855698, 101.54518353,
        54.85644772, 237.61948938, 244.2801351 , 112.91877003,
        68.12192242,  96.00468527, 204.32975531, 163.99882781,
       160.95172334, 191.90398957, 113.33794145, 238.46002509,
       141.40211434, 120.45598718, 188.12639096, 186.46474321,
       174.98259299, 143.24561624, 168.80798895, 299.18508813,
       105.40854525, 169.51466009, 254.37509674, 142.60026818,
       151.7158263 , 122.70403085, 191.52875115,  94.27792144,
       129.03875584,  75.96073902, 157.91752518, 156.36603694,
       163.20324594, 160.93274887, 102.3002858 , 227.76

# r2_score of GDR model

In [14]:
print("r2_score of GDR model is :",r2_score(y_test,y_pred))

r2_score of GDR model is : 0.4534503034722803


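# On this split, the GDR model's R2 score (0.4535) is slightly higher than LinearRegression's (0.4399)

Both models minimize the same squared-error cost, and the GDR coefficients still differ noticeably from the LinearRegression coefficients, so gradient descent has not fully converged after 1000 epochs. The small gain on the test set reflects this particular split and stopping point; it does not show that gradient descent is a more accurate method in general.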
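
Because gradient descent on squared error and ordinary least squares minimize the same cost, gradient descent should approach the least-squares solution given enough epochs. The sketch below checks this with NumPy only. The synthetic data, the `gd_fit` helper, and the hyperparameters are assumptions made for illustration, and the numbers it prints do not reproduce the diabetes results.

```python
import numpy as np

def gd_fit(X, y, learning_rate=0.1, epochs=1000):
    """Full-batch gradient descent on mean squared error for a linear model."""
    coef = np.ones(X.shape[1])
    intercept = 0.0
    n = X.shape[0]
    for _ in range(epochs):
        residuals = y - (X @ coef + intercept)
        intercept -= learning_rate * (-2.0 * residuals.mean())
        coef -= learning_rate * (-2.0 * (residuals @ X) / n)
    return coef, intercept

# Synthetic data (illustrative): a linear signal plus a little noise
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(300, 3))
y_syn = X_syn @ np.array([1.5, -2.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=300)

# Closed-form least squares: append a column of ones for the intercept
X_aug = np.column_stack([X_syn, np.ones(len(X_syn))])
solution, *_ = np.linalg.lstsq(X_aug, y_syn, rcond=None)

# The gap to the least-squares solution shrinks as the number of epochs grows
for epochs in (5, 50, 1000):
    coef, intercept = gd_fit(X_syn, y_syn, epochs=epochs)
    gap = np.max(np.abs(np.append(coef, intercept) - solution))
    print(f"epochs={epochs:4d}  max gap to least squares = {gap:.2e}")
```

On well-scaled features like these the gap closes quickly. The diabetes features are scaled to very small values, which makes convergence much slower and is consistent with the GDR coefficients above still differing from the LinearRegression ones.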