<h1 align="center">Stochastic Gradient Descent</h1>

### __SGD__
1. In SGD, each update uses only __a single randomly chosen training example__ (or a small batch) to estimate the gradient and update the model parameters, instead of the entire dataset. This __random selection introduces randomness__ into the optimization process, hence the term __stochastic__ in Stochastic Gradient Descent.
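
The difference from batch gradient descent can be written compactly. The notation here is an assumption introduced for illustration: $\theta$ stands for the parameters, $\eta$ for the learning rate, and $L_i(\theta)$ for the loss on the $i$-th of $n$ training examples.

$$
\begin{aligned}
\text{Batch GD:}\quad & \theta \leftarrow \theta - \eta \, \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta L_i(\theta) \\
\text{SGD:}\quad & \theta \leftarrow \theta - \eta \, \nabla_\theta L_i(\theta), \qquad i \text{ drawn uniformly at random from } \{1, \dots, n\}
\end{aligned}
$$

Averaged over the random choice of $i$, the SGD step points in the same direction as the batch step, but any single step can deviate from it.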

### Advantages  

1. <code>Speed</code>: Each SGD update is __faster__ than an update in other variants of Gradient Descent such as Batch Gradient Descent and Mini-Batch Gradient Descent, since the gradient is computed from only one example.



2. <code>Memory Efficiency</code>: Since SGD processes one training example at a time, it is __memory-efficient__ and can __handle large datasets__ that do not fit into memory.



3. <code>Avoidance of Local Minima</code>: Due to the noisy updates in SGD, it has some __ability to escape from shallow local minima__ on non-convex loss surfaces, although convergence to a global minimum is not guaranteed.
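
The trade-off behind these points (cheap but noisy updates) can be seen by comparing one per-example gradient with the full-batch gradient. This is a minimal sketch on synthetic data; `X_demo`, `y_demo`, `true_w` and `sample_gradient` are hypothetical names made up for the illustration and are not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (hypothetical, for illustration only)
n_samples, n_features = 200, 3
X_demo = rng.normal(size=(n_samples, n_features))
true_w = np.array([2.0, -1.0, 0.5])
y_demo = X_demo @ true_w + rng.normal(scale=0.1, size=n_samples)

w = np.zeros(n_features)  # current parameter guess


def sample_gradient(i, w):
    """Gradient of the squared error (y_i - x_i . w)^2 for a single example."""
    residual = y_demo[i] - X_demo[i] @ w
    return -2.0 * residual * X_demo[i]


# Full-batch gradient: the mean of all per-example gradients
per_sample = np.array([sample_gradient(i, w) for i in range(n_samples)])
full_gradient = per_sample.mean(axis=0)

# One SGD step only sees one of these rows, chosen at random
idx = rng.integers(0, n_samples)
print("single-example gradient:", per_sample[idx])
print("full-batch gradient:    ", full_gradient)
print("spread across examples: ", per_sample.std(axis=0))
```

The single-example gradient costs one row of work instead of all 200, and the spread printed on the last line is the noise that each SGD step carries.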



### Disadvantages

1. <code>Noisy updates</code>: The updates in SGD are noisy and have a high variance, which can make the __optimization process less stable__ and lead to __oscillations around the minimum__.



2. <code>Slow Convergence</code>: SGD may require __more iterations to converge__ to the minimum since each update follows a noisy single-example estimate of the gradient rather than the gradient of the full dataset.



3. <code>Sensitivity to Learning Rate</code>: The choice of learning rate can be critical in SGD since using a __high learning rate__ can cause the algorithm to __overshoot__ the minimum, while a __low learning rate__ can make the algorithm __converge slowly__.



4. <code>Less Accurate</code>: Due to the noisy updates, SGD may keep fluctuating near the minimum instead of settling exactly on it, which can result in __a suboptimal solution__. This can be mitigated by using techniques such as __learning rate scheduling__ and __momentum-based updates__.
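
The two mitigations named in the last item can be sketched for linear regression as follows. This is an illustrative sketch under stated assumptions, not a reference implementation: the function `sgd_momentum`, the inverse-time decay schedule, the default hyperparameters and the synthetic data are all made up for the example.

```python
import numpy as np


def sgd_momentum(X, y, lr0=0.005, decay=0.01, momentum=0.9, epochs=50, seed=0):
    """SGD for linear regression with inverse-time learning rate decay and momentum."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    v_w = np.zeros(n_features)  # velocity for the coefficients
    v_b = 0.0                   # velocity for the intercept

    for epoch in range(epochs):
        # Learning rate scheduling: the step size shrinks as epochs go by
        lr = lr0 / (1.0 + decay * epoch)
        for _ in range(n_samples):
            idx = rng.integers(0, n_samples)  # random row, with replacement
            residual = y[idx] - (X[idx] @ w + b)
            grad_w = -2.0 * residual * X[idx]
            grad_b = -2.0 * residual
            # Momentum: the velocity is a decaying sum of past gradient steps
            v_w = momentum * v_w - lr * grad_w
            v_b = momentum * v_b - lr * grad_b
            w = w + v_w
            b = b + v_b
    return w, b


# Synthetic data (hypothetical, for illustration only)
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y_demo = X_demo @ true_w + 3.0 + rng.normal(scale=0.1, size=200)

w_hat, b_hat = sgd_momentum(X_demo, y_demo)
print(w_hat, b_hat)
```

Momentum averages out part of the step-to-step noise, and the decaying learning rate shrinks the oscillations around the minimum in later epochs.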

### Import Libraries

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes

In [2]:
X,y = load_diabetes(return_X_y=True)

In [3]:
X

array([[ 0.03807591,  0.05068012,  0.06169621, ..., -0.00259226,
         0.01990842, -0.01764613],
       [-0.00188202, -0.04464164, -0.05147406, ..., -0.03949338,
        -0.06832974, -0.09220405],
       [ 0.08529891,  0.05068012,  0.04445121, ..., -0.00259226,
         0.00286377, -0.02593034],
       ...,
       [ 0.04170844,  0.05068012, -0.01590626, ..., -0.01107952,
        -0.04687948,  0.01549073],
       [-0.04547248, -0.04464164,  0.03906215, ...,  0.02655962,
         0.04452837, -0.02593034],
       [-0.04547248, -0.04464164, -0.0730303 , ..., -0.03949338,
        -0.00421986,  0.00306441]])

In [4]:
y

array([151.,  75., 141., 206., 135.,  97., 138.,  63., 110., 310., 101.,
        69., 179., 185., 118., 171., 166., 144.,  97., 168.,  68.,  49.,
        68., 245., 184., 202., 137.,  85., 131., 283., 129.,  59., 341.,
        87.,  65., 102., 265., 276., 252.,  90., 100.,  55.,  61.,  92.,
       259.,  53., 190., 142.,  75., 142., 155., 225.,  59., 104., 182.,
       128.,  52.,  37., 170., 170.,  61., 144.,  52., 128.,  71., 163.,
       150.,  97., 160., 178.,  48., 270., 202., 111.,  85.,  42., 170.,
       200., 252., 113., 143.,  51.,  52., 210.,  65., 141.,  55., 134.,
        42., 111.,  98., 164.,  48.,  96.,  90., 162., 150., 279.,  92.,
        83., 128., 102., 302., 198.,  95.,  53., 134., 144., 232.,  81.,
       104.,  59., 246., 297., 258., 229., 275., 281., 179., 200., 200.,
       173., 180.,  84., 121., 161.,  99., 109., 115., 268., 274., 158.,
       107.,  83., 103., 272.,  85., 280., 336., 281., 118., 317., 235.,
        60., 174., 259., 178., 128.,  96., 126., 28

In [5]:
X.shape, y.shape

((442, 10), (442,))

#### Train test split

In [6]:
from sklearn.model_selection import train_test_split

In [7]:
X_train,X_test, y_train, y_test=train_test_split(X,y, test_size=0.2, random_state=2)

#### Model Training

In [8]:
from sklearn.linear_model import LinearRegression

In [9]:
model = LinearRegression()
model.fit(X_train, y_train)

LinearRegression()

In [10]:
model.coef_

array([  -9.16088483, -205.46225988,  516.68462383,  340.62734108,
       -895.54360867,  561.21453306,  153.88478595,  126.73431596,
        861.12139955,   52.41982836])

In [11]:
model.intercept_

151.88334520854633

#### Performance metrics

In [12]:
from sklearn.metrics import r2_score

In [13]:
y_pred = model.predict(X_test)
print("r2_score = ", r2_score(y_test,y_pred))

r2_score =  0.4399387660024645
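
For reference, the value reported by `r2_score` is the coefficient of determination, $R^2 = 1 - SS_{res} / SS_{tot}$. The sketch below computes it by hand; `r2_manual` is a hypothetical helper and the toy values are made up for illustration.

```python
import numpy as np


def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # variation the model leaves unexplained
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot


# Toy values, made up for illustration
toy_r2 = r2_manual([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
print(round(toy_r2, 4))  # 0.9486
```

A score of 1 means perfect predictions, and a score of 0 means the model does no better than always predicting the mean of `y_true`.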


### SGDRegressor Class

In [14]:
X.shape

(442, 10)

In [21]:
class SGDRegressor:

    def __init__(self, learning_rate=0.01, epochs=100):
        self.coef_ = None
        self.intercept_ = None
        self.lr = learning_rate
        self.epochs = epochs

    def fit(self, X_train, y_train):
        # Initialize parameters: intercept b = 0, coefficients b1, b2, ..., bn = 1
        self.intercept_ = 0

        # One coefficient per feature (column) of X_train
        self.coef_ = np.ones(X_train.shape[1])

        for i in range(self.epochs):
            # One epoch = as many single-row updates as there are rows
            for j in range(X_train.shape[0]):
                # Random row index, sampled with replacement (0 to 352 for the 353 training rows here)
                idx = np.random.randint(0, X_train.shape[0])
                y_hat = np.dot(X_train[idx], self.coef_) + self.intercept_  # y_hat = b + wT.x

                # Derivative of (y - y_hat)^2 with respect to the intercept
                intercept_derv = -2 * (y_train[idx] - y_hat)
                self.intercept_ = self.intercept_ - (self.lr * intercept_derv)

                # Derivative of (y - y_hat)^2 with respect to the coefficients
                coef_derv = -2 * (y_train[idx] - y_hat) * X_train[idx]
                self.coef_ = self.coef_ - (self.lr * coef_derv)

        print(self.intercept_, self.coef_)

    def predict(self, X_test):
        return np.dot(X_test, self.coef_) + self.intercept_
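
The `intercept_derv` and `coef_derv` expressions in `fit` come from differentiating the squared error of the single sampled row. The symbols are chosen here for illustration: $b$ stands for `intercept_`, $w$ for `coef_`, $\eta$ for `lr`, and $(x_i, y_i)$ for the row at index `idx`.

$$
\begin{aligned}
\hat{y}_i &= b + w^\top x_i, \qquad L_i = (y_i - \hat{y}_i)^2 \\
\frac{\partial L_i}{\partial b} &= -2\,(y_i - \hat{y}_i), \qquad \nabla_w L_i = -2\,(y_i - \hat{y}_i)\, x_i \\
b &\leftarrow b - \eta \, \frac{\partial L_i}{\partial b}, \qquad w \leftarrow w - \eta \, \nabla_w L_i
\end{aligned}
$$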

In [23]:
gd = SGDRegressor(epochs=10)

In [24]:
gd.fit(X_train, y_train)  # train for 10 epochs

144.2818311847835 [ 29.8288291    5.38106884 126.26012747  93.72738329  37.80234636
  27.94029661 -75.79005043  77.3961412  120.10355706  69.16175625]


In [35]:
gd = SGDRegressor(epochs=100)
gd.fit(X_train,y_train)

152.0872105768451 [  26.56647128 -126.6927916   457.522761    304.65065331  -32.26954774
  -96.16596436 -197.86459445  109.84692136  401.95586828  111.49259611]


In [34]:
y_pred = gd.predict(X_test)

In [None]:
r2_score(y_test,y_pred)

In [17]:
np.random.randint(0,X_train.shape[0])

270

In [18]:
coef_ = np.ones(X_train.shape[1])

In [19]:
intercept_ = 0

In [20]:
np.dot(X_train[270],coef_) + intercept_ 

-0.06471576166478094

In [36]:
import time

In [37]:
start = time.time()
sgd = SGDRegressor(epochs=10)
sgd.fit(X_train,y_train)
print("Time taken is ", time.time()-start)


146.51277027801356 [ 38.84127986   1.13802609 131.06524011  97.77657188  32.35504551
  19.09808829 -73.85403877  71.54903363 123.25580051  70.65376942]
Time taken is  0.08668971061706543
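
For comparison, scikit-learn ships its own `SGDRegressor`. The sketch below times it on the same split. The hyperparameters are assumptions chosen to resemble the hand-written class (constant learning rate, fixed number of epochs) and are not tuned, and the import alias `SklearnSGDRegressor` is only there to avoid clashing with the class defined above.

```python
import time

from sklearn.datasets import load_diabetes
from sklearn.linear_model import SGDRegressor as SklearnSGDRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2
)

start = time.perf_counter()
sk_sgd = SklearnSGDRegressor(
    learning_rate="constant",  # keep the step size fixed, like the class above
    eta0=0.01,                 # initial (here: constant) learning rate
    max_iter=100,              # number of passes over the training data
    tol=None,                  # disable early stopping so all epochs run
    random_state=0,
)
sk_sgd.fit(X_train, y_train)
elapsed = time.perf_counter() - start

sk_r2 = r2_score(y_test, sk_sgd.predict(X_test))
print("Time taken is", elapsed)
print("r2_score =", sk_r2)
```

The numbers are not expected to match the hand-written class exactly: scikit-learn applies an L2 penalty by default, shuffles the rows each epoch instead of sampling with replacement, and its squared-error loss carries a factor of 1/2, so the same `eta0` corresponds to a smaller effective step.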
