# Stochastic Gradient Descent
[source](https://www.youtube.com/watch?v=V7KBAa_gh4c&list=PLKnIA16_Rmvbr7zKYQuBfsVkjoLcJgxHH&index=58&pp=iAQB)
### Drawbacks of Batch Gradient Descent:  
1. **Number of Computations** <br>
To update each column's coefficient plus the intercept, we have to calculate n derivative terms (one per row), and this whole process is repeated once per epoch.
For example, with rows = 1000, columns = 5 (meaning 5 + 1 = 6 parameters) and epochs = 100, you have to calculate 1000 x 6 x 100 = 600000 derivative terms. That is a lot, and this example dataset is not even big; it is below average in size.
2. **Hardware**<br>
With a vectorized implementation, you have to load the whole X_train into RAM. The bigger the data, the more RAM you need, so batch gradient descent is **resource intensive**.
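
The count in point 1 can be checked with a few lines of Python. This is only a sketch of the arithmetic: the helper names `derivative_count` and `dense_float64_bytes` are made up for illustration, and the memory estimate assumes the features are stored as a dense float64 array (8 bytes per value).

```python
def derivative_count(n_rows, n_columns, epochs):
    """Total derivative terms batch gradient descent computes."""
    n_params = n_columns + 1  # one coefficient per column plus the intercept
    return n_rows * n_params * epochs


def dense_float64_bytes(n_rows, n_columns):
    """Assumed RAM needed to hold the feature matrix as dense float64."""
    return n_rows * n_columns * 8


print(derivative_count(1000, 5, 100))  # 600000
print(dense_float64_bytes(1000, 5))    # 40000
```

Both numbers grow linearly with the number of rows, which is why the cost becomes a problem on big data.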

In batch gradient descent, we update the coefficients after going through all of the data (1 epoch = 1 update).  
In stochastic gradient descent, we update the coefficients after each row, so updates happen far more often. The large number of updates usually gets you to convergence sooner (converged: the coefficients have settled at the values that minimize the loss), so you generally need fewer epochs.    
<br>
In stochastic gradient descent we also don't need to load the whole dataset; only one row has to be loaded at a time.  
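
To make the one-row-at-a-time idea concrete, here is a minimal sketch in which rows arrive from a generator and the coefficients are updated after each one, so the full dataset never sits in memory. The generator `stream_rows`, the noise-free synthetic data, and the learning rate are all assumptions for illustration, not part of the lesson's code. Rows are consumed in arrival order here rather than picked at random.

```python
import numpy as np


def stream_rows(n_rows, n_features, seed=0):
    """Hypothetical data source: yields one (x, y) pair at a time."""
    rng = np.random.default_rng(seed)
    true_coef = np.arange(1, n_features + 1, dtype=float)  # [1, 2, 3]
    for _ in range(n_rows):
        x = rng.normal(size=n_features)
        y = x @ true_coef + 3.0  # true intercept is 3
        yield x, y


coef = np.zeros(3)
intercept = 0.0
lr = 0.01
n_updates = 0

for x, y in stream_rows(5000, 3):
    error = y - (x @ coef + intercept)
    # same gradient step as squared-error SGD: param -= lr * (-2 * error * input)
    intercept += lr * 2 * error
    coef += lr * 2 * error * x
    n_updates += 1

print(n_updates)                 # 5000 updates from a single pass
print(np.round(coef, 3))         # close to [1, 2, 3]
print(round(float(intercept), 3))  # close to 3
```

Batch gradient descent would have made exactly one update from the same single pass over these rows.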

Now we know that it picks one row and updates the coefficients using that row. The row is picked at random, and because of this random pick, stochastic gradient descent is not a steady solution: every run gives slightly different results, but they are close to each other.
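
The "different but close" behaviour is easy to see by running the same SGD loop twice with different random seeds. The sketch below uses small synthetic data and a helper called `sgd_fit`; both are assumptions made for this illustration, not the lesson's dataset or class.

```python
import numpy as np

# Synthetic data (assumed for this sketch): y = 2*x1 - 1*x2 + 5 + noise
data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(200, 2))
y = X @ np.array([2.0, -1.0]) + 5.0 + data_rng.normal(scale=0.5, size=200)


def sgd_fit(X, y, seed, n_updates=5000, lr=0.01):
    """Plain SGD for linear regression; `seed` controls which rows get picked."""
    rng = np.random.default_rng(seed)
    coef = np.zeros(X.shape[1])
    intercept = 0.0
    for _ in range(n_updates):
        idx = rng.integers(0, X.shape[0])  # pick one random row
        error = y[idx] - (X[idx] @ coef + intercept)
        intercept += lr * 2 * error
        coef += lr * 2 * error * X[idx]
    return np.append(coef, intercept)


run_a = sgd_fit(X, y, seed=1)
run_b = sgd_fit(X, y, seed=2)
print(run_a)
print(run_b)
print(np.abs(run_a - run_b).max())  # small, but not zero
```

Fixing the seed makes a run repeatable; changing it moves the result a little around the same solution.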

## Code

In [1]:

from sklearn.datasets import load_diabetes
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

In [2]:
data = load_diabetes()
X = data.data
y = data.target

In [3]:
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=2)

In [4]:
lr = LinearRegression()
lr.fit(X_train,y_train)
y_pred = lr.predict(X_test)
r2_score(y_test,y_pred)

0.4399338661568969

In [5]:
print(lr.coef_)

[  -9.15865318 -205.45432163  516.69374454  340.61999905 -895.5520019
  561.22067904  153.89310954  126.73139688  861.12700152   52.42112238]


In [6]:
print(lr.intercept_)

151.88331005254167


In [387]:
class SGD_Regressor:
    def __init__(self,lr=0.01,epochs=100):
        self.coef_ = None
        self.intercept_ = None
        self.lr = lr
        # fit() makes one single-row update per loop iteration,
        # so `epochs` here counts updates, not full passes over the data
        self.epochs = epochs

    def fit(self,X_train,y_train):
        # initializing coefficients
        self.intercept_ = 0
        self.coef_ = np.ones(X_train.shape[1])

        for i in range(self.epochs):
            # picking one random row instead of the whole X_train
            idx = np.random.randint(0,X_train.shape[0])
            y_hat = np.dot(X_train[idx],self.coef_) + self.intercept_

            # reducing the learning rate with each iteration to reduce randomness near convergence
            lr_t = self.lr / (1 + 0.0001 * i)

            # updating intercept
            inter_der = -2 * (y_train[idx] - y_hat)
            self.intercept_ = self.intercept_ - (lr_t * inter_der)

            # updating coef
            coef_der = -2 * np.dot((y_train[idx] - y_hat), X_train[idx])
            self.coef_ = self.coef_ - (lr_t * coef_der)

        print(self.intercept_,self.coef_)

    def predict(self,X_test):
        return np.dot(X_test,self.coef_) + self.intercept_
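
The two derivative lines in `fit` follow from the squared error on the single picked row. Here is a short derivation, using symbols chosen for illustration: $\beta_0$ for the intercept, $\beta_j$ for the coefficient of column $j$, $m$ for the number of columns, and $\eta_t$ for the decayed learning rate at iteration $t$.

$$
L_i = (y_i - \hat{y}_i)^2, \qquad \hat{y}_i = \beta_0 + \sum_{j=1}^{m} \beta_j x_{ij}
$$

$$
\frac{\partial L_i}{\partial \beta_0} = -2\,(y_i - \hat{y}_i), \qquad
\frac{\partial L_i}{\partial \beta_j} = -2\,(y_i - \hat{y}_i)\,x_{ij}
$$

$$
\beta_0 \leftarrow \beta_0 - \eta_t \frac{\partial L_i}{\partial \beta_0}, \qquad
\beta_j \leftarrow \beta_j - \eta_t \frac{\partial L_i}{\partial \beta_j}, \qquad
\eta_t = \frac{\eta}{1 + 0.0001\,t}
$$

These correspond to `inter_der`, `coef_der`, and `lr_t` in the class.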

In [410]:
import time
sgd = SGD_Regressor(lr = 0.1, epochs = 5000)
start = time.time()
sgd.fit(X_train,y_train)
print('time taken: ',time.time() - start) # even though epochs are higher, the time taken is less than batch gd

148.61898497145143 [  38.51097004 -162.10556453  471.90050959  297.30738209  -43.53146628
 -102.79441453 -204.26658905  100.23392508  406.07460236  120.21988358]
time taken:  0.35077691078186035


In [411]:
y_pred1 = sgd.predict(X_test)
r2_score(y_test,y_pred1)

0.45490112697487217
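
The class shrinks its step size as `lr / (1 + 0.0001 * i)`. A small sketch of how slowly that decays over the 5000 iterations used in the run above; the helper name `decayed_lr` is made up for illustration.

```python
def decayed_lr(lr, i, decay=0.0001):
    """Learning rate at iteration i, same formula as `lr_t` in the class."""
    return lr / (1 + decay * i)


for i in (0, 1000, 4999):
    print(i, round(decayed_lr(0.1, i), 4))
# 0 0.1
# 1000 0.0909
# 4999 0.0667
```

So over 5000 iterations the step only shrinks by about a third. A larger decay constant would damp the late-stage randomness more strongly, at the cost of slower progress.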

![ytss](assets/sgdVSbgd.png)

In both graphs, the right side shows the behaviour of stochastic gradient descent, while the left side shows batch gradient descent.

## Using Sklearn

In [450]:
from sklearn.linear_model import SGDRegressor

sgd = SGDRegressor(
    loss="squared_error",
    penalty="l2",
    max_iter=6800,
    learning_rate="invscaling",
    eta0=0.01
)

sgd.fit(X_train, y_train)
print(sgd.coef_, sgd.intercept_)
r2_score(y_test,sgd.predict(X_test))

[  24.73926533 -149.24748997  459.29686781  307.07156088  -27.005906
  -98.01854655 -191.0497421   109.3303328   417.73507253  107.62713172] [152.03552604]




0.4542511346032271
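
For comparison with the hand-written decay, scikit-learn documents the `invscaling` schedule as `eta = eta0 / pow(t, power_t)`. The sketch below only evaluates that formula and does not call scikit-learn. It assumes `power_t=0.25`, which I believe is the documented default for `SGDRegressor` (the cell above does not override it, but check your installed version), and it treats `t` as a plain update counter starting at 1.

```python
def invscaling_eta(eta0, t, power_t=0.25):
    """Step size under the documented invscaling formula (assumed defaults)."""
    return eta0 / (t ** power_t)


for t in (1, 16, 10000):
    print(t, round(invscaling_eta(0.01, t), 6))
# 1 0.01
# 16 0.005
# 10000 0.001
```

Under these assumptions the step drops quickly at first and then flattens out, which is a different shape from the gentle `1 / (1 + 0.0001 * i)` decay in the hand-written class.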

In [1]:
0.0001 * 2


0.0002