
In [16]:
from sklearn.datasets import load_diabetes
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import time
import random

x,y=load_diabetes(return_X_y=True)

print(x.shape,y.shape)

(442, 10) (442,)


In [17]:
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=2)

st=time.time()

reg=LinearRegression()
reg.fit(x_train,y_train)

print("time takes ->", time.time()-st)

print(reg.coef_ ,reg.intercept_)

y_pred=reg.predict(x_test)
print("r2 score -> ",r2_score(y_test,y_pred))

time takes -> 0.005373716354370117
[  -9.15865318 -205.45432163  516.69374454  340.61999905 -895.5520019
  561.22067904  153.89310954  126.73139688  861.12700152   52.42112238] 151.88331005254167
r2 score ->  0.4399338661568968


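**Stochastic Gradient Descent**:

Uses a single randomly chosen sample at a time to compute the gradient and update the parameters.

An update happens after every sample.

📌 Example:
Suppose there are 1000 data points.

Pick 1 data point → update weights.

Pick another data point → update again.

This repeats 1000 times per epoch.

✅ Pros: Each update is cheap, so the parameters start improving right away; the noise in the updates can help escape local minima.

❌ Cons: Very noisy updates; with a constant learning rate the parameters may “bounce” around the optimum instead of settling on it.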
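
The update rule behind this can be written out explicitly. As a sketch, assuming the per-sample loss is the squared error (the symbols $w$, $b$, and $\eta$ are introduced here for illustration and stand for the coefficients, the intercept, and the learning rate):

$$
\begin{aligned}
\hat{y}_i &= w^\top x_i + b, \qquad L_i = (y_i - \hat{y}_i)^2 \\
\frac{\partial L_i}{\partial b} &= -2\,(y_i - \hat{y}_i), \qquad \frac{\partial L_i}{\partial w} = -2\,(y_i - \hat{y}_i)\,x_i \\
b &\leftarrow b - \eta\,\frac{\partial L_i}{\partial b}, \qquad w \leftarrow w - \eta\,\frac{\partial L_i}{\partial w}
\end{aligned}
$$

Here $(x_i, y_i)$ is the one sample picked for the current update.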

In [18]:
class StochasticDGR:
  """Linear regression fitted with stochastic gradient descent (one sample per update)."""

  def __init__(self,learning_rate,epochs=50):
    self.learning_rate=learning_rate
    self.epochs=epochs
    self.coef_=None
    self.intercept_=None

  def fit(self,x_train,y_train):
    # start from intercept 0 and all coefficients 1
    self.intercept_=0
    self.coef_=np.ones(x_train.shape[1])

    for i in range(self.epochs):
      for j in range(x_train.shape[0]):
        # draw one index with replacement, so an epoch may repeat or skip samples
        idx=np.random.randint(0,x_train.shape[0])

        # prediction for the chosen sample
        y_cap=np.dot(x_train[idx],self.coef_)+self.intercept_

        # gradient of the squared error w.r.t. the intercept
        intercept_diff=-2*(y_train[idx]-y_cap)
        self.intercept_=self.intercept_-(self.learning_rate * intercept_diff)

        # gradient of the squared error w.r.t. the coefficients
        coef_diff=-2 * (y_train[idx]-y_cap) * x_train[idx]
        self.coef_=self.coef_-(self.learning_rate* coef_diff)

    print("intercept_ -> ",self.intercept_)
    print("coef_ -> ",self.coef_)

  def predict(self,x_test):
    return np.dot(x_test,self.coef_)+self.intercept_


In [19]:
st=time.time()

sgd=StochasticDGR(0.01,50)
sgd.fit(x_train,y_train)

print("time takes ->", time.time()-st)

y_pred=sgd.predict(x_test)
print("r2 score -> ",r2_score(y_test,y_pred))

intercept_ ->  154.12237709236007
coef_ ->  [  55.32283926  -67.8570262   350.06049801  243.41546453    9.24196429
  -32.69348633 -178.06601024  131.56950897  315.85384302  126.0698431 ]
time takes -> 0.21212029457092285
r2 score ->  0.4348347947492889


In [20]:
x_test.shape,sgd.coef_.shape

((89, 10), (10,))

In [21]:
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import r2_score

sgd = SGDRegressor(max_iter=90, learning_rate='constant', eta0=0.01)

# Fit the model
sgd.fit(x_train, y_train)

# Now you can predict
y_pred = sgd.predict(x_test)

print("r2 score -> ", r2_score(y_test, y_pred))


r2 score ->  0.42664387198535625




**Mini-Batch Gradient Descent (most popular 🚀)**

Uses a small batch (subset) of the data (e.g. 32, 64, or 128 samples) to compute the gradient and update the parameters.

An update happens once per batch.

📌 Example:
Suppose there are 1000 data points and batch size = 100.

First 100 → compute gradient → update.

Next 100 → update again.

So there are 10 updates per epoch.

✅ Pros: Balances speed and stability.
✅ Works well on GPUs, since a whole batch can be processed in parallel.
❌ Gradient estimates are noisier than in full-batch gradient descent, but each update is much cheaper.
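
The "first 100, next 100" walk through the data can be sketched as follows. This is a minimal illustration, not the notebook's method: the helper name `iterate_minibatches` and the `seed` parameter are assumptions made here, and it shuffles once per epoch so that every sample is used exactly once, whereas the class below draws each batch independently at random.

```python
import random

def iterate_minibatches(n_samples, batch_size, seed=None):
    """Yield lists of indices that cover a shuffled dataset once (one epoch)."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    # consecutive slices of the shuffled indices form the batches
    for start in range(0, n_samples, batch_size):
        yield indices[start:start + batch_size]

batches = list(iterate_minibatches(n_samples=1000, batch_size=100, seed=0))
print(len(batches))     # updates per epoch -> 10
print(len(batches[0]))  # samples per update -> 100
```

If `n_samples` is not a multiple of `batch_size`, the last batch in this sketch is simply shorter.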

In [25]:
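class MiniBatchDGR:
  """Linear regression fitted with mini-batch gradient descent."""

  def __init__(self,learning_rate,batch_size,epochs=50):
    self.learning_rate=learning_rate
    self.epochs=epochs
    self.batch_size=batch_size
    self.coef_=None
    self.intercept_=None

  def fit(self,x_train,y_train):
    # start from intercept 0 and all coefficients 1
    self.intercept_=0
    self.coef_=np.ones(x_train.shape[1])

    for i in range(self.epochs):
      # number of updates per epoch = number of full batches that fit in the data
      for j in range(x_train.shape[0]//self.batch_size):
        # random indices: unique within a batch, but batches are drawn independently
        idxs=random.sample(range(0,x_train.shape[0]),self.batch_size)

        # predictions for the whole batch
        y_cap=np.dot(x_train[idxs],self.coef_)+self.intercept_

        # intercept gradient, averaged over the batch
        intercept_diff=-2*np.mean(y_train[idxs]-y_cap)
        self.intercept_=self.intercept_-(self.learning_rate * intercept_diff)

        # coefficient gradient, summed (not averaged) over the batch, so the
        # coefficients effectively use a step of learning_rate * batch_size
        coef_diff=-2 * np.dot((y_train[idxs]-y_cap),x_train[idxs])
        self.coef_=self.coef_-(self.learning_rate* coef_diff)

    print("intercept_ -> ",self.intercept_)
    print("coef_ -> ",self.coef_)

  def predict(self,x_test):
    return np.dot(x_test,self.coef_)+self.intercept_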
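
The two gradients in `fit` above are scaled differently: the intercept gradient is a mean over the batch, while the coefficient gradient is a sum. The following is a minimal sketch of a variant in which both are averaged, so that a single learning rate means the same thing for both. The helper name `minibatch_gradients` and the random toy data are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def minibatch_gradients(x_batch, y_batch, coef, intercept):
    """Gradients of the mean squared error over one batch."""
    residual = y_batch - (x_batch @ coef + intercept)
    grad_intercept = -2 * residual.mean()
    # divide by the batch size so this is a mean, like the intercept gradient
    grad_coef = -2 * (residual @ x_batch) / x_batch.shape[0]
    return grad_coef, grad_intercept

# toy batch with the same shape as one batch in the notebook (35 samples, 10 features)
rng = np.random.default_rng(0)
x_batch = rng.normal(size=(35, 10))
y_batch = rng.normal(size=35)
coef = np.ones(10)

grad_coef, grad_intercept = minibatch_gradients(x_batch, y_batch, coef, 0.0)

# the summed gradient is exactly batch_size times the averaged one
summed = -2 * ((y_batch - x_batch @ coef) @ x_batch)
print(np.allclose(summed, 35 * grad_coef))  # True
```

With averaged gradients, a learning rate tuned for the summed version would likely need to be larger to make comparable progress.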

In [32]:
mBgd=MiniBatchDGR(0.05,35,50)
mBgd.fit(x_train,y_train)

y_pred=mBgd.predict(x_test)
print("r2 score -> ",r2_score(y_test,y_pred))

intercept_ ->  151.66867372970202
coef_ ->  [  -1.58710876 -199.80902407  513.67470919  339.42296811  -62.17689375
 -140.14879445 -197.48012816   86.97636538  511.64217417   87.32257704]
r2 score ->  0.44660068671052766



In [37]:
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import r2_score

mBgd = SGDRegressor(learning_rate='constant', eta0=0.2)

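# Fit the model on random mini-batches via partial_fit
batch_size=50
n_batches=50  # number of partial_fit calls; each sees one random batch, not a full pass over x_train
for i in range(n_batches):
  idxs=random.sample(range(x_train.shape[0]),batch_size)
  # partial_fit runs one pass of SGD over the samples it is given
  mBgd.partial_fit(x_train[idxs],y_train[idxs])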

# Now you can predict
y_pred = mBgd.predict(x_test)

print("r2 score -> ", r2_score(y_test, y_pred))

r2 score ->  0.4435140659732463
