
Mini-Batch Gradient Descent:


---

Mini-Batch Gradient Descent is an optimization algorithm used in training machine learning models, especially neural networks. It combines the advantages of Batch Gradient Descent and Stochastic Gradient Descent (SGD).



---

What is it?

Instead of using the entire dataset (Batch GD) or just one sample (SGD) at a time to compute the gradient, Mini-Batch Gradient Descent uses a small subset (mini-batch) of the dataset in each iteration.
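
Written as an update rule, one step looks like the expression below. The notation is an assumption for illustration and does not come from the notebook: $\theta$ stands for the parameters, $\eta$ for the learning rate, $\ell_i$ for the loss on sample $i$, and $B$ for the current mini-batch.

$$
\theta \leftarrow \theta - \eta \cdot \frac{1}{|B|} \sum_{i \in B} \nabla_\theta\, \ell_i(\theta)
$$

Taking $B$ to be the whole dataset gives Batch GD, and $|B| = 1$ gives SGD.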


---

How it works:

1. Split the dataset into small batches (e.g., 32, 64, 128 samples).

2. For each mini-batch:

  * Compute the gradient of the loss function.

  * Update the model parameters using the gradient.

3. Repeat this process for all mini-batches; one full pass over the dataset is one epoch. Training usually runs for many epochs.
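
The steps above can be sketched roughly as follows for a linear model with a mean-squared-error loss. This is a minimal illustration, not the implementation used later in this notebook: the helper names `iterate_minibatches` and `minibatch_gd`, the synthetic data, and the hyperparameter values are all assumptions chosen for the example.

```python
import numpy as np

def iterate_minibatches(n_samples, batch_size, rng):
    """Yield index arrays that together cover every sample exactly once."""
    order = rng.permutation(n_samples)  # reshuffle at the start of each epoch
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]  # the last batch may be smaller

def minibatch_gd(X, y, batch_size=32, lr=0.1, epochs=50, seed=0):
    """Fit y ~ X @ w + b by minimising the mean squared error."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for idx in iterate_minibatches(X.shape[0], batch_size, rng):
            residual = y[idx] - (X[idx] @ w + b)
            # Gradients of the batch-mean squared error
            grad_w = -2.0 * X[idx].T @ residual / len(idx)
            grad_b = -2.0 * residual.mean()
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Synthetic, noise-free data (an assumption for illustration): y = 3*x0 - 2*x1 + 1
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 2))
y_demo = X_demo @ np.array([3.0, -2.0]) + 1.0
w_demo, b_demo = minibatch_gd(X_demo, y_demo)
```

Shuffling once per epoch and slicing the permutation means every sample is visited exactly once per epoch, which matches the "one full pass" definition in step 3.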

---

Advantages:

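* Usually converges in fewer passes over the data than batch gradient descent, because the parameters are updated many times per epoch instead of once.

* Less noisy updates than stochastic gradient descent, because each gradient is averaged over several samples.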

* Efficient use of vectorized operations on hardware like GPUs.
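
The "less noisy" point can be checked empirically. The sketch below, on synthetic data that is purely an assumption for illustration, measures how far mini-batch gradient estimates stray from the full-batch gradient for a few batch sizes. The names `mse_gradient` and `mean_squared_deviation` are hypothetical helpers.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=1000)
w = np.zeros(3)  # evaluate every gradient estimate at the same point

def mse_gradient(Xb, yb, w):
    """Gradient of the mean squared error of a linear model (no intercept)."""
    return -2.0 * Xb.T @ (yb - Xb @ w) / len(yb)

full_grad = mse_gradient(X, y, w)

def mean_squared_deviation(batch_size, trials=500):
    """Average squared distance between a mini-batch gradient and the full gradient."""
    devs = []
    for _ in range(trials):
        idx = rng.choice(len(y), size=batch_size, replace=False)
        devs.append(np.sum((mse_gradient(X[idx], y[idx], w) - full_grad) ** 2))
    return float(np.mean(devs))

# Maps batch size to the measured gradient noise
noise = {b: mean_squared_deviation(b) for b in (1, 32, 256)}
```

Larger batches should give smaller values in `noise`, at the price of more computation per update.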



---

Common Batch Sizes:

* Powers of 2 like 32, 64, 128, 256 are common.

* The choice depends on your dataset size and hardware capacity.
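
One concrete consequence of the batch size is the number of parameter updates per epoch. The snippet below is a small sketch; `updates_per_epoch` is a hypothetical helper, and 353 is the training-set size that an 80/20 split of a 442-row dataset would leave, used here only as an illustrative number.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Number of mini-batches needed to cover n_samples once (last batch may be partial)."""
    return math.ceil(n_samples / batch_size)

n_train = 353  # illustrative training-set size (assumption)
schedule = {b: updates_per_epoch(n_train, b) for b in (32, 64, 128, 256)}
```

Smaller batches mean more updates per epoch but noisier gradients; larger batches mean fewer, smoother updates and a higher memory cost per step.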

---



In [1]:
from sklearn.datasets import load_diabetes

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

In [2]:
X,y = load_diabetes(return_X_y=True)

In [3]:
print(X.shape)
print(y.shape)

(442, 10)
(442,)


In [4]:
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=2)

In [5]:
reg = LinearRegression()
reg.fit(X_train,y_train)

In [6]:
print(reg.coef_)
print(reg.intercept_)

[  -9.15865318 -205.45432163  516.69374454  340.61999905 -895.5520019
  561.22067904  153.89310954  126.73139688  861.12700152   52.42112238]
151.88331005254167


In [7]:
y_pred = reg.predict(X_test)
r2_score(y_test,y_pred)

0.4399338661568968

In [8]:
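import random

class MBGDRegressor:
    """Linear regression trained with mini-batch gradient descent on squared error."""

    def __init__(self,batch_size,learning_rate=0.01,epochs=100):

        self.coef_ = None
        self.intercept_ = None
        self.lr = learning_rate
        self.epochs = epochs
        self.batch_size = batch_size

    def fit(self,X_train,y_train):
        # Initialise the intercept to 0 and every coefficient to 1
        self.intercept_ = 0
        self.coef_ = np.ones(X_train.shape[1])

        for i in range(self.epochs):

            # Number of updates per epoch: one per full-sized batch
            for j in range(X_train.shape[0] // self.batch_size):

                # Draw batch_size distinct rows at random. Batches are drawn
                # independently, so an epoch may repeat some rows and skip others.
                idx = random.sample(range(X_train.shape[0]),self.batch_size)

                y_hat = np.dot(X_train[idx],self.coef_) + self.intercept_

                # Intercept: gradient of the batch mean squared error
                intercept_der = -2 * np.mean(y_train[idx] - y_hat)
                self.intercept_ = self.intercept_ - (self.lr * intercept_der)

                # Coefficients: summed (not averaged) over the batch, so this step is
                # batch_size times larger than a mean-gradient step at the same lr
                coef_der = -2 * np.dot((y_train[idx] - y_hat),X_train[idx])
                self.coef_ = self.coef_ - (self.lr * coef_der)

        print(self.intercept_,self.coef_)

    def predict(self,X_test):
        return np.dot(X_test,self.coef_) + self.intercept_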
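
For reference, the two update steps in `fit` can be written out as follows. The notation is an assumption for illustration: $B$ is the sampled batch, $m$ its size, $\eta$ the learning rate, $b$ the intercept, $w$ the coefficient vector, and $\hat{y}_i = x_i^\top w + b$.

$$
b \leftarrow b - \eta \left( -\frac{2}{m} \sum_{i \in B} (y_i - \hat{y}_i) \right),
\qquad
w \leftarrow w - \eta \left( -2 \sum_{i \in B} (y_i - \hat{y}_i)\, x_i \right)
$$

The intercept term is the gradient of the batch mean squared error. The coefficient term as coded is a sum rather than a mean, so it equals $m$ times the mean-squared-error gradient $-\frac{2}{m} \sum_{i \in B} (y_i - \hat{y}_i)\, x_i$. In effect the coefficients are updated with a learning rate of $m\eta$.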

In [9]:
# 353 training rows / 50 gives a batch size of 7, i.e. 50 updates per epoch
mbr = MBGDRegressor(batch_size=int(X_train.shape[0]/50),learning_rate=0.01,epochs=100)

In [10]:
mbr.fit(X_train,y_train)

149.94306749634202 [  20.67375475 -136.11845526  452.34443137  300.2959334   -22.7002892
  -90.71242481 -192.4980703   112.40475918  410.68660342  116.51915458]


In [11]:
y_pred = mbr.predict(X_test)

In [12]:
r2_score(y_test,y_pred)

0.4525178486475616

# Using sklearn

In [13]:
from sklearn.linear_model import SGDRegressor

In [26]:
sgd = SGDRegressor(learning_rate='constant',eta0=0.1)

In [27]:
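batch_size = 35

# Each partial_fit call runs one pass of per-sample SGD over the rows it receives,
# so this feeds random chunks to SGD rather than averaging one gradient per batch.
for i in range(100):

    idx = random.sample(range(X_train.shape[0]),batch_size)
    sgd.partial_fit(X_train[idx],y_train[idx])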

In [28]:
sgd.coef_

array([  60.36542699,  -61.91338164,  331.5516074 ,  243.3449239 ,
         26.76341841,  -22.84461073, -172.60139682,  127.83826804,
        328.29138848,  117.98023379])

In [29]:
sgd.intercept_

array([145.02685032])

In [30]:
y_pred = sgd.predict(X_test)

In [31]:
r2_score(y_test,y_pred)

0.4180736011249875