# ⚙️ Mini-Batch Gradient Descent from Scratch

In [1]:
!pip install numpy scikit-learn

## 🧠 Introduction

In this notebook, we implement **Mini-Batch Gradient Descent (MBGD)** from scratch and compare it with:
- `LinearRegression` (analytical solution)
- `SGDRegressor` (scikit-learn)

The dataset used is the **Diabetes dataset** from `sklearn.datasets`.

In [2]:
import numpy as np
from sklearn.datasets import load_diabetes

## 📦 Load Dataset

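We use `load_diabetes()` from scikit-learn, which returns a ready-to-use regression dataset. It has 442 samples and 10 numeric features, and the features are already mean-centered and scaled. The target is continuous and measures disease progression one year after baseline.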
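
The scikit-learn documentation states that, with the default `scaled=True`, each feature column is mean-centered and scaled so that its sum of squares is 1. The following quick check runs on the loaded array:

```python
import numpy as np
from sklearn.datasets import load_diabetes

# Load the same feature matrix used throughout the notebook
X, y = load_diabetes(return_X_y=True)

# Per-column mean (expected near 0) and sum of squares (expected near 1)
col_means = X.mean(axis=0)
col_sumsq = (X ** 2).sum(axis=0)

print(np.round(col_means, 6))
print(np.round(col_sumsq, 6))
```

The features already share a common scale, so gradient descent can run on them without a separate standardization step. The target `y`, by contrast, is left unscaled.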

In [3]:
X,y = load_diabetes(return_X_y=True)

In [4]:
X.shape

(442, 10)

In [5]:
y.shape

(442,)

## ✂️ Train-Test Split

Split the dataset into training and test sets using an 80/20 ratio; `random_state=18` makes the shuffle reproducible.  
This ensures that every model is evaluated on rows it never saw during training.
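
To show what an 80/20 split does mechanically, here is a minimal NumPy sketch. It shuffles the row indices and slices off a test portion. `manual_split` is a hypothetical helper, not part of scikit-learn. For a float `test_size`, `train_test_split` rounds the test count up, so 442 rows give ceil(0.2 × 442) = 89 test rows and 353 training rows, matching the shapes printed above.

```python
import numpy as np

def manual_split(X, y, test_size=0.2, seed=0):
    """Hypothetical helper: shuffle row indices, then cut off a test slice."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_test = int(np.ceil(test_size * n))   # round the test count up
    perm = rng.permutation(n)              # random order of all row indices
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Dummy arrays with the same number of rows as the diabetes data
X_demo = np.zeros((442, 10))
y_demo = np.arange(442)

X_tr, X_te, y_tr, y_te = manual_split(X_demo, y_demo)
print(X_tr.shape, X_te.shape)  # (353, 10) (89, 10)
```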

In [6]:
from sklearn.model_selection import train_test_split

In [7]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=18)

In [8]:
X_train.shape

(353, 10)

In [9]:
y_train.shape

(353,)

In [10]:
X_test.shape

(89, 10)

In [11]:
y_test.shape

(89,)

## 📐 Baseline: Linear Regression

Train a standard `LinearRegression` model to serve as a baseline.  
We'll compare its coefficients and performance metrics with our manual implementation later.
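
`LinearRegression` fits ordinary least squares directly rather than iterating. The sketch below reproduces that idea with `np.linalg.lstsq`. It runs on synthetic, noiseless data whose true weights are known, which is an assumption made only for illustration. A column of ones is prepended so that the intercept is solved jointly with the weights.

```python
import numpy as np

# Synthetic data with known parameters (illustrative assumption)
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(100, 3))
true_w, true_b = np.array([1.5, -2.0, 0.5]), 4.0
y_syn = X_syn @ true_w + true_b  # noiseless target

# Design matrix [1 | X]: first coefficient becomes the intercept
A = np.column_stack([np.ones(len(X_syn)), X_syn])
theta, *_ = np.linalg.lstsq(A, y_syn, rcond=None)
b_hat, w_hat = theta[0], theta[1:]
print(b_hat, w_hat)
```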

In [12]:
from sklearn.linear_model import LinearRegression

In [13]:
lr = LinearRegression()
lr.fit(X_train, y_train)

In [14]:
lr.intercept_

153.32760069777532

In [15]:
lr.coef_

array([  33.86613698, -273.44971856,  466.58017053,  337.65891227,
       -806.35981263,  592.36046805,   92.48275413,  182.36792969,
        697.41630107,   89.11282567])

In [16]:
y_pred_lr = lr.predict(X_test)

## 📊 Evaluate Linear Regression

Using the test-set predictions from the `LinearRegression` model, we calculate the R² score.  
This gives a reference point for our custom mini-batch implementation.
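
R² compares the model's squared error with that of a baseline that always predicts the mean of the true targets: R² = 1 − SS_res / SS_tot. A score of 1 is a perfect fit, 0 means no better than predicting the mean, and negative values are worse than that. The hypothetical `r2` helper below is a from-scratch sketch of that formula.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot

y_true = np.array([3.0, 5.0, 7.0, 9.0])
print(r2(y_true, y_true))                          # 1.0 (perfect fit)
print(r2(y_true, np.full(4, y_true.mean())))       # 0.0 (always predicting the mean)
print(r2(y_true, np.array([4.0, 5.0, 7.0, 8.0])))  # 0.9 (SS_res = 2, SS_tot = 20)
```

Read this way, the baseline's score of about 0.49 means the linear model explains roughly half of the variance in the test targets.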

In [17]:
from sklearn.metrics import r2_score

In [18]:
score_lr = r2_score(y_test, y_pred_lr)
score_lr

0.49152545179667817

## 🛠 Mini-Batch Gradient Descent - Custom Implementation

We define a custom class `MBGDRegressor` that:
- Initializes the bias to 0 and every weight to 1
- Updates them on randomly sampled mini-batches of the training data
- Uses the gradient of the mean squared error (MSE) loss to guide each update
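
For a batch of b rows with error e = y − ŷ and prediction ŷ = w0 + X_b w, the batch MSE L = (1/b) Σ eᵢ² has these gradients:
- ∂L/∂w0 = −(2/b) Σ eᵢ
- ∂L/∂w = −(2/b) X_bᵀ e

The sketch below checks both formulas against central finite differences on random data. The helper names `batch_mse` and `batch_grad` are hypothetical. Note that the `MBGDRegressor` class in this notebook averages the bias gradient but sums the weight gradient, which is b times the ∂L/∂w above.

```python
import numpy as np

def batch_mse(w0, w, Xb, yb):
    """Mean squared error on one mini-batch."""
    return np.mean((yb - (w0 + Xb @ w)) ** 2)

def batch_grad(w0, w, Xb, yb):
    """Analytic gradient of batch_mse with respect to (w0, w)."""
    e = yb - (w0 + Xb @ w)
    b = len(yb)
    return -2.0 * e.mean(), -2.0 / b * (Xb.T @ e)

rng = np.random.default_rng(1)
Xb = rng.normal(size=(7, 4))   # one batch: 7 rows, 4 features
yb = rng.normal(size=7)
w0, w = 0.3, rng.normal(size=4)

g0, gw = batch_grad(w0, w, Xb, yb)

# Independent check: central finite differences, one parameter at a time
eps = 1e-6
num_g0 = (batch_mse(w0 + eps, w, Xb, yb) - batch_mse(w0 - eps, w, Xb, yb)) / (2 * eps)
num_gw = np.array([
    (batch_mse(w0, w + eps * np.eye(4)[k], Xb, yb)
     - batch_mse(w0, w - eps * np.eye(4)[k], Xb, yb)) / (2 * eps)
    for k in range(4)
])
print(abs(g0 - num_g0), np.max(np.abs(gw - num_gw)))
```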

In [19]:
import random

In [20]:
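class MBGDRegressor:
    def __init__(self, lr, epochs, batch_size):
        self.lr = lr                  # learning rate (step size)
        self.epochs = epochs          # each epoch runs n_samples // batch_size updates
        self.batch_size = batch_size  # rows per mini-batch
        self.w0 = 0                   # bias (intercept)
        self.w = None                 # weight vector, created in fit()

    def fit(self, X_train, y_train):
        # Reset parameters so that repeated calls to fit() start from scratch
        self.w0 = 0
        self.w = np.ones(X_train.shape[1])
        n_samples = X_train.shape[0]

        for i in range(self.epochs):
            for j in range(n_samples // self.batch_size):
                # Draw batch_size distinct rows; batches are drawn independently,
                # so one "epoch" may repeat some rows and skip others
                indexes = random.sample(range(n_samples), self.batch_size)

                y_hat = self.w0 + np.dot(X_train[indexes], self.w)
                error = y_train[indexes] - y_hat

                # Bias gradient of the batch MSE: -2 * mean(error)
                slope_w0 = -2 * np.mean(error)
                # Weight gradient: SUMMED over the batch rather than averaged, so it is
                # batch_size times the MSE gradient and the effective weight step is
                # lr * batch_size. To average instead, divide by len(indexes) and raise lr.
                slope_w = -2 * np.dot(X_train[indexes].T, error)

                self.w0 -= self.lr * slope_w0
                self.w -= self.lr * slope_w

        print(f"Bias is: {self.w0}\nWeight Vector is: {self.w}")

    def predict(self, X_test):
        return self.w0 + np.dot(X_test, self.w)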
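
The class above draws each batch with `random.sample`, so an "epoch" is not guaranteed to visit every row. A common alternative is to shuffle once per epoch and then slice the permutation into consecutive batches. This is not what the notebook does. The sketch below shows that alternative with a hypothetical `iterate_minibatches` helper, using the notebook's sizes (353 training rows, batch size 7).

```python
import numpy as np

def iterate_minibatches(n_samples, batch_size, rng):
    """Hypothetical helper: yield index arrays covering every row exactly once per epoch."""
    perm = rng.permutation(n_samples)           # fresh shuffle for this epoch
    for start in range(0, n_samples, batch_size):
        yield perm[start:start + batch_size]    # the last batch may be smaller

rng = np.random.default_rng(0)
batches = list(iterate_minibatches(353, 7, rng))
print(len(batches), len(batches[-1]))  # 51 3  (353 = 50 * 7 + 3)
```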

In [21]:
mbgd = MBGDRegressor(lr=0.01, epochs=100, batch_size=X_train.shape[0] // 50)  # batch_size = 353 // 50 = 7

In [22]:
mbgd.fit(X_train, y_train)

Bias is: 153.43092976728695
Weight Vector is: [  39.83264356 -175.70948177  399.09129239  280.37275145   14.11369985
  -29.28401265 -196.69344423  150.69927828  349.91423955  146.73204801]


In [23]:
y_pred_mbgd = mbgd.predict(X_test)

## 🔍 Evaluate MBGDRegressor Model

We evaluate our custom mini-batch gradient descent model on the test set using the R² score.
This helps check whether our implementation is learning effectively.
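
Another way to check that a mini-batch implementation learns, independent of the diabetes data, is to fit noiseless synthetic data whose true parameters are known. The sketch below is a separate, minimal version (`mbgd_fit` is a hypothetical name, not the notebook's class). It uses batch-averaged gradients for both bias and weights, and shuffled per-epoch batches. It records the full-data MSE after each epoch.

```python
import numpy as np

def mbgd_fit(X, y, lr=0.05, epochs=200, batch_size=16, seed=0):
    """Minimal mini-batch GD with batch-averaged MSE gradients (a sketch, not the notebook's class)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w0, w = 0.0, np.zeros(d)
    losses = []
    for _ in range(epochs):
        perm = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            e = y[idx] - (w0 + X[idx] @ w)                    # batch errors
            w0 -= lr * (-2.0 * e.mean())                      # bias step
            w -= lr * (-2.0 / len(idx) * (X[idx].T @ e))      # averaged weight step
        losses.append(np.mean((y - (w0 + X @ w)) ** 2))       # full-data MSE after the epoch
    return w0, w, losses

# Noiseless synthetic data, so the true parameters are recoverable
rng = np.random.default_rng(42)
X_syn = rng.normal(size=(200, 3))
y_syn = X_syn @ np.array([2.0, -1.0, 0.5]) + 3.0

w0, w, losses = mbgd_fit(X_syn, y_syn)
print(w0, w)
print(losses[0], losses[-1])
```

If the recovered bias and weights match the generating values and the loss falls steadily, the update rule is wired correctly.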

In [24]:
score_mbgd = r2_score(y_test, y_pred_mbgd)
score_mbgd

0.4492493378904633

## ⚡ Benchmark: SGDRegressor

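We also train `SGDRegressor` from `sklearn.linear_model`. It performs stochastic gradient descent, updating the model one sample at a time. Feeding it random batches through `partial_fit` approximates a mini-batch loop: each call makes a single pass over the given rows.  
This gives us an off-the-shelf benchmark to compare against our custom MBGD model.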
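
The difference from a true mini-batch step is that per-sample SGD applies one update per row, so a 7-row batch produces 7 sequential updates rather than one averaged step. The sketch below mimics that behavior under simplifying assumptions:
- squared loss written as 0.5 · (ŷ − y)²
- a constant step `eta`
- no regularization and no averaging

`SGDRegressor` itself defaults to an L2 penalty with `alpha=0.0001`. `sgd_pass` is a hypothetical helper, and the data are synthetic.

```python
import numpy as np

def sgd_pass(w0, w, Xb, yb, eta):
    """One pass of plain per-sample SGD on the loss 0.5 * (y_hat - y)**2.

    Simplifying assumptions: constant eta, no L2 penalty, no averaging.
    """
    w = w.copy()
    for x_i, y_i in zip(Xb, yb):
        err = (w0 + x_i @ w) - y_i   # prediction minus target for this single sample
        w0 -= eta * err              # intercept step
        w -= eta * err * x_i         # weight step, applied before the next sample
    return w0, w

# Noiseless synthetic data (illustrative assumption)
rng = np.random.default_rng(3)
X_syn = rng.normal(size=(200, 3))
y_syn = X_syn @ np.array([1.0, 2.0, -3.0]) + 5.0

w0, w = 0.0, np.zeros(3)
for epoch in range(50):
    order = rng.permutation(len(X_syn))  # reshuffle rows each pass
    w0, w = sgd_pass(w0, w, X_syn[order], y_syn[order], eta=0.01)
print(w0, w)
```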

In [25]:
from sklearn.linear_model import SGDRegressor
mbgd_sk = SGDRegressor(learning_rate='constant', eta0=0.01)

In [26]:
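# Same batch size as the custom model: 353 // 50 = 7 rows
batch_size = X_train.shape[0] // 50

# 100 partial_fit calls on random 7-row batches: 700 per-sample updates in total,
# i.e. roughly 2 passes over the 353 training rows (not 100 epochs)
for i in range(100):
    indexes = random.sample(range(X_train.shape[0]), batch_size)
    mbgd_sk.partial_fit(X_train[indexes], y_train[indexes])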

In [27]:
mbgd_sk.intercept_

array([146.69117579])

In [28]:
mbgd_sk.coef_

array([ 7.01369154,  1.20251162, 13.19323692, 13.11199021,  7.0196444 ,
        6.07557024, -9.89616884, 11.61133626, 15.33718926, 12.39185054])

In [29]:
y_pred_sk = mbgd_sk.predict(X_test)

## 🧮 Evaluate SGDRegressor

We compute the test-set R² score of `SGDRegressor` and compare it with our manual implementation and `LinearRegression`.

In [30]:
score_sk = r2_score(y_test, y_pred_sk)
score_sk

0.029350414490897037
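
The training budgets here are very different, which likely explains much of this low score:
- **`SGDRegressor`:** the loop above makes 100 `partial_fit` calls on 7-row batches. That is 700 per-sample updates, or about 2 passes over the 353 training rows.
- **`MBGDRegressor`:** it ran 100 epochs × 50 batches = 5000 mini-batch steps.

A fairer comparison would train for whole epochs. The sketch below does that with a hypothetical helper, `train_sgd_epochs`, shown on synthetic data. It makes no claim about the score it would reach on the diabetes split. To try it there, pass `X_train` and `y_train` instead.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

def train_sgd_epochs(model, X, y, batch_size, epochs, seed=0):
    """Hypothetical helper: feed shuffled batches to partial_fit for full epochs."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    for _ in range(epochs):
        perm = rng.permutation(n)                  # every row is visited once per epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            model.partial_fit(X[idx], y[idx])
    return model

# Synthetic stand-in data (illustrative assumption)
rng = np.random.default_rng(7)
X_syn = rng.normal(size=(300, 4))
y_syn = X_syn @ np.array([1.0, -2.0, 0.5, 3.0]) + 10.0

model = train_sgd_epochs(
    SGDRegressor(learning_rate='constant', eta0=0.01, random_state=0),
    X_syn, y_syn, batch_size=7, epochs=30,
)
print(model.score(X_syn, y_syn))  # R² on the training data
```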

## ✅ Conclusion

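- Our custom mini-batch gradient descent reaches a test R² of about 0.449, close to the `LinearRegression` baseline (about 0.492). However, its learned weights differ noticeably from the least-squares coefficients (for example, the fifth feature: 14.1 vs. -806.4), so it has not converged to the same solution.
- `SGDRegressor` scores only about 0.029 in this setup. This is most likely because its loop ran 100 `partial_fit` calls on 7-row batches (about 2 passes over the training data) rather than 100 epochs, so the comparison is not like-for-like.
- The implementation confirms the core logic of gradient-based optimization: repeated steps against the MSE gradient move the model toward a useful fit.
- Each update touches only `batch_size` rows, so the cost per step does not grow with dataset size. This makes the mini-batch approach practical for large datasets.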