# ⚙ Mini-Batch Gradient Descent from Scratch
## 🧠 Introduction

In this notebook, we implement *Mini-Batch Gradient Descent (MBGD)* from scratch and compare it with:
- LinearRegression (analytical solution)
- SGDRegressor (scikit-learn)

The dataset used is the *Diabetes dataset* from sklearn.datasets.

## 📦 Load Dataset

We use the load_diabetes() function from sklearn, which provides a clean and ready-to-use regression dataset with 10 features and a continuous target variable.

In [216]:
from sklearn.datasets import load_diabetes
import numpy as np
import random

In [217]:
X,y = load_diabetes(return_X_y = True)

## ✂ Train-Test Split

Split the dataset into training and testing sets using an 80-20 ratio.  
This ensures that our model evaluations are done on unseen data.

In [218]:
from sklearn.model_selection import train_test_split

In [219]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=18)
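
To make the split concrete, here is a minimal NumPy sketch of an 80-20 shuffle split. The helper name `simple_split` and the demo arrays are hypothetical, and the sketch does not reproduce the exact row assignment that train_test_split produces for a given random_state. It only illustrates the idea and the resulting sizes for a dataset with 442 rows, as in the Diabetes data.

```python
import numpy as np

def simple_split(X, y, test_frac=0.2, seed=0):
    """Hypothetical helper: shuffle row indices, then cut off the test portion."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_test = int(np.ceil(test_frac * n))  # round the test size up
    perm = rng.permutation(n)             # random order of all row indices
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Demo arrays with the same shape as the Diabetes data (442 rows, 10 features)
X_demo = np.arange(442 * 10, dtype=float).reshape(442, 10)
y_demo = np.arange(442, dtype=float)

X_tr, X_te, y_tr, y_te = simple_split(X_demo, y_demo)
print(X_tr.shape, X_te.shape)
```

With 442 rows and a 20% test fraction, this leaves 353 training rows and 89 test rows.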

## 🛠 Mini-Batch Gradient Descent - Custom Implementation

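We define a custom class MBGD that:
- Initializes the intercept to 0 and every coefficient to 1
- Updates them with mini-batch gradient descent, drawing a random batch of rows for each step
- Uses the gradient of the MSE loss on each batch to guide the updates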
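
For reference, the MSE loss on a mini-batch and the gradients that drive the updates can be written as follows. The symbols are notation introduced here, not taken from the code: $B$ is the batch, $m$ its size, $w$ the coefficient vector, $b$ the intercept, and $\eta$ the learning rate.

```latex
\hat{y}_i = x_i^\top w + b, \qquad
L(w, b) = \frac{1}{m} \sum_{i \in B} \left( y_i - \hat{y}_i \right)^2

\frac{\partial L}{\partial b} = -\frac{2}{m} \sum_{i \in B} \left( y_i - \hat{y}_i \right), \qquad
\nabla_w L = -\frac{2}{m} \sum_{i \in B} \left( y_i - \hat{y}_i \right) x_i

b \leftarrow b - \eta \, \frac{\partial L}{\partial b}, \qquad
w \leftarrow w - \eta \, \nabla_w L
```

The class in this notebook averages the intercept gradient over the batch but sums the coefficient gradient, which amounts to dropping the $1/m$ factor for $w$ and so taking a coefficient step that is $m$ times larger.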

In [220]:
class MBGD:
    def __init__(self, learning_rate, epochs, batch_size):
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.batch_size = batch_size
        self.coef_ = None
        self.intercept_ = None

    def fit(self, X_train, y_train):
        # Start from a zero intercept and all-ones coefficients
        self.intercept_ = 0
        self.coef_ = np.ones(X_train.shape[1])
        n_batches = X_train.shape[0] // self.batch_size
        for _ in range(self.epochs):
            for _ in range(n_batches):
                # Draw a fresh random batch (rows can repeat across batches)
                idx = random.sample(range(X_train.shape[0]), self.batch_size)

                y_hat = np.dot(X_train[idx], self.coef_) + self.intercept_
                residual = y_train[idx] - y_hat

                # Intercept gradient: averaged over the batch
                intercept_der = -2 * np.mean(residual)
                self.intercept_ = self.intercept_ - self.learning_rate * intercept_der

                # Coefficient gradient: summed (not averaged) over the batch
                coef_der = -2 * np.dot(residual, X_train[idx])
                self.coef_ = self.coef_ - self.learning_rate * coef_der

    def predict(self, X_test):
        return np.dot(X_test, self.coef_) + self.intercept_
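
The fit method above draws every batch independently with random.sample, so within one "epoch" some rows may be used several times and others not at all. A common alternative, sketched here as an assumption rather than as what this notebook does, is to shuffle the row indices once per epoch and slice them into consecutive batches. The generator name `epoch_batches` is hypothetical.

```python
import numpy as np

def epoch_batches(n_samples, batch_size, rng):
    """Yield index arrays that together cover every row exactly once."""
    perm = rng.permutation(n_samples)  # one shuffle per epoch
    for start in range(0, n_samples, batch_size):
        # The last slice may be shorter than batch_size
        yield perm[start:start + batch_size]

rng = np.random.default_rng(0)
batches = list(epoch_batches(n_samples=353, batch_size=7, rng=rng))
print(len(batches), len(batches[-1]))  # 51 batches, the last one holding 3 rows
```

Each index array can be used in place of `idx` in the update step. The trade-off is one short final batch per epoch when the row count is not a multiple of the batch size.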
    
                

In [221]:
# Lowercase instance name so the MBGD class itself is not overwritten
mbgd = MBGD(0.01, 100, int(X_train.shape[0]/50))

In [222]:
mbgd.fit(X_train, y_train)

## 🔍 Evaluate MBGD Model

We evaluate our custom mini-batch gradient descent model on the test set using R² score.
This helps validate whether our implementation is learning effectively.

In [223]:
y_pred = mbgd.predict(X_test)

In [224]:
from sklearn.metrics import r2_score
r2_score(y_test, y_pred)

0.45403943649005296
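
As a reminder of what this number measures, R² is one minus the ratio of the residual sum of squares to the total sum of squares around the mean of the targets. The following is a minimal NumPy sketch of that definition. The function name `r2_manual` and the toy arrays are illustrative only.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, computed directly from the definition."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot

y_toy = np.array([3.0, 5.0, 7.0, 9.0])
print(r2_manual(y_toy, y_toy))                       # perfect predictions: 1.0
print(r2_manual(y_toy, np.full(4, y_toy.mean())))    # always predicting the mean: 0.0
```

A score of about 0.45 therefore means the model explains a little under half of the variation in the test targets.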

## 📐 Baseline: Linear Regression

Train a standard LinearRegression model to serve as a baseline.  
We compare its test-set R² with that of our manual implementation below.

In [225]:
from sklearn.linear_model import LinearRegression
lr = LinearRegression()

In [226]:
lr.fit(X_train, y_train)
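
LinearRegression solves the ordinary least-squares problem directly instead of iterating. The sketch below shows the same idea with `np.linalg.lstsq` on synthetic, noise-free data. The data, the names `true_w` and `true_b`, and the bias-column trick are assumptions made for illustration, not a description of scikit-learn's internals.

```python
import numpy as np

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])   # hypothetical ground-truth coefficients
true_b = 4.0                          # hypothetical ground-truth intercept
y_demo = X_demo @ true_w + true_b     # noise-free targets

# Append a column of ones so the intercept is estimated as an extra weight
A = np.column_stack([X_demo, np.ones(len(X_demo))])
solution, *_ = np.linalg.lstsq(A, y_demo, rcond=None)
w_hat, b_hat = solution[:-1], solution[-1]

print(np.round(w_hat, 3), round(float(b_hat), 3))
```

Because the targets contain no noise, the recovered weights match the ground truth up to floating-point error. Gradient descent approaches the same solution step by step.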

## 📊 Evaluate Linear Regression

We predict on the test set using the LinearRegression model and calculate the R² score.  
This gives a reference for our custom mini-batch implementation.

In [227]:
y_pred1 = lr.predict(X_test)

In [228]:
r2_score(y_test, y_pred1)

0.4915254517966785

## ⚡ Benchmark: SGDRegressor

We also train SGDRegressor from sklearn.linear_model. It updates the weights one sample at a time; calling partial_fit on a mini-batch runs a single pass of those per-sample updates over the batch.  
This gives us a library benchmark to compare against our custom MBGD model.

In [229]:
from sklearn.linear_model import SGDRegressor

In [230]:
SGD = SGDRegressor(learning_rate='constant', eta0=0.01)

In [231]:
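# Same batch size as the custom MBGD model
batch_size = int(X_train.shape[0]/50)

# 100 partial_fit calls in total, far fewer than the
# 100 epochs x 50 batches used for MBGD above
for i in range(100):
    idx = random.sample(range(X_train.shape[0]), batch_size)
    SGD.partial_fit(X_train[idx], y_train[idx])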
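
The two training loops do not get the same budget, which matters when reading the scores. The arithmetic below is a rough count, assuming 353 training rows (442 rows with a 20% test split) and that each partial_fit call makes one pass over the rows it receives.

```python
n_train = 353                                  # assumed: 442 rows, 20% held out
batch_size = int(n_train / 50)                 # 7 rows per batch
batches_per_epoch = int(n_train / batch_size)  # 50 batches per epoch
epochs = 100

# Custom MBGD: one weight update per batch
mbgd_updates = epochs * batches_per_epoch      # 5000 batch updates
mbgd_rows_seen = mbgd_updates * batch_size     # 35000 rows processed

# SGDRegressor loop: 100 partial_fit calls, one pass over 7 rows each
sgd_calls = 100
sgd_rows_seen = sgd_calls * batch_size         # 700 rows processed

print(mbgd_updates, mbgd_rows_seen, sgd_rows_seen)  # 5000 35000 700
```

Under these assumptions the custom model processes about 50 times more rows than SGDRegressor, so a lower SGDRegressor score is not surprising.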

In [232]:
y_pred2 = SGD.predict(X_test)

## 🧮 Evaluate SGDRegressor

We compute the R² score of SGDRegressor on the test set and compare it with our manual implementation and LinearRegression.

In [233]:
r2_score(y_test, y_pred2)

0.03151007097666414

## ✅ Conclusion

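- Our custom Mini-Batch Gradient Descent reaches a test R² of about 0.454, close to LinearRegression at about 0.492.
- SGDRegressor scores far lower here (about 0.032), but the comparison is not like for like: it received only 100 partial_fit calls on small batches, while the custom model ran 100 epochs of about 50 batches each.
- The result suggests that the implementation captures the core logic of gradient-based optimization.
- The mini-batch approach offers a scalable path for large datasets, because each update touches only a small subset of the rows.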