### Stochastic Gradient Descent Implementation

This section implements linear regression with stochastic gradient descent (SGD). Instead of computing the gradient over the entire dataset for each update, SGD updates the model parameters using one data point (or a small batch) at a time.

1. **Write the loss function for a single data point**
   - \( L = (y - y_{pred})^2 \)

2. **Initialize some values for the parameters m and b**

3. **Iterate through each epoch and for each data point (x, y) in the dataset:**
   - a. **Compute the predicted value** \( y_{pred} = mx + b \)
   - b. **Calculate the gradients** (partial derivatives):
     - \( \frac{\partial L}{\partial m} = -2 \times x \times (y - y_{pred}) \)
     - \( \frac{\partial L}{\partial b} = -2 \times (y - y_{pred}) \)

4. **Update the values of m and b** immediately after each data point, using its gradients:
   - \( b_{\text{new}} = b_{\text{old}} - \text{learning\_rate} \times \frac{\partial L}{\partial b} \)
   - \( m_{\text{new}} = m_{\text{old}} - \text{learning\_rate} \times \frac{\partial L}{\partial m} \)

5. **Repeat the process for the specified number of epochs** to minimize the loss function.
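
The gradients in step 3b follow from applying the chain rule to the single-point loss, with \( y_{pred} = mx + b \). A short derivation, written out as a reading aid:

\[
\frac{\partial L}{\partial m} = 2 \, (y - y_{pred}) \times \frac{\partial (y - y_{pred})}{\partial m} = 2 \, (y - y_{pred}) \times (-x) = -2 \times x \times (y - y_{pred})
\]

\[
\frac{\partial L}{\partial b} = 2 \, (y - y_{pred}) \times \frac{\partial (y - y_{pred})}{\partial b} = 2 \, (y - y_{pred}) \times (-1) = -2 \times (y - y_{pred})
\]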

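Because each update uses a single data point, the parameters are updated many times per epoch, which often makes early progress faster than full-batch gradient descent. The trade-off is that each update is a noisy estimate of the full gradient, so the parameters fluctuate more from step to step.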
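
As a rough illustration of steps 1 to 5, the sketch below fits a line to a tiny noise-free dataset generated from y = 2x + 1. The function name `sgd_line_fit`, the data, the seed, and the hyperparameter values are assumptions chosen for this example; they are not part of the notebook's own code.

```python
import random

def sgd_line_fit(xs, ys, epochs=200, learning_rate=0.1, seed=0):
    """Fit y = m*x + b with one-sample SGD on the squared error."""
    rng = random.Random(seed)      # fixed seed so the run is repeatable
    m, b = 0.0, 0.0                # step 2: starting values (assumed zeros)
    n = len(xs)
    for _ in range(epochs):        # step 5: repeat for the given number of epochs
        for _ in range(n):
            i = rng.randrange(n)           # pick one data point at random
            y_pred = m * xs[i] + b         # step 3a: prediction
            error = ys[i] - y_pred
            grad_m = -2 * xs[i] * error    # step 3b: dL/dm
            grad_b = -2 * error            # step 3b: dL/db
            m -= learning_rate * grad_m    # step 4: update m
            b -= learning_rate * grad_b    # step 4: update b
    return m, b

# Illustrative data: 11 points on the line y = 2x + 1
xs = [i / 10 for i in range(11)]
ys = [2 * x + 1 for x in xs]

m, b = sgd_line_fit(xs, ys)
print(m, b)  # should be close to 2 and 1
```

Since the data here contain no noise, every per-sample gradient vanishes at the true parameters, so the estimates settle near m = 2 and b = 1. With noisy data the estimates would keep fluctuating around the best fit unless the learning rate is reduced.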


In [6]:
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score


In [40]:
class SGDRegressor:
    
    def __init__(self, epochs = 1000, learning_rate = 0.01):
        self.intercept_ = None
        self.coef_ = None
        self.epochs = epochs
        self.lr = learning_rate
    
    def fit(self, X_train, y_train):
        # Initialize the intercept to 0 and every coefficient to 1
        self.intercept_ = 0
        self.coef_ = np.ones(X_train.shape[1])
        n_rows = X_train.shape[0]

        for _ in range(self.epochs):
            # One epoch performs n_rows single-row updates
            for _ in range(n_rows):
                # Pick a random row index (sampling with replacement)
                idx = np.random.randint(0, n_rows)

                # Prediction for that row (a scalar)
                y_hat = np.dot(X_train[idx], self.coef_) + self.intercept_
                residual = y_train[idx] - y_hat

                # Gradient w.r.t. the intercept (this row only), then update
                intercept_der = -2 * residual
                self.intercept_ = self.intercept_ - (self.lr * intercept_der)

                # Gradient w.r.t. the coefficients (this row only), then update
                coef_der = -2 * residual * X_train[idx]
                self.coef_ = self.coef_ - (self.lr * coef_der)
    
    def predict(self, X_test):
        
        return np.dot(X_test, self.coef_) + self.intercept_
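
One way to gain confidence in the per-row derivatives used in `fit` is to compare them with finite differences of the per-row squared error. The sketch below does this in plain Python for a single made-up row; the helper names (`sample_loss`, `analytic_grads`, `numeric_grads`) and the numbers are hypothetical and exist only for this check.

```python
def sample_loss(w, b, x, y):
    """Squared error for one row x (list of features) and target y."""
    y_hat = sum(wi * xi for wi, xi in zip(w, x)) + b
    return (y - y_hat) ** 2

def analytic_grads(w, b, x, y):
    """Same formulas as the fit loop: -2*(y - y_hat)*x and -2*(y - y_hat)."""
    y_hat = sum(wi * xi for wi, xi in zip(w, x)) + b
    residual = y - y_hat
    return [-2 * residual * xi for xi in x], -2 * residual

def numeric_grads(w, b, x, y, eps=1e-6):
    """Central finite differences of sample_loss, one parameter at a time."""
    grad_w = []
    for k in range(len(w)):
        w_plus = w[:k] + [w[k] + eps] + w[k + 1:]
        w_minus = w[:k] + [w[k] - eps] + w[k + 1:]
        diff = sample_loss(w_plus, b, x, y) - sample_loss(w_minus, b, x, y)
        grad_w.append(diff / (2 * eps))
    diff_b = sample_loss(w, b + eps, x, y) - sample_loss(w, b - eps, x, y)
    return grad_w, diff_b / (2 * eps)

# Made-up parameters and a single made-up row, for illustration only
w, b = [1.0, 1.0, 1.0], 0.0
x, y = [0.5, -1.2, 2.0], 3.0

gw_a, gb_a = analytic_grads(w, b, x, y)
gw_n, gb_n = numeric_grads(w, b, x, y)
max_diff = max(abs(a - n) for a, n in zip(gw_a + [gb_a], gw_n + [gb_n]))
print(max_diff)  # expected to be tiny (rounding error only)
```

Because the loss is quadratic in the parameters, central differences agree with the analytic gradient up to floating-point rounding, so a large `max_diff` would point to a sign or indexing mistake.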
                

In [41]:
# Load the diabetes dataset (feature matrix X, target vector y)
X, y = load_diabetes(return_X_y=True)

In [42]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

In [43]:
# Baseline: scikit-learn's ordinary least squares LinearRegression
reg = LinearRegression()
reg.fit(X_train, y_train)

LinearRegression()

In [44]:
y_pred = reg.predict(X_test)
# r2_score expects the true targets first: r2_score(y_true, y_pred)
r2_score(y_test, y_pred)

In [45]:
# Now evaluate our own class
sgd = SGDRegressor(epochs = 1000, learning_rate = 0.01)
sgd.fit(X_train, y_train)

In [46]:
y_pred = sgd.predict(X_test)

In [47]:
# r2_score expects the true targets first: r2_score(y_true, y_pred)
r2_score(y_test, y_pred)
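
The R² score is not symmetric in its arguments: scikit-learn's signature is `r2_score(y_true, y_pred)`, and the total sum of squares is computed around the mean of the first argument. The sketch below uses a hand-written helper `r2` (a hypothetical name, written in plain Python so the arithmetic is visible) and made-up numbers to show how swapping the arguments changes the score.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up targets and predictions, for illustration only
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [2.0, 2.0, 3.0, 3.0]

correct_order = r2(y_true, y_pred)   # SS_res = 2, SS_tot = 5
swapped_order = r2(y_pred, y_true)   # SS_res = 2, SS_tot = 1
print(correct_order, swapped_order)
```

Here the same pair of vectors scores about 0.6 in the correct order and -1 when swapped, so a negative value from a swapped call does not by itself mean the model is poor.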