# What is Gradient Descent?

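Gradient Descent is an iterative optimization algorithm that minimizes a cost function by adjusting the model's parameters (weights) step by step in the direction of the negative gradient, which is the direction in which the cost decreases fastest locally. The size of each step is set by the learning rate.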
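
As a minimal illustration of the update rule, the sketch below minimizes a one-dimensional quadratic cost. The names `cost_gradient` and `gradient_descent_1d`, the target value 3, and the learning rate are assumptions chosen for this example and are not tied to any particular model.

```python
# Illustrative only: minimize f(w) = (w - 3)^2, whose minimum is at w = 3.

def cost_gradient(w):
    # Derivative of (w - 3)^2 with respect to w
    return 2.0 * (w - 3.0)

def gradient_descent_1d(w0=0.0, lr=0.1, iterations=100):
    w = w0
    for _ in range(iterations):
        w -= lr * cost_gradient(w)  # step against the gradient
    return w

w_min = gradient_descent_1d()
print(w_min)  # approaches 3.0
```

With this learning rate, each step shrinks the distance to the minimum by a constant factor, so the iterate converges steadily; a learning rate that is too large would overshoot instead.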



# Types of Gradient Descent


# 1. Batch Gradient Descent


Definition: Batch Gradient Descent calculates the gradient of the cost function for the entire dataset and updates the parameters once per iteration.



In [1]:
import numpy as np

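def batch_gradient_descent(X, y, lr=0.01, iterations=1000):
    m, n = X.shape  # m examples, n features
    theta = np.zeros(n)  # start from all-zero weights
    for _ in range(iterations):
        # Gradient of the cost (1/2m) * ||X.theta - y||^2 over the whole dataset
        gradient = (1/m) * X.T.dot(X.dot(theta) - y)
        theta -= lr * gradient  # one update per full pass
    return theta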
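
The gradient expression used in `batch_gradient_descent` corresponds to the mean-squared-error cost `J(theta) = (1/2m) * ||X.theta - y||^2`. The sketch below checks this numerically with central finite differences. The helper names `mse_cost` and `mse_gradient`, the random data, and the seed are assumptions made for illustration.

```python
import numpy as np

def mse_cost(X, y, theta):
    # J(theta) = (1 / 2m) * sum((X @ theta - y)^2)
    m = X.shape[0]
    residual = X.dot(theta) - y
    return residual.dot(residual) / (2 * m)

def mse_gradient(X, y, theta):
    # Same expression as in the batch update
    m = X.shape[0]
    return (1 / m) * X.T.dot(X.dot(theta) - y)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
theta = rng.normal(size=3)

# Central finite differences, one coordinate at a time
eps = 1e-6
numeric = np.zeros_like(theta)
for j in range(theta.size):
    step = np.zeros_like(theta)
    step[j] = eps
    numeric[j] = (mse_cost(X, y, theta + step) - mse_cost(X, y, theta - step)) / (2 * eps)

gradients_match = np.allclose(numeric, mse_gradient(X, y, theta), atol=1e-6)
print(gradients_match)
```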


# 2. Stochastic Gradient Descent (SGD)


Definition: Stochastic Gradient Descent updates the parameters for each training example, one at a time, instead of using the entire dataset.



In [2]:
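def stochastic_gradient_descent(X, y, lr=0.01, iterations=1000):
    m, n = X.shape  # m examples, n features
    theta = np.zeros(n)
    for _ in range(iterations):  # each iteration is one full pass (epoch) over the data
        # Visit examples in a fresh random order so updates do not follow the data order
        for i in np.random.permutation(m):
            # Gradient of the squared error on the single example i
            gradient = X[i] * (X[i].dot(theta) - y[i])
            theta -= lr * gradient  # one update per example
    return theta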
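
Each single-example gradient in SGD is a noisy estimate of the full-batch gradient: averaged over all examples, the per-example gradients recover it exactly. The sketch below checks this on random data. The data, the seed, and the variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = rng.normal(size=50)
theta = rng.normal(size=4)
m = X.shape[0]

# Per-example gradients, as in the inner SGD loop (one row per example)
per_example = np.array([X[i] * (X[i].dot(theta) - y[i]) for i in range(m)])

# Full-batch gradient, as in batch gradient descent
full_batch = (1 / m) * X.T.dot(X.dot(theta) - y)

average_matches = np.allclose(per_example.mean(axis=0), full_batch)
# Individual examples generally differ from the full gradient; that spread is the noise
max_deviation = np.abs(per_example - full_batch).max()
print(average_matches, max_deviation)
```

This is why SGD updates are cheap but jittery: each step follows one example's gradient, which points in the right direction only on average.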


# 3. Mini-Batch Gradient Descent


Definition: Mini-Batch Gradient Descent splits the dataset into small batches and updates the parameters for each batch.



In [3]:
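def mini_batch_gradient_descent(X, y, lr=0.01, iterations=1000, batch_size=32):
    m, n = X.shape  # m examples, n features
    theta = np.zeros(n)
    for _ in range(iterations):  # each iteration is one epoch
        # Reshuffle every epoch so the batches differ between passes
        indices = np.random.permutation(m)
        X_shuffled = X[indices]
        y_shuffled = y[indices]
        for i in range(0, m, batch_size):
            X_batch = X_shuffled[i:i+batch_size]
            y_batch = y_shuffled[i:i+batch_size]
            # Average over the actual batch length: the last batch is smaller
            # when m is not a multiple of batch_size
            gradient = (1/len(X_batch)) * X_batch.T.dot(X_batch.dot(theta) - y_batch)
            theta -= lr * gradient
    return theta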
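
When the number of examples is not a multiple of `batch_size`, the final slice of each epoch is shorter than the others. The sketch below lists the batch sizes for a small case. The helper `batch_sizes` and the values 10 and 4 are hypothetical, chosen only to show the slicing.

```python
import numpy as np

def batch_sizes(m, batch_size):
    # Mirror the slicing used in the mini-batch loop and record each slice length
    indices = np.random.permutation(m)
    return [len(indices[i:i + batch_size]) for i in range(0, m, batch_size)]

sizes = batch_sizes(10, 4)
print(sizes)  # [4, 4, 2]
```

Dividing the summed gradient by the actual slice length, rather than by the nominal `batch_size`, keeps the gradient of that last, shorter batch a true average.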
