# 📉 Gradient Descent: Key Equations (Short & Clear)

## 🔹 1. Mean Squared Error (MSE)

For a dataset of $n$ points with predictions $\hat{y}_i$ and true values $y_i$:


$$
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$
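As a quick sanity check, the formula maps directly onto NumPy. This is a minimal sketch: `mse` is a helper name introduced here, and the `y_true`/`y_pred` values are made up purely for illustration.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals y_i - y_hat_i."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Illustrative values only
y_true = [3.0, 5.0, 7.0]
y_pred = [2.5, 5.0, 8.0]
print(mse(y_true, y_pred))  # (0.25 + 0.0 + 1.0) / 3 ≈ 0.4167
```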

---

## 🔹 2. Gradient (Partial Derivatives of MSE)

For a model prediction
$$
\hat{y} = w x + b
$$

The gradients are:

$$
\frac{\partial \text{MSE}}{\partial w}
= -\frac{2}{n} \sum_{i=1}^n x_i (y_i - \hat{y}_i)
$$

$$
\frac{\partial \text{MSE}}{\partial b}
= -\frac{2}{n} \sum_{i=1}^n (y_i - \hat{y}_i)
$$
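One way to check the two gradient formulas is to compare them with a central finite-difference approximation of the MSE. Because the MSE is quadratic in $w$ and $b$, central differences agree with the analytic gradient up to floating-point rounding. The sketch assumes the toy data used later in this notebook ($y = 2x + 3$) and an arbitrary test point $(w, b) = (0.5, -1)$; the helper names are hypothetical.

```python
import numpy as np

def mse(w, b, x, y):
    """MSE of the line y_hat = w*x + b."""
    return np.mean((y - (w * x + b)) ** 2)

def analytic_grad(w, b, x, y):
    """Gradients from the formulas above."""
    n = len(x)
    residual = y - (w * x + b)              # y_i - y_hat_i
    dw = -(2 / n) * np.sum(x * residual)
    db = -(2 / n) * np.sum(residual)
    return dw, db

def numeric_grad(w, b, x, y, eps=1e-6):
    """Central finite differences, used only as a cross-check."""
    dw = (mse(w + eps, b, x, y) - mse(w - eps, b, x, y)) / (2 * eps)
    db = (mse(w, b + eps, x, y) - mse(w, b - eps, x, y)) / (2 * eps)
    return dw, db

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([5.0, 7.0, 9.0, 11.0, 13.0])
print(analytic_grad(0.5, -1.0, x, y))  # (-57.0, -17.0)
print(numeric_grad(0.5, -1.0, x, y))   # very close to the line above
```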

---

## 🔹 3. Gradient Descent Update Rule

To update the parameters $w$ and $b$:

$$
w_{\text{new}} = w_{\text{old}} - \alpha \frac{\partial \text{MSE}}{\partial w}
$$

$$
b_{\text{new}} = b_{\text{old}} - \alpha \frac{\partial \text{MSE}}{\partial b}
$$
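Plugging in numbers makes the update concrete. Below is a single step from $w = b = 0$ on the toy data $y = 2x + 3$, with learning rate $\alpha = 0.08$ (the same `x`, `y`, and learning rate as the notebook cell further down). It is a worked sketch of one update, not a full training loop.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([5.0, 7.0, 9.0, 11.0, 13.0])
n = len(x)
alpha = 0.08
w_old, b_old = 0.0, 0.0

residual = y - (w_old * x + b_old)      # y_i - y_hat_i (here just y, since y_hat = 0)
dw = -(2 / n) * np.sum(x * residual)    # -(2/5) * 155 = -62
db = -(2 / n) * np.sum(residual)        # -(2/5) * 45  = -18

w_new = w_old - alpha * dw              # 0 - 0.08 * (-62) = 4.96
b_new = b_old - alpha * db              # 0 - 0.08 * (-18) = 1.44
print(w_new, b_new)
```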

---

## 🔹 4. Learning Rate ($\alpha$)

Controls the step size of gradient descent:

* Small $\alpha$: slow learning (many iterations needed)
* Large $\alpha$: may overshoot the minimum or diverge
* Well-chosen $\alpha$: the cost decreases steadily and converges
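The effect of $\alpha$ shows up clearly if the same number of steps is run with different learning rates on the toy data $y = 2x + 3$. For this data the Hessian of the MSE with respect to $(w, b)$ is $\begin{pmatrix} 22 & 6 \\ 6 & 2 \end{pmatrix}$, whose largest eigenvalue is about $23.66$, so plain gradient descent is only stable for $\alpha < 2 / 23.66 \approx 0.0845$. The sketch below (the helper `run_gd` is hypothetical) compares a small, a near-limit, and a too-large rate.

```python
import numpy as np

def run_gd(x, y, learning_rate, iterations=200):
    """Plain gradient descent on y_hat = w*x + b from w = b = 0; returns the final MSE."""
    w = b = 0.0
    n = len(x)
    for _ in range(iterations):
        residual = y - (w * x + b)
        dw = -(2 / n) * np.sum(x * residual)
        db = -(2 / n) * np.sum(residual)
        w -= learning_rate * dw
        b -= learning_rate * db
    return np.mean((y - (w * x + b)) ** 2)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([5.0, 7.0, 9.0, 11.0, 13.0])

for lr in (0.001, 0.08, 0.09):
    # 0.001: still noticeably above zero; 0.08: near zero; 0.09: cost explodes
    print(lr, run_gd(x, y, lr))
```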

$$
\theta_{\text{new}} = \theta_{\text{old}} - \alpha \nabla_{\theta} J(\theta)
$$

Where:

* $\theta$ → model parameters (e.g., $w, b$)
* $J(\theta)$ → cost function (here, the MSE)

---

## 🌟 Combined Gradient Descent Equation

$$
\theta := \theta - \alpha \nabla_{\theta} J(\theta)
$$

This compact form represents **all parameters updated together** using the gradient of the cost function.
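Stacking the parameters into $\theta = [w, b]$ and using a design matrix $X$ with a column of ones turns the compact rule into a few lines of NumPy, with $\nabla_\theta J(\theta) = -\frac{2}{n} X^\top (y - X\theta)$, which reproduces both scalar gradient formulas above. This is a sketch under the assumption of the same toy data; `gradient_descent_vectorized` and its default `alpha`/`iterations` are illustrative choices, not values from the notebook.

```python
import numpy as np

def gradient_descent_vectorized(x, y, alpha=0.05, iterations=5000):
    """Update all parameters at once: theta := theta - alpha * grad J(theta)."""
    # Design matrix: first column multiplies w, second (all ones) multiplies b
    X = np.column_stack([x, np.ones_like(x)])
    theta = np.zeros(X.shape[1])              # theta = [w, b]
    n = len(y)
    for _ in range(iterations):
        residual = y - X @ theta              # y - y_hat for every point
        grad = -(2 / n) * (X.T @ residual)    # [dJ/dw, dJ/db]
        theta = theta - alpha * grad
    return theta

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([5.0, 7.0, 9.0, 11.0, 13.0])
print(gradient_descent_vectorized(x, y))  # approaches [2, 3] because y = 2x + 3
```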

---

```python
import numpy as np

def gradiant_decent(x, y):
    """Fit y ≈ m*x + b with batch gradient descent on the MSE."""
    m_curr = b_curr = 0.0
    iterations = 10000
    learning_rate = 0.08
    n = len(x)

    mse_list = []  # cost (MSE) at each iteration
    m_list = []    # gradient dMSE/dm at each iteration
    b_list = []    # gradient dMSE/db at each iteration

    for i in range(iterations):
        y_pred = m_curr * x + b_curr
        cost = np.mean((y - y_pred) ** 2)          # MSE

        md = -(2 / n) * np.sum(x * (y - y_pred))   # derivative w.r.t. m
        bd = -(2 / n) * np.sum(y - y_pred)         # derivative w.r.t. b

        m_curr = m_curr - learning_rate * md       # new value of m
        b_curr = b_curr - learning_rate * bd       # new value of b

        if i % 1000 == 0:  # report progress every 1000 iterations
            print(f"i: {i}, m: {m_curr}, b: {b_curr}, cost: {cost}")

        mse_list.append(cost)
        m_list.append(md)
        b_list.append(bd)

    return m_curr, b_curr, mse_list, m_list, b_list
```


```python
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])

m, b, mse_list, m_list, b_list = gradiant_decent(x, y)

print("\nFinal m:", m)
print("Final b:", b)
```
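Since this toy data lies exactly on a line ($y = 2x + 3$), the gradient descent result can be cross-checked against NumPy's closed-form least-squares fit, `np.polyfit`. A minimal sketch:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])

# Degree-1 least-squares fit; coefficients come back highest power first
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)  # numerically 2 and 3
```

If gradient descent has converged, `m` and `b` from the cell above should agree with `slope` and `intercept` to several decimal places.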

## Math vs. CS Test Scores (CSV)

```python
import math

import numpy as np
import pandas as pd

def gradient_descent(csv):
    """Fit cs ≈ m * math + b from a CSV with 'math' and 'cs' columns."""
    df = pd.read_csv(csv)
    x = np.array(df['math'])
    y = np.array(df['cs'])

    b_curr = m_curr = 0.0
    iterations = 1000000
    learning_rate = 0.0002
    n = len(x)
    cost_previous = 0

    for i in range(iterations):
        y_pred = m_curr * x + b_curr
        cost = np.mean((y - y_pred) ** 2)
        md = -(2 / n) * np.sum(x * (y - y_pred))
        bd = -(2 / n) * np.sum(y - y_pred)
        m_curr = m_curr - learning_rate * md
        b_curr = b_curr - learning_rate * bd
        # rel_tol=1e-20 is below double-precision resolution, so this
        # effectively stops only when the cost no longer changes at all
        if math.isclose(cost, cost_previous, rel_tol=1e-20):
            break
        cost_previous = cost

        if i % 10000 == 0:  # avoid printing up to a million lines
            print(f"m: {m_curr}, \tb: {b_curr}, \tcost: {cost}, \ti: {i}")
    return m_curr, b_curr
```
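The `math.isclose(..., rel_tol=1e-20)` check in `gradient_descent` waits until the cost stops changing in floating point, which may take a very large number of iterations. A common alternative, swapped in here and not the notebook's own method, is an absolute tolerance on the change in cost plus an iteration cap. The sketch uses the toy data from earlier; `fit_until_converged`, `tol`, and `max_iterations` are hypothetical names.

```python
import numpy as np

def fit_until_converged(x, y, learning_rate, tol=1e-12, max_iterations=1_000_000):
    """Gradient descent on y_hat = m*x + b that stops once the MSE stops improving.

    Stops when |cost_previous - cost| < tol (absolute tolerance) or after
    max_iterations updates, whichever comes first.
    """
    m = b = 0.0
    n = len(x)
    cost_previous = np.inf
    iterations_run = 0
    for _ in range(max_iterations):
        residual = y - (m * x + b)
        cost = np.mean(residual ** 2)
        if abs(cost_previous - cost) < tol:
            break
        cost_previous = cost
        m -= learning_rate * (-(2 / n) * np.sum(x * residual))
        b -= learning_rate * (-(2 / n) * np.sum(residual))
        iterations_run += 1
    return m, b, iterations_run

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([5.0, 7.0, 9.0, 11.0, 13.0])
m, b, steps = fit_until_converged(x, y, learning_rate=0.08)
print(m, b, steps)
```

An absolute tolerance depends on the scale of the cost, so for data with larger values (such as exam scores) `tol` would likely need tuning.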





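```python
import pandas as pd
from sklearn.linear_model import LinearRegression

def predict_using_sklean():
    """Fit the same line with scikit-learn's ordinary least squares for comparison."""
    df = pd.read_csv("assets/test_scores.csv")
    r = LinearRegression()
    r.fit(df[['math']], df.cs)  # 2-D feature matrix, 1-D target
    return r.coef_, r.intercept_
```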

```python
m, b = gradient_descent("assets/test_scores.csv")
print("Using gradient descent function: Coef {} Intercept {}".format(m, b))

m_sklearn, b_sklearn = predict_using_sklean()
print("Using sklearn: Coef {} Intercept {}".format(m_sklearn, b_sklearn))
```
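For simple linear regression, the least-squares line also has a closed form, $m = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$ and $b = \bar{y} - m\bar{x}$; this is the ordinary least squares fit that `LinearRegression` computes. The sketch below uses made-up numbers, not the contents of `assets/test_scores.csv`, and `ols_fit` is a hypothetical helper.

```python
import numpy as np

def ols_fit(x, y):
    """Closed-form simple linear regression (ordinary least squares)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Made-up values for illustration only
math_scores = [92, 56, 88, 70, 80]
cs_scores = [98, 68, 81, 80, 83]
print(ols_fit(math_scores, cs_scores))
```

A converged gradient descent run on the same data should land close to these two numbers.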