# Gradient Descent Demo — Linear Regression

This notebook shows **gradient descent** finding a best-fit line for a small dataset.

### Simple idea (student-friendly):
- Imagine you're on a hill in the dark. You take **small steps downhill** until you reach the bottom.
- Here, the "hill" is the error (Mean Squared Error). The "bottom" is the smallest error.
- We update the slope `m` and intercept `b` step by step to go **downhill**.


## 1) Data
We'll reuse a simple dataset (hours studied vs exam score).

In [None]:
import numpy as np
import matplotlib.pyplot as plt

X = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([50, 55, 65, 70, 75], dtype=float)

plt.figure()
plt.scatter(X, y)
plt.xlabel('Hours studied')
plt.ylabel('Exam score')
plt.title('Data (hours vs score)')
plt.show()

## 2) Gradient Descent Functions
We'll implement the update rules for `m` and `b`.
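
For reference, here is a short derivation of where those update rules come from. The symbol $n$ (the number of data points) is notation introduced here as an assumption; $\alpha$ is the learning rate used in the training loop. This is a sketch of the standard calculus, written to match the code in the next cell term by term:

$$
\text{MSE}(m, b) = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - (m x_i + b)\bigr)^2
$$

$$
\frac{\partial\,\text{MSE}}{\partial m} = -\frac{2}{n}\sum_{i=1}^{n} x_i\bigl(y_i - (m x_i + b)\bigr),
\qquad
\frac{\partial\,\text{MSE}}{\partial b} = -\frac{2}{n}\sum_{i=1}^{n}\bigl(y_i - (m x_i + b)\bigr)
$$

$$
m \leftarrow m - \alpha\,\frac{\partial\,\text{MSE}}{\partial m},
\qquad
b \leftarrow b - \alpha\,\frac{\partial\,\text{MSE}}{\partial b}
$$

Subtracting the gradient is what makes each step go downhill: the gradient points in the direction where the error grows fastest, so we move the opposite way.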

In [None]:
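def compute_gradients(X, y, m, b):
    """Return the partial derivatives of the MSE with respect to m and b."""
    y_pred = m * X + b                    # current predictions
    error = y - y_pred                    # residuals (actual - predicted)
    dm = -(2/len(X)) * np.sum(X * error)  # d(MSE)/dm
    db = -(2/len(X)) * np.sum(error)      # d(MSE)/db
    return dm, db

def mse(X, y, m, b):
    """Mean Squared Error of the line y = m*X + b on the data."""
    return np.mean((y - (m*X + b))**2)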
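
One way to gain confidence in the gradient formulas is to compare them with a numerical estimate. The sketch below assumes a central-difference approximation; `numerical_gradients`, `eps`, and the test point `(2.0, 10.0)` are hypothetical names and values chosen for illustration, not part of the original notebook. The data and functions are repeated so the cell runs on its own.

```python
import numpy as np

# Same toy data and functions as in the notebook, repeated for a standalone cell.
X = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([50, 55, 65, 70, 75], dtype=float)

def mse(X, y, m, b):
    return np.mean((y - (m * X + b)) ** 2)

def compute_gradients(X, y, m, b):
    error = y - (m * X + b)
    dm = -(2 / len(X)) * np.sum(X * error)
    db = -(2 / len(X)) * np.sum(error)
    return dm, db

def numerical_gradients(X, y, m, b, eps=1e-6):
    """Central-difference estimate of d(MSE)/dm and d(MSE)/db (hypothetical helper)."""
    dm = (mse(X, y, m + eps, b) - mse(X, y, m - eps, b)) / (2 * eps)
    db = (mse(X, y, m, b + eps) - mse(X, y, m, b - eps)) / (2 * eps)
    return dm, db

m_test, b_test = 2.0, 10.0  # arbitrary test point (an assumption)
analytic = compute_gradients(X, y, m_test, b_test)
numeric = numerical_gradients(X, y, m_test, b_test)
print('analytic:', analytic)
print('numeric: ', numeric)
print('match:', np.allclose(analytic, numeric, rtol=1e-4))
```

If the two pairs of numbers agree closely, the hand-derived formulas are very likely correct.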

## 3) Run Gradient Descent
We start with guesses for `m` and `b`, then take many small steps. Try changing `alpha` (learning rate) and `epochs`.

In [None]:
alpha = 0.01   # learning rate (step size)
epochs = 1000  # number of steps
m, b = 0.0, 0.0

history = {'m': [], 'b': [], 'loss': []}

for _ in range(epochs):
    dm, db = compute_gradients(X, y, m, b)
    m -= alpha * dm
    b -= alpha * db
    history['m'].append(m)
    history['b'].append(b)
    history['loss'].append(mse(X, y, m, b))

print('Final slope (m):', round(m, 4))
print('Final intercept (b):', round(b, 4))
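
To make a single update concrete, here is the very first step worked out by itself, using the same data, starting point, and `alpha` as the loop above. The names `m_new` and `b_new` are introduced here only for illustration.

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([50, 55, 65, 70, 75], dtype=float)

alpha = 0.01
m, b = 0.0, 0.0

error = y - (m * X + b)                  # equals y, because the line starts at 0
dm = -(2 / len(X)) * np.sum(X * error)   # -(2/5) * 1010 = -404
db = -(2 / len(X)) * np.sum(error)       # -(2/5) * 315  = -126

m_new = m - alpha * dm                   # 0 + 0.01 * 404 = 4.04
b_new = b - alpha * db                   # 0 + 0.01 * 126 = 1.26
print('after one step: m =', round(m_new, 4), ', b =', round(b_new, 4))
```

Both gradients are negative at the start, so subtracting them pushes `m` and `b` upward, toward the data.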

## 4) Loss Curve (Are we going downhill?)
If gradient descent is working, the loss (MSE) should drop quickly at first and then level off as it gets close to the minimum.

In [None]:
plt.figure()
plt.plot(history['loss'])
plt.xlabel('Epoch')
plt.ylabel('MSE (loss)')
plt.title('Loss decreasing over time')
plt.show()
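
Besides looking at the plot, you can check the "going downhill" claim with numbers. This is a minimal sketch that reruns the same loop (same data, `alpha`, and `epochs`) and counts the steps where the loss went up; the names `losses` and `steps_up` are assumptions made for this example.

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([50, 55, 65, 70, 75], dtype=float)

alpha, epochs = 0.01, 1000
m, b = 0.0, 0.0
losses = []
for _ in range(epochs):
    error = y - (m * X + b)
    dm = -(2 / len(X)) * np.sum(X * error)
    db = -(2 / len(X)) * np.sum(error)
    m -= alpha * dm
    b -= alpha * db
    losses.append(np.mean((y - (m * X + b)) ** 2))

losses = np.array(losses)
steps_up = int(np.sum(np.diff(losses) > 0))  # steps where the loss increased
print('loss after first step:', losses[0])
print('loss after last step: ', losses[-1])
print('steps where loss went up:', steps_up)
```

A count of zero means every step reduced the error. A nonzero count is a hint that `alpha` may be too large.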

## 5) See the Line Improve (Early → Middle → Final)
We plot the fitted line at three moments in training. One plot per cell.

In [None]:
def line_y(m, b, X):
    return m*X + b

x_line = np.linspace(X.min(), X.max(), 100)

# Early training (epoch ~ 10)
m10, b10 = history['m'][9], history['b'][9]
plt.figure()
plt.scatter(X, y)
plt.plot(x_line, line_y(m10, b10, x_line))
plt.xlabel('Hours studied')
plt.ylabel('Exam score')
plt.title('Early fit (~epoch 10)')
plt.show()

In [None]:
# Middle training (epoch ~ 100)
m100, b100 = history['m'][99], history['b'][99]
plt.figure()
plt.scatter(X, y)
plt.plot(x_line, line_y(m100, b100, x_line))
plt.xlabel('Hours studied')
plt.ylabel('Exam score')
plt.title('Middle fit (~epoch 100)')
plt.show()

In [None]:
# Final training (epoch ~ 1000)
mF, bF = history['m'][-1], history['b'][-1]
plt.figure()
plt.scatter(X, y)
plt.plot(x_line, line_y(mF, bF, x_line))
plt.xlabel('Hours studied')
plt.ylabel('Exam score')
plt.title('Final fit (~epoch 1000)')
plt.show()

## 6) Compare to scikit-learn's LinearRegression (closed-form)
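This shows the slope and intercept that scikit-learn's `LinearRegression` finds by solving the least-squares problem directly, with no gradient descent. The answer is the same one the **Normal Equation** gives. If the gradient descent values from section 3 differ a little from these, that run had not fully converged yet; more epochs bring it closer.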

In [None]:
from sklearn.linear_model import LinearRegression
X_2d = X.reshape(-1, 1)
sk = LinearRegression().fit(X_2d, y)
print('sklearn slope:', sk.coef_[0])
print('sklearn intercept:', sk.intercept_)

plt.figure()
plt.scatter(X, y)
plt.plot(x_line, sk.predict(x_line.reshape(-1,1)))
plt.xlabel('Hours studied')
plt.ylabel('Exam score')
plt.title('scikit-learn best-fit line')
plt.show()
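
The Normal Equation mentioned in this section can also be written out directly with NumPy. This is a sketch for illustration, not a description of scikit-learn's internals; the names `A`, `theta`, `m_ne`, and `b_ne` are assumptions introduced here.

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([50, 55, 65, 70, 75], dtype=float)

# Design matrix: a column of ones (for the intercept) next to the x values.
A = np.column_stack([np.ones_like(X), X])

# Normal Equation: solve (A^T A) theta = A^T y, where theta = [b, m].
theta = np.linalg.solve(A.T @ A, A.T @ y)
b_ne, m_ne = theta

print('Normal Equation slope:', round(m_ne, 4))      # about 6.5 for this data
print('Normal Equation intercept:', round(b_ne, 4))  # about 43.5 for this data
```

Using `np.linalg.solve` avoids computing a matrix inverse explicitly, which is generally the more numerically stable choice.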

## 7) Try it yourself
- Change `alpha` and check whether the loss still goes down. If `alpha` is too large, each step overshoots the bottom and the loss can diverge (grow instead of shrink).
- Change `epochs` to see how many steps you need.
- Add noise to `y` and see how the fit changes.
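
As a starting point for the first experiment, here is a small sketch that runs the same update rule with three learning rates and prints the first and last loss for each. The helper `run_gd`, the `results` dictionary, the three `alpha` values, and the choice of 50 steps are all assumptions made for this example.

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([50, 55, 65, 70, 75], dtype=float)

def run_gd(alpha, epochs, m=0.0, b=0.0):
    """Run plain gradient descent; return the loss after each step (hypothetical helper)."""
    losses = []
    for _ in range(epochs):
        error = y - (m * X + b)
        dm = -(2 / len(X)) * np.sum(X * error)
        db = -(2 / len(X)) * np.sum(error)
        m -= alpha * dm
        b -= alpha * db
        losses.append(np.mean((y - (m * X + b)) ** 2))
    return losses

results = {}
for alpha in (0.001, 0.01, 0.1):
    results[alpha] = run_gd(alpha, epochs=50)
    first, last = results[alpha][0], results[alpha][-1]
    print(f'alpha={alpha}: first loss={first:.2f}, last loss={last:.2f}')
```

On this dataset the two smaller values reduce the loss, while `alpha = 0.1` makes it grow with every step. The exact cutoff depends on the data, so treat these numbers as specific to this example.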
