# **Classical ML**

Welcome to **Module L2: Classical ML**.

In L1, we built the *engine* (Linear Algebra). In L2, we teach the engine how to *drive itself* (Learning).

We are shifting focus from "calculating outputs" to **optimizing weights**.

### **Concept 1: Linear Regression**
Imagine you are trying to predict house prices based on their size. Linear Regression simply tries to draw the "best-fitting straight line" through that data. Mathematically, it searches for the weight ($w$) and bias ($b$) that make the difference between your predicted value ($\hat{y} = wx + b$) and the actual value ($y$) as small as possible across all data points.
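As a minimal sketch of the prediction step alone, the line $\hat{y} = wx + b$ can be applied to a whole NumPy array at once. The house sizes, `w`, and `b` below are made-up numbers, and `predict` is a hypothetical helper name; finding good values for `w` and `b` is what the rest of this module is about.

```python
import numpy as np

def predict(x, w, b):
    """Linear regression prediction y_hat = w * x + b (element-wise on arrays)."""
    return w * x + b

# Hypothetical house sizes (square metres) and a hypothetical, hand-picked line
sizes = np.array([50.0, 80.0, 120.0])
w, b = 3000.0, 20000.0  # made-up price per square metre and base price

prices_hat = predict(sizes, w, b)
print(prices_hat)  # [170000. 260000. 380000.]
```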

### **The Intuition: Drawing the Best Line**

Imagine you have a piece of graph paper.
* **X-axis:** Number of hours studied.
* **Y-axis:** Exam score.

You plot 5 dots representing 5 students. The dots go generally up (more study = higher score), but they aren't in a perfect straight line.

**The Goal:**
We want to draw **one straight line** through those dots that allows us to predict the score for *any* number of hours studied.

**The Problem:**
You can draw infinitely many lines. Some are too steep, some are too flat, some are too high. How do you decide, mathematically, which line is the "best"?

### **The "Why": Measuring the Mistake**

To find the best line, we need a way to score how "bad" a line is. We call this the **Cost Function** or **Loss Function**.

1.  **Calculate the Error:** We look at a specific dot. We measure the vertical distance between the **actual dot** (Real Score) and the **line** (Predicted Score). That distance is the "Error."
2.  **Square It:** Why square it?
    * *Reason 1:* Sometimes the dot is below the line (negative error). If we just added up the errors, $-5$ and $+5$ would cancel out to $0$, making us think the line is perfect when it's not. Squaring makes everything positive.
    * *Reason 2:* It punishes big mistakes. Being off by 10 points ($10^2 = 100$) is much worse than being off by 1 point ($1^2 = 1$).
3.  **Take the Average:** We add up all those squared errors and divide by the number of students.
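Here are the three steps on two hypothetical students whose errors are $-5$ and $+5$ (the cancellation case from Reason 1). The scores are invented for illustration.

```python
import numpy as np

y_true = np.array([60.0, 70.0])  # hypothetical actual scores
y_pred = np.array([65.0, 65.0])  # hypothetical scores predicted by some line

errors = y_true - y_pred         # step 1: vertical distances -> [-5.  5.]
print(errors.sum())              # 0.0: raw errors cancel out, hiding the mistakes

squared = errors ** 2            # step 2: square -> [25. 25.]
mse = squared.mean()             # step 3: average of the squared errors
print(mse)                       # 25.0
```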

This gives us the **Mean (Average) Squared Error (MSE)**.

**The "best fit line" is simply the line that results in the lowest possible MSE.**
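To see "lowest MSE wins" in numbers, the sketch below scores a hand-picked line against the least-squares line on hypothetical hours/score data. It uses NumPy's `np.polyfit(x, y, deg=1)`, which fits a degree-1 polynomial by least squares (minimizing the sum of squared errors, and therefore the MSE) and returns `[slope, intercept]`. The data and the guessed line are assumptions made up for this example.

```python
import numpy as np

# Hypothetical data: hours studied vs exam score for 5 students
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
scores = np.array([52.0, 55.0, 61.0, 64.0, 70.0])

def mse(y_true, y_pred):
    """Mean Squared Error."""
    return np.mean((y_true - y_pred) ** 2)

# A hand-picked guess: slope 5, intercept 45
mse_guess = mse(scores, 5.0 * hours + 45.0)

# Least-squares line: the lowest possible MSE for a straight line on this data
w_best, b_best = np.polyfit(hours, scores, deg=1)
mse_best = mse(scores, w_best * hours + b_best)

print(f"guess: MSE = {mse_guess:.2f}")                                    # guess: MSE = 1.20
print(f"best:  w = {w_best:.2f}, b = {b_best:.2f}, MSE = {mse_best:.2f}")  # best:  w = 4.50, b = 46.90, MSE = 0.54
```

Any other slope and intercept you try on this data will give an MSE of at least 0.54.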


**The Micro-Task:**
Before we use Scikit-Learn, I want you to understand the cost function.
1.  Create two NumPy arrays: `y_true` (actual values) and `y_pred` (predicted values).
2.  Write a Python function `calculate_mse(y_true, y_pred)` from scratch (using NumPy, no sklearn yet) that calculates the **Mean Squared Error (MSE)**.

*Equation hint: $\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_{\text{true},i} - y_{\text{pred},i})^2$*


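```python
import numpy as np
import time

# Actual scores and the scores predicted by some line
y_true = np.array([55, 74, 49, 82, 94])
y_pred = np.array([50, 60, 70, 80, 90])

def calculate_mse(y_true, y_pred):
    """Mean Squared Error, vectorized with NumPy."""
    return np.mean((y_true - y_pred) ** 2)

# Version 1: explicit Python loop, adding each squared error divided by n
start_time1 = time.time()
mse1 = 0
n = len(y_true)
for i in range(n):
    mse1 += ((y_true[i] - y_pred[i]) ** 2) / n
total_time1 = time.time() - start_time1

# Version 2: vectorized NumPy, one operation over the whole array
start_time2 = time.time()
mse2 = calculate_mse(y_true, y_pred)
total_time2 = time.time() - start_time2

# Note: time.time() is not meant for timing very short operations and can
# report 0.0 sec; time.perf_counter() has higher resolution.
print(f"Traditional Loop MSE = {mse1} took {total_time1} sec\nNP Mean MSE = {mse2} took {total_time2} sec")
```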

```
Traditional Loop MSE = 136.4 took 0.0015187263488769531 sec
NP Mean MSE = 136.4 took 0.0 sec
```
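With only 5 elements, a single call of either version finishes too quickly for `time.time()` to measure reliably, which is how a 0.0 sec reading can appear. A fairer comparison uses Python's `timeit` module (its default timer is `time.perf_counter`), repeats each call, and uses a larger array. The random arrays and the helper names `mse_loop` / `mse_numpy` are assumptions for this sketch; exact timings will vary by machine, but the vectorized version is typically much faster on large arrays because its loop runs in compiled code.

```python
import timeit
import numpy as np

# Hypothetical data: 10,000 random "true" and "predicted" values
rng = np.random.default_rng(0)
y_true = rng.normal(size=10_000)
y_pred = rng.normal(size=10_000)

def mse_loop(y_true, y_pred):
    """MSE with an explicit Python loop."""
    n = len(y_true)
    total = 0.0
    for i in range(n):
        total += (y_true[i] - y_pred[i]) ** 2
    return total / n

def mse_numpy(y_true, y_pred):
    """MSE with a single vectorized NumPy expression."""
    return np.mean((y_true - y_pred) ** 2)

runs = 10
loop_t = timeit.timeit(lambda: mse_loop(y_true, y_pred), number=runs)
np_t = timeit.timeit(lambda: mse_numpy(y_true, y_pred), number=runs)
print(f"loop:  {loop_t / runs:.6f} sec per call")
print(f"numpy: {np_t / runs:.6f} sec per call")
```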


**Perfect.** You have just built the engine that measures "how wrong" a model is.

Now, we need the engine that **fixes** the mistake.

---

### **Concept 2: Gradient Descent**

**The "Why":**
We calculated the MSE (the error). Our goal is to make that MSE as small as possible. We do this by adjusting the weight $w$ (the slope of our line) and the bias $b$ (where the line crosses the y-axis).

**The Intuition (The Mountain Analogy):**
Imagine you are standing on top of a mountain blindfolded.
* **Height:** The Error (MSE). You want to reach the bottom of the valley (the lowest possible error).
* **Position:** Your current Weights.
* **Strategy:** You feel the ground with your foot. If it slopes down to the right, you take a step right. If it slopes down to the left, you take a step left.
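"Feeling the ground with your foot" can be mimicked numerically: nudge the weight a tiny bit in each direction and see how the MSE changes (a central finite difference). This is only an illustration of the idea; the toy data, fixing the bias at 0, and the helper names `mse_at` / `feel_slope` are assumptions for this sketch.

```python
import numpy as np

# Hypothetical toy data lying exactly on y = 2x (bias fixed at 0)
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

def mse_at(w):
    """Height of the 'mountain' (the MSE) when standing at weight w."""
    return np.mean((y - w * x) ** 2)

def feel_slope(w, h=1e-4):
    """Estimate the slope by stepping a tiny distance h to each side."""
    return (mse_at(w + h) - mse_at(w - h)) / (2 * h)

w = 0.0
slope = feel_slope(w)
direction = "right (increase w)" if slope < 0 else "left (decrease w)"
print(f"slope = {slope:.2f}, so step {direction}")  # slope = -18.67, so step right (increase w)
```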

**The Math:**
We calculate the **Gradient** (the slope of the mountain at your feet). Then we take a step in the *opposite* direction of the slope to go downhill.
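For the line $\hat{y} = wx + b$ from Concept 1, applying the chain rule to the MSE gives the gradient with respect to each parameter (a standard result, written here with $\hat{y}_i = w x_i + b$):

$$\text{MSE}(w, b) = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - (w x_i + b)\bigr)^2$$

$$\frac{\partial\,\text{MSE}}{\partial w} = -\frac{2}{n}\sum_{i=1}^{n} x_i\,(y_i - \hat{y}_i), \qquad \frac{\partial\,\text{MSE}}{\partial b} = -\frac{2}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)$$

A positive derivative means the error grows as that parameter increases, so the update subtracts it.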

The formula to update our weight ($w$) is:

$$w_{\text{new}} = w_{\text{old}} - (\text{learning rate} \times \text{gradient})$$

* **Learning Rate:** How big a step you take. Too big and you jump over the valley (the updates can even grow and diverge); too small and it takes forever to reach the bottom.
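Putting the pieces together, here is a minimal sketch of full-batch gradient descent for a straight line, repeating the update rule for both $w$ and $b$ many times. The data, the starting point $(0, 0)$, `learning_rate=0.05`, `epochs=5000`, and the function name `gradient_descent` are all assumptions chosen for this example. The gradients in the comments are the standard derivatives of the MSE for $\hat{y} = wx + b$.

```python
import numpy as np

# Hypothetical data: hours studied vs exam score
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
scores = np.array([52.0, 55.0, 61.0, 64.0, 70.0])

def gradient_descent(x, y, learning_rate=0.05, epochs=5000):
    """Fit y ~ w * x + b by repeatedly stepping opposite to the MSE gradient."""
    w, b = 0.0, 0.0  # arbitrary starting point on the "mountain"
    n = len(x)
    for _ in range(epochs):
        y_hat = w * x + b
        error = y - y_hat
        grad_w = -2.0 / n * np.sum(x * error)  # dMSE/dw
        grad_b = -2.0 / n * np.sum(error)      # dMSE/db
        w = w - learning_rate * grad_w         # w_new = w_old - lr * gradient
        b = b - learning_rate * grad_b
    return w, b

w, b = gradient_descent(hours, scores)
print(f"w = {w:.2f}, b = {b:.2f}")  # w = 4.50, b = 46.90
```

On this particular data, changing `learning_rate` to `0.1` makes `w` and `b` grow without bound instead of settling: a concrete case of "too big".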



---

**Micro-Task: The Single Step**

I want you to write a function that performs **one single step** of Gradient Descent.

**Task:**
1.  Define a function `update_weight(weight, gradient, learning_rate)`.
2.  It should return the new weight using the formula above.
3.  Test it with:
    * `weight = 10`
    * `gradient = 2` (the slope is positive, so we should move left, i.e. decrease the weight)
    * `learning_rate = 0.1`

**Write the code and tell me the new weight.**