In [2]:
from functools import partial
import matplotlib.pyplot as plt
import numpy as np

# Algebra

## Slope
Slope, also called **gradient**, is a way to measure how steep a line is.

🔹 Think of a Hill

Imagine you are riding a bike up a hill:
- If the hill is steep, you struggle to pedal → High slope
- If the hill is nearly flat, it’s easy to ride → Low slope
- If you go downhill, you speed up → Negative slope

🔹 Math Definition

Slope tells us how much $y$ changes for each one-unit change in $x$: it is the change in $y$ divided by the change in $x$ (rise over run).
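
As a minimal sketch of this definition, the helper below computes rise over run between two points. The function name `slope` and the sample points are illustrative assumptions, chosen to mirror the three hill cases described above.

```python
def slope(x1, y1, x2, y2):
    """Change in y divided by change in x between two points."""
    if x2 == x1:
        raise ValueError("slope is undefined for a vertical line")
    return (y2 - y1) / (x2 - x1)

# Illustrative points for the three hill cases
steep = slope(0, 0, 1, 5)      # y rises 5 for each step in x
gentle = slope(0, 0, 10, 1)    # y rises only 0.1 for each step in x
downhill = slope(0, 5, 5, 0)   # y falls as x grows, so the slope is negative

print(steep, gentle, downhill)
```

A vertical line has no defined slope, which is why the sketch raises an error instead of dividing by zero.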

🟢 Why is Slope Important?
1. Predicting Trends: If you plot data (like time vs. temperature), slope tells you how fast things are changing.
2. Machine Learning: Linear regression uses slope to find the best line that fits data for predictions.
3. Physics: On a graph of position against time, the slope is the speed (velocity).

Basically, slope helps us understand relationships between things!

<div>
<img src="./images/slope.png" width="350"/>
</div>

## What is y-intercept and x-intercept?
🔹 X-Intercept (Where the Line Touches the X-Axis)

The x-intercept is where the line crosses the x-axis (horizontal line).
- This happens when $y = 0$.
- To find it, set $y = 0$ and solve for $x$.
- Example: If $y = 2x + 3$, set $0 = 2x + 3$, which gives $x = -1.5$. The line crosses the x-axis at $(-1.5, 0)$.
- The x-intercept tells us the value of $x$ at which $y$ reaches zero.

🔹 Y-Intercept (Where the Line Starts on the Y-Axis)

The y-intercept is where the line crosses the y-axis (vertical line).
- This happens when $x = 0$.
- Example: In $y = 2x + 3$, the y-intercept is 3 (the line crosses the y-axis at $(0, 3)$).
- It tells us the starting value of $y$ when $x$ is zero.
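
The two recipes above (set $y = 0$ to get the x-intercept, set $x = 0$ to get the y-intercept) can be sketched in a few lines. The function name `intercepts` is an illustrative assumption; the line $y = 2x + 3$ is the example used in the text.

```python
def intercepts(m, b):
    """Return (x_intercept, y_intercept) of the line y = m*x + b."""
    if m == 0:
        raise ValueError("a horizontal line has no single x-intercept")
    x_intercept = -b / m   # set y = 0 and solve 0 = m*x + b
    y_intercept = b        # set x = 0, so y = b
    return x_intercept, y_intercept

x_int, y_int = intercepts(2, 3)
print(x_int, y_int)   # -1.5 3
```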

<div>
<img src="./images/XYIntercepts.jpg" width="350"/>
</div>

🟢 Putting It All Together: Slope + X & Y Intercepts + Real-World Use

At its core, these math tools (slope, intercepts) help us understand relationships between things. They’re the foundation of linear equations, which are heavily used in Machine Learning (ML) and data analysis.
<br>
1️⃣ What Do These Mean in Simple Terms?<br>
1. Slope ($m$) → How fast something is changing
   - Steep slope = rapid change
   - Flat slope = slow change
2. Y-Intercept ($b$) → The starting value when $x = 0$
3. X-Intercept → The value of $x$ at which $y$ becomes zero

Mathematically, a straight-line equation looks like:

$$
y = mx + b
$$

where:
- $m$ (slope) shows how fast $y$ changes as $x$ changes
- $b$ (y-intercept) tells us where the line starts on the y-axis
<br>
2️⃣ How is This Used in Machine Learning?<br>
<br>
🔹 Predicting Future Outcomes (Linear Regression 📈)<br>
<br>
Imagine you want to predict house prices based on size (square meters).
- $x$ = house size
- $y$ = price
- Slope ($m$) = How much the price changes per square meter
- Y-intercept ($b$) = The base price when house size = 0

If a trained model gives you:

$$
y = 5000x + 50000
$$

- $5000$ is the slope (every extra m² adds €5000)
- $50000$ is the y-intercept (base price before size adds value)
- If a house is 100 m², the price is 5000 × 100 + 50,000 = €550,000

👉 This is how a linear regression model makes a prediction!<br>
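
The arithmetic above can be written as a tiny prediction function. This is only a sketch: `predict_price` is a hypothetical name, and the slope and intercept are the example's numbers, not values from a real trained model.

```python
def predict_price(size_m2, slope=5000, intercept=50_000):
    """Predicted price in euros for a house of the given size in m²."""
    return slope * size_m2 + intercept

print(predict_price(100))   # 550000
```

Once the slope and intercept are known, predicting for a new house is a single multiplication and addition.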

🔹 Classifying Data in ML (Decision Boundaries 🛑✅)<br>

In classification tasks (e.g., spam vs. non-spam email), linear equations help draw boundaries between categories.<br>

If a dataset has two features:
1. Email length
2. Number of suspicious words

We can plot emails and fit a line to separate spam from non-spam. The equation helps decide when an email crosses the boundary into spam territory.<br>
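
A rough sketch of such a boundary: the email is scored with a linear equation and labelled spam when the score is positive. Every number here (`w_length`, `w_words`, `bias`) is a made-up illustrative value, not something learned from data, and `is_spam` is a hypothetical name.

```python
def is_spam(email_length, suspicious_words,
            w_length=-0.01, w_words=1.0, bias=-2.0):
    """Classify an email by which side of a linear boundary it falls on.

    The boundary is the line where the score equals zero.
    All weights are illustrative assumptions, not fitted values.
    """
    score = w_length * email_length + w_words * suspicious_words + bias
    return score > 0

print(is_spam(email_length=200, suspicious_words=6))   # True:  -2 + 6 - 2 = 2
print(is_spam(email_length=300, suspicious_words=1))   # False: -3 + 1 - 2 = -4
```

In a real classifier the weights and bias would be fitted to labelled emails; the decision rule itself stays the same.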

3️⃣ How Do I Know When to Use These?<br>

🔹 Use Slope & Intercepts When You Need:<br>
✅ To predict trends (e.g., sales, weather, prices)<br>
✅ To classify data (e.g., spam detection, sentiment analysis)<br>
✅ To analyze relationships (e.g., Does advertising spend increase sales?)<br>

🔹 When NOT to Use Linear Models:<br>
❌ If your data doesn’t follow a straight-line pattern<br>
❌ If relationships are complex (e.g., non-linear patterns that call for models such as deep neural networks)<br>

4️⃣ Summary: Why Does This Matter in ML?<br>

These concepts help machines learn patterns and make decisions:<br>
✔ Linear Regression → Predicts numerical values (house prices, temperature, stock market)<br>
✔ Classification (Logistic Regression) → Separates categories (spam detection, disease prediction)<br>
✔ Neural Networks → Even deep learning uses slope (gradient descent) to optimize models!<br>

🚀 So understanding slope & intercepts helps you build better ML models!

### Scenario:
Imagine you're working with a dataset where you have two variables:
- `x` (input): The size of the house in square feet.
- `y` (output): The price of the house in thousands of dollars.

You want to create a simple linear model to predict the price (`y`) of a house given its size (`x`). A linear model can be represented as:

$$
y = mx + b
$$

Where:
- $m$ is the slope (how much the price changes per square foot).
- $b$ is the y-intercept (the predicted price when the house size is 0).

### Example Data:
Let's say you have the following data points:

| House Size (x, in sq ft) | Price (y, in $1000s) |
|---------------------------|-----------------------|
| 1000                      | 200                  |
| 1500                      | 300                  |

### Step 1: Calculate the Slope ($m$)
The formula for the slope between two points $(x_1, y_1)$ and $(x_2, y_2)$ is:

$$
m = \frac{y_2 - y_1}{x_2 - x_1}
$$

Using our data points:
- $(x_1, y_1) = (1000, 200)$
- $(x_2, y_2) = (1500, 300)$

$$
m = \frac{300 - 200}{1500 - 1000} = \frac{100}{500} = 0.2
$$

So, the slope $m = 0.2$. Because $y$ is measured in thousands of dollars, each additional square foot adds \$200 to the predicted price (since $0.2 \times 1000 = 200$).

### Step 2: Calculate the Y-Intercept ($b$)
The y-intercept $b$ is the value of $y$ when $x = 0$. We can use the equation of the line $y = mx + b$ and one of the data points to solve for $b$.

Using the point $(1000, 200)$:
$$
200 = 0.2(1000) + b
$$
$$
200 = 200 + b
$$
$$
b = 0
$$

So, the y-intercept $b = 0$. This means that if the house size is 0 square feet, the predicted price would be \$0.

### Final Linear Model:
Now we have the complete linear equation:
$$
y = 0.2x + 0
$$

Or simply:
$$
y = 0.2x
$$

### Real-World Prediction:
If you want to predict the price of a house that is 2000 square feet:
$$
y = 0.2(2000) = 400
$$

This means the predicted price of a 2000 square foot house is \$400,000.
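
The steps above can be sketched in code using the two data points from the table. The variable and function names are illustrative; the result should reproduce $m = 0.2$, $b = 0$, and a prediction of 400 (in thousands of dollars), up to floating-point rounding.

```python
# The two data points from the table: (size in sq ft, price in $1000s)
x1, y1 = 1000, 200
x2, y2 = 1500, 300

m = (y2 - y1) / (x2 - x1)   # Step 1: slope
b = y1 - m * x1             # Step 2: y-intercept, rearranged from y1 = m*x1 + b

def predict(size_sqft):
    """Predicted price in thousands of dollars."""
    return m * size_sqft + b

print(m, b, predict(2000))
```

With only two points the line passes through both exactly. With more data, a regression routine would instead choose the line that minimizes the overall error.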

### Summary:
- **Slope ($m$)**: Represents the rate of change of price with respect to house size (in this case, $0.2$ thousand dollars, or \$200, per square foot).
- **Y-Intercept ($b$)**: Represents the base price when the house size is 0 (in this case, \$0).

This is a very basic example of how slope and intercept are used in a real-world machine learning scenario to make predictions.

### Quadratic Equation
A quadratic equation is an equation in which the highest power of the variable (usually $x$) is 2. In other words, it contains a squared term ($x^2$) and no higher powers such as $x^3$ or $x^4$.

The general form of a quadratic equation is:

$$ax^2 + bx + c = 0$$

Where:
- $a$, $b$, and $c$ are constants (numbers),
- $a \neq 0$ (because if $a = 0$, the equation would no longer have an $x^2$ term and wouldn't be quadratic).

In a quadratic equation, the variables and constants $a$, $x$, $b$, and $c$ each have specific roles. Let's break them down in detail:

#### 1. **$a$: The Coefficient of $x^2$**
- $a$ is the coefficient of the squared term ($x^2$).
- It determines the "shape" and "direction" of the parabola (the graph of the quadratic equation):
  - If $a > 0$, the parabola opens **upwards** (like a smile).
  - If $a < 0$, the parabola opens **downwards** (like a frown).
- $a$ also affects how "wide" or "narrow" the parabola is:
  - A larger absolute value of $a$ makes the parabola narrower.
  - A smaller absolute value of $a$ makes the parabola wider.

#### 2. **$x$: The Variable**
- $x$ is the variable (or unknown) in the equation.
- The goal of solving a quadratic equation is often to find the values of $x$ that satisfy the equation (i.e., the roots or solutions).
- $x$ can take on any real number value, depending on the context of the problem.

#### 3. **$b$: The Coefficient of $x$**
- $b$ is the coefficient of the linear term ($x$).
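- It is the slope of the parabola at the point where it crosses the $y$-axis ($x = 0$), and together with $a$ it sets the horizontal position of the vertex (the highest or lowest point of the parabola), which lies at $x = -\frac{b}{2a}$.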
- If $b = 0$, the parabola is symmetric about the $y$-axis.

#### 4. **$c$: The Constant Term**
- $c$ is the constant term in the equation.
- It represents the $y$-intercept of the parabola, which is the point where the graph crosses the $y$-axis.
- When $x = 0$, the value of $y$ is equal to $c$.

---

### Example:
Consider the quadratic equation:

$$
2x^2 - 4x + 1 = 0
$$

Here:
- $a = 2$: The parabola opens upwards because $a > 0$, and it is narrower than $y = x^2$ because $|a| = 2 > 1$.
- $b = -4$: Together with $a$, this places the vertex at $x = -\frac{b}{2a} = \frac{4}{4} = 1$.
- $c = 1$: The parabola crosses the $y$-axis at the point $(0, 1)$.
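
To make the roles of $a$, $b$, and $c$ concrete, the sketch below computes the vertex and the real roots for this example. It uses the standard quadratic formula $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$, which the text above does not spell out; `analyze_quadratic` is a hypothetical helper name.

```python
import math

def analyze_quadratic(a, b, c):
    """Return the vertex and the real roots of y = a*x**2 + b*x + c."""
    if a == 0:
        raise ValueError("a must be non-zero for a quadratic")
    vertex_x = -b / (2 * a)
    vertex_y = a * vertex_x**2 + b * vertex_x + c
    discriminant = b**2 - 4 * a * c
    if discriminant < 0:
        roots = []   # the parabola never crosses the x-axis
    else:
        sqrt_d = math.sqrt(discriminant)
        # A set removes the duplicate when both roots coincide
        roots = sorted({(-b - sqrt_d) / (2 * a), (-b + sqrt_d) / (2 * a)})
    return (vertex_x, vertex_y), roots

vertex, roots = analyze_quadratic(2, -4, 1)
print(vertex)   # (1.0, -1.0)
print(roots)    # roughly [0.293, 1.707]
```

The vertex sits below the x-axis while the parabola opens upwards, which is why this equation has two real solutions.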

---

### Key Points About Each Component:
- **$a$:** Controls the shape and direction of the parabola.
- **$x$:** The variable whose values we solve for.
- **$b$:** Influences the slope and position of the parabola.
- **$c$:** Determines the $y$-intercept of the parabola.

Understanding these components helps you analyze and solve quadratic equations effectively!

In [None]:
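def quad(a, b, c, x): return a*(x**2) + b*x + c
def make_quad(a, b, c): return partial(quad, a, b, c)

def plot_function(f, lo=-2.1, hi=2.1):
    # Minimal plotting helper; assumed here because the notebook never defines it
    x = np.linspace(lo, hi, 100)
    plt.plot(x, f(x))
    plt.show()

f = make_quad(3, 2, 1)
f(1.5)  # 3*1.5**2 + 2*1.5 + 1 = 10.75

plot_function(f)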

# Regression Metrics: Understanding MSE, MAE, and MAD
When you build a model to predict numbers (like house prices, temperatures, or sales), how do you know if it’s doing a good job? That’s where _regression metrics_ come in. They’re like scorecards that tell you how close your predictions are to the real values. Let’s explore two popular ones, **Mean Squared Error (MSE)** and **Mean Absolute Error (MAE)**, together with **Mean Absolute Deviation (MAD)**, a related measure of how spread out a dataset is, and see how they work in everyday situations.

## 1. Mean Squared Error (MSE)

### What is MSE?
MSE measures the average of the squared differences between predicted and actual values. By squaring the errors, it penalizes larger discrepancies more heavily than smaller ones.

### Formula:
$$
\text{MSE} = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2
$$

- $ y_i $: Actual value  
- $ \hat{y}_i $: Predicted value  
- $ n $: Number of data points  
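
A minimal plain-Python sketch of the formula above. The function name `mse` is an illustrative choice, and the sample prices are the ones used in the stock scenario below.

```python
def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]
    return sum(squared_errors) / len(actual)

# A single prediction of 100 against two possible actual prices
print(mse([105], [100]))   # 25.0  (error of 5)
print(mse([120], [100]))   # 400.0 (error of 20: 4x the error, 16x the penalty)
```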

### Key Characteristics:
- **Sensitivity to Outliers**: Because MSE squares the differences, it gives more weight to large errors. This is useful when large errors are especially costly, but it also means a few outliers can dominate the score.
- **Units**: The result is in squared units of the data, which can make interpretation less intuitive.

### Real-World Scenario: Stock Price Prediction
Imagine you're building a model to predict stock prices, where one badly wrong prediction can cost far more than several slightly wrong ones. Because MSE penalizes large errors heavily, minimizing it pushes the model to avoid big misses. For example, if your model predicts a stock price of \$100 and the actual price is \$120, the squared error is $20^2 = 400$; if the actual price is \$105, it is only $5^2 = 25$. This makes MSE a good choice for scenarios where large errors are undesirable.

---

## 2. Mean Absolute Error (MAE)

### What is MAE?
MAE measures the average of the absolute differences between predicted and actual values. Unlike MSE, it weights each error in proportion to its size, so a single large error is not amplified by squaring.

### Formula:
$$
\text{MAE} = \frac{1}{n} \sum_{i=1}^n |y_i - \hat{y}_i|
$$

- $ y_i $: Actual value  
- $ \hat{y}_i $: Predicted value  
- $ n $: Number of data points  
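
A minimal plain-Python sketch of the formula above. The function name `mae` and the temperature values are hypothetical, chosen only to illustrate the calculation.

```python
def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    absolute_errors = [abs(y - y_hat) for y, y_hat in zip(actual, predicted)]
    return sum(absolute_errors) / len(actual)

# Hypothetical daily temperatures in °C
actual_temps = [21.0, 25.0, 40.0]
predicted_temps = [20.0, 27.0, 30.0]

print(mae(actual_temps, predicted_temps))   # (1 + 2 + 10) / 3, about 4.33
```

The result is in degrees Celsius, the same unit as the data.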

### Key Characteristics:
- **Robustness to Outliers**: MAE is less sensitive to outliers because it doesn't square the errors. This makes it a better choice when outliers are not critical to the analysis.
- **Units**: The result is in the same units as the data, making it easier to interpret.

### Real-World Scenario: Weather Forecasting
Consider a weather forecasting model that predicts daily temperatures. While occasional large errors (e.g., predicting 30°C instead of 40°C) might occur, they are less impactful compared to the overall trend. In this case, MAE is a suitable metric because it provides a clear, interpretable measure of the average error in degrees Celsius without being overly influenced by rare extreme deviations.

---

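## 3. Mean Absolute Deviation (MAD)
The mean absolute deviation of a dataset is the average distance between each data point and the mean. It gives us an idea about the variability in a dataset. It is closely related to the L1 norm: MAD is the L1 norm of the deviations from the mean, divided by the number of data points.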

Here's how to calculate the mean absolute deviation.

- Find the average (mean): Add up all the numbers in your list and divide by how many numbers there are.
- Calculate the deviations: For each number, subtract the mean. The result is that number's deviation, which is negative when the number is below the mean.
- Make the deviations positive: Take the absolute value of each deviation, since we care about how far each number is from the mean, not whether it is above or below it.
- Average the absolute deviations: Add up the absolute deviations and divide by the number of items in your list. This gives you the mean absolute deviation.

Example:

Suppose you have the numbers $2, 4, 6, 8$.

Mean: $(2 + 4 + 6 + 8) / 4 = 5$

Deviations:
- $2 - 5 = -3$ (absolute value $3$)
- $4 - 5 = -1$ (absolute value $1$)
- $6 - 5 = 1$ (absolute value $1$)
- $8 - 5 = 3$ (absolute value $3$)

Sum of absolute deviations: $3 + 1 + 1 + 3 = 8$

MAD: $8 / 4 = 2$

$$\text{MAD}=\dfrac{\sum{\lvert x_i-\bar{x} \rvert}}{n}$$

- $\bar{x}$ is the mean of all the $x_i$ values.
- $x_i$ represents each individual data point in your sample or population.
- $|x_i - \bar{x}|$ is the absolute deviation of each $x_i$ from the mean $\bar{x}$.
- $\sum$ means sum up all these absolute deviations.
- $n$ is the number of observations or data points.

Working through these steps in code is a good way to build intuition. Let's compute the mean absolute deviation of a small sample.

In [10]:
sample = np.array([10, 15, 15, 17, 18, 21])

# Step 1: mean of the sample
mean = np.mean(sample)
print(f"MEAN: {mean}")

# Steps 2-3: absolute deviation of each value from the mean
deviation = np.abs(sample - mean)
print(f"DEVIATION: {deviation}")

# Step 4: average of the absolute deviations (the MAD)
deviations_avg = np.mean(deviation)
print(f"DEVIATION AVERAGE: {deviations_avg}")


MEAN: 16.0
DEVIATION: [6. 1. 1. 1. 2. 5.]
DEVIATION AVERAGE: 2.6666666666666665



## Comparison of Metrics

| Metric   | Sensitivity to Outliers | Units          | Use Case                                      |
|----------|--------------------------|----------------|-----------------------------------------------|
| **MSE**  | High                     | Squared units  | When large errors are undesirable             |
| **MAE**  | Low                      | Same as data   | When you want an interpretable average error  |
| **MAD**  | Low                      | Same as data   | When describing the spread of a dataset       |
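
The difference in outlier sensitivity between MSE and MAE can be seen in a small experiment. The data below is invented for illustration: one set of predictions is off by 1 everywhere, and a second set is identical except for one prediction that is off by 20.

```python
import numpy as np

actual = np.array([10.0, 12.0, 11.0, 13.0, 12.0])
good = np.array([11.0, 11.0, 12.0, 12.0, 13.0])   # every prediction is off by 1
with_outlier = good.copy()
with_outlier[-1] = 32.0                            # one prediction is off by 20

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

print(mse(actual, good), mae(actual, good))                  # 1.0 1.0
print(mse(actual, with_outlier), mae(actual, with_outlier))  # 80.8 4.8
```

A single bad prediction multiplies MSE by about 80 but MAE by less than 5, which is what "high" versus "low" sensitivity means in the table.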

---

## Conclusion

Choosing the right regression metric depends on the specific requirements of your problem and the nature of your data. If large errors are particularly problematic, **MSE** is the way to go. For a straightforward, interpretable measure of average error that is less affected by outliers, **MAE** is ideal. **MAD** answers a different question: it describes how spread out a dataset is around its mean, in the same units as the data. By understanding these metrics and their applications, you can effectively evaluate and improve the performance of your regression models in real-world scenarios.

---

## Example Calculation

Let's consider an example with the following data:

| Actual ($y$) | Predicted ($\hat{y}$) | Difference | Squared Difference | Absolute Difference |
|---------------|-------------------------|------------|---------------------|----------------------|
| 10            | 12                      | -2         | 4                   | 2                    |
| 15            | 20                      | -5         | 25                  | 5                    |
| 20            | 18                      | 2          | 4                   | 2                    |

### Calculations:
1. **MSE**:
$$
\text{MSE} = \frac{1}{3} \times (4 + 25 + 4) = \frac{33}{3} = 11
$$

2. **MAE**:
$$
\text{MAE} = \frac{1}{3} \times (2 + 5 + 2) = \frac{9}{3} = 3
$$

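3. **MAD** (of the actual values, using the mean absolute deviation formula from Section 3):
$$
\bar{y} = \frac{10 + 15 + 20}{3} = 15
$$
$$
\text{MAD} = \frac{|10 - 15| + |15 - 15| + |20 - 15|}{3} = \frac{10}{3} \approx 3.33
$$

Thus:
- **MSE = 11**
- **MAE = 3**
- **MAD ≈ 3.33** (the spread of the actual values, not a prediction error)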

These metrics provide different insights into model performance, and the choice depends on the specific needs of your application.
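
As a quick check, the MSE and MAE values from the table can be reproduced with NumPy. This is a small sketch with illustrative variable names, using the three rows of the example data.

```python
import numpy as np

actual = np.array([10, 15, 20])
predicted = np.array([12, 20, 18])

errors = actual - predicted        # [-2, -5,  2]
mse = np.mean(errors ** 2)         # (4 + 25 + 4) / 3
mae = np.mean(np.abs(errors))      # (2 + 5 + 2) / 3

print(mse, mae)   # 11.0 3.0
```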