# Lasso Regression

---

## What is Lasso Regression?

Lasso Regression is linear regression with an extra penalty on the size of the weights.
It works just like normal linear regression, but it can also **remove useless features** (columns) all by itself!

---

## What is L1 Regularization?

* L1 regularization is an extra term added to the model's error.
* It **punishes big weights** in your model by adding their **absolute values** to the error.
* "Absolute value" just means ignoring whether the number is negative or positive.

**Formula:**

$$
\text{Total Error} = \text{Prediction Error} + \lambda \cdot (\text{Sum of absolute values of weights})
$$

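* $\lambda$ is a number you pick for how strong the punishment should be. A bigger $\lambda$ pushes the weights harder toward zero, and $\lambda = 0$ gives plain linear regression.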
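
Written out with symbols, the same formula might look like the following. This is only a sketch: it assumes the prediction error is the average squared error over $n$ data points, and the symbols $n$, $y_i$, $\hat{y}_i$, and $w_j$ are names introduced here for illustration.

$$
\text{Total Error} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j} |w_j|
$$

Here $y_i$ is the actual value, $\hat{y}_i$ is the model's prediction, and $w_j$ is the weight of feature $j$.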

---

## Feature Selection (How Lasso Picks the Best Columns)

Here’s the magic part:

* Lasso can make some weights **exactly zero**!
* When a weight is zero, that feature is **not used at all**—it’s like it disappeared!
* So Lasso automatically keeps the most useful columns and ignores the rest.
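
One common way to see where the exact zeros come from is the soft-thresholding rule. In the simplest setting (one feature, or features that do not overlap with each other), the Lasso weight is the plain regression weight pushed toward zero by an amount tied to $\lambda$. The sketch below is only an illustration: the function name `soft_threshold`, the threshold of 0.5, and the sample weights are all made up here.

```python
def soft_threshold(raw, threshold):
    """Shrink `raw` toward zero by `threshold`.

    Anything closer to zero than `threshold` becomes exactly 0.
    """
    if raw > threshold:
        return raw - threshold
    if raw < -threshold:
        return raw + threshold
    return 0.0


# Hypothetical unpenalized weights for four features
raw_weights = [3.0, -2.0, 0.4, -0.1]
shrunk = [soft_threshold(w, 0.5) for w in raw_weights]
print(shrunk)  # [2.5, -1.5, 0.0, 0.0]
```

The two small weights do not just get smaller, they land on exactly zero, while the two large ones survive (slightly shrunk).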

---

## Real-World Example

Suppose you want to predict the price of a toy using:

* color
* weight
* brand
* barcode number

But only **weight** and **brand** really matter.
Lasso can figure this out and set the other weights (like color and barcode) to **zero**!
So your final formula only uses the important features.
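
The toy story above can be sketched in code. Everything in this example is an assumption made for illustration: the eight rows of data are invented (each column is already rescaled to the values +1 and -1, and the prices are measured relative to the average price), the prices were built as roughly 3 × weight + 2 × brand plus a little noise, and `lasso_fit` is a hypothetical helper that minimizes the average squared error plus $\lambda$ times the sum of absolute weights by updating one weight at a time (coordinate descent).

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero; values within the threshold become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_fit(columns, y, lam, sweeps=50):
    """Coordinate descent for: mean squared error + lam * sum(|w|), no intercept."""
    n = len(y)
    w = [0.0] * len(columns)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Residual of the model that uses every feature except feature j
            resid = [
                y[i] - sum(w[k] * columns[k][i] for k in range(len(columns)) if k != j)
                for i in range(n)
            ]
            rho = sum(col[i] * resid[i] for i in range(n)) / n  # how well feature j explains the residual
            scale = sum(v * v for v in col) / n                  # average squared value of feature j
            w[j] = soft_threshold(rho, lam / 2) / scale
    return w


# Made-up toy data (not real measurements)
names = ["color", "weight", "brand", "barcode"]
columns = [
    [1, -1, 1, -1, 1, -1, 1, -1],   # color
    [1, 1, -1, -1, 1, 1, -1, -1],   # weight
    [1, 1, 1, 1, -1, -1, -1, -1],   # brand
    [1, -1, -1, 1, -1, 1, 1, -1],   # barcode
]
price = [5.3, 4.9, -0.8, -1.2, 1.1, 0.7, -4.9, -5.1]

weights = lasso_fit(columns, price, lam=1.0)
for name, w in zip(names, weights):
    print(f"{name:8s} {w:.2f}")
```

On this invented data, the weights for color and barcode come out as exactly zero, while weight and brand keep clearly non-zero weights (a bit smaller than the 3 and 2 used to build the prices, because the penalty shrinks them).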

---

## Quick Comparison (Lasso vs Ridge)

|         | Ridge Regression      | **Lasso Regression**            |
| ------- | --------------------- | ------------------------------- |
| Penalty | Adds squared weights  | Adds absolute values of weights |
| Effect  | Makes weights smaller | Makes some weights exactly zero |
| Use     | Keeps all features    | **Removes useless features**    |
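
The difference in the "Effect" row can be checked on a tiny one-weight model $y = w \cdot x$. The sketch below assumes both methods use the average squared error as the prediction error, and uses three made-up data points; the function names `ridge_weight` and `lasso_weight` are introduced here for illustration. For a single weight, both minimizers have a simple closed form.

```python
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
n = len(x)
sxy = sum(a * b for a, b in zip(x, y))  # sum of x*y = 28
sxx = sum(a * a for a in x)             # sum of x^2 = 14


def ridge_weight(lam):
    """Minimizer of mean squared error + lam * w**2 for the model y = w * x."""
    return sxy / (sxx + n * lam)


def lasso_weight(lam):
    """Minimizer of mean squared error + lam * |w| for the model y = w * x."""
    shrunk = max(abs(sxy) - n * lam / 2, 0.0)
    return (shrunk if sxy >= 0 else -shrunk) / sxx


for lam in [0, 1, 10, 20]:
    print(f"lambda={lam:>2}  ridge={ridge_weight(lam):.3f}  lasso={lasso_weight(lam):.3f}")
```

As $\lambda$ grows, the Ridge weight keeps shrinking but stays above zero (about 0.38 at $\lambda = 20$), while the Lasso weight hits exactly 0 at $\lambda = 20$.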

---

## Simple Analogy

Think of Lasso as a smart robot that:

* **Keeps only the things you need**
* **Throws away** (sets to zero) the things you don’t!

---

## Bottom Line

* **Lasso Regression** = Linear Regression + L1 Regularization
* **L1 Regularization** helps the model pick only the best features by setting others to zero
* This makes your model **simpler** and easier to understand!



# Lasso Regression Example

Suppose you want to fit a line:

$$
y = w \cdot x
$$

with some very simple data:

| x | y (actual) |
| - | ---------- |
| 1 | 2          |
| 2 | 4          |
| 3 | 6          |

Let's see how **Lasso Regression** works!

---

## Step 1: Lasso Cost Function

$$
\text{Cost} = \text{Average Squared Error} + \lambda \cdot |w|
$$

Let's pick $\lambda = 1$ (just for example).

---

## Try $w = 2$

$$
y_{\text{predicted}} = 2 \cdot x
$$

| x | y_actual | y_predicted | Error | Error² |
| - | -------- | ----------- | ----- | ------ |
| 1 | 2        | 2           | 0     | 0      |
| 2 | 4        | 4           | 0     | 0      |
| 3 | 6        | 6           | 0     | 0      |

- Total squared error = $0 + 0 + 0 = 0$
- Average squared error = $\frac{0}{3} = 0$
- L1 penalty = $1 \times |2| = 2$
- **Total cost = 0 + 2 = 2**

---

## Try $w = 1$

$$
y_{\text{predicted}} = 1 \cdot x
$$

| x | y_actual | y_predicted | Error | Error² |
| - | -------- | ----------- | ----- | ------ |
| 1 | 2        | 1           | 1     | 1      |
| 2 | 4        | 2           | 2     | 4      |
| 3 | 6        | 3           | 3     | 9      |

- Total squared error = $1 + 4 + 9 = 14$
- Average squared error = $\frac{14}{3} \approx 4.67$
- L1 penalty = $1 \times |1| = 1$
- **Total cost = 4.67 + 1 = 5.67**

---

## Try $w = 0$

$$
y_{\text{predicted}} = 0 \cdot x = 0
$$

| x | y_actual | y_predicted | Error | Error² |
| - | -------- | ----------- | ----- | ------ |
| 1 | 2        | 0           | 2     | 4      |
| 2 | 4        | 0           | 4     | 16     |
| 3 | 6        | 0           | 6     | 36     |

- Total squared error = $4 + 16 + 36 = 56$
- Average squared error = $\frac{56}{3} \approx 18.67$
- L1 penalty = $1 \times |0| = 0$
- **Total cost = 18.67 + 0 = 18.67**

---



## Final Table

| w | Average Squared Error | Penalty \|w\| | Total Cost |
|:---:|:-------------:|:----------:|:----------:|
| 2 | 0.00 | 2 | 2.00 |
| 1 | 4.67 | 1 | 5.67 |
| 0 | 18.67 | 0 | 18.67 |
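
The table can be reproduced with a few lines of code. This is a minimal sketch that assumes the cost is the average squared error plus $\lambda |w|$ with $\lambda = 1$; the function name `lasso_cost` is made up for this example.

```python
x = [1, 2, 3]
y = [2, 4, 6]
lam = 1


def lasso_cost(w):
    """Average squared error plus the L1 penalty for the model y = w * x."""
    squared_errors = [(yi - w * xi) ** 2 for xi, yi in zip(x, y)]
    avg_error = sum(squared_errors) / len(x)
    penalty = lam * abs(w)
    return avg_error, penalty, avg_error + penalty


for w in [2, 1, 0]:
    avg_error, penalty, total = lasso_cost(w)
    print(f"w={w}  avg error={avg_error:.2f}  penalty={penalty}  total={total:.2f}")
```

Changing `lam` or the list of candidate `w` values is an easy way to explore how the penalty shifts the total cost.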

---

## Final Answer

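Of the three values we tried, **$w = 2$ gives the lowest total cost** (2.00).

**But:**
The penalty still pulls $w$ a little toward zero. With $\lambda = 1$, the value that truly minimizes the cost is $w = 53/28 \approx 1.89$ (total cost about 1.95), slightly below the perfect-fit value of 2.
If you used a much bigger $\lambda$ (for this data, $\lambda \ge 56/3 \approx 18.67$), Lasso would set $w = 0$ exactly (removing the feature).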
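
Looking beyond the three candidate values, the cost for this data can be minimized exactly. The short derivation below is a sketch that assumes the cost is the average squared error plus $\lambda |w|$, and that $w \ge 0$ (reasonable here, since $y$ grows with $x$). Because every data point satisfies $y = 2x$ and $1^2 + 2^2 + 3^2 = 14$:

$$
\text{Cost}(w) = \frac{1}{3} \sum_{i=1}^{3} (2 x_i - w x_i)^2 + \lambda |w| = \frac{14}{3} (2 - w)^2 + \lambda w
$$

Setting the derivative with respect to $w$ to zero:

$$
-\frac{28}{3} (2 - w) + \lambda = 0 \quad \Rightarrow \quad w = 2 - \frac{3 \lambda}{28}
$$

Since $w$ cannot drop below zero on this branch, the minimizer is

$$
w^{*} = \max\left(0,\ 2 - \frac{3 \lambda}{28}\right)
$$

With $\lambda = 1$ this gives $w^{*} = 53/28 \approx 1.89$, and $w^{*}$ reaches exactly 0 once $\lambda \ge 56/3 \approx 18.67$.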

---

# Key Points

- **Lasso can make weights exactly zero** (removes useless features automatically).
- The penalty part, $\lambda |w|$, is called **L1 regularization**.