# Bias and variance

> - Model complexity vs. Error
> - Bias and variance of a model
> - Sources of model error
> - The bias-variance tradeoff

## Model complexity vs. Error

![image.png](attachment:image.png)

### Choosing the level of complexity

![image-2.png](attachment:image-2.png)

### How well does the model generalize?

![image-3.png](attachment:image-3.png)

### Bias and variance: Intuition

![image-4.png](attachment:image-4.png)

### 3 Sources of Model Error 

![image-5.png](attachment:image-5.png)

### 3 Sources of Model Error: Bias 

**Tendency** of predictions to miss true values.

- Worsened by missing information or overly simplistic assumptions.
- Miss real patterns (underfitting).

![image-6.png](attachment:image-6.png)

### 3 Sources of Model Error: Variance

**Tendency** of predictions to fluctuate. 

- Characterized by sensitivity of output to small changes in input data.
- Often due to overly complex or poorly-fit models.

![image-7.png](attachment:image-7.png)

### 3 Sources of Model Error: Irreducible Error

**Tendency** of outcomes to contain intrinsic uncertainty/randomness.
- Present in even the best possible model. 

![image-8.png](attachment:image-8.png)
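The three sources combine in the standard bias-variance decomposition of expected squared prediction error. Assuming data generated as $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, and a model $\hat{f}$ fit on a randomly drawn training set (this notation is ours, not the slides'), at a fixed input $x$:

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
+ \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{Variance}}
+ \underbrace{\sigma^2}_{\text{Irreducible error}}
$$

Bias and variance depend on the modeling choices; $\sigma^2$ does not, which is why it remains even for the best possible model.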

---

### Bias-Variance Tradeoff, Visualized

![image-9.png](attachment:image-9.png)

### Bias-Variance Tradeoff: Discussion 

- The higher the degree of a polynomial regression, the more complex the model (lower bias, higher variance).

- At lower degrees, we can see **visual signs of bias**: predictions are too rigid to capture the curve pattern in the data.

- At higher degrees, we see **visual signs of variance**: predictions fluctuate wildly because of the model's sensitivity.

- The goal is to find the right degree, such that the model has sufficient complexity to describe the data without overfitting.
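To see the tradeoff numerically rather than visually, one can refit polynomials of several degrees on many noisy training sets and measure how far the average prediction sits from the truth (squared bias) and how much individual predictions scatter around that average (variance). This is a minimal sketch that assumes a hypothetical true curve $\sin(2\pi x)$, fixed inputs with resampled Gaussian noise, and degrees 1, 3 and 12; none of these choices come from the slides.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

def true_f(x):
    # Hypothetical "true" curve, chosen only for this illustration
    return np.sin(2 * np.pi * x)

X_TRAIN = np.linspace(0, 1, 30).reshape(-1, 1)        # inputs held fixed
X_TEST = np.linspace(0.05, 0.95, 50).reshape(-1, 1)   # evaluation points

def bias_variance(degree, n_repeats=200, noise=0.3):
    """Estimate squared bias and variance of a polynomial fit of `degree`."""
    preds = np.empty((n_repeats, len(X_TEST)))
    for r in range(n_repeats):
        # Resample only the noise; the training inputs stay the same
        y = true_f(X_TRAIN.ravel()) + rng.normal(0, noise, len(X_TRAIN))
        model = make_pipeline(PolynomialFeatures(degree, include_bias=False),
                              LinearRegression())
        preds[r] = model.fit(X_TRAIN, y).predict(X_TEST)
    mean_pred = preds.mean(axis=0)
    # Squared bias: gap between the average prediction and the true curve
    bias_sq = np.mean((mean_pred - true_f(X_TEST.ravel())) ** 2)
    # Variance: spread of individual fits around their average
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance

results = {d: bias_variance(d) for d in (1, 3, 12)}
for d, (b2, v) in results.items():
    print(f"degree={d:2d}  bias^2={b2:.4f}  variance={v:.4f}")
```

With these settings the degree-1 fit should show the larger squared bias and the degree-12 fit the larger variance; the exact numbers depend on the seed and noise level.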

### Bias-Variance Tradeoff: Example

![image-10.png](attachment:image-10.png)

---

## Regularization and Model Selection

> - Model complexity and error 
> - Regularization as an approach to over-fitting
> - Standard approaches to regularization including Ridge, Lasso, and Elastic Net
> - Recursive feature elimination

### Model Complexity and Error

![image-11.png](attachment:image-11.png)

### Tuning the model

Can we tune with more granularity than choosing the polynomial degree?
Yes, by using **regularization**.
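One way to see regularization as a finer-grained knob: keep a deliberately flexible degree-12 polynomial and sweep Ridge's penalty strength (scikit-learn calls $\lambda$ `alpha`) over a continuous grid, scoring each value with 5-fold cross-validation. A sketch only; the synthetic data, the degree and the `alpha` grid are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Illustrative noisy data around a smooth curve
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 60).reshape(-1, 1)
y = np.sin(2 * np.pi * x.ravel()) + rng.normal(0, 0.3, 60)

# A continuous grid of penalty strengths instead of a discrete degree choice
alphas = np.logspace(-6, 2, 9)
cv_mse = []
for alpha in alphas:
    model = make_pipeline(PolynomialFeatures(12, include_bias=False),
                          StandardScaler(), Ridge(alpha=alpha))
    scores = cross_val_score(model, x, y, cv=5,
                             scoring="neg_mean_squared_error")
    cv_mse.append(-scores.mean())  # flip sign back to a positive MSE

best_alpha = alphas[int(np.argmin(cv_mse))]
for a, m in zip(alphas, cv_mse):
    print(f"alpha={a:.0e}  CV MSE={m:.3f}")
print("best alpha:", best_alpha)
```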

### What does Regularization Accomplish? 

![image-12.png](attachment:image-12.png)

![image-13.png](attachment:image-13.png)

### Regularization and Feature Selection

Regularization performs feature selection by shrinking the contribution of features. 

For L1-regularization, this is accomplished by driving some coefficients to zero.

Feature selection can also be performed by removing features. 
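The two routes can be combined: an L1-penalized model decides which coefficients to zero out, and scikit-learn's `SelectFromModel` then removes the corresponding columns. This is a sketch on synthetic data; `alpha=1.0` is an arbitrary choice, and for Lasso-like estimators `SelectFromModel` defaults to a tiny threshold (1e-5), so only (near-)zero coefficients are dropped.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic data: 10 features, only 3 of which actually drive y
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=1.0, random_state=0)
X = StandardScaler().fit_transform(X)

# The L1 penalty zeroes some coefficients; SelectFromModel drops those columns
selector = SelectFromModel(Lasso(alpha=1.0)).fit(X, y)
X_reduced = selector.transform(X)

print("kept feature indices:", np.flatnonzero(selector.get_support()))
print("shape before/after:", X.shape, X_reduced.shape)
```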

### Why is feature selection important? 

Reducing the number of features can prevent overfitting.

For some models, fewer features can improve fitting time and/or results.

Identifying the most critical features can improve model interpretability.

--- 
### Reg Cost Function: Ridge Regression 

![image-14.png](attachment:image-14.png)

![image-15.png](attachment:image-15.png)

**Ridge Regression**:

The complexity penalty $\lambda$ is applied proportionally to the squared coefficient values.
- The penalty term has the effect of "shrinking" the coefficients towards zero.
- This imposes bias on the model, but also reduces variance.
- We can select the best regularization strength $\lambda$ using cross-validation.
- It's best practice to scale features (e.g., using `StandardScaler`) so that penalties aren't affected by the scale of each variable.
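A common way to write the Ridge cost is the data-fit error plus $\lambda \sum_{j} \beta_j^2$. scikit-learn's `Ridge` minimizes $\lVert y - X\beta \rVert_2^2 + \alpha \lVert \beta \rVert_2^2$, so its `alpha` plays the role of $\lambda$ (the slides may scale the error term differently). The sketch below, on synthetic data, first shows the shrinkage effect as `alpha` grows and then standardizes the features and lets `RidgeCV` pick `alpha` by 5-fold cross-validation; the data and grids are assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=8, noise=10.0, random_state=0)

# 1) Shrinkage: the coefficient vector's norm falls as alpha (lambda) grows
alphas = [0.01, 1.0, 10.0, 100.0, 1000.0]
norms = []
for alpha in alphas:
    ridge = make_pipeline(StandardScaler(), Ridge(alpha=alpha)).fit(X, y)
    norms.append(np.linalg.norm(ridge[-1].coef_))
    print(f"alpha={alpha:>7}: ||coef|| = {norms[-1]:.2f}")

# 2) Standardize, then let RidgeCV choose alpha from a grid by 5-fold CV
cv_model = make_pipeline(StandardScaler(),
                         RidgeCV(alphas=np.logspace(-3, 3, 13), cv=5)).fit(X, y)
print("alpha chosen by CV:", cv_model[-1].alpha_)
```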


### Ridge Regression (L2)

![image-16.png](attachment:image-16.png)

### Preventing Under-and Over-fitting

![image-17.png](attachment:image-17.png)

![image-18.png](attachment:image-18.png)

### Ridge Regression In Action

![image-19.png](attachment:image-19.png)

![image-20.png](attachment:image-20.png)

---

### Lasso Regression 

### Alternative: LASSO Regression 

![image-21.png](attachment:image-21.png)

In **Lasso Regression**, the complexity penalty $\lambda$ is applied proportionally to the absolute values of the coefficients.

- LASSO: Least Absolute Shrinkage and Selection Operator.
- Similar effect to **Ridge** in terms of complexity tradeoff:
    Increasing $\lambda$ raises bias but lowers variance.
- LASSO is more likely than Ridge to perform **feature selection**: for a fixed $\lambda$, LASSO is more likely to set coefficients exactly to zero.
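A quick way to check the feature-selection claim is to count exactly-zero coefficients for Ridge and Lasso fit on the same standardized data. scikit-learn's `Lasso` minimizes $\frac{1}{2n}\lVert y - X\beta \rVert_2^2 + \alpha \lVert \beta \rVert_1$, a different scaling from `Ridge`, so the same `alpha` is not an equal-strength penalty for both. The synthetic data and `alpha` values are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.preprocessing import StandardScaler

# 10 features, only 3 informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=1.0, random_state=0)
X = StandardScaler().fit_transform(X)

# Count coefficients that are exactly zero for each penalty type
zero_counts = {}
for alpha in [0.1, 1.0, 10.0]:
    ridge_zeros = int(np.sum(Ridge(alpha=alpha).fit(X, y).coef_ == 0))
    lasso_zeros = int(np.sum(Lasso(alpha=alpha).fit(X, y).coef_ == 0))
    zero_counts[alpha] = (ridge_zeros, lasso_zeros)
    print(f"alpha={alpha:>5}: Ridge zeros={ridge_zeros}, Lasso zeros={lasso_zeros}")
```

Ridge coefficients shrink but essentially never land exactly on zero, whereas Lasso's typically do once `alpha` is large enough.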

### Lasso Regression (L1)

![image-22.png](attachment:image-22.png)

### Preventing Under-and Over-fitting

![image-23.png](attachment:image-23.png)

![image-24.png](attachment:image-24.png)

### Lasso Regression In Action

![image-25.png](attachment:image-25.png)


![image-26.png](attachment:image-26.png)

### Between Ridge and Lasso: Elastic Net

![image-27.png](attachment:image-27.png)

### Elastic Net Regularization

![image-28.png](attachment:image-28.png)

![image-29.png](attachment:image-29.png)
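Elastic Net combines both penalties. In scikit-learn's parameterization, `ElasticNet` minimizes $\frac{1}{2n}\lVert y - X\beta \rVert_2^2 + \alpha\rho \lVert \beta \rVert_1 + \frac{\alpha(1-\rho)}{2} \lVert \beta \rVert_2^2$ with $\rho$ = `l1_ratio`, so $\rho = 1$ is Lasso and smaller $\rho$ moves toward Ridge. A minimal sketch using `ElasticNetCV` to choose both the strength and the mix by cross-validation; the data and the `l1_ratio` grid are assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

# Search over the L1/L2 mix (l1_ratio) and, for each, a path of alphas;
# l1_ratio=1.0 is pure Lasso, values near 0 approach Ridge
l1_ratios = [0.1, 0.5, 0.9, 1.0]
enet = make_pipeline(StandardScaler(), ElasticNetCV(l1_ratio=l1_ratios, cv=5))
enet.fit(X, y)

fitted = enet[-1]  # the ElasticNetCV step of the pipeline
print("chosen l1_ratio:", fitted.l1_ratio_)
print("chosen alpha:", fitted.alpha_)
print("zero coefficients:", int(np.sum(fitted.coef_ == 0)))
```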

---

### Recursive Feature Elimination

**Recursive Feature Elimination (RFE)** is an approach that combines: 
- A model or estimation approach
- A desired number of features

RFE then repeatedly fits the model, measures feature importance, and removes the least important features until the desired number remains.

### Recursive Feature Elimination: The syntax
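```python
# import the classes containing the feature selection methods
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE, RFECV
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Illustrative synthetic data (replace with your own X_train, y_train, ...)
X, y = make_regression(n_samples=200, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Create an instance of the class, wrapping an estimator `est`
est = LinearRegression()
rfeMod = RFE(est, n_features_to_select=5)

# Fit the instance on the data and then predict the expected value
rfeMod = rfeMod.fit(X_train, y_train)
y_pred = rfeMod.predict(X_test)

# The RFECV class will perform feature elimination using cross-validation
rfecvMod = RFECV(est, cv=5).fit(X_train, y_train)
print("features kept by RFECV:", rfecvMod.n_features_)
```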


![image-30.png](attachment:image-30.png)
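The recursive loop (fit, rank, drop the weakest, repeat) can be sketched by hand and checked against scikit-learn's `RFE`. This sketch assumes `LinearRegression` as the estimator, coefficient magnitude as the importance measure, and removal of one feature per round (`step=1`); `manual_rfe` is a hypothetical helper, not part of scikit-learn.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=10, n_informative=5,
                       noise=10.0, random_state=0)

def manual_rfe(X, y, n_features_to_select):
    """Hypothetical hand-rolled RFE: refit, drop smallest |coef|, repeat."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_features_to_select:
        coef = LinearRegression().fit(X[:, remaining], y).coef_
        weakest = int(np.argmin(np.abs(coef)))  # least important this round
        remaining.pop(weakest)
    return sorted(remaining)

manual = manual_rfe(X, y, 5)
rfe = RFE(LinearRegression(), n_features_to_select=5, step=1).fit(X, y)

print("manual RFE keeps: ", manual)
print("sklearn RFE keeps:", np.flatnonzero(rfe.support_).tolist())
print("sklearn ranking_: ", rfe.ranking_.tolist())  # 1 = kept
```

In `ranking_`, kept features get rank 1 and features eliminated earlier get larger ranks, which is a convenient way to see the elimination order.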

---

## Elastic Net Regularization

### Between Ridge and Lasso: Elastic Net

![image-31.png](attachment:image-31.png)

### Elastic Net Regularization

![image-32.png](attachment:image-32.png)


![image-33.png](attachment:image-33.png)






