# Regularization

## 02 Overfitting and Occam's razor

In the last subject, we discussed **model complexity** and the ability to **generalize** from data. We saw two cases.

1. **Underfitting** - the model is too simple and fails to fit the data/signal
2. **Overfitting** - the model is too complex and fits the noise in addition to the signal

In this subject, we will see how to control overfitting using **regularization**. But first, let’s talk about **Occam’s razor**, which is the basic idea behind it and an interesting principle in its own right.

## Occam’s razor

Occam’s razor is a principle stating that, when several solutions explain the data equally well, the simplest one should be preferred. The idea is that it’s easy to build overly complicated solutions with **ad-hoc rules** that don’t generalize well.

In the context of machine learning, the principle says that we should prefer simpler models unless we are sure that the complex ones are necessary.

We often say that generalization is the central goal of machine learning. Occam’s razor is one of the important principles for achieving it. You can take a look at sections 3 and 4 of the paper “A few useful things to know about machine learning” by Pedro Domingos to learn more about the intuition behind generalization. [Here is the link](https://scholar.google.ch/scholar?cluster=4404716649035182981&hl=en&as_sdt=0,5) to the Google Scholar page.

## Increasing the amount of data
The amount of data also plays a role in the **under-/overfitting balance**. Let’s do a quick experiment. In this image, we show two polynomial regressions of degree 9 fitted to 10 and 80 data points from the same source of data.

![image.png](attachment:c09f19b6-f067-4fe7-8d89-67b92c5b46fc.png)

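In the first case, the model is strongly overfitting: a polynomial of degree 9 has 10 coefficients, so it can pass exactly through all 10 data points, noise included. With 80 data points, the polynomial can no longer fit every point, and the problem is lessened.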
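The experiment can be reproduced with a short NumPy sketch. It is only an approximation of the figure: the "source of data" is assumed to be sin(2πx) plus Gaussian noise with standard deviation 0.2, the training inputs are assumed to be evenly spaced on [0, 1], and a separate test set of 1000 points is drawn from the same source. None of these values come from the original figure.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_sine(x, rng):
    """Hypothetical data source: sin(2*pi*x) plus Gaussian noise (std 0.2)."""
    return np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)

def fit_and_score(x_train, y_train, x_test, y_test, degree=9):
    """Least-squares polynomial fit; returns (train MSE, test MSE)."""
    coefs = np.polyfit(x_train, y_train, deg=degree)
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    return train_mse, test_mse

# Large held-out test set drawn from the same source
x_test = rng.uniform(0, 1, size=1000)
y_test = noisy_sine(x_test, rng)

# Two training sets of different sizes
x_10 = np.linspace(0, 1, 10)
y_10 = noisy_sine(x_10, rng)
x_80 = np.linspace(0, 1, 80)
y_80 = noisy_sine(x_80, rng)

train_mse_10, test_mse_10 = fit_and_score(x_10, y_10, x_test, y_test)
train_mse_80, test_mse_80 = fit_and_score(x_80, y_80, x_test, y_test)

print(f"10 points: train MSE = {train_mse_10:.2e}, test MSE = {test_mse_10:.2e}")
print(f"80 points: train MSE = {train_mse_80:.2e}, test MSE = {test_mse_80:.2e}")
```

With 10 points the training error is essentially zero because the polynomial interpolates the data; comparing the two test errors shows how much the extra data helps.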

### Summary
In this unit, we learned about Occam’s razor, an important principle in machine learning, and saw how the amount of training data affects overfitting. In the next unit, we will learn about regularization, an effective way to reduce overfitting.

## 03 Regularization
In practice, we use **regularization** to fight overfitting and to improve the generalization of a model. Regularization favors less complex models: a more complex model may fit the training data well, but it often fails to generalize to unseen data. In this unit, we will see the basic idea behind it. We will then implement regularization with Scikit-learn in the next unit.

### L2 regularization

![image.png](attachment:ac4a56f5-6896-4fa1-8806-db7997d9f3a5.png)

![image.png](attachment:c648b50d-8d8e-42ff-ab52-d0df5d2b4035.png)

![image.png](attachment:ebd80331-7d0a-4cca-a37b-fb25200cb861.png)

![image.png](attachment:966ed21f-0066-4887-a47c-db6f50852843.png)
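The L2-regularized cost adds α times the sum of squared coefficients to the squared-error loss. The minimal NumPy sketch below assumes the unscaled form ‖y - Xw‖² + α‖w‖², which is the objective Scikit-learn documents for `Ridge` (the formulas in the figures above may use a different scaling, such as the mean squared error). The toy data and `true_w` are made up, and no intercept is fitted. It checks that the closed-form minimizer (XᵀX + αI)⁻¹Xᵀy matches `Ridge(fit_intercept=False)`.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Made-up toy data: 30 samples, 3 features, no intercept
rng = np.random.default_rng(42)
X = rng.normal(size=(30, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.5, size=30)
alpha = 1.0

def ridge_cost(w, X, y, alpha):
    """Sum of squared residuals plus alpha times the squared L2 norm of w."""
    residuals = y - X @ w
    return residuals @ residuals + alpha * (w @ w)

# Closed-form minimizer of ridge_cost: w = (X^T X + alpha I)^(-1) X^T y
w_closed = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# Scikit-learn's Ridge minimizes the same objective when no intercept is fitted
model = Ridge(alpha=alpha, fit_intercept=False).fit(X, y)

print("closed form :", w_closed)
print("scikit-learn:", model.coef_)
print("ridge cost  :", ridge_cost(w_closed, X, y, alpha))
```

Setting the gradient of the cost to zero gives -2Xᵀ(y - Xw) + 2αw = 0, which is where the closed form comes from.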

You can take a look at [this article](https://scikit-learn.org/stable/modules/linear_model.html#ridge-regression) from the Scikit-learn documentation which shows the effect of the regularization strength α on the model coefficients.
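To see the effect of α directly, the sketch below fits `Ridge` on made-up data (five features, one of which has a true coefficient of zero) for a range of α values, and prints the coefficients together with their Euclidean norm. The data and the list of α values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Made-up data: 50 samples, 5 features, the last feature is irrelevant
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([3.0, -2.0, 1.0, 0.5, 0.0]) + rng.normal(scale=1.0, size=50)

alphas = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
norms = []
for alpha in alphas:
    coef = Ridge(alpha=alpha).fit(X, y).coef_
    norms.append(np.linalg.norm(coef))
    print(f"alpha={alpha:>7}: coef={np.round(coef, 3)}, ||w||={norms[-1]:.3f}")
```

The norm of the coefficient vector decreases as α grows: the coefficients are pulled toward zero, but L2 regularization does not make them exactly zero.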

### Geometrical interpretation

![image.png](attachment:c6f781ca-914a-47b1-857b-511fd63f59cd.png)

The L2 penalty term is the squared [Euclidean norm](https://en.wikipedia.org/wiki/Norm_(mathematics)#Euclidean_norm) of the coefficient vector.

![image.png](attachment:6840d6dd-aa02-407b-9888-ec0804a3155d.png)

![image.png](attachment:f81d5020-b18e-4f89-8428-dd4d2fe24783.png)

Adapted from Bishop, C. [Pattern Recognition and Machine Learning](https://www.microsoft.com/en-us/research/people/cmbishop/#!prml-book) Figure 3.4

![image.png](attachment:bcfe1d74-ad5a-43f3-a20e-ba7d36d0922a.png)
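The usual geometric reading of L2 regularization is that minimizing loss + α‖w‖² is equivalent to minimizing the loss subject to a constraint ‖w‖ ≤ t for some radius t. The sketch below checks this numerically in two dimensions on made-up data (the value α = 20 is arbitrary): it samples many points on the circle of radius ‖w_ridge‖ and confirms that none of them has a lower sum of squared errors (SSE) than the ridge solution, which also lies inside the circle that contains the unregularized solution.

```python
import numpy as np

# Made-up 2D problem so the coefficient space can be pictured as a plane
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
y = X @ np.array([1.5, -0.8]) + rng.normal(scale=0.3, size=40)
alpha = 20.0

def sse(w):
    """Unregularized loss (sum of squared errors) for one w or a batch of w's (rows)."""
    residuals = y[:, None] - X @ np.atleast_2d(w).T
    return (residuals ** 2).sum(axis=0)

w_ols = np.linalg.lstsq(X, y, rcond=None)[0]
w_ridge = np.linalg.solve(X.T @ X + alpha * np.eye(2), X.T @ y)
radius = np.linalg.norm(w_ridge)

# Sample many points on the circle ||w|| = radius (boundary of the constraint region)
angles = np.linspace(0, 2 * np.pi, 10_000, endpoint=False)
circle = radius * np.column_stack([np.cos(angles), np.sin(angles)])

print("||w_ols||   =", np.linalg.norm(w_ols))
print("||w_ridge|| =", radius)
print("min SSE on circle:", sse(circle).min(), " SSE at ridge:", sse(w_ridge)[0])
```

In other words, the ridge solution is the point where the loss contours first touch the circular constraint region.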

### Other regularizers

![image.png](attachment:84f9d637-fb7c-404d-98fc-884dc4887ee0.png)

Adapted from Bishop, C. [Pattern Recognition and Machine Learning](https://www.microsoft.com/en-us/research/people/cmbishop/#!prml-book) Figure 3.4

![image.png](attachment:8d9588a0-b73f-485b-b8fc-26bd8a4656e1.png)

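For a discussion of the differences between L1 and L2 regularization, and of when to use each, see [this thread](https://www.quora.com/What-is-the-difference-between-L1-and-L2-regularization-How-does-it-solve-the-problem-of-overfitting-Which-regularizer-to-use-and-when/answer/Xavier-Amatriain).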
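One well-known difference between the two regularizers is that the L1 penalty (Lasso) tends to set some coefficients exactly to zero, while the L2 penalty (Ridge) only shrinks them. The sketch below illustrates this on synthetic data from `make_regression`, where only 3 of the 10 features are informative. The value α = 5 is an arbitrary choice; note that Scikit-learn scales the Lasso loss by 1/(2n) but not the Ridge loss, so the same α is not directly comparable between the two models.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 10 features, only 3 of which actually influence y
X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=5.0).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=5.0).fit(X, y)   # L2 penalty

print("Lasso coefficients:", np.round(lasso.coef_, 2))
print("Ridge coefficients:", np.round(ridge.coef_, 2))
print("Exact zeros: Lasso =", np.sum(lasso.coef_ == 0), ", Ridge =", np.sum(ridge.coef_ == 0))
```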

### Summary

Let’s summarize what we’ve learned in this unit. Here are a few takeaways.

- The idea behind regularization is to add a **constraint** on the magnitude of the coefficients.
- This constraint corresponds to an additional term in the cost function, called the **penalization term**.
- We use a parameter α (alpha) to control the **regularization strength**.

In the next unit, we will implement L2 regularization for linear regression.

## 04 Ridge regression

![image.png](attachment:56ddefec-e8d8-4be3-aae9-9eccafdf55ac.png)

### Sine curve data set
Let’s start by loading the training data set.

```python
import pandas as pd

# Load the training data
training_data = pd.read_csv("Ressources/c3_data-points.csv")

# Print shape
print("Shape:", training_data.shape)
```

```
Shape: (50, 2)
```
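As a preview of what ridge regression does on this kind of data, here is a minimal sketch. Because the columns of `c3_data-points.csv` are not shown here, it generates a hypothetical stand-in of 50 noisy points on sin(2πx), builds degree-9 polynomial features (an assumption mirroring the earlier experiment), and compares plain least squares with `Ridge(alpha=1.0)`; the α value is arbitrary.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Hypothetical stand-in for c3_data-points.csv: 50 noisy points on a sine curve
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=50)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=50)
X = x.reshape(-1, 1)  # scikit-learn expects a 2D feature matrix

def poly_model(regressor, degree=9):
    """Polynomial features, then standardization, then a linear model."""
    return make_pipeline(PolynomialFeatures(degree, include_bias=False),
                         StandardScaler(), regressor)

ols = poly_model(LinearRegression()).fit(X, y)
ridge = poly_model(Ridge(alpha=1.0)).fit(X, y)

# The last pipeline step is the fitted linear model
ols_coef = ols[-1].coef_
ridge_coef = ridge[-1].coef_
print("OLS   ||w|| =", np.linalg.norm(ols_coef), " train R^2 =", ols.score(X, y))
print("Ridge ||w|| =", np.linalg.norm(ridge_coef), " train R^2 =", ridge.score(X, y))
```

Ridge gives up some training fit in exchange for smaller coefficients, which is exactly the trade-off regularization is meant to make.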
