## Regularisation:
Regularisation is a very useful technique for improving our models and ensuring they don't overfit. It takes the complexity of a model into consideration when calculating the error, so that, for a similar fit to the data, a simpler model scores a lower total error than a complex one. The complexity term is weighted by a factor called **lambda** ($\lambda$).

### Computing the complexity of a model as part of the error:
#### L1 regularisation:
* We add the absolute values of the model coefficients to the error

#### L2 regularisation:
* We add the squared values of the model coefficients to the error
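
As a minimal sketch of the two penalties, the snippet below computes both for a made-up coefficient vector. The values in `coefs` are hypothetical and chosen only to make the arithmetic easy to follow; they do not come from any dataset.

```python
import numpy as np

# Hypothetical model coefficients (illustration only)
coefs = np.array([2.0, -1.0, 0.5])

# L1 penalty: sum of the absolute values of the coefficients
l1_penalty = np.sum(np.abs(coefs))

# L2 penalty: sum of the squared coefficients
l2_penalty = np.sum(coefs ** 2)

print(f'L1 penalty: {l1_penalty}')  # 2 + 1 + 0.5 = 3.5
print(f'L2 penalty: {l2_penalty}')  # 4 + 1 + 0.25 = 5.25
```

Note how the square makes the largest coefficient dominate the L2 penalty, while L1 counts every coefficient in proportion to its size.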

Both L1 and L2 regularisation punish complex models more than simple ones. But the question is: how much of this do we actually need?
If the problem requires high accuracy and precision, with little room for error, such as a model that predicts the onset of a heart attack or guides a flight to the moon, then we can penalize complexity less. For a less critical model that can tolerate lower accuracy, such as one that recommends friends to a new user, we can penalize complexity more.

Thus, in every case, we want to tune how much we punish complexity. We do this with a parameter called **lambda** ($\lambda$), which scales the L1 or L2 penalty before it is added to the error.

In summary, a large lambda multiplies the complexity part of the error (the penalty on the coefficients) by a large amount, which punishes complex models heavily and favors simpler ones. A small lambda multiplies the complexity part by only a little, so complexity is punished less and more complex, more closely fitting models are favored.
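
The trade-off can be sketched with two made-up models. The error and complexity numbers below are hypothetical, as are the helper names `regularised_error` and `preferred_model`; the point is only to show how the preferred model flips as lambda grows.

```python
def regularised_error(fit_error, complexity, lam):
    """Total error = error on the data + lambda * complexity penalty."""
    return fit_error + lam * complexity

# Hypothetical (fit_error, complexity) pairs, for illustration only
models = {
    'simple': (3.0, 2.0),    # fits worse, but has small coefficients
    'complex': (1.0, 10.0),  # fits better, but has large coefficients
}

def preferred_model(lam):
    """Return the name of the model with the lowest regularised error."""
    return min(models, key=lambda name: regularised_error(*models[name], lam))

for lam in (0.1, 1.0):
    scores = {name: regularised_error(*vals, lam) for name, vals in models.items()}
    print(f'lambda={lam}: {scores} -> prefer {preferred_model(lam)}')
```

With the small lambda the complex model has the lower total error; with the large lambda the simple model does.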

In [1]:
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso, LinearRegression

In [2]:
df = pd.read_csv('datasets/regularisation_dataset.csv', header=None)
df.head()

|   | 0 |
|---|---|
| 0 | 1.25664,2.04978,-6.23640,4.71926,-4.26931,0.20... |
| 1 | -3.89012,-0.37511,6.14979,4.94585,-3.57844,0.0... |
| 2 | 5.09784,0.98120,-0.29939,5.85805,0.28297,-0.20... |
| 3 | 0.39034,-3.06861,-5.63488,6.43941,0.39256,-0.0... |
| 4 | 5.84727,-0.15922,11.41246,7.52165,1.69886,0.29... |


In [3]:
# Let's expand the single string column 0 into the required 7 different columns
df = df[0].str.split(',', expand=True)
df.head()

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| 0 | 1.25664 | 2.04978 | -6.2364 | 4.71926 | -4.26931 | 0.2059 | 12.31798 |
| 1 | -3.89012 | -0.37511 | 6.14979 | 4.94585 | -3.57844 | 0.0064 | 23.67628 |
| 2 | 5.09784 | 0.9812 | -0.29939 | 5.85805 | 0.28297 | -0.20626 | -1.53459 |
| 3 | 0.39034 | -3.06861 | -5.63488 | 6.43941 | 0.39256 | -0.07084 | -24.6867 |
| 4 | 5.84727 | -0.15922 | 11.41246 | 7.52165 | 1.69886 | 0.29022 | 17.54122 |


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 7 columns):
0    100 non-null object
1    100 non-null object
2    100 non-null object
3    100 non-null object
4    100 non-null object
5    100 non-null object
6    100 non-null object
dtypes: object(7)
memory usage: 5.6+ KB


In [5]:
# finally, let's convert all columns from string to float
df = df.apply(pd.to_numeric)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 7 columns):
0    100 non-null float64
1    100 non-null float64
2    100 non-null float64
3    100 non-null float64
4    100 non-null float64
5    100 non-null float64
6    100 non-null float64
dtypes: float64(7)
memory usage: 5.6 KB


In [6]:
df.head(3)

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| 0 | 1.25664 | 2.04978 | -6.2364 | 4.71926 | -4.26931 | 0.2059 | 12.31798 |
| 1 | -3.89012 | -0.37511 | 6.14979 | 4.94585 | -3.57844 | 0.0064 | 23.67628 |
| 2 | 5.09784 | 0.9812 | -0.29939 | 5.85805 | 0.28297 | -0.20626 | -1.53459 |


In [7]:
# Let's save the updated dataframe
df.to_csv('datasets/regularisation_dataset2.csv', index=False)
print('saved!')

saved!


**Let's build a Linear Model and an L1 (lasso) model and see the coefficients**

In [8]:
X = df.iloc[:,:-1].values
y = df.iloc[:, -1].values

In [9]:
lasso_reg = Lasso().fit(X, y)
linear_reg = LinearRegression().fit(X,y)

In [10]:
print(f'Lasso Coefs: {lasso_reg.coef_}\n\nLinear Coefs: {linear_reg.coef_}')

Lasso Coefs: [ 0.          2.35793224  2.00441646 -0.05511954 -3.92808318  0.        ]

Linear Coefs: [-6.19918532e-03  2.96325160e+00  1.98199191e+00 -7.86249920e-02
 -3.95818772e+00  9.30786141e+00]


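From the above, we can see that **Lasso-Reg (L1 reg)** sets the coefficients of features 1 and 6 (columns 0 and 5) to exactly zero, while the regular **Linear-Reg** keeps non-zero coefficients for them. L1 regularisation does this to any feature whose contribution to reducing the error is smaller than the penalty its coefficient would incur. Note that feature 6 has a large coefficient (about 9.31) in the plain linear model. Its values are small in magnitude in the rows shown, so it needs a large coefficient to affect the prediction at all, which is presumably why the default penalty strength (`alpha=1.0` in `Lasso()`) removes it.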
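
To see how the penalty strength changes which coefficients survive, here is a small sketch that refits `Lasso` with several values of `alpha`, which is scikit-learn's name for lambda. The data is synthetic and generated only for this illustration (it is not the dataset above), and the names `X_demo`, `y_demo` and `true_coefs` are assumptions of this example.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (illustration only): 6 features, of which
# only 3 actually influence the target
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 6))
true_coefs = np.array([0.0, 3.0, 2.0, 0.0, -4.0, 0.0])
y_demo = X_demo @ true_coefs + rng.normal(scale=0.5, size=100)

# Refit with increasing penalty strength and count the zeroed coefficients
zero_counts = {}
for alpha in (0.01, 0.1, 1.0, 10.0):
    model = Lasso(alpha=alpha).fit(X_demo, y_demo)
    zero_counts[alpha] = int(np.sum(model.coef_ == 0))
    print(f'alpha={alpha}: coefs={np.round(model.coef_, 2)}, zeros={zero_counts[alpha]}')
```

As `alpha` grows, more coefficients are pushed to exactly zero, and a large enough `alpha` removes every feature, leaving only the intercept.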

### Difference Between L1 and L2 Reg:

1. L1 is computationally less efficient, even though it looks simpler. The absolute value is not differentiable at zero, so the penalty is harder to optimise; the exception is sparse data, where L1 can be computed efficiently. L2 is easier to compute, as squares have simple derivatives.
2. L1 tends to do better than L2 when the data is sparse, such as having 1000 features of which only 10 are relevant and the rest are mostly zeroes. L2 tends to do better for non-sparse outputs.
3. L1 has another benefit: it gives us feature selection. Say, for example, we have 1000 features of which most are irrelevant noise and only 10 are relevant. L1 will detect this and set the coefficients of the irrelevant columns to zero, as we saw in the Lasso/Linear reg exercise above. L2, on the other hand, won't do this: it shrinks all the coefficients but keeps every column in the model.
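
The feature-selection difference in point 3 can be sketched by fitting `Lasso` (L1) and `Ridge` (L2) to the same data. The data below is synthetic and smaller than the 1000-feature example (50 features, 5 relevant) to keep it quick; the `alpha` values and variable names are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic sparse problem (illustration only):
# 50 features, but only the first 5 influence the target
rng = np.random.default_rng(42)
n_samples, n_features, n_relevant = 200, 50, 5
X_sparse = rng.normal(size=(n_samples, n_features))
true_coefs = np.zeros(n_features)
true_coefs[:n_relevant] = 3.0
y_sparse = X_sparse @ true_coefs + rng.normal(size=n_samples)

l1_model = Lasso(alpha=0.5).fit(X_sparse, y_sparse)  # L1 penalty
l2_model = Ridge(alpha=1.0).fit(X_sparse, y_sparse)  # L2 penalty

# Count coefficients that are exactly zero in each model
l1_zeros = int(np.sum(l1_model.coef_ == 0))
l2_zeros = int(np.sum(l2_model.coef_ == 0))
print(f'Lasso zero coefficients: {l1_zeros} of {n_features}')
print(f'Ridge zero coefficients: {l2_zeros} of {n_features}')
```

Lasso should drop most of the irrelevant columns, while Ridge keeps a small non-zero coefficient on every one of them.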