Regularization is a technique used in machine learning to prevent over-fitting and improve the generalization performance of a model. It involves adding a penalty term to the loss function during training, which discourages the model from fitting the training data too closely and helps to control the complexity of the model.

In other words, regularization adds a penalty on the model's parameters to the cost function. This introduces a small amount of bias so that the model does not overfit the data.

If a machine learning model learns the training data too closely, including its noise, it overfits. Overfitting is a situation in which a model performs very well on the training data and poorly on test data. To address this, a penalty term is added to the model's cost function; this process is called regularization.


    Cost Function of Linear Regression

<img src="cost-function.webp">

    Cost Function of Lasso Regression
Lasso modifies the cost function of linear regression by adding an L1 regularization term: λ times the sum of the absolute values of the coefficients.

<img src="lasso-regression.webp" width="350">


<strong>L = ∑(Ŷi– Yi)² + <span style="color: #ff0000;">λ∑|β|</span></strong>

    Cost Function of Ridge Regression
Ridge modifies the cost function of linear regression by adding an L2 regularization term: λ times the sum of the squared coefficients.

<img src="ridge-regerssion.webp" width="250">


<strong>L = ∑(Ŷi– Yi)² + <span style="color: #339966;">λ∑β²</span></strong>

    Elastic net Regression
Elastic net is a modified version of linear regression that adds regularization penalties from both lasso and ridge regression during the training.

<strong>L = ∑(Ŷi– Yi)² + <span style="color: #339966;">λ₂∑β²</span> + <span style="color: #ff0000;">λ₁∑|β|</span></strong>
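
As a minimal sketch, the penalized losses above can be written directly in NumPy. The function names (`rss`, `lasso_cost`, `ridge_cost`, `elastic_net_cost`) and the toy arrays are illustrative assumptions, not part of the original notebook. The intercept is left out for brevity, and the elastic net version takes a separate weight for each penalty.

```python
import numpy as np

def rss(X, y, beta):
    """Residual sum of squares: sum((y_hat - y)^2)."""
    residuals = X @ beta - y
    return float(np.sum(residuals ** 2))

def lasso_cost(X, y, beta, lam):
    """RSS plus the L1 penalty lam * sum(|beta|)."""
    return rss(X, y, beta) + lam * float(np.sum(np.abs(beta)))

def ridge_cost(X, y, beta, lam):
    """RSS plus the L2 penalty lam * sum(beta^2)."""
    return rss(X, y, beta) + lam * float(np.sum(beta ** 2))

def elastic_net_cost(X, y, beta, lam_l1, lam_l2):
    """RSS plus both penalties, each with its own weight."""
    l1 = lam_l1 * float(np.sum(np.abs(beta)))
    l2 = lam_l2 * float(np.sum(beta ** 2))
    return rss(X, y, beta) + l1 + l2

# Toy data (hypothetical values chosen only for illustration)
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])
beta = np.array([0.5, -0.25])

print("RSS        :", rss(X, y, beta))
print("Lasso      :", lasso_cost(X, y, beta, lam=2.0))
print("Ridge      :", ridge_cost(X, y, beta, lam=2.0))
print("Elastic net:", elastic_net_cost(X, y, beta, lam_l1=2.0, lam_l2=2.0))
```

For the same coefficients, each penalized cost is the plain RSS plus a term that grows with the size of the coefficients, which is what pushes the optimizer towards smaller values.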

In both Lasso and Ridge, the penalty term increases the cost function and shrinks the slope (coefficient) towards zero. In Lasso the slope can become exactly zero, whereas in Ridge it gets close to zero but, in general, never reaches it exactly.
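
The shrinking behaviour described above can be observed on synthetic data. This is a rough sketch, not the notebook's own experiment: the data, the true coefficients and the `alpha` grids are all assumptions picked for illustration. Note that scikit-learn calls the penalty weight `alpha` and scales the two objectives differently, which is why the two grids differ.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: only the first two of five features matter (assumed setup)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_coef = np.array([3.0, -2.0, 0.0, 0.0, 0.0])
y = X @ true_coef + rng.normal(scale=0.5, size=200)

# Fit each model for increasing penalty strength and keep the coefficients
lasso_coefs = {}
for alpha in [0.01, 0.1, 1.0]:
    lasso_coefs[alpha] = Lasso(alpha=alpha).fit(X, y).coef_
    print(f"Lasso alpha={alpha:<8}", np.round(lasso_coefs[alpha], 3))

ridge_coefs = {}
for alpha in [1.0, 100.0, 10000.0]:
    ridge_coefs[alpha] = Ridge(alpha=alpha).fit(X, y).coef_
    print(f"Ridge alpha={alpha:<8}", np.round(ridge_coefs[alpha], 3))
```

With the strongest Lasso penalty the three irrelevant coefficients are exactly zero, while the Ridge coefficients keep shrinking as `alpha` grows but stay non-zero.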

------------------

1. **Lasso regression** can be used for feature selection, while **ridge regression** 
does not perform feature selection.

2. **Lasso regression** suits cases where only a few features are significant, 
as it shrinks the coefficients of insignificant features to zero, whereas **ridge 
regression** suits cases where many features are significant.
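
A short sketch of Lasso used as a feature selector, assuming synthetic data in which only features 0, 3 and 7 carry signal. The coefficient values, noise level and `alpha` settings are hypothetical choices made for this example.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Ten features, three of which actually influence the target (assumed setup)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
true_coef = np.zeros(10)
true_coef[[0, 3, 7]] = [5.0, -4.0, 3.0]
y = X @ true_coef + rng.normal(scale=1.0, size=300)

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)

# Indices of the features whose coefficient is not exactly zero
lasso_selected = np.flatnonzero(lasso.coef_)
ridge_selected = np.flatnonzero(ridge.coef_)

print("Features kept by Lasso:", lasso_selected)
print("Features kept by Ridge:", ridge_selected)
```

Lasso keeps only the informative features, so its non-zero coefficients can serve as a selection mask; Ridge keeps every feature with a small non-zero weight.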

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Ridge vs Lasso vs Elastic Net Comparison</title>
<style>
    table {
        border-collapse: collapse;
        width: 90%;
    }
    th, td {
        border: 1px solid #dddddd;
        text-align: left;
        padding: 8px;
    }
    th {
        background-color: #f2f2f2;
    }
</style>
</head>
<body>

<h2>Ridge vs Lasso vs Elastic Net Comparison</h2>

<table>
    <tr>
        <th>Feature</th>
        <th>Ridge Regression</th>
        <th>Lasso Regression</th>
        <th>Elastic Net</th>
    </tr>
    <tr>
        <td>Regularization</td>
        <td>L2 regularization</td>
        <td>L1 regularization</td>
        <td>Combination of L1 and L2 regularization</td>
    </tr>
    <tr>
        <td>Objective Function</td>
        <td>Minimize: RSS + α∑βj^2</td>
        <td>Minimize: RSS + α∑|βj|</td>
        <td>Minimize: RSS + α(λ∑|βj| + (1-λ)∑βj^2)</td>
    </tr>
    <tr>
        <td>Usage</td>
        <td>Deals with multicollinearity and prevents overfitting</td>
        <td>Useful for feature selection by driving some coefficients to zero</td>
        <td>Combines benefits of Ridge and Lasso, useful for correlated features</td>
    </tr>
    <tr>
        <td>Shrinking Effect</td>
        <td>Shrinks coefficients towards zero</td>
        <td>Can zero out coefficients, performing feature selection</td>
        <td>Shrinks coefficients towards zero and performs feature selection</td>
    </tr>
    <tr>
        <td>Solution Path</td>
        <td>Coefficients tend to be smaller but never exactly zero</td>
        <td>Coefficients can be exactly zero, leading to sparse models</td>
        <td>Coefficients can be zero and tend to be smaller, balances Ridge and Lasso</td>
    </tr>
    <tr>
        <td>Suitable for</td>
        <td>When all features are potentially useful, or when multicollinearity is present</td>
        <td>When there are many features and some are irrelevant or redundant</td>
        <td>When there are many features, some correlated, and feature selection is needed</td>
    </tr>
    <tr>
        <td>Computational Cost</td>
        <td>Less computationally expensive than Lasso</td>
        <td>Generally more computationally expensive than Ridge</td>
        <td>Moderate computational cost due to combination of L1 and L2 regularization</td>
    </tr>
</table>

</body>
</html>
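
In scikit-learn, the Elastic Net row of the table corresponds to `ElasticNet(alpha=..., l1_ratio=...)`, where `l1_ratio` plays the role of the mixing weight written as λ in the table. scikit-learn scales the terms differently (a factor 1/(2n) on the squared error and 0.5 on the L2 term), so numeric values are not interchangeable with the table's formula. The sketch below uses assumed synthetic data to show that `l1_ratio=1.0` reduces Elastic Net to Lasso.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

# Synthetic data with two informative and two irrelevant features (assumed setup)
rng = np.random.default_rng(42)
X = rng.normal(size=(150, 4))
y = X @ np.array([2.0, 0.0, -1.5, 0.0]) + rng.normal(scale=0.3, size=150)

lasso = Lasso(alpha=0.2).fit(X, y)
enet_pure_l1 = ElasticNet(alpha=0.2, l1_ratio=1.0).fit(X, y)  # L1 penalty only
enet_mixed = ElasticNet(alpha=0.2, l1_ratio=0.5).fit(X, y)    # half L1, half L2

print("Lasso                :", np.round(lasso.coef_, 3))
print("ElasticNet l1_ratio=1:", np.round(enet_pure_l1.coef_, 3))
print("ElasticNet l1_ratio=.5:", np.round(enet_mixed.coef_, 3))
```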


    Why does Lasso make coefficients exactly 0 while Ridge only makes them near zero?
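
One way to build intuition is the one-coefficient case. Assume the unpenalized least-squares estimate is z and the loss is (z − β)² plus the penalty. With the Ridge penalty λβ² the minimizer is z / (1 + λ): it is scaled down but stays non-zero whenever z is non-zero. With the Lasso penalty λ|β| the minimizer is sign(z) · max(|z| − λ/2, 0): the penalty has a constant slope λ no matter how small β is, so once |z| ≤ λ/2 the best value is exactly zero. The function names below are hypothetical, and the sketch covers only this simplified single-coefficient setting.

```python
import numpy as np

def ridge_1d(z, lam):
    """Minimizer of (z - b)^2 + lam * b^2: proportional shrinkage."""
    return z / (1.0 + lam)

def lasso_1d(z, lam):
    """Minimizer of (z - b)^2 + lam * |b|: soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - lam / 2.0, 0.0)

z = 0.4  # unpenalized estimate (illustrative value)
for lam in [0.5, 1.0, 2.0]:
    print(f"lam={lam}: ridge={ridge_1d(z, lam):.4f}, lasso={lasso_1d(z, lam):.4f}")

# Brute-force check of the Lasso formula on a fine grid of candidate values
grid = np.linspace(-1.0, 1.0, 20001)
lam = 0.5
best_on_grid = grid[np.argmin((z - grid) ** 2 + lam * np.abs(grid))]
print("grid minimizer for lam=0.5:", round(float(best_on_grid), 4))
```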

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings('ignore')

In [2]:
df = sns.load_dataset('mpg')

In [3]:
df

,mpg,cylinders,displacement,horsepower,weight,acceleration,model_year,origin,name
0,18.0,8,307.0,130.0,3504,12.0,70,usa,chevrolet chevelle malibu
1,15.0,8,350.0,165.0,3693,11.5,70,usa,buick skylark 320
2,18.0,8,318.0,150.0,3436,11.0,70,usa,plymouth satellite
3,16.0,8,304.0,150.0,3433,12.0,70,usa,amc rebel sst
4,17.0,8,302.0,140.0,3449,10.5,70,usa,ford torino
...,...,...,...,...,...,...,...,...,...
393,27.0,4,140.0,86.0,2790,15.6,82,usa,ford mustang gl
394,44.0,4,97.0,52.0,2130,24.6,82,europe,vw pickup
395,32.0,4,135.0,84.0,2295,11.6,82,usa,dodge rampage
396,28.0,4,120.0,79.0,2625,18.6,82,usa,ford ranger


In [4]:
df.drop("name", axis=1, inplace=True)

In [5]:
df

,mpg,cylinders,displacement,horsepower,weight,acceleration,model_year,origin
0,18.0,8,307.0,130.0,3504,12.0,70,usa
1,15.0,8,350.0,165.0,3693,11.5,70,usa
2,18.0,8,318.0,150.0,3436,11.0,70,usa
3,16.0,8,304.0,150.0,3433,12.0,70,usa
4,17.0,8,302.0,140.0,3449,10.5,70,usa
...,...,...,...,...,...,...,...,...
393,27.0,4,140.0,86.0,2790,15.6,82,usa
394,44.0,4,97.0,52.0,2130,24.6,82,europe
395,32.0,4,135.0,84.0,2295,11.6,82,usa
396,28.0,4,120.0,79.0,2625,18.6,82,usa


In [6]:
df.isna().sum()

mpg             0
cylinders       0
displacement    0
horsepower      6
weight          0
acceleration    0
model_year      0
origin          0
dtype: int64

In [7]:
df['horsepower'].median()  # impute with the median because outliers have not been treated

93.5

In [8]:
# Call median() with parentheses: passing the bare method would fill the gaps
# with the method object itself and turn the column into dtype object
df['horsepower'] = df['horsepower'].fillna(df['horsepower'].median())

In [9]:
df.isna().sum()

mpg             0
cylinders       0
displacement    0
horsepower      0
weight          0
acceleration    0
model_year      0
origin          0
dtype: int64

In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 398 entries, 0 to 397
Data columns (total 8 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   mpg           398 non-null    float64
 1   cylinders     398 non-null    int64  
 2   displacement  398 non-null    float64
 3   horsepower    398 non-null    float64
 4   weight        398 non-null    int64  
 5   acceleration  398 non-null    float64
 6   model_year    398 non-null    int64  
 7   origin        398 non-null    object 
dtypes: float64(4), int64(3), object(1)
memory usage: 25.0+ KB


In [11]:
# inspect the categories of origin before encoding

df.origin.unique()

array(['usa', 'japan', 'europe'], dtype=object)

In [12]:
df.origin.value_counts()

origin
usa       249
japan      79
europe     70
Name: count, dtype: int64

In [13]:
# encode origin with integer labels (one-hot encoding is an alternative)
df['origin'] = df['origin'].map({"usa":1,"japan":2,"europe":3})  # usa -> 1, japan -> 2, europe -> 3

In [14]:
df

,mpg,cylinders,displacement,horsepower,weight,acceleration,model_year,origin
0,18.0,8,307.0,130.0,3504,12.0,70,1
1,15.0,8,350.0,165.0,3693,11.5,70,1
2,18.0,8,318.0,150.0,3436,11.0,70,1
3,16.0,8,304.0,150.0,3433,12.0,70,1
4,17.0,8,302.0,140.0,3449,10.5,70,1
...,...,...,...,...,...,...,...,...
393,27.0,4,140.0,86.0,2790,15.6,82,1
394,44.0,4,97.0,52.0,2130,24.6,82,3
395,32.0,4,135.0,84.0,2295,11.6,82,1
396,28.0,4,120.0,79.0,2625,18.6,82,1


In [15]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 398 entries, 0 to 397
Data columns (total 8 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   mpg           398 non-null    float64
 1   cylinders     398 non-null    int64  
 2   displacement  398 non-null    float64
 3   horsepower    398 non-null    float64
 4   weight        398 non-null    int64  
 5   acceleration  398 non-null    float64
 6   model_year    398 non-null    int64  
 7   origin        398 non-null    int64  
dtypes: float64(4), int64(4)
memory usage: 25.0 KB
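
The integer codes assigned to origin in In [13] impose an artificial order (usa < japan < europe) on a nominal variable, which a linear model will treat as a numeric scale. One-hot encoding, mentioned in that cell's comment, avoids this. Below is a minimal sketch on a small hypothetical frame (`toy` is not the notebook's `df`), using `pd.get_dummies` with `drop_first=True` so the dummy columns are not perfectly collinear.

```python
import pandas as pd

# Small stand-in frame with string labels, as origin looks before In [13]
toy = pd.DataFrame({
    "mpg": [18.0, 24.0, 26.0, 15.0],
    "origin": ["usa", "japan", "europe", "usa"],
})

# One column per category except the first (alphabetically "europe"),
# which becomes the implicit baseline
encoded = pd.get_dummies(toy, columns=["origin"], drop_first=True, dtype=int)
print(encoded)
```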


In [16]:
#X = df.drop('mpg',axis=1)
X = df.iloc[:,1:]

#y = df['mpg']
y = df.iloc[:,0]

In [17]:
X

,cylinders,displacement,horsepower,weight,acceleration,model_year,origin
0,8,307.0,130.0,3504,12.0,70,1
1,8,350.0,165.0,3693,11.5,70,1
2,8,318.0,150.0,3436,11.0,70,1
3,8,304.0,150.0,3433,12.0,70,1
4,8,302.0,140.0,3449,10.5,70,1
...,...,...,...,...,...,...,...
393,4,140.0,86.0,2790,15.6,82,1
394,4,97.0,52.0,2130,24.6,82,3
395,4,135.0,84.0,2295,11.6,82,1
396,4,120.0,79.0,2625,18.6,82,1


In [18]:
y

0      18.0
1      15.0
2      18.0
3      16.0
4      17.0
       ... 
393    27.0
394    44.0
395    32.0
396    28.0
397    31.0
Name: mpg, Length: 398, dtype: float64

In [19]:
type(X)

pandas.core.frame.DataFrame

In [20]:
from sklearn.model_selection import train_test_split

X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.3,random_state=1)

In [21]:
X_train.shape,X_test.shape

((278, 7), (120, 7))

In [22]:
y_train

350    34.7
59     23.0
120    19.0
12     15.0
349    34.1
       ... 
393    27.0
255    25.1
72     15.0
235    26.0
37     18.0
Name: mpg, Length: 278, dtype: float64

In [23]:
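from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Fit the scaler on the training split only, then reuse it for the test split
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Every column must be numeric here. A non-numeric horsepower column (for example
# one filled with the bound method `median` instead of the value `median()`) raises:
# TypeError: float() argument must be a string or a real number, not 'method'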

In [None]:
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train,y_train)
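
To compare the regularized models with plain linear regression, the same fit-and-score loop can be run for all four estimators. The sketch below is self-contained and uses synthetic data as a stand-in for the mpg frame. The coefficients, the `alpha` values and `l1_ratio` are arbitrary assumptions, not tuned settings.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 7 features, 3 of them informative (assumed setup)
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 7))
y = X @ np.array([4.0, 0.0, -3.0, 0.0, 0.0, 2.0, 0.0]) + rng.normal(scale=1.0, size=300)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
    "elastic_net": ElasticNet(alpha=0.1, l1_ratio=0.5),
}

scores = {}
for name, estimator in models.items():
    # Scaling inside the pipeline keeps test data out of the scaler's fit
    pipe = make_pipeline(StandardScaler(), estimator)
    pipe.fit(X_train, y_train)
    scores[name] = pipe.score(X_test, y_test)  # R^2 on the held-out split
    print(f"{name:12s} R^2 = {scores[name]:.3f}  coef = {np.round(estimator.coef_, 2)}")
```

In practice the penalty strength is usually chosen by cross-validation, for instance with `RidgeCV`, `LassoCV` or `ElasticNetCV`, rather than fixed by hand.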