# What does a regularization model do?

<img src="images\\Regularization_Models_D1.png" alt="image" width="700px">

- Basically, __Regularization is used to reduce the overfitting problem__
- We have already seen other ways to tackle overfitting, like PCA and feature selection methods
- __Coming to the image__:
  - Suppose the irregular line above is the __Best Fit Line__ of an __overfitted__ model
    - i.e. the curve passes through all training datapoints with no error
  - In this case __Training Accuracy__ will be __100%__, but when we try to predict for unseen datapoints (red dots below), __it fails__
  - Because the model is overfitted, unseen datapoints tend to lie far from the curve, which leads to more __error in prediction__
  - To reduce this prediction error by reducing overfitting, we use __Regularization Models__
  - These models shift the __Current Best Fit Line__ to a __New Best Fit Line (black line)__ that accepts <br>
      __a little more training error in exchange for less testing error__

<img src="images\\Regularization_Models_D2.png" alt="image" width="700px" border="2px">
<img src="images\\Regularization_Models_D5.png" alt="image" width="700px" border="2px">

<h3>Our main aim is always to build a model with Low Bias and Low Variance, but is it possible?</h3>

<h2 style="color:limegreen">Bias Variance Trade Off -</h2>

__What is bias?__

- Bias is the difference between the average prediction of our model and the correct value which we are trying to predict
- A model with high bias pays very little attention to the training data and __oversimplifies__ the relationship
- It leads to high error on both training and testing data (under-fitting)

__What is variance?__

- Variance is the variability of the model's prediction for a given data point when the model is trained on different samples of the data
- A model with high variance pays a lot of attention to the training data and does not generalize to data which it hasn't seen before
- As a result, such models perform very well on training data but have high errors on testing data


<h4>Trade-off means that as one decreases, the other tends to increase, and vice versa</h4>

- If we make the model more complex to increase training accuracy, bias falls but variance rises, so testing accuracy eventually decreases
- Similarly, if we simplify or regularize the model to increase testing accuracy, training accuracy usually decreases
- This is known as the __BIAS VARIANCE TRADE OFF__
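
For squared-error loss, the trade-off is usually written as the following decomposition. This is a standard textbook result, not something taken from the images above. Here $\hat{f}$ denotes the fitted model, the expectation is over training sets and noise, and $\sigma^2$ is the irreducible noise in the target:

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \big(\mathrm{Bias}[\hat{f}(x)]\big)^2 + \mathrm{Var}[\hat{f}(x)] + \sigma^2
$$

Regularization deliberately adds a little bias in the hope of removing more variance than that, so the sum on the right gets smaller.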

<h2 style="color:limegreen">LASSO(L1) - </h2>

<img src="images\\Regularization_Models_D3.png" alt="image" width="700px" border="2px">

__What is Slope?__

- We can see some formulas above, but in general slope is nothing but __rise over run__
- i.e. how much Y changes for a given change in X
- We also know the __linear regression equation: y = β<sub>0</sub> + β<sub>1</sub>X<sub>1</sub> + β<sub>2</sub>X<sub>2</sub> + error__
- In this equation β<sub>0</sub> is the intercept, and every other β corresponds to the slope associated with its X

__How LASSO works here -__

- Whenever we try to find the best fit line, we calculate the __MSE__ i.e. Mean Squared Error
- In the LASSO model, the __best fit line__ is shifted to a __new best fit line__ by minimizing __L1 = MSE + α * abs(slope)__
- In short, it shifts the line by shrinking the slope
- For multiple slopes, __L1 = MSE + α * (|M<sub>1</sub>| + |M<sub>2</sub>| + |M<sub>3</sub>| + ... + |M<sub>n</sub>|)__, where M is a slope
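
The LASSO objective can be sketched in a few lines of plain Python. Note that the penalty is the sum of the absolute values of the slopes, not the absolute value of their sum. The predictions, slopes and `alpha` below are made-up values for illustration, and the helper names are hypothetical:

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def lasso_loss(y_true, y_pred, slopes, alpha):
    """MSE plus alpha times the sum of absolute slopes (the L1 penalty)."""
    l1_penalty = sum(abs(m) for m in slopes)
    return mse(y_true, y_pred) + alpha * l1_penalty


y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 8.0]   # hypothetical predictions
slopes = [2.0, -0.5]       # hypothetical slopes M1, M2

loss = lasso_loss(y_true, y_pred, slopes, alpha=0.1)
print(loss)                # MSE of 2/3 plus 0.1 * (2.0 + 0.5)
```

scikit-learn's `Lasso` scales the squared-error term by `1 / (2 * n_samples)` rather than using the plain MSE, so the same `alpha` does not mean exactly the same thing there.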

__LASSO can also be used to get important features__

- LASSO pushes the smallest slopes to exactly __0__ so that those columns no longer contribute and can be omitted from the dataset
- As in the image above, a tiny coefficient such as 0.001 is pushed all the way to 0
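
One way to see why an absolute-value penalty produces exact zeros is the one-dimensional case. Assuming a single coefficient whose unpenalized estimate is `z`, minimizing `0.5 * (b - z)**2 + gamma * abs(b)` has a closed-form solution known as soft-thresholding. The function name and the numbers below are illustrative only:

```python
def soft_threshold(z, gamma):
    """Minimizer of 0.5 * (b - z)**2 + gamma * abs(b) over b."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0   # anything within [-gamma, gamma] is set to exactly zero


unpenalized = [0.001, -0.3, 2.5]   # hypothetical coefficient estimates
shrunk = [soft_threshold(z, gamma=0.5) for z in unpenalized]
print(shrunk)
```

Small coefficients fall inside the threshold and become exactly 0, while large ones are only reduced by `gamma`.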


<h2 style="color:limegreen">Ridge and Elastic Net - </h2>

<img src="images\\Regularization_Models_D4.png" alt="image" width="700px" border="2px">


__Ridge model -__

- The Ridge model is very similar to LASSO
- The only difference is that, __instead of the absolute slopes, it takes their squares__
- i.e. __L2 = MSE + α * (slope)^2__, and for multiple slopes __L2 = MSE + α * (M<sub>1</sub><sup>2</sup> + M<sub>2</sub><sup>2</sup> + M<sub>3</sub><sup>2</sup> + ... + M<sub>n</sub><sup>2</sup>)__


__Elastic Net -__

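- It combines the L1 and L2 penalties: __EN = MSE + L1 penalty + L2 penalty__
- In its simplest form __EN = MSE + α * abs(slope) + α * (slope)^2__; in practice a mix ratio decides how much weight each penalty gets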

# PPT Part - 

<h2 style="color:red">Ridge Regression (or Tikhonov regularization) -</h2>

- It is a technique for analyzing multiple __regression__ data that suffer from __multicollinearity__
- It adds a __penalty equivalent__ to half of the __squares of the magnitude__ of the coefficients


<img src="images\\Regularization_Models_D6.png" alt="image" width="300px" style="margin-left:200px">

- If lambda is 0 then we get back OLS. Whereas, if lambda is very large, the penalty term dominates the loss function
- A very large lambda will therefore cause the model to under-fit
- It reduces the model complexity by __coefficient shrinkage__: __unlike L1__, it shrinks coefficients towards 0 but does not make them exactly 0

__Ridge Regression is a regularized version of Linear Regression__

- A regularization term equal to <img src="images\\Regularization_Models_D7.png" alt="image" width="80px"> is added to the __cost function__
- This forces the learning algorithm to not only fit the data but also __keep the model weights as small as possible__
- The __hyperparameter α__ controls how much you want to regularize the model
- If __α = 0__, then Ridge Regression is just Linear Regression
- If __α is very large__, then all weights end up very close to 0 and the result is a flat line going through the __data's mean__
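
The effect of α can be checked by hand for a single feature. The sketch below assumes the objective "sum of squared errors + α · slope²" with an unpenalized intercept, for which the slope has the closed form `Sxy / (Sxx + alpha)`. The toy data and the function name are made up for illustration:

```python
def ridge_line(xs, ys, alpha):
    """Fit y = b0 + b1*x by minimizing sum of squared errors + alpha * b1**2.

    The intercept b0 is left unpenalized.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / (sxx + alpha)          # alpha inflates the denominator
    intercept = y_mean - slope * x_mean  # line always passes through the means
    return intercept, slope


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]   # toy data lying exactly on y = 2x

for alpha in (0.0, 10.0, 1e6):
    b0, b1 = ridge_line(xs, ys, alpha)
    print(f"alpha={alpha:g}  intercept={b0:.4f}  slope={b1:.6f}")
```

With `alpha = 0` the ordinary least-squares line is recovered. As `alpha` grows the slope shrinks towards 0 and the intercept approaches the mean of `ys`, which is the flat line described above.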

<img src="images\\Regularization_Models_D8.png" alt="image" width="250px" style="margin-left:30%">


<div style="display:flex;flex-direction:row;justify-content:space-between">
    
<img src="images\\Regularization_Models_D9.png" alt="image" width="50%" border="2px">
<img src="images\\Regularization_Models_D10.png" alt="image" width="40%" border="2px">

</div>

<h2 style="color:red">Lasso Regression (Least Absolute Shrinkage and Selection Operator)-</h2>

- It is another __regularized version of Linear Regression__
- Just like Ridge Regression, it adds a regularization term to the __cost function__
- But it uses the __l1 or ell(1) norm of the weight vector__ instead of half the square of the __l2 or ell(2)__ norm

<img src="images\\Regularization_Models_D11.png" alt="image" width="250px" style="margin-left:30%">

- It is a regression analysis method that performs __both variable selection and regularization__
- It performs __L1__ regularization, i.e. adds a penalty equivalent to the __absolute value of the magnitude__ of the coefficients

<img src="images\\Regularization_Models_D12.png" alt="image" width="400px" style="margin-left:20%">

- If __lambda is 0__ then we get back __OLS__.
- Whereas a very large value will make the coefficients __0__, hence the model will __under-fit__
- Useful when there are many variables, because it performs __feature selection__


<div style="display:flex;flex-direction:row;justify-content:space-between">
    
<img src="images\\Regularization_Models_D13.png" alt="image" width="50%" border="2px">
<img src="images\\Regularization_Models_D14.png" alt="image" width="40%" border="2px">

</div>

<h2 style="color:red">Elastic Net -</h2>

<h3>Elastic Net is a middle ground between Ridge and Lasso Regression</h3>

- The regularization term is a __simple mix of Ridge and Lasso's regularization terms__ and you can __control the mix ratio r__
- When __r = 0__, Elastic Net is equivalent to __Ridge Regression__
- When __r = 1__, Elastic Net is equivalent to __Lasso Regression__
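
The role of the mix ratio can be sketched as follows. This assumes the common parametrization in which the penalty is `r * alpha * sum(|w|) + (1 - r) / 2 * alpha * sum(w**2)`; the function name and weights are hypothetical. In scikit-learn's `ElasticNet` the mix ratio is the `l1_ratio` parameter.

```python
def elastic_net_penalty(weights, alpha, r):
    """Blend of the L1 and L2 penalties controlled by the mix ratio r."""
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return r * alpha * l1 + (1 - r) / 2 * alpha * l2


weights = [3.0, -4.0]   # hypothetical model weights
alpha = 0.5

lasso_only = elastic_net_penalty(weights, alpha, r=1.0)  # pure L1 term
ridge_only = elastic_net_penalty(weights, alpha, r=0.0)  # pure L2 term
blend = elastic_net_penalty(weights, alpha, r=0.5)       # half of each
print(lasso_only, ridge_only, blend)
```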

<img src="images\\Regularization_Models_D15.png" alt="image" width="350px" style="margin-left:30%">


<h2 style="color:red">Which one should be used?</h2>

- When should you use plain Linear Regression, Ridge, Lasso or Elastic Net?
- It is almost always preferable to have at least a little bit of regularization, so generally you should __avoid plain Linear Regression__
- __Ridge is a good default__, but if we suspect that only a few features are useful, we should prefer __Lasso or Elastic Net__ because <br>
    they tend to reduce the useless features' weights down to __0__, as we have discussed

In general, __Elastic Net is preferred over Lasso__, since Lasso may behave erratically when the number of features exceeds the number of training instances or when several features are strongly correlated

<img src="images\\Regularization_Models_D16.png" alt="image" width="600px">


<div style="border:1px solid black;padding:20px;background-color:beige">

<h1 style="color:red">Summary</h1>


<h4 style="line-height:1.3;color:slateblue">Regularization is used to reduce the over-dependence on any particular independent variable by adding a penalty 
    term to <br> the loss function. This term prevents the coefficients of the independent variables from taking extreme values</h4>

____

- __Ridge Regression adds L2 regularization penalty term to loss function__
- This term reduces the coefficients but __does not make them 0__ and thus __doesn't eliminate any independent variable__ completely
- It can be used to measure the impact of the different independent variables


____

- __Lasso Regression adds L1 regularization penalty term to loss function__
- This term reduces the coefficients and can __make them 0__, thus effectively __eliminating the corresponding independent variables__ completely
- It can be used for __feature selection__

____

- __Elastic Net is a combination of both of the above regularizations.__
- It contains both the __L1 and L2 as its penalty term__
- It can perform better than Ridge or Lasso alone, but this depends on the dataset, so it is worth comparing all three

</div>

# Example Code

In [75]:
# This code also covers SimpleImputer and OrdinalEncoder

In [76]:
import pandas as pd
import numpy as np
from sklearn.impute import SimpleImputer
# fillna(): fills NaN in one column at a time (e.g. with its mean, median or mode)
# SimpleImputer fills NaN in several columns in a single call

import matplotlib.pyplot as plt
from sklearn.preprocessing import OrdinalEncoder
# We have seen OHE (for x variables) and Label Encoding (for the y variable)
# OrdinalEncoder: works like LabelEncoder but on x variables

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Lasso, Ridge, ElasticNet

import warnings
warnings.filterwarnings('ignore')

In [77]:
df = pd.read_csv("Datasets\\cars_new.csv")
df

Unnamed: 0,symboling,normalized-losses,make,fuel-type,body-style,drive-wheels,engine-location,width,height,engine-type,engine-size,horsepower,city-mpg,highway-mpg,price
0,3,?,alfa-romero,gas,convertible,rwd,front,64.1,48.8,dohc,130,111,21,27,13495
1,3,?,alfa-romero,gas,convertible,rwd,front,64.1,48.8,dohc,130,111,21,27,16500
2,1,?,alfa-romero,gas,hatchback,rwd,front,65.5,52.4,ohcv,152,154,19,26,16500
3,2,164,audi,gas,sedan,fwd,front,66.2,54.3,ohc,109,102,24,30,13950
4,2,164,audi,gas,sedan,4wd,front,66.4,54.3,ohc,136,115,18,22,17450
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
200,-1,95,volvo,gas,sedan,rwd,front,68.9,55.5,ohc,141,114,23,28,16845
201,-1,95,volvo,gas,sedan,rwd,front,68.8,55.5,ohc,141,160,19,25,19045
202,-1,95,volvo,gas,sedan,rwd,front,68.9,55.5,ohcv,173,134,18,23,21485
203,-1,95,volvo,diesel,sedan,rwd,front,68.9,55.5,ohc,145,106,26,27,22470


In [78]:
df.info()
# '?' marks a missing value; because of it the column dtype has become object

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 205 entries, 0 to 204
Data columns (total 15 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   symboling          205 non-null    int64  
 1   normalized-losses  205 non-null    object 
 2   make               205 non-null    object 
 3   fuel-type          205 non-null    object 
 4   body-style         205 non-null    object 
 5   drive-wheels       205 non-null    object 
 6   engine-location    205 non-null    object 
 7   width              205 non-null    float64
 8   height             205 non-null    float64
 9   engine-type        205 non-null    object 
 10  engine-size        205 non-null    int64  
 11  horsepower         205 non-null    object 
 12  city-mpg           205 non-null    int64  
 13  highway-mpg        205 non-null    int64  
 14  price              205 non-null    int64  
dtypes: float64(2), int64(5), object(8)
memory usage: 24.2+ KB


In [79]:
df.isna().sum() # '?' is not recognised as a missing value

symboling            0
normalized-losses    0
make                 0
fuel-type            0
body-style           0
drive-wheels         0
engine-location      0
width                0
height               0
engine-type          0
engine-size          0
horsepower           0
city-mpg             0
highway-mpg          0
price                0
dtype: int64

In [80]:
df['normalized-losses'].value_counts()

?      41
161    11
91      8
150     7
134     6
128     6
104     6
85      5
94      5
65      5
102     5
74      5
168     5
103     5
95      5
106     4
93      4
118     4
148     4
122     4
83      3
125     3
154     3
115     3
137     3
101     3
119     2
87      2
89      2
192     2
197     2
158     2
81      2
188     2
194     2
153     2
129     2
108     2
110     2
164     2
145     2
113     2
256     1
107     1
90      1
231     1
142     1
121     1
78      1
98      1
186     1
77      1
Name: normalized-losses, dtype: int64

In [81]:
df[df['horsepower']=='?']

Unnamed: 0,symboling,normalized-losses,make,fuel-type,body-style,drive-wheels,engine-location,width,height,engine-type,engine-size,horsepower,city-mpg,highway-mpg,price
130,0,?,renault,gas,wagon,fwd,front,66.5,55.2,ohc,132,?,23,31,9295
131,2,?,renault,gas,hatchback,fwd,front,66.6,50.5,ohc,132,?,23,31,9895


In [82]:
df.replace('?',np.nan,inplace=True)

In [83]:
df.isna().sum()   # now the missing values are counted

symboling             0
normalized-losses    41
make                  0
fuel-type             0
body-style            0
drive-wheels          0
engine-location       0
width                 0
height                0
engine-type           0
engine-size           0
horsepower            2
city-mpg              0
highway-mpg           0
price                 0
dtype: int64
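
An alternative to calling `df.replace('?', np.nan)` after loading is to tell `read_csv` which marker means "missing". A minimal sketch, using a small inline CSV with made-up rows in the same shape as the car data instead of the real file:

```python
import io

import pandas as pd

# Made-up rows for illustration; '?' marks a missing entry.
csv_text = """normalized-losses,horsepower,price
?,111,13495
164,?,13950
95,114,16845
"""

# na_values tells pandas to parse '?' as NaN while reading.
df_demo = pd.read_csv(io.StringIO(csv_text), na_values="?")

missing_per_column = df_demo.isna().sum()
print(missing_per_column)
print(df_demo.dtypes)
```

Because the `?` never reaches the DataFrame as a string, the affected columns are parsed as numeric right away instead of ending up with dtype `object`.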

In [84]:
df.dtypes     # '?' is now NaN, but the affected columns still have dtype object


symboling              int64
normalized-losses     object
make                  object
fuel-type             object
body-style            object
drive-wheels          object
engine-location       object
width                float64
height               float64
engine-type           object
engine-size            int64
horsepower            object
city-mpg               int64
highway-mpg            int64
price                  int64
dtype: object

In [85]:
# Before changing the dtypes (can be done afterwards too), let's impute missing values with SimpleImputer

In [86]:
si = SimpleImputer(strategy='median')

In [87]:
df.iloc[:,[1,11]].head()   # 2 columns containing null values

Unnamed: 0,normalized-losses,horsepower
0,,111
1,,111
2,,154
3,164.0,102
4,164.0,115


In [88]:
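num_cols = ['normalized-losses', 'horsepower']   # columns 1 and 11
df[num_cols] = si.fit_transform(df[num_cols])
# fit computes the median of each column; transform replaces the missing values with that median
# assigning by column name replaces the columns, so they take the float dtype of the imputer output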
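
What `SimpleImputer(strategy='median')` does to one column can be sketched in plain Python. The values below are toy numbers, `None` stands in for a missing entry, and the helper name is hypothetical:

```python
from statistics import median


def impute_median(column):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)                      # the "fit" step
    return [fill if v is None else v for v in column]   # the "transform" step


losses = [None, None, 164.0, 164.0, 95.0, 110.0]   # toy values
filled = impute_median(losses)
print(filled)
```

The median is computed from the observed values only, so the missing entries do not influence the fill value.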

In [89]:
df.isna().sum()

symboling            0
normalized-losses    0
make                 0
fuel-type            0
body-style           0
drive-wheels         0
engine-location      0
width                0
height               0
engine-type          0
engine-size          0
horsepower           0
city-mpg             0
highway-mpg          0
price                0
dtype: int64

In [41]:
df.select_dtypes(int).head() # displays only the columns of int dtype; try float as well

Unnamed: 0,symboling,engine-size,city-mpg,highway-mpg,price
0,3,130,21,27,13495
1,3,130,21,27,16500
2,1,152,19,26,16500
3,2,109,24,30,13950
4,2,136,18,22,17450


In [43]:
df.select_dtypes([int,float]).head() # all numerical columns

Unnamed: 0,symboling,normalized-losses,width,height,engine-size,horsepower,city-mpg,highway-mpg,price
0,3,115.0,64.1,48.8,130,111.0,21,27,13495
1,3,115.0,64.1,48.8,130,111.0,21,27,16500
2,1,115.0,65.5,52.4,152,154.0,19,26,16500
3,2,164.0,66.2,54.3,109,102.0,24,30,13950
4,2,164.0,66.4,54.3,136,115.0,18,22,17450


In [47]:
df.select_dtypes(object).head() # all categorical (object) columns

Unnamed: 0,make,fuel-type,body-style,drive-wheels,engine-location,engine-type
0,alfa-romero,gas,convertible,rwd,front,dohc
1,alfa-romero,gas,convertible,rwd,front,dohc
2,alfa-romero,gas,hatchback,rwd,front,ohcv
3,audi,gas,sedan,fwd,front,ohc
4,audi,gas,sedan,4wd,front,ohc


In [49]:
df.dtypes

symboling              int64
normalized-losses    float64
make                  object
fuel-type             object
body-style            object
drive-wheels          object
engine-location       object
width                float64
height               float64
engine-type           object
engine-size            int64
horsepower           float64
city-mpg               int64
highway-mpg            int64
price                  int64
dtype: object

In [107]:
# SimpleImputer returns a float array, so both imputed columns (1 and 11) are now float64

In [109]:
df.select_dtypes(object)

Unnamed: 0,make,fuel-type,body-style,drive-wheels,engine-location,engine-type
0,alfa-romero,gas,convertible,rwd,front,dohc
1,alfa-romero,gas,convertible,rwd,front,dohc
2,alfa-romero,gas,hatchback,rwd,front,ohcv
3,audi,gas,sedan,fwd,front,ohc
4,audi,gas,sedan,4wd,front,ohc
...,...,...,...,...,...,...
200,volvo,gas,sedan,rwd,front,ohc
201,volvo,gas,sedan,rwd,front,ohc
202,volvo,gas,sedan,rwd,front,ohcv
203,volvo,diesel,sedan,rwd,front,ohc


In [119]:
# We have to encode these columns for model building

# Let's get column names first

cat_col = df.select_dtypes(object).columns
cat_col

Index(['make', 'fuel-type', 'body-style', 'drive-wheels', 'engine-location',
       'engine-type'],
      dtype='object')

In [121]:
df[cat_col].head()

Unnamed: 0,make,fuel-type,body-style,drive-wheels,engine-location,engine-type
0,alfa-romero,gas,convertible,rwd,front,dohc
1,alfa-romero,gas,convertible,rwd,front,dohc
2,alfa-romero,gas,hatchback,rwd,front,ohcv
3,audi,gas,sedan,fwd,front,ohc
4,audi,gas,sedan,4wd,front,ohc


In [123]:
# Applying Ordinal Encoder for label encoding all above independent variables

oe = OrdinalEncoder() # convert categorical values to numerical
df[cat_col] = oe.fit_transform(df[cat_col])

In [125]:
df

Unnamed: 0,symboling,normalized-losses,make,fuel-type,body-style,drive-wheels,engine-location,width,height,engine-type,engine-size,horsepower,city-mpg,highway-mpg,price
0,3,115.0,0.0,1.0,0.0,2.0,0.0,64.1,48.8,0.0,130,111.0,21,27,13495
1,3,115.0,0.0,1.0,0.0,2.0,0.0,64.1,48.8,0.0,130,111.0,21,27,16500
2,1,115.0,0.0,1.0,2.0,2.0,0.0,65.5,52.4,5.0,152,154.0,19,26,16500
3,2,164.0,1.0,1.0,3.0,1.0,0.0,66.2,54.3,3.0,109,102.0,24,30,13950
4,2,164.0,1.0,1.0,3.0,0.0,0.0,66.4,54.3,3.0,136,115.0,18,22,17450
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
200,-1,95.0,21.0,1.0,3.0,2.0,0.0,68.9,55.5,3.0,141,114.0,23,28,16845
201,-1,95.0,21.0,1.0,3.0,2.0,0.0,68.8,55.5,3.0,141,160.0,19,25,19045
202,-1,95.0,21.0,1.0,3.0,2.0,0.0,68.9,55.5,5.0,173,134.0,18,23,21485
203,-1,95.0,21.0,0.0,3.0,2.0,0.0,68.9,55.5,3.0,145,106.0,26,27,22470


In [127]:
# All object columns are now ordinal (label-style) encoded rather than one-hot encoded, so no column is split into multiple columns
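
The codes in the table above come from sorting the distinct categories of each column and numbering them from 0. A plain-Python sketch of that idea, using the first five `drive-wheels` values shown earlier (the helper name is hypothetical):

```python
def ordinal_encode(values):
    """Map each distinct category to its position in sorted order."""
    categories = sorted(set(values))
    mapping = {cat: float(i) for i, cat in enumerate(categories)}
    return [mapping[v] for v in values], mapping


drive_wheels = ["rwd", "rwd", "rwd", "fwd", "4wd"]   # rows 0 to 4 of the data
codes, mapping = ordinal_encode(drive_wheels)
print(mapping)
print(codes)
```

The resulting order (`4wd` < `fwd` < `rwd`) is purely alphabetical. A linear model will still treat the codes as magnitudes, so for categories with no natural order one-hot encoding is often the safer choice.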

## __Model Building__

In [131]:
# Dividing into x and y

x = df.iloc[:,:-1]
y = df['price']

In [133]:
# train test split

x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.3,random_state=1)

In [137]:
# Linear Regression Model 

lr = LinearRegression() 
lr.fit(x_train,y_train)
preds = lr.predict(x_test)
preds

array([ 6114.11943789,  6899.1299745 ,  5179.12129606,  6534.49022809,
        9125.91144455, 26384.12259969,  7406.15558344,   895.58275157,
        5172.9291232 , 13513.24663171, 15407.85345859, 14421.55136235,
       16418.89788598, 11145.35574554, 16463.66533212, 13896.61774947,
        7335.98228673,  8845.0632645 , 10862.74899523,  7555.18709439,
       11006.99145217,  6830.04912601, 13638.73999814,  6363.9621527 ,
       13732.90563984,  8845.0632645 , 15040.20874209,  6544.19455256,
        4706.60204753,  9314.78618354,  8128.3940468 , 14321.0634702 ,
       25655.82279003, 11987.68138692, 19723.90650907,  6557.53154026,
        8154.91470876, 30658.97297566,  9834.8439003 , 16923.44920496,
        6520.66900847, 20194.14945986,  7692.55861481,  8459.52258636,
        8670.04245903,  7166.21791784, 40486.41308962,  7569.40057796,
       17170.51789287, 18954.92930717, 26190.76513016, 16914.80136249,
       21856.35461232,  6225.96759922, 13090.60124574,  7294.0335782 ,
      

In [139]:
lr.intercept_ # y intercept i.e. c value

-64935.113579782475

In [141]:
lr.coef_ # slope or coefficient or weight

array([ 5.71727164e+01,  4.76320989e-01, -2.01309566e+02, -6.22705136e+02,
       -1.63712110e+02,  1.88863899e+03,  1.63884484e+04,  7.90632094e+02,
        3.61221503e+02,  2.81207534e+02,  9.82290864e+01, -1.06474945e+01,
        3.08435166e+02, -4.17126915e+02])

In [145]:
lr.score(x_train,y_train),lr.score(x_test,y_test) # train score, test score (R^2, not classification accuracy)
# the train score is higher than the test score, which suggests some overfitting

(0.8504229026078213, 0.7964854785429522)
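
For scikit-learn regressors, `.score()` returns the coefficient of determination R², so the two numbers above are R² values rather than accuracies. A minimal sketch of how R² is computed, with made-up targets and predictions:

```python
def r2_score_manual(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [10.0, 20.0, 30.0]   # hypothetical prices
y_pred = [12.0, 18.0, 30.0]   # hypothetical predictions
score = r2_score_manual(y_true, y_pred)
print(score)
```

A score of 1 means perfect predictions, and a model that always predicts the mean of `y_true` scores 0.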

## __Lasso (L1) Regression -__

In [153]:
# with a suitable alpha, Lasso lowers the train score slightly and can raise the test score

l1 = Lasso() # alpha = 1.0 by default
l1.fit(x_train,y_train)
l1.score(x_train,y_train),l1.score(x_test,y_test)

# almost the same scores as LinearRegression: the default alpha is too small to matter here

(0.8504215478243033, 0.7966615211575689)

In [155]:
l1.coef_
# close to the LinearRegression coefficients: no coefficient is 0, so tune the hyperparameter alpha

array([ 5.70766693e+01,  4.67141700e-01, -2.01139953e+02, -6.14206970e+02,
       -1.64751649e+02,  1.88558824e+03,  1.63181959e+04,  7.88680118e+02,
        3.61783500e+02,  2.81248690e+02,  9.83193653e+01, -1.05310631e+01,
        3.07274772e+02, -4.15725804e+02])

<h2 style="color:green">Hyperparameter Tuning - </h2>

In [166]:
for i in range(100,200): # try 50 to 100, 100 to 200
  l1 = Lasso(alpha = i)
  l1.fit(x_train,y_train)
  print(f"Alpha: {i} Train: {l1.score(x_train,y_train)} Test: {l1.score(x_test,y_test)}")

Alpha: 100 Train: 0.8372483974026499 Test: 0.8092040910955615
Alpha: 101 Train: 0.8369899229541409 Test: 0.8092979610738256
Alpha: 102 Train: 0.8367288785613758 Test: 0.8093910017461713
Alpha: 103 Train: 0.8364651676681871 Test: 0.8094832684251523
Alpha: 104 Train: 0.8361989818307348 Test: 0.809574650362481
Alpha: 105 Train: 0.8359302253870347 Test: 0.8096652035299013
Alpha: 106 Train: 0.8356588978359675 Test: 0.8097549282309101
Alpha: 107 Train: 0.8353849991129143 Test: 0.8098438245155561
Alpha: 108 Train: 0.8351084257059384 Test: 0.8099319454841268
Alpha: 109 Train: 0.8348293842179736 Test: 0.8100191840285436
Alpha: 110 Train: 0.8345477714936768 Test: 0.8101055942385992
Alpha: 111 Train: 0.8342635877040373 Test: 0.8101911760417996
Alpha: 112 Train: 0.8339767222137543 Test: 0.8102759813628974
Alpha: 113 Train: 0.8336873954613211 Test: 0.8103599056262563
Alpha: 114 Train: 0.8333954973355284 Test: 0.8104430016665038
Alpha: 115 Train: 0.8331010276993127 Test: 0.8105252695475362
Alpha: 11

In [168]:
# in this range, as alpha increases the train score decreases and the test score increases
# so try a bigger alpha value
# in the loop above, try changing the range to 50-100 or 150-200

l1 = Lasso(alpha = 175) # select a value where the train and test scores are about the same
l1.fit(x_train,y_train)
l1.score(x_train,y_train),l1.score(x_test,y_test)


(0.8107234361496892, 0.8139459609474126)
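
The loop above picks alpha by watching the test score, which lets information from the test set leak into the model choice. A common alternative is to choose alpha by cross-validation on the training data only. The sketch below uses `LassoCV` on synthetic data from `make_regression` as a stand-in for the car data; the candidate alphas and data settings are assumptions for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the car data: 14 features, 5 of them informative.
X, y = make_regression(n_samples=200, n_features=14, n_informative=5,
                       noise=10.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

candidate_alphas = [0.01, 0.1, 1.0, 10.0, 100.0]

# alpha is chosen by 5-fold cross-validation on the training data only
lasso_cv = LassoCV(alphas=candidate_alphas, cv=5)
lasso_cv.fit(X_train, y_train)

best_alpha = lasso_cv.alpha_
test_r2 = lasso_cv.score(X_test, y_test)   # test set used once, for the final report
print(best_alpha, test_r2)
```

The same pattern applies to `RidgeCV` and `ElasticNetCV`.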

In [170]:
l1.coef_ # columns whose coefficient is 0 are not used by the model
# here: normalized-losses and fuel-type

array([  26.15091464,   -0.        , -173.02547787,   -0.        ,
       -363.80743898, 1247.84193718, 3867.709301  ,  391.04314711,
        451.57194036,  297.35122938,  113.32977728,   11.00742675,
         70.90090498, -160.35202828])

In [172]:
x_train.columns

Index(['symboling', 'normalized-losses', 'make', 'fuel-type', 'body-style',
       'drive-wheels', 'engine-location', 'width', 'height', 'engine-type',
       'engine-size', 'horsepower', 'city-mpg', 'highway-mpg'],
      dtype='object')
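
Matching coefficients to column names by eye is error-prone. The sketch below pairs the two outputs shown above, copied in as plain lists so that it runs on its own. In the live session, `zip(x_train.columns, l1.coef_)` would do the same job.

```python
columns = ['symboling', 'normalized-losses', 'make', 'fuel-type', 'body-style',
           'drive-wheels', 'engine-location', 'width', 'height', 'engine-type',
           'engine-size', 'horsepower', 'city-mpg', 'highway-mpg']
coefs = [26.15091464, -0.0, -173.02547787, -0.0,
         -363.80743898, 1247.84193718, 3867.709301, 391.04314711,
         451.57194036, 297.35122938, 113.32977728, 11.00742675,
         70.90090498, -160.35202828]

# Features whose Lasso coefficient is exactly zero were dropped by the model.
dropped = [name for name, c in zip(columns, coefs) if c == 0]
kept = [name for name, c in zip(columns, coefs) if c != 0]
print("dropped:", dropped)
print("kept:", kept)
```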

## __Ridge (L2) Regression -__

In [180]:
l2 = Ridge()
l2.fit(x_train,y_train)
l2.score(x_train,y_train), l2.score(x_test,y_test)

(0.8435840853399228, 0.8075632224690532)

In [184]:
for i in range(1,100):
    l2 = Ridge(alpha=i)
    l2.fit(x_train,y_train)
    print(f"Alpha: {i} Train: {l2.score(x_train,y_train)} Test: {l2.score(x_test,y_test)}")
    # for alpha between 6 and 10 the train and test scores are nearly equal

Alpha: 1 Train: 0.8435840853399228 Test: 0.8075632224690532
Alpha: 2 Train: 0.8356695734845091 Test: 0.8112192014374253
Alpha: 3 Train: 0.8296379623431074 Test: 0.8129299663310144
Alpha: 4 Train: 0.8250699092246865 Test: 0.8138839096972438
Alpha: 5 Train: 0.8215093087765014 Test: 0.8144682684596012
Alpha: 6 Train: 0.818648610383485 Test: 0.8148435627265228
Alpha: 7 Train: 0.8162882573020809 Test: 0.8150880725612084
Alpha: 8 Train: 0.8142964263180523 Test: 0.815244730507158
Alpha: 9 Train: 0.812583522729097 Test: 0.8153392574436445
Alpha: 10 Train: 0.8110868722186452 Test: 0.8153881483263247
Alpha: 11 Train: 0.8097614513602517 Test: 0.8154025610279507
Alpha: 12 Train: 0.8085741366835051 Test: 0.8153903693701572
Alpha: 13 Train: 0.8075000372738094 Test: 0.815357321289201
Alpha: 14 Train: 0.8065200924084595 Test: 0.8153077294072232
Alpha: 15 Train: 0.8056194580416447 Test: 0.8152449025615434
Alpha: 16 Train: 0.8047863980733031 Test: 0.815171426484327
Alpha: 17 Train: 0.8040115065187472 Te

In [186]:
l2 = Ridge(alpha = 8)
l2.fit(x_train,y_train)
l2.score(x_train,y_train),l2.score(x_test,y_test)

(0.8142964263180523, 0.815244730507158)

In [190]:
l2.coef_   # Ridge doesn't set any coefficient to exactly 0

array([ 232.21099351,   -3.20670335, -189.34649798, -973.96360026,
       -611.0457004 , 1676.24489552, 3064.9010721 ,  383.53935829,
        561.24716944,  513.14971523,  103.43849153,   20.83054838,
        211.07802458, -276.03878654])

## __Elastic Net__

In [194]:
en = ElasticNet(alpha = 1) # default alpha=1.0, default l1_ratio=0.5
en.fit(x_train,y_train)
en.score(x_train,y_train),en.score(x_test,y_test)

(0.7866253599240466, 0.809826481725613)

In [196]:
# Ridge and Lasso scored better than Elastic Net here, so we can go with either of them