## Ridge Regression
Tikhonov regularization, named for Andrey Tikhonov, is a method of regularization of ill-posed problems. Ridge regression is a special case of Tikhonov regularization in which all parameters are regularized equally. Ridge regression is particularly useful to mitigate the problem of multicollinearity in linear regression, which commonly occurs in models with large numbers of parameters. In general, the method provides improved efficiency in parameter estimation problems in exchange for a tolerable amount of bias (see bias–variance tradeoff).

For more information - https://en.wikipedia.org/wiki/Ridge_regression
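
To make the idea concrete, here is a minimal sketch of the ridge estimate in closed form, w = (XᵀX + αI)⁻¹ Xᵀy, on synthetic data with two nearly collinear features. The data, the helper name `ridge_closed_form`, and the value `alpha=10.0` are assumptions for illustration only. The sketch omits the intercept for brevity, whereas sklearn's `Ridge` fits an unpenalized intercept by default.

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
    """Solve min_w ||y - Xw||^2 + alpha * ||w||^2 (hypothetical helper, no intercept)."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Synthetic data (assumption): x2 is almost a copy of x1, so the features are collinear.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.01 * rng.normal(size=200)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.5, size=200)

w_ols = ridge_closed_form(X, y, alpha=0.0)     # alpha=0 reduces to ordinary least squares
w_ridge = ridge_closed_form(X, y, alpha=10.0)  # the penalty shrinks the coefficient vector

print("OLS coefficients  :", w_ols)
print("Ridge coefficients:", w_ridge)
```

With collinear features the unpenalized solution can split the effect between the two columns erratically, while the ridge solution has a smaller norm and tends to share the weight more evenly.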

## Lasso Regression
In statistics and machine learning, lasso (least absolute shrinkage and selection operator; also Lasso or LASSO) is a regression analysis method that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the resulting statistical model. It was originally introduced in geophysics, and later by Robert Tibshirani, who coined the term.

For more information - https://en.wikipedia.org/wiki/Lasso_(statistics)
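
The "selection" part of lasso can be seen in a small sketch: on synthetic data where only two of ten features carry signal, lasso sets the remaining coefficients exactly to zero, while ridge only shrinks them. The data and `alpha=0.5` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (assumption): 10 features, only the first two influence y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 5.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=1.0, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)

print("Lasso coefficients:", np.round(lasso.coef_, 2))
print("Ridge coefficients:", np.round(ridge.coef_, 2))
print("Coefficients set exactly to zero by lasso:", int(np.sum(lasso.coef_ == 0)))
```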

In [2]:
## Importing Necessary packages and libraries
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

In [5]:
## Importing csv file
df=pd.read_csv("S:/ML/datasets/housing.csv")

In [7]:
df.head()

,RM,LSTAT,PTRATIO,MEDV
0,6.575,4.98,15.3,504000.0
1,6.421,9.14,17.8,453600.0
2,7.185,4.03,17.8,728700.0
3,6.998,2.94,18.7,701400.0
4,7.147,5.33,18.7,760200.0


In [8]:
## Splitting the data into features (x) and target (y)
# x = independent variables (RM, LSTAT, PTRATIO)
# y = dependent variable (MEDV)

x=df.drop(columns="MEDV")
y=df["MEDV"]

In [11]:
## Data splitting into train and test
from sklearn.model_selection import train_test_split 
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=0)

In [12]:
## Checking shape of training and testing data
print(xtrain.shape)
print(xtest.shape)
print(ytrain.shape)
print(ytest.shape)

(391, 3)
(98, 3)
(391,)
(98,)


In [26]:
## Importing LinearRegression,Ridge,Lasso
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso

## Linear regression
lr=LinearRegression()
lr.fit(xtrain,ytrain)
print("LR Coefficient:",lr.coef_)

LR Coefficient: [ 82131.07271594 -11531.40087939 -19425.18723615]


In [27]:
## Ridge is an L2 regularization technique
## Ridge model training
## Ridge adds a penalty of alpha * (sum of squared coefficients) to the loss to reduce overfitting
ridge=Ridge(alpha=0.1)  ## alpha must be a non-negative number; larger values shrink the coefficients more
ridge.fit(xtrain,ytrain)
print("Ridge Coefficient:",ridge.coef_)

Ridge Coefficient: [ 82052.7056329  -11535.52526012 -19426.02193535]


In [28]:
## Lasso is an L1 regularization technique
## Lasso adds a penalty of alpha * (sum of absolute coefficients) to the loss to reduce overfitting
## alpha must be a non-negative number; larger values push more coefficients to exactly zero
lasso=Lasso(alpha=0.1)
lasso.fit(xtrain,ytrain)
print("Lasso Coefficient:",lasso.coef_)

Lasso Coefficient: [ 82130.851335   -11531.41526053 -19425.16627829]
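
With `alpha=0.1` the ridge and lasso coefficients above are almost identical to the plain linear regression ones, which suggests the penalty is tiny relative to the squared-error term at this target scale. The sketch below sweeps `alpha` over several orders of magnitude to show when shrinkage becomes visible. It uses synthetic data with a large-valued target (an assumption, not the housing file), and the alpha grid is likewise hypothetical.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): three features and a target in the hundreds of thousands.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
true_coef = np.array([80_000.0, -12_000.0, -19_000.0])
y = 400_000 + X @ true_coef + rng.normal(scale=50_000, size=300)

# Standardize so the penalty treats every feature on the same scale.
X_scaled = StandardScaler().fit_transform(X)

alphas = [0.1, 10, 1000, 100_000]
norms = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X_scaled, y)
    norms.append(np.linalg.norm(model.coef_))
    print(f"alpha={alpha:>8}: coef={np.round(model.coef_, 1)}")
```

The coefficient norm falls as `alpha` grows; small values such as 0.1 leave the fit close to ordinary least squares.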


### Results on Training and Testing Data

In [29]:
## Checking the training and testing R^2 scores of all 3 models on the original (linear) features

print("Linear Regression on training data:",lr.score(xtrain,ytrain))
print("Linear Regression on testing data:",lr.score(xtest,ytest))
print("Ridge Regression on training data:",ridge.score(xtrain,ytrain))
print("Ridge Regression on testing data:",ridge.score(xtest,ytest))
print("Lasso Regression on training data:",lasso.score(xtrain,ytrain))
print("Lasso Regression on testing data:",lasso.score(xtest,ytest))

Linear Regression on training data: 0.7326740414596575
Linear Regression on testing data: 0.6574622113312862
Ridge Regression on training data: 0.7326739811258268
Ridge Regression on testing data: 0.6574315578258965
Lasso Regression on training data: 0.7326740414590787
Lasso Regression on testing data: 0.6574621167499852
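
For regressors, `score` returns the coefficient of determination, R² = 1 − SS_res / SS_tot. As a quick check, the sketch below computes it by hand and compares it with `model.score`; the data is synthetic and only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption), used only to compare the two computations.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=100)

model = LinearRegression().fit(X, y)
pred = model.predict(X)

ss_res = np.sum((y - pred) ** 2)       # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares around the mean
r2_manual = 1.0 - ss_res / ss_tot

print("manual R^2 :", r2_manual)
print("model.score:", model.score(X, y))
```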


## Polynomial Regression

In [40]:
from sklearn.preprocessing import PolynomialFeatures
pf=PolynomialFeatures(degree=4)
poly_xtrain=pf.fit_transform(xtrain)
poly_xtest=pf.transform(xtest)  ## reuse the transformer fitted on the training data
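
Training and test features should pass through the same fitted transformer. One way to guarantee that is to wrap the preprocessing and the model in a pipeline, sketched below on synthetic data. The added `StandardScaler` step and the data-generating formula are assumptions, not part of the cells above.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data (assumption): a smooth nonlinear function of three features plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(300, 3))
y = 1 + 2 * X[:, 0] - X[:, 1] ** 2 + 0.5 * X[:, 0] * X[:, 2] + rng.normal(scale=0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# fit() learns the feature expansion and scaling from the training split only;
# score() and predict() then apply those fitted steps to new data.
model = make_pipeline(
    PolynomialFeatures(degree=4, include_bias=False),
    StandardScaler(),
    Ridge(alpha=0.1),
)
model.fit(X_train, y_train)
test_r2 = model.score(X_test, y_test)
print("test R^2:", test_r2)
```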

In [41]:
lr=LinearRegression()
lr.fit(poly_xtrain,ytrain)
print("LR Coefficient:",lr.coef_)

LR Coefficient: [-5.54749548e-02 -1.05142940e+07 -6.68710790e+06 -2.48484128e+07
 -2.61248143e+06  6.14259566e+05  3.07080570e+06  4.36684296e+04
  8.53249343e+05  1.31779576e+06  3.17964653e+05  1.86716324e+03
 -1.55887847e+04 -3.03519235e+03 -6.58273684e+04 -1.45462313e+05
 -1.70710773e+02 -3.32912827e+03 -3.38339171e+04 -2.61727159e+04
 -9.69598374e+03  4.31667364e+02 -5.18521330e+03  5.01768100e+01
 -7.13700508e+02  3.13181036e+03 -1.88129089e+01  1.77130331e+02
  1.94722311e+03  1.59722980e+03 -3.12691643e-01  1.66936730e+01
  3.96746874e+01  3.83831730e+02  1.77120209e+02]


In [42]:
##Ridge model training on polynomial data
ridge=Ridge(alpha=0.1)  ## alpha must be a non-negative number; larger values shrink the coefficients more
ridge.fit(poly_xtrain,ytrain)
print("Ridge Coefficient:",ridge.coef_)

Ridge Coefficient: [ 0.00000000e+00  6.61203922e+03 -7.24354747e+04 -2.56156116e+04
  1.45603017e+04 -1.39901817e+05 -4.14272426e+04 -2.52939514e+04
  1.13420501e+05 -1.71613931e+04  3.97192667e+03  1.10106924e+04
  7.35694555e+03  2.61337974e+03  9.41425374e+02  1.67895390e+03
  1.86350502e+02  1.25113727e+03 -7.31360038e+03  1.54044354e+03
 -2.51210683e+02  8.83591087e+02 -9.63219497e+02  9.64563690e+00
 -1.47144537e+03  5.26847683e+02 -2.82488429e+01 -4.96859445e+01
  5.19495767e+02 -2.68743599e+02 -8.20660833e-01  3.69454487e+00
 -3.27554703e+01  8.95523089e+01 -6.39696741e+00]


In [43]:
##Lasso model training on polynomial data
lasso=Lasso(alpha=0.1)
lasso.fit(poly_xtrain,ytrain)
print("Lasso Coefficient:",lasso.coef_)

Lasso Coefficient: [ 0.00000000e+00  1.13701710e+04  1.21849985e+04  9.85996830e+04
  2.56537571e+04  6.84852300e+02 -5.88599927e+03 -3.59490251e+02
  2.48698575e+02  1.23697463e+02  7.37165256e+02 -1.46135292e+02
 -1.97735691e+02  9.40896121e+00  2.96396711e+01 -2.03381748e+02
 -3.75217923e+00 -5.58097168e+00 -2.61661037e+00  6.70209028e+00
 -3.56451812e+01 -1.04760176e+02  4.71165074e+01  1.32539911e+01
  7.38657352e+00 -8.48848103e+00 -1.26546091e-01  8.51276360e-01
 -1.20871947e+00 -7.10703144e+00  6.50202744e-01 -9.29922058e-01
 -7.61072068e-01 -1.22283848e-01  6.02619157e-01]
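
Coordinate descent, the solver behind sklearn's `Lasso`, can need many iterations when features span very different scales, as degree-4 polynomial terms typically do. A possible remedy, sketched here on synthetic data, is to standardize the expanded features and raise `max_iter`. The data and the value `50_000` are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data (assumption): positive features, so powers up to 4 differ hugely in scale.
rng = np.random.default_rng(0)
X = rng.uniform(1, 10, size=(300, 3))
y = 3 * X[:, 0] - 0.2 * X[:, 1] ** 2 + rng.normal(scale=1.0, size=300)

max_iter = 50_000  # hypothetical budget, well above the default of 1000
model = make_pipeline(
    PolynomialFeatures(degree=4, include_bias=False),
    StandardScaler(),  # puts every polynomial term on a comparable scale
    Lasso(alpha=0.1, max_iter=max_iter),
)
model.fit(X, y)

lasso_step = model.named_steps["lasso"]
print("iterations used:", lasso_step.n_iter_)
print("non-zero coefficients:", int(np.sum(lasso_step.coef_ != 0)))
```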


In [44]:
## checking training and testing score on polynomial data

print("Linear Regression on training data:",lr.score(poly_xtrain,ytrain))
print("Linear Regression on testing data:",lr.score(poly_xtest,ytest))
print("Ridge Regression on training data:",ridge.score(poly_xtrain,ytrain))
print("Ridge Regression on testing data:",ridge.score(poly_xtest,ytest))
print("Lasso Regression on training data:",lasso.score(poly_xtrain,ytrain))
print("Lasso Regression on testing data:",lasso.score(poly_xtest,ytest))

Linear Regression on training data: 0.8782597300464372
Linear Regression on testing data: 0.7752997053780455
Ridge Regression on training data: 0.8703674435798919
Ridge Regression on testing data: 0.7683747394988618
Lasso Regression on training data: 0.8584009532038034
Lasso Regression on testing data: 0.7753414036953719
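
All models above use a fixed `alpha=0.1`. A common alternative is to pick `alpha` by cross-validation; the sketch below does this with `RidgeCV` inside a pipeline. The synthetic data and the alpha grid are assumptions, so the selected value says nothing about the housing data.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data (assumption), standing in for a real training set.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(300, 3))
y = 1 + 2 * X[:, 0] - X[:, 1] ** 2 + rng.normal(scale=0.5, size=300)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

alphas = np.logspace(-3, 3, 13)  # hypothetical grid from 0.001 to 1000
model = make_pipeline(
    PolynomialFeatures(degree=4, include_bias=False),
    StandardScaler(),
    RidgeCV(alphas=alphas),  # chooses alpha by cross-validation on the training data
)
model.fit(X_train, y_train)

best_alpha = model.named_steps["ridgecv"].alpha_
test_r2 = model.score(X_test, y_test)
print("selected alpha:", best_alpha)
print("test R^2:", test_r2)
```

`LassoCV` follows the same pattern for the L1 penalty.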
