Question:
Which of these represent a linear regression model?
<br>
1) y = $\beta_0$ + $\beta_1$ * $x_1^2$ + $\beta_2$ * $x_2^3$
<br>
2) y = $\beta_0$ + $\beta_1$ * $x_1$ + $\beta_2$ * $x_2$
<br>
3) y = $\beta_0$ + $\beta_1^2$ * $x_1$ + $\beta_2^2$ * $x_2$
<br>
4) y = $\hat{\beta_0}$ + $\hat{\beta_1}$ * $\hat{x_1}$ + $\hat{\beta_2}$ * $\hat{x_2}$
<br>
Where $\hat{\beta_0}$, $\hat{\beta_1}$, $\hat{\beta_2}$, $\hat{x_1}$, $\hat{x_2}$ are n-dimensional vectors.
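
A useful point when reasoning about the options: a linear regression model has to be linear in its coefficients $\beta_i$, not necessarily in the inputs. The sketch below illustrates this for option 1 only. The synthetic data, the seed, and the coefficient values are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data (illustrative assumption, not one of the notebook's datasets)
rng = np.random.default_rng(0)
x1 = rng.uniform(-2, 2, size=200)
x2 = rng.uniform(-2, 2, size=200)
true_beta = np.array([1.0, 2.0, -3.0])  # hypothetical beta_0, beta_1, beta_2
y_syn = true_beta[0] + true_beta[1] * x1**2 + true_beta[2] * x2**3

# Once x1**2 and x2**3 are treated as features, the model is linear in the betas
features = np.column_stack([x1**2, x2**3])
model = LinearRegression().fit(features, y_syn)

print(model.intercept_, model.coef_)
```

Ordinary least squares recovers the chosen coefficients here because the inputs were transformed before fitting, while the coefficients themselves enter the model linearly.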

In [None]:
## Importing Libraries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge, Lasso

## Import Data Using a URL

In [None]:
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'

In [None]:
dataframe = pd.read_csv(url, header=None)

In [None]:
data = dataframe.values
X, y = data[:, :-1], data[:, -1]

## Loading Toy Datasets from scikit-learn

In [None]:
# Load the diabetes dataset
X1, y1 = datasets.load_diabetes(return_X_y=True)

## Loading Data from Local Machine

In [None]:
from google.colab import files
u = files.upload()

In [None]:
df = pd.read_csv('.csv')

Splitting the data randomly

In [None]:
np.shape(X)

(506, 13)

In [None]:
np.shape(y)

(506,)

In [None]:
# Split the data into training (80%) and test (20%) sets
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2)


In [None]:
np.set_printoptions(precision=2, suppress=True)
x_train

array([[  0.08,   0.  ,  13.92, ...,  16.  , 396.9 ,   8.58],
       [  0.19,   0.  ,   6.91, ...,  17.9 , 396.9 ,  14.15],
       [  0.54,   0.  ,  21.89, ...,  21.2 , 396.9 ,  18.46],
       ...,
       [  1.52,   0.  ,  19.58, ...,  14.7 , 388.45,   3.32],
       [  0.09,  30.  ,   4.93, ...,  16.6 , 383.78,   7.37],
       [  0.04,  80.  ,   3.64, ...,  16.4 , 395.18,   9.25]])

In [None]:
np.shape(x_train)

(404, 13)
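
Because the split is random, each run selects different rows for training and testing. A minimal sketch of a repeatable split, assuming placeholder arrays with the same shape as the housing data (the `_demo` names and the seed 42 are illustrative choices, not part of the original notebook):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays shaped like the housing data (506 rows, 13 features)
X_demo = np.arange(506 * 13, dtype=float).reshape(506, 13)
y_demo = np.arange(506, dtype=float)

# random_state fixes the shuffle, so repeated calls return the same split
split_a = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
split_b = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

x_train_demo, x_test_demo, y_train_demo, y_test_demo = split_a
print(x_train_demo.shape, x_test_demo.shape)  # (404, 13) (102, 13)
```

Fixing the seed makes it easier to compare models on exactly the same rows.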

Other methods to evaluate a model

1) Leave one out
<br>
2) K-Fold Cross Validation
<br>
3) Stratified K-Fold Cross Validation
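
The first two methods in the list above can be sketched with scikit-learn's `cross_val_score`. This example assumes the bundled diabetes dataset and a plain `LinearRegression`; the `_cv` names, the fold count, and the seed are illustrative choices. Stratified K-Fold is left out because `StratifiedKFold` expects class labels and so applies to classification targets rather than a continuous one.

```python
from sklearn import datasets
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

X_cv, y_cv = datasets.load_diabetes(return_X_y=True)
model = LinearRegression()

# K-fold: 5 shuffled folds; the scorer returns negated MSE, so flip the sign
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
kfold_mse = -cross_val_score(
    model, X_cv, y_cv, cv=kfold, scoring="neg_mean_squared_error"
)

# Leave-one-out: one fold per sample, so one fit per row of the dataset
loo_mse = -cross_val_score(
    model, X_cv, y_cv, cv=LeaveOneOut(), scoring="neg_mean_squared_error"
)

print(kfold_mse.mean(), loo_mse.mean())
```

Averaging over folds gives an error estimate that depends less on one particular random split.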

For reference: documentation for Linear Regression, Ridge, and Lasso in scikit-learn
<br>
[Linear Regression Documentation](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html)
<br>
[Ridge Documentation](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html)
<br>
[Lasso Documentation](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html)


In [None]:
## Linear Regression model creation
# `normalize` was removed in scikit-learn 1.2; scale features with a StandardScaler in a Pipeline instead
clf = LinearRegression(fit_intercept=True)

In [None]:
## Ridge Regression Model Creation
# `normalize` was removed in scikit-learn 1.2; scale features with a StandardScaler in a Pipeline instead
clf1 = Ridge(alpha=1, fit_intercept=True, solver='auto')

In [None]:
## Lasso Model Creation
# `normalize` was removed in scikit-learn 1.2; scale features with a StandardScaler in a Pipeline instead
clf2 = Lasso(alpha=1, fit_intercept=True, selection='cyclic')

In [None]:
## Fitting the model to the split data
clf.fit(x_train,y_train)

Warning recorded with a scikit-learn release (1.0 or 1.1) in which `normalize=True` was deprecated but still accepted. If you wish to scale the data, use Pipeline with a StandardScaler in a preprocessing stage. To reproduce the previous behavior:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

model = make_pipeline(StandardScaler(with_mean=False), LinearRegression())

If you wish to pass a sample_weight parameter, you need to pass it as a fit parameter to each step of the pipeline as follows:

kwargs = {s[0] + '__sample_weight': sample_weight for s in model.steps}
model.fit(X, y, **kwargs)

LinearRegression(normalize=True)
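
The pipeline approach described in the warning above can be sketched as a runnable cell. This is a minimal example under stated assumptions: it uses the bundled diabetes dataset and uniform sample weights, and the `_pipe` names are hypothetical.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_pipe, y_pipe = datasets.load_diabetes(return_X_y=True)

# The scaler is fitted together with the regressor whenever model.fit is called
model = make_pipeline(StandardScaler(with_mean=False), LinearRegression())

# Uniform weights, used here only to show the calling convention
sample_weight = np.ones(len(y_pipe))
kwargs = {name + "__sample_weight": sample_weight for name, _ in model.steps}
model.fit(X_pipe, y_pipe, **kwargs)

print(model.predict(X_pipe[:3]))
```

`make_pipeline` names each step after its lowercased class name, which is why the keys end up as `standardscaler__sample_weight` and `linearregression__sample_weight`.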

In [None]:
## Fitting the model to the split data
clf1.fit(x_train,y_train)

The same deprecation warning is raised for Ridge. The pipeline equivalent is make_pipeline(StandardScaler(with_mean=False), Ridge()), with sample_weight passed to each step as shown for LinearRegression.

Set parameter alpha to: original_alpha * n_samples.


Ridge(alpha=1, normalize=True, tol=3)

In [None]:
## Fitting the model to the split data
clf2.fit(x_train,y_train)

The same deprecation warning is raised for Lasso. The pipeline equivalent is make_pipeline(StandardScaler(with_mean=False), Lasso()), with sample_weight passed to each step as shown for LinearRegression.

Set parameter alpha to: original_alpha * np.sqrt(n_samples).


Lasso(alpha=1, normalize=True, tol=3)

In [None]:
clf.coef_

array([-1.22220590e-01,  5.56317023e-02,  4.37572729e-02,  3.14997011e+00,
       -1.79248080e+01,  3.91535262e+00,  6.86068512e-04, -1.41899691e+00,
        3.24079396e-01, -1.18204608e-02, -9.39898550e-01,  7.60867110e-03,
       -5.25906066e-01])

In [None]:
clf1.coef_

array([-6.03064636e-02,  2.36052193e-02, -6.65816756e-02,  2.50360115e+00,
       -3.90737327e+00,  2.89196795e+00, -9.23459082e-03, -2.35098108e-01,
        1.89473135e-03, -2.53160213e-03, -5.31219347e-01,  5.13734039e-03,
       -2.61312311e-01])

In [None]:
clf2.coef_

array([-0.,  0., -0.,  0., -0.,  0., -0.,  0., -0., -0., -0.,  0., -0.])
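
The all-zero Lasso coefficients above mean the L1 penalty was strong enough in that run to shrink every coefficient to zero. A small sketch of how the penalty strength affects sparsity, assuming the bundled diabetes dataset rather than the housing data (the alpha grid and the `_l` names are illustrative choices):

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import Lasso

X_l, y_l = datasets.load_diabetes(return_X_y=True)

# Count the non-zero coefficients for increasingly strong L1 penalties
nonzero_counts = {}
for alpha in [0.01, 1.0, 10.0]:
    lasso = Lasso(alpha=alpha, max_iter=10_000).fit(X_l, y_l)
    nonzero_counts[alpha] = int(np.count_nonzero(lasso.coef_))

print(nonzero_counts)
```

Lowering `alpha` until some coefficients become non-zero is one way to check whether a model has been over-regularised.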

In [None]:
clf1.intercept_

28.877868339105163

In [None]:
clf.n_features_in_

13

In [None]:
clf2.n_features_in_

13

In [None]:
## Predicting Y values for test case(x_test)
y_test_pred = clf.predict(x_test)

In [None]:
## Predicting Y values for test case(x_test)
y_test_pred_Ridge = clf1.predict(x_test)

In [None]:
## Predicting Y values for test case(x_test)
y_test_pred_Lasso = clf2.predict(x_test)

In [None]:
## Checking mean squared error loss Linear Reg
mean_squared_error(y_test, y_test_pred)

19.29042539183503

In [None]:
## Checking mean squared error loss Ridge
mean_squared_error(y_test, y_test_pred_Ridge)

27.45623391044386

In [None]:
## Checking mean squared error loss Lasso
mean_squared_error(y_test, y_test_pred_Lasso)

83.97931658829103

In [None]:
# Load the diabetes dataset
X, Y = datasets.load_diabetes(return_X_y=True)

In [None]:
# X1, Y1 = datasets.load_iris(return_X_y=True)

In [None]:
# Split the data into training and test sets
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)
# Display the training features
x_train

array([[-0.01,  0.05, -0.01, ..., -0.  ,  0.01,  0.07],
       [ 0.04,  0.05, -0.02, ..., -0.  ,  0.  , -0.05],
       [ 0.04,  0.05, -0.01, ..., -0.  , -0.01,  0.  ],
       ...,
       [-0.06,  0.05, -0.08, ..., -0.04, -0.02, -0.05],
       [-0.07, -0.04, -0.01, ..., -0.04, -0.04, -0.  ],
       [-0.02, -0.04, -0.04, ..., -0.04, -0.07, -0.08]])

In [None]:
## Creating a linear regression model
clf = LinearRegression()

In [None]:
clf1 = Ridge()

In [None]:
clf2 = Lasso()

In [None]:
## Fitting the model to the split data
clf.fit(x_train,y_train)

LinearRegression()

In [None]:
## Fitting the model to the split data
clf1.fit(x_train,y_train)

Ridge()

In [None]:
## Fitting the model to the split data
clf2.fit(x_train,y_train)

Lasso()

In [None]:
## Predicting Y values for test case(x_test)
y_test_pred = clf.predict(x_test)

In [None]:
## Predicting Y values for test case(x_test)
y_test_pred_Ridge = clf1.predict(x_test)

In [None]:
## Predicting Y values for test case(x_test)
y_test_pred_Lasso = clf2.predict(x_test)

In [None]:
## Checking mean squared error loss (Linear Regression)
mean_squared_error(y_test, y_test_pred)

2938.4744207530184

In [None]:
## Checking mean squared error loss (Ridge)
mean_squared_error(y_test, y_test_pred_Ridge)

3528.2117263852865

In [None]:
## Checking mean squared error loss (Lasso)
mean_squared_error(y_test, y_test_pred_Lasso)

3973.5419521346444
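
The three errors above come from a single random split, so they change from run to run. The sketch below repeats the comparison in one loop with a fixed seed and also reports R², using the `r2_score` function the notebook imports. The seed value and the variable names are arbitrary, illustrative choices.

```python
from sklearn import datasets
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

X_d, y_d = datasets.load_diabetes(return_X_y=True)
# random_state=0 is an arbitrary choice that makes the comparison repeatable
x_tr, x_te, y_tr, y_te = train_test_split(X_d, y_d, test_size=0.2, random_state=0)

# Fit each model on the same training rows and score it on the same test rows
results = {}
for name, est in [("linear", LinearRegression()), ("ridge", Ridge()), ("lasso", Lasso())]:
    pred = est.fit(x_tr, y_tr).predict(x_te)
    results[name] = (mean_squared_error(y_te, pred), r2_score(y_te, pred))

for name, (mse, r2) in results.items():
    print(f"{name}: MSE={mse:.1f}, R2={r2:.3f}")
```

Scoring every model on the same held-out rows keeps the comparison fair, although one split still gives only a rough ranking.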