# CHAPTER - 13: Linear Regression

Linear regression is used when the target vector is a quantitative value, such as a house price.

## 13.1 Fitting a line

The goal is to train a model that represents a linear relationship between the features and the target vector.

In [9]:
# loading libraries

from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression

In [10]:
# loading data with only two features

california = fetch_california_housing()
features = california.data[:,0:2]
target = california.target

In [11]:
# creating linear regression

regression = LinearRegression()

In [14]:
# fit the linear regression

model = regression.fit(features, target)

# print("Feat", features.shape)
# print("target", target.shape)

Linear regression assumes that the relationship between the features and the target is approximately linear, i.e., that the effect of each feature on the target is constant.

Here we use only 2 features, so the linear model looks like:

y_hat = b0 + b1 x1 + b2 x2 + e

y_hat: target

xi: data for a single feature

b1, b2: coefficients identified by fitting the model

b0: bias/intercept

e: error
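To make the equation concrete, here is a minimal NumPy sketch that estimates b0, b1 and b2 by ordinary least squares. The data is synthetic and the "true" coefficients (1.5, 2.0, -0.5) are made up for illustration; this is not the California housing data and not necessarily how scikit-learn implements the fit internally.

```python
import numpy as np

# Synthetic data (an assumption for illustration):
# y = 1.5 + 2.0 * x1 - 0.5 * x2 + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
noise = rng.normal(scale=0.1, size=200)
y = 1.5 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + noise

# Prepend a column of ones so the intercept b0 is estimated together with b1 and b2
X_design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: choose coefficients that minimize the sum of squared errors
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
b0, b1, b2 = coef

print("b0:", b0)
print("b1:", b1)
print("b2:", b2)
```

The estimates should land close to the values used to generate the data, because the data really does follow a linear relationship.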

In [6]:
# to view the intercept

model.intercept_

-0.10189032759082695

In [7]:
# to view the coefficients

model.coef_

array([0.43169191, 0.01744134])

In [12]:
# the target is the median house value in units of $100,000, so multiplying by 1000 gives the first value in hundreds of dollars:

target[0] * 1000

4526.0

In [13]:
# the model's prediction for the first observation, scaled the same way

model.predict(features)[0]*1000

4207.126263821179

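The prediction is off by about 319 on this scale. Because the target is expressed in units of $100,000, that is an error of roughly $31,900 on a value of $452,600.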
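The prediction can also be reproduced by hand from the intercept and coefficients printed above. The sketch below copies those values from the outputs. The feature values 8.3252 and 41.0 are the first two columns of the first row of the dataset, and 4.526 is the first target value in its original units of $100,000.

```python
# Values copied from the model outputs shown above
intercept = -0.10189032759082695
coefficients = [0.43169191, 0.01744134]

# First observation: the two feature values used to fit the model
first_observation = [8.3252, 41.0]

# y_hat = b0 + b1 * x1 + b2 * x2
prediction = intercept + sum(b * x for b, x in zip(coefficients, first_observation))

# First target value, in units of $100,000
actual = 4.526
error = actual - prediction

print("prediction:", prediction)
print("error:", error)
```

Multiplying the error by 100,000 converts it to dollars.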

## 13.2 Handling Interactive Effects
Sometimes the effect of a feature on the target variable depends on another feature.

We can create an interaction term to capture that dependence using scikit-learn's PolynomialFeatures:

In [59]:
# loading libraries

from sklearn.linear_model import LinearRegression
from sklearn.datasets import fetch_california_housing
from sklearn.preprocessing import PolynomialFeatures

In [60]:
# loading data with only 2 features

california = fetch_california_housing()
features = california.data[:,0:2]
target = california.target

In [61]:
# creating interaction term
interaction = PolynomialFeatures(
    degree = 3, include_bias = False, interaction_only = True)
features_interaction = interaction.fit_transform(features)

In [62]:
# linear regression

regression = LinearRegression()

In [63]:
# fitting the linear regression

model = regression.fit(features_interaction, target)
# print("Feat", features_interaction.shape)
# print("target", target.shape)

Plain linear regression assumes that the effect of each feature on the target vector is constant. When the effect of one feature depends on the value of another, we can add an interaction term, the product of the two features, as an extra feature. With two features the equation becomes:

y^ = b0^ + b1^ x1 + b2^ x2 + b3^ x1 x2 + e

y^: target vector

xi: data of single feature

x1 x2: interaction term, the product of the two features

b1^, b2^, b3^: coefficients

e: error

b0^: bias/intercept
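A small sketch of what an interaction term means in practice. Once the model has a coefficient on the product x1 * x2 (called b3 here), the effect of raising x1 by one unit is b1 + b3 * x2, so it changes with x2. The function name and all coefficient values below are made up for illustration.

```python
def predict(x1, x2, b0=1.0, b1=2.0, b2=0.5, b3=0.3):
    """Linear model with an interaction term (hypothetical coefficient values)."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

# Effect of raising x1 by one unit, measured at two different values of x2
effect_at_x2_0 = predict(1.0, 0.0) - predict(0.0, 0.0)     # b1 + b3 * 0
effect_at_x2_10 = predict(1.0, 10.0) - predict(0.0, 10.0)  # b1 + b3 * 10

print("effect of x1 when x2 = 0:", effect_at_x2_0)
print("effect of x1 when x2 = 10:", effect_at_x2_10)
```

Without the b3 term both effects would be equal to b1.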

In [68]:
# to view the feature values for first observation

features[0]

array([ 8.3252, 41.    ])

In [69]:
# to create an interaction term ourselves, we multiply those 2 values together for every observation

import numpy as np

In [70]:
interaction_term = np.multiply(features[:,0], features[:,1])

In [72]:
# interaction term for the first observation

interaction_term[0]

341.33320000000003

scikit-learn's PolynomialFeatures creates the interaction terms for us. We can then use model selection strategies to identify the combination of features and interaction terms that produces the best model.

There are 3 important parameters we must set for PolynomialFeatures:
1) interaction_only = True: tells PolynomialFeatures to return the original features and their interaction terms only, without polynomial terms such as x1^2.
2) include_bias = False: prevents PolynomialFeatures from adding a feature containing only ones, called a bias, which it does by default.
3) degree: determines the maximum number of features that are multiplied together to form an interaction term.
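The effect of each parameter is easier to see on a tiny input. The sketch below uses one made-up observation with three features (not the housing data) and compares the shapes and values PolynomialFeatures returns under different settings.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One toy observation with three features (made-up values)
X = np.array([[2.0, 3.0, 5.0]])

# degree=2: original features plus pairwise products x0*x1, x0*x2, x1*x2
pairs = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_pairs = pairs.fit_transform(X)

# degree=3: additionally allows the three-way product x0*x1*x2
triples = PolynomialFeatures(degree=3, interaction_only=True, include_bias=False)
X_triples = triples.fit_transform(X)

# include_bias=True: prepends a column of ones
with_bias = PolynomialFeatures(degree=2, interaction_only=True, include_bias=True)
X_bias = with_bias.fit_transform(X)

# interaction_only=False: also adds the squared terms x0^2, x1^2, x2^2
full = PolynomialFeatures(degree=2, interaction_only=False, include_bias=False)
X_full = full.fit_transform(X)

print("pairs:  ", X_pairs.shape, X_pairs[0])
print("triples:", X_triples.shape, X_triples[0])
print("bias:   ", X_bias.shape, X_bias[0])
print("full:   ", X_full.shape, X_full[0])
```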

In [73]:
# values of first observation

features_interaction[0]

array([  8.3252,  41.    , 341.3332])

## 13.3 Fitting a Nonlinear Relationship

We can create a polynomial regression by including polynomial features in a linear regression model.

In [74]:
from sklearn.linear_model import LinearRegression
from sklearn.datasets import fetch_california_housing
from sklearn.preprocessing import PolynomialFeatures

In [75]:
# loading data with one feature

california = fetch_california_housing()
features = california.data[:,0:1]
target = california.target

In [76]:
# creating polynomial features x^2 and x^3

polynomial = PolynomialFeatures(degree = 3, include_bias = False)
features_polynomial = polynomial.fit_transform(features)

In [77]:
# create linear regression

regression = LinearRegression()

In [78]:
# fitting the linear regression

model = regression.fit(features_polynomial, target)

In [80]:
# looking at a single observation, the first one

print("first degree term:",features[0])

# to create a polynomial feature, we raise the value to the second power, which forms a new feature

print("second degree term:",features[0]**2)

# we can raise it to the third power to add one more feature

print("third degree term:",features[0]**3)


first degree term: [8.3252]
second degree term: [69.30895504]
third degree term: [577.0109125]


In [81]:
# all three terms are combined into a single feature matrix, on which the linear regression is then run

features_polynomial[0]

array([  8.3252    ,  69.30895504, 577.0109125 ])
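To see why adding x^2 and x^3 lets a linear model fit a curve, here is a minimal NumPy sketch on synthetic data. The cubic relationship and its coefficients are assumptions made for illustration. The polynomial columns are built by hand instead of with PolynomialFeatures.

```python
import numpy as np

# Synthetic nonlinear data (an assumption for illustration):
# y = 1 + 2x - 0.5x^3 + noise
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=(300, 1))
y = 1.0 + 2.0 * x[:, 0] - 0.5 * x[:, 0] ** 3 + rng.normal(scale=0.1, size=300)

# Build the columns x, x^2, x^3 by hand
features_polynomial = np.hstack([x, x ** 2, x ** 3])

# The model is still linear in its coefficients, so ordinary least squares applies
X_design = np.column_stack([np.ones(len(x)), features_polynomial])
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)

print("intercept, x, x^2, x^3 coefficients:", np.round(coef, 2))
```

The fitted coefficient on x^2 should come out near zero, since the synthetic data has no quadratic component.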

## 13.4 Reducing Variance with Regularization

To reduce the variance of a linear regression model, we can use a learning algorithm that includes a shrinkage penalty (also called regularization), such as ridge regression or lasso regression:



In [83]:
# loading libraries

from sklearn.linear_model import Ridge
from sklearn.datasets import fetch_california_housing
from sklearn.preprocessing import StandardScaler

In [84]:
# loading california dataset

california = fetch_california_housing()
features = california.data
target = california.target

In [85]:
# standardizing the features

scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

In [86]:
# creating ridge regression with an alpha value

regression = Ridge(alpha = 0.5)

In [87]:
# fitting the linear regression

model = regression.fit(features_standardized, target)

Standard linear regression fits a model by minimizing the residual sum of squares (RSS), the sum of squared errors between the true and predicted values.

Regularized regression learners are similar, except that they minimize the RSS plus a penalty on the total size of the coefficient values.

There are 2 types of regularized regression learners:
1) Ridge: the shrinkage penalty is a tuning hyperparameter multiplied by the **sum of the squares** of all coefficients.
2) Lasso: the shrinkage penalty is a tuning hyperparameter multiplied by the **sum of the absolute values** of all coefficients.
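The two penalties can be written out directly. The sketch below computes both for a made-up coefficient vector, then uses the textbook closed-form ridge solution on synthetic, centered data to show the coefficients shrinking as alpha grows. The helper names are hypothetical, and this is not necessarily how scikit-learn's Ridge solves the problem internally.

```python
import numpy as np

def ridge_penalty(coef, alpha):
    """alpha times the sum of squared coefficients."""
    return alpha * float(np.sum(coef ** 2))

def lasso_penalty(coef, alpha):
    """alpha times the sum of absolute coefficient values."""
    return alpha * float(np.sum(np.abs(coef)))

def ridge_fit(X, y, alpha):
    """Closed-form ridge coefficients for centered data (no intercept term)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

# Penalties for a made-up coefficient vector
coef = np.array([3.0, -4.0, 0.0])
ridge_value = ridge_penalty(coef, alpha=0.5)  # 0.5 * (9 + 16 + 0)
lasso_value = lasso_penalty(coef, alpha=0.5)  # 0.5 * (3 + 4 + 0)
print("ridge penalty:", ridge_value)
print("lasso penalty:", lasso_value)

# Synthetic regression data (an assumption for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=100)
X_centered = X - X.mean(axis=0)
y_centered = y - y.mean()

# Larger alpha means a stronger penalty and a smaller coefficient vector
norms = [
    float(np.linalg.norm(ridge_fit(X_centered, y_centered, alpha)))
    for alpha in (0.0, 10.0, 1000.0)
]
print("coefficient norms for alpha = 0, 10, 1000:", norms)
```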

scikit-learn includes a RidgeCV class that lets us select the best value of alpha from a list of candidates using cross-validation:

In [88]:
# loading the library

from sklearn.linear_model import RidgeCV

In [89]:
# creating Ridge regression with 3 alpha values

regr_cv = RidgeCV(alphas = [0.1,1.0,10.0])

In [90]:
# fitting the linear regression model

model_cv = regr_cv.fit(features_standardized, target)

In [91]:
# view the coefficients

model_cv.coef_

array([ 0.8293461 ,  0.11939823, -0.26422311,  0.30398067, -0.00427544,
       -0.03936068, -0.8937389 , -0.86433656])

In [92]:
# we can now easily view the model's alpha value:

model_cv.alpha_

10.0
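The idea behind this selection can be sketched by hand: fit a ridge model for each candidate alpha on part of the data, score it on the held-out part, and keep the alpha with the lowest average error. By default RidgeCV uses an efficient form of leave-one-out cross-validation. The sketch below uses plain 5-fold cross-validation on synthetic data instead, with hypothetical helper names, so it illustrates the principle and does not reproduce RidgeCV's exact procedure.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge fit; the intercept is handled by centering."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept

def cv_mse(X, y, alpha, n_folds=5):
    """Average held-out mean squared error over n_folds folds."""
    folds = np.array_split(np.arange(len(y)), n_folds)
    errors = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        coef, intercept = ridge_fit(X[train_mask], y[train_mask], alpha)
        predictions = X[test_idx] @ coef + intercept
        errors.append(np.mean((y[test_idx] - predictions) ** 2))
    return float(np.mean(errors))

# Synthetic data (an assumption for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))
true_coef = np.array([1.0, 0.0, -2.0, 0.5, 0.0])
y = X @ true_coef + rng.normal(scale=1.0, size=120)

# Score each candidate alpha and keep the one with the lowest error
alphas = [0.1, 1.0, 10.0]
scores = {alpha: cv_mse(X, y, alpha) for alpha in alphas}
best_alpha = min(scores, key=scores.get)

print("cross-validated MSE per alpha:", scores)
print("selected alpha:", best_alpha)
```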

## 13.5 Reducing features with Lasso Regression

We can simplify a linear regression model by reducing the number of features.

In [93]:
# loading libraries

from sklearn.linear_model import Lasso
from sklearn.datasets import fetch_california_housing
from sklearn.preprocessing import StandardScaler

In [94]:
# loading data

california = fetch_california_housing()
features = california.data
target = california.target

In [95]:
# standardizing the features

scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

In [96]:
# creating lasso regression with alpha value

regression = Lasso(alpha = 0.5)

In [97]:
# fit the linear regression model

model = regression.fit(features_standardized, target)

One characteristic of the lasso regression penalty is that it can shrink the coefficients of a model to zero, thereby reducing the number of features in the model. Above we set alpha to 0.5, and most of the coefficients are 0, meaning the corresponding features are not used in the model.

In [98]:
model.coef_

array([ 0.29398939,  0.        ,  0.        , -0.        , -0.        ,
       -0.        , -0.        , -0.        ])

If we increase alpha to a much higher value, we see that none of the features are used. Here alpha is raised to 5:

In [100]:
lasso_5 = Lasso(alpha = 5)

model5 = lasso_5.fit(features_standardized, target)
model5.coef_

array([ 0.,  0.,  0., -0., -0., -0., -0., -0.])
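There is a simple reason a large enough alpha zeroes every coefficient. scikit-learn's Lasso minimizes (1 / (2 * n_samples)) * RSS + alpha * (sum of absolute coefficients), and for that objective all coefficients are exactly zero once alpha reaches the largest absolute value of X^T (y - mean(y)) / n_samples. The sketch below checks this on synthetic, standardized data. The data and the variable name `alpha_max` are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (an assumption for illustration): only two features matter
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Standardize the features and center the target
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
y_centered = y - y.mean()

# Smallest alpha at which every lasso coefficient is zero
alpha_max = float(np.max(np.abs(X_std.T @ y_centered)) / len(y))

# Just above the threshold: no features survive
coef_above = Lasso(alpha=1.01 * alpha_max).fit(X_std, y).coef_

# Well below the threshold: at least one feature is used
coef_below = Lasso(alpha=0.5 * alpha_max).fit(X_std, y).coef_

print("alpha_max:", alpha_max)
print("nonzero coefficients above threshold:", np.count_nonzero(coef_above))
print("nonzero coefficients below threshold:", np.count_nonzero(coef_below))
```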