# Linear Regression in Python 

### Fitting a line for linear regression

In [2]:
from sklearn.linear_model import LinearRegression
# Note: load_boston was removed in scikit-learn 1.2, so this import needs an older release
from sklearn.datasets import load_boston

### Loading the Boston housing dataset

In [3]:
boston=load_boston()
features=boston.data[:,0:2]
target=boston.target

In [6]:
boston

{'data': array([[6.3200e-03, 1.8000e+01, 2.3100e+00, ..., 1.5300e+01, 3.9690e+02,
         4.9800e+00],
        [2.7310e-02, 0.0000e+00, 7.0700e+00, ..., 1.7800e+01, 3.9690e+02,
         9.1400e+00],
        [2.7290e-02, 0.0000e+00, 7.0700e+00, ..., 1.7800e+01, 3.9283e+02,
         4.0300e+00],
        ...,
        [6.0760e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02,
         5.6400e+00],
        [1.0959e-01, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9345e+02,
         6.4800e+00],
        [4.7410e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02,
         7.8800e+00]]),
 'target': array([24. , 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15. ,
        18.9, 21.7, 20.4, 18.2, 19.9, 23.1, 17.5, 20.2, 18.2, 13.6, 19.6,
        15.2, 14.5, 15.6, 13.9, 16.6, 14.8, 18.4, 21. , 12.7, 14.5, 13.2,
        13.1, 13.5, 18.9, 20. , 21. , 24.7, 30.8, 34.9, 26.6, 25.3, 24.7,
        21.2, 19.3, 20. , 16.6, 14.4, 19.4, 19.7, 20.5, 25. , 23.4, 18.9,
        35.4, 24.7, 3

In [7]:
#Create Linear Regression
lin_reg=LinearRegression()

In [8]:
#Fit the linear regression
model=lin_reg.fit(features, target)

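In the Boston dataset, only two predictors are used here (the first two columns). The model is therefore $y = B_0 + B_1 X_1 + B_2 X_2 + e$, where $B_0$ is the intercept, $B_1$ and $B_2$ are the feature coefficients, and $e$ is the error term.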

In [9]:
#View the feature coefficients
model.coef_

array([-0.35207832,  0.11610909])

In [10]:
#View the intercept
model.intercept_

22.485628113468223

In [11]:
#The target is the median home value in thousands of dollars, so the first observation's target is multiplied by 1000 to get dollars
target[0]*1000

24000.0

In [12]:
#Predict the target value of the first observation, multiplied by 1000
model.predict(features)[0]*1000

24573.366631705547
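As a sanity check, the prediction above can be reproduced by hand from the fitted equation. The sketch below only plugs in the intercept, the rounded coefficients, and the first-row feature values printed earlier in this notebook, so it should agree with `model.predict` up to rounding of the printed coefficients.

```python
# Values copied from the outputs above (coefficients are rounded as printed)
intercept = 22.485628113468223
coef = [-0.35207832, 0.11610909]
first_row = [6.3200e-03, 1.8000e+01]  # first two features of the first observation

# y_hat = B0 + B1*X1 + B2*X2
y_hat = intercept + sum(b * x for b, x in zip(coef, first_row))

# Convert from thousands of dollars to dollars
print(y_hat * 1000)
```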

In [13]:
#First coefficient multiplied by 1000
model.coef_[0]*1000

-352.0783156402677

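The first feature is the per-capita crime rate. The coefficient of about -352.08 (after multiplying by 1000) indicates that, with the second feature held fixed, each one-unit increase in the per-capita crime rate is associated with a decrease of approximately $350 in the median home value.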

### Handling Interaction Effects

In practice, the effect of one feature on the target often depends on the value of another feature. To capture this interaction effect in the model, we can use scikit-learn's PolynomialFeatures transformer.

Example: Suppose we want to predict whether a coffee tastes sweet, and we have two binary features: sugar (whether sugar was added) and stirred (whether the coffee was stirred). The coffee tastes sweet only when both sugar=1 and stirred=1. Just adding sugar (sugar=1, stirred=0) or just stirring (sugar=0, stirred=1) won't make the coffee taste sweet. The effects of sugar and stirring on sweetness therefore depend on each other, and we say there is an interaction effect between the features sugar and stirred.

This interaction effect can be included in a linear regression model by adding a new feature that is the product of the interacting features: $y = B_0 + B_1 X_1 + B_2 X_2 + B_3 X_1 X_2 + e$. For a practical illustration, let's apply this to the Boston dataset.
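The coffee example can be checked numerically. The following is a minimal sketch that uses plain NumPy least squares rather than scikit-learn, on the four possible sugar/stirred combinations; the variable names are made up for illustration. Without the product column the best linear fit cannot reproduce the target, while adding the product column fits it exactly.

```python
import numpy as np

# All four combinations of the two binary features
sugar = np.array([0.0, 0.0, 1.0, 1.0])
stirred = np.array([0.0, 1.0, 0.0, 1.0])
sweet = np.array([0.0, 0.0, 0.0, 1.0])  # sweet only when both are 1

ones = np.ones_like(sugar)

# Design matrices without and with the interaction column sugar * stirred
X_main = np.column_stack([ones, sugar, stirred])
X_inter = np.column_stack([ones, sugar, stirred, sugar * stirred])

# Ordinary least squares fits
coef_main, *_ = np.linalg.lstsq(X_main, sweet, rcond=None)
coef_inter, *_ = np.linalg.lstsq(X_inter, sweet, rcond=None)

pred_main = X_main @ coef_main
pred_inter = X_inter @ coef_inter

print("without interaction:", np.round(pred_main, 2))
print("with interaction:   ", np.round(pred_inter, 2))
```

Without the interaction column the fitted values are -0.25, 0.25, 0.25 and 0.75, so the model is wrong on every row. With it, the fitted values match the target.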

In [16]:
from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_boston
from sklearn.preprocessing import PolynomialFeatures

In [17]:
boston=load_boston()
features=boston.data[:,0:2]
target=boston.target

In [35]:
#Create interaction term
interaction=PolynomialFeatures(degree=3,include_bias=False,interaction_only=True)
features_interaction=interaction.fit_transform(features)

By default, PolynomialFeatures adds a bias column of ones, which can be dropped with include_bias=False. Setting interaction_only=True tells PolynomialFeatures to return only the original features and their interaction terms ($X_1 X_2$), not the pure power terms ($X_1^2$, $X_2^2$).

In [37]:
features_interaction[0]

array([6.3200e-03, 1.8000e+01, 1.1376e-01])
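The third value in the output above is simply the product of the first two. The sketch below checks this on a one-row array built from the printed values. It also illustrates that, with only two input features and interaction_only=True, degree=2 and degree=3 should give the same columns, since no product of three distinct features exists.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# First two feature values of the first observation, as printed above
row = np.array([[6.3200e-03, 1.8000e+01]])

deg2 = PolynomialFeatures(degree=2, include_bias=False, interaction_only=True)
deg3 = PolynomialFeatures(degree=3, include_bias=False, interaction_only=True)
expanded2 = deg2.fit_transform(row)
expanded3 = deg3.fit_transform(row)

# Columns are x1, x2, and the product x1 * x2
manual = np.array([[row[0, 0], row[0, 1], row[0, 0] * row[0, 1]]])

print(expanded2)
print(np.allclose(expanded2, manual), np.allclose(expanded3, manual))
```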

In [38]:
#Fit the regression on the features with the interaction term
regression=LinearRegression()

In [39]:
model=regression.fit(features_interaction, target)

### Fitting a Non-Linear Relationship

In practice, the relationship between the features and the target is often non-linear. This non-linearity can also be captured using PolynomialFeatures.

Example of a linear relationship: the number of stories a building has and the building's height. In linear regression, we assume that the effect of the number of stories on building height is constant. A 20-story building will be roughly twice as high as a 10-story building, which will be roughly twice as high as a 5-story building.

Example of a non-linear relationship: the number of hours a student studies and the score they get on a test. Intuitively, there is a big difference in test scores between students who studied for 1 hour and students who did not study at all. However, there is a much smaller difference between a student who studied for 99 hours and one who studied for 100 hours. The effect of one additional hour of studying on the test score decreases as the number of hours increases.
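The study-hours intuition can be illustrated on synthetic data. The sketch below assumes a made-up saturating curve for the scores (it is not real student data) and compares a straight-line fit with a linear regression on cubic polynomial features. The pipeline and variable names are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic, made-up data: scores rise quickly at first, then level off
rng = np.random.default_rng(0)
hours = np.linspace(0, 100, 200).reshape(-1, 1)
score = 100 * (1 - np.exp(-hours.ravel() / 20)) + rng.normal(0, 2, size=200)

# Straight-line fit
linear_fit = LinearRegression().fit(hours, score)

# Same linear model, but on the expanded features hours, hours^2, hours^3
cubic_fit = make_pipeline(
    PolynomialFeatures(degree=3, include_bias=False),
    LinearRegression(),
).fit(hours, score)

r2_linear = linear_fit.score(hours, score)
r2_cubic = cubic_fit.score(hours, score)
print("R^2, straight line:", r2_linear)
print("R^2, cubic terms:  ", r2_cubic)
```

The model is still linear in its coefficients. Only the inputs have been expanded, which is why LinearRegression can fit the curve.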

In [41]:
non_linearity=PolynomialFeatures(degree=3,include_bias=False)
non_linear_features=non_linearity.fit_transform(features)

Here interaction_only is left at its default of False, so all polynomial terms up to degree 3 are included: the original features, their powers, and their products.

In [44]:
non_linear_features[0]

array([6.32000000e-03, 1.80000000e+01, 3.99424000e-05, 1.13760000e-01,
       3.24000000e+02, 2.52435968e-07, 7.18963200e-04, 2.04768000e+00,
       5.83200000e+03])
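The nine values above follow a fixed order: the degree-1 terms, then the degree-2 terms, then the degree-3 terms. As a minimal check, the sketch below rebuilds them by hand from the first two feature values and compares them with the printed array.

```python
import math

x1, x2 = 6.32e-03, 18.0  # first two features of the first observation

# Degree-3 expansion of two features, in the order of the output above
manual = [
    x1, x2,                                    # degree 1
    x1**2, x1 * x2, x2**2,                     # degree 2
    x1**3, x1**2 * x2, x1 * x2**2, x2**3,      # degree 3
]

# Values copied from the printed array
printed = [
    6.32000000e-03, 1.80000000e+01, 3.99424000e-05, 1.13760000e-01,
    3.24000000e+02, 2.52435968e-07, 7.18963200e-04, 2.04768000e+00,
    5.83200000e+03,
]

matches = all(math.isclose(m, p, rel_tol=1e-8) for m, p in zip(manual, printed))
print(matches)
```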

In [45]:
non_linear_reg=LinearRegression()

In [47]:
#Fit on the polynomial features, not on the raw features
non_linear_model=non_linear_reg.fit(non_linear_features,target)

### Using Regularization to Reduce Overfitting (reduce variance)

Regularized regression learners minimize the sum of two terms: the residual sum of squares, $RSS = \sum_i (y_i - \hat{y}_i)^2$, and a shrinkage penalty on the total size of the coefficient values.

#### Ridge Regression

In ridge regression, the shrinkage penalty is a hyperparameter alpha multiplied by the sum of the squared coefficients: $RSS + \alpha \sum_j B_j^2$ (over all coefficients $B_j$).
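For ridge regression this objective has a closed-form minimizer. The sketch below, on synthetic data made up for illustration, centers the features and the target so that the intercept is not penalized, solves the penalized normal equations with NumPy, and compares the result with scikit-learn's Ridge. It is an illustration of the formula, not a description of the solver Ridge uses internally.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data, made up for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

alpha = 0.5

# Center X and y so the intercept drops out and is not penalized
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# Closed-form minimizer of RSS + alpha * sum(B^2)
coef_closed = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(3), Xc.T @ yc)

coef_sklearn = Ridge(alpha=alpha).fit(X, y).coef_
print(coef_closed)
print(coef_sklearn)
```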

In [53]:
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

In [102]:
# Load Data
boston=load_boston()
features=boston.data
target=boston.target

In [103]:
# standardize features
scaler=StandardScaler()
features_standardized=scaler.fit_transform(features)

In [104]:
#Create ridge regression with an alpha value
regression_r=Ridge(alpha=0.5)

In [105]:
#Fit the ridge regression
model=regression_r.fit(features_standardized,target)

When alpha is too low, there is almost no penalization, and the model can be overly complex and overfit. When alpha is too high, the model becomes too simple and underfits. So alpha needs to be tuned, which can be done using RidgeCV.
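The effect of alpha on the coefficients can be seen directly. The sketch below uses synthetic data (made up for illustration) and fits Ridge with increasing alpha values. The overall size of the coefficient vector should shrink toward zero as alpha grows.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic standardized features and a noisy linear target
rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.normal(size=(50, 5)))
y = X @ np.array([3.0, -2.0, 0.0, 1.0, 0.0]) + rng.normal(size=50)

alphas = [0.01, 1.0, 100.0, 10000.0]
norms = []
for alpha in alphas:
    coef = Ridge(alpha=alpha).fit(X, y).coef_
    # Euclidean length of the coefficient vector for this alpha
    norms.append(float(np.linalg.norm(coef)))
    print(f"alpha={alpha:>8}: coefficient norm = {norms[-1]:.4f}")
```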

In [106]:
from sklearn.linear_model import RidgeCV

In [121]:
# Create ridge regression with seven candidate alpha values
ridge_cv=RidgeCV(alphas=[0.1,1.0,5,7,10,20,100])

In [122]:
# Fit the ridge regression, selecting alpha by cross-validation
model_ridgeCV=ridge_cv.fit(features_standardized,target)

In [123]:
model_ridgeCV.coef_

array([-0.89015213,  1.01207982,  0.03670572,  0.69670303, -1.92608614,
        2.71265969, -0.00928493, -2.97506515,  2.3446415 , -1.78380629,
       -2.02178301,  0.84709801, -3.68122188])

In [124]:
#View alpha
model_ridgeCV.alpha_

5.0

In [111]:
model_ridgeCV.best_score_

-23.718112644972546
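The negative value of `best_score_` is expected. With the default `scoring=None`, RidgeCV appears to report the negated mean squared error from its built-in leave-one-out cross-validation, so values closer to zero are better. The sketch below, on synthetic data made up for illustration, shows the same pattern.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Synthetic data, made up for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 4))
y = X @ np.array([1.5, 0.0, -2.0, 0.5]) + rng.normal(size=80)

candidate_alphas = [0.1, 1.0, 10.0]
ridge_cv_demo = RidgeCV(alphas=candidate_alphas).fit(X, y)

print(ridge_cv_demo.alpha_)       # the selected candidate
print(ridge_cv_demo.best_score_)  # negated mean squared error, so below zero
```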

#### Lasso Regression

Lasso can also be used for feature selection, because it can shrink some coefficients to exactly zero. In lasso regression, the shrinkage penalty is a hyperparameter alpha multiplied by the sum of the absolute values of all the coefficients: $RSS + \alpha \sum_j |B_j|$.

In [112]:
from sklearn.linear_model import Lasso

In [113]:
#Create lasso with alpha value
regression_l=Lasso(alpha=0.5)

In [114]:
model_lasso=regression_l.fit(features_standardized,target)

In [115]:
model_lasso.coef_

array([-0.11526463,  0.        , -0.        ,  0.39707879, -0.        ,
        2.97425861, -0.        , -0.17056942, -0.        , -0.        ,
       -1.59844856,  0.54313871, -3.66614361])
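Several coefficients in the output above are exactly zero, which is how lasso acts as a feature selector. The sketch below reproduces the effect on synthetic data (made up for illustration) in which only the first two of five features influence the target. Note that scikit-learn's Lasso scales the RSS term by 1/(2n), so its alpha is not on the same scale as in the unscaled formula.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.normal(size=(200, 5)))
# Only the first two features influence the target
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso_demo = Lasso(alpha=0.5).fit(X, y)
print(lasso_demo.coef_)

# Indices of the features with non-zero coefficients
selected = np.flatnonzero(lasso_demo.coef_)
print(selected)
```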

In [116]:
from sklearn.linear_model import LassoCV

In [126]:
#Create lasso regression with candidate alpha values
lasso_cv=LassoCV(alphas=[0.05,0.5,1,5,7,10,20,100])

In [128]:
#Fit the lasso regression, selecting alpha by cross-validation
model_lassoCV=lasso_cv.fit(features_standardized, target)

In [132]:
#View coefficients
model_lassoCV.coef_

array([-0.78186122,  0.88839401, -0.        ,  0.67374712, -1.79294898,
        2.74738475, -0.        , -2.78126471,  1.90060723, -1.41445404,
       -1.98475002,  0.80473834, -3.72701745])

In [133]:
#View alpha
model_lassoCV.alpha_

0.05