In [1]:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import datasets

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

In [2]:
matplotlib.rcParams['figure.figsize'] = [20, 10]

We are going to explain a very important technique called regularization. We have already discussed that models sometimes overfit. When we observe this, we could try to make the model simpler. But that is not always possible, and sometimes it is hard to work out manually how to do it. Regularization does this for us in a somewhat automated way. We will see how.

We will discuss here two very popular types of regularizations:
* $l_2$ regularization
* $l_1$ regularization.

We will explain them using an example from <kaggle.com>, the competition called __House Prices: Advanced Regression Techniques__, see <https://www.kaggle.com/c/house-prices-advanced-regression-techniques>. Please download the `train.csv` dataset from there.

So what are $l_1$ and $l_2$ regularization? The name comes from the fact that these techniques try to reduce the variance of the model's predictions. In simple words, they try to make the predictions smoother. They do not change the hyperparameters of the model; they change the function that is used to optimize the parameters. Let's see.

### l2 regularization

Do you remember which metric we used when we evaluated regression models? We used MSE:

$$MSE(\hat{Y}) = \frac{1}{N}\sum_i^N \left(y_i - \hat{y}_i\right)^2$$

The same metric is used when we fit the model to the training data, that is, when the training algorithm finds the best parameters. We have not discussed how exactly this is done, but it is very important to understand that in the case of linear regression the algorithm tries to minimize the above function. In this context it is called the __loss function__ or __objective function__.

It can be a bit confusing so let's repeat it and write it down. While we train the linear model

$$\hat{y} = a_1x_1 + \ldots + a_nx_n + b$$

we call the method
```
reg.fit(X_train, y_train)
```
Then the algorithm behind `fit` tries to find the best parameters $a_1, \ldots, a_n, b$ such that the __loss function__ given by

$$l(\hat{Y}) = \frac{1}{N}\sum_i^N \left(y_i - \hat{y}_i\right)^2$$

is as small as possible. In other words, it looks for the global minimum.
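
To make the idea of "finding the parameters that minimize the loss" concrete, here is a minimal sketch on synthetic data. It uses NumPy's least-squares solver rather than `sklearn` (an assumption made to keep the example self-contained), and the data and the names `mse_loss` and `best_params` are illustrative only. It checks that nudging the fitted parameters gives a larger loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustrative): y = 3*x1 - 2*x2 + 5 + noise
X = rng.normal(size=(200, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 5 + rng.normal(scale=0.5, size=200)

def mse_loss(params, X, y):
    """Loss from the text: mean squared error of the linear model.

    params = (a_1, ..., a_n, b)
    """
    y_hat = X @ params[:-1] + params[-1]
    return np.mean((y - y_hat) ** 2)

# Append a column of ones so that the intercept b is fitted as well
X_with_ones = np.column_stack([X, np.ones(len(X))])
best_params, *_ = np.linalg.lstsq(X_with_ones, y, rcond=None)

best_loss = mse_loss(best_params, X, y)
# Moving away from the minimizer increases the loss
perturbed_loss = mse_loss(best_params + np.array([0.1, -0.1, 0.05]), X, y)
print(best_loss, perturbed_loss)
```

The fitted parameters land close to the values used to generate the data, and the perturbed parameters always score worse on the training set.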

In this case the __loss function__ and the __evaluation metric__ are the same. But it does not have to be that way. Sometimes they cannot be equal, since optimization algorithms often require the loss function to be differentiable. But it is also often beneficial to have them differ. The first example is $l_2$ regularization. In this case the __loss function__ is defined as follows.

$$l(\hat{Y}) = \frac{1}{N}\sum_i^N \left(y_i - \hat{y}_i\right)^2 + \alpha \left(a_1^2 + \ldots + a_n^2\right)$$

A linear model that uses this loss function is also called __Ridge__ regression. Let's see how it works. In `sklearn` there is a `Ridge` regression class. So what we will do is the following.
1. We will prepare the data by taking only numerical columns that have no missing values.
2. We will do a `train-dev-test` split.
3. We will train a standard linear model on it.
4. We will train Ridge models, using grid search to find the best alpha.
5. We will compare them.

Later we will move on to $l_1$ regularization.
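
Before turning to the data, the effect of the squared penalty can be sketched in plain NumPy. The function `ridge_fit` below is a hypothetical helper, not the `sklearn` implementation: it solves the loss written above in closed form on synthetic data and prints how the coefficient vector shrinks as $\alpha$ grows. As far as the `sklearn` documentation states its objective, `Ridge` omits the $\frac{1}{N}$ factor, so its `alpha` would correspond to $N\cdot\alpha$ in this notation.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Minimize (1/N) * sum((y - X a - b)^2) + alpha * sum(a^2).

    The intercept b is not penalized, so the data are centered first.
    """
    n_samples, n_features = X.shape
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean
    # Setting the gradient to zero gives (Xc^T Xc + N*alpha*I) a = Xc^T yc
    a = np.linalg.solve(Xc.T @ Xc + n_samples * alpha * np.eye(n_features), Xc.T @ yc)
    b = y_mean - X_mean @ a
    return a, b

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # synthetic features (illustrative)
y = X @ np.array([4.0, -3.0, 0.5]) + 1.0 + rng.normal(scale=0.1, size=100)

for alpha in [0.0, 0.1, 1.0, 10.0]:
    a, b = ridge_fit(X, y, alpha)
    # The norm of the coefficient vector decreases as alpha increases
    print(alpha, np.round(a, 3), round(float(np.linalg.norm(a)), 3))
```

With `alpha = 0` this reduces to ordinary least squares; larger values pull all coefficients toward zero without making them exactly zero.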

In [3]:
houses = pd.read_csv("data/house-prices-advanced-regression-techniques/train.csv", index_col="Id")
houses.columns.values

array(['MSSubClass', 'MSZoning', 'LotFrontage', 'LotArea', 'Street',
       'Alley', 'LotShape', 'LandContour', 'Utilities', 'LotConfig',
       'LandSlope', 'Neighborhood', 'Condition1', 'Condition2',
       'BldgType', 'HouseStyle', 'OverallQual', 'OverallCond',
       'YearBuilt', 'YearRemodAdd', 'RoofStyle', 'RoofMatl',
       'Exterior1st', 'Exterior2nd', 'MasVnrType', 'MasVnrArea',
       'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual', 'BsmtCond',
       'BsmtExposure', 'BsmtFinType1', 'BsmtFinSF1', 'BsmtFinType2',
       'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', 'Heating', 'HeatingQC',
       'CentralAir', 'Electrical', '1stFlrSF', '2ndFlrSF', 'LowQualFinSF',
       'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath',
       'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'KitchenQual',
       'TotRmsAbvGrd', 'Functional', 'Fireplaces', 'FireplaceQu',
       'GarageType', 'GarageYrBlt', 'GarageFinish', 'GarageCars',
       'GarageArea', 'GarageQual', 'GarageCond', 'PavedDriv

### Selecting numerical columns

In [28]:
all_numeric_columns = ['MSSubClass', 'LotFrontage', 'LotArea', 'OverallQual', 'OverallCond',
       'YearBuilt', 'YearRemodAdd',   'MasVnrArea', 'BsmtFinSF1', 
       'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', '1stFlrSF', '2ndFlrSF', 'LowQualFinSF',
       'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath',
       'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 
       'TotRmsAbvGrd',  'Fireplaces',  'GarageYrBlt', 'GarageCars',
       'GarageArea', 
       'WoodDeckSF', 'OpenPorchSF', 'EnclosedPorch', '3SsnPorch',
       'ScreenPorch', 'PoolArea', 
       'MiscVal', 'MoSold', 'YrSold', 
       'SalePrice']
all_non_numeric_columns= ['MSZoning', 'Street',
       'Alley', 'LotShape', 'LandContour', 'Utilities', 'LotConfig',
       'LandSlope', 'Neighborhood', 'Condition1', 'Condition2',
       'BldgType', 'HouseStyle', 'RoofStyle', 'RoofMatl',
       'Exterior1st', 'Exterior2nd', 'MasVnrType', 'ExterQual', 'ExterCond',
        'Foundation', 'BsmtQual', 'BsmtCond',
       'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2', 'Heating', 'HeatingQC',
       'CentralAir', 'Electrical', 'KitchenQual', 'Functional', 'FireplaceQu',
       'GarageType', 'GarageFinish', 'GarageQual', 'GarageCond', 'PavedDrive',
        'PoolQC', 'Fence', 'MiscFeature',
        'SaleType', 'SaleCondition'             ]
numeric_columns =  ['MSSubClass', 'LotArea', 'OverallQual', 'OverallCond',
       'YearBuilt', 'YearRemodAdd', 'BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF', 
       'TotalBsmtSF', '1stFlrSF', '2ndFlrSF', 'LowQualFinSF',
       'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath',
       'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 
       'TotRmsAbvGrd',  'Fireplaces', 'GarageCars', 'GarageArea', 
       'WoodDeckSF', 'OpenPorchSF', 'EnclosedPorch', '3SsnPorch',
       'ScreenPorch', 'PoolArea', 'MiscVal', 'MoSold', 'YrSold', 'SalePrice']

In [29]:
Xy = houses[numeric_columns]
target_col = "SalePrice"

### Detecting columns with missing values and removing them

In [30]:
cols_with_na = []
for col in Xy.columns:
    nas = Xy[col].isna().sum()  # number of missing values in this column
    if nas > 0:
        cols_with_na.append(col)
        print(col, nas)

# Drop every column that contains at least one missing value
Xy = Xy.drop(cols_with_na, axis=1)

## Train-dev-test split

In [31]:
X = Xy.drop(target_col, axis=1)
y = Xy[target_col]

X_train_dev, X_test, y_train_dev, y_test = train_test_split(X, y, random_state=666, test_size=0.2)
X_train, X_dev, y_train, y_dev = train_test_split(X_train_dev, y_train_dev, random_state=667, test_size=0.25)
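
The two calls above first set aside 20% of the rows for testing and then take 25% of the remaining 80% for the dev set, which gives roughly a 60/20/20 split. The arithmetic can be sketched as follows; `split_sizes` is a hypothetical helper, the rounding only approximates what `train_test_split` does, and the row count of 1460 is an assumption about the size of `train.csv`.

```python
def split_sizes(n_rows, test_size, dev_size_of_rest):
    """Approximate row counts after two consecutive splits."""
    n_test = round(n_rows * test_size)
    n_train_dev = n_rows - n_test
    n_dev = round(n_train_dev * dev_size_of_rest)
    n_train = n_train_dev - n_dev
    return n_train, n_dev, n_test

# 1460 rows is an assumption about the size of train.csv
n_train, n_dev, n_test = split_sizes(1460, test_size=0.2, dev_size_of_rest=0.25)
print(n_train, n_dev, n_test)
```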

## Linear model

In [32]:
from sklearn.linear_model import LinearRegression

reg = LinearRegression()
reg.fit(X_train, y_train)
y_dev_hat = reg.predict(X_dev)
print(np.sqrt(mean_squared_error(y_dev, y_dev_hat)))

34385.234048404694
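
The printed number is the square root of the MSE (RMSE), so it is expressed in the same units as the target, here the sale price. A minimal NumPy sketch of the same computation, with made-up prices that are not taken from the dataset:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt of the average squared difference."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Illustrative prices: the errors are 10000, 10000 and 30000
example = rmse([200_000, 150_000, 300_000], [210_000, 140_000, 330_000])
print(example)
```

Because the errors are squared before averaging, the single large error of 30000 dominates the result.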


## Ridge Regression (l2 regularization)

In [33]:
from sklearn.linear_model import Ridge

for i in range(-10, 10):
    alpha = 2**i
    ridge_reg = Ridge(alpha=alpha)
    ridge_reg.fit(X_train, y_train)
    y_dev_hat = ridge_reg.predict(X_dev)
    print(np.sqrt(mean_squared_error(y_dev, y_dev_hat)), alpha)

34385.21627188204 0.0009765625
34385.198497077225 0.001953125
34385.16295262101 0.00390625
34385.091884315094 0.0078125
34384.94983006951 0.015625
34384.666050567335 0.03125
34384.09980372219 0.0625
34382.97252847458 0.125
34380.73861305588 0.25
34376.35145764797 0.5
34367.8856120807 1
34352.0838653716 2
34324.302917764035 4
34280.00538001611 8
34218.279686735215 16
34144.49706742589 32
34073.99183046314 64
34054.336869484076 128
34206.535653918574 256
34711.584549416664 512


So the best model is the one with alpha = 128 (RMSE of about 34054 on the dev set), and it works a bit better than the standard linear model (RMSE of about 34385).
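
Reading the minimum off a long printout is error-prone, so it can help to collect the results and let Python pick the best one. The sketch below copies the last few (RMSE, alpha) pairs from the Ridge output above into a list; the variable names are illustrative.

```python
# (RMSE on the dev set, alpha) pairs copied from the Ridge output above
ridge_results = [
    (34218.279686735215, 16),
    (34144.49706742589, 32),
    (34073.99183046314, 64),
    (34054.336869484076, 128),
    (34206.535653918574, 256),
    (34711.584549416664, 512),
]

# Tuples compare by their first element, so min() returns the lowest RMSE
best_rmse, best_alpha = min(ridge_results)

# RMSE of the plain linear model, copied from its output above
baseline_rmse = 34385.234048404694
improvement = baseline_rmse - best_rmse
print(best_alpha, best_rmse, improvement)
```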

## Lasso Regression (l1 regularization)

In the case of $l_1$-regularization the __loss function__ is defined as follows.

$$l(\hat{Y}) = \frac{1}{N}\sum_i^N \left(y_i - \hat{y}_i\right)^2 + \alpha \left(|a_1| + \ldots + |a_n|\right)$$

The linear model with this loss function is often called Lasso. It has the same name in `sklearn`.

In [34]:
from sklearn.linear_model import Lasso

for i in range(-10, 10):
    alpha = 2**i
    lasso_reg = Lasso(alpha=alpha)
    lasso_reg.fit(X_train, y_train)
    y_dev_hat = lasso_reg.predict(X_dev)
    print(np.sqrt(mean_squared_error(y_dev, y_dev_hat)), alpha)

34385.232390226694 0.0009765625
34385.23073205115 0.001953125
34385.22741571128 0.00390625
34385.220783122604 0.0078125
34385.20751809857 0.015625
34385.18098887226 0.03125
34385.12793384828 0.0625
34385.021837840366 0.125
34384.80970238467 0.25
34384.3856600976 0.5
34383.53847847767 1
34381.84771210018 2
34378.480581855554 4
34371.80402161136 8
34358.681686191914 16
34333.360856228275 32
34286.429998580745 64
34223.5487321279 128
34173.03689578013 256
34221.15737373851 512


Here the best parameter seems to be 256 (RMSE of about 34173 on the dev set).

### l1 regularization nulls the coefficients 

There is an interesting effect of $l_1$ regularization. To observe it, let's train the model with a high alpha = 100000 and then look at the coefficients. One can observe that most of them were actually set to zero.

In [35]:
lasso_reg = Lasso(alpha=100000)
lasso_reg.fit(X_train, y_train)
lasso_reg.coef_

array([-5.32584635e+01,  1.84814242e-01,  0.00000000e+00,  0.00000000e+00,
        4.61600753e+02,  3.77706089e+02,  1.25281624e+01,  0.00000000e+00,
       -0.00000000e+00,  2.42549189e+01,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  6.60391554e+01,  0.00000000e+00, -0.00000000e+00,
        0.00000000e+00,  0.00000000e+00, -0.00000000e+00, -0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  5.98967107e+01,
        3.91127200e+01, -0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        6.25981928e+01, -1.63844160e+02, -5.90782181e+00,  0.00000000e+00,
       -0.00000000e+00])
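
Why does the $l_1$ penalty produce exact zeros while the $l_2$ penalty does not? A one-dimensional sketch gives the intuition. Assume, as a simplification that is not how `sklearn` fits the model, that each coefficient $a$ is chosen on its own by minimizing $(a - z)^2$ plus a penalty, where $z$ is the unregularized value. The two minimizers have closed forms, implemented below under the hypothetical names `l1_solution` and `l2_solution`.

```python
import numpy as np

def l1_solution(z, alpha):
    """Minimizer of (a - z)^2 + alpha * |a| (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - alpha / 2, 0.0)

def l2_solution(z, alpha):
    """Minimizer of (a - z)^2 + alpha * a^2 (proportional shrinkage)."""
    return z / (1 + alpha)

z = np.array([3.0, -0.5, 0.2, -2.0])  # unregularized coefficients (illustrative)
print(l1_solution(z, alpha=2.0))  # small entries become exactly zero
print(l2_solution(z, alpha=2.0))  # every entry shrinks but stays non-zero
```

The $l_1$ penalty subtracts a fixed amount from every coefficient and clips at zero, so anything smaller than the threshold disappears. The $l_2$ penalty only rescales, so a non-zero coefficient stays non-zero.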

## Elastic Net: both regularizations at the same time

We can also have them both at the same time. In this case the __loss function__ is defined as follows.

$$l(\hat{Y}) = \frac{1}{N}\sum_i^N \left(y_i - \hat{y}_i\right)^2 + 
\alpha_1 \left(|a_1| + \ldots + |a_n|\right)+
\alpha_2 \left(a_1^2 + \ldots + a_n^2\right)
$$

The linear model with this loss function is often called Elastic Net. 

It has the name `ElasticNet` in `sklearn`. It also has two parameters, but they are not $\alpha_1$ and $\alpha_2$. They are called `alpha` and `l1_ratio`. Then the relation between them is the following:

$$\alpha_1 = \texttt{alpha}\cdot\texttt{l1_ratio}$$

$$\alpha_2 = \texttt{alpha}\cdot(\texttt{1 - l1_ratio})$$
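
The two relations can be turned into a pair of small conversion helpers. This is a sketch that takes the relations above at face value; the function names are hypothetical. The objective documented for `ElasticNet` in `sklearn` also carries constant factors (such as $\frac{1}{2}$) that this notation ignores, so absolute values may differ by such constants.

```python
def to_sklearn_params(alpha_1, alpha_2):
    """Convert (alpha_1, alpha_2) into (alpha, l1_ratio); needs alpha_1 + alpha_2 > 0."""
    alpha = alpha_1 + alpha_2
    l1_ratio = alpha_1 / alpha
    return alpha, l1_ratio

def from_sklearn_params(alpha, l1_ratio):
    """Convert (alpha, l1_ratio) back into (alpha_1, alpha_2)."""
    return alpha * l1_ratio, alpha * (1 - l1_ratio)

print(to_sklearn_params(0.25, 0.75))
print(from_sklearn_params(1.0, 0.25))
```

In this parametrization `l1_ratio = 1` leaves only the $l_1$ term and `l1_ratio = 0` leaves only the $l_2$ term.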

The following code shows how to use `ElasticNet` class from `sklearn`.

In [None]:
from sklearn.linear_model import ElasticNet
import warnings
warnings.filterwarnings("ignore")
alpha_ratio_score = []

for i in range(-10, 10):
    for j in range(20):
        alpha = 2**i
        l1_ratio = 0.6**j
        en_reg = ElasticNet(alpha=alpha, l1_ratio=l1_ratio)
        en_reg.fit(X_train, y_train)
        y_dev_hat = en_reg.predict(X_dev)
        alpha_ratio_score.append([alpha, l1_ratio, np.sqrt(mean_squared_error(y_dev, y_dev_hat))])
        
scores = pd.DataFrame({
    "alpha": [ars[0] for ars in alpha_ratio_score],
    "ratio": [ars[1] for ars in alpha_ratio_score],
    "score": [ars[2] for ars in alpha_ratio_score]
})

In [None]:
scores.sort_values('score').head()

In [40]:
from plotly.offline import init_notebook_mode, iplot
import plotly.graph_objs as go

init_notebook_mode(connected=True)

def plot_corr_matrix(Xy):
    corr_matrix = Xy.corr()
    trace = go.Heatmap(z=corr_matrix)
    data=[trace]
    iplot(data, filename='basic-heatmap')
    return corr_matrix
    
corr_matrix = plot_corr_matrix(Xy)

In [39]:
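# Take the absolute value before comparing, so that strongly negative correlations are kept as well
correlated_with_target = list(corr_matrix[target_col][np.abs(corr_matrix[target_col]) >= 0.3].index)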
correlated_with_target

['OverallQual',
 'YearBuilt',
 'YearRemodAdd',
 'BsmtFinSF1',
 'TotalBsmtSF',
 '1stFlrSF',
 '2ndFlrSF',
 'GrLivArea',
 'FullBath',
 'TotRmsAbvGrd',
 'Fireplaces',
 'GarageCars',
 'GarageArea',
 'WoodDeckSF',
 'OpenPorchSF',
 'SalePrice']
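
The selection rule used here is "keep a column if the absolute value of its correlation with the target is at least 0.3". The sketch below reproduces that rule on synthetic data with NumPy only; the feature names are made up. It also shows why the absolute value has to be taken before the comparison: otherwise a strongly negative correlation is silently dropped.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
target = rng.normal(size=n)
features = {
    "pos": target + rng.normal(scale=0.5, size=n),   # positively correlated
    "neg": -target + rng.normal(scale=0.5, size=n),  # negatively correlated
    "noise": rng.normal(size=n),                     # unrelated to the target
}

# Pearson correlation of every feature with the target
corr = {name: np.corrcoef(values, target)[0, 1] for name, values in features.items()}

# Thresholding the absolute value keeps strong negative correlations too
selected = [name for name, c in corr.items() if abs(c) >= 0.3]
# Thresholding the signed value drops them
selected_signed = [name for name, c in corr.items() if c >= 0.3]
print(selected, selected_signed)
```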

In [20]:
reg.fit(X_train[correlated_with_target[:-1]], y_train)
y_dev_hat = reg.predict(X_dev[correlated_with_target[:-1]])
print(np.sqrt(mean_squared_error(y_dev, y_dev_hat)))

33607.99679829484


In [21]:
corr_matrix = plot_corr_matrix(Xy[correlated_with_target])

In [17]:
correlated_with_target.pop(7)  # removes 'GrLivArea' (index 7 in the list above)
reg.fit(X_train[correlated_with_target[:-1]], y_train)
y_dev_hat = reg.predict(X_dev[correlated_with_target[:-1]])
print(np.sqrt(mean_squared_error(y_dev, y_dev_hat)))

33589.51098495284


In [18]:
from sklearn.linear_model import ElasticNet

alpha_ratio_score = []

for i in range(-10, 10):
    for j in range(20):
        alpha = 2**i
        l1_ratio = 0.6**j
        en_reg = ElasticNet(alpha=alpha, l1_ratio=l1_ratio)
        en_reg.fit(X_train[correlated_with_target[:-1]], y_train)
        y_dev_hat = en_reg.predict(X_dev[correlated_with_target[:-1]])
        alpha_ratio_score.append([alpha, l1_ratio, np.sqrt(mean_squared_error(y_dev, y_dev_hat))])
        
scores = pd.DataFrame({
    "alpha": [ars[0] for ars in alpha_ratio_score],
    "ratio": [ars[1] for ars in alpha_ratio_score],
    "score": [ars[2] for ars in alpha_ratio_score]
})


Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.



In [19]:
scores.sort_values("score").head()

     alpha     ratio         score
179   0.25  0.000061  32896.113122
178   0.25  0.000102  32896.115009
177   0.25  0.000169  32896.118157
176   0.25  0.000282  32896.123408
175   0.25  0.000470  32896.132175


In [20]:
from sklearn.preprocessing import StandardScaler, MinMaxScaler
scaler = StandardScaler()
#scaler = MinMaxScaler()
reg = LinearRegression()
scaler.fit(X_train)


X_train_scaled = scaler.transform(X_train)
reg.fit(X_train_scaled, y_train)
y_dev_hat = reg.predict(scaler.transform(X_dev))
print(np.sqrt(mean_squared_error(y_dev, y_dev_hat)))

34383.590182169915



Data with input dtype int64 were all converted to float64 by StandardScaler.
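
Standardization subtracts the mean of each column and divides by its standard deviation. The statistics are computed on the training set only and then reused for the dev set. A minimal NumPy sketch of that computation on synthetic columns (the variable names and data are illustrative, and this is not the `StandardScaler` source code):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two illustrative features on very different scales (an area and a small count)
X_tr = np.column_stack([rng.normal(10_000, 3_000, size=300), rng.integers(0, 4, size=300)])
X_dv = np.column_stack([rng.normal(10_000, 3_000, size=100), rng.integers(0, 4, size=100)])

# Statistics come from the training set only ...
mean = X_tr.mean(axis=0)
std = X_tr.std(axis=0)

# ... and are reused unchanged for the dev set
X_tr_scaled = (X_tr - mean) / std
X_dv_scaled = (X_dv - mean) / std

print(X_tr_scaled.mean(axis=0).round(6), X_tr_scaled.std(axis=0).round(6))
print(X_dv_scaled.mean(axis=0).round(3), X_dv_scaled.std(axis=0).round(3))
```

The predictions of plain least squares do not depend on the scale of the features, which is consistent with the nearly unchanged score above. Penalized models are affected, because the penalty treats all coefficients alike regardless of the units of their features.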





In [21]:
from sklearn.linear_model import ElasticNet

alpha_ratio_score = []

for i in range(-10, 10):
    for j in range(20):
        alpha = 2**i
        l1_ratio = 0.6**j
        en_reg = ElasticNet(alpha=alpha, l1_ratio=l1_ratio)
        en_reg.fit(scaler.transform(X_train), y_train)
        y_dev_hat = en_reg.predict(scaler.transform(X_dev))
        alpha_ratio_score.append([alpha, l1_ratio, np.sqrt(mean_squared_error(y_dev, y_dev_hat))])
        
scores = pd.DataFrame({
    "alpha": [ars[0] for ars in alpha_ratio_score],
    "ratio": [ars[1] for ars in alpha_ratio_score],
    "score": [ars[2] for ars in alpha_ratio_score]
})


Data with input dtype int64 were all converted to float64 by StandardScaler.




Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.



In [22]:
scores.sort_values("score").head()

       alpha   ratio         score
380  512.000  1.0000  34036.686811
360  256.000  1.0000  34149.917909
144    0.125  0.1296  34201.192404
161    0.250  0.6000  34201.389109
143    0.125  0.2160  34201.770970


In [23]:
reg_params = pd.DataFrame({
    "column": X_train.columns.values,
    "params": reg.coef_})

reg_params['params_abs'] = np.abs(reg.coef_)

In [24]:
important_columns = reg_params.sort_values("params_abs", ascending=False)[:20]["column"]
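
The cell above ranks the columns by the absolute value of their coefficients and keeps the top 20. The same ranking can be sketched with NumPy on a toy example; the column names and coefficients are made up. Such a ranking is meaningful only when the features share a common scale, so it assumes that `reg` is the model fitted on the standardized features.

```python
import numpy as np

columns = np.array(["a", "b", "c", "d"])     # illustrative column names
coefs = np.array([0.5, -120.0, 30.0, -2.0])  # illustrative coefficients

# Sort by absolute value, largest first; the sign does not matter for importance
order = np.argsort(-np.abs(coefs))
top_2 = columns[order][:2]
print(top_2)
```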

In [25]:
reg.fit(X_train[important_columns], y_train)
y_dev_hat = reg.predict(X_dev[important_columns])
print(np.sqrt(mean_squared_error(y_dev, y_dev_hat)))

34462.240602423946
