# Linear regression model

## Answer the questions
   

> Derive an analytical solution to the regression problem. Use a vector form of the equation.

$w=(X^TX)^{-1}X^Ty$ 
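
The formula above follows from minimizing the residual sum of squares. A short derivation sketch, assuming that $X$ already contains a column of ones for the intercept and that $X^TX$ is invertible:

```latex
\begin{aligned}
L(w) &= \lVert y - Xw \rVert_2^2 = (y - Xw)^T (y - Xw) \\
\nabla_w L(w) &= -2X^T (y - Xw) = 0 \\
X^T X w &= X^T y \\
w &= (X^T X)^{-1} X^T y
\end{aligned}
```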

> What changes in the solution when L1 and L2 regularizations are added to the loss function.

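Ridge regression (L2) keeps a closed-form solution:
$w=(X^TX+λI)^{-1} X^Ty$
Lasso (L1) has no closed-form solution in general because $|w|$ is not differentiable at 0, so it needs an iterative optimizer such as (sub)gradient descent or coordinate descent.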
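
As a quick numerical check of the ridge formula, the sketch below verifies that the gradient of the penalized loss $\lVert y - Xw \rVert^2 + λ\lVert w \rVert^2$ vanishes at the closed-form solution. The data is synthetic and the variable names are illustrative assumptions, not part of the assignment.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))   # synthetic design matrix (illustrative only)
y = rng.normal(size=50)        # synthetic target
lam = 2.0                      # regularization strength, the lambda in the formula

# Closed-form ridge solution; solve() avoids forming the explicit inverse
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Gradient of ||y - Xw||^2 + lam * ||w||^2 evaluated at w_ridge
grad = -2 * X.T @ (y - X @ w_ridge) + 2 * lam * w_ridge
print(np.allclose(grad, 0.0, atol=1e-8))
```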

> Explain why L1 regularization is often used to select features. Why are there many weights equal to 0 after the model is fit?

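L1 regularization is often used for feature selection because it can shrink coefficients exactly to 0, which removes the corresponding features from the model. How many features are removed depends on lambda: the larger it is, the more weights become 0. A weight typically becomes 0 when the feature has little impact on the target or is redundant because of multicollinearity.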
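
One way to see where the exact zeros come from: for an orthonormal design, the L1-penalized coefficients are the unregularized ones passed through the soft-thresholding operator, which sets every coefficient smaller in magnitude than the threshold to exactly 0. The L2 penalty only rescales them. A minimal sketch, where the function names and the sample coefficients are illustrative assumptions:

```python
import numpy as np

def soft_threshold(z, lam):
    """L1 update: shrink z toward 0 by lam and clip at exactly 0."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def ridge_shrink(z, lam):
    """L2 update: rescale z, which never gives an exact 0 for z != 0."""
    return z / (1.0 + lam)

ols_coefs = np.array([3.0, -0.5, 0.2, -2.0])  # hypothetical unregularized coefficients
l1_coefs = soft_threshold(ols_coefs, 1.0)
l2_coefs = ridge_shrink(ols_coefs, 1.0)
print(l1_coefs)
print(l2_coefs)
```

The two small coefficients are removed by the L1 update, while the L2 update keeps all four nonzero.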

> Explain how you can use the same models (Linear regression, Ridge, etc.) but make it possible to fit nonlinear dependencies.

These models stay linear in the weights, so fitting them on the raw features assumes a linear dependency and will be inaccurate when that assumption fails. To capture nonlinear dependencies, transform the inputs before fitting: add polynomial terms (for example, quadratic ones), take logarithms, or apply any other transformation of x.
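
The effect of transforming the inputs can be sketched on a toy quadratic target: the same least-squares fit is applied once to the features [1, x] and once to [1, x, x^2]. The data and helper names below are assumptions made for illustration.

```python
import numpy as np

x = np.linspace(-3, 3, 61)
y = 1.0 + 2.0 * x**2              # synthetic quadratic target (illustrative)

def fit_predict(features, target):
    """Least-squares fit of a model that is linear in the given features."""
    weights, *_ = np.linalg.lstsq(features, target, rcond=None)
    return features @ weights

def r2(target, pred):
    return 1 - np.sum((target - pred) ** 2) / np.sum((target - target.mean()) ** 2)

raw = np.column_stack([np.ones_like(x), x])          # [1, x]
quad = np.column_stack([np.ones_like(x), x, x**2])   # [1, x, x^2]

r2_raw = r2(y, fit_predict(raw, y))
r2_quad = r2(y, fit_predict(quad, y))
print(r2_raw, r2_quad)
```

The model is still linear in its weights in both cases; only the feature matrix changes.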

## Introduction — do all the preprocessing from the previous lesson
>Import libraries. 

In [61]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from collections import Counter
from sklearn.model_selection import train_test_split
from numpy import random
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.metrics import r2_score, mean_absolute_error, root_mean_squared_error
from sklearn.preprocessing import MinMaxScaler, StandardScaler, PolynomialFeatures

>Read Train and Test Parts.

In [62]:
data = pd.read_json('data/train.json')
upper_perc = np.percentile(data['price'], 99)
lower_perc = np.percentile(data['price'], 1)

data = data[(data['price'] < upper_perc) & (data['price'] > lower_perc)]
data = data.reset_index(drop=True)

>Preprocess "Interest Level" feature.

In [63]:
print(data['interest_level'].value_counts())
data['interest_level'] = data['interest_level'].map({'low': 0, 'medium': 1, 'high': 2})
print(data['interest_level'].value_counts())

interest_level
low       33672
medium    11114
high       3557
Name: count, dtype: int64
interest_level
0    33672
1    11114
2     3557
Name: count, dtype: int64


## Intro data analysis part 2
> Let's generate additional features for better model quality. Consider a column called "Features". It consists of a list of highlights of the current flat. 


> Remove unused symbols ([,], ', ", and space) from the column.


In [64]:
data['features'] = data['features'].astype(str).str.replace(r"[\[\]'\"\s]", "", regex=True)


In [65]:
data['features']

0        DiningRoom,Pre-War,LaundryinBuilding,Dishwashe...
1        Doorman,Elevator,LaundryinBuilding,Dishwasher,...
2        Doorman,Elevator,LaundryinBuilding,LaundryinUn...
3                                                         
4         Doorman,Elevator,FitnessCenter,LaundryinBuilding
                               ...                        
48338                   Elevator,Dishwasher,HardwoodFloors
48339    CommonOutdoorSpace,CatsAllowed,DogsAllowed,NoF...
48340    DiningRoom,Elevator,Pre-War,LaundryinBuilding,...
48341    Pre-War,LaundryinUnit,Dishwasher,NoFee,Outdoor...
48342    DiningRoom,Elevator,LaundryinBuilding,Dishwash...
Name: features, Length: 48343, dtype: object

> Split values in each row with the separator "," and collect the result in one huge list for the whole dataset. You can use DataFrame.iterrows().


In [66]:
all_features = []
for index, row in data.iterrows():
    features = row['features'].split(',')
    all_features.extend(features)

> How many unique values does a result list contain?


In [67]:
print(len(set(all_features)))

1530


> Let's get acquainted with the new library — Collections. With this package you could effectively get quantity statistics about your data. 


> Count the most popular features from our huge list and take the top 20 for this moment.


In [68]:
print(Counter(all_features).most_common(21))

[('Elevator', 25375), ('HardwoodFloors', 23146), ('CatsAllowed', 23135), ('DogsAllowed', 21652), ('Doorman', 20479), ('Dishwasher', 20081), ('NoFee', 17793), ('LaundryinBuilding', 16082), ('FitnessCenter', 12989), ('Pre-War', 8971), ('LaundryinUnit', 8437), ('RoofDeck', 6417), ('OutdoorSpace', 5132), ('DiningRoom', 4890), ('HighSpeedInternet', 4223), ('', 3106), ('Balcony', 2898), ('SwimmingPool', 2643), ('LaundryInBuilding', 2564), ('NewConstruction', 2504), ('Terrace', 2177)]
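
The output above contains an empty token `''` and counts 'LaundryinBuilding' and 'LaundryInBuilding' separately. The expected list in the task keeps both spellings, so the notebook leaves them as they are. If merging such variants were desired, a possible sketch is to drop empty tokens and compare case-insensitively (the token list below is hypothetical):

```python
from collections import Counter

# Hypothetical tokens in the same style as all_features above
tokens = ['Elevator', 'LaundryinBuilding', '', 'LaundryInBuilding', 'elevator', '']

# Drop empty tokens and compare case-insensitively
normalized = [t.lower() for t in tokens if t]
counts = Counter(normalized)
print(counts.most_common(2))
```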


> If everything is correct, you should get the following values:  'Elevator', 'HardwoodFloors', 'CatsAllowed', 'DogsAllowed', 'Doorman', 'Dishwasher', 'NoFee', 'LaundryinBuilding', 'FitnessCenter', 'Pre-War', 'LaundryinUnit', 'RoofDeck', 'OutdoorSpace', 'DiningRoom', 'HighSpeedInternet', 'Balcony', 'SwimmingPool', 'LaundryInBuilding', 'NewConstruction', 'Terrace'.


> Now create 20 new features based on the top 20 values: 1 if the value is in the "Feature" column, otherwise 0.


In [69]:
top20_features = ['Elevator', 'HardwoodFloors', 'CatsAllowed', 'DogsAllowed', 'Doorman', 'Dishwasher', 'NoFee', 'LaundryinBuilding', 'FitnessCenter', 'Pre-War', 'LaundryinUnit', 'RoofDeck', 'OutdoorSpace', 'DiningRoom', 'HighSpeedInternet', 'Balcony', 'SwimmingPool', 'LaundryInBuilding', 'NewConstruction', 'Terrace']

In [70]:
def create_binary_features(features, top_features):
    feature_set = set(features.split(','))
    return [1 if feature in feature_set else 0 for feature in top_features]

for feature in top20_features:
    data[feature] = data['features'].apply(lambda x: create_binary_features(x, [feature])[0])

> Extend our feature set with 'bathrooms', 'bedrooms', 'interest_level' and create a special variable feature_list with all feature names. Now we have 23 values. All models should be trained on these 23 features.



In [71]:
# feature_list holds the 23 feature names: 3 basic features plus the top 20 binary flags
feature_list = ['bathrooms', 'bedrooms', 'interest_level'] + top20_features
X = data[feature_list]
y = data['price']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=21)

In [72]:
X_train.head()

Unnamed: 0,bathrooms,bedrooms,interest_level,Elevator,HardwoodFloors,CatsAllowed,DogsAllowed,Doorman,Dishwasher,NoFee,...,LaundryinUnit,RoofDeck,OutdoorSpace,DiningRoom,HighSpeedInternet,Balcony,SwimmingPool,LaundryInBuilding,NewConstruction,Terrace
43187,1.0,1,0,1,1,1,1,0,1,0,...,0,0,0,0,0,0,0,0,0,0
26391,1.0,1,0,0,0,1,1,1,0,0,...,0,0,0,0,0,0,0,0,0,0
38916,1.0,2,0,1,0,1,1,1,0,0,...,0,0,1,0,0,0,0,0,0,1
3760,2.0,2,0,1,1,0,0,1,1,1,...,0,1,0,1,0,0,1,0,0,1
19124,1.0,2,1,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


## Models implementation — Linear regression
   > Implement a Python class for a linear regression algorithm with two basic methods — fit and predict. Use stochastic gradient descent to find optimal model weights. For better understanding, we recommend implementing separate versions of the algorithm with the analytical solution and non-stochastic gradient descent under the hood.


In [73]:
class AnalytLinReg():
    def __init__(self):
        pass

    def fit(self, X, y):
        x_matrix = X.values
        y_matrix = y.values

        x_matrix = np.c_[np.ones((x_matrix.shape[0], 1)), x_matrix]
        weight = np.dot(np.linalg.inv(np.dot(x_matrix.T, x_matrix)), np.dot(x_matrix.T ,y_matrix))
        self.w = weight[1:]
        self.b = weight[0]
        
    def predict(self, X):
        X = X.values
        return np.dot(X, self.w) +self.b
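
Forming the explicit inverse of $X^TX$ can be numerically fragile when columns are nearly collinear, which may happen with correlated binary features. A minimal alternative sketch uses `np.linalg.lstsq` on the same design matrix; the function name and the synthetic data are assumptions, not part of the class above.

```python
import numpy as np

def fit_least_squares(x_matrix, y_vector):
    """Return (weights, bias) of an ordinary least-squares fit via np.linalg.lstsq."""
    design = np.c_[np.ones((x_matrix.shape[0], 1)), x_matrix]   # prepend the bias column
    coef, *_ = np.linalg.lstsq(design, y_vector, rcond=None)
    return coef[1:], coef[0]

rng = np.random.default_rng(21)
X_demo = rng.normal(size=(100, 4))                              # synthetic features
y_demo = X_demo @ np.array([1.0, -2.0, 0.5, 3.0]) + 7.0         # noise-free target
w_demo, b_demo = fit_least_squares(X_demo, y_demo)
print(w_demo, b_demo)
```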

In [74]:
class GradDescLinReg():
    def __init__(self):
        pass

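    def fit(self, X, y, learning_rate = 0.01, iter = 100000):
        # Use the data passed to fit(), not the global X_train / y_train
        x_matrix = X.values
        y_matrix  = y.values
        n_samples, n_features = x_matrix.shape

        self.w = np.zeros(n_features)
        self.b = 0

        for _ in range(iter):
            # numpy's randint excludes the upper bound, so pass n_samples to cover every row
            rand = random.randint(0, n_samples)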

            pred = np.dot(x_matrix[rand], self.w) + self.b 
            error = y_matrix[rand] - pred
            
            rss_w = -2 * x_matrix[rand] * error
            rss_b = -2 * error

            self.w -= learning_rate * rss_w
            self.b -= learning_rate * rss_b  
                
    def predict(self, X):
        X = X.values
        return np.dot(X, self.w) + self.b
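
The task also mentions a non-stochastic version. A minimal sketch of full-batch gradient descent on the mean squared error is shown below; the function name, the hyperparameters, and the synthetic data are assumptions chosen for illustration, and the features are assumed to be roughly unit-scale so that a fixed learning rate converges.

```python
import numpy as np

def batch_gradient_descent(x_matrix, y_vector, learning_rate=0.1, n_iter=1000):
    """Minimize the mean squared error using the full dataset at every step."""
    n_samples, n_features = x_matrix.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iter):
        error = x_matrix @ w + b - y_vector              # residuals for all rows
        grad_w = 2.0 / n_samples * (x_matrix.T @ error)  # d(MSE)/dw
        grad_b = 2.0 * error.mean()                      # d(MSE)/db
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))                       # synthetic, roughly unit-scale features
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + 4.0       # noise-free target
w_gd, b_gd = batch_gradient_descent(X_demo, y_demo)
print(w_gd, b_gd)
```

Unlike the single-sample updates above, every step here uses the exact gradient, so the iterates do not keep fluctuating around the minimum.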
    

> Define the R squared (R2) coefficient and implement a function to calculate it.


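R squared (the coefficient of determination) measures the share of the variance in y that the model explains from X. A value of 1 means a perfect fit and 0 means the model is no better than always predicting the mean of y; it can be negative when the model is worse than that baseline.<br>
It is calculated as $R^2 = 1-RSS/TSS$, where RSS is the residual sum of squares and TSS is the total sum of squares.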

In [75]:
def Rsquared(y, pred):
    rss = np.sum((y - pred)**2)
    tss = np.sum((y - np.mean(y))**2)

    return 1 - (rss/tss)
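
A small sanity check of the definition on toy values (the helper is redefined here so the sketch is self-contained; the numbers are illustrative):

```python
import numpy as np

def r_squared(y, pred):
    rss = np.sum((y - pred) ** 2)
    tss = np.sum((y - np.mean(y)) ** 2)
    return 1 - rss / tss

y_true = np.array([1.0, 2.0, 3.0, 4.0])
r2_perfect = r_squared(y_true, y_true)                       # exact predictions
r2_mean = r_squared(y_true, np.full(4, y_true.mean()))       # always predict the mean
r2_bad = r_squared(y_true, np.array([4.0, 3.0, 2.0, 1.0]))   # reversed predictions
print(r2_perfect, r2_mean, r2_bad)
```

The three cases give 1, 0, and a negative value, which shows that the metric is not bounded below by 0.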

> Make predictions with your algorithm and estimate the model with MAE, RMSE and R2 metrics.


In [76]:
my_reg = AnalytLinReg()
my_reg.fit(X_train, y_train)
my_train_pred = my_reg.predict(X_train)
my_test_pred = my_reg.predict(X_test)

In [77]:
my_rsq_train = Rsquared(y_train, my_train_pred)
my_mae_train = mean_absolute_error(y_train, my_train_pred)
my_rmse_train = root_mean_squared_error(y_train, my_train_pred)

my_rsq_test = Rsquared(y_test, my_test_pred)
my_mae_test = mean_absolute_error(y_test, my_test_pred)
my_rmse_test = root_mean_squared_error(y_test, my_test_pred)


> Initialize LinearRegression() from sklearn.linear_model, fit the model, and predict the training and test parts as in the previous lesson.


In [78]:
reg = LinearRegression()
reg.fit(X_train, y_train)
train_pred = reg.predict(X_train)
test_pred = reg.predict(X_test)

> Compare the quality metrics and make sure the difference is small (between your implementations and sklearn).


In [79]:
rsq_train = Rsquared(y_train, train_pred)
mae_train = mean_absolute_error(y_train, train_pred)
rmse_train = root_mean_squared_error(y_train, train_pred)

rsq_test = Rsquared(y_test, test_pred)
mae_test = mean_absolute_error(y_test, test_pred)
rmse_test = root_mean_squared_error(y_test, test_pred)


In [80]:
print(f'My implementation of Linear regression train metrics: R2 {my_rsq_train:.5f}, MAE {my_mae_train:.2f}, RMSE {my_rmse_train:.2f}')
print(f'My implementation of Linear regression test metrics: R2 {my_rsq_test:.5f}, MAE {my_mae_test:.2f}, RMSE {my_rmse_test:.2f}')
print(f'Linear regression train metrics: R2 {rsq_train:.5f}, MAE {mae_train:.2f}, RMSE {rmse_train:.2f}')
print(f'Linear regression test metrics: R2 {rsq_test:.5f}, MAE {mae_test:.2f}, RMSE {rmse_test:.2f}')

My implementation of Linear regression train metrics: R2 0.60206, MAE 687.03, RMSE 996.99
My implementation of Linear regression test metrics: R2 0.61650, MAE 689.01, RMSE 994.58
Linear regression train metrics: R2 0.60206, MAE 687.03, RMSE 996.99
Linear regression test metrics: R2 0.61650, MAE 689.01, RMSE 994.58


> Store the metrics as in the previous lesson in a table with columns model, train, test for MAE table, RMSE table, and R2 coefficient.



In [81]:
result_R2 = pd.DataFrame(columns=['model', 'train', 'test'])
result_MAE = pd.DataFrame(columns=['model', 'train', 'test'])
result_RMSE = pd.DataFrame(columns=['model', 'train', 'test'])

In [82]:
def metrics(model, model_name, X_train = X_train, y_train = y_train, X_test = X_test, y_test = y_test):
    model.fit(X_train, y_train)
    train_pred = model.predict(X_train)
    test_pred = model.predict(X_test)

    rsq_train = Rsquared(y_train, train_pred)
    mae_train = mean_absolute_error(y_train, train_pred)
    rmse_train = root_mean_squared_error(y_train, train_pred)

    rsq_test = Rsquared(y_test, test_pred)
    mae_test = mean_absolute_error(y_test, test_pred)
    rmse_test = root_mean_squared_error(y_test, test_pred)

    result_R2.loc[len(result_R2)] = [model_name, rsq_train, rsq_test]
    result_MAE.loc[len(result_MAE)] = [model_name, mae_train, mae_test]
    result_RMSE.loc[len(result_RMSE)] = [model_name, rmse_train, rmse_test]

    return rsq_train, mae_train, rmse_train, rsq_test, mae_test, rmse_test

In [83]:
metrics(my_reg, 'My Linear Regression')
metrics(reg, 'Linear Regression')

(0.6020619693126079,
 687.0304713287287,
 996.9914127231038,
 0.616495015905854,
 689.0101553085179,
 994.5812050999856)

In [84]:
print(result_R2)

                  model     train      test
0  My Linear Regression  0.602062  0.616495
1     Linear Regression  0.602062  0.616495


## Regularized models implementation — Ridge, Lasso, ElasticNet    
> Implement Ridge, Lasso, ElasticNet algorithms: extend the loss function with L2, L1 and both regularizations accordingly.


In [85]:
class myRidge():
    def __init__(self, alpha):
        self.alpha = alpha

    def fit(self, X, y):
        x_matrix = X.values
        y_matrix  = y.values
        x_matrix = np.c_[np.ones((x_matrix.shape[0], 1)), x_matrix]
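        I_matrix = np.identity(x_matrix.shape[1])
        # Note: the identity also covers the bias column, so the intercept is penalized too (sklearn's Ridge does not penalize it)
        weight = np.dot(np.linalg.inv(np.dot(x_matrix.T, x_matrix) + np.dot(self.alpha, I_matrix)), np.dot(x_matrix.T ,y_matrix))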
        self.w = weight[1:]
        self.b = weight[0]
                
    def predict(self, X):
        X = X.values
        return np.dot(X, self.w) +self.b

In [86]:
class myLasso():
    def __init__(self, alpha):
        self.alpha = alpha

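    def fit(self, X, y, learning_rate = 0.01, iter = 100000):
        # Use the data passed to fit(), not the global X_train / y_train
        x_matrix = X.values
        y_matrix  = y.values
        n_samples, n_features = x_matrix.shape

        self.w = np.zeros(n_features)
        self.b = 0

        for _ in range(iter):
            # numpy's randint excludes the upper bound, so pass n_samples to cover every row
            rand = random.randint(0, n_samples)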

            pred = np.dot(x_matrix[rand], self.w) + self.b 
            error = y_matrix[rand] - pred
            
            rss_w = -2 * x_matrix[rand] * error + self.alpha * np.sign(self.w)
            rss_b = -2 * error

            self.w -= learning_rate * rss_w
            self.b -= learning_rate * rss_b  
                
    def predict(self, X):
        X = X.values
        return np.dot(X, self.w) + self.b

In [87]:
class myElasticNet():
    def __init__(self, alpha, l1_ratio):
        self.alpha = alpha
        self.l1_ratio = l1_ratio

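    def fit(self, X, y, learning_rate = 0.01, iter = 100000):
        # Use the data passed to fit(), not the global X_train / y_train
        x_matrix = X.values
        y_matrix  = y.values
        n_samples, n_features = x_matrix.shape

        self.w = np.zeros(n_features)
        self.b = 0

        for _ in range(iter):
            # numpy's randint excludes the upper bound, so pass n_samples to cover every row
            rand = random.randint(0, n_samples)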

            pred = np.dot(x_matrix[rand], self.w) + self.b
            error = y_matrix[rand] - pred
            
            rss_w = (-2 * x_matrix[rand] * error) + (self.alpha * self.l1_ratio * np.sign(self.w)) + \
                (self.alpha * (1 - self.l1_ratio) * self.w)
            rss_b = -2 * error

            self.w -= learning_rate * rss_w
            self.b -= learning_rate * rss_b

    def predict(self, X):
        X = X.values
        return np.dot(X, self.w) + self.b

> Make predictions with your algorithm and estimate the model with MAE, RMSE and R2 metrics.


In [88]:
my_ridge = myRidge(alpha = 1)
my_ridge_r2_train, my_ridge_mae_train, my_ridge_rmse_train, \
    my_ridge_r2_test, my_ridge_mae_test, my_ridge_rmse_test = metrics(my_ridge, 'My Ridge')

In [89]:
my_lasso = myLasso(alpha = 0.1)
my_lasso_r2_train, my_lasso_mae_train, my_lasso_rmse_train, \
    my_lasso_r2_test, my_lasso_mae_test, my_lasso_rmse_test = metrics(my_lasso, 'My Lasso')

In [90]:
my_elastic_net = myElasticNet(alpha=0.1, l1_ratio=0.5)
my_elastic_net_r2_train, my_elastic_net_mae_train, my_elastic_net_rmse_train, \
    my_elastic_net_r2_test, my_elastic_net_mae_test, my_elastic_net_rmse_test = metrics(my_elastic_net, 'My Elastic Net')

> Initialize Ridge(), Lasso(), and ElasticNet() from sklearn.linear_model, fit the model, and make predictions for the training and test samples as in the previous lesson.


In [91]:
ridge = Ridge(alpha = 1)
ridge_r2_train, ridge_mae_train, ridge_rmse_train, \
    ridge_r2_test, ridge_mae_test, ridge_rmse_test = metrics(ridge, 'Ridge')

In [92]:
# 1.89 is alpha with lowest MAE
lasso = Lasso(alpha = 0.1)
lasso_r2_train, lasso_mae_train, lasso_rmse_train, \
    lasso_r2_test, lasso_mae_test, lasso_rmse_test = metrics(lasso, 'Lasso')

In [93]:
elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.5)
elastic_net_r2_train, elastic_net_mae_train, elastic_net_rmse_train, \
    elastic_net_r2_test, elastic_net_mae_test, elastic_net_rmse_test = metrics(elastic_net, 'Elastic Net')

> Compare quality metrics and make sure the difference is small (between your implementations and sklearn).


In [94]:
print(f'My implementation of Ridge train metrics: R2 {my_ridge_r2_train:.5f}, MAE {my_ridge_mae_train:.2f}, RMSE {my_ridge_rmse_train:.2f}')
print(f'My implementation of Ridge test metrics: R2 {my_ridge_r2_test:.5f}, MAE {my_ridge_mae_test:.2f}, RMSE {my_ridge_rmse_test:.2f}')
print(f'Ridge train metrics: R2 {ridge_r2_train:.5f}, MAE {ridge_mae_train:.2f}, RMSE {ridge_rmse_train:.2f}')
print(f'Ridge test metrics: R2 {ridge_r2_test:.5f}, MAE {ridge_mae_test:.2f}, RMSE {ridge_rmse_test:.2f}')

My implementation of Ridge train metrics: R2 0.60206, MAE 687.02, RMSE 996.99
My implementation of Ridge test metrics: R2 0.61649, MAE 689.00, RMSE 994.58
Ridge train metrics: R2 0.60206, MAE 687.03, RMSE 996.99
Ridge test metrics: R2 0.61649, MAE 689.01, RMSE 994.58


In [95]:
print(f'My implementation of Lasso train metrics: R2 {my_lasso_r2_train:.5f}, MAE {my_lasso_mae_train:.2f}, RMSE {my_lasso_rmse_train:.2f}')
print(f'My implementation of Lasso test metrics: R2 {my_lasso_r2_test:.5f}, MAE {my_lasso_mae_test:.2f}, RMSE {my_lasso_rmse_test:.2f}')
print(f'Lasso train metrics: R2 {lasso_r2_train:.5f}, MAE {lasso_mae_train:.2f}, RMSE {lasso_rmse_train:.2f}')
print(f'Lasso test metrics: R2 {lasso_r2_test:.5f}, MAE {lasso_mae_test:.2f}, RMSE {lasso_rmse_test:.2f}')

My implementation of Lasso train metrics: R2 0.39467, MAE 956.41, RMSE 1229.64
My implementation of Lasso test metrics: R2 0.41179, MAE 965.51, RMSE 1231.75
Lasso train metrics: R2 0.60206, MAE 686.95, RMSE 996.99
Lasso test metrics: R2 0.61648, MAE 688.96, RMSE 994.60
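
The gap between the SGD-based Lasso and sklearn's Lasso in the output above is probably caused mostly by the noisy single-sample updates with a fixed learning rate. In addition, sklearn minimizes $\frac{1}{2n}\lVert y - Xw \rVert_2^2 + \alpha\lVert w \rVert_1$ with coordinate descent, so the same `alpha` does not mean the same penalty strength. A minimal coordinate-descent sketch for that objective is given below; the function names, the iteration count, and the synthetic data are assumptions, and this is not sklearn's actual implementation.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimize (1 / (2n)) * ||y - Xw - b||^2 + alpha * ||w||_1 by cyclic updates."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centering handles the unpenalized intercept
    col_sq = (Xc ** 2).sum(axis=0) / n
    w = np.zeros(d)
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:             # constant column carries no information
                continue
            residual = yc - Xc @ w + Xc[:, j] * w[j]   # residual without feature j
            rho = Xc[:, j] @ residual / n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
    b = y_mean - x_mean @ w
    return w, b

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))                      # synthetic features
y_demo = X_demo @ np.array([3.0, 0.0, -2.0]) + 1.0      # noise-free target
w_small, b_small = lasso_coordinate_descent(X_demo, y_demo, alpha=0.0)
w_large, b_large = lasso_coordinate_descent(X_demo, y_demo, alpha=100.0)
print(w_small, b_small)
print(w_large, b_large)
```

With `alpha=0.0` the sketch reduces to ordinary least squares, and with a very large `alpha` every weight is set to exactly 0, leaving only the intercept.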


In [96]:
print(f'My implementation of Elastic Net train metrics: R2 {my_elastic_net_r2_train:.5f}, MAE {my_elastic_net_mae_train:.2f}, RMSE {my_elastic_net_rmse_train:.2f}')
print(f'My implementation of Elastic Net test metrics: R2 {my_elastic_net_r2_test:.5f}, MAE {my_elastic_net_mae_test:.2f}, RMSE {my_elastic_net_rmse_test:.2f}')
print(f'Elastic Net train metrics: R2 {elastic_net_r2_train:.5f}, MAE {elastic_net_mae_train:.2f}, RMSE {elastic_net_rmse_train:.2f}')
print(f'Elastic Net test metrics: R2 {elastic_net_r2_test:.5f}, MAE {elastic_net_mae_test:.2f}, RMSE {elastic_net_rmse_test:.2f}')

My implementation of Elastic Net train metrics: R2 0.55936, MAE 717.82, RMSE 1049.12
My implementation of Elastic Net test metrics: R2 0.57191, MAE 719.56, RMSE 1050.80
Elastic Net train metrics: R2 0.59212, MAE 689.40, RMSE 1009.37
Elastic Net test metrics: R2 0.60360, MAE 693.18, RMSE 1011.17


## Feature normalization


> First, write several examples of why and where feature normalization is mandatory and vice versa.


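1. Mandatory for regularized models (Ridge, Lasso, ElasticNet): the penalty acts on the weights, so features with larger scales are penalized differently from features with smaller scales.
2. Mandatory for gradient-descent and distance-based methods when features have different scales: the feature with the larger scale dominates the updates or the distances.
3. Not needed when all features already share the same scale, for example binary 0/1 features; plain linear regression solved analytically gives the same predictions with or without scaling.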
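
The points above can be illustrated with a small experiment: rescaling one feature leaves ordinary least-squares predictions unchanged but changes ridge predictions. The helper name, the penalty value, and the synthetic data are assumptions made for this sketch, which omits the intercept for brevity.

```python
import numpy as np

def ridge_fit(x_matrix, y_vector, lam):
    """Closed-form ridge without an intercept; lam = 0 gives ordinary least squares."""
    n_features = x_matrix.shape[1]
    return np.linalg.solve(x_matrix.T @ x_matrix + lam * np.eye(n_features),
                           x_matrix.T @ y_vector)

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))                        # synthetic features
y_demo = X_demo @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=50)
X_rescaled = X_demo * np.array([1000.0, 1.0])            # same data, first feature in other units

ols_same = np.allclose(X_demo @ ridge_fit(X_demo, y_demo, 0.0),
                       X_rescaled @ ridge_fit(X_rescaled, y_demo, 0.0))
ridge_same = np.allclose(X_demo @ ridge_fit(X_demo, y_demo, 10.0),
                         X_rescaled @ ridge_fit(X_rescaled, y_demo, 10.0))
print(ols_same, ridge_same)
```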

> Let's consider the first of the classical normalization methods — MinMaxScaler. Write a mathematical formula for this method.


$X_{scaled} = \frac{X - \min(X)}{\max(X)-\min(X)}$

> Implement your own function for MinMaxScaler feature normalization.


In [97]:
def myMinMaxScaler(X, feature_range = (0, 1), axis = None):
    X = X.values
    a, b = feature_range
    maxval = np.max(X, axis=0)
    minval = np.min(X, axis=0)

    scaled_X = a + (X - minval) * (b - a) / (maxval - minval)

    return scaled_X
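
The function above computes the minimum and maximum from the data it receives. For a test set, the usual practice is to reuse the statistics learned on the training set, as the `fit_transform` / `transform` pair of sklearn does later in this notebook. A minimal sketch of that split, with a hypothetical class name and toy data:

```python
import numpy as np

class SimpleMinMaxScaler:
    """Hypothetical helper: learns min/max on one dataset and reuses them on another."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.min_ = X.min(axis=0)
        span = X.max(axis=0) - self.min_
        self.span_ = np.where(span == 0, 1.0, span)   # avoid division by zero for constant columns
        return self

    def transform(self, X):
        return (np.asarray(X, dtype=float) - self.min_) / self.span_

train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])   # toy training data
test = np.array([[2.5, 40.0]])                               # toy test row, partly outside the train range
scaler = SimpleMinMaxScaler().fit(train)
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)
print(train_scaled)
print(test_scaled)
```

Test values outside the training range map outside [0, 1], which is expected when the training statistics are reused.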

> Initialize MinMaxScaler() from sklearn.preprocessing.


In [98]:
minmax_scaler = MinMaxScaler()

> Compare the feature normalization with your own method and with sklearn.


In [99]:
minmax_scaler.fit_transform(X_train)

array([[0.1  , 0.125, 0.   , ..., 0.   , 0.   , 0.   ],
       [0.1  , 0.125, 0.   , ..., 0.   , 0.   , 0.   ],
       [0.1  , 0.25 , 0.   , ..., 0.   , 0.   , 1.   ],
       ...,
       [0.1  , 0.25 , 0.   , ..., 0.   , 0.   , 0.   ],
       [0.1  , 0.125, 0.   , ..., 0.   , 0.   , 0.   ],
       [0.1  , 0.   , 0.   , ..., 0.   , 0.   , 0.   ]])

In [100]:
myMinMaxScaler(X_train)

array([[0.1  , 0.125, 0.   , ..., 0.   , 0.   , 0.   ],
       [0.1  , 0.125, 0.   , ..., 0.   , 0.   , 0.   ],
       [0.1  , 0.25 , 0.   , ..., 0.   , 0.   , 1.   ],
       ...,
       [0.1  , 0.25 , 0.   , ..., 0.   , 0.   , 0.   ],
       [0.1  , 0.125, 0.   , ..., 0.   , 0.   , 0.   ],
       [0.1  , 0.   , 0.   , ..., 0.   , 0.   , 0.   ]])

> Repeat the steps from b to e for another normalization method StandardScaler.



$X_{scaled} = \frac{X - m}{std}$, where $m$ is the mean and $std$ is the standard deviation of each feature.

In [101]:
def myStandardScaler(X):
    X = X.values
    m = np.mean(X, axis=0)
    std = np.std(X, axis=0)

    scaled_X = (X - m)/std

    return scaled_X

In [102]:
SS = StandardScaler()
SS.fit_transform(X_train)

array([[-0.42709754, -0.48664247, -0.61022649, ..., -0.23377249,
        -0.23475508, -0.21815877],
       [-0.42709754, -0.48664247, -0.61022649, ..., -0.23377249,
        -0.23475508, -0.21815877],
       [-0.42709754,  0.42436052, -0.61022649, ..., -0.23377249,
        -0.23475508,  4.58381752],
       ...,
       [-0.42709754,  0.42436052, -0.61022649, ..., -0.23377249,
        -0.23475508, -0.21815877],
       [-0.42709754, -0.48664247, -0.61022649, ..., -0.23377249,
        -0.23475508, -0.21815877],
       [-0.42709754, -1.39764546, -0.61022649, ..., -0.23377249,
        -0.23475508, -0.21815877]])

In [103]:
myStandardScaler(X_train)

array([[-0.42709754, -0.48664247, -0.61022649, ..., -0.23377249,
        -0.23475508, -0.21815877],
       [-0.42709754, -0.48664247, -0.61022649, ..., -0.23377249,
        -0.23475508, -0.21815877],
       [-0.42709754,  0.42436052, -0.61022649, ..., -0.23377249,
        -0.23475508,  4.58381752],
       ...,
       [-0.42709754,  0.42436052, -0.61022649, ..., -0.23377249,
        -0.23475508, -0.21815877],
       [-0.42709754, -0.48664247, -0.61022649, ..., -0.23377249,
        -0.23475508, -0.21815877],
       [-0.42709754, -1.39764546, -0.61022649, ..., -0.23377249,
        -0.23475508, -0.21815877]])

## Fit models with normalization


> Fit all models — Linear Regression, Ridge, Lasso, and ElasticNet — with MinMaxScaler.


In [104]:
minmax_X_train = minmax_scaler.fit_transform(X_train)
minmax_X_test = minmax_scaler.transform(X_test)

metrics(reg, 'Linear Regression MinMaxScaler', minmax_X_train, y_train, minmax_X_test, y_test)
metrics(ridge, 'Ridge MinMaxScaler', X_train=minmax_X_train, y_train=y_train, X_test = minmax_X_test, y_test = y_test)
metrics(lasso, 'Lasso MinMaxScaler', X_train=minmax_X_train, y_train=y_train, X_test=minmax_X_test, y_test=y_test)
metrics(elastic_net, 'Elastic Net MinMaxScaler', minmax_X_train, y_train, minmax_X_test, y_test)

(0.33370615159549954,
 894.606791001895,
 1290.0798628820205,
 0.33845432858818636,
 906.6991178397658,
 1306.2751280506643)

In [105]:
result_MAE

Unnamed: 0,model,train,test
0,My Linear Regression,687.030471,689.010155
1,Linear Regression,687.030471,689.010155
2,My Ridge,687.022686,689.004305
3,My Lasso,956.4072,965.507767
4,My Elastic Net,717.818027,719.555826
5,Ridge,687.026383,689.007983
6,Lasso,686.954576,688.960619
7,Elastic Net,689.400127,693.175419
8,Linear Regression MinMaxScaler,687.030471,689.010155
9,Ridge MinMaxScaler,687.156337,689.224459


> Fit all models — Linear Regression, Ridge, Lasso, and ElasticNet — with StandardScaler.


In [106]:
SS_X_train = SS.fit_transform(X_train)
SS_X_test = SS.transform(X_test)

metrics(reg, 'Linear Regression StandardScaler', SS_X_train, y_train, SS_X_test, y_test)
metrics(ridge, 'Ridge StandardScaler', SS_X_train, y_train, SS_X_test, y_test)
metrics(lasso, 'Lasso StandardScaler', SS_X_train, y_train, SS_X_test, y_test)
metrics(elastic_net, 'Elastic Net StandardScaler', SS_X_train, y_train, SS_X_test, y_test)

(0.6013633493931274,
 685.5754624907606,
 997.8661879055473,
 0.6153043411400323,
 688.1740607793716,
 996.1239555821163)

## Overfit models
> Let's look at an overfitted model in practice. From theory, you know that polynomial regression is easy to overfit. So let's create a toy example and see how regularization works in real life.<br>
> In the previous lesson, we created polynomial features with degree 10. Here we repeat these steps from the previous lesson, remembering that we have only 3 basic features: 'bathrooms', 'bedrooms', 'interest_level'.

In [107]:
poly = PolynomialFeatures(degree=10)
basic_X_train = X_train[['bathrooms', 'bedrooms', 'interest_level']]
basic_X_test = X_test[['bathrooms', 'bedrooms', 'interest_level']]

In [108]:
poly_X_train = poly.fit_transform(basic_X_train)
poly_X_test = poly.transform(basic_X_test)
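
To get a feeling for how large the polynomial feature set becomes, the sketch below counts the monomials of total degree 0 to 10 in 3 variables, which corresponds to the number of columns produced with the default settings of `PolynomialFeatures` (bias column included). The variable names are illustrative.

```python
from itertools import combinations_with_replacement
from math import comb

n_features, degree = 3, 10

# One output column per monomial of total degree 0..10 (degree 0 is the bias column)
n_terms = sum(
    1
    for d in range(degree + 1)
    for _ in combinations_with_replacement(range(n_features), d)
)
print(n_terms, comb(n_features + degree, degree))
```

Both counts equal 286, so three basic features turn into 286 strongly correlated columns, which is why this setup is prone to overfitting.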

> And train and fit all our implemented algorithms — Linear Regression, Ridge, Lasso, and ElasticNet — on a set of polynomial features.


In [109]:
metrics(reg, 'Linear Regression PolynomialFeatures', poly_X_train, y_train, poly_X_test, y_test)
metrics(ridge, 'Ridge PolynomialFeatures', poly_X_train, y_train, poly_X_test, y_test)
metrics(lasso, 'Lasso PolynomialFeatures', poly_X_train, y_train, poly_X_test, y_test)
metrics(elastic_net, 'Elastic Net PolynomialFeatures', poly_X_train, y_train, poly_X_test, y_test)

  model = cd_fast.enet_coordinate_descent(
  model = cd_fast.enet_coordinate_descent(


(0.5708453266841035,
 725.2751966443968,
 1035.3581764404166,
 0.5405532141567237,
 736.6915405476283,
 1088.6100780446063)

> Analyze the results and select the best model according to your opinion.


In [110]:
result_R2

Unnamed: 0,model,train,test
0,My Linear Regression,0.602062,0.616495
1,Linear Regression,0.602062,0.616495
2,My Ridge,0.602062,0.616493
3,My Lasso,0.394672,0.411789
4,My Elastic Net,0.559362,0.571913
5,Ridge,0.602062,0.616493
6,Lasso,0.60206,0.616477
7,Elastic Net,0.592117,0.603598
8,Linear Regression MinMaxScaler,0.602062,0.616495
9,Ridge MinMaxScaler,0.602024,0.616333


In [111]:
result_MAE

Unnamed: 0,model,train,test
0,My Linear Regression,687.030471,689.010155
1,Linear Regression,687.030471,689.010155
2,My Ridge,687.022686,689.004305
3,My Lasso,956.4072,965.507767
4,My Elastic Net,717.818027,719.555826
5,Ridge,687.026383,689.007983
6,Lasso,686.954576,688.960619
7,Elastic Net,689.400127,693.175419
8,Linear Regression MinMaxScaler,687.030471,689.010155
9,Ridge MinMaxScaler,687.156337,689.224459


In [112]:
result_RMSE

Unnamed: 0,model,train,test
0,My Linear Regression,996.991413,994.581205
1,Linear Regression,996.991413,994.581205
2,My Ridge,996.991418,994.583214
3,My Lasso,1229.64358,1231.746481
4,My Elastic Net,1049.118881,1050.801738
5,Ridge,996.991419,994.58412
6,Lasso,996.993471,994.604735
7,Elastic Net,1009.372018,1011.165845
8,Linear Regression MinMaxScaler,996.991413,994.581205
9,Ridge MinMaxScaler,997.038846,994.791755


Lasso MinMaxScaler seems to be the best model, as on average it shows the best results on the test dataset.

## Native models
> Calculate the mean and median metrics from the previous lesson and add the results to the final dataframe.



In [113]:
mean_train = [y_train.mean()]*len(y_train)
mean_test =[y_test.mean()]*len(y_test)
median_train = [y_train.median()]*len(y_train)
median_test = [y_test.median()]*len(y_test)

r2_train_mean = r2_score(y_train, mean_train)
r2_test_mean = r2_score(y_test, mean_test)
r2_train_median = r2_score(y_train, median_train)
r2_test_median = r2_score(y_test, median_test)

mae_train_mean = mean_absolute_error(y_train, mean_train)
mae_test_mean = mean_absolute_error(y_test, mean_test)
mae_train_median = mean_absolute_error(y_train, median_train)
mae_test_median = mean_absolute_error(y_test, median_test)

rmse_train_median = root_mean_squared_error(y_train, median_train)
rmse_train_mean = root_mean_squared_error(y_train, mean_train)
rmse_test_median = root_mean_squared_error(y_test, median_test)
rmse_test_mean = root_mean_squared_error(y_test, mean_test)

result_R2.loc[len(result_R2)] = ['Native_Mean', r2_train_mean, r2_test_mean]
result_MAE.loc[len(result_MAE)] = ['Native_Mean', mae_train_mean, mae_test_mean]
result_RMSE.loc[len(result_RMSE)] = ['Native_Mean', rmse_train_mean, rmse_test_mean]
result_R2.loc[len(result_R2)] = ['Native_Median', r2_train_median, r2_test_median]
result_MAE.loc[len(result_MAE)] = ['Native_Median', mae_train_median, mae_test_median]
result_RMSE.loc[len(result_RMSE)] = ['Native_Median', rmse_train_median, rmse_test_median]
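
The two baselines behave differently by construction: among constant predictions, the mean minimizes the squared error (so its RMSE is lowest and its R2 on its own data is 0), while the median minimizes the absolute error. A small sketch on a toy right-skewed sample, where the values and helper names are assumptions:

```python
import numpy as np

prices = np.array([1500.0, 2000.0, 2500.0, 3000.0, 9000.0])   # toy right-skewed sample

def mae(y, pred):
    return np.mean(np.abs(y - pred))

def rmse(y, pred):
    return np.sqrt(np.mean((y - pred) ** 2))

mean_value, median_value = prices.mean(), np.median(prices)
mae_mean, mae_median = mae(prices, mean_value), mae(prices, median_value)
rmse_mean, rmse_median = rmse(prices, mean_value), rmse(prices, median_value)
print(mae_mean, mae_median, rmse_mean, rmse_median)
```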

## Compare results
> Print your final tables


In [114]:
result_R2

Unnamed: 0,model,train,test
0,My Linear Regression,0.602062,0.616495
1,Linear Regression,0.602062,0.616495
2,My Ridge,0.602062,0.616493
3,My Lasso,0.394672,0.411789
4,My Elastic Net,0.559362,0.571913
5,Ridge,0.602062,0.616493
6,Lasso,0.60206,0.616477
7,Elastic Net,0.592117,0.603598
8,Linear Regression MinMaxScaler,0.602062,0.616495
9,Ridge MinMaxScaler,0.602024,0.616333


In [115]:
result_MAE

Unnamed: 0,model,train,test
0,My Linear Regression,687.030471,689.010155
1,Linear Regression,687.030471,689.010155
2,My Ridge,687.022686,689.004305
3,My Lasso,956.4072,965.507767
4,My Elastic Net,717.818027,719.555826
5,Ridge,687.026383,689.007983
6,Lasso,686.954576,688.960619
7,Elastic Net,689.400127,693.175419
8,Linear Regression MinMaxScaler,687.030471,689.010155
9,Ridge MinMaxScaler,687.156337,689.224459


In [116]:
result_RMSE

Unnamed: 0,model,train,test
0,My Linear Regression,996.991413,994.581205
1,Linear Regression,996.991413,994.581205
2,My Ridge,996.991418,994.583214
3,My Lasso,1229.64358,1231.746481
4,My Elastic Net,1049.118881,1050.801738
5,Ridge,996.991419,994.58412
6,Lasso,996.993471,994.604735
7,Elastic Net,1009.372018,1011.165845
8,Linear Regression MinMaxScaler,996.991413,994.581205
9,Ridge MinMaxScaler,997.038846,994.791755


> What is the best model?


Lasso MinMaxScaler is the best model

> Which is the most stable model?



Lasso MinMaxScaler is also very stable, showing good results on all metrics.