# I Introducion

***№1***

Для задачи линейной регрессии:
$$f(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + b$$


Целевая функция (MSE):
$$J(w) = \frac{1}{N} \|Xw - y\|^2$$

Для нахождения минимума:
$$\frac{\partial J}{\partial w} = \frac{2}{N} X^T (Xw - y) = 0$$

Аналитическое решение:
$$w = (X^T X)^{-1} X'^T y$$

***№2***

Регуляризация добавляет штраф за слишком большие значения весов:

L2 — then the linear model is called Ridge model
$$R(\boldsymbol{\theta}) = \| \boldsymbol{\theta}\|_2^2 = \sum_{i=1}^d \theta_i^2$$

L1 — then the linear model is called Lasso model
$$R(\boldsymbol{\theta}) = \| \boldsymbol{\theta}\|_1 = \sum_{i=1}^d |\theta_i|$$

***№3***

L1 регуляризация способствует разреженности весов, что означает, что многие веса становятся равными нулю. Это происходит потому, что L1 регуляризация добавляет штраф за большие значения весов, что заставляет модель "отключать" некоторые признаки, которые не вносят значительного вклада в предсказание. В результате, после подгонки модели, мы получаем набор признаков, которые наиболее важны для задачи, что упрощает интерпретацию модели и уменьшает её сложность.

***№4***

Для того чтобы использовать линейные модели для подгонки нелинейных зависимостей можно добавить полиномиальные признаки в матрицу X. Например, если у нас есть признак x, мы можем добавить $x^2$, $x^3$ и т.д. Это позволяет модели захватывать нелинейные зависимости.

## Import modules

In [176]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from collections import Counter
from sklearn.preprocessing import LabelEncoder, MinMaxScaler, StandardScaler, PolynomialFeatures
from sklearn.linear_model import LinearRegression, Lasso, Ridge, ElasticNet
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

## Read Data

In [2]:
data = pd.read_json('data/train.json')
data.head(3)

Unnamed: 0,bathrooms,bedrooms,building_id,created,description,display_address,features,latitude,listing_id,longitude,manager_id,photos,price,street_address,interest_level
4,1.0,1,8579a0b0d54db803821a35a4a615e97a,2016-06-16 05:55:27,Spacious 1 Bedroom 1 Bathroom in Williamsburg!...,145 Borinquen Place,"[Dining Room, Pre-War, Laundry in Building, Di...",40.7108,7170325,-73.9539,a10db4590843d78c784171a107bdacb4,[https://photos.renthop.com/2/7170325_3bb5ac84...,2400,145 Borinquen Place,medium
6,1.0,2,b8e75fc949a6cd8225b455648a951712,2016-06-01 05:44:33,BRAND NEW GUT RENOVATED TRUE 2 BEDROOMFind you...,East 44th,"[Doorman, Elevator, Laundry in Building, Dishw...",40.7513,7092344,-73.9722,955db33477af4f40004820b4aed804a0,[https://photos.renthop.com/2/7092344_7663c19a...,3800,230 East 44th,low
9,1.0,2,cd759a988b8f23924b5a2058d5ab2b49,2016-06-14 15:19:59,**FLEX 2 BEDROOM WITH FULL PRESSURIZED WALL**L...,East 56th Street,"[Doorman, Elevator, Laundry in Building, Laund...",40.7575,7158677,-73.9625,c8b10a317b766204f08e613cef4ce7a0,[https://photos.renthop.com/2/7158677_c897a134...,3495,405 East 56th Street,medium


In [3]:
data_test = pd.read_json('data/test.json')
data_test.head(3)

Unnamed: 0,bathrooms,bedrooms,building_id,created,description,display_address,features,latitude,listing_id,longitude,manager_id,photos,price,street_address
0,1.0,1,79780be1514f645d7e6be99a3de696c5,2016-06-11 05:29:41,Large with awesome terrace--accessible via bed...,Suffolk Street,"[Elevator, Laundry in Building, Laundry in Uni...",40.7185,7142618,-73.9865,b1b1852c416d78d7765d746cb1b8921f,[https://photos.renthop.com/2/7142618_1c45a2c8...,2950,99 Suffolk Street
1,1.0,2,0,2016-06-24 06:36:34,Prime Soho - between Bleecker and Houston - Ne...,Thompson Street,"[Pre-War, Dogs Allowed, Cats Allowed]",40.7278,7210040,-74.0,d0b5648017832b2427eeb9956d966a14,[https://photos.renthop.com/2/7210040_d824cc71...,2850,176 Thompson Street
2,1.0,0,0,2016-06-17 01:23:39,Spacious studio in Prime Location. Cleanbuildi...,Sullivan Street,"[Pre-War, Dogs Allowed, Cats Allowed]",40.726,7174566,-74.0026,e6472c7237327dd3903b3d6f6a94515a,[https://photos.renthop.com/2/7174566_ba3a35c5...,2295,115 Sullivan Street


### preprocessing data

In [5]:
#encoding interest level
encoder = LabelEncoder()
data['bathrooms'] = data['bathrooms'].astype(int)
data['interest_level'] = encoder.fit_transform(data['interest_level'])

# II Data Analysis

### Get all features in column 'features'

In [6]:
features_all = []

for features in data['features']:
    if features:
        features_all += features
    

features_all

['Dining Room',
 'Pre-War',
 'Laundry in Building',
 'Dishwasher',
 'Hardwood Floors',
 'Dogs Allowed',
 'Cats Allowed',
 'Doorman',
 'Elevator',
 'Laundry in Building',
 'Dishwasher',
 'Hardwood Floors',
 'No Fee',
 'Doorman',
 'Elevator',
 'Laundry in Building',
 'Laundry in Unit',
 'Dishwasher',
 'Hardwood Floors',
 'Doorman',
 'Elevator',
 'Fitness Center',
 'Laundry in Building',
 'Doorman',
 'Elevator',
 'Loft',
 'Dishwasher',
 'Hardwood Floors',
 'No Fee',
 'Fireplace',
 'Laundry in Unit',
 'Dishwasher',
 'Hardwood Floors',
 'No Fee',
 'Elevator',
 'Laundry in Building',
 'Dishwasher',
 'Hardwood Floors',
 'No Fee',
 'Hardwood Floors',
 'Cats Allowed',
 'Dogs Allowed',
 'Doorman',
 'Elevator',
 'Laundry in Building',
 'Dogs Allowed',
 'Cats Allowed',
 'Roof Deck',
 'Doorman',
 'Elevator',
 'Fitness Center',
 'Pre-War',
 'Laundry in Building',
 'High Speed Internet',
 'Dishwasher',
 'Hardwood Floors',
 'No Fee',
 'Dogs Allowed',
 'Cats Allowed',
 'Swimming Pool',
 'Roof Deck',
 '

### Count of unique features in list

In [7]:
counter = Counter(features_all)
len(counter)

1556

### 20 most common features

In [8]:
most_common_features = counter.most_common(20)
most_common_features
for feature_name, _ in most_common_features:
    print(f"{feature_name}")

Elevator
Cats Allowed
Hardwood Floors
Dogs Allowed
Doorman
Dishwasher
No Fee
Laundry in Building
Fitness Center
Pre-War
Laundry in Unit
Roof Deck
Outdoor Space
Dining Room
High Speed Internet
Balcony
Swimming Pool
Laundry In Building
New Construction
Terrace


### Create feature list with 22 features and new features in dataset

In [9]:
features_list = [feature_name.replace(' ', '') 
                 for feature_name, _ in most_common_features]
features_list

['Elevator',
 'CatsAllowed',
 'HardwoodFloors',
 'DogsAllowed',
 'Doorman',
 'Dishwasher',
 'NoFee',
 'LaundryinBuilding',
 'FitnessCenter',
 'Pre-War',
 'LaundryinUnit',
 'RoofDeck',
 'OutdoorSpace',
 'DiningRoom',
 'HighSpeedInternet',
 'Balcony',
 'SwimmingPool',
 'LaundryInBuilding',
 'NewConstruction',
 'Terrace']

### Create a dataframe with 22 features

In [10]:
for feature in features_list:
    data[feature] = data['features'].apply(lambda x: 1 if feature 
                                            in [f.replace(' ', '') for f in x] else 0)
    data_test[feature] = data_test['features'].apply(lambda x: 1 if feature 
                                            in [f.replace(' ', '') for f in x] else 0)
    
data.head(3)

Unnamed: 0,bathrooms,bedrooms,building_id,created,description,display_address,features,latitude,listing_id,longitude,...,LaundryinUnit,RoofDeck,OutdoorSpace,DiningRoom,HighSpeedInternet,Balcony,SwimmingPool,LaundryInBuilding,NewConstruction,Terrace
4,1,1,8579a0b0d54db803821a35a4a615e97a,2016-06-16 05:55:27,Spacious 1 Bedroom 1 Bathroom in Williamsburg!...,145 Borinquen Place,"[Dining Room, Pre-War, Laundry in Building, Di...",40.7108,7170325,-73.9539,...,0,0,0,1,0,0,0,0,0,0
6,1,2,b8e75fc949a6cd8225b455648a951712,2016-06-01 05:44:33,BRAND NEW GUT RENOVATED TRUE 2 BEDROOMFind you...,East 44th,"[Doorman, Elevator, Laundry in Building, Dishw...",40.7513,7092344,-73.9722,...,0,0,0,0,0,0,0,0,0,0
9,1,2,cd759a988b8f23924b5a2058d5ab2b49,2016-06-14 15:19:59,**FLEX 2 BEDROOM WITH FULL PRESSURIZED WALL**L...,East 56th Street,"[Doorman, Elevator, Laundry in Building, Laund...",40.7575,7158677,-73.9625,...,1,0,0,0,0,0,0,0,0,0


In [11]:
data_test.head(3)

Unnamed: 0,bathrooms,bedrooms,building_id,created,description,display_address,features,latitude,listing_id,longitude,...,LaundryinUnit,RoofDeck,OutdoorSpace,DiningRoom,HighSpeedInternet,Balcony,SwimmingPool,LaundryInBuilding,NewConstruction,Terrace
0,1.0,1,79780be1514f645d7e6be99a3de696c5,2016-06-11 05:29:41,Large with awesome terrace--accessible via bed...,Suffolk Street,"[Elevator, Laundry in Building, Laundry in Uni...",40.7185,7142618,-73.9865,...,1,0,1,0,0,0,0,0,0,0
1,1.0,2,0,2016-06-24 06:36:34,Prime Soho - between Bleecker and Houston - Ne...,Thompson Street,"[Pre-War, Dogs Allowed, Cats Allowed]",40.7278,7210040,-74.0,...,0,0,0,0,0,0,0,0,0,0
2,1.0,0,0,2016-06-17 01:23:39,Spacious studio in Prime Location. Cleanbuildi...,Sullivan Street,"[Pre-War, Dogs Allowed, Cats Allowed]",40.726,7174566,-74.0026,...,0,0,0,0,0,0,0,0,0,0


In [12]:
## add features bathrooms and bedrooms
features_list += ['bathrooms', 'bedrooms']
target = ['price']

In [13]:
data = data[features_list + ['interest_level'] + target]
data_test = data_test[features_list + target]
data.head()

Unnamed: 0,Elevator,CatsAllowed,HardwoodFloors,DogsAllowed,Doorman,Dishwasher,NoFee,LaundryinBuilding,FitnessCenter,Pre-War,...,HighSpeedInternet,Balcony,SwimmingPool,LaundryInBuilding,NewConstruction,Terrace,bathrooms,bedrooms,interest_level,price
4,0,1,1,1,0,1,0,1,0,1,...,0,0,0,0,0,0,1,1,2,2400
6,1,0,1,0,1,1,1,1,0,0,...,0,0,0,0,0,0,1,2,1,3800
9,1,0,1,0,1,1,0,1,0,0,...,0,0,0,0,0,0,1,2,2,3495
10,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,3,2,3000
15,1,0,0,0,1,0,0,1,1,0,...,0,0,0,0,0,0,1,0,1,2795


In [14]:
data_test.head(3)

Unnamed: 0,Elevator,CatsAllowed,HardwoodFloors,DogsAllowed,Doorman,Dishwasher,NoFee,LaundryinBuilding,FitnessCenter,Pre-War,...,DiningRoom,HighSpeedInternet,Balcony,SwimmingPool,LaundryInBuilding,NewConstruction,Terrace,bathrooms,bedrooms,price
0,1,0,1,0,0,1,0,1,0,0,...,0,0,0,0,0,0,0,1.0,1,2950
1,0,1,0,1,0,0,0,0,0,1,...,0,0,0,0,0,0,0,1.0,2,2850
2,0,1,0,1,0,0,0,0,0,1,...,0,0,0,0,0,0,0,1.0,0,2295


# III Models implementation — Linear regression

In [15]:
# initialize random seed
np.random.seed(21)

In [16]:
class MyLinearRegression:

    def __init__(self):
        self.w = None

    def fit(self, X, y):
        X = np.array(X)
        y = np.array(y)

        assert len(X.shape) == 2 and len(y.shape) == 1
        assert X.shape[0] == y.shape[0]

        y = y[:, np.newaxis] # (l, 1)
        l, n = X.shape
        X_train = np.hstack((X, np.ones((l, 1))))

        self.w = np.linalg.inv(X_train.T @ X_train) @ X_train.T @ y

        return self

    def predict(self, X): 
        l, n = X.shape
        X_test = np.hstack((X, np.ones((l, 1))))
                            
        y_pred = X_test @ self.w
                           
        return y_pred

In [17]:
class MyGradientLinearRegression(LinearRegression):
    
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        
    def fit(self, X, y, learning_rate = 0.01, n_iter = 100):
        X = np.array(X)
        y = np.array(y)

        assert len(X.shape) == 2 and len(y.shape) == 1
        assert X.shape[0] == y.shape[0]

        y = y[:, np.newaxis] # (l, 1)
        l, n = X.shape
        X_train = np.hstack((X, np.ones((l, 1))))
        
        self.w = np.random.randn(n + 1, 1)
        
        for iter_num in range(n_iter):
            y_pred = self.predict(X)
            
            gradient = self._calc_gradient(X_train, y, y_pred)
            
            self.w -= learning_rate * gradient
        
        return self
    
    def _calc_gradient(self, X, y, y_pred):
        grad = (2. / X.shape[0]) * (X.T @ (y_pred - y))

        return grad
    
    
class MySGDLinearRegression(MyGradientLinearRegression):
    
    def __init__(self, n_sample=10, **kwargs):
        super().__init__(**kwargs) 
        self.w = None
        self.n_sample = n_sample

    def _calc_gradient(self, X, y, y_pred):
        inds = np.random.choice(np.arange(X.shape[0]), size=self.n_sample, replace=False)

        grad = 2 / self.n_sample * X[inds].T @ (y_pred[inds] - y[inds])

        return grad

### R2_score

In [18]:
def my_r2_score(y_pred, y_true):

    y_true = np.array(y_true)
    y_pred = np.array(y_pred)
    
    mean_y_true = np.mean(y_true)
    
    ss_res = np.sum((y_true - y_pred) ** 2) 
    ss_tot = np.sum((y_true - y_true) ** 2) 

    r2 = 1 - (ss_res / ss_tot) 
    return r2

### Train Models

In [135]:
X_train = data[features_list]
X_test = data_test[features_list]
y_train = data['price']

In [136]:
my_linear_regression = MyLinearRegression()
linear_regression = LinearRegression()

my_linear_regression.fit(X_train, y_train)
linear_regression.fit(X_train, y_train)

0,1,2
,fit_intercept,True
,copy_X,True
,tol,1e-06
,n_jobs,
,positive,False


### Predictions of models

In [137]:
y_pred_train_my_ln = my_linear_regression.predict(X_train)
y_pred_train_ln = linear_regression.predict(X_train)
y_pred_test_my_ln = my_linear_regression.predict(X_test)
y_pred_test_ln = linear_regression.predict(X_test)

my_train_mae = mean_absolute_error(y_pred_train_my_ln, y_train)
train_mae = mean_absolute_error(y_pred_train_ln, y_train)
my_test_mae = mean_absolute_error(y_pred_test_my_ln, data_test['price'])
test_mae = mean_absolute_error(y_pred_test_ln, data_test['price'])

my_row_mae = {'model': 'MyLinearRegression', 'train': my_train_mae, 'test': my_test_mae}
row_mae = {'model': 'LinearRegression', 'train': train_mae, 'test': test_mae}

my_train_rmse = np.sqrt(mean_squared_error(y_pred_train_my_ln, y_train))
train_rmse = np.sqrt(mean_squared_error(y_pred_train_ln, y_train))
my_test_rmse = np.sqrt(mean_squared_error(y_pred_test_my_ln, data_test['price']))
test_rmse = np.sqrt(mean_squared_error(y_pred_test_ln, data_test['price']))

my_row_rmse = {'model': 'MyLinearRegression', 'train': float(my_train_rmse), 'test': float(my_test_rmse)}
row_rmse = {'model': 'LinearRegression', 'train': float(train_rmse), 'test': float(test_rmse)}

my_train_r2 = r2_score(y_train, y_pred_train_my_ln)
train_r2 = r2_score(y_train, y_pred_train_ln)
my_test_r2 = r2_score(data_test['price'], y_pred_test_my_ln)
test_r2 = r2_score(data_test['price'], y_pred_test_ln)

my_row_r2 = {'model': 'MyLinearRegression', 'train': my_train_r2, 'test': my_test_r2}
row_r2 = {'model': 'LinearRegression', 'train': train_r2, 'test': test_r2}

### create dataframes MAE RMSE R2

In [138]:
result_MAE = pd.DataFrame(columns=['model', 'train', 'test'])
result_RMSE = pd.DataFrame(columns=['model', 'train', 'test'])
result_R2 = pd.DataFrame(columns=['model', 'train', 'test'])

In [139]:
result_MAE.loc[len(result_MAE)] = row_mae
result_RMSE.loc[len(result_RMSE)] = row_rmse
result_R2.loc[len(result_R2)] = row_r2
result_MAE.loc[len(result_MAE)] = my_row_mae
result_RMSE.loc[len(result_RMSE)] = my_row_rmse
result_R2.loc[len(result_R2)] = my_row_r2

In [140]:
result_MAE

Unnamed: 0,model,train,test
0,LinearRegression,1163.521891,1092.253562
1,MyLinearRegression,1163.521891,1092.253562


In [141]:
result_RMSE

Unnamed: 0,model,train,test
0,LinearRegression,21996.189508,9618.984915
1,MyLinearRegression,21996.189508,9618.984915


In [142]:
result_R2

Unnamed: 0,model,train,test
0,LinearRegression,0.006375,0.01927
1,MyLinearRegression,0.006375,0.01927


# Ridge, Lasso, Elastic

In [143]:
class RegularizedLinearRegression:
    def __init__(self, method='ridge', alpha=1.0, l1_ratio=0.5, learning_rate=0.01, epochs=1000):
        self.method = method  # 'ridge', 'lasso', 'elasticnet'
        self.alpha = alpha
        self.l1_ratio = l1_ratio  # только для elasticnet
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        X = np.array(X)
        y = np.array(y)
        n_samples, n_features = X.shape

        self.weights = np.zeros(n_features)
        self.bias = 0

        for _ in range(self.epochs):
            y_pred = np.dot(X, self.weights) + self.bias
            error = y_pred - y

            # Базовые градиенты
            dw = (1/n_samples) * np.dot(X.T, error)
            db = (1/n_samples) * np.sum(error)

            # Регуляризация
            if self.method == 'ridge':
                dw += (self.alpha / n_samples) * self.weights  # L2
            elif self.method == 'lasso':
                dw += (self.alpha / n_samples) * np.sign(self.weights)  # L1
            elif self.method == 'elasticnet':
                l1 = self.l1_ratio * np.sign(self.weights)
                l2 = (1 - self.l1_ratio) * self.weights
                dw += (self.alpha / n_samples) * (l1 + l2)
            else:
                raise ValueError("Метод должен быть 'ridge', 'lasso' или 'elasticnet'")

            # Обновление параметров
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db

    def predict(self, X):
        X = np.array(X)
        return np.dot(X, self.weights) + self.bias

    def r2_score(self, y_true, y_pred):
        ss_res = np.sum((y_true - y_pred) ** 2)
        ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
        return 1 - ss_res / ss_tot

    def mae(self, y_true, y_pred):
        return np.mean(np.abs(y_true - y_pred))

    def rmse(self, y_true, y_pred):
        return np.sqrt(np.mean((y_true - y_pred) ** 2))

In [144]:
ridge_model = Ridge(alpha=1.0)
lasso_model = Lasso(alpha=1.0)
elastic_net_model = ElasticNet(alpha=1.0, l1_ratio=0.5)

ridge_model.fit(X_train, y_train)
lasso_model.fit(X_train, y_train)
elastic_net_model.fit(X_train, y_train)

my_lasso_model = RegularizedLinearRegression(method='lasso')
my_ridge_model = RegularizedLinearRegression(method='ridge')
my_elastic_net_model = RegularizedLinearRegression(method='elasticnet')

my_lasso_model.fit(X_train, y_train)
my_ridge_model.fit(X_train, y_train)
my_elastic_net_model.fit(X_train, y_train)

In [145]:
y_pred_train_ridge = ridge_model.predict(X_train)
y_pred_test_ridge = ridge_model.predict(X_test)

ridge_train_mae = mean_absolute_error(y_pred_train_ridge, y_train)
ridge_test_mae = mean_absolute_error(y_pred_test_ridge, data_test['price'])

ridge_row_mae = {'model': 'RidgeRegression', 'train': ridge_train_mae, 'test': ridge_test_mae}

ridge_train_rmse = np.sqrt(mean_squared_error(y_pred_train_ridge, y_train))
ridge_test_rmse = np.sqrt(mean_squared_error(y_pred_test_ridge, data_test['price']))

ridge_row_rmse = {'model': 'RidgeRegression', 'train': float(ridge_train_rmse), 'test': float(ridge_test_rmse)}

ridge_train_r2 = r2_score(y_train, y_pred_train_ridge)
ridge_test_r2 = r2_score(data_test['price'], y_pred_test_ridge)

ridge_row_r2 = {'model': 'RidgeRegression', 'train': ridge_train_r2, 'test': ridge_test_r2}

# Предсказания для Lasso
y_pred_train_lasso = lasso_model.predict(X_train)
y_pred_test_lasso = lasso_model.predict(X_test)

# Метрики для Lasso
lasso_train_mae = mean_absolute_error(y_pred_train_lasso, y_train)
lasso_test_mae = mean_absolute_error(y_pred_test_lasso, data_test['price'])

lasso_row_mae = {'model': 'LassoRegression', 'train': lasso_train_mae, 'test': lasso_test_mae}

lasso_train_rmse = np.sqrt(mean_squared_error(y_pred_train_lasso, y_train))
lasso_test_rmse = np.sqrt(mean_squared_error(y_pred_test_lasso, data_test['price']))

lasso_row_rmse = {'model': 'LassoRegression', 'train': float(lasso_train_rmse), 'test': float(lasso_test_rmse)}

lasso_train_r2 = r2_score(y_train, y_pred_train_lasso)
lasso_test_r2 = r2_score(data_test['price'], y_pred_test_lasso)

lasso_row_r2 = {'model': 'LassoRegression', 'train': lasso_train_r2, 'test': lasso_test_r2}

# Предсказания для ElasticNet
y_pred_train_elastic = elastic_net_model.predict(X_train)
y_pred_test_elastic = elastic_net_model.predict(X_test)

# Метрики для ElasticNet
elastic_train_mae = mean_absolute_error(y_pred_train_elastic, y_train)
elastic_test_mae = mean_absolute_error(y_pred_test_elastic, data_test['price'])

elastic_row_mae = {'model': 'ElasticNet', 'train': elastic_train_mae, 'test': elastic_test_mae}

elastic_train_rmse = np.sqrt(mean_squared_error(y_pred_train_elastic, y_train))
elastic_test_rmse = np.sqrt(mean_squared_error(y_pred_test_elastic, data_test['price']))

elastic_row_rmse = {'model': 'ElasticNet', 'train': float(elastic_train_rmse), 'test': float(elastic_test_rmse)}

elastic_train_r2 = r2_score(y_train, y_pred_train_elastic)
elastic_test_r2 = r2_score(data_test['price'], y_pred_test_elastic)

elastic_row_r2 = {'model': 'ElasticNet', 'train': elastic_train_r2, 'test': elastic_test_r2}

In [146]:
result_MAE.loc[len(result_MAE)] = ridge_row_mae
result_RMSE.loc[len(result_RMSE)] = ridge_row_rmse
result_R2.loc[len(result_R2)] = ridge_row_r2

result_MAE.loc[len(result_MAE)] = lasso_row_mae
result_RMSE.loc[len(result_RMSE)] = lasso_row_rmse
result_R2.loc[len(result_R2)] = lasso_row_r2

result_MAE.loc[len(result_MAE)] = elastic_row_mae
result_RMSE.loc[len(result_RMSE)] = elastic_row_rmse
result_R2.loc[len(result_R2)] = elastic_row_r2

In [147]:
# Предсказания для Ridge
my_y_pred_train_ridge = my_ridge_model.predict(X_train)
my_y_pred_test_ridge = my_ridge_model.predict(X_test)

# Метрики для Ridge
my_ridge_train_mae = mean_absolute_error(y_train, my_y_pred_train_ridge)
my_ridge_test_mae = mean_absolute_error(data_test['price'], my_y_pred_test_ridge)

my_ridge_row_mae = {'model': 'MyRidgeRegression', 'train': my_ridge_train_mae, 'test': my_ridge_test_mae}

my_ridge_train_rmse = np.sqrt(mean_squared_error(y_train, my_y_pred_train_ridge))
my_ridge_test_rmse = np.sqrt(mean_squared_error(data_test['price'], my_y_pred_test_ridge))

my_ridge_row_rmse = {'model': 'MyRidgeRegression', 'train': float(my_ridge_train_rmse), 'test': float(my_ridge_test_rmse)}

my_ridge_train_r2 = r2_score(y_train, my_y_pred_train_ridge)
my_ridge_test_r2 = r2_score(data_test['price'], my_y_pred_test_ridge)

my_ridge_row_r2 = {'model': 'MyRidgeRegression', 'train': my_ridge_train_r2, 'test': my_ridge_test_r2}

# Предсказания для Lasso
my_y_pred_train_lasso = my_lasso_model.predict(X_train)
my_y_pred_test_lasso = my_lasso_model.predict(X_test)

# Метрики для Lasso
my_lasso_train_mae = mean_absolute_error(y_train, my_y_pred_train_lasso)
my_lasso_test_mae = mean_absolute_error(data_test['price'], my_y_pred_test_lasso)

my_lasso_row_mae = {'model': 'MyLassoRegression', 'train': my_lasso_train_mae, 'test': my_lasso_test_mae}

my_lasso_train_rmse = np.sqrt(mean_squared_error(y_train, my_y_pred_train_lasso))
my_lasso_test_rmse = np.sqrt(mean_squared_error(data_test['price'], my_y_pred_test_lasso))

my_lasso_row_rmse = {'model': 'MyLassoRegression', 'train': float(my_lasso_train_rmse), 'test': float(my_lasso_test_rmse)}

my_lasso_train_r2 = r2_score(y_train, my_y_pred_train_lasso)
my_lasso_test_r2 = r2_score(data_test['price'], my_y_pred_test_lasso)

my_lasso_row_r2 = {'model': 'MyLassoRegression', 'train': my_lasso_train_r2, 'test': my_lasso_test_r2}

# Предсказания для ElasticNet
my_y_pred_train_elastic = my_elastic_net_model.predict(X_train)
my_y_pred_test_elastic = my_elastic_net_model.predict(X_test)

# Метрики для ElasticNet
my_elastic_train_mae = mean_absolute_error(y_train, my_y_pred_train_elastic)
my_elastic_test_mae = mean_absolute_error(data_test['price'], my_y_pred_test_elastic)

my_elastic_row_mae = {'model': 'MyElasticNet', 'train': my_elastic_train_mae, 'test': my_elastic_test_mae}

my_elastic_train_rmse = np.sqrt(mean_squared_error(y_train, my_y_pred_train_elastic))
my_elastic_test_rmse = np.sqrt(mean_squared_error(data_test['price'], my_y_pred_test_elastic))

my_elastic_row_rmse = {'model': 'MyElasticNet', 'train': float(my_elastic_train_rmse), 'test': float(my_elastic_test_rmse)}

my_elastic_train_r2 = r2_score(y_train, my_y_pred_train_elastic)
my_elastic_test_r2 = r2_score(data_test['price'], my_y_pred_test_elastic)

my_elastic_row_r2 = {'model': 'MyElasticNet', 'train': my_elastic_train_r2, 'test': my_elastic_test_r2}

In [148]:
result_MAE.loc[len(result_MAE)] = my_ridge_row_mae
result_RMSE.loc[len(result_RMSE)] = my_ridge_row_rmse
result_R2.loc[len(result_R2)] = my_ridge_row_r2

result_MAE.loc[len(result_MAE)] = my_lasso_row_mae
result_RMSE.loc[len(result_RMSE)] = my_lasso_row_rmse
result_R2.loc[len(result_R2)] = my_lasso_row_r2

result_MAE.loc[len(result_MAE)] = my_elastic_row_mae
result_RMSE.loc[len(result_RMSE)] = my_elastic_row_rmse
result_R2.loc[len(result_R2)] = my_elastic_row_r2

In [149]:
result_MAE

Unnamed: 0,model,train,test
0,LinearRegression,1163.521891,1092.253562
1,MyLinearRegression,1163.521891,1092.253562
2,RidgeRegression,1163.471072,1092.20222
3,LassoRegression,1159.835244,1088.469272
4,ElasticNet,1127.612189,1053.076669
5,MyRidgeRegression,1117.669486,1045.682211
6,MyLassoRegression,1117.681346,1045.694324
7,MyElasticNet,1117.675415,1045.688267


In [150]:
result_RMSE

Unnamed: 0,model,train,test
0,LinearRegression,21996.189508,9618.984915
1,MyLinearRegression,21996.189508,9618.984915
2,RidgeRegression,21996.189509,9618.973702
3,LassoRegression,21996.198419,9618.745372
4,ElasticNet,22016.223493,9612.033691
5,MyRidgeRegression,21998.105012,9606.462986
6,MyLassoRegression,21998.104391,9606.465805
7,MyElasticNet,21998.104701,9606.464395


In [151]:
result_R2

Unnamed: 0,model,train,test
0,LinearRegression,0.006375,0.01927
1,MyLinearRegression,0.006375,0.01927
2,RidgeRegression,0.006375,0.019273
3,LassoRegression,0.006374,0.019319
4,ElasticNet,0.004564,0.020687
5,MyRidgeRegression,0.006202,0.021822
6,MyLassoRegression,0.006202,0.021821
7,MyElasticNet,0.006202,0.021822


# MinMaxScaler

In [152]:
class MyMinMaxScaler:
    
    def __init__(self):
        pass
    
    def fit_transform(self, X):
        X = np.array(X)
        X_min = X.min(axis=0)
        X_max = X.max(axis=0)
        
        return (X - X_min) / (X_max - X_min)

In [153]:
my_min_max_scaler = MyMinMaxScaler()
min_max_scaler = MinMaxScaler()

my_X_train_scaled = my_min_max_scaler.fit_transform(data[features_list])
X_train_scaled = min_max_scaler.fit_transform(data[features_list])

In [154]:
print(my_X_train_scaled[0])
print(my_X_train_scaled[0])

[0.    1.    1.    1.    0.    1.    0.    1.    0.    1.    0.    0.
 0.    1.    0.    0.    0.    0.    0.    0.    0.1   0.125]
[0.    1.    1.    1.    0.    1.    0.    1.    0.    1.    0.    0.
 0.    1.    0.    0.    0.    0.    0.    0.    0.1   0.125]


# StandardScaler

In [155]:
class MyStandardScaler:
    
    def __init__(self):
        pass
    
    def fit_transform(self, X):
        X = np.array(X)
        mean = X.mean(axis=0)
        std = X.std(axis=0)
        
        return (X - mean) / std

In [156]:
my_standard_scaler = MyStandardScaler()
standard_scaler = StandardScaler()

my_X_train_scaled = my_standard_scaler.fit_transform(data[features_list])
X_train_scaled = standard_scaler.fit_transform(data[features_list])

print(my_X_train_scaled[0])
print(my_X_train_scaled[0])

[-1.05153709  1.04714687  1.04769987  1.11342245 -0.85699976  1.19001525
 -0.75976649  1.42111894 -0.60588069  2.09638746 -0.46383994 -0.39091529
 -0.34568647  2.93411559 -0.30890281 -0.25404408 -0.24198357 -0.23548793
 -0.23385394 -0.22023456 -0.41204122 -0.48577234]
[-1.05153709  1.04714687  1.04769987  1.11342245 -0.85699976  1.19001525
 -0.75976649  1.42111894 -0.60588069  2.09638746 -0.46383994 -0.39091529
 -0.34568647  2.93411559 -0.30890281 -0.25404408 -0.24198357 -0.23548793
 -0.23385394 -0.22023456 -0.41204122 -0.48577234]


# Models with normalized data

## MinMax normalization

In [157]:
X_train_minmax_scaled = min_max_scaler.fit_transform(data[features_list])
X_test_minmax_scaled = min_max_scaler.fit_transform(data_test[features_list])

In [158]:
#train models
my_linear_regression.fit(X_train_minmax_scaled, y_train)
linear_regression.fit(X_train_minmax_scaled, y_train)

ridge_model.fit(X_train_minmax_scaled, y_train)
lasso_model.fit(X_train_minmax_scaled, y_train)
elastic_net_model.fit(X_train_minmax_scaled, y_train)

my_lasso_model.fit(X_train_minmax_scaled, y_train)
my_ridge_model.fit(X_train_minmax_scaled, y_train)
my_elastic_net_model.fit(X_train_minmax_scaled, y_train)

In [159]:
#predictions
y_pred_train_my_ln = my_linear_regression.predict(X_train_minmax_scaled)
y_pred_train_ln = linear_regression.predict(X_train_minmax_scaled)
y_pred_test_my_ln = my_linear_regression.predict(X_test_minmax_scaled)
y_pred_test_ln = linear_regression.predict(X_test_minmax_scaled)

my_train_mae = mean_absolute_error(y_pred_train_my_ln, y_train)
train_mae = mean_absolute_error(y_pred_train_ln, y_train)
my_test_mae = mean_absolute_error(y_pred_test_my_ln, data_test['price'])
test_mae = mean_absolute_error(y_pred_test_ln, data_test['price'])

my_row_mae = {'model': 'MinMaxMyLinearRegression', 'train': my_train_mae, 'test': my_test_mae}
row_mae = {'model': 'MinMaxLinearRegression', 'train': train_mae, 'test': test_mae}

my_train_rmse = np.sqrt(mean_squared_error(y_pred_train_my_ln, y_train))
train_rmse = np.sqrt(mean_squared_error(y_pred_train_ln, y_train))
my_test_rmse = np.sqrt(mean_squared_error(y_pred_test_my_ln, data_test['price']))
test_rmse = np.sqrt(mean_squared_error(y_pred_test_ln, data_test['price']))

my_row_rmse = {'model': 'MinMaxMyLinearRegression', 'train': float(my_train_rmse), 'test': float(my_test_rmse)}
row_rmse = {'model': 'MinMaxLinearRegression', 'train': float(train_rmse), 'test': float(test_rmse)}

my_train_r2 = r2_score(y_train, y_pred_train_my_ln)
train_r2 = r2_score(y_train, y_pred_train_ln)
my_test_r2 = r2_score(data_test['price'], y_pred_test_my_ln)
test_r2 = r2_score(data_test['price'], y_pred_test_ln)

my_row_r2 = {'model': 'MinMaxMyLinearRegression', 'train': my_train_r2, 'test': my_test_r2}
row_r2 = {'model': 'MinMaxLinearRegression', 'train': train_r2, 'test': test_r2}

In [160]:
result_MAE.loc[len(result_MAE)] = row_mae
result_RMSE.loc[len(result_RMSE)] = row_rmse
result_R2.loc[len(result_R2)] = row_r2
result_MAE.loc[len(result_MAE)] = my_row_mae
result_RMSE.loc[len(result_RMSE)] = my_row_rmse
result_R2.loc[len(result_R2)] = my_row_r2

In [161]:
y_pred_train_ridge = ridge_model.predict(X_train_minmax_scaled)
y_pred_test_ridge = ridge_model.predict(X_test_minmax_scaled)

ridge_train_mae = mean_absolute_error(y_pred_train_ridge, y_train)
ridge_test_mae = mean_absolute_error(y_pred_test_ridge, data_test['price'])

ridge_row_mae = {'model': 'MinMaxRidgeRegression', 'train': ridge_train_mae, 'test': ridge_test_mae}

ridge_train_rmse = np.sqrt(mean_squared_error(y_pred_train_ridge, y_train))
ridge_test_rmse = np.sqrt(mean_squared_error(y_pred_test_ridge, data_test['price']))

ridge_row_rmse = {'model': 'MinMaxRidgeRegression', 'train': float(ridge_train_rmse), 'test': float(ridge_test_rmse)}

ridge_train_r2 = r2_score(y_train, y_pred_train_ridge)
ridge_test_r2 = r2_score(data_test['price'], y_pred_test_ridge)

ridge_row_r2 = {'model': 'MinMAxRidgeRegression', 'train': ridge_train_r2, 'test': ridge_test_r2}

# Предсказания для Lasso
y_pred_train_lasso = lasso_model.predict(X_train_minmax_scaled)
y_pred_test_lasso = lasso_model.predict(X_test_minmax_scaled)

# Метрики для Lasso
lasso_train_mae = mean_absolute_error(y_pred_train_lasso, y_train)
lasso_test_mae = mean_absolute_error(y_pred_test_lasso, data_test['price'])

lasso_row_mae = {'model': 'MinMaxLassoRegression', 'train': lasso_train_mae, 'test': lasso_test_mae}

lasso_train_rmse = np.sqrt(mean_squared_error(y_pred_train_lasso, y_train))
lasso_test_rmse = np.sqrt(mean_squared_error(y_pred_test_lasso, data_test['price']))

lasso_row_rmse = {'model': 'MinMaxLassoRegression', 'train': float(lasso_train_rmse), 'test': float(lasso_test_rmse)}

lasso_train_r2 = r2_score(y_train, y_pred_train_lasso)
lasso_test_r2 = r2_score(data_test['price'], y_pred_test_lasso)

lasso_row_r2 = {'model': 'MinMaxLassoRegression', 'train': lasso_train_r2, 'test': lasso_test_r2}

# Предсказания для ElasticNet
y_pred_train_elastic = elastic_net_model.predict(X_train_minmax_scaled)
y_pred_test_elastic = elastic_net_model.predict(X_test_minmax_scaled)

# Метрики для ElasticNet
elastic_train_mae = mean_absolute_error(y_pred_train_elastic, y_train)
elastic_test_mae = mean_absolute_error(y_pred_test_elastic, data_test['price'])

elastic_row_mae = {'model': 'MinMaxElasticNet', 'train': elastic_train_mae, 'test': elastic_test_mae}

elastic_train_rmse = np.sqrt(mean_squared_error(y_pred_train_elastic, y_train))
elastic_test_rmse = np.sqrt(mean_squared_error(y_pred_test_elastic, data_test['price']))

elastic_row_rmse = {'model': 'MinMaxElasticNet', 'train': float(elastic_train_rmse), 'test': float(elastic_test_rmse)}

elastic_train_r2 = r2_score(y_train, y_pred_train_elastic)
elastic_test_r2 = r2_score(data_test['price'], y_pred_test_elastic)

elastic_row_r2 = {'model': 'MinMaxElasticNet', 'train': elastic_train_r2, 'test': elastic_test_r2}

In [162]:
result_MAE.loc[len(result_MAE)] = ridge_row_mae
result_RMSE.loc[len(result_RMSE)] = ridge_row_rmse
result_R2.loc[len(result_R2)] = ridge_row_r2

result_MAE.loc[len(result_MAE)] = lasso_row_mae
result_RMSE.loc[len(result_RMSE)] = lasso_row_rmse
result_R2.loc[len(result_R2)] = lasso_row_r2

result_MAE.loc[len(result_MAE)] = elastic_row_mae
result_RMSE.loc[len(result_RMSE)] = elastic_row_rmse
result_R2.loc[len(result_R2)] = elastic_row_r2

In [163]:
# Предсказания для Ridge
my_y_pred_train_ridge = my_ridge_model.predict(X_train_minmax_scaled)
my_y_pred_test_ridge = my_ridge_model.predict(X_test_minmax_scaled)

# Метрики для Ridge
my_ridge_train_mae = mean_absolute_error(y_train, my_y_pred_train_ridge)
my_ridge_test_mae = mean_absolute_error(data_test['price'], my_y_pred_test_ridge)

my_ridge_row_mae = {'model': 'MinMaxMyRidgeRegression', 'train': my_ridge_train_mae, 'test': my_ridge_test_mae}

my_ridge_train_rmse = np.sqrt(mean_squared_error(y_train, my_y_pred_train_ridge))
my_ridge_test_rmse = np.sqrt(mean_squared_error(data_test['price'], my_y_pred_test_ridge))

my_ridge_row_rmse = {'model': 'MinMaxMyRidgeRegression', 'train': float(my_ridge_train_rmse), 'test': float(my_ridge_test_rmse)}

my_ridge_train_r2 = r2_score(y_train, my_y_pred_train_ridge)
my_ridge_test_r2 = r2_score(data_test['price'], my_y_pred_test_ridge)

my_ridge_row_r2 = {'model': 'MinMaxMyRidgeRegression', 'train': my_ridge_train_r2, 'test': my_ridge_test_r2}

# Предсказания для Lasso
my_y_pred_train_lasso = my_lasso_model.predict(X_train_minmax_scaled)
my_y_pred_test_lasso = my_lasso_model.predict(X_test_minmax_scaled)

# Метрики для Lasso
my_lasso_train_mae = mean_absolute_error(y_train, my_y_pred_train_lasso)
my_lasso_test_mae = mean_absolute_error(data_test['price'], my_y_pred_test_lasso)

my_lasso_row_mae = {'model': 'MinMaxMyLassoRegression', 'train': my_lasso_train_mae, 'test': my_lasso_test_mae}

my_lasso_train_rmse = np.sqrt(mean_squared_error(y_train, my_y_pred_train_lasso))
my_lasso_test_rmse = np.sqrt(mean_squared_error(data_test['price'], my_y_pred_test_lasso))

my_lasso_row_rmse = {'model': 'MinMaxMyLassoRegression', 'train': float(my_lasso_train_rmse), 'test': float(my_lasso_test_rmse)}

my_lasso_train_r2 = r2_score(y_train, my_y_pred_train_lasso)
my_lasso_test_r2 = r2_score(data_test['price'], my_y_pred_test_lasso)

my_lasso_row_r2 = {'model': 'MinMaxMyLassoRegression', 'train': my_lasso_train_r2, 'test': my_lasso_test_r2}

# Предсказания для ElasticNet
my_y_pred_train_elastic = my_elastic_net_model.predict(X_train_minmax_scaled)
my_y_pred_test_elastic = my_elastic_net_model.predict(X_test_minmax_scaled)

# Метрики для ElasticNet
my_elastic_train_mae = mean_absolute_error(y_train, my_y_pred_train_elastic)
my_elastic_test_mae = mean_absolute_error(data_test['price'], my_y_pred_test_elastic)

my_elastic_row_mae = {'model': 'MinMaxMyElasticNet', 'train': my_elastic_train_mae, 'test': my_elastic_test_mae}

my_elastic_train_rmse = np.sqrt(mean_squared_error(y_train, my_y_pred_train_elastic))
my_elastic_test_rmse = np.sqrt(mean_squared_error(data_test['price'], my_y_pred_test_elastic))

my_elastic_row_rmse = {'model': 'MinMaxMyElasticNet', 'train': float(my_elastic_train_rmse), 'test': float(my_elastic_test_rmse)}

my_elastic_train_r2 = r2_score(y_train, my_y_pred_train_elastic)
my_elastic_test_r2 = r2_score(data_test['price'], my_y_pred_test_elastic)

my_elastic_row_r2 = {'model': 'MinMaxMyElasticNet', 'train': my_elastic_train_r2, 'test': my_elastic_test_r2}

In [164]:
result_MAE.loc[len(result_MAE)] = my_ridge_row_mae
result_RMSE.loc[len(result_RMSE)] = my_ridge_row_rmse
result_R2.loc[len(result_R2)] = my_ridge_row_r2

result_MAE.loc[len(result_MAE)] = my_lasso_row_mae
result_RMSE.loc[len(result_RMSE)] = my_lasso_row_rmse
result_R2.loc[len(result_R2)] = my_lasso_row_r2

result_MAE.loc[len(result_MAE)] = my_elastic_row_mae
result_RMSE.loc[len(result_RMSE)] = my_elastic_row_rmse
result_R2.loc[len(result_R2)] = my_elastic_row_r2

## Standard normalization

In [165]:
X_train_standard_scaled = standard_scaler.fit_transform(data[features_list])
X_test_standard_scaled = standard_scaler.fit_transform(data_test[features_list])

In [166]:
#train models
my_linear_regression.fit(X_train_standard_scaled, y_train)
linear_regression.fit(X_train_standard_scaled, y_train)

ridge_model.fit(X_train_standard_scaled, y_train)
lasso_model.fit(X_train_standard_scaled, y_train)
elastic_net_model.fit(X_train_standard_scaled, y_train)

my_lasso_model.fit(X_train_standard_scaled, y_train)
my_ridge_model.fit(X_train_standard_scaled, y_train)
my_elastic_net_model.fit(X_train_standard_scaled, y_train)

In [167]:
#predictions
y_pred_train_my_ln = my_linear_regression.predict(X_train_standard_scaled)
y_pred_train_ln = linear_regression.predict(X_train_standard_scaled)
y_pred_test_my_ln = my_linear_regression.predict(X_test_standard_scaled)
y_pred_test_ln = linear_regression.predict(X_test_standard_scaled)

my_train_mae = mean_absolute_error(y_pred_train_my_ln, y_train)
train_mae = mean_absolute_error(y_pred_train_ln, y_train)
my_test_mae = mean_absolute_error(y_pred_test_my_ln, data_test['price'])
test_mae = mean_absolute_error(y_pred_test_ln, data_test['price'])

my_row_mae = {'model': 'StandardMyLinearRegression', 'train': my_train_mae, 'test': my_test_mae}
row_mae = {'model': 'StandardLinearRegression', 'train': train_mae, 'test': test_mae}

my_train_rmse = np.sqrt(mean_squared_error(y_pred_train_my_ln, y_train))
train_rmse = np.sqrt(mean_squared_error(y_pred_train_ln, y_train))
my_test_rmse = np.sqrt(mean_squared_error(y_pred_test_my_ln, data_test['price']))
test_rmse = np.sqrt(mean_squared_error(y_pred_test_ln, data_test['price']))

my_row_rmse = {'model': 'StandardMyLinearRegression', 'train': float(my_train_rmse), 'test': float(my_test_rmse)}
row_rmse = {'model': 'StandardLinearRegression', 'train': float(train_rmse), 'test': float(test_rmse)}

my_train_r2 = r2_score(y_train, y_pred_train_my_ln)
train_r2 = r2_score(y_train, y_pred_train_ln)
my_test_r2 = r2_score(data_test['price'], y_pred_test_my_ln)
test_r2 = r2_score(data_test['price'], y_pred_test_ln)

my_row_r2 = {'model': 'StandardMyLinearRegression', 'train': my_train_r2, 'test': my_test_r2}
row_r2 = {'model': 'StandardLinearRegression', 'train': train_r2, 'test': test_r2}

In [168]:
result_MAE.loc[len(result_MAE)] = row_mae
result_RMSE.loc[len(result_RMSE)] = row_rmse
result_R2.loc[len(result_R2)] = row_r2
result_MAE.loc[len(result_MAE)] = my_row_mae
result_RMSE.loc[len(result_RMSE)] = my_row_rmse
result_R2.loc[len(result_R2)] = my_row_r2

In [169]:
y_pred_train_ridge = ridge_model.predict(X_train_standard_scaled)
y_pred_test_ridge = ridge_model.predict(X_test_standard_scaled)

ridge_train_mae = mean_absolute_error(y_pred_train_ridge, y_train)
ridge_test_mae = mean_absolute_error(y_pred_test_ridge, data_test['price'])

ridge_row_mae = {'model': 'StandardRidgeRegression', 'train': ridge_train_mae, 'test': ridge_test_mae}

ridge_train_rmse = np.sqrt(mean_squared_error(y_pred_train_ridge, y_train))
ridge_test_rmse = np.sqrt(mean_squared_error(y_pred_test_ridge, data_test['price']))

ridge_row_rmse = {'model': 'StandardRidgeRegression', 'train': float(ridge_train_rmse), 'test': float(ridge_test_rmse)}

ridge_train_r2 = r2_score(y_train, y_pred_train_ridge)
ridge_test_r2 = r2_score(data_test['price'], y_pred_test_ridge)

ridge_row_r2 = {'model': 'StandardRidgeRegression', 'train': ridge_train_r2, 'test': ridge_test_r2}

# Предсказания для Lasso
y_pred_train_lasso = lasso_model.predict(X_train_standard_scaled)
y_pred_test_lasso = lasso_model.predict(X_test_standard_scaled)

# Метрики для Lasso
lasso_train_mae = mean_absolute_error(y_pred_train_lasso, y_train)
lasso_test_mae = mean_absolute_error(y_pred_test_lasso, data_test['price'])

lasso_row_mae = {'model': 'StandardLassoRegression', 'train': lasso_train_mae, 'test': lasso_test_mae}

lasso_train_rmse = np.sqrt(mean_squared_error(y_pred_train_lasso, y_train))
lasso_test_rmse = np.sqrt(mean_squared_error(y_pred_test_lasso, data_test['price']))

lasso_row_rmse = {'model': 'StandardLassoRegression', 'train': float(lasso_train_rmse), 'test': float(lasso_test_rmse)}

lasso_train_r2 = r2_score(y_train, y_pred_train_lasso)
lasso_test_r2 = r2_score(data_test['price'], y_pred_test_lasso)

lasso_row_r2 = {'model': 'StandardLassoRegression', 'train': lasso_train_r2, 'test': lasso_test_r2}

# Предсказания для ElasticNet
y_pred_train_elastic = elastic_net_model.predict(X_train_standard_scaled)
y_pred_test_elastic = elastic_net_model.predict(X_test_standard_scaled)

# Метрики для ElasticNet
elastic_train_mae = mean_absolute_error(y_pred_train_elastic, y_train)
elastic_test_mae = mean_absolute_error(y_pred_test_elastic, data_test['price'])

elastic_row_mae = {'model': 'StandardElasticNet', 'train': elastic_train_mae, 'test': elastic_test_mae}

elastic_train_rmse = np.sqrt(mean_squared_error(y_pred_train_elastic, y_train))
elastic_test_rmse = np.sqrt(mean_squared_error(y_pred_test_elastic, data_test['price']))

elastic_row_rmse = {'model': 'StandardElasticNet', 'train': float(elastic_train_rmse), 'test': float(elastic_test_rmse)}

elastic_train_r2 = r2_score(y_train, y_pred_train_elastic)
elastic_test_r2 = r2_score(data_test['price'], y_pred_test_elastic)

elastic_row_r2 = {'model': 'StandardElasticNet', 'train': elastic_train_r2, 'test': elastic_test_r2}

In [170]:
result_MAE.loc[len(result_MAE)] = ridge_row_mae
result_RMSE.loc[len(result_RMSE)] = ridge_row_rmse
result_R2.loc[len(result_R2)] = ridge_row_r2

result_MAE.loc[len(result_MAE)] = lasso_row_mae
result_RMSE.loc[len(result_RMSE)] = lasso_row_rmse
result_R2.loc[len(result_R2)] = lasso_row_r2

result_MAE.loc[len(result_MAE)] = elastic_row_mae
result_RMSE.loc[len(result_RMSE)] = elastic_row_rmse
result_R2.loc[len(result_R2)] = elastic_row_r2

In [171]:
# Предсказания для Ridge
my_y_pred_train_ridge = my_ridge_model.predict(X_train_standard_scaled)
my_y_pred_test_ridge = my_ridge_model.predict(X_test_standard_scaled)

# Метрики для Ridge
my_ridge_train_mae = mean_absolute_error(y_train, my_y_pred_train_ridge)
my_ridge_test_mae = mean_absolute_error(data_test['price'], my_y_pred_test_ridge)

my_ridge_row_mae = {'model': 'StandardMyRidgeRegression', 'train': my_ridge_train_mae, 'test': my_ridge_test_mae}

my_ridge_train_rmse = np.sqrt(mean_squared_error(y_train, my_y_pred_train_ridge))
my_ridge_test_rmse = np.sqrt(mean_squared_error(data_test['price'], my_y_pred_test_ridge))

my_ridge_row_rmse = {'model': 'StandardMyRidgeRegression', 'train': float(my_ridge_train_rmse), 'test': float(my_ridge_test_rmse)}

my_ridge_train_r2 = r2_score(y_train, my_y_pred_train_ridge)
my_ridge_test_r2 = r2_score(data_test['price'], my_y_pred_test_ridge)

my_ridge_row_r2 = {'model': 'StandardMyRidgeRegression', 'train': my_ridge_train_r2, 'test': my_ridge_test_r2}

# Предсказания для Lasso
my_y_pred_train_lasso = my_lasso_model.predict(X_train_standard_scaled)
my_y_pred_test_lasso = my_lasso_model.predict(X_test_standard_scaled)

# Метрики для Lasso
my_lasso_train_mae = mean_absolute_error(y_train, my_y_pred_train_lasso)
my_lasso_test_mae = mean_absolute_error(data_test['price'], my_y_pred_test_lasso)

my_lasso_row_mae = {'model': 'StandardMyLassoRegression', 'train': my_lasso_train_mae, 'test': my_lasso_test_mae}

my_lasso_train_rmse = np.sqrt(mean_squared_error(y_train, my_y_pred_train_lasso))
my_lasso_test_rmse = np.sqrt(mean_squared_error(data_test['price'], my_y_pred_test_lasso))

my_lasso_row_rmse = {'model': 'StandardMyLassoRegression', 'train': float(my_lasso_train_rmse), 'test': float(my_lasso_test_rmse)}

my_lasso_train_r2 = r2_score(y_train, my_y_pred_train_lasso)
my_lasso_test_r2 = r2_score(data_test['price'], my_y_pred_test_lasso)

my_lasso_row_r2 = {'model': 'StandardMyLassoRegression', 'train': my_lasso_train_r2, 'test': my_lasso_test_r2}

# Предсказания для ElasticNet
my_y_pred_train_elastic = my_elastic_net_model.predict(X_train_standard_scaled)
my_y_pred_test_elastic = my_elastic_net_model.predict(X_test_standard_scaled)

# Метрики для ElasticNet
my_elastic_train_mae = mean_absolute_error(y_train, my_y_pred_train_elastic)
my_elastic_test_mae = mean_absolute_error(data_test['price'], my_y_pred_test_elastic)

my_elastic_row_mae = {'model': 'StandardMyElasticNet', 'train': my_elastic_train_mae, 'test': my_elastic_test_mae}

my_elastic_train_rmse = np.sqrt(mean_squared_error(y_train, my_y_pred_train_elastic))
my_elastic_test_rmse = np.sqrt(mean_squared_error(data_test['price'], my_y_pred_test_elastic))

my_elastic_row_rmse = {'model': 'StandardMyElasticNet', 'train': float(my_elastic_train_rmse), 'test': float(my_elastic_test_rmse)}

my_elastic_train_r2 = r2_score(y_train, my_y_pred_train_elastic)
my_elastic_test_r2 = r2_score(data_test['price'], my_y_pred_test_elastic)

my_elastic_row_r2 = {'model': 'StandardMyElasticNet', 'train': my_elastic_train_r2, 'test': my_elastic_test_r2}

In [172]:
result_MAE.loc[len(result_MAE)] = my_ridge_row_mae
result_RMSE.loc[len(result_RMSE)] = my_ridge_row_rmse
result_R2.loc[len(result_R2)] = my_ridge_row_r2

result_MAE.loc[len(result_MAE)] = my_lasso_row_mae
result_RMSE.loc[len(result_RMSE)] = my_lasso_row_rmse
result_R2.loc[len(result_R2)] = my_lasso_row_r2

result_MAE.loc[len(result_MAE)] = my_elastic_row_mae
result_RMSE.loc[len(result_RMSE)] = my_elastic_row_rmse
result_R2.loc[len(result_R2)] = my_elastic_row_r2

In [173]:
result_MAE

Unnamed: 0,model,train,test
0,LinearRegression,1163.521891,1092.253562
1,MyLinearRegression,1163.521891,1092.253562
2,RidgeRegression,1163.471072,1092.20222
3,LassoRegression,1159.835244,1088.469272
4,ElasticNet,1127.612189,1053.076669
5,MyRidgeRegression,1117.669486,1045.682211
6,MyLassoRegression,1117.681346,1045.694324
7,MyElasticNet,1117.675415,1045.688267
8,MinMaxLinearRegression,1163.521891,2151.063741
9,MinMaxMyLinearRegression,1163.521891,2151.063741


In [174]:
result_RMSE

Unnamed: 0,model,train,test
0,LinearRegression,21996.189508,9618.984915
1,MyLinearRegression,21996.189508,9618.984915
2,RidgeRegression,21996.189509,9618.973702
3,LassoRegression,21996.198419,9618.745372
4,ElasticNet,22016.223493,9612.033691
5,MyRidgeRegression,21998.105012,9606.462986
6,MyLassoRegression,21998.104391,9606.465805
7,MyElasticNet,21998.104701,9606.464395
8,MinMaxLinearRegression,21996.189508,9857.371623
9,MinMaxMyLinearRegression,21996.189508,9857.371623


In [175]:
result_R2

Unnamed: 0,model,train,test
0,LinearRegression,0.006375,0.01927
1,MyLinearRegression,0.006375,0.01927
2,RidgeRegression,0.006375,0.019273
3,LassoRegression,0.006374,0.019319
4,ElasticNet,0.004564,0.020687
5,MyRidgeRegression,0.006202,0.021822
6,MyLassoRegression,0.006202,0.021821
7,MyElasticNet,0.006202,0.021822
8,MinMaxLinearRegression,0.006375,-0.029943
9,MinMaxMyLinearRegression,0.006375,-0.029943


# Overfit models

### Polynomial features

In [208]:
poly = PolynomialFeatures(degree=10)
X_test = data_test[['bathrooms', 'bedrooms']]
X_test.loc[:, 'interest_level'] = data['interest_level'].mode()[0]
X_train_poly = poly.fit_transform(data[['bathrooms', 'bedrooms', 'interest_level']])
X_test_poly = poly.fit_transform(X_test)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  X_test.loc[:, 'interest_level'] = data['interest_level'].mode()[0]


### Fit  and predicts models

In [187]:
elasticnet_model = ElasticNet(max_iter=5000, alpha=1.0)
linear_regression.fit(X_train_poly, y_train)

ridge_model.fit(X_train_poly, y_train)
lasso_model.fit(X_train_poly, y_train)
elasticnet_model.fit(X_train_poly, y_train)

  model = cd_fast.enet_coordinate_descent(
  model = cd_fast.enet_coordinate_descent(


0,1,2
,alpha,1.0
,l1_ratio,0.5
,fit_intercept,True
,precompute,False
,max_iter,5000
,copy_X,True
,tol,0.0001
,warm_start,False
,positive,False
,random_state,


In [209]:
linear_poly_predict = linear_regression.predict(X_train_poly)
ridge_poly_predict = ridge_model.predict(X_train_poly)
lasso_poly_predict = lasso_model.predict(X_train_poly)
elastic_poly_predict = elasticnet_model.predict(X_train_poly)

test_linear_poly_predict = linear_regression.predict(X_test_poly)
test_ridge_poly_predict = ridge_model.predict(X_test_poly)
test_lasso_poly_predict = lasso_model.predict(X_test_poly)
test_elastic_poly_predict = elasticnet_model.predict(X_test_poly)

In [215]:
train_mae = mean_absolute_error(linear_poly_predict, y_train)
test_mae = mean_absolute_error(test_linear_poly_predict, data_test['price'])

row_mae = {'model': 'PolyLinearRegression', 'train': train_mae, 'test': test_mae}

train_rmse = np.sqrt(mean_squared_error(linear_poly_predict, y_train))
test_rmse = np.sqrt(mean_squared_error(test_linear_poly_predict, data_test['price']))

row_rmse = {'model': 'PolyLinearRegression', 'train': float(train_rmse), 'test': float(test_rmse)}

train_r2 = r2_score(y_train, linear_poly_predict)
test_r2 = r2_score(data_test['price'], test_linear_poly_predict)

row_r2 = {'model': 'PolyLinearRegression', 'train': train_r2, 'test': test_r2}

In [216]:
ridge_train_mae = mean_absolute_error(ridge_poly_predict, y_train)
ridge_test_mae = mean_absolute_error(test_ridge_poly_predict, data_test['price'])

ridge_row_mae = {'model': 'PolyRidgeRegression', 'train': ridge_train_mae, 'test': ridge_test_mae}

ridge_train_rmse = np.sqrt(mean_squared_error(ridge_poly_predict, y_train))
ridge_test_rmse = np.sqrt(mean_squared_error(test_ridge_poly_predict, data_test['price']))


ridge_row_rmse = {'model': 'PolyRidgeRegression', 'train': float(ridge_train_rmse), 'test': float(ridge_test_rmse)}

ridge_train_r2 = r2_score(y_train, ridge_poly_predict)
ridge_test_r2 = r2_score(data_test['price'], test_ridge_poly_predict)


ridge_row_r2 = {'model': 'PolyRidgeRegression', 'train': ridge_train_r2, 'test': ridge_test_r2}

In [217]:
lasso_train_mae = mean_absolute_error(lasso_poly_predict, y_train)
lasso_test_mae = mean_absolute_error(test_lasso_poly_predict, data_test['price'])

lasso_row_mae = {'model': 'PolyLassoRegression', 'train': lasso_train_mae, 'test': lasso_test_mae}

lasso_train_rmse = np.sqrt(mean_squared_error(lasso_poly_predict, y_train))
lasso_test_rmse = np.sqrt(mean_squared_error(test_lasso_poly_predict, data_test['price']))

lasso_row_rmse = {'model': 'PolyLassoRegression', 'train': float(lasso_train_rmse), 'test': float(lasso_test_rmse)}

lasso_train_r2 = r2_score(y_train, lasso_poly_predict)
lasso_test_r2 = r2_score(data_test['price'], test_lasso_poly_predict)

lasso_row_r2 = {'model': 'PolyLassoRegression', 'train': lasso_train_r2, 'test': lasso_test_r2}


In [218]:
elastic_train_mae = mean_absolute_error(elastic_poly_predict, y_train)
elastic_test_mae = mean_absolute_error(test_elastic_poly_predict, data_test['price'])

elastic_row_mae = {'model': 'PolyElasticRegression', 'train': elastic_train_mae, 'test': elastic_test_mae}

elastic_train_rmse = np.sqrt(mean_squared_error(elastic_poly_predict, y_train))
elastic_test_rmse = np.sqrt(mean_squared_error(test_elastic_poly_predict, data_test['price']))

elastic_row_rmse = {'model': 'PolyElasticRegression', 'train': float(elastic_train_rmse), 'test': float(elastic_test_rmse)}

elastic_train_r2 = r2_score(y_train, elastic_poly_predict)
elastic_test_r2 = r2_score(data_test['price'], test_elastic_poly_predict)

elastic_row_r2 = {'model': 'PolyElasticRegression', 'train': elastic_train_r2, 'test': elastic_test_r2}


In [219]:
result_MAE.loc[len(result_MAE)] = row_mae
result_MAE.loc[len(result_MAE)] = ridge_row_mae
result_MAE.loc[len(result_MAE)] = lasso_row_mae
result_MAE.loc[len(result_MAE)] = elastic_row_mae

result_RMSE.loc[len(result_RMSE)] = row_rmse
result_RMSE.loc[len(result_RMSE)] = ridge_row_rmse
result_RMSE.loc[len(result_RMSE)] = lasso_row_rmse
result_RMSE.loc[len(result_RMSE)] = elastic_row_rmse

result_R2.loc[len(result_R2)] = row_r2
result_R2.loc[len(result_R2)] = ridge_row_r2
result_R2.loc[len(result_R2)] = lasso_row_r2
result_R2.loc[len(result_R2)] = elastic_row_r2

In [223]:
result_MAE.tail(4)

Unnamed: 0,model,train,test
24,PolyLinearRegression,1044.510626,2854906274098700.5
25,PolyRidgeRegression,1044.504539,2730154478150685.0
26,PolyLassoRegression,1041.767589,52712370345.0994
27,PolyElasticRegression,1040.752019,88762138112.51854


In [224]:
result_RMSE.tail(4)

Unnamed: 0,model,train,test
24,PolyLinearRegression,21989.629487,7.80068837928682e+17
25,PolyRidgeRegression,21989.629682,7.459819084276078e+17
26,PolyLassoRegression,21992.125187,14403020845048.1
27,PolyElasticRegression,21991.920253,24253186748060.8


In [225]:
result_R2.tail(4)

Unnamed: 0,model,train,test
24,PolyLinearRegression,0.006968,-6.449955308057991e+27
25,PolyRidgeRegression,0.006968,-5.898579502621852e+27
26,PolyLassoRegression,0.006742,-2.1988622613635057e+18
27,PolyElasticRegression,0.006761,-6.234885306008688e+18


# Naive models

In [230]:
print(data['price'].shape, data_test['price'].shape)

(49352,) (74659,)


### Median

In [231]:
train_median = data['price'].median()
test_median = data_test['price'].median()

train_naive_median = pd.Series([train_median] * 49352)
test_naive_median = pd.Series([test_median] * 74659)

In [238]:
median_train_mae = mean_absolute_error(train_naive_median, y_train)
median_train_rmse = np.sqrt(mean_squared_error(train_naive_median, y_train))

median_test_mae = mean_absolute_error(test_naive_median, data_test['price'])
median_test_rmse = np.sqrt(mean_squared_error(test_naive_median, data_test['price']))

median_train_r2 = r2_score(y_train, train_naive_median)
median_test_r2 = r2_score(data_test['price'], test_naive_median,)

median_row_mae = {'model': 'naive_median', 'train': median_train_mae, 'test': median_test_mae}
median_row_rmse = {'model': 'naive_median', 'train': float(median_train_rmse), 'test': float(median_test_rmse)}
median_row_r2 = {'model': 'naive_median', 'train': median_train_r2, 'test': median_test_r2}

In [240]:
result_MAE.loc[len(result_MAE)] = median_row_mae
result_RMSE.loc[len(result_RMSE)] = median_row_rmse
result_R2.loc[len(result_R2)] = median_row_r2

### Mean

In [241]:
train_mean = data['price'].mean()
test_mean = data_test['price'].mean()

train_naive_mean = pd.Series([train_mean] * 49352)
test_naive_mean = pd.Series([test_mean] * 74659)

In [242]:
mean_train_mae = mean_absolute_error(train_naive_mean, y_train)
mean_train_rmse = np.sqrt(mean_squared_error(train_naive_mean, y_train))

mean_test_mae = mean_absolute_error(test_naive_mean, data_test['price'])
mean_test_rmse = np.sqrt(mean_squared_error(test_naive_mean, data_test['price']))

mean_train_r2 = r2_score(y_train, train_naive_mean)
mean_test_r2 = r2_score(data_test['price'], test_naive_mean)

mean_row_mae = {'model': 'naive_mean', 'train': mean_train_mae, 'test': mean_test_mae}
mean_row_rmse = {'model': 'naive_mean', 'train': float(mean_train_rmse), 'test': float(mean_test_rmse)}
mean_row_r2 = {'model': 'naive_mean', 'train': mean_train_r2, 'test': mean_test_r2}


In [243]:
result_MAE.loc[len(result_MAE)] = mean_row_mae
result_RMSE.loc[len(result_RMSE)] = mean_row_rmse
result_R2.loc[len(result_R2)] = mean_row_r2

# Compare results

In [244]:
result_MAE

Unnamed: 0,model,train,test
0,LinearRegression,1163.521891,1092.253562
1,MyLinearRegression,1163.521891,1092.253562
2,RidgeRegression,1163.471072,1092.20222
3,LassoRegression,1159.835244,1088.469272
4,ElasticNet,1127.612189,1053.076669
5,MyRidgeRegression,1117.669486,1045.682211
6,MyLassoRegression,1117.681346,1045.694324
7,MyElasticNet,1117.675415,1045.688267
8,MinMaxLinearRegression,1163.521891,2151.063741
9,MinMaxMyLinearRegression,1163.521891,2151.063741


In [245]:
result_RMSE

Unnamed: 0,model,train,test
0,LinearRegression,21996.189508,9618.984915
1,MyLinearRegression,21996.189508,9618.984915
2,RidgeRegression,21996.189509,9618.973702
3,LassoRegression,21996.198419,9618.745372
4,ElasticNet,22016.223493,9612.033691
5,MyRidgeRegression,21998.105012,9606.462986
6,MyLassoRegression,21998.104391,9606.465805
7,MyElasticNet,21998.104701,9606.464395
8,MinMaxLinearRegression,21996.189508,9857.371623
9,MinMaxMyLinearRegression,21996.189508,9857.371623


In [247]:
result_R2

Unnamed: 0,model,train,test
0,LinearRegression,0.006375,0.01927
1,MyLinearRegression,0.006375,0.01927
2,RidgeRegression,0.006375,0.019273
3,LassoRegression,0.006374,0.019319
4,ElasticNet,0.004564,0.020687
5,MyRidgeRegression,0.006202,0.021822
6,MyLassoRegression,0.006202,0.021821
7,MyElasticNet,0.006202,0.021822
8,MinMaxLinearRegression,0.006375,-0.029943
9,MinMaxMyLinearRegression,0.006375,-0.029943


### most stable model is ridge

### best model is StandartElasticNet