<b>Data vocabulary of the features used in the analysis:</b>

"rooms" - number of rooms

"price" - price in hryvnas

"flat_area" - area of the flat in square meters

"prestigious" - prestige code of the district (3 is the most prestigious, 1 the least):
code_prestigious = {'Печерський': 3,
                'Шевченківський': 3,
                'Голосіївський': 3,
                'Подільський': 2,
                'Святошинський': 1,
                'Солом\'янський': 2,
                'Оболонський': 2,
                'Дніпровський': 1,
                'Дарницький': 1,
                'Деснянський': 1}

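"code type" - code of the flat's condition:
code_type = {'Дизайнерський ремонт': 5,  # designer renovation
                'Євроремонт': 4,  # European-style renovation
                'Чудовий стан': 4,  # excellent condition
                'Хороший стан': 3,  # good condition
                'Задовільний стан': 1,  # satisfactory condition
                'Перша здача': 2,  # first-time rental
                'Потрібен капітальний ремонт': 1,  # needs major renovation
                'Незавершений ремонт': 1,  # unfinished renovation
                'Потрібен косметичний ремонт': 1,  # needs cosmetic repairs
                'Від будівельників вільне планування': 1}  # builder's shell, open plan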

"distance_category" - code of the distance to the metro: 1 - closer than 2 km, 2 - from 2 to 5 km, 3 - more than 5 km
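A minimal sketch of how such codes could be derived with pandas. The raw column names `flat_type` and `metro_km` are hypothetical (only `district` appears in the notebook code), the rows are toy data, and the dictionaries are shortened copies of the ones above.

```python
import pandas as pd

# Shortened copies of the mappings from the vocabulary above.
code_prestigious = {'Печерський': 3, 'Подільський': 2, 'Дарницький': 1}
code_type = {'Дизайнерський ремонт': 5, 'Євроремонт': 4, 'Хороший стан': 3}

# Toy rows; `flat_type` and `metro_km` are hypothetical column names.
raw = pd.DataFrame({
    'district': ['Печерський', 'Дарницький', 'Подільський'],
    'flat_type': ['Євроремонт', 'Хороший стан', 'Дизайнерський ремонт'],
    'metro_km': [0.8, 6.5, 3.0],
})

raw['prestigious'] = raw['district'].map(code_prestigious)
raw['code type'] = raw['flat_type'].map(code_type)

# 1: closer than 2 km, 2: from 2 to 5 km, 3: more than 5 km
raw['distance_category'] = pd.cut(
    raw['metro_km'],
    bins=[0, 2, 5, float('inf')],
    labels=[1, 2, 3],
    right=False,
).astype(int)

print(raw[['prestigious', 'code type', 'distance_category']])
```

Categories missing from a dictionary become NaN after `map`, which makes unmapped values easy to spot before training.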

<b>Results:</b>

Normalizer did not affect RandomForest, but it hurt the linear models (LinearRegression, Lasso, Ridge): their R² scores dropped to 0.14-0.32 and their predictions became unusable.
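One plausible reason for this effect is that `Normalizer` rescales each row (one flat) to unit length rather than each column (one feature), so the absolute size of a flat is largely lost. The sketch below illustrates this on three made-up rows; the second row is simply the first one doubled and is for illustration only (its condition code is outside the real range). `StandardScaler` is shown only as a contrast, not as the method used in this notebook.

```python
import numpy as np
from sklearn.preprocessing import Normalizer, StandardScaler

# Columns: flat_area, rooms, distance_category, code type, prestigious
X_demo = np.array([
    [40.0, 1, 1, 4, 3],
    [80.0, 2, 2, 8, 6],    # every value doubled (illustrative only)
    [160.0, 4, 3, 5, 3],
])

# Normalizer rescales each ROW to unit L2 norm, so rows 0 and 1 coincide.
X_rows = Normalizer(norm="l2").fit_transform(X_demo)
print(X_rows.round(4))

# StandardScaler rescales each COLUMN, so the flats stay distinguishable.
X_cols = StandardScaler().fit_transform(X_demo)
print(X_cols.round(4))
```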



In [118]:
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import mean_squared_error
import numpy as np
from sklearn.preprocessing import Normalizer


df = pd.read_csv('all_flats_metro.csv')

# df = df[df['rooms'] <= 3]
# df = df[df['price'] <= 200000]

print(len(df))

#Linear regression

df['flat_area'] = df['flat_area'].astype(int)
df['price'] = df['price'].astype(int)
df['rooms'] = df['rooms'].astype(int)

# print(df.head(5))

label_encoder = LabelEncoder()
df['region_name_encoded'] = label_encoder.fit_transform(df['district'])

X = df[['flat_area', 'rooms', 'distance_category', 'code type', 'prestigious']]
y = df['price']

# Normalizer rescales each sample (row) to unit norm; only the L2 version is used below
normalizer_l1 = Normalizer(norm="l1")
X_l1_normalized = normalizer_l1.fit_transform(X)

normalizer_l2 = Normalizer(norm="l2")
X_l2_normalized = normalizer_l2.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X_l2_normalized, y, test_size=0.25, random_state=1)

model = LinearRegression()
model.fit(X_train, y_train)

y_pred_train = model.predict(X_train)
mse_y_train_y_pred = mean_squared_error(y_train, y_pred_train)
rmse_y_train_y_pred = np.sqrt(mse_y_train_y_pred)

y_pred_test = model.predict(X_test)
mse_y_test_y_pred = mean_squared_error(y_test, y_pred_test)
rmse_y_test_y_pred = np.sqrt(mse_y_test_y_pred)

score_train = model.score(X_train, y_train)
score_test = model.score(X_test, y_test)

print('Train score:', score_train)
print('Test score:', score_test)

print('Mean Squared Error (MSE) for mse_y_train_y_pred:', mse_y_train_y_pred)
print('Root Mean Squared Error (RMSE) for mse_y_train_y_pred:', rmse_y_train_y_pred)
# Test-set metrics
print('Mean Squared Error (MSE) for mse_y_test_y_pred:', mse_y_test_y_pred)
print('Root Mean Squared Error (RMSE) for mse_y_test_y_pred:', rmse_y_test_y_pred)

print('Model coefficients:', model.coef_)

6020
Train score: 0.31018367639212285
Test score: 0.2723719851679651
Mean Squared Error (MSE) for mse_y_train_y_pred: 689283063.9605926
Root Mean Squared Error (RMSE) for mse_y_train_y_pred: 26254.20088215584
Model coefficients: [-29307316.47197415  -1887296.16182938  -1596208.17094611
  -1546431.40388801   -735226.62399272]
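The same train/test bookkeeping is repeated for every model in this notebook, which makes it easy to mix up the two splits. A small helper is one way to avoid that. This is only a sketch: `evaluate` is a hypothetical name, and the data is synthetic rather than taken from `all_flats_metro.csv`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def evaluate(model, X_train, X_test, y_train, y_test):
    """Return R2, MSE and RMSE of a fitted model for the train and test splits."""
    report = {}
    for name, X_part, y_part in [("train", X_train, y_train),
                                 ("test", X_test, y_test)]:
        mse = mean_squared_error(y_part, model.predict(X_part))
        report[name] = {
            "r2": model.score(X_part, y_part),
            "mse": mse,
            "rmse": np.sqrt(mse),
        }
    return report


# Synthetic stand-in data: price grows linearly with area plus noise.
rng = np.random.default_rng(1)
X_demo = rng.uniform(20, 200, size=(200, 1))
y_demo = 300 * X_demo[:, 0] + rng.normal(0, 2000, size=200)
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.25, random_state=1)

demo_report = evaluate(LinearRegression().fit(Xtr, ytr), Xtr, Xte, ytr, yte)
for split, metrics in demo_report.items():
    print(split, metrics)
```

The same call works unchanged for Lasso, Ridge, tree and forest models, since it relies only on `predict` and `score`.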




In [119]:
# Inspect the coefficients
df_coef = pd.DataFrame(model.coef_, X.columns, columns=['Model coefficients'])
df_coef

Unnamed: 0,Model coefficients
flat_area,-29307320.0
rooms,-1887296.0
distance_category,-1596208.0
code type,-1546431.0
prestigious,-735226.6


In [120]:
# Prediction
price_pred = model.predict(normalizer_l2.transform([[40, 1, 1, 4, 3]]))
print("Price for 40 sq m, 1 room, close to metro, condition code 4, prestigious district:", price_pred, "hryvnas")

Price for 40 sq m, 1 room, close to metro, condition code 4, prestigious district: [59524.14374724] hryvnas


In [121]:
# Prediction
price_pred = model.predict(normalizer_l2.transform([[60, 2, 1, 5, 3]]))
print("Price for 60 sq m, 2 rooms, close to metro, condition code 5, prestigious district:", price_pred, "hryvnas")

Price for 60 sq m, 2 rooms, close to metro, condition code 5, prestigious district: [13425.69544291] hryvnas


In [122]:
# Prediction
price_pred = model.predict(normalizer_l2.transform([[40, 1, 1, 4, 1]]))
print("Price for 40 sq m, 1 room, close to metro, condition code 4, non-prestigious district:", price_pred, "hryvnas")

Price for 40 sq m, 1 room, close to metro, condition code 4, non-prestigious district: [23626.03004842] hryvnas


In [123]:
# Prediction
price_pred = model.predict(normalizer_l2.transform([[160, 4, 3, 5, 3]]))
print("Price for 160 sq m, 4 rooms, far from metro, condition code 5, prestigious district:", price_pred, "hryvnas")

Price for 160 sq m, 4 rooms, far from metro, condition code 5, prestigious district: [4380.28586392] hryvnas


In [124]:
# Prediction
price_pred = model.predict(normalizer_l2.transform([[40, 1, 1, 1, 1]]))
print("Price for 40 sq m, 1 room, close to metro, condition code 1, non-prestigious district:", price_pred, "hryvnas")

Price for 40 sq m, 1 room, close to metro, condition code 1, non-prestigious district: [2345.44327225] hryvnas


In [125]:
# Lasso regression

from sklearn.linear_model import Lasso

lasso_model = Lasso(alpha=1)

lasso_model.fit(X_train, y_train)


Lasso(alpha=1)

In [126]:
lasso_y_pred_train = lasso_model.predict(X_train)
lasso_mse_y_train_y_pred = mean_squared_error(y_train, lasso_y_pred_train)
lasso_rmse_y_train_y_pred = np.sqrt(lasso_mse_y_train_y_pred)

lasso_y_pred_test = lasso_model.predict(X_test)
lasso_mse_y_test_y_pred = mean_squared_error(y_test, lasso_y_pred_test)
lasso_rmse_y_test_y_pred = np.sqrt(lasso_mse_y_test_y_pred)

lasso_score_train = lasso_model.score(X_train, y_train)
lasso_score_test = lasso_model.score(X_test, y_test)

print('Train score for lasso_model:', lasso_score_train)
print('Test score for lasso_model:', lasso_score_test)

print('Mean Squared Error (MSE) for lasso_mse_y_train_y_pred:', lasso_mse_y_train_y_pred)
print('Root Mean Squared Error (RMSE) for lasso_mse_y_train_y_pred:', lasso_rmse_y_train_y_pred)
# Test-set metrics
print('Mean Squared Error (MSE) for lasso_mse_y_test_y_pred:', lasso_mse_y_test_y_pred)
print('Root Mean Squared Error (RMSE) for lasso_mse_y_test_y_pred:', lasso_rmse_y_test_y_pred)

print('Model coefficients:', lasso_model.coef_)

Train score for lasso_model: 0.2971257656750629
Test score for lasso_model: 0.2925191676922848
Mean Squared Error (MSE) for lasso_mse_y_train_y_pred: 702330880.313943
Root Mean Squared Error (RMSE) for lasso_mse_y_train_y_pred: 26501.525999722035
Model coefficients: [-17615739.84058414  -1643630.28844895  -1315689.99577337
  -1059912.92497151   -408380.22305121]


In [127]:
# Inspect the coefficients
lasso_df_coef = pd.DataFrame(lasso_model.coef_, X.columns, columns=['Model coefficients'])
lasso_df_coef

Unnamed: 0,Model coefficients
flat_area,-17615740.0
rooms,-1643630.0
distance_category,-1315690.0
code type,-1059913.0
prestigious,-408380.2


In [128]:
# Prediction
price_pred = lasso_model.predict(normalizer_l2.transform([[40, 1, 1, 4, 3]]))
print("Price for 40 sq m, 1 room, close to metro, condition code 4, prestigious district:", price_pred, "hryvnas")

Price for 40 sq m, 1 room, close to metro, condition code 4, prestigious district: [31555.37625055] hryvnas


In [129]:
# Prediction
price_pred = lasso_model.predict(normalizer_l2.transform([[40, 1, 1, 4, 1]]))
print("Price for 40 sq m, 1 room, close to metro, condition code 4, non-prestigious district:", price_pred, "hryvnas")

Price for 40 sq m, 1 room, close to metro, condition code 4, non-prestigious district: [8232.16564187] hryvnas


In [130]:
# Prediction with the Lasso model
price_pred = lasso_model.predict(normalizer_l2.transform([[40, 1, 1, 1, 1]]))
print("Price for 40 sq m, 1 room, close to metro, condition code 1, non-prestigious district:", price_pred, "hryvnas")


In [131]:
from sklearn.linear_model import Ridge

ridge_model = Ridge(alpha=1)

ridge_model.fit(X_train, y_train)


Ridge(alpha=1)

In [132]:
ridge_y_pred_train = ridge_model.predict(X_train)
ridge_mse_y_train_y_pred = mean_squared_error(y_train, ridge_y_pred_train)
ridge_rmse_y_train_y_pred = np.sqrt(ridge_mse_y_train_y_pred)

ridge_y_pred_test = ridge_model.predict(X_test)
ridge_mse_y_test_y_pred = mean_squared_error(y_test, ridge_y_pred_test)
ridge_rmse_y_test_y_pred = np.sqrt(ridge_mse_y_test_y_pred)

ridge_score_train = ridge_model.score(X_train, y_train)
ridge_score_test = ridge_model.score(X_test, y_test)

print('Train score for ridge_model:', ridge_score_train)
print('Test score for ridge_model:', ridge_score_test)

print('Mean Squared Error (MSE) for ridge_mse_y_train_y_pred:', ridge_mse_y_train_y_pred)
print('Root Mean Squared Error (RMSE) for ridge_mse_y_train_y_pred:', ridge_rmse_y_train_y_pred)
# Test-set metrics
print('Mean Squared Error (MSE) for ridge_mse_y_test_y_pred:', ridge_mse_y_test_y_pred)
print('Root Mean Squared Error (RMSE) for ridge_mse_y_test_y_pred:', ridge_rmse_y_test_y_pred)

print('Model coefficients:', ridge_model.coef_)

Train score for ridge_model: 0.1482191040851767
Test score for ridge_model: 0.15880982385201303
Mean Squared Error (MSE) for ridge_mse_y_train_y_pred: 851122430.2268217
Root Mean Squared Error (RMSE) for ridge_mse_y_train_y_pred: 29174.00264322367
Model coefficients: [  11509.14487926 -158119.55001795 -280215.55314276 -293565.34499352
  -90166.95564692]


In [133]:
# Inspect the coefficients
ridge_df_coef = pd.DataFrame(ridge_model.coef_, X.columns, columns=['Model coefficients'])
ridge_df_coef

Unnamed: 0,Model coefficients
flat_area,11509.144879
rooms,-158119.550018
distance_category,-280215.553143
code type,-293565.344994
prestigious,-90166.955647


In [134]:
# Prediction
price_pred = ridge_model.predict(normalizer_l2.transform([[40, 1, 1, 4, 3]]))
print("Price for 40 sq m, 1 room, close to metro, condition code 4, prestigious district:", price_pred, "hryvnas")

Price for 40 sq m, 1 room, close to metro, condition code 4, prestigious district: [-2848.76892324] hryvnas


In [135]:
# Prediction with the Ridge model
price_pred = ridge_model.predict(normalizer_l2.transform([[40, 1, 1, 1, 1]]))
print("Price for 40 sq m, 1 room, close to metro, condition code 1, non-prestigious district:", price_pred, "hryvnas")


In [136]:
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score
# max_depth=4 limits the depth of the tree
regressor = DecisionTreeRegressor(max_depth=4)

regressor.fit(X_train, y_train)


DecisionTreeRegressor(max_depth=4)

In [137]:
regressor_y_pred_train = regressor.predict(X_train)
regressor_mse_y_train_y_pred = mean_squared_error(y_train, regressor_y_pred_train)
regressor_rmse_y_train_y_pred = np.sqrt(regressor_mse_y_train_y_pred)

regressor_y_pred_test = regressor.predict(X_test)
regressor_mse_y_test_y_pred = mean_squared_error(y_test, regressor_y_pred_test)
regressor_rmse_y_test_y_pred = np.sqrt(regressor_mse_y_test_y_pred)

train_score = regressor.score(X_train, y_train)
print("Training R2 Score for X_train, y_train:", train_score)

test_score = regressor.score(X_test, y_test)
print("Test R2 Score for X_test, y_test:", test_score)

print('Mean Squared Error (MSE) for regressor_mse_y_train_y_pred:', regressor_mse_y_train_y_pred)
print('Root Mean Squared Error (RMSE) for regressor_mse_y_train_y_pred:', regressor_rmse_y_train_y_pred)
# Test-set metrics
print('Mean Squared Error (MSE) for regressor_mse_y_test_y_pred:', regressor_mse_y_test_y_pred)
print('Root Mean Squared Error (RMSE) for regressor_mse_y_test_y_pred:', regressor_rmse_y_test_y_pred)

print('Model feature importances:', regressor.feature_importances_)

# Perform 5-fold cross-validation on the training data
scores = cross_val_score(regressor, X_train, y_train, cv=5, scoring='r2')

print("Cross-validated R2 scores:", scores)
print("Mean R2 score:", scores.mean())

Training R2 Score for X_train, y_train: 0.6489205993331945
Test R2 Score for X_test, y_test: 0.47524966932563584
Model feature importances: [3.59817200e-04 1.62534810e-01 8.00565541e-01 1.51542684e-03
 3.50244054e-02]
Cross-validated R2 scores: [0.56127939 0.6853766  0.39996804 0.29724204 0.49320218]
Mean R2 score: 0.4874136520805739


In [138]:
# Inspect the feature importances
regr_df_imp = pd.DataFrame(regressor.feature_importances_, X.columns, columns=['Feature Importances'])
regr_df_imp

Unnamed: 0,Feature Importances
flat_area,0.00036
rooms,0.162535
distance_category,0.800566
code type,0.001515
prestigious,0.035024


In [139]:
# Prediction
price_pred = regressor.predict(normalizer_l2.transform([[40, 1, 1, 4, 3]]))
print("Price for 40 sq m, 1 room, close to metro, condition code 4, prestigious district:", price_pred, "hryvnas")

Price for 40 sq m, 1 room, close to metro, condition code 4, prestigious district: [9808.00854037] hryvnas


In [140]:
# Prediction
price_pred = regressor.predict(normalizer_l2.transform([[60, 2, 1, 5, 3]]))
print("Price for 60 sq m, 2 rooms, close to metro, condition code 5, prestigious district:", price_pred, "hryvnas")

Price for 60 sq m, 2 rooms, close to metro, condition code 5, prestigious district: [9808.00854037] hryvnas


In [141]:
# Prediction
price_pred = regressor.predict(normalizer_l2.transform([[40, 1, 1, 4, 1]]))
print("Price for 40 sq m, 1 room, close to metro, condition code 4, non-prestigious district:", price_pred, "hryvnas")

Price for 40 sq m, 1 room, close to metro, condition code 4, non-prestigious district: [9808.00854037] hryvnas


In [142]:
# Prediction
price_pred = regressor.predict(normalizer_l2.transform([[160, 4, 3, 5, 3]]))
print("Price for 160 sq m, 4 rooms, far from metro, condition code 5, prestigious district:", price_pred, "hryvnas")

Price for 160 sq m, 4 rooms, far from metro, condition code 5, prestigious district: [9808.00854037] hryvnas


In [143]:
# Prediction
price_pred = regressor.predict(normalizer_l2.transform([[40, 1, 1, 1, 1]]))
print("Price for 40 sq m, 1 room, close to metro, condition code 1, non-prestigious district:", price_pred, "hryvnas")

Price for 40 sq m, 1 room, close to metro, condition code 1, non-prestigious district: [9808.00854037] hryvnas
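The decision tree returns the same price for all five queries. A likely contributing factor, offered here as an assumption rather than a verified diagnosis, is that L2 row normalization makes these inputs almost identical: `flat_area` dominates every row, so its normalized value is close to 1 and all other features shrink toward 0. The sketch below prints the normalized versions of the five query rows used above.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

columns = ['flat_area', 'rooms', 'distance_category', 'code type', 'prestigious']

# The five query rows passed to the decision tree above.
queries = np.array([
    [40, 1, 1, 4, 3],
    [60, 2, 1, 5, 3],
    [40, 1, 1, 4, 1],
    [160, 4, 3, 5, 3],
    [40, 1, 1, 1, 1],
], dtype=float)

# Each row is divided by its own Euclidean length.
normalized = Normalizer(norm="l2").fit_transform(queries)
for row in normalized:
    print(dict(zip(columns, row.round(4))))
```

With `distance_category` carrying about 0.80 of the tree's feature importance, and its normalized value staying below 0.03 for every query, it is plausible that all five rows fall into the same leaf.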


In [144]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.metrics import r2_score

regressor_forest = RandomForestRegressor(n_estimators=150, max_depth=12, min_samples_split=3, min_samples_leaf=2)

regressor_forest.fit(X_train, y_train)


RandomForestRegressor(max_depth=12, min_samples_leaf=2, min_samples_split=3,
                      n_estimators=150)

In [145]:
regressor_forest_y_pred_train = regressor_forest.predict(X_train)
regressor_forest_mse_y_train_y_pred = mean_squared_error(y_train, regressor_forest_y_pred_train)
regressor_forest_rmse_y_train_y_pred = np.sqrt(regressor_forest_mse_y_train_y_pred)

regressor_forest_y_pred_test = regressor_forest.predict(X_test)
regressor_forest_mse_y_test_y_pred = mean_squared_error(y_test, regressor_forest_y_pred_test)
regressor_forest_rmse_y_test_y_pred = np.sqrt(regressor_forest_mse_y_test_y_pred)

f_train_score = regressor_forest.score(X_train, y_train)
print("Training R2 Score for X_train, y_train:", f_train_score)

f_test_score = regressor_forest.score(X_test, y_test)
print("Test R2 Score for X_test, y_test:", f_test_score)

print('Mean Squared Error (MSE) for regressor_forest_mse_y_train_y_pred:', regressor_forest_mse_y_train_y_pred)
print('Root Mean Squared Error (RMSE) for regressor_forest_rmse_y_train_y_pred:', regressor_forest_rmse_y_train_y_pred)
# Test-set metrics
print('Mean Squared Error (MSE) for regressor_forest_mse_y_test_y_pred:', regressor_forest_mse_y_test_y_pred)
print('Root Mean Squared Error (RMSE) for regressor_forest_rmse_y_test_y_pred:', regressor_forest_rmse_y_test_y_pred)

print('Model regressor_forest feature importances:', regressor_forest.feature_importances_)

# Perform 5-fold cross-validation on the training data
scores = cross_val_score(regressor_forest, X_train, y_train, cv=5, scoring='r2')

print("Cross-validated R2 scores:", scores)
print("Mean R2 score:", scores.mean())

Training R2 Score for X_train, y_train: 0.8542535559815916
Test R2 Score for X_test, y_test: 0.6024539444703703
Model regressor_forest feature importances: [0.07037496 0.12109909 0.65313028 0.08721139 0.06818429]
Cross-validated R2 scores: [0.61094799 0.57657058 0.56213527 0.66649804 0.48991627]
Mean R2 score: 0.581213632339156


In [146]:
# Inspect the feature importances
regrfor_df_imp = pd.DataFrame(regressor_forest.feature_importances_, X.columns, columns=['Feature Importances'])
regrfor_df_imp

Unnamed: 0,Feature Importances
flat_area,0.070375
rooms,0.121099
distance_category,0.65313
code type,0.087211
prestigious,0.068184


In [147]:
# Predictions
print(regressor_forest.predict(normalizer_l2.transform([[80, 2, 1, 1, 1]])))
# 80 sq m, 2 rooms, close to metro, condition code 1, non-prestigious district
print(regressor_forest.predict(normalizer_l2.transform([[160, 3, 3, 5, 1]])))
# 160 sq m, 3 rooms, far from metro, condition code 5, non-prestigious district
print(regressor_forest.predict(normalizer_l2.transform([[100, 2, 3, 1, 1]])))
# 100 sq m, 2 rooms, far from metro, condition code 1, non-prestigious district
print(regressor_forest.predict(normalizer_l2.transform([[50, 2, 2, 1, 1]])))
# 50 sq m, 2 rooms, medium distance to metro, condition code 1, non-prestigious district

[6714.59201906]
[10739.25467748]
[6742.24670264]
[7910.97404808]


In [148]:
# Prediction
price_pred = regressor_forest.predict(normalizer_l2.transform([[40, 1, 1, 4, 3]]))
print("Price for 40 sq m, 1 room, close to metro, condition code 4, prestigious district:", price_pred, "hryvnas")

Price for 40 sq m, 1 room, close to metro, condition code 4, prestigious district: [13444.67689308] hryvnas


In [149]:
# Prediction
price_pred = regressor_forest.predict(normalizer_l2.transform([[60, 2, 1, 5, 3]]))
print("Price for 60 sq m, 2 rooms, close to metro, condition code 5, prestigious district:", price_pred, "hryvnas")

Price for 60 sq m, 2 rooms, close to metro, condition code 5, prestigious district: [20046.80861232] hryvnas


In [150]:
# Prediction
price_pred = regressor_forest.predict(normalizer_l2.transform([[40, 1, 1, 4, 1]]))
print("Price for 40 sq m, 1 room, close to metro, condition code 4, non-prestigious district:", price_pred, "hryvnas")

Price for 40 sq m, 1 room, close to metro, condition code 4, non-prestigious district: [10748.83661152] hryvnas


In [151]:
# Prediction (the input is normalized like the training data)
price_pred = regressor_forest.predict(normalizer_l2.transform([[160, 4, 3, 5, 3]]))
print("Price for 160 sq m, 4 rooms, far from metro, condition code 5, prestigious district:", price_pred, "hryvnas")


In [152]:
# Prediction
price_pred = regressor_forest.predict(normalizer_l2.transform([[40, 1, 3, 1, 1]]))
print("Price for 40 sq m, 1 room, far from metro, condition code 1, non-prestigious district:", price_pred, "hryvnas")

Price for 40 sq m, 1 room, far from metro, condition code 1, non-prestigious district: [6600.45301413] hryvnas


In [153]:
from sklearn.svm import SVR

regressor = SVR()

regressor.fit(X_train, y_train)

y_pred = regressor.predict(X_test)

score = regressor.score(X_test, y_test)
print("R2 Score:", score)

R2 Score: -0.1125854976811822
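A negative R2 from `SVR()` with default settings is common when the target is in the tens of thousands, because the default `C=1` limits how far predictions can move from the intercept. One option, sketched here on synthetic data and not something this notebook does, is to standardize both the features and the target with `TransformedTargetRegressor`.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data: price grows linearly with area plus noise.
rng = np.random.default_rng(1)
X_demo = rng.uniform(20, 200, size=(300, 1))
y_demo = 300 * X_demo[:, 0] + rng.normal(0, 2000, size=300)
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.25, random_state=1)

# Default SVR on the raw target scale.
plain_svr = SVR().fit(Xtr, ytr)

# Features standardized in a pipeline, target standardized by the wrapper.
scaled_svr = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), SVR()),
    transformer=StandardScaler(),
).fit(Xtr, ytr)

plain_r2 = plain_svr.score(Xte, yte)
scaled_r2 = scaled_svr.score(Xte, yte)
print("Plain SVR test R2:", plain_r2)
print("Scaled SVR test R2:", scaled_r2)
```

The wrapper inverts the target scaling on `predict`, so predictions come back in the original price units.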


In [154]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error, r2_score

normalizer_l2 = Normalizer(norm="l2")
X_l2_normalized = normalizer_l2.transform(X)

X_train, X_test, y_train, y_test = train_test_split(X_l2_normalized, y, test_size=0.25, random_state=1)

model = LinearRegression()

param_grid = {'fit_intercept': [True, False]}

grid_search = GridSearchCV(model, param_grid, cv=5)
grid_search.fit(X_train, y_train)

best_params = grid_search.best_params_
# GridSearchCV refits the best configuration on the full training set by default
best_model = grid_search.best_estimator_

# Print the selected hyperparameters
print('Best Hyperparameters:', best_params)

Best Hyperparameters: {'fit_intercept': True}
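`fit_intercept` is the only option searched here, so the grid search cannot change much. As a sketch of a search with more room to help, the example below tunes the Ridge `alpha` inside a `Pipeline`, so the same preprocessing is applied automatically at prediction time. The data is synthetic, and both the `StandardScaler` step and the `alpha` grid are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the flats data: (area, rooms) -> price.
rng = np.random.default_rng(1)
area = rng.uniform(20, 200, size=300)
rooms = rng.integers(1, 5, size=300)
X_demo = np.column_stack([area, rooms])
y_demo = 300 * area + 1000 * rooms + rng.normal(0, 2000, size=300)

Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.25, random_state=1)

pipe = Pipeline([("scale", StandardScaler()), ("ridge", Ridge())])
param_grid = {"ridge__alpha": [0.01, 0.1, 1, 10, 100]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="r2")
search.fit(Xtr, ytr)

# With the default refit=True, best_estimator_ is already refit on all of Xtr.
best_pipe = search.best_estimator_
print("Best hyperparameters:", search.best_params_)
print("Test R2:", best_pipe.score(Xte, yte))

# Raw feature values go in; the pipeline applies the scaler itself.
demo_pred = best_pipe.predict([[40, 1]])
print(demo_pred)
```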




In [155]:
best_model_y_pred_train = best_model.predict(X_train)
best_model_mse_y_train_y_pred = mean_squared_error(y_train, best_model_y_pred_train)
best_model_rmse_y_train_y_pred = np.sqrt(best_model_mse_y_train_y_pred)

best_model_y_pred_test = best_model.predict(X_test)
best_model_mse_y_test_y_pred = mean_squared_error(y_test, best_model_y_pred_test)
best_model_rmse_y_test_y_pred = np.sqrt(best_model_mse_y_test_y_pred)

gs_train_score = best_model.score(X_train, y_train)
print("Training Score for X_train, y_train:", gs_train_score)

gs_test_score = best_model.score(X_test, y_test)
print("Test Score for X_test, y_test:", gs_test_score)

print('Mean Squared Error (MSE) for best_model_mse_y_train_y_pred:', best_model_mse_y_train_y_pred)
print('Root Mean Squared Error (RMSE) for best_model_rmse_y_train_y_pred:', best_model_rmse_y_train_y_pred)
# Test-set metrics
print('Mean Squared Error (MSE) for best_model_mse_y_test_y_pred:', best_model_mse_y_test_y_pred)
print('Root Mean Squared Error (RMSE) for best_model_rmse_y_test_y_pred:', best_model_rmse_y_test_y_pred)

print('Model best_model GridSearchCV coefficients:', best_model.coef_)


Training Score for X_train, y_train: 0.31018367639212285
Test Score for X_test, y_test: 0.2723719851679651
Mean Squared Error (MSE) for best_model_mse_y_train_y_pred: 689283063.9605926
Root Mean Squared Error (RMSE) for best_model_rmse_y_train_y_pred: 26254.20088215584
Model best_model GridSearchCV coefficients: [-29307316.47197415  -1887296.16182938  -1596208.17094611
  -1546431.40388801   -735226.62399272]


In [156]:
# Inspect the coefficients
df_coef = pd.DataFrame(best_model.coef_, X.columns, columns=['Model coefficients'])
df_coef

Unnamed: 0,Model coefficients
flat_area,-29307320.0
rooms,-1887296.0
distance_category,-1596208.0
code type,-1546431.0
prestigious,-735226.6


In [157]:
# Prediction
price_pred = best_model.predict(normalizer_l2.transform([[40, 1, 1, 4, 3]]))
print("Price for 40 sq m, 1 room, close to metro, condition code 4, prestigious district:", price_pred, "hryvnas")

Price for 40 sq m, 1 room, close to metro, condition code 4, prestigious district: [59524.14374724] hryvnas


In [158]:
# Prediction
price_pred = best_model.predict(normalizer_l2.transform([[60, 2, 1, 5, 3]]))
print("Price for 60 sq m, 2 rooms, close to metro, condition code 5, prestigious district:", price_pred, "hryvnas")

Price for 60 sq m, 2 rooms, close to metro, condition code 5, prestigious district: [13425.69544291] hryvnas


In [159]:
# Prediction
price_pred = best_model.predict(normalizer_l2.transform([[40, 1, 1, 4, 1]]))
print("Price for 40 sq m, 1 room, close to metro, condition code 4, non-prestigious district:", price_pred, "hryvnas")

Price for 40 sq m, 1 room, close to metro, condition code 4, non-prestigious district: [23626.03004842] hryvnas


In [160]:
# Prediction
price_pred = best_model.predict(normalizer_l2.transform([[160, 4, 3, 5, 3]]))
print("Price for 160 sq m, 4 rooms, far from metro, condition code 5, prestigious district:", price_pred, "hryvnas")

Price for 160 sq m, 4 rooms, far from metro, condition code 5, prestigious district: [4380.28586392] hryvnas


In [161]:
# Prediction
price_pred = best_model.predict(normalizer_l2.transform([[40, 1, 3, 1, 1]]))
print("Price for 40 sq m, 1 room, far from metro, condition code 1, non-prestigious district:", price_pred, "hryvnas")

Price for 40 sq m, 1 room, far from metro, condition code 1, non-prestigious district: [-4087.23610962] hryvnas
