<b>Data vocabulary of the features used in the analysis:</b>

"rooms" - number of rooms

"price" - price in hryvnas

"flat_area" - area of a flat in square meters

"prestigious" - prestige code of the district (3 = most prestigious, 1 = least prestigious):
code_prestigious = {'Печерський': 3,
                'Шевченківський': 3,
                'Голосіївський': 3,
                'Подільський': 2,
                'Святошинський': 1,
                'Солом\'янський': 2,
                'Оболонський': 2,
                'Дніпровський': 1,
                'Дарницький': 1,
                'Деснянський': 1}

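"code type" - code of the flat's condition:
code_type = {'Дизайнерський ремонт': 5,                 # designer renovation
             'Євроремонт': 4,                           # European-style renovation
             'Чудовий стан': 4,                         # excellent condition
             'Хороший стан': 3,                         # good condition
             'Задовільний стан': 1,                     # satisfactory condition
             'Перша здача': 2,                          # first-time rental
             'Потрібен капітальний ремонт': 1,          # needs major renovation
             'Незавершений ремонт': 1,                  # unfinished renovation
             'Потрібен косметичний ремонт': 1,          # needs cosmetic renovation
             'Від будівельників вільне планування': 1}  # from the builders, open plan

"distance_category" - code of the distance to the metro: 1 - closer than 2 km, 2 - from 2 to 5 km, 3 - more than 5 km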
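As a sketch of how such vocabularies could be applied, the dictionaries can be mapped onto raw text columns with pandas `Series.map`. The column name `flat_type` and the sample rows are assumptions made for illustration; the notebook itself loads the already encoded columns from the CSV.

```python
import pandas as pd

# Subsets of the vocabularies above (values copied from the data vocabulary).
code_prestigious = {'Печерський': 3, 'Подільський': 2, 'Дарницький': 1}
code_type = {'Дизайнерський ремонт': 5, 'Євроремонт': 4, 'Хороший стан': 3}

# Hypothetical raw rows; 'flat_type' is an assumed column name.
raw = pd.DataFrame({
    'district': ['Печерський', 'Дарницький', 'Подільський'],
    'flat_type': ['Євроремонт', 'Хороший стан', 'Дизайнерський ремонт'],
})

# Series.map looks each value up in the dict; unknown values become NaN.
raw['prestigious'] = raw['district'].map(code_prestigious)
raw['code type'] = raw['flat_type'].map(code_type)
print(raw[['prestigious', 'code type']])
```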

<b>Results:</b>

Applying StandardScaler did not change the results.
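For plain least squares this is expected: standardizing the features rescales the coefficients but leaves the fitted values, and therefore R2, unchanged. The sketch below checks this on synthetic data (the data and variable names are illustrative, not the notebook's). The same invariance does not hold in general for regularized models or for SVR.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-ins for two features (area and number of rooms).
X = np.column_stack([rng.uniform(20, 200, 500), rng.integers(1, 5, 500)])
y = 300 * X[:, 0] - 2000 * X[:, 1] + rng.normal(0, 5000, 500)

# Fit once on raw features and once on standardized features.
r2_raw = LinearRegression().fit(X, y).score(X, y)
X_scaled = StandardScaler().fit_transform(X)
r2_scaled = LinearRegression().fit(X_scaled, y).score(X_scaled, y)

print(r2_raw, r2_scaled)  # the two values agree up to floating-point error
```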



In [97]:
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import mean_squared_error
import numpy as np
from sklearn.preprocessing import StandardScaler

df = pd.read_csv('all_flats_metro.csv')

# df = df[df['rooms'] <= 3]
# df = df[df['price'] <= 200000]

print(len(df))

#Linear regression

df['flat_area'] = df['flat_area'].astype(int)
df['price'] = df['price'].astype(int)
df['rooms'] = df['rooms'].astype(int)

# print(df.head(5))

label_encoder = LabelEncoder()
df['region_name_encoded'] = label_encoder.fit_transform(df['district'])

X = df[['flat_area', 'rooms', 'distance_category', 'code type', 'prestigious']]
y = df['price']

# Scale the features using StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.25, random_state=1)

model = LinearRegression()
model.fit(X_train, y_train)

y_pred_train=model.predict(X_train)
mse_y_train_y_pred=mean_squared_error(y_train, y_pred_train)
rmse_y_train_y_pred=np.sqrt(mean_squared_error(y_train, y_pred_train))

y_pred_test = model.predict(X_test)
mse_y_test_y_pred = mean_squared_error(y_test, y_pred_test)
rmse_y_test_y_pred=np.sqrt(mean_squared_error(y_test, y_pred_test))

score_train = model.score(X_train, y_train)
score_test = model.score(X_test, y_test)

print('Train score:', score_train)
print('Test score:', score_test)

print('Mean Squared Error (MSE) for mse_y_train_y_pred:', mse_y_train_y_pred)
print('Root Mean Squared Error (RMSE) for mse_y_train_y_pred:', rmse_y_train_y_pred)
print('Mean Squared Error (MSE) for mse_y_test_y_pred:', mse_y_test_y_pred)
print('Root Mean Squared Error (RMSE) for mse_y_test_y_pred:', rmse_y_test_y_pred)

print('Model coefficients:', model.coef_)

6020
Train score: 0.5790962964878815
Test score: 0.5435417979998884
Mean Squared Error (MSE) for mse_y_train_y_pred: 420578325.6502237
Root Mean Squared Error (RMSE) for mse_y_train_y_pred: 20508.006379222326
Model coefficients: [25337.36769074 -6788.26676049  -553.36379199  2928.21855388
  4944.20225492]
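One possible arrangement of the same workflow, shown here on synthetic data, puts the scaler inside a scikit-learn pipeline so that it is fitted on the training rows only, and computes each split's metrics from that split's own predictions. The helper `report` and the data are hypothetical; this is a sketch, not the notebook's method.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.uniform(0, 100, size=(400, 3))          # synthetic features
y = X @ np.array([250.0, -40.0, 15.0]) + rng.normal(0, 2000, 400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1)

# The scaler is fitted inside the pipeline, on the training rows only.
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X_train, y_train)

def report(name, X_part, y_part):
    """Return (MSE, RMSE) computed from this split's own predictions."""
    mse = mean_squared_error(y_part, pipe.predict(X_part))
    rmse = np.sqrt(mse)
    print(f'{name}: MSE={mse:.1f} RMSE={rmse:.1f}')
    return mse, rmse

train_mse, train_rmse = report('train', X_train, y_train)
test_mse, test_rmse = report('test', X_test, y_test)
```

Passing the split explicitly to one helper avoids printing a training metric under a test label.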


In [98]:
# Coefficient estimates
df_coef = pd.DataFrame(model.coef_, X.columns, columns=['Model coefficients'])
df_coef

                   Model coefficients
flat_area                25337.367691
rooms                     -6788.26676
distance_category         -553.363792
code type                 2928.218554
prestigious               4944.202255
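Because the model is fitted on standardized features, each coefficient is a price change per one standard deviation of the feature, not per square meter or per room. Assuming a fitted `StandardScaler`, dividing by its `scale_` attribute converts the coefficients to per-unit effects. The sketch below uses synthetic data to show the conversion and checks it against a fit on the raw features.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
area = rng.uniform(20, 200, 300)       # synthetic "flat_area"
rooms = rng.integers(1, 5, 300)        # synthetic "rooms"
X = np.column_stack([area, rooms])
y = 300 * area - 2000 * rooms + rng.normal(0, 3000, 300)

scaler = StandardScaler().fit(X)
model = LinearRegression().fit(scaler.transform(X), y)

# Coefficients on standardized inputs are "price change per one standard
# deviation"; dividing by scaler.scale_ gives "price change per unit".
per_unit = model.coef_ / scaler.scale_
raw_fit = LinearRegression().fit(X, y)
print(per_unit, raw_fit.coef_)
```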


In [99]:
#Prediction
price_pred = model.predict(scaler.transform([[40, 1, 1, 4, 3]]))  # flat_area, rooms, distance_category, code type, prestigious
print("Price for 40 sq. m, 1 room, close to metro, condition code 4, prestigious district:", price_pred, "hryvnas")

Price for 40 sq. m, 1 room, close to metro, condition code 4, prestigious district: [11111.82443188] hryvnas
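Since the prediction cells repeat the same call with a hand-written description, one option is a small helper that builds the description from the very values it passes to the model. The function `describe_and_predict`, the `METRO` lookup and the synthetic training data are all hypothetical and only serve to make the sketch runnable.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

METRO = {1: 'closer than 2 km to metro',
         2: '2 to 5 km to metro',
         3: 'more than 5 km to metro'}

def describe_and_predict(model, scaler, flat_area, rooms, distance_category,
                         code_type, prestigious):
    """Hypothetical helper: the label is built from the same values that
    are passed to the model, so the two cannot drift apart."""
    row = [[flat_area, rooms, distance_category, code_type, prestigious]]
    price = float(model.predict(scaler.transform(row))[0])
    label = (f'{flat_area} sq. m, {rooms} room(s), {METRO[distance_category]}, '
             f'condition code {code_type}, prestige code {prestigious}')
    return label, price

# Synthetic training data, only to make the sketch runnable.
rng = np.random.default_rng(3)
X = np.column_stack([rng.uniform(20, 200, 200), rng.integers(1, 5, 200),
                     rng.integers(1, 4, 200), rng.integers(1, 6, 200),
                     rng.integers(1, 4, 200)])
y = 300 * X[:, 0] + 2000 * X[:, 4] + rng.normal(0, 1000, 200)
scaler = StandardScaler().fit(X)
model = LinearRegression().fit(scaler.transform(X), y)

label, price = describe_and_predict(model, scaler, 40, 1, 1, 4, 3)
print(f'Price for {label}: {price:.2f} hryvnas')
```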




In [100]:
#Prediction
price_pred = model.predict(scaler.transform([[60, 2, 1, 5, 3]]))  # flat_area, rooms, distance_category, code type, prestigious
print("Price for 60 sq. m, 2 rooms, close to metro, condition code 5, prestigious district:", price_pred, "hryvnas")

Price for 60 sq. m, 2 rooms, close to metro, condition code 5, prestigious district: [13719.2362224] hryvnas




In [101]:
#Prediction
price_pred = model.predict(scaler.transform([[40, 1, 1, 4, 1]]))  # flat_area, rooms, distance_category, code type, prestigious
print("Price for 40 sq. m, 1 room, close to metro, condition code 4, non-prestigious district:", price_pred, "hryvnas")

Price for 40 sq. m, 1 room, close to metro, condition code 4, non-prestigious district: [-860.31270486] hryvnas




In [102]:
#Prediction
price_pred = model.predict(scaler.transform([[160, 4, 3, 5, 3]]))  # flat_area, rooms, distance_category, code type, prestigious
print("Price for 160 sq. m, 4 rooms, far from metro, condition code 5, prestigious district:", price_pred, "hryvnas")

Price for 160 sq. m, 4 rooms, far from metro, condition code 5, prestigious district: [29710.5050751] hryvnas




In [103]:
#Prediction
price_pred = model.predict(scaler.transform([[40, 1, 1, 1, 1]]))  # flat_area, rooms, distance_category, code type, prestigious
print("Price for 40 sq. m, 1 room, close to metro, condition code 1, non-prestigious district:", price_pred, "hryvnas")

Price for 40 sq. m, 1 room, close to metro, condition code 1, non-prestigious district: [-10265.71288311] hryvnas
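A linear model on the raw price can return negative prices for small, low-coded flats, as the outputs above show. One common alternative, which the notebook does not use, is to fit the model on the logarithm of the price so that predictions are positive by construction. The sketch below uses scikit-learn's `TransformedTargetRegressor` on synthetic data; all names and data are illustrative.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
area = rng.uniform(20, 200, 500)
# Synthetic positive prices with multiplicative noise.
price = 200 * area * np.exp(rng.normal(0, 0.3, 500))
X = area.reshape(-1, 1)

# The regressor is fitted on log(price); predictions are mapped back with
# exp, so they are positive by construction.
log_model = TransformedTargetRegressor(
    regressor=LinearRegression(), func=np.log, inverse_func=np.exp)
log_model.fit(X, price)

small_flats = np.array([[5.0], [20.0], [40.0]])
preds = log_model.predict(small_flats)
print(preds)
```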




In [104]:
# Lasso regression

from sklearn.linear_model import Lasso

lasso_model = Lasso(alpha=1)

lasso_model.fit(X_train, y_train)


Lasso(alpha=1)

In [105]:
lasso_y_pred_train = lasso_model.predict(X_train)
lasso_mse_y_train_y_pred = mean_squared_error(y_train, lasso_y_pred_train)
lasso_rmse_y_train_y_pred = np.sqrt(lasso_mse_y_train_y_pred)

# Test metrics computed from the Lasso model's own test predictions
lasso_y_pred_test = lasso_model.predict(X_test)
lasso_mse_y_test_y_pred = mean_squared_error(y_test, lasso_y_pred_test)
lasso_rmse_y_test_y_pred = np.sqrt(lasso_mse_y_test_y_pred)

lasso_score_train = lasso_model.score(X_train, y_train)
lasso_score_test = lasso_model.score(X_test, y_test)

print('Train score for lasso_model:', lasso_score_train)
print('Test score for lasso_model:', lasso_score_test)

print('Mean Squared Error (MSE) for lasso_mse_y_train_y_pred:', lasso_mse_y_train_y_pred)
print('Root Mean Squared Error (RMSE) for lasso_mse_y_train_y_pred:', lasso_rmse_y_train_y_pred)
print('Mean Squared Error (MSE) for lasso_mse_y_test_y_pred:', lasso_mse_y_test_y_pred)
print('Root Mean Squared Error (RMSE) for lasso_mse_y_test_y_pred:', lasso_rmse_y_test_y_pred)

print('Model coefficients:', lasso_model.coef_)

Train score for lasso_model: 0.5790962848519278
Test score for lasso_model: 0.5435523957311217
Mean Squared Error (MSE) for lasso_mse_y_train_y_pred: 420578337.27718234
Root Mean Squared Error (RMSE) for lasso_mse_y_train_y_pred: 20508.00666269597
Model coefficients: [25332.6706902  -6783.06360195  -552.63924046  2927.95839591
  4943.54377158]


In [106]:
# Coefficient estimates
lasso_df_coef = pd.DataFrame(lasso_model.coef_, X.columns, columns=['Model coefficients'])
lasso_df_coef

                   Model coefficients
flat_area                 25332.67069
rooms                    -6783.063602
distance_category          -552.63924
code type                 2927.958396
prestigious               4943.543772
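The Lasso coefficients are almost identical to the ordinary least squares ones because `alpha=1` is tiny relative to coefficients measured in thousands of hryvnas. The sketch below, on synthetic data with similarly sized coefficients, shows that the L1 penalty starts to shrink or zero out coefficients only when `alpha` approaches their magnitude. The alpha values are illustrative, not tuned.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
X = StandardScaler().fit_transform(rng.normal(size=(500, 3)))
# Synthetic target whose coefficients are in the thousands; the third
# feature has a comparatively weak effect.
y = X @ np.array([25000.0, -7000.0, 500.0]) + rng.normal(0, 20000, 500)

ols = LinearRegression().fit(X, y)
print('OLS         ', np.round(ols.coef_, 1))
for alpha in (1, 100, 5000):
    lasso = Lasso(alpha=alpha).fit(X, y)
    print(f'alpha={alpha:<6}', np.round(lasso.coef_, 1))
```

In practice a value of `alpha` would be chosen by cross-validation, for example with `LassoCV`.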


In [107]:
#Prediction
price_pred = lasso_model.predict(scaler.transform([[40, 1, 1, 4, 3]]))  # flat_area, rooms, distance_category, code type, prestigious
print("Price for 40 sq. m, 1 room, close to metro, condition code 4, prestigious district:", price_pred, "hryvnas")

Price for 40 sq. m, 1 room, close to metro, condition code 4, prestigious district: [11110.10601554] hryvnas




In [108]:
#Prediction
price_pred = lasso_model.predict(scaler.transform([[40, 1, 1, 4, 1]]))  # flat_area, rooms, distance_category, code type, prestigious
print("Price for 40 sq. m, 1 room, close to metro, condition code 4, non-prestigious district:", price_pred, "hryvnas")

Price for 40 sq. m, 1 room, close to metro, condition code 4, non-prestigious district: [-860.43663691] hryvnas




In [109]:
#Prediction
price_pred = lasso_model.predict(scaler.transform([[40, 1, 1, 1, 1]]))  # flat_area, rooms, distance_category, code type, prestigious
print("Price for 40 sq. m, 1 room, close to metro, condition code 1, non-prestigious district:", price_pred, "hryvnas")




In [110]:
from sklearn.linear_model import Ridge

ridge_model = Ridge(alpha=1)

ridge_model.fit(X_train, y_train)


Ridge(alpha=1)

In [111]:
ridge_y_pred_train = ridge_model.predict(X_train)
ridge_mse_y_train_y_pred = mean_squared_error(y_train, ridge_y_pred_train)
ridge_rmse_y_train_y_pred = np.sqrt(ridge_mse_y_train_y_pred)

# Test metrics computed from the Ridge model's own test predictions
ridge_y_pred_test = ridge_model.predict(X_test)
ridge_mse_y_test_y_pred = mean_squared_error(y_test, ridge_y_pred_test)
ridge_rmse_y_test_y_pred = np.sqrt(ridge_mse_y_test_y_pred)

ridge_score_train = ridge_model.score(X_train, y_train)
ridge_score_test = ridge_model.score(X_test, y_test)

print('Train score for ridge_model:', ridge_score_train)
print('Test score for ridge_model:', ridge_score_test)

print('Mean Squared Error (MSE) for ridge_mse_y_train_y_pred:', ridge_mse_y_train_y_pred)
print('Root Mean Squared Error (RMSE) for ridge_mse_y_train_y_pred:', ridge_rmse_y_train_y_pred)
print('Mean Squared Error (MSE) for ridge_mse_y_test_y_pred:', ridge_mse_y_test_y_pred)
print('Root Mean Squared Error (RMSE) for ridge_mse_y_test_y_pred:', ridge_rmse_y_test_y_pred)

print('Model coefficients:', ridge_model.coef_)

Train score for ridge_model: 0.5790961619653368
Test score for ridge_model: 0.543579067758985
Mean Squared Error (MSE) for ridge_mse_y_train_y_pred: 420578460.06877637
Root Mean Squared Error (RMSE) for ridge_mse_y_train_y_pred: 20508.009656443417
Model coefficients: [25317.84433728 -6771.05700312  -553.82068233  2930.08574412
  4943.83687955]


In [112]:
# Coefficient estimates
ridge_df_coef = pd.DataFrame(ridge_model.coef_, X.columns, columns=['Model coefficients'])
ridge_df_coef

                   Model coefficients
flat_area                25317.844337
rooms                    -6771.057003
distance_category         -553.820682
code type                 2930.085744
prestigious                4943.83688


In [113]:
#Prediction
price_pred = ridge_model.predict(scaler.transform([[40, 1, 1, 4, 3]]))  # flat_area, rooms, distance_category, code type, prestigious
print("Price for 40 sq. m, 1 room, close to metro, condition code 4, prestigious district:", price_pred, "hryvnas")

Price for 40 sq. m, 1 room, close to metro, condition code 4, prestigious district: [11114.64372438] hryvnas




In [114]:
#Prediction
price_pred = ridge_model.predict(scaler.transform([[40, 1, 1, 1, 1]]))  # flat_area, rooms, distance_category, code type, prestigious
print("Price for 40 sq. m, 1 room, close to metro, condition code 1, non-prestigious district:", price_pred, "hryvnas")




In [115]:
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score
# I added max_depth param = 4
regressor = DecisionTreeRegressor(max_depth=4)

regressor.fit(X_train, y_train)


DecisionTreeRegressor(max_depth=4)

In [116]:
regressor_y_pred_train = regressor.predict(X_train)
regressor_mse_y_train_y_pred = mean_squared_error(y_train, regressor_y_pred_train)
regressor_rmse_y_train_y_pred = np.sqrt(regressor_mse_y_train_y_pred)

# Test metrics computed from the tree's own test predictions
regressor_y_pred_test = regressor.predict(X_test)
regressor_mse_y_test_y_pred = mean_squared_error(y_test, regressor_y_pred_test)
regressor_rmse_y_test_y_pred = np.sqrt(regressor_mse_y_test_y_pred)

train_score = regressor.score(X_train, y_train)
print("Training R2 Score for X_train, y_train:", train_score)

test_score = regressor.score(X_test, y_test)
print("Test R2 Score for X_test, y_test:", test_score)

print('Mean Squared Error (MSE) for regressor_mse_y_train_y_pred:', regressor_mse_y_train_y_pred)
print('Root Mean Squared Error (RMSE) for regressor_mse_y_train_y_pred:', regressor_rmse_y_train_y_pred)
print('Mean Squared Error (MSE) for regressor_mse_y_test_y_pred:', regressor_mse_y_test_y_pred)
print('Root Mean Squared Error (RMSE) for regressor_mse_y_test_y_pred:', regressor_rmse_y_test_y_pred)

print('Model feature importances:', regressor.feature_importances_)

# Perform 5-fold cross-validation on the training data
scores = cross_val_score(regressor, X_train, y_train, cv=5, scoring='r2')

print("Cross-validated R2 scores:", scores)
print("Mean R2 score:", scores.mean())

Training R2 Score for X_train, y_train: 0.6912331339051985
Test R2 Score for X_test, y_test: 0.4860315272658472
Model feature importances: [0.82105149 0.06821247 0.         0.08883566 0.02190038]
Cross-validated R2 scores: [0.56816268 0.71914981 0.51038505 0.66600079 0.51978856]
Mean R2 score: 0.5966973777468305


In [117]:
# Feature importances
regr_df_imp = pd.DataFrame(regressor.feature_importances_, X.columns, columns=['Feature Importances'])
regr_df_imp

                   Feature Importances
flat_area                     0.821051
rooms                         0.068212
distance_category                  0.0
code type                     0.088836
prestigious                     0.0219


In [118]:
#Prediction
price_pred = regressor.predict(scaler.transform([[40, 1, 1, 4, 3]]))  # flat_area, rooms, distance_category, code type, prestigious
print("Price for 40 sq. m, 1 room, close to metro, condition code 4, prestigious district:", price_pred, "hryvnas")

Price for 40 sq. m, 1 room, close to metro, condition code 4, prestigious district: [18518.34683099] hryvnas




In [119]:
#Prediction
price_pred = regressor.predict(scaler.transform([[60, 2, 1, 5, 3]]))  # flat_area, rooms, distance_category, code type, prestigious
print("Price for 60 sq. m, 2 rooms, close to metro, condition code 5, prestigious district:", price_pred, "hryvnas")

Price for 60 sq. m, 2 rooms, close to metro, condition code 5, prestigious district: [20628.78661088] hryvnas




In [120]:
#Prediction
price_pred = regressor.predict(scaler.transform([[40, 1, 1, 4, 1]]))  # flat_area, rooms, distance_category, code type, prestigious
print("Price for 40 sq. m, 1 room, close to metro, condition code 4, non-prestigious district:", price_pred, "hryvnas")

Price for 40 sq. m, 1 room, close to metro, condition code 4, non-prestigious district: [11242.09689737] hryvnas




In [121]:
#Prediction
price_pred = regressor.predict(scaler.transform([[160, 4, 3, 5, 3]]))  # flat_area, rooms, distance_category, code type, prestigious
print("Price for 160 sq. m, 4 rooms, far from metro, condition code 5, prestigious district:", price_pred, "hryvnas")

Price for 160 sq. m, 4 rooms, far from metro, condition code 5, prestigious district: [40585.66666667] hryvnas




In [122]:
#Prediction
price_pred = regressor.predict(scaler.transform([[40, 1, 1, 1, 1]]))  # flat_area, rooms, distance_category, code type, prestigious
print("Price for 40 sq. m, 1 room, close to metro, condition code 1, non-prestigious district:", price_pred, "hryvnas")

Price for 40 sq. m, 1 room, close to metro, condition code 1, non-prestigious district: [11242.09689737] hryvnas




In [123]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.metrics import r2_score

regressor_forest = RandomForestRegressor(n_estimators=150, max_depth=12, min_samples_split=3, min_samples_leaf=2)

regressor_forest.fit(X_train, y_train)


RandomForestRegressor(max_depth=12, min_samples_leaf=2, min_samples_split=3,
                      n_estimators=150)

In [124]:
regressor_forest_y_pred_train = regressor_forest.predict(X_train)
regressor_forest_mse_y_train_y_pred = mean_squared_error(y_train, regressor_forest_y_pred_train)
regressor_forest_rmse_y_train_y_pred = np.sqrt(regressor_forest_mse_y_train_y_pred)

# Test metrics computed from the forest's own test predictions
regressor_forest_y_pred_test = regressor_forest.predict(X_test)
regressor_forest_mse_y_test_y_pred = mean_squared_error(y_test, regressor_forest_y_pred_test)
regressor_forest_rmse_y_test_y_pred = np.sqrt(regressor_forest_mse_y_test_y_pred)

f_train_score = regressor_forest.score(X_train, y_train)
print("Training R2 Score for X_train, y_train:", f_train_score)

f_test_score = regressor_forest.score(X_test, y_test)
print("Test R2 Score for X_test, y_test:", f_test_score)

print('Mean Squared Error (MSE) for regressor_forest_mse_y_train_y_pred:', regressor_forest_mse_y_train_y_pred)
print('Root Mean Squared Error (RMSE) for regressor_forest_rmse_y_train_y_pred:', regressor_forest_rmse_y_train_y_pred)
print('Mean Squared Error (MSE) for regressor_forest_mse_y_test_y_pred:', regressor_forest_mse_y_test_y_pred)
print('Root Mean Squared Error (RMSE) for regressor_forest_rmse_y_test_y_pred:', regressor_forest_rmse_y_test_y_pred)

print('Model regressor_forest feature importances:', regressor_forest.feature_importances_)

# Perform 5-fold cross-validation on the training data
scores = cross_val_score(regressor_forest, X_train, y_train, cv=5, scoring='r2')

print("Cross-validated R2 scores:", scores)
print("Mean R2 score:", scores.mean())

Training R2 Score for X_train, y_train: 0.8242472129519175
Test R2 Score for X_test, y_test: 0.5761079805815981
Model regressor_forest feature importances: [0.80360717 0.05528469 0.00258519 0.09384614 0.04467681]
Cross-validated R2 scores: [0.58801872 0.56338867 0.51608862 0.71963676 0.52860006]
Mean R2 score: 0.5831465667675244


In [125]:
# Feature importances
regrfor_df_imp = pd.DataFrame(regressor_forest.feature_importances_, X.columns, columns=['Feature Importances'])
regrfor_df_imp

                   Feature Importances
flat_area                     0.803607
rooms                         0.055285
distance_category             0.002585
code type                     0.093846
prestigious                   0.044677


In [126]:
# Predictions
print(regressor_forest.predict(scaler.transform([[80, 2, 1, 1, 1]])))
# 80 sq. m, 2 rooms, close to metro, condition code 1, non-prestigious district
print(regressor_forest.predict(scaler.transform([[160, 3, 3, 5, 1]])))
# 160 sq. m, 3 rooms, far from metro, condition code 5, non-prestigious district
print(regressor_forest.predict(scaler.transform([[100, 2, 3, 1, 1]])))
# 100 sq. m, 2 rooms, far from metro, condition code 1, non-prestigious district
print(regressor_forest.predict(scaler.transform([[50, 2, 2, 1, 1]])))
# 50 sq. m, 2 rooms, medium distance to metro, condition code 1, non-prestigious district

[7103.11187613]
[29778.03013276]
[6854.51069546]
[5139.13264806]




In [127]:
#Prediction
price_pred = regressor_forest.predict(scaler.transform([[40, 1, 1, 4, 3]]))  # flat_area, rooms, distance_category, code type, prestigious
print("Price for 40 sq. m, 1 room, close to metro, condition code 4, prestigious district:", price_pred, "hryvnas")

Price for 40 sq. m, 1 room, close to metro, condition code 4, prestigious district: [15605.93343316] hryvnas




In [128]:
#Prediction
price_pred = regressor_forest.predict(scaler.transform([[60, 2, 1, 5, 3]]))  # flat_area, rooms, distance_category, code type, prestigious
print("Price for 60 sq. m, 2 rooms, close to metro, condition code 5, prestigious district:", price_pred, "hryvnas")

Price for 60 sq. m, 2 rooms, close to metro, condition code 5, prestigious district: [22037.41571717] hryvnas




In [129]:
#Prediction
price_pred = regressor_forest.predict(scaler.transform([[40, 1, 1, 4, 1]]))  # flat_area, rooms, distance_category, code type, prestigious
print("Price for 40 sq. m, 1 room, close to metro, condition code 4, non-prestigious district:", price_pred, "hryvnas")

Price for 40 sq. m, 1 room, close to metro, condition code 4, non-prestigious district: [10820.66885132] hryvnas




In [130]:
#Prediction
price_pred = regressor_forest.predict(scaler.transform([[160, 4, 3, 5, 3]]))  # flat_area, rooms, distance_category, code type, prestigious
print("Price for 160 sq. m, 4 rooms, far from metro, condition code 5, prestigious district:", price_pred, "hryvnas")

Price for 160 sq. m, 4 rooms, far from metro, condition code 5, prestigious district: [37846.06399995] hryvnas




In [131]:
#Prediction
price_pred = regressor_forest.predict(scaler.transform([[40, 1, 3, 1, 1]]))  # flat_area, rooms, distance_category, code type, prestigious
print("Price for 40 sq. m, 1 room, far from metro, condition code 1, non-prestigious district:", price_pred, "hryvnas")

Price for 40 sq. m, 1 room, far from metro, condition code 1, non-prestigious district: [5268.18662743] hryvnas


In [132]:
from sklearn.svm import SVR

regressor = SVR()

regressor.fit(X_train, y_train)

y_pred = regressor.predict(X_test)

score = regressor.score(X_test, y_test)
print("R2 Score:", score)

R2 Score: -0.09475619435626093
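A negative R2 from a default `SVR` is typical when the target is on a scale of thousands: with `C=1.0` and `epsilon=0.1` the model can barely move away from a constant prediction. One commonly used remedy, not tried in this notebook, is to standardize the target as well as the features. The sketch below illustrates the effect on synthetic data, with `TransformedTargetRegressor` handling the scaling and the inverse transform.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(6)
X = rng.normal(size=(400, 3))                     # already standardized
y = 20000 + 15000 * X[:, 0] - 5000 * X[:, 1] + rng.normal(0, 2000, 400)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

# Default SVR (C=1.0, epsilon=0.1) on a target measured in thousands.
r2_raw = SVR().fit(X_tr, y_tr).score(X_te, y_te)

# Same SVR, but fitted on a standardized target; predictions are mapped
# back to the original units automatically.
scaled_svr = TransformedTargetRegressor(regressor=SVR(),
                                        transformer=StandardScaler())
r2_scaled = scaled_svr.fit(X_tr, y_tr).score(X_te, y_te)
print(r2_raw, r2_scaled)
```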


In [133]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error, r2_score

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.25, random_state=1)

model = LinearRegression()
model.fit(X_train, y_train)


param_grid = {'fit_intercept': [True, False]}

grid_search = GridSearchCV(model, param_grid, cv=5)
grid_search.fit(X_train, y_train)  


best_params = grid_search.best_params_
best_model = LinearRegression(**best_params)
best_model.fit(X_train, y_train)  

# Print the evaluation metrics
print('Best Hyperparameters:', best_params)

Best Hyperparameters: {'fit_intercept': True}
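With the default `refit=True`, `GridSearchCV` already refits the best configuration on all the data passed to `fit` and exposes it as `best_estimator_`, so constructing and fitting a second `LinearRegression` is optional. A minimal sketch on synthetic data (the data is illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 2))                      # synthetic features
y = 10000 + 3000 * X[:, 0] - 1000 * X[:, 1] + rng.normal(0, 500, 200)

grid_search = GridSearchCV(LinearRegression(),
                           {'fit_intercept': [True, False]}, cv=5)
grid_search.fit(X, y)

# With the default refit=True, the best configuration is already refitted
# on all the data passed to fit() and exposed as best_estimator_.
best_model = grid_search.best_estimator_
print(grid_search.best_params_, grid_search.best_score_)
```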


In [134]:
best_model_y_pred_train = best_model.predict(X_train)
y_pred_train = best_model_y_pred_train
best_model_mse_y_train_y_pred = mean_squared_error(y_train, y_pred_train)
best_model_rmse_y_train_y_pred = np.sqrt(mean_squared_error(y_train, y_pred_train))

best_model_y_pred_test = best_model.predict(X_test)
y_pred_test = best_model_y_pred_test
best_model_mse_y_test_y_pred = mean_squared_error(y_test, y_pred_test)
best_model_rmse_y_test_y_pred = np.sqrt(mean_squared_error(y_test, y_pred_test))

gs_train_score = best_model.score(X_train, y_train)
print("Training  Score for X_train, y_train:", gs_train_score)

gs_test_score = best_model.score(X_test, y_test)
print("Test Score for X_test, y_test:", gs_test_score)

print('Mean Squared Error (MSE) for best_model_mse_y_train_y_pred:', best_model_mse_y_train_y_pred)
print('Root Mean Squared Error (RMSE) for best_model_rmse_y_train_y_pred:', best_model_rmse_y_train_y_pred)
print('Mean Squared Error (MSE) for best_model_y_test_y_pred:', best_model_mse_y_test_y_pred)
print('Root Mean Squared Error (RMSE) for best_model_rmse_y_test_y_pred:', best_model_rmse_y_test_y_pred)

print('Model best_model GridSearchCV coefficients:', best_model.coef_)


Training  Score for X_train, y_train: 0.5790962964878815
Test Score for X_test, y_test: 0.5435417979998884
Mean Squared Error (MSE) for best_model_mse_y_train_y_pred: 420578325.6502237
Root Mean Squared Error (RMSE) for best_model_rmse_y_train_y_pred: 20508.006379222326
Model best_model GridSearchCV coefficients: [25337.36769074 -6788.26676049  -553.36379199  2928.21855388
  4944.20225492]


In [135]:
# Coefficient estimates
df_coef = pd.DataFrame(best_model.coef_, X.columns, columns=['Model coefficients'])
df_coef

                   Model coefficients
flat_area                25337.367691
rooms                     -6788.26676
distance_category         -553.363792
code type                 2928.218554
prestigious               4944.202255


In [136]:
#Prediction
price_pred = best_model.predict(scaler.transform([[40, 1, 1, 4, 3]]))  # flat_area, rooms, distance_category, code type, prestigious
print("Price for 40 sq. m, 1 room, close to metro, condition code 4, prestigious district:", price_pred, "hryvnas")

Price for 40 sq. m, 1 room, close to metro, condition code 4, prestigious district: [11111.82443188] hryvnas




In [137]:
#Prediction
price_pred = best_model.predict(scaler.transform([[60, 2, 1, 5, 3]]))  # flat_area, rooms, distance_category, code type, prestigious
print("Price for 60 sq. m, 2 rooms, close to metro, condition code 5, prestigious district:", price_pred, "hryvnas")

Price for 60 sq. m, 2 rooms, close to metro, condition code 5, prestigious district: [13719.2362224] hryvnas




In [138]:
#Prediction
price_pred = best_model.predict(scaler.transform([[40, 1, 1, 4, 1]]))  # flat_area, rooms, distance_category, code type, prestigious
print("Price for 40 sq. m, 1 room, close to metro, condition code 4, non-prestigious district:", price_pred, "hryvnas")

Price for 40 sq. m, 1 room, close to metro, condition code 4, non-prestigious district: [-860.31270486] hryvnas




In [139]:
#Prediction
price_pred = best_model.predict(scaler.transform([[160, 4, 3, 5, 3]]))  # flat_area, rooms, distance_category, code type, prestigious
print("Price for 160 sq. m, 4 rooms, far from metro, condition code 5, prestigious district:", price_pred, "hryvnas")

Price for 160 sq. m, 4 rooms, far from metro, condition code 5, prestigious district: [29710.5050751] hryvnas




In [140]:
#Prediction
price_pred = best_model.predict(scaler.transform([[40, 1, 3, 1, 1]]))  # flat_area, rooms, distance_category, code type, prestigious
print("Price for 40 sq. m, 1 room, far from metro, condition code 1, non-prestigious district:", price_pred, "hryvnas")

Price for 40 sq. m, 1 room, far from metro, condition code 1, non-prestigious district: [-12493.72469423] hryvnas


