<b> Data Vocabulary of Features taken to analysis:

"rooms" - quantity of rooms

"price" - price in hryvnas

"flat_area" - area of a flat in square meters

"prestigious" - code of districts:
code_prestigious = {'Печерський': 3,
                'Шевченківський': 3,
                'Голосіївський': 3,
                'Подільський': 2,
                'Святошинський': 1,
                'Солом\'янський': 2,
                'Оболонський': 2,
                'Дніпровський': 1,
                'Дарницький': 1,
                'Деснянський': 1}

code of flat types:
"code_type" = {'Дизайнерський ремонт': 5,
                'Євроремонт': 4,
                'Чудовий стан': 4,
                'Хороший стан': 3,
                'Задовільний стан': 1,
                'Перша здача': 2,
                'Потрібен капітальний ремонт': 1,
                'Незавершений ремонт': 1,
                'Потрібен косметичний ремонт': 1,
                'Від будівельників вільне планування': 1,
                 'Незавершений ремонт': 1}

codes of distances to metro: 1 - closer than 2 km, 2 - from 2 to 5 km, 3 - more than 5 km

<b>Results: 
    
LinearRegression, Lasso, Ridge worked the same: they were good on flats with high prices 
but were not good if the flat predicted to have small price (the resulting price was negative)
Also the coefficients for some features were negative showing negative correlation although in real life the correlation
was obviously positive
Model of LinearRegression with GridSearchCV was very close to simple LinearRegression with the same drawbacks of prediction.
    
DecisionTreeRegressor was not accurate with low prices and overfitting was observed.
    
<b> !!!!The best model was RandomForestRegressor with accurate prediction of prices in all price categories.
Also all features importances were positive and obviously reflecting real life tendencies

regressor = SVR() was not proper for our data



In [218]:
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import mean_squared_error
from sklearn.covariance import EllipticEnvelope
import numpy as np

df = pd.read_csv('all_flats_metro.csv')

In [219]:
#Linear regression

df['flat_area'] = df['flat_area'].astype(int)
df['price'] = df['price'].astype(int)
df['rooms'] = df['rooms'].astype(int)

# print(df.head(5))

label_encoder = LabelEncoder()
df['region_name_encoded'] = label_encoder.fit_transform(df['district'])

X = df[['flat_area', 'rooms', 'distance_category', 'code type', 'prestigious']]
y = df['price']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

model = LinearRegression()
model.fit(X_train, y_train)

y_pred_train=model.predict(X_train)
mse_y_train_y_pred=mean_squared_error(y_train, y_pred_train)
rmse_y_train_y_pred=np.sqrt(mean_squared_error(y_train, y_pred_train))

y_pred_test = model.predict(X_test)
mse_y_test_y_pred = mean_squared_error(y_test, y_pred_test)
rmse_y_test_y_pred=np.sqrt(mean_squared_error(y_test, y_pred_test))

score_train = model.score(X_train, y_train)
score_test = model.score(X_test, y_test)

print('Train score:', score_train)
print('Test score:', score_test)

print('Mean Squared Error (MSE) for mse_y_train_y_pred:', mse_y_train_y_pred)
print('Root Mean Squared Error (RMSE) for mse_y_train_y_pred:', rmse_y_train_y_pred)
print('Mean Squared Error (MSE) for mse_y_test_y_pred:', mse_y_train_y_pred)
print('Root Mean Squared Error (RMSE) for mse_y_test_y_pred:', rmse_y_train_y_pred)

print('Model cofficients:', model.coef_)

Train score: 0.5790962964878815
Test score: 0.5435417979998814
Mean Squared Error (MSE) for mse_y_train_y_pred: 420578325.6502237
Root Mean Squared Error (RMSE) for mse_y_train_y_pred: 20508.006379222326
Mean Squared Error (MSE) for mse_y_test_y_pred: 420578325.6502237
Root Mean Squared Error (RMSE) for mse_y_test_y_pred: 20508.006379222326
Model cofficients: [  321.2453978  -6952.62955832 -1114.00590556  3135.13339275
  5986.06856837]


In [220]:
# Оцінка кoефіцієнтів
df_coef = pd.DataFrame(model.coef_, X.columns, columns=['Model coefficients'])
df_coef

Unnamed: 0,Model coefficients
flat_area,321.245398
rooms,-6952.629558
distance_category,-1114.005906
code type,3135.133393
prestigious,5986.068568


In [221]:
#Prediction
price_pred = model.predict([[40, 1, 1, 4, 3]]) # 40 sqmeters, 1 room, close to metro, good state of flat, prestigious district
print("Price for 40 sqmeters, 1 room, close to metro, Хороший стан of flat, prestigious district:", price_pred, "hryvnas")

Price for 40 sqmeters, 1 room, close to metro, Хороший стан of flat, prestigious district: [11111.82443188] hryvnas




In [222]:
#Prediction
price_pred = model.predict([[60, 2, 1, 5, 3]]) # 40 sqmeters, 1 room, close to metro, good state of flat, prestigious district
print("Price for 60 sqmeters, 2 rooms, close to metro, Євроремонт state of flat, prestigious district:", price_pred, "hryvnas")

Price for 60 sqmeters, 2 rooms, close to metro, Євроремонт state of flat, prestigious district: [13719.2362224] hryvnas




In [223]:
#Prediction
price_pred = model.predict([[40, 1, 1, 4, 1]]) # 40 sqmeters, 1 room, close to metro, good state of flat, prestigious district
print("Price for 40 sqmeters, 1 room, close to metro, Хороший стан of flat, not prestigious district:", price_pred, "hryvnas")

Price for 40 sqmeters, 1 room, close to metro, Хороший стан of flat, not prestigious district: [-860.31270487] hryvnas




In [224]:
#Prediction
price_pred = model.predict([[160, 4, 3, 5, 3]]) # 40 sqmeters, 1 room, close to metro, good state of flat, prestigious district
print("Price for 160 sqmeters, 4 rooms, distant from metro, Дизайнерський ремонт of flat, prestigious district:", price_pred, "hryvnas")

Price for 160 sqmeters, 4 rooms, distant from metro, Дизайнерський ремонт of flat, prestigious district: [29710.5050751] hryvnas




In [225]:
#Prediction
price_pred = model.predict([[40, 1, 1, 1, 1]]) # 40 sqmeters, 1 room, close to metro, good state of flat, prestigious district
print("Price for 40 sqmeters, 1 room, close to metro, Потрібен капітальний ремонт,prestigious district 1:", price_pred, "hryvnas")

Price for 40 sqmeters, 1 room, close to metro, Потрібен капітальний ремонт,prestigious district 1: [-10265.71288311] hryvnas




In [226]:
#Lasso of the model

from sklearn.linear_model import Lasso

lasso_model = Lasso(alpha=1)

lasso_model.fit(X_train, y_train)


Lasso(alpha=1)

In [227]:
y_pred_train=lasso_model.predict(X_train)
lasso_mse_y_train_y_pred=mean_squared_error(y_train, y_pred_train)
lasso_rmse_y_train_y_pred=np.sqrt(mean_squared_error(y_train, y_pred_train))

lasso_y_pred_test = lasso_model.predict(X_test)
lasso_mse_y_test_y_pred = mean_squared_error(y_test, y_pred_test)
lasso_rmse_y_test_y_pred=np.sqrt(mean_squared_error(y_test, y_pred_test))

lasso_score_train = lasso_model.score(X_train, y_train)
lasso_score_test = lasso_model.score(X_test, y_test)

print('Train score for lasso_model:', lasso_score_train)
print('Test score for lasso_model:', lasso_score_test)

print('Mean Squared Error (MSE) for lasso_mse_y_train_y_pred:', lasso_mse_y_train_y_pred)
print('Root Mean Squared Error (RMSE) for lasso_mse_y_train_y_pred:', lasso_rmse_y_train_y_pred)
print('Mean Squared Error (MSE) for lasso_mse_y_test_y_pred:', lasso_mse_y_train_y_pred)
print('Root Mean Squared Error (RMSE) for lasso_mse_y_test_y_pred:', lasso_rmse_y_train_y_pred)

print('Model cofficients:', lasso_model.coef_)

Train score for lasso_model: 0.5790962880743777
Test score for lasso_model: 0.5435473391209533
Mean Squared Error (MSE) for lasso_mse_y_train_y_pred: 420578334.05722356
Root Mean Squared Error (RMSE) for lasso_mse_y_train_y_pred: 20508.00658419105
Mean Squared Error (MSE) for lasso_mse_y_test_y_pred: 420578334.05722356
Root Mean Squared Error (RMSE) for lasso_mse_y_test_y_pred: 20508.00658419105
Model cofficients: [  321.22405505 -6949.51568729 -1110.21880092  3134.56006441
  5985.15753838]


In [228]:
# Оцінка кoефіцієнтів
lasso_df_coef = pd.DataFrame(lasso_model.coef_, X.columns, columns=['Model coefficients'])
lasso_df_coef

Unnamed: 0,Model coefficients
flat_area,321.224055
rooms,-6949.515687
distance_category,-1110.218801
code type,3134.560064
prestigious,5985.157538


In [229]:
#Prediction
price_pred = lasso_model.predict([[40, 1, 1, 4, 3]]) # 40 sqmeters, 1 room, close to metro, good state of flat, prestigious district
print("Price for 40 sqmeters, 1 room, close to metro, Хороший стан of flat, prestigious district:", price_pred, "hryvnas")

Price for 40 sqmeters, 1 room, close to metro, Хороший стан of flat, prestigious district: [11108.42352428] hryvnas




In [230]:
#Prediction
price_pred = lasso_model.predict([[40, 1, 1, 4, 1]]) # 40 sqmeters, 1 room, close to metro, good state of flat, prestigious district
print("Price for 40 sqmeters, 1 room, close to metro, Хороший стан of flat, not prestigious district:", price_pred, "hryvnas")

Price for 40 sqmeters, 1 room, close to metro, Хороший стан of flat, not prestigious district: [-861.89155248] hryvnas




In [231]:
#Prediction
price_pred = model.predict([[40, 1, 1, 1, 1]]) # 40 sqmeters, 1 room, close to metro, good state of flat, prestigious district
print("Price for 40 sqmeters, 1 room, close to metro, Потрібен капітальний ремонт,prestigious district 1:", price_pred, "hryvnas")

Price for 40 sqmeters, 1 room, close to metro, Потрібен капітальний ремонт,prestigious district 1: [-10265.71288311] hryvnas




In [232]:
from sklearn.linear_model import Ridge

ridge_model = Ridge(alpha=1)

ridge_model.fit(X_train, y_train)


Ridge(alpha=1)

In [233]:
y_pred_train=ridge_model.predict(X_train)
ridge_mse_y_train_y_pred=mean_squared_error(y_train, y_pred_train)
ridge_rmse_y_train_y_pred=np.sqrt(mean_squared_error(y_train, y_pred_train))

ridge_y_pred_test = ridge_model.predict(X_test)
ridge_mse_y_test_y_pred = mean_squared_error(y_test, y_pred_test)
ridge_rmse_y_test_y_pred=np.sqrt(mean_squared_error(y_test, y_pred_test))

ridge_score_train = ridge_model.score(X_train, y_train)
ridge_score_test = ridge_model.score(X_test, y_test)

print('Train score for ridge_model:', ridge_score_train)
print('Test score for ridge_model:', ridge_score_test)

print('Mean Squared Error (MSE) for ridge_mse_y_train_y_pred:', ridge_mse_y_train_y_pred)
print('Root Mean Squared Error (RMSE) for ridge_mse_y_train_y_pred:', ridge_rmse_y_train_y_pred)
print('Mean Squared Error (MSE) for ridge_mse_y_test_y_pred:', ridge_mse_y_train_y_pred)
print('Root Mean Squared Error (RMSE) for ridge_mse_y_test_y_pred:', ridge_rmse_y_train_y_pred)

print('Model cofficients:', ridge_model.coef_)

Train score for ridge_model: 0.5790962860112803
Test score for ridge_model: 0.5435479927328324
Mean Squared Error (MSE) for ridge_mse_y_train_y_pred: 420578336.1187261
Root Mean Squared Error (RMSE) for ridge_mse_y_train_y_pred: 20508.006634451973
Mean Squared Error (MSE) for ridge_mse_y_test_y_pred: 420578336.1187261
Root Mean Squared Error (RMSE) for ridge_mse_y_test_y_pred: 20508.006634451973
Model cofficients: [  321.20696666 -6947.9069225  -1113.92980978  3134.93109654
  5983.77544093]


In [234]:
# Оцінка кoефіцієнтів
ridge_df_coef = pd.DataFrame(ridge_model.coef_, X.columns, columns=['Model coefficients'])
ridge_df_coef

Unnamed: 0,Model coefficients
flat_area,321.206967
rooms,-6947.906922
distance_category,-1113.92981
code type,3134.931097
prestigious,5983.775441


In [235]:
#Prediction
price_pred = ridge_model.predict([[40, 1, 1, 4, 3]]) # 40 sqmeters, 1 room, close to metro, good state of flat, prestigious district
print("Price for 40 sqmeters, 1 room, close to metro, Хороший стан of flat, prestigious district:", price_pred, "hryvnas")

Price for 40 sqmeters, 1 room, close to metro, Хороший стан of flat, prestigious district: [11107.78226214] hryvnas




In [236]:
#Prediction
price_pred = model.predict([[40, 1, 1, 1, 1]]) # 40 sqmeters, 1 room, close to metro, good state of flat, prestigious district
print("Price for 40 sqmeters, 1 room, close to metro, Потрібен капітальний ремонт,prestigious district 1:", price_pred, "hryvnas")

Price for 40 sqmeters, 1 room, close to metro, Потрібен капітальний ремонт,prestigious district 1: [-10265.71288311] hryvnas




In [237]:
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score
# I added max_depth param = 4
regressor = DecisionTreeRegressor(max_depth=4)

regressor.fit(X_train, y_train)


DecisionTreeRegressor(max_depth=4)

In [238]:
regressor_y_pred_train=regressor.predict(X_train)
regressor_mse_y_train_y_pred=mean_squared_error(y_train, y_pred_train)
regressor_rmse_y_train_y_pred=np.sqrt(mean_squared_error(y_train, y_pred_train))

regressor_y_pred_test = regressor.predict(X_test)
regressor_mse_y_test_y_pred = mean_squared_error(y_test, y_pred_test)
regressor_rmse_y_test_y_pred=np.sqrt(mean_squared_error(y_test, y_pred_test))

train_score = regressor.score(X_train, y_train)
print("Training R2 Score for X_train, y_train:", train_score)

test_score = regressor.score(X_test, y_test)
print("Test R2 Score for X_test, y_test:", test_score)

print('Mean Squared Error (MSE) for regressor_mse_y_train_y_pred:', regressor_mse_y_train_y_pred)
print('Root Mean Squared Error (RMSE) for regressor_mse_y_train_y_pred:', regressor_rmse_y_train_y_pred)
print('Mean Squared Error (MSE) for regressor_mse_y_test_y_pred:', regressor_mse_y_train_y_pred)
print('Root Mean Squared Error (RMSE) for regressor_mse_y_test_y_pred:', regressor_rmse_y_train_y_pred)

print('Model feature importances:', regressor.feature_importances_) 

# Perform 5-fold cross-validation on the training data
scores = cross_val_score(regressor, X_train, y_train, cv=5, scoring='r2')

print("Cross-validated R2 scores:", scores)
print("Mean R2 score:", scores.mean())

Training R2 Score for X_train, y_train: 0.6912331339051985
Test R2 Score for X_test, y_test: 0.4860315272658472
Mean Squared Error (MSE) for regressor_mse_y_train_y_pred: 420578336.1187261
Root Mean Squared Error (RMSE) for regressor_mse_y_train_y_pred: 20508.006634451973
Mean Squared Error (MSE) for regressor_mse_y_test_y_pred: 420578336.1187261
Root Mean Squared Error (RMSE) for regressor_mse_y_test_y_pred: 20508.006634451973
Model feature importances: [0.8207137  0.06855027 0.         0.08883566 0.02190038]
Cross-validated R2 scores: [0.56816268 0.71914981 0.22455965 0.66600079 0.51978856]
Mean R2 score: 0.5395322985414459


In [239]:
# Оцінка кoефіцієнтів
regr_df_imp = pd.DataFrame(regressor.feature_importances_, X.columns, columns=['Feature Importances'])
regr_df_imp

Unnamed: 0,Feature Importances
flat_area,0.820714
rooms,0.06855
distance_category,0.0
code type,0.088836
prestigious,0.0219


In [240]:
#Prediction
price_pred = regressor.predict([[40, 1, 1, 4, 3]]) # 40 sqmeters, 1 room, close to metro, good state of flat, prestigious district
print("Price for 40 sqmeters, 1 room, close to metro, Хороший стан of flat, prestigious district:", price_pred, "hryvnas")

Price for 40 sqmeters, 1 room, close to metro, Хороший стан of flat, prestigious district: [18518.34683099] hryvnas




In [241]:
#Prediction
price_pred = regressor.predict([[60, 2, 1, 5, 3]]) # 40 sqmeters, 1 room, close to metro, good state of flat, prestigious district
print("Price for 60 sqmeters, 2 rooms, close to metro, Євроремонт state of flat, prestigious district:", price_pred, "hryvnas")

Price for 60 sqmeters, 2 rooms, close to metro, Євроремонт state of flat, prestigious district: [20628.78661088] hryvnas




In [242]:
#Prediction
price_pred = regressor.predict([[40, 1, 1, 4, 1]]) # 40 sqmeters, 1 room, close to metro, good state of flat, prestigious district
print("Price for 40 sqmeters, 1 room, close to metro, Хороший стан of flat, not prestigious district:", price_pred, "hryvnas")

Price for 40 sqmeters, 1 room, close to metro, Хороший стан of flat, not prestigious district: [11242.09689737] hryvnas




In [243]:
#Prediction
price_pred = regressor.predict([[160, 4, 3, 5, 3]]) # 40 sqmeters, 1 room, close to metro, good state of flat, prestigious district
print("Price for 160 sqmeters, 4 rooms, distant from metro, Дизайнерський ремонт of flat, prestigious district:", price_pred, "hryvnas")

Price for 160 sqmeters, 4 rooms, distant from metro, Дизайнерський ремонт of flat, prestigious district: [40585.66666667] hryvnas




In [244]:
#Prediction
price_pred = regressor.predict([[40, 1, 1, 1, 1]]) # 40 sqmeters, 1 room, close to metro, good state of flat, prestigious district
print("Price for 40 sqmeters, 1 room, close to metro, Потрібен капітальний ремонт,prestigious district 1:", price_pred, "hryvnas")

Price for 40 sqmeters, 1 room, close to metro, Потрібен капітальний ремонт,prestigious district 1: [11242.09689737] hryvnas




In [245]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.metrics import r2_score

regressor_forest = RandomForestRegressor(n_estimators=150, max_depth=12, min_samples_split=3, min_samples_leaf=2)

regressor_forest.fit(X_train, y_train)


RandomForestRegressor(max_depth=12, min_samples_leaf=2, min_samples_split=3,
                      n_estimators=150)

In [246]:
regressor_forest_y_pred_train=regressor_forest.predict(X_train)
regressor_forest_mse_y_train_y_pred=mean_squared_error(y_train, y_pred_train)
regressor_forest_rmse_y_train_y_pred=np.sqrt(mean_squared_error(y_train, y_pred_train))

regressor_forest_y_pred_test = regressor_forest.predict(X_test)
regressor_forest_mse_y_test_y_pred = mean_squared_error(y_test, y_pred_test)
regressor_forest_rmse_y_test_y_pred=np.sqrt(mean_squared_error(y_test, y_pred_test))

f_train_score = regressor_forest.score(X_train, y_train)
print("Training R2 Score for X_train, y_train:", f_train_score)

f_test_score = regressor_forest.score(X_test, y_test)
print("Test R2 Score for X_test, y_test:", f_test_score)

print('Mean Squared Error (MSE) for regressor_forest_mse_y_train_y_pred:', regressor_forest_mse_y_train_y_pred)
print('Root Mean Squared Error (RMSE) for regressor_forest_rmse_y_train_y_pred:', regressor_forest_rmse_y_train_y_pred)
print('Mean Squared Error (MSE) for regressor_forest_mse_y_test_y_pred:', regressor_forest_mse_y_train_y_pred)
print('Root Mean Squared Error (RMSE) for regressor_forest_rmse_y_test_y_pred:', regressor_forest_rmse_y_train_y_pred)

print('Model regressor_forest feature importances:', regressor_forest.feature_importances_) 

# Perform 5-fold cross-validation on the training data
scores = cross_val_score(regressor_forest, X_train, y_train, cv=5, scoring='r2')

print("Cross-validated R2 scores:", scores)
print("Mean R2 score:", scores.mean())

Training R2 Score for X_train, y_train: 0.8239982966705641
Test R2 Score for X_test, y_test: 0.5654182412714479
Mean Squared Error (MSE) for regressor_forest_mse_y_train_y_pred: 420578336.1187261
Root Mean Squared Error (RMSE) for regressor_forest_rmse_y_train_y_pred: 20508.006634451973
Mean Squared Error (MSE) for regressor_forest_mse_y_test_y_pred: 420578336.1187261
Root Mean Squared Error (RMSE) for regressor_forest_rmse_y_test_y_pred: 20508.006634451973
Model regressor_forest feature importances: [0.79913129 0.05813727 0.00250296 0.09540113 0.04482735]
Cross-validated R2 scores: [0.5924183  0.56245956 0.47181668 0.71500835 0.52286011]
Mean R2 score: 0.5729126015068962


In [247]:
# Оцінка кoефіцієнтів
regrfor_df_imp = pd.DataFrame(regressor_forest.feature_importances_, X.columns, columns=['Feature Importances'])
regrfor_df_imp

Unnamed: 0,Feature Importances
flat_area,0.799131
rooms,0.058137
distance_category,0.002503
code type,0.095401
prestigious,0.044827


In [248]:
#Predictions
print(regressor_forest.predict([[80, 2, 1, 1, 1]]))
#80 meters, 2 rooms, close to metro, Задовільний стан  state of flat, non-prestigious district
print(regressor_forest.predict([[160, 3, 3, 5, 1]]))
#160 sqmeters, 3 rooms, distant from metro, Дизайнерський ремонт, non-prestigious district
print(regressor_forest.predict([[100, 2, 3, 1, 1]]))
#100 sqmeters, 2 rooms, distant from metro,Потрібен капітальний ремонт, non-prestigious district 
print(regressor_forest.predict([[50, 2, 2, 1, 1]]))
#50 sqmeters, 2 rooms, middle distance to metro, Потрібен капітальний ремонт, non-prestigious district 

[6915.41717831]
[28388.836]
[6790.28565799]
[5021.00626704]




In [249]:
#Prediction
price_pred = regressor_forest.predict([[40, 1, 1, 4, 3]]) # 40 sqmeters, 1 room, close to metro, good state of flat, prestigious district
print("Price for 40 sqmeters, 1 room, close to metro, Хороший стан of flat, prestigious district:", price_pred, "hryvnas")

Price for 40 sqmeters, 1 room, close to metro, Хороший стан of flat, prestigious district: [15887.58543773] hryvnas




In [250]:
#Prediction
price_pred = regressor_forest.predict([[60, 2, 1, 5, 3]]) # 40 sqmeters, 1 room, close to metro, good state of flat, prestigious district
print("Price for 60 sqmeters, 2 rooms, close to metro, Євроремонт state of flat, prestigious district:", price_pred, "hryvnas")

Price for 60 sqmeters, 2 rooms, close to metro, Євроремонт state of flat, prestigious district: [22159.46410558] hryvnas




In [251]:
#Prediction
price_pred = regressor_forest.predict([[40, 1, 1, 4, 1]]) # 40 sqmeters, 1 room, close to metro, good state of flat, prestigious district
print("Price for 40 sqmeters, 1 room, close to metro, Хороший стан of flat, not prestigious district:", price_pred, "hryvnas")

Price for 40 sqmeters, 1 room, close to metro, Хороший стан of flat, not prestigious district: [10649.93725043] hryvnas




In [252]:
#Prediction
price_pred = regressor_forest.predict([[160, 4, 3, 5, 3]]) # 40 sqmeters, 1 room, close to metro, good state of flat, prestigious district
print("Price for 160 sqmeters, 4 rooms, distant from metro, Дизайнерський ремонт of flat, prestigious district:", price_pred, "hryvnas")

Price for 160 sqmeters, 4 rooms, distant from metro, Дизайнерський ремонт of flat, prestigious district: [36429.29428919] hryvnas




In [253]:
#Prediction
price_pred = regressor_forest.predict([[40, 1, 3, 1, 1]]) # 40 sqmeters, 1 room, close to metro, good state of flat, prestigious district
print("Price for 40 sqmeters, 1 room, distant from metro, Потрібен капітальний ремонт, prestigious district 1:", price_pred, "hryvnas")

Price for 40 sqmeters, 1 room, distant from metro, Потрібен капітальний ремонт, prestigious district 1: [5344.14178464] hryvnas




In [254]:
from sklearn.svm import SVR

regressor = SVR()

regressor.fit(X_train, y_train)

y_pred = regressor.predict(X_test)

score = regressor.score(X_test, y_test)
print("R2 Score:", score)

R2 Score: -0.090074404855087


In [255]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error, r2_score

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()

param_grid = {'fit_intercept': [True, False]}

grid_search = GridSearchCV(model, param_grid, cv=5)
grid_search.fit(X_train, y_train)  


best_params = grid_search.best_params_
best_model = LinearRegression(**best_params)
best_model.fit(X_train, y_train)  

# Print the evaluation metrics
print('Best Hyperparameters:', best_params)

Best Hyperparameters: {'fit_intercept': True}


In [256]:
best_model_y_pred_train = best_model.predict(X_train)
y_pred_train = best_model_y_pred_train
best_model_mse_y_train_y_pred = mean_squared_error(y_train, y_pred_train)
best_model_rmse_y_train_y_pred = np.sqrt(mean_squared_error(y_train, y_pred_train))

best_model_y_pred_test = best_model.predict(X_test)
y_pred_test = best_model_y_pred_test
best_model_mse_y_test_y_pred = mean_squared_error(y_test, y_pred_test)
best_model_rmse_y_test_y_pred = np.sqrt(mean_squared_error(y_test, y_pred_test))

gs_train_score = best_model.score(X_train, y_train)
print("Training  Score for X_train, y_train:", gs_train_score)

gs_test_score = best_model.score(X_test, y_test)
print("Test Score for X_test, y_test:", gs_test_score)

print('Mean Squared Error (MSE) for best_model_mse_y_train_y_pred:', best_model_mse_y_train_y_pred)
print('Root Mean Squared Error (RMSE) for best_model_rmse_y_train_y_pred:', best_model_rmse_y_train_y_pred)
print('Mean Squared Error (MSE) for best_model_y_test_y_pred:', best_model_mse_y_train_y_pred)
print('Root Mean Squared Error (RMSE) for best_model_rmse_y_test_y_pred:', best_model_rmse_y_train_y_pred)

print('Model best_model GridSearchCV coefficients:', best_model.coef_) 


Training  Score for X_train, y_train: 0.5858519415473591
Test Score for X_test, y_test: 0.5262987437016224
Mean Squared Error (MSE) for best_model_mse_y_train_y_pred: 371563271.4029539
Root Mean Squared Error (RMSE) for best_model_rmse_y_train_y_pred: 19275.976535650636
Mean Squared Error (MSE) for best_model_y_test_y_pred: 371563271.4029539
Root Mean Squared Error (RMSE) for best_model_rmse_y_test_y_pred: 19275.976535650636
Model best_model GridSearchCV coefficients: [  316.97480472 -7135.07662621  -955.55524113  3237.18606419
  5981.10121853]


In [257]:
# Оцінка кoефіцієнтів
df_coef = pd.DataFrame(best_model.coef_, X.columns, columns=['Model coefficients'])
df_coef

Unnamed: 0,Model coefficients
flat_area,316.974805
rooms,-7135.076626
distance_category,-955.555241
code type,3237.186064
prestigious,5981.101219


In [258]:
#Prediction
price_pred = best_model.predict([[40, 1, 1, 4, 3]]) # 40 sqmeters, 1 room, close to metro, good state of flat, prestigious district
print("Price for 40 sqmeters, 1 room, close to metro, Хороший стан of flat, prestigious district:", price_pred, "hryvnas")


Price for 40 sqmeters, 1 room, close to metro, Хороший стан of flat, prestigious district: [11611.64468378] hryvnas




In [259]:
#Prediction
price_pred = best_model.predict([[60, 2, 1, 5, 3]]) # 40 sqmeters, 1 room, close to metro, good state of flat, prestigious district
print("Price for 60 sqmeters, 2 rooms, close to metro, Євроремонт state of flat, prestigious district:", price_pred, "hryvnas")

Price for 60 sqmeters, 2 rooms, close to metro, Євроремонт state of flat, prestigious district: [14053.25021614] hryvnas




In [260]:
#Prediction
price_pred = best_model.predict([[40, 1, 1, 4, 1]]) # 40 sqmeters, 1 room, close to metro, good state of flat, prestigious district
print("Price for 40 sqmeters, 1 room, close to metro, Хороший стан of flat, not prestigious district:", price_pred, "hryvnas")

Price for 40 sqmeters, 1 room, close to metro, Хороший стан of flat, not prestigious district: [-350.55775329] hryvnas




In [261]:
#Prediction
price_pred = best_model.predict([[160, 4, 3, 5, 3]]) # 40 sqmeters, 1 room, close to metro, good state of flat, prestigious district
print("Price for 160 sqmeters, 4 rooms, distant from metro, Дизайнерський ремонт of flat, prestigious district:", price_pred, "hryvnas")

Price for 160 sqmeters, 4 rooms, distant from metro, Дизайнерський ремонт of flat, prestigious district: [29569.46695339] hryvnas




In [262]:
#Prediction
price_pred = best_model.predict([[40, 1, 3, 1, 1]]) # 40 sqmeters, 1 room, close to metro, good state of flat, prestigious district
print("Price for 40 sqmeters, 1 room, distant from metro, Потрібен капітальний ремонт,prestigious district 1:", price_pred, "hryvnas")

Price for 40 sqmeters, 1 room, distant from metro, Потрібен капітальний ремонт,prestigious district 1: [-11973.2264281] hryvnas


