<b>Data vocabulary of the features used in the analysis:</b>

"rooms" - number of rooms

"price" - price in hryvnas

"flat_area" - area of a flat in square meters

"prestigious" - district prestige code (3 = most prestigious, 1 = least prestigious):
code_prestigious = {'Печерський': 3,
                'Шевченківський': 3,
                'Голосіївський': 3,
                'Подільський': 2,
                'Святошинський': 1,
                'Солом\'янський': 2,
                'Оболонський': 2,
                'Дніпровський': 1,
                'Дарницький': 1,
                'Деснянський': 1}

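"code type" - flat condition code:
code_type = {'Дизайнерський ремонт': 5,  # designer renovation
             'Євроремонт': 4,  # European-style renovation
             'Чудовий стан': 4,  # excellent condition
             'Хороший стан': 3,  # good condition
             'Задовільний стан': 1,  # satisfactory condition
             'Перша здача': 2,  # first-time rental
             'Потрібен капітальний ремонт': 1,  # needs major renovation
             'Незавершений ремонт': 1,  # unfinished renovation
             'Потрібен косметичний ремонт': 1,  # needs cosmetic repairs
             'Від будівельників вільне планування': 1}  # from the builders, open plan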

"distance_category" - code of the distance to the metro: 1 - closer than 2 km, 2 - from 2 to 5 km, 3 - more than 5 km
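As a minimal sketch of how such codes can be attached to a table, the snippet below maps district names through the prestige dictionary and bins a distance column into the three metro categories. The three-row sample and the column name `metro_distance_km` are hypothetical, and the handling of the exact 2 km and 5 km boundaries is an assumption, since the vocabulary does not say which side they fall on.

```python
import numpy as np
import pandas as pd

# District -> prestige code, copied from the vocabulary above.
code_prestigious = {
    'Печерський': 3, 'Шевченківський': 3, 'Голосіївський': 3,
    'Подільський': 2, "Солом'янський": 2, 'Оболонський': 2,
    'Святошинський': 1, 'Дніпровський': 1, 'Дарницький': 1, 'Деснянський': 1,
}

# Hypothetical sample rows; the notebook itself reads all_flats_metro.csv.
sample = pd.DataFrame({
    "district": ["Печерський", "Дарницький", "Оболонський"],
    "metro_distance_km": [0.5, 3.0, 7.5],  # hypothetical column name
})

# Districts missing from the dictionary would become NaN, so check for them.
sample["prestigious"] = sample["district"].map(code_prestigious)
assert sample["prestigious"].notna().all()

# 1: under 2 km, 2: 2 to 5 km, 3: over 5 km (boundary handling is assumed).
sample["distance_category"] = pd.cut(
    sample["metro_distance_km"],
    bins=[0, 2, 5, np.inf],
    labels=[1, 2, 3],
    right=False,
).astype(int)

print(sample)
```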

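<b>Results:</b>

Min-max scaling did not change the results. This is expected for ordinary least squares and for tree-based models: the scaling is a per-feature linear rescaling, so their predictions and R2 scores stay the same and only the linear coefficients are rescaled.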
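The sketch below illustrates that statement for ordinary least squares on synthetic data (not the flat dataset): the R2 score is the same with and without `MinMaxScaler`, and the coefficients differ only by each feature's range. The feature ranges and noise level are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins for area and rooms; not the real data.
X = np.column_stack([rng.uniform(20, 200, 500), rng.integers(1, 6, 500)])
y = 300 * X[:, 0] - 1000 * X[:, 1] + rng.normal(0, 2000, 500)

# Fit once on raw features and once on min-max scaled features.
raw = LinearRegression().fit(X, y)
scaler = MinMaxScaler().fit(X)
X_scaled = scaler.transform(X)
scaled = LinearRegression().fit(X_scaled, y)

r2_raw = raw.score(X, y)
r2_scaled = scaled.score(X_scaled, y)
print("R2 raw:   ", r2_raw)
print("R2 scaled:", r2_scaled)

# A scaled coefficient is the raw coefficient times the feature's (max - min).
print("raw coef * range:", raw.coef_ * scaler.data_range_)
print("scaled coef:     ", scaled.coef_)
```

Regularized models such as Lasso and Ridge, and kernel methods such as SVR, do not share this invariance, so the same check would not necessarily hold for them.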



In [211]:
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import mean_squared_error
import numpy as np
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv('all_flats_metro.csv')

# df = df[df['rooms'] <= 3]
# df = df[df['price'] <= 200000]

print(len(df))

#Linear regression

df['flat_area'] = df['flat_area'].astype(int)
df['price'] = df['price'].astype(int)
df['rooms'] = df['rooms'].astype(int)

# print(df.head(5))

label_encoder = LabelEncoder()
df['region_name_encoded'] = label_encoder.fit_transform(df['district'])

X = df[['flat_area', 'rooms', 'distance_category', 'code type', 'prestigious']]
y = df['price']

mmscaler = MinMaxScaler()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

X_train_mmscaled = mmscaler.fit_transform(X_train)

model = LinearRegression()
model.fit(X_train_mmscaled, y_train)

X_test_mmscaled = mmscaler.transform(X_test)
# y_pred = model.predict(X_test_mmscaled)

y_pred_train=model.predict(X_train_mmscaled)
mse_y_train_y_pred=mean_squared_error(y_train, y_pred_train)
rmse_y_train_y_pred=np.sqrt(mean_squared_error(y_train, y_pred_train))

y_pred_test = model.predict(X_test_mmscaled)
mse_y_test_y_pred = mean_squared_error(y_test, y_pred_test)
rmse_y_test_y_pred=np.sqrt(mean_squared_error(y_test, y_pred_test))

score_train = model.score(X_train_mmscaled, y_train)
score_test = model.score(X_test_mmscaled, y_test)

print('Train score:', score_train)
print('Test score:', score_test)

print('Mean Squared Error (MSE), train:', mse_y_train_y_pred)
print('Root Mean Squared Error (RMSE), train:', rmse_y_train_y_pred)
print('Mean Squared Error (MSE), test:', mse_y_test_y_pred)
print('Root Mean Squared Error (RMSE), test:', rmse_y_test_y_pred)

print('Model coefficients:', model.coef_)

6020
Train score: 0.5790962964878815
Test score: 0.5435417979998883
Mean Squared Error (MSE), train: 420578325.6502237
Root Mean Squared Error (RMSE), train: 20508.006379222326
Model coefficients: [330240.26894313 -34763.14779159  -2228.01181112  12540.533571
  11972.13713674]
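The same metric-reporting pattern is repeated for every model in this notebook. One way to keep train and test numbers from getting mixed up is a small helper that always derives the errors from the model's own predictions on each split. The function name `evaluate` and the synthetic data are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def evaluate(model, X_train, y_train, X_test, y_test):
    """Return R2, MSE and RMSE per split, computed from the model's own predictions."""
    report = {}
    for name, X_part, y_part in (("train", X_train, y_train), ("test", X_test, y_test)):
        mse = mean_squared_error(y_part, model.predict(X_part))
        report[name] = {
            "r2": model.score(X_part, y_part),
            "mse": mse,
            "rmse": float(np.sqrt(mse)),
        }
    return report


# Synthetic data standing in for the scaled flat features.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(400, 3))
y = X @ np.array([30000.0, -5000.0, 10000.0]) + rng.normal(0, 3000, 400)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

report = evaluate(LinearRegression().fit(X_tr, y_tr), X_tr, y_tr, X_te, y_te)
for split, metrics in report.items():
    print(split, metrics)
```

In the notebook, a call such as `evaluate(model, X_train_mmscaled, y_train, X_test_mmscaled, y_test)` would work for any of the fitted estimators.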


In [212]:
# Coefficient estimates
df_coef = pd.DataFrame(model.coef_, X.columns, columns=['Model coefficients'])
df_coef

                   Model coefficients
flat_area               330240.268943
rooms                   -34763.147792
distance_category        -2228.011811
code type                12540.533571
prestigious              11972.137137


In [213]:
# Prediction: 40 sq. m, 1 room, close to metro, condition code 4 (Євроремонт), prestigious district
price_pred = model.predict(mmscaler.transform([[40, 1, 1, 4, 3]]))
print("Price for 40 sq. m, 1 room, close to metro, condition code 4 (Євроремонт), prestigious district:", price_pred, "hryvnas")

Price for 40 sq. m, 1 room, close to metro, condition code 4 (Євроремонт), prestigious district: [11111.82443188] hryvnas




In [214]:
# Prediction: 60 sq. m, 2 rooms, close to metro, condition code 5 (Дизайнерський ремонт), prestigious district
price_pred = model.predict(mmscaler.transform([[60, 2, 1, 5, 3]]))
print("Price for 60 sq. m, 2 rooms, close to metro, condition code 5 (Дизайнерський ремонт), prestigious district:", price_pred, "hryvnas")

Price for 60 sq. m, 2 rooms, close to metro, condition code 5 (Дизайнерський ремонт), prestigious district: [13719.2362224] hryvnas




In [215]:
# Prediction: 40 sq. m, 1 room, close to metro, condition code 4 (Євроремонт), non-prestigious district
price_pred = model.predict(mmscaler.transform([[40, 1, 1, 4, 1]]))
print("Price for 40 sq. m, 1 room, close to metro, condition code 4 (Євроремонт), non-prestigious district:", price_pred, "hryvnas")

Price for 40 sq. m, 1 room, close to metro, condition code 4 (Євроремонт), non-prestigious district: [-860.31270486] hryvnas




In [216]:
# Prediction: 160 sq. m, 4 rooms, distant from metro, condition code 5 (Дизайнерський ремонт), prestigious district
price_pred = model.predict(mmscaler.transform([[160, 4, 3, 5, 3]]))
print("Price for 160 sq. m, 4 rooms, distant from metro, condition code 5 (Дизайнерський ремонт), prestigious district:", price_pred, "hryvnas")

Price for 160 sq. m, 4 rooms, distant from metro, condition code 5 (Дизайнерський ремонт), prestigious district: [29710.5050751] hryvnas




In [217]:
# Prediction: 40 sq. m, 1 room, close to metro, condition code 1 (Потрібен капітальний ремонт), non-prestigious district
price_pred = model.predict(mmscaler.transform([[40, 1, 1, 1, 1]]))
print("Price for 40 sq. m, 1 room, close to metro, condition code 1 (Потрібен капітальний ремонт), non-prestigious district:", price_pred, "hryvnas")

Price for 40 sq. m, 1 room, close to metro, condition code 1 (Потрібен капітальний ремонт), non-prestigious district: [-10265.71288311] hryvnas
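Because the model is fitted on min-max scaled features, each coefficient is the price change for moving a feature across its whole training range. The arithmetic below checks this against the linear-model predictions printed above. It assumes that `prestigious` spans 1 to 3 and `code type` spans 1 to 5 in the training split, which the vocabulary suggests but the notebook does not print.

```python
# Values copied from the linear-model outputs above.
coef_code_type = 12540.533571
coef_prestigious = 11972.13713674

pred_code4_prest3 = 11111.82443188   # input [40, 1, 1, 4, 3]
pred_code4_prest1 = -860.31270486    # input [40, 1, 1, 4, 1]
pred_code1_prest1 = -10265.71288311  # input [40, 1, 1, 1, 1]

# Assumed training ranges: prestigious in [1, 3], code type in [1, 5].
prestige_step = (3 - 1) / (3 - 1)    # prestige 1 -> 3 is the full range: 1.0
condition_step = (4 - 1) / (5 - 1)   # condition 1 -> 4 is 3/4 of the range: 0.75

prestige_effect = pred_code4_prest3 - pred_code4_prest1
condition_effect = pred_code4_prest1 - pred_code1_prest1

print("prestige effect: ", prestige_effect, "vs", coef_prestigious * prestige_step)
print("condition effect:", condition_effect, "vs", coef_code_type * condition_step)
```

Both differences agree with the scaled coefficients, which is consistent with the assumed ranges.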




In [218]:
# Lasso regression on the same scaled features

from sklearn.linear_model import Lasso

lasso_model = Lasso(alpha=1)

lasso_model.fit(X_train_mmscaled, y_train)


Lasso(alpha=1)

In [219]:
y_pred_train = lasso_model.predict(X_train_mmscaled)
lasso_mse_y_train_y_pred = mean_squared_error(y_train, y_pred_train)
lasso_rmse_y_train_y_pred = np.sqrt(lasso_mse_y_train_y_pred)

# Test errors are computed from the lasso model's own test predictions
lasso_y_pred_test = lasso_model.predict(X_test_mmscaled)
lasso_mse_y_test_y_pred = mean_squared_error(y_test, lasso_y_pred_test)
lasso_rmse_y_test_y_pred = np.sqrt(lasso_mse_y_test_y_pred)

lasso_score_train = lasso_model.score(X_train_mmscaled, y_train)
lasso_score_test = lasso_model.score(X_test_mmscaled, y_test)

print('Train score for lasso_model:', lasso_score_train)
print('Test score for lasso_model:', lasso_score_test)

print('Mean Squared Error (MSE) for lasso_model, train:', lasso_mse_y_train_y_pred)
print('Root Mean Squared Error (RMSE) for lasso_model, train:', lasso_rmse_y_train_y_pred)
print('Mean Squared Error (MSE) for lasso_model, test:', lasso_mse_y_test_y_pred)
print('Root Mean Squared Error (RMSE) for lasso_model, test:', lasso_rmse_y_test_y_pred)

print('Model coefficients:', lasso_model.coef_)

Train score for lasso_model: 0.5790954205303336
Test score for lasso_model: 0.5436404900780325
Mean Squared Error (MSE) for lasso_model, train: 420579200.93061626
Root Mean Squared Error (RMSE) for lasso_model, train: 20508.02771917905
Model coefficients: [329606.95659402 -34533.27271539  -2215.41702127  12549.09301658
  11974.50202567]


In [220]:
# Coefficient estimates
lasso_df_coef = pd.DataFrame(lasso_model.coef_, X.columns, columns=['Model coefficients'])
lasso_df_coef

                   Model coefficients
flat_area               329606.956594
rooms                   -34533.272715
distance_category        -2215.417021
code type                12549.093017
prestigious              11974.502026


In [221]:
# Prediction: 40 sq. m, 1 room, close to metro, condition code 4 (Євроремонт), prestigious district
price_pred = lasso_model.predict(mmscaler.transform([[40, 1, 1, 4, 3]]))
print("Price for 40 sq. m, 1 room, close to metro, condition code 4 (Євроремонт), prestigious district:", price_pred, "hryvnas")

Price for 40 sq. m, 1 room, close to metro, condition code 4 (Євроремонт), prestigious district: [11116.2579913] hryvnas




In [222]:
# Prediction: 40 sq. m, 1 room, close to metro, condition code 4 (Євроремонт), non-prestigious district
price_pred = lasso_model.predict(mmscaler.transform([[40, 1, 1, 4, 1]]))
print("Price for 40 sq. m, 1 room, close to metro, condition code 4 (Євроремонт), non-prestigious district:", price_pred, "hryvnas")

Price for 40 sq. m, 1 room, close to metro, condition code 4 (Євроремонт), non-prestigious district: [-858.24403438] hryvnas




In [223]:
# Prediction: 40 sq. m, 1 room, close to metro, condition code 1 (Потрібен капітальний ремонт), non-prestigious district
price_pred = lasso_model.predict(mmscaler.transform([[40, 1, 1, 1, 1]]))
print("Price for 40 sq. m, 1 room, close to metro, condition code 1 (Потрібен капітальний ремонт), non-prestigious district:", price_pred, "hryvnas")

Price for 40 sq. m, 1 room, close to metro, condition code 1 (Потрібен капітальний ремонт), non-prestigious district: [-10270.06379681] hryvnas




In [224]:
from sklearn.linear_model import Ridge

ridge_model = Ridge(alpha=1)

ridge_model.fit(X_train_mmscaled, y_train)


Ridge(alpha=1)

In [225]:
y_pred_train = ridge_model.predict(X_train_mmscaled)
ridge_mse_y_train_y_pred = mean_squared_error(y_train, y_pred_train)
ridge_rmse_y_train_y_pred = np.sqrt(ridge_mse_y_train_y_pred)

# Test errors are computed from the ridge model's own test predictions
ridge_y_pred_test = ridge_model.predict(X_test_mmscaled)
ridge_mse_y_test_y_pred = mean_squared_error(y_test, ridge_y_pred_test)
ridge_rmse_y_test_y_pred = np.sqrt(ridge_mse_y_test_y_pred)

ridge_score_train = ridge_model.score(X_train_mmscaled, y_train)
ridge_score_test = ridge_model.score(X_test_mmscaled, y_test)

print('Train score for ridge_model:', ridge_score_train)
print('Test score for ridge_model:', ridge_score_test)

print('Mean Squared Error (MSE) for ridge_model, train:', ridge_mse_y_train_y_pred)
print('Root Mean Squared Error (RMSE) for ridge_model, train:', ridge_rmse_y_train_y_pred)
print('Mean Squared Error (MSE) for ridge_model, test:', ridge_mse_y_test_y_pred)
print('Root Mean Squared Error (RMSE) for ridge_model, test:', ridge_rmse_y_test_y_pred)

print('Model coefficients:', ridge_model.coef_)

Train score for ridge_model: 0.5768338518559327
Test score for ridge_model: 0.5462479884403388
Mean Squared Error (MSE) for ridge_model, train: 422839021.3087355
Root Mean Squared Error (RMSE) for ridge_model, train: 20563.049902889783
Model coefficients: [296606.263845   -24290.26751724  -2371.77885435  13883.90311037
  12330.00870108]


In [226]:
# Coefficient estimates
ridge_df_coef = pd.DataFrame(ridge_model.coef_, X.columns, columns=['Model coefficients'])
ridge_df_coef

                   Model coefficients
flat_area               296606.263845
rooms                   -24290.267517
distance_category        -2371.778854
code type                13883.903110
prestigious              12330.008701


In [227]:
# Prediction: 40 sq. m, 1 room, close to metro, condition code 4 (Євроремонт), prestigious district
price_pred = ridge_model.predict(mmscaler.transform([[40, 1, 1, 4, 3]]))
print("Price for 40 sq. m, 1 room, close to metro, condition code 4 (Євроремонт), prestigious district:", price_pred, "hryvnas")

Price for 40 sq. m, 1 room, close to metro, condition code 4 (Євроремонт), prestigious district: [11972.89085969] hryvnas




In [228]:
# Prediction: 40 sq. m, 1 room, close to metro, condition code 1 (Потрібен капітальний ремонт), non-prestigious district
price_pred = ridge_model.predict(mmscaler.transform([[40, 1, 1, 1, 1]]))
print("Price for 40 sq. m, 1 room, close to metro, condition code 1 (Потрібен капітальний ремонт), non-prestigious district:", price_pred, "hryvnas")




In [229]:
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score
# max_depth=4 caps the depth of the tree
regressor = DecisionTreeRegressor(max_depth=4)

regressor.fit(X_train_mmscaled, y_train)


DecisionTreeRegressor(max_depth=4)

In [230]:
# Errors are computed from the decision tree's own predictions
regressor_y_pred_train = regressor.predict(X_train_mmscaled)
regressor_mse_y_train_y_pred = mean_squared_error(y_train, regressor_y_pred_train)
regressor_rmse_y_train_y_pred = np.sqrt(regressor_mse_y_train_y_pred)

regressor_y_pred_test = regressor.predict(X_test_mmscaled)
regressor_mse_y_test_y_pred = mean_squared_error(y_test, regressor_y_pred_test)
regressor_rmse_y_test_y_pred = np.sqrt(regressor_mse_y_test_y_pred)

train_score = regressor.score(X_train_mmscaled, y_train)
print("Training R2 Score for X_train, y_train:", train_score)

test_score = regressor.score(X_test_mmscaled, y_test)
print("Test R2 Score for X_test, y_test:", test_score)

print('Mean Squared Error (MSE) for regressor, train:', regressor_mse_y_train_y_pred)
print('Root Mean Squared Error (RMSE) for regressor, train:', regressor_rmse_y_train_y_pred)
print('Mean Squared Error (MSE) for regressor, test:', regressor_mse_y_test_y_pred)
print('Root Mean Squared Error (RMSE) for regressor, test:', regressor_rmse_y_test_y_pred)

print('Model feature importances:', regressor.feature_importances_)

# Perform 5-fold cross-validation on the training data
# (the unscaled X_train is used here; tree splits do not depend on monotonic feature scaling)
scores = cross_val_score(regressor, X_train, y_train, cv=5, scoring='r2')

print("Cross-validated R2 scores:", scores)
print("Mean R2 score:", scores.mean())

Training R2 Score for X_train, y_train: 0.6912331339051985
Test R2 Score for X_test, y_test: 0.4860315272658472
Model feature importances: [0.82105149 0.06821247 0.         0.08883566 0.02190038]
Cross-validated R2 scores: [0.56816268 0.71914981 0.28635202 0.66600079 0.51978856]
Mean R2 score: 0.5518907723759898


In [231]:
# Feature importances
regr_df_imp = pd.DataFrame(regressor.feature_importances_, X.columns, columns=['Feature Importances'])
regr_df_imp

                   Feature Importances
flat_area                     0.821051
rooms                         0.068212
distance_category             0.000000
code type                     0.088836
prestigious                   0.021900


In [232]:
# Prediction: 40 sq. m, 1 room, close to metro, condition code 4 (Євроремонт), prestigious district
price_pred = regressor.predict(mmscaler.transform([[40, 1, 1, 4, 3]]))
print("Price for 40 sq. m, 1 room, close to metro, condition code 4 (Євроремонт), prestigious district:", price_pred, "hryvnas")

Price for 40 sq. m, 1 room, close to metro, condition code 4 (Євроремонт), prestigious district: [18518.34683099] hryvnas




In [233]:
# Prediction: 60 sq. m, 2 rooms, close to metro, condition code 5 (Дизайнерський ремонт), prestigious district
price_pred = regressor.predict(mmscaler.transform([[60, 2, 1, 5, 3]]))
print("Price for 60 sq. m, 2 rooms, close to metro, condition code 5 (Дизайнерський ремонт), prestigious district:", price_pred, "hryvnas")

Price for 60 sq. m, 2 rooms, close to metro, condition code 5 (Дизайнерський ремонт), prestigious district: [20628.78661088] hryvnas




In [234]:
# Prediction: 40 sq. m, 1 room, close to metro, condition code 4 (Євроремонт), non-prestigious district
price_pred = regressor.predict(mmscaler.transform([[40, 1, 1, 4, 1]]))
print("Price for 40 sq. m, 1 room, close to metro, condition code 4 (Євроремонт), non-prestigious district:", price_pred, "hryvnas")

Price for 40 sq. m, 1 room, close to metro, condition code 4 (Євроремонт), non-prestigious district: [11242.09689737] hryvnas




In [235]:
# Prediction: 160 sq. m, 4 rooms, distant from metro, condition code 5 (Дизайнерський ремонт), prestigious district
price_pred = regressor.predict(mmscaler.transform([[160, 4, 3, 5, 3]]))
print("Price for 160 sq. m, 4 rooms, distant from metro, condition code 5 (Дизайнерський ремонт), prestigious district:", price_pred, "hryvnas")

Price for 160 sq. m, 4 rooms, distant from metro, condition code 5 (Дизайнерський ремонт), prestigious district: [40585.66666667] hryvnas




In [236]:
# Prediction: 40 sq. m, 1 room, close to metro, condition code 1 (Потрібен капітальний ремонт), non-prestigious district
price_pred = regressor.predict(mmscaler.transform([[40, 1, 1, 1, 1]]))
print("Price for 40 sq. m, 1 room, close to metro, condition code 1 (Потрібен капітальний ремонт), non-prestigious district:", price_pred, "hryvnas")

Price for 40 sq. m, 1 room, close to metro, condition code 1 (Потрібен капітальний ремонт), non-prestigious district: [11242.09689737] hryvnas




In [237]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.metrics import r2_score

regressor_forest = RandomForestRegressor(n_estimators=150, max_depth=12, min_samples_split=3, min_samples_leaf=2)

regressor_forest.fit(X_train_mmscaled, y_train)


RandomForestRegressor(max_depth=12, min_samples_leaf=2, min_samples_split=3,
                      n_estimators=150)

In [238]:
# Errors are computed from the random forest's own predictions
regressor_forest_y_pred_train = regressor_forest.predict(X_train_mmscaled)
regressor_forest_mse_y_train_y_pred = mean_squared_error(y_train, regressor_forest_y_pred_train)
regressor_forest_rmse_y_train_y_pred = np.sqrt(regressor_forest_mse_y_train_y_pred)

regressor_forest_y_pred_test = regressor_forest.predict(X_test_mmscaled)
regressor_forest_mse_y_test_y_pred = mean_squared_error(y_test, regressor_forest_y_pred_test)
regressor_forest_rmse_y_test_y_pred = np.sqrt(regressor_forest_mse_y_test_y_pred)

f_train_score = regressor_forest.score(X_train_mmscaled, y_train)
print("Training R2 Score for X_train, y_train:", f_train_score)

f_test_score = regressor_forest.score(X_test_mmscaled, y_test)
print("Test R2 Score for X_test, y_test:", f_test_score)

print('Mean Squared Error (MSE) for regressor_forest, train:', regressor_forest_mse_y_train_y_pred)
print('Root Mean Squared Error (RMSE) for regressor_forest, train:', regressor_forest_rmse_y_train_y_pred)
print('Mean Squared Error (MSE) for regressor_forest, test:', regressor_forest_mse_y_test_y_pred)
print('Root Mean Squared Error (RMSE) for regressor_forest, test:', regressor_forest_rmse_y_test_y_pred)

print('Model regressor_forest feature importances:', regressor_forest.feature_importances_)

# Perform 5-fold cross-validation on the training data
scores = cross_val_score(regressor_forest, X_train_mmscaled, y_train, cv=5, scoring='r2')

print("Cross-validated R2 scores:", scores)
print("Mean R2 score:", scores.mean())

Training R2 Score for X_train, y_train: 0.8216590497930709
Test R2 Score for X_test, y_test: 0.588163396563768
Model regressor_forest feature importances: [0.80572236 0.05252039 0.00260856 0.09290884 0.04623985]
Cross-validated R2 scores: [0.58317391 0.56732519 0.53408154 0.72316243 0.52436964]
Mean R2 score: 0.5864225423523839


In [239]:
# Feature importances
regrfor_df_imp = pd.DataFrame(regressor_forest.feature_importances_, X.columns, columns=['Feature Importances'])
regrfor_df_imp

                   Feature Importances
flat_area                     0.805722
rooms                         0.052520
distance_category             0.002609
code type                     0.092909
prestigious                   0.046240


In [240]:
# Predictions
# 80 sq. m, 2 rooms, close to metro, condition code 1 (Задовільний стан), non-prestigious district
print(regressor_forest.predict(mmscaler.transform([[80, 2, 1, 1, 1]])))
# 160 sq. m, 3 rooms, distant from metro, condition code 5 (Дизайнерський ремонт), non-prestigious district
print(regressor_forest.predict(mmscaler.transform([[160, 3, 3, 5, 1]])))
# 100 sq. m, 2 rooms, distant from metro, condition code 1 (Потрібен капітальний ремонт), non-prestigious district
print(regressor_forest.predict(mmscaler.transform([[100, 2, 3, 1, 1]])))
# 50 sq. m, 2 rooms, middle distance to metro, condition code 1 (Потрібен капітальний ремонт), non-prestigious district
print(regressor_forest.predict(mmscaler.transform([[50, 2, 2, 1, 1]])))

[7004.41570197]
[28882.97249206]
[6842.45042003]
[5093.91388408]




In [241]:
# Prediction: 40 sq. m, 1 room, close to metro, condition code 4 (Євроремонт), prestigious district
price_pred = regressor_forest.predict(mmscaler.transform([[40, 1, 1, 4, 3]]))
print("Price for 40 sq. m, 1 room, close to metro, condition code 4 (Євроремонт), prestigious district:", price_pred, "hryvnas")

Price for 40 sq. m, 1 room, close to metro, condition code 4 (Євроремонт), prestigious district: [15715.80849206] hryvnas




In [242]:
# Prediction: 60 sq. m, 2 rooms, close to metro, condition code 5 (Дизайнерський ремонт), prestigious district
price_pred = regressor_forest.predict(mmscaler.transform([[60, 2, 1, 5, 3]]))
print("Price for 60 sq. m, 2 rooms, close to metro, condition code 5 (Дизайнерський ремонт), prestigious district:", price_pred, "hryvnas")

Price for 60 sq. m, 2 rooms, close to metro, condition code 5 (Дизайнерський ремонт), prestigious district: [21982.60049182] hryvnas




In [243]:
# Prediction: 40 sq. m, 1 room, close to metro, condition code 4 (Євроремонт), non-prestigious district
price_pred = regressor_forest.predict(mmscaler.transform([[40, 1, 1, 4, 1]]))
print("Price for 40 sq. m, 1 room, close to metro, condition code 4 (Євроремонт), non-prestigious district:", price_pred, "hryvnas")

Price for 40 sq. m, 1 room, close to metro, condition code 4 (Євроремонт), non-prestigious district: [10728.6547984] hryvnas




In [244]:
# Prediction: 160 sq. m, 4 rooms, distant from metro, condition code 5 (Дизайнерський ремонт), prestigious district
price_pred = regressor_forest.predict(mmscaler.transform([[160, 4, 3, 5, 3]]))
print("Price for 160 sq. m, 4 rooms, distant from metro, condition code 5 (Дизайнерський ремонт), prestigious district:", price_pred, "hryvnas")

Price for 160 sq. m, 4 rooms, distant from metro, condition code 5 (Дизайнерський ремонт), prestigious district: [38109.64532768] hryvnas


In [245]:
# Prediction: 40 sq. m, 1 room, distant from metro, condition code 1 (Потрібен капітальний ремонт), non-prestigious district
price_pred = regressor_forest.predict(mmscaler.transform([[40, 1, 3, 1, 1]]))
print("Price for 40 sq. m, 1 room, distant from metro, condition code 1 (Потрібен капітальний ремонт), non-prestigious district:", price_pred, "hryvnas")

Price for 40 sq. m, 1 room, distant from metro, condition code 1 (Потрібен капітальний ремонт), non-prestigious district: [5254.12902305] hryvnas




In [246]:
from sklearn.svm import SVR

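# A separate name keeps the fitted decision tree in `regressor` from being overwritten
svr_regressor = SVR()

svr_regressor.fit(X_train_mmscaled, y_train)

y_pred = svr_regressor.predict(X_test_mmscaled)

score = svr_regressor.score(X_test_mmscaled, y_test)
print("R2 Score:", score)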

R2 Score: -0.08827913191998493
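A likely contributor to the negative R2 is that `SVR` with its default `C=1` and `epsilon=0.1` can only move its predictions a limited distance from the intercept, while prices here span tens of thousands of hryvnas. The sketch below, on synthetic data rather than the flat dataset, compares a default `SVR` with one wrapped in `TransformedTargetRegressor` so that the target is standardized before fitting. Whether this helps as much on the real data is not verified here.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(3)

# Synthetic features in [0, 1] and a target on a price-like scale.
X = rng.uniform(0, 1, size=(400, 2))
y = 50000 * X[:, 0] + 20000 * X[:, 1] + rng.normal(0, 1000, 400)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

# Default SVR fitted directly on the large-scale target.
raw_svr = SVR().fit(X_tr, y_tr)

# Same SVR, but the target is standardized for fitting and
# predictions are transformed back to the original scale.
scaled_svr = TransformedTargetRegressor(
    regressor=SVR(), transformer=StandardScaler()
).fit(X_tr, y_tr)

r2_raw = raw_svr.score(X_te, y_te)
r2_scaled = scaled_svr.score(X_te, y_te)
print("R2 with raw target:   ", r2_raw)
print("R2 with scaled target:", r2_scaled)
```

Tuning `C` and `epsilon` to the scale of the target is an alternative to rescaling the target itself.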


In [247]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error, r2_score

X = df[['flat_area', 'rooms', 'distance_category', 'code type', 'prestigious']]
y = df['price']

mmscaler = MinMaxScaler()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

X_train_mmscaled = mmscaler.fit_transform(X_train)

model = LinearRegression()
model.fit(X_train_mmscaled, y_train)

X_test_mmscaled = mmscaler.transform(X_test)

# y_pred = model.predict(X_test_mmscaled)

y_pred_train=model.predict(X_train_mmscaled)

param_grid = {'fit_intercept': [True, False]}

grid_search = GridSearchCV(model, param_grid, cv=5)
grid_search.fit(X_train_mmscaled, y_train)  


best_params = grid_search.best_params_
# With the default refit=True, GridSearchCV has already refit the best configuration on the full training split
best_model = grid_search.best_estimator_

# Print the evaluation metrics
print('Best Hyperparameters:', best_params)

Best Hyperparameters: {'fit_intercept': True}
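Searching over `fit_intercept` alone leaves little to tune. As a hedged sketch of a more informative search, the example below puts the scaler and a `Ridge` model into one `Pipeline` and searches over the regularization strength, so the scaler is refit inside each cross-validation fold. The data is synthetic, and the step names and alpha grid are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(4)

# Synthetic stand-ins for area and rooms; not the real data.
X = np.column_stack([rng.uniform(20, 200, 400), rng.integers(1, 6, 400)])
y = 300 * X[:, 0] - 1000 * X[:, 1] + rng.normal(0, 2000, 400)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

# Scaling lives inside the pipeline, so each CV fold fits its own scaler.
pipe = Pipeline([("scale", MinMaxScaler()), ("ridge", Ridge())])
param_grid = {"ridge__alpha": [0.01, 0.1, 1, 10, 100]}

grid = GridSearchCV(pipe, param_grid, cv=5, scoring="r2")
grid.fit(X_tr, y_tr)

# best_estimator_ is already refit on the whole training split.
best = grid.best_estimator_
print("Best parameters:", grid.best_params_)
print("Test R2:", best.score(X_te, y_te))
```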


In [248]:
best_model_y_pred_train = best_model.predict(X_train_mmscaled)
y_pred_train = best_model_y_pred_train
best_model_mse_y_train_y_pred = mean_squared_error(y_train, y_pred_train)
best_model_rmse_y_train_y_pred = np.sqrt(mean_squared_error(y_train, y_pred_train))

best_model_y_pred_test = best_model.predict(X_test_mmscaled)
y_pred_test = best_model_y_pred_test
best_model_mse_y_test_y_pred = mean_squared_error(y_test, y_pred_test)
best_model_rmse_y_test_y_pred = np.sqrt(mean_squared_error(y_test, y_pred_test))

gs_train_score = best_model.score(X_train_mmscaled, y_train)
print("Training  Score for X_train, y_train:", gs_train_score)

gs_test_score = best_model.score(X_test_mmscaled, y_test)
print("Test Score for X_test, y_test:", gs_test_score)

print('Mean Squared Error (MSE) for best_model, train:', best_model_mse_y_train_y_pred)
print('Root Mean Squared Error (RMSE) for best_model, train:', best_model_rmse_y_train_y_pred)
print('Mean Squared Error (MSE) for best_model, test:', best_model_mse_y_test_y_pred)
print('Root Mean Squared Error (RMSE) for best_model, test:', best_model_rmse_y_test_y_pred)

print('Model best_model GridSearchCV coefficients:', best_model.coef_)


Training  Score for X_train, y_train: 0.5790962964878815
Test Score for X_test, y_test: 0.5435417979998883
Mean Squared Error (MSE) for best_model, train: 420578325.6502237
Root Mean Squared Error (RMSE) for best_model, train: 20508.006379222326
Model best_model GridSearchCV coefficients: [330240.26894313 -34763.14779159  -2228.01181112  12540.533571
  11972.13713674]


In [249]:
# Coefficient estimates
df_coef = pd.DataFrame(best_model.coef_, X.columns, columns=['Model coefficients'])
df_coef

                   Model coefficients
flat_area               330240.268943
rooms                   -34763.147792
distance_category        -2228.011811
code type                12540.533571
prestigious              11972.137137


In [250]:
# Prediction: 40 sq. m, 1 room, close to metro, condition code 4 (Євроремонт), prestigious district
price_pred = best_model.predict(mmscaler.transform([[40, 1, 1, 4, 3]]))
print("Price for 40 sq. m, 1 room, close to metro, condition code 4 (Євроремонт), prestigious district:", price_pred, "hryvnas")

Price for 40 sq. m, 1 room, close to metro, condition code 4 (Євроремонт), prestigious district: [11111.82443188] hryvnas




In [251]:
# Prediction: 60 sq. m, 2 rooms, close to metro, condition code 5 (Дизайнерський ремонт), prestigious district
price_pred = best_model.predict(mmscaler.transform([[60, 2, 1, 5, 3]]))
print("Price for 60 sq. m, 2 rooms, close to metro, condition code 5 (Дизайнерський ремонт), prestigious district:", price_pred, "hryvnas")

Price for 60 sq. m, 2 rooms, close to metro, condition code 5 (Дизайнерський ремонт), prestigious district: [13719.2362224] hryvnas




In [252]:
# Prediction: 40 sq. m, 1 room, close to metro, condition code 4 (Євроремонт), non-prestigious district
price_pred = best_model.predict(mmscaler.transform([[40, 1, 1, 4, 1]]))
print("Price for 40 sq. m, 1 room, close to metro, condition code 4 (Євроремонт), non-prestigious district:", price_pred, "hryvnas")

Price for 40 sq. m, 1 room, close to metro, condition code 4 (Євроремонт), non-prestigious district: [-860.31270486] hryvnas




In [253]:
# Prediction: 160 sq. m, 4 rooms, distant from metro, condition code 5 (Дизайнерський ремонт), prestigious district
price_pred = best_model.predict(mmscaler.transform([[160, 4, 3, 5, 3]]))
print("Price for 160 sq. m, 4 rooms, distant from metro, condition code 5 (Дизайнерський ремонт), prestigious district:", price_pred, "hryvnas")

Price for 160 sq. m, 4 rooms, distant from metro, condition code 5 (Дизайнерський ремонт), prestigious district: [29710.5050751] hryvnas




In [254]:
# Prediction: 40 sq. m, 1 room, distant from metro, condition code 1 (Потрібен капітальний ремонт), non-prestigious district
price_pred = best_model.predict(mmscaler.transform([[40, 1, 3, 1, 1]]))
print("Price for 40 sq. m, 1 room, distant from metro, condition code 1 (Потрібен капітальний ремонт), non-prestigious district:", price_pred, "hryvnas")

Price for 40 sq. m, 1 room, distant from metro, condition code 1 (Потрібен капітальний ремонт), non-prestigious district: [-12493.72469423] hryvnas
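Several of the linear predictions above are negative, which cannot be a valid price. One common remedy, sketched here on synthetic data, is to fit the model to the logarithm of the price and exponentiate the predictions, which keeps them positive. The data-generating process below is invented for illustration, and whether a log target improves the fit on the real data is not tested here.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)

# Synthetic positive prices that grow with area, with multiplicative noise.
area = rng.uniform(20, 200, 300)
price = 150 * area * rng.lognormal(0, 0.3, 300)
X = area.reshape(-1, 1)

# The regressor is trained on log(price); predict() returns exp(prediction).
log_model = TransformedTargetRegressor(
    regressor=LinearRegression(), func=np.log, inverse_func=np.exp
)
log_model.fit(X, price)

# Predictions stay positive even for a flat smaller than any in the sample.
preds = log_model.predict(np.array([[10.0], [40.0], [160.0]]))
print(preds)
```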


