In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Predicting Sofia real estate prices

This project focuses on predicting real estate prices in Sofia, Bulgaria. The local market is constantly changing and property valuation is becoming more complex, so understanding what drives prices matters to everyone involved in buying, selling, or investing in real estate, as well as to policymakers. My goal is to use historical data to forecast future prices and help people make better decisions in the market.

By analyzing past transactions and key factors that affect property values (such as location, size, and property features), I will develop a model to predict future prices. My goal is to provide clear, accurate predictions that can guide buyers, sellers, and investors in making informed choices.

This project is a significant step towards improving my understanding of Sofia’s real estate market. With the help of data-driven insights, I hope to make it easier for people to navigate the complexities of property pricing and find valuable opportunities in the market.

### The goals of the project:
- To understand the key drivers of real estate prices in Sofia.
- To build and evaluate a machine learning model that can predict property prices.

### Why is this problem important?

__1. Importance for Investors:__ <br/>
- Maximizing Returns: Real estate is one of the most significant and stable forms of investment. Investors rely on accurate price predictions to identify undervalued properties, time their purchases, and decide on the most profitable resale opportunities. By predicting future prices, investors can optimize their portfolios, maximize returns, and mitigate risks.
- Market Timing: For investors, knowing when to buy or sell is crucial. Accurate price predictions can help investors time the market effectively, allowing them to buy properties before prices increase and sell before a downturn. This timing can significantly affect the profitability of real estate investments.
- Diversification and Risk Management: Understanding price trends in different neighborhoods or property types allows investors to diversify their portfolios. They can spread their investments across various segments of the market, reducing the risk of exposure to price volatility in any single area.

__2. Importance for Homebuyers:__ <br/>
- Affordability and Budgeting: For most people, buying a home is the largest financial commitment they will make in their lifetime. Predicting future prices helps homebuyers understand market trends, ensuring they purchase at the right time to get the best value for their money. It also helps them budget accurately for future financial planning, such as mortgage payments and maintenance costs.
- Avoiding Overpayment: Accurate predictions can prevent homebuyers from overpaying in a competitive market. By understanding the fair value of properties, buyers can make informed decisions, avoid bidding wars, and negotiate better deals.
- Long-term Financial Planning: Homebuyers need to consider the long-term value of their investment. Predictions about property prices can guide them in choosing neighborhoods with strong potential for appreciation, ensuring that their home not only meets their immediate needs but also grows in value over time.

__3. Importance for Urban Planners and Policy Makers:__ <br/>
- Sustainable Urban Development: Urban planners rely on real estate price predictions to make informed decisions about zoning, infrastructure development, and public services. By understanding where prices are likely to rise, planners can allocate resources efficiently, ensuring sustainable growth and avoiding overdevelopment or underdevelopment in specific areas.
- Economic Policy and Housing Affordability: Accurate price predictions are crucial for developing policies that address housing affordability and social equity. Policymakers can use these predictions to implement measures such as affordable housing initiatives, rent controls, and tax incentives to stabilize markets and prevent housing bubbles.
- Infrastructure Planning: Understanding future real estate trends helps in planning for infrastructure needs, such as transportation, schools, and healthcare facilities. By anticipating areas of growth, urban planners can ensure that necessary services are in place to support population increases, thereby enhancing the quality of life for residents.

__4. Importance for the Broader Economy:__ <br/>
- Economic Stability: The real estate market is a significant component of the economy. Price fluctuations can impact consumer spending, banking stability, and overall economic health. Accurate predictions can help detect housing market bubbles early; left unchecked, such bubbles can contribute to economic recessions, as seen in the 2008 Global Financial Crisis (GFC).
- Employment and Industry Growth: The construction, finance, and retail sectors are closely tied to the real estate market. Predicting price trends allows businesses in these industries to plan for demand, hire appropriately, and make investments in their operations, contributing to overall economic growth.

In summary, real estate price prediction is a critical tool for a wide range of stakeholders. It empowers investors to maximize returns, enables homebuyers to make informed decisions, assists urban planners in managing growth sustainably, and supports policymakers in maintaining economic stability. Accurate predictions contribute to the health and efficiency of the real estate market and, by extension, the broader economy.

## Description of Datasets:
The data comes from two websites: __https://www.imoti.net/bg__ and __https://www.imot.bg/__.


### Dataset 1: Historical Real Estate Prices from imoti.net
Description: This dataset provides historical real estate price data from imoti.net, including the following details:

- Location: The area or district where the property is located.
- Total Price (EUR): The overall cost of the property.
- Price per Square Meter (EUR/m²): The cost per square meter of the property.
- Property Type: The type of property (e.g., apartment, house, commercial).
- Date: The date when the price data was recorded.

### Dataset 2: Historical Real Estate Prices from imot.bg
Description: This dataset contains historical real estate price data from imot.bg, including the following details:

- Location: The specific area or district in Sofia, Bulgaria.
- One-Room Apartments (Едностайни):
    - Total Price (EUR)
    - Price per Square Meter (EUR/m²)
- Two-Room Apartments (Двустайни):
    - Total Price (EUR)
    - Price per Square Meter (EUR/m²)
- Three-Room Apartments (Тристайни):
    - Total Price (EUR)
    - Price per Square Meter (EUR/m²)
- Overall Price per Square Meter (EUR/m²): The combined figure across apartment types (Общо).
- Date: The date when the data was recorded, indicating the time frame for each entry.


## Exploratory Data Analysis (EDA):
Let's perform an EDA to understand the distribution, relationships, and outliers in the data.

#### Now we will read the first dataset.
It consists of three files, one per year. We will read each file and store it in its own variable:

In [19]:
df_2022 = pd.read_csv("data/property_prices_july_2022.csv")
df_2023 = pd.read_csv("data/property_prices_july_2023.csv")
df_2024 = pd.read_csv("data/property_prices_july_2024.csv")

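Let's take a look at each of them. The column names are in Bulgarian: Район = district, Цена = price, Цена / кв.м. = price per square meter, Валута = currency, Тип Апартамент = apartment type, Дата = date.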

In [20]:
df_2022

Unnamed: 0,Район,Цена,Цена / кв.м.,Валута,Тип Апартамент,Дата
0,Банишора,110201.52,1710.14,EUR,Двустаен апартамент,2022-07-28
1,Белите Брези,148125.00,1949.01,EUR,Двустаен апартамент,2022-07-28
2,Борово,141912.50,2120.07,EUR,Двустаен апартамент,2022-07-28
3,Бояна,167608.23,1888.09,EUR,Двустаен апартамент,2022-07-28
4,Бъкстон,109764.43,1869.47,EUR,Двустаен апартамент,2022-07-28
...,...,...,...,...,...,...
284,Хаджи Димитър,205000.00,1102.15,EUR,Многостаен апартамемент,2022-07-28
285,Хиподрума,377277.50,1326.89,EUR,Многостаен апартамемент,2022-07-28
286,Хладилника,460016.66,2115.02,EUR,Многостаен апартамемент,2022-07-28
287,Център,529647.62,2792.63,EUR,Многостаен апартамемент,2022-07-28


In [21]:
df_2023

Unnamed: 0,Район,Цена,Цена / кв.м.,Валута,Тип Апартамент,Дата
0,7-ми 11-ти километър,112150.66,1654.68,EUR,Двустаен апартамент,2023-07-28
1,Банишора,118202.97,1754.77,EUR,Двустаен апартамент,2023-07-28
2,Банкя (гр.),108600.00,1428.95,EUR,Двустаен апартамент,2023-07-28
3,Белите Брези,162233.00,2015.32,EUR,Двустаен апартамент,2023-07-28
4,Бенковски,94900.00,1395.59,EUR,Двустаен апартамент,2023-07-28
...,...,...,...,...,...,...
326,Хаджи Димитър,239000.00,2489.58,EUR,Многостаен апартамемент,2023-07-28
327,Хиподрума,445521.66,1265.69,EUR,Многостаен апартамемент,2023-07-28
328,Хладилника,490580.00,2063.43,EUR,Многостаен апартамемент,2023-07-28
329,Център,528999.94,2577.88,EUR,Многостаен апартамемент,2023-07-28


In [22]:
df_2024

Unnamed: 0,Район,Цена,Цена / кв.м.,Валута,Тип Апартамент,Дата
0,Банишора,134653.98,1987.51,EUR,Двустаен апартамент,2024-07-28
1,Банкя (гр.),131843.00,1890.84,EUR,Двустаен апартамент,2024-07-28
2,Белите Брези,249900.00,2108.86,EUR,Двустаен апартамент,2024-07-28
3,Бенковски,115863.50,1782.52,EUR,Двустаен апартамент,2024-07-28
4,Борово,135594.22,2148.50,EUR,Двустаен апартамент,2024-07-28
...,...,...,...,...,...,...
329,Суха Река,303037.00,1864.02,EUR,Многостаен апартамемент,2024-07-28
330,Хаджи Димитър,331999.50,2049.38,EUR,Многостаен апартамемент,2024-07-28
331,Хиподрума,446333.00,1410.21,EUR,Многостаен апартамемент,2024-07-28
332,Хладилника,817960.00,2493.78,EUR,Многостаен апартамемент,2024-07-28


Now we will combine them into one dataset named __combined_df__:

In [12]:
combined_df = pd.concat([df_2022, df_2023, df_2024], ignore_index=True)

And check it:

In [14]:
combined_df

Unnamed: 0,Район,Цена,Цена / кв.м.,Валута,Тип Апартамент,Дата
0,Банишора,110201.52,1710.14,EUR,Двустаен апартамент,2022-07-28
1,Белите Брези,148125.00,1949.01,EUR,Двустаен апартамент,2022-07-28
2,Борово,141912.50,2120.07,EUR,Двустаен апартамент,2022-07-28
3,Бояна,167608.23,1888.09,EUR,Двустаен апартамент,2022-07-28
4,Бъкстон,109764.43,1869.47,EUR,Двустаен апартамент,2022-07-28
...,...,...,...,...,...,...
949,Суха Река,303037.00,1864.02,EUR,Многостаен апартамемент,2024-07-28
950,Хаджи Димитър,331999.50,2049.38,EUR,Многостаен апартамемент,2024-07-28
951,Хиподрума,446333.00,1410.21,EUR,Многостаен апартамемент,2024-07-28
952,Хладилника,817960.00,2493.78,EUR,Многостаен апартамемент,2024-07-28
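
Since the three yearly files share the same columns, a natural next step is to parse the `Дата` column and compare prices per year. The sketch below is only an illustration: it uses a small hand-built sample (rows copied from the outputs above) rather than the real CSV files, and `sample_df`, `year`, and `median_by_year_and_type` are names introduced here. It also normalizes the value "Многостаен апартамемент", which appears to be a misspelling of "апартамент" in the scraped data.

```python
import pandas as pd

# A few rows copied from the outputs above; the full data lives in combined_df.
sample_df = pd.DataFrame({
    "Район": ["Банишора"] * 3 + ["Хиподрума"] * 3,
    "Цена / кв.м.": [1710.14, 1754.77, 1987.51, 1326.89, 1265.69, 1410.21],
    "Тип Апартамент": ["Двустаен апартамент"] * 3 + ["Многостаен апартамемент"] * 3,
    "Дата": ["2022-07-28", "2023-07-28", "2024-07-28"] * 2,
})

# Parse the ISO-formatted date strings and derive a year column (hypothetical name).
sample_df["Дата"] = pd.to_datetime(sample_df["Дата"], format="%Y-%m-%d")
sample_df["year"] = sample_df["Дата"].dt.year

# Normalize what looks like a typo in the apartment type labels.
sample_df["Тип Апартамент"] = sample_df["Тип Апартамент"].str.replace(
    "апартамемент", "апартамент", regex=False
)

# Median price per square meter for each year and apartment type.
median_by_year_and_type = (
    sample_df.groupby(["year", "Тип Апартамент"])["Цена / кв.м."].median()
)
print(median_by_year_and_type)
```

The same three steps should carry over to `combined_df` unchanged, assuming its columns match the ones shown in the output above.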


#### Let's read the second dataset

In [16]:
second_df = pd.read_csv("data2/property_prices.csv")

In [17]:
second_df

Unnamed: 0,Район,Едностайни - цена,Едностайни - €/кв.м,Двустайни - цена,Двустайни - €/кв.м,Тристайни - цена,Тристайни - €/кв.м,Общо - €/кв.м,Дата
0,7-ми 11-ти километър,Null,Null,145 502,1 675,189 050,1 756,1 733,30.7.2024
1,Банишора,48 978,1 125,104 500,1 680,138 700,1 394,1 597,30.7.2024
2,Белите брези,38 000,826,81 700,1 137,122 550,1 151,1 162,30.7.2024
3,Бенковски,Null,Null,70 775,1 488,112 337,1 261,1 351,30.7.2024
4,Борово,50 825,1 057,90 249,1 236,141 550,1 619,1 272,30.7.2024
...,...,...,...,...,...,...,...,...,...
434,с. Мировяне,Null,Null,50 350,915,Null,Null,851,26.7.2022
435,с. Мрамор,Null,Null,68 400,1 140,103 740,1 140,1 140,26.7.2022
436,с. Мърчаево,Null,Null,26 600,266,35 862,297,297,26.7.2022
437,с. Панчарево,Null,Null,108 878,1 389,159 120,1 134,1 269,26.7.2022
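
The output suggests that this dataset needs cleaning before it can be analyzed: prices use a space as a thousands separator ("145 502"), missing values are written as the text "Null", and dates use a day.month.year format ("30.7.2024"). Below is a minimal sketch of one way to handle this, assuming the columns are read as strings. It works on a small sample copied from the output above, and `raw`, `to_number`, and `cleaned_df` are hypothetical names, not part of the original notebook.

```python
import pandas as pd

# Rows copied from the second_df output above, kept as raw strings.
raw = pd.DataFrame({
    "Район": ["7-ми 11-ти километър", "Банишора", "с. Мировяне"],
    "Едностайни - цена": ["Null", "48 978", "Null"],
    "Двустайни - цена": ["145 502", "104 500", "50 350"],
    "Общо - €/кв.м": ["1 733", "1 597", "851"],
    "Дата": ["30.7.2024", "30.7.2024", "26.7.2022"],
})


def to_number(column: pd.Series) -> pd.Series:
    """Turn strings such as '145 502' into numbers and 'Null' into NaN."""
    # Remove every whitespace character used as a thousands separator.
    without_spaces = column.astype(str).str.replace(r"\s+", "", regex=True)
    # Anything that is still not numeric (for example 'Null') becomes NaN.
    return pd.to_numeric(without_spaces, errors="coerce")


cleaned_df = raw.copy()
numeric_columns = [name for name in raw.columns if name not in ("Район", "Дата")]
for name in numeric_columns:
    cleaned_df[name] = to_number(raw[name])

# Dates are written as day.month.year without zero padding.
cleaned_df["Дата"] = pd.to_datetime(raw["Дата"], format="%d.%m.%Y")

print(cleaned_df.dtypes)
print(cleaned_df)
```

Using `errors="coerce"` silently turns every unparseable value into NaN, so it is worth comparing the NaN counts with the number of "Null" entries in the raw data to make sure nothing else was lost.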


#### Let's look at the datasets in more detail.

In [15]:
# Check for missing values
print(combined_df.isnull().sum())

# Summary statistics
print(combined_df.describe())

# Check the data types
print(combined_df.dtypes)


Район             0
Цена              0
Цена / кв.м.      0
Валута            0
Тип Апартамент    0
Дата              0
dtype: int64
               Цена  Цена / кв.м.
count  9.540000e+02    954.000000
mean   2.495837e+05   1843.112956
std    2.700700e+05    626.005076
min    6.500000e+03     17.400000
25%    1.142454e+05   1479.210000
50%    1.701621e+05   1776.785000
75%    2.810798e+05   2140.805000
max    3.000000e+06   7741.380000
Район              object
Цена              float64
Цена / кв.м.      float64
Валута             object
Тип Апартамент     object
Дата               object
dtype: object
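
The summary statistics hint at outliers in `Цена / кв.м.`: the minimum is 17.40 and the maximum is 7741.38, while the middle half of the values lies between about 1479 and 2141. With the common 1.5 × IQR rule, the quartiles printed above give fences of roughly 487 and 3133 EUR/m², so both extremes fall outside. The sketch below shows that rule on a small sample; the rule itself is a choice made here for illustration, and `iqr_bounds` and `sample_prices` are hypothetical names.

```python
import pandas as pd


def iqr_bounds(values: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return the lower and upper fences of the k * IQR outlier rule."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Sample values taken from the outputs above, plus the reported min and max.
sample_prices = pd.Series(
    [17.40, 1710.14, 1949.01, 2120.07, 1888.09, 1869.47, 2115.02, 7741.38],
    name="Цена / кв.м.",
)

lower, upper = iqr_bounds(sample_prices)

# Keep only the values that fall outside the fences.
outliers = sample_prices[(sample_prices < lower) | (sample_prices > upper)]
print(f"fences: {lower:.2f} to {upper:.2f}")
print(outliers)
```

Flagged rows are candidates for manual inspection rather than automatic removal: a very low price per square meter may be a data entry error, while a very high one may be a genuine luxury listing.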
