# Exploratory Data Analysis of House Rental Prices
<br>
**Bucharest** is the capital of **Romania**, a large city whose population is constantly on the move. It is one of the most sought-after places in the country: students come to complete their higher education, and people of many kinds arrive to make a living from a new job. The city offers a wide range of properties to rent or buy, varying by *location*, *surface*, *comfort*, and other variables.

In this notebook, we analyze the **rental** offers available in Bucharest during **September 2020**. Some of the questions we aim to answer are the following:
- [Which areas have the most expensive rentals for a given number of rooms? Which have the cheapest?](#question-1) 
  - This could involve extracting the distribution of prices across the city's areas and the number of rooms.
- Which areas have the oldest buildings? Which have the newest?
- How does the partitioning of an apartment's rooms affect its comfort?
- Is there a correlation between the price and the floor (building level) on which the apartment is located?
- Are the prices set by property management agencies higher than those set by private owners, relative to the apartment's surface?
- What is the distribution of useful surface depending on the number of rooms?

In [None]:
# Load useful libraries
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import pandas.core.nanops as nanops # work with NaN in dataframes
import matplotlib.pyplot as plt # plotting
import seaborn as sns # sophisticated plotting

In [None]:
# Take a peek at the available data
data = pd.read_csv('../input/bucharest-house-prices-september-2020/renting_houses_clean.csv')
data.head(5)

<div id="question-1"></div>

# Most expensive / cheapest areas

## Which are the areas with the most expensive / cheapest rental average?

To answer this question, we are going to create a table whose rows represent location areas around Bucharest and whose columns represent the number of rooms in an apartment. This way, we can look at all the apartments in a specific area for a given category (having X rooms) and take their median price.

I chose the median rather than the mean because any location may contain outliers whose rental prices fall far outside the usual range. The mean is pulled towards such values, while the median is not.
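
To illustrate the difference, here is a minimal sketch on made-up numbers (the rents below are hypothetical, not taken from the dataset). A single very expensive listing more than doubles the mean, while the median stays at the typical price level.

```python
import pandas as pd

# Hypothetical monthly rents (EUR) for one area; the last listing is an outlier
rents = pd.Series([300, 320, 350, 380, 2500])

mean_rent = rents.mean()      # pulled upwards by the single expensive listing
median_rent = rents.median()  # middle value, unaffected by the outlier

print(f"mean: {mean_rent:.0f} EUR, median: {median_rent:.0f} EUR")
```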

Another way to determine which areas have the most expensive / cheapest rental prices would be to use the useful/built surface as the unit of comparison instead of the number of rooms. However, to make the visualisation a little more interesting, I chose the first alternative and used a heatmap in which each cell contains the median price.
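
For reference, the surface-based alternative could look roughly like the sketch below. It runs on a toy table, and the column name `useful_surface` (in square metres) is an assumption made for illustration; the real dataset may name or scale this column differently.

```python
import pandas as pd

# Toy listings; 'useful_surface' is a hypothetical column name (square metres)
listings = pd.DataFrame({
    'location_area': ['A', 'A', 'B', 'B'],
    'price': [400, 600, 300, 450],
    'useful_surface': [40, 50, 50, 90],
})

# Normalise each price by the apartment's surface
listings['price_per_sqm'] = listings['price'] / listings['useful_surface']

# Median price per square metre in each area, most expensive first
median_per_sqm = (
    listings.groupby('location_area')['price_per_sqm']
    .median()
    .sort_values(ascending=False)
)
print(median_per_sqm)
```

This ranks areas on a single number per area, which is easier to sort but loses the per-room breakdown that the heatmap shows.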

In [None]:
# Pivot the prices by location and number of rooms, taking the median price of the apartments in each cell
area_prices = data.pivot_table(index='location_area', columns='rooms_count', values='price', aggfunc='median')

# Create new column '4+' holding the mean of the median prices for apartments with 4 or more rooms
area_prices['4+'] = area_prices.loc[:, 4:].mean(axis=1)

# Drop all columns labelled with 4-22 rooms (those represent sparse data)
area_prices = area_prices.drop(columns=list(range(4, 23)), errors='ignore')

# Create new column with the mean of the per-category median prices for each location, regardless of the number of rooms
area_prices['any'] = area_prices.loc[:, [1, 2, 3, '4+']].mean(axis=1)

# Sort by mean price and round prices
area_prices = area_prices.sort_values(by='any', ascending=False).round(2)

# Get relevant information for the most expensive and cheapest areas
expensive_areas = area_prices.head(30)[['any', 1, 2, 3, '4+']]
cheap_areas = area_prices.tail(30).sort_values(by='any')[['any', 1, 2, 3, '4+']]

In [None]:
# Display the final form of the table, computed in the previous block of code
pd.set_option('display.max_rows', None)
area_prices

### The most expensive areas

As we can see below, the three most expensive areas in Bucharest (by our calculation) are **Nordului**, **Kiseleff**, and **Primaverii**, where the median rental price for an apartment with only one room can be 500€/month. It is well known that the northern areas are the most expensive places to live in Bucharest, and this is confirmed here too: most of the areas in the top 30 are in the north.

As a side note, we can observe that these areas might lack apartments with 1 or 2 rooms and instead offer bigger homes to customers.
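
One way to check how sparse a category is would be to count listings per area and room count rather than only looking at their median price. The sketch below does this on a made-up table (the areas and prices are hypothetical); a count of 0 corresponds to an empty cell in the heatmap.

```python
import pandas as pd

# Toy listings with hypothetical areas and prices
listings = pd.DataFrame({
    'location_area': ['North', 'North', 'North', 'South', 'South'],
    'rooms_count': [3, 4, 4, 1, 2],
    'price': [1500, 2200, 2600, 250, 320],
})

# Number of listings for every (area, number of rooms) combination
listing_counts = pd.crosstab(listings['location_area'], listings['rooms_count'])
print(listing_counts)
```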

In [None]:
# Show the most expensive areas in a heatmap
fig, ax = plt.subplots(figsize=(15,15))
sns.heatmap(expensive_areas, annot=True, linewidths=.5, fmt="g", cmap='Reds', ax=ax)

In [None]:
# Show all apartments with 1 room in the Nordului area (apparently only one)
data[(data['location_area'] == 'Nordului') & (data['rooms_count'] == 1)].sort_values(by='price', ascending=False)

### The cheapest areas

At the opposite end, the cheapest areas in Bucharest are found on the capital's outskirts, either to the far East/West or to the South, as expected. The three cheapest locations identified in the dataset are **Lucretiu Patrascanu**, **Ferentari**, and **Giurgiului**.

Here again, looking at the missing data for each location, we can conclude that offers in the cheapest areas are usually for smaller homes with 1 or 2 rooms, sometimes 3, and rarely 4.

As a fun fact, there might be cases where apartments with different numbers of rooms are priced at the same amount, even within the same area, as observed in **Uverturii**'s case. However, the prices in this case might depend on other variables as well.
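
Such cases could be searched for systematically instead of spotted by eye. The following is a rough sketch on toy data (the areas `X` and `Y` and their prices are made up): it groups listings by area and price and keeps the pairs that cover more than one room count.

```python
import pandas as pd

# Toy listings with hypothetical areas and prices
listings = pd.DataFrame({
    'location_area': ['X', 'X', 'Y', 'Y'],
    'rooms_count': [1, 2, 1, 2],
    'price': [250, 250, 200, 300],
})

# Number of distinct room counts offered at each (area, price) pair
room_variety = listings.groupby(['location_area', 'price'])['rooms_count'].nunique()

# Keep only the pairs where more than one room count shares the same price
same_price = room_variety[room_variety > 1]
print(same_price)
```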

In [None]:
# Show the cheapest areas in a heatmap
fig, ax = plt.subplots(figsize=(15,15))
sns.heatmap(cheap_areas, annot=True, linewidths=.5, fmt="g", cmap='Blues', ax=ax)

In [None]:
# Uverturii has 2 apartments with the same price but different numbers of rooms
data[data['location_area'] == 'Uverturii']

### The cheapest areas with single room apartments

As a student myself, I have been looking for rentals for a while, and this is what motivated me to work on this dataset and extract some insights. To make an informed choice about where I will spend the next 1 or 2 years, I decided to make another graph, this time considering only the single-room apartments (which are supposed to be the cheapest of all categories).

In [None]:
# Sort areas by median price of single room apartments
single_room_prices = area_prices.sort_values(by=1, ascending=True)

# Drop unnecessary columns, areas with no single room apartments, and keep the first 30
single_room_prices = single_room_prices.drop(columns=[2, 3, '4+', 'any']).dropna()
single_room_prices = single_room_prices.head(30)

# Rename column for convenience
single_room_prices = single_room_prices.rename(columns={1: 'rental_price'})

# Plot the data using a horizontal bar plot
fig, ax = plt.subplots(figsize=(12,12))
ax = sns.barplot(x="rental_price", y=single_room_prices.index, data=single_room_prices, palette="summer")

# Display price annotations for each bar
for p in ax.patches:
    ax.text(
        x=p.get_x() + p.get_width() + 2,
        y=p.get_y() + p.get_height() * 0.7,
        s=int(p.get_width()),
        ha='left'
    )