### <center>📜 **<font color="green">Question 5:</font> Analyze price against area to infer the listing type (room, apartment, rented room, or industrial-park housing) over the last 5 years** </center>

#### 📙**Import the necessary libraries**

In [13]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

---

### ✨ **1. Load the preprocessed data**

In [14]:
# Read the preprocessed data
df = pd.read_csv('../data/HCMHouseRentPreprocessing.csv')
# Display the listings indexed by id (the result is not assigned, so df keeps its default index)
df.set_index('id')

id,title,price,published,acreage,street,ward,district
0,"Cho thuê nhà trọ mới sạch đẹp tại Lê Đình Cẩn,...",2200000,2022-05-16,20.0,Lê Đình Cẩn,Phường Tân Tạo,Quận Bình Tân
1,Cho thuê phòng trọ giá rẻ ở mặt tiền hẻm lớn Đ...,2500000,2022-04-20,20.0,487/35/25 Đường Huỳnh Tấn Phát,Phường Tân Thuận Đông,Quận 7
2,Cho thuê phòng trọ kdc Nam Long-Trần Trọng Cun...,3500000,2022-05-10,30.0,Đường 10,Phường Tân Thuận Đông,Quận 7
3,Phòng trọ giá rẻ ngay cổng khu chế xuất Tân Th...,1500000,2022-05-05,30.0,283/15 Huỳnh Tấn Phát,Phường Tân Thuận Đông,Quận 7
4,"Cho thuê phòng có gác, không gác, tolet riêng ...",3500000,2022-01-05,18.0,Lê Văn Sỹ,Phường 14,Quận Phú Nhuận
...,...,...,...,...,...,...,...
8872,Cho thuê phòng trọ gần trung tâm quận 11,2200000,2020-10-30,14.0,102/9/11a Đường Bình Thới,Phường 14,Quận 11
8873,Cho thuê phòng hoặc tầng 1 và 2 nhà mặt tiền 1...,2500000,2020-11-23,12.0,177 Đường Tôn Thất Hiệp,Phường 12,Quận 11
8874,Phòng FULL NOI THAT THOÁNG ĐẸP NHƯ HÌNH GẦN LÊ...,3500000,2022-07-28,20.0,212 Đường Lò Siêu,Phường 12,Quận 11
8875,CHÍNH CHỦ CHO THUÊ CĂN HỘ MINI TẠI TRUNG TÂM Q11,5000000,2020-11-25,30.0,127/17 Đường Âu Cơ,Phường 14,Quận 11


### ✨ **2. Keep the last 5 years, group by acreage, and get the lowest price for each year**

In [15]:
# Extract the publication year; rows published before 2018 get NaN in 'year'
df['year'] = pd.DatetimeIndex(df['published']).year
df['year'] = df[df['year'] > 2017]['year'].astype(int)

# Group by acreage and year (groupby drops the NaN years) and take the lowest price
grouped = df.groupby(['acreage', 'year'])
df2 = grouped['price'].min().reset_index()
df2

,acreage,year,price
0,5.0,2018.0,1400000
1,5.0,2020.0,1200000
2,5.0,2021.0,1300000
3,5.0,2022.0,1000000
4,5.2,2022.0,900000
...,...,...,...
320,600.0,2020.0,4200000
321,600.0,2021.0,1499000
322,900.0,2020.0,95000000
323,1000.0,2020.0,1600000
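
The filter-then-group step can be sketched on a small toy table. This is only an illustration: the rows below are made up, and the names `toy`, `recent`, and `lowest` are hypothetical. Unlike the cell above, it drops the older rows outright, so `year` stays an integer column instead of turning into floats with NaN.

```python
import pandas as pd

# Toy listings (made-up values, not taken from the dataset)
toy = pd.DataFrame({
    "price": [2200000, 1800000, 3500000, 900000],
    "published": ["2022-05-16", "2022-01-05", "2020-10-30", "2016-03-01"],
    "acreage": [20.0, 20.0, 30.0, 20.0],
})

# Extract the year and keep only listings published from 2018 onward
toy["year"] = pd.to_datetime(toy["published"]).dt.year
recent = toy[toy["year"] > 2017]

# Lowest price for each (acreage, year) pair
lowest = recent.groupby(["acreage", "year"])["price"].min().reset_index()
print(lowest)
```

The 2016 listing is excluded before grouping, so it cannot influence the minimum for 20 m² listings.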


### ✨**3. Create a new dataframe and calculate the lowest price per square meter**

In [16]:
# Build a table indexed by acreage, with one column per year holding the lowest price
df_acreage = pd.DataFrame(grouped['price'].min())
df_acreage = df_acreage.unstack(level=1)
df_acreage = df_acreage.sort_index(ascending=True)
df_acreage['min_price'] = df_acreage.min(axis=1)
df_acreage = df_acreage.sort_values(by='acreage', ascending=True)
df_acreage.head(30)

# Name the columns
df_acreage.columns = ['2018', '2019', '2020','2021','2022', 'min_price']

# Lowest price per square meter: min_price divided by acreage (stored as 'mean_per_acreage')
df_acreage['mean_per_acreage'] = round(df_acreage['min_price'] / df_acreage.index, 0)
df_acreage.tail(30)

acreage,2018,2019,2020,2021,2022,min_price,mean_per_acreage
100.0,,2800000.0,1150000.0,500000.0,500000.0,500000.0,5000.0
104.0,,,12000000.0,,,12000000.0,115385.0
105.0,,,13000000.0,,,13000000.0,123810.0
110.0,,,1700000.0,,,1700000.0,15455.0
115.0,,,,,1100000.0,1100000.0,9565.0
115.5,,3500000.0,,,,3500000.0,30303.0
116.0,,,,,1300000.0,1300000.0,11207.0
120.0,,150000000.0,1300000.0,7500000.0,1200000.0,1200000.0,10000.0
122.0,,,,,1200000.0,1200000.0,9836.0
130.0,12000000.0,,,1299000.0,1700000.0,1299000.0,9992.0
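
As a rough alternative to `unstack`, the same wide table can be sketched with `DataFrame.pivot`. The input rows are illustrative, and `wide` and `price_per_m2` are hypothetical names rather than the notebook's own. Naming the columns from the years actually present avoids hard-coding five year labels.

```python
import pandas as pd

# Illustrative (acreage, year, lowest price) rows
lowest = pd.DataFrame({
    "acreage": [100.0, 100.0, 120.0],
    "year": [2020, 2021, 2021],
    "price": [1150000, 500000, 7500000],
})

# One row per acreage, one column per year
wide = lowest.pivot(index="acreage", columns="year", values="price")
wide.columns = [str(y) for y in wide.columns]

# Lowest price across all years (NaN cells are skipped), then price per square meter
wide["min_price"] = wide.min(axis=1)
wide["price_per_m2"] = (wide["min_price"] / wide.index).round(0)
print(wide)
```

For the 100 m² row, the lowest price of 500,000 gives 5,000 per square meter, matching the corresponding row in the notebook output.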


In [17]:
# print all the rows with mean_per_acreage < 10000
outlier = df_acreage[df_acreage['mean_per_acreage'] < 10000]
outlier

acreage,2018,2019,2020,2021,2022,min_price,mean_per_acreage
98.0,,800000.0,,,,800000.0,8163.0
100.0,,2800000.0,1150000.0,500000.0,500000.0,500000.0,5000.0
115.0,,,,,1100000.0,1100000.0,9565.0
122.0,,,,,1200000.0,1200000.0,9836.0
130.0,12000000.0,,,1299000.0,1700000.0,1299000.0,9992.0
150.0,,,7000000.0,3000000.0,1100000.0,1100000.0,7333.0
160.0,,,,,550000.0,550000.0,3438.0
182.0,,,1400000.0,,,1400000.0,7692.0
195.0,,,,1500000.0,,1500000.0,7692.0
200.0,,,1400000.0,1000000.0,1000000.0,1000000.0,5000.0


&#9889; <font color="yellow"><b>What are the benefits of finding the answer? </b></font>
>- The table above shows listings of more than 100 square meters priced under 2,000,000 VND. Rent this low for that much space suggests a dormitory or industrial-park housing, where the price is quoted per person rather than for the whole property.
>- Notably, some listings are as large as 1,000 square meters yet cost only 1,200,000 to 1,600,000 VND. This suggests they are not ordinary boarding houses but long rows of boarding rooms or industrial-park housing for rent.
>- Comparing price with area therefore lets us guess whether a listing from the last 5 years is a boarding room, an apartment, a rented room, or a dormitory.
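
The reading above can be turned into a simple rule of thumb. The sketch below is only a heuristic under stated assumptions: the function `flag_shared_housing`, the column `likely_shared`, and both thresholds (more than 100 m², under 2,000,000 VND) are hypothetical choices taken from the wording of the conclusions, not a validated classifier. The three sample rows reuse `min_price` values shown in the output tables.

```python
import pandas as pd

def flag_shared_housing(table, min_area=100.0, max_price=2_000_000):
    """Flag large listings whose lowest price is suspiciously low.

    Assumes `table` is indexed by acreage and has a 'min_price' column.
    """
    is_large = table.index > min_area
    is_cheap = table["min_price"] < max_price
    return table.assign(likely_shared=is_cheap & is_large)

# Sample rows: acreage as index, lowest observed price as the only column
sample = pd.DataFrame(
    {"min_price": [12000000.0, 1100000.0, 1000000.0]},
    index=pd.Index([104.0, 150.0, 200.0], name="acreage"),
)
flagged = flag_shared_housing(sample)
print(flagged)
```

Here the 150 m² and 200 m² listings are flagged, while the 104 m² listing at 12,000,000 VND is not. A flag only marks a listing for closer inspection; confirming the type would still require reading the listing title.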