## Handling the data in HousingData.csv


### Pre-processing 

Load data

In [54]:
import numpy as np
import pandas as pd

data = pd.read_csv('HousingData.csv')
dt = data.copy()  # work on a copy so the raw data stays unchanged

Convert the numeric columns to `int` so they can be used in calculations

In [55]:
dt["Miles (dist. between school and house)"] = dt["Miles (dist. between school and house)"].astype(int)
dt["Rent Price per Month"] = dt["Rent Price per Month"].astype(int)
dt["Sell Price"] = dt["Sell Price"].astype(int)
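
`astype(int)` raises an error if a column contains missing or non-numeric values. The following is a minimal sketch of a more defensive conversion, assuming such values may occur. The frame `sample` and its values are made up for illustration and are not taken from HousingData.csv.

```python
import pandas as pd

# Illustrative data only; the column names mirror the notebook's.
sample = pd.DataFrame({
    "Rent Price per Month": ["17023", "17950", None],
    "Sell Price": [51362588.0, 40783890.0, 72977550.0],
})

int_columns = ["Rent Price per Month", "Sell Price"]

for col in int_columns:
    # Coerce unparseable entries to NaN instead of raising.
    sample[col] = pd.to_numeric(sample[col], errors="coerce")

# Drop rows where any target column could not be parsed, then cast.
clean = sample.dropna(subset=int_columns)
clean = clean.astype({col: int for col in int_columns})

print(clean.dtypes)
```

Whether dropping incomplete rows is acceptable depends on the dataset; filling them with a default value would be another option.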

### A3 Consider renting a house

I would like to live in a house that meets these criteria:
1. Located in the city center
2. More than 1 room
3. Rent no higher than the 75th percentile of the available choices
4. As close to school as possible

And of course, it can't be too expensive, which is what criterion 3 enforces.

Select the houses located in "City Center"

In [56]:
dt_rent = dt.loc[(dt['Location'] == "City Center")]
dt_rent

| | Area | No. of Rooms | No. of Bathrooms | Location | Miles (dist. between school and house) | Rent Price per Month | Sell Price |
|---|---|---|---|---|---|---|---|
| 7 | 1738 | 3 | 1 | City Center | 352 | 17023 | 51362588 |
| 8 | 830 | 3 | 1 | City Center | 137 | 17950 | 40783890 |
| 11 | 630 | 2 | 1 | City Center | 13 | 17867 | 72977550 |
| 12 | 2185 | 1 | 1 | City Center | 451 | 8745 | 63413770 |
| 13 | 1269 | 1 | 1 | City Center | 418 | 12091 | 62692404 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 981 | 2860 | 2 | 1 | City Center | 479 | 9645 | 27384981 |
| 985 | 2791 | 3 | 1 | City Center | 147 | 7788 | 35969114 |
| 989 | 885 | 2 | 1 | City Center | 120 | 19691 | 16387155 |
| 995 | 1375 | 1 | 1 | City Center | 33 | 16039 | 8670492 |


Filter out the houses with fewer than 2 rooms

In [57]:
dt_rent = dt_rent.drop(dt_rent[dt_rent['No. of Rooms'] < 2].index)
dt_rent

| | Area | No. of Rooms | No. of Bathrooms | Location | Miles (dist. between school and house) | Rent Price per Month | Sell Price |
|---|---|---|---|---|---|---|---|
| 7 | 1738 | 3 | 1 | City Center | 352 | 17023 | 51362588 |
| 8 | 830 | 3 | 1 | City Center | 137 | 17950 | 40783890 |
| 11 | 630 | 2 | 1 | City Center | 13 | 17867 | 72977550 |
| 14 | 2891 | 3 | 1 | City Center | 312 | 9866 | 20986157 |
| 19 | 2824 | 2 | 1 | City Center | 236 | 8792 | 52108295 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 980 | 701 | 2 | 1 | City Center | 267 | 7776 | 44837854 |
| 981 | 2860 | 2 | 1 | City Center | 479 | 9645 | 27384981 |
| 985 | 2791 | 3 | 1 | City Center | 147 | 7788 | 35969114 |
| 989 | 885 | 2 | 1 | City Center | 120 | 19691 | 16387155 |
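
The location and room filters applied so far can also be written as a single boolean mask. This is a sketch on a few made-up rows rather than the real dataset; the "Suburb" label is a hypothetical value used only to show a row being excluded.

```python
import pandas as pd

# Illustrative rows only; column names follow the notebook.
houses = pd.DataFrame(
    {
        "No. of Rooms": [3, 1, 2, 2],
        "Location": ["City Center", "City Center", "Suburb", "City Center"],
    },
    index=[7, 12, 20, 11],
)

# Combine both conditions with & (each side needs its own parentheses).
mask = (houses["Location"] == "City Center") & (houses["No. of Rooms"] >= 2)

# .copy() gives an independent frame that is safe to modify later.
selected = houses.loc[mask].copy()
print(selected)
```

Keeping the rows that match, instead of dropping the ones that do not, avoids the intermediate `.index` lookup and reads closer to the stated criteria.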


Get summary statistics for the remaining houses, including "Rent Price per Month"

In [58]:
print(dt_rent.describe())

              Area  No. of Rooms  No. of Bathrooms  \
count   224.000000    224.000000             224.0   
mean   1762.129464      2.464286               1.0   
std     673.551025      0.499840               0.0   
min     550.000000      2.000000               1.0   
25%    1243.750000      2.000000               1.0   
50%    1738.000000      2.000000               1.0   
75%    2330.750000      3.000000               1.0   
max    2992.000000      3.000000               1.0   

       Miles (dist. between school and house)  Rent Price per Month  \
count                              224.000000            224.000000   
mean                               262.785714          13387.401786   
std                                141.745204           4049.802266   
min                                 10.000000           6152.000000   
25%                                140.000000           9792.750000   
50%                                268.000000          13446.500000   
75%             

From the "Rent Price per Month" column:
- 75% = 17053
- 50% = 13446
- mean = 13387

Filter out the most expensive 25%, i.e. houses whose rent is above the 75th percentile (17053).

In [59]:
dt_rent = dt_rent.drop(dt_rent[dt_rent['Rent Price per Month'] > 17053].index)
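
The cutoff 17053 is copied by hand from the `describe()` output. As a hedged alternative, the 75th percentile can be computed with `Series.quantile`, so the filter stays correct if the data changes. The rents below are made up for illustration.

```python
import pandas as pd

# Illustrative rents only, not the real dataset.
rents = pd.DataFrame(
    {"Rent Price per Month": [6000, 8000, 10000, 12000, 20000]}
)

# 75th percentile of the rent column (linear interpolation by default).
cutoff = rents["Rent Price per Month"].quantile(0.75)

# Keep rows at or below the cutoff, i.e. drop the most expensive quarter.
affordable = rents[rents["Rent Price per Month"] <= cutoff]

print(cutoff)
print(affordable)
```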

Sort the list according to the distance between school and house in ascending order.

In [60]:
dt_rent.sort_values(by=['Miles (dist. between school and house)'], inplace=True, ascending=True)
dt_rent

| | Area | No. of Rooms | No. of Bathrooms | Location | Miles (dist. between school and house) | Rent Price per Month | Sell Price |
|---|---|---|---|---|---|---|---|
| 785 | 2041 | 2 | 1 | City Center | 19 | 8912 | 27709264 |
| 976 | 2918 | 2 | 1 | City Center | 22 | 6816 | 37785895 |
| 653 | 767 | 2 | 1 | City Center | 31 | 10195 | 69574747 |
| 759 | 1715 | 2 | 1 | City Center | 36 | 16878 | 65073295 |
| 996 | 1262 | 2 | 1 | City Center | 38 | 9904 | 20223887 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 936 | 2562 | 3 | 1 | City Center | 490 | 13751 | 57983779 |
| 544 | 2458 | 2 | 1 | City Center | 493 | 10559 | 16356675 |
| 786 | 1805 | 2 | 1 | City Center | 493 | 15909 | 37478135 |
| 580 | 1167 | 2 | 1 | City Center | 496 | 9765 | 8269798 |
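
Sorting the whole frame is one option; when only the few nearest houses matter, `DataFrame.nsmallest` is an alternative. The sketch below reuses a few rows from the sorted table above. The tie-break on rent is an assumption added for illustration, not something the notebook does.

```python
import pandas as pd

dist = "Miles (dist. between school and house)"
rent = "Rent Price per Month"

# A few rows copied from the sorted table above.
houses = pd.DataFrame(
    {dist: [19, 22, 31, 493, 493], rent: [8912, 6816, 10195, 10559, 15909]},
    index=[785, 976, 653, 544, 786],
)

# Three closest houses, without sorting the whole frame.
closest = houses.nsmallest(3, dist)

# Sort by distance, breaking ties by the lower rent.
ranked = houses.sort_values(by=[dist, rent], ascending=[True, True])

print(closest)
print(ranked)
```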


After filtering, I will choose among House 351, House 949 and House 785.

### A4 Consider buying a house

As in A3, I would like to live in a house that meets these criteria:
1. Located in the city center
2. More than 1 room
3. The sell price must not be among the most expensive of the available choices
4. As close to school as possible

And of course, it can't be too expensive.


The procedure is the same as in A3, except that it uses "Sell Price" instead of "Rent Price per Month".
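
Since renting and buying share the same steps, they could be wrapped in one function that takes the price column as a parameter. This is only a sketch: `shortlist` and its parameters are hypothetical names, not part of the notebook. The four sample rows are copied from the tables in this notebook.

```python
import pandas as pd

DIST = "Miles (dist. between school and house)"


def shortlist(df, price_col, max_price, location="City Center", min_rooms=2):
    """Filter by location, rooms and a price ceiling, then sort by distance.

    Hypothetical helper: the name and parameters are not from the notebook.
    """
    mask = (
        (df["Location"] == location)
        & (df["No. of Rooms"] >= min_rooms)
        & (df[price_col] <= max_price)
    )
    return df.loc[mask].sort_values(by=DIST)


# Rows 11, 12, 275 and 785 as they appear in the notebook's tables.
houses = pd.DataFrame(
    {
        "No. of Rooms": [2, 1, 3, 2],
        "Location": ["City Center"] * 4,
        DIST: [13, 451, 10, 19],
        "Rent Price per Month": [17867, 8745, 18602, 8912],
        "Sell Price": [72977550, 63413770, 48812486, 27709264],
    },
    index=[11, 12, 275, 785],
)

to_rent = shortlist(houses, "Rent Price per Month", max_price=17053)
to_buy = shortlist(houses, "Sell Price", max_price=5.00e7)

print(to_rent.index.tolist())
print(to_buy.index.tolist())
```

The price ceilings passed in here are the ones used in A3 and A4; passing them as arguments keeps the choice of cutoff visible at the call site.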

Select the houses located in "City Center"

In [61]:
dt_buy = dt.loc[(dt['Location'] == "City Center")]

Filter out the houses with fewer than 2 rooms.
Sort the list by the number of rooms in descending order.

In [62]:
dt_buy = dt_buy.drop(dt_buy[dt_buy["No. of Rooms"] < 2].index)
dt_buy = dt_buy.sort_values(by=["No. of Rooms"], ascending=False)


Get details of the houses' "Sell Price".

In [63]:
print(dt_buy["Sell Price"].describe())

count    2.240000e+02
mean     4.280702e+07
std      2.120648e+07
min      6.144771e+06
25%      2.523921e+07
50%      4.327329e+07
75%      6.103225e+07
max      7.998578e+07
Name: Sell Price, dtype: float64


- 75% 6.103225e+07
- 50% 4.327329e+07
- mean 4.280702e+07

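Filter out the houses with a sell price above 5.00e+07, a cutoff that lies between the median (4.327329e+07) and the 75th percentile (6.103225e+07)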

In [64]:
dt_buy = dt_buy.drop(dt_buy[dt_buy["Sell Price"] > 5.00e+07].index)
dt_buy

| | Area | No. of Rooms | No. of Bathrooms | Location | Miles (dist. between school and house) | Rent Price per Month | Sell Price |
|---|---|---|---|---|---|---|---|
| 499 | 1738 | 3 | 1 | City Center | 68 | 9922 | 30325097 |
| 769 | 625 | 3 | 1 | City Center | 446 | 16103 | 44689071 |
| 465 | 2826 | 3 | 1 | City Center | 54 | 13742 | 27688293 |
| 777 | 1669 | 3 | 1 | City Center | 223 | 16748 | 12240327 |
| 435 | 1691 | 3 | 1 | City Center | 436 | 19182 | 43434360 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 437 | 784 | 2 | 1 | City Center | 484 | 13361 | 13039879 |
| 467 | 1372 | 2 | 1 | City Center | 163 | 15291 | 24722753 |
| 480 | 2992 | 2 | 1 | City Center | 254 | 7151 | 8848439 |
| 501 | 734 | 2 | 1 | City Center | 135 | 13032 | 22613311 |
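
A fixed cutoff such as 5.00e+07 does not by itself say what share of the houses it keeps. One way to check, sketched here on made-up prices, is to take the mean of the boolean comparison and compare the cutoff with the median.

```python
import pandas as pd

# Illustrative sell prices only.
prices = pd.Series([10e6, 25e6, 43e6, 48e6, 61e6, 80e6], name="Sell Price")

cutoff = 5.00e7

# Share of houses at or below the cutoff (mean of a boolean Series).
share_kept = (prices <= cutoff).mean()

# For comparison, the median splits the data in half.
median = prices.median()

print(f"kept {share_kept:.0%} with cutoff {cutoff:.2e}; median is {median:.2e}")
```

If exactly the cheaper half is wanted, using `prices.median()` as the cutoff would be the more direct choice.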


Sort the list according to the distance between school and house in ascending order.

In [65]:
dt_buy.sort_values(by=['Miles (dist. between school and house)'], inplace=True, ascending=True)
dt_buy

| | Area | No. of Rooms | No. of Bathrooms | Location | Miles (dist. between school and house) | Rent Price per Month | Sell Price |
|---|---|---|---|---|---|---|---|
| 275 | 2574 | 3 | 1 | City Center | 10 | 18602 | 48812486 |
| 785 | 2041 | 2 | 1 | City Center | 19 | 8912 | 27709264 |
| 976 | 2918 | 2 | 1 | City Center | 22 | 6816 | 37785895 |
| 996 | 1262 | 2 | 1 | City Center | 38 | 9904 | 20223887 |
| 481 | 575 | 3 | 1 | City Center | 39 | 13619 | 28755322 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 544 | 2458 | 2 | 1 | City Center | 493 | 10559 | 16356675 |
| 786 | 1805 | 2 | 1 | City Center | 493 | 15909 | 37478135 |
| 557 | 2605 | 2 | 1 | City Center | 496 | 16808 | 27419897 |
| 580 | 1167 | 2 | 1 | City Center | 496 | 9765 | 8269798 |


Print the first 2 results after processing

In [66]:
print(dt_buy.head(2))

     Area  No. of Rooms  No. of Bathrooms     Location  \
275  2574             3                 1  City Center   
785  2041             2                 1  City Center   

     Miles (dist. between school and house)  Rent Price per Month  Sell Price  
275                                      10                 18602    48812486  
785                                      19                  8912    27709264  


As a result, I will choose one house to buy between House 275 and House 785.