# Finding the right house
After answering the research questions, the last step was to find suitable houses for my stakeholder. Taking my findings into consideration, I looked for houses:
- older than 50 years
- renovated or non-renovated
- with more than 4 bathrooms
- with a lot bigger than 20000 sqft
- with a golf course within 2.5 km
- with no waterfront
- with a good grade (9 or higher) and condition (3 or higher)


In [14]:
# Importing libraries
import pandas as pd
import seaborn as sns
import plotly.express as px
import warnings
warnings.filterwarnings("ignore")

In [15]:
# Loading cleaned housing data and golf course data into pandas DataFrames
df = pd.read_csv("data/King_County_House_prices_dataset_cleaned_distance.csv")
df_golf = pd.read_csv("data/golf_course_locs.csv")

In [16]:
# Keep only houses built before 1972 with more than 4 bathrooms, a lot over 20000 sqft, a golf course within 2.5 km, no waterfront, grade >= 9, and condition >= 3
df_f = df.query('yr_built < 1972 and bathrooms > 4 and sqft_lot > 20000 and nearest_golf_distance < 2.5 and waterfront == 0 and grade >= 9 and condition >= 3').reset_index()
df_f

| Unnamed: 0 | index | id | date | price | bedrooms | bathrooms | sqft_living | sqft_lot | floors | waterfront | ... | sqft_basement | yr_built | yr_renovated | zipcode | lat | long | sqft_living15 | sqft_lot15 | nearest_golf | nearest_golf_distance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3018 | 3377900195 | 9/29/2014 | 2530000.0 | 4 | 5.5 | 6930 | 45100 | 1.0 | 0.0 | ... | ? | 1950 | 1991.0 | 98006 | 47.5547 | -122.144 | 2560 | 37766 | The Golf Club At Newcastle | 1.821713 |
| 1 | 5961 | 5249800010 | 12/3/2014 | 2730000.0 | 4 | 4.25 | 6410 | 43838 | 2.5 | 0.0 | ... | 800.0 | 1906 | | 98144 | 47.5703 | -122.28 | 2270 | 6630 | City of Seattle Golf Courses | 2.110153 |
| 2 | 7245 | 6762700020 | 10/13/2014 | 7700000.0 | 6 | 8.0 | 12050 | 27600 | 2.5 | 0.0 | ... | 3480.0 | 1910 | 1987.0 | 98102 | 47.6298 | -122.323 | 3940 | 8800 | Municipal Golf Of Seattle | 1.445868 |
| 3 | 14542 | 2303900035 | 6/11/2014 | 2890000.0 | 5 | 6.25 | 8670 | 64033 | 2.0 | 0.0 | ... | 2550.0 | 1965 | | 98177 | 47.7295 | -122.372 | 4140 | 81021 | Seattle Golf Club | 1.033879 |
| 4 | 18314 | 5317100750 | 7/11/2014 | 2920000.0 | 4 | 4.75 | 4575 | 24085 | 2.5 | 0.0 | ... | 670.0 | 1926 | | 98112 | 47.6263 | -122.284 | 3900 | 9687 | Broadmoor Golf Club | 1.614485 |
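
The `nearest_golf` and `nearest_golf_distance` columns arrive precomputed in the cleaned CSV, and this notebook does not show how they were derived. As a rough sketch only, one common way to get such a distance in kilometers is the haversine formula applied to each house and each golf course. The function names `haversine_km` and `nearest_course` and the example course coordinates below are hypothetical and are not taken from the original data preparation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_course(house_lat, house_lon, courses):
    """Return (name, distance_km) of the closest course.

    `courses` is an iterable of (name, lat, lon) tuples.
    """
    return min(
        ((name, haversine_km(house_lat, house_lon, lat, lon))
         for name, lat, lon in courses),
        key=lambda pair: pair[1],
    )


# Made-up course coordinates for illustration (NOT from golf_course_locs.csv)
example_courses = [
    ("Course A", 47.56, -122.15),
    ("Course B", 47.70, -122.30),
]

# House coordinates taken from the first row of the table above
name, dist_km = nearest_course(47.5547, -122.144, example_courses)
print(name, round(dist_km, 2))
```

With real data, the same function could be applied row by row to `df` using the `lat` and `long` columns of both DataFrames, assuming the distances in the CSV are straight-line rather than driving distances.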


Five houses met these strict conditions. Next, I plotted them on a map:

In [17]:
# Using Plotly Express to create a map based on OpenStreetMap
fig = px.scatter_mapbox(df_f, lat="lat", lon="long",
    hover_name='id', hover_data=["price", "sqft_lot", 'yr_built', 'nearest_golf', 'nearest_golf_distance'],
    color_discrete_sequence=["black"])
fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0},
    width=1000, height=600)
fig.show()

The map shows that one house (id: 5249800010) appears to sit on the waterfront even though its `waterfront` value is 0. I excluded it from my recommendation.
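
The exclusion was a manual decision based on the map, but it can also be recorded in code so that the final list is reproducible. The following is a minimal sketch that uses a small stand-in for `df_f` with the ids and prices from the table above. The names `excluded_ids` and `df_recommended` are hypothetical and do not appear elsewhere in the notebook.

```python
import pandas as pd

# Stand-in for df_f: the five ids and prices from the filtered table
df_f = pd.DataFrame({
    "id": [3377900195, 5249800010, 6762700020, 2303900035, 5317100750],
    "price": [2530000.0, 2730000.0, 7700000.0, 2890000.0, 2920000.0],
})

# House judged from the map to be on the water despite waterfront == 0
excluded_ids = [5249800010]

# Keep every row whose id is not in the exclusion list
df_recommended = df_f[~df_f["id"].isin(excluded_ids)].reset_index(drop=True)
print(df_recommended["id"].tolist())
```

Keeping the excluded ids in a list makes it easy to drop further houses later if a closer look at the map reveals more mislabeled entries.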

As a bonus, I also plotted a map of all King County golf courses:

In [18]:
# Using Plotly Express to create a map based on OpenStreetMap
fig = px.scatter_mapbox(df_golf, lat="lat", lon="long",
    hover_name='name', hover_data=["address"],
    color_discrete_sequence=["green"])
fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0},
    width=1000, height=600)
fig.show()

Quite a lot of golf courses!

#### Conclusion

Based on my EDA, I recommend that my stakeholder take a look at the following 4 houses:
- 3377900195
- 6762700020
- 2303900035
- 5317100750

If these houses are not to his liking, another option is to look at houses with fewer than 4 bathrooms, combined with a smaller house nearby.