## Housing Search in Massachusetts Leveraging Unsupervised Machine Learning

#### Chris Grace
May 17th, 2021

### Introduction

When searching for a new home, it is generally easy to filter on characteristics such as square footage or number of bedrooms using well-known sites such as Zillow.com, Realtor.com, or Redfin.com.  Investigating the amenities near each property, however, typically means looking up each address individually to judge its suitability.  This project aims to streamline that process and narrow down the number of properties that need to be investigated.

Our clients are homeowners who came to our small real estate office in Massachusetts.  They would like to relocate to a larger house to accommodate a growing family.  They like the amenities near their current home and would like to find a house in a neighborhood that offers similar ones.  Because few houses are on the market, they do not want to limit the search to their current neighborhood.  Therefore, we will provide a means to input search parameters for the house, such as price, square footage, and number of bedrooms, within a specified county.  Next, each returned address will be used as the center point of an amenity search to find the businesses and attractions nearby.  Using an unsupervised learning algorithm (K-means), we will group the listings into clusters based on their nearby amenities.  The cluster that contains the clients' current address will be selected for further investigation, since those listings are most likely to have a neighborhood feel similar to their current residence.
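
The clustering step described above can be sketched as follows.  This is a minimal, dependency-free illustration of K-means (Lloyd's algorithm), not the project's actual implementation; a library such as scikit-learn would likely be used in practice.  The listing names, amenity categories, profile values, and the choice of `k=2` are all invented placeholders.

```python
import random
from math import dist


def kmeans(points, k, n_iter=50, seed=0):
    """Minimal Lloyd's algorithm. Returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialize centroids from k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical amenity profiles: share of nearby venues per category
# (columns: coffee shop, park, restaurant). All values are invented.
profiles = {
    "client_current_home": [0.50, 0.10, 0.40],
    "listing_a": [0.45, 0.15, 0.40],
    "listing_b": [0.05, 0.85, 0.10],
    "listing_c": [0.10, 0.80, 0.10],
    "listing_d": [0.55, 0.05, 0.40],
}
names = list(profiles)
labels, _ = kmeans([profiles[n] for n in names], k=2)

# Keep only the listings that share a cluster with the clients' current home.
client_label = labels[names.index("client_current_home")]
similar = [
    n for n, lab in zip(names, labels)
    if lab == client_label and n != "client_current_home"
]
print(similar)
```

Treating the clients' current address as one more row in the data is what lets the "cluster that contains their address" be looked up directly after fitting.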

The selected cluster should output relevant summary information which may help to narrow houses down further such as price, square footage, number of bedrooms, and proximity to major public amenities (hospital, fire station, police station, etc.).
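
One way to express "proximity to major public amenities" is the great-circle distance from each listing to the nearest amenity of a given type.  The sketch below uses the haversine formula; the coordinates are made-up placeholders, and the helper names `haversine_km` and `nearest_km` are illustrative, not part of the project.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * asin(sqrt(min(1.0, a)))


def nearest_km(home, places):
    """Distance from `home` to the closest point in `places`."""
    return min(haversine_km(*home, *place) for place in places)


# Hypothetical coordinates, for illustration only.
listing = (42.50, -70.90)
hospitals = [(42.52, -70.95), (42.60, -70.80)]
print(f"Nearest hospital: {nearest_km(listing, hospitals):.1f} km")
```

Straight-line distance ignores roads and travel time, so it is best read as a rough screening figure.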

While this project is being developed for a single client, the situation comes up often enough that the script is intended to help future clients as well and to serve as a marketing differentiator from other real estate offices.

### Data

This project will use two sources of data.

* <b>Redfin API</b><br>
    Redfin has an unofficial API which can be accessed by downloading the data from a search request.  The process for downloading a CSV file is briefly described here: https://support.redfin.com/hc/en-us/articles/360016476931-Downloading-Data.  Examining the download link shows that its query parameters can be altered to produce a CSV file of our choosing.  An example of an API search query is: [1]  

    [1] `https://www.redfin.com/stingray/api/gis-csv?al=1&market=boston&max_listing_approx_size=3500&
    max_price=1000000&min_listing_approx_size=2000&min_price=500000&min_stories=1&
    num_beds=3&num_homes=350&ord=redfin-recommended-asc&page_number=1&region_id=1338&
    region_type=5&sf=1,5,6,7&status=1&time_on_market_range=30-&uipt=1&v=8` 
    
    This will return houses:
    - From 500,000 to 1,000,000 USD
    - From 2,000 to 3,500 square feet
    - Single Family
    - In Essex County, MA (region_id=1338)
    - At least 3 bedrooms
    - Less than 30 days on market
    
    It is important to note that Redfin will only return the top 350 results in the CSV file.  Therefore, the filtering parameters must be chosen so that the number of matching listings stays below this threshold.  
<br>
* <b>Foursquare API</b> (https://developer.foursquare.com/)<br> 
    Foursquare has an API which returns nearby venues of interest when the latitude and longitude of an address are supplied.  We will use the "Venue Recommendations" call.  The information for this API can be found here: https://developer.foursquare.com/docs/api-reference/venues/explore/  
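
The two requests described above can be assembled from plain parameter dictionaries.  The sketch below only builds the request pieces and makes no network calls.  The Redfin parameter names are copied from example query [1]; the function names, the Foursquare `radius` and `limit` values, the version date, and the credential placeholders are assumptions for illustration, and the exact parameters accepted by either service should be checked against its documentation.

```python
from urllib.parse import urlencode

REDFIN_CSV_ENDPOINT = "https://www.redfin.com/stingray/api/gis-csv"
REDFIN_MAX_ROWS = 350  # Redfin returns at most this many rows per CSV

# Assumed v2 endpoint for the "Venue Recommendations" (explore) call.
FOURSQUARE_EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_redfin_csv_url(min_price, max_price, min_sqft, max_sqft, min_beds, region_id):
    """Rebuild the example query [1] with adjustable search filters."""
    params = {
        "al": 1,
        "market": "boston",
        "max_listing_approx_size": max_sqft,
        "max_price": max_price,
        "min_listing_approx_size": min_sqft,
        "min_price": min_price,
        "min_stories": 1,
        "num_beds": min_beds,
        "num_homes": REDFIN_MAX_ROWS,
        "ord": "redfin-recommended-asc",
        "page_number": 1,
        "region_id": region_id,
        "region_type": 5,
        "sf": "1,5,6,7",
        "status": 1,
        "time_on_market_range": "30-",
        "uipt": 1,
        "v": 8,
    }
    # safe="," keeps the comma-separated "sf" value unescaped, as in [1].
    return f"{REDFIN_CSV_ENDPOINT}?{urlencode(params, safe=',')}"


def may_be_truncated(row_count):
    """True when the CSV hit the row cap, so some listings may be missing."""
    return row_count >= REDFIN_MAX_ROWS


def build_foursquare_explore_params(lat, lng, client_id, client_secret,
                                    radius_m=500, limit=100, version="20210517"):
    """Query parameters for one amenity search centered on a listing."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,            # API version date (placeholder)
        "ll": f"{lat},{lng}",    # "latitude,longitude" of the listing
        "radius": radius_m,      # search radius in meters (placeholder)
        "limit": limit,          # maximum venues to return (placeholder)
    }


redfin_url = build_redfin_csv_url(500000, 1000000, 2000, 3500, 3, 1338)
fsq_params = build_foursquare_explore_params(42.5, -70.9, "YOUR_ID", "YOUR_SECRET")
print(redfin_url)
print(FOURSQUARE_EXPLORE_ENDPOINT, fsq_params["ll"])
```

If `may_be_truncated` returns `True` for a downloaded file, tightening the price or size range and re-running the query is one way to stay under the cap.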
   

### Methodology 

Section which represents the main component of the report where you discuss and describe any exploratory data analysis that you did, any inferential statistical testing that you performed, if any, and what machine learnings were used and why.



### Results 

Section where you discuss the results.

### Discussion

Section where you discuss any observations you noted and any recommendations you can make based on the results.

### Conclusion

Section where you conclude the report.

```python
import pandas as pd
import numpy as np
```

```python
print("Hello Capstone Project Course!")
```

```
Hello Capstone Project Course!
```
