# <center> Coursera Capstone Project </center>

## <center> Where Should I Move? </center>

![Moving boxes][logo]

[logo]: https://i2.wp.com/movingtips.wpengine.com/wp-content/uploads/2019/02/moving-boxes-crosscountry.jpg "Picture from Moving.com"


### Introduction/Business Problem
The stakeholders in this case are my girlfriend and I. We both recently graduated from college and have been debating where to live. We want to live somewhere near family, but also somewhere similar to where we live now. We currently live in Downtown Frederick and really enjoy it. We will soon have to move for her graduate program, though, and we want to make the best possible decision about where to go.

With the knowledge that I've gained throughout these courses, I thought it would be interesting to apply data science skills and the Foursquare location data to make a more informed decision. She has helped me put together a list of places where we could live. I want to perform a cluster analysis on the cities in that list to see which of them are most like Downtown Frederick.

Knowing which places are most similar to Frederick will let us make a more informed decision and, hopefully, be more satisfied with our move. If you are in a similar situation, this method may help you find similar places to live.


### Data
This project requires the Foursquare location data, which is well suited to the problem at hand. The Foursquare explore endpoint is of particular interest. It lets the user specify where to search (latitude and longitude), how far around that point to search (radius), how many venues to retrieve (limit), and how to rank the results (relevance, popularity, or distance). The API then returns the venues that meet all of those parameters, along with information about each one, including its location, category, and tips. Category is the main feature we are interested in.
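As a rough illustration of the parameters listed above, the sketch below only builds a request URL for the explore endpoint; it does not send it. It assumes the legacy Foursquare v2 `venues/explore` endpoint with `client_id`/`client_secret` authentication, a `v` version date, and `sortByPopularity=1` to rank by popularity. The helper name `build_explore_url` and the version date are hypothetical placeholders, so check the current Foursquare documentation before relying on any of them.

```python
from urllib.parse import urlencode

# Legacy Foursquare v2 explore endpoint (assumption: v2-style URL and parameters)
EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, lat, lng,
                      radius_m=5000, limit=100, version='20200101'):
    """Hypothetical helper: build an explore request URL for one location."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,                    # API version date (placeholder value)
        'll': '{},{}'.format(lat, lng),  # where to search
        'radius': radius_m,              # how far around the point, in meters
        'limit': limit,                  # requested venue count; the API may return fewer
        'sortByPopularity': 1,           # rank by popularity instead of relevance
    }
    return EXPLORE_URL + '?' + urlencode(params)


# Example with dummy credentials and Frederick's geocoded coordinates
url = build_explore_url('MY_ID', 'MY_SECRET', 39.414219, -77.410927)
print(url)
```

In practice the real credentials would be read from environment variables with `os.environ` rather than typed into the notebook; the variable names are your own choice.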

I plan to search around each city on the list for the top 100 most popular venues within a 5 kilometer radius of its center. I will then compute the proportion of each venue category for each city and use those proportions as the feature set for the clustering algorithm. I will use K-means clustering to group cities with similar venues and then see which group contains Frederick. Examining that group will show which kinds of venues characterize Frederick and the cities most similar to it.
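The feature-building and clustering steps described above can be sketched roughly as follows. The venue records are a tiny made-up sample (not real Foursquare results) and `n_clusters=2` is an arbitrary choice for illustration. The sketch assumes one row per venue with a `City` and a `Category` column, turns the counts into per-city proportions with `pd.crosstab(..., normalize='index')`, and then runs scikit-learn's `KMeans` on those proportions.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical sample: one row per venue returned for each city
venues = pd.DataFrame({
    'City': ['Frederick', 'Frederick', 'Frederick', 'Annapolis', 'Annapolis',
             'Hagerstown', 'Hagerstown', 'Hagerstown'],
    'Category': ['Coffee Shop', 'Brewery', 'Coffee Shop', 'Coffee Shop', 'Brewery',
                 'Fast Food Restaurant', 'Gas Station', 'Fast Food Restaurant'],
})

# Proportion of each venue category per city (each row sums to 1)
features = pd.crosstab(venues['City'], venues['Category'], normalize='index')

# Group cities with similar category mixes; random_state makes the result repeatable
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(features)

clusters = pd.Series(labels, index=features.index, name='Cluster')
print(features.round(2))
print(clusters)

# Cities that share Frederick's cluster
frederick_cluster = clusters['Frederick']
print(clusters[clusters == frederick_cluster].index.tolist())
```

Using proportions rather than raw counts keeps a city that returned fewer than 100 venues comparable to one that returned the full 100.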




## Let's begin the analysis process

After talking it over, my girlfriend and I decided on a list of possible places to live. Let's load these in and attach location coordinates to each one. (Let's also load some libraries that we will need!)



```python
# Standard library for working with tables and data
import pandas as pd

# Standard library for working with arrays and numerical data
import numpy as np

# Library for creating visual maps
import folium

# Library for geocoding
from geopy.geocoders import Nominatim

# Library for working with requests to urls
import requests

# Library to get the API credentials stored on my computer
import os

# Library for visualizations
import matplotlib.pyplot as plt

# Library for the K-means cluster algorithm used later
from sklearn.cluster import KMeans
```

```python
# Load in the Possible Locations Table from the repository
df_url = 'https://raw.githubusercontent.com/BradenSmolko/Coursera_Capstone/master/PossibleLocations.csv'
locations_df = pd.read_csv(df_url)
locations_df.head()
```

|   | City | State |
|---|------|-------|
| 0 | Frederick | MD |
| 1 | Annapolis | MD |
| 2 | Hagerstown | MD |
| 3 | Towson | MD |
| 4 | Hampden | MD |


### Geocode possible locations
Use geopy to geocode the locations that we are considering for our move and make an initial map to visualize these locations.

```python
# Use geopy's Nominatim geocoder to acquire the coordinates of each city
geolocator = Nominatim(user_agent="capstone_project")
latitudes = []
longitudes = []

for city, state in zip(locations_df['City'], locations_df['State']):
    city_lat = None
    city_lng = None

    try:
        location = geolocator.geocode('{}, {}'.format(city, state))
        city_lat = location.latitude
        city_lng = location.longitude
    except Exception:
        # geocode() returns None when nothing matches (AttributeError above),
        # and it can also raise on network errors or timeouts
        print('Error occurred with {}, {}'.format(city, state))

    latitudes.append(city_lat)
    longitudes.append(city_lng)
```

```python
# Add the coordinates to the locations_df
locations_df['Latitude'] = latitudes
locations_df['Longitude'] = longitudes

locations_df.head()
```

|   | City | State | Latitude | Longitude |
|---|------|-------|----------|-----------|
| 0 | Frederick | MD | 39.414219 | -77.410927 |
| 1 | Annapolis | MD | 38.97864 | -76.492786 |
| 2 | Hagerstown | MD | 39.641922 | -77.720264 |
| 3 | Towson | MD | 39.401855 | -76.602388 |
| 4 | Hampden | MD | 39.33094 | -76.634969 |


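```python
# Make a map object centered between the candidate cities
# (Stamen tiles moved to Stadia Maps in 2023 and may no longer load as
# "Stamen Terrain", so the default OpenStreetMap tiles are used here)
my_map = folium.Map(location=[39.22, -77.09], zoom_start=9, tiles="OpenStreetMap")

# Add a circle marker for each city that was geocoded successfully
for lat, lng, city, state in zip(locations_df['Latitude'], locations_df['Longitude'],
                                 locations_df['City'], locations_df['State']):
    if pd.isna(lat) or pd.isna(lng):
        continue  # skip cities the geocoder could not resolve
    label = folium.Popup('{}, {}'.format(city, state), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#ADD8E6',
        fill_opacity=0.6).add_to(my_map)

my_map
```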

### Collect venue data and perform cluster analysis

Using the Foursquare API, let's collect venue data surrounding these different cities and use the categories of the venues for cluster analysis. This will give us an idea of which location options are similar and which group Frederick falls into.
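Before clustering, each explore response has to be reduced to a list of venue categories. The sketch below assumes the legacy v2 response layout, where venues sit under `response -> groups -> items -> venue` and each venue has a `categories` list whose first entry is taken as its category. The helper name `venue_categories` and the sample payload are hypothetical, trimmed to only the fields the function reads.

```python
import pandas as pd


def venue_categories(city, explore_json):
    """Hypothetical helper: one (City, Category) row per venue in an explore response."""
    rows = []
    for group in explore_json.get('response', {}).get('groups', []):
        for item in group.get('items', []):
            categories = item.get('venue', {}).get('categories', [])
            if categories:  # some venues may carry no category at all
                rows.append({'City': city, 'Category': categories[0]['name']})
    return pd.DataFrame(rows, columns=['City', 'Category'])


# Made-up payload shaped like a trimmed v2 explore response
sample = {
    'response': {
        'groups': [{
            'items': [
                {'venue': {'name': 'Venue A', 'categories': [{'name': 'Coffee Shop'}]}},
                {'venue': {'name': 'Venue B', 'categories': [{'name': 'Brewery'}]}},
                {'venue': {'name': 'Venue C', 'categories': []}},
            ]
        }]
    }
}

df = venue_categories('Frederick', sample)
print(df)
```

In a live run, `explore_json` would come from `requests.get(url).json()` for each city in `locations_df`, and the per-city frames could be combined with `pd.concat` before computing the category proportions.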