### Introduction: Business Problem 

This project we will determine the **perfect neighborhood for my family’s next apartment**. As I am a biased party the report will be targeted at my partner and dog, and their interest in moving to area’s that are remarkably similar to our current neighborhood. 

We currently love our community but hate our commute, as such this project will identify several Western Washington neighborhoods that have the same distribution of amenities **and** are closer to our respective workplaces. 

No one likes a long commute and we have decided that we no longer need to spend hours in the car to support our lifestyle. **With the power of Foursquare, Pandas and Folium** I will create a set of concise images that will guide our apartment hunt!

### Data 

Fist I will need to define the characteristics of my current neighborhood using Foursquare
- List of the top 20 activities
- Find the average distance between a few key sites (dog parks)
- Fit this data to a polynomial model for later comparison
- Define a working drive time threshold from our respective worksites

Second, I will need to find similar neighborhoods and the Bing.com/maps api to calculate drive time
- Foursquare will be leveraged to categorizes all area codes in our search area
- Data will be tested against the current neighborhood model  using r^2
- I will use the Bing Maps api to calculate drive times of the top 20 zip codes ranked by r^2

Lastly, I will display the data in a folium map
- Prospective neighborhoods will be grouped using scikit learn K-means
- Data will be plotted in a visual format using Folium
- Key information will be available as a popup summary

This information will empower my partner to identify the perfect place to move.

### Methodology

After the data was in hand, I used a for loop to drop all area’s that did not match our current Model Neighborhood. One method that I abandoned was the use of a r^2 to match potential neighborhoods. The client (my partner) decided that it wasn’t important that we have exactly the same number of coffee shops and parks, rather that we have at least one of several categories.

### Results

0f 134 neighborhoods within our search area only 25 were similar to our Model Neighborhood. Several results in South Seattle and Bellevue stand out because the travel times to work and school are approximately the same. 
### Discussion

By comparing frequently occurring venues types in the Model Neighborhood with those of Neighborhoods close to our daily destinations I was able to affectively short list the house hunt. Ultimately, we decided to move to Bellevue because of the available housing stock. In the future housing prices can be incorporated into the process.

This process can be made into a *next neighborhood recommender* by adding a few additional steps to capture users current home, their daily destinations and their desired travel times.


### Download the necessary resources

In [1]:
import numpy as np 
import pandas as pd
import json 
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim 
import geopy.distance

import requests 
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
! pip install folium
import folium

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.



In [2]:
#get the the location of our home
postalcode = '98059'

geolocator = Nominatim(user_agent="Wester Wa Exploer2")
location = geolocator.geocode(postalcode)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Postal Code {} is {}, {}.'.format(postalcode, latitude, longitude))

Work = '21218 76th Ave S, Kent, WA 98032'

geolocator = Nominatim(user_agent="Wester Wa Exploer")
W_location = geolocator.geocode(Work)
W_latitude = W_location.latitude
W_longitude = W_location.longitude
print('The geograpical coordinate of WOrk {} are {}, {}.'.format(Work, W_latitude, W_longitude))

School = '4105 George Washington Lane Northeast, Seattle, WA 98105'

geolocator = Nominatim(user_agent="Wester Wa Exploer")
S_location = geolocator.geocode(School)
S_latitude = S_location.latitude
S_longitude = S_location.longitude
print('The geograpical coordinate of School {} are {}, {}.'.format(School, S_latitude, S_longitude))

#find the distance between work and school

W_coords = (W_latitude, W_longitude)
S_coords = (S_latitude, S_longitude)

dis = geopy.distance.geodesic(W_coords, S_coords).miles
settle_distance = dis*1.25

print('the ditsance between work and school is {} miles and we will travel up to {} total miles.'.format(dis, settle_distance))

The geograpical coordinate of Postal Code 98059 is 47.48994828640541, -122.13755849835039.
The geograpical coordinate of WOrk 21218 76th Ave S, Kent, WA 98032 are 47.4101507, -122.2373393.
The geograpical coordinate of School 4105 George Washington Lane Northeast, Seattle, WA 98105 are 47.6572707, -122.30970970694753.
the ditsance between work and school is 17.40476895692529 miles and we will travel up to 21.755961196156616 total miles.


### getting the lat longs of all zips in the area

In [3]:
path = '/resources/labs/fina/zipss.csv'
zips= pd.read_csv(path)

In [4]:
local = pd.DataFrame()
local['Lat'] = ''
local['Long'] = ''
 
### test the system
postalcode = zips['Code'].iloc[0]

geolocator = Nominatim(user_agent="Wester Wa Exploer2")
location = geolocator.geocode(postalcode)
latitude = location.latitude
longitude = location.longitude
print('Lat: {}, Long {}' .format(latitude, longitude))


Lat: 47.31278529997836, Long -122.27204181178253


In [5]:
for i in range(len(zips)):
    geolocator = Nominatim(user_agent="Wester Wa Exploer4", timeout=3)
    location = geolocator.geocode(zips['Code'].iloc[i])
    if location is not None:
        latitude = location.latitude
        longitude = location.longitude
        local = local.append({'Lat': latitude, 'Long': longitude}, ignore_index=True)
    else:
        local = local.append({'Lat': 'NAN', 'Long': 'BANANA'}, ignore_index=True)

In [6]:
df_wa = pd.merge(zips, local, left_index = True, right_index = True)
print(df_wa.shape)
df_wa = df_wa[~df_wa['Lat'].isin(['NAN'])]
df_wa = df_wa[~df_wa['Long'].isin(['BANANA'])]
df_wa=df_wa.loc[(df_wa['Long'] < 0.0)]
df_wa=df_wa.loc[(df_wa['Lat'] < 48.4)]
print(df_wa.shape)

(187, 5)
(134, 5)


In [7]:
#### find the distance between my locations

distance = pd.DataFrame()
distance['d_work']=''
distance['d_school']=''
distance

print(W_coords, S_coords, settle_distance)

(47.4101507, -122.2373393) (47.6572707, -122.30970970694753) 21.755961196156616


In [8]:
T_coords = df_wa[['Lat', 'Long']]
print(geopy.distance.geodesic(W_coords, T_coords.iloc[0]).miles)
print(geopy.distance.geodesic(S_coords, T_coords.iloc[0]).miles)

6.9206965329612835
23.863793299885717


In [9]:
for i in range(len(df_wa)):
    distance = distance.append({'d_work': geopy.distance.geodesic(W_coords, T_coords.iloc[i]).miles, 'd_school': geopy.distance.geodesic(S_coords, T_coords.iloc[i]).miles}, ignore_index=True)

df_wa=pd.merge(df_wa, distance, left_index = True, right_index = True, how = 'left')
df_wa = df_wa.reset_index()
df_wa = df_wa.drop(['index'], axis = 1)
print(df_wa.shape)

(134, 7)


In [10]:
ok = pd.DataFrame()
ok['ok']=''

for i in range(len(df_wa)):
    ok = ok.append({'ok':df_wa.loc[i]['d_work']<settle_distance and df_wa.loc[i]['d_school']<settle_distance}, ignore_index=True)


df_wa=pd.merge(df_wa, ok, left_index = True, right_index = True, how = 'left')


potential = df_wa.loc[(df_wa['ok']==True)]
potential = potential.reset_index()
potential = potential.drop(['index'], axis=1)

print(potential.shape)
potential.head()

(53, 8)


Unnamed: 0,Code,City,County,Lat,Long,d_work,d_school,ok
0,98004,Bellevue,King,47.6245,-122.201,14.90493,5.564004,True
1,98005,Bellevue,King,47.6121,-122.164,14.366694,7.46813,True
2,98006,Bellevue,King,47.563,-122.159,11.180594,9.588843,True
3,98007,Bellevue,King,47.6098,-122.147,14.43379,8.290301,True
4,98008,Bellevue,King,47.6344,-122.125,16.3588,8.764045,True


### get my Foursquare data for our current home

In [11]:
CLIENT_ID = 'FD35ZLLUHS5JQBQBK0FDBZXI2ADUFCR02IOU51JMUXIUHDNL'
CLIENT_SECRET = '5WMJDA1OQP4XS0Q1NT5L0MBD32ELTSL32505WWPTTC53GV5O'
VERSION = '20200101'
LIMIT = 100
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: FD35ZLLUHS5JQBQBK0FDBZXI2ADUFCR02IOU51JMUXIUHDNL
CLIENT_SECRET:5WMJDA1OQP4XS0Q1NT5L0MBD32ELTSL32505WWPTTC53GV5O


In [12]:
url1 = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, LIMIT)
results1 = requests.get(url1).json()

# assign relevant part of JSON to venues
venues = results1['response']['groups'][0]['items']
nearby_venues = pd.json_normalize(venues)

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.postalCode','venue.location.lat', 'venue.location.lng']
dataframe_filtered = nearby_venues.loc[:, filtered_columns]

In [13]:
category = pd.DataFrame()
category['type']=''

for i in range(len(dataframe_filtered)):
    data = (dataframe_filtered.iloc[i]['venue.categories'][0]['name'])
    category = category.append({'type': data.split()}, ignore_index = True)

import matplotlib.pyplot as plt

category['type as string']=''


test = pd.DataFrame()

test['type as string']=''


for i in range(len(category)):
    data = (category.iloc[i]['type'])
    test = test.append({'type as string': str(category.iloc[i]['type'])}, ignore_index=True)
  
fin = pd.merge(category, test, left_index = True, right_index = True)

final = fin.groupby('type as string_y').count()

final = final.sort_values(by='type', ascending = False)

In [14]:
column=pd.DataFrame()
column=final.drop(['type as string_x'], axis = 1)
#column=column.reset_index()
column=column.transpose()
column=column.rename({'type as string_y':'venue type', 'type': 'count'})
column.columns = column.columns.str.replace('[', '').str.replace(']', '').str.replace("'", '').str.replace(",", '')


model = pd.DataFrame()
model = column[['Coffee Shop', 'Mexican Restaurant', 'Grocery Store', 'Brewery',
      'Pet Store', 'Café', 'Bakery', 'Supermarket', 'Trail', 'Park',
      'Gastropub']]
model.head()


type as string_y,Coffee Shop,Mexican Restaurant,Grocery Store,Brewery,Pet Store,Café,Bakery,Supermarket,Trail,Park,Gastropub
count,12,4,6,5,2,2,2,1,1,1,1


### Get data for the relevent area codes

In [15]:
potential.head(2)

Unnamed: 0,Code,City,County,Lat,Long,d_work,d_school,ok
0,98004,Bellevue,King,47.6245,-122.201,14.90493,5.564004,True
1,98005,Bellevue,King,47.6121,-122.164,14.366694,7.46813,True


In [16]:
import requests
import json
bingkey='AtIG73fsNaAkmjh6G6-RGJJ4TtGEFva81aF6bOHrfjWzA4HvmRL2OBITa_udr65s'
start = '47.6044,-122.33456'
end = '45.5347,-122.6231'
driving = 'driving'
public = 'transit'
path='https://dev.virtualearth.net/REST/v1/Routes/DistanceMatrix?origins={}&destinations={}&travelMode={}&key={}'.format(start, end, driving, bingkey)
r = requests.get(path)

d=r.json()

d

print(d['resourceSets'])

[{'estimatedTotal': 1, 'resources': [{'__type': 'DistanceMatrix:http://schemas.microsoft.com/search/local/ws/rest/v1', 'destinations': [{'latitude': 45.5347, 'longitude': -122.6231}], 'errorMessage': 'Request completed.', 'origins': [{'latitude': 47.6044, 'longitude': -122.33456}], 'results': [{'destinationIndex': 0, 'originIndex': 0, 'totalWalkDuration': 0, 'travelDistance': 281.454, 'travelDuration': 161.217}]}]}]


In [17]:
travel_time = (d['resourceSets'][0]['resources'][0]['results'][0]['travelDuration'])/60
travel_time = float("{:.2f}".format(travel_time))
print ('travel time is {} hours.'.format(travel_time))

travel time is 2.69 hours.


In [18]:
W_lat = str(float("{:.5f}".format(W_latitude)))
W_long = str(float("{:.5f}".format(W_longitude)))
S_lat = str(float('{:.5f}'.format(S_latitude)))
S_long = str(float('{:.5f}'.format(S_longitude)))

bingkey='AtIG73fsNaAkmjh6G6-RGJJ4TtGEFva81aF6bOHrfjWzA4HvmRL2OBITa_udr65s'
start = '47.6044,-122.33456'
end = '45.5347,-122.6231'
Work = '{},{}'.format(W_lat,W_long)
School = '{},{}'.format(S_lat,S_long)
Home = '47.52997,-122.20971'

driving = 'driving'
public = 'transit'
path='https://dev.virtualearth.net/REST/v1/Routes/DistanceMatrix?origins={}&destinations={}&travelMode={}&key={}'.format(Home, School, driving, bingkey)
r = requests.get(path)

d=r.json()

In [19]:
Travel_time = d['resourceSets'][0]['resources'][0]['results'][0]['travelDuration']
Travel_time

23.483

In [20]:
location = '{},{}'.format(potential['Lat'].iloc[0],potential['Long'].iloc[0])
print(location)

path='https://dev.virtualearth.net/REST/v1/Routes/DistanceMatrix?origins={}&destinations={}&travelMode={}&key={}'.format(location, Work, driving, bingkey)
r = requests.get(path)
d=r.json()
Travel_time1 = d['resourceSets'][0]['resources'][0]['results'][0]['travelDuration']
Travel_time1

47.62448015,-122.20086735


23.983

In [21]:
path='https://dev.virtualearth.net/REST/v1/Routes/DistanceMatrix?origins={}&destinations={}&travelMode={}&key={}'.format(location, School, public, bingkey)
r = requests.get(path)
d=r.json()
Travel_time2 = d['resourceSets'][0]['resources'][0]['results'][0]['travelDuration']
Travel_time2

37

In [22]:
adding_times = pd.DataFrame()
adding_times['T_work'] = ''
adding_times['T_school']=''

bingkey='AtIG73fsNaAkmjh6G6-RGJJ4TtGEFva81aF6bOHrfjWzA4HvmRL2OBITa_udr65s'
Work = '{},{}'.format(W_lat,W_long)
School = '{},{}'.format(S_lat,S_long)
driving = 'driving'
public = 'transit'

for i in range(len(potential)):
    location = '{},{}'.format(potential['Lat'].iloc[i],potential['Long'].iloc[i])
    
    path='https://dev.virtualearth.net/REST/v1/Routes/DistanceMatrix?origins={}&destinations={}&travelMode={}&key={}'.format(location, Work, driving, bingkey)
    r = requests.get(path)
    d=r.json()
    Travel_time1 = d['resourceSets'][0]['resources'][0]['results'][0]['travelDuration']
    
    path='https://dev.virtualearth.net/REST/v1/Routes/DistanceMatrix?origins={}&destinations={}&travelMode={}&key={}'.format(location, School, public, bingkey)
    r = requests.get(path)
    d=r.json()
    Travel_time2 = d['resourceSets'][0]['resources'][0]['results'][0]['travelDuration']
    
    adding_times = adding_times.append({'T_work': Travel_time1, 'T_school': Travel_time2}, ignore_index=True)


potential = pd.merge(potential, adding_times, left_index = True, right_index = True, how = 'left')

In [23]:
potential.shape

(53, 10)

In [24]:
potential = potential[(potential != 0).all(1)]
potential = potential[(potential['T_work'] < 45)]
potential = potential[(potential['T_school'] < 45)]
potential = potential[(potential['T_school'] > 0)]
potential = potential.reset_index()
potential = potential.drop(['index'], axis = 1)

In [25]:
print(potential.shape)
potential

(25, 10)


Unnamed: 0,Code,City,County,Lat,Long,d_work,d_school,ok,T_work,T_school
0,98004,Bellevue,King,47.6245,-122.201,14.90493,5.564004,True,23.983,37.0
1,98008,Bellevue,King,47.6344,-122.125,16.3588,8.764045,True,28.183,43.0
2,98009,Bellevue,King,47.6101,-122.186,14.016691,6.618294,True,21.233,32.0
3,98015,Bellevue,King,47.6608,-122.324,17.785628,0.722659,True,24.717,11.0
4,98028,Kenmore,King,47.7569,-122.242,3.492606,19.814766,True,35.25,37.0
5,98043,Mountlake Terrace,Snohomish,47.7949,-122.314,7.507113,10.84388,True,37.283,36.0
6,98056,Renton,King,47.5168,-122.206,15.337432,13.825459,True,14.233,43.0
7,98087,Lynnwood,Snohomish,47.8556,-122.27,20.057007,3.05709,True,40.417,43.0
8,98089,Kent,King,47.3915,-122.294,19.201249,1.974525,True,11.333,42.0
9,98102,Seattle,King,47.6216,-122.321,17.059902,2.916639,True,25.567,18.0


### Compare with the model

In [26]:
CLIENT_ID = 'FD35ZLLUHS5JQBQBK0FDBZXI2ADUFCR02IOU51JMUXIUHDNL'
CLIENT_SECRET = '5WMJDA1OQP4XS0Q1NT5L0MBD32ELTSL32505WWPTTC53GV5O'
VERSION = '20200101'
LIMIT = 100
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

url1 = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, LIMIT)
results1 = requests.get(url1).json()

# assign relevant part of JSON to venues
venues = results1['response']['groups'][0]['items']
nearby_venues = pd.json_normalize(venues)

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.postalCode','venue.location.lat', 'venue.location.lng']
dataframe_filtered = nearby_venues.loc[:, filtered_columns]

Your credentails:
CLIENT_ID: FD35ZLLUHS5JQBQBK0FDBZXI2ADUFCR02IOU51JMUXIUHDNL
CLIENT_SECRET:5WMJDA1OQP4XS0Q1NT5L0MBD32ELTSL32505WWPTTC53GV5O


In [27]:
CLIENT_ID = 'FD35ZLLUHS5JQBQBK0FDBZXI2ADUFCR02IOU51JMUXIUHDNL'
CLIENT_SECRET = '5WMJDA1OQP4XS0Q1NT5L0MBD32ELTSL32505WWPTTC53GV5O'
VERSION = '20200101'
LIMIT = 100
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)


latitude = potential['Lat'].iloc[0]
longitude = potential['Long'].iloc[0]
url1 = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, LIMIT)
results2 = requests.get(url1).json()


Your credentails:
CLIENT_ID: FD35ZLLUHS5JQBQBK0FDBZXI2ADUFCR02IOU51JMUXIUHDNL
CLIENT_SECRET:5WMJDA1OQP4XS0Q1NT5L0MBD32ELTSL32505WWPTTC53GV5O


In [28]:
venues = results2['response']['groups'][0]['items']
new_venues = pd.json_normalize(venues)
filters = ['venue.name', 'venue.categories']
new_venues = new_venues.loc[:, filters]
#### testing
new_venues['Code']=potential['Code'].iloc[0]
###
x = str(new_venues.iloc[45]['venue.categories'][0]['name'])
print(x)

Steakhouse


In [29]:
data_cashe=pd.DataFrame()
data_cashe['type']=''

for i in range(len(new_venues)):
    data = str(new_venues.iloc[i]['venue.categories'][0]['name'])
    data_cashe = data_cashe.append({'type':data}, ignore_index=True)
new_venues = pd.merge(new_venues, data_cashe, left_index = True, right_index =True, how = 'left')
new_venues.shape

(100, 4)

In [30]:
newer_venues =  new_venues.groupby(['type']).count()
newer_venues['Code'] = new_venues['Code'].iloc[0]


newer_venues = newer_venues.sort_values(by='venue.name', axis=0,ascending=False)


newer_venues=newer_venues.reset_index()

### Limit the potential towns to ones that match our current town

In [31]:
fatou = pd.DataFrame()
#venues.transpose()
list1 = ('Coffee Shop', 'Mexican Restaurant', 'Grocery Store', 'Brewery', 'Pet Store', 'Café', 'Bakery', 'Supermarket', 'Trail', 'Park')
for i in range(len(newer_venues)):
    if newer_venues['type'].iloc[i] in list1:
        fatou = fatou.append(newer_venues.iloc[i])

In [None]:
CLIENT_ID = 'FD35ZLLUHS5JQBQBK0FDBZXI2ADUFCR02IOU51JMUXIUHDNL'
CLIENT_SECRET = '5WMJDA1OQP4XS0Q1NT5L0MBD32ELTSL32505WWPTTC53GV5O'
VERSION = '20200101'
LIMIT = 100
filters = ['venue.name', 'venue.categories']
fatou = pd.DataFrame()
list1 = ('Coffee Shop', 'Mexican Restaurant', 'Grocery Store', 'Brewery', 'Pet Store', 'Café', 'Bakery', 'Supermarket', 'Trail', 'Park')


for i in range(len(potential)):
    latitude = potential['Lat'].iloc[i]
    longitude = potential['Long'].iloc[i]
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, LIMIT)
    results = requests.get(url).json()
    
    venues = results2['response']['groups'][0]['items']
    new_venues = pd.json_normalize(venues)
    new_venues = new_venues.loc[:, filters]
    #### insert there
    new_venues['Code']=potential['Code'].iloc[i]

    data_cashe=pd.DataFrame()
    data_cashe['type']=''
    
    for i in range(len(new_venues)):
        data = str(new_venues.iloc[i]['venue.categories'][0]['name'])
        data_cashe = data_cashe.append({'type':data}, ignore_index=True)
           
    new_venues = pd.merge(new_venues, data_cashe, left_index = True, right_index =True, how = 'left')
    newer_venues =  new_venues.groupby(['type']).count()
    newer_venues['Code'] = new_venues['Code'].iloc[0]
    newer_venues=newer_venues.reset_index()
    
    for i in range(len(newer_venues)):
        
        if newer_venues['type'].iloc[i] in list1:
            fatou = fatou.append(newer_venues.iloc[i])

fatou.sort_values(by='venue.name', ascending = False)

fatou = fatou.merge(potential, left_on='Code',right_on='Code' )
fatou

### Make the map of the potential sites!

In [None]:
latitude = potential['Lat'].mean()
longitude = potential['Long'].mean()
Sea_Map= folium.Map(location=[latitude, longitude], zoom_start=10)
x = fatou


for lat, long, City, Code, travel1, travel2 in zip(x['Lat'], x['Long'], x['City'], x['Code'], x['T_work'], x['T_school']):
 label = '{}, {} minutes to work, {} Minutes to school'.format(City, travel1, travel2)
 label = folium.Popup(label, parse_html=True)
 folium.CircleMarker(
 [lat, long],
 radius=5,
 popup=label,
 color='green',
 fill=True,
 fill_color='green',
 parse_html=False).add_to(Sea_Map)

In [None]:
Sea_Map