# Capstone Project - Battle of the Neighborhoods
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

## Data <a name="data"></a>

## Methodology <a name="methodology"></a>

In [5]:
import json
import pandas as pd
import numpy as np
import folium
from geopy.geocoders import Nominatim
import matplotlib.cm as cm
import matplotlib.colors as colors
import requests
from sklearn.cluster import KMeans
import time

load the data from other notebooks

In [2]:
neighborhoods = pd.read_csv('neighborhoods.csv')
neighborhoods['Name_City'] = neighborhoods['Neighborhood'] + '_' + neighborhoods['City']
neighborhoods.head()

Unnamed: 0,City,State,Neighborhood,Latitude,Longitude,Home Ownership,Home Rentership,Home Value,Median Age,Population Denisty,Name_City
0,Phoenix,AZ,Camelback East,33.5017,-112.003014,0.466472,0.434634,197945,39.076757,6794.026486,Camelback East_Phoenix
1,Phoenix,AZ,South Mountain,33.3823,-112.047551,0.536819,0.375019,93301,29.738356,6029.60411,South Mountain_Phoenix
2,Phoenix,AZ,Estrella,33.4232,-112.197717,0.453543,0.434062,73498,26.823913,6495.667391,Estrella_Phoenix
3,Phoenix,AZ,Laveen,33.3654,-112.163788,0.691265,0.223038,105306,31.428571,2693.5,Laveen_Phoenix
4,Phoenix,AZ,North Mountain,33.5935,-112.100197,0.563129,0.359148,137746,37.183784,7012.575135,North Mountain_Phoenix


## Explore Ballard

Get Ballard's latitute and longitute values

In [3]:
idx = neighborhoods.loc[neighborhoods['Neighborhood']=='Ballard'].index[0]

In [4]:
neighborhood_latitude = neighborhoods.loc[idx, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = neighborhoods.loc[idx, 'Longitude'] # neighborhood longitude value

neighborhood_name = neighborhoods.loc[idx, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Ballard are 47.6665, -122.37605187700001.


Create the url for searching foursquare

In [5]:
CLIENT_ID = 'EMDJY1UYY44FU1RGBJIILG5OKSGJMZECJXGYWGT4F0VXDZCF' # your Foursquare ID
CLIENT_SECRET = 'VIW4VZ1FPA0XLH1NZVBARP4FOUTV4UGWF2W42D1XH0IAJGCU' # your Foursquare Secret
VERSION = '20180604'
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: EMDJY1UYY44FU1RGBJIILG5OKSGJMZECJXGYWGT4F0VXDZCF
CLIENT_SECRET:VIW4VZ1FPA0XLH1NZVBARP4FOUTV4UGWF2W42D1XH0IAJGCU


In [6]:
# limit search to 100 venues
limit = 100
# with 500 m of center of neighborhood
radius = 750
# create url
url = f'https://api.foursquare.com/v2/venues/explore?&client_id={CLIENT_ID}&client_secret={CLIENT_SECRET}&v={VERSION}&ll={neighborhood_latitude},{neighborhood_longitude}&radius={radius}&limit={limit}'


send the GET request

In [7]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e909803f7706a001bf8f618'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'},
    {'name': '$-$$$$', 'key': 'price'}]},
  'headerLocation': 'Ballard',
  'headerFullLocation': 'Ballard, Seattle',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 147,
  'suggestedBounds': {'ne': {'lat': 47.67325000675,
    'lng': -122.36604748879174},
   'sw': {'lat': 47.659749993249996, 'lng': -122.38605626520828}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4f1c60f5e4b04ae084158528',
       'name': "Reuben's Brews",
       'location': {'address': '5010 14th Ave NW',
        'crossStreet': 'at NW 51st St',
        'lat': 47.665398,
        'lng': -122.37327,
        '

In [8]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [9]:
venues = results['response']['groups'][0]['items']
    
# nearby_venues = json_normalize(venues) # flatten JSON
nearby_venues = pd.json_normalize(venues)

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Reuben's Brews,Brewery,47.665398,-122.37327
1,Lagunitas Seattle Taproom & Beer Sanctuary,Brewery,47.664548,-122.378057
2,Mighty-O Donuts,Donut Shop,47.668542,-122.378819
3,Stoup Brewing,Brewery,47.666551,-122.371277
4,Lux Pot Shop,Marijuana Dispensary,47.664877,-122.378686


In [28]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


### This is what makes Ballard Great

In [10]:
nearby_venues.categories.unique()

array(['Brewery', 'Donut Shop', 'Marijuana Dispensary', 'Pet Store',
       'Gaming Cafe', 'Toy / Game Store', 'Vegetarian / Vegan Restaurant',
       'Hot Dog Joint', 'Sandwich Place', 'Beer Bar', 'Yoga Studio',
       'Supermarket', 'Grocery Store', 'Seafood Restaurant',
       'Vietnamese Restaurant', 'Mexican Restaurant',
       'New American Restaurant', 'Clothing Store', 'Post Office',
       'Pizza Place', 'Food & Drink Shop', 'Gymnastics Gym', 'Gym',
       'Rock Club', 'Boutique', 'Coffee Shop', 'Noodle House',
       'Furniture / Home Store', 'Food Truck', 'Wine Bar',
       'Mediterranean Restaurant', 'Hotel', 'French Restaurant', 'Bar',
       'Farmers Market', 'Miscellaneous Shop', 'BBQ Joint',
       'Cocktail Bar', 'Ice Cream Shop', 'Garden Center',
       'Italian Restaurant', 'Spa', 'Sporting Goods Shop', 'Cupcake Shop',
       'Warehouse Store', 'Thai Restaurant', 'Sushi Restaurant',
       'Tea Room', 'Record Shop', 'Salon / Barbershop', 'Dessert Shop',
       'Movie

### Create a function for adding venues to dataframe

In [27]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
#         try:
        results = requests.get(url).json()["response"]['groups'][0]['items']
#         except Keyerror:
#             next
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

### Check Seattle, Portland, and Austin for right now

In [64]:
all_venues = pd.DataFrame(columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category'])

# for city in neighborhoods['City'].unique():
for city in ['Seattle', 'Portland', 'Austin']:
    city_neighborhoods = neighborhoods[neighborhoods['City'] == city]
    city_venue = getNearbyVenues(names=city_neighborhoods['Neighborhood'],
                                   latitudes=city_neighborhoods['Latitude'],
                                   longitudes=city_neighborhoods['Longitude'],
                                 radius = 1000
                                  )
    city_venue['City'] = city
    all_venues = pd.concat([all_venues, city_venue])
    time.sleep(10)
    #trying to pause for time
#     a = input('Insert Pause here for API to catch up')

Pinehurst
Brighton
Whittier Heights
Windermere
Loyal Heights
North Beach
Roosevelt
Pioneer Square
Westlake
Sand Point
South Park
Maple Leaf
Sunset Hill
Beacon Hill
Rainier Beach
Broadmoor
Madison Park
High Point
Interbay
View Ridge
Matthews Beach
Wedgwood
South Lake Union
Fauntleroy
Capitol Hill
First Hill
Arbor Heights
Northgate
Lower Queen Anne
Eastlake
Mount Baker
Haller Lake
Meadowbrook
Downtown
Admiral
North College Park
Queen Anne
Atlantic
Denny-Blaine
Madison Valley
Central District
International District
Industrial District
University District
Blue Ridge
Ballard
Portage Bay
Roxhill
North Delridge
Highland Park
Fremont
Wallingford
Hawthorne Hills
Greenwood
Leschi
Columbia City
Riverview
Montlake
Green Lake
Olympic Hills
Ravenna
Laurelhurst
Crown Hill
Madrona
Broadview
Bitter Lake
Seward Park
Olympic Manor
Bryant
South Delridge
Cedar Park
Victory Heights
Magnolia
Phinney Ridge
West Seattle
Belltown
Alki
Georgetown
Montavilla
Creston-Kenilworth
Downtown
Arbor Lodge
Buckman
Hillsda

In [24]:
city_neighborhoods = neighborhoods[neighborhoods['City'].isin(['Seattle', 'Portland', 'Austin'])]

In [59]:
# save so don't have to pull data again
all_venues.to_csv('all_venues.csv', index = False)

In [12]:
all_venues = pd.read_csv('all_venues.csv')
all_venues['Neighborhood_City'] = all_venues['Neighborhood'] + '_' + all_venues['City']

In [13]:
all_venues

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,City,Neighborhood_City
0,Pinehurst,47.7229,-122.319553,Java Jane,47.722454,-122.316508,Coffee Shop,Seattle,Pinehurst_Seattle
1,Pinehurst,47.7229,-122.319553,Jackson Park Golf Club,47.728095,-122.316520,Golf Course,Seattle,Pinehurst_Seattle
2,Pinehurst,47.7229,-122.319553,Northacres Park,47.721685,-122.328555,Park,Seattle,Pinehurst_Seattle
3,Pinehurst,47.7229,-122.319553,Chaiyo Thai,47.715476,-122.312733,Thai Restaurant,Seattle,Pinehurst_Seattle
4,Pinehurst,47.7229,-122.319553,Seattle Drum School,47.719770,-122.312640,Rock Club,Seattle,Pinehurst_Seattle
...,...,...,...,...,...,...,...,...,...
9535,Zilker,30.2553,-97.769013,Lamar Union Gym,30.255751,-97.761829,Gym / Fitness Center,Austin,Zilker_Austin
9536,Zilker,30.2553,-97.769013,Jim Jim's Water Ice,30.251314,-97.774607,Dessert Shop,Austin,Zilker_Austin
9537,Zilker,30.2553,-97.769013,Sports Fields and Dog Walk in Zilker Park,30.261789,-97.769873,Athletics & Sports,Austin,Zilker_Austin
9538,Zilker,30.2553,-97.769013,Fusion Fitness,30.254873,-97.761447,Gym,Austin,Zilker_Austin


In [14]:
# one hot encoding
all_cities_onehot = pd.get_dummies(all_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
all_cities_onehot['Neighborhood'] = all_venues['Neighborhood_City'] 
# all_cities_onehot['City'] = all_venues['City'] 

# move neighborhood column to the first column
fixed_columns = [all_cities_onehot.columns[-1]] + list(all_cities_onehot.columns[:-1])
all_cities_onehot = all_cities_onehot[fixed_columns]

all_cities_onehot.head()

Unnamed: 0,Zoo Exhibit,ATM,Accessories Store,Adult Boutique,African Restaurant,Airport,Airport Gate,Airport Service,Airport Terminal,Alternative Healer,...,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio,Zoo
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [33]:
all_cities_grouped = all_cities_onehot.groupby('Neighborhood').mean().reset_index()
all_cities_grouped

Unnamed: 0,Neighborhood,Zoo Exhibit,ATM,Accessories Store,Adult Boutique,African Restaurant,Airport,Airport Gate,Airport Service,Airport Terminal,...,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio,Zoo
0,Admiral_Seattle,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.00,0.000000,0.0
1,Alameda_Portland,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.00,0.000000,0.0
2,Alki_Seattle,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.00,0.000000,0.0
3,Allandale_Austin,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.00,0.000000,0.0
4,Arbor Heights_Seattle,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.00,0.000000,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
196,Windsor Road_Austin,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.00,0.047619,0.0
197,Woodlawn_Portland,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.00,0.030303,0.0
198,Woodstock_Portland,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.00,0.000000,0.0
199,Wooten_Austin,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.00,0.014085,0.0


In [34]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [35]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = all_cities_grouped['Neighborhood']

for ind in np.arange(all_cities_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(all_cities_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Admiral_Seattle,Coffee Shop,Pub,Salon / Barbershop,Grocery Store,Pizza Place,Park,American Restaurant,Market,Flower Shop,Frozen Yogurt Shop
1,Alameda_Portland,Optical Shop,Garden Center,Italian Restaurant,Rental Car Location,Park,Bus Line,Pet Store,Coffee Shop,Soccer Field,Automotive Shop
2,Alki_Seattle,Coffee Shop,Ice Cream Shop,Trail,Park,Food Truck,Seafood Restaurant,Beach,Thai Restaurant,Mexican Restaurant,Burger Joint
3,Allandale_Austin,Bar,Food Truck,Baseball Field,Supermarket,Dive Bar,Bakery,Garden Center,New American Restaurant,Beer Bar,Fried Chicken Joint
4,Arbor Heights_Seattle,Other Repair Shop,Pool,Home Service,Scenic Lookout,Trail,Factory,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School


Add the demographic information to the top venues

Use the 7 clusters found in the Seattle area

In [38]:
city_neighborhoods.head()

Unnamed: 0,City,State,Neighborhood,Latitude,Longitude,Home Ownership,Home Rentership,Home Value,Median Age,Population Denisty,Name_City
15,Portland,OR,Montavilla,45.5202,-122.578332,0.561351,0.376216,225353,36.86,7654.348,Montavilla_Portland
16,Portland,OR,Creston-Kenilworth,45.4937,-122.619948,0.448566,0.48963,237699,37.0,10128.109091,Creston-Kenilworth_Portland
17,Portland,OR,Downtown,45.514,-122.678656,0.139921,0.758449,441455,39.092308,12555.6,Downtown_Portland
18,Portland,OR,Arbor Lodge,45.572,-122.692011,0.634859,0.303814,215261,37.705556,7022.938889,Arbor Lodge_Portland
19,Portland,OR,Buckman,45.5175,-122.653586,0.228697,0.692664,415137,35.766667,9352.073333,Buckman_Portland


In [39]:
# add the census demographics back to data using pd merge
all_cities_grouped = pd.merge(left = all_cities_grouped, 
                              right = city_neighborhoods[[
                                  'Name_City','Home Ownership', 'Home Rentership', 'Home Value','Median Age', 'Population Denisty']],
                             left_on = 'Neighborhood', 
                             right_on = 'Name_City',
                             how = 'left')
all_cities_grouped.head()

In [41]:
all_cities_grouped.head()

Unnamed: 0,Neighborhood,Zoo Exhibit,ATM,Accessories Store,Adult Boutique,African Restaurant,Airport,Airport Gate,Airport Service,Airport Terminal,...,Wings Joint,Women's Store,Yoga Studio,Zoo,Name_City,Home Ownership,Home Rentership,Home Value,Median Age,Population Denisty
0,Admiral_Seattle,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,Admiral_Seattle,0.610664,0.327914,531914,43.736842,7051.247368
1,Alameda_Portland,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,Alameda_Portland,0.753217,0.208982,423277,42.245833,8170.625
2,Alki_Seattle,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,Alki_Seattle,0.530444,0.401738,618466,45.528571,7179.071429
3,Allandale_Austin,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,Allandale_Austin,0.471473,0.472916,223863,40.034783,5082.065217
4,Arbor Heights_Seattle,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,Arbor Heights_Seattle,0.732925,0.208749,394737,44.683333,5354.6


In [42]:
all_cities_grouped_clustering = all_cities_grouped.drop(['Neighborhood', 'Name_City'], 1)


In [43]:
# set number of clusters
kclusters = 7



# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(all_cities_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 4, 6, 5, 4, 5, 0, 0, 4, 0], dtype=int32)

In [44]:
address = 'Seattle, WA'

geolocator = Nominatim(user_agent="sea_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Seattle are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Seattle are 47.6038321, -122.3300624.


In [57]:
# add clustering labels
# neighborhoods_venues_sorted.drop(columns = 'Cluster Labels', axis = 1, inplace = True)
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_ +1) # moving cluster labels up by 1 so it looks nicer in the Legend

all_cities_merged = neighborhoods

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
all_cities_merged = all_cities_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Name_City')

all_cities_merged.head() # check the last columns!

Unnamed: 0,City,State,Neighborhood,Latitude,Longitude,Home Ownership,Home Rentership,Home Value,Median Age,Population Denisty,...,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Phoenix,AZ,Camelback East,33.5017,-112.003014,0.466472,0.434634,197945,39.076757,6794.026486,...,,,,,,,,,,
1,Phoenix,AZ,South Mountain,33.3823,-112.047551,0.536819,0.375019,93301,29.738356,6029.60411,...,,,,,,,,,,
2,Phoenix,AZ,Estrella,33.4232,-112.197717,0.453543,0.434062,73498,26.823913,6495.667391,...,,,,,,,,,,
3,Phoenix,AZ,Laveen,33.3654,-112.163788,0.691265,0.223038,105306,31.428571,2693.5,...,,,,,,,,,,
4,Phoenix,AZ,North Mountain,33.5935,-112.100197,0.563129,0.359148,137746,37.183784,7012.575135,...,,,,,,,,,,


In [63]:
all_cities_merged = all_cities_merged[all_cities_merged['City'].isin(['Portland', 'Austin', 'Seattle'])]

map the clusters with choropleth to indicate the cluster. add in a tooltip so the neighborhood can be identified

In [64]:
with open('Austin_geo.json') as json_data:
    austin_geo = json.load(json_data)

# with open('Denver_geo.json') as json_data:
#     denver_geo = json.load(json_data)

# with open('Phoenix_geo.json') as json_data:
#     phoenix_geo = json.load(json_data)

with open('Portland_geo.json') as json_data:
    portland_geo = json.load(json_data)

with open('Seattle_geo.json') as json_data:
    seattle_geo = json.load(json_data)

In [98]:
def choropleth_cluster_map(latitude, longitude, data, col_labels, geo_json):
    map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11,tiles='cartodbpositron')
    folium.Choropleth(
    geo_data = geo_json,
    data = data,
    columns = col_labels,
    key_on = 'feature.id',
    fill_color = 'Set1',
    line_opacty = 0.2,
    legend_name = 'Cluster'
    
    ).add_to(map_clusters)
    
    for lat, lon, poi, cluster in zip(data['Latitude'], data['Longitude'], data['Neighborhood'], data['Cluster Labels']):
        label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
        folium.CircleMarker(
            [lat, lon],
            radius=3,
            popup=label,
            color='black',
            fill=False).add_to(map_clusters)
    # folium.LayerControl().add_to(map_clusters)

    return map_clusters

In [102]:
for city, geo_json, state in zip(['Seattle', 'Austin', 'Portland'], 
                                 [seattle_geo, austin_geo, portland_geo], 
                                 ['WA', 'TX', 'OR']):
    address = city + ', ' + state

    geolocator = Nominatim(user_agent="map_explorer")
    location = geolocator.geocode(address)
    latitude = location.latitude
    longitude = location.longitude
    print(f'The geograpical coordinate of {address} are {latitude}, {longitude}.')

    choropleth_cluster_map(latitude, longitude, 
                   all_cities_merged[all_cities_merged['City'] == city],
                  ['Neighborhood', 'Cluster Labels'],
                  geo_json)
    

The geograpical coordinate of Seattle, WA are 47.6038321, -122.3300624.
The geograpical coordinate of Austin, TX are 30.2711286, -97.7436995.
The geograpical coordinate of Portland, OR are 45.5202471, -122.6741949.


In [114]:
address = 'Seattle, WA'

geolocator = Nominatim(user_agent="sea_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(f'The geograpical coordinate of {address} are {latitude}, {longitude}.')

choropleth_cluster_map(latitude, longitude, 
                       all_cities_merged[all_cities_merged['City'] == 'Seattle'],
                      ['Neighborhood', 'Cluster Labels'],
                      seattle_geo)

The geograpical coordinate of Seattle, WA are 47.6038321, -122.3300624.


In [116]:
address = 'Austin, TX'

geolocator = Nominatim(user_agent="sea_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(f'The geograpical coordinate of {address} are {latitude}, {longitude}.')

choropleth_cluster_map(latitude, longitude, 
                       all_cities_merged[all_cities_merged['City'] == 'Austin'],
                      ['Neighborhood', 'Cluster Labels'],
                      austin_geo).save('austin.html')

The geograpical coordinate of Austin, TX are 30.2711286, -97.7436995.


In [106]:
address = 'Portland, OR'

geolocator = Nominatim(user_agent="sea_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(f'The geograpical coordinate of {address} are {latitude}, {longitude}.')

choropleth_cluster_map(latitude, longitude, 
                       all_cities_merged[all_cities_merged['City'] == 'Portland'],
                      ['Neighborhood', 'Cluster Labels'],
                      portland_geo)

The geograpical coordinate of Portland, OR are 45.5202471, -122.6741949.


In [113]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11,tiles='cartodbpositron')

folium.Choropleth(
    geo_data = seattle_geo,
    data = seattle_merged,
    columns = ['Neighborhood', 'Cluster Labels'],
    key_on = 'feature.id',
    fill_color = 'Set1',
    line_opacty = 0.2,
    legend_name = 'Cluster'
    
).add_to(map_clusters)

for lat, lon, poi, cluster in zip(seattle_merged['Latitude'], seattle_merged['Longitude'], seattle_merged['Neighborhood'], seattle_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=3,
        popup=label,
        color='black',
        fill=False).add_to(map_clusters)
# folium.LayerControl().add_to(map_clusters)

map_clusters


NameError: name 'seattle_merged' is not defined


<filename="./austin.html" />

display(HTML('<iframe src=' + austin.html + ' width=100% height=1000></iframe>'))



## Analysis <a name="analysis"></a>

## Results and Discussion <a name="results"></a>

## Conclusion <a name="conclusion"></a>