# Chicago and Detroit Neighborhood Similarity Study

### IBM Data Science Professional Certificate Capstone

## Introduction
Detroit and Chicago are two of the largest metropolitan areas in the Great Lakes region of the United States.  Many of the region’s larger businesses have ties to both cities, and a large portion of Chicago’s population originates from Michigan, specifically the metro Detroit region.

Having lived in Chicago for over 10 years, I’m relocating to the Detroit area to be closer to family and friends.  I’d like to identify neighborhoods in the Detroit area that are comparable to those in Chicago to help decide where to relocate.

This information may also be used to help tourism in the Detroit area or to attract younger talent and recent college graduates to the area.  Additionally, it may be useful for anyone else potentially interested in relocating from Chicago to Detroit or vice versa.

## Data
Since Chicago is substantially larger by population than Detroit, and most of the area’s economic growth in recent years has been in Oakland County, MI, this analysis includes neighborhoods from both Wayne County (which includes Detroit) and Oakland County.  Because many areas in Wayne and Oakland counties are suburban, Cook County, IL is included alongside Chicago for comparison.  Ann Arbor is also included because of its proximity to Detroit and its economic diversity.

### Data Requirements
   1. List of neighborhoods
   2. Geographical coordinates for each neighborhood
   3. Foursquare venue data
   4. US Census demographic data
   5. Walkscore data
 
## Methodology
This study clusters neighborhoods on three primary data sets (venue data, demographic data, and walkscore data) using k-means clustering.  An iterative approach is used: the neighborhoods are clustered on one data set, and the result is then re-clustered on the next.  More specifically, the data was clustered on a first data set (venues), those results were then clustered on a second data set (demographics), and those results were clustered once more on a third data set (walkscore).
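
The iterative approach can be sketched roughly as follows.  This is a minimal illustration on synthetic data, not the pipeline used in this notebook: the helper `refine_by_cluster`, the column names (`venue_a`, `median_age`, `walkscore`, and so on), the choice of three clusters per stage, and the use of `StandardScaler` are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def refine_by_cluster(df, feature_cols, target, n_clusters=3, random_state=0):
    """Cluster rows on feature_cols and keep only the rows in target's cluster."""
    k = min(n_clusters, len(df))  # cannot ask for more clusters than rows
    X = StandardScaler().fit_transform(df[feature_cols])
    labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
    target_label = labels[df.index.get_loc(target)]
    return df[labels == target_label]


# Synthetic stand-in data: 30 hypothetical neighborhoods, indexed by name.
rng = np.random.default_rng(0)
names = ['target'] + ['nbhd_{}'.format(i) for i in range(29)]
data = pd.DataFrame(
    rng.random((30, 5)),
    index=names,
    columns=['venue_a', 'venue_b', 'median_age', 'median_income', 'walkscore'],
)

# One feature group per stage: venues, then demographics, then walkscore.
stages = [
    ['venue_a', 'venue_b'],
    ['median_age', 'median_income'],
    ['walkscore'],
]

candidates = data
for cols in stages:
    candidates = refine_by_cluster(candidates, cols, target='target')

print(candidates.index.tolist())
```

Each stage can only shrink the candidate set, and the target neighborhood always survives because the filter keeps whichever cluster it lands in.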

## Results and Discussion
This study originally called for using only Foursquare venue and walkscore data.  However, the resulting clusters didn’t seem to capture the character of the neighborhoods, so other data was sought to better match their general “vibe”.  Demographic data was added and seems to have been what was needed.  Other data that could have been used includes population density, public transit data, housing units, housing types, and crime statistics.

In addition, if some metrics matter more than others to a stakeholder, they could be weighted to have a greater or lesser impact on the clustering.  For example, if the stakeholder wanted to be close to the water, venue categories such as Harbor or Lakefront could be boosted by a multiplier to give them greater weight in the clustering.  Likewise, if the stakeholder was less interested in certain aspects of a neighborhood, those aspects could be scaled down by a multiplier to give them less weight.  Other metrics could be weighted in the same way.
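
A minimal sketch of such weighting is shown here, assuming the features have already been scaled to comparable ranges.  The helper `apply_weights`, the sample values, and the multipliers are hypothetical and were not part of this study.

```python
import pandas as pd


def apply_weights(features, weights):
    """Return a copy of features with the named columns multiplied by their weights."""
    weighted = features.copy()
    for column, factor in weights.items():
        weighted[column] = weighted[column] * factor
    return weighted


# Hypothetical, already-scaled venue frequencies for three neighborhoods.
features = pd.DataFrame(
    {'Harbor': [0.10, 0.00, 0.05], 'Sports Bar': [0.20, 0.40, 0.10]},
    index=['A', 'B', 'C'],
)

# Boost waterfront venues and damp sports bars before clustering.
weighted = apply_weights(features, {'Harbor': 3.0, 'Sports Bar': 0.5})
print(weighted)
```

Because k-means relies on Euclidean distance, multiplying a column by a factor `w` scales its contribution to the squared distance by `w**2`.  The weights should be applied after any standardization step, since standardizing afterwards would undo them.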

## Conclusion
This study gathered venue, demographic, and walkscore data and clustered the neighborhoods iteratively, narrowing and refining the cluster of interest with each iteration.  The four resulting Detroit-area neighborhoods are good candidates for being similar to the Lake View neighborhood in Chicago.  As such, the model seems to be successful for the purposes of this study.

In [1]:
import json
import pandas as pd
import numpy as np

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import folium # map rendering library

import requests
from pandas import json_normalize # top-level import; the pandas.io.json path is deprecated since pandas 1.0

# import k-means from clustering stage
from sklearn.cluster import KMeans

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

from bs4 import BeautifulSoup

import time
import csv

from sklearn import preprocessing

### Get Detroit Neighborhood Data

In [2]:
!wget -q -O 'detroit_data.json' https://opendata.arcgis.com/datasets/a25b7114d233496eaece59a23e31f4b2_0.geojson

In [3]:
with open('detroit_data.json') as json_data:
    detroit_data = json.load(json_data)

In [4]:
neighborhood_data = detroit_data['features']
neighborhood_data[0]

{'type': 'Feature',
 'properties': {'FID': 1,
  'NHOOD': 'Airport',
  'NHOODNUM': 1001,
  'KEYNUM': 2,
  'Shape_Leng': 49431.1170068,
  'Shape_Area': 129268149.654},
 'geometry': {'type': 'Polygon',
  'coordinates': [[[-82.9923768091644, 42.3933292805575],
    [-82.9927661361009, 42.3931664389709],
    [-82.9932767216074, 42.3929640854821],
    [-82.9935664512283, 42.3928610479945],
    [-82.9937360889373, 42.3928002669432],
    [-82.9938115456829, 42.3927765923539],
    [-82.9941109390184, 42.3926809546713],
    [-82.9946041282569, 42.392543038472],
    [-82.9949931926692, 42.3924475705616],
    [-82.9954843914603, 42.3923442294892],
    [-82.995943036502, 42.3922687223186],
    [-82.9963263641338, 42.3922105154438],
    [-82.9967192168154, 42.3921660823714],
    [-82.997196749785, 42.3921254009395],
    [-82.9975792576828, 42.3921045156645],
    [-82.9980598151953, 42.3920939166085],
    [-82.999629623018, 42.3921009853528],
    [-83.0000987879278, 42.3921057213616],
    [-83.0006060

### Create DataFrame

In [5]:
col_names = ['Neighborhood', 'Latitude', 'Longitude']
neighborhoods = pd.DataFrame(columns=col_names)
neighborhoods

|  | Neighborhood | Latitude | Longitude |
|---|---|---|---|


### Populate DataFrame with Detroit Data

In [7]:
for data in neighborhood_data:
   
    neighborhood_name = data['properties']['NHOOD']   
    neighborhood_latlon = data['geometry']['coordinates']
    df_temp = pd.DataFrame(columns=['Lat', 'Long'])
    
    temp = str(neighborhood_latlon).split(',')
    
    iterations = int(len(temp)/2)
    i=0
    while i < iterations:
        df_temp.loc[i] = temp[i+1].replace(']',''), temp[i].replace('[', '')
        i = i + 2

    neighborhood_lat = df_temp['Lat'].astype(float).mean()
    neighborhood_lon = df_temp['Long'].astype(float).mean()
    
    neighborhoods = neighborhoods.append({'Neighborhood': neighborhood_name+', Detroit, MI',
                                                  'Latitude': neighborhood_lat,
                                                  'Longitude': neighborhood_lon}, ignore_index=True)   
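
As far as I can tell, the loop above steps `i` by two but stops at half the length of the split list, so it averages roughly the first half of each boundary's vertices.  A possible alternative, sketched below, walks the nested GeoJSON coordinates directly and averages every vertex.  The helpers `iter_positions` and `vertex_mean` are hypothetical names, and the sample square is made up for the demonstration.

```python
import numpy as np


def iter_positions(coords):
    """Yield every [lon, lat] pair from arbitrarily nested GeoJSON coordinates."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords[:2]
    else:
        for part in coords:
            yield from iter_positions(part)


def vertex_mean(geometry):
    """Return (lat, lon) as the plain mean of all boundary vertices."""
    points = np.array(list(iter_positions(geometry['coordinates'])), dtype=float)
    lon, lat = points.mean(axis=0)
    return lat, lon


# Made-up square used only to exercise the helper.
square = {'type': 'Polygon',
          'coordinates': [[[0.0, 0.0], [2.0, 0.0], [2.0, 2.0], [0.0, 2.0]]]}

center_lat, center_lon = vertex_mean(square)
print(center_lat, center_lon)
```

The same walk handles both `Polygon` and `MultiPolygon` geometries.  Note that the mean of the vertices is only an approximation of the area centroid, which is adequate for placing a marker but is biased toward densely digitized stretches of a boundary.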

In [8]:
neighborhoods.sort_values(by='Neighborhood').head()

|  | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Airport, Detroit, MI | 42.388475 | -83.025065 |
| 1 | Bagley, Detroit, MI | 42.422256 | -83.171482 |
| 2 | Boynton, Detroit, MI | 42.264908 | -83.164444 |
| 3 | Brightmoor, Detroit, MI | 42.384513 | -83.248953 |
| 4 | Brooks, Detroit, MI | 42.344826 | -83.204472 |


In [9]:
neighborhoods.shape

(54, 3)

### Create Map of Detroit Data

In [10]:
address = 'Detroit, MI'

geolocator = Nominatim(user_agent="det_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of '+address+' are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Detroit, MI are 42.3315509, -83.0466403.


In [11]:
# create map of Detroit using latitude and longitude values
map_detroit = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(neighborhoods['Latitude'].astype(float), neighborhoods['Longitude'].astype(float), neighborhoods['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_detroit)  
    
map_detroit

### Get Wayne County Data

In [13]:
source = requests.get('https://geographic.org/streetview/usa/mi/wayne/index.html').text
source[0:1000]

'ï»¿<!DOCTYPE html>\n<html>\n  <head>\n    <meta charset="UTF-8">\n<META NAME="Description"  CONTENT="Wayne County, Michigan, United States, maps, List of Towns and Cities, Street View, Geographic.org">\n<META NAME="keywords"  CONTENT="Wayne County, Michigan, List of Towns and Cities, maps, United States, Street View, Geographic.org">\n    <title>List of Towns and Cities in Wayne County, Michigan, United States, Maps and Steet Views, Geographic.org</title>\n    <meta name="viewport" content="width=device-width; initial-scale=1.0; maximum-scale=1.0;">\n\t<link rel="canonical" href="https://geographic.org/streetview/usa/mi/wayne/index.html">\n\n<!--Taboola Head Section-->\n    <script type="text/javascript">\n        window._taboola = window._taboola || [];\n        _taboola.push({article:\'auto\'});\n        !function (e, f, u) {\n            e.async = 1;\n            e.src = u;\n            f.parentNode.insertBefore(e, f);\n        }(document.createElement(\'script\'),\n            doc
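
The leading `ï»¿` in the output above is what a UTF-8 byte-order mark looks like when its three bytes are decoded as Latin-1, which suggests the response text was decoded with the wrong encoding.  The sketch below reproduces the effect offline on a short byte string (a stand-in for the page content) and shows that the `utf-8-sig` codec strips the mark.

```python
# Stand-in for the first bytes of the downloaded page: a UTF-8 BOM plus markup.
raw = b'\xef\xbb\xbf<!DOCTYPE html>'

as_latin1 = raw.decode('latin-1')   # reproduces the stray characters
as_utf8 = raw.decode('utf-8-sig')   # drops the byte-order mark

print(repr(as_latin1[:3]))
print(as_utf8)

# With requests, one option is to decode the raw bytes explicitly:
#   source = requests.get(url).content.decode('utf-8-sig')
```

BeautifulSoup parsed the page successfully either way, so this only matters if the stray characters end up in extracted text.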

In [14]:
#Parse data
soup = BeautifulSoup(source,'lxml')

table = soup.find('span', class_='listspan').text
table
table_split = table.split('\n')

#Make pandas dataframe
table_df = pd.DataFrame({'City':table_split})

#Replace empty values with NaN
table_df.replace('', np.nan, inplace=True)

#Replace Detroit with NaN in prep to drop, we have the neighborhoods in another dataset
table_df.replace('Detroit', np.nan, inplace=True)

#Add, MI to city name
table_df['City'] = table_df['City']+', MI'

#Drop NaN
table_df.dropna(inplace=True)

#Reset index
table_df.reset_index(drop=True, inplace=True)
               
table_df.head()

|  | City |
|---|---|
| 0 | Allen Park, MI |
| 1 | Belleville, MI |
| 2 | Canton, MI |
| 3 | Dearborn, MI |
| 4 | Dearborn Heights, MI |


### Add Wayne County to DataFrame

In [15]:
#Let's add wayne county cities to dataframe

for neighborhood in table_df['City']:
    address = neighborhood.replace(', MI', '') +', Michigan'
    geolocator = Nominatim(user_agent="wayne_explorer")
    location = geolocator.geocode(address)
    
    try:
        latitude = location.latitude
    except:
        latitude = np.nan
    
    try:
        longitude = location.longitude
    except:
        longitude = np.nan 
        
    neighborhoods = neighborhoods.append({'Neighborhood': neighborhood, 
                                    'Latitude': latitude, 
                                    'Longitude': longitude}, ignore_index=True)
#Drop NaN
neighborhoods.dropna(inplace=True)

#Reset index
neighborhoods.reset_index(drop=True, inplace=True)

neighborhoods.tail()

|  | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 80 | Taylor, MI | 42.240872 | -83.269651 |
| 81 | Trenton, MI | 42.140655 | -83.180054 |
| 82 | Wayne, MI | 42.268241 | -83.284417 |
| 83 | Westland, MI | 42.323806 | -83.400532 |
| 84 | Wyandotte, MI | 42.200662 | -83.151016 |
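
The geocoding loop is repeated for each county in this notebook, so it could be factored into one helper.  The sketch below is one possible shape for it, not the code that produced the results shown here.  It collects rows in a list instead of calling `DataFrame.append` (which was removed in pandas 2.0) and accepts any geocoding callable, so it can be exercised offline.  The names `geocode_names` and `fake_geocode` are hypothetical.

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd


def geocode_names(names, geocode):
    """Geocode each name with the supplied callable and return one DataFrame."""
    rows = []
    for name in names:
        try:
            location = geocode(name)
        except Exception:  # e.g. a timeout; treat it as "not found"
            location = None
        rows.append({
            'Neighborhood': name,
            'Latitude': location.latitude if location is not None else np.nan,
            'Longitude': location.longitude if location is not None else np.nan,
        })
    return pd.DataFrame(rows, columns=['Neighborhood', 'Latitude', 'Longitude'])


# Offline stand-in for a real geocoder so the sketch runs without network access.
def fake_geocode(address):
    known = {'Taylor, MI': SimpleNamespace(latitude=42.240872, longitude=-83.269651)}
    return known.get(address)


found = geocode_names(['Taylor, MI', 'Nowhere, MI'], fake_geocode).dropna()
print(found)

# With geopy, the callable could be built once, outside the loop, for example:
#   from geopy.extra.rate_limiter import RateLimiter
#   geolocator = Nominatim(user_agent="wayne_explorer")
#   geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
```

The result could then be combined with the existing frame using `pd.concat([neighborhoods, found], ignore_index=True)`.  Unlike the loops in this notebook, the helper passes each name to the geocoder unchanged, so any rewriting of the address (such as replacing `, MI` with `, Michigan`) would need to happen in the callable.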


### Get Oakland County Data

In [16]:
#Get Data from website
source = requests.get('https://geographic.org/streetview/usa/mi/oakland/index.html').text
source[0:1000]

'ï»¿<!DOCTYPE html>\n<html>\n  <head>\n    <meta charset="UTF-8">\n<META NAME="Description"  CONTENT="Oakland County, Michigan, United States, maps, List of Towns and Cities, Street View, Geographic.org">\n<META NAME="keywords"  CONTENT="Oakland County, Michigan, List of Towns and Cities, maps, United States, Street View, Geographic.org">\n    <title>List of Towns and Cities in Oakland County, Michigan, United States, Maps and Steet Views, Geographic.org</title>\n    <meta name="viewport" content="width=device-width; initial-scale=1.0; maximum-scale=1.0;">\n\t<link rel="canonical" href="https://geographic.org/streetview/usa/mi/oakland/index.html">\n\n<!--Taboola Head Section-->\n    <script type="text/javascript">\n        window._taboola = window._taboola || [];\n        _taboola.push({article:\'auto\'});\n        !function (e, f, u) {\n            e.async = 1;\n            e.src = u;\n            f.parentNode.insertBefore(e, f);\n        }(document.createElement(\'script\'),\n       

In [17]:
#Parse data
soup = BeautifulSoup(source,'lxml')

table = soup.find('span', class_='listspan').text
table
table_split = table.split('\n')

#Make pandas dataframe
table_df = pd.DataFrame({'City':table_split})

#Replace empty values with NaN
table_df.replace('', np.nan, inplace=True)

#Add, MI to city name
table_df['City'] = table_df['City']+', MI'

#Drop NaN
table_df.dropna(inplace=True)

#Reset index
table_df.reset_index(drop=True, inplace=True)
               
table_df.head()

|  | City |
|---|---|
| 0 | Auburn Hills, MI |
| 1 | Berkley, MI |
| 2 | Birmingham, MI |
| 3 | Bloomfield Hills, MI |
| 4 | Clarkston, MI |


### Add Oakland County to DataFrame

In [18]:
#Let's add oakland county cities to dataframe

for neighborhood in table_df['City']:
    address = neighborhood.replace(', MI', '') +', Michigan'
    geolocator = Nominatim(user_agent="oakland_explorer")
    location = geolocator.geocode(address)
    
    try:
        latitude = location.latitude
    except:
        latitude = np.nan
    
    try:
        longitude = location.longitude
    except:
        longitude = np.nan 
        
    neighborhoods = neighborhoods.append({'Neighborhood': neighborhood, 
                                    'Latitude': latitude, 
                                    'Longitude': longitude}, ignore_index=True)
#Drop NaN
neighborhoods.dropna(inplace=True)

#Drop out of area results from geolocator
neighborhoods.drop(neighborhoods[neighborhoods['Longitude'] < -84].index, inplace=True)
neighborhoods.drop(neighborhoods[neighborhoods['Latitude'] > 43].index, inplace=True)

#Reset index
neighborhoods.reset_index(drop=True, inplace=True)

neighborhoods.tail()

|  | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 118 | Union Lake, MI | 42.60651 | -83.431046 |
| 119 | Walled Lake, MI | 42.537811 | -83.481048 |
| 120 | Waterford, MI | 42.702253 | -83.402718 |
| 121 | White Lake, MI | 42.691696 | -83.55411 |
| 122 | Wixom, MI | 42.524773 | -83.536334 |
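
The two `drop` calls above remove geocoder results that fall west of longitude -84 or north of latitude 43.  A bounding-box filter expresses the same idea with all four sides stated explicitly.  This is only a sketch: the helper `within_bbox`, the southern and eastern bounds, and the out-of-area sample row are assumptions chosen for illustration.

```python
import pandas as pd


def within_bbox(df, lat_min, lat_max, lon_min, lon_max):
    """Keep only rows whose coordinates fall inside the bounding box."""
    lat = df['Latitude'].astype(float)
    lon = df['Longitude'].astype(float)
    mask = lat.between(lat_min, lat_max) & lon.between(lon_min, lon_max)
    return df[mask].reset_index(drop=True)


sample = pd.DataFrame({
    'Neighborhood': ['Wixom, MI', 'Misplaced result'],
    'Latitude': [42.524773, 46.5],
    'Longitude': [-83.536334, -87.4],
})

# Northern and western bounds match the drops above; the others are illustrative.
kept = within_bbox(sample, lat_min=42.0, lat_max=43.0, lon_min=-84.0, lon_max=-82.4)
print(kept)
```

A box like this would need to be applied per region, before the Michigan and Illinois rows are combined, since a single box cannot cover both metro areas.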


In [19]:
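# create map of metro Detroit, centered on the mean of the plotted coordinates
# (latitude and longitude still hold the result of the last geocoding call above)
map_detroit_metro = folium.Map(location=[neighborhoods['Latitude'].astype(float).mean(), neighborhoods['Longitude'].astype(float).mean()], zoom_start=10)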

# add markers to map
for lat, lng, neighborhood in zip(neighborhoods['Latitude'].astype(float), neighborhoods['Longitude'].astype(float), neighborhoods['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_detroit_metro)  
    
map_detroit_metro

### Get Ann Arbor Data

In [20]:
#Get Data from website
source = requests.get('https://annarborobserver.com/cg/t1300.html').text
source[0:1000]

'\n\n\n\n<!-- t1300.tpl -->\n\n<!-- global-head.cpp -->\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n<!-- activity-head.cpp - an Ann Arbor Observer City Guide page -->\n\n<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" \n          "https://www.w3.org/TR/html4/strict.dtd">\n\n<html xmlns="https://www.w3.org/1999/xhtml"\n      xmlns:og="https://ogp.me/ns#"\n      xmlns:fb="https://www.facebook.com/2008/fbml">\n\n<head>\n<title>\n\n\nAnn Arbor Neighborhoods\n\n- Ann Arbor Observer</title>\n\n<!-- Start of Google Analytics -->\n<script>\n  (function(i,s,o,g,r,a,m){i[\'GoogleAnalyticsObject\']=r;i[r]=i[r]||function(){\n  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),\n  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)\n  })(window,document,\'script\',\'https://www.google-analytics.com/analytics.js\',\'ga\');\n\n  ga(\'create\', \'UA-11125878-1\', \'auto\');\n  ga(\'send\', \'pageview\');\n\n</script>\n<!-- End of Googl

In [21]:
#Parse data
soup = BeautifulSoup(source,'lxml')

table = soup.find('select').text
table

table_split = table.split('\n\n')
table_split

#Make pandas dataframe
table_df = pd.DataFrame({'Neighborhood':table_split})

#Replace empty values with NaN
table_df.replace('', np.nan, inplace=True)

#Set the first row to NaN in prep to drop
table_df.loc[0] = np.nan

#Drop NaN
table_df.dropna(inplace=True)

#Reset index
table_df.reset_index(drop=True, inplace=True)

#Add, Ann Arbor, MI to city name
table_df['Neighborhood'] = table_df['Neighborhood']+', Ann Arbor, MI'
               
table_df.head()

|  | Neighborhood |
|---|---|
| 0 | Abbot, Ann Arbor, MI |
| 1 | Allen, Ann Arbor, MI |
| 2 | Angell, Ann Arbor, MI |
| 3 | Bach, Ann Arbor, MI |
| 4 | Bryant-Pattengill, Ann Arbor, MI |


### Add Ann Arbor to DataFrame

In [22]:
#Add Ann Arbor neighborhoods and coordinates to dataframe

for neighborhood in table_df['Neighborhood']:
    address = neighborhood
    geolocator = Nominatim(user_agent="aa_explorer")
    location = geolocator.geocode(address)
    
    try:
        latitude = location.latitude
    except:
        latitude = np.nan
    
    try:
        longitude = location.longitude
    except:
        longitude = np.nan 
        
    neighborhoods = neighborhoods.append({'Neighborhood': neighborhood, 
                                    'Latitude': latitude, 
                                    'Longitude': longitude}, ignore_index=True)
#Drop NaN
neighborhoods.dropna(inplace=True)

#Reset index
neighborhoods.reset_index(drop=True, inplace=True)

neighborhoods.tail()

|  | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 139 | Pittsfield, Ann Arbor, MI | 42.250103 | -83.692854 |
| 140 | Southeast Area, Ann Arbor, MI | 42.268157 | -83.731229 |
| 141 | Southwest Area, Ann Arbor, MI | 42.268157 | -83.731229 |
| 142 | Thurston, Ann Arbor, MI | 42.306623 | -83.701284 |
| 143 | Wines, Ann Arbor, MI | 42.292429 | -83.769656 |


### Map Ann Arbor and Metro Detroit

In [23]:
address = 'Livonia, MI'

geolocator = Nominatim(user_agent="chi_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of '+address+' are {}, {}.'.format(latitude, longitude))

# create map of metro Detroit using latitude and longitude values
map_detroit_metro = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, neighborhood in zip(neighborhoods['Latitude'].astype(float), neighborhoods['Longitude'].astype(float), neighborhoods['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_detroit_metro)  
    
map_detroit_metro

The geographical coordinates of Livonia, MI are 42.36837, -83.3527097.


### Get Chicago Data

In [24]:
!wget -q -O 'chicago_data.json' 'https://data.cityofchicago.org/api/geospatial/cauq-8yn6?method=export&format=GeoJSON'

In [25]:
with open('chicago_data.json') as json_data:
    chicago_data = json.load(json_data)

In [27]:
chicago_neighborhood_data = chicago_data['features']
chicago_neighborhood_data[0]

{'type': 'Feature',
 'properties': {'community': 'DOUGLAS',
  'area': '0',
  'shape_area': '46004621.1581',
  'perimeter': '0',
  'area_num_1': '35',
  'area_numbe': '35',
  'comarea_id': '0',
  'comarea': '0',
  'shape_len': '31027.0545098'},
 'geometry': {'type': 'MultiPolygon',
  'coordinates': [[[[-87.60914087617894, 41.84469250265398],
     [-87.60914874757808, 41.84466159842403],
     [-87.6091611204126, 41.84458961193954],
     [-87.60916766215838, 41.84451717732316],
     [-87.60916860600166, 41.844456260738305],
     [-87.60915012199398, 41.84423871659811],
     [-87.60907241249289, 41.844194738881015],
     [-87.60900627147821, 41.84410646928696],
     [-87.6089650217216, 41.84404345755115],
     [-87.60891566390615, 41.84395529375054],
     [-87.60889980118988, 41.84387361649532],
     [-87.60886701371862, 41.84380438280048],
     [-87.6088514342449, 41.843697606960866],
     [-87.60881089281094, 41.84357184776641],
     [-87.60877127222787, 41.84336451715353],
     [-87.608

### Add Chicago Data to DataFrame

In [28]:
# Add Chicago neighborhoods to dataframe

for data in chicago_neighborhood_data:
   
    neighborhood_name = data['properties']['community']   
    address = neighborhood_name+', Chicago, Illinois'
    geolocator = Nominatim(user_agent="chi_explorer")
    location = geolocator.geocode(address)
    
    try:
        latitude = location.latitude
    except:
        latitude = np.nan
    
    try:
        longitude = location.longitude
    except:
        longitude = np.nan 
        
    neighborhoods = neighborhoods.append({'Neighborhood': neighborhood_name+', Chicago, IL', 
                                    'Latitude': latitude, 
                                    'Longitude': longitude}, ignore_index=True)
#Drop NaN
neighborhoods.dropna(inplace=True)

#Drop out of area results from geolocator
neighborhoods.drop(neighborhoods[neighborhoods['Longitude'] < -88].index, inplace=True)
neighborhoods.drop(neighborhoods[neighborhoods['Latitude'] > 43].index, inplace=True)

#Reset index
neighborhoods.reset_index(drop=True, inplace=True)

neighborhoods.tail()

|  | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 215 | MOUNT GREENWOOD, Chicago, IL | 41.698089 | -87.708662 |
| 216 | MORGAN PARK, Chicago, IL | 41.690312 | -87.666716 |
| 217 | OHARE, Chicago, IL | 41.973101 | -87.906768 |
| 218 | EDGEWATER, Chicago, IL | 41.983369 | -87.663952 |
| 219 | EDISON PARK, Chicago, IL | 42.005733 | -87.814004 |


### Map Chicago

In [29]:
address = 'Chicago, IL'

geolocator = Nominatim(user_agent="chi_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of '+address+' are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Chicago, IL are 41.8755616, -87.6244212.


In [30]:
# create map of Chicago using latitude and longitude values
map_chicago = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, neighborhood in zip(neighborhoods['Latitude'].astype(float), neighborhoods['Longitude'].astype(float), neighborhoods['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_chicago)  
    
map_chicago

### Get Cook County Data

In [31]:
#Get Data from website
source = requests.get('https://geographic.org/streetview/usa/il/cook/index.html').text
source[0:1000]

'ï»¿<!DOCTYPE html>\n<html>\n  <head>\n    <meta charset="UTF-8">\n<META NAME="Description"  CONTENT="Cook County, Illinois, United States, maps, List of Towns and Cities, Street View, Geographic.org">\n<META NAME="keywords"  CONTENT="Cook County, Illinois, List of Towns and Cities, maps, United States, Street View, Geographic.org">\n    <title>List of Towns and Cities in Cook County, Illinois, United States, Maps and Steet Views, Geographic.org</title>\n    <meta name="viewport" content="width=device-width; initial-scale=1.0; maximum-scale=1.0;">\n\t<link rel="canonical" href="https://geographic.org/streetview/usa/il/cook/index.html">\n\n<!--Taboola Head Section-->\n    <script type="text/javascript">\n        window._taboola = window._taboola || [];\n        _taboola.push({article:\'auto\'});\n        !function (e, f, u) {\n            e.async = 1;\n            e.src = u;\n            f.parentNode.insertBefore(e, f);\n        }(document.createElement(\'script\'),\n            documen

In [32]:
#Parse data
soup = BeautifulSoup(source,'lxml')

table = soup.find('span', class_='listspan').text
table
table_split = table.split('\n')

#Make pandas dataframe
table_df = pd.DataFrame({'City':table_split})

#Replace empty values with NaN
table_df.replace('', np.nan, inplace=True)

#Replace Chicago with NaN in prep to drop, we have the neighborhoods in another dataset
table_df.replace('Chicago', np.nan, inplace=True)

#Drop NaN
table_df.dropna(inplace=True)

#Add, IL to city name
table_df['City'] = table_df['City']+', IL'

#Reset index
table_df.reset_index(drop=True, inplace=True)
               
table_df.head()

|  | City |
|---|---|
| 0 | Alsip, IL |
| 1 | Arlington Heights, IL |
| 2 | Bellwood, IL |
| 3 | Berkeley, IL |
| 4 | Berwyn, IL |


### Add Cook County to DataFrame

In [33]:
# Add Cook county cities and coordinates to dataframe

for city in table_df['City']:
    address = city.replace(', IL', '') +', Illinois'
    geolocator = Nominatim(user_agent="cook_explorer")
    location = geolocator.geocode(address)
    
    try:
        latitude = location.latitude
    except:
        latitude = np.nan
    
    try:
        longitude = location.longitude
    except:
        longitude = np.nan 
        
    neighborhoods = neighborhoods.append({'Neighborhood': city, 
                                    'Latitude': latitude, 
                                    'Longitude': longitude}, ignore_index=True)
#Drop NaN
neighborhoods.dropna(inplace=True)

#Reset index
neighborhoods.reset_index(drop=True, inplace=True)

neighborhoods.tail()

|  | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 308 | Wheeling, IL | 42.12509 | -87.929281 |
| 309 | Willow Springs, IL | 41.740231 | -87.858696 |
| 310 | Wilmette, IL | 42.075732 | -87.719377 |
| 311 | Winnetka, IL | 42.10807 | -87.736529 |
| 312 | Worth, IL | 41.689755 | -87.797275 |


### Map Chicagoland

In [34]:
address = 'Chicago, IL'

geolocator = Nominatim(user_agent="chi_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of '+address+' are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Chicago, IL are 41.8755616, -87.6244212.


In [35]:
# create map of Chicagoland using latitude and longitude values
map_chicagoland = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, neighborhood in zip(neighborhoods['Latitude'].astype(float), neighborhoods['Longitude'].astype(float), neighborhoods['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_chicagoland)  
    
map_chicagoland

### Define Foursquare Credentials and Version

In [87]:
CLIENT_ID = 'X' # your Foursquare ID
CLIENT_SECRET = 'X' # your Foursquare Secret
VERSION = '20190607' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: X
CLIENT_SECRET:X


### Explore Lake View, Chicago Neighborhood

In [37]:
#nbr_index = neighborhoods.loc[neighborhoods['Neighborhood'].str.contains('LAKE VIEW', case=True)].index
nbr_index=188
neighborhoods.loc[nbr_index, 'Neighborhood']

'LAKE VIEW, Chicago, IL'
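
The hard-coded index 188 depends on the row order left behind by the geocoding and `dropna` steps, which may differ from one run to the next.  A lookup by exact name avoids that dependency.  The sketch below uses a small stand-in frame; `neighborhoods_demo` and `nbr_index_demo` are hypothetical names used so the example does not touch the real data.

```python
import pandas as pd

# Small stand-in for the neighborhoods DataFrame.
neighborhoods_demo = pd.DataFrame({
    'Neighborhood': ['LINCOLN PARK, Chicago, IL',
                     'LAKE VIEW, Chicago, IL',
                     'UPTOWN, Chicago, IL'],
})

# Boolean indexing returns an Index of matches; exactly one match is expected.
matches = neighborhoods_demo.index[
    neighborhoods_demo['Neighborhood'] == 'LAKE VIEW, Chicago, IL']
if len(matches) != 1:
    raise ValueError('expected exactly one match, found {}'.format(len(matches)))

nbr_index_demo = matches[0]
print(nbr_index_demo, neighborhoods_demo.loc[nbr_index_demo, 'Neighborhood'])
```

Taking the single element out of the `Index` gives a scalar label, which works with `.loc[nbr_index, 'Latitude']` the same way the hard-coded value does.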

In [38]:
neighborhood_latitude = neighborhoods.loc[nbr_index, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = neighborhoods.loc[nbr_index, 'Longitude'] # neighborhood longitude value

neighborhood_name = neighborhoods.loc[nbr_index, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of LAKE VIEW, Chicago, IL are 41.94705, -87.6554287829005.


In [39]:
#Top 100 venues within 500 meters

LIMIT = 100
radius = 500 

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
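
The same request URL can also be assembled from a dictionary of query parameters, which keeps each value next to its name and handles URL encoding.  This is a sketch using only the standard library; the helper name `build_explore_url` is hypothetical, while the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the venues/explore URL from named query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)


demo_url = build_explore_url('X', 'X', '20190607',
                             41.94705, -87.6554287829005, 500, 100)
print(demo_url)

# Round-trip check: the query string parses back to the original values.
print(parse_qs(urlsplit(demo_url).query)['ll'])
```

With `requests`, passing the same dictionary as `requests.get(base_url, params=params)` achieves the same result without building the string by hand.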

In [40]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5d000352dd57970e25974a12'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': '$-$$$$', 'key': 'price'},
    {'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Wrigleyville',
  'headerFullLocation': 'Wrigleyville, Chicago',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 75,
  'suggestedBounds': {'ne': {'lat': 41.9515500045, 'lng': -87.64938975815315},
   'sw': {'lat': 41.942549995499995, 'lng': -87.66146780764785}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '40dcbc80f964a52081011fe3',
       'name': 'Wrigley Field',
       'location': {'address': '1060 W Addison St',
        'crossStreet': 'btwn Sheffield Ave & Clark St',
        'lat': 41.94816011494788,
        'l

In [41]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [42]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

|  | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Wrigley Field | Baseball Stadium | 41.94816 | -87.655562 |
| 1 | Budweiser Brickhouse Tavern | Sports Bar | 41.948357 | -87.657202 |
| 2 | Gallagher Way | Plaza | 41.948296 | -87.657066 |
| 3 | Jeni’s Splendid Ice Creams | Ice Cream Shop | 41.948472 | -87.657353 |
| 4 | Starbucks Reserve | Coffee Shop | 41.948295 | -87.657207 |


In [43]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

75 venues were returned by Foursquare.


### Explore Neighborhoods in DataFrame

In [47]:
def getNearbyVenues(names, latitudes, longitudes, radius=radius):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
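
The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, whereas `get_category_type` earlier returns `None` for a venue with no categories.  The sketch below applies the same guard to the row-building step and runs offline on sample items.  The helper name `venue_rows` and the uncategorized sample venue are hypothetical.

```python
def venue_rows(name, lat, lng, items):
    """Turn Foursquare 'items' into flat tuples, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     category))
    return rows


# Sample items shaped like the API response; the second venue is made up.
sample_items = [
    {'venue': {'name': 'Wrigley Field',
               'location': {'lat': 41.94816, 'lng': -87.655562},
               'categories': [{'name': 'Baseball Stadium'}]}},
    {'venue': {'name': 'Uncategorized Place',
               'location': {'lat': 41.948, 'lng': -87.656},
               'categories': []}},
]

rows = venue_rows('LAKE VIEW, Chicago, IL', 41.94705, -87.6554287829005, sample_items)
for row in rows:
    print(row)
```

Rows with a `None` category could be dropped or kept before any per-category encoding, depending on whether uncategorized venues should count toward a neighborhood's venue total.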

In [48]:
neighborhood_venues = getNearbyVenues(names=neighborhoods['Neighborhood'],
                                   latitudes=neighborhoods['Latitude'],
                                   longitudes=neighborhoods['Longitude']
                                )

Airport, Detroit, MI
Bagley, Detroit, MI
Boynton, Detroit, MI
Brightmoor, Detroit, MI
Brooks, Detroit, MI
Burbank, Detroit, MI
Butzel, Detroit, MI
Central Business District, Detroit, MI
Cerveny / Grandmont, Detroit, MI
Chadsey, Detroit, MI
Chandler Park, Detroit, MI
Cody, Detroit, MI
Condon, Detroit, MI
Conner, Detroit, MI
Corktown, Detroit, MI
Davison, Detroit, MI
Denby, Detroit, MI
Durfee, Detroit, MI
East Riverside, Detroit, MI
Evergreen, Detroit, MI
Finney, Detroit, MI
Grant, Detroit, MI
Greenfield, Detroit, MI
Harmony Village, Detroit, MI
Hubbard Richard, Detroit, MI
Indian Village, Detroit, MI
Jefferson / Mack, Detroit, MI
Jeffries, Detroit, MI
Kettering, Detroit, MI
Lower East Central, Detroit, MI
Lower Woodward, Detroit, MI
Mackenzie, Detroit, MI
McNichols, Detroit, MI
Middle East Central, Detroit, MI
Middle Woodward, Detroit, MI
Mt. Olivet, Detroit, MI
Near East Riverfront, Detroit, MI
Nolan, Detroit, MI
Palmer Park, Detroit, MI
Pembroke, Detroit, MI
Pershing, Detroit, MI
Rosa

In [49]:
#Check size of resulting DataFrame
print(neighborhood_venues.shape)
neighborhood_venues.head()

(4578, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Airport, Detroit, MI",42.388475,-83.025065,Hong Moy,42.38659,-83.025033,Chinese Restaurant
1,"Airport, Detroit, MI",42.388475,-83.025065,Miller & Van Dyke Avenues,42.390422,-83.022587,Intersection
2,"Airport, Detroit, MI",42.388475,-83.025065,Van Dyke Liquor Market,42.387699,-83.0214,Liquor Store
3,"Airport, Detroit, MI",42.388475,-83.025065,Mark's Motown,42.390605,-83.02246,Food
4,"Bagley, Detroit, MI",42.422256,-83.171482,Northwest Activities Center,42.423385,-83.169607,Gym / Fitness Center


### How many venues for each Neighborhood

In [50]:
neighborhood_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"ALBANY PARK, Chicago, IL",16,16,16,16,16,16
"ARCHER HEIGHTS, Chicago, IL",25,25,25,25,25,25
"ARMOUR SQUARE, Chicago, IL",11,11,11,11,11,11
"ASHBURN, Chicago, IL",5,5,5,5,5,5
"AUBURN GRESHAM, Chicago, IL",7,7,7,7,7,7
"AUSTIN, Chicago, IL",12,12,12,12,12,12
"AVALON PARK, Chicago, IL",13,13,13,13,13,13
"AVONDALE, Chicago, IL",34,34,34,34,34,34
"Abbot, Ann Arbor, MI",9,9,9,9,9,9
"Airport, Detroit, MI",4,4,4,4,4,4


### Number of unique venue categories

In [51]:
print('There are {} unique categories.'.format(len(neighborhood_venues['Venue Category'].unique())))

There are 332 unique categories.


### Analyze Each Neighborhood

In [52]:
# one hot encoding
neighborhood_onehot = pd.get_dummies(neighborhood_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
neighborhood_onehot['Neighborhood'] = neighborhood_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [neighborhood_onehot.columns[-1]] + list(neighborhood_onehot.columns[:-1])
neighborhood_onehot = neighborhood_onehot[fixed_columns]

neighborhood_onehot.head()

Unnamed: 0,Neighborhood,ATM,Accessories Store,Adult Boutique,African Restaurant,Airport,Airport Lounge,Airport Service,American Restaurant,Antique Shop,...,Video Store,Vietnamese Restaurant,Vineyard,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo Exhibit
0,"Airport, Detroit, MI",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Airport, Detroit, MI",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Airport, Detroit, MI",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Airport, Detroit, MI",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Bagley, Detroit, MI",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [53]:
neighborhood_onehot.shape

(4578, 333)

### Group rows by neighborhood and calculate the mean frequency of occurrence of each category

In [54]:
neighborhood_grouped = neighborhood_onehot.groupby('Neighborhood').mean().reset_index()
neighborhood_grouped.head()

Unnamed: 0,Neighborhood,ATM,Accessories Store,Adult Boutique,African Restaurant,Airport,Airport Lounge,Airport Service,American Restaurant,Antique Shop,...,Video Store,Vietnamese Restaurant,Vineyard,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo Exhibit
0,"ALBANY PARK, Chicago, IL",0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"ARCHER HEIGHTS, Chicago, IL",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.04,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0
2,"ARMOUR SQUARE, Chicago, IL",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"ASHBURN, Chicago, IL",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"AUBURN GRESHAM, Chicago, IL",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [55]:
neighborhood_grouped.shape

(310, 333)
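
To make the one-hot and group-by-mean steps concrete, here is a small sketch on made-up data. The neighborhoods `A` and `B` and their venues are hypothetical; only the technique mirrors the cells above. Each venue becomes a row of 0/1 indicators, and averaging those rows per neighborhood gives the share of that neighborhood's venues in each category.

```python
import pandas as pd

# Hypothetical toy data: three venues in "A", one venue in "B"
toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Bar', 'Bar', 'Park', 'Park'],
})

# One indicator column per category (dtype=int keeps 0/1 values across pandas versions)
toy_onehot = pd.get_dummies(toy_venues['Venue Category'], dtype=int)
toy_onehot.insert(0, 'Neighborhood', toy_venues['Neighborhood'])

# The mean of the indicators is the relative frequency of each category
toy_grouped = toy_onehot.groupby('Neighborhood').mean().reset_index()
print(toy_grouped)
```

Because each row sums to 1, a neighborhood with only a handful of venues produces coarse frequencies (for example 0.25 each with four venues), which is worth keeping in mind when reading the clusters.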

### Print each Neighborhood with top 5 venues

In [56]:
num_top_venues = 5

for hood in neighborhood_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = neighborhood_grouped[neighborhood_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----ALBANY PARK, Chicago, IL----
               venue  freq
0     Sandwich Place  0.12
1             Bakery  0.06
2        Gas Station  0.06
3      Grocery Store  0.06
4  Mobile Phone Shop  0.06


----ARCHER HEIGHTS, Chicago, IL----
                venue  freq
0   Mobile Phone Shop  0.12
1  Mexican Restaurant  0.12
2         Gas Station  0.08
3       Grocery Store  0.08
4                Park  0.08


----ARMOUR SQUARE, Chicago, IL----
                venue  freq
0  Chinese Restaurant  0.27
1  Italian Restaurant  0.09
2       Hot Dog Joint  0.09
3          Sports Bar  0.09
4         Gas Station  0.09


----ASHBURN, Chicago, IL----
                        venue  freq
0          Italian Restaurant   0.2
1              Cosmetics Shop   0.2
2           Electronics Store   0.2
3  Construction & Landscaping   0.2
4          Light Rail Station   0.2


----AUBURN GRESHAM, Chicago, IL----
                  venue  freq
0  Fast Food Restaurant  0.29
1                Lounge  0.14
2        Cosmetics 

         venue  freq
0         Farm   0.4
1          Gym   0.2
2       Church   0.2
3  Gas Station   0.2
4          ATM   0.0


----CALUMET HEIGHTS, Chicago, IL----
                  venue  freq
0  Gym / Fitness Center  0.25
1         Auto Workshop  0.25
2                  Park  0.25
3           Bus Station  0.25
4                   ATM  0.00


----CHATHAM, Chicago, IL----
                  venue  freq
0           Bus Station  0.25
1                  Park  0.25
2  Fast Food Restaurant  0.25
3            Donut Shop  0.25
4       Organic Grocery  0.00


----CHICAGO LAWN, Chicago, IL----
                  venue  freq
0           Pizza Place  0.23
1     Electronics Store  0.15
2  Fast Food Restaurant  0.15
3   American Restaurant  0.15
4    Mexican Restaurant  0.15


----CLEARING, Chicago, IL----
                 venue  freq
0          Video Store  0.17
1          Pizza Place  0.17
2  Fried Chicken Joint  0.08
3    Convenience Store  0.08
4           Restaurant  0.08


----Calumet City, IL

                        venue  freq
0                  Playground   0.4
1                        Park   0.2
2  Construction & Landscaping   0.2
3               Shopping Mall   0.2
4                         ATM   0.0


----Evergreen Park, IL----
                  venue  freq
0  Fast Food Restaurant  0.07
1                  Bank  0.07
2    Mexican Restaurant  0.07
3        Sandwich Place  0.07
4        Discount Store  0.03


----Evergreen, Detroit, MI----
                           venue  freq
0            Fried Chicken Joint   0.5
1              Convenience Store   0.5
2                         Office   0.0
3                           Park   0.0
4  Paper / Office Supplies Store   0.0


----FOREST GLEN, Chicago, IL----
                  venue  freq
0         Grocery Store  0.14
1     Indian Restaurant  0.14
2         Moving Target  0.14
3            Restaurant  0.14
4  Fast Food Restaurant  0.14


----FULLER PARK, Chicago, IL----
                  venue  freq
0  Fast Food Restaurant   0.

                  venue  freq
0  Fast Food Restaurant  0.17
1        Sandwich Place  0.13
2     Mobile Phone Shop  0.09
3        Ice Cream Shop  0.09
4                  Food  0.04


----Hickory Hills, IL----
                           venue  freq
0                           Park   1.0
1                            ATM   0.0
2                    Pastry Shop   0.0
3  Paper / Office Supplies Store   0.0
4                Paintball Field   0.0


----Highland Park, MI----
                  venue  freq
0  Fast Food Restaurant  0.13
1   Fried Chicken Joint  0.13
2        Clothing Store  0.13
3        Sandwich Place  0.07
4            Shoe Store  0.07


----Hillside, IL----
                  venue  freq
0  Fast Food Restaurant  0.08
1    Chinese Restaurant  0.08
2        Cosmetics Shop  0.04
3        Breakfast Spot  0.04
4              Tea Room  0.04


----Hines, IL----
                        venue  freq
0                      Bakery  0.33
1                    Pharmacy  0.33
2  Construction & L

             venue  freq
0            Plaza  0.15
1            Hotel  0.15
2       Food Truck  0.08
3   Discount Store  0.08
4  Thai Restaurant  0.08


----Lower Woodward, Detroit, MI----
                   venue  freq
0             Skate Park  0.11
1    American Restaurant  0.11
2             Restaurant  0.11
3  Performing Arts Venue  0.05
4                Dog Run  0.05


----Lyons, IL----
                 venue  freq
0                  Bar  0.50
1       Ice Cream Shop  0.25
2  American Restaurant  0.25
3                  ATM  0.00
4         Optical Shop  0.00


----MCKINLEY PARK, Chicago, IL----
            venue  freq
0     Video Store  0.11
1           Diner  0.11
2   Grocery Store  0.11
3  Baseball Field  0.06
4     Coffee Shop  0.06


----MONTCLARE, Chicago, IL----
                    venue  freq
0  Furniture / Home Store  0.13
1      Mexican Restaurant  0.13
2            Intersection  0.07
3           Train Station  0.07
4            Optical Shop  0.07


----MORGAN PARK, Chicago

                 venue  freq
0  American Restaurant  0.11
1                 Café  0.05
2   Italian Restaurant  0.05
3                  Bar  0.05
4       Breakfast Spot  0.05


----Novi, MI----
                 venue  freq
0  Sporting Goods Shop  0.07
1       Clothing Store  0.06
2          Pizza Place  0.04
3        Women's Store  0.04
4    Mobile Phone Shop  0.04


----OAKLAND, Chicago, IL----
            venue  freq
0            Park  0.29
1        Boutique  0.14
2            Lake  0.14
3  Discount Store  0.14
4           Track  0.14


----OHARE, Chicago, IL----
             venue  freq
0   Airport Lounge  0.12
1      Coffee Shop  0.12
2      Snack Place  0.09
3         Tea Room  0.06
4  Airport Service  0.06


----Oak Forest, IL----
                 venue  freq
0   Mexican Restaurant  0.17
1           Donut Shop  0.08
2    Convenience Store  0.08
3  Fried Chicken Joint  0.08
4       Hardware Store  0.08


----Oak Lawn, IL----
             venue  freq
0   Baseball Field   0.5
1      

           venue  freq
0     Strip Club   0.2
1           Park   0.2
2  Train Station   0.2
3       Pharmacy   0.2
4    Pizza Place   0.2


----Rosa Parks, Detroit, MI----
            venue  freq
0      Art Museum  0.14
1    Intersection  0.14
2         Stadium  0.14
3      Taco Place  0.14
4  Baseball Field  0.14


----Rosedale, Detroit, MI----
                 venue  freq
0         Intersection  0.33
1               Bakery  0.17
2       Baseball Field  0.17
3  American Restaurant  0.17
4             Bus Stop  0.17


----Rouge, Detroit, MI----
                           venue  freq
0               Business Service  0.33
1                       Pharmacy  0.33
2                 Baseball Field  0.33
3                            ATM  0.00
4  Paper / Office Supplies Store  0.00


----Royal Oak, MI----
                venue  freq
0             Brewery  0.06
1         Coffee Shop  0.05
2  Seafood Restaurant  0.03
3              Lounge  0.03
4    Sushi Restaurant  0.03


----SOUTH CHICAGO, Ch

                  venue  freq
0  Fast Food Restaurant  0.25
1        Cosmetics Shop  0.17
2         Train Station  0.17
3     Currency Exchange  0.08
4           Wings Joint  0.08


----WEST GARFIELD PARK, Chicago, IL----
                  venue  freq
0  Fast Food Restaurant  0.21
1            Shoe Store  0.16
2        Clothing Store  0.11
3   Fried Chicken Joint  0.11
4        Sandwich Place  0.05


----WEST LAWN, Chicago, IL----
                venue  freq
0  Mexican Restaurant  0.18
1       Bowling Alley  0.12
2  Seafood Restaurant  0.12
3         Pizza Place  0.06
4      Sandwich Place  0.06


----WEST PULLMAN, Chicago, IL----
                        venue  freq
0  Construction & Landscaping  0.33
1                         Bar  0.17
2               Grocery Store  0.17
3              Clothing Store  0.17
4               Train Station  0.17


----WEST RIDGE, Chicago, IL----
                 venue  freq
0  Fried Chicken Joint  0.25
1    Convenience Store  0.25
2           Donut Shop  

### Put results into DataFrame

In [57]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [58]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = neighborhood_grouped['Neighborhood']

for ind in np.arange(neighborhood_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(neighborhood_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"ALBANY PARK, Chicago, IL",Sandwich Place,Grocery Store,Pizza Place,Bakery,Gas Station,Cocktail Bar,Café,Korean Restaurant,Karaoke Bar,Fried Chicken Joint
1,"ARCHER HEIGHTS, Chicago, IL",Mobile Phone Shop,Mexican Restaurant,Gas Station,Grocery Store,Park,Sandwich Place,Rental Service,Bank,Chinese Restaurant,Optical Shop
2,"ARMOUR SQUARE, Chicago, IL",Chinese Restaurant,Cosmetics Shop,Asian Restaurant,Sports Bar,Hot Dog Joint,Breakfast Spot,Italian Restaurant,Sandwich Place,Gas Station,Fabric Shop
3,"ASHBURN, Chicago, IL",Electronics Store,Cosmetics Shop,Light Rail Station,Construction & Landscaping,Italian Restaurant,Financial or Legal Service,Fabric Shop,Factory,Falafel Restaurant,Farm
4,"AUBURN GRESHAM, Chicago, IL",Fast Food Restaurant,Discount Store,Greek Restaurant,Pharmacy,Lounge,Cosmetics Shop,Food,Flower Shop,Food Court,Eye Doctor
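
The column labels above rely on a three-item suffix list plus an exception fallback, which works for the ten columns used here but would label an 21st column as "21th". As a sketch only, a small helper (the name `ordinal` is hypothetical, not from the notebook) can compute the suffix directly:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th, 13th and the rest of the teens always take 'th'
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


# Build the same column labels as the cell above for ten top venues
columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)
]
print(columns[:4])
```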


### Cluster Neighborhoods

In [59]:
# set number of clusters
kclusters = 10

neighborhood_grouped_clustering = neighborhood_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=1).fit(neighborhood_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 1, 2, 2, 2, 4, 2, 2, 1, 2], dtype=int32)
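
The notebook fixes `kclusters = 10` without showing how that value was chosen. One common way to sanity-check a choice of k is to compare silhouette scores across several candidate values. The sketch below does this on synthetic data, not on the neighborhood data; the three blobs, the range of k, and the `n_init` setting are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for neighborhood_grouped_clustering: three well-separated blobs
rng = np.random.RandomState(1)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
X = np.vstack([center + rng.normal(scale=0.5, size=(30, 2)) for center in centers])

# Fit k-means for each candidate k and record the mean silhouette score
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, random_state=1, n_init=10).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# A higher silhouette score means tighter, better-separated clusters
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

On sparse, high-dimensional frequency data such as the venue table, silhouette scores tend to be low for every k, so this check is a rough guide rather than a definitive answer.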

In [60]:
# add clustering labels

# drop a previous 'Cluster Labels' column, if any, so this cell can be re-run
neighborhoods_venues_sorted.drop(columns='Cluster Labels', inplace=True, errors='ignore')

neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

neighborhood_merged = neighborhoods

neighborhood_merged = neighborhood_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood', how='inner')

neighborhood_merged.head()

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Airport, Detroit, MI",42.388475,-83.025065,2,Liquor Store,Chinese Restaurant,Intersection,Food,Fast Food Restaurant,Event Space,Eye Doctor,Fabric Shop,Factory,Falafel Restaurant
1,"Bagley, Detroit, MI",42.422256,-83.171482,1,Gym / Fitness Center,Pizza Place,Zoo Exhibit,Elementary School,Event Space,Eye Doctor,Fabric Shop,Factory,Falafel Restaurant,Farm
2,"Boynton, Detroit, MI",42.264908,-83.164444,1,Storage Facility,Intersection,Liquor Store,Pizza Place,Food & Drink Shop,Food Court,Eye Doctor,Fabric Shop,Factory,Falafel Restaurant
3,"Brightmoor, Detroit, MI",42.384513,-83.248953,3,Intersection,Zoo Exhibit,Financial or Legal Service,Eye Doctor,Fabric Shop,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
4,"Brooks, Detroit, MI",42.344826,-83.204472,1,Grocery Store,Liquor Store,Convenience Store,Bank,American Restaurant,Bakery,Food,Farmers Market,Food Service,Food Court


### Visualize Clusters on Map

In [62]:
# create map

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

#print('Cluster colors: ', rainbow)

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(neighborhood_merged['Latitude'].astype(float), neighborhood_merged['Longitude'].astype(float), neighborhood_merged['Neighborhood'], neighborhood_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster) , parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
    
#Add Legend to map

from branca.element import Template, MacroElement

template = """
{% macro html(this, kwargs) %}

<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>jQuery UI Draggable - Default functionality</title>
  <link rel="stylesheet" href="//code.jquery.com/ui/1.12.1/themes/base/jquery-ui.css">

  <script src="https://code.jquery.com/jquery-1.12.4.js"></script>
  <script src="https://code.jquery.com/ui/1.12.1/jquery-ui.js"></script>
  
  <script>
  $( function() {
    $( "#maplegend" ).draggable({
                    start: function (event, ui) {
                        $(this).css({
                            right: "auto",
                            top: "auto",
                            bottom: "auto"
                        });
                    }
                });
});

  </script>
</head>
<body>

 
<div id='maplegend' class='maplegend' 
    style='position: absolute; z-index:9999; border:2px solid grey; background-color:rgba(255, 255, 255, 0.8);
     border-radius:6px; padding: 10px; font-size:14px; right: 20px; bottom: 20px;'>
     
<div class='legend-title'>Legend</div>
<div class='legend-scale'>
  <ul class='legend-labels'>
    <li><span style='background:#8000ff;opacity:0.7;'></span>Cluster 0</li>
    <li><span style='background:#4856fb;opacity:0.7;'></span>Cluster 1</li>
    <li><span style='background:#10a2f0;opacity:0.7;'></span>Cluster 2</li>
    <li><span style='background:#2adddd;opacity:0.7;'></span>Cluster 3</li>
    <li><span style='background:#62fbc4;opacity:0.7;'></span>Cluster 4</li>
    <li><span style='background:#9cfba4;opacity:0.7;'></span>Cluster 5</li>
    <li><span style='background:#d4dd80;opacity:0.7;'></span>Cluster 6</li>
    <li><span style='background:#ffa256;opacity:0.7;'></span>Cluster 7</li>
    <li><span style='background:#ff562c;opacity:0.7;'></span>Cluster 8</li>
    <li><span style='background:#ff0000;opacity:0.7;'></span>Cluster 9</li>
    

  </ul>
</div>
</div>
 
</body>
</html>

<style type='text/css'>
  .maplegend .legend-title {
    text-align: left;
    margin-bottom: 5px;
    font-weight: bold;
    font-size: 90%;
    }
  .maplegend .legend-scale ul {
    margin: 0;
    margin-bottom: 5px;
    padding: 0;
    float: left;
    list-style: none;
    }
  .maplegend .legend-scale ul li {
    font-size: 80%;
    list-style: none;
    margin-left: 0;
    line-height: 18px;
    margin-bottom: 2px;
    }
  .maplegend ul.legend-labels li span {
    display: block;
    float: left;
    height: 16px;
    width: 30px;
    margin-right: 5px;
    margin-left: 0;
    border: 1px solid #999;
    }
  .maplegend .legend-source {
    font-size: 80%;
    color: #777;
    clear: both;
    }
  .maplegend a {
    color: #777;
    }
</style>
{% endmacro %}"""

macro = MacroElement()
macro._template = Template(template)

map_clusters.get_root().add_child(macro)
    
map_clusters.save("Figure_1.html")

map_clusters
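
The legend entries in the template are typed by hand, so they can drift out of sync with the `rainbow` list used to color the markers. As a sketch, the `<li>` rows could instead be generated from the same list of hex colors. The helper name `legend_items` is hypothetical; the markup it emits copies the legend rows in the template above.

```python
def legend_items(hex_colors):
    """Build one legend <li> row per cluster from a list of hex color strings."""
    item = "<li><span style='background:{};opacity:0.7;'></span>Cluster {}</li>"
    return "\n".join(item.format(color, idx) for idx, color in enumerate(hex_colors))


# In the notebook this would be the `rainbow` list; two colors are shown for brevity
print(legend_items(['#8000ff', '#4856fb']))
```

The resulting string could be spliced into the template with `str.replace` on a placeholder of your choosing. `str.format` is less convenient here because the template already contains literal braces.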

### Examine Lake View Chicago Cluster

In [63]:
cluster_label = neighborhood_merged.loc[neighborhood_merged['Neighborhood'].str.contains('LAKE VIEW', case=True), 'Cluster Labels'].iloc[0]
neighborhood_merged.loc[neighborhood_merged['Cluster Labels'] == int(cluster_label), neighborhood_merged.columns[[0] + list(range(4, neighborhood_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Airport, Detroit, MI",Liquor Store,Chinese Restaurant,Intersection,Food,Fast Food Restaurant,Event Space,Eye Doctor,Fabric Shop,Factory,Falafel Restaurant
6,"Butzel, Detroit, MI",Farm,Church,Gas Station,Gym,Zoo Exhibit,Fast Food Restaurant,Eye Doctor,Fabric Shop,Factory,Falafel Restaurant
7,"Central Business District, Detroit, MI",Brewery,Gym,Sports Bar,Hotel,Hot Dog Joint,Used Bookstore,New American Restaurant,Spa,Restaurant,Bagel Shop
10,"Chandler Park, Detroit, MI",Shoe Store,Fast Food Restaurant,Women's Store,Accessories Store,Intersection,Fried Chicken Joint,Grocery Store,Discount Store,Clothing Store,Event Space
12,"Condon, Detroit, MI",Fried Chicken Joint,Factory,Deli / Bodega,Sandwich Place,Zoo Exhibit,Filipino Restaurant,Eye Doctor,Fabric Shop,Falafel Restaurant,Farm
13,"Conner, Detroit, MI",Cosmetics Shop,Pharmacy,Fish & Chips Shop,Zoo Exhibit,Fast Food Restaurant,Eye Doctor,Fabric Shop,Factory,Falafel Restaurant,Farm
18,"East Riverside, Detroit, MI",Harbor / Marina,Clothing Store,Zoo Exhibit,Ethiopian Restaurant,Eye Doctor,Fabric Shop,Factory,Falafel Restaurant,Farm,Farmers Market
23,"Harmony Village, Detroit, MI",Recreation Center,Pool,Hockey Arena,Basketball Court,Nightlife Spot,Zoo Exhibit,Fast Food Restaurant,Eye Doctor,Fabric Shop,Factory
29,"Lower East Central, Detroit, MI",Hotel,Plaza,Restaurant,Food Truck,Southern / Soul Food Restaurant,Thai Restaurant,Greek Restaurant,Laundromat,Discount Store,Grocery Store
30,"Lower Woodward, Detroit, MI",Skate Park,American Restaurant,Restaurant,Indie Movie Theater,Dog Run,Music Venue,Beer Store,Bar,Locksmith,Sports Club


### Let's look at only Neighborhoods in MI that fall in the Lake View Cluster

In [64]:
mi_neighborhoods = neighborhood_merged[(neighborhood_merged['Neighborhood'].str.contains(', MI', case=True))]

mi_neighborhoods.loc[mi_neighborhoods['Cluster Labels'] == int(cluster_label), mi_neighborhoods.columns[[0] + list(range(4, mi_neighborhoods.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Airport, Detroit, MI",Liquor Store,Chinese Restaurant,Intersection,Food,Fast Food Restaurant,Event Space,Eye Doctor,Fabric Shop,Factory,Falafel Restaurant
6,"Butzel, Detroit, MI",Farm,Church,Gas Station,Gym,Zoo Exhibit,Fast Food Restaurant,Eye Doctor,Fabric Shop,Factory,Falafel Restaurant
7,"Central Business District, Detroit, MI",Brewery,Gym,Sports Bar,Hotel,Hot Dog Joint,Used Bookstore,New American Restaurant,Spa,Restaurant,Bagel Shop
10,"Chandler Park, Detroit, MI",Shoe Store,Fast Food Restaurant,Women's Store,Accessories Store,Intersection,Fried Chicken Joint,Grocery Store,Discount Store,Clothing Store,Event Space
12,"Condon, Detroit, MI",Fried Chicken Joint,Factory,Deli / Bodega,Sandwich Place,Zoo Exhibit,Filipino Restaurant,Eye Doctor,Fabric Shop,Falafel Restaurant,Farm
13,"Conner, Detroit, MI",Cosmetics Shop,Pharmacy,Fish & Chips Shop,Zoo Exhibit,Fast Food Restaurant,Eye Doctor,Fabric Shop,Factory,Falafel Restaurant,Farm
18,"East Riverside, Detroit, MI",Harbor / Marina,Clothing Store,Zoo Exhibit,Ethiopian Restaurant,Eye Doctor,Fabric Shop,Factory,Falafel Restaurant,Farm,Farmers Market
23,"Harmony Village, Detroit, MI",Recreation Center,Pool,Hockey Arena,Basketball Court,Nightlife Spot,Zoo Exhibit,Fast Food Restaurant,Eye Doctor,Fabric Shop,Factory
29,"Lower East Central, Detroit, MI",Hotel,Plaza,Restaurant,Food Truck,Southern / Soul Food Restaurant,Thai Restaurant,Greek Restaurant,Laundromat,Discount Store,Grocery Store
30,"Lower Woodward, Detroit, MI",Skate Park,American Restaurant,Restaurant,Indie Movie Theater,Dog Run,Music Venue,Beer Store,Bar,Locksmith,Sports Club


### There are a lot of neighborhoods, so let's add more data and cluster again

# Demographic Data
There are still a lot of neighborhoods in the Lake View cluster, so let's include demographic data from the U.S. Census Bureau.

In [65]:
lv_neighborhoods = neighborhood_grouped[(neighborhood_grouped['Neighborhood'].str.contains('LAKE VIEW', case=True))]
lv_neighborhoods = pd.concat([lv_neighborhoods, mi_neighborhoods[['Neighborhood']].join(neighborhood_grouped.set_index('Neighborhood'), on='Neighborhood')])

lv_neighborhoods.head()

Unnamed: 0,Neighborhood,ATM,Accessories Store,Adult Boutique,African Restaurant,Airport,Airport Lounge,Airport Service,American Restaurant,Antique Shop,...,Video Store,Vietnamese Restaurant,Vineyard,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo Exhibit
146,"LAKE VIEW, Chicago, IL",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.013333,...,0.013333,0.0,0.0,0.013333,0.0,0.0,0.013333,0.0,0.0,0.0
0,"Airport, Detroit, MI",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Bagley, Detroit, MI",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Boynton, Detroit, MI",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Brightmoor, Detroit, MI",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [66]:
#lv_neighborhoods.join(neighborhoods.set_index('Neighborhood'), on='Neighborhood', how ='inner')
lv_neighborhoods = neighborhoods.join(lv_neighborhoods.set_index('Neighborhood'), on='Neighborhood', how='inner')
lv_neighborhoods.head()

Unnamed: 0,Neighborhood,Latitude,Longitude,ATM,Accessories Store,Adult Boutique,African Restaurant,Airport,Airport Lounge,Airport Service,...,Video Store,Vietnamese Restaurant,Vineyard,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo Exhibit
0,"Airport, Detroit, MI",42.388475,-83.025065,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Bagley, Detroit, MI",42.422256,-83.171482,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Boynton, Detroit, MI",42.264908,-83.164444,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Brightmoor, Detroit, MI",42.384513,-83.248953,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Brooks, Detroit, MI",42.344826,-83.204472,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### Let's make a new data frame with Census Information

First, we'll add a census tract ID column to hold the tract for each pair of geographical coordinates.

In [67]:
lv_neighborhood_demo = lv_neighborhoods[['Neighborhood', 'Latitude', 'Longitude']].copy()
lv_neighborhood_demo['Census Tract'] = 0  # placeholder until the geocoder lookup fills it in
lv_neighborhood_demo.reset_index(drop=True, inplace=True)
lv_neighborhood_demo.head()

Unnamed: 0,Neighborhood,Latitude,Longitude,Census Tract
0,"Airport, Detroit, MI",42.388475,-83.025065,0
1,"Bagley, Detroit, MI",42.422256,-83.171482,0
2,"Boynton, Detroit, MI",42.264908,-83.164444,0
3,"Brightmoor, Detroit, MI",42.384513,-83.248953,0
4,"Brooks, Detroit, MI",42.344826,-83.204472,0


### Populate Census Tract column in DataFrame

In [68]:
def lookup_census_tract(lat, lng, max_attempts=10, pause=0.5):
    """Return the census tract GEOID for a coordinate, or 0 if every attempt fails."""
    url = ('https://geocoding.geo.census.gov/geocoder/geographies/coordinates'
           '?x={}&y={}&benchmark=4&vintage=4&format=json').format(lng, lat)
    for attempt in range(max_attempts):
        try:
            # re-send the request on every attempt so a retry can actually succeed
            source = requests.get(url).json()
            tracts = source['result']['geographies']['Census Tracts']
            return int(tracts[0]['GEOID'])
        except (requests.RequestException, ValueError, KeyError, IndexError):
            # wait a little longer after each failed attempt
            time.sleep(pause * (attempt + 1))
    return 0

for i in lv_neighborhood_demo.index:
    # .loc[row, column] assigns in place and avoids chained-assignment warnings
    lv_neighborhood_demo.loc[i, 'Census Tract'] = lookup_census_tract(
        lv_neighborhood_demo.loc[i, 'Latitude'],
        lv_neighborhood_demo.loc[i, 'Longitude'])
    time.sleep(0.3)  # brief pause between requests

lv_neighborhood_demo.head()


Unnamed: 0,Neighborhood,Latitude,Longitude,Census Tract
0,"Airport, Detroit, MI",42.388475,-83.025065,26163511000
1,"Bagley, Detroit, MI",42.422256,-83.171482,26163539400
2,"Boynton, Detroit, MI",42.264908,-83.164444,26163524800
3,"Brightmoor, Detroit, MI",42.384513,-83.248953,26163543900
4,"Brooks, Detroit, MI",42.344826,-83.204472,26163545500
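
The tract IDs in the table are 11-digit census GEOIDs: a 2-digit state FIPS code, a 3-digit county FIPS code, and a 6-digit tract code. For example, `26163511000` breaks down into state `26` (Michigan), county `163` (Wayne County), and tract `511000`. The sketch below splits an ID into those parts; the helper name `split_tract_geoid` is hypothetical and not part of the notebook.

```python
def split_tract_geoid(geoid):
    """Split an 11-digit tract GEOID into (state, county, tract) strings."""
    # Integers drop leading zeros, so pad back to 11 digits before slicing
    text = str(geoid).zfill(11)
    if len(text) != 11 or not text.isdigit():
        raise ValueError('expected an 11-digit tract GEOID, got {!r}'.format(geoid))
    return text[:2], text[2:5], text[5:]


print(split_tract_geoid(26163511000))
```

Padding matters for states whose FIPS code starts with zero, because storing the ID as an integer drops the leading digit.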


### Add columns to DataFrame for demographic data

In [69]:
demo_col_names = ['Voting Age','Men','Women','Hispanic','White','Black','Native','Asian','Pacific','Income','Poverty','Professional','Unemployment']

for name in demo_col_names:
    #print(name)
    lv_neighborhood_demo[name] = pd.Series(dtype=float, index=lv_neighborhood_demo.index)

lv_neighborhood_demo.head()

Unnamed: 0,Neighborhood,Latitude,Longitude,Census Tract,Voting Age,Men,Women,Hispanic,White,Black,Native,Asian,Pacific,Income,Poverty,Professional,Unemployment
0,"Airport, Detroit, MI",42.388475,-83.025065,26163511000,,,,,,,,,,,,,
1,"Bagley, Detroit, MI",42.422256,-83.171482,26163539400,,,,,,,,,,,,,
2,"Boynton, Detroit, MI",42.264908,-83.164444,26163524800,,,,,,,,,,,,,
3,"Brightmoor, Detroit, MI",42.384513,-83.248953,26163543900,,,,,,,,,,,,,
4,"Brooks, Detroit, MI",42.344826,-83.204472,26163545500,,,,,,,,,,,,,


### Populate Demographic data in DataFrame

In [70]:
i=0
for neighborhood in lv_neighborhood_demo['Census Tract']:
    #print(neighborhood)
    
    with open('acs2017_census_tract_data.csv') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            #print(row)
            if row['TractId'] == str(neighborhood):
                
                total_pop = float(row['TotalPop'])

                # Assign with .loc[row, column] to avoid pandas' chained-assignment warning
                lv_neighborhood_demo.loc[i, 'Voting Age'] = float(row['VotingAgeCitizen'])/total_pop
                # Men and Women are stored as shares of the total population
                lv_neighborhood_demo.loc[i, 'Men'] = float(row['Men'])/total_pop
                lv_neighborhood_demo.loc[i, 'Women'] = float(row['Women'])/total_pop
                # The remaining percentage columns are converted to fractions
                lv_neighborhood_demo.loc[i, 'Hispanic'] = float(row['Hispanic'])/100
                lv_neighborhood_demo.loc[i, 'White'] = float(row['White'])/100
                lv_neighborhood_demo.loc[i, 'Black'] = float(row['Black'])/100
                lv_neighborhood_demo.loc[i, 'Native'] = float(row['Native'])/100
                lv_neighborhood_demo.loc[i, 'Asian'] = float(row['Asian'])/100
                lv_neighborhood_demo.loc[i, 'Pacific'] = float(row['Pacific'])/100
                lv_neighborhood_demo.loc[i, 'Income'] = float(row['Income'])
                lv_neighborhood_demo.loc[i, 'Poverty'] = float(row['Poverty'])/100
                lv_neighborhood_demo.loc[i, 'Professional'] = float(row['Professional'])/100
                lv_neighborhood_demo.loc[i, 'Unemployment'] = float(row['Unemployment'])/100
        
        i = i + 1

lv_neighborhood_demo.head()



Unnamed: 0,Neighborhood,Latitude,Longitude,Census Tract,Voting Age,Men,Women,Hispanic,White,Black,Native,Asian,Pacific,Income,Poverty,Professional,Unemployment
0,"Airport, Detroit, MI",42.388475,-83.025065,26163511000,0.71939,0.509721,0.490279,0.02,0.018,0.916,0.0,0.029,0.0,17930.0,0.536,0.155,0.419
1,"Bagley, Detroit, MI",42.422256,-83.171482,26163539400,0.769583,0.467026,0.532974,0.009,0.02,0.953,0.0,0.0,0.0,32314.0,0.289,0.166,0.241
2,"Boynton, Detroit, MI",42.264908,-83.164444,26163524800,0.679008,0.39311,0.60689,0.06,0.033,0.896,0.0,0.0,0.0,23430.0,0.513,0.239,0.231
3,"Brightmoor, Detroit, MI",42.384513,-83.248953,26163543900,0.731308,0.434579,0.565421,0.009,0.148,0.819,0.0,0.0,0.0,20500.0,0.527,0.062,0.24
4,"Brooks, Detroit, MI",42.344826,-83.204472,26163545500,0.712465,0.444558,0.555442,0.008,0.177,0.705,0.0,0.0,0.0,20648.0,0.502,0.179,0.259
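
The cell above reopens and scans the census CSV once per neighborhood. A possible alternative, sketched here with only the standard library, is to read the file once into a dictionary keyed by `TractId` and look each tract up directly. The column names come from the notebook's code; the two CSV rows and the helper names `index_by_tract` and `demographic_shares` are made up for illustration.

```python
import csv
import io

# Made-up sample rows; the real data lives in acs2017_census_tract_data.csv
SAMPLE_CSV = """TractId,TotalPop,Men,Women,VotingAgeCitizen,Poverty
1001,2000,900,1100,1500,25.0
1002,4000,2100,1900,3000,10.0
"""


def index_by_tract(csvfile):
    """Read the CSV once and key every row by its TractId string."""
    return {row['TractId']: row for row in csv.DictReader(csvfile)}


def demographic_shares(tracts, tract_id):
    """Return a few population shares for one tract, or None if unavailable."""
    row = tracts.get(str(tract_id))
    if row is None:
        return None
    total_pop = float(row['TotalPop'])
    if total_pop == 0:
        return None  # avoid dividing by zero for unpopulated tracts
    return {
        'Voting Age': float(row['VotingAgeCitizen']) / total_pop,
        'Men': float(row['Men']) / total_pop,
        'Women': float(row['Women']) / total_pop,
        'Poverty': float(row['Poverty']) / 100,
    }


tracts = index_by_tract(io.StringIO(SAMPLE_CSV))
shares = demographic_shares(tracts, 1001)
print(shares)
```

With the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by the open file handle, and the returned values could be written into the DataFrame row by row.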


### Let's normalize the data in preparation for KMeans clustering

In [71]:
#Normalize Data
x = lv_neighborhood_demo.drop(['Neighborhood', 'Longitude', 'Latitude', 'Census Tract'], axis=1)
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
normalized_demo = pd.DataFrame(x_scaled)
normalized_demo.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12
0,0.491595,0.604089,0.395911,0.02805,0.011213,0.924319,0.0,0.067916,0.0,0.043624,0.693017,0.147712,1.0
1,0.590202,0.430589,0.569411,0.012623,0.013252,0.961655,0.0,0.0,0.0,0.125201,0.367589,0.162092,0.575179
2,0.412262,0.130226,0.869774,0.084151,0.026504,0.904137,0.0,0.0,0.0,0.074816,0.662714,0.257516,0.551313
3,0.515009,0.298741,0.701259,0.012623,0.143731,0.826438,0.0,0.0,0.0,0.058199,0.681159,0.026144,0.572792
4,0.477989,0.339289,0.660711,0.01122,0.173293,0.711403,0.0,0.0,0.0,0.059039,0.648221,0.179085,0.618138


### KMeans Clustering

In [72]:
# set number of clusters
kclusters = 10

#lv_neighborhoods_clustering = lv_neighborhoods.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=2).fit(normalized_demo)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 9, 9, 9, 9, 3, 4, 1, 9, 7], dtype=int32)
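
To make the clustering step less of a black box, here is a dependency-free sketch of Lloyd's algorithm, the assign-then-update iteration that k-means is built on. It is not scikit-learn's implementation, which adds smarter initialization such as k-means++. The function names and the sample points are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd_kmeans(points, k, initial_centroids=None, max_iterations=100, seed=2):
    """Cluster points (tuples of floats) into k groups with Lloyd's algorithm."""
    if initial_centroids is None:
        # Simple random initialization, used only when no centroids are supplied
        initial_centroids = random.Random(seed).sample(points, k)
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical points forming two well-separated groups
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (10.0, 9.0)]
labels, centroids = lloyd_kmeans(points, k=2, initial_centroids=[(0.0, 0.0), (10.0, 9.0)])
print(labels, centroids)
```

Passing `initial_centroids` explicitly keeps the example deterministic. With random starts, the result can depend on the seed, which is why the notebook fixes `random_state`.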

In [73]:
# add clustering labels
# drop labels left over from a previous run of this cell, if any
lv_neighborhood_demo.drop(columns='Cluster Labels', errors='ignore', inplace=True)

lv_neighborhood_demo.insert(0, 'Cluster Labels', kmeans.labels_)

lv_neighborhood_demo.head()

Unnamed: 0,Cluster Labels,Neighborhood,Latitude,Longitude,Census Tract,Voting Age,Men,Women,Hispanic,White,Black,Native,Asian,Pacific,Income,Poverty,Professional,Unemployment
0,3,"Airport, Detroit, MI",42.388475,-83.025065,26163511000,0.71939,0.509721,0.490279,0.02,0.018,0.916,0.0,0.029,0.0,17930.0,0.536,0.155,0.419
1,9,"Bagley, Detroit, MI",42.422256,-83.171482,26163539400,0.769583,0.467026,0.532974,0.009,0.02,0.953,0.0,0.0,0.0,32314.0,0.289,0.166,0.241
2,9,"Boynton, Detroit, MI",42.264908,-83.164444,26163524800,0.679008,0.39311,0.60689,0.06,0.033,0.896,0.0,0.0,0.0,23430.0,0.513,0.239,0.231
3,9,"Brightmoor, Detroit, MI",42.384513,-83.248953,26163543900,0.731308,0.434579,0.565421,0.009,0.148,0.819,0.0,0.0,0.0,20500.0,0.527,0.062,0.24
4,9,"Brooks, Detroit, MI",42.344826,-83.204472,26163545500,0.712465,0.444558,0.555442,0.008,0.177,0.705,0.0,0.0,0.0,20648.0,0.502,0.179,0.259


### Examine Lake View Cluster

In [74]:
# Look up the cluster label assigned to Lake View as a plain integer
lake_view_mask = lv_neighborhood_demo['Neighborhood'].str.contains('LAKE VIEW', case=True)
cluster_label = int(lv_neighborhood_demo.loc[lake_view_mask, 'Cluster Labels'].iloc[0])

# Keep every neighborhood that shares Lake View's cluster
similar_neighborhoods = lv_neighborhood_demo['Neighborhood'].loc[lv_neighborhood_demo['Cluster Labels'] == cluster_label]
similar_neighborhoods = pd.DataFrame({'Neighborhood':similar_neighborhoods.values})
similar_neighborhoods

Unnamed: 0,Neighborhood
0,"Grosse Ile, MI"
1,"Northville, MI"
2,"Plymouth, MI"
3,"Berkley, MI"
4,"Birmingham, MI"
5,"Bloomfield Hills, MI"
6,"Commerce Township, MI"
7,"Davisburg, MI"
8,"Farmington, MI"
9,"Ferndale, MI"
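
The selection step above, stripped of pandas, amounts to finding the target neighborhood's label and keeping every neighborhood that shares it. The sketch below shows that logic on plain lists. The function name and the labels are hypothetical and do not reproduce the notebook's actual cluster assignments.

```python
def neighborhoods_sharing_cluster(names, labels, target):
    """Return every name whose cluster label matches the target's label."""
    # Label of the first neighborhood whose name contains the target text;
    # next() raises StopIteration if no name matches
    target_label = next(label for name, label in zip(names, labels) if target in name)
    return [name for name, label in zip(names, labels) if label == target_label]


# Hypothetical names and labels, for illustration only
names = ['LAKE VIEW, Chicago, IL', 'Ferndale, MI', 'Airport, Detroit, MI', 'Plymouth, MI']
labels = [5, 5, 3, 5]

matches = neighborhoods_sharing_cluster(names, labels, 'LAKE VIEW')
print(matches)
```

Because the result keeps only the target's cluster, it can be fed straight into the next round of clustering on a different data set.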


These results look promising: the neighborhoods listed above are all suburban communities in Wayne and Oakland counties.

### Visualize results on Map

In [76]:
# create map

address = 'Livonia, MI'

geolocator = Nominatim(user_agent="det_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(lv_neighborhood_demo['Latitude'].astype(float), lv_neighborhood_demo['Longitude'].astype(float), lv_neighborhood_demo['Neighborhood'], lv_neighborhood_demo['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster) , parse_html=True)
    
    if(int(cluster) == int(cluster_label)):
    
        folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            color='#ff0000',
            fill=True,
            fill_color='#ff0000',
            fill_opacity=0.7).add_to(map_clusters)
    else:
        folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            fill=True,
            fill_opacity=0.7).add_to(map_clusters)
    

#Add Legend to map

from branca.element import Template, MacroElement

template = """
{% macro html(this, kwargs) %}

<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>jQuery UI Draggable - Default functionality</title>
  <link rel="stylesheet" href="//code.jquery.com/ui/1.12.1/themes/base/jquery-ui.css">

  <script src="https://code.jquery.com/jquery-1.12.4.js"></script>
  <script src="https://code.jquery.com/ui/1.12.1/jquery-ui.js"></script>
  
  <script>
  $( function() {
    $( "#maplegend" ).draggable({
                    start: function (event, ui) {
                        $(this).css({
                            right: "auto",
                            top: "auto",
                            bottom: "auto"
                        });
                    }
                });
});

  </script>
</head>
<body>

 
<div id='maplegend' class='maplegend' 
    style='position: absolute; z-index:9999; border:2px solid grey; background-color:rgba(255, 255, 255, 0.8);
     border-radius:6px; padding: 10px; font-size:14px; right: 20px; bottom: 20px;'>
     
<div class='legend-title'>Legend</div>
<div class='legend-scale'>
  <ul class='legend-labels'>
    <li><span style='background:#8000ff;opacity:0.7;'></span>Lake View Cluster</li>

    
  </ul>
</div>
</div>
 
</body>
</html>

<style type='text/css'>
  .maplegend .legend-title {
    text-align: left;
    margin-bottom: 5px;
    font-weight: bold;
    font-size: 90%;
    }
  .maplegend .legend-scale ul {
    margin: 0;
    margin-bottom: 5px;
    padding: 0;
    float: left;
    list-style: none;
    }
  .maplegend .legend-scale ul li {
    font-size: 80%;
    list-style: none;
    margin-left: 0;
    line-height: 18px;
    margin-bottom: 2px;
    }
  .maplegend ul.legend-labels li span {
    display: block;
    float: left;
    height: 16px;
    width: 30px;
    margin-right: 5px;
    margin-left: 0;
    border: 1px solid #999;
    }
  .maplegend .legend-source {
    font-size: 80%;
    color: #777;
    clear: both;
    }
  .maplegend a {
    color: #777;
    }
</style>
{% endmacro %}"""

template = template.replace('#8000ff','#ff0000')

macro = MacroElement()
macro._template = Template(template)

map_clusters.get_root().add_child(macro)
    
map_clusters.save("Figure 2.html")

map_clusters


### Top venues

In [77]:
neighborhood_merged[(neighborhood_merged['Neighborhood'].str.contains('LAKE VIEW', case=True))]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
188,"LAKE VIEW, Chicago, IL",41.94705,-87.655429,2,Bar,General Entertainment,Sports Bar,Sandwich Place,Mexican Restaurant,Baseball Stadium,Pizza Place,BBQ Joint,Outdoor Sculpture,Dive Bar


In [78]:
similar_neighborhoods_venues = neighborhood_merged.join(similar_neighborhoods.set_index('Neighborhood'), on='Neighborhood', how='inner')

similar_neighborhoods_venues.drop(['Latitude', 'Longitude', 'Cluster Labels'], axis=1)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
62,"Grosse Ile, MI",Campground,Zoo Exhibit,Event Space,Fabric Shop,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Filipino Restaurant
72,"Northville, MI",American Restaurant,Café,Breakfast Spot,Italian Restaurant,Bar,Bakery,Sushi Restaurant,Theater,Thai Restaurant,Gift Shop
73,"Plymouth, MI",Italian Restaurant,Coffee Shop,Bakery,Bar,Sandwich Place,Thai Restaurant,Bank,Grocery Store,Greek Restaurant,Farmers Market
86,"Berkley, MI",Breakfast Spot,Intersection,Liquor Store,Bookstore,Bank,Falafel Restaurant,Massage Studio,Gastropub,Tailor Shop,Coffee Shop
87,"Birmingham, MI",Spa,Coffee Shop,American Restaurant,Steakhouse,New American Restaurant,Boutique,Middle Eastern Restaurant,Yoga Studio,Italian Restaurant,Bakery
88,"Bloomfield Hills, MI",Bank,Greek Restaurant,Hotel,Wine Shop,Cupcake Shop,Seafood Restaurant,Deli / Bodega,Bistro,Bagel Shop,Farmers Market
91,"Commerce Township, MI",Convenience Store,Zoo Exhibit,Financial or Legal Service,Eye Doctor,Fabric Shop,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
92,"Davisburg, MI",Park,Spa,Arts & Crafts Store,Business Service,Liquor Store,Flea Market,Food & Drink Shop,Event Space,Eye Doctor,Fabric Shop
93,"Farmington, MI",Sandwich Place,Pub,Mobile Phone Shop,Gym,Movie Theater,Spa,Flea Market,Brewery,Breakfast Spot,Middle Eastern Restaurant
94,"Ferndale, MI",Cocktail Bar,Bar,Gym,Gift Shop,Sandwich Place,Sushi Restaurant,Cosmetics Shop,Thai Restaurant,Massage Studio,Food Truck


# Get walkscore for each community area
Walkscore measures how walkable a neighborhood is, something many people value about where they live, so it is a useful additional feature for further clustering the neighborhoods.

In [79]:
# Create walkscore DataFrame

# Walkscore API key
YOUR_WSAPIKEY = 'X'

# Coordinates of the neighborhoods in the Lake View cluster
temp_df = similar_neighborhoods_venues[['Neighborhood', 'Latitude', 'Longitude']].reset_index(drop=True)

# Get scores via API, collecting them in a list
# (DataFrame.append was removed in pandas 2.0)
scores = []
for lat, lon in zip(temp_df['Latitude'], temp_df['Longitude']):

    url = 'http://api.walkscore.com/score?format=json&lat={}&lon={}&wsapikey={}'.format(
        lat,
        lon,
        YOUR_WSAPIKEY)

    results = requests.get(url).json()
    scores.append(results['walkscore'])

walk_score = pd.DataFrame({'Walkscore': scores})

walk_score.head()

Unnamed: 0,Walkscore
0,22
1,77
2,91
3,64
4,94
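
The request URL in the cell above is assembled with string formatting. As a small sketch, the same URL can be built with `urllib.parse.urlencode`, which also escapes any special characters in the parameters. The query parameters match those used in the notebook; the helper names `build_walkscore_url` and `parse_walkscore` are hypothetical, and no network request is made here.

```python
from urllib.parse import urlencode

WALKSCORE_ENDPOINT = 'http://api.walkscore.com/score'


def build_walkscore_url(lat, lon, api_key):
    """Build the request URL with the same query parameters the notebook uses."""
    query = urlencode({'format': 'json', 'lat': lat, 'lon': lon, 'wsapikey': api_key})
    return WALKSCORE_ENDPOINT + '?' + query


def parse_walkscore(payload):
    """Return the 'walkscore' field, or None when the response lacks one."""
    return payload.get('walkscore')


# Coordinates of Grosse Ile, MI from the table above; 'X' is a placeholder key
url = build_walkscore_url(42.138175, -83.154123, 'X')
print(url)
```

Returning `None` from `parse_walkscore` makes it possible to skip or retry a neighborhood whose response has no score, rather than failing with a `KeyError` partway through the loop.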


In [80]:
similar_walk = temp_df.join(walk_score)
similar_walk.head()

Unnamed: 0,Neighborhood,Latitude,Longitude,Walkscore
0,"Grosse Ile, MI",42.138175,-83.154123,22
1,"Northville, MI",42.431081,-83.483226,77
2,"Plymouth, MI",42.3712,-83.467502,91
3,"Berkley, MI",42.503091,-83.183539,64
4,"Birmingham, MI",42.546701,-83.211319,94


### Normalize data in preparation for clustering

In [81]:
#Normalize Data
x = similar_walk.drop(['Neighborhood', 'Longitude', 'Latitude'], axis=1)
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
normalized_walk = pd.DataFrame(x_scaled)
normalized_walk.head()



Unnamed: 0,0
0,0.208791
1,0.813187
2,0.967033
3,0.67033
4,1.0
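
Min-max scaling maps each value to `(value - min) / (max - min)`, so the smallest value becomes 0 and the largest becomes 1. The sketch below applies that formula by hand to the five Walkscores shown earlier; `min_max_scale` is a hypothetical helper, not part of scikit-learn. Its output differs from the table above because the notebook fits the scaler on every neighborhood in the cluster, and the table's values appear consistent with a column minimum of 3 and maximum of 94.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lowest, highest = min(values), max(values)
    span = highest - lowest
    if span == 0:
        # All values are identical, so there is no spread to rescale
        return [0.0 for _ in values]
    return [(value - lowest) / span for value in values]


walkscores = [22, 77, 91, 64, 94]
scaled = min_max_scale(walkscores)
print([round(s, 4) for s in scaled])

# Reproducing the first table value with the inferred column range of 3 to 94
first_table_value = (22 - 3) / (94 - 3)
print(round(first_table_value, 6))
```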


### KMeans Clustering

In [82]:
# set number of clusters
kclusters = 10

#lv_neighborhoods_clustering = lv_neighborhoods.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=2).fit(normalized_walk)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 1, 4, 5, 4, 6, 9, 0, 1, 4], dtype=int32)

In [83]:
# add clustering labels
# drop labels left over from a previous run of this cell, if any
similar_walk.drop(columns='Cluster Labels', errors='ignore', inplace=True)

similar_walk.insert(0, 'Cluster Labels', kmeans.labels_)

similar_walk.head()

Unnamed: 0,Cluster Labels,Neighborhood,Latitude,Longitude,Walkscore
0,3,"Grosse Ile, MI",42.138175,-83.154123,22
1,1,"Northville, MI",42.431081,-83.483226,77
2,4,"Plymouth, MI",42.3712,-83.467502,91
3,5,"Berkley, MI",42.503091,-83.183539,64
4,4,"Birmingham, MI",42.546701,-83.211319,94


In [84]:
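# Look up the cluster label assigned to Lake View as a plain integer
lake_view_mask = similar_walk['Neighborhood'].str.contains('LAKE VIEW', case=True)
cluster_label = int(similar_walk.loc[lake_view_mask, 'Cluster Labels'].iloc[0])

# Keep every neighborhood that shares Lake View's cluster
similar_neighborhoods = similar_walk['Neighborhood'].loc[similar_walk['Cluster Labels'] == cluster_label]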
similar_neighborhoods = pd.DataFrame({'Neighborhood':similar_neighborhoods.values})
similar_neighborhoods

Unnamed: 0,Neighborhood
0,"Plymouth, MI"
1,"Birmingham, MI"
2,"Ferndale, MI"
3,"Royal Oak, MI"
4,"LAKE VIEW, Chicago, IL"


In [85]:
# create map

address = 'Southfield, MI'

geolocator = Nominatim(user_agent="det_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(similar_walk['Latitude'].astype(float), similar_walk['Longitude'].astype(float), similar_walk['Neighborhood'], similar_walk['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster) , parse_html=True)
    
    if(int(cluster) == int(cluster_label)):
    
        folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            color='#ff0000',
            fill=True,
            fill_color='#ff0000',
            fill_opacity=0.7).add_to(map_clusters)
    else:
        folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            fill=True,
            fill_opacity=0.7).add_to(map_clusters)
    

#Add Legend to map

from branca.element import Template, MacroElement

template = """
{% macro html(this, kwargs) %}

<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>jQuery UI Draggable - Default functionality</title>
  <link rel="stylesheet" href="//code.jquery.com/ui/1.12.1/themes/base/jquery-ui.css">

  <script src="https://code.jquery.com/jquery-1.12.4.js"></script>
  <script src="https://code.jquery.com/ui/1.12.1/jquery-ui.js"></script>
  
  <script>
  $( function() {
    $( "#maplegend" ).draggable({
                    start: function (event, ui) {
                        $(this).css({
                            right: "auto",
                            top: "auto",
                            bottom: "auto"
                        });
                    }
                });
});

  </script>
</head>
<body>

 
<div id='maplegend' class='maplegend' 
    style='position: absolute; z-index:9999; border:2px solid grey; background-color:rgba(255, 255, 255, 0.8);
     border-radius:6px; padding: 10px; font-size:14px; right: 20px; bottom: 20px;'>
     
<div class='legend-title'>Legend</div>
<div class='legend-scale'>
  <ul class='legend-labels'>
    <li><span style='background:#8000ff;opacity:0.7;'></span>Lake View Cluster</li>

    
  </ul>
</div>
</div>
 
</body>
</html>

<style type='text/css'>
  .maplegend .legend-title {
    text-align: left;
    margin-bottom: 5px;
    font-weight: bold;
    font-size: 90%;
    }
  .maplegend .legend-scale ul {
    margin: 0;
    margin-bottom: 5px;
    padding: 0;
    float: left;
    list-style: none;
    }
  .maplegend .legend-scale ul li {
    font-size: 80%;
    list-style: none;
    margin-left: 0;
    line-height: 18px;
    margin-bottom: 2px;
    }
  .maplegend ul.legend-labels li span {
    display: block;
    float: left;
    height: 16px;
    width: 30px;
    margin-right: 5px;
    margin-left: 0;
    border: 1px solid #999;
    }
  .maplegend .legend-source {
    font-size: 80%;
    color: #777;
    clear: both;
    }
  .maplegend a {
    color: #777;
    }
</style>
{% endmacro %}"""

template = template.replace('#8000ff','#ff0000')

macro = MacroElement()
macro._template = Template(template)

map_clusters.get_root().add_child(macro)
    
map_clusters.save("LakeView_Detroit.html")

map_clusters


In [86]:
similar_neighborhoods_venues = neighborhood_merged.join(similar_neighborhoods.set_index('Neighborhood'), on='Neighborhood', how='inner')

similar_neighborhoods_venues.drop(['Latitude', 'Longitude', 'Cluster Labels'], axis=1)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
73,"Plymouth, MI",Italian Restaurant,Coffee Shop,Bakery,Bar,Sandwich Place,Thai Restaurant,Bank,Grocery Store,Greek Restaurant,Farmers Market
87,"Birmingham, MI",Spa,Coffee Shop,American Restaurant,Steakhouse,New American Restaurant,Boutique,Middle Eastern Restaurant,Yoga Studio,Italian Restaurant,Bakery
94,"Ferndale, MI",Cocktail Bar,Bar,Gym,Gift Shop,Sandwich Place,Sushi Restaurant,Cosmetics Shop,Thai Restaurant,Massage Studio,Food Truck
114,"Royal Oak, MI",Brewery,Coffee Shop,Vegetarian / Vegan Restaurant,Sushi Restaurant,Yoga Studio,Italian Restaurant,Lounge,American Restaurant,Café,Seafood Restaurant
188,"LAKE VIEW, Chicago, IL",Bar,General Entertainment,Sports Bar,Sandwich Place,Mexican Restaurant,Baseball Stadium,Pizza Place,BBQ Joint,Outdoor Sculpture,Dive Bar
