# IBM Coursera Applied Data Science 
# Capstone Project - The Battle of Neighborhoods (Week 2)

## Table of contents

* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)


## Introduction: Business Problem <a name="introduction"></a>

The problem this project aims to tackle is the search for a new neighborhood in case people have to move to a foreign city.
To be more specific, this project serves to help people who have to move to a foreign city to find a new neighbourhood that is as similar as possible to the old neighbourhood they have to leave. On top of that, this project tries to identify the most suitable neighborhood for a given set of preferences.

Due to very different reasons a lot of people have to move homes. The decision process of where to buy a new house or rent an appartement is quite complex. A lot of features have to be taken into consideration, e.g. socio economic factors like unemployment or the income level of the neighborhood, but also crime rates, housing prices, reputation of public schools for the children, shops, malls, theatres, hospitals et cetera.   

## Data <a name="data"></a>

To complete this assignment I will be using the Foursquare API for location data as well as other datasets for socioeconomic criterias, as well as crime data and housing data. For practical purposes, I will rely on data that is available on the city of Chicago's Data Portal and housing prices from a large real estate agent:

1. <a href="https://de.foursquare.com/explore?mode=url&near=Chicago%2C%20IL%2C%20USA&nearGeoId=72057594042815334">Foursquare location data</a>
1. <a href="https://data.cityofchicago.org/Health-Human-Services/Census-Data-Selected-socioeconomic-indicators-in-C/kn9c-c2s2">Socioeconomic Indicators in Chicago</a>
1. <a href="https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2">Chicago Crime Data</a>
1. <a href="https://www.zillow.com/research/data/">Home values - Zillow Home Value Index (ZHVI)</a>




### Data Description:  


__1. Foursquare API__

As requested by the assignment, this project will heavily use Four-square API as its prime data gathering source. With the API I will perform location search, location sharing and details about a business. Due to limitations of the API requests possible, the number of places per neighborhood parameter would reasonably be set to 100 and the radius parameter would be set to 500.

To find similarities between neighborhoods in New York and Chicago, we need to gather data about different kind of  venues to find the characteristics. Hence, I will use Foursquare data for this task. 

To determine the similarities of both cities, I will segment and cluster the neighborhoods to find similar places. In order to do that, a k-means clustering algorithm will be utilized, basen on location data provided by the Foursquare API.


__2. Selected Socioeconomic Indicators in Chicago__

The city of Chicago released a dataset of socioeconomic data to the Chicago City Portal.



A detailed description of the dataset can be found on [the city of Chicago's website](
https://data.cityofchicago.org/Health-Human-Services/Census-Data-Selected-socioeconomic-indicators-in-C/kn9c-c2s2), but to summarize, the dataset has the following variables:

* **Community Area Number** (`ca`): Used to uniquely identify each row of the dataset

* **Community Area Name** (`community_area_name`): The name of the region in the city of Chicago 

* **Percent of Housing Crowded** (`percent_of_housing_crowded`): Percent of occupied housing units with more than one person per room

* **Percent Households Below Poverty** (`percent_households_below_poverty`): Percent of households living below the federal poverty line

* **Percent Aged 16+ Unemployed** (`percent_aged_16_unemployed`): Percent of persons over the age of 16 years that are unemployed

* **Percent Aged 25+ without High School Diploma** (`percent_aged_25_without_high_school_diploma`): Percent of persons over the age of 25 years without a high school education

* **Percent Aged Under** 18 or Over 64:Percent of population under 18 or over 64 years of age (`percent_aged_under_18_or_over_64`): (ie. dependents)

* **Per Capita Income** (`per_capita_income_`): Community Area per capita income is estimated as the sum of tract-level aggragate incomes divided by the total population

* **Hardship Index** (`hardship_index`): Score that incorporates each of the six selected socioeconomic indicators

This dataset contains a selection of six socioeconomic indicators of public health significance and a “hardship index,” for each Chicago community area, for the years 2008 – 2012. Scores on the hardship index can range from 1 to 100, with a higher index number representing a greater level of hardship. I will use the Hardship Index only as this score includes each of the indicators.

I acknowledge that the time series ends in the year of 2012, but for this assignment I will ignore this fact and assume that the data is up to date and still valid for this decision process.


__3. Chicago Crime Data__  

This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days. 

This dataset is quite large - over 1.5GB in size with over 7 million rows. A detailed description of this dataset and the original dataset can be obtained from the Chicago Data Portal at:
https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2



__4. Home values - Zillow Home Value Index (ZHVI)__  

The Zillow Home Value Index (ZHVI)is a smoothed, seasonally adjusted measure of the typical home value and market changes across a given region and housing type. 

Zillow publishes top-tier ZHVI (USD, typical value for homes within the 65th to 95th percentile range for a given region) and bottom-tier ZHVI (USD, typical value for homes that fall within the 5th to 35th percentile range for a given region). 

Zillow also publishes ZHVI for all single-family residences (USD, typical value for all single-family homes in a given region), for condo/coops (USD), for all homes with 1, 2, 3, 4 and 5+ bedrooms (USD), and the ZHVI per square foot (USD, typical value of all homes per square foot calculated by taking the estimated home value for each home in a given region and dividing it by the home’s square footage). 

Check out https://www.zillow.com/research/data/ for an overview of ZHVI and a deep-dive into its methodology.


## Methodology <a name="methodology"></a>

The methodolgy of this project can be described by the following steps:

__A: Define the characteristics of the neighborhoods__
1. Retrieve the names of neighborhoods for both cities, we will label the current residence as "old" city or neighborhood and the new destination as "new" city or neighborhood.
2. Put the names of the neighborhoods in a dataframe and add the latitude and longitude data.
3. Use the Foursquare API to get location data of the venues in all the neighborhoods.
4. Clustering of the neighborhoods with a clustering algorithm to find similar neighborhoods in both cities.
5. Define each cluster by checking the main characteristics based on venues data.
6. Pick the cluster that your old neighborhood is in and select the neighborhoods from the new city.

__B: Scoring of neighborhoods within the new city to make a recommendation of the best places__
7. Score each of the potential new neighborhood based on socioeconomic data and define the top 10 list.
8. Score remaining neighborhood based on crime data and we'll get the top 5 list.
9. Last step is to check housing prices for the top 3 list and decide based on financial resources available.



## Analysis <a name="analysis"></a>

In [259]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import json # library to handle JSON files

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

import geopandas
import geopy

print('Libraries imported.')

Libraries imported.


### A: Define the characteristics of the neighborhoods

#### 1. Retrieve the names of neighborhoods for both cities, we will label the current residence as "old" city or neighborhood and the new destination as "new" city or neighborhood.


#### 2. Put the names of the neighborhoods in a dataframe and add the latitude and longitude data.




As an example for this project, I will be using following two cities:

__New York__ is set to be the "old" home and serves as a starting point for the analysis

__Chicago__ will be the "new" home and we will dive into a lot of data to find the best fit

In [260]:
old_city = ('New York')
old_borough = ('Manhattan')
new_city = ('Chicago')
print("The old city is set to be " + old_city + ", and the new one is " + new_city)

The old city is set to be New York, and the new one is Chicago


Let's start with the old city!

__Old City Data__

First task is to find a data source that satisfies our need, i.e. contains at least the neighborhood names. In this case the JSON file includes the latitude and longitude values as well.


In [261]:
import json
json_data = open("nyu-2451-34572-geojson.json")
ny_data = json.load(json_data)

In [262]:
# let's have a look at the data
ny_data

{'type': 'FeatureCollection',
 'totalFeatures': 306,
 'features': [{'type': 'Feature',
   'id': 'nyu_2451_34572.1',
   'geometry': {'type': 'Point',
    'coordinates': [-73.84720052054902, 40.89470517661]},
   'geometry_name': 'geom',
   'properties': {'name': 'Wakefield',
    'stacked': 1,
    'annoline1': 'Wakefield',
    'annoline2': None,
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.84720052054902,
     40.89470517661,
     -73.84720052054902,
     40.89470517661]}},
  {'type': 'Feature',
   'id': 'nyu_2451_34572.2',
   'geometry': {'type': 'Point',
    'coordinates': [-73.82993910812398, 40.87429419303012]},
   'geometry_name': 'geom',
   'properties': {'name': 'Co-op City',
    'stacked': 2,
    'annoline1': 'Co-op',
    'annoline2': 'City',
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.82993910812398,
     40.87429419303012,
     -73.82993910812398,
     40.87429419303012]}},
  {'type': 'Feature',
 

Notice how all the relevant data is in the features key, which is basically a list of the neighborhoods. So, let's define a new variable that includes this data.

In [263]:
old_neigh_data = ny_data['features']

In [264]:
# Tranform the data into a pandas dataframe

# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
old_neigh_df = pd.DataFrame(columns=column_names)

# loop through the data and fill the dataframe one row at a time.
for data in old_neigh_data:
    
    old_borough = old_neigh_name = data['properties']['borough'] 
    old_neigh_name = data['properties']['name']
    
    old_neigh_latlon = data['geometry']['coordinates']
    old_neigh_lat = old_neigh_latlon[1]
    old_neigh_lon = old_neigh_latlon[0]
    
    old_neigh_df = old_neigh_df.append({'Borough': old_borough,
                                        'Neighborhood': old_neigh_name,
                                        'Latitude': old_neigh_lat,
                                        'Longitude': old_neigh_lon}, ignore_index=True)

In [265]:
# quick check
old_neigh_df.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


Limit to one borough for better performance and avoid running into Foursquare API daily call quota limits.

In [266]:
old_neigh_df = old_neigh_df[old_neigh_df['Borough'] == 'Manhattan'].reset_index(drop=True)
print(old_neigh_df.shape)
old_neigh_df.head()

(40, 4)


Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


There are 40 distinct neighborhoods in Manhattan.

Use geopy library to get the latitude and longitude values

In [267]:
old_address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="old_city_explorer")
location = geolocator.geocode(old_address)
old_latitude = location.latitude
old_longitude = location.longitude
print('The geograpical coordinate of ' + old_address + ' are {}, {}.'.format(old_latitude, old_longitude))

The geograpical coordinate of Manhattan, NY are 40.7896239, -73.9598939.


In [268]:
# create map of New York using latitude and longitude values
map_old_city = folium.Map(location=[old_latitude, old_longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(old_neigh_df['Latitude'], old_neigh_df['Longitude'], old_neigh_df['Borough'], old_neigh_df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_old_city)  
    
map_old_city

__New City Data__

Let's repeat the same process for the new city. For this purpose I reference to the data set that can be found on the website of the City of Chicago. It also contains socioeconomic metrics that I will use at a later stage in this project.

In [269]:
# we can find the names of the neighborhoods in this data source
new_socioeconomic_data = pd.read_csv('https://data.cityofchicago.org/resource/jcxq-k9xf.csv')
new_socioeconomic_data.shape

(78, 9)

In [270]:
# quick check
new_socioeconomic_data.head()

Unnamed: 0,ca,community_area_name,percent_of_housing_crowded,percent_households_below_poverty,percent_aged_16_unemployed,percent_aged_25_without_high_school_diploma,percent_aged_under_18_or_over_64,per_capita_income_,hardship_index
0,1.0,Rogers Park,7.7,23.6,8.7,18.2,27.5,23939,39.0
1,2.0,West Ridge,7.8,17.2,8.8,20.8,38.5,23040,46.0
2,3.0,Uptown,3.8,24.0,8.9,11.8,22.2,35787,20.0
3,4.0,Lincoln Square,3.4,10.9,8.2,13.4,25.5,37524,17.0
4,5.0,North Center,0.3,7.5,5.2,4.5,26.2,57123,6.0


In [271]:
# change column names to match the names given in the assignment
new_socioeconomic_data.columns = ['Community Area Number',
                 'Neighborhood',
                 'PERCENT OF HOUSING CROWDED',
                 'PERCENT HOUSEHOLDS BELOW POVERTY',
                 'PERCENT AGED 16+ UNEMPLOYED',
                 'PERCENT AGED 25+ WITHOUT HIGH SCHOOL DIPLOMA',
                 'PERCENT AGED UNDER 18 OR OVER 64',
                 'PER CAPITA INCOME',
                 'Hardship Index'
                ] 

new_socioeconomic_data.head()

Unnamed: 0,Community Area Number,Neighborhood,PERCENT OF HOUSING CROWDED,PERCENT HOUSEHOLDS BELOW POVERTY,PERCENT AGED 16+ UNEMPLOYED,PERCENT AGED 25+ WITHOUT HIGH SCHOOL DIPLOMA,PERCENT AGED UNDER 18 OR OVER 64,PER CAPITA INCOME,Hardship Index
0,1.0,Rogers Park,7.7,23.6,8.7,18.2,27.5,23939,39.0
1,2.0,West Ridge,7.8,17.2,8.8,20.8,38.5,23040,46.0
2,3.0,Uptown,3.8,24.0,8.9,11.8,22.2,35787,20.0
3,4.0,Lincoln Square,3.4,10.9,8.2,13.4,25.5,37524,17.0
4,5.0,North Center,0.3,7.5,5.2,4.5,26.2,57123,6.0


In [272]:
# let's focus on community number and neighorhood name in the first place
new_neigh_df = new_socioeconomic_data[['Community Area Number', 'Neighborhood']].reset_index(drop=True)
new_neigh_df.head()

Unnamed: 0,Community Area Number,Neighborhood
0,1.0,Rogers Park
1,2.0,West Ridge
2,3.0,Uptown
3,4.0,Lincoln Square
4,5.0,North Center


In [273]:
print(new_neigh_df.shape)

(78, 2)


There are 75 distinct neighborhoods in Chicago.

__Geocoding__

Geocoding is the computational process of transforming a physical address description to a location on the Earth’s surface (spatial representation in numerical coordinates).

We use Nominatim Geocoding service, which is built on top of OpenStreetMap data. We create locator that holds the Geocoding service, Nominatim. Then we pass the locator we created to geocode any address.

We delay our Geocoding 1 second between each address. This is convenient when you are Geocoding a large number of physical addresses as the Geocoding service provider can deny access to the service.

In [274]:
# add another column with the city name and additional information for geocoding
new_neigh_df['ADDRESS'] = new_neigh_df['Neighborhood'] + ', Chicago, IL, USA'
new_neigh_df.head()

Unnamed: 0,Community Area Number,Neighborhood,ADDRESS
0,1.0,Rogers Park,"Rogers Park, Chicago, IL, USA"
1,2.0,West Ridge,"West Ridge, Chicago, IL, USA"
2,3.0,Uptown,"Uptown, Chicago, IL, USA"
3,4.0,Lincoln Square,"Lincoln Square, Chicago, IL, USA"
4,5.0,North Center,"North Center, Chicago, IL, USA"


In [275]:
geolocator = Nominatim(user_agent="foursquare_agent")

from geopy.extra.rate_limiter import RateLimiter

# Function to delay between geocoding calls
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)

# Create location column
new_neigh_df['location'] = new_neigh_df['ADDRESS'].apply(geocode)

# Create longitude, laatitude and altitude from location column (returns tuple)
new_neigh_df['point'] = new_neigh_df['location'].apply(lambda loc: tuple(loc.point) if loc else None)

# Split point column into latitude, longitude and altitude columns
new_neigh_df[['Latitude', 'Longitude', 'Altitude']] = pd.DataFrame(new_neigh_df['point'].tolist(), index=new_neigh_df.index)

new_neigh_df.head()

Unnamed: 0,Community Area Number,Neighborhood,ADDRESS,location,point,Latitude,Longitude,Altitude
0,1.0,Rogers Park,"Rogers Park, Chicago, IL, USA","(Rogers Park, Chicago, Cook County, Illinois, ...","(42.01053135, -87.67074819664808, 0.0)",42.010531,-87.670748,0.0
1,2.0,West Ridge,"West Ridge, Chicago, IL, USA","(West Ridge, Chicago, Cook County, Illinois, 6...","(42.0035482, -87.6962426, 0.0)",42.003548,-87.696243,0.0
2,3.0,Uptown,"Uptown, Chicago, IL, USA","(Uptown, Chicago, Cook County, Illinois, 60640...","(41.9666299, -87.6555458, 0.0)",41.96663,-87.655546,0.0
3,4.0,Lincoln Square,"Lincoln Square, Chicago, IL, USA","(Lincoln Square, Chicago, Cook County, Illinoi...","(41.975989850000005, -87.6896163305115, 0.0)",41.97599,-87.689616,0.0
4,5.0,North Center,"North Center, Chicago, IL, USA","(North Center, Chicago, Cook County, Illinois,...","(41.9561073, -87.6791596, 0.0)",41.956107,-87.67916,0.0


In [276]:
# drop all columns that are irrelevant
new_neigh_df = new_neigh_df.drop(['ADDRESS', 'location', 'point', 'Altitude'], axis=1)
new_neigh_df

Unnamed: 0,Community Area Number,Neighborhood,Latitude,Longitude
0,1.0,Rogers Park,42.010531,-87.670748
1,2.0,West Ridge,42.003548,-87.696243
2,3.0,Uptown,41.96663,-87.655546
3,4.0,Lincoln Square,41.97599,-87.689616
4,5.0,North Center,41.956107,-87.67916
5,6.0,Lake View,41.94705,-87.655429
6,7.0,Lincoln Park,41.940298,-87.638117
7,8.0,Near North Side,41.900033,-87.634497
8,9.0,Edison Park,42.005733,-87.814016
9,10.0,Norwood Park,41.98559,-87.800582


In [277]:
# drop all neighorhoods with NaN
new_neigh_df.dropna(inplace=True)
new_neigh_df

Unnamed: 0,Community Area Number,Neighborhood,Latitude,Longitude
0,1.0,Rogers Park,42.010531,-87.670748
1,2.0,West Ridge,42.003548,-87.696243
2,3.0,Uptown,41.96663,-87.655546
3,4.0,Lincoln Square,41.97599,-87.689616
4,5.0,North Center,41.956107,-87.67916
5,6.0,Lake View,41.94705,-87.655429
6,7.0,Lincoln Park,41.940298,-87.638117
7,8.0,Near North Side,41.900033,-87.634497
8,9.0,Edison Park,42.005733,-87.814016
9,10.0,Norwood Park,41.98559,-87.800582


Again, we'll use geopy library to get the latitude and longitude values for the map to center the right location.

In [278]:
new_address = 'Chicago, IL'

geolocator = Nominatim(user_agent="new_city_explorer")
location = geolocator.geocode(new_address)
new_latitude = location.latitude
new_longitude = location.longitude
print('The geograpical coordinate of Chicago are {}, {}.'.format(new_latitude, new_longitude))

The geograpical coordinate of Chicago are 41.8755616, -87.6244212.


In [279]:
# create map of Chicago using latitude and longitude values
map_new_city = folium.Map(location=[new_latitude, new_longitude], zoom_start=10)

# add markers to map
for lat, lng, neighborhood in zip(new_neigh_df['Latitude'], new_neigh_df['Longitude'], new_neigh_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_new_city)  
    
map_new_city

Okay, so far we have identified the neighborhoods in the old and new city by using geopy library. The next step will be to merge the neighborhoods in one dataframe and perform the clustering algorithm to find similiarities between them.

__Merge both city data tables__

Before we can merge both dataframe, we need to do some housing keeping first. Let's start with the old city.

In [280]:
# skip the boroughs
old_neigh_df_tobemerged = old_neigh_df[['Neighborhood', 'Latitude', 'Longitude']].reset_index(drop=True)
old_neigh_df_tobemerged.head()

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Marble Hill,40.876551,-73.91066
1,Chinatown,40.715618,-73.994279
2,Washington Heights,40.851903,-73.9369
3,Inwood,40.867684,-73.92121
4,Hamilton Heights,40.823604,-73.949688


In [281]:
# add the city to the neighborhood column
old_neigh_df_tobemerged['Neighborhood'] = old_neigh_df_tobemerged['Neighborhood'] + ', New York City'
old_neigh_df_tobemerged.head()

Unnamed: 0,Neighborhood,Latitude,Longitude
0,"Marble Hill, New York City",40.876551,-73.91066
1,"Chinatown, New York City",40.715618,-73.994279
2,"Washington Heights, New York City",40.851903,-73.9369
3,"Inwood, New York City",40.867684,-73.92121
4,"Hamilton Heights, New York City",40.823604,-73.949688


Now, let's do the same with the new city

In [282]:
# add the city to the neighborhood column
new_neigh_df_tobemerged = new_neigh_df[['Neighborhood', 'Latitude', 'Longitude']].reset_index(drop=True)
new_neigh_df_tobemerged['Neighborhood'] = new_neigh_df_tobemerged['Neighborhood'] + ', Chicago'
new_neigh_df_tobemerged.head()

Unnamed: 0,Neighborhood,Latitude,Longitude
0,"Rogers Park, Chicago",42.010531,-87.670748
1,"West Ridge, Chicago",42.003548,-87.696243
2,"Uptown, Chicago",41.96663,-87.655546
3,"Lincoln Square, Chicago",41.97599,-87.689616
4,"North Center, Chicago",41.956107,-87.67916


__Combine data from multiple tables__

In [283]:
# put the rows of both dataframes into a new dataframe
merged_df = pd.concat([old_neigh_df_tobemerged, new_neigh_df_tobemerged], axis=0).reset_index(drop=True)
merged_df

Unnamed: 0,Neighborhood,Latitude,Longitude
0,"Marble Hill, New York City",40.876551,-73.91066
1,"Chinatown, New York City",40.715618,-73.994279
2,"Washington Heights, New York City",40.851903,-73.9369
3,"Inwood, New York City",40.867684,-73.92121
4,"Hamilton Heights, New York City",40.823604,-73.949688
5,"Manhattanville, New York City",40.816934,-73.957385
6,"Central Harlem, New York City",40.815976,-73.943211
7,"East Harlem, New York City",40.792249,-73.944182
8,"Upper East Side, New York City",40.775639,-73.960508
9,"Yorkville, New York City",40.77593,-73.947118


In [284]:
merged_df.shape

(115, 3)

We have 115 neighborhoods in total. As we saw above, in Manhattan there are 40 and in Chicago 75.

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

#### 3. Use the Foursquare API to get location data of the venues in all the neighborhoods.

In [285]:
CLIENT_ID = 'QZERFRNDH5KHBJXQE330T5EANYUDOWEU4RKJ2VJZNVZETB3T' # your Foursquare ID
CLIENT_SECRET = 'LEZD1ZQQNICFJKRBVGA4UBYCW23OZN5U1GI0TB2LEJAHKVFZ' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: QZERFRNDH5KHBJXQE330T5EANYUDOWEU4RKJ2VJZNVZETB3T
CLIENT_SECRET:LEZD1ZQQNICFJKRBVGA4UBYCW23OZN5U1GI0TB2LEJAHKVFZ


In [286]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [287]:
neigh_venues = getNearbyVenues(names=merged_df['Neighborhood'],
                                   latitudes=merged_df['Latitude'],
                                   longitudes=merged_df['Longitude']
                                  )

Marble Hill, New York City
Chinatown, New York City
Washington Heights, New York City
Inwood, New York City
Hamilton Heights, New York City
Manhattanville, New York City
Central Harlem, New York City
East Harlem, New York City
Upper East Side, New York City
Yorkville, New York City
Lenox Hill, New York City
Roosevelt Island, New York City
Upper West Side, New York City
Lincoln Square, New York City
Clinton, New York City
Midtown, New York City
Murray Hill, New York City
Chelsea, New York City
Greenwich Village, New York City
East Village, New York City
Lower East Side, New York City
Tribeca, New York City
Little Italy, New York City
Soho, New York City
West Village, New York City
Manhattan Valley, New York City
Morningside Heights, New York City
Gramercy, New York City
Battery Park City, New York City
Financial District, New York City
Carnegie Hill, New York City
Noho, New York City
Civic Center, New York City
Midtown South, New York City
Sutton Place, New York City
Turtle Bay, New Yor

In [288]:
# Let's check the size of the resulting dataframe
print(neigh_venues.shape)
neigh_venues.head()

(4444, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Marble Hill, New York City",40.876551,-73.91066,Arturo's,40.874412,-73.910271,Pizza Place
1,"Marble Hill, New York City",40.876551,-73.91066,Bikram Yoga,40.876844,-73.906204,Yoga Studio
2,"Marble Hill, New York City",40.876551,-73.91066,Tibbett Diner,40.880404,-73.908937,Diner
3,"Marble Hill, New York City",40.876551,-73.91066,Dunkin',40.877136,-73.906666,Donut Shop
4,"Marble Hill, New York City",40.876551,-73.91066,Starbucks,40.877531,-73.905582,Coffee Shop


We have a total of 4444 venues in the 115 neighborhoods in Manhattan and Chicago.

In [289]:
neigh_venues.groupby('Neighborhood').count().sort_values(by=['Venue'], ascending=False)

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Yorkville, New York City",100,100,100,100,100,100
"Noho, New York City",100,100,100,100,100,100
"Financial District, New York City",100,100,100,100,100,100
"Lenox Hill, New York City",100,100,100,100,100,100
"East Village, New York City",100,100,100,100,100,100
"Little Italy, New York City",100,100,100,100,100,100
"Loop, Chicago",100,100,100,100,100,100
"Clinton, New York City",100,100,100,100,100,100
"Chinatown, New York City",100,100,100,100,100,100
"Chelsea, New York City",100,100,100,100,100,100


In [290]:
print('There are {} uniques categories.'.format(len(neigh_venues['Venue Category'].unique())))

There are 368 uniques categories.


Out of the 4444 venues, we can group them into 368 distinct categories. As from the table above, you can also see that the neighborhoods with highest count of venues are mostly in Manhattan. Loop in Chicago is the only neighborhood that has more than 100 venues to offer. It looks like someone moving from Manhattan to Chicago will find himself in a city that is not as exciting as he or she might expect. 

__Analyze Each Neighborhood__

Now, let's take a deeper dive into what each neighborhood has to offer

In [291]:
# one hot encoding
neigh_onehot = pd.get_dummies(neigh_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
neigh_onehot['Neighborhood'] = neigh_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [neigh_onehot.columns[-1]] + list(neigh_onehot.columns[:-1])
neigh_onehot = neigh_onehot[fixed_columns]

neigh_onehot.head()

Unnamed: 0,Neighborhood,ATM,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport Lounge,Airport Service,American Restaurant,Antique Shop,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auditorium,Australian Restaurant,Austrian Restaurant,Auto Dealership,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Beach,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bike Shop,Bike Trail,Bistro,Board Shop,Boat or Ferry,Bookstore,Border Crossing,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bridge,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Cambodian Restaurant,Camera Store,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Carpet Store,Caucasian Restaurant,Check Cashing Service,Cheese Shop,Chinese Restaurant,Chocolate Shop,Circus,Climbing Gym,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Academic Building,College Arts Building,College Bookstore,College Cafeteria,Comedy Club,Comfort Food Restaurant,Comic Shop,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Cooking School,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Currency Exchange,Cycle Studio,Czech Restaurant,Dance Studio,Daycare,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Doctor's Office,Dog Run,Donut Shop,Drugstore,Dry Cleaner,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Eye Doctor,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Financial or Legal Service,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Heliport,High School,Historic Site,History Museum,Hobby Shop,Hockey Arena,Home Service,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Indoor Play Area,Insurance Office,Intersection,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Korean Restaurant,Kosher Restaurant,Lake,Latin American Restaurant,Laundromat,Laundry Service,Lebanese Restaurant,Library,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Mac & Cheese Joint,Malay Restaurant,Marijuana Dispensary,Market,Martial Arts Dojo,Massage Studio,Mattress Store,Medical Center,Mediterranean Restaurant,Memorial Site,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Multiplex,Museum,Music School,Music Store,Music Venue,Nail Salon,Nature Preserve,New American Restaurant,Nightclub,Non-Profit,Noodle House,North Indian Restaurant,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Outdoor Sculpture,Outdoor Supply Store,Outdoors & Recreation,Paper / Office Supplies Store,Park,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Persian Restaurant,Peruvian Restaurant,Pet Café,Pet Service,Pet Store,Pharmacy,Photography Studio,Physical Therapist,Pie Shop,Pier,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Pub,Public Art,Ramen Restaurant,Record Shop,Rental Car Location,Residential Building (Apartment / Condo),Resort,Rest Area,Restaurant,River,Rock Club,Roof Deck,Russian Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,School,Sculpture Garden,Seafood Restaurant,Shanghai Restaurant,Shipping Store,Shoe Repair,Shoe Store,Shopping Mall,Skate Park,Smoke Shop,Snack Place,Soccer Field,Soup Place,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Sri Lankan Restaurant,Stables,Stadium,Steakhouse,Storage Facility,Street Art,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tattoo Parlor,Tea Room,Tech Startup,Temple,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Theme Park Ride / Attraction,Theme Restaurant,Thrift / Vintage Store,Tiki Bar,Tourist Information Center,Toy / Game Store,Track,Trail,Train Station,Turkish Restaurant,Udon Restaurant,Ukrainian Restaurant,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Vineyard,Volleyball Court,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Marble Hill, New York City",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"Marble Hill, New York City",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
2,"Marble Hill, New York City",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"Marble Hill, New York City",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"Marble Hill, New York City",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [292]:
neigh_grouped = neigh_onehot.groupby('Neighborhood').mean().reset_index()
neigh_grouped

Unnamed: 0,Neighborhood,ATM,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport Lounge,Airport Service,American Restaurant,Antique Shop,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auditorium,Australian Restaurant,Austrian Restaurant,Auto Dealership,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Beach,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bike Shop,Bike Trail,Bistro,Board Shop,Boat or Ferry,Bookstore,Border Crossing,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bridge,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Cambodian Restaurant,Camera Store,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Carpet Store,Caucasian Restaurant,Check Cashing Service,Cheese Shop,Chinese Restaurant,Chocolate Shop,Circus,Climbing Gym,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Academic Building,College Arts Building,College Bookstore,College Cafeteria,Comedy Club,Comfort Food Restaurant,Comic Shop,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Cooking School,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Currency Exchange,Cycle Studio,Czech Restaurant,Dance Studio,Daycare,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Doctor's Office,Dog Run,Donut Shop,Drugstore,Dry Cleaner,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Eye Doctor,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Financial or Legal Service,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Heliport,High School,Historic Site,History Museum,Hobby Shop,Hockey Arena,Home Service,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Indoor Play Area,Insurance Office,Intersection,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Korean Restaurant,Kosher Restaurant,Lake,Latin American Restaurant,Laundromat,Laundry Service,Lebanese Restaurant,Library,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Mac & Cheese Joint,Malay Restaurant,Marijuana Dispensary,Market,Martial Arts Dojo,Massage Studio,Mattress Store,Medical Center,Mediterranean Restaurant,Memorial Site,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Multiplex,Museum,Music School,Music Store,Music Venue,Nail Salon,Nature Preserve,New American Restaurant,Nightclub,Non-Profit,Noodle House,North Indian Restaurant,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Outdoor Sculpture,Outdoor Supply Store,Outdoors & Recreation,Paper / Office Supplies Store,Park,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Persian Restaurant,Peruvian Restaurant,Pet Café,Pet Service,Pet Store,Pharmacy,Photography Studio,Physical Therapist,Pie Shop,Pier,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Pub,Public Art,Ramen Restaurant,Record Shop,Rental Car Location,Residential Building (Apartment / Condo),Resort,Rest Area,Restaurant,River,Rock Club,Roof Deck,Russian Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,School,Sculpture Garden,Seafood Restaurant,Shanghai Restaurant,Shipping Store,Shoe Repair,Shoe Store,Shopping Mall,Skate Park,Smoke Shop,Snack Place,Soccer Field,Soup Place,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Sri Lankan Restaurant,Stables,Stadium,Steakhouse,Storage Facility,Street Art,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tattoo Parlor,Tea Room,Tech Startup,Temple,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Theme Park Ride / Attraction,Theme Restaurant,Thrift / Vintage Store,Tiki Bar,Tourist Information Center,Toy / Game Store,Track,Trail,Train Station,Turkish Restaurant,Udon Restaurant,Ukrainian Restaurant,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Vineyard,Volleyball Court,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Albany Park, Chicago",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Archer Heights, Chicago",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0
2,"Armour Square, Chicago",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Ashburn, Chicago",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Auburn Gresham, Chicago",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Austin, Chicago",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.153846,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.153846,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0
6,"Avalon Park, Chicago",0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Avondale, Chicago",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.068966,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.068966,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.034483,0.034483,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.068966,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.068966,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.068966,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0
8,"Battery Park City, New York City",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.033898,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.033898,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.033898,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033898,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033898,0.0,0.016949,0.067797,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.084746,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033898,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.050847,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.101695,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.016949,0.033898,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033898,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033898,0.0,0.0,0.0
9,"Belmont Cragin, Chicago",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Let's print each neighborhood along with the top 5 most common venues

In [293]:
num_top_venues = 5

for hood in neigh_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = neigh_grouped[neigh_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Albany Park, Chicago----
                venue  freq
0              Bakery  0.08
1   Mobile Phone Shop  0.08
2       Grocery Store  0.08
3  Chinese Restaurant  0.08
4          Donut Shop  0.08


----Archer Heights, Chicago----
                venue  freq
0  Mexican Restaurant  0.17
1   Mobile Phone Shop  0.11
2       Grocery Store  0.11
3                 Bar  0.06
4      Sandwich Place  0.06


----Armour Square, Chicago----
                venue  freq
0  Chinese Restaurant  0.25
1    Asian Restaurant  0.17
2       Hot Dog Joint  0.08
3      Sandwich Place  0.08
4       Tanning Salon  0.08


----Ashburn, Chicago----
                        venue  freq
0             Automotive Shop  0.33
1  Construction & Landscaping  0.17
2          Italian Restaurant  0.17
3              Cosmetics Shop  0.17
4          Light Rail Station  0.17


----Auburn Gresham, Chicago----
              venue  freq
0              Park  0.25
1              Pool  0.25
2  Basketball Court  0.25
3    Discount Store

                        venue  freq
0         American Restaurant  0.12
1         Fried Chicken Joint  0.06
2  Construction & Landscaping  0.06
3                        Farm  0.06
4               Grocery Store  0.06


----Noho, New York City----
                 venue  freq
0   Italian Restaurant  0.06
1  Japanese Restaurant  0.05
2          Coffee Shop  0.05
3       Sandwich Place  0.04
4        Grocery Store  0.04


----North Center, Chicago----
         venue  freq
0          Bar  0.09
1  Coffee Shop  0.06
2      Brewery  0.04
3         Bank  0.04
4   Restaurant  0.03


----North Lawndale, Chicago----
                        venue  freq
0                        Park  0.25
1  Construction & Landscaping  0.25
2          Seafood Restaurant  0.25
3               Train Station  0.25
4                Noodle House  0.00


----North Park, Chicago----
             venue  freq
0             Park  0.25
1  Nature Preserve  0.25
2      Bus Station  0.25
3   Gymnastics Gym  0.25
4              AT

Again, from a quick glance at the top 5 most common venues it looks to me as some neighborhoods of Chicago are a little bit sleepy. Take, for example, Woodlawn (way at the end) - it has some parks and coffee shops, but nothing more. Could be a perfect fit for a stressed out banker from Wall Street. 

First, let's write a function to sort the venues in descending order.

In [294]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.


In [295]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = neigh_grouped['Neighborhood']

for ind in np.arange(neigh_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(neigh_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Albany Park, Chicago",Mobile Phone Shop,Cocktail Bar,Donut Shop,Grocery Store,Bakery,Chinese Restaurant,Sandwich Place,Diner,Korean Restaurant,Karaoke Bar
1,"Archer Heights, Chicago",Mexican Restaurant,Mobile Phone Shop,Grocery Store,Coffee Shop,Candy Store,Gas Station,Gym / Fitness Center,Sandwich Place,Italian Restaurant,Bank
2,"Armour Square, Chicago",Chinese Restaurant,Asian Restaurant,Cosmetics Shop,Hot Dog Joint,Italian Restaurant,Sandwich Place,Gas Station,Tanning Salon,Business Service,Yoga Studio
3,"Ashburn, Chicago",Automotive Shop,Construction & Landscaping,Cosmetics Shop,Italian Restaurant,Light Rail Station,Eye Doctor,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant
4,"Auburn Gresham, Chicago",Pool,Park,Discount Store,Basketball Court,Dry Cleaner,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant


Now we have the top 10 venues for all neighborhoods in our example. We can move on to cluster them by using the k-means algorithm.

In [296]:
# Let's make a copy by writing to a csv-file for further usage.
neigh_grouped.to_csv('neigh_grouped.csv', encoding='utf-8', index=False)
neighborhoods_venues_sorted.to_csv('neighborhoods_venues_sorted.csv', encoding='utf-8', index=False)

#### 4. Clustering of the neighborhoods with a clustering algorithm to find similar neighborhoods in both cities.


In [297]:
# re-load the csv-file 
df_neigh_grouped = pd.read_csv('neigh_grouped.csv')
df_neigh_grouped.head()

Unnamed: 0,Neighborhood,ATM,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport Lounge,Airport Service,American Restaurant,Antique Shop,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auditorium,Australian Restaurant,Austrian Restaurant,Auto Dealership,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Beach,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bike Shop,Bike Trail,Bistro,Board Shop,Boat or Ferry,Bookstore,Border Crossing,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bridge,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Cambodian Restaurant,Camera Store,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Carpet Store,Caucasian Restaurant,Check Cashing Service,Cheese Shop,Chinese Restaurant,Chocolate Shop,Circus,Climbing Gym,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Academic Building,College Arts Building,College Bookstore,College Cafeteria,Comedy Club,Comfort Food Restaurant,Comic Shop,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Cooking School,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Currency Exchange,Cycle Studio,Czech Restaurant,Dance Studio,Daycare,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Doctor's Office,Dog Run,Donut Shop,Drugstore,Dry Cleaner,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Eye Doctor,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Financial or Legal Service,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Heliport,High School,Historic Site,History Museum,Hobby Shop,Hockey Arena,Home Service,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Indoor Play Area,Insurance Office,Intersection,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Korean Restaurant,Kosher Restaurant,Lake,Latin American Restaurant,Laundromat,Laundry Service,Lebanese Restaurant,Library,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Mac & Cheese Joint,Malay Restaurant,Marijuana Dispensary,Market,Martial Arts Dojo,Massage Studio,Mattress Store,Medical Center,Mediterranean Restaurant,Memorial Site,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Multiplex,Museum,Music School,Music Store,Music Venue,Nail Salon,Nature Preserve,New American Restaurant,Nightclub,Non-Profit,Noodle House,North Indian Restaurant,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Outdoor Sculpture,Outdoor Supply Store,Outdoors & Recreation,Paper / Office Supplies Store,Park,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Persian Restaurant,Peruvian Restaurant,Pet Café,Pet Service,Pet Store,Pharmacy,Photography Studio,Physical Therapist,Pie Shop,Pier,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Pub,Public Art,Ramen Restaurant,Record Shop,Rental Car Location,Residential Building (Apartment / Condo),Resort,Rest Area,Restaurant,River,Rock Club,Roof Deck,Russian Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,School,Sculpture Garden,Seafood Restaurant,Shanghai Restaurant,Shipping Store,Shoe Repair,Shoe Store,Shopping Mall,Skate Park,Smoke Shop,Snack Place,Soccer Field,Soup Place,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Sri Lankan Restaurant,Stables,Stadium,Steakhouse,Storage Facility,Street Art,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tattoo Parlor,Tea Room,Tech Startup,Temple,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Theme Park Ride / Attraction,Theme Restaurant,Thrift / Vintage Store,Tiki Bar,Tourist Information Center,Toy / Game Store,Track,Trail,Train Station,Turkish Restaurant,Udon Restaurant,Ukrainian Restaurant,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Vineyard,Volleyball Court,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Albany Park, Chicago",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Archer Heights, Chicago",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0
2,"Armour Square, Chicago",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Ashburn, Chicago",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Auburn Gresham, Chicago",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Run k-means to cluster the neighborhood into 5 clusters.

In [298]:
# set number of clusters
kclusters = 5

neigh_grouped_clustering = df_neigh_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(neigh_grouped_clustering)


# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 4, 1, 1, 0, 1, 4, 1, 1, 4])

In [299]:
df_neighborhoods_venues_sorted = pd.read_csv('neighborhoods_venues_sorted.csv')
df_neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Albany Park, Chicago",Mobile Phone Shop,Cocktail Bar,Donut Shop,Grocery Store,Bakery,Chinese Restaurant,Sandwich Place,Diner,Korean Restaurant,Karaoke Bar
1,"Archer Heights, Chicago",Mexican Restaurant,Mobile Phone Shop,Grocery Store,Coffee Shop,Candy Store,Gas Station,Gym / Fitness Center,Sandwich Place,Italian Restaurant,Bank
2,"Armour Square, Chicago",Chinese Restaurant,Asian Restaurant,Cosmetics Shop,Hot Dog Joint,Italian Restaurant,Sandwich Place,Gas Station,Tanning Salon,Business Service,Yoga Studio
3,"Ashburn, Chicago",Automotive Shop,Construction & Landscaping,Cosmetics Shop,Italian Restaurant,Light Rail Station,Eye Doctor,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant
4,"Auburn Gresham, Chicago",Pool,Park,Discount Store,Basketball Court,Dry Cleaner,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant


Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [300]:
# add clustering labels
df_neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# merge to add latitude/longitude for each neighborhood
merged_df = merged_df.join(df_neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

merged_df # check the last columns!

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Marble Hill, New York City",40.876551,-73.91066,1.0,Sandwich Place,Coffee Shop,Gym,Pharmacy,Deli / Bodega,Department Store,Diner,Discount Store,Kids Store,Donut Shop
1,"Chinatown, New York City",40.715618,-73.994279,1.0,Chinese Restaurant,Bakery,Cocktail Bar,American Restaurant,Coffee Shop,Spa,Salon / Barbershop,Optical Shop,Shanghai Restaurant,Asian Restaurant
2,"Washington Heights, New York City",40.851903,-73.9369,1.0,Café,Bakery,Grocery Store,Mexican Restaurant,Chinese Restaurant,Mobile Phone Shop,New American Restaurant,Spanish Restaurant,Coffee Shop,Latin American Restaurant
3,"Inwood, New York City",40.867684,-73.92121,1.0,Mexican Restaurant,Café,Bakery,Pizza Place,Lounge,Restaurant,Wine Bar,Frozen Yogurt Shop,Park,Deli / Bodega
4,"Hamilton Heights, New York City",40.823604,-73.949688,1.0,Pizza Place,Coffee Shop,Café,Mexican Restaurant,Deli / Bodega,Yoga Studio,Park,Caribbean Restaurant,School,Chinese Restaurant
5,"Manhattanville, New York City",40.816934,-73.957385,1.0,Coffee Shop,Seafood Restaurant,Deli / Bodega,Park,Mexican Restaurant,Italian Restaurant,Food & Drink Shop,Farmers Market,Lounge,Bike Trail
6,"Central Harlem, New York City",40.815976,-73.943211,1.0,Gym / Fitness Center,Chinese Restaurant,Seafood Restaurant,African Restaurant,Deli / Bodega,American Restaurant,Bar,French Restaurant,Fried Chicken Joint,Gym
7,"East Harlem, New York City",40.792249,-73.944182,4.0,Mexican Restaurant,Bakery,Thai Restaurant,Latin American Restaurant,Deli / Bodega,Steakhouse,Restaurant,Chinese Restaurant,Street Art,Gas Station
8,"Upper East Side, New York City",40.775639,-73.960508,1.0,Italian Restaurant,Gym / Fitness Center,Coffee Shop,Exhibit,Bakery,Yoga Studio,Pizza Place,French Restaurant,Juice Bar,Spa
9,"Yorkville, New York City",40.77593,-73.947118,1.0,Italian Restaurant,Coffee Shop,Gym,Bar,Deli / Bodega,Wine Shop,Japanese Restaurant,Mexican Restaurant,Diner,Pizza Place


In [301]:
# change data type for algorithm to work properly
merged_df['Cluster Labels'] = merged_df['Cluster Labels'].fillna(0.0).astype(int)
print(merged_df.shape)
merged_df.head()

(115, 14)


Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Marble Hill, New York City",40.876551,-73.91066,1,Sandwich Place,Coffee Shop,Gym,Pharmacy,Deli / Bodega,Department Store,Diner,Discount Store,Kids Store,Donut Shop
1,"Chinatown, New York City",40.715618,-73.994279,1,Chinese Restaurant,Bakery,Cocktail Bar,American Restaurant,Coffee Shop,Spa,Salon / Barbershop,Optical Shop,Shanghai Restaurant,Asian Restaurant
2,"Washington Heights, New York City",40.851903,-73.9369,1,Café,Bakery,Grocery Store,Mexican Restaurant,Chinese Restaurant,Mobile Phone Shop,New American Restaurant,Spanish Restaurant,Coffee Shop,Latin American Restaurant
3,"Inwood, New York City",40.867684,-73.92121,1,Mexican Restaurant,Café,Bakery,Pizza Place,Lounge,Restaurant,Wine Bar,Frozen Yogurt Shop,Park,Deli / Bodega
4,"Hamilton Heights, New York City",40.823604,-73.949688,1,Pizza Place,Coffee Shop,Café,Mexican Restaurant,Deli / Bodega,Yoga Studio,Park,Caribbean Restaurant,School,Chinese Restaurant


In [302]:
# drop NaN and check shape
merged_df.dropna(inplace=True)
merged_df.shape

(113, 14)

Okay, we have lost 2 neighborhoods due to NaN values.

In [303]:
# separate cities- first the old city
merged_df_old = merged_df[~merged_df['Neighborhood'].str.contains('Chicago')].sort_values(by=['Neighborhood'], ascending=True)
merged_df_old

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
28,"Battery Park City, New York City",40.711932,-74.016869,1,Park,Hotel,Gym,Memorial Site,Italian Restaurant,Coffee Shop,Shopping Mall,Burger Joint,Boat or Ferry,Food Court
30,"Carnegie Hill, New York City",40.782683,-73.953256,1,Coffee Shop,Café,Yoga Studio,Bakery,Gym / Fitness Center,Gym,Italian Restaurant,Japanese Restaurant,Bookstore,Bar
6,"Central Harlem, New York City",40.815976,-73.943211,1,Gym / Fitness Center,Chinese Restaurant,Seafood Restaurant,African Restaurant,Deli / Bodega,American Restaurant,Bar,French Restaurant,Fried Chicken Joint,Gym
17,"Chelsea, New York City",40.744035,-74.003116,1,Art Gallery,Coffee Shop,Ice Cream Shop,Italian Restaurant,American Restaurant,Park,Bookstore,Cupcake Shop,Boutique,Market
1,"Chinatown, New York City",40.715618,-73.994279,1,Chinese Restaurant,Bakery,Cocktail Bar,American Restaurant,Coffee Shop,Spa,Salon / Barbershop,Optical Shop,Shanghai Restaurant,Asian Restaurant
32,"Civic Center, New York City",40.715229,-74.005415,1,Hotel,Coffee Shop,Spa,French Restaurant,Park,Cocktail Bar,Café,Sushi Restaurant,Gym / Fitness Center,Yoga Studio
14,"Clinton, New York City",40.759101,-73.996119,1,Theater,Coffee Shop,Gym / Fitness Center,Gym,Spa,Wine Shop,Hotel,Thai Restaurant,Italian Restaurant,American Restaurant
7,"East Harlem, New York City",40.792249,-73.944182,4,Mexican Restaurant,Bakery,Thai Restaurant,Latin American Restaurant,Deli / Bodega,Steakhouse,Restaurant,Chinese Restaurant,Street Art,Gas Station
19,"East Village, New York City",40.727847,-73.982226,1,Pizza Place,Bar,Coffee Shop,Cocktail Bar,Japanese Restaurant,Juice Bar,Café,Salon / Barbershop,Bagel Shop,Record Shop
29,"Financial District, New York City",40.707107,-74.010665,1,Coffee Shop,Hotel,American Restaurant,Café,Cocktail Bar,Gym,Gym / Fitness Center,Falafel Restaurant,Japanese Restaurant,Pizza Place


In [304]:
# create map
map_clusters_old = folium.Map(location=[old_latitude, old_longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(merged_df_old['Latitude'], merged_df_old['Longitude'], merged_df_old['Neighborhood'], merged_df_old['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_old)
       
map_clusters_old

Interestingly, when we put the neighborhoods in Manhattan together with the ones from Chicago, we find that they are quite similar. Almost all of them belong to Cluster 1. East Harlem is the only neighborhood that was put into Cluster 4 and no single one was assigned to Cluter 2 and 3.

In [305]:
# separate cities- now the new city,
merged_df_new = merged_df[~merged_df['Neighborhood'].str.contains('New York')].reset_index(drop=True)
merged_df_new.head()

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Rogers Park, Chicago",42.010531,-87.670748,4,Mexican Restaurant,Pizza Place,Chinese Restaurant,American Restaurant,Train Station,Bakery,Theater,Donut Shop,Discount Store,Dive Bar
1,"West Ridge, Chicago",42.003548,-87.696243,1,Convenience Store,Wine Bar,Fried Chicken Joint,Eye Doctor,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant
2,"Uptown, Chicago",41.96663,-87.655546,1,Coffee Shop,Mexican Restaurant,Pizza Place,Bar,Music Venue,Sushi Restaurant,Diner,Ethiopian Restaurant,Grocery Store,Lounge
3,"Lincoln Square, Chicago",41.97599,-87.689616,1,Bar,Sandwich Place,Bus Station,Café,Cosmetics Shop,Korean Restaurant,Convenience Store,Pizza Place,Pharmacy,Bank
4,"North Center, Chicago",41.956107,-87.67916,1,Bar,Coffee Shop,Brewery,Bank,Dive Bar,Gym / Fitness Center,Boutique,Restaurant,Spa,Furniture / Home Store


In [306]:
# create map
map_clusters_new = folium.Map(location=[new_latitude, new_longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(merged_df_new['Latitude'], merged_df_new['Longitude'], merged_df_new['Neighborhood'], merged_df_new['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_new)
       
map_clusters_new

In contrast to Manhattan, the neighborhoods in Chicago seem to differ quite a lot. Here, we found a higher number of Cluster 0, 1, and 4. Whereas Cluster 2 hat a single neighborhood, Pullman and 3 is assigned to Norwood Park only.  

#### 5. Define each cluster by checking the main characteristics based on venues data.


In [307]:
# let's check out cluster 0
merged_df.loc[merged_df['Cluster Labels'] == 0, merged_df.columns[[0] + list(range(4, merged_df.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
52,"North Park, Chicago",Bus Station,Gymnastics Gym,Park,Nature Preserve,Exhibit,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant
61,"Humboldt park, Chicago",Park,Food Truck,Baseball Field,Museum,Lake,Beach,History Museum,Café,Soccer Field,Yoga Studio
67,"North Lawndale, Chicago",Seafood Restaurant,Train Station,Park,Construction & Landscaping,Coworking Space,Exhibit,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store
74,"Oakland, Chicago",Park,Boutique,Public Art,Track,Discount Store,Event Space,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store
75,"Fuller Park, Chicago",Fast Food Restaurant,Park,Sandwich Place,Yoga Studio,Ethiopian Restaurant,Dry Cleaner,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store
80,"Woodlawn, Chicago",Park,Coffee Shop,Yoga Studio,Exhibit,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant
82,"Chatham, Chicago",Park,Boutique,Fast Food Restaurant,Bus Station,Donut Shop,Ice Cream Shop,Creperie,Eye Doctor,Eastern European Restaurant,Electronics Store
86,"Calumet Heights, Chicago",Gym / Fitness Center,Bus Station,Park,Deli / Bodega,Financial or Legal Service,Filipino Restaurant,Dumpling Restaurant,Duty-free Shop,Fish Market,Eastern European Restaurant
109,"Auburn Gresham, Chicago",Pool,Park,Discount Store,Basketball Court,Dry Cleaner,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant
111,"Mount Greenwood, Chicago",Cosmetics Shop,Park,Vineyard,Yoga Studio,Exhibit,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant


In [308]:
# let's check out cluster 1
merged_df.loc[merged_df['Cluster Labels'] == 1, merged_df.columns[[0] + list(range(4, merged_df.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Marble Hill, New York City",Sandwich Place,Coffee Shop,Gym,Pharmacy,Deli / Bodega,Department Store,Diner,Discount Store,Kids Store,Donut Shop
1,"Chinatown, New York City",Chinese Restaurant,Bakery,Cocktail Bar,American Restaurant,Coffee Shop,Spa,Salon / Barbershop,Optical Shop,Shanghai Restaurant,Asian Restaurant
2,"Washington Heights, New York City",Café,Bakery,Grocery Store,Mexican Restaurant,Chinese Restaurant,Mobile Phone Shop,New American Restaurant,Spanish Restaurant,Coffee Shop,Latin American Restaurant
3,"Inwood, New York City",Mexican Restaurant,Café,Bakery,Pizza Place,Lounge,Restaurant,Wine Bar,Frozen Yogurt Shop,Park,Deli / Bodega
4,"Hamilton Heights, New York City",Pizza Place,Coffee Shop,Café,Mexican Restaurant,Deli / Bodega,Yoga Studio,Park,Caribbean Restaurant,School,Chinese Restaurant
5,"Manhattanville, New York City",Coffee Shop,Seafood Restaurant,Deli / Bodega,Park,Mexican Restaurant,Italian Restaurant,Food & Drink Shop,Farmers Market,Lounge,Bike Trail
6,"Central Harlem, New York City",Gym / Fitness Center,Chinese Restaurant,Seafood Restaurant,African Restaurant,Deli / Bodega,American Restaurant,Bar,French Restaurant,Fried Chicken Joint,Gym
8,"Upper East Side, New York City",Italian Restaurant,Gym / Fitness Center,Coffee Shop,Exhibit,Bakery,Yoga Studio,Pizza Place,French Restaurant,Juice Bar,Spa
9,"Yorkville, New York City",Italian Restaurant,Coffee Shop,Gym,Bar,Deli / Bodega,Wine Shop,Japanese Restaurant,Mexican Restaurant,Diner,Pizza Place
10,"Lenox Hill, New York City",Italian Restaurant,Coffee Shop,Sushi Restaurant,Pizza Place,Cocktail Bar,Café,Burger Joint,Gym / Fitness Center,Gym,Salon / Barbershop


In [309]:
# let's check out cluster 2
merged_df.loc[merged_df['Cluster Labels'] == 2, merged_df.columns[[0] + list(range(4, merged_df.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
88,"Pullman, Chicago",History Museum,Yoga Studio,Exhibit,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant


In [310]:
# let's check out cluster 3
merged_df.loc[merged_df['Cluster Labels'] == 3, merged_df.columns[[0] + list(range(4, merged_df.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
49,"Norwood Park, Chicago",Park,Yoga Studio,Exhibit,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant


In [311]:
# let's check out cluster 4
merged_df.loc[merged_df['Cluster Labels'] == 4, merged_df.columns[[0] + list(range(4, merged_df.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,"East Harlem, New York City",Mexican Restaurant,Bakery,Thai Restaurant,Latin American Restaurant,Deli / Bodega,Steakhouse,Restaurant,Chinese Restaurant,Street Art,Gas Station
40,"Rogers Park, Chicago",Mexican Restaurant,Pizza Place,Chinese Restaurant,American Restaurant,Train Station,Bakery,Theater,Donut Shop,Discount Store,Dive Bar
51,"Forest Glen, Chicago",Indian Restaurant,Fast Food Restaurant,Golf Course,Grocery Store,Coffee Shop,Pharmacy,Filipino Restaurant,Farmers Market,Dumpling Restaurant,Fish Market
57,"Belmont Cragin, Chicago",Mexican Restaurant,Restaurant,Laundromat,Discount Store,Gas Station,Chinese Restaurant,Department Store,BBQ Joint,Thrift / Vintage Store,Nightclub
58,"Hermosa, Chicago",Optical Shop,Discount Store,Park,Art Gallery,Department Store,Check Cashing Service,Latin American Restaurant,Supermarket,Mexican Restaurant,Seafood Restaurant
64,"West Garfield Park, Chicago",Fast Food Restaurant,Clothing Store,Fried Chicken Joint,Shoe Store,Pizza Place,Sandwich Place,Taco Place,Discount Store,Kids Store,Intersection
65,"East Garfield Park, Chicago",Garden Center,Public Art,Discount Store,American Restaurant,Pharmacy,Burger Joint,Pet Service,Bus Line,Train Station,Yoga Studio
68,"South Lawndale, Chicago",Mexican Restaurant,Ice Cream Shop,Grocery Store,Dessert Shop,Clothing Store,Restaurant,Liquor Store,Mobile Phone Shop,Pizza Place,Ethiopian Restaurant
69,"Lower West Side, Chicago",Grocery Store,Mexican Restaurant,Food,Gas Station,Dessert Shop,Farmers Market,Food Truck,Supermarket,Boat or Ferry,Financial or Legal Service
78,"Washington Park, Chicago",Fast Food Restaurant,Convenience Store,Art Gallery,Bookstore,Bus Station,Gas Station,Lounge,Theater,Train Station,Breakfast Spot


The clustering of the neighborhoods lead to the following results:

- Cluster 0, 2, and 3 includes only Chicago neighborhoods, so for our analysis we can skip these clusters since there is no possible match.
- Cluster 1 seems to have a very diverse mix of parks, restaurants and other interesting places, but it is hard to find the one main topic that have all neighborhoods in common.
- Cluster 4 is interesting for our analysis since we have only one neighborhood in Manhattan, East Harlem, and multiples new neighborhoods to choose from. 

So, we will define Cluster 4 as our example for the remainder of the project and the goal is to find the best fit for a person moving from East Harlem, NY to Chicago.

#### 6. Pick the cluster that your old neighborhood is in and select the neighborhoods from the new city.

In [320]:
# create a new dataframe that contains only new city neighborhoods from the cluster we've picked and skip out East Harlem
df_cluster = merged_df.loc[merged_df['Cluster Labels'] == 4]
df_cluster = df_cluster[~df_cluster['Neighborhood'].str.contains('New York')].reset_index(drop=True)
df_cluster

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Rogers Park, Chicago",42.010531,-87.670748,4,Mexican Restaurant,Pizza Place,Chinese Restaurant,American Restaurant,Train Station,Bakery,Theater,Donut Shop,Discount Store,Dive Bar
1,"Forest Glen, Chicago",41.991752,-87.751674,4,Indian Restaurant,Fast Food Restaurant,Golf Course,Grocery Store,Coffee Shop,Pharmacy,Filipino Restaurant,Farmers Market,Dumpling Restaurant,Fish Market
2,"Belmont Cragin, Chicago",41.931698,-87.76867,4,Mexican Restaurant,Restaurant,Laundromat,Discount Store,Gas Station,Chinese Restaurant,Department Store,BBQ Joint,Thrift / Vintage Store,Nightclub
3,"Hermosa, Chicago",41.928643,-87.734502,4,Optical Shop,Discount Store,Park,Art Gallery,Department Store,Check Cashing Service,Latin American Restaurant,Supermarket,Mexican Restaurant,Seafood Restaurant
4,"West Garfield Park, Chicago",41.880588,-87.729223,4,Fast Food Restaurant,Clothing Store,Fried Chicken Joint,Shoe Store,Pizza Place,Sandwich Place,Taco Place,Discount Store,Kids Store,Intersection
5,"East Garfield Park, Chicago",41.880866,-87.702833,4,Garden Center,Public Art,Discount Store,American Restaurant,Pharmacy,Burger Joint,Pet Service,Bus Line,Train Station,Yoga Studio
6,"South Lawndale, Chicago",41.843644,-87.712554,4,Mexican Restaurant,Ice Cream Shop,Grocery Store,Dessert Shop,Clothing Store,Restaurant,Liquor Store,Mobile Phone Shop,Pizza Place,Ethiopian Restaurant
7,"Lower West Side, Chicago",41.84762,-87.671774,4,Grocery Store,Mexican Restaurant,Food,Gas Station,Dessert Shop,Farmers Market,Food Truck,Supermarket,Boat or Ferry,Financial or Legal Service
8,"Washington Park, Chicago",41.792534,-87.618105,4,Fast Food Restaurant,Convenience Store,Art Gallery,Bookstore,Bus Station,Gas Station,Lounge,Theater,Train Station,Breakfast Spot
9,"Avalon Park, Chicago",41.745035,-87.588658,4,Fast Food Restaurant,Burger Joint,ATM,Cajun / Creole Restaurant,Sandwich Place,Diner,Grocery Store,Pizza Place,Boutique,Insurance Office


In [321]:
df_cluster.shape

(26, 14)

So, the question is, which of the 26 neighborhoods should be picked by our East Harlem migrant?

### B: Scoring of neighborhoods within the new city to make a recommendation of the best places

We'll begin with some cleaning work first.

In [323]:
# clean the dataframe
df_top10 = df_cluster[['Neighborhood']]
df_top10.head()

Unnamed: 0,Neighborhood
0,"Rogers Park, Chicago"
1,"Forest Glen, Chicago"
2,"Belmont Cragin, Chicago"
3,"Hermosa, Chicago"
4,"West Garfield Park, Chicago"


In [326]:
# drop the name Chicago
df_top10['Neighborhood'] = df_top10['Neighborhood'].map(lambda x: x.rstrip(', Chicago'))
df_top10.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


Unnamed: 0,Neighborhood
0,Rogers Park
1,Forest Glen
2,Belmont Cragin
3,Hermos
4,West Garfield Park


#### 7. Score each of the potential new neighborhood based on socioeconomic data and define the top 10 list.


Now, we add socioeconomic data from the City of Chicago website to find the places which might be a not so good environment.

In [327]:
# we can refer to the data frame from the beginning when we started analyzing the new city
new_socioeconomic_data.head()

Unnamed: 0,Community Area Number,Neighborhood,PERCENT OF HOUSING CROWDED,PERCENT HOUSEHOLDS BELOW POVERTY,PERCENT AGED 16+ UNEMPLOYED,PERCENT AGED 25+ WITHOUT HIGH SCHOOL DIPLOMA,PERCENT AGED UNDER 18 OR OVER 64,PER CAPITA INCOME,Hardship Index
0,1.0,Rogers Park,7.7,23.6,8.7,18.2,27.5,23939,39.0
1,2.0,West Ridge,7.8,17.2,8.8,20.8,38.5,23040,46.0
2,3.0,Uptown,3.8,24.0,8.9,11.8,22.2,35787,20.0
3,4.0,Lincoln Square,3.4,10.9,8.2,13.4,25.5,37524,17.0
4,5.0,North Center,0.3,7.5,5.2,4.5,26.2,57123,6.0


I will use the Hardship Index only as this score includes each of the indicators.

I acknowledge that the time series ends in the year of 2012, but for this assignment I will ignore this fact and assume that the data is up to date and still valid for this decision process.


In [328]:
# get the needed columns
hardship_index = new_socioeconomic_data[['Neighborhood', 'Community Area Number', 'Hardship Index']].reset_index(drop=True)
hardship_index

Unnamed: 0,Neighborhood,Community Area Number,Hardship Index
0,Rogers Park,1.0,39.0
1,West Ridge,2.0,46.0
2,Uptown,3.0,20.0
3,Lincoln Square,4.0,17.0
4,North Center,5.0,6.0
5,Lake View,6.0,5.0
6,Lincoln Park,7.0,2.0
7,Near North Side,8.0,1.0
8,Edison Park,9.0,8.0
9,Norwood Park,10.0,21.0


In [329]:
# merge the hardship index to the cluster and sort in ascending order
df_top10 = df_top10.merge(hardship_index, on='Neighborhood')
df_top10

Unnamed: 0,Neighborhood,Community Area Number,Hardship Index
0,Rogers Park,1.0,39.0
1,Forest Glen,12.0,11.0
2,Belmont Cragin,19.0,70.0
3,West Garfield Park,26.0,92.0
4,East Garfield Park,27.0,83.0
5,South Lawndale,30.0,96.0
6,Lower West Side,31.0,76.0
7,Washington Park,40.0,88.0
8,Avalon Park,45.0,41.0
9,Burnside,47.0,79.0


In [330]:
# sort neighborhoods by Hardship Index
df_top10_sorted = df_top10.sort_values(by='Hardship Index', ascending=True).reset_index(drop=True)
df_top10_sorted

Unnamed: 0,Neighborhood,Community Area Number,Hardship Index
0,Forest Glen,12.0,11.0
1,Morgan Park,75.0,30.0
2,Garfield Ridge,56.0,32.0
3,Rogers Park,1.0,39.0
4,Avalon Park,45.0,41.0
5,West Lawn,65.0,56.0
6,McKinley Park,59.0,61.0
7,West Pullman,53.0,62.0
8,Archer Heights,57.0,67.0
9,West Elsdon,62.0,69.0


In [345]:
# cut the list of the neighborhoods down to 10
df_top10 = df_top10_sorted.head(10)
df_top10

Unnamed: 0,Neighborhood,Community Area Number,Hardship Index
0,Forest Glen,12.0,11.0
1,Morgan Park,75.0,30.0
2,Garfield Ridge,56.0,32.0
3,Rogers Park,1.0,39.0
4,Avalon Park,45.0,41.0
5,West Lawn,65.0,56.0
6,McKinley Park,59.0,61.0
7,West Pullman,53.0,62.0
8,Archer Heights,57.0,67.0
9,West Elsdon,62.0,69.0


Now we have reduced our list of potential new neighborhoods down to 10 and can go on to analyze further relevant data. From here, it looks like Forest Glen is quite a good spot to live.

#### 8. Score remaining neighborhood based on crime data and we'll get the top 5 list.


Now, we add crime data into the analysis.

In [332]:
# we can find the names of the neighborhoods in this data source
new_crime_data = pd.read_csv('Crimes_-_2001_to_present.csv')
new_crime_data.shape

  interactivity=interactivity, compiler=compiler, result=result)


(7100712, 22)

Let's have a look at the file.

In [333]:
new_crime_data.head()

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,11034701,JA366925,01/01/2001 11:00:00 AM,016XX E 86TH PL,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,RESIDENCE,False,False,412,4.0,8.0,45.0,11,,,2001,08/05/2017 03:50:08 PM,,,
1,11227287,JB147188,10/08/2017 03:00:00 AM,092XX S RACINE AVE,281,CRIM SEXUAL ASSAULT,NON-AGGRAVATED,RESIDENCE,False,False,2222,22.0,21.0,73.0,2,,,2017,02/11/2018 03:57:41 PM,,,
2,11227583,JB147595,03/28/2017 02:00:00 PM,026XX W 79TH ST,620,BURGLARY,UNLAWFUL ENTRY,OTHER,False,False,835,8.0,18.0,70.0,5,,,2017,02/11/2018 03:57:41 PM,,,
3,11227293,JB147230,09/09/2017 08:17:00 PM,060XX S EBERHART AVE,810,THEFT,OVER $500,RESIDENCE,False,False,313,3.0,20.0,42.0,6,,,2017,02/11/2018 03:57:41 PM,,,
4,11227634,JB147599,08/26/2017 10:00:00 AM,001XX W RANDOLPH ST,281,CRIM SEXUAL ASSAULT,NON-AGGRAVATED,HOTEL/MOTEL,False,False,122,1.0,42.0,32.0,2,,,2017,02/11/2018 03:57:41 PM,,,


In [334]:
# drop NaN
new_crime_data.dropna(inplace=True)
print(new_crime_data.shape)
new_crime_data.head()

(6421539, 22)


Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
63316,11665567,JC234307,04/10/2019 04:37:00 PM,102XX S VERNON AVE,1562,SEX OFFENSE,AGG CRIMINAL SEXUAL ABUSE,"SCHOOL, PUBLIC, BUILDING",False,False,511,5.0,9.0,49.0,17,1181051.0,1837225.0,2019,08/03/2019 04:02:13 PM,41.708589,-87.612583,"(41.708589, -87.612583094)"
63401,11667963,JC235212,04/12/2019 04:08:00 PM,032XX N KEELER AVE,1754,OFFENSE INVOLVING CHILDREN,AGG SEX ASSLT OF CHILD FAM MBR,RESIDENCE,False,True,1731,17.0,30.0,16.0,02,1147835.0,1921408.0,2019,10/02/2019 04:13:24 PM,41.940298,-87.732066,"(41.940297617, -87.732066473)"
63402,11667968,JC237058,04/19/2019 01:57:00 PM,002XX N LARAMIE AVE,1752,OFFENSE INVOLVING CHILDREN,AGGRAVATED CRIMINAL SEXUAL ABUSE BY FAMILY MEMBER,RESIDENCE,False,True,1532,15.0,28.0,25.0,17,1141669.0,1901165.0,2019,03/18/2020 03:52:17 PM,41.884865,-87.75523,"(41.884865037, -87.755230327)"
63478,11668309,JC238187,04/25/2019 05:20:00 PM,108XX S DR MARTIN LUTHER KING JR DR,486,BATTERY,DOMESTIC BATTERY SIMPLE,RESIDENCE,False,True,513,5.0,9.0,49.0,08B,1180832.0,1833222.0,2019,06/30/2019 03:56:27 PM,41.697609,-87.613508,"(41.697609261, -87.613507612)"
63883,11692179,JC261724,05/13/2019 05:26:00 PM,090XX S RACINE AVE,560,ASSAULT,SIMPLE,STREET,False,False,2222,22.0,21.0,73.0,08A,1169908.0,1844927.0,2019,06/30/2019 03:56:27 PM,41.729973,-87.653167,"(41.729973132, -87.653166753)"


This file is quite huge, so we need to trim is to make it more manageable. For this, we will drop the years before 2015.

In [335]:
# Select years we want to drop
YearsToDrop = new_crime_data[ new_crime_data['Year'] < 2015 ].index

# Delete these row indexes from dataFrame
new_crime_data.drop(YearsToDrop , inplace=True)
new_crime_data.tail(50)

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
7100658,9999943,HY189919,03/18/2015 08:15:00 PM,035XX S COTTAGE GROVE AVE,2820,OTHER OFFENSE,TELEPHONE THREAT,APARTMENT,False,True,212,2.0,4.0,36.0,26,1181307.0,1881665.0,2015,02/10/2018 03:50:01 PM,41.830531,-87.610277,"(41.830530933, -87.610277459)"
7100659,9999944,HY182162,03/10/2015 10:05:00 PM,001XX N KEDZIE AVE,2890,PUBLIC PEACE VIOLATION,OTHER VIOLATION,TAVERN/LIQUOR STORE,True,False,1222,12.0,27.0,27.0,26,1155029.0,1900649.0,2015,02/10/2018 03:50:01 PM,41.883192,-87.706184,"(41.883191631, -87.706183852)"
7100660,9999945,HY189912,03/18/2015 04:45:00 PM,028XX S FARRELL ST,0810,THEFT,OVER $500,STREET,False,False,913,9.0,11.0,60.0,06,1169245.0,1885869.0,2015,02/10/2018 03:50:01 PM,41.842337,-87.654411,"(41.842337169, -87.654411216)"
7100661,9999946,HY189905,03/18/2015 07:40:00 PM,024XX S CALIFORNIA AVE,1811,NARCOTICS,POSS: CANNABIS 30GMS OR LESS,ALLEY,True,False,1033,10.0,12.0,30.0,18,1158088.0,1887537.0,2015,02/10/2018 03:50:01 PM,41.847149,-87.695309,"(41.847149067, -87.69530883)"
7100662,9999947,HY189876,03/18/2015 07:15:00 PM,055XX S MICHIGAN AVE,1812,NARCOTICS,POSS: CANNABIS MORE THAN 30GMS,STREET,True,False,225,2.0,20.0,40.0,18,1178100.0,1868327.0,2015,02/10/2018 03:50:01 PM,41.794004,-87.622449,"(41.794003774, -87.622448757)"
7100663,9999948,HY189884,03/18/2015 07:50:00 PM,013XX W CULLERTON ST,4310,OTHER OFFENSE,POSSESSION OF BURGLARY TOOLS,ALLEY,True,False,1235,12.0,25.0,31.0,26,1167683.0,1890535.0,2015,02/10/2018 03:50:01 PM,41.855175,-87.660009,"(41.855174837, -87.660009037)"
7100664,9999949,HY189799,03/18/2015 06:21:00 PM,069XX S INDIANA AVE,143A,WEAPONS VIOLATION,UNLAWFUL POSS OF HANDGUN,ABANDONED BUILDING,True,False,322,3.0,6.0,69.0,15,1178789.0,1859179.0,2015,02/10/2018 03:50:01 PM,41.768885,-87.6202,"(41.768885085, -87.620200454)"
7100665,9999950,HY189920,03/18/2015 08:32:00 PM,005XX E 115TH ST,1811,NARCOTICS,POSS: CANNABIS 30GMS OR LESS,SIDEWALK,True,False,531,5.0,9.0,50.0,18,1181715.0,1828799.0,2015,02/10/2018 03:50:01 PM,41.685452,-87.61041,"(41.685451645, -87.610410484)"
7100666,9999951,HY189917,03/18/2015 07:35:00 PM,070XX S SOUTH CHICAGO AVE,5002,OTHER OFFENSE,OTHER VEHICLE OFFENSE,APARTMENT,True,False,322,3.0,6.0,69.0,26,1182407.0,1858471.0,2015,02/10/2018 03:50:01 PM,41.766859,-87.606961,"(41.766859179, -87.606960677)"
7100667,9999952,HY189880,03/18/2015 07:45:00 PM,001XX N STATE ST,0860,THEFT,RETAIL THEFT,DEPARTMENT STORE,True,False,111,1.0,42.0,32.0,06,1176338.0,1901346.0,2015,02/10/2018 03:50:01 PM,41.88465,-87.627915,"(41.884650262, -87.627915459)"


In [336]:
new_crime_data.shape

(1361345, 22)

We have still over 1 million rows left in this dataframe and a wealth of information to dig into. For the purpose of this project, I will concentrate on the number of incidents only and do not take the type of crime into account.

In [337]:
# group crimes by neighborhoods and apply count() function 

df_neigh_counts = new_crime_data['Community Area'].value_counts().to_frame()
df_neigh_counts.reset_index(drop=False, inplace=True)
df_neigh_counts.rename(columns={'index':'Community Area', 'Community Area':'Count'}, inplace=True)
df_neigh_counts

Unnamed: 0,Community Area,Count
0,25.0,81565
1,8.0,58702
2,32.0,49484
3,28.0,46213
4,29.0,45586
5,43.0,44720
6,23.0,41476
7,71.0,39096
8,24.0,38733
9,67.0,35923


After having count the total numbers of crimes over the time period 2015 - March 2020, I will calculate the average crime count per year as a rough indication for how dangerous a neighborhood was in the past. 

In [338]:
# count per year - divided by 4.25 (four full years and a quarter of 2020)
df_neigh_counts['Count per Year'] = df_neigh_counts['Count'] / 4.25 
df_neigh_counts

Unnamed: 0,Community Area,Count,Count per Year
0,25.0,81565,19191.764706
1,8.0,58702,13812.235294
2,32.0,49484,11643.294118
3,28.0,46213,10873.647059
4,29.0,45586,10726.117647
5,43.0,44720,10522.352941
6,23.0,41476,9759.058824
7,71.0,39096,9199.058824
8,24.0,38733,9113.647059
9,67.0,35923,8452.470588


In [339]:
# rename column so the match between dataframes
df_neigh_counts.rename(columns={'Community Area': 'Community Area Number'}, inplace=True)
df_neigh_counts

Unnamed: 0,Community Area Number,Count,Count per Year
0,25.0,81565,19191.764706
1,8.0,58702,13812.235294
2,32.0,49484,11643.294118
3,28.0,46213,10873.647059
4,29.0,45586,10726.117647
5,43.0,44720,10522.352941
6,23.0,41476,9759.058824
7,71.0,39096,9199.058824
8,24.0,38733,9113.647059
9,67.0,35923,8452.470588


In [346]:
# merge crimes per year to top10 list
df_top5 = df_top10.merge(df_neigh_counts, on='Community Area Number')
df_top5

Unnamed: 0,Neighborhood,Community Area Number,Hardship Index,Count,Count per Year
0,Forest Glen,12.0,11.0,2584,608.0
1,Morgan Park,75.0,30.0,10459,2460.941176
2,Garfield Ridge,56.0,32.0,10015,2356.470588
3,Rogers Park,1.0,39.0,19768,4651.294118
4,Avalon Park,45.0,41.0,6670,1569.411765
5,West Lawn,65.0,56.0,9817,2309.882353
6,McKinley Park,59.0,61.0,5032,1184.0
7,West Pullman,53.0,62.0,20513,4826.588235
8,Archer Heights,57.0,67.0,4470,1051.764706
9,West Elsdon,62.0,69.0,5223,1228.941176


In [347]:
# sort in ascending order of count per year
df_top5 = df_top5.sort_values(by='Count per Year', ascending=True).reset_index(drop=True)
df_top5

Unnamed: 0,Neighborhood,Community Area Number,Hardship Index,Count,Count per Year
0,Forest Glen,12.0,11.0,2584,608.0
1,Archer Heights,57.0,67.0,4470,1051.764706
2,McKinley Park,59.0,61.0,5032,1184.0
3,West Elsdon,62.0,69.0,5223,1228.941176
4,Avalon Park,45.0,41.0,6670,1569.411765
5,West Lawn,65.0,56.0,9817,2309.882353
6,Garfield Ridge,56.0,32.0,10015,2356.470588
7,Morgan Park,75.0,30.0,10459,2460.941176
8,Rogers Park,1.0,39.0,19768,4651.294118
9,West Pullman,53.0,62.0,20513,4826.588235


Again, Forest Glen has the best score in relation to the crime statistics which should come at no surprise. Interestingly, however, is that neighborhoods with a high Hardship Index seem to have relatively low crime rates.  

In [348]:
# cut the list of the neighborhoods down to 5
df_top5 = df_top5.head(5)
df_top5

Unnamed: 0,Neighborhood,Community Area Number,Hardship Index,Count,Count per Year
0,Forest Glen,12.0,11.0,2584,608.0
1,Archer Heights,57.0,67.0,4470,1051.764706
2,McKinley Park,59.0,61.0,5032,1184.0
3,West Elsdon,62.0,69.0,5223,1228.941176
4,Avalon Park,45.0,41.0,6670,1569.411765


#### 9. Last step is to check housing prices for the top 5 list and decide based on financial resources available.

In [349]:
# load the housing prices from file
housing_data = pd.read_csv('Neighborhood_Zhvi_AllHomes.csv')
housing_data.head()

Unnamed: 0,RegionID,RegionName,City,State,Metro,CountyName,SizeRank,1996-04,1996-05,1996-06,1996-07,1996-08,1996-09,1996-10,1996-11,1996-12,1997-01,1997-02,1997-03,1997-04,1997-05,1997-06,1997-07,1997-08,1997-09,1997-10,1997-11,1997-12,1998-01,1998-02,1998-03,1998-04,1998-05,1998-06,1998-07,1998-08,1998-09,1998-10,1998-11,1998-12,1999-01,1999-02,1999-03,1999-04,1999-05,1999-06,1999-07,1999-08,1999-09,1999-10,1999-11,1999-12,2000-01,2000-02,2000-03,2000-04,2000-05,2000-06,2000-07,2000-08,2000-09,2000-10,2000-11,2000-12,2001-01,2001-02,2001-03,2001-04,2001-05,2001-06,2001-07,2001-08,2001-09,2001-10,2001-11,2001-12,2002-01,2002-02,2002-03,2002-04,2002-05,2002-06,2002-07,2002-08,2002-09,2002-10,2002-11,2002-12,2003-01,2003-02,2003-03,2003-04,2003-05,2003-06,2003-07,2003-08,2003-09,2003-10,2003-11,2003-12,2004-01,2004-02,2004-03,2004-04,2004-05,2004-06,2004-07,2004-08,2004-09,2004-10,2004-11,2004-12,2005-01,2005-02,2005-03,2005-04,2005-05,2005-06,2005-07,2005-08,2005-09,2005-10,2005-11,2005-12,2006-01,2006-02,2006-03,2006-04,2006-05,2006-06,2006-07,2006-08,2006-09,2006-10,2006-11,2006-12,2007-01,2007-02,2007-03,2007-04,2007-05,2007-06,2007-07,2007-08,2007-09,2007-10,2007-11,2007-12,2008-01,2008-02,2008-03,2008-04,2008-05,2008-06,2008-07,2008-08,2008-09,2008-10,2008-11,2008-12,2009-01,2009-02,2009-03,2009-04,2009-05,2009-06,2009-07,2009-08,2009-09,2009-10,2009-11,2009-12,2010-01,2010-02,2010-03,2010-04,2010-05,2010-06,2010-07,2010-08,2010-09,2010-10,2010-11,2010-12,2011-01,2011-02,2011-03,2011-04,2011-05,2011-06,2011-07,2011-08,2011-09,2011-10,2011-11,2011-12,2012-01,2012-02,2012-03,2012-04,2012-05,2012-06,2012-07,2012-08,2012-09,2012-10,2012-11,2012-12,2013-01,2013-02,2013-03,2013-04,2013-05,2013-06,2013-07,2013-08,2013-09,2013-10,2013-11,2013-12,2014-01,2014-02,2014-03,2014-04,2014-05,2014-06,2014-07,2014-08,2014-09,2014-10,2014-11,2014-12,2015-01,2015-02,2015-03,2015-04,2015-05,2015-06,2015-07,2015-08,2015-09,2015-10,2015-11,2015-12,2016-01,2016-02,2016-03,2016-04,2016-05,2016-06,2016-07,2016-08,2016-09,2016-10,2016-11,2016-12,2017-01,2017-02,2017-03,2017-04,2017-05,2017-06,2017-07,2017-08,2017-09,2017-10,2017-11,2017-12,2018-01,2018-02,2018-03,2018-04,2018-05,2018-06,2018-07,2018-08,2018-09,2018-10,2018-11,2018-12,2019-01,2019-02,2019-03,2019-04,2019-05,2019-06,2019-07,2019-08,2019-09,2019-10,2019-11,2019-12,2020-01,2020-02
0,274772,Northeast Dallas,Dallas,TX,Dallas-Fort Worth-Arlington,Dallas County,1,135166.0,135884.0,136017.0,135645.0,135449.0,135255.0,135714.0,135654.0,135743.0,135809.0,136179.0,135807.0,135537.0,135153.0,135666.0,136707.0,137646.0,138308.0,138409.0,139021.0,139684.0,139676.0,139397.0,139228.0,139367.0,139527.0,139401.0,139564.0,139515.0,140066.0,139883.0,140039.0,139668.0,140691.0,141986.0,143601.0,144744.0,145735.0,146837.0,147702.0,148431.0,148699.0,149458.0,150030.0,150848.0,151127.0,151654.0,151851.0,152110.0,152178.0,152596.0,153192.0,153897.0,154611.0,155194.0,155873.0,156577.0,156985.0,156368.0,156150.0,156087.0,156600.0,156286.0,156074.0,156130.0,156388.0,156668.0,156854.0,157165.0,157486.0,158153.0,158493.0,159028.0,159468.0,160407.0,161246.0,161868.0,162400.0,162747.0,163087.0,163506.0,163416.0,163240.0,162737.0,162693.0,162982.0,163582.0,163972.0,164438.0,164989.0,165801.0,166353.0,166732.0,167864.0,170182.0,173040.0,174947.0,175334.0,174967.0,175140.0,175522.0,175776.0,176051.0,176829.0,177697.0,178961.0,178637.0,178449.0,177901.0,178523.0,179655.0,180427.0,181114.0,181665.0,181585.0,181748.0,181397.0,181492.0,182409.0,183059.0,183774.0,184033.0,184308.0,184246.0,183965.0,184057.0,185305.0,186020.0,186898.0,186994.0,186930.0,186965.0,187386.0,188019.0,188290.0,188413.0,188886.0,189538.0,189817.0,190639.0,191260.0,191494.0,190200.0,189037.0,187911.0,187356.0,187120.0,187783.0,188459.0,188261.0,187322.0,184755.0,183288.0,182201.0,183520.0,184454.0,185519.0,186260.0,186544.0,186626.0,186208.0,186707.0,187304.0,189194.0,189780.0,189900.0,188955.0,188295.0,188263.0,188029.0,188127.0,187263.0,186296.0,184573.0,183547.0,182486.0,182596.0,182981.0,182964.0,182371.0,181007.0,179800.0,178592.0,177781.0,177644.0,177790.0,177680.0,177213.0,176182.0,175665.0,176332.0,177676.0,178474.0,178741.0,178988.0,180439.0,181541.0,182600.0,183119.0,184142.0,185270.0,186642.0,187203.0,187826.0,189123.0,190912.0,192694.0,194455.0,196290.0,198382.0,200112.0,201679.0,203256.0,204469.0,206090.0,207405.0,208296.0,209238.0,210584.0,210021.0,210731.0,211472.0,214316.0,216318.0,218154.0,219864.0,221358.0,223420.0,226353.0,229289.0,232012.0,235455.0,237693.0,239685.0,240970.0,243378.0,246293.0,247692.0,248688.0,249204.0,251187.0,253777.0,256769.0,260384.0,263119.0,266090.0,268618.0,271602.0,273559.0,275041.0,276926.0,279027.0,281382.0,283451.0,284911.0,285775.0,285581.0,286515.0,288037.0,290293.0,292695.0,296167,300016,303667,306043,308544,310974,313489,317107,317771,319441,319660,322196,323484,324546,325657,326509,327264,328100,328739,329041,330673,331039,331707,330452,329505,328163
1,112345,Maryvale,Phoenix,AZ,Phoenix-Mesa-Scottsdale,Maricopa County,2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,93142.0,91143.0,89169.0,85621.0,82646.0,80302.0,78499.0,76805.0,75814.0,74971.0,74782.0,74601.0,74940.0,75261.0,75518.0,75513.0,75155.0,74784.0,73739.0,72864.0,71219.0,70035.0,68386.0,66910.0,64900.0,63244.0,62180.0,61271.0,60624.0,59802.0,59648.0,59656.0,59828.0,59641.0,59580.0,60116.0,61015.0,62100.0,63180.0,64856.0,66937.0,68712.0,70307.0,71954.0,74286.0,76799.0,79316.0,80864.0,82379.0,83596.0,85064.0,86362.0,87436.0,89557.0,92070.0,94389.0,96073.0,97611.0,98857.0,100727.0,102504.0,104510.0,105336.0,105837.0,106513.0,107221.0,107647.0,108152.0,108449.0,109097.0,110002.0,111114.0,112037.0,113289.0,115817.0,118153.0,120192.0,121405.0,123262.0,125066.0,127203.0,128755.0,130331.0,131769.0,133768.0,135118.0,135764.0,136786.0,138007.0,139611.0,140260.0,141336.0,142414.0,143990.0,145301.0,146310.0,146970.0,147915.0,149134.0,150516.0,151930.0,153428.0,155307.0,156464.0,157695.0,158878.0,160498.0,161964.0,163502,165164,166751,168103,169533,170751,171732,172551,173723,175089,176408,177554,178156,178848,179811,180939,181786,182690,183800,185357,186598,187631,188551,189799,191546,193150
2,192689,Paradise,Las Vegas,NV,Las Vegas-Henderson-Paradise,Clark County,3,140161.0,140431.0,140572.0,140837.0,141260.0,141735.0,142145.0,142316.0,142629.0,142901.0,143374.0,143602.0,144100.0,144378.0,144656.0,144747.0,144664.0,144748.0,144949.0,145489.0,145953.0,146638.0,147260.0,147925.0,148188.0,148379.0,148853.0,149190.0,149580.0,149623.0,149820.0,149993.0,150059.0,149890.0,149718.0,149749.0,150031.0,150491.0,151105.0,152291.0,153278.0,154246.0,154823.0,155549.0,156202.0,156869.0,157378.0,157962.0,158666.0,159392.0,159991.0,160303.0,160724.0,161314.0,162063.0,162884.0,163649.0,164505.0,165577.0,166622.0,167540.0,168418.0,169362.0,170370.0,171461.0,172533.0,173531.0,174215.0,175050.0,175849.0,176606.0,177255.0,178084.0,178806.0,179425.0,179925.0,180568.0,181281.0,182457.0,183806.0,185292.0,186692.0,188121.0,189131.0,190010.0,190955.0,192140.0,193563.0,195446.0,198265.0,201645.0,205607.0,209237.0,213364.0,216866.0,222025.0,226357.0,233047.0,239996.0,248087.0,254388.0,259330.0,262925.0,266453.0,270890.0,275015.0,281111.0,285393.0,291842.0,294230.0,296881.0,297712.0,300641.0,303169.0,305781.0,308011.0,309490.0,311359.0,311721.0,313371.0,314332.0,316257.0,316877.0,317376.0,316683.0,316627.0,315892.0,315432.0,314799.0,314241.0,313845.0,312518.0,309317.0,306017.0,303167.0,300306.0,297166.0,293326.0,289453.0,283009.0,275609.0,266926.0,258080.0,249552.0,242976.0,237835.0,231935.0,226213.0,219939.0,213232.0,206112.0,200111.0,195400.0,190220.0,185281.0,180146.0,174823.0,168283.0,161791.0,155445.0,150480.0,146677.0,144061.0,141719.0,139354.0,138324.0,137510.0,136679.0,135429.0,134101.0,133799.0,133245.0,132806.0,131514.0,130242.0,129305.0,128168.0,126228.0,124064.0,122191.0,120737.0,119913.0,118170.0,117072.0,115281.0,114710.0,114002.0,113320.0,112820.0,112703.0,112630.0,112411.0,112601.0,113014.0,114268.0,115230.0,117032.0,118602.0,120364.0,122299.0,124469.0,127060.0,129997.0,133849.0,137342.0,141229.0,145080.0,149447.0,153242.0,156912.0,159917.0,162683.0,164463.0,166271.0,168053.0,169405.0,170317.0,170358.0,170357.0,170845.0,171257.0,171037.0,171894.0,172934.0,175311.0,176520.0,177721.0,178417.0,179481.0,181088.0,182238.0,182461.0,183317.0,184856.0,186294.0,187003.0,188260.0,190089.0,191882.0,192810.0,194235.0,195447.0,196866.0,198191.0,199337.0,200454.0,200550.0,201335.0,200489.0,200955.0,201468.0,204631.0,207089.0,209649.0,212482.0,216076.0,219148.0,222599.0,226745.0,231362.0,235703.0,238703,241984,244015,246531,248990,252386,255916,259829,262670,264374,264735,266031,267194,268417,268583,269233,269753,269298,268368,267491,267232,267681,268660,269391,270464,271219
3,270958,Upper West Side,New York,NY,New York-Newark-Jersey City,New York County,4,245906.0,246768.0,247673.0,247748.0,247878.0,247972.0,248829.0,249759.0,250912.0,252215.0,254551.0,256782.0,258286.0,260071.0,261957.0,265261.0,268195.0,271712.0,274070.0,277397.0,280839.0,284753.0,287557.0,290351.0,293226.0,295641.0,298423.0,301059.0,304851.0,307884.0,312043.0,314914.0,317937.0,320018.0,322526.0,325199.0,327798.0,330740.0,333673.0,337746.0,342064.0,347639.0,352029.0,355917.0,358641.0,361958.0,366811.0,371884.0,378554.0,385621.0,393432.0,400140.0,406010.0,411138.0,416474.0,423037.0,430007.0,437900.0,445180.0,453345.0,459236.0,463834.0,466855.0,470423.0,473940.0,478331.0,483978.0,490172.0,497360.0,503012.0,508135.0,512123.0,517493.0,522671.0,526501.0,527701.0,529404.0,532587.0,537482.0,542774.0,547566.0,553108.0,559061.0,565048.0,571670.0,580789.0,595148.0,613955.0,631150.0,642134.0,647466.0,652251.0,658104.0,662758.0,668315.0,673135.0,681607.0,687488.0,692894.0,698580.0,708096.0,725074.0,739652.0,750679.0,756664.0,766420.0,777162.0,795641.0,808862.0,822658.0,833201.0,842843.0,854205.0,862373.0,881553.0,900663.0,917850.0,935743.0,948213.0,953958.0,949438.0,952338.0,961439.0,967226.0,962670.0,947963.0,932288.0,922763.0,928955.0,926525.0,925533.0,918867.0,922233.0,925694.0,921276.0,922833.0,919096.0,920558.0,922979.0,924692.0,924841.0,923397.0,925455.0,926719.0,928340.0,916244.0,910409.0,902098.0,912825.0,932411.0,940967.0,945022.0,937981.0,940084.0,934790.0,933535.0,932310.0,936338.0,930004.0,912992.0,888972.0,863921.0,847233.0,838987.0,832080.0,824327.0,817143.0,813658.0,813272.0,814536.0,820852.0,837348.0,850222.0,859881.0,865460.0,865879.0,869395.0,872926.0,882495.0,889728.0,890055.0,889906.0,891243.0,891234.0,893806.0,895201.0,898843.0,904088.0,908383.0,910801.0,910168.0,908586.0,911345.0,915070.0,918409.0,920314.0,920770.0,924378.0,929435.0,931672.0,934356.0,938181.0,944255.0,950718.0,957154.0,963040.0,968487.0,974986.0,985167.0,994888.0,1003234.0,1012520.0,1024212.0,1038057.0,1050175.0,1063291.0,1073726.0,1086624.0,1098276.0,1109348.0,1118064.0,1126099.0,1134356.0,1140892.0,1144709.0,1147599.0,1152773.0,1156827.0,1161746.0,1163602.0,1169836.0,1175041.0,1183134.0,1188202.0,1194851.0,1202355.0,1208343.0,1213481.0,1214516.0,1220841.0,1225297.0,1230949.0,1230428.0,1228430.0,1225769.0,1224043.0,1223065.0,1220843.0,1223699.0,1226270.0,1224053.0,1213099.0,1210904.0,1216204.0,1223703.0,1237045.0,1240633.0,1251178.0,1250918.0,1256483.0,1259505.0,1258335,1279370,1295382,1311449,1303787,1302492,1300615,1313804,1319589,1322167,1307855,1286508,1279195,1266138,1265946,1253789,1252953,1247405,1239607,1225043,1211726,1204926,1207091,1214088,1212560,1205744
4,118208,South Los Angeles,Los Angeles,CA,Los Angeles-Long Beach-Anaheim,Los Angeles County,5,133824.0,134281.0,134569.0,134741.0,134761.0,134855.0,134823.0,134772.0,134620.0,133993.0,133537.0,133180.0,133258.0,133094.0,133177.0,133641.0,134399.0,134998.0,135590.0,136086.0,136852.0,138153.0,139713.0,140942.0,141315.0,141543.0,141608.0,141169.0,140826.0,140771.0,141151.0,141619.0,141891.0,142393.0,142738.0,143293.0,143767.0,144334.0,145245.0,146545.0,147362.0,147871.0,148234.0,148966.0,149685.0,150360.0,150848.0,151408.0,152087.0,153004.0,153683.0,154274.0,154877.0,155735.0,156367.0,157023.0,157570.0,158123.0,158791.0,159504.0,160503.0,161205.0,162214.0,163027.0,164241.0,165165.0,166564.0,167747.0,169298.0,170688.0,172378.0,173989.0,175920.0,177803.0,179941.0,182211.0,184457.0,186789.0,189234.0,191996.0,194630.0,197034.0,199346.0,202125.0,204451.0,207244.0,209532.0,212831.0,216249.0,220018.0,223131.0,227272.0,231720.0,237427.0,242505.0,247830.0,253795.0,260624.0,267455.0,273627.0,280064.0,287456.0,295125.0,301210.0,307539.0,313233.0,321273.0,329021.0,337420.0,344563.0,352713.0,361626.0,370773.0,379955.0,389634.0,398646.0,406500.0,413662.0,419300.0,425128.0,429290.0,435180.0,440188.0,444774.0,448485.0,450056.0,451889.0,454055.0,457350.0,458937.0,459914.0,460519.0,463149.0,463630.0,463655.0,461801.0,459619.0,455792.0,450305.0,444964.0,439645.0,435634.0,429685.0,420658.0,408186.0,394288.0,382070.0,369381.0,355977.0,343198.0,331147.0,319229.0,305518.0,294154.0,285556.0,278581.0,271364.0,265493.0,259461.0,255183.0,252651.0,251843.0,251471.0,251668.0,252494.0,250882.0,247500.0,246340.0,247392.0,250137.0,250095.0,249528.0,247967.0,246898.0,246024.0,244303.0,243523.0,243651.0,244988.0,244141.0,242669.0,239470.0,239420.0,239479.0,239930.0,239255.0,237994.0,238239.0,238199.0,238893.0,238891.0,239496.0,240227.0,241820.0,243208.0,244865.0,246516.0,248692.0,251989.0,254805.0,258599.0,261571.0,266571.0,271494.0,277234.0,281765.0,286644.0,291431.0,297005.0,301691.0,304978.0,307803.0,309618.0,312017.0,312408.0,313302.0,313843.0,315449.0,316308.0,318414.0,319286.0,320607.0,321685.0,324176.0,326936.0,330256.0,334105.0,338106.0,340539.0,343254.0,345698.0,347595.0,349782.0,352190.0,355526.0,358434.0,361799.0,364017.0,366440.0,368169.0,372060.0,375392.0,379387.0,381724.0,385127.0,389188.0,394277.0,397709.0,400823.0,404977.0,408881.0,413124.0,415900.0,419798.0,423605.0,428966.0,433971.0,438533.0,442148.0,447628.0,452869.0,458011,462964,468226,473104,476669,480066,483292,486104,488525,490937,493257,495172,496902,496563,495977,495845,497673,499663,501626,504167,507464,511394,515256,520109,525489,533290


In [350]:
# drop all neighorhoods with NaN
housing_data.dropna(inplace=True)

# clean the table by filtering only for city of Chicago
housing_data = housing_data[housing_data['City'].str.contains('Chicago')]
housing_data.head()

Unnamed: 0,RegionID,RegionName,City,State,Metro,CountyName,SizeRank,1996-04,1996-05,1996-06,1996-07,1996-08,1996-09,1996-10,1996-11,1996-12,1997-01,1997-02,1997-03,1997-04,1997-05,1997-06,1997-07,1997-08,1997-09,1997-10,1997-11,1997-12,1998-01,1998-02,1998-03,1998-04,1998-05,1998-06,1998-07,1998-08,1998-09,1998-10,1998-11,1998-12,1999-01,1999-02,1999-03,1999-04,1999-05,1999-06,1999-07,1999-08,1999-09,1999-10,1999-11,1999-12,2000-01,2000-02,2000-03,2000-04,2000-05,2000-06,2000-07,2000-08,2000-09,2000-10,2000-11,2000-12,2001-01,2001-02,2001-03,2001-04,2001-05,2001-06,2001-07,2001-08,2001-09,2001-10,2001-11,2001-12,2002-01,2002-02,2002-03,2002-04,2002-05,2002-06,2002-07,2002-08,2002-09,2002-10,2002-11,2002-12,2003-01,2003-02,2003-03,2003-04,2003-05,2003-06,2003-07,2003-08,2003-09,2003-10,2003-11,2003-12,2004-01,2004-02,2004-03,2004-04,2004-05,2004-06,2004-07,2004-08,2004-09,2004-10,2004-11,2004-12,2005-01,2005-02,2005-03,2005-04,2005-05,2005-06,2005-07,2005-08,2005-09,2005-10,2005-11,2005-12,2006-01,2006-02,2006-03,2006-04,2006-05,2006-06,2006-07,2006-08,2006-09,2006-10,2006-11,2006-12,2007-01,2007-02,2007-03,2007-04,2007-05,2007-06,2007-07,2007-08,2007-09,2007-10,2007-11,2007-12,2008-01,2008-02,2008-03,2008-04,2008-05,2008-06,2008-07,2008-08,2008-09,2008-10,2008-11,2008-12,2009-01,2009-02,2009-03,2009-04,2009-05,2009-06,2009-07,2009-08,2009-09,2009-10,2009-11,2009-12,2010-01,2010-02,2010-03,2010-04,2010-05,2010-06,2010-07,2010-08,2010-09,2010-10,2010-11,2010-12,2011-01,2011-02,2011-03,2011-04,2011-05,2011-06,2011-07,2011-08,2011-09,2011-10,2011-11,2011-12,2012-01,2012-02,2012-03,2012-04,2012-05,2012-06,2012-07,2012-08,2012-09,2012-10,2012-11,2012-12,2013-01,2013-02,2013-03,2013-04,2013-05,2013-06,2013-07,2013-08,2013-09,2013-10,2013-11,2013-12,2014-01,2014-02,2014-03,2014-04,2014-05,2014-06,2014-07,2014-08,2014-09,2014-10,2014-11,2014-12,2015-01,2015-02,2015-03,2015-04,2015-05,2015-06,2015-07,2015-08,2015-09,2015-10,2015-11,2015-12,2016-01,2016-02,2016-03,2016-04,2016-05,2016-06,2016-07,2016-08,2016-09,2016-10,2016-11,2016-12,2017-01,2017-02,2017-03,2017-04,2017-05,2017-06,2017-07,2017-08,2017-09,2017-10,2017-11,2017-12,2018-01,2018-02,2018-03,2018-04,2018-05,2018-06,2018-07,2018-08,2018-09,2018-10,2018-11,2018-12,2019-01,2019-02,2019-03,2019-04,2019-05,2019-06,2019-07,2019-08,2019-09,2019-10,2019-11,2019-12,2020-01,2020-02
51,269592,Logan Square,Chicago,IL,Chicago-Naperville-Elgin,Cook County,52,141259.0,141514.0,141951.0,142616.0,143719.0,144590.0,144948.0,145248.0,145602.0,146852.0,148034.0,148809.0,148508.0,149142.0,149668.0,150311.0,149586.0,148804.0,147830.0,147206.0,146804.0,145950.0,145373.0,145459.0,147118.0,148435.0,149327.0,149489.0,151244.0,154021.0,158735.0,162962.0,166705.0,168883.0,170759.0,172089.0,173533.0,174824.0,177015.0,179488.0,181100.0,181882.0,181938.0,182710.0,184024.0,186622.0,189531.0,192911.0,195622.0,198459.0,201137.0,204388.0,207591.0,210796.0,213573.0,217096.0,220499.0,223620.0,225776.0,228042.0,230517.0,233462.0,236148.0,238481.0,241334.0,244070.0,247162.0,249175.0,250588.0,252683.0,255664.0,259392.0,261860.0,263747.0,265291.0,267561.0,269573.0,272341.0,274866.0,277232.0,280124.0,282144.0,284495.0,285419.0,287897.0,290200.0,293076.0,295378.0,297928.0,300033.0,302203.0,304487.0,306605.0,308871.0,310785.0,313434.0,316269.0,319815.0,323557.0,326753.0,329222.0,331184.0,333673.0,336721.0,339999.0,343452.0,346993.0,350453.0,353423.0,356265.0,358891.0,362157.0,366085.0,371253.0,375913.0,380111.0,383577.0,386746.0,389576.0,391908.0,394204.0,395701.0,397477.0,398650.0,400247.0,400661.0,400967.0,400840.0,401642.0,402129.0,403795.0,403158.0,404401.0,402710.0,402464.0,399738.0,399013.0,397286.0,397493.0,397301.0,397140.0,397551.0,393603.0,393778.0,386899.0,386949.0,381029.0,381260.0,375537.0,373937.0,368546.0,365050.0,357718.0,350768.0,345890.0,341570.0,339993.0,335815.0,333827.0,331438.0,330934.0,327871.0,325756.0,322386.0,323523.0,323422.0,325654.0,322474.0,322440.0,319275.0,319899.0,314587.0,312254.0,307993.0,305452.0,302102.0,299525.0,297360.0,293991.0,292163.0,289552.0,288551.0,285522.0,286582.0,284787.0,285203.0,283306.0,284097.0,281164.0,277966.0,273478.0,273504.0,273340.0,274133.0,273299.0,271003.0,270844.0,270935.0,275034.0,277832.0,281506.0,286270.0,291405.0,294626.0,296833.0,298800.0,302967.0,309193.0,314967.0,318726.0,318293.0,317027.0,316916.0,316795.0,317803.0,318238.0,319394.0,322042.0,324785.0,325992.0,325645.0,327366.0,330160.0,333113.0,335703.0,338581.0,340563.0,341937.0,343584.0,344553.0,343919.0,343632.0,343993.0,344396.0,344965.0,345876.0,348576.0,350847.0,355149.0,358358.0,361862.0,363832.0,366972.0,370084.0,373482.0,375577.0,378407.0,381458.0,381514.0,380890.0,379355.0,381140.0,382233.0,383386.0,384120.0,384682.0,385054.0,385444.0,385321.0,385144.0,386867.0,390942,394214,394610,393703,393775,394185,393824,393721,394533,395907,397394,398907,398271,396293,394370,393922,394111,393995,394546,395005,395171,395175,395242,395091,395760,398122
103,403169,West Rogers Park,Chicago,IL,Chicago-Naperville-Elgin,Cook County,104,134485.0,134536.0,134817.0,134843.0,135215.0,135967.0,136911.0,138087.0,139057.0,139868.0,140404.0,141007.0,141110.0,141897.0,142254.0,142839.0,142065.0,140488.0,138303.0,136224.0,134878.0,134318.0,134250.0,134673.0,134975.0,135044.0,135032.0,135372.0,137427.0,140308.0,144023.0,147079.0,149531.0,150825.0,151912.0,152728.0,154084.0,155426.0,157771.0,159887.0,161266.0,161932.0,162382.0,163618.0,165163.0,167815.0,170331.0,172869.0,174882.0,177141.0,179011.0,181257.0,183438.0,185864.0,188412.0,190964.0,193486.0,195464.0,197325.0,199354.0,201677.0,204099.0,206549.0,208786.0,211240.0,213449.0,215557.0,217521.0,219412.0,221326.0,223272.0,225544.0,227446.0,229339.0,230770.0,232637.0,234712.0,237619.0,240374.0,242671.0,244700.0,246615.0,249025.0,250985.0,252915.0,254194.0,255334.0,256451.0,257529.0,258393.0,259600.0,261068.0,262822.0,264595.0,266285.0,268240.0,270665.0,273249.0,276320.0,278506.0,280579.0,282328.0,284373.0,286590.0,288826.0,291527.0,294175.0,296525.0,298651.0,301518.0,304738.0,309045.0,313307.0,317543.0,320647.0,323276.0,325749.0,327426.0,329071.0,330700.0,333281.0,334759.0,335843.0,335745.0,335878.0,336018.0,336472.0,337153.0,338403.0,340215.0,342629.0,342471.0,341803.0,339303.0,339197.0,337704.0,337358.0,335133.0,334974.0,334514.0,333871.0,333370.0,329041.0,328751.0,323377.0,323462.0,317212.0,315792.0,310055.0,308742.0,303285.0,299196.0,292213.0,285316.0,280599.0,275973.0,274148.0,270224.0,267913.0,265177.0,263735.0,260585.0,258853.0,256726.0,258048.0,258204.0,258899.0,254723.0,253558.0,249749.0,250374.0,246102.0,244497.0,241274.0,239721.0,236323.0,232029.0,227595.0,223500.0,221762.0,218856.0,217937.0,214857.0,214783.0,213268.0,212170.0,209428.0,208599.0,207275.0,207404.0,206919.0,209712.0,211278.0,212185.0,211150.0,210077.0,209664.0,210432.0,211958.0,213647.0,215289.0,217439.0,219054.0,218298.0,218189.0,218967.0,222441.0,226044.0,229463.0,231895.0,234941.0,236251.0,238262.0,238882.0,240989.0,242410.0,244145.0,246645.0,249128.0,250658.0,251352.0,252721.0,253879.0,255791.0,257211.0,258617.0,259142.0,260755.0,263108.0,264621.0,264923.0,265528.0,267030.0,268016.0,268378.0,268888.0,270449.0,272608.0,275897.0,278140.0,279373.0,279809.0,281001.0,283040.0,284420.0,285731.0,288133.0,290842.0,292644.0,293639.0,294003.0,295566.0,296542.0,297747.0,297327.0,296834.0,296556.0,297564.0,297145.0,296577.0,296236.0,297514,298104,297879,297890,298552,299755,300472,301021,301055,301040,301631,302076,301720,300873,300124,299731,299291,299081,299126,299365,299501,300225,300611,301255,302025,303728
150,269566,Albany Park,Chicago,IL,Chicago-Naperville-Elgin,Cook County,151,125085.0,125047.0,124995.0,124285.0,124306.0,124571.0,125658.0,126910.0,128310.0,129572.0,130721.0,131425.0,131362.0,131904.0,132600.0,133975.0,134308.0,133644.0,131797.0,129868.0,128535.0,127623.0,127074.0,127245.0,127583.0,128696.0,129467.0,130890.0,132389.0,135400.0,138547.0,141602.0,143377.0,145038.0,146656.0,148034.0,149534.0,150244.0,151834.0,153148.0,154754.0,155438.0,156303.0,157540.0,159572.0,161925.0,164140.0,166452.0,169049.0,171858.0,174685.0,177241.0,179802.0,182687.0,186558.0,190873.0,195205.0,198620.0,201661.0,204290.0,206639.0,209298.0,211679.0,214566.0,217130.0,219058.0,219987.0,220509.0,220881.0,221563.0,223004.0,225656.0,227697.0,229597.0,231111.0,233366.0,235965.0,238972.0,241840.0,244139.0,246795.0,249861.0,252359.0,253313.0,254560.0,255709.0,257580.0,258330.0,259098.0,259740.0,260885.0,262531.0,263713.0,265072.0,266566.0,269352.0,273029.0,276566.0,279573.0,281982.0,284368.0,286905.0,289541.0,292310.0,295733.0,298907.0,302316.0,304566.0,306358.0,308894.0,312003.0,316003.0,319916.0,324268.0,328049.0,331757.0,334593.0,336921.0,338610.0,340430.0,343086.0,344064.0,344723.0,344708.0,345189.0,344688.0,344169.0,343228.0,343077.0,343146.0,344553.0,344809.0,344834.0,343391.0,342586.0,339753.0,338746.0,337927.0,339398.0,339966.0,339759.0,339391.0,335377.0,334387.0,328262.0,327574.0,322192.0,322354.0,317366.0,315119.0,308794.0,304900.0,298679.0,292935.0,287977.0,283222.0,280777.0,276531.0,274171.0,271913.0,270575.0,267470.0,264760.0,261573.0,261656.0,261095.0,262467.0,259115.0,258725.0,255637.0,256192.0,251330.0,249869.0,246647.0,245890.0,242597.0,239647.0,237315.0,235469.0,235112.0,232307.0,230467.0,226945.0,226439.0,223820.0,222471.0,219674.0,219736.0,217971.0,216073.0,212284.0,211864.0,212322.0,213554.0,214139.0,214466.0,215862.0,216912.0,218366.0,219954.0,222284.0,225533.0,229336.0,231329.0,233751.0,235588.0,238133.0,240873.0,243885.0,247221.0,251669.0,254132.0,256532.0,257399.0,259280.0,261109.0,262681.0,265931.0,268841.0,270518.0,270685.0,271986.0,272625.0,273149.0,273337.0,273948.0,274364.0,275029.0,276003.0,276690.0,277338.0,279497.0,281486.0,282373.0,282058.0,282785.0,285600.0,288856.0,292664.0,294667.0,296817.0,298431.0,300315.0,301718.0,303510.0,305258.0,308131.0,310926.0,311098.0,311153.0,310587.0,313251.0,315219.0,316580.0,317407.0,318396.0,319996.0,321618.0,323097.0,324243.0,326049.0,328423,330638,331157,331156,330187,329141,327208,326075,325785,325353,324923,323142,320938,319396,318869,318624,318815,319106,319850,319485,318380,317131,315935,315796,315664,315109
155,269609,Uptown,Chicago,IL,Chicago-Naperville-Elgin,Cook County,156,117334.0,117428.0,117918.0,118280.0,119152.0,120073.0,121043.0,122065.0,122938.0,123857.0,124751.0,125863.0,126282.0,127107.0,127161.0,127341.0,126442.0,125661.0,124896.0,124558.0,124799.0,125487.0,126419.0,127633.0,129120.0,130466.0,132066.0,133930.0,136757.0,139709.0,142868.0,145343.0,147461.0,148715.0,150219.0,151406.0,153039.0,154670.0,157188.0,159966.0,162388.0,164358.0,166264.0,168635.0,171034.0,174014.0,176745.0,179493.0,182017.0,184766.0,187264.0,189348.0,191549.0,194188.0,197208.0,200351.0,203438.0,206399.0,209203.0,212509.0,215635.0,218784.0,221666.0,224735.0,227894.0,230450.0,232820.0,234969.0,236902.0,238535.0,240085.0,241824.0,243287.0,244874.0,245930.0,247253.0,248375.0,250244.0,251603.0,252677.0,253970.0,255684.0,257835.0,259236.0,260509.0,261153.0,262242.0,263319.0,264528.0,265403.0,266665.0,267978.0,269375.0,270744.0,271952.0,273307.0,275000.0,276720.0,278417.0,279834.0,281481.0,283132.0,284967.0,286965.0,288740.0,290119.0,291343.0,292667.0,294383.0,296524.0,298729.0,300724.0,302574.0,304708.0,306458.0,308177.0,309840.0,311331.0,313034.0,314294.0,315803.0,316778.0,317730.0,318147.0,318342.0,318260.0,318388.0,318160.0,318300.0,318826.0,320100.0,319424.0,319065.0,316874.0,316618.0,314870.0,314807.0,313394.0,313142.0,312605.0,312395.0,312673.0,309579.0,310808.0,306290.0,306576.0,300864.0,300623.0,296538.0,295872.0,292551.0,291082.0,286890.0,282086.0,278621.0,274261.0,273614.0,271115.0,271732.0,271585.0,272460.0,271470.0,271149.0,269806.0,271154.0,271406.0,273013.0,270844.0,271062.0,268783.0,268974.0,264323.0,261999.0,258755.0,257327.0,254453.0,251384.0,248833.0,246641.0,246077.0,243644.0,242611.0,239836.0,239868.0,238238.0,237423.0,233754.0,231841.0,228929.0,227417.0,224003.0,222981.0,222055.0,221917.0,220835.0,219407.0,218732.0,218269.0,219237.0,220750.0,222441.0,224483.0,227184.0,228642.0,230477.0,231504.0,234164.0,237193.0,240501.0,242650.0,246307.0,247806.0,250070.0,250552.0,252193.0,253706.0,254238.0,255515.0,256382.0,256676.0,256263.0,257114.0,257311.0,258318.0,258825.0,259746.0,259700.0,259800.0,260758.0,261889.0,262229.0,262815.0,263312.0,263843.0,263749.0,263865.0,264602.0,265436.0,267523.0,268741.0,270019.0,270538.0,271640.0,273308.0,275152.0,276489.0,278292.0,280133.0,281439.0,282352.0,282527.0,284302.0,285761.0,286884.0,287310.0,287468.0,288004.0,288891.0,289006.0,288955.0,289678.0,293097,295973,296662,295505,295058,295503,295637,295572,295984,296800,298383,299007,297625,296016,295300,296360,297262,297666,298254,298744,298807,298880,298508,298629,299106,300138
159,269589,Lake View,Chicago,IL,Chicago-Naperville-Elgin,Cook County,160,219878.0,220418.0,221375.0,221533.0,222385.0,223906.0,225952.0,228001.0,229617.0,231756.0,233679.0,236259.0,236956.0,238443.0,238872.0,240308.0,240341.0,239316.0,237330.0,235693.0,235601.0,235458.0,236170.0,236923.0,238793.0,240177.0,242445.0,245266.0,249220.0,253834.0,259330.0,264570.0,268659.0,271583.0,273950.0,276200.0,278870.0,281770.0,285344.0,289013.0,292278.0,295248.0,298041.0,301292.0,304510.0,308931.0,313559.0,318364.0,322394.0,326580.0,330321.0,333786.0,337205.0,340349.0,343217.0,346139.0,349686.0,353190.0,355993.0,358773.0,361628.0,364740.0,367855.0,370534.0,373172.0,375491.0,378357.0,380994.0,382969.0,384474.0,386340.0,388923.0,390688.0,392118.0,392501.0,393964.0,395770.0,398527.0,400582.0,402113.0,403903.0,405950.0,408271.0,409820.0,411435.0,412175.0,413180.0,414000.0,414838.0,415727.0,416971.0,418554.0,420013.0,421660.0,423189.0,424652.0,426975.0,429791.0,433433.0,436085.0,438959.0,440814.0,443009.0,445253.0,447807.0,449920.0,451826.0,453674.0,455567.0,458130.0,460718.0,463673.0,466117.0,469670.0,472827.0,475913.0,478385.0,480692.0,483521.0,485864.0,487290.0,487237.0,487173.0,487535.0,488170.0,487989.0,488265.0,488821.0,490569.0,492410.0,495048.0,494685.0,491754.0,486078.0,482965.0,480913.0,481793.0,481214.0,482477.0,482547.0,482584.0,483742.0,479735.0,482743.0,486782.0,499713.0,503304.0,506076.0,501326.0,502572.0,498260.0,496908.0,490271.0,483444.0,479161.0,473885.0,470469.0,463343.0,460664.0,459635.0,461407.0,459430.0,459493.0,457210.0,460615.0,461105.0,464543.0,461114.0,461361.0,456753.0,457545.0,451575.0,449009.0,444263.0,442435.0,438710.0,435857.0,433978.0,433270.0,433144.0,430213.0,429463.0,426126.0,428678.0,427311.0,428310.0,423311.0,423298.0,419576.0,417821.0,410998.0,411356.0,411419.0,413655.0,412594.0,410444.0,410922.0,411505.0,415653.0,418709.0,421766.0,426946.0,432659.0,436618.0,440596.0,443956.0,450548.0,456578.0,462731.0,466363.0,472662.0,475411.0,480426.0,481267.0,485034.0,485643.0,484693.0,484505.0,485616.0,486478.0,485660.0,487887.0,489040.0,491474.0,492302.0,493058.0,492420.0,492978.0,496681.0,500230.0,501116.0,501691.0,502530.0,502733.0,501820.0,500995.0,502282.0,504498.0,509564.0,512954.0,514811.0,514701.0,515628.0,518672.0,521124.0,522670.0,524487.0,527618.0,528942.0,529633.0,528578.0,530806.0,532293.0,533632.0,533719.0,533203.0,534217.0,535580.0,535520.0,534408.0,534955.0,539536,542828,542584,541118,540684,541531,540659,540277,540436,541889,543739,544709,542125,539248,537386,537536,537568,536979,537465,536931,536027,534810,533608,532618,532542,533551


In [351]:
# let's take only the current date into account
housing_data = housing_data[['RegionName', '2020-02']].reset_index(drop=True)

# Rename column "RegionName" which is exactly the same as "Neighborhood"
housing_data.rename(columns={'RegionName': 'Neighborhood'}, inplace=True)

housing_data.head()

Unnamed: 0,Neighborhood,2020-02
0,Logan Square,398122
1,West Rogers Park,303728
2,Albany Park,315109
3,Uptown,300138
4,Lake View,533551


In [352]:
# merge housing prices to top3 list and sort in ascending order
df_top3 = df_top5.merge(housing_data, on='Neighborhood')
df_top3

Unnamed: 0,Neighborhood,Community Area Number,Hardship Index,Count,Count per Year,2020-02
0,Forest Glen,12.0,11.0,2584,608.0,363921
1,Archer Heights,57.0,67.0,4470,1051.764706,213038
2,McKinley Park,59.0,61.0,5032,1184.0,235814
3,West Elsdon,62.0,69.0,5223,1228.941176,203498
4,Avalon Park,45.0,41.0,6670,1569.411765,122019


In [353]:
# clean dataframe
df_top3 = df_top3[['Neighborhood', 'Hardship Index', 'Count per Year', '2020-02']].reset_index(drop=True)
# Rename columns
df_top3.rename(columns={'Count per Year': 'Crimes per Year', '2020-02': 'Current avg. Housing Prices'}, inplace=True)
df_top3

Unnamed: 0,Neighborhood,Hardship Index,Crimes per Year,Current avg. Housing Prices
0,Forest Glen,11.0,608.0,363921
1,Archer Heights,67.0,1051.764706,213038
2,McKinley Park,61.0,1184.0,235814
3,West Elsdon,69.0,1228.941176,203498
4,Avalon Park,41.0,1569.411765,122019


## Results and Discussion <a name="results"></a>

Finally, the last step in my analysis shows a high relationship between socioeconomic data, crime data and the average housing prices for Forest Glen. 

__The final decision for a new neighborhood can now be based on the financial resources available. If a budget of more than USD 360k is not a problem, than Forest Glen would be the best fit for someone moving from East Harlem, NY to Chicago.__