# Community Finder 

## Introduction/Business Problem 
This project shows new renters and home buyers where to move so that they live closer to the in-person communities they are interested in. This notebook is a test run with one major US city, Chicago.

Looking for a place to live in a new city can be a hassle if you don't know where to look. This community finder project aims to solve this problem and give everyday Americans an easy way to plan their home search.

## Data Description
We will take into account a number of geographical and socioeconomic factors to figure out where certain communities are more likely to be. Only three categories will be presented to the user in this project: recreation, foodie, and high-end living. Data for each city being considered will be pulled from the Foursquare API.

The data used will include restaurants, recreational activities, housing prices, and high-end shopping centers. A ZIP code's proximity to these factors determines its category, which indicates whether or not it is the right place for what you want to do.
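
To make "proximity" concrete, here is a minimal sketch of a distance check between a ZIP code's coordinates and a venue. The haversine formula and the helper names `haversine_m` and `is_within` are assumptions for illustration, not part of the original notebook. The coordinates are those of ZIP code 60644 and one venue that appears in the Foursquare results later in this notebook.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def is_within(zip_point, venue_point, radius_m=500):
    """True when the venue lies within radius_m meters of the ZIP code's coordinates."""
    return haversine_m(*zip_point, *venue_point) <= radius_m

zip_60644 = (41.881331, -87.75671)          # ZIP code coordinates from the dataset
seafood_junction = (41.880618, -87.757804)  # venue coordinates from the Foursquare results

distance = haversine_m(*zip_60644, *seafood_junction)
print(round(distance), is_within(zip_60644, seafood_junction))
```

The default of 500 meters mirrors the search radius used for the Foursquare requests in this notebook.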

## Methodology
We will use a k-means clustering algorithm to determine 'hot spots' for the three categories in each city. Based on the number of weighted data points in each hot spot, we will rank the hot spots from highest to lowest (up to 5) by how likely each is to be a good fit.

Low, middle, and high income budget categories will be taken into account as an added feature only if time allows. For now, we assume that budget is not a factor.
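
The clustering and ranking steps described above could look roughly like the following sketch. The venue coordinates and weights are made-up toy values, and treating latitude and longitude as plain Euclidean coordinates is a simplifying assumption that is only reasonable at city scale. It is one way to read the methodology, not the notebook's final implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical venue coordinates (lat, lon) for one category, e.g. "foodie".
# They form three well-separated groups; real points would come from the venue data.
points = np.array([
    [41.880, -87.630], [41.881, -87.631], [41.882, -87.629], [41.879, -87.630],
    [41.950, -87.660], [41.951, -87.661],
    [41.760, -87.570], [41.761, -87.571], [41.762, -87.569],
])
# Hypothetical weight per venue (for example, how strongly it signals the category).
weights = np.array([1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 3.0, 2.0, 2.0])

# Group the venues into three hot spots.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(points)

# Sum the weights of the venues assigned to each hot spot.
cluster_weight = np.bincount(kmeans.labels_, weights=weights, minlength=3)

# Rank hot spots from highest to lowest total weight and keep at most five.
ranking = np.argsort(cluster_weight)[::-1][:5]
for rank, cluster in enumerate(ranking, start=1):
    center = kmeans.cluster_centers_[cluster]
    print(rank, center.round(3), cluster_weight[cluster])
```

The number of clusters is fixed at three here only because the toy data has three groups; for real data it would need to be chosen, for example by inspecting the inertia for several values of `n_clusters`.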

In [1]:
# import and install necessary libraries

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#pip install geopy # uncomment this line and run pip install in a separate cell if you haven't already installed geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas.io.json.json_normalize is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#pip install folium # uncomment this line and run pip install in a separate cell if you haven't already installed folium
import folium # map rendering library

print('Libraries imported.')

Libraries imported.



## Chicago

In [2]:
# @hidden_cell
# The following code contains the credentials for a file in your IBM Cloud Object Storage.
# Replace the placeholders with your own values and never share real credentials in a notebook.
credentials_1 = {
    'IAM_SERVICE_ID': '<your-iam-service-id>',
    'IBM_API_KEY_ID': '<your-ibm-api-key>',
    'ENDPOINT': 'https://s3-api.us-geo.objectstorage.service.networklayer.com',
    'IBM_AUTH_ENDPOINT': 'https://iam.cloud.ibm.com/oidc/token',
    'BUCKET': 'chicagosupercarwatcher-donotdelete-pr-fjf4ffbmbg1alx',
    'FILE': 'us-zip-code-latitude-and-longitude.csv'
}
import types
import pandas as pd
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

# The following code accesses the file in your IBM Cloud Object Storage using the credentials above.
client_7346117ecdb04c0d9cc0f6d7395eb026 = ibm_boto3.client(service_name='s3',
    ibm_api_key_id=credentials_1['IBM_API_KEY_ID'],
    ibm_auth_endpoint=credentials_1['IBM_AUTH_ENDPOINT'],
    config=Config(signature_version='oauth'),
    endpoint_url=credentials_1['ENDPOINT'])

body = client_7346117ecdb04c0d9cc0f6d7395eb026.get_object(Bucket=credentials_1['BUCKET'], Key=credentials_1['FILE'])['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

df_chicago = pd.read_csv(body, sep = ';')
df_chicago.head()


Unnamed: 0,Zip,City,State,Latitude,Longitude,Timezone,Daylight savings time flag,geopoint
0,60401,Beecher,IL,41.350484,-87.62408,-6,1,"41.350484,-87.62408"
1,61761,Normal,IL,40.515485,-88.98629,-6,1,"40.515485,-88.98629"
2,60174,Saint Charles,IL,41.919808,-88.30498,-6,1,"41.919808,-88.30498"
3,60304,Oak Park,IL,41.87355,-87.7885,-6,1,"41.87355,-87.7885"
4,62706,Springfield,IL,39.79885,-89.653399,-6,1,"39.79885,-89.653399"


In [3]:
# Filter out every city that is not Chicago
df_chicago = df_chicago[df_chicago['City'].str.match("Chicago")].reset_index(drop = True)
df_chicago = df_chicago.drop([25,44, 77]).reset_index(drop = True)
df_chicago['Zip'] = df_chicago['Zip'].map(str)


df_chicago.head()

Unnamed: 0,Zip,City,State,Latitude,Longitude,Timezone,Daylight savings time flag,geopoint
0,60644,Chicago,IL,41.881331,-87.75671,-6,1,"41.881331,-87.75671"
1,60613,Chicago,IL,41.953256,-87.6629,-6,1,"41.953256,-87.6629"
2,60622,Chicago,IL,41.900332,-87.66927,-6,1,"41.900332,-87.66927"
3,60625,Chicago,IL,41.971614,-87.70256,-6,1,"41.971614,-87.70256"
4,60649,Chicago,IL,41.761734,-87.57072,-6,1,"41.761734,-87.57072"


In [4]:
# create latitude and longitude columns
# split the "lat,lng" string and convert both parts to floats so they stay numeric
df_chicago[['Latitude','Longitude']] = df_chicago.geopoint.str.split(",", expand=True).astype(float)
df_chicago = df_chicago.drop('geopoint', axis = 1)
df_chicago.head()

Unnamed: 0,Zip,City,State,Latitude,Longitude,Timezone,Daylight savings time flag
0,60644,Chicago,IL,41.881331,-87.75671,-6,1
1,60613,Chicago,IL,41.953256,-87.6629,-6,1
2,60622,Chicago,IL,41.900332,-87.66927,-6,1
3,60625,Chicago,IL,41.971614,-87.70256,-6,1
4,60649,Chicago,IL,41.761734,-87.57072,-6,1


In [5]:
df_chicago.shape

(85, 7)

In [6]:
# start looking at incomes by area
df_income = pd.read_html('http://zipatlas.com/us/il/chicago/zip-code-comparison/median-household-income.htm')[11]
df_income = df_income.drop([0, 2, 3, 4, 6], axis=1)
df_income = df_income.rename(columns={1: "Zip", 5: "Average Income"})
df_income = df_income.drop([0])
df_income.head() 

Unnamed: 0,Zip,Average Income
1,60606,"$100,377.00"
2,60601,"$77,374.00"
3,60611,"$69,889.00"
4,60614,"$68,324.00"
5,60603,"$61,815.00"


In [9]:
df_chicago = pd.merge(df_income, df_chicago, on='Zip')
df_chicago.head()

Unnamed: 0,Zip,Average Income,City,State,Latitude,Longitude,Timezone,Daylight savings time flag
0,60606,"$100,377.00",Chicago,IL,41.882582,-87.6376,-6,1
1,60601,"$77,374.00",Chicago,IL,41.886456,-87.62325,-6,1
2,60611,"$69,889.00",Chicago,IL,41.904667,-87.62504,-6,1
3,60614,"$68,324.00",Chicago,IL,41.922682,-87.65432,-6,1
4,60603,"$61,815.00",Chicago,IL,41.880446,-87.63014,-6,1
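
The income values above are strings such as `$100,377.00`, so they cannot be sorted or binned numerically as they stand. A minimal sketch of a conversion helper follows; the name `parse_income` is hypothetical and not part of the original notebook.

```python
def parse_income(text):
    """Convert a currency string such as '$100,377.00' to a float."""
    return float(text.replace("$", "").replace(",", "").strip())

# Sample values copied from the income table above.
incomes = ["$100,377.00", "$77,374.00", "$69,889.00"]
parsed = [parse_income(value) for value in incomes]
print(parsed)
```

On the merged dataframe, the same helper could be applied with `df_chicago['Average Income'].map(parse_income)`.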


In [7]:
# install and import folium, requests
# pip install folium uncomment if not already installed
import folium
import requests
from bs4 import BeautifulSoup

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
from pandas import json_normalize  # transform JSON file into a pandas dataframe (pandas.io.json.json_normalize is deprecated)

# import k-means from clustering stage
from sklearn.cluster import KMeans

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

In [8]:
address = "Chicago, IL"

geolocator = Nominatim(user_agent="chicago_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Chicago are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Chicago are 41.8755616, -87.6244212.


## Create Map of Chicago

In [9]:
# create map of Chicago using latitude and longitude values
map_chicago = folium.Map(location=[latitude, longitude], zoom_start=12)
map_chicago


In [10]:
# add markers to the map 
for lat, lng, City, Zip in zip(df_chicago['Latitude'], df_chicago['Longitude'], df_chicago['City'], df_chicago['Zip']):
    label = '{}, {}'.format(City, Zip)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_chicago)  
    
map_chicago

## Foursquare API

In [11]:
# Define Foursquare credentials and version
# Replace the placeholders with your own values and never share real credentials in a notebook.
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)


Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET


In [12]:
zip_name = df_chicago.loc[0, 'Zip']
print(f"The first Zip Code is '{zip_name}'.")

The first Zip Code is '60644'.


In [13]:
zip_latitude = df_chicago.loc[0, 'Latitude'] # ZIP code latitude value
zip_longitude = df_chicago.loc[0, 'Longitude'] # ZIP code longitude value

print('Latitude and longitude values of {} are {}, {}.'.format(zip_name, 
                                                               zip_latitude, 
                                                               zip_longitude))

Latitude and longitude values of 60644 are 41.881331, -87.75671.


In [14]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    zip_latitude, 
    zip_longitude, 
    radius, 
    LIMIT)

# send the GET request and parse the JSON response into a dictionary
results = requests.get(url).json()
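
Building the query string by hand with `format` works, but it is easy to get the parameter order wrong and values are not URL-encoded. A minimal sketch of an alternative that uses the standard library follows. The helper name `build_explore_url` and the placeholder credentials are assumptions; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Return the explore URL with all query parameters URL-encoded."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",  # latitude and longitude as one comma-separated value
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"

# Placeholder credentials; substitute your own before sending a request.
url = build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20180605", 41.881331, -87.75671)
print(url)
```

The comma inside `ll` is percent-encoded as `%2C`, which servers decode back to a comma as part of normal URL handling.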

In [15]:
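def get_category_type(row):
    # the category list is stored under 'categories' or 'venue.categories', depending on the dataframe
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    # return the name of the first category, or None when the venue has no category
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']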

In [16]:
venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()


Unnamed: 0,name,categories,lat,lng
0,Seafood Junction,Seafood Restaurant,41.880618,-87.757804
1,MacArthur's Restaurant,Southern / Soul Food Restaurant,41.880611,-87.760757
2,U.S. Bank ATM,ATM,41.880268,-87.755462
3,Walgreens,Pharmacy,41.880545,-87.755885
4,Family Dollar,Discount Store,41.879126,-87.755373


In [17]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    venues_list=[]
    
    for name, lat, lng in zip(names, latitudes, longitudes):
        # print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Zip Code', 
                  'Zip Code Latitude', 
                  'Zip Code Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
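
One fragile spot in a loop like this is `v['venue']['categories'][0]`, which raises an `IndexError` for any venue that has no category. The sketch below shows one way to tolerate that case. The helper name `extract_venue` and the second sample item are hypothetical; the dictionary layout follows the fields accessed in `getNearbyVenues`.

```python
def extract_venue(item):
    """Return (name, lat, lng, category) for one item, with None for a missing category."""
    venue = item["venue"]
    categories = venue.get("categories") or []
    category = categories[0]["name"] if categories else None
    return (venue["name"], venue["location"]["lat"], venue["location"]["lng"], category)

sample_items = [
    {"venue": {"name": "Seafood Junction",
               "location": {"lat": 41.880618, "lng": -87.757804},
               "categories": [{"name": "Seafood Restaurant"}]}},
    {"venue": {"name": "Uncategorized Place",  # hypothetical venue without a category
               "location": {"lat": 41.88, "lng": -87.75},
               "categories": []}},
]

rows = [extract_venue(item) for item in sample_items]
for row in rows:
    print(row)
```

Rows with a `None` category can then be dropped or labeled before one-hot encoding.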

In [18]:
chicago_venues = getNearbyVenues(names=df_chicago['Zip'],
                                   latitudes=df_chicago['Latitude'],
                                   longitudes=df_chicago['Longitude']
                                  )

In [19]:
chicago_venues.head()

Unnamed: 0,Zip Code,Zip Code Latitude,Zip Code Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,60644,41.881331,-87.75671,Seafood Junction,41.880618,-87.757804,Seafood Restaurant
1,60644,41.881331,-87.75671,MacArthur's Restaurant,41.880611,-87.760757,Southern / Soul Food Restaurant
2,60644,41.881331,-87.75671,U.S. Bank ATM,41.880268,-87.755462,ATM
3,60644,41.881331,-87.75671,Walgreens,41.880545,-87.755885,Pharmacy
4,60644,41.881331,-87.75671,Family Dollar,41.879126,-87.755373,Discount Store


In [20]:
chicago_venues.groupby('Zip Code').count()

Zip Code,Zip Code Latitude,Zip Code Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
60601,100,100,100,100,100,100
60602,100,100,100,100,100,100
60603,100,100,100,100,100,100
60604,100,100,100,100,100,100
60605,23,23,23,23,23,23
60606,100,100,100,100,100,100
60607,58,58,58,58,58,58
60608,15,15,15,15,15,15
60610,34,34,34,34,34,34
60611,92,92,92,92,92,92


In [21]:
print('There are {} unique venue categories.'.format(len(chicago_venues['Venue Category'].unique())))

There are 258 unique venue categories.


## Analyze the zip codes

In [22]:
# one hot encoding
chicago_onehot = pd.get_dummies(chicago_venues[['Venue Category']], prefix="", prefix_sep="")

# add Zip Code column back to dataframe
chicago_onehot['Zip Code'] = chicago_venues['Zip Code'] 

# move Zip Code column to the first column
fixed_columns = [chicago_onehot.columns[-1]] + list(chicago_onehot.columns[:-1])
chicago_onehot = chicago_onehot[fixed_columns]

chicago_onehot.head()

Unnamed: 0,Zip Code,ATM,Adult Boutique,Airport,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Court,Beach,Beer Bar,Beer Garden,Big Box Store,Bike Rental / Bike Share,Bike Shop,Bike Trail,Bistro,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Boxing Gym,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Station,Bus Stop,Business Service,Cafeteria,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Caribbean Restaurant,Check Cashing Service,Chinese Restaurant,Chocolate Shop,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Gym,College Stadium,Comedy Club,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Creperie,Cuban Restaurant,Currency Exchange,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dog Run,Donut Shop,Dry Cleaner,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Exhibit,Eye Doctor,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Flea Market,Flower Shop,Food,Food Court,Food Truck,Football Stadium,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Halal Restaurant,Harbor / Marina,Health & Beauty Service,Health Food Store,Heliport,Hill,Historic Site,History Museum,Hobby Shop,Hockey Arena,Home Service,Hostel,Hot Dog Joint,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Insurance Office,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Kebab Restaurant,Kids Store,Korean Restaurant,Laundromat,Light Rail Station,Liquor Store,Lounge,Marijuana Dispensary,Market,Martial Arts School,Massage Studio,Mattress Store,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Motel,Movie Theater,Moving Target,Museum,Music Venue,Nail Salon,New American Restaurant,Nightclub,Nightlife Spot,Noodle House,Office,Opera House,Optical Shop,Outdoor Sculpture,Outlet Store,Paper / Office Supplies Store,Park,Parking,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pie Shop,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Portuguese Restaurant,Print Shop,Pub,Public Art,Ramen Restaurant,Record Shop,Rental Car Location,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Repair,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,South American Restaurant,Southern / Soul Food Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Storage Facility,Street Art,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tex-Mex Restaurant,Thai Restaurant,Theater,Thrift / Vintage Store,Tiki Bar,Tour Provider,Toy / Game Store,Track,Trail,Train Station,Transportation Service,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio
0,60644,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,60644,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,60644,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,60644,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,60644,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [23]:
chicago_onehot.shape

(1905, 259)

In [24]:
chicago_grouped = chicago_onehot.groupby('Zip Code').mean().reset_index()
chicago_grouped.head()

Unnamed: 0,Zip Code,ATM,Adult Boutique,Airport,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Court,Beach,Beer Bar,Beer Garden,Big Box Store,Bike Rental / Bike Share,Bike Shop,Bike Trail,Bistro,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Boxing Gym,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Station,Bus Stop,Business Service,Cafeteria,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Caribbean Restaurant,Check Cashing Service,Chinese Restaurant,Chocolate Shop,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Gym,College Stadium,Comedy Club,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Creperie,Cuban Restaurant,Currency Exchange,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dog Run,Donut Shop,Dry Cleaner,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Exhibit,Eye Doctor,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Flea Market,Flower Shop,Food,Food Court,Food Truck,Football Stadium,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Halal Restaurant,Harbor / Marina,Health & Beauty Service,Health Food Store,Heliport,Hill,Historic Site,History Museum,Hobby Shop,Hockey Arena,Home Service,Hostel,Hot Dog Joint,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Insurance Office,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Kebab Restaurant,Kids Store,Korean Restaurant,Laundromat,Light Rail Station,Liquor Store,Lounge,Marijuana Dispensary,Market,Martial Arts School,Massage Studio,Mattress Store,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Motel,Movie Theater,Moving Target,Museum,Music Venue,Nail Salon,New American Restaurant,Nightclub,Nightlife Spot,Noodle House,Office,Opera House,Optical Shop,Outdoor Sculpture,Outlet Store,Paper / Office Supplies Store,Park,Parking,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pie Shop,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Portuguese Restaurant,Print Shop,Pub,Public Art,Ramen Restaurant,Record Shop,Rental Car Location,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Repair,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,South American Restaurant,Southern / Soul Food Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Storage Facility,Street Art,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tex-Mex Restaurant,Thai Restaurant,Theater,Thrift / Vintage Store,Tiki Bar,Tour Provider,Toy / Game Store,Track,Trail,Train Station,Transportation Service,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio
0,60601,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.14,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.04,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.02,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0
1,60602,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.01,0.04,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.02,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.03,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.01,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.06,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,60603,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.01,0.07,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.05,0.0,0.0,0.0,0.01,0.0,0.01,0.02,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.05,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0
3,60604,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.02,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.07,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.02,0.01,0.01,0.0,0.01,0.02,0.0,0.0,0.0,0.01,0.01,0.01,0.04,0.0,0.0,0.01,0.01,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,60605,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.130435,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.043478,0.043478,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.130435,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [25]:
chicago_grouped.shape

(84, 259)
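
Taking the mean of one-hot columns per ZIP code gives the share of that ZIP code's venues that fall in each category, which is why each row of `chicago_grouped` sums to 1. The following toy example illustrates this on a handful of made-up rows; the data is hypothetical and only the column names follow the notebook.

```python
import pandas as pd

# Hypothetical venues: four in 60601 and two in 60605.
venues = pd.DataFrame({
    "Zip Code": ["60601", "60601", "60601", "60601", "60605", "60605"],
    "Venue Category": ["Hotel", "Hotel", "Coffee Shop", "Park", "Bar", "Bakery"],
})

# One column per category, with 1.0 marking the category of each venue.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Zip Code", venues["Zip Code"])

# The per-ZIP mean of each column is the category's share of venues in that ZIP code.
grouped = onehot.groupby("Zip Code").mean().reset_index()
print(grouped)
```

Two of the four venues in 60601 are hotels, so the `Hotel` value for that ZIP code is 0.5.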

In [31]:

num_top_venues = 9

for hood in chicago_grouped['Zip Code']:
    print("----"+hood+"----")
    temp = chicago_grouped[chicago_grouped['Zip Code'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----60601----
                 venue  freq
0                Hotel  0.14
1          Coffee Shop  0.06
2   Seafood Restaurant  0.06
3                Plaza  0.04
4  American Restaurant  0.03
5           Steakhouse  0.03
6       Breakfast Spot  0.02
7          Pizza Place  0.02
8                 Park  0.02


----60602----
                 venue  freq
0                Hotel  0.08
1              Theater  0.06
2   Italian Restaurant  0.04
3          Coffee Shop  0.04
4  American Restaurant  0.03
5               Bakery  0.03
6       Sandwich Place  0.03
7          Snack Place  0.03
8       Clothing Store  0.02


----60603----
                           venue  freq
0                          Hotel  0.07
1                    Coffee Shop  0.07
2                        Theater  0.05
3                 Sandwich Place  0.05
4                     Donut Shop  0.03
5                    Snack Place  0.03
6             Italian Restaurant  0.03
7  Vegetarian / Vegan Restaurant  0.03
8                 Cloth

                     venue  freq
0              Coffee Shop   0.2
1                      Gym   0.2
2               Donut Shop   0.2
3                     Park   0.2
4                     Bank   0.2
5  New American Restaurant   0.0
6                   Office   0.0
7             Noodle House   0.0
8           Nightlife Spot   0.0


----60632----
                    venue  freq
0      Seafood Restaurant  0.31
1        Storage Facility  0.08
2      Italian Restaurant  0.08
3               Bookstore  0.08
4                     Bar  0.08
5               BBQ Joint  0.08
6  Transportation Service  0.08
7              Donut Shop  0.08
8           Grocery Store  0.08


----60633----
                     venue  freq
0         Greek Restaurant  0.25
1                   Lounge  0.25
2                     Park  0.25
3           Discount Store  0.25
4             Noodle House  0.00
5           Nightlife Spot  0.00
6                Nightclub  0.00
7  New American Restaurant  0.00
8               Nail 

                    venue  freq
0      Mexican Restaurant  0.09
1              Taco Place  0.09
2             Pizza Place  0.09
3       Convenience Store  0.09
4                    Food  0.09
5                Pharmacy  0.09
6                  Lounge  0.09
7     Rental Car Location  0.09
8  Thrift / Vintage Store  0.09


----60674----
                    venue  freq
0      Mexican Restaurant  0.09
1              Taco Place  0.09
2             Pizza Place  0.09
3       Convenience Store  0.09
4                    Food  0.09
5                Pharmacy  0.09
6                  Lounge  0.09
7     Rental Car Location  0.09
8  Thrift / Vintage Store  0.09


----60675----
                    venue  freq
0      Mexican Restaurant  0.09
1              Taco Place  0.09
2             Pizza Place  0.09
3       Convenience Store  0.09
4                    Food  0.09
5                Pharmacy  0.09
6                  Lounge  0.09
7     Rental Car Location  0.09
8  Thrift / Vintage Store  0.09


----60

In [32]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [33]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Zip Code']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no suffix listed for this position, fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
zip_venues_sorted = pd.DataFrame(columns=columns)
zip_venues_sorted['Zip Code'] = chicago_grouped['Zip Code']

for ind in np.arange(chicago_grouped.shape[0]):
    zip_venues_sorted.iloc[ind, 1:] = return_most_common_venues(chicago_grouped.iloc[ind, :], num_top_venues)

zip_venues_sorted

Unnamed: 0,Zip Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,60601,Hotel,Coffee Shop,Seafood Restaurant,Plaza,American Restaurant
1,60602,Hotel,Theater,Coffee Shop,Italian Restaurant,American Restaurant
2,60603,Coffee Shop,Hotel,Sandwich Place,Theater,Snack Place
3,60604,Coffee Shop,Sandwich Place,Hotel,Italian Restaurant,Pizza Place
4,60605,Park,Football Stadium,History Museum,Bakery,Historic Site
5,60606,Coffee Shop,Sandwich Place,New American Restaurant,Snack Place,Mediterranean Restaurant
6,60607,Greek Restaurant,Sandwich Place,Café,Coffee Shop,Pizza Place
7,60608,Pizza Place,Flower Shop,Grocery Store,Pharmacy,Boat or Ferry
8,60610,Fast Food Restaurant,Deli / Bodega,American Restaurant,Coffee Shop,Restaurant
9,60611,Italian Restaurant,Hotel,Boutique,Café,American Restaurant
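
The `indicators` list covers 1st to 3rd and falls back to `th` for every other position. That is enough for five columns, but a 21st column would be labelled `21th`. If `num_top_venues` ever grows that far, a small helper can build the labels instead. The name `ordinal` is hypothetical and the snippet is only a sketch of the idea:

```python
def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    # Numbers ending in 11, 12 or 13 always take 'th'.
    if 11 <= n % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 5
columns = ["Zip Code"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns)
```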


In [34]:
# use k-means to group the zip codes into 5 clusters

# set number of clusters
kclusters = 5

chicago_grouped_clustering = chicago_grouped.drop('Zip Code', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(chicago_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2], dtype=int32)
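
The first ten labels all fall in cluster 2, so it is worth checking how the zip codes are spread across the clusters before interpreting them. A minimal sketch, using a made-up label array in place of `kmeans.labels_`:

```python
import numpy as np

# Hypothetical labels standing in for kmeans.labels_ (not the notebook's values).
labels = np.array([2, 2, 2, 0, 2, 1, 1, 2, 0, 2, 3, 4])

# Count how many rows were assigned to each cluster id.
cluster_ids, counts = np.unique(labels, return_counts=True)
cluster_sizes = {int(c): int(n) for c, n in zip(cluster_ids, counts)}
print(cluster_sizes)
```

Very small clusters are often outliers rather than meaningful groups, which can be a cue to try a different value for `kclusters`.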

In [35]:
#add clustering labels
zip_venues_sorted.insert(0, 'Cluster_Labels', kmeans.labels_)
chicago_merged = df_chicago

# join the cluster labels and top venues onto df_chicago, which holds latitude/longitude for each zip code
chicago_merged = chicago_merged.join(zip_venues_sorted.set_index('Zip Code'), on='Zip')
chicago_merged = chicago_merged.drop([83]).reset_index(drop = True)
chicago_merged['Cluster_Labels'] = chicago_merged['Cluster_Labels'].astype(int)

chicago_merged.head() # check the last columns!

Unnamed: 0,Zip,City,State,Latitude,Longitude,Timezone,Daylight savings time flag,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,60644,Chicago,IL,41.881331,-87.75671,-6,1,2,Fried Chicken Joint,ATM,Southern / Soul Food Restaurant,Discount Store,Chinese Restaurant
1,60613,Chicago,IL,41.953256,-87.6629,-6,1,2,Dive Bar,Ice Cream Shop,New American Restaurant,Bus Station,Candy Store
2,60622,Chicago,IL,41.900332,-87.66927,-6,1,2,Mexican Restaurant,Bar,Mobile Phone Shop,Coffee Shop,Optical Shop
3,60625,Chicago,IL,41.971614,-87.70256,-6,1,0,Park,Ice Cream Shop,Track,Outlet Store,Bank
4,60649,Chicago,IL,41.761734,-87.57072,-6,1,2,Seafood Restaurant,Mobile Phone Shop,Discount Store,Sandwich Place,Deli / Bodega


In [36]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(
        chicago_merged['Latitude'], 
        chicago_merged['Longitude'], 
        chicago_merged['Zip'], 
        chicago_merged['Cluster_Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters


In [37]:
# cluster one
chicago_merged.loc[chicago_merged['Cluster_Labels'] == 0, chicago_merged.columns[[1] + list(range(5, chicago_merged.shape[1]))]]

Unnamed: 0,City,Timezone,Daylight savings time flag,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
3,Chicago,-6,1,0,Park,Ice Cream Shop,Track,Outlet Store,Bank
8,Chicago,-6,1,0,Park,Electronics Store,American Restaurant,Intersection,Business Service
10,Chicago,-6,1,0,Park,Dog Run,Gym Pool,Cafeteria,Eastern European Restaurant
37,Chicago,-6,1,0,Donut Shop,Park,Bank,Coffee Shop,Gym
44,Chicago,-6,1,0,Park,Donut Shop,Home Service,Pizza Place,Fast Food Restaurant
60,Chicago,-6,1,0,Park,Discount Store,Greek Restaurant,Lounge,Yoga Studio
66,Chicago,-6,1,0,Bar,Gym Pool,Park,Pet Store,Gaming Cafe
74,Chicago,-6,1,0,Mexican Restaurant,Park,Bar,Ice Cream Shop,Filipino Restaurant


In [38]:
# cluster two
chicago_merged.loc[chicago_merged['Cluster_Labels'] == 1, chicago_merged.columns[[1] + list(range(5, chicago_merged.shape[1]))]]

Unnamed: 0,City,Timezone,Daylight savings time flag,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
5,Chicago,-6,1,1,Pharmacy,Rental Car Location,Mexican Restaurant,Food,Convenience Store
6,Chicago,-6,1,1,Pharmacy,Rental Car Location,Mexican Restaurant,Food,Convenience Store
12,Chicago,-6,1,1,Pharmacy,Rental Car Location,Mexican Restaurant,Food,Convenience Store
13,Chicago,-6,1,1,Pharmacy,Rental Car Location,Mexican Restaurant,Food,Convenience Store
14,Chicago,-6,1,1,Pharmacy,Rental Car Location,Mexican Restaurant,Food,Convenience Store
15,Chicago,-6,1,1,Pharmacy,Rental Car Location,Mexican Restaurant,Food,Convenience Store
18,Chicago,-6,1,1,Pharmacy,Rental Car Location,Mexican Restaurant,Food,Convenience Store
21,Chicago,-6,1,1,Pharmacy,Rental Car Location,Mexican Restaurant,Food,Convenience Store
22,Chicago,-6,1,1,Pharmacy,Rental Car Location,Mexican Restaurant,Food,Convenience Store
25,Chicago,-6,1,1,Pharmacy,Rental Car Location,Mexican Restaurant,Food,Convenience Store


In [39]:
# cluster three
chicago_merged.loc[chicago_merged['Cluster_Labels'] == 2, chicago_merged.columns[[1] + list(range(5, chicago_merged.shape[1]))]]

Unnamed: 0,City,Timezone,Daylight savings time flag,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Chicago,-6,1,2,Fried Chicken Joint,ATM,Southern / Soul Food Restaurant,Discount Store,Chinese Restaurant
1,Chicago,-6,1,2,Dive Bar,Ice Cream Shop,New American Restaurant,Bus Station,Candy Store
2,Chicago,-6,1,2,Mexican Restaurant,Bar,Mobile Phone Shop,Coffee Shop,Optical Shop
4,Chicago,-6,1,2,Seafood Restaurant,Mobile Phone Shop,Discount Store,Sandwich Place,Deli / Bodega
7,Chicago,-6,1,2,Mexican Restaurant,Pizza Place,Theater,Bakery,American Restaurant
9,Chicago,-6,1,2,Construction & Landscaping,Bank,Pharmacy,Pub,Supermarket
11,Chicago,-6,1,2,Flea Market,Breakfast Spot,Music Venue,Bus Stop,Pizza Place
16,Chicago,-6,1,2,Pharmacy,Nightclub,Train Station,Discount Store,Donut Shop
17,Chicago,-6,1,2,Chinese Restaurant,Sandwich Place,Fast Food Restaurant,Gas Station,Hockey Arena
19,Chicago,-6,1,2,Pizza Place,Train Station,Lounge,Donut Shop,BBQ Joint


In [40]:
# cluster four
chicago_merged.loc[chicago_merged['Cluster_Labels'] == 3, chicago_merged.columns[[1] + list(range(5, chicago_merged.shape[1]))]]

Unnamed: 0,City,Timezone,Daylight savings time flag,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
51,Chicago,-6,1,3,Fast Food Restaurant,Home Service,Train Station,Cosmetics Shop,Discount Store
73,Chicago,-6,1,3,Sports Bar,Train Station,Liquor Store,Exhibit,Dumpling Restaurant


In [41]:
# cluster five
chicago_merged.loc[chicago_merged['Cluster_Labels'] == 4, chicago_merged.columns[[1] + list(range(5, chicago_merged.shape[1]))]]

Unnamed: 0,City,Timezone,Daylight savings time flag,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
29,Chicago,-6,1,4,Intersection,Airport,Yoga Studio,Eye Doctor,Eastern European Restaurant


In [42]:
# zip codes in clusters 0 and 2 are where recreationists and foodies are most likely to spend time
# create a master list with cluster labels 0 and 2 only
chicago_merged_new = chicago_merged
chicago_merged_new_one = chicago_merged_new[chicago_merged_new.Cluster_Labels == 0]
chicago_merged_new_two = chicago_merged_new[chicago_merged_new.Cluster_Labels == 2]

chicago_merged_new_final = [chicago_merged_new_one, chicago_merged_new_two]
result = pd.concat(chicago_merged_new_final)
result = result.reset_index(drop = True)
result

Unnamed: 0,Zip,City,State,Latitude,Longitude,Timezone,Daylight savings time flag,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,60625,Chicago,IL,41.971614,-87.70256,-6,1,0,Park,Ice Cream Shop,Track,Outlet Store,Bank
1,60643,Chicago,IL,41.696433,-87.65993,-6,1,0,Park,Electronics Store,American Restaurant,Intersection,Business Service
2,60693,Chicago,IL,42.096428,-87.71791,-6,1,0,Park,Dog Run,Gym Pool,Cafeteria,Eastern European Restaurant
3,60631,Chicago,IL,41.99623,-87.81091,-6,1,0,Donut Shop,Park,Bank,Coffee Shop,Gym
4,60655,Chicago,IL,41.696283,-87.69912,-6,1,0,Park,Donut Shop,Home Service,Pizza Place,Fast Food Restaurant
5,60633,Chicago,IL,41.655423,-87.55365,-6,1,0,Park,Discount Store,Greek Restaurant,Lounge,Yoga Studio
6,60634,Chicago,IL,41.944454,-87.79654,-6,1,0,Bar,Gym Pool,Park,Pet Store,Gaming Cafe
7,60617,Chicago,IL,41.719973,-87.5557,-6,1,0,Mexican Restaurant,Park,Bar,Ice Cream Shop,Filipino Restaurant
8,60644,Chicago,IL,41.881331,-87.75671,-6,1,2,Fried Chicken Joint,ATM,Southern / Soul Food Restaurant,Discount Store,Chinese Restaurant
9,60613,Chicago,IL,41.953256,-87.6629,-6,1,2,Dive Bar,Ice Cream Shop,New American Restaurant,Bus Station,Candy Store


In [43]:
# start looking at incomes by area
df_income = pd.read_html('http://zipatlas.com/us/il/chicago/zip-code-comparison/median-household-income.htm')[11]
df_income = df_income.drop([0, 2, 3, 4, 6], axis=1)
df_income = df_income.rename(columns={1: "Zip", 5: "Average Income"})
df_income = df_income.drop([0])
df_income.head()

Unnamed: 0,Zip,Average Income
1,60606,"$100,377.00"
2,60601,"$77,374.00"
3,60611,"$69,889.00"
4,60614,"$68,324.00"
5,60603,"$61,815.00"


In [62]:
# include incomes in master table
result_final = pd.merge(df_income, result, on='Zip')
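
`pd.merge` performs an inner join by default, so a zip code that appears in only one of the two tables drops out without warning, and the `Zip` key needs the same dtype on both sides. A hedged sketch of how this can be checked, using small made-up frames rather than the notebook's `df_income` and `result`:

```python
import pandas as pd

# Hypothetical stand-ins for df_income and result.
income = pd.DataFrame({"Zip": ["60606", "60601", "60699"],
                       "Average Income": ["$100,377.00", "$77,374.00", "$0.00"]})
clusters = pd.DataFrame({"Zip": [60606, 60601, 60625],
                         "Cluster_Labels": [2, 2, 0]})

# Make the join key the same dtype on both sides before merging.
income["Zip"] = income["Zip"].astype(str)
clusters["Zip"] = clusters["Zip"].astype(str)

# An outer join with indicator=True adds a `_merge` column that records
# whether each row came from the left table, the right table, or both.
check = pd.merge(income, clusters, on="Zip", how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]
print(unmatched[["Zip", "_merge"]])
```

Rows listed in `unmatched` are the ones an inner join would discard.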


In [45]:
#remove timezone and daylight savings columns
result_final = result_final.drop(['Timezone', 'Daylight savings time flag'], axis=1)
result_final.head()
#replace the missing value ($0.00) with $112,699.00
result_final['Average Income'] = result_final['Average Income'].replace(['$0.00'],'$112,699.00')
result_final

Unnamed: 0,Zip,Average Income,City,State,Latitude,Longitude,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,60606,"$100,377.00",Chicago,IL,41.882582,-87.6376,2,Coffee Shop,Sandwich Place,New American Restaurant,Snack Place,Mediterranean Restaurant
1,60601,"$77,374.00",Chicago,IL,41.886456,-87.62325,2,Hotel,Coffee Shop,Seafood Restaurant,Plaza,American Restaurant
2,60611,"$69,889.00",Chicago,IL,41.904667,-87.62504,2,Italian Restaurant,Hotel,Boutique,Café,American Restaurant
3,60614,"$68,324.00",Chicago,IL,41.922682,-87.65432,2,Yoga Studio,Greek Restaurant,Pizza Place,Coffee Shop,Donut Shop
4,60603,"$61,815.00",Chicago,IL,41.880446,-87.63014,2,Coffee Shop,Hotel,Sandwich Place,Theater,Snack Place
5,60655,"$59,849.00",Chicago,IL,41.696283,-87.69912,0,Park,Donut Shop,Home Service,Pizza Place,Fast Food Restaurant
6,60646,"$58,232.00",Chicago,IL,41.995331,-87.7601,2,Sandwich Place,Diner,Grocery Store,Restaurant,Gas Station
7,60605,"$56,151.00",Chicago,IL,41.860019,-87.6187,2,Park,Football Stadium,History Museum,Bakery,Historic Site
8,60657,"$55,647.00",Chicago,IL,41.940832,-87.65852,2,Pizza Place,Sports Bar,Music Venue,Sandwich Place,Performing Arts Venue
9,60631,"$55,316.00",Chicago,IL,41.99623,-87.81091,0,Donut Shop,Park,Bank,Coffee Shop,Gym


In [61]:
#convert average income column into a numeric (float) column
result_final['Average Income'] = result_final['Average Income'].replace(r'[\$,]', '', regex=True).astype(float)
result_final = result_final.sort_values(by=['Average Income'], ascending=False)
result_final = result_final.reset_index(drop=True)
result_final.head()

Unnamed: 0,Zip,Average Income,City,State,Latitude,Longitude,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,60654,112699.0,Chicago,IL,41.888627,-87.63538,2,Steakhouse,Italian Restaurant,Coffee Shop,Hotel,Bar
1,60606,100377.0,Chicago,IL,41.882582,-87.6376,2,Coffee Shop,Sandwich Place,New American Restaurant,Snack Place,Mediterranean Restaurant
2,60601,77374.0,Chicago,IL,41.886456,-87.62325,2,Hotel,Coffee Shop,Seafood Restaurant,Plaza,American Restaurant
3,60611,69889.0,Chicago,IL,41.904667,-87.62504,2,Italian Restaurant,Hotel,Boutique,Café,American Restaurant
4,60614,68324.0,Chicago,IL,41.922682,-87.65432,2,Yoga Studio,Greek Restaurant,Pizza Place,Coffee Shop,Donut Shop
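
The conversion strips `$` and `,` with a regular expression before casting to float. The same step is shown below on a few standalone values. Treating `$0.00` as missing is an assumption made here for illustration; the notebook instead substitutes a fixed figure.

```python
import numpy as np
import pandas as pd

raw = pd.Series(["$100,377.00", "$77,374.00", "$0.00"], name="Average Income")

# Remove the dollar sign and thousands separators, then convert to float.
income = raw.str.replace(r"[$,]", "", regex=True).astype(float)

# Assumption: a reported income of 0 means "no data", so mark it as NaN.
income = income.replace(0.0, np.nan)
print(income)
```

Keeping the gap as `NaN` makes it visible in later steps such as sorting, where pandas places missing values last by default.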


## Results

Three recommendation tables were created from nearby venue data, covering recreational, food, and high end areas respectively. The recreation and food areas were categorized by clustering the venue data. 'High end' areas were based primarily on the household income reported online for each zip code (the `Average Income` column). We assume that high end living begins around the $70,000 income level.

In [60]:
#these will be our high end living zip codes based on average income
high_end_living = result_final[:5]
print('Below are the locations we recommend if you are interested in high end living ')
high_end_living

Below are the locations we recommend if you are interested in high end living 


Unnamed: 0,Zip,Average Income,City,State,Latitude,Longitude,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,60654,112699.0,Chicago,IL,41.888627,-87.63538,2,Steakhouse,Italian Restaurant,Coffee Shop,Hotel,Bar
1,60606,100377.0,Chicago,IL,41.882582,-87.6376,2,Coffee Shop,Sandwich Place,New American Restaurant,Snack Place,Mediterranean Restaurant
2,60601,77374.0,Chicago,IL,41.886456,-87.62325,2,Hotel,Coffee Shop,Seafood Restaurant,Plaza,American Restaurant
3,60611,69889.0,Chicago,IL,41.904667,-87.62504,2,Italian Restaurant,Hotel,Boutique,Café,American Restaurant
4,60614,68324.0,Chicago,IL,41.922682,-87.65432,2,Yoga Studio,Greek Restaurant,Pizza Place,Coffee Shop,Donut Shop
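
The table takes the five highest incomes, while the text assumes that high end living begins around $70,000. An alternative sketch that filters on an explicit threshold instead is shown below. The name `HIGH_END_THRESHOLD` is hypothetical, and the rows are copied from the table above.

```python
import pandas as pd

# Zip codes and incomes copied from the table above.
incomes = pd.DataFrame({
    "Zip": ["60654", "60606", "60601", "60611", "60614"],
    "Average Income": [112699.0, 100377.0, 77374.0, 69889.0, 68324.0],
})

HIGH_END_THRESHOLD = 70_000  # assumed cutoff, taken from the text

# Keep only the zip codes at or above the cutoff.
high_end = incomes[incomes["Average Income"] >= HIGH_END_THRESHOLD]
print(high_end)
```

With a strict $70,000 cutoff, 60611 and 60614 drop out, so the choice between a top-five list and a threshold changes the recommendation.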


In [59]:
#these will be our recreational living zip codes based on cluster
recreation = result_final.loc[result_final['Cluster_Labels'] == 0]
print('Below are the locations we recommend for recreation ')

recreation

Below are the locations we recommend for recreation 


Unnamed: 0,Zip,Average Income,City,State,Latitude,Longitude,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
6,60655,59849.0,Chicago,IL,41.696283,-87.69912,0,Park,Donut Shop,Home Service,Pizza Place,Fast Food Restaurant
10,60631,55316.0,Chicago,IL,41.99623,-87.81091,0,Donut Shop,Park,Bank,Coffee Shop,Gym
12,60643,51305.0,Chicago,IL,41.696433,-87.65993,0,Park,Electronics Store,American Restaurant,Intersection,Business Service
14,60634,50042.0,Chicago,IL,41.944454,-87.79654,0,Bar,Gym Pool,Park,Pet Store,Gaming Cafe
24,60633,40792.0,Chicago,IL,41.655423,-87.55365,0,Park,Discount Store,Greek Restaurant,Lounge,Yoga Studio
27,60625,40083.0,Chicago,IL,41.971614,-87.70256,0,Park,Ice Cream Shop,Track,Outlet Store,Bank
32,60617,35534.0,Chicago,IL,41.719973,-87.5557,0,Mexican Restaurant,Park,Bar,Ice Cream Shop,Filipino Restaurant


In [56]:
#these will be our foodie living zip codes based on cluster
foodie = result_final.loc[result_final['Cluster_Labels'] == 2]

print('Below are the locations we recommend for you foodies ')
foodie.head()

Below are the locations we recommend for you foodies 


Unnamed: 0,Zip,Average Income,City,State,Latitude,Longitude,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,60654,112699.0,Chicago,IL,41.888627,-87.63538,2,Steakhouse,Italian Restaurant,Coffee Shop,Hotel,Bar
1,60606,100377.0,Chicago,IL,41.882582,-87.6376,2,Coffee Shop,Sandwich Place,New American Restaurant,Snack Place,Mediterranean Restaurant
2,60601,77374.0,Chicago,IL,41.886456,-87.62325,2,Hotel,Coffee Shop,Seafood Restaurant,Plaza,American Restaurant
3,60611,69889.0,Chicago,IL,41.904667,-87.62504,2,Italian Restaurant,Hotel,Boutique,Café,American Restaurant
4,60614,68324.0,Chicago,IL,41.922682,-87.65432,2,Yoga Studio,Greek Restaurant,Pizza Place,Coffee Shop,Donut Shop


## Discussion

Overall, clustering proved effective for separating Chicago's zip codes into categories based on nearby venues. This project has limitations, however. It does not include work/life balance variables such as transportation. The results also benefited from the fact that this dataset did not include any outlying high end areas with poor venues. The Foursquare API was used to source nearby venues. The datasets used in this project can be found at https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/table/?refine.state=IL and http://zipatlas.com/us/il/chicago/zip-code-comparison/median-household-income.htm

## Possible Next Steps

I recommend taking this project further by adding an interactive tool that accepts user input on venue, income level, and geographical preferences. Drawing a border around each mapped zip code is also recommended and will most likely be included in version 2 of this project.

## Conclusion

In this project, we analyzed geographical city data based on the venues each area offers. By following the location recommendations for the three categories, users can get one step closer to their preferred lifestyle. When someone is moving into a new home, it's all about location, location, location. With k-means clustering, we were able to categorize a large dataset and recommend the resulting categories to prospective movers.