<H1>
    Manhattan vs Toronto: Where should I travel to?
</H1>

<H2>
    Coursera Capstone - The Battle of the Neighbourhoods


In [47]:
#Import all necessary packages
import pandas as pd
import numpy as np
import requests
import json
import urllib.request
from geopy.geocoders import Nominatim
import folium
from pandas.io.json import json_normalize
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)


<H2>
    Segmenting and Clustering Neighborhoods in New York
</H2>

In [27]:
#Get data
url = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json'

with urllib.request.urlopen(url) as json_data:
    newyork_data = json.load(json_data)

In [28]:
#Identify features of data
neighborhoods_data = newyork_data['features']
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

In [29]:
#Create empty data frame starting with column names
column_names = ['Borough','Neighborhood','Latitude','Longitude']
neighborhoods = pd.DataFrame(columns = column_names)
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


In [30]:
#Loop through data and poplate instantiated dataframe
for data in neighborhoods_data:
    borough = neighborhood_name = data['properties']['borough'] 
    neighborhood_name = data['properties']['name']
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    neighborhoods = neighborhoods.append({'Borough': borough, 'Neighborhood':neighborhood_name,
                                         'Latitude':neighborhood_lat, 'Longitude':neighborhood_lon}, 
                                        ignore_index = True)
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [31]:
#subset manhattan from newyork
manhattan_data = neighborhoods[neighborhoods['Borough']=='Manhattan'].reset_index(drop=True)
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


In [58]:
manhattan_data.shape

(40, 4)

In [116]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(manhattan_data['Borough'].unique()),
        manhattan_data.shape[0]
    )
)

The dataframe has 1 boroughs and 40 neighborhoods.


In [35]:
#map out manhattan using folium
address = 'Manhattan, NY'
geolocator = Nominatim(user_agent='manhattan_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

map_manhattan = folium.Map(location=[latitude,longitude], zoom_start = 11)
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat,lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186c',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)
map_manhattan

In [43]:
#Get foursquare credentials
c_id = 'VPYZLQAH5UDCC4XBWX3TFUZ4HTWORLGGCHOLPCZZRAB2CUXB'
c_s = 'SF5RJZ1LXMR0BJFSUZNLCCMC2MQ0S5IOIYCZUU2ZV0K2P3XF'
vs = '20180605'

#set limit and radius
limit = 100
radius = 500

In [44]:
#create a function to get category of venue from foursquare api
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
    if len(categories_list)==0:
        return None
    else:
        return categories_list[0]['Name']

In [45]:
#create function to get nearby venues across all neighbourhoods in manhattan
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            c_id, 
            c_s, 
            vs, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [48]:
#populate a new variable containing all the venues in manhattan
manhattan_venues = getNearbyVenues(names = manhattan_data['Neighborhood'],
                                      latitudes = manhattan_data['Latitude'],
                                      longitudes = manhattan_data['Longitude'])

Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards


In [49]:
manhattan_venues.head(5)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Marble Hill,40.876551,-73.91066,Arturo's,40.874412,-73.910271,Pizza Place
1,Marble Hill,40.876551,-73.91066,Bikram Yoga,40.876844,-73.906204,Yoga Studio
2,Marble Hill,40.876551,-73.91066,Tibbett Diner,40.880404,-73.908937,Diner
3,Marble Hill,40.876551,-73.91066,Dunkin',40.877136,-73.906666,Donut Shop
4,Marble Hill,40.876551,-73.91066,Starbucks,40.877531,-73.905582,Coffee Shop


In [57]:
manhattan_venues.shape

(3203, 7)

In [50]:
#This is to see how many venues per neighborhood
manhattan_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Battery Park City,76,76,76,76,76,76
Carnegie Hill,90,90,90,90,90,90
Central Harlem,44,44,44,44,44,44
Chelsea,100,100,100,100,100,100
Chinatown,100,100,100,100,100,100
Civic Center,100,100,100,100,100,100
Clinton,100,100,100,100,100,100
East Harlem,39,39,39,39,39,39
East Village,100,100,100,100,100,100
Financial District,100,100,100,100,100,100


In [52]:
print('There are {} unique categories'.format(len(manhattan_venues['Venue Category'].unique())))

There are 334 unique categories


In [53]:
#One hot encoding - creating binary values
manhattan_onehot = pd.get_dummies(manhattan_venues[['Venue Category']], prefix="", prefix_sep="")
manhattan_onehot['Neighborhood'] = manhattan_venues['Neighborhood']
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]
manhattan_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Animal Shelter,Antique Shop,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auditorium,Australian Restaurant,Austrian Restaurant,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Court,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bike Shop,Bike Trail,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Bridal Shop,Bridge,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Check Cashing Service,Cheese Shop,Chinese Restaurant,Chocolate Shop,Christmas Market,Circus,Climbing Gym,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Academic Building,College Arts Building,College Bookstore,College Cafeteria,College Theater,Comedy Club,Community Center,Concert Hall,Convenience Store,Cooking School,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Cycle Studio,Czech Restaurant,Dance Studio,Daycare,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Doctor's Office,Dog Run,Donut Shop,Drugstore,Dry Cleaner,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Heliport,High School,Hill,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Intersection,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kebab Restaurant,Kids Store,Kitchen Supply Store,Korean Restaurant,Kosher Restaurant,Latin American Restaurant,Laundry Service,Leather Goods Store,Lebanese Restaurant,Library,Lingerie Store,Liquor Store,Lounge,Malay Restaurant,Market,Martial Arts School,Massage Studio,Mattress Store,Medical Center,Mediterranean Restaurant,Memorial Site,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Museum,Music School,Music Venue,Nail Salon,New American Restaurant,Newsstand,Nightclub,Non-Profit,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Outdoor Sculpture,Outdoors & Recreation,Paella Restaurant,Paper / Office Supplies Store,Park,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Persian Restaurant,Peruvian Restaurant,Pet Café,Pet Service,Pet Store,Pharmacy,Photography Studio,Physical Therapist,Piano Bar,Pie Shop,Pier,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Pool,Pub,Public Art,Ramen Restaurant,Record Shop,Recording Studio,Rental Car Location,Residential Building (Apartment / Condo),Resort,Rest Area,Restaurant,River,Rock Climbing Spot,Rock Club,Roof Deck,Russian Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,School,Seafood Restaurant,Shanghai Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Snack Place,Soba Restaurant,Soccer Field,Social Club,Soup Place,South Indian Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Stables,Steakhouse,Street Art,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tapas Restaurant,Tattoo Parlor,Tea Room,Tech Startup,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Theme Park Ride / Attraction,Thrift / Vintage Store,Tiki Bar,Tourist Information Center,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Udon Restaurant,Used Bookstore,Vape Store,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Volleyball Court,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
2,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [56]:
manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()
manhattan_grouped

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Animal Shelter,Antique Shop,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auditorium,Australian Restaurant,Austrian Restaurant,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Court,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bike Shop,Bike Trail,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Bridal Shop,Bridge,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Cha Chaan Teng,Check Cashing Service,Cheese Shop,Chinese Restaurant,Chocolate Shop,Christmas Market,Circus,Climbing Gym,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Academic Building,College Arts Building,College Bookstore,College Cafeteria,College Theater,Comedy Club,Community Center,Concert Hall,Convenience Store,Cooking School,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Cycle Studio,Czech Restaurant,Dance Studio,Daycare,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Doctor's Office,Dog Run,Donut Shop,Drugstore,Dry Cleaner,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Heliport,High School,Hill,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Intersection,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kebab Restaurant,Kids Store,Kitchen Supply Store,Korean Restaurant,Kosher Restaurant,Latin American Restaurant,Laundry Service,Leather Goods Store,Lebanese Restaurant,Library,Lingerie Store,Liquor Store,Lounge,Malay Restaurant,Market,Martial Arts School,Massage Studio,Mattress Store,Medical Center,Mediterranean Restaurant,Memorial Site,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Museum,Music School,Music Venue,Nail Salon,New American Restaurant,Newsstand,Nightclub,Non-Profit,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Outdoor Sculpture,Outdoors & Recreation,Paella Restaurant,Paper / Office Supplies Store,Park,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Persian Restaurant,Peruvian Restaurant,Pet Café,Pet Service,Pet Store,Pharmacy,Photography Studio,Physical Therapist,Piano Bar,Pie Shop,Pier,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Pool,Pub,Public Art,Ramen Restaurant,Record Shop,Recording Studio,Rental Car Location,Residential Building (Apartment / Condo),Resort,Rest Area,Restaurant,River,Rock Climbing Spot,Rock Club,Roof Deck,Russian Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,School,Seafood Restaurant,Shanghai Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Snack Place,Soba Restaurant,Soccer Field,Social Club,Soup Place,South Indian Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Stables,Steakhouse,Street Art,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tapas Restaurant,Tattoo Parlor,Tea Room,Tech Startup,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Theme Park Ride / Attraction,Thrift / Vintage Store,Tiki Bar,Tourist Information Center,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Udon Restaurant,Used Bookstore,Vape Store,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Volleyball Court,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Battery Park City,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.013158,0.0,0.0,0.013158,0.0,0.0,0.013158,0.0,0.013158,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.065789,0.0,0.0,0.078947,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.013158,0.013158,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.013158,0.039474,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.065789,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.039474,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.065789,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.026316,0.026316,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0
1,Carnegie Hill,0.0,0.0,0.0,0.0,0.011111,0.0,0.0,0.011111,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011111,0.022222,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011111,0.0,0.0,0.0,0.0,0.0,0.044444,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011111,0.0,0.0,0.0,0.0,0.011111,0.0,0.022222,0.088889,0.0,0.0,0.0,0.0,0.0,0.0,0.011111,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.011111,0.0,0.011111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011111,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.011111,0.0,0.022222,0.033333,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011111,0.011111,0.0,0.0,0.0,0.011111,0.0,0.0,0.0,0.0,0.0,0.011111,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011111,0.0,0.0,0.0,0.011111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.044444,0.011111,0.0,0.0,0.0,0.022222,0.0,0.011111,0.0,0.0,0.0,0.0,0.0,0.0,0.011111,0.0,0.0,0.0,0.0,0.0,0.0,0.011111,0.011111,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.011111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011111,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011111,0.0,0.0,0.0,0.011111,0.033333,0.0,0.011111,0.033333
2,Central Harlem,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.022727,0.0,0.0,0.045455,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.022727,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Chelsea,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.06,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.05,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.02,0.07,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.01,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.01,0.01
4,Chinatown,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.05,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01
5,Civic Center,0.0,0.0,0.0,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.05,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.02
6,Clinton,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.04,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.03,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.05,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0
7,East Harlem,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.025641,0.0,0.051282,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.051282,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.128205,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.051282,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.025641,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,East Village,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.09,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0
9,Financial District,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.12,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.04,0.01,0.02,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0


In [59]:
#identifying the frequency a category of venue may appear
num_top_venues = 5
for hood in manhattan_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = manhattan_grouped[manhattan_grouped['Neighborhood']==hood].T.reset_index()
    temp.columns = ['venue', 'freq']
    temp = temp.iloc[1:]
    temp['freq']=temp['freq'].astype(float)
    temp=temp.round({'freq':2})
    print(temp.sort_values('freq',ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Battery Park City----
            venue  freq
0     Coffee Shop  0.08
1            Park  0.07
2  Clothing Store  0.07
3           Hotel  0.07
4   Memorial Site  0.04


----Carnegie Hill----
               venue  freq
0        Coffee Shop  0.09
1               Café  0.04
2        Pizza Place  0.04
3        Yoga Studio  0.03
4  French Restaurant  0.03


----Central Harlem----
                venue  freq
0                 Bar  0.05
1      Cosmetics Shop  0.05
2  Chinese Restaurant  0.05
3  Seafood Restaurant  0.05
4         Art Gallery  0.05


----Chelsea----
                venue  freq
0         Coffee Shop  0.07
1         Art Gallery  0.06
2              Bakery  0.05
3  Italian Restaurant  0.03
4      Ice Cream Shop  0.03


----Chinatown----
                 venue  freq
0   Chinese Restaurant  0.07
1               Bakery  0.05
2         Cocktail Bar  0.05
3  American Restaurant  0.04
4       Ice Cream Shop  0.03


----Civic Center----
               venue  freq
0        Coffee Shop 

In [60]:
#create a function to identify the most common venues, stored inside row_categories
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending = False)
    return row_categories_sorted.index.values[0:num_top_venues]

In [65]:
#creating a top 10 most common venue across neighborhoods in manhattan
num_top_venues = 10
indicators = ['st', 'nd', 'rd']
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1), indicators[ind])
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = manhattan_grouped['Neighborhood']

for ind in np.arange(manhattan_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind,:], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1th Most Common Venue,2th Most Common Venue,3th Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Battery Park City,Coffee Shop,Hotel,Park,Clothing Store,Gym,Memorial Site,Boat or Ferry,Food Court,Shopping Mall,Wine Shop
1,Carnegie Hill,Coffee Shop,Café,Pizza Place,Yoga Studio,French Restaurant,Wine Shop,Bookstore,Gym / Fitness Center,Gym,Shipping Store
2,Central Harlem,Art Gallery,Bar,Chinese Restaurant,Seafood Restaurant,African Restaurant,American Restaurant,Gym / Fitness Center,Cosmetics Shop,French Restaurant,Café
3,Chelsea,Coffee Shop,Art Gallery,Bakery,French Restaurant,American Restaurant,Wine Shop,Ice Cream Shop,Italian Restaurant,Café,Park
4,Chinatown,Chinese Restaurant,Cocktail Bar,Bakery,American Restaurant,Spa,Hotpot Restaurant,Vietnamese Restaurant,Optical Shop,Ice Cream Shop,Bar


<H2>
    K-Means Clustering
</H2>

In [68]:
#Set clusters to 3, just so we can compare to toronto which I previously found in another assignment 
#in coursera to only show 1 cluster
kclusters = 3
manhattan_grouped_clustering = manhattan_grouped.drop('Neighborhood',1)
kmeans_manhattan = KMeans(n_clusters = kclusters, random_state=0).fit(manhattan_grouped_clustering)
kmeans_manhattan.labels_[0:10]

array([1, 1, 1, 1, 1, 1, 1, 2, 1, 1])

In [72]:
#merging results
manhattan_merged = manhattan_data
manhattan_merged = manhattan_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
manhattan_merged.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1th Most Common Venue,2th Most Common Venue,3th Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Manhattan,Marble Hill,40.876551,-73.91066,1,Gym,Discount Store,Sandwich Place,Coffee Shop,Yoga Studio,Pizza Place,Steakhouse,Shopping Mall,Seafood Restaurant,Department Store
1,Manhattan,Chinatown,40.715618,-73.994279,1,Chinese Restaurant,Cocktail Bar,Bakery,American Restaurant,Spa,Hotpot Restaurant,Vietnamese Restaurant,Optical Shop,Ice Cream Shop,Bar
2,Manhattan,Washington Heights,40.851903,-73.9369,2,Café,Bakery,Mobile Phone Shop,Bank,Grocery Store,Park,Tapas Restaurant,Chinese Restaurant,Deli / Bodega,Wine Shop
3,Manhattan,Inwood,40.867684,-73.92121,2,Café,Mexican Restaurant,Lounge,Restaurant,Wine Bar,Pizza Place,Park,Bakery,Frozen Yogurt Shop,Chinese Restaurant
4,Manhattan,Hamilton Heights,40.823604,-73.949688,2,Pizza Place,Café,Coffee Shop,Deli / Bodega,Mexican Restaurant,Yoga Studio,Sushi Restaurant,Caribbean Restaurant,Chinese Restaurant,School


In [76]:
#map the clusters
map_clusters_manhattan = folium.Map(location = [latitude, longitude], zoom_start = 11)

x1 = np.arange(kclusters)
y1 = [i + x1 + (i*x1)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0,1,len(y1)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

markers_colors = []
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'],manhattan_merged['Longitude'],
                                 manhattan_merged['Neighborhood'],manhattan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster '+ str(cluster), parse_html = True)
    folium.CircleMarker(
        [lat,lon],
        radius=5,
        popup=label,
        color = rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_manhattan)
map_clusters_manhattan

<H3>
    Cluster Analysis
    <H5>
        Cluster 1
    <h5>
 </h3>

In [82]:
manhattan_merged.loc[manhattan_merged['Cluster Labels']==0, 
                     manhattan_merged.columns[[1]+list(range(3, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Longitude,Cluster Labels,1th Most Common Venue,2th Most Common Venue,3th Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
37,Stuyvesant Town,-73.974052,0,Park,Bar,Fountain,Pet Service,Gym / Fitness Center,Baseball Field,Harbor / Marina,Cocktail Bar,Coffee Shop,Bistro


In [249]:
cluster0_df_manhattan = manhattan_merged.loc[manhattan_merged['Cluster Labels']==0, 
                     manhattan_merged.columns[[1]+list(range(3, manhattan_merged.shape[1]))]]
cluster0_df_manhattan['1th Most Common Venue'].value_counts()

Park    1
Name: 1th Most Common Venue, dtype: int64

<h5>
    Cluster 2
<h5>

In [81]:
manhattan_merged.loc[manhattan_merged['Cluster Labels']==1, 
                     manhattan_merged.columns[[1]+list(range(3, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Longitude,Cluster Labels,1th Most Common Venue,2th Most Common Venue,3th Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Marble Hill,-73.91066,1,Gym,Discount Store,Sandwich Place,Coffee Shop,Yoga Studio,Pizza Place,Steakhouse,Shopping Mall,Seafood Restaurant,Department Store
1,Chinatown,-73.994279,1,Chinese Restaurant,Cocktail Bar,Bakery,American Restaurant,Spa,Hotpot Restaurant,Vietnamese Restaurant,Optical Shop,Ice Cream Shop,Bar
6,Central Harlem,-73.943211,1,Art Gallery,Bar,Chinese Restaurant,Seafood Restaurant,African Restaurant,American Restaurant,Gym / Fitness Center,Cosmetics Shop,French Restaurant,Café
8,Upper East Side,-73.960508,1,Italian Restaurant,Coffee Shop,Exhibit,Bakery,Yoga Studio,French Restaurant,Juice Bar,Spa,Hotel,American Restaurant
9,Yorkville,-73.947118,1,Italian Restaurant,Gym,Coffee Shop,Bar,Deli / Bodega,Wine Shop,Diner,Sushi Restaurant,Japanese Restaurant,Pizza Place
10,Lenox Hill,-73.95886,1,Italian Restaurant,Coffee Shop,Sushi Restaurant,Pizza Place,Café,Cocktail Bar,Burger Joint,Deli / Bodega,Gym,Gym / Fitness Center
12,Upper West Side,-73.977059,1,Italian Restaurant,Bar,Wine Bar,Bakery,Café,Indian Restaurant,Coffee Shop,Yoga Studio,French Restaurant,Seafood Restaurant
13,Lincoln Square,-73.985338,1,Plaza,Café,Concert Hall,Performing Arts Venue,Theater,Gym / Fitness Center,Wine Shop,Italian Restaurant,French Restaurant,Indie Movie Theater
14,Clinton,-73.996119,1,Theater,Gym / Fitness Center,American Restaurant,Coffee Shop,Cocktail Bar,Hotel,Italian Restaurant,Spa,Sandwich Place,Gym
15,Midtown,-73.981669,1,Hotel,Clothing Store,Theater,Sporting Goods Shop,Coffee Shop,Bakery,American Restaurant,Bookstore,Cuban Restaurant,Cosmetics Shop


In [243]:
cluster1_df_manhattan = manhattan_merged.loc[manhattan_merged['Cluster Labels']==1, 
                     manhattan_merged.columns[[1]+list(range(3, manhattan_merged.shape[1]))]]
cluster1_df_manhattan['1th Most Common Venue'].value_counts()

Italian Restaurant      8
Coffee Shop             6
Bar                     3
Café                    2
Gym / Fitness Center    1
Hotel                   1
Theater                 1
Gym                     1
American Restaurant     1
Park                    1
Clothing Store          1
Art Gallery             1
Korean Restaurant       1
Chinese Restaurant      1
Plaza                   1
Name: 1th Most Common Venue, dtype: int64

<H5>
    Cluster 3
</H5>

In [244]:
manhattan_merged.loc[manhattan_merged['Cluster Labels']==2, 
                     manhattan_merged.columns[[1]+list(range(3, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Longitude,Cluster Labels,1th Most Common Venue,2th Most Common Venue,3th Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Washington Heights,-73.9369,2,Café,Bakery,Mobile Phone Shop,Bank,Grocery Store,Park,Tapas Restaurant,Chinese Restaurant,Deli / Bodega,Wine Shop
3,Inwood,-73.92121,2,Café,Mexican Restaurant,Lounge,Restaurant,Wine Bar,Pizza Place,Park,Bakery,Frozen Yogurt Shop,Chinese Restaurant
4,Hamilton Heights,-73.949688,2,Pizza Place,Café,Coffee Shop,Deli / Bodega,Mexican Restaurant,Yoga Studio,Sushi Restaurant,Caribbean Restaurant,Chinese Restaurant,School
5,Manhattanville,-73.957385,2,Coffee Shop,Seafood Restaurant,Deli / Bodega,Italian Restaurant,Mexican Restaurant,Check Cashing Service,Sushi Restaurant,Boutique,Supermarket,Gastropub
7,East Harlem,-73.944182,2,Mexican Restaurant,Thai Restaurant,Bakery,Latin American Restaurant,Deli / Bodega,Sandwich Place,Taco Place,Gym,Grocery Store,Cocktail Bar
11,Roosevelt Island,-73.949168,2,Park,Dry Cleaner,Scenic Lookout,Gym,Coffee Shop,Greek Restaurant,Outdoors & Recreation,Liquor Store,Sandwich Place,School
26,Morningside Heights,-73.963896,2,Park,Coffee Shop,Bookstore,American Restaurant,Burger Joint,Deli / Bodega,Café,Pharmacy,Grocery Store,Farmers Market
35,Turtle Bay,-73.967708,2,Coffee Shop,Italian Restaurant,Park,Deli / Bodega,Sushi Restaurant,Sandwich Place,Japanese Restaurant,Seafood Restaurant,Ramen Restaurant,Indian Restaurant
36,Tudor City,-73.971219,2,Park,Café,Mexican Restaurant,Deli / Bodega,Coffee Shop,Diner,Greek Restaurant,Thai Restaurant,Gym,Gym / Fitness Center


In [246]:
cluster2_df_manhattan = manhattan_merged.loc[manhattan_merged['Cluster Labels']==2, 
                     manhattan_merged.columns[[1]+list(range(3, manhattan_merged.shape[1]))]]
cluster2_df_manhattan['1th Most Common Venue'].value_counts()

Park                  3
Café                  2
Coffee Shop           2
Pizza Place           1
Mexican Restaurant    1
Name: 1th Most Common Venue, dtype: int64

<H2>
    Segmenting and Clustering Neighborhoods in Toronto
</H2>

<H5>
    First we want to scrape the data from https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M
</H5>

In [84]:
import requests
import lxml.html as lh

url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
get_url = requests.get(url)
doc = lh.fromstring(get_url.content) #storing content of the url page
tr_elements = doc.xpath('//tr') #Parsing only table contents

<H5>
    Now we go to parsing the table header as these become the column headers required for the data.
</H5>

In [85]:
empty_list = []
i = 0
for j in tr_elements[0]:
    i+=1
    col_name = j.text_content()
    print('%d: "%s"'%(i, col_name))
    empty_list.append((col_name,[]))

1: "Postal Code
"
2: "Borough
"
3: "Neighbourhood
"


<H5>
    Now we can create the dataframe object for the data.
    We begin with the second row of data since first row is filled wil column names.
</H5>

In [86]:
for z in range(1, len(tr_elements)):
    #loop through table elements
    T = tr_elements[z]
    if len(T)!=3:
        #Conditional set to 3 as that is how much content in each row
        #If length exceeds 3 then that is data not needed for this context
        break
    col_index = 0 #index column
    for t in T.iterchildren():
        content = t.text_content()
        empty_list[col_index][1].append(content)
        col_index+=1
complete_list = empty_list
complete_list

[('Postal Code\n',
  ['M1A\n',
   'M2A\n',
   'M3A\n',
   'M4A\n',
   'M5A\n',
   'M6A\n',
   'M7A\n',
   'M8A\n',
   'M9A\n',
   'M1B\n',
   'M2B\n',
   'M3B\n',
   'M4B\n',
   'M5B\n',
   'M6B\n',
   'M7B\n',
   'M8B\n',
   'M9B\n',
   'M1C\n',
   'M2C\n',
   'M3C\n',
   'M4C\n',
   'M5C\n',
   'M6C\n',
   'M7C\n',
   'M8C\n',
   'M9C\n',
   'M1E\n',
   'M2E\n',
   'M3E\n',
   'M4E\n',
   'M5E\n',
   'M6E\n',
   'M7E\n',
   'M8E\n',
   'M9E\n',
   'M1G\n',
   'M2G\n',
   'M3G\n',
   'M4G\n',
   'M5G\n',
   'M6G\n',
   'M7G\n',
   'M8G\n',
   'M9G\n',
   'M1H\n',
   'M2H\n',
   'M3H\n',
   'M4H\n',
   'M5H\n',
   'M6H\n',
   'M7H\n',
   'M8H\n',
   'M9H\n',
   'M1J\n',
   'M2J\n',
   'M3J\n',
   'M4J\n',
   'M5J\n',
   'M6J\n',
   'M7J\n',
   'M8J\n',
   'M9J\n',
   'M1K\n',
   'M2K\n',
   'M3K\n',
   'M4K\n',
   'M5K\n',
   'M6K\n',
   'M7K\n',
   'M8K\n',
   'M9K\n',
   'M1L\n',
   'M2L\n',
   'M3L\n',
   'M4L\n',
   'M5L\n',
   'M6L\n',
   'M7L\n',
   'M8L\n',
   'M9L\n',
   'M1M\n

<H5>
    Create a dictionary with the required data and convert to dataframe.
</H5>

In [87]:
Dict_content = {title:column for (title,column) in complete_list}
df = pd.DataFrame(Dict_content)
df.head(5)

Unnamed: 0,Postal Code\n,Borough\n,Neighbourhood\n
0,M1A\n,Not assigned\n,Not assigned\n
1,M2A\n,Not assigned\n,Not assigned\n
2,M3A\n,North York\n,Parkwoods\n
3,M4A\n,North York\n,Victoria Village\n
4,M5A\n,Downtown Toronto\n,"Regent Park, Harbourfront\n"


<H5>
    We see that the data is filled with '\n', so we must remove this.
</H5>

In [88]:
df = df.replace(r'\n','', regex=True) #replace '\n' within each data entry
df.columns = ['Postal Code', 'Borough', 'Neighbourhood'] #Replace column names with ones without '\n'
df.head(5)

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


<H5>
    Now we have to remove any rows of data where Borough == 'Not assigned'.
    We see below that the neighbourhoods have already been combined into one row.
    According to their postal code i.e. M5A has 'Regent Park', 'Harbourfront'.
</H5>

In [89]:
df = df[~df['Borough'].isin(['Not assigned'])]
df

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"
11,M3B,North York,Don Mills
12,M4B,East York,"Parkview Hill, Woodbine Gardens"
13,M5B,Downtown Toronto,"Garden District, Ryerson"


In [90]:
df.shape

(104, 3)

<H5>
    We now create the dataframe that contains the latitude and logitude values of the associated Postal Code using Geospatial Data from 'https://cocl.us/Geospatial_data' as the Geocoder package had proven unreliable.
</H5>

In [91]:
geo_data = pd.read_csv('https://cocl.us/Geospatial_data')
merged_table = pd.merge(df, geo_data, on = 'Postal Code')
merged_table.head(5)

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


In [138]:
test = merged_table['Borough'].unique()
test

array(['North York', 'Downtown Toronto', 'Etobicoke', 'Scarborough',
       'East York', 'York', 'East Toronto', 'West Toronto',
       'Central Toronto', 'Mississauga'], dtype=object)

In [139]:
#identifying best borough to use
for i in test:
    a = merged_table[merged_table['Borough']==i]
    print('The dataframe {} has {} boroughs and {} neighborhoods.'.format(i,len(a['Borough'].unique()),a.shape[0]))

The dataframe North York has 1 boroughs and 24 neighborhoods.
The dataframe Downtown Toronto has 1 boroughs and 19 neighborhoods.
The dataframe Etobicoke has 1 boroughs and 12 neighborhoods.
The dataframe Scarborough has 1 boroughs and 17 neighborhoods.
The dataframe East York has 1 boroughs and 5 neighborhoods.
The dataframe York has 1 boroughs and 5 neighborhoods.
The dataframe East Toronto has 1 boroughs and 5 neighborhoods.
The dataframe West Toronto has 1 boroughs and 6 neighborhoods.
The dataframe Central Toronto has 1 boroughs and 9 neighborhoods.
The dataframe Mississauga has 1 boroughs and 1 neighborhoods.


In [147]:
#Combine boroughs to make 1 big borough with multiple neighbourhoods as to get a cluster result
d_toronto = merged_table[merged_table['Borough']=='Downtown Toronto'].reset_index(drop=True)
e_toronto = merged_table[merged_table['Borough']=='East Toronto'].reset_index(drop=True)
w_toronto = merged_table[merged_table['Borough']=='West Toronto'].reset_index(drop=True)
c_toronto = merged_table[merged_table['Borough']=='Central Toronto'].reset_index(drop=True)

frames = [d_toronto, e_toronto, w_toronto, c_toronto]
toronto_merged = pd.concat(frames)
toronto_merged

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
5,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
6,M6G,Downtown Toronto,Christie,43.669542,-79.422564
7,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568
8,M5J,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752
9,M5K,Downtown Toronto,"Toronto Dominion Centre, Design Exchange",43.647177,-79.381576


In [148]:
#for ease we create a new 'centralised' borough by renaming it all to 'Toronto Main'
toronto_merged['Borough'] = toronto_merged['Borough'].replace({'Downtown Toronto': 'Toronto Main', 'East Toronto':'Toronto Main',
                                                       'West Toronto': 'Toronto Main', 'Central Toronto':'Toronto Main'})
toronto_merged

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M5A,Toronto Main,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Toronto Main,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,M5B,Toronto Main,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Toronto Main,St. James Town,43.651494,-79.375418
4,M5E,Toronto Main,Berczy Park,43.644771,-79.373306
5,M5G,Toronto Main,Central Bay Street,43.657952,-79.387383
6,M6G,Toronto Main,Christie,43.669542,-79.422564
7,M5H,Toronto Main,"Richmond, Adelaide, King",43.650571,-79.384568
8,M5J,Toronto Main,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752
9,M5K,Toronto Main,"Toronto Dominion Centre, Design Exchange",43.647177,-79.381576


In [92]:
address = 'Toronto'
geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinate of Toronto are 43.6534817, -79.3839347.


<H5>
    Now we create a map of Toronto with neighbourhoods superimposed on top.
</H5>

In [93]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)
for lat, lng, borough, neighbourhood in zip(merged_table['Latitude'],merged_table['Longitude'],merged_table['Borough'], merged_table['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat,lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)
map_toronto

<H5>
    We now cluster only the borough we created 'Toronto Main'
</H5>

In [154]:
geolocator2 = Nominatim(user_agent="toronto_main_explorer")
location2 = geolocator.geocode(address) #keep address the same
latitude2 = location.latitude
longitude2 = location.longitude
print('The geographical coordinate of "Toronto Main" are {}, {}.'.format(latitude2, longitude2))

The geographical coordinate of "Toronto Main" are 43.6534817, -79.3839347.


In [155]:
map_toronto_main = folium.Map(location=[latitude2, longitude2], zoom_start=10)
for lat, lng, borough, neighbourhood in zip(toronto_merged['Latitude'],toronto_merged['Longitude'],
                                            toronto_merged['Borough'], toronto_merged['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat,lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto_main)
map_toronto_main

<H5>
    Now we perform the segmentation analysis on venues in the Toronto area
</H5>

In [160]:
neighbourhood_latitude = toronto_merged.loc[0,'Latitude']
neighbourhood_longitude = toronto_merged.loc[0,'Longitude']
neighbourhood_name = toronto_merged.loc[0,'Neighbourhood']
print(toronto_merged.loc[0])

level_0                                  0
index                                    0
Postal Code                            M5A
Borough                       Toronto Main
Neighbourhood    Regent Park, Harbourfront
Latitude                           43.6543
Longitude                         -79.3606
Name: 0, dtype: object


In [186]:
toronto_merged.head()

Unnamed: 0,level_0,index,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,0,0,M5A,Toronto Main,"Regent Park, Harbourfront",43.65426,-79.360636
1,1,1,M7A,Toronto Main,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,2,2,M5B,Toronto Main,"Garden District, Ryerson",43.657162,-79.378937
3,3,3,M5C,Toronto Main,St. James Town,43.651494,-79.375418
4,4,4,M5E,Toronto Main,Berczy Park,43.644771,-79.373306


<H5>
    Now we look at top 100 venues within a raidus of 500 meters
</H5>

In [191]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(toronto_merged['Borough'].unique()),
        toronto_merged.shape[0]
    )
)
radius2 = 250

The dataframe has 1 boroughs and 39 neighborhoods.


<H5>
    Now we perform the same method amongst all neighbourhoods in Toronto area
</H5>

In [192]:
def getNearbyVenues_Toronto(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            c_id, 
            c_s, 
            vs, 
            neighbourhood_latitude, 
            neighbourhood_longitude, 
            radius2, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [193]:
toronto_main_venues = getNearbyVenues_Toronto(names = toronto_merged['Neighbourhood'], 
                                              latitudes=toronto_merged['Latitude'],
                                              longitudes=toronto_merged['Longitude'])

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Rosedale
Stn A PO Boxes
St. James Town, Cabbagetown
First Canadian Place, Underground city
Church and Wellesley
The Beaches
The Danforth West, Riverdale
India Bazaar, The Beaches West
Studio District
Business reply mail Processing Centre, South Central Letter Processing Plant Toronto
Dufferin, Dovercourt Village
Little Portugal, Trinity
Brockton, Parkdale Village, Exhibition Place
High Park, The Junction South
Parkdale, Roncesvalles
Runnymede, Swansea
Lawrence Park
Roselawn
Davisville North
Forest Hill North & West, F

In [194]:
toronto_main_venues.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Regent Park, Harbourfront",43.65426,-79.360636,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot
3,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
4,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa


In [195]:
toronto_main_venues.groupby('Venue Category').count()

Unnamed: 0_level_0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Bakery,39,39,39,39,39,39
Breakfast Spot,39,39,39,39,39,39
Coffee Shop,39,39,39,39,39,39
Distribution Center,39,39,39,39,39,39
Dog Run,39,39,39,39,39,39
Flower Shop,39,39,39,39,39,39
Food Truck,39,39,39,39,39,39
Gym / Fitness Center,39,39,39,39,39,39
History Museum,39,39,39,39,39,39
Mediterranean Restaurant,39,39,39,39,39,39


In [196]:
print('There are {} uniques categories.'.format(len(toronto_main_venues['Venue Category'].unique())))

There are 13 uniques categories.


<H5>
    Now we analyze each neighbourhood
</H5>

In [197]:
toronto_main_onehot = pd.get_dummies(toronto_main_venues[['Venue Category']], prefix_sep="")
toronto_main_onehot['Neighbourhood']=toronto_main_venues['Neighbourhood']
fixed_columns = [toronto_main_onehot.columns[-1]]+list(toronto_main_onehot.columns[:-1])
toronto_main_onehot=toronto_main_onehot[fixed_columns]
toronto_main_onehot.head()

Unnamed: 0,Neighbourhood,Venue CategoryBakery,Venue CategoryBreakfast Spot,Venue CategoryCoffee Shop,Venue CategoryDistribution Center,Venue CategoryDog Run,Venue CategoryFlower Shop,Venue CategoryFood Truck,Venue CategoryGym / Fitness Center,Venue CategoryHistory Museum,Venue CategoryMediterranean Restaurant,Venue CategoryPark,Venue CategorySandwich Place,Venue CategorySpa
0,"Regent Park, Harbourfront",1,0,0,0,0,0,0,0,0,0,0,0,0
1,"Regent Park, Harbourfront",0,0,1,0,0,0,0,0,0,0,0,0,0
2,"Regent Park, Harbourfront",0,1,0,0,0,0,0,0,0,0,0,0,0
3,"Regent Park, Harbourfront",0,0,0,1,0,0,0,0,0,0,0,0,0
4,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,1


<H5>
    Grouping rows by neighbourhood and taking the mean of the frequency of occurrence of each category
</H5>

In [198]:
toronto_main_grouped = toronto_main_onehot.groupby('Neighbourhood').mean().reset_index()
toronto_main_grouped

Unnamed: 0,Neighbourhood,Venue CategoryBakery,Venue CategoryBreakfast Spot,Venue CategoryCoffee Shop,Venue CategoryDistribution Center,Venue CategoryDog Run,Venue CategoryFlower Shop,Venue CategoryFood Truck,Venue CategoryGym / Fitness Center,Venue CategoryHistory Museum,Venue CategoryMediterranean Restaurant,Venue CategoryPark,Venue CategorySandwich Place,Venue CategorySpa
0,Berczy Park,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923
1,"Brockton, Parkdale Village, Exhibition Place",0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923
2,"Business reply mail Processing Centre, South C...",0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923
3,"CN Tower, King and Spadina, Railway Lands, Har...",0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923
4,Central Bay Street,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923
5,Christie,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923
6,Church and Wellesley,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923
7,"Commerce Court, Victoria Hotel",0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923
8,Davisville,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923
9,Davisville North,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923,0.076923


<H5>
    Printing each neighbourhood along with the top 5 most common venues
</H5>

In [199]:
for hood in toronto_main_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = toronto_main_grouped[toronto_main_grouped['Neighbourhood']==hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq]']=temp['freq'].astype(float)
    temp = temp.round({'freq':2})
    print(temp.sort_values('freq',ascending=False).reset_index(drop=True).head(5))
    print('\n')

----Berczy Park----
                               venue       freq     freq]
0               Venue CategoryBakery  0.0769231  0.076923
1       Venue CategoryBreakfast Spot  0.0769231  0.076923
2          Venue CategoryCoffee Shop  0.0769231  0.076923
3  Venue CategoryDistribution Center  0.0769231  0.076923
4              Venue CategoryDog Run  0.0769231  0.076923


----Brockton, Parkdale Village, Exhibition Place----
                               venue       freq     freq]
0               Venue CategoryBakery  0.0769231  0.076923
1       Venue CategoryBreakfast Spot  0.0769231  0.076923
2          Venue CategoryCoffee Shop  0.0769231  0.076923
3  Venue CategoryDistribution Center  0.0769231  0.076923
4              Venue CategoryDog Run  0.0769231  0.076923


----Business reply mail Processing Centre, South Central Letter Processing Plant Toronto----
                               venue       freq     freq]
0               Venue CategoryBakery  0.0769231  0.076923
1       Venue Cate

<H5>
    Most common venues in dataframe
</H5>

In [207]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
toronto_main_venues_sorted = pd.DataFrame(columns=columns)
toronto_main_venues_sorted['Neighbourhood'] = toronto_main_grouped['Neighbourhood']

for ind in np.arange(downtown_grouped.shape[0]):
    toronto_main_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_main_grouped.iloc[ind, :], num_top_venues)

toronto_main_venues_sorted

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Venue CategorySpa,Venue CategorySandwich Place,Venue CategoryPark,Venue CategoryMediterranean Restaurant,Venue CategoryHistory Museum,Venue CategoryGym / Fitness Center,Venue CategoryFood Truck,Venue CategoryFlower Shop,Venue CategoryDog Run,Venue CategoryDistribution Center
1,"Brockton, Parkdale Village, Exhibition Place",Venue CategorySpa,Venue CategorySandwich Place,Venue CategoryPark,Venue CategoryMediterranean Restaurant,Venue CategoryHistory Museum,Venue CategoryGym / Fitness Center,Venue CategoryFood Truck,Venue CategoryFlower Shop,Venue CategoryDog Run,Venue CategoryDistribution Center
2,"Business reply mail Processing Centre, South C...",Venue CategorySpa,Venue CategorySandwich Place,Venue CategoryPark,Venue CategoryMediterranean Restaurant,Venue CategoryHistory Museum,Venue CategoryGym / Fitness Center,Venue CategoryFood Truck,Venue CategoryFlower Shop,Venue CategoryDog Run,Venue CategoryDistribution Center
3,"CN Tower, King and Spadina, Railway Lands, Har...",Venue CategorySpa,Venue CategorySandwich Place,Venue CategoryPark,Venue CategoryMediterranean Restaurant,Venue CategoryHistory Museum,Venue CategoryGym / Fitness Center,Venue CategoryFood Truck,Venue CategoryFlower Shop,Venue CategoryDog Run,Venue CategoryDistribution Center
4,Central Bay Street,Venue CategorySpa,Venue CategorySandwich Place,Venue CategoryPark,Venue CategoryMediterranean Restaurant,Venue CategoryHistory Museum,Venue CategoryGym / Fitness Center,Venue CategoryFood Truck,Venue CategoryFlower Shop,Venue CategoryDog Run,Venue CategoryDistribution Center
5,Christie,Venue CategorySpa,Venue CategorySandwich Place,Venue CategoryPark,Venue CategoryMediterranean Restaurant,Venue CategoryHistory Museum,Venue CategoryGym / Fitness Center,Venue CategoryFood Truck,Venue CategoryFlower Shop,Venue CategoryDog Run,Venue CategoryDistribution Center
6,Church and Wellesley,Venue CategorySpa,Venue CategorySandwich Place,Venue CategoryPark,Venue CategoryMediterranean Restaurant,Venue CategoryHistory Museum,Venue CategoryGym / Fitness Center,Venue CategoryFood Truck,Venue CategoryFlower Shop,Venue CategoryDog Run,Venue CategoryDistribution Center
7,"Commerce Court, Victoria Hotel",Venue CategorySpa,Venue CategorySandwich Place,Venue CategoryPark,Venue CategoryMediterranean Restaurant,Venue CategoryHistory Museum,Venue CategoryGym / Fitness Center,Venue CategoryFood Truck,Venue CategoryFlower Shop,Venue CategoryDog Run,Venue CategoryDistribution Center
8,Davisville,Venue CategorySpa,Venue CategorySandwich Place,Venue CategoryPark,Venue CategoryMediterranean Restaurant,Venue CategoryHistory Museum,Venue CategoryGym / Fitness Center,Venue CategoryFood Truck,Venue CategoryFlower Shop,Venue CategoryDog Run,Venue CategoryDistribution Center
9,Davisville North,Venue CategorySpa,Venue CategorySandwich Place,Venue CategoryPark,Venue CategoryMediterranean Restaurant,Venue CategoryHistory Museum,Venue CategoryGym / Fitness Center,Venue CategoryFood Truck,Venue CategoryFlower Shop,Venue CategoryDog Run,Venue CategoryDistribution Center


In [201]:
toronto_main_venues_sorted.shape

(39, 11)

<H4>
    K-mean clustering
</H4>

In [228]:
toronto_main_clustering = toronto_main_grouped.drop('Neighbourhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=3).fit(toronto_main_clustering)

kmeans.labels_[0:10]

  kmeans = KMeans(n_clusters=kclusters, random_state=3).fit(toronto_main_clustering)


array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [203]:
toronto_main_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
toronto_main_merged = toronto_merged
toronto_main_merged = toronto_main_merged.join(toronto_main_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

toronto_main_merged.head()

Unnamed: 0,level_0,index,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,0,M5A,Toronto Main,"Regent Park, Harbourfront",43.65426,-79.360636,0,,,,,,,,,,
1,1,1,M7A,Toronto Main,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,0,,,,,,,,,,
2,2,2,M5B,Toronto Main,"Garden District, Ryerson",43.657162,-79.378937,0,Venue CategorySpa,Venue CategorySandwich Place,Venue CategoryPark,Venue CategoryMediterranean Restaurant,Venue CategoryHistory Museum,Venue CategoryGym / Fitness Center,Venue CategoryFood Truck,Venue CategoryFlower Shop,Venue CategoryDog Run,Venue CategoryDistribution Center
3,3,3,M5C,Toronto Main,St. James Town,43.651494,-79.375418,0,,,,,,,,,,
4,4,4,M5E,Toronto Main,Berczy Park,43.644771,-79.373306,0,Venue CategorySpa,Venue CategorySandwich Place,Venue CategoryPark,Venue CategoryMediterranean Restaurant,Venue CategoryHistory Museum,Venue CategoryGym / Fitness Center,Venue CategoryFood Truck,Venue CategoryFlower Shop,Venue CategoryDog Run,Venue CategoryDistribution Center


<H5>
    We then remove all rows with NaN
</H5>

In [227]:
toronto_main_merged = toronto_main_merged[toronto_main_merged['1st Most Common Venue'].notna()]
#toronto_main_merged =toronto_main_merged.drop(['level_0','index'],axis=1).reset_index()
toronto_main_merged.head()

Unnamed: 0,index,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,2,M5B,Toronto Main,"Garden District, Ryerson",43.657162,-79.378937,0,Venue CategorySpa,Venue CategorySandwich Place,Venue CategoryPark,Venue CategoryMediterranean Restaurant,Venue CategoryHistory Museum,Venue CategoryGym / Fitness Center,Venue CategoryFood Truck,Venue CategoryFlower Shop,Venue CategoryDog Run,Venue CategoryDistribution Center
1,4,M5E,Toronto Main,Berczy Park,43.644771,-79.373306,0,Venue CategorySpa,Venue CategorySandwich Place,Venue CategoryPark,Venue CategoryMediterranean Restaurant,Venue CategoryHistory Museum,Venue CategoryGym / Fitness Center,Venue CategoryFood Truck,Venue CategoryFlower Shop,Venue CategoryDog Run,Venue CategoryDistribution Center
2,5,M5G,Toronto Main,Central Bay Street,43.657952,-79.387383,0,Venue CategorySpa,Venue CategorySandwich Place,Venue CategoryPark,Venue CategoryMediterranean Restaurant,Venue CategoryHistory Museum,Venue CategoryGym / Fitness Center,Venue CategoryFood Truck,Venue CategoryFlower Shop,Venue CategoryDog Run,Venue CategoryDistribution Center
3,6,M6G,Toronto Main,Christie,43.669542,-79.422564,0,Venue CategorySpa,Venue CategorySandwich Place,Venue CategoryPark,Venue CategoryMediterranean Restaurant,Venue CategoryHistory Museum,Venue CategoryGym / Fitness Center,Venue CategoryFood Truck,Venue CategoryFlower Shop,Venue CategoryDog Run,Venue CategoryDistribution Center
4,8,M5J,Toronto Main,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752,0,Venue CategorySpa,Venue CategorySandwich Place,Venue CategoryPark,Venue CategoryMediterranean Restaurant,Venue CategoryHistory Museum,Venue CategoryGym / Fitness Center,Venue CategoryFood Truck,Venue CategoryFlower Shop,Venue CategoryDog Run,Venue CategoryDistribution Center


In [204]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_main_merged['Latitude'], toronto_main_merged['Longitude'], 
                                  toronto_main_merged['Neighbourhood'], toronto_main_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

<H5>
    Cluster examination
</H5>
<H6>
    From the above raised warning 'ConvergenceWarning: Number of distinct clusters (1) found smaller than n_clusters (3). Possibly due to duplicate points in X.' It can be seen prior to running the cluster analysis that there is not enough diversity within the features to justify clustering beyond 1 distinct feature. This is further supported when looking at the first cluster analysis down below in 'Cluster 1'. As such, from this analysis, only a single cluster is produced, as seen in 'Cluster 2' which displays nothing. It can be seen that within 'Toronto Main' the most common venues are constant among the cluster.
</H6>

<H5>
    Cluster 1
</H5>

In [248]:
toronto_main_merged.loc[toronto_main_merged['Cluster Labels'] == 0, 
                        toronto_main_merged.columns[[1] + list(range(5, toronto_main_merged.shape[1]))]]

Unnamed: 0,Postal Code,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5B,-79.378937,0,Venue CategorySpa,Venue CategorySandwich Place,Venue CategoryPark,Venue CategoryMediterranean Restaurant,Venue CategoryHistory Museum,Venue CategoryGym / Fitness Center,Venue CategoryFood Truck,Venue CategoryFlower Shop,Venue CategoryDog Run,Venue CategoryDistribution Center
1,M5E,-79.373306,0,Venue CategorySpa,Venue CategorySandwich Place,Venue CategoryPark,Venue CategoryMediterranean Restaurant,Venue CategoryHistory Museum,Venue CategoryGym / Fitness Center,Venue CategoryFood Truck,Venue CategoryFlower Shop,Venue CategoryDog Run,Venue CategoryDistribution Center
2,M5G,-79.387383,0,Venue CategorySpa,Venue CategorySandwich Place,Venue CategoryPark,Venue CategoryMediterranean Restaurant,Venue CategoryHistory Museum,Venue CategoryGym / Fitness Center,Venue CategoryFood Truck,Venue CategoryFlower Shop,Venue CategoryDog Run,Venue CategoryDistribution Center
3,M6G,-79.422564,0,Venue CategorySpa,Venue CategorySandwich Place,Venue CategoryPark,Venue CategoryMediterranean Restaurant,Venue CategoryHistory Museum,Venue CategoryGym / Fitness Center,Venue CategoryFood Truck,Venue CategoryFlower Shop,Venue CategoryDog Run,Venue CategoryDistribution Center
4,M5J,-79.381752,0,Venue CategorySpa,Venue CategorySandwich Place,Venue CategoryPark,Venue CategoryMediterranean Restaurant,Venue CategoryHistory Museum,Venue CategoryGym / Fitness Center,Venue CategoryFood Truck,Venue CategoryFlower Shop,Venue CategoryDog Run,Venue CategoryDistribution Center
5,M5L,-79.379817,0,Venue CategorySpa,Venue CategorySandwich Place,Venue CategoryPark,Venue CategoryMediterranean Restaurant,Venue CategoryHistory Museum,Venue CategoryGym / Fitness Center,Venue CategoryFood Truck,Venue CategoryFlower Shop,Venue CategoryDog Run,Venue CategoryDistribution Center
6,M5T,-79.400049,0,Venue CategorySpa,Venue CategorySandwich Place,Venue CategoryPark,Venue CategoryMediterranean Restaurant,Venue CategoryHistory Museum,Venue CategoryGym / Fitness Center,Venue CategoryFood Truck,Venue CategoryFlower Shop,Venue CategoryDog Run,Venue CategoryDistribution Center
7,M5V,-79.39442,0,Venue CategorySpa,Venue CategorySandwich Place,Venue CategoryPark,Venue CategoryMediterranean Restaurant,Venue CategoryHistory Museum,Venue CategoryGym / Fitness Center,Venue CategoryFood Truck,Venue CategoryFlower Shop,Venue CategoryDog Run,Venue CategoryDistribution Center
8,M5X,-79.38228,0,Venue CategorySpa,Venue CategorySandwich Place,Venue CategoryPark,Venue CategoryMediterranean Restaurant,Venue CategoryHistory Museum,Venue CategoryGym / Fitness Center,Venue CategoryFood Truck,Venue CategoryFlower Shop,Venue CategoryDog Run,Venue CategoryDistribution Center
9,M4Y,-79.38316,0,Venue CategorySpa,Venue CategorySandwich Place,Venue CategoryPark,Venue CategoryMediterranean Restaurant,Venue CategoryHistory Museum,Venue CategoryGym / Fitness Center,Venue CategoryFood Truck,Venue CategoryFlower Shop,Venue CategoryDog Run,Venue CategoryDistribution Center


In [247]:
cluster1_toronto = toronto_main_merged.loc[toronto_main_merged['Cluster Labels'] == 0, 
                        toronto_main_merged.columns[[1] + list(range(5, toronto_main_merged.shape[1]))]]
cluster1_toronto['1st Most Common Venue'].value_counts()

Venue CategorySpa    19
Name: 1st Most Common Venue, dtype: int64

<H5>
    Cluster 2
</H5>

In [250]:
toronto_main_merged.loc[toronto_main_merged['Cluster Labels'] == 1, 
                        toronto_main_merged.columns[[1] + list(range(5, toronto_main_merged.shape[1]))]]


Unnamed: 0,Postal Code,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue


<H2>
    Discussion
    <H5>
    From the 2 cluster maps generated, it can be seen that only 1 cluster is similar between Toronto and Manhattan, which is cluster 0 (colored red on the map). However, it should be noted that Toronto only displays 1 cluster which indicates that its venue choice is not as diversified as well as Manhattan which produced 3 clusters. As both cities were analysed using the same methods and parameters (i.e. k = 3 for kmeans clustering and almost same amount of neighbourhoods for both cities) then what is expected would have been a fair comparison. The results suggest that Manhattan presents higher variety in terms of venues when compared to Toronto which is mostly the same venues that are rated highly consistently.
    
Analysing the results of Manhattan in cluster 0, the single data point shows the result of park which does not provide a lot of information. In cluster 1 we can see that the most common venue are italian restaurants and coffee shops coming in second. Similarly we see that in cluster 2, the most common venue is parks which aligns with cluster 0, possibly indicating that the clusters could have been increased to classify parks into their own clusters.
    
For toronto, the venue of Spa is ranked number 1 across all neighbourhoods with the rest which can be seen in the table above.
    
Comparing the cluster tables between Manhattan and Toronto it can clearly be seen that Manhattan is more food centric whilst Toronto is more exploration and tourism based as their History museum is ranked 4th most common across all neighbourhoods. This suggests that travellers more focused on food would be more inclined to choose Manhattan as their destination of choice, whilst those aiming for exploration and learning about the culture, tradition and history of the city they are in, would choose Toronto.
    </H5>
</H2>

    

<H2>
    Conclusion
    <H5>
        Using KMeans clustering on a subset of data representing Manhattan and Toronto, this project provides a finding that suggests that travellers with an aim to eat food would be inclined to travel to Manhattan as presented by the cluster tables for Manhattan presenting high re-occurence of food locations. Alternatively, travellers interested in exploration, relaxation and learning about the history of a city would be more inclined to choosing Toronto as presented by the cluster it presented with Spa ranking 1st and History Mueseum ranking fourth.
    </H5>
</H2>