# DATA DESCRIPTION

### For this project, I use the Foursquare API to explore the data in terms of neighborhoods. The data include information about the places around each neighborhood, such as restaurants, hotels, coffee shops, parks, theaters, art galleries, museums, and many more. I selected one borough and analyze its neighborhoods. I use a machine learning technique, clustering, to group neighborhoods with similar venue profiles on the basis of each neighborhood's data.

# EXPLORATION

### I extracted the table of Toronto’s boroughs from a Wikipedia page. In the preparation phase, I applied several steps, including but not limited to eliminating “Not assigned” values, combining neighborhoods that share the same geographical coordinates within each borough, and filtering by the borough of interest. For data verification and further exploration, we use the coordinates with the Foursquare API to explore the neighborhoods. The neighborhoods are then characterized by their venues and venue categories.
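The preparation steps described above can be tried out on a small table first. The sketch below is illustrative only: the rows are made up (they are not taken from the Wikipedia page), and the column names `Postcode`, `Borough`, and `Neighborhood` are assumed to match the ones used later in this notebook.

```python
import pandas as pd

# Toy rows standing in for the scraped table (hypothetical values).
raw = pd.DataFrame({
    "Postcode": ["M1A", "M5A", "M5A", "M7A"],
    "Borough": ["Not assigned", "Downtown Toronto", "Downtown Toronto", "Queen's Park"],
    "Neighborhood": ["Not assigned", "Harbourfront", "Regent Park", "Not assigned"],
})

# 1. Drop rows whose borough is unknown.
clean = raw[raw["Borough"] != "Not assigned"].copy()

# 2. If a neighborhood is unknown, fall back to the borough name.
missing = clean["Neighborhood"] == "Not assigned"
clean.loc[missing, "Neighborhood"] = clean.loc[missing, "Borough"]

# 3. Combine neighborhoods that share a postcode and borough.
combined = (
    clean.groupby(["Postcode", "Borough"], sort=False)["Neighborhood"]
    .agg(", ".join)
    .reset_index()
)
print(combined)
```

Filling the missing neighborhood from the borough column avoids hard-coding a row index, which can shift whenever the source table changes.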



## PREPROCESSING

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
# To see the full dataframe...
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.width', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests

from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, available since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

!pip install lxml

print('Libraries imported.')
# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory
# First We have to locate the file path and changed accordingly
#import os
#os.getcwd()
#print(os.listdir())

# Any results you write to the current directory are saved as output.

In [None]:
neighborhood

In [None]:
# Link To Extract
path='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
# Read File
df_wiki=pd.read_html(path)
#Check the type
type(df_wiki)
# Call the position where the table is stored
neighborhood=df_wiki[0]
# Rename the Columns
neighborhood.rename(columns={0:'Postcode', 1: 'Borough', 2: 'Neighborhood'}, inplace=True)
# Eliminate the first row
neighborhood=neighborhood.drop([0])
# Eliminate "Not assigned", categorical values from "Borough" Column
neighborhood=neighborhood[neighborhood.Borough !='Not assigned']
# Making DataFrame
neighborhood=pd.DataFrame(neighborhood)
# Merging rows with same Postcode (using the column name set by the rename above)
neighborhood.set_index(['Postcode','Borough'],inplace=True)
merge_result = neighborhood.groupby(level=['Postcode','Borough'], sort=False).agg( ','.join)
# Setting the index
serial_wise=merge_result.reset_index()
# Assign the 'Borough' column value to 'Neighborhood' where 'Not assigned' occurs
serial_wise.loc[4, 'Neighborhood']='Queen\'s Park'
# Saving the file for future use!
serial_wise.to_csv('wikipedia_table.csv')
# Showing the Data Frame
df=pd.DataFrame(serial_wise)
df.head()

In [None]:
# Geographical Coordinates
df1=pd.read_csv("Geospatial_Coordinates.csv")
# Change the Postal Code to Postcode
df1.rename(columns={'Postal Code':'Postcode'},inplace=True)
#Concatenation
frames=[df,df1]
frames=pd.concat(frames, axis=1, sort=False)
# Merging the two columns on 'Postcode'
merge_columns=pd.merge(df, df1, left_on='Postcode', right_on='Postcode')
# Save the Data Frame
merge_columns.to_csv('neigbors_geographical.csv')
merge_columns.head()
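
A merge on `Postcode` silently drops rows that have no match in the other table. As a minimal sketch (the tiny tables and the `M9Z` code below are hypothetical), the `indicator` and `validate` options of `DataFrame.merge` can be used to check the join before relying on it:

```python
import pandas as pd

# Hypothetical miniature versions of the two tables.
hoods = pd.DataFrame({
    "Postcode": ["M5A", "M7A", "M9Z"],
    "Neighborhood": ["Regent Park, Harbourfront", "Queen's Park", "Unknown"],
})
coords = pd.DataFrame({
    "Postcode": ["M5A", "M7A"],
    "Latitude": [43.65426, 43.662301],
    "Longitude": [-79.360636, -79.389494],
})

# A left merge keeps every neighborhood; `indicator` adds a `_merge` column
# that shows which rows found coordinates, and `validate` raises an error
# if a postcode appears more than once on either side.
merged = hoods.merge(coords, on="Postcode", how="left",
                     validate="one_to_one", indicator=True)

unmatched = merged.loc[merged["_merge"] == "left_only", "Postcode"].tolist()
print(unmatched)
```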


In [None]:
# Sorting
# set index for only Downtown Toronto
downtown_toronto_data = merge_columns[merge_columns['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
# eliminate 'Postcode' column
downtown_toronto_data=downtown_toronto_data.drop(['Postcode'], axis=1)
downtown_toronto_data.head()

In [47]:
# Preview the Downtown Toronto data before loading the New York data
downtown_toronto_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels
0,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,3
1,Downtown Toronto,Queen's Park,43.662301,-79.389494,0
2,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,3
3,Downtown Toronto,St. James Town,43.651494,-79.375418,4
4,Downtown Toronto,Berczy Park,43.644771,-79.373306,3


In [48]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [None]:
newyork_data

In [50]:
neighborhoods_data = newyork_data['features']

In [51]:
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

#### Transform the data into a *pandas* dataframe

The next task is essentially transforming this data of nested Python dictionaries into a *pandas* dataframe. So let's start by creating an empty dataframe.

In [52]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

In [53]:
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


Then let's loop through the data, collect one row per neighborhood, and build the dataframe from those rows.

In [55]:
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON coordinates are ordered [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# DataFrame.append was removed in pandas 2.0, so build the dataframe in one step
neighborhoods = pd.DataFrame(rows, columns=column_names)

In [56]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585
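
As an alternative to an explicit loop, `pd.json_normalize` can flatten the nested dictionaries directly. This is only a sketch on a single feature, trimmed from the record printed earlier; the dotted column names are what `json_normalize` produces for this nesting.

```python
import pandas as pd

# One feature copied from the GeoJSON record shown earlier (trimmed).
features = [{
    "type": "Feature",
    "geometry": {"type": "Point",
                 "coordinates": [-73.84720052054902, 40.89470517661]},
    "properties": {"name": "Wakefield", "borough": "Bronx"},
}]

# Nested dictionaries become dotted column names; lists stay as lists.
flat = pd.json_normalize(features)

# GeoJSON stores coordinates as [longitude, latitude].
table = pd.DataFrame({
    "Borough": flat["properties.borough"],
    "Neighborhood": flat["properties.name"],
    "Latitude": flat["geometry.coordinates"].apply(lambda c: c[1]),
    "Longitude": flat["geometry.coordinates"].apply(lambda c: c[0]),
})
print(table)
```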


In [57]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.


In [58]:
# Save the file for future use.

neighborhoods.to_csv('neighborhoods_NY.csv')

In [59]:
# Creating new Dataframe manhattan_data
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


## Foursquare API 

In [60]:
# Define Foursquare Credentials and Version

CLIENT_ID = '5EKR0JI1G1J02YPXX33J0OEGQY4YMDWVN1C2245XKPAKX5XO' # your Foursquare ID
CLIENT_SECRET = 'C3RRTSVHC0UKWJQYWOVDUQYE5LEHJMJYDEWCXOHTUBROATXC' # your Foursquare Secret
VERSION = '20180605'
limit = 20

print('Your credentials:')
print('CLIENT_ID:'+ CLIENT_ID)
print('CLIENT_SECRET:'+ CLIENT_SECRET)

Your credentials:
CLIENT_ID:5EKR0JI1G1J02YPXX33J0OEGQY4YMDWVN1C2245XKPAKX5XO
CLIENT_SECRET:C3RRTSVHC0UKWJQYWOVDUQYE5LEHJMJYDEWCXOHTUBROATXC
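
Hard-coding and printing credentials makes them visible to anyone who reads the notebook. A common alternative, sketched below, is to read them from environment variables and to mask them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions made for this example, not part of the Foursquare API.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read credentials from environment variables (names are hypothetical)."""
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError(
            "Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first.")
    return client_id, client_secret

def mask(secret, visible=4):
    """Show only the first few characters when printing a secret."""
    return secret[:visible] + "*" * (len(secret) - visible)

# Demo with a stand-in mapping instead of the real environment.
demo_env = {"FOURSQUARE_CLIENT_ID": "ABCD1234",
            "FOURSQUARE_CLIENT_SECRET": "WXYZ5678"}
cid, csecret = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask(cid))
```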


In [61]:
# get the geographical coordinates of Downtown Toronto
address = 'Downtown Toronto, ON, Canada'

# geopy.geocoders.options.default_user_agent = "my-application"
geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude_downtown_toronto = location.latitude
longitude_downtown_toronto = location.longitude
print("Downtown Toronto","latitude",latitude_downtown_toronto, "& " "longitude" ,longitude_downtown_toronto)

Downtown Toronto latitude 43.6563221 & longitude -79.3809161


In [62]:
# Let's get the geographical coordinates of Manhattan.
address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manhattan are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Manhattan are 40.7896239, -73.9598939.
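
`Nominatim.geocode` returns `None` when it finds no match, in which case reading `.latitude` fails with an `AttributeError`. The helper below is a small sketch of a guard for that case. It takes the geocoding function as an argument, and the `FakeLocation` and `fake_geocode` stand-ins are hypothetical so that the example runs without network access.

```python
def geocode_or_raise(geocode, address):
    """Return (latitude, longitude) for an address.

    `geocode` is any callable that behaves like `Nominatim.geocode`:
    it returns an object with `.latitude` and `.longitude`, or None
    when the address is not found.
    """
    location = geocode(address)
    if location is None:
        raise ValueError(f"No geocoding result for {address!r}")
    return location.latitude, location.longitude

# Stand-in geocoder so the sketch runs offline (values from the output above).
class FakeLocation:
    latitude, longitude = 43.6563221, -79.3809161

def fake_geocode(address):
    return FakeLocation() if "Toronto" in address else None

print(geocode_or_raise(fake_geocode, "Downtown Toronto, ON, Canada"))
```

With the real geocoder, the call would be `geocode_or_raise(geolocator.geocode, address)`.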


# VISUALIZATION 

### We visualize the data at several stages. At the beginning, we plot the neighborhoods of the selected borough to confirm that its coordinates are correct. After clustering the neighborhoods, we plot the clusters so that we can name them. Naming the clusters is important because the names identify the kinds of areas or places that characterize each cluster.

## (Before Clustering)

## Downtown Toronto

In [63]:
# create map of Downtown Toronto using latitude and longitude values
map_downtown_toronto = folium.Map(location=[latitude_downtown_toronto,longitude_downtown_toronto], zoom_start=11)

# add markers to map
for lat, lng, label in zip(downtown_toronto_data['Latitude'], downtown_toronto_data['Longitude'], downtown_toronto_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_downtown_toronto)  
    
map_downtown_toronto

In [64]:
from folium import plugins
# create map of Downtown Toronto using latitude and longitude values
map_downtown_toronto = folium.Map(location=[latitude_downtown_toronto,longitude_downtown_toronto], zoom_start=11)
# instantiate a mark cluster object for the incidents in the dataframe
incidents = plugins.MarkerCluster().add_to(map_downtown_toronto)
# add markers to map
for lat, lng, label in zip(downtown_toronto_data['Latitude'], downtown_toronto_data['Longitude'], downtown_toronto_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(incidents)  
    
map_downtown_toronto

## Manhattan

In [65]:
# let's visualize Manhattan and the neighborhoods in it.
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
map_manhattan

In [66]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

grouping = plugins.MarkerCluster().add_to(map_manhattan)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(grouping)  
    
map_manhattan

# ANALYSIS

### We analyze the neighborhoods of both boroughs through one-hot encoding (assigning ‘1’ if a venue belongs to a category and ‘0’ otherwise). On the basis of the one-hot encoding, we calculate the mean frequency of occurrence of each category and pick the top ten venue categories for each neighborhood. The top categories show which kinds of venues are most common in a neighborhood, which we treat as a rough indication of its most visited places.
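
To make the encoding step concrete, here is a minimal sketch on made-up venues for two hypothetical neighborhoods, `A` and `B`. Averaging the 0/1 indicators per neighborhood gives the share of that neighborhood's venues in each category.

```python
import pandas as pd

# Hypothetical venues for two neighborhoods.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Café", "Café", "Park", "Hotel", "Park", "Park"],
})

# One indicator column per category, one row per venue.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of the indicators is the share of venues in each category.
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```

Each row of `freq` sums to 1, so the values are comparable across neighborhoods even when the number of returned venues differs.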

## Exploring Neighborhoods in Downtown Toronto

In [67]:
# Let's create a function to repeat the process to all the neighborhoods in Toronto
def getNearbyVenues(names, latitudes,longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names,latitudes,longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
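
The function above indexes straight into the response, so an error response or a venue without a category raises a `KeyError` or `IndexError`. The sketch below shows one defensive way to pull the same fields out of an already parsed response. `extract_venues` is a hypothetical helper, and the payloads are hand-written for illustration rather than real API output.

```python
def extract_venues(payload):
    """Turn one explore-style response (already parsed from JSON) into rows.

    The nesting mirrors the keys used in getNearbyVenues; missing pieces
    yield an empty list or a None value instead of raising.
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        rows.append((
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            categories[0]["name"] if categories else None,
        ))
    return rows

# Hand-written payloads for illustration (not real API output).
ok = {"response": {"groups": [{"items": [
    {"venue": {"name": "Tandem Coffee",
               "location": {"lat": 43.653559, "lng": -79.361809},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Unlabelled", "location": {}, "categories": []}},
]}]}}
print(extract_venues(ok))
print(extract_venues({"meta": {"code": 429}}))
```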

In [68]:
# Run the above function on each neighborhood and create a new dataframe called downtown_toronto_venues.
downtown_toronto_venues = getNearbyVenues(names=downtown_toronto_data['Neighborhood'],
                                   latitudes=downtown_toronto_data['Latitude'],
                                   longitudes=downtown_toronto_data['Longitude'],
                                  )

Regent Park, Harbourfront
Queen's Park
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Rosedale
Stn A PO Boxes
St. James Town, Cabbagetown
First Canadian Place, Underground city
Church and Wellesley


In [69]:
# Let's check the size of the resulting dataframe
print(downtown_toronto_venues.shape)
downtown_toronto_venues.head()

(355, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,"Regent Park, Harbourfront",43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant


In [70]:
# Let's check how many venues were returned for each neighborhood
downtown_toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Berczy Park,20,20,20,20,20,20
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",15,15,15,15,15,15
Central Bay Street,20,20,20,20,20,20
Christie,16,16,16,16,16,16
Church and Wellesley,20,20,20,20,20,20
"Commerce Court, Victoria Hotel",20,20,20,20,20,20
"First Canadian Place, Underground city",20,20,20,20,20,20
"Garden District, Ryerson",20,20,20,20,20,20
"Harbourfront East, Union Station, Toronto Islands",20,20,20,20,20,20
"Kensington Market, Chinatown, Grange Park",20,20,20,20,20,20


In [71]:
# Let's find out how many unique categories can be curated from all the returned venues
print('There are {} unique categories.'.format(len(downtown_toronto_venues['Venue Category'].unique())))

There are 124 unique categories.


## Analyzing Each Neighborhood

In [72]:
# one hot encoding
downtown_toronto_onehot = pd.get_dummies(downtown_toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
downtown_toronto_onehot['Neighborhood'] = downtown_toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [downtown_toronto_onehot.columns[-1]] + list(downtown_toronto_onehot.columns[:-1])
downtown_toronto_onehot = downtown_toronto_onehot[fixed_columns]

downtown_toronto_onehot.head()

Unnamed: 0,Yoga Studio,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Baby Store,Bakery,Bank,Bar,Basketball Stadium,Beer Bar,Belgian Restaurant,Boat or Ferry,Bookstore,Boutique,Breakfast Spot,Bubble Tea Shop,Burger Joint,Burrito Place,Butcher,Café,Candy Store,Caribbean Restaurant,Cheese Shop,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,College Gym,College Rec Center,Comfort Food Restaurant,Comic Shop,Concert Hall,Cosmetics Shop,Creperie,Dance Studio,Deli / Bodega,Dessert Shop,Diner,Distribution Center,Electronics Store,Ethiopian Restaurant,Farmers Market,Fish Market,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Gastropub,General Entertainment,General Travel,Gift Shop,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Historic Site,Hobby Shop,Hotel,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Lake,Liquor Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Monument / Landmark,Museum,Music Venue,Neighborhood,Nightclub,Organic Grocery,Park,Performing Arts Venue,Pet Store,Pizza Place,Plane,Playground,Plaza,Poke Place,Pub,Ramen Restaurant,Rental Car Location,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Sculpture Garden,Seafood Restaurant,Shopping Mall,Skating Rink,Smoke Shop,Smoothie Shop,Spa,Speakeasy,Sporting Goods Shop,Steakhouse,Supermarket,Sushi Restaurant,Tailor Shop,Taiwanese Restaurant,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [73]:
# Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category
downtown_toronto_grouped = downtown_toronto_onehot.groupby('Neighborhood').mean().reset_index()

In [74]:
# Let's print each neighborhood along with the top 5 most common venues
num_top_venues = 5

for hood in downtown_toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = downtown_toronto_grouped[downtown_toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Berczy Park----
                venue  freq
0      Farmers Market  0.10
1  Seafood Restaurant  0.10
2          Restaurant  0.05
3        Liquor Store  0.05
4      Breakfast Spot  0.05


----CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport----
                venue  freq
0     Airport Service  0.20
1      Airport Lounge  0.13
2            Boutique  0.07
3  Airport Food Court  0.07
4        Airport Gate  0.07


----Central Bay Street----
                venue  freq
0         Coffee Shop  0.20
1  Italian Restaurant  0.10
2                 Spa  0.05
3     Bubble Tea Shop  0.05
4         Pizza Place  0.05


----Christie----
           venue  freq
0  Grocery Store  0.25
1           Café  0.19
2           Park  0.12
3     Baby Store  0.06
4    Candy Store  0.06


----Church and Wellesley----
              venue  freq
0  Ramen Restaurant  0.05
1          Beer Bar  0.05
2    Breakfast Spot  0.05
3   Bubble Tea Shop  0.05
4      Burge

In [75]:
# Let's put that into a pandas dataframe
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
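
The ranking idea can be checked on a single small row. In this sketch the frequencies are made up and `top_categories` is a hypothetical stand-in that works on a row containing only numeric category columns.

```python
import pandas as pd

def top_categories(row, n):
    """Return the n column labels with the largest values in `row`."""
    return row.sort_values(ascending=False).index[:n].tolist()

# Hypothetical frequency row for one neighborhood.
row = pd.Series({"Café": 0.19, "Grocery Store": 0.25, "Park": 0.12, "Diner": 0.06})
print(top_categories(row, 3))
```

When several categories share the same frequency, the default sort does not guarantee their relative order, which may explain small differences between two rankings of the same data. Passing `kind='stable'` to `sort_values` keeps tied categories in column order.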

In [76]:
# Now let's create the new dataframe and display the top 10 venues for each neighborhood.
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = downtown_toronto_grouped['Neighborhood']

for ind in np.arange(downtown_toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(downtown_toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Farmers Market,Seafood Restaurant,Jazz Club,Bakery,Museum,Cocktail Bar,Coffee Shop,Breakfast Spot,Liquor Store,Restaurant
1,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Service,Airport Lounge,Harbor / Marina,Plane,Sculpture Garden,Boutique,Rental Car Location,Boat or Ferry,Airport Terminal,Airport Gate
2,Central Bay Street,Coffee Shop,Italian Restaurant,Bubble Tea Shop,Seafood Restaurant,Spa,Poke Place,Sandwich Place,Sushi Restaurant,Art Museum,Middle Eastern Restaurant
3,Christie,Grocery Store,Café,Park,Baby Store,Italian Restaurant,Diner,Coffee Shop,Nightclub,Restaurant,Candy Store
4,Church and Wellesley,General Entertainment,Dance Studio,Park,Coffee Shop,Burger Joint,Bubble Tea Shop,Breakfast Spot,Pub,Ramen Restaurant,Mexican Restaurant
5,"Commerce Court, Victoria Hotel",Café,Coffee Shop,Art Gallery,Museum,Gastropub,Pub,Restaurant,Beer Bar,Bakery,Ice Cream Shop
6,"First Canadian Place, Underground city",Café,Coffee Shop,Restaurant,General Travel,Bakery,Seafood Restaurant,Pizza Place,Gym,Gym / Fitness Center,Gastropub
7,"Garden District, Ryerson",Café,Pizza Place,Sporting Goods Shop,Burrito Place,Burger Joint,Plaza,Clothing Store,Ramen Restaurant,Coffee Shop,College Rec Center
8,"Harbourfront East, Union Station, Toronto Islands",Park,Café,Hotel,Plaza,Japanese Restaurant,Lake,Skating Rink,Ice Cream Shop,Sporting Goods Shop,Deli / Bodega
9,"Kensington Market, Chinatown, Grange Park",Café,Wine Bar,Bakery,Vietnamese Restaurant,Fish Market,Farmers Market,Dessert Shop,Mexican Restaurant,Organic Grocery,Coffee Shop


## Clustering Neighborhoods

In [77]:
# set number of clusters
kclusters = 5

downtown_toronto_grouped_clustering = downtown_toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(downtown_toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 0, 3, 4, 3, 1, 1, 3, 3, 1], dtype=int32)
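
The notebook fixes `kclusters = 5` without showing how that value was chosen. One common way to compare candidate values, which is not the method used above, is the silhouette score. The sketch below applies it to synthetic, well-separated points rather than to the venue data, so it only illustrates the mechanics.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three tight, well-separated groups (not the venue data).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.1 * rng.standard_normal((20, 2)) for c in centers])

# Fit k-means for several values of k and record the silhouette score.
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The k with the highest score gives the most compact, best separated clusters.
best_k = max(scores, key=scores.get)
print(best_k)
```

With only 19 neighborhoods and over a hundred sparse category columns, the scores on the real data may be close together, so they are best read as a guide rather than a rule.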

In [78]:
# Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.
downtown_toronto_merged = downtown_toronto_data

# add clustering labels
downtown_toronto_merged['Cluster Labels'] = kmeans.labels_

# join the top-venue table onto the neighborhood data (which already holds latitude/longitude)
downtown_toronto_merged = downtown_toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

downtown_toronto_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,3,Coffee Shop,Park,Bakery,Breakfast Spot,Distribution Center,Restaurant,Performing Arts Venue,Spa,Dessert Shop,Pub
1,Downtown Toronto,Queen's Park,43.662301,-79.389494,0,Coffee Shop,Diner,Yoga Studio,Arts & Crafts Store,Burrito Place,Mexican Restaurant,Creperie,Beer Bar,Bank,Smoothie Shop
2,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,3,Café,Pizza Place,Sporting Goods Shop,Burrito Place,Burger Joint,Plaza,Clothing Store,Ramen Restaurant,Coffee Shop,College Rec Center
3,Downtown Toronto,St. James Town,43.651494,-79.375418,4,Gastropub,Coffee Shop,Restaurant,Café,Hotel,Poke Place,Middle Eastern Restaurant,Cosmetics Shop,Creperie,Italian Restaurant
4,Downtown Toronto,Berczy Park,43.644771,-79.373306,3,Farmers Market,Seafood Restaurant,Jazz Club,Bakery,Museum,Cocktail Bar,Coffee Shop,Breakfast Spot,Liquor Store,Restaurant
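
One point to check in the cell above: `kmeans.labels_` follows the row order of `downtown_toronto_grouped`, which `groupby` sorts by neighborhood name, while `downtown_toronto_data` keeps the order of the source table. If the two orders differ, assigning the labels by position attaches them to the wrong neighborhoods. The sketch below uses three neighborhood names with made-up labels to show the difference and one way to align by name instead.

```python
import numpy as np
import pandas as pd

# Hypothetical data in its original (non-alphabetical) order.
data = pd.DataFrame({"Neighborhood": ["Rosedale", "Christie", "Berczy Park"]})

# groupby sorts its keys, so the grouped table is alphabetical.
grouped = pd.DataFrame({"Neighborhood": ["Berczy Park", "Christie", "Rosedale"]})
labels = np.array([2, 0, 1])  # stand-in for kmeans.labels_, one per grouped row

# Positional assignment pairs labels with the wrong rows.
by_position = data.assign(**{"Cluster Labels": labels})

# Attaching labels to the grouped names first, then merging on the name,
# keeps every label with the neighborhood it was computed for.
label_table = grouped.assign(**{"Cluster Labels": labels})
by_name = data.merge(label_table, on="Neighborhood", how="left")
print(by_name)
```

In the notebook, the same effect could be achieved by adding the labels to `neighborhoods_venues_sorted`, which shares the grouped order, before the join.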


In [79]:
# create map
map_clusters = folium.Map(location=[latitude_downtown_toronto, longitude_downtown_toronto], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(downtown_toronto_merged['Latitude'], downtown_toronto_merged['Longitude'], downtown_toronto_merged['Neighborhood'], downtown_toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
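
The colour setup above can be reduced to a few lines. This sketch builds one hex colour per cluster and indexes it with the label itself. Note that the cell above uses `rainbow[cluster-1]`, which still works because label 0 wraps around to the last colour, but direct indexing is easier to read.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5  # same value as in the clustering cell

# One evenly spaced colour per cluster label, as hex strings.
palette = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

# Labels from k-means run from 0 to kclusters - 1, so they index directly.
for label in range(kclusters):
    print(label, palette[label])
```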

## Examine Clusters

### Now, we can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, we can then assign a name to each cluster.
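
A quick summary table can support the naming step. The sketch below uses a made-up merged table and reports, for each cluster, the most frequent first-ranked venue category and the number of neighborhoods. It is one possible aid for choosing names, not the procedure used in this notebook.

```python
import pandas as pd

# Hypothetical merged table: one row per neighborhood.
merged = pd.DataFrame({
    "Cluster Labels": [0, 0, 0, 1, 1],
    "1st Most Common Venue": ["Café", "Café", "Park",
                              "Airport Lounge", "Airport Lounge"],
})

# Most frequent top venue within each cluster, plus the cluster size.
summary = merged.groupby("Cluster Labels")["1st Most Common Venue"].agg(
    top_venue=lambda s: s.value_counts().idxmax(),
    size="count",
)
print(summary)
```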

## Cluster 1 (Airport Lounge, Coffee Shop, Cafe, Restaurants & Grocery Store)

In [80]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 0, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Queen's Park,Coffee Shop,Diner,Yoga Studio,Arts & Crafts Store,Burrito Place,Mexican Restaurant,Creperie,Beer Bar,Bank,Smoothie Shop


## Cluster 2 (Gastropubs)

In [81]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 1, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Central Bay Street,Coffee Shop,Italian Restaurant,Bubble Tea Shop,Seafood Restaurant,Spa,Poke Place,Sandwich Place,Sushi Restaurant,Art Museum,Middle Eastern Restaurant
6,Christie,Grocery Store,Café,Park,Baby Store,Italian Restaurant,Diner,Coffee Shop,Nightclub,Restaurant,Candy Store
9,"Toronto Dominion Centre, Design Exchange",Coffee Shop,Café,Art Gallery,Pizza Place,Pub,Restaurant,Deli / Bodega,Beer Bar,Bakery,Hotel
12,"Kensington Market, Chinatown, Grange Park",Café,Wine Bar,Bakery,Vietnamese Restaurant,Fish Market,Farmers Market,Dessert Shop,Mexican Restaurant,Organic Grocery,Coffee Shop
14,Rosedale,Park,Playground,Trail,Dance Studio,Cheese Shop,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,College Gym
17,"First Canadian Place, Underground city",Café,Coffee Shop,Restaurant,General Travel,Bakery,Seafood Restaurant,Pizza Place,Gym,Gym / Fitness Center,Gastropub


## Cluster 3 (Cafes)

In [82]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 2, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Service,Airport Lounge,Harbor / Marina,Plane,Sculpture Garden,Boutique,Rental Car Location,Boat or Ferry,Airport Terminal,Airport Gate


## Cluster 4 (Coffee Shop, Cafe, Park & Japanese Restaurant)

In [83]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 3, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Regent Park, Harbourfront",Coffee Shop,Park,Bakery,Breakfast Spot,Distribution Center,Restaurant,Performing Arts Venue,Spa,Dessert Shop,Pub
2,"Garden District, Ryerson",Café,Pizza Place,Sporting Goods Shop,Burrito Place,Burger Joint,Plaza,Clothing Store,Ramen Restaurant,Coffee Shop,College Rec Center
4,Berczy Park,Farmers Market,Seafood Restaurant,Jazz Club,Bakery,Museum,Cocktail Bar,Coffee Shop,Breakfast Spot,Liquor Store,Restaurant
7,"Richmond, Adelaide, King",Coffee Shop,Hotel,Pizza Place,Steakhouse,Monument / Landmark,Plaza,Restaurant,Concert Hall,Seafood Restaurant,Smoke Shop
8,"Harbourfront East, Union Station, Toronto Islands",Park,Café,Hotel,Plaza,Japanese Restaurant,Lake,Skating Rink,Ice Cream Shop,Sporting Goods Shop,Deli / Bodega
10,"Commerce Court, Victoria Hotel",Café,Coffee Shop,Art Gallery,Museum,Gastropub,Pub,Restaurant,Beer Bar,Bakery,Ice Cream Shop
11,"University of Toronto, Harbord",Restaurant,Bakery,Japanese Restaurant,Bookstore,Yoga Studio,Sushi Restaurant,Beer Bar,Bar,Dessert Shop,Italian Restaurant
15,Stn A PO Boxes,Café,Cocktail Bar,Farmers Market,Park,Concert Hall,Seafood Restaurant,Jazz Club,Hotel,Restaurant,Museum
16,"St. James Town, Cabbagetown",Restaurant,Café,Butcher,Jewelry Store,Pub,Japanese Restaurant,Italian Restaurant,Bakery,Indian Restaurant,Deli / Bodega
18,Church and Wellesley,General Entertainment,Dance Studio,Park,Coffee Shop,Burger Joint,Bubble Tea Shop,Breakfast Spot,Pub,Ramen Restaurant,Mexican Restaurant


## Cluster 5 (Gastropub, Coffee Shop, Restaurant & Cafe)

In [86]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 4, downtown_toronto_merged.columns[[1] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,St. James Town,Gastropub,Coffee Shop,Restaurant,Café,Hotel,Poke Place,Middle Eastern Restaurant,Cosmetics Shop,Creperie,Italian Restaurant


## Exploring Neighborhoods in Manhattan

In [87]:
# Let's create a function to repeat the same process to all the neighborhoods in Manhattan
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
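
The list comprehension in the function above assumes that every response contains a `groups` entry and that every venue carries at least one category and a pair of coordinates. A minimal sketch of a more defensive parsing step follows. The helper name `extract_venue_rows` and the sample payload are hypothetical, introduced only for illustration; the payload merely mimics the fields the notebook's code already reads.

```python
def extract_venue_rows(payload, name, lat, lng):
    """Flatten one explore-style response into rows, skipping incomplete venues."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        if not categories or "lat" not in location or "lng" not in location:
            continue  # skip venues without a category or coordinates
        rows.append((name, lat, lng, venue.get("name"),
                     location["lat"], location["lng"], categories[0]["name"]))
    return rows


# Hypothetical payload shaped like the fields accessed in getNearbyVenues
sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Cafe A", "location": {"lat": 40.0, "lng": -73.0},
                   "categories": [{"name": "Café"}]}},
        {"venue": {"name": "No Category", "location": {"lat": 40.1, "lng": -73.1},
                   "categories": []}},
    ]}]}
}

rows = extract_venue_rows(sample_payload, "Marble Hill", 40.876551, -73.91066)
print(rows)
```

With a helper like this, a single malformed venue is dropped instead of raising an `IndexError` or `KeyError` partway through the loop over neighborhoods.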

In [88]:
# Now write the code to run the above function on each neighborhood and create a new dataframe called manhattan_venues
manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude'],
                                  )

Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards


In [89]:
# Let's check how many venues were returned for each neighborhood
manhattan_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Battery Park City,20,20,20,20,20,20
Carnegie Hill,20,20,20,20,20,20
Central Harlem,20,20,20,20,20,20
Chelsea,20,20,20,20,20,20
Chinatown,20,20,20,20,20,20
Civic Center,20,20,20,20,20,20
Clinton,20,20,20,20,20,20
East Harlem,20,20,20,20,20,20
East Village,20,20,20,20,20,20
Financial District,20,20,20,20,20,20


In [90]:
# Let's find out how many unique categories can be curated from all the returned venues
print('There are {} unique categories.'.format(len(manhattan_venues['Venue Category'].unique())))

There are 206 unique categories.


## Analyzing the Neighborhoods

In [91]:
# one hot encoding
manhattan_onehot = pd.get_dummies(manhattan_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = manhattan_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]

manhattan_onehot.head()

Unnamed: 0,Neighborhood,Adult Boutique,African Restaurant,American Restaurant,Antique Shop,Art Gallery,Arts & Crafts Store,Asian Restaurant,Austrian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bar,Baseball Field,Beer Bar,Beer Garden,Beer Store,Bike Rental / Bike Share,Bike Shop,Bike Trail,Bistro,Boat or Ferry,Bookstore,Boutique,Boxing Gym,Breakfast Spot,Bridge,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Café,Candy Store,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Circus,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Academic Building,College Arts Building,Comedy Club,Community Center,Concert Hall,Convenience Store,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Cycle Studio,Czech Restaurant,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Diner,Discount Store,Doctor's Office,Dog Run,Donut Shop,Dumpling Restaurant,Duty-free Shop,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Filipino Restaurant,Fish Market,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gas Station,Gastropub,General Entertainment,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Hawaiian Restaurant,Health Food Store,Heliport,Historic Site,History Museum,Hobby Shop,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Irish Pub,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Korean Restaurant,Latin American Restaurant,Library,Lingerie Store,Liquor Store,Lounge,Market,Martial Arts Dojo,Medical Center,Mediterranean Restaurant,Memorial Site,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Museum,Music School,Music Venue,Nail Salon,New American Restaurant,Noodle House,Opera House,Optical Shop,Outdoor Sculpture,Outdoors & Recreation,Park,Pedestrian Plaza,Performing Arts Venue,Peruvian Restaurant,Pet Café,Pet Service,Pet Store,Pharmacy,Pie Shop,Pizza Place,Playground,Plaza,Poke Place,Pool,Pub,Public Art,Ramen Restaurant,Residential Building (Apartment / Condo),Restaurant,Rock Club,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,School,Seafood Restaurant,Shanghai Restaurant,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Snack Place,Soccer Field,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tea Room,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Tourist Information Center,Trail,Turkish Restaurant,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Waterfront,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
2,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [92]:
# Group rows by neighborhood and take the mean frequency of occurrence of each venue category
manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()
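
To make the one-hot and group-by steps concrete, here is a small sketch on made-up data. The neighborhoods `A` and `B` and their venues are invented for illustration and are not taken from the Foursquare results. It shows that averaging the 0/1 indicator columns per neighborhood yields the share of that neighborhood's venues falling in each category.

```python
import pandas as pd

# Toy venue list; the neighborhoods and categories are made up for illustration
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Park", "Park", "Café", "Gym", "Café", "Café"],
})

# One indicator column per category, one row per venue
toy_onehot = pd.get_dummies(toy_venues[["Venue Category"]],
                            prefix="", prefix_sep="", dtype=int)
toy_onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])

# Mean of 0/1 indicators = fraction of the neighborhood's venues in each category
toy_grouped = toy_onehot.groupby("Neighborhood").mean().reset_index()
print(toy_grouped)
```

Each row of the grouped table sums to 1, which is why the values can be read as frequencies and compared across neighborhoods with different venue counts.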

In [93]:
# Let's print each neighborhood along with the top 5 most common venues
num_top_venues = 5

for hood in manhattan_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = manhattan_grouped[manhattan_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Battery Park City----
                   venue  freq
0                   Park  0.15
1          Memorial Site  0.15
2             Food Court  0.10
3  Performing Arts Venue  0.05
4         Sandwich Place  0.05


----Carnegie Hill----
                  venue  freq
0           Coffee Shop  0.10
1  Gym / Fitness Center  0.10
2                   Gym  0.10
3    Italian Restaurant  0.10
4                   Spa  0.05


----Central Harlem----
                 venue  freq
0  American Restaurant  0.10
1    French Restaurant  0.10
2                  Bar  0.10
3            Juice Bar  0.05
4             Beer Bar  0.05


----Chelsea----
         venue  freq
0      Theater  0.05
1     Beer Bar  0.05
2    Speakeasy  0.05
3  Coffee Shop  0.05
4         Café  0.05


----Chinatown----
                venue  freq
0  Chinese Restaurant  0.15
1                 Spa  0.10
2      Sandwich Place  0.10
3              Bakery  0.05
4    Greek Restaurant  0.05


----Civic Center----
                             v

In [94]:
# Let's put that into a pandas dataframe
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [95]:
# Now let's create the new dataframe and display the top 10 venues for each neighborhood.
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = manhattan_grouped['Neighborhood']

for ind in np.arange(manhattan_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Battery Park City,Park,Memorial Site,Food Court,Burrito Place,Food Truck,Smoke Shop,Shopping Mall,Sandwich Place,Gym,Cupcake Shop
1,Carnegie Hill,Coffee Shop,Italian Restaurant,Gym,Gym / Fitness Center,Spa,Bagel Shop,Pizza Place,Community Center,Shoe Store,Café
2,Central Harlem,American Restaurant,French Restaurant,Bar,Dessert Shop,Gym / Fitness Center,Library,Ethiopian Restaurant,Music Venue,Beer Bar,Juice Bar
3,Chelsea,Hotel,Bar,French Restaurant,Seafood Restaurant,Scenic Lookout,Fish Market,Speakeasy,Market,Coffee Shop,New American Restaurant
4,Chinatown,Chinese Restaurant,Spa,Sandwich Place,Greek Restaurant,Pizza Place,Noodle House,New American Restaurant,Sake Bar,Cocktail Bar,Salon / Barbershop
5,Civic Center,Spa,Yoga Studio,Gym,Cuban Restaurant,Park,Falafel Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Burrito Place,French Restaurant
6,Clinton,Gym / Fitness Center,Theater,Indie Theater,Peruvian Restaurant,Comedy Club,Café,Building,Pizza Place,Sporting Goods Shop,Sports Bar
7,East Harlem,Mexican Restaurant,Thai Restaurant,French Restaurant,Pharmacy,Pet Store,Cuban Restaurant,Park,Doctor's Office,Sandwich Place,New American Restaurant
8,East Village,Vietnamese Restaurant,Dessert Shop,Bagel Shop,Park,Speakeasy,Korean Restaurant,Scandinavian Restaurant,Beer Store,Dog Run,Bar
9,Financial District,Coffee Shop,Gym / Fitness Center,Pizza Place,Restaurant,Café,French Restaurant,Shoe Store,Falafel Restaurant,Event Space,New American Restaurant
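
The ranking step can be checked on a tiny frequency table. The sketch below uses invented values and a hypothetical helper, `top_categories`, that mirrors the sorting idea of `return_most_common_venues` on a row that no longer contains the neighborhood name.

```python
import pandas as pd

def top_categories(row, n):
    """Return the n category names with the highest frequency in one row."""
    return row.sort_values(ascending=False).index[:n].tolist()

# Toy frequency table; the values are made up and contain no ties
toy_grouped = pd.DataFrame({
    "Neighborhood": ["A", "B"],
    "Café": [0.25, 0.6],
    "Gym": [0.15, 0.1],
    "Park": [0.6, 0.3],
})

freqs = toy_grouped.set_index("Neighborhood")
ranked = {hood: top_categories(row, 2) for hood, row in freqs.iterrows()}
print(ranked)
```

When several categories share the same frequency, as happens often with only 20 venues per neighborhood, their relative order after sorting is not guaranteed, so the lower ranks of such a table should be read with some caution.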


## CLUSTERING NEIGHBORHOODS

### We now apply the machine learning technique “clustering” (k-means) to segment the neighborhoods into groups with similar venue profiles. This supports an analysis from a tourist's perspective, because the venues of interest to tourists can be read off the clusters in which they are concentrated.

## Manhattan

In [96]:
# Run k-means to cluster the neighborhood into 5 clusters.
# set number of clusters
kclusters = 5

manhattan_grouped_clustering = manhattan_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manhattan_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 0, 2, 2, 4, 4, 3, 4, 1, 0], dtype=int32)
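
The notebook fixes the number of clusters at 5. One common way to sanity-check such a choice is to compare silhouette scores across several values of k. The sketch below does this on synthetic data, which is an assumption made so the example runs on its own. In the notebook, `X` would be replaced by `manhattan_grouped_clustering`, and the preferred k may well differ from the one found here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the clustering input: three well-separated blobs
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in centers])

# Fit k-means for a range of k and record the mean silhouette score of each fit
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k, scores)
```

A higher silhouette score indicates tighter, better-separated clusters. It is a heuristic, so it is best used alongside a look at the cluster contents rather than as the sole criterion.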

In [97]:
# Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood
manhattan_merged = manhattan_data.copy()

# add clustering labels
# NOTE: kmeans.labels_ follows the row order of manhattan_grouped, so this positional
# assignment assumes manhattan_data lists the neighborhoods in that same order
manhattan_merged['Cluster Labels'] = kmeans.labels_

# merge neighborhoods_venues_sorted with manhattan_data to add the top venues for each neighborhood
manhattan_merged = manhattan_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

manhattan_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Manhattan,Marble Hill,40.876551,-73.91066,1,Coffee Shop,Gym,Yoga Studio,Tennis Stadium,Pizza Place,Department Store,Diner,Discount Store,Donut Shop,Sandwich Place
1,Manhattan,Chinatown,40.715618,-73.994279,0,Chinese Restaurant,Spa,Sandwich Place,Greek Restaurant,Pizza Place,Noodle House,New American Restaurant,Sake Bar,Cocktail Bar,Salon / Barbershop
2,Manhattan,Washington Heights,40.851903,-73.9369,2,Wine Shop,Café,Park,Deli / Bodega,Bakery,Market,Coffee Shop,New American Restaurant,Frozen Yogurt Shop,Restaurant
3,Manhattan,Inwood,40.867684,-73.92121,2,Bakery,Park,Yoga Studio,Deli / Bodega,Diner,Restaurant,Farmers Market,Café,Mexican Restaurant,Pharmacy
4,Manhattan,Hamilton Heights,40.823604,-73.949688,4,Yoga Studio,Caribbean Restaurant,Mexican Restaurant,Cocktail Bar,Historic Site,Mediterranean Restaurant,Coffee Shop,Café,Bar,Bakery
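
Cluster labels come back in the row order of the table that was clustered, which after a `groupby` is sorted by neighborhood name. A sketch of one way to keep labels and neighborhoods aligned regardless of row order is shown below: attach the labels to the sorted table first, then join by name. All data and the `toy_*` names are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-ins: source table in arbitrary order, grouped table sorted by name
toy_data = pd.DataFrame({
    "Neighborhood": ["Marble Hill", "Chinatown", "Inwood"],
    "Latitude": [40.88, 40.72, 40.87],
})
toy_sorted = pd.DataFrame({
    "Neighborhood": ["Chinatown", "Inwood", "Marble Hill"],
    "1st Most Common Venue": ["Chinese Restaurant", "Bakery", "Coffee Shop"],
})
toy_labels = np.array([0, 2, 1])  # one label per row of toy_sorted

# Attach labels to the table they were computed from, then join by name
labeled = toy_sorted.copy()
labeled.insert(0, "Cluster Labels", toy_labels)
toy_merged = toy_data.join(labeled.set_index("Neighborhood"), on="Neighborhood")
print(toy_merged)
```

Because the join matches on the neighborhood name, each label stays with the neighborhood it was computed for even though the two tables list the rows in different orders.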


In [98]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'], manhattan_merged['Longitude'], manhattan_merged['Neighborhood'], manhattan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
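
In the color setup of the map cell, the arrays `x` and `ys` are only used for their length. A shorter sketch that produces one hex color per cluster is given below; `cluster_palette` is a hypothetical helper name, and indexing the result with the cluster label directly (rather than `cluster-1`) is a design choice of this sketch, not something taken from the notebook.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

def cluster_palette(n_clusters):
    """Return one hex color per cluster, evenly spaced along the rainbow colormap."""
    return [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, n_clusters))]

palette = cluster_palette(5)
print(palette)
```

A marker for a neighborhood in cluster `c` could then use `palette[c]` for both `color` and `fill_color`.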

## EXAMINE CLUSTERS

### Now we can examine each cluster and determine the venue categories that distinguish it. Based on those defining categories, we can then assign a name to each cluster.
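
One way to make the search for distinguishing categories less subjective is to compare each cluster's mean category frequency with the overall mean. The sketch below does this on a made-up table; the column values and the name `lift` are assumptions for illustration, and in the notebook the input would be the grouped frequency table with the cluster labels attached.

```python
import pandas as pd

# Toy frequency table with cluster labels already attached (values are made up)
toy = pd.DataFrame({
    "Cluster Labels": [0, 0, 1, 1],
    "Café": [0.5, 0.4, 0.1, 0.0],
    "Park": [0.1, 0.2, 0.6, 0.7],
    "Hotel": [0.4, 0.4, 0.3, 0.3],
})

# Mean frequency of each category within each cluster
cluster_means = toy.groupby("Cluster Labels").mean()

# How far each cluster's mean frequency sits above the overall mean
lift = cluster_means - toy.drop(columns="Cluster Labels").mean()
top_per_cluster = lift.idxmax(axis=1)
print(top_per_cluster.to_dict())
```

Categories with the largest positive difference are candidates for the cluster name, whereas categories that are common everywhere (such as `Hotel` in this toy table) score close to zero.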

## Manhattan

### Residential

In [99]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 0, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Chinatown,Chinese Restaurant,Spa,Sandwich Place,Greek Restaurant,Pizza Place,Noodle House,New American Restaurant,Sake Bar,Cocktail Bar,Salon / Barbershop
9,Yorkville,Wine Shop,Coffee Shop,Italian Restaurant,Deli / Bodega,Bagel Shop,Sushi Restaurant,Beer Store,Park,Dog Run,Gym
11,Roosevelt Island,Deli / Bodega,Residential Building (Apartment / Condo),Soccer Field,Food & Drink Shop,Farmers Market,School,Sandwich Place,Liquor Store,Coffee Shop,Bus Line
18,Greenwich Village,Italian Restaurant,French Restaurant,Café,Sushi Restaurant,Yoga Studio,Gourmet Shop,Coffee Shop,Snack Place,Beer Bar,Sandwich Place
22,Little Italy,Ice Cream Shop,Wine Bar,Sandwich Place,Coffee Shop,Thai Restaurant,Snack Place,Chinese Restaurant,Spanish Restaurant,French Restaurant,Salon / Barbershop


### Commercial Places

In [100]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 1, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Marble Hill,Coffee Shop,Gym,Yoga Studio,Tennis Stadium,Pizza Place,Department Store,Diner,Discount Store,Donut Shop,Sandwich Place
8,Upper East Side,Hotel,Gym / Fitness Center,Hotel Bar,Park,Optical Shop,Coffee Shop,Sandwich Place,Chocolate Shop,Shoe Store,French Restaurant
10,Lenox Hill,Gym,Thai Restaurant,Middle Eastern Restaurant,Taco Place,Cycle Studio,Pizza Place,Dessert Shop,Restaurant,College Academic Building,Salad Place
15,Midtown,Hotel,Cuban Restaurant,Bookstore,Sporting Goods Shop,Spa,Smoke Shop,Salad Place,Clothing Store,Park,Szechuan Restaurant
25,Manhattan Valley,Bar,Yoga Studio,Bakery,Pizza Place,Park,Coffee Shop,Chinese Restaurant,Fried Chicken Joint,Mexican Restaurant,Grocery Store
26,Morningside Heights,Bookstore,Park,American Restaurant,Coffee Shop,Ice Cream Shop,Pub,Outdoor Sculpture,Salad Place,Sandwich Place,Seafood Restaurant
28,Battery Park City,Park,Memorial Site,Food Court,Burrito Place,Food Truck,Smoke Shop,Shopping Mall,Sandwich Place,Gym,Cupcake Shop
30,Carnegie Hill,Coffee Shop,Italian Restaurant,Gym,Gym / Fitness Center,Spa,Bagel Shop,Pizza Place,Community Center,Shoe Store,Café
32,Civic Center,Spa,Yoga Studio,Gym,Cuban Restaurant,Park,Falafel Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Burrito Place,French Restaurant
33,Midtown South,Korean Restaurant,Hotel,Building,Fried Chicken Joint,Scenic Lookout,Boutique,Gift Shop,Clothing Store,Grocery Store,Coffee Shop


### Tourist Areas & Hubs

In [101]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 2, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Washington Heights,Wine Shop,Café,Park,Deli / Bodega,Bakery,Market,Coffee Shop,New American Restaurant,Frozen Yogurt Shop,Restaurant
3,Inwood,Bakery,Park,Yoga Studio,Deli / Bodega,Diner,Restaurant,Farmers Market,Café,Mexican Restaurant,Pharmacy
12,Upper West Side,Italian Restaurant,American Restaurant,Nail Salon,Movie Theater,Greek Restaurant,Pub,Bar,Bakery,Bagel Shop,Bookstore
13,Lincoln Square,Theater,Concert Hall,Performing Arts Venue,Indie Movie Theater,Gym / Fitness Center,Plaza,Circus,Opera House,College Arts Building,Library
19,East Village,Vietnamese Restaurant,Dessert Shop,Bagel Shop,Park,Speakeasy,Korean Restaurant,Scandinavian Restaurant,Beer Store,Dog Run,Bar
20,Lower East Side,Coffee Shop,Cocktail Bar,Art Gallery,Yoga Studio,Juice Bar,Performing Arts Venue,Filipino Restaurant,Clothing Store,Chinese Restaurant,Café
21,Tribeca,American Restaurant,Park,Yoga Studio,Sushi Restaurant,Café,Cycle Studio,Dog Run,Greek Restaurant,Hotel,Italian Restaurant
27,Gramercy,Pizza Place,Coffee Shop,Yoga Studio,Beer Bar,Irish Pub,Liquor Store,Gourmet Shop,Mexican Restaurant,Filipino Restaurant,Playground
34,Sutton Place,Yoga Studio,Beer Garden,Grocery Store,Gym,Steakhouse,Bakery,Beer Store,Deli / Bodega,Design Studio,French Restaurant
36,Tudor City,Park,Yoga Studio,Taco Place,Japanese Restaurant,Hawaiian Restaurant,Gym,Deli / Bodega,Pizza Place,Salad Place,Seafood Restaurant


### Center of Activity

In [102]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 3, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Central Harlem,American Restaurant,French Restaurant,Bar,Dessert Shop,Gym / Fitness Center,Library,Ethiopian Restaurant,Music Venue,Beer Bar,Juice Bar
14,Clinton,Gym / Fitness Center,Theater,Indie Theater,Peruvian Restaurant,Comedy Club,Café,Building,Pizza Place,Sporting Goods Shop,Sports Bar
17,Chelsea,Hotel,Bar,French Restaurant,Seafood Restaurant,Scenic Lookout,Fish Market,Speakeasy,Market,Coffee Shop,New American Restaurant


### Cultural & Going Out Places

In [103]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 4, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Hamilton Heights,Yoga Studio,Caribbean Restaurant,Mexican Restaurant,Cocktail Bar,Historic Site,Mediterranean Restaurant,Coffee Shop,Café,Bar,Bakery
5,Manhattanville,Italian Restaurant,Bar,Climbing Gym,Supermarket,Coffee Shop,Bike Trail,Gastropub,Ramen Restaurant,Juice Bar,Dumpling Restaurant
7,East Harlem,Mexican Restaurant,Thai Restaurant,French Restaurant,Pharmacy,Pet Store,Cuban Restaurant,Park,Doctor's Office,Sandwich Place,New American Restaurant
16,Murray Hill,Burger Joint,Coffee Shop,Japanese Restaurant,Gym,Shanghai Restaurant,Café,Mediterranean Restaurant,Speakeasy,Museum,Sandwich Place
23,Soho,Women's Store,Salon / Barbershop,Men's Store,Yoga Studio,Supermarket,Dessert Shop,Cupcake Shop,Clothing Store,Miscellaneous Shop,Mediterranean Restaurant
24,West Village,Cocktail Bar,Italian Restaurant,Coffee Shop,Gourmet Shop,French Restaurant,Boutique,Mediterranean Restaurant,Park,Cosmetics Shop,Bakery
29,Financial District,Coffee Shop,Gym / Fitness Center,Pizza Place,Restaurant,Café,French Restaurant,Shoe Store,Falafel Restaurant,Event Space,New American Restaurant
31,Noho,Rock Club,Wine Shop,French Restaurant,Boutique,Coffee Shop,Deli / Bodega,Southern / Soul Food Restaurant,Bookstore,Sandwich Place,Gourmet Shop
35,Turtle Bay,Karaoke Bar,Residential Building (Apartment / Condo),Seafood Restaurant,Museum,Farmers Market,Boxing Gym,Cocktail Bar,Gift Shop,Coffee Shop,Greek Restaurant


# RESULTS

### After clustering the neighborhoods, both boroughs (Downtown Toronto and Manhattan) show venues that can be explored and that attract tourists. The neighborhoods are very similar in features such as theaters, opera houses, food places, clubs, museums and parks. They differ mainly in a few unique places, such as historical sites and monuments.

# Observations & Recommendations

### When we compare the tourist places, we observe that the historical place is situated only in Downtown Toronto, while the monument or landmark venue is in the Manhattan neighborhoods. Similarly, the airport facility, harbor, sculpture garden and boat or ferry services are available in Downtown Toronto, while venues such as nightlife, a climbing gym and museums are present in Manhattan.

### As for recommendations, we recommend visiting the Downtown Toronto neighborhoods first. Tourists have easy travel access thanks to the airport facility, which saves both time and money. The money saved can be used to explore more of the attractive venues.

# Conclusion

### The Downtown Toronto and Manhattan neighborhoods have largely similar venues. Every place is unique in its own way, and that holds for both sets of neighborhoods. They differ in some venues and facilities, but not to a large extent.