## Coursera Capstone Project - The Battle of the Neighborhoods (Week 2)
## Fifth Assignment

### Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

### Introduction: Business problem

Toronto, the capital of the province of Ontario, is a major Canadian city on Lake Ontario’s northwestern shore. It is a dynamic metropolis with a core of skyscrapers, all dwarfed by the iconic, free-standing CN Tower.

New York City, by contrast, comprises five boroughs sitting where the Hudson River meets the Atlantic Ocean. At its core is Manhattan, a densely populated borough that’s among the world’s major commercial, financial and cultural centers. Its iconic sites include skyscrapers such as the Empire State Building and sprawling Central Park.

Suppose an Italian firm located in Texas City, United States, decides to move its headquarters to either New York City or Toronto, Canada, and does not know which city suits it better. The firm wants to understand the local businesses and neighborhoods before choosing a location. This project compares the neighborhoods of New York City and Toronto, identifies their differences and similarities, groups the neighborhoods into clusters, visualizes these clusters on a map, and uses the result to support the decision.

The target audience is investors who are interested in moving their headquarters to the better-suited city and who need objective advice on choosing a location for the company and its employees.


### Data

The data sources and tools used in this project are:

a. Websites that provide the boroughs and neighborhoods of New York City and Toronto and their locations: https://geo.nyu.edu/catalog/nyu_2451_34572 and https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M.

b. The Foursquare API, which returns venues and their categories for each location within a radius of 700 meters.

c. The Geopy and Folium libraries, used to get the coordinates of each city and to draw the maps. The coordinates of the Toronto postal codes come from https://cocl.us/Geospatial_data.

d. The k-means algorithm, used to cluster the venues of each neighborhood and to analyze the top 10 most common venues in each cluster.

e. Maps of the clusters, which show the best locations.

f. The pandas library, used for data manipulation and analysis.

g. The NumPy library, used to work with arrays.

h. The Requests library, used to send HTTP/1.1 requests.

i. The Matplotlib library, used to create static, animated, and interactive visualizations in Python.

j. The json module, used to handle data exchanged as text over a network.

k. The urllib module, used to fetch URLs (Uniform Resource Locators).

l. The bs4 (Beautiful Soup) library, used to pull data out of HTML and XML files.


### Methodology

In [1]:
!pip install BeautifulSoup4
!pip install geopy
import pandas as pd
import numpy as np
import requests
import folium 
import matplotlib.cm as cm
import matplotlib.colors as colors
import json

from urllib.request import urlopen
from geopy.geocoders import Nominatim 
from pandas import json_normalize  # pandas >= 1.0; pandas.io.json.json_normalize is deprecated
from sklearn.cluster import KMeans
from bs4 import BeautifulSoup

print('Libraries imported.')


##### Explore Toronto, Canada dataset

In [2]:
source = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
page = urlopen(source).read().decode('utf-8')
soup = BeautifulSoup(page, 'html.parser')

In [3]:
table = soup.table.tbody

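def table_cell(i):
    """Return the text of every <td> cell in the table row `i`."""
    cells = i.find_all('td')
    row = []
    
    for cell in cells:
        # Prefer the link text when the cell contains a non-empty link.
        if cell.a and cell.a.text:
            row.append(cell.a.text)
            continue
        # get_text() also works when the cell has nested tags,
        # where cell.string would be None.
        row.append(cell.get_text(strip=True))
        
    return row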

def table_row():    
    data = []  
    
    for tr in table.find_all('tr'):
        row = table_cell(tr)
        if len(row) != 3:
            continue
        data.append(row)        
    
    return data

In [4]:
data = table_row()
columns = ['PostalCode', 'Borough', 'Neighborhood']
df_Toronto = pd.DataFrame(data, columns = columns)
df_Toronto.head()

| | PostalCode | Borough | Neighborhood |
| --- | --- | --- | --- |
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |


In [5]:
df_Tor = df_Toronto[df_Toronto.Borough != 'Not assigned']
df_Tor = df_Tor.sort_values(by=['PostalCode','Borough'])
df_Tor.reset_index(inplace = True)
df_Tor.drop('index',axis = 1, inplace = True)
df_Tor.head()

| | PostalCode | Borough | Neighborhood |
| --- | --- | --- | --- |
| 0 | M1B | Scarborough | Malvern, Rouge |
| 1 | M1C | Scarborough | Rouge Hill, Port Union, Highland Creek |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |


In [6]:
df_Tor.shape

(103, 3)

##### Get the latitude and longitude coordinates of each Postal Code

In [7]:
latit_longi = pd.read_csv('https://cocl.us/Geospatial_data')

In [8]:
latit_longi.head()

| | Postal Code | Latitude | Longitude |
| --- | --- | --- | --- |
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


##### Join the latitude and longitude coordinates with the neighborhood dataframe

In [9]:
df_Tor = df_Tor.join(latit_longi.set_index('Postal Code'), on = 'PostalCode')
df_Tor.head()

| | PostalCode | Borough | Neighborhood | Latitude | Longitude |
| --- | --- | --- | --- | --- | --- |
| 0 | M1B | Scarborough | Malvern, Rouge | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Rouge Hill, Port Union, Highland Creek | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |


In [10]:
print('Toronto has {} boroughs and {} neighborhoods.'.format(len(df_Tor['Borough'].unique()),df_Tor.shape[0]))

Toronto has 10 boroughs and 103 neighborhoods.
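
`DataFrame.join` with `on=` performs a left join by default, so a postal code with no match in the coordinates file would keep its row and receive missing coordinates rather than being dropped. The sketch below mimics that behavior in plain Python on a few rows. The first two rows are copied from the tables above; the `M9Z` row is made up purely to show the unmatched case, and `left_join` is a hypothetical helper, not part of pandas.

```python
# First two rows come from the tables above; the M9Z row is a made-up example.
neighborhoods = [
    {"PostalCode": "M1B", "Borough": "Scarborough", "Neighborhood": "Malvern, Rouge"},
    {"PostalCode": "M1C", "Borough": "Scarborough",
     "Neighborhood": "Rouge Hill, Port Union, Highland Creek"},
    {"PostalCode": "M9Z", "Borough": "Example", "Neighborhood": "No coordinates"},
]

# Postal code -> (latitude, longitude), as in the geospatial CSV.
coordinates = {
    "M1B": (43.806686, -79.194353),
    "M1C": (43.784535, -79.160497),
}


def left_join(rows, coords):
    """Keep every row; fill coordinates with None when the code is unknown."""
    joined = []
    for row in rows:
        lat, lng = coords.get(row["PostalCode"], (None, None))
        joined.append({**row, "Latitude": lat, "Longitude": lng})
    return joined


joined = left_join(neighborhoods, coordinates)
missing = [r["PostalCode"] for r in joined if r["Latitude"] is None]
print(missing)
```

Checking for missing coordinates right after the join is a cheap way to catch postal codes that the CSV does not cover before they reach the map or the API calls.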


##### Get the latitude and longitude values of Toronto

In [11]:
address = 'Toronto, Canada'
geolocator = Nominatim(user_agent = 'Toronto_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto, Canada are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto, Canada are 43.6534817, -79.3839347.


##### Create a map of Toronto with neighborhoods superimposed on top.

In [12]:
map_Toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

for lat, lng, borough, neighborhood in zip(df_Tor['Latitude'], df_Tor['Longitude'], df_Tor['Borough'], df_Tor['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Toronto)  
    
map_Toronto

##### Work only with the Downtown Toronto borough

In [13]:
Downtown_Toronto_data = df_Tor[df_Tor['Borough'].str.contains('Downtown Toronto')].reset_index(drop=True)
Downtown_Toronto_data.head()

| | PostalCode | Borough | Neighborhood | Latitude | Longitude |
| --- | --- | --- | --- | --- | --- |
| 0 | M4W | Downtown Toronto | Rosedale | 43.679563 | -79.377529 |
| 1 | M4X | Downtown Toronto | St. James Town, Cabbagetown | 43.667967 | -79.367675 |
| 2 | M4Y | Downtown Toronto | Church and Wellesley | 43.66586 | -79.38316 |
| 3 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 4 | M5B | Downtown Toronto | Garden District, Ryerson | 43.657162 | -79.378937 |


In [14]:
Downtown_Toronto_data.shape

(19, 5)

##### Create a map of Downtown of Toronto

In [15]:
map_Downtown_Toronto_data = folium.Map(location=[latitude, longitude], zoom_start = 12)
for lat, lng, borough, neighborhood in zip(
       Downtown_Toronto_data['Latitude'], 
       Downtown_Toronto_data['Longitude'], 
       Downtown_Toronto_data['Borough'], 
       Downtown_Toronto_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Downtown_Toronto_data)  

map_Downtown_Toronto_data

##### Explore Toronto neighborhoods with Foursquare API
##### Define Foursquare Credentials and Version

In [16]:
import os

# Read the Foursquare credentials from environment variables instead of
# hard-coding them, so they are not published with the notebook.
# The variable names below are this notebook's own choice (an assumption).
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20180605'  # Foursquare API version date
LIMIT = 100  # maximum number of venues returned per request

print('Credentials loaded.')


##### Get the first neighborhood's name in our dataframe.

In [17]:
name_neighbor = Downtown_Toronto_data.loc[0, 'Neighborhood']
print(f"The first neighborhood's name is '{name_neighbor}'.")

The first neighborhood's name is 'Rosedale'.


##### Get the neighborhood's latitude and longitude values of Rosedale

In [18]:
neighborhood_latitude = Downtown_Toronto_data.loc[0, 'Latitude'] 
neighborhood_longitude = Downtown_Toronto_data.loc[0, 'Longitude'] 

print('Latitude and longitude values of {} are {}, {}.'.format(name_neighbor, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Rosedale are 43.6795626, -79.37752940000001.


##### Top 100 venues that are in Rosedale within a radius of 700 meters

In [19]:
LIMIT = 100 
radius = 700 

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url 

'https://api.foursquare.com/v2/venues/explore?&client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&v=20180605&ll=43.6795626,-79.37752940000001&radius=700&limit=100'
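
The request URL can also be assembled from a dictionary of parameters, which avoids hand-written `&` separators and takes care of escaping. The following is a minimal sketch using only the standard library; `build_explore_url` is a hypothetical helper, and the credentials are placeholders rather than real values.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the venues/explore URL from named query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",  # latitude,longitude as one comma-separated value
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


# Placeholder credentials; the coordinates are those of Rosedale.
url_example = build_explore_url(
    "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20180605",
    43.6795626, -79.3775294, 700, 100,
)

# Parse the URL back to confirm that the parameters survived encoding.
query = parse_qs(urlsplit(url_example).query)
print(query["ll"], query["radius"], query["limit"])
```

With the Requests library, the same dictionary can instead be passed through the `params` argument of `requests.get`, which performs the encoding itself.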

##### Send the GET request and examine the results

In [20]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '601a4d51f81e5f4b32d5fe6d'},
 'response': {'headerLocation': 'Rosedale',
  'headerFullLocation': 'Rosedale, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 5,
  'suggestedBounds': {'ne': {'lat': 43.685862606300006,
    'lng': -79.36883453584119},
   'sw': {'lat': 43.67326259369999, 'lng': -79.38622426415884}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4c0960fb009a0f472fabe7bf',
       'name': 'Craigleigh Gardens',
       'location': {'address': '160 South Drive',
        'crossStreet': 'at Elm Ave',
        'lat': 43.67809940868806,
        'lng': -79.37158584594727,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.67809940868806,
          'lng': -79.37158584594727}],
        'distan

In [21]:
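def get_category_type(row):
    # The category list is stored under 'categories' or, after
    # json_normalize, under 'venue.categories'.
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    # Return the name of the first category, or None if there is none.
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']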

In [22]:
venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues) 

filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

  


| | name | categories | lat | lng |
| --- | --- | --- | --- | --- |
| 0 | Craigleigh Gardens | Park | 43.678099 | -79.371586 |
| 1 | Rosedale Park | Playground | 43.682328 | -79.378934 |
| 2 | Whitney Park | Park | 43.682036 | -79.373788 |
| 3 | Alex Murray Parkette | Park | 43.6783 | -79.382773 |
| 4 | Milkman's Lane | Trail | 43.676352 | -79.373842 |


In [23]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

5 venues were returned by Foursquare.
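
As a sanity check on the 700 meter radius, the great-circle distance between the neighborhood coordinates and each returned venue can be estimated with the haversine formula. This is only a sketch: it assumes a spherical Earth with a mean radius of 6,371 km, and Foursquare may compute distances differently. The coordinates are copied from the outputs above, and `haversine_m` is a hypothetical helper name.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (an approximation)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Rosedale coordinates and the five venues listed in the table above.
center = (43.6795626, -79.3775294)
venues_sample = {
    "Craigleigh Gardens": (43.678099, -79.371586),
    "Rosedale Park": (43.682328, -79.378934),
    "Whitney Park": (43.682036, -79.373788),
    "Alex Murray Parkette": (43.6783, -79.382773),
    "Milkman's Lane": (43.676352, -79.373842),
}

distances = {
    name: haversine_m(center[0], center[1], lat, lng)
    for name, (lat, lng) in venues_sample.items()
}
for name, d in distances.items():
    print(f"{name}: {d:.0f} m")
```

Under these assumptions every listed venue falls inside the requested radius, which is consistent with the query parameters used for Rosedale.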


##### Explore neighborhoods in Toronto

In [24]:
def getNearbyVenues(names, latitudes, longitudes, radius = 700):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
       
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])
    
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)   
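
`getNearbyVenues` indexes `v['venue']['categories'][0]` directly, which would raise an `IndexError` for a venue that has an empty category list. The sketch below shows one defensive way to build the same seven-field rows from a list of response items. The item structure follows the JSON output shown earlier; the second sample item is made up to exercise the empty-category case, and `extract_venue_rows` is a hypothetical helper.

```python
def extract_venue_rows(neighborhood, lat, lng, items):
    """Turn explore-response items into (neighborhood, ..., category) tuples."""
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        # Use the first category name when present, otherwise None.
        category = categories[0]["name"] if categories else None
        rows.append((
            neighborhood,
            lat,
            lng,
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            category,
        ))
    return rows


sample_items = [
    {"venue": {"name": "Craigleigh Gardens",
               "location": {"lat": 43.67809940868806, "lng": -79.37158584594727},
               "categories": [{"name": "Park"}]}},
    # Made-up item with no categories, to show the fallback.
    {"venue": {"name": "Uncategorized place",
               "location": {"lat": 43.68, "lng": -79.37},
               "categories": []}},
]

rows = extract_venue_rows("Rosedale", 43.6795626, -79.3775294, sample_items)
for row in rows:
    print(row)
```

Rows with a `None` category can then be dropped or relabeled before one-hot encoding, depending on how uncategorized venues should be treated.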

In [25]:
Downtown_Toronto_venues = getNearbyVenues(names =  Downtown_Toronto_data['Neighborhood'],
                                   latitudes =  Downtown_Toronto_data['Latitude'],
                                   longitudes =  Downtown_Toronto_data['Longitude'])

Rosedale
St. James Town, Cabbagetown
Church and Wellesley
Regent Park, Harbourfront
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Stn A PO Boxes
First Canadian Place, Underground city
Christie
Queen's Park, Ontario Provincial Government


In [26]:
print(Downtown_Toronto_venues.shape)
Downtown_Toronto_venues.head()

(1548, 7)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Rosedale | 43.679563 | -79.377529 | Craigleigh Gardens | 43.678099 | -79.371586 | Park |
| 1 | Rosedale | 43.679563 | -79.377529 | Rosedale Park | 43.682328 | -79.378934 | Playground |
| 2 | Rosedale | 43.679563 | -79.377529 | Whitney Park | 43.682036 | -79.373788 | Park |
| 3 | Rosedale | 43.679563 | -79.377529 | Alex Murray Parkette | 43.6783 | -79.382773 | Park |
| 4 | Rosedale | 43.679563 | -79.377529 | Milkman's Lane | 43.676352 | -79.373842 | Trail |


##### How many venues were returned for each neighborhood

In [27]:
Downtown_Toronto_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
| --- | --- | --- | --- | --- | --- | --- |
| Berczy Park | 100 | 100 | 100 | 100 | 100 | 100 |
| CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport | 26 | 26 | 26 | 26 | 26 | 26 |
| Central Bay Street | 94 | 94 | 94 | 94 | 94 | 94 |
| Christie | 26 | 26 | 26 | 26 | 26 | 26 |
| Church and Wellesley | 100 | 100 | 100 | 100 | 100 | 100 |
| Commerce Court, Victoria Hotel | 100 | 100 | 100 | 100 | 100 | 100 |
| First Canadian Place, Underground city | 100 | 100 | 100 | 100 | 100 | 100 |
| Garden District, Ryerson | 100 | 100 | 100 | 100 | 100 | 100 |
| Harbourfront East, Union Station, Toronto Islands | 95 | 95 | 95 | 95 | 95 | 95 |
| Kensington Market, Chinatown, Grange Park | 100 | 100 | 100 | 100 | 100 | 100 |


##### How many unique categories can be curated from all the returned venues

In [28]:
print('There are {} unique categories.'.format(len(Downtown_Toronto_venues['Venue Category'].unique())))

There are 224 unique categories.


In [29]:
print('There are {} distinct venues in {} categories.'.format(len(Downtown_Toronto_venues['Venue'].unique()),len(Downtown_Toronto_venues['Venue Category'].unique())))

There are 816 distinct venues in 224 categories.


##### Analyze Each Neighborhood

In [30]:
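# One-hot encode the venue categories (one column per category).
Downtown_Toronto_onehot = pd.get_dummies(Downtown_Toronto_venues[['Venue Category']], prefix="", prefix_sep="")
# Add the neighborhood of each venue as a column.
Downtown_Toronto_onehot['Neighborhood'] = Downtown_Toronto_venues['Neighborhood'] 
# Note: the reordered frame is bound to DowntownToronto_onehot (no underscore after
# "Downtown"), so Downtown_Toronto_onehot keeps its original column order below.
fixed_columns = [Downtown_Toronto_onehot.columns[-1]] + list(Downtown_Toronto_onehot.columns[:-1])
DowntownToronto_onehot = Downtown_Toronto_onehot[fixed_columns]
Downtown_Toronto_onehot.head()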

Unnamed: 0,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Animal Shelter,Antique Shop,...,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [31]:
Downtown_Toronto_onehot.shape

(1548, 224)

##### Group rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [32]:
Downtown_Toronto_grouped = Downtown_Toronto_onehot.groupby('Neighborhood').mean().reset_index()
Downtown_Toronto_grouped

Unnamed: 0,Neighborhood,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Animal Shelter,...,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Yoga Studio
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01
1,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.038462,0.038462,0.038462,0.076923,0.115385,0.115385,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.010638,0.010638,0.010638,0.0,0.010638,0.0,0.0,0.0,0.010638
3,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Church and Wellesley,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,...,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02
5,"Commerce Court, Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,...,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0
6,"First Canadian Place, Underground city",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,...,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0
7,"Garden District, Ryerson",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01
8,"Harbourfront East, Union Station, Toronto Islands",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.010526,0.0,0.0,0.0,0.0,0.0,0.0,0.010526
9,"Kensington Market, Chinatown, Grange Park",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.01,0.0,0.07,0.0,0.0,0.02,0.01,0.0,0.0,0.02


In [33]:
Downtown_Toronto_grouped.shape

(19, 224)
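
Taking the mean of one-hot columns per neighborhood is the same as computing, for each neighborhood, the share of its venues that fall into each category. The plain-Python sketch below illustrates this on a handful of rows. The Rosedale rows match the five venues returned earlier; the "Toy Neighborhood" rows are made up, and `category_frequencies` is a hypothetical helper.

```python
from collections import Counter, defaultdict

# (neighborhood, venue category) pairs; the last two rows are made up.
venue_rows = [
    ("Rosedale", "Park"),
    ("Rosedale", "Playground"),
    ("Rosedale", "Park"),
    ("Rosedale", "Park"),
    ("Rosedale", "Trail"),
    ("Toy Neighborhood", "Coffee Shop"),
    ("Toy Neighborhood", "Café"),
]


def category_frequencies(rows):
    """Return {neighborhood: {category: share of that neighborhood's venues}}."""
    counts = defaultdict(Counter)
    for neighborhood, category in rows:
        counts[neighborhood][category] += 1

    freqs = {}
    for neighborhood, counter in counts.items():
        total = sum(counter.values())
        freqs[neighborhood] = {cat: n / total for cat, n in counter.items()}
    return freqs


freqs = category_frequencies(venue_rows)
print(freqs["Rosedale"])
```

Because each row of the resulting table sums to 1, neighborhoods with very few venues (such as Rosedale with 5) are described by coarse frequencies, which is worth keeping in mind when reading the clusters.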

##### Print each neighborhood along with the top 5 most common venues

In [34]:
num_top_venues = 5

for hood in Downtown_Toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = Downtown_Toronto_grouped[Downtown_Toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending = False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Berczy Park----
         venue  freq
0  Coffee Shop  0.07
1        Hotel  0.06
2         Café  0.04
3   Restaurant  0.03
4     Beer Bar  0.03


----CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport----
                 venue  freq
0  Rental Car Location  0.12
1     Airport Terminal  0.12
2      Airport Service  0.12
3     Sculpture Garden  0.08
4          Coffee Shop  0.08


----Central Bay Street----
               venue  freq
0        Coffee Shop  0.15
1               Café  0.06
2        Art Gallery  0.05
3  French Restaurant  0.02
4                Bar  0.02


----Christie----
           venue  freq
0  Grocery Store  0.19
1           Café  0.15
2           Park  0.12
3    Coffee Shop  0.08
4     Restaurant  0.04


----Church and Wellesley----
                      venue  freq
0               Coffee Shop  0.10
1       Japanese Restaurant  0.06
2          Sushi Restaurant  0.04
3                      Café  0.03
4  Mediterrane

In [35]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending = False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

##### Display the top 10 venues for each neighborhood.

In [36]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

columns = ['Neighborhood']
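for ind in np.arange(num_top_venues):
    try:
        # Positions 1 to 3 use the suffixes in `indicators` (st, nd, rd).
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # Every later position falls back to the 'th' suffix.
        columns.append('{}th Most Common Venue'.format(ind+1))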

neighborhoods_venues_sorted = pd.DataFrame(columns = columns)
neighborhoods_venues_sorted['Neighborhood'] = Downtown_Toronto_grouped['Neighborhood']

for ind in np.arange(Downtown_Toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Downtown_Toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Hotel,Café,Restaurant,Japanese Restaurant,Cocktail Bar,Beer Bar,Cheese Shop,Seafood Restaurant,Park
1,"CN Tower, King and Spadina, Railway Lands, Har...",Rental Car Location,Airport Service,Airport Terminal,Boat or Ferry,Sculpture Garden,Coffee Shop,Harbor / Marina,Airport Lounge,Airport Gate,Airport Food Court
2,Central Bay Street,Coffee Shop,Café,Art Gallery,Tea Room,Sandwich Place,French Restaurant,Pizza Place,Gastropub,Clothing Store,Chinese Restaurant
3,Christie,Grocery Store,Café,Park,Coffee Shop,Restaurant,Bakery,Nightclub,Baby Store,Candy Store,Athletics & Sports
4,Church and Wellesley,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Hotel,Restaurant,Mediterranean Restaurant,Café,Gay Bar,Gym,Yoga Studio
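
The column labels above rely on a fixed list of three suffixes followed by a `th` fallback, which is correct for the top 10 but would yield labels such as "21th" if `num_top_venues` were raised above 20. A small helper that handles the general case could look like the sketch below; `ordinal` is a hypothetical name and is not used elsewhere in this notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th, 13th (and 111th, 112th, ...) always take 'th'.
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


# Rebuild the same column labels as in the cell above.
columns_example = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, 11)
]
print(columns_example)
```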


##### Cluster Downtown Neighborhoods of Toronto using K-means

In [37]:
kclusters = 5
Downtown_Toronto_grouped_clustering = Downtown_Toronto_grouped.drop(columns='Neighborhood')
kmeans = KMeans(n_clusters = kclusters, random_state=0).fit(Downtown_Toronto_grouped_clustering)
kmeans.labels_[0:10] 

array([0, 2, 4, 3, 4, 0, 0, 4, 0, 4], dtype=int32)
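
To make the clustering step less of a black box, here is a minimal pure-Python sketch of the basic k-means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points). It is not scikit-learn's implementation, which by default uses k-means++ initialization and further refinements; this sketch uses plain random initialization. The two-feature frequency vectors are made-up toy data, loosely shaped like "coffee shop heavy" and "park heavy" neighborhoods.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def kmeans(points, k, iterations=100, seed=0):
    """Basic k-means: returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)

    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels

        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_point(members)

    return labels, centroids


# Toy frequency vectors: [share of coffee shops, share of parks] (made up).
points = [
    [0.15, 0.02],
    [0.10, 0.03],
    [0.07, 0.01],
    [0.00, 0.60],
    [0.00, 0.50],
]

labels, centroids = kmeans(points, k=2)
print(labels)
```

The numeric labels themselves are arbitrary, which is why only the grouping (which neighborhoods share a label) should be interpreted, both here and in the `kmeans.labels_` array above.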

##### New dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [38]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Downtown_Toronto_merged = Downtown_Toronto_data
Downtown_Toronto_merged = Downtown_Toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
Downtown_Toronto_merged.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4W,Downtown Toronto,Rosedale,43.679563,-79.377529,1,Park,Playground,Trail,Dog Run,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant,Event Space,Ethiopian Restaurant
1,M4X,Downtown Toronto,"St. James Town, Cabbagetown",43.667967,-79.367675,4,Coffee Shop,Bakery,Restaurant,Grocery Store,Park,Café,Pizza Place,Pub,Japanese Restaurant,Italian Restaurant
2,M4Y,Downtown Toronto,Church and Wellesley,43.66586,-79.38316,4,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Hotel,Restaurant,Mediterranean Restaurant,Café,Gay Bar,Gym,Yoga Studio
3,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,4,Coffee Shop,Restaurant,Park,Theater,Café,Bakery,Pub,Breakfast Spot,Performing Arts Venue,Thai Restaurant
4,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,4,Coffee Shop,Hotel,Gastropub,Burger Joint,Sandwich Place,Ramen Restaurant,Clothing Store,Falafel Restaurant,Movie Theater,Diner


In [39]:
map_clusters_Downtown_Toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

markers_colors = []
for lat, lon, poi, cluster in zip(Downtown_Toronto_merged['Latitude'], Downtown_Toronto_merged['Longitude'], Downtown_Toronto_merged['Neighborhood'], Downtown_Toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_Downtown_Toronto)
       
map_clusters_Downtown_Toronto
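
In the cell above, `rainbow[cluster-1]` gives cluster label 0 the last color in the list through negative indexing; each cluster still gets its own color, but the mapping is shifted by one. A sketch of a direct label-to-color mapping, using only the standard library instead of Matplotlib, is given below. `cluster_palette` is a hypothetical helper, and the saturation and brightness values are arbitrary choices.

```python
import colorsys


def cluster_palette(n_clusters):
    """Return n_clusters evenly spaced hues as '#rrggbb' strings."""
    palette = []
    for i in range(n_clusters):
        r, g, b = colorsys.hsv_to_rgb(i / n_clusters, 0.85, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette


palette = cluster_palette(5)

# Index the palette directly by cluster label (0 to 4).
label_to_color = {label: palette[label] for label in range(5)}
print(label_to_color)
```

The resulting hex strings can be passed to the `color` and `fill_color` arguments of `folium.CircleMarker` in the same way as the `rainbow` list.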

##### Examine Clusters
##### Cluster 1

In [40]:
Downtown_Toronto_merged.loc[Downtown_Toronto_merged['Cluster Labels'] == 0, Downtown_Toronto_merged.columns[[1] + list(range(5, Downtown_Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Downtown Toronto,0,Coffee Shop,Café,Seafood Restaurant,Bakery,Gastropub,Pizza Place,Cosmetics Shop,Gym,Restaurant,American Restaurant
6,Downtown Toronto,0,Coffee Shop,Hotel,Café,Restaurant,Japanese Restaurant,Cocktail Bar,Beer Bar,Cheese Shop,Seafood Restaurant,Park
8,Downtown Toronto,0,Café,Coffee Shop,Hotel,Clothing Store,Theater,Restaurant,Gastropub,Cosmetics Shop,Sandwich Place,Breakfast Spot
9,Downtown Toronto,0,Coffee Shop,Hotel,Boat or Ferry,Plaza,Brewery,Café,Park,Scenic Lookout,Aquarium,Sushi Restaurant
10,Downtown Toronto,0,Hotel,Coffee Shop,Café,Restaurant,Japanese Restaurant,American Restaurant,Gym,Seafood Restaurant,Theater,Concert Hall
11,Downtown Toronto,0,Coffee Shop,Hotel,Café,Japanese Restaurant,Asian Restaurant,Concert Hall,Gastropub,Restaurant,Gym,Seafood Restaurant
15,Downtown Toronto,0,Coffee Shop,Hotel,Japanese Restaurant,Café,Restaurant,Beer Bar,Seafood Restaurant,Gym,Bakery,Cocktail Bar
16,Downtown Toronto,0,Hotel,Coffee Shop,Café,Restaurant,Japanese Restaurant,Asian Restaurant,Seafood Restaurant,Gym,Theater,American Restaurant


In [41]:
Cluster_1 = Downtown_Toronto_merged.loc[Downtown_Toronto_merged['Cluster Labels'] == 0, Downtown_Toronto_merged.columns[[1] + list(range(5, Downtown_Toronto_merged.shape[1]))]]
Cluster_1.describe(include = 'all')

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,8,8.0,8,8,8,8,8,8,8,8,8,8
unique,1,,3,3,5,6,6,8,6,6,7,7
top,Downtown Toronto,,Coffee Shop,Hotel,Café,Restaurant,Japanese Restaurant,Asian Restaurant,Seafood Restaurant,Gym,Theater,American Restaurant
freq,8,,5,4,4,3,3,1,2,3,2,2
mean,,0.0,,,,,,,,,,
std,,0.0,,,,,,,,,,
min,,0.0,,,,,,,,,,
25%,,0.0,,,,,,,,,,
50%,,0.0,,,,,,,,,,
75%,,0.0,,,,,,,,,,
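
For text columns, the `unique`, `top`, and `freq` rows of `describe` report the number of distinct values, the most frequent value, and how often it occurs. The sketch below reproduces those three figures for the "1st Most Common Venue" column of Cluster 1, using the eight values listed in the table above. `summarize` is a hypothetical helper, and when several values are tied pandas may pick a different `top` than `Counter.most_common` does.

```python
from collections import Counter

# "1st Most Common Venue" values for the eight Cluster 1 neighborhoods.
first_venue_cluster_1 = [
    "Coffee Shop", "Coffee Shop", "Café", "Coffee Shop",
    "Hotel", "Coffee Shop", "Coffee Shop", "Hotel",
]


def summarize(values):
    """Count, number of distinct values, most frequent value and its frequency."""
    counts = Counter(values)
    top, freq = counts.most_common(1)[0]
    return {"count": len(values), "unique": len(counts), "top": top, "freq": freq}


summary = summarize(first_venue_cluster_1)
print(summary)
# → {'count': 8, 'unique': 3, 'top': 'Coffee Shop', 'freq': 5}
```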


##### Cluster 2

In [42]:
Downtown_Toronto_merged.loc[Downtown_Toronto_merged['Cluster Labels'] == 1, Downtown_Toronto_merged.columns[[1] + list(range(5, Downtown_Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,1,Park,Playground,Trail,Dog Run,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant,Event Space,Ethiopian Restaurant


In [43]:
Cluster_2 = Downtown_Toronto_merged.loc[Downtown_Toronto_merged['Cluster Labels'] == 1, Downtown_Toronto_merged.columns[[1] + list(range(5, Downtown_Toronto_merged.shape[1]))]]
Cluster_2.describe(include = 'all')

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,1,1.0,1,1,1,1,1,1,1,1,1,1
unique,1,,1,1,1,1,1,1,1,1,1,1
top,Downtown Toronto,,Park,Playground,Trail,Dog Run,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant,Event Space,Ethiopian Restaurant
freq,1,,1,1,1,1,1,1,1,1,1,1
mean,,1.0,,,,,,,,,,
std,,,,,,,,,,,,
min,,1.0,,,,,,,,,,
25%,,1.0,,,,,,,,,,
50%,,1.0,,,,,,,,,,
75%,,1.0,,,,,,,,,,


##### Cluster 3

In [44]:
Downtown_Toronto_merged.loc[Downtown_Toronto_merged['Cluster Labels'] == 2, Downtown_Toronto_merged.columns[[1] + list(range(5, Downtown_Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Downtown Toronto,2,Rental Car Location,Airport Service,Airport Terminal,Boat or Ferry,Sculpture Garden,Coffee Shop,Harbor / Marina,Airport Lounge,Airport Gate,Airport Food Court


In [45]:
Cluster_3 = Downtown_Toronto_merged.loc[Downtown_Toronto_merged['Cluster Labels'] == 2, Downtown_Toronto_merged.columns[[1] + list(range(5, Downtown_Toronto_merged.shape[1]))]]
Cluster_3.describe(include = 'all')

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,1,1.0,1,1,1,1,1,1,1,1,1,1
unique,1,,1,1,1,1,1,1,1,1,1,1
top,Downtown Toronto,,Rental Car Location,Airport Service,Airport Terminal,Boat or Ferry,Sculpture Garden,Coffee Shop,Harbor / Marina,Airport Lounge,Airport Gate,Airport Food Court
freq,1,,1,1,1,1,1,1,1,1,1,1
mean,,2.0,,,,,,,,,,
std,,,,,,,,,,,,
min,,2.0,,,,,,,,,,
25%,,2.0,,,,,,,,,,
50%,,2.0,,,,,,,,,,
75%,,2.0,,,,,,,,,,


##### Cluster 4

In [46]:
Downtown_Toronto_merged.loc[Downtown_Toronto_merged['Cluster Labels'] == 3, Downtown_Toronto_merged.columns[[1] + list(range(5, Downtown_Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,Downtown Toronto,3,Grocery Store,Café,Park,Coffee Shop,Restaurant,Bakery,Nightclub,Baby Store,Candy Store,Athletics & Sports


In [47]:
Cluster_4 = Downtown_Toronto_merged.loc[Downtown_Toronto_merged['Cluster Labels'] == 3, Downtown_Toronto_merged.columns[[1] + list(range(5, Downtown_Toronto_merged.shape[1]))]]
Cluster_4.describe(include = 'all')

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,1,1.0,1,1,1,1,1,1,1,1,1,1
unique,1,,1,1,1,1,1,1,1,1,1,1
top,Downtown Toronto,,Grocery Store,Café,Park,Coffee Shop,Restaurant,Bakery,Nightclub,Baby Store,Candy Store,Athletics & Sports
freq,1,,1,1,1,1,1,1,1,1,1,1
mean,,3.0,,,,,,,,,,
std,,,,,,,,,,,,
min,,3.0,,,,,,,,,,
25%,,3.0,,,,,,,,,,
50%,,3.0,,,,,,,,,,
75%,,3.0,,,,,,,,,,


##### Cluster 5

In [48]:
Downtown_Toronto_merged.loc[Downtown_Toronto_merged['Cluster Labels'] == 4, Downtown_Toronto_merged.columns[[1] + list(range(5, Downtown_Toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Downtown Toronto,4,Coffee Shop,Bakery,Restaurant,Grocery Store,Park,Café,Pizza Place,Pub,Japanese Restaurant,Italian Restaurant
2,Downtown Toronto,4,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Hotel,Restaurant,Mediterranean Restaurant,Café,Gay Bar,Gym,Yoga Studio
3,Downtown Toronto,4,Coffee Shop,Restaurant,Park,Theater,Café,Bakery,Pub,Breakfast Spot,Performing Arts Venue,Thai Restaurant
4,Downtown Toronto,4,Coffee Shop,Hotel,Gastropub,Burger Joint,Sandwich Place,Ramen Restaurant,Clothing Store,Falafel Restaurant,Movie Theater,Diner
7,Downtown Toronto,4,Coffee Shop,Café,Art Gallery,Tea Room,Sandwich Place,French Restaurant,Pizza Place,Gastropub,Clothing Store,Chinese Restaurant
12,Downtown Toronto,4,Café,Coffee Shop,Pizza Place,Bakery,Pub,Bubble Tea Shop,Bookstore,Hotel,Gym,Italian Restaurant
13,Downtown Toronto,4,Café,Vegetarian / Vegan Restaurant,Bar,Coffee Shop,Mexican Restaurant,Caribbean Restaurant,Yoga Studio,Park,Grocery Store,Gaming Cafe
18,Downtown Toronto,4,Coffee Shop,Sandwich Place,Café,Italian Restaurant,Japanese Restaurant,Sushi Restaurant,Park,Burrito Place,Pharmacy,Pizza Place


In [49]:
Cluster_5 = Downtown_Toronto_merged.loc[Downtown_Toronto_merged['Cluster Labels'] == 4, Downtown_Toronto_merged.columns[[1] + list(range(5, Downtown_Toronto_merged.shape[1]))]]
Cluster_5.describe(include = 'all')

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,8,8.0,8,8,8,8,8,8,8,8,8,8
unique,1,,2,8,8,8,7,8,7,8,7,7
top,Downtown Toronto,,Coffee Shop,Japanese Restaurant,Café,Coffee Shop,Sandwich Place,French Restaurant,Pizza Place,Gastropub,Gym,Italian Restaurant
freq,8,,6,1,1,1,2,1,2,1,2,2
mean,,4.0,,,,,,,,,,
std,,0.0,,,,,,,,,,
min,,4.0,,,,,,,,,,
25%,,4.0,,,,,,,,,,
50%,,4.0,,,,,,,,,,
75%,,4.0,,,,,,,,,,


##### Explore New York City, United States dataset

##### The dataset is freely available at https://geo.nyu.edu/catalog/nyu_2451_34572.

In [50]:
!wget -q -O 'newyork_data.json' https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json
print('Data downloaded!')

Data downloaded!


In [51]:
with open('newyork_data.json') as json_data:
    NewYork_data = json.load(json_data)

In [52]:
NewYork_data

{'type': 'FeatureCollection',
 'totalFeatures': 306,
 'features': [{'type': 'Feature',
   'id': 'nyu_2451_34572.1',
   'geometry': {'type': 'Point',
    'coordinates': [-73.84720052054902, 40.89470517661]},
   'geometry_name': 'geom',
   'properties': {'name': 'Wakefield',
    'stacked': 1,
    'annoline1': 'Wakefield',
    'annoline2': None,
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.84720052054902,
     40.89470517661,
     -73.84720052054902,
     40.89470517661]}},
  {'type': 'Feature',
   'id': 'nyu_2451_34572.2',
   'geometry': {'type': 'Point',
    'coordinates': [-73.82993910812398, 40.87429419303012]},
   'geometry_name': 'geom',
   'properties': {'name': 'Co-op City',
    'stacked': 2,
    'annoline1': 'Co-op',
    'annoline2': 'City',
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.82993910812398,
     40.87429419303012,
     -73.82993910812398,
     40.87429419303012]}},
  {'type': 'Feature',
 

##### Define a new variable that includes this data.

In [53]:
Neighborhoods_data = NewYork_data['features']

In [54]:
Neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

##### Transform the data into a pandas dataframe
##### Start by creating an empty dataframe.

In [55]:
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 
neighborhoods = pd.DataFrame(columns=column_names)

In [56]:
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


##### Fill the dataframe with each neighborhood's borough, name, latitude and longitude

In [57]:
# Collect one record per neighborhood, then build the dataframe in a single call
# (DataFrame.append was removed in pandas 2.0).
rows = []
for data in Neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)

In [58]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [59]:
neighborhoods.shape

(306, 4)

In [60]:
print('New York City has {} boroughs and {} neighborhoods.'.format(len(neighborhoods['Borough'].unique()),neighborhoods.shape[0]))

New York City has 5 boroughs and 306 neighborhoods.
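
The extraction loop above can also be expressed as a small helper that flattens each feature into a plain dictionary. The sketch below uses only the standard library and a two-feature sample copied from the JSON output shown earlier; the name `features_to_rows` is hypothetical and not part of the original notebook.

```python
def features_to_rows(features):
    """Flatten GeoJSON-style point features into one dict per neighborhood."""
    rows = []
    for feature in features:
        # GeoJSON order is [longitude, latitude]
        lon, lat = feature['geometry']['coordinates']
        rows.append({
            'Borough': feature['properties']['borough'],
            'Neighborhood': feature['properties']['name'],
            'Latitude': lat,
            'Longitude': lon,
        })
    return rows


# Two features trimmed down from the notebook output above
sample_features = [
    {'geometry': {'type': 'Point',
                  'coordinates': [-73.84720052054902, 40.89470517661]},
     'properties': {'name': 'Wakefield', 'borough': 'Bronx'}},
    {'geometry': {'type': 'Point',
                  'coordinates': [-73.82993910812398, 40.87429419303012]},
     'properties': {'name': 'Co-op City', 'borough': 'Bronx'}},
]

rows = features_to_rows(sample_features)
print(rows[0])
```

A list of dictionaries like `rows` can be handed to `pd.DataFrame(rows)` to build the whole table in one call instead of growing it row by row.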


##### Get the latitude and longitude values of New York City

In [61]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="NYC_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


In [62]:
map_NewYork = folium.Map(location=[latitude, longitude], zoom_start=10)

for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_NewYork)  
    
map_NewYork

##### Work only with the neighborhoods in the borough of Queens

In [63]:
Queens_data = neighborhoods[neighborhoods['Borough'].str.contains('Queens')].reset_index(drop=True)
Queens_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Queens,Astoria,40.768509,-73.915654
1,Queens,Woodside,40.746349,-73.901842
2,Queens,Jackson Heights,40.751981,-73.882821
3,Queens,Elmhurst,40.744049,-73.881656
4,Queens,Howard Beach,40.654225,-73.838138


In [64]:
Queens_data.shape

(81, 4)

##### Get the geographical coordinates of Queens

In [65]:
address = 'Queens, NY'

geolocator = Nominatim(user_agent = "Queens_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Queens are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Queens are 40.7498243, -73.7976337.


##### Create a map of Queens with neighborhoods superimposed on top.

In [66]:
map_Queens_data = folium.Map(location=[latitude, longitude], zoom_start=11)

for lat, lng, label in zip(Queens_data['Latitude'], Queens_data['Longitude'], Queens_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Queens_data)  
    
map_Queens_data

##### Explore Queens neighborhoods with Foursquare API
##### Define Foursquare Credentials and Version

In [67]:
CLIENT_ID = 'Y53TUHWVOZ4I4Z2JA5U2Y5ZUSYWRG1Y5MNHSQ2NGYVKNRCL0' 
CLIENT_SECRET = 'EIN5XIXWZZQPVXGPE2UCOKN3ZY3KBDSUICZ05NNXEJKXJTBS' 
VERSION = '20180605' 
LIMIT = 100 

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: Y53TUHWVOZ4I4Z2JA5U2Y5ZUSYWRG1Y5MNHSQ2NGYVKNRCL0
CLIENT_SECRET:EIN5XIXWZZQPVXGPE2UCOKN3ZY3KBDSUICZ05NNXEJKXJTBS


##### Get the first neighborhood's name in our dataframe.

In [68]:
name_neighbor = Queens_data.loc[0, 'Neighborhood']
print(f"The first neighborhood's name is '{name_neighbor}'.")

The first neighborhood's name is 'Astoria'.


##### Get the latitude and longitude values of Astoria

In [69]:
neighborhood_latitude = Queens_data.loc[0, 'Latitude'] 
neighborhood_longitude = Queens_data.loc[0, 'Longitude'] 

print('Latitude and longitude values of {} are {}, {}.'.format(name_neighbor, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Astoria are 40.76850859335492, -73.91565374304234.


##### Top 100 venues that are in Astoria within a radius of 700 meters

In [70]:
LIMIT = 100 
radius = 700 

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url 

'https://api.foursquare.com/v2/venues/explore?&client_id=Y53TUHWVOZ4I4Z2JA5U2Y5ZUSYWRG1Y5MNHSQ2NGYVKNRCL0&client_secret=EIN5XIXWZZQPVXGPE2UCOKN3ZY3KBDSUICZ05NNXEJKXJTBS&v=20180605&ll=40.76850859335492,-73.91565374304234&radius=700&limit=100'
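
As a hedged alternative to formatting the query string by hand, the parameters can be kept in a dictionary and encoded with the standard library. This is only a sketch: `build_explore_url` is a hypothetical helper, the credentials are placeholders, and the endpoint and parameter names are simply the ones used in the cell above.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=700, limit=100):
    """Assemble the explore URL used in this notebook from named parameters."""
    base = 'https://api.foursquare.com/v2/venues/explore'
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # urlencode percent-encodes reserved characters (the comma in `ll` becomes %2C)
    return base + '?' + urlencode(params)


# Placeholder credentials; Astoria's coordinates are taken from the cell above
example_url = build_explore_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20180605',
                                40.76850859335492, -73.91565374304234)
print(example_url)
```

Keeping the parameters in a dictionary makes it harder to swap two positional arguments by accident, and it keeps the secret out of string literals scattered through the notebook.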

##### Send the GET request and examine the results.

In [71]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '601a4de0d775ce36898fabf4'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': '$-$$$$', 'key': 'price'},
    {'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Astoria',
  'headerFullLocation': 'Astoria, Queens',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 139,
  'suggestedBounds': {'ne': {'lat': 40.77480859965493,
    'lng': -73.90735083211446},
   'sw': {'lat': 40.762208587054914, 'lng': -73.92395665397022}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4bdf502a89ca76b062b75d5e',
       'name': 'Favela Grill',
       'location': {'address': '33-18 28th Ave',
        'crossStreet': 'btwn 33rd & 34th St.',
        'lat': 40.76734843380796,
        'lng': -73.917897

In [72]:
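def get_category_type(row):
    # The category list sits under 'categories' or, after json_normalize,
    # under 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    # Return the name of the first (primary) category, or None if there is none
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']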

In [73]:
venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues) 

filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

  


Unnamed: 0,name,categories,lat,lng
0,Favela Grill,Brazilian Restaurant,40.767348,-73.917897
1,Titan Foods Inc.,Gourmet Shop,40.769198,-73.919253
2,CrossFit Queens,Gym,40.769404,-73.918977
3,Orange Blossom,Gourmet Shop,40.769856,-73.917012
4,Simply Fit Astoria,Gym,40.769114,-73.912403


In [74]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.
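
To make the 700-meter radius concrete, the great-circle distance between the neighborhood's coordinates and a returned venue can be checked with the haversine formula. The sketch below uses only the standard library; `haversine_m` is a hypothetical helper, the Earth radius of 6,371 km is the usual mean-radius approximation, and the coordinates are the rounded values shown in the tables above.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    r = 6371000.0  # mean Earth radius in meters (approximation)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


astoria = (40.768509, -73.915654)       # neighborhood coordinates
favela_grill = (40.767348, -73.917897)  # first venue in the table above

distance = haversine_m(*astoria, *favela_grill)
print('Distance: {:.0f} m (radius used in the query: 700 m)'.format(distance))
```

For this venue the distance comes out well inside the 700-meter search radius.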


##### Explore neighborhoods in Queens

In [75]:
def getNearbyVenues(names, latitudes, longitudes, radius = 700):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
      
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])
        
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)   
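
One fragile spot in the function above is `v['venue']['categories'][0]['name']`, which raises an `IndexError` if a venue comes back with an empty category list. A minimal sketch of a more tolerant row builder follows; `items_to_rows` is a hypothetical helper, and the second sample item is made up purely to exercise the empty-category case.

```python
def items_to_rows(name, lat, lng, items):
    """Turn the 'items' list of an explore response into flat tuples."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to None when the venue has no category at all
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


sample_items = [
    # Shaped like the response shown earlier (values rounded as in the table)
    {'venue': {'name': 'Favela Grill',
               'location': {'lat': 40.767348, 'lng': -73.917897},
               'categories': [{'name': 'Brazilian Restaurant'}]}},
    # Made-up entry with no category, for illustration only
    {'venue': {'name': 'Uncategorized Place',
               'location': {'lat': 40.0, 'lng': -73.0},
               'categories': []}},
]

rows = items_to_rows('Astoria', 40.768509, -73.915654, sample_items)
for row in rows:
    print(row)
```

Rows with a `None` category can then be dropped or labeled explicitly before the one-hot encoding step.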
    

In [76]:
Queens_venues = getNearbyVenues(names =  Queens_data['Neighborhood'],
                                   latitudes =  Queens_data['Latitude'],
                                   longitudes =  Queens_data['Longitude'])

Astoria
Woodside
Jackson Heights
Elmhurst
Howard Beach
Corona
Forest Hills
Kew Gardens
Richmond Hill
Flushing
Long Island City
Sunnyside
East Elmhurst
Maspeth
Ridgewood
Glendale
Rego Park
Woodhaven
Ozone Park
South Ozone Park
College Point
Whitestone
Bayside
Auburndale
Little Neck
Douglaston
Glen Oaks
Bellerose
Kew Gardens Hills
Fresh Meadows
Briarwood
Jamaica Center
Oakland Gardens
Queens Village
Hollis
South Jamaica
St. Albans
Rochdale
Springfield Gardens
Cambria Heights
Rosedale
Far Rockaway
Broad Channel
Breezy Point
Steinway
Beechhurst
Bay Terrace
Edgemere
Arverne
Rockaway Beach
Neponsit
Murray Hill
Floral Park
Holliswood
Jamaica Estates
Queensboro Hill
Hillcrest
Ravenswood
Lindenwood
Laurelton
Lefrak City
Belle Harbor
Rockaway Park
Somerville
Brookville
Bellaire
North Corona
Forest Hills Gardens
Jamaica Hills
Utopia
Pomonok
Astoria Heights
Hunters Point
Sunnyside Gardens
Blissville
Roxbury
Middle Village
Malba
Hammels
Bayswater
Queensbridge


In [77]:
print(Queens_venues.shape)
Queens_venues.head()

(3460, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Astoria,40.768509,-73.915654,Favela Grill,40.767348,-73.917897,Brazilian Restaurant
1,Astoria,40.768509,-73.915654,Titan Foods Inc.,40.769198,-73.919253,Gourmet Shop
2,Astoria,40.768509,-73.915654,CrossFit Queens,40.769404,-73.918977,Gym
3,Astoria,40.768509,-73.915654,Orange Blossom,40.769856,-73.917012,Gourmet Shop
4,Astoria,40.768509,-73.915654,Simply Fit Astoria,40.769114,-73.912403,Gym


##### How many venues were returned for each neighborhood

In [78]:
Queens_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Arverne,28,28,28,28,28,28
Astoria,100,100,100,100,100,100
Astoria Heights,27,27,27,27,27,27
Auburndale,61,61,61,61,61,61
Bay Terrace,39,39,39,39,39,39
...,...,...,...,...,...,...
Sunnyside Gardens,96,96,96,96,96,96
Utopia,26,26,26,26,26,26
Whitestone,8,8,8,8,8,8
Woodhaven,55,55,55,55,55,55


##### How many unique categories can be curated from all the returned venues

In [79]:
print('There are {} unique categories.'.format(len(Queens_venues['Venue Category'].unique())))

There are 302 unique categories.


In [80]:
print('There are {} distinct venues in {} categories.'.format(len(Queens_venues['Venue'].unique()),len(Queens_venues['Venue Category'].unique())))

There are 2559 distinct venues in 302 categories.


##### Analyze Each Neighborhood

In [81]:
Queens_onehot = pd.get_dummies(Queens_venues[['Venue Category']], prefix="", prefix_sep="")
Queens_onehot['Neighborhood'] = Queens_venues['Neighborhood'] 
fixed_columns = [Queens_onehot.columns[-1]] + list(Queens_onehot.columns[:-1])
Queens_onehot = Queens_onehot[fixed_columns]
Queens_onehot.head()

Unnamed: 0,Zoo Exhibit,Accessories Store,Afghan Restaurant,Airport Lounge,Airport Service,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,...,Video Store,Vietnamese Restaurant,Warehouse Store,Weight Loss Center,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [82]:
Queens_onehot.shape

(3460, 302)

##### Group rows by neighborhood and take the mean of each one-hot column, which gives the relative frequency of each category

In [83]:
Queens_grouped = Queens_onehot.groupby('Neighborhood').mean().reset_index()
Queens_grouped

Unnamed: 0,Neighborhood,Zoo Exhibit,Accessories Store,Afghan Restaurant,Airport Lounge,Airport Service,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Art Gallery,...,Video Store,Vietnamese Restaurant,Warehouse Store,Weight Loss Center,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo
0,Arverne,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.0,...,0.0,0.000000,0.0,0.000000,0.0,0.035714,0.000000,0.000000,0.0,0.0
1,Astoria,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.010000,0.0,0.0,...,0.0,0.000000,0.0,0.000000,0.0,0.030000,0.000000,0.000000,0.0,0.0
2,Astoria Heights,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.0,...,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.0
3,Auburndale,0.0,0.000000,0.000000,0.0,0.0,0.032787,0.000000,0.0,0.0,...,0.0,0.016393,0.0,0.000000,0.0,0.000000,0.016393,0.000000,0.0,0.0
4,Bay Terrace,0.0,0.025641,0.000000,0.0,0.0,0.051282,0.000000,0.0,0.0,...,0.0,0.000000,0.0,0.025641,0.0,0.000000,0.000000,0.051282,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
76,Sunnyside Gardens,0.0,0.000000,0.000000,0.0,0.0,0.020833,0.000000,0.0,0.0,...,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.0
77,Utopia,0.0,0.000000,0.038462,0.0,0.0,0.000000,0.000000,0.0,0.0,...,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.0
78,Whitestone,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.0,...,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.0
79,Woodhaven,0.0,0.000000,0.000000,0.0,0.0,0.018182,0.018182,0.0,0.0,...,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.0


In [84]:
Queens_grouped.shape

(81, 302)
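
The mean of a one-hot column within a neighborhood is simply the share of that neighborhood's venues falling in the category. The standard-library sketch below reproduces that calculation on a handful of rows; `category_frequencies` is a hypothetical helper, the Astoria pairs come from the first five venues listed above, and the single Woodside row is invented for illustration.

```python
from collections import Counter, defaultdict


def category_frequencies(venues):
    """Map each neighborhood to {category: share of its venues}."""
    counts = defaultdict(Counter)
    for neighborhood, category in venues:
        counts[neighborhood][category] += 1

    freqs = {}
    for neighborhood, counter in counts.items():
        total = sum(counter.values())
        freqs[neighborhood] = {cat: n / total for cat, n in counter.items()}
    return freqs


sample_venues = [
    ('Astoria', 'Brazilian Restaurant'),
    ('Astoria', 'Gourmet Shop'),
    ('Astoria', 'Gym'),
    ('Astoria', 'Gourmet Shop'),
    ('Astoria', 'Gym'),
    ('Woodside', 'Bar'),  # invented row, not from the dataset
]

freqs = category_frequencies(sample_venues)
print(freqs['Astoria'])
```

Each neighborhood's shares sum to 1, which is why neighborhoods with very few venues (Whitestone has only 8) end up with a few large values that can dominate the clustering.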

##### Print each neighborhood along with the top 5 most common venues

In [85]:
num_top_venues = 5

for hood in Queens_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = Queens_grouped[Queens_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending = False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Arverne----
           venue  freq
0      Surf Spot  0.14
1       Bus Stop  0.07
2          Beach  0.07
3  Metro Station  0.07
4  Deli / Bodega  0.07


----Astoria----
                       venue  freq
0                        Bar  0.06
1  Middle Eastern Restaurant  0.04
2              Grocery Store  0.04
3                Coffee Shop  0.04
4         Seafood Restaurant  0.04


----Astoria Heights----
                 venue  freq
0  Rental Car Location  0.19
1          Bus Station  0.11
2          Supermarket  0.07
3   Italian Restaurant  0.04
4           Playground  0.04


----Auburndale----
                 venue  freq
0    Korean Restaurant  0.07
1     Sushi Restaurant  0.05
2          Pizza Place  0.05
3       Cosmetics Shop  0.03
4  American Restaurant  0.03


----Bay Terrace----
                 venue  freq
0       Clothing Store  0.10
1           Kids Store  0.05
2  American Restaurant  0.05
3    Mobile Phone Shop  0.05
4       Cosmetics Shop  0.05


----Bayside----
         

In [86]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending = False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

##### Display the top 10 venues for each neighborhood.

In [87]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # indicators only covers 1st to 3rd; every later rank takes the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

neighborhoods_venues_sorted = pd.DataFrame(columns = columns)
neighborhoods_venues_sorted['Neighborhood'] = Queens_grouped['Neighborhood']

for ind in np.arange(Queens_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Queens_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Arverne,Surf Spot,Donut Shop,Bus Stop,Sandwich Place,Metro Station,Beach,Deli / Bodega,Gas Station,Pizza Place,Burrito Place
1,Astoria,Bar,Seafood Restaurant,Grocery Store,Middle Eastern Restaurant,Coffee Shop,Indian Restaurant,Pizza Place,Hookah Bar,Bakery,Wine Shop
2,Astoria Heights,Rental Car Location,Bus Station,Supermarket,Burger Joint,Chinese Restaurant,Laundromat,Greek Restaurant,Moving Target,Baseball Field,Liquor Store
3,Auburndale,Korean Restaurant,Sushi Restaurant,Pizza Place,Café,Mattress Store,Sandwich Place,Bar,Italian Restaurant,Ice Cream Shop,Greek Restaurant
4,Bay Terrace,Clothing Store,Women's Store,Cosmetics Shop,American Restaurant,Donut Shop,Shoe Store,Kids Store,Mobile Phone Shop,Home Service,Furniture / Home Store
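
The column labels above rely on an exception to switch from 'st', 'nd', 'rd' to 'th'. A small explicit helper does the same job and also handles ranks beyond 10 correctly (for example 11th and 22nd). The names `ordinal` and `venue_columns` are hypothetical and shown only as a sketch.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


def venue_columns(num_top_venues):
    """Build the header row used for the 'most common venue' table."""
    return ['Neighborhood'] + [
        '{} Most Common Venue'.format(ordinal(i))
        for i in range(1, num_top_venues + 1)
    ]


columns = venue_columns(10)
print(columns)
```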


##### Cluster Queens Neighborhoods of New York using K-means

In [88]:
kclusters = 5
Queens_grouped_clustering = Queens_grouped.drop(columns='Neighborhood')
kmeans = KMeans(n_clusters = kclusters, random_state=0).fit(Queens_grouped_clustering)
kmeans.labels_[0:10] 

array([0, 3, 3, 3, 3, 3, 2, 0, 0, 4], dtype=int32)
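
The notebook fixes the number of clusters at 5 without comparing alternatives. One common sanity check, which is not part of the original analysis, is to compare the silhouette score across several values of k. The sketch below runs on synthetic two-dimensional data so that it is self-contained; with the real data, `X` would be replaced by `Queens_grouped_clustering`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in data: three tight, well-separated blobs of 20 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in centers])

# Fit k-means for a range of k and record the mean silhouette score of each fit
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print('k = {}: silhouette = {:.3f}'.format(k, score))
print('Best k by silhouette score:', best_k)
```

A higher score means tighter, better-separated clusters. On the neighborhood data the result should be read with care, since clusters containing a single neighborhood can distort the average.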

##### New dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [89]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Queens_merged = Queens_data
Queens_merged = Queens_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
Queens_merged.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Queens,Astoria,40.768509,-73.915654,3,Bar,Seafood Restaurant,Grocery Store,Middle Eastern Restaurant,Coffee Shop,Indian Restaurant,Pizza Place,Hookah Bar,Bakery,Wine Shop
1,Queens,Woodside,40.746349,-73.901842,3,Thai Restaurant,Bar,Bakery,Grocery Store,Pub,Chinese Restaurant,Filipino Restaurant,Pizza Place,Gym / Fitness Center,Discount Store
2,Queens,Jackson Heights,40.751981,-73.882821,3,Latin American Restaurant,Bakery,Mexican Restaurant,South American Restaurant,Peruvian Restaurant,Pizza Place,Thai Restaurant,Coffee Shop,Donut Shop,Pharmacy
3,Queens,Elmhurst,40.744049,-73.881656,3,Thai Restaurant,Mexican Restaurant,Chinese Restaurant,Bakery,Supermarket,Latin American Restaurant,South American Restaurant,Vietnamese Restaurant,Food Truck,Grocery Store
4,Queens,Howard Beach,40.654225,-73.838138,3,Italian Restaurant,Pharmacy,Park,Sandwich Place,Ice Cream Shop,Bank,Fast Food Restaurant,Sushi Restaurant,Breakfast Spot,Fried Chicken Joint


In [90]:
map_clusters_Queens = folium.Map(location=[latitude, longitude], zoom_start=11)

x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

markers_colors = []
for lat, lon, poi, cluster in zip(Queens_merged['Latitude'], Queens_merged['Longitude'], Queens_merged['Neighborhood'], Queens_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_Queens)
       
map_clusters_Queens

##### Examine Clusters
##### Cluster 1

In [91]:
Queens_merged.loc[Queens_merged['Cluster Labels'] == 0, Queens_merged.columns[[1] + list(range(5, Queens_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Richmond Hill,Pizza Place,Indian Restaurant,Deli / Bodega,Bank,Lounge,Latin American Restaurant,Diner,Discount Store,Caribbean Restaurant,Clothing Store
17,Woodhaven,Pizza Place,Department Store,Pharmacy,Latin American Restaurant,Deli / Bodega,Donut Shop,Fried Chicken Joint,Bank,Sandwich Place,Supermarket
18,Ozone Park,Pizza Place,Pharmacy,Gym,Deli / Bodega,Bank,Diner,Donut Shop,Grocery Store,Metro Station,Mattress Store
19,South Ozone Park,Deli / Bodega,Park,Bar,Donut Shop,Fast Food Restaurant,Sandwich Place,Food Truck,Hotel,Flea Market,Fish Market
21,Whitestone,Deli / Bodega,Bagel Shop,Italian Restaurant,Gastropub,Supermarket,Convenience Store,Bar,Food,Flower Shop,Flea Market
27,Bellerose,Deli / Bodega,Pizza Place,Mobile Phone Shop,Pub,Sandwich Place,Liquor Store,Storage Facility,Massage Studio,Motel,Seafood Restaurant
28,Kew Gardens Hills,Bank,Pizza Place,Donut Shop,Chinese Restaurant,Bus Stop,Sandwich Place,Restaurant,Boat or Ferry,Middle Eastern Restaurant,Bus Station
30,Briarwood,Pizza Place,Diner,Deli / Bodega,Indian Restaurant,Donut Shop,Discount Store,Latin American Restaurant,Sandwich Place,Sushi Restaurant,Market
35,South Jamaica,Pizza Place,Caribbean Restaurant,Fried Chicken Joint,Deli / Bodega,Grocery Store,Supermarket,Donut Shop,Discount Store,Sandwich Place,Vegetarian / Vegan Restaurant
36,St. Albans,Caribbean Restaurant,Motorcycle Shop,Train Station,Café,Liquor Store,Chinese Restaurant,Fast Food Restaurant,Donut Shop,Gym / Fitness Center,Fried Chicken Joint


In [92]:
Cluster_1 = Queens_merged.loc[Queens_merged['Cluster Labels'] == 0, Queens_merged.columns[[1] + list(range(5, Queens_merged.shape[1]))]]
Cluster_1.describe(include = 'all')

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,23,23,23,23,23,23,23,23,23,23,23
unique,23,13,19,15,20,19,18,18,20,20,21
top,South Ozone Park,Pizza Place,Pizza Place,Deli / Bodega,Park,Deli / Bodega,Sandwich Place,Donut Shop,Supermarket,Metro Station,Supermarket
freq,1,6,3,4,2,3,3,4,3,2,2
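
In the `describe` output, `unique` is the number of distinct values in a column, `top` is the most frequent value and `freq` is how often it occurs. The sketch below recomputes these three numbers with the standard library for the "1st Most Common Venue" column, using only the ten Cluster 1 rows displayed above, so the counts differ from the full 23-row summary.

```python
from collections import Counter

# '1st Most Common Venue' for the ten Cluster 1 rows displayed above
first_venues = [
    'Pizza Place',           # Richmond Hill
    'Pizza Place',           # Woodhaven
    'Pizza Place',           # Ozone Park
    'Deli / Bodega',         # South Ozone Park
    'Deli / Bodega',         # Whitestone
    'Deli / Bodega',         # Bellerose
    'Bank',                  # Kew Gardens Hills
    'Pizza Place',           # Briarwood
    'Pizza Place',           # South Jamaica
    'Caribbean Restaurant',  # St. Albans
]

unique = len(set(first_venues))
top, freq = Counter(first_venues).most_common(1)[0]
print('unique = {}, top = {}, freq = {}'.format(unique, top, freq))
```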


##### Cluster 2

In [93]:
Queens_merged.loc[Queens_merged['Cluster Labels'] == 1, Queens_merged.columns[[1] + list(range(5, Queens_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
50,Neponsit,Beach,Park,Pizza Place,Zoo,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Filipino Restaurant


In [94]:
Cluster_2 = Queens_merged.loc[Queens_merged['Cluster Labels'] == 1, Queens_merged.columns[[1] + list(range(5, Queens_merged.shape[1]))]]
Cluster_2.describe(include = 'all')

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,1,1,1,1,1,1,1,1,1,1,1
unique,1,1,1,1,1,1,1,1,1,1,1
top,Neponsit,Beach,Park,Pizza Place,Zoo,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Filipino Restaurant
freq,1,1,1,1,1,1,1,1,1,1,1


##### Cluster 3

In [95]:
Queens_merged.loc[Queens_merged['Cluster Labels'] == 2, Queens_merged.columns[[1] + list(range(5, Queens_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
79,Bayswater,Playground,Park,Men's Store,Athletics & Sports,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Filipino Restaurant


In [96]:
Cluster_3 = Queens_merged.loc[Queens_merged['Cluster Labels'] == 2, Queens_merged.columns[[1] + list(range(5, Queens_merged.shape[1]))]]
Cluster_3.describe(include = 'all')

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,1,1,1,1,1,1,1,1,1,1,1
unique,1,1,1,1,1,1,1,1,1,1,1
top,Bayswater,Playground,Park,Men's Store,Athletics & Sports,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Filipino Restaurant
freq,1,1,1,1,1,1,1,1,1,1,1


##### Cluster 4

In [97]:
Queens_merged.loc[Queens_merged['Cluster Labels'] == 3, Queens_merged.columns[[1] + list(range(5, Queens_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Astoria,Bar,Seafood Restaurant,Grocery Store,Middle Eastern Restaurant,Coffee Shop,Indian Restaurant,Pizza Place,Hookah Bar,Bakery,Wine Shop
1,Woodside,Thai Restaurant,Bar,Bakery,Grocery Store,Pub,Chinese Restaurant,Filipino Restaurant,Pizza Place,Gym / Fitness Center,Discount Store
2,Jackson Heights,Latin American Restaurant,Bakery,Mexican Restaurant,South American Restaurant,Peruvian Restaurant,Pizza Place,Thai Restaurant,Coffee Shop,Donut Shop,Pharmacy
3,Elmhurst,Thai Restaurant,Mexican Restaurant,Chinese Restaurant,Bakery,Supermarket,Latin American Restaurant,South American Restaurant,Vietnamese Restaurant,Food Truck,Grocery Store
4,Howard Beach,Italian Restaurant,Pharmacy,Park,Sandwich Place,Ice Cream Shop,Bank,Fast Food Restaurant,Sushi Restaurant,Breakfast Spot,Fried Chicken Joint
5,Corona,Mexican Restaurant,Deli / Bodega,Playground,Park,Pizza Place,Donut Shop,Convenience Store,Ice Cream Shop,Sandwich Place,Café
6,Forest Hills,Park,Bakery,Burger Joint,Seafood Restaurant,Gym / Fitness Center,Bagel Shop,Boxing Gym,Cosmetics Shop,Mediterranean Restaurant,Yoga Studio
7,Kew Gardens,Chinese Restaurant,Deli / Bodega,Donut Shop,Pizza Place,Bakery,Supermarket,Cosmetics Shop,Indian Restaurant,Italian Restaurant,Latin American Restaurant
9,Flushing,Bubble Tea Shop,Chinese Restaurant,Hotpot Restaurant,Bakery,Korean Restaurant,Dumpling Restaurant,Asian Restaurant,Karaoke Bar,Food Court,Tea Room
10,Long Island City,Hotel,Coffee Shop,Café,Donut Shop,Pizza Place,Bar,Bubble Tea Shop,Italian Restaurant,Restaurant,Rental Car Location


In [98]:
Cluster_4 = Queens_merged.loc[Queens_merged['Cluster Labels'] == 3, Queens_merged.columns[[1] + list(range(5, Queens_merged.shape[1]))]]
Cluster_4.describe(include = 'all')

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,52,52,52,52,52,52,52,52,52,52,52
unique,52,29,33,35,37,40,37,38,37,41,43
top,Pomonok,Chinese Restaurant,Pizza Place,Bakery,Donut Shop,Bakery,Sandwich Place,Bar,Fast Food Restaurant,Chinese Restaurant,Sandwich Place
freq,1,5,5,5,4,3,3,5,3,4,3


##### Cluster 5

In [99]:
Queens_merged.loc[Queens_merged['Cluster Labels'] == 4, Queens_merged.columns[[1] + list(range(5, Queens_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
49,Rockaway Beach,Beach,Ice Cream Shop,Latin American Restaurant,Bar,Pizza Place,Bagel Shop,BBQ Joint,Eastern European Restaurant,Fast Food Restaurant,Seafood Restaurant
61,Belle Harbor,Beach,Pub,Spa,Deli / Bodega,Pharmacy,Boutique,Mexican Restaurant,Bagel Shop,Chinese Restaurant,Donut Shop
62,Rockaway Park,Beach,Pizza Place,Donut Shop,Liquor Store,Deli / Bodega,Pharmacy,Supermarket,Latin American Restaurant,Bar,Bank
78,Hammels,Beach,Supermarket,Donut Shop,Dog Run,Bar,Beach Bar,Bakery,Surf Spot,Fast Food Restaurant,Gym / Fitness Center


In [100]:
Cluster_5 = Queens_merged.loc[Queens_merged['Cluster Labels'] == 4, Queens_merged.columns[[1] + list(range(5, Queens_merged.shape[1]))]]
Cluster_5.describe(include = 'all')

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,4,4,4,4,4,4,4,4,4,4,4
unique,4,1,4,3,4,4,4,4,4,3,4
top,Hammels,Beach,Supermarket,Donut Shop,Dog Run,Bar,Beach Bar,Mexican Restaurant,Bagel Shop,Fast Food Restaurant,Seafood Restaurant
freq,1,4,1,2,1,1,1,1,1,2,1
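
The per-cluster cells above repeat the same selection with a different label each time. A minimal sketch of how that inspection could be wrapped in one helper is shown here; `cluster_view` and the `toy` frame are hypothetical names for illustration, and the column layout is assumed to match `Queens_merged` (a `Cluster Labels` column plus the ranked `... Most Common Venue` columns).

```python
import pandas as pd


def cluster_view(merged, label, label_col="Cluster Labels", name_col="Neighborhood"):
    """Return the neighborhood name and ranked-venue columns for one cluster."""
    # Select the venue columns by name rather than by position
    venue_cols = [c for c in merged.columns if c.endswith("Most Common Venue")]
    return merged.loc[merged[label_col] == label, [name_col] + venue_cols]


# Toy stand-in for Queens_merged (hypothetical rows, only two venue ranks)
toy = pd.DataFrame({
    "Borough": ["Queens"] * 4,
    "Neighborhood": ["A", "B", "C", "D"],
    "Latitude": [40.70, 40.71, 40.72, 40.73],
    "Longitude": [-73.80, -73.81, -73.82, -73.83],
    "Cluster Labels": [0, 3, 3, 4],
    "1st Most Common Venue": ["Deli / Bodega", "Bar", "Bakery", "Beach"],
    "2nd Most Common Venue": ["Food Truck", "Bakery", "Bar", "Pub"],
})

# One pass over all labels instead of one pair of cells per cluster
views = {int(label): cluster_view(toy, label)
         for label in sorted(toy["Cluster Labels"].unique())}
sizes = {label: len(view) for label, view in views.items()}

for label, view in views.items():
    print(f"Cluster label {label}: {len(view)} neighborhood(s)")
    print(view.describe(include="all"))
```

Selecting columns by name avoids the positional `[1] + list(range(5, ...))` index, which silently changes meaning if a column is added to or removed from the merged frame.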


### Results and Discussion

#### Downtown Toronto, Canada 

#### The analysis shows the most common venues in each cluster:

1. Cluster 0 (Red dots): Coffee Shop, Café, Gastropub, Restaurant, Japanese Restaurant, Gym, Seafood Restaurant, American Restaurant and Pizza Place. 

2. Cluster 1 (Purple dots): Park, Playground, Gym / Fitness Center, Trail, Yoga Studio, Discount Store, Falafel Restaurant, Event Space, Ethiopian Restaurant and Electronics Store.

3. Cluster 2 (Blue dots): Airport Service, Airport Terminal, Rental Car Location, Coffee Shop, Harbor / Marina, Airport Lounge, Sculpture Garden, Bar, Music Venue and Pier.

4. Cluster 3 (Green dots): Grocery Store, Café, Park, Coffee Shop, Candy Store, Bakery, Playground, Beer Store, Italian Restaurant and Nightclub.

5. Cluster 4 (Orange dots): Coffee Shop, Japanese Restaurant, Café, Bakery, Gastropub, Mexican Restaurant, Bubble Tea Shop, Ramen Restaurant, Diner and Korean Restaurant.


Toronto has 10 boroughs and 103 neighborhoods; its geographical coordinates are latitude 43.6534817 and longitude -79.3839347. Downtown Toronto has 19 neighborhoods, and the data collected for them contains 823 venues in 222 distinct categories.

#### Queens, New York City

#### The analysis shows the most common venues in each cluster:

1. Cluster 0 (Red dots): Deli / Bodega, Food Truck, Caribbean Restaurant, Bagel Shop, Mediterranean Restaurant, Chinese Restaurant and Sandwich Place.

2. Cluster 1 (Purple dots): Beach, Park, Pizza Place, Zoo, Pharmacy, Falafel Restaurant, Mexican Restaurant, Bagel Shop, Fast Food Restaurant and Donut Shop.

3. Cluster 2 (Blue dots): Indian Restaurant, Ice Cream Shop, Grocery Store, Pizza Place, Fast Food Restaurant, Gift Shop, Bank, Bagel Shop, Dosa Place and Donut Shop.

4. Cluster 3 (Green dots): Pizza Place, Bakery, Grocery Store, Sandwich Place, Chinese Restaurant, Donut Shop and Bank.

5. Cluster 4 (Orange dots): Playground, Indian Restaurant, Tennis Court, Park, Construction & Landscaping, Cycle Studio, Filipino Restaurant, Cosmetics Shop, Event Space and Falafel Restaurant. 

New York City has 5 boroughs and 306 neighborhoods; its geographical coordinates are latitude 40.7127281 and longitude -74.0060152. The borough of Queens has 81 neighborhoods, and the data collected for them contains 2535 venues in 304 distinct categories.
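
Summary figures like the neighborhood, venue and category counts above can be read directly from the venue table. The sketch below assumes one row per venue and the hypothetical column names `Neighborhood`, `Venue` and `Venue Category`; the rows are invented for illustration and are not the project's data.

```python
import pandas as pd

# Hypothetical venue table: one row per venue returned for a neighborhood
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "B", "B", "B"],
    "Venue": ["Cafe One", "Pizza Two", "Cafe Three", "Park Four", "Pizza Five"],
    "Venue Category": ["Café", "Pizza Place", "Café", "Park", "Pizza Place"],
})

n_neighborhoods = venues["Neighborhood"].nunique()  # distinct neighborhoods
n_venues = len(venues)                              # total venue rows
n_categories = venues["Venue Category"].nunique()   # distinct categories

print(f"{n_neighborhoods} neighborhoods, {n_venues} venues, "
      f"{n_categories} distinct categories")
```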

### Conclusion

In this project, information about the Toronto and New York boroughs was collected from the websites listed in the Data section and mapped with geospatial libraries. The Foursquare API was then used to collect the venues and their categories within a radius of 700 meters of each location. The neighborhoods and venues were prepared for clustering and grouped with the k-means algorithm, and the top 10 most common venues in each cluster were analyzed and visualized on a map. In conclusion, both cities are good options, but based on the number of venues and neighborhoods, Queens is a better choice than Downtown Toronto for the Italian firm's headquarters because it offers more options for the company and its employees.
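
The pipeline summarized above (venue frequencies per neighborhood, k-means grouping, most common venues) can be sketched roughly as follows. This is an illustration on invented toy data, not the notebook's exact code: the column names, the `top_venues` helper, and the choice of two clusters are assumptions, and scikit-learn's `KMeans` is assumed as the k-means implementation.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical venue table: one row per venue found near a neighborhood
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B", "B", "C", "C", "D", "D"],
    "Venue Category": ["Beach", "Beach", "Bar", "Beach", "Beach", "Bar",
                       "Café", "Bakery", "Café", "Bakery"],
})

# One-hot encode the categories, then average per neighborhood so each
# row holds the relative frequency of every category
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])
grouped = onehot.groupby("Neighborhood").mean()

# Group neighborhoods with similar venue profiles (2 clusters for the toy data)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(grouped)
labels = dict(zip(grouped.index, kmeans.labels_))


def top_venues(row, n=2):
    """Return the n categories with the highest frequency in one row."""
    return list(row.sort_values(ascending=False).index[:n])


top = {name: top_venues(grouped.loc[name]) for name in grouped.index}

for name in grouped.index:
    print(name, "cluster", labels[name], "top venues:", top[name])
```

Cluster numbers from k-means are arbitrary identifiers, so only the grouping itself (which neighborhoods share a label) carries meaning when comparing runs.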