# Toronto - The City Of Neighborhoods 

The strength and vitality of the many neighbourhoods that make up Toronto, Ontario, Canada have earned the city its unofficial nickname, "the city of neighbourhoods". The City of Toronto officially recognizes over 140 neighbourhoods. The aim of this project is to explore, segment and cluster Toronto's neighbourhoods, and to find similarities and dissimilarities between them using data science techniques.

### Install Dependencies

In [1]:
!pip3 install bs4
!pip3 install requests
!pip3 install html5lib
!pip3 install pandas geopy folium scikit-learn matplotlib



### Import Dependencies
We import BeautifulSoup to scrape the Wikipedia page, requests to make HTTP calls, html5lib as the HTML parser that BeautifulSoup uses, and pandas to hold the extracted data in a DataFrame.

In [2]:
from bs4 import BeautifulSoup
import requests
import html5lib
import pandas as pd

## Data Collection - Scrape Files

In [3]:
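scraping_url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
response = requests.get(scraping_url, timeout=30)
response.raise_for_status()  # stop early if the page could not be fetched
html_file = response.text
soup = BeautifulSoup(html_file, "html5lib")
pd.set_option("display.max_columns", None)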


## Data Preprocessing - Convert Files into DataFrame

The scraped HTML is converted with pandas into a DataFrame with three columns: postal code, borough and neighborhood.

In [4]:
# Build a list of neighborhoods, one dict per assigned postal code
neighborhoods = []
for row in soup.find("table").find_all("td"):
    if row.span.text == "Not assigned":
        continue  # skip unassigned postal codes
    data = {}
    data["PostalCode"] = row.p.text[:3]
    data["Borough"] = row.span.text.split("(")[0]
    # neighborhood names sit inside the parentheses, separated by "/"
    data["Neighborhood"] = row.span.text.split("(")[1].strip(")").replace("/", ",").replace(")", " ").strip(" ")
    neighborhoods.append(data)

# create dataframe
df = pd.DataFrame(neighborhoods)

# replace outlying formats for boroughs
df['Borough'] = df['Borough'].replace({
    'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
    'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
    'EtobicokeNorthwest':'Etobicoke Northwest',
    'East YorkEast Toronto':'East York/East Toronto',
    'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})

# a neighborhood that is "Not assigned" takes the name of its borough
not_assigned = df["Neighborhood"] == "Not assigned"
df.loc[not_assigned, "Neighborhood"] = df.loc[not_assigned, "Borough"]

df.sort_values(["PostalCode"], ascending=True, inplace=True)
df.reset_index(drop=True, inplace=True)
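The parsing in the cell above chains split, strip and replace calls inside the loop. A minimal sketch of the same idea as a standalone, testable helper follows; parse_cell_text is a hypothetical name, and it assumes the cell text looks like Borough(Neighborhood / Neighborhood), as in the outputs below. Unlike the notebook, it joins names with ", " instead of " , ", and it does not cover every quirk of the Wikipedia table, so the borough clean-up mapping is still needed.

```python
def parse_cell_text(span_text):
    """Split cell text such as 'Scarborough(Malvern / Rouge)' into
    (borough, neighborhood). Returns None for 'Not assigned' cells.
    Hypothetical helper: assumes the Borough(Name / Name) layout."""
    if span_text == "Not assigned":
        return None
    # everything before the first "(" is the borough
    borough, _, rest = span_text.partition("(")
    # drop the closing parenthesis and split on the "/" separators
    parts = [part.strip() for part in rest.rstrip(")").split("/")]
    neighborhood = ", ".join(part for part in parts if part)
    return borough.strip(), neighborhood


print(parse_cell_text("Scarborough(Malvern / Rouge)"))
# ('Scarborough', 'Malvern, Rouge')
```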
    

In [5]:
df.head(10)
print(df.shape)

(103, 3)


### Get the Latitude and Longitude based on Postal Codes

In [6]:
gc_df = pd.read_csv("Geospatial_Coordinates.csv")
gc_df.head()

|   | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


In [7]:
# attach coordinates by matching postal codes, so row order does not matter
df = df.merge(gc_df, left_on="PostalCode", right_on="Postal Code", how="left").drop(columns="Postal Code")
df.head(5)

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Malvern , Rouge | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Rouge Hill , Port Union , Highland Creek | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood , Morningside , West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
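Matching on the postal code key, rather than on row position, means the two tables do not need to be sorted identically. The sketch below uses two hand-built frames whose rows are in different orders (the coordinates are copied from the outputs above). A positional comparison such as df["PostalCode"] == gc_df["Postal Code"] would find no matching rows here, whereas merge pairs them by key.

```python
import pandas as pd

# Toy frames: same postal codes, different row order
neigh = pd.DataFrame({
    "PostalCode": ["M1C", "M1B"],
    "Borough": ["Scarborough", "Scarborough"],
})
coords = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# Join on the key itself; how="left" keeps every neighbourhood row in its order
merged = (
    neigh.merge(coords, left_on="PostalCode", right_on="Postal Code", how="left")
         .drop(columns="Postal Code")
)
print(merged)
```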


## Exploring Neighborhoods
  
We want to explore the borough with the largest number of neighborhoods and look for similarities among its neighbourhoods.

### Get the borough with the maximum number of postal codes

In [8]:
grouped_neighbourhood = df.groupby("Borough").count()
grouped_neighbourhood.sort_values(["PostalCode"], ascending=False, inplace=True)
grouped_neighbourhood.reset_index(inplace=True)
grouped_neighbourhood.head(1)

|   | Borough | PostalCode | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | North York | 24 | 24 | 24 | 24 |


### Get the Borough with the maximum number of neighborhoods

In [9]:

# count the neighbourhoods in each postal-code row (names are comma-separated)
neighborhoods_count = pd.DataFrame({
    "Borough": df["Borough"],
    "Neighborhood Count": df["Neighborhood"].str.count(",") + 1,
})
grouped_neighbourhood = neighborhoods_count.groupby("Borough").sum()
grouped_neighbourhood.sort_values(["Neighborhood Count"], ascending=False, inplace=True)
grouped_neighbourhood.reset_index(inplace=True)
grouped_neighbourhood.head(1)

|   | Borough | Neighborhood Count |
|---|---|---|
| 0 | Etobicoke | 44 |


The results show that while North York has the most postal codes in Toronto (24), Etobicoke has the most neighborhoods (44). However, because each postal code is attached to a single latitude/longitude pair, we use North York, which has more postal codes and therefore more distinct locations to query, instead of Etobicoke.
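The two counts compared above can also be computed in one pass with a named aggregation. This is a sketch on hypothetical rows: the North York entries are taken from the outputs in this notebook, while the Etobicoke postal codes (E01, E02) and area names are placeholders. As in the notebook, the number of neighbourhoods in a row is taken to be the number of commas plus one.

```python
import pandas as pd

# Hypothetical rows shaped like df: North York rows come from this notebook,
# the Etobicoke codes and area names are placeholders
df_demo = pd.DataFrame({
    "PostalCode": ["M2H", "M2J", "M2K", "E01", "E02"],
    "Borough": ["North York"] * 3 + ["Etobicoke"] * 2,
    "Neighborhood": [
        "Hillcrest Village",
        "Fairview , Henry Farm , Oriole",
        "Bayview Village",
        "Area A , Area B , Area C",
        "Area D , Area E , Area F",
    ],
})

summary = (
    df_demo
    # neighbourhoods per row = number of commas + 1
    .assign(NeighborhoodCount=df_demo["Neighborhood"].str.count(",") + 1)
    .groupby("Borough")
    .agg(
        PostalCodes=("PostalCode", "count"),
        Neighborhoods=("NeighborhoodCount", "sum"),
    )
)
print(summary)
```

In this toy data North York has more postal codes (3 against 2) while Etobicoke has more neighbourhoods (6 against 5), which mirrors the situation described above.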

### Exploring North York Neighborhoods

We first select the North York rows, then import the packages needed for geocoding, clustering and mapping.

In [10]:
north_york_data = df[df["Borough"] == "North York"].reset_index(drop=True)
north_york_data.head()

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M2H | North York | Hillcrest Village | 43.803762 | -79.363452 |
| 1 | M2J | North York | Fairview , Henry Farm , Oriole | 43.778517 | -79.346556 |
| 2 | M2K | North York | Bayview Village | 43.786947 | -79.385975 |
| 3 | M2L | North York | York Mills , Silver Hills | 43.75749 | -79.374714 |
| 4 | M2M | North York | Willowdale , Newtonbrook | 43.789053 | -79.408493 |


In [11]:
from geopy.geocoders import Nominatim
from sklearn.cluster import KMeans
import json
import matplotlib.colors as colors
import matplotlib.cm as cm
import folium

In [12]:
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode("North York, Toronto")
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of North York, Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of North York, Toronto are 43.7543263, -79.44911696639593.


### Map of North York with its neighbourhoods superimposed on it.

In [13]:
north_york_map = folium.Map(location=[latitude,longitude], zoom_start=10)
for lat, lng, label in zip(north_york_data['Latitude'], north_york_data['Longitude'], north_york_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(north_york_map)
north_york_map

### Using the Foursquare API
Using the Foursquare API, we collect data about venues near a given latitude and longitude.
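The explore request used in this notebook passes the credentials, the API version, an ll coordinate pair, a radius and a limit as query parameters. As a sketch, the same query string can be built with urllib.parse.urlencode, which escapes each value. It assumes the endpoint and parameter names used in this notebook (client_id, client_secret, v, ll, radius, limit); whether the Foursquare v2 endpoint still accepts them has not been checked here, and build_explore_url is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, lat, lng,
                      version="20180605", radius=500, limit=100):
    """Hypothetical helper: build a v2 'venues/explore' URL with the
    same query parameters as the notebook. Credentials are placeholders."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",  # urlencode escapes the comma as %2C
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


url = build_explore_url("MY_ID", "MY_SECRET", 43.803762, -79.363452)
print(url)
```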

In [387]:
import os

# Read credentials from environment variables instead of hard-coding them
# (the variable names are arbitrary; set them before running the notebook)
CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "")  # your Foursquare ID
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "")  # your Foursquare Secret
ACCESS_TOKEN = os.environ.get("FOURSQUARE_ACCESS_TOKEN", "")  # your Foursquare Access Token
VERSION = '20180605' # Foursquare API version
LIMIT = 100

Let's explore the neighbourhoods of North York by fetching the top nearby venues (up to LIMIT = 100 venues within a 500 m radius) for each neighbourhood.

In [388]:

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']) for venue in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
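getNearbyVenues reads venue['venue']['categories'][0] directly, so it would raise an IndexError if a venue came back with an empty category list. A defensive variant of the row-extraction step is sketched below; extract_venues is a hypothetical helper, and the sample items are hand-written to mimic only the fields the notebook reads, not real API output.

```python
def extract_venues(name, lat, lng, items):
    """Flatten explore 'items' into row tuples in the same column order as
    getNearbyVenues, skipping venues that carry no category."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        if not categories:
            continue  # no category to record for this venue
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            categories[0]["name"],
        ))
    return rows


# Hand-written sample with only the fields read above (not real API output)
sample_items = [
    {"venue": {"name": "Cafe X", "location": {"lat": 43.80, "lng": -79.36},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Unlabelled", "location": {"lat": 43.81, "lng": -79.37},
               "categories": []}},
]
rows = extract_venues("Hillcrest Village", 43.803762, -79.363452, sample_items)
print(rows)
```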
        

In [391]:
north_york_venues = getNearbyVenues(north_york_data["Neighborhood"], north_york_data["Latitude"], north_york_data["Longitude"])
north_york_venues


In [None]:

print(north_york_venues.groupby("Neighborhood").count().shape)
north_york_venues.groupby("Neighborhood").count()
#



In [None]:
# one of the North York neighborhoods has no venues within a 500 m radius, so only 23 of the 24 appear in the grouped data

## Analyzing North York Neighbourhoods
To use this information for clustering, we one-hot encode the venue categories, creating one dummy variable per category.

In [369]:

# one-hot encode the venue categories (one column per category)
north_york_dummies = pd.get_dummies(north_york_venues[['Venue Category']], prefix="", prefix_sep="")
# add the neighborhood column back to the dataframe
north_york_dummies['Neighborhood'] = north_york_venues['Neighborhood']

# move the neighborhood column to the first position
fixed_columns = [north_york_dummies.columns[-1]] + list(north_york_dummies.columns[:-1])
north_york_dummies = north_york_dummies[fixed_columns]


In [370]:
north_york_grouped = north_york_dummies.groupby("Neighborhood").mean().reset_index()
north_york_grouped

Unnamed: 0,Neighborhood,Accessories Store,Airport,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bakery,Bank,Bar,Baseball Field,Basketball Court,Beer Store,Bike Shop,Boutique,Bridal Shop,Bubble Tea Shop,Burger Joint,Burrito Place,Bus Station,Business Service,Butcher,Café,Caribbean Restaurant,Carpet Store,Chinese Restaurant,Chocolate Shop,Clothing Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Convenience Store,Cosmetics Shop,Deli / Bodega,Department Store,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Electronics Store,Fast Food Restaurant,Food & Drink Shop,Food Court,Food Truck,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gas Station,Gift Shop,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Health Food Store,Hockey Arena,Hotel,Ice Cream Shop,Indian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Korean Restaurant,Lingerie Store,Liquor Store,Lounge,Luggage Store,Massage Studio,Mediterranean Restaurant,Men's Store,Metro Station,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Movie Theater,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Pool,Portuguese Restaurant,Pub,Ramen Restaurant,Restaurant,Salon / Barbershop,Sandwich Place,Shoe Store,Shopping Mall,Sporting Goods Shop,Supermarket,Supplement Shop,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Toy / Game Store,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store
0,"Bathurst Manor , Wilson Heights , Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.041667,0.0,0.041667,0.041667,0.0,0.0,0.041667,0.0,0.0,0.041667,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.041667,0.0,0.0,0.0,0.0,0.041667,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.041667,0.0,0.041667,0.0,0.041667,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bedford Park , Lawrence Manor East",0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.0,0.0,0.08,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.08,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.08,0.0,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.04,0.0,0.0,0.0,0.0
3,Don Mills North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Don Mills South,0.0,0.0,0.0,0.047619,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.047619,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.047619,0.0,0.047619,0.047619,0.047619,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Downsview Central,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Downsview East,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Downsview Northwest,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Downsview West,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"Fairview , Henry Farm , Oriole",0.0,0.0,0.015873,0.0,0.0,0.015873,0.0,0.031746,0.031746,0.015873,0.015873,0.0,0.0,0.0,0.015873,0.0,0.0,0.015873,0.015873,0.015873,0.0,0.0,0.0,0.0,0.0,0.015873,0.015873,0.126984,0.079365,0.0,0.0,0.015873,0.015873,0.0,0.015873,0.0,0.0,0.0,0.015873,0.0,0.015873,0.063492,0.0,0.031746,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.015873,0.031746,0.0,0.015873,0.015873,0.0,0.015873,0.0,0.0,0.015873,0.0,0.0,0.0,0.015873,0.015873,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.015873,0.0,0.0,0.015873,0.015873,0.0,0.015873,0.0,0.015873,0.0,0.015873,0.015873,0.015873,0.0,0.0,0.015873
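To see what the two cells above compute, here is the same one-hot encoding and group-wise mean on a toy venue table (neighbourhoods A and B and their categories are made up). Each category column becomes an indicator, so its mean within a neighbourhood is the share of that neighbourhood's venues in the category. The sketch passes dtype=float to get_dummies because pandas 2.x returns boolean indicator columns by default.

```python
import pandas as pd

# Toy venue table: two made-up neighbourhoods and their venue categories
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B"],
    "Venue Category": ["Café", "Café", "Bank", "Park"],
})

# One indicator column per category (float so the mean is numeric everywhere)
dummies = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype=float)
dummies.insert(0, "Neighborhood", venues["Neighborhood"])

# Mean of each indicator = share of that category among a neighbourhood's venues
freq = dummies.groupby("Neighborhood").mean()
print(freq)
```

Neighbourhood A gets Café 2/3 and Bank 1/3, and B gets Park 1.0, which is the same kind of row as those in the frequency table above.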


In [371]:
north_york_grouped.shape

(23, 107)

Let's list the 10 most common venues for each neighborhood.

In [372]:
import numpy as np

In [373]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # 4th and beyond use the "th" suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = north_york_grouped['Neighborhood']
for ind in np.arange(north_york_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(north_york_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Bathurst Manor , Wilson Heights , Downsview North | Bank | Coffee Shop | Middle Eastern Restaurant | Chinese Restaurant | Pizza Place | Deli / Bodega | Mobile Phone Shop | Diner | Pharmacy | Sandwich Place |
| 1 | Bayview Village | Chinese Restaurant | Café | Japanese Restaurant | Bank | Accessories Store | Luggage Store | Paper / Office Supplies Store | Movie Theater | Mobile Phone Shop | Miscellaneous Shop |
| 2 | Bedford Park , Lawrence Manor East | Coffee Shop | Italian Restaurant | Pizza Place | Sandwich Place | Juice Bar | Pet Store | Pharmacy | Comfort Food Restaurant | Fast Food Restaurant | Café |
| 3 | Don Mills North | Gym | Caribbean Restaurant | Café | Japanese Restaurant | Luggage Store | Paper / Office Supplies Store | Movie Theater | Mobile Phone Shop | Miscellaneous Shop | Middle Eastern Restaurant |
| 4 | Don Mills South | Gym | Coffee Shop | Restaurant | Beer Store | Discount Store | Clothing Store | Chinese Restaurant | Sandwich Place | Shopping Mall | Sporting Goods Shop |
| 5 | Downsview Central | Business Service | Baseball Field | Food Truck | Accessories Store | Lounge | Paper / Office Supplies Store | Movie Theater | Mobile Phone Shop | Miscellaneous Shop | Middle Eastern Restaurant |
| 6 | Downsview East | Airport | Business Service | Park | Liquor Store | Movie Theater | Mobile Phone Shop | Miscellaneous Shop | Middle Eastern Restaurant | Metro Station | Men's Store |
| 7 | Downsview Northwest | Gym / Fitness Center | Liquor Store | Grocery Store | Athletics & Sports | Discount Store | Movie Theater | Mobile Phone Shop | Miscellaneous Shop | Middle Eastern Restaurant | Metro Station |
| 8 | Downsview West | Shopping Mall | Grocery Store | Hotel | Bank | Park | Liquor Store | Movie Theater | Mobile Phone Shop | Miscellaneous Shop | Middle Eastern Restaurant |
| 9 | Fairview , Henry Farm , Oriole | Clothing Store | Coffee Shop | Fast Food Restaurant | Japanese Restaurant | Restaurant | Juice Bar | Bakery | Bank | Food Court | Salon / Barbershop |
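Because return_most_common_venues always returns ten names, neighbourhoods with fewer than ten venue categories get their remaining slots filled with categories whose frequency is zero. For example, Bayview Village has four categories at 0.25 each, so its 5th to 10th entries are arbitrary zero-frequency categories. A sketch of a variant that keeps only non-zero categories follows; top_nonzero_venues is a hypothetical name, and the example row reproduces a few of Bayview Village's columns from the frequency table.

```python
import pandas as pd


def top_nonzero_venues(row, n=10):
    """Return up to n category names with a non-zero frequency, most common
    first. Neighbourhoods with fewer categories get a shorter list."""
    freqs = row.drop(labels="Neighborhood").astype(float)
    freqs = freqs[freqs > 0].sort_values(ascending=False, kind="stable")
    return list(freqs.index[:n])


# Row shaped like one entry of north_york_grouped (a subset of Bayview Village's columns)
row = pd.Series({
    "Neighborhood": "Bayview Village",
    "Accessories Store": 0.0,
    "Bank": 0.25,
    "Café": 0.25,
    "Chinese Restaurant": 0.25,
    "Japanese Restaurant": 0.25,
    "Movie Theater": 0.0,
})
print(top_nonzero_venues(row))
```

To fill the fixed ten-column table, the shorter list could be padded with None before it is assigned to a row of neighborhoods_venues_sorted.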


### Get Most Common Places in North York , Toronto

In [460]:
# count how often each venue appears in the top-venue columns (column index val onwards) across all neighborhoods
def get_most_common_place(neighborhoods_venues_sorted,val):
    common_places_list = [venue for venues in neighborhoods_venues_sorted.iloc[:,val:].to_numpy() for venue in venues]
    common_venues = pd.Series(np.array(common_places_list)).value_counts()
    most_common_venues = common_venues.to_frame()
    most_common_venues.reset_index(inplace =True)
    most_common_venues.columns = ["Venues","Count"]
    return most_common_venues

In [461]:
most_common_venues_in_north_york = get_most_common_place(neighborhoods_venues_sorted,1)
ten_most_common_venues_in_north_york = most_common_venues_in_north_york.head(10)
ten_most_common_venues_in_north_york

|   | Venues | Count |
|---|---|---|
| 0 | Miscellaneous Shop | 18 |
| 1 | Mobile Phone Shop | 17 |
| 2 | Middle Eastern Restaurant | 17 |
| 3 | Movie Theater | 15 |
| 4 | Accessories Store | 9 |
| 5 | Metro Station | 9 |
| 6 | Coffee Shop | 9 |
| 7 | Lounge | 7 |
| 8 | Park | 7 |
| 9 | Paper / Office Supplies Store | 6 |


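The results show that, within 500 m of the coordinates of North York's neighbourhoods, the venue types that appear most often in the top-10 lists are miscellaneous shops, mobile phone shops, Middle Eastern restaurants, movie theaters, accessories stores, metro stations, coffee shops, lounges, parks and paper / office supplies stores. These counts should be read with caution: many neighbourhoods have fewer than ten venue categories, and the remaining slots of their top-10 lists are filled with categories whose frequency is zero (for example, positions 5 to 10 for Bayview Village). Categories such as Miscellaneous Shop, Mobile Phone Shop, Middle Eastern Restaurant and Movie Theater rank highly partly because of this padding.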

## Clustering Neighborhoods in North York

There are 23 neighborhoods with nearby venues. We group them into four clusters with k-means to understand the similarities between these neighborhoods.

In [392]:
kclusters = 4
north_york_clustering_data = north_york_grouped.drop(columns="Neighborhood")
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(north_york_clustering_data)
kmeans.labels_

array([0, 0, 0, 1, 1, 3, 1, 0, 0, 1, 1, 1, 0, 3, 1, 2, 1, 2, 0, 2, 0, 0,
       2], dtype=int32)
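The notebook fixes kclusters = 4 without comparing alternatives. One common way to compare candidate values of k is the silhouette score; the sketch below applies it to synthetic data with three well-separated groups, standing in for north_york_clustering_data. silhouette_by_k is a hypothetical helper, and on the real venue frequencies the scores may be much less clear-cut.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(X, k_values, random_state=0):
    """Fit KMeans for each k and return {k: mean silhouette score}."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, random_state=random_state, n_init=10).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores


# Synthetic stand-in: three tight blobs centred at (0, 0), (1, 1) and (2, 2)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.05, size=(8, 2)) for c in (0.0, 1.0, 2.0)])

scores = silhouette_by_k(X, range(2, 6))
best_k = max(scores, key=scores.get)  # k with the highest silhouette score
print(scores, best_k)
```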

In [451]:

if 'Cluster Labels' in neighborhoods_venues_sorted.columns:
    del neighborhoods_venues_sorted["Cluster Labels"]
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
ny_merged = north_york_data

# join neighborhoods_venues_sorted onto north_york_data so each neighborhood keeps its latitude/longitude
ny_merged = ny_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

# remove the neighborhood without any nearby venues
# (the original row index is kept, so the dropped row leaves a gap)
ny_merged.dropna(inplace=True)
ny_merged["Cluster Labels"] = ny_merged["Cluster Labels"].astype(int)

ny_merged.head()

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M2H | North York | Hillcrest Village | 43.803762 | -79.363452 | 1 | Fast Food Restaurant | Golf Course | Pool | Mediterranean Restaurant | Dog Run | Liquor Store | Mobile Phone Shop | Miscellaneous Shop | Middle Eastern Restaurant | Metro Station |
| 1 | M2J | North York | Fairview , Henry Farm , Oriole | 43.778517 | -79.346556 | 1 | Clothing Store | Coffee Shop | Fast Food Restaurant | Japanese Restaurant | Restaurant | Juice Bar | Bakery | Bank | Food Court | Salon / Barbershop |
| 2 | M2K | North York | Bayview Village | 43.786947 | -79.385975 | 0 | Chinese Restaurant | Café | Japanese Restaurant | Bank | Accessories Store | Luggage Store | Paper / Office Supplies Store | Movie Theater | Mobile Phone Shop | Miscellaneous Shop |
| 4 | M2M | North York | Willowdale , Newtonbrook | 43.789053 | -79.408493 | 2 | Gym | Park | Movie Theater | Mobile Phone Shop | Miscellaneous Shop | Middle Eastern Restaurant | Metro Station | Men's Store | Mediterranean Restaurant | Massage Studio |
| 5 | M2N | North York | Willowdale South | 43.77012 | -79.408493 | 0 | Ramen Restaurant | Coffee Shop | Sushi Restaurant | Café | Shopping Mall | Pizza Place | Lounge | Fast Food Restaurant | Electronics Store | Middle Eastern Restaurant |


In [452]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one color per 0-based cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(ny_merged['Latitude'], ny_merged['Longitude'], ny_merged['Neighborhood'], ny_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
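KMeans labels start at 0, so a palette with one color per cluster can be indexed directly with the label. A small sketch of building such a palette with the same Matplotlib calls the notebook uses (cm.rainbow and colors.rgb2hex) is shown below; cluster_palette is a hypothetical helper name.

```python
import matplotlib.cm as cm
import matplotlib.colors as colors
import numpy as np


def cluster_palette(n_clusters):
    """One hex color per cluster, meant to be indexed by the 0-based label."""
    rgba = cm.rainbow(np.linspace(0, 1, n_clusters))  # evenly spaced along the colormap
    return [colors.rgb2hex(c) for c in rgba]


palette = cluster_palette(4)
print(palette)
```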

In [469]:
first_cluster =ny_merged.loc[ny_merged['Cluster Labels'] == 0, ny_merged.columns[[1] + list(range(5, ny_merged.shape[1]))]]
first_cluster

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | North York | 0 | Chinese Restaurant | Café | Japanese Restaurant | Bank | Accessories Store | Luggage Store | Paper / Office Supplies Store | Movie Theater | Mobile Phone Shop | Miscellaneous Shop |
| 5 | North York | 0 | Ramen Restaurant | Coffee Shop | Sushi Restaurant | Café | Shopping Mall | Pizza Place | Lounge | Fast Food Restaurant | Electronics Store | Middle Eastern Restaurant |
| 7 | North York | 0 | Wine Bar | Pharmacy | Pizza Place | Coffee Shop | Accessories Store | Lounge | Movie Theater | Mobile Phone Shop | Miscellaneous Shop | Middle Eastern Restaurant |
| 11 | North York | 0 | Bank | Coffee Shop | Middle Eastern Restaurant | Chinese Restaurant | Pizza Place | Deli / Bodega | Mobile Phone Shop | Diner | Pharmacy | Sandwich Place |
| 14 | North York | 0 | Shopping Mall | Grocery Store | Hotel | Bank | Park | Liquor Store | Movie Theater | Mobile Phone Shop | Miscellaneous Shop | Middle Eastern Restaurant |
| 16 | North York | 0 | Gym / Fitness Center | Liquor Store | Grocery Store | Athletics & Sports | Discount Store | Movie Theater | Mobile Phone Shop | Miscellaneous Shop | Middle Eastern Restaurant | Metro Station |
| 17 | North York | 0 | Hockey Arena | Pizza Place | Portuguese Restaurant | Coffee Shop | Accessories Store | Luggage Store | Movie Theater | Mobile Phone Shop | Miscellaneous Shop | Middle Eastern Restaurant |
| 18 | North York | 0 | Coffee Shop | Italian Restaurant | Pizza Place | Sandwich Place | Juice Bar | Pet Store | Pharmacy | Comfort Food Restaurant | Fast Food Restaurant | Café |
| 22 | North York | 0 | Pizza Place | Caribbean Restaurant | Intersection | Accessories Store | Lounge | Movie Theater | Mobile Phone Shop | Miscellaneous Shop | Middle Eastern Restaurant | Metro Station |


In [470]:
second_cluster =ny_merged.loc[ny_merged['Cluster Labels'] == 1, ny_merged.columns[[1] + list(range(5, ny_merged.shape[1]))]]
second_cluster

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | North York | 1 | Fast Food Restaurant | Golf Course | Pool | Mediterranean Restaurant | Dog Run | Liquor Store | Mobile Phone Shop | Miscellaneous Shop | Middle Eastern Restaurant | Metro Station |
| 1 | North York | 1 | Clothing Store | Coffee Shop | Fast Food Restaurant | Japanese Restaurant | Restaurant | Juice Bar | Bakery | Bank | Food Court | Salon / Barbershop |
| 9 | North York | 1 | Gym | Caribbean Restaurant | Café | Japanese Restaurant | Luggage Store | Paper / Office Supplies Store | Movie Theater | Mobile Phone Shop | Miscellaneous Shop | Middle Eastern Restaurant |
| 10 | North York | 1 | Gym | Coffee Shop | Restaurant | Beer Store | Discount Store | Clothing Store | Chinese Restaurant | Sandwich Place | Shopping Mall | Sporting Goods Shop |
| 12 | North York | 1 | Coffee Shop | Caribbean Restaurant | Miscellaneous Shop | Furniture / Home Store | Bar | Metro Station | Massage Studio | Luggage Store | Park | Paper / Office Supplies Store |
| 13 | North York | 1 | Airport | Business Service | Park | Liquor Store | Movie Theater | Mobile Phone Shop | Miscellaneous Shop | Middle Eastern Restaurant | Metro Station | Men's Store |
| 19 | North York | 1 | Clothing Store | Accessories Store | Boutique | Vietnamese Restaurant | Shoe Store | Miscellaneous Shop | Gift Shop | Furniture / Home Store | Coffee Shop | Carpet Store |
| 20 | North York | 1 | Playground | Bakery | Japanese Restaurant | Sushi Restaurant | Accessories Store | Lounge | Movie Theater | Mobile Phone Shop | Miscellaneous Shop | Middle Eastern Restaurant |


In [471]:
third_cluster =ny_merged.loc[ny_merged['Cluster Labels'] == 2, ny_merged.columns[[1] + list(range(5, ny_merged.shape[1]))]]
third_cluster

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | North York | 2 | Gym | Park | Movie Theater | Mobile Phone Shop | Miscellaneous Shop | Middle Eastern Restaurant | Metro Station | Men's Store | Mediterranean Restaurant | Massage Studio |
| 6 | North York | 2 | Park | Construction & Landscaping | Convenience Store | Lounge | Movie Theater | Mobile Phone Shop | Miscellaneous Shop | Middle Eastern Restaurant | Metro Station | Men's Store |
| 8 | North York | 2 | Food & Drink Shop | Park | Accessories Store | Liquor Store | Movie Theater | Mobile Phone Shop | Miscellaneous Shop | Middle Eastern Restaurant | Metro Station | Men's Store |
| 21 | North York | 2 | Park | Construction & Landscaping | Bakery | Basketball Court | Lounge | Paper / Office Supplies Store | Movie Theater | Mobile Phone Shop | Miscellaneous Shop | Middle Eastern Restaurant |


In [472]:
fourth_cluster =ny_merged.loc[ny_merged['Cluster Labels'] == 3, ny_merged.columns[[1] + list(range(5, ny_merged.shape[1]))]]
fourth_cluster

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15 | North York | 3 | Business Service | Baseball Field | Food Truck | Accessories Store | Lounge | Paper / Office Supplies Store | Movie Theater | Mobile Phone Shop | Miscellaneous Shop | Middle Eastern Restaurant |
| 23 | North York | 3 | Paper / Office Supplies Store | Baseball Field | Accessories Store | Liquor Store | Movie Theater | Mobile Phone Shop | Miscellaneous Shop | Middle Eastern Restaurant | Metro Station | Men's Store |


In [475]:
clusters = [first_cluster, second_cluster, third_cluster, fourth_cluster]
for cluster in clusters:
    print(get_most_common_place(cluster, 2).head(10),"\n")

                      Venues  Count
0  Middle Eastern Restaurant      7
1          Mobile Phone Shop      7
2         Miscellaneous Shop      6
3                Pizza Place      6
4              Movie Theater      6
5                Coffee Shop      5
6          Accessories Store      4
7                     Lounge      3
8                   Pharmacy      3
9                       Café      3 

                          Venues  Count
0             Miscellaneous Shop      6
1              Mobile Phone Shop      4
2                    Coffee Shop      4
3      Middle Eastern Restaurant      4
4                 Clothing Store      3
5                  Movie Theater      3
6                  Metro Station      3
7            Japanese Restaurant      3
8                   Liquor Store      2
9  Paper / Office Supplies Store      2 

                       Venues  Count
0                        Park      4
1               Movie Theater      4
2   Middle Eastern Restaurant      4
3           

From the clustering we see that the first cluster contains very busy neighbourhoods with many restaurants, cafés and shops; the second cluster is moderately busy, with fewer places; the third cluster is quieter, with parks as the most frequent top venue; and the fourth cluster has relatively few places to visit nearby.