<a id='index'></a>
# Comparing Toronto with the most important South American cities

## INDEX

[Problem Description & Objectives](#desc1)

[Data Description](#desc2)

[Analysis Part 1 - Scrape Wikipedia List of postal codes](#p1)

[Analysis Part 2 - Toronto Geospatial data](#p2)

[Analysis Part 3 - Explore and cluster the neighborhoods in Toronto](#p3)

[Analysis Part 3.1. - Analyze Each Neighborhood](#p31)

[Analysis Part 3.2. - Explore and cluster the neighborhoods in Toronto](#p32)

[Analysis Part 3.3. - Examine Toronto Clusters](#p33)

[Analysis Part 4. - Examining Sao Paulo (Brazil), Buenos Aires (Argentina), Montevideo (Uruguay)](#p4)

[Analysis Part 5. - Fitting Sao Paulo (Brazil), Buenos Aires (Argentina), Montevideo (Uruguay)](#p5)

[Conclusions](#p6)


<a id='desc1'></a>

[Go back to index](#index)
## Problem Description & Objectives



This project compares the central areas of major South American cities with the neighborhoods of Toronto, Canada.


The expected result is to identify which cluster of Toronto neighborhoods is most similar to each of these cities.

Three South American cities will be analyzed:
* Sao Paulo (Brazil),
* Buenos Aires (Argentina) and
* Montevideo (Uruguay).

<a id='desc2'></a>
[Go back to index](#index)
## Data Description




The data used will be:

* Foursquare API venue data, which provides the most common establishments for every analyzed location.

* List of neighborhoods and postcodes in Toronto (Wikipedia - https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M).

* Toronto geospatial data (https://cocl.us/Geospatial_data).

In part 1 of this project (see index) we scrape the information from Wikipedia and organize it.

In part 2 we take the Toronto geospatial data and combine it with the data from part 1.

In part 3 we connect to the Foursquare API and fetch venue data for each Toronto neighborhood. We then run k-means to group these neighborhoods into 10 clusters and analyze each cluster.

In part 4 we look at the central regions of the 3 South American cities and, in part 5, we fit these cities to the Toronto neighborhood clusters.

In the conclusion, we expect to identify the Toronto cluster that each South American city would belong to, as a way of finding which group of neighborhoods most resembles each city.
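
To make the "fitting" idea concrete, here is a minimal sketch of how a city could be assigned to an existing cluster. All names and numbers are hypothetical stand-ins, not results from this project. It assumes each city is described by the same kind of venue-category frequency vector as the Toronto neighborhoods, and it uses a plain nearest-center rule, which is the assignment rule a fitted scikit-learn `KMeans` applies in `predict`.

```python
import numpy as np
import pandas as pd

# Hypothetical venue-category frequencies for three Toronto neighborhoods.
# Only the column order is used below; the real values come from Foursquare.
toronto_freq = pd.DataFrame(
    {"Coffee Shop": [0.6, 0.1, 0.0],
     "Park":        [0.1, 0.8, 0.1],
     "Bar":         [0.3, 0.1, 0.9]},
    index=["hood_a", "hood_b", "hood_c"],
)

# Hypothetical cluster centers in the same category space
# (with scikit-learn these would be kmeans.cluster_centers_).
centers = np.array([[0.6, 0.1, 0.3],   # cluster 0: coffee-heavy
                    [0.1, 0.8, 0.1],   # cluster 1: park-heavy
                    [0.0, 0.1, 0.9]])  # cluster 2: bar-heavy

# Hypothetical frequencies for a city centre. It has a category
# ("Steakhouse") that Toronto lacks and has no "Park" venues at all.
city_freq = pd.Series({"Coffee Shop": 0.2, "Bar": 0.7, "Steakhouse": 0.1})

# Align the city vector to Toronto's columns: categories unknown to
# Toronto are dropped, categories missing in the city become 0.
city_aligned = city_freq.reindex(toronto_freq.columns, fill_value=0.0)

# Assign the city to the nearest cluster center (Euclidean distance).
distances = np.linalg.norm(centers - city_aligned.to_numpy(), axis=1)
city_cluster = int(np.argmin(distances))
print(city_cluster)
```

The alignment step matters: the city's vector must have exactly the same columns, in the same order, as the data the clusters were built from.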

<a id='p1'></a>

[Go back to index](#index)
## Analysis Part 1 - Scrape Wikipedia List of postal codes


### Importing libraries

In [1]:
import pandas as pd
import numpy as np

### Scrape the dataframe

In [2]:
#https://stackoverflow.com/questions/55234512/how-to-scrap-wikipedia-tables-with-python
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
table = pd.read_html(url)[0]
table

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor
7,M7A,Queen's Park,Not assigned
8,M8A,Not assigned,Not assigned
9,M9A,Queen's Park,Queen's Park


###  Wrangling the data

In [3]:
table = table.drop(table[(table.Borough == "Not assigned")].index)

### Combining neighbourhoods that share a postcode into one row

In [4]:
table["duplicated"]=table.duplicated(keep='first', subset="Postcode")
table['Neighborhood_2'] = np.where(table['duplicated']==True, table.Neighbourhood +', ', table.Neighbourhood)
table.drop(['Neighbourhood', 'duplicated'], axis=1, inplace=True)
table = table.groupby(['Postcode', 'Borough'], as_index=False).sum()
table.rename(columns={"Neighborhood_2": "Neighborhood"}, inplace=True)
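
The outputs further down show fused names such as `RougeMalvern,`. This happens because the cell above appends the separator only to the duplicated rows, so `sum()` glues the first name directly onto the second. As a possible alternative (a sketch on a small stand-in table, not the code that produced the outputs in this notebook), the names can be joined with the separator placed between them:

```python
import pandas as pd

# Small stand-in for the scraped table (rows copied from the output above).
sample = pd.DataFrame({
    "Postcode": ["M5A", "M6A", "M6A"],
    "Borough": ["Downtown Toronto", "North York", "North York"],
    "Neighbourhood": ["Harbourfront", "Lawrence Heights", "Lawrence Manor"],
})

# Join all neighbourhood names that share a postcode and borough,
# putting ", " between names instead of after them.
combined = (
    sample.groupby(["Postcode", "Borough"], as_index=False)["Neighbourhood"]
          .agg(", ".join)
          .rename(columns={"Neighbourhood": "Neighborhood"})
)
print(combined)
```

With this approach the helper columns `duplicated` and `Neighborhood_2` are not needed.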

### Setting "Not assigned" neighborhoods to the borough name

In [5]:
table['Neighborhood_2'] = np.where(table['Neighborhood']=="Not assigned", table.Borough, table.Neighborhood)
table.drop(['Neighborhood'], axis=1, inplace=True)
table.rename(columns={"Neighborhood_2": "Neighborhood"}, inplace=True)

In [6]:
table.head()

Unnamed: 0,Postcode,Borough,Neighborhood
0,M1B,Scarborough,"RougeMalvern,"
1,M1C,Scarborough,"Highland CreekRouge Hill, Port Union,"
2,M1E,Scarborough,"GuildwoodMorningside, West Hill,"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [7]:
table.shape

(103, 3)

<a id='p2'></a>
[Go back to index](#index)
## Analysis Part 2 - Toronto Geospatial data

In [8]:
data = pd.read_csv('https://cocl.us/Geospatial_data')
data.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [9]:
table = table.merge(data, left_on='Postcode', right_on='Postal Code')
table.drop(['Postal Code'], axis=1, inplace=True)
table.head(11)

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"RougeMalvern,",43.806686,-79.194353
1,M1C,Scarborough,"Highland CreekRouge Hill, Port Union,",43.784535,-79.160497
2,M1E,Scarborough,"GuildwoodMorningside, West Hill,",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount ParkIonview, Kennedy Park,",43.727929,-79.262029
7,M1L,Scarborough,"ClairleaGolden Mile, Oakridge,",43.711112,-79.284577
8,M1M,Scarborough,"CliffcrestCliffside, Scarborough Village West,",43.716316,-79.239476
9,M1N,Scarborough,"Birch CliffCliffside West,",43.692657,-79.264848
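
The merge above is an inner join, so a postcode without coordinates would be dropped silently. A hedged sketch of how this could be checked, using stand-in tables (the postcode `M9Z` and the borough `Example` are made up for illustration):

```python
import pandas as pd

# Stand-ins for the two tables; coordinates are copied from the output above,
# while "M9Z" / "Example" are hypothetical and have no coordinates.
postcodes = pd.DataFrame({"Postcode": ["M1B", "M1C", "M9Z"],
                          "Borough": ["Scarborough", "Scarborough", "Example"]})
coords = pd.DataFrame({"Postal Code": ["M1B", "M1C"],
                       "Latitude": [43.806686, 43.784535],
                       "Longitude": [-79.194353, -79.160497]})

# A left join keeps every postcode; validate= raises if a key is repeated,
# and indicator= adds a "_merge" column saying where each row came from.
checked = postcodes.merge(coords, how="left", left_on="Postcode",
                          right_on="Postal Code", validate="one_to_one",
                          indicator=True)

# Postcodes that found no matching coordinates.
unmatched = checked.loc[checked["_merge"] == "left_only", "Postcode"].tolist()
print(unmatched)
```

An empty `unmatched` list would confirm that the inner join loses no rows.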


<a id='p3'></a>
[Go back to index](#index)
## Analysis Part 3 - Explore and cluster the neighborhoods in Toronto

### Importing libraries

In [10]:
!pip install geopy



In [11]:
import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

### Creating a map with neighborhoods superimposed on top

In [12]:
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto City are 43.653963, -79.387207.


In [13]:
toronto_data = table
toronto_data.head()

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"RougeMalvern,",43.806686,-79.194353
1,M1C,Scarborough,"Highland CreekRouge Hill, Port Union,",43.784535,-79.160497
2,M1E,Scarborough,"GuildwoodMorningside, West Hill,",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476




In [14]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, label in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

#### Define Foursquare Credentials and Version

In [46]:
CLIENT_ID = '##' # your Foursquare ID
CLIENT_SECRET = '##' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: ##
CLIENT_SECRET:##


In [16]:
toronto_data.loc[0, 'Neighborhood']

'RougeMalvern, '

In [17]:
neighborhood_latitude = toronto_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = toronto_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = toronto_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of RougeMalvern,  are 43.806686299999996, -79.19435340000001.


In [18]:
LIMIT = 100
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=##&client_secret=##&v=20180605&ll=43.806686299999996,-79.19435340000001&radius=500&limit=100'

#### get_category_type function from the Foursquare lab.

In [19]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

### Clean the JSON and structure it into a *pandas* dataframe.

In [20]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
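
The function above indexes `["response"]['groups'][0]['items']` and `['categories'][0]` directly, so a response without groups or a venue without categories would raise an error. The sketch below shows one way the parsing step could tolerate such gaps. `parse_venues` and `fake_payload` are hypothetical names introduced here, and the payload only mimics the fields read above; it is not real Foursquare output.

```python
def parse_venues(payload, name, lat, lng):
    """Flatten one explore-style response into rows, tolerating gaps."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        rows.append((
            name, lat, lng,
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            # None when the venue has no category instead of an IndexError
            categories[0]["name"] if categories else None,
        ))
    return rows

# Hypothetical payload shaped like the fields the function above reads.
fake_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Cafe A", "location": {"lat": 1.0, "lng": 2.0},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Mystery Spot", "location": {"lat": 1.1, "lng": 2.1},
               "categories": []}},
]}]}}

rows = parse_venues(fake_payload, "Demo", 1.0, 2.0)
empty = parse_venues({"response": {}}, "Demo", 1.0, 2.0)
print(rows, empty)
```

Keeping the parsing separate from the HTTP request also makes it possible to test it without network access or credentials.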

#### Now run the above function on each neighborhood and create a new dataframe called *toronto_venues*.

In [21]:
toronto_venues = getNearbyVenues(names=toronto_data['Neighborhood'],
                                   latitudes=toronto_data['Latitude'],
                                   longitudes=toronto_data['Longitude']
                                  )

RougeMalvern, 
Highland CreekRouge Hill, Port Union, 
GuildwoodMorningside, West Hill, 
Woburn
Cedarbrae
Scarborough Village
East Birchmount ParkIonview, Kennedy Park, 
ClairleaGolden Mile, Oakridge, 
CliffcrestCliffside, Scarborough Village West, 
Birch CliffCliffside West, 
Dorset ParkScarborough Town Centre, Wexford Heights, 
MaryvaleWexford, 
Agincourt
Clarks CornersSullivan, Tam O'Shanter, 
Agincourt NorthL'Amoreaux East, Milliken, Steeles East, 
L'Amoreaux West
Upper Rouge
Hillcrest Village
FairviewHenry Farm, Oriole, 
Bayview Village
Silver HillsYork Mills, 
NewtonbrookWillowdale, 
Willowdale South
York Mills West
Willowdale West
Parkwoods
Don Mills North
Flemingdon ParkDon Mills South, 
Bathurst ManorDownsview North, Wilson Heights, 
Northwood ParkYork University, 
CFB TorontoDownsview East, 
Downsview West
Downsview Central
Downsview Northwest
Victoria Village
Woodbine GardensParkview Hill, 
Woodbine Heights
The Beaches
Leaside
Thorncliffe Park
East Toronto
The Danforth WestRi

#### Let's check the size of the resulting dataframe

In [22]:
print(toronto_venues.shape)
toronto_venues.head()

(2260, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"RougeMalvern,",43.806686,-79.194353,Wendy's,43.807448,-79.199056,Fast Food Restaurant
1,"Highland CreekRouge Hill, Port Union,",43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
2,"Highland CreekRouge Hill, Port Union,",43.784535,-79.160497,Scarborough Historical Society,43.788755,-79.162438,History Museum
3,"GuildwoodMorningside, West Hill,",43.763573,-79.188711,Swiss Chalet Rotisserie & Grill,43.767697,-79.189914,Pizza Place
4,"GuildwoodMorningside, West Hill,",43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store


In [23]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"AdelaideKing, Richmond,",100,100,100,100,100,100
Agincourt,4,4,4,4,4,4
"Agincourt NorthL'Amoreaux East, Milliken, Steeles East,",3,3,3,3,3,3
"Albion GardensBeaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown,",7,7,7,7,7,7
"AlderwoodLong Branch,",10,10,10,10,10,10
"Bathurst ManorDownsview North, Wilson Heights,",21,21,21,21,21,21
Bayview Village,4,4,4,4,4,4
"Bedford ParkLawrence Manor East,",25,25,25,25,25,25
Berczy Park,56,56,56,56,56,56
"Birch CliffCliffside West,",4,4,4,4,4,4


#### Let's find out how many unique categories can be curated from all the returned venues

In [24]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 271 unique categories.


<a id='p31'></a>
[Go back to index](#index)
## Analysis Part 3.1. - Analyze Each Neighborhood

In [25]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [26]:
toronto_onehot.shape

(2260, 271)
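
Two details in the output above are worth a second look: the one-hot table has 271 columns, the same as the number of unique categories, even though a `Neighborhood` column was added, and the first column shown is `Yoga Studio` rather than `Neighborhood`. One explanation that would fit both (an assumption, not verified here) is that one of the venue categories is itself named "Neighborhood", so the assignment overwrote that column in place instead of appending a new one. A sketch of a way to guard against such a name clash, using made-up data:

```python
import pandas as pd

# Hypothetical venues; note the venue category literally named "Neighborhood".
venues = pd.DataFrame({
    "Neighborhood": ["Woburn", "Woburn", "Cedarbrae"],
    "Venue Category": ["Coffee Shop", "Neighborhood", "Bakery"],
})

onehot = pd.get_dummies(venues["Venue Category"], dtype=int)

# Rename the clashing category before adding the label column.
onehot = onehot.rename(columns={"Neighborhood": "Neighborhood (category)"})

# insert() places the label column first and raises a ValueError
# if a column with that name already exists.
onehot.insert(0, "Neighborhood", venues["Neighborhood"])
print(list(onehot.columns))
```

Because `insert` refuses duplicate names, a clash shows up as an error rather than as a silently overwritten column.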

#### Next, let's group rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [27]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,"AdelaideKing, Richmond,",0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.0,0.020000,0.000000,0.000000,0.000000,0.000000,0.010000,0.000000,0.000000,0.0
1,Agincourt,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0
2,"Agincourt NorthL'Amoreaux East, Milliken, Stee...",0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0
3,"Albion GardensBeaumond Heights, Humbergate, Ja...",0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0
4,"AlderwoodLong Branch,",0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0
5,"Bathurst ManorDownsview North, Wilson Heights,",0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.0,0.000000,0.000000,0.047619,0.000000,0.000000,0.000000,0.000000,0.000000,0.0
6,Bayview Village,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0
7,"Bedford ParkLawrence Manor East,",0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0
8,Berczy Park,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.0,0.017857,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0
9,"Birch CliffCliffside West,",0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0


In [28]:
toronto_grouped.shape

(99, 271)

#### Let's print each neighborhood along with the top 5 most common venues

In [29]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----AdelaideKing, Richmond, ----
         venue  freq
0  Coffee Shop  0.07
1         Café  0.05
2   Steakhouse  0.04
3          Bar  0.04
4        Hotel  0.03


----Agincourt----
                       venue  freq
0             Sandwich Place  0.25
1                     Lounge  0.25
2             Breakfast Spot  0.25
3  Latin American Restaurant  0.25
4              Metro Station  0.00


----Agincourt NorthL'Amoreaux East, Milliken, Steeles East, ----
                        venue  freq
0                  Playground  0.33
1            Asian Restaurant  0.33
2                        Park  0.33
3               Metro Station  0.00
4  Modern European Restaurant  0.00


----Albion GardensBeaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown, ----
                  venue  freq
0  Fast Food Restaurant  0.14
1   Fried Chicken Joint  0.14
2        Sandwich Place  0.14
3              Pharmacy  0.14
4            Beer Store  0.14


----AlderwoodLong Branch, 

                venue  freq
0    Department Store  0.17
1  Chinese Restaurant  0.17
2      Discount Store  0.17
3         Bus Station  0.17
4         Coffee Shop  0.17


----East Toronto----
                        venue  freq
0                        Park  0.50
1           Convenience Store  0.25
2                 Coffee Shop  0.25
3                 Men's Store  0.00
4  Modern European Restaurant  0.00


----EmeryHumberlea, ----
                             venue  freq
0                   Baseball Field   1.0
1                      Yoga Studio   0.0
2               Mexican Restaurant   0.0
3  Molecular Gastronomy Restaurant   0.0
4       Modern European Restaurant   0.0


----FairviewHenry Farm, Oriole, ----
                  venue  freq
0        Clothing Store  0.15
1  Fast Food Restaurant  0.07
2           Coffee Shop  0.07
3         Women's Store  0.04
4     Electronics Store  0.03


----First Canadian PlaceUnderground city, ----
         venue  freq
0  Coffee Shop  0.12
1         

                  venue  freq
0        Clothing Store  0.07
1           Coffee Shop  0.07
2        Cosmetics Shop  0.04
3  Fast Food Restaurant  0.03
4                  Café  0.03


----Scarborough Village----
                             venue  freq
0                       Playground   0.5
1                   Cosmetics Shop   0.5
2               Mexican Restaurant   0.0
3  Molecular Gastronomy Restaurant   0.0
4       Modern European Restaurant   0.0


----Silver HillsYork Mills, ----
                             venue  freq
0                        Cafeteria   1.0
1               Mexican Restaurant   0.0
2              Monument / Landmark   0.0
3  Molecular Gastronomy Restaurant   0.0
4       Modern European Restaurant   0.0


----St. James Town----
                venue  freq
0         Coffee Shop  0.06
1                Café  0.06
2               Hotel  0.05
3          Restaurant  0.05
4  Italian Restaurant  0.04


----Stn A PO Boxes 25 The Esplanade----
         venue  freq
0  Coff

#### Let's put that into a *pandas* dataframe

In [30]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
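
As a quick illustration of what this helper returns, the sketch below applies it to a single made-up row (the neighborhood name and frequencies are hypothetical). The function is repeated so the example runs on its own.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the neighborhood name), keep the frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical row in the same layout as toronto_grouped:
# neighborhood name first, then one frequency per venue category.
row = pd.Series({"Neighborhood": "Demo", "Bakery": 0.1,
                 "Coffee Shop": 0.6, "Park": 0.3})

top2 = list(return_most_common_venues(row, 2))
print(top2)
```

Note that the helper always returns `num_top_venues` names. For a neighborhood with only a few venues, the remaining slots are filled with categories whose frequency is 0, so the lower-ranked "most common" venues in such rows carry no information.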

#### Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [31]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"AdelaideKing, Richmond,",Coffee Shop,Café,Bar,Steakhouse,Breakfast Spot,Asian Restaurant,Restaurant,Bakery,Hotel,Sushi Restaurant
1,Agincourt,Lounge,Sandwich Place,Latin American Restaurant,Breakfast Spot,Dog Run,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Donut Shop
2,"Agincourt NorthL'Amoreaux East, Milliken, Stee...",Park,Playground,Asian Restaurant,Women's Store,Discount Store,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner
3,"Albion GardensBeaumond Heights, Humbergate, Ja...",Pharmacy,Beer Store,Fried Chicken Joint,Fast Food Restaurant,Pizza Place,Sandwich Place,Grocery Store,Airport Lounge,Event Space,Empanada Restaurant
4,"AlderwoodLong Branch,",Pizza Place,Gym,Pub,Coffee Shop,Sandwich Place,Pharmacy,Dance Studio,Skating Rink,Pool,Deli / Bodega


<a id='p32'></a>
[Go back to index](#index)
## Analysis Part 3.2. - Explore and cluster the neighborhoods in Toronto

Run *k*-means to cluster the neighborhoods into 10 clusters.

In [32]:
# set number of clusters
kclusters = 10

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 2, 8, 0, 0, 4, 4, 4, 4, 0])

In [33]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_data

# join the cluster labels and top venues onto toronto_data, which holds latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,"RougeMalvern,",43.806686,-79.194353,9.0,Fast Food Restaurant,Women's Store,Doner Restaurant,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Donut Shop
1,M1C,Scarborough,"Highland CreekRouge Hill, Port Union,",43.784535,-79.160497,7.0,Bar,History Museum,Women's Store,Dog Run,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Doner Restaurant
2,M1E,Scarborough,"GuildwoodMorningside, West Hill,",43.763573,-79.188711,0.0,Electronics Store,Breakfast Spot,Mexican Restaurant,Rental Car Location,Intersection,Medical Center,Pizza Place,Doner Restaurant,Dog Run,Discount Store
3,M1G,Scarborough,Woburn,43.770992,-79.216917,3.0,Coffee Shop,Korean Restaurant,Women's Store,Dog Run,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Donut Shop
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,2.0,Fried Chicken Joint,Bakery,Hakka Restaurant,Bank,Athletics & Sports,Thai Restaurant,Caribbean Restaurant,Discount Store,Dessert Shop,Dim Sum Restaurant


In [34]:
toronto_merged['Cluster Labels'] = toronto_merged['Cluster Labels'].fillna(0).astype(int)
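
The `fillna(0)` above gives label 0 to neighborhoods for which no venues were returned. That is why the Cluster 1 listing further down includes rows with empty venue columns, such as Upper Rouge. If that is not intended, two possible alternatives are sketched here on a small stand-in table (the label values are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical merged table: "Upper Rouge" returned no venues,
# so it has no cluster label after the join.
merged = pd.DataFrame({
    "Neighborhood": ["Woburn", "Upper Rouge", "Cedarbrae"],
    "Cluster Labels": [3.0, np.nan, 2.0],
})

# Option 1: keep only neighborhoods that were actually clustered.
clustered = merged.dropna(subset=["Cluster Labels"]).copy()
clustered["Cluster Labels"] = clustered["Cluster Labels"].astype(int)

# Option 2: keep every row but mark unclustered ones with a sentinel (-1).
flagged = merged.copy()
flagged["Cluster Labels"] = flagged["Cluster Labels"].fillna(-1).astype(int)

print(clustered)
print(flagged)
```

Either option keeps label 0 reserved for neighborhoods that k-means really assigned to it.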

### Finally, let's visualize the resulting clusters

In [35]:
# create map
map_clusters = folium.Map(location=[latitude+0.06, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

<a id='p33'></a>
[Go back to index](#index)
## Analysis Part 3.3. - Examine Toronto Clusters

#### Cluster 1

In [36]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[2] + list(range(6, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,"GuildwoodMorningside, West Hill,",Electronics Store,Breakfast Spot,Mexican Restaurant,Rental Car Location,Intersection,Medical Center,Pizza Place,Doner Restaurant,Dog Run,Discount Store
7,"ClairleaGolden Mile, Oakridge,",Bakery,Bus Line,Soccer Field,Fast Food Restaurant,Bus Station,Intersection,Metro Station,Park,Donut Shop,Doner Restaurant
9,"Birch CliffCliffside West,",Skating Rink,Café,College Stadium,General Entertainment,Dog Run,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store
13,"Clarks CornersSullivan, Tam O'Shanter,",Pharmacy,Pizza Place,Bank,Fried Chicken Joint,Noodle House,Fast Food Restaurant,Chinese Restaurant,Thai Restaurant,Italian Restaurant,Dim Sum Restaurant
16,Upper Rouge,,,,,,,,,,
21,"NewtonbrookWillowdale,",,,,,,,,,,
26,Don Mills North,Japanese Restaurant,Gym / Fitness Center,Café,Caribbean Restaurant,Baseball Field,Women's Store,Dim Sum Restaurant,Diner,Discount Store,Dog Run
35,"Woodbine GardensParkview Hill,",Fast Food Restaurant,Pizza Place,Pharmacy,Gastropub,Bus Line,Intersection,Bank,Café,Athletics & Sports,Breakfast Spot
76,"Dovercourt VillageDufferin,",Supermarket,Bakery,Pharmacy,Middle Eastern Restaurant,Music Venue,Café,Bar,Bank,Brewery,Gym / Fitness Center
88,"Humber Bay ShoresMimico South, New Toronto,",Café,Gym,Bakery,Fried Chicken Joint,Flower Shop,Fast Food Restaurant,Liquor Store,Pizza Place,Restaurant,Sandwich Place


#### Cluster 2

In [37]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[2] + list(range(6, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
20,"Silver HillsYork Mills,",Cafeteria,Dog Run,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Women's Store,Dance Studio


#### Cluster 3

In [38]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[2] + list(range(6, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Cedarbrae,Fried Chicken Joint,Bakery,Hakka Restaurant,Bank,Athletics & Sports,Thai Restaurant,Caribbean Restaurant,Discount Store,Dessert Shop,Dim Sum Restaurant
11,"MaryvaleWexford,",Middle Eastern Restaurant,Auto Garage,Sandwich Place,Vietnamese Restaurant,Breakfast Spot,Discount Store,Department Store,Dessert Shop,Dim Sum Restaurant,Diner
12,Agincourt,Lounge,Sandwich Place,Latin American Restaurant,Breakfast Spot,Dog Run,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Donut Shop
31,Downsview West,Grocery Store,Hotel,Convenience Store,Shopping Mall,Bank,Park,German Restaurant,Curling Ice,Dumpling Restaurant,Drugstore
33,Downsview Northwest,Grocery Store,Discount Store,Athletics & Sports,Liquor Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dance Studio
75,Christie,Grocery Store,Café,Park,Athletics & Sports,Restaurant,Baby Store,Coffee Shop,Candy Store,Italian Restaurant,Nightclub
79,"DownsviewNorth Park, Upwood Park,",Construction & Landscaping,Basketball Court,Park,Bakery,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Women's Store
81,"The Junction NorthRunnymede,",Convenience Store,Bus Line,Brewery,Caribbean Restaurant,Dog Run,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Women's Store
92,"Kingsway Park South WestMimico NW, The Queensw...",Gym,Fast Food Restaurant,Tanning Salon,Burrito Place,Sandwich Place,Burger Joint,Discount Store,Convenience Store,Bakery,Supplement Shop


#### Cluster 4

In [39]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[2] + list(range(6, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Woburn,Coffee Shop,Korean Restaurant,Women's Store,Dog Run,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Donut Shop
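
The selection repeated in each of these cells keeps the column at position 2 plus every column from position 6 onward, and filters rows by cluster label. The following is a minimal sketch of that pattern on a toy frame. The helper name `cluster_view` and the layout of the first six columns are assumptions for illustration, not taken from the notebook.

```python
import pandas as pd

# Toy stand-in for toronto_merged. The first six column names are an
# assumed layout; only 'Cluster Labels' and the venue columns appear above.
toy = pd.DataFrame({
    "Postcode": ["M1A", "M1B", "M1C"],
    "Borough": ["B1", "B1", "B2"],
    "Neighborhood": ["N1", "N2", "N3"],
    "Latitude": [43.1, 43.2, 43.3],
    "Longitude": [-79.1, -79.2, -79.3],
    "Cluster Labels": [3, 4, 3],
    "1st Most Common Venue": ["Coffee Shop", "Park", "Bar"],
    "2nd Most Common Venue": ["Diner", "Gym", "Café"],
})


def cluster_view(df, label):
    """Return the rows of one cluster, keeping column 2 and columns 6 onward."""
    keep = df.columns[[2] + list(range(6, df.shape[1]))]
    return df.loc[df["Cluster Labels"] == label, keep]


view = cluster_view(toy, 3)
print(view)
```

Because the cluster labels start at 0, label 3 corresponds to the heading "Cluster 4".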


#### Cluster 5

In [40]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[2] + list(range(6, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,"East Birchmount ParkIonview, Kennedy Park,",Chinese Restaurant,Convenience Store,Bus Station,Discount Store,Department Store,Coffee Shop,Dumpling Restaurant,Drugstore,Donut Shop,Doner Restaurant
8,"CliffcrestCliffside, Scarborough Village West,",Motel,Movie Theater,American Restaurant,Women's Store,Dance Studio,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store
10,"Dorset ParkScarborough Town Centre, Wexford He...",Indian Restaurant,Light Rail Station,Vietnamese Restaurant,Chinese Restaurant,Brewery,Pet Store,German Restaurant,General Travel,Dumpling Restaurant,Drugstore
15,L'Amoreaux West,Fast Food Restaurant,Chinese Restaurant,Gym,Breakfast Spot,Electronics Store,Coffee Shop,Pizza Place,Camera Store,Sandwich Place,Bubble Tea Shop
17,Hillcrest Village,Dog Run,Golf Course,Athletics & Sports,Pool,Mediterranean Restaurant,Discount Store,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant
18,"FairviewHenry Farm, Oriole,",Clothing Store,Coffee Shop,Fast Food Restaurant,Women's Store,Tea Room,Bakery,Japanese Restaurant,Toy / Game Store,Chinese Restaurant,Electronics Store
19,Bayview Village,Chinese Restaurant,Café,Bank,Japanese Restaurant,Deli / Bodega,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Women's Store
22,Willowdale South,Sushi Restaurant,Ramen Restaurant,Coffee Shop,Sandwich Place,Pizza Place,Café,Ice Cream Shop,Fast Food Restaurant,Hotel,Steakhouse
24,Willowdale West,Pharmacy,Butcher,Home Service,Discount Store,Pizza Place,Coffee Shop,Grocery Store,Airport Lounge,Falafel Restaurant,Ethiopian Restaurant
25,Parkwoods,BBQ Joint,Park,Bus Stop,Food & Drink Shop,Discount Store,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Women's Store


#### Cluster 6

In [41]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 5, toronto_merged.columns[[2] + list(range(6, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
32,Downsview Central,Business Service,Food Truck,Home Service,Baseball Field,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Women's Store
97,"EmeryHumberlea,",Baseball Field,Women's Store,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Event Space


#### Cluster 7

In [42]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 6, toronto_merged.columns[[2] + list(range(6, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Scarborough Village,Playground,Cosmetics Shop,Women's Store,Discount Store,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Dog Run
48,"Moore ParkSummerhill East,",Trail,Restaurant,Playground,Tennis Court,Diner,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant
73,Humewood-Cedarvale,Playground,Field,Hockey Arena,Trail,Women's Store,Discount Store,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant


#### Cluster 8

In [43]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 7, toronto_merged.columns[[2] + list(range(6, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,"Highland CreekRouge Hill, Port Union,",Bar,History Museum,Women's Store,Dog Run,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Doner Restaurant
102,Northwest,Drugstore,Rental Car Location,Bar,Women's Store,Discount Store,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Dog Run


#### Cluster 9

In [44]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 8, toronto_merged.columns[[2] + list(range(6, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,"Agincourt NorthL'Amoreaux East, Milliken, Stee...",Park,Playground,Asian Restaurant,Women's Store,Discount Store,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner
23,York Mills West,Park,Bank,Convenience Store,Women's Store,Discount Store,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Dog Run
30,"CFB TorontoDownsview East,",Park,Airport,Snack Place,Women's Store,Dog Run,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store
40,East Toronto,Park,Coffee Shop,Convenience Store,Women's Store,Dog Run,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store
44,Lawrence Park,Park,Bus Line,Swim School,Women's Store,Discount Store,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Dog Run
50,Rosedale,Park,Playground,Trail,Women's Store,Discount Store,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant
74,Caledonia-Fairbanks,Park,Women's Store,Fast Food Restaurant,Market,Dog Run,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store
98,Weston,Park,Convenience Store,Women's Store,Dog Run,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Doner Restaurant


#### Cluster 10

In [45]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 9, toronto_merged.columns[[2] + list(range(6, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"RougeMalvern,",Fast Food Restaurant,Women's Store,Doner Restaurant,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Donut Shop
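
To compare the clusters at a glance rather than table by table, one option is to count the neighborhoods in each cluster and report the most frequent value of the "1st Most Common Venue" column. The sketch below does this for a few rows copied from the tables above. The function name `summarize_clusters` and the output column names are assumptions, and the real notebook would pass `toronto_merged` instead of the sample frame.

```python
import pandas as pd

# A few rows taken from the cluster tables above (labels 6, 8 and 9).
sample = pd.DataFrame({
    "Neighborhood": [
        "Scarborough Village", "Humewood-Cedarvale", "Rosedale",
        "Lawrence Park", "Weston", "East Toronto",
    ],
    "Cluster Labels": [6, 6, 8, 8, 8, 8],
    "1st Most Common Venue": [
        "Playground", "Playground", "Park", "Park", "Park", "Park",
    ],
})


def summarize_clusters(df):
    """Count neighborhoods per cluster and find the most frequent top venue."""
    grouped = df.groupby("Cluster Labels")["1st Most Common Venue"]
    summary = pd.DataFrame({
        "neighborhoods": grouped.size(),
        "top_venue": grouped.agg(lambda s: s.value_counts().idxmax()),
    })
    # Shift the 0-based labels so they match the 1-based cluster headings.
    summary.index = summary.index + 1
    summary.index.name = "Cluster"
    return summary


summary = summarize_clusters(sample)
print(summary)
```

When two venues tie within a cluster, which one `idxmax` returns depends on the ordering produced by `value_counts`, so a tie should be checked by hand.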


<a id='p4'></a>
[Go back to index](#index)
## Analysis Part 4 - Examining Sao Paulo (Brazil), Buenos Aires (Argentina), Montevideo (Uruguay)

TO BE CONTINUED

<a id='p5'></a>
[Go back to index](#index)
## Analysis Part 5 - Fitting Sao Paulo (Brazil), Buenos Aires (Argentina), Montevideo (Uruguay)

TO BE CONTINUED

<a id='p6'></a>
[Go back to index](#index)
## Conclusions

TO BE CONTINUED