# Canadian Twin Cities?

## Introduction
This report compares two of Canada's most popular and desirable cities: Toronto and Vancouver. The two have long been 'rivals' among migrating city dwellers, professional expatriates, and international tourists.

The desirability and productivity of a city result from its economic growth and development, to which land-use decisions contribute in no small part. Over time, urban planners monitor land use to ensure that the landscape retains its heritage and strategic advantage, and that its usage reaps the highest return.

There are plenty of websites and travel guides on these cities, with extensive write-ups on features such as site, climate, history and culture. However, there is an absence of resources that allow stakeholders to make a simple, direct comparison of neighborhoods' physical characteristics, in terms of land-use mix, across the two cities. I will attempt to provide that comparison using FourSquare's location data.

#### Brief abstract on Toronto
Toronto, the provincial capital of Ontario, is the most populous city in Canada. It is an important international trading centre, with the greatest economic ties to, and influence from, the United States. It is home to the headquarters of Canada's five largest banks and other multinationals, and hosts a dominant stock exchange. By the 1980s, its economy had shifted to service employment, making Toronto a prominent financial, insurance, administration and retailing centre.

#### Brief abstract on Vancouver
Vancouver is a coastal seaport city in the province of British Columbia. It is Canada's major Pacific coast port, home to the largest natural port in Canada, and serves as a main hub for trade with Asia and the Pacific Rim. Trade and transportation are therefore basic components of its economy, together with forestry and mining. It is one of North America's most cosmopolitan places, with one of the most picturesque settings of any city in the world, making it a favorite destination for tourists as well as for film and TV production.

## Significance of this project for stakeholders
This study will be of interest to the following stakeholders:
1. Prospective internal and external immigrants, for reasons such as
  * to adopt a neighborhood with facilities (e.g. types of restaurants, supermarkets) similar to those of their current residence, so as to adapt quickly.
  * a preference to settle in neighborhoods that are either densely built (for vibrancy and convenience) or sparsely built (for better air quality or tranquility).

2. City planners and officials, for reasons such as
  * to monitor the development and growth of specified neighborhoods, and to check whether the landscape has deviated from its intended usage.
  * to compare, and differentiate, the land use of neighborhoods of their city from rival cities, perhaps to boost tourist visits.

3. Companies and businesses, for reasons such as
  * a preference to set up franchises, outlets and branches in other cities at locations whose neighborhood characteristics are similar to those of successful sites in their present city. The success of an office or shop location may correlate with surrounding features (such as nearby transport facilities or complementary business types), and business owners usually pay attention to such details when establishing new sites in another city.

## Neighborhoods for comparison
The comparison is based on metrics derived from the physical form (characteristics) of neighborhoods within designated feature districts of each city. I have selected three feature districts for this task: waterfront/harbour, financial district, and airport.

Justification for neighborhood selection:

1. Shipping trade played a significant role in the early development of both cities, which thrived on trade with neighboring countries. Picturesque waterfronts are also popular destinations for both tourists and affluent residents. A comparison could inform potential residents before they make real estate investments, and help city officials judge whether the land use is reaping good returns in tourist dollars.

   *Neighborhoods: Coal Harbour vs Downtown Toronto, Harbourfront*
   

2. The central business district is the engine of economic growth for any city. A comparison would enable city planners to observe how land use in their own or a rival neighborhood reaps the most profitable returns in terms of rentals. The built density would be an indicator of efficient land use.

    *Neighborhoods: Downtown Vancouver vs Downtown Toronto, Union Station*


3. The airport is the main aviation hub of any city. A comparison of land organization around the neighborhood would allow analysis of traffic movement and vehicle speed, two crucial factors influencing the stature of any aviation hub.

   *Neighborhoods: Vancouver International Airport vs Pearson International Airport*

## Data Sources
Data will be extracted from the following sources:
  * https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M
  * https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Vancouver
  * https://api.foursquare.com/v2/
  * https://www.latlong.net
 
The following outlines the sequence in which the data will be acquired.

1. Neighborhoods in Cities:

   Neighborhoods for Toronto and Vancouver will be scraped from the Wikipedia pages listed above. The boroughs and their respective neighborhoods will be assigned to one dataframe per city. This is the first step in extracting the required neighborhoods for comparison.
   

2. Location data:

   Two new columns, for the latitude and longitude of each neighborhood, will be inserted into the neighborhood dataframes. Latitude and longitude are the Earth's geographical coordinates, which enable location apps such as FourSquare to pinpoint a location. The coordinates for each neighborhood are looked up from a free online geographic tool, such as https://www.latlong.net.
   

3. Neighborhood's venues:

   The geographical coordinates of each neighborhood under comparison will be used to extract the venues located within it, via calls to FourSquare's API. A venue is defined as a place of activity, and ranges from cafes and parks to sports facilities. The response returned by FourSquare's venue endpoint includes data such as name, address, categories and distance. The latter two will be used in the comparison analysis.

   i. Venue categories:
This describes the class or 'group' to which a venue belongs; venues assigned the same category share common characteristics. For example, venues categorized as 'parks' are likely to be large public gardens or areas of land for recreational purposes. This categorical data can be used to compare land-use organization between cities.

   ii. Venue distance:
This measures the distance, in meters, of each venue from the given neighborhood's geographical location. This quantitative data can be used to estimate the density of each neighborhood: venues that are, on average, closer to the neighborhood centre imply denser land use.
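
The venue distance used in this report is the value FourSquare returns. As a rough cross-check, a great-circle distance can be computed from coordinates alone. The sketch below is illustrative only: `haversine_m` and `mean_distance_m` are hypothetical helper names, and the Earth is approximated as a sphere with a mean radius of 6,371 km.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def mean_distance_m(centre, venues):
    """Average distance in metres from a centre to a list of (lat, lng) venues."""
    distances = [haversine_m(centre[0], centre[1], lat, lng) for lat, lng in venues]
    return sum(distances) / len(distances)

# Neighborhood centres used later in this report
harbourfront = (43.6543, -79.3606)
union_station = (43.6408, -79.3818)
print(round(haversine_m(*harbourfront, *union_station)), 'metres apart')
```

A lower mean distance for the same search radius suggests that venues cluster closer to the centre, which is the density reading described above.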

## Strategy for Comparison
#### In Brief
The comparison for each category of neighborhood is performed over three buffer zones measured from the location's centre: 1 km, 3 km and 5 km. For each buffer zone, the top 10 most common venue categories are presented and compared between cities. This allows us to compare and quantify the extent of land-use mix between cities. In addition, the average distance of the venues in each buffer zone from the location, as well as the inter-buffer average venue distance, is calculated to determine the density of each neighborhood.
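
One possible reading of the two distance measures is sketched below: a per-buffer mean over all venues inside each radius, and a per-ring ("inter-buffer") mean over venues lying between consecutive radii. The function name `buffer_and_ring_means` and the sample distances are hypothetical and only illustrate the arithmetic.

```python
from statistics import mean

# Outer edge of each buffer zone, in metres (1 km, 3 km, 5 km)
BUFFER_EDGES_M = (1000, 3000, 5000)

def buffer_and_ring_means(distances_m, edges=BUFFER_EDGES_M):
    """Return (buffer_means, ring_means) for a list of venue distances.

    buffer_means[i]: mean distance of all venues within edges[i] of the centre.
    ring_means[i]:   mean distance of venues beyond the previous edge and within edges[i].
    An empty buffer or ring yields None.
    """
    buffer_means, ring_means = [], []
    inner = float('-inf')  # the first ring starts at the centre itself
    for outer in edges:
        in_buffer = [d for d in distances_m if d <= outer]
        in_ring = [d for d in distances_m if inner < d <= outer]
        buffer_means.append(mean(in_buffer) if in_buffer else None)
        ring_means.append(mean(in_ring) if in_ring else None)
        inner = outer
    return buffer_means, ring_means

# Hypothetical distances (metres) from a neighborhood centre
sample = [200, 400, 900, 1500, 2500, 4000]
buffers, rings = buffer_and_ring_means(sample)
print(buffers)
print(rings)
```

The per-ring figure isolates what each wider radius adds, whereas the per-buffer figure is dominated by the venues already counted in the inner zones.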

#### Importing libraries
Begin by importing the libraries and packages required for data manipulation, clustering and visualization.

In [1]:
#libraries for downloading and parsing webpages  
from bs4 import BeautifulSoup
import requests

#Library for retrieving geographical coordinates
import geocoder

#Libraries for structuring data for manipulation and analysis
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
from pandas import json_normalize  # pandas >= 1.0; the pandas.io.json import path is deprecated
import csv

#Library for handling json files
import json

#Library for converting addresses into latitude and longitude
from geopy.geocoders import Nominatim

#Library for performing vectorized operations on data
import numpy as np

#Library for plotting visuals displays and maps
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium

#Library for handling clustering algorithm
from sklearn.cluster import KMeans

print('All Libraries Imported')

All Libraries Imported


#### Data Retrieval and Extraction
Extracting the data source for neighborhoods in Toronto.

In [4]:
#Retrieving the wikipage on list of postal codes in Toronto
toronto_page = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
#Creating a BS4 object, for parsing webpage content
toronto_soup = BeautifulSoup(toronto_page.content, 'html.parser')
#Extracting the table containing postal codes, borough and neighborhood data
toronto_table = toronto_soup.find(class_='wikitable sortable')

#Writing each row in the table to a csv file
tablerows = toronto_table.find_all('tr')
#The with-block closes the file, so every row is flushed to disk before it is read back
with open('toronto.csv', 'w', newline='') as csv_file:
    f = csv.writer(csv_file)
    f.writerow(['PostalCode', 'Borough', 'Neighborhood'])

    for a_row in tablerows:
        rdata = [cell.get_text(strip = True) for cell in a_row.find_all('td')]
        #Skip the header row (no <td> cells) and postal codes without a borough
        if len(rdata) < 3 or rdata[1] == 'Not assigned':
            continue
        pcode, borough, nhood = rdata[0], rdata[1], rdata[2]
        #A neighborhood that is not assigned takes the name of its borough
        if nhood == 'Not assigned':
            nhood = borough
        f.writerow([pcode, borough, nhood])

#### Transforming the data into pandas dataframe for both cities
Neighborhood lists are retrieved from the Wikipedia pages and populated with their respective geographical coordinates.

In [5]:
#Creating a dataframe for data analysis and manipulation
df = pd.read_csv('toronto.csv')
#Keeping the required columns; DataFrame.append was removed in pandas 2.0, so a copy is taken instead
toronto_df = df[['PostalCode', 'Borough', 'Neighborhood']].copy()

lat_lng_df = pd.read_csv('Geospatial_Coordinates.csv')

#Creating two new columns for each neighborhood's coordinates to the dataframe, and initializing with zero values
toronto_df['Latitude'] = 0.0
toronto_df['Longitude'] = 0.0

#Populating each neighborhood's coordinates with latitude and longitude data
for index1, row1 in toronto_df.iterrows():
    
    for index2, row2 in lat_lng_df.iterrows():
        
        if row1['PostalCode'] == row2['Postal Code']:
            toronto_df.iloc[index1,3] = lat_lng_df.iloc[index2,1]
            toronto_df.iloc[index1,4] = lat_lng_df.iloc[index2,2]
            break

com_toronto = pd.DataFrame(columns = ['Neighborhood', 'Latitude', 'Longitude'])
com_index = 0
#com_toronto = pd.DataFrame()
for index, selected in toronto_df.iterrows():
    if toronto_df.iloc[index,2] == 'Harbourfront':
        com_toronto.loc[com_index,'Neighborhood'] = toronto_df.iloc[index,2]
        com_toronto.loc[com_index,'Latitude'] = toronto_df.iloc[index,3]
        com_toronto.loc[com_index,'Longitude'] = toronto_df.iloc[index,4]
        com_index = com_index+1
    if toronto_df.iloc[index,2] == 'Union Station':
        com_toronto.loc[com_index,'Neighborhood'] = toronto_df.iloc[index,2]
        com_toronto.loc[com_index,'Latitude'] = toronto_df.iloc[index,3]
        com_toronto.loc[com_index,'Longitude'] = toronto_df.iloc[index,4]
        com_index = com_index+1
#Inserting geographical coordinates of Toronto's Pearson Airport as it is not listed in the wikipage.
#Geocoder could not be used, hence coordinates are assigned directly from https://www.latlong.net.
com_toronto.loc[com_index,'Neighborhood'] = 'Pearson International Airport'
com_toronto.loc[com_index,'Latitude'] = 43.6777
com_toronto.loc[com_index,'Longitude'] = -79.6248
        
com_toronto

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Harbourfront | 43.6543 | -79.3606 |
| 1 | Union Station | 43.6408 | -79.3818 |
| 2 | Pearson International Airport | 43.6777 | -79.6248 |


In [6]:
#Retrieving the wikipage on list of neighborhoods in Vancouver
vancouver_page = requests.get('https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Vancouver')
#Creating a BS4 object, for parsing webpage content
vancouver_soup = BeautifulSoup(vancouver_page.content, 'html.parser')
#Neighborhood names are read from the page links below; no table is extracted for Vancouver

com_vancouver = pd.DataFrame(columns = ['Neighborhood', 'Latitude', 'Longitude'])

#Geocoder could not be used, hence coordinates are assigned directly from https://www.latlong.net.
com_vancouver.loc[0,'Neighborhood'] = vancouver_soup.find(title='Coal Harbour').contents[0]
com_vancouver.loc[0,'Latitude'] = 49.2891
com_vancouver.loc[0,'Longitude'] = -123.1225

com_vancouver.loc[1,'Neighborhood'] = vancouver_soup.find(title='Downtown Vancouver').contents[0]
com_vancouver.loc[1,'Latitude'] = 49.28394
com_vancouver.loc[1,'Longitude'] = -123.10553

com_vancouver.loc[2,'Neighborhood'] = 'Vancouver International Airport'
com_vancouver.loc[2,'Latitude'] = 49.1967
com_vancouver.loc[2,'Longitude'] = -123.1815

com_vancouver

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Coal Harbour | 49.2891 | -123.123 |
| 1 | Downtown Vancouver | 49.2839 | -123.106 |
| 2 | Vancouver International Airport | 49.1967 | -123.181 |


In [7]:
map_toronto = folium.Map(location = [43.6561136, -79.392321], zoom_start = 10)

#add markers to map
for lat, lng, neighborhood in zip(com_toronto['Latitude'], com_toronto['Longitude'], com_toronto['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [lat, lng],
        radius = 5,
        popup = label,
        color = 'blue',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7,
        parse_html = False).add_to(map_toronto)


map_toronto

In [8]:
map_vancouver = folium.Map(location = [49.2827, -123.1207], zoom_start = 10)

#add markers to map
for lat, lng, neighborhood in zip(com_vancouver['Latitude'], com_vancouver['Longitude'], com_vancouver['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [lat, lng],
        radius = 5,
        popup = label,
        color = 'blue',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7,
        parse_html = False).add_to(map_vancouver)


map_vancouver

#### Foursquare credentials
To be used for accessing Foursquare's APIs

In [9]:
import os

#Credentials are read from environment variables so that the secret is not stored in the notebook.
#The variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are a naming assumption.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20190308' # Foursquare API version
LIMIT=1000
print('Foursquare credentials loaded')

Foursquare credentials loaded


In [10]:
# Function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [11]:
# Function that returns a list of venues within a specified radius (metres) of a given neighborhood

def getNearbyVenues(name, lat, lng, radius):
    
    venues_list=[]
    # build the API request; requests encodes the query parameters
    url = 'https://api.foursquare.com/v2/venues/explore'
    params = {
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'v': VERSION,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': LIMIT}
    # make the GET request and fail early on an HTTP error
    response = requests.get(url, params=params)
    response.raise_for_status()
    results = response.json()["response"]['groups'][0]['items']
       
    # return only relevant information for each nearby venue
    venues_list.append([(
         name, 
         lat, 
         lng,
         v['venue']['name'], 
         v['venue']['location']['lat'], 
         v['venue']['location']['lng'],
         v['venue']['location']['distance'],
         v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude',
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude',
                  'Distance',
                  'Venue Category']
    
    return(nearby_venues)
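
The function above indexes `categories[0]` directly, which raises an `IndexError` if a venue carries no category. A minimal sketch of a more tolerant flattening step follows. The helper name `flatten_explore_items` and the sample items are hypothetical; the sample is a hand-written stand-in shaped like the fields read above, not real API output.

```python
def flatten_explore_items(name, items):
    """Turn 'explore' items into rows of (neighborhood, venue, distance, category)."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to None when a venue has no category instead of raising IndexError
        category = categories[0]['name'] if categories else None
        rows.append((name, venue['name'], venue['location'].get('distance'), category))
    return rows

# Hand-written stand-in for results['response']['groups'][0]['items']
sample_items = [
    {'venue': {'name': 'Example Cafe',
               'location': {'lat': 43.65, 'lng': -79.36, 'distance': 120},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'Unlabelled Spot',
               'location': {'lat': 43.66, 'lng': -79.37, 'distance': 480},
               'categories': []}},
]
rows = flatten_explore_items('Harbourfront', sample_items)
for row in rows:
    print(row)
```

Rows with a `None` category can then be dropped or counted separately before the frequency analysis.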

In [12]:
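# A function that sorts the venue categories of one row by frequency, in descending order.
# Note: row.iloc[1:] drops the 'Neighborhood' column, and the final slice [1:num_top_venues]
# skips the most frequent category and returns num_top_venues - 1 names.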
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[1:num_top_venues]
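
For comparison, a top-N helper that returns exactly `n` names, starting with the most frequent, could look like the sketch below. It is a plain-Python illustration rather than a drop-in replacement for the function above; `top_categories` and `ordinal` are hypothetical names. The sample frequencies are the first four Harbourfront values printed later in this report.

```python
def top_categories(freqs, n):
    """Return the n category names with the highest frequency, most frequent first."""
    ranked = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:n]]

def ordinal(k):
    """Format a positive integer as an ordinal label, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= k % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(k % 10, 'th')
    return '{}{}'.format(k, suffix)

# First four Harbourfront frequencies from the 1 km printout in this report
freqs = {'Coffee Shop': 0.16, 'Café': 0.05, 'Italian Restaurant': 0.04, 'Diner': 0.03}
top3 = top_categories(freqs, 3)
headers = ['{} Most Common Venue'.format(ordinal(k)) for k in range(1, 4)]
print(dict(zip(headers, top3)))
```

Because the slice starts at position 0, the most common category lands under the "1st" header and the number of names matches the number of headers.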

## Information Collection
In this section, we use FourSquare's API to collect all nearby venues for each neighborhood in each of the three districts: waterfront, financial centre and airport.
The collection radius is expanded progressively from the neighborhood centre: 1km, 3km and 5km.
Data collected for each venue includes its coordinates, its category (for analysing land-use mix) and its distance from the neighborhood centre (for analysing land-use density).
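
The cells in this section one-hot encode the venue categories and then take the mean of each column, which amounts to the share of venues falling in each category. A dependency-free sketch of the same arithmetic is shown below; `category_shares` is a hypothetical helper and the sample categories are made up.

```python
from collections import Counter

def category_shares(categories):
    """Share of venues per category, equal to the column means of a one-hot table."""
    counts = Counter(categories)
    total = len(categories)
    # most_common() orders the result from the most to the least frequent category
    return {name: count / total for name, count in counts.most_common()}

# Hypothetical venue categories returned for one neighborhood
sample = ['Coffee Shop', 'Café', 'Coffee Shop', 'Park', 'Coffee Shop']
shares = category_shares(sample)
print(shares)
```

The shares always sum to 1 within a neighborhood, so they can be compared across neighborhoods that return different numbers of venues.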

In [13]:
# Compiling the list of venues for each neighborhood in Toronto, for each buffer zone of 1km, 3km and 5km.
toronto_zone = []

for radius in (1000, 3000, 5000):
    toronto_venues_list = []
    for index, row in com_toronto.iterrows():
        df = getNearbyVenues(row['Neighborhood'], row['Latitude'], row['Longitude'], radius)
            
        toronto_onehot = pd.get_dummies(df[['Venue Category']], prefix="", prefix_sep="_")
        # add neighborhood column back to dataframe
        toronto_onehot['Neighborhood'] = df['Neighborhood']
        # move neighborhood column to the first column
        fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
        toronto_onehot = toronto_onehot[fixed_columns]
        toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
        toronto_venues_list.append(toronto_grouped)
    
    toronto_zone.append(toronto_venues_list)

In [14]:
# Compiling the list of venues for each neighborhood in Vancouver, for each buffer zone of 1km, 3km and 5km.
vancouver_zone = []

for radius in (1000, 3000, 5000):
    vancouver_venues_list = []
    for index, row in com_vancouver.iterrows():
        df = getNearbyVenues(row['Neighborhood'], row['Latitude'], row['Longitude'], radius)
            
        vancouver_onehot = pd.get_dummies(df[['Venue Category']], prefix="", prefix_sep="_")
        # add neighborhood column back to dataframe
        vancouver_onehot['Neighborhood'] = df['Neighborhood']
        # move neighborhood column to the first column
        fixed_columns = [vancouver_onehot.columns[-1]] + list(vancouver_onehot.columns[:-1])
        vancouver_onehot = vancouver_onehot[fixed_columns]
        vancouver_grouped = vancouver_onehot.groupby('Neighborhood').mean().reset_index()
        vancouver_venues_list.append(vancouver_grouped)
    
    vancouver_zone.append(vancouver_venues_list)

In [15]:
# Presenting the Top 10 most common venues for each neighborhood in each buffer zone in Toronto.

num_top_venues = 10
for zone in range (3):
    for district in range (3):
        for hood in toronto_zone[zone][district]['Neighborhood']:
            print('In zone {}:'.format(zone+1))
            print("----"+hood+"----")
            temp = toronto_zone[zone][district][toronto_zone[zone][district]['Neighborhood'] == hood].T.reset_index()
            temp.columns = ['venue','freq']
            temp = temp.iloc[1:]
            temp['freq'] = temp['freq'].astype(float)
            temp = temp.round({'freq': 2})
            print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
            print('\n')

In zone 1:
----Harbourfront----
                 venue  freq
0         _Coffee Shop  0.16
1                _Café  0.05
2  _Italian Restaurant  0.04
3               _Diner  0.03
4          _Restaurant  0.03
5             _Theater  0.03
6              _Bakery  0.03
7                _Park  0.03
8      _Breakfast Spot  0.03
9                 _Pub  0.03


In zone 1:
----Union Station----
                 venue  freq
0         _Coffee Shop  0.08
1                _Café  0.07
2               _Hotel  0.06
3            _Aquarium  0.04
4      _Scenic Lookout  0.03
5                _Park  0.03
6             _Brewery  0.03
7             _Theater  0.03
8  _Italian Restaurant  0.03
9           _Hotel Bar  0.02


In zone 1:
----Pearson International Airport----
                        venue  freq
0           _Airport Terminal  0.15
1                _Coffee Shop  0.11
2                    _Airport  0.09
3             _Airport Lounge  0.09
4       _Fast Food Restaurant  0.04
5                   _Tea Roo

In [16]:
# Presenting the Top 10 most common venues for each neighborhood in each buffer zone in Vancouver.

num_top_venues = 10
for zone in range (3):
    for district in range (3):
        for hood in vancouver_zone[zone][district]['Neighborhood']:
            print('In zone {}:'.format(zone+1))
            print("----"+hood+"----")
            temp = vancouver_zone[zone][district][vancouver_zone[zone][district]['Neighborhood'] == hood].T.reset_index()
            temp.columns = ['venue','freq']
            temp = temp.iloc[1:]
            temp['freq'] = temp['freq'].astype(float)
            temp = temp.round({'freq': 2})
            print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
            print('\n')

In zone 1:
----Coal Harbour----
                  venue  freq
0                _Hotel  0.12
1           _Restaurant  0.05
2                 _Café  0.05
3           _Steakhouse  0.04
4  _Japanese Restaurant  0.04
5  _American Restaurant  0.04
6          _Coffee Shop  0.04
7         _Dessert Shop  0.04
8       _Breakfast Spot  0.03
9   _Seafood Restaurant  0.03


In zone 1:
----Downtown Vancouver----
                     venue  freq
0             _Coffee Shop  0.07
1                     _Pub  0.05
2                    _Café  0.05
3              _Restaurant  0.04
4      _Italian Restaurant  0.03
5             _Pizza Place  0.03
6                  _Lounge  0.03
7          _Sandwich Place  0.03
8  _Furniture / Home Store  0.03
9      _Mexican Restaurant  0.02


In zone 1:
----Vancouver International Airport----
                    venue  freq
0            _Coffee Shop  0.22
1         _Airport Lounge  0.11
2        _Airport Service  0.09
3           _Burger Joint  0.04
4   _Fast Food Restaur

## Analyzing Land-Mix Use
### Comparing venues within 1km
Here, we compare the top 10 most common venue categories for each of the three neighborhoods, within a 1km buffer radius.

In [17]:
# Top 10 Venues within 1 km of each neighborhood in Toronto.
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
toronto_consol_1 = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
for ind in np.arange(toronto_zone[0][1].shape[0]):
    for district in range(3):
        neighborhoods_venues_sorted['Neighborhood'] = toronto_zone[0][district]['Neighborhood']
        neighborhoods_venues_sorted.iloc[ind, 2:] = return_most_common_venues(toronto_zone[0][district].iloc[0, :], num_top_venues)
        toronto_consol_1 = pd.concat([toronto_consol_1, neighborhoods_venues_sorted], ignore_index = True)
toronto_consol_1

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Harbourfront | | _Café | _Italian Restaurant | _Park | _Restaurant | _Breakfast Spot | _Pub | _Theater | _Diner | _Bakery |
| 1 | Union Station | | _Café | _Hotel | _Aquarium | _Theater | _Park | _Brewery | _Italian Restaurant | _Scenic Lookout | _Restaurant |
| 2 | Pearson International Airport | | _Coffee Shop | _Airport | _Airport Lounge | _Restaurant | _Middle Eastern Restaurant | _Italian Restaurant | _Fast Food Restaurant | _Tea Room | _Burger Joint |


In [18]:
# Top 10 Venues within 1 km of each neighborhood in Vancouver.
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
vancouver_consol_1 = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
for ind in np.arange(vancouver_zone[0][1].shape[0]):
    for district in range(3):
        neighborhoods_venues_sorted['Neighborhood'] = vancouver_zone[0][district]['Neighborhood']
        neighborhoods_venues_sorted.iloc[ind, 2:] = return_most_common_venues(vancouver_zone[0][district].iloc[0, :], num_top_venues)
        vancouver_consol_1 = pd.concat([vancouver_consol_1, neighborhoods_venues_sorted], ignore_index = True)
vancouver_consol_1

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Coal Harbour | | _Restaurant | _Café | _American Restaurant | _Steakhouse | _Dessert Shop | _Japanese Restaurant | _Coffee Shop | _Seafood Restaurant | _Breakfast Spot |
| 1 | Downtown Vancouver | | _Café | _Pub | _Restaurant | _Furniture / Home Store | _Lounge | _Pizza Place | _Sandwich Place | _Italian Restaurant | _Breakfast Spot |
| 2 | Vancouver International Airport | | _Airport Lounge | _Airport Service | _Burger Joint | _Sandwich Place | _Rental Car Location | _Fast Food Restaurant | _Wine Bar | _Food Truck | _Bakery |


##### Observation:
* __Harbour__: Café appears among the top 3 venue categories for both. Coal Harbour has a higher proportion of restaurant types, probably because it offers a more scenic view than Harbourfront. In fact, every category listed for Coal Harbour in the table is a dining category, although the printed frequencies show hotels as its single most common category. In contrast, Harbourfront has a park and a theater, indicating a more mixed use of land.
* __Financial centre__: Dining and food spots again dominate Downtown Vancouver, where 9 of the top 10 venue categories are restaurants and cafes. Union Station shows a more mixed use, with a park, a theater and scenic lookouts.
* __Airports__: Little difference is observed between the two, where the majority of venues are food outlets and restaurants.
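
The observations above judge "mixed use" by eye. One common way to put a number on it is a normalized Shannon entropy of the category shares. The sketch below is an assumption about how such an index could be computed, not part of the original analysis: `mix_index` is a hypothetical name, the shares are made up, and normalizing by the number of observed categories is only one of several possible choices.

```python
import math

def mix_index(shares):
    """Normalized Shannon entropy of category shares: 0 = single use, 1 = perfectly even mix."""
    probs = [p for p in shares if p > 0]
    total = sum(probs)
    probs = [p / total for p in probs]  # re-normalize in case the shares do not sum to 1
    if len(probs) < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(probs))

# Hypothetical shares, not taken from the FourSquare results
even_mix = [0.25, 0.25, 0.25, 0.25]
dining_heavy = [0.85, 0.05, 0.05, 0.05]
print(round(mix_index(even_mix), 3), round(mix_index(dining_heavy), 3))
```

A neighborhood dominated by a single category scores close to 0, while one whose venues are spread evenly across categories scores close to 1.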

### Comparing venues within 3km
Here, we compare the top 10 most common venue categories for each of the three neighborhoods, within a 3km buffer radius.

In [19]:
# Top 10 Venues within 3 km of each neighborhood in Toronto.
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
toronto_consol_2 = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
for ind in np.arange(toronto_zone[1][1].shape[0]):
    for district in range(3):
        neighborhoods_venues_sorted['Neighborhood'] = toronto_zone[1][district]['Neighborhood']
        neighborhoods_venues_sorted.iloc[ind, 2:] = return_most_common_venues(toronto_zone[1][district].iloc[0, :], num_top_venues)
        toronto_consol_2 = pd.concat([toronto_consol_2, neighborhoods_venues_sorted], ignore_index = True)
toronto_consol_2

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Harbourfront | | _Café | _Park | _Restaurant | _Japanese Restaurant | _Diner | _Farmers Market | _Gastropub | _Italian Restaurant | _Vietnamese Restaurant |
| 1 | Union Station | | _Hotel | _Café | _Italian Restaurant | _Gym | _Restaurant | _Park | _Steakhouse | _Theater | _Beer Bar |
| 2 | Pearson International Airport | | _Hotel | _Airport Lounge | _Sandwich Place | _Fast Food Restaurant | _American Restaurant | _Steakhouse | _Convenience Store | _Hobby Shop | _Rental Car Location |


In [20]:
# Top 10 Venues within 3 km of each neighborhood in Vancouver.
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
vancouver_consol_2 = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
for ind in np.arange(vancouver_zone[1][1].shape[0]):
    for district in range(3):
        neighborhoods_venues_sorted['Neighborhood'] = vancouver_zone[1][district]['Neighborhood']
        neighborhoods_venues_sorted.iloc[ind, 2:] = return_most_common_venues(vancouver_zone[1][district].iloc[0, :], num_top_venues)
        vancouver_consol_2 = pd.concat([vancouver_consol_2, neighborhoods_venues_sorted], ignore_index = True)
vancouver_consol_2

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Coal Harbour | | _Japanese Restaurant | _Coffee Shop | _Dessert Shop | _Sandwich Place | _Café | _Seafood Restaurant | _Italian Restaurant | _Lounge | _Bakery |
| 1 | Downtown Vancouver | | _Coffee Shop | _Café | _Breakfast Spot | _Restaurant | _Sandwich Place | _Pizza Place | _Taco Place | _Italian Restaurant | _Concert Hall |
| 2 | Vancouver International Airport | | _Airport Lounge | _Park | _Clothing Store | _Sandwich Place | _Fast Food Restaurant | _Pizza Place | _Golf Course | _Rental Car Location | _Airport |


##### Observation:
* __Harbour__: Coal Harbour shows little change in its land-use mix; a high proportion (90%) of the venue categories in the top 10 are still dining and food outlets. In contrast, Harbourfront has a park, a theater and a farmers market in the wider zone.
* __Financial centre__: Dining and food spots again dominate Downtown Vancouver, where 9 of the top 10 venue categories are restaurants and cafes. However, a concert hall appears in the outer zone, suggesting that land is allocated to cultural sites. Union Station also has land for cultural use, in the form of a theater, and shows a wider mix of uses, incorporating hotels, tourist facilities and gyms (lifestyle).
* __Airports__: Food and dining still dominate for both, alongside arrival facilities such as car rentals and hotels. Vancouver's airport has more land to accommodate 'land-consuming' facilities such as a park and a golf course.

### Comparing venues within 5km
Here, we compare the top 10 most common venue categories for each of the three neighborhoods within a 5 km buffer radius.

In [21]:
# Top 10 Venues within 5 km of each neighborhood in Toronto.
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no 'st'/'nd'/'rd' suffix left, fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
toronto_consol_3 = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
for ind in np.arange(toronto_zone[2][1].shape[0]):
    for district in range(3):
        neighborhoods_venues_sorted['Neighborhood'] = toronto_zone[2][district]['Neighborhood']
        neighborhoods_venues_sorted.iloc[ind, 2:] = return_most_common_venues(toronto_zone[2][district].iloc[0, :], num_top_venues)
        # DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent call
        toronto_consol_3 = pd.concat([toronto_consol_3, neighborhoods_venues_sorted], ignore_index=True)
toronto_consol_3

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Harbourfront,,_Hotel,_Café,_Japanese Restaurant,_Park,_Thai Restaurant,_Steakhouse,_Gastropub,_Neighborhood,_Restaurant
1,Union Station,,_Hotel,_Italian Restaurant,_Café,_Restaurant,_Farmers Market,_Steakhouse,_Park,_Gym,_Japanese Restaurant
2,Pearson International Airport,,_Steakhouse,_Hotel,_Fast Food Restaurant,_Indian Restaurant,_Sandwich Place,_Chinese Restaurant,_Japanese Restaurant,_Hookah Bar,_Mediterranean Restaurant


In [22]:
# Top 10 Venues within 5 km of each neighborhood in Vancouver.
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no 'st'/'nd'/'rd' suffix left, fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
vancouver_consol_3 = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
for ind in np.arange(vancouver_zone[2][1].shape[0]):
    for district in range(3):
        neighborhoods_venues_sorted['Neighborhood'] = vancouver_zone[2][district]['Neighborhood']
        neighborhoods_venues_sorted.iloc[ind, 2:] = return_most_common_venues(vancouver_zone[2][district].iloc[0, :], num_top_venues)
        # DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent call
        vancouver_consol_3 = pd.concat([vancouver_consol_3, neighborhoods_venues_sorted], ignore_index=True)
vancouver_consol_3

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Coal Harbour,,_Japanese Restaurant,_Sandwich Place,_Dessert Shop,_Italian Restaurant,_Seafood Restaurant,_French Restaurant,_Bakery,_Coffee Shop,_Restaurant
1,Downtown Vancouver,,_Italian Restaurant,_Japanese Restaurant,_Café,_Sandwich Place,_Restaurant,_Dessert Shop,_Trail,_Bakery,_Taco Place
2,Vancouver International Airport,,_Japanese Restaurant,_Coffee Shop,_Park,_Bubble Tea Shop,_Korean Restaurant,_Food Court,_Café,_Sandwich Place,_Rental Car Location


##### Observation:
* __Harbour__: Coal Harbour retains its high proportion of dining and food outlets even at a 5 km radius. In contrast, Harbourfront has a park, a theater and a farmers market within the wider zone.
* __Financial centre__: Again, dining and food spots dominate Downtown Vancouver, with 9 of the top 10 venue categories being restaurants and cafés. However, a Concert Hall appears in an outer zone, suggesting that land is allocated to cultural sites. A surprise inclusion in the wider zone is a 'trail' in Downtown Vancouver. Union Station also has land for cultural use in the form of a theater, and it has a wider mix of uses, incorporating hotels, tourist facilities and gyms (lifestyle), with parks newly included for lifestyle and well-being.
* __Airports__: Food and dining still dominate at both airports, alongside arrival facilities such as car rentals and hotels. Vancouver's airport has more land to accommodate 'land-consuming' facilities such as a park and a golf course.
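
The "share of dining and food categories" used in these observations can be made explicit. The following is a minimal sketch, not the notebook's method: the keyword list `FOOD_KEYWORDS` and the helper `food_share` are hypothetical names introduced here, and the category lists are the nine values printed per row in the 5 km tables above (leading underscores dropped).

```python
# Hypothetical keyword list marking a category as food/dining (an assumption).
FOOD_KEYWORDS = ("restaurant", "café", "coffee", "bakery", "dessert",
                 "sandwich", "pizza", "taco", "steakhouse", "gastropub",
                 "food", "bubble tea", "breakfast")


def food_share(categories, keywords=FOOD_KEYWORDS):
    """Return the fraction of categories that contain any food keyword."""
    food = [c for c in categories if any(k in c.lower() for k in keywords)]
    return len(food) / len(categories)


# Categories copied from the 5 km tables above.
coal_harbour_5km = ["Japanese Restaurant", "Sandwich Place", "Dessert Shop",
                    "Italian Restaurant", "Seafood Restaurant",
                    "French Restaurant", "Bakery", "Coffee Shop", "Restaurant"]
harbourfront_5km = ["Hotel", "Café", "Japanese Restaurant", "Park",
                    "Thai Restaurant", "Steakhouse", "Gastropub",
                    "Neighborhood", "Restaurant"]

coal_share = food_share(coal_harbour_5km)
harbourfront_share = food_share(harbourfront_5km)
print(f"Coal Harbour: {coal_share:.0%}, Harbourfront: {harbourfront_share:.0%}")
```

A keyword match is a crude classifier; the result depends entirely on which words are placed in the list, so it should be read as a rough indicator rather than a land-use measurement.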

## Analyzing Land Density
In this section, we observe how closely the venues are situated to each other, so as to estimate the land-use density of each neighborhood in each city. One simple approach is to calculate the average distance of the venues from the neighborhood center. This method is applied to each of the three zonal areas.
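
As a small illustration of the measure, the sketch below computes the three statistics for one neighborhood in plain Python. The venue list is made up for illustration and `density_stats` is a hypothetical helper, not part of the notebook.

```python
from statistics import mean

# Hypothetical venues for one neighborhood: (category, distance from center).
venues = [("Café", 120), ("Park", 480), ("Café", 300), ("Hotel", 700)]


def density_stats(venues):
    """Summarise venue count, average distance and number of distinct categories."""
    distances = [d for _, d in venues]
    categories = {c for c, _ in venues}
    return {
        "Number of Venues": len(venues),
        # Guard against an empty result set instead of dividing by zero.
        "Average Distance": mean(distances) if distances else float("nan"),
        "Unique Categories": len(categories),
    }


stats = density_stats(venues)
print(stats)
```

A smaller average distance at the same radius means the venues cluster closer to the center, which is the sense in which "denser" is used in this section.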

In [23]:
# Determining land use density in Toronto.

toronto_stats = []
# Extracting land use information for 3 buffer zones, at 1000m, 3000m and 5000m.
for radius in [1000, 3000, 5000]:
    lst = []
    # determining land use within each buffer radius
    for _, row in com_toronto.iterrows():
        df = getNearbyVenues(row.iloc[0], row.iloc[1], row.iloc[2], radius)

        name = row.iloc[0]
        num_venue = df.shape[0]

        # column 6 holds each venue's distance from the neighborhood center;
        # mean() returns NaN instead of failing when no venues are returned
        avg_dist = df.iloc[:, 6].mean()

        ucat = len(df['Venue Category'].unique())

        lst.append([name, num_venue, avg_dist, ucat])

    district_stats = pd.DataFrame(lst, columns=['Neighborhood', 'Number of Venues', 'Average Distance', 'Unique Categories'])
    toronto_stats.append(district_stats)

In [24]:
# Determining land use density in Vancouver.

vancouver_stats = []
# Extracting land use information for 3 buffer zones, at 1000m, 3000m and 5000m.
for radius in [1000, 3000, 5000]:
    lst = []
    # determining land use within each buffer radius
    for _, row in com_vancouver.iterrows():
        df = getNearbyVenues(row.iloc[0], row.iloc[1], row.iloc[2], radius)

        name = row.iloc[0]
        num_venue = df.shape[0]

        # column 6 holds each venue's distance from the neighborhood center;
        # mean() returns NaN instead of failing when no venues are returned
        avg_dist = df.iloc[:, 6].mean()

        ucat = len(df['Venue Category'].unique())

        lst.append([name, num_venue, avg_dist, ucat])

    district_stats = pd.DataFrame(lst, columns=['Neighborhood', 'Number of Venues', 'Average Distance', 'Unique Categories'])
    vancouver_stats.append(district_stats)

In [25]:
for i in range(len(toronto_stats)):
    print(toronto_stats[i].to_string())

                    Neighborhood  Number of Venues  Average Distance  Unique Categories
0                   Harbourfront               100        600.950000                 54
1                  Union Station               100        449.450000                 58
2  Pearson International Airport                47        811.957447                 25
                    Neighborhood  Number of Venues  Average Distance  Unique Categories
0                   Harbourfront               100       1162.150000                 56
1                  Union Station               100        980.480000                 64
2  Pearson International Airport                72       1760.069444                 39
                    Neighborhood  Number of Venues  Average Distance  Unique Categories
0                   Harbourfront               100        1559.59000                 52
1                  Union Station               100        1160.28000                 62
2  Pearson International Airport

In [26]:
for i in range(len(vancouver_stats)):
    print(vancouver_stats[i].to_string())

                      Neighborhood  Number of Venues  Average Distance  Unique Categories
0                     Coal Harbour               100        476.440000                 51
1               Downtown Vancouver               100        311.090000                 61
2  Vancouver International Airport                46        358.565217                 26
                      Neighborhood  Number of Venues  Average Distance  Unique Categories
0                     Coal Harbour               100        737.230000                 53
1               Downtown Vancouver               100        707.660000                 59
2  Vancouver International Airport                76       1469.973684                 47
                      Neighborhood  Number of Venues  Average Distance  Unique Categories
0                     Coal Harbour               100            938.41                 52
1               Downtown Vancouver               100            887.99                 54
2  Vancouv

##### Observation:
* __Harbour__: Within the small 1 km zone, Coal Harbour appears denser than Harbourfront, with an average distance from the neighbourhood centre of 476 m against 601 m. As the radius increases by 200% (1 km to 3 km) and then by about 67% (3 km to 5 km), Harbourfront's average distance increases by about 93% and 34% respectively, whereas Coal Harbour's increases by only 55% and 27%. Hence, Vancouver's Coal Harbour appears to be a much more densely built harbour district than Toronto's Harbourfront.

* __Financial centre__: The same observation can be made for the financial districts of both cities: Vancouver's financial centre is more densely built than Toronto's, with a smaller average distance at every radius. Union Station's average distance increases by about 118% and 18% across the two radius steps, while Downtown Vancouver's increases by about 127% and 25%. In both cases the second increase is far smaller than the increase in zonal radius, which suggests there is a limit to the built-up area outside the city centre.

* __Airports__: The average distance of venues from the airport centre is greater for Toronto's Pearson airport than for Vancouver's airport, although the gap narrows as the zonal radius increases. This suggests that, in both cities, there is sufficient land in the outer zones around the airports for venues to be located further apart.
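
The percentage changes discussed above can be recomputed directly from the printed statistics. The sketch below uses the average distances shown in the outputs of cells 25 and 26 for the harbour and financial districts (the airport rows are left out because their 5 km values are cut off in the output); `pct_increase` is a helper name introduced here for illustration.

```python
# Average distances copied from the printed statistics above,
# for the 1 km, 3 km and 5 km zones.
avg_distance = {
    "Harbourfront": [600.95, 1162.15, 1559.59],
    "Coal Harbour": [476.44, 737.23, 938.41],
    "Union Station": [449.45, 980.48, 1160.28],
    "Downtown Vancouver": [311.09, 707.66, 887.99],
}
radii = [1000, 3000, 5000]


def pct_increase(values):
    """Percentage increase between each pair of consecutive values."""
    return [round((b - a) / a * 100, 1) for a, b in zip(values, values[1:])]


growth = {name: pct_increase(v) for name, v in avg_distance.items()}
radius_growth = pct_increase(radii)

print("radius", radius_growth)
for name, steps in growth.items():
    print(name, steps)
```

Comparing each neighborhood's growth with the growth in radius shows whether venues keep spreading outwards as the search zone widens or stay clustered near the center.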

## Conclusion
With regard to land-use mix, Toronto's Harbourfront district has a more varied mix than Vancouver's Coal Harbour. This characteristic persists as the zonal radius increases.
For the downtown financial districts, Toronto's Union Station neighborhood again has more varied land use than Vancouver's Downtown. However, Vancouver's Downtown appears to have more activity venues in its outer zones than Coal Harbour does.
There is little difference in land use around the airports, which is broadly similar in both cities, except that Vancouver has a larger outer airport zone that accommodates parks and golf courses.
On average, Vancouver is a much more densely built city than Toronto, with venues located closer to each other.

Therefore, I conclude that the two cities are not exactly alike in their waterfront and city-centre districts: Toronto has the more varied land-use mix and, being the bigger cousin of the two, definitely has the more spacious landscape.