# Capstone

## Abstract

## Introduction

Business problem: In which neighborhood of the City of Vancouver should an investor open a Chinese restaurant?

Chinese residents account for nearly 27% of the population, and Chinese restaurants have been emerging over the past decade, so an accelerating rise of Chinese-oriented restaurants can be expected. More broadly, cultural analysts foresee an accelerating rise of Asian-oriented restaurants, retail outlets, artistic events, religions, community service organizations, schools, neighbourhood enclaves, family-reunification programs, signage, international business partnerships, corporate customs, and Asian-language newspapers and TV stations.

Forty-three per cent of Metro Vancouver residents have an Asian heritage, a much higher proportion than in any other major city outside the continent of Asia. Based on Statistics Canada reports, the number of residents with Asian roots in Metro Vancouver will continue to grow at a faster rate than the non-Asian population. Around the globe, the only major cities outside Asia (those with more than 1.2 million residents) that come close to Metro Vancouver in their share of residents with Asian backgrounds are San Francisco (33 per cent Asian), London, England (21 per cent), Metro Toronto (35 per cent), Calgary (23 per cent) and Sydney, Australia (19 per cent).

## Methods

### Data

Postal code  
Geocoding  
shapefile to geojson centroid  

Census data  
Calculate population density  
Mother tongue for the total population excluding institutional residents - 100% data  
Language spoken most often at home for the total population excluding institutional residents - 100% data  
Selected places of birth for the immigrant population in private households - 25% sample data  

Foursquare API   
Location of venues  
heatmap
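
The "centroid" and "population density" steps listed above rest on two small pieces of geometry: the shoelace formula for a polygon's area and centroid, and a division of population by area. The notebook itself relies on geopandas (`.centroid`, `.area`) for this. The block below is only a dependency-free sketch of the same arithmetic, and the function names and the rectangular test polygon are hypothetical. It assumes coordinates in a projected CRS measured in metres (such as UTM), because areas computed from latitude/longitude degrees are not meaningful.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area_and_centroid(ring: List[Point]) -> Tuple[float, Point]:
    """Return (area, centroid) of a simple polygon using the shoelace formula.

    `ring` lists the vertices in order (either orientation); the first vertex
    does not need to be repeated at the end.
    """
    signed_area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        signed_area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if signed_area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * signed area) = sum / (3 * signed_area2)
    centroid = (cx / (3.0 * signed_area2), cy / (3.0 * signed_area2))
    return abs(signed_area2) / 2.0, centroid


def density_per_km2(population: int, area_m2: float) -> float:
    """People per square kilometre, given an area in square metres."""
    return population / (area_m2 / 1e6)


# Hypothetical 2 km x 3 km rectangular neighbourhood, coordinates in metres
ring = [(0.0, 0.0), (2000.0, 0.0), (2000.0, 3000.0), (0.0, 3000.0)]
area_m2, centroid = polygon_area_and_centroid(ring)
print(area_m2 / 1e6, centroid, density_per_km2(6000, area_m2))
```

Dividing the area by 10**6 converts square metres to square kilometres, which is the same scaling the notebook applies to `shp.area`.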

In [500]:
import requests
import urllib.request  # the request submodule must be imported explicitly
from zipfile import ZipFile

In [501]:
# get vancouver boundary shapefile
url = 'ftp://webftp.vancouver.ca/OpenData/shape/local_area_boundary_shp.zip'
response = urllib.request.urlretrieve(url, 'boundary.zip')
with ZipFile('boundary.zip', 'r') as boundary: 
    # print all the contents of the zip file 
    boundary.printdir() 
    
    # extract all files 
    boundary.extractall() 
    print('All files were extracted.') 

File Name                                             Modified             Size
local_area_boundary.dbf                        2019-02-03 02:19:00         6148
local_area_boundary.prj                        2019-02-03 02:19:00          413
local_area_boundary.shp                        2019-02-03 02:19:00        20772
local_area_boundary.shx                        2019-02-03 02:19:00          276
All files were extracted.


In [3]:
#import shapefile
#from json import dumps

In [23]:
# read the shapefile
#reader = shapefile.Reader('local_area_boundary.shp')
#fields = reader.fields[1:]
#field_names = [field[0] for field in fields]
#buffer = []
#for sr in reader.shapeRecords():
    #atr = dict(zip(field_names, sr.record))
    #geom = sr.shape.__geo_interface__
    #buffer.append(dict(type="Feature", \
                       #geometry=geom, properties=atr)) 

# write the GeoJSON file
#geojson = open('boundary.json', 'w')
#geojson.write(dumps({'type': 'FeatureCollection', 'features': buffer}, indent=2) + '\n')
#geojson.close()

In [503]:
import geopandas
import pyepsg

In [505]:
shp = geopandas.GeoDataFrame.from_file('local_area_boundary.shp')
# check current coordinate system
pyepsg.get(shp.crs['init'].split(':')[1])

<ProjectedCRS: 26910, NAD83 / UTM zone 10N>

In [506]:
# make coordinate system consistent
new_shp = shp.to_crs(epsg='4326')
# transfer shp to json
van_json = new_shp.to_json()
van_json

'{"type": "FeatureCollection", "features": [{"id": "0", "type": "Feature", "properties": {"MAPID": "SUN", "NAME": "Sunset"}, "geometry": {"type": "Polygon", "coordinates": [[[-123.10696411132812, 49.204158782959226], [-123.10616302490234, 49.21887588500999], [-123.10562133789061, 49.233116149902564], [-123.09053802490233, 49.23282241821311], [-123.07703399658203, 49.23266601562523], [-123.07742309570314, 49.219970703125234], [-123.07701110839844, 49.219184875488516], [-123.0771484375, 49.20761489868186], [-123.07750701904297, 49.20756149292015], [-123.07786560058592, 49.20749664306663], [-123.07821655273438, 49.20742034912134], [-123.0785598754883, 49.207332611084226], [-123.07889556884766, 49.207233428955305], [-123.0792236328125, 49.207122802734595], [-123.07995605468751, 49.20687484741234], [-123.08071899414062, 49.20666122436546], [-123.08149719238281, 49.20647811889671], [-123.08839416503908, 49.20477676391624], [-123.08956146240234, 49.20444488525413], [-123.08972930908203, 49.20

In [507]:
# get centroid of each polygon
cpoint = shp['geometry'].centroid
# make coordinate system consistent and transfer to json
cjson = cpoint.to_crs(epsg='4326').to_json()

In [508]:
import json

In [579]:
van_data = json.loads(cjson)
van_data = van_data['features']

In [580]:
van_data

[{'id': '0',
  'type': 'Feature',
  'properties': {},
  'geometry': {'type': 'Point',
   'coordinates': [-123.0920382187122, 49.21875524075676]},
  'bbox': [-123.0920382187122,
   49.21875524075676,
   -123.0920382187122,
   49.21875524075676]},
 {'id': '1',
  'type': 'Feature',
  'properties': {},
  'geometry': {'type': 'Point',
   'coordinates': [-123.09851250389197, 49.26306545462446]},
  'bbox': [-123.09851250389197,
   49.26306545462446,
   -123.09851250389197,
   49.26306545462446]},
 {'id': '2',
  'type': 'Feature',
  'properties': {},
  'geometry': {'type': 'Point',
   'coordinates': [-123.10314689371549, 49.24476610864839]},
  'bbox': [-123.10314689371549,
   49.24476610864839,
   -123.10314689371549,
   49.24476610864839]},
 {'id': '3',
  'type': 'Feature',
  'properties': {},
  'geometry': {'type': 'Point',
   'coordinates': [-123.1165669909403, 49.28074666385142]},
  'bbox': [-123.1165669909403,
   49.28074666385142,
   -123.1165669909403,
   49.28074666385142]},
 {'id': '4

In [581]:
column_names = ['Latitude', 'Longitude'] 

# instantiate the dataframe
van = pd.DataFrame(columns=column_names)

In [582]:
# get latitude and longitude of each centroid (GeoJSON stores [lon, lat])
# build all rows first; DataFrame.append was removed in pandas 2.0
rows = [{'Latitude': data['geometry']['coordinates'][1],
         'Longitude': data['geometry']['coordinates'][0]}
        for data in van_data]
van = pd.DataFrame(rows, columns=column_names)

In [585]:
van

Unnamed: 0,Latitude,Longitude
0,49.218755,-123.092038
1,49.263065,-123.098513
2,49.244766,-123.103147
3,49.280747,-123.116567
4,49.26754,-123.163295
5,49.23796,-123.189548
6,49.223655,-123.159576
7,49.246804,-123.161669
8,49.268401,-123.203467
9,49.210208,-123.128382


In [601]:
shp['NAME']

0                       Sunset
1               Mount Pleasant
2                   Riley Park
3                     Downtown
4                    Kitsilano
5            Dunbar-Southlands
6                   Kerrisdale
7                Arbutus-Ridge
8              West Point Grey
9                      Marpole
10                    Oakridge
11                 Shaughnessy
12                    Fairview
13                South Cambie
14                    West End
15                   Killarney
16         Renfrew-Collingwood
17            Hastings-Sunrise
18         Victoria-Fraserview
19    Kensington-Cedar Cottage
20                  Strathcona
21          Grandview-Woodland
Name: NAME, dtype: object

In [589]:
# polygon area in km^2 (the UTM zone 10N CRS is in metres, so divide m^2 by 10**6)
shp['area'] = shp.area/10**6

In [608]:
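# avoid naming the variable `dict`, which would shadow the built-in type
neighborhood_data = {'Neighborhood': shp['NAME'],
                     'Latitude': van['Latitude'],
                     'Longitude': van['Longitude'],
                     'Area': shp['area']}

# instantiate the dataframe
neighborhood = pd.DataFrame(neighborhood_data)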

In [656]:
neighborhood

Unnamed: 0,Neighborhood,Latitude,Longitude,Area
0,Sunset,49.218755,-123.092038,6.575731
1,Mount Pleasant,49.263065,-123.098513,3.720549
2,Riley Park,49.244766,-123.103147,4.931676
3,Downtown,49.280747,-123.116567,4.674227
4,Kitsilano,49.26754,-123.163295,6.362855
5,Dunbar-Southlands,49.23796,-123.189548,9.079848
6,Kerrisdale,49.223655,-123.159576,6.608907
7,Arbutus-Ridge,49.246804,-123.161669,3.70062
8,West Point Grey,49.268401,-123.203467,5.300219
9,Marpole,49.210208,-123.128382,6.003074


In [657]:
census_df

Unnamed: 0,Neighborhood,Chinese
0,Arbutus-Ridge,6970
1,Downtown,9490
2,Dunbar-Southlands,6525
3,Fairview,3865
4,Grandview-Woodland,3885
5,Hastings-Sunrise,13120
6,Kensington-Cedar Cottage,15560
7,Kerrisdale,6445
8,Killarney,11670
9,Kitsilano,3615


In [682]:
cleaned_neighborhood = pd.merge(neighborhood, census_df, on='Neighborhood')
cleaned_neighborhood

Unnamed: 0,Neighborhood,Latitude,Longitude,Area,Chinese
0,Sunset,49.218755,-123.092038,6.575731,8180
1,Mount Pleasant,49.263065,-123.098513,3.720549,3580
2,Riley Park,49.244766,-123.103147,4.931676,5210
3,Downtown,49.280747,-123.116567,4.674227,9490
4,Kitsilano,49.26754,-123.163295,6.362855,3615
5,Dunbar-Southlands,49.23796,-123.189548,9.079848,6525
6,Kerrisdale,49.223655,-123.159576,6.608907,6445
7,Arbutus-Ridge,49.246804,-123.161669,3.70062,6970
8,West Point Grey,49.268401,-123.203467,5.300219,3100
9,Marpole,49.210208,-123.128382,6.003074,10585


In [684]:
cleaned_neighborhood['Chinese'] = cleaned_neighborhood['Chinese'].astype('int')

In [685]:
cleaned_neighborhood['Density'] = cleaned_neighborhood['Chinese']/cleaned_neighborhood['Area']

In [686]:
cleaned_neighborhood

Unnamed: 0,Neighborhood,Latitude,Longitude,Area,Chinese,Density
0,Sunset,49.218755,-123.092038,6.575731,8180,1243.968169
1,Mount Pleasant,49.263065,-123.098513,3.720549,3580,962.223642
2,Riley Park,49.244766,-123.103147,4.931676,5210,1056.435881
3,Downtown,49.280747,-123.116567,4.674227,9490,2030.282296
4,Kitsilano,49.26754,-123.163295,6.362855,3615,568.141185
5,Dunbar-Southlands,49.23796,-123.189548,9.079848,6525,718.62433
6,Kerrisdale,49.223655,-123.159576,6.608907,6445,975.199019
7,Arbutus-Ridge,49.246804,-123.161669,3.70062,6970,1883.467974
8,West Point Grey,49.268401,-123.203467,5.300219,3100,584.881449
9,Marpole,49.210208,-123.128382,6.003074,10585,1763.263213


0     1243.968169
1      962.223642
2     1056.435881
3     2030.282296
4      568.141185
5      718.624330
6      975.199019
7     1883.467974
8      584.881449
9     1763.263213
10    1865.243560
11     745.488135
12    1063.168279
13     916.670950
14    1214.416348
15    1683.014003
16    2638.148844
17    1574.600765
18    2856.088062
19    2145.467366
20     655.636295
21     817.183377
dtype: float64

In [74]:
import pandas as pd

In [156]:
url_census = 'ftp://webftp.vancouver.ca/opendata/xls/CensusLocalAreaProfiles2016.csv'

In [157]:
# download the census CSV (use url_census, not the boundary shapefile url)
census = urllib.request.urlretrieve(url_census, 'CensusLocalAreaProfiles2016.csv')

In [518]:
census = pd.read_csv('CensusLocalAreaProfiles2016.csv',encoding = "ISO-8859-1") 
# Preview the first 5 lines of the loaded data 
census.head()

Unnamed: 0,The data shown here is provided by Statistics Canada from the 2016 Census as a custom data order for the City of Vancouver using the City's 22 local planning areas,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,...,Unnamed: 16,Unnamed: 17,Unnamed: 18,Unnamed: 19,Unnamed: 20,Unnamed: 21,Unnamed: 22,Unnamed: 23,Unnamed: 24,Unnamed: 25
0,The data may be reproduced provided they are c...,,,,,,,,,,...,,,,,,,,,,
1,,,,,,,,,,,...,,,,,,,,,,
2,CENSUS DATA FOR CITY OF VANCOUVER LOCAL AREAS...,,,,,,,,,,...,,,,,,,,,,
3,ID,Variable,Arbutus-Ridge,Downtown,Dunbar-Southlands,Fairview,Grandview-Woodland,Hastings-Sunrise,Kensington-Cedar Cottage,Kerrisdale,...,Riley Park,Shaughnessy,South Cambie,Strathcona,Sunset,Victoria-Fraserview,West End,West Point Grey,Vancouver CSD,Vancouver CMA
4,1,Total - Age groups and average age of the pop...,15295,62030,21425,33620,29175,34575,49325,13975,...,22555,8430,7970,12585,36500,31065,47200,13065,631485,2463430


In [519]:
census.columns = census.iloc[3, :].str.rstrip()

In [520]:
census.drop(['ID'], axis=1, inplace=True)
census.drop(['Vancouver CSD', 'Vancouver CMA'], axis=1, inplace=True)
census.drop(census.index[0:4], axis=0, inplace=True)

In [521]:
# clear the leftover columns-index name (deleting the attribute fails on newer pandas)
census.columns.name = None
census.reset_index(drop=True, inplace=True)

In [522]:
census

Unnamed: 0,Variable,Arbutus-Ridge,Downtown,Dunbar-Southlands,Fairview,Grandview-Woodland,Hastings-Sunrise,Kensington-Cedar Cottage,Kerrisdale,Killarney,...,Oakridge,Renfrew-Collingwood,Riley Park,Shaughnessy,South Cambie,Strathcona,Sunset,Victoria-Fraserview,West End,West Point Grey
0,Total - Age groups and average age of the pop...,15295,62030,21425,33620,29175,34575,49325,13975,29325,...,13030,51530,22555,8430,7970,12585,36500,31065,47200,13065
1,0 to 14 years,2015,4000,3545,2580,3210,4595,7060,1880,4185,...,1565,6305,3415,1175,1105,1065,5460,3790,1945,1900
2,0 to 4 years,455,2080,675,1240,1320,1510,2515,430,1300,...,490,2065,1175,270,360,360,1695,1175,965,420
3,5 to 9 years,685,1105,1225,760,1025,1560,2390,600,1400,...,480,2115,1160,405,365,365,1780,1210,560,670
4,10 to 14 years,880,810,1650,580,865,1525,2160,845,1485,...,590,2130,1080,500,375,340,1985,1410,415,810
5,15 to 64 years,9805,51275,14215,25140,22535,23945,35385,9395,19985,...,8560,37315,15875,5440,5430,8745,25490,21090,38255,8660
6,15 to 19 years,1230,1180,1800,655,830,1780,2630,1050,1760,...,835,2815,1065,575,380,370,2570,1895,610,915
7,20 to 24 years,1165,4050,1740,1865,1525,2360,3470,1150,1995,...,1040,4340,1335,635,475,575,2975,2220,3100,1215
8,25 to 29 years,805,8810,1110,4025,3450,2755,4325,905,1755,...,995,4820,1955,545,705,885,2900,2295,7375,780
9,30 to 34 years,570,9750,695,4395,3720,2515,4205,560,1800,...,735,3985,2010,445,670,960,2450,1985,7140,610


In [620]:
tcensus = census.transpose()
tcensus.columns = tcensus.iloc[0,:].str.lstrip()

In [624]:
cleaned_census = tcensus['Chinese'].iloc[1:,0]

In [625]:
cleaned_census

Arbutus-Ridge                6970
Downtown                     9490
Dunbar-Southlands            6525
Fairview                     3865
Grandview-Woodland           3885
Hastings-Sunrise            13120
Kensington-Cedar Cottage    15560
Kerrisdale                   6445
Killarney                   11670
Kitsilano                    3615
Marpole                     10585
Mount Pleasant               3580
Oakridge                     7505
Renfrew-Collingwood         21370
Riley Park                   5210
Shaughnessy                  3340
South Cambie                 1995
Strathcona                   2865
Sunset                       8180
Victoria-Fraserview         15710
West End                     2740
West Point Grey              3100
Name: Chinese, dtype: object

In [638]:
census_df = cleaned_census.to_frame().reset_index()
census_df.rename(columns={'index':'Neighborhood'}, inplace=True)

In [639]:
census_df

Unnamed: 0,Neighborhood,Chinese
0,Arbutus-Ridge,6970
1,Downtown,9490
2,Dunbar-Southlands,6525
3,Fairview,3865
4,Grandview-Woodland,3885
5,Hastings-Sunrise,13120
6,Kensington-Cedar Cottage,15560
7,Kerrisdale,6445
8,Killarney,11670
9,Kitsilano,3615


In [24]:
from geopy.geocoders import Nominatim
import folium

In [25]:
# geocoding vancouver
address = 'Vancouver, British Columbia'
geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Vancouver are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Vancouver are 49.2608724, -123.1139529.


In [687]:
import folium

mapa = folium.Map([latitude, longitude],
                  zoom_start=12,
                  tiles='cartodbpositron')

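# draw the neighborhood boundaries from the GeoJSON string built earlier (van_json)
folium.Choropleth(geo_data=van_json, fill_color='grey').add_to(mapa)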
for lat, lng in zip(van['Latitude'], van['Longitude']):
    label = f'lat: {lat}\nlng: {lng}'
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='#f9ba00',
        fill=True,
        fill_color='#f9ba00',
        fill_opacity=0.7,
        parse_html=False).add_to(mapa)  

mapa

In [434]:
cpoint

0      POINT (493297.623885159 5451778.302478819)
1     POINT (492832.5710942394 5456704.857862097)
2     POINT (492492.6143612485 5454670.966576269)
3     POINT (491522.0198607397 5458672.354567361)
4     POINT (488120.2612360372 5457210.450702526)
5     POINT (486202.1625533467 5453926.527669333)
6     POINT (488380.5557101681 5452331.198394036)
7     POINT (488233.6652576678 5454905.018354704)
8     POINT (485198.0568214911 5457313.287573236)
9     POINT (490649.4184549566 5450831.948140365)
10    POINT (491042.5354563056 5452631.796960058)
11    POINT (489827.9557256088 5454776.881854942)
12    POINT (490465.6383772573 5456872.433940169)
13    POINT (491135.0425028555 5454760.763597974)
14    POINT (490150.4143491607 5459148.729315017)
15    POINT (497258.3720622362 5451582.071860868)
16    POINT (497076.6808230506 5454952.955290285)
17    POINT (497070.9746404239 5458353.790556923)
18    POINT (495329.7036570702 5451915.812631234)
19    POINT (494695.3719458144 5454881.802774654)


In [473]:
cjson

'{"type": "FeatureCollection", "features": [{"id": "0", "type": "Feature", "properties": {}, "geometry": {"type": "Point", "coordinates": [-123.0920382187122, 49.21875524075676]}, "bbox": [-123.0920382187122, 49.21875524075676, -123.0920382187122, 49.21875524075676]}, {"id": "1", "type": "Feature", "properties": {}, "geometry": {"type": "Point", "coordinates": [-123.09851250389197, 49.26306545462446]}, "bbox": [-123.09851250389197, 49.26306545462446, -123.09851250389197, 49.26306545462446]}, {"id": "2", "type": "Feature", "properties": {}, "geometry": {"type": "Point", "coordinates": [-123.10314689371549, 49.24476610864839]}, "bbox": [-123.10314689371549, 49.24476610864839, -123.10314689371549, 49.24476610864839]}, {"id": "3", "type": "Feature", "properties": {}, "geometry": {"type": "Point", "coordinates": [-123.1165669909403, 49.28074666385142]}, "bbox": [-123.1165669909403, 49.28074666385142, -123.1165669909403, 49.28074666385142]}, {"id": "4", "type": "Feature", "properties": {}, "

In [477]:
van_data[0]

{'id': '0',
 'type': 'Feature',
 'properties': {},
 'geometry': {'type': 'Point',
  'coordinates': [-123.0920382187122, 49.21875524075676]},
 'bbox': [-123.0920382187122,
  49.21875524075676,
  -123.0920382187122,
  49.21875524075676]}

In [494]:
hidro.crs

{'init': 'epsg:26910'}

In [498]:
hidro

Unnamed: 0,MAPID,NAME,geometry,area
0,SUN,Sunset,"POLYGON ((492208.4021460207 5450157.053499949,...",6.575731
1,MP,Mount Pleasant,"POLYGON ((492676.5070098151 5457379.340971759,...",3.720549
2,RP,Riley Park,"POLYGON ((492310.7074588875 5453376.090940117,...",4.931676
3,CBD,Downtown,"POLYGON ((491836.3489202514 5459718.918809408,...",4.674227
4,KITS,Kitsilano,"POLYGON ((489985.3567529821 5458071.445235047,...",6.362855
5,DS,Dunbar-Southlands,"POLYGON ((487615.5153227601 5455027.114970088,...",9.079848
6,KERR,Kerrisdale,"POLYGON ((486957.9095499917 5451434.123423224,...",6.608907
7,AR,Arbutus-Ridge,"POLYGON ((488896.3689033413 5456062.110754157,...",3.70062
8,WPG,West Point Grey,"POLYGON ((483675.094250936 5458487.318162165, ...",5.300219
9,MARP,Marpole,"POLYGON ((492208.4021460207 5450157.053499949,...",6.003074


In [None]:
# foursquare information
# Credentials are read from environment variables instead of being hard-coded;
# the variable names FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are an assumption.
import os

CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # Foursquare Secret
VERSION = '20190420' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
# the secret is deliberately not printed

In [None]:
# create function to get nearby venues
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',                   
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [None]:
# get nearby venues around each neighborhood centroid
van_venues = getNearbyVenues(names=cleaned_neighborhood['Neighborhood'],
                             latitudes=cleaned_neighborhood['Latitude'],
                             longitudes=cleaned_neighborhood['Longitude']
                            )

In [None]:
print(van_venues.shape)
van_venues.head()

In [None]:
# count venues in each neighborhood
van_venues.groupby('Neighborhood').count()

In [None]:
# one hot encoding
van_onehot = pd.get_dummies(van_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
van_onehot['Neighborhood'] = van_venues['Neighborhood']

van_onehot.head()

In [None]:
# get the frequency of each venue category in each neighborhood
van_grouped = van_onehot.groupby('Neighborhood').mean().reset_index()
van_grouped
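
Grouping one-hot columns by neighborhood and taking the mean yields, for each neighborhood, the share of its venues that fall in each category. The block below is a dependency-free sketch of that arithmetic only. The venue records and the helper name `category_shares` are made up for illustration and are not Foursquare results.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical (neighborhood, venue category) records
venues: List[Tuple[str, str]] = [
    ("Downtown", "Chinese Restaurant"),
    ("Downtown", "Coffee Shop"),
    ("Downtown", "Coffee Shop"),
    ("Downtown", "Park"),
    ("Sunset", "Chinese Restaurant"),
    ("Sunset", "Grocery Store"),
]


def category_shares(records: List[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Share of each venue category within every neighborhood."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for hood, category in records:
        counts[hood][category] += 1
    shares = {}
    for hood, counter in counts.items():
        total = sum(counter.values())
        shares[hood] = {cat: n / total for cat, n in counter.items()}
    return shares


shares = category_shares(venues)
for hood, by_category in shares.items():
    print(hood, by_category)
```

Unlike the pandas version, categories that never occur in a neighborhood are simply absent here rather than appearing as a 0.0 column.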

### Analysis

Calculate population density  
DBSCAN
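
The analysis plan names DBSCAN but does not yet show how it would be applied. As a hedged illustration, the sketch below clusters latitude/longitude points by great-circle distance with a minimal pure-Python DBSCAN. The sample coordinates, `eps_m=300` and `min_samples=3` are assumptions chosen for the example, not values from this project. In practice a library implementation such as scikit-learn's `DBSCAN` would be the more likely choice.

```python
import math
from typing import List, Sequence, Tuple

EARTH_RADIUS_M = 6_371_000.0
LatLon = Tuple[float, float]


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def dbscan(points: Sequence[LatLon], eps_m: float, min_samples: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = -2, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i: int) -> List[int]:
        # A point counts as its own neighbour, so min_samples includes it.
        return [j for j, q in enumerate(points) if haversine_m(points[i], q) <= eps_m]

    cluster = -1
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_samples:  # core point: keep expanding
                queue.extend(j_neighbours)
    return labels


# Hypothetical venue coordinates: two tight groups and one isolated point
points = [
    (49.2807, -123.1166), (49.2810, -123.1160), (49.2804, -123.1171),
    (49.2188, -123.0920), (49.2191, -123.0925), (49.2185, -123.0915),
    (49.2500, -123.2000),
]
labels = dbscan(points, eps_m=300, min_samples=3)
print(labels)
```

A density-based method suits this problem because restaurant clusters can have irregular shapes and the number of clusters is not known in advance. Points in sparse areas are reported as noise instead of being forced into a cluster.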

## Results

## Discussion

## Conclusion

## References