## The Battle of Neighborhoods in Metro Vancouver
### IBM Data Science Capstone Project

### Table of Contents
* [Introduction](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

### Introduction - A description of the problem and a discussion of the background.<a name="introduction"></a>

Vancouver is a highly multicultural city with a strong Asian presence, and it has seen high levels of migration from Mainland China and Hong Kong over the last two decades.
In recent years Metro Vancouver has also become one of the world's most expensive cities, and it was recently rated as having the highest cost of living of any North American city. Despite the high cost, Metro Vancouver is still one of the top immigration destinations in Canada. Most new immigrants are not in a position to buy a property in the area before they have settled down and found a stable job.

In this project we will help a client who recently immigrated to Metro Vancouver from China find the optimal neighborhoods to settle in.
They plan to rent a property first while they get to know the city.
* They do not drive yet, so they are looking for a neighborhood with easy access to public transportation.
* They have not yet adapted to Western food culture, so they are looking for a neighborhood with plenty of Asian restaurants nearby to choose from.
* They are looking for a neighborhood with plenty of green space nearby, preferably a park.

They did not insist on being close to downtown, so we will explore the entire Metro Vancouver area to find neighborhoods with **easy access to the SkyTrain system**, plenty of quality **Asian restaurants** nearby and at least one **big park** nearby.

We will use data science techniques to identify the most promising neighborhoods based on these criteria. The neighborhoods will be visualized on a map, and we will present the advantages of each.


### Data - A description of the data and how it will be used to solve the problem. <a name='data'></a>

Based on the definition of the problem, we will use data science to answer four questions:
1. What are the characteristics of each neighborhood?
2. How many Asian restaurants are in the neighborhood, and what is their quality?
3. How many parks are in the neighborhood?
4. Is there at least one SkyTrain station nearby?

The data sources we use to answer the questions above and find the optimal neighborhoods are:
1. A list of neighborhood candidates. (We will scrape the **Wikipedia page 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_V'** to find the first three characters of the postal code of each neighborhood, then use **geocoder with ArcGIS** to find the center of the neighborhood on the map.)
2. A list of SkyTrain stations in Metro Vancouver. (We will scrape the **Wikipedia page 'https://en.wikipedia.org/wiki/List_of_Vancouver_SkyTrain_stations'** to find the name of each station and use **Nominatim** to locate each station on the map.)
3. The **Foursquare API**, to find venues near each neighborhood candidate and SkyTrain station (1 km radius). (We will map the top 10 venues of each neighborhood, use k-means clustering to separate the neighborhoods into 4 clusters, then observe the clusters and identify how their characteristics differ.)
4. The venue categories from Foursquare, which let us generate a list of the neighborhoods with the most Asian restaurants and parks nearby. A Foursquare API premium call provides the rating of each restaurant, so we know its quality.
5. A cross-reference of the venues near each neighborhood and near each station, which identifies the stations close to the center of the neighborhood (maximum range about 2 km).
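
The 2 km range in the last item can also be checked directly from coordinates. The following is a minimal sketch, assuming the haversine formula and a mean Earth radius of 6371 km; the function name `haversine_km` is ours and is not part of the notebook. The two coordinate pairs are copied from the neighborhood and station tables that appear later in this notebook.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# coordinates copied from the tables in this notebook
neighborhood_center = (49.275565, -123.002918)  # postal code V5C (Burnaby Heights)
station = (49.266402, -123.001724)              # Brentwood Town Centre Station

distance = haversine_km(*neighborhood_center, *station)
print(f"{distance:.2f} km", "within range" if distance <= 2.0 else "too far")
```

For this pair the distance comes out at roughly 1 km, so the station would count as nearby under a 2 km cutoff.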


In [1]:
# to read html, scraping tables
!pip install lxml    
import pandas as pd
import numpy as np 
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!pip install geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


-We will create a dataframe with three columns: postalcode, borough and neighborhood. We will focus on the neighborhoods in Greater Vancouver that are more likely to have SkyTrain stations nearby, so we limit our exploration to six municipalities: Vancouver, Burnaby, Surrey, Richmond, New Westminster and Coquitlam. (This does not cover North Vancouver and West Vancouver, as they are on the other side of the bridge and have no SkyTrain service.)

In [None]:
# scrape the postal codes starting with 'V' from the Wikipedia page
raw=pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_V')
raw_list=(raw[0]).values.flatten() # flatten the table into a 1-D array of cells
df = pd.DataFrame(raw_list)
df['postalcode']=df[0].str[:3] # the first three characters are the postal code prefix
df['address']=df[0].str[3:]    # the rest is the borough and neighborhood text

# address contains Vancouver or Burnaby
df1=df[df.address.str.contains('Vancouver|Burnaby')]
df1=df1[~df1.address.str.contains('North Vancouver|West Vancouver')].copy() # take out North Vancouver and West Vancouver
df1['borough']=df1.address.str.replace(r"\(.*\)","",regex=True) # borough is the text outside the parentheses
df1['borough']=df1.borough.apply(lambda x: x.strip())
df1['neighborhood'] = df1.address.str.extract(r'.*\((.*)\).*',expand=False) # neighborhoods are inside the parentheses
df1['neighborhood'] = df1.neighborhood.str.replace(" / ",",",regex=False)

# address contains Surrey
df2=df[df.address.str.contains('Surrey')].copy() # copy so the new columns are not written to a slice of df
df2['borough']=df2.address.str[:6]
df2['neighborhood']=df2.address.str[6:]

# address contains Richmond
df3=df[df.address.str.contains('Richmond')].copy()
df3['borough']=df3.address.str[:8]
df3['neighborhood']=df3.address.str[8:]
df3.drop([31],inplace=True) # drop the row with index label 31 (hard-coded; depends on the scraped table)

# address contains New Westminster
df4=df[df.address.str.contains('New Westminster')].copy()
df4['borough']=df4.address.str[:15]
df4['neighborhood']=df4.address.str[15:]

# address contains Coquitlam
df5=df[df.address.str.contains('Coquitlam')]
df5=df5[~df5.address.str.contains('Port Coquitlam')].copy() # take out Port Coquitlam
df5['borough']=df5.address.str[:9]
df5['neighborhood']=df5.address.str[9:]

great_vancouver=[df1,df2,df3,df4,df5]
df_final=pd.concat(great_vancouver)
df_final=df_final.drop([0,'address'],axis=1)
df_final=df_final.reset_index(drop=True)
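
To make the string handling above easier to follow, here is a small sketch of the same split applied to a single cell with the standard `re` module. The sample text is hypothetical; it only assumes the layout the cell above relies on (a three-character postal code prefix, then the borough, then the neighborhoods in parentheses separated by " / "). The helper name `split_cell` is ours.

```python
import re

# Hypothetical cell text, assumed to mirror the scraped table layout.
sample_cell = "V5ABurnaby(Government Road / Lake City / SFU / Burnaby Mountain)"


def split_cell(cell):
    """Return (postalcode, borough, neighborhood) for one table cell."""
    postalcode, address = cell[:3], cell[3:]
    # the borough is whatever remains once the parenthesised part is removed
    borough = re.sub(r"\(.*\)", "", address).strip()
    # the neighborhoods are the text inside the parentheses, if any
    match = re.search(r"\((.*)\)", address)
    neighborhood = match.group(1).replace(" / ", ",") if match else None
    return postalcode, borough, neighborhood


print(split_cell(sample_cell))
```

A cell without parentheses yields `None` for the neighborhood, which is why the Surrey, Richmond, New Westminster and Coquitlam rows are split by position instead.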

In [4]:
df_final.head()

Unnamed: 0,postalcode,borough,neighborhood
0,V5A,Burnaby,"Government Road,Lake City,SFU,Burnaby Mountain"
1,V6A,Vancouver,"Strathcona,Chinatown,Downtown Eastside"
2,V5B,Burnaby,"Parkcrest-Aubrey,Ardingley-Sprott"
3,V6B,Vancouver,"NE Downtown,Gastown,Harbour Centre,Internation..."
4,V5C,Burnaby,"Burnaby Heights,Willingdon Heights,West Centra..."


In [5]:
df_final.shape[0]

62

-We will explore 62 neighborhoods in the Greater Vancouver area. Now let's add the latitude and longitude of each neighborhood to the dataframe.

In [6]:
# first let's define a function to find the latitude and longitude for a postal code (using geocoder and ArcGIS)
import geocoder
def get_geocoder(postal_code):
    # initialize the result to None
    lat_lng_coords = None
    # loop until the geocoder returns coordinates
    while lat_lng_coords is None:
        g = geocoder.arcgis('{}, British Columbia'.format(postal_code))
        lat_lng_coords = g.latlng
    latitude = lat_lng_coords[0]
    longitude = lat_lng_coords[1]
    return latitude, longitude
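
The loop above keeps trying until the geocoder answers, so it never finishes if a postal code cannot be resolved. A bounded variant is sketched below under our own assumptions: the helper name `geocode_with_retry`, the attempt limit and the delay are illustrative, and a stand-in function replaces the real geocoder so the example runs offline.

```python
import time


def geocode_with_retry(lookup, query, max_attempts=5, delay_seconds=0.0):
    """Call lookup(query) until it returns coordinates or the attempts run out.

    `lookup` is any callable that returns a [lat, lng] pair or None,
    for example: lambda q: geocoder.arcgis(q).latlng
    """
    for attempt in range(1, max_attempts + 1):
        coords = lookup(query)
        if coords is not None:
            return coords[0], coords[1]
        time.sleep(delay_seconds)  # pause before the next attempt
    raise RuntimeError(f"no coordinates for {query!r} after {max_attempts} attempts")


# Stand-in for the real geocoder: fails twice, then succeeds.
calls = {"n": 0}


def fake_lookup(query):
    calls["n"] += 1
    return None if calls["n"] < 3 else [49.266244, -122.931096]


print(geocode_with_retry(fake_lookup, "V5A, British Columbia"))
```

Raising an error after a fixed number of attempts makes a bad postal code visible instead of leaving the cell running forever.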

In [7]:
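latlng_list=[]
# note: this loops over every postal code in df, not only the 62 selected ones;
# the merge below keeps only the rows that also appear in df_final
for i in df['postalcode'].values: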
    m=list(get_geocoder(i))
    m.append(i)
    latlng_list.append(m)
df_geo=pd.DataFrame(latlng_list,columns=['latitude','longitude','postalcode'])
df_geo.head()

Unnamed: 0,latitude,longitude,postalcode
0,49.691079,-115.952463,V1A
1,49.49101,-119.574217,V2A
2,49.088009,-122.641184,V3A
3,49.032073,-122.821241,V4A
4,49.266244,-122.931096,V5A


In [8]:
df_final=pd.merge(df_final,df_geo,on='postalcode')

In [9]:
df_final.head()

Unnamed: 0,postalcode,borough,neighborhood,latitude,longitude
0,V5A,Burnaby,"Government Road,Lake City,SFU,Burnaby Mountain",49.266244,-122.931096
1,V6A,Vancouver,"Strathcona,Chinatown,Downtown Eastside",49.278421,-123.092971
2,V5B,Burnaby,"Parkcrest-Aubrey,Ardingley-Sprott",49.26606,-122.95922
3,V6B,Vancouver,"NE Downtown,Gastown,Harbour Centre,Internation...",49.280253,-123.115695
4,V5C,Burnaby,"Burnaby Heights,Willingdon Heights,West Centra...",49.275565,-123.002918


-This process takes a long time, so we save the result to a CSV file.

In [None]:
df_final.to_csv('Vancouver Neighborhood.csv')
print('csv generated')
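
Saving the result only pays off if later runs read the file instead of repeating the slow step. The sketch below shows that pattern with the standard `csv` module; the helper name `load_or_build`, the file name and the stand-in build function are assumptions for illustration, not part of the notebook.

```python
import csv
import os
import tempfile


def load_or_build(path, build_rows, fieldnames):
    """Return rows from `path` if it exists; otherwise build them and write the file."""
    if os.path.exists(path):
        with open(path, newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))
    rows = build_rows()  # the slow step, e.g. geocoding every postal code
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return rows


# Demo with a temporary file and a stand-in for the slow geocoding step.
build_calls = []


def slow_build():
    build_calls.append(1)
    return [{"postalcode": "V5A", "latitude": "49.266244", "longitude": "-122.931096"}]


demo_path = os.path.join(tempfile.mkdtemp(), "neighborhoods.csv")
fields = ["postalcode", "latitude", "longitude"]
first = load_or_build(demo_path, slow_build, fields)   # builds and writes the file
second = load_or_build(demo_path, slow_build, fields)  # reads the cached file
print(len(build_calls), second[0]["postalcode"])
```

With pandas, passing `index=False` to `to_csv` avoids the extra `Unnamed: 0` column that otherwise appears when the file is read back.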

-Now let's find the geographic locations of the SkyTrain stations in Greater Vancouver. We scrape the data from Wikipedia.
The dataframe lists the name of each station, the line and municipality it belongs to, and its fare zone; we might need this information later.

In [10]:
raw1=pd.read_html('https://en.wikipedia.org/wiki/List_of_Vancouver_SkyTrain_stations')
df_station=raw1[1].iloc[:,0:4].copy() # keep the first four columns: station, line, municipality, zone
df_station['Station']=df_station['Station'].str.replace('*','',regex=False) # drop literal asterisks
df_station['Station']=df_station['Station'].str.replace(r"\[.*\]","",regex=True) # drop footnote markers such as [a]
df_station['Station']=df_station['Station'] +' Station'
df_station.head()

Unnamed: 0,Station,Line(s),Municipality,Zone[a]
0,22nd Street Station,Expo,New Westminster,Zone 2
1,29th Avenue Station,Expo,Vancouver,Zone 1
2,Aberdeen Station,Canada,Richmond,Zone 2
3,Braid Station,Expo,New Westminster,Zone 2
4,Brentwood Town Centre Station,Millennium,Burnaby,Zone 2


-Now add geo information to the dataframe.

In [11]:
station_geo=[]
import time
geolocator = Nominatim(user_agent="V_explorer") # create the geocoder once and reuse it
for n in range(len(df_station)):
    station_address= df_station['Station'][n]+','+'British Columbia'
    location = geolocator.geocode(station_address)
    Latitude = location.latitude
    Longitude = location.longitude
    station_geo.append([df_station['Station'][n],Latitude,Longitude])
    time.sleep(1) # Nominatim's usage policy allows at most one request per second

print('complete')

complete


In [13]:
station_geo=pd.DataFrame(station_geo,columns=['Station','Lat','Lng'])

In [14]:
df_station=pd.merge(df_station,station_geo,on='Station')

In [15]:
df_station['Station']=df_station['Station'].str.replace('–','-')
df_station.head()

Unnamed: 0,Station,Line(s),Municipality,Zone[a],Lat,Lng
0,22nd Street Station,Expo,New Westminster,Zone 2,49.200065,-122.949015
1,29th Avenue Station,Expo,Vancouver,Zone 1,49.244208,-123.045922
2,Aberdeen Station,Canada,Richmond,Zone 2,49.183982,-123.136316
3,Braid Station,Expo,New Westminster,Zone 2,49.233109,-122.882776
4,Brentwood Town Centre Station,Millennium,Burnaby,Zone 2,49.266402,-123.001724


In [16]:
print(df_station.shape[0])

53


-There are 53 SkyTrain stations in Metro Vancouver. Again, this step takes a long time, so we save the dataframe to a CSV file.

In [None]:
df_station.to_csv('Vancouver Skytrain Station.csv')
print('csv created')

#### All the data we need is ready, so we can now visualize it on a map.

In [69]:
# find the coordinates of the center of Greater Vancouver (Burnaby, BC is used here)
geolocator = Nominatim(user_agent="V_explorer")
location = geolocator.geocode('Burnaby,BC')
latitude = location.latitude
longitude = location.longitude
GVancouver=[latitude,longitude]

-Let's now visualize the neighborhoods and stations.

In [110]:
# create map of Vancouver using latitude and longitude values
map_vancouver= folium.Map(location=GVancouver, zoom_start=11.2)

# Add the skytrain stations, green train icon with popup.  
for lat, lng, station in zip(df_station['Lat'], df_station['Lng'], df_station['Station']):
    label = station
    label = folium.Popup(label, parse_html=False)
    folium.Marker(
        [lat, lng],
        popup=label,
        icon=folium.Icon(color='green',icon="train", prefix='fa')
        ).add_to(map_vancouver)
# Add the neighborhoods, red circle with pop up.
for lat, lng, borough, neighborhood in zip(df_final['latitude'], df_final['longitude'], df_final['borough'], df_final['neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=8,
        popup=label,
        color='#f51845',
        fill=True,
        fill_color='#e6aab6',
        fill_opacity=0.5,
        parse_html=False).add_to(map_vancouver)
map_vancouver


-A brief look at the map shows that many neighborhoods are close to SkyTrain stations.

#### Let's use the Foursquare API to explore and cluster these neighborhoods.

In [5]:
# define credentials and API version (replace the placeholders with your own Foursquare credentials)
CLIENT_ID = 'YOUR_CLIENT_ID'
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'
VERSION = '20180605'
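
Credentials are safer outside the notebook. One option, sketched here, is to read them from environment variables; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper name are hypothetical, and the demo uses a plain dict in place of the real environment.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the client ID and secret from environment variables.

    The variable names below are hypothetical; use whatever names
    your own environment defines.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"set the environment variable {missing} first") from None


# Demo with a plain dict standing in for os.environ.
demo_env = {"FOURSQUARE_CLIENT_ID": "my-id", "FOURSQUARE_CLIENT_SECRET": "my-secret"}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print(demo_id)
```

In the notebook itself the call would be `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()`, so the shared file never contains the secret.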

In [6]:
# define a function to explore the venues in each neighborhood.
def getNearbyVenues(citys, names, latitudes, longitudes):
    radius=1000
    LIMIT=300
    venues_list=[]
    for city, name, lat, lng in zip(citys,names, latitudes, longitudes):   
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            city,
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'],
            v['venue']['id']) for v in results]) # we need venue id for further exploring the venues

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Borough','Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude', 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category','Venue Id']
    
    return(nearby_venues)
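
The function above indexes straight into the response, so a reply without venues or a venue without a category would raise an exception. Below is a defensive sketch of the same extraction step on its own. The sample payload is hypothetical and contains only the keys that `getNearbyVenues` reads; the helper name `extract_venues` is ours.

```python
def extract_venues(payload):
    """Flatten an explore-style response into (name, lat, lng, category, id) tuples.

    Missing groups produce an empty list, and a venue without categories
    gets None as its category, instead of raising an exception.
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((venue["name"], venue["location"]["lat"],
                     venue["location"]["lng"], category, venue["id"]))
    return rows


# Hypothetical payload shaped like the fields read above.
sample = {"response": {"groups": [{"items": [
    {"venue": {"id": "4aa7fa85f964a520704e20e3", "name": "Phnom Penh",
               "location": {"lat": 49.278517, "lng": -123.098214},
               "categories": [{"name": "Asian Restaurant"}]}},
    {"venue": {"id": "no-category", "name": "Unnamed Spot",
               "location": {"lat": 49.28, "lng": -123.1},
               "categories": []}},
]}]}}

print(extract_venues(sample))
print(extract_venues({}))  # a reply with no 'response' key yields no rows
```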

In [7]:
# list all the venues within a 1 km radius of each neighborhood in Metro Vancouver.
greatvan_venues = getNearbyVenues(citys=df_final['borough'],names=df_final['neighborhood'],latitudes=df_final['latitude'],longitudes=df_final['longitude'])

In [8]:
greatvan_venues.shape[0]

2715

-There are 2715 venues in the final search result within a 1 km radius of the neighborhood centers.

In [None]:
greatvan_venues['Venue Category'].unique()

-Browsing through the unique categories provided by Foursquare, we can easily pick out the Asian-food-related keywords:
Asian, Chinese, Japanese, Noodle, Ramen, Sushi, Shanghai, Cantonese, Vietnamese, Taiwanese

In [9]:
# find out how many Asian restaurants are among the venues we are exploring.
asianfood = greatvan_venues[greatvan_venues['Venue Category'].str.contains('Asian|Chinese|Japanese|Noodle|Ramen|Sushi|Shanghai|Cantonese|Vietnamese|Taiwanese')].reset_index(drop=True)
asianfood.head()

Unnamed: 0,Borough,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Venue Id
0,Vancouver,"Strathcona,Chinatown,Downtown Eastside",49.278421,-123.092971,Phnom Penh,49.278517,-123.098214,Asian Restaurant,4aa7fa85f964a520704e20e3
1,Vancouver,"Strathcona,Chinatown,Downtown Eastside",49.278421,-123.092971,Kissa Tanto,49.280412,-123.098133,Japanese Restaurant,57242c7ecd1040df61be5d0b
2,Vancouver,"Strathcona,Chinatown,Downtown Eastside",49.278421,-123.092971,Bao Bei,49.279491,-123.100595,Chinese Restaurant,4b513a38f964a5200b4827e3
3,Vancouver,"Strathcona,Chinatown,Downtown Eastside",49.278421,-123.092971,Torafuku,49.275951,-123.099814,Asian Restaurant,55af01e7498ef6fadceecda9
4,Vancouver,"Strathcona,Chinatown,Downtown Eastside",49.278421,-123.092971,New Town Bakery & Restaurant,49.28042,-123.101205,Chinese Restaurant,4abe7fb7f964a520078e20e3


In [10]:
asianfood.shape[0]

351

-There are 351 Asian restaurants in total near these neighborhoods. Let's add the rating of each restaurant from the Foursquare API.

In [11]:
foodrating=[]
for i in (range(len(asianfood))):
    venue_id = asianfood['Venue Id'][i]
    url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
    result = requests.get(url).json()
    try:
        R=result['response']['venue']['rating']
        foodrating.append(R)
    except KeyError: # the venue has no rating, or the response is incomplete
        foodrating.append('NaN')
ratings=pd.DataFrame(foodrating,columns=['Rating'])
asianfood=pd.merge(asianfood,ratings,left_index=True,right_index=True)

In [25]:
asianfood.head()

Unnamed: 0,Borough,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Venue Id,Rating
0,Vancouver,"Strathcona,Chinatown,Downtown Eastside",49.278421,-123.092971,Phnom Penh,49.278517,-123.098214,Asian Restaurant,4aa7fa85f964a520704e20e3,9.0
1,Vancouver,"Strathcona,Chinatown,Downtown Eastside",49.278421,-123.092971,Kissa Tanto,49.280412,-123.098133,Japanese Restaurant,57242c7ecd1040df61be5d0b,8.7
2,Vancouver,"Strathcona,Chinatown,Downtown Eastside",49.278421,-123.092971,Bao Bei,49.279491,-123.100595,Chinese Restaurant,4b513a38f964a5200b4827e3,9.0
3,Vancouver,"Strathcona,Chinatown,Downtown Eastside",49.278421,-123.092971,Torafuku,49.275951,-123.099814,Asian Restaurant,55af01e7498ef6fadceecda9,8.0
4,Vancouver,"Strathcona,Chinatown,Downtown Eastside",49.278421,-123.092971,New Town Bakery & Restaurant,49.28042,-123.101205,Chinese Restaurant,4abe7fb7f964a520078e20e3,8.1


In [18]:
##venue_id='4bdb8a9ac79cc928816d83e9'
url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
result = requests.get(url).json()
result['response']['venue']['rating']

7.0

In [None]:
# some restaurants have no rating on Foursquare; let's find out how many
len(asianfood[asianfood['Rating']=='NaN'])

In [19]:
# calculate the median rating of all the Asian restaurants in the list.
median_rating=asianfood[asianfood['Rating']!='NaN']['Rating'].median()
median_rating

7.1

-33 of these restaurants have no rating; we can use the median rating (7.1) of all the Asian restaurants to replace the missing values.

In [20]:
# use the median rating to replace 'NaN' and make the column numeric.
asianfood['Rating']=asianfood['Rating'].replace('NaN',median_rating).astype(float)
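
The imputation step is easy to verify on a tiny example. The list below is hypothetical and uses the same `'NaN'` string marker as the notebook for a venue without a rating; only the standard `statistics` module is needed.

```python
from statistics import median

ratings = [9.0, 8.7, "NaN", 8.0, "NaN"]  # 'NaN' marks a venue without a rating

known = [r for r in ratings if r != "NaN"]   # ratings that are actually present
fill_value = median(known)                   # median of the known ratings
imputed = [fill_value if r == "NaN" else r for r in ratings]
print(fill_value, imputed)
```

The median is used rather than the mean so that a few very high or very low ratings do not shift the fill value.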

In [26]:
# now let's find out which municipality has the most Asian restaurants in the area we are exploring.
asianfood[['Borough','Venue']].groupby('Borough').count().sort_values(['Venue'],ascending=False)

Unnamed: 0_level_0,Venue
Borough,Unnamed: 1_level_1
Vancouver,263
Burnaby,37
Richmond,29
Surrey,13
New Westminster,6
Coquitlam,3


-Vancouver has the most: 263 of the 351 venues we are exploring. Burnaby and Richmond come in second and third.

In [30]:
# now let's find out which neighborhoods have the highest density of Asian food, showing the top 10.
Mostvenues=asianfood[['Borough','Neighborhood','Venue']].groupby(['Borough','Neighborhood']).count().sort_values(['Venue'],ascending=False).head(10)
Mostvenues

Unnamed: 0_level_0,Unnamed: 1_level_0,Venue
Borough,Neighborhood,Unnamed: 2_level_1
Richmond,North,23
Vancouver,"NW West End,Stanley Park",20
Vancouver,"West Kensington-Cedar Cottage,NE Riley Park-Little Mountain",18
Vancouver,North Grandview-Woodland,18
Vancouver,South Renfrew-Collingwood,17
Burnaby,"Maywood,Marlborough,Oakalla,Windsor",16
Burnaby,"Burnaby Heights,Willingdon Heights,West Central Valley",14
Vancouver,"SE Kerrisdale,SW Oakridge,West Marpole",14
Vancouver,"West Mount Pleasant,West Riley Park-Little Mountain",14
Vancouver,"Strathcona,Chinatown,Downtown Eastside",12


-Although Richmond comes third in total restaurant count, Richmond North has the most Asian restaurants within the search radius.

In [34]:
# let's calculate the average rating of the restaurants in each neighborhood and show the top 10.
bestquality=asianfood[['Borough','Neighborhood','Rating']].groupby(['Borough','Neighborhood']).mean().sort_values(['Rating'],ascending=False)
bestquality.head(10)

Unnamed: 0_level_0,Unnamed: 1_level_0,Rating
Borough,Neighborhood,Unnamed: 2_level_1
Vancouver,Bentall Centre,8.675
Vancouver,"Waterfront,Coal Harbour,Canada Place",8.566667
Vancouver,"NE Downtown,Gastown,Harbour Centre,International Village,Victory Square,Yaletown",8.5625
Vancouver,Pacific Centre,8.485714
Vancouver,SW Downtown,8.444444
Vancouver,"SE West End,Davie Village",8.4
Vancouver,"Strathcona,Chinatown,Downtown Eastside",7.916667
Vancouver,"West Kerrisdale,South Dunbar-Southlands,Musqueam",7.9
Vancouver,"NW Shaughnessy,East Kitsilano,Quilchena",7.828571
Vancouver,"NW West End,Stanley Park",7.55


In [37]:
# add the average rating to the 10 neighborhoods with the most Asian restaurants.
df_top10=pd.merge(Mostvenues,bestquality,on=['Borough','Neighborhood'])
df_top10

Unnamed: 0_level_0,Unnamed: 1_level_0,Venue,Rating
Borough,Neighborhood,Unnamed: 2_level_1,Unnamed: 3_level_1
Richmond,North,23,6.952174
Vancouver,"NW West End,Stanley Park",20,7.55
Vancouver,"West Kensington-Cedar Cottage,NE Riley Park-Little Mountain",18,6.994444
Vancouver,North Grandview-Woodland,18,6.855556
Vancouver,South Renfrew-Collingwood,17,6.794118
Burnaby,"Maywood,Marlborough,Oakalla,Windsor",16,6.85625
Burnaby,"Burnaby Heights,Willingdon Heights,West Central Valley",14,6.857143
Vancouver,"SE Kerrisdale,SW Oakridge,West Marpole",14,6.485714
Vancouver,"West Mount Pleasant,West Riley Park-Little Mountain",14,6.907143
Vancouver,"Strathcona,Chinatown,Downtown Eastside",12,7.916667


In [None]:
-looks like although 

In [38]:
# keep the venues whose category contains 'Park'.
parks = greatvan_venues[greatvan_venues['Venue Category'].str.contains('Park')].reset_index(drop=True)
parks.head()

Unnamed: 0,Borough,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Venue Id
0,Vancouver,"Strathcona,Chinatown,Downtown Eastside",49.278421,-123.092971,Creekside Park,49.274641,-123.102701,Park,4b7c2ccaf964a5201a822fe3
1,Vancouver,"Strathcona,Chinatown,Downtown Eastside",49.278421,-123.092971,Strathcona Park,49.275183,-123.084919,Park,4abe4197f964a520428c20e3
2,Vancouver,"Strathcona,Chinatown,Downtown Eastside",49.278421,-123.092971,Crab Park,49.28499,-123.101327,Park,4aaaf0f4f964a5202a5820e3
3,Vancouver,"Strathcona,Chinatown,Downtown Eastside",49.278421,-123.092971,MacLean Park,49.278809,-123.088546,Park,4bbebef598f4952118c2d163
4,Vancouver,"Strathcona,Chinatown,Downtown Eastside",49.278421,-123.092971,Trillium Park,49.274308,-123.093691,Park,4d9fad3083f0b1f7b0ae9fc7


In [41]:
# count the parks near each neighborhood and sort from most to fewest.
Mostparks=parks[['Borough','Neighborhood','Venue']].groupby(['Borough','Neighborhood']).count().sort_values(['Venue'],ascending=False)
Mostparks

Unnamed: 0_level_0,Unnamed: 1_level_0,Venue
Borough,Neighborhood,Unnamed: 2_level_1
Vancouver,North Hastings-Sunrise,9
Vancouver,"Strathcona,Chinatown,Downtown Eastside",5
Vancouver,South Renfrew-Collingwood,5
Vancouver,"NW West End,Stanley Park",5
New Westminster,Northeast,4
Vancouver,"South Hastings-Sunrise,North Renfrew-Collingwood",4
Vancouver,SW Downtown,4
Vancouver,"SE Kerrisdale,SW Oakridge,West Marpole",4
Vancouver,"East Fairview,South Cambie",4
Burnaby,"Burnaby Heights,Willingdon Heights,West Central Valley",3


In [2]:
df_final=pd.read_csv('Vancouver Neighborhood.csv')

In [5]:
df_final.head()

Unnamed: 0.1,Unnamed: 0,postalcode,borough,neighborhood,latitude,longitude
0,0,V5A,Burnaby,"Government Road,Lake City,SFU,Burnaby Mountain",49.266244,-122.931096
1,1,V6A,Vancouver,"Strathcona,Chinatown,Downtown Eastside",49.278421,-123.092971
2,2,V5B,Burnaby,"Parkcrest-Aubrey,Ardingley-Sprott",49.26606,-122.95922
3,3,V6B,Vancouver,"NE Downtown,Gastown,Harbour Centre,Internation...",49.280253,-123.115695
4,4,V5C,Burnaby,"Burnaby Heights,Willingdon Heights,West Centra...",49.275565,-123.002918


In [43]:
df_station=pd.read_csv('Vancouver Skytrain Station.csv')
df_station.head()

Unnamed: 0.1,Unnamed: 0,Station,Line(s),Municipality,Zone[a],Lat,Lng
0,0,22nd Street Station,Expo,New Westminster,Zone 2,49.200065,-122.949015
1,1,29th Avenue Station,Expo,Vancouver,Zone 1,49.244208,-123.045922
2,2,Aberdeen Station,Canada,Richmond,Zone 2,49.183982,-123.136316
3,3,Braid Station,Expo,New Westminster,Zone 2,49.233109,-122.882776
4,4,Brentwood Town Centre Station,Millennium,Burnaby,Zone 2,49.266402,-123.001724


In [119]:
def getNearbyVenues2(stations, latitudes, longitudes):
    radius=1000
    LIMIT=200
    venues_list=[]
    for station, lat, lng in zip(stations, latitudes, longitudes):   
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            station, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng']) for v in results]) # name and coordinates are enough to match venues later

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Station', 'Station_Latitude', 'Station_Longitude', 'Venue', 'Venue Latitude', 'Venue Longitude']
    
    return(nearby_venues)

In [120]:
stationnearby = getNearbyVenues2(stations=df_station['Station'],latitudes=df_station['Lat'],longitudes=df_station['Lng'])

In [121]:
stationnearby.head()

Unnamed: 0,Station,Station_Latitude,Station_Longitude,Venue,Venue Latitude,Venue Longitude
0,22nd Street Station,49.200065,-122.949015,Grimston Park,49.202087,-122.942628
1,22nd Street Station,49.200065,-122.949015,Pho Maxima Restaurant,49.203505,-122.949788
2,22nd Street Station,49.200065,-122.949015,Lindt Outlet Boutique,49.191788,-122.948675
3,22nd Street Station,49.200065,-122.949015,Banana Republic,49.192818,-122.950226
4,22nd Street Station,49.200065,-122.949015,GUESS Factory Store,49.193683,-122.947768


In [122]:
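# match station venues to neighborhood venues on identical coordinates;
# a match means the venue lies within 1 km of both the station and the neighborhood center
df_stationnearby=pd.merge(stationnearby,greatvan_venues,on=['Venue Latitude','Venue Longitude'])
df_stationnearby[['Station','Borough','Neighborhood','Neighborhood Latitude','Neighborhood Longitude']].head()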

Unnamed: 0,Station,Borough,Neighborhood,Neighborhood Latitude,Neighborhood Longitude
0,22nd Street Station,New Westminster,Southwest(Includes Annacis Island),49.18822,-122.943376
1,22nd Street Station,New Westminster,Southwest(Includes Annacis Island),49.18822,-122.943376
2,22nd Street Station,New Westminster,Southwest(Includes Annacis Island),49.18822,-122.943376
3,22nd Street Station,New Westminster,Southwest(Includes Annacis Island),49.18822,-122.943376
4,22nd Street Station,New Westminster,Southwest(Includes Annacis Island),49.18822,-122.943376
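
The merge above treats a venue that shows up in both searches as evidence that a station and a neighborhood are close. The sketch below reproduces that idea on hypothetical data with plain dictionaries; the helper name `shared_venue_counts` and the rounding to six decimals are our assumptions, added so that tiny floating-point differences do not break the match.

```python
# Hypothetical venue lists; each venue is (name, lat, lng).
station_venues = {
    "22nd Street Station": [("Grimston Park", 49.202087, -122.942628),
                            ("Pho Maxima Restaurant", 49.203505, -122.949788)],
}
neighborhood_venues = {
    "Southwest": [("Grimston Park", 49.202087, -122.942628)],
    "Northeast": [("Some Cafe", 49.220500, -122.902000)],
}


def shared_venue_counts(station_venues, neighborhood_venues, ndigits=6):
    """Count venues that appear near both a station and a neighborhood."""
    def key(venue):
        # compare venues by rounded coordinates rather than by name
        return (round(venue[1], ndigits), round(venue[2], ndigits))

    counts = {}
    for hood, hood_list in neighborhood_venues.items():
        hood_keys = {key(v) for v in hood_list}
        for station, station_list in station_venues.items():
            shared = sum(1 for v in station_list if key(v) in hood_keys)
            if shared:
                counts[(hood, station)] = shared
    return counts


print(shared_venue_counts(station_venues, neighborhood_venues))
```

A higher count suggests more overlap between the two 1 km search circles, which is only an indirect measure of distance.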


In [123]:
# count, for each neighborhood, how many of its venues are also near a station.
station_nearby_list=df_stationnearby[['Borough','Neighborhood','Neighborhood Latitude','Neighborhood Longitude','Station']].groupby(['Borough','Neighborhood','Neighborhood Latitude','Neighborhood Longitude']).size().reset_index(name='venue_count')

In [124]:
station_nearby_list

Unnamed: 0,Borough,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,venue_count
0,Burnaby,"Burnaby Heights,Willingdon Heights,West Centra...",49.275565,-123.002918,23
1,Burnaby,"Government Road,Lake City,SFU,Burnaby Mountain",49.266244,-122.931096,2
2,Burnaby,"Lakeview-Mayfield,Richmond Park,Kingsway-Beres...",49.226349,-122.945568,1
3,Burnaby,"Maywood,Marlborough,Oakalla,Windsor",49.230275,-122.99777,139
4,Burnaby,"Parkcrest-Aubrey,Ardingley-Sprott",49.26606,-122.95922,11
5,Burnaby,"Suncrest,Sussex-Nelson,Clinton-Glenwood,West B...",49.207474,-122.995569,6
6,New Westminster,Northeast,49.220213,-122.90239,12
7,New Westminster,Southwest(Includes Annacis Island),49.18822,-122.943376,27
8,Richmond,Central,49.159208,-123.118387,1
9,Richmond,North,49.186035,-123.11868,55


In [125]:
map_vancouver_stationnearby= folium.Map(location=GVancouver, zoom_start=11.5)

# Add the skytrain stations, green train icon with popup.  
for lat, lng, station in zip(df_station['Lat'], df_station['Lng'], df_station['Station']):
    label = station
    label = folium.Popup(label, parse_html=False)
    folium.Marker(
        [lat, lng],
        popup=label,
        icon=folium.Icon(color='green',icon="train", prefix='fa')
        ).add_to(map_vancouver_stationnearby)
# Add the neighborhoods near stations, blue circle with pop up.
for lat, lng, borough, neighborhood in zip(station_nearby_list['Neighborhood Latitude'],station_nearby_list['Neighborhood Longitude'],station_nearby_list['Borough'],station_nearby_list['Neighborhood'],):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=8,
        popup=label,
        color='Blue',
        fill=True,
        fill_color='#e6aab6',
        fill_opacity=0.5,
        parse_html=False).add_to(map_vancouver_stationnearby)
map_vancouver_stationnearby