## Explore the Neighborhoods and Venues of Montreal

## 1. Introduction


 
Montreal is the second most populous city in Canada, just behind Toronto, and the most important city in the province of Quebec. It is warm, dynamic, relaxed, innovative, cosmopolitan, modern and historic. Life in the city is bilingual and multicultural. It is undoubtedly the most bilingual city in North America, with more than 50% of "Montrealers" being bilingual.
Less pretentious than Paris, less busy than New York, more creative than Toronto, Montreal has kept its own identity without being shaped by international standards. Its unique art of living, which combines good humor, accessibility, cosmopolitanism and culture, inevitably makes you want to spend at least a few days in the city.

The city is also known for its international gastronomy and is home to numerous international festivals.

## 2. Business Problem
 
As in many parts of the world, the COVID-19 pandemic has restricted most people from travelling. The objective of this project is to use data science skills to let the reader explore the beautiful city of Montreal virtually, in the hope that the reader will gain interest in the city over the course of the project and explore Montreal in person when the pandemic ends (hopefully very soon).

## 3. Data Description
The following data sources are used for the analysis:

-   A list of forward sortation areas (FSAs) with the corresponding neighborhoods in the city of Montreal [1].
-   The Foursquare API, to get the most common venues of a given borough of Montreal [2].
-   The Geopy package, to get the latitude and longitude coordinates of each neighborhood.


## 4. Methodology

### 4.1 Data Cleaning and Dataset Exploration

#### Import all the necessary Python Libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (on pandas < 1.0, import it from pandas.io.json instead)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!pip install folium==0.5.0
import folium # map rendering library

# import BeautifulSoup package for web scraping 
!pip install bs4
from bs4 import BeautifulSoup

!pip install pgeocode
import pgeocode 

print('Libraries imported.')

Libraries imported.


#### Web scraping - Information on Montreal's neighborhoods
As a starting point, a list of Montreal's neighborhoods with their corresponding postal codes is extracted from a Wikipedia page using web scraping with the BeautifulSoup package.

In [2]:
# Parsing the HTML
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_H'
result = requests.get(url)

soup = BeautifulSoup(result.content, 'html.parser')
table = soup.find('table')

a_i_s = table.find_all(['a', 'i'])
b_s = table.find_all('b')

# Get the FSAs (the bold text in the table)
mtl_postcode = [[b.text.rstrip()] for b in b_s]

# Get the neighborhoods (the link and italic text in the table)
mtl_neighborhood = [[a.text.rstrip()] for a in a_i_s]

# define the dataframe columns
column_post = ['Postcode' ] 
column_neigh = ['Neighborhood']

# instantiate the dataframe & further data clean-up
df_pc = pd.DataFrame(mtl_postcode,columns=column_post)

df_neigh = pd.DataFrame(mtl_neighborhood,columns=column_neigh)

df_neigh.drop([4, 23, 24, 25, 44, 50, 51, 59, 123, 134, 156, 177, 181], inplace=True)

df_neigh.reset_index(drop=True, inplace=True)

df_pc.drop([45, 46], inplace=True)

df_pc.reset_index(drop=True, inplace=True)

#print(df_neigh.shape)
#print(df_pc.shape)

#df_pc
#df_neigh

df_neighborhood = pd.concat([df_pc, df_neigh], axis=1)

df_neighborhood = df_neighborhood[df_neighborhood.Neighborhood != 'Not assigned']

df_neighborhood

Unnamed: 0,Postcode,Neighborhood
1,H1A,Pointe-aux-Trembles
2,H2A,Saint-Michel
3,H3A,Downtown Montreal
4,H4A,Notre-Dame-de-Grâce
5,H5A,Place Bonaventure
6,H7A,Duvernay-Est
8,H9A,Dollard-des-Ormeaux
10,H1B,Montreal East
11,H2B,Ahuntsic
12,H3B,Downtown Montreal


#### Use the Geopy package to get the longitude and latitude of each neighborhood
Now that the list of postal codes with their corresponding neighborhoods has been extracted from Wikipedia, the longitude and latitude of each neighborhood are obtained using the Geopy package. To create an instance of the geocoder, we need to define a user_agent. We will name our agent mtl_explorer.

In [None]:
# define the dataframe columns
column_names = ['Neighborhood', 'Latitude', 'Longitude'] 

geolocator = Nominatim(user_agent="mtl_explorer")

# get the latitude and longitude of each neighborhood
rows = []
for neighborhood in df_neighborhood['Neighborhood']:
    location = geolocator.geocode(neighborhood + ' Montréal')
    # geocode() returns None when nothing matches; record missing values
    # instead of reusing the previous neighborhood's coordinates
    latitude = location.latitude if location else None
    longitude = location.longitude if location else None
    rows.append({'Neighborhood': neighborhood,
                 'Latitude': latitude,
                 'Longitude': longitude})

# build the dataframe in one step (DataFrame.append was removed in pandas 2.0)
Lat_Long_table = pd.DataFrame(rows, columns=column_names)
Lat_Long_table


In [26]:
df_mtl_neigh_LongLat = Lat_Long_table['Neighborhood'].unique()

Unnamed: 0,Postcode,Neighborhood,Latitude,Longitude
0,H1A,Pointe-aux-Trembles,45.6753,-73.5016
1,H2A,Saint-Michel,45.5618,-73.599
2,H3A,Downtown Montreal,45.504,-73.5747
3,H4A,Notre-Dame-de-Grâce,45.4717,-73.6149
4,H5A,Place Bonaventure,45.4992,-73.5646
5,H7A,Duvernay-Est,45.6739,-73.5924
6,H9A,Dollard-des-Ormeaux,45.4948,-73.8317
7,H1B,Montreal East,45.632,-73.5075
8,H2B,Ahuntsic,45.5741,-73.6507
9,H3B,Downtown Montreal,45.5005,-73.5684


#### Use the geopy package to get the longitude and latitude of the city of Montreal.

In [42]:
address = 'Montreal, CA'

geolocator = Nominatim(user_agent="mtl_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Montreal are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Montreal are 45.4972159, -73.6103642.


Location(Montréal, Agglomération de Montréal, Montréal (06), Québec, Canada, (45.4972159, -73.6103642, 0.0))

#### Create a map of Montreal with neighborhoods superimposed on top.
The neighborhoods are plotted on a map of Montreal using their latitude and longitude values.

In [13]:
# create map of Montreal using latitude and longitude values
map_mtl = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map; Lat_Long_table holds the geocoded neighborhoods
# (it has no borough column, so the popup shows only the neighborhood name)
located = Lat_Long_table.dropna(subset=['Latitude', 'Longitude'])
for lat, lng, neighborhood in zip(located['Latitude'], located['Longitude'], located['Neighborhood']):
    label = folium.Popup(neighborhood, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_mtl)  
    
map_mtl

#### Define Foursquare Credentials and Version
The Foursquare API is now used to explore and segment the neighborhoods.

In [14]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET
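
API keys written directly into a notebook are visible to anyone who reads it. One common alternative, sketched here under assumptions, is to read them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are hypothetical choices for illustration, not something the Foursquare API prescribes.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the credentials from a mapping of environment variables.

    The variable names used here are hypothetical.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(
            "Missing environment variable: {}".format(missing)) from None


# Demonstration with a stand-in mapping instead of the real environment.
demo_env = {
    "FOURSQUARE_CLIENT_ID": "demo-id",
    "FOURSQUARE_CLIENT_SECRET": "demo-secret",
}
client_id, client_secret = load_foursquare_credentials(demo_env)

# Report only that the values were found, never the values themselves.
print("Credentials loaded:", bool(client_id and client_secret))
```

In the notebook itself the function would be called without arguments, so the values come from the real environment and never appear in the saved output.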


#### Explore the third neighborhood in our dataframe: Downtown Montreal

In [None]:
neighborhood_latitude = df_mtl_LongLat.loc[2, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df_mtl_LongLat.loc[2, 'Longitude'] # neighborhood longitude value

neighborhood_name = df_mtl_LongLat.loc[2, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

First, a request URL is created. The exploration is limited to the top 100 venues within a radius of 500 meters.

In [16]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

NameError: name 'neighborhood_latitude' is not defined
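
The request URL above is assembled with string formatting. As a minimal sketch, the same query string can be produced with `urllib.parse.urlencode`, which also escapes any special characters in the values. The helper name `build_explore_url` and the `DEMO_ID` / `DEMO_SECRET` values are placeholders introduced for illustration; the endpoint and parameter names are the ones used in the cell above, and the coordinates are those listed for Downtown Montreal.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Assemble the venues/explore URL with encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in the ll parameter readable.
    return ("https://api.foursquare.com/v2/venues/explore?"
            + urlencode(params, safe=","))


demo_url = build_explore_url("DEMO_ID", "DEMO_SECRET", "20180605",
                             45.504, -73.5747)
print(demo_url)
```

When the `requests` library is used, passing the same dictionary through the `params` argument of `requests.get` achieves a similar result without building the string by hand.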

Send the GET request

In [None]:
results = requests.get(url).json()

From the Foursquare lab in the previous module, we know that all the information is in the items key. Before we proceed, let's borrow the get_category_type function from the Foursquare lab

In [17]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a _pandas_ dataframe.

In [18]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

NameError: name 'results' is not defined

And how many venues were returned by Foursquare?

In [None]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))
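
Beyond the total count, it can help to see which venue categories dominate a neighborhood. The sketch below uses `collections.Counter` on a small list of (name, category) pairs. The entries in `sample_venues` are made up for illustration and are not results returned by Foursquare.

```python
from collections import Counter

# Made-up (venue name, category) pairs standing in for real API results.
sample_venues = [
    ("Café A", "Café"),
    ("Café B", "Café"),
    ("Hotel C", "Hotel"),
    ("Bistro D", "French Restaurant"),
]

# Count how often each category appears.
category_counts = Counter(category for _, category in sample_venues)

print('{} venues in the sample.'.format(len(sample_venues)))
print(category_counts.most_common(1))
```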

### 4.2 Explore Neighborhoods in Montreal

Let's create a function to repeat the same process for all the neighborhoods in Montreal.

In [None]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
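
The function above assumes that every response contains `groups` and that every venue has at least one category. A more tolerant variant might look like the sketch below, which separates fetching from row building so that it can be tried offline. `collect_venue_rows`, `fetch_items` and `fake_fetch` are hypothetical names, and the sample items are made up to mimic the structure accessed in the function above.

```python
def collect_venue_rows(places, fetch_items):
    """Build flat venue rows for several neighborhoods.

    places      -- iterable of (name, lat, lng) tuples
    fetch_items -- hypothetical callable (lat, lng) -> list of items shaped
                   like response['groups'][0]['items']
    """
    rows, skipped = [], []
    for name, lat, lng in places:
        try:
            items = fetch_items(lat, lng)
        except (KeyError, IndexError, ValueError):
            # The response lacked the expected structure; remember the
            # neighborhood and move on instead of aborting the whole run.
            skipped.append(name)
            continue
        for item in items:
            venue = item["venue"]
            categories = venue.get("categories") or []
            rows.append((
                name, lat, lng,
                venue["name"],
                venue["location"]["lat"],
                venue["location"]["lng"],
                categories[0]["name"] if categories else None,
            ))
    return rows, skipped


def fake_fetch(lat, lng):
    """Stand-in for the API call, returning made-up items."""
    if lat == 0.0:
        raise KeyError("groups")  # mimics a payload without 'groups'
    return [
        {"venue": {"name": "Sample Café",
                   "location": {"lat": lat, "lng": lng},
                   "categories": [{"name": "Café"}]}},
        {"venue": {"name": "Unlabelled Spot",
                   "location": {"lat": lat, "lng": lng},
                   "categories": []}},
    ]


rows, skipped = collect_venue_rows(
    [("Downtown Montreal", 45.504, -73.5747), ("Nowhere", 0.0, 0.0)],
    fake_fetch)
print(len(rows), skipped)
```

The returned rows have the same seven fields as the columns assigned in the function above, so they could be passed to `pd.DataFrame` in the same way.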

Now, the above function is run on each neighborhood to create a new dataframe called montreal_venues.

In [21]:
address = 'H7J'

#geolocator = Nominatim(user_agent="mtl_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of {} are {}, {}.'.format(address, latitude, longitude))

AttributeError: 'NoneType' object has no attribute 'latitude'
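
The AttributeError above arises because `geolocator.geocode` returns `None` when it finds no match, so reading `.latitude` from the result fails. A small guard avoids this, as in the sketch below. `safe_coordinates` is a hypothetical helper, and `fake_geocode` is a stand-in that knows a single entry (with the coordinates listed earlier for Pointe-aux-Trembles) so the example runs without network access.

```python
def safe_coordinates(geocode, query):
    """Return (latitude, longitude) for query, or None when nothing is found.

    geocode -- any callable shaped like geolocator.geocode
    """
    location = geocode(query)
    if location is None:
        return None
    return location.latitude, location.longitude


class FakeLocation:
    """Minimal stand-in for a geopy Location object."""

    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude


def fake_geocode(query):
    """Stand-in geocoder for illustration; it knows one entry only."""
    known = {"Pointe-aux-Trembles Montréal": FakeLocation(45.6753, -73.5016)}
    return known.get(query)


print(safe_coordinates(fake_geocode, "H7J"))
print(safe_coordinates(fake_geocode, "Pointe-aux-Trembles Montréal"))
```

In the notebook, `geolocator.geocode` would be passed in place of `fake_geocode`, and a `None` result could then be logged or skipped rather than stopping the cell.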

In [10]:
location = geolocator.geocode("Pointe-aux-Trembles Montréal")

print((location))

Pointe-aux-Trembles, Rivière-des-Prairies–Pointe-aux-Trembles, Montréal, Agglomération de Montréal, Montréal (06), Québec, Canada
