<h1> Capstone Project - The Battle Of Neighbourhoods - Live in Montreal </h1>

<h2> Introduction/Business Problem </h2>

Montreal, the second most populous city in Canada, has a metropolitan population of more than 4 million that grows slowly, by 0.73% a year on average. It is also known as Canada's second-largest economy and is home to a wide variety of businesses. As a hub of opportunities, it has drawn many big tech companies that have started considering the city for new offices: Google, Facebook and Microsoft, to name a few.

When an individual has to relocate for a job opportunity, which locations should we suggest to them? The purpose of this report is to identify the best options through data-driven research. We will identify amenities and venues based on their ratings and, from these, offer options that match the relocator's preferences.

This project mostly targets individuals who are not familiar with the city and are searching for a convenient borough to live in. It will also suggest options that fit each individual's interests. For example, someone relocating who likes parks would probably want to live close to that type of venue.
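One way to turn "options based on the relocator's preferences" into a number is to score each borough by the share of its venues that fall in the categories the person cares about. This is a minimal sketch under stated assumptions: it expects per-borough category shares like the ones computed later in this notebook, weights every preference equally, and uses a hypothetical helper `rank_boroughs`; it is an illustration, not the report's final method. The sample shares are copied from the one-hot frequency table further down, and calling it with `["Park"]` would serve the park lover in the example above.

```python
def rank_boroughs(frequencies, preferences):
    """Rank boroughs by the summed share of venues in the preferred categories.

    frequencies: {borough: {category: share of that borough's venues}}
    preferences: venue categories the relocator cares about
    Equal weighting of every preference is an assumption of this sketch.
    """
    scores = {
        borough: sum(cats.get(category, 0.0) for category in preferences)
        for borough, cats in frequencies.items()
    }
    # Highest score first; ties broken alphabetically so the order is stable
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Shares copied from the one-hot frequency table of this notebook
frequencies = {
    "Ahuntsic-Cartierville": {"Vietnamese Restaurant": 0.023529, "Yoga Studio": 0.0},
    "Cote-des-Neiges-Notre-Dame-de-Grace": {"Vietnamese Restaurant": 0.06, "Yoga Studio": 0.01},
    "Kirkland": {"Vietnamese Restaurant": 0.016949, "Yoga Studio": 0.0},
}

for borough, score in rank_boroughs(frequencies, ["Vietnamese Restaurant", "Yoga Studio"]):
    print(f"{borough}: {score:.3f}")
```

Weights per category (for example, parks counting twice as much as restaurants) would be a natural extension if a relocator's priorities are uneven.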

<h2> Data Description </h2>

Data:<br>

- Montreal borough names with their coordinates (latitude and longitude).
    - Data pulled from Wikipedia with the BeautifulSoup library. Alternatively, we could manually put the data in a CSV file.
    - Will be used with Foursquare API data to determine the best venues in each borough.
    - We will use Folium to visualize the different boroughs within Montreal.

- Top 10 venues based on ratings, including their type (e.g. restaurants, bars, malls, parks) and their location (latitude and longitude).
    - For each of the 19 neighborhoods of Montreal.
    - Clustering with the k-means algorithm to determine more precisely where good venues and amenities are.
    - Data will be visualized on a Folium-generated map.
    - Will use the pandas library to analyze and organize the data.

How:

Several platforms and techniques are used in this report.
- Python as the programming language. Like R, it is extensively used in data analytics, and it offers a wide diversity of libraries.
- Geocoders to convert addresses into coordinates.
- Pandas for dataframe manipulation.
- Folium for map visualization of our points of interest (neighborhoods and venues).
- Foursquare, which offers an API giving access to a wide range of location-related data.
- K-means as the clustering algorithm, to define ideal locations.
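The k-means step listed above is not implemented yet in this section. To show what it does with borough vectors, here is a hedged, pure-Python sketch of Lloyd's algorithm; in practice the notebook imports scikit-learn's `KMeans`, which would be used as `KMeans(n_clusters=k, random_state=0).fit(X).labels_`. The helper `kmeans`, the deterministic initialisation from the first k points and the toy two-category vectors are assumptions made for illustration only.

```python
import math


def kmeans(points, k, iterations=100):
    """Cluster points (tuples of floats) into k groups with Lloyd's algorithm.

    Seeding with the first k points is a simplification; scikit-learn's KMeans
    uses k-means++ seeding and several restarts instead.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: the assignments can no longer change
        centroids = new_centroids
    return labels, centroids


# Toy vectors: (share of parks, share of restaurants) for four hypothetical boroughs
points = [(0.30, 0.05), (0.28, 0.07), (0.02, 0.40), (0.05, 0.35)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

With real data, each point would be one row of the per-borough frequency table, so boroughs with a similar venue mix end up in the same cluster.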


In [94]:
%pip install -q geocoder geopy folium beautifulsoup4 pandas lxml html5lib scikit-learn matplotlib OSMPythonTools
from bs4 import BeautifulSoup
import pandas as pd
from geopy.geocoders import Nominatim
import geocoder
import numpy as np
import requests
import branca.colormap as bcm  # renamed so matplotlib.cm below does not shadow it
from io import StringIO
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
from folium import plugins
from folium.plugins import HeatMap

Note: you may need to restart the kernel to use updated packages.


In [3]:
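# @hidden_cell
# Credentials are hidden: replace these placeholders with your own keys
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID'
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET'
MyGoogleAPIKey = 'YOUR_GOOGLE_MAPS_API_KEY'
VERSION = '20180605'  # Foursquare API version date (YYYYMMDD)
radius = 500          # not passed to getNearbyVenues, which defaults to 2000 m
LIMIT = 100           # maximum number of venues returned per request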

In [4]:
import requests

def get_coordinates(api_key, address, verbose=False):
    """Geocode an address with the Google Maps Geocoding API; return [lat, lon] or [None, None]."""
    url = 'https://maps.googleapis.com/maps/api/geocode/json'
    try:
        # Passing the query as params lets requests URL-encode the address
        response = requests.get(url, params={'key': api_key, 'address': address}).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        # Coordinates of the first (best) match
        location = response['results'][0]['geometry']['location']
        return [location['lat'], location['lng']]
    except (requests.RequestException, ValueError, KeyError, IndexError):
        # Network error, invalid JSON or no result: signal failure with None values
        return [None, None]

Montreal = get_coordinates(MyGoogleAPIKey, "Montreal")
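`get_coordinates` expects the Geocoding API to nest coordinates under `results[0].geometry.location`. The sketch below makes that shape explicit by parsing a hand-written response offline; the helper `parse_geocode_response` is hypothetical, the `status` check is an extra safeguard not present in the function above, and the sample coordinates are the Ville-Marie values from the borough table below.

```python
def parse_geocode_response(response):
    """Extract [lat, lng] from a Google Geocoding API JSON response.

    Returns [None, None] when the status is not "OK" or there are no results,
    mirroring the fallback of get_coordinates.
    """
    if response.get("status") != "OK" or not response.get("results"):
        return [None, None]
    location = response["results"][0]["geometry"]["location"]
    return [location["lat"], location["lng"]]


# Hand-written response in the API's shape (Ville-Marie coordinates from this notebook)
sample = {
    "status": "OK",
    "results": [{"geometry": {"location": {"lat": 45.508794, "lng": -73.555302}}}],
}
print(parse_geocode_response(sample))
print(parse_geocode_response({"status": "ZERO_RESULTS", "results": []}))
```

Checking `status` first gives a clearer failure path than relying on an `IndexError` when the address cannot be found.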

In [5]:
geoDF = pd.read_csv('MontrealBoroughs.csv')
# Initialise float columns, then fill them with geocoded coordinates
geoDF['Lat'] = 0.0
geoDF['Long'] = 0.0
for i, borough in enumerate(geoDF['Boroughs']):
    # Appending " Montreal" helps the geocoder pick the right place
    lat, lon = get_coordinates(MyGoogleAPIKey, borough + " Montreal")
    geoDF.at[i, 'Lat'] = lat
    geoDF.at[i, 'Long'] = lon
geoDF

|   | Boroughs | Type | Lat | Long |
|---|---|---|---|---|
| 0 | Pierrefonds-Roxboro | B | 45.50963 | -73.819152 |
| 1 | Cote-des-Neiges-Notre-Dame-de-Grace | B | 45.491151 | -73.632653 |
| 2 | Mercier-Hochelaga-Maisonneuve | B | 45.572978 | -73.530795 |
| 3 | Verdun | B | 45.454827 | -73.569873 |
| 4 | LaSalle | B | 45.430627 | -73.634801 |
| 5 | Ville-Marie | B | 45.508794 | -73.555302 |
| 6 | Lachine | B | 45.441347 | -73.688585 |
| 7 | Saint-Laurent | B | 45.498564 | -73.749757 |
| 8 | Saint-Leonard | B | 45.587473 | -73.59701 |
| 9 | Southwest | B | 45.466107 | -73.593866 |


In [6]:
borough_map = folium.Map(location=Montreal, zoom_start=11)
plugins.ScrollZoomToggler().add_to(borough_map)
for borough, Type, lat, long in zip(geoDF['Boroughs'], geoDF['Type'], geoDF['Lat'], geoDF['Long']):
    label = folium.Popup(borough, parse_html=True)
    # Boroughs (Type 'B') are outlined in blue, other entries in green
    color = 'blue' if Type == 'B' else 'green'
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=label,
        color=color,
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(borough_map)
borough_map

In [7]:
def getNearbyVenues(names, latitudes, longitudes, radius=2000):
    """Return a DataFrame with one row per Foursquare venue found within radius metres of each point."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # Build the API request parameters (credentials, VERSION and LIMIT come from the hidden cell)
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # Make the GET request and keep the recommended venue items
        response = requests.get('https://api.foursquare.com/v2/venues/explore', params=params).json()
        results = response['response']['groups'][0]['items']

        # Keep only the relevant fields; skip venues without a category,
        # whose categories[0] lookup would raise an IndexError
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results if v['venue'].get('categories')])

    # Flatten the per-neighborhood lists into a single table
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return nearby_venues
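`getNearbyVenues` relies on the v2 `explore` response listing venues under `response.groups[0].items`, each wrapping a `venue` object. The sketch below flattens a hand-written payload of that shape into the same seven-field rows, skipping uncategorised venues. The helper `flatten_explore` is hypothetical, and the venue names and coordinates in the sample payload are made up for illustration.

```python
def flatten_explore(name, lat, lng, payload):
    """Turn one Foursquare v2 'explore' JSON payload into venue rows.

    Row layout matches getNearbyVenues: (Neighborhood, Neighborhood Latitude,
    Neighborhood Longitude, Venue, Venue Latitude, Venue Longitude, Venue Category).
    """
    items = payload["response"]["groups"][0]["items"]
    rows = []
    for item in items:
        venue = item["venue"]
        if not venue.get("categories"):
            continue  # no category: categories[0] would raise IndexError
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            venue["categories"][0]["name"],
        ))
    return rows


# Hypothetical payload in the v2 'explore' shape (names and coordinates are made up)
payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Sample Park", "location": {"lat": 45.51, "lng": -73.56},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Unlabelled Spot", "location": {"lat": 45.50, "lng": -73.55},
               "categories": []}},
]}]}}
rows = flatten_explore("Ville-Marie", 45.508794, -73.555302, payload)
print(rows)
```

Separating the parsing from the HTTP call in this way also makes the flattening testable without Foursquare credentials.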

In [8]:
mtlvenues = getNearbyVenues(names=geoDF['Boroughs'],
                            latitudes=geoDF['Lat'],
                            longitudes=geoDF['Long'])
mtlvenues.count()
# mtlvenues[mtlvenues['Neighborhood'].isin(['Pierrefonds-Roxboro'])].count()

Neighborhood              1757
Neighborhood Latitude     1757
Neighborhood Longitude    1757
Venue                     1757
Venue Latitude            1757
Venue Longitude           1757
Venue Category            1757
dtype: int64

In [9]:
venue_map = folium.Map(location=Montreal, zoom_start=11)
plugins.ScrollZoomToggler().add_to(venue_map)
# One red marker per venue returned by Foursquare, labelled with the venue name
for venue, lat, long in zip(mtlvenues['Venue'], mtlvenues['Venue Latitude'], mtlvenues['Venue Longitude']):
    label = folium.Popup(venue, parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(venue_map)
venue_map

In [10]:
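# One-hot encode venue categories: one column per category, 1 where the venue belongs to it
montrealone = pd.get_dummies(mtlvenues[['Venue Category']], prefix="", prefix_sep="")
montrealone['Neighborhood'] = mtlvenues['Neighborhood']
# Move the Neighborhood column to the front
montrealone = montrealone[[montrealone.columns[-1]] + list(montrealone.columns[:-1])]
# Average per neighborhood: each value becomes the share of its venues in that category
montreal_grouped = montrealone.groupby('Neighborhood').mean().reset_index()
montreal_grouped.head(100)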

|   | Neighborhood | ATM | Airport | Airport Lounge | Airport Service | Airport Terminal | American Restaurant | Arepa Restaurant | Art Gallery | Art Museum | ... | Train Station | Transportation Service | Vegetarian / Vegan Restaurant | Video Store | Vietnamese Restaurant | Warehouse Store | Wine Bar | Women's Store | Yoga Studio | Zoo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Ahuntsic-Cartierville | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.011765 | 0.023529 | 0.011765 | 0.011765 | 0.011765 | 0.0 | 0.0 |
| 1 | Anjou | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.015625 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Baie-d'Urfe | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.142857 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.142857 |
| 3 | Beaconsfield | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.034483 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Cote Saint-Luc | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5 | Cote-des-Neiges-Notre-Dame-de-Grace | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | ... | 0.0 | 0.0 | 0.01 | 0.0 | 0.06 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 |
| 6 | Dorval | 0.0 | 0.024096 | 0.048193 | 0.048193 | 0.0 | 0.012048 | 0.0 | 0.0 | 0.0 | ... | 0.012048 | 0.012048 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 | Kirkland | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.016949 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 8 | L'ile-Bizard-Sainte-Genevieve | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9 | LaSalle | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.014925 | 0.0 | 0.0 | 0.0 | ... | 0.014925 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
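Each cell in the table above is the share of a borough's returned venues that fall in that category: averaging the one-hot columns per borough means a category seen once among n venues scores 1/n. For instance, Ahuntsic-Cartierville's 0.011765 matches 1/85 and its 0.023529 matches 2/85, which suggests that 85 venues were returned there (an inference from the numbers, not a count reported by the notebook). A minimal pure-Python sketch of the same computation with `collections.Counter`, where `category_frequencies` is a hypothetical helper and the four-venue borough is made up:

```python
from collections import Counter


def category_frequencies(categories):
    """Share of venues per category, i.e. the mean of the one-hot columns."""
    counts = Counter(categories)
    total = len(categories)
    return {category: count / total for category, count in counts.items()}


# Hypothetical borough with 4 venues
freqs = category_frequencies(["Park", "Coffee Shop", "Park", "Pharmacy"])
print(freqs)

# A single venue among 85, and two among 85, rounded like the table above
print(round(1 / 85, 6), round(2 / 85, 6))
```

Because every row sums to 1, boroughs with very different venue counts can still be compared directly, which is what the clustering step needs.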


In [11]:
num_top_venues = 5
indicators = ['st', 'nd', 'rd']

def fTopVenues(row, num_top_venues):
    """Return the names of the num_top_venues most frequent categories in a row."""
    row_categories = row.iloc[1:]  # skip the Neighborhood column
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Column names: 1st, 2nd, 3rd, then 4th, 5th, ...
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = montreal_grouped['Neighborhood']

# Fill each row with that neighborhood's top categories
for ind in np.arange(montreal_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = fTopVenues(montreal_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 0 | Ahuntsic-Cartierville | Pharmacy | Grocery Store | Breakfast Spot | Italian Restaurant | Park |
| 1 | Anjou | Coffee Shop | Restaurant | Pizza Place | Furniture / Home Store | Clothing Store |
| 2 | Baie-d'Urfe | Zoo | Hotel | Sandwich Place | Moving Target | Grocery Store |
| 3 | Beaconsfield | Soccer Field | Bank | Pharmacy | Pizza Place | Pub |
| 4 | Cote Saint-Luc | Bank | Discount Store | Grocery Store | Park | Shopping Mall |
| 5 | Cote-des-Neiges-Notre-Dame-de-Grace | Vietnamese Restaurant | Coffee Shop | Chinese Restaurant | Park | Fast Food Restaurant |
| 6 | Dorval | Coffee Shop | Hotel | Park | Airport Lounge | Airport Service |
| 7 | Kirkland | Fast Food Restaurant | Coffee Shop | Pharmacy | Italian Restaurant | Pizza Place |
| 8 | L'ile-Bizard-Sainte-Genevieve | Convenience Store | Golf Course | Athletics & Sports | Pharmacy | Park |
| 9 | LaSalle | Fast Food Restaurant | Pizza Place | Grocery Store | Pharmacy | Coffee Shop |
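Many categories share identical frequencies (often 0.0, or 1/n for single venues), so the order among tied categories in the table above depends on how `sort_values` breaks ties: its default `kind='quicksort'` is not stable. A hedged alternative, assuming alphabetical tie-breaking is acceptable, is to sort on (descending frequency, category name); `top_venues` below is a hypothetical helper and the frequencies are made up to contain ties.

```python
def top_venues(frequencies, n=5):
    """Return the n most common categories, ties broken alphabetically."""
    ranked = sorted(frequencies.items(), key=lambda item: (-item[1], item[0]))
    return [category for category, _ in ranked[:n]]


# Hypothetical frequencies with several ties at 0.1
freqs = {"Pub": 0.1, "Bank": 0.2, "Park": 0.1, "Bakery": 0.1, "Zoo": 0.0}
print(top_venues(freqs, n=3))
```

Inside `fTopVenues`, passing `kind='mergesort'` to `sort_values` would be a lighter change: it is stable, so tied categories keep their original (alphabetical) column order.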


In [105]:
url = 'http://donnees.ville.montreal.qc.ca/dataset/5829b5b0-ea6f-476f-be94-bc2b8797769a/resource/c6f482bf-bf0f-4960-8b2f-9982c211addd/download/interventionscitoyendo.csv'
# Download the crime-incident CSV from Montreal's open-data portal
r = requests.get(url)
montrealcrime = pd.read_csv(StringIO(r.text))
montrealcrime['DATE'] = pd.to_datetime(montrealcrime['DATE'])
montrealcrime = montrealcrime.sort_values('DATE', ascending=False)
# Keep incidents dated after 1 June 2019 and before 1 January 2020
montrealcrime = montrealcrime[(montrealcrime['DATE'] > '2019-06-01') & (montrealcrime['DATE'] < '2020-01-01')]
montrealcrime = montrealcrime.rename(columns={'LONGITUDE': 'long', 'LATITUDE': 'lat'})
# Keep only the category and coordinates needed for mapping
montrealcrime = montrealcrime.drop(['QUART', 'PDQ', 'X', 'Y', 'DATE'], axis=1)
montrealcrime.head()
#montrealcrime.shape

|   | CATEGORIE | long | lat |
|---|---|---|---|
| 135995 | Méfait | -73.568503 | 45.54552 |
| 149838 | Vols qualifiés | 1.0 | 1.0 |
| 142192 | Vol dans / sur véhicule à moteur | -73.539019 | 45.589891 |
| 136763 | Méfait | -73.533879 | 45.602308 |
| 141840 | Introduction | -73.620379 | 45.535574 |
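The second row above sits at (1.0, 1.0), far from Montreal, which looks like a placeholder for incidents without a location. A stricter check than excluding that single value is a bounding box around the island. This is a minimal sketch: the box limits are rough assumptions chosen to enclose the Island of Montreal, not official boundaries, and `in_montreal` is a hypothetical helper.

```python
# Rough bounding box around the Island of Montreal (an assumption, not an official boundary)
LAT_RANGE = (45.3, 45.8)
LONG_RANGE = (-74.1, -73.4)


def in_montreal(lat, long):
    """True when a point falls inside the rough Montreal bounding box."""
    return LAT_RANGE[0] <= lat <= LAT_RANGE[1] and LONG_RANGE[0] <= long <= LONG_RANGE[1]


# (category, long, lat) rows copied from the head() output above
rows = [
    ("Méfait", -73.568503, 45.54552),
    ("Vols qualifiés", 1.0, 1.0),
    ("Vol dans / sur véhicule à moteur", -73.539019, 45.589891),
]
kept = [row for row in rows if in_montreal(row[2], row[1])]
print(len(kept), "of", len(rows), "rows kept")
```

On the DataFrame itself, the equivalent filter would be `montrealcrime[montrealcrime['lat'].between(*LAT_RANGE) & montrealcrime['long'].between(*LONG_RANGE)]`.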


In [102]:
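crime_map = folium.Map(location=Montreal, zoom_start=11)
plugins.ScrollZoomToggler().add_to(crime_map)

# Drop placeholder (1.0, 1.0) coordinates and missing values before plotting
located = montrealcrime[(montrealcrime['lat'] != 1.0) & (montrealcrime['long'] != 1.0)].dropna(subset=['lat', 'long'])
heat_data = located[['lat', 'long']].values.tolist()
HeatMap(heat_data, radius=12).add_to(crime_map)
crime_map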

In [114]:
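from geopy.geocoders import Nominatim

# Nominatim's usage policy asks for a descriptive user_agent string
geolocator = Nominatim(user_agent="montreal-neighbourhoods-capstone")

print(geolocator.reverse("45.545520,-73.568503"))
# Print only the 4th comma-separated component of the address
print(str(geolocator.reverse("45.589891,-73.539019")).split(",")[3])

def fGeoToAddr(lat, long):
    """Reverse-geocode a latitude/longitude pair into an address."""
    return geolocator.reverse((lat, long))

print(fGeoToAddr(45.545520, -73.568503))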

2840, Rue Gilford, Petite-Côte, Rosemont–La Petite-Patrie, Montréal, Agglomération de Montréal, Montréal (06), Québec, H1Y 1Z6, Canada
 Montréal
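Taking the fourth comma-separated field is fragile: in the first address above it holds the borough (Rosemont–La Petite-Patrie), while in the second it holds the city. The sketch below instead looks for a known borough name among all address components. It assumes a borough list spelled like `MontrealBoroughs.csv` (no accents, plain hyphens), so both sides are normalised first, because Nominatim returns accented names and en dashes. `normalize`, `find_borough` and the three-name list are hypothetical.

```python
import unicodedata


def normalize(text):
    """Lower-case, strip accents and turn en dashes into hyphens for matching."""
    text = text.replace("\u2013", "-")  # en dash used by Nominatim in some names
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).strip().lower()


def find_borough(address, boroughs):
    """Return the first borough whose normalised name equals an address component."""
    components = {normalize(part) for part in address.split(",")}
    for borough in boroughs:
        if normalize(borough) in components:
            return borough
    return None


# First address printed above (\u2013 is the en dash in the borough name)
address = ("2840, Rue Gilford, Petite-Côte, Rosemont\u2013La Petite-Patrie, Montréal, "
           "Agglomération de Montréal, Montréal (06), Québec, H1Y 1Z6, Canada")
# Hypothetical borough list spelled like the CSV
boroughs = ["Ville-Marie", "Rosemont-La Petite-Patrie", "Verdun"]
print(find_borough(address, boroughs))
```

Returning `None` when no component matches makes it easy to flag incidents that fall outside the listed boroughs instead of silently assigning them to the city name.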
