<h1> Capstone Project - The Battle of Neighbourhoods - Live in Montreal </h1>

<h2> Introduction/Business Problem </h2>

Montreal, the second most populous city in Canada, has a population of more than 4 million and grows slowly, at an average of 0.73% per year. It is also Canada's second-largest economy and hosts a wide variety of businesses. Drawn by these opportunities, several big tech companies, including Google, Facebook, and Microsoft, have started opening offices in the city.

If someone has to relocate to Montreal for a job, which areas should we recommend? This report aims to identify the best options through data-driven research. We will identify amenities and venues by their ratings and use them to suggest options that match the relocating person's preferences.

This project is aimed mainly at people who are not familiar with the city and are looking for a convenient borough to live in. It also offers options that match their interests. For example, we expect someone who likes parks to want to live close to one.

<h2> Data Description </h2>

Data:<br>

- Names of Montreal's boroughs with their coordinates (latitude and longitude).
    - Data pulled from Wikipedia with the BeautifulSoup library. Alternatively, the data could be entered manually in a CSV file.
    - Used together with Foursquare API data to identify the best venues in each borough.
    - Folium will be used to visualize the boroughs within Montreal.

- Top 10 venues by rating, including their type (e.g., restaurants, bars, malls, parks) and their location (latitude and longitude).
    - For each of the 19 neighborhoods of Montreal.
    - Clustering with the K-Means algorithm to locate good venues and amenities more precisely.
    - Data will be visualized on a Folium-generated map.
    - The pandas library will be used to analyze and organize the data.
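
The borough-scraping step could look roughly like the sketch below. It parses a small inline HTML table rather than the live Wikipedia page, so the markup, the `wikitable` class, and the column order are assumptions made for illustration; the real page layout may differ. `parse_borough_names` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Wikipedia table; the real markup may differ.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Borough</th><th>Type</th></tr>
  <tr><td>Verdun</td><td>B</td></tr>
  <tr><td>LaSalle</td><td>B</td></tr>
</table>
"""

def parse_borough_names(html):
    """Return the first-column text of every data row in the first wikitable."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    if table is None:
        return []
    names = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if cells:  # header rows contain only <th>, so they are skipped
            names.append(cells[0].get_text(strip=True))
    return names

borough_names = parse_borough_names(SAMPLE_HTML)
print(borough_names)
```

The resulting list could then be written to a CSV file or passed straight to the geocoding step.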

How:

Several platforms and techniques will be used in this report.
- Python as the programming language. Like R, it is used extensively in data analytics, and it offers a wide range of libraries.
- Geocoders to convert addresses into coordinates.
- Pandas for dataframe manipulation.
- Folium for map visualization of our points of interest (neighborhoods and venues).
- Foursquare, whose API gives access to a wide range of location data.
- K-Means as the clustering algorithm used to identify ideal locations.
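
To make the clustering step concrete, here is a minimal sketch of K-Means applied to borough-level venue-category frequencies. The feature matrix is made-up toy data (rows are hypothetical boroughs, columns are hypothetical category shares), and `n_clusters=2` is an arbitrary choice for the illustration, not a value taken from this analysis.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data (assumption): each row is a borough, each column the share of
# venues in one category, e.g. [parks, restaurants, bars].
features = np.array([
    [0.70, 0.20, 0.10],  # park-heavy
    [0.65, 0.25, 0.10],  # park-heavy
    [0.10, 0.50, 0.40],  # nightlife-heavy
    [0.05, 0.55, 0.40],  # nightlife-heavy
])

# n_clusters=2 is arbitrary here; random_state makes the run repeatable.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
labels = kmeans.labels_
print(labels)
```

Boroughs that share a label have a similar mix of venues, which is what lets a recommendation be matched to a person's preferences. The cluster numbers themselves carry no meaning beyond group membership.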


In [1]:
# scikit-learn is the PyPI name of the package imported as sklearn
%pip install -q geocoder geopy folium beautifulsoup4 pandas numpy requests lxml html5lib scikit-learn matplotlib OSMPythonTools
from bs4 import BeautifulSoup
import pandas as pd
from geopy.geocoders import Nominatim
import geocoder
import numpy as np
import requests
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
from folium import plugins

Note: you may need to restart the kernel to use updated packages.


In [2]:
# @hidden_cell
VERSION = '20180605'
radius = 500
LIMIT = 100

In [3]:
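import requests

def get_coordinates(api_key, address, verbose=False):
    """Geocode an address with the Google Maps Geocoding API; return [lat, lon] or [None, None]."""
    url = 'https://maps.googleapis.com/maps/api/geocode/json'
    try:
        # passing params lets requests URL-encode the address
        response = requests.get(url, params={'key': api_key, 'address': address}, timeout=10).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        location = response['results'][0]['geometry']['location']  # geographical coordinates
        return [location['lat'], location['lng']]
    except (requests.RequestException, ValueError, KeyError, IndexError):
        # network failure, non-JSON body, or no match for the address
        return [None, None]

# MyGoogleAPIKey is assumed to be defined in a hidden cell
Montreal = get_coordinates(MyGoogleAPIKey, "Montreal")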
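
To see what a properly encoded geocoding request looks like, the sketch below builds the URL with the standard library only. `build_geocode_url` is a hypothetical helper introduced for illustration, and `"DEMO_KEY"` is a placeholder, not a real credential.

```python
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"

def build_geocode_url(api_key, address):
    """Return the geocoding request URL with safely encoded query parameters."""
    query = urlencode({"key": api_key, "address": address})
    return f"{GEOCODE_ENDPOINT}?{query}"

# "DEMO_KEY" is a placeholder, not a real credential.
url = build_geocode_url("DEMO_KEY", "Saint-Leonard Montreal")
print(url)
```

Encoding matters most for addresses that contain characters such as `&` or `#`, which would otherwise be read as part of the query-string syntax.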

In [4]:
geoDF = pd.read_csv('MontrealBoroughs.csv')
# create float columns up front so the coordinates are stored as floats
geoDF['Lat'] = 0.0
geoDF['Long'] = 0.0
# geocode each borough and store its coordinates
for i, borough in enumerate(geoDF['Boroughs']):
    boroughCoor = get_coordinates(MyGoogleAPIKey, borough + " Montreal")
    geoDF.at[i, 'Lat'] = boroughCoor[0]
    geoDF.at[i, 'Long'] = boroughCoor[1]
geoDF

Unnamed: 0,Boroughs,Type,Lat,Long
0,Pierrefonds-Roxboro,B,45.50963,-73.819152
1,Cote-des-Neiges-Notre-Dame-de-Grace,B,45.491151,-73.632653
2,Mercier-Hochelaga-Maisonneuve,B,45.572978,-73.530795
3,Verdun,B,45.454827,-73.569873
4,LaSalle,B,45.430627,-73.634801
5,Ville-Marie,B,45.508794,-73.555302
6,Lachine,B,45.441347,-73.688585
7,Saint-Laurent,B,45.498564,-73.749757
8,Saint-Leonard,B,45.587473,-73.59701
9,Southwest,B,45.466107,-73.593866


In [5]:
# named borough_map to avoid shadowing the built-in map()
borough_map = folium.Map(location=Montreal, zoom_start=11)
plugins.ScrollZoomToggler().add_to(borough_map)
for borough, Type, lat, long in zip(geoDF['Boroughs'], geoDF['Type'], geoDF['Lat'], geoDF['Long']):
    label = folium.Popup(borough, parse_html=True)
    # boroughs (type 'B') are outlined in blue, any other type in green
    color = 'blue' if Type == 'B' else 'green'
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=label,
        color=color,
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(borough_map)
borough_map

In [6]:
def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    """Return a DataFrame of Foursquare venues found within `radius` meters of each location."""
    # CLIENT_ID, CLIENT_SECRET, VERSION and LIMIT are expected to be defined in earlier cells.
    # Note: the notebook-level `radius` variable is not used here; the default of
    # 1000 applies unless the caller passes radius explicitly.
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-location lists into one table
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                  'Neighborhood Latitude',
                  'Neighborhood Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return nearby_venues
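
The function above indexes `categories[0]` for every venue, which raises an `IndexError` if a venue comes back without a category. The sketch below flattens items of the same shape into rows while tolerating an empty `categories` list. The `sample_items` payload is a hand-written stand-in with made-up venue names and coordinates, and `extract_venue_rows` is a hypothetical helper, not part of the Foursquare API.

```python
def extract_venue_rows(name, lat, lng, items):
    """Flatten explore-style items into one tuple per venue."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # fall back to None when the venue has no category
        category = categories[0]["name"] if categories else None
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows

# Hand-written sample (assumption) mimicking response['groups'][0]['items'].
sample_items = [
    {"venue": {"name": "Example Park",
               "location": {"lat": 45.45, "lng": -73.57},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Example Spot",
               "location": {"lat": 45.46, "lng": -73.58},
               "categories": []}},
]

rows = extract_venue_rows("Verdun", 45.454827, -73.569873, sample_items)
print(rows)
```

Rows with a missing category can then be dropped or labeled before the one-hot encoding step, depending on how they should count toward a borough's profile.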

In [7]:
mtlvenues = getNearbyVenues(names=geoDF['Boroughs'],
                                   latitudes=geoDF['Lat'],
                                   longitudes=geoDF['Long']
                                  )
mtlvenues.count()
#mtlvenues[mtlvenues['Neighborhood'].isin(['Pierrefonds-Roxboro'])].count()

Neighborhood              898
Neighborhood Latitude     898
Neighborhood Longitude    898
Venue                     898
Venue Latitude            898
Venue Longitude           898
Venue Category            898
dtype: int64

In [8]:
# named venue_map to avoid shadowing the built-in map()
venue_map = folium.Map(location=Montreal, zoom_start=11)
plugins.ScrollZoomToggler().add_to(venue_map)
# one red-outlined marker per venue, labeled with the venue name
for venue, lat, long in zip(mtlvenues['Venue'], mtlvenues['Venue Latitude'], mtlvenues['Venue Longitude']):
    label = folium.Popup(venue, parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(venue_map)
venue_map

In [None]:
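# one-hot encode the venue categories (one column per category)
montrealone = pd.get_dummies(mtlvenues[['Venue Category']], prefix="", prefix_sep="")
montrealone['Neighborhood'] = mtlvenues['Neighborhood']
# move the Neighborhood column to the front
montrealone = montrealone[([montrealone.columns[-1]] + list(montrealone.columns[:-1]))]
# the mean of each one-hot column is that category's frequency in the neighborhood
montreal_grouped = montrealone.groupby('Neighborhood').mean().reset_index()
montreal_grouped.head()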
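
A natural follow-up to the grouped frequencies is to list the most common venue categories per neighborhood. The sketch below does this on a tiny hand-made frame; the frequencies are illustrative values, not results from this notebook, and `top_categories` is a hypothetical helper name.

```python
import pandas as pd

# Toy stand-in (assumption) for montreal_grouped: one row per neighborhood,
# one column per venue category holding its mean frequency.
grouped = pd.DataFrame({
    "Neighborhood": ["Verdun", "LaSalle"],
    "Park": [0.5, 0.1],
    "Bar": [0.3, 0.2],
    "Restaurant": [0.2, 0.7],
})

def top_categories(row, n=2):
    """Return the n category names with the highest frequency in this row."""
    freqs = row.drop("Neighborhood").astype(float)
    return freqs.sort_values(ascending=False).index[:n].tolist()

top = {row["Neighborhood"]: top_categories(row) for _, row in grouped.iterrows()}
print(top)
```

With the real data, `n=10` would give the top 10 categories per neighborhood described in the Data section.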