<h1> Capstone Project - The Battle Of Neighbourhoods - Live in Montreal </h1>

<h2> Introduction/Business Problem </h2>

Montreal, the second most populous city in Canada, has a population of more than 4 million and grows slowly, at an average of 0.73% per year. It is also Canada's second-largest economy, home to a wide variety of businesses. Drawn by these opportunities, several big tech companies have started considering the city for new offices: Google, Facebook, and Microsoft, to name a few.

When someone has to relocate for a job opportunity, which locations should we suggest? The purpose of this report is to identify the best options through data-driven research. We will identify amenities and venues based on their ratings, then offer options based on the relocating person's preferences.

This project mainly targets people who are not familiar with the city and are looking for a convenient borough to live in. It also provides options that fit each person's interests. For example, we expect someone who likes parks to want to live close to that type of venue.

<h2> Data Description </h2>

Data :<br>

- Names of the Montreal boroughs with their coordinates (latitude and longitude).
    - Data pulled from Wikipedia with the BeautifulSoup library. Alternatively, the data could be entered manually in a CSV file.
    - Used together with Foursquare API data to determine the best venues in each borough.
    - Folium is used to visualize the boroughs within Montreal.

- Top 10 venues based on ratings, including their type (e.g. restaurants, bars, malls, parks) and their location (latitude and longitude).
    - Collected for each of the 19 neighborhoods of Montreal.
    - Clustered with the K-Means algorithm to locate more precisely where good venues and amenities are.
    - Visualized on a Folium-generated map.
    - Analyzed and organized with the pandas library.

How : 

Several platforms and techniques are used in this report.
- Python as the programming language. Like R, it is used extensively in data analytics and offers a wide range of libraries.
- Geocoders to convert addresses into coordinates.
- Pandas for dataframe manipulation.
- Folium for map visualization of our points of interest (neighborhoods and venues).
- Foursquare, whose API gives access to a wide range of location-related data.
- K-Means as the clustering algorithm used to identify ideal locations.
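To make the clustering step concrete, here is a minimal pure-Python sketch of what a K-Means pass does: assign each point to its nearest centroid, then move each centroid to the mean of its members. The `kmeans` helper and the borough profile values are hypothetical illustrations, not data from this report; the notebook itself relies on scikit-learn's `KMeans`, which handles initialization and convergence far more robustly.

```python
import math

def kmeans(points, initial_centroids, iterations=10):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = list(initial_centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Hypothetical (park share, restaurant share) profiles for four boroughs.
borough_profiles = [(0.30, 0.05), (0.28, 0.07), (0.02, 0.40), (0.04, 0.35)]
labels, centroids = kmeans(borough_profiles, [borough_profiles[0], borough_profiles[2]])
print(labels)
```

Boroughs with similar venue mixes end up with the same label, which is the property the report uses to group comparable areas.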


In [1]:
%pip install -q geocoder geopy folium bs4 pandas lxml html5lib scikit-learn matplotlib OSMPythonTools
from bs4 import BeautifulSoup
import pandas as pd
from geopy.geocoders import Nominatim
import geocoder
import numpy as np
import requests
import branca.colormap as branca_cm  # distinct alias: `cm` is bound to matplotlib.cm below
from io import StringIO
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
from folium import plugins
from folium.plugins import HeatMap



In [2]:
# @hidden_cell
VERSION = '20180605'
radius = 500
LIMIT = 100

In [3]:
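import requests

def get_coordinates(api_key, address, verbose=False):
    """Geocode an address with the Google Geocoding API; return [lat, lon] or [None, None]."""
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json'
        # Passing params lets requests URL-encode spaces, accents and apostrophes in the address
        response = requests.get(url, params={'key': api_key, 'address': address}).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        geographical_data = results[0]['geometry']['location'] # get geographical coordinates
        lat = geographical_data['lat']
        lon = geographical_data['lng']
        return [lat, lon]
    except (requests.RequestException, KeyError, IndexError, ValueError):
        # Network failure, malformed JSON, or no match for the address
        return [None, None]

# MyGoogleAPIKey is not defined in the cells shown here; it must be set beforehand
Montreal = get_coordinates(MyGoogleAPIKey, "Montreal")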
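The function above relies on the shape of the Geocoding JSON payload: a `status` field plus a `results` list whose first entry holds `geometry.location`. The sketch below isolates that parsing step so it can be checked without a network call. `extract_lat_lng` is a hypothetical helper, and the two sample payloads are hand-written illustrations rather than real API output.

```python
def extract_lat_lng(response):
    """Return [lat, lng] from a Geocoding-style JSON payload, or [None, None]."""
    # A status other than 'OK' (for example 'ZERO_RESULTS') means there is nothing to read.
    if response.get('status') != 'OK' or not response.get('results'):
        return [None, None]
    location = response['results'][0]['geometry']['location']
    return [location['lat'], location['lng']]

# Hand-written sample payloads (illustrative values only).
found = {'status': 'OK',
         'results': [{'geometry': {'location': {'lat': 45.5, 'lng': -73.57}}}]}
missing = {'status': 'ZERO_RESULTS', 'results': []}

print(extract_lat_lng(found))
print(extract_lat_lng(missing))
```

Checking `status` explicitly makes the "no match" case distinguishable from a network error, which a blanket `except` hides.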

In [6]:
geoDF = pd.read_csv('MontrealBoroughs.csv')
geoDF['Lat'] = 0
geoDF['Lat'] = geoDF['Lat'].astype(float)
geoDF['Long'] = 0
geoDF['Long'] = geoDF['Long'].astype(float)
for i,borough in enumerate(geoDF['Boroughs']):
    boroughCoor = get_coordinates(MyGoogleAPIKey, borough+" Montreal")
    geoDF.at[i,'Lat']= boroughCoor[0]
    geoDF.at[i,'Long']= boroughCoor[1]
geoDF

In [7]:
map = folium.Map(location=Montreal, zoom_start=11)
plugins.ScrollZoomToggler().add_to(map)
for borough,Type,lat,long in zip(geoDF['Boroughs'],geoDF['Type'],geoDF['Lat'],geoDF['Long']):
    label = folium.Popup(borough, parse_html=True)
    # Outline colour from the CSV 'Type' column: blue for 'B', green otherwise
    color = 'blue' if Type == 'B' else 'green'
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=label,
        color=color,
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map) 
map

In [8]:
def getNearbyVenues(names, latitudes, longitudes, radius=2000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
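The list comprehension in `getNearbyVenues` assumes every venue has at least one category; a venue with an empty `categories` list would raise an `IndexError`. The sketch below shows the same flattening step with that case handled. `flatten_items` is a hypothetical helper, and the sample items only mimic the fields the function above reads; they are not real Foursquare output.

```python
def flatten_items(neighborhood, items):
    """Turn explore-style items into (neighborhood, venue, lat, lng, category) rows."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        rows.append((
            neighborhood,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            # Fall back to None when the venue carries no category.
            categories[0]['name'] if categories else None,
        ))
    return rows

# Hand-written sample items (illustrative values only).
sample_items = [
    {'venue': {'name': 'Cafe A', 'location': {'lat': 45.60, 'lng': -73.56},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Spot B', 'location': {'lat': 45.61, 'lng': -73.55},
               'categories': []}},
]
rows = flatten_items('Anjou', sample_items)
print(rows)
```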

In [9]:
mtlvenues = getNearbyVenues(names=geoDF['Boroughs'],
                                   latitudes=geoDF['Lat'],
                                   longitudes=geoDF['Long']
                                  )
mtlvenues.count()
#mtlvenues[montreal_venus['Neighborhood'].isin(['Pierrefonds-Roxboro'])].count()

Neighborhood              1731
Neighborhood Latitude     1731
Neighborhood Longitude    1731
Venue                     1731
Venue Latitude            1731
Venue Longitude           1731
Venue Category            1731
dtype: int64

In [10]:
map = folium.Map(location=Montreal, zoom_start=11)
plugins.ScrollZoomToggler().add_to(map)
for borough,lat,long in zip(mtlvenues['Venue'],mtlvenues['Venue Latitude'],mtlvenues['Venue Longitude']):
    label = folium.Popup(borough, parse_html=True)
    color='red'
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=label,
        color=color,
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map) 
map

In [11]:
montrealone = pd.get_dummies(mtlvenues[['Venue Category']], prefix="", prefix_sep="")
montrealone['Neighborhood'] = mtlvenues['Neighborhood']
montrealone = montrealone[([montrealone.columns[-1]] + list(montrealone.columns[:-1]))]
montreal_grouped = montrealone.groupby('Neighborhood').mean().reset_index()
montreal_grouped.head(100)

Unnamed: 0,Neighborhood,Airport,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,...,Train Station,Transportation Service,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Women's Store,Yoga Studio,Zoo
0,Ahuntsic-Cartierville,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011364,0.0,...,0.0,0.0,0.0,0.0,0.022727,0.011364,0.011364,0.011364,0.0,0.0
1,Anjou,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0
2,Baie-d'Urfe,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667
3,Beaconsfield,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Cote Saint-Luc,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Cote-des-Neiges-Notre-Dame-de-Grace,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,...,0.0,0.0,0.01,0.0,0.06,0.0,0.0,0.0,0.01,0.0
6,Dorval,0.02439,0.04878,0.04878,0.0,0.012195,0.0,0.0,0.0,0.012195,...,0.02439,0.012195,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Kirkland,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0
8,L'ile-Bizard-Sainte-Genevieve,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,LaSalle,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,...,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
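Each value in the table above is the share of a neighborhood's venues that fall in a given category: the mean of a one-hot column is simply count divided by total. A minimal stdlib sketch of the same computation, using a hypothetical `category_frequencies` helper and made-up sample rows:

```python
from collections import Counter, defaultdict

def category_frequencies(rows):
    """Map each neighborhood to {category: share of its venues in that category}."""
    counts = defaultdict(Counter)
    for neighborhood, category in rows:
        counts[neighborhood][category] += 1
    return {
        neighborhood: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for neighborhood, counter in counts.items()
    }

# Made-up (neighborhood, category) pairs for illustration.
sample_rows = [
    ('Anjou', 'Coffee Shop'), ('Anjou', 'Coffee Shop'),
    ('Anjou', 'Pizza Place'), ('Anjou', 'Park'),
    ('Dorval', 'Hotel'),
]
freq = category_frequencies(sample_rows)
print(freq['Anjou'])
```

Unlike the pandas version, this sketch omits categories with a share of zero instead of storing them as `0.0` columns.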


In [12]:
num_top_venues = 5
indicators = ['st', 'nd', 'rd']
def fTopVenues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = montreal_grouped['Neighborhood']

for ind in np.arange(montreal_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = fTopVenues(montreal_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Ahuntsic-Cartierville,Pharmacy,Grocery Store,Breakfast Spot,Italian Restaurant,Liquor Store
1,Anjou,Coffee Shop,Restaurant,Pizza Place,Clothing Store,Fast Food Restaurant
2,Baie-d'Urfe,Zoo,Train Station,Sandwich Place,Hotel,Liquor Store
3,Beaconsfield,Furniture / Home Store,Pharmacy,Pub,Soccer Field,Bank
4,Cote Saint-Luc,Bank,Grocery Store,Discount Store,Park,Pharmacy
5,Cote-des-Neiges-Notre-Dame-de-Grace,Vietnamese Restaurant,Chinese Restaurant,Coffee Shop,Fast Food Restaurant,Bakery
6,Dorval,Coffee Shop,Hotel,Park,Airport Lounge,Airport Service
7,Kirkland,Fast Food Restaurant,Coffee Shop,Pharmacy,Italian Restaurant,Furniture / Home Store
8,L'ile-Bizard-Sainte-Genevieve,Golf Course,Convenience Store,Athletics & Sports,Pharmacy,Park
9,LaSalle,Fast Food Restaurant,Pharmacy,Grocery Store,Coffee Shop,Pizza Place
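The column headers above come from a three-element suffix list with a `th` fallback, which is enough for five columns. If more columns were ever needed, a small helper could generate the ordinals directly. The sketch below pairs such a helper with the ranking step; both `ordinal` and `top_categories` are hypothetical names, and the frequencies are made up.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'

def top_categories(frequencies, n):
    """Return the n category names with the highest frequency, highest first."""
    ranked = sorted(frequencies.items(), key=lambda kv: kv[1], reverse=True)
    return [category for category, _ in ranked[:n]]

headers = [ordinal(i) + ' Most Common Venue' for i in range(1, 6)]
# Made-up frequencies for one neighborhood.
top_two = top_categories({'Pharmacy': 0.2, 'Park': 0.1, 'Bank': 0.3}, 2)
print(headers)
print(top_two)
```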


In [78]:
url = 'http://donnees.ville.montreal.qc.ca/dataset/5829b5b0-ea6f-476f-be94-bc2b8797769a/resource/c6f482bf-bf0f-4960-8b2f-9982c211addd/download/interventionscitoyendo.csv'
r = requests.get(url)
montrealcrime = pd.read_csv(StringIO(r.text))
montrealcrime['DATE'] = pd.to_datetime(montrealcrime['DATE'])
montrealcrime.sort_values('DATE',inplace=True,ascending=False)
# Keep incidents dated strictly between 2019-12-01 and 2020-06-06
montrealcrime = montrealcrime[(montrealcrime['DATE'] > '2019-12-01') & (montrealcrime['DATE'] < '2020-06-06')]
montrealcrime.rename(columns={'LONGITUDE':'long','LATITUDE':'lat'}, inplace=True)
montrealcrime.drop(['QUART', 'PDQ','X','Y','DATE'], axis=1, inplace=True)
# Drop rows whose longitude is exactly 1 (apparently a placeholder for missing coordinates)
montrealcrime = montrealcrime[montrealcrime.long != 1]

In [79]:
montrealcrime.shape

(1719, 3)
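The filtering cell applies two rules: the date must fall strictly inside the window, and rows with a longitude of 1 are discarded. The stdlib sketch below reproduces both rules on a tiny hand-written CSV so the boundary behaviour is easy to see. The column names match those used above; the `%Y-%m-%d` date format, the `filter_incidents` helper, and the sample rows are assumptions for illustration.

```python
import csv
from datetime import datetime
from io import StringIO

SAMPLE_CSV = """CATEGORIE,DATE,LONGITUDE,LATITUDE
Méfait,2019-11-30,-73.6,45.5
Introduction,2020-01-15,-73.58,45.51
Méfait,2020-02-01,1,1
Vols qualifiés,2020-06-06,-73.57,45.47
"""

def filter_incidents(text, start, end):
    """Keep rows with start < DATE < end and a longitude other than 1."""
    kept = []
    for row in csv.DictReader(StringIO(text)):
        date = datetime.strptime(row['DATE'], '%Y-%m-%d')
        longitude = float(row['LONGITUDE'])
        if start < date < end and longitude != 1:
            kept.append((row['CATEGORIE'], longitude, float(row['LATITUDE'])))
    return kept

incidents = filter_incidents(SAMPLE_CSV, datetime(2019, 12, 1), datetime(2020, 6, 6))
print(incidents)
```

Because both comparisons are strict, an incident dated exactly on the end date is excluded, as the last sample row shows.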

In [80]:
# After the descending sort, tail(1000) keeps the 1000 oldest incidents in the window
montrealcrime = montrealcrime.tail(1000)

In [83]:
#montrealcrime.shape
montrealcrime.head()

Unnamed: 0,CATEGORIE,long,lat
139969,Vol de véhicule à moteur,-73.595614,45.579468
139970,Vol de véhicule à moteur,-73.595614,45.579468
139971,Vol de véhicule à moteur,-73.595614,45.579468
139972,Vol de véhicule à moteur,-73.595614,45.579468
139996,Méfait,-73.697992,45.527682


In [84]:
map = folium.Map(location=Montreal, zoom_start=11)
steps = 20
plugins.ScrollZoomToggler().add_to(map)

heat_data = [[row['lat'],row['long']] for index, row in montrealcrime.iterrows()]
HeatMap(heat_data,radius=12).add_to(map)
map
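Many incidents in this dataset share identical coordinates, so the heat map receives the same point several times. Folium's `HeatMap` also accepts points of the form `[lat, lng, weight]`, so one option is to collapse duplicates into a single weighted point. This is a sketch of that aggregation with a hypothetical `weighted_heat_points` helper; whether the rendered map looks identical to the unweighted version has not been checked here.

```python
from collections import Counter

def weighted_heat_points(coords):
    """Collapse repeated (lat, lon) pairs into [lat, lon, count] triples."""
    counts = Counter((round(lat, 6), round(lon, 6)) for lat, lon in coords)
    return [[lat, lon, weight] for (lat, lon), weight in counts.items()]

# Coordinates taken from the first rows of the crime table, repeated as in the data.
coords = [(45.579468, -73.595614)] * 4 + [(45.527682, -73.697992)]
points = weighted_heat_points(coords)
print(points)
```

The resulting list could be passed to `HeatMap` in place of `heat_data`, which mainly reduces the number of points sent to the browser.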

In [85]:
# Create the geocoder once and reuse it; Nominatim expects a user_agent that identifies the application
geolocator = Nominatim(user_agent="https")
def fGeoToAddr(lat,long):
    return geolocator.reverse(str(lat)+","+str(long))
montrealcrime['Address'] = "Test"
montrealcrime.reset_index(inplace=True,drop=True)
for index,row in montrealcrime.iterrows():
    print(index)  # progress indicator
    montrealcrime.at[index,"Address"] = fGeoToAddr(row['lat'],row['long'])

0
1
2
...
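The loop above sends one reverse-geocoding request per row, even though many rows share the same coordinates. Caching results per unique coordinate pair is one way to cut the number of requests. The sketch below demonstrates the idea with `functools.lru_cache` and a stand-in lookup function, so it runs without network access; `slow_reverse_lookup` and `cached_reverse` are hypothetical names, and in the notebook the stand-in would be replaced by the real geocoder call.

```python
from functools import lru_cache

calls = []

def slow_reverse_lookup(lat, lon):
    """Stand-in for a network call; records each invocation."""
    calls.append((lat, lon))
    return 'address for {:.6f},{:.6f}'.format(lat, lon)

@lru_cache(maxsize=None)
def cached_reverse(lat, lon):
    # Only the first request for a given (lat, lon) reaches the slow lookup.
    return slow_reverse_lookup(lat, lon)

# Coordinates taken from the first rows of the crime table, repeated as in the data.
coords = [(45.579468, -73.595614)] * 4 + [(45.527682, -73.697992)]
addresses = [cached_reverse(lat, lon) for lat, lon in coords]
print(len(addresses), len(calls))
```

Five rows are resolved with two lookups. Fewer requests also makes it easier to stay within the public Nominatim service's usage limits.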

In [86]:
montrealcrime.head(500)

Unnamed: 0,CATEGORIE,long,lat,Address
0,Vol de véhicule à moteur,-73.595614,45.579468,"(4898, Rue Jarry Est, Port-Maurice, Saint-Léon..."
1,Vol de véhicule à moteur,-73.595614,45.579468,"(4898, Rue Jarry Est, Port-Maurice, Saint-Léon..."
2,Vol de véhicule à moteur,-73.595614,45.579468,"(4898, Rue Jarry Est, Port-Maurice, Saint-Léon..."
3,Vol de véhicule à moteur,-73.595614,45.579468,"(4898, Rue Jarry Est, Port-Maurice, Saint-Léon..."
4,Méfait,-73.697992,45.527682,"(11525, Rue Lavigne, Cartierville, Montréal, A..."
...,...,...,...,...
495,Vol dans / sur véhicule à moteur,-73.510028,45.606967,"(9351, Rue Bellerive, Bellerive, Beaurivage, M..."
496,Méfait,-73.609393,45.560233,"(7820, 6e Avenue, Gabriel-Sagard, Montréal, Ag..."
497,Introduction,-73.582804,45.516829,"(97, Rue Rachel Ouest, Saint-Louis, Plateau Mo..."
498,Vol dans / sur véhicule à moteur,-73.568652,45.468776,"(3501, Rue Gertrude, Verdun-Centre, Verdun, Mo..."


In [87]:
montrealcrime['Neighborhood'] = " "

In [89]:
montrealcrime['Address'][0]

Location(4898, Rue Jarry Est, Port-Maurice, Saint-Léonard, Montréal, Agglomération de Montréal, Montréal (06), Québec, H1R 3A9, Canada, (45.5793096, -73.5956092, 0.0))

In [91]:
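# The neighbourhood is taken as the 4th comma-separated field of the address string.
# This is positional: some addresses yield ' Montréal' or ' Agglomération de Montréal'
# instead of a borough, and every value keeps the leading space left by the split.
for index in montrealcrime.index:
    montrealcrime.at[index,"Neighborhood"] = str(montrealcrime.at[index,"Address"]).split(',')[3]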

In [92]:
montrealcrime.head(500)

Unnamed: 0,CATEGORIE,long,lat,Address,Neighborhood
0,Vol de véhicule à moteur,-73.595614,45.579468,"(4898, Rue Jarry Est, Port-Maurice, Saint-Léon...",Saint-Léonard
1,Vol de véhicule à moteur,-73.595614,45.579468,"(4898, Rue Jarry Est, Port-Maurice, Saint-Léon...",Saint-Léonard
2,Vol de véhicule à moteur,-73.595614,45.579468,"(4898, Rue Jarry Est, Port-Maurice, Saint-Léon...",Saint-Léonard
3,Vol de véhicule à moteur,-73.595614,45.579468,"(4898, Rue Jarry Est, Port-Maurice, Saint-Léon...",Saint-Léonard
4,Méfait,-73.697992,45.527682,"(11525, Rue Lavigne, Cartierville, Montréal, A...",Montréal
...,...,...,...,...,...
495,Vol dans / sur véhicule à moteur,-73.510028,45.606967,"(9351, Rue Bellerive, Bellerive, Beaurivage, M...",Beaurivage
496,Méfait,-73.609393,45.560233,"(7820, 6e Avenue, Gabriel-Sagard, Montréal, Ag...",Montréal
497,Introduction,-73.582804,45.516829,"(97, Rue Rachel Ouest, Saint-Louis, Plateau Mo...",Plateau Mont-Royal
498,Vol dans / sur véhicule à moteur,-73.568652,45.468776,"(3501, Rue Gertrude, Verdun-Centre, Verdun, Mo...",Verdun
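Splitting the address string by position is fragile, as the `Montréal` values in the table show. geopy `Location` objects expose the raw Nominatim response through `.raw`, which for reverse lookups normally includes an `address` dictionary with named fields. The sketch below picks a neighbourhood from such a dictionary. The `pick_neighborhood` helper, the key order in `BOROUGH_KEYS`, and the sample dictionaries are assumptions; which key actually carries the borough for a given Montreal address should be verified against real responses.

```python
# Candidate keys, most specific first (assumed order; verify against real responses).
BOROUGH_KEYS = ('city_district', 'suburb', 'borough', 'town', 'city')

def pick_neighborhood(address):
    """Return the first non-empty candidate field from a Nominatim-style address dict."""
    for key in BOROUGH_KEYS:
        if address.get(key):
            return address[key].strip()
    return None

# Hand-written sample dictionaries (illustrative, not real API output).
with_borough = {'house_number': '4898', 'road': 'Rue Jarry Est',
                'suburb': 'Saint-Léonard', 'city': 'Montréal'}
city_only = {'road': 'Rue Lavigne', 'city': 'Montréal'}

print(pick_neighborhood(with_borough))
print(pick_neighborhood(city_only))
```

In the notebook this would be called as `pick_neighborhood(location.raw.get('address', {}))` for each stored `Location`.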


In [94]:
montrealcrime.groupby(['CATEGORIE'])['Neighborhood'].count()

CATEGORIE
Infractions entrainant la mort        5
Introduction                        330
Méfait                              200
Vol dans / sur véhicule à moteur    271
Vol de véhicule à moteur            140
Vols qualifiés                       54
Name: Neighborhood, dtype: int64
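Neighbourhood names returned by Nominatim keep their accents and, after the comma split, a leading space, while the venue table uses unaccented names such as `Cote-des-Neiges-Notre-Dame-de-Grace`. If the two tables are ever compared by neighbourhood, the names need a common form first. A minimal sketch, assuming that stripping accents, replacing the en dash, and lower-casing is enough; `normalize_name` is a hypothetical helper.

```python
import unicodedata

def normalize_name(name):
    """Strip accents and surrounding spaces, unify dashes, and lower-case a place name."""
    decomposed = unicodedata.normalize('NFKD', name)
    # Drop the combining marks left behind by the decomposition (accents, circumflexes).
    unaccented = ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unaccented.replace('\u2013', '-').strip().lower()

crime_name = ' Côte-des-Neiges\u2013Notre-Dame-de-Grâce'
venue_name = 'Cote-des-Neiges-Notre-Dame-de-Grace'
print(normalize_name(crime_name) == normalize_name(venue_name))
```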

In [97]:
montrealcrime.groupby(['Neighborhood'])['CATEGORIE'].count().sort_values(ascending=False)

Neighborhood
 Montréal                                    204
 Ville-Marie                                 140
 Plateau Mont-Royal                           63
 Agglomération de Montréal                    58
 Côte-des-Neiges–Notre-Dame-de-Grâce          54
 Saint-Laurent                                44
 Saint-Léonard                                42
 Le Sud-Ouest                                 39
 Montréal-Nord                                38
 Rosemont–La Petite-Patrie                    36
 LaSalle                                      34
 Tétreaultville                               31
 Verdun                                       28
 Rivière-des-Prairies–Pointe-aux-Trembles     25
 Sault-au-Récollet                            15
 Pierrefonds-Roxboro                          15
 Beaurivage                                   14
 Crémazie                                     13
 Préfontaine                                  11
 La Visitation                                10
 Père-M