<h1>The Battle of Neighbourhoods</h1>

<h3>Introduction</h3>

This project helps people who move to Brussels, Belgium, for professional reasons to find a suitable neighbourhood to live in. The analysis is intended to make it easier for future expats to find an area that suits them, ideally one similar to the neighbourhood they are used to. For this purpose, the different neighbourhoods in Brussels are compared with each other.

<u>Background</u>

Brussels grew from a small rural settlement on the river Senne to become an important city-region in Europe. Since the end of the Second World War, it has been a major centre for international politics and home to numerous international organisations, politicians, diplomats and civil servants. Brussels is the de facto capital of the European Union, as it hosts a number of principal EU institutions.

<u>Problem</u>

As a result, a large number of employees from across Europe and the rest of the world, often called expats, have to find accommodation in Brussels, either temporarily or for the longer term, and look for residential areas where they can feel at home.

<h3>Data</h3>

In this project, three data sources are used to address the problem: Monitoring of the Neighbourhoods in Brussels, Brussels Recorded Crime, and the Foursquare API. After retrieval from the original sources, the data are wrangled and cleansed into forms better suited to further analysis.

<b>Load required libraries</b>

In [182]:
import pandas as pd
import numpy as np

!pip install geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
from geopy.distance import great_circle

! pip install folium==0.5.0
import folium # map rendering library

import json # library to handle JSON files
from pandas import json_normalize # transform a JSON file into a pandas dataframe

import requests # library to handle requests
from requests import get

!pip install beautifulsoup4
from bs4 import BeautifulSoup

import re

from time import sleep

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt

import seaborn as sns


# import k-means from clustering stage
from sklearn.cluster import KMeans

print('Folium installed')
print('Libraries imported.')

Folium installed
Libraries imported.


<h3>1. Get data from "monitoringdesquartiers.brussels"</h3>

The dataset includes a list of all neighbourhoods in the Brussels-Capital Region together with the data required for the analysis:
<ul>
    <li>Municipality and postal code</li>
    <li>Population data, such as population density</li>
    <li>Building structure, such as the share of high-rise buildings and office density</li>
    <li>Income structure</li>
    <li>Resident structure, such as the share of families and of single persons</li>
    <li>Environment, such as access to green spaces</li>
    <li>Mobility, such as access to public transport</li>
</ul>


In [183]:

import os, types
import pandas as pd
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share the notebook.

if os.environ.get('RUNTIME_ENV_LOCATION_TYPE') == 'external':
    endpoint_de43d4a52d9e4fc7abc22e6384097331 = 'https://s3-api.us-geo.objectstorage.softlayer.net'
else:
    endpoint_de43d4a52d9e4fc7abc22e6384097331 = 'https://s3-api.us-geo.objectstorage.service.networklayer.com'

client_de43d4a52d9e4fc7abc22e6384097331 = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='AIsxCLpQcUYKOEytlQzqR-gJMsPhpRheJoDEVSl9lkN2',
    ibm_auth_endpoint="https://iam.cloud.ibm.com/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url=endpoint_de43d4a52d9e4fc7abc22e6384097331)

body = client_de43d4a52d9e4fc7abc22e6384097331.get_object(Bucket='courseraapplieddatasciencecapston-donotdelete-pr-deazennyia8bho',Key='y_monitoringdesquartiers.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

monitoring = pd.read_csv(body)
monitoring.head()


Unnamed: 0,Code,Neighbourhood,Postal Code,Densité de population,Loyer mensuel moyen par logemen,Taux de mobilité,Revenu imposable médian des déclarations,Part des isoles au part des couples sans enfants dans le total de 2019,Part des couples avec enfants dans le total des ménages privés,Part de lEurope (hors Belgique),Part des bâtiments de 5 niveaux et plusStockwerke,Part des ménages résidant en appartement,Part des ménages résidant en maison unifamiliale,Densité de bureaux,Part de la population à proximité d'un espace vert accessible au public,Part de la population à proximité d'un arrêt de métro ou d'un tram Chrono
0,1,Grand Place,1000,8887,947,119,18275,90,7,37,15,89,9,535255,61,98
1,2,Dansaert,1000,17105,711,101,18277,79,14,24,18,89,11,227800,61,94
2,3,Béguinage - Dixmude,1000,17973,826,103,18856,78,15,23,17,91,9,379896,100,100
3,4,Martyrs,1000,7075,663,109,19626,87,9,26,28,95,4,908364,78,100
4,5,Notre-Dame-aux-Neiges,1000,8855,663,123,20667,84,10,31,23,92,7,1711023,100,100


The dataset is simplified and prepared for further analysis: the French column names are replaced with short English labels and the office density column is dropped.

In [184]:
translate = {"Densité de population" : "Population","Loyer mensuel moyen par logemen" : "Rent", "Taux de mobilité" : "Mobility", "Revenu imposable médian des déclarations" : "Tax", "Part des isoles au part des couples sans enfants dans le total de 2019" : "Singles", "Part des couples avec enfants dans le total des ménages privés" : "Families","Part de lEurope (hors Belgique)" : "European", "Part des bâtiments de 5 niveaux et plusStockwerke" : "Skyscrapers", "Part des ménages résidant en appartement" : "Apartments", "Part des ménages résidant en maison unifamiliale" : "Houses", "Densité de bureaux" : "Offices", "Part de la population à proximité d'un espace vert accessible au public" : "GreenSpace", "Part de la population à proximité d'un arrêt de métro ou d'un tram Chrono" : "Metro" }

# rename and translate columns
monitoring.rename(columns=translate, inplace=True)

# cleanse: delete column 'Offices'
del monitoring['Offices']

monitoring.head()

Unnamed: 0,Code,Neighbourhood,Postal Code,Population,Rent,Mobility,Tax,Singles,Families,European,Skyscrapers,Apartments,Houses,GreenSpace,Metro
0,1,Grand Place,1000,8887,947,119,18275,90,7,37,15,89,9,61,98
1,2,Dansaert,1000,17105,711,101,18277,79,14,24,18,89,11,61,94
2,3,Béguinage - Dixmude,1000,17973,826,103,18856,78,15,23,17,91,9,100,100
3,4,Martyrs,1000,7075,663,109,19626,87,9,26,28,95,4,78,100
4,5,Notre-Dame-aux-Neiges,1000,8855,663,123,20667,84,10,31,23,92,7,100,100


<b>Get the addresses and geospatial data of the neighbourhoods</b>

In [185]:
# get the addresses of the neighbourhoods
body = client_de43d4a52d9e4fc7abc22e6384097331.get_object(Bucket='courseraapplieddatasciencecapston-donotdelete-pr-deazennyia8bho',Key='adressen.xlsx')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

import io
adressquartiers = pd.read_excel(io.BytesIO(body.read()))
adressquartiers.head()


Unnamed: 0,Code,Adresse
0,1,"Grote Markt, 1000 Brussel"
1,2,"Rue du Vieux Marché aux Grains 5, 1000 Bruxelles"
2,3,"Quai au Foin 1000, 1000 Bruxelles"
3,4,"Place des Martyrs, 1000 Bruxelles"
4,5,"Place du Congrès, 1000 Bruxelles"


In [186]:
# Foursquare API credentials and request settings

CLIENT_ID = 'VXS3YZNIKYQOCNWWMD4GCYKDHZ3CDTEHKN0HC0CCRNQMOJFY' # your Foursquare ID
CLIENT_SECRET = 'NRRNHCVJBQ22I10UR3TOVSUEYBOOA2MSO4Y22WSGPKT0S4LH' # your Foursquare Secret
ACCESS_TOKEN = '0FKZR5DNIUZ4I1HKKWQ5ZNQRMEFJU0C4P5MK4RFKMZT4P1PD' # your FourSquare Access Token
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: VXS3YZNIKYQOCNWWMD4GCYKDHZ3CDTEHKN0HC0CCRNQMOJFY
CLIENT_SECRET:NRRNHCVJBQ22I10UR3TOVSUEYBOOA2MSO4Y22WSGPKT0S4LH


In [187]:
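# get the geospatial data

from geopy.extra.rate_limiter import RateLimiter

# NaN marks addresses that could not be geocoded
latitude = np.full(adressquartiers.shape[0], np.nan)
longitude = np.full(adressquartiers.shape[0], np.nan)

geolocator = Nominatim(user_agent="bruxelles_explorer")
# wait at least one second between requests to respect the public Nominatim service
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)

for idx in range(adressquartiers.shape[0]):
    address = adressquartiers['Adresse'].loc[idx]

    location = geocode(address)
    if location is None:
        # geocode() returns None when no match is found
        print('No result for: {}'.format(address))
        continue
    latitude[idx] = location.latitude
    longitude[idx] = location.longitude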

In [188]:
district_coordinates = adressquartiers
district_coordinates['Latitude'] = latitude
district_coordinates['Longitude'] = longitude
district_coordinates

Unnamed: 0,Code,Adresse,Latitude,Longitude
0,1,"Grote Markt, 1000 Brussel",50.846714,4.352514
1,2,"Rue du Vieux Marché aux Grains 5, 1000 Bruxelles",50.850453,4.346755
2,3,"Quai au Foin 1000, 1000 Bruxelles",50.855666,4.350933
3,4,"Place des Martyrs, 1000 Bruxelles",50.851834,4.356594
4,5,"Place du Congrès, 1000 Bruxelles",50.850006,4.363218
...,...,...,...,...
113,114,"Dieweg, 1180 Uccle",50.796539,4.348159
114,115,"Rue du Bourdon, 1180 Uccle",50.787221,4.330628
115,116,"Avenue Brugmann 603, 1180 Uccle",50.800846,4.337438
116,117,"Rue Gatti de Gamond 95, 1180 Uccle",50.803488,4.326774


In [189]:
# merge data - geospatial data and data from "monitoringdesquartiers"
monitoring_data = pd.merge(district_coordinates, monitoring, on = 'Code', how = 'outer')
# delete column not required
del monitoring_data['Adresse']
# add community names
body = client_de43d4a52d9e4fc7abc22e6384097331.get_object(Bucket='courseraapplieddatasciencecapston-donotdelete-pr-deazennyia8bho',Key='commune.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

commune = pd.read_csv(body)

monitoring_data = pd.merge(commune, monitoring_data, on = 'Code', how = 'outer')

monitoring_data.head()

Unnamed: 0,Code,Commune,Latitude,Longitude,Neighbourhood,Postal Code,Population,Rent,Mobility,Tax,Singles,Families,European,Skyscrapers,Apartments,Houses,GreenSpace,Metro
0,1,Bruxelles,50.846714,4.352514,Grand Place,1000,8887,947,119,18275,90,7,37,15,89,9,61,98
1,2,Bruxelles,50.850453,4.346755,Dansaert,1000,17105,711,101,18277,79,14,24,18,89,11,61,94
2,3,Bruxelles,50.855666,4.350933,Béguinage - Dixmude,1000,17973,826,103,18856,78,15,23,17,91,9,100,100
3,4,Bruxelles,50.851834,4.356594,Martyrs,1000,7075,663,109,19626,87,9,26,28,95,4,78,100
4,5,Bruxelles,50.850006,4.363218,Notre-Dame-aux-Neiges,1000,8855,663,123,20667,84,10,31,23,92,7,100,100
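
Both merges above use `how = 'outer'`, which keeps rows that have no partner in the other table and fills the missing side with NaN. A minimal sketch of how such rows could be detected with the `indicator` option of `pd.merge`; the two small frames are made-up stand-ins, not the project data:

```python
import pandas as pd

# Hypothetical stand-ins for the coordinate and monitoring tables
coords = pd.DataFrame({"Code": [1, 2, 3], "Latitude": [50.85, 50.86, 50.87]})
stats = pd.DataFrame({"Code": [1, 2, 4], "Rent": [947, 711, 663]})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
merged = pd.merge(coords, stats, on="Code", how="outer", indicator=True)

# Rows that exist in only one of the two tables
unmatched = merged[merged["_merge"] != "both"]
print(unmatched[["Code", "_merge"]])
```

If `unmatched` is empty on the real data, every neighbourhood code was found in both tables and the outer merge did not introduce NaN rows.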


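Note on each variable:
<ul>
    <li>Population = population density of the neighbourhood (source column: Densité de population)</li>
    <li>Rent = average monthly rent per dwelling in €</li>
    <li>Mobility = mobility rate, i.e. frequency of residential moves (source column: Taux de mobilité)</li>
    <li>Tax = median taxable income per tax return in €</li>
    <li>Singles = share of households without children (single persons and couples without children), in %</li>
    <li>Families = share of couples with children among private households, in %</li>
    <li>European = share of residents from other European countries (excluding Belgium), in %</li>
    <li>Skyscrapers = share of buildings with 5 or more floors, in %</li>
    <li>Apartments = share of households living in apartments, in %</li>
    <li>Houses = share of households living in single-family houses, in %</li>
    <li>GreenSpace = share of the population living near a publicly accessible green space, in %</li>
    <li>Metro = share of the population living near a metro stop or a Chrono tram stop, in %</li>
</ul>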

In [190]:
# K Means Clustering
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

In [191]:
# keep only the numeric indicators used for clustering
bruxelles_grouped_clustering = monitoring_data.drop(
    columns=['Code', 'Commune', 'Neighbourhood', 'Postal Code', 'Latitude', 'Longitude'])
bruxelles_grouped_clustering.head()

Unnamed: 0,Population,Rent,Mobility,Tax,Singles,Families,European,Skyscrapers,Apartments,Houses,GreenSpace,Metro
0,8887,947,119,18275,90,7,37,15,89,9,61,98
1,17105,711,101,18277,79,14,24,18,89,11,61,94
2,17973,826,103,18856,78,15,23,17,91,9,100,100
3,7075,663,109,19626,87,9,26,28,95,4,78,100
4,8855,663,123,20667,84,10,31,23,92,7,100,100
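
K-means relies on Euclidean distance, so columns with large values (Population and Tax are in the thousands, the shares lie between 0 and 100) weigh more heavily than the others. The next cell clusters the raw values. One common alternative, not used in this notebook, is to standardise the features first. A minimal sketch with scikit-learn's `StandardScaler` on three rows and three columns taken from the table above:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Three rows copied from the table above (subset of columns)
features = pd.DataFrame({
    "Population": [8887, 17105, 17973],
    "Tax": [18275, 18277, 18856],
    "Singles": [90, 79, 78],
})

# Rescale every column to zero mean and unit variance
scaled = pd.DataFrame(
    StandardScaler().fit_transform(features),
    columns=features.columns,
    index=features.index,
)
print(scaled.round(2))
```

Passing `scaled` instead of the raw frame to `KMeans` would give every indicator a comparable influence; the resulting clusters would generally differ from the ones reported here.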


In [192]:
kclusters = 4

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(bruxelles_grouped_clustering)

kmeans.labels_[0:10]

array([1, 0, 0, 1, 1, 3, 1, 0, 0, 2], dtype=int32)
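
The number of clusters is fixed at 4 above without a stated selection criterion. One way to support such a choice is to compare the mean silhouette score for several values of k. The following is a sketch on synthetic data, not on the neighbourhood indicators, so the selected k says nothing about the Brussels data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data with three well-separated groups (illustration only)
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [10, 10], [-10, 10]])
X = np.vstack([c + rng.normal(scale=1.0, size=(30, 2)) for c in centers])

# Fit k-means for several k and record the mean silhouette score
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The k with the highest score separates the points best by this criterion
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On the project data, `X` would be replaced by `bruxelles_grouped_clustering`.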

In [193]:
# add clustering labels
monitoring_data.insert(0, 'Cluster Labels', kmeans.labels_)

monitoring_data.head()

Unnamed: 0,Cluster Labels,Code,Commune,Latitude,Longitude,Neighbourhood,Postal Code,Population,Rent,Mobility,Tax,Singles,Families,European,Skyscrapers,Apartments,Houses,GreenSpace,Metro
0,1,1,Bruxelles,50.846714,4.352514,Grand Place,1000,8887,947,119,18275,90,7,37,15,89,9,61,98
1,0,2,Bruxelles,50.850453,4.346755,Dansaert,1000,17105,711,101,18277,79,14,24,18,89,11,61,94
2,0,3,Bruxelles,50.855666,4.350933,Béguinage - Dixmude,1000,17973,826,103,18856,78,15,23,17,91,9,100,100
3,1,4,Bruxelles,50.851834,4.356594,Martyrs,1000,7075,663,109,19626,87,9,26,28,95,4,78,100
4,1,5,Bruxelles,50.850006,4.363218,Notre-Dame-aux-Neiges,1000,8855,663,123,20667,84,10,31,23,92,7,100,100


In [194]:
# data type of each column
monitoring_data['Cluster Labels'] = monitoring_data['Cluster Labels'].astype(int)
monitoring_data.dtypes

Cluster Labels      int64
Code                int64
Commune            object
Latitude          float64
Longitude         float64
Neighbourhood      object
Postal Code         int64
Population          int64
Rent                int64
Mobility            int64
Tax                 int64
Singles             int64
Families            int64
European            int64
Skyscrapers         int64
Apartments          int64
Houses              int64
GreenSpace          int64
Metro               int64
dtype: object

In [195]:
# create map
address = 'Grote Markt, 1000 Brussel'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters: one color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(monitoring_data['Latitude'], monitoring_data['Longitude'], monitoring_data['Neighbourhood'], monitoring_data['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

50.84671435 4.352514119250888


In [196]:
# Cluster 0
monitoring_data.loc[monitoring_data['Cluster Labels'] == 0, monitoring_data.columns[[1] + list(range(5, monitoring_data.shape[1]))]]

Unnamed: 0,Code,Neighbourhood,Postal Code,Population,Rent,Mobility,Tax,Singles,Families,European,Skyscrapers,Apartments,Houses,GreenSpace,Metro
1,2,Dansaert,1000,17105,711,101,18277,79,14,24,18,89,11,61,94
2,3,Béguinage - Dixmude,1000,17973,826,103,18856,78,15,23,17,91,9,100,100
7,8,Marolles,1000,19221,658,81,15104,80,17,15,11,88,11,48,92
8,9,Stalingrad,1000,16493,647,107,16038,80,11,26,13,84,14,41,100
10,11,Cureghem Bara,1070,20679,645,89,15518,61,28,24,6,72,27,59,99
11,12,Cureghem Vétérinaire,1070,14427,622,78,15542,62,29,21,5,79,19,99,19
13,14,Duchesse,1080,14445,702,87,16952,51,36,14,2,66,32,40,55
14,15,Gare de l'ouest,1080,19467,679,78,15269,58,34,13,3,70,28,19,99
18,19,Vieux Laeken Ouest,1000,16781,751,86,17492,58,34,20,2,64,35,63,91
19,20,Vieux Laeken Est,1000,18273,723,76,16608,58,34,17,2,65,34,70,75


In [197]:
# Cluster 1
monitoring_data.loc[monitoring_data['Cluster Labels'] == 1, monitoring_data.columns[[1] + list(range(5, monitoring_data.shape[1]))]]

Unnamed: 0,Code,Neighbourhood,Postal Code,Population,Rent,Mobility,Tax,Singles,Families,European,Skyscrapers,Apartments,Houses,GreenSpace,Metro
0,1,Grand Place,1000,8887,947,119,18275,90,7,37,15,89,9,61,98
3,4,Martyrs,1000,7075,663,109,19626,87,9,26,28,95,4,78,100
4,5,Notre-Dame-aux-Neiges,1000,8855,663,123,20667,84,10,31,23,92,7,100,100
6,7,Sablon,1000,6004,808,108,18326,91,10,39,13,86,13,87,60
12,13,Cureghem Rosée,1070,8801,613,85,15279,58,30,15,3,72,27,27,48
17,18,Quartier Maritime,1080,10358,666,83,17004,55,33,17,6,75,24,41,95
20,21,Quartier Nord,1000,12079,668,76,16257,63,28,16,7,84,15,67,96
29,30,Porte Tervueren,1040,11631,941,96,22657,80,17,44,6,72,27,54,80
38,39,Etangs d'Ixelles,1050,10572,886,98,23451,87,14,42,19,87,12,96,51
42,43,Brugmann - Lepoutre,1050,12847,903,94,22471,81,17,44,9,78,22,87,48


In [198]:
# Cluster 2
monitoring_data.loc[monitoring_data['Cluster Labels'] == 2, monitoring_data.columns[[1] + list(range(5, monitoring_data.shape[1]))]]

Unnamed: 0,Code,Neighbourhood,Postal Code,Population,Rent,Mobility,Tax,Singles,Families,European,Skyscrapers,Apartments,Houses,GreenSpace,Metro
9,10,Anneessens,1000,22717,632,86,16038,67,23,16,8,80,18,61,82
15,16,Molenbeek Historique,1080,24908,643,69,15345,55,34,11,5,70,28,44,93
16,17,Koekelberg,1081,24868,648,82,16909,54,35,15,3,76,23,99,100
21,22,Quartier Brabant,1030,24532,593,72,15480,58,30,22,3,62,36,59,86
22,23,Colignon,1030,23170,613,77,17761,55,34,16,3,69,30,87,56
23,24,Chaussée de Haecht,1030,25411,636,76,16314,55,32,23,3,61,37,86,20
24,25,Saint-Josse Centre,1210,31295,640,79,15927,67,24,25,5,72,26,82,52
47,48,Porte de Hal,1060,25596,690,79,16308,68,21,29,8,78,21,96,95
48,49,Bosnie,1060,37531,698,82,15462,72,20,26,15,84,15,75,43


In [199]:
# Cluster 3
monitoring_data.loc[monitoring_data['Cluster Labels'] == 3, monitoring_data.columns[[1] + list(range(5, monitoring_data.shape[1]))]]

Unnamed: 0,Code,Neighbourhood,Postal Code,Population,Rent,Mobility,Tax,Singles,Families,European,Skyscrapers,Apartments,Houses,GreenSpace,Metro
5,6,Quartier Royal,1000,458,663,109,28195,72,18,38,34,86,11,100,100
30,31,Saint-Michel,1040,9659,733,89,26454,78,21,43,9,72,28,69,96
34,35,Quartier Européen,1000,3181,985,124,23775,86,10,58,35,86,13,100,84
53,54,Vogelenzang - Erasme,1070,744,595,70,18365,93,18,13,3,63,36,73,95
54,55,Neerpede,1070,468,595,96,26140,76,26,14,0,39,60,47,2
65,66,Potaarde,1082,5339,832,51,25533,61,37,8,0,30,69,72,0
72,73,Heymbosch - AZ-Jette,1090,6966,749,79,24643,73,26,14,3,79,21,86,0
77,78,Haren,1000,3199,736,66,21833,47,45,9,0,19,80,92,0
90,91,Val d'Or,1200,6905,926,71,26675,77,22,29,5,80,20,86,16
92,93,Boulevard de la Woluwe,1200,6065,926,69,23264,72,25,23,1,47,52,100,6


Each cluster is given a name that describes the dominant character of its neighbourhoods:
<ul>
    <li>Cluster 0 = "Below average price and high population area"</li>
    <li>Cluster 1 = "Higher-priced and well-situated area"</li>
    <li>Cluster 2 = "Affordable and very densely populated area"</li>
    <li>Cluster 3 = "High-priced and upscale area"</li>
</ul>
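
Names like these can be checked against the per-cluster averages of the indicators. A minimal sketch using two neighbourhoods per cluster, with values copied from the cluster tables above; such a small sample only illustrates the idea and is not a substitute for the averages over all neighbourhoods:

```python
import pandas as pd

# Two neighbourhoods per cluster, values copied from the cluster tables above
sample = pd.DataFrame({
    "Cluster Labels": [0, 0, 1, 1, 2, 2, 3, 3],
    "Neighbourhood": ["Dansaert", "Marolles", "Grand Place", "Martyrs",
                      "Anneessens", "Bosnie", "Quartier Royal", "Saint-Michel"],
    "Population": [17105, 19221, 8887, 7075, 22717, 37531, 458, 9659],
    "Rent": [711, 658, 947, 663, 632, 698, 663, 733],
    "Tax": [18277, 15104, 18275, 19626, 16038, 15462, 28195, 26454],
})

# Average of each indicator per cluster
profile = sample.groupby("Cluster Labels")[["Population", "Rent", "Tax"]].mean()
print(profile)
```

On the full data the same call would be applied to `monitoring_data` instead of `sample`.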

In [200]:
# Add column with monitoring level
monitoring_data.loc[monitoring_data['Cluster Labels'] == 0, 'monitoring level'] = '3 - Below average price and high population area' 
monitoring_data.loc[monitoring_data['Cluster Labels'] == 1, 'monitoring level'] = '2 - Higher-priced and well-situated area'
monitoring_data.loc[monitoring_data['Cluster Labels'] == 2, 'monitoring level'] = '4 - Affordable and very densely populated area'
monitoring_data.loc[monitoring_data['Cluster Labels'] == 3, 'monitoring level'] = '1 - High-priced and upscale area'

monitoring_data.head()

Unnamed: 0,Cluster Labels,Code,Commune,Latitude,Longitude,Neighbourhood,Postal Code,Population,Rent,Mobility,Tax,Singles,Families,European,Skyscrapers,Apartments,Houses,GreenSpace,Metro,monitoring level
0,1,1,Bruxelles,50.846714,4.352514,Grand Place,1000,8887,947,119,18275,90,7,37,15,89,9,61,98,2 - Higher-priced and well-situated area
1,0,2,Bruxelles,50.850453,4.346755,Dansaert,1000,17105,711,101,18277,79,14,24,18,89,11,61,94,3 - Below average price and high population area
2,0,3,Bruxelles,50.855666,4.350933,Béguinage - Dixmude,1000,17973,826,103,18856,78,15,23,17,91,9,100,100,3 - Below average price and high population area
3,1,4,Bruxelles,50.851834,4.356594,Martyrs,1000,7075,663,109,19626,87,9,26,28,95,4,78,100,2 - Higher-priced and well-situated area
4,1,5,Bruxelles,50.850006,4.363218,Notre-Dame-aux-Neiges,1000,8855,663,123,20667,84,10,31,23,92,7,100,100,2 - Higher-priced and well-situated area


<h3>2. Get crime statistics data from "policefédérale.be" (Brussels recorded crime)</h3>

The data show the total number of crimes by year and location. They have already been processed: for simplicity, the crimes of the last eight years (2012-2019) have been averaged and divided by the number of inhabitants of each municipality, giving the average number of crimes per 1000 inhabitants per month. Because the figures are recorded per municipality, all neighbourhoods of a municipality share the same value.


In [201]:
# get crime statistics brussels
body = client_de43d4a52d9e4fc7abc22e6384097331.get_object(Bucket='courseraapplieddatasciencecapston-donotdelete-pr-deazennyia8bho',Key='policefédérale_crimestatistics_brussels.xlsx')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

import io
crimedata = pd.read_excel(io.BytesIO(body.read()))
crimedata.head()


Unnamed: 0,Code,Quartier,Commune,Postal Code,Total population (Number of inhabitants),Crimes per year average,Crimes per month (average),Crimes per 1000 inhabitants per month on average,Crime level
0,1,Grand Place,Bruxelles,1000,185103,48456.375,4038.03125,21.81505,very high
1,2,Dansaert,Bruxelles,1000,185103,48456.375,4038.03125,21.81505,very high
2,3,Béguinage - Dixmude,Bruxelles,1000,185103,48456.375,4038.03125,21.81505,very high
3,4,Martyrs,Bruxelles,1000,185103,48456.375,4038.03125,21.81505,very high
4,5,Notre-Dame-aux-Neiges,Bruxelles,1000,185103,48456.375,4038.03125,21.81505,very high
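
The derived columns can be reproduced from the first row of the table above. This short check assumes that the monthly figure is the yearly average divided by 12 and that the rate is scaled to 1000 inhabitants, which matches the values shown:

```python
# Values from the first row of the crime table above (municipality of Bruxelles)
population = 185103
crimes_per_year = 48456.375

# Yearly average spread evenly over 12 months
crimes_per_month = crimes_per_year / 12

# Monthly crimes relative to the population, scaled to 1000 inhabitants
crimes_per_1000_per_month = crimes_per_month / population * 1000

print(crimes_per_month)
print(round(crimes_per_1000_per_month, 5))
```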


In addition, the values of the different municipalities have been compared and evaluated. Crime level ranges from "best" to "very high".

In [202]:
# select required columns
crimedata_required  = crimedata[["Code", "Crime level"]]

monitoringdata_required = monitoring_data[["Code","Commune","Neighbourhood","Postal Code", "monitoring level"]]

# merge data - monitoring_data and crime_data
monitoring_crime = pd.merge(monitoringdata_required, crimedata_required, on = 'Code', how = 'outer')
monitoring_crime.head()

Unnamed: 0,Code,Commune,Neighbourhood,Postal Code,monitoring level,Crime level
0,1,Bruxelles,Grand Place,1000,2 - Higher-priced and well-situated area,very high
1,2,Bruxelles,Dansaert,1000,3 - Below average price and high population area,very high
2,3,Bruxelles,Béguinage - Dixmude,1000,3 - Below average price and high population area,very high
3,4,Bruxelles,Martyrs,1000,2 - Higher-priced and well-situated area,very high
4,5,Bruxelles,Notre-Dame-aux-Neiges,1000,2 - Higher-priced and well-situated area,very high


<h3>3. Get data from Foursquare API</h3>

In [203]:
# select required columns
foursquare_data = monitoring_data[["Code", "Commune", "Neighbourhood", "Latitude", "Longitude"]]
foursquare_data.head()

Unnamed: 0,Code,Commune,Neighbourhood,Latitude,Longitude
0,1,Bruxelles,Grand Place,50.846714,4.352514
1,2,Bruxelles,Dansaert,50.850453,4.346755
2,3,Bruxelles,Béguinage - Dixmude,50.855666,4.350933
3,4,Bruxelles,Martyrs,50.851834,4.356594
4,5,Bruxelles,Notre-Dame-aux-Neiges,50.850006,4.363218


In [204]:
# Pick one neighbourhood (row 17) to explore as an example
quartier_latitude = foursquare_data.loc[17, 'Latitude']
quartier_longitude = foursquare_data.loc[17, 'Longitude']
quartier_name = foursquare_data.loc[17, 'Neighbourhood']

print('Latitude and longitude values of {} are {}, {}.'.format(quartier_name, 
                                                               quartier_latitude, 
                                                               quartier_longitude))

Latitude and longitude values of Quartier Maritime are 50.8611687, 4.3386675.


In [205]:
# Top 10 venues in Quartier Maritime within a radius of 500 meters
LIMIT = 10
radius = 500

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    quartier_latitude, 
    quartier_longitude, 
    radius, 
    LIMIT)
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '6093eafb2046ea161eab5f91'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Molenbeek-Saint-Jean',
  'headerFullLocation': 'Molenbeek-Saint-Jean, Brussels',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 15,
  'suggestedBounds': {'ne': {'lat': 50.8656687045, 'lng': 4.345783455291584},
   'sw': {'lat': 50.8566686955, 'lng': 4.331551544708415}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4d9ce8efc540236a21bc834b',
       'name': 'La Maison Des Grillades',
       'location': {'address': 'Rue de Ribaucourtstraat 83',
        'lat': 50.85963672802107,
        'lng': 4.341399836867262,
        'labeledLatLngs': [{'label': 'displ

In [206]:
# Helper function: extract the name of the first category of a venue

def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
    
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean column names
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues



Unnamed: 0,name,categories,lat,lng
0,La Maison Des Grillades,Moroccan Restaurant,50.859637,4.3414
1,Snack-Friture La Duchesse,Friterie,50.860159,4.34156
2,La Luna,Italian Restaurant,50.863014,4.333676
3,VK Concerts,Music Venue,50.856671,4.338713
4,Olympic,Gym,50.863309,4.332846
5,Basic-Fit Brussels Simonis Blvd Leopold II,Gym,50.863,4.332293
6,Jubelfeest (MIVB | De Lijn),Tram Station,50.861383,4.338248
7,Shintori,Chinese Restaurant,50.863344,4.344833
8,Carrefour Express,Convenience Store,50.859353,4.33398
9,Sainctelette (MIVB | De Lijn),Tram Station,50.859173,4.3448


In [207]:
# Create a function to repeat the process for all neighbourhoods
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
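
`getNearbyVenues` assumes that every response contains a `groups` list and that every venue has at least one category; a response without them would raise a `KeyError` or `IndexError` and stop the loop. A minimal sketch of a more defensive parser follows. The function name `extract_venues` and the sample payload are hypothetical; the key names follow the response printed earlier in this notebook:

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style response.

    Missing keys yield an empty list or a None category instead of an error.
    """
    groups = payload.get("response", {}).get("groups", [])
    if not groups:
        return []
    rows = []
    for item in groups[0].get("items", []):
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        # Use the first category if there is one, otherwise None
        category = categories[0]["name"] if categories else None
        rows.append((venue.get("name"), location.get("lat"),
                     location.get("lng"), category))
    return rows


# Hand-made payload (hypothetical), shaped like the Foursquare response above
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Venue A", "location": {"lat": 50.86, "lng": 4.34},
               "categories": [{"name": "Gym"}]}},
    {"venue": {"name": "Venue B", "location": {"lat": 50.85, "lng": 4.33},
               "categories": []}},
]}]}}

print(extract_venues(sample))
print(extract_venues({"response": {}}))
```

Inside the loop of `getNearbyVenues`, the parsed JSON of each request could be passed to such a helper, so that one incomplete response does not abort the run over all neighbourhoods.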

In [208]:
bruxelles_venues = getNearbyVenues(names=foursquare_data['Neighbourhood'],
                                   latitudes=foursquare_data['Latitude'],
                                   longitudes=foursquare_data['Longitude']
                                  )

Grand Place
Dansaert
Béguinage - Dixmude
Martyrs
Notre-Dame-aux-Neiges
Quartier Royal
Sablon
Marolles
Stalingrad
Anneessens
Cureghem Bara
Cureghem Vétérinaire
Cureghem Rosée
Duchesse
Gare de l'ouest
Molenbeek Historique
Koekelberg
Quartier Maritime
Vieux Laeken Ouest
Vieux Laeken Est
Quartier Nord
Quartier Brabant
Colignon
Chaussée de Haecht
Saint-Josse Centre
Dailly
Josaphat
Plasky
Squares
Porte Tervueren
Saint-Michel
Saint-Pierre
Chasse
Jourdan
Quartier Européen
Matonge
Flagey - Malibran
Hôpital Etterbeek-Ixelles
Etangs d'Ixelles
Louise - Longue Haie
Berckmans - Hôtel des Monnaies
Châtelain
Brugmann - Lepoutre
Churchill
Molière - Longchamp
Altitude 100
Haut Saint-Gilles
Porte de Hal
Bosnie
Bas Forest
Van Volxem - Van Haelen
Veeweyde - Aurore
Bizet - Roue- Ceria
Vogelenzang - Erasme
Neerpede
Bon Air
Scherdemael
Anderlecht - Centre - Wayez
Scheut
Buffon
Moortebeek - Peterbos
Machtens
Karreveld
Hôpital Français
Korenbeek
Potaarde
Berchem Sainte-Agathe Centre
Villas de Ganshoren
Ganshore

In [209]:
print(bruxelles_venues.shape)
bruxelles_venues.head()

(1123, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Grand Place,50.846714,4.352514,Grand Place / Grote Markt (Grote Markt),50.846776,4.352481,Plaza
1,Grand Place,50.846714,4.352514,AB Ancienne Belgique,50.847143,4.349485,Concert Hall
2,Grand Place,50.846714,4.352514,Aux Merveilleux de Fred,50.848117,4.352141,Dessert Shop
3,Grand Place,50.846714,4.352514,The Grasshopper,50.847595,4.3527,Toy / Game Store
4,Grand Place,50.846714,4.352514,Pierre Marcolini,50.847353,4.354647,Chocolate Shop


In [210]:
count = bruxelles_venues.groupby('Neighborhood').count()
count = count[['Venue']]
# rename() returns a new frame; assigning a dict to count.rename would only overwrite the method
count = count.rename(columns={'Venue': 'Total Venues'})
count.head()

Neighborhood,Total Venues
Altitude 100,10
Anderlecht - Centre - Wayez,10
Anneessens,10
Auderghem centre,10
Avenue Léopold III,10


In [211]:
print('There are {} unique categories.'.format(len(bruxelles_venues['Venue Category'].unique())))

There are 195 unique categories.


In [212]:
# One hot encoding
bruxelles_onehot = pd.get_dummies(bruxelles_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe and bring to front
neighbor = bruxelles_venues['Neighborhood']
#bruxelles_onehot.drop(labels=['Neighborhood'], axis=1,inplace = True)
bruxelles_onehot.insert(0, 'Neighborhood', neighbor)

bruxelles_onehot.head()

Unnamed: 0,Neighborhood,African Restaurant,Amphitheater,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,BBQ Joint,...,Train Station,Tram Station,Trattoria/Osteria,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Yoga Studio
0,Grand Place,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Grand Place,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Grand Place,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Grand Place,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Grand Place,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [213]:
bruxelles_onehot.shape

(1123, 196)

In [214]:
# Group rows by neighborhood and take the mean of the frequency of each category
bruxelles_grouped = bruxelles_onehot.groupby('Neighborhood').mean().reset_index()
bruxelles_grouped

Unnamed: 0,Neighborhood,African Restaurant,Amphitheater,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,BBQ Joint,...,Train Station,Tram Station,Trattoria/Osteria,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Yoga Studio
0,Altitude 100,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Anderlecht - Centre - Wayez,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Anneessens,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Auderghem centre,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Avenue Léopold III,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
113,Vivier d'Oie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
114,Vogelenzang - Erasme,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
115,Vossegat - Roosendaal,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,...,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
116,Watermael Centre,0.0,0.0,0.0,0.0,0.0,0.1,0.1,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [215]:
bruxelles_grouped.shape

(118, 196)
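
As a minimal sketch of what the two steps above do, the following toy example one-hot encodes a handful of venues and averages the indicators per neighbourhood. The rows in `venues` are made up for illustration and stand in for `bruxelles_venues`; the mean of a 0/1 column is simply the share of a neighbourhood's venues in that category.

```python
import pandas as pd

# Toy stand-in for bruxelles_venues (hypothetical rows, not the real data)
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Plaza", "Bakery", "Plaza", "Park", "Bakery", "Bakery"],
})

# One indicator column per category, one row per venue
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Mean of the 0/1 indicators = share of venues in each category
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```

Each row of `grouped` sums to 1, which is why a neighbourhood with ten venues shows frequencies in steps of 0.1.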

In [216]:
# top 5 venues for each neighborhood
num_top_venues = 5

for hood in bruxelles_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = bruxelles_grouped[bruxelles_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Altitude 100----
                venue  freq
0               Plaza   0.1
1  Italian Restaurant   0.1
2         Salad Place   0.1
3         Snack Place   0.1
4      Sandwich Place   0.1


----Anderlecht - Centre - Wayez----
                  venue  freq
0  Gym / Fitness Center   0.2
1    Chinese Restaurant   0.2
2           Supermarket   0.2
3        Sandwich Place   0.1
4           Pizza Place   0.1


----Anneessens----
                venue  freq
0          Restaurant   0.1
1               Plaza   0.1
2  Belgian Restaurant   0.1
3  Falafel Restaurant   0.1
4            Beer Bar   0.1


----Auderghem centre----
                       venue  freq
0       Fast Food Restaurant   0.3
1           Sushi Restaurant   0.2
2  Middle Eastern Restaurant   0.2
3                  Brasserie   0.1
4               Cocktail Bar   0.1


----Avenue Léopold III----
            venue  freq
0      Restaurant   0.1
1           Hotel   0.1
2            Pool   0.1
3    Hockey Field   0.1
4  Sandwich Place 

               venue  freq
0     Sandwich Place   0.2
1          Gastropub   0.2
2                Bar   0.2
3  French Restaurant   0.1
4     Soccer Stadium   0.1


----Karreveld----
                 venue  freq
0  Fried Chicken Joint   0.1
1             Gym Pool   0.1
2            Brasserie   0.1
3        Tanning Salon   0.1
4                  Spa   0.1


----Koekelberg----
                venue  freq
0         Supermarket   0.1
1  Turkish Restaurant   0.1
2         Snack Place   0.1
3         Pastry Shop   0.1
4   French Restaurant   0.1


----Korenbeek----
                    venue  freq
0             Supermarket   0.2
1      Chinese Restaurant   0.1
2                   Plaza   0.1
3  Thrift / Vintage Store   0.1
4              Food Truck   0.1


----Kriekenput - Homborch - Verrewinkel----
                venue  freq
0        Tram Station   0.3
1  African Restaurant   0.1
2                Park   0.1
3          Restaurant   0.1
4   Convenience Store   0.1


----Louise - Longue Haie---

           venue  freq
0    Supermarket   0.2
1     Hookah Bar   0.1
2  Grocery Store   0.1
3    Pizza Place   0.1
4    Sports Club   0.1


----Villas de Ganshoren----
               venue  freq
0                Bar   0.2
1        Pizza Place   0.2
2      Deli / Bodega   0.1
3         Restaurant   0.1
4  Convenience Store   0.1


----Vivier d'Oie----
                venue  freq
0          Restaurant   0.2
1              Bakery   0.2
2  Belgian Restaurant   0.2
3  Italian Restaurant   0.1
4               Plaza   0.1


----Vogelenzang - Erasme----
           venue  freq
0          Plaza   0.2
1    Pizza Place   0.2
2     Restaurant   0.1
3  Metro Station   0.1
4    Snack Place   0.1


----Vossegat - Roosendaal----
                venue  freq
0  Athletics & Sports   0.2
1               Plaza   0.2
2              Bakery   0.2
3                Park   0.2
4       Train Station   0.2


----Watermael Centre----
            venue  freq
0     Supermarket   0.2
1  Ice Cream Shop   0.1
2          

In [217]:
# Put into pandas dataframe
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no 'st'/'nd'/'rd' suffix beyond the third position
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = bruxelles_grouped['Neighborhood']

for ind in np.arange(bruxelles_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(bruxelles_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Altitude 100,Park,Salad Place,Belgian Restaurant,Snack Place,Sandwich Place,Bakery,Italian Restaurant,Supermarket,Plaza,Convenience Store
1,Anderlecht - Centre - Wayez,Supermarket,Gym / Fitness Center,Chinese Restaurant,Tram Station,Trail,Pizza Place,Sandwich Place,Exhibit,Fair,Falafel Restaurant
2,Anneessens,Plaza,Piercing Parlor,Lebanese Restaurant,Belgian Restaurant,Falafel Restaurant,Mediterranean Restaurant,Coffee Shop,Restaurant,Art Gallery,Beer Bar
3,Auderghem centre,Fast Food Restaurant,Sushi Restaurant,Middle Eastern Restaurant,Brasserie,Snack Place,Cocktail Bar,Food & Drink Shop,Food,Flower Shop,Fish Market
4,Avenue Léopold III,Pool,Restaurant,Brasserie,Tennis Court,Bar,Sandwich Place,Park,Hotel,Hockey Field,Sports Bar
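
With only ten venues per neighbourhood, many categories tie at the same frequency, and the order among ties is then decided by the sort algorithm. This is probably why the top-5 printout and the table above list tied categories in a different order. A possible way to make the ranking reproducible is to break ties alphabetically; the helper `top_categories` below is a hypothetical name, not part of the notebook, and it assumes the frequencies are numeric (e.g. after `.astype(float)`).

```python
import pandas as pd

def top_categories(freq: pd.Series, n: int) -> list:
    """Return the n most frequent categories, ties broken alphabetically."""
    ordered = (
        freq.rename_axis("category")
            .reset_index(name="freq")
            .sort_values(["freq", "category"], ascending=[False, True])
    )
    return ordered["category"].head(n).tolist()

# Hypothetical frequency row for one neighbourhood
freq = pd.Series({"Plaza": 0.1, "Bakery": 0.1, "Supermarket": 0.2, "Park": 0.1})
print(top_categories(freq, 3))
```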


In [218]:
# K Means Clustering
kclusters = 4
bruxelles_grouped_clustering = bruxelles_grouped.drop(columns='Neighborhood')  # keep only the numeric frequency columns
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(bruxelles_grouped_clustering)

kmeans.labels_[0:10]

array([0, 2, 3, 3, 0, 0, 2, 3, 3, 2], dtype=int32)
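
The notebook fixes `kclusters = 4` without showing how that value was chosen. One common way to sanity-check such a choice is to compare silhouette scores across several values of k. The sketch below does this on synthetic blobs, not on the Brussels data, so the resulting `best_k` says nothing about the right number of clusters for this project; it only illustrates the procedure.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three well-separated blobs (not the Brussels data)
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# Fit k-means for several k and record the mean silhouette score of each fit
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Higher silhouette = tighter, better separated clusters
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Applied to the notebook, `X` would be `bruxelles_grouped_clustering`; with sparse frequency vectors the scores may be low for every k, which is itself useful to know.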

In [219]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# merge the sorted venue table with the neighbourhood coordinates to add latitude/longitude for each neighbourhood
bruxelles_merged = foursquare_data
bruxelles_merged = bruxelles_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighbourhood')

bruxelles_merged.head()

Unnamed: 0,Code,Commune,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1,Bruxelles,Grand Place,50.846714,4.352514,3,Chocolate Shop,Concert Hall,Toy / Game Store,Dessert Shop,Plaza,Cheese Shop,Shopping Mall,Hotel,Auto Dealership,Forest
1,2,Bruxelles,Dansaert,50.850453,4.346755,3,Plaza,Bakery,French Restaurant,Sushi Restaurant,Fish & Chips Shop,Bookstore,Bar,Seafood Restaurant,Moroccan Restaurant,Asian Restaurant
2,3,Bruxelles,Béguinage - Dixmude,50.855666,4.350933,3,Yoga Studio,Restaurant,Bookstore,Butcher,Cultural Center,Ethiopian Restaurant,Grocery Store,Organic Grocery,Plaza,Ice Cream Shop
3,4,Bruxelles,Martyrs,50.851834,4.356594,3,Department Store,Clothing Store,Bookstore,Cosmetics Shop,Coffee Shop,Sporting Goods Shop,Seafood Restaurant,Event Service,Fish Market,Fish & Chips Shop
4,5,Bruxelles,Notre-Dame-aux-Neiges,50.850006,4.363218,3,Ice Cream Shop,Bar,Gastropub,Deli / Bodega,Concert Hall,Coffee Shop,Sandwich Place,Smoke Shop,Thai Restaurant,Hotel


In [220]:
bruxelles_merged['Cluster Labels'] = bruxelles_merged['Cluster Labels'].astype(int)
bruxelles_merged.shape

(118, 16)

In [221]:
# create map
map_clusters = folium.Map(location=[quartier_latitude, quartier_longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(bruxelles_merged['Latitude'], bruxelles_merged['Longitude'], bruxelles_merged['Neighbourhood'], bruxelles_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-2],
        fill=True,
        fill_color=rainbow[cluster-2],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
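
The colour lookup `rainbow[cluster-2]` in the cell above may look odd: for labels 0 and 1 the index becomes negative and wraps around to the end of the list. The small sketch below, using a placeholder palette instead of the real `rainbow` list, shows that the offset only permutes the colours and that every cluster still gets its own one.

```python
# Placeholder palette; in the notebook `rainbow` holds hex colours from cm.rainbow
palette = ["#ff0000", "#00ff00", "#0000ff", "#ffff00"]
kclusters = len(palette)

# The notebook's indexing: label - 2, so labels 0 and 1 use negative indices
shifted = {label: palette[label - 2] for label in range(kclusters)}

# Direct indexing: label i simply gets colour i
direct = {label: palette[label] for label in range(kclusters)}

print(shifted)
print(direct)
```

Either variant works for four clusters; direct indexing is easier to read when a legend has to be written by hand.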

In [222]:
# Cluster 0
bruxelles_merged.loc[bruxelles_merged['Cluster Labels'] == 0, bruxelles_merged.columns[[1]+[2] + list(range(5, bruxelles_merged.shape[1]))]]

Unnamed: 0,Commune,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Bruxelles,Quartier Royal,0,Plaza,Salad Place,Belgian Restaurant,Men's Store,Boutique,Coffee Shop,Gym / Fitness Center,Italian Restaurant,Art Museum,Palace
6,Bruxelles,Sablon,0,Art Museum,Plaza,Cocktail Bar,Shoe Store,Tourist Information Center,Scenic Lookout,Dessert Shop,Coffee Shop,Park,Ethiopian Restaurant
11,Anderlecht,Cureghem Vétérinaire,0,Sandwich Place,French Restaurant,Italian Restaurant,Furniture / Home Store,Greek Restaurant,Yoga Studio,Event Service,Fish Market,Fish & Chips Shop,Fast Food Restaurant
19,Bruxelles,Vieux Laeken Est,0,Italian Restaurant,Cosmetics Shop,Supermarket,Cemetery,Sandwich Place,Toy / Game Store,Tram Station,Brasserie,Convenience Store,Dessert Shop
23,Schaerbeek,Chaussée de Haecht,0,Sandwich Place,Kebab Restaurant,Hookah Bar,Restaurant,Friterie,Italian Restaurant,Brasserie,Gym / Fitness Center,Bakery,Fair
27,Schaerbeek,Plasky,0,Italian Restaurant,Convenience Store,Pizza Place,French Restaurant,Friterie,Sandwich Place,Performing Arts Venue,Vietnamese Restaurant,Restaurant,Deli / Bodega
28,Bruxelles,Squares,0,Park,Steakhouse,Trattoria/Osteria,Sandwich Place,Bookstore,Greek Restaurant,Italian Restaurant,Plaza,Indian Restaurant,Gym / Fitness Center
29,Etterbeek,Porte Tervueren,0,Museum,Burger Joint,History Museum,Fountain,Italian Restaurant,Organic Grocery,Toy / Game Store,Fair,Cheese Shop,Park
31,Etterbeek,Saint-Pierre,0,Museum,Gym / Fitness Center,Event Service,Bakery,Diner,Park,Pizza Place,Health Food Store,History Museum,Thai Restaurant
34,Bruxelles,Quartier Européen,0,Hotel,Beer Garden,Salad Place,Coffee Shop,Chocolate Shop,Park,Palace,Bookstore,Food Truck,Gym / Fitness Center


In [223]:
# Cluster 1
bruxelles_merged.loc[bruxelles_merged['Cluster Labels'] == 1, bruxelles_merged.columns[[1]+[2] + list(range(5, bruxelles_merged.shape[1]))]]

Unnamed: 0,Commune,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,Anderlecht,Cureghem Bara,1,Sandwich Place,Tram Station,Gourmet Shop,Deli / Bodega,Italian Restaurant,Greek Restaurant,Performing Arts Venue,Ethiopian Restaurant,Fish Market,Fish & Chips Shop
12,Anderlecht,Cureghem Rosée,1,Tram Station,Comedy Club,Snack Place,French Restaurant,Restaurant,Museum,Art Gallery,Hotel,Lebanese Restaurant,Fish Market
17,Molenbeek-Saint-Jean,Quartier Maritime,1,Tram Station,Gym,Convenience Store,Music Venue,Friterie,Chinese Restaurant,Moroccan Restaurant,Italian Restaurant,Discount Store,Deli / Bodega
96,Woluwe-Saint-Pierre,Putdael,1,Tram Station,Park,Sake Bar,French Restaurant,Yoga Studio,Flower Shop,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
105,Ixelles,Boondael,1,Tram Station,Bakery,Plaza,Moving Target,Massage Studio,Train Station,Event Service,Flower Shop,Fish Market,Fish & Chips Shop
108,Uccle,Observatoire,1,Hockey Field,Tram Station,Dive Spot,Caribbean Restaurant,Sporting Goods Shop,Furniture / Home Store,Supermarket,Italian Restaurant,Gym Pool,Himalayan Restaurant
111,Uccle,Kriekenput - Homborch - Verrewinkel,1,Tram Station,African Restaurant,Restaurant,Park,Bus Stop,Train Station,Convenience Store,Indian Restaurant,Gym Pool,Fish Market
113,Uccle,Dieweg,1,Tram Station,Playground,French Restaurant,Convenience Store,Tennis Stadium,Park,Ethiopian Restaurant,Fish Market,Fish & Chips Shop,Fast Food Restaurant
114,Uccle,Kalevoet - Moensberg,1,Tram Station,Supermarket,Café,Friterie,Gym,Gym / Fitness Center,Convenience Store,Gym Pool,Health Food Store,Fish & Chips Shop
117,Forest,Saint-Denis - Neerstalle,1,Tram Station,Italian Restaurant,Theme Park,Park,Soccer Field,Thrift / Vintage Store,Golf Course,Fish & Chips Shop,Fast Food Restaurant,Farmers Market


In [224]:
# Cluster 2
bruxelles_merged.loc[bruxelles_merged['Cluster Labels'] == 2, bruxelles_merged.columns[[1]+[2] + list(range(5, bruxelles_merged.shape[1]))]]

Unnamed: 0,Commune,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,Molenbeek-Saint-Jean,Duchesse,2,Burger Joint,Tram Station,Nightclub,Moving Target,Supermarket,Coffee Shop,Athletics & Sports,Exhibit,Flower Shop,Fish Market
14,Molenbeek-Saint-Jean,Gare de l'ouest,2,Nightclub,Snack Place,Moving Target,Supermarket,Burger Joint,Tram Station,Sandwich Place,Bar,Yoga Studio,Exhibit
16,Koekelberg,Koekelberg,2,Steakhouse,Church,Snack Place,French Restaurant,Pastry Shop,Italian Restaurant,Tram Station,Turkish Restaurant,Supermarket,Plaza
18,Bruxelles,Vieux Laeken Ouest,2,Supermarket,Hookah Bar,Italian Restaurant,Pizza Place,Sandwich Place,Sports Club,Grocery Store,Plaza,Art Museum,Fish Market
22,Schaerbeek,Colignon,2,Supermarket,Gastropub,Spanish Restaurant,Turkish Restaurant,Music Venue,Farmers Market,Beer Store,Electronics Store,Snack Place,Yoga Studio
30,Etterbeek,Saint-Michel,2,Supermarket,Comedy Club,Grocery Store,Sandwich Place,Chinese Restaurant,Thrift / Vintage Store,Laser Tag,Steakhouse,Plaza,Greek Restaurant
51,Anderlecht,Veeweyde - Aurore,2,Supermarket,Gym / Fitness Center,Chinese Restaurant,Pizza Place,Bus Stop,Sandwich Place,Trail,Yoga Studio,Ethiopian Restaurant,Fish & Chips Shop
52,Anderlecht,Bizet - Roue- Ceria,2,Supermarket,Gym / Fitness Center,Chinese Restaurant,Tram Station,Trail,Pizza Place,Sandwich Place,Exhibit,Fair,Falafel Restaurant
57,Anderlecht,Anderlecht - Centre - Wayez,2,Supermarket,Gym / Fitness Center,Chinese Restaurant,Tram Station,Trail,Pizza Place,Sandwich Place,Exhibit,Fair,Falafel Restaurant
61,Molenbeek-Saint-Jean,Machtens,2,Supermarket,Moving Target,Metro Station,Discount Store,Tram Station,Fried Chicken Joint,Spa,Fair,Flower Shop,Fish Market


In [225]:
# Cluster 3
bruxelles_merged.loc[bruxelles_merged['Cluster Labels'] == 3, bruxelles_merged.columns[[1]+[2] + list(range(5, bruxelles_merged.shape[1]))]]

Unnamed: 0,Commune,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bruxelles,Grand Place,3,Chocolate Shop,Concert Hall,Toy / Game Store,Dessert Shop,Plaza,Cheese Shop,Shopping Mall,Hotel,Auto Dealership,Forest
1,Bruxelles,Dansaert,3,Plaza,Bakery,French Restaurant,Sushi Restaurant,Fish & Chips Shop,Bookstore,Bar,Seafood Restaurant,Moroccan Restaurant,Asian Restaurant
2,Bruxelles,Béguinage - Dixmude,3,Yoga Studio,Restaurant,Bookstore,Butcher,Cultural Center,Ethiopian Restaurant,Grocery Store,Organic Grocery,Plaza,Ice Cream Shop
3,Bruxelles,Martyrs,3,Department Store,Clothing Store,Bookstore,Cosmetics Shop,Coffee Shop,Sporting Goods Shop,Seafood Restaurant,Event Service,Fish Market,Fish & Chips Shop
4,Bruxelles,Notre-Dame-aux-Neiges,3,Ice Cream Shop,Bar,Gastropub,Deli / Bodega,Concert Hall,Coffee Shop,Sandwich Place,Smoke Shop,Thai Restaurant,Hotel
7,Bruxelles,Marolles,3,Tapas Restaurant,Wine Bar,Plaza,Record Shop,Restaurant,Farmers Market,Nightclub,Sandwich Place,Belgian Restaurant,Yoga Studio
8,Bruxelles,Stalingrad,3,Bar,French Restaurant,Coffee Shop,Falafel Restaurant,Beer Bar,Brazilian Restaurant,Wine Bar,Art Museum,Diner,Food Truck
9,Bruxelles,Anneessens,3,Plaza,Piercing Parlor,Lebanese Restaurant,Belgian Restaurant,Falafel Restaurant,Mediterranean Restaurant,Coffee Shop,Restaurant,Art Gallery,Beer Bar
15,Molenbeek-Saint-Jean,Molenbeek Historique,3,Bar,Brewery,Cocktail Bar,Bed & Breakfast,Gastropub,Furniture / Home Store,Coffee Shop,Arts & Crafts Store,Music Venue,Fish Market
20,Bruxelles,Quartier Nord,3,Sandwich Place,Thai Restaurant,Indian Restaurant,Spa,Music Venue,Belgian Restaurant,Bar,Bakery,Fair,Falafel Restaurant



Each cluster is named after the characteristics suggested by its most common venues:
<ul>
    <li>Cluster 0 = <b>"Lively"</b> - everything is nearby, but it is not too busy and less touristic</li>
    <li>Cluster 1 = <b>"Green and Sports"</b> - the main focus is on a green environment and the possibility to do sports</li>
    <li>Cluster 2 = <b>"Foods and Quiet"</b> - restaurants, takeaways and convenience shops, but not too busy</li>
    <li>Cluster 3 = <b>"Busy and touristic"</b> - everything is nearby (restaurants, pubs, supermarkets, sports, culture); these are the most hectic areas</li>
</ul>

In [226]:
# Add column with foursquare level
bruxelles_merged.loc[bruxelles_merged['Cluster Labels'] == 0, 'foursquare level'] = 'Lively' 
bruxelles_merged.loc[bruxelles_merged['Cluster Labels'] == 1, 'foursquare level'] = 'Green and Sports'
bruxelles_merged.loc[bruxelles_merged['Cluster Labels'] == 2, 'foursquare level'] = 'Foods and Quiet'
bruxelles_merged.loc[bruxelles_merged['Cluster Labels'] == 3, 'foursquare level'] = 'Busy and touristic'

bruxelles_merged.head()

Unnamed: 0,Code,Commune,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,foursquare level
0,1,Bruxelles,Grand Place,50.846714,4.352514,3,Chocolate Shop,Concert Hall,Toy / Game Store,Dessert Shop,Plaza,Cheese Shop,Shopping Mall,Hotel,Auto Dealership,Forest,Busy and touristic
1,2,Bruxelles,Dansaert,50.850453,4.346755,3,Plaza,Bakery,French Restaurant,Sushi Restaurant,Fish & Chips Shop,Bookstore,Bar,Seafood Restaurant,Moroccan Restaurant,Asian Restaurant,Busy and touristic
2,3,Bruxelles,Béguinage - Dixmude,50.855666,4.350933,3,Yoga Studio,Restaurant,Bookstore,Butcher,Cultural Center,Ethiopian Restaurant,Grocery Store,Organic Grocery,Plaza,Ice Cream Shop,Busy and touristic
3,4,Bruxelles,Martyrs,50.851834,4.356594,3,Department Store,Clothing Store,Bookstore,Cosmetics Shop,Coffee Shop,Sporting Goods Shop,Seafood Restaurant,Event Service,Fish Market,Fish & Chips Shop,Busy and touristic
4,5,Bruxelles,Notre-Dame-aux-Neiges,50.850006,4.363218,3,Ice Cream Shop,Bar,Gastropub,Deli / Bodega,Concert Hall,Coffee Shop,Sandwich Place,Smoke Shop,Thai Restaurant,Hotel,Busy and touristic
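
The four `.loc` assignments can also be expressed as a single dictionary lookup, which keeps the label-to-name mapping in one place. This is only a sketch: `df` is a four-row stand-in for `bruxelles_merged`, built from neighbourhoods and cluster labels that appear in the tables above.

```python
import pandas as pd

# Names as assigned in the notebook's cell
cluster_names = {
    0: "Lively",
    1: "Green and Sports",
    2: "Foods and Quiet",
    3: "Busy and touristic",
}

# Small stand-in for bruxelles_merged
df = pd.DataFrame({
    "Neighbourhood": ["Grand Place", "Sablon", "Dieweg", "Colignon"],
    "Cluster Labels": [3, 0, 1, 2],
})

# Look up each label in the dictionary; unknown labels would become NaN
df["foursquare level"] = df["Cluster Labels"].map(cluster_names)
print(df)
```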


In [227]:
# select required columns
foursquare_required  = bruxelles_merged[["Code", "foursquare level"]]

# merge data - monitoring_crime_data and foursquare_data and geospatial_data
monitoring_crime_foursquare = pd.merge(monitoring_crime, foursquare_required, on = 'Code', how = 'outer')
monitroing_crime_foursquare = pd.merge(district_coordinates, monitoring_crime_foursquare, on = 'Code', how = 'outer')
del monitroing_crime_foursquare['Adresse']

monitroing_crime_foursquare.head()

Unnamed: 0,Code,Latitude,Longitude,Commune,Neighbourhood,Postal Code,monitoring level,Crime level,foursquare level
0,1,50.846714,4.352514,Bruxelles,Grand Place,1000,2 - Higher-priced and well-situated area,very high,Busy and touristic
1,2,50.850453,4.346755,Bruxelles,Dansaert,1000,3 - Below average price and high population area,very high,Busy and touristic
2,3,50.855666,4.350933,Bruxelles,Béguinage - Dixmude,1000,3 - Below average price and high population area,very high,Busy and touristic
3,4,50.851834,4.356594,Bruxelles,Martyrs,1000,2 - Higher-priced and well-situated area,very high,Busy and touristic
4,5,50.850006,4.363218,Bruxelles,Notre-Dame-aux-Neiges,1000,2 - Higher-priced and well-situated area,very high,Busy and touristic


The table above combines all the indicators used in the analysis - <b>monitoring level, crime level and foursquare level</b>. Based on these variables, suitable neighborhoods can be shortlisted.

<h3>Results and Recommendation</h3>

The following selection of suitable neighborhoods is simplified and shaped according to subjective preferences.

<b>a) Monitoring Level</b>
The monitoring level describes neighborhoods based on facts such as population density, rent levels, housing characteristics, metro and green space access. 
<ul>    
    <li>1 - High-priced and upscale area</li>
    <li>2 - Higher-priced and well-situated area</li>
    <li>3 - Below average price and high population area</li>
    <li>4 - Affordable and very densely populated area</li>
</ul> 
For further analysis, only <b>Levels 1 and 2</b> are relevant, as these areas have higher rents but lower population density, are family-oriented, well connected, and offer access to green space.

<b>b) Crime Level</b> The Crime Level is based on the crimes recorded over the last eight years (2012-2019), averaged and divided by the number of inhabitants of each municipality, giving <b>the average number of crimes per 1000 inhabitants per month</b>. The crime level ranges from "very high" to "best", where "best" represents the fewest crimes:
<ul>
    <li>very high </li>
    <li>high </li>
    <li>relatively high  </li>
    <li>ok </li>
    <li>good </li>
    <li>best </li>
</ul>
For further analysis, only the <b>crime levels ok, good and best</b> are relevant. These are areas with fewer than 10 crimes per 1000 inhabitants per month.
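
To make the crime rate described above concrete, here is a small worked example. The formula is an assumption about how "averaged and divided by the number of inhabitants" is computed (total crimes, divided by the number of months, divided by inhabitants, times 1000), and the municipality figures are invented for illustration.

```python
def crimes_per_1000_per_month(total_crimes: float, years: int, inhabitants: int) -> float:
    """Average monthly crimes per 1000 inhabitants over the given period."""
    months = years * 12
    return total_crimes / months / inhabitants * 1000

# Hypothetical municipality: 36,000 recorded crimes in 2012-2019, 50,000 inhabitants
rate = crimes_per_1000_per_month(36_000, years=8, inhabitants=50_000)

# The analysis keeps areas with fewer than 10 crimes per 1000 inhabitants per month
is_relevant = rate < 10
print(round(rate, 2), is_relevant)
```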

<b>c) Foursquare Level</b> The Foursquare Level is based on the 10 most common venue categories near each neighbourhood. Each cluster represents a different main focus and characteristic of the neighbourhood:
<ul>
    <li><b>"Busy and touristic"</b> - everything is nearby (restaurants, pubs, supermarkets, sports, culture); these are the most hectic areas</li>
    <li><b>"Green and Sports"</b> - the main focus is on a green environment and the possibility to do sports</li>
    <li><b>"Foods and Quiet"</b> - restaurants, takeaways and convenience shops, but not too busy</li>
    <li><b>"Lively"</b> - everything is nearby, but it is not too busy and less touristic</li>
</ul>
For further analysis, only three levels are relevant - <b>"Green and Sports", "Foods and Quiet" </b>and<b> "Lively"</b>. The "Busy and touristic" areas are excluded for subjective reasons.

Based on this selection a new data frame is created.

In [228]:
# filter monitoring_level rows
results = monitroing_crime_foursquare[(monitroing_crime_foursquare["monitoring level"] == "1 - High-priced and upscale area") | (monitroing_crime_foursquare["monitoring level"] == "2 - Higher-priced and well-situated area")]

In [229]:
# filter crime_level rows
results = results[(results["Crime level"] == "best") | (results["Crime level"] == "good") | (results["Crime level"] == "ok")]

In [230]:
# filter foursquare_level rows
results = results[(results["foursquare level"] == "Green and Sports") | (results["foursquare level"] == "Foods and Quiet") | (results["foursquare level"] == "Lively")]

results

Unnamed: 0,Code,Latitude,Longitude,Commune,Neighbourhood,Postal Code,monitoring level,Crime level,foursquare level
17,18,50.861169,4.338667,Molenbeek-Saint-Jean,Quartier Maritime,1080,2 - Higher-priced and well-situated area,ok,Green and Sports
29,30,50.842452,4.39751,Etterbeek,Porte Tervueren,1040,2 - Higher-priced and well-situated area,ok,Lively
30,31,50.83189,4.404529,Etterbeek,Saint-Michel,1040,1 - High-priced and upscale area,ok,Foods and Quiet
64,65,50.829735,4.398131,Molenbeek-Saint-Jean,Korenbeek,1080,2 - Higher-priced and well-situated area,ok,Foods and Quiet
65,66,50.86108,4.28576,Berchem-Sainte-Agathe,Potaarde,1082,1 - High-priced and upscale area,good,Foods and Quiet
72,73,50.888936,4.325685,Jette,Heymbosch - AZ-Jette,1090,1 - High-priced and upscale area,good,Lively
78,79,50.87691,4.401511,Evere,Paix,1140,2 - Higher-priced and well-situated area,good,Foods and Quiet
83,84,50.870982,4.407182,Evere,Avenue Léopold III,1140,2 - Higher-priced and well-situated area,good,Lively
84,85,50.860716,4.393047,Schaerbeek,Gare Josaphat,1030,2 - Higher-priced and well-situated area,ok,Foods and Quiet
85,86,50.858486,4.412081,Evere,Paduwa,1140,2 - Higher-priced and well-situated area,good,Foods and Quiet


In [231]:
results.shape

(27, 9)

A total of 27 suitable neighborhoods have been identified.
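
The three filter cells above can also be written as one boolean mask with `isin`, which some readers find easier to adjust when preferences change. A minimal sketch, using four rows copied from the merged table as a stand-in for the full data frame:

```python
import pandas as pd

# Four rows taken from the merged table above, as a small stand-in
df = pd.DataFrame({
    "Neighbourhood": ["Grand Place", "Potaarde", "Paix", "Dansaert"],
    "monitoring level": [
        "2 - Higher-priced and well-situated area",
        "1 - High-priced and upscale area",
        "2 - Higher-priced and well-situated area",
        "3 - Below average price and high population area",
    ],
    "Crime level": ["very high", "good", "good", "very high"],
    "foursquare level": ["Busy and touristic", "Foods and Quiet", "Foods and Quiet", "Busy and touristic"],
})

wanted_monitoring = ["1 - High-priced and upscale area", "2 - Higher-priced and well-situated area"]
wanted_crime = ["ok", "good", "best"]
wanted_foursquare = ["Green and Sports", "Foods and Quiet", "Lively"]

# A row is kept only if it passes all three criteria
mask = (
    df["monitoring level"].isin(wanted_monitoring)
    & df["Crime level"].isin(wanted_crime)
    & df["foursquare level"].isin(wanted_foursquare)
)
selected = df[mask]
print(selected["Neighbourhood"].tolist())
```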

In [232]:
# select required columns
results_map = results[["Code","Latitude","Longitude","Commune","Neighbourhood","Postal Code","monitoring level","foursquare level"]].copy()  # copy so the new column is not set on a slice of results

# add cluster labels
results_map['Cluster Label'] = [5, 2, 6, 3, 2, 6, 5, 6, 4, 5, 6, 2, 4, 2, 2, 3, 4, 3, 1, 2, 2, 2, 2, 2, 4, 5, 5]

# clustering for visualization
kclusters = 6

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster, monito, foursq in zip(results_map['Latitude'], results_map['Longitude'], results_map['Neighbourhood'], results_map['Cluster Label'], results_map['monitoring level'], results_map['foursquare level']):
    label = folium.Popup(str(poi) + ' - ' + str(monito) + ' - ' + str(foursq), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-2],
        fill=True,
        fill_color=rainbow[int(cluster)-2],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

The suitable neighbourhoods in Brussels fall into six groups, colour-coded on the map:
<ul>
    <li> Red: High-priced and upscale area - Foods and Quiet</li>
    <li> Violet: High-priced and upscale area - Green and Sports</li>
    <li> Blue:  High-priced and upscale area - Lively</li>
    <li> Green: Higher-priced and well-situated area - Foods and Quiet</li>
    <li> Light Green: Higher-priced and well-situated area - Green and Sports</li>
    <li> Orange: Higher-priced and well-situated area - Lively</li>
</ul>   
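
The cluster labels for this map are typed in by hand in the cell above. As a possible alternative, a label can be derived from the combination of monitoring level and foursquare level, so that it stays correct if the selection changes. This is a sketch under assumptions: the five rows are copied from the results table, and the numbering produced by `pd.factorize` will not necessarily match the hand-assigned numbers, so the colours may differ from the legend above.

```python
import pandas as pd

# Five rows from the results table above, as a small stand-in for results_map
df = pd.DataFrame({
    "Neighbourhood": ["Quartier Maritime", "Porte Tervueren", "Saint-Michel",
                      "Potaarde", "Heymbosch - AZ-Jette"],
    "monitoring level": [
        "2 - Higher-priced and well-situated area",
        "2 - Higher-priced and well-situated area",
        "1 - High-priced and upscale area",
        "1 - High-priced and upscale area",
        "1 - High-priced and upscale area",
    ],
    "foursquare level": ["Green and Sports", "Lively", "Foods and Quiet",
                         "Foods and Quiet", "Lively"],
})

# One group per (monitoring level, foursquare level) combination
combo = df["monitoring level"] + " - " + df["foursquare level"]

# factorize assigns 0-based codes; sort=True numbers the groups alphabetically
codes, groups = pd.factorize(combo, sort=True)
df["Cluster Label"] = codes + 1
print(df[["Neighbourhood", "Cluster Label"]])
```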

A total of 27 suitable neighbourhoods have been identified.