# Capstone Project : The Battle of Neighborhoods

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction : Business Problem<a name="introduction"></a>

Brussels is the capital and the economic center of Belgium (https://fr.wikipedia.org/wiki/Bruxelles). 

- In 2020, Brussels had more than 185,000 inhabitants, with a density of about 5,510.21 inhabitants/km². As the seat of the French Community and the Flemish Community, as well as of several European Union institutions, Brussels attracts thousands of workers and tourists from all over the world every week.

Because the city is a sought-after place for investors, a friterie chain would like to open a restaurant in an economically attractive area of the city.

In this capstone project I will thus try to find an optimal location to open a **Friterie** in **Brussels**, Belgium.

To this end, I will use Foursquare data on restaurants (and friteries) in Brussels to find the best location for a successful restaurant.

## Data<a name="data"></a>

To find the best area in Brussels, I will use data from Wikipedia to determine Brussels boroughs and **neighborhoods**:

https://fr.wikipedia.org/wiki/Mod%C3%A8le:Palette_Quartiers_de_la_ville_de_Bruxelles

The number of friteries and restaurants and their respective locations in the neighborhoods will be obtained using the **Foursquare API**.

Using this information, I will determine the best neighborhood to open a friterie based on **3 factors**:
* *The number of existing friteries in each neighborhood.*
* *The number of existing restaurants in each neighborhood.*
* *Distance of neighborhood from Brussels center.*


Together, these analyses will give me a clear indication of zones that have **fewer than three restaurants** and **no friteries** within a radius of **500 m**.


#### Library

In [2]:
import requests 
import pandas as pd 
import numpy as np 
import random 

from IPython.display import Image 
from IPython.core.display import HTML 
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0

!pip install geopy
from geopy.geocoders import Nominatim 

! pip install folium==0.5.0
import folium 

print('Libraries are all imported.')

Libraries are all imported.


#### Creating a relevant DataFrame including Brussels' boroughs and neighborhoods

In [3]:
brussels= pd.DataFrame()
path= "C:/Users/Oli/Documents/Formation Data science IBM/Cours 9 - Applied Data Science Capstone/III. The Project/Project BXL/BXL.xlsx"
brussels = pd.read_excel(path)

brussels

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,1000,Center,Centre,50.846718,4.353221
1,1000,Center,Marais-Jacqmain,50.853034,4.357105
2,1000,Center,des Libertés,50.849882,4.366689
3,1000,Center,Royal,50.844352,4.36305
4,1000,Center,Sablon,50.839958,4.356142
5,1000,Center,Marolles,50.83674,4.346251
6,1000,Center,Midi-Lemonnier,50.843217,4.344633
7,1000,Center,Quartier de la Senne,50.849088,4.341938
8,1000,Center,des Quais,50.85421,4.347946
9,1020,Nord,Laeken,50.883392,4.348713


In [4]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(len(brussels['Borough'].unique()), len(brussels['Neighborhood'])))

The dataframe has 4 boroughs and 28 neighborhoods.


Now, I define the coordinates of Brussels center.

In [5]:
address = 'Brussels'

geolocator = Nominatim(user_agent="Brussels_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Brussels are {}, {}.'.format(latitude, longitude))


The geographical coordinates of Brussels are 50.8465573, 4.351697.


In [6]:
brussels_center = [latitude, longitude]
brussels_center

[50.8465573, 4.351697]

Let's visualize **Brussels center** (in black) and each **neighborhood** (in blue).

In [7]:
map_brussels = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(brussels['Latitude'], brussels['Longitude'], brussels['Borough'], brussels['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_brussels) 
    
# add a marker for the center
name = 'Brussels center'
label = folium.Popup(name, parse_html=True)
folium.CircleMarker(
        [latitude, longitude],
        radius=5,
        popup=label,
        color='black',
        fill=True,
        fill_color='black',
        fill_opacity=0.7,
        parse_html=False).add_to(map_brussels)

    
map_brussels

#### Distance between neighborhoods and Brussels center

In [8]:
!pip install pyproj
import pyproj

import math

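# NOTE: UTM zone 33 has its central meridian at 15°E, whereas Brussels (about 4.35°E)
# lies in zone 31. The projected X is therefore negative and distances are slightly distorted.
def lonlat_to_xy(lon, lat):
    # Project WGS84 longitude/latitude (degrees) to UTM X/Y (meters)
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    # Inverse projection: UTM X/Y (meters) back to WGS84 longitude/latitude (degrees)
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]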

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)
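
The projection functions above hard-code `zone=33`, which covers longitudes 12°E to 18°E. As a hedged sketch, the standard UTM zone for a given longitude can be derived with the usual 6-degree rule; `utm_zone` is a hypothetical helper (not part of pyproj), and the special cases around Norway and Svalbard are ignored.

```python
import math

def utm_zone(longitude_deg):
    """Return the standard UTM zone number (1-60) for a longitude in degrees.

    Uses the plain 6-degree rule; the Norway/Svalbard exceptions are ignored.
    """
    return int(math.floor((longitude_deg + 180.0) / 6.0)) % 60 + 1

# Longitude of Brussels center as geocoded earlier in this notebook
print(utm_zone(4.351697))  # → 31
```

Passing this zone to `pyproj.Proj(proj="utm", zone=...)` would slightly change every distance reported in this notebook, so the printed outputs would have to be regenerated.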





In [55]:
print('Brussels center longitude={}, latitude={}'.format(brussels_center[1], brussels_center[0]))
x, y = lonlat_to_xy(brussels_center[1], brussels_center[0])
print('Brussels center UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)

Brussels center longitude=4.351697, latitude=50.8465573
Brussels center UTM X=-248755.06731465342, Y=5686997.581266119




In [10]:
distances_center = []
xs = brussels['Longitude']
ys = brussels['Latitude']

for i in range(0,len(xs)):
    x1 = xs[i]
    y1 = ys[i]
    x_x, y_y = lonlat_to_xy(x1, y1)
    distance_from_center = calc_xy_distance(x, y, x_x, y_y)
    distances_center.append(distance_from_center)


Then, I update the dataframe with the distance information (in meters).

In [11]:
brussels['Distances to center'] = distances_center
brussels

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Distances to center
0,1000,Center,Centre,50.846718,4.353221,109.525738
1,1000,Center,Marais-Jacqmain,50.853034,4.357105,820.291618
2,1000,Center,des Libertés,50.849882,4.366689,1125.97861
3,1000,Center,Royal,50.844352,4.36305,841.821817
4,1000,Center,Sablon,50.839958,4.356142,803.324866
5,1000,Center,Marolles,50.83674,4.346251,1165.039304
6,1000,Center,Midi-Lemonnier,50.843217,4.344633,625.009706
7,1000,Center,Quartier de la Senne,50.849088,4.341938,747.571882
8,1000,Center,des Quais,50.85421,4.347946,897.202954
9,1020,Nord,Laeken,50.883392,4.348713,4129.701476
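
The distances in the table above depend on the chosen map projection. As a cross-check that needs no projection at all, the great-circle distance can be computed with the haversine formula. This is only a sketch: `haversine_m` and `EARTH_RADIUS_M` are names introduced here for illustration, and a spherical Earth is assumed.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (spherical approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

center = (50.8465573, 4.351697)  # Brussels center from the geocoding cell
laeken = (50.883392, 4.348713)   # Laeken row of the table above
print(round(haversine_m(*center, *laeken)))
```

The result should land close to the tabulated value for Laeken; a large gap would point to a projection problem rather than a data problem.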


## Methodology<a name="methodology"></a>

As a first step, I will establish the concentration of friteries and restaurants in the different Brussels neighborhoods.

Second, the analyses will focus on neighborhoods that meet the stakeholders' requirements: neighborhoods that have fewer than three restaurants and no friteries within a radius of 500 m.

Then, the resulting candidate neighborhoods will be clustered to create **areas of interest** for opening a friterie. To this end, I will display a folium map of all clusters of interest (defined using **k-means clustering**) and determine which areas are the best candidates for an optimal friterie location based on the distance between those areas and Brussels center.

## Analysis <a name="analysis"></a>

### Using Foursquare Data to determine the number of friteries and restaurants in each neighborhood.

I now have information about the neighborhoods (i.e. their locations and distances from the center of Brussels).

Let's use the Foursquare API to get information on **friteries** and **restaurants** in each neighborhood.

In [1]:
# API Connection information

#### Exploring friteries in each neighborhood

In [13]:
search_query = 'Friterie'
radius = 500
print(' Ready to search for Friteries ')

 Ready to search for Friteries 


In [14]:
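def NearbyFriterie(names, latitudes, longitudes, radius=500):
    # Search Foursquare for venues matching the global `search_query` within `radius` meters
    # of each neighborhood. CLIENT_ID, CLIENT_SECRET, ACCESS_TOKEN, VERSION and LIMIT
    # are expected to come from the API connection cell above.
    venues_list=[]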
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&oauth_token={}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, lat, lng,ACCESS_TOKEN, VERSION, search_query, radius, LIMIT)
            
        results = requests.get(url).json()['response']['venues']
        
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['name'], 
            v['location']['lat'], 
            v['location']['lng'],  
            v['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)


def NearbyRestaurants(names, latitudes, longitudes, radius=500):
    # Same search as NearbyFriterie, but with a local query term and no category column
    search_query = 'Restaurant'
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&oauth_token={}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, lat, lng,ACCESS_TOKEN, VERSION, search_query, radius, LIMIT)
            
        results = requests.get(url).json()['response']['venues']
        
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['name'], 
            v['location']['lat'], 
            v['location']['lng']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude']
    
    return(nearby_venues)
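
The two functions above differ only in their query term, and both assemble the request URL by string formatting. A minimal sketch of an alternative, assuming placeholder credentials: `build_search_url`, `demo_credentials`, the version string and the default `limit=50` are all hypothetical, and the `oauth_token` parameter of the original request is left out for brevity. `urlencode` takes care of escaping special characters in the query.

```python
from urllib.parse import urlencode

# Endpoint taken from the request URL used in the functions above
SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'

def build_search_url(lat, lng, query, credentials, radius=500, limit=50):
    """Build a venue search URL; `credentials` is a dict of placeholder API settings."""
    params = {
        'client_id': credentials['client_id'],
        'client_secret': credentials['client_secret'],
        'v': credentials['version'],
        'll': '{},{}'.format(lat, lng),
        'query': query,
        'radius': radius,
        'limit': limit,
    }
    return SEARCH_ENDPOINT + '?' + urlencode(params)

# Placeholder credentials, for illustration only
demo_credentials = {'client_id': 'YOUR_ID', 'client_secret': 'YOUR_SECRET', 'version': '20180605'}
friterie_url = build_search_url(50.846718, 4.353221, 'Friterie', demo_credentials)
restaurant_url = build_search_url(50.846718, 4.353221, 'Restaurant', demo_credentials)
print(friterie_url)
```

With a single URL builder, one search function taking the query term as an argument could replace the two near-identical functions.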


In [15]:
brussels_friteries = NearbyFriterie(names=brussels['Neighborhood'], latitudes=brussels['Latitude'],longitudes=brussels['Longitude'])

Centre
Marais-Jacqmain
des Libertés
Royal
Sablon
Marolles
Midi-Lemonnier
Quartier de la Senne
des Quais
Laeken 
Heysel
Mutsaard
Tour et Taxis
Port
Nord
Hoogleest
Neerleest
Haren
Buda
Da Vinci
Neder-Over-Heembeek
Quartier Louise
La Cambre
Roosevelt
Européen
Léopold
des Squares
Cinquantenaire


In [16]:
print(brussels_friteries.shape)
brussels_friteries

(25, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Centre,50.846718,4.353221,Friterie De Corte Frituur,50.849551,4.353236,Friterie
1,Centre,50.846718,4.353221,Friterie Tabora,50.848291,4.351698,Friterie
2,Centre,50.846718,4.353221,Friterie 2000,50.848041,4.351015,Friterie
3,Centre,50.846718,4.353221,Friterie du Café Georgette,50.848458,4.353457,Friterie
4,Marais-Jacqmain,50.853034,4.357105,Friterie Yser / Frituur IJzer,50.856296,4.352687,Friterie
5,Marais-Jacqmain,50.853034,4.357105,Friterie De Corte Frituur,50.849551,4.353236,Friterie
6,Marais-Jacqmain,50.853034,4.357105,Friterie du Café Georgette,50.848458,4.353457,Friterie
7,Royal,50.844352,4.36305,Friterie mobile,50.843589,4.367677,Friterie
8,Marolles,50.83674,4.346251,Friterie Fontainas Frituur,50.832888,4.342578,Friterie
9,des Quais,50.85421,4.347946,Friterie Yser / Frituur IJzer,50.856296,4.352687,Friterie


Some Turkish restaurants were also returned by the friterie search, so we need to drop these rows from our dataframe.

In [17]:
brussels_friteries.drop(brussels_friteries.loc[brussels_friteries['Venue Category']=='Turkish Restaurant'].index, inplace=True)


In [18]:
print(brussels_friteries.shape)
brussels_friteries

(23, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Centre,50.846718,4.353221,Friterie De Corte Frituur,50.849551,4.353236,Friterie
1,Centre,50.846718,4.353221,Friterie Tabora,50.848291,4.351698,Friterie
2,Centre,50.846718,4.353221,Friterie 2000,50.848041,4.351015,Friterie
3,Centre,50.846718,4.353221,Friterie du Café Georgette,50.848458,4.353457,Friterie
4,Marais-Jacqmain,50.853034,4.357105,Friterie Yser / Frituur IJzer,50.856296,4.352687,Friterie
5,Marais-Jacqmain,50.853034,4.357105,Friterie De Corte Frituur,50.849551,4.353236,Friterie
6,Marais-Jacqmain,50.853034,4.357105,Friterie du Café Georgette,50.848458,4.353457,Friterie
7,Royal,50.844352,4.36305,Friterie mobile,50.843589,4.367677,Friterie
8,Marolles,50.83674,4.346251,Friterie Fontainas Frituur,50.832888,4.342578,Friterie
9,des Quais,50.85421,4.347946,Friterie Yser / Frituur IJzer,50.856296,4.352687,Friterie


#### Grouping friteries by neighborhood

In [19]:
fcount = brussels_friteries.groupby('Neighborhood').count()
fcount = fcount.reset_index()
fcount


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Centre,4,4,4,4,4,4
1,Européen,2,2,2,2,2,2
2,La Cambre,2,2,2,2,2,2
3,Laeken,1,1,1,1,1,1
4,Léopold,1,1,1,1,1,1
5,Marais-Jacqmain,3,3,3,3,3,3
6,Marolles,1,1,1,1,1,1
7,Mutsaard,1,1,1,1,1,1
8,Neder-Over-Heembeek,1,1,1,1,1,1
9,Neerleest,2,2,2,2,2,2


In [20]:
nearby_friteries = pd.DataFrame()
nearby_friteries ['Neighborhood']= fcount['Neighborhood']
nearby_friteries ['Friterie in area'] = fcount['Venue']
nearby_friteries

Unnamed: 0,Neighborhood,Friterie in area
0,Centre,4
1,Européen,2
2,La Cambre,2
3,Laeken,1
4,Léopold,1
5,Marais-Jacqmain,3
6,Marolles,1
7,Mutsaard,1
8,Neder-Over-Heembeek,1
9,Neerleest,2


In [21]:
nearby_friteries.shape

(15, 2)
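
The grouped table has only 15 rows because `groupby` keeps only neighborhoods with at least one match; the other neighborhoods receive a count of 0 later, when the tables are merged. The same counting logic can be sketched without pandas. The rows below are copied from the friterie table above, and `venue_rows`, `all_neighborhoods` and `friteries_in_area` are names introduced here for illustration.

```python
from collections import Counter

# (neighborhood, venue) pairs copied from the first rows of the friterie table
venue_rows = [
    ('Centre', 'Friterie De Corte Frituur'),
    ('Centre', 'Friterie Tabora'),
    ('Centre', 'Friterie 2000'),
    ('Centre', 'Friterie du Café Georgette'),
    ('Marais-Jacqmain', 'Friterie Yser / Frituur IJzer'),
    ('Marais-Jacqmain', 'Friterie De Corte Frituur'),
    ('Marais-Jacqmain', 'Friterie du Café Georgette'),
    ('Royal', 'Friterie mobile'),
]
all_neighborhoods = ['Centre', 'Marais-Jacqmain', 'des Libertés', 'Royal', 'Sablon']

counts = Counter(neighborhood for neighborhood, _ in venue_rows)
# A Counter returns 0 for missing keys, so neighborhoods without a match are kept
friteries_in_area = {name: counts[name] for name in all_neighborhoods}
print(friteries_in_area)
```

Note that a venue lying within 500 m of two neighborhoods (such as Friterie De Corte Frituur) is counted once for each of them.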

#### Exploring restaurants in each neighborhood

In [22]:
brussels_restaurants = NearbyRestaurants(names=brussels['Neighborhood'], latitudes=brussels['Latitude'],longitudes=brussels['Longitude'])

Centre
Marais-Jacqmain
des Libertés
Royal
Sablon
Marolles
Midi-Lemonnier
Quartier de la Senne
des Quais
Laeken 
Heysel
Mutsaard
Tour et Taxis
Port
Nord
Hoogleest
Neerleest
Haren
Buda
Da Vinci
Neder-Over-Heembeek
Quartier Louise
La Cambre
Roosevelt
Européen
Léopold
des Squares
Cinquantenaire


In [23]:
print(brussels_restaurants.shape)
brussels_restaurants

(308, 6)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude
0,Centre,50.846718,4.353221,Restaurant NMBS / SNCB,50.846345,4.357042
1,Centre,50.846718,4.353221,Restaurant de l'Ogenblik,50.848300,4.354540
2,Centre,50.846718,4.353221,Restaurant HEMA,50.850345,4.353446
3,Centre,50.846718,4.353221,Restaurant Poissons Grill,50.846563,4.353374
4,Centre,50.846718,4.353221,Radisson Blu Royal Hotel Brussels - Atrium Res...,50.849701,4.356567
...,...,...,...,...,...,...
303,des Squares,50.847222,4.398780,Le Petit Restaurant,50.849866,4.399032
304,des Squares,50.847222,4.398780,Savarin Business Restaurant,50.851174,4.401909
305,des Squares,50.847222,4.398780,Mess,50.853751,4.403044
306,Cinquantenaire,50.841037,4.393492,Ellinikon Restaurant,50.837898,4.397221


#### Grouping restaurants by neighborhood

In [24]:
count = brussels_restaurants.groupby('Neighborhood').count()
count = count.reset_index()
count


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude
0,Centre,50,50,50,50,50
1,Cinquantenaire,2,2,2,2,2
2,Da Vinci,2,2,2,2,2
3,Européen,19,19,19,19,19
4,Haren,1,1,1,1,1
5,Heysel,5,5,5,5,5
6,Hoogleest,1,1,1,1,1
7,La Cambre,3,3,3,3,3
8,Laeken,2,2,2,2,2
9,Léopold,12,12,12,12,12


In [25]:
nearby_restaurants = pd.DataFrame()
nearby_restaurants ['Neighborhood']= count['Neighborhood']
nearby_restaurants ['Restaurant in area'] = count['Venue']
nearby_restaurants

Unnamed: 0,Neighborhood,Restaurant in area
0,Centre,50
1,Cinquantenaire,2
2,Da Vinci,2
3,Européen,19
4,Haren,1
5,Heysel,5
6,Hoogleest,1
7,La Cambre,3
8,Laeken,2
9,Léopold,12


We can now merge our brussels dataframe with the number of friteries and restaurants around each neighborhood.
Note that Buda and Neder-Over-Heembeek have no restaurant within their 500 m radius, so their missing counts are filled with 0.

In [26]:
brussels = pd.merge(brussels, nearby_restaurants, how="outer")
brussels.fillna(0, inplace=True)
brussels = pd.merge(brussels, nearby_friteries, how="outer")
brussels.fillna(0, inplace=True)
brussels

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Distances to center,Restaurant in area,Friterie in area
0,1000,Center,Centre,50.846718,4.353221,109.525738,50.0,4.0
1,1000,Center,Marais-Jacqmain,50.853034,4.357105,820.291618,36.0,3.0
2,1000,Center,des Libertés,50.849882,4.366689,1125.97861,20.0,0.0
3,1000,Center,Royal,50.844352,4.36305,841.821817,14.0,1.0
4,1000,Center,Sablon,50.839958,4.356142,803.324866,20.0,0.0
5,1000,Center,Marolles,50.83674,4.346251,1165.039304,13.0,1.0
6,1000,Center,Midi-Lemonnier,50.843217,4.344633,625.009706,19.0,0.0
7,1000,Center,Quartier de la Senne,50.849088,4.341938,747.571882,19.0,0.0
8,1000,Center,des Quais,50.85421,4.347946,897.202954,16.0,1.0
9,1020,Nord,Laeken,50.883392,4.348713,4129.701476,2.0,1.0


Saving our data locally

In [27]:
brussels.to_excel('C:/Users/Oli/Documents/Formation Data science IBM/Cours 9 - Applied Data Science Capstone/III. The Project/Project BXL/BXL_Processed.xlsx')

Now, I have all information about friteries and restaurants in each neighborhood. 

#### Filtering neighborhoods

Now I **filter** the neighborhoods, keeping only those with no more than two restaurants and no friteries within a radius of 500 meters.

In [28]:
res_count = np.array((brussels['Restaurant in area']<=2))
print('Locations with no more than two restaurants within 500m:', res_count.sum())

fri_count = np.array(brussels['Friterie in area']==0)
print('Locations with no friteries within 500m:', fri_count.sum())

good_locations = np.logical_and(res_count, fri_count)
print('Locations with both conditions met:', good_locations.sum())

good_brussels_locations = brussels[good_locations]


Locations with no more than two restaurants within 500m: 10
Locations with no friteries within 500m: 13
Locations with both conditions met: 7
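
The filter above combines two boolean masks with a logical AND. The same rule can be written as a plain predicate, shown here on a few rows copied from the merged table; `is_candidate` and the two threshold names are hypothetical and only make the stakeholder rule explicit.

```python
# A few rows copied from the merged table: (neighborhood, restaurants, friteries)
rows = [
    ('Centre', 50, 4),
    ('des Libertés', 20, 0),
    ('Laeken', 2, 1),
    ('Tour et Taxis', 1, 0),
    ('Buda', 0, 0),
]

MAX_RESTAURANTS = 2  # "fewer than three restaurants" within 500 m
MAX_FRITERIES = 0    # "no friteries" within 500 m

def is_candidate(restaurants, friteries):
    """True when a neighborhood satisfies both stakeholder conditions."""
    return restaurants <= MAX_RESTAURANTS and friteries <= MAX_FRITERIES

candidates = [name for name, restaurants, friteries in rows
              if is_candidate(restaurants, friteries)]
print(candidates)  # → ['Tour et Taxis', 'Buda']
```

Laeken is rejected despite having only two restaurants because a friterie already lies within its radius, which shows why both conditions are needed.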


In [29]:
good_brussels_locations

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Distances to center,Restaurant in area,Friterie in area
12,1000,Nord,Tour et Taxis,50.866598,4.348435,2255.82664,1.0,0.0
15,1000,Nord,Hoogleest,50.8875,4.409167,6131.36023,1.0,0.0
17,1000,Nord,Haren,50.890944,4.415755,6730.175989,1.0,0.0
18,1000,Nord,Buda,50.90519,4.41415,7916.250241,0.0,0.0
19,1000,Nord,Da Vinci,50.877144,4.4161,5705.641304,2.0,0.0
23,1050,Sud,Roosevelt,50.800556,4.370833,5326.45716,2.0,0.0
27,1040,Est,Cinquantenaire,50.841037,4.393492,3026.59718,2.0,0.0


We now have good candidate locations for opening a new friterie: neighborhoods with no more than two restaurants and no friterie within a radius of 500 m.

#### Clustering

Let us now **cluster** those locations to create **centers of zones containing good locations**.  

In [30]:
from sklearn.cluster import KMeans

number_of_clusters = 3

good_coord = good_brussels_locations[['Longitude', 'Latitude']]
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(good_coord)

kmeans.labels_[0:10]



array([2, 0, 0, 0, 0, 1, 1])
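
The labels returned by k-means can be checked by hand: each point belongs to the cluster whose center is nearest. The sketch below reproduces only this assignment step, not the full algorithm, using the candidate coordinates and the cluster centers printed by `kmeans.cluster_centers_` later in this notebook. `nearest_center` is a hypothetical helper. As in the clustering cell, distances are computed on raw degrees, which treats one degree of longitude like one degree of latitude.

```python
# (longitude, latitude) of the seven candidate neighborhoods, in table order
points = [
    (4.348435, 50.866598),  # Tour et Taxis
    (4.409167, 50.8875),    # Hoogleest
    (4.415755, 50.890944),  # Haren
    (4.41415, 50.90519),    # Buda
    (4.4161, 50.877144),    # Da Vinci
    (4.370833, 50.800556),  # Roosevelt
    (4.393492, 50.841037),  # Cinquantenaire
]

# Cluster centers as printed by kmeans.cluster_centers_ (longitude, latitude)
centers = [
    (4.41379308, 50.89019458),
    (4.38216286, 50.8207962),
    (4.3484355, 50.8665984),
]

def nearest_center(point, centers):
    """Index of the center with the smallest squared Euclidean distance to point."""
    return min(range(len(centers)),
               key=lambda k: (point[0] - centers[k][0]) ** 2 + (point[1] - centers[k][1]) ** 2)

labels = [nearest_center(p, centers) for p in points]
print(labels)  # → [2, 0, 0, 0, 0, 1, 1]
```

With only seven points, the number of clusters strongly shapes the result, so the three areas should be read as a convenient grouping rather than a robust structure.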

In [31]:
good_brussels_locations.insert(0, 'Cluster Labels', kmeans.labels_)
good_brussels_locations

Unnamed: 0,Cluster Labels,Postal Code,Borough,Neighborhood,Latitude,Longitude,Distances to center,Restaurant in area,Friterie in area
12,2,1000,Nord,Tour et Taxis,50.866598,4.348435,2255.82664,1.0,0.0
15,0,1000,Nord,Hoogleest,50.8875,4.409167,6131.36023,1.0,0.0
17,0,1000,Nord,Haren,50.890944,4.415755,6730.175989,1.0,0.0
18,0,1000,Nord,Buda,50.90519,4.41415,7916.250241,0.0,0.0
19,0,1000,Nord,Da Vinci,50.877144,4.4161,5705.641304,2.0,0.0
23,1,1050,Sud,Roosevelt,50.800556,4.370833,5326.45716,2.0,0.0
27,1,1040,Est,Cinquantenaire,50.841037,4.393492,3026.59718,2.0,0.0


Let's visualize the **3 areas of interest** and their distance from **Brussels center.**

In [33]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

In [34]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(number_of_clusters)
ys = [i + x + (i*x)**2 for i in range(number_of_clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(good_brussels_locations['Latitude'], good_brussels_locations['Longitude'], good_brussels_locations['Neighborhood'], good_brussels_locations['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
    
name = 'Brussels center'
label = folium.Popup(name, parse_html=True)
folium.CircleMarker(
        [latitude, longitude],
        radius=5,
        popup=name,
        color='black',
        fill=True,
        fill_color='black',
        fill_opacity=0.7,
        parse_html=False).add_to(map_clusters)


map_clusters

#### Examining Clusters

We now determine which characteristics distinguish the three clusters.

##### Cluster 1 - zero to two restaurants with no friteries

In [35]:
good_brussels_locations.loc[good_brussels_locations['Cluster Labels'] == 0, good_brussels_locations.columns[[3] + list(range(7, good_brussels_locations.shape[1]))]]

Unnamed: 0,Neighborhood,Restaurant in area,Friterie in area
15,Hoogleest,1.0,0.0
17,Haren,1.0,0.0
18,Buda,0.0,0.0
19,Da Vinci,2.0,0.0


##### Cluster 2 - two restaurants with no friteries

In [36]:
good_brussels_locations.loc[good_brussels_locations['Cluster Labels'] == 1, good_brussels_locations.columns[[3] + list(range(7, good_brussels_locations.shape[1]))]]

Unnamed: 0,Neighborhood,Restaurant in area,Friterie in area
23,Roosevelt,2.0,0.0
27,Cinquantenaire,2.0,0.0


##### Cluster 3 - one restaurant with no friteries

In [37]:
good_brussels_locations.loc[good_brussels_locations['Cluster Labels'] == 2, good_brussels_locations.columns[[3] + list(range(7, good_brussels_locations.shape[1]))]]

Unnamed: 0,Neighborhood,Restaurant in area,Friterie in area
12,Tour et Taxis,1.0,0.0


**Cluster 1** and **Cluster 3** seem to be the two most interesting clusters. Let's now look at which cluster is the closest to the Brussels center. 

#### Distance of Cluster 1 and Cluster 3 from Brussels center

In [38]:
cluster_centers = [kmeans.cluster_centers_]
print(cluster_centers)
# KMeans was fitted on ['Longitude', 'Latitude'], so each center is stored as [longitude, latitude]
Cluster1_latlon = [cluster_centers [0][0][0], cluster_centers [0][0][1]]
Cluster3_latlon = [cluster_centers [0][2][0], cluster_centers [0][2][1]]

print('Cluster 1 longitude = {} and latitude = {}'.format(Cluster1_latlon[0], Cluster1_latlon[1]))
print('Cluster 3 longitude = {} and latitude = {}'.format(Cluster3_latlon[0], Cluster3_latlon[1]))



[array([[ 4.41379308, 50.89019458],
       [ 4.38216286, 50.8207962 ],
       [ 4.3484355 , 50.8665984 ]])]
Cluster 1 longitude = 4.413793075 and latitude = 50.890194575
Cluster 3 longitude = 4.3484355 and latitude = 50.8665984


In [42]:
# Named cluster_coords to avoid shadowing the built-in `list`
cluster_coords = {'Cluster': [1,3], 'Latitude': [Cluster1_latlon[1], Cluster3_latlon[1]], 'Longitude':[Cluster1_latlon[0], Cluster3_latlon[0]]}
distances_clusters = pd.DataFrame(cluster_coords)
distances_clusters

Unnamed: 0,Cluster,Latitude,Longitude
0,1,50.890195,4.413793
1,3,50.866598,4.348435


In [54]:
distances_center = []
xss = distances_clusters['Longitude'][0]
yss = distances_clusters['Latitude'][0]

x_x, y_y = lonlat_to_xy(xss, yss)

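# x was overwritten by `x = np.arange(number_of_clusters)` in the map cell (In [34]),
# so it no longer holds the UTM X of Brussels center; re-running In [55] restores it.
print(x, y)

[0 1 2] 5686997.581266119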


In [56]:
cluster_distances_center = []
xss = distances_clusters['Longitude']
yss = distances_clusters['Latitude']

for i in range(0,len(xss)):
    x1 = xss[i]
    y1 = yss[i]
    x_x, y_y = lonlat_to_xy(x1, y1)
    distance_from_center = calc_xy_distance(x, y, x_x, y_y)
    cluster_distances_center.append(distance_from_center)

print('The distance between Cluster 1 and Brussels center is {} whereas the distance between Cluster 3 and Brussels center is {}'.format(cluster_distances_center[0], cluster_distances_center[1]))

The distance between Cluster 1 and Brussels center is 6574.653889235116 whereas the distance between Cluster 3 and Brussels center is 2255.826640335436




## Results and Discussion<a name="results"></a>

Overall, the analyses show that there is a large number of restaurants in Brussels, but the number of friteries is surprisingly small across the neighborhoods. As expected, neighborhoods in the center of Brussels have the greatest number of restaurants and friteries, whereas neighborhoods in the north, east and south have far fewer.

Having established the concentration of friteries and restaurants in the different Brussels neighborhoods, I focused on neighborhoods that have fewer than three restaurants and no friteries within a radius of 500 m. The analyses showed that 7 neighborhoods in the north, east and south meet this criterion.

Then, those candidate neighborhoods were clustered to create **areas of interest**, leaving 3 areas in which a friterie could be opened. A closer examination of those three clusters shows that 2 of them are the best candidates for opening the friterie. When the distance between those two areas and Brussels center is taken into account, **Tour et Taxis** emerges as the optimal candidate.

It is important to note that these analyses rely on predetermined criteria. They should be taken as a starting point for a deeper investigation and thus do not imply that the three zones of interest, and especially Tour et Taxis, are actually optimal locations for a friterie.


## Conclusion <a name="conclusion"></a>

In this project, I wanted to determine in which Brussels neighborhoods it could be optimal to open a new friterie. To address this question, stakeholders provided three main criteria: the neighborhood should have fewer than 3 restaurants and no friteries within a radius of 500 m, and it should be close to Brussels center.

By using Foursquare data to estimate friterie and restaurant concentration in each neighborhood, I identified 7 candidate neighborhoods. A subsequent clustering of those neighborhoods revealed 3 areas that could be interesting for opening a new friterie, with **Tour et Taxis** being the optimal location because of its proximity to Brussels center.

Taken together, this information could be used by stakeholders as a starting point for a deeper exploration of neighborhoods in the three recommended zones, on which the final decision for an optimal friterie location will be based.