# Coursera capstone  : where to install a new bakery in Paris region

### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Methodology](#methodology)
* [Data](#data)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Business problem <a name="introduction"></a>

In this project, we want to find an optimal location to install a new bakery in the paris suburbs.  
There are already a lot of bakery in Paris region, or places that sell bread and pastries, so we want to find a location where there is no bakery in a 500 m perimeter.  
We will use datascience techniques to identify promising neighborhoods regarding this criteria. 


## Methodology <a name="methodology"></a>

First we will use **Foursquare API** to retrieve all **bakeries locations in Paris and Suburbs**, in a 15km distance form Paris Center. 
To do so, as forusquare answer is limited to 50 place,  we will have to create a grid of points, ans interrogate Foursquare for each of theses points.

Second step we be to analyse the neighborhoods by **vizualising** (with a Heatmap) existing bakeries. We can then identify one **promising region**.

Third in this region we look at **every location** ( on each 300m) and calculate for each **the distance to the closest bakery**, and filter the locations to keep only the locations with **no bakery inthe 500m radius**.

Finally we will cluster all the resulting locations ( using **k-means clustering**), to find candidates zones, where our criteria is fullfilled. Thoses candidates location should be a starting point for more detailled analysis.


## Data <a name="data"></a>

Based on the problem definition, insights are :  
* number of existing bakeries in the neighborhood
* distance of he nearest bakery in the neighborhood

We decided to use a regurlarly spaced grid of locations, centered around the city center, to define our neighborhoods.  

Following data sources are used in this projets :
* Foursquare api to retrieve all bakeries in the Paris region
* Google maps api to geocode and reverse geocode adresses to coordinates

In [68]:
import requests # library to handle requests
import pandas as pd # library for data analsysis
import numpy as np # library to handle data in a vectorized manner
import json # library to handle json API responses

#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# to save and import binary files
import pickle  
    
# tranforming json file into a pandas dataframe library
from pandas import json_normalize

#!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library
from folium import plugins
from folium.plugins import HeatMap


import pyproj # to work wth coordinate
from pyproj import CRS , Transformer, Proj

import math # helpfull in calculations
import numpy as np

### Define Foursquare Credentials and Version


#### Make sure that you have created a Google maps API, and  Foursquare developer account and have your credentials handy


In [3]:
# Google API KEY
google_api_key = 'AIzaSyCYpdPAJTZY3SZvuA6zdp9VXB3xrRGXEr4' 

# Foursquare
CLIENT_ID = 'SBDQCUVHPUSYGURUVCO0BKJZJIUP42FS2BRPWZKXIWPQ0HZA' # your Foursquare ID
CLIENT_SECRET = '23IEJUSVPUF54EJ01AQQNCGNTQ5RW4KFUT1MZJW2HCLGF4BX' # your Foursquare Secret
VERSION = '20180724'
LIMIT = 100
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)
print('Google API Key :' + google_api_key)

Your credentails:
CLIENT_ID: SBDQCUVHPUSYGURUVCO0BKJZJIUP42FS2BRPWZKXIWPQ0HZA
CLIENT_SECRET:23IEJUSVPUF54EJ01AQQNCGNTQ5RW4KFUT1MZJW2HCLGF4BX
Google API Key :AIzaSyCYpdPAJTZY3SZvuA6zdp9VXB3xrRGXEr4


#### Centre de Paris

In [4]:
address = '1 parvis Notre Dame, 75004 Paris, France'
latitude = 48.853299
longitude = 2.348726
Paris_lat = latitude
Paris_long = longitude
print(latitude, longitude)

48.853299 2.348726


## Creating a Grid


Our area of interest is about 15km around center Paris. 
Let's create points, with their latitudes and longitudes coordinates,  regularly separated by 1200 m.  
First define 3 functions to tranform longitudes, latitudes to x, y coordinates, and calculate distance

In [7]:
def lonlat_to_xy(lon, lat):
    myProj = Proj("+proj=utm +zone=23K, +ellps=WGS84 +datum=WGS84 +units=m +no_defs")
    xy = myProj(lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    myProj = Proj("+proj=utm +zone=23K, +ellps=WGS84 +datum=WGS84 +units=m +no_defs")
    lonlat = myProj(x,y, inverse=True )
    return lonlat[0], lonlat[1]

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return np.sqrt(dx*dx + dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('Paris center longitude={}, latitude={}'.format(Paris_long, Paris_lat))
x, y = lonlat_to_xy(Paris_long, Paris_lat)
print('Paris center UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Paris center longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
Paris center longitude=2.348726, latitude=48.853299
Paris center UTM X=3873528.9831384164, Y=6584516.716673762
Paris center longitude=2.3487260000000005, latitude=48.85329899999998


In [8]:
Paris_center_x, Paris_center_y = lonlat_to_xy(Paris_long, Paris_lat) # City center in Cartesian coordinates
print('Paris center UTM X={}, Y={}'.format(Paris_center_x, Paris_center_y))

Paris center UTM X=3873528.9831384164, Y=6584516.716673762


Now calculate the grid centers

In [9]:
k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = Paris_center_x - 15000
x_step = 1200
y_min = Paris_center_y - 15000 
y_step = 1200 * k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
i_s = []
j_s = []
for i in range(0, 32):
    y = y_min + i * y_step
    x_offset = 600 if i%2==0 else 0
    for j in range(0, 26):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(Paris_center_x, Paris_center_y, x, y)
        if (distance_from_center <= 15001):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)
            i_s.append(i)
            j_s.append(j)

print(len(latitudes), 'candidate neighborhood centers generated.')

567 candidate neighborhood centers generated.


In [10]:
map_paris = folium.Map(location=[Paris_lat, Paris_long], zoom_start=11)

for lat, lon , x, y,i,j in zip(latitudes, longitudes, xs, ys, i_s,j_s):
#   folium.Circle([lat, lon], radius=300, color='blue', fill=False).add_to(map_paris)
    folium.Marker(location=[lat, lon], popup='i: '+str(i)+' - j: '+str(j)).add_to(map_paris)
map_paris

#### Looking good !  We have a grid of regularly spaces points, with their coordinates.
Let's transform in a dataframe and save as csv file

In [11]:
centres= pd.DataFrame({'latitudes' : latitudes, 'longitudes' : longitudes,'distances_from_center': distances_from_center, 'xs':xs, 'ys': ys, 'i_s': i_s, 'j_s' : j_s })
centres

Unnamed: 0,latitudes,longitudes,distances_from_center,xs,ys,i_s,j_s
0,48.761816,2.235710,15000.000000,3.873529e+06,6.569517e+06,0,12
1,48.794992,2.193618,14968.736936,3.868129e+06,6.570556e+06,1,8
2,48.789030,2.204714,14578.857481,3.869329e+06,6.570556e+06,1,9
3,48.783068,2.215806,14279.463767,3.870529e+06,6.570556e+06,1,10
4,48.777106,2.226895,14076.330682,3.871729e+06,6.570556e+06,1,11
...,...,...,...,...,...,...,...
562,48.939103,2.455469,14098.453567,3.873529e+06,6.598615e+06,28,12
563,48.933104,2.466542,14149.430836,3.874729e+06,6.598615e+06,28,13
564,48.927103,2.477611,14301.272425,3.875929e+06,6.598615e+06,28,14
565,48.921103,2.488677,14550.821042,3.877129e+06,6.598615e+06,28,15


### Now working with Foursquare


In [12]:
def get_venues_near_location(lat, lon, Foursq_boulang, CLIENT_ID, CLIENT_SECRET, radius=500, LIMIT=100):
    
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        CLIENT_ID, CLIENT_SECRET, VERSION, lat, lon, Foursq_boulang, radius, LIMIT)
    try:
        results = requests.get(url).json()['response']['venues']
        venues = [(item['id'],
                   item['name'],
                   get_categories(item['categories']),
                   (item['location']['lat'], item['location']['lng']),
                   format_address(item['location']),
                   item['location']['distance']) for item in results]        
    except:
        venues = []
    return venues

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    address = address.replace(', Deutschland', '')
    address = address.replace(', Germany', '')
    return address

#### 2 blocks of code,  not used each time because forusquare has limits per day

In [None]:
# Let's now go over our neighborhood locations and get nearby restaurants; we'll also maintain a dictionary of all found restaurants and all found italian restaurants
boul = {}
location_boul = []
print('Obtaining venues around candidate locations:', end='')
for lat, lon in zip(latitudes, longitudes):
    # Using radius=350 to meke sure we have overlaps/full coverage so we don't miss any restaurant (we're using dictionaries to remove any duplicates resulting from area overlaps)
    venues = get_venues_near_location(lat, lon, Foursq_boulang, CLIENT_ID, CLIENT_SECRET, 500, 100)
    area_boulang = []
    for venue in venues:
        venue_id = venue[0]
        venue_name = venue[1]
        venue_categories = venue[2]
        venue_latlon = venue[3]
        venue_address = venue[4]
        venue_distance = venue[5]
            
#        x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
        restaurant = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance)
        if venue_distance<=500:
            area_boulang.append(restaurant)
        boul[venue_id] = restaurant
    location_boul.append(area_boulang)
    print(' .', end='')
print(' done.')

In [None]:
    # Let's persists this in local file system
    with open('boulangeries.pkl', 'wb') as f:
        pickle.dump(boul, f)
    with open('location_boulangeries.pkl', 'wb') as f:
        pickle.dump(location_boul, f)

If data were already saved, laod them. If not request foursquare to have all the necessary data

In [13]:
    with open('boulangeries.pkl', 'rb') as f:
        boulangeries = pickle.load(f)
    with open('location_boulangeries.pkl', 'rb') as f:
        location_boulangeries = pickle.load(f)

#### Transform in a dataframe , then show all bakeries on a map

In [64]:
dfboul = pd.DataFrame(boulangeries).transpose()
col = ('id', 'Name', 'latitude', 'longitude', 'adress', 'distance')
dfboul.columns = col
dfboul.drop(columns = ['id','distance'], inplace = True)
dfboul.head()

Unnamed: 0,Name,latitude,longitude,adress
4ecf86ba6da162f1bd663711,Pâtissier Boulanger,48.7961,2.19556,"Chaville, France"
55d5b9b4498ec226b3730eff,Paul,48.7876,2.21999,"Technopôle Bouygues Telecom (RDC), 92190 Meudo..."
5a60a1061108ba710699c1d3,Starbucks,48.7784,2.21629,"2 avenue De L'Europe, 78140 Vélizy-Villacoubla..."
4c4b1578959220a1467e110f,Paul,48.7792,2.21528,"22 Avenue De L'Europe, Centre Commercial Véliz..."
4e6865e5b0fb8e94c7c74cff,Le Fournil d'Émilie,48.7774,2.2304,"1 rond-point du Général Leclerc, 92140 Clamart..."


In [76]:
dfboul.info()

<class 'pandas.core.frame.DataFrame'>
Index: 2821 entries, 4ecf86ba6da162f1bd663711 to 5817560738faf80051e42331
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   Name       2821 non-null   object
 1   latitude   2821 non-null   object
 2   longitude  2821 non-null   object
 3   adress     2821 non-null   object
dtypes: object(4)
memory usage: 110.2+ KB


In [18]:
map_boul = folium.Map(location=[Paris_lat, Paris_long], zoom_start=12)

for lat, lon, name in zip(dfboul.latitude, dfboul.longitude, dfboul.Name):
    folium.Circle([lat, lon], radius=5, color='blue', fill=False).add_to(map_boul)
#    folium.Marker(location=[lat, lon], popup=name).add_to(map_boul)
map_boul

#### Looking Good ! We have all bakeries in Paris and suburbs

## Analysis <a name="analysis"></a>

Now show results on a with a Heatmap

In [65]:
boul_latlon = [[res[2], res[3]] for res in boulangeries.values()]
boul_name = [res[1] for res in boulangeries.values()]


In [66]:
map_heat = folium.Map(location=[Paris_lat, Paris_long], zoom_start=12)
HeatMap(boul_latlon).add_to(map_heat)
for loc, name in zip(boul_latlon,  boul_name):
    folium.Circle(loc, radius=5, color='blue', fill=False).add_to(map_heat)
map_heat

In [67]:
print('boulangeries around location')
print('---------------------------')
for i in range(100, 110):
    rs = location_boulangeries[i][:8]
    names = ', '.join([r[1] for r in rs])
    print('Boulangeries around location {}: {}'.format(i+1, names))

boulangeries around location
---------------------------
Boulangeries around location 101: Au Pas de Saint-Cloud, L'Atelier des Pains, Carrefour City, Boulangerie Monté, Boulangerie Paul
Boulangeries around location 102: La Fromentine, Boulangerie Chesneau, Boulangeire Chesneau, Le Pétrin de Boulogne, Boulangerie Lelan, Le Fournil de Boulogne, Festival des pains, Retrodor
Boulangeries around location 103: Boulangerie Dollé - La Huche Campagnarde, Boulangerie, Les Co'pains Toqués, Paul, Starbucks, Carrefour City, Claude Chatry, Saines Saveurs
Boulangeries around location 104: Carrefour Bio, Le Quartier du Pain, Carrefour City, La Craquante, Les Délices de Boulogne, L'épi De Blé, Caprice des Anges, Aux Délices de Boulogne
Boulangeries around location 105: Boulangerie Jan Geslin, Boulangerie Kharroubi Sahbi, Le Pain d'ici, Au Vieux Pétrin
Boulangeries around location 106: Boulangerie Maxime Ollivier, Aux Délices de Clamart, Maison Lecomte, Boulangerie Brard
Boulangeries around location 10

In [69]:


print('Total number of boulangeries :', len(boulangeries))
print('Average number of boulangeries in neighborhood:', np.array([len(r) for r in location_boulangeries]).mean())

Total number of boulangeries : 2821
Average number of boulangeries in neighborhood: 4.111111111111111


### Resticting the project 

Due to the stakeholder living place, we limit the area of decision to the south of Paris.  
Defining the limits

In [71]:
lat_max = 48.82
lat_min = 48.765
lon_max = 2.421
lon_min = 2.25

lat_sq = (lat_max+lat_min) / 2
lon_sq = (lon_max+lon_min) /2

In [72]:
coord = [[lat_min,lon_min],[lat_max, lon_min],[lat_max,lon_max], [lat_min, lon_max],[lat_min,lon_min]]
coord

[[48.765, 2.25],
 [48.82, 2.25],
 [48.82, 2.421],
 [48.765, 2.421],
 [48.765, 2.25]]

In [73]:
dfboulSud=dfboul[(dfboul['latitude']<lat_max)&(dfboul['latitude']>lat_min)&(dfboul['longitude']<lon_max)&(dfboul['longitude']>lon_min)]
dfboulSud.head()

Unnamed: 0,Name,latitude,longitude,adress
584052cbf59572431df27657,Le Festival des Pains,48.7774,2.25401,"7 rue Marcel Gimond, 92350 Le Plessis-Robinson..."
5a301224947c0515caae8340,Saines Saveurs,48.7773,2.2586,"92350 Le Plessis-Robinson, France"
50cd8b6be4b0f94b3793785b,boulangerie aux mil epis,48.7659,2.25968,France
4fc08432e4b0117b73406ec7,Aux Fins Délices,48.7669,2.28043,France
4b9382bff964a520814634e3,L'Escargot d'Or,48.7811,2.26272,"passage de l'Escargot d'Or, 92350 Le Plessis-R..."


In [74]:
map_1 = folium.Map(location=[lat_sq, lon_sq], zoom_start=12)
#HeatMap(boul_latlon).add_to(map_1)

# les centroids de calcul avec 0 boulangeries
#for lat, lon in latlon9294 : 
#    folium.Circle([lat, lon], radius=50, color='blue', fill=False).add_to(map_cluster)

#ajout de la zone
folium.PolyLine(coord, color='green', fill=False).add_to(map_1)

# ajout des boulangeries
for lat, lon in zip(dfboulSud.latitude, dfboulSud.longitude):
    loc = [lat, lon]
    folium.Circle(loc, radius=5, color='blue', fill=False).add_to(map_1)
    
map_1

We have a zone of interest, with all bakeries currently existing

Creating a more defined grid in the square zone defined

In [28]:
x1,y1 = lonlat_to_xy(lon_min,lat_min )
x2,y2 = lonlat_to_xy(lon_max,lat_min )
x3,y3 = lonlat_to_xy(lon_max,lat_max )
x4,y4 = lonlat_to_xy(lon_min,lat_max )

x_min = min(x1,x2,x3,x4)
x_max = max(x1,x2,x3,x4)
y_min = min(y1,y2,y3,y4)
y_max = max(y1,y2,y3,y4)

print('x are : ', x1,x2,x3,x4)
print('x min : ', x_min, ' x max : ', x_max, ' distance x : ', x_max-x_min)
print('y are : ', y1,y2,y3,y4)
print('y min : ', y_min, ' y max : ', y_max, ' distance y : ', y_max-y_min)

x are :  3874204.387248585 3885332.50513803 3880900.5474434705 3869792.0495771267
x min :  3869792.0495771267  x max :  3885332.50513803  distance x :  15540.455560903065
y are :  6570588.056641696 6579676.344031759 6585084.534904094 6576004.683136744
y min :  6570588.056641696  y max :  6585084.534904094  distance y :  14496.478262398392


In [29]:
k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells

x_step = 300
y_step = 300 * k 

roi_latitudes = []
roi_longitudes = []
roi_xs = []
roi_ys = []
for i in range(0, int((y_max-y_min)/y_step)):
    y = y_min + i * y_step
    x_offset = 50 if i%2==0 else 0
    for j in range(0, int((x_max-x_min)/x_step)):
        x = x_min + j * x_step + x_offset
        lon, lat = xy_to_lonlat(x, y)
        roi_latitudes.append(lat)
        roi_longitudes.append(lon)
        roi_xs.append(x)
        roi_ys.append(y)

print(len(roi_latitudes), 'candidate neighborhood centers generated.')

2805 candidate neighborhood centers generated.


In [30]:
data = {'latitude' : roi_latitudes , 'longitude' :  roi_longitudes , 'x' : roi_xs, 'y' :roi_ys}
centres_sq = pd.DataFrame(data)
centres_sq = centres_sq[(centres_sq['latitude']<lat_max)&(centres_sq['latitude']>lat_min)&(centres_sq['longitude']<lon_max)&(centres_sq['longitude']>lon_min)]
centres_sq.reset_index(drop= True, inplace = True)
centres_sq

Unnamed: 0,latitude,longitude,x,y
0,48.766150,2.252763,3.874292e+06,6.570848e+06
1,48.768978,2.252408,3.874042e+06,6.571108e+06
2,48.767487,2.255178,3.874342e+06,6.571108e+06
3,48.765996,2.257947,3.874642e+06,6.571108e+06
4,48.772304,2.251130,3.873692e+06,6.571367e+06
...,...,...,...,...
1280,48.812629,2.420961,3.881492e+06,6.584358e+06
1281,48.819945,2.412328,3.880342e+06,6.584618e+06
1282,48.818449,2.415092,3.880642e+06,6.584618e+06
1283,48.816953,2.417856,3.880942e+06,6.584618e+06


In [31]:
map_1 = folium.Map(location=[lat_sq, lon_sq], zoom_start=12)


# les centres
for lat, lon in zip(centres_sq.latitude, centres_sq.longitude) :
    folium.Circle([lat, lon], radius=3, color='blue', fill=True).add_to(map_1)

#ajout de la zone
folium.PolyLine(coord, color='green', fill=False).add_to(map_1)

  
map_1

In the defined region, we have a new grid, regularly spaced by 300m.  
Now calculate for each center of this grid, the distance of the nearest bakery

In [56]:
print('Obtaining boulangeries distances ', end='')
boul_dist = []
for c in centres_sq.index : 
    lat_c =  (centres_sq.at[c, 'latitude'])
    lon_c =  (centres_sq.at[c, 'longitude'])
    x_c, y_c = lonlat_to_xy(lon_c,lat_c)
    dist = 30000
    for b in dfboulSud.index :
        lat_b =  (dfboulSud.at[b, 'latitude'])
        lon_b =  (dfboulSud.at[b, 'longitude'])
        x,y = lonlat_to_xy( lon_b,lat_b)
        d = calc_xy_distance(x,y,x_c,y_c)
        if d<=dist:
            dist = d
    boul_dist.append(dist)
    print(c, end=' ')
    

Obtaining boulangeries distances 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 2

In [59]:
centres_sq['boul_dist']=boul_dist
centres_sq

Unnamed: 0,latitude,longitude,x,y,boul_dist
0,48.766150,2.252763,3.874292e+06,6.570848e+06,582.055208
1,48.768978,2.252408,3.874042e+06,6.571108e+06,725.918645
2,48.767487,2.255178,3.874342e+06,6.571108e+06,429.159264
3,48.765996,2.257947,3.874642e+06,6.571108e+06,146.278735
4,48.772304,2.251130,3.873692e+06,6.571367e+06,696.822281
...,...,...,...,...,...
1280,48.812629,2.420961,3.881492e+06,6.584358e+06,638.358759
1281,48.819945,2.412328,3.880342e+06,6.584618e+06,544.290790
1282,48.818449,2.415092,3.880642e+06,6.584618e+06,799.342527
1283,48.816953,2.417856,3.880942e+06,6.584618e+06,922.774324


In [61]:
# save this work in a csv file
centres_sq.to_csv('centres_sq.csv')

In [35]:
# retrieve previous work by loading the csv file
centres_sq = pd.read_csv('centres_sq.csv')
centres_sq.drop(columns='Unnamed: 0', inplace = True)
centres_sq.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1285 entries, 0 to 1284
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   latitude   1285 non-null   float64
 1   longitude  1285 non-null   float64
 2   x          1285 non-null   float64
 3   y          1285 non-null   float64
 4   boul_dist  1285 non-null   float64
dtypes: float64(5)
memory usage: 50.3 KB


#### Results on a map : we will show in red the points where there is a bakery in less than 500m, and in green the points where nearest baekry is more than 500m

In [37]:
map_1 = folium.Map(location=[lat_sq, lon_sq], zoom_start=12)


# les centres
for lat, lon, dist in zip(centres_sq.latitude, centres_sq.longitude, centres_sq.boul_dist) :
    if dist < 500 :
        folium.Circle([lat, lon], radius=3, color='red', fill=True).add_to(map_1)
    if 500 <= dist  :
        folium.Circle([lat, lon], radius=3, color='green', fill=True).add_to(map_1)
        
#ajout de la zone
#folium.PolyLine(coord, color='blue', fill=False).add_to(map_1)

  
map_1

Now filer on points, where distance to the closest bakery is more than 500m

In [38]:
centres_sq_500 = centres_sq[centres_sq['boul_dist']>500]
centres_sq_500

Unnamed: 0,latitude,longitude,x,y,boul_dist
0,48.766150,2.252763,3.874292e+06,6.570848e+06,582.055208
1,48.768978,2.252408,3.874042e+06,6.571108e+06,725.918645
4,48.772304,2.251130,3.873692e+06,6.571367e+06,696.822281
5,48.770813,2.253900,3.873992e+06,6.571367e+06,791.585343
6,48.769321,2.256670,3.874292e+06,6.571367e+06,503.812614
...,...,...,...,...,...
1280,48.812629,2.420961,3.881492e+06,6.584358e+06,638.358759
1281,48.819945,2.412328,3.880342e+06,6.584618e+06,544.290790
1282,48.818449,2.415092,3.880642e+06,6.584618e+06,799.342527
1283,48.816953,2.417856,3.880942e+06,6.584618e+06,922.774324


We find 632 point in the region, where the nearest bakery is farther than 500m

### We will group the points into clusters, to find approximate location candidates for our new bakery

In [39]:
from sklearn.cluster import KMeans
number_of_clusters = 20
latlon500 = centres_sq_500[['latitude','longitude']].values
latlon500

array([[48.76615007,  2.25276269],
       [48.76897839,  2.25240792],
       [48.77230383,  2.25112978],
       ...,
       [48.81844871,  2.41509218],
       [48.81695252,  2.41785626],
       [48.8154563 ,  2.42062011]])

In [40]:
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(latlon500)

In [41]:
candidates = kmeans.cluster_centers_
candidates

array([[48.79111493,  2.3437704 ],
       [48.80219645,  2.38896863],
       [48.80660575,  2.2547466 ],
       [48.77659958,  2.41387493],
       [48.78824641,  2.28683773],
       [48.7716878 ,  2.37997475],
       [48.80947911,  2.34608707],
       [48.80172723,  2.31199043],
       [48.77559227,  2.26253563],
       [48.81159258,  2.40774033],
       [48.77626243,  2.36720796],
       [48.80256608,  2.27620333],
       [48.7789178 ,  2.32523696],
       [48.78504534,  2.35692751],
       [48.78643921,  2.38009946],
       [48.77088944,  2.30166991],
       [48.76925077,  2.3420572 ],
       [48.79313658,  2.41206106],
       [48.77877762,  2.3983775 ],
       [48.80401004,  2.37441064]])

In [42]:
map_1 = folium.Map(location=[lat_sq, lon_sq], zoom_start=12)

        
#ajout de la zone
folium.PolyLine(coord, color='blue', fill=False).add_to(map_1)

# ajout des boulangeries
for lat, lon in zip(dfboulSud.latitude, dfboulSud.longitude):
    loc = [lat, lon]
    folium.Circle(loc, radius=5, color='blue', fill=False).add_to(map_1)

# les centres
for lat, lon, dist in zip(centres_sq_500.latitude, centres_sq_500.longitude, centres_sq_500.boul_dist) :
    if 500 <= dist  :
        folium.Circle([lat, lon], radius=3, color='green', fill=True).add_to(map_1)    
    

for lat, lon in candidates : 
    folium.Circle([lat, lon], radius=500, color='green', fill=False).add_to(map_1)

map_1

Existing bakeries are in blue, points with no bakeries in the 500 m in green dots.  
The green circles are candidates location for a new bakery.

Now put all these in a DataFrame,  and use google reverse geocode to find adresses of the candidates locations

In [47]:
df_cand = pd.DataFrame(candidates)
df_cand.columns = ['latitude','longitude']
df_cand

Unnamed: 0,latitude,longitude
0,48.791115,2.34377
1,48.802196,2.388969
2,48.806606,2.254747
3,48.7766,2.413875
4,48.788246,2.286838
5,48.771688,2.379975
6,48.809479,2.346087
7,48.801727,2.31199
8,48.775592,2.262536
9,48.811593,2.40774


In [60]:
def get_address(api_key, latitude, longitude, verbose=False):
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&latlng={},{}'.format(api_key, latitude, longitude)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        address = results[0]['formatted_address']
        return address
    except:
        return None

    
addr = get_address(google_api_key, Paris_lat, Paris_long)
print('Reverse geocoding check')
print('-----------------------')
print('Address of [{}, {}] is: {}'.format(Paris_lat, Paris_long, addr))

 

Reverse geocoding check
-----------------------
Address of [48.853299, 2.348726] is: 22 Prom. Maurice Carême, 75004 Paris, France


In [62]:
adresses = []
for lat, lon in zip(df_cand.latitude, df_cand.longitude) :
    print('Coordinates are ',lat, lon)
    adr = get_address(google_api_key, lat, lon)
    adresses.append(adr)
adresses    

Coordinates are  48.79111492947919 2.343770400458489
Coordinates are  48.80219645008432 2.3889686260381833
Coordinates are  48.80660575495435 2.2547465952470764
Coordinates are  48.7765995823539 2.4138749260193157
Coordinates are  48.78824640577134 2.2868377296834903
Coordinates are  48.77168780468165 2.379974748561059
Coordinates are  48.80947911276591 2.346087069466697
Coordinates are  48.80172722959299 2.3119904307488306
Coordinates are  48.77559227085536 2.2625356278828743
Coordinates are  48.811592575864594 2.407740332947359
Coordinates are  48.776262430839196 2.367207957514946
Coordinates are  48.802566081576266 2.2762033310136527
Coordinates are  48.77891780483258 2.325236960094856
Coordinates are  48.78504534383665 2.3569275071523985
Coordinates are  48.78643920976434 2.3800994558142694
Coordinates are  48.770889437002495 2.3016699091473263
Coordinates are  48.76925076540425 2.3420571960224335
Coordinates are  48.793136583433906 2.4120610572746988
Coordinates are  48.7787776164

["148 Rue Gabriel Péri, 94230 L'Haÿ-les-Roses, France",
 '16 Rue Jean-Baptiste Renoult, 94200 Ivry-sur-Seine, France',
 '43 Avenue Adolphe Schneider, 92140 Clamart, France',
 'Seine Bridge, A86, 94600 Choisy-le-Roi, France',
 '14 Rue André Neyts, 92260 Fontenay-aux-Roses, France',
 '14 Rue Jean Mermoz, 94320 Thiais, France',
 '30 Rue Labourse, 94250 Gentilly, France',
 '29 Rue de Verdun, 92220 Bagneux, France',
 '47 Rue du Moulin Fidel, 92350 Le Plessis-Robinson, France',
 '371 Quai Henri Pourchassé, 94200 Ivry-sur-Seine, France',
 '72 Avenue de Stalingrad, 94550 Chevilly-Larue, France',
 '98 Rue Pierre Brossolette, 92320 Châtillon, France',
 "25 Rue Mangin, 94240 L'Haÿ-les-Roses, France",
 '7 Rue Guillaume Apollinaire, 94800 Villejuif, France',
 '26 Voie Lancret, 94400 Vitry-sur-Seine, France',
 'Unnamed Road, 92330, 92330 Sceaux, France',
 '9 Avenue de la Croix du Sud, 94550 Chevilly-Larue, France',
 '23 Rue Eugène Hénaff, 94400 Vitry-sur-Seine, France',
 '99 Rue Anselme Rondenay, 94

In [63]:
df_cand['adresses']=adresses
df_cand

Unnamed: 0,latitude,longitude,adresses
0,48.791115,2.34377,"148 Rue Gabriel Péri, 94230 L'Haÿ-les-Roses, F..."
1,48.802196,2.388969,"16 Rue Jean-Baptiste Renoult, 94200 Ivry-sur-S..."
2,48.806606,2.254747,"43 Avenue Adolphe Schneider, 92140 Clamart, Fr..."
3,48.7766,2.413875,"Seine Bridge, A86, 94600 Choisy-le-Roi, France"
4,48.788246,2.286838,"14 Rue André Neyts, 92260 Fontenay-aux-Roses, ..."
5,48.771688,2.379975,"14 Rue Jean Mermoz, 94320 Thiais, France"
6,48.809479,2.346087,"30 Rue Labourse, 94250 Gentilly, France"
7,48.801727,2.31199,"29 Rue de Verdun, 92220 Bagneux, France"
8,48.775592,2.262536,"47 Rue du Moulin Fidel, 92350 Le Plessis-Robin..."
9,48.811593,2.40774,"371 Quai Henri Pourchassé, 94200 Ivry-sur-Sein..."


### We have now 20 candidates locations, with their adresses.  
We know there is no bakery within 500m.

## Results <a name="results"></a>

Our analysis show that there is a big number of bakeries in Paris region (more than 2800). most of them are concentrated in Paris City and in the west Suburb.  

We focus our attention on the south suburb, due to it's proximity to the stakeholder living place, the econonmic and social dynamism, and the low number of existing bakeries. in a rectangle zone, we created a a dense grid of points, spaced by 300m. Thoses points were then filtered by proximity with an existaing bakery, adn we removed all points within 500m of another bakery.

The resulting points were then clustered to create zones of interest. Adresses of centers of thoses zones were generated to be used as starting points for more detailled analysis.

Results of the project is 20 candidates locations to be further analysed, based on existing bakeries in the surrounding. there may be a very good reason to explain why there is no bakery in this neighborhood. So recommended zones should be considered only as starting point for more detailled analysis.


## Conclusion<a name="conclusion"></a>

The goal of the project was to identify location with low number of bakery, in order to help stakeholders define where a new bakery can be installed.  
First we identified all the currently existing bakery in Paris and close suburbs, from foursquare data. Then we choose to focus on South Paris. We generated a large collection of locations, filter them with some requirements regaring existing bakeries. Then by clustering thoses locations, we find some candidates adresses, that can be used for further exploration.  
The decision on the new bakery location can now be made by stakeholders, starting in the recommended zone and taking into considerations other factors, like attractiveness of every location, population, proximity of majors roads or railways station, prices, social environment, etc.