# Creating smarter city districts in Bucharest

## Introduction
### Background
Bucharest, the capital of Romania, is currently divided into 6 administrative units called sectors (“sectoare” in Romanian). Each sector has its own mayor and council, who are responsible for local affairs (for example secondary streets, parks, and cleaning services). The administrative units of Bucharest (called “culori” at that time) were first given their own mayors and councils after World War I. The current sector boundaries date back to August 1979, and there is an incentive to redefine the way Bucharest is divided: each sector encompasses diverse neighborhoods, which translates into diverse needs that are hard for the local administrations to tackle.

### Problem
Dividing a territory into coherent units means taking into account a large number of factors: which venues are present in a district, which services operate there and at what level of quality, and so on. Used correctly, these factors produce territorial divisions that reflect local needs. Organizing the administration of the city into smaller, more representative units is a necessity at this point; the question this initiative raises is how to do so while taking enough factors into account to make the result efficient.
My goal is to use machine learning algorithms to divide the territory of Bucharest more efficiently, with neighborhoods defined by the types of venues present (restaurants, parks, museums, and so on). Basing the divisions on the types of venues found within each territory should give a more accurate picture of locals’ needs.


Import useful libraries and packages

In [2]:
import os
import folium
import json
import requests
from geopy.geocoders import Nominatim
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
import matplotlib.cm as cm
import matplotlib.colors as colors
from folium.plugins import MarkerCluster
import numpy as np
from sklearn.cluster import KMeans
from sklearn.cluster import DBSCAN 

In [3]:
#conda install geopandas

In [4]:
import geopandas as gpd
import matplotlib.pyplot as plt #plotting tool
from matplotlib.patches import RegularPolygon #drawing hexagons
import shapely #to attribute geometric properties for shapes
from shapely.geometry import Polygon
import pyproj
import math

In [5]:
#! pip install geojson

#### Define Bucharest administrative boundaries with geojson files extracted from Open Street Maps

In [6]:
#Bucharest sectors geojson polygon definitions
s1 = "http://polygons.openstreetmap.fr/get_geojson.py?id=7960954&params=0"
s2 = "http://polygons.openstreetmap.fr/get_geojson.py?id=7960953&params=0"
s3 = "http://polygons.openstreetmap.fr/get_geojson.py?id=7960937&params=0"
s4 = "http://polygons.openstreetmap.fr/get_geojson.py?id=7960957&params=0"
s5 = "http://polygons.openstreetmap.fr/get_geojson.py?id=7960956&params=0"
s6 = "http://polygons.openstreetmap.fr/get_geojson.py?id=7960955&params=0"
B = "http://polygons.openstreetmap.fr/get_geojson.py?id=377733&params=0"
sectors = [s1,s2,s3,s4,s5,s6]
geo_json = []

for s in sectors:
    geo_json_data_temp = json.loads(requests.get(s).text)
    geo_json.append(geo_json_data_temp)

Get latitude and longitude for Bucharest

In [7]:
# Use geolocator to determine Bucharest coordinates
address='Bucharest'
geolocator = Nominatim(user_agent="claudia")
location = geolocator.geocode(address)
loc = [location.latitude, location.longitude]
lat = location.latitude
long = location.longitude
print (lat, long)

44.4361414 26.1027202


#### Create GeoDataframe for the administrative limits of Bucharest - will be used later to select the centers of neighborhoods which are within the city's limits

In [8]:
geo_json_data_temp = json.loads(requests.get(B).text)
B_geo = gpd.read_file(B)
B_geo

Unnamed: 0,geometry
0,GEOMETRYCOLLECTION (MULTIPOLYGON (((25.96734 4...


Define a function which creates a grid of points within a given set of coordinates

In [12]:
def get_grid(upper_right, lower_left, n=6):

    all_points = []

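    # n+1 evenly spaced steps per axis; the last step (the upper/right edge) is dropped, giving n*n points
    lat_steps = np.linspace(lower_left[0], upper_right[0], n+1)
    lon_steps = np.linspace(lower_left[1], upper_right[1], n+1)
    for lat in lat_steps[:-1]:
        for lon in lon_steps[:-1]:
            # points are stored as [lon, lat], i.e. x, y order
            point = [lon, lat]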

            all_points.append(point)

    return all_points
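
To get a feel for the resolution of such a grid, the spacing between neighboring points can be converted from degrees to meters. The sketch below is not part of the original analysis: it uses the haversine formula with a mean Earth radius of 6371 km (an approximation), and the helper names `haversine_m` and `grid_spacing_m` are hypothetical. The corner coordinates and `n=25` are the values passed to `get_grid` in this notebook.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, an approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def grid_spacing_m(upper_right, lower_left, n):
    """Approximate north-south and east-west distance between adjacent grid points."""
    lat_step = (upper_right[0] - lower_left[0]) / n
    lon_step = (upper_right[1] - lower_left[1]) / n
    mid_lat = (upper_right[0] + lower_left[0]) / 2
    north_south = haversine_m(lower_left[0], lower_left[1],
                              lower_left[0] + lat_step, lower_left[1])
    # east-west distances shrink with latitude, so measure at the middle of the box
    east_west = haversine_m(mid_lat, lower_left[1],
                            mid_lat, lower_left[1] + lon_step)
    return north_south, east_west


ns, ew = grid_spacing_m([44.5415, 26.2279], [44.33338, 25.96262], n=25)
print(f"north-south: {ns:.0f} m, east-west: {ew:.0f} m")
```

For these corners the spacing is below 1000 m in both directions, so circles of radius 500 m drawn around neighboring points overlap slightly, and a venue near a boundary can be returned for more than one neighborhood.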

Create a grid of points covering Bucharest

In [13]:
top_right = [44.5415,26.2279]
bottom_left = [44.33338,25.96262]

grid = get_grid(top_right, bottom_left, n=25)

grid

[[25.96262, 44.33338],
 [25.9732312, 44.33338],
 [25.9838424, 44.33338],
 [25.9944536, 44.33338],
 [26.0050648, 44.33338],
 [26.015676000000003, 44.33338],
 [26.026287200000002, 44.33338],
 [26.036898400000002, 44.33338],
 [26.0475096, 44.33338],
 [26.0581208, 44.33338],
 [26.068732, 44.33338],
 [26.0793432, 44.33338],
 [26.0899544, 44.33338],
 [26.100565600000003, 44.33338],
 [26.111176800000003, 44.33338],
 [26.121788000000002, 44.33338],
 [26.132399200000002, 44.33338],
 [26.1430104, 44.33338],
 [26.1536216, 44.33338],
 [26.1642328, 44.33338],
 [26.174844, 44.33338],
 [26.185455200000003, 44.33338],
 [26.196066400000003, 44.33338],
 [26.206677600000003, 44.33338],
 [26.217288800000002, 44.33338],
 [25.96262, 44.341704799999995],
 [25.9732312, 44.341704799999995],
 [25.9838424, 44.341704799999995],
 [25.9944536, 44.341704799999995],
 [26.0050648, 44.341704799999995],
 [26.015676000000003, 44.341704799999995],
 [26.026287200000002, 44.341704799999995],
 [26.036898400000002, 44.3417047

#### Create a dataframe from these points and add an indexing variable in order to distinguish them

In [14]:
# Dataframe
griddf = pd.DataFrame(grid, columns=['LON','LAT'])  
#Create index column
griddf['Neighborhood index'] = np.arange(len(griddf))

#Set type of column to string
griddf['Neighborhood index'] = griddf['Neighborhood index'].astype(str)
griddf.dtypes

LON                   float64
LAT                   float64
Neighborhood index     object
dtype: object

In [15]:
griddf

Unnamed: 0,LON,LAT,Neighborhood index
0,25.962620,44.333380,0
1,25.973231,44.333380,1
2,25.983842,44.333380,2
3,25.994454,44.333380,3
4,26.005065,44.333380,4
...,...,...,...
620,26.174844,44.533175,620
621,26.185455,44.533175,621
622,26.196066,44.533175,622
623,26.206678,44.533175,623


Create a geodataframe from the above dataframe and create a new column which verifies if the point is within Bucharest or not

In [16]:
from shapely.geometry import Point, Polygon, MultiPolygon
#create geodataframe
dat_gpd = gpd.GeoDataFrame(griddf, geometry=gpd.points_from_xy(griddf.LON, griddf.LAT))

#define polygon which will be used to verify if a point is within Bucharest or not
polygon = B_geo.geometry[0]

#Create boolean variable
griddf['Within Bucharest'] =  dat_gpd.within(polygon)
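
Conceptually, a within test asks whether a point lies inside a polygon's boundary. The sketch below illustrates the idea with the classic ray-casting rule in plain Python. It is an illustration only, not how shapely or geopandas implement `within`; the function name `point_in_polygon` and the square test polygon are made up for the example.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: count how many polygon edges a horizontal ray from (x, y) crosses.

    `ring` is a list of (x, y) vertices. An odd number of crossings means the
    point is inside. Points exactly on the boundary are not handled specially.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# a made-up square in (lon, lat) order, the same order used for the grid points
square = [(26.0, 44.3), (26.2, 44.3), (26.2, 44.5), (26.0, 44.5)]
print(point_in_polygon(26.1, 44.4, square))  # a point inside the square
print(point_in_polygon(25.9, 44.4, square))  # a point west of the square
```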


In [17]:
a = griddf.groupby('Within Bucharest').count()
a

Unnamed: 0_level_0,LON,LAT,Neighborhood index,geometry
Within Bucharest,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
False,317,317,317,317
True,308,308,308,308


Drop rows which are not inside the city and create a column with lists of coordinates in order to help with putting markers on a map in Folium

In [18]:
#identify the index of points outside the city and drop them from the dataframe
not_bucharest = griddf[ griddf['Within Bucharest'] == False ].index
bucharest_grid = griddf.drop(not_bucharest)

#add column to create a list of the coordinates in order to place them on a map
bucharest_grid["location"] = bucharest_grid[["LAT", "LON"]].values.tolist()


bucharest_grid

Unnamed: 0,LON,LAT,Neighborhood index,geometry,Within Bucharest,location
42,26.143010,44.341705,42,POINT (26.14301 44.34170),True,"[44.341704799999995, 26.1430104]"
43,26.153622,44.341705,43,POINT (26.15362 44.34170),True,"[44.341704799999995, 26.1536216]"
66,26.132399,44.350030,66,POINT (26.13240 44.35003),True,"[44.3500296, 26.132399200000002]"
67,26.143010,44.350030,67,POINT (26.14301 44.35003),True,"[44.3500296, 26.1430104]"
68,26.153622,44.350030,68,POINT (26.15362 44.35003),True,"[44.3500296, 26.1536216]"
...,...,...,...,...,...,...
588,26.100566,44.524850,588,POINT (26.10057 44.52485),True,"[44.5248504, 26.100565600000003]"
610,26.068732,44.533175,610,POINT (26.06873 44.53318),True,"[44.5331752, 26.068732]"
611,26.079343,44.533175,611,POINT (26.07934 44.53318),True,"[44.5331752, 26.0793432]"
612,26.089954,44.533175,612,POINT (26.08995 44.53318),True,"[44.5331752, 26.0899544]"


#### Create a map of Bucharest, with city districts and Circle neighborhoods which will be used for clustering

In [19]:
m = folium.Map([lat, long], zoom_start=12)

#folium.GeoJson(geo_json_data_temp).add_to(m)
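# draw one 500 m circle per grid point; the popup label is read from the same filtered dataframe as the location
for i in range(0,len(bucharest_grid)):
    folium.Circle(bucharest_grid.iloc[i]['location'],radius=500, popup=bucharest_grid.iloc[i]['Neighborhood index']).add_to(m)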
for j in geo_json:
    folium.GeoJson(j).add_to(m)
m

## Get data from Foursquare API for each neighborhood circle

In [36]:
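# Foursquare credentials are read from the environment instead of being hard-coded in the notebook
# (the environment variable names below are an assumption; use whatever names you have set)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']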
VERSION = '20180604'

#### Define a function to get venues inside the neighborhood

In [44]:
#Limit the number of results
LIMIT=50


def getVenues(latitudes, longitudes, neighborhood, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(neighborhood, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name,
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = [ 
                  'Neighborhood index',
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude',
                  'Venue', 
                  'Venue Latitude',
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
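
The list comprehension in `getVenues` indexes `groups[0]` and `categories[0]` directly, which raises an error if a response contains no groups or a venue comes back without a category. A more defensive way to flatten one response is sketched below. The helper name `extract_venues` and the sample payload are hypothetical; the payload only mimics the fields that `getVenues` reads.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one explore-style response into rows, skipping venues without a category."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories", [])
        location = venue.get("location", {})
        if not categories:
            continue  # no category to cluster on, so skip this venue
        rows.append((name, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"),
                     categories[0]["name"]))
    return rows


# made-up payload: one complete venue and one venue lacking a category
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Cafe A", "location": {"lat": 44.43, "lng": 26.10},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Unnamed place", "location": {"lat": 44.44, "lng": 26.11},
               "categories": []}},
]}]}}

rows = extract_venues(sample, "42", 44.4361, 26.1027)
print(rows)
```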

#### Use the above function to get venues inside each circle

In [189]:
bucharest_venues = getVenues(latitudes=bucharest_grid['LAT'],
                             longitudes=bucharest_grid['LON'],
                             neighborhood=bucharest_grid['Neighborhood index'],
                             radius=500
                                  )

42
43
66
67
68
90
91
92
113
114
115
116
117
135
136
137
138
139
140
141
142
157
158
159
160
161
162
163
164
165
166
167
181
182
183
184
185
186
187
188
189
190
191
192
193
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
227
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
380
381
382
383
384
385
386
387
388
389
390
391
392
393
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
453
454
455
456
457
458
459
460


In [190]:
print(bucharest_venues.shape)
bucharest_venues.head()

(3124, 7)


Unnamed: 0,Neighborhood index,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,43,44.341705,26.153622,Baza Steaua Bucuresti(Berceni),44.341298,26.153184,Soccer Field
1,90,44.358354,26.121788,Jumbo,44.361887,26.124094,Toy / Game Store
2,90,44.358354,26.121788,Auchan,44.36126,26.122536,Department Store
3,90,44.358354,26.121788,Hasco Fashion,44.36004,26.123019,Shopping Mall
4,90,44.358354,26.121788,Orange store,44.361326,26.122492,Electronics Store


In [191]:
#export to excel
bucharest_venues.to_excel("bucharest_venues_within.xlsx")  

Drop duplicates, if any

In [192]:
bucharest_venues_clean = bucharest_venues.drop_duplicates(subset=None, keep='first', inplace=False)

In [193]:
bucharest_venues_clean.shape

(3124, 7)

#### Explore the results. There are 302 venue types in our data set

In [194]:
bucharest_venues_clean.groupby("Venue Category").count()

Unnamed: 0_level_0,Neighborhood index,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
ATM,2,2,2,2,2,2
Accessories Store,2,2,2,2,2,2
Airport Terminal,1,1,1,1,1,1
American Restaurant,8,8,8,8,8,8
Amphitheater,2,2,2,2,2,2
...,...,...,...,...,...,...
Waterfront,1,1,1,1,1,1
Wine Bar,10,10,10,10,10,10
Wine Shop,3,3,3,3,3,3
Women's Store,11,11,11,11,11,11


Prepare dataset for analysis - create a dataframe in which each row represents a district. 

One hot encoding venue categories

In [195]:
# one hot encoding
bucharest_onehot = pd.get_dummies(bucharest_venues_clean[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
bucharest_onehot['Neighborhood index'] = bucharest_venues_clean['Neighborhood index'] 

# move neighborhood column to the first column
fixed_columns = [bucharest_onehot.columns[-1]] + list(bucharest_onehot.columns[:-1])
bucharest_onehot = bucharest_onehot[fixed_columns]

bucharest_onehot.head()

Unnamed: 0,Neighborhood index,ATM,Accessories Store,Airport Terminal,American Restaurant,Amphitheater,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,...,Vegetarian / Vegan Restaurant,Veterinarian,Vietnamese Restaurant,Warehouse Store,Water Park,Waterfront,Wine Bar,Wine Shop,Women's Store,Zoo
0,43,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,90,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,90,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,90,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,90,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Group the new dataset by Neighborhood index and take the mean of each one-hot column, which gives the relative frequency of each venue type within each neighborhood

In [201]:

bucharest_grouped = bucharest_onehot.groupby('Neighborhood index').mean().reset_index()
# add location of neighborhood back
bucharest = bucharest_grouped.merge(bucharest_grid, how='left', on='Neighborhood index')
bucharest = bucharest.drop(['geometry','location','Within Bucharest'], axis=1)
bucharest


Unnamed: 0,Neighborhood index,ATM,Accessories Store,Airport Terminal,American Restaurant,Amphitheater,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,...,Vietnamese Restaurant,Warehouse Store,Water Park,Waterfront,Wine Bar,Wine Shop,Women's Store,Zoo,LON,LAT
0,113,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,26.100566,44.366679
1,114,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,26.111177,44.366679
2,115,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,26.121788,44.366679
3,116,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,26.132399,44.366679
4,117,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,26.143010,44.366679
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
267,612,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,26.089954,44.533175
268,613,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,26.100566,44.533175
269,90,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,26.121788,44.358354
270,91,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,26.132399,44.358354
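
The one-hot encoding and the group-wise mean are easier to follow on a tiny table. The sketch below uses made-up venues for two hypothetical neighborhoods, "A" and "B"; only the column names are taken from this notebook.

```python
import pandas as pd

# made-up venues: neighborhood A has a cafe and a park, neighborhood B has one cafe
toy_venues = pd.DataFrame({
    "Neighborhood index": ["A", "A", "B"],
    "Venue Category": ["Cafe", "Park", "Cafe"],
})

# one column per category, with 1 marking the category of each venue
toy_onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=int)
toy_onehot.insert(0, "Neighborhood index", toy_venues["Neighborhood index"])

# the mean of a 0/1 column is the share of venues of that category in the neighborhood
toy_freq = toy_onehot.groupby("Neighborhood index").mean()
print(toy_freq)
```

Because each row is a share rather than a count, a neighborhood with 3 venues and one with 50 venues are described on the same scale, and each row sums to 1.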


### Further processing for DBSCAN - PCA Analysis

In [202]:
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

#select features which will be used for PCA
features = bucharest.drop(['Neighborhood index','LAT','LON'], axis=1)

# Standardize them
x = StandardScaler().fit_transform(features)

#Run PCA with 10 components
pca = PCA(n_components=10)

principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data = principalComponents
             , columns = ['principal component 1', 'principal component 2', 'principal component 3', 'principal component 4', 'principal component 5',
                         'principal component 6', 'principal component 7', 'principal component 8', 'principal component 9', 'principal component 10'
                         ])
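
One way to judge whether 10 components is a reasonable choice is to look at how much variance they retain. The sketch below runs the same scaler and PCA steps on a synthetic random matrix (a stand-in, not the Bucharest data) and prints the cumulative explained variance; the names `toy_features`, `x_toy` and `pca_toy` are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# synthetic stand-in for the neighborhood-by-category frequency table
toy_features = rng.random((60, 30))

x_toy = StandardScaler().fit_transform(toy_features)
pca_toy = PCA(n_components=10).fit(x_toy)

# cumulative share of the total variance captured by the first k components
cumulative = np.cumsum(pca_toy.explained_variance_ratio_)
for k, share in enumerate(cumulative, start=1):
    print(f"{k} components: {share:.2f}")
```

On the real data the same check would be `np.cumsum(pca.explained_variance_ratio_)` after fitting; a low final value would suggest that 10 components discard much of the structure in the venue frequencies.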

Add to the new dataset Latitudes and Longitudes of neighborhoods

In [203]:
principalDf['LAT']=bucharest['LAT']
principalDf['LON']=bucharest['LON']
Bucharest_DBSCAN = principalDf
Bucharest_DBSCAN

Unnamed: 0,principal component 1,principal component 2,principal component 3,principal component 4,principal component 5,principal component 6,principal component 7,principal component 8,principal component 9,principal component 10,LAT,LON
0,-1.188686,0.222272,0.185851,-0.477468,0.412854,0.132388,0.396173,0.589247,-0.067252,0.210014,44.366679,26.100566
1,-1.212115,0.334354,0.317480,-0.415647,0.424840,0.363880,0.297824,0.678941,-0.143923,0.116646,44.366679,26.111177
2,-1.254399,0.131917,0.490719,-0.490620,0.549470,-0.021312,0.510665,0.616766,-0.012034,-0.347121,44.366679,26.121788
3,-1.183395,0.255866,0.326575,-0.437731,0.412390,0.226118,0.367707,0.627006,-0.063849,0.094994,44.366679,26.132399
4,-0.290958,0.028854,0.392725,0.139747,-0.540154,-0.335909,1.080192,0.728426,-0.612113,-0.459225,44.366679,26.143010
...,...,...,...,...,...,...,...,...,...,...,...,...
267,-1.174844,0.238706,0.266944,-0.387348,0.312172,0.204702,0.593901,0.661348,-0.163267,0.176935,44.533175,26.089954
268,-1.426632,0.208617,0.485592,-0.605706,0.623290,0.118338,0.687633,0.776213,-0.630681,0.329806,44.533175,26.100566
269,-1.078632,0.433739,1.240629,-0.869439,0.305050,0.116106,0.918119,1.015051,-0.221664,-0.119576,44.358354,26.121788
270,-1.481491,0.310229,0.788381,-0.605131,0.564863,0.108351,0.577781,0.811739,-0.276436,0.040928,44.358354,26.132399


# Cluster neighborhoods

### KMEANS

In [206]:
# set number of clusters
kclusters = 10

bucharest_grouped_clustering = bucharest_grouped.drop('Neighborhood index', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(bucharest_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 1, 6, 1, 1, 2, 1, 6, 1, 1])
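
The next cell uses a dataframe called `neighborhoods_venues_sorted`, but the cell that builds it does not appear in this notebook. Judging by the column names in the later `dtypes` output ("1st Most Common Venue" to "5th Most Common Venue"), it holds the five most frequent venue categories per neighborhood. The sketch below is one possible reconstruction under that assumption, not the author's original code; the helper name `most_common_venues` is hypothetical and the demo runs on made-up frequencies.

```python
import pandas as pd


def most_common_venues(grouped, index_col="Neighborhood index", top_n=5):
    """Return one row per neighborhood with its top_n most frequent venue categories.

    Assumes `grouped` has at least top_n category columns besides index_col.
    """
    suffixes = {1: "st", 2: "nd", 3: "rd"}
    names = [f"{i}{suffixes.get(i, 'th')} Most Common Venue" for i in range(1, top_n + 1)]
    rows = []
    for idx, freq in grouped.set_index(index_col).iterrows():
        # mergesort is stable, so ties keep the original column order
        top = freq.sort_values(ascending=False, kind="mergesort").index[:top_n].tolist()
        rows.append([idx] + top)
    return pd.DataFrame(rows, columns=[index_col] + names)


# made-up frequency table in the same shape as bucharest_grouped
toy_grouped = pd.DataFrame({
    "Neighborhood index": ["A", "B"],
    "Cafe": [0.6, 0.1],
    "Park": [0.3, 0.2],
    "Gym": [0.1, 0.7],
})
toy_sorted = most_common_venues(toy_grouped, top_n=2)
print(toy_sorted)
```

In the notebook this would be called as `neighborhoods_venues_sorted = most_common_venues(bucharest_grouped)` before the cluster labels are inserted.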

In [207]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

bucharest_merged = bucharest_grid

# join the cluster labels and most common venues onto bucharest_grid, which holds latitude/longitude for each neighborhood
bucharest_merged = bucharest_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood index'), on='Neighborhood index')


In [208]:
# Eliminate neighborhoods which are not clustered; copy so the assignment below does not act on a slice
bucharest = bucharest_merged.dropna(axis=0).copy()

bucharest["Cluster Labels"] = bucharest["Cluster Labels"].astype('int32')
bucharest.dtypes


LON                       float64
LAT                       float64
Neighborhood index         object
geometry                 geometry
Within Bucharest             bool
location                   object
Cluster Labels              int32
1st Most Common Venue      object
2nd Most Common Venue      object
3rd Most Common Venue      object
4th Most Common Venue      object
5th Most Common Venue      object
dtype: object

In [209]:
bucharest.shape

(272, 12)

In [210]:
# create map
map_clusters = folium.Map(location=[lat, long], zoom_start=11)

# set color scheme for the clusters: one color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map (loop variables are named so that they do not overwrite the global lat/long)
for v_lat, v_lon, poi, cluster in zip(bucharest['LAT'], bucharest['LON'], bucharest['Neighborhood index'], bucharest['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.Circle(
        [v_lat, v_lon],
        radius=450,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
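
The number of clusters is fixed at 10 above. One common way to sanity-check such a choice is to compare silhouette scores across several values of k. The sketch below does this on synthetic, well-separated blobs (not the Bucharest data), so the variable names and the range of k are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# three synthetic, well-separated groups of 40 points each
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
toy = np.vstack([c + rng.normal(scale=0.5, size=(40, 2)) for c in centers])

scores = {}
for k in range(2, 7):
    labels_k = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(toy)
    # silhouette is close to 1 when clusters are compact and far apart
    scores[k] = silhouette_score(toy, labels_k)

best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On the real data the same loop would run over `bucharest_grouped_clustering`; venue frequencies are much noisier than these blobs, so the scores are likely to be lower and the best k less clear-cut.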

### DBSCAN

In [232]:
from sklearn.cluster import DBSCAN
import sklearn.utils
from sklearn.preprocessing import StandardScaler

#PREPROCESSING BEFORE CLUSTERING - using the modified dataset for DBSCAN
# (no random seed is set: DBSCAN does not use random initialization)
bucharest_clustering = Bucharest_DBSCAN
Clus_dataSet = bucharest_clustering
Clus_dataSet = np.nan_to_num(Clus_dataSet)
Clus_dataSet = StandardScaler().fit_transform(Clus_dataSet)

# Compute DBSCAN
db = DBSCAN(eps=0.5, min_samples=2, algorithm='ball_tree', metric='minkowski', leaf_size=90, p=2).fit(Clus_dataSet)
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_
Bucharest_DBSCAN["Clus_Db"]=labels

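# number of real clusters, excluding the noise label -1
realClusterNum=len(set(labels)) - (1 if -1 in labels else 0)
# number of distinct labels, including the noise label if present (used to size the color palette)
clusterNum = len(set(labels)) 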


# add neighborhood index back
bucharest_DBSCAN_final = Bucharest_DBSCAN.merge(bucharest_grid, how='left', on=['LAT','LON'])
clusterNum

19
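
The value printed above is `clusterNum`, which counts distinct labels and therefore includes the noise label -1 whenever DBSCAN leaves some points unclustered. The toy example below, on made-up points rather than the Bucharest data, shows how noise appears in `labels_` and how the two counts differ.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# two tight groups of three points and one isolated point
toy = np.array([[0.0, 0.0], [0.0, 0.1], [0.1, 0.0],
                [5.0, 5.0], [5.0, 5.1], [5.1, 5.0],
                [20.0, 20.0]])

toy_labels = DBSCAN(eps=0.5, min_samples=2).fit(toy).labels_

# same counting logic as realClusterNum in the notebook
n_clusters = len(set(toy_labels)) - (1 if -1 in toy_labels else 0)
n_noise = int(np.sum(toy_labels == -1))
print(toy_labels, n_clusters, n_noise)
```

The isolated point has no neighbor within `eps`, so it is labeled -1 and belongs to no cluster. On the map, points labeled -1 should be read as unclustered rather than as a cluster of their own.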

In [233]:
# create map
map_clusters = folium.Map(location=[lat, long], zoom_start=11)

# set color scheme: one color per distinct label, including the noise label -1 if present
colors_array = cm.rainbow(np.linspace(0, 1, clusterNum))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add sector boundaries to the map
for j in geo_json:
    folium.GeoJson(j).add_to(map_clusters)

# add markers to the map (loop variables are named so that they do not overwrite the global lat/long)
for v_lat, v_lon, poi, cluster in zip(bucharest_DBSCAN_final['LAT'], bucharest_DBSCAN_final['LON'], bucharest_DBSCAN_final['Neighborhood index'], bucharest_DBSCAN_final['Clus_Db']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.Circle(
        [v_lat, v_lon],
        radius=450,
        popup=label,
        # the modulo maps the noise label -1 to the last color in the palette
        color=rainbow[cluster % clusterNum],
        fill=True,
        fill_color=rainbow[cluster % clusterNum],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

### Discussion
As one could expect, the newer areas into which Bucharest has expanded in recent years have a distinct composition of local venues, which in turn is likely to be reflected in the needs of their residents. This result points to a need for these areas to be administered more independently, with solutions targeting their specific needs, rather than being included only in umbrella projects aimed at the entire sector.
The algorithm was not able to cluster points in the center of the city, which suggests that venues in central Bucharest tend to be fairly homogeneously distributed. This may tempt us to conclude that central Bucharest does not need to be re-analyzed. However, I argue that it does, with additional information about other aspects that may affect citizens' quality of life.
### Conclusion
Local administrations struggle every year to tackle problems across these large areas and to balance their budgets in a way that is fair to every part of the sector. This problem could be alleviated by creating smaller districts centered around neighborhoods that face similar problems. This small study focuses on the venues available on Foursquare for Bucharest, and makes an argument for rethinking the way Bucharest is administered today.
Future projects could add many other kinds of data in order to create smarter city districts and more efficient local administrations. To really take the needs of a neighborhood's residents into account, structured and unstructured data on water quality, heating conditions, the sewage system, and public transport should be considered when redesigning a city’s administrative units.
