# Applied Data Science Capstone Project (part 2)

## 1. Introduction 
### 1.1. Description of the problem and discussion of the background
The purpose of this project is to determine the best district in San Francisco for someone who wants to live in
the city. The evaluation metrics considered are:
1. Safety, measured by the total number of crimes recorded in each district;
2. Attractiveness, assessed by listing the ten most common venues in each district.

These features were chosen because both weigh heavily on an individual's choice of where to live.

Lastly, the districts are clustered using a procedure based on **HDBSCAN**, so that districts with
similar characteristics (in terms of safety and attractiveness) are assigned the same label.

To briefly summarize:
1. **Business problem**: determining the best district of San Francisco to live in;
2. **Target audience**: individuals planning to relocate to San Francisco;
3. **Methods**: unsupervised clustering of data using the HDBSCAN algorithm.

## 2. Data
### 2.1. Description of the data that will be used to solve the problem and source of such data
The crime dataset used in this work contains the crimes recorded in San Francisco in 2016. It can be freely
downloaded from Kaggle at the following link: https://www.kaggle.com/roshansharma/sanfranciso-crime-dataset.

The crime dataset, displayed in detail in Section **YYY**, is structured as a table. The columns have the following meaning:
1. **IncidntNum**: incident number;
2. **Category**: category of crime;
3. **Descript**: description of the crime;
4. **DayOfWeek**: day of the week on which the crime occurred;
5. **Date**: date on which the crime occurred;
6. **Time**: time at which the crime occurred;
7. **PdDistrict**: police department district;
8. **Resolution**: how the case was resolved (for example, "ARREST, BOOKED" or "NONE");
9. **Address**: address where the crime occurred;
10. **X**: longitude of the crime location;
11. **Y**: latitude of the crime location;
12. **Location**: exact location as a (latitude, longitude) pair;
13. **PdId**: police department ID.
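
The `Location` column repeats `Y` and `X` as a text tuple. A minimal sketch of how such a string could be parsed and checked against the numeric columns, using the values of the first row displayed in Section 3.1.1; the helper name `parse_location` is hypothetical and not part of the notebook.

```python
import ast
import math


def parse_location(text):
    """Parse a '(latitude, longitude)' string into a tuple of two floats."""
    lat, lon = ast.literal_eval(text)
    return float(lat), float(lon)


# Values copied from the first row of the crime table
location = "(37.775420706711, -122.403404791479)"
x, y = -122.403405, 37.775421  # X = longitude, Y = latitude, as displayed

lat, lon = parse_location(location)
# The displayed numeric columns should match the parsed pair to about six decimals
coordinates_agree = math.isclose(lat, y, abs_tol=1e-6) and math.isclose(lon, x, abs_tol=1e-6)
print(lat, lon, coordinates_agree)
```

Note the order: `Location` stores latitude first, whereas `X` is the longitude and `Y` the latitude.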

Lastly, the venues in each district of San Francisco are retrieved from the Foursquare API, using the same procedure
shown in the assignments of the previous weeks.

## 3. Methodology
### 3.1. Preprocessing crime data
#### 3.1.1. Importing libraries and loading the data into a dataframe

In [1]:
import pandas as pd
import os
import numpy as np
import requests
from pytictoc import TicToc
from geopy.geocoders import Nominatim
import re
from sklearn.preprocessing import StandardScaler
import hdbscan
import folium
import json
import geopandas as gpd
from branca.colormap import linear

In [2]:
absolute_path = os.path.abspath(os.path.dirname('Data/'))
df = pd.read_csv(os.path.join(absolute_path, "Crime_SF.csv"))
print(f"Shape of the raw dataframe: {df.shape}")
df.head()

Shape of the raw dataframe: (150500, 13)


Unnamed: 0,IncidntNum,Category,Descript,DayOfWeek,Date,Time,PdDistrict,Resolution,Address,X,Y,Location,PdId
0,120058272,WEAPON LAWS,POSS OF PROHIBITED WEAPON,Friday,01/29/2016 12:00:00 AM,11:00,SOUTHERN,"ARREST, BOOKED",800 Block of BRYANT ST,-122.403405,37.775421,"(37.775420706711, -122.403404791479)",12005827212120
1,120058272,WEAPON LAWS,"FIREARM, LOADED, IN VEHICLE, POSSESSION OR USE",Friday,01/29/2016 12:00:00 AM,11:00,SOUTHERN,"ARREST, BOOKED",800 Block of BRYANT ST,-122.403405,37.775421,"(37.775420706711, -122.403404791479)",12005827212168
2,141059263,WARRANTS,WARRANT ARREST,Monday,04/25/2016 12:00:00 AM,14:59,BAYVIEW,"ARREST, BOOKED",KEITH ST / SHAFTER AV,-122.388856,37.729981,"(37.7299809672996, -122.388856204292)",14105926363010
3,160013662,NON-CRIMINAL,LOST PROPERTY,Tuesday,01/05/2016 12:00:00 AM,23:50,TENDERLOIN,NONE,JONES ST / OFARRELL ST,-122.412971,37.785788,"(37.7857883766888, -122.412970537591)",16001366271000
4,160002740,NON-CRIMINAL,LOST PROPERTY,Friday,01/01/2016 12:00:00 AM,00:30,MISSION,NONE,16TH ST / MISSION ST,-122.419672,37.76505,"(37.7650501214668, -122.419671780296)",16000274071000


#### 3.1.2. Selection of the columns of interest

In [5]:
df2 = pd.DataFrame(df[['PdDistrict', 'Category', 'X', 'Y']])
df2.sort_values(by=['PdDistrict', 'Category'], inplace=True)
df2.reset_index(drop=True, inplace=True)
df2.tail()

Unnamed: 0,PdDistrict,Category,X,Y
150495,TENDERLOIN,WEAPON LAWS,-122.411966,37.784914
150496,TENDERLOIN,WEAPON LAWS,-122.412054,37.781614
150497,TENDERLOIN,WEAPON LAWS,-122.416711,37.783357
150498,TENDERLOIN,WEAPON LAWS,-122.416711,37.783357
150499,,LARCENY/THEFT,-122.413352,37.708202


#### 3.1.3. Removing the last row, which contains a NaN value

In [8]:
df2.drop(df2.tail(1).index, inplace=True)
df2.tail()

Unnamed: 0,PdDistrict,Category,X,Y
150493,TENDERLOIN,WEAPON LAWS,-122.409661,37.786439
150494,TENDERLOIN,WEAPON LAWS,-122.409661,37.786439
150495,TENDERLOIN,WEAPON LAWS,-122.411966,37.784914
150496,TENDERLOIN,WEAPON LAWS,-122.412054,37.781614
150497,TENDERLOIN,WEAPON LAWS,-122.416711,37.783357
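
Dropping the last row works here because `sort_values` places missing values at the end by default. A sketch of a position-independent alternative on a toy frame (the rows are illustrative and not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Toy frame with one missing district (illustrative values only)
toy = pd.DataFrame({
    "PdDistrict": ["SOUTHERN", np.nan, "BAYVIEW"],
    "Category": ["ASSAULT", "LARCENY/THEFT", "ARSON"],
})

# Drop every row whose district is missing, wherever it appears
clean = toy.dropna(subset=["PdDistrict"]).reset_index(drop=True)
print(clean)
```

This variant would also cope with several missing districts, or with a different sort order.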


#### 3.1.4. Crime dataset, general information

In [9]:
print(f"Current shape of the dataframe: {df2.shape}")
print("-------------------------------------------")
df2.info()
# Total number of crimes in each Police district
print("-------------------------------------------")
print("District      Number of crimes\n")
print(df2['PdDistrict'].value_counts())
print("-------------------------------------------")
# Total number of crimes per category
print("Category\t\t       Number of occurrences\n")
print(df2['Category'].value_counts())
print("-------------------------------------------")
print(f"Number of Police districts = {len(df2['PdDistrict'].unique())}.")
print(f"Number of different crimes = {len(df2['Category'].unique())}.")

Current shape of the dataframe: (150498, 4)
-------------------------------------------
<class 'pandas.core.frame.DataFrame'>
Int64Index: 150498 entries, 0 to 150497
Data columns (total 4 columns):
 #   Column      Non-Null Count   Dtype  
---  ------      --------------   -----  
 0   PdDistrict  150498 non-null  object 
 1   Category    150498 non-null  object 
 2   X           150498 non-null  float64
 3   Y           150498 non-null  float64
dtypes: float64(2), object(2)
memory usage: 5.7+ MB
-------------------------------------------
District      Number of crimes

SOUTHERN      28445
NORTHERN      20100
MISSION       19503
CENTRAL       17666
BAYVIEW       14303
INGLESIDE     11594
TARAVAL       11325
TENDERLOIN     9941
RICHMOND       8922
PARK           8699
Name: PdDistrict, dtype: int64
-------------------------------------------
Category		       Number of occurrences

LARCENY/THEFT                  40408
OTHER OFFENSES                 19599
NON-CRIMINAL               

#### 3.1.5. Pivoting the table to observe the number of crimes for each category in each district 

In [10]:
temp = pd.DataFrame(df2)
temp.insert(2, "", np.ones(df2.shape[0]))

In [11]:
df2_pivot = pd.pivot_table(temp, values=[""], index=['Category'], columns=['PdDistrict'], aggfunc=np.sum, fill_value=0)
df2_pivot['Total'] = df2_pivot.sum(axis=1)
df2_pivot.columns = df2_pivot.columns.map(''.join)
df2_pivot.head()

Category,BAYVIEW,CENTRAL,INGLESIDE,MISSION,NORTHERN,PARK,RICHMOND,SOUTHERN,TARAVAL,TENDERLOIN,Total
ARSON,71,29,22,46,27,13,14,33,18,13,286
ASSAULT,1775,1187,1506,2110,1536,524,473,2352,918,1196,13577
BAD CHECKS,4,3,2,2,4,2,5,6,6,0,34
BRIBERY,20,3,8,10,4,1,2,8,4,6,66
BURGLARY,521,645,534,793,803,413,395,842,695,161,5802
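
The same category-by-district table could also be built without a helper column of ones. A minimal sketch on toy data using `pd.crosstab` and `groupby(...).size()`; the rows are illustrative, and this is an alternative formulation rather than the approach used in the notebook.

```python
import pandas as pd

# Toy incidents (illustrative values only)
toy = pd.DataFrame({
    "PdDistrict": ["BAYVIEW", "BAYVIEW", "CENTRAL", "CENTRAL", "CENTRAL"],
    "Category": ["ARSON", "ASSAULT", "ASSAULT", "ASSAULT", "BURGLARY"],
})

# Rows: crime categories, columns: districts, cells: number of incidents
counts = pd.crosstab(toy["Category"], toy["PdDistrict"])
counts["Total"] = counts.sum(axis=1)

# Number of incidents per district, regardless of category
per_district = toy.groupby("PdDistrict").size()

print(counts)
print(per_district)
```

Combinations that never occur are filled with 0, which matches the effect of `fill_value=0` in `pd.pivot_table`.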


### 3.2. Location data retrieval from Foursquare API
#### 3.2.1. Definition of Foursquare credentials

In [12]:
# Credentials are read from environment variables instead of being hard-coded and printed;
# the variable names below are an assumption and must be set before running this cell
ID = os.environ['FOURSQUARE_CLIENT_ID']
secret = os.environ['FOURSQUARE_CLIENT_SECRET']
version = '20200401'
limit = 500
print("Foursquare credentials loaded.")

Foursquare credentials loaded.


#### 3.2.2. San Francisco Police dept. addresses (available on: https://sfgov.org/policecommission/police-district-maps), used as "districts' center points"

In [13]:
address_dict = {
    'Bayview': '201 Williams Avenue',
    'Central': '766 Vallejo Street',
    'Ingleside': 'Havelock St',
    'Mission': '630 Valencia Street',
    'Northern': '1125 Fillmore Street',
    'Park': '1899 Waller Street',
    'Richmond': '461 6th Avenue',
    'Southern': '1251 3rd Street',
    'Taraval': '2345 24th Avenue',
    'Tenderloin': '301 Eddy Street',
}
# Coordinates of the districts
geo_locator = Nominatim(user_agent="my_username")
SFPD_lat = []
SFPD_lon = []
for address in address_dict.values():
    loc = geo_locator.geocode(address + ", San Francisco, CA, 94122")
    SFPD_lat.append(loc.latitude)
    SFPD_lon.append(loc.longitude)
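
`Nominatim.geocode` returns `None` when it finds no match, in which case reading `loc.latitude` raises an `AttributeError` that does not say which address failed. A sketch of a more defensive loop, written against any geocoding callable so that it can be tried without network access; `geocode_districts`, `fake_geocode` and the demo address are hypothetical names and values introduced for illustration.

```python
from types import SimpleNamespace


def geocode_districts(addresses, geocode, suffix=", San Francisco, CA"):
    """Return {district: (latitude, longitude)}, failing loudly on unresolved addresses.

    `geocode` is any callable that returns an object with `latitude` and
    `longitude` attributes, or None when the address is not found.
    """
    coordinates = {}
    for district, street in addresses.items():
        location = geocode(street + suffix)
        if location is None:
            raise LookupError(f"No geocoding result for {district}: {street!r}")
        coordinates[district] = (location.latitude, location.longitude)
    return coordinates


def fake_geocode(query):
    """Stand-in geocoder for demonstration; it knows a single made-up address."""
    known = {"1 Example Street, San Francisco, CA": SimpleNamespace(latitude=37.0, longitude=-122.0)}
    return known.get(query)


print(geocode_districts({"Demo": "1 Example Street"}, fake_geocode))
```

With geopy, the `geocode` argument would be the bound method `geo_locator.geocode`.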

#### 3.2.3. Extraction of San Francisco venues for each district in the database

In [14]:
# This function retrieves the nearby venues for a given district in San Francisco
def getNearbyVenues(names, LAT, LON, radius=3500):
    venues_list=[]
    for i, j, k in zip(names, LAT, LON):
        #  API request URL
        url = f"https://api.foursquare.com/v2/venues/explore?&client_id={ID}&client_secret={secret}&v={version}" \
              f"&ll={j},{k}&radius={radius}&limit={limit}"
        # GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        # Return only relevant information for each nearby venue
        venues_list.append([(i, j, k, v['venue']['name'], v['venue']['location']['lat'], 
            v['venue']['location']['lng'], v['venue']['categories'][0]['name']) for v in results])
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude', 'Venue',
                             'Venue Latitude', 'Venue Longitude', 'Venue Category']
    return nearby_venues
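
The list comprehension in `getNearbyVenues` assumes that every venue has at least one category. A sketch of the same flattening step as a separate function that tolerates an empty category list; the payload is hand-written to mimic only the fields accessed above, its values are made up, and whether real responses contain venues without categories is an assumption.

```python
def extract_venues(district, payload):
    """Flatten an explore-style payload into (district, name, lat, lng, category) rows."""
    items = payload["response"]["groups"][0]["items"]
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Venues without a category would break categories[0]; label them instead
        category = categories[0]["name"] if categories else "Uncategorized"
        rows.append((district, venue["name"], venue["location"]["lat"],
                     venue["location"]["lng"], category))
    return rows


# Hand-written payload (made-up values)
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Cafe A", "location": {"lat": 37.76, "lng": -122.42},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Spot B", "location": {"lat": 37.77, "lng": -122.41},
               "categories": []}},
]}]}}

for row in extract_venues("MISSION", sample):
    print(row)
```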

In [15]:
t = TicToc()
t.tic()
venues = getNearbyVenues(names=df2['PdDistrict'].unique(), LAT=SFPD_lat, LON=SFPD_lon)
t.toc("All venues successfully extracted. Elapsed time:")

All venues successfully extracted. Elapsed time: 7.567211 seconds.


#### 3.2.4. General information about the extracted location database + overview of the different venue types

In [16]:
print(f"Shape of the venues dataframe: {venues.shape}")
print(f"There are {len(venues['Venue Category'].unique())} different types of venues.")

Shape of the venues dataframe: (1000, 7)
There are 181 different types of venues.


#### 3.2.5. One-hot encoding of venues

In [17]:
venues_df = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="")
venues_df['PdDistrict'] = venues['Neighborhood']
pos = [venues_df.columns[-1]] + list(venues_df.columns[:-1])
venues_df = venues_df[pos]

In [18]:
# Rows are then grouped by district, and the value for each category is the total number of occurrences
# of that type of venue
venues_df = venues_df.groupby('PdDistrict').sum().reset_index()
print(f"Original dataframe shape: {venues.shape}.")
print(f"Transformed dataframe shape: {venues_df.shape}.")
venues_df.head(10)

Original dataframe shape: (1000, 7).
Transformed dataframe shape: (10, 182).


Unnamed: 0,PdDistrict,Accessories Store,Adult Boutique,African Restaurant,American Restaurant,Aquarium,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,...,Trail,Turkish Restaurant,Udon Restaurant,Vietnamese Restaurant,Waterfall,Wine Bar,Wine Shop,Yoga Studio,Zoo,Zoo Exhibit
0,BAYVIEW,0,0,1,1,0,0,0,1,0,...,0,1,0,1,0,0,0,2,0,0
1,CENTRAL,0,0,0,0,0,0,0,0,3,...,2,0,0,1,0,6,1,2,0,1
2,INGLESIDE,0,0,0,0,0,0,0,0,0,...,1,1,0,1,0,1,1,2,0,0
3,MISSION,0,1,0,0,0,0,1,0,0,...,0,0,0,0,0,4,0,8,0,0
4,NORTHERN,0,1,0,1,0,0,0,0,1,...,1,0,0,0,0,3,0,5,0,0
5,PARK,1,0,0,0,0,1,0,1,1,...,1,0,0,0,1,0,0,2,0,0
6,RICHMOND,1,0,0,0,1,1,0,0,1,...,1,0,0,1,1,1,0,2,0,0
7,SOUTHERN,0,0,0,0,0,0,0,1,2,...,0,0,0,2,0,0,3,2,0,0
8,TARAVAL,0,0,0,0,0,0,0,0,0,...,2,0,1,2,0,1,1,1,1,0
9,TENDERLOIN,0,0,0,1,0,0,0,0,3,...,0,0,0,1,0,6,2,2,0,0
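
One-hot encoding followed by a group-wise sum amounts to counting how often each venue category occurs in each district. A small sketch of that equivalence with the standard library, on made-up rows:

```python
from collections import Counter, defaultdict

# (district, venue category) pairs; illustrative values only
rows = [
    ("MISSION", "Yoga Studio"),
    ("MISSION", "Wine Bar"),
    ("MISSION", "Yoga Studio"),
    ("PARK", "Trail"),
]

counts = defaultdict(Counter)
for district, category in rows:
    counts[district][category] += 1

categories = sorted({category for _, category in rows})
print(categories)
for district in sorted(counts):
    # One row per district, one column per category, as in venues_df
    print(district, [counts[district][category] for category in categories])
```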


### 3.3. Construction of the final dataset
#### 3.3.1. Extraction of relevant features from a set of keywords that group together similar venues

In [19]:
t.tic()
df_final = pd.DataFrame(df2['PdDistrict'].unique(), columns=['PdDistrict'])
df_final['Latitude'] = SFPD_lat
df_final['Longitude'] = SFPD_lon
# Crimes per district: sum of the helper column of ones (named '') inserted in cell 10
df_final['Number of Crimes'] = df2.groupby(['PdDistrict']).sum()[''].values

# Each feature is the number of venues whose category name contains one of the keywords (case-insensitive)
keywords = {
    'Restaurants': "Restaurant|Burger|Burrito|Food|Pizza|Sandwich",
    'Stores/Shops': "Store|Shop|Bookstore|Boutique",
    'Groceries': "Grocery|Bakery|Breakfast|Butcher|Market",
    'Sports Facilities': "Stadium|Gym|Dance|Cycle|Golf|Massage|Playground|Pool|Spa|Trail|Yoga",
    'Entertainment/Culture': "Art|Gallery|Museum|Hall|Exhibit|Theater|Opera|Venue",
    'Landscape': "Beach|Garden|Farm|Field|Hill|Lake|Park|Lookout",
    'Other': "Dog|Marijuana|Hotel",
}
for feature, pattern in keywords.items():
    mask = venues_df.columns.str.contains(pattern, flags=re.IGNORECASE, regex=True)
    df_final[feature] = venues_df.loc[:, mask].sum(axis=1).values
t.toc()

Elapsed time is 0.042144 seconds.
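
Because `str.contains` looks for each keyword anywhere in the column name, short keywords can match inside unrelated words, and one category can be counted in several groups. A minimal sketch of the effect with the `re` module; the category names are chosen for illustration and may not all occur among the retrieved venues, and the word-boundary variant is a suggestion rather than what the notebook does.

```python
import re

names = ["Art Gallery", "Department Store", "Spa", "Spanish Restaurant"]


def matching(pattern, names):
    """Return the names matched by a case-insensitive regular expression."""
    return [name for name in names if re.search(pattern, name, flags=re.IGNORECASE)]


# Substring matching: "art" also occurs inside "Department"
print(matching("Art", names))
# Word boundaries restrict the match to whole words
print(matching(r"\bArt\b", names))

print(matching("Spa", names))
print(matching(r"\bSpa\b", names))
```

Whether stricter matching changes the feature counts depends on the categories actually returned by the API.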


#### 3.3.2. Displaying the final dataset 

In [20]:
df_final

Unnamed: 0,PdDistrict,Latitude,Longitude,Number of Crimes,Restaurants,Stores/Shops,Groceries,Sports Facilities,Entertainment/Culture,Landscape,Other
0,BAYVIEW,37.729978,-122.398246,14303.0,30,20,8,12,4,10,2
1,CENTRAL,37.798769,-122.409932,17666.0,24,25,5,8,8,10,4
2,INGLESIDE,37.726698,-122.446569,11594.0,36,24,9,8,0,15,2
3,MISSION,37.762997,-122.421984,19503.0,21,32,8,19,7,3,1
4,NORTHERN,37.780146,-122.432471,20100.0,20,24,11,15,9,5,2
5,PARK,37.767771,-122.455166,8699.0,19,26,9,8,8,21,2
6,RICHMOND,37.76046,-122.46286,8922.0,18,23,7,9,5,26,1
7,SOUTHERN,37.772236,-122.389044,28445.0,16,29,8,20,9,4,4
8,TARAVAL,37.743731,-122.481459,11325.0,31,27,11,10,1,14,1
9,TENDERLOIN,37.783675,-122.412919,9941.0,21,28,5,13,12,5,3


### 3.4. Unsupervised clustering using HDBSCAN algorithm
#### 3.4.1. Building the feature matrix "X"

In [21]:
# Feature matrix
X = df_final.drop(columns=['PdDistrict', 'Latitude', 'Longitude'])
# Z-score normalization of features
scale = StandardScaler()
X = scale.fit_transform(X)
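
`StandardScaler` rescales each column to zero mean and unit variance, using the population standard deviation. A sketch of the same computation for a single column with the standard library, applied to the crime counts listed in Section 3.3.2:

```python
from statistics import fmean, pstdev

# Number of crimes per district, in the row order of df_final
crimes = [14303, 17666, 11594, 19503, 20100, 8699, 8922, 28445, 11325, 9941]

mean = fmean(crimes)
std = pstdev(crimes)  # population standard deviation (divides by n)
z_scores = [(value - mean) / std for value in crimes]

print([round(z, 2) for z in z_scores])
```

After scaling, every feature contributes on a comparable scale to the Euclidean distances used by the clustering step, so the crime counts do not dominate the venue counts.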

#### 3.4.2. Building the HDBSCAN clustering object "cl_obj"

In [22]:
# Clusters may contain as few as two districts
cl_obj = hdbscan.HDBSCAN(min_cluster_size=2, min_samples=1)

#### 3.4.3. Clustering data

In [23]:
cl_obj.fit(X)

HDBSCAN(algorithm='best', allow_single_cluster=False, alpha=1.0,
        approx_min_span_tree=True, cluster_selection_epsilon=0.0,
        cluster_selection_method='eom', core_dist_n_jobs=4,
        gen_min_span_tree=False, leaf_size=40,
        match_reference_implementation=False, memory=Memory(location=None),
        metric='euclidean', min_cluster_size=2, min_samples=1, p=None,
        prediction_data=False)

## 4. Results
#### 4.0.1. Clustering outputs

In [24]:
# HDBSCAN marks noise points with -1; adding 1 shifts noise to 0 and the clusters to 1, 2, 3
print(f"Labels assigned to the ten districts: {cl_obj.labels_ + 1}")
print(f"Number of clusters found by the algorithm: {cl_obj.labels_.max() + 1}")
print(f"Cluster membership scores (probabilities): {np.around(cl_obj.probabilities_, 3)}")
df_final['Labels'] = cl_obj.labels_ + 1
df_final

Labels assigned to the ten districts: [2 1 2 2 2 3 3 0 2 1]
Number of clusters found by the algorithm: 3
Cluster membership scores (probabilities): [0.821 1.    1.    0.591 0.602 1.    1.    0.    1.    1.   ]


Unnamed: 0,PdDistrict,Latitude,Longitude,Number of Crimes,Restaurants,Stores/Shops,Groceries,Sports Facilities,Entertainment/Culture,Landscape,Other,Labels
0,BAYVIEW,37.729978,-122.398246,14303.0,30,20,8,12,4,10,2,2
1,CENTRAL,37.798769,-122.409932,17666.0,24,25,5,8,8,10,4,1
2,INGLESIDE,37.726698,-122.446569,11594.0,36,24,9,8,0,15,2,2
3,MISSION,37.762997,-122.421984,19503.0,21,32,8,19,7,3,1,2
4,NORTHERN,37.780146,-122.432471,20100.0,20,24,11,15,9,5,2,2
5,PARK,37.767771,-122.455166,8699.0,19,26,9,8,8,21,2,3
6,RICHMOND,37.76046,-122.46286,8922.0,18,23,7,9,5,26,1,3
7,SOUTHERN,37.772236,-122.389044,28445.0,16,29,8,20,9,4,4,0
8,TARAVAL,37.743731,-122.481459,11325.0,31,27,11,10,1,14,1,2
9,TENDERLOIN,37.783675,-122.412919,9941.0,21,28,5,13,12,5,3,1
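
HDBSCAN labels noise points as -1, so after the shift by one the label 0 denotes a district that belongs to no cluster (SOUTHERN, whose membership score is 0). A short sketch that groups the districts by label and compares the mean number of crimes per group, with the values copied from the table above:

```python
from collections import defaultdict

# (district, number of crimes, label) copied from the table above; label 0 marks noise
rows = [
    ("BAYVIEW", 14303, 2), ("CENTRAL", 17666, 1), ("INGLESIDE", 11594, 2),
    ("MISSION", 19503, 2), ("NORTHERN", 20100, 2), ("PARK", 8699, 3),
    ("RICHMOND", 8922, 3), ("SOUTHERN", 28445, 0), ("TARAVAL", 11325, 2),
    ("TENDERLOIN", 9941, 1),
]

clusters = defaultdict(list)
for district, crimes, label in rows:
    clusters[label].append((district, crimes))

mean_crimes = {}
for label in sorted(clusters):
    members = [district for district, _ in clusters[label]]
    mean_crimes[label] = sum(crimes for _, crimes in clusters[label]) / len(members)
    kind = "noise" if label == 0 else f"cluster {label}"
    print(f"{kind}: {members}, mean number of crimes = {mean_crimes[label]:.1f}")
```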


### 4.1. Displaying the labelled districts on the map of San Francisco
#### 4.1.1. Retrieving the raw map of San Francisco

In [25]:
pos = 'San Francisco, United States'
geo = Nominatim(user_agent="my_username")
loc = geo.geocode(pos)
lat = loc.latitude
lon = loc.longitude
print(f"The geographical coordinates of San Francisco are:\nLatitude:    {np.around(lat, 2)}° N;\nLongitude: "
      f"{np.around(abs(lon), 2)}° W.")

The geographical coordinates of San Francisco are:
Latitude:    37.78° N;
Longitude: 122.42° W.


#### 4.1.2. Retrieving the GeoJSON file of San Francisco districts

In [26]:
with open(os.path.join(absolute_path, 'SFPD_districts.geojson')) as f:
    data = json.load(f)
# Copy the district name to the top level of each feature, where the map style function looks it up
for feature in data['features']:
    feature['district'] = feature['properties']['district']

##### Displaying the content of the GeoJSON file as a dataframe

In [27]:
gdf = gpd.GeoDataFrame.from_features(data).sort_values('district').reset_index(drop=True)
gdf['Centroid_Lat'] = gdf['geometry'].centroid.y
gdf['Centroid_Lon'] = gdf['geometry'].centroid.x
# The index is reset before this assignment so that the labels align with df_final row by row
gdf['Labels'] = df_final['Labels']
gdf

Unnamed: 0,geometry,shape_area,shape_leng,company,shape_le_1,district,Centroid_Lat,Centroid_Lon,Labels
0,"MULTIPOLYGON (((-122.38098 37.76480, -122.3810...",201384622.317,163013.798332,C,144143.480351,BAYVIEW,37.734328,-122.389641,2
1,"MULTIPOLYGON (((-122.42612 37.80684, -122.4261...",55950268.8396,64025.1290733,A,67686.5228649,CENTRAL,37.797912,-122.409162,1
2,"MULTIPOLYGON (((-122.40450 37.74858, -122.4040...",193580502.155,74737.9362951,H,74474.1811635,INGLESIDE,37.727883,-122.431586,2
3,"MULTIPOLYGON (((-122.40954 37.76932, -122.4086...",80623839.7922,40152.783389,D,40518.8342346,MISSION,37.75757,-122.422646,2
4,"MULTIPOLYGON (((-122.43379 37.80793, -122.4337...",82781685.5603,56493.858208,E,50608.3103205,NORTHERN,37.789985,-122.431758,2
5,"MULTIPOLYGON (((-122.43956 37.78314, -122.4383...",84878956.0842,46307.7769684,F,50328.9132939,PARK,37.764349,-122.449108,3
6,"MULTIPOLYGON (((-122.44127 37.79149, -122.4406...",137964024.157,69991.465355,G,75188.6283612,RICHMOND,37.777583,-122.47932,3
7,"MULTIPOLYGON (((-122.39186 37.79425, -122.3917...",91344142.1925,87550.2751419,B,100231.353916,SOUTHERN,37.788336,-122.390942,0
8,"MULTIPOLYGON (((-122.49842 37.70810, -122.4984...",284676677.833,75350.2175209,I,73470.4240002,TARAVAL,37.736633,-122.48183,2
9,"MULTIPOLYGON (((-122.40217 37.78626, -122.4171...",11072154.5623,12424.2689691,J,18796.7841847,TENDERLOIN,37.782367,-122.412762,1
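
When a column is assigned from another DataFrame's Series, pandas aligns the values on index labels rather than on row position. After `sort_values`, the index of a frame is no longer 0, 1, 2, ..., so the two kinds of alignment differ. This matters whenever `Labels` is copied from `df_final` into a sorted frame such as `gdf`. A toy sketch of the effect (the frames are illustrative):

```python
import pandas as pd

labels = pd.DataFrame({"district": ["A", "B", "C"], "label": [10, 20, 30]})

# A second frame listing the same districts in a different order
shapes = pd.DataFrame({"district": ["C", "A", "B"]}).sort_values("district")
# shapes is now ordered A, B, C but keeps its original index 1, 2, 0

by_index = shapes.copy()
by_index["label"] = labels["label"]  # aligned on index labels

by_position = shapes.reset_index(drop=True)
by_position["label"] = labels["label"]  # index is 0..2, so rows line up

print(by_index["label"].tolist())
print(by_position["label"].tolist())
```

Only the second frame pairs each district with its own label; the first one silently shuffles them.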


#### 4.1.3. Labelled San Francisco map generation

In [28]:
# Raw map generation
map_SF = folium.Map(location=[loc.latitude, loc.longitude], zoom_start=12)
# Colormap generation
colormap = linear.RdYlGn_04.scale(df_final.Labels.min(), df_final.Labels.max())
colormap.caption = 'Label value'
colormap.add_to(map_SF)
color_dict = {district: colormap(label) for district, label in zip(df_final['PdDistrict'], df_final['Labels'])}
# Creation of the colorfill for the map, according to the values of the labels
folium.GeoJson(data, style_function=lambda feature: {
        'fillColor': color_dict[feature['district']],
        'color': 'black',
        'weight': 1,
        'dashArray': '5, 5',
        'fillOpacity': 0.85,
    }
).add_to(map_SF)

# Addition of markers containing the name of each district
fg = folium.FeatureGroup(name='District Info')
for name, lat, lon, label in zip(gdf['district'], gdf['Centroid_Lat'], gdf['Centroid_Lon'], df_final['Labels']):
    html = f"""{name}<br>
    Cluster label = {label}"""
    fg.add_child(folium.Marker(location=[lat, lon], popup=html))
map_SF.add_child(fg)

# Displaying the map
map_SF
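
Cluster labels are nominal: label 3 is not "better" than label 1, and label 0 only marks noise. A continuous red-to-green colormap may therefore suggest a ranking that the clustering does not provide. A sketch of a qualitative alternative that maps each label to a fixed colour; the palette and its hex values are arbitrary assumptions, and the resulting dictionary could stand in for `color_dict` in the style function.

```python
# Hypothetical qualitative palette; the hex values are arbitrary choices
PALETTE = {0: "#9e9e9e", 1: "#1f77b4", 2: "#ff7f0e", 3: "#2ca02c"}

districts = ["BAYVIEW", "CENTRAL", "INGLESIDE", "MISSION", "NORTHERN",
             "PARK", "RICHMOND", "SOUTHERN", "TARAVAL", "TENDERLOIN"]
labels = [2, 1, 2, 2, 2, 3, 3, 0, 2, 1]  # labels reported in Section 4.0.1

color_by_district = {district: PALETTE[label] for district, label in zip(districts, labels)}
print(color_by_district)
```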