#### Generate new AED locations using a clustering technique

This notebook calculates how well medical facilities cover the AED-related interventions (`P003-Cardiac arrest` and `P039-Cardiac problem(other than thoracic pain)`). Following the "golden 4 minutes" for CPR, we assume that a patient is covered if an AED can be brought to the patient, or the patient can reach a hospital, within 4 minutes. We further assume a speed of 100 m/min on foot (6 km/h) and 500 m/min by car (30 km/h). A patient is therefore covered (`1`) if an AED lies within 200 m (2 minutes out and 2 minutes back on foot) or a hospital lies within 2000 m (a one-way 4-minute drive); otherwise the patient is not covered (`0`). The initial coverage rate of the AED-related interventions is 42.7%. We then use KMeans to group the interventions into n clusters, where n is the number of existing AEDs, and take the cluster centers as the new AED locations. With these locations, the coverage rate rises to 99.6%.
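
The two distance thresholds follow directly from the stated speeds and the 4-minute budget. The snippet below is a minimal sketch of that arithmetic; the names `aed_radius_m`, `hospital_radius_m` and `is_covered` are illustrative and do not appear in the notebook.

```python
# Assumptions taken from the paragraph above.
GOLDEN_MINUTES = 4     # time budget in minutes
SPEED_ON_FOOT = 100    # m/min (6 km/h)
SPEED_DRIVING = 500    # m/min (30 km/h)

# AED: a bystander goes to the AED and comes back, so only half of the
# time budget is available for the one-way distance.
aed_radius_m = SPEED_ON_FOOT * GOLDEN_MINUTES / 2

# Hospital: the patient is driven there, a one-way trip.
hospital_radius_m = SPEED_DRIVING * GOLDEN_MINUTES


def is_covered(aed_distance_m, hospital_distance_m):
    """Return 1 if either facility is within its radius, else 0."""
    return int(aed_distance_m <= aed_radius_m
               or hospital_distance_m <= hospital_radius_m)
```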

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

In [2]:
itv_aed = pd.read_csv(
    '/Users/lye/Downloads/MDA/Github-MDA2024/1_Data/CLEANED/intervention_aed_related_distance.csv',
    low_memory=False)

itv_aed.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 43093 entries, 0 to 43092
Data columns (total 56 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   mission_id                        43093 non-null  int64  
 1   service_name                      38978 non-null  object 
 2   postalcode_permanence             27284 non-null  float64
 3   cityname_permanence               28483 non-null  object 
 4   streetname_permanence             28631 non-null  object 
 5   housenumber_permanence            2156 non-null   float64
 6   latitude_permanence               39916 non-null  float64
 7   longitude_permanence              40255 non-null  float64
 8   permanence_short_name             43053 non-null  object 
 9   permanence_long_name              38982 non-null  object 
 10  vector_type                       42545 non-null  object 
 11  eventtype_firstcall               27434 non-null  object 
 12  even

In [3]:
aed = pd.read_csv('/Users/lye/Downloads/MDA/Github-MDA2024/1_Data/CLEANED/aed_location_latlon.csv')

aed.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15775 entries, 0 to 15774
Data columns (total 14 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   id            15775 non-null  float64
 1   type          5661 non-null   object 
 2   address       15775 non-null  object 
 3   number        13577 non-null  float64
 4   postal_code   15775 non-null  int64  
 5   municipality  15775 non-null  object 
 6   province      15775 non-null  object 
 7   location      8962 non-null   object 
 8   public        8656 non-null   object 
 9   available     4739 non-null   object 
 10  hours         1148 non-null   object 
 11  full_address  15775 non-null  object 
 12  lat           15775 non-null  float64
 13  lon           15775 non-null  float64
dtypes: float64(4), int64(1), object(9)
memory usage: 1.7+ MB


In [4]:
## Drop duplicate mission_ids, then duplicate locations, from the AED-related interventions

itv_aed.drop_duplicates(subset=['mission_id'], keep='first', inplace=True)
itv_aed.drop_duplicates(subset=['lat_itv', 'lon_itv'], keep='first', inplace=True)
itv_aed.info()

<class 'pandas.core.frame.DataFrame'>
Index: 25381 entries, 0 to 43091
Data columns (total 56 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   mission_id                        25381 non-null  int64  
 1   service_name                      22837 non-null  object 
 2   postalcode_permanence             16087 non-null  float64
 3   cityname_permanence               16830 non-null  object 
 4   streetname_permanence             16878 non-null  object 
 5   housenumber_permanence            1366 non-null   float64
 6   latitude_permanence               23546 non-null  float64
 7   longitude_permanence              23741 non-null  float64
 8   permanence_short_name             25350 non-null  object 
 9   permanence_long_name              22839 non-null  object 
 10  vector_type                       25049 non-null  object 
 11  eventtype_firstcall               16135 non-null  object 
 12  eventleve

In [5]:
speed_running = 100 # m/min  average running speed = 6 km/h
speed_driving = 500 # m/min  average driving speed = 30 km/h (urban area in Belgium)
golden_minutes = 4 # minutes; time window for effective CPR with an AED

itv_covered = itv_aed.loc[((itv_aed['aed_distance']<=speed_running * golden_minutes / 2) |
                           (itv_aed['hospital_distance']<=speed_driving * golden_minutes))]

itv_covered.info()

<class 'pandas.core.frame.DataFrame'>
Index: 10837 entries, 0 to 43064
Data columns (total 56 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   mission_id                        10837 non-null  int64  
 1   service_name                      9959 non-null   object 
 2   postalcode_permanence             7344 non-null   float64
 3   cityname_permanence               7935 non-null   object 
 4   streetname_permanence             7951 non-null   object 
 5   housenumber_permanence            726 non-null    float64
 6   latitude_permanence               10070 non-null  float64
 7   longitude_permanence              10148 non-null  float64
 8   permanence_short_name             10816 non-null  object 
 9   permanence_long_name              9961 non-null   object 
 10  vector_type                       10589 non-null  object 
 11  eventtype_firstcall               7362 non-null   object 
 12  eventleve

In [6]:
## Coverage of AEDs and hospitals

coverage = len(itv_covered) / len(itv_aed)
print('Coverage of AEDs and hospitals:', coverage)

Coverage of AEDs and hospitals: 0.4269729325085694
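
The printed rate equals 10837 covered interventions out of 25381, the two row counts shown above. Because the same rule is applied again later with a different distance column, it can be wrapped in a small helper. This is a sketch only: `coverage_rate` and its parameters are hypothetical names, and the toy values are made up for illustration.

```python
import pandas as pd


def coverage_rate(df, aed_col, aed_radius_m=200, hospital_radius_m=2000,
                  hospital_col='hospital_distance'):
    """Share of rows with an AED or a hospital within its radius.

    Missing distances compare as False, so they count as not covered.
    """
    covered = (df[aed_col] <= aed_radius_m) | (df[hospital_col] <= hospital_radius_m)
    return covered.mean()


# Toy data (illustrative values, not taken from the dataset).
toy = pd.DataFrame({
    'aed_distance': [5.95, 389.4, 250.0, None],
    'hospital_distance': [1670.9, 4173.6, 1500.0, 3000.0],
})
rate = coverage_rate(toy, 'aed_distance')
```

On the real data the call would look like `coverage_rate(itv_aed, 'aed_distance')`.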


In [7]:
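## Use KMeans to find the optimal location for AEDs

# One cluster per existing AED; each cluster center becomes a candidate AED location.
kmeans = KMeans(n_clusters=aed.shape[0], random_state=0).fit(itv_aed[['lat_itv', 'lon_itv']])

# Cluster order is arbitrary: new_lat/new_lon on a row are not tied to that row's original AED.
aed[['new_lat', 'new_lon']] = kmeans.cluster_centers_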
aed.head()

Unnamed: 0,id,type,address,number,postal_code,municipality,province,location,public,available,hours,full_address,lat,lon,new_lat,new_lon
0,13.0,,Blvd. fr. roosevelt,24.0,7060,Soignies,Hainaut,,Y,,,"Blvd. fr. roosevelt, 7060 Soignies, Hainaut",50.576042,4.06574,50.92432,5.25369
1,70.0,,Ch. de wégimont,76.0,4630,Ayeneux,Liège,,,,,"Ch. de wégimont, 4630 Ayeneux, Liège",50.60768,5.730187,50.892865,4.075619
2,71.0,,Place saint-lambert,,4020,Liège,Liège,,,,,"Place saint-lambert, 4020 Liège, Liège",50.645622,5.57362,51.183575,3.141258
3,72.0,,Rue du doyard,,4990,Lierneux,Liège,,,,,"Rue du doyard, 4990 Lierneux, Liège",50.287416,5.786325,50.41584,4.876835
4,73.0,,Fond saint servais,,4000,Liège,Liège,,,,,"Fond saint servais, 4000 Liège, Liège",50.646806,5.571031,51.159505,4.485365
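
KMeans as used above measures Euclidean distance on raw degrees. At Belgian latitudes a degree of longitude spans only about 0.64 of the ground distance of a degree of latitude, so east-west distances are overweighted. One possible variation, not what the notebook does, is to scale longitude by the cosine of the mean latitude before clustering and undo the scaling on the centers. The function name `kmeans_centers_scaled` and the explicit `n_init=10` are assumptions of this sketch.

```python
import numpy as np
from sklearn.cluster import KMeans


def kmeans_centers_scaled(lat, lon, n_clusters, random_state=0):
    """Cluster points after an equirectangular rescaling of longitude.

    Returns an array of shape (n_clusters, 2) with columns (lat, lon)
    in degrees.
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    # Shrink longitude so one unit is roughly the same ground distance
    # on both axes (reasonable over a small region).
    scale = np.cos(np.radians(lat.mean()))
    X = np.column_stack([lat, lon * scale])
    km = KMeans(n_clusters=n_clusters, n_init=10,
                random_state=random_state).fit(X)
    centers = km.cluster_centers_.copy()
    centers[:, 1] /= scale  # back to degrees of longitude
    return centers
```

With the notebook's data this would be called as `kmeans_centers_scaled(itv_aed['lat_itv'], itv_aed['lon_itv'], aed.shape[0])`; the resulting coverage rate may differ from the one reported here.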


In [8]:
# Calculate the distance between each intervention and the nearest new AED location

from pyproj import Geod

geod = Geod(ellps='WGS84')


def find_closest_distance(df1, df2):
    distances = []
    for index1, row1 in df1.iterrows():
        min_distance = float('inf')
        for index2, row2 in df2.iterrows():
            # calculate the distance between two points
            _, _, distance = geod.inv(row1['lon_itv'], row1['lat_itv'],
                                      row2['new_lon'], row2['new_lat'])
            # update the min distance
            if distance < min_distance:
                min_distance = distance
        # append the min distance to the list
        distances.append(min_distance)
    return distances

itv_aed['new_aed_distance'] = find_closest_distance(itv_aed, aed)
itv_aed.head()

Unnamed: 0,mission_id,service_name,postalcode_permanence,cityname_permanence,streetname_permanence,housenumber_permanence,latitude_permanence,longitude_permanence,permanence_short_name,permanence_long_name,...,t7_Hour,t7_Day,t7_Month,t7_DayName,province,intervention_time_(t1confirmed),departure_time_(t1confirmed),aed_distance,hospital_distance,new_aed_distance
0,20222490147,FB PDS SCHA [PASI Paul Brien] SIAMU,1030.0,Schaarbeek (Schaarbeek),ChaussÈe de Haecht,,50.86948,4.38649,ABSCHA01A,AMB BRIEN 1,...,16.0,6.0,9.0,Tuesday,Brussels Hoofdstedelijk Gewest,,,5.954143,1670.861212,93.50082
2,20222490155,HB UR BRUX Europe Elisabeth,1180.0,ukkel (ukkel),de frÈlaan,,50.8047,4.36763,ABBRUX09A,AMB ST ELISABETH 9,...,17.0,6.0,9.0,Tuesday,Provincie Vlaams-Brabant,,,389.414213,4173.642942,23.661984
3,20222490162,FB PDS BRUX [PASI CitÈ] SIAMU,1000.0,Brussel (Brussel),Vesaliusstraat,,50.85097,4.36411,ABBRUX11A,AMB CITE 2,...,17.0,6.0,9.0,Tuesday,Brussels Hoofdstedelijk Gewest,,,13.534626,1034.347073,29.654307
4,20222490200,FB PDS ANDE [PASI Anderlecht] SIAMU,1070.0,Anderlecht,Bergense Steenweg,,50.83254,4.31199,ABANDE03A,AMB AND 3,...,18.0,6.0,9.0,Tuesday,Brussels Hoofdstedelijk Gewest,,,46.5782,3008.489539,21.247651
7,20222490246,HB UR BRUX St Jean,1000.0,Brussel (Brussel),Broekstraat,,50.8523,4.35988,UBBRUX02A,SMUR STJEAN 2,...,21.0,6.0,9.0,Tuesday,Brussels Hoofdstedelijk Gewest,,,166.420606,2087.394053,69.738682
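
The nested `iterrows` loop above evaluates one geodesic per intervention-AED pair, roughly 25381 × 15775 calls. A faster alternative, sketched here under the assumption that a spherical approximation is acceptable, is a vectorized haversine computed in chunks with NumPy. It swaps the WGS84 ellipsoid used by `geod.inv` for a sphere, so distances can differ by a fraction of a percent. `nearest_distance_m` and `chunk_size` are hypothetical names.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def nearest_distance_m(lat1, lon1, lat2, lon2, chunk_size=256):
    """For each point in (lat1, lon1), the haversine distance in meters
    to the closest point in (lat2, lon2). Inputs are in degrees."""
    lat1, lon1, lat2, lon2 = (
        np.radians(np.asarray(a, dtype=float))
        for a in (lat1, lon1, lat2, lon2)
    )
    out = np.empty(lat1.shape[0])
    # Process the first set in chunks to keep the pairwise matrix small.
    for start in range(0, lat1.shape[0], chunk_size):
        sl = slice(start, start + chunk_size)
        dlat = lat2[None, :] - lat1[sl, None]
        dlon = lon2[None, :] - lon1[sl, None]
        a = (np.sin(dlat / 2) ** 2
             + np.cos(lat1[sl, None]) * np.cos(lat2[None, :])
             * np.sin(dlon / 2) ** 2)
        dist = 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))
        out[sl] = dist.min(axis=1)
    return out
```

In the notebook this would be called as `nearest_distance_m(itv_aed['lat_itv'], itv_aed['lon_itv'], aed['new_lat'], aed['new_lon'])`.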


In [9]:
## Coverage after KMeans optimization

kmeans_covered = itv_aed.loc[(
    (itv_aed['new_aed_distance'] <= speed_running * golden_minutes / 2) |
    (itv_aed['hospital_distance'] <= speed_driving * golden_minutes))]

coverage = len(kmeans_covered) / len(itv_aed)
print('Coverage of AEDs and hospitals after KMeans optimization:', coverage)


Coverage of AEDs and hospitals after KMeans optimization: 0.9960600449154879


In [10]:
## Add province information to the new AED locations

import geopandas as gpd
from shapely import geometry as geo
from shapely.validation import explain_validity

geo_path = '/Users/lye/Downloads/MDA/Github-MDA2024/1_Data/Belgium.provinces.WGS84.geojson'
geo_be = gpd.read_file(geo_path)

# Check if the geometries are valid
for i in range(len(geo_be)):
    if not geo_be.loc[i, 'geometry'].is_valid:
        print(explain_validity(geo_be.loc[i, 'geometry']))
        geo_be.loc[i, 'geometry'] = geo_be.loc[i, 'geometry'].buffer(0)
        print(geo_be.loc[i, 'geometry'].is_valid)

Self-intersection[6.24760990547934 50.640636186645]
True


In [11]:
def get_medical_province(df, geo_df):
    province = []
    missing_province = 0
    for i in range(len(df)):
        point = geo.Point(df.loc[i, 'new_lon'], df.loc[i, 'new_lat'])
        contained = geo_df.loc[geo_df['geometry'].contains(
            point)]['NameDUT'].values
        if contained.size > 0:
            province.append(contained[0])
        else:
            province.append(None)
            missing_province += 1

    df['new_province'] = province
    print(f'{missing_province} coordinates are not located in any province')

    return df


aed = get_medical_province(aed, geo_be)
aed['new_province'].value_counts()

0 coordinates are not located in any province


new_province
Provincie Antwerpen               2438
Provincie Oost-Vlaanderen         2183
Provincie Henegouwen              1953
Provincie Vlaams-Brabant          1916
Provincie West-Vlaanderen         1660
Provincie Luik                    1512
Provincie Limburg                 1322
Brussels Hoofdstedelijk Gewest     925
Provincie Namen                    821
Provincie Waals-Brabant            533
Provincie Luxemburg                512
Name: count, dtype: int64

In [12]:
aed.to_csv('/Users/lye/Downloads/MDA/Github-MDA2024/1_Data/CLEANED/aed_with_KmeansLocation.csv', index=False)
itv_aed.to_csv('/Users/lye/Downloads/MDA/Github-MDA2024/1_Data/CLEANED/intervention_aed_kmeans_distance.csv', index=False)

In [13]:
aed['province'].value_counts()

province
Antwerpen            2355
Bruxelles-brussel    2117
Hainaut              1950
Liège                1771
Oost-vlaanderen      1664
West-vlaanderen      1374
Vlaams-brabant       1282
Limburg               998
Namur                 900
Luxembourg            683
Brabant wallon        681
Name: count, dtype: int64
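
The two province tallies use different labels (`province` mixes French and Dutch names, `new_province` uses the `NameDUT` values from the GeoJSON), so they cannot be compared directly. A minimal sketch of a side-by-side comparison follows; the mapping `PROVINCE_TO_NAMEDUT` and the helper `compare_province_counts` are hypothetical and simply pair each label printed above with its Dutch equivalent.

```python
import pandas as pd

# Hypothetical mapping from the labels in aed['province'] to NameDUT labels.
PROVINCE_TO_NAMEDUT = {
    'Antwerpen': 'Provincie Antwerpen',
    'Bruxelles-brussel': 'Brussels Hoofdstedelijk Gewest',
    'Hainaut': 'Provincie Henegouwen',
    'Liège': 'Provincie Luik',
    'Oost-vlaanderen': 'Provincie Oost-Vlaanderen',
    'West-vlaanderen': 'Provincie West-Vlaanderen',
    'Vlaams-brabant': 'Provincie Vlaams-Brabant',
    'Limburg': 'Provincie Limburg',
    'Namur': 'Provincie Namen',
    'Luxembourg': 'Provincie Luxemburg',
    'Brabant wallon': 'Provincie Waals-Brabant',
}


def compare_province_counts(original, new):
    """Count AEDs per province before and after, aligned on NameDUT labels.

    original, new: Series of province labels, one row per AED.
    """
    before = original.map(PROVINCE_TO_NAMEDUT).value_counts().rename('before')
    after = new.value_counts().rename('after')
    out = pd.concat([before, after], axis=1).fillna(0).astype(int)
    out['change'] = out['after'] - out['before']
    return out.sort_values('change')
```

Calling `compare_province_counts(aed['province'], aed['new_province'])` would, for example, put the Brussels row at 2117 before and 925 after, matching the two outputs above.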