# Leuven - Simulating Community First Responders

To create a realistic distribution of potential first responders, we need an approximate picture of where working-age people are located in Leuven: how many of them live in the city, and where. With this in hand, the algorithm can randomly sample CFR locations in a way that reflects where they are likely to be, giving a higher sampling probability to areas of the city with more working-age people.
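The sampling step described above can be sketched as follows. This is a minimal sketch, assuming each CFR is first assigned to a statistical sector with probability proportional to that sector's working-age count; placing the point inside the chosen sector is not shown. The function `sample_cfr_sectors` and the three toy sectors are hypothetical, not part of this notebook.

```python
import numpy as np

def sample_cfr_sectors(sector_ids, working_age_counts, n_cfrs, seed=None):
    """Draw n_cfrs sector ids, each with probability proportional to its working-age count."""
    counts = np.asarray(working_age_counts, dtype=float)
    if counts.sum() <= 0:
        raise ValueError("total working-age population must be positive")
    probs = counts / counts.sum()  # normalise counts into sampling probabilities
    rng = np.random.default_rng(seed)
    return rng.choice(np.asarray(sector_ids), size=n_cfrs, p=probs)

# Toy example with three hypothetical sectors
draws = sample_cfr_sectors(["A", "B", "C"], [100, 300, 0], n_cfrs=10_000, seed=42)
ids, freq = np.unique(draws, return_counts=True)
print(dict(zip(ids, freq)))
```

A sector with a count of zero is never drawn, and sector `B`, holding three quarters of the population, receives roughly three quarters of the draws.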

#### Data sources:
- **Population age distribution by statistical sector**: https://publish.geo.be/geonetwork/IGiLJUGB/api/records/0202b8dd-1c7e-4331-8ba7-35e1fef4037a?language=eng
| Number of Leuven residents in the 0-14, 15-64 and 65+ age groups.
- **Flemish kotlabel**: https://www.vlaanderen.be/datavindplaats/catalogus/vlaams-kotlabel-via-poi-service
| Location of student rooms which have requested, received or been refused the Flemish kotlabel.
- **Student statistics**: https://onderwijs.vlaanderen.be/nl/onderwijsstatistieken/dataloep-aan-de-slag-met-cijfers-over-onderwijs
| Number of students at institutions in each Flemish municipality, and their place of residence.
- **KUL associated residences**: https://www.kuleuven.be/english/life-at-ku-leuven/housing/find-housing/students/residences
| Number of students in each residence and their locations, obtained manually from the residences' webpages.
- **Municipality workers data, municipality of employment**: https://statbel.fgov.be/nl/themas/census/arbeidsmarkt/jobkenmerken#panel-13
| (file: T01_BE_LPW_REFNIS_07JAN25_NL.xlsx)
| Where Leuven workers are commuting from, and where Leuven workers are commuting to.
- **Municipality workers data, employment status**: https://statbel.fgov.be/nl/themas/census/arbeidsmarkt/jobkenmerken#panel-13
| (file: T01_CAS_AGE_BE_NL.xlsx)
| Gives us the number of Leuven residents who are working, unemployed, or registered as students (15-64, or 20-64).



In [None]:
import numpy as np
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point 
import matplotlib.pyplot as plt
import folium

### Nighttime population
We want to calculate the number of working-age people actually residing in the city.

The first step uses publicly available data on the city's residents, which is easily obtained from Statistics Flanders and geo.be. We use the number of individuals in the 15-64 age group, who may be eligible to become community first responders, and supplement these numbers with the students who live in Leuven.



#### Population age breakdown by statistical sector

In [None]:
## load in the data
pop_dist_stat_sector_gdf = gpd.read_file("Data/BE_SB_TF_PD_STATDIS_2024.gpkg")

In [None]:
## take the NIS code corresponding to Leuven municipality (24062); .copy() avoids SettingWithCopyWarning on later assignments
leuven_gdf = pop_dist_stat_sector_gdf[pop_dist_stat_sector_gdf["CNIS5_2024"]=="24062"].copy()
# Replace NaN values with 0 for sectors with no residents
leuven_gdf = leuven_gdf.fillna(0)

In [None]:
leuven_gdf.head()

In [None]:
## calculate the proportion of people of working age (possible CFRs)
## sectors with TOTAL == 0 give 0/0 = NaN, which we set to 0
leuven_gdf["working_age_prop"] = (leuven_gdf["group15_64"]/leuven_gdf["TOTAL"]).fillna(0)

In [None]:
## You can get a nice interactive map with <geodataframe>.explore("column_of_interest")
# leuven_gdf.explore("group15_64")



### Student Population

The municipality has a very large student population, most of whom are not registered as city residents. Belgian students, and students from neighbouring countries (annex 33), remain officially resident in their hometown while studying; only other international students must register at the town hall (annex 19). We assume that international PhD students are registered in the city. Anecdotally, some other international students do not officially register while studying in the city.

## ----------
## OBSOLETE

#### Census data

We calculate the number of university students registered as residents in the city using a combination of the two datasets:
- Employment data between 15-64
- Population age data

We assume that nearly everyone aged 15-17 is in full-time education. We therefore take the number of students aged 15-64 and subtract the number of people aged 15 to 17, which gives the number of students aged 18 or older.
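The (now obsolete) subtraction described above amounts to the following. This is a minimal pandas sketch on hypothetical toy counts; the column names `CD_AGE` and `MS_POPULATION` follow the commented-out code below, but the numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical single-year age counts for one municipality (toy numbers, not Leuven data)
pop_by_age = pd.DataFrame({
    "CD_AGE": [14, 15, 16, 17, 18, 19],
    "MS_POPULATION": [90, 100, 95, 105, 400, 420],
})
students_15_plus = 1_500  # hypothetical census count of students aged 15-64

# Everyone aged 15-17 is assumed to be in full-time education, i.e. counted as a student
minors_15_17 = pop_by_age.loc[pop_by_age["CD_AGE"].between(15, 17), "MS_POPULATION"].sum()
students_18_plus = students_15_plus - minors_15_17
print(minors_15_17, students_18_plus)
```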
## ----------

In [None]:
## Obsolete

# ## calculating number of students over 18
# # reading in population employment data 15-64
# employment_15_64 = pd.read_excel("Data/T01_CAS_AGE_BE_NL.XLSX", "CENSUS_T01_2021_BE_CAS1564_2021", skiprows=3)
# # Create a new list of headers, keeping the existing non-unnamed headers
# new_columns = []
# for current_col, new_col in zip(employment_15_64.columns, employment_15_64.iloc[0]):
#     if "Unnamed" in current_col:  # Replace only Unnamed headers
#         new_columns.append(new_col)
#     else:  # Keep the current non-unnamed header
#         new_columns.append(current_col)
# # Update the DataFrame headers
# new_columns[2] = "Gender"
# employment_15_64.columns = new_columns
# # drop first row (used to update headers)
# employment_15_64 = employment_15_64.drop(0).reset_index(drop=True)
# ## just take the totals, ignore gender
# ## fill in the NaNs
# employment_15_64[['CODE-NIS', 'Verblijfplaats']] = employment_15_64[['CODE-NIS', 'Verblijfplaats']].ffill()
# ## just take the totals, ignore gender
# totals_employment_15_64 = employment_15_64.loc[(employment_15_64["Gender"] == "Totaal")]
# ## filter Leuven
# leuven_students_over_15 = totals_employment_15_64.loc[(totals_employment_15_64["CODE-NIS"] == "24062")]

# ## count the number of people in leuven between 15 and 17
# # reading in statistical sector data which contains populations counts
# pop_df = pd.read_csv("Data/TF_SOC_POP_STRUCT_2024.txt", sep = '|')
# pop_df_leuven = pop_df[pop_df["TX_DESCR_NL"] == "Leuven"]
# pop_df_leuven_age = pop_df_leuven.groupby("CD_AGE")["MS_POPULATION"].sum().reset_index()
# minors_over_15 = sum(pop_df_leuven_age.loc[
#                     (pop_df_leuven_age["CD_AGE"] == 15)| 
#                     (pop_df_leuven_age["CD_AGE"] == 16)|
#                     (pop_df_leuven_age["CD_AGE"] == 17), 
#                     "MS_POPULATION"])

# leuven_students_over_18 = leuven_students_over_15["2.3 Studenten"] - minors_over_15
# leuven_students_over_18

#### Calculating the number of students who live in Leuven but are not officially resident there
To account for the spatial distribution of unregistered students, we estimate how many there are and then rescale the number of student rooms throughout the city so that every one of them is accommodated.

In [None]:
## data taken from https://onderwijs.vlaanderen.be/nl/onderwijsstatistieken/dataloep-aan-de-slag-met-cijfers-over-onderwijs
## total leuven students
total_leuven_students = 59399
## total number of students in Leuven registered as residents of Leuven
leuven_resident_students = 8902
## students in Leuven who are not registered as Leuven residents
unreg_leuven_students = total_leuven_students - leuven_resident_students
print(unreg_leuven_students)

## proportion of Leuven students who are not registered as residents
unreg_prop = unreg_leuven_students/total_leuven_students

##
#### Obsolete below, found better data, no need to approximate anymore
##

# ## taken from https://www.kuleuven.be/prodstudinfo/v2/50000050/aant_det_en_v2.html
# ## total number of ku leuven students
# total_ku_leuven_students = 65535
# ## total number of students at the leuven campus
# leuven_campus_students = 50133
# ## number of UCLL students in Leuven (rough estimate, half of UCLL campuses in leuven => half of students in Leuven)
# ucll_students = 17000/2
# ## number of students at LUCA school of arts in Leuven
# luca_students = 572
# ## total number of KUL (degree-seeking and exchange) international students
# kul_internationals = 7259 + 2055
# ## number of LUCA internationals in Leuven
# luca_internationals = 103
# ## most international students are (supposed to be) registered at the city hall, except those from neighbouring countries
# ## (subtract number of Dutch, German, French students (no data on Luxembourgers))
# reg_kul_internationals = kul_internationals-(1168+616+439)
# ## calculating total proportion of students who are international and registered as living in the city out of ALL INTERNATIONALS
# ## (assume similar proportion for UCLL and LUCA school of arts)
# reg_international_prop = reg_kul_internationals/kul_internationals
# ## calculating total proportion of students who are international and registered as living in the city OUT OF TOTAL
# ## (assume similar proportion for UCLL and LUCA school of arts)
# reg_student_prop = reg_kul_internationals/total_ku_leuven_students
# ## calculating proportion of unregistered students
# unreg_prop = 1-reg_student_prop 
# unreg_leuven_students = round((leuven_campus_students + ucll_students + luca_students)*(unreg_prop))
# unreg_leuven_students

This is the estimated number of students in Leuven who are not registered as living in the city; they are missing from the statistical sector data. After subtracting those who live in KUL-associated residences, we will rescale the number of kots in each statistical sector to obtain a realistic nighttime population distribution.

#### Flemish Kotlabel

Using data from geopunt.be on locations of student rooms, we can see where many students in the city live.

In [None]:
## load in the data
std_rooms_df = pd.read_csv("Data/student_rooms.csv", sep=';')

In [None]:
# Inspect one row to check the column names
print(std_rooms_df.iloc[0])

# Build shapely Points from the longitude/latitude columns and set the CRS to WGS84 (EPSG:4326)
std_rooms_gdf = gpd.GeoDataFrame(
    std_rooms_df,
    geometry=gpd.points_from_xy(std_rooms_df['WGS84_LONGITUDE'], std_rooms_df['WGS84_LATITUDE']),
    crs="EPSG:4326",
)

## reproject the student rooms to the CRS of the census (statistical sector) data
std_rooms_gdf = std_rooms_gdf.to_crs(leuven_gdf.crs)

print(std_rooms_gdf.head())

In [None]:
## rooms are labelled by whether they have requested, been granted, or been refused the Flemish kotlabel
std_rooms_gdf.OMSCHR.unique()

In [None]:
# std_rooms_gdf.explore()

In [None]:
known_kots = std_rooms_gdf.shape[0]
known_kots

The GeoDataFrame contains _point_ geometries, each giving the location of one student room. Many points overlap because several rooms share an address, so we sum the number of kots at each address to get the number of students living there.
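Summing kots per address can also be done by grouping on the raw coordinate columns rather than on shapely geometry objects, which avoids relying on geometries being usable as group keys. This is a minimal sketch on hypothetical coordinates, assuming the `WGS84_LONGITUDE`/`WGS84_LATITUDE` columns of the kotlabel CSV; points could be rebuilt afterwards with `gpd.points_from_xy`. Only exactly identical coordinates are merged, so rooms recorded at slightly different points stay separate.

```python
import pandas as pd

# Toy kot records: three rooms share one address, one room is elsewhere (hypothetical coordinates)
rooms = pd.DataFrame({
    "WGS84_LONGITUDE": [4.7005, 4.7005, 4.7005, 4.7101],
    "WGS84_LATITUDE": [50.8790, 50.8790, 50.8790, 50.8812],
})

# Each row is one room, so the group size is the number of rooms at that coordinate pair
rooms_per_address = (
    rooms.groupby(["WGS84_LONGITUDE", "WGS84_LATITUDE"])
    .size()
    .rename("pop")
    .reset_index()
)
print(rooms_per_address)
```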

In [None]:
## each room can house one student, add a population column for this
std_rooms_gdf["pop"] = np.ones(std_rooms_gdf.shape[0])
std_rooms_gdf = std_rooms_gdf[["pop","geometry"]]
std_rooms_gdf.head()

In [None]:
# Group by 'geometry' and sum the 'pop' column to get the total number of rooms at each location
grouped_std_rooms_gdf = std_rooms_gdf.groupby('geometry').agg({'pop': 'sum'}).reset_index()
# Convert back to a GeoDataFrame; the CRS is not carried through the groupby, so restore it explicitly
grouped_std_rooms_gdf = gpd.GeoDataFrame(grouped_std_rooms_gdf, geometry='geometry', crs=std_rooms_gdf.crs)
grouped_std_rooms_gdf

There are **12257 student rooms** in the kotlabel dataset, spread across **1095 locations** in the municipality of Leuven. We _could_ simply add these numbers to the working-age population of each statistical sector to get a more realistic nighttime population distribution. However, the dataset does not contain all student rooms in Leuven, and some of its rooms may be occupied by international students who are already officially registered in the municipality. Instead, we rescale the room counts to match the estimated number of unregistered students, treating the number of student rooms in each statistical sector as a proxy for the number of unregistered students living there.
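The rescaling keeps each location's share of the known rooms and stretches the total to a target. This is a minimal sketch, assuming the target is the number of unregistered students still to be placed; the helper `rescale_to_total` and the toy counts are hypothetical.

```python
import pandas as pd

def rescale_to_total(counts: pd.Series, target_total: float) -> pd.Series:
    """Scale counts so they sum to target_total while keeping each entry's share unchanged."""
    current_total = counts.sum()
    if current_total == 0:
        raise ValueError("cannot rescale an all-zero series")
    return counts * (target_total / current_total)

# Hypothetical room counts at three addresses, stretched to house 1000 students
rooms = pd.Series([10, 30, 60], index=["addr_1", "addr_2", "addr_3"], dtype=float)
scaled = rescale_to_total(rooms, 1000)
print(scaled)
```

Dividing by the sum of the counts actually being scaled is what guarantees that the scaled values add up to the target.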

Many student residences are missing from the kotlabel dataset, and they account for thousands more student rooms. Their locations are easy to obtain, so we add them to our student room information. This slightly reduces the number of students who are unaccounted for, so the kotlabel room counts will not need to be scaled up as much.
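The effect of housing residence students separately shows up in the scale factor, i.e. how many students each known kot has to stand in for. This worked arithmetic uses figures quoted in this notebook and, for illustration only, keeps the denominator at the full 12257 kotlabel rooms; in the notebook the duplicated residence rooms are also removed from the kotlabel data, so the actual factor differs.

```python
# Figures quoted in this notebook
unregistered_students = 50_497   # 59399 total students minus 8902 registered residents
residence_unregistered = 4_516   # estimated domestic students in KU Leuven-associated residences
kotlabel_rooms = 12_257          # rooms in the kotlabel dataset

# Students per known kot, before and after housing residence students separately
factor_without_residences = unregistered_students / kotlabel_rooms
factor_with_residences = (unregistered_students - residence_unregistered) / kotlabel_rooms
print(round(factor_without_residences, 2), round(factor_with_residences, 2))
```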

### Residence halls

In [None]:
## locations of residence halls from the KU Leuven official webpage, georeferenced manually with Google Maps where missing
kul_residences = gpd.read_file("Data/kul_residence_shapefiles/KU Leuven residenties.shp")
stuvo_residences = gpd.read_file("Data/kul_residence_shapefiles/KU Leuven Stuvo Residence Halls.shp")
other_residences = gpd.read_file("Data/kul_residence_shapefiles/SWO residenties KU Leuven.shp")

In [None]:
kul_residences.head()

In [None]:
# take columns of interest
kul_residences = kul_residences[["Name", "geometry"]]
## change CRS to correct one
kul_residences = kul_residences.to_crs(leuven_gdf.crs)
## residence capacities taken from the KU Leuven website
kul_residences["pop"]= [181, 208, 87, 150, 44, 192, 320, 106, 167, 145, 471, 18, 200, 234, 67, 40]
## assumption: 50% of residents are domestic students, who are not registered as Leuven residents
kul_residences["unreg_residence_pop"] = round(kul_residences["pop"]*(0.5))
# take columns of interest
kul_residences = kul_residences[["Name", "pop", "unreg_residence_pop", "geometry"]]
kul_residences

In [None]:
## take columns of interest
stuvo_residences = stuvo_residences[["Name", "geometry"]]
stuvo_residences["index"] = stuvo_residences.index
## change CRS to correct one
stuvo_residences = stuvo_residences.to_crs(leuven_gdf.crs)
## residence capacities taken from the KU Leuven website
stuvo_residences["pop"]= [35,493,50,72,281,45,820,64,135,95,113,54,102,27,178,26,20,107,60,28,89,94,58,191]
## assumption: 80% of residents are domestic students, who are not registered as Leuven residents
stuvo_residences["unreg_residence_pop"] = round(stuvo_residences["pop"]*0.8)
# take columns of interest
stuvo_residences = stuvo_residences[["Name", "pop", "unreg_residence_pop", "geometry"]]
stuvo_residences

In [None]:
## take columns of interest
other_residences = other_residences[["Name", "geometry"]]
other_residences["index"] = other_residences.index
## change CRS to correct one
other_residences = other_residences.to_crs(leuven_gdf.crs)
## residence capacities taken from the KU Leuven website
other_residences["pop"] = [205, 92, 254, 139, 74]
## assumption: 80% of residents are domestic students, who are not registered as Leuven residents
other_residences["unreg_residence_pop"] = round(other_residences["pop"]*0.8)
# take columns of interest
other_residences = other_residences[["Name", "pop", "unreg_residence_pop", "geometry"]]
other_residences

In [None]:
## combine the three residence datasets into one GeoDataFrame with a fresh index
all_residences = pd.concat([kul_residences, stuvo_residences, other_residences], ignore_index=True)

# Store the row index as a column so it can be shown in the map tooltip
grouped_std_rooms_gdf['index'] = grouped_std_rooms_gdf.index

## I used this map to manually cross-check whether a residence was already included in the Flemish kotlabel dataset
m = all_residences.explore(color = "red", name = "KUL Residences", tooltip=['Name'])
m = grouped_std_rooms_gdf.explore(m = m, color = "blue", name = "Known student rooms",tooltip=['pop','index'])
folium.LayerControl().add_to(m)
# m

In [None]:
print(sum(all_residences["pop"]),sum(all_residences["unreg_residence_pop"]))

The population of students in residences associated with KU Leuven is **6631**, with approximately **4516 domestic** students, who will not be registered as residing in the municipality.
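These totals can be reproduced from the residence capacities and the assumed domestic shares (50% for KU Leuven residences, 80% for Stuvo and other residences). This sketch copies the capacity lists from the cells above. Note that `np.round`, like `round` on a pandas Series, rounds halves to the nearest even number, so the KU Leuven residences contribute 1316 domestic students rather than exactly half of 2630.

```python
import numpy as np

# Capacities copied from the three residence cells above
kul_pop = np.array([181, 208, 87, 150, 44, 192, 320, 106, 167, 145, 471, 18, 200, 234, 67, 40])
stuvo_pop = np.array([35, 493, 50, 72, 281, 45, 820, 64, 135, 95, 113, 54,
                      102, 27, 178, 26, 20, 107, 60, 28, 89, 94, 58, 191])
other_pop = np.array([205, 92, 254, 139, 74])

# Assumed domestic (unregistered) shares, as in the cells above; rounding is per residence
kul_unreg = np.round(kul_pop * 0.5)
stuvo_unreg = np.round(stuvo_pop * 0.8)
other_unreg = np.round(other_pop * 0.8)

total_pop = int(kul_pop.sum() + stuvo_pop.sum() + other_pop.sum())
total_unreg = int(kul_unreg.sum() + stuvo_unreg.sum() + other_unreg.sum())
print(total_pop, total_unreg)
```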

Some residences are already included in the Flemish kotlabel dataset. These residences have many more rooms than a typical address, sometimes hundreds of kots, so they are outliers in the kotlabel dataset. If they were left in, they would have an outsized effect when scaling the population, unreasonably inflating the number of student kots added to their statistical sector; their residents are also already counted through the residence data. Therefore, we delete these rooms from the kotlabel dataset.
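To see why such outliers matter, compare the scaled populations with and without one large residence. This is a minimal sketch on hypothetical counts: a 400-room residence alongside four ordinary addresses, with 2000 students to place.

```python
import pandas as pd

# Hypothetical kot counts: four ordinary addresses plus one large residence hall
rooms = pd.Series([5, 8, 3, 4, 400], index=["a", "b", "c", "d", "residence"], dtype=float)
unaccounted = 2_000  # hypothetical number of students still to be placed

# Proportional rescaling with and without the residence in the kot data
with_residence = rooms * unaccounted / rooms.sum()
ordinary = rooms.drop("residence")
without_residence = ordinary * unaccounted / ordinary.sum()

print(with_residence.round(1))
print(without_residence.round(1))
```

With the residence left in, it absorbs about 95% of the students to be placed (roughly 1905 of 2000), while each ordinary address receives only between roughly 14 and 38.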

#### Deleting KUL residence kots from the kotlabel dataset

In [None]:
## list of indices of residences duplicated in the Flemish kotlabel dataset
## (carefully verified by hand; some residences appear several times in the data at slightly different point locations)
duplicate_residence_indices = [161,162,163,166,100,101,1091,71,80,457,950,1008]
grouped_std_rooms_gdf.loc[duplicate_residence_indices, "pop"]

In [None]:
## drop the duplicates
grouped_std_rooms_gdf = grouped_std_rooms_gdf.drop(duplicate_residence_indices)

In [None]:
grouped_std_rooms_gdf

In [None]:
## we want to exclude the unregistered population which is accounted for in the kul associated residences
unreg_scalar = unreg_leuven_students - sum(all_residences["unreg_residence_pop"])
unreg_scalar

Subtracting the approximate number of unregistered students who live in KU Leuven-associated residences gives an estimate of the number of students whom we still need to "house" somewhere in the municipality: **45981 unaccounted-for students**.
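The chain of estimates behind this figure, using only numbers quoted earlier in this notebook:

```python
# Figures quoted in this notebook (Dataloep student statistics and KU Leuven residence pages)
total_leuven_students = 59_399
registered_resident_students = 8_902
unregistered_in_residences = 4_516

# Students not registered in Leuven, then those not already housed in residences
unregistered_students = total_leuven_students - registered_resident_students
still_to_place = unregistered_students - unregistered_in_residences
print(unregistered_students, still_to_place)
```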

To approximate where these students live, we rescale the number of kots at the locations we know from the Flemish kotlabel dataset. This assumes that the rooms missing from the dataset are not spatially clustered, i.e. that they are distributed across the city in the same way as the rooms it does contain.

In [None]:
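## rescale the population of each location so that the scaled kots sum to the number of unaccounted-for students
## (divide by the rooms remaining after dropping the duplicated residences, not the original room count)
grouped_std_rooms_gdf["scaled_pop"] = grouped_std_rooms_gdf["pop"]*(unreg_scalar/grouped_std_rooms_gdf["pop"].sum())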
grouped_std_rooms_gdf=grouped_std_rooms_gdf.drop("index",axis=1)
grouped_std_rooms_gdf.head()

Now we are finally ready to obtain an estimate of the "nighttime" residents of the municipality of Leuven. To do so, we merge the census data with our estimates of where all students live.
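The join-then-aggregate pattern used here can be sketched without geopandas. In this hypothetical toy, sectors are axis-aligned rectangles and a simple containment test stands in for `gpd.sjoin(..., predicate="within")`; the left merge and the NaN-to-zero step mirror the notebook's cell. All coordinates and counts are invented.

```python
import pandas as pd

# Hypothetical sectors as axis-aligned rectangles in projected coordinates
sectors = pd.DataFrame({
    "CS01012024": ["S1", "S2", "S3"],
    "xmin": [0, 100, 200], "xmax": [100, 200, 300],
    "ymin": [0, 0, 0], "ymax": [100, 100, 100],
    "group15_64": [500, 800, 300],
})
# Hypothetical scaled kot populations at point locations
rooms = pd.DataFrame({"x": [10, 50, 150], "y": [20, 80, 40], "scaled_pop": [3.5, 7.0, 2.0]})

def sector_of(x, y):
    """Return the id of the rectangle containing (x, y), or None (stand-in for a spatial join)."""
    hit = sectors[(sectors.xmin <= x) & (x < sectors.xmax) & (sectors.ymin <= y) & (y < sectors.ymax)]
    return hit["CS01012024"].iloc[0] if len(hit) else None

rooms["CS01012024"] = [sector_of(x, y) for x, y in zip(rooms.x, rooms.y)]
pop_by_sector = rooms.groupby("CS01012024")["scaled_pop"].sum()

# Left merge keeps sectors without rooms; their NaN is then set to 0 by assignment (not inplace)
merged = sectors.merge(pop_by_sector.reset_index(), on="CS01012024", how="left")
merged["scaled_pop"] = merged["scaled_pop"].fillna(0)
print(merged[["CS01012024", "group15_64", "scaled_pop"]])
```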

In [None]:
# Perform spatial join to assign each student room location to a statistical sector
# (`predicate` replaces the deprecated `op` keyword, which newer geopandas versions no longer accept)
rooms_with_sectors = gpd.sjoin(grouped_std_rooms_gdf, leuven_gdf, how="inner", predicate="within")

# Sum the scaled student room populations by statistical sector
student_room_population_by_sector = rooms_with_sectors.groupby('CS01012024')['scaled_pop'].sum()

# Now, merge this summed student population with the sectors dataframe
sectors_with_total_population = leuven_gdf.merge(student_room_population_by_sector, on='CS01012024', how='left')

# Replace NaN values with 0 for sectors with no student rooms
sectors_with_total_population['scaled_pop'] = sectors_with_total_population['scaled_pop'].fillna(0)

In [None]:
## this plot shows which statistical sectors population has been boosted by adding the scaled kot population
# sectors_with_total_population.explore("scaled_pop")

#### Adding the students in residence halls

In [None]:
# Perform spatial join to assign each residence to a statistical sector
residences_with_sectors = gpd.sjoin(all_residences, sectors_with_total_population, how="inner", predicate="within")

# Sum the unregistered residence populations by statistical sector
residence_unreg_population_by_sector = residences_with_sectors.groupby('CS01012024')['unreg_residence_pop'].sum()

# Now, merge this summed student population with the sectors dataframe
res_sectors_with_total_population = sectors_with_total_population.merge(residence_unreg_population_by_sector, on='CS01012024', how='left')

# Replace NaN values with 0 for sectors with no residences
res_sectors_with_total_population['unreg_residence_pop'] = res_sectors_with_total_population['unreg_residence_pop'].fillna(0)

# Add the population from residence rooms to the population of each sector
res_sectors_with_total_population['total_possible_CFR'] = round(
    res_sectors_with_total_population['group15_64'] + 
    res_sectors_with_total_population['scaled_pop'] +
    res_sectors_with_total_population['unreg_residence_pop']
)



### Finally
Now the population of each statistical sector has been updated by adding the scaled kot population _and_ the unregistered students in residences. With non-resident students added to each statistical sector, the working-age people living in the city should now be approximately accounted for.
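The per-sector total, and a simple conservation check on it, can be sketched on hypothetical numbers laid out like `res_sectors_with_total_population`. Assuming every kot location and residence falls inside a Leuven sector and the scaled kots sum to their target, the students added across all sectors should match the number of unregistered students being distributed; an `inner` spatial join silently drops any point outside the sectors, so the same check is worth running on the real frame.

```python
import pandas as pd

# Hypothetical per-sector inputs with the notebook's column names
sectors = pd.DataFrame({
    "CS01012024": ["S1", "S2", "S3"],
    "group15_64": [500, 800, 300],
    "scaled_pop": [120.4, 60.3, 0.0],
    "unreg_residence_pop": [0.0, 40.0, 90.0],
})

# Registered working-age residents plus both estimates of unregistered students
sectors["total_possible_CFR"] = (
    sectors["group15_64"] + sectors["scaled_pop"] + sectors["unreg_residence_pop"]
).round()

# Conservation check: total students added across sectors
added_students = sectors["scaled_pop"].sum() + sectors["unreg_residence_pop"].sum()
print(sectors)
print(added_students)
```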

In [None]:
## plotting the raw totals of possible CFRs per statistical sector
# res_sectors_with_total_population.explore("total_possible_CFR", tooltip = ["group15_64","scaled_pop","unreg_residence_pop","total_possible_CFR"])

In [None]:
sum(res_sectors_with_total_population["total_possible_CFR"])

In [None]:
## plots to compare the old and new nighttime CFR sampling probability densities
## new sampling density 
res_sectors_with_total_population["new_sampling_density"] = res_sectors_with_total_population["total_possible_CFR"]/res_sectors_with_total_population["geometry"].area

## old sampling density
res_sectors_with_total_population["old_sampling_density"] = (res_sectors_with_total_population["group0_14"] + res_sectors_with_total_population["group15_64"] +
res_sectors_with_total_population["group65ETP"])/res_sectors_with_total_population["geometry"].area
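The density columns above are people per unit of `.area`, which is expressed in the units of the layer's CRS; if `leuven_gdf` uses a metric projected CRS (an assumption here, e.g. Belgian Lambert 72), that is people per square metre. This minimal pandas sketch on hypothetical sectors converts density to people per km² for mapping, and contrasts it with the per-sector sampling probability, which should be proportional to the head count rather than to the density.

```python
import pandas as pd

# Hypothetical sectors: head counts and polygon areas in square metres
sectors = pd.DataFrame({
    "total_possible_CFR": [600, 900, 400],
    "area_m2": [250_000.0, 1_500_000.0, 100_000.0],
})

# Density per km² for mapping (1 km² = 1,000,000 m²)
sectors["density_per_km2"] = sectors["total_possible_CFR"] / (sectors["area_m2"] / 1_000_000)

# Probability that a sampled CFR falls in each sector: proportional to the count, not the density
sectors["sampling_prob"] = sectors["total_possible_CFR"] / sectors["total_possible_CFR"].sum()
print(sectors)
```

Here the smallest sector is the densest, yet the largest head count, and therefore the highest sampling probability, belongs to the second sector.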

In [None]:
## old sampling density in each statistical sector
# res_sectors_with_total_population.explore("old_sampling_density")

In [None]:
## new sampling density in each statistical sector
# res_sectors_with_total_population.explore("new_sampling_density")