# Leuven - Simulating Community First Responders

To create a realistic distribution of potential community first responders (CFRs), we need an approximate picture of how many working-age people are in Leuven and where they are located. With this in hand, the algorithm can randomly sample CFR locations in a way that reflects where they are likely to be: areas of the city with more working-age people receive a higher sampling probability.
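
As a minimal sketch of the sampling idea described above (not the sampler used later in this project), the snippet below draws sector indices with probability proportional to a working-age count. The function name `sample_sector_indices` and the toy counts are illustrative assumptions, and only the standard library is used.

```python
import random

def sample_sector_indices(working_age_counts, n_samples, seed=42):
    """Draw sector indices with probability proportional to their counts."""
    rng = random.Random(seed)  # seeded so the draws are reproducible
    indices = range(len(working_age_counts))
    # random.choices normalises the weights internally
    return rng.choices(indices, weights=working_age_counts, k=n_samples)

# Toy counts (illustrative only); the last sector has no working-age residents
toy_counts = [665, 699, 262, 0]
draws = sample_sector_indices(toy_counts, n_samples=1000)
```

A sector with a count of zero is never drawn, and sectors with larger counts are drawn more often, which is the behaviour we want for CFR locations.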

In [1]:
import numpy as np
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point 
import matplotlib.pyplot as plt
import folium

### Nighttime population
We want to calculate the number of working-age people actually residing in the city.

#### Data sources:
- **Population age distribution by statistical sector**: https://publish.geo.be/geonetwork/IGiLJUGB/api/records/0202b8dd-1c7e-4331-8ba7-35e1fef4037a?language=eng
| Number of Leuven residents in the 0-14, 15-64 and 65+ age groups.
- **Flemish kotlabel**: https://www.vlaanderen.be/datavindplaats/catalogus/vlaams-kotlabel-via-poi-service
| Location of student rooms which have requested, received or been refused the Flemish kotlabel.
- **Student statistics**: https://onderwijs.vlaanderen.be/nl/onderwijsstatistieken/dataloep-aan-de-slag-met-cijfers-over-onderwijs
| Contains information on the number of students at institutions registered in each municipality in Flanders, and their place of residence.
- **KUL associated residences**: https://www.kuleuven.be/english/life-at-ku-leuven/housing/find-housing/students/residences
| Data on the number of students in each residence, and their locations, obtained manually from their respective webpages.

The first step in this process uses publicly available data on residents of the city, which is easily obtained from Statistics Flanders and geo.be. We use data on individuals in the 15-64 age group, who may be eligible to become community first responders, and supplement these numbers by accounting for students who live in Leuven.



#### Population age breakdown by statistical sector

In [2]:
## load in the data
pop_dist_stat_sector_gdf = gpd.read_file("Data/BE_SB_TF_PD_STATDIS_2024.gpkg")

In [3]:
## take the NIS code corresponding to leuven municipality (24062)
## .copy() gives an independent frame, so later assignments do not trigger SettingWithCopyWarning
leuven_gdf = pop_dist_stat_sector_gdf[pop_dist_stat_sector_gdf["CNIS5_2024"]=="24062"].copy()
# Replace NaN values with 0 for sectors with no residents
leuven_gdf.fillna(0, inplace=True)


In [4]:
leuven_gdf.head()

Unnamed: 0,CS01012024,T_SEC_NL,T_SEC_FR,T_SEC_DE,CNIS5_2024,C_COUNTRY,Shape_Length,Shape_Area,TOTAL,MALE,FEMALE,group0_14,group15_64,group65ETP,Areaofdis,Datum,geometry
4039,24062A00-,LEUVEN-CENTRUM,LEUVEN-CENTRUM,LEUVEN-CENTRUM,24062,BE,1838.075773,107925.820602,765.0,442.0,323.0,41.0,665.0,59.0,BE,2024-01-01,"MULTIPOLYGON (((3948573.723 3098899.263, 39485..."
4040,24062A01-,LEI - VISMARKT,LEI - VISMARKT,LEI - VISMARKT,24062,BE,1656.133549,126520.541505,885.0,472.0,413.0,28.0,699.0,158.0,BE,2024-01-01,"MULTIPOLYGON (((3948367.087 3099246.624, 39483..."
4041,24062A02-,LEUVEN STADSPARK,LEUVEN STADSPARK,LEUVEN STADSPARK,24062,BE,1428.289911,99833.415787,365.0,167.0,198.0,4.0,262.0,99.0,BE,2024-01-01,"MULTIPOLYGON (((3948586.231 3098646.382, 39485..."
4042,24062A03-,DAMIAANPLEIN,DAMIAANPLEIN,DAMIAANPLEIN,24062,BE,1995.315951,136756.465252,674.0,355.0,319.0,26.0,489.0,159.0,BE,2024-01-01,"MULTIPOLYGON (((3948132.029 3098720.665, 39481..."
4043,24062A04-,LEUVEN KLINIEK -O.L.VROUW-KERK,LEUVEN KLINIEK -O.L.VROUW-KERK,LEUVEN KLINIEK -O.L.VROUW-KERK,24062,BE,1570.926185,118838.440059,425.0,215.0,210.0,32.0,326.0,67.0,BE,2024-01-01,"MULTIPOLYGON (((3947993.582 3098985.979, 39480..."


In [5]:
## calculate the proportion of people of working age (possible CFRs)
## (sectors with TOTAL == 0 give NaN here)
## .assign returns a new frame, which avoids SettingWithCopyWarning
leuven_gdf = leuven_gdf.assign(working_age_prop=leuven_gdf["group15_64"]/leuven_gdf["TOTAL"])


In [6]:
## You can get a nice interactive plot using "geodataframe_name".explore("attribute_of_interest")
# leuven_gdf.explore("group15_64")



### Student Population

The municipality has a very large student population, most of whom are not registered as city residents. Belgian students, and students from neighbouring countries (annex 33), remain officially resident in their hometown while studying. Only other international students must register at the town hall (annex 19). We assume that international PhD students must also register in the city. Anecdotally, some other international students do not officially register during their time as students in the city.

## ----------
## OBSOLETE

#### Census data

We calculate the number of university students registered as residents in the city using a combination of the two datasets:
- Employment data between 15-64
- Population age data

We assume that approximately all students under 18 remain in full time education. We take the number of students aged over 15, and subtract the number of people aged between 15 and 17 => we get the number of students aged 18 or older.
## ----------

In [7]:
## Obsolete

# ## calculating number of students over 18
# # reading in population employment data 15-64
# employment_15_64 = pd.read_excel("Data/T01_CAS_AGE_BE_NL.XLSX", "CENSUS_T01_2021_BE_CAS1564_2021", skiprows=3)
# # Create a new list of headers, keeping the existing non-unnamed headers
# new_columns = []
# for current_col, new_col in zip(employment_15_64.columns, employment_15_64.iloc[0]):
#     if "Unnamed" in current_col:  # Replace only Unnamed headers
#         new_columns.append(new_col)
#     else:  # Keep the current non-unnamed header
#         new_columns.append(current_col)
# # Update the DataFrame headers
# new_columns[2] = "Gender"
# employment_15_64.columns = new_columns
# # drop first row (used to update headers)
# employment_15_64 = employment_15_64.drop(0).reset_index(drop=True)
# ## just take the totals, ignore gender
# ## fill in the NaNs
# employment_15_64[['CODE-NIS', 'Verblijfplaats']] = employment_15_64[['CODE-NIS', 'Verblijfplaats']].ffill()
# ## just take the totals, ignore gender
# totals_employment_15_64 = employment_15_64.loc[(employment_15_64["Gender"] == "Totaal")]
# ## filter Leuven
# leuven_students_over_15 = totals_employment_15_64.loc[(totals_employment_15_64["CODE-NIS"] == "24062")]

# ## count the number of people in leuven between 15 and 17
# # reading in statistical sector data which contains populations counts
# pop_df = pd.read_csv("Data/TF_SOC_POP_STRUCT_2024.txt", sep = '|')
# pop_df_leuven = pop_df[pop_df["TX_DESCR_NL"] == "Leuven"]
# pop_df_leuven_age = pop_df_leuven.groupby("CD_AGE")["MS_POPULATION"].sum().reset_index()
# minors_over_15 = sum(pop_df_leuven_age.loc[
#                     (pop_df_leuven_age["CD_AGE"] == 15)| 
#                     (pop_df_leuven_age["CD_AGE"] == 16)|
#                     (pop_df_leuven_age["CD_AGE"] == 17), 
#                     "MS_POPULATION"])

# leuven_students_over_18 = leuven_students_over_15["2.3 Studenten"] - minors_over_15
# leuven_students_over_18

#### Calculating number of students who live in Leuven, but who are not officially resident there
To try to account for distribution of unregistered students, we will estimate their number and rescale the number of student rooms throughout the city so that each student is accommodated.

In [8]:
## data taken from https://onderwijs.vlaanderen.be/nl/onderwijsstatistieken/dataloep-aan-de-slag-met-cijfers-over-onderwijs
## total leuven students
total_leuven_students = 59399
## total number of students in Leuven registered as residents of Leuven
leuven_resident_students = 8902
## total leuven students who are not registered as residents
unreg_leuven_students = total_leuven_students - leuven_resident_students
print(unreg_leuven_students)

## proportion of leuven students who are not registered as residents
unreg_prop = unreg_leuven_students/total_leuven_students

##
#### Obsolete below, found better data, no need to approximate anymore
##

# ## taken from https://www.kuleuven.be/prodstudinfo/v2/50000050/aant_det_en_v2.html
# ## total number of ku leuven students
# total_ku_leuven_students = 65535
# ## total number of students at the leuven campus
# leuven_campus_students = 50133
# ## number of UCLL students in Leuven (rough estimate, half of UCLL campuses in leuven => half of students in Leuven)
# ucll_students = 17000/2
# ## number of students at LUCA school of arts in Leuven
# luca_students = 572
# ## total number of KUL (degree-seeking and exchange) international students
# kul_internationals = 7259 + 2055
# ## number of LUCA internationals in Leuven
# luca_internationals = 103
# ## most international students are (supposed to be) registered at the city hall, except those from neighbouring countries
# ## (subtract number of dutch, german, french students (no data on luxembourgers))
# reg_kul_internationals = kul_internationals-(1168+616+439)
# ## calculating total proportion of students who are international and registered as living in the city out of ALL INTERNATIONALS
# ## (assume similar proportion for UCLL and LUCA school of arts)
# reg_international_prop = reg_kul_internationals/kul_internationals
# ## calculating total proportion of students who are international and registered as living in the city OUT OF TOTAL
# ## (assume similar proportion for UCLL and LUCA school of arts)
# reg_student_prop = reg_kul_internationals/total_ku_leuven_students
# ## calculating proportion of unregistered students
# unreg_prop = 1-reg_student_prop 
# unreg_leuven_students = round((leuven_campus_students + ucll_students + luca_students)*(unreg_prop))
# unreg_leuven_students

50497


This is the estimated number of students in Leuven who are not registered as living in the city, so they are missing from the statistical sector data. After accounting for those who are unregistered and live in KUL-associated residences, we will rescale the number of kots in each statistical sector to get a realistic nighttime population distribution.

#### Flemish Kotlabel

Using data from geopunt.be on locations of student rooms, we can see where many students in the city live.

In [9]:
## load in the data
std_rooms_df = pd.read_csv("Data/student_rooms.csv", sep=';')

In [10]:
# Build point geometries from the longitude and latitude columns (WGS84, EPSG:4326)
std_rooms_gdf = gpd.GeoDataFrame(
    std_rooms_df,
    geometry=gpd.points_from_xy(std_rooms_df['WGS84_LONGITUDE'], std_rooms_df['WGS84_LATITUDE']),
    crs="EPSG:4326",
)

## change the coordinate reference system of our student room geodataframe to match census one
std_rooms_gdf = std_rooms_gdf.to_crs(leuven_gdf.crs)

# Display the first few rows to verify
print(std_rooms_gdf.head())

      POIID               CREATED               UPDATED  BEGINDATUM  \
0  18502741  1/6/2025 10:40:00 AM  1/6/2025 10:40:00 AM         NaN   
1  18507650  1/6/2025 10:40:00 AM  1/6/2025 10:40:00 AM         NaN   
2  18505397  1/6/2025 10:40:00 AM  1/6/2025 10:40:00 AM         NaN   
3  18503073  1/6/2025 10:40:00 AM  1/6/2025 10:40:00 AM         NaN   
4  18501081  1/6/2025 10:40:00 AM  1/6/2025 10:40:00 AM         NaN   

   EINDDATUM                                 NAAM  ALTNAAM      NOTITIE  \
0        NaN       Kamer bus 0001 op gelijkvloers      NaN  EN18-040052   
1        NaN           Kamer bus 0102 op onbekend      NaN  EN21-017612   
2        NaN  Kamer bus 0201 op tweede verdieping      NaN  EN20-001827   
3        NaN   Kamer bus 0303 op derde verdieping      NaN  EN18-091586   
4        NaN  Kamer bus 0204 op tweede verdieping      NaN  EN18-033991   

                                              OMSCHR  TELEFOON  ...  \
0  De procedure om een kotlabel te bekomen is opg..

In [11]:
### rooms differ in whether they have requested, been approved for, or been refused the flemish kotlabel
std_rooms_gdf.OMSCHR.unique()

array(['De procedure om een kotlabel te bekomen is opgestart.',
       'De kamer voldoet aan de criteria van het Vlaams kotlabel en aan alle gemeentelijke criteria studentenhuisvesting. De resultaten van de beoordeelde criteria kan je vinden op de kotlabel website via onderstaande link.',
       'De woning voldoet aan de criteria van het Vlaams kotlabel en aan alle gemeentelijke criteria studentenhuisvesting. De resultaten van de beoordeelde criteria kan je vinden op de kotlabel website via onderstaande link.',
       'De kamer voldoet aan de criteria van het Vlaams kotlabel. De gemeentelijke criteria studentenhuisvesting worden nog beoordeeld. De resultaten van de beoordeelde criteria kan je vinden op de kotlabel website via onderstaande link.',
       'De kamer voldoet NIET aan alle criteria van het Vlaams kotlabel. De resultaten van de beoordeelde criteria kan je vinden op de kotlabel website via onderstaande link.',
       'De woning voldoet aan de criteria van het Vlaams kotlabel.

In [12]:
# std_rooms_gdf.explore()

In [13]:
known_kots = std_rooms_gdf.shape[0]
known_kots

12257

The geodataframe has _point_ geometry objects, indicating the location of the student room. Many of the student room points are overlapping. We would like to sum the number of kots at each address to get the number of students living at each address.

In [14]:
## each room can house one student, add a population column for this
std_rooms_gdf["pop"] = np.ones(std_rooms_gdf.shape[0])
std_rooms_gdf = std_rooms_gdf[["pop","geometry"]]
std_rooms_gdf.head()

Unnamed: 0,pop,geometry
0,1.0,POINT (3947508.674 3098363.474)
1,1.0,POINT (3948678.440 3098386.503)
2,1.0,POINT (3948307.329 3098802.471)
3,1.0,POINT (3948220.418 3098785.773)
4,1.0,POINT (3948149.151 3098795.021)


In [15]:
# Group by 'geometry' and sum the 'pop' column to get the total number of rooms at each location
grouped_std_rooms_gdf = std_rooms_gdf.groupby('geometry').agg({'pop': 'sum'}).reset_index()
# Convert back to a GeoDataFrame, passing the CRS explicitly in case it is lost in the groupby
grouped_std_rooms_gdf = gpd.GeoDataFrame(grouped_std_rooms_gdf, geometry='geometry', crs=std_rooms_gdf.crs)
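
The grouping step can be cross-checked without geometry objects by counting coordinate pairs directly. The sketch below is only an illustration: `rooms_per_location` is a hypothetical helper, the rounding precision is an assumption, and the coordinates are toy values.

```python
from collections import Counter

def rooms_per_location(points, ndigits=3):
    """Count rooms per (x, y) location, rounding coordinates so that
    float noise does not split one address into several keys."""
    return Counter((round(x, ndigits), round(y, ndigits)) for x, y in points)

# Toy coordinates (illustrative): two rooms share the first location
toy_points = [
    (3947508.674, 3098363.474),
    (3947508.674, 3098363.474),
    (3948678.440, 3098386.503),
]
counts = rooms_per_location(toy_points)
```

The number of keys corresponds to the number of distinct locations, and the sum of the counts should equal the number of rooms.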

There are **12257 student rooms** in the kotlabel dataset, spread across **1095 locations** in the municipality of Leuven. We _could_ add these numbers directly to the working-age population of each statistical sector to get a more realistic nighttime population distribution. However, the dataset does not cover all student rooms in Leuven, and some rooms may be rented by international students who are already officially registered in the municipality. Instead, we rescale the room counts to match the number of unregistered students in Leuven, using the number of student rooms in each statistical sector as a proxy for the number of unregistered students living there.

Many student residences are missing from the kotlabel data, and they account for thousands more student rooms. Their locations should be easy to obtain and add to our current student room information. This slightly decreases the number of students who are unaccounted for, so the kotlabel room counts need to be scaled up by less.

### Residence halls

In [16]:
## data on number of residence halls from the KU Leuven official webpage, georeferenced with google maps manually if missing
kul_residences = gpd.read_file("Data/kul_residence_shapefiles/KU Leuven residenties.shp")
stuvo_residences = gpd.read_file("Data/kul_residence_shapefiles/KU Leuven Stuvo Residence Halls.shp")
other_residences = gpd.read_file("Data/kul_residence_shapefiles/SWO residenties KU Leuven.shp")

In [17]:
kul_residences.head()

Unnamed: 0,Name,descriptio,timestamp,begin,end,altitudeMo,tessellate,extrude,visibility,drawOrder,icon,geometry
0,Amerikaans College,Naamsestraat 100<br>3000 Leuven <br>www.kuleuv...,,,,,-1,0,-1,,,POINT Z (4.69976 50.87352 0.00000)
1,COPAL\n,Tervuursestraat 56 - bus 5557 <br>3000 Leuven ...,,,,,-1,0,-1,,,POINT Z (4.68741 50.87857 0.00000)
2,Don Bosco Peda\n,Paul Van Ostaijenlaan 21 <br>3001 Leuven <br>w...,,,,,-1,0,-1,,,POINT Z (4.70811 50.86721 0.00000)
3,Heilige Geestcollege,Naamsestraat 40 - bus 5551 <br>3000 Leuven <br...,,,,,-1,0,-1,,,POINT Z (4.70044 50.87680 0.00000)
4,J.L. Vives International Residence\n,Pater Damiaanplein 10 <br>3000 Leuven <br>www....,,,,,-1,0,-1,,,POINT Z (4.69695 50.87630 0.00000)


In [18]:
# take columns of interest
kul_residences = kul_residences[["Name", "geometry"]]
## change CRS to correct one
kul_residences = kul_residences.to_crs(leuven_gdf.crs)
## input pops taken from kuleuven website
kul_residences["pop"]= [181, 208, 87, 150, 44, 192, 320, 106, 167, 145, 471, 18, 200, 234, 67, 40]
## 50% domestic intake
kul_residences["unreg_residence_pop"] = round(kul_residences["pop"]*(0.5))
# take columns of interest
kul_residences = kul_residences[["Name", "pop", "unreg_residence_pop", "geometry"]]
kul_residences

Unnamed: 0,Name,pop,unreg_residence_pop,geometry
0,Amerikaans College,181,90.0,POINT Z (3948290.837 3098179.030 0.000)
1,COPAL\n,208,104.0,POINT Z (3947464.977 3098803.151 0.000)
2,Don Bosco Peda\n,87,44.0,POINT Z (3948826.415 3097436.631 0.000)
3,Heilige Geestcollege,150,75.0,POINT Z (3948364.777 3098540.594 0.000)
4,J.L. Vives International Residence\n,44,22.0,POINT Z (3948116.103 3098502.952 0.000)
5,Paus Adrianus VI-college (Pauscollege)\n,192,96.0,POINT Z (3948486.848 3098585.749 0.000)
6,Residentie Groenveld\n,320,160.0,POINT Z (3946798.858 3097614.618 0.000)
7,Residentie Leo XIII,106,53.0,POINT Z (3948790.610 3098453.048 0.000)
8,Residentie Mgr. Karel Cruysberghs\n,167,84.0,POINT Z (3948093.283 3098663.981 0.000)
9,Residentie Rega,145,72.0,POINT Z (3948629.937 3099274.404 0.000)
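
Note on the rounding in the table above: ties are rounded to the nearest even integer, so 181 × 0.5 = 90.5 becomes 90 while 87 × 0.5 = 43.5 becomes 44. Python's built-in `round` on floats behaves the same way. The sketch below shows this, together with a half-up alternative in case that is preferred; `round_half_up` is a hypothetical helper, not something used in this notebook.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value):
    """Round to the nearest integer, with ties going away from zero."""
    return int(Decimal(str(value)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))

# Built-in round: ties go to the even neighbour
half_even = [round(181 * 0.5), round(87 * 0.5)]
# Decimal-based helper: ties go up
half_up = [round_half_up(181 * 0.5), round_half_up(87 * 0.5)]
```

The difference is at most one student per residence, so it has little effect on the totals, but it explains why 90.5 and 43.5 appear to be rounded in different directions.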


In [19]:
## take columns of interest
stuvo_residences = stuvo_residences[["Name", "geometry"]]
stuvo_residences["index"] = stuvo_residences.index
## change CRS to correct one
stuvo_residences = stuvo_residences.to_crs(leuven_gdf.crs)
## input pops taken from kuleuven website
stuvo_residences["pop"]= [35,493,50,72,281,45,820,64,135,95,113,54,102,27,178,26,20,107,60,28,89,94,58,191]
## 80% domestic intake
stuvo_residences["unreg_residence_pop"] = round(stuvo_residences["pop"]*0.8)
# take columns of interest
stuvo_residences = stuvo_residences[["Name", "pop", "unreg_residence_pop", "geometry"]]
stuvo_residences

Unnamed: 0,Name,pop,unreg_residence_pop,geometry
0,Bakeleyn,35,28.0,POINT Z (3948643.247 3099061.336 0.000)
1,Camilo Torres,493,394.0,POINT Z (3947477.237 3099200.518 0.000)
2,De La Salle,50,40.0,POINT Z (3947356.193 3099350.593 0.000)
3,De Rijschool,72,58.0,POINT Z (3948608.779 3099015.634 0.000)
4,De Vesten,281,225.0,POINT Z (3947666.335 3097823.566 0.000)
5,Edith Stein,45,36.0,POINT Z (3948317.432 3097771.571 0.000)
6,Arenberg,820,656.0,POINT Z (3947388.526 3097698.987 0.000)
7,De Viking,64,51.0,POINT Z (3947611.699 3099457.403 0.000)
8,Frascati,135,108.0,POINT Z (3948656.143 3099031.702 0.000)
9,Herman Servotte,95,76.0,POINT Z (3948477.213 3098114.966 0.000)


In [20]:
## take columns of interest
other_residences = other_residences[["Name", "geometry"]]
other_residences["index"] = other_residences.index
## change CRS to correct one
other_residences = other_residences.to_crs(leuven_gdf.crs)
## input pops taken from kuleuven website
other_residences["pop"] = [205, 92, 254, 139, 74]
## 80% domestic intake
other_residences["unreg_residence_pop"] = round(other_residences["pop"]*0.8)
# take columns of interest
other_residences = other_residences[["Name", "pop", "unreg_residence_pop", "geometry"]]
other_residences

Unnamed: 0,Name,pop,unreg_residence_pop,geometry
0,De Flint,205,164.0,POINT Z (3948713.405 3098132.700 0.000)
1,Marbrerie,92,74.0,POINT Z (3948215.829 3097718.583 0.000)
2,Residentie #94,254,203.0,POINT Z (3948488.521 3099538.078 0.000)
3,The Village,139,111.0,POINT Z (3949126.343 3098394.102 0.000)
4,Vineam,74,59.0,POINT Z (3947590.178 3099524.814 0.000)


In [21]:
all_residences = pd.concat([kul_residences, stuvo_residences])
all_residences = pd.concat([all_residences, other_residences])
all_residences = all_residences.reset_index()

# Add an explicit index column so the row index can be shown in the tooltip
grouped_std_rooms_gdf['index'] = grouped_std_rooms_gdf.index

## used to manually cross-check whether a residence was already included via the flemish kotlabel
m = all_residences.explore(color = "red", name = "KUL Residences", tooltip=['Name'])
m = grouped_std_rooms_gdf.explore(m = m, color = "blue", name = "Known student rooms",tooltip=['pop','index'])
folium.LayerControl().add_to(m)
# m

<folium.map.LayerControl at 0x1c6cd033950>

In [22]:
print(sum(all_residences["pop"]),sum(all_residences["unreg_residence_pop"]))

6631 4516.0


The population of students in residences associated with KU Leuven is **6631**, with approximately **4516 domestic** students, who will not be registered as residing in the municipality.

Some residences are already included in the Flemish kotlabel dataset. These residences have many more rooms than the average address, sometimes containing hundreds of kots, so they are essentially outliers in the kotlabel dataset. If they are left in when scaling the population, they have an outsized effect on the number of student kots added to a statistical sector, unreasonably inflating it. Therefore, we delete these rooms from the kotlabel dataset.

#### Deleting KUL residence kots from the kotlabel dataset

In [23]:
## list of indices of residences duplicated in the flemish kotlabel dataset 
## (carefully manually verified, some residences have multiple instances in the data, slightly differing point location)
duplicate_residence_indices = [161,162,163,166,100,101,1091,71,80,457,950,1008]
## select by label (.loc) so the rows shown are exactly those removed by .drop() in the next cell
grouped_std_rooms_gdf.loc[duplicate_residence_indices, "pop"]

161      17.0
162     211.0
163     237.0
166      51.0
100     240.0
101      80.0
1091     87.0
71       92.0
80       46.0
457      27.0
950      58.0
1008    292.0
Name: pop, dtype: float64

In [24]:
## drop the duplicates
grouped_std_rooms_gdf = grouped_std_rooms_gdf.drop(duplicate_residence_indices)

In [25]:
grouped_std_rooms_gdf

Unnamed: 0,geometry,pop,index
0,POINT (3946567.329 3095878.873),1.0,0
1,POINT (3947095.924 3096779.923),14.0,1
2,POINT (3947239.940 3096639.895),159.0,2
3,POINT (3947296.211 3096433.114),13.0,3
4,POINT (3947914.961 3096584.169),4.0,4
...,...,...,...
1089,POINT (3948835.107 3097619.573),5.0,1089
1090,POINT (3948854.206 3097579.249),6.0,1090
1092,POINT (3949042.609 3097271.938),15.0,1092
1093,POINT (3949344.607 3096760.402),1.0,1093


In [26]:
## we want to exclude the unregistered population which is accounted for in the kul associated residences
unreg_scalar = unreg_leuven_students - sum(all_residences["unreg_residence_pop"])
unreg_scalar

45981.0

Subtracting the approximate number of unregistered students who reside in KU Leuven associated residences, we obtain an estimate of the number of students we still need to "house" somewhere in the municipality: **45981 unaccounted-for students**.

To approximate where these students reside, we rescale the number of kots at the locations known from the Flemish kotlabel dataset. This assumes that there is no spatial correlation in the missing data within this dataset.

In [27]:
## rescale the population of each room so that we get a realistic distribution of student housing locations
grouped_std_rooms_gdf["scaled_pop"] = grouped_std_rooms_gdf["pop"]*(unreg_scalar/std_rooms_gdf.shape[0])
grouped_std_rooms_gdf=grouped_std_rooms_gdf.drop("index",axis=1)
grouped_std_rooms_gdf.head()

Unnamed: 0,geometry,pop,scaled_pop
0,POINT (3946567.329 3095878.873),1.0,3.751407
1,POINT (3947095.924 3096779.923),14.0,52.519703
2,POINT (3947239.940 3096639.895),159.0,596.47377
3,POINT (3947296.211 3096433.114),13.0,48.768296
4,POINT (3947914.961 3096584.169),4.0,15.005629
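
The rescaling step can also be sketched in isolation. The helper below, `rescale_to_target` (a hypothetical name, not part of this notebook), scales a list of counts so that they sum to a target, dividing by the sum of the counts actually being scaled. In the cell above the divisor is `std_rooms_gdf.shape[0]`, which still counts the residence rooms that were dropped earlier, so the scaled values there sum to somewhat less than `unreg_scalar`. Whether that is intended is not stated. The numbers below are toy values.

```python
def rescale_to_target(counts, target_total):
    """Scale counts proportionally so that they sum to target_total."""
    current_total = sum(counts)
    if current_total == 0:
        raise ValueError("cannot rescale counts that sum to zero")
    factor = target_total / current_total
    return [c * factor for c in counts]

# Toy room counts per location (illustrative), scaled to a toy target
toy_counts = [1, 14, 159, 13, 4]
scaled = rescale_to_target(toy_counts, target_total=1000)
```

Relative sizes between locations are preserved, so only the overall level changes.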


Now we are finally ready to obtain an estimate of the "nighttime" residents of the municipality of Leuven. To do so, we merge our census data with the dataframes holding the estimated locations of all students.

In [28]:
# Perform spatial join to assign each student room to a statistical sector
# (predicate= replaces the deprecated op= keyword of gpd.sjoin)
rooms_with_sectors = gpd.sjoin(grouped_std_rooms_gdf, leuven_gdf, how="inner", predicate="within")

# Sum the scaled student room populations by statistical sector
student_room_population_by_sector = rooms_with_sectors.groupby('CS01012024')['scaled_pop'].sum()

# Now, merge this summed student population with the sectors dataframe
sectors_with_total_population = leuven_gdf.merge(student_room_population_by_sector, on='CS01012024', how='left')

# Replace NaN values with 0 for sectors with no student rooms
# (assign the result back; fillna(inplace=True) on a column selection is deprecated)
sectors_with_total_population['scaled_pop'] = sectors_with_total_population['scaled_pop'].fillna(0)


In [29]:
## this plot shows which statistical sectors' populations have been boosted by adding the scaled kot population
# sectors_with_total_population.explore("scaled_pop")

#### Still have to add the residences

In [30]:
# Perform spatial join to assign each residence to a statistical sector
# (predicate= replaces the deprecated op= keyword of gpd.sjoin)
residences_with_sectors = gpd.sjoin(all_residences, sectors_with_total_population, how="inner", predicate="within")

# Sum the unregistered residence population by statistical sector
residence_unreg_population_by_sector = residences_with_sectors.groupby('CS01012024')['unreg_residence_pop'].sum()

# Now, merge this summed student population with the sectors dataframe
res_sectors_with_total_population = sectors_with_total_population.merge(residence_unreg_population_by_sector, on='CS01012024', how='left')

# Replace NaN values with 0 for sectors with no residences
# (assign the result back; fillna(inplace=True) on a column selection is deprecated)
res_sectors_with_total_population['unreg_residence_pop'] = res_sectors_with_total_population['unreg_residence_pop'].fillna(0)

# Add the scaled kot population and the residence population to the working-age population of each sector
res_sectors_with_total_population['total_possible_CFR'] = round(
    res_sectors_with_total_population['group15_64'] + 
    res_sectors_with_total_population['scaled_pop'] +
    res_sectors_with_total_population['unreg_residence_pop']
)


### Finally
The population of each statistical sector has now been updated by adding the scaled kot population _and_ the unregistered students in residences. Working-age people living in the city are now accounted for, as far as the data allow: students who live in Leuven but are not registered there have been added to the population of each statistical sector.
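
To make the bookkeeping explicit, the combination of the three components can be sketched with plain dictionaries. This is only an illustration: `total_possible_cfr` is a hypothetical helper, the sector codes and 15-64 counts are taken from the first rows of the sector table, and the student numbers are invented toy values.

```python
def total_possible_cfr(residents_15_64, scaled_kot_pop, unreg_residence_pop):
    """Sum the three components per sector. Sectors missing from the
    student dictionaries contribute 0, mirroring a left merge plus fillna(0)."""
    return {
        sector: round(
            residents
            + scaled_kot_pop.get(sector, 0.0)
            + unreg_residence_pop.get(sector, 0.0)
        )
        for sector, residents in residents_15_64.items()
    }

# Toy inputs: student numbers are illustrative, not real results
residents = {"24062A00-": 665.0, "24062A01-": 699.0, "24062A02-": 262.0}
kots = {"24062A00-": 52.4, "24062A02-": 15.0}
residences = {"24062A01-": 75.0}
totals = total_possible_cfr(residents, kots, residences)
```

Every sector keeps at least its registered 15-64 population, and student counts are only added where student rooms or residences were found.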

In [31]:
## plotting the raw totals of possible CFRs per statistical sector
# res_sectors_with_total_population.explore("total_possible_CFR", tooltip = ["group15_64","scaled_pop","unreg_residence_pop","total_possible_CFR"])

In [32]:
sum(res_sectors_with_total_population["total_possible_CFR"])

116817.0

In [33]:
## compare the old and new nighttime CFR sampling probability densities
## (people per unit area, in the squared units of the layer's CRS)
sector_area = res_sectors_with_total_population["geometry"].area

## new sampling density 
res_sectors_with_total_population["new_sampling_density"] = res_sectors_with_total_population["total_possible_CFR"]/sector_area

## old sampling density
res_sectors_with_total_population["old_sampling_density"] = (
    res_sectors_with_total_population["group0_14"]
    + res_sectors_with_total_population["group15_64"]
    + res_sectors_with_total_population["group65ETP"]
)/sector_area

In [34]:
## old sampling density in each statistical sector
res_sectors_with_total_population.explore("old_sampling_density")

In [35]:
## new sampling density in each statistical sector
res_sectors_with_total_population.explore("new_sampling_density")
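
Once a sector has been chosen in proportion to its number of possible CFRs, a concrete location still has to be drawn inside it. One simple option, under the assumption (not made explicitly in this notebook) that CFRs are spread uniformly within a sector, is rejection sampling in the sector's bounding box. The sketch below uses only the standard library and a toy polygon; the function names are hypothetical.

```python
import random

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon,
    given as a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def sample_point_in_polygon(polygon, rng, max_tries=10_000):
    """Rejection sampling: draw uniformly in the bounding box until a
    point falls inside the polygon."""
    xs, ys = zip(*polygon)
    for _ in range(max_tries):
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            return x, y
    raise RuntimeError("no interior point found; check the polygon")

# Toy L-shaped sector (illustrative coordinates)
toy_sector = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
rng = random.Random(0)
points = [sample_point_in_polygon(toy_sector, rng) for _ in range(200)]
```

Real statistical sectors are multipolygons, so a production version would more likely rely on the geometry library's own containment test. The sketch only shows the principle.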

## Daytime Population

The population of any municipality varies substantially throughout the day, as commuters travel into and out of the area to their place of work or study. We will account for the change in the number of working-age people during the day, and for where they are located. Working-age people can be split into workers, non-workers and students. The economic sector of each worker in Leuven can be determined, and workers are then distributed across the "establishment units" of companies operating within that sector. Students can be distributed between the campuses and their statistical sector of residence in the city. Non-workers can be distributed according to the working-age population density of each statistical sector.

#### Data sources: PLACEHOLDER
- **VKBO companies and establishment units**: https://www.vlaanderen.be/datavindplaats/catalogus/vkbo-ondernemingen-en-vestigingseenheden
| Locations of companies and establishment units (branches of companies) in Leuven.
- **Municipality workers data, municipality of employment**: https://statbel.fgov.be/nl/themas/census/arbeidsmarkt/jobkenmerken#panel-13
| (file: T01_BE_LPW_REFNIS_07JAN25_NL.xlsx)
| Where Leuven workers are commuting from, and where Leuven workers are commuting to.
- **Municipality workers data, employment status**: https://statbel.fgov.be/nl/themas/census/arbeidsmarkt/jobkenmerken#panel-13
| (file: T01_CAS_AGE_BE_NL.xlsx)
| Gives us the number of Leuven residents who are working, unemployed, or registered as students (15-64, or 20-64).

In [37]:
## load in the data
companies_gdf = gpd.read_file("Data/VKBO_24062_Shapefile/Shapefile/Vkbo24062.shp")

In [40]:
## display the data on companies and establishment units.
companies_gdf.head()

Unnamed: 0,OIDN,UIDN,OND_VESTNR,D_INSCHR,D_START,D_STOP,REDEN_STOP,D_AFSL,OND_VEST,TYPE_OND,...,BTW_NACE_V,BTW_NACE_O,BTW_AANTAL,RSZ_NACE_C,RSZ_NACE_V,RSZ_NACE_O,RSZ_AANTAL,PERSKLASSE,URL_NBB,geometry
0,1778194,8742142,2148341944,2005-08-09,2004-01-01,1900-01-01,,9999-12-31,Vestiging,,...,,,0,,,,0,,https://consult.cbso.nbb.be/consult-enterprise...,POINT (174265.900 174801.160)
1,1993659,9146200,2010938870,2003-01-25,1974-12-01,1900-01-01,,9999-12-31,Vestiging,,...,,,0,,,,0,,https://consult.cbso.nbb.be/consult-enterprise...,POINT (173184.720 174835.980)
2,2131818,7373300,880007952,2006-03-16,2005-12-14,1900-01-01,,9999-12-31,Onderneming,Rechtspersoon,...,2008.0,Teelt van pit- en steenvruchten,1,,,,0,0 tot 0,https://consult.cbso.nbb.be/consult-enterprise...,POINT (174913.200 173565.910)
3,5425671,8408589,2357026358,2024-03-19,2021-07-01,1900-01-01,,9999-12-31,Vestiging,,...,,,0,,,,0,,https://consult.cbso.nbb.be/consult-enterprise...,POINT (174859.520 171031.780)
4,4154548,8612278,2286825478,2019-03-26,2007-04-17,1900-01-01,,9999-12-31,Vestiging,,...,,,0,,,,0,,https://consult.cbso.nbb.be/consult-enterprise...,POINT (172906.630 174269.560)


In [47]:
companies_gdf["BTW_NACE_C"].unique()

array([None, '01240', '69101', '56101', '47810', '45204', '70210',
       '94999', '68203', '68100', '45113', '73200', '68201', '20590',
       '90023', '41102', '70220', '82990', '86220', '80100', '73120',
       '56301', '32400', '47910', '42919', '62020', '58290', '47112',
       '62090', '47299', '43222', '90031', '46220', '69102', '69203',
       '88911', '86210', '96012', '59111', '77295', '86901', '86909',
       '72190', '68311', '47410', '94110', '72200', '74909', '56102',
       '88999', '73110', '56210', '46150', '49320', '82300', '71111',
       '41201', '77293', '86905', '41101', '78100', '94995', '85592',
       '46520', '56290', '71121', '33200', '77292', '18120', '31099',
       '62010', '23190', '47300', '74201', '87302', '81220', '47620',
       '69201', '36000', '46751', '46494', '55100', '93130', '64200',
       '85599', '46460', '93123', '86230', '47711', '47730', '63990',
       '47540', '43299', '60100', '72110', '94991', '46120', '47591',
       '85520', '82110'

In [51]:
companies_gdf.loc[companies_gdf["BTW_NACE_C"] == "46492"]

Unnamed: 0,OIDN,UIDN,OND_VESTNR,D_INSCHR,D_START,D_STOP,REDEN_STOP,D_AFSL,OND_VEST,TYPE_OND,...,BTW_NACE_V,BTW_NACE_O,BTW_AANTAL,RSZ_NACE_C,RSZ_NACE_V,RSZ_NACE_O,RSZ_AANTAL,PERSKLASSE,URL_NBB,geometry
25511,4376503,8589378,450858572,2003-01-18,1993-09-09,1900-01-01,,9999-12-31,Onderneming,Rechtspersoon,...,2008,Groothandel in kantoor- en schoolbenodigdheden,1,,,,0,1 tot 4,https://consult.cbso.nbb.be/consult-enterprise...,POINT (175888.290 171058.810)


In [52]:
companies_gdf.columns

Index(['OIDN', 'UIDN', 'OND_VESTNR', 'D_INSCHR', 'D_START', 'D_STOP',
       'REDEN_STOP', 'D_AFSL', 'OND_VEST', 'TYPE_OND', 'RECHTSVORM',
       'RECHTSTOE', 'AMBTD_DLWZ', 'AMBTD_RED', 'AMBTD_DBEG', 'AMBTD_DEIN',
       'OND_MZ', 'MAATSCH_NM', 'COMM_NM', 'AFKORT_NM', 'ZOEK_NM', 'VKBO_STR',
       'VKBO_HNR', 'VKBO_BNR', 'VKBO_NISC', 'VKBO_PC', 'VKBO_GEM',
       'ADRDH_DLWZ', 'ADRDH_RED', 'ADRDH_DAT', 'TEL', 'EMAIL', 'CRAB_STR',
       'CRAB_HNR', 'CRAB_BNR', 'CRAB_PC', 'BTW_NACE_C', 'BTW_NACE_V',
       'BTW_NACE_O', 'BTW_AANTAL', 'RSZ_NACE_C', 'RSZ_NACE_V', 'RSZ_NACE_O',
       'RSZ_AANTAL', 'PERSKLASSE', 'URL_NBB', 'geometry'],
      dtype='object')

In [65]:
companies_gdf["PERSKLASSE"].unique()

array([None, '0 tot 0', '1 tot 4', '200 tot 499', '5 tot 9', '20 tot 49',
       '100 tot 199', '10 tot 19', '50 tot 99', '500 tot 999',
       '1000 en meer'], dtype=object)
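
If workers are later distributed over establishment units, the staff-size classes above could serve as weights. The sketch below is one possible approach and an assumption on our part, not a method settled in this notebook: `parse_persklasse` and `size_weight` are hypothetical helpers, and using the class midpoint (or the lower bound for the open-ended class) is an arbitrary choice.

```python
def parse_persklasse(label):
    """Parse a staff-size class such as '5 tot 9' or '1000 en meer'
    into (lower, upper); upper is None for the open-ended class."""
    if label is None:
        return None
    if label.endswith("en meer"):
        return int(label.split()[0]), None
    lower, upper = label.split(" tot ")
    return int(lower), int(upper)

def size_weight(label):
    """Hypothetical weight: the class midpoint, or the lower bound when
    the class is open-ended; 0 for missing labels."""
    bounds = parse_persklasse(label)
    if bounds is None:
        return 0.0
    lower, upper = bounds
    return float(lower) if upper is None else (lower + upper) / 2

weights = {
    label: size_weight(label)
    for label in ["0 tot 0", "1 tot 4", "200 tot 499", "1000 en meer", None]
}
```

Units with a missing class or the class `0 tot 0` receive weight 0 here, which would exclude them from any worker allocation. That may or may not be desirable.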

In [77]:
m = companies_gdf.loc[companies_gdf["PERSKLASSE"] == '1000 en meer'].explore(color = "red", tooltip = ["MAATSCH_NM", "ZOEK_NM", "BTW_NACE_C"])
m = companies_gdf.loc[companies_gdf["PERSKLASSE"] == '500 tot 999'].explore(m = m, color = "blue", tooltip = ["MAATSCH_NM", "ZOEK_NM", "BTW_NACE_V"])
m = companies_gdf.loc[companies_gdf["PERSKLASSE"] == '200 tot 499'].explore(m = m, color = "green", tooltip = ["MAATSCH_NM", 'ZOEK_NM', "BTW_NACE_V"])
m = companies_gdf.loc[companies_gdf["PERSKLASSE"] == '100 tot 199'].explore(m = m, color = "brown", tooltip = ["MAATSCH_NM", 'ZOEK_NM', "BTW_NACE_V"])
folium.LayerControl().add_to(m)
m