# Project Group - 20 

Members: 

1. Sheikh Arfahmi Bin Sheikh Arzimi

2. Ewan Brett

3. Cedric Nissen

4. Nills Hollnagel

5. Luka Rehviašvili

Student numbers: 

1. 6452868

2. 6525318

3. 6560733

4. 6540848

5. 6299318

# Research Objective

To develop an interactive dashboard that aids effective crowd flow management for the SAIL 2025 event in Amsterdam.

# Introduction 

Effective crowd management is essential to public safety and to the visitor experience at large-scale events such as SAIL 2025 in Amsterdam. Without sufficient monitoring, crowd managers lack oversight of crowd densities, which increases the risk of overcrowding and related safety incidents. Past tragedies, such as the Halloween crowd crush in Seoul in 2022, Houston’s Astroworld Festival in 2021 and Germany’s Love Parade in 2010, underscore the importance of proactive crowd monitoring and prediction systems. This project therefore aims to develop an interactive dashboard that helps the SAIL Crowd Monitoring Team (CMT) make informed decisions in real time to manage crowd levels effectively and efficiently. Beyond SAIL, the dashboard could serve as a modular tool for other large-scale event organisers worldwide. The project is part of the broader crowd management strategy and will be integrated with physical control measures to form a holistic solution to crowd management challenges.

# Contribution Statement

*Be specific. Some of the tasks can be coding (expect everyone to do this), background research, conceptualisation, visualisation, data analysis, data modelling*

**Author 1: Sheikh**: Future Crowd Flow Prediction (ML Module)

**Author 2: Nils**: User Authentication & Security

**Author 3: Cedric**: Reports, Settings & Multi-User Management

**Author 4: Ewan**: Visualization / Dashboard Interface

**Author 5: Luka**: Current Crowd Flow Data Pipeline

# Data Used

## Confirmed Datasets

1.	Crowd flow (based on sensor counts)

    •	Number of sensors: 46 locations, bidirectional

    •	Refresh rate: Every 3 minutes

2.	SAIL Event timetable (https://www.sail.nl/programma-en-plattegrond)

3.	Geospatial information of the SAIL 2025 area (OpenStreetMap, ArcGIS)

## Potential Datasets (Pending Requests)

1.	Vessel position

2.	Positions of traffic marshals

3.	NS Train (live) Timetables (https://ndovloket.nl/index.html) 

4.	GVB (live) Timetables - Metros, Trams, Buses (https://ndovloket.nl/index.html) 

5.	Meteorological data (KNMI; as an alternative, copy the table at https://www.wunderground.com/history/weekly/nl/schiphol/EHAM/date/2025-8-20 into Excel and export it as CSV)



# Data Pipeline

1.	Data Ingestion 

    •	Usage of confirmed datasets listed above. 

    •	If access is granted, the potential datasets listed above will also be included.

2.	Data Processing/Transformation 

    •	Compile all data into time-series pandas DataFrames for easy reading, updating and plotting.

    •	Clean and standardise data to ensure consistency across sources.

3.	Data Storage 

    •	Store all data on shared drives that are backed up on university servers – OneDrive.

    •	Have a redundant version of the data, in case a file becomes corrupted and unusable. 

4.	Data Analysis & Prediction 

    •	Analysis of the retrieved data 

            i.	Cleaning

            ii.	Removing the duplicates 

            iii. Making sure there are no inconsistencies

            iv.	Bias checks

                1.	Inspect the representation across key groups 

                2.	Check missingness, errors and label quality by group

    •	Use appropriate data analysis tools (Python, with Excel where needed)

    •	The prediction approach will be chosen based on need and on the initial results of the analysis

5.	Data Visualisation & Delivery

    •	Deliver the dashboard as a web application using Streamlit

    •	Display interactive maps, charts and tables based on the preference of the CMT, adapting the view to the size and category of each dataset.
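
The cleaning steps listed under Data Processing and Data Analysis could look roughly like the sketch below. It is a minimal illustration, not the project's pipeline: the function names `clean_counts` and `missing_share` and the toy column names `sensor_a` and `sensor_b` are assumptions made for this example.

```python
import pandas as pd


def clean_counts(df: pd.DataFrame, time_col: str = "timestamp") -> pd.DataFrame:
    """Parse timestamps, drop duplicate readings and sort chronologically."""
    out = df.copy()
    # Parse to timezone-aware UTC so readings from different sources line up
    out[time_col] = pd.to_datetime(out[time_col], utc=True)
    # Keep only the first reading for each timestamp
    out = out.drop_duplicates(subset=time_col, keep="first")
    return out.sort_values(time_col).reset_index(drop=True)


def missing_share(df: pd.DataFrame, time_col: str = "timestamp") -> pd.Series:
    """Fraction of missing readings per sensor column."""
    return df.drop(columns=time_col).isna().mean()


# Toy data, made up for illustration only
raw = pd.DataFrame({
    "timestamp": [
        "2025-08-20 00:03:00+02:00",
        "2025-08-20 00:00:00+02:00",
        "2025-08-20 00:03:00+02:00",  # duplicate reading
    ],
    "sensor_a": [3, 15, 3],
    "sensor_b": [None, 4, None],
})

cleaned = clean_counts(raw)
missing = missing_share(cleaned)
print(cleaned)
print(missing)
```

Reporting missingness per sensor column is one way to approach the "check missingness by group" item, with each sensor treated as a group.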


In [24]:
import pandas as pd
def load_sensor_locations():
    sensor_loc = pd.read_csv("data/sensor_location.csv")
    # Split Lat/Long into floats
    sensor_loc[['Lat', 'Lon']] = sensor_loc['Lat/Long'].str.split(',', expand=True).astype(float)
    return sensor_loc



In [25]:
sensor_loc = load_sensor_locations()

In [26]:
sensor_loc

Unnamed: 0,Objectummer,Locatienaam,Lat/Long,Breedte,Effectieve breedte,Lat,Lon
0,CMSA-GAKH-01,Kalverstraat t.h.v. 1,"52.372634, 4.892071",8.0,67,52.372634,4.892071
1,CMSA-GAWW-11,Korte Niezel,"52.374616, 4.899830",38.0,34,52.374616,4.89983
2,CMSA-GAWW-12,Oudekennissteeg,"52.373860, 4.898690",3.0,26,52.37386,4.89869
3,CMSA-GAWW-13,Stoofsteeg,"52.372439, 4.897689",26.0,22,52.372439,4.897689
4,CMSA-GAWW-14,Oudezijds Voorburgwal t.h.v. 91,"52.373538, 4.898166",4.0,36,52.373538,4.898166
5,CMSA-GAWW-15,Oudezijds Achterburgwal t.h.v. 86,"52.372916, 4.898207",32.0,28,52.372916,4.898207
6,CMSA-GAWW-16,Oudezijds Achterburgwal t.h.v. 91,"52.372628, 4.898233",31.0,27,52.372628,4.898233
7,CMSA-GAWW-17,Oudezijds Voorburgwal t.h.v. 206,"52.372782, 4.896649",51.0,47,52.372782,4.896649
8,CMSA-GAWW-19,Molensteeg,"52.373587, 4.899815",29.0,25,52.373587,4.899815
9,CMSA-GAWW-20,Oudebrugsteeg,"52.375350, 4.897480",57.0,53,52.37535,4.89748


In [28]:
def load_sensor_data():
    sensor_data = pd.read_csv("data/sensor_data.csv")
    return sensor_data

In [29]:
sensor_data = load_sensor_data()

In [30]:
sensor_data.head()

Unnamed: 0,timestamp,CMSA-GAKH-01_0,CMSA-GAKH-01_180,CMSA-GAWW-11_120,CMSA-GAWW-11_300,CMSA-GAWW-12_115,CMSA-GAWW-12_295,CMSA-GAWW-13_120,CMSA-GAWW-13_300,CMSA-GAWW-14_40,...,GVCV-13_10,GVCV-13_190,GVCV-14_90,GVCV-14_270,hour,minute,day,month,weekday,is_weekend
0,2025-08-20 00:00:00+02:00,15,4,29,33,44,28,42,37,11,...,41,40,0,0,0,0,20,8,2,0
1,2025-08-20 00:03:00+02:00,1,3,21,29,34,39,9,14,6,...,0,0,0,0,0,3,20,8,2,0
2,2025-08-20 00:06:00+02:00,5,4,35,22,29,34,33,42,14,...,0,0,0,0,0,6,20,8,2,0
3,2025-08-20 00:09:00+02:00,4,4,40,47,42,40,19,34,15,...,0,0,0,0,0,9,20,8,2,0
4,2025-08-20 00:12:00+02:00,4,11,54,59,58,33,17,33,26,...,127,57,0,0,0,12,20,8,2,0


Notice that the sensor names in sensor_data differ from those in sensor_location.

For example:
sensor_data: "CMSA-GAKH-01_0" 
sensor_location: "CMSA-GAKH-01"


In sensor_data, each sensor appears twice with a numeric suffix, such as "_0" and "_180". In the columns shown, the two suffixes of a sensor always differ by 180, which suggests they are headings in degrees and that the data is bidirectional.

What is the best solution in terms of data cleaning?

Create a cleaned sensor location file, sensor_location_cleaned.csv, with one row per sensor and direction: keep the full column name from sensor_data, split it into sensor ID and direction, and duplicate the location details for both directions.
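
A complementary option, sketched here as an assumption rather than as the approach taken in this notebook, is to reshape the wide count table into a long table with one row per timestamp, sensor and direction. The two-row excerpt reuses the first values from the `sensor_data.head()` output above; the names `wide` and `long_df` are hypothetical.

```python
import pandas as pd

# Two-row excerpt in the same wide layout as sensor_data
wide = pd.DataFrame({
    "timestamp": ["2025-08-20 00:00:00+02:00", "2025-08-20 00:03:00+02:00"],
    "CMSA-GAKH-01_0": [15, 1],
    "CMSA-GAKH-01_180": [4, 3],
})

# One row per (timestamp, sensor column)
long_df = wide.melt(
    id_vars="timestamp",
    var_name="sensor_id_full",
    value_name="count",
)

# Split on the last underscore only, so the id part stays intact
long_df[["sensor_id", "sensor_direction"]] = (
    long_df["sensor_id_full"].str.rsplit("_", n=1, expand=True)
)
long_df["sensor_direction"] = long_df["sensor_direction"].astype(int)

print(long_df)
```

In the long layout, `sensor_id` matches the identifier used in the location file directly, so counts and locations can be joined without renaming columns.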

In [40]:
# convert column index to list
sensor_data_index_list = sensor_data.columns.tolist()
sensor_data_index_list

['timestamp',
 'CMSA-GAKH-01_0',
 'CMSA-GAKH-01_180',
 'CMSA-GAWW-11_120',
 'CMSA-GAWW-11_300',
 'CMSA-GAWW-12_115',
 'CMSA-GAWW-12_295',
 'CMSA-GAWW-13_120',
 'CMSA-GAWW-13_300',
 'CMSA-GAWW-14_40',
 'CMSA-GAWW-14_220',
 'CMSA-GAWW-15_30',
 'CMSA-GAWW-15_210',
 'CMSA-GAWW-16_30',
 'CMSA-GAWW-16_210',
 'CMSA-GAWW-17_40',
 'CMSA-GAWW-17_220',
 'CMSA-GAWW-19_115',
 'CMSA-GAWW-19_295',
 'CMSA-GAWW-20_120',
 'CMSA-GAWW-20_300',
 'CMSA-GAWW-21_120',
 'CMSA-GAWW-21_300',
 'CMSA-GAWW-23_109',
 'CMSA-GAWW-23_289',
 'GACM-04_50',
 'GACM-04_230',
 'GASA-01-A1_135',
 'GASA-01-A1_315',
 'GASA-01-A2_135',
 'GASA-01-A2_315',
 'GASA-01-B_135',
 'GASA-01-B_315',
 'GASA-01-C_135',
 'GASA-01-C_315',
 'GASA-02-01_135',
 'GASA-02-01_315',
 'GASA-02-02_135',
 'GASA-02-02_315',
 'GASA-03_105',
 'GASA-03_285',
 'GASA-04_135',
 'GASA-04_315',
 'GASA-05-O_135',
 'GASA-05-O_315',
 'GASA-05-W_135',
 'GASA-05-W_315',
 'GASA-06_95',
 'GASA-06_275',
 'GASA-06-B_95',
 'GASA-06-B_275',
 'GVCV-01_40',
 'GVCV-01_220',


In [None]:
# drop the first item (timestamp) and the last 6 items (derived time features); the rest are sensor ids
sensor_data_index_list = sensor_data_index_list[1:-6]
sensor_data_index_list

['CMSA-GAKH-01_0',
 'CMSA-GAKH-01_180',
 'CMSA-GAWW-11_120',
 'CMSA-GAWW-11_300',
 'CMSA-GAWW-12_115',
 'CMSA-GAWW-12_295',
 'CMSA-GAWW-13_120',
 'CMSA-GAWW-13_300',
 'CMSA-GAWW-14_40',
 'CMSA-GAWW-14_220',
 'CMSA-GAWW-15_30',
 'CMSA-GAWW-15_210',
 'CMSA-GAWW-16_30',
 'CMSA-GAWW-16_210',
 'CMSA-GAWW-17_40',
 'CMSA-GAWW-17_220',
 'CMSA-GAWW-19_115',
 'CMSA-GAWW-19_295',
 'CMSA-GAWW-20_120',
 'CMSA-GAWW-20_300',
 'CMSA-GAWW-21_120',
 'CMSA-GAWW-21_300',
 'CMSA-GAWW-23_109',
 'CMSA-GAWW-23_289',
 'GACM-04_50',
 'GACM-04_230',
 'GASA-01-A1_135',
 'GASA-01-A1_315',
 'GASA-01-A2_135',
 'GASA-01-A2_315',
 'GASA-01-B_135',
 'GASA-01-B_315',
 'GASA-01-C_135',
 'GASA-01-C_315',
 'GASA-02-01_135',
 'GASA-02-01_315',
 'GASA-02-02_135',
 'GASA-02-02_315',
 'GASA-03_105',
 'GASA-03_285',
 'GASA-04_135',
 'GASA-04_315',
 'GASA-05-O_135',
 'GASA-05-O_315',
 'GASA-05-W_135',
 'GASA-05-W_315',
 'GASA-06_95',
 'GASA-06_275',
 'GASA-06-B_95',
 'GASA-06-B_275',
 'GVCV-01_40',
 'GVCV-01_220',
 'GVCV-03_42',

In [48]:
# Split each column name into sensor_id and direction

sensor_info = []
for item in sensor_data_index_list:
    # split on the last underscore only, so the id part stays intact
    sensor_id, direction = item.rsplit('_', 1)
    sensor_info.append({
        'sensor_id_full': item,
        'sensor_id': sensor_id,
        'sensor_direction': direction
    })

sensor_info



[{'sensor_id_full': 'CMSA-GAKH-01_0',
  'sensor_id': 'CMSA-GAKH-01',
  'sensor_direction': '0'},
 {'sensor_id_full': 'CMSA-GAKH-01_180',
  'sensor_id': 'CMSA-GAKH-01',
  'sensor_direction': '180'},
 {'sensor_id_full': 'CMSA-GAWW-11_120',
  'sensor_id': 'CMSA-GAWW-11',
  'sensor_direction': '120'},
 {'sensor_id_full': 'CMSA-GAWW-11_300',
  'sensor_id': 'CMSA-GAWW-11',
  'sensor_direction': '300'},
 {'sensor_id_full': 'CMSA-GAWW-12_115',
  'sensor_id': 'CMSA-GAWW-12',
  'sensor_direction': '115'},
 {'sensor_id_full': 'CMSA-GAWW-12_295',
  'sensor_id': 'CMSA-GAWW-12',
  'sensor_direction': '295'},
 {'sensor_id_full': 'CMSA-GAWW-13_120',
  'sensor_id': 'CMSA-GAWW-13',
  'sensor_direction': '120'},
 {'sensor_id_full': 'CMSA-GAWW-13_300',
  'sensor_id': 'CMSA-GAWW-13',
  'sensor_direction': '300'},
 {'sensor_id_full': 'CMSA-GAWW-14_40',
  'sensor_id': 'CMSA-GAWW-14',
  'sensor_direction': '40'},
 {'sensor_id_full': 'CMSA-GAWW-14_220',
  'sensor_id': 'CMSA-GAWW-14',
  'sensor_direction': '220'},
 ...

In [49]:
sensor_info_df = pd.DataFrame(sensor_info)
sensor_info_df

Unnamed: 0,sensor_id_full,sensor_id,sensor_direction
0,CMSA-GAKH-01_0,CMSA-GAKH-01,0
1,CMSA-GAKH-01_180,CMSA-GAKH-01,180
2,CMSA-GAWW-11_120,CMSA-GAWW-11,120
3,CMSA-GAWW-11_300,CMSA-GAWW-11,300
4,CMSA-GAWW-12_115,CMSA-GAWW-12,115
...,...,...,...
69,GVCV-11_230,GVCV-11,230
70,GVCV-13_10,GVCV-13,10
71,GVCV-13_190,GVCV-13,190
72,GVCV-14_90,GVCV-14,90


In [50]:
# --- Merge with sensor location data ---
sensor_location_cleaned = pd.merge(
    sensor_info_df,
    sensor_loc,
    left_on='sensor_id',
    right_on='Objectummer',
    how='left'
)

sensor_location_cleaned

Unnamed: 0,sensor_id_full,sensor_id,sensor_direction,Objectummer,Locatienaam,Lat/Long,Breedte,Effectieve breedte,Lat,Lon
0,CMSA-GAKH-01_0,CMSA-GAKH-01,0,CMSA-GAKH-01,Kalverstraat t.h.v. 1,"52.372634, 4.892071",8,67,52.372634,4.892071
1,CMSA-GAKH-01_180,CMSA-GAKH-01,180,CMSA-GAKH-01,Kalverstraat t.h.v. 1,"52.372634, 4.892071",8,67,52.372634,4.892071
2,CMSA-GAWW-11_120,CMSA-GAWW-11,120,CMSA-GAWW-11,Korte Niezel,"52.374616, 4.899830",38,34,52.374616,4.899830
3,CMSA-GAWW-11_300,CMSA-GAWW-11,300,CMSA-GAWW-11,Korte Niezel,"52.374616, 4.899830",38,34,52.374616,4.899830
4,CMSA-GAWW-12_115,CMSA-GAWW-12,115,CMSA-GAWW-12,Oudekennissteeg,"52.373860, 4.898690",3,26,52.373860,4.898690
...,...,...,...,...,...,...,...,...,...,...
69,GVCV-11_230,GVCV-11,230,GVCV-11,NDSMkade,"52.401060, 4.891368",35,31,52.401060,4.891368
70,GVCV-13_10,GVCV-13,10,GVCV-13,Buiksloterweg,"52.382352, 4.903016",35,31,52.382352,4.903016
71,GVCV-13_190,GVCV-13,190,GVCV-13,Buiksloterweg,"52.382352, 4.903016",35,31,52.382352,4.903016
72,GVCV-14_90,GVCV-14,90,GVCV-14,Buiksloterweg,"52.382169, 4.903385",35,31,52.382169,4.903385
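
Because the merge above uses `how='left'`, a sensor column without an entry in the location file would pass through silently with empty location fields. The sketch below shows one way to surface such cases with the `indicator` and `validate` options of `DataFrame.merge`. The small frames and the sensor ID `HYPO-99` are made up for illustration.

```python
import pandas as pd

# Made-up stand-ins for sensor_info_df and sensor_loc
info = pd.DataFrame({"sensor_id": ["CMSA-GAKH-01", "CMSA-GAKH-01", "HYPO-99"]})
loc = pd.DataFrame({
    "Objectummer": ["CMSA-GAKH-01"],
    "Locatienaam": ["Kalverstraat t.h.v. 1"],
})

checked = info.merge(
    loc,
    left_on="sensor_id",
    right_on="Objectummer",
    how="left",
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
    validate="many_to_one",  # raises if a sensor id appears twice in loc
)

# Sensor ids that have counts but no location entry
unmatched = (
    checked.loc[checked["_merge"] == "left_only", "sensor_id"].unique().tolist()
)
print(unmatched)
```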


In [51]:
# --- Save to CSV ---
sensor_location_cleaned.to_csv("data/sensor_location_cleaned.csv", index=False)

In [54]:
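def load_sensor_locations():
    # Redefines the loader above to read the cleaned file (one row per sensor and direction)
    sensor_loc = pd.read_csv("data/sensor_location_cleaned.csv")
    # Lat and Lon are already stored in the cleaned file; re-splitting Lat/Long reproduces the same values
    sensor_loc[['Lat', 'Lon']] = sensor_loc['Lat/Long'].str.split(',', expand=True).astype(float)
    return sensor_loc

sensor_loc = load_sensor_locations()
sensor_loc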

Unnamed: 0,sensor_id_full,sensor_id,sensor_direction,Objectummer,Locatienaam,Lat/Long,Breedte,Effectieve breedte,Lat,Lon
0,CMSA-GAKH-01_0,CMSA-GAKH-01,0,CMSA-GAKH-01,Kalverstraat t.h.v. 1,"52.372634, 4.892071",8,67,52.372634,4.892071
1,CMSA-GAKH-01_180,CMSA-GAKH-01,180,CMSA-GAKH-01,Kalverstraat t.h.v. 1,"52.372634, 4.892071",8,67,52.372634,4.892071
2,CMSA-GAWW-11_120,CMSA-GAWW-11,120,CMSA-GAWW-11,Korte Niezel,"52.374616, 4.899830",38,34,52.374616,4.899830
3,CMSA-GAWW-11_300,CMSA-GAWW-11,300,CMSA-GAWW-11,Korte Niezel,"52.374616, 4.899830",38,34,52.374616,4.899830
4,CMSA-GAWW-12_115,CMSA-GAWW-12,115,CMSA-GAWW-12,Oudekennissteeg,"52.373860, 4.898690",3,26,52.373860,4.898690
...,...,...,...,...,...,...,...,...,...,...
69,GVCV-11_230,GVCV-11,230,GVCV-11,NDSMkade,"52.401060, 4.891368",35,31,52.401060,4.891368
70,GVCV-13_10,GVCV-13,10,GVCV-13,Buiksloterweg,"52.382352, 4.903016",35,31,52.382352,4.903016
71,GVCV-13_190,GVCV-13,190,GVCV-13,Buiksloterweg,"52.382352, 4.903016",35,31,52.382352,4.903016
72,GVCV-14_90,GVCV-14,90,GVCV-14,Buiksloterweg,"52.382169, 4.903385",35,31,52.382169,4.903385


In [57]:
for _, row in sensor_loc.iterrows():
    sensor_id = row['sensor_id_full']
    # first reading (row 0, 2025-08-20 00:00) for this sensor and direction
    crowd_count = sensor_data.loc[0, sensor_id]
    print(crowd_count)
    # TODO: map crowd count to colour and radius
    

15
4
29
33
44
28
42
37
11
3
21
26
73
41
19
28
27
31
27
50
33
45
12
13
20
8
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
14
2
0
0
0
55
59
41
40
0
0
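
The loop above leaves the mapping from crowd count to colour and radius open. One possible shape for it is sketched below. The function name `count_to_marker`, the thresholds `low` and `high`, and the radius formula are placeholders chosen for illustration, not values agreed with the CMT.

```python
def count_to_marker(count: int, low: int = 20, high: int = 50) -> dict:
    """Return a marker colour and radius for one sensor reading.

    low and high are placeholder thresholds, not project values.
    """
    if count < 0:
        raise ValueError("crowd count cannot be negative")
    if count < low:
        colour = "green"
    elif count < high:
        colour = "orange"
    else:
        colour = "red"
    # Radius grows by one unit per 10 people and is capped at a count of 100
    radius = 4 + min(count, 100) // 10
    return {"colour": colour, "radius": radius}


# Two of the readings printed above
print(count_to_marker(15))
print(count_to_marker(73))
```

Capping the radius keeps one very busy sensor from covering its neighbours on the map. In practice the thresholds would more sensibly depend on the effective width of each location than on raw counts.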
