# Forecasting Dutch Rail Network Disruptions

This project is part of the course: TIL6022: *Python Programming*

| **Student Name** | **Student ID** |
|------------------|----------------|
| Ioannis Nikas    | 6550266        |
| Yannis Kiesslich | 6572049        |
| Frank Ansink     | 5389984        |
| Bogdan Buzatu    | 5239265        |
| Raoul de Rooij   | 5562481        |

## Introduction

The Dutch rail network experiences frequent service disruptions, and detailed archival data have been available since 2011 [1]. This enables a rigorous time-of-day analysis across lines, causes and durations. Peak periods are operationally well defined by the national operator’s off-peak windows and are known to concentrate a large share of daily passenger flows, which makes them a critical point for reliability analysis and passenger impact assessment.

## Research Objective

Quantify whether disruption frequency and duration are higher during peak hours compared to off-peak, controlling for calendar events and planned works. Identify routes and corridors where peak-hour vulnerability is most acute, with emphasis on high-throughput hubs such as Amsterdam Centraal, Utrecht Centraal, Rotterdam Centraal and Den Haag Centraal [2]. Characterize whether disruption causes differ by time-of-day and assess their peak-hour severity profiles. Forecast disruption counts per hour for increased passenger flow in the future based on realistic data on expected NS users and forecasting models widely used in academia.

## Research Questions

The research questions, derived from the research objective and the introduction of this project, are:

- **RQ1:** What are the main disruption causes?
- **RQ2:** Are disruption counts per hour significantly higher during peak hours than off-peak hours?
- **RQ3:** Do disruption causes differ by time-of-day?
- **RQ4:** Can disruption counts be forecasted accurately to improve predictions over calendar-only baselines?

## Data Sets

For the purposes of this project we have used three main data sets:

1. Data on check-ins using the OV-chipkaart [3].
2. Open data on train services, covering all passenger train services in the Netherlands since 2019 (departure times, arrival times and service updates) [4, 5].
3. Data on train disruptions (all train disruptions since 2011) [6].

## Methodology

The first step of the methodology is to define the cohort: a multi-year panel of disruptions with event timestamps mapped to hourly bins and labeled as peak or off-peak using NS definitions and weekday/holiday rules. Disruptions are then joined to planned-work flags and project windows to separate outside impacts from background reliability. Indicators are created for cause categories, affected routes, multi-line impact and duration quantiles, taking duplication into account when a single event affects multiple lines. Peak-hour effects are estimated using statistical modelling of hourly counts, with controls for day of the week, month, planned works and hub indicators. For the spatial focus, corridors that frequently appear in disruption rankings and high-throughput hubs are prioritized. Finally, the visualization part of the report includes heatmaps and scatter plots to showcase the relationship between peak hours and NS train disruptions.

## What are the main disruption causes?

As a first step, we look at the main disruption causes in the NS network. For this, we took the disruption causes column from the disruption datasets for 2023 and 2024 and plotted it as a pie chart. Figure 1 shows the causes and their associated percentages.

![Disruption causes breakdown](causes.png)
*Figure 1: Disruption causes breakdown*

The pie chart shows that more than a third of all disruptions are caused by rolling stock, which makes it the main disruption cause. Other main causes, each accounting for more than 10% of disruptions, are infrastructure breakdowns, external causes and accidents. For a more detailed analysis of disruption causes on a time-of-day basis, please see research question 3.

## Are disruption counts per hour significantly higher during peak hours than off-peak hours?

In this section, we examine whether disruption counts during peak hours are higher than those during off-peak hours. To answer that, we analyzed two main datasets: data on train disruptions [6] and data on check-ins using the OV-chipkaart [3].

For the first dataset, we loaded and combined 14 files with a total of 55,864 disruptions. Delays over 334.5 minutes were filtered out and not used. The results have been plotted and can be seen in Figure 2.

![Total disruption counts per hour since 2011](dirsuptions.png)
*Figure 2: Total disruption counts per hour since 2011*

As Figure 2 suggests, disruption counts tend to be higher during peak hours than during off-peak hours. A statistical significance test gave the following results:

1. Peak hour disruption rate is 0.68 disruptions per hour.
2. Off-peak hour disruption rate is 0.37 disruptions per hour.
3. Chi-squared is equal to χ² = 4164.17.
4. P-value is reported as p = 0.0, meaning it is too small to be distinguished from zero at the reported precision.

Therefore, the difference in disruption rates between peak and off-peak hours is statistically significant (p < 0.001).
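One way such a test could be set up is as a goodness-of-fit test: under the null hypothesis, disruptions are spread over peak and off-peak periods in proportion to the number of hours each period covers. The sketch below is an assumption about the setup, not the project's actual code. The function name and the example numbers are hypothetical. With two categories there is one degree of freedom, for which the p-value has a closed form, so only the standard library is needed.

```python
import math

def chi_square_rate_test(peak_count, offpeak_count, peak_hours, offpeak_hours):
    """Goodness-of-fit test: are events split in proportion to exposure hours?

    Returns (chi-squared statistic, p-value) for one degree of freedom.
    """
    total_count = peak_count + offpeak_count
    total_hours = peak_hours + offpeak_hours
    # Expected counts if the disruption rate were identical in both periods
    expected_peak = total_count * peak_hours / total_hours
    expected_offpeak = total_count * offpeak_hours / total_hours
    statistic = (
        (peak_count - expected_peak) ** 2 / expected_peak
        + (offpeak_count - expected_offpeak) ** 2 / expected_offpeak
    )
    # Survival function of the chi-squared distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical counts and exposure hours, for illustration only
stat, p = chi_square_rate_test(680, 370, 1000, 1000)
print(f"chi2 = {stat:.2f}, p = {p:.3g}")
```

For very large statistics the p-value underflows to 0.0 in double precision, which is one plausible reason a test can report p = 0.0.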

Figure 2 shows a higher number of disruptions during NS peak hours (6:30 am–9:00 am & 4:00 pm–6:30 pm). Disruption counts peak once around 8:00 am and again at 5:00 pm, both of which fall directly into the peak-hour periods.
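The peak/off-peak labelling can be sketched as follows. This is a minimal illustration using the NS peak windows quoted above. Treating weekends as off-peak is an assumption based on the weekday rule mentioned in the methodology, and public holidays are not handled here. The names `PEAK_WINDOWS` and `is_peak` are hypothetical.

```python
from datetime import datetime, time

# NS peak windows as quoted in the text (start inclusive, end exclusive)
PEAK_WINDOWS = [
    (time(6, 30), time(9, 0)),
    (time(16, 0), time(18, 30)),
]

def is_peak(timestamp: datetime) -> bool:
    """Return True if the timestamp falls in a weekday peak window."""
    # Assumption: Saturday (5) and Sunday (6) count as off-peak all day
    if timestamp.weekday() >= 5:
        return False
    moment = timestamp.time()
    return any(start <= moment < end for start, end in PEAK_WINDOWS)

# Example: a Monday morning and a Monday midday timestamp
print(is_peak(datetime(2024, 1, 8, 8, 0)))
print(is_peak(datetime(2024, 1, 8, 12, 0)))
```

Note that the peak windows start and end on half hours, so an analysis on whole-hour bins needs a rule for hours that are only partly peak.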

The question arises why this could be the case. Figure 2 shows that disruption counts are markedly lower at night, when fewer trains are usually scheduled. We could therefore assume that the number of trains running has an impact on the number of disruptions. The following analysis aims to support this assumption. We took all train services from 2023 and 2024 into account. Each train service is assigned to a specific hour of the day using its midpoint time (halfway between departure time and arrival time at the destination), after which the average number of train journeys per hour per day is calculated.
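The hour assignment described above can be sketched as follows. This is a minimal, dependency-free illustration rather than the project's actual code. It assumes each service is available as a (departure, arrival) pair of datetimes and that the number of days is counted from the departure dates. The function names are hypothetical.

```python
from collections import Counter
from datetime import datetime

def midpoint_hour(departure: datetime, arrival: datetime) -> int:
    """Hour of day at the point halfway between departure and arrival."""
    midpoint = departure + (arrival - departure) / 2
    return midpoint.hour

def average_services_per_hour(services):
    """Average number of services per hour of day, per observed day.

    services: iterable of (departure, arrival) datetime pairs.
    """
    counts = Counter()
    days = set()
    for departure, arrival in services:
        counts[midpoint_hour(departure, arrival)] += 1
        days.add(departure.date())  # assumption: a service belongs to its departure day
    n_days = len(days) or 1  # avoid division by zero on empty input
    return {hour: counts[hour] / n_days for hour in sorted(counts)}

# Hypothetical services on two days
services = [
    (datetime(2024, 1, 8, 7, 50), datetime(2024, 1, 8, 8, 30)),
    (datetime(2024, 1, 8, 8, 0), datetime(2024, 1, 8, 8, 40)),
    (datetime(2024, 1, 9, 8, 5), datetime(2024, 1, 9, 8, 55)),
    (datetime(2024, 1, 9, 17, 0), datetime(2024, 1, 9, 18, 30)),
]
print(average_services_per_hour(services))
```

Using the midpoint places a long service in a single hour only, which is a simplification: a service that runs across several hours still counts once.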

![Train journey distribution per hour per day](train_services.png)
*Figure 3: Train journey distribution per hour per day*

While a high number of trains are scheduled throughout the daytime, we can see a slightly higher number of services during peak times. This supports our assumption, as both graphs show similar patterns. As we found out, failing rolling stock is a major cause of disruptions (see Figure 1). The increased number of trains running during peak hours therefore means a higher chance of breakdowns, which could explain the higher number of disruptions during that period.

As a next step, we look at the relation with the number of people travelling during peak times. To do this we used the OV-chipkaart dataset, which includes data on check-in times at all NS stations. To keep the results comparable, we again look at the years 2023 and 2024. The results have been plotted and can be seen in Figure 4.

![Total OV check-ins per hour for 2023 and 2024](OVcheckins.png)
*Figure 4: Total OV check-ins per hour for 2023 and 2024*

![Heatmap of check-ins in days of week and hours](ovheatmap.png)
*Figure 5: Heatmap of check-ins in days of week and hours*

Figure 5 shows a heatmap displaying the average number of check-ins per time of day, including the distribution across weekdays.

A statistical significance test on the OV-chipkaart check-ins shows that the average check-in rate during peak hours is roughly 2.4 times the off-peak rate (about 139% higher). The detailed results are:

1. Peak hour check-in rate is 243.99 check-ins per hour.
2. Off-peak hour check-in rate is 102.26 check-ins per hour.
3. Chi-squared is equal to χ² = 2847.52.
4. P-value is reported as p = 0.0, meaning it is too small to be distinguished from zero at the reported precision.

Figure 4 shows that, as the name suggests, passenger numbers peak during NS peak hours, especially from 7:00 until 8:00 and from 16:00 until 17:00. Figure 5 shows clearly that the majority of check-ins occur on weekdays (Monday–Friday) and during the previously identified hour windows. Comparing Figure 4 to Figure 2, disruptions tend to coincide with the large passenger flows entering the Dutch train system during these "rush" hours. Disruptions in those periods therefore affect more people, which makes the higher number of disruptions during peak hours more problematic. It also makes analysing disruptions on a time-of-day basis and predicting them for the future even more important (see the forecasting section for RQ4).

Furthermore, we wonder whether the larger number of passengers could itself add to disruption counts during peak hours. Even though it isn’t listed as a main disruption cause (see Figure 1), prolonged boarding times and overcrowding could plausibly lead to disruptions. A possible explanation for these not showing up as a disruption cause lies in the dataset description, which states: “the rule of thumb that NS uses is that a disruption is communicated when multiple trains are delayed or cancelled (i.e. a major impact of the train service)” [6]. As overcrowding and prolonged waiting times would mostly result in small delays rather than large-scale disruptions and cancellations, those cases may not be included in the dataset. But since small disruptions and delays can also negatively affect passengers, we also want to take those cases into account.

To examine the assumption that higher passenger numbers cause more delays, we calculated average delays per hour of the day. For this analysis we took the maximum delay of every train service from 2023 and 2024 (including 0-minute values if the service was on time) and calculated the average delay in minutes for services running during each hour of the day. Each service was placed in a specific hour using its midpoint time (halfway between departure time and arrival time at the destination).

![Average train delays per hour](train_delays.png)
*Figure 6: Average train delays per hour*

The first thing we can notice is that delays are the highest at night (specifically at 1:00 am). However, this time of day offers the smallest sample size and we want to rather focus on the daytime, where most trains are scheduled and most people are travelling (see Figures 3 and 4). Looking at the graph between 6:00 am and 10:00 pm we can see a familiar pattern. Peak hours see the highest average amount of delay per train service, which we assume is mostly due to the higher number of passengers at that time causing prolonged boarding times.

## Do disruption causes differ by time-of-day?

To answer this question the “Rijden de Treinen” dataset was first cleaned and pre-processed for the year 2023. The disruption start times were converted to a date-time format, after which each disruption was assigned to a specific time-of-day category:

- Night (00:00–06:00)
- Morning peak (06:00–09:00)
- Morning (09:00–12:00)
- Afternoon (12:00–16:00)
- Evening peak (16:00–19:00)
- Evening (19:00–00:00)

The data were grouped by both *time-of-day* and *cause group* to calculate the frequency and relative share of each disruption type within every time window. By presenting the results in a bar chart, the different causes can be easily compared with one another.
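The binning and grouping steps above can be sketched as follows. This is a minimal, dependency-free illustration rather than the project's actual code. It assumes each disruption is available as a (start time, cause group) pair. The names `TIME_BINS`, `time_of_day` and `cause_shares` are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime

# (label, first hour inclusive, last hour exclusive), following the categories above
TIME_BINS = [
    ("Night", 0, 6),
    ("Morning peak", 6, 9),
    ("Morning", 9, 12),
    ("Afternoon", 12, 16),
    ("Evening peak", 16, 19),
    ("Evening", 19, 24),
]

def time_of_day(start: datetime) -> str:
    """Map a disruption start time to its time-of-day category."""
    for label, first_hour, last_hour in TIME_BINS:
        if first_hour <= start.hour < last_hour:
            return label
    raise ValueError(f"hour out of range: {start.hour}")

def cause_shares(disruptions):
    """Relative share of each cause group within every time-of-day category.

    disruptions: iterable of (start_time, cause_group) pairs.
    """
    counts = defaultdict(Counter)
    for start, cause in disruptions:
        counts[time_of_day(start)][cause] += 1
    return {
        label: {cause: n / sum(per_cause.values()) for cause, n in per_cause.items()}
        for label, per_cause in counts.items()
    }

# Hypothetical records, for illustration only
records = [
    (datetime(2023, 3, 1, 2, 0), "infrastructure"),
    (datetime(2023, 3, 1, 3, 0), "staff"),
    (datetime(2023, 3, 1, 7, 0), "rolling stock"),
    (datetime(2023, 3, 1, 8, 0), "rolling stock"),
    (datetime(2023, 3, 1, 8, 30), "infrastructure"),
]
print(cause_shares(records))
```

Because the shares are normalised within each time window, they sum to 1 per window, which is what makes windows with very different disruption totals comparable.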

![Disruption causes by time of day analysis](disr_causes_time.png)
*Figure 7: Disruption causes by time of day analysis*

What stands out from the results is that almost throughout the entire day, the majority of disruptions are caused by rolling stock. This is a collective term for all railway vehicles, referring to issues such as a defect in the train itself (e.g., doors, engines, brakes, or electronics), a stranded train, or a failure related to the operation of the rolling equipment. Only during the night (00:00–06:00) is this not the dominant cause. This can likely be explained by the fact that there is very little train traffic at night, which reduces the likelihood of mechanical failures and operational disruptions. Moreover, more maintenance and inspections are carried out during nighttime hours, allowing potential defects to be detected and resolved before they can cause a disruption.

However, other causes dominate during the night. Infrastructure and staff are the two main categories. This can be explained by the fact that scheduled maintenance of the railway network often takes place at night. Such activities are recorded in the open data as infrastructure causes, since they affect the regular timetable. The large share of staff-related causes can also be explained by the limited personnel availability during these hours. With fewer train drivers, conductors, traffic controllers, and maintenance engineers on duty, any absence or delay quickly results in a registered staff cause.

During the day (06:00–19:00), the distribution of causes remains relatively stable, with rolling stock being the primary contributor in approximately 40% of all cases. The second major cause is infrastructure-related disruptions, accounting for about 25% of the total. In the dataset, infrastructure refers to failures or defects in the railway infrastructure itself, that is, everything that is physically part of the rail system rather than the trains. A potential difference between peak and off-peak hours was expected, but no clear variation can be observed.

Another striking result is that the relative number of accident-related causes in the evening (19:00–00:00) is much higher than during the rest of the day. As much as 20% of the disruptions in this period are caused by accidents, compared to no more than 10% at other times, which is roughly twice as high. This increase in the evening is likely the result of several combined factors. Dusk and darkness increase the risk of mistakes at level crossings and near the tracks. In addition, evening hours involve more leisure travel and movements with less routine and predictability, which can lead to riskier behavior around the railway network.

## Can disruption counts be forecasted accurately to improve predictions over calendar-only baselines?

To answer research question 4, a forecasting model had to be chosen and evaluated. After reviewing several common approaches, a few candidate models were evaluated to determine which would perform best. Firstly, a Random Forest Classifier from the scikit-learn module was used. This classifier was used to forecast: “Will a disruption occur in a certain hour?”. The answer is a binary (yes/no) result. As shown in Figure 8, the accuracy of this model was only 63%. More important is the recall metric in this table. While the recall on “No disruption” looks very good at 87%, the recall on hours in which a disruption does occur was only 23%, which is a very poor score. Only 23% of hours in which a disruption occurred were forecasted by the model, making it a poor forecasting model for this application.

![Metrics of the first prediction model](first_model.png)
*Figure 8: Metrics of the first prediction model*
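The gap between overall accuracy and the recall on disruption hours follows from how the two metrics are defined: accuracy counts all correct predictions, while recall is computed per class and only looks at the hours that truly belong to that class. The sketch below illustrates this with a small set of hypothetical labels (0 = no disruption, 1 = disruption). It is not the project's data or code, and the function names are assumptions.

```python
def accuracy(y_true, y_pred):
    """Share of all hours that were predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall_per_class(y_true, y_pred):
    """Per class: correctly predicted hours / all hours truly in that class."""
    recalls = {}
    for label in sorted(set(y_true)):
        predictions = [p for t, p in zip(y_true, y_pred) if t == label]
        recalls[label] = sum(p == label for p in predictions) / len(predictions)
    return recalls

# Hypothetical labels: six hours without and four hours with a disruption
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 0, 0, 0, 1]

print(accuracy(y_true, y_pred))
print(recall_per_class(y_true, y_pred))
```

In this toy example most correct predictions come from the majority class, so accuracy stays moderate even though only one in four disruption hours is found.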

Secondly, a different approach was considered, using the negative binomial regression from the statsmodels module. This model is suited to overdispersed count data, which often contains a lot of zeroes, as is the case for our data (over 50% of hours in the test data do not contain a disruption). After evaluating the model, the recall on “No disruption” looks good at 75%, and the recall on disruptions improved markedly to 50%. This means that the model identified 50% of hours in which a disruption actually occurred. A closer look at the results revealed a concerning metric, the root mean squared error (RMSE). As this regression model does not forecast a binary result but the number of disruptions that occurred in that hour, the RMSE could be evaluated. With a mean of ~0.62 for the test data, an RMSE of 3.418 is very high. This indicates that our model can estimate “will a disruption occur”, but cannot accurately determine how many disruptions would occur in a specific hour.

![Metrics of the second prediction model](second_model.png)
*Figure 9: Metrics of the second prediction model*
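To put the reported numbers in proportion: an RMSE of 3.418 against a mean of ~0.62 means the typical error is about 5.5 times the average hourly count. The sketch below shows how the RMSE is computed and how a single hour with many disruptions can dominate it. The hourly counts are hypothetical and the function name is an assumption.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted hourly counts."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical hourly counts: mostly zeros, with one outlier hour
actual = [0, 0, 1, 0, 0, 0, 1, 0, 0, 12]
predicted = [0.6] * len(actual)  # a flat forecast, for illustration only

print(sum(actual) / len(actual))  # mean hourly count
print(rmse(actual, predicted))    # dominated by the outlier hour
```

Because errors are squared before averaging, the one outlier hour accounts for almost all of the RMSE in this example, while the forecast is close for the remaining hours.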

Finally, a Random Forest Regressor from the scikit-learn module was evaluated. After running the model and producing the same classification report, the following results emerged:

![Metrics of the third prediction model](third_model.png)
*Figure 10: Metrics of the third prediction model*

As can be seen, the recall on “No disruption” is not ideal, as the model only identifies 54% of hours in which no disruptions occur. On the other hand, the recall on the disruptions looks promising: a 76% identification rate shows that the model identifies most hours in which a disruption occurs (binary). Additionally, in contrast with the second model, the root mean squared error was considerably lower at just over 1. The remaining error can be attributed to inherent outliers in the data, as some hours contain a large number of disruptions. Plotting the average predicted disruptions of the Random Forest Regressor against the held-out test data, which the model has not seen, shows promising results. The model captures the patterns in the disruption data, and the predicted curve follows the curve of the actual disruptions. Averages were plotted to make the data more insightful, as the plot would look very messy otherwise.

![Disruption prediction based on the third model](disruption_plot_final_predicted.png)
*Figure 11: Disruption prediction based on the third model*

## Conclusion
*(Section content to be added.)*


## Contribution Statement
| **Student Name** | **Contribution**                                                                                                                       |
|------------------|----------------------------------------------------------------------------------------------------------------------------------------|
| Ioannis Nikas    | RQ1 OVchipkaart dataset analysis and interpretation, formatting, references                                                            |
| Yannis Kiesslich | RQ2 train journey distribution + average delay: coding, analysis, interpretation; RQ1: coding, analysis, interpretation                |
| Frank Ansink     | RQ1 initial analysis of disruptions per hour and RQ4 forecasting analysis on the disruption datasets                                   |
| Bogdan Buzatu    | Analysis of train services dataset for general purposes and for an attempted forecast based on this data; GitHub repository management |
| Raoul de Rooij   | RQ3 Analysis on the disruptions happening at specific time of the day: coding, analysis, interpretation                                |


## References

[1] Rijden de Treinen. “Statistics,” Accessed: Nov. 2, 2025. [Online].
Available: <https://www.rijdendetreinen.nl/en/statistics>

[2] Rail Pass. “List of Busiest Railway Stations in the Netherlands,” Accessed: Nov. 2, 2025. [Online].
Available: <https://www.rail-pass.com/list-of-busiest-railway-stations-in-the-netherlands>

[3] Translink. “Open Data,” 2025. [Online].
Available: <https://translink.nl/open-data/>

[4] Rijden de Treinen. “Train Archive — Open Data,” 2025. [Online].
Available: <https://www.rijdendetreinen.nl/en/open-data/train-archive#:~:text=Related%20data-,Description%20of%20the%20data,website%20of%20Rijden%20de%20Treinen>

[5] DuckDB Foundation. “Dutch Railway Datasets,” 2025. [Online].
Available: <https://duckdb.org/docs/stable/guides/snippets/dutch_railway_datasets>

[6] Rijden de Treinen. “Open Data — Disruptions,” 2025. [Online].
Available: <https://www.rijdendetreinen.nl/en/open-data/disruptions>