# Prospective High Speed Rail Connections in Western Europe

 #### TIL6022 Python Programming - Group 3  
 Nicholas Alvord - 6119131  
 Amelija Ancupane -   
 David Eijbergen -   
 Yannick Elsten - 6306292  
 Koen Groeneveld -  6283594


## Introduction  
  
  
After the end of World War II, many governments around the world believed cars and airplanes were the future of transport and invested heavily in airports and highways. While cars and planes are indeed useful forms of transportation and certainly have their place in the world economy, another form of transportation can transform and revitalize the cities and regions in which it is built: high-speed rail. High-speed rail (HSR) is not a new technology, yet the environmental benefits and positive socio-economic externalities it can bring to communities are substantial. Large reductions in greenhouse gas emissions, less congestion on roads and in the air, and the spurring of development along corridors are just a few of them.  

Within the last few decades, many countries in Europe have chosen to create and expand their HSR networks, connecting populations with a faster, greener, and more convenient mode of travel. However, there are still major gaps in those networks, especially across international borders. Given the EU goal of becoming a climate-neutral economy by 2050, it is crucial that investments in high-speed rail are carried out in a timely manner. The remaining question is where such systems should be built. This report therefore aims to answer the following question:

**Which city-pairs in Europe could stand to benefit the most from a high-speed rail connection?**

The answer to this question is based on three main factors:

&nbsp;&nbsp;&nbsp;&nbsp;a.	The number of current air passengers.  
&nbsp;&nbsp;&nbsp;&nbsp;b.	The distance between city centers.  
&nbsp;&nbsp;&nbsp;&nbsp;c.	The metropolitan population of both cities.

In addition, two sub-questions were devised to give better context to the main question:  

&nbsp;&nbsp;&nbsp;&nbsp;1.	Which city pairs already have the highest number of air passengers?  
&nbsp;&nbsp;&nbsp;&nbsp;2.	Which city pairs fall within the set distance range?  

This report only focuses on city pairs that do not already have an HSR line between them. A high-speed rail line was defined according to Category 3 of European Union Directive 96/48/EC, Annex 1, which requires the tracks and infrastructure to be designed for high speeds, permitting a maximum speed of at least 200 km/h. For the purposes of this report, a rail line does not need to reach those speeds along its whole length to be considered high speed. However, a route with only a single segment where speeds over 200 km/h are reached is still considered a normal-speed line, and is thus eligible for inclusion in the study.

The geographical scope of the report was constrained to six of the largest economies in the EU: France, Germany, Spain, Belgium, Luxembourg, and the Netherlands. These countries were chosen for their economic and political strength within the EU, their well-developed rail networks, and their likelihood of investing further to improve those networks.

High-speed rail networks are expensive to build, and there are several reasons not to build an HSR connection between two cities, the foremost of which is distance. After consulting multiple sources, the ideal range for an HSR connection between two cities was determined to be between 150 and 1200 km. This range was considered appropriate because it is where the door-to-door travel time of HSR is most competitive with other options. For shorter distances, the time difference between HSR and normal-speed rail does not justify the higher cost of an HSR line, while longer distances face heavy competition from airplanes, especially if those routes cross large bodies of water or mountain ranges.

Data from Eurostat was used because of its breadth, good formatting, and ease of access. A four-year timespan, from 2016 to 2019, was chosen so that passenger numbers could be compared from year to year and trends could be extrapolated. Data from 2020 onward was not used due to the large impact of the COVID-19 pandemic and the subsequent lockdowns on the global economy and on passenger traffic between countries.

## Data Pipeline

### Datasets used

**Eurostat air passenger data**  
Six datasets from Eurostat were used, one for each of the countries: Germany, the Netherlands, Belgium, Luxembourg, Spain, and France [1]. The datasets were processed to extract information on:  
*	Year [2016, 2017, 2018, 2019]  
*	Origin airport code  
*	Destination airport code  
*	Number of air passengers carried between city pairs  
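
As a minimal sketch of this extraction step, the snippet below assumes that each Eurostat route is identified by a code of the form `CC_ICAO_CC_ICAO` (reporting country and airport, followed by partner country and airport) and that the rows have already been read into dictionaries. The key names `airp_pr`, `year`, and `passengers`, as well as the helper names, are assumptions made for illustration and are not taken from the project notebook.

```python
from typing import NamedTuple


class Route(NamedTuple):
    origin_country: str
    origin_airport: str
    dest_country: str
    dest_airport: str


def parse_airport_pair(code: str) -> Route:
    """Split a code such as 'NL_EHAM_ES_LEMD' into its four parts."""
    parts = code.strip().split("_")
    if len(parts) != 4:
        raise ValueError(f"unexpected airport-pair code: {code!r}")
    return Route(*parts)


def extract_records(rows, years=(2016, 2017, 2018, 2019)):
    """Yield (year, origin, destination, passengers) for the chosen years.

    `rows` is an iterable of dicts with the assumed keys 'airp_pr',
    'year' and 'passengers'. Rows outside the study period or without
    a passenger count are skipped.
    """
    for row in rows:
        if row["year"] not in years or row["passengers"] is None:
            continue
        route = parse_airport_pair(row["airp_pr"])
        yield (row["year"], route.origin_airport,
               route.dest_airport, row["passengers"])
```

Keeping the parsing in a small function of its own makes it easy to reject malformed codes early, before they reach the later merging steps.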

**City name dataset**  
Since no suitable dataset was found, the airport codes from the Eurostat datasets were given to ChatGPT to create a table with two columns: the 4-letter airport code and the name of the city in which the airport is located. The table was reviewed to ensure consistency and accuracy, then saved to Excel and imported into the Jupyter notebook.  

**NUTS-3 data**  
To define the service area of each airport, we decided to use NUTS-3 (Nomenclature of Territorial Units for Statistics) regions. NUTS-3 regions represent the most detailed classification of territorial statistical units within the European Union and usually cover a large municipality or a group of municipalities. Information on the distances between pairs of NUTS-3 regions is available from Mendeley Data [2]. Furthermore, information on the number of inhabitants of each NUTS-3 region is available from Eurostat [3].  

**High-speed rail information**   
Information about the already existing high-speed rail infrastructure was taken from multiple sources and manually combined into a dataset containing a total of 33 unique rail routes. High-speed rail (HSR) refers to rail systems designed to operate at speeds exceeding 250 km/h (155 mph) on specially built tracks. While trains may need to reduce speed in certain areas, such as urban environments, most of the tracks should be suitable for high-speed travel.  

**Population information**  
Relevant population data was taken from a dataset found on Eurostat [4], which provides the number of inhabitants of every NUTS-3 region for the years 1990 to 2023.  

**City Location Data**  
Longitude and latitude data for the cities identified in the previously mentioned datasets were obtained from OpenStreetMap (OSM) using Nominatim. Nominatim is a tool that can be used to access OSM data and its geocoding. Nominatim was installed as a Python library.
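
Once coordinates are available, one possible use is a straight-line check of the distances between city centers. The sketch below computes the great-circle distance with the haversine formula. This is an illustrative assumption rather than the method used in this report, which takes its distances from the NUTS-3 dataset; the function name `haversine_km` is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate city-center coordinates, for illustration only.
amsterdam = (52.37, 4.90)
paris = (48.86, 2.35)
distance = haversine_km(*amsterdam, *paris)
```

A straight-line distance is always a lower bound on the actual rail distance, so it is better suited to sanity checks than to travel-time estimates.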

### Data Pipeline




## Results

### Analysis of city population  
To analyze the potential demand for new high-speed rail connections more comprehensively, we examine both actual passenger flows and the population of the cities within our research scope. Even where air passenger traffic between two highly populated areas is limited, these cities could still benefit from high-speed rail services, because the demand for mobility between them may currently be met by other modes such as cars or buses while overall travel demand remains substantial. Each city or airport is assigned the number of inhabitants, for the years 2016 to 2019, of the NUTS-3 region in which it is located. The largest cities by population within the scope of this research include Madrid, Barcelona, and Berlin (Figure PX1). Paris, although one of the largest urban agglomerations in Europe, does not appear in that list because the city's airports (Paris Orly and Paris Charles de Gaulle) are located in different NUTS-3 regions, which serve as the reference for the number of inhabitants. The population of Paris as a whole is therefore not represented in our dataset, although it is likely relevant when gauging potential demand between Paris and other European cities.  
![Plotly Figure](Plot_Graphics/Largest_cities.png)  
Figure PX1: Largest cities by population  

Figure PX2 shows a geographical overview of all chosen cities and airports in the scope of this research, together with the number of inhabitants of their respective NUTS-3 regions. The size of the bubble for each city represents its 2019 population.  
![Plotly Figure](Plot_Graphics/City_Population.png)  
Figure PX2: Map of cities with associated population  

In the following, city pairs are analyzed based on the combined population of their NUTS-3 regions to identify candidates for a high-speed rail connection based on potential demand. Each city pair is assigned the combined number of inhabitants of both urban areas and plotted in Figure PX3, which shows the combined population against the distance between the cities in kilometers. Three major clusters of data points can be identified in this scatter plot. The largest covers lower combined populations and spans distances from less than 100 km to about 2000 km; it contains intra-national as well as international connections of various kinds within Western Europe. The second cluster features higher combined populations and distances between about 250 km and 1500 km. Because of their high combined populations, these are mostly city pairs involving Madrid or Barcelona, the two cities with the largest population in the dataset. The third cluster lies at distances of about 3000 km to 3500 km with combined populations at the lower end, mostly featuring trans-European connections between urban areas in Western Europe and Spanish islands in the Atlantic Ocean, such as Tenerife or Fuerteventura.  
![Plotly Figure](Plot_Graphics/Scatterplot_Distance-Population.png)  
Figure PX3: Scatter plots of distance in relation to combined population per year  
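
The combined-population step can be sketched as follows, assuming the population of each city for a given year is already available in a dictionary. The function name and the input shapes are hypothetical and chosen for illustration; the project notebook may organize this differently.

```python
def combined_population(pairs, city_population):
    """Return {(city_a, city_b): population_a + population_b}.

    Pairs with a city missing from `city_population` are skipped, and
    each pair is stored in sorted order so that A-B and B-A count as
    one connection.
    """
    result = {}
    for a, b in pairs:
        if a == b or a not in city_population or b not in city_population:
            continue
        key = tuple(sorted((a, b)))
        result[key] = city_population[a] + city_population[b]
    return result
```

Storing each pair in sorted order avoids counting the same connection twice when the source data lists both directions of a route.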

The range of distances in which a high-speed rail connection between two cities is deemed viable lies between about 150 km and 1200 km. After applying these bounds to the data, the ten city pairs with the highest combined population are determined for each year. The results of this analysis can be found in Figure PX4. As the NUTS-3 regions of Madrid and Barcelona both have a large population, these two cities feature heavily in the lists, and the connection between the two of them consistently ranks first. The other spots are occupied by connections between these cities and other large cities in Spain and France. 
![Plotly Figure](Plot_Graphics/Population_Top10_2016.png) 
![Plotly Figure](Plot_Graphics/Population_Top10_2017.png) 
![Plotly Figure](Plot_Graphics/Population_Top10_2018.png) 
![Plotly Figure](Plot_Graphics/Population_Top10_2019.png) 
Figure PX4: Top ten city pairs by combined population, per year
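
The filtering and ranking described above can be sketched as below. The record layout `(year, pair, distance_km, combined_population)` and the function name are assumptions for illustration, and the bounds are treated as inclusive, which the report does not specify.

```python
from collections import defaultdict

MIN_KM, MAX_KM = 150, 1200  # viable HSR range used in this report


def top_pairs_per_year(records, n=10):
    """Rank city pairs by combined population within the distance range.

    `records` is an iterable of (year, pair, distance_km, population)
    tuples. Returns {year: [(pair, population), ...]} with at most `n`
    entries per year, largest population first.
    """
    by_year = defaultdict(list)
    for year, pair, distance_km, population in records:
        if MIN_KM <= distance_km <= MAX_KM:  # inclusive bounds (assumption)
            by_year[year].append((pair, population))
    return {
        year: sorted(rows, key=lambda row: row[1], reverse=True)[:n]
        for year, rows in by_year.items()
    }
```

Grouping by year before sorting keeps each year's ranking independent, which matches the per-year charts in Figure PX4.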

## Conclusion

## Discussion

## Contribution Statement

Author 1 - Nicholas Alvord: Introduction, creation of distance/air passenger volume scatterplot and analysis, discussion, limitations 

Author 2 - Amelija Ancupane:  

Author 3 - David Eijbergen:

Author 4 - Yannick Elsten:

Author 5 - Koen Groeneveld:

## References
