# Project Group 34 - Final Notebook

Members: Vasileios Sideris, Piotr Pietrzak, Davide Centorrino, Zacharias Zervos, Nicola Bovo

Student numbers: 5842727, 5717086, 5851629, 5846307, 5619157

# Research Objective

In this project, our objective is to analyze how the COVID-19 pandemic influenced the transportation of goods and passengers by sea within the European Economic Area (EEA) over a five-year period. We examine changes in maritime transportation behavior and their direct impact on CO2 emissions, and compare maritime emissions to overall CO2 trends in the European region during the pandemic. This research aims to provide insights into the adaptability and resilience of the maritime sector.

Main research question: What are the multifaceted impacts of the COVID-19 pandemic on maritime transportation and environmental sustainability within the European Economic Area (EEA)?

Sub-research questions:
 1. How did the COVID-19 pandemic impact the volume and patterns of goods transported by sea within the EEA from 2018 to 2022?

 2. What were the changes in passenger maritime transportation behaviors during the COVID-19 pandemic in the EEA, and how do they differ across various modes of sea travel?

 3. How did the alterations in maritime transportation practices during the COVID-19 pandemic correlate with fluctuations in CO2 emissions in the EEA?

 4. How do the changes in maritime CO2 emissions within the EEA during the COVID-19 pandemic compare to the broader CO2 emission trends across Europe, and what insights can be gained from this comparison for sustainable transportation strategies?

# Introduction
WRITE HERE WHY THE RESEARCH OBJECTIVE IS RELEVANT

In [2]:
import pandas as pd
import numpy as np
import pycountry as pc
import plotly.express as px
import seaborn as sb
import os
import matplotlib.pyplot as plt
import geopandas as gpd
from plotly.subplots import make_subplots
import plotly.graph_objects as go

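# Platform-independent path to the processed data; the trailing "" keeps the final separator
datapath = os.path.join(os.path.abspath(""), "DATA_PROCESSED", "")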

# 1. Analysis of the pattern of goods transported by sea in EEA
This section focuses on the analysis of the pattern of goods transported by sea in the European Economic Area (EEA) during the period 2018-2022.

WRITE A PARAGRAPH ON WHAT'S THE FOCUS OF THE SECTION

WRITE HOW THE DATA HAS BEEN PROCESSED

WRITE HERE WHAT CAN BE DEDUCTED FROM THE GRAPH (COULD ALSO GO AFTER THE FIGURE)

In [2]:
# Choropleth map displaying the impact on the imports/exports of goods

WRITE HERE WHAT CAN BE DEDUCTED FROM THE GRAPH (COULD ALSO GO AFTER THE FIGURE)

In [3]:
# Chronological representation of the changes in the transportation of goods from the top 10 most affected countries within the EEA. 

WRITE HERE THE GENERAL CONCLUSIONS OF THE FIRST SECTION

# 2. Analysis of the pattern of passenger transportation behaviors in EEA
This section focuses on the analysis of the pattern of passenger travel in the European Economic Area during the period 2018-2022 (extending into Q1 and Q2 of 2023 for some reporting countries). To conduct this analysis, three datasets from Eurostat were used, one each for aviation, maritime and rail passenger transport in European countries. They are available under the following links and, for the purpose of this project, were accessed on 20-10-2023:
- Aviation passenger transport data: https://ec.europa.eu/eurostat/databrowser/view/avia_paoc__custom_8077441/default/table?lang=en
- Maritime passenger transport data: https://ec.europa.eu/eurostat/databrowser/view/MAR_PA_QM__custom_8077490/default/table?lang=en
- Rail passenger transport data: https://ec.europa.eu/eurostat/databrowser/view/rail_pa_quartal__custom_8373712/default/table?lang=en

The focus of the analysis is to establish the general trends of the volume of passenger transport in European countries, visualize the impact of the COVID-19 pandemic on passenger transport and take a closer look at some countries most affected by the pandemic in terms of maritime passenger transport.

The datasets were downloaded from the website as .csv files and preprocessed in the dataprocessing.py file. Columns with irrelevant data were removed, and a column was appended containing the passenger numbers normalized, per country, by the passenger count of that country's busiest quarter of 2019.
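The normalization described above can be sketched as follows. This is a minimal illustration on made-up numbers, not the code from dataprocessing.py: the raw count column name `Passengers` is an assumption, and the result is expressed in percent because the axis labels later in this notebook use that unit.

```python
import pandas as pd

# Hypothetical toy data; 'Passengers' is an assumed name for the raw count column
df = pd.DataFrame({
    'Geo': ['FI'] * 5 + ['NL'] * 5,
    'Time_period': ['2019-Q1', '2019-Q2', '2019-Q3', '2019-Q4', '2020-Q2'] * 2,
    'Passengers': [100, 200, 400, 150, 40, 50, 80, 60, 70, 20],
})

# Busiest 2019 quarter per country, used as the normalization reference
is_2019 = df['Time_period'].str.startswith('2019')
peak_2019 = df[is_2019].groupby('Geo')['Passengers'].max()

# Volume as a percentage of each country's busiest 2019 quarter
df['Norm_2019'] = 100 * df['Passengers'] / df['Geo'].map(peak_2019)
```

Normalizing per country makes countries with very different absolute passenger volumes comparable on a single axis.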

In the case of the maritime transport dataframe, the data was structured slightly differently from the other transport modes: the passenger counts were given per reporting entity rather than per country. Some countries, such as Sweden or Denmark, had multiple reporting entities. Since the reporting entity codes started with the code of their respective countries, the counts only needed to be summed per country to obtain results of the same type as for the other transport modes. Additionally, several countries' data was missing in certain time periods:
- Data for Bulgaria was missing entirely;
- Data for Belgium, Montenegro and United Kingdom was missing from Q2 of 2020;
- Data for France and EU27 (2020) was missing from Q1 of 2022;
- Data for Sweden, Norway, Denmark, the Netherlands and Italy was missing from Q1 of 2023;
- Data for Finland was missing from Q2 of 2023.

In the specified periods the passenger count was reported as zero, which made this data unusable for the study because the trends would be incomplete. Additionally, if the data was missing from Q2 of 2020 or earlier, no valid drop in passenger transport volume could be calculated. The decision was therefore made to delete the passenger values for those periods, so that the plots would not be cluttered with invalid information. Then, the unit of the passenger counts was converted from thousands of passengers to passengers. Finally, since this dataset started from the year 2017, only the data from the years 2018-2023 was kept for easy comparison with the other transport modes.
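A minimal sketch of these maritime preprocessing steps is given below. It is not the code from dataprocessing.py. The column names `Rep_entity` and `Passengers_thousands` and the entity codes are hypothetical, and the sketch assumes that a zero count always signals a missing report, whereas the text above applies the deletion only to specific countries and periods.

```python
import pandas as pd

# Hypothetical raw rows, one per reporting entity and quarter
raw = pd.DataFrame({
    'Rep_entity': ['SE_A', 'SE_B', 'SE_A', 'SE_B', 'FI_A', 'FI_A'],
    'Time_period': ['2019-Q2', '2019-Q2', '2020-Q2', '2020-Q2', '2019-Q2', '2020-Q2'],
    'Passengers_thousands': [300, 200, 60, 40, 150, 0],
})

# The entity code starts with the two-letter country code
raw['Geo'] = raw['Rep_entity'].str[:2]

# Sum all reporting entities of a country within each quarter
per_country = raw.groupby(['Geo', 'Time_period'], as_index=False)['Passengers_thousands'].sum()

# Treat zero counts as missing (assumption), then convert thousands to passengers
counts = per_country['Passengers_thousands']
per_country['Passengers'] = counts.where(counts > 0) * 1000
```

Replacing the zeros with NaN rather than dropping the rows keeps the time axis complete, so line plots show a gap instead of a misleading drop to zero.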

In the case of the rail transport dataframe, the data in the unit of thousands of passengers was first separated from the data in the unit of millions of km travelled. Then, as for the maritime passenger dataset, countries with missing data were removed. All affected countries were missing data over the entire time range of the dataset: Austria, Belgium, Serbia, Bosnia and Herzegovina, as well as all combined data for the EU, namely EU27 (2020), EU28 and EU27 (2007). Then, the unit of the passenger counts was converted from thousands of passengers to passengers. Finally, the dataset was cut to the time range of 2018-2023.
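The rail-specific steps could look roughly like the sketch below. The column names (`Unit`, `Value`) and the unit codes are assumptions made for illustration and may not match the actual Eurostat export or dataprocessing.py.

```python
import numpy as np
import pandas as pd

# Hypothetical raw rail data mixing two units; 'AT' has no values at all
rail_raw = pd.DataFrame({
    'Geo': ['NL', 'NL', 'FI', 'FI', 'AT', 'AT'],
    'Time_period': ['2019-Q2'] * 6,
    'Unit': ['THS_PAS', 'MIO_PKM'] * 3,
    'Value': [90000.0, 4500.0, 23000.0, 1200.0, np.nan, np.nan],
})

# Keep only the rows expressed in thousands of passengers
rail_pas = rail_raw[rail_raw['Unit'] == 'THS_PAS']

# Drop countries whose values are missing over the whole time range
rail_pas = rail_pas.groupby('Geo').filter(lambda g: g['Value'].notna().any()).copy()

# Convert thousands of passengers to passengers
rail_pas['Passengers'] = rail_pas['Value'] * 1000
```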

**Load relevant processed datasets**

In [3]:
df_passengers_avia = pd.read_csv(datapath+"df_passengers_avia.csv").iloc[:,1:]
df_passengers_mar = pd.read_csv(datapath+"df_passengers_mar.csv").iloc[:,1:]
df_passengers_rail = pd.read_csv(datapath+"df_passengers_rail.csv").iloc[:,1:]

**Visualisation of general trends in passenger transport across European countries per transport mode**
First, a simple collection of line plots was created to showcase the trends of passenger transport in Europe between 2018 and 2023. To do this, the normalized passenger counts for each country and each quarter of the time range were plotted together, which required no additional data processing. As mentioned earlier in this chapter, some countries had missing data in various parts of the time range, which can be seen in the graphs.

In [7]:
Q1_fig1a = px.line(df_passengers_avia, 
                   x='Time_period', 
                   y='Norm_2019', 
                   color='Geo',
                   title='Aviation passenger transport volume in Europe, normalized for highest quarter of 2019 per country',
                   labels={'Time_period':'Time period','Norm_2019':'Normalized passenger transport volume [%]','Geo':'Country code'})
Q1_fig1b = px.line(df_passengers_mar, 
                   x='Time_period', 
                   y='Norm_2019', 
                   color='Geo',
                   title='Maritime passenger transport volume in Europe, normalized for highest quarter of 2019 per country',
                   labels={'Time_period':'Time period','Norm_2019':'Normalized passenger transport volume [%]','Geo':'Country code'})
Q1_fig1b.update_xaxes(tickangle=30)
Q1_fig1c = px.line(df_passengers_rail, 
                   x='Time_period', 
                   y='Norm_2019', 
                   color='Geo',
                   title='Rail passenger transport volume in Europe, normalized for highest quarter of 2019 per country',
                   labels={'Time_period':'Time period','Norm_2019':'Normalized passenger transport volume [%]','Geo':'Country code'})
Q1_fig1c.update_xaxes(tickangle=30)
Q1_fig1a.show()
Q1_fig1b.show()
Q1_fig1c.show()

From the normalized passenger transport graphs it can be deduced that the three transport modes differed significantly from each other in terms of general trends in European countries. While aviation and maritime transport saw significant peaks in every third quarter of a year (likely attributable to people preferring to go on vacations and travel by air or by water during summer), rail transport was more or less constant for most depicted countries. 

All three transport modes saw a significant dip in Q2 of 2020, when the first COVID-19 regulations were introduced, but air travel was clearly the most heavily affected: in all European countries, aviation passenger transport dropped below 10% of the respective 2019 norm. For the other transport modes, the severity of the regulations appears to have varied from country to country, as indicated by the much broader spread of normalized passenger transport volumes. This is especially visible in rail travel, where some countries maintained over 50% of their 2019 norm in Q2 of 2020. In most countries, maritime passenger transport saw a much smaller reduction in volume after the introduction of COVID-19 measures in Q2 of 2020 than the other modes did. This is probably due to its much smaller absolute passenger volume and to passenger areas on ferries tending to be less cramped. Lastly, after the first COVID-19 regulations were introduced in Europe, rail passenger volumes climbed back towards their earlier levels. The pace of this recovery, however, differed greatly from country to country, resulting in a much wider spread between the countries' lines than before the pandemic.

**Bar graph of drop in normalized maritime passenger transport volume**

The next part of this chapter focuses on the magnitude of the drop in normalized maritime passenger transport volume. This analysis required some additional data processing. For every country in the maritime passenger transport dataframe, the difference between the normalized passenger transport volumes in Q2 of 2019 and in Q2 of 2020 was calculated. The intention was to pick the top three countries in terms of the drop in maritime passenger volume when the COVID-19 regulations were introduced in Europe. Then, for each of the three countries, a separate line graph would be created comparing the trends in all three listed means of travel. The country with the largest drop in maritime passenger volume was Belgium. However, for every quarter after Q1 of 2020, the reported number of maritime passengers in that country was either 2000 or 4000, all the way until Q2 of 2023, where the dataset ends. This, together with the fact that no data was provided on passenger travel by rail in Belgium, led to the conclusion that Belgium needed to be disregarded in this analysis, with the top three countries being selected as those coming after Belgium.

This is why, for clarity in the bar graph, the dataframe with the maritime passenger volume drop was split into three parts: Belgium, the top three countries coming after Belgium, and the rest. The top three countries after Belgium were given a distinguishing blue color in the bar graph, as can be seen below.
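The drop calculation described above can also be expressed without an explicit loop. The sketch below is an illustrative alternative on made-up values, not the notebook's own computation. It assumes every country has exactly one row for each of the two quarters.

```python
import pandas as pd

# Hypothetical normalized volumes [%] for three countries
toy = pd.DataFrame({
    'Geo': ['LV', 'LV', 'LT', 'LT', 'FI', 'FI'],
    'Time_period': ['2019-Q2', '2020-Q2'] * 3,
    'Norm_2019': [80.0, 10.0, 60.0, 50.0, 90.0, 25.0],
})

# One row per country, one column per quarter
wide = toy.pivot(index='Geo', columns='Time_period', values='Norm_2019')

# Drop in percentage points between Q2 of 2019 and Q2 of 2020, largest first
passenger_drop = (wide['2019-Q2'] - wide['2020-Q2']).sort_values(ascending=False)
```

Because both quarters are already normalized to the same 2019 reference, the difference is a drop in percentage points rather than a relative change.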

In [8]:
# Drop in normalized maritime volume between Q2 of 2019 and Q2 of 2020, per country
mar_countries = list(df_passengers_mar['Geo'].unique())
drop_rows = []
for country in mar_countries:
    df_country = df_passengers_mar[df_passengers_mar['Geo'] == country]
    norm_2019q2 = df_country[df_country['Time_period'] == '2019-Q2']['Norm_2019'].to_numpy()[0]
    norm_2020q2 = df_country[df_country['Time_period'] == '2020-Q2']['Norm_2019'].to_numpy()[0]
    drop_rows.append({'Geo': country, 'Passenger_drop': norm_2019q2 - norm_2020q2})
# Building the dataframe once avoids the deprecated concat with an empty dataframe
mar_per_diff = pd.DataFrame(drop_rows, columns=['Geo', 'Passenger_drop'])
mar_per_diff = mar_per_diff.sort_values(by='Passenger_drop', ascending=False)
# The first row (largest drop) is kept separate from the three selected countries
mar_per_diff_rest1 = mar_per_diff.iloc[:1,:]
mar_per_diff_top = mar_per_diff.iloc[1:4,:]
mar_per_diff_rest2 = mar_per_diff.iloc[4:,:]

Q1_bardata = [go.Bar(x=mar_per_diff_rest1['Geo'],y=mar_per_diff_rest1['Passenger_drop'], name='Rest', marker_color=['lightslategray',]*len(mar_per_diff_rest1), showlegend=False),
              go.Bar(x=mar_per_diff_top['Geo'],y=mar_per_diff_top['Passenger_drop'], name='Selected 3', marker_color=['darkblue',]*3, showlegend=False),
              go.Bar(x=mar_per_diff_rest2['Geo'],y=mar_per_diff_rest2['Passenger_drop'], name='Rest', marker_color=['lightslategray',]*len(mar_per_diff_rest2), showlegend=False)]

Q1_fig2 = go.Figure(data = Q1_bardata)
Q1_fig2.update_layout(
    title='Bar chart of percentage point drop in normalized maritime passenger volume (Q2 of 2019 vs Q2 of 2020)',
    xaxis_title='Country code',
    yaxis_title='Drop in normalized passenger transport volume')
Q1_fig2.show()





With Belgium removed from the analysis, the top three countries in terms of the magnitude of the maritime passenger volume drop due to the COVID-19 pandemic are Latvia (LV), Finland (FI) and the Netherlands (NL). Their normalized maritime passenger transport volumes fell by 71.7, 70.8 and 65.9 percentage points, respectively. Across Europe, these drops differed significantly from country to country, ranging from 77.3 percentage points in Belgium to as low as 8.8 percentage points in Lithuania. These differences may be attributable to the varying severity of the COVID-19 regulations these countries introduced, as well as to differing views on social responsibility among their inhabitants at the time, both of which would reduce the number of passengers transported.

**Visualisation of combined modes of transport for countries with highest maritime passenger transport volume drop**

The third part of this chapter focuses on the three countries with the highest drop in normalized maritime passenger transport volume between Q2 of 2019 and Q2 of 2020, as depicted on the previous bar graph. The chosen countries were Latvia, Finland and the Netherlands, due to lack of relevant data for Belgium. Since the intention of this chapter was to showcase the combined trends in all three transport modes for each of the chosen countries, some data processing was necessary. A nested for loop was created which, for every country from the list, for every quarter from the time range and for every mode of transport, appended a relevant row to a new dataframe, which combined all of the necessary data. 
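As an alternative to a nested loop, the same long-format table can be sketched by tagging each mode's dataframe with a `Mode` column and concatenating. The toy frames below are hypothetical stand-ins for the three processed datasets. Unlike the loop, this variant simply omits combinations without data instead of inserting NaN rows.

```python
import pandas as pd

# Hypothetical stand-ins for the three processed dataframes
avia = pd.DataFrame({'Geo': ['LV', 'FI', 'SE'], 'Time_period': ['2020-Q2'] * 3,
                     'Norm_2019': [5.0, 6.0, 7.0]})
mar = pd.DataFrame({'Geo': ['LV', 'FI', 'NL'], 'Time_period': ['2020-Q2'] * 3,
                    'Norm_2019': [20.0, 25.0, 30.0]})
rail = pd.DataFrame({'Geo': ['FI', 'NL'], 'Time_period': ['2020-Q2'] * 2,
                     'Norm_2019': [60.0, 55.0]})

# Tag each dataframe with its transport mode and stack them
combined = pd.concat(
    [avia.assign(Mode='Aviation'), mar.assign(Mode='Maritime'), rail.assign(Mode='Rail')],
    ignore_index=True,
)

# Keep only the three selected countries
top = combined[combined['Geo'].isin(['LV', 'FI', 'NL'])].reset_index(drop=True)
```

The resulting `top` frame has the same columns as the combined dataframe described above and can be passed to `px.line` with `color='Mode'`.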

In [9]:
# One dataframe per transport mode, keyed by the label used in the plots
mode_frames = {'Aviation': df_passengers_avia,
               'Maritime': df_passengers_mar,
               'Rail': df_passengers_rail}

top_rows = []
for country in ['LV','FI','NL']:
    for time in df_passengers_mar['Time_period'].unique():
        for mode, df_mode in mode_frames.items():
            norm_list = df_mode[(df_mode['Geo']==country) & (df_mode['Time_period']==time)]['Norm_2019'].to_list()
            # np.nan marks quarters without data for a mode (np.NaN was removed in NumPy 2.0)
            norm = norm_list[0] if norm_list else np.nan
            top_rows.append({'Geo':country,'Mode':mode,'Time_period':time,'Norm_2019':norm})

# Building the dataframe once avoids the deprecated concat with an empty dataframe
df_passengers_top = pd.DataFrame(top_rows, columns=['Geo','Mode','Time_period','Norm_2019'])
            
Q1_fig3a = px.line(df_passengers_top[df_passengers_top['Geo']=='LV'], 
                   x='Time_period', 
                   y='Norm_2019', 
                   color='Mode',
                   title='Normalized volume of passenger transport by air, sea and rail in Latvia from 2018 to 2023',
                   labels={'Time_period': 'Time period', 'Norm_2019':'Normalized passenger transport volume [%]','Mode':'Means of travel'})
Q1_fig3a.update_xaxes(tickangle=30)
Q1_fig3b = px.line(df_passengers_top[df_passengers_top['Geo']=='FI'], 
                   x='Time_period', 
                   y='Norm_2019', 
                   color='Mode',
                   title='Normalized volume of passenger transport by air, sea and rail in Finland from 2018 to 2023',
                   labels={'Time_period': 'Time period', 'Norm_2019':'Normalized passenger transport volume [%]','Mode':'Means of travel'})
Q1_fig3b.update_xaxes(tickangle=30)
Q1_fig3c = px.line(df_passengers_top[df_passengers_top['Geo']=='NL'], 
                   x='Time_period', 
                   y='Norm_2019', 
                   color='Mode',
                   title='Normalized volume of passenger transport by air, sea and rail in the Netherlands from 2018 to 2023',
                   labels={'Time_period': 'Time period', 'Norm_2019':'Normalized passenger transport volume [%]','Mode':'Means of travel'})
Q1_fig3c.update_xaxes(tickangle=30)
Q1_fig3a.show()
Q1_fig3b.show()
Q1_fig3c.show()





The plots above show that, in every selected country, the transport modes ranked in the same order relative to their 2019 norms once the pandemic began: rail retained the highest share of its passenger volume, air travel the lowest, and maritime transport fell between the two. Rail travel stayed above the other transport modes up to the end of the dataset, even after European countries lifted their COVID-19 regulations. The only exceptions are a small spike in maritime travel in the Netherlands in Q3 of 2020 and another, more significant one in Q3 of 2022.


**General conclusion of the second section**

It is clear that COVID-19 had a drastic impact on passenger travel in European countries. This can be seen in parts 1 and 3, where all countries experienced massive drops across all analyzed passenger transport modes. Some effects of these drops remain noticeable at the end of the dataset, which is best demonstrated by the uneven return of rail transport to its norms in part 1. The impact of COVID-19 on passenger travel varied significantly from country to country. This is generally visible in parts 1 and 3 but best visualized in part 2, where the drop in normalized maritime passenger transport volume across European countries ranged from 77.3 percentage points for Belgium to as low as 8.8 percentage points for Lithuania.

# 3. Analysis of the impact on the maritime CO2 emissions
This section focuses on the examination of the impact of alterations in maritime transportation practices during the period 2018-2022 on the fluctuations in CO2 emissions within the EEA. 

WRITE A PARAGRAPH ON WHAT'S THE FOCUS OF THE SECTION


WRITE HERE HOW THE DATA HAS BEEN PROCESSED:
    Vessel type categorization: 
    - Passenger Vessels
    - Containerships
    - Oil tankers
    - Bulk Carriers
    - LNG Carriers


WRITE HERE WHAT CAN BE DEDUCTED FROM THE GRAPH (COULD ALSO GO AFTER THE FIGURE)

In [None]:
# Comparative graph depicting the fluctuation of CO2 emissions per ship type

WRITE HERE WHAT CAN BE DEDUCTED FROM THE GRAPH (COULD ALSO GO AFTER THE FIGURE)

In [None]:
# Five donut graphs, one for each year during 2018-2022, depicting the percentage of each type of emissions out of the total maritime emissions for that year.

WRITE HERE THE CONCLUSIONS OF THE PARAGRAPH AND RESULTS

# 4. Analysis of the difference between CO2 emissions in maritime transport and broader emission trends in EEA
This section investigates the impact on the overall CO2 emissions of maritime transport in comparison to the broader trends in CO2 emissions across the EEA.

WRITE A PARAGRAPH ON WHAT'S THE FOCUS OF THE SECTION


WRITE HERE HOW THE DATA HAS BEEN PROCESSED: Evaluating the emission in Europe since the dataset is given for countries across the globe
    Categorization: 
    - Agriculture
    - Buildings
    - Shipping
    - Fuel Exploitation
    - Power Industry
    - Transport
    - Processes
    - Waste

WRITE HERE WHAT CAN BE DEDUCTED FROM THE GRAPH (COULD ALSO GO AFTER THE FIGURE)

In [None]:
# Comparative graphs depicting the fluctuation of CO2 emissions for each individual emission producing sector.

WRITE HERE THE CONCLUSIONS OF THE PARAGRAPH AND RESULTS

# 5. Conclusions

BRIEF REMARKS OVER THE RESULTS PRODUCED FROM THE RESEARCH 

# 6. Contribution Statement

**Nicola Bovo**: Background research & visualisation

**Zacharias Zervos**: Background research & conceptualisation

**Vasileios Sideris**: Data analysis & Data modelling

**Piotr Pietrzak**: Data analysis & Data modelling

**Davide Centorrino**: Data analysis & visualisation 

# 7. Data Used

**Paragraph 1**:
    - Gross weight of goods transported: https://ec.europa.eu/eurostat/databrowser/bookmark/662cc15a-0555-4b7f-87c8-e48013aea8c0?lang=en

**Paragraph 2**:
    - Maritime Transport: https://ec.europa.eu/eurostat/databrowser/bookmark/0d511ae5-24c1-4009-8c27-4992f3d7c652?lang=en
    - Air Transport: https://ec.europa.eu/eurostat/databrowser/bookmark/e274c95c-b4af-4704-8630-9c7560979af0?lang=en
    - Railway Transport: https://ec.europa.eu/eurostat/databrowser/bookmark/e8751a3b-5536-47f1-a543-672db0bc4528?lang=en
    
**Paragraph 3**:
    - CO2 emissions from maritime transport: https://data.europa.eu/data/datasets/co2-emissions-data?locale=en

**Paragraph 4**:
    - CO2 emissions data from maritime transport: https://data.europa.eu/data/datasets/co2-emissions-data?locale=en
    - Overall emissions: https://edgar.jrc.ec.europa.eu/emissions_reports       

STILL TO BE SORTED    
EDGAR - Emissions Database for Global Atmospheric Research: https://edgar.jrc.ec.europa.eu/emissions_reports 
Sea transport of goods: https://ec.europa.eu/eurostat/databrowser/view/TTR00009/default/table?lang=en
Passengers embarked and disembarked in all ports: https://ec.europa.eu/eurostat/databrowser/view/mar_mp_aa_cph/default/table?lang=en

# Data Pipeline

1. Data collection:
    * Transportation data
        * Passenger data
        * Goods data
    * COVID-19 data
    * CO2 emission data
        * Overall emission data
        * Maritime sector emission data
2. Data cleanup
3. Data integration
4. Data analysis
5. Visualisation & Report