# Project Group - 15

Members: 
Sören Burghardt, 
Sarah Blanc, 
Allan Guzmann, 
Lars van den Berg, 
Simon Schreders

Student numbers: 
5861012, 
5854830, 
5718619,
5626668,
5878845

# Research Objective

What effects has COVID-19 had on aviation at Amsterdam Airport Schiphol (EHAM) and its most frequented destinations, for both freight and passenger transport?

When COVID-19 became a global pandemic in 2020, containment measures plunged the global economy into an unprecedented recession. The economic shock was so large that it produced unrecognisable data and a "data fog" that made the economy difficult to interpret and, more importantly, to forecast. The last two years have been extraordinary, both for the global economy and for humanity. While the COVID-19 pandemic now appears to be under control thanks to vaccination programmes, parts of the global economy have not yet fully recovered. Because of the protective travel restrictions, the aviation sector has been hit harder than many other industries. In this project, our aim was to observe the impact COVID-19 has had on aviation, in particular on Amsterdam Airport Schiphol and its most frequented destinations, from the perspective of both passenger and freight transport.

# Contribution Statement


Sören: Background research, coding

Sarah: Background research, coding

Allan: Background research, coding

Lars: Background research, coding

Simon: Background research, coding

# Data Used

COVID-19 cases in the world (Our World in Data)

Flight data, passengers (Eurostat)

Flight data, freight (Eurostat)

OD data for Amsterdam Airport Schiphol (Eurostat)


# Data Pipeline

All flight data for Amsterdam Airport Schiphol (EHAM) were grouped by destination country. This produced two top-5 lists of countries (by 2019 volume): one for cargo and mail and one for passengers. The countries in these two lists, plus the Netherlands, were used to filter the COVID-19 data, using the smoothed number of new confirmed cases per million people in each country.
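The top-5 selection described above can be sketched on a small synthetic table. This is a simplified stand-in for the top_n_year function defined later, not the project code: the helper name top_n_countries is hypothetical, the rows and values are made up, and the route code is assumed to have four "_"-separated parts (origin country, origin airport, destination country, destination airport), which is the layout top_n_year splits on.

```python
import pandas as pd

# Synthetic rows shaped like the Eurostat extract (assumed layout, made-up values)
flights = pd.DataFrame({
    "airp_pr": ["NL_EHAM_UK_EGLL", "NL_EHAM_UK_EGLL", "NL_EHAM_ES_LEMD",
                "NL_EHAM_DE_EDDF", "NL_EHAM_ES_LEMD"],
    "TIME_PERIOD": ["2019-01", "2019-02", "2019-01", "2019-01", "2020-01"],
    "OBS_VALUE": [100, 120, 90, 50, 500],
})

def top_n_countries(df, n, year):
    """Destination countries with the largest OBS_VALUE total in `year` (hypothetical helper)."""
    parts = df["airp_pr"].str.split("_", expand=True)   # columns 0..3 of the route code
    work = pd.DataFrame({
        "country": parts[2],                            # destination country
        "year": df["TIME_PERIOD"].str[:4],              # "YYYY-MM" -> "YYYY"
        "value": df["OBS_VALUE"],
    })
    totals = (work[work["year"] == year]
              .groupby("country")["value"].sum()
              .sort_values(ascending=False))
    return totals.head(n).index.tolist()

print(top_n_countries(flights, 2, "2019"))  # ['UK', 'ES']
```

The large 2020 value for ES is ignored because only the requested year is summed; fixing the lists on 2019 keeps the country selection independent of the pandemic itself.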

As a first step, we import all the libraries we need.

In [2]:
import pandas as pd
import numpy as np

# Plotting libraries (plt is matplotlib; plotly is used through px, go and pio)
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio

# Geographic data and country-code conversion
import geopandas as gpd
import country_converter as coco

pio.renderers.default = "plotly_mimetype+notebook"

# Part I - Data import

First, we import the data frames and modify them so that they fit our project and can be combined with each other.

We start with the flight data, which we modified according to our needs.

In [3]:
# Import datasets
freight = pd.read_csv("Data_Sets/avia_gor_nl__custom_3564729_monthly_linear.csv")
passengers = pd.read_csv("Data_Sets/avia_par_nl__custom_3564728_monthly_linear.csv")

# Country codes that need fixing before conversion: 'EL' (Eurostat's code for Greece) becomes 'GR',
# 'AN' (the former ISO code of the Netherlands Antilles) becomes 'BQ'
country_code_fix = {'EL':'GR','AN':'BQ'}

def top_n_year(df,n,year):
    # Inputs: a dataframe, a number of countries n and a year (string, e.g. '2019').
    # Output: the rows of df whose destination country is among the top n countries by OBS_VALUE in that year.
    # Note: df itself is modified (new route and date columns are added); later cells rely on these columns.
    df[["airp_country_1","airp_code_1","airp_country_2","airp_code_2"]] = df.airp_pr.str.split("_",expand=True)
    df[["Year","Month"]] = df.TIME_PERIOD.str.split("-",expand=True)
    topn = df.groupby(['Year',"airp_country_2"]).sum(numeric_only=True)
    topn = topn.loc[year,:].sort_values("OBS_VALUE", ascending= False).head(n).reset_index()
    cleaned = df[df["airp_country_2"].isin(topn["airp_country_2"])]
    return cleaned

def codes_correction(df,ISO_2):
    # Inputs: a dataframe and the name of its ISO-2 country-code column.
    # Output: the same dataframe with an ISO-3 code and a short country name added for each row.
    df[ISO_2] = df[ISO_2].replace(to_replace= country_code_fix)
    df['iso_3_country'] = coco.convert(names = df[ISO_2], to= 'ISO3')
    df['country_2_name'] = coco.convert(names = df[ISO_2], to= 'name_short')
    return df

# Call defined functions for the 2 datasets
top_passengers = top_n_year(passengers,5,'2019')
top_cargo = top_n_year(freight,5,'2019')





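Next, we load the low-resolution Natural Earth world map that ships with GeoPandas. Its iso_a3 column is used later to join the flight data to the country shapes.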

In [4]:
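# Import the low-resolution world map (country shapes with ISO-3 codes) bundled with GeoPandas.
# Note: geopandas.datasets was removed in GeoPandas 1.0, so this line needs an older GeoPandas version.
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world.head()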

| | pop_est | continent | name | iso_a3 | gdp_md_est | geometry |
|---|---|---|---|---|---|---|
| 0 | 889953.0 | Oceania | Fiji | FJI | 5496 | MULTIPOLYGON (((180.00000 -16.06713, 180.00000... |
| 1 | 58005463.0 | Africa | Tanzania | TZA | 63177 | POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... |
| 2 | 603253.0 | Africa | W. Sahara | ESH | 907 | POLYGON ((-8.66559 27.65643, -8.66512 27.58948... |
| 3 | 37589262.0 | North America | Canada | CAN | 1736425 | MULTIPOLYGON (((-122.84000 49.00000, -122.9742... |
| 4 | 328239523.0 | North America | United States of America | USA | 21433226 | MULTIPOLYGON (((-122.84000 49.00000, -120.0000... |


In [5]:
# COVID-19 data
filepath = 'Data_Sets/owid-covid-data.csv'
covid_data = pd.read_csv(filepath, delimiter=',')

# Select the columns of interest from the original dataset and split the date into year, month and day.
# .copy() gives an independent data frame, so adding columns does not raise a SettingWithCopyWarning.
columns_of_interest = ['new_cases_smoothed_per_million','people_fully_vaccinated_per_hundred','date','location']
filtered_covid_data = covid_data[columns_of_interest].copy()
filtered_covid_data[['Year','Month','Day']] = filtered_covid_data.date.str.split("-",expand=True)

# Selecting the countries of interest for cargo and passenger transport
cargo_countries_of_interest =['China','United States','United Arab Emirates','Brazil','Qatar','Netherlands']
passenger_countries_of_interest = ['United Kingdom','Spain','Germany','Italy','United States','Netherlands']

# Group by location, year and month to calculate the monthly mean for cargo transport
cargo_filtered_covid_data = filtered_covid_data[filtered_covid_data['location'].isin(cargo_countries_of_interest)]
cargo_filtered_covid_data = cargo_filtered_covid_data.groupby(['location','Year','Month']).mean(numeric_only=True)

# Reset the index and create a 'date' column of the form YYYY-MM
cargo_filtered_covid_data = cargo_filtered_covid_data.reset_index()
cargo_filtered_covid_data['date'] = cargo_filtered_covid_data['Year'] + '-' + cargo_filtered_covid_data['Month']

# Group by location, year and month to calculate the monthly mean for passenger transport
passenger_filtered_covid_data = filtered_covid_data[filtered_covid_data['location'].isin(passenger_countries_of_interest)]
passenger_filtered_covid_data = passenger_filtered_covid_data.groupby(['location','Year','Month']).mean(numeric_only=True)

# Reset the index and create a 'date' column of the form YYYY-MM
passenger_filtered_covid_data = passenger_filtered_covid_data.reset_index()
passenger_filtered_covid_data['date'] = passenger_filtered_covid_data['Year'] + '-' + passenger_filtered_covid_data['Month']




# Part II - Data Processing

In the code below, we sum the 2019 passenger and freight volumes from Amsterdam Airport Schiphol per destination country and add ISO-3 codes and country names, so that the totals can be shown on a map.

In [8]:
# Total 2019 volume per destination country, with ISO-3 codes and country names for mapping
passengers_for_map = passengers.groupby(['Year','airp_country_2']).sum(numeric_only=True)
passengers_for_map = passengers_for_map.loc['2019',:].reset_index()
passengers_for_map = codes_correction(passengers_for_map,'airp_country_2')

freight_for_map = freight.groupby(['Year','airp_country_2']).sum(numeric_only=True)
freight_for_map = freight_for_map.loc['2019',:].reset_index()
freight_for_map = codes_correction(freight_for_map,'airp_country_2')






In [31]:
passenger_labels = {
    'OBS_VALUE':'Total passengers (passengers)',
    'country_2_name': 'Country',
    'iso_3_country': 'Country code'
}
cargo_labels = {
    'country_2_name':'Country',
    'OBS_VALUE': 'Total cargo [tons]',
    'iso_3_country': 'Country code'
}

In [24]:
map_data_passengers = world.merge(passengers_for_map,left_on = 'iso_a3', right_on = 'iso_3_country', how = 'left')

fig = px.choropleth(map_data_passengers, hover_name = 'country_2_name',locations= 'iso_3_country', \
    locationmode= 'ISO-3', color='OBS_VALUE',labels = passenger_labels)

fig.update_layout(
    title_text = 'Total passengers flying from Schiphol Airport in 2019',
    coloraxis_colorbar_title_text = 'Total passengers',
    geo= dict(
        showframe = False,
        showcoastlines = False,
        projection_type = 'equirectangular'
    )
)
fig.show()

In [25]:
passenger_labels = {
    'OBS_VALUE':'Total passengers (passengers)',
    'country_2_name': 'Country'
}

fig = px.bar(map_data_passengers.sort_values('OBS_VALUE',ascending= False).head(20),\
    x= 'country_2_name', y = 'OBS_VALUE',labels=passenger_labels)

fig.update_layout(
    title_text = 'Total passengers transported by air from Schiphol Airport in 2019',
    
)
fig.show()

Based on this, we selected the top 5 destination countries for passengers (by 2019 volume). The chart below shows the yearly passenger totals for these five countries; the impact of COVID-19 is clearly visible.

In [29]:
total_passengers = top_passengers.groupby(['airp_country_2','Year']).sum(numeric_only=True)
total_passengers = total_passengers.reset_index()
total_passengers = codes_correction(total_passengers,'airp_country_2')
fig = px.line(total_passengers, x='Year', y= 'OBS_VALUE' ,color='airp_country_2',labels= passenger_labels)
fig.show()





We repeat the same steps for air cargo.

In [32]:
map_data_cargo = world.merge(freight_for_map,left_on = 'iso_a3', right_on = 'iso_3_country', how = 'left')

fig = px.choropleth(freight_for_map, hover_name = 'country_2_name',locations= 'iso_3_country', \
    locationmode= 'ISO-3', color='OBS_VALUE',labels = cargo_labels)

fig.update_layout(
    title_text = 'Total tons transported by air from Schiphol Airport in 2019',
    geo= dict(
        showframe = False,
        showcoastlines = False,
        projection_type = 'equirectangular'
    )
)
fig.show()

In [33]:
fig = px.bar(map_data_cargo.sort_values('OBS_VALUE',ascending= False).head(20),\
    x= 'country_2_name', y = 'OBS_VALUE',labels= cargo_labels)
# The geo settings used for the maps do not apply to a bar chart, so only the title is set here
fig.update_layout(
    title_text = 'Total tons transported by air from Schiphol Airport in 2019'
)
fig.show()

In [35]:
total_cargo = top_cargo.groupby(['airp_country_2','Year']).sum(numeric_only=True)
total_cargo = total_cargo.reset_index()
total_cargo = codes_correction(total_cargo,'airp_country_2')
fig = px.line(total_cargo, x='Year', y= 'OBS_VALUE' ,color='airp_country_2',labels=cargo_labels)
fig.show()





For the COVID-19 data, we first kept only the columns we needed, to avoid carrying unused information in the data frame. We then filtered this data frame and split the date into year, month and day. The COVID-19 data are kept only for the countries of interest (the two top-5 lists and the Netherlands) and averaged per month.
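The monthly averaging can also be done by parsing the dates instead of splitting the date strings. A minimal sketch, assuming only the OWID columns location, date and new_cases_smoothed_per_million; the helper monthly_mean and the numbers below are hypothetical.

```python
import pandas as pd

# Small synthetic extract shaped like the OWID file (made-up values)
covid = pd.DataFrame({
    "location": ["Netherlands"] * 4 + ["China"] * 2,
    "date": ["2020-03-30", "2020-03-31", "2020-04-01", "2020-04-02",
             "2020-03-31", "2020-04-01"],
    "new_cases_smoothed_per_million": [40.0, 60.0, 70.0, 90.0, 1.0, 3.0],
})

def monthly_mean(df, countries, column="new_cases_smoothed_per_million"):
    """Average a daily OWID column per country and calendar month (hypothetical helper)."""
    subset = df[df["location"].isin(countries)].copy()   # .copy() avoids SettingWithCopyWarning
    subset["month"] = pd.to_datetime(subset["date"]).dt.to_period("M")
    out = (subset.groupby(["location", "month"])[column]
                 .mean()
                 .reset_index())
    out["month"] = out["month"].astype(str)              # "YYYY-MM" labels for plotting
    return out

print(monthly_mean(covid, ["Netherlands"]))
```

Selecting the single column before .mean() means non-numeric columns never enter the aggregation, so no numeric_only argument is needed.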

In parallel, let's look at how COVID-19 cases have evolved since the beginning of the pandemic, and at the vaccination rate, in these countries, including the Netherlands.

In [54]:
fig = px.line(passenger_filtered_covid_data, x='date', y= 'new_cases_smoothed_per_million' ,color='location',
                title= 'New covid cases in the top 5 countries for passenger transport')

fig.show()

In [55]:
fig = px.line(cargo_filtered_covid_data, x='date', y= 'new_cases_smoothed_per_million' ,color='location',
              title= 'New covid cases in the top 5 countries for cargo transport', range_y=[0,2000] )

fig.show()

In [56]:
fig = px.line(passenger_filtered_covid_data, x='date', y= 'people_fully_vaccinated_per_hundred' ,color='location',
               title= 'Vaccination rate in percent in the top 5 countries for passenger transport' )

fig.show()

In [57]:
fig = px.line(cargo_filtered_covid_data, x='date', y= 'people_fully_vaccinated_per_hundred' ,color='location',
                title= 'Vaccination rate in percent in the top 5 countries for cargo transport')

fig.show()

Planned next steps (see previous work for how to find common peaks):
- Peaks in COVID-19 cases in the Netherlands vs the number of passengers leaving the Netherlands.
- Valleys in COVID-19 cases in the Netherlands vs the number of passengers leaving the Netherlands.
- Common peaks and valleys between the Netherlands and ... for the number of passengers and the volume of freight. Goal: to see whether COVID-19 had the same impact (peaks) at the same time in two or more countries.
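The peak comparison planned above could start from a simple local-maximum detector: a month counts as a peak if its value is strictly higher than both neighbours. The sketch below is not the method from the "previous work"; the helper names local_peaks and common_peaks are hypothetical and the numbers are synthetic. scipy.signal.find_peaks offers a more configurable alternative (for example its prominence and distance parameters).

```python
def local_peaks(values):
    """Indices i where values[i] is strictly greater than both neighbours."""
    return [i for i in range(1, len(values) - 1)
            if values[i] > values[i - 1] and values[i] > values[i + 1]]

def common_peaks(a, b, tolerance=1):
    """Pairs (i, j) of peak indices in a and b that lie at most `tolerance` steps apart."""
    peaks_a, peaks_b = local_peaks(a), local_peaks(b)
    return [(i, j) for i in peaks_a for j in peaks_b if abs(i - j) <= tolerance]

# Synthetic monthly values (index 0 = first month), not real data
cases_nl = [0, 5, 40, 20, 10, 60, 30]
cases_de = [0, 3, 10, 35, 12, 50, 20]
print(local_peaks(cases_nl))              # [2, 5]
print(local_peaks(cases_de))              # [3, 5]
print(common_peaks(cases_nl, cases_de))   # [(2, 3), (5, 5)]
```

Valleys (for example in passenger numbers) can be found the same way by passing the negated series, e.g. local_peaks([-v for v in passengers]).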

# Part III - Data Visualisation

All five of us have experienced COVID-19, and we have all been affected by it to a greater or lesser extent. Although the first cases were reported in China, the official number of COVID-19 cases there has always been very low. This naturally raises questions: how is this possible? Did China really escape the virus?
For this first part of the visualisation, we compare the impact of the start of the pandemic on the transport sector in the Netherlands and in China. This lets us judge whether China really escaped COVID-19 and its effects. Since its official case count is so low, one might expect its economy not to have been affected. But is that really the case? How did COVID-19 affect China compared with the Netherlands? We proceed as follows:
- First, we compare the COVID-19 case data of the two countries.
- Then, we analyse how the Chinese transport sector behaved and compare it with that of the Netherlands.

Absolute quantities cannot be compared directly: China is one of the world's largest powers, far larger than the Netherlands. However, we can compare the trends of the curves (increasing, decreasing or stagnating) when the first wave of COVID-19 appeared.
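One common way to compare trends rather than quantities is to rebase each series to an index that equals 100 in a reference year. This is a suggested technique, not one used in the project code: the helper rebase is hypothetical, the numbers are synthetic, and years are stored as strings to match the Year column of df1.

```python
import pandas as pd

def rebase(df, columns, base_year, year_col="Year"):
    """Return df[columns] rescaled so that every column equals 100 in base_year."""
    base = df.loc[df[year_col] == base_year, columns].iloc[0]   # one base value per column
    rebased = df[columns] * 100 / base                          # column-wise division, aligned on names
    return pd.concat([df[[year_col]], rebased], axis=1)

# Synthetic yearly totals (not real data)
toy = pd.DataFrame({"Year": ["2018", "2019", "2020"],
                    "NL": [45, 50, 20],
                    "China": [540, 600, 420]})
print(rebase(toy, ["NL", "China"], "2019"))
```

Both synthetic series equal 100 in 2019; in 2020 the NL series falls to 40 and the China series to 70, so the relative drops become comparable although the absolute levels differ by an order of magnitude.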

Import and preparation of the covid data

In [58]:
filepath = 'Data_Sets/owid-covid-data.csv'
covid_data = pd.read_csv(filepath, delimiter=',')

# Select the columns of interest and split the date into year, month and day
# (.copy() avoids a SettingWithCopyWarning when the new columns are added)
columns_of_interest = ['new_cases_smoothed_per_million','people_fully_vaccinated_per_hundred','date','location']
filtered_covid_data = covid_data[columns_of_interest].copy()
filtered_covid_data[['Year','Month','Day']] = filtered_covid_data.date.str.split("-",expand=True)

# Select the countries of interest (China and the Netherlands)
Covid_China = filtered_covid_data[filtered_covid_data['location'].isin(['Netherlands','China'])]
Covid_China.head(-5)  # all rows except the last five







| | new_cases_smoothed_per_million | people_fully_vaccinated_per_hundred | date | location | Year | Month | Day |
|---|---|---|---|---|---|---|---|
| 40988 | | | 2020-01-22 | China | 2020 | 01 | 22 |
| 40989 | | | 2020-01-23 | China | 2020 | 01 | 23 |
| 40990 | | | 2020-01-24 | China | 2020 | 01 | 24 |
| 40991 | | | 2020-01-25 | China | 2020 | 01 | 25 |
| 40992 | | | 2020-01-26 | China | 2020 | 01 | 26 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 141454 | 125.751 | | 2022-10-02 | Netherlands | 2022 | 10 | 02 |
| 141455 | 125.735 | | 2022-10-03 | Netherlands | 2022 | 10 | 03 |
| 141456 | 155.209 | | 2022-10-04 | Netherlands | 2022 | 10 | 04 |
| 141457 | 155.209 | | 2022-10-05 | Netherlands | 2022 | 10 | 05 |


1. Air passengers carried include both domestic and international aircraft passengers of air carriers registered in the country.

In [59]:
filepath1 = 'Data_Sets/Passengers.csv'
Passengers = pd.read_csv(filepath1, delimiter=',')
Passengers_drop = Passengers.drop(['Country Code', 'Indicator Name', 'Indicator Code'],  axis=1)
Passengers_drop.set_index('Country', inplace=True) 
Passengers_wanted = Passengers_drop.T
China_passengers_final = Passengers_wanted.China.reset_index()
China_passengers_final.columns = ['Year', 'Number of passengers carried, China']
NL_passengers_final = Passengers_wanted.Netherlands.reset_index()
NL_passengers_final.columns = ['Year', 'Number of passengers carried, NL']

2. Air freight is the volume of freight, express, and diplomatic bags carried on each flight stage (operation of an aircraft from takeoff to its next landing), measured in metric tons times kilometers traveled.
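As a small worked example of this unit: freight in tonne-kilometres is the sum, over all flight stages, of the tonnes carried multiplied by the distance flown. The helper tonne_km and the itinerary are hypothetical.

```python
def tonne_km(stages):
    """Sum of tonnes carried x kilometres flown over all flight stages."""
    return sum(tonnes * km for tonnes, km in stages)

# Hypothetical itinerary: 10 t flown 500 km, then 4 t flown 1,200 km on the next stage
print(tonne_km([(10, 500), (4, 1200)]))  # 9800
```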

In [1]:
filepath2 = 'Data_Sets/Freight.csv'
Freight = pd.read_csv(filepath2, delimiter=',')
Freight_drop = Freight.drop(['Country Code', 'Indicator Name', 'Indicator Code'],  axis=1)
Freight_drop.set_index('Country Name', inplace=True) 
Freight_wanted = Freight_drop.T
NL_freight_final = Freight_wanted.Netherlands.reset_index()
NL_freight_final.columns = ['Year', 'Volume of freight carried, NL']
China_freight_final = Freight_wanted.China.reset_index()
China_freight_final.columns = ['Year', 'Volume of freight carried, China']


3. Port container traffic measures the flow of containers from land to sea transport modes, and vice versa, in twenty-foot equivalent units (TEUs), a standard-size container. Data refer to coastal shipping as well as international journeys. Transshipment traffic is counted as two lifts at the intermediate port (once to off-load and again as an outbound lift) and includes empty units.
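The counting rule can be illustrated with a short sketch. The split into "gateway" traffic (moved once between land and sea) and transshipment traffic is an assumption made for illustration; the helper port_teu_lifts and the numbers are hypothetical.

```python
def port_teu_lifts(gateway_teu, transshipment_teu):
    # gateway_teu: TEUs moved once between land and sea transport, empty units included
    # transshipment_teu: TEUs moved ship to ship via the port, empty units included
    # Each transshipped TEU counts as two lifts: one off-load and one outbound lift
    return gateway_teu + 2 * transshipment_teu

print(port_teu_lifts(1000, 300))  # 1600
```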

In [61]:
filepath3 = 'Data_Sets/Container.csv'
Container = pd.read_csv(filepath3, delimiter=',')
Container_drop = Container.drop(['Country Code', 'Indicator Name', 'Indicator Code'],  axis=1)
Container_drop.set_index('Country Name', inplace=True) 
Container_wanted = Container_drop.T
China_container_final = Container_wanted.China.reset_index()
China_container_final.columns = ['Year', 'Flow of containers carried, China']
NL_container_final = Container_wanted.Netherlands.reset_index()
NL_container_final.columns = ['Year', 'Flow of containers carried, NL']

4. Carbon dioxide emissions are those stemming from the burning of fossil fuels and the manufacture of cement. They include carbon dioxide produced during consumption of solid, liquid, and gas fuels and gas flaring.

In [None]:
filepath4 = 'Data_Sets/CO2_emissions.csv'
CO2 = pd.read_csv(filepath4, delimiter=',')
CO2_drop = CO2.drop(['Country Code', 'Indicator Name', 'Indicator Code'],  axis=1)
CO2_drop.set_index('Country Name', inplace=True) 
CO2_wanted = CO2_drop.T
NL_CO2_final = CO2_wanted.Netherlands.reset_index()
NL_CO2_final.columns = ['Year', 'Carbon dioxide emissions, NL']
China_CO2_final = CO2_wanted.China.reset_index()
China_CO2_final.columns = ['Year', 'Carbon dioxide emissions, China']

5. Goods transported by railway are the volume of goods transported by railway, measured in metric tons times kilometers traveled.

In [None]:
filepath5 = 'Data_Sets/Rail.csv'
Rail = pd.read_csv(filepath5, delimiter=',')
Rail_drop = Rail.drop(['Country Code', 'Indicator Name', 'Indicator Code'],  axis=1)
Rail_drop.set_index('Country Name', inplace=True) 
Rail_wanted = Rail_drop.T
NL_Rail_final = Rail_wanted.Netherlands.reset_index()
NL_Rail_final.columns = ['Year', 'Goods transported by railway, NL']
China_Rail_final = Rail_wanted.China.reset_index()
China_Rail_final.columns = ['Year', 'Goods transported by railway, China']

6. Life expectancy at birth indicates the number of years a newborn infant would live if prevailing patterns of mortality at the time of its birth were to stay the same throughout its life.

In [None]:
filepath6 = 'Data_Sets/Lifeexp.csv'
Lifeexp = pd.read_csv(filepath6, delimiter=',')
Lifeexp_drop = Lifeexp.drop(['Country Code', 'Indicator Name', 'Indicator Code'],  axis=1)
Lifeexp_drop.set_index('Country Name', inplace=True) 
Lifeexp_wanted = Lifeexp_drop.T
NL_Lifeexp_final = Lifeexp_wanted.Netherlands.reset_index()
NL_Lifeexp_final.columns = ['Year', 'Life expectancy, NL']
China_Lifeexp_final = Lifeexp_wanted.China.reset_index()
China_Lifeexp_final.columns = ['Year', 'Life expectancy, China']

7. Annual percentage growth rate of GDP at market prices based on constant local currency. Aggregates are based on constant 2015 prices, expressed in U.S. dollars. GDP is the sum of gross value added by all resident producers in the economy plus any product taxes and minus any subsidies not included in the value of the products. It is calculated without making deductions for depreciation of fabricated assets or for depletion and degradation of natural resources.
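The growth rate in this indicator is the year-on-year change of GDP at constant prices, (GDP_t / GDP_{t-1} - 1) x 100. A minimal sketch with hypothetical GDP levels; with pandas, Series.pct_change() multiplied by 100 gives the same values.

```python
def growth_rate_pct(gdp_constant_prices):
    """Year-on-year growth in percent: (GDP_t / GDP_{t-1} - 1) * 100."""
    values = list(gdp_constant_prices)
    return [(cur / prev - 1) * 100 for prev, cur in zip(values, values[1:])]

# Hypothetical GDP levels at constant prices for three consecutive years
print([round(g, 2) for g in growth_rate_pct([100.0, 102.0, 96.9])])  # [2.0, -5.0]
```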

In [None]:
filepath7 = 'Data_Sets/GDP.csv'
GPD = pd.read_csv(filepath7, delimiter=',')
GPD_drop = GPD.drop(['Country Code', 'Indicator Name', 'Indicator Code'],  axis=1)
GPD_drop.set_index('Country Name', inplace=True) 
GPD_wanted = GPD_drop.T
NL_GPD_final = GPD_wanted.Netherlands.reset_index()
NL_GPD_final.columns = ['Year', 'GDP growth, NL']
China_GPD_final = GPD_wanted.China.reset_index()
China_GPD_final.columns = ['Year', 'GDP growth, China']
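The cells for indicators 1 to 7 repeat the same drop, set_index and transpose steps. A hypothetical helper such as wdi_country_series could replace them; it assumes the wide layout visible in the code (a country column, the Country Code, Indicator Name and Indicator Code columns, and one column per year). The id_col parameter covers the passengers file, which is indexed on 'Country' instead of 'Country Name'.

```python
import pandas as pd

def wdi_country_series(raw, country, label, id_col="Country Name"):
    """Turn a wide table (one row per country, one column per year)
    into a two-column frame [Year, label] for a single country."""
    wide = raw.drop(columns=["Country Code", "Indicator Name", "Indicator Code"])
    series = wide.set_index(id_col).T[country]   # after transposing, years form the index
    out = series.reset_index()
    out.columns = ["Year", label]
    return out

# Synthetic two-country table in the same layout (made-up values)
raw = pd.DataFrame({
    "Country Name": ["Netherlands", "China"],
    "Country Code": ["NLD", "CHN"],
    "Indicator Name": ["Example indicator"] * 2,
    "Indicator Code": ["EX.IND"] * 2,
    "2019": [1.0, 10.0],
    "2020": [0.5, 9.0],
})
print(wdi_country_series(raw, "Netherlands", "Example, NL"))
```

For example, NL_freight_final could then be written as wdi_country_series(Freight, 'Netherlands', 'Volume of freight carried, NL').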

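Next, we join all the data frames on the 'Year' column into two data frames: df1 for passengers, freight, containers, CO2 emissions and rail transport, and df2 for life expectancy and GDP growth.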

In [62]:
# Outer-join the yearly indicators on 'Year'
df1 = pd.merge(NL_passengers_final, China_passengers_final, how='outer', on='Year')
df1 = pd.merge(df1, NL_freight_final, how='outer', on='Year')
df1 = pd.merge(df1, China_freight_final, how='outer', on='Year')
df1 = pd.merge(df1, NL_container_final, how='outer', on='Year')
df1 = pd.merge(df1, China_container_final, how='outer', on='Year')
df1 = pd.merge(df1, NL_CO2_final, how='outer', on='Year')
df1 = pd.merge(df1, China_CO2_final, how='outer', on='Year')
df1 = pd.merge(df1, NL_Rail_final, how='outer', on='Year')
df1 = pd.merge(df1, China_Rail_final, how='outer', on='Year')
df2 = pd.merge(China_Lifeexp_final, NL_Lifeexp_final, how='outer', on='Year')
df2 = pd.merge(df2, NL_GPD_final, how='outer', on='Year')
df2 = pd.merge(df2, China_GPD_final, how='outer', on='Year')
# Keep the first 61 rows, i.e. the years 1960 to 2020
df1 = df1.iloc[ 0:61, : ]
df2 = df2.iloc[ 0:61, : ]
df1

| | Year | Number of passengers carried, NL | Number of passengers carried, China | Volume of freight carried, NL | Volume of freight carried, China | Flow of containers carried, NL | Flow of containers carried, China |
|---|---|---|---|---|---|---|---|
| 0 | 1960 | | | | | | |
| 1 | 1961 | | | | | | |
| 2 | 1962 | | | | | | |
| 3 | 1963 | | | | | | |
| 4 | 1964 | | | | | | |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 56 | 2016 | 40078714.00 | 487960477.0 | 4745.958515 | 21304.5851 | 12556000.0 | 197849000.0 |
| 57 | 2017 | 42763443.00 | 551234509.0 | 5855.480476 | 23323.6147 | 13911000.0 | 222155820.0 |
| 58 | 2018 | 44417573.58 | 611439830.0 | 5886.514904 | 25256.2071 | 14696000.0 | 233201600.0 |
| 59 | 2019 | 46358457.95 | 659629070.0 | 5656.442000 | 25394.5878 | 14986800.0 | 242030000.0 |
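The chain of pd.merge calls above can be written more compactly by folding a list of frames with functools.reduce. A minimal sketch; the helper merge_on_year and the two toy frames are hypothetical.

```python
from functools import reduce
import pandas as pd

def merge_on_year(frames):
    """Outer-join a list of data frames that all share a 'Year' column."""
    return reduce(lambda left, right: pd.merge(left, right, how="outer", on="Year"), frames)

a = pd.DataFrame({"Year": ["2018", "2019"], "A": [1, 2]})
b = pd.DataFrame({"Year": ["2019", "2020"], "B": [3, 4]})
print(merge_on_year([a, b]))
```

With this helper, df1 would be merge_on_year([NL_passengers_final, China_passengers_final, NL_freight_final, ...]) using the same frames in the same order as above.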


In [63]:
fig = px.line(Covid_China, x='date', y= 'new_cases_smoothed_per_million' ,color='location',
               title= 'New COVID-19 cases per million (smoothed): China vs the Netherlands')

fig.show()

As the graph above shows, China has indeed recorded very few cases. The Netherlands, on the other hand, has a curve that is dictated by the different epidemic waves.

In [64]:
fig = px.line(Covid_China, x='date', y= 'new_cases_smoothed_per_million' ,color='location',
                range_x= ['2020-01-22', '2021-01-22'], range_y=[0,100] )

fig.show()

In [65]:
fig = px.line(Covid_China, x='date', y= 'new_cases_smoothed_per_million' ,color='location',
                range_x= ['2022-03-22', '2022-09-22'], range_y=[0,400] )

fig.show()

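Pearson correlation matrices for the indicators in df1 and df2 (only the part below the diagonal is shown, since the matrix is symmetric).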

In [None]:
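# Pearson correlation between the numeric columns of df1 (the string 'Year' column is skipped);
# the upper triangle, including the diagonal, is masked because the matrix is symmetric
corr1 = df1.corr(numeric_only=True)
mask = np.triu(np.ones_like(corr1, dtype=bool))
sns.heatmap(corr1, mask=mask, annot=True)

In [None]:
# Same for df2
corr2 = df2.corr(numeric_only=True)
mask = np.triu(np.ones_like(corr2, dtype=bool))
sns.heatmap(corr2, mask=mask, annot=True)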
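Each cell of these heatmaps is Pearson's r: the covariance of two columns divided by the product of their standard deviations. The sketch below computes r by hand on synthetic numbers and can be checked against pandas' .corr(); the helper pearson_r is hypothetical and, unlike pandas (which excludes missing values pairwise), it does not handle NaN. Note that two yearly series that both trend upward tend to show a high r even without any direct link, so these matrices mostly reflect shared trends.

```python
import numpy as np
import pandas as pd

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Synthetic columns (not real data)
toy = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 5, 9]})
print(pearson_r(toy["a"], toy["b"]), toy.corr().loc["a", "b"])
```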

1. Air passengers carried: Netherlands vs China

In [None]:
fig = px.line(df1, x='Year', y= ['Number of passengers carried, NL', 'Number of passengers carried, China'] )

#fig.add_annotation(x='2019', y=400000000, ax='2019', ay=300000000, text='First Covid cases',
                    #xref='x', yref='y', axref='x', ayref='y')

fig.show()

2. Air freight: Netherlands vs China

In [None]:
fig = px.line(df1, x='Year', y= ['Volume of freight carried, NL', 'Volume of freight carried, China'] )

fig.show()

3. Port container traffic: Netherlands vs China

In [None]:
fig = px.line(df1, x='Year', y= ['Flow of containers carried, NL', 'Flow of containers carried, China'])

fig.show()

4. Carbon dioxide emissions: Netherlands vs China

In [None]:
fig = px.line(df1, x='Year', y= ['Carbon dioxide emissions, NL', 'Carbon dioxide emissions, China'] )

fig.show()

5. Goods transported by railway: Netherlands vs China

In [None]:
fig = px.line(df1, x='Year', y= ['Goods transported by railway, NL', 'Goods transported by railway, China'] )

fig.show()

6. Life expectancy at birth: Netherlands vs China

In [None]:
fig = px.line(df2, x='Year', y= ['Life expectancy, NL', 'Life expectancy, China'] )

fig.show()

7. Annual GDP growth: Netherlands vs China

In [None]:
fig = px.line(df2, x='Year', y= ['GDP growth, NL', 'GDP growth, China'] )

fig.show()