# Project Group - 5

Members: Ahmad Nabil Maulana, Daan Michel, Gijs Aben, Philine Cremers

Student numbers: 5943442, 4684559, 4713656, 5036534

# Research Objective

*Requires data modeling and quantitative research in Transport, Infrastructure & Logistics*

The research objective is to analyse the impact of gross domestic product (GDP) per capita on the number of air traffic movements within a country.
**(We would like feedback on whether our research objective is sufficiently well defined and specific. We would also like to know whether you foresee any problems with the data sets we plan to use, especially the air traffic data. Finally, we would welcome suggestions on data visualization: the project seems fairly straightforward to us, so we would appreciate ideas on how to approach it.)**

# Contribution Statement

*Be specific. Some of the tasks can be coding (expect everyone to do this), background research, conceptualisation, visualisation, data analysis, data modelling*

**Author 1**: Ahmad Nabil Maulana : Creating the repository,

**Author 2**: Daan Michel

**Author 3**: Philine Cremers

**Author 4**: Gijs Aben

# Background and Context

For this project, we will analyze the relationship between a country's GDP per capita and its air traffic mobility. GDP measures the monetary value of final goods and services produced within a country's borders during a specific period, such as a quarter or a year. We aim to determine whether countries with a higher GDP per capita exhibit greater air traffic mobility than those with a lower GDP per capita, or whether there is no discernible correlation.
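As a rough illustration of what "discernible correlation" could mean in practice, the sketch below computes a Pearson and a Spearman correlation coefficient with pandas. The column names `gdp_per_capita` and `departures_per_capita` and all values are hypothetical placeholders, not project data; which measure suits the final analysis is still an open choice.

```python
import pandas as pd

# Hypothetical toy values, for illustration only (not real measurements)
toy = pd.DataFrame({
    "gdp_per_capita": [1_000, 5_000, 12_000, 30_000, 55_000],
    "departures_per_capita": [0.001, 0.004, 0.010, 0.030, 0.041],
})

# Pearson measures linear association; Spearman measures monotonic association
pearson = toy["gdp_per_capita"].corr(toy["departures_per_capita"], method="pearson")
spearman = toy["gdp_per_capita"].corr(toy["departures_per_capita"], method="spearman")

print(f"Pearson r = {pearson:.3f}, Spearman rho = {spearman:.3f}")
```

Spearman may be the more robust choice if the relationship turns out to be monotonic but not linear, which is plausible for economic indicators that span several orders of magnitude.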

For the flight data, we found two different data sets, each with its own pros and cons. \
The data sets from OpenSky provide information about all aircraft movements, so we can use all air traffic movements to and from every airport in a country. The data will be filtered to keep only commercial and cargo aircraft; private aircraft will not be considered. Furthermore, the origin country of each departing flight will be used, which gives us the number of takeoffs per country. This should be a good indicator of the relative number of air traffic movements per country when compared to GDP. However, the amount of data that needs to be processed is huge and could potentially lead to problems.\
The second data set we found is from The World Bank: 'Air transport, registered carrier departures worldwide'. This data describes the number of registered aircraft takeoffs on scheduled services (i.e. planned commercial flights, both passenger and cargo) by the country in which the aircraft is registered. The data set gives us takeoffs per country per year, which should make processing the data not too complicated and thus make it possible to cover multiple years. However, this data might not fully match our research objective, since it does not describe the number of aircraft movements per country: an aircraft can be registered in a country that it neither departs from nor arrives in.
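The filtering and counting step described for the movement-level data could look roughly like the sketch below. The column names `origin_country` and `aircraft_category`, as well as the category labels, are assumptions made for illustration; the actual OpenSky schema may differ.

```python
import pandas as pd

# Hypothetical toy flights table; real column names and labels may differ
flights = pd.DataFrame({
    "origin_country": ["Netherlands", "Netherlands", "Germany", "Germany", "Germany"],
    "aircraft_category": ["commercial", "private", "cargo", "commercial", "private"],
})

# Keep commercial and cargo aircraft only; private aircraft are dropped
kept = flights[flights["aircraft_category"].isin(["commercial", "cargo"])]

# Count takeoffs per origin country
takeoffs_per_country = kept.groupby("origin_country").size().rename("takeoffs")
print(takeoffs_per_country)
```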

As the first step, we need to import the necessary libraries.

In [1]:
# this can be changed
import pandas as pd
from pathlib import Path
import numpy as np
import math
import scipy
from scipy.signal import find_peaks
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import datetime
import matplotlib.pyplot as plt
%matplotlib inline
import geopandas as gpd
import os

# Part I - Data Import

First, we're going to import and combine dataframes from the two data sets that we will be using:

* Countries' GDP data from The World Bank.
* Countries' air traffic mobility data from OpenSky or The World Bank.
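Combining the two sources can be sketched as a keyed merge. The frames, the key columns `region` and `year`, and all values below are hypothetical placeholders. The `indicator` option is used here to make rows that exist in only one source visible, rather than dropping them silently.

```python
import pandas as pd

# Hypothetical toy frames; column names and values are placeholders
gdp = pd.DataFrame({
    "region": ["A", "B", "C"],
    "year": [2020, 2020, 2020],
    "gdp": [100.0, 250.0, 80.0],
})
traffic = pd.DataFrame({
    "region": ["A", "B", "D"],
    "year": [2020, 2020, 2020],
    "departures": [1200, 3100, 40],
})

# Outer merge keeps every row; validate guards against duplicate keys
combined = pd.merge(gdp, traffic, on=["region", "year"], how="outer",
                    validate="one_to_one", indicator=True)

# Rows present in only one of the two sources
unmatched = combined[combined["_merge"] != "both"]
# Rows usable for the comparison
matched = combined[combined["_merge"] == "both"].drop(columns="_merge")

print(matched)
print(unmatched[["region", "year", "_merge"]])
```

Inspecting `unmatched` before analysis helps catch naming mismatches between the two sources, for example different spellings of the same region.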

# GDP Data Import

In [2]:
# import the GDP Data for all the states in the USA
file_path_GDP = 'https://raw.githubusercontent.com/anmaulana1/big-project/main/Data/GDP-US/GDP_per_state_all_states.csv'
df_GDP = pd.read_csv(file_path_GDP)

df_GDP.head()

| Unnamed: 0 | GeoFIPS | GeoName | Region | Unit | 2005:Q1 | 2005:Q2 | 2005:Q3 | 2005:Q4 | 2006:Q1 | 2006:Q2 | ... | 2020:Q4 | 2021:Q1 | 2021:Q2 | 2021:Q3 | 2021:Q4 | 2022:Q1 | 2022:Q2 | 2022:Q3 | 2022:Q4 | 2023:Q1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | "01000" | Alabama | 5 | Millions of chained 2012 dollars | 182600.4 | 184625.7 | 184134.9 | 186116.9 | 186539.0 | 187489.6 | ... | 204400.7 | 207001.8 | 209857.1 | 210029.1 | 213029.2 | 212789.5 | 212311.6 | 212946.3 | 215011.6 | 215084.4 |
| 1 | "02000" | Alaska | 8 | Millions of chained 2012 dollars | 45176.1 | 45776.9 | 45501.7 | 46173.5 | 47346.6 | 48911.2 | ... | 51069.4 | 50690.3 | 50707.9 | 50935.9 | 51143.5 | 49070.4 | 48963.4 | 49999.1 | 50501.7 | 50700.8 |
| 2 | "04000" | Arizona | 6 | Millions of chained 2012 dollars | 256521.4 | 260884.6 | 265619.7 | 266278.2 | 271247.2 | 271897.9 | ... | 338182.1 | 339659.7 | 344936.2 | 348706.0 | 357322.0 | 355309.3 | 353565.6 | 356966.0 | 359827.3 | 362191.9 |
| 3 | "05000" | Arkansas | 5 | Millions of chained 2012 dollars | 104693.3 | 105271.5 | 105872.6 | 108082.4 | 108122.4 | 109421.7 | ... | 119689.9 | 121860.2 | 123015.2 | 123676.9 | 124837.0 | 126803.5 | 125830.9 | 126248.3 | 127245.8 | 127312.2 |
| 4 | "06000" | California | 8 | Millions of chained 2012 dollars | 1896451.4 | 1913819.0 | 1940272.3 | 1956825.6 | 1998965.4 | 1992185.0 | ... | 2746311.9 | 2802569.5 | 2858504.7 | 2894880.5 | 2942968.5 | 2870410.5 | 2866766.2 | 2893948.3 | 2911384.3 | 2919913.3 |
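The GDP output above is in wide format, with one column per quarter (`2005:Q1`, `2005:Q2`, and so on), while the flight data uses separate `YEAR` and `QUARTER` columns. One possible way to align the two, shown here as a sketch on a two-state, two-quarter excerpt of the values above, is to melt the quarter columns into rows and split the label. The names `gdp_long`, `period` and `gdp` are illustrative choices, not part of the data set.

```python
import pandas as pd

# Small excerpt of the wide GDP table shown above
wide = pd.DataFrame({
    "GeoName": ["Alabama", "Alaska"],
    "2005:Q1": [182600.4, 45176.1],
    "2005:Q2": [184625.7, 45776.9],
})

# Turn the quarter columns into rows: one row per state and quarter
gdp_long = wide.melt(id_vars="GeoName", var_name="period", value_name="gdp")

# Split labels such as "2005:Q1" into integer YEAR and QUARTER columns
parts = gdp_long["period"].str.extract(r"(?P<YEAR>\d{4}):Q(?P<QUARTER>\d)")
gdp_long["YEAR"] = parts["YEAR"].astype(int)
gdp_long["QUARTER"] = parts["QUARTER"].astype(int)

print(gdp_long)
```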


# The USA Flight Data Import

In [None]:
# import the flight data for flights to and from the USA in 2005-2022 and merge the yearly files
# DON'T RUN THIS (the merged result is loaded from a local file in the next cell)
import pandas as pd

# Build the list of CSV file URLs, one file per year from 2005 up to and including 2022
base_url = (
    'https://raw.githubusercontent.com/anmaulana1/big-project/main/Data/'
    'Flight%20data/International%20data/'
    'T_T100I_SEGMENT_ALL_CARRIER_INTERNATIONAL_{year}.csv'
)
csv_files = [base_url.format(year=year) for year in range(2005, 2023)]

# Load each CSV file from its URL into a list of dataframes
dfs = [pd.read_csv(url) for url in csv_files]

# Concatenate the dataframes vertically
merged_flight_df = pd.concat(dfs, ignore_index=True)

# Save the merged dataframe to a new CSV file
#merged_flight_df.to_csv('merged_flight_data.csv', index=False)

In [3]:
# the file is too big to be loaded from github (more than 100 MB)
# change the csv to your local directory
df_flight = pd.read_csv("C:/Users/ahmad/Downloads/merged_flight_data.csv")

# To check if the data is merged correctly
# Filter the DataFrame for the year 2020
df_2020 = df_flight[df_flight['YEAR'] == 2020]

# Display the last rows of the filtered DataFrame
df_2020.tail()

| Unnamed: 0 | DEPARTURES_PERFORMED | PAYLOAD | SEATS | PASSENGERS | UNIQUE_CARRIER | UNIQUE_CARRIER_NAME | REGION | CARRIER | CARRIER_NAME | ORIGIN_AIRPORT_ID | ORIGIN | ORIGIN_CITY_NAME | ORIGIN_COUNTRY_NAME | DEST_AIRPORT_ID | DEST | DEST_CITY_NAME | DEST_COUNTRY_NAME | AIRCRAFT_TYPE | YEAR | QUARTER |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1254199 | 248.0 | 18805840.0 | 44392.0 | 27109.0 | 09Q | Swift Air, LLC d/b/a Eastern Air Lines d/b/a E... | D | 09Q | Swift Air, LLC d/b/a Eastern Air Lines d/b/a E... | 13303 | MIA | Miami, FL | United States | 12073 | HAV | Havana, Cuba | Cuba | 617 | 2020 | 4 |
| 1254200 | 252.0 | 19109160.0 | 45108.0 | 16744.0 | 09Q | Swift Air, LLC d/b/a Eastern Air Lines d/b/a E... | I | 09Q | Swift Air, LLC d/b/a Eastern Air Lines d/b/a E... | 12073 | HAV | Havana, Cuba | Cuba | 13303 | MIA | Miami, FL | United States | 617 | 2020 | 4 |
| 1254201 | 255.0 | 4767479.0 | 18870.0 | 10229.0 | PD | Porter Airlines, Inc. | I | PD | Porter Airlines, Inc. | 16215 | YTZ | Toronto, Canada | Canada | 11618 | EWR | Newark, NJ | United States | 482 | 2020 | 1 |
| 1254202 | 256.0 | 4786175.0 | 18944.0 | 11437.0 | PD | Porter Airlines, Inc. | I | PD | Porter Airlines, Inc. | 11618 | EWR | Newark, NJ | United States | 16215 | YTZ | Toronto, Canada | Canada | 482 | 2020 | 1 |
| 1254203 | 316.0 | 35603605.0 | 0.0 | 0.0 | LA | Lan-Chile Airlines | I | LA | Lan-Chile Airlines | 14717 | SCL | Santiago, Chile | Chile | 13303 | MIA | Miami, FL | United States | 889 | 2020 | 2 |
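Looking at the tail of a single year only confirms that this one year is present. A slightly broader check, sketched below on a hypothetical toy frame, counts rows per `YEAR` and lists any expected year that is missing. The toy uses 2005-2008; for the real merged data the expected range would be `range(2005, 2023)`.

```python
import pandas as pd

# Hypothetical toy stand-in for the merged flight data; only YEAR matters here
toy_flights = pd.DataFrame({"YEAR": [2005, 2005, 2006, 2008, 2008, 2008]})

# Years that should be present after merging (shortened for the toy example)
expected_years = set(range(2005, 2009))

# Number of rows contributed by each year
rows_per_year = toy_flights.groupby("YEAR").size()

# Any expected year without rows points to a file that failed to load
missing_years = sorted(expected_years - set(rows_per_year.index.tolist()))

print(rows_per_year)
print("Missing years:", missing_years)
```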


# Part II - Data Processing