# Bachelor Thesis: Predicting flight delays
#### Author: Tygo Francissen, s1049742, Radboud University
This thesis aims to predict flight delays in the United States and Brazil using a broad range of machine learning algorithms and transfer learning.

### 1.1 Gathering the data
The data sets come from the [Bureau of Transportation Statistics](https://www.transtats.bts.gov) for the United States and from the [VRA](https://sas.anac.gov.br/sas/bav/view/frmConsultaVRA) database of the Brazilian aviation agency ANAC for Brazil. First, we load the data and have a look at it:

In [168]:
import pandas as pd
import numpy as np
from deep_translator import GoogleTranslator

# Load the data sets for January 2022
data_brazil = pd.read_excel('../Data/VRA_20230228124648.xlsx')
data_usa = pd.read_csv('../Data/T_ONTIME_REPORTING.csv')

# Translate the column names of the Brazilian data set from Portuguese to English.
# Assign a new list of names instead of mutating columns.values in place,
# which can leave the column index in an inconsistent state.
translator = GoogleTranslator(source='auto', target='en')
data_brazil.columns = [translator.translate(name) for name in data_brazil.columns]
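
To have a look at a freshly loaded table, a compact per-column summary is often more informative than printing the raw rows. The following is a minimal sketch, not part of the original notebook: the helper `summarize` and the toy frame are hypothetical stand-ins, and the same call could be applied to `data_brazil` or `data_usa`.

```python
import pandas as pd

def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, number of missing values, number of distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),  # NaN is not counted as a distinct value
    })

# Toy stand-in for a loaded table (values are illustrative, not taken from the data sets)
toy = pd.DataFrame({
    "carrier": ["AA", "DL", "AA", None],
    "dep_delay": [5.0, None, 12.0, 0.0],
})

summary = summarize(toy)
print(toy.shape)
print(summary)
```

Columns with many missing values or a single distinct value show up immediately in such a summary, which helps when deciding which features to keep.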

In [169]:
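# Create dictionary to look up the full airline name for an ICAO airline code.
# Code and name are paired row by row, so the mapping does not rely on two
# separate unique() calls returning their values in matching order.
airline_pairs = data_brazil[["Acronym ICAO Airline", "airline"]].dropna().drop_duplicates(subset="Acronym ICAO Airline")
airline_code_dict = {code: translator.translate(name) for code, name in zip(airline_pairs["Acronym ICAO Airline"], airline_pairs["airline"])}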

# Create dictionary to look up the full airport name for an ICAO airport code.
# The (code, description) pairs of origin and destination are stacked, so each
# code stays attached to the description from its own row.
origin = data_brazil[["Acronym ICAO Airport Origin", "Description Origin Airport"]].set_axis(["code", "name"], axis=1)
destination = data_brazil[["Acronym ICAO Destiny Airport", "Destination Airport Description"]].set_axis(["code", "name"], axis=1)
airport_pairs = pd.concat([origin, destination]).dropna().drop_duplicates(subset="code")
airport_code_dict = {code: translator.translate(name) for code, name in zip(airport_pairs["code"], airport_pairs["name"])}
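
A code-to-name dictionary is only reliable when each code is paired with the name from the same row. The sketch below illustrates this on toy data; the helper `build_lookup`, the short column names and the airport rows are assumptions made for illustration and do not come from the VRA file.

```python
import pandas as pd

def build_lookup(df: pd.DataFrame, pairs) -> dict:
    """Map each code to the first name found next to it on the same row.

    `pairs` is a list of (code_column, name_column) tuples.
    """
    stacked = pd.concat(
        [df[[code, name]].rename(columns={code: "code", name: "name"}) for code, name in pairs],
        ignore_index=True,
    )
    # Drop incomplete rows, then keep the first occurrence of every code
    stacked = stacked.dropna().drop_duplicates(subset="code")
    return dict(zip(stacked["code"], stacked["name"]))

# Toy flights table (illustrative values only)
flights = pd.DataFrame({
    "origin_code": ["SBGR", "SBSP", "SBGR"],
    "origin_name": ["Guarulhos", "Congonhas", "Guarulhos"],
    "dest_code": ["SBSP", "SBGR", "SBBR"],
    "dest_name": ["Congonhas", "Guarulhos", "Brasilia"],
})

lookup = build_lookup(flights, [("origin_code", "origin_name"), ("dest_code", "dest_name")])
print(lookup)
```

Because every pair is taken from a single row, a code that only appears as a destination (here `SBBR`) still receives the correct name.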

In [170]:
# Translate the remaining Portuguese words in the data set.
# Each distinct string is translated once; missing values (NaN) are skipped.
for column in ["Flight Status", "Starting Status", "Arrival Status"]:
    words = [word for word in data_brazil[column].unique() if isinstance(word, str)]
    data_brazil[column] = data_brazil[column].replace({word: translator.translate(word) for word in words})
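
Translating only the distinct values of a column keeps the number of translation requests small, since a status column usually holds just a handful of different strings. The sketch below shows the idea offline. `toy_translate` and `TOY_DICTIONARY` are hypothetical stand-ins for `translator.translate`, and the status values are illustrative rather than taken from the data set.

```python
import pandas as pd

# Hypothetical stand-in for translator.translate, so the example runs without network access
TOY_DICTIONARY = {"REALIZADO": "COMPLETED", "CANCELADO": "CANCELLED"}
calls = []  # records every word passed to the translator

def toy_translate(word: str) -> str:
    calls.append(word)
    return TOY_DICTIONARY.get(word, word)

def translate_column(series: pd.Series, translate) -> pd.Series:
    """Translate each distinct string once and leave missing values untouched."""
    words = [w for w in series.dropna().unique() if isinstance(w, str)]
    mapping = {w: translate(w) for w in words}
    return series.replace(mapping)

status = pd.Series(["REALIZADO", "CANCELADO", None, "REALIZADO"])
translated = translate_column(status, toy_translate)
print(translated.tolist())
print(len(calls))
```

The translator is called once per distinct word rather than once per row, which matters when the real translator is a remote service.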