# Exploratory Data Analysis: Toussaint Louverture International Airport 2017 - 2020

### This study is
Project Description:

Introduction
Air transport has had an enormous effect on the world and is one of the main drivers of globalization. Commercial aviation has made the world a smaller, more connected place. People can reach even the remotest corners of the earth and connect with cultures far removed from their own. Businesses can grow and connect with each other, and goods can be shipped around the world in a matter of hours.
That is why I have looked into this topic today to carry out a small study to 

In [23]:
# Import the required packages
import pandas as pd
import numpy as np  # numerical arrays
import seaborn as sns
import plotly.express as px
import matplotlib.pyplot as plt  # the earlier `import pylab as plt` was shadowed by this alias
import plotly.graph_objects as go
from plotly.subplots import make_subplots

### Adjusting `matplotlib` parameters

In [24]:
SMALL_SIZE = 12
MEDIUM_SIZE = 14
BIGGER_SIZE = 16

plt.rc('font', size=SMALL_SIZE)          # controls default text sizes
plt.rc('axes', titlesize=SMALL_SIZE)     # fontsize of the axes title
plt.rc('axes', labelsize=MEDIUM_SIZE)    # fontsize of the x and y labels
plt.rc('xtick', labelsize=SMALL_SIZE)    # fontsize of the tick labels
plt.rc('ytick', labelsize=SMALL_SIZE)    # fontsize of the tick labels
plt.rc('legend', fontsize=SMALL_SIZE)    # legend fontsize
plt.rc('figure', titlesize=BIGGER_SIZE)  # fontsize of the figure title

# Data Preparation

### Obtaining the data

The original data can be found on the Flightera arrivals page for [Toussaint Louverture International Airport](https://www.flightera.net/fr/airport/Port-au-Prince/MTPP/arrival "Toussaint Louverture International Airport"). The selected columns are the same as in this older.

Please refer to [this](https://github.com/Shito3/Airplanes_Analysis/tree/master/Results) notebook to see how I scraped the raw data into CSV format.

### Read in the data

In [25]:
# Path to the scraped airport arrivals dataset
url = 'Dataset/airplane_data.csv'
# Read in the data
df = pd.read_csv(url)
# Quick insights into the dataset
df.head()

Unnamed: 0.1,Unnamed: 0,Date / Statut,Vol,De,Arrivée Planifiée,Départ,Arrivé,Durée,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11
0,0,"dim, 01. oct 2017 07:48 EDT A Atterri",B61509 JBU1509 JetBlue Airways (B6 / JBU),Fort Lauderdale (FLL / KFLL),07:48 EDT,06:16 EDT 16 min retard,07:54 EDT 6 min retard,1h 37m,,,,,
1,1,"dim, 01. oct 2017 07:48 EDT A Atterri",B61509 JBU1509 JetBlue Airways (B6 / JBU),Fort Lauderdale (FLL / KFLL),07:48 EDT,06:16 EDT 16 min retard,07:54 EDT 6 min retard,1h 37m,,,,,
2,2,"dim, 01. oct 2017 08:00 EDT A Atterri",S6200 Flag of Haiti Sunrise Airways (S6 / KSZ),Santo Domingo (SDQ / MDSD),08:00 EDT,07:15 AST à temps,08:00 EDT à temps,45m,,,,,
3,3,"dim, 01. oct 2017 08:02 EDT A Atterri",AA1158 AAL1158 American Airlines (AA / AAL),Fort Lauderdale (FLL / KFLL),08:02 EDT,06:09 EDT 9 min retard,07:46 EDT 16 min tôt,1h 36m,,,,,
4,4,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...


### Let's delete the columns that are not part of the table.

In [26]:
df = df.drop(['Unnamed: 7','Unnamed: 0','Unnamed: 8','Unnamed: 9','Unnamed: 10','Unnamed: 11'],axis=1)
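
Listing every `Unnamed: N` column by hand works, but it breaks if the scrape produces a different number of empty columns. As a minimal sketch (the toy frame and the names `toy`, `unnamed_mask`, and `cleaned` are illustrative, not from the original notebook), the same columns could be selected by their name prefix:

```python
import pandas as pd

# Toy frame standing in for the scraped CSV (values are illustrative only).
toy = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "Vol": ["B61509", "S6200"],
    "Durée": ["1h 37m", "45m"],
    "Unnamed: 7": [None, None],
})

# Boolean mask that is True for every column whose name starts with "Unnamed".
unnamed_mask = toy.columns.str.startswith("Unnamed")

# Keep only the columns that are part of the real table.
cleaned = toy.loc[:, ~unnamed_mask]

print(list(cleaned.columns))
```

This assumes that no genuine column name begins with `Unnamed`, which holds for the columns shown in this notebook.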

In [27]:
df.head() #Let's look at the result now.

Unnamed: 0,Date / Statut,Vol,De,Arrivée Planifiée,Départ,Arrivé,Durée
0,"dim, 01. oct 2017 07:48 EDT A Atterri",B61509 JBU1509 JetBlue Airways (B6 / JBU),Fort Lauderdale (FLL / KFLL),07:48 EDT,06:16 EDT 16 min retard,07:54 EDT 6 min retard,1h 37m
1,"dim, 01. oct 2017 07:48 EDT A Atterri",B61509 JBU1509 JetBlue Airways (B6 / JBU),Fort Lauderdale (FLL / KFLL),07:48 EDT,06:16 EDT 16 min retard,07:54 EDT 6 min retard,1h 37m
2,"dim, 01. oct 2017 08:00 EDT A Atterri",S6200 Flag of Haiti Sunrise Airways (S6 / KSZ),Santo Domingo (SDQ / MDSD),08:00 EDT,07:15 AST à temps,08:00 EDT à temps,45m
3,"dim, 01. oct 2017 08:02 EDT A Atterri",AA1158 AAL1158 American Airlines (AA / AAL),Fort Lauderdale (FLL / KFLL),08:02 EDT,06:09 EDT 9 min retard,07:46 EDT 16 min tôt,1h 36m
4,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...
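
Rows 0 and 1 in the preview above are identical, which suggests that the scrape may contain repeated records. Whether these are scraping artifacts or genuinely separate entries is not stated in the source, so the following is only a sketch of how exact duplicates could be counted and dropped if that turns out to be appropriate. The toy frame and variable names are illustrative.

```python
import pandas as pd

# Toy frame reproducing the pattern seen in the preview (two identical rows).
toy = pd.DataFrame({
    "Date / Statut": [
        "dim, 01. oct 2017 07:48 EDT A Atterri",
        "dim, 01. oct 2017 07:48 EDT A Atterri",
        "dim, 01. oct 2017 08:00 EDT A Atterri",
    ],
    "Vol": [
        "B61509 JBU1509 JetBlue Airways (B6 / JBU)",
        "B61509 JBU1509 JetBlue Airways (B6 / JBU)",
        "S6200 Flag of Haiti Sunrise Airways (S6 / KSZ)",
    ],
})

# Count rows that exactly repeat an earlier row.
n_dupes = int(toy.duplicated().sum())

# Keep the first occurrence of each row and renumber the index.
deduped = toy.drop_duplicates().reset_index(drop=True)

print(n_dupes, len(deduped))
```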


In [51]:
# Check whether there are any null values
df[df['Date / Statut'].isna()]

Unnamed: 0,Date / Statut,Vol,De,Arrivée Planifiée,Départ,Arrivé,Durée


In [52]:
# Count the null values in each column
df.isnull().sum()

Date / Statut        0
Vol                  0
De                   0
Arrivée Planifiée    0
Départ               0
Arrivé               0
Durée                0
dtype: int64

In [53]:
# Preview the DataFrame with missing values dropped (returns a new object; df itself is unchanged)
df.dropna()

Unnamed: 0,Date / Statut,Vol,De,Arrivée Planifiée,Départ,Arrivé,Durée
0,"dim, 01. oct 2017 07:48 EDT A Atterri",B61509 JBU1509 JetBlue Airways (B6 / JBU),Fort Lauderdale (FLL / KFLL),07:48 EDT,06:16 EDT 16 min retard,07:54 EDT 6 min retard,1h 37m
1,"dim, 01. oct 2017 07:48 EDT A Atterri",B61509 JBU1509 JetBlue Airways (B6 / JBU),Fort Lauderdale (FLL / KFLL),07:48 EDT,06:16 EDT 16 min retard,07:54 EDT 6 min retard,1h 37m
2,"dim, 01. oct 2017 08:00 EDT A Atterri",S6200 Flag of Haiti Sunrise Airways (S6 / KSZ),Santo Domingo (SDQ / MDSD),08:00 EDT,07:15 AST à temps,08:00 EDT à temps,45m
3,"dim, 01. oct 2017 08:02 EDT A Atterri",AA1158 AAL1158 American Airlines (AA / AAL),Fort Lauderdale (FLL / KFLL),08:02 EDT,06:09 EDT 9 min retard,07:46 EDT 16 min tôt,1h 36m
4,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...
...,...,...,...,...,...,...,...
31988,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...
31995,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...,(adsbygoogle = window.adsbygoogle || []).push(...
31996,"sam, 18. avr 09:00 EDT A Atterri",B66123 JBU6123 JetBlue Airways (B6 / JBU),Fort Lauderdale (FLL / KFLL),09:00 EDT,06:40 EDT planifié,12:37 EDT 3 h 37 min retard,5h 57m
32000,"sam, 18. avr 09:00 EDT A Atterri",B66123 JBU6123 JetBlue Airways (B6 / JBU),Fort Lauderdale (FLL / KFLL),09:00 EDT,06:40 EDT planifié,12:37 EDT 3 h 37 min retard,5h 57m


In [54]:
df.info()  # Summary of the DataFrame: column dtypes and non-null counts

<class 'pandas.core.frame.DataFrame'>
Int64Index: 29375 entries, 0 to 32003
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   Date / Statut      29375 non-null  object
 1   Vol                29375 non-null  object
 2   De                 29375 non-null  object
 3   Arrivée Planifiée  29375 non-null  object
 4   Départ             29375 non-null  object
 5   Arrivé             29375 non-null  object
 6   Durée              29375 non-null  object
dtypes: object(7)
memory usage: 1.0+ MB


In [55]:
df.dropna(inplace = True)

In [56]:
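# Remove the Google Ads placeholder rows from the DataFrame
# regex=False matches the literal text 'window.' instead of treating '.' as a wildcard
df_final = df[~df['Date / Statut'].str.contains('window.', regex=False)]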
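
Adding columns to a frame obtained by boolean filtering, as is done with `df_final` later in this notebook, is a common trigger for pandas' `SettingWithCopyWarning`. A minimal sketch of one way to avoid it is to take an explicit copy of the filtered rows before assigning to them. The toy frame and the names `is_ad` and `flights` are illustrative.

```python
import pandas as pd

# Toy frame with one real flight row and one ad placeholder row.
toy = pd.DataFrame({
    "Date / Statut": [
        "dim, 01. oct 2017 07:48 EDT A Atterri",
        "(adsbygoogle = window.adsbygoogle || []).push(...",
    ]
})

# Literal (non-regex) match on the ad snippet.
is_ad = toy["Date / Statut"].str.contains("window.adsbygoogle", regex=False)

# .copy() makes the result an independent DataFrame, so later column
# assignments are unambiguous and do not raise the warning.
flights = toy[~is_ad].copy()
flights["Date"] = flights["Date / Statut"].str.split("EDT").str[0].str.strip()

print(flights["Date"].tolist())
```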

In [57]:
# Helper that splits a string on 'EDT' and returns the piece at the given index
# (index 0 is the time part, index -1 is the status or delay part)
def split_date_statut(x,index):
    return x.split('EDT')[index]

In [58]:
def split_date(x):
    return x.split('EDT')[0].strip()

In [59]:
split_date_statut('dim, 01. oct 2017 07:48 EDT A Atterri',0)

'dim, 01. oct 2017 07:48 '

In [60]:
split_date_statut('dim, 01. oct 2017 07:48 EDT A Atterri',-1)

' A Atterri'

In [61]:
split_date('07:48 EDT')

'07:48'
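
Splitting on the literal `'EDT'` only works for times reported in that zone. The raw data also contains values such as `07:15 AST à temps` (the Santo Domingo departure), which contain no `'EDT'` and therefore come back unsplit. As a sketch under the assumption that every value starts with `HH:MM` followed by a 2 to 5 letter upper-case zone abbreviation, a regular expression could separate time, zone, and status. The function name `split_time_status` is hypothetical.

```python
import re

# "HH:MM", then a time-zone abbreviation in capitals, then optional free text.
TIME_TZ_STATUS = re.compile(r"^(\d{1,2}:\d{2})\s+([A-Z]{2,5})\s*(.*)$")

def split_time_status(text):
    """Return (time, zone, status); time and zone are None if the pattern fails."""
    cleaned = text.strip()
    match = TIME_TZ_STATUS.match(cleaned)
    if match is None:
        return None, None, cleaned
    time, zone, status = match.groups()
    return time, zone, status.strip()

print(split_time_status("06:16 EDT 16 min retard"))
print(split_time_status("07:15 AST à temps"))
```

Returning the zone as its own value keeps the information that the departure time is expressed in a different zone from the arrival time.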

In [62]:
df_final['Date'] = df_final['Date / Statut'].apply(lambda x: split_date_statut(x,0))
df_final['statut'] = df_final['Date / Statut'].apply(lambda x: split_date_statut(x,-1))
df_final['arrivee heure planifiee'] = df_final['Arrivée Planifiée'].apply(lambda x: split_date(x))
df_final['depart heure'] = df_final['Départ'].apply(lambda x: split_date_statut(x,0))
df_final['depart statut'] = df_final['Départ'].apply(lambda x: split_date_statut(x,-1))
df_final['Arrivé heure'] = df_final['Arrivé'].apply(lambda x: split_date_statut(x,0))
df_final['Arrivé statut'] = df_final['Arrivé'].apply(lambda x: split_date_statut(x,-1))

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_final['Date'] = df_final['Date / Statut'].apply(lambda x: split_date_statut(x,0))

In [63]:
df_final

Unnamed: 0,Date / Statut,Vol,De,Arrivée Planifiée,Départ,Arrivé,Durée,Date,statut,arrivee heure planifiee,depart heure,depart statut,Arrivé heure,Arrivé statut
0,"dim, 01. oct 2017 07:48 EDT A Atterri",B61509 JBU1509 JetBlue Airways (B6 / JBU),Fort Lauderdale (FLL / KFLL),07:48 EDT,06:16 EDT 16 min retard,07:54 EDT 6 min retard,1h 37m,"dim, 01. oct 2017 07:48",A Atterri,07:48,06:16,16 min retard,07:54,6 min retard
1,"dim, 01. oct 2017 07:48 EDT A Atterri",B61509 JBU1509 JetBlue Airways (B6 / JBU),Fort Lauderdale (FLL / KFLL),07:48 EDT,06:16 EDT 16 min retard,07:54 EDT 6 min retard,1h 37m,"dim, 01. oct 2017 07:48",A Atterri,07:48,06:16,16 min retard,07:54,6 min retard
2,"dim, 01. oct 2017 08:00 EDT A Atterri",S6200 Flag of Haiti Sunrise Airways (S6 / KSZ),Santo Domingo (SDQ / MDSD),08:00 EDT,07:15 AST à temps,08:00 EDT à temps,45m,"dim, 01. oct 2017 08:00",A Atterri,08:00,07:15 AST à temps,07:15 AST à temps,08:00,à temps
3,"dim, 01. oct 2017 08:02 EDT A Atterri",AA1158 AAL1158 American Airlines (AA / AAL),Fort Lauderdale (FLL / KFLL),08:02 EDT,06:09 EDT 9 min retard,07:46 EDT 16 min tôt,1h 36m,"dim, 01. oct 2017 08:02",A Atterri,08:02,06:09,9 min retard,07:46,16 min tôt
5,"dim, 01. oct 2017 08:02 EDT A Atterri",AA1158 AAL1158 American Airlines (AA / AAL),Fort Lauderdale (FLL / KFLL),08:02 EDT,06:09 EDT 9 min retard,07:46 EDT 16 min tôt,1h 36m,"dim, 01. oct 2017 08:02",A Atterri,08:02,06:09,9 min retard,07:46,16 min tôt
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
31950,"sam, 11. avr 11:23 EDT A Atterri",NK6450 NKS6450 Spirit Airlines (NK / NKS),Fort Lauderdale (FLL / KFLL),11:23 EDT,09:00 EDT planifié,11:25 EDT 2 min retard,2h 25m,"sam, 11. avr 11:23",A Atterri,11:23,09:00,planifié,11:25,2 min retard
31983,"jeu, 16. avr 21:45 EDT A Atterri",M6847 AJT847 Amerijet International (M6 / AJT),Miami (MIA / KMIA),21:45 EDT,19:15 EDT planifié,21:45 EDT planifié,2h 30m,"jeu, 16. avr 21:45",A Atterri,21:45,19:15,planifié,21:45,planifié
31986,"jeu, 16. avr 21:45 EDT A Atterri",M6847 AJT847 Amerijet International (M6 / AJT),Miami (MIA / KMIA),21:45 EDT,19:15 EDT planifié,21:45 EDT planifié,2h 30m,"jeu, 16. avr 21:45",A Atterri,21:45,19:15,planifié,21:45,planifié
31996,"sam, 18. avr 09:00 EDT A Atterri",B66123 JBU6123 JetBlue Airways (B6 / JBU),Fort Lauderdale (FLL / KFLL),09:00 EDT,06:40 EDT planifié,12:37 EDT 3 h 37 min retard,5h 57m,"sam, 18. avr 09:00",A Atterri,09:00,06:40,planifié,12:37,3 h 37 min retard
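
The `depart statut` and `Arrivé statut` columns hold free text such as `16 min retard`, `3 h 37 min retard`, `16 min tôt`, `à temps`, and `planifié`. To analyze delays numerically, these strings need to become signed minutes. The following is a sketch that covers only the formats visible in the output above; the function name `delay_minutes` is hypothetical, and other formats in the full dataset (for example hours without minutes) would return `None` and need extra handling.

```python
import re

# Optional "<n> h", then "<n> min", then "retard" (late) or "tôt" (early).
DELAY_PATTERN = re.compile(r"(?:(\d+)\s*h\s*)?(\d+)\s*min\s+(retard|tôt)")

def delay_minutes(status):
    """Convert a status string to signed minutes: late > 0, early < 0, on time = 0."""
    text = status.strip().lower()
    if text == "à temps":
        return 0
    match = DELAY_PATTERN.fullmatch(text)
    if match is None:
        return None  # e.g. "planifié": no measured delay available
    hours = int(match.group(1) or 0)
    minutes = int(match.group(2))
    total = hours * 60 + minutes
    return total if match.group(3) == "retard" else -total

for sample in ["16 min retard", "3 h 37 min retard", "16 min tôt", "à temps", "planifié"]:
    print(sample, "->", delay_minutes(sample))
```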


In [64]:
# Let's look at the result
df_final[['statut','Date']]

Unnamed: 0,statut,Date
0,A Atterri,"dim, 01. oct 2017 07:48"
1,A Atterri,"dim, 01. oct 2017 07:48"
2,A Atterri,"dim, 01. oct 2017 08:00"
3,A Atterri,"dim, 01. oct 2017 08:02"
5,A Atterri,"dim, 01. oct 2017 08:02"
...,...,...
31950,A Atterri,"sam, 11. avr 11:23"
31983,A Atterri,"jeu, 16. avr 21:45"
31986,A Atterri,"jeu, 16. avr 21:45"
31996,A Atterri,"sam, 18. avr 09:00"
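
The `Date` column is still text, and the later rows (for example `sam, 11. avr 11:23`) omit the year. A sketch of how these strings could be turned into `datetime` objects follows. The month table is an assumption: only `oct` and `avr` appear in the output above, so the other abbreviations may need adjusting to the real data. The year supplied for rows without one is also an assumption the caller must provide. The names `FRENCH_MONTHS` and `parse_french_date` are hypothetical.

```python
import re
from datetime import datetime

# French month abbreviations. Only "oct" and "avr" are confirmed by the
# output above; the remaining spellings are assumptions.
FRENCH_MONTHS = {
    "janv": 1, "févr": 2, "mars": 3, "avr": 4, "mai": 5, "juin": 6,
    "juil": 7, "août": 8, "sept": 9, "oct": 10, "nov": 11, "déc": 12,
}

# "<weekday>, <day>. <month> [<year>] <HH>:<MM>"
DATE_PATTERN = re.compile(
    r"^\w+,\s+(\d{1,2})\.\s+(\w+?)\.?\s+(?:(\d{4})\s+)?(\d{1,2}):(\d{2})$"
)

def parse_french_date(text, default_year=None):
    """Parse a scraped date string; return None when it cannot be interpreted."""
    match = DATE_PATTERN.match(text.strip())
    if match is None:
        return None
    day, month_name, year, hour, minute = match.groups()
    month = FRENCH_MONTHS.get(month_name.lower())
    if month is None:
        return None
    if year is None:
        if default_year is None:
            return None  # the year is missing and no fallback was given
        year = default_year
    return datetime(int(year), month, int(day), int(hour), int(minute))

print(parse_french_date("dim, 01. oct 2017 07:48"))
print(parse_french_date("sam, 11. avr 11:23", default_year=2020))
```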


In [49]:
# Helper that extracts the arrival time (the text before 'EDT') from the arrival string
def heure_Arrivée(x):
    return x.split('EDT')[0].strip()

In [50]:
heure_Arrivée('07:54 EDT')

'07:54'
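
Like the time columns, `Durée` is still stored as text (`1h 37m`, `45m`, `5h 57m`). A sketch of converting it to a number of minutes, assuming those are the only two units used, could look like this. The function name `duration_minutes` is hypothetical.

```python
import re

# Optional "<n>h" followed by optional "<n>m", e.g. "1h 37m" or "45m".
DURATION_PATTERN = re.compile(r"(?:(\d+)\s*h)?\s*(?:(\d+)\s*m)?")

def duration_minutes(text):
    """Convert a duration string to total minutes; return None if nothing matches."""
    match = DURATION_PATTERN.fullmatch(text.strip())
    if match is None or not any(match.groups()):
        return None
    hours = int(match.group(1) or 0)
    minutes = int(match.group(2) or 0)
    return hours * 60 + minutes

for sample in ["1h 37m", "45m", "5h 57m"]:
    print(sample, "->", duration_minutes(sample))
```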