Installing and loading the libraries required for data loading, visualization, manipulation, machine learning, sentiment analysis and building the dashboard.

In [None]:
!pip install dash

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

from scipy import stats
from scipy.stats import f_oneway
from scipy.stats import chi2_contingency
from scipy.stats import mannwhitneyu
from scipy.stats import wilcoxon
from scipy.stats import kruskal

Loading the first dataset, which is in CSV format, and taking a first look at the data.

In [2]:
df = pd.read_csv('TAM07.20231217131259.csv')
df.head()

,STATISTIC,Statistic Label,TLIST(M1),Month,C02935V03550,Airports in Ireland,C02191V04000,Country,C02354V02832,Direction,C02936V03551,Flight Type,UNIT,VALUE
0,TAM07C01,Passengers,202001,2020 January,EI0M,All main airports,-,All Countries,-,All directions,-,All flights,Thousand,2388.4
1,TAM07C01,Passengers,202001,2020 January,EI0M,All main airports,-,All Countries,-,All directions,1,Scheduled,Thousand,2365.5
2,TAM07C01,Passengers,202001,2020 January,EI0M,All main airports,-,All Countries,-,All directions,2,Unscheduled,Thousand,22.8
3,TAM07C01,Passengers,202001,2020 January,EI0M,All main airports,-,All Countries,3,Arrival,-,All flights,Thousand,1200.0
4,TAM07C01,Passengers,202001,2020 January,EI0M,All main airports,-,All Countries,3,Arrival,1,Scheduled,Thousand,1187.7


Checking the summary information of the data: column data types, non-null counts and its shape.

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 430110 entries, 0 to 430109
Data columns (total 14 columns):
 #   Column               Non-Null Count   Dtype  
---  ------               --------------   -----  
 0   STATISTIC            430110 non-null  object 
 1   Statistic Label      430110 non-null  object 
 2   TLIST(M1)            430110 non-null  int64  
 3   Month                430110 non-null  object 
 4   C02935V03550         430110 non-null  object 
 5   Airports in Ireland  430110 non-null  object 
 6   C02191V04000         430110 non-null  object 
 7   Country              430110 non-null  object 
 8   C02354V02832         430110 non-null  object 
 9   Direction            430110 non-null  object 
 10  C02936V03551         430110 non-null  object 
 11  Flight Type          430110 non-null  object 
 12  UNIT                 430110 non-null  object 
 13  VALUE                430110 non-null  float64
dtypes: float64(1), int64(1), object(12)
memory usage: 45.9+ MB


In [4]:
df.shape

(430110, 14)

Checking the unique countries in the dataset.

In [5]:
#finding the countries in our dataset
df.Country.unique()

array(['All Countries', 'Ireland (domestic)', 'Austria', 'Belgium',
       'Bulgaria', 'Croatia', 'Cyprus', 'Czechia', 'Denmark', 'Estonia',
       'Finland', 'France', 'Germany', 'Greece', 'Hungary', 'Italy',
       'Latvia', 'Lithuania', 'Luxembourg', 'Malta', 'Netherlands',
       'Poland', 'Portugal', 'Romania', 'Slovakia', 'Slovenia', 'Spain',
       'Canary Islands', 'Sweden', 'EU27 excl Ireland 2020',
       'United Kingdom (1)', 'England', 'Northern Ireland', 'Scotland',
       'Wales', 'Other UK (1)', 'Iceland', 'Norway', 'Russia', 'Serbia',
       'Switzerland', 'Turkiye', 'Other Europe (3)',
       'Europe (48 Countries)', 'America', 'Canada', 'United States',
       'Other America (5)', 'Africa (9)', 'Eygpt', 'Morocco', 'Tunisia',
       'Other Africa (6)', 'Asia (8)', 'Bahrain', 'Israel',
       'United Arab Emirates', 'Other Asian countries (4)',
       'Oceania and Polar regions (1)'], dtype=object)
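Because `isin` silently ignores names that do not match exactly, it can help to confirm that every country we intend to select is spelled the way the dataset spells it (note, for example, 'Eygpt' and 'Turkiye' above). The following is a minimal sketch on a toy Series; `available` and `wanted` are illustrative names and are not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for df.Country; the real column has many more labels
available = pd.Series(
    ["All Countries", "Ireland (domestic)", "Denmark", "Finland", "Slovakia", "Eygpt"]
)

# Names we would like to select (hypothetical list for illustration)
wanted = ["Ireland (domestic)", "Denmark", "Finland", "Slovakia", "Egypt"]

# Any name in `wanted` that never occurs in the data would be silently
# dropped by isin(), so list such names explicitly
missing = sorted(set(wanted) - set(available.unique()))
print(missing)
```

An empty list means every requested name exists in the data; here the correctly spelled 'Egypt' is reported because the toy data uses the dataset's spelling 'Eygpt'.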

- Since the data covers many countries, we select only a few to compare with Ireland.
- We subset the data to Ireland and three countries with a population comparable to Ireland's: Denmark, Finland and Slovakia.

In [6]:
# filtering to get countries with a population comparable to Ireland's
selected_countries = ['Ireland (domestic)', 'Denmark', 'Finland', 'Slovakia']

# Filter rows where the 'Country' column is in the list of selected countries;
# .copy() gives an independent DataFrame and avoids SettingWithCopyWarning below
df_selected = df[df['Country'].isin(selected_countries)].copy()

# renaming Ireland
df_selected.loc[df_selected['Country'] == 'Ireland (domestic)', 'Country'] = 'Ireland'
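The filter-and-rename step can be tried on a small toy frame first. This is only a sketch under assumptions: the `toy` data and its values are made up for illustration, and `replace` is used here as an alternative to the `.loc` assignment in the cell above.

```python
import pandas as pd

# Toy stand-in for the full table; values are illustrative only
toy = pd.DataFrame({
    "Country": ["All Countries", "Ireland (domestic)", "Denmark",
                "France", "Finland", "Slovakia"],
    "VALUE": [100.0, 9.7, 12.3, 50.0, 4.2, 1.1],
})

selected_countries = ["Ireland (domestic)", "Denmark", "Finland", "Slovakia"]

# Boolean mask keeps only rows whose Country is in the list;
# .copy() makes the subset independent of the original frame
toy_selected = toy[toy["Country"].isin(selected_countries)].copy()

# Rename the domestic label; other labels are left unchanged
toy_selected["Country"] = toy_selected["Country"].replace(
    {"Ireland (domestic)": "Ireland"}
)

print(toy_selected["Country"].tolist())
```

Working on a copy means the original frame keeps its 'Ireland (domestic)' label, so the raw data can still be inspected later.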

The subset now contains only the selected countries, and the shape of the data has changed accordingly.

In [7]:
df_selected.Country.unique()

array(['Ireland', 'Denmark', 'Finland', 'Slovakia'], dtype=object)

In [8]:
df_selected.shape

(29160, 14)

- Filtering the data to keep only the columns needed for analysis.
- 8 columns are kept; the coded identifier columns are dropped.

In [9]:
# keeping required columns
selected_columns = ['Statistic Label', 'Month', 'Airports in Ireland', 'Country', 'Direction', 'Flight Type', 'UNIT', 'VALUE']

# Create a new DataFrame with the selected columns
df_selected = df_selected[selected_columns]

# Replace spaces with underscores in column names
df_selected.columns = df_selected.columns.str.replace(' ', '_')
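As a minimal sketch of the same select-and-rename pattern, assuming a one-row toy frame with a subset of the real column names (the `toy`, `keep` and `toy_small` names are illustrative only):

```python
import pandas as pd

# One-row toy frame with a few of the original column names
toy = pd.DataFrame({
    "STATISTIC": ["TAM07C01"],
    "Statistic Label": ["Passengers"],
    "Airports in Ireland": ["All main airports"],
    "Flight Type": ["Scheduled"],
    "VALUE": [9.7],
})

# Keep only the descriptive columns, dropping the coded identifier
keep = ["Statistic Label", "Airports in Ireland", "Flight Type", "VALUE"]
toy_small = toy[keep].copy()

# Replace spaces with underscores in every column name
toy_small.columns = toy_small.columns.str.replace(" ", "_")

print(list(toy_small.columns))
```

Column names without spaces can be used with attribute access (for example `toy_small.Flight_Type`) and are easier to reference in formulas and plotting calls.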

In [10]:
df_selected.info()

<class 'pandas.core.frame.DataFrame'>
Index: 29160 entries, 9 to 429803
Data columns (total 8 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Statistic_Label      29160 non-null  object 
 1   Month                29160 non-null  object 
 2   Airports_in_Ireland  29160 non-null  object 
 3   Country              29160 non-null  object 
 4   Direction            29160 non-null  object 
 5   Flight_Type          29160 non-null  object 
 6   UNIT                 29160 non-null  object 
 7   VALUE                29160 non-null  float64
dtypes: float64(1), object(7)
memory usage: 2.0+ MB


Creating a new Year column by splitting the Month column, whose values have the form '2020 January'. The Year column makes visualization easier during the exploratory data analysis.

In [11]:
# creating year column by splitting the month column
df_selected[['Year', 'Month']] = df_selected['Month'].str.split(expand=True)
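The split can be checked on a few sample values. This sketch assumes every Month value has exactly two whitespace-separated parts, as in '2020 January'; the final integer conversion is an optional extra step that the cell above does not perform.

```python
import pandas as pd

# Toy values in the same 'YYYY MonthName' form as the Month column
toy = pd.DataFrame({"Month": ["2020 January", "2020 February", "2021 March"]})

# Split on whitespace into two columns: the first part becomes Year,
# the second part overwrites Month with the month name only
toy[["Year", "Month"]] = toy["Month"].str.split(expand=True)

# After the split Year holds strings; converting to int (optional)
# lets it sort and plot as a number
toy["Year"] = toy["Year"].astype(int)

print(toy)
```

If a value had more or fewer than two parts, `expand=True` would produce a different number of columns and the assignment would fail, which makes malformed rows easy to spot.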

In [12]:
df_selected.head()

,Statistic_Label,Month,Airports_in_Ireland,Country,Direction,Flight_Type,UNIT,VALUE,Year
9,Passengers,January,All main airports,Ireland,All directions,All flights,Thousand,9.7,2020
10,Passengers,January,All main airports,Ireland,All directions,Scheduled,Thousand,9.7,2020
11,Passengers,January,All main airports,Ireland,All directions,Unscheduled,Thousand,0.0,2020
12,Passengers,January,All main airports,Ireland,Arrival,All flights,Thousand,5.1,2020
13,Passengers,January,All main airports,Ireland,Arrival,Scheduled,Thousand,5.0,2020
