The purpose of this section is to determine how airport centrality changed, or did not change, before and after the pandemic.

In [1]:
import pandas as pd
import seaborn as sns
import plotly.express as px
import matplotlib.ticker as ticker
import matplotlib.pyplot as plt
import plotly
import datetime as dt
import networkx as nx
import numpy as np

In [2]:
df = pd.read_csv('flightdata.csv')

First, only the 200 airports with the most departing flights are considered. Flights whose origin and destination are both among these airports account for 41.8% of the total data. The assumption here is that airport centralities are approximately positively correlated, so the dataset of the top 200 airports with the most flights is sufficient to find...

In [3]:
# Top 200 airports by number of departing flights
df_count = df['origin'].value_counts().rename_axis('origin').reset_index(name='counts')
top_200 = df_count.head(200)['origin']
# Keep only flights that both depart from and arrive at a top-200 airport
df_200 = df[df['origin'].isin(top_200) & df['destination'].isin(top_200)]
# Share of all flights retained
len(df_200) / len(df)

0.4179483271862255

In [4]:
# Airports in df_200, ordered from most to fewest departing flights
cities = df_200['origin'].value_counts()
df_cities = pd.DataFrame(cities)
df_cities['origin'] = df_cities.index
df_cities.head(10)

Unnamed: 0,origin
KORD,KORD
KATL,KATL
KLAX,KLAX
KDEN,KDEN
KDFW,KDFW
KLAS,KLAS
KPHX,KPHX
KSEA,KSEA
KSFO,KSFO
KCLT,KCLT


In [5]:
# Flights in January 2020
df_0120 = df_200[df_200['day'].str.startswith('2020-01')]
len(df_0120)

717756

In [6]:
# Keep only the origin/destination pair of each flight
df_0120_OD = df_0120[['origin', 'destination']]
df_0120_OD

Unnamed: 0,origin,destination
4350197,YSSY,EHAM
4350199,RKSI,LIRF
4350200,SBGR,LOWW
4350201,RJBB,EGKK
4350202,KLAX,YSSY
...,...,...
5764282,KRDU,KCLT
5764308,KSMF,KSFO
5764309,KCOS,KAPA
5764317,KJFK,KEWR


In [7]:
airport = np.arange(0, 200).tolist()
df_airport = pd.DataFrame(airport)

In [8]:
df_cities['No.'] = airport
df_cities.head()

Unnamed: 0,origin,No.
KORD,KORD,0
KATL,KATL,1
KLAX,KLAX,2
KDEN,KDEN,3
KDFW,KDFW,4


In [23]:
# Lookup tables: airport code -> node index, and node index -> airport code
dic_airport = dict(zip(df_cities['origin'], df_cities['No.']))
dic_airport_re = dict(zip(df_cities['No.'], df_cities['origin']))

In [10]:
# Encode origin and destination airport codes as node indices
df_num = df_0120_OD.replace(dic_airport)

In [11]:
df_odpair = df_num.iloc[:, 0:2]
df_odpair

Unnamed: 0,origin,destination
4350197,53,17
4350199,70,63
4350200,135,52
4350201,110,92
4350202,2,53
...,...,...
5764282,45,9
5764308,56,8
5764309,176,155
5764317,12,11


In [12]:
arr_od = df_odpair.values
arr_od

array([[ 53,  17],
       [ 70,  63],
       [135,  52],
       ...,
       [176, 155],
       [ 12,  11],
       [128,  74]], dtype=int64)

In [14]:
# Undirected route graph: one node per airport, one edge per connected airport pair
G = nx.Graph()
G.add_nodes_from(airport)
G.add_edges_from(arr_od)
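
Because `nx.Graph` keeps at most one undirected edge per node pair, repeated flights on the same route collapse into a single edge and the direction of travel is dropped. The centrality computed from this graph therefore reflects which routes exist rather than how busy they are. The sketch below illustrates this on a few made-up origin/destination pairs and shows one possible way to retain flight counts as an edge `weight`. The toy pairs and the weighted variant are assumptions for illustration only and are not part of the analysis in this notebook.

```python
from collections import Counter

import networkx as nx

# Hypothetical toy data: three flights between airports 0 and 1
# (in either direction) and one flight between airports 1 and 2.
toy_od = [(0, 1), (0, 1), (1, 0), (1, 2)]

# Unweighted graph, built the same way as G: duplicate pairs collapse.
G_simple = nx.Graph()
G_simple.add_edges_from(toy_od)
n_simple_edges = G_simple.number_of_edges()

# Weighted variant: count flights per unordered airport pair and
# store the count in the "weight" attribute of each edge.
route_counts = Counter(tuple(sorted(pair)) for pair in toy_od)
G_weighted = nx.Graph()
for (a, b), n_flights in route_counts.items():
    G_weighted.add_edge(a, b, weight=n_flights)

print(n_simple_edges)              # number of distinct routes
print(G_weighted[0][1]["weight"])  # flights recorded between airports 0 and 1
```

If flight volume should influence the ranking, `nx.eigenvector_centrality` accepts a `weight` argument naming the edge attribute to use.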

In [15]:
arr_eigen = nx.eigenvector_centrality(G)
df_eigen = pd.DataFrame(arr_eigen.items(), columns=['Node', 'Centrality'])

In graph theory, eigenvector centrality (also called eigencentrality or prestige score) is a measure of the influence of a node in a network. Relative scores are assigned to all nodes in the network based on the concept that connections to high-scoring nodes contribute more to the score of the node in question than equal connections to low-scoring nodes. A high eigenvector score means that a node is connected to many nodes that themselves have high scores.
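
To make the definition concrete, the following sketch computes eigenvector centrality on a small hypothetical graph by power iteration on the adjacency matrix and compares the result with `nx.eigenvector_centrality`. The toy graph and the helper name `power_iteration_centrality` are illustrative assumptions, not part of the analysis above.

```python
import networkx as nx
import numpy as np


def power_iteration_centrality(G, n_iter=1000, tol=1e-10):
    """Approximate eigenvector centrality by repeatedly applying A + I."""
    nodes = list(G.nodes())
    A = nx.to_numpy_array(G, nodelist=nodes)
    # Adding the identity keeps the same eigenvectors but avoids
    # oscillation on bipartite graphs.
    M = A + np.eye(len(nodes))
    x = np.ones(len(nodes)) / len(nodes)
    for _ in range(n_iter):
        x_new = M @ x
        x_new /= np.linalg.norm(x_new)  # normalise to unit Euclidean length
        converged = np.abs(x_new - x).sum() < tol
        x = x_new
        if converged:
            break
    return dict(zip(nodes, x))


# Hypothetical toy network: node 0 is a hub linked to 1, 2 and 3,
# and nodes 1 and 2 are also linked to each other.
G_toy = nx.Graph([(0, 1), (0, 2), (0, 3), (1, 2)])

manual = power_iteration_centrality(G_toy)
reference = nx.eigenvector_centrality(G_toy, max_iter=1000)

for node in G_toy:
    print(node, round(manual[node], 4), round(reference[node], 4))
```

In this toy graph the hub, node 0, receives the highest score, and node 3 scores lowest even though it is attached to the hub, because it has no other connections.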

In [25]:
df_eigen100 = df_eigen.sort_values(by = ['Centrality'], ascending = False).head(100)
df_eigen100

Unnamed: 0,Node,Centrality
2,2,0.127207
0,0,0.125117
27,27,0.122492
11,11,0.122211
81,81,0.120004
...,...,...
194,194,0.063857
95,95,0.063821
33,33,0.063130
64,64,0.063067


In [26]:
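# Map node indices back to airport codes (only in the Node column, so Centrality values are never touched)
df_eigenrank = df_eigen100.assign(Node=df_eigen100['Node'].map(dic_airport_re))
df_eigenrank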

Unnamed: 0,Node,Centrality
2,KLAX,0.127207
0,KORD,0.125117
27,KIAD,0.122492
11,KEWR,0.122211
81,KTEB,0.120004
...,...,...
194,KFTW,0.063857
95,EBBR,0.063821
33,VHHH,0.063130
64,EIDW,0.063067


# Next steps


- Compute the eigenvector centrality for the other chosen months
- Animate how the eigenvector centrality ranking changes over time
- Conclude how centrality changed, or did not change, before and after the pandemic
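
One possible way to approach the first item is to wrap the steps above in a function and call it once per month. The sketch below assumes a frame with the same `day`, `origin` and `destination` columns as `df_200`. The helper name `monthly_eigen_centrality` and the toy rows are hypothetical. Unlike the cells above, it uses airport codes directly as node labels and only includes airports that have at least one flight in the chosen month.

```python
import networkx as nx
import pandas as pd


def monthly_eigen_centrality(flights, month, max_iter=1000):
    """Rank airports by eigenvector centrality for one 'YYYY-MM' month."""
    in_month = flights[flights["day"].str.startswith(month)]
    G = nx.Graph()
    G.add_edges_from(zip(in_month["origin"], in_month["destination"]))
    scores = nx.eigenvector_centrality(G, max_iter=max_iter)
    return (
        pd.DataFrame(list(scores.items()), columns=["Node", "Centrality"])
        .sort_values("Centrality", ascending=False)
        .reset_index(drop=True)
    )


# Hypothetical toy rows standing in for df_200.
toy = pd.DataFrame({
    "day": ["2020-01-03", "2020-01-05", "2020-01-09",
            "2020-01-20", "2020-04-02", "2020-04-10"],
    "origin": ["KLAX", "KORD", "KLAX", "KATL", "KLAX", "KORD"],
    "destination": ["KORD", "KATL", "KATL", "KDEN", "KORD", "KATL"],
})

rank_jan = monthly_eigen_centrality(toy, "2020-01")
rank_apr = monthly_eigen_centrality(toy, "2020-04")
print(rank_jan)
print(rank_apr)
```

Calling the function for each chosen month and concatenating the results with a month column would give a single table suitable for comparing or animating the rankings.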