# SETUP

*Do not change the content of the cells under __SETUP__ and __TESTS__*

*Work only in the __YOUR WORK__ area*


In [1]:
# DO NOT EDIT OR CHANGE THE CONTENT OF THIS CELL
scenario = 0

In [2]:
import pandas as pd
import networkx as nx

# Network analytics


In this question, we will analyze flight traffic data from [OpenFlights](https://openflights.org/data.html).

As of June 2014, the OpenFlights/Airline Route Mapper Route Database contains 67663 routes between 3321 airports on 548 airlines spanning the globe. Each entry contains the following information:

- Airline: 2-letter (IATA) or 3-letter (ICAO) code of the airline.
- Airline ID: Unique OpenFlights identifier for airline (see Airline).
- Source airport: 3-letter (IATA) or 4-letter (ICAO) code of the source airport.
- Source airport ID: Unique OpenFlights identifier for source airport (see Airport)
- Destination airport: 3-letter (IATA) or 4-letter (ICAO) code of the destination airport.
- Destination airport ID: Unique OpenFlights identifier for destination airport (see Airport)
- Codeshare: "Y" if this flight is a codeshare (that is, not operated by Airline, but another carrier), empty otherwise.
- Stops: Number of stops on this flight ("0" for direct)
- Equipment: 3-letter codes for plane type(s) generally used on this flight, separated by spaces

### Notes:
- Routes are directional: if an airline operates services from A to B and from B to A, both A-B and B-A are listed separately.
- Routes where one carrier operates both its own and codeshare flights are listed only once. 
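
The two notes above matter once the routes are loaded into a graph. The following is a minimal sketch, using made-up toy routes rather than rows from `flights.csv`, of how a `networkx.DiGraph` treats direction and repeated airport pairs. It assumes that several airlines can serve the same source and destination pair, which would explain why the number of edges can be smaller than the number of routes.

```python
import networkx as nx

# Hypothetical toy routes (airline, source, destination); not taken from flights.csv.
toy_routes = [
    ("AA", "JFK", "LAX"),
    ("DL", "JFK", "LAX"),  # same airport pair, different airline
    ("AA", "LAX", "JFK"),  # reverse direction is a separate route
]

toy_net = nx.DiGraph()
# Only the airport pair becomes an edge; the airline is dropped here.
toy_net.add_edges_from((src, dst) for _, src, dst in toy_routes)

n_routes = len(toy_routes)
n_edges = toy_net.number_of_edges()
print(n_routes, n_edges)
```

A `DiGraph` stores at most one edge per ordered pair, so the two `JFK` to `LAX` routes collapse into a single edge while the reverse direction stays separate. If parallel edges need to be kept, `nx.MultiDiGraph` is the usual alternative.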

### Questions
1. We will focus on the flights without any stops. Extract this subset of the data. How many flights are there?
2. Load the data into a directed network. A directed edge between two airports represents a flight from the source airport to the destination airport. How many nodes and edges are there in the network?
3. What is the shortest path between Athens Ben Epps Airport `AHN` and Kazan International Airport (in Russia) `KZN`?
4. Rank the airports by their degree centrality in the network. What are the ten most central airports? Report the results in a list.
5. Rank the airports by their betweenness centrality in the network. What are the ten most central airports? Report the results in a list.
6. Rank the airports by their closeness centrality in the network. What are the ten most central airports? Report the results in a list.
7. Which airports show up in all three of the previous lists? Return the results in a sorted list.

In [3]:
data = pd.read_csv('flights.csv')
data.head()

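| Unnamed: 0 | Airline | Airline ID | Source airport | Source airport ID | Destination airport | Destination airport ID | Codeshare | Stops | Equipment |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2B | 410 | AER | 2965 | KZN | 2990 |  | 0 | CR2 |
| 1 | 2B | 410 | ASF | 2966 | KZN | 2990 |  | 0 | CR2 |
| 2 | 2B | 410 | ASF | 2966 | MRV | 2962 |  | 0 | CR2 |
| 3 | 2B | 410 | CEK | 2968 | KZN | 2990 |  | 0 | CR2 |
| 4 | 2B | 410 | CEK | 2968 | OVB | 4078 |  | 0 | CR2 |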


### Notes
- Write code to calculate the following numbers. Store each answer in a variable `test_#`. For example, the answer to the first question should be stored in `test_1`.
- Before you submit your work, clean up your notebook. It has to run without errors in order to be tested. The easiest way to check this is `Kernel->Restart & Run All`.
- Answers are provided below for your convenience.
- __AGAIN__ Don't change anything in the __SETUP__ and __TESTS__ sections.

In [4]:
test_1=test_2=test_3=test_4=test_5=test_6=test_7=0.0

# YOUR WORK

In [5]:
df = data[data['Stops']==0][['Airline', 'Source airport', 'Destination airport']]
test_1 = len(df)
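
The filter above relies on a boolean mask. As a small illustration, here is the same pattern on a made-up three-row frame (the rows are hypothetical, and `Stops` is assumed to be stored as integers, as it appears to be in the preview of `flights.csv`).

```python
import pandas as pd

# Hypothetical toy data; the Stops values are invented for illustration.
toy = pd.DataFrame({
    "Source airport": ["AER", "ASF", "CEK"],
    "Destination airport": ["KZN", "KZN", "OVB"],
    "Stops": [0, 1, 0],
})

# The comparison yields a boolean Series; indexing with it keeps the True rows.
nonstop = toy[toy["Stops"] == 0]
n_nonstop = len(nonstop)
print(n_nonstop)
```

If `Stops` had been read as strings, comparing against the integer `0` would match no rows, so checking `data['Stops'].dtype` first is a cheap safeguard.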

In [6]:
net = nx.DiGraph()

In [7]:
net.add_edges_from(df[['Source airport', 'Destination airport']].values)

In [8]:
test_2 = len(net.nodes), len(net.edges)

In [9]:
test_3 = nx.shortest_path(net, 'AHN', 'KZN')
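
Without a `weight` argument, `nx.shortest_path` minimizes the number of hops and, in a directed graph, follows edge direction. The sketch below shows this on a hypothetical four-node graph, along with the `nx.NetworkXNoPath` exception raised when the target cannot be reached.

```python
import networkx as nx

# Hypothetical toy network; node names are placeholders, not airport codes.
toy = nx.DiGraph()
toy.add_edges_from([("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")])

# Fewest-hop path from A to D (the shortcut A -> C makes it two hops).
path = nx.shortest_path(toy, "A", "D")
hops = len(path) - 1

# No edge leads back from D, so the reverse query raises NetworkXNoPath.
try:
    nx.shortest_path(toy, "D", "A")
    reachable_back = True
except nx.NetworkXNoPath:
    reachable_back = False

print(path, hops, reachable_back)
```

When several paths tie for the fewest hops, `nx.shortest_path` returns only one of them; `nx.all_shortest_paths` can be used to list every tied path.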

In [10]:
dc = nx.degree_centrality(net)

In [11]:
test_4 = list(pd.Series(dc).sort_values(ascending=False).head(10).index)

In [12]:
bc = nx.betweenness_centrality(net)

In [13]:
test_5 = list(pd.Series(bc).sort_values(ascending=False).head(10).index)

In [14]:
cc = nx.closeness_centrality(net)

In [15]:
test_6 = list(pd.Series(cc).sort_values(ascending=False).head(10).index)

In [16]:
test_7 = sorted(set(test_4) & set(test_5) & set(test_6))
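
The ranking pattern used for `test_4` through `test_6` repeats three times, so it could be factored into a helper. The sketch below is one possible way to do that on a hypothetical toy graph; `top_k` is an illustrative name, not part of networkx or pandas, and ties are assumed to be broken by node insertion order.

```python
import networkx as nx


def top_k(scores, k=10):
    """Return the k keys with the highest scores, highest first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy network: one hub connected both ways to three spokes.
toy = nx.DiGraph()
toy.add_edges_from([
    ("HUB", "X"), ("X", "HUB"),
    ("HUB", "Y"), ("Y", "HUB"),
    ("HUB", "Z"), ("Z", "HUB"),
    ("X", "Y"),
])

measures = [
    nx.degree_centrality,
    nx.betweenness_centrality,
    nx.closeness_centrality,
]

# One top-k list per centrality measure, then the nodes common to all lists.
top_lists = [top_k(measure(toy), k=1) for measure in measures]
common = sorted(set.intersection(*map(set, top_lists)))
print(common)
```

With the flight network, the same helper would be called with `k=10` on `dc`, `bc`, and `cc`. Results may differ from the pandas-based ranking only in how tied scores are ordered.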

# TESTS

In [17]:
### TEST 1
test_1

67652

In [18]:
## TEST 2
test_2

(3425, 37595)

In [19]:
## TEST 3
test_3

['AHN', 'BNA', 'IAH', 'DME', 'KZN']

In [20]:
## TEST 4
test_4

['FRA', 'CDG', 'AMS', 'IST', 'ATL', 'PEK', 'ORD', 'MUC', 'DME', 'DFW']

In [21]:
## TEST 5
test_5

['ANC', 'LAX', 'CDG', 'DXB', 'FRA', 'PEK', 'ORD', 'SEA', 'AMS', 'YYZ']

In [22]:
## TEST 6
test_6

['FRA', 'CDG', 'LHR', 'DXB', 'AMS', 'LAX', 'JFK', 'YYZ', 'IST', 'ORD']

In [23]:
## TEST 7
test_7

['AMS', 'CDG', 'FRA', 'ORD']