# US Air Traffic visualisation

First let's import all the libraries that we will use and read the datasets.

In [1]:
from itertools import combinations
import pickle
import numpy as np
import pandas as pd
from colour import Color
import folium

In [2]:
airports = pd.read_csv("data/airports.csv")
flights = pd.read_csv("data/flights.csv")
busiests = pd.read_csv("data/busiests.csv")

## Traffic modelling

The `flights` dataset contains all 2018 flights within the US. Let's model it as a graph whose vertices are the airports and whose edges are the journeys between them (in either direction).

Let's build the list of all airports we will work with, and filter the `airports` dataset with it.
Let's also set the IATA code as the index, since it will serve as the airport identifier.

In [3]:
airport_list = list(set(flights["Origin"]) | set(flights["Dest"]))
airports = airports[airports["iata_code"].isin(airport_list)]
airports = airports.set_index("iata_code")
airports[:3]

iata_code,id,ident,type,name,latitude_deg,longitude_deg,elevation_ft,continent,iso_country,iso_region,municipality,scheduled_service,gps_code,local_code,home_link,wikipedia_link,keywords
BKG,45961,BBG,small_airport,Branson Airport,36.532082,-93.200544,1302.0,,US,US-MO,Branson,yes,KBBG,BBG,http://flybranson.com/,https://en.wikipedia.org/wiki/Branson_Airport,
ABE,3356,KABE,medium_airport,Lehigh Valley International Airport,40.6521,-75.440804,393.0,,US,US-PA,Allentown,yes,KABE,ABE,,https://en.wikipedia.org/wiki/Lehigh_Valley_In...,
ABI,3357,KABI,medium_airport,Abilene Regional Airport,32.411301,-99.6819,1791.0,,US,US-TX,Abilene,yes,KABI,ABI,,https://en.wikipedia.org/wiki/Abilene_Regional...,


Now, from the airport list, let's compute the list of all possible journeys.

A journey is modelled as a `frozenset` (rather than an ordered tuple), since we do not consider the direction of a journey. We use the IATA codes of the airports as identifiers. For instance, between Los Angeles International Airport and John F. Kennedy International Airport in New York, the edge is `frozenset({'JFK', 'LAX'})`.

In [4]:
journeys = [frozenset(pair) for pair in combinations(airport_list, 2)]
journeys[:3]

[frozenset({'CRP', 'LAX'}),
 frozenset({'DCA', 'LAX'}),
 frozenset({'CLE', 'LAX'})]
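
To illustrate why a `frozenset` suits undirected journeys, here is a minimal sketch on three airport codes chosen for the example (`toy_codes` and `toy_journeys` are illustrative names, not part of the notebook). It shows that both orderings of a pair give the same key, and that `combinations` yields n(n-1)/2 journeys for n airports.

```python
from itertools import combinations

# Illustrative subset of IATA codes (not the full airport list).
toy_codes = ["JFK", "LAX", "ORD"]

# One frozenset per unordered pair of airports.
toy_journeys = [frozenset(pair) for pair in combinations(toy_codes, 2)]

# Direction does not matter: both orderings give the same, hashable key.
same_key = frozenset(["JFK", "LAX"]) == frozenset(["LAX", "JFK"])

# n airports give n * (n - 1) / 2 possible journeys.
expected_count = len(toy_codes) * (len(toy_codes) - 1) // 2

print(same_key, len(toy_journeys), expected_count)
```

Because a `frozenset` is hashable, it can be used directly as a dictionary key, which a plain `set` cannot.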

Now we want to know how much traffic there is on each journey. So let's create a dictionary `traffic` with the edges as keys, so that a journey can easily be looked up from the IATA codes of its two airports. We set every value to zero at first, then iterate over all `flights` to count the number of times each journey has been flown.

In [5]:
traffic = {j: 0 for j in journeys}

In [6]:
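def add_flight(row):
    # Build the undirected key for this flight and increment its counter.
    j = frozenset([row["Origin"], row["Dest"]])
    try:
        traffic[j] += 1
    except KeyError:
        # Ignore keys missing from `traffic`, e.g. a flight whose origin
        # and destination are the same airport.
        pass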

As the `flights` dataset is quite big (9.5 M rows), this takes a while to compute, so I saved the `traffic` dictionary in a `pickle` file in order to avoid recomputing it when needed.

In [7]:
# _ = flights.apply(add_flight, axis=1)

# with open("data/traffic.pickle", "wb") as pick:
#     pickle.dump(traffic, pick)
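
As an alternative to the row-by-row `apply`, the same counts can probably be obtained much faster with a vectorised `groupby`. The sketch below is not what the notebook ran. It uses a small made-up `toy_flights` frame with the same `Origin` and `Dest` columns, and the names `pairs`, `counts` and `toy_traffic` are illustrative.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the `flights` dataset (made-up rows).
toy_flights = pd.DataFrame({
    "Origin": ["JFK", "LAX", "JFK", "ORD"],
    "Dest":   ["LAX", "JFK", "ORD", "JFK"],
})

# Sort each origin/destination pair so both directions share one key.
pairs = np.sort(toy_flights[["Origin", "Dest"]].to_numpy(), axis=1)

# Count the rows for each unordered pair.
counts = (
    pd.DataFrame(pairs, columns=["a", "b"])
    .groupby(["a", "b"])
    .size()
)

# Convert to the same {frozenset: count} shape as `traffic`.
toy_traffic = {frozenset(key): int(n) for key, n in counts.items()}
print(toy_traffic)
```

Unlike the zero-initialised `traffic` dictionary, this version only contains journeys that were actually flown, so lookups for unused pairs would need `toy_traffic.get(key, 0)`.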

In [8]:
with open("data/traffic.pickle", "rb" ) as pick:
    traffic = pickle.load(pick)

In [9]:
traffic[frozenset(["JFK", "LAX"])]

52447
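
The compute-once, reload-later pattern used above can be wrapped in a small helper. This is only a sketch: `load_or_compute` and `expensive` are hypothetical names, and a temporary directory stands in for the `data/` folder.

```python
import pickle
import tempfile
from pathlib import Path

def load_or_compute(path, compute):
    """Return the object cached at `path`, computing and saving it if absent."""
    path = Path(path)
    if path.exists():
        with path.open("rb") as fh:
            return pickle.load(fh)
    result = compute()
    with path.open("wb") as fh:
        pickle.dump(result, fh)
    return result

calls = []  # records how many times the expensive step really ran

def expensive():
    calls.append(1)
    return {frozenset(["JFK", "LAX"]): 3}  # made-up count

with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "traffic.pickle"
    first = load_or_compute(cache, expensive)   # computes and writes the file
    second = load_or_compute(cache, expensive)  # reads the file back

print(first == second, len(calls))
```

Dictionaries keyed by `frozenset` round-trip through `pickle` without any special handling.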

## Data Visualisation

#### Preparation

Let's create a Folium map, centered on the United States.

In [10]:
location = [37.0902405, -95.7128906]
fmap = folium.Map(location=location, zoom_start=4)

#### Plotting journeys

Let's plot a line joining the two cities of each journey.

To keep the map clean and readable, we style each journey according to the amount of traffic it carries, by varying line opacity, line weight and a color gradient.

In [11]:
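# 100-step gradient from green (low traffic) to red (high traffic).
colors = list(Color("green").range_to(Color("red"), 100))

def get_color(x):
    # x is the traffic coefficient in (0, 1]; the 0.7 exponent spreads out low values.
    if x == 1:
        return colors[-1].hex
    else:
        return colors[int(np.power(x, 0.7)*100)].hex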

In [12]:
def get_opacity(x):
    return x**2

In [13]:
def get_weight(x):
    # Use the argument rather than the global `coef`.
    return 1 + 3*x
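
To get a feel for how a traffic coefficient between 0 and 1 turns into a line style, here is a dependency-free sketch of the three mappings. It returns the index into a 100-color gradient rather than the color itself. The function names and the clamping with `min` are illustrative choices, not the notebook's code.

```python
def color_index(x, n_colors=100):
    # Same 0.7 exponent as get_color; clamp so that x == 1 maps to the last color.
    return min(int(x ** 0.7 * n_colors), n_colors - 1)

def opacity(x):
    # Squaring makes low-traffic journeys fade out quickly.
    return x ** 2

def weight(x):
    # Line weight grows linearly from 1 to 4.
    return 1 + 3 * x

samples = [0.01, 0.1, 0.5, 1.0]
styles = [(x, color_index(x), opacity(x), weight(x)) for x in samples]

for x, idx, op, w in styles:
    print(f"coef={x:<5} color index={idx:<3} opacity={op:<7.4f} weight={w:.2f}")
```

The exponent below 1 pushes small coefficients further along the gradient, while the squared opacity keeps rarely used journeys faint.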

In [14]:
max_traffic = max(traffic.values())

for journey in journeys:
    if traffic[journey] > 0:
        coef = traffic[journey]/max_traffic
        color = get_color(coef)
        opacity = get_opacity(coef)
        weight = get_weight(coef)

        cities = list(journey)
        departure = [airports.loc[cities[0],"latitude_deg"], airports.loc[cities[0],"longitude_deg"]]
        arrival = [airports.loc[cities[1],"latitude_deg"], airports.loc[cities[1],"longitude_deg"]]
        
        folium.PolyLine([departure, arrival], color=color, opacity=opacity, weight=weight).add_to(fmap)
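
The loop above looks up coordinates with `.loc` for every journey. A possible variation, sketched here on made-up data and without Folium, builds a plain dictionary of coordinates once and iterates only over the `traffic` items. The names `toy_airports`, `toy_traffic`, `coords` and `segments` are illustrative, and the coordinates are approximate.

```python
import pandas as pd

# Toy stand-ins for `airports` and `traffic`; coordinates are approximate.
toy_airports = pd.DataFrame(
    {"latitude_deg": [40.64, 33.94, 41.98],
     "longitude_deg": [-73.78, -118.41, -87.90]},
    index=["JFK", "LAX", "ORD"],
)
toy_traffic = {
    frozenset(["JFK", "LAX"]): 10,
    frozenset(["JFK", "ORD"]): 0,  # no flights: should be skipped
}

# Build the coordinate lookup once instead of calling .loc for every journey.
coords = dict(zip(
    toy_airports.index,
    zip(toy_airports["latitude_deg"], toy_airports["longitude_deg"]),
))

max_traffic = max(toy_traffic.values())
segments = []
for journey, count in toy_traffic.items():
    if count > 0:
        a, b = sorted(journey)  # sorted only to make the output reproducible
        segments.append((coords[a], coords[b], count / max_traffic))

print(segments)
```

Each entry of `segments` holds the two endpoints and the traffic coefficient, which is what a call to `folium.PolyLine` would need.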

#### Plotting markers for airports

Many airports appear in the previous datasets, and plotting a marker for each of them would clutter the map. Let's add markers only for the airports that are among the busiest in the world.

In [15]:
busiests.code = busiests.code.apply(lambda s: s.split("/")[0])
busiests_list = np.unique(busiests[busiests.code.isin(airport_list)].code)
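
The cleaning step above can be illustrated on a few rows. This sketch assumes the `code` column holds values shaped like `IATA/ICAO`, which is inferred from the `split("/")` call and is not confirmed by the dataset itself. All `toy_` names are illustrative.

```python
import pandas as pd

# Made-up rows: the "IATA/ICAO" shape of `code` is an assumption.
toy_busiests = pd.DataFrame(
    {"code": ["ATL/KATL", "PEK/ZBAA", "LAX/KLAX", "ATL/KATL"]}
)
toy_airport_list = ["ATL", "LAX", "JFK"]

# Keep only the part before the slash.
iata = toy_busiests["code"].str.split("/").str[0]

# Unique codes that also appear in the flights data, in sorted order.
toy_busiests_list = sorted(set(iata) & set(toy_airport_list))

print(toy_busiests_list)
```

Like `np.unique`, this removes duplicates and returns the codes sorted, so airports listed several times get a single marker.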

In [16]:
for i in busiests_list:
    airport = airports.loc[i]
    coordinates = [airport.latitude_deg, airport.longitude_deg]
    folium.Marker(coordinates, popup='<i>{}</i>'.format(airport["name"])).add_to(fmap)

#### Final result

And now let's display the final result!

In [17]:
fmap