# US Air Traffic Project

In this project, I explored a dataset of US air traffic for January 2008. Each row of the data represents a single flight.

## Visualization 1.
US Main Airport Hubs, Jan. 2008
<br>
Goal: quickly identify the locations of the main hubs (airports) in the US and the routes available from a particular hub.
<br>
Implementation: A map with points representing hubs and lines representing routes (displayed when the mouse hovers over a particular hub).
<br>
Other Possible Solutions: I first tried a heatmap, but most hubs are not connected to every other hub, so the plot had a lot of uncolored space.
<br>
Pros and Cons:
<br>
+Interactivity and compact style 
<br>
-Unclear what the dots represent without the tooltip
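
The sparsity that ruled out the heatmap can be measured directly. The following is a minimal sketch, not the code used in the project: the `flights` table is a hypothetical toy sample standing in for the `Origin` and `Dest` columns of `flights.csv`, and `empty_share` is a name introduced here for illustration.

```python
import pandas as pd

# Hypothetical toy routes; the real project reads them from flights.csv
flights = pd.DataFrame({
    "Origin": ["ATL", "ATL", "ORD", "DFW", "ATL"],
    "Dest":   ["ORD", "DFW", "ATL", "ATL", "ORD"],
})

# Origin x destination matrix of flight counts (what a heatmap would color)
matrix = pd.crosstab(flights["Origin"], flights["Dest"])

# Share of cells with no flights, i.e. the uncolored part of the heatmap
empty_share = (matrix == 0).to_numpy().mean()

print(matrix)
print(f"Empty cells: {empty_share:.0%}")
```

On real data, a high share of empty cells would support the choice of a map with hover-activated routes over a heatmap.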


## Visualization 2.
US Air Traffic Routes Map, Jan. 2008
<br>
Goal: quickly see how well air traffic routes cover US territory.
<br>
Implementation: A map covered with a network of lines representing air routes.
<br>
Other Possible Solutions: Split the US territory into small 100 km x 100 km squares, build the US map from these squares, and color each square by the number of flights whose routes pass through it.
<br>
Pros and Cons:
<br>
+User can easily see coverage of territory with routes
<br>
-Poorly suited to inspecting individual routes; useful only for a "high-level" overview
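
The grid-based alternative mentioned above could be sketched roughly as follows. This is an illustration under several assumptions, not something the project implements: routes are approximated by straight lines in latitude/longitude (not great circles), one degree of latitude is taken as about 111 km, and the function names, coordinates, and flight counts are all hypothetical.

```python
import math
from collections import Counter

CELL_KM = 100.0          # side of one grid square
KM_PER_DEG_LAT = 111.0   # rough average; an assumption of this sketch


def cell_of(lat, lon):
    """Map a point to the (row, col) index of an approximately 100 km square."""
    row = math.floor(lat * KM_PER_DEG_LAT / CELL_KM)
    # Longitude degrees shrink with latitude; use the row's center latitude
    # so that all points in the same row share the same column width
    center_lat = (row + 0.5) * CELL_KM / KM_PER_DEG_LAT
    km_per_deg_lon = KM_PER_DEG_LAT * math.cos(math.radians(center_lat))
    col = math.floor(lon * km_per_deg_lon / CELL_KM)
    return row, col


def cells_on_route(lat1, lon1, lat2, lon2, steps=200):
    """Cells touched by a route, found by sampling points along the line."""
    cells = set()
    for k in range(steps + 1):
        t = k / steps
        cells.add(cell_of(lat1 + t * (lat2 - lat1), lon1 + t * (lon2 - lon1)))
    return cells


def flights_per_cell(routes):
    """routes: iterable of (lat1, lon1, lat2, lon2, count) tuples."""
    totals = Counter()
    for lat1, lon1, lat2, lon2, count in routes:
        # Each cell a route crosses is credited with all flights on that route
        for cell in cells_on_route(lat1, lon1, lat2, lon2):
            totals[cell] += count
    return totals


# Hypothetical routes with approximate coordinates and made-up counts
routes = [
    (33.64, -84.43, 41.98, -87.90, 900),  # ATL -> ORD
    (33.64, -84.43, 32.90, -97.04, 700),  # ATL -> DFW
]
totals = flights_per_cell(routes)
print(totals.most_common(3))
```

The resulting per-cell totals could then be drawn as colored squares. Sampling may miss a square that a route only clips at a corner, so a finer `steps` value or an exact line-grid intersection would be needed for precise results.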


## Visualization 3.
US Air Traffic Delay, Jan. 2008
<br>
Goal: quickly check delays in US air traffic: their volume and their sources.
<br>
Implementation: A stacked area chart showing the total delay per day (in minutes), broken down by delay type.
<br>
Other Possible Solutions: Show this data with a stacked bar chart.
<br>
Pros and Cons:
<br>
+Interactive visualization; the user can "zoom in/out" on a selected date range.
<br>
-The chart may be hard to understand at first (a stacked bar chart may be easier to read).


## Visualization 4.
US Daily Air Traffic Volume, Jan. 2008
<br>
Goal: understand the "peak" and "low" times of air traffic activity.
<br>
Implementation: A heatmap where each cell represents a day of the month and an hour of the day, colored by the number of flights.
<br>
Other Possible Solutions: A bar chart with each bar colored according to flight density.
<br>
Pros and Cons:
<br>
+User can easily see "peak/low" air traffic times.
<br>
-Much of the chart has the same color (orange) because many cells hold similar values.
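
The heatmap needs one count per (day, hour) cell, which means deriving the hour from the scheduled departure time. A minimal sketch, assuming `CRSDepTime` holds the time as an HHMM integer (for example `730` for 07:30); the `flights` sample below is hypothetical:

```python
import pandas as pd

# Hypothetical sample; CRSDepTime is assumed to be an HHMM integer
flights = pd.DataFrame({
    "DayofMonth": [1, 1, 1, 2],
    "CRSDepTime": [5, 730, 745, 2359],
})

# Integer division by 100 drops the minutes: 5 -> 0, 730 -> 7, 2359 -> 23
flights["Hour of the day"] = flights["CRSDepTime"] // 100

# One row per (day, hour) cell of the heatmap
cells = (
    flights.groupby(["DayofMonth", "Hour of the day"])
    .size()
    .reset_index(name="# of Flights")
)
print(cells)
```

Integer division also handles departures before 01:00, where slicing the last two characters off the string form (for example `str(5)[:-2]`) would leave an empty string.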

In [17]:
import pandas as pd
import altair as alt
import geopandas as gpd
from vega_datasets import data

In [18]:
#Read Data
airports = pd.read_csv('airports.csv')
df = pd.read_csv('flights.csv')
states = alt.topo_feature(data.us_10m.url, feature='states')


In [19]:
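#Data Preparation
#Count flights per (origin, destination) route
grouped = df.groupby(['Origin','Dest']).size().reset_index(name='count')
grouped.rename(columns={'Origin':'origin', 'Dest':'destination'}, inplace=True)

#Total departures per origin; summing only 'count' keeps the text column 'destination' out of the aggregate
final = grouped.groupby('origin', as_index=False)['count'].sum()
final.rename(columns={'count':'# of flights'}, inplace=True)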


#Attach origin coordinates (rename returns a copy, so the airports frame stays unchanged)
airport = airports.rename(columns={'iata':'origin'})
left_merged = pd.merge(grouped, airport,
                        how="left", on=['origin'])
left_merged = left_merged[['origin', 'destination', 'latitude', 'longitude', 'state', 'count']]
left_merged.rename(columns={'latitude':'latitude1', 'longitude':'longitude1', 'state':'state1'}, inplace=True )


#Attach destination coordinates the same way
airport = airports.rename(columns={'iata':'destination'})
merged = pd.merge(left_merged, airport,
                        how="left", on=['destination'])
merged = merged[['origin', 'latitude1', 'longitude1', 'state1', 'destination', 'latitude', 'longitude', 'state', 'count']]
merged.rename(columns={'latitude':'latitude2', 'longitude':'longitude2', 'state':'state2'}, inplace=True )

#Add each origin's total number of flights
joined = pd.merge(merged, final, on='origin', how='left')


In [20]:
#Altair Map Plot 
select_city = alt.selection_single(
    on="mouseover", nearest=True, fields=["origin"], empty="none"
)


background = alt.Chart(states, title='US Main Airport Hubs, Jan. 2008').mark_geoshape(
    fill="lightgray",
    stroke="white"
).properties(
    width=750,
    height=500
).project("albersUsa")

connections = alt.Chart(joined).mark_rule(opacity=0.4).encode(
    latitude="latitude1:Q",
    longitude="longitude1:Q",
    latitude2="latitude2:Q",
    longitude2="longitude2:Q"
).transform_filter(
    select_city
)

points = alt.Chart(joined).mark_circle().encode(
    latitude="latitude1:Q",
    longitude="longitude1:Q",
    size=alt.Size("# of flights:Q", scale=alt.Scale(range=[0, 1000]), legend=None),
    order=alt.Order("# of flights:Q", sort="descending"),
    tooltip=[alt.Tooltip("origin:N", title='Airport Code'), alt.Tooltip("# of flights:Q", title='Flights in Jan. 2008')]
).add_selection(
    select_city
)

(background + connections + points).configure_view(stroke=None)

In [21]:
#Altair Map Plot
background = alt.Chart(states, title='US Air Traffic Routes Map, Jan. 2008').mark_geoshape(
    fill="lightgray",
    stroke="white"
).properties(
    width=750,
    height=500
).project("albersUsa")

connections = alt.Chart(joined).mark_rule(opacity=0.1).encode(
    latitude="latitude1:Q",
    longitude="longitude1:Q",
    latitude2="latitude2:Q",
    longitude2="longitude2:Q",
    color=alt.Color('count:Q', scale=alt.Scale(scheme='plasma'), title='Flights Per Route'),
    tooltip=[alt.Tooltip('origin:N', title='From'), alt.Tooltip('destination:N', title='To  '), alt.Tooltip("count:Q", title='Flights in Jan. 2008')]
)

points = alt.Chart(joined).mark_circle().encode(
    latitude="latitude1:Q",
    longitude="longitude1:Q",
    size=alt.Size("# of flights:Q", scale=alt.Scale(range=[0, 1]), legend=None),
    order=alt.Order("# of flights:Q", sort="descending")
)

(background + connections + points).configure_view(stroke=None)

In [22]:
#Delay Overview
#Sum the numeric columns per day; numeric_only=True skips text columns such as carrier and airport codes
comp11=df.groupby(['DayofMonth']).sum(numeric_only=True).reset_index()
comp22=df.groupby(['DayofMonth']).size().reset_index(name='Flights')

d1 = pd.merge(comp11, comp22, on='DayofMonth', how='left')
d1 = d1[['DayofMonth', 'Cancelled', 'Diverted', 'LateAircraftDelay', 'SecurityDelay', 'NASDelay', 'WeatherDelay', 'CarrierDelay', 'Flights']]




#Reshape from wide (one column per delay type) to long (one row per day and delay type)
delay_cols = ['LateAircraftDelay', 'SecurityDelay', 'NASDelay', 'WeatherDelay', 'CarrierDelay']
f = d1.melt(id_vars='DayofMonth', value_vars=delay_cols,
            var_name='Delay Type', value_name='Delay in Min')
f = f.rename(columns={'DayofMonth': 'Day'})[['Day', 'Delay in Min', 'Delay Type']]

#Stable sort keeps the delay types in the order listed above within each day
f = f.sort_values('Day', kind='stable').reset_index(drop=True)



In [23]:
#Delay Overview
year_brush = alt.selection_interval(encodings=['x'])

area = alt.Chart(f, title='US Air Traffic Delays in January 2008').mark_area().encode(
    x = alt.X('Day:N', title='Day of Month'),
    y = alt.Y('Delay in Min:Q', aggregate = 'sum', title='Minutes of Delay'),
    color = alt.Color('Delay Type:N', title='Type of Delay')
)

area.properties(width = 800, height = 400).transform_filter(year_brush) & area.properties(width = 800, height = 100).add_selection(year_brush)







In [24]:
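#Flight Times Popularity
#CRSDepTime is assumed to be an HHMM integer, so integer division by 100 gives the hour (e.g. 730 -> 7, 5 -> 0)
#This works for any number of rows, unlike a loop over a fixed range
df['Hour of the day'] = df['CRSDepTime'] // 100
dt = df.groupby(['DayofMonth', 'Hour of the day']).size().reset_index(name='# of Flights')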

In [25]:
alt.Chart(
    dt,
    title="US Daily Air Traffic Volume, Jan. 2008"
).mark_rect().encode(
    x=alt.X('DayofMonth:N', title='Day of Month'),
    y='Hour of the day:Q',
    color=alt.Color('# of Flights:Q', scale=alt.Scale(scheme="inferno"), title='Number of Flights'),
    order=alt.Order("Hour of the day:N", sort="descending"),
    tooltip=[
        alt.Tooltip('# of Flights:Q', title='Number of Flights')
    ]
).properties(width=550)