In [1]:
import pandas as pd
import numpy as np

In [2]:
flight_delay_by_cause_file = './assets/!Flight Delays by Cause National June 2004 to November 2014.csv'

### Delay cause definitions:

* Air Carrier: The cause of the cancellation or delay was due to circumstances within the airline's control (e.g. maintenance or crew problems, aircraft cleaning, baggage loading, fueling, etc.).

* Extreme Weather: Significant meteorological conditions (actual or forecasted) that, in the judgment of the carrier, delay or prevent the operation of a flight, such as a tornado, blizzard, or hurricane.

* National Aviation System (NAS): Delays and cancellations attributable to the national aviation system that refer to a broad set of conditions, such as non-extreme weather conditions, airport operations, heavy traffic volume, and air traffic control.

* Late-arriving aircraft: A previous flight with the same aircraft arrived late, causing the present flight to depart late.

* Security: Delays or cancellations caused by evacuation of a terminal or concourse, re-boarding of aircraft because of security breach, inoperative screening equipment and/or long lines in excess of 29 minutes at screening areas.

A flight is considered delayed when it arrives 15 or more minutes later than scheduled (see the definitions in the BTS Frequently Asked Questions). Delay minutes are calculated for delayed flights only, and the data summarizes arriving flights only.
When multiple causes are assigned to one delayed flight, each cause is prorated by the number of delay minutes it is responsible for. Displayed numbers are rounded and may not add up to the total.
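The two rules above can be sketched in a few lines. A flight counts as delayed at 15 or more minutes late, and its single delay count is split across causes in proportion to the minutes each cause contributed. This is a minimal illustration that assumes a per-flight breakdown of cause minutes, which the national summary file does not itself contain. The names `is_delayed` and `prorate_delay_count` are hypothetical and are not BTS code.

```python
# Hypothetical helpers illustrating the BTS rules described above (not BTS code).
DELAY_THRESHOLD_MINUTES = 15  # arrivals this many minutes late or more count as delayed


def is_delayed(arrival_delay_minutes):
    """Return True if a flight arrived at least 15 minutes after its scheduled time."""
    return arrival_delay_minutes >= DELAY_THRESHOLD_MINUTES


def prorate_delay_count(cause_minutes):
    """Split one delayed flight's count of 1 across causes by their share of delay minutes.

    cause_minutes maps a cause name (e.g. 'carrier', 'nas') to the minutes it caused.
    """
    total = sum(cause_minutes.values())
    if total == 0:
        raise ValueError("a delayed flight needs at least one minute attributed to a cause")
    # float() keeps the division fractional even under Python 2
    return {cause: minutes / float(total) for cause, minutes in cause_minutes.items()}


# Example: a 60-minute delay, with 45 minutes attributed to the carrier and 15 to NAS
print(is_delayed(60))  # True
print(prorate_delay_count({'carrier': 45, 'nas': 15}))  # {'carrier': 0.75, 'nas': 0.25}
```

When such fractional shares are summed over many flights, the per-cause `*_ct` columns can hold non-integer values.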

https://www.rita.dot.gov/bts/help/aviation/html/understanding.html
https://www.rita.dot.gov/bts/help_with_data/aviation/index.html#q7
https://www.transtats.bts.gov/OT_Delay/OT_DelayCause1.asp?pn=1

### Terminology:

arr_flights: count of operations (flights) - **count**

arr_del15: aggregate **count** of delayed flights (arrivals 15 or more minutes late), the sum of:
* carrier_ct
* weather_ct
* nas_ct
* security_ct
* late_aircraft_ct

arr_delay: total delay **minutes** of delayed flights, the sum of:
* carrier_delay
* weather_delay
* nas_delay
* security_delay
* late_aircraft_delay
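Each delayed flight's count and minutes are split across the five causes, so the cause columns should add back up to the totals, apart from the rounding noted earlier. Below is a minimal sketch of that check. It assumes the original column names from the terminology list, before the `_ct`/`_mins` renaming. On the raw CSV, some headers carry a stray leading space (`' weather_ct'`, `' arr_delay'`, `' carrier_delay'`), so strip them first. The function name `cause_total_residuals` and the toy rows are hypothetical and are not values from the BTS file.

```python
import pandas as pd

# Cause columns that should sum to each total, per the terminology above
COUNT_PARTS = ['carrier_ct', 'weather_ct', 'nas_ct', 'security_ct', 'late_aircraft_ct']
MINUTE_PARTS = ['carrier_delay', 'weather_delay', 'nas_delay', 'security_delay', 'late_aircraft_delay']


def cause_total_residuals(df):
    """Return, per row, how far each total differs from the sum of its cause components."""
    return pd.DataFrame({
        'arr_del15_residual': df['arr_del15'] - df[COUNT_PARTS].sum(axis=1),
        'arr_delay_residual': df['arr_delay'] - df[MINUTE_PARTS].sum(axis=1),
    })


# Two made-up rows for illustration only
toy = pd.DataFrame({
    'arr_del15': [10.0, 4.0],
    'carrier_ct': [3.5, 1.0], 'weather_ct': [0.5, 0.0], 'nas_ct': [4.0, 2.0],
    'security_ct': [0.0, 0.0], 'late_aircraft_ct': [2.0, 1.0],
    'arr_delay': [600.0, 200.0],
    'carrier_delay': [200.0, 50.0], 'weather_delay': [40.0, 0.0], 'nas_delay': [210.0, 100.0],
    'security_delay': [0.0, 0.0], 'late_aircraft_delay': [150.0, 50.0],
})
residuals = cause_total_residuals(toy)
print(residuals)
# Small non-zero residuals are expected from rounding; large ones suggest a data problem
print((residuals.abs() <= 1).all())
```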

In [3]:
# Load the BTS national delay-by-cause summary and list its column headers
Flight_delay_by_cause_df = pd.read_csv(flight_delay_by_cause_file)
Flight_delay_by_cause_df.columns

Index([u'year', u' month', u'carrier', u'carrier_name', u'airport',
       u'airport_name', u'arr_flights', u'arr_del15', u'carrier_ct',
       u' weather_ct', u'nas_ct', u'security_ct', u'late_aircraft_ct',
       u'arr_cancelled', u'arr_diverted', u' arr_delay', u' carrier_delay',
       u'weather_delay', u'nas_delay', u'security_delay',
       u'late_aircraft_delay'],
      dtype='object')
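The listing shows that several headers carry a stray leading space (`' month'`, `' weather_ct'`, `' arr_delay'`, `' carrier_delay'`), which makes them easy to mistype. Instead of typing the renamed list by hand, you can strip the whitespace and add the unit suffixes programmatically. This is a minimal sketch that assumes the `_ct`/`_mins` suffix convention used when the columns are renamed. `normalize_delay_columns` is a hypothetical helper.

```python
import pandas as pd


def normalize_delay_columns(columns):
    """Strip stray whitespace and tag count columns with '_ct' and minute columns with '_mins'."""
    renamed = []
    for name in columns:
        name = name.strip()
        if name in ('arr_flights', 'arr_del15'):
            name += '_ct'  # flight counts
        elif name.endswith('_delay'):
            name += '_mins'  # delay minutes
        renamed.append(name)
    return renamed


# A subset of the headers listed above, leading spaces included
raw = pd.Index(['year', ' month', 'arr_flights', 'arr_del15', ' weather_ct',
                ' arr_delay', ' carrier_delay', 'late_aircraft_delay'])
print(normalize_delay_columns(raw))
# ['year', 'month', 'arr_flights_ct', 'arr_del15_ct', 'weather_ct',
#  'arr_delay_mins', 'carrier_delay_mins', 'late_aircraft_delay_mins']
```

Unlike the hand-typed list, this version also drops the leading spaces. Any later cell would then need to refer to, for example, `'weather_ct'` rather than `' weather_ct'`.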

In [5]:
# Rename columns so units are explicit: *_ct for flight counts, *_mins for delay minutes
Flight_delay_by_cause_df.columns = [u'year', u' month', u'carrier', u'carrier_name', u'airport',
       u'airport_name', u'arr_flights_ct', u'arr_del15_ct', u'carrier_ct',
       u' weather_ct', u'nas_ct', u'security_ct', u'late_aircraft_ct',
       u'arr_cancelled', u'arr_diverted', u' arr_delay_mins', u' carrier_delay_mins',
       u'weather_delay_mins', u'nas_delay_mins', u'security_delay_mins',
       u'late_aircraft_delay_mins']

In [None]:
Flight_delay_by_cause_df.columns

In [9]:
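# Strip stray leading spaces from headers (e.g. ' weather_ct') so every column can be named reliably
Flight_delay_by_cause_df.columns = Flight_delay_by_cause_df.columns.str.strip()
# Sum counts and delay minutes per airport and year; selecting several columns requires a list
flight_delay_by_cause_df = Flight_delay_by_cause_df.groupby(['airport', 'year'])[[
    'arr_flights_ct', 'arr_del15_ct', 'carrier_ct', 'weather_ct', 'nas_ct', 'security_ct', 'late_aircraft_ct',
    'arr_delay_mins', 'carrier_delay_mins', 'weather_delay_mins', 'nas_delay_mins',
    'security_delay_mins', 'late_aircraft_delay_mins']].sum()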

In [11]:
flight_delay_by_cause_df.arr_flights_ct

airport  year
ABE      2004     6886.0
         2005     4640.0
         2006     5129.0
         2007     5700.0
         2008     4795.0
         2009     4080.0
         2010     4104.0
         2011     3853.0
         2012     2842.0
         2013     3098.0
         2014     1641.0
ABI      2004     3087.0
         2005     2745.0
         2006     2847.0
         2007     2838.0
         2008     2661.0
         2009     2497.0
         2010     2497.0
         2011     2479.0
         2012     2440.0
         2013     2753.0
         2014     2833.0
ABQ      2004    37037.0
         2005    36950.0
         2006    37263.0
         2007    41163.0
         2008    41144.0
         2009    35660.0
         2010    33298.0
         2011    33879.0
                  ...   
XNA      2009    13793.0
         2010    13836.0
         2011    12170.0
         2012    12254.0
         2013    12840.0
         2014    10484.0
YAK      2004      728.0
         2005      726.0
         20
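Once counts and minutes are summed per airport and year, two derived measures follow directly from the definitions above:

* the share of arriving flights that were delayed (`arr_del15_ct / arr_flights_ct`);
* the average delay of a delayed flight (`arr_delay_mins / arr_del15_ct`).

Below is a minimal sketch on made-up numbers. It assumes a frame indexed by (airport, year) like `flight_delay_by_cause_df`. The helper name `add_delay_rates` and the airport code `XXX` are hypothetical.

```python
import numpy as np
import pandas as pd


def add_delay_rates(df):
    """Return a copy with the delayed-flight share and mean minutes per delayed flight."""
    out = df.copy()
    # Share of arriving flights that were 15+ minutes late; NaN where there were no flights
    out['delayed_share'] = out['arr_del15_ct'] / out['arr_flights_ct'].replace(0, np.nan)
    # Average over delayed flights only, since delay minutes are recorded only for them
    out['mins_per_delayed_flight'] = out['arr_delay_mins'] / out['arr_del15_ct'].replace(0, np.nan)
    return out


# Made-up totals for illustration only, indexed like the groupby result above
index = pd.MultiIndex.from_tuples([('XXX', 2004), ('XXX', 2005)], names=['airport', 'year'])
toy = pd.DataFrame({'arr_flights_ct': [1000.0, 800.0],
                    'arr_del15_ct': [200.0, 0.0],
                    'arr_delay_mins': [9000.0, 0.0]}, index=index)
print(add_delay_rates(toy))
```

Replacing zero denominators with NaN avoids infinite values for airport-years with no flights or no delays.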

In [7]:
# After the groupby, 'airport' is an index level rather than a column (hence a KeyError); select by level
flight_delay_by_cause_df.xs('ATL', level='airport')
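After `groupby(['airport', 'year'])`, airport and year become levels of the row index rather than columns. That is why a column lookup such as `df['airport']` raises `KeyError`. Below is a minimal, self-contained sketch of three ways to select one airport, on made-up rows rather than the BTS data.

```python
import pandas as pd

# Made-up rows for illustration; grouping moves 'airport' and 'year' into the index
raw = pd.DataFrame({'airport': ['ATL', 'ATL', 'BOS'],
                    'year': [2004, 2004, 2004],
                    'arr_flights_ct': [100.0, 50.0, 70.0]})
grouped = raw.groupby(['airport', 'year'])[['arr_flights_ct']].sum()

# 1. Label-based selection on the first index level (drops the 'airport' level)
atl_loc = grouped.loc['ATL']
# 2. Cross-section on a named level (also drops that level by default)
atl_xs = grouped.xs('ATL', level='airport')
# 3. Turn the index levels back into columns, then filter with a boolean mask
flat = grouped.reset_index()
atl_mask = flat[flat['airport'] == 'ATL']

print(atl_loc)
print(atl_mask)
```

`.loc` and `.xs` keep the year index, which is convenient for time series. `reset_index()` restores the original column-based filtering style.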