# US Airline Delays (1987-2020) Analysis

The dataset used in this project is a sample of [The Reporting Carrier On-Time Performance Dataset](https://developer.ibm.com/exchanges/data/all/airline/), which contains information on approximately 200 million domestic US flights reported to the United States Bureau of Transportation Statistics. For each flight, the dataset records basic information (such as date, time, departure airport, and arrival airport) and, where applicable, the length of the delay and the reason for it.

## Importing The Required Libraries

In [None]:
# Import required libraries
import pandas as pd
import dash
import dash_html_components as html
import dash_core_components as dcc
from dash.dependencies import Input, Output, State
from jupyter_dash import JupyterDash
import plotly.graph_objects as go
import plotly.express as px
from dash import no_update

## Importing the Data

In [26]:
df = pd.read_csv('../data/airline_data.csv',
                 encoding="ISO-8859-1",
                 dtype={'Div1Airport': str, 'Div1TailNum': str,
                        'Div2Airport': str, 'Div2TailNum': str})

In [27]:
pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows', 500)
df.drop('Unnamed: 0', axis=1,inplace=True)

## Importing the Glossary Table

In [28]:
import requests
import lxml
from bs4 import BeautifulSoup

In [29]:
url = 'https://dax-cdn.cdn.appdomain.cloud/dax-airline/1.0.1/data-preview/index.html'

page = requests.get(url)

soup = BeautifulSoup(page.text,'lxml')

In [30]:
tables = soup.find_all('table',{'class':'bx--data-table'})

The page contains three tables; the glossary needed here is the third one (`tables[2]`).

In [31]:
headers = ['Feature','Description']

In [32]:
glossary = pd.DataFrame(columns=headers)

In [33]:
for row in tables[2].find_all('tr')[1:]:
    table_data = row.find_all('td')
    row_data =[data_point.text.strip() for data_point in table_data]
    length = len(glossary)
    glossary.loc[length] = row_data

In [34]:
glossary = glossary.set_index('Feature')
glossary

Feature,Description
Year,Year
Quarter,Quarter
Month,Month
DayofMonth,Day of Month
DayOfWeek,Day of Week (numeric)
FlightDate,Date of Flight
Reporting_Airline,Airline Unique Carrier Code
DOT_ID_Reporting_Airline,Number assigned by US DOT to identify a unique...
IATA_CODE_Reporting_Airline,Airline Code assigned by IATA
Tail_Number,Aircraft tail number
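
The loop above grows the glossary one row at a time with `.loc`. A minimal alternative sketch, assuming the scraped cells have first been collected into a plain list, builds the frame in a single call. The `scraped_rows` list and the `glossary_sketch` name are hypothetical; the three rows are copied from the preview above rather than scraped.

```python
import pandas as pd

# Hypothetical rows, copied from the glossary preview above; in the notebook
# they would come from the <td> cells of each scraped <tr>.
scraped_rows = [
    ["Year", "Year"],
    ["DayOfWeek", "Day of Week (numeric)"],
    ["Tail_Number", "Aircraft tail number"],
]

# Build the frame in a single call instead of growing it row by row.
glossary_sketch = pd.DataFrame(scraped_rows, columns=["Feature", "Description"])
glossary_sketch = glossary_sketch.set_index("Feature")

# Look up the description of a single feature by its index label.
print(glossary_sketch.loc["DayOfWeek", "Description"])
```

Indexing by `Feature` makes a description lookup a single `.loc` call, which is convenient when checking what a column means during the analysis.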


In [36]:
required_cols = ['Year','Quarter','Month','DayofMonth','DayOfWeek','Reporting_Airline','Flight_Number_Reporting_Airline',
 'OriginCityName','OriginState','OriginStateName','DestCityName','DestState','DestStateName','DepDelay',
 'DepDelayMinutes','DepDel15','DepartureDelayGroups','ArrDelay','ArrDelayMinutes','ArrDel15',
 'ArrivalDelayGroups','Cancelled','CancellationCode','Diverted','ActualElapsedTime','AirTime','Flights','Distance',
 'DistanceGroup']

airline_data = df[required_cols]
airline_data

Unnamed: 0,Year,Quarter,Month,DayofMonth,DayOfWeek,Reporting_Airline,Flight_Number_Reporting_Airline,OriginCityName,OriginState,OriginStateName,DestCityName,DestState,DestStateName,DepDelay,DepDelayMinutes,DepDel15,DepartureDelayGroups,ArrDelay,ArrDelayMinutes,ArrDel15,ArrivalDelayGroups,Cancelled,CancellationCode,Diverted,ActualElapsedTime,AirTime,Flights,Distance,DistanceGroup
0,1998,2,4,2,4,AS,584,"Spokane, WA",WA,Washington,"Seattle, WA",WA,Washington,0.0,0.0,0.0,0.0,-6.0,0.0,0.0,-1.0,0.0,,0.0,50.0,37.0,1.0,224.0,1
1,2013,2,5,13,1,EV,4132,"Newark, NJ",NJ,New Jersey,"Richmond, VA",VA,Virginia,-6.0,0.0,0.0,-1.0,-12.0,0.0,0.0,-1.0,0.0,,0.0,76.0,54.0,1.0,277.0,2
2,1993,3,9,25,6,UA,2206,"Peoria, IL",IL,Illinois,"Chicago, IL",IL,Illinois,33.0,33.0,1.0,2.0,45.0,45.0,1.0,3.0,0.0,,0.0,52.0,,1.0,130.0,1
3,1994,4,11,12,6,HP,1207,"Los Angeles, CA",CA,California,"Phoenix, AZ",AZ,Arizona,24.0,24.0,1.0,1.0,41.0,41.0,1.0,2.0,0.0,,0.0,89.0,,1.0,370.0,2
4,2017,3,8,17,4,UA,576,"Cedar Rapids/Iowa City, IA",IA,Iowa,"Denver, CO",CO,Colorado,-9.0,0.0,0.0,-1.0,-18.0,0.0,0.0,-2.0,0.0,,0.0,118.0,102.0,1.0,692.0,3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
26995,2017,1,1,24,2,DL,921,"Milwaukee, WI",WI,Wisconsin,"Minneapolis, MN",MN,Minnesota,-7.0,0.0,0.0,-1.0,-14.0,0.0,0.0,-1.0,0.0,,0.0,71.0,53.0,1.0,297.0,2
26996,2013,2,6,27,4,B6,1115,"Boston, MA",MA,Massachusetts,"Dallas/Fort Worth, TX",TX,Texas,0.0,0.0,0.0,0.0,-11.0,0.0,0.0,-1.0,0.0,,0.0,221.0,202.0,1.0,1562.0,7
26997,2016,3,8,26,5,AA,2183,"Austin, TX",TX,Texas,"New York, NY",NY,New York,-6.0,0.0,0.0,-1.0,-2.0,0.0,0.0,-1.0,0.0,,0.0,229.0,209.0,1.0,1521.0,7
26998,2009,3,8,8,6,YV,7194,"Colorado Springs, CO",CO,Colorado,"Denver, CO",CO,Colorado,51.0,51.0,1.0,3.0,56.0,56.0,1.0,3.0,0.0,,0.0,43.0,19.0,1.0,73.0,1
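
The preview rows suggest how the departure delay columns relate: `DepDelayMinutes` looks like `DepDelay` with early departures set to zero, and `DepDel15` looks like a flag for delays of 15 minutes or more. The sketch below rebuilds both columns from the `DepDelay` values shown above. It illustrates that assumed relationship on a handful of rows and is not a check of the full dataset; the `preview` frame is hypothetical.

```python
import pandas as pd

# DepDelay values taken from the preview rows above.
preview = pd.DataFrame({"DepDelay": [0.0, -6.0, 33.0, 24.0, 51.0]})

# Assumed relationship: early departures (negative delays) are clipped to zero.
preview["DepDelayMinutes"] = preview["DepDelay"].clip(lower=0)

# Assumed relationship: flag departures delayed by 15 minutes or more.
preview["DepDel15"] = (preview["DepDelay"] >= 15).astype(float)

print(preview)
```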


In [38]:
airline_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27000 entries, 0 to 26999
Data columns (total 29 columns):
 #   Column                           Non-Null Count  Dtype  
---  ------                           --------------  -----  
 0   Year                             27000 non-null  int64  
 1   Quarter                          27000 non-null  int64  
 2   Month                            27000 non-null  int64  
 3   DayofMonth                       27000 non-null  int64  
 4   DayOfWeek                        27000 non-null  int64  
 5   Reporting_Airline                27000 non-null  object 
 6   Flight_Number_Reporting_Airline  27000 non-null  int64  
 7   OriginCityName                   27000 non-null  object 
 8   OriginState                      26994 non-null  object 
 9   OriginStateName                  26994 non-null  object 
 10  DestCityName                     27000 non-null  object 
 11  DestState                        26993 non-null  object 
 12  DestStateName     
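
The `info()` output above reports fewer non-null values for the state columns than for the city columns (for example, 26994 of 27000 for `OriginState`). A small sketch of how such gaps could be counted and inspected follows; the `sample` frame is hypothetical, and its third row is invented to stand in for a record with no state.

```python
import pandas as pd

# Hypothetical sample; the last row stands in for a record with no state code.
sample = pd.DataFrame({
    "OriginCityName": ["Spokane, WA", "Newark, NJ", "Hypothetical City"],
    "OriginState": ["WA", "NJ", None],
})

# Count missing values per column.
missing_per_column = sample.isna().sum()

# Pull out the rows whose state is missing so they can be inspected.
rows_missing_state = sample[sample["OriginState"].isna()]

print(missing_per_column)
print(rows_missing_state)
```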

## Making the Dashboard

In [44]:

# Create a dash application
app = dash.Dash(__name__)


# REVIEW1: Clear the layout and do not display exception till callback gets executed
app.config.suppress_callback_exceptions = True

# Use the full dataframe for the dashboard. The report functions below read columns
# (DivAirportLandings, CarrierDelay, WeatherDelay, NASDelay, SecurityDelay, LateAircraftDelay)
# that are not part of required_cols, so the 29-column subset would raise a KeyError.
# This assumes those columns are present in the CSV loaded into df.
airline_data = df


# List of years 
year_list = list(range(2005, 2021))


def compute_data_choice_1(df):
    """Compute graph data for the yearly airline performance report.

    Takes the filtered airline data and builds five dataframes, grouped as
    needed, for plotting the charts and graphs.

    Argument:
        df: Filtered dataframe.

    Returns:
        Dataframes to create the graphs.
    """
    # Cancellation Category Count
    bar_data = df.groupby(['Month','CancellationCode'])['Flights'].sum().reset_index()
    # Average flight time by reporting airline
    line_data = df.groupby(['Month','Reporting_Airline'])['AirTime'].mean().reset_index()
    # Diverted Airport Landings
    div_data = df[df['DivAirportLandings'] != 0.0]
    
    # Source state count
    map_data = df.groupby(['OriginState'])['Flights'].sum().reset_index()
    # Destination state count
    tree_data = df.groupby(['DestState', 'Reporting_Airline'])['Flights'].sum().reset_index()
    
    return bar_data, line_data, div_data, map_data, tree_data


def compute_data_choice_2(df):
    """Compute graph data for the yearly airline delay report.

    Takes the airline data, already filtered to the selected year, and computes
    the monthly average of each delay type per reporting airline.

    Arguments:
        df: Input airline data.

    Returns:
        Dataframes of average carrier delay, weather delay, NAS delay,
        security delay, and late aircraft delay.
    """
    # Compute delay averages
    avg_car = df.groupby(['Month','Reporting_Airline'])['CarrierDelay'].mean().reset_index()
    avg_weather = df.groupby(['Month','Reporting_Airline'])['WeatherDelay'].mean().reset_index()
    avg_NAS = df.groupby(['Month','Reporting_Airline'])['NASDelay'].mean().reset_index()
    avg_sec = df.groupby(['Month','Reporting_Airline'])['SecurityDelay'].mean().reset_index()
    avg_late = df.groupby(['Month','Reporting_Airline'])['LateAircraftDelay'].mean().reset_index()
    
    return avg_car, avg_weather, avg_NAS, avg_sec, avg_late


# Application layout
app.layout = html.Div(children=[ 
                                # TODO1: Add title to the dashboard
                               html.H1('US Domestic Airline Flights Performance',
                                       style={'textAlign': 'left', 'color': '#503D36','font-size': 24}
                                      ),
    
                                # REVIEW2: Dropdown creation
                                # Create an outer division 
                                html.Div([
                                    # Add a division
                                    html.Div([
                                        # Create a division for adding dropdown helper text for report type
                                        html.Div(
                                            [
                                            html.H2('Report Type:', style={'margin-right': '2em'}),
                                            ]
                                        ),
                                        # TODO2: Add a dropdown
                                        dcc.Dropdown(id = 'input-type',
                                                     options = [{'label':'Yearly Airline Performance Report', 'value':'OPT1'},
                                                                 {'label':'Yearly Airline Delay Report','value':'OPT2'}
                                                                ],
                                                     placeholder = 'Select a report type',
                                                     style =  {'width':'80%', 'padding':'3px', 'font-size':'20px'}
                                                    )
                                    # Place them next to each other using the division style
                                    ], style={'display':'flex'}),
                                    
                                   # Add next division 
                                   html.Div([
                                       # Create a division for adding dropdown helper text for choosing year
                                        html.Div(
                                            [
                                            html.H2('Choose Year:', style={'margin-right': '2em'})
                                            ]
                                        ),
                                        dcc.Dropdown(id='input-year', 
                                                     # Update dropdown values using list comprehension
                                                     options=[{'label': i, 'value': i} for i in year_list],
                                                     placeholder="Select a year",
                                                     style={'width':'80%', 'padding':'3px', 'font-size': '20px', 'text-align-last' : 'center'}),
                                            # Place them next to each other using the division style
                                            ], style={'display': 'flex'}),  
                                          ]),
                                
                                # Add Computed graphs
                                # REVIEW3: Observe how we add an empty division and provide an id that will be updated during the callback
                                html.Div([ ], id='plot1'),
    
                                html.Div([
                                        html.Div([ ], id='plot2'),
                                        html.Div([ ], id='plot3')
                                ], style={'display': 'flex'}),
                                
                                # TODO3: Add a division with two empty divisions inside. See the division above for an example.
                                 html.Div([
                                        html.Div([ ], id='plot4'),
                                        html.Div([ ], id='plot5')
                                ], style={'display': 'flex'})
    
                                ])

# Callback function definition
# TODO4: Add 5 output components
@app.callback([Output(component_id = 'plot1', component_property = 'children'),
               Output(component_id = 'plot2', component_property = 'children'),
               Output(component_id = 'plot3', component_property = 'children'),
               Output(component_id = 'plot4', component_property = 'children'),
               Output(component_id = 'plot5', component_property = 'children')],
               # Input component list
               [Input(component_id='input-type', component_property='value'),
                Input(component_id='input-year', component_property='value')],
               # REVIEW4: Holding output state till user enters all the form information. In this case, it will be chart type and year
               [State("plot1", 'children'), State("plot2", "children"),
                State("plot3", "children"), State("plot4", "children"),
                State("plot5", "children")
               ])
# Add computation to callback function and return graph
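def get_graph(chart, year, children1, children2, c3, c4, c5):

        # Leave the plots unchanged until both a report type and a year are selected;
        # otherwise int(year) fails on the initial call, when year is None
        if chart is None or year is None:
            return [no_update] * 5

        # Select data
        df = airline_data[airline_data['Year'] == int(year)]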
       
        if chart == 'OPT1':
            # Compute required information for creating graph from the data
            bar_data, line_data, div_data, map_data, tree_data = compute_data_choice_1(df)
            
            # Number of flights under different cancellation categories
            bar_fig = px.bar(bar_data, x='Month', y='Flights', color='CancellationCode', title='Monthly Flight Cancellation')
            
            # TODO5: Average flight time by reporting airline
            line_fig = px.line(line_data, x='Month', y='AirTime', color='Reporting_Airline', title='Average Flight Time by Reporting Airline')
            
            # Percentage of diverted airport landings per reporting airline
            pie_fig = px.pie(div_data, values='Flights', names='Reporting_Airline', title='% of diverted flights by reporting airline')
            
            # REVIEW5: Number of flights flying from each state using choropleth
            map_fig = px.choropleth(map_data,  # Input data
                    locations='OriginState', 
                    color='Flights',  
                    hover_data=['OriginState', 'Flights'], 
                    locationmode = 'USA-states', # Set to plot as US States
                    color_continuous_scale='GnBu',
                    range_color=[0, map_data['Flights'].max()]) 
            map_fig.update_layout(
                    title_text = 'Number of flights from origin state', 
                    geo_scope='usa') # Plot only the USA instead of globe
            
            # TODO6: Number of flights flying to each state from each reporting airline
            tree_fig = px.treemap(tree_data, 
                                  path = ['DestState','Reporting_Airline'], 
                                  values = 'Flights',
                                  color = 'Flights',
                                  color_continuous_scale = 'RdBu',
                                  title = 'Flight count by airline to destination state')
            
            
            # REVIEW6: Return dcc.Graph component to the empty division
            return [dcc.Graph(figure=tree_fig), 
                    dcc.Graph(figure=pie_fig),
                    dcc.Graph(figure=map_fig),
                    dcc.Graph(figure=bar_fig),
                    dcc.Graph(figure=line_fig)
                   ]
        else:
            # REVIEW7: This covers chart type 2 and we have completed this exercise under Flight Delay Time Statistics Dashboard section
            # Compute required information for creating graph from the data
            avg_car, avg_weather, avg_NAS, avg_sec, avg_late = compute_data_choice_2(df)
            
            # Create graph
            carrier_fig = px.line(avg_car, x='Month', y='CarrierDelay', color='Reporting_Airline', title='Average carrier delay time (minutes) by airline')
            weather_fig = px.line(avg_weather, x='Month', y='WeatherDelay', color='Reporting_Airline', title='Average weather delay time (minutes) by airline')
            nas_fig = px.line(avg_NAS, x='Month', y='NASDelay', color='Reporting_Airline', title='Average NAS delay time (minutes) by airline')
            sec_fig = px.line(avg_sec, x='Month', y='SecurityDelay', color='Reporting_Airline', title='Average security delay time (minutes) by airline')
            late_fig = px.line(avg_late, x='Month', y='LateAircraftDelay', color='Reporting_Airline', title='Average late aircraft delay time (minutes) by airline')
            
            return [dcc.Graph(figure=carrier_fig),
                    dcc.Graph(figure=weather_fig),
                    dcc.Graph(figure=nas_fig),
                    dcc.Graph(figure=sec_fig),
                    dcc.Graph(figure=late_fig)]


# Run the app
if __name__ == '__main__':
    # REVIEW8: Adding dev_tools_ui=False, dev_tools_props_check=False can prevent error appearing before calling callback function
    app.run_server()

Dash is running on http://127.0.0.1:8050/

 * Serving Flask app '__main__' (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off


 * Running on http://127.0.0.1:8050/ (Press CTRL+C to quit)
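
Both `compute_data_choice_1` and `compute_data_choice_2` rely on the same pandas pattern: group by one or two keys, aggregate one column, then call `reset_index()` so the keys become ordinary columns that Plotly Express can use for `x` and `color`. The sketch below shows the pattern on a tiny hypothetical frame; the values are invented for illustration and do not come from the dataset.

```python
import pandas as pd

# Hypothetical flights; None marks a flight with no reported carrier delay.
flights = pd.DataFrame({
    "Month": [1, 1, 1, 2],
    "Reporting_Airline": ["AA", "AA", "DL", "AA"],
    "CarrierDelay": [10.0, 30.0, None, 5.0],
})

# Average carrier delay per month and airline, with the group keys
# turned back into regular columns by reset_index().
avg_car = (
    flights.groupby(["Month", "Reporting_Airline"])["CarrierDelay"]
    .mean()
    .reset_index()
)

print(avg_car)
```

Because `mean()` skips missing values, a group made up only of missing delays yields `NaN` rather than zero, which would show up as a gap in the corresponding line chart.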


## Close the Server (Optional)

In [None]:
app.terminate_server_for_port("localhost",8050)