# <center>~ Pittsburgh Flight Data Explanatory Analysis ~</center>
## <center>An Exploration of Delayed Flights in Pittsburgh, PA in 2007</center>
<p><em>Created by Miles Murphy</em></p>
<em>November 11, 2020</em>

## Introduction: <a name="introduction"></a>

The following analysis builds upon the exploratory analysis conducted on US flight data. The previous analysis focused on understanding the basic nature of the flight data and then narrowed in on flights in Pittsburgh, PA in the years 1988 and 2007. These years are arbitrary from a data perspective, but they are very important to the analyst's life: they are the year of his birth in Pittsburgh and the year he left Pittsburgh as an 18-year-old "adult".

After being created and cleaned in the exploratory analysis, this delays sub-dataset contains only flights in Pittsburgh in 2007 that experienced delays from one (or more) of five delay types: Carrier, Weather, NAS (National Aviation System), Security, and Late Aircraft. The following explanatory analysis looks at some basic features of this dataframe with univariate analysis and then expands upon those features with bivariate and multivariate figures.

# Table of Contents:
1. [Introduction](#introduction)
2. [Project Background](#project_background)
    1. [Data Source Information](#data_source_information)
    2. [Dataframe Basic Information](#dataframe_information)
3. [Explanatory Analysis](#explanatory_analysis)
    1. [Univariate Exploration](#univariate_exploration)
        a. [Most Popular Month to Fly](#popular_month)
        b. [Most Popular Day of the Week to Fly](#popular_day)
    2. [Bivariate Exploration](#bivariate_exploration)
    3. [Multivariate Exploration](#multivariate_exploration)
4. [Project Conclusions and Results](#project_conclusions)


## Project Background: <a name="project_background"></a>

This explanatory data analysis is part of the larger Udacity Data Analyst Nanodegree. The final project for the Data Visualization process involves the exploratory and explanatory analysis of a dataset. US Flight data was one of the provided options.

### Data Source Information: <a name="data_source_information"></a>

The US flight data used in both the exploratory analysis and this explanatory analysis of delayed flights in Pittsburgh comes from the original datasets collected by the United States Department of Transportation, Bureau of Transportation Statistics. The Bureau has been collecting data about the 'on-time' performance of all flights from 1987 to the present. The exploratory analysis originally explored flight data from both 1988 and 2007, but for the explanatory analysis that dataset has been reduced to just flights from 2007 which departed from or originated in Pittsburgh, PA and experienced delays.
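The reduction described above could be sketched roughly as follows. This is only an illustration, not the code used in the exploratory notebook: the column names (`Year`, `Origin`, `Dest`, and the five `...Delay` columns), the airport code `PIT`, the helper name `pittsburgh_delays`, and the choice to match on either origin or destination are all assumptions, and the sample rows are made up.

```python
import pandas as pd

# Assumed column names for the five delay types (not confirmed by this notebook).
DELAY_COLS = ["CarrierDelay", "WeatherDelay", "NASDelay",
              "SecurityDelay", "LateAircraftDelay"]


def pittsburgh_delays(flights, year=2007, airport="PIT"):
    """Keep flights from `year` that touch `airport` and record at least one delay."""
    touches_airport = (flights["Origin"] == airport) | (flights["Dest"] == airport)
    in_year = flights["Year"] == year
    # A flight counts as delayed when any delay column holds a positive number of minutes.
    has_delay = flights[DELAY_COLS].fillna(0).gt(0).any(axis=1)
    return flights[touches_airport & in_year & has_delay].reset_index(drop=True)


# Tiny made-up sample, for illustration only.
sample = pd.DataFrame({
    "Year": [2007, 2007, 1988, 2007],
    "Origin": ["PIT", "ORD", "PIT", "PIT"],
    "Dest": ["ORD", "PIT", "JFK", "ATL"],
    "CarrierDelay": [15, 0, 20, 0],
    "WeatherDelay": [0, 0, 0, 0],
    "NASDelay": [0, 30, 0, 0],
    "SecurityDelay": [0, 0, 0, 0],
    "LateAircraftDelay": [0, 0, 0, 0],
})
subset = pittsburgh_delays(sample)
```

In this made-up sample, the 1988 flight and the 2007 flight without any delay minutes are dropped, leaving two rows.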

In [None]:
#Import packages and set plots to be embedded
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd

#Credit to https://stackoverflow.com/questions/40105796/turn-warning-off-in-a-cell-jupyter-notebook
#The lines below were used because a known matplotlib bug was printing alongside a figure on the slides and ruining the presentation
#import warnings
#warnings.filterwarnings('ignore')
#warnings.simplefilter('ignore')

%matplotlib inline

In [None]:
#Create df
pit_delays = pd.read_csv("pittsburgh_07_delays.csv")

### Dataframe Basic Information <a name="dataframe_information"></a>

In [None]:
pit_delays.head()

In [None]:
pit_delays.info()

In [None]:
pit_delays.describe()
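As a complement to `describe()`, the five delay types named in the introduction could be summarised per type. The sketch below is a hedged illustration: the column names and the helper `delay_type_summary` are assumptions rather than something shown in this notebook, and the demo values are invented.

```python
import pandas as pd

# Assumed column names for the five delay types (hypothetical for this notebook).
DELAY_COLS = ["CarrierDelay", "WeatherDelay", "NASDelay",
              "SecurityDelay", "LateAircraftDelay"]


def delay_type_summary(df):
    """Count affected flights and total delay minutes for each delay type."""
    minutes = df[DELAY_COLS].fillna(0)
    return pd.DataFrame({
        "flights_affected": minutes.gt(0).sum(),  # flights with > 0 minutes of this type
        "total_minutes": minutes.sum(),           # summed delay minutes of this type
    })


# Invented values, for illustration only.
demo = pd.DataFrame({
    "CarrierDelay": [10, 0, 5],
    "WeatherDelay": [0, 0, 0],
    "NASDelay": [0, 20, 5],
    "SecurityDelay": [0, 0, 0],
    "LateAircraftDelay": [30, 0, 0],
})
summary = delay_type_summary(demo)
```

With the real data, this would be called as `delay_type_summary(pit_delays)`, provided the dataframe uses these column names.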

## Explanatory Analysis <a name="explanatory_analysis"></a>

### Univariate Exploration <a name="univariate_exploration"></a>

The first few figures are basic visualizations that show simple information about the dataset, ahead of the more detailed and insightful bivariate and multivariate analysis.

#### a. Most Popular Month to Fly <a name="popular_month"></a>

In [None]:
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
base_color = sns.color_palette()[0]

fig, ax = plt.subplots(figsize=(12, 6))
fig.suptitle('The Most Popular Month to Fly')

#Count flights per month and order the counts by month number (1-12)
pit_month = pit_delays['Month'].value_counts().sort_index()

#Plot the monthly flight counts on the axes created above
sns.barplot(x=pit_month.index, y=pit_month.values, color=base_color, ax=ax)
ax.set_xticklabels(labels=months)
ax.set_ylabel('Flight Count', fontsize=12)
ax.set_xlabel('Months', fontsize=12)
#ax.set_ylim(7000, 9100);
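Relabelling the ticks with a fixed list of twelve names assumes that every month appears in the data. A minimal sketch of one way to make that explicit is to reindex the counts to months 1 through 12 before labelling. The helper name `monthly_counts` and the demo data are hypothetical.

```python
import pandas as pd

MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']


def monthly_counts(df, col="Month"):
    """Count rows per month number, guaranteeing one entry for each of the 12 months."""
    # Reindexing fills months that never occur with 0, so labels cannot shift.
    counts = df[col].value_counts().reindex(range(1, 13), fill_value=0)
    counts.index = MONTHS  # replace month numbers 1-12 with their abbreviations
    return counts


# Invented data with several months missing, for illustration only.
demo = pd.DataFrame({"Month": [1, 1, 3, 12]})
counts = monthly_counts(demo)
```

The resulting Series could be passed to `sns.barplot(x=counts.index, y=counts.values)` without a separate relabelling step.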

#### b. Most Popular Day of the Week to Fly <a name="popular_day"></a>

In [None]:
days_of_week = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
base_color = sns.color_palette()[0]

fig, ax = plt.subplots(figsize=(12, 6))
fig.suptitle('The Most Popular Day of the Week to Fly')

#Count flights per day of the week and order the counts by day number (1-7)
pit_day = pit_delays['DayOfWeek'].value_counts().sort_index()

#Plot the 2007 day of week flight data on the axes created above
sns.barplot(x=pit_day.index, y=pit_day.values, color=base_color, ax=ax)
ax.set_xticklabels(labels=days_of_week)
ax.set_ylabel('Flight Count', fontsize=12)
ax.set_xlabel('Day of the Week', fontsize=12)
#ax.set_ylim(11500, 14900