# Data Visualization with Plotly

**Resource**: IBM Data Visualization Course - Coursera



In [1]:
import pandas as pd
import numpy as np

import plotly.express as px
import plotly.graph_objects as go

# Plotly Exercise on an Airline DataFrame

The Reporting Carrier On-Time Performance Dataset contains information on approximately 200 million domestic US flights reported to the United States Bureau of Transportation Statistics. The dataset contains basic information about each flight (such as date, time, departure airport, arrival airport) and, if applicable, the amount of time the flight was delayed and information about the reason for the delay. This dataset can be used to predict the likelihood of a flight arriving on time.


In [8]:
import pandas as pd

# URL to the CSV file
URL = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/airline_data.csv'

# Read the CSV file directly using pandas' read_csv method
airline_data = pd.read_csv(URL, encoding='ISO-8859-1', dtype={'Div1Airport': str, 'Div1TailNum': str, 'Div2Airport': str, 'Div2TailNum': str})

print('Data downloaded and read into a dataframe!')


Data downloaded and read into a dataframe!


In [11]:
airline_data.head()


Unnamed: 0.1,Unnamed: 0,Year,Quarter,Month,DayofMonth,DayOfWeek,FlightDate,Reporting_Airline,DOT_ID_Reporting_Airline,IATA_CODE_Reporting_Airline,...,Div4WheelsOff,Div4TailNum,Div5Airport,Div5AirportID,Div5AirportSeqID,Div5WheelsOn,Div5TotalGTime,Div5LongestGTime,Div5WheelsOff,Div5TailNum
0,1295781,1998,2,4,2,4,1998-04-02,AS,19930,AS,...,,,,,,,,,,
1,1125375,2013,2,5,13,1,2013-05-13,EV,20366,EV,...,,,,,,,,,,
2,118824,1993,3,9,25,6,1993-09-25,UA,19977,UA,...,,,,,,,,,,
3,634825,1994,4,11,12,6,1994-11-12,HP,19991,HP,...,,,,,,,,,,
4,1888125,2017,3,8,17,4,2017-08-17,UA,19977,UA,...,,,,,,,,,,


In [12]:
print(airline_data.shape)

(27000, 110)


In [13]:
# Randomly sample 500 data points; random_state=42 makes the sample reproducible.
data = airline_data.sample(n=500, random_state=42)

In [15]:
data.head()

Unnamed: 0.1,Unnamed: 0,Year,Quarter,Month,DayofMonth,DayOfWeek,FlightDate,Reporting_Airline,DOT_ID_Reporting_Airline,IATA_CODE_Reporting_Airline,...,Div4WheelsOff,Div4TailNum,Div5Airport,Div5AirportID,Div5AirportSeqID,Div5WheelsOn,Div5TotalGTime,Div5LongestGTime,Div5WheelsOff,Div5TailNum
5312,985989,2006,1,3,29,3,2006-03-29,OO,20304,OO,...,,,,,,,,,,
18357,1782939,1993,3,8,3,2,1993-08-03,DL,19790,DL,...,,,,,,,,,,
6428,84140,1989,3,7,3,1,1989-07-03,HP,19991,HP,...,,,,,,,,,,
15414,1839736,2008,4,10,10,5,2008-10-10,UA,19977,UA,...,,,,,,,,,,
10610,1622640,2010,1,2,19,5,2010-02-19,FL,20437,FL,...,,,,,,,,,,


**Task**

It would be interesting to capture the following details visually:

1. How departure time changes with respect to airport distance

2. Average flight delay time over the months

3. Number of flights to each destination state

4. Number of flights per reporting airline

5. Distribution of arrival delay

6. Proportion of distance group by month (month indicated by numbers)

7. Hierarchical view, in the order of month and destination state, holding the number of flights as the value

## plotly.graph_objects

### 1. Scatter Plot

Let us use a scatter plot to show how departure time changes with respect to airport distance.

This plot should contain the following:

* Title as **Distance vs Departure Time**
* x-axis label should be **Distance**
* y-axis label should be **DepTime**
* **Distance** column data from the flight delay dataset should be plotted on the x-axis
* **DepTime** column data from the flight delay dataset should be plotted on the y-axis
* Scatter plot markers should be red


In [20]:
# scatter plot

fig = go.Figure()

In [23]:
fig.add_trace(go.Scatter(x=data['Distance'], y=data['DepTime'], mode='markers', marker=dict(color='red')))

# Axis titles follow the specification above: Distance on x, DepTime on y.
fig.update_layout(title="Distance vs Departure Time", xaxis_title="Distance", yaxis_title="DepTime")

fig.show()

### 2. Line Plot

Let us now use a line plot to show the average monthly arrival delay time and see how it changes over the year.

This plot should contain the following:

* Title as **Month vs Average Flight Delay Time**
* x-axis label should be **Month**
* y-axis label should be **ArrDelay**
* A new dataframe **line_data** should be created with 2 columns, **Month** and the average **ArrDelay** (arrival delay time) for that month
* **Month** column data from the line_data dataframe should be plotted on the x-axis
* **ArrDelay** column data from the line_data dataframe should be plotted on the y-axis
* The plotted line should be green


In [26]:
# Group the data by Month and compute average over arrival delay time.
line_data = data.groupby('Month')['ArrDelay'].mean().reset_index()

In [27]:
line_data

Unnamed: 0,Month,ArrDelay
0,1,2.232558
1,2,2.6875
2,3,10.868421
3,4,6.229167
4,5,-0.27907
5,6,17.310345
6,7,5.088889
7,8,3.121951
8,9,9.081081
9,10,1.2


In [30]:
fig2 = go.Figure()

# With mode='lines', set the color through the line attribute.
fig2.add_trace(go.Scatter(x=line_data['Month'], y=line_data['ArrDelay'], mode='lines', line=dict(color='green')))

fig2.update_layout(title='Month vs Average Flight Delay Time', xaxis_title='Month', yaxis_title='ArrDelay')

fig2.show()

## plotly.express

### 3. Bar Chart


Let us use a bar chart to show the total number of flights to each destination state.

This plot should contain the following:

* Title as **Total number of flights to the destination state split by reporting airline**
* x-axis label should be **DestState**
* y-axis label should be **Flights**
* Create a new dataframe called **bar_data** which contains 2 columns, **DestState** and **Flights**. Here **Flights** is the total number of flights in each group.


In [None]:
# Total number of flights per destination state.
# px.bar builds its own figure, so no empty go.Figure() is needed here.
bar_data = data.groupby('DestState')['Flights'].sum().reset_index()

In [37]:
bar_data.head()

Unnamed: 0,DestState,Flights
0,AK,4.0
1,AL,3.0
2,AZ,8.0
3,CA,68.0
4,CO,20.0


In [38]:
fig33 = px.bar(bar_data,x='DestState',y='Flights',title='Total number of flights to the destination state split by reporting airline')
fig33.show()

### 4. Histogram

### 5. Bubble Chart

### 6. Pie Chart

### 7. SunBurst Charts