<center>
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/labs/Module%204/logo.png" width="300" alt="cognitiveclass.ai logo">
</center>


# Basic Plotly Charts

Estimated time needed: 45 minutes


# Objectives


After completing this lab, you will be able to:

*   Use the plotly graph objects and plotly express libraries to plot different types of charts
*   Create interesting visualizations on the Airline Reporting Carrier On-Time Performance Dataset


# Plotly graph objects and Plotly express libraries to plot different types of charts 

## Plotly Libraries

**plotly.graph_objects:** 
This is a low-level interface to figures, traces, and layout. The Plotly graph objects module provides an automatically generated hierarchy of classes (figures, traces, and layout) called graph objects. The top-level class of this hierarchy is `plotly.graph_objects.Figure`, which represents a complete figure.

**plotly.express:** 
Plotly express is a high-level wrapper for Plotly. It is the recommended starting point for creating the most common figures provided by Plotly using a simpler syntax. It uses graph objects internally.

Now let us use these libraries to plot some charts. We will start with `plotly.graph_objects` to plot line and scatter plots.

> Note: You can hover the mouse over the charts whenever you want to view any statistics in the visualization charts.




## Exercise I: Get Started with Different Chart types in Plotly


In [100]:
# Import required libraries
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go


## 1. Scatter Plot: 
A scatter plot shows the relationship between 2 variables on the x and y-axis. The data points here appear scattered when plotted on a two-dimensional plane. Using scatter plots, we can create exciting visualizations to express various relationships, such as:

* Height vs weight of persons
* Engine size vs automobile price
* Exercise time vs Body Fat


## Exercise II: Apply your Plotly Skills to an Airline Dataset

The Reporting Carrier On-Time Performance Dataset contains information on approximately 200 million domestic US flights reported to the United States Bureau of Transportation Statistics. The dataset contains basic information about each flight (such as date, time, departure airport, arrival airport) and, if applicable, the amount of time the flight was delayed and information about the reason for the delay. This dataset can be used to predict the likelihood of a flight arriving on time.

Preview data, dataset metadata, and data glossary [here.](https://dax-cdn.cdn.appdomain.cloud/dax-airline/1.0.1/data-preview/index.html)


# Read Data


In [40]:
# Read the airline data into pandas dataframe
# from js import fetch
# import io

URL = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/airline_data.csv'
# resp = await fetch(URL)
# text = io.BytesIO((await resp.arrayBuffer()).to_py())

airline_data =  pd.read_csv(URL)
#                            text ,encoding = "ISO-8859-1",
#                             dtype={'Div1Airport': str, 'Div1TailNum': str, 
#                                    'Div2Airport': str, 'Div2TailNum': str})

print('Data downloaded and read into a dataframe!')

Data downloaded and read into a dataframe!


In [41]:
# Preview the first 5 lines of the loaded data 
airline_data.head()

Unnamed: 0.1,Unnamed: 0,Year,Quarter,Month,DayofMonth,DayOfWeek,FlightDate,Reporting_Airline,DOT_ID_Reporting_Airline,IATA_CODE_Reporting_Airline,...,Div4WheelsOff,Div4TailNum,Div5Airport,Div5AirportID,Div5AirportSeqID,Div5WheelsOn,Div5TotalGTime,Div5LongestGTime,Div5WheelsOff,Div5TailNum
0,1295781,1998,2,4,2,4,1998-04-02,AS,19930,AS,...,,,,,,,,,,
1,1125375,2013,2,5,13,1,2013-05-13,EV,20366,EV,...,,,,,,,,,,
2,118824,1993,3,9,25,6,1993-09-25,UA,19977,UA,...,,,,,,,,,,
3,634825,1994,4,11,12,6,1994-11-12,HP,19991,HP,...,,,,,,,,,,
4,1888125,2017,3,8,17,4,2017-08-17,UA,19977,UA,...,,,,,,,,,,


In [42]:
# Shape of the data
airline_data.shape

(27000, 110)

In [78]:
# Randomly sample 500 data points. Set random_state to 42 so that the same sample is drawn on every run.
data = airline_data.sample(n=500, random_state=42)
data

Unnamed: 0.1,Unnamed: 0,Year,Quarter,Month,DayofMonth,DayOfWeek,FlightDate,Reporting_Airline,DOT_ID_Reporting_Airline,IATA_CODE_Reporting_Airline,...,Div4WheelsOff,Div4TailNum,Div5Airport,Div5AirportID,Div5AirportSeqID,Div5WheelsOn,Div5TotalGTime,Div5LongestGTime,Div5WheelsOff,Div5TailNum
5312,985989,2006,1,3,29,3,2006-03-29,OO,20304,OO,...,,,,,,,,,,
18357,1782939,1993,3,8,3,2,1993-08-03,DL,19790,DL,...,,,,,,,,,,
6428,84140,1989,3,7,3,1,1989-07-03,HP,19991,HP,...,,,,,,,,,,
15414,1839736,2008,4,10,10,5,2008-10-10,UA,19977,UA,...,,,,,,,,,,
10610,1622640,2010,1,2,19,5,2010-02-19,FL,20437,FL,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18946,61420,2005,3,7,6,3,2005-07-06,WN,19393,WN,...,,,,,,,,,,
16291,458237,2019,2,6,1,6,2019-06-01,UA,19977,UA,...,,,,,,,,,,
21818,557936,1999,1,3,4,4,1999-03-04,HP,19991,HP,...,,,,,,,,,,
24116,1268298,2017,2,4,14,5,2017-04-14,DL,19790,DL,...,,,,,,,,,,


In [79]:
# Get the shape of the trimmed data
data.shape

(500, 110)
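
The effect of `random_state` in the sampling step can be checked on a small frame. This is a sketch with a hypothetical six-row dataframe standing in for `airline_data`.

```python
import pandas as pd

# Hypothetical small frame standing in for airline_data
df = pd.DataFrame({'Distance': [109, 732, 117, 1846, 432, 254],
                   'DepTime': [742, 1900, 2120, 1625, 1355, 1225]})

first = df.sample(n=3, random_state=42)
second = df.sample(n=3, random_state=42)

# Same seed, same rows; the original index labels are kept as well
print(first.equals(second))
```

Because `sample` keeps the original index labels, the sampled airline data shows index values such as 5312 rather than 0, 1, 2.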

In [80]:
data.columns

Index(['Unnamed: 0', 'Year', 'Quarter', 'Month', 'DayofMonth', 'DayOfWeek',
       'FlightDate', 'Reporting_Airline', 'DOT_ID_Reporting_Airline',
       'IATA_CODE_Reporting_Airline',
       ...
       'Div4WheelsOff', 'Div4TailNum', 'Div5Airport', 'Div5AirportID',
       'Div5AirportSeqID', 'Div5WheelsOn', 'Div5TotalGTime',
       'Div5LongestGTime', 'Div5WheelsOff', 'Div5TailNum'],
      dtype='object', length=110)

In [81]:
data['Distance']

5312      109.0
18357     732.0
6428      117.0
15414    1846.0
10610     432.0
          ...  
18946     254.0
16291    1514.0
21818    1044.0
24116     366.0
16705    1182.0
Name: Distance, Length: 500, dtype: float64

In [48]:
data['DepTime']

5312      742.0
18357    1900.0
6428     2120.0
15414    1625.0
10610    1355.0
          ...  
18946    1225.0
16291    2001.0
21818    1815.0
24116    1728.0
16705    1208.0
Name: DepTime, Length: 500, dtype: float64

It would be interesting to capture the following details visually:

* How departure time changes with respect to airport distance

* Average flight delay time over the months

* Number of flights to each destination state

* Number of flights per reporting airline

* Distribution of arrival delay

* Proportion of distance group by month (month indicated by numbers)

* Hierarchical view, in the order of month and destination state, of the number of flights


# plotly.graph_objects


## 1. Scatter Plot


Let us use a scatter plot to represent how departure time changes with respect to airport distance.

This plot should contain the following

* Title as **Distance vs Departure Time**.
* x-axis label should be **Distance**
* y-axis label should be **DepTime**
* **Distance** column data from the flight delay dataset should be considered in x-axis
* **DepTime** column data from the flight delay dataset should be considered in y-axis
* Scatter plot markers should be of red color


In [120]:
Distance = data['Distance']
DepTime = data['DepTime']

# Create a figure object
fig = go.Figure()

# Add a scatter trace with mode set to 'markers' to show only dots
fig.add_trace(go.Scatter(x=Distance,
                         y=DepTime,
                         mode='markers',
                         marker=dict(color='red')))

# Set the title and axis labels requested above
fig.update_layout(title='Distance vs Departure Time',
                  xaxis_title='Distance',
                  yaxis_title='DepTime')

# Show the figure
fig.show()


In [111]:
## Correct answer is

# First, create an empty figure using go.Figure()
fig=go.Figure()

# Next, create a scatter plot by calling add_trace with a go.Scatter() trace
# In go.Scatter we define the x-axis data and y-axis data, and set the mode to markers with red as the marker color
fig.add_trace(go.Scatter(x=data['Distance'], y=data['DepTime'], mode='markers', marker=dict(color='red')))
fig.update_layout(title='Distance vs Departure Time', xaxis_title='Distance', yaxis_title='DepTime')
# Display the figure
fig.show()


Double-click **here** for hint.
<!-- 
***Use the go.Scatter() class*** with mode set to markers
-->


Double-click **here** for the solution.

<!-- The answer is below:

    
# First, create an empty figure using go.Figure()
fig=go.Figure()
# Next, create a scatter plot by calling add_trace with a go.Scatter() trace
# In go.Scatter we define the x-axis data and y-axis data, and set the mode to markers with red as the marker color
fig.add_trace(go.Scatter(x=data['Distance'], y=data['DepTime'], mode='markers', marker=dict(color='red')))
fig.update_layout(title='Distance vs Departure Time', xaxis_title='Distance', yaxis_title='DepTime')
# Display the figure
fig.show()
-->


#### Inferences

It can be inferred that shorter-distance flights depart around the clock, whereas longer-distance flights have fewer departures spread through the day.
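
When reading the y-axis of this chart, it helps to remember that `DepTime` appears to be encoded as hhmm (for example, 742.0 for 07:42), so the values are not evenly spaced in time. The following sketch, which assumes that encoding and uses a few hypothetical values, converts such numbers to decimal hours.

```python
import pandas as pd

# Hypothetical DepTime values, assumed to use the hhmm encoding
dep_time = pd.Series([742.0, 1900.0, 2120.0, 1208.0])

hours = dep_time // 100     # hour part of each value
minutes = dep_time % 100    # minute part of each value

# Decimal hours, e.g. 07:42 becomes 7.7
dep_hour = hours + minutes / 60
print(dep_hour.tolist())
```

Missing departure times stay missing after the conversion, since `NaN` propagates through the arithmetic.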


## 2. Line Plot


Let us now use a line plot to show the average monthly arrival delay time and see how it changes over the year.

This plot should contain the following

* Title as **Month vs Average Flight Delay Time**.
* x-axis label should be **Month**
* y-axis label should be **ArrDelay**
* A new dataframe **line_data** should be created which consists of 2 columns: **month** and the average **arrival delay time per month** from the dataset
* **Month** column data from the line_data dataframe should be considered in x-axis
* **ArrDelay** column data from the line_data dataframe should be considered in y-axis
* Plotted line in the line plot should be of green color


In [130]:
# Group the data by Month and compute average over arrival delay time.
line_data = data.groupby('Month')['ArrDelay'].mean().reset_index()

In [131]:
# Display the data
line_data.sample()

Unnamed: 0,Month,ArrDelay
6,7,5.088889


In [146]:
## Write your code here
fig = go.Figure()
# mode='lines' connects the points; the line color is set through the line attribute
fig.add_trace(go.Scatter(x=line_data['Month'],
                         y=line_data['ArrDelay'],
                         mode='lines',
                         line=dict(color='green')))
fig.update_layout(title='Month vs Average Flight Delay Time', xaxis_title='Month', yaxis_title='ArrDelay')
fig.show()

Double-click **here** for hint.
<!--
*   Hint: Scatter and line plot vary by updating mode parameter.
-->


Double-click **here** for the solution.

<!-- The answer is below:

    
# First, create an empty figure using go.Figure()
fig=go.Figure()
# Next, create a line plot by calling add_trace with a go.Scatter() trace
# In go.Scatter we define the x-axis data and y-axis data, and set the mode to lines with green as the marker color
fig.add_trace(go.Scatter(x=line_data['Month'], y=line_data['ArrDelay'], mode='lines', marker=dict(color='green')))
# Update layout attributes such as title, xaxis_title and yaxis_title
fig.update_layout(title='Month vs Average Flight Delay Time', xaxis_title='Month', yaxis_title='ArrDelay')
fig.show()
-->


#### Inferences

It is found that the average monthly arrival delay is highest in June.
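
An inference like this can also be checked numerically instead of by reading it off the chart. The sketch below uses a hypothetical three-row frame with the same column names as `line_data`; the numbers are invented for the example.

```python
import pandas as pd

# Hypothetical monthly averages with the same columns as line_data
line_data_demo = pd.DataFrame({'Month': [5, 6, 7],
                               'ArrDelay': [4.2, 11.8, 5.1]})

# idxmax returns the index label of the largest ArrDelay value
worst = line_data_demo.loc[line_data_demo['ArrDelay'].idxmax()]
print(int(worst['Month']), worst['ArrDelay'])
```

Running the same two lines on the real `line_data` would report the month with the highest average delay in the sample.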


# plotly.express


## 3. Bar Chart



Let us use a bar chart to show the total number of flights to each destination state.

This plot should contain the following

* Title as **Total number of flights to the destination state split by reporting airline**.
* x-axis label should be **DestState**
* y-axis label should be **Flights**
* Create a new dataframe called **bar_data** which contains 2 columns, **DestState** and **Flights**. Here **Flights** indicates the total number of flights to each destination state.


In [148]:
# Group the data by destination state and compute the total number of flights to each state
bar_data = data.groupby('DestState')['Flights'].sum().reset_index()

In [150]:
# Display the data
bar_data.sample()

Unnamed: 0,DestState,Flights
18,MI,16.0


In [161]:
## Write your code here
# Pass the dataframe and column names so that px.bar labels the axes with the column names
fig = px.bar(bar_data,
             x='DestState',
             y='Flights',
             title='Total number of flights to the destination state split by reporting airline')
fig.show()

Double-click **here** for hint.
<!--
***Use the px.bar() function***
-->


Double-click **here** for the solution.

<!-- The answer is below:

    
# Use plotly express bar chart function px.bar. Provide input data, x and y axis variable, and title of the chart.
# This will give total number of flights to the destination state.
fig = px.bar(bar_data, x="DestState", y="Flights", title='Total number of flights to the destination state split by reporting airline') 
fig.show()
-->


In [162]:
# Use plotly express bar chart function px.bar. Provide input data, x and y axis variable, and title of the chart.
# This will give total number of flights to the destination state.
fig = px.bar(bar_data, x="DestState", y="Flights", title='Total number of flights to the destination state split by reporting airline') 
fig.show()

#### Inferences

It is found that the maximum number of flights, around 68, is to destination state **CA**, and there is only 1 flight to destination state **VT**.


## 4. Histogram



Let us represent the distribution of arrival delay using a histogram.

This plot should contain the following

* Title as **Total number of flights to the destination state split by reporting air**.
* x-axis label should be **ArrDelay**
* y-axis will show the count of arrival delay


In [170]:
# Set missing values to 0
data['ArrDelay'] = data['ArrDelay'].fillna(0)
data['ArrDelay']

5312     32.0
18357    -1.0
6428     -5.0
15414    -2.0
10610   -11.0
         ... 
18946     8.0
16291    -5.0
21818   -14.0
24116    88.0
16705     4.0
Name: ArrDelay, Length: 500, dtype: float64

In [171]:
## Write your code here
# Pass the dataframe and the column name; px.histogram counts the values in each bin
fig = px.histogram(data, x='ArrDelay',
                   title='Total number of flights to the destination state split by reporting air')
fig.update_layout(xaxis_title='ArrDelay')
fig.show()

Double-click **here** for hint.
<!--
***Use the px.histogram() function***
-->


Double-click **here** for the solution.

<!-- The answer is below:
## Use the plotly express histogram function px.histogram. Provide the input data and the x column to the histogram
fig = px.histogram(data, x="ArrDelay",title="Total number of flights to the destination state split by reporting air.")
fig.show()
    

-->



#### Inferences

It is found that at most 5 flights have an arrival delay of 50-54 minutes, and around 17 flights have an arrival delay of 20-25 minutes.


## 5. Bubble Chart


Let us use a bubble plot to represent the number of flights per reporting airline.

This plot should contain the following

* Title as **Reporting Airline vs Number of Flights**.
* x-axis label should be **Reporting_Airline**
* y-axis label should be **Flights**
* size of the bubble should be **Flights** indicating number of flights
* Set the name of the hover tooltip to `Reporting_Airline` using the `hover_name` parameter.


In [174]:
# Group the data by reporting airline and get number of flights
bub_data = data.groupby('Reporting_Airline')['Flights'].sum().reset_index()

In [175]:
bub_data.head(2)

Unnamed: 0,Reporting_Airline,Flights
0,9E,5.0
1,AA,57.0


In [201]:
## Write your code here
# size= scales each bubble by its Flights value; size_max caps the largest bubble
fig = px.scatter(bub_data,
                 x='Reporting_Airline',
                 y='Flights',
                 size='Flights',
                 hover_name='Reporting_Airline',
                 title='Reporting Airline vs Number of Flights',
                 size_max=60)
fig.show()


Double-click **here** for hint.
<!--
***Use the px.scatter() function and define the size attribute values apart from x and y attribute values***
-->


Double-click **here** for the solution.

<!-- The answer is below:
## Bubble chart using px.scatter function with x ,y and size variables defined.Title defined as Reporting Airline vs Number of Flights
fig = px.scatter(bub_data, x="Reporting_Airline", y="Flights", size="Flights",
                 hover_name="Reporting_Airline", title='Reporting Airline vs Number of Flights', size_max=60)
fig.show()
    

-->


#### Inferences

It is found that the reporting airline **WN** has the highest number of flights, around 86.


In [190]:
import plotly.graph_objects as go

# Sample data
Distance = [3, 4, 5]
DepTime = [20, 10, 2]

# Create a scatter plot; the marker size is set through the 'size' key of the marker dict in go.Scatter
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=Distance,
    y=DepTime,
    mode='markers',
    marker={'size': 20}  # Adjust the size value as needed
))

fig.update_layout(title='Distance vs DepTime',
                  xaxis_title='Distance', yaxis_title='DepTime')

fig.show()


## 6. Pie Chart


Let us represent the proportion of distance group by month (month indicated by numbers).

This plot should contain the following

* Title as **Distance group proportion by month**.
* values should be **Month**
* names should be **DistanceGroup**


In [218]:
## Write your code here
# values sets the sector sizes; names sets the sector labels
fig = px.pie(data,
             values='Month',
             names='DistanceGroup',
             title='Distance group proportion by month')
fig.show()



Double-click **here** for hint.
<!--
***Use the px.pie() function***
-->


Double-click **here** for the solution.

<!-- The answer is below:
# Use px.pie function to create the chart. Input dataset. 
# Values parameter will set values associated to the sector. 'Month' feature is passed to it.
# labels for the sector are passed to the `names` parameter.
fig = px.pie(data, values='Month', names='DistanceGroup', title='Distance group proportion by month')
fig.show()
    

-->


#### Inferences

It is found that the month of February has the highest distance group proportion.


## 7. SunBurst Charts


Let us represent the hierarchical view, in the order of month and destination state, of the number of flights.

This plot should contain the following

*  Define hierarchy of sectors from root to leaves in `path` parameter. Here, we go from `Month` to `DestStateName` feature.
*   Set sector values in `values` parameter. Here, we can pass in `Flights` feature.
*   Show the figure.
*   Title as **Flight Distribution Hierarchy**


In [226]:
fig = px.sunburst(data,
                 path=['Month', 'DestStateName'],
                 values='Flights',title='Flight Distribution Hierarchy')
fig.show()

In [219]:
data['DestStateName']

5312     Wisconsin
18357      Georgia
6428      Nebraska
15414     Illinois
10610      Indiana
           ...    
18946     Missouri
16291       Nevada
21818     Missouri
24116      Florida
16705      Florida
Name: DestStateName, Length: 500, dtype: object

In [223]:
data['Flights']

5312     1.0
18357    1.0
6428     1.0
15414    1.0
10610    1.0
        ... 
18946    1.0
16291    1.0
21818    1.0
24116    1.0
16705    1.0
Name: Flights, Length: 500, dtype: float64

Double-click **here** for hint.
<!--
***Use the px.sunburst() function***
-->


Double-click **here** for the solution.

<!-- The answer is below:
## Define path as Month and DestStateName and values as Flights.
fig = px.sunburst(data, path=['Month', 'DestStateName'], values='Flights',title='Flight Distribution Hierarchy')
fig.show()
    

-->


#### Inferences

Here the **Month** numbers in the innermost concentric circle are the root, and for each month we can check the **number of flights** to the different **destination states** under it.


## Summary

Congratulations on completing the lab.

In this lab, you have learnt how to use `plotly.graph_objects` and `plotly.express` to create plots and charts.


## Author(s)
[Saishruthi Swaminathan](https://www.linkedin.com/in/saishruthi-swaminathan/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDV0101ENSkillsNetwork970-2022-01-01)

Lakshmi Holla


## Other Contributor(s)

Lavanya T S


## Changelog

| Date       | Version | Changed by | Change Description                   |
| ---------- | ------- | ---------- | ------------------------------------ |
| 12-18-2020 | 1.0     | Nayef      | Added dataset link and upload to Git |
| 07-02-2023 | 1.1     | Lakshmi Holla     | Updated lab |


## <h3 align="center"> © IBM Corporation 2023. All rights reserved. </h3>
