# Dashboarding Plotly


Plotly figures can be used in:
1. Jupyter notebooks
2. HTML files
3. Python-built web applications

## import library

In [None]:
import pandas as pd 
import numpy as np
import plotly.express as px
import plotly.graph_objects as go

## 1. Scatter Plot:
A scatter plot shows the relationship between 2 variables on the x and y-axis. The data points here appear scattered when plotted on a two-dimensional plane. Using scatter plots, we can create exciting visualizations to express various relationships, such as:

Height vs weight of persons
Engine size vs automobile price
Exercise time vs Body Fat

In [8]:
age_array = np.random.randint(25,55,60)

income_array = np.random.randint(300000,550000,60)

In [9]:
fig = go.Figure()
fig

In [12]:
fig.add_trace(go.Scatter(x=age_array, y=income_array, mode='markers', marker=dict(color='blue')))
fig.update_layout(title='Economic Survey', xaxis_title='Age', yaxis_title='Income')
fig.show()

## 2. Line Plot:
A line plot shows information that changes continuously with time. Here the data points are connected by straight lines. Line plots are also plotted on a two dimensional plane like scatter plots. Using line plots, we can create exciting visualizations to illustrate:

Annual revenue growth
Stock Market analysis over time
Product Sales over time

In [13]:
numberofbicyclessold_array=[50,100,40,150,160,70,60,45]
# Define an array containing months
months_array=["Jan","Feb","Mar","April","May","June","July","August"]

In [17]:
fig=go.Figure()
fig.add_trace(go.Scatter(x=months_array, y=numberofbicyclessold_array, mode='lines', line=dict(color='green')))

fig.update_layout(title='Bicycle Sales', xaxis_title='Month', yaxis_title='Number of Bicycles Sold')
fig.show()

## 3. Bar Plot:
A bar plot represents categorical data in rectangular bars. Each category is defined on one axis, and the value counts for this category are represented on the other axis. Bar charts are generally used to compare values. We can use bar plots to visualize:

Pizza delivery time in peak and non peak hours
Population comparison by gender
Number of views by movie name

In [18]:
score_array = [80,90,56,88,95]
grade_array = ['Grade 6', 'Grade 7', 'Grade 8', 'Grade 9', 'Grade 10']

In [19]:
fig = px.bar(x=grade_array, y=score_array, title='Pass Percentage of Classes')
fig.show()

## 4. Histogram:
A histogram represents continuous data in the form of bars. In a bar graph each bar stands for a discrete value, whereas in a histogram each bar represents a range of values. Histograms show frequency distributions. We can use histograms to visualize:

Distribution of student marks
Frequency of customer waiting times in a bank

In [20]:
height_array = np.random.normal(160, 11, 200)  # mean, standard deviation, sample size

fig = px.histogram(x=height_array, title='Distribution of Heights')
fig.show()
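
To make the idea of "bars representing a range of values" concrete, the sketch below counts samples per bin with `np.histogram`. The choice of 10 bins and the seed are assumptions made for illustration; `px.histogram` picks its own bins automatically, so the counts need not match the chart above.

```python
import numpy as np

rng = np.random.default_rng(0)        # seeded so the sketch is repeatable
heights = rng.normal(160, 11, 200)    # mean, standard deviation, sample size

# Split the observed value range into 10 equal-width bins and count samples per bin.
counts, bin_edges = np.histogram(heights, bins=10)

# Each bar of a histogram corresponds to one (left edge, right edge, count) triple.
for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:6.1f} to {right:6.1f}: {count}")
```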

## 5. Bubble Plot:
A bubble plot is used to show the relationship between 3 or more variables. It is an extension of a scatter plot. Bubble plots are ideal for visualizing:

Global Economic position of Industries
Impact of viruses on Diseases

In [21]:
##Example 4: Let us illustrate crime statistics of US cities with a bubble chart

#Create a dictionary having city,numberofcrimes and year as 3 keys
crime_details = {
    'City' : ['Chicago', 'Chicago', 'Austin', 'Austin','Seattle','Seattle'],
    'Numberofcrimes' : [1000, 1200, 400, 700,350,1500],
    'Year' : ['2007', '2008', '2007', '2008','2007','2008'],
}
  
# create a Dataframe object with the dictionary
df = pd.DataFrame(crime_details)
  
df

Unnamed: 0,City,Numberofcrimes,Year
0,Chicago,1000,2007
1,Chicago,1200,2008
2,Austin,400,2007
3,Austin,700,2008
4,Seattle,350,2007
5,Seattle,1500,2008


In [22]:
bub_data = df.groupby('City')['Numberofcrimes'].sum().reset_index()
bub_data

Unnamed: 0,City,Numberofcrimes
0,Austin,1100
1,Chicago,2200
2,Seattle,1850


In [23]:
fig = px.scatter(bub_data, x="City", y="Numberofcrimes", size="Numberofcrimes",
                 hover_name="City", title='Crime Statistics', size_max=60)
fig.show()

## 6. Pie Plot:
A pie plot is a circular chart mainly used to show what proportion of the whole each part of the data represents. Each slice represents one proportion, and all the slices together add up to the whole. We can use pie plots to visualize:

Sales turnover percentage with respect to different products
Monthly expenditure of a family

In [24]:
exp_percent = [20, 50, 10, 8, 12]
house_hold_categories = ['A', 'B', 'C', 'D', 'E']
fig = px.pie(values=exp_percent, names=house_hold_categories, title='Household Expenditure')
fig.show()
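
Each slice is sized by its value relative to the total of all values. The short sketch below works out those shares by hand for the same inputs; the variable names `total` and `shares` are introduced here only for illustration.

```python
exp_percent = [20, 50, 10, 8, 12]
house_hold_categories = ['A', 'B', 'C', 'D', 'E']

# Share of each category = its value divided by the sum of all values.
total = sum(exp_percent)
shares = {name: value / total
          for name, value in zip(house_hold_categories, exp_percent)}

for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```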

## 7. Sunburst Charts:
Sunburst charts represent hierarchical data in the form of concentric circles. The innermost circle is the root node, which defines the parent, and the outer rings move down the hierarchy from the centre. They are also called radial charts. We can use them to plot:

Worldwide mobile sales, where we can drill down as follows:

innermost circle represents total sales
first outer circle represents continent-wise sales
second outer circle represents country-wise sales within each continent

Disease outbreak hierarchy

Real estate industrial chain

In [25]:
data = dict(
    character=["Eve", "Cain", "Seth", "Enos", "Noam", "Abel", "Awan", "Enoch", "Azura"],
    parent=["", "Eve", "Eve", "Seth", "Seth", "Eve", "Eve", "Awan", "Eve" ],
    value=[10, 14, 12, 10, 2, 6, 6, 4, 4])

fig = px.sunburst(
    data,
    names='character',
    parents='parent',
    values='value',
    title="Family chart"
)
fig.show()


## read data 

In [28]:
import pandas as pd

url = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/airline_data.csv'

df = pd.read_csv(url)
df


Unnamed: 0.1,Unnamed: 0,Year,Quarter,Month,DayofMonth,DayOfWeek,FlightDate,Reporting_Airline,DOT_ID_Reporting_Airline,IATA_CODE_Reporting_Airline,...,Div4WheelsOff,Div4TailNum,Div5Airport,Div5AirportID,Div5AirportSeqID,Div5WheelsOn,Div5TotalGTime,Div5LongestGTime,Div5WheelsOff,Div5TailNum
0,1295781,1998,2,4,2,4,1998-04-02,AS,19930,AS,...,,,,,,,,,,
1,1125375,2013,2,5,13,1,2013-05-13,EV,20366,EV,...,,,,,,,,,,
2,118824,1993,3,9,25,6,1993-09-25,UA,19977,UA,...,,,,,,,,,,
3,634825,1994,4,11,12,6,1994-11-12,HP,19991,HP,...,,,,,,,,,,
4,1888125,2017,3,8,17,4,2017-08-17,UA,19977,UA,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
26995,821542,2017,1,1,24,2,2017-01-24,DL,19790,DL,...,,,,,,,,,,
26996,1910565,2013,2,6,27,4,2013-06-27,B6,20409,B6,...,,,,,,,,,,
26997,9055,2016,3,8,26,5,2016-08-26,AA,19805,AA,...,,,,,,,,,,
26998,84136,2009,3,8,8,6,2009-08-08,YV,20378,YV,...,,,,,,,,,,


In [37]:
data = df.sample(n=500, random_state=42)
data

Unnamed: 0.1,Unnamed: 0,Year,Quarter,Month,DayofMonth,DayOfWeek,FlightDate,Reporting_Airline,DOT_ID_Reporting_Airline,IATA_CODE_Reporting_Airline,...,Div4WheelsOff,Div4TailNum,Div5Airport,Div5AirportID,Div5AirportSeqID,Div5WheelsOn,Div5TotalGTime,Div5LongestGTime,Div5WheelsOff,Div5TailNum
5312,985989,2006,1,3,29,3,2006-03-29,OO,20304,OO,...,,,,,,,,,,
18357,1782939,1993,3,8,3,2,1993-08-03,DL,19790,DL,...,,,,,,,,,,
6428,84140,1989,3,7,3,1,1989-07-03,HP,19991,HP,...,,,,,,,,,,
15414,1839736,2008,4,10,10,5,2008-10-10,UA,19977,UA,...,,,,,,,,,,
10610,1622640,2010,1,2,19,5,2010-02-19,FL,20437,FL,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18946,61420,2005,3,7,6,3,2005-07-06,WN,19393,WN,...,,,,,,,,,,
16291,458237,2019,2,6,1,6,2019-06-01,UA,19977,UA,...,,,,,,,,,,
21818,557936,1999,1,3,4,4,1999-03-04,HP,19991,HP,...,,,,,,,,,,
24116,1268298,2017,2,4,14,5,2017-04-14,DL,19790,DL,...,,,,,,,,,,


## 1. Scatter Plot

Title as Distance vs Departure Time.

x-axis label should be Distance

y-axis label should be DeptTime

Distance column data from the flight delay dataset should be considered in x-axis

DepTime column data from the flight delay dataset should be considered in y-axis

Scatter plot markers should be of red color

In [10]:
import pandas as pd

# Make sure CRSDepTime is a four-digit string (hhmm)
df['CRSDepTime'] = df['CRSDepTime'].astype(int).apply(lambda x: f"{x:04d}")

# Split out the hour and the minute
df['CRS_Hour'] = df['CRSDepTime'].str[:2].astype(int)
df['CRS_Min'] = df['CRSDepTime'].str[2:].astype(int)

# Build a datetime (FlightDate + hour:minute)
df['CRSDep_dt'] = pd.to_datetime(
    df['FlightDate'] + ' ' + df['CRS_Hour'].astype(str) + ':' + df['CRS_Min'].astype(str),
    format='%Y-%m-%d %H:%M'
)

# Add the delay in minutes to get the actual DepTime
df['DeptTime'] = df['CRSDep_dt'] + pd.to_timedelta(df['DepDelay'], unit='m')


AttributeError: Can only use .str accessor with string values!
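
The `.str` accessor only works on string columns, which is what the error above complains about. One possible way around it, sketched here on a tiny made-up sample rather than the real data, is to skip strings entirely and split the hhmm integer with arithmetic. The frame `sample` and its values are assumptions for illustration; the column names follow the cell above.

```python
import pandas as pd

# Tiny stand-in for the airline data; the values are made up.
sample = pd.DataFrame({
    'FlightDate': ['1998-04-02', '2013-05-13', '1993-09-25'],
    'CRSDepTime': [1335, 5, 1700],      # hhmm stored as integers
    'DepDelay': [-5.0, 10.0, None],     # minutes; missing for the last flight
})

hhmm = sample['CRSDepTime'].astype(int)

# hhmm // 100 gives the hour, hhmm % 100 gives the minute.
scheduled = (
    pd.to_datetime(sample['FlightDate'], format='%Y-%m-%d')
    + pd.to_timedelta(hhmm // 100, unit='h')
    + pd.to_timedelta(hhmm % 100, unit='min')
)

# A missing delay propagates as NaT, so such rows can be dropped later.
sample['DeptTime'] = scheduled + pd.to_timedelta(sample['DepDelay'], unit='min')
print(sample[['FlightDate', 'CRSDepTime', 'DepDelay', 'DeptTime']])
```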

In [60]:
df.isna()

Unnamed: 0.1,Unnamed: 0,Year,Quarter,Month,DayofMonth,DayOfWeek,FlightDate,Reporting_Airline,DOT_ID_Reporting_Airline,IATA_CODE_Reporting_Airline,...,Div4TailNum,Div5Airport,Div5AirportID,Div5AirportSeqID,Div5WheelsOn,Div5TotalGTime,Div5LongestGTime,Div5WheelsOff,Div5TailNum,DeptTime
0,False,False,False,False,False,False,False,False,False,False,...,True,True,True,True,True,True,True,True,True,False
1,False,False,False,False,False,False,False,False,False,False,...,True,True,True,True,True,True,True,True,True,False
2,False,False,False,False,False,False,False,False,False,False,...,True,True,True,True,True,True,True,True,True,False
3,False,False,False,False,False,False,False,False,False,False,...,True,True,True,True,True,True,True,True,True,False
4,False,False,False,False,False,False,False,False,False,False,...,True,True,True,True,True,True,True,True,True,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
26995,False,False,False,False,False,False,False,False,False,False,...,True,True,True,True,True,True,True,True,True,False
26996,False,False,False,False,False,False,False,False,False,False,...,True,True,True,True,True,True,True,True,True,False
26997,False,False,False,False,False,False,False,False,False,False,...,True,True,True,True,True,True,True,True,True,False
26998,False,False,False,False,False,False,False,False,False,False,...,True,True,True,True,True,True,True,True,True,False


In [96]:
import plotly.graph_objects as go
import plotly.express as px

distance = df['Distance']
deptTime = df['DeptTime']
deptTime

0       1998-04-02 13:30:00
1       2013-05-13 12:55:00
2       1993-09-25 17:23:00
3       1994-11-12 13:09:00
4       2017-08-17 07:46:00
                ...        
26995   2017-01-24 09:03:00
26996   2013-06-27 13:20:00
26997   2016-08-26 09:14:00
26998   2009-08-08 12:30:00
26999   1993-07-17 06:18:00
Name: DeptTime, Length: 27000, dtype: datetime64[ns]

In [100]:
df_c = df.dropna(subset=['Distance', 'DeptTime'])

distance = df_c['Distance']

# Convert the datetime to a time of day (a float between 0 and 24 hours)
deptTime = df_c['DeptTime'].dt.hour + df_c['DeptTime'].dt.minute/60


In [101]:
print(len(distance))
print(len(deptTime))
print(deptTime.dtype)
print(distance.dtype)


26557
26557
float64
float64


In [102]:
fig = go.Figure()
fig.add_trace(go.Scatter( 
    x=distance, 
    y=deptTime, 
    mode='markers', 
    marker=dict(color='red')))

fig.update_layout(title='Distance vs Departure Time.', xaxis_title='Distance', yaxis_title='DeptTime')
fig.show()




## 2. Line Plot
Let us now use a line plot to extract average monthly arrival delay time and see how it changes over the year.

This plot should contain the following

Title as Month vs Average Flight Delay Time.

x-axis label should be Month

y-axis label should be ArrDelay

A new dataframe line_data should be created, consisting of 2 columns: the month and the average arrival delay time per month from the dataset

Month column data from the line_data dataframe should be considered in x-axis

ArrDelay column data from the line_data dataframe should be considered in y-axis

Plotted line in the line plot should be of green color

In [30]:
month = df['Month']
delay = df['ArrDelay']
delay

0        -6.0
1       -12.0
2        45.0
3        41.0
4       -18.0
         ... 
26995   -14.0
26996   -11.0
26997    -2.0
26998    56.0
26999     1.0
Name: ArrDelay, Length: 27000, dtype: float64

In [14]:
df_1 = pd.DataFrame({
    "Month" : month,
    "AirDelay" : delay
})

df_1.dropna(inplace=True)

In [35]:
line_data = df_1.groupby('Month')['AirDelay'].mean().reset_index()

In [36]:
line_data

Unnamed: 0,Month,AirDelay
0,1,7.097321
1,2,6.530025
2,3,5.429533
3,4,4.040055
4,5,5.874889
5,6,9.347515
6,7,8.851436
7,8,7.40263
8,9,2.198975
9,10,3.242506


In [37]:
x = line_data['Month']
y = line_data['AirDelay']

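fig = go.Figure()
# Style the line with line=... and use green, as the task requires
fig.add_trace(go.Scatter(x=x, y=y, mode='lines', line=dict(color='green')))
fig.update_layout(title='Month vs Average Flight Delay Time', xaxis_title='Month', yaxis_title='ArrDelay')
fig.show()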
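
As a shorter alternative sketch, the monthly average can be computed directly from the two columns, since `mean()` skips missing values by default. The frame `flights` below is a toy stand-in with made-up numbers, and it keeps the column name `ArrDelay` that the task asks for.

```python
import pandas as pd

# Toy stand-in for the airline data; the numbers are made up.
flights = pd.DataFrame({
    'Month': [1, 1, 2, 2],
    'ArrDelay': [10.0, None, 4.0, 6.0],
})

# mean() ignores NaN by default, so no separate dropna step is needed here.
line_data = flights.groupby('Month')['ArrDelay'].mean().reset_index()
print(line_data)
```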

## 3. Bar Chart
Let us use a bar chart to extract the number of flights from a specific airline that go to a destination

This plot should contain the following

Title as Total number of flights to the destination state split by reporting air.

x-axis label should be DestState

y-axis label should be Flights

Create a new dataframe called bar_data which contains 2 columns, DestState and Flights. Here Flights indicates the total number of flights in each combination.