
# Why Plotly?

* Interactive, open-source library
* Hover tooltips that make it easier to spot outliers or anomalies among a large number of data points
* Visually attractive charts that suit a wide range of audiences
* Extensive customization options that make graphs more meaningful and understandable to others

In [1]:
import plotly.express as px
import plotly.graph_objects as go
import urllib, json

## Filled Area Plot

https://plotly.com/python/filled-area-plots/

`px.area` creates a stacked area plot. Each filled area corresponds to one value of the column given by the `line_group` parameter.

* Used to display the cumulative contribution of different variables over a continuous range
* Useful for showing the composition or distribution of multiple categories or variables over time or any other continuous dimension
* In a filled area plot, each category or variable is represented by a colored area that starts from a baseline (usually zero or a fixed reference point) and accumulates as the value increases. The areas of the different categories are stacked on top of each other, giving a visual representation of the overall composition.

### Highlight part-to-whole contribution

Stacked area charts allow the reader to easily understand the change over time in the total value of a quantity, along with the change in the parts that contribute to this total.

### Show radical changes to contribution over time

When the total is composed of a group of categories, we can use stacked graphs to highlight how each category's contribution changes over time.

### Issues with it

Avoid stacked area charts when you need to show the fluctuations in each category accurately, or when the data is not a continuous time series. In those cases a stacked bar chart or a stacked column chart is a better choice.

In [3]:
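# Load the built-in Gapminder dataset (one row per country per year)
df = px.data.gapminder()
# Stack population over time: one band per country, colored by continent
fig = px.area(df, x='year', y='pop', color='continent', line_group='country')
fig.show()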

Advantages
--------------

* Time series
* Continuous distribution
* Multiple Variables


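Disadvantages
---------------
* Not accurate for individual categories: only the bottom area starts from the baseline, so the values of the areas stacked above it are hard to read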

## Gantt Chart

* Gantt charts are used to show a `project schedule`. We can use the `timeline()` function of `plotly.express` to create a Gantt chart.

* We have to create a data frame that contains three variables: task, start date, and end date.

* Task List: The list of tasks or activities required to complete the project. Each task is typically represented by a separate row or line.

* Timeline: A horizontal timeline along the top or bottom of the chart represents the project duration. It is usually divided into intervals.

* Task Bars: Horizontal bars or blocks are positioned along the timeline to represent the duration of each task. The length of a bar indicates the estimated or actual time required to complete the task.


In [4]:
import plotly.express as px
import pandas as pd

# Each row is one task with its start and end date
df = pd.DataFrame([
    dict(Task="Job A", Start='2022-01-01', Finish='2022-02-28'),
    dict(Task="Job B", Start='2022-03-05', Finish='2022-04-15'),
    dict(Task="Job C", Start='2022-02-20', Finish='2022-05-30'),
])

fig = px.timeline(df, x_start='Start', x_end='Finish', y='Task', color='Task')
fig.show()

## Sunburst Chart

* Sunburst plots visualize hierarchical data spanning outwards radially from root to leaves. Similar to Icicle charts and Treemaps, the hierarchy is defined by the labels (`names` for `px.sunburst`) and parents attributes. The root starts from the center and children are added to the outer rings.

* With `px.sunburst`, each row of the DataFrame is represented as a sector of the sunburst.

In [5]:
import plotly.express as px
data = dict(
    character=["Eve", "Cain", "Seth", "Enos", "Noam", "Abel", "Awan", "Enoch", "Azura"],
    parent=["", "Eve", "Eve", "Seth", "Seth", "Eve", "Eve", "Awan", "Eve" ],
    value=[10, 14, 12, 10, 2, 6, 6, 4, 4])

fig = px.sunburst(
    data,
    names='character',
    parents='parent',
    values='value',
)
fig.show()


### Sunburst of a rectangular DataFrame with plotly.express

Hierarchical data are often stored as a rectangular dataframe, with different columns corresponding to different levels of the hierarchy. `px.sunburst` can take a `path` parameter corresponding to a list of columns, ordered from the root level to the leaves. Note that `ids` and `parents` should not be provided if `path` is given.

In [6]:
import plotly.express as px
df = px.data.tips()
fig = px.sunburst(df, path=['day', 'time', 'sex'], values='total_bill')
fig.show()

### Continuous color argument in px.sunburst

If a color argument is passed, the color of a node is computed as the average of the color values of its children, weighted by their values.

In [7]:
import plotly.express as px
import numpy as np
df = px.data.gapminder().query("year == 2007")
fig = px.sunburst(df, path=['continent', 'country'], values='pop',
                  color='lifeExp', hover_data=['iso_alpha'],
                  color_continuous_scale='RdBu',
                  color_continuous_midpoint=np.average(df['lifeExp'], weights=df['pop']))
fig.show()

https://plotly.com/python/sunburst-charts/