<a href="https://colab.research.google.com/github/acelvaros/devtraining-createnotes-jakarta/blob/master/adatelemz_s_s_vizualiz_ci_pythonban_part_2_handout_bi_forum_25.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Intro to Plotly Express - Part 2 - BI Forum '25**
---



Let's start by importing the necessary libraries.

In [None]:
# importing plotly express and pandas
import plotly.express as px
import pandas as pd

# these imports will be needed for map visualizations
from urllib.request import urlopen
import json

# 1) Part-of-Whole charts

Part-of-whole charts show how much each category contributes to the total of a measure.

## Pie chart

A pie chart is a circular statistical chart, which is divided into sectors to illustrate numerical proportion.

For our first pie chart, let's use the px.data.election() data set.

Let's display the distribution of the votes for each candidate.

### Preparing the dataframe

In [None]:
df = px.data.election()

df.head()

In [None]:
# melting the DataFrame to convert it to long-form
df_long = pd.melt(df, id_vars=['district'], value_vars=['Coderre', 'Bergeron', 'Joly'],
                  var_name='Candidate', value_name= 'Votes')

df_long.head()
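
To make the wide-to-long conversion easier to follow, here is a minimal sketch on a made-up two-district table. The district names and vote counts are hypothetical; only the pd.melt() call mirrors the one above.

```python
import pandas as pd

# hypothetical wide-form table: one column per candidate
wide_df = pd.DataFrame({
    'district': ['A', 'B'],
    'Coderre': [10, 20],
    'Joly': [30, 40],
})

# after melting there is one row per (district, candidate) pair
long_df = pd.melt(wide_df, id_vars=['district'], value_vars=['Coderre', 'Joly'],
                  var_name='Candidate', value_name='Votes')

print(long_df)
```

Each candidate column becomes a block of rows, so a table with 2 districts and 2 candidates ends up with 4 rows.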

In [None]:
# Creating the pie chart
fig = px.pie(df_long, names='Candidate', values='Votes')

fig.show()
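
In the long-form table every candidate appears once per district, and the pie chart adds up the values of rows that share the same name before computing the slices. The sketch below reproduces that aggregation by hand on a small, made-up table (the numbers are hypothetical), which can be a handy way to sanity-check the percentages on a chart.

```python
import pandas as pd

# hypothetical long-form votes: each candidate appears once per district
votes = pd.DataFrame({
    'Candidate': ['Coderre', 'Joly', 'Coderre', 'Joly'],
    'Votes': [10, 30, 20, 40],
})

# total votes per candidate, then each candidate's share of all votes
totals = votes.groupby('Candidate')['Votes'].sum()
shares = (totals / totals.sum() * 100).round(1)

print(shares)
```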

### Customizing the chart
*   By default, pie charts only show the percentage on each slice, so let's add the candidate names and the exact vote counts as well.
*   Since every slice now carries the candidate's name, the legend on the right is redundant, so let's remove it.

In [None]:
fig = px.pie(df_long, names='Candidate', values='Votes',
             title='Vote distribution')

fig.update_layout(title_x = 0.5, width = 900, showlegend = False,
                  # adding a subtitle as well
                  title_subtitle_text = '2013 Montreal mayoral election'
                 )

# px.pie() has no text parameter; textinfo sets what is written on each slice
fig.update_traces(textinfo = 'label+value+percent')

## Pie chart practice


### Gold medals
Let's create a pie chart showing how the gold medals are distributed between the countries.

Dataset: px.data.medals_long()

In [None]:
# taking a look at the dataset
df_medals = px.data.medals_long()
df_medals

#### Code

In [None]:
# create the pie chart here

## Treemap charts

Treemaps are visualizations for hierarchical data. They are made of a series of nested rectangles of sizes proportional to the corresponding data value.


### First try

Let's create a simple treemap with the gapminder data set to visualize the population distribution of all continents in 1997.

#### Code

In [None]:
# filtering the data set
df3 = px.data.gapminder().query('year == 1997')

# taking a look at the data frame
df3.head()

In [None]:
treemap = px.treemap(df3, path = ['continent', 'country'], values = 'pop')

treemap.show()

### Customizing the treemap

#### Creating a single root node

We can use px.Constant() to add a constant-valued column, which then serves as a single root node above the continents.

In [None]:
# creating a separate list for the path parameter
path_ = [px.Constant('World'), 'continent', 'country']

In [None]:
labels_ = {
    'labels' : 'Value',
    'pop' : 'Population',
    'parent' : 'Parent',
    'id' : 'ID',
    'lifeExp' : 'Life Expectancy'
}

title_= 'Population distribution of Continents in 1997'

In [None]:
# creating the treemap
treemap = px.treemap(df3, path = path_, values = 'pop', labels = labels_,
                     title = title_)

treemap.update_layout(title_x = 0.5)

# rendering the chart
treemap.show()

#### Changing the coloring with a third parameter

The coloring of a treemap can be changed the same way as before, using the color parameter. Here the color encodes a third variable, life expectancy.


In [None]:
# creating the treemap
treemap = px.treemap(df3, path = path_, values = 'pop',
                  color = 'lifeExp', title = title_
                  )

treemap.update_layout(title_x = 0.5,
                      # overriding the default margins
                      margin = dict(t = 50, l = 100, r = 100, b = 50))

treemap.update_traces(marker_line_width = 2)

# rendering the chart
treemap.show()

#### Adding a continuous color scale and defining its midpoint

To change the color scale, we pass the name of a built-in scale to the color_continuous_scale parameter.

https://plotly.com/python/builtin-colorscales/

In [None]:
# we can also set the midpoint of the color scale
treemap = px.treemap(df3, path = path_, values = 'pop',
                  color = 'lifeExp',title = title_,
                  color_continuous_scale = 'RdBu',
                  color_continuous_midpoint= 65
                  )

treemap.update_layout(title_x = 0.5,
                      margin = dict(t = 50, l = 100, r = 100, b = 50))

treemap.update_traces(marker_line_width = 2)

# rendering the chart
treemap.show()

# 2) Additional chart types

## Choropleth Map

A Choropleth Map is a map composed of colored polygons. It displays the value of a numeric variable as a color for each region of a map.


### Example chart

In this example we want to visualize the total number of votes in each district of the election sample data set, to see how many people went to vote.

In [None]:
labels = {'total': 'Total votes', 'district' : 'District'}
title = 'Choropleth map for the election data set'

In [None]:
# Each row represents voting results for an electoral district in the 2013 Montreal mayoral election.
df_election = px.data.election()

df_election.head()

In [None]:
df_geojson = px.data.election_geojson()

df_geojson

In [None]:
choropleth = px.choropleth(df_election, geojson=df_geojson, color='total',
                    locations='district', featureidkey='properties.district',
                    projection='mercator', labels = labels, title = title,
                    # changing the default color scale
                    color_continuous_scale = 'viridis'
                   )

choropleth.update_geos(fitbounds="locations", visible=False)

# removing the margins, but keeping room at the top so the title is not cut off
choropleth.update_layout(margin={"r":0,"t":50,"l":0,"b":0})

choropleth.show()
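
The map can only color a district if the value in the locations column matches the value found under featureidkey in one of the GeoJSON features. The sketch below shows a quick way to check this before plotting. The GeoJSON and the DataFrame are made up and only loosely modeled on the election data.

```python
import pandas as pd

# hypothetical GeoJSON: every feature stores its district name in its properties
geojson = {
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature', 'properties': {'district': 'North'}, 'geometry': None},
        {'type': 'Feature', 'properties': {'district': 'South'}, 'geometry': None},
    ],
}

# hypothetical data: 'East' has no matching feature
df_votes = pd.DataFrame({'district': ['North', 'South', 'East'],
                         'total': [5, 7, 9]})

# featureidkey='properties.district' points at this value inside each feature
feature_ids = {f['properties']['district'] for f in geojson['features']}

# locations without a matching feature have no polygon to draw
unmatched = sorted(set(df_votes['district']) - feature_ids)

print(unmatched)
```

An empty list means every row of the DataFrame has a polygon to go with it.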

### Example chart 2

In this example we want to visualize the life expectancy of every country in the gapminder data set for the year 2007.

In [None]:
df_gap_filtered = px.data.gapminder().query("year == 2007")

labels_ = {"iso_alpha" : "Country code",
           "lifeExp" : "Life expectancy"
          }

# since the gapminder dataset has iso_alpha codes we don't need a geojson
world_map = px.choropleth(df_gap_filtered, locations="iso_alpha",
                          color="lifeExp",
                          labels = labels_,
                          color_continuous_scale='sunset',
                          title="Worldwide life expectancy values")

# centering the title and removing the margins, keeping room at the top for the title
world_map.update_layout(title_x = 0.5,
                        margin = {"r": 0, "t": 50, "l": 0, "b": 0})

world_map.show()

### Practice exercise

Let's create a map that visualizes the unemployment rate of each US county.

In [None]:
# importing the data into a df
df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/fips-unemp-16.csv",
                 dtype={"fips": str})

df.head()

In [None]:
# importing the geojson
with urlopen('https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json') as response:
    counties = json.load(response)

# the full GeoJSON is very large, so let's look at the first feature only
counties['features'][0]

In [None]:
# create the map here

## Boxplot

A box plot is a statistical representation of the distribution of a variable through its quartiles. The ends of the box represent the lower and upper quartiles, while the median (second quartile) is marked by a line inside the box.

### Example chart

In [None]:
df_tips = px.data.tips()

df_tips.head()

In [None]:
labels = {'total_bill':'Total bill', 'time': 'Time of day', 'smoker': 'Smoker', 'tip': 'Tip', 'day': 'Day'}

title = 'Tip values by time of day'

In [None]:
# let's use the tips data set again

boxplot = px.box(df_tips, x = 'time', y = 'tip', color = 'smoker', labels = labels, title = title)

boxplot.show()

The individual points drawn above the whiskers are the outliers.
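
To make the quartiles and outliers more tangible, here is a small sketch that computes them by hand with pandas on made-up values. It uses the common 1.5 × IQR convention for the whiskers. Plotly may compute quartiles with a slightly different method, so treat the numbers as an illustration of the idea rather than an exact reproduction of the chart.

```python
import pandas as pd

# hypothetical tip values with one unusually large tip
tips = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 20])

# the box spans from the lower to the upper quartile, with the median inside
q1, median, q3 = tips.quantile([0.25, 0.5, 0.75])

# points further than 1.5 * IQR from the box count as outliers
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = tips[(tips < lower_fence) | (tips > upper_fence)]

print(q1, median, q3, list(outliers))
```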

### Displaying the underlying detailed data points

The points argument controls which underlying data points are displayed: all of them ('all'), only the outliers ('outliers', the default), or none (False).

In [None]:
boxplot = px.box(df_tips, x = 'time', y = 'tip', color = 'smoker', labels = labels, title = title,
                 points = 'all'
                )

boxplot.show()

## Timeline

A timeline chart or Gantt chart is a type of bar chart that illustrates a project schedule. The chart lists the tasks to be performed on the vertical axis, and the time intervals on the horizontal axis. The width of the horizontal bars in the graph shows the duration of each activity.

Let's create a Gantt chart from the 'machines.csv' data set, with the jobs on the y axis and the start and finish times on the x axis.




#### The solution

In [None]:
# reading the data
df_machines = pd.read_csv('https://raw.githubusercontent.com/CzibiBIC/biforum25/refs/heads/main/datasets/machines.csv')

In [None]:
# we need to cast the machine IDs as strings so that we can get individual colors
df_machines['Machine'] = df_machines['Machine'].astype(str)

df_machines['Job'] = df_machines['Job'].astype(str)

df_machines.head()

In [None]:
# machine IDs in numeric order, used for the legend and the color assignment
c_orders = {'Machine': [str(i) for i in range(1, 16)]}

# the same numeric order, used to sort the y axis values
c_array = [str(i) for i in range(1, 16)]
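
The explicit order is needed because the IDs are strings now. Plotly lists categories in the order they appear in the data unless told otherwise, and plain string sorting would not help either, as this small sketch shows.

```python
# the IDs 1..15 as strings
ids = [str(i) for i in range(1, 16)]

# plain string sorting compares character by character, so '10' comes before '2'
string_order = sorted(ids)

# sorting by the integer value restores the numeric order
numeric_order = sorted(ids, key=int)

print(string_order[:4])
print(numeric_order[:4])
```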

In [None]:
timeline = px.timeline(df_machines, x_start = 'Start', x_end = 'Finish', y = 'Job',
                       color = 'Machine',
                       color_discrete_sequence = px.colors.qualitative.Light24,
                       custom_data = ['Job', 'Machine', 'Start in minutes',
                                      'Finish in minutes', 'Duration'],
                       category_orders = c_orders
                      )

timeline.update_yaxes(categoryorder = 'array', categoryarray = c_array)

# hovertemplate sets exactly what the hover label shows;
# customdata[i] refers to the i-th column listed in custom_data
timeline.update_traces(
    hovertemplate="<br>".join([
        "Job: %{customdata[0]}",
        "Machine: %{customdata[1]}",
        "Start: %{customdata[2]}",
        "Finish: %{customdata[3]}",
        "Duration: %{customdata[4]}",
    ])
)

timeline.show()