# Hierarchical Graphs

## Objectives

- Explore hierarchical data visualization techniques using Plotly Express, including sunburst, treemap, and icicle charts.
- Analyze the Automobile Dataset to demonstrate different aspects of vehicle specifications such as body style, fuel type, and price.
- Utilize advanced Plotly features to enhance visualization interactivity and data interpretation through dynamic elements like hover data and color scales.

## Background

This notebook delves into hierarchical data visualization using Plotly Express to represent complex data structures in a digestible format. It demonstrates sunburst, treemap, and icicle charts to provide insights into the hierarchical relationships within the Automobile Dataset. By manipulating elements such as the path order, color, and values, the notebook illustrates various methods to enhance data visualization, improving both aesthetic appeal and functional depth.

## Datasets Used

The primary dataset is the Automobile Dataset from the UCI Machine Learning Repository.

## Automobile Dataset

In [1]:
import pandas as pd
pd.set_option('display.max_columns', 10)

import plotly.express as px
import plotly.io as pio
pio.renderers.default = "plotly_mimetype+notebook_connected"

import warnings
warnings.filterwarnings('ignore')

In [2]:
# Defining the headers
headers = ["symboling", "normalized_losses", "make", "fuel_type", "aspiration", "num_doors", 
            "body_style", "drive_wheels", "engine_location","wheel_base", "length", "width", 
            "height", "curb_weight", "engine_type", "num_cylinders", "engine_size", "fuel_system",
            "bore", "stroke", "compression_ratio", "horsepower", "peak_rpm","city_mpg", 
            "highway_mpg", "price"]

In [3]:
df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data",
                  header=None, names=headers, na_values="?" )
df.head(3)

Unnamed: 0,symboling,normalized_losses,make,fuel_type,aspiration,...,horsepower,peak_rpm,city_mpg,highway_mpg,price
0,3,,alfa-romero,gas,std,...,111.0,5000.0,21,27,13495.0
1,3,,alfa-romero,gas,std,...,111.0,5000.0,21,27,16500.0
2,1,,alfa-romero,gas,std,...,154.0,5000.0,19,26,16500.0


### Analyzing Missing Values

Missing values can cause errors or misleading sector sizes in Plotly's hierarchical charts, so we remove every row that contains one before plotting.

In [4]:
# Analysing missing values
df.isnull().sum()

symboling             0
normalized_losses    41
make                  0
fuel_type             0
aspiration            0
num_doors             2
body_style            0
drive_wheels          0
engine_location       0
wheel_base            0
length                0
width                 0
height                0
curb_weight           0
engine_type           0
num_cylinders         0
engine_size           0
fuel_system           0
bore                  4
stroke                4
compression_ratio     0
horsepower            2
peak_rpm              2
city_mpg              0
highway_mpg           0
price                 4
dtype: int64

In [5]:
# Removing the missing values
df.dropna(inplace=True)
df.isnull().sum()

symboling            0
normalized_losses    0
make                 0
fuel_type            0
aspiration           0
num_doors            0
body_style           0
drive_wheels         0
engine_location      0
wheel_base           0
length               0
width                0
height               0
curb_weight          0
engine_type          0
num_cylinders        0
engine_size          0
fuel_system          0
bore                 0
stroke               0
compression_ratio    0
horsepower           0
peak_rpm             0
city_mpg             0
highway_mpg          0
price                0
dtype: int64
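
Calling `dropna()` with no arguments drops a row if any column is missing, including columns the charts never use (`normalized_losses` alone has 41 missing entries). A minimal sketch of a narrower alternative, assuming only the columns that feed a chart matter; the `toy` frame and its values are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one row lacks a price, another lacks an unused column.
toy = pd.DataFrame({
    "body_style": ["sedan", "sedan", "wagon", "wagon"],
    "fuel_type": ["gas", "diesel", "gas", "gas"],
    "normalized_losses": [np.nan, 120.0, 95.0, 101.0],
    "price": [13495.0, 16500.0, np.nan, 9980.0],
})

# Drop a row if any column is missing (what the notebook does).
strict = toy.dropna()

# Drop a row only if a column used by the chart is missing.
chart_cols = ["body_style", "fuel_type", "price"]
lenient = toy.dropna(subset=chart_cols)

print(len(strict), len(lenient))
```

The counts quoted later in this notebook (for example, 67 sedan/gas cars) come from the stricter `dropna()`, so they would change under this alternative.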

## Sunburst Plots

A sunburst diagram displays a hierarchical structure as concentric rings. The center of the circle represents the root of the hierarchy, and each additional ring represents one level further down.

In [6]:
# body_style is the inner ring, and fuel_type is the outer one
fig_s1 = px.sunburst(df, path=["body_style","fuel_type"], 
                    width=600, height=600, title='Body Style & Fuel Type')
fig_s1.show()

If you hover the mouse over the sedan/gas sector, you will see the following information:
- labels = gas
- count = 67
- parent = sedan
- id = sedan/gas

In [7]:
# Creating a DataFrame with all sedan/gas cars.
sedan_gas = df[(df.body_style == "sedan") & (df.fuel_type == "gas")]
print(sedan_gas.shape)

(67, 26)


There are 67 sedan/gas cars.
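
The same check can be made for every sector at once. A minimal sketch, assuming a small made-up frame in place of the cleaned `df`: grouping by the two `path` columns and counting rows should reproduce the outer-ring counts, and grouping by the first column alone the inner-ring counts.

```python
import pandas as pd

# Hypothetical toy data standing in for the cleaned automobile DataFrame.
toy = pd.DataFrame({
    "body_style": ["sedan", "sedan", "sedan", "wagon", "wagon"],
    "fuel_type": ["gas", "gas", "diesel", "gas", "gas"],
})

# Outer ring: one count per (body_style, fuel_type) pair.
leaf_counts = toy.groupby(["body_style", "fuel_type"]).size()

# Inner ring: one count per body_style.
parent_counts = toy.groupby("body_style").size()

print(leaf_counts)
print(parent_counts)
```

Each inner-ring count equals the sum of the outer-ring counts beneath it.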

### Variables order inside the path argument

In [8]:
# fuel_type is the inner ring, and body_style is the outer one
fig_s2 = px.sunburst(df, path=["fuel_type", "body_style"], 
                    width=600, height=600, title='Fuel Type & Body Style')
fig_s2.show()

As you can see, the order in the `path` parameter matters: the first column becomes the innermost ring, and each following column adds a ring outside it.

### Adding `values`

In [9]:
# Adding the parameter values 
fig_s3 = px.sunburst(df, path=["body_style","fuel_type"], values="price", 
                    width=600, height=600, title='Body Style & Fuel Type with Price')
fig_s3.show()

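The sunburst graphs `fig_s1` and `fig_s3` differ in how the sectors are sized. `fig_s1` sizes each sector by its number of rows, while `fig_s3`, which includes the parameter `values='price'`, sizes each sector by the sum of `price`.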

If you hover the mouse over the `sedan/gas` sector, you will see the following information:
- labels = gas
- price = 813,499
- parent = sedan
- id = sedan/gas

In [10]:
# Computing the sum of the prices for the sedan/gas cars
sedan_gas.price.sum()

813499.0
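
With `values`, each sector's size is the sum of that column over the rows in the sector, and a parent's size is the sum of its children. A small sketch of that aggregation on made-up prices (the `toy` frame is illustrative only):

```python
import pandas as pd

# Hypothetical prices, not taken from the automobile dataset.
toy = pd.DataFrame({
    "body_style": ["sedan", "sedan", "sedan", "wagon"],
    "fuel_type": ["gas", "gas", "diesel", "gas"],
    "price": [10000.0, 12000.0, 15000.0, 9000.0],
})

# Outer ring: total price per (body_style, fuel_type) pair.
leaf_totals = toy.groupby(["body_style", "fuel_type"])["price"].sum()

# Inner ring: total price per body_style.
parent_totals = toy.groupby("body_style")["price"].sum()

print(leaf_totals)
print(parent_totals)
```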

### Adding `color`

In [11]:
# Let's use a different example.
tip = px.data.tips()

In [12]:
print(tip.shape)
tip.head()

(244, 7)


Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.5,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4


In [13]:
fig_s4 = px.sunburst(tip, path=['day', 'time'], values='total_bill',
                    width=600, height=600, title='Day & Time with Total Bill')
fig_s4.show()

In [14]:
# Adding the color parameter
fig_s4 = px.sunburst(tip, path=['day', 'time'], values='total_bill',
                    color='tip', color_continuous_scale='RdBu',
                    width=600, height=600, title='Day & Time with Total Bill')
fig_s4.show()

In [15]:
# Formatting the hover data
fig_s4 = px.sunburst(tip, path=['day', 'time'], values='total_bill',
                    color='tip', color_continuous_scale='RdBu',
                    hover_data={'tip':':.2f'},
                    width=600, height=600, title='Day & Time with Total Bill')
fig_s4.show()

If you hover your mouse over the `Sat/Dinner` section, you will see:
- labels=Dinner
- total_bill=1778.4
- parent=Sat
- id=Sat/Dinner
- tip=3.52

In [16]:
# Verifying the sum of total_bill for Sat/Dinner
sat_din = tip[(tip.day=='Sat') & (tip.time=='Dinner')]
print('The sum of total_bill for Sat/Dinner is: %.1f' %(sat_din.total_bill.sum()))

The sum of total_bill for Sat/Dinner is: 1778.4


In [17]:
# Computing the number of cases for Saturday/Dinner
print(sat_din.shape)
sat_din.head()

(87, 7)


Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
19,20.65,3.35,Male,No,Sat,Dinner,3
20,17.92,4.08,Male,No,Sat,Dinner,2
21,20.29,2.75,Female,No,Sat,Dinner,2
22,15.77,2.23,Female,No,Sat,Dinner,2
23,39.42,7.58,Male,No,Sat,Dinner,4


The hover value `tip=3.52` (3.51924 before rounding) is the weighted average of the color column (`color='tip'`), with the `values` column (`total_bill`) as the weights.

In [18]:
# Computing tip (color values) times total_bill (weight values).
# assign() returns a new DataFrame, which avoids writing into a slice of tip.
sat_din = sat_din.assign(total_bill_tip=sat_din.total_bill * sat_din.tip)
sat_din.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,total_bill_tip
19,20.65,3.35,Male,No,Sat,Dinner,3,69.1775
20,17.92,4.08,Male,No,Sat,Dinner,2,73.1136
21,20.29,2.75,Female,No,Sat,Dinner,2,55.7975
22,15.77,2.23,Female,No,Sat,Dinner,2,35.1671
23,39.42,7.58,Male,No,Sat,Dinner,4,298.8036


In [19]:
# Dividing the sum of total_bill_tip by the sum of the weight values (sum of total_bill)
print('Average of tip with total_bill as a weight: %.2f' 
                        %(sat_din.total_bill_tip.sum()/sat_din.total_bill.sum()))

Average of tip with total_bill as a weight: 3.52
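
The same weighted average can be written in one call with `numpy.average` and its `weights` argument. A minimal sketch on made-up numbers rather than the tips data:

```python
import numpy as np

# Hypothetical tips and bills for three tables in one sector.
tips = np.array([1.0, 3.0, 2.0])
bills = np.array([10.0, 30.0, 20.0])

# Manual form: sum(tip * bill) / sum(bill).
manual = (tips * bills).sum() / bills.sum()

# Equivalent one-liner.
weighted = np.average(tips, weights=bills)

# The unweighted mean differs because large bills count for more.
plain = tips.mean()

print(manual, weighted, plain)
```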


In [20]:
# Choosing color as non-numerical data
fig_s4 = px.sunburst(tip, path=['sex', 'time', 'day'], values='total_bill',
                    color='sex', 
                    width=600, height=600, title='Sex, Time & Day with Total Bill')
fig_s4.show()

In [21]:
# Using an explicit mapping for discrete colors
fig_s4 = px.sunburst(tip, path=['sex', 'time', 'day'], values='total_bill',
                    color='sex', color_discrete_map={'Male':'dimgrey', 'Female':'coral'},
                    width=600, height=600, title='Sex, Time & Day with Total Bill')
fig_s4.show()

## Treemap Chart

Treemap graphs show hierarchical data using nested rectangles. The input data is the same as for Sunburst Charts. 

The hierarchy is described by each node's label and parent, which Plotly Express derives from the `path` argument. Click a sector to zoom in or out. The chart also shows a path bar in the upper-left corner of the treemap, which you can use to zoom back out.
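
To make the label and parent idea concrete, here is a plain-Python sketch of how rows and a `path` could be turned into nodes with an id, a label, a parent, and a count, which are the fields reported in the hover text. `build_nodes` is a hypothetical helper written for illustration, not part of Plotly, and it assumes the category values contain no `/` character:

```python
from collections import Counter


def build_nodes(rows, path):
    """Count rows under every node of the hierarchy defined by path."""
    counts = Counter()
    for row in rows:
        parts = [row[col] for col in path]
        # Every prefix of the path is a node: "sedan", then "sedan/gas".
        for depth in range(1, len(parts) + 1):
            counts["/".join(parts[:depth])] += 1
    nodes = []
    for node_id, count in sorted(counts.items()):
        # Top-level nodes have an empty parent.
        parent, _, label = node_id.rpartition("/")
        nodes.append({"id": node_id, "label": label, "parent": parent, "count": count})
    return nodes


# Hypothetical rows, not the automobile dataset.
rows = [
    {"body_style": "sedan", "fuel_type": "gas"},
    {"body_style": "sedan", "fuel_type": "gas"},
    {"body_style": "wagon", "fuel_type": "diesel"},
]
nodes = build_nodes(rows, ["body_style", "fuel_type"])
for node in nodes:
    print(node)
```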

In [22]:
fig_t1 = px.treemap(df, path=["body_style","fuel_type"], 
                    width=800, height=500, title='Body Style & Fuel Type')
fig_t1.show()

If you hover the mouse over the sedan/gas rectangle, you will see the following information:
- labels = gas
- count = 67
- parent = sedan
- id = sedan/gas

In [23]:
# Swapping the variables inside the path
fig_t2 = px.treemap(df, path=["fuel_type", "body_style"], 
                    width=800, height=500, title='Fuel Type & Body Style')
fig_t2.show()

The order of the columns in `path` defines the nesting: the first column forms the outermost rectangles, and each following column subdivides them.

### Adding `values`

In [24]:
fig_t3 = px.treemap(df, path=["body_style","fuel_type"], values="price",
                    width=800, height=500, title='Body Style & Fuel Type with Price')
fig_t3.show()

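The graphs of `fig_t1` and `fig_t3` differ in how the rectangles are sized. `fig_t1` sizes each rectangle by its number of rows, while `fig_t3`, which includes the parameter `values='price'`, sizes each rectangle by the sum of `price`.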

If you hover the mouse over the `sedan/gas` rectangle, you will see the following information:
- labels = gas
- price = 813,499
- parent = sedan
- id = sedan/gas

In [25]:
# Including make in the path
fig_t3 = px.treemap(df, path=["body_style","fuel_type","make"], values="price",
                    width=800, height=500, title='Body Style & Fuel Type with Price')
fig_t3.show()

### Adding `color`

Let's use the tips example again.

In [26]:
# Adding the color parameter
fig_t4 = px.treemap(tip, path=['day', 'time'], values='total_bill',
                    color='tip', color_continuous_scale='RdBu',
                    width=800, height=500, title='Day & Time with Total Bill')
fig_t4.show()

In [27]:
# Formatting the hover data
fig_t4 = px.treemap(tip, path=['day', 'time'], values='total_bill',
                    color='tip', color_continuous_scale='RdBu',
                    hover_data={'tip':':.2f'},
                    width=800, height=500, title='Day & Time with Total Bill')
fig_t4.show()

If you hover your mouse over the `Sat/Dinner` section, you will see:
- labels=Dinner
- total_bill=1778.4
- parent=Sat
- id=Sat/Dinner
- tip=3.52

In [28]:
# Choosing color as non-numerical data
fig_t4 = px.treemap(tip, path=['sex', 'time', 'day'], values='total_bill',
                    color='time', 
                    width=800, height=500, title='Sex, Time & Day with Total Bill')
fig_t4.show()

## Icicle Charts

Icicle graphs visualize hierarchical data using rectangular sectors that cascade from root to leaves in one of four directions: up, down, left, or right. The input data is the same as for Sunburst and Treemap charts. 

The hierarchy is described by each node's label and parent, which Plotly Express derives from the `path` argument. Click a sector to zoom in or out. The chart also shows a path bar in the upper-left corner of the icicle chart, which you can use to zoom back out.

In [29]:
fig_c1 = px.icicle(df, path=["body_style","fuel_type"], 
                    width=800, height=500, title='Body Style & Fuel Type')
fig_c1.show()

If you hover the mouse over the sedan/gas rectangle, you will see the following information:
- labels = gas
- count = 67
- parent = sedan
- id = sedan/gas

In [30]:
# Swapping the variables inside the path
fig_c2 = px.icicle(df, path=["fuel_type", "body_style"], 
                    width=800, height=500, title='Fuel Type & Body Style')
fig_c2.show()

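The order of the columns in `path` defines the nesting: the first column forms the level closest to the root, and each following column adds a level further from it.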

### Adding `values`

In [31]:
fig_c3 = px.icicle(df, path=["body_style","fuel_type"], values="price",
                    width=800, height=500, title='Body Style & Fuel Type with Price')
fig_c3.show()

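The graphs of `fig_c1` and `fig_c3` differ in how the rectangles are sized. `fig_c1` sizes each rectangle by its number of rows, while `fig_c3`, which includes the parameter `values='price'`, sizes each rectangle by the sum of `price`.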

If you hover the mouse over the `sedan/gas` rectangle, you will see the following information:
- labels = gas
- price = 813,499
- parent = sedan
- id = sedan/gas

In [32]:
# Including make in the path and adding a constant root node labeled "Cars"
fig_c3 = px.icicle(df, path=[px.Constant("Cars"),"body_style","fuel_type","make"], values="price",
                    width=600, height=700, title='Body Style & Fuel Type with Price')
fig_c3.show()

### Adding `color`

We will use the tips example.

In [33]:
# Adding the color parameter & formatting the hover data
fig_c4 = px.icicle(tip, path=['day', 'time'], values='total_bill',
                    color='tip', color_continuous_scale='RdBu',
                    hover_data={'tip':':.2f'},
                    width=800, height=600, title='Day & Time with Total Bill')
fig_c4.show()

If you hover your mouse over the `Sat/Dinner` section, you will see:
- labels=Dinner
- total_bill=1778.4
- parent=Sat
- id=Sat/Dinner
- tip=3.52

In [34]:
# Choosing color as non-numerical data
fig_c4 = px.icicle(tip, path=['sex', 'time', 'day'], values='total_bill',
                    color='time', 
                    width=800, height=600, title='Sex, Time & Day with Total Bill')
fig_c4.show()

## Conclusions

Key Takeaways:
- Sunburst, treemap, and icicle charts effectively represent hierarchical data, allowing for intuitive exploration and understanding of complex relationships.
- Plotly's dynamic visualization capabilities, such as hover details and interactive zoom, enhance the user's ability to interact with and dissect the presented data.
- The order of elements within the path parameter critically affects the visualization's structure and interpretation, emphasizing the importance of thoughtful data arrangement.
- Color coding and value assignments (e.g., using price as a value in sunburst charts) provide additional layers of information, making the visualizations informative and visually compelling.
- Incorporating tooltips and custom color scales can significantly improve the readability and interpretability of the data, aiding in quicker insights and more effective data presentation.

## References

- https://plotly.com/python/sunburst-charts/
- https://plotly.com/python/treemaps/
- https://plotly.com/python/icicle-charts/