# Storytelling with Data! in Altair
#### by Maisa de Oliveira Fraiz

## Introduction

This project aims to replicate the exercises from Cole Nussbaumer Knaflic's book, "Storytelling with Data - Let's Practice!", using `Python Altair`. Our primary objective is to document the reasoning behind the modifications proposed by the author, while also highlighting the challenges that arise when transitioning from the book's Excel-based approach to programming in a different software environment.

`Altair` was selected for this project for its declarative syntax, interactivity, grammar-of-graphics approach, and compatibility with web formatting tools, all within the user-friendly Python environment. Anticipated challenges include `Altair`'s smaller documentation and development community compared to more established libraries like `Matplotlib`, `Seaborn`, or `Plotly`, and tasks that seem straightforward in Excel but may require multiple iterations to translate effectively into code.

In addition to the broader objective, this notebook also serves as a personal journey of learning `Altair`, a syntax that was previously unfamiliar to me. By delving into it, I aim to widen my repertoire in the data visualization field, discovering new ways to create compelling visual representations.

The data for all exercises can be found on the book's official website: https://www.storytellingwithdata.com/letspractice/downloads

### Imports

These are the libraries necessary to run the code for this project.

In [1]:
# For data manipulation and visualization
import pandas as pd
import numpy as np
import altair as alt

# For animation in Chapter 6 - Exercise 6
import ipywidgets as widgets
from ipywidgets import interact
from IPython.display import clear_output

# For converting .ipynb into .html
import keyboard
import nbconvert
import nbformat

And these are the versions used.

In [2]:
# Python version
! python --version

Python 3.11.6


In [3]:
# Library version

print("Pandas version: " + pd.__version__)
print("Numpy version: " + np.__version__)
print("Altair version: " + alt.__version__)
print("Ipywidgets version: " + widgets.__version__)
print("Nbconvert version: " + nbconvert.__version__)
print("Nbformat version: " + nbformat.__version__)

Pandas version: 2.1.2
Numpy version: 1.26.0
Altair version: 5.1.2
Ipywidgets version: 8.1.1
Nbconvert version: 7.11.0
Nbformat version: 5.9.2


## Table of Contents

+ [Chapter 2](#2)
    + [Exercise 1](#2.1)
    + [Exercise 4](#2.4)
    + [Exercise 5](#2.5)

+ [Chapter 3](#3)
    + [Exercise 2](#3.2)

+ [Chapter 4](#4)
    + [Exercise 2](#4.2)
    + [Exercise 3](#4.3)

+ [Chapter 5](#5)
    + [Exercise 4 (Inspired)](#5.4)

+ [Chapter 6](#6)
    + [Exercise 6](#6.6)

## Chapter 2 - Choose an effective visual<a name="2"></a>

*"When I have some data I need to show, how do I do that in an effective way?"* - Cole Nussbaumer Knaflic

### Exercise 2.1 - Improve this table<a name="2.1"></a>

For this exercise, we will start with a simple table and work our way into transforming it into different types of commonly used visualizations.

#### Loading the data

The first problem in the Excel-to-Altair translation arises from the data itself: the spreadsheets contain titles and annotations that help readability in Excel but get in the way when loading the data into Python, so we need to load it carefully. Similar adjustments are needed in all subsequent exercises.

In [4]:
# Example of polluted loading

table = pd.read_excel(r"Data\2.1 EXERCISE.xlsx")
table

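| | EXERCISE 2.1 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 |
|---|---|---|---|---|---|---|
| 0 | | | | | | |
| 1 | | FIG 2.1a | | | | |
| 2 | | | | | | |
| 3 | | New client tier share | | | | |
| 4 | | | | | | |
| 5 | | Tier | # of Accounts | % Accounts | Revenue ($M) | % Revenue |
| 6 | | A | 77 | 0.070772 | 4.675 | 0.25 |
| 7 | | A+ | 19 | 0.017463 | 3.927 | 0.21 |
| 8 | | B | 338 | 0.310662 | 5.984 | 0.32 |
| 9 | | C | 425 | 0.390625 | 2.805 | 0.15 |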


In [5]:
del table

In [6]:
# Right loading
table = pd.read_excel(r"Data\2.1 EXERCISE.xlsx", usecols = [1, 2, 3, 4, 5], header = 6)
table

| | Tier | # of Accounts | % Accounts | Revenue ($M) | % Revenue |
|---|---|---|---|---|---|
| 0 | A | 77 | 0.070772 | 4.675 | 0.25 |
| 1 | A+ | 19 | 0.017463 | 3.927 | 0.21 |
| 2 | B | 338 | 0.310662 | 5.984 | 0.32 |
| 3 | C | 425 | 0.390625 | 2.805 | 0.15 |
| 4 | D | 24 | 0.022059 | 0.374 | 0.02 |

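`read_excel` and `read_csv` interpret `header` and `usecols` the same way: `header` is the 0-indexed row that holds the column names (rows above it are discarded), and a list of integers in `usecols` selects columns by position. The sketch below needs no Excel file: it reproduces the spreadsheet layout in a hypothetical in-memory CSV (only the first two tiers are included) to show why `header = 6` and `usecols = [1, 2, 3, 4, 5]` land on the clean table.

```python
import io

import pandas as pd

# Hypothetical CSV reproducing the spreadsheet layout: a title in the first cell,
# decorative rows, the real header on line index 6, then the first tiers as data.
raw = io.StringIO(
    "EXERCISE 2.1,,,,,\n"
    ",,,,,\n"
    ",FIG 2.1a,,,,\n"
    ",,,,,\n"
    ",New client tier share,,,,\n"
    ",,,,,\n"
    ",Tier,# of Accounts,% Accounts,Revenue ($M),% Revenue\n"
    ",A,77,0.070772,4.675,0.25\n"
    ",A+,19,0.017463,3.927,0.21\n"
)

# header=6: line 6 (0-indexed) holds the column names; everything above is skipped.
# usecols=[1, 2, 3, 4, 5]: drop the empty first column, keep the five data columns.
clean = pd.read_csv(raw, header=6, usecols=[1, 2, 3, 4, 5])
print(clean)
```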

#### Table

The initial changes recommended in the book focus on improving the readability of the table itself. These changes include reordering the tiers, adding a row with the total values, adding an "All other" category for the unlisted values when the percentages don't add up to 100%, and rounding the numbers while adjusting the percentage format as required.

The following code implements these modifications.

In [7]:
# Ordering the tiers

table = table.loc[[1, 0, 2, 3, 4]]

In [8]:
# Fixing the percentages

table['% Accounts'] = table['% Accounts'].apply(lambda x: x*100)
table['% Revenue'] = table['% Revenue'].apply(lambda x: x*100)

In [9]:
# Calculating and adding "All other" values

other_account_per = 100 - table['% Accounts'].sum()
other_revenue_per = 100 - table['% Revenue'].sum()

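# Tier A (index label 0) is the reference: its count divided by its percentage gives the overall total
other_account_num = (other_account_per*table['# of Accounts'].loc[0])/table['% Accounts'].loc[0]
other_revenue_num = (other_revenue_per*table['Revenue ($M)'].loc[0])/table['% Revenue'].loc[0]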

table.loc[len(table)] = ["All other", other_account_num, other_account_per, other_revenue_num, other_revenue_per]

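The logic of the cell above can be checked by hand. Tier A holds 77 accounts, which is 7.0772% of the total, so the full total is about 77 / 0.070772 ≈ 1088 accounts; the share missing from the table (100% minus the listed tiers) multiplied by that total gives the "All other" count. The same reasoning applies to revenue. A plain-Python version using the values from the exercise table:

```python
# Values from the exercise table; shares are fractions of the whole (not yet x100)
accounts = {"A+": 19, "A": 77, "B": 338, "C": 425, "D": 24}
account_share = {"A+": 0.017463, "A": 0.070772, "B": 0.310662, "C": 0.390625, "D": 0.022059}
revenue = {"A+": 3.927, "A": 4.675, "B": 5.984, "C": 2.805, "D": 0.374}
revenue_share = {"A+": 0.21, "A": 0.25, "B": 0.32, "C": 0.15, "D": 0.02}

# Tier A is the reference: count / share recovers the overall total
total_accounts = accounts["A"] / account_share["A"]
total_revenue = revenue["A"] / revenue_share["A"]

# "All other" = missing share of the whole, scaled by the recovered total
other_accounts = (1 - sum(account_share.values())) * total_accounts
other_revenue = (1 - sum(revenue_share.values())) * total_revenue

print(round(total_accounts), round(other_accounts))       # 1088 205
print(round(total_revenue, 1), round(other_revenue, 1))   # 18.7 0.9
```

These match the "Total" and "All other" rows of the final table.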
In [10]:
# Since we will not use rounded values or the total row for the graphs,
# we should create a new variable before making the following alterations

table_charts = table.copy()

In [11]:
# Adding total values row

table.loc[len(table)] = ["Total", table['# of Accounts'].sum(), table['% Accounts'].sum(),
                        table['Revenue ($M)'].sum(), table['% Revenue'].sum()]

In [12]:
# Rounding the numbers

table['% Accounts'] = table['% Accounts'].apply(lambda x: round(x))
table['Revenue ($M)'] = table['Revenue ($M)'].apply(lambda x: round(x, 1))

The new table is as follows:

In [13]:
table

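| | Tier | # of Accounts | % Accounts | Revenue ($M) | % Revenue |
|---|---|---|---|---|---|
| 1 | A+ | 19.0 | 2 | 3.9 | 21.0 |
| 0 | A | 77.0 | 7 | 4.7 | 25.0 |
| 2 | B | 338.0 | 31 | 6.0 | 32.0 |
| 3 | C | 425.0 | 39 | 2.8 | 15.0 |
| 4 | D | 24.0 | 2 | 0.4 | 2.0 |
| 5 | All other | 205.0 | 19 | 0.9 | 5.0 |
| 6 | Total | 1088.0 | 100 | 18.7 | 100.0 |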


or, for even better readability in `Python`:

In [14]:
table.set_index("Tier")

| Tier | # of Accounts | % Accounts | Revenue ($M) | % Revenue |
|---|---|---|---|---|
| A+ | 19.0 | 2 | 3.9 | 21.0 |
| A | 77.0 | 7 | 4.7 | 25.0 |
| B | 338.0 | 31 | 6.0 | 32.0 |
| C | 425.0 | 39 | 2.8 | 15.0 |
| D | 24.0 | 2 | 0.4 | 2.0 |
| All other | 205.0 | 19 | 0.9 | 5.0 |
| Total | 1088.0 | 100 | 18.7 | 100.0 |


Some changes were not implemented, such as row colors, text alignment, and embedding graphs into the table, because they are not readily available in a plain pandas DataFrame. The percent sign (%) was not added next to the values in the percentage columns, since doing so in Python would convert the data from numbers to strings, which is not a recommended approach.

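If the percent sign is wanted only for presentation, it can be added at display time while the columns stay numeric. A minimal sketch using `DataFrame.to_string` and its `formatters` argument on a small hypothetical excerpt of the table (in a notebook, `DataFrame.style.format` serves the same purpose for the rendered HTML):

```python
import pandas as pd

# Hypothetical excerpt of the final table, percentages already multiplied by 100
df = pd.DataFrame({
    "Tier": ["A+", "A", "B"],
    "% Accounts": [2, 7, 31],
    "% Revenue": [21.0, 25.0, 32.0],
})

# Format only the printed text; the underlying data is untouched
as_percent = "{:.0f}%".format
text = df.to_string(index=False, formatters={"% Accounts": as_percent, "% Revenue": as_percent})
print(text)

# The columns keep their numeric dtypes, so they can still be summed or plotted
print(df.dtypes)
```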
#### Pie chart

Considering that percentages depict a fraction of a whole, the next proposal is to employ a pie chart. 
Here is the default Altair graph version:

In [15]:
# Default pie chart

alt.Chart(table_charts).mark_arc().encode(
    theta = "% Accounts",
    color = alt.Color('Tier')
)

Some of the adjustments needed to bring it closer to the original include reordering the tiers, changing the label positions, altering the color palette, and adding a title.

In [16]:
## % of Accounts Pie Chart

# Creating a base chart with a title, aligned to the left and with normal font weight
base = alt.Chart(
    table_charts, 
    title = alt.Title(r"% of Total Accounts", anchor = 'start', fontWeight = 'normal')
).encode(
    theta = alt.Theta("% Accounts:Q", stack = True), # Encoding the angle (theta) for the pie chart
    color = alt.Color('Tier', legend = None), # Encoding color based on the 'Tier' field
    order = alt.Order(field = 'Tier') # Ordering the sectors of the pie chart based on the 'Tier' field
)

# Creating the pie chart with an outer radius of 115
pie = base.mark_arc(outerRadius = 115)

# Creating text labels for each sector of the pie chart
text = base.mark_text(radius = 140, size = 15).encode(text = alt.Text("Tier"))

# Combining the pie chart and text labels
acc_pie = pie + text
acc_pie

When no data type is given for the `order` field, Altair arranges the tiers alphabetically instead of following the order of the DataFrame. We can fix this by declaring `Tier` as ordinal (`O`).

In [17]:
## % of Accounts Pie Chart

base = alt.Chart(
    table_charts, 
    title = alt.Title(r"% of Total Accounts", anchor = 'start', fontWeight = 'normal')
).encode(
    theta = alt.Theta("% Accounts:Q", stack = True),
    color = alt.Color('Tier',
                      scale = alt.Scale(
                          range = ['#4d71bc', '#5d9bd4', '#6fae45', '#febf0f', '#e77e2d', '#a6a6a6']
                        ), # Setting custom colors for each sector of the pie chart
                      sort = None, # So that the colors don't follow the alphabetic order
                      legend = None
                      ),
    order = alt.Order(field = 'Tier:O'))


pie = base.mark_arc(outerRadius = 115)
text = base.mark_text(radius = 140, size = 15).encode(text = alt.Text("Tier"))


acc_pie = pie + text
acc_pie

Initially, `offset` was used instead of `anchor`, manually specifying the title's position along the x-axis in pixels. This places the text exactly where it appears in the book's example, producing a closer replica. However, anchoring is a faster and cleaner solution, so it is adopted for the remainder of this project, prioritizing efficiency and consistency across all graphs at the cost of pinpoint accuracy in text placement.

The HEX color code values of the palette from the book were acquired through the use of the online tool "Color Picker Online", which is freely accessible at https://imagecolorpicker.com/.

The pie chart above can now be easily modified to represent the percentage of total revenue.

In [18]:
# % of Revenue Pie Chart

base = alt.Chart(
    table_charts, 
    title = alt.Title(r"% of Total Revenue", anchor = 'start', fontWeight = 'normal')
).encode(
    theta = alt.Theta("% Revenue:Q", stack = True),
    color = alt.Color('Tier',
                      scale = alt.Scale(
                          range = ['#4d71bc', '#5d9bd4', '#6fae45', '#febf0f', '#e77e2d', '#a6a6a6']
                        ),
                      sort = None,
                      legend = None
                      ),
    order = alt.Order(field = 'Tier:O'))


pie = base.mark_arc(outerRadius = 115)
text = base.mark_text(radius = 140, size = 15).encode(text = alt.Text("Tier"))


rev_pie = pie + text
rev_pie

With both graphs available, we can add them next to each other and include a main title.

In [19]:
# Combining two pie charts side by side using the horizontal concatenation operator '|'
pies = acc_pie | rev_pie

# Setting properties for the combined pie charts
pies.properties(
    title = alt.Title('New Client Tier Share', offset=10, fontSize=20)  # Adding a title with specific offset and font size
)


Visualization as depicted in the book:

![Alt text](./Images/2_1e.png)

Pie charts can present readability challenges, as the human eye struggles to compare the relative sizes of slices. While adding data percentages next to the slices can improve comprehension, it may also introduce unnecessary clutter to the visualization.

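Part of the difficulty is that `theta` turns each share into an angle (share / 100 × 360°), so close values produce slices that differ by only a few degrees. Computing the angles for the account shares, rounded from the exercise table, makes this concrete:

```python
# Account shares in %, rounded from the exercise table (including the derived "All other" row)
account_share = {"A+": 1.7, "A": 7.1, "B": 31.1, "C": 39.1, "D": 2.2, "All other": 18.8}

# theta maps each share of the whole to a proportional angle in degrees
angles = {tier: share / 100 * 360 for tier, share in account_share.items()}

for tier, angle in angles.items():
    print(f"{tier:>9}: {angle:5.1f} degrees")
```

Tiers A+ and D end up less than 2 degrees apart, a difference that is very hard to perceive in a pie.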
#### Bar chart

The next proposed graph is a horizontal bar chart. Since the comparison no longer involves angles and all bars start from the same baseline, the relative size of each segment is easier to discern.

This is the default representation in Altair:

In [20]:
# Default altair bar chart

alt.Chart(table_charts).mark_bar().encode(
    y = alt.Y('Tier'),
    x = alt.X('% Accounts'))

The necessary adjustments involve placing the "Tier" label in the upper left corner, displaying values next to the bars instead of using an x-axis, and adding a title while rearranging the tiers.

In [21]:
# Creating a base chart with a title, aligned to the left and with normal font weight
base = alt.Chart(
    table_charts,
    title = alt.Title('TIER | % OF TOTAL ACCOUNTS', anchor = 'start', fontWeight = 'normal')
).mark_bar().encode(
    y = alt.Y('Tier', title = None),  # Encoding the 'Tier' field on the y-axis, with the axis title removed
    x = alt.X('% Accounts', axis = None),  # Encoding the '% Accounts' field on the x-axis, with the axis hidden
    order = alt.Order(field = 'Tier:O'),  # Ordering the bars based on the 'Tier' field
    text = alt.Text("% Accounts", format=".0f")  # Displaying the '% Accounts' values as text, formatted to have no decimal places
)

# Creating the final bar chart by combining the bars and text labels
final_acc = base.mark_bar() + base.mark_text(align='left', dx=2)

# Displaying the final bar chart
final_acc

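Adding `order` by `Tier:O` didn't have the same effect as it did on the pie chart, since `order` controls the stacking and drawing order of marks rather than the order of axis categories. Here, the appropriate method is to pass a `sort` argument to the encoding of the axis being sorted.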

In [22]:
base = alt.Chart(
    table_charts,
    title = alt.Title('TIER   | % OF TOTAL ACCOUNTS     |', anchor = 'start', fontWeight = 'normal')
).encode(
    y = alt.Y('Tier', sort = ["A+"], title = None),  # Encoding the 'Tier' field on the y-axis, with a specific sorting order
    x = alt.X('% Accounts', axis = None), 
    text = alt.Text("% Accounts", format = ".0f")
)

final_acc = (base.mark_bar() + base.mark_text(align = 'left', dx = 2)).properties(width = 150) # Setting the width
final_acc


Now we do the same for the revenue column. In addition, the y-axis is removed so it isn't repeated when the charts are combined.

In [23]:
base = alt.Chart(
    table_charts, 
    title = alt.Title('% OF TOTAL REVENUE', anchor = 'start', fontWeight = 'normal')
).encode(
    y = alt.Y('Tier', sort = ["A+"]).axis(None),
    x = alt.X('% Revenue').axis(None),
    text = alt.Text("% Revenue", format = ".0f")
)

final_rev = (base.mark_bar() + base.mark_text(align = 'left', dx = 2)).properties(width = 150)
final_rev

Similar to the pie chart, we can arrange these graphs side by side and include a main title.

In [24]:
# Combining two charts horizontally using the concatenation operator '|'
hor_bar = final_acc | final_rev

# Configuring the view of the combined chart, removing strokes
hor_bar.configure_view(stroke = None).properties(
    title = alt.Title('New Client Tier Share', anchor = 'start', fontSize = 20)  # Adding title
)


Visualization as depicted in the book:

![Alt text](./Images/2_1f.png)

In both the pie and bar charts, the value labels are not positioned as in the book's examples. Matching them (some labels inside the pie and some outside, in different colors, with some omitted altogether) would be a labor-intensive manual task in Altair. These adjustments are mostly aesthetic and do not significantly affect readability; in some cases, the book's placement even obscures the information being presented.

Examples of how to manually define labels will be presented in future exercises.

#### Horizontal dual series bar chart

The two graphs in the last visualization can be merged into a single grouped bar chart.

In [25]:
# Altair with default settings
alt.Chart(table_charts).mark_bar().encode(
    x = alt.X('value:Q'),  # Encoding the quantitative variable 'value' on the x-axis
    y = alt.Y('variable:N'),  # Encoding the nominal variable 'variable' on the y-axis
    color = alt.Color(
        'variable:N', 
        legend = alt.Legend(title = 'Metric')
    ),  # Encoding color based on 'variable' with legend title 'Metric'
    row = alt.Row('Tier:O')  # Faceting by rows based on the ordinal variable 'Tier'
).transform_fold(
    fold = ['% Accounts', '% Revenue'],  # Transforming the data by folding the specified columns
    as_ = ['variable', 'value']  # Renaming the folded columns to 'variable' and 'value'
)


The necessary alterations involve removing the grid, adjusting label positions and reducing redundancy, adding a title and subtitle, and changing the color palette.

In [26]:
# Custom settings
merged_hor_bar = alt.Chart(
    table_charts,
    title = alt.Title('New client tier share', fontSize = 20)  # Adding a title with specific font size
).mark_bar().encode(
    x = alt.X(
        'value:Q',
        axis = alt.Axis(
            title = "TIER |  % OF TOTAL ACCOUNTS vs REVENUE",  # Setting a custom title for the x-axis
            grid = False, # Remove grid
            orient = 'top', # Put axis on top
            labelColor = "#888888",  # Setting the label color as gray
            titleColor = '#888888'  # Setting the title color as gray
        )
    ),
    y = alt.Y(
        'variable:N',
        axis = alt.Axis(title = None, labels = False, ticks = False)  # Removing y-axis title, labels, and ticks
    ),
    color = alt.Color(
        'variable:N',
        legend = alt.Legend(title = 'Metric'),  # Adding a legend with a custom title
        scale = alt.Scale(range = ['#b4c6e4', '#4871b7'])  # Setting a custom color range
    ),
    row = alt.Row(
        'Tier:O',
        header = alt.Header(labelAngle = 0, labelAlign = "left"),  # Keeping row labels horizontal and aligning them to the left
        title = None,
        sort = ['A+'],  # Sorting rows based on 'Tier'
        spacing = 10  # Adding spacing between rows
    )
).transform_fold(
    fold = ['% Accounts', '% Revenue'],  # Transforming the data by folding the specified columns
    as_ = ['variable', 'value']  # Renaming the folded columns to 'variable' and 'value'
).properties(
    width = 200  # Setting the width of the chart
).configure_view(stroke = None)  # Removing the stroke from the view

merged_hor_bar

Visualization as depicted in the book:

![Alt text](./Images/2_1g.png)

#### Vertical bar chart

We can also rotate the bar chart to a vertical orientation by swapping the x and y encodings, replacing the `Row` channel with `Column`, and reorienting the labels.

In [27]:
# Creating a vertical bar chart
vert_bar = alt.Chart(
    table_charts,
    title = alt.Title('New client tier share', fontSize = 20)  # Adding a title with specific font size
).mark_bar().encode(
    y = alt.Y(
        'value:Q',
        axis = alt.Axis(
            title = "% OF TOTAL ACCOUNTS vs REVENUE",  # Setting a custom title for the y-axis
            titleAlign = 'left',  # Aligning the title to the left
            titleAngle = 0,  # Setting the title angle to 0 degrees
            titleAnchor = 'end',  # Anchoring the title to the end
            titleY = -10,  # Adjusting the title position
            grid = False,  # Turning off grid lines
            labelColor = "#888888",  # Setting the label color to gray
            titleColor = '#888888'  # Setting the title color to gray
        )
    ),
    x = alt.X(
        'variable:N',
        axis = alt.Axis(title = None, labels = False, ticks = False)  # Removing x-axis title, labels, and ticks
    ),
    color = alt.Color(
        'variable:N',
        legend = alt.Legend(title = 'Metric'),  # Adding a legend with a custom title
        scale = alt.Scale(range = ['#b4c6e4', '#4871b7'])  # Setting a custom color range
    ),
    column = alt.Column(
        'Tier:O',
        header = alt.Header(labelOrient = 'bottom', titleOrient = "bottom", titleAnchor = "start"), # Adjusting column header settings
        sort = ['A+'],  # Sorting columns based on 'Tier'
        title = 'TIER'  # Adding a title for the column
    )
).transform_fold(
    fold = ['% Accounts', '% Revenue'],  # Transforming the data by folding the specified columns
    as_ = ['variable', 'value']  # Renaming the folded columns to 'variable' and 'value'
).properties(
    width = 50  # Setting the width of the chart
).configure_view(stroke = None)  # Removing the stroke from the view

vert_bar


Visualization as depicted in the book:

![Alt text](./Images/2_1h.png)

Note that titles in Altair do not readily support coloring individual words. As a simple solution for now, we keep the legend, which indicates which column corresponds to each word in the title. Future exercises will explore a more involved way to tackle this challenge.

In the code above, we used the `transform_fold` method to generate the grouped bar chart because our data is structured in 'wide form', the standard Excel layout. However, Altair, like other grammar-of-graphics libraries, is designed to work with 'long-form' data. `transform_fold` performs this conversion inside the chart specification, which makes the graph possible but hides the reshaping step. It is therefore preferable to transform the data before creating the visualizations.

In [28]:
# Transforms the data to the long-form format

melted_table = pd.melt(table_charts, id_vars = ['Tier'], var_name = 'Metric', value_name = 'Value')
melted_table

| | Tier | Metric | Value |
|---|---|---|---|
| 0 | A+ | # of Accounts | 19.0 |
| 1 | A | # of Accounts | 77.0 |
| 2 | B | # of Accounts | 338.0 |
| 3 | C | # of Accounts | 425.0 |
| 4 | D | # of Accounts | 24.0 |
| 5 | All other | # of Accounts | 205.0 |
| 6 | A+ | % Accounts | 1.746324 |
| 7 | A | % Accounts | 7.077206 |
| 8 | B | % Accounts | 31.066176 |
| 9 | C | % Accounts | 39.0625 |

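Because only the two percentage metrics are plotted, `pd.melt` can also keep just those columns through `value_vars`, producing directly the long table that the next cell obtains by filtering with `isin`. The resulting key/value pairs are the same ones `transform_fold` builds inside the chart, although the fold also carries along the other columns and orders the rows differently. A sketch on a hypothetical excerpt of `table_charts`:

```python
import pandas as pd

# Hypothetical excerpt of table_charts (wide form: one row per tier)
wide = pd.DataFrame({
    "Tier": ["A+", "A", "B"],
    "# of Accounts": [19, 77, 338],
    "% Accounts": [1.7, 7.1, 31.1],
    "% Revenue": [21.0, 25.0, 32.0],
})

# Long form with only the two percentage metrics: one row per (tier, metric) pair
long_pct = pd.melt(
    wide,
    id_vars=["Tier"],
    value_vars=["% Accounts", "% Revenue"],
    var_name="Metric",
    value_name="Value",
)
print(long_pct)
```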

We can now use this table to remake the bar chart without the `transform_fold` method.

In [29]:
# Selecting specific rows from the melted table based on the 'Metric' column
selected_rows = melted_table[melted_table['Metric'].isin(['% Accounts', '% Revenue'])]

vert_bar2 = alt.Chart(
    selected_rows,
    title = alt.Title('New client tier share', fontSize = 20)  # Adding a title with specific font size
).mark_bar().encode(
    y = alt.Y(
        'Value',
        axis = alt.Axis(
            title = "% OF TOTAL ACCOUNTS vs REVENUE",  # Setting a custom title for the y-axis
            titleAlign = 'left',  # Aligning the title to the left
            titleAngle = 0,  # Setting the title angle to 0 degrees
            titleAnchor = 'end',  # Anchoring the title to the end
            titleY = -10,  # Adjusting the title position
            grid = False,  # Turning off grid lines
            labelColor = "#888888",  # Setting the label color to gray
            titleColor = '#888888'  # Setting the title color to gray
        )
    ),
    x = alt.X(
        'Metric',
        axis = alt.Axis(title = None, labels = False, ticks = False)  # Removing x-axis labels and ticks
    ),
    color = alt.Color(
        'Metric',
        scale = alt.Scale(range = ['#b4c6e4', '#4871b7'])  # Setting a custom color range
    ),
    column = alt.Column(
        'Tier',
        header = alt.Header(labelOrient='bottom', titleOrient="bottom", titleAnchor="start"),  # Adjusting column header settings
        sort = ['A+'],  # Sorting columns based on 'Tier'
        title = 'TIER'  # Adding a title for the column
    )
).properties(
    height = 200, width = 50  # Setting the height and width of the chart
).configure_view(stroke = None)  # Removing the stroke from the view

vert_bar2

#### Bar chart with lines

The next proposed graph is an extension of the previous bar chart, featuring the addition of lines to accentuate the endpoints of the columns within the same tier.

However, because this is a faceted chart, attempting to layer it raises an error (*ValueError: Faceted charts cannot be layered. Instead, layer the charts before faceting*). A faceted chart is already a composition of several sub-charts, one per `column` value, and Vega-Lite can only layer single-view charts, so faceting has to be the last step.

Now that our data is in long form, we can work around this problem by building the graph without the `column` encoding, thereby avoiding faceting. Instead of mapping `x` to `Metric`, `y` to `Value`, `color` to `Metric`, and `column` to `Tier`, we map `x` to `Tier`, `y` to `Value`, and `color` to `Metric`, and add `xOffset` to control the horizontal position of the bars within each tier. In essence, `column` splits the data into separate sub-charts, one per category, while `xOffset` places the marks side by side within each category of a single chart's x-axis.

The following chart incorporates the alterations we discussed and yields a graph that closely resembles the previous one.

In [30]:
bar = alt.Chart(
    selected_rows,
    title = alt.Title('New client tier share', fontSize = 20, anchor = 'start')
).mark_bar().encode(
    x = alt.X(
        'Tier',
        axis = alt.Axis(
            title = 'TIER',  # Setting a custom title for the x-axis
            labelAngle = 0,  # Setting the label angle to 0 degrees
            titleAnchor = "start",  # Anchoring the title to the start
            domain = False,  # Hiding the x-axis domain line
            ticks = False  # Hiding the x-axis ticks
        ),
        sort = ['A+']  # Sorting x-axis based on 'Tier'
    ),
    y = alt.Y(
        'Value',
        axis = alt.Axis(
            title = "% OF TOTAL ACCOUNTS vs REVENUE",  # Setting a custom title for the y-axis
            titleAlign = 'left',  # Aligning the title to the left
            titleAngle = 0,  # Setting the title angle to 0 degrees
            titleAnchor = 'end',  # Anchoring the title to the end
            titleY = -10,  # Adjusting the title position
            grid = False,  # Turning off grid lines
            labelColor = "#888888",  # Setting the label color to gray
            titleColor = '#888888'  # Setting the title color to gray
        )
    ),
    color = alt.Color(
        'Metric',
        scale = alt.Scale(range = ['#b4c6e4', '#4871b7'])  # Setting a custom color range
    ),
    xOffset = 'Metric'  # Adjusting the x-offset
).properties(
    height = 250, width = 375  # Setting the height and width of the chart
)

# Display the chart without the view stroke; `bar` itself stays unconfigured,
# since configured charts cannot be layered in the next step
bar.configure_view(stroke = None)

Now, we can layer the graph and introduce the lines. It's worth noting that creating the lines in Altair is not a straightforward task and a considerable amount of documentation searching was necessary to achieve it.

In [31]:
# x, y, and y2 cannot be set with alt.condition, so the rule and point layers below repeat some code

# Create a vertical rule chart for ascending lines
rule_asc = alt.Chart(selected_rows).mark_rule(x2Offset = 10, xOffset = -10).encode(
    x = alt.X('Tier', sort = ['A+']),  # X-axis encoding for 'Tier', sorted in a specific order
    x2 = alt.X2('Tier'),  # End point of the rule line
    y = alt.Y('min(Value)'),  # Start point of the rule line
    y2 = alt.Y2('max(Value)'),  # End point of the rule line
    strokeWidth = alt.value(2),  # Set the stroke width of the rule line
    opacity = alt.condition(
        (alt.datum.Tier == 'A+') |  # Condition for specific tiers
        (alt.datum.Tier == 'A') |   # to determine opacity settings
        (alt.datum.Tier == 'B'),
        alt.value(1), alt.value(0)  # Opacity set to 1 if condition is met, else 0
    )
)

# Create a vertical rule chart for descending lines
rule_desc = alt.Chart(selected_rows).mark_rule(x2Offset = 10, xOffset = -10
).encode(
    x = alt.X('Tier', sort = ['A+']),
    x2 = alt.X2('Tier'),
    y = alt.Y('max(Value)'),
    y2 = alt.Y2('min(Value)'),
    strokeWidth = alt.value(2), 
    opacity = alt.condition(
        (alt.datum.Tier == 'A+') | 
        (alt.datum.Tier == 'A')  |
        (alt.datum.Tier == 'B'), 
        alt.value(0), alt.value(1)
        )
    )

# Points of % Revenue where % Revenue > % Accounts
points1 = alt.Chart(selected_rows).mark_point(filled = True, xOffset = 10, color = "black").encode(
    x = alt.X('Tier', sort = ['A+']),
    y = alt.Y('max(Value)'),
    opacity = alt.condition(
        (alt.datum.Tier == 'A+') | 
        (alt.datum.Tier == 'A')  |
        (alt.datum.Tier == 'B'), 
        alt.value(1), alt.value(0)
        )
    )

# Points of % Revenue where % Revenue < % Accounts
points2 = alt.Chart(selected_rows).mark_point(filled = True, xOffset = 10, color = "black").encode(
    x = alt.X('Tier', sort = ['A+']),
    y = alt.Y('min(Value)'),
    opacity = alt.condition(
        (alt.datum.Tier == 'A+') | 
        (alt.datum.Tier == 'A')  |
        (alt.datum.Tier == 'B'), 
        alt.value(0), alt.value(1)
        )
    )

# Points of % Accounts where % Revenue < % Accounts
points3 = alt.Chart(selected_rows).mark_point(filled = True, xOffset = -10, color = "black").encode(
    x = alt.X('Tier', sort = ['A+']),
    y = alt.Y('max(Value)'),
    opacity = alt.condition(
        (alt.datum.Tier == 'A+') | 
        (alt.datum.Tier == 'A')  |
        (alt.datum.Tier == 'B'), 
        alt.value(0), alt.value(1)
        )
    )

# Points of % Accounts where % Revenue > % Accounts
points4 = alt.Chart(selected_rows).mark_point(filled = True, xOffset = -10, color = "black").encode(
    x = alt.X('Tier', sort = ['A+']),
    y = alt.Y('min(Value)'),
    opacity = alt.condition(
        (alt.datum.Tier == 'A+') | 
        (alt.datum.Tier == 'A')  |
        (alt.datum.Tier == 'B'), 
        alt.value(1), alt.value(0)
        )
    )

bar_point = bar + rule_asc + rule_desc + points1 + points2 + points3 + points4
bar_point.configure_view(stroke = None)

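The tiers `A+`, `A` and `B` are hard-coded in every condition above. A sketch of deriving that list from the data instead, by pivoting the long table back to wide form and comparing the two metrics (values rounded from the exercise table):

```python
import pandas as pd

# Long-form data shaped like selected_rows (values rounded from the exercise table)
tiers = ["A+", "A", "B", "C", "D", "All other"]
long_df = pd.DataFrame({
    "Tier": tiers * 2,
    "Metric": ["% Accounts"] * 6 + ["% Revenue"] * 6,
    "Value": [1.7, 7.1, 31.1, 39.1, 2.2, 18.8, 21.0, 25.0, 32.0, 15.0, 2.0, 5.0],
})

# Back to one row per tier, one column per metric
wide = long_df.pivot(index="Tier", columns="Metric", values="Value")

# Tiers whose share of revenue exceeds their share of accounts, kept in display order
gainers = [t for t in tiers if wide.loc[t, "% Revenue"] > wide.loc[t, "% Accounts"]]
print(gainers)  # ['A+', 'A', 'B']
```

The resulting list should be usable in place of the chained `==` tests, for example by passing `alt.FieldOneOfPredicate(field = 'Tier', oneOf = gainers)` as the first argument of `alt.condition`.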

Visualization as depicted in the book:

![Alt text](./Images/2_1i.png)

#### Lines only

With two types of visualizations displaying the same data, the book suggests eliminating the bars altogether. This can be done without building any new charts:

In [32]:
# Configure the legend to be disabled (hidden)
# Setting the default mark opacity to 0 hides the bars; the rules and points keep their own encoded opacity
point = bar_point.configure_mark(opacity = 0).configure_view(stroke = None).configure_legend(disable = True)
point

Visualization as depicted in the book:

![Alt text](./Images/2_1j.png)

#### Slope graph
Finally, we can rearrange the lines into a slope graph.

In [33]:
# Create base chart, setting the title
base = alt.Chart(
    selected_rows, 
    title = alt.Title("New client tier share", anchor = 'start', fontWeight = 'normal', fontSize = 20)
)

# Line chart configuration
line = base.mark_line(
    point = True # The lines have points at the end
).encode(
    x = alt.X(
        'Metric', 
        axis = alt.Axis(title = None, labelAngle = 0, domain = False, ticks = False)
    ),
    y = alt.Y('Value', axis = None),
    color = alt.Color(
        'Tier', 
        scale = alt.Scale(range = ['black']), # All Tier lines are black
        legend = None
        )
).properties(
    width = 300,
    height = 350
)

# Labels to the right of the slope
# These labels are for the Revenue (hidden on the '% Accounts' side)
labels1 = base.mark_text(
    align = 'left',
    dx = 10
).encode(
    x = alt.X('Metric'),
    y = alt.Y('Value'),
    text = alt.Text('Value:Q', format = '.0f'),
    opacity = alt.condition(alt.datum.Metric == '% Accounts', alt.value(0), alt.value(0.7)) 
)

# Labels to the left of the slope
# These labels are for the Accounts (shown only on the '% Accounts' side)
labels2 = base.mark_text(
    align ='left', 
    dx = -20
).encode(
    x = alt.X('Metric'),
    y = alt.Y('Value'),
    text = alt.Text('Value:Q', format='.0f'),
    opacity = alt.condition(alt.datum.Metric == '% Accounts', alt.value(0.7), alt.value(0)) 
)

# Labels for the Tiers
tier_labels = base.mark_text(
    align = 'left',
    dx = 30,
    fontWeight = 'bold'
).encode(
    x = alt.X('Metric'),
    y = alt.Y('Value'),
    text = 'Tier',
    opacity = alt.condition(alt.datum.Metric == '% Accounts', alt.value(0), alt.value(1)) # Show on one side only
)

# Tier title
tier_title = alt.Chart(
    {"values": [{"text":  ['TIER']}]}
).mark_text(
            align = "left", 
            dx = 105, 
            dy = -120,
            fontWeight = 'bold'
).encode(
    text = "text:N")

slope = line + labels1 + labels2 + tier_labels + tier_title
slope.configure_view(stroke = None)

Notice that two numbers overlap: the percentage of accounts for the A+ and D tiers. Since they round to the same number, we can simply hide one of the two labels.

In [34]:
# Hide the '% Accounts' label of Tier A+, which overlaps the identical label of Tier D
label_condition = (alt.datum.Metric == '% Accounts') & (alt.datum.Tier != 'A+')


labels2 = base.mark_text(
    align ='left', 
    dx = -20
).encode(
    x = alt.X('Metric'),
    y = alt.Y('Value'),
    text = alt.Text('Value:Q', format='.0f'),
    opacity = alt.condition(label_condition, alt.value(0.7), alt.value(0)) 
)

slope = line + labels1 + labels2 + tier_labels + tier_title
slope.configure_view(stroke = None)
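Instead of hard-coding `'A+'`, colliding labels could also be found in pandas before plotting. The sketch below uses a small hypothetical table shaped like `selected_rows` (columns `Tier`, `Metric`, `Value`); the second metric name and all numbers are made up for illustration, and `Label`, `Collides` and `ShowLabel` are hypothetical helper columns, not part of the notebook.

```python
import pandas as pd

# Hypothetical long-format data shaped like `selected_rows`; the values are illustrative only
data = pd.DataFrame({
    "Tier": ["A+", "A", "B", "C", "D"] * 2,
    "Metric": ["% Accounts"] * 5 + ["% Revenue"] * 5,
    "Value": [7.2, 13.4, 20.1, 31.0, 6.8, 25.0, 30.2, 18.9, 16.1, 9.8],
})

# Round the values the same way the text labels do (format '.0f')
data["Label"] = data["Value"].round(0).astype(int)

# Flag every label that shares its rounded text with another tier on the same side
data["Collides"] = data.duplicated(subset=["Metric", "Label"], keep=False)

# Keep only the last label of each colliding group (here this hides A+, as in the cell above)
data["ShowLabel"] = ~data.duplicated(subset=["Metric", "Label"], keep="last")

print(data[data["Collides"]])
```

A column like `ShowLabel` could then drive the opacity, for example with `alt.condition(alt.datum.ShowLabel, alt.value(0.7), alt.value(0))`; this has not been tested against the chart above.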

Visualization as depicted in the book:

![Alt text](./Images/2_1k.png)

#### Interactivity

In this exercise, the selected graph for interactivity is the simple vertical bar chart, without the lines. 

The chosen interactive features include a simple tooltip that reveals the precise value of each column on hover. The legend is also displayed, to make clear which data category each color represents.

Finally, the columns are highlighted dynamically when the viewer hovers over them. Because of this feature, the color palette was changed: in the monochromatic version, the highlighted column and its non-highlighted neighbor looked too similar.

In [35]:
# Selection for interactive points on hover
hover = alt.selection_point(on='mouseover', nearest=True, empty=False)

# Bar chart configuration with interactivity
bar_interactive = alt.Chart(
    selected_rows, 
    title = alt.Title('New client tier share', fontSize = 20, anchor = 'start')
).mark_bar().encode(
    x = alt.X(
        'Tier',
        axis = alt.Axis(title = 'TIER', labelAngle = 0, titleAnchor = "start", domain = False, ticks = False),
        sort = ['A+']
    ),
    y = alt.Y('Value', axis = alt.Axis(
        title = "% OF TOTAL ACCOUNTS vs REVENUE",
        titleAlign = 'left',
        titleAngle = 0,
        titleAnchor = 'end',
        titleY = -10,
        grid = False,
        labelColor = "#888888",
        titleColor = '#888888'
    )),
    color = alt.Color('Metric', scale = alt.Scale(range = ['#0a2f73', '#096b2b'])),
    xOffset = 'Metric',
    opacity = alt.condition(hover, alt.value(1), alt.value(0.5)),  # Set opacity based on hover
    tooltip = ['Value:Q', 'Metric']  # Show tooltip with specified fields
).properties(
    height = 250, width = 375
).add_params(hover)  # Add the hover selection to the chart

# Configure view settings for the interactive bar chart
bar_interactive.configure_view(stroke=None)


### Exercise 2.4 - Practice in your tool<a name="2.4"></a>

This exercise asks us to display the same data in six different formats, hand-drawn by the author in the theoretical exercise 2.3. The purpose of the activity is to practice in our own tool: while C. Nussbaumer uses Excel, we will proceed with Altair.

![Alt text](./Images/2_3b.png)

#### Loading the data

In [36]:
# Skip the rows above the table (header = 4) and the empty first column (usecols = [1, 2, 3]) left by the Excel formatting
table = pd.read_excel(r"Data\2.4 EXERCISE.xlsx", usecols = [1, 2, 3], header = 4)
table

|   | DATE | CAPACITY | DEMAND |
|---|---|---|---|
| 0 | 2019-04 | 29263 | 46193 |
| 1 | 2019-05 | 28037 | 49131 |
| 2 | 2019-06 | 21596 | 50124 |
| 3 | 2019-07 | 25895 | 48850 |
| 4 | 2019-08 | 25813 | 47602 |
| 5 | 2019-09 | 22427 | 43697 |
| 6 | 2019-10 | 23605 | 41058 |
| 7 | 2019-11 | 24263 | 37364 |
| 8 | 2019-12 | 24243 | 34364 |
| 9 | 2020-01 | 25533 | 34149 |
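`pd.read_csv` accepts the same `header` and `usecols` arguments as `pd.read_excel`, so their effect can be tried without the workbook. The sketch below assumes a layout inferred from the parameters (four rows above the header and an empty first column); the preamble text is hypothetical, and only the two data rows come from the table above.

```python
import io
import pandas as pd

# Hypothetical stand-in for the Excel sheet: four preamble rows and an empty first column
raw = """Exercise 2.4,,,
Demand vs capacity,,,
Project hours,,,
Illustrative layout only,,,
,DATE,CAPACITY,DEMAND
,2019-04,29263,46193
,2019-05,28037,49131
"""

# header=4: the fifth row holds the column names; usecols=[1, 2, 3]: drop the empty first column
sample = pd.read_csv(io.StringIO(raw), header=4, usecols=[1, 2, 3])
print(sample)
```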


The graphs in this exercise need an "unmet demand" column, which the dataset does not include. We can compute it as the difference between demand and capacity for each date.

In [37]:
# Calculate Unmet Demand
table['UNMET DEMAND'] = table['DEMAND'] - table['CAPACITY']

# Show only the first five lines
table.head()

|   | DATE | CAPACITY | DEMAND | UNMET DEMAND |
|---|---|---|---|---|
| 0 | 2019-04 | 29263 | 46193 | 16930 |
| 1 | 2019-05 | 28037 | 49131 | 21094 |
| 2 | 2019-06 | 21596 | 50124 | 28528 |
| 3 | 2019-07 | 25895 | 48850 | 22955 |
| 4 | 2019-08 | 25813 | 47602 | 21789 |


Now we transform the data from the wide format used in Excel to the long format used in Altair.

In [38]:
# Transforming data into long-format

melted_table = pd.melt(table, id_vars = ['DATE'], var_name = 'Metric', value_name = 'Value')
melted_table

|   | DATE | Metric | Value |
|---|---|---|---|
| 0 | 2019-04 | CAPACITY | 29263 |
| 1 | 2019-05 | CAPACITY | 28037 |
| 2 | 2019-06 | CAPACITY | 21596 |
| 3 | 2019-07 | CAPACITY | 25895 |
| 4 | 2019-08 | CAPACITY | 25813 |
| 5 | 2019-09 | CAPACITY | 22427 |
| 6 | 2019-10 | CAPACITY | 23605 |
| 7 | 2019-11 | CAPACITY | 24263 |
| 8 | 2019-12 | CAPACITY | 24243 |
| 9 | 2020-01 | CAPACITY | 25533 |
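A quick way to see what `pd.melt` does, and to check that the reshape loses nothing, is to pivot the long table back to wide. The sketch below uses only the first two rows of the exercise table (values from the outputs above); `long_df` and `back` are illustrative names.

```python
import pandas as pd

# First two rows of the exercise table (values from the outputs above)
wide = pd.DataFrame({
    "DATE": ["2019-04", "2019-05"],
    "CAPACITY": [29263, 28037],
    "DEMAND": [46193, 49131],
})
wide["UNMET DEMAND"] = wide["DEMAND"] - wide["CAPACITY"]

# Wide -> long: one row per (DATE, Metric) pair, grouped by metric
long_df = pd.melt(wide, id_vars=["DATE"], var_name="Metric", value_name="Value")

# Long -> wide again, to check that every value survived the reshape
back = long_df.pivot(index="DATE", columns="Metric", values="Value").reset_index()
back.columns.name = None
print(long_df)
```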


To simplify the data transformations in the graphs, we will deviate from the "yyyy-mm" date format. Instead, we will create two separate columns, one for the year and another for the abbreviated month name, which reduces the need for data transformations within the chart code itself.

In [39]:
# Transform the column into datetime format
melted_table['DATE'] = pd.to_datetime(melted_table['DATE'])

# Extracting year and month
melted_table['year'] = melted_table['DATE'].dt.year
melted_table['month'] = melted_table['DATE'].dt.strftime('%b') # Abbreviated month name, e.g. 'Apr'

In [40]:
# The DATE column is no longer useful

melted_table.drop('DATE', axis = 1, inplace = True)
melted_table.head()

|   | Metric | Value | year | month |
|---|---|---|---|---|
| 0 | CAPACITY | 29263 | 2019 | Apr |
| 1 | CAPACITY | 28037 | 2019 | May |
| 2 | CAPACITY | 21596 | 2019 | Jun |
| 3 | CAPACITY | 25895 | 2019 | Jul |
| 4 | CAPACITY | 25813 | 2019 | Aug |
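Once the dates are reduced to month abbreviations, their chronological order only survives through the row order. A minimal sketch of keeping that order explicit is shown below; `month_order` and `months` are hypothetical names, and the `%b` output assumes an English locale.

```python
import pandas as pd

# Months covered by the 2019 part of the exercise (April to December)
dates = pd.Series(pd.date_range("2019-04-01", periods=9, freq="MS"))

# Chronological list of abbreviations, usable as an explicit sort order
month_order = dates.dt.strftime("%b").tolist()

# An ordered categorical makes pandas sort and compare months by calendar position
months = pd.Categorical(["Sep", "Dec", "Apr"], categories=month_order, ordered=True)
print(month_order)
print(months.sort_values())
```

Altair's `sort` parameter also accepts a list of values, so `alt.X("month", sort = month_order)` is one option for fixing the axis order when the rows are not already chronological.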


To further avoid data manipulation within the chart code, we will create auxiliary tables.

In [41]:
# Making new sets of data

# Just 2019
table_2019  = melted_table[melted_table['year'].isin([2019])]
# Just Demand from 2019
demand_2019 = table_2019[table_2019['Metric'].isin(['DEMAND'])]
# Just Capacity from 2019
capacity_2019 = table_2019[table_2019['Metric'].isin(['CAPACITY'])]
# Just Unmet Demand from 2019
unmet_2019 = table_2019[table_2019['Metric'].isin(['UNMET DEMAND'])]
# Demand and Capacity from 2019
bar_table = table_2019[table_2019['Metric'].isin(['CAPACITY', 'DEMAND'])]
# Unmet Demand and Capacity from 2019
stacked_table = table_2019[table_2019["Metric"].isin(["CAPACITY", "UNMET DEMAND"])]
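When every metric needs its own table, a `groupby` can build all of them at once. This is an alternative to the repeated `isin` filters above, not what the notebook uses; the sample rows come from the outputs above, and `by_metric` is a hypothetical name.

```python
import pandas as pd

# A few long-format rows shaped like `table_2019` (values from the outputs above)
table_2019 = pd.DataFrame({
    "Metric": ["CAPACITY", "CAPACITY", "DEMAND", "DEMAND", "UNMET DEMAND", "UNMET DEMAND"],
    "Value": [29263, 28037, 46193, 49131, 16930, 21094],
    "year": [2019] * 6,
    "month": ["Apr", "May"] * 3,
})

# One sub-table per metric, keyed by the metric name
by_metric = {metric: group for metric, group in table_2019.groupby("Metric")}

# Combinations of metrics still need an isin filter, as in the cell above
bar_table = table_2019[table_2019["Metric"].isin(["CAPACITY", "DEMAND"])]
print(sorted(by_metric))
```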



#### Bar chart

The author deliberately filled the Capacity columns and left the Demand columns only outlined, to distinguish visually between what can be fulfilled (Capacity) and the unmet portion of the requirement (Unmet Demand). This choice is not easy to reproduce in `Altair`.

Whether a bar is filled is controlled by the mark property `filled`, which is set for the whole mark and does not accept a condition as its value. Since the author herself admits the shortcomings of this approach (*"I find the outline plus the white space between the bars visually jarring"*), we chose to differentiate the data by color, as is conventional.

In [42]:
# Unfilled version
alt.Chart(
    bar_table,
    title = alt.Title(
        "Demand vs capacity over time", anchor = "start", offset = 20, fontSize = 16 # Set customized title
    ),
).mark_bar(filled = False).encode( # Filled = False makes the bars unfilled
    y = alt.Y(
        "Value",
        axis = alt.Axis(
            grid = False, # Removes grid
            titleAnchor = "end",
            labelColor = "#888888", # Changes the label color to gray
            titleColor = "#888888", # Changes the title color to gray
            titleFontWeight = "normal"
        ),
        scale = alt.Scale(domain = [0, 60000]), # y-axis goes from 0 to 60000
        title = "NUMBER OF PROJECT HOURS"
    ),
    x = alt.X(
        "month",
        sort = None,
        axis = alt.Axis(
            labelAngle = 0, # Makes label horizontal
            titleAnchor = "start",
            labelColor = "#888888", # Changes label and title color to gray
            titleColor = "#888888",
            titleFontWeight = "normal",
            ticks = False # Removes ticks from axis
        ),
        title = "2019"
    ),
    color = alt.Color( # Sets colors based on Metric (Demand and Capacity)
        "Metric", scale = alt.Scale(range = ["#b4c6e4", "#4871b7"]), sort = "descending"
    ), 
    xOffset = alt.XOffset("Metric", sort = "descending") # Sets offset on the x-axis 
).configure_view(
    stroke = None
)  # Remove the chart border


In [43]:
# Filled version
alt.Chart(
    bar_table,
    title = alt.Title(
        "Demand vs capacity over time", anchor = "start", offset = 20, fontSize = 16 # Set customized title
    ),
).mark_bar().encode( # Filled = True is default
    y = alt.Y(
        "Value",
        axis = alt.Axis(
            grid = False, # Removes grid
            titleAnchor = "end",
            labelColor = "#888888", # Changes the label color to gray
            titleColor = "#888888", # Changes the title color to gray
            titleFontWeight = "normal"
        ),
        scale = alt.Scale(domain = [0, 60000]), # y-axis goes from 0 to 60000
        title = "NUMBER OF PROJECT HOURS"
    ),
    x = alt.X(
        "month",
        sort = None,
        axis = alt.Axis(
            labelAngle = 0, # Makes label horizontal
            titleAnchor = "start",
            labelColor = "#888888", # Changes label and title color to gray
            titleColor = "#888888",
            titleFontWeight = "normal",
            ticks = False # Removes ticks from axis
        ),
        title = "2019"
    ),
    color = alt.Color( # Sets colors based on Metric (Demand and Capacity)
        "Metric", scale = alt.Scale(range = ["#b4c6e4", "#4871b7"]), sort = "descending"
    ), 
    xOffset = alt.XOffset("Metric", sort = "descending") # Sets offset on the x-axis 
).configure_view(
    stroke = None
)  # Remove the chart border


Visualization as depicted in the book:

![Alt text](./Images/2_4a.png)

#### Line graph

The next step was to show the data as a line graph, which is cleaner than the bar chart, labeling each line directly along with its final value for the year. This helps the viewer see the gap between capacity and demand.

In [44]:
line = (
    alt.Chart(
        bar_table,
        title = alt.Title(
            "Demand vs capacity over time",
            fontSize = 18,
            fontWeight = "normal",
            anchor = "start",
            offset = 10 # Offsets the title in the y-axis
        ),
    )
    .mark_line()  # Using a line mark for the chart
    .encode(
        y = alt.Y(
            "Value",
            axis = alt.Axis(
                grid = False,
                titleAnchor = "end",
                labelColor = "#888888",
                titleColor = "#888888",
                titleFontWeight = "normal"
            ),
            scale = alt.Scale(domain = [0, 60000]),
            title = "NUMBER OF PROJECT HOURS"
        ),
        x = alt.X(
            "month",
            sort = None,  # Disabling sorting for better time representation
            axis = alt.Axis(
                labelAngle = 0,
                titleAnchor = "start",
                labelColor = "#888888", # Set colors to gray
                titleColor = "#888888",
                titleFontWeight = "normal",
                ticks = False
            ),
            title = "2019"
        ),
        color = alt.Color(
            "Metric",
            scale = alt.Scale(range = ["#1f77b4", "#1f77b4"]), 
            legend = None
        ),
        strokeWidth = alt.condition(
            "datum.Metric == 'CAPACITY'", alt.value(3), alt.value(1)
        )  # Adjusting line thickness based on the metric
    )
    .properties(width = 350, height = 250) # Set size of the graph
)

# Adding labels
label = (
    alt.Chart(bar_table)
    .mark_text(align = "left", dx = 3)
    .encode(
        x = alt.X("month", sort = None, aggregate = "max"),
        y = alt.Y("Value", aggregate = {"argmax": "month"}),
        text = alt.Text("Metric"), # The text itself is the Metric
        color = alt.Color("Metric", scale = alt.Scale(range = ["#1f77b4", "#1f77b4"]))
    )
)

# Combining the line chart and labels
line + label


As you can see, positioning the labels with the `max` and `argmax` aggregates over `month` did not yield the intended result. The aggregation appears to compare the month abbreviations as text, so alphabetical order decides (making Sep the "last" month); `sort = None` only controls the order of the x-axis, not the aggregation.

Since we could not find documentation fixing this issue, the next approach was to add the labels manually. This also makes it easier to add the final value next to each metric name.

In [45]:
# Demand label
label1 = alt.Chart({"values": 
                    [{"text":  ['34K DEMAND']}]
                    }
                    ).mark_text(size = 10, 
                                align = "left", 
                                dx = 160, dy = -15, 
                                color = '#1f77b4' # Color it blue
                                ).encode(text = "text:N")

# Capacity label
label2 = alt.Chart({"values": 
                    [{"text":  ['24K CAPACITY']}]
                    }
                    ).mark_text(size = 10, 
                                align = "left", 
                                dx = 160, dy = 25, 
                                color = '#1f77b4', # Color it blue
                                fontWeight = 'bold'
                                ).encode(text = "text:N")

line_final = line + label1 +  label2
line_final.configure_view(stroke = None)
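As an alternative to typing the label text by hand, the last point of each series can be selected in pandas and the text built from the data. The sketch below uses the November and December 2019 values from the tables above; `last` and its `label` column are hypothetical names, and rounding to whole thousands is an assumption that matches the "34K" and "24K" labels.

```python
import pandas as pd

# 2019 values of the last two months for Demand and Capacity (from the tables above)
bar_table = pd.DataFrame({
    "Metric": ["CAPACITY", "CAPACITY", "DEMAND", "DEMAND"],
    "Value": [24263, 24243, 37364, 34364],
    "month": ["Nov", "Dec", "Nov", "Dec"],
})

# Compared as text, "Sep" comes last among Apr to Dec, which is why max/argmax missed December
print(max(["Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]))

# Rows are in chronological order, so the last row of each metric is December
last = bar_table.groupby("Metric").tail(1).copy()

# Build labels such as "34K DEMAND" from the data instead of typing them
last["label"] = (last["Value"] / 1000).round().astype(int).astype(str) + "K " + last["Metric"]
print(last)
```

A frame like `last` could feed a `mark_text` layer encoded with `x = "month"`, `y = "Value"` and `text = "label"`, placing each label at the end of its line; this is untested against the chart above.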

Visualization as depicted in the book:

![Alt text](./Images/2_4b.png)

#### Overlapping bars

The author now explores overlapping bars, in which two bar graphs are positioned on top of each other, sharing the same axes. The Capacity data is displayed with transparency to avoid the confusion that might arise with a stacked bar chart.

In this particular graph, we chose to emulate the series labels with two text marks placed like a subtitle, even though Altair does not provide a straightforward way to do this. In previous examples the default legend distinguished the series by color, but here the distinction is between opaque and transparent bars, which is better conveyed by normal and bold text than by a legend of colors.

In [46]:
# Demand bar, with bigger spacing between them, unfilled
demand = (
    alt.Chart(
        demand_2019,
        width = alt.Step(40), # Defines the width of the bars (including distance between them)
        title = alt.Title(
            "Demand vs capacity over time",
            fontSize = 18,
            fontWeight = "normal",
            anchor = "start"
        )
    )
    .mark_bar(filled = False) # Unfilled
    .encode(
        y = alt.Y(
            "Value",
            axis = alt.Axis(
                grid = False,
                titleAnchor = "end",
                labelColor = "#888888", # Gray label and title
                titleColor = "#888888"
            ),
            scale = alt.Scale(domain = [0, 60000]),
            title = "NUMBER OF PROJECT HOURS"
        ),
        x = alt.X(
            "month",
            sort = None,
            axis = alt.Axis(
                labelAngle = 0, # Horizontal label
                titleAnchor = "start",
                labelColor = "#888888",
                titleColor = "#888888",
                ticks = False
            ),
            title = "2019"
        )
    )
)


# Capacity bar, bigger size and more transparency
capacity = (
    alt.Chart(capacity_2019)
    .mark_bar(size = 30) # Makes the bar thicker but keeps the distance the same
    .encode(
        y = alt.Y(
            "Value",
            axis = alt.Axis(
                grid = False,
                titleAnchor = "end",
                labelColor = "#888888",
                titleColor = "#888888",
                titleFontWeight = "normal"
            ),
            scale = alt.Scale(domain = [0, 60000]),
            title = "NUMBER OF PROJECT HOURS"
        ),
        x = alt.X(
            "month",
            sort = None,
            axis = alt.Axis(
                labelAngle = 0,
                titleAnchor = "start",
                labelColor = "#888888",
                titleColor = "#888888",
                titleFontWeight = "normal",
                ticks = False
            ),
            title = "2019"
        ),
        opacity = alt.value(0.5) # Makes the bar transparent
    )
)

# Labeling for subtitle
label1 = (
    alt.Chart({"values": [{"text": ["DEMAND  |"]}]})
    .mark_text(size = 10, align = "left", dx = -235, dy = -120, color = "#1f77b4")
    .encode(text = "text:N")
)

label2 = (
    alt.Chart({"values": [{"text": ["CAPACITY"]}]})
    .mark_text(size = 10, align = "left", dx = -177, dy = -120, color = "#1f77b4", fontWeight = 800)
    .encode(text = "text:N")
)

overlap = capacity + demand + label1 + label2

# Sets space (padding) between bands
overlap.configure_scale(bandPaddingInner = 0.5).configure_view(stroke=None).properties(
    height = 200
)

Visualization as depicted in the book:

![Alt text](./Images/2_4c.png)

#### Stacked bars

In the stacked bars configuration, the Demand series is replaced with Unmet Demand (i.e., Demand - Capacity), so that the total height of each stacked bar equals Demand. The colors were also adjusted: Unmet Demand is now rendered in a darker shade to emphasize it as the more meaningful metric.

In [47]:
# Stacked bar
bars = (
    alt.Chart(
        stacked_table,
        title = alt.Title(
            "Demand vs capacity over time",
            fontSize = 18,
            fontWeight = "normal",
            anchor = "start",
            offset = 10 # Offsets title in the y-axis
        )
    )
    .mark_bar(size = 25)
    .encode(
        y = alt.Y(
            "Value",
            axis = alt.Axis(
                grid = False,
                titleAnchor = "end",
                labelColor = "#888888", # Sets title and label to gray
                titleColor = "#888888",
                titleFontWeight = "normal"
            ),
            scale = alt.Scale(domain = [0, 60000]),
            title = "NUMBER OF PROJECT HOURS"
        ),
        x = alt.X(
            "month",
            sort = None,
            axis = alt.Axis(
                labelAngle = 0,
                titleAnchor = "start",
                labelColor = "#888888",
                titleColor = "#888888",
                titleFontWeight = "normal",
                ticks = False
            ),
            title = "2019"
        ),
        color = alt.Color("Metric", scale = alt.Scale(range = ["#d9dad9", "#4871b7"])),
        order = alt.Order("Metric", sort = "ascending") # Unmet demand on top
    )
)

# Border detail, makes the graph more visible
border = (
    alt.Chart(stacked_table)
    .mark_bar(size = 25, filled = False) # Makes an unfilled bar
    .encode(
        y = alt.Y(
            "Value",
            axis = alt.Axis(
                grid = False,
                titleAnchor = "start",
                labelColor = "#888888",
                titleColor = "#888888"
            ),
            scale = alt.Scale(domain = [0, 60000]),
            title = "NUMBER OF PROJECT HOURS"
        ),
        x = alt.X(
            "month",
            sort = None,
            axis = alt.Axis(
                labelAngle = 0,
                titleAnchor = "end",
                labelColor = "#888888",
                titleColor = "#888888",
                ticks = False
            ),
            title = "2019"
        ),
        order = alt.Order("Metric", sort = "ascending"),
    )
)

stacked = bars + border
stacked.configure_view(stroke = None).properties(width = 300, height = 200)
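The stacking works because the two segments add up to the original Demand. A small check on the first five months (values from the tables above) makes this explicit; `stack_top` is an illustrative name.

```python
import pandas as pd

# First five months of the exercise table (values from the output above)
table = pd.DataFrame({
    "DATE": ["2019-04", "2019-05", "2019-06", "2019-07", "2019-08"],
    "CAPACITY": [29263, 28037, 21596, 25895, 25813],
    "DEMAND": [46193, 49131, 50124, 48850, 47602],
})
table["UNMET DEMAND"] = table["DEMAND"] - table["CAPACITY"]

# Stacking Capacity and Unmet Demand: the top of each bar is the sum of both segments
stack_top = table[["CAPACITY", "UNMET DEMAND"]].sum(axis=1)

# That sum equals the original Demand, so the stacked bar still shows total demand
print((stack_top == table["DEMAND"]).all())
```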


#### Dot plot

For the next graph, a dot plot, the author describes the challenges she faced in Excel. To create the circles, she used the data markers of two line graphs and hid the lines themselves. The band connecting the dots was built from a stacked bar of Unmet Demand sitting on top of a transparent Capacity series.

This is a noteworthy example of Excel's limitations with charts that are not built into the tool. While certain graphs may be more straightforward in Excel, unconventional visualizations can demand intricate and obscure workarounds. In Altair, whose approach to data visualization is more flexible, documentation for this graph was readily available.

In [48]:
# Version with unfilled Demand dots
dots1 = (
    alt.Chart(
        bar_table,
        title = alt.Title(
            "Demand vs capacity over time", # Set title
            fontSize = 18,
            fontWeight = "normal",
            anchor = "start",
            offset = 10
        )
    )
    .mark_circle(size = 600, opacity = 1) # Maximum opacity
    .encode(
        y = alt.Y(
            "Value",
            axis = alt.Axis(
                grid = False,
                titleFontWeight = "normal",
                titleColor = "#888888",
                titleAnchor = "start",
                ticks = False, 
                labels = False, # Removes labels from axis
                domain = False # Removes line from axis
            ),
            scale = alt.Scale(domain = [0, 60000]),
            title = "# OF PROJECT HOURS"
        ),
        x = alt.X(
            "month",
            sort = None,
            axis = alt.Axis(
                labelAngle = 0,
                titleAnchor = "start",
                titleFontWeight = "normal",
                labelColor = "#888888",
                labelPadding = 10, # Makes label more distant to the axis
                titleColor = "#888888",
                ticks = False
            ),
            title = "2019"
        ),
        color = alt.Color("Metric", scale = alt.Scale(range = ["#4871b7"]), legend = None) 
    )
    .properties(width = 400, height = 250)
    .transform_filter(alt.datum.Metric == "CAPACITY") # Alternative way to filter, similar to using auxiliary table
)

dots2 = ( # Same graph but to Demand
    alt.Chart(
        bar_table,
        title = alt.Title(
            "Demand vs capacity over time",
            fontSize = 18,
            fontWeight = "normal",
            anchor = "start",
            offset = 10
        )
    )
    .mark_circle(size = 600, opacity = 1, filled = False) # Unfilled
    .encode(
        y = alt.Y(
            "Value",
            axis = alt.Axis(
                grid = False,
                titleFontWeight = "normal",
                titleColor = "#888888",
                titleAnchor = "start",
                ticks = False,
                labels = False,
                domain = False
            ),
            scale = alt.Scale(domain = [0, 60000]),
            title = "# OF PROJECT HOURS"
        ),
        x = alt.X(
            "month",
            sort = None,
            axis = alt.Axis(
                labelAngle = 0,
                titleAnchor = "start",
                titleFontWeight = "normal",
                labelColor = "#888888",
                labelPadding = 10, # Makes label more distant to axis
                titleColor = "#888888",
                ticks = False
            ),
            title = "2019"
        ),
        color = alt.Color("Metric", scale = alt.Scale(range = ["#b4c6e4"]), legend = None) # Lighter blue
    )
    .properties(width = 400, height = 250)
    .transform_filter(alt.datum.Metric == "DEMAND")
)

# Lines between dots
line = (
    alt.Chart(bar_table)
    .mark_line(strokeWidth = 25, opacity = 0.25) # Sets width and transparency
    .encode(x = alt.X("month", sort = None), y = "Value", detail = "month") # detail = month makes a line per month
)                                                                           # instead of a single one

# Text inside the dots
text = (
    alt.Chart(bar_table)
    .mark_text()
    .encode(
        x = alt.X("month", sort = None),
        y = "Value",
        text = alt.Text("Value:Q", format = ".2s"), # Formats 10000 as 10k
        color = alt.condition(
            alt.datum.Metric == "DEMAND", alt.value("black"), alt.value("white") # Set color depending on metric
        )
    )
)

# Set legend for Metric
label1 = (
    alt.Chart({"values": [{"text": ["DEMAND"]}]})
    .mark_text(size = 11, align = "left", dx = 200, dy = -17, color = "#4871b7")
    .encode(text = "text:N")
)

label2 = (
    alt.Chart({"values": [{"text": ["CAPACITY"]}]})
    .mark_text(size = 11, align = "left", dx = 200, dy = 25, color = "#4871b7", fontWeight = "bold")
    .encode(text = "text:N")
)

dot_plot = line + dots1 + dots2 + text + label1 + label2
dot_plot.configure_view(stroke = None)


In [49]:
# Dot plot, filled-only version
dots = (
    alt.Chart(
        bar_table,
        title = alt.Title(
            "Demand vs capacity over time",
            fontSize = 18,
            fontWeight = "normal",
            anchor = "start",
            offset = 10
        ),
    )
    .mark_circle(size = 600, opacity = 1) # Max opacity, filled by default
    .encode(
        y = alt.Y(
            "Value",
            axis = alt.Axis(
                grid = False,
                titleFontWeight = "normal",
                titleColor = "#888888",
                titleAnchor = "start",
                ticks = False,
                labels = False,
                domain = False
            ),
            scale = alt.Scale(domain = [0, 60000]),
            title = "# OF PROJECT HOURS"
        ),
        x = alt.X(
            "month",
            sort = None,
            axis = alt.Axis(
                labelAngle = 0,
                titleAnchor = "start",
                titleFontWeight = "normal",
                labelColor = "#888888",
                labelPadding = 10,
                titleColor = "#888888",
                ticks = False
            ),
            title = "2019"
        ),
        color = alt.Color(
            "Metric", scale = alt.Scale(range = ["#4871b7", "#b4c6e4"]), legend = None
        )
    )
    .properties(width = 400, height = 250)
)

dot_plot = line + dots + text + label1 + label2
dot_plot.configure_view(stroke = None)

Visualization as depicted in the book:

![Alt text](./Images/2_4e.png)

#### Graph the difference

For the final visualization, we chose a simple line plot of the unmet demand. Although minimalist and clean, this choice hides the actual values of demand and capacity.

In [50]:
alt.Chart(
    unmet_2019,
    title = alt.Title(
        "Unmet demand over time",
        fontSize = 18,
        fontWeight = "normal",
        anchor = "start",
        offset = 10
    ),
).mark_line().encode(
    y = alt.Y(
        "Value",
        axis = alt.Axis(
            grid = False,
            titleAnchor = "end",  
            labelColor = "#888888",
            titleColor = "#888888",
            titleFontWeight = "normal"
        ),
        title = "NUMBER OF PROJECT HOURS"
    ),
    x = alt.X(
        "month",
        sort = None,
        axis = alt.Axis(
            labelAngle = 0,
            titleAnchor = "start",
            labelColor = "#888888",
            titleColor = "#888888",
            titleFontWeight = "normal",
            ticks = False
        ),
        title="2019",
    ),
    strokeWidth = alt.value(3) # Set thickness of the line
).properties(
    width = 375, height = 250
).configure_view(
    stroke = None
)

Visualization as depicted in the book:

![Alt text](./Images/2_4f.png)

#### Cell to create the .html

In [51]:
# Saves the notebook before making the .html file
keyboard.press_and_release('ctrl+s')

This code was created by Søren Fuglede Jørgensen and can be found [here](https://github.com/jupyter/nbconvert/issues/699#issuecomment-372441219).

In [52]:
# Read and write as UTF-8 so non-ASCII characters survive on systems with another default encoding
with open('index.ipynb', encoding='utf-8') as nb_file:
    nb_contents = nb_file.read()

# Convert using the ordinary exporter
notebook = nbformat.reads(nb_contents, as_version=4)

# HTML Export
html_exporter = nbconvert.HTMLExporter()
body, res = html_exporter.from_notebook_node(notebook)

# Create a dict mapping all image attachments to their base64 representations
images = {}
for cell in notebook['cells']:
    if 'attachments' in cell:
        attachments = cell['attachments']
        for filename, attachment in attachments.items():
            for mime, base64 in attachment.items():
                images[f'attachment:{filename}'] = f'data:{mime};base64,{base64}'

# Fix up the HTML and write it to disk
for src, base64 in images.items():
    body = body.replace(f'src="{src}"', f'src="{base64}"')

# Write HTML to file
with open('index.html', 'w', encoding='utf-8') as html_output_file:
    html_output_file.write(body)