# Storytelling with Data! in Altair

by Maisa de Oliveira Fraiz

## Introduction

This project aims to replicate the examples from Cole Nussbaumer's book, "Storytelling with Data - Let's Practice!", using `Python Altair`. Our primary objective is to document the reasoning behind the modifications proposed by the author, while also highlighting the challenges that arise when transitioning from the book's Excel-based approach to programming in a different software environment.

`Altair` was selected for this project because of its declarative syntax, interactivity, grammar-of-graphics foundation, and compatibility with `Streamlit` and other web tools, all within the user-friendly Python environment. Anticipated challenges include Altair's smaller documentation base and developer community compared with more established libraries such as `Matplotlib`, `Seaborn`, or `Plotly`. Furthermore, tasks that appear straightforward in Excel may require multiple iterations to translate effectively into code.


## Imports

In [36]:
import pandas as pd
import numpy as np
import altair as alt

## Chapter 3 - Identify and Eliminate Clutter

*"This lesson is simple but the impact is huge: get rid of the stuff that doesn’t need
to be there"* - Cole Nussbaumer

### Exercise 3.2 - how can we tie words to the graph?

The data for this exercise can be found here: https://www.storytellingwithdata.com/letspractice/downloads

In [37]:
# Select only the data cells, skipping the empty rows and columns (NaN) caused by the Excel formatting
table = pd.read_excel(r"..\..\Data\3.2 EXERCISE.xlsx", usecols = [1, 2, 3], header = 4, skipfooter = 6)
table

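|   | 2019 | Rate | # exits |
|---|------|------|---------|
| 0 | JAN | 0.004 | 120 |
| 1 | FEB | 0.001 | 30 |
| 2 | MAR | 0.0015 | 45 |
| 3 | APR | 0.008 | 240 |
| 4 | MAY | 0.003 | 90 |
| 5 | JUN | 0.0014 | 42 |
| 6 | JUL | 0.0044 | 132 |
| 7 | AUG | 0.005 | 150 |
| 8 | SEP | 0.0022 | 66 |
| 9 | OCT | 0.0015 | 45 |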


The column name `2019` is currently an integer, which can cause problems when the column is referenced in a chart encoding. To avoid potential complications, we will rename the column using a string.

In [38]:
#alt.Chart(table).mark_bar().encode(
#    x = alt.X('2019'), 
#    y = alt.Y('Rate')
#    )
    

In [39]:
table.rename(columns = {2019:'Date'}, inplace = True)
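
When a table has several non-string column labels, renaming them one by one gets tedious. The sketch below shows both options on a small stand-in DataFrame (not the exercise file); the names `demo`, `demo_renamed`, and `demo_str` are made up for illustration.

```python
import pandas as pd

# Stand-in for the exercise table: the first column label is the integer 2019
demo = pd.DataFrame({2019: ["JAN", "FEB"], "Rate": [0.004, 0.001]})

# Option 1: rename one specific column (the approach used in the notebook)
demo_renamed = demo.rename(columns={2019: "Date"})

# Option 2: convert every column label to a string in one pass
demo_str = demo.rename(columns=str)

print(list(demo_renamed.columns))
print(list(demo_str.columns))
```

Option 2 keeps the original label text, so the column would then be referenced as `'2019'` instead of `'Date'`.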

### Gestalt Principles

The main focus of this exercise is to apply the Gestalt Principles of Visual Perception to declutter graphs. Four principles will be demonstrated, and each of them will be clarified through the visualization employing it.

In [40]:
# Graph with cluttered text

bar = alt.Chart(table, title = alt.Title(
    "2019 monthly voluntary attrition rate",
    fontSize = 15,
    anchor = 'start',
    offset = 10,
    fontWeight = "normal"
    )).mark_bar(size = 20, color = "#b0b0b0").encode(
    x = alt.X('Date', 
              sort = None, 
              axis = alt.Axis(labelAngle = 0, titleX = 12, 
                              labelColor = "#888888", titleColor = '#888888', 
                              ticks = False, titleAnchor = 'start', titleFontWeight = 'normal'), 
              title = "2019"
              ), 
    y = alt.Y('Rate', 
              axis = alt.Axis(grid = False, titleAnchor = 'end', 
                              labelColor = "#888888", titleColor = '#888888', 
                              titleFontWeight = 'normal', format = "%", tickCount = 10), 
              scale = alt.Scale(domain = [0, 0.01]), 
              title = "ATTRITION RATE"
              ),
    ).properties(
    width = 300,
    height = 200
)

text = alt.Chart(
    {"values": [{"text":  ['Highlights:', 
         ' ',
         'In April there was a', 'reorganization. No jobs', 'were eliminated, but many', 'people chose to leave.', 
         ' ',
         'Attrition rates tend to be', 'higher in the Summer',  'months when it is', 
         'common for associates',
         'to leave to go back to', 
         'school.',
         ' ',
         'Attrition is typically low in', 'November and December', 'due to the holidays.']}]}
).mark_text(size = 11, align = "left", dy = -20, dx = -10).encode(
    text = "text:N"
)


# The | operator concatenates the charts horizontally, placing the text next to the bar chart

final = bar | text

final.configure_view(stroke = None)

![Alt text](\Images\3_2a.png)

#### Proximity

The "Proximity Principle" says that we tend to associate objects close to each other as being part of a single group. To apply this in our graph, we bring the texts near the data they represent.

In [41]:
# The text now needs to be broken into parts

text_april = alt.Chart(
    {"values": [{"text":  [
         'In April there was a', 'reorganization. No jobs', 'were eliminated, but many', 'people chose to leave.', 
         ]}]}
).mark_text(size = 11, align = "left", dx = -145, dy = -105).encode(
    text = "text:N"
)

text_summer = alt.Chart(
    {"values": [{"text":  [
        'Attrition rates tend to be', 'higher in the Summer',  'months when it is', 
         'common for associates to',
         'leave to go back to', 
         'school.'
         ]}]}
).mark_text(size = 11, align = "left", dx = -10, dy = -65).encode(
    text = "text:N"
)

text_nov_dec = alt.Chart(
    {"values": [{"text":  [
        'Attrition is', 'typically low in', 'November &', 'December due', 'to the holidays.'
         ]}]}
).mark_text(size = 11, align = "right", dx = 150, dy = 5).encode(
    text = "text:N"
)

# Now we layer the charts with +, so that the texts lie on top of the bar chart instead of next to it

final = bar + text_april + text_summer + text_nov_dec

final.configure_view(stroke = None)

![Alt text](\Images\3_2b.png)
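
The line breaks in the text blocks above were typed by hand. As a rough alternative, the standard library's `textwrap` module can split a sentence into lines of a chosen maximum length. This is only a sketch: the width of 25 characters is an assumption picked by eye, and character counts only approximate rendered width in a proportional font.

```python
import textwrap

note = ("In April there was a reorganization. No jobs were eliminated, "
        "but many people chose to leave.")

# Break the sentence into lines of at most 25 characters
lines = textwrap.wrap(note, width=25)

# Same structure the notebook passes to alt.Chart: a single record whose
# "text" field is a list, with one list item per rendered line
text_data = {"values": [{"text": lines}]}

for line in lines:
    print(line)
```

With this width, the sentence splits into the same four lines used for `text_april`.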

#### Proximity with emphasis

We can enhance the visual impact by emphasizing the bars and keywords.

Since Altair cannot make individual words bold within a text mark, one strategy is to leave blank space in the regular text and overlay a separate text mark containing the bold keyword.

In [42]:
bar_highlight = alt.Chart(table, title = alt.Title(
    "2019 monthly voluntary attrition rate",
    fontSize = 15,
    anchor = 'start',
    offset = 10,
    fontWeight = "normal"
    )).mark_bar(size = 20).encode(
    x = alt.X('Date', 
              sort = None, 
              axis = alt.Axis(labelAngle = 0, titleX = 12, 
                              labelColor = "#888888", titleColor = '#888888', 
                              ticks = False, titleAnchor = 'start', titleFontWeight = 'normal'), 
              title = "2019"
              ), 
    y = alt.Y('Rate', 
              axis = alt.Axis(grid = False, titleAnchor = 'end', 
                              labelColor = "#888888", titleColor = '#888888', 
                              titleFontWeight = 'normal', format = "%", tickCount = 10), 
              scale = alt.Scale(domain = [0, 0.01]), 
              title = "ATTRITION RATE"
              ),
    color = alt.Color('Date',
                      sort = None,
                      scale = alt.Scale(range = ["#b0b0b0", "#b0b0b0", "#b0b0b0", "#666666", 
                                                 "#b0b0b0", "#b0b0b0", "#666666", "#666666", 
                                                 "#b0b0b0", "#b0b0b0", "#666666", "#666666"]),
                      legend = None
                      )
    ).properties(
    width = 300,
    height = 200
)

text_april_blank = alt.Chart(
    {"values": [{"text":  [
         'In            there was a', 'reorganization. No jobs', 'were eliminated, but many', 'people chose to leave.', 
         ]}]}
).mark_text(size = 11, align = "left", dx = -145, dy = -105).encode(
    text = "text:N"
)

text_summer_blank = alt.Chart(
    {"values": [{"text":  [
        'Attrition rates tend to be', 'higher in the',  'months when it is', 
         'common for associates to',
         'leave to go back to', 
         'school.'
         ]}]}
).mark_text(size = 11, align = "left", dx = -10, dy = -65).encode(
    text = "text:N"
)

text_nov_dec_blank = alt.Chart(
    {"values": [{"text":  [
        'Attrition is', 'typically low in', '&', 'due', 'to the holidays.'
         ]}]}
).mark_text(size = 11, align = "right", dx = 150, dy = 5).encode(
    text = "text:N"
)

text_april_bold = alt.Chart(
    {"values": [{"text":  [
        'April'
         ]}]}
).mark_text(size = 11, align = "left", dx = -133, dy = -105, fontWeight = 800).encode(
    text = "text:N"
)

text_summer_bold = alt.Chart(
    {"values": [{"text":  [
        'Summer'
         ]}]}
).mark_text(size = 11, align = "left",  dx = 54, dy = -52, fontWeight = 800).encode(
    text = "text:N"
)

text_nov_bold = alt.Chart(
    {"values": [{"text":  [
        'November'
         ]}]}
).mark_text(size = 11, align = "left", dx = 80, dy = 31, fontWeight = 800).encode(
    text = "text:N"
)

text_dec_bold = alt.Chart(
    {"values": [{"text":  [
        'December'
         ]}]}
).mark_text(size = 11, align = "left", dx = 68, dy = 44, fontWeight = 800).encode(
    text = "text:N"
)

final = (bar_highlight + text_april_blank + text_april_bold + 
         text_summer_blank + text_summer_bold + 
         text_nov_dec_blank + text_nov_bold + text_dec_bold)

final.configure_view(stroke = None)

![Alt text](\Images\3_2c.png)
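
The blank gaps above were typed manually. A small helper can generate them instead; this is a hypothetical sketch, and `blank_keyword` and its `spaces_per_char` parameter are made-up names. The ratio of two spaces per letter is a guess at how wide a letter is relative to a space in the chart font, so it would still need tuning by eye.

```python
def blank_keyword(line, keyword, spaces_per_char=2):
    """Replace the first occurrence of `keyword` in `line` with spaces.

    `spaces_per_char` approximates how many spaces are as wide as one
    letter in the chart font (an assumption, not a measured value).
    """
    gap = " " * (spaces_per_char * len(keyword))
    return line.replace(keyword, gap, 1)

plain = blank_keyword("In April there was a", "April")
print(repr(plain))
```

The bold keyword is then drawn by a second text mark positioned over the gap, as in the cell above.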

#### Similarity

The "Similarity Principle" pertains to our tendency to perceive objects as part of the same group when they share similar color, shape, or size. For this example, this means coloring the columns in the same shade as the chosen keywords.

In [43]:
bar_highlight_color = alt.Chart(table, title = alt.Title(
    "2019 monthly voluntary attrition rate",
    fontSize = 15,
    anchor = 'start',
    offset = 10,
    fontWeight = "normal"
    )).mark_bar(size = 20).encode(
    x = alt.X('Date', 
              sort = None, 
              axis = alt.Axis(labelAngle = 0, titleX = 12, 
                              labelColor = "#888888", titleColor = '#888888', 
                              ticks = False, titleAnchor = 'start', titleFontWeight = 'normal'), 
              title = "2019"
              ), 
    y = alt.Y('Rate', 
              axis = alt.Axis(grid = False, titleAnchor = 'end', 
                              labelColor = "#888888", titleColor = '#888888', 
                              titleFontWeight = 'normal', format = "%", tickCount = 10), 
              scale = alt.Scale(domain = [0, 0.01]), 
              title = "ATTRITION RATE"
              ),
    color = alt.Color('Date',
                      sort = None,
                      scale = alt.Scale(range = ["#b0b0b0", "#b0b0b0", "#b0b0b0", "#ed1e24", 
                                                 "#b0b0b0", "#b0b0b0", "#ec7c30", "#ec7c30", 
                                                 "#b0b0b0", "#b0b0b0", "#5d9bd1", "#5d9bd1"]),
                      legend = None
                      )
    ).properties(
    width = 300,
    height = 200
)

text_april_blank2 = alt.Chart(
    {"values": [{"text":  [
         'Highlights:',
         ' ',
         'In            there was a', 'reorganization. No jobs', 'were eliminated, but many', 'people chose to leave.', 
         ]}]}
).mark_text(size = 11, align = "left", dy = -25, dx = -10).encode(
    text = "text:N"
)

text_summer_blank2 = alt.Chart(
    {"values": [{"text":  [
        'Attrition rates tend to be', 'higher in the',  'months when it is', 
         'common for associates to',
         'leave to go back to', 
         'school.'
         ]}]}
).mark_text(size = 11, align = "left", dx = -10, dy = 65).encode(
    text = "text:N"
)

text_nov_dec_blank2 = alt.Chart(
    {"values": [{"text":  [
        'Attrition is typically low in',' ', 'due to the holidays.'
         ]}]}
).mark_text(size = 11, align = "left", dx = -10, dy = 155).encode(
    text = "text:N"
)

text_april_color = alt.Chart(
    {"values": [{"text":  [
        'April'
         ]}]}
).mark_text(size = 11, align = "left", dx = 3, dy = 1, fontWeight = 800, color = '#ed1e24').encode(
    text = "text:N"
)

text_summer_color = alt.Chart(
    {"values": [{"text":  [
        'Summer'
         ]}]}
).mark_text(size = 11, align = "left",  dx = 55, dy = 78, fontWeight = 800, color = '#ec7c30').encode(
    text = "text:N"
)

text_nov_dec_color = alt.Chart(
    {"values": [{"text":  [
        'November & December'
         ]}]}
).mark_text(size = 11, align = "left", dx = -10, dy = 168, fontWeight = 800, color = '#5d9bd1').encode(
    text = "text:N"
)

# While we could have used '&' to arrange the texts vertically, 
# employing '+' provides greater flexibility in determining the layout of the text.

final = bar_highlight_color | (text_april_blank2 + text_april_color 
                                 + text_summer_blank2 + text_summer_color 
                                 + text_nov_dec_blank2 + text_nov_dec_color)

final.configure_view(stroke = None)



![Alt text](\Images\3_2d.png)
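
Writing out twelve color codes by hand makes it easy to misplace one. The sketch below builds the same list from a mapping of highlighted months. It assumes the table holds the twelve months in calendar order, as the twelve-entry range above suggests; `MONTHS`, `build_color_range`, and `highlights` are illustrative names.

```python
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

def build_color_range(highlights, default="#b0b0b0", order=MONTHS):
    """Return one color per category, following the axis order."""
    return [highlights.get(month, default) for month in order]

highlights = {
    "APR": "#ed1e24",                    # reorganization
    "JUL": "#ec7c30", "AUG": "#ec7c30",  # summer
    "NOV": "#5d9bd1", "DEC": "#5d9bd1",  # holidays
}

color_range = build_color_range(highlights)
print(color_range)
```

The resulting list can be passed as `alt.Scale(range = color_range)` in place of the hand-written one.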

#### Enclosure

The "Enclosure Principle" says simply that, when objects are enclosed together, we perceive them as belonging to the same group.

Attempting to combine charts using the expression `(bar | text) + rect_nov_dec + rect_summer + rect_april` results in an error: "Concatenated charts cannot be layered. Instead, layer the charts before concatenating." The most straightforward way to solve this is to add the text to the bar using `bar + text`, but doing so means assigning another position (dx, dy) to the text.

In [44]:

text_enclosure = alt.Chart(
    {"values": [{"text":  ['Highlights:', 
         ' ',
         'In April there was a', 'reorganization. No jobs', 'were eliminated, but many', 'people chose to leave.', 
         ' ',
         'Attrition rates tend to be', 'higher in the Summer',  'months when it is', 
         'common for associates',
         'to leave to go back to', 
         'school.',
         ' ',
         'Attrition is typically low in', 'November and December', 'due to the holidays.']}]}
).mark_text(size = 11, align = "left", dx = 160, dy = -113).encode(
    text = "text:N"
)

rect_nov_dec = alt.Chart(pd.DataFrame({'y': [0], 'y2':[0.0019], 'x': [10], 'x2': [8.4] })).mark_rect(opacity = 0.2).encode(
   y = 'y', y2 = 'y2', x = alt.X('x', axis = None), x2 = 'x2'
)

rect_summer = alt.Chart(pd.DataFrame({'y': [0.0023], 'y2':[0.0063], 'x': [10], 'x2': [5.1] })).mark_rect(opacity = 0.2).encode(
   y = 'y', y2 = 'y2', x = alt.X('x', axis = None), x2 = 'x2'
)

rect_april = alt.Chart(pd.DataFrame({'y': [0.0068], 'y2':[0.0095], 'x': [10], 'x2': [2.6] })).mark_rect(opacity = 0.2).encode(
   y = 'y', y2 = 'y2', x = alt.X('x', axis = None), x2 = 'x2'
)

bar + text_enclosure + rect_nov_dec + rect_summer + rect_april

Utilizing a DataFrame to define the rectangles seems to prevent them from reaching the text section. As a next step, we will explicitly define the coordinates of the rectangles in pixels.

In [45]:

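# alt.value() coordinates are pixels measured from the top-left corner of the view,
# so y = 5 is near the top of the chart and larger y values sit lower
rect_nov_dec = alt.Chart(pd.DataFrame({'values':[{}]})).mark_rect(opacity = 0.2, color = "#b0b0b0").encode(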
   y = alt.value(5), y2 = alt.value(60), x = alt.value(75), x2 = alt.value(440)
)

rect_summer = alt.Chart(pd.DataFrame({'values':[{}]})).mark_rect(opacity = 0.2, color = "#b0b0b0").encode(
   y = alt.value(70), y2 = alt.value(150), x = alt.value(150), x2 = alt.value(440)
)

rect_april = alt.Chart(pd.DataFrame({'values':[{}]})).mark_rect(opacity = 0.2, color = "#b0b0b0").encode(
   y = alt.value(160), y2 = alt.value(202), x = alt.value(250), x2 = alt.value(440)
)

# The text_enclosure and bar come after the rectangles so that they sit on top
final = rect_nov_dec + rect_summer + rect_april + bar + text_enclosure

final.configure_view(stroke = None)

![Alt text](\Images\3_2e.png)

#### Enclosure with color differentiation

We can use color to emphasize the different enclosures.

In [46]:
rect_nov_dec_color = alt.Chart(pd.DataFrame({'values':[{}]})).mark_rect(opacity = 0.2, color = '#ed1e24').encode(
   y = alt.value(5), y2 = alt.value(60), x = alt.value(75), x2 = alt.value(450)
)

rect_summer_color = alt.Chart(pd.DataFrame({'values':[{}]})).mark_rect(opacity = 0.2, color = '#ec7c30').encode(
   y = alt.value(70), y2 = alt.value(150), x = alt.value(150), x2 = alt.value(450)
)

rect_april_color = alt.Chart(pd.DataFrame({'values':[{}]})).mark_rect(opacity = 0.2, color = '#5d9bd1').encode(
   y = alt.value(160), y2 = alt.value(202), x = alt.value(250), x2 = alt.value(450)
)

final = rect_nov_dec_color + rect_summer_color + rect_april_color + bar + text_enclosure

final.configure_view(stroke = None)

![Alt text](\Images\3_2f.png)

#### Enclosure + Similarity

We can combine different principles, such as enclosure with similarity (using the rectangles and the colored keywords).

In [47]:
bar_highlight_color + rect_april_color | (text_april_blank2 + text_april_color 
                                 + text_summer_blank2 + text_summer_color 
                                 + text_nov_dec_blank2 + text_nov_dec_color)

In [48]:
bar_highlight_color | (text_april_blank2 + text_april_color 
                       + text_summer_blank2 + text_summer_color 
                       + text_nov_dec_blank2 + text_nov_dec_color) + rect_april_color

Since simply combining the existing chart objects does not produce the desired layout, we will recreate the texts with new positions so that everything can be layered using the `+` operator.

In [49]:
text_april_blank2_enclosure = alt.Chart(
    {"values": [{"text":  [
         'Highlights:',
         ' ',
         'In            there was a', 'reorganization. No jobs', 'were eliminated, but many', 'people chose to leave.', 
         ]}]}
).mark_text(size = 11, align = "left", dx = 160, dy = -113).encode(
    text = "text:N"
)

text_summer_blank2_enclosure = alt.Chart(
    {"values": [{"text":  [
        'Attrition rates tend to be', 'higher in the',  'months when it is', 
         'common for associates to',
         'leave to go back to', 
         'school.'
         ]}]}
).mark_text(size = 11, align = "left", dx = 160, dy = -21).encode(
    text = "text:N"
)

text_nov_dec_blank2_enclosure = alt.Chart(
    {"values": [{"text":  [
        'Attrition is typically low in',' ', 'due to the holidays.'
         ]}]}
).mark_text(size = 11, align = "left", dx = 160, dy = 68).encode(
    text = "text:N"
)

text_april_color_enclosure = alt.Chart(
    {"values": [{"text":  [
        'April'
         ]}]}
).mark_text(size = 11, align = "left", dx = 172, dy = -87, fontWeight = 800, color = '#ed1e24').encode(
    text = "text:N"
)

text_summer_color_enclosure = alt.Chart(
    {"values": [{"text":  [
        'Summer'
         ]}]}
).mark_text(size = 11, align = "left",  dx = 225, dy = -8, fontWeight = 800, color = '#ec7c30').encode(
    text = "text:N"
)

text_nov_dec_color_enclosure = alt.Chart(
    {"values": [{"text":  [
        'November & December'
         ]}]}
).mark_text(size = 11, align = "left", dx = 160, dy = 82, fontWeight = 800, color = '#5d9bd1').encode(
    text = "text:N"
)

final = (rect_nov_dec_color + rect_summer_color + rect_april_color + 
         bar_highlight_color + text_april_blank2_enclosure + 
         text_summer_blank2_enclosure + text_nov_dec_blank2_enclosure +
         text_april_color_enclosure + text_summer_color_enclosure + 
         text_nov_dec_color_enclosure
         )

final.configure_view(stroke = None)

![Alt text](\Images\3_2g.png)

#### Connection

The "Connection Principle" relies on the fact that objects that are physically connected are often perceived as part of a single group. In this example, we will connect the texts and the data using a line.

In [50]:
rule_april = alt.Chart().mark_rule(point={
      "fill": "gray"
    }).encode(
    x = alt.value(102),
    y = alt.value(45),
    x2 = alt.value(300),
    strokeWidth = alt.value(0.5))

rule_summer = alt.Chart().mark_rule(point={
      "fill": "gray"
    }).encode(
    x = alt.value(202),
    y = alt.value(105),
    x2 = alt.value(300),
    strokeWidth = alt.value(0.5))

rule_nov_dec_1 = alt.Chart().mark_rule().encode(
    x = alt.value(260),
    y = alt.value(175),
    x2 = alt.value(300),
    strokeWidth = alt.value(0.5))

rule_nov_dec_2 = alt.Chart().mark_rule(point={
      "fill": "gray"
    }).encode(
    x = alt.value(260),
    y = alt.value(185),
    y2 = alt.value(175),
    strokeWidth = alt.value(0.5))

final = bar + text_enclosure + rule_april + rule_summer + rule_nov_dec_1 + rule_nov_dec_2

final.configure_view(stroke = None)


![Alt text](\Images\3_2h.png)
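
The pixel positions of the rules were tuned by hand. A rough starting point can be computed from the chart dimensions. The helper below is a hypothetical sketch (`bar_anchor_px` is a made-up name) that assumes the bars are centered in twelve evenly sized bands and that pixel coordinates start at the top-left corner of the view. It reuses the width, height, y-domain, and bar size from the chart definition above.

```python
def bar_anchor_px(index, n_bars, rate, *, width=300, height=200,
                  y_max=0.01, bar_size=20):
    """Estimate the pixel position of a bar's top-right corner.

    Assumes evenly sized bands with centered bars and a y scale that
    runs from 0 at the bottom of the view to `y_max` at the top.
    """
    step = width / n_bars
    x_right = (index + 0.5) * step + bar_size / 2
    y_top = height * (1 - rate / y_max)
    return x_right, y_top

# April is the 4th of 12 months (index 3) with a rate of 0.8%
x_apr, y_apr = bar_anchor_px(index=3, n_bars=12, rate=0.008)
print(round(x_apr, 1), round(y_apr, 1))
```

For April this gives roughly (97.5, 40), close to the hand-tuned start of `rule_april` at (102, 45), so the estimate is best treated as a first guess to refine visually.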

#### Connection + Similarity

Now we can use the connection and the similarity principles to connect highlighted texts with colored data.

In [51]:
final = (bar_highlight_color + text_april_blank2_enclosure + 
         text_summer_blank2_enclosure + text_nov_dec_blank2_enclosure +
         text_april_color_enclosure + text_summer_color_enclosure + 
         text_nov_dec_color_enclosure + rule_april + rule_summer + rule_nov_dec_1 + rule_nov_dec_2
         )

final.configure_view(stroke = None)

![Alt text](\Images\3_2i.png)

In [136]:
bar_base = alt.Chart(table, title = alt.Title(
    "2019 monthly voluntary attrition rate",
    fontSize = 15,
    anchor = 'start',
    offset = 10,
    fontWeight = "normal"
    )).mark_bar(size = 20).encode(
    x = alt.X('Date', 
              sort = None, 
              axis = alt.Axis(labelAngle = 0, titleX = 12, 
                              labelColor = "#888888", titleColor = '#888888', 
                              ticks = False, titleAnchor = 'start', titleFontWeight = 'normal'), 
              title = "2019"
              ), 
    y = alt.Y('Rate', 
              axis = alt.Axis(grid = False, titleAnchor = 'end', 
                              labelColor = "#888888", titleColor = '#888888', 
                              titleFontWeight = 'normal', format = "%", tickCount = 10), 
              scale = alt.Scale(domain = [0, 0.01]), 
              title = "ATTRITION RATE"
              ),
    color = alt.Color('Date',
                      sort = None,
                      scale = alt.Scale(range = ["#b0b0b0"]),
                      legend = None
                      )
    ).properties(
    width = 300,
    height = 200
)

bar_apr = alt.Chart(table).mark_bar(size = 20).encode(
    x = alt.X('Date', 
              sort = None
              ), 
    y = alt.Y('Rate',
              scale = alt.Scale(domain = [0, 0.01])
              ),
    color = alt.value("#666666"),
    tooltip = alt.value('In April there was a reorganization. No jobs were eliminated, but many people chose to leave.')
    ).transform_filter(
    alt.FieldEqualPredicate(field='Date', equal='APR')
    )

bar_jul_aug = alt.Chart(table).mark_bar(size = 20).encode(
    x = alt.X('Date', 
              sort = None
              ), 
    y = alt.Y('Rate',
              scale = alt.Scale(domain = [0, 0.01])
              ),
    color = alt.value("#666666"),
    tooltip = alt.value('Attrition rates tend to be higher in the Summer months when it is common for associates to leave to go back to school.')
    ).transform_filter(
    alt.FieldOneOfPredicate(field='Date', oneOf=['JUL','AUG'])
    )

bar_dec_nov = alt.Chart(table).mark_bar(size = 20).encode(
    x = alt.X('Date', 
              sort = None
              ), 
    y = alt.Y('Rate',
              scale = alt.Scale(domain = [0, 0.01])
              ),
    color = alt.value("#666666"),
    tooltip = alt.value('Attrition is typically low in November and December due to the holidays.')
    ).transform_filter(
    alt.FieldOneOfPredicate(field='Date', oneOf=['NOV','DEC'])
    )


final = bar_base + bar_apr + bar_jul_aug + bar_dec_nov
final.configure_view(stroke = None)