# Pre-Class Tutorial 3: Temporal Viz and Styling Basics

## Overview
This pre-class material introduces temporal visualizations such as the area and line chart.
In addition, it focuses on how to describe the formatting structure of your visualizations.

**Time Required:** 90 - 120 minutes
**Pre-reqs:** Worked through Tutorials 1 and 2 and Class Starter 2.

---

## **Learning Goals**
Those who actively work through this tutorial will be able to:
- Use graphical marks to create temporal visualizations (`mark_line()`, `mark_area()`, `mark_rect()`)
- Create visualizations for sequential temporal tasks (line charts, area charts, stacked area charts)
- Master chart styling techniques (dimensions, axis titles, legends, tooltips)
- Create descriptive axis titles with proper units and formatting
- Apply professional styling techniques to bar charts and scatter plots

---

## Temporal Data

Temporal is derived from the Latin word _tempus_, which means time. All data that we encounter has a temporal aspect to it; for instance, data is collected at a specific point in time. However, when the term temporal is used, it typically refers to data that varies across time, otherwise known as **time-varying data**. Tasks for visualizing temporal data fall loosely into two main groups: sequential and cyclic.

Sequential tasks explore attributes that are unique, mostly consecutive, and unrepeating.
For example:
_How did the price of tea change over the last century?_
In this example we could encode the price of tea for each year [2021, 2020, 2019, 2018, ...].

Cyclic tasks explore attributes that repeat and have a limited number of options (i.e., a fixed range).
For example:
_Which day of the week has the highest energy consumption?_
In this example we would encode energy consumption for each day [Sun, Mon, Tue, ...] over a fixed time period (e.g., a month or a year).
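The difference between the two kinds of task can be sketched in pandas. The readings below are invented purely for illustration (they are not part of the Gapminder data): the sequential view keeps each year as its own step, while the cyclic view folds every date into a repeating day-of-week category.

```python
import pandas as pd

# Hypothetical daily readings, made up for this sketch
readings = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-04", "2021-01-05", "2021-01-11", "2022-01-03"]),
    "energy_kwh": [10.0, 12.0, 14.0, 9.0],
})

# Sequential view: each year is unique and ordered
by_year = readings.groupby(readings["date"].dt.year)["energy_kwh"].sum()

# Cyclic view: dates fold into a fixed set of repeating categories
by_weekday = readings.groupby(readings["date"].dt.day_name())["energy_kwh"].mean()

print(by_year)
print(by_weekday)
```

Three of the four dates fall on a Monday, so the cyclic view averages them together even though they come from different weeks and years.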

The preceding notebooks presented the `mark_point()`, `mark_circle()`, `mark_square()`, and `mark_bar()` marks used in Altair. With these marks you were able to create scatter plots, bubble plots, and bar charts and their many variations. In this notebook you will be exposed to graphical marks that support the creation of visualizations used to make sense of time-varying data.

- `mark_line()` - Connected line segments.
- `mark_area()` - Filled areas defined by a top-line and a baseline.
- `mark_rect()` - Filled rectangles, useful for heatmaps.

For a complete list, and links to examples, see the [Altair marks documentation](https://altair-viz.github.io/user_guide/marks.html).

## Environment and Data Setup

In [None]:
import pandas as pd
import altair as alt

### Global Development Data
Once again, we will be using the global health and population data for a number of countries. <br>The data was collected by the Gapminder Foundation.


| Column                | Description                                                                                  |
|-----------------------|----------------------------------------------------------------------------------------------|
| country               | Country name                                                                                 |
| year                  | Year of observation                                                                          |
| population            | Population in the country at each year                                                       |
| region                | Continent the country belongs to                                                             |
| sub_region            | Sub-region the country belongs to                                                            |
| income_group          | Income group                                                                                 |
| life_expectancy       | The mean number of years a newborn would <br>live if mortality patterns remained constant    |
| income                | GDP per capita (in USD) <em>adjusted <br>for differences in purchasing power</em>            |
| children_per_woman    | Average number of children born per woman                                                    |
| child_mortality       | Deaths of children under 5 years <br>of age per 1000 live births                             |
| pop_density           | Average number of people per km<sup>2</sup>                                                  |
| co2_per_capita        | CO2 emissions from fossil fuels (tonnes per capita)                                          |
| years_in_school_men   | Mean number of years in primary, secondary,<br>and tertiary school for 25-36 year old men    |
| years_in_school_women | Mean number of years in primary, secondary,<br>and tertiary school for 25-36 year old women  |

In [None]:
#filepath = "data/world-data-gapminder.csv"
filepath = 'https://raw.githubusercontent.com/kemiolamudzengi/dsci-320-datasets/main/world-data-gapminder.csv'

# Read in the data using pandas, remember to set parse_dates!
data = pd.read_csv(filepath, parse_dates=["year"])
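
To see why `parse_dates` matters, here is a minimal sketch using a tiny stand-in CSV (the rows and values are hypothetical, not the real Gapminder file). Without parsing, the `year` column is read as plain text; with parsing, it becomes a datetime column, which is what the `.dt` accessor and Altair's temporal (`:T`) type work with.

```python
import io
import pandas as pd

# Tiny stand-in CSV with invented values, used only for this illustration
csv_text = "country,year,population\nA,1900-01-01,100\nA,1901-01-01,110\n"

without_parsing = pd.read_csv(io.StringIO(csv_text))
with_parsing = pd.read_csv(io.StringIO(csv_text), parse_dates=["year"])

print(without_parsing["year"].dtype)  # a string/object dtype
print(with_parsing["year"].dtype)     # a datetime64 dtype

# The .dt accessor is only available on the parsed column
print(with_parsing["year"].dt.year.tolist())
```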

## Data Task
Let's filter the dataset to keep data from 1900 to 2018.

In [None]:
# Let's filter the dataset so that we have data from the beginning of the 20th century
data = data[data["year"] >= pd.to_datetime("1900-01-01")]
print(f"Dataset shape: {data.shape}")
print(f"Years covered: {data.year.dt.year.min()} to {data.year.dt.year.max()}")


## Line Mark

The `line` mark type connects plotted points with line segments, for example so that a line's slope conveys information about the rate of change.

### Line Chart

The line chart (also called line graph) was created by William Playfair. It encodes data as a series of points that are connected by straight line segments.
<br>
The `x` channel is used to encode the temporal field, while the `y` channel is used to encode a quantitative value.

<div class="alert alert-info" style="color:black; padding: 15px; border-radius: 8px; background-color:#eaf4ff;">
  <h2>Exploratory Task: Canada’s Population Over Time</h2>

  <p><strong>Guiding Question:</strong>
    <em>How has the population of Canada changed over time?</em>
  </p>

  <p>We will begin by preparing our data:</p>
  <ul>
    <li>Create a dataframe filtered to include <strong>only Canada</strong>.</li>
    <li>Use <code>mark_line</code> to represent data items.</li>
    <li>Encode <code>year</code> on the <strong>x channel</strong>.</li>
    <li>Encode <code>population</code> on the <strong>y channel</strong>.</li>
  </ul>
</div>


In [None]:
canada_data = data.query('country == "Canada"').copy()

canada_line = alt.Chart(canada_data).mark_line().encode(
    alt.X('year:T'),
    alt.Y('population:Q'),
)
# Show plot
canada_line

It’s a bit difficult to tell what was happening in each year in the current visualization.
We can see a steady increase in the population, but sometime between 1940 and 1955 the rate of increase seems to change.

To make this easier to explore, let’s add a **tooltip** that shows the exact values when hovering over the line.
In addition, Altair allows us to display the individual data points by using the `point` property inside `mark_line`.

In [None]:
canada_line_styled = alt.Chart(canada_data).mark_line(point=True).encode(
    alt.X('year:T'),
    alt.Y('population:Q'),
    alt.Tooltip(['year', 'population'])
)

# Show plot
canada_line_styled

That doesn’t really seem to have helped.
It just looks like we have a thicker line, even though there are actually over 100 points making up that line.

In addition to the `point` property, Altair lets you set many other properties on a mark.
At this point, I suggest you **experiment** with at least two different [line mark properties](https://altair-viz.github.io/user_guide/marks/line.html) to see how they affect the visualization.

For example, try out `strokeDash`.

Now, just to demonstrate the differences, let’s make a few changes:
* Change the data attribute type for the x channel from **temporal** to **ordinal**.
* Create a new chart so that in addition to showing the points, we also change the **color of the points**.
* Modify how the line segments are connected by using a different **interpolation method**. (By default, interpolation is set to **linear**.)

Instead of rewriting the entire chart from scratch, we’ll make a copy of the existing chart, then update the `x` encoding and add some styling options.

In [None]:
canada_line_exp = canada_line_styled.mark_line(
    interpolate='step-after',
    point=alt.OverlayMarkDef(color='red')).encode(
    alt.X('year:O')
)

#show plot
canada_line_exp

The [Altair API](https://altair-viz.github.io/user_guide/generated/core/altair.Interpolate.html) has a full listing of interpolate options.

Okay, this chart might feel like a bit of a crime 😅, but hopefully you can now see how different properties impact the visualization.

Before moving on, take some time to **experiment on your own**:

* Try out at least **two different interpolation options** (e.g., `step`, `monotone`, `basis`).
* Explore at least **two other ways to customize the mark** (for example, adjusting `size`, `opacity`, or `strokeDash`).

This hands-on exploration will help you understand how small styling choices can dramatically change the way your data is presented.

### Multi-Line Chart
With `mark_line` you can encode temporal data for multiple countries at the same time. Let's get a sense of how population changes in a given region.

#### Data Task
Because we don't want sensory overload, let's first find out how many sub-regions there are and how many countries are in each one.



In [None]:
print(data.groupby("sub_region")["country"].nunique())

That was helpful! We definitely don’t want to focus only on **Sub-Saharan Africa**, since our data is missing the more specific groupings (**Western, Eastern, Southern, and Central Africa**) that are important for analyzing that part of the world.

**Rabbit Hole Task:**
Do some research online to figure out how to create a dataframe where the regions for Sub-Saharan Africa are broken down into these four accurate groupings. This may involve:

* Looking up a reliable source that lists which countries belong to each subregion.
* Creating a mapping (e.g., a dictionary) that assigns each country to its subregion.
* Updating your dataframe by applying this mapping.
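
The mechanics of the last two steps might look something like the sketch below. The dictionary is a deliberately tiny, illustrative subset (you would extend it from a source you trust), and `sample` is a hypothetical stand-in for the tutorial's dataframe; the `sub_region_detailed` column name is also just an assumption for this example.

```python
import pandas as pd

# Illustrative subset only: extend this mapping from a reliable source
africa_sub_regions = {
    "Nigeria": "Western Africa",
    "Kenya": "Eastern Africa",
    "South Africa": "Southern Africa",
    "Cameroon": "Central Africa",
}

# Hypothetical stand-in rows for the tutorial's dataframe
sample = pd.DataFrame({
    "country": ["Nigeria", "Kenya", "Canada"],
    "sub_region": ["Sub-Saharan Africa", "Sub-Saharan Africa", "Northern America"],
})

# Map known countries to the finer grouping; keep the original value otherwise
sample["sub_region_detailed"] = (
    sample["country"].map(africa_sub_regions).fillna(sample["sub_region"])
)

print(sample)
```

Using `fillna` with the original column means countries outside the mapping keep their existing sub-region instead of becoming missing values.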

### Back to Multi-Line Chart
For now, let’s focus on Europe to keep things manageable.

* We’ll start with **Western Europe** (about 7 countries seems like a good size to work with).

Let’s now use **color** to encode the `country` field.
To focus on Western Europe, we have two options:

* Use Altair’s **filter transform**, or
* Use **pandas** to create a new dataframe that only contains countries in Western Europe.

We’ll take the second approach and filter the data with pandas.



In [None]:
# DATA TASK
# First, filter the dataset to include only the Western Europe sub-region.
western_europe_data = data.query('sub_region == "Western Europe"').copy()
print(western_europe_data.shape)

Now that we have our dataset, let’s create a **multi-line chart**.
From this point forward, I won’t provide the full code.
Instead, I want you to practice reading the specifications carefully and writing the code yourself.

<div class="alert alert-info" style="color:black; padding: 15px; border-radius: 8px; background-color:#eaf4ff;">
  <p><strong>Chart Specification:</strong></p>
  <ol>
    <li>Use the <code>line</code> mark.</li>
    <li>Encode <code>year</code> on the <strong>x channel</strong>.</li>
    <li>Encode <code>pop_density</code> on the <strong>y channel</strong>.</li>
    <li>Encode <code>country</code> on the <strong>color channel</strong> to differentiate lines.</li>
  </ol>
</div>



In [None]:
multi_line_w_europe = ...

# Show the plot
multi_line_w_europe

The first thing you may notice is that we don’t have any `pop_density` data before a certain year.

**Questions for you:**

1. What is the earliest year for which `pop_density` data is available?
2. Why do you think there is no population density data before that year?

👉 Write down your answers and bring them to class; we’ll discuss possible explanations together.



## Area Marks

The `area` mark type combines aspects of `line` and `bar` marks: it visualizes connections (slopes) among data points, but also shows a filled region, with one edge defaulting to a zero-valued baseline.

### Area Chart

An **area chart** is very similar to a line chart, but it emphasizes the magnitude of values by shading the area beneath the line. This makes it easier to see how the values accumulate or change over time.

<div class="alert alert-info" style="color:black; padding: 15px; border-radius: 8px; background-color:#eaf4ff;">

  <p><strong>Guiding Question:</strong>
    <em>How has Canada’s population changed over time?</em></p>

  <p>We will create an area chart to visualize this growth:</p>

  <p><strong>Chart Specification:</strong></p>
  <ul>
    <li>Encode <code>year</code> on the <strong>x channel</strong>.</li>
    <li>Encode <code>population</code> on the <strong>y channel</strong>.</li>
    <li>Use <code>mark_area</code> to fill in the space under the line.</li>
  </ul>

  <p>This will allow us to clearly see how Canada’s population has grown across the years.</p>
</div>


In [None]:
canada_area_chart = ...
# Show plot
canada_area_chart

Many of the properties we customized for `mark_line` exist for `mark_area` as well.
Let's customize the interpolation and the color of the area chart.
Here is the [Vega-Lite documentation on gradients](https://vega.github.io/vega-lite/docs/gradient.html).

In [None]:
canada_area_styled = canada_area_chart.mark_area(
    interpolate='basis-open',
    line={"color": "darkred"},
    color=alt.Gradient(
        gradient="linear",
        stops=[
            alt.GradientStop(color="white", offset=0),
            alt.GradientStop(color="darkred", offset=1),
        ],
        x1=3,
        x2=0,
        y1=2,
        y2=1,
    )
)
# Show plot
canada_area_styled

Using a gradient color is **DEFINITELY** beyond the scope of what I expect, but I had to flex.

### Stacked Area Chart

Similar to `mark_bar`, `mark_area` also supports stacking.
Let's focus in on the 6 countries in Northern Africa.

In [None]:

# DATA TASK
# First, filter the dataset to include only the Northern Africa sub-region.
north_africa_data = data.query('sub_region == "Northern Africa"').copy()
print(north_africa_data.shape)


We can explore the change in population using a stacked area chart.
**Stacking happens when we use color to encode a non-quantitative value.**

<div class="alert alert-info" style="color:black; padding: 15px; border-radius: 8px; background-color:#eaf4ff;">
  <p><strong>Guiding Question:</strong>
    <em>How does population change over time across multiple countries?</em></p>

  <p><strong>Chart Specification:</strong></p>
  <ul>
    <li>Use <code>mark_area</code> to fill in the space under the line.</li>
    <li>Encode <code>year</code> on the <strong>x channel</strong>.</li>
    <li>Encode <code>population</code> on the <strong>y channel</strong>.</li>
    <li>Encode <code>country</code> on the <strong>color channel</strong>.</li>
  </ul>
</div>

In [None]:
northern_africa_stacked_area = ...
# Show plot
northern_africa_stacked_area

By default, stacking is performed relative to a zero baseline. However, other `stack` options are available:

* `center` - to stack relative to a baseline in the center of the chart, creating a *streamgraph* visualization, and
* `normalize` - to normalize the summed data at each stacking point to 100%, enabling percentage comparisons.

Below we adapt the chart by setting the `stack` attribute of the `y` encoding to `normalize`.

In [None]:
northern_africa_stacked_area.encode(
    alt.Y('population:Q', stack='normalize')
)

What happens if you set stack to `center`?
Bring your answers to class.

## Customizing Vizzes
Visual encoding (mapping data to visual variables such as position, size, shape, or color) is the beating heart of data visualization.

The workhorse that actually performs this mapping is the scale: a function that takes a data value as input (the scale domain) and returns a visual value, such as a pixel position or RGB color, as output (the scale range).
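
To make the idea of a scale concrete, here is a conceptual sketch of a linear scale written in plain Python. It is not how Altair or Vega-Lite implement scales internally; `linear_scale` is a hypothetical helper that only illustrates the domain-to-range mapping described above.

```python
def linear_scale(value, domain, range_):
    """Map a data value from the domain interval onto the range interval."""
    d0, d1 = domain
    r0, r1 = range_
    fraction = (value - d0) / (d1 - d0)  # position of the value within the domain
    return r0 + fraction * (r1 - r0)     # same relative position within the range


# A value of 50 in the domain [0, 100] lands halfway across a 400-pixel range
x_pixel = linear_scale(50, domain=(0, 100), range_=(0, 400))
print(x_pixel)  # 200.0
```

The same function works for non-positional ranges too: mapping a population onto a range such as `(100, 1000)` is, conceptually, how a size scale picks a bubble area.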

Of course, a visualization is useless if no one can figure out what it conveys!

In addition to graphical marks, a chart needs reference elements, or guides, that allow readers to decode the graphic.

Guides such as axes (which visualize scales with spatial ranges) and legends (which visualize scales with color, size, or shape ranges), are the unsung heroes of effective data visualization!

In this section we will explore additional ways axes and scales can be customized in Altair.

### **Why Styling Matters in Data Visualization**

Professional data visualization requires more than just encoding data correctly. Charts need to be:
- **Self-explanatory:** Clear titles and labels that tell the story
- **Appropriately sized:** Readable but not overwhelming for the context
- **Interactive:** Allow exploration through tooltips and selection
- **Accessible:** Consider color choices and layout for diverse audiences

As noted in the [Altair Customization Guide](https://altair-viz.github.io/user_guide/customization.html), proper styling transforms raw data encodings into effective communication tools.

Let's go back to our first chart and expand the width.


In [None]:
# Too narrow - cramped appearance
narrow_chart = canada_line.properties(width=200, height=200, title="Too Narrow")

# Standard width - good balance
standard_chart = canada_line.properties(width=400, height=300, title="Standard Dimensions")

# Wide for presentations - shows detail
wide_chart = canada_line.properties(width=700, height=300, title="Wide Format")

# Compare using concatenation
dimension_comparison = narrow_chart | standard_chart | wide_chart
dimension_comparison

Typically, we let Altair choose the default values for a chart’s width and height.
However, when working with dashboards (or when you want consistent layouts), it’s often important to specify the chart size explicitly.

### **Chart Dimensions & Layout**
**Dimension Guidelines:**
- **Narrow (200-300px):** Mobile-friendly but may truncate labels, great for faceting
- **Standard (400-500px):** Versatile for most applications
- **Wide (600+ px):** Good for presentations when screen space allows

**Height considerations:** Ensure adequate space for all category labels


## Titles and Axis Titles
In addition to chart size, we can also specify a title that describes what the chart shows.
Notice how each chart in the size comparison above was given a descriptive title; this helps make the visualization easier to understand at a glance. In addition to the chart title, we can change the axis titles as well.

### Axis Titles
Okay, we are going to step away from our temporal data and build a bubble plot.

Default axis titles are often technical variable names, while professional visualizations call for descriptive, user-friendly labels.

Here are the specifications:

<div class="alert alert-info" style="color:black; padding: 15px; border-radius: 8px; background-color:#eaf4ff;">

  <p>We will create a bubble chart to explore these relationships:</p>

  <p><strong>Chart Specification:</strong></p>
  <ul>
    <li>Use the <code>circle</code> mark.</li>
    <li>Encode <code>children_per_woman</code> on the <strong>x channel</strong>.</li>
    <li>Encode <code>co2_per_capita</code> on the <strong>y channel</strong>.</li>
    <li>Encode <code>region</code> on the <strong>color channel</strong> to distinguish continents.</li>
    <li>Encode <code>population</code> on the <strong>size channel</strong> to reflect country population.</li>
  </ul>
</div>

---

Add a tooltip so that we can access the attributes for each data item:
```python
alt.Tooltip(['country:N', 'population:Q', 'pop_density:Q', 'co2_per_capita:Q', 'region:N'])
```

You can control how large or small the bubbles appear by setting a range for the `size` channel:

```python
alt.Size('population:Q', scale=alt.Scale(range=[100, 1000]))
```
Here, the smallest population maps to a bubble area of about 100 square pixels and the largest to about 1000 square pixels.

To give each circle a black border, add `stroke="black"` inside `mark_circle`:
```python
.mark_circle(stroke="black")
```

Add the stroke width encoding inside `.encode()`:

```python
alt.StrokeWidth('pop_density:Q', scale=alt.Scale(range=[0.5, 5])),
```

* This varies the thickness of the outline around each bubble.
* Low density → thin outline.
* High density → thick outline.



In [None]:
# DATA TASK
recent_data = data[data["year"] == pd.to_datetime("2014-01-01")]


# VIZ TASK

bubble_plot = ...

bubble_plot


We have encoded attributes on five channels: x, y, color, size, and strokeWidth. Lucky you, I didn't throw in opacity.
Now let's consider our chart.
The x-axis title is `children_per_woman`, which doesn't exactly scream professionalism.
Having axis titles that include `_` isn't a good look, so let's change them and make them more descriptive.

<div class="alert alert-success" style="color:black; padding: 15px; border-radius: 8px; background-color:#e8f7e4;">
  <h3>Next Steps: Styling Your Chart</h3>

  <ol>
    <li>Update all titles for guides, including the <strong>x-axis</strong>, <strong>y-axis</strong>, and <strong>legends</strong>.</li>
    <li>Set the chart <strong>width</strong> and <strong>height</strong> to 500px.</li>
    <li>Refactor your code to use the <strong>method chaining</strong> approach rather than keyword-argument style. (Both approaches work.)</li>
    <li>Create and add an appropriate <strong>title</strong> for the chart.</li>
  </ol>
</div>



In [None]:

bubble_plot_styled = alt.Chart(recent_data).mark_circle(stroke='black').encode(
    alt.X('children_per_woman:Q').title('Children per Woman'),
    alt.Y('co2_per_capita:Q').title('CO₂ Emissions per Capita'),
    alt.Color('region:N').title('Continent'),
    alt.Size('population:Q').scale(range=[100, 1000]).title('Population'),
    alt.Tooltip(['country:N', 'population:Q', 'pop_density:Q', 'co2_per_capita:Q', 'region:N']),
    alt.StrokeWidth('pop_density:Q').scale(range=[0.5, 5]).title('Population Density')
).properties(
    width=500,
    height=500,
    title='Comparing Life, Population, and Emissions Across Countries (2014)'
)

bubble_plot_styled



**Labeling Best Practices:**
- **Include units:** TWh, %, millions, etc. - essential for quantitative data
- **Use sentence case:** "Renewable Electricity" not "RENEWABLE ELECTRICITY"
- **Be descriptive:** "GDP per Capita ($)" not just "gdp_per_capita"
- **Keep concise:** Informative but not overly verbose


### Adjusting Axes
Beyond titles, we can control number formatting, tick counts, and axis appearance:

**Advanced Formatting Options:**
- **Number formats:** `.0f` (no decimals), `,.0f` (thousands separators), `$,.0f` (currency)
- **Tick control:** `tickCount=5` limits crowded axes
- **Grid lines:** `grid=True/False` aids reading values
- **Custom scales:** Control min/max ranges when needed
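
The format strings above come from D3's number formatting, which is modeled on Python's format specification mini-language, so you can get a feel for the simpler patterns directly in Python. This is only an analogy: the currency prefix (`$`) and the SI-prefix type (`s`, used later in this tutorial) are D3 features that Python's `format` does not support.

```python
value = 1234567.891

print(format(value, ".0f"))   # 1234568       (no decimals)
print(format(value, ",.0f"))  # 1,234,568     (thousands separators)
print(format(value, ",.2f"))  # 1,234,567.89  (two decimals)
```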


<div class="alert alert-success" style="color:black; padding: 15px; border-radius: 8px; background-color:#e8f7e4;">

  <p>To improve readability and clarity of your visualization, do the following:</p>
  <ul>
    <li>Set the <strong>number of ticks</strong> for the x-axis to make it easier to read.</li>
    <li>Remove the <strong>gridlines</strong> from the y-axis for a cleaner look.</li>
    <li>Format numbers for <strong>Population</strong> to make them more readable (e.g., with commas or in millions).</li>
  </ul>
</div>



In [None]:

bubble_plot_styled_guides = alt.Chart(recent_data).mark_circle(stroke='black').encode(
    alt.X('children_per_woman:Q').title('Children per Woman').axis(tickCount=10), # set number of ticks
    alt.Y('co2_per_capita:Q').title('CO₂ Emissions per Capita').axis(grid=False), # remove gridlines
    alt.Color('region:N').title('Continent'),
    alt.Size('population:Q').scale(range=[100, 1000]).title('Population').legend(format=",.1s"), # format numbers in legend
    alt.Tooltip(['country:N', 'population:Q', 'pop_density:Q', 'co2_per_capita:Q', 'region:N']),
    alt.StrokeWidth('pop_density:Q').scale(range=[0.5, 5]).title('Population Density')
).properties(
    width=500,
    height=500,
    title='Comparing Life, Population, and Emissions Across Countries (2014)'
)

bubble_plot_styled_guides



We have barely scratched the surface on formatting.
To learn more, peruse the Vega-Altair documentation:
- For axis formatting, see the [Axis reference](https://altair-viz.github.io/user_guide/generated/core/altair.Axis.html).
- For more on number formatting, you have to go two layers deeper and see the [D3.js documentation](https://d3js.org/d3-format).

Okay, we are almost done; I'm getting tired and I'm just typing.

Our tooltips still show the raw attribute names from the dataset. Let's update them as well so they are descriptive.


In [None]:
bubble_plot_styles_guides_tooltips = bubble_plot_styled_guides.encode(
    tooltip = [
        alt.Tooltip('country:N', title='Country'),
        alt.Tooltip('population:Q', title='Population', format=",.1s"),
        alt.Tooltip('pop_density:Q', title='Population Density'),
        alt.Tooltip('region:N', title='Continent')
    ]
)
bubble_plot_styles_guides_tooltips

## Summary and Next Steps

### Summary
Today, you have been exposed to **line and area charts** and explored how **marks can be customized**.
Perhaps the most interesting part was learning how to **style a visualization** so it looks polished, not like something you threw together right before a nap.

We focused on:
- Adjusting **size, color, and stroke** to highlight important data
- Setting **titles and axis specifications** for clarity
- Adding **tooltips** to make charts interactive and informative

These skills directly apply to **temporal visualizations** and beyond:
- **Line charts**: clear axis labels and informative tooltips
- **Area charts**: strategic color choices and legend management
- **Heatmaps**: careful formatting of dimensions
- **Faceted charts**: consistent styling across panels

---

### Next Steps
1. Go back to **Tutorial 2** and **Lecture 2B**. Take every chart you created and start **styling them** using what you learned today.
2. Practice is key: **you learn by doing**.

---

### Additional Resources
- [Altair Customization Documentation](https://altair-viz.github.io/user_guide/customization.html)
- [D3 Number Format Patterns](https://github.com/d3/d3-format#locale_format)
- [Color Accessibility Guidelines](https://webaim.org/articles/contrast/)

---

By integrating these customization techniques with everything you've learned about **encodings and visual channels**, you’re now ready to create a wide variety of **statistical graphics** that communicate data clearly and effectively.
