# Temporal Data

Temporal is derived from the Latin word _tempus_, which means time. All data that we encounter has a temporal aspect to it; for instance, data is collected at a specific point in time. However, when the term temporal is used, it typically refers to data that varies across time. Temporal data is otherwise known as **time-varying data**. Tasks for visualizing temporal data fall loosely into two main groups: sequential and cyclic.

Sequential tasks explore attributes that are unique, mostly consecutive, and unrepeating.
For example:
_How did the price of tea change over the last century?_
In this example, we could encode the price of tea for each year [2021, 2020, 2019, 2018, ...].

Cyclic tasks explore attributes that repeat and have a limited number of options (i.e., a fixed range).
For example:
_Which day of the week has the highest energy consumption?_
In this example, we would encode energy consumption for each day [Sun, Mon, Tues, ...] over a fixed time period (e.g., a month or year).
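To make the distinction concrete outside of any chart, here is a small pandas sketch on hypothetical daily readings (the values are made up). Grouping by calendar date gives a sequential view in which every key is unique; grouping by weekday name collapses the same readings onto seven bins that repeat every week, which is the cyclic view.

```python
import pandas as pd

# Hypothetical daily readings: 14 consecutive days starting Tuesday 2019-01-01.
days = pd.date_range("2019-01-01", periods=14, freq="D")
readings = pd.Series(range(14), index=days, name="usage")

# Sequential view: one group per calendar date; keys are unique and never repeat.
sequential = readings.groupby(readings.index.date).sum()

# Cyclic view: the same readings collapse onto 7 weekday bins that repeat weekly.
cyclic = readings.groupby(readings.index.day_name()).sum()

print(len(sequential), len(cyclic))
print(cyclic)
```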

The preceding notebooks presented the `mark_point()`, `mark_circle()`, `mark_square()`, `mark_tick()`, and `mark_bar()` marks used in Altair. With these marks you were able to create scatter plots, bubble plots, bar charts, and their many variations. In this notebook you will be exposed to graphical marks that support the creation of visualizations used to make sense of time-varying data.

- `mark_line()` - Connected line segments.
- `mark_area()` - Filled areas defined by a top-line and a baseline.
- `mark_rect()` - Filled rectangles, useful for heatmaps.

For a complete list, and links to examples, see the [Altair marks documentation](https://altair-viz.github.io/user_guide/marks.html).

## Learning Goals
Those who actively work through this notebook will be able to:
 - Use graphical marks to create common vizzes
 - Create visualizations for sequential temporal tasks (e.g., line, area, and stacked area charts)
 - Create visualizations for cyclic temporal tasks (e.g., heatmaps)

## Global Development Data
Once again, we will be using the global health and population data for a number of countries. The data was collected by the Gapminder Foundation.
In previous notebooks we focused on the year 2000; here we will explore temporal trends from 1955 to 2005.
Let’s first load the dataset from the vega-datasets collection into a Pandas data frame.
NOTE: This data is slightly different from the dataset you previously used.

```python
import pandas as pd
import altair as alt
```

```python
from vega_datasets import data as vega_data
data = vega_data.gapminder()
data.head()
```

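|   | year | country | cluster | pop | life_expect | fertility |
|---|------|---------|---------|-----|-------------|-----------|
| 0 | 1955 | Afghanistan | 0 | 8891209 | 30.332 | 7.7 |
| 1 | 1960 | Afghanistan | 0 | 9829450 | 31.997 | 7.7 |
| 2 | 1965 | Afghanistan | 0 | 10997885 | 34.02 | 7.7 |
| 3 | 1970 | Afghanistan | 0 | 12430623 | 36.088 | 7.7 |
| 4 | 1975 | Afghanistan | 0 | 14132019 | 38.438 | 7.7 |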


## Line Marks

The `line` mark type connects plotted points with line segments, for example so that a line's slope conveys information about the rate of change.

### Line Chart

The line chart (also called a line graph) was created by William Playfair. It encodes data as a series of data points connected by straight line segments. The `x` channel is used to encode the temporal field, while the `y` channel is used to encode a quantitative value.

Let's create a line chart that depicts the fertility rate for Canada for the given time period.

```python
alt.Chart(data).mark_line().encode(
    alt.X('year:O'),
    alt.Y('fertility:Q'),
).transform_filter(
    (alt.datum.country == 'Canada')
)
```

One observation is that the fertility rate has decreased over the last 50 years. There was a sharp decline between 1960 and 1980, after which the fertility rate seemed to stabilize.
Altair allows you to include a point for each value by setting the `point` property of `mark_line`.

```python
alt.Chart(data).mark_line(point=True).encode(
    alt.X('year:O'),
    alt.Y('fertility:Q'),
).transform_filter(
    (alt.datum.country == 'Canada')
)
```

You can customize the appearance of the points with `OverlayMarkDef`.
Altair also allows you to change the way marks are connected.
The default interpolation for `mark_line` and `mark_area` is **linear**; the example below uses `step-before` instead.

```python
alt.Chart(data).mark_line(
    interpolate='step-before',
    point=alt.OverlayMarkDef(color="black")
).encode(
    alt.X('year:O'),
    alt.Y('fertility:Q'),
).transform_filter(
    (alt.datum.country == 'Canada')
)
```

The [API](https://altair-viz.github.io/user_guide/generated/core/altair.Interpolate.html) has a full listing of interpolate options. Explore the options before continuing to the next section.


### Multi-Line Chart
With `mark_line` you can encode temporal data for multiple countries at the same time. Let's get a sense of how fertility changed across the Americas.
Let's use color to encode the country field. We can either use the filter transform or use pandas to create a new data frame containing just the countries in the Americas.
Let's do the latter.


```python
dataAmer = data.loc[(data['cluster'] == 3)]
```

```python
alt.Chart(dataAmer).mark_line().encode(
    alt.X('year:O'),
    alt.Y('fertility:Q'),
    alt.Color('country:N')
)
```

Canada is buried in here somewhere, but it is hard to figure out which line belongs to Canada.
What we can say is that for all American countries depicted, the fertility rate has decreased over the last 50 years.
The legend is not particularly helpful, so let's hide the legend and use tooltips instead.

```python
alt.Chart(dataAmer).mark_line().encode(
    alt.X('year:O'),
    alt.Y('fertility:Q'),
    alt.Color('country:N', legend=None),
    alt.Tooltip('country:N')
).properties(width=400)
```

Note that we set a custom width of 400 pixels. _Try changing (or removing) the widths and see what happens!_

Let's change some of the default mark parameters to customize the plot. We can set the `strokeWidth` to determine the thickness of the lines and the `opacity` to add some transparency. By default, the `line` mark uses straight line segments to connect data points. In some cases we might want to smooth the lines. We can adjust the interpolation used to connect data points by setting the `interpolate` mark parameter. Let's use `'monotone'` interpolation to provide smooth lines that are also guaranteed not to inadvertently generate "false" minimum or maximum values as a result of the interpolation.

```python
alt.Chart(dataAmer).mark_line(
    strokeWidth=3,
    opacity=0.5,
    interpolate='monotone'
).encode(
    alt.X('year:O'),
    alt.Y('fertility:Q'),
    alt.Color('country:N', legend=None),
    alt.Tooltip('country:N')
).properties(
    width=400
)
```

### Slopegraph

The `line` mark can also be used to create *slope graphs*, charts that highlight the change in value between two comparison points using line slopes.
Let's create a slope graph comparing life expectancies at two points in time: 1955 and 2005.
By default, Altair places the years close together. To better space out the years along the x-axis, we can indicate the size (in pixels) of discrete steps along the width of our chart, as indicated by the comment below. Try adjusting the width `step` value below and see how the chart changes in response.

```python
alt.Chart(dataAmer).mark_line(opacity=0.5, point=True).encode(
    alt.X('year:O'),
    alt.Y('life_expect'),
    alt.Color('country:N', legend=None),
    alt.Tooltip('country:N')
).properties(
    width={"step": 100} # adjust the step parameter
).transform_filter(
    (alt.datum.year == 1955) | (alt.datum.year == 2005)
)
```

Because life expectancy falls between 40 and 85, let's remove the requirement that the y-axis must begin at 0, so we can better distinguish between each country.
In addition, let's
- enlarge the point size,
- change the `strokeWidth`,
- add a title to the plot,
- change the height of the chart,
- change the y-axis title,
- and include the life expectancy value in the tooltip.


```python
alt.Chart(dataAmer).mark_line(
    opacity=0.7,
    strokeWidth=2,
    point=alt.OverlayMarkDef(size=60),
).encode(
    alt.X('year:O'),
    alt.Y('life_expect',
          scale=alt.Scale(zero=False),
          axis=alt.Axis(title='Life Expectancy (yrs)')),
    alt.Color('country:N'),
    alt.Tooltip(['country', 'life_expect'])
).properties(
    width={"step": 100},
    height=400,
    title=['Change in Life Expectancy For', 'Select American Countries'],   # multi-line title
).transform_filter(
    (alt.datum.year == 1955) | (alt.datum.year == 2005)
).configure_title(fontSize=18, color='black')
```

What can you observe from this slope graph?
As of 2005, Canada has the highest life expectancy, while Haiti has the lowest.
The life expectancy for Grenada has improved at a slower rate than the other nations.


## Area Marks

The `area` mark type combines aspects of `line` and `bar` marks: it visualizes connections (slopes) among data points, but also shows a filled region, with one edge defaulting to a zero-valued baseline.

### Area Chart
The area chart is similar in function to the line chart.
Let's create an area chart that depicts the population change for Canada over time.
We will encode the year on the `x` channel, and the population on the `y`.


```python
alt.Chart(dataAmer).mark_area().encode(
    alt.X('year:O'),
    alt.Y('pop:Q')
).transform_filter(
    (alt.datum.country == 'Canada')
)
```

Many of the properties we customized for `mark_line` exist for `mark_area` as well.
Let's add points for each value, and use the monotone interpolation.

```python
alt.Chart(dataAmer).mark_area(
    interpolate='monotone',
    point=True,
).encode(
    alt.X('year:O'),
    alt.Y('pop:Q')
).transform_filter(
    (alt.datum.country == 'Canada')
)
```


### Stacked Area Chart

Similar to `mark_bar`, `mark_area` also supports stacking.
Let's focus on the 3 countries in North America.
We can explore the change in population using a stacked area chart.
Stacking happens when we use color to encode a non-quantitative (e.g., nominal) field.
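Under the hood, a zero-baseline stack gives each layer a lower (`y0`) and upper (`y1`) edge, where `y1` is the running total within each year. The pandas sketch below reproduces that arithmetic on hypothetical populations for placeholder countries; ordering the layers alphabetically by country is an assumption made to keep it simple.

```python
import pandas as pd

# Hypothetical populations (millions) for three placeholder countries.
df = pd.DataFrame({
    "year":    [1955, 1955, 1955, 2005, 2005, 2005],
    "country": ["A", "B", "C", "A", "B", "C"],
    "pop":     [10, 30, 60, 20, 40, 80],
})

# Zero-baseline stacking: within each year, each layer starts where the
# previous one ended. y1 is the running total, y0 is y1 minus the layer itself.
df = df.sort_values(["year", "country"])
df["y1"] = df.groupby("year")["pop"].cumsum()
df["y0"] = df["y1"] - df["pop"]
print(df)
```

The top of each stack (`y1` of the last layer) is the yearly total, which is why a stacked area chart also shows the continent-wide trend.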

```python
dataNA = data.loc[
    (data['country'] == 'United States') |
    (data['country'] == 'Canada') |
    (data['country'] == 'Mexico')
]

domain = ['Canada','Mexico', 'United States']
range = ['#66c2a5', '#fc8d62', '#8da0cb']

alt.Chart(dataNA).mark_area().encode(
    alt.X('year:O'),
    alt.Y('pop:Q'),
    alt.Color('country:N', scale=alt.Scale(domain=domain, range=range)),
)
```

The stacked area chart is useful for comparing multiple variables changing over time.
It also provides an overview of the population of the entire continent. From this graph we can observe that in 1955 the population of North America was around 210 million. By 2005 the population had doubled to around 430 million. This is an insight you could not observe just from looking at the multi-line chart. However, it is hard to get a sense of the starting and ending population for Canada from this graph.

By default, stacking is performed relative to a zero baseline. However, other `stack` options are available:

* `center` - to stack relative to a baseline in the center of the chart, creating a *streamgraph* visualization, and
* `normalize` - to normalize the summed data at each stacking point to 100%, enabling percentage comparisons.

Below we adapt the chart by setting the `y` encoding `stack` attribute to `normalize`. What happens if you instead set it to `center`?
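The two alternative offsets can be sketched with the same kind of pandas arithmetic, again on hypothetical values. For `normalize`, each running total is divided by its year's total so every stack reaches 1. For `center`, this sketch assumes each year's stack is shifted up by half the gap between its total and the largest total, so all stacks share a common midline; the exact constant does not change the streamgraph's shape.

```python
import pandas as pd

# Hypothetical populations (millions) for three placeholder countries.
df = pd.DataFrame({
    "year":    [1955, 1955, 1955, 2005, 2005, 2005],
    "country": ["A", "B", "C", "A", "B", "C"],
    "pop":     [10, 30, 60, 20, 40, 80],
}).sort_values(["year", "country"])

total = df.groupby("year")["pop"].transform("sum")   # yearly total, per row
running = df.groupby("year")["pop"].cumsum()          # zero-baseline upper edge

# normalize: divide by the yearly total so every stack tops out at 1 (100%).
df["norm_y1"] = running / total
df["norm_y0"] = (running - df["pop"]) / total

# center (assumed offset): shift each year's stack by half the gap to the
# largest total, so every stack is centered on the same midline.
baseline = (total.max() - total) / 2
df["center_y1"] = baseline + running
df["center_y0"] = df["center_y1"] - df["pop"]

print(df)
```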

```python
alt.Chart(dataNA).mark_area().encode(
    alt.X('year:O'),
    alt.Y('pop:Q', stack='normalize'),
    alt.Color('country:N', scale=alt.Scale(domain=domain, range=range)),
)
```

By normalizing, we can observe each country's share of the continent's population.
Canada's share has remained relatively stable. The United States' share has decreased: in 1955 it accounted for 75% of the continent's population, but by 2005 it had fallen to around 67%.
Mexico's share has increased over the 50-year period.

To disable stacking altogether, set the `stack` attribute to `None`. We can also add `opacity` as a default mark parameter to ensure we see the overlapping areas!

```python
alt.Chart(dataNA).mark_area(opacity=0.5).encode(
    alt.X('year:O'),
    alt.Y('pop:Q', stack=None),
    alt.Color('country:N', scale=alt.Scale(domain=domain, range=range)),
)
```

The `area` mark type also supports data-driven baselines, with both the upper and lower series determined by data fields. As with `bar` marks, we can use the `x` and `x2` (or `y` and `y2`) channels to provide end points for the area mark.

The chart below visualizes the range of minimum and maximum fertility, per year, for North American countries:
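The `min(fertility)` and `max(fertility)` pair is an aggregation grouped by the field on the `x` channel. A pandas sketch of the same computation on hypothetical values shows the lower and upper edges of the band, plus its width (`spread`), for each year:

```python
import pandas as pd

# Hypothetical fertility values for three placeholder countries.
df = pd.DataFrame({
    "year":      [1955, 1955, 1955, 2005, 2005, 2005],
    "country":   ["A", "B", "C", "A", "B", "C"],
    "fertility": [3.5, 4.0, 6.5, 1.5, 2.0, 2.5],
})

# Counterpart of y='min(fertility)' and y2='max(fertility)', grouped by year.
band = df.groupby("year")["fertility"].agg(["min", "max"])
band["spread"] = band["max"] - band["min"]  # height of the area at each year
print(band)
```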

```python
alt.Chart(dataNA).mark_area().encode(
    alt.X('year:O'),
    alt.Y('min(fertility):Q'),
    alt.Y2('max(fertility):Q')
).properties(
    width={"step": 40}
)
```

We can see a larger range of values in 1955, from just under 4 to just under 7. By 2005, both the overall fertility values and the variability have declined, centered around 2 children per woman.

All the `area` mark examples above use a vertically oriented area. However, Altair and Vega-Lite support horizontal areas as well. Let's transpose the chart above, simply by swapping the `x` and `y` channels.

```python
alt.Chart(dataNA).mark_area().encode(
    alt.Y('year:O'),
    alt.X('min(fertility):Q'),
    alt.X2('max(fertility):Q')
).properties(
    height={"step": 40}  # year is now on the y-axis, so size the y steps
)
```


So far we have allowed Altair to style the chart. In visualization, and particularly in data science, legends and axes play a vital role in communicating information and results.


## Customizing Vizzes
Visual encoding (mapping data to visual variables such as position, size, shape, or color) is the beating heart of data visualization. The workhorse that actually performs this mapping is the scale: a function that takes a data value as input (the scale domain) and returns a visual value, such as a pixel position or RGB color, as output (the scale range). Of course, a visualization is useless if no one can figure out what it conveys! In addition to graphical marks, a chart needs reference elements, or guides, that allow readers to decode the graphic. Guides such as axes (which visualize scales with spatial ranges) and legends (which visualize scales with color, size, or shape ranges) are the unsung heroes of effective data visualization! In this section we will explore additional ways axes and scales can be customized in Altair.
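To make the idea of a scale as a function concrete, here is a minimal sketch of a linear scale in plain Python. `linear_scale` is a hypothetical helper written for illustration, not an Altair function; the pixel range runs from 400 down to 0 because in SVG the y coordinate grows downward.

```python
def linear_scale(domain, range_):
    """Hypothetical helper: map values in `domain` linearly onto `range_`."""
    d0, d1 = domain
    r0, r1 = range_

    def scale(value):
        t = (value - d0) / (d1 - d0)   # relative position within the domain
        return r0 + t * (r1 - r0)      # same relative position within the range

    return scale


# Map life expectancy (40 to 85 years) onto a 400-pixel-tall axis.
y = linear_scale((40, 85), (400, 0))
print(y(40), y(85), y(62.5))
```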

### Adjusting Axes 
Let's start by creating a slope graph to visualize the change in population for Asian nations.

```python
dataTimeAsia = data.loc[data['cluster'] == 4]
dataAsia = dataTimeAsia.loc[(dataTimeAsia['year'] == 1955) | (dataTimeAsia['year'] == 2005)]

alt.Chart(dataAsia).mark_line(
    strokeWidth=2,
    point=alt.OverlayMarkDef(size=60),
).encode(
    alt.X('year:O'),
    alt.Y('pop:Q',
          scale=alt.Scale(zero=False)),
    alt.Color('country:N'),
    alt.Tooltip(['country', 'pop'])
).properties(
    width={"step": 100},
    height=400,
)
```

As with the earlier slope graph, each line shows how a country changed between 1955 and 2005. However, because of China's size, it is hard to make sense of the rate of change for the other countries in the region.
By default Altair uses a `linear` mapping between the domain values (the data values) and the range values (the pixel positions). To get a better sense of how the data encoded on the `y` channel changes, we can apply a different scale transformation.
To change the scale type, we set the `scale` attribute, using `alt.Scale` and its `type` parameter.
Let's try using a [logarithmic scale](https://en.wikipedia.org/wiki/Logarithmic_scale) (`log`) instead:

```python
alt.Chart(dataAsia).mark_line(
    strokeWidth=2,
    point=alt.OverlayMarkDef(size=60),
).encode(
    alt.X('year:O'),
    alt.Y('pop:Q',
          scale=alt.Scale(type='log')),
    alt.Color('country:N'),
    alt.Tooltip(['country', 'pop'])
).properties(
    width={"step": 100},
    height=400,
)
```

Even though the data is not evenly distributed, we do see 3 groups emerge from the dataset.
In a standard linear scale, a fixed visual (pixel) distance corresponds to a fixed *addition* in the data domain. A logarithmic transform maps between multiplication and addition, such that `log(u) + log(v) = log(u*v)`. As a result, in a logarithmic scale, a fixed visual distance instead corresponds to *multiplication* by a fixed factor in the data domain: with a base 10 logarithm, the distance from 1 million to 10 million is the same as the distance from 10 million to 100 million. The `log` scale above defaults to using the logarithm base 10, but we can adjust this by providing a `base` parameter to the scale.
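The same idea can be sketched for a logarithmic scale: take the logarithm of the value first, then interpolate linearly. `log_scale` is a hypothetical helper (not part of Altair) that makes the multiplication-to-addition property checkable, including with a non-default `base`.

```python
import math


def log_scale(domain, range_, base=10):
    """Hypothetical helper: map values in `domain` onto `range_` on a log scale."""
    lo, hi = (math.log(d, base) for d in domain)
    r0, r1 = range_

    def scale(value):
        # Interpolate linearly between the logs of the domain endpoints.
        t = (math.log(value, base) - lo) / (hi - lo)
        return r0 + t * (r1 - r0)

    return scale


# Populations from 1 million to 1 billion on a 300-pixel axis (top = 0).
y = log_scale((1e6, 1e9), (300, 0))
for value in (1e6, 1e7, 1e8, 1e9):
    print(f"{value:>13,.0f} -> {y(value):6.1f} px")

# A base-2 scale: each doubling moves the same distance along the axis.
y2 = log_scale((1, 1024), (100, 0), base=2)
```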

### Styling Axes

To the untrained eye, a log scale may be a bit confusing, so it is important that our axis title mirrors the scale.
Let’s add a more informative axis title: we’ll use the `title` property of the encoding to provide the desired title text. Note that we could also have used `alt.Axis` to specify the title.

```python
alt.Chart(dataAsia).mark_line(
    strokeWidth=2,
    point=alt.OverlayMarkDef(size=60),
).encode(
    alt.X('year:O'),
    alt.Y('pop:Q',
          scale=alt.Scale(type='log'),
          title='Population (log scale)'),
    alt.Color('country:N'),
    alt.Tooltip(['country', 'pop'])
).properties(
    width={"step": 100},
    height=400,
)
```

However, the grid lines are now rather dense. If we want to remove grid lines altogether, we can add `grid=False` to the `axis` attribute. But what if we instead want to reduce the number of tick marks, for example only including grid lines for each order of magnitude?

To change the number of ticks, we can specify a target `tickCount` property for an `Axis` object. The `tickCount` is treated as a *suggestion* to Altair, to be considered alongside other aspects such as using nice, human-friendly intervals. We may not get *exactly* the number of tick marks we request, but we should get something close.

Altair also allows us to format axis values, using [D3's number format patterns](https://github.com/d3/d3-format#locale_format). For example:
 - `s` - decimal notation with an [SI prefix](https://github.com/d3/d3-format#locale_formatPrefix), rounded to significant digits

```python
alt.Chart(dataAsia).mark_line(
    strokeWidth=2,
    point=alt.OverlayMarkDef(size=60),
).encode(
    alt.X('year:O'),
    alt.Y('pop:Q',
          scale=alt.Scale(type='log'),
          axis=alt.Axis(tickCount=4, format="s"),
          title='Population (log scale)'),
    alt.Color('country:N'),
    alt.Tooltip(['country', 'pop'])
).properties(
    width={"step": 100},
    height=400,
)
```

## Rect Marks
The last mark that we will use is `mark_rect()`.
It is typically used to create heatmaps.
The term heatmap was applied to this visual representation in the early 1990s and was widely used in the financial industry to depict cyclic time-varying data.
A heatmap is basically a matrix or table in which each cell uses color to encode a numerical value.

### Energy Data
We will be visualizing a subset of Mike Bostock's energy consumption data for 2019.
To get a sense of the data, please skim the [visualization](https://observablehq.com/@mbostock/electric-usage-2019) he created.

```python
path = 'data/energy_usage.csv'
data = pd.read_csv(path)
data.head(10)
```

|   | date | usage |
|---|------|-------|
| 0 | 2019-01-01T08:00Z | 1.88 |
| 1 | 2019-01-01T09:00Z | 2.69 |
| 2 | 2019-01-01T10:00Z | 1.73 |
| 3 | 2019-01-01T11:00Z | 1.6 |
| 4 | 2019-01-01T12:00Z | 3.24 |
| 5 | 2019-01-01T13:00Z | 2.0 |
| 6 | 2019-01-01T14:00Z | 3.33 |
| 7 | 2019-01-01T15:00Z | 3.79 |
| 8 | 2019-01-01T16:00Z | 1.55 |
| 9 | 2019-01-01T17:00Z | -0.85 |


### TimeUnit Transforms

Here are excerpts from the Altair documentation on [Times and Dates](https://altair-viz.github.io/user_guide/times_and_dates.html?highlight=time):

> Altair is designed to work best with Pandas timeseries. A standard timezone-agnostic date/time column in a Pandas dataframe will be both interpreted and displayed as local user time.
For date-time inputs like these, it can sometimes be useful to extract particular time units (e.g. hours of the day, dates of the month, etc.).
In Altair, this can be done with a time unit transform, discussed in detail in [TimeUnit Transform](https://altair-viz.github.io/user_guide/transform/timeunit.html#user-guide-timeunit-transform).

We will provide some examples, but strongly recommend that you consult the API.
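As a point of comparison outside Altair, pandas can extract the same kinds of time units from the `date` column. The sketch below uses the ten rows displayed by `data.head(10)` above and computes the units in UTC. Two caveats: Altair's plain `month` and `hours` units are evaluated in the viewer's local time (UTC variants such as `utchours` also exist), and Vega-Lite's `day` unit numbers Sunday as 0, whereas pandas numbers Monday as 0.

```python
import pandas as pd

# The ten rows displayed by data.head(10) above (08:00Z through 17:00Z).
sample = pd.DataFrame({
    "date": [f"2019-01-01T{h:02d}:00Z" for h in range(8, 18)],
    "usage": [1.88, 2.69, 1.73, 1.6, 3.24, 2.0, 3.33, 3.79, 1.55, -0.85],
})

# Parse the ISO strings; the trailing "Z" marks them as UTC.
ts = pd.to_datetime(sample["date"], utc=True)

# pandas counterparts of the month / day / hours time units (in UTC here).
sample["month"] = ts.dt.month        # 1 = January
sample["weekday"] = ts.dt.dayofweek  # pandas: Monday = 0 ... Sunday = 6
sample["hour"] = ts.dt.hour

print(sample)
```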

For example, we might decide we want a heatmap with hour of the day on the x-axis, and day of the month on the y-axis.
Let's start off by encoding the month for each data item with the `x` channel.

```python
alt.Chart(data).mark_rect().encode(
    alt.X('month(date)')
)
```

It is very hard to see each individual rectangle. All we can surmise from this visual representation is that the dataset includes energy data for the first 7 months of the year.
Let's use color to encode the energy usage for each month.

```python
alt.Chart(data).mark_rect().encode(
    alt.X('month(date)'),
    alt.Color('sum(usage):Q')
)
```

Now we can see the individual rectangles. Because Mike has solar panels, his energy consumption decreases as we proceed through the year.
In this visualization we are addressing a sequential task. Let's transition to cyclic tasks.

### Heatmap
Let us answer the question _What does the energy consumption look like for each day of the week?_
To create a heatmap, let's use the `y` channel to encode the day of the week.

```python
alt.Chart(data).mark_rect().encode(
    alt.X('month(date):O'),
    alt.Y('day(date):O'),
    alt.Color('sum(usage):Q')
)
```

Notice how for both the `x` and `y` channels we are using the same field from our dataset, **date**.
The TimeUnit transform extracts the relevant part (here, the month or the day of the week) from each date value.
What if we wanted to ask the question, _what time of the day has the highest or lowest energy usage?_
Let's use `y` to encode the month, and `x` to encode the time of the day.


```python
alt.Chart(data).mark_rect().encode(
    alt.Y('month(date):O'),
    alt.X('hoursminutes(date):O'),
    alt.Color('sum(usage):Q')
)
```

It is worth mentioning that the color channel is encoding an aggregation.
It is not encoding the individual energy usage for a specific day.
Change the aggregation from `sum` to `average`. What differences do you observe?
Remove the aggregation. What is being depicted?
We can ask the question again, _what time of day has the highest or lowest energy usage?_, but this time let us aggregate by day of the week as opposed to month.
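One way to reason about the `sum` versus `average` question is to reproduce the cell aggregation with pandas. In this sketch on hypothetical readings for two Tuesdays, each heatmap cell (weekday by hour) collects every reading that falls into it; `sum` grows with the number of days in the cell while `mean` does not, so periods containing more days tend to look hotter under `sum`.

```python
import pandas as pd

# Hypothetical hourly readings for two Tuesdays, two hours each.
df = pd.DataFrame({
    "date": pd.to_datetime(["2019-01-01 08:00", "2019-01-01 09:00",
                            "2019-01-08 08:00", "2019-01-08 09:00"]),
    "usage": [1.0, 2.0, 3.0, 6.0],
})
df["weekday"] = df["date"].dt.day_name()
df["hour"] = df["date"].dt.hour

# Each heatmap cell aggregates every reading that falls into it.
total = df.pivot_table(index="weekday", columns="hour", values="usage", aggfunc="sum")
average = df.pivot_table(index="weekday", columns="hour", values="usage", aggfunc="mean")
print(total)
print(average)
```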

```python
alt.Chart(data).mark_rect().encode(
    alt.Y('day(date):O'),
    alt.X('hoursminutes(date):O'),
    alt.Color('sum(usage):Q')
)
```

Let's visualize the entire dataset.
Use the `y` channel to encode the date, and the `x` channel to encode the time of day.


```python
alt.Chart(data).mark_rect().encode(
    alt.Y('monthdate(date):O'),
    alt.X('hoursminutes(date):O'),
    alt.Color('usage:Q')
)
```

Do you notice the date that has no data? Go to [Mike's post](https://observablehq.com/@mbostock/electric-usage-2019) to find out why.
This is a big plot, so let's swap the fields encoded on the `x` and `y` channels and make the chart smaller.
Let's rename the axis titles as well and add a title for the chart.

```python
alt.Chart(data).mark_rect().encode(
    alt.X('monthdate(date):O',
          title='Day'),
    alt.Y('hoursminutes(date):O',
          title='Time of Day'),
    alt.Color('usage:Q')
).properties(width=700, height=150, title="Mike Bostock's Household Energy Usage 2019")
```

What do you observe?
Note that you can customize each rectangle and set its size. If you go that route, you have to play around with resizing the chart to make sure that there is no blank space.


## Summary

In this notebook, we explored how the line, area, and rectangle graphical marks can be used to create charts used to visualize time-varying data.
You are more than halfway through the course. You have learned **a lot**. Integrating what we've learned across the notebooks so far about encodings, data transforms, and customization, you should now be prepared to make a wide variety of statistical graphics. Now you can put Altair into everyday use for exploring and communicating data!

Interested in learning more about this topic?

- Start with the [Altair Customizing Visualizations documentation](https://altair-viz.github.io/user_guide/customization.html).
- For a complementary discussion of scale mappings, see ["Introducing d3-scale"](https://medium.com/@mbostock/introducing-d3-scale-61980c51545f).
- For a more in-depth exploration of all the ways axes and legends can be styled by the underlying Vega library (which powers Altair and Vega-Lite), see ["A Guide to Guides: Axes & Legends in Vega"](https://beta.observablehq.com/@jheer/a-guide-to-guides-axes-legends-in-vega).

