---
title: "Calendar-based graphics"
author: "Earo Wang, Di Cook, Rob J Hyndman"
bibliography: references.bib
biblio-style: authoryear-comp
link-citations: yes
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Calendar-based graphics}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
## Introduction
```{r initial, echo = FALSE, cache = FALSE, results = 'hide'}
library(knitr)
opts_chunk$set(
  warning = FALSE, message = FALSE, echo = TRUE,
  fig.width = 7, fig.height = 6, fig.align = 'center',
  comment = "#>"
)
options(tibble.print_min = 5)
```
Calendar-based graphics are a useful tool for visually unfolding people's daily schedules in detail, such as hourly foot traffic in the CBD or daily residential electricity demand. They arrange values by their corresponding dates into a calendar layout; a common monthly calendar places weekdays in columns and weeks of a month in rows. The idea originates from @van_wijk_cluster_1999 and is implemented in a couple of R packages ([ggTimeSeries](https://github.com/AtherEnergy/ggTimeSeries) and [ggcal](https://github.com/jayjacobs/ggcal)), yet these are all variants of a heatmap in a temporal context. We extend calendar-based graphics to a broader range of applications using linear algebra tools. For example, (1) they handle not only daily data but also data at higher frequencies, such as hourly data; (2) they are no longer constrained to heatmaps but can be used with other types of *Geoms*; (3) the built-in calendars include *monthly*, *weekly*, and *daily* types for comparison between different temporal components. The `frame_calendar()` function returns the computed calendar grids as a data frame or a tibble, according to its data input, and *ggplot2* takes care of the plotting as you usually do with a data frame.
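The monthly layout described above can be sketched in a few lines of base R. The helper `calendar_cell()` below is hypothetical (it is not part of sugrrants) and assumes that weeks start on Monday; it only illustrates how a date maps to a row and a column, not how `frame_calendar()` computes its grids.

```r
# Hypothetical helper, not part of sugrrants: locate each date in a monthly
# calendar whose weeks start on Monday.
calendar_cell <- function(date) {
  date <- as.Date(date)
  # Day of week as 1 (Monday) to 7 (Sunday) gives the column.
  wday <- as.integer(format(date, "%u"))
  # Weekday of the first day of the same month.
  first_wday <- as.integer(format(as.Date(format(date, "%Y-%m-01")), "%u"))
  mday <- as.integer(format(date, "%d"))
  # Offset the day of month by where the month starts, then count whole weeks.
  week <- (mday + first_wday - 2) %/% 7 + 1
  data.frame(date = date, row = week, col = wday)
}

# 1 January 2017 was a Sunday, so it sits alone at the end of the first row.
calendar_cell(c("2017-01-01", "2017-01-02", "2017-01-31"))
```

Each daily series is then drawn inside the cell that its date maps to.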
We use the Melbourne pedestrian data (shipped with the package) as an example throughout the vignette, sourced from the [Melbourne Open Data Portal](http://www.pedestrian.melbourne.vic.gov.au). This subset of the data contains 7 sensors counting foot traffic at hourly intervals across the city of Melbourne from January to April 2017.
```{r load}
library(tidyr)
library(dplyr)
library(ggplot2)
library(viridis)
library(sugrrants)
pedestrian17 <- filter(pedestrian, Year == "2017")
pedestrian17
```
We'll start with a single sensor, Melbourne Convention Exhibition Centre, to explain the basic use of `frame_calendar()`. As it is designed to fit into the *tidyverse* framework, the interface should be straightforward to those who use the *tidyverse* on a daily basis. The first argument is `data`, so a data frame can be piped directly into the function using `%>%`. A variable indicating time of day is mapped to `x`, and a value variable of interest is mapped to `y`. The `date` argument requires a `Date` variable to organise the data into the correct chronological order. See `?frame_calendar` for more options. In this case, *Time* (hour of day) is used for `x` and *Hourly_Counts* for `y`. The function returns a data frame with newly added columns *.Time* and *.Hourly_Counts*, named by prefixing "." to the original variable names. These new columns contain the rearranged coordinates used for the calendar plots later.
```{r centre}
centre <- pedestrian17 %>%
  filter(Sensor_Name == "Melbourne Convention Exhibition Centre")
centre_calendar <- centre %>%
  frame_calendar(x = Time, y = Hourly_Counts, date = Date, calendar = "monthly")
centre_calendar
```
Consequently, *.Time* and *.Hourly_Counts* are mapped to the x and y axes respectively, grouped by *Date* when using `geom_line()`. The transformed *.Time* and *.Hourly_Counts* variables no longer carry their original meanings: their values are positions on the calendar grid rather than hours or counts, so the raw axis values should not be interpreted.
```{r centre-plot, fig.height = 7}
p1 <- centre_calendar %>%
  ggplot(aes(x = .Time, y = .Hourly_Counts, group = Date)) +
  geom_line()
p1
```
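To see why the transformed values lose their original units, consider a minimal sketch of squeezing one day's series into a calendar cell. The helper `place_in_cell()` and its `width` and `height` arguments are hypothetical names used for illustration only; this is a simplified picture under those assumptions, not the computation that `frame_calendar()` actually performs.

```r
# Min-max rescale a numeric vector into [0, 1].
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))

# Hypothetical helper (not the sugrrants implementation): squeeze one day's
# series into the unit cell at calendar position (row, col). `width` and
# `height` leave a small gap between neighbouring cells.
place_in_cell <- function(time, count, row, col, width = 0.95, height = 0.95) {
  data.frame(
    .time  = (col - 1) + rescale01(time) * width,
    .count = -row + rescale01(count) * height
  )
}

# Made-up counts for one day, placed in the second row, third column.
placed <- place_in_cell(
  time = c(0, 6, 12, 18), count = c(10, 50, 200, 90), row = 2, col = 3
)
range(placed$.time) # bounded by the cell, no longer hours of the day
```

After such a transformation the coordinates only say where to draw within the grid, which is why readable breaks and labels have to be added separately.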
To make the plot more accessible and informative, we provide another function, `prettify()`, to go hand in hand with `frame_calendar()`. It takes a `ggplot` object and gives sensible breaks and labels. The calendar-based graphic clearly depicts time of day, day of week, and other calendar effects such as public holidays.
```{r centre-more, fig.height = 7}
prettify(p1)
```
## Scales
Scaling is controlled by the `scale` argument. The default, `fixed`, scales the values globally; the figure above uses this global scale, which enables overall comparison. Another option, `free`, scales each daily block individually. It puts more emphasis on the shape of a single day than on magnitude comparison.
```{r centre-free, fig.height = 6, fig.width = 9}
centre_calendar_free <- centre %>%
  frame_calendar(x = Time, y = Hourly_Counts, date = Date, calendar = "monthly",
    scale = "free", ncol = 4)
p2 <- ggplot(centre_calendar_free,
    aes(x = .Time, y = .Hourly_Counts, group = Date)) +
  geom_line()
prettify(p2)
```
The other two choices are `free_wday` and `free_mday`, which scale conditionally on each weekday and each day of the month respectively. The code snippet below scales by weekday, which makes it possible to compare magnitudes across Mondays, across Tuesdays, and so on.
```{r centre-wday, fig.height = 6, fig.width = 9}
centre_calendar_wday <- centre %>%
  frame_calendar(x = Time, y = Hourly_Counts, date = Date, calendar = "monthly",
    scale = "free_wday", ncol = 4)
p3 <- ggplot(centre_calendar_wday,
    aes(x = .Time, y = .Hourly_Counts, group = Date)) +
  geom_line()
prettify(p3)
```
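The difference between these scaling choices can be illustrated on made-up data with base R. The sketch below assumes simple min-max rescaling and uses `ave()` to rescale within groups; the column names mirror the `scale` options, but the code is only an analogy and not the package's internals.

```r
# Min-max rescale a numeric vector into [0, 1].
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))

# Made-up counts: a quiet Monday, a busy Tuesday, and a second Monday.
toy <- data.frame(
  day   = rep(c("2017-01-02", "2017-01-03", "2017-01-09"), each = 3),
  count = c(10, 20, 30, 100, 200, 300, 40, 50, 60)
)
toy$wday <- format(as.Date(toy$day), "%u") # 1 = Monday, ..., 7 = Sunday

# Global scaling: the quiet days are squashed near zero.
toy$fixed <- rescale01(toy$count)
# Per-day scaling: every day spans the full range, so only shapes compare.
toy$free <- ave(toy$count, toy$day, FUN = rescale01)
# Per-weekday scaling: the two Mondays share one scale, Tuesday has its own.
toy$free_wday <- ave(toy$count, toy$wday, FUN = rescale01)
toy
```

Under per-weekday scaling the second Monday sits visibly higher than the first, a comparison that per-day scaling would hide.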
## Use in conjunction with `group_by`
We can also superimpose one sensor on top of another. Without `group_by()`, they share a common scale on the overlaid graph.
```{r overlay, fig.height = 6, fig.width = 9}
two_sensors <- c("Lonsdale St (South)", "Melbourne Convention Exhibition Centre")
two_sensors_df <- pedestrian17 %>%
  filter(Sensor_Name %in% two_sensors)
two_sensors_calendar <- two_sensors_df %>%
  frame_calendar(x = Time, y = Hourly_Counts, date = Date, ncol = 4)
p4 <- ggplot(two_sensors_calendar) +
  geom_line(
    data = filter(two_sensors_calendar, Sensor_Name == two_sensors[1]),
    aes(.Time, .Hourly_Counts, group = Date), colour = "#1b9e77"
  ) +
  geom_line(
    data = filter(two_sensors_calendar, Sensor_Name == two_sensors[2]),
    aes(.Time, .Hourly_Counts, group = Date), colour = "#d95f02"
  )
prettify(p4)
```
The `frame_calendar()` function can be naturally combined with `group_by()`. Each group then has its own scale, so magnitudes are not comparable across different sensors.
```{r ped-facet, fig.height = 11, fig.width = 9}
grped_calendar <- two_sensors_df %>%
  group_by(Sensor_Name) %>%
  frame_calendar(x = Time, y = Hourly_Counts, date = Date, ncol = 4)
p5 <- grped_calendar %>%
  ggplot(aes(x = .Time, y = .Hourly_Counts, group = Date)) +
  geom_line(aes(colour = Sensor_Name)) +
  facet_grid(Sensor_Name ~ .) +
  scale_color_brewer(palette = "Dark2") +
  theme(legend.position = "bottom")
prettify(p5)
```
## More geoms
`frame_calendar()` is not limited to lines; it works with other geoms too. One example is a lag scatterplot on a "daily" calendar. Lagged hourly counts are plotted against hourly counts in each daily cell using a point geom.
```{r ped-lag, fig.height = 2, fig.width = 8, warning = TRUE}
centre_lagged <- centre %>%
  mutate(Lagged_Counts = dplyr::lag(Hourly_Counts))
centre_lagged_calendar <- centre_lagged %>%
  frame_calendar(x = Hourly_Counts, y = Lagged_Counts, date = Date,
    calendar = "daily")
p6 <- centre_lagged_calendar %>%
  ggplot(aes(x = .Hourly_Counts, y = .Lagged_Counts, group = Date)) +
  geom_point(size = 0.5)
prettify(p6, size = 3)
```
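For readers unfamiliar with lagging, a lag of one shifts the series down by one position, as in the base-R sketch below on made-up counts. The first observation has no predecessor and becomes `NA`, which is probably why this chunk sets `warning = TRUE`: a missing coordinate would typically be dropped by *ggplot2* with a warning.

```r
# Made-up hourly counts (not the pedestrian data).
counts <- c(120, 95, 60, 210)

# A lag of one in base R: prepend NA and drop the last value. This matches
# what dplyr::lag() returns with its default n = 1.
lagged <- c(NA, head(counts, -1))

data.frame(counts, lagged)
```

Because the lag in the chunk above is applied to the whole series rather than within each day, the first hour of a day is paired with the last hour of the previous day.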
Furthermore, the argument `y` can take multiple variable names in combination with `vars()`. Rectangular glyphs arranged on a "weekly" calendar illustrate the use of multiple `y`s and the differences between sensors. The long data format is first converted to the wide format using `tidyr::spread()` [@wickham2014tidy]. The two sensors are now variables rather than values, and hence can be passed to `y`.
```{r ped-daily}
two_sensors_wide <- two_sensors_df %>%
  select(-Sensor_ID) %>%
  spread(key = Sensor_Name, value = Hourly_Counts) %>%
  rename(
    Lonsdale = `Lonsdale St (South)`,
    Centre = `Melbourne Convention Exhibition Centre`
  ) %>%
  mutate(
    Diff = Centre - Lonsdale,
    More = if_else(Diff > 0, "Centre", "Lonsdale")
  )
sensors_wide_calendar <- two_sensors_wide %>%
  frame_calendar(x = Time, y = vars(Centre, Lonsdale), date = Date,
    calendar = "weekly")
sensors_wide_calendar
```
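The long-to-wide step can be hard to picture, so here is a small sketch on made-up data. It uses base R's `reshape()` purely so that it runs without extra packages; the chunk above uses `tidyr::spread()` instead, and the toy column names `Sensor` and `Count` are assumptions for illustration.

```r
# Made-up long data: one row per sensor and hour (not the pedestrian data).
long <- data.frame(
  Time   = rep(c(8, 20), times = 2),
  Sensor = rep(c("Centre", "Lonsdale"), each = 2),
  Count  = c(300, 50, 120, 180)
)

# Long to wide: one row per hour, one column per sensor. reshape() names the
# new columns "Count.Centre" and "Count.Lonsdale", so strip the prefix.
wide <- reshape(long, idvar = "Time", timevar = "Sensor", direction = "wide")
names(wide) <- sub("^Count\\.", "", names(wide))

# Each sensor is now a variable, so the two can be compared row by row.
wide$Diff <- wide$Centre - wide$Lonsdale
wide$More <- ifelse(wide$Diff > 0, "Centre", "Lonsdale")
wide
```

With one column per sensor, both columns can be passed together as `y` variables.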
Having multiple `y`s makes it a little easier to map to geoms that take `ymin` and `ymax` aesthetics, such as `geom_rect()` and `geom_ribbon()`. Interestingly, Lonsdale Street is busier than the Convention Centre in the evening, and vice versa in the daytime.
```{r ped-daily-plot}
p7 <- sensors_wide_calendar %>%
  ggplot() +
  geom_rect(aes(
    xmin = .Time, xmax = .Time + 0.005,
    ymin = .Lonsdale, ymax = .Centre, fill = More
  )) +
  scale_fill_brewer(palette = "Dark2") +
  theme(legend.position = "bottom")
prettify(p7)
```
Can we plot a heatmap using `frame_calendar()` too? Yes: either `x` or `y` can be given as 1, implying identity across days. If both `x` and `y` are given as 1, only the calendar grids are returned, from which a heatmap can be drawn.
```{r ped-max}
centre_daily <- centre %>%
  group_by(Date) %>%
  summarise(Max_Counts = max(Hourly_Counts)) %>%
  ungroup()
centre_max_calendar <- centre_daily %>%
  frame_calendar(x = 1, y = 1, date = Date, calendar = "monthly")
head(centre_max_calendar)
p8 <- centre_max_calendar %>%
  ggplot(aes(x = .x, y = .y)) +
  geom_tile(aes(fill = Max_Counts), colour = "grey50") +
  scale_fill_viridis()
prettify(p8, label = "label", label.padding = unit(0.2, "lines"))
```
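The aggregation step in the chunk above reduces the hourly data to one value per day, which is what a tile-based heatmap needs. As a minimal sketch on made-up data, the same reduction can be written with base R's `aggregate()` in place of `group_by()` and `summarise()`:

```r
# Made-up hourly counts for two days (not the pedestrian data).
toy <- data.frame(
  Date = as.Date(c("2017-01-02", "2017-01-02", "2017-01-03", "2017-01-03")),
  Hourly_Counts = c(120, 310, 95, 240)
)

# One row per day, holding that day's maximum count.
daily_max <- aggregate(Hourly_Counts ~ Date, data = toy, FUN = max)
daily_max
```

Each remaining row then corresponds to exactly one calendar cell, so a single fill colour per tile is enough.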
## Summary
As its name suggests, `frame_calendar()` just returns a rearranged data frame and leaves the plotting to *ggplot2*, which allows more flexibility in calendar-based visualisation. Some of the plots shown above could also be produced using *ggplot2* facets on temporal units; however, `frame_calendar()` coupled with *ggplot2* is much faster than `facet_*` as it's lighter weight. Lastly, because the glyphs are laid out on a calendar, it can display long historical temporal data easily in limited screen space.
## Reference