d3.stack for tidy data? #158

Open
mbostock opened this issue Feb 6, 2020 · 3 comments

@mbostock
Member

mbostock commented Feb 6, 2020

d3.stack is designed to work with non-tidy data where each row corresponds to a “group” (the set of observations for all layers, e.g., year) with properties for each “layer” a.k.a. series (e.g., format) recording the observed value (e.g., revenue).

| Year | 8 - Track  | Cassette  | Cassette Single |
|------|------------|-----------|-----------------|
| 1973 | 2699600000 | 419600000 | 0               |
| 1974 | 2730600000 | 433600000 | 0               |

In the tidy format, in contrast, rows correspond to observations and columns correspond to variables. (This is less efficient as the layer names are repeated, but oh well.)

| Year | Format          | Revenue    |
|------|-----------------|------------|
| 1973 | 8 - Track       | 2699600000 |
| 1973 | Cassette        | 419600000  |
| 1973 | Cassette Single | 0          |
| 1974 | 8 - Track       | 2730600000 |
| 1974 | Cassette        | 433600000  |
| 1974 | Cassette Single | 0          |
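
To make the relationship between the two layouts concrete, here is a minimal sketch in plain JavaScript that pivots the wide rows above into tidy observations. The helper name `toTidy` and its parameters are hypothetical and chosen only for illustration; nothing here is part of D3.

```javascript
// Wide (non-tidy) rows: one row per group (year), one column per layer (format).
const wide = [
  {Year: 1973, "8 - Track": 2699600000, Cassette: 419600000, "Cassette Single": 0},
  {Year: 1974, "8 - Track": 2730600000, Cassette: 433600000, "Cassette Single": 0}
];

// Hypothetical helper (not a D3 API): emit one tidy observation per
// (group, layer) pair, repeating the layer name on every row.
function toTidy(rows, groupColumn, layerName, valueName) {
  return rows.flatMap(row =>
    Object.keys(row)
      .filter(column => column !== groupColumn)
      .map(column => ({
        [groupColumn]: row[groupColumn],
        [layerName]: column,
        [valueName]: row[column]
      }))
  );
}

const tidy = toTidy(wide, "Year", "Format", "Revenue");
console.log(tidy.length); // 2 years × 3 formats = 6 observations
```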

It’s possible to use tidy data with d3.stack, but it’s a little convoluted.

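series = d3.stack()
    // Layer keys: the distinct names, in order of first appearance.
    .keys(d3.group(data, d => d.name).keys())
    // Each group is a Map from name to observation; look up this layer's value.
    .value((group, key) => group.get(key).value)
    .order(d3.stackOrderReverse)
  // Input: one Map (name to observation) per year.
  (d3.rollup(data, ([d]) => d, d => d.year, d => d.name).values())
    // Replace each point's data (the year's Map) with the matching observation.
    .map(s => (s.forEach(d => d.data = d.data.get(s.key)), s))
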
It’d be nice if it were more convenient to give d3.stack tidy data, say like so:

series = d3.stack()
    .key(d => [d.name, d.year])
    .value(d => d.value)
    .order(d3.stackOrderReverse)
  (data)

Here the key accessor would return a two-part key: the layer key and the group key. And the value accessor wouldn’t need to know the current keys. (Because the data is tidy, the value accessor is the same for all observations.)
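
One way to read this design, as a rough sketch in plain JavaScript rather than a proposed implementation: `stackTidy` below is a hypothetical function, not an existing D3 API. It assumes dense data (every layer has an observation for every group), keeps layers and groups in order of first appearance, and omits ordering and offset options such as d3.stackOrderReverse. The output loosely mirrors d3.stack: one array per layer with a `key`, holding `[y0, y1]` points that carry the original observation as `data`.

```javascript
// Hypothetical sketch of the proposed accessor-based design; not part of D3.
function stackTidy(data, key, value) {
  const layers = new Map(); // layer key -> Map(group key -> observation)
  const groups = new Set(); // group keys, in order of first appearance
  for (const d of data) {
    const [layerKey, groupKey] = key(d); // the two-part key
    if (!layers.has(layerKey)) layers.set(layerKey, new Map());
    layers.get(layerKey).set(groupKey, d);
    groups.add(groupKey);
  }
  const totals = new Map(); // running total per group
  const series = [];
  for (const [layerKey, byGroup] of layers) {
    const points = [];
    for (const groupKey of groups) {
      const d = byGroup.get(groupKey); // assumption: never missing (dense data)
      const y0 = totals.get(groupKey) || 0;
      const y1 = y0 + value(d); // same value accessor for every observation
      totals.set(groupKey, y1);
      const point = [y0, y1];
      point.data = d;
      points.push(point);
    }
    points.key = layerKey;
    series.push(points);
  }
  return series;
}

const data = [
  {name: "8 - Track", year: 1973, value: 2699600000},
  {name: "Cassette", year: 1973, value: 419600000},
  {name: "Cassette Single", year: 1973, value: 0},
  {name: "8 - Track", year: 1974, value: 2730600000},
  {name: "Cassette", year: 1974, value: 433600000},
  {name: "Cassette Single", year: 1974, value: 0}
];

const series = stackTidy(data, d => [d.name, d.year], d => d.value);
console.log(series[1].key, series[1][0]); // the Cassette layer's 1973 point
```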

An implication of the proposed design is that the data can be sparse: some layers may be missing observations for some groups (and equivalently vice versa). That’s not possible with the current design because the layer keys (stack.keys) and group keys (data) are specified as separate arrays, but it should be easy enough for d3.stack to compute the union of layer keys and the union of group keys to fill in the missing data. d3.stack probably will also need some facility for ordering the group keys, as the order may not be consistent across layers.
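
The union-and-fill step could look roughly like the sketch below. Everything in it is an assumption made for illustration: the helper name `densify`, the choice of zero as the fill value, and the numeric default for the group comparator. The sample input is a contrived subset of the table above with one observation removed.

```javascript
// Hypothetical helper (not part of D3): compute the union of layer keys and
// group keys from sparse tidy data, then fill missing observations with zero.
function densify(data, key, value, compareGroups = (a, b) => a - b) {
  const layerKeys = new Set(); // union of layer keys
  const groupKeys = new Set(); // union of group keys
  const values = new Map(); // layer key -> Map(group key -> value)
  for (const d of data) {
    const [layerKey, groupKey] = key(d);
    layerKeys.add(layerKey);
    groupKeys.add(groupKey);
    if (!values.has(layerKey)) values.set(layerKey, new Map());
    values.get(layerKey).set(groupKey, value(d));
  }
  // Sort group keys explicitly, since their order may differ across layers.
  const groups = [...groupKeys].sort(compareGroups);
  return [...layerKeys].map(layerKey => ({
    key: layerKey,
    values: groups.map(groupKey => {
      const v = values.get(layerKey).get(groupKey);
      return {group: groupKey, value: v === undefined ? 0 : v}; // assumed fill: 0
    })
  }));
}

// Contrived sparse input: Cassette has no 1973 observation, and the rows
// arrive out of group order.
const sparse = [
  {name: "Cassette", year: 1974, value: 433600000},
  {name: "8 - Track", year: 1973, value: 2699600000},
  {name: "8 - Track", year: 1974, value: 2730600000}
];

const dense = densify(sparse, d => [d.name, d.year], d => d.value);
console.log(JSON.stringify(dense[0])); // Cassette, with 1973 filled in as 0
```

Whether zero is the right fill value depends on the chart; a real API might prefer to make that configurable.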

I imagine it’ll be difficult to make this backwards-compatible, but maybe it’s possible, or maybe it could be under a new name such as d3.stackTidy.

@Fil
Member

Fil commented Feb 6, 2020

Absolutely! We can use https://observablehq.com/@fil/ncov2019-data#databyday for a current example :-/ (I don't think my data wrangling in that notebook is the most straightforward.)

@mbostock
Member Author

mbostock commented Feb 9, 2020

Here’s an earlier example that breaks the data transformation into separate cells:

https://observablehq.com/@d3/stacked-area-chart-via-d3-group

Fil self-assigned this on Mar 28, 2021
@Fil
Member

Fil commented Aug 2, 2021

I feel that Plot's stack transform is the correct answer now, but we need to design the API if we want to port it back to D3.
