Skip to content
This repository was archived by the owner on Jul 13, 2021. It is now read-only.

Conversation

@greimel
Copy link
Contributor

@greimel greimel commented Dec 28, 2020

Example

Generate data
using CairoMakie
using DataFrames
using Random: shuffle

bar_df = let
	n_dodge = 2
	n_x = 3
	n_stack = 5
	n = n_dodge * n_x * n_stack
	
	grp_dodge = ["dodge $i" for i in 1:n_dodge]
	grp_x     = ["x $i"     for i in 1: n_x]
	grp_stack = ["stack $i" for i in 1:n_stack]
	
	df = Iterators.product(grp_dodge, grp_x, grp_stack) |> DataFrame
	cols = [:grp_dodge, :grp_x, :grp_stack]
	rename!(df, cols)
	transform!(df, cols .=> categorical .=> cols)
	
	cols_i = cols .|> string .|> x -> x[5:end]  .|> x -> x * "_i"
	transform!(df, cols .=> (x -> Int.(x.refs)) .=> cols_i)
	
	df[:,:y] = rand(n)
	#shuffle
	df = DataFrame(shuffle(eachrow(df)))
	df
end
barplot(bar_df.grp_x, bar_df.y, dodge = bar_df.grp_dodge, stack = bar_df.grp_stack, color = bar_df.stack_i)

image

RFC

In order to be really useful for AlgebraOfGraphics, this should also support incremental build-up of plots. That is, one needs to have a struct like

mutable struct BarPlotContents 
     x_groups
     dodge_groups
     stack_groups
     heights # length(dodge_groups) x length(stack_groups) x length(x_groups)
end

and whenever you add to the plot you

  • check if groups need to be expanded
  • calculate y and fillto given the current plot
  • plot
  • update heights

I would somehow change the behaviour of how barplot! works now, wouldn't it?

Do you have any other ideas how to support incremental plotting of grouped bar plots?

Do you want to support this in AbstractPlotting?

@greimel
Copy link
Contributor Author

greimel commented Dec 28, 2020

Can I somehow access attributes of previously plotted plots of the same Axis2D in the plot!(BarPlot) function? Then one could add new attributes (like heights) and use previous plots to calculate them.

@greimel
Copy link
Contributor Author

greimel commented Jan 12, 2021

Could you give me some feedback if (this is how) you want this?

If so, I'll try to remove the dependency on DataFrames.jl again and add some tests.

@greimel
Copy link
Contributor Author

greimel commented Jan 12, 2021

FWIW, after the MakieLayout merge, the categorical axis labels are gone,

image

@SimonDanisch
Copy link
Member

Could you give me some feedback if (this is how) you want this?

Sorry, I was super busy in my freetime to actually release JSServe & WGLMakie.

FWIW, after the MakieLayout merge, the categorical axis labels are gone,

Hm, I guess we need to figure out how to do that 😅

@SimonDanisch
Copy link
Member

Would be nice to make this independent of Dataframes.
@jkrumbiegel and I are trying to figure out how to do categorical axes with MakieLayout.
Sorry, this PR had a bad timing, but I'm very excited to have this merged!

@greimel
Copy link
Contributor Author

greimel commented Jan 17, 2021

Alright, I probably won't have time for this in the next two weeks, but who knows.

Note to self: I need to make this work for negative values and for a mixture of positive and negative values.

@grahamas grahamas mentioned this pull request Feb 19, 2021
7 tasks
@greimel
Copy link
Contributor Author

greimel commented Feb 20, 2021

This can now handle mixing positive and negative values

image

@greimel
Copy link
Contributor Author

greimel commented Feb 21, 2021

Examples

Plain bar

df = filter(:grp_stack => ==(1), DataFrame(bar_df))
filter!(:grp_dodge => ==(1), df)
	
barplot(df.x, df.y)

image

Dodged and stacked

df = bar_df
barplot(df.x, df.y, dodge = df.grp_dodge, stack = df.grp_stack, color = df.grp_stack)

image

Dodged

df = filter(:grp_stack => ==(1), DataFrame(bar_df))
barplot(df.x, df.y, dodge = df.grp_dodge, color = df.grp_dodge, dodge_gap = 0)

image

Stacked

df = filter(:grp_dodge => ==(1), DataFrame(bar_df))
barplot(df.x, df.y, stack = df.grp_stack, color = df.grp_stack)

image

Data

(copied from test)

begin
	x1         = [1, 1,  1,  1]
	y1         = [2, 3, -3, -2]
	grp_dodge1 = [2, 2,  1,  1]
	grp_stack1 = [1, 2,  1,  2]

	x2         = [2, 2,  2,  2]
	y2         = [2, 3, -3, -2]	
	grp_dodge2 = [1, 2,  1,  2]
	grp_stack2 = [1, 1,  2,  2]

	from, to = stack_from_to(grp_stack1, y1, (; x1, grp_dodge1))
	from1 = [0.0, 2.0,  0.0, -3.0]
	to1   = [2.0, 5.0, -3.0, -5.0]
	@test from == from1
	@test to   == to1

	from, to = stack_from_to(grp_stack2, y2, (; x2, grp_dodge2))
	from2 = [0.0,  0.0,  0.0,  0.0]
	to2   = [2.0,  3.0, -3.0, -2.0]
	@test from == from2
	@test to   == to2

	perm = [1, 4, 2, 7, 5, 3, 8, 6]
	x = [x1; x2][perm]
	y = [y1; y2][perm]
	grp_dodge = [grp_dodge1; grp_dodge2][perm]
	grp_stack = [grp_stack1; grp_stack2][perm]
	
	from_test = [from1; from2][perm]
	to_test = [to1; to2][perm]
	
	from, to = stack_from_to(grp_stack, y, (; x, grp_dodge))
	@test from == from_test
	@test to == to_test
	
	bar_df  = (; x,  grp_dodge,  grp_stack,  y)
end

If desired I can add these to the documentation.

Also, where should the tests go? (Which file/folder?)

@jkrumbiegel
Copy link
Member

For tests, you can make your own file under unit_tests I think, to check the grouping logic. For the plot part, I would add examples in the tests folder, wherever it looks the most appropriate to you

@greimel
Copy link
Contributor Author

greimel commented Mar 22, 2021

done, is a test dependency on CategoricalArrays ok?

if you want #667, then I could test the functionality from there in this test here as well.

@SimonDanisch
Copy link
Member

Great!

done, is a test dependency on CategoricalArrays ok?

Sounds good :)

@greimel
Copy link
Contributor Author

greimel commented Mar 22, 2021

Done, this PR now relies on #667

@greimel
Copy link
Contributor Author

greimel commented Mar 22, 2021

here's the plot I added to the tests:

	using CategoricalArrays: categorical
	
	x1         = ["a_right", "a_right", "a_right", "a_right"]
	y1         = [2, 3, -3, -2]
	grp_dodge1 = ["b", "b",  "a", "a"]
	grp_stack1 = [1, 2,  1,  2]

	x2         = ["z_left", "z_left", "z_left", "z_left"]
	y2         = [2, 3, -3, -2]	
	grp_dodge2 = ["a", "b", "a", "b"]
	grp_stack2 = [1, 1,  2,  2]

	perm = [1, 4, 2, 7, 5, 3, 8, 6]
	x = [x1; x2][perm]
	x = categorical(x, levels = ["z_left", "a_right"])
	y = [y1; y2][perm]
	grp_dodge = [grp_dodge1; grp_dodge2][perm]
	grp_stack = [grp_stack1; grp_stack2][perm]
		
	tbl = (; x,  grp_dodge,  grp_stack,  y)
	
	fig = Figure()
	ax = Axis(fig[1,1])
	
	barplot!(ax, tbl.x, tbl.y, dodge = tbl.grp_dodge, stack = tbl.grp_stack, color = tbl.grp_stack)
	
       categorical_ticks(cat) = AbstractPlotting.categorical_range(cat), 
		AbstractPlotting.categorical_labels(cat)
	ax.xticks = categorical_ticks(tbl.x)
	
	fig

image

@greimel
Copy link
Contributor Author

greimel commented Mar 26, 2021

Tests now pass locally

greimel added 4 commits April 5, 2021 17:22
qualify names from DataFrames


remove @show


edge cases


1


5
use `categoric_positions`


fix


fix
test


add CategoricalArrays as test dependency


update test


ReferenceTest: using at top level


import function to unit tests


add categorical arrays to referencetests


fix tests


rename tests file: bar.jl => stack.jl
@greimel
Copy link
Contributor Author

greimel commented Apr 6, 2021

given the skepticism about #666 and #667 I disentangled this PR again.

I would really appreciate to have this merged, because I need it often.

Please let me know if I can be of any further help here.

@SimonDanisch
Copy link
Member

I think the deps need to clean up to pass tests (remove Dataframes, add StructArrays or so?).
Otherwise, this seems good to merge (@jkrumbiegel ? )!

@greimel
Copy link
Contributor Author

greimel commented Apr 6, 2021

right, I've overlooked this. I had done this, but I've rebased this PR way too often now :-\

@greimel
Copy link
Contributor Author

greimel commented Apr 6, 2021

tests failed only on 1.4 due to NamedTuple syntax - this is now fixed.

CI still fails because docs previews fail for non-JuliaPlots people

@greimel
Copy link
Contributor Author

greimel commented Apr 14, 2021

friendly bump 🙂

@greimel
Copy link
Contributor Author

greimel commented Apr 15, 2021

@SimonDanisch I added examples to the docs

image
image
image
image

@greimel
Copy link
Contributor Author

greimel commented Apr 15, 2021

text looks scrambled when I build the docs locally.

E.g.

image

But it's similar for unrelated sections of the docs.

@jkrumbiegel
Copy link
Member

The svgs are interacting with each other, thats why we use the master branch of documenter

@greimel
Copy link
Contributor Author

greimel commented Apr 15, 2021

I see. I rebuilt with Documenter#master and now it looks good.

This is ready from my side.

@piever
Copy link
Member

piever commented Apr 15, 2021

Sorry for only looking carefully at this now!

Concerning "incremental build-ups", the key idea is that, in AlgebraOfGraphics, once the user add series, those are accumulated in an abstract object, which is then plotted, so there may not be a need for a "build-up mode" (I may need to think harder about this).

What it would need though is a way to pass dodging and stacking information where the set of "dodgeable values" is not taken from unique(dodge) (this will be necessary to synchronize "dodging" across different plots in a facet, without having to mess around with categorical arrays). This can definitely be added later though, but it may be worth to add a note that the dodging / stacking interface is experimental, so that it can be changed without bumping the minor version number.

I confess I would prefer the default approach to be something like dodge = [1, 2, 3] (meaning, put the bar in the first, second, or third dodge slot) . It seems more consistent with the rest of AbstractPlotting than the automated approach based on unique (forcing categorical conversion for all input types).

The issue with supporting both approaches is what it would mean to do dodge = [1, 2, 4]. IMO, here the user is asking to actively skip a dodge slot, whereas with the categorical conversion it would become the same as dodge = [1, 2, 3]. OTOH, if we take the default to be dodge::Vector{Int}, the user can always pass dodge = levels(x) to get the categorical conversion behavior. (Or we can select a more restricted set of input types on which to perform this conversion.)

Comment on lines 55 to 59
if dodge === automatic
n_dodge = 1
else
n_dodge = length(unique(dodge))
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe i_dodge could be computed before this code block. Then this could simply be n_dodge = maximum(i_dodge)? That way if we change the logic to compute i_dodge, we only have to change it once.

if dodge === automatic
i_dodge = 1
else
i_dodge = categoric_position.(dodge, Ref(categoric_labels(dodge)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the "incriminated line" for me. I think somehow it would be nice for the user to be able to pass i_dodge directly.

@greimel
Copy link
Contributor Author

greimel commented Apr 15, 2021

Concerning "incremental build-ups", the key idea is that, in AlgebraOfGraphics, once the user add series, those are accumulated in an abstract object, which is then plotted, so there may not be a need for a "build-up mode" (I may need to think harder about this).

Right, I've worked around this in TabularMakie.jl.

necessary to synchronize "dodging" across different plots in a facet

That's a good point. For dodging, we can support [1, 2, 4] leaving a gap at three. For stacking this cannot work because it would require redrawing the whole plot. For this use case we need n_dodge as an additional attribute, that defaults to maximum(i_dodge). (What if one panel has i_dodge = [1, 2, 4] and the other i_dodge = [1, 2, 3]?)

I'd be happy to only allowing integers for dodging and stacking for now. Categorical variables can be supported once it has become clear how Makie deals with them in general.

Before I make these changes I'd like to hear what Julius and Simon think, though.

@piever
Copy link
Member

piever commented Apr 15, 2021

That's a good point. For dodging, we can support [1, 2, 4] leaving a gap at three. For stacking this cannot work because it would require redrawing the whole plot.

There may be smart ways to go about this using Observables, but let's worry about those in a future iteration.

I'd be happy to only allowing integers for dodging and stacking for now. Categorical variables can be supported once it has become clear how Makie deals with them in general.

Yes, I also think that's best, this should be done in a unified way (but yes, let's also see what Simon and Julius think).

@SimonDanisch
Copy link
Member

I'm ready to merge this once the docs build: #697, if there are no other concerns by @greimel, @jkrumbiegel or @piever :)

@SimonDanisch
Copy link
Member

Docs build, any final concerns?

@greimel
Copy link
Contributor Author

greimel commented Apr 20, 2021

@piever suggested to disallow non-integer inputs for stacking and dodging and add support for categorical inputs when Makie-wide support for categorical variables has been improved.

@greimel
Copy link
Contributor Author

greimel commented Apr 20, 2021

I pushed one more commit that restricts inputs of stack and dodge to integers. Integers are used directly to determine the dodge-position.

Please take a look, @piever.

image

tbl = (x = [1, 1, 1, 2, 2, 2, 3, 3, 3],
       height = 0.1:0.1:0.9,
       grp = [1, 2, 3, 1, 2, 3, 1, 2, 3],
       grp1 = [3, 2, 2, 1, 1, 2, 1, 1, 3],
       grp2 = [1, 1, 3, 1, 2, 1, 1, 3, 1]
       )

barplot(tbl.x, tbl.height,
        dodge = tbl.grp1,
        stack = tbl.grp2,
        color = tbl.grp,
        axis = (xticks = (1:3, ["left", "middle", "right"]),
                title = "Dodged and stacked bars"),
        figure = (resolution = (800, 600), )
        )

@piever
Copy link
Member

piever commented Apr 20, 2021

Thanks a lot for the change! Looks great to me, this should already be usable from my WIP AoG rewrite!

@jkrumbiegel
Copy link
Member

jkrumbiegel commented Apr 20, 2021

Haven't had time to check this out, but looks good so far! One thing I'd like to see is a stacked / dodged barplot where fillto is set to something other than 0. This will be needed when using it with log axes, as bars are not allowed to start at 0 in that case.

fillto can be an array, so two versions would be good that show this still works

@greimel
Copy link
Contributor Author

greimel commented Apr 20, 2021

I don't touch the fillto argument except for stacked barplots. (In my opinion scaled axes don't make sense for stacked bar plots because then the bar lengths depend on the stack index.)

Can you give me a MWE for a standard barplot with log-y-axis, then I can adapt it for dodging and show that it works.

@SimonDanisch
Copy link
Member

Thanks, this is really great :)

@SimonDanisch SimonDanisch merged commit 79d9837 into JuliaPlots:master Apr 21, 2021
x_unique = unique(filter(isfinite, x))

if length(x_unique) == 1
width = 1.0
Copy link
Member

@piever piever Apr 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit late for this comment, but I've just realized that this approach (setting default width to take the full space, and reducing it later) could be wrong. Now the width passed by the user is no longer respected, so for example in barplot(1:3, rand(3), width=1), bars no longer touch (this is problematic for histograms, where there shouldn't be space between the bars).

The "proper way" IMO is to stick with the previous approach of defaulting to width = 0.8 * default_gap, and do not add extra "outside spacing" later.

Copy link
Member

@piever piever Apr 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@greimel Is fixing this just a matter of changing defaults, where default width goes 0.8 * default gap and default x_gap goes to 0?

EDIT: even setting x_gap = 0 does not seem to completely remove the gap between bars

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to set both x_gap and dodge_gap to 0.

begin
	f = Figure(resolution = (800, 600))
	Axis(f[1, 1], xticks = 0.5:0.1:2.5)

	xs = 1:2
	ys = 0.5 .* sin.(xs)

	barplot!(xs, ys, color = :red, strokecolor = :black, width = 0.8, strokewidth = 0.1, x_gap = 0, dodge_gap = 0)

	f
end

image

We could set these two to automatic as well. dodge_gap would be zero if dodge === automatic. And x_gap would be zero if width !== automatic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation! I think the cleanest solution is to only use x_gap to determine the width if it is automatic. This allows us to simplify those annoying dodge formulas as well, because there is no longer x_gap to worry about, see #702

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants