This repository has been archived by the owner on Jun 29, 2021. It is now read-only.

grouping prototype #7

Merged
merged 39 commits into from
Nov 13, 2018

piever
Member

@piever piever commented Oct 3, 2018

This is a WIP to play around with ideas from JuliaPlots/Plots.jl#1530 in StatsMakie.

Instead of a group keyword I'm using the first argument as I don't know how to implement this in Makie otherwise...

The idea is that one can use the first argument to do some grouping and define some keywords on those groupings. Here, for example, I'm plotting a scatter plot of rand(10) versus rand(10), using two vectors, rand(Bool, 10) and rand(Bool, 10), to specify color and markersize respectively. Then I'm passing the list of colors and markersizes (this part should be optional and default to something sensible, but it is not at the moment).

julia> scatter((:color => rand(Bool, 10) => [:blue, :red], :markersize => rand(Bool, 10) => [0.03, 0.2]), rand(10), rand(10))

screenshot from 2018-10-03 15-33-12
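
A minimal sketch of what "grouping" could mean mechanically, assuming each unique combination of grouping values becomes one series. The helper `group_indices` is a hypothetical name for illustration and is not taken from the PR's implementation.

```julia
# Hypothetical helper (not StatsMakie's code): collect the row indices that
# belong to each unique combination of values in the grouping vectors.
function group_indices(by::AbstractVector...)
    combos = collect(zip(by...))            # one tuple of group values per row
    groups = Dict{eltype(combos), Vector{Int}}()
    for (i, combo) in enumerate(combos)
        push!(get!(groups, combo, Int[]), i)
    end
    return groups
end

color_by  = [true, false, true, false]
marker_by = [true, true, false, false]
x = [0.1, 0.2, 0.3, 0.4]

for (combo, idxs) in group_indices(color_by, marker_by)
    println(combo, " => ", x[idxs])         # each entry would become one series
end
```

With two Boolean grouping vectors there can be up to four series, one per (color, markersize) combination.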

Would be happy to have feedback on syntax (I'm starting to like the :color => v => scale syntax but different people may have different taste) and on whether this should be closer to Plots grouping (meaning only work for categorical values) or to GoG aesthetics (also accept continuous values and try to detect automatically which it is).

In combination with @df this could be used with column names rather than vectors and should provide some automated legend entries for the group as well.

@piever
Member Author

piever commented Oct 3, 2018

At the moment one can also pass a function or a dict rather than an array as a scale, for example:

julia> scatter((:color => rand(Bool, 10) => t -> t ? :blue : :red, :markersize => rand(Bool, 10) => [0.03, 0.2]), rand(10), rand(10))
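
A rough sketch of how the three kinds of scale could be applied, assuming an array is indexed by the rank of the value among the sorted unique grouping values (cycling if it is too short), while a function or dict is applied to the value itself. `apply_scale` and `rank_of` are hypothetical names, and the rank convention is an assumption rather than something stated in the thread.

```julia
# Hypothetical sketch: map a grouping value to an attribute value.
# `rank` is the position of `value` among the sorted unique grouping values.
apply_scale(scale::AbstractVector, value, rank::Integer) = scale[mod1(rank, length(scale))]
apply_scale(scale::Function, value, rank::Integer) = scale(value)
apply_scale(scale::AbstractDict, value, rank::Integer) = scale[value]

by = [true, false, true]
levels = sort(unique(by))                       # [false, true]
rank_of(v) = findfirst(isequal(v), levels)

colors_vec  = [apply_scale([:blue, :red], v, rank_of(v)) for v in by]
colors_fun  = [apply_scale(t -> t ? :blue : :red, v, rank_of(v)) for v in by]
colors_dict = [apply_scale(Dict(true => :blue, false => :red), v, rank_of(v)) for v in by]
```

Under this assumption the array scale assigns :blue to false (rank 1), whereas the function above assigns :blue to true, so the two calls in the thread would not necessarily produce the same colors.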

@piever
Member Author

piever commented Nov 3, 2018

This now requires JuliaPlots/AbstractPlotting.jl#35 and MakieOrg/Makie.jl#203 if people want to try it out.

I now use a special Group type that accepts keyword arguments:

N = 20

scatter(
    Group(
        color = rand(Bool, 10) => [:blue, :red],
        markersize = rand(Bool, 10) => [0.03, 0.2]),
    rand(10),
    rand(10)
)

and I've started adding default scales:

scatter(
    Group(color = rand(1:4, N), marker = rand(Bool, N)),
    rand(N),
    rand(N)
)

screenshot from 2018-11-03 16-59-17

In this case marker cycles across all possible markers (for now it's a bit hacky, as they are only accessible as a Dict from AbstractPlotting so the order is undefined, but this can be easily fixed from there) and color defaults to the beautiful to_colormap(:Dark2) of JuliaPlots/AbstractPlotting.jl#37

To reduce verbosity, if no keyword is specified, it defaults to grouping over color, i.e.

scatter(Group(rand(1:4, N)), rand(N), rand(N))

is equivalent to

scatter(Group(color = rand(1:4, N)), rand(N), rand(N))

which is helpful for those used to the behavior of Plots.
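
A minimal sketch of how a keyword-based group type with a "bare argument means color" default could be written. `SketchGroup` is a hypothetical stand-in; the field name and constructors are assumptions and not the PR's actual Group definition.

```julia
# Hypothetical stand-in for the Group type discussed above.
struct SketchGroup{NT<:NamedTuple}
    columns::NT
end

# Keyword form: SketchGroup(color = ..., marker = ...)
SketchGroup(; kwargs...) = SketchGroup(values(kwargs))   # kwargs -> NamedTuple

# Bare positional form: assumed to group by color
SketchGroup(v) = SketchGroup(color = v)

a = SketchGroup([1, 2, 1])
b = SketchGroup(color = [1, 2, 1])
a.columns == b.columns      # both store (color = [1, 2, 1],)
```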

Important design remark: Makie is quite different from Plots in that there is no pipeline to process keywords sequentially, so for group to have such a drastic effect on the plot, it needs to be a positional argument so the API will probably stay as is.

@piever
Member Author

piever commented Nov 4, 2018

I feel it's making good progress. Now I can have some GoG flavor already.

Default grouping is by color, like in Plots. Positional arguments and "styling that does not group" go into Style. Arguments that don't use the dataset stay outside Style.

using StatsMakie, RDatasets
julia> iris = RDatasets.dataset("datasets", "iris");

julia> scatter(iris, Group(:Species), Style(:SepalLength, :SepalWidth))

screenshot from 2018-11-04 13-11-43

Grouping by more than one thing

julia> mpg = RDatasets.dataset("ggplot2", "mpg");

julia> scatter(mpg,
           Group(marker = :Manufacturer, color = :Class),
           Style(:Displ, :Hwy), markersize = 1
       )

screenshot from 2018-11-04 13-11-19

Give a pair of column and scale to use a custom scale:

julia> scatter(mpg,
           Group(marker = :Manufacturer, color = :Class => [:blue, :cyan, :magenta]), #cycle through these colors
           Style(:Displ, :Hwy), markersize = 1
       )

screenshot from 2018-11-04 13-32-53

Adding a continuous color

julia> scatter(mpg,
           Group(marker = :Manufacturer),
           Style(:Displ, :Hwy, color = :Hwy), markersize = 1 
       )

screenshot from 2018-11-04 13-29-54

Building complex graphics from simple ones

The user can pass many group objects and styles that are merged together, for example:

julia> scatter(mpg,
           Group(marker = :Manufacturer), Group(:Class),
           Style(:Displ, :Hwy), markersize = 1 
       )

julia> scatter(mpg,
           Group(marker = :Manufacturer),
           Style(:Displ, :Hwy), Style(color = :Hwy),  markersize = 1 
       )

which is the equivalent of adding things with the + in ggplot2. I think it makes it easier to reuse "building blocks" across plots, without overloading + which would feel a bit strange.

TODO list:

  • Add legend / labelling (not sure how they work in Makie)

  • Simplify implementation when new recipe system gets finalized in Makie

  • Add "facets" (not sure how to do though as subplotting is not really implemented via keywords here, unlike the other "aesthetics")

  • Figure out interactivity: when the grouping variable changes I have to redraw, as the number of series could change, but I couldn't find a way to rerender on screen

  • Add tests

  • Fix some markersize bugs in AbstractPlotting I think:

julia> scatter(Group(marker = rand(Bool, 10)), rand(10), markersize = 0.1:0.1:1)

Error showing value of type Scene:
ERROR: MethodError: no method matching glyph_scale!(::Observables.Observable{Char}, ::Array{Vec{2,Float32},1})
Closest candidates are:
  glyph_scale!(::Char, ::Any) at /home/pietro/.julia/dev/AbstractPlotting/src/utilities/texture_atlas.jl:128

@Evizero
Member

Evizero commented Nov 4, 2018

Looking good! The one comment I have is that to me it seems more general to have group1 + group2 denote merging groups, instead of specially supporting multiple group arguments in some functions.

edit: or probably group1 * group2 would be better, since it's not commutative (i.e. one of the two arguments has priority in case of duplicates). This would be in line with the reasoning behind using * for string concatenation in Base.

@piever
Member Author

piever commented Nov 4, 2018

That's a good point. The support for many groups should happen for all recipes if I understand the Makie pipeline correctly (I'll be honest, that's a big if :) ). OTOH I agree that it's generally useful to play with these group / style objects as a user outside of plot calls: I guess one may even want to save them to disk for future sessions.

Now you can manually combine groups and styles with each other using merge (say merge(grp1, grp2)), and indeed the pipeline simply does foldl(merge, groups). OTOH maybe it'd be nicer to have some operator as well rather than merge. + felt a bit strange (we don't even use it for string concatenation in Julia, so it really is a bit out of nowhere), but I am unsure what other options there are. I was also considering * (from string concatenation) and ∪ (from union), but I'm not thrilled about those either... After all, given that you explained "it'd be nice to have + to merge groups", maybe merge is indeed the right choice :)

@Evizero
Member

Evizero commented Nov 4, 2018

Right, I actually edited my post as well about + vs *.

I think as a first step it might make sense to just go with merge. This way we don't promise convenience API details too early. As time goes on we can always overload * if we feel like writing merge is such a hassle.

@piever
Member Author

piever commented Nov 4, 2018

I like the argument for * as merging is associative but not commutative (rightmost one wins): I think I'll go for it.
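
A small sketch of the "associative but not commutative, rightmost wins" behaviour, assuming groups store their keywords in a NamedTuple so that Base's NamedTuple merge can do the work. `MiniGroup` is a hypothetical type used only for illustration.

```julia
# Hypothetical stand-in type; only the merging behaviour matters here.
struct MiniGroup{NT<:NamedTuple}
    columns::NT
end
MiniGroup(; kwargs...) = MiniGroup(values(kwargs))

# merge keeps the left-hand field order but takes values from the right on conflict
Base.merge(a::MiniGroup, b::MiniGroup) = MiniGroup(merge(a.columns, b.columns))
Base.:*(a::MiniGroup, b::MiniGroup) = merge(a, b)

g1 = MiniGroup(marker = :Manufacturer)
g2 = MiniGroup(color = :Class)
g3 = MiniGroup(color = :Year)

(g1 * g2 * g3).columns                  # (marker = :Manufacturer, color = :Year)
foldl(merge, [g1, g2, g3]).columns      # same result via the fold mentioned above
(g3 * g2).columns                       # (color = :Class,): order matters
```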

@mkborregaard
Member

mkborregaard commented Nov 4, 2018

This is really progressing sweetly. Question: would it be clearer to do Group(color = :Class), color = [:blue, :cyan, :magenta]) which seems more native than Group(color = :Class => [:blue, :cyan, :magenta])? Or - does it actually support this already? EDIT: (sorry I should try it out instead of asking)

@mkborregaard
Member

mkborregaard commented Nov 4, 2018

I like the argument for * as merging is associative but not commutative (rightmost one wins): I think I'll go for it.

FWIW I had the thought that * could be used for merging Makie Themes as well, for the exact same reason that it's not commutative (same argument as using it for Strings in Base essentially). Like I think it would be cool to have the concept of style themes and color themes as separate things that the user might merge with *.

@piever
Member Author

piever commented Nov 4, 2018

Question: would it be clearer to do Group(color = :Class), color = [:blue, :cyan, :magenta]) which seems more native than Group(color = :Class => [:blue, :cyan, :magenta])?

We mentioned this option already at JuliaPlots/Plots.jl#1530 and I initially decided against it here, but looking more closely it's a very nice solution.

My "pair" strategy never seemed to impress; with it, the color keyword goes unused if the user is setting colors via a group, whereas your proposed solution plays very nicely with themes.

The only thing I'd like to mention is that we may need to allow using custom types for fancy scale behavior. For example, let's say I have "Placebo", "Treatment 1", "Treatment 2" and I want them to always be "black", "red", "blue" respectively for my paper. Maybe some plots only compare "Placebo" and "Treatment 2": the palette would make "Treatment 2" red instead of blue, which I don't want. So instead of passing a normal Palette we would need some custom type that encodes the correct mapping.
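
The concern can be illustrated with plain dictionaries, assuming a palette is indexed by the rank of each level among the levels present in a given plot. The variable names are hypothetical; the colors and labels are the ones from the example above.

```julia
palette = ["black", "red", "blue"]
fixed   = Dict("Placebo" => "black", "Treatment 1" => "red", "Treatment 2" => "blue")

present = ["Placebo", "Treatment 2"]            # levels that appear in this plot

# Rank-based lookup: depends on which levels happen to be present
by_rank = Dict(l => palette[i] for (i, l) in enumerate(sort(present)))
# Name-based lookup: stable across plots
by_name = Dict(l => fixed[l] for l in present)

by_rank["Treatment 2"]   # "red"
by_name["Treatment 2"]   # "blue"
```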

@mkborregaard
Member

Good point, but I think quite often that would mostly happen in connection with facets or some other behaviour that groups over the entire dataset to create subplots/series, in which case the problem would not arise. In the relatively rare case where you'd want to create a single plot just on a subset of the data but still have the colors comparable to other unrelated plots (/scenes) in the paper, would that not be easily taken care of by passing a color vector explicitly? color = mypalette.colors[[1:3; 5]]

@piever
Member Author

piever commented Nov 5, 2018

So, I've reimplemented it the way we suggested. What I'm doing is, I'm creating a trait isscale that says if the value of an attribute corresponds to a scale (I think arrays, functions and dicts all make sense for example). The attributes are generally non-empty because AbstractPlotting decides to put things in them, so there needs to be a criterion to know if it's a proper value or a placeholder.
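
A sketch of what such a trait could look like, assuming it is a plain function with one method per scale-like type. Only the name isscale comes from the comment above; the method list and the example attributes are assumptions.

```julia
# Assumed definition: arrays, functions and dicts count as scales,
# everything else is treated as an ordinary attribute value.
isscale(::Any) = false
isscale(::AbstractArray) = true
isscale(::Function) = true
isscale(::AbstractDict) = true

attributes = (color = [:blue, :red], markersize = 0.1, marker = :circle)
scales = [k for (k, v) in pairs(attributes) if isscale(v)]   # [:color]
```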

So now it's something like this:

using RDatasets
mpg = RDatasets.dataset("ggplot2", "mpg")
p1 = scatter(mpg,                                                    
    Group(marker = :Class),                              
    Style(:Displ, :Hwy), Style(color = :Hwy),  markersize = 1,  
)
new_theme = Theme(
    scatter = Theme(marker = [:circle, :diamond])
)
AbstractPlotting.set_theme!(new_theme)
p2 = scatter(mpg,                                                    
    Group(marker = :Class),                              
    Style(:Displ, :Hwy), Style(color = :Hwy),  markersize = 1,  
)
vbox(p1, p2)

screenshot from 2018-11-05 15-15-48

@piever
Member Author

piever commented Nov 12, 2018

This is basically ready, I think it only needs some bikeshedding on names, argument order, etcetera.

Here we introduce 3 things: grouping support, "statistic" support and table support.

Grouping is done by passing a Group object as first argument:

scatter(Group(marker = rand(Bool, 100)), rand(100), rand(100))

"Statistics" (Grammar of Graphics terminology) is done by passing a function as first argument:

plot(kde, rand(100))

They can be combined:

plot(kde, Group(linestyle = rand(Bool, 100)), rand(100))

Table support is done by passing the table as first argument, and the columns can be given inside Style, regardless of whether they are positional or keyword arguments:

scatter(iris, Style(:SepalLength, :SepalWidth, color = :PetalLength))
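
A minimal sketch of how column names wrapped in a style object could be resolved against a table, assuming the table exposes columns as properties (a NamedTuple of vectors is used here instead of a DataFrame to stay dependency-free). `MiniStyle`, `column` and `resolve` are hypothetical names, and the numbers are made-up sample values.

```julia
# Hypothetical stand-in for Style: stores positional and keyword entries.
struct MiniStyle{A<:Tuple, K<:NamedTuple}
    args::A
    kwargs::K
end
MiniStyle(args...; kwargs...) = MiniStyle(args, values(kwargs))

column(table, s::Symbol) = getproperty(table, s)   # symbols are looked up as columns
column(table, x) = x                               # anything else passes through

function resolve(table, s::MiniStyle)
    args = map(a -> column(table, a), s.args)
    kwargs = map(v -> column(table, v), s.kwargs)  # map keeps the NamedTuple keys
    return args, kwargs
end

table = (SepalLength = [5.1, 4.9], SepalWidth = [3.5, 3.0], PetalLength = [1.4, 1.4])
args, kwargs = resolve(table, MiniStyle(:SepalLength, :SepalWidth, color = :PetalLength))
```

Because only entries inside the style object are looked up in the table, a keyword passed outside it, such as marker = [:circle, :rect], would never be mistaken for a column name.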

It can also be mixed with a "statistic" and I use the same order as IndexedTables groupby(f, data, by; select):

plot(kde, iris, Group(linestyle = :Species), Style(:SepalLength))

Here the order of Style and Group can actually be changed.

I'm reasonably happy with the API (we can add extra methods to simplify some scenarios later), but one remark was that while Style makes a lot of sense for attributes, say Style(color = :SepalLength, markersize = :SepalWidth), it works less well for positional arguments, say scatter(iris, Style(:SepalLength, :SepalWidth)) (even though one can argue we are styling the x and y coordinates).

OTOH, it is quite nice to have a unique concept for positional arguments and attributes, so I was planning to merge as is unless there are strong counter proposals wrt API.

@piever
Member Author

piever commented Nov 12, 2018

Another issue that it'd be nice to figure out is the following: kde and fit!(Histogram, ...) etc. tend to take arguments as a Tuple in the multidimensional case. Say:

kde((rand(100), rand(100))) # for a 2D distribution

This leads to:

plot(kde, Style((:x, :y)))

Which looks a bit clumsy. Should we have our own density that is a bit simpler and allows density(x, y) = kde((x, y))? Same for histogram.

@mkborregaard
Member

I think it makes sense to implement local functions for density etc that are tuned to giving a nice plotting api 👍
To me, Style is a confusing term for positional arguments, but it's really the same as aesthetic in ggplot2, so I might be the only one who feels this. I do think I prefer it over plot(kde, iris, :SepalLength, Group(linestyle = :Species)) which has a bit too many positional args to remember the position of.

Finally I'll say it's really impressive how you've managed to implement a nice and understandable Julia-ish API on Makie that allows very powerful GOG. This is one of my favourite PRs I've seen in a long time.

@Evizero
Member

Evizero commented Nov 12, 2018

Finally I'll say it's really impressive how you've managed to implement a nice and understandable Julia-ish API on Makie that allows very powerful GOG. This is one of my favourite PRs I've seen in a long time.

I'd like to second that. Lately I have actively been eyeing my notifications specifically to see if you added new things here

@Evizero
Member

Evizero commented Nov 12, 2018

in this example:

scatter(
    Group(marker = rand(Bool, 100)), 
    rand(100), rand(100)
)

how do i specify the two markers to choose from? My first naive guess based on earlier examples would be

scatter(
    Group(marker = rand(Bool, 100)),
    rand(100), rand(100),
    marker = [:circle, :rect]
)

right? and if yes is this also true for tables (i.e. is the last marker parameter outside of Style)?

@piever
Member Author

piever commented Nov 12, 2018

how do i specify the two markers to choose from?

Yes, you guessed correctly, it's:

scatter(
    Group(marker = rand(Bool, 100)),
    rand(100), rand(100),
    marker = [:circle, :rect]
)

and it also works for tables. The nice thing about wrapping things in Style is that the marker attribute won't be inside Style so Julia will not confuse its symbols with potential names of columns (as instead could happen in @df in StatPlots).

A cleaner way (proposed by Michael), is to set this thing in the theme:

julia> new_theme = Theme(
           scatter = Theme(
               marker = [:rect, :circle], # don't do this! it works in this case, but we should use a Palette type!
               markersize = 0.3
           )
       )

julia> set_theme!(new_theme)

julia> scatter(
           Group(marker = rand(Bool, 100)),
           rand(100), rand(100)
       )

For colors we have the nice Palette that can serve both as a list and as a value for a specific plot, whereas here passing a vector directly in the theme can cause issues. So the next step is to implement a generic Palette type to be able to set all these default values in the theme.

@dpsanders

This looks very nice!

To me it would be more natural to do something like

plot(KDE(bandwidth = 0.1), rand(100), color = :blue)

It at first seems kind of cool to be able to mix the different types of keyword arguments, but in the end I think it is more confusing to do so.

@mkborregaard
Member

mkborregaard commented Nov 13, 2018

@dpsanders yes, but how would you do that in practice? Given that plot isn't a macro, how would you use KDE(;kwargs...) when that method doesn't exist? You could do plot(x->kde(x, bandwidth = 0.1), rand(100), color = :blue) already, I think.

@piever Could the Theme constructor not simply wrap an input Vector in a Palette without the user having to worry about this?

@dpsanders

Define a KDE object and dispatch on it?

@mkborregaard
Member

Yes but that's already possible, no? That's just defining recipes, like there should also be a recipe for StatsBase.Histogram independent of what other calls can be used to call a histogram.

@mkborregaard
Member

Ah, I see what you mean - then you wouldn't get the extraction. So it would be to call

plot(KDE(bandwidth = 0.1), iris, Style(:SepalWidth))

?

@dpsanders

Yes, exactly.

@piever
Member Author

piever commented Nov 13, 2018

To me it would be more natural to do something like

plot(KDE(bandwidth = 0.1), rand(100), color = :blue)

That's a very good point. However, would currying address this?

Meaning, it'd be enough to add a method:

kde(; kwargs...) = (args...) -> kde(args...; kwargs...)

and then

plot(kde(bandwidth = 0.1), iris, Style(:SepalWidth))

would work as expected. If the KernelDensity devs are happy to add this method, we could just do this, otherwise we would define our own:

density(; kwargs...) = (args...) -> kde(args...; kwargs...)
density(args...; kwargs...) = kde(args...; kwargs...)

The "callable type" can also be made to work but I feel it's quite clumsy in the case where there are no keywords as you'd have to do:

plot(KDE(), iris, Style(:SepalWidth))

That being said, I can quite easily add support to callable types, by adding a method:

convert_arguments(P::PlotFunc, t::AbstractCallable, args...; kwargs...) =
    convert_arguments(P, (v...) -> t(v...), args...; kwargs...)

Is there any strong preference between curried functions and callable structs?
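
To compare the two options side by side, here is a dependency-free sketch. `smooth` is a made-up stand-in for a statistic like kde, `Smooth` is a hypothetical callable struct, and `apply_stat` stands in for the plotting pipeline calling the statistic on the data; none of these names come from KernelDensity or Makie.

```julia
# Stand-in statistic; only the calling convention matters here.
smooth(x; bandwidth = 1.0) = x ./ bandwidth

# Option 1: curried method that captures the keywords
smooth(; kwargs...) = (args...) -> smooth(args...; kwargs...)

# Option 2: callable struct that stores the keywords
struct Smooth{K<:NamedTuple}
    kwargs::K
end
Smooth(; kwargs...) = Smooth(values(kwargs))
(s::Smooth)(args...) = smooth(args...; s.kwargs...)

apply_stat(f, data) = f(data)

r1 = apply_stat(smooth(bandwidth = 2.0), [2.0, 4.0])   # [1.0, 2.0]
r2 = apply_stat(Smooth(bandwidth = 2.0), [2.0, 4.0])   # [1.0, 2.0]
r3 = apply_stat(smooth, [2.0, 4.0])                    # no keywords: pass the function itself
r4 = apply_stat(Smooth(), [2.0, 4.0])                  # no keywords: the () is still required
```

The last two lines show the asymmetry mentioned above: without keywords the curried version lets you pass the bare function, while the struct version still needs an explicit constructor call.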

@piever
Member Author

piever commented Nov 13, 2018

Could the Theme constructor not simply wrap an input Vector in a Palette without the user having to worry about this?

I'm actually not sure, it may be better to ask @SimonDanisch, but I'm afraid this would cause issues if users actually want to pass a vector as a keyword argument (say scatter(rand(3), marker = [:circle, :rect, :cross]) where you want the vector to associate to each dot its respective marker: if the Theme constructor happens to be used on this pipeline you'd get a Palette instead).
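
The ambiguity can be sketched with a wrapper type that marks "one value per group" explicitly, so that a plain vector can keep meaning "one value per point". `CycleBy` and `resolve_attr` are hypothetical names for illustration and do not correspond to AbstractPlotting's Palette or its internals.

```julia
# Hypothetical wrapper: values to cycle through, one per group.
struct CycleBy{T}
    values::Vector{T}
end

# Plain vector: one value per point, used as is (length must match the data).
function resolve_attr(v::AbstractVector, group_rank::Integer, n::Integer)
    length(v) == n || error("expected one value per point")
    return v
end

# Wrapper: pick the value for this group's rank and repeat it for every point.
resolve_attr(p::CycleBy, group_rank::Integer, n::Integer) =
    fill(p.values[mod1(group_rank, length(p.values))], n)

markers = [:circle, :rect, :cross]
per_point = resolve_attr(markers, 1, 3)            # [:circle, :rect, :cross]
per_group = resolve_attr(CycleBy(markers), 2, 3)   # [:rect, :rect, :rect]
```

If a constructor silently wrapped every vector, the first call could no longer be expressed, which is the issue raised above.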


6 participants