@pearlzli
Contributor

@pearlzli pearlzli commented Mar 22, 2020

This adds a grouped histogram recipe, groupedhist, which produces its plot by calling groupedbar. Usage:

using StatsPlots, RDatasets
iris = dataset("datasets", "iris")
@df iris groupedhist(:SepalLength, :Species; bar_position = :dodge)

[Image: dodge]

@df iris groupedhist(:SepalLength, :Species; bar_position = :stack)

[Image: stack]

I couldn't figure out how to make this work using the group keyword argument, which is why the group IDs are passed in as the second argument instead. Also, I recognize that there's some overlap between this recipe and stackedhist in #315. I'd love any and all feedback!

@mkborregaard
Member

Can you say something about the difference between this and #315?

@pearlzli
Contributor Author

pearlzli commented Mar 22, 2020

Yes, sorry - I should have said more about this in my original post!

I think results-wise, groupedhist with bar_position = :stack produces a very similar-looking plot to stackedhist, though bar_position = :dodge can't be replicated with stackedhist. I like the :dodge setting for comparing histograms, as I think having each bar start at the x-axis is helpful visually.

Implementation-wise, I've tried to make use of Plots._make_hist and groupedbar, so hopefully this is a little more modular. To use groupedhist, you don't have to separate out the values for the different groups into separate arrays, which I think is nice and also lets you use the @df macro. The groupedhist user doesn't have control over the group ordering like in stackedhist right now, though I think that could be added. groupedhist also doesn't support passing in Histograms.
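To make the idea above concrete, here is a rough sketch of the binning step: every group is counted against the same bin edges, so the counts form one matrix with a row per bin and a column per group. The helper name `grouped_counts` and its `nbins` keyword are hypothetical and use only Base Julia; this is not the code in this PR, which relies on Plots._make_hist.

```julia
# Hypothetical helper for illustration only (not the PR's implementation).
# Bins every group on shared edges so the per-group counts line up by row.
function grouped_counts(values::AbstractVector{<:Real}, groups::AbstractVector; nbins::Integer = 10)
    length(values) == length(groups) ||
        throw(ArgumentError("values and groups must have the same length"))
    lo, hi = float.(extrema(values))
    edges = range(lo, hi; length = nbins + 1)   # shared by all groups
    labels = sort(unique(groups))
    counts = zeros(Int, nbins, length(labels))  # rows: bins, columns: groups
    for (v, g) in zip(values, groups)
        # Bins are closed on the left; the maximum value is put in the last bin.
        bin = clamp(searchsortedlast(edges, v), 1, nbins)
        col = findfirst(==(g), labels)
        counts[bin, col] += 1
    end
    return edges, labels, counts
end

vals = [1.0, 2.0, 3.0, 4.0, 1.5, 3.5]
grps = ["a", "a", "a", "a", "b", "b"]
edges, labels, counts = grouped_counts(vals, grps; nbins = 3)
```

A matrix in this layout is the kind of input that `groupedbar(counts; bar_position = :dodge)` draws as side-by-side bars, with one series per column.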

All that being said, I'm definitely open to trying to combine the functionality of the two recipes into one. I opened this second pull request mainly because I had already written most of the new code for use in a personal project, so I hope I haven't stepped on anyone's toes.

@mkborregaard
Member

You didn't step on anyone's toes. I think I like this better, but I will try to review it in detail tomorrow. @piever want to have a look?

@pearlzli
Contributor Author

Thanks!

@piever
Member

piever commented Mar 22, 2020

Nice work! But I think we should figure out the group keyword issue, otherwise the API mismatch with groupedboxplot and groupedviolin is a bit weird.

@pearlzli
Contributor Author

I definitely agree on wanting API consistency. In the commit 47f30e9 I just pushed, I got the group keyword argument to work when the groups are all of the same size, as in the iris dataset (3 groups, each with 50 observations):

using StatsPlots, RDatasets
iris = dataset("datasets", "iris")
@df iris groupedhist(:SepalLength; group = :Species)

This produces the :dodge plot from my first post. However, I get a BoundsError that I don't understand when the group sizes are unequal:

julia> @df iris[1:end-1, :] groupedhist(:SepalLength; group = :Species)
ERROR: BoundsError: attempt to access 50×3 Array{Float64,2} at index [Base.LogicalIndex(Bool[1, 1, 1, 1, 1, 1, 1, 1, 1, 1  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0])]
Stacktrace:
 [1] throw_boundserror(::Array{Float64,2}, ::Tuple{Base.LogicalIndex{Int64,BitArray{1}}}) at ./abstractarray.jl:538
 [2] checkbounds at ./abstractarray.jl:503 [inlined]
 [3] _getindex at ./multidimensional.jl:669 [inlined]
 [4] getindex(::Array{Float64,2}, ::BitArray{1}) at ./abstractarray.jl:981
 [5] macro expansion at /Users/pearl/.julia/dev/StatsPlots/src/hist.jl:88 [inlined]
 [6] apply_recipe(::Dict{Symbol,Any}, ::StatsPlots.GroupedHist) at /Users/pearl/.julia/packages/RecipesBase/G4s6f/src/RecipesBase.jl:279
 [7] _process_userrecipes(::Plots.Plot{Plots.GRBackend}, ::Dict{Symbol,Any}, ::Tuple{StatsPlots.GroupedHist}) at /Users/pearl/.julia/packages/Plots/vTdnV/src/pipeline.jl:85
 [8] _plot!(::Plots.Plot{Plots.GRBackend}, ::Dict{Symbol,Any}, ::Tuple{StatsPlots.GroupedHist}) at /Users/pearl/.julia/packages/Plots/vTdnV/src/plot.jl:178
 [9] #plot#138(::Base.Iterators.Pairs{Symbol,CategoricalArray{String,1,UInt8,String,CategoricalString{UInt8},Union{}},Tuple{Symbol},NamedTuple{(:group,),Tuple{CategoricalArray{String,1,UInt8,String,CategoricalString{UInt8},Union{}}}}}, ::typeof(plot), ::StatsPlots.GroupedHist) at /Users/pearl/.julia/packages/Plots/vTdnV/src/plot.jl:57
 [10] #plot at ./none:0 [inlined]
 [11] #groupedhist#111 at /Users/pearl/.julia/packages/RecipesBase/G4s6f/src/RecipesBase.jl:354 [inlined]
 [12] (::StatsPlots.var"#kw##groupedhist")(::NamedTuple{(:group,),Tuple{CategoricalArray{String,1,UInt8,String,CategoricalString{UInt8},Union{}}}}, ::typeof(groupedhist), ::Array{Float64,1}) at none:0
 [13] #add_label#17(::Base.Iterators.Pairs{Symbol,CategoricalArray{String,1,UInt8,String,CategoricalString{UInt8},Union{}},Tuple{Symbol},NamedTuple{(:group,),Tuple{CategoricalArray{String,1,UInt8,String,CategoricalString{UInt8},Union{}}}}}, ::typeof(StatsPlots.add_label), ::Array{String,1}, ::Function, ::Array{Float64,1}) at /Users/pearl/.julia/dev/StatsPlots/src/df.jl:153
 [14] (::StatsPlots.var"#kw##add_label")(::NamedTuple{(:group,),Tuple{CategoricalArray{String,1,UInt8,String,CategoricalString{UInt8},Union{}}}}, ::typeof(StatsPlots.add_label), ::Array{String,1}, ::Function, ::Array{Float64,1}) at ./none:0
 [15] (::var"#75#76")(::DataFrame) at ./none:0
 [16] top-level scope at REPL[24]:1
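
For readers puzzling over the same message: frame [4] shows a 2-D array being indexed with a single Boolean vector. In Julia that form of logical indexing requires the mask to have exactly as many entries as the array has elements. The snippet below only reproduces that class of error with made-up arrays (`A`, `mask_ok`, and `mask_short` are illustrative names); it is a guess at the shape of the problem, not a diagnosis of the recipe code.

```julia
# Illustration only: reproduces the same class of error, not the recipe's code.
A = zeros(50, 3)           # 150 elements, the size reported in the error message
mask_ok = trues(150)       # one flag per element: a valid logical index
mask_short = trues(149)    # e.g. one flag per row of a 149-row data frame

selected = A[mask_ok]      # works and returns a 150-element vector

# A mask whose length differs from length(A) is rejected by bounds checking.
err = try
    A[mask_short]
    nothing
catch e
    e
end
println(typeof(err))
```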

@pearlzli
Contributor Author

pearlzli commented Mar 22, 2020

Ah actually never mind - that BoundsError seems to be fixed by using Plots.extractGroupArgs (next commit). Now grouping on multiple variables also works:

iris[!, :Color] = rand(["red", "blue"], nrow(iris))
@df iris groupedhist(:SepalLength; group = (:Species, :Color))

[Image: two-groups]
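
As a rough picture of what grouping on several variables amounts to: each distinct combination of values becomes its own group, and each group keeps a list of row indices, so the groups are free to have different sizes. The helper `group_indices` below is hypothetical and written with Base Julia only; it is not Plots.extractGroupArgs and makes no claim about how that function works internally.

```julia
# Hypothetical helper for illustration only; it is not Plots.extractGroupArgs.
# Returns the sorted distinct key combinations and the row indices of each one.
function group_indices(column::AbstractVector, more::AbstractVector...)
    columns = (column, more...)
    n = length(column)
    all(c -> length(c) == n, columns) ||
        throw(ArgumentError("all grouping columns must have the same length"))
    keys_per_row = collect(zip(columns...))   # one tuple of values per row
    labels = sort(unique(keys_per_row))
    indices = [findall(==(label), keys_per_row) for label in labels]
    return labels, indices
end

species = ["setosa", "setosa", "virginica", "virginica", "virginica"]
color   = ["red", "blue", "red", "red", "blue"]
labels, indices = group_indices(species, color)
```

Because each group is described by its own index vector rather than by a column of a fixed-size matrix, nothing here depends on the groups having the same number of observations.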

@pearlzli
Contributor Author

@mkborregaard and @piever - is there anything else you think I should add or change? I understand also if you haven't had the chance to look at it, of course. Thanks again!

@piever
Member

piever commented Mar 28, 2020

Thanks for addressing the comments, it looks good to me! I'll merge in a couple of days.

Member

@daschw daschw left a comment

Looks great! Thanks a lot @pearlzli !

@pearlzli
Contributor Author

Thanks, everyone!

@piever piever merged commit f954461 into JuliaPlots:master Mar 30, 2020
@pearlzli
Contributor Author

Ahh I'm sorry, I just realized that I never updated the readme to reflect the group keyword argument. Should I push that change to my fork?

@mkborregaard
Member

It's already merged, so could you make a new PR?
And sorry for not coming back here in time.

@pearlzli pearlzli deleted the pzl/groupedhist branch March 31, 2020 00:39