Boxplot using only y values #210

Closed
diegozea opened this issue Apr 28, 2016 · 27 comments

@diegozea
Contributor

At the moment, you can pass only Y values to boxplot, but the default width looks strange.
It would also be great to support a list of column names when a wide dataframe is used. I found it difficult to plot several series of data (Y), since they all use the same X value (1):

image

image

Thanks!

@tbreloff
Member

There are a couple issues here:

  • Should we set xlims explicitly? (I lean towards no...)
  • How to choose the x values for a boxplot?
  • Better handling of arrays of symbols (I agree this is broken right now)

In thinking about this just now, I had the thought that the current method of hoping that a Vector{Any} is good enough to allow dispatch on "processed" data is flawed and ripe for subtle bugs... I should replace the internal logic with a wrapper type:

immutable InputData{T}
  data::T
end

so that it's explicit that an input has been processed and wrapped, and dispatch will never get confused. I'll create a separate issue for this, and the arrays of symbols issue should be resolved as part of that change.
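
A minimal sketch of the wrapper idea in current Julia syntax (`struct` replaced `immutable` in later Julia versions). The helper name `wrap_input` is hypothetical and not part of Plots; it only illustrates how dispatch can tell processed input from raw input:

```julia
# Wrapper marking an input as already processed (modern spelling of `immutable`).
struct InputData{T}
    data::T
end

# Hypothetical helper: wrap raw input once, pass wrapped input through unchanged.
wrap_input(x::InputData) = x
wrap_input(x) = InputData(x)

raw = Any[1, 2, 3]
wrapped = wrap_input(raw)          # InputData{Vector{Any}}
rewrapped = wrap_input(wrapped)    # dispatches to the first method, no double wrapping
```

With the wrapper, a `Vector{Any}` of raw user data and a processed input can no longer be confused, since only the latter is an `InputData`.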

@tbreloff
Member

Right now I implement the boxplot recipe by explicitly applying the grouping and forcing the xticks to 1:length(shapes)... this will need to be made more flexible to allow overlaying multiple boxplots.

As a stop-gap solution, you could build the arrays as expected by the current recipe:

tmp

@diegozea
Contributor Author

Boxplot looks broken right now:
image

@tbreloff
Member

The weird boxplot drawing issue is fixed.

I think the solution for the x-axis will be to have some sort of DiscreteAxis type that can map strings, etc. to an x/y coordinate. I want to be able to overlay a scatter or violin plot over a boxplot but still allow new series to extend the axis. This can share implementation with the "setStringVector..." stuff.
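
A rough sketch of what such a type could look like, assuming labels are simply assigned positions 1, 2, 3, ... in order of first appearance. The fields and the `coordinate!` function are hypothetical names for illustration, not the eventual Plots implementation:

```julia
# Hypothetical discrete axis: maps arbitrary labels to integer positions 1, 2, 3, ...
struct DiscreteAxis
    positions::Dict{Any,Int}   # label => coordinate
    labels::Vector{Any}        # labels in order of first appearance
end
DiscreteAxis() = DiscreteAxis(Dict{Any,Int}(), Any[])

# Return the coordinate of `label`, extending the axis if the label is new.
function coordinate!(axis::DiscreteAxis, label)
    get!(axis.positions, label) do
        push!(axis.labels, label)
        length(axis.labels)
    end
end

axis = DiscreteAxis()
box_x = [coordinate!(axis, s) for s in ["OJ", "VC", "OJ"]]   # first series
scatter_x = [coordinate!(axis, s) for s in ["VC", "new"]]    # later series extends the axis
```

Because both series share the same axis object, an overlaid series reuses existing positions and only unseen labels add new ticks.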

@diegozea
Contributor Author

diegozea commented Jun 8, 2016

In the current master, boxplots are working fine with a categorical variable in x, but it can be used with group.

@diegozea
Contributor Author

diegozea commented Jun 8, 2016

It can't be used with group.

@tbreloff
Member

tbreloff commented Jun 8, 2016

Yeah I see the bug.. investigating

@tbreloff
Member

tbreloff commented Jun 8, 2016

I think I got it. I'll push the fix soon.

tmp

@tbreloff closed this as completed Jun 8, 2016

@diegozea
Contributor Author

diegozea commented Jun 8, 2016

Awesome :D I found a little bug with the whisker length. I will fix it soon.

@diegozea
Contributor Author

diegozea commented Jun 8, 2016

@tbreloff The group bug was solved for a call like that, where x and group are the same, but it still gives a strange output in the following example:

ToothGrowth = dataset("datasets","ToothGrowth")
boxplot(ToothGrowth, :Dose, :Len, group=:Supp, notch=true) 

image

tbreloff added a commit that referenced this issue Jun 8, 2016
Solve a bug with whisker lengths (#210)
@tbreloff
Member

tbreloff commented Jun 8, 2016

I disagree that this is strange. At least... it's what I expect/want. The group arg creates 2 series. Each of those series is a boxplot, and each is then re-grouped over the same x-domain. (Unless I'm missing something?)

If you want them in different subplots because you don't like the overlap, you can add 'layout=2' (you'd probably want 'link=:all' as well), or maybe make them easier to see by setting 'alpha=0.5'?

@diegozea
Contributor Author

diegozea commented Jun 8, 2016

I didn't know that layout=2, link=:all does the trick (maybe layout=:Supp could be more intuitive and/or similar to ggplot's facet grid). At first I was expecting something like the ggplot2 output:
image

@tbreloff
Member

tbreloff commented Jun 8, 2016

maybe layout=:Supp could be more intuitive

I can't for the life of me figure out what :Supp is supposed to be. So I wouldn't vote for that being more intuitive! ;)

But these don't sound like very general ideas. What if the x data isn't nicely spaced? What if there are lots of groups? Just seems like its usefulness would be limited, but what do I know? I don't even know what "Supp" is!

@diegozea
Contributor Author

diegozea commented Jun 8, 2016

Sorry... I was saying that

boxplot(ToothGrowth, :Dose, :Len, layout=:Supp)

would be more intuitive than

boxplot(ToothGrowth, :Dose, :Len, group=:Supp, layout=2, link=:all)

@diegozea
Contributor Author

diegozea commented Jun 8, 2016

I also imagine layout taking a DataFrames Formula, like ggplot2's facet_grid.

@tbreloff
Member

tbreloff commented Jun 8, 2016

Ha.. oh it's a field not a setting. I can't decide if that makes me look better or worse 😮

I'm not sure I fully understand what that would mean (in the general sense). This might only work well with dataframe column labels? Even then there's lots of weirdness?

@diegozea
Contributor Author

diegozea commented Jun 8, 2016

Ok. I understand... In my opinion, no one wants superimposed boxplots (since you compare them side by side). So having group give superimposed boxplots, instead of a result similar to ggplot2, is not intuitive. But maybe that is because I used to make a lot of ggplot2 plots. I believe the current behavior of group is good for other series, but maybe not so good for boxplot.
In general, I found facet grid taking an R formula to indicate variables/data.frame columns of categorical data very useful. So to me, giving a categorical variable to layout means something like: I want a grid with as many plots as factor levels, with each plot showing the data subset for its level. But maybe I'm the only one who expects something like that XD

@tbreloff
Member

tbreloff commented Jun 8, 2016

I think you're not the only one, but... would you agree that this discussion only really makes sense if your inputs are DataFrames and the Symbols for the columns?

Would it make more sense to have a "facet" recipe (similar to how I did marginal hists) which can handle all this stuff? Then it prepares everything for a "generic" boxplot (or whatever else) series recipe... offsetting x-values as needed, creating the layout, etc.

So you would call facet(iris, :Species, <blah blah>, layout = xxx ~ yyy) or something like that, and the facet recipe would replace layout with a real layout based on the formula.

@diegozea
Contributor Author

diegozea commented Jun 8, 2016

Your facet idea is a lot better than my degeneration of the layout keyword argument ;) But I don't see why it should be restricted to DataFrames...

x = rand(10)
y = rand(10)
z = [0,0,0,0,1,1,1,1,1,1]
w = [1,0,1,0,1,0,1,0,1,0]
facet(x, y, <bla bla>, layout = z ~ w) # Can something like this work?

@tbreloff
Member

tbreloff commented Jun 8, 2016

That may be a lot trickier to implement, as you'd get the Symbols z/w inside the recipe, with no way to access the variables z/w. I'm sure there's a way, it's just not as straightforward as the DataFrame case.

@diegozea
Contributor Author

diegozea commented Jun 8, 2016

I imagine that maybe we can use a Facet type, which stores the variables z and w and makes the needed checks in its constructor. So, it can use dispatch:
plot(x, y, Facet(z,w))
Another idea would be to use a Julia Pair instead of a DataFrames Formula. Formula syntax being supported only for DataFrames seems fine to me.
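
A minimal sketch of such a type, assuming the only check needed is that both faceting vectors have the same length. The field names `rows`/`cols` and the helper `panel_indices` are hypothetical, and nothing here hooks into `plot` itself:

```julia
# Hypothetical container for two faceting variables, validated on construction.
struct Facet{Z<:AbstractVector,W<:AbstractVector}
    rows::Z
    cols::W
    function Facet(rows::AbstractVector, cols::AbstractVector)
        length(rows) == length(cols) ||
            throw(ArgumentError("facet variables must have the same length"))
        new{typeof(rows),typeof(cols)}(rows, cols)
    end
end

# Indices of the observations belonging to each (row level, column level) panel.
function panel_indices(f::Facet)
    panels = Dict{Tuple{Any,Any},Vector{Int}}()
    for i in eachindex(f.rows, f.cols)
        push!(get!(panels, (f.rows[i], f.cols[i]), Int[]), i)
    end
    panels
end

z = [0,0,0,0,1,1,1,1,1,1]
w = [1,0,1,0,1,0,1,0,1,0]
panels = panel_indices(Facet(z, w))   # one entry per combination of z and w levels
```

A plotting method dispatching on `Facet` could then use each index vector to pick the subset of x and y that belongs in the corresponding subplot.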

@tbreloff
Member

tbreloff commented Jun 8, 2016

The way I envision it:

@userplot Facet

@recipe function f(facet::Facet; facet_groups = nothing)
    # inputs are the tuple: facet.args
    # TODO: process args with facet_groups to build a layout and assign series to subplots
end

#usage:
facet(args...; facet_groups = ???)

@diegozea
Contributor Author

diegozea commented Jun 8, 2016

The Facet user plot looks fine. One thing that R solves using dots in its formula (i.e. . ~ var) is indicating whether the categorical variable will generate vertical or horizontal subplots.
What do you think about diverging from the formula syntax and using something like:

facet(args...; x_group=varx, y_group=vary)
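
To make the proposal concrete, here is a sketch of how two such grouping vectors could be turned into a grid, assuming `x_group` levels become columns and `y_group` levels become rows. The function `facet_grid` is a hypothetical helper, not an existing Plots function, and the sample values are made up:

```julia
# Hypothetical helper: given one grouping vector per grid direction, return the
# grid size and the (row, column) cell of every observation.
function facet_grid(x_group::AbstractVector, y_group::AbstractVector)
    length(x_group) == length(y_group) ||
        throw(ArgumentError("x_group and y_group must have the same length"))
    xlevels = unique(x_group)   # one grid column per level
    ylevels = unique(y_group)   # one grid row per level
    cells = [(findfirst(isequal(y), ylevels), findfirst(isequal(x), xlevels))
             for (x, y) in zip(x_group, y_group)]
    (nrows = length(ylevels), ncols = length(xlevels), cells = cells)
end

supp = ["VC", "VC", "OJ", "OJ"]   # illustrative values only
dose = [0.5, 1.0, 0.5, 1.0]
g = facet_grid(supp, dose)        # 2 x 2 grid; g.cells[i] is the subplot of observation i
```

Naming the direction in the keyword makes the horizontal/vertical choice explicit without needing the `. ~ var` formula form.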

@diegozea
Contributor Author

@tbreloff Is there a better/elegant way to do this?

image

using RDatasets
iris = dataset("datasets","iris")
using Plots
pyplot(size=(300,300))
iris[:dummy] = 1 # To plot the boxplot 
boxplot(iris, :dummy, [:SepalLength :SepalWidth :PetalLength :PetalWidth], layout=grid(1,4), link=:y)

I was expecting to do something like: boxplot(iris, [:SepalLength :SepalWidth :PetalLength :PetalWidth])
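
For such a call to work, the wide columns would have to be stacked into one label vector and one value vector. A minimal sketch of that reshaping in plain Julia, where `stack_columns` is a hypothetical helper and the numbers are placeholder values rather than the real iris data:

```julia
# Hypothetical helper: stack named columns into long-form (label, value) vectors.
function stack_columns(columns::Pair{Symbol,<:AbstractVector}...)
    labels = Symbol[]
    vals = Float64[]
    for (name, col) in columns
        append!(labels, fill(name, length(col)))   # repeat the column name per value
        append!(vals, col)
    end
    labels, vals
end

labels, vals = stack_columns(:SepalLength => [5.1, 4.9],
                             :SepalWidth  => [3.5, 3.0])
```

The `labels` vector can then play the role of the categorical x, which removes the need for the `:dummy` column.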

@tbreloff
Member

Ugh... I need to recode DataFrames support. I hate how I'm doing it now.

@diegozea
Contributor Author

@tbreloff another thing about my last example... The boxplot linecolor is equal to the fillcolor, so the median line isn't visible.

@tbreloff
Member

You finally motivated me to fix the horribly inflexible DataFrames code, now you can do cool stuff:

tmp

These changes aren't pushed up yet.
