[WIP] Add population analysis with error bars across population for continuous plots #30

Merged: 25 commits merged on Jan 18, 2017.
25 commits:

- 89d83f2 Added Bar Plots and Scatter Plots (Dec 1, 2016)
- 9bcef97 Merge branch 'master' of https://github.com/JuliaPlots/StatPlots.jl (Dec 2, 2016)
- 85b0f40 Added keyword version, also with cyclic keywords and automated label (Dec 9, 2016)
- 203120c Added set of standard analysis (Dec 9, 2016)
- da11d97 Changed NaN with NAs (Dec 10, 2016)
- dce15d2 Fixed bug on sorting categorical x axis (Dec 10, 2016)
- 1e6921b Fixed bug on sorting categorical x axis (Dec 10, 2016)
- 4c40585 Added shortcuts, replaced kernelestimator with loess (Dec 15, 2016)
- 5a8319b Merge branch 'master' of https://github.com/piever/StatPlots.jl (Dec 15, 2016)
- 4c18e6a trying groupapply (Dec 17, 2016)
- c972da8 Added new groupederror type (Dec 18, 2016)
- 3f51641 Fixed groupedbar on plotlyjs, added description to readme (Dec 19, 2016)
- 29bd4f3 corrected couple of mistakes in readme (Dec 19, 2016)
- 4764641 Fixed issue with label in Plotlyjs (Dec 19, 2016)
- fd10f37 Cleaned up axis selection, added docstrings (Dec 26, 2016)
- e7c4280 Updated readme (Dec 26, 2016)
- b06ae4b Fixed error in case of subject with one datapoint! (Dec 31, 2016)
- 2d10288 removed unnecessary array conversion (Jan 1, 2017)
- 1a874c6 Unified files into groupederror, updated readme (Jan 10, 2017)
- 899c87a Implemented bootstrap error (Jan 10, 2017)
- 41a2251 Updated README and docs (Jan 10, 2017)
- dc7e1d9 Corrected typos (Jan 10, 2017)
- 4d65fb7 Updated groupapply docstrings (Jan 12, 2017)
- ff638a5 Temporary: minor refactoring of the data (Jan 16, 2017)
- 3ae3845 Renamed get_summary to get_groupederror and corrected docstrings (Jan 17, 2017)
45 changes: 45 additions & 0 deletions README.md
@@ -121,3 +121,48 @@ groupedbar(rand(10,3), bar_position = :dodge, bar_width=0.7)
```

![tmp](https://cloud.githubusercontent.com/assets/933338/18962092/673f6c78-863d-11e6-9ee9-8ca104e5d2a3.png)


## groupapply for population analysis
The `groupapply` function splits the data according to the keyword argument `group`, then applies `summarize` to get the average and variability of a given analysis (density, cumulative and local regression are supported so far, but you can also supply your own function). There are three ways to get the average and variability:

- `compute_error = (:across, col_name)`, where the data is split according to column `col_name` before being summarized. `compute_error = :across` splits across all observations. The default summary is `(mean, sem)`, but it can be changed to any pair of functions with the keyword `summarize`.

- `compute_error = (:bootstrap, n_samples)`, where `n_samples` datasets are resampled with replacement from the real dataset and then summarized (nonparametric
<a href="https://en.wikipedia.org/wiki/Bootstrapping_(statistics)">bootstrapping</a>). `compute_error = :bootstrap` defaults to `compute_error = (:bootstrap, 1000)`. The default summary is `(mean, std)`. This method works with any analysis but is computationally expensive.

- `compute_error = :none`, where no error is computed or displayed and the analysis is carried out normally.
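To make the difference between `:across` and `:bootstrap` concrete, here is a minimal sketch in plain Julia 1.x (standard library only). It is not StatPlots' implementation: it assumes the analysis is an empirical CDF evaluated on a shared grid, so that curves can be averaged point by point, and `ecdf_on_grid`, `across_error` and `bootstrap_error` are hypothetical names. The `:across` variant runs the analysis once per subject and summarizes with `(mean, sem)`; the `:bootstrap` variant resamples all observations with replacement and summarizes with `(mean, std)`.

```julia
using Statistics, Random

# Hypothetical analysis: empirical CDF of `v` evaluated on a shared grid,
# so that curves computed on different subsets can be averaged point by point.
ecdf_on_grid(v, grid) = [count(<=(g), v) / length(v) for g in grid]

# Standard error of the mean (StatsBase exports `sem`; redefined to stay stdlib-only).
sem_of(v) = std(v) / sqrt(length(v))

# `:across`-style error (sketch): run the analysis once per subject, then
# summarize every grid point across subjects with (mean, sem).
function across_error(y, subjects, grid)
    curves = [ecdf_on_grid(y[subjects .== s], grid) for s in unique(subjects)]
    M = reduce(hcat, curves)  # one column per subject
    return vec(mean(M, dims = 2)), [sem_of(M[i, :]) for i in axes(M, 1)]
end

# `:bootstrap`-style error (sketch): resample all observations with replacement
# `n_samples` times, rerun the analysis, then summarize with (mean, std).
function bootstrap_error(y, grid; n_samples = 1000, rng = Random.default_rng())
    n = length(y)
    curves = [ecdf_on_grid(y[rand(rng, 1:n, n)], grid) for _ in 1:n_samples]
    M = reduce(hcat, curves)  # one column per bootstrap replicate
    return vec(mean(M, dims = 2)), vec(std(M, dims = 2))
end
```

For example, `across_error(y, subj, 0:0.5:3)` returns the mean curve and its standard error. Each bootstrap replicate reruns the full analysis, which is why the cost grows linearly with `n_samples`.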

The local regression uses [Loess.jl](https://github.com/JuliaStats/Loess.jl) and the density plot uses [KernelDensity.jl](https://github.com/JuliaStats/KernelDensity.jl). If the x variable is categorical, these functions are computed by splitting the data across the x variable and then computing the density/average per bin. The choice of a continuous or discrete axis can be forced with `axis_type = :continuous` or `axis_type = :discrete`.
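The per-bin computation for a categorical x can be sketched as follows (plain Julia 1.x; `bin_summary` is a hypothetical helper, not part of StatPlots). It assumes the simplest case, where every observation counts as its own unit, so each bin is summarized by the mean and standard error of the y values that fall in it. Levels are sorted so that the bins come out in a stable order.

```julia
using Statistics

# Hypothetical helper: split `y` by the levels of a categorical `x` and
# summarize each bin with (mean, standard error of the mean).
function bin_summary(x, y)
    lvls = sort(unique(x))  # stable, sorted order of the categories
    means = Float64[]
    sems = Float64[]
    for l in lvls
        yl = y[x .== l]     # observations falling in this bin
        push!(means, mean(yl))
        push!(sems, std(yl) / sqrt(length(yl)))
    end
    return lvls, means, sems
end
```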

Example use:

```julia
using DataFrames
import RDatasets
using StatPlots
gr()
school = RDatasets.dataset("mlmRev","Hsb82");
grp_error = groupapply(:cumulative, school, :MAch; compute_error = (:across,:School), group = :Sx)
plot(grp_error, line = :path)
```
<img width="494" alt="screenshot 2016-12-19 12 28 27" src="https://cloud.githubusercontent.com/assets/6333339/21313005/316e0f0c-c5e7-11e6-9464-f0921dee3d29.png">

Keyword arguments for Loess.jl or KernelDensity.jl (for example `bandwidth`) can be passed to `groupapply`:

```julia
df = groupapply(:density, school, :CSES; bandwidth = 1., compute_error = (:bootstrap,500), group = :Minrty)
plot(df, line = :path)
```

<img width="487" alt="screenshot 2017-01-10 18 36 48" src="https://cloud.githubusercontent.com/assets/6333339/21819500/cb788fb8-d763-11e6-89b9-91018f2b9a2a.png">
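Passing analysis keywords through a wrapper relies on Julia's keyword splatting. The sketch below illustrates the pattern with a hand-written Gaussian kernel density estimate standing in for KernelDensity.jl; `gaussian_kde` and `run_analysis` are hypothetical names, and the real `groupapply` signature may differ. Keywords the wrapper consumes itself (here `compute_error`) are caught by name, and everything else is forwarded unchanged.

```julia
# Hypothetical Gaussian kernel density estimate with a `bandwidth` keyword,
# standing in for KernelDensity.jl (whose API is not reproduced here).
function gaussian_kde(v, grid; bandwidth = 1.0)
    scale = length(v) * bandwidth * sqrt(2π)
    return [sum(exp(-0.5 * ((g - xi) / bandwidth)^2) for xi in v) / scale for g in grid]
end

# Hypothetical wrapper: `compute_error` is consumed here, while any other
# keyword (e.g. `bandwidth`) is forwarded untouched to the analysis `f`.
run_analysis(f, v, grid; compute_error = :none, kwargs...) = f(v, grid; kwargs...)
```

For instance, `run_analysis(gaussian_kde, randn(100), -3:0.1:3; bandwidth = 0.5)` forwards `bandwidth` to the estimator; a larger bandwidth gives a smoother, flatter curve.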


A bar plot, with the x column pooled so that it is treated as categorical:

```julia
pool!(school, :Sx)
grp_error = groupapply(school, :Sx, :MAch; compute_error = :across, group = :Minrty)
plot(grp_error, line = :bar)
```
<img width="489" alt="screenshot 2017-01-10 18 20 51" src="https://cloud.githubusercontent.com/assets/6333339/21819413/7923681e-d763-11e6-907d-c81447b4cc99.png">
1 change: 1 addition & 0 deletions REQUIRE
@@ -6,3 +6,4 @@ StatsBase
Distributions
DataFrames
KernelDensity
Loess

Member:

Adding another package here is a bit of a debate. I would say that StatPlots should not be afraid of including statistical packages (I guess that is one reason to keep it separate from Plots), and I also think it should depend on GLM. What is your philosophy here, @tbreloff?

Member:

Sorry I missed this comment. Yes adding dependencies should not be done lightly, but this is probably acceptable.

7 changes: 7 additions & 0 deletions src/StatPlots.jl
@@ -9,11 +9,15 @@ using Distributions
using DataFrames

import KernelDensity
import Loess
@recipe f(k::KernelDensity.UnivariateKDE) = k.x, k.density
@recipe f(k::KernelDensity.BivariateKDE) = k.x, k.y, k.density

@shorthands cdensity

export groupapply
export get_groupederror

include("dataframes.jl")
include("corrplot.jl")
include("cornerplot.jl")
@@ -24,5 +28,8 @@ include("hist.jl")
include("marginalhist.jl")
include("bar.jl")
include("shadederror.jl")
include("groupederror.jl")


Member:

You should export the `groupapply` function.


end # module
2 changes: 1 addition & 1 deletion src/bar.jl
@@ -53,7 +53,7 @@ grouped_xy(y::AbstractMatrix) = 1:size(y,1), y
end
fr
else
get(d, :fillrange, 0)
get(d, :fillrange, nothing)

Member:

Remind me why you are throwing away the fillrange value here for non-stacked bars?

Member Author:

It's a fix for #32. With error bars, the fillrange of 0 also gets applied to the error series and creates the messy artifacts shown in that issue (on PlotlyJS). Without any fillrange it seems to work just fine.

Member:

Hmm, I worry that this may have unintended consequences. Does `get(d, :fillrange, nothing)` work?

Member Author:

Yes, that also works. I'll go for `get(d, :fillrange, nothing)` to stay as close as possible to the original design then.

end

seriestype := :bar