Feature request for new function ggbarstats #78
Completely agree with this! The plan is actually to write a more general version of the function. This is something that should happen by |
Understand and as long as you're thinking about a
library(ggplot2)
library(dplyr)
library(ggstatsplot)
library(ggmosaic)
ddd <- group_by(Titanic_full, Class, Sex, Age, Survived) %>% count()
ggplot(data=ddd) +
geom_mosaic(aes(weight=n, x=product(Class), fill=Survived)) +
ggtitle("test", subtitle = ggstatsplot::subtitle_contigency_tab(Titanic_full,Class,Survived))
#> Note: 95% CI for Cramer's V was computed with 25 bootstrap samples.
#> Created on 2018-10-30 by the reprex package (v0.2.1) |
I'm about 65% done with this... have not created a pull request yet but working. |
Hmm, but the plan was not to create a new function just for the bar plots, but rather to create a general function |
Yes, I remember the concept of minimizing the number of new functions. The problem is threefold:
Anyway I will certainly endeavor to minimize the amount of new or overlapping code. When I get to a good stable point (soon) I'll formally submit a PR and you can take a look. |
Fair enough! Let's proceed with separate functions and we can later figure out how to refactor them to avoid code duplication. |
Getting there. An example using one of the common datasets.
library(ggstatsplot)
# using the current ggpiestats
# pies are especially inefficient when you have many levels of a factor
ggpiestats(movies_long,
mpaa,
genre,
bf.message = TRUE,
sampling.plan = "jointMulti",
title = "MPAA Ratings by Genre",
caption = "As of January 23, 2019",
nboot = 5,
perc.k = 1,
facet.proptest = FALSE,
palette = "Set2")
#> Note: 95% CI for effect size estimate was computed with 5 bootstrap samples.
#>
# using the still nascent ggbarstats
ggbarstats(movies_long,
mpaa,
genre,
bf.message = TRUE,
sampling.plan = "jointMulti",
title = "MPAA Ratings by Genre",
caption = "As of January 23, 2019",
nboot = 5,
perc.k = 1,
facet.proptest = FALSE,
palette = "Set2")
#> Note: 95% CI for effect size estimate was computed with 5 bootstrap samples.
#> Created on 2019-01-23 by the reprex package (v0.2.1) |
Thanks, Chuck. This looks awesome! In case you don't already have these changes in mind, here are some minor comments to keep the aesthetics consistent across the plots from these two similar functions-
|
Laugh out loud. We have such different aesthetic tastes. But okay using your numbering system
Should be able to do these and cleanup tomorrow. |
Okay, all these are fixed, and I have added a few features as well. PR follows. It needs some more doco cleanup and some testing, but it's done.
library(jmv)
#>
#> Attaching package: 'jmv'
#> The following object is masked from 'package:stats':
#>
#> anova
library(ggstatsplot)
# for reproducibility
set.seed(123)
# simple function call with the defaults (with condition)
ggstatsplot::ggbarstats(
data = datasets::mtcars,
main = vs,
condition = cyl,
bf.message = TRUE,
nboot = 10,
factor.levels = c("0 = V-shaped", "1 = straight"),
legend.title = "Engine"
)
#> Warning in stats::chisq.test(x = data$main, y = data$condition, correct =
#> FALSE, : Chi-squared approximation may be incorrect
#> Note: 95% CI for effect size estimate was computed with 10 bootstrap samples.
#>
# simple function call with count data
ggstatsplot::ggbarstats(
data = as.data.frame(HairEyeColor),
main = Eye,
condition = Hair,
counts = Freq
)
#> Note: 95% CI for effect size estimate was computed with 25 bootstrap samples.
#>
ggbarstats(movies_long,
mpaa,
genre,
bf.message = TRUE,
sampling.plan = "jointMulti",
title = "MPAA Ratings by Genre",
caption = "As of January 23, 2019",
nboot = 5,
perc.k = 1,
x.axis.orientation = "slant",
facet.proptest = FALSE,
ggplot.component = ggplot2::theme(axis.text.x = ggplot2::element_text( face = "italic")),
palette = "Set2"
)
#> Note: 95% CI for effect size estimate was computed with 5 bootstrap samples.
#> Created on 2019-01-24 by the reprex package (v0.2.1) |
Thanks, Chuck. This all looks good to me. I will make minor modifications after the PR is merged-
Would also like to add |
Here is what the output looks like with the modifications I have introduced to make this aesthetically as close to
Reproducing the same examples you used above-
set.seed(123)
ggstatsplot::ggbarstats(
data = datasets::mtcars,
main = vs,
condition = cyl,
bf.message = TRUE,
nboot = 10,
factor.levels = c("0 = V-shaped", "1 = straight"),
legend.title = "Engine"
)
#> Note: Results from one-sample proportion tests for each
#> level of the condition variable testing for equal
#> proportions of the main variable.
#>
#> # A tibble: 3 x 7
#> condition `0` `1` `Chi-squared` df `p-value` significance
#> <fct> <chr> <chr> <dbl> <dbl> <dbl> <chr>
#> 1 4 9.09% 90.91% 7.36 1 0.007 **
#> 2 6 42.86% 57.14% 0.143 1 0.705 ns
#> 3 8 100.00% 0.00% 14 1 0 ***
#> Warning in stats::chisq.test(x = data$main, y = data$condition, correct =
#> FALSE, : Chi-squared approximation may be incorrect
#> Note: 95% CI for effect size estimate was computed with 10 bootstrap samples.
#>
set.seed(123)
ggstatsplot::ggbarstats(
data = as.data.frame(HairEyeColor),
main = Eye,
condition = Hair,
counts = Freq
)
#> Note: Results from one-sample proportion tests for each
#> level of the condition variable testing for equal
#> proportions of the main variable.
#>
#> # A tibble: 4 x 9
#> condition Brown Blue Hazel Green `Chi-squared` df `p-value`
#> <fct> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 Black 62.9~ 18.5~ 13.8~ 4.63% 87.3 3 0
#> 2 Brown 41.6~ 29.3~ 18.8~ 10.1~ 63.3 3 0
#> 3 Red 36.6~ 23.9~ 19.7~ 19.7~ 5.45 3 0.142
#> 4 Blond 5.51% 74.0~ 7.87% 12.6~ 164. 3 0
#> # ... with 1 more variable: significance <chr>
#> Note: 95% CI for effect size estimate was computed with 25 bootstrap samples.
#>
set.seed(123)
ggstatsplot::ggbarstats(
ggstatsplot::movies_long,
mpaa,
genre,
bf.message = TRUE,
sampling.plan = "jointMulti",
title = "MPAA Ratings by Genre",
caption = "As of January 23, 2019",
nboot = 5,
perc.k = 1,
x.axis.orientation = "slant",
facet.proptest = FALSE,
ggplot.component = ggplot2::theme(axis.text.x = ggplot2::element_text(face = "italic")),
palette = "Set2"
)
#> Note: 95% CI for effect size estimate was computed with 5 bootstrap samples.
#> Created on 2019-01-24 by the reprex package (v0.2.1) |
So the biggest difference I can see visually is that you want the significance testing results from the proportions test up top rather than down below? I can certainly live with that. Hard for me to see anything else you've done. |
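For reference, the per-level proportion tests reported in the tibble above can be approximated in base R. This is only a sketch, assuming the test is an ordinary chi-squared goodness-of-fit test with equal expected proportions; it is not taken from the ggstatsplot source, and the names `tab` and `prop_tests` are made up for illustration.

```r
# Counts of engine shape (vs) within each cylinder group (cyl)
tab <- table(cyl = datasets::mtcars$cyl, vs = datasets::mtcars$vs)

# One goodness-of-fit test per level of the condition variable,
# H0: all levels of the main variable are equally likely
# (assumption: this mirrors what the function reports)
prop_tests <- do.call(rbind, lapply(rownames(tab), function(lvl) {
  # small expected counts trigger an approximation warning; silenced here
  res <- suppressWarnings(stats::chisq.test(tab[lvl, ]))
  data.frame(
    condition   = lvl,
    chi_squared = unname(res$statistic),
    df          = unname(res$parameter),
    p_value     = res$p.value
  )
}))

print(prop_tests)
```

Run against `mtcars`, the chi-squared statistics should line up with the 7.36, 0.143 and 14 shown in the tibble.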
Yes, additional changes that are not conspicuous-
And that's about it. If you are okay with these changes, I will push these changes to master and close this issue. |
Let's see, there's this usual issue that I know you have a method for...
checking R code for possible problems (11.8s)
The minor grid fix on y is very nice. I had removed the x grid.
Can we make the outline color selectable? I don't care about the default.
Ditto sample size. I don't much care about the defaults as long as I can adjust |
@ibecav I have added tests for this function. Do you also want to make a PR for the |
Yes I will happily do so. I've been thinking about the other issue (how to consolidate top level functions effectively) as well. One strategy that comes to my mind is to group more by the number of variables involved (univariate, bivariate & multivariate) and less by plot type. For example in my mind pie charts are more like histograms and belong in that grouping whereas chi square association tests are bivariate and not much different in many ways than ggbetweenstats. Just early thinking. More when I have a chance to think it out. |
So I find pie charts very appealing for looking at univariate situations, although there are many who dislike them even when you add labels as you have. But as soon as you move to bivariate cases, especially when one of the variables has multiple factor levels, I (and my students more so) have a hard time interpreting multiple pie charts depicting the relationship between two variables.
So looking at the usual Titanic example, I can currently imagine a function that is very similar in nature but shifts to percentage bars with labels.
You'll notice all I have really done is take hunks of your current code and change the call to ggplot in one small way...
Created on 2018-10-26 by the reprex package (v0.2.1)
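As a rough illustration of the "percentage bars with labels" idea (not the code originally attached to this issue), here is a minimal ggplot2 sketch. It assumes the built-in `datasets::Titanic` table rather than `Titanic_full`, and the names `titanic_counts` and `p` are made up for the example.

```r
library(ggplot2)

# Collapse the built-in Titanic table to one row per Class x Survived cell
titanic_counts <- aggregate(
  Freq ~ Class + Survived,
  data = as.data.frame(datasets::Titanic),
  FUN = sum
)

# Proportion of each outcome within its class (sums to 1 per class)
titanic_counts$perc <- with(titanic_counts, Freq / ave(Freq, Class, FUN = sum))

# Stacked bars scaled to 100%, with a percentage label centred in each segment
p <- ggplot(titanic_counts, aes(x = Class, y = perc, fill = Survived)) +
  geom_col(position = "fill", colour = "black") +
  geom_text(
    aes(label = sprintf("%.1f%%", 100 * perc), group = Survived),
    position = position_fill(vjust = 0.5)
  ) +
  scale_y_continuous(labels = function(x) paste0(100 * x, "%")) +
  labs(y = "Percent", title = "Survival by class")

print(p)
```

One bar per class keeps all levels on a shared axis, which is the readability gain over a row of separate pies.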
Thanks for considering