Feature request for new function ggbarstats #78

ibecav · 2018-10-26T16:13:17Z

So I find pie charts very appealing for univariate situations, although many people dislike them even with labels added, as you have done. But as soon as you move to bivariate cases, especially when one of the variables has multiple factor levels, I (and my students even more so) have a hard time interpreting multiple pie charts depicting the relationship between two variables.

So, looking at the usual Titanic example, I can imagine a function that is very similar in nature but shifts to percentage bars with labels.

You'll notice all I have really done is take hunks of your current code and change the call to ggplot in one small way...

library(ggstatsplot)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)
ggpiestats(Titanic_full, main = Survived, condition = Class, title = "Test")
#> Note: Results from faceted one-sample proportion tests:
#> 
#> # A tibble: 4 x 7
#>   condition No     Yes    `Chi-squared`    df `p-value` significance
#>   <fct>     <chr>  <chr>          <dbl> <dbl>     <dbl> <chr>       
#> 1 1st       37.54% 62.46%         20.2      1     0     ***         
#> 2 2nd       58.60% 41.40%          8.43     1     0.004 **          
#> 3 3rd       74.79% 25.21%        174.       1     0     ***         
#> 4 Crew      76.05% 23.95%        240.       1     0     ***         
#> Note: 95% CI for Cramer's V was computed with 25 bootstrap samples.
#>

# this is the crucial piece of ggpiestats defunctionalized
Titanic_full %>%
  group_by(Class,Survived) %>%
  summarize(counts = n()) %>% 
  mutate(perc = (counts / sum(counts)))-> tempdf
# only real changes are geom bar and percent y axis
ggplot(tempdf, aes(fill=Survived, y=perc, x=Class)) +
  geom_bar(stat="identity", position="fill") +
  ylab("Percent") +
  scale_y_continuous(labels = scales::percent, breaks = seq(0, 1, by = 0.10)) +
  geom_label(aes(label = paste0(round(x = perc*100, digits = 1), "%")), show.legend = FALSE, position = position_fill(vjust = 0.5)) +
  ggtitle("test", subtitle = subtitle_contigency_tab(Titanic_full,Class,Survived))
#> Note: 95% CI for Cramer's V was computed with 25 bootstrap samples.
#>

Created on 2018-10-26 by the reprex package (v0.2.1)
Thanks for considering
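
For readers without ggstatsplot installed, the within-class percentages behind these bars can be checked in base R. This is a minimal sketch, not part of the original post. It assumes that Titanic_full is the one-row-per-person expansion of the built-in datasets::Titanic table, which the percentages printed above are consistent with.

```r
# Assumption: datasets::Titanic holds the same counts as Titanic_full.
# Collapse the 4-way table (Class, Sex, Age, Survived) to Class x Survived.
class_by_survival <- margin.table(datasets::Titanic, margin = c(1, 4))

# Row-wise proportions: each Class row sums to 1, mirroring the
# group_by(Class, Survived) + mutate(perc = counts / sum(counts)) step.
perc <- prop.table(class_by_survival, margin = 1)

print(round(100 * perc, 2))
```

Because the proportions are already normalized within each class, position = "fill" in the bar layer leaves the bar heights unchanged; it only guarantees that every bar spans 0 to 100%.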


IndrajeetPatil · 2018-10-27T13:18:12Z

Completely agree with this! The plan is actually to write a more general version of the function ggpiestats and call it ggcatstats (for categorical data) and provide a plot.type argument that can work with "pie", "bar", "slopegraph", and "alluvial" plots.

This is something that should happen by 1.0.0, so still a few months away. But I can already start adding some basic functionality/setup needed for this.

ibecav · 2018-10-30T15:28:18Z

Understood. As long as you're thinking about a plot.type argument, please consider mosaic plots using ggmosaic. If you're unfamiliar, a mosaic plot is like a bar chart, except that the widths of the x-axis categories are proportional to their counts...

library(ggplot2)
library(dplyr)
library(ggstatsplot)
library(ggmosaic)
ddd <- group_by(Titanic_full, Class, Sex, Age, Survived) %>% count()
ggplot(data=ddd) +
  geom_mosaic(aes(weight=n, x=product(Class), fill=Survived)) +
  ggtitle("test", subtitle = ggstatsplot::subtitle_contigency_tab(Titanic_full,Class,Survived))
#> Note: 95% CI for Cramer's V was computed with 25 bootstrap samples.
#>

Created on 2018-10-30 by the reprex package (v0.2.1)
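
To make the "proportional x axis" point concrete: in a mosaic plot each column's width is that category's share of all observations, while the fill within a column is the same conditional percentage that a stacked percentage bar shows. A rough base R sketch, again assuming the built-in datasets::Titanic table carries the same counts as Titanic_full:

```r
# Assumption: datasets::Titanic matches the counts in Titanic_full.
class_totals <- margin.table(datasets::Titanic, margin = 1)

# Column widths in a Class mosaic: each class's share of everyone on board
widths <- prop.table(class_totals)

print(round(widths, 3))
```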

ibecav · 2019-01-22T22:31:03Z

I'm about 65% done with this... have not created a pull request yet but working.

IndrajeetPatil · 2019-01-22T22:43:28Z

Hmm, but the plan was not to create a new function just for the bar plots, but rather to create a general function ggcatstats (for categorical data) that has a plot.type argument, and to deprecate ggpiestats. The problem with writing a separate function for each type of plot is that there is a lot of duplication of code across plotting functions. Since only the plotting portion changes from one function to the next, every modification to the shared part has to be made in all of them at once.
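
The design described above can be sketched as one shared summary step plus a switch on plot.type. This is only an illustration of the idea: ggcatstats_sketch and summarise_props are hypothetical names, not functions in ggstatsplot, and only the "pie" and "bar" types are covered.

```r
# Hypothetical helper: the part every plot type would share.
# `main` and `condition` are column names given as strings.
summarise_props <- function(data, main, condition) {
  counts <- as.data.frame(table(data[[condition]], data[[main]]),
                          stringsAsFactors = FALSE)
  names(counts) <- c("cond_level", "main_level", "counts")
  # proportion of each `main` level within each `condition` level
  counts$perc <- counts$counts / ave(counts$counts, counts$cond_level, FUN = sum)
  counts
}

# Hypothetical wrapper: only the geom layer differs between plot types.
ggcatstats_sketch <- function(data, main, condition,
                              plot.type = c("pie", "bar")) {
  plot.type <- match.arg(plot.type)
  df <- summarise_props(data, main, condition)
  base <- ggplot2::ggplot(df, ggplot2::aes(fill = main_level, y = perc))
  switch(plot.type,
    bar = base +
      ggplot2::geom_col(ggplot2::aes(x = cond_level), position = "fill"),
    pie = base +
      ggplot2::geom_col(ggplot2::aes(x = ""), position = "fill") +
      ggplot2::coord_polar(theta = "y") +
      ggplot2::facet_wrap(~cond_level)
  )
}
```

With ggplot2 installed, ggcatstats_sketch(datasets::mtcars, "vs", "cyl", plot.type = "bar") would draw the stacked percentage bars, and plot.type = "pie" one pie per cylinder count. The statistical subtitle and proportion tests are left out of the sketch; in a real implementation they would sit alongside summarise_props in the shared part.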

ibecav · 2019-01-23T13:04:17Z

Yes, I remember the concept of minimizing the number of new functions. The problem is threefold:

1. I need the functionality now for my day job, so it's either invest time in a one-off or move the overall package along.
2. Pie charts really are different from bars or mosaics, since their base functionality is univariate, whereas bars and mosaics are best for bivariate cases (and for both, the grouped_ functionality allows you to go further).
3. Relatedly, ggcatstats would be a misnomer, since bars and mosaics deal nicely with ordinals and other levels of variables.

Anyway I will certainly endeavor to minimize the amount of new or overlapping code. When I get to a good stable point (soon) I'll formally submit a PR and you can take a look.

IndrajeetPatil · 2019-01-23T17:24:27Z

Fair enough!

Let's proceed with separate functions and we can later figure out how to refactor them to avoid code duplication.

ibecav · 2019-01-23T19:08:45Z

Getting there. An example using one of the common datasets.

library(ggstatsplot)
# using the current ggpiestats
# pies are especially inefficient when you have many levels of a factor
ggpiestats(movies_long,
           mpaa,
           genre,
           bf.message = TRUE,
           sampling.plan = "jointMulti",
           title = "MPAA Ratings by Genre",
           caption = "As of January 23, 2019",
           nboot = 5,
           perc.k = 1,
           facet.proptest = FALSE,
           palette = "Set2")
#> Note: 95% CI for effect size estimate was computed with 5 bootstrap samples.
#>

# using the still nascent ggbarstats
ggbarstats(movies_long,
           mpaa,
           genre,
           bf.message = TRUE,
           sampling.plan = "jointMulti",
           title = "MPAA Ratings by Genre",
           caption = "As of January 23, 2019",
           nboot = 5,
           perc.k = 1,
           facet.proptest = FALSE,
           palette = "Set2")
#> Note: 95% CI for effect size estimate was computed with 5 bootstrap samples.
#>

Created on 2019-01-23 by the reprex package (v0.2.1)

IndrajeetPatil · 2019-01-23T19:21:19Z

Thanks, Chuck. This looks awesome!

In case you don't already have these changes in mind, here are some minor comments to keep the plots from these two similar functions aesthetically consistent:

1. The sample size should be denoted using n = (as in ggpiestats) and not N= (as in ggbarstats). Also, it should not be bold (as it is in ggbarstats).
2. The percentages inside the bars should have a text box around them with a white background.
3. The y-axis label should be "percent" or "percentage" (or even "proportion", like in gghistostats)? I don't feel strongly either way.
4. The default legend position should be at the bottom (as in ggpiestats) and not on the right-hand side (as in ggbarstats).

ibecav · 2019-01-23T22:52:03Z

Laugh out loud. We have such different aesthetic tastes. But okay, using your numbering system:

1. Okay, n instead of N, but I am worried that on wider plots the spaces around the equals sign take up a lot of room.
2. Okay, I'll use geom_label instead of geom_text. But since I detest the white background, I'll allow the user to select color and alpha.
3. Technically, percent or percentage is more correct.
4. Legend at the bottom I also detest, so I'll make bottom the default and allow the user to move it amongst "top", "bottom", "left", and "right".

Should be able to do these and cleanup tomorrow.

ibecav · 2019-01-24T15:28:05Z

Okay all these are fixed as well as adding a few features. PR follows. Need some more doco cleanup and some testing but it's done.

library(jmv)
#> 
#> Attaching package: 'jmv'
#> The following object is masked from 'package:stats':
#> 
#>     anova
library(ggstatsplot)
# for reproducibility
set.seed(123)

# simple function call with the defaults (with condition)
ggstatsplot::ggbarstats(
  data = datasets::mtcars,
  main = vs,
  condition = cyl,
  bf.message = TRUE,
  nboot = 10,
  factor.levels = c("0 = V-shaped", "1 = straight"),
  legend.title = "Engine"
)
#> Warning in stats::chisq.test(x = data$main, y = data$condition, correct =
#> FALSE, : Chi-squared approximation may be incorrect
#> Note: 95% CI for effect size estimate was computed with 10 bootstrap samples.
#>

# simple function call with count data

ggstatsplot::ggbarstats(
  data = as.data.frame(HairEyeColor),
  main = Eye,
  condition = Hair,
  counts = Freq
)
#> Note: 95% CI for effect size estimate was computed with 25 bootstrap samples.
#>

ggbarstats(movies_long,
  mpaa,
  genre,
  bf.message = TRUE,
  sampling.plan = "jointMulti",
  title = "MPAA Ratings by Genre",
  caption = "As of January 23, 2019",
  nboot = 5,
  perc.k = 1,
  x.axis.orientation = "slant",
  facet.proptest = FALSE,
  ggplot.component = ggplot2::theme(axis.text.x = ggplot2::element_text(face = "italic")),
  palette = "Set2"
)
#> Note: 95% CI for effect size estimate was computed with 5 bootstrap samples.
#>

Created on 2019-01-24 by the reprex package (v0.2.1)

IndrajeetPatil · 2019-01-24T15:43:14Z

Thanks, Chuck. This all looks good to me. I will make minor modifications after the PR is merged-

Add ( to sample sizes to be consistent with ggpiestats.
The size of the sample size label text seems a bit small compared to the rest of the label text in the plot. I'll toy around with different values to see if a bit bigger text size looks better.
I will also add myself to the author field (#' @author Chuck Powell -> #' @author Chuck Powell, Indrajeet Patil), since the entire non-plotting body of the function comes from ggpiestats, which I had written.

Would you also like to add grouped_ggbarstats and tests for this function, or do you want me to take care of it?
I am fine either way.

IndrajeetPatil · 2019-01-24T18:06:43Z

Here is what the output looks like with the modifications I have introduced to make this aesthetically as close to ggpiestats as possible. Lemme know what you think before I push these changes to master.

Reproducing the same examples you used above-

Example 1

set.seed(123)

ggstatsplot::ggbarstats(
  data = datasets::mtcars,
  main = vs,
  condition = cyl,
  bf.message = TRUE,
  nboot = 10,
  factor.levels = c("0 = V-shaped", "1 = straight"),
  legend.title = "Engine"
)
#> Note: Results from one-sample proportion tests for each
#>       level of the condition variable testing for equal
#>       proportions of the main variable.
#> 
#> # A tibble: 3 x 7
#>   condition `0`     `1`    `Chi-squared`    df `p-value` significance
#>   <fct>     <chr>   <chr>          <dbl> <dbl>     <dbl> <chr>       
#> 1 4         9.09%   90.91%         7.36      1     0.007 **          
#> 2 6         42.86%  57.14%         0.143     1     0.705 ns          
#> 3 8         100.00% 0.00%         14         1     0     ***
#> Warning in stats::chisq.test(x = data$main, y = data$condition, correct =
#> FALSE, : Chi-squared approximation may be incorrect
#> Note: 95% CI for effect size estimate was computed with 10 bootstrap samples.
#>

Example 2

set.seed(123)

ggstatsplot::ggbarstats(
  data = as.data.frame(HairEyeColor),
  main = Eye,
  condition = Hair,
  counts = Freq
)
#> Note: Results from one-sample proportion tests for each
#>       level of the condition variable testing for equal
#>       proportions of the main variable.
#> 
#> # A tibble: 4 x 9
#>   condition Brown Blue  Hazel Green `Chi-squared`    df `p-value`
#>   <fct>     <chr> <chr> <chr> <chr>         <dbl> <dbl>     <dbl>
#> 1 Black     62.9~ 18.5~ 13.8~ 4.63%         87.3      3     0    
#> 2 Brown     41.6~ 29.3~ 18.8~ 10.1~         63.3      3     0    
#> 3 Red       36.6~ 23.9~ 19.7~ 19.7~          5.45     3     0.142
#> 4 Blond     5.51% 74.0~ 7.87% 12.6~        164.       3     0    
#> # ... with 1 more variable: significance <chr>
#> Note: 95% CI for effect size estimate was computed with 25 bootstrap samples.
#>

Example 3

set.seed(123)

ggstatsplot::ggbarstats(
  ggstatsplot::movies_long,
  mpaa,
  genre,
  bf.message = TRUE,
  sampling.plan = "jointMulti",
  title = "MPAA Ratings by Genre",
  caption = "As of January 23, 2019",
  nboot = 5,
  perc.k = 1,
  x.axis.orientation = "slant",
  facet.proptest = FALSE,
  ggplot.component = ggplot2::theme(axis.text.x = ggplot2::element_text(face = "italic")),
  palette = "Set2"
)
#> Note: 95% CI for effect size estimate was computed with 5 bootstrap samples.
#>

Created on 2019-01-24 by the reprex package (v0.2.1)

ibecav · 2019-01-24T18:23:33Z

So the biggest difference I can see visually is that you want the significance testing results from the proportions test up top rather than down below? I can certainly live with that.

Hard for me to see anything else you've done.

IndrajeetPatil · 2019-01-24T18:28:25Z

Yes, additional changes that are not conspicuous-

minor grid for y-axis is restricted to [0,1]
outline color for bars has been changed from grey to black
the sample size label has bigger text size

And that's about it.

If you are okay with these changes, I will push these changes to master and close this issue.

ibecav · 2019-01-24T18:33:39Z

Let's see, there's this usual issue that I know you have a method for...

checking R code for possible problems (11.8s)
ggbarstats: no visible binding for global variable ‘N’
Undefined global functions or variables: N

The minor grid fix on y is very nice. I had removed the x grid.

Can we make the outline color selectable? I don't care about the default.

Ditto sample size. I don't much care about the defaults as long as I can adjust

IndrajeetPatil · 2019-01-24T20:45:41Z

@ibecav I have added tests for this function.

Do you also want to make a PR for the grouped_ version of this function?

ibecav · 2019-01-24T20:58:41Z

Yes, I will happily do so. I've been thinking about the other issue (how to consolidate top-level functions effectively) as well. One strategy that comes to mind is to group more by the number of variables involved (univariate, bivariate, and multivariate) and less by plot type. For example, in my mind pie charts are more like histograms and belong in that grouping, whereas chi-squared association tests are bivariate and in many ways not much different from ggbetweenstats.

Just early thinking. More when I have a chance to think it out.

IndrajeetPatil added the enhancement 🔥 New feature or request label Oct 27, 2018

IndrajeetPatil added this to the 0.0.8 milestone Oct 28, 2018

IndrajeetPatil modified the milestones: 0.0.8, 0.0.9 Jan 23, 2019

ibecav mentioned this issue Jan 24, 2019

Ggbarstats Issue number #78 #143

Merged

IndrajeetPatil closed this as completed in 8d71aa0 Jan 24, 2019

IndrajeetPatil reopened this Jan 24, 2019

IndrajeetPatil closed this as completed in bab2c80 Jan 24, 2019
