Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom annotations when using facets #23

Closed
basiliscus79 opened this issue Jun 30, 2017 · 9 comments
Closed

Custom annotations when using facets #23

basiliscus79 opened this issue Jun 30, 2017 · 9 comments

Comments

@basiliscus79
Copy link

Hello, I am trying to add custom annotations to ggplot2 faceted (facet_wrap) barplots and I am unable to make it work.
When I use the built-in tests in stat_signif() everything seems to work, from a technical standpoint, since I see the faceted barplots with all selected contrasts annotated by pvalues. Nonetheless I see strange things, like bars where there should clearly a statistically significant difference that are non-significant.
Therefore I decided to make tests separately and then add annotations in a custom way on the barplots.
I tried and it works nicely when I do a single plot, but when I add facets nothing works, I see just plain barplots without any annotation. I have tried to make it work by adding repeated lists to 'comparisons' and 'annotations' but the way it works is by piling each of the list elements on top of each other on every single plot of the facets rather than using one element per facet plot.
I hope I was able to explain my issue in a clear way.

Thanks for any suggestion

Michele

@basiliscus79
Copy link
Author

basiliscus79 commented Jul 3, 2017

I found a solution to the problem by using a modified version of the StatSignif function from https://github.com/const-ae/ggsignif/blob/master/R/significance_annotation.R

The input of the comparisons and annotations parameters of the stat_signif() function are the following:

comparisons = replicate(l, comp.list, FALSE)

where l is the number of unique facets in the plot and comp.list is the required list of binary comparisons

annotations = annot.list

where annot.list has the same length as comparisons (l) and each element of the list is a character vector of p value annotations, the latter having the same legnth as the elements in comp.list

Here is the modified version of the ggsignif StatSignif function:

StatSignif <- ggplot2::ggproto("StatSignif", ggplot2::Stat,
                               required_aes = c("x", "y", "group"),
                               setup_params = function(data, params) {
                                 # if(any(data$group == -1)|| any(data$group != data$x)){
                                 if(any(data$group == -1)){
                                   stop("Can only handle data with groups that are plotted on the x-axis")
                                 }
                                 #
                                 if (is.character(params$test)) params$test <- match.fun(params$test)
                                 params$complete_data <- data
                                 if(is.null(params$xmin) != is.null(params$xmax) || length(params$xmin) != length(params$xmax))
                                   stop("If xmin or xmax is set, the other one also needs to be set and they need to contain the same number of values")
                                 if(!is.null(params$xmin) && !is.null(params$comparisons))
                                   stop("Set either the xmin, xmax values or the comparisons")
                                 if(!is.null(params$xmin) && is.null(params$y_position))
                                   stop("If xmin, xmax are defined also define y_position")
                                 if(! is.null(params$y_position) && length(params$y_position) == 1)
                                   params$y_position <- rep(params$y_position, max(length(params$comparisons), length(params$xmin)))
                                 if(length(params$margin_top) == 1) params$margin_top <- rep(params$margin_top, max(length(params$comparisons),length(params$xmin)))
                                 if(length(params$step_increase) == 1) params$step_increase <- rep(params$step_increase, max(length(params$comparisons),length(params$xmin)))
                                 if(length(params$tip_length) == 1) params$tip_length <- rep(params$tip_length, max(length(params$comparisons),length(params$xmin)) * 2)
                                 if(length(params$tip_length) == length(params$comparisons)) params$tip_length <- rep(params$tip_length, each=2)
                                 if(length(params$tip_length) == length(params$xmin)) params$tip_length <- rep(params$tip_length, each=2)
                                 if(! is.null(params$annotations) && length(params$annotations) == 1)
                                   params$annotations <- rep(params$annotations, max(length(params$comparisons),length(params$xmin)))
                                 if(! is.null(params$annotations) && length(params$annotations) != max(length(params$comparisons),length(params$xmin)))
                                   stop(paste0("annotations contains a different number of elements (", length(params$annotations),
                                               ") than comparisons or xmin (", max(length(params$comparisons),length(params$xmin)), ")."))
                                 
                                 if(all(params$map_signif_level == TRUE)){
                                   params$map_signif_level <- c("***"=0.001, "**"=0.01, "*"=0.05)
                                 }else if(is.numeric(params$map_signif_level)){
                                   if(is.null(names(params$map_signif_level)) ){
                                     if(length(params$map_signif_level) <= 3){
                                       names(params$map_signif_level) <- tail(c("***", "**", "*"), n=length(params$map_signif_level))
                                     }else{
                                       stop('Cannot handle un-named map for significance values, please provide in the following format: c("***"=0.001, "**"=0.01, "*"=0.05)')
                                     }
                                   }
                                 }
                                 return(params)
                               },
                               compute_group = function(data, scales, comparisons, test, test.args, complete_data,
                                                        annotations, map_signif_level, y_position, xmax, xmin,
                                                        margin_top, step_increase, tip_length) {
                                 #
                                 idx <- as.numeric(as.character(data$PANEL[1]))
                                 
                                 
                                 if(! is.null(comparisons)){
                                   #
                                   comparisons2 <- comparisons[[idx]]
                                   #
                                   #- - - - - - - - - - #
                                   # debug only
                                   #
                                   # print(str(comparisons2))
                                   #- - - - - - - - - - #
                                   #
                                   i <- 0
                                   #
                                   result <- lapply(comparisons2, function(comp){
                                     #
                                     i <<- i + 1
                                     #
                                     uno <- comp[[1]]
                                     due <- comp[[2]]
                                     #- - - - - - - - - - #
                                     # debug only
                                     #
                                     # print(data$group[1])
                                     # print(uno)
                                     # print(due)
                                     #- - - - - - - - - - #
                                     #
                                     # All entries in group should be the same
                                     if(scales$x$map(uno) == data$group[1]){
                                       #- - - - - - - - - - #
                                       # debug only
                                       #
                                       # print("YES! Matched group mapping!")
                                       #- - - - - - - - - - #
                                       #
                                       test_result <- if(is.null(annotations)){
                                         #- - - - - - - - - - #
                                         # debug only
                                         #
                                         # print("USING automatic annotations!")
                                         #- - - - - - - - - - #
                                         #
                                         group_1 <- complete_data$y[complete_data$x == scales$x$map(uno) & complete_data$PANEL == data$PANEL[1]]
                                         group_2 <- complete_data$y[complete_data$x == scales$x$map(due) & complete_data$PANEL == data$PANEL[1]]
                                         p_value <- do.call(test, c(list(group_1, group_2), test.args))$p.value
                                         if(is.numeric(map_signif_level)){
                                           temp_value <- names(which.min(map_signif_level[which(map_signif_level > p_value)]))
                                           if(is.null(temp_value)){
                                             "NS."
                                           }else{
                                             temp_value
                                           }
                                         }else{
                                           if(p_value < 2.2e-16){
                                             "< 2.2e-16"
                                           }else{
                                             as.character(signif(p_value, digits=2))
                                           }
                                           
                                         }
                                         
                                       }else{
                                         #- - - - - - - - - - #
                                         # debug only
                                         #
                                         # print("USING custom annotations!")
                                         #- - - - - - - - - - #
                                         #
                                         valore <- sapply(annotations[idx], "[[", i)
                                         #- - - - - - - - - - #
                                         # debug only
                                         #
                                         # print(valore)
                                         #- - - - - - - - - - #
                                         #
                                       }
                                       y_scale_range <- (scales$y$range$range[2] - scales$y$range$range[1])
                                       y_pos <- if(is.null(y_position)){
                                         scales$y$range$range[2] + y_scale_range * margin_top[i] + y_scale_range * step_increase[i] * (i-1)
                                       }else{
                                         y_pos <- y_position[i]
                                       }
                                       data.frame(x=c(min(uno,due),
                                                      min(uno,due),
                                                      max(uno,due)),
                                                  xend=c(min(uno,due),
                                                         max(uno,due),
                                                         max(uno,due)),
                                                  y=c(y_pos - y_scale_range*tip_length[(i-1)*2+1], y_pos, y_pos),
                                                  yend=c(y_pos, y_pos, y_pos-y_scale_range*tip_length[(i-1)*2+2]),
                                                  annotation=test_result, group=paste(c(comp, i), collapse = "-"))
                                       
                                     } 
                                   })
                                   do.call(rbind, result)
                                   
                                 }else{
                                   
                                   if(data$x[1] == min(complete_data$x) & data$group[1] == min(complete_data$group)){
                                     y_scale_range <- (scales$y$range$range[2] - scales$y$range$range[1])
                                     data.frame(x=c(xmin, xmin, xmax),
                                                xend=c(xmin, xmax, xmax),
                                                y=c(y_position - y_scale_range*tip_length[seq_len(length(tip_length))%% 2 == 1], y_position, y_position),
                                                yend=c(y_position, y_position, y_position-y_scale_range*tip_length[seq_len(length(tip_length))%% 2 == 0]),
                                                annotation=rep(annotations, times=3), group=rep(seq_along(xmin), times=3))
                                   }
                                  }
                                 }
                               )

I hope this is of some help.

best and thank you

Michele

@vestlink
Copy link

vestlink commented Jul 3, 2017 via email

@basiliscus79
Copy link
Author

Please try to run the file ggSignif_MS_code.R after loading the associated .RData

ggSignif_MS_code.zip

Please note that the uploaded code is ready to produce a custom significance annotation plot and, in order for it to work, you SHOULD NOT LOAD the ggsignif library but rather source the stat_signif_modified.R (as specified in the script).
Vice versa, if you would like to test it using the regular ggsignif package, the custom annotation layer must be commented and the ggsignif layer for automatic stats must be un-commented as well. This should be clearly visible in the script itself.

Let me know if it works for you
I apologize for any issues there might be, I am new to Github and this is my very first thread.

Best

Michele

@const-ae
Copy link
Owner

const-ae commented Jul 6, 2017

Hey Michele,

sorry for my late response and thanks for your extensive description of the problem.
I am able to reproduce the customly labeled facets with your code, but I am not sure that your suggested solution with a nested list for annot.list is the best way forward in general.

#22 contains a discussion how to add labels that are specific for different facets, which might be useful for you.

It would be really helpful to solve this issue if you could create a condensed version of the code (5-15 lines) using one of the pre-installed datasets (e.g. mpg), because it takes quite a lot of effort to understand the ggSignif_MS_code.R example that you send .

If you have any further questions, I will be happy to help you.

Best,
Constantin

@basiliscus79
Copy link
Author

basiliscus79 commented Jul 7, 2017 via email

@basiliscus79
Copy link
Author

Just created two minimal example script with both the mpg and the diamonds datasets.

There U go

Different from the previously shared data and similar to many real-world examples, here the comparisons list is generated so as to account for the fact that grouping may differ in each different facet (see the mpg example).

These examples work ONLY with the custom/modified functions and DO NOT work with the regular ggsignif package, at least in my hands.

Again, I hope this could be of help

Best

Michele

@const-ae
Copy link
Owner

Hi Michele,

I have just pushed an update for the package to github that makes it possible to configure the geom_signif with a data.frame. The data frame has to contain the custom annotations and the start (xmin) and end (xmax) of the comparisons. In addition the data.frame can contain columns for the facet.

Here is a simplified example of the code you send me with the respective output:

library(tidyverse)
#> Loading tidyverse: ggplot2
#> Loading tidyverse: tibble
#> Loading tidyverse: tidyr
#> Loading tidyverse: readr
#> Loading tidyverse: purrr
#> Loading tidyverse: dplyr
#> Conflicts with tidy packages ----------------------------------------------
#> filter(): dplyr, stats
#> lag():    dplyr, stats
library(ggsignif)
diamonds
#> # A tibble: 53,940 x 10
#>    carat       cut color clarity depth table price     x     y     z
#>    <dbl>     <ord> <ord>   <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#>  1  0.23     Ideal     E     SI2  61.5    55   326  3.95  3.98  2.43
#>  2  0.21   Premium     E     SI1  59.8    61   326  3.89  3.84  2.31
#>  3  0.23      Good     E     VS1  56.9    65   327  4.05  4.07  2.31
#>  4  0.29   Premium     I     VS2  62.4    58   334  4.20  4.23  2.63
#>  5  0.31      Good     J     SI2  63.3    58   335  4.34  4.35  2.75
#>  6  0.24 Very Good     J    VVS2  62.8    57   336  3.94  3.96  2.48
#>  7  0.24 Very Good     I    VVS1  62.3    57   336  3.95  3.98  2.47
#>  8  0.26 Very Good     H     SI1  61.9    55   337  4.07  4.11  2.53
#>  9  0.22      Fair     E     VS2  65.1    61   337  3.87  3.78  2.49
#> 10  0.23 Very Good     H     VS1  59.4    61   338  4.00  4.05  2.39
#> # ... with 53,930 more rows

annotation_df <- cross_d(list(clarity = unique(diamonds$clarity), cut = unique(diamonds$cut))) %>% 
  mutate(xmin = factor("Fair", levels = levels(cut)), 
    annotations = apply(combn(letters, 2), 2, paste0, collapse = "")[1:n()]) %>% 
  filter(cut != xmin) %>% 
  group_by(clarity) %>% 
  arrange(cut) %>% 
  mutate(y_position = 8000 + 2000 * row_number())

annotation_df
#> Source: local data frame [32 x 5]
#> Groups: clarity [8]
#> 
#> # A tibble: 32 x 5
#>    clarity       cut   xmin annotations y_position
#>     <fctr>    <fctr> <fctr>       <chr>      <dbl>
#>  1     SI2      Good   Fair          ar      10000
#>  2     SI1      Good   Fair          as      10000
#>  3     VS1      Good   Fair          at      10000
#>  4     VS2      Good   Fair          au      10000
#>  5    VVS2      Good   Fair          av      10000
#>  6    VVS1      Good   Fair          aw      10000
#>  7      I1      Good   Fair          ax      10000
#>  8      IF      Good   Fair          ay      10000
#>  9     SI2 Very Good   Fair          az      12000
#> 10     SI1 Very Good   Fair          bc      12000
#> # ... with 22 more rows

highsd = function(x) {
  return(mean(x, na.rm = T) + sd(x))
}

ggplot(data = diamonds, aes(x = cut, y = price)) + 
  stat_summary(fun.y = mean, fun.ymin = mean, fun.ymax = highsd,
    width = 0.8, geom = "errorbar", 
    position = position_dodge(width = 0.9), colour = "black", size = 0.3) + 
  stat_summary(geom = "bar", fun.y = mean, aes(fill = cut), 
    width = 0.8, colour = "black", 
    position = position_dodge(width = 0.9), size = 0.4, show.legend = F) +
  geom_signif(data = annotation_df, 
    aes(annotations = annotations, xmin = xmin, xmax = cut, y_position = y_position), 
    manual = TRUE) +
  facet_wrap(~clarity, scales = "free") +
  ylim(NA, 17000)
#> Warning: Ignoring unknown aesthetics: annotations, xmin, xmax, y_position
#> Warning: Removed 717 rows containing non-finite values (stat_summary).

#> Warning: Removed 717 rows containing non-finite values (stat_summary).

You should easily be able to change the output by modifying the anntation_df.

Best,
Constantin

@basiliscus79
Copy link
Author

basiliscus79 commented Jul 15, 2017 via email

@basiliscus79
Copy link
Author

basiliscus79 commented Jul 26, 2017 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants