Compatibility with gganimate package #38

Closed
EvoLandEco opened this issue Jun 18, 2023 · 14 comments

@EvoLandEco

According to my tests, ggpmisc and gganimate are currently not compatible with each other; please enlighten me if I am wrong:

library(gganimate)
#> Loading required package: ggplot2
library(ggpmisc)
#> Loading required package: ggpp
#> 
#> Attaching package: 'ggpp'
#> The following object is masked from 'package:ggplot2':
#> 
#>     annotate

# Animation without stat_poly_eq()
ggplot(diamonds, aes(x = carat, y = price)) + geom_point() + geom_smooth(method = "lm") +
  transition_states(cut, transition_length = 1, state_length = 1) +
  enter_fade() + exit_shrink() +
  labs(title = "Cut = {closest_state}")
#> `geom_smooth()` using formula = 'y ~ x'

# Static plot with stat_poly_eq()
ggplot(diamonds, aes(x = carat, y = price)) + geom_point() + geom_smooth(method = "lm") +
    stat_poly_eq()
#> `geom_smooth()` using formula = 'y ~ x'

# Static plot with stat_poly_eq() and facets
ggplot(diamonds, aes(x = carat, y = price)) + geom_point() + geom_smooth(method = "lm") +
  stat_poly_eq() + facet_wrap(. ~ cut)
#> `geom_smooth()` using formula = 'y ~ x'

# Adding stat_poly_eq() to the animation causes the error
ggplot(diamonds, aes(x = carat, y = price)) + geom_point() + geom_smooth(method = "lm") +
  transition_states(cut, transition_length = 1, state_length = 1) +
  enter_fade() + exit_shrink() +
  stat_poly_eq() +
  labs(title = "Cut = {closest_state}")
#> `geom_smooth()` using formula = 'y ~ x'
#> Warning: Computation failed in `stat_poly_eq()`
#> Caused by error in `abs()`:
#> ! non-numeric argument to mathematical function
#> Error in `$<-.data.frame`(`*tmp*`, "group", value = ""): replacement has 1 row, data has 0

# Combining stat_poly_eq() with facets fails for each facet
ggplot(diamonds, aes(x = carat, y = price)) + geom_point() + geom_smooth(method = "lm") +
  transition_states(clarity, transition_length = 1, state_length = 1) +
  enter_fade() + exit_shrink() +
  stat_poly_eq() + facet_wrap(. ~ cut) +
  labs(title = "Clarity = {closest_state}")
#> `geom_smooth()` using formula = 'y ~ x'
#> Warning: Computation failed in `stat_poly_eq()`
#> Computation failed in `stat_poly_eq()`
#> Computation failed in `stat_poly_eq()`
#> Computation failed in `stat_poly_eq()`
#> Computation failed in `stat_poly_eq()`
#> Caused by error in `abs()`:
#> ! non-numeric argument to mathematical function
#> Error in `$<-.data.frame`(`*tmp*`, "group", value = ""): replacement has 1 row, data has 0

Created on 2023-06-18 with reprex v2.0.2

It might also be the case that more effort is needed from the author(s) of gganimate, but it would be nice if you could investigate a bit why they are not compatible.

@EvoLandEco EvoLandEco added the enhancement New feature or request label Jun 18, 2023
@aphalo
Owner

aphalo commented Jun 18, 2023

@EvoLandEco Many thanks for reporting this problem! A very quick check did not reveal the cause of the problem. I'll keep this issue open and come back to it as soon as possible.

@aphalo aphalo added this to the v0.5.3 milestone Jun 18, 2023
@EvoLandEco
Author

@aphalo No worries! This compatibility would surely boost my thesis and paper writing, though. Thank you again for what you've done already.

@aphalo
Owner

aphalo commented Jun 19, 2023

@EvoLandEco 'gganimate' seems to struggle sometimes with mappings, especially those set for the whole plot by calling aes() as an argument to 'ggplot()' and possibly the default mappings set by statistics. I cannot yet make sense of what triggers errors in 'gganimate' and what makes it silently ignore mappings. In addition, mappings not set directly through a call to aes() seem to also cause difficulties.

[Edited] Fixing this problem did not seem easy without me studying the internals of 'gganimate', but see the next comment.

@aphalo aphalo modified the milestones: v0.5.3, Future versions Jun 19, 2023
@aphalo
Owner

aphalo commented Jun 19, 2023

@EvoLandEco I think I found the root of the problem. stat_poly_eq() expects the column group in its input data to be integer, as I always thought it would be and, as far as I know, always is in 'ggplot2'. 'gganimate' changes group into a character vector to distinguish scenes, and this breaks my code.

I used stat_debug_group() to print the data received as input by statistics as follows. I still need to think about how to make 'ggpmisc' compatible with 'gganimate', but I now have a rough idea of what is needed...

library(gganimate)
#> Loading required package: ggplot2
library(gginnards)
library(tibble)

diamonds <- diamonds[sample.int(nrow(diamonds), nrow(diamonds) %/% 25), ]

# 'gganimate' converts group from integer into character
ggplot(diamonds, aes(x = carat, y = price)) +
  stat_debug_group(summary.fun = as_tibble) +
  transition_states(cut)
#> [1] "Summary of input 'data' to 'compute_group()':"
#> # A tibble: 66 × 4
#>        x     y PANEL group
#>    <dbl> <dbl> <fct> <chr>
#>  1  0.9   2873 1     -1<1>
#>  2  0.5   1069 1     -1<1>
#>  3  0.7    945 1     -1<1>
#>  4  1.5   8190 1     -1<1>
#>  5  1.01  4072 1     -1<1>
#>  6  2.01 14402 1     -1<1>
#>  7  1.01  6366 1     -1<1>
#>  8  0.7   1895 1     -1<1>
#>  9  0.78  2312 1     -1<1>
#> 10  0.5   1238 1     -1<1>
#> # ℹ 56 more rows
#> [1] "Summary of input 'data' to 'compute_group()':"
#> # A tibble: 185 × 4
#>        x     y PANEL group
#>    <dbl> <dbl> <fct> <chr>
#>  1  1.17  3866 1     -1<2>
#>  2  0.71  2161 1     -1<2>
#>  3  1.01  4912 1     -1<2>
#>  4  1.56  7094 1     -1<2>
#>  5  1.36  7549 1     -1<2>
#>  6  0.9   3621 1     -1<2>
#>  7  0.31   462 1     -1<2>
#>  8  0.7   2335 1     -1<2>
#>  9  0.34   589 1     -1<2>
#> 10  0.31   924 1     -1<2>
#> # ℹ 175 more rows
#> [1] "Summary of input 'data' to 'compute_group()':"
#> # A tibble: 466 × 4
#>        x     y PANEL group
#>    <dbl> <dbl> <fct> <chr>
#>  1  1     6732 1     -1<3>
#>  2  0.5   1243 1     -1<3>
#>  3  0.26   499 1     -1<3>
#>  4  0.52  1273 1     -1<3>
#>  5  1.24  8298 1     -1<3>
#>  6  0.9   3975 1     -1<3>
#>  7  0.9   3909 1     -1<3>
#>  8  0.71  2098 1     -1<3>
#>  9  2.01 17751 1     -1<3>
#> 10  0.82  2643 1     -1<3>
#> # ℹ 456 more rows
#> [1] "Summary of input 'data' to 'compute_group()':"
#> # A tibble: 586 × 4
#>        x     y PANEL group
#>    <dbl> <dbl> <fct> <chr>
#>  1  0.33   965 1     -1<4>
#>  2  0.57  1746 1     -1<4>
#>  3  1.2   4131 1     -1<4>
#>  4  0.9   3774 1     -1<4>
#>  5  1.71 10457 1     -1<4>
#>  6  0.79  3230 1     -1<4>
#>  7  0.32   828 1     -1<4>
#>  8  1.02  3856 1     -1<4>
#>  9  0.7   3365 1     -1<4>
#> 10  0.75  3108 1     -1<4>
#> # ℹ 576 more rows
#> [1] "Summary of input 'data' to 'compute_group()':"
#> # A tibble: 854 × 4
#>        x     y PANEL group
#>    <dbl> <dbl> <fct> <chr>
#>  1  0.33   564 1     -1<5>
#>  2  0.3    814 1     -1<5>
#>  3  0.71  3710 1     -1<5>
#>  4  0.43   818 1     -1<5>
#>  5  1.02  6418 1     -1<5>
#>  6  0.81  2894 1     -1<5>
#>  7  0.58  1332 1     -1<5>
#>  8  2.22 15584 1     -1<5>
#>  9  0.41   788 1     -1<5>
#> 10  0.3    835 1     -1<5>
#> # ℹ 844 more rows

Created on 2023-06-19 with reprex v2.0.2
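
The failure can be reproduced in base R, without either package. This is only a minimal sketch: it assumes that the stat applies abs() to the group column at some point, which the first error message in the report suggests but which is not shown in this thread.

```r
# Integer group as 'ggplot2' supplies it; -1 means "no grouping".
group_int <- -1L
abs(group_int)

# Character group of the kind shown in the debug output above.
group_chr <- "-1<1>"

# abs() is not defined for character input, so the call signals an error;
# tryCatch() captures the message instead of stopping the script.
res <- tryCatch(abs(group_chr), error = function(e) conditionMessage(e))
res
```

The captured message matches the "non-numeric argument to mathematical function" text in the original report.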

@aphalo
Owner

aphalo commented Jun 19, 2023

@EvoLandEco Dear TianJian, I think that the bug is now fixed. Please install 'ggpmisc' from this GitHub repository, and let me know if it also works with plots of your own data. The reprex you provided now works as expected, as do a couple of variations that I tried.

Many thanks for reporting the problem and providing an example!

# 'ggpmisc' future Version 0.5.3

library(gganimate)
#> Loading required package: ggplot2
library(ggpmisc)
#> Loading required package: ggpp
#> 
#> Attaching package: 'ggpp'
#> The following object is masked from 'package:ggplot2':
#> 
#>     annotate

# Animation with stat_poly_eq()
ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point() +
  stat_poly_line() +
  stat_poly_eq() +
  transition_states(cut, transition_length = 1, state_length = 1) +
  enter_fade() + exit_shrink() +
  labs(title = "Cut = {closest_state}")

ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point() +
  stat_poly_line() +
  stat_poly_eq(mapping = use_label(c("eq", "R2", "F"))) +
  transition_states(cut, transition_length = 1, state_length = 1) +
  enter_fade() + exit_shrink() +
  labs(title = "Cut = {closest_state}")

# Animation with stat_poly_eq()
ggplot(diamonds, aes(x = carat, y = price, color = color)) +
  geom_point() +
  stat_poly_line() +
  stat_poly_eq() +
  theme_bw() +
  transition_states(cut, transition_length = 1, state_length = 2) +
  enter_fade() + exit_shrink() +
  labs(title = "Cut = {closest_state}")

Created on 2023-06-19 with reprex v2.0.2

aphalo added a commit that referenced this issue Jun 19, 2023
Issue #38, 'gganimate' modifies group from integer into character. The character string contains the original integer as part of it, so when a group is encoded the way 'gganimate' does it, the original integer is recovered and used to compute the label positions. This may break if the encoding used by 'gganimate' is modified.
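
The recovery step described in the commit message above can be sketched as follows. The helper name recover_group() is hypothetical and this is not the code from the commit; it only assumes the "<frame>" suffix seen in the debug output earlier in this thread.

```r
# Hypothetical helper, not the 'ggpmisc' implementation: drop the "<frame>"
# suffix that 'gganimate' appends and convert what is left to integer.
recover_group <- function(group) {
  if (is.character(group)) {
    group <- as.integer(gsub("<.*>$", "", group))
  }
  group
}

recover_group(c("-1<1>", "-1<2>", "3<5>"))  # character input is decoded
recover_group(1:3)                          # integer input passes through
```

As the commit message warns, a sketch like this depends entirely on the encoding staying as it is.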
@EvoLandEco
Author

Thank you for the quick update! I tried it yesterday evening but had some issues with my own data. I believe they were due to the complex nature of a real data set.

There were mainly three warnings:

Warning: Not enough data to perform fit for group 1; computing mean instead.

Warning in ci_f_ncp(stat, df1 = df1, df2 = df2, probs = probs) :
  Upper limit outside search range. Set to the maximum of the parameter range.

Warning: Computation failed in `stat_poly_eq()`
Caused by error in `check_output()`:
! out[1] <= out[2] is not TRUE

The first one was due to having only one observation in a group, but I have no idea where the other two came from. As a result, the rr label cannot be shown in all of the frames of one panel. Do you have any idea about the potential cause? If you are interested in the data set, I could also send it to you by email.

@EvoLandEco
Author

EvoLandEco commented Jun 20, 2023

I tested a bit. If I set aes(color = color) and transition_manual(frames) and color is continuous, the grouping string is formatted as "-1<frame_number>". If color is discrete, it is formatted as "discrete_level<frame_number>". It seems that stat_poly_eq() only needs the first part of the grouping string to calculate the after stats, and that is the reason why you use a regex to delete the <.*> part in an if condition.

I think it might be better to use grepl("-1?[0-9]+<[0-9]+>", data$group[1]) or grepl("-1<[0-9]+>", data$group[1]) | grepl("^[0-9]+<[0-9]+>", data$group[1]) to minimize the chance that any other package with similar behavior is matched, as long as gganimate keeps to this formatting.
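
A quick way to compare the two suggested tests is to run them on a few made-up group strings. The strings below are illustrative only, modelled on the "-1<frame>" and "level<frame>" formats described above.

```r
# Made-up group strings: ungrouped, two discrete levels, and an unrelated value.
groups <- c("-1<1>", "3<2>", "12<10>", "other")

# First suggestion: a single pattern.
single_pattern <- grepl("-1?[0-9]+<[0-9]+>", groups)

# Second suggestion: two patterns combined with a logical OR.
two_patterns <- grepl("-1<[0-9]+>", groups) | grepl("^[0-9]+<[0-9]+>", groups)

data.frame(groups, single_pattern, two_patterns)
```

On these strings the single pattern only matches "-1<1>", because it requires a leading minus sign, while the two-pattern version also matches the discrete-level strings "3<2>" and "12<10>".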

@aphalo
Owner

aphalo commented Jun 20, 2023

@EvoLandEco In my third example, the group is not -1, but you are correct in that the test could be different to ensure long-term compatibility. grepl() would not work because it returns a logical value, but what I should use instead of gsub() to remove the unwanted part is regex() to extract the first part of the string (an integer encoded as a character) and discard whatever comes after it. However, any other package that modifies group into character would not work together with 'ggpmisc' unless the original integer value of 'group' can be extracted. The stat uses the integer group number to set the location of labels for the different groups (converting -1 into 1); otherwise they would overlap, or users would always need to set the positions manually.
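
To illustrate why the integer is needed, here is a rough sketch of staggering labels by group number. The function name label_npc_y() and the start and step values are made up for this example and are not taken from 'ggpmisc'.

```r
# Hypothetical illustration: shift each group's label down by a fixed step so
# that labels from different groups do not overlap.
label_npc_y <- function(group, start = 0.95, step = 0.05) {
  idx <- abs(as.integer(group))  # -1 (no grouping) is treated as group 1
  start - (idx - 1) * step
}

label_npc_y(c(-1L, 1L, 2L, 3L))
```

With a character group such as "-1<1>" the as.integer() conversion would yield NA, which is why the original integer has to be extracted first.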

The first message is triggered when there are not enough observations to fit the model. You should still get an equation with y equal to the mean. In most cases it can be ignored...

The second message I think is caused by failure of the algorithm used to compute confidence intervals, once again, most likely because of too few observations. The third warning is most likely an indirect consequence of this failure.

If there are not enough data to fit the model, R2 cannot be computed. There may be borderline cases when R2 can be computed but not its CI by bootstrapping, so I need to improve how this case can be handled. I will most likely disable the CI computation by default as it is also time consuming, but will need to handle the failure of CI computation more gracefully.

@EvoLandEco
Author

@aphalo Good to know. Better error handling will surely reduce the chance of failure, and it is indeed hard to ensure long-term compatibility.

It's also quite hard for me to check that there are enough observations, because I generate a large amount of data through stochastic simulation on a computer cluster, and the plots are produced automatically by a pipeline.

I look forward to the next version of ggpmisc, thank you again!

@aphalo
Owner

aphalo commented Jun 20, 2023

@EvoLandEco I was confused, no bootstrapping is involved in the stat. It is sometimes difficult to decide what should be a warning and what a message... The first one I think should be a message rather than a warning, and the test should take into account the model formula... In the case of lm() singularity is easier to handle automatically because lm() handles it gracefully. For rlm() it is difficult to automate because it stops with an error, so I added a parameter n.min that makes it possible to skip fitting the model given by formula when n < n.min in a group, fitting y ~ 1 instead of the model given by formula. The responsibility lies with the user, but in cases like your data, when n is not predictable, it should help.
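
The n.min idea can be sketched in a few lines. The helper fit_with_n_min() is hypothetical and uses lm() only to keep the example self-contained; it is not the code in 'ggpmisc', where the parameter was motivated by rlm().

```r
# Hypothetical helper: fall back to an intercept-only model (the mean of y)
# when a group has fewer than n.min observations.
fit_with_n_min <- function(data, formula = y ~ x, n.min = 2L) {
  if (nrow(data) < n.min) {
    formula <- y ~ 1
  }
  lm(formula, data = data)
}

small <- data.frame(x = 1, y = 5)
large <- data.frame(x = 1:5, y = c(2.1, 3.9, 6.2, 7.8, 10.1))

coef(fit_with_n_min(small))  # intercept only
coef(fit_with_n_min(large))  # intercept and slope
```

Choosing n.min remains the caller's decision, which matches the point above that the responsibility lies with the user.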

Testing for valid arguments to the CI calculation before attempting it should solve the second error.

I am not sure if the last error is dependent on any of these. Anyway, NaNs are now handled correctly. This was a bug.

This is mostly a note to myself.

aphalo added a commit that referenced this issue Jun 21, 2023
Make extraction of original group number more robust in new code for 'gganimate', see #38. 
Update documentation and message texts in `stat_poly_eq()`.
Add commented-out test cases. Apparent bug in 'testthat'.
@aphalo
Owner

aphalo commented Jun 21, 2023

@EvoLandEco I updated the regular expressions, but not exactly as you suggested. Anyway, this was a good point that you raised. Thanks! The code now uses grepl("^(-1|[0-9]+).*$", data$group[1]) and gsub("^(-1|[0-9]+).*$", "\\1", data$group[1]).
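
The two expressions quoted above can be tried on a few made-up group strings. Only the pattern comes from this comment; the test strings and variable names are illustrative.

```r
# Made-up group strings: 'gganimate'-style, a plain integer, and a non-match.
groups <- c("-1<1>", "3<2>", "12<10>", "7", "abc")
pattern <- "^(-1|[0-9]+).*$"

# Which strings start with -1 or with an unsigned integer?
ok <- grepl(pattern, groups)
ok

# Keep only the leading number (capture group 1) and convert it to integer.
recovered <- as.integer(gsub(pattern, "\\1", groups[ok]))
recovered
```

Because the pattern only looks at the start of the string, it does not depend on the exact form of the suffix that follows the number.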

If you have time, please, check the current version from GitHub. Thanks in advance!

I will update stat_ma_eq() and stat_quant_eq() before submitting to CRAN, and possibly other issues. So it will take some days or even a week or two before I release version 0.5.3.

Currently, when a parameter estimate is NA or NaN, the label is set to character(0). This produces "clean" plots, but may be confusing when say, R^2 is not shown at all. I am unsure about what is the most useful approach... Any suggestions?

@EvoLandEco
Author

I think the latest update solved my current issue, really appreciated. I will be on holidays for two weeks, hopefully I will be able to try out your latest CRAN build by then

@aphalo
Owner

aphalo commented Jun 22, 2023

Fixed stat_quant_eq() and stat_ma_eq(). I still need to fix stat_correlation() and stats based on 'broom'.

@aphalo
Owner

aphalo commented Jun 23, 2023

Fixed all remaining stats.

@aphalo aphalo closed this as completed Jun 23, 2023