Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reached theoretically unreachable branch 2 #152

Closed
hadley opened this issue Jan 11, 2021 · 11 comments
Closed

Reached theoretically unreachable branch 2 #152

hadley opened this issue Jan 11, 2021 · 11 comments
Labels
Milestone

Comments

@hadley
Copy link

hadley commented Jan 11, 2021

a <- 1:20
b <- 10:500

diffobj::ses(a, b, max.diffs = 100)
#> Error in diff_myers(args[["a"]], args[["b"]], max.diffs = args[["max.diffs"]], : Internal Error: reached theoretically unreachable branch 2, contact maintainer.

Created on 2021-01-11 by the reprex package (v0.3.0.9001)

(Originally reported in r-lib/waldo#62)

@brodieG
Copy link
Owner

brodieG commented Jan 12, 2021

Thanks for reporting. I'll look at this next week.

@brodieG
Copy link
Owner

brodieG commented Jan 14, 2021

This specific issue is fixed in the "development" branch. I'm going to wait and see if the other one can be reproduced + I just submitted to CRAN so I'm going to wait a bit to resubmit.

@IndrajeetPatil
Copy link

IndrajeetPatil commented Feb 8, 2021

@brodieG I can confirm that the development branch indeed solves this problem.

I ran into this error message ("reached theoretically unreachable branch 2, contact maintainer.") while writing a snapshot test for HTML table output. But after updating to diffobj from the development branch, I no longer get this error and the snapshot test works perfectly! 😄

Reprex:

a <- 1:20
b <- 10:500

diffobj::ses(a, b, max.diffs = 100)
#> Warning in diff_myers(args[["a"]], args[["b"]], max.diffs =
#> args[["max.diffs"]], : Exceeded `max.diffs`: 511 vs 100 allowed. Diff is
#> probably suboptimal.
#> [1] "1,20c1,491"

Created on 2021-02-08 by the reprex package (v1.0.0)

@brodieG
Copy link
Owner

brodieG commented Feb 9, 2021

Thanks for confirming. At some point in the near future I'm going to sit down and stare really hard at the code to see if I can figure out how I can get the other "unreachable" branch to be reached, and (whether I figure it out or not), resubmit to CRAN.

@gadenbuie
Copy link
Contributor

gadenbuie commented Mar 1, 2021

Here's another set of test cases for you, based around comparing tibbles. I tried to reduce the reprex but I couldn't locate the problematic behavior beyond isolating the problematic columns.

I'm seeing these two errors

#> Error in diff_myers(args[["a"]], args[["b"]], max.diffs = args[["max.diffs"]], : Logic Error: Exceeded buffer for finding fake snake; contact maintainer.

#> Error in diff_myers(args[["a"]], args[["b"]], max.diffs = args[["max.diffs"]], : Internal Error: reached theoretically unreachable branch 2, contact maintainer.

and the second one goes away when using the development branch, while the other is reproducible in both.

CRAN
library(palmerpenguins)
library(dplyr)

comparison <- filter(penguins, year == 2007 | flipper_length_mm > 220)
test <- penguins %>% filter(year == 2008)

diffobj::ses(test$bill_length_mm, comparison$bill_length_mm, max.diffs = 100)
#> Error in diff_myers(args[["a"]], args[["b"]], max.diffs = args[["max.diffs"]], : Logic Error: Exceeded buffer for finding fake snake; contact maintainer.

test <- penguins %>% filter(year == 2007, flipper_length_mm > 200)
diffobj::ses(test$sex, comparison$sex, max.diffs = 100)
#> Error in diff_myers(args[["a"]], args[["b"]], max.diffs = args[["max.diffs"]], : Internal Error: reached theoretically unreachable branch 2, contact maintainer.

Created on 2021-03-01 by the reprex package (v1.0.0)

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                                      
#>  version  R version 3.6.3 Patched (2020-04-28 r79534)
#>  os       macOS  10.16                               
#>  system   x86_64, darwin15.6.0                       
#>  ui       X11                                        
#>  language (EN)                                       
#>  collate  en_US.UTF-8                                
#>  ctype    en_US.UTF-8                                
#>  tz       America/New_York                           
#>  date     2021-03-01                                 
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  ! package        * version    date       lib
#>  P assertthat       0.2.1      2019-03-21 [?]
#>  P backports        1.2.0      2020-11-02 [?]
#>  P cli              2.3.0.9000 2021-02-17 [?]
#>  P crayon           1.4.1      2021-02-08 [?]
#>  P diffobj          0.3.3      2021-01-07 [?]
#>  P digest           0.6.27     2020-10-24 [?]
#>  P dplyr          * 1.0.2      2020-08-18 [?]
#>  P ellipsis         0.3.1      2020-05-15 [?]
#>  P evaluate         0.14       2019-05-28 [?]
#>  P fansi            0.4.2      2021-01-15 [?]
#>  P fs               1.5.0      2020-07-31 [?]
#>  P generics         0.1.0      2020-10-31 [?]
#>  P glue             1.4.2      2020-08-27 [?]
#>  P highr            0.8        2019-03-20 [?]
#>  P htmltools        0.5.1.9000 2021-02-25 [?]
#>  P knitr            1.30       2020-09-22 [?]
#>  P lifecycle        1.0.0      2021-02-15 [?]
#>  P magrittr         2.0.1      2020-11-17 [?]
#>  P palmerpenguins * 0.1.0      2020-07-23 [?]
#>  P pillar           1.5.0      2021-02-22 [?]
#>  P pkgconfig        2.0.3      2019-09-22 [?]
#>  P purrr            0.3.4      2020-04-17 [?]
#>  P R6               2.5.0      2020-10-28 [?]
#>    renv             0.12.3     2020-11-25 [1]
#>  P reprex           1.0.0      2021-01-27 [?]
#>  P rlang            0.4.10     2020-12-30 [?]
#>  P rmarkdown        2.7        2021-02-19 [?]
#>  P sessioninfo      1.1.1      2018-11-05 [?]
#>  P stringi          1.5.3      2020-09-09 [?]
#>  P stringr          1.4.0      2019-02-10 [?]
#>  P styler           1.3.2      2020-02-23 [?]
#>  P tibble           3.0.6      2021-01-29 [?]
#>  P tidyselect       1.1.0      2020-05-11 [?]
#>  P utf8             1.1.4      2018-05-24 [?]
#>  P vctrs            0.3.6      2020-12-17 [?]
#>  P withr            2.4.1      2021-01-26 [?]
#>  P xfun             0.19       2020-10-30 [?]
#>  P yaml             2.2.1      2020-02-01 [?]
#>  source                            
#>  CRAN (R 3.6.0)                    
#>  standard (@1.2.0)                 
#>  Github (r-lib/cli@9ef875e)        
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.0)                    
#>  Github (rstudio/htmltools@ac43afe)
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.3)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.2)                    
#>  standard (@0.12.3)                
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.3)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.6.0)                    
#>  standard (@3.0.6)                 
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.0)                    
#>
#>  P ── Loaded and on-disk path mismatch.
development branch
library(palmerpenguins)
library(dplyr)

comparison <- filter(penguins, year == 2007 | flipper_length_mm > 220)
test <- penguins %>% filter(year == 2008)

diffobj::ses(test$bill_length_mm, comparison$bill_length_mm, max.diffs = 100)
#> Error in diff_myers(args[["a"]], args[["b"]], max.diffs = args[["max.diffs"]], : Logic Error: Exceeded buffer for finding fake snake; contact maintainer.

test <- penguins %>% filter(year == 2007, flipper_length_mm > 200)
diffobj::ses(test$sex, comparison$sex, max.diffs = 100)
#> Warning in diff_myers(args[["a"]], args[["b"]], max.diffs =
#> args[["max.diffs"]], : Exceeded `max.diffs`: 130 vs 100 allowed. Diff is
#> probably suboptimal.
#>  [1] "1d0"       "4,5c3,4"   "6a6"       "9,12c9,12" "15d14"     "17a17"    
#>  [7] "21d20"     "22a22"     "25d24"     "27c26"     "29a29"     "31d30"    
#> [13] "32a32"     "35c35"     "36a37,140"

Created on 2021-03-01 by the reprex package (v1.0.0)

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                                      
#>  version  R version 3.6.3 Patched (2020-04-28 r79534)
#>  os       macOS  10.16                               
#>  system   x86_64, darwin15.6.0                       
#>  ui       X11                                        
#>  language (EN)                                       
#>  collate  en_US.UTF-8                                
#>  ctype    en_US.UTF-8                                
#>  tz       America/New_York                           
#>  date     2021-03-01                                 
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  ! package        * version    date       lib
#>  P assertthat       0.2.1      2019-03-21 [?]
#>  P backports        1.2.0      2020-11-02 [?]
#>  P cli              2.3.0.9000 2021-02-17 [?]
#>  P crayon           1.4.1      2021-02-08 [?]
#>    diffobj          0.3.3.9000 2021-03-01 [1]
#>  P digest           0.6.27     2020-10-24 [?]
#>  P dplyr          * 1.0.2      2020-08-18 [?]
#>  P ellipsis         0.3.1      2020-05-15 [?]
#>  P evaluate         0.14       2019-05-28 [?]
#>  P fansi            0.4.2      2021-01-15 [?]
#>  P fs               1.5.0      2020-07-31 [?]
#>  P generics         0.1.0      2020-10-31 [?]
#>  P glue             1.4.2      2020-08-27 [?]
#>  P highr            0.8        2019-03-20 [?]
#>  P htmltools        0.5.1.9000 2021-02-25 [?]
#>  P knitr            1.30       2020-09-22 [?]
#>  P lifecycle        1.0.0      2021-02-15 [?]
#>  P magrittr         2.0.1      2020-11-17 [?]
#>  P palmerpenguins * 0.1.0      2020-07-23 [?]
#>  P pillar           1.5.0      2021-02-22 [?]
#>  P pkgconfig        2.0.3      2019-09-22 [?]
#>  P purrr            0.3.4      2020-04-17 [?]
#>  P R6               2.5.0      2020-10-28 [?]
#>    renv             0.12.3     2020-11-25 [1]
#>  P reprex           1.0.0      2021-01-27 [?]
#>  P rlang            0.4.10     2020-12-30 [?]
#>  P rmarkdown        2.7        2021-02-19 [?]
#>  P sessioninfo      1.1.1      2018-11-05 [?]
#>  P stringi          1.5.3      2020-09-09 [?]
#>  P stringr          1.4.0      2019-02-10 [?]
#>  P styler           1.3.2      2020-02-23 [?]
#>  P tibble           3.0.6      2021-01-29 [?]
#>  P tidyselect       1.1.0      2020-05-11 [?]
#>  P utf8             1.1.4      2018-05-24 [?]
#>  P vctrs            0.3.6      2020-12-17 [?]
#>  P withr            2.4.1      2021-01-26 [?]
#>  P xfun             0.19       2020-10-30 [?]
#>  P yaml             2.2.1      2020-02-01 [?]
#>  source                            
#>  CRAN (R 3.6.0)                    
#>  standard (@1.2.0)                 
#>  Github (r-lib/cli@9ef875e)        
#>  CRAN (R 3.6.2)                    
#>  Github (brodieg/diffobj@e718369)  
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.0)                    
#>  Github (rstudio/htmltools@ac43afe)
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.3)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.2)                    
#>  standard (@0.12.3)                
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.3)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.6.0)                    
#>  standard (@3.0.6)                 
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.0)                    
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.2)                    
#>  CRAN (R 3.6.0) 
#> 
#>  P ── Loaded and on-disk path mismatch.

@brodieG
Copy link
Owner

brodieG commented Mar 2, 2021

Perfect, thanks. I'll look in the next couple of days.

@gadenbuie
Copy link
Contributor

Oh and one more data point for the fake snake error message. In my reprex, the error depends on max.diff and appears only when that value is between 72 and 203.

@brodieG brodieG added the bug label Mar 4, 2021
@brodieG brodieG added this to the 0.3.4 milestone Mar 4, 2021
@brodieG
Copy link
Owner

brodieG commented Mar 5, 2021

The development branch now addresses @gadenbuie additional case. I want to look at this more closely as it's clear that the cases with low max.diffs are not well covered, so I'm not going to submit to CRAN just yet. I've found a couple of other issues already.

Aside, while it's great that you all are testing these low max.diffs cases, I'm wonder a little bit why. I imagine performance should be acceptable for the likely use case (e.g. waldo output in up to ~15ish failed tests). For example, on my very slow system I get:

> a <- sample(letters[1:5], 1000, replace=TRUE)
> b <- sample(letters[1:5], 1000, replace=TRUE)
> system.time(ses_dat(a, b))
   user  system elapsed 
  0.020   0.000   0.021 

These could run with max.diffs=1e3.

@hadley
Copy link
Author

hadley commented Mar 5, 2021

I'm setting the number low not because of performance, but because for most of the waldo cases, once you beyond a small number of diffs you're probably comparing two unrelated vectors, and it's simpler to just print them side by side, rather than displaying a complex diff (e.g. r-lib/waldo#51). Is there a better approach?

@brodieG
Copy link
Owner

brodieG commented Mar 5, 2021

That makes sense. Maybe the thing to do there is to compare the match percentage, and if it is low report the vectors side by side instead of the diff'ed version? Using Garrick's example, we have:

> a <- test$bill_length_mm
> b <- comparison$bill_length_mm
> table(diffobj::ses_dat(a, b)[['op']])

 Match Insert Delete 
    25    115     89 

This requires some additional post-processing in waldo, but should be better than having a half-baked diff from hitting max.diffs. Whether or not that's worth the effort is another question.

@hadley
Copy link
Author

hadley commented Mar 5, 2021

Ah good idea, I've made a note to look into that next time I'm working on waldo.

brodieG added a commit that referenced this issue Mar 23, 2021
Fix #152: Theoretically unreachable branch, related also
Fix #155, Fix #157.  Generally improved (corrected) the
O(n) fallback algorithm.
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Jun 6, 2021
## v0.3.4

* Add a print method for `ses_dat` return values that makes it easier to
  interpret the diff.
* [#152](brodieG/diffobj#152): Rewrite the
  fall-back "O(n)" algorithm that kicks in when there are `max.diff` differences
  to be more robust (h/t @hadley, @DanChaltiel, @gadenbui).


## v0.3.3

* Implement experimental .Rout / .Rout.save testing.
* Fix `all.equal` test breakages from
  [r79555](r-devel/r-svn@66d0165).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants