Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow args to opts_regex etc. to be passed with "..." #111

Closed
gagolews opened this issue Oct 31, 2014 · 6 comments
Closed

Allow args to opts_regex etc. to be passed with "..." #111

gagolews opened this issue Oct 31, 2014 · 6 comments

Comments

@gagolews
Copy link
Owner

Currently calling stri_opts_collator, stri_opts_regex, and stri_opts_brkiter e.g. in stringi-search functions is a "little bit" inconvenient. What about adding a "..." param to all the functions so that all the matching args are passed to stri_opts?

Something like:

> test <- function (str, pattern, simplify=FALSE, opts_regex = NULL, ...) 
+ {
+    if (!missing(...))
+       opts_regex <- do.call(stri_opts_regex, as.list(c(opts_regex, ...)))
+    .Call("stri_extract_all_regex", str, pattern, simplify, opts_regex, PACKAGE = "stringi")
+ }
> 
> print(test("ala", "ALA"))
[[1]]
[1] NA

> print(test("ala", "ALA", case_insensitive=TRUE))
[[1]]
[1] "ala"

> print(test("ala", "ALA", case=TRUE))
[[1]]
[1] "ala"

> print(test("ala", "ALA", case_insensitive=TRUE, nosucharg=FALSE))
 Error in (function (case_insensitive, comments, dotall, literal, multiline,  : 
  unused argument (nosucharg = FALSE) 

> print(test("ala", "ALA", opts_regex=stri_opts_regex(case_insensitive=FALSE), case_insensitive=TRUE))

 Error in (function (case_insensitive, comments, dotall, literal, multiline,  : 
  formal argument "case_insensitive" matched by multiple actual arguments 

Benchmarks:

> microbenchmark::microbenchmark(
+    stri_extract_all_regex("ala", "ala"),
+    stri_extract_all_regex("ala", "ala", opts_regex=stri_opts_regex(case_insensitive=FALSE)),
+    stri_extract_all_regex("ala", "ala", opts_regex=list(case_insensitive=FALSE)),
+    test("ala", "ala"),
+    test("ala", "ala", case_insensitive=FALSE),
+    test("ala", "ala", opts_regex=stri_opts_regex(case_insensitive=FALSE)),
+    test("ala", "ala", opts_regex=list(case_insensitive=FALSE))
+ )
Unit: microseconds
                                                                                         expr    min      lq     mean  median      uq     max neval
                                                         stri_extract_all_regex("ala", "ala") 15.649 16.5875 19.09487 17.0950 18.0355  97.610   100
 stri_extract_all_regex("ala", "ala", opts_regex = stri_opts_regex(case_insensitive = FALSE)) 24.647 25.7270 27.77733 26.3455 28.0155  54.031   100
            stri_extract_all_regex("ala", "ala", opts_regex = list(case_insensitive = FALSE)) 16.800 17.9585 18.81852 18.5280 19.4885  27.298   100
                                                                           test("ala", "ala") 15.643 16.7275 17.91048 17.5255 18.2935  38.547   100
                                                 test("ala", "ala", case_insensitive = FALSE) 56.032 57.9775 61.66630 58.9165 60.0255 103.298   100
                   test("ala", "ala", opts_regex = stri_opts_regex(case_insensitive = FALSE)) 25.667 26.9000 28.57848 27.3160 28.1750  56.455   100
                              test("ala", "ala", opts_regex = list(case_insensitive = FALSE)) 16.599 17.9690 19.41654 18.8200 19.5445  38.198   100
> x <- stri_rand_strings(10000, 3)
> microbenchmark::microbenchmark(
+    stri_extract_all_regex(x, "ala"),
+    stri_extract_all_regex(x, "ala", opts_regex=stri_opts_regex(case_insensitive=FALSE)),
+    stri_extract_all_regex(x, "ala", opts_regex=list(case_insensitive=FALSE)),
+    test(x, "ala"),
+    test(x, "ala", case_insensitive=FALSE),
+    test(x, "ala", opts_regex=stri_opts_regex(case_insensitive=FALSE)),
+    test(x, "ala", opts_regex=list(case_insensitive=FALSE)),
+    unit="relative"
+ )
Unit: relative
                                                                                     expr       min        lq      mean   median        uq       max neval
                                                         stri_extract_all_regex(x, "ala") 0.9981751 0.9961943 0.9624526 1.001169 0.9908091 0.9326833   100
 stri_extract_all_regex(x, "ala", opts_regex = stri_opts_regex(case_insensitive = FALSE)) 1.0000000 1.0000000 1.0000000 1.000000 1.0000000 1.0000000   100
            stri_extract_all_regex(x, "ala", opts_regex = list(case_insensitive = FALSE)) 0.9976754 1.0013583 0.9913800 1.003427 1.0021834 0.9551786   100
                                                                           test(x, "ala") 0.9982575 1.0048992 0.9965834 1.010263 1.0219637 0.9833843   100
                                                 test(x, "ala", case_insensitive = FALSE) 1.0232501 1.0275143 1.0000415 1.033012 1.0288242 1.0404019   100
                   test(x, "ala", opts_regex = stri_opts_regex(case_insensitive = FALSE)) 1.0049166 1.0225035 1.0088873 1.024782 1.0277768 1.0289821   100
                              test(x, "ala", opts_regex = list(case_insensitive = FALSE)) 1.0047261 1.0019214 1.0068571 1.005476 1.0114852 2.1843647   100

Do we accept such a performance "drop"? That'd be really user-friendly, IMO. :)

@gagolews
Copy link
Owner Author

gagolews commented Nov 1, 2014

However, I'm unsure if missing(...) is valid at all, see this SO question

@gagolews
Copy link
Owner Author

gagolews commented Nov 1, 2014

An alternative solution posted by GSee@SO:

library("stringi")
test1 <- compiler::cmpfun(function(str, pattern, simplify=FALSE, opts_regex = NULL, ...) 
{
    if (!missing(...))
       opts_regex <- do.call(stri_opts_regex, as.list(c(opts_regex, ...)))
    .Call("stri_extract_all_regex", str, pattern, simplify, opts_regex, PACKAGE = "stringi")
})

test2 <- compiler::cmpfun(function(str, pattern, simplify=FALSE, opts_regex = NULL, ...) 
{
   .dotargs <- list(...)
   if (length(.dotargs))
       opts_regex <- do.call(stri_opts_regex, as.list(c(opts_regex, .dotargs)))
    .Call("stri_extract_all_regex", str, pattern, simplify, opts_regex, PACKAGE = "stringi")
})

microbenchmark::microbenchmark(
    stri_extract_all_regex("ala", "ala"),
    test1("ala", "ala"),
    test2("ala", "ala"),
    stri_extract_all_regex("ala", "ala", opts_regex=stri_opts_regex(case_insensitive=FALSE)),
    test1("ala", "ala", opts_regex=stri_opts_regex(case_insensitive=FALSE)),
    test2("ala", "ala", opts_regex=stri_opts_regex(case_insensitive=FALSE)),
    stri_extract_all_regex("ala", "ala", opts_regex=list(case_insensitive=FALSE)),
    test1("ala", "ala", opts_regex=list(case_insensitive=FALSE)),
    test2("ala", "ala", opts_regex=list(case_insensitive=FALSE)),    
    test1("ala", "ala", case_insensitive=FALSE),
    test2("ala", "ala", case_insensitive=FALSE),
   unit="relative"
 )

give

Unit: relative
                                                                                         expr      min       lq     mean   median       uq       max neval
                                                         stri_extract_all_regex("ala", "ala") 1.000000 1.000000 1.000000 1.000000 1.000000 1.0000000   100
                                                                          test1("ala", "ala") 1.075398 1.052826 1.017465 1.053447 1.029103 0.8134295   100
                                                                          test2("ala", "ala") 1.103345 1.084199 1.148588 1.075272 1.161981 3.0893550   100
 stri_extract_all_regex("ala", "ala", opts_regex = stri_opts_regex(case_insensitive = FALSE)) 1.650189 1.578858 1.436439 1.565039 1.501811 1.1126940   100
                  test1("ala", "ala", opts_regex = stri_opts_regex(case_insensitive = FALSE)) 1.685809 1.609878 1.525307 1.585138 1.563039 1.3008204   100
                  test2("ala", "ala", opts_regex = stri_opts_regex(case_insensitive = FALSE)) 1.731662 1.652550 1.611348 1.645803 1.775540 1.7175679   100
            stri_extract_all_regex("ala", "ala", opts_regex = list(case_insensitive = FALSE)) 1.117478 1.091565 1.057609 1.093022 1.062635 1.2145773   100
                             test1("ala", "ala", opts_regex = list(case_insensitive = FALSE)) 1.204451 1.166843 1.147524 1.171154 1.259060 1.1053589   100
                             test2("ala", "ala", opts_regex = list(case_insensitive = FALSE)) 1.208672 1.186623 1.117606 1.179420 1.158620 1.1505213   100
                                                test1("ala", "ala", case_insensitive = FALSE) 3.606190 3.386835 3.133516 3.336023 3.373023 3.4719326   100
                                                test2("ala", "ala", case_insensitive = FALSE) 3.400652 3.238217 3.011531 3.175456 3.116777 2.8689177   100

@gsee
Copy link

gsee commented Nov 1, 2014

Try it where you pass in something without defining it first such that you get an error like in the SO post.

@gsee
Copy link

gsee commented Nov 1, 2014

Instead of .dotargs <- list(...);if (length(.dotargs) > 0), try using if (length(list(...))) like I did in the SO post ;)

@gagolews
Copy link
Owner Author

gagolews commented Nov 1, 2014

@gsee: I've updated the benchmarks; in the test1 case I would get an err on undefined something anyway, as I do c(opts_regex, ...). I think I'll use length(list(...) for this feature, thanks!

@gsee
Copy link

gsee commented Nov 1, 2014

agreed, you'd get an error with either test1 or test2 if you call with something, but if you were doing a lot of processing between when you check the arguments and when you attempt to use them, you might prefer to get the error sooner to avoid unnecessary computation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants