Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.
Sign upSimplify is_na() #799
Simplify is_na() #799
Conversation
|
What speedups are you seeing? Also, line 47 and 48 probably want the same change for |
|
For this version (38d024f): sum_is_na <- Rcpp::cppFunction("
int sum_is_na(NumericVector x) {
int n = 0;
for (R_xlen_t i = 0; i < x.size(); ++i) {
double xi = x[i];
if (Rcpp::traits::is_na<REALSXP>(xi)) ++n;
}
return n;
}
"
)
N <- 5e6
x <- runif(N)
x[x < 0.5] <- NA_real_
microbenchmark::microbenchmark(
sum(is.na(x)),
sum_is_na(x)
)
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> sum(is.na(x)) 10.24388 10.62800 12.26077 11.25501 13.76406 18.73605 100
#> sum_is_na(x) 20.23770 20.65917 21.54281 21.25047 22.05458 25.35780 100
#> cld
#> a
#> b
For a version using #> Unit: milliseconds
#> expr min lq mean median uq max neval
#> sum(is.na(x)) 10.32959 10.55353 12.03510 11.08237 13.15562 16.21278 100
#> sum_is_na(x) 13.08563 13.35475 13.72524 13.57882 14.02745 16.02345 100
#> cld
#> a
#> bFor master (f1ec6cf): #> Unit: milliseconds
#> expr min lq mean median uq max neval
#> sum(is.na(x)) 10.31513 10.62298 12.18608 11.00471 13.85499 22.86626 100
#> sum_is_na(x) 49.11284 49.96843 51.05028 50.77832 52.05040 55.72882 100
#> cld
#> a
#> bBottom line: Faster by factor ~2.5, if we believe that Will update complex implementation once we reach consensus. |
|
2x speedup is good, even on a minuscule task. Just want to make really sure the results are the same but I suppose our unit tests are comprehensive. |
|
Can you please extend this to the complex case on lines 47 and 48 ? |
|
With |
|
Sorry, you misunderstood what I meant. There was a commit missing. See #800 which I'll open in just a second as an extension of this. I made the change a few hours ago locally, and ran a rev.dep. Do you have a Windows machine handy? I have been bisecting the last few hours trying to find what broke Windows. I get tests timeout on R Hub. I think it is #789 by @lionel but I am not quite sure. |
|
Sure, I can run a few tests. Do you want me to run R CMD check with pre- and post-#789? On R-devel? |
|
Yes, I just sent you a link to your "Kirill and R" handle at the mail box ... We can continue there. I have some note, but I am mostly dizzy and need a break. A second set of eyes for checking would be good. |
This speeds up the check, because compilers aren't likely to be able to optimize two external functions that basically return
f(x) && g(x)andf(x) && !g(x). The first mention of theR_isnancpp()function dates back to 2005, wch/r-source@1744378.Related: #790