Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify is_na() #799

Closed
wants to merge 2 commits into from
Closed

Simplify is_na() #799

wants to merge 2 commits into from

Conversation

krlmlr
Copy link
Contributor

@krlmlr krlmlr commented Jan 11, 2018

This speeds up the check, because compilers aren't likely to be able to optimize two external functions that basically return f(x) && g(x) and f(x) && !g(x). The first mention of the R_isnancpp() function dates back to 2005, wch/r-source@17443788ef.

Related: #790

@eddelbuettel
Copy link
Member

What speedups are you seeing?

Also, line 47 and 48 probably want the same change for complex.

@krlmlr
Copy link
Contributor Author

krlmlr commented Jan 11, 2018

For this version (38d024f):

sum_is_na <- Rcpp::cppFunction("
int sum_is_na(NumericVector x) {
  int n = 0;
  for (R_xlen_t i = 0; i < x.size(); ++i) {
    double xi = x[i];
    if (Rcpp::traits::is_na<REALSXP>(xi)) ++n;
  }
  return n;
}
"
)

N <- 5e6
x <- runif(N)
x[x < 0.5] <- NA_real_

microbenchmark::microbenchmark(
  sum(is.na(x)),
  sum_is_na(x)
)
#> Unit: milliseconds
#>           expr      min       lq     mean   median       uq      max neval
#>  sum(is.na(x)) 10.24388 10.62800 12.26077 11.25501 13.76406 18.73605   100
#>   sum_is_na(x) 20.23770 20.65917 21.54281 21.25047 22.05458 25.35780   100
#>  cld
#>   a 
#>    b

Created on 2018-01-11 by the reprex package (v0.1.1.9000).

For a version using std::isnan() (e473015, currently in my local repo):

#> Unit: milliseconds
#>           expr      min       lq     mean   median       uq      max neval
#>  sum(is.na(x)) 10.32959 10.55353 12.03510 11.08237 13.15562 16.21278   100
#>   sum_is_na(x) 13.08563 13.35475 13.72524 13.57882 14.02745 16.02345   100
#>  cld
#>   a 
#>    b

For master (f1ec6cf):

#> Unit: milliseconds
#>           expr      min       lq     mean   median       uq      max neval
#>  sum(is.na(x)) 10.31513 10.62298 12.18608 11.00471 13.85499 22.86626   100
#>   sum_is_na(x) 49.11284 49.96843 51.05028 50.77832 52.05040 55.72882   100
#>  cld
#>   a 
#>    b

Bottom line: Faster by factor ~2.5, if we believe that std::isnan() will be the same as R_isnancpp() forever, we can make it faster by factor ~4. R is still faster even with the additional penalty of allocating and filling a logical vector (why???)

Will update complex implementation once we reach consensus.

@eddelbuettel
Copy link
Member

2x speedup is good, even on a minuscule task. Just want to make really sure the results are the same but I suppose our unit tests are comprehensive.

@eddelbuettel
Copy link
Member

Can you please extend this to the complex case on lines 47 and 48 ?

@krlmlr
Copy link
Contributor Author

krlmlr commented Jan 14, 2018

With std::isnan() (x4) or with R_isnancpp() (x2.5)?

eddelbuettel added a commit that referenced this pull request Jan 14, 2018
@eddelbuettel
Copy link
Member

Sorry, you misunderstood what I meant. There was a commit missing. See #800 which I'll open in just a second as an extension of this. I made the change a few hours ago locally, and ran a rev.dep.

Do you have a Windows machine handy? I have been bisecting the last few hours trying to find what broke Windows. I get tests timeout on R Hub. I think it is #789 by @lionel but I am not quite sure.

@krlmlr
Copy link
Contributor Author

krlmlr commented Jan 14, 2018

Sure, I can run a few tests. Do you want me to run R CMD check with pre- and post-#789? On R-devel?

@eddelbuettel
Copy link
Member

Yes, I just sent you a link to your "Kirill and R" handle at the mail box ... We can continue there.

I have some note, but I am mostly dizzy and need a break. A second set of eyes for checking would be good.

@eddelbuettel
Copy link
Member

Also, look over at #789 -- I'll continue there.

I'll close this PR. Your commit is part of #800 though.

@krlmlr krlmlr deleted the f-simplify-is-na branch March 7, 2018 10:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants