New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change in runif() behavior in Rcpp 0.12.16 #836

Closed
kbroman opened this Issue Mar 19, 2018 · 8 comments

Comments

Projects
None yet
3 participants
@kbroman

kbroman commented Mar 19, 2018

With Rcpp 0.12.16, my simcross package started throwing errors in the tests. I don't fully understand what's going on, but it seems related to me calling runif(n,0,L) with n==0.

Consider the following code (also available as a gist). This simulates a Poisson process on the interval (0,L) by drawing a Poisson number of points and then drawing their positions from a uniform distribution. There are two versions of the function; in the first, I don't try to trap the n==0 cases; in the latter I call runif() only with n>0.

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector sim_crossovers_orig(const double L)
{
    int n_xo;

    n_xo = R::rpois(L/100.0);

    NumericVector tmp = runif(n_xo, 0.0, L);
    return tmp.sort();
}

// [[Rcpp::export]]
NumericVector sim_crossovers_new(const double L)
{
    int n_xo;

    n_xo = R::rpois(L/100.0);

    NumericVector tmp(0);
    if(n_xo > 0) tmp = runif(n_xo, 0.0, L);
    return tmp.sort();
}

With Rcpp ver 0.12.15, these two functions give the same results.

Rcpp::sourceCpp("sim_crossovers.cpp")

set.seed(20180318)
z <- replicate(10000, sim_crossovers_orig(100))
table(sapply(z, length))
##    0    1    2    3    4    5    6
## 3661 3681 1837  631  151   35    4

set.seed(20180318)
z <- replicate(10000, sim_crossovers_new(100))
table(sapply(z, length))
##    0    1    2    3    4    5    6
## 3661 3681 1837  631  151   35    4

But with Rcpp ver 0.12.16, the former gives a ton of 0-length vectors, while the latter works fine, though it gives somewhat different results from ver 0.12.15. (The tables being printed should be 10,000 draws from a Poisson distribution with mean 1.)

Rcpp::sourceCpp("sim_crossovers.cpp")

set.seed(20180318)
z <- replicate(10000, sim_crossovers_orig(100))
table(sapply(z, length))
##    0    1    3
## 9994    5    1

set.seed(20180318)
z <- replicate(10000, sim_crossovers_new(100))
table(sapply(z, length))
##    0    1    2    3    4    5    6
## 3637 3698 1846  621  152   41    5

I've tried this in both Linux and MacOS (High Sierra), and with R 3.4.3 and 3.4.4, with the same results in all cases. (Just the version of Rcpp matters.)

Here's my session info in the Rcpp 0.12.16 case on my Mac:

R version 3.4.4 (2018-03-15)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.3

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Rcpp_0.12.16    devtools_1.13.5

loaded via a namespace (and not attached):
[1] compiler_3.4.4   tools_3.4.4      withr_2.1.1.9000 memoise_1.1.0    digest_0.6.15
@kevinushey

This comment has been minimized.

Contributor

kevinushey commented Mar 19, 2018

This definitely has something to do with the RNG state. I suspect I broke this somehow with this commit:

573e71d

The following example provides the expected results:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector sim_crossovers_orig(const double L)
{
   int n_xo;

   GetRNGstate();
   n_xo = R::rpois(L/100.0);
   PutRNGstate();

   GetRNGstate();
   NumericVector tmp = runif(n_xo, 0.0, L);
   PutRNGstate();

   return tmp.sort();
}

/*** R
set.seed(20180318)
z <- replicate(10000, sim_crossovers_orig(100))
table(sapply(z, length))
##    0    1    2    3    4    5    6
## 3661 3681 1837  631  151   35    4
*/
@kevinushey

This comment has been minimized.

Contributor

kevinushey commented Mar 19, 2018

Argh. Let's understand where this issue is coming from. The call to runif() is implemented as:

inline NumericVector runif( int n /*, double min = 0.0, double max = 1.0 */ ){
return NumericVector( n, stats::UnifGenerator__0__1() ) ;
}

We use the 'fill' constructor of NumericVector, passing it a generator that generates random uniforms. That 'generator' thing is defined here:

class UnifGenerator__0__1 : public ::Rcpp::Generator<double> {
public:
UnifGenerator__0__1() {}
inline double operator()() const {
double u;
do {u = unif_rand();} while (u <= 0 || u >= 1);
return u;
}
};

It inherits from something called 'Generator'. What is that? Well, it lives here:

template <typename T>
class Generator : public RNGScope {
public:
typedef T r_generator ;
};

In other words, all our generator objects inherit from RNGScope. Which means every time a generator is constructed, and then destructed, it's now (indiscriminately) getting and setting the RNG state. Before, this didn't happen because these RNGScope calls were effectively screened by a global counter. But now, all of the Rcpp sugar random functions are getting and setting the RNG state -- potentially before other calls to R's C APIs for random number generation have had a chance to synchronize the R-level seed with the C-level seed.

It also doesn't help that apparently these generator objects get copied around a bit, which imply a few spurious calls to GetRNGstate() and PutRNGstate().

tl;dr: I broke it. Sorry for the trouble :-/

The issue can be fixed by making sure our generators don't inherit from RNGScope, and can be worked around on your side (in the interim) by surrounding any calls of the 'random' functions in the R:: namespace with GetRNGstate() and PutRNGstate().

@eddelbuettel

This comment has been minimized.

Member

eddelbuettel commented Mar 19, 2018

Nice analysis. The only other breakage we had seen was in ECctmc where I liased with the maintainer. So I guess the breakage was a little wider that we thought :-/

@kbroman

This comment has been minimized.

kbroman commented Mar 19, 2018

thanks, @kevinushey!

@eddelbuettel

This comment has been minimized.

Member

eddelbuettel commented Mar 19, 2018

@kbroman For completeness, were you able to check that the now-merged PR fixes your test failures?

@kbroman

This comment has been minimized.

kbroman commented Mar 20, 2018

@eddelbuettel Yes, the PR fixed my test failures; thanks!

@eddelbuettel

This comment has been minimized.

Member

eddelbuettel commented Mar 20, 2018

Thanks for confirming, and sorry for the breakage. We usually manage to do better, but the shiny folks wanted a change so ... it happened.

@kbroman

This comment has been minimized.

kbroman commented Mar 20, 2018

@eddelbuettel Absolutely no problem; only praise from me for Rcpp! And reducing my mess to a reasonably concise reproducible example was an interesting puzzle.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment