New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RNG behaves differently in nested calls to Rcpp functions #823

Closed
wch opened this Issue Feb 23, 2018 · 7 comments

Comments

Projects
None yet
4 participants
@wch

wch commented Feb 23, 2018

The global RNGScopeCounter, causes GetRNGstate() and PutRNGstate() to be called only when entering/exiting the lowest Rcpp function on the call stack.

This can cause problems when one Rcpp function calls another one, and the latter one uses the random number generator.

In the example below, the behavior of f is different when called directly from the console vs. called from another Rcpp function.

library(Rcpp)

cppFunction('
  double rng() {
    return R::runif(1,10);
  }
')

cppFunction('
  Rcpp::RObject invoke(Rcpp::Function f) {
    return f();
  }
')

f <- function() {
  set.seed(100)
  a <- runif(1)
  rng()
  
  set.seed(100)
  rng()
  b <- runif(1)

  identical(a, b)
}

f()  # FALSE

invoke(f) # TRUE

cc: @jcheng5
From rstudio/shiny#1953.

@eddelbuettel

This comment has been minimized.

Member

eddelbuettel commented Feb 23, 2018

As I discussed over there, I consider the example invalid and construed because of Rcpp::Function().

@wch wch changed the title from RNG does not behave correctly in nested calls to Rcpp functions to RNG behaves differently in nested calls to Rcpp functions Feb 23, 2018

@eddelbuettel

This comment has been minimized.

Member

eddelbuettel commented Feb 23, 2018

I guess our basic guarantee is this one:

R> set.seed(100)
R> c(rng(), runif(1))
[1] 0.307766 0.257673
R> set.seed(100)
R> c(runif(1), rng())
[1] 0.307766 0.257673
R> 

Using your rng() function (corrected using R::runif(0,1) to correspond to the R call -- ), we see that once a set is seed, it does matter whether your query the RNG via R or C. That is a nice property.

It seems like piercing this guarantee with Rcpp::Function() to call back to R destroy it. Oh well, so let's not do that then in apps relying on identical RNG streams. One could always call out to other RNGs.

Slightly mod'ed example below:

#include <Rcpp.h>

// [[Rcpp::export]]
double rng() {
  return R::runif(0, 1);
}

// [[Rcpp::export]]
Rcpp::RObject invoke(Rcpp::Function f) {
  Rcpp::RObject z = f();
  return z;
}

/*** R
f <- function() {
  set.seed(100)
  a1 <- runif(1)
  b1 <- rng()

  set.seed(100)
  a2 <- rng()
  b2 <- runif(1)

  identical(c(a1, b1), c(a2, b2))
}

f()  

invoke(f) # not TRUE
*/
@kevinushey

This comment has been minimized.

Contributor

kevinushey commented Feb 27, 2018

Some notes from a conversation with @wch (mostly writing here for my own benefit; this might already be obvious to those familiar with R's RNG).

GetRNGstate() and PutRNGstate() are effectively used to synchronize the R .Random.seed variable with the internal RNG state used by the active RNG. In effect:

  • GetRNGstate() synchronizes .Random.seed => internal state;
  • PutRNGstate() synchronizes internal state => .Random.seed.

In @wch's example, the call to rng() does not call GetRNGstate() / PutRNGstate(), so the C-level RNG table gets out of sync with .Random.seed. When a later usage of R's RNG is made (through e.g. runif(1), the RNG state is synchronized (.Random.seed => internal seed state). However, because .Random.seed is stale, we get an unexpected result.

I think it should be safe to recursively call GetRNGstate() / PutRNGstate(), though. The only downside is we might end up with a couple unneeded calls to GetRNGstate() and PutRNGstate(). E.g. suppose we had ('unwrapping' the Rcpp function wrapper to show the calls to GetRNGstate() / PutRNGstate()):

void f() {
    GetRNGstate();
        g();
        h();
    PutRNGstate();
}

In the above example, if g() makes use of the random number generator without calling PutRNGstate(), then h() (if it were to call GetRNGstate()) would end up with a stale view of the RNG. On the other hand:

void f() {
    GetRNGstate();
	GetRNGstate();
	g();
	PutRNGstate();
	GetRNGstate();
	h();
	PutRNGstate();
    PutRNGstate();
}

The only real downside is that we're making a couple unnecessary calls to GetRNGstate() / PutRNGstate(). That is, repeated calls to GetRNGstate() aren't wrong, just wasteful -- whereas missing calls to GetRNGstate() / PutRNGstate() can definitely get the user into a bind.

tl;dr: I think it's safe and correct to call GetRNGstate() and PutRNGstate() recursively.

@eddelbuettel

This comment has been minimized.

Member

eddelbuettel commented Feb 27, 2018

I'm mostly on the same page following the discussing with @wch and @jcheng5 over on the shiny ticket.
When I look at git blame though, I see that the counter-based call of the get/put functions was added by nobody other than ... you. Do you recall details? Just a simple optimization?

@kevinushey

This comment has been minimized.

Contributor

kevinushey commented Feb 27, 2018

😬

I cannot recall the details. Perhaps a misguided optimization on my part?

@coatless

This comment has been minimized.

Contributor

coatless commented Feb 28, 2018

FYI this issue is similar to #268.

@eddelbuettel

This comment has been minimized.

Member

eddelbuettel commented Feb 28, 2018

Good catch. I still think the whole is somewhat secondary, but @kevinushey is of course influenced by what @wch and @jcheng5 have seen in shiny. But to me the whole issue is still on the edge of overblown: it took years of usage by an arguably large number of people ... to find a single unhappy shiny user who had somewhat non-standard code calling back to R.

As @jjallaire said in #268: Calling R functions from within C++ code is not the design center. Couldn't agree more. With the PR we will be making more RNG state change calls, all of which may tickle a garbage collection event as the discussion in #268 reminded us. I still think that overall this is going to be bening ... but we don't really know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment