Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for fast rademacher random variables #49

Closed
kylebutts opened this issue Jul 1, 2023 · 1 comment
Closed

Support for fast rademacher random variables #49

kylebutts opened this issue Jul 1, 2023 · 1 comment

Comments

@kylebutts
Copy link
Contributor

Hi,

In statistics, a lot of methods require drawing Rademacher weights, -1 and 1, with probably 1/2 each (e.g. bootstrapping and approximations). I authored this small package https://github.com/kylebutts/rademacher which draws integers between 0 and 2^31-1 and then takes each bit of the integer (which is 0/1 with equal probability) and uses that as a 0/1 variable. It's about 6x as fast as dqrng::sample(c(-1, 1)).

I wanted to submit a pull request to add this feature using dqrng to try and maximize speed-up. I have an initial attempt here: https://github.com/kylebutts/dqrng where I use the uint64_t that gets generated by the dqrng::rng64_t and uses each of those bits.

However, I can't seem to stop segfaults from happening. I am a little out of my depth on RNG in C++ it seems, so I was wondering if you would be willing to look over the code and see if you can figure out why it's segfaulting?

Thank you for the package regardless!


Source code for generating rademacher:

#include <Rcpp.h>
using namespace Rcpp;

//' Sample Rademacher distribution really fast
//' 
//' @description This uses a fancy trick to draw Rademacher weights very 
//'   quickly. To do so, the function draws from 1:(2^31 - 1), and then 
//'   uses each bit of the integer to determine 31 values of 0/1. This
//'   allows for 31 Rademacher random variables to be drawn per random draw.
//'   Taking those bits * 2 - 1 gives the Rademacher random variables.
//'
//' @param n Integer, number of random variables to draw
//' 
//' @return integer vector of length n with values -1 or 1
//' 
//' @export
// [[Rcpp::export]]
IntegerVector sample_rademacher(int n) {
  if (n < 0) {
    stop("n must be a positive integer");
  }
  if (n > 2147483647) {
    stop("n must be less than 2147483647");
  }

  IntegerVector result(n);

  // Calculate the number of integers needed based on N
  int num_integers = ceil(n / 31.0);

  // 2^31 - 1 = 2147483647
  IntegerVector random_integer = Rcpp::sample(2147483647, num_integers, true);
  
  int k = 0;
  int J = 30;
  for (int i = 0; i < num_integers; i++) {
    int curr = random_integer[i];

    // Make sure not to overfill the result vector
    if (i == num_integers - 1) {
      J = (n % 31) - 1;
    } 
    
    for (int j = J; j >= 0; j--) {
      
      result[k] = ((curr >> j) & 1) * 2 - 1;
      k = k + 1;
    }
  }

  return result;
}
@rstub
Copy link
Member

rstub commented Jul 1, 2023

Interesting. Does the segfault happen always or only sometimes? Can you open a draft PR? That would make it easier for me to get the code and try it out locally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants