Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Random.sample() results are biased #7

Closed
pbkhrv opened this issue Sep 23, 2014 · 3 comments
Closed

Random.sample() results are biased #7

pbkhrv opened this issue Sep 23, 2014 · 3 comments

Comments

@pbkhrv
Copy link

pbkhrv commented Sep 23, 2014

Thanks for making a nice library. I was looking for a seed-able PRNG and random-js fits the bill.
I was using Random.sample(population, 1) and noticed that it always returns the same result, namely the last item in the population array.
Here is test code:

var Random = require('./random.js');
var r = new Random(Random.engines.mt19937().autoSeed());
var hits = {};
var population = ['a', 'b', 'c', 'd', 'e'];
var sampleSize = 1;
for(var i = 0; i < 100000; i++) {
  var k = r.sample(population, sampleSize).join(',');
  hits[k]++ || (hits[k] = 1); 
}
console.log(hits);

When I run it with node, it prints out the following:
{ e: 100000 }

When I use sampleSize = 2, the results also seem to heavily favor one of the elements of the population array:
{ 'd,b': 20013,
'd,c': 19884,
'd,e': 20011,
'd,a': 20069,
'e,d': 20023 }

This smells like an off-by-one somewhere... Or am I missing something?

Thanks.

  • Peter
@ckknight
Copy link
Owner

Your bug definitely seems valid. I'll try to reproduce and fix soon.

@hkbenchan
Copy link

I also encounter this problem, below is how I solve it temporarily, but this will break the original shuffle if I directly using myShuffle function as a replacement of the old shuffle

var random = new Random(Random.engines.browserCrypto);

random.myShuffle = function (array, downTo) {
    var length = array.length;
    if (length) {
      if (downTo == null) {
        downTo = 0;
      }
      for (var i = (length - 1) >>> 0; i >= downTo; --i) {
        var distribution = Random.integer(0, i);
        var j = distribution(this.engine);
        if (i !== j) {
          var tmp = array[i];
          array[i] = array[j];
          array[j] = tmp;
        }
      }
    }
    return array;
};


random.mySample = function (population, sampleSize) {
    if (sampleSize < 0 || sampleSize > population.length || !isFinite(sampleSize)) {
      throw new RangeError("Expected sampleSize to be within 0 and the length of the population");
    }

    if (sampleSize === 0) {
      return [];
    }

    var clone = population.slice();
    var length = clone.length;
    if (length === sampleSize) {
      return this.myShuffle(clone, 0);
    }
    var tailLength = length - sampleSize;
    return this.myShuffle(clone, tailLength).slice(tailLength);
};

@ckknight
Copy link
Owner

ckknight commented Mar 4, 2015

Yep, logically, there was an off-by-one error where if you sampled by one, it would shuffle exactly 0 items rather than 1. This applied to sample and not shuffle because shuffle has no point in shuffling the very last item (since it can only shuffle with itself), whereas with sample, it needs to take the next item into account.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants