Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove the need for shuffle in UuidUtil.mathRNG #59

Closed
julemand101 opened this issue Feb 24, 2021 · 4 comments
Closed

Remove the need for shuffle in UuidUtil.mathRNG #59

julemand101 opened this issue Feb 24, 2021 · 4 comments

Comments

@julemand101
Copy link
Contributor

julemand101 commented Feb 24, 2021

The fix for #46 introduced the following in UuidUtil.mathRNG:

(seed == -1) ? b.shuffle() : b.shuffle(Random(seed));

This solution is not really optimal but was properly introduced because of missing knowledge about the underlying problem which is misuse of the Random class. This can be expected since the Dart SDK documentation for Random is not really that specific about how to use it.

The real problem in #46 is that we are creating a new Random() object every time we call UuidUtil.mathRNG. When we create a Random() object, The Dart VM are using an internal random generator to generate a seed for our own Random() instance. For Random(), this seed is 32 bit long. The implementation for this can be found here:
https://github.com/dart-lang/sdk/blob/fd81b73e67051a7ac06669a01ec32082ced0dc4f/sdk/lib/_internal/vm/lib/math_patch.dart#L183

32 bit of randomness is not really that great (two Random objects with same seed will produce the same number sequence) and will open op for problems if we create a lot of Random() objects. But the random generator behind Random() is well implemented without the worry of repeating patterns. So what we should do instead is just create a single instance for our UUID library and reuse this.

An example can be see here:

import 'dart:math';
import 'dart:typed_data';

void main(List<String> args) {
  print('brokenRandom: ${testForDuplicates(brokenRandom)}'); // Duplicates: 103
  print('betterRandom: ${testForDuplicates(betterRandom)}'); // Duplicates: 0
}

int testForDuplicates(Uint8List Function() getNumbers, [int count = 1000000]) =>
    count - {for (var i = 0; i < count; i++) getNumbers().join(',')}.length;

Uint8List brokenRandom() {
  final rand = Random();
  final b = Uint8List(16);

  for (var i = 0; i < 16; i++) {
    b[i] = rand.nextInt(256);
  }

  return b;
}

final _rand = Random();
Uint8List betterRandom() {
  final b = Uint8List(16);

  for (var i = 0; i < 16; i++) {
    b[i] = _rand.nextInt(256);
  }

  return b;
}

The reason why calling shuffle "fixes" the problem, is because shuffle are creating its own Random object every time:

  void shuffle([Random? random]) {
    random ??= Random();

https://github.com/dart-lang/sdk/blob/4a484c88e13ef0f4847daba3c25fa22dbe933a08/sdk/lib/collection/list.dart#L363

This is really not what big of a problem since we are not expecting shuffle to be based on great randomness (and if we do, we are properly using Random.secure(). But since we are creating another Random object, and uses that for shuffle, we do make it a lot more rare that we get two 32 bit seeds in a row which we have previous seen.

But the use of shuffle is really just needless use of CPU cycles for something which can be fixed by creating a static final random generator.

(My change does also add a static final for Random.secure(). The Dart VM is implemented to call the OS for random numbers for all numbers generated with Random.secure() so we don't have the same seed problem. But both for consistency, and because there can be some overhead, we should also just cache this random generator instance.)

@julemand101
Copy link
Contributor Author

I should add that the consequence of removing the shuffle also for when a seed has been given to UuidUtil.mathRNG is that the generated UUID for a given seed is no longer the same as before my suggested patch. If backward compatibility is important, we can add the shuffle again for this case.

@daegalus
Copy link
Owner

This is fantastic, and thank you for taking the time to dig this info up and relay the information. I don't believe I am concerned about backwards compatibility for 3.x since i did a lot of breaking changes already. This also primarily powers v4 numbers, which are primarily used for completely random UUIDs every time. Seeds are rarely used with v4 in production use-cases.

I am ok with the consequences. I will be honest, I have made changes that changed the UUID outputs for v4 in the past. I am currently focused on work stuff, but I will merge this and create a release in a few hours.

Again, thank you very much for the PR and figuring this out, this is good info to know.

@julemand101
Copy link
Contributor Author

Thanks for the fast response. 👍

And just take the time. This is not really that urgent. I just stumbled into this implementation and was kinda puzzled about the shuffle and the reasons behind it. :)

@daegalus
Copy link
Owner

daegalus commented Mar 1, 2021

Just released 3.0.1 with your fix. Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants