Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

uuid v1 are not unique at all #43

Closed
JCKodel opened this issue Apr 5, 2020 · 8 comments
Closed

uuid v1 are not unique at all #43

JCKodel opened this issue Apr 5, 2020 · 8 comments

Comments

@JCKodel
Copy link

JCKodel commented Apr 5, 2020

I have an application with 751,446 users that generated 863,497 database records and there is a lot of conflicts.

Unique Ids Two clashes Three clashes Four or more clashes
862,117 663 18 0

Code used to generate the id:

Uuid().v1()

Those 18 clashes generated these records:

Click here to expand
UUID Device Date
80efe870-de35-11e9-8d55-214e2769a4f0 514503fd4f5d967a 2019-09-23 19:08:18.0000000 +00:00
80efe870-de35-11e9-8d55-214e2769a4f0 514503fd4f5d967a 2019-09-23 19:08:18.0000000 +00:00
80efe870-de35-11e9-8d55-214e2769a4f0 5f469abde7842bb6 2019-09-23 19:08:18.0000000 +00:00
83873a40-e158-11e9-bba5-37afc07df59f e766537ad7fe8714 2019-09-27 18:56:29.0000000 +00:00
83873a40-e158-11e9-bba5-37afc07df59f e766537ad7fe8714 2019-09-27 18:56:29.0000000 +00:00
83873a40-e158-11e9-bba5-37afc07df59f e766537ad7fe8714 2019-09-27 18:56:29.0000000 +00:00
887da9d0-d7da-11e9-9274-3bccee8f26b2 1dc22bdfceceede2 2019-09-15 17:02:00.0000000 +00:00
887da9d0-d7da-11e9-9274-3bccee8f26b2 1dc22bdfceceede2 2019-09-15 17:02:00.0000000 +00:00
887da9d0-d7da-11e9-9274-3bccee8f26b2 1dc22bdfceceede2 2019-09-15 17:02:00.0000000 +00:00
82a65b00-d632-11e9-aa3f-3dc1d9683a0f ae1a928095f336b8 2019-09-13 14:26:44.0000000 +00:00
82a65b00-d632-11e9-aa3f-3dc1d9683a0f ae1a928095f336b8 2019-09-13 14:26:44.0000000 +00:00
82a65b00-d632-11e9-aa3f-3dc1d9683a0f ae1a928095f336b8 2019-09-13 14:26:44.0000000 +00:00
e3251440-bbba-11e9-a29d-49adc69e972e a69fc2c20bc05b04 2019-08-10 22:04:56.0000000 +00:00
e3251440-bbba-11e9-a29d-49adc69e972e a69fc2c20bc05b04 2019-08-10 22:04:56.0000000 +00:00
e3251440-bbba-11e9-a29d-49adc69e972e a69fc2c20bc05b04 2019-08-10 22:04:56.0000000 +00:00
ebae1bc0-e455-11e9-b7c5-75e78d1d35ac 3c86565a56848517 2019-10-01 14:15:28.0000000 +00:00
ebae1bc0-e455-11e9-b7c5-75e78d1d35ac 3c86565a56848517 2019-10-01 14:15:28.0000000 +00:00
ebae1bc0-e455-11e9-b7c5-75e78d1d35ac 68cfda7341c3b7df 2019-10-01 14:15:28.0000000 +00:00
8984c110-e9d7-11e9-99e9-77424a6c0053 ae5b97827890695c 2019-10-08 14:25:54.0000000 +00:00
8984c110-e9d7-11e9-99e9-77424a6c0053 ae5b97827890695c 2019-10-08 14:25:54.0000000 +00:00
8984c110-e9d7-11e9-99e9-77424a6c0053 ae5b97827890695c 2019-10-08 14:25:54.0000000 +00:00
49d78790-cad4-11e9-93b5-7d3ec148d0be 983d9e4f85c287f5 2019-08-30 03:14:33.0000000 +00:00
49d78790-cad4-11e9-93b5-7d3ec148d0be 983d9e4f85c287f5 2019-08-30 03:14:33.0000000 +00:00
49d78790-cad4-11e9-93b5-7d3ec148d0be 983d9e4f85c287f5 2019-08-30 03:14:33.0000000 +00:00
d180f8b0-eb8c-11e9-a00d-8136a2b770d2 b478c085fa129a3d 2019-10-10 18:36:05.0000000 +00:00
d180f8b0-eb8c-11e9-a00d-8136a2b770d2 b780c1f5afaf51f5 2019-10-10 18:36:05.0000000 +00:00
d180f8b0-eb8c-11e9-a00d-8136a2b770d2 b780c1f5afaf51f5 2019-10-10 18:36:05.0000000 +00:00
5a2cec00-eae3-11e9-ae5f-9375d4cfb5a6 6ba68801bd95431b 2019-10-09 22:23:00.0000000 +00:00
5a2cec00-eae3-11e9-ae5f-9375d4cfb5a6 6ba68801bd95431b 2019-10-09 22:23:00.0000000 +00:00
5a2cec00-eae3-11e9-ae5f-9375d4cfb5a6 6ba68801bd95431b 2019-10-09 22:23:00.0000000 +00:00
3f4b3ef0-e485-11e9-a8d9-93e35cf7a47d d95b00fa73ba5397 2019-10-01 19:54:15.0000000 +00:00
3f4b3ef0-e485-11e9-a8d9-93e35cf7a47d d95b00fa73ba5397 2019-10-01 19:54:15.0000000 +00:00
3f4b3ef0-e485-11e9-a8d9-93e35cf7a47d d95b00fa73ba5397 2019-10-01 19:54:15.0000000 +00:00
c2003370-da51-11e9-a5bd-b14162c6c18f 88108a74d1b62728 2019-09-18 20:20:29.0000000 +00:00
c2003370-da51-11e9-a5bd-b14162c6c18f 88108a74d1b62728 2019-09-18 20:20:29.0000000 +00:00
c2003370-da51-11e9-a5bd-b14162c6c18f bf58c07c7baf3b91 2019-09-18 20:20:29.0000000 +00:00
ccb96740-d35e-11e9-8f56-b589858ca938 21562e55368a0f6c 2019-09-10 00:06:12.0000000 +00:00
ccb96740-d35e-11e9-8f56-b589858ca938 21562e55368a0f6c 2019-09-10 00:06:12.0000000 +00:00
ccb96740-d35e-11e9-8f56-b589858ca938 21562e55368a0f6c 2019-09-10 00:06:12.0000000 +00:00
a6356880-dcff-11e9-998c-c9c82aad3773 a93e033c43ca5b96 2019-09-22 06:10:17.0000000 +00:00
a6356880-dcff-11e9-998c-c9c82aad3773 a93e033c43ca5b96 2019-09-22 06:10:17.0000000 +00:00
a6356880-dcff-11e9-998c-c9c82aad3773 a93e033c43ca5b96 2019-09-22 06:10:17.0000000 +00:00
c08960d0-c1b9-11e9-8d8f-db20ba6cee0f 1e8a896f31966fcf 2019-08-18 13:11:55.0000000 +00:00
c08960d0-c1b9-11e9-8d8f-db20ba6cee0f 1e8a896f31966fcf 2019-08-18 13:11:55.0000000 +00:00
c08960d0-c1b9-11e9-8d8f-db20ba6cee0f 1e8a896f31966fcf 2019-08-18 13:11:55.0000000 +00:00
8617dd80-e6b9-11e9-b53e-e122453c938e 31f51293fcdcfb64 2019-10-04 15:13:30.0000000 +00:00
8617dd80-e6b9-11e9-b53e-e122453c938e 31f51293fcdcfb64 2019-10-04 15:13:30.0000000 +00:00
8617dd80-e6b9-11e9-b53e-e122453c938e 31f51293fcdcfb64 2019-10-04 15:13:30.0000000 +00:00
39e97480-ebc0-11e9-9619-ef0a2d518b9a b29e47e4859ee389 2019-10-11 00:44:04.0000000 +00:00
39e97480-ebc0-11e9-9619-ef0a2d518b9a b29e47e4859ee389 2019-10-11 00:44:04.0000000 +00:00
39e97480-ebc0-11e9-9619-ef0a2d518b9a b29e47e4859ee389 2019-10-11 00:44:04.0000000 +00:00
b5aad7c0-aa0d-11e9-8c14-fd0b7fe3a1df 8d301975d6c4f807 2019-07-19 10:12:27.0000000 +00:00
b5aad7c0-aa0d-11e9-8c14-fd0b7fe3a1df 8d301975d6c4f807 2019-07-19 10:12:27.0000000 +00:00
b5aad7c0-aa0d-11e9-8c14-fd0b7fe3a1df 8d301975d6c4f807 2019-07-19 10:12:27.0000000 +00:00

I think the problem is the granularity of the date (notice how those records were generated at the same hour, minute and second (but I really doubt about milliseconds)).

I copied the Guid Comb generator from the C# nHibernate ORM (available here: https://github.com/nhibernate/nhibernate-core/blob/master/src/NHibernate/Id/GuidCombGenerator.cs) with some minor adjustments as not generating the same Id in the exactly same millisecond (while generating in a loop) and taking the device id into account (didn't made any tests to check if it really works with 0 clashes, though):

Click here to expand
  final _onlyHexPattern = RegExp("[^0-9a-f]", caseSensitive: false);
  List<int> _deviceIdGuidArray;
  final _baseDate = DateTime.utc(1900, 1, 1);
  int _rand = 0;

  Future<String> generateNewId() async {
    if (_deviceIdGuidArray == null) {
      _deviceIdGuidArray = List<int>(16);

      var deviceId = (await Device.instance).deviceId;

      deviceId = deviceId.replaceAll(_onlyHexPattern, "");
      deviceId = deviceId + deviceId + deviceId + deviceId;

      var idx = 0;

      for (var i = 0; i < 32; i += 2) {
        _deviceIdGuidArray[idx++] = int.parse(deviceId.substring(i, i + 2), radix: 16);
      }
    }

    final now = DateTime.now().toUtc();
    final guidArray = List<int>.from(_deviceIdGuidArray);
    final days = now.difference(_baseDate);
    final msecs = Duration(hours: now.hour, minutes: now.minute, seconds: now.second);
    final daysList = Uint32List.fromList([days.inDays]);
    final msecsList = Uint32List.fromList([msecs.inMilliseconds]);
    final daysArray = Uint8List.view(daysList.buffer).reversed.toList();
    final msecsArray = Uint8List.view(msecsList.buffer).reversed.toList();

    for (var i = 0; i < 2; i++) {
      guidArray[guidArray.length - 7 + i] = daysArray[daysArray.length - 2 + i];
    }

    for (var i = 0; i < 4; i++) {
      guidArray[guidArray.length - 5 + i] = msecsArray[msecsArray.length - 4 + i];
    }

    guidArray[guidArray.length - 1] = _rand % 256;
    guidArray[guidArray.length - 8] = (_rand >> 8) % 256;
    _rand = (_rand + 1) % 65536;

    final a = guidArray.take(4);
    final b = guidArray.skip(4).take(2);
    final c = guidArray.skip(6).take(2);
    final d = guidArray.skip(8).take(2);
    final e = guidArray.skip(10).take(6);
    final f = List<String>();

    for (final value in a) {
      f.add(value.toRadixString(16).padLeft(2, "0"));
    }

    f.add("-");

    for (final value in b) {
      f.add(value.toRadixString(16).padLeft(2, "0"));
    }

    f.add("-");

    for (final value in c) {
      f.add(value.toRadixString(16).padLeft(2, "0"));
    }

    f.add("-");

    for (final value in d) {
      f.add(value.toRadixString(16).padLeft(2, "0"));
    }

    f.add("-");

    for (final value in e) {
      f.add(value.toRadixString(16).padLeft(2, "0"));
    }

    return f.join("");
  }
@daegalus
Copy link
Owner

daegalus commented Apr 5, 2020

Hmm, that's serious. I will investigate. based on your findings, it seems its not taking into account milliseconds or 100 nanosecond increments for things happening in the same millisecond. Theoretically this should support 10 million unique hashes a second with no collision.

Thanks for bringing it to my attention. I will definitely work on fixing this today if I can, or at least figure out how this is happening.

I think this is a common issues for v1 its why v4 is commonly the preferred UUID version. Also C#/Microsoft GUIDs are actually a little different. I forget which came first, but the RFC4122 UUID and the MS GUIDs aren't directly compatible even if very similar.

@daegalus
Copy link
Owner

daegalus commented Apr 5, 2020

So, I can't seem to reproduce it. I created the following test:

test('Generate lots of codes to see if we get v1 collisions.', () {
      var uuids = Set();
      var collisions = 0;
      for (var i = 0; i < 10000000; i++) {
        var code = uuid.v1();
        if (uuids.contains(code)) {
          collisions++;
          print("Collision of code: $code");
        } else {
          uuids.add(code);
        }
      }

      expect(collisions, equals(0));
      expect(uuids.length, equals(10000000));
    });

Generated 10,000,000 codes and no collisions. Took only a few seconds for it to run through fully and passed.

I then noticed you use Uuid().v1(), creating a new uuid object everytime instead of re-using the same object. So I modified the code, and ran it again. Took about 5 mins for 10 million codes, and 30 seconds for 1 million codes. But still no collisions.

Can I get more info on where this is running? Flutter, Browser, Server-side? Might be something mobile or browser related if so. I will create more tests to figure out what might be happening there if so.

@JCKodel
Copy link
Author

JCKodel commented Apr 5, 2020

Nothing special. An app with Flutter 1.10.17 for Android and 1.9.xxx for iOS.

My pubspec.yaml:

uuid: 2.0.2

I always use Uuid().v1() when creating new objects in the database. These records lasts for at least one month (they are not created every day).

I just noticed that because my actual database (Azure Tables) requires a pair primary key (so I'm using this Id and the UserId). No problems (even so, it will replace the data on collision without excceptions). When I tried to copy the database to a MSSQL server, I've noticed the conflicting Ids for different users (forcing me to use a composite key).

I can't tell why =\ It just happened. I have 699 collisions.

Look at this case:

id userId updated deviceId model
80efe870-de35-11e9-8d55-214e2769a4f0 Bwi5XvXl7zQNFhxR1Tbdi3ztrwa2 2019-09-23 19:08:18.0000000 +00:00 514503fd4f5d967a Samsung SM-6532M (grandpplteub)
80efe870-de35-11e9-8d55-214e2769a4f0 UwDLSWCwn8WiuIoNQY06eeObtm63 2019-09-23 19:08:18.0000000 +00:00 5f469abde7842bb6 Samsung SM-J260M (j2corelteub)

They are two different persons in two different devices.

Let's suppose those 2 users in 2 different devices did something at exactly the same millisecond. It should not generate the same id, because they are in different machines. UUID v1 requires the mac address to prevent that. You can use the deviceId for this (it is a string in Android and a Guid in iOS - it represents that app on that device), otherwise, you will get conflicts when there is a lot of concurrent users (I have 2000 concurrent users throught out the day, and about 32000 in some specific times (mostly 20:00, 20:30)).

@JCKodel
Copy link
Author

JCKodel commented Apr 5, 2020

BTW, this is what I'm using in my new app's version. It takes in consideration the device id and the current authenticated user (the chances of conflict are quite reduced here). If you take out the userId part and do a better job turning the android deviceId in a hexa number, I think it will be random enough to replace the macaddress required in v1):

static final _onlyHexPattern = RegExp("[^0-9a-f]", caseSensitive: false);
static final _baseDate = DateTime.utc(2020, 1, 1);
static int _seq = 0;
static String _prefix;

static Future<String> generateNewId() async {
  if (_prefix == null) {
    final deviceId = (await Device.instance).deviceId.replaceAll(_onlyHexPattern, "").toLowerCase().padLeft(10, "0");
    final currentUser = await FirebaseAuth.instance.currentUser();
    final userId = currentUser == null ? "" : currentUser.uid.replaceAll(_onlyHexPattern, "").toLowerCase().padLeft(8, "0").substring(0, 8);

    _prefix = userId + "-" + deviceId.substring(0, 4) + "-" + deviceId.substring(4, 8) + "-" + deviceId.substring(8, 10);
  }

  final now = DateTime.now().toUtc();
  final timestamp = now.difference(_baseDate).inMilliseconds.toRadixString(16).padLeft(12, "0");
  final seq = _seq.toRadixString(16).padLeft(2, "0");

  _seq = (_seq + 1) % 256;

  return _prefix + seq + "-" + timestamp.substring(0, 12);
}

@daegalus
Copy link
Owner

daegalus commented Apr 5, 2020

Its a shame the timestamp doesnt go lower than seconds. I can't recreate the codes without the milliseconds and 100 nanosecond increments.

But you are correct that even the way you are doing it, should not cause collisions, especially at so low numbers. I will setup a Flutter project and do some testing.

As for MAC address. SInce I need to support so many platforms, I used the alternative method of generating an equivalent number where I just generate a 48 bit seed using random number generator.

_seedBytes = (options['v1rng'] != null)
        ? Function.apply(options['v1rng'], v1PositionalArgs, v1NamedArgs)
        : UuidUtil.mathRNG();

// Per 4.5, create a 48-bit node id (47 random bits + multicast bit = 1)
    _nodeId = [_seedBytes[0] | 0x01, _seedBytes[1], _seedBytes[2], _seedBytes[3], _seedBytes[4], _seedBytes[5]];

    // Per 4.2.2, randomize (14 bit) clockseq
    _clockSeq = (_seedBytes[6] << 8 | _seedBytes[7]) & 0x3ffff;

I never actually use the Mac Address. I just use random numbers. I do it in the constructor, so the seed is consistent if you dont recreate it. But since you recreate the object, it actually is MORE random. So I am not even sure how that is happening. So there is less reason for there to be a collision.

Sorry that this is happening. I will do my best to figure this out. I will keep digging to see if I can figure out why this is happening.

@daegalus
Copy link
Owner

daegalus commented May 17, 2020

Ok, I have been trying on and off for a few weeks, on web, flutter/android, local. I cant hit any collisions.

If you want me to keep looking into it, can you create minimal project that exhibits this problem and post it somewhere or send it over to me. Maybe I am just setting something up differently.

@JCKodel
Copy link
Author

JCKodel commented May 17, 2020

Ok, I have been trying on and off for a few weeks, on web, flutter/android, local. I cant hit any collisions.

If you want me to keep looking into it, can you create minimal project that exhibits this problem and post it somewhere or send it over to me. Maybe I am just setting something up differently.

Is not about a project, is about the amout of users. I have more than 2 million downloads, no less than 3000 users using the app at the same time. Eventually, some records on my database were duplicated =\

I'm not using this lib anymore because of those collisions (I'm using a custom made generator based on user's Firebase Id, current date time and a incremental to avoid local collisions). But, it is in development. Can't say I won't incur in the same collisions once is release for millions of users =\

@daegalus
Copy link
Owner

Ok, I am sorry that I haven't been able to help. I'll close this ticket for now, but if you ever want to try again or something, feel free to reopen or start a new issue. I'll keep this in mind going forward and see if I ever stumble onto something that will exhibit this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants