Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure that generated GUIDs conform to the RFC 4122 standard. #969

Merged
merged 4 commits into from
Dec 9, 2019
Merged

Ensure that generated GUIDs conform to the RFC 4122 standard. #969

merged 4 commits into from
Dec 9, 2019

Conversation

sgielen
Copy link
Contributor

@sgielen sgielen commented Nov 26, 2019

RFC 4122 describes the format of a UUID (or as Microsoft calls them, GUID) as
xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx, where M is the UUID version and some bits
of N denote the UUID variant. The GUID generator in this repository sets these
nibbles to random values, making the generated UUIDs incompatible with the RFC.

This may cause problems later on, as there is a chance future UUID
implementations may misbehave on nonconformant UUIDs or not accept them at all.
For example, UUID libraries may assume that a UUID with version 0 is a nil uuid
and compare all such UUIDs to be equal to 00000000-0000-0000-0000-000000000000
and each other.

This commit fixes this by setting the UUID version and variant properly in
generated UUIDs. The same number of bytes is still used for the ticks, while
there is now one byte less randomness in each UUID. However, together with the
ticks in 100 ns precision, the chance of a collision is still negligible. The
version used is 4, for a UUID based on randomness; we hide the fact that most
of the UUID is not truly random but based on time. The variant is 0b10xx for
RFC 4122.

This commit also adds tests for the new properties as well as the old
properties presumed by the existing implementation, such as that generated
GUIDs are sorted on time and unique, non-empty GUIDs are generated.

RFC 4122 describes the format of a UUID (or as Microsoft calls them, GUID) as
xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx, where M is the UUID version and some bits
of N denote the UUID variant. The GUID generator in this repository sets these
nibbles to random values, making the generated UUIDs incompatible with the RFC.

This may cause problems later on, as there is a chance future UUID
implementations may misbehave on nonconformant UUIDs or not accept them at all.
For example, UUID libraries may assume that a UUID with version 0 is a nil uuid
and compare all such UUIDs to be equal to 00000000-0000-0000-0000-000000000000
and each other.

This commit fixes this by setting the UUID version and variant properly in
generated UUIDs. The same number of bytes is still used for the ticks, while
there is now one byte less randomness in each UUID. However, together with the
ticks in 100 ns precision, the chance of a collision is still negligible.  The
version used is 4, for a UUID based on randomness; we hide the fact that most
of the UUID is not truly random but based on time. The variant is 0b10xx for
RFC 4122.

This commit also adds tests for the new properties as well as the old
properties presumed by the existing implementation, such as that generated
GUIDs are sorted on time and unique, non-empty GUIDs are generated.
@mguinness
Copy link
Collaborator

mguinness commented Nov 26, 2019

Is possible to reuse Sequential GUIDs code in SequentialGuidValueGenerator from EF Core instead?

Further detail about what the algorithm does at https://stackoverflow.com/a/47682820/310601.

Also see algorithm to generate ordered GUIDs (COMBs) at GuidCombGenerator for NHibernate.

@mguinness
Copy link
Collaborator

@caleblloyd I saw that you did an earlier Sequential GUIDs PR, do you have any comments?

@lauxjpn lauxjpn added this to the 3.1.0 milestone Nov 26, 2019
@lauxjpn lauxjpn self-assigned this Nov 26, 2019
@lauxjpn
Copy link
Collaborator

lauxjpn commented Nov 26, 2019

Thank you for your PR!

Though I don't share your concerns about future incompatibilities of the current implementation, I think it is a good idea to advertise conformance with Variant 1, Version 4 of RFC 4122 and I don't see any downsides to your approach.

@lauxjpn
Copy link
Collaborator

lauxjpn commented Nov 26, 2019

@mguinness Interestingly, SQL Server 14 does not seem to implement its NEWSEQUENTIALID() function anymore like SequentialGuidValueGenerator, but actually more like we do.

Actually, it could be that SQL Server is just saving Guids in reverse order by default, and of course we don't. This would need more testing to say for sure.

This is an example of how a sequence of generated Guids look like in MSSQL:

5addfff2-8310-ea11-953c-000c29a016f1
5bddfff2-8310-ea11-953c-000c29a016f1
5cddfff2-8310-ea11-953c-000c29a016f1

On the other hand, just executing the following line on MSSQL will keep all bytes in order:

insert into dbo.UuidTable (UuidColumn) values ('3E1B727F-5C6B-4692-BB0D-4F195B883EBB');

3E1B727F-5C6B-4692-BB0D-4F195B883EBB

And according to the SequentialGuidValueGenerator, the UTC based counter is added to the last 8 bytes:

https://github.com/aspnet/EntityFrameworkCore/blob/93d5bfbba3019fa741876cb2becf2c3ccb36a4c0/src/EFCore/ValueGeneration/SequentialGuidValueGenerator.cs#L25-L45

public override Guid Next(EntityEntry entry)
{
    var guidBytes = Guid.NewGuid().ToByteArray();
    var counterBytes = BitConverter.GetBytes(Interlocked.Increment(ref _counter));


    if (!BitConverter.IsLittleEndian)
    {
        Array.Reverse(counterBytes);
    }


    guidBytes[08] = counterBytes[1];
    guidBytes[09] = counterBytes[0];
    guidBytes[10] = counterBytes[7];
    guidBytes[11] = counterBytes[6];
    guidBytes[12] = counterBytes[5];
    guidBytes[13] = counterBytes[4];
    guidBytes[14] = counterBytes[3];
    guidBytes[15] = counterBytes[2];


    return new Guid(guidBytes);
}

@caleblloyd's optimized implementation for both, binary based and character base columns, is still preserved with this PR. It's just being fixed to conform to the standard.

@ajcvickers
Copy link

@smitpatel @bricelam @AndriySvyryd @roji @maumar Interesting discussion here on GUIDs. Most specifically, that SQL Server may have changed sequential GUID generation since we last looked at it. Should we react in EF?

@caleblloyd
Copy link
Contributor

caleblloyd commented Nov 26, 2019

It makes sense to conform to the RFC. Thanks for adding the tests. I think they need to be moved out of the Functional tests and into the Unit tests at test/EFCore.MySql.Tests. The unit test project already includes Moq.

@roji
Copy link

roji commented Nov 26, 2019

@ajcvickers yeah, we should definitely consider changing on our side...

@ajcvickers
Copy link

@roji Can you file an issue outlining what you think we should do?

@mguinness
Copy link
Collaborator

mguinness commented Nov 26, 2019

Interestingly, SQL Server 14 does not seem to implement its NEWSEQUENTIALID() function anymore like SequentialGuidValueGenerator, but actually more like we do.

At a lower level NEWSEQUENTIALID() function uses Windows API UuidCreateSequential that uses MAC address (see SQL Server on Linux: CU4 – NewSequentialId() – Uuid), which is different to NewGuid() used by CoreFX in Unix. It's not relevant to this discussion, but NEWSEQUENTIALID() is not RFC 4122 compliant.

@lauxjpn
Copy link
Collaborator

lauxjpn commented Nov 27, 2019

@mguinness Good links! They also lead to UNRAVELING THE MYSTERIES OF NEWSEQUENTIALID with a practical analysis and examples of the differences between the individual Guid generation methods, including NEWSEQUENTIALID().

According to the link, something similar to the following pseudo code is being done:

newGuid = UuidCreateSequential();
bytes = newGuid.ToByteArray()
Array.Reverse(bytes, 0, 4)
Array.Reverse(bytes, 4, 2)
Array.Reverse(bytes, 6, 2)

The proposed solution by @sgielen (based on the previous work by @caleblloyd) is similar to NEWSEQUENTIALID() in its effect to generate 128 bit values with improved sorting/indexing capabilities, has a higher level of privacy due to not making use of MAC addresses and returns a RFC conforming Guid.

- Make TreatTinyAsBoolean virtual as well.
- Move tests to EFCore.MySql.Tests.
@sgielen
Copy link
Contributor Author

sgielen commented Nov 27, 2019

Thanks for your feedback, all! I've addressed relevant proposed changes in the last commit.

@roji
Copy link

roji commented Dec 2, 2019

Issue on the EF Core repo to investigate behavior on the SQL Server side: dotnet/efcore#19124

Copy link
Collaborator

@lauxjpn lauxjpn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation is great and checks out!
Only minor naming issues that need to be addressed.
We will merge right away after they are resolved.

@sgielen
Copy link
Contributor Author

sgielen commented Dec 9, 2019

Thank you @lauxjpn for your feedback, I've addressed the naming issues you mentioned.

@lauxjpn lauxjpn merged commit b1b9cb3 into PomeloFoundation:master Dec 9, 2019
@lauxjpn
Copy link
Collaborator

lauxjpn commented Dec 9, 2019

@sgielen Thank you for your contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants