Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: Guid.NewNonCryptographicGuid #65736

Closed
deeprobin opened this issue Feb 22, 2022 · 14 comments
Closed

[API Proposal]: Guid.NewNonCryptographicGuid #65736

deeprobin opened this issue Feb 22, 2022 · 14 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime

Comments

@deeprobin
Copy link
Contributor

deeprobin commented Feb 22, 2022

Background and motivation

Our Guid implementation aims to be safe. But there are certainly use cases where this is mandatory.

For example, I don't need a cryptographically secure result for generating a random Guid for COM interfaces.
And if you want to generate several random guids at once, such an implementation is more performant.

The comment in the Unix implementation describes this quite well.

// Guid.NewGuid is often used as a cheap source of random data that are sometimes used for security purposes.

API Proposal

namespace System
{
    public partial struct Guid
    {
        public static Guid NewNonCryptographicGuid();
    }
}

API Usage

var guid = Guid.NewNonCryptographicGuid();

Alternative Designs

namespace System
{
    public partial class Random
    {
        public virtual Guid Next();

        // optional (to match the other overloads):
        public virtual Guid Next(Guid maxValue);
        public virtual Guid Next(Guid minValue, Guid maxValue);
    }
}

Risks

I don't currently see any, because the method name describes this sufficiently enough that this is not a cryptographic random function.

If necessary, we could still mention in the documentation comment that we should not use this method in a security-critical context.

@deeprobin deeprobin added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Feb 22, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Feb 22, 2022
@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@EgorBo
Copy link
Member

EgorBo commented Feb 22, 2022

#22721

@ghost
Copy link

ghost commented Feb 22, 2022

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

Our Guid implementation aims to be safe. But there are certainly use cases where this is mandatory.

For example, I don't need a cryptographically secure result for generating a random Guid for COM interfaces.
And if you want to generate several random guids at once, such an implementation is more performant.

The comment in the Unix implementation describes this quite well.

// Guid.NewGuid is often used as a cheap source of random data that are sometimes used for security purposes.

API Proposal

namespace System
{
    public partial struct Guid
    {
        public static Guid NewNonCryptographicGuid();
    }
}

API Usage

var guid = Guid.NewNonCryptographicGuid();

Alternative Designs

namespace System
{
    public partial class Random
    {
        public virtual Guid Next();

        // optional (to match the other overloads):
        public virtual Guid Next(Guid maxValue);
        public virtual Guid Next(Guid minValue, Guid maxValue);
    }
}

Risks

I don't currently see any, because the method name describes this sufficiently enough that this is not a cryptographic random function.

If necessary, we could still mention in the documentation comment that we should not use this method in a security-critical context.

Author: deeprobin
Assignees: -
Labels:

api-suggestion, area-System.Runtime, untriaged

Milestone: -

@EgorBo
Copy link
Member

EgorBo commented Feb 22, 2022

Fast non-cryptographic guid:
image
😄

@Clockwork-Muse
Copy link
Contributor

... I mean, Guid has constructors that take arrays of bytes, so you can just wrap calls to System.Random if you want something random but not secure.

@svick
Copy link
Contributor

svick commented Feb 23, 2022

@Clockwork-Muse Be aware that directly passing in an array of random bytes will not produce a valid version 4 GUID. You'll also have to set a few bits to the right values.

@svick
Copy link
Contributor

svick commented Feb 23, 2022

@deeprobin Can you explain why performance is important here? How often are you generating GUIDs such that the current implementation is not fast enough for you?

@deeprobin
Copy link
Contributor Author

deeprobin commented Feb 23, 2022

@deeprobin Can you explain why performance is important here? How often are you generating GUIDs such that the current implementation is not fast enough for you?

Personally, I usually prefer ascending numeral ids in databases, but there are people who use random GUIDs (in Java, a lot of people also use the similar UUID for database stuff).

For the ID in database in the rarest cases a cryptograpfically secure generated ID is really useful.
Often it is really only about the uniqueness (which is then validated by the database again).
And, if you imagine that in many database tables in larger companies several million records are inserted in one minute, implementing this, can significantly improve the performance.

For reference: Usages of Guid.NewGuid in GitHub Search https://github.com/search?l=C%23&q=%22Guid.NewGuid%28%29%22&type=Code - I think most don't need such a secure implementation.
For reference: Usages of UUID.randomUUID in java - https://github.com/search?l=Java&q=%22UUID.randomUUID%28%29%22&type=Code

@stephentoub
Copy link
Member

stephentoub commented Feb 23, 2022

Personally, I usually prefer ascending numeral ids in databases, but there are people who use random GUIDs (in Java, a lot of people also use the similar UUID for database stuff).

You're inserting into a database with a new guid, and the performance difference in the cryptographic vs non-cryptographic randomness employed in creating that guid shows up as a measurable cost? Can you share your benchmark? On my Windows machine, the difference between the crypto and non-crypto randomness when creating the guid is ~50ns. If that 50ns matters when doing a database insert, I'll be impressed. 😄

@Clockwork-Muse
Copy link
Contributor

... also, it's completely possible that your database can generate the guid for you (SQL Server can do so, at a minimum). Many also have some form of return-modified-rows, so you don't need any extra query to get inserted information.

@Tornhoof
Copy link
Contributor

Personally, I usually prefer ascending numeral ids in databases,

As for guids in databases, you need to be careful with index fragmentation, prefer the solutions by the db in question as @Clockwork-Muse suggests, than using GUID.
There is a lot of information about this problem, e.g. https://www.mssqltips.com/sqlservertip/6595/sql-server-guid-column-and-index-fragmentation/

@deeprobin
Copy link
Contributor Author

deeprobin commented Feb 23, 2022

Personally, I usually prefer ascending numeral ids in databases, but there are people who use random GUIDs (in Java, a lot of people also use the similar UUID for database stuff).

You're inserting into a database with a new guid, and the performance difference in the cryptographic vs non-cryptographic randomness employed in creating that guid shows up as a measurable cost? Can you share your benchmark? On my Windows machine, the difference between the crypto and non-crypto randomness when creating the guid is ~50ns. If that 50ns matters when doing a database insert, I'll be impressed. 😄

I benchmarked this 😄.

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Engines;

namespace ConsoleApp4;

[SimpleJob(RunStrategy.Throughput)]
public class GuidBenchmarks
{
    [Benchmark(Baseline = true)]
    public Guid NewGuid()
    {
        return Guid.NewGuid();
    }

    [Benchmark]
    public Guid NewNonCryptographicGuid()
    {
        return GuidExtensions.NewNonCryptographicGuid();
    }
}

public static class GuidExtensions
{
    // This will create a new random guid based on the https://www.ietf.org/rfc/rfc4122.txt
    [SkipLocalsInit]
    public static unsafe Guid NewNonCryptographicGuid()
    {
        Span<byte> guidBytes = stackalloc byte[sizeof(Guid)];
        Random.Shared.NextBytes(guidBytes);

        const ushort versionMask = 0xF000;
        const ushort randomGuidVersion = 0x4000;

        const byte clockSeqHiAndReservedMask = 0xC0;
        const byte clockSeqHiAndReservedValue = 0x80;
        
        // Modify bits indicating the type of the GUID
        unchecked
        {
            // time_hi_and_version
            ref var c = ref Unsafe.As<byte, short>(ref guidBytes[4]);
            c = (short)((c & ~versionMask) | randomGuidVersion);

            // clock_seq_hi_and_reserved
            var d = guidBytes[6];
            guidBytes[6] = (byte)((d & ~clockSeqHiAndReservedMask) | clockSeqHiAndReservedValue);
        }

        return new Guid(guidBytes);
    }
}

Results on Windows:

Method Mean Error StdDev Ratio
NewGuid 52.01 ns 0.819 ns 0.726 ns 1.00
NewNonCryptographicGuid 14.45 ns 0.315 ns 0.337 ns 0.28

A ratio of 0.28 therefore makes quite a difference. I could imagine that this will improve even further with further PGO actions.

Results on Ubuntu (libc) in WSL:

Method Mean Error StdDev Ratio
NewGuid 1,441.99 ns 28.140 ns 27.637 ns 1.00
NewNonCryptographicGuid 15.41 ns 0.287 ns 0.268 ns 0.01

It should be noted, of course, that there is already an issue for the performance problems with the Unix implementation (#13628). As I understand it, this is a performance problem with /dev/urandom - I don't know if there is a fix for this. If not, using a normal random as a non-safe alternative would make sense.

/cc @adamsitnik

@stephentoub
Copy link
Member

A ratio of 0.28 therefore makes quite a difference

Does it? What's the end-to-end macro scenario here, and how does the ~39ns difference between NewGuid and NewNonCryptographicGuid in your example come into play? That's what folks are asking for on this thread: what is the all-up scenario where this is needed, and what is the performance impact on that scenario? The current API is "safe", and you're effectively proposing one that trades off that safety for performance, so there needs to be a good reason for doing it, and it needs to be common enough that the alternative (which you already implemented in a few lines of code in your NewNonCryptographicGuid example, and which anyone who saw this as a real bottleneck could copy/paste) isn't sufficient for the presumably vast minority of folks that would benefit from it.

@deeprobin
Copy link
Contributor Author

A ratio of 0.28 therefore makes quite a difference

Does it? What's the end-to-end macro scenario here, and how does the ~39ns difference between NewGuid and NewNonCryptographicGuid in your example come into play? That's what folks are asking for on this thread: what is the all-up scenario where this is needed, and what is the performance impact on that scenario? The current API is "safe", and you're effectively proposing one that trades off that safety for performance, so there needs to be a good reason for doing it, and it needs to be common enough that the alternative (which you already implemented in a few lines of code in your NewNonCryptographicGuid example, and which anyone who saw this as a real bottleneck could copy/paste) isn't sufficient for the presumably vast minority of folks that would benefit from it.

Stephen, I honestly don't know of any real end-to-end scenario except, for example, in databases, ....
Guid.NewGuid I think is a method often called in the aggregate, but in the rarest of cases also in a very hot path.
There are examples in the GitHub search that use Guid.NewGuid in larger loops, but no one has complained about the performance.

I did some research and found that even Java 1 and Node 2 use at least a SafeRandom.


However, I dare say that in the long run it makes sense to use a C# implementation instead of a P/Invoke on Ole32. Thus, in the long run, one could also merge the code of Unix and Windows.
I just can't tell you to what extent we currently get the same (or better) performance with a .NET implementation.
But I think at the latest when we have enough PGO optimizations in the JIT, a .NET implementation would be the smartest choice.

Is there already an issue for this?


I'm closing the issue, if someone still has good arguments for this api feel free to comment them as long as the .NET bot doesn't lock the conversation.

Footnotes

  1. JDK Source of UUID.randomUUID()

  2. Node Source of crypto.randomUUID()

@ghost ghost locked as resolved and limited conversation to collaborators Mar 25, 2022
@tannergooding tannergooding removed the untriaged New issue has not been triaged by the area owner label Jun 24, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime
Projects
None yet
Development

No branches or pull requests

8 participants