Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: RandomDataGenerator #73864

Closed
vcsjones opened this issue Aug 12, 2022 · 6 comments · Fixed by #78598
Closed

[API Proposal]: RandomDataGenerator #73864

vcsjones opened this issue Aug 12, 2022 · 6 comments · Fixed by #78598
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Security blog-candidate Completed PRs that are candidate topics for blog post coverage in-pr There is an active PR which will close this issue when it is merged
Milestone

Comments

@vcsjones
Copy link
Member

vcsjones commented Aug 12, 2022

Background and motivation

We have primitives of random number generation on RandomNumberGenerator. I would propose expanding some of the ways we have help developers create sufficiently random data in other forms, including strings.

While RandomNumberGenerator has gotten more ergonomic to use to efficiently create random data, developers may choose to user other APIs that appear to create random data because of the simpler API surface, like Guid.NewGuid(). Personally I discourage the use of Guid.NewGuid() for cryptographic randomness purposes. However, the alternative goes from a simple expression to "do it yourself".

API Proposal

I propose introducing a new static class, RandomDataGenerator. It provides static methods to create random sequences of data, either by returning an array or filling a buffer.

namespace System.Security.Cryptography {
    public static partial class RandomDataGenerator {
        // Proposed:
        public static void Generate<T>(Span<T> destination, ReadOnlySpan<T> choices);
        public static T[] Generate<T>(int length, ReadOnlySpan<T> choices);
        public static string Generate(int stringLength, ReadOnlySpan<char> choices);

        // Optional helpers:
        // Accelerator where 'choices' is 0-9A-F.
        public static void GenerateHexadecimal(Span<char> destination);
        public static string GenerateHexadecimal(int stringLength);
    }
}

Out of scope:

  • The intention of this API is random selection, not shuffling like a Fisher-Yates shuffle, or generate strings that meet certain requirements (must have 1 upper and 1 lower, for example).
  • No abstractions or instance methods like RandomNumberGenerator.

API Usage

string random128Bit = RandomDataGenerator.CreateHexadecimal(32);

bool[] coinFlips = RandomDataGenerator.Create(16, new bool[] { true, false }); 

string recoveryCode = RandomDataGenerator.Create(16, "0123456789!@#^&*ABCyougettheideaXYZ");

Alternative Designs

I do have some thoughts about the proposal:

  1. What if T is a mutable struct or reference type? The choice could be modified after the random data has been generated, thus going through the generated data. Is that simply "caller beware"?

  2. Many people think of randomness in terms of "bits". Consider's ruby's SecureRandom:

    require 'securerandom'
    #👋 from day job
    r = SecureRandom.hex(16)

    r's length is going to be 32 because the parameter is not the length of the return value, it's the number of bytes.

    The API above proposed returns the length of the sequence because it's generic: it's intended to work on any character set or generic items.

    That leaves me thinking there are two different, but related needs here. Random bytes in different formats (hex, base64) and creating random things.

Risks

No response

@vcsjones vcsjones added api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Security labels Aug 12, 2022
@ghost
Copy link

ghost commented Aug 12, 2022

Tagging subscribers to this area: @dotnet/area-system-security, @vcsjones
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

We have primitives of random number generation on RandomNumberGenerator. I would propose expanding some of the ways we have help developers create sufficiently random data in other forms, including strings.

While RandomNumberGenerator has gotten more ergonomic to use to efficiently create random data, developers may choose to user other APIs that appear to create random data because of the simpler API surface, like Guid.NewGuid(). Personally I discourage the use of Guid.NewGuid() for cryptographic randomness purposes. However, the alternative goes from a simple expression to "do it yourself".

API Proposal

I propose introducing a new static class, RandomDataGenerator. It provides static methods to create random sequences of data, either by returning an array or filling a buffer.

namespace System.Security.Cryptography {
    public static partial class RandomDataGenerator {
        // Proposed:
        public static void Fill<T>(Span<T> destination, ReadOnlySpan<T> choices);
        public static T[] Create<T>(int length, ReadOnlySpan<T> choices);
        public static string Create(int length, ReadOnlySpan<char> choices);

        // Optional helpers:
        // Accelerator where 'choices' is 0-9A-F.
        public static void FillHexadecimal(Span<char> destination);
        public static string CreateHexadecimal(int length);
        // Accelerator where 'choices' is 0-9A-z.
        public static void FillAlphaNumeric(Span<char> destination);
        public static string CreateAlphaNumeric(int length);
    }
}

Out of scope:

  • The intention of this API is not to do "not random" things, like a Fisher-Yates shuffle, or generate strings that meet certain requirements (must have 1 upper and 1 lower, for example).
  • No abstractions or instance methods like RandomNumberGenerator.

API Usage

string random128Bit = RandomDataGenerator.CreateHexadecimal(32);

bool[] coinFlips = RandomDataGenerator.Create(16, new bool[] { true, false }); 

string recoveryCode = RandomDataGenerator.Create(16, "0123456789!@#^&*ABCyougettheideaXYZ");

Alternative Designs

I do have some thoughts about the proposal:

  1. What if T is a mutable struct or reference type? The choice could be modified after the random data has been generated, thus going through the generated data. Is that simply "caller beware"?

  2. Many people think of randomness in terms of "bits". Consider's ruby's SecureRandom:

    require 'securerandom'
    r = SecureRandom.hex(16)

    r's length is going to be 32 because the parameter is not the length of the return value, it's the number of bytes.

    The API above proposed returns the length of the sequence because it's generic: it's intended to work on any character set or generic items.

    That leaves me thinking there are two different, but related needs here. Random bytes in different formats (hex, base64) and creating random things.

Risks

No response

Author: vcsjones
Assignees: -
Labels:

api-suggestion, area-System.Security

Milestone: -

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Aug 12, 2022
@bartonjs bartonjs added this to the Future milestone Aug 12, 2022
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Aug 12, 2022
@bartonjs
Copy link
Member

bartonjs commented Aug 12, 2022

public static T[] Create<T>(int length, ReadOnlySpan<T> choices);

A couple things about the name.

  • Since one of the parameters is choices it feels like the verb might be Choose, though it's doing it multiple times, so now it feels like ChooseMany, and that's maybe getting weird so maybe that's not the right rabbit hole.
  • Create is usually the verb we use on "call this instead of a ctor", but in this case we're not returning our containing type. Generate, maybe, since we're a generator?
public static string CreateHexadecimal(int length);
string random128Bit = RandomDataGenerator.CreateHexadecimal(32);

Assuming your math is intentional, CreateHexadecimal is functionally equivalent to Create(32, "0123456789ABCDEF"), which makes an amount of sense. It especially makes sense when compared to FillHexadecimal, since that could have an odd length of input. At the same time, there are people (like you and me) who will wonder if length was in bytes (like Ruby's n parameter) or in hex-chars (like you are suggesting). Maybe all that's needed here is changing int length to int stringLength (and maybe Create => Generate for different reasons).

(I'm not sure that implementing CreateHexadecimal in that way is the best answer, but it could be perf-compared to the cost of hexifying the raw RNG output to see which was better... if we cared to)

Random bytes in different formats (hex, base64)

Base64 seems hard, since you'd have to talk about the amount of bytes to generate and then write (len + 2) / 3 * 4 chars of output, as 1, 2, and 3 bytes all write 4 chars. If we want to do that then we'd need the hex to behave like Ruby's, and then decide what Fill does about odd lengths.

"not random" things, like a Fisher-Yates shuffle

"Shuffle", unless you're an SIMD instruction, is generally random. F-Y selection like

private static void Shuffle<T>(Span<T> source)
{
    T temp;

    for (int i = source.Length - 1; i > 0; i--)
    {
        int to = RandomNumberGenerator.GetInt32(i + 1);

        if (to != i)
        {
           temp = source[to];
           source[to] = source[i];
           source[i] = temp;
        }
    }
}

is random, and CSPRNG random. (It's equivalent to calling the proposed Create with integer indices from 0 .. Length - 1 and using those to reorder things).

But, I don't really feel it's called for. It's way less common than things like "I need a random 12 character password from this alphabet" (string temp = RandomDataGenerator.Create(12, s_alphabet);)

@vcsjones
Copy link
Member Author

Generate, maybe, since we're a generator?

Seems reasonable.

Assuming your math is intentional

Sometimes I do things intentionally. But yes, in this case I was going for characters, not bytes, of input.

Maybe all that's needed here is changing int length to int stringLength (and maybe Create => Generate for different reasons).

That also seems reasonable.

I'm not sure that implementing CreateHexadecimal in that way is the best answer, but it could be perf-compared to the cost of hexifying the raw RNG output to see which was better... if we cared to)

That is one reason I had for the hex helper. It's something we could have a better implementation for instead of doing it the naive way. For alpha numeric that requires 6 bits per-character, so if we really wanted to we could build a better encoder for that. It's a shame that alpha numeric is 62 instead of 63. If only we had half a letter more in the alphabet.

Expand Me
// Entirely untested and thrown together. But illustrates the idea.

static void GenerateHexadecimal(Span<char> input)
{
    Span<char> remaining = input;
    Span<byte> randomBuffer = stackalloc byte[128];
    int randomOffset = randomBuffer.Length;

    while (remaining.Length >= 2)
    {
        if (randomOffset >= randomBuffer.Length)
        {
            // The offset should never overreach the buffer.
            Debug.Assert(randomOffset == randomBuffer.Length);
            RandomNumberGenerator.Fill(randomBuffer);
            randomOffset = 0;
        }

        // Encode whichever is less, how much random data we have in the buffer, or how much is remaining to generate.
        int encode = Math.Min(remaining.Length / 2, randomBuffer.Length - randomOffset);
        HexConverter.EncodeToUtf16(randomBuffer.Slice(randomOffset, encode), remaining, HexConverter.Casing.Upper);
        remaining = remaining.Slice(encode * 2);
        randomOffset += encode;
    }

    if (!remaining.IsEmpty)
    {
        Debug.Assert(remaining.Length == 1);

        if (randomOffset >= randomBuffer.Length)
        {
            // Only need one byte.
            RandomNumberGenerator.Fill(randomBuffer.Slice(0, 1));
            randomOffset = 0;
        }

        remaining[0] = HexConverter.ToCharUpper(randomBuffer[randomOffset]);
    }
}

is generally random.

"Not random" wasn't a good choice of words for me but I think you and I agree on it, at least I don't have a use case for it.

I updated the proposal.

@bartonjs bartonjs added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Aug 16, 2022
@GrabYourPitchforks
Copy link
Member

GrabYourPitchforks commented Aug 23, 2022

What if T is a mutable struct or reference type? The choice could be modified after the random data has been generated, thus going through the generated data. Is that simply "caller beware"?

Valid concern, but I think this is ok. It's not really any different than myReadOnlySpanOfReferenceType.CopyTo(writableSpan);, where the reference is copied but both spans still point to the same objects in memory. People don't really seem to get tripped up by that in practice. And the common use case is going to be chars / ints / enums / other primitives anyway, which don't have this issue.

The only real feedback I have on the API is GenerateAlphaNumeric, which I think should be dropped. The API suffers from two big problems: what's the exact meaning of alphanumeric (why is "0-9A-Za-z" correct?); and are you absolutely sure the caller intended for the result to be case-sensitive? I've seen my fair share of issues due to callers generating what they believed are unique identifiers, and then sticking them in locations like DB primary key entries which are case-insensitive. (The end result is that the distribution ends up weighting letters too heavily compared to digits.) In these cases I think it's best to let the caller be explicit about exactly what they intended.

FWIW, the base Generate API would be useful for scenarios like dotnet/aspnetcore#40723, where we're generating recovery keys which resemble Microsoft 5x5 product codes.

@vcsjones
Copy link
Member Author

The only real feedback I have on the API is GenerateAlphaNumeric, which I think should be dropped.

Okay. That's.. reasonable. Dropped.

@bartonjs bartonjs modified the milestones: Future, 8.0.0 Nov 10, 2022
@bartonjs
Copy link
Member

bartonjs commented Nov 15, 2022

Video

  • We renamed some members while I was being bad at notes.
  • The group wanted to collapse it into the existing RandomNumberGenerator type
  • And let's do this for System.Random, too.
    • We added array overloads for System.Random
    • We felt none of the things on System.Random needed to be virtual
namespace System.Security.Cryptography {
    public partial class RandomNumberGenerator {
        // Proposed:
        public static void GetItems<T>(ReadOnlySpan<T> choices, Span<T> destination);
        public static T[] GetItems<T>(ReadOnlySpan<T> choices, int length);
        public static string GetString(ReadOnlySpan<char> choices, int length);

        // Optional helpers:
        // Accelerator where 'choices' is 0-9A-F.
        public static void GetHexString(Span<char> destination, bool lowercase=false);
        public static string GetHexString(int stringLength, bool lowercase=false);

        public static void Shuffle<T>(Span<T> values);
    }
}

namespace System
{
    partial class Random
    {
        public void GetItems<T>(ReadOnlySpan<T> choices, Span<T> destination);
        public T[] GetItems<T>(T[] choices, int length);
        public T[] GetItems<T>(ReadOnlySpan<T> choices, int length);

        public void Shuffle<T>(Span<T> values);
        public void Shuffle<T>(T[] values);
    }
}

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Nov 15, 2022
@vcsjones vcsjones self-assigned this Nov 15, 2022
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Nov 19, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jan 12, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Feb 12, 2023
@jeffhandley jeffhandley added the blog-candidate Completed PRs that are candidate topics for blog post coverage label Mar 14, 2023
@dotnet-policy-service dotnet-policy-service bot added the in-pr There is an active PR which will close this issue when it is merged label Jul 29, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Security blog-candidate Completed PRs that are candidate topics for blog post coverage in-pr There is an active PR which will close this issue when it is merged
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants