Add short and URL friendly string representation to System.Guid #55290

Closed
prezaei opened this issue Jul 7, 2021 · 13 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime
Milestone

Comments

@prezaei

prezaei commented Jul 7, 2021

Background and Motivation

The shortest string representation of System.Guid is 32 characters long (format "N"). Although this is URL-friendly, it is not the most concise URL-friendly representation. According to RFC 2396, Section 2.3, the URL-safe (unreserved) characters are:

Data characters that are allowed in a URI but do not have a reserved purpose are called unreserved. These include upper- and lower-case letters, decimal digits, and a limited set of punctuation marks and symbols.

unreserved  = alphanum | mark
mark        = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"

Unreserved characters can be escaped without changing the semantics of the URI, but this should not be done unless the URI is being used in a context that does not allow the unescaped character to appear.

It would be worthwhile to add support for a new format specifier to System.Guid, perhaps "U", that generates a shorter, URL-friendly string representation of the GUID.
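One way to sanity-check the "URL-friendly" requirement: a 64-symbol alphabet carries 6 bits per character, and the URL-safe base64 alphabet (RFC 4648, Section 5) uses only alphanumerics plus "-" and "_", so it stays inside the unreserved set quoted above, while standard base64's "+" and "/" (and its "=" padding) do not. The sketch below encodes the RFC 2396 rule directly; UriCharsets and its members are illustrative names, not an existing API. (RFC 3986, which obsoletes RFC 2396, still lists "-" and "_" as unreserved, so the conclusion carries over.)

```csharp
using System;
using System.Linq;

// Illustrative helper: checks alphabets against RFC 2396, Section 2.3.
public static class UriCharsets
{
    // unreserved = alphanum | mark
    private const string Marks = "-_.!~*'()";

    public static bool IsUnreserved(char c) =>
        (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
        (c >= '0' && c <= '9') || Marks.IndexOf(c) >= 0;

    // Standard base64 alphabet (RFC 4648, Section 4): last two symbols are '+' and '/'.
    public const string Base64Alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // URL-safe variant (RFC 4648, Section 5): '+' and '/' replaced by '-' and '_'.
    public const string Base64UrlAlphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

    public static bool AllUnreserved(string alphabet) => alphabet.All(IsUnreserved);
}
```

This also bears on the note about not using every mark character: the base64url alphabet already needs only two of the nine marks.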

Proposed API

The following changes will be required:

API | Change
System.Guid.Parse(string input) | Will be able to parse the shorter string representation
System.Guid.Parse(ReadOnlySpan<char> input) | Will be able to parse the shorter string representation
System.Guid.ParseExact(string input, string format) | Will parse the shorter string representation when format is "U"
System.Guid.ParseExact(ReadOnlySpan<char> input, ReadOnlySpan<char> format) | Will parse the shorter string representation when format is "U"
System.Guid.TryParse([NotNullWhen(true)] string? input, out Guid result) | Will be able to parse the shorter string representation
System.Guid.TryParse(ReadOnlySpan<char> input, out Guid result) | Will be able to parse the shorter string representation
System.Guid.TryParseExact(ReadOnlySpan<char> input, ReadOnlySpan<char> format, out Guid result) | Will parse the shorter string representation when format is "U"
System.Guid.TryParseExact([NotNullWhen(true)] string? input, [NotNullWhen(true)] string? format, out Guid result) | Will parse the shorter string representation when format is "U"
System.Guid.ToString(string? format) | Will return the shorter string representation when format is "U"
System.Guid.TryFormat(Span<char> destination, out int charsWritten, ReadOnlySpan<char> format = default) | Will try to format the current Guid instance into the provided character span in its shorter string form when format is "U"

Usage Examples

var str = Guid.NewGuid().ToString("U");
Console.WriteLine(str); // prints out something like: "abcdefgh123"
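The proposal does not pin down the exact alphabet or byte order behind "U". Assuming it means what the later discussion converges on (unpadded base64url over the 16 bytes that Guid.TryWriteBytes produces), the formatting half of the table above could be approximated today with an extension method. ShortGuidFormatter, TryFormatShort, and ToShortString are hypothetical names, not BCL APIs; TryFormatShort only mirrors the shape of Guid.TryFormat.

```csharp
using System;

// Hypothetical approximation of the proposed "U" format:
// unpadded base64url (RFC 4648, Section 5) over Guid.TryWriteBytes output.
public static class ShortGuidFormatter
{
    public const int ShortLength = 22; // 16 bytes -> 24 base64 chars, minus "=="

    public static bool TryFormatShort(Guid guid, Span<char> destination, out int charsWritten)
    {
        charsWritten = 0;
        if (destination.Length < ShortLength)
            return false;

        Span<byte> bytes = stackalloc byte[16];
        guid.TryWriteBytes(bytes); // same byte layout as Guid.ToByteArray()

        Span<char> base64 = stackalloc char[24];
        Convert.TryToBase64Chars(bytes, base64, out _); // always 24 chars ending in "=="

        // Copy the 22 significant chars, mapping '+' -> '-' and '/' -> '_'.
        for (int i = 0; i < ShortLength; i++)
        {
            char c = base64[i];
            destination[i] = c == '+' ? '-' : c == '/' ? '_' : c;
        }
        charsWritten = ShortLength;
        return true;
    }

    public static string ToShortString(this Guid guid)
    {
        Span<char> buffer = stackalloc char[ShortLength];
        TryFormatShort(guid, buffer, out _);
        return buffer.ToString(); // the only heap allocation
    }
}
```

Usage would look like Guid.NewGuid().ToShortString(), which yields a 22-character string drawn from A-Z, a-z, 0-9, "-" and "_". Guid.Empty, for example, comes out as 22 "A" characters.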

Alternative Designs

We could also add extension methods.

Risks

All I can think of is that TryParse(...) would now require an extra check on the string's length to decide whether to parse it as the short representation of the GUID.
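To make that risk concrete: none of the existing formats is shorter than 32 characters, so a length of 22 is unambiguous and the extra check is a single comparison. The sketch below is a hypothetical wrapper (ShortGuidParser, TryParseAny, and TryParseShort are illustrative names) that treats a 22-character input as unpadded base64url, under the same byte-layout assumption as Guid.TryWriteBytes, and otherwise defers to the existing Guid.TryParse.

```csharp
using System;

// Hypothetical parser: 22 chars -> unpadded base64url; anything else -> Guid.TryParse
// (which already handles the N/D/B/P/X formats of 32, 36, 38, 38 and 68 chars).
public static class ShortGuidParser
{
    public static bool TryParseAny(ReadOnlySpan<char> input, out Guid result)
    {
        if (input.Length == 22)
            return TryParseShort(input, out result);
        return Guid.TryParse(input, out result);
    }

    public static bool TryParseShort(ReadOnlySpan<char> input, out Guid result)
    {
        result = default;
        if (input.Length != 22)
            return false;

        // Restore the standard base64 alphabet and the "==" padding.
        Span<char> base64 = stackalloc char[24];
        for (int i = 0; i < 22; i++)
        {
            char c = input[i];
            if (c == '+' || c == '/')
                return false; // not part of the base64url alphabet
            base64[i] = c == '-' ? '+' : c == '_' ? '/' : c;
        }
        base64[22] = '=';
        base64[23] = '=';

        // Rejects invalid characters; inputs containing whitespace decode short and fail the length check.
        Span<byte> bytes = stackalloc byte[16];
        if (!Convert.TryFromBase64Chars(base64, bytes, out int written) || written != 16)
            return false;

        result = new Guid(bytes); // inverse of Guid.TryWriteBytes
        return true;
    }
}
```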

Notes

  • We might also want to avoid using all of the mark characters ("-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"), to keep the URLs more readable.
  • I'll be happy to send a PR if
@prezaei prezaei added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Jul 7, 2021
@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Jul 7, 2021
@ghost

ghost commented Jul 7, 2021

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.


@GrabYourPitchforks
Member

Your sample has the comment prints out something like: "abcdefgh123". Can you give a concrete example of the type of output you expect? For example, what would be the exact output of Guid.Parse("7a1e687f-a5a9-47e1-b5ec-fd71abf06303").ToString("U")?

@svick
Contributor

svick commented Jul 7, 2021

I think you won't be able to do much better than base64-encoding the bytes of the GUID. That gives you a 24-character string (22 if you remove the padding). Since you can already do that easily today (e.g. Convert.ToBase64String(guid.ToByteArray())), since it doesn't help that much, and since any such encoding would be completely non-standard, I don't see much reason to add this directly to Guid.
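For reference, the arithmetic behind those lengths: a GUID is 16 bytes (128 bits), hex spends 4 bits per character, and base64 spends 6.

```latex
\begin{aligned}
\text{hex (format "N")}: &\quad 128 / 4 = 32 \text{ chars} \\
\text{base64, padded}: &\quad 4 \cdot \lceil 16 / 3 \rceil = 4 \cdot 6 = 24 \text{ chars} \\
\text{base64, unpadded}: &\quad \lceil 128 / 6 \rceil = \lceil 21.\overline{3} \rceil = 22 \text{ chars}
\end{aligned}
```

So the unpadded form is 10 characters (about 31%) shorter than "N".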

@Tornhoof
Contributor

Tornhoof commented Jul 7, 2021

I think you won't be able to do much better than base64-encoding the bytes of the GUID.

For the web, Base64Url encoding is a better fit; there is a nice helper method for it in ASP.NET Core:
https://docs.microsoft.com/en-us/dotnet/api/microsoft.aspnetcore.webutilities.webencoders.base64urlencode?view=aspnetcore-5.0

@prezaei
Author

prezaei commented Jul 7, 2021

You got it. Effectively, we want to base64url encode the Guid. Doing this outside of System.Guid forces a heap allocation if we use Guid.ToByteArray(). The only way around the heap allocation that I can think of is something like this:

var guid = Guid.NewGuid();
Span<byte> bytes = stackalloc byte[16];
guid.TryWriteBytes(bytes);

// now convert the bytes to a string using a base64URL encoder...
var result = Base64UrlEncode(bytes);

This is messy, and given how often we have all seen GUIDs in the URLs of sites we visit, it seems like a common problem that we should have a solution for.

Thoughts?

@GrabYourPitchforks
Member

I don't see much appetite for adding a domain-specific method (base64url encoding) directly on the Guid type. Keep in mind also that GUIDs and other identifiers tend to be used as paths in URLs rather than as query string components, and base64 is a case-sensitive encoding. Most real-world applications stick to all-lowercase identifiers for things that appear in paths and do not expect to see mixed case-sensitive identifiers. This further restricts the range of applications which might get use out of such an API.
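The case-sensitivity point can be made concrete with two hand-picked 16-byte values. All zero bytes encode to 22 "A" characters; setting only the first byte to 0x68 (binary 0110 1000) changes the first 6-bit group to 26, which is "a", and leaves the rest as "A". The two GUIDs differ, but their base64url strings differ only in letter case, so any case-insensitive routing or storage would conflate them. CaseCollisionDemo is an illustrative name.

```csharp
using System;

// Illustrative only: two distinct GUIDs whose unpadded base64url forms
// differ solely in letter case.
public static class CaseCollisionDemo
{
    public static string ToBase64Url(Guid guid) =>
        Convert.ToBase64String(guid.ToByteArray())
            .TrimEnd('=')       // drop the "==" padding
            .Replace('+', '-')  // URL-safe alphabet (RFC 4648, Section 5)
            .Replace('/', '_');

    public static (Guid First, Guid Second) MakeCollidingPair()
    {
        var bytes = new byte[16];      // all zero: encodes to "AAAA...A"
        Guid first = new Guid(bytes);  // equal to Guid.Empty
        bytes[0] = 0x68;               // first 6 bits 011010 = 26 -> 'a'
        Guid second = new Guid(bytes); // encodes to "aAAA...A"
        return (first, second);
    }
}
```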

@prezaei
Author

prezaei commented Jul 7, 2021

@GrabYourPitchforks, totally agree that we might end up with Base45. I would not look at this as a domain-specific thing here. The actual problem I am trying to solve right now is passing shorter correlation IDs (x-correlation-id headers) between some of our Azure products. Today, we use the simple Guid.ToString("N"). That wastes bandwidth.

In fact, Guid.ToString("N") is used heavily for serializing to JSON, YAML, gRPC and much more. And don't forget all the logs that go into Geneva with all these long identifiers. If only there were a shorter version of this, we would be helping fight climate change! You may think I am joking, but I am not. This really is not a niche scenario for service code.

@GrabYourPitchforks
Copy link
Member

That last response kinda provides evidence for my point that this is domain-specific, no? :) The problem as originally stated is that you wanted something appropriate for placement in URLs; but #55290 (comment) shows that you actually want something that's the shortest ASCII computer-readable representation of arbitrary binary data (which doesn't need to be URL-safe); and that making something human-readable and URL-appropriate might require yet another format (like base45). But Guid.ToString is really meant to produce something that fulfills both a standard pattern and is human-readable, so it's really not the ideal place for putting this functionality.

I'm sympathetic to the problem, but since your desire is for the shortest possible representation and that you're willing to use a non-standard format to accomplish it, what's wrong with defining your own extension method?

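using System;

public static class GuidExtensions
{
    public static string ToMinimalRepresentation(this Guid guid)
    {
        Span<byte> asBytes = stackalloc byte[16];
        guid.TryWriteBytes(asBytes); // instance method on the value, not a static Guid member
        Span<char> asChars = stackalloc char[24]; // 22 significant chars + "==" padding
        Convert.TryToBase64Chars(asBytes, asChars, out _); // standard base64 first
        for (int i = 0; i < 22; i++) // then switch to the base64url alphabet
            asChars[i] = asChars[i] == '+' ? '-' : asChars[i] == '/' ? '_' : asChars[i];
        return asChars.Slice(0, 22).ToString(); // the one and only allocation
    }
}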

@prezaei
Author

prezaei commented Jul 7, 2021

@GrabYourPitchforks, I can certainly do this, and in fact I have done so. My point is that this pattern is pretty common out there, from websites to HTTP headers to logs. One of the reasons is that frameworks just don't make it available/easy for all devs to use these. Open any of our logs in Kusto/Cosmos and you will be shocked that no one has taken the time to use a shorter version of a correlation ID. Why? Is it because they can't write the code? No. It is because we don't give them an out-of-the-box formatter, and they are busy with so many other things. A good framework is there to simplify this type of work.

Let me ask you this: why do we have so many other format specifiers, yet feel hesitant to add one more that has serious, real use cases? For instance, have you ever seen a Guid in this format: {0x00000000,0x0000,0x0000,{0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00}} (that is, format "X")?
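For comparison with the proposed 22 characters, the existing specifiers can be measured directly. N, D, B, P and X are the documented Guid.ToString formats; because each has a fixed layout, the lengths do not depend on the value (32, 36, 38, 38 and 68 characters respectively). GuidFormatLengths is an illustrative name.

```csharp
using System;

public static class GuidFormatLengths
{
    // Measures the length of each standard Guid.ToString format for one value.
    public static (string Format, int Length)[] Measure(Guid guid)
    {
        string[] formats = { "N", "D", "B", "P", "X" };
        var result = new (string Format, int Length)[formats.Length];
        for (int i = 0; i < formats.Length; i++)
            result[i] = (formats[i], guid.ToString(formats[i]).Length);
        return result;
    }
}
```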

Another question: What is the downside of adding this format? I am totally with you that we need to have a high bar for corlib but I believe I have made a good case here with so many use cases.

@svick
Contributor

svick commented Jul 7, 2021

@prezaei

The actual problem I am trying to solve right now is to pass shorter correlation id (x-correlation-id) headers between some of our Azure products. Today, we use the simple Guid.ToString("N"). That wastes bandwidth.

If you care about saving every byte of bandwidth, why are you using a globally unique identifier? Wouldn't an identifier that's unique just to your application serve you as well, while being much shorter?

On the other hand, I just googled "guid to short string" and it seems to be a relatively common problem (with base64 usually being the suggested solution).

Why do we have so many other format specifiers but feel hesitant to add one more that has serious and real use cases?

Maybe there was a reason for the other formats when they were first added. Maybe there still is. Or maybe they were a mistake. In any case, I don't think that's really a justification for adding one more format.

@prezaei
Author

prezaei commented Jul 7, 2021

@svick, still need something globally unique. This is not for a single application. It will potentially be used by all of Azure if I get my way. HTH

@tannergooding tannergooding removed the untriaged New issue has not been triaged by the area owner label Jul 12, 2021
@tannergooding tannergooding added this to the Future milestone Jul 12, 2021
@tannergooding
Member

Agree with @GrabYourPitchforks that this seems like a very domain-specific API and not something we'd be interested in exposing on System.Guid directly.

Given that Guid can format to a Span<char>, Utf8Formatter can be used to format to a Span<byte>, and Base64Encoder likewise has APIs that can process a Span, you can already do this "allocation free" just potentially with an additional loop over what a custom implementation might provide/allow.
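A sketch of the span-only route described above, for a caller that wants UTF-8 bytes (e.g. to write straight into a header buffer). The comment mentions Utf8Formatter, which formats the GUID's text form; this sketch instead base64-encodes the 16 raw bytes from Guid.TryWriteBytes with System.Buffers.Text.Base64.EncodeToUtf8, since encoding the text form would produce a longer result. Utf8ShortGuid and TryWriteBase64Url are illustrative names, and the output format is the same unpadded base64url assumption used elsewhere in this thread.

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;

// Raw GUID bytes -> base64 UTF-8 bytes -> URL-safe alphabet,
// using only stack memory and the caller's destination span.
public static class Utf8ShortGuid
{
    public const int EncodedLength = 22;

    public static bool TryWriteBase64Url(Guid guid, Span<byte> utf8Destination, out int bytesWritten)
    {
        bytesWritten = 0;
        if (utf8Destination.Length < EncodedLength)
            return false;

        Span<byte> raw = stackalloc byte[16];
        guid.TryWriteBytes(raw);

        Span<byte> encoded = stackalloc byte[24];
        OperationStatus status = Base64.EncodeToUtf8(raw, encoded, out _, out int written);
        if (status != OperationStatus.Done || written != 24)
            return false;

        // Drop the trailing "==" and swap '+' and '/' for '-' and '_'.
        for (int i = 0; i < EncodedLength; i++)
        {
            byte b = encoded[i];
            utf8Destination[i] = b == (byte)'+' ? (byte)'-' : b == (byte)'/' ? (byte)'_' : b;
        }
        bytesWritten = EncodedLength;
        return true;
    }
}
```

The "additional loop" mentioned above is the small character-mapping pass at the end; everything else is existing span-based BCL API.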

@tannergooding tannergooding closed this as not planned Sep 8, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Oct 9, 2022
6 participants