Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Add IdnMapping Span-based APIs #32411

Open
MihaZupan opened this issue Feb 16, 2020 · 9 comments
Open

Proposal: Add IdnMapping Span-based APIs #32411

MihaZupan opened this issue Feb 16, 2020 · 9 comments
Labels
api-approved API was approved in API review, it can be implemented area-System.Globalization
Milestone

Comments

@MihaZupan
Copy link
Member

MihaZupan commented Feb 16, 2020

The current IdnMapping API accepts/returns strings and throws on invalid input. I propose a set of Span-based APIs to avoid allocations.

namespace System.Globalization
{
    public sealed class IdnMapping
    {
        // Existing API
        public string GetAscii(string unicode);
        public string GetAscii(string unicode, int index);
        public string GetAscii(string unicode, int index, int count);

        public string GetUnicode(string ascii);
        public string GetUnicode(string ascii, int index);
        public string GetUnicode(string ascii, int index, int count);

        // Proposed API
        public string GetAscii(ReadOnlySpan<char> unicode);
        public bool TryGetAscii(ReadOnlySpan<char> unicode, Span<char> destination, out int charsWritten);

        public string GetUnicode(ReadOnlySpan<char> ascii);
        public bool TryGetUnicode(ReadOnlySpan<char> ascii, Span<char> destination, out int charsWritten);
    }
}

Both Get and TryGet* methods would throw on invalid input.
TryGet* would return false on insufficient space in the destination span.

This new API would simplify call sites and remove allocations throughout code dealing with internationalized domain names, like Uri and Markdig.

cc: @tarekgh

@MihaZupan MihaZupan added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Feb 16, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Globalization untriaged New issue has not been triaged by the area owner labels Feb 16, 2020
@tarekgh
Copy link
Member

tarekgh commented Feb 16, 2020

In general, the proposal looks reasonable.

Why we need the APIs:

    public string GetAscii(ReadOnlySpan<char> unicode);
    public string GetUnicode(ReadOnlySpan<char> ascii);

I don't think these are useful if we are going to allocate a string anyway. And the other proposed APIs can be used at that time. What do you think about that?

@tarekgh tarekgh removed the untriaged New issue has not been triaged by the area owner label Feb 16, 2020
@tarekgh tarekgh added this to the 5.0 milestone Feb 16, 2020
@MihaZupan
Copy link
Member Author

Those aren't needed in my use-cases, as Try* would always be used.

I'll remove them as they could easily be exposed later if a use-case presents itself.

@tarekgh tarekgh added api-ready-for-review and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Feb 17, 2020
@stephentoub
Copy link
Member

I don't understand these APIs:

public bool TryGetAscii(ReadOnlySpan<char> unicode, out string ascii);
public bool TryGetUnicode(ReadOnlySpan<char> ascii, out string unicode);

What does the Try mean here? If it's to avoid throwing in the case where the data is somehow invalid, that's a different meaning than the other Try overloads would have, which would be based solely on whether the destination is large enough. The Boolean returned from a Try is supposed to convey only one thing, and in such span-based Try methods, it's always used to connote whether the destination was large enough to store the transformed data. I don't think we want two overloads of the same method having a different meaning for the Try.

@MihaZupan
Copy link
Member Author

MihaZupan commented Jun 12, 2020

The idea was for both the out string and Span destination to return false on invalid input. The span one would also return false on insufficient space.

I agree the Span overload would be confusing to use since you couldn't differentiate between invalid input/insufficient space, without ensuring you supply a worst-case sized buffer.

Do we have a pattern of Try* methods ever throwing? If so, we could have all overloads throw on invalid input, where the Try only returns false on insufficient space.

// Existing
public string GetAscii(string unicode);

// New
public string GetAscii(ReadOnlySpan<char> unicode);
bool TryGetAscii(ReadOnlySpan<char> unicode, Span<char> destination, out int charsWritten);

Alternatively, we would need an OperationStatus-style return?

@MihaZupan MihaZupan added api-suggestion Early API idea and discussion, it is NOT ready for implementation and removed api-ready-for-review labels Jun 12, 2020
@stephentoub
Copy link
Member

stephentoub commented Jun 12, 2020

Do we have a pattern of Try* methods ever throwing?

Yes, Try methods can still throw.

Alternatively, we would need an OperationStatus-style return?

Why not just:

string GetAscii(ReadOnlySpan<char> unicode);
bool TryGetAscii(ReadOnlySpan<char> unicode, Span<char> destination, out int charsWritten);

?

If the exception for invalid input really is unexceptional, though, with the exception happening so frequently as to be a performance problem in real situations, then yeah, OperationStatus is what you'd want.

Can you share examples where the exception is a meaningful problem?

@MihaZupan
Copy link
Member Author

MihaZupan commented Jun 12, 2020

string GetAscii(ReadOnlySpan<char> unicode);

Yes, that is the shape we'd want (lazy copy-pasting, I've edited the comment).

A few examples of exceptional inputs from reading IdnMapping code:

  • Length: Empty / over 255 chars total / over 63 chars per label
  • Certain sequences
    • Two dots in a row
    • (- before .): foo-.bar
    • Invalid right-to-left
  • etc.
  • There are a few more cases in test code.

I wouldn't expect the performance overhead of exceptions to be a problem for well-behaving apps.

Edit:

where the exception is a meaningful problem?

No.

@ericstj ericstj modified the milestones: 5.0.0, 6.0.0 Jul 29, 2020
@ericstj
Copy link
Member

ericstj commented Jul 29, 2020

Moving this to 6.0.0 as it is not yet approved and we're driving 5.0.0 issues down. Move it back if you think this is still needed in 5.0.0.

@ericstj ericstj modified the milestones: 6.0.0, Future Jul 12, 2021
@ghost ghost moved this from 6.0.0 to Future in ML, Extensions, Globalization, etc, POD. Jul 12, 2021
@ericstj
Copy link
Member

ericstj commented Jul 12, 2021

This is too late for 6.0.0 at this point.

@MihaZupan MihaZupan added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Aug 3, 2023
@terrajobst
Copy link
Member

terrajobst commented Aug 8, 2023

Video

  • Looks good as proposed
namespace System.Globalization;

public sealed class IdnMapping
{
    // Existing API
    // public string GetAscii(string unicode);
    // public string GetAscii(string unicode, int index);
    // public string GetAscii(string unicode, int index, int count);
    //
    // public string GetUnicode(string ascii);
    // public string GetUnicode(string ascii, int index);
    // public string GetUnicode(string ascii, int index, int count);

    // Proposed API
    public string GetAscii(ReadOnlySpan<char> unicode);
    public bool TryGetAscii(ReadOnlySpan<char> unicode, Span<char> destination, out int charsWritten);

    public string GetUnicode(ReadOnlySpan<char> ascii);
    public bool TryGetUnicode(ReadOnlySpan<char> ascii, Span<char> destination, out int charsWritten);
}

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Aug 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-approved API was approved in API review, it can be implemented area-System.Globalization
Projects
None yet
Development

No branches or pull requests

6 participants