Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API proposal: Utf8.IsValid(bytes) #502

Closed
GrabYourPitchforks opened this issue Dec 4, 2019 · 3 comments · Fixed by #88004
Closed

API proposal: Utf8.IsValid(bytes) #502

GrabYourPitchforks opened this issue Dec 4, 2019 · 3 comments · Fixed by #88004
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime
Milestone

Comments

@GrabYourPitchforks
Copy link
Member

GrabYourPitchforks commented Dec 4, 2019

See dotnet/corefxlab#2792 (comment) for some further discussion. Many of our UTF-8 utility methods are internal for use only by Utf8String or UTF8Encoding, but per that discussion there are some scenarios where callers might find those utility methods helpful.

In particular, we should expose an API which answers "given a ROS<byte>, please return a bool telling me if the input was well-formed UTF-8." It could live on the existing System.Text.Unicode.Utf8 utility type.

namespace System.Text.Unicode
{
    // existing static type that shipped in Core 3.0
    public static class Utf8
    {
        // proposed new API
        public static bool IsValid(ReadOnlySpan<byte> value);
    }
}
@GrabYourPitchforks GrabYourPitchforks added api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime labels Dec 4, 2019
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Dec 4, 2019
@joperezr joperezr removed the untriaged New issue has not been triaged by the area owner label Jul 7, 2020
@joperezr joperezr added this to the Future milestone Jul 7, 2020
@neon-sunset
Copy link
Contributor

neon-sunset commented Jun 6, 2023

Because the other proposal has been closed as duplicate, please consider changing IsWellFormed(...) to IsValid(...) which matches already approved and implemented Ascii.isValid(...).

Please also let me know what else needs to be done for this proposal to move forward.

Thank you.

@jeffhandley
Copy link
Member

@neon-sunset, thanks for that suggestion--@GrabYourPitchforks and I were just double-checking the existing API name to make them consistent. We've updated the proposal above to capture that, and we've made the proposal more targeted.

We want to look back at this proposal and determine if it should be included in our next round of planning. We already have an implementation of this that is internal on Utf8Utility. Marking this as api-ready-for-review API is ready for review, it is NOT ready for implementation .

public static unsafe int GetIndexOfFirstInvalidUtf8Sequence(ReadOnlySpan<byte> utf8Data, out bool isAscii)
{
fixed (byte* pUtf8Data = &MemoryMarshal.GetReference(utf8Data))
{
byte* pFirstInvalidByte = GetPointerToFirstInvalidByte(pUtf8Data, utf8Data.Length, out int utf16CodeUnitCountAdjustment, out _);
int index = (int)(void*)Unsafe.ByteOffset(ref *pUtf8Data, ref *pFirstInvalidByte);
isAscii = (utf16CodeUnitCountAdjustment == 0); // If UTF-16 char count == UTF-8 byte count, it's ASCII.
return (index < utf8Data.Length) ? index : -1;
}
}

@jeffhandley jeffhandley added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Jun 6, 2023
@jeffhandley jeffhandley changed the title API proposal: Utf8.IsWellFormedData(bytes) API proposal: Utf8.IsValid(bytes) Jun 7, 2023
@stephentoub stephentoub modified the milestones: Future, 8.0.0 Jun 7, 2023
@bartonjs
Copy link
Member

bartonjs commented Jun 22, 2023

Video

Looks good as proposed

namespace System.Text.Unicode
{
    // existing static type that shipped in Core 3.0
    public static class Utf8
    {
        // proposed new API
        public static bool IsValid(ReadOnlySpan<byte> value);
    }
}

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jun 22, 2023
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jun 24, 2023
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jun 28, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Jul 29, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants