System.Text.Rune ref APIs and unit tests #33395
Conversation
public static bool TryCreate(uint value, out Rune result) { throw null; } | ||
public bool TryEncode(Span<char> destination, out int charsWritten) { throw null; } | ||
public static bool TryGetRuneAt(string input, int index, out Rune value) { throw null; } | ||
public static double GetNumericValue(Rune value) { throw null; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does returning double
enable any specific use-cases that e.g. long
does not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unicode defines fractional values, such as U+00BD (½) for which a non-integral value is returned.
http://unicodefractions.com/ for some others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We pull the data from https://www.unicode.org/Public/UCD/latest/ucd/extracted/DerivedNumericValues.txt.
Our test file has a specific test for the U+00BD case:
yield return new object[] { new UnicodeInfoTestData { ScalarValue = (Rune)(0xBD), UnicodeCategory = UnicodeCategory.OtherNumber, NumericValue = 0.5, IsControl = false, IsDigit = false, IsLetter = false, IsLetterOrDigit = false, IsLower = false, IsNumber = true, IsPunctuation = false, IsSeparator = false, IsSymbol = false, IsUpper = false, IsWhiteSpace = false } };
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It gets even more fun, since some scalars can correspond to negative numeric values! :D
For example, U+0F33 TIBETAN DIGIT HALF ZERO ('༳') has a numeric value of -0.5
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. Is this API as generally useful as to be exposed directly on the Rune
type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest I don't personally have a valid use case for it. It's there for parity with System.Char
and System.Text.CharUnicodeInfo
.
@@ -7574,6 +7574,63 @@ public enum NormalizationForm | |||
FormKC = 5, | |||
FormKD = 6, | |||
} | |||
public readonly struct Rune : IComparable<Rune>, IEquatable<Rune> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See if you can use the ref generation tool to auto-gen the reference so it has correct sort ordering and fully qualified type names (like System.IEquatable<Rune>
).
msbuild /t:GenerateReferenceSource
However, for System.Runtime, there are some warts with the auto-gen tool, so you may have to undo some of the changes that it makes (to other types).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no documentation for how to run that tool or what parameters to give it. If somebody creates docs I can run it. Otherwise I'll stick with doing this by hand because at least I know by hand works. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no documentation for how to run that tool or what parameters to give it.
Agreed. Here is how I tend to do it (for System.Runtime):
- Build coreclr release
- Build corefx release with coreclr bits
- Run
msbuild /t:GenerateReferenceSource /p:ConfigurationGroup=Release
from the System.Runtime/ref directory - Filter out all unrelated changes and extract the changes you care about (ignore certain attributes being removed).
Updated the ref here: #33512
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added documentation here: #33515
public static explicit operator Rune(int value) { throw null; } | ||
public bool IsAscii { get { throw null; } } | ||
public bool IsBmp { get { throw null; } } | ||
public int Plane { get { throw null; } } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be more descriptive: UnicodePlane
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think current version should be fine considering the whole class is related to the Unicode code point
public bool IsAscii { get { throw null; } } | ||
public bool IsBmp { get { throw null; } } | ||
public int Plane { get { throw null; } } | ||
public static Rune ReplacementChar { get { throw null; } } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ReplacementRune
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
U+FFFD REPLACEMENT CHARACTER
is the official Unicode name for this value. We can expand "Char" -> "Character", but we shouldn't use the name Rune here.
try to build all configurations to ensure building the packages as I expect the api compat will fail for .Net Native as the APIs wouldn't should up there yet and you'll need to add some entries to the baseline file. |
904718e
to
415cd7c
Compare
The latest commit is just in case we change the name |
My 2c: @GrabYourPitchforks I understood the |
@krwq Unicode Scalar is right term here. it is different than code point. The code point is: while Unicode scalar is: This type is about Unicode Scalar |
Thanks @tarekgh I can see in fact tests confirming the spec definition. I concur with the name in this case :-) |
The coreclr implementation is now checked in. I'll rebase this PR when the train makes its way to corefx, and hopefully everything will just pass. :) (fingers crossed) |
@dotnet-bot test this please |
The train from coreclr into corefx is delayed to a build infrastructure issue. The issue appears to be resolved now but there's a small backlog. I'm hoping the train will arrive in a few hours, and then I can rebase on top of it and get this to pass. |
be02a24
to
c7646c4
Compare
Commit migrated from dotnet/corefx@7626a26
Reference assembly APIs and unit tests for dotnet/coreclr#20935
This PR will not pass CI until the underlying coreclr change makes its way over.