Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

System.Text.Rune ref APIs and unit tests #33395

Merged
merged 3 commits into from Nov 15, 2018

Conversation

GrabYourPitchforks
Copy link
Member

Reference assembly APIs and unit tests for dotnet/coreclr#20935

This PR will not pass CI until the underlying coreclr change makes its way over.

public static bool TryCreate(uint value, out Rune result) { throw null; }
public bool TryEncode(Span<char> destination, out int charsWritten) { throw null; }
public static bool TryGetRuneAt(string input, int index, out Rune value) { throw null; }
public static double GetNumericValue(Rune value) { throw null; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does returning double enable any specific use-cases that e.g. long does not?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unicode defines fractional values, such as U+00BD (½) for which a non-integral value is returned.

http://unicodefractions.com/ for some others.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We pull the data from https://www.unicode.org/Public/UCD/latest/ucd/extracted/DerivedNumericValues.txt.
Our test file has a specific test for the U+00BD case:

yield return new object[] { new UnicodeInfoTestData { ScalarValue = (Rune)(0xBD), UnicodeCategory = UnicodeCategory.OtherNumber, NumericValue = 0.5, IsControl = false, IsDigit = false, IsLetter = false, IsLetterOrDigit = false, IsLower = false, IsNumber = true, IsPunctuation = false, IsSeparator = false, IsSymbol = false, IsUpper = false, IsWhiteSpace = false } };

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It gets even more fun, since some scalars can correspond to negative numeric values! :D
For example, U+0F33 TIBETAN DIGIT HALF ZERO ('༳') has a numeric value of -0.5.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Is this API as generally useful as to be exposed directly on the Rune type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest I don't personally have a valid use case for it. It's there for parity with System.Char and System.Text.CharUnicodeInfo.

@@ -7574,6 +7574,63 @@ public enum NormalizationForm
FormKC = 5,
FormKD = 6,
}
public readonly struct Rune : IComparable<Rune>, IEquatable<Rune>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See if you can use the ref generation tool to auto-gen the reference so it has correct sort ordering and fully qualified type names (like System.IEquatable<Rune>).

msbuild /t:GenerateReferenceSource

However, for System.Runtime, there are some warts with the auto-gen tool, so you may have to undo some of the changes that it makes (to other types).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no documentation for how to run that tool or what parameters to give it. If somebody creates docs I can run it. Otherwise I'll stick with doing this by hand because at least I know by hand works. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no documentation for how to run that tool or what parameters to give it.

Agreed. Here is how I tend to do it (for System.Runtime):

  1. Build coreclr release
  2. Build corefx release with coreclr bits
  3. Run msbuild /t:GenerateReferenceSource /p:ConfigurationGroup=Release from the System.Runtime/ref directory
  4. Filter out all unrelated changes and extract the changes you care about (ignore certain attributes being removed).

https://github.com/dotnet/corefx/blob/master/Documentation/project-docs/developer-guide.md#testing-with-private-coreclr-bits

Updated the ref here: #33512

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added documentation here: #33515

public static explicit operator Rune(int value) { throw null; }
public bool IsAscii { get { throw null; } }
public bool IsBmp { get { throw null; } }
public int Plane { get { throw null; } }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be more descriptive: UnicodePlane?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think current version should be fine considering the whole class is related to the Unicode code point

public bool IsAscii { get { throw null; } }
public bool IsBmp { get { throw null; } }
public int Plane { get { throw null; } }
public static Rune ReplacementChar { get { throw null; } }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReplacementRune?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

U+FFFD REPLACEMENT CHARACTER is the official Unicode name for this value. We can expand "Char" -> "Character", but we shouldn't use the name Rune here.

@tarekgh
Copy link
Member

tarekgh commented Nov 12, 2018

try to build all configurations to ensure building the packages as I expect the api compat will fail for .Net Native as the APIs wouldn't should up there yet and you'll need to add some entries to the baseline file.

@GrabYourPitchforks
Copy link
Member Author

The latest commit is just in case we change the name Rune back to UnicodeScalar, pending API review. I just wanted to have the commit ready to go whatever the decision.

@krwq
Copy link
Member

krwq commented Nov 13, 2018

My 2c: @GrabYourPitchforks I understood the Rune since it is short and you can get used to a fairly weird sounding (at least for me) name, but since UnicodeScalar is long already I'd rather go with UnicodeCodePoint - that would at least match the spec :-)

@tarekgh
Copy link
Member

tarekgh commented Nov 13, 2018

@krwq Unicode Scalar is right term here. it is different than code point.

The code point is:
In the Unicode Standard, the codespace consists of the integers from 0 to 10FFFF16, comprising 1,114,112 code points available for assigning the repertoire of abstract characters. http://www.unicode.org/versions/Unicode11.0.0/ch02.pdf#G25564

while Unicode scalar is:
Unicode Scalar Value. Any Unicode code point except high-surrogate and low-surrogate code points. In other words, the ranges of integers 0 to D7FF16 and E00016 to 10FFFF16 inclusive. (See definition D76 in Section 3.9, Unicode Encoding Forms.) http://unicode.org/glossary/#unicode_scalar_value

This type is about Unicode Scalar

@krwq
Copy link
Member

krwq commented Nov 13, 2018

Thanks @tarekgh I can see in fact tests confirming the spec definition. I concur with the name in this case :-)

@GrabYourPitchforks
Copy link
Member Author

The coreclr implementation is now checked in. I'll rebase this PR when the train makes its way to corefx, and hopefully everything will just pass. :) (fingers crossed)

@stephentoub
Copy link
Member

@dotnet-bot test this please

@GrabYourPitchforks
Copy link
Member Author

The train from coreclr into corefx is delayed to a build infrastructure issue. The issue appears to be resolved now but there's a small backlog. I'm hoping the train will arrive in a few hours, and then I can rebase on top of it and get this to pass.

@GrabYourPitchforks GrabYourPitchforks merged commit 7626a26 into dotnet:master Nov 15, 2018
jlennox pushed a commit to jlennox/corefx that referenced this pull request Dec 16, 2018
@karelz karelz added this to the 3.0 milestone Dec 21, 2018
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
8 participants