Expose System.Globalization.CharUnicodeInfo.GetBidiCategory #44464

Open
OronDF343 opened this issue Nov 10, 2020 · 2 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Globalization

Comments

@OronDF343

Background and Motivation

Currently, there is no public API for determining whether a Unicode character is strongly right-to-left, strongly left-to-right, or neutral. Such an API is useful when displaying mixed right-to-left and left-to-right text, for example Hebrew or Arabic interleaved with English.

Proposed API

StrongBidiCategory.cs

-    internal enum StrongBidiCategory
+    public enum StrongBidiCategory
     {
     ...

CharUnicodeInfo.cs

     public static partial class CharUnicodeInfo
     {
     ...
+        public static StrongBidiCategory GetBidiCategory(char ch)
+        {
+            return GetBidiCategoryNoBoundsChecks(ch);
+        }
+
+        public static StrongBidiCategory GetBidiCategory(int codePoint)
+        {
+            if (!UnicodeUtility.IsValidCodePoint((uint)codePoint))
+            {
+                ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.codePoint);
+            }
+
+            return GetBidiCategoryNoBoundsChecks((uint)codePoint);
+        }

-        internal static StrongBidiCategory GetBidiCategory(string s, int index)
+        public static StrongBidiCategory GetBidiCategory(string s, int index)
         {
     ...

Usage Examples

For example, this API could be used to decide whether a block of text should be right- or left-aligned, based on its first character that is strongly RTL or strongly LTR (a simplified form of the paragraph-level rule P2 in Unicode UAX #9):

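using System.Globalization;

public static class TextDirection
{
    // Returns true if the first strongly directional character in `str` is RTL.
    // Requires the proposed public CharUnicodeInfo.GetBidiCategory(string, int) overload.
    public static bool IsRtl(string str)
    {
        var cat = StrongBidiCategory.Other;
        // Stop at the first strongly LTR or strongly RTL character. Step over both
        // halves of a surrogate pair so a lone low surrogate is never classified.
        for (var i = 0; i < str.Length && cat == StrongBidiCategory.Other;
             i += char.IsSurrogatePair(str, i) ? 2 : 1)
        {
            cat = CharUnicodeInfo.GetBidiCategory(str, i);
        }
        return cat == StrongBidiCategory.StrongRightToLeft;
    }
}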

Alternative Designs

The proposed API above mirrors the existing public overloads of GetUnicodeCategory (char, int, and string s, int index). Overloads accepting ReadOnlySpan<char> in place of each string s, int index overload would also be a welcome addition.
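A ReadOnlySpan<char> overload would mainly need to decode the code point that starts at index before doing the table lookup. The sketch below shows only that decoding step, using the public System.Text.Rune API (available since .NET Core 3.0). The class and helper names (SpanBidiSketch, GetCodePointAt) are hypothetical, and returning the raw UTF-16 code unit for an unpaired surrogate is an assumption about matching the behavior of the string-based overloads.

```csharp
using System;
using System.Buffers;
using System.Text;

public static class SpanBidiSketch
{
    // Hypothetical helper: returns the code point that starts at s[index].
    // A ReadOnlySpan<char> overload of GetBidiCategory would look this value up.
    public static int GetCodePointAt(ReadOnlySpan<char> s, int index)
    {
        if ((uint)index >= (uint)s.Length)
        {
            throw new ArgumentOutOfRangeException(nameof(index));
        }

        // Decodes either a single BMP char or a well-formed surrogate pair.
        OperationStatus status = Rune.DecodeFromUtf16(s.Slice(index), out Rune rune, out _);

        // Assumption: an unpaired surrogate (InvalidData or NeedMoreData) is
        // reported as its raw UTF-16 code unit, as the string overloads are
        // assumed to do.
        return status == OperationStatus.Done ? rune.Value : s[index];
    }
}
```

For example, GetCodePointAt("a\uD83D\uDE00", 1) returns 0x1F600, while index 2 (the lone low surrogate) returns 0xDE00.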

In .NET Framework, there is a similar internal method with the same name, but it returns BidiCategory, an enum that represents the full bidi category rather than just the "strong" one. For some use cases, the detailed information is more useful than the simplified version. The alternative option would therefore be to add new APIs that return BidiCategory, plus an extension method that converts BidiCategory to StrongBidiCategory.
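A minimal sketch of the suggested conversion extension method follows. Both enums are hypothetical stand-ins defined locally so the snippet compiles on its own: the BidiCategory member names follow the 23 Bidi_Class values of Unicode UAX #9 and are not claimed to match the .NET Framework internal type, and for StrongBidiCategory only Other and StrongRightToLeft appear in this issue (StrongLeftToRight is assumed). The method name ToStrongBidiCategory is also hypothetical.

```csharp
using System;

// Hypothetical stand-in for a full bidi category enum (UAX #9 Bidi_Class values).
public enum BidiCategory
{
    LeftToRight,              // L
    RightToLeft,              // R
    RightToLeftArabic,        // AL
    EuropeanNumber,           // EN
    EuropeanNumberSeparator,  // ES
    EuropeanNumberTerminator, // ET
    ArabicNumber,             // AN
    CommonNumberSeparator,    // CS
    NonSpacingMark,           // NSM
    BoundaryNeutral,          // BN
    ParagraphSeparator,       // B
    SegmentSeparator,         // S
    Whitespace,               // WS
    OtherNeutrals,            // ON
    LeftToRightEmbedding,     // LRE
    LeftToRightOverride,      // LRO
    RightToLeftEmbedding,     // RLE
    RightToLeftOverride,      // RLO
    PopDirectionalFormat,     // PDF
    LeftToRightIsolate,       // LRI
    RightToLeftIsolate,       // RLI
    FirstStrongIsolate,       // FSI
    PopDirectionalIsolate,    // PDI
}

// Hypothetical stand-in for StrongBidiCategory.
public enum StrongBidiCategory
{
    Other,
    StrongLeftToRight,
    StrongRightToLeft,
}

public static class BidiCategoryExtensions
{
    // Per UAX #9, only L, R and AL are strong types. Explicit embeddings,
    // overrides and isolates are formatting characters, so they map to Other.
    public static StrongBidiCategory ToStrongBidiCategory(this BidiCategory category) => category switch
    {
        BidiCategory.LeftToRight => StrongBidiCategory.StrongLeftToRight,
        BidiCategory.RightToLeft or BidiCategory.RightToLeftArabic => StrongBidiCategory.StrongRightToLeft,
        _ => StrongBidiCategory.Other,
    };
}
```

With this mapping, weak types such as ArabicNumber (Arabic-Indic digits) and formatting characters such as RightToLeftOverride all collapse to Other, which is exactly the information a caller would lose by using only StrongBidiCategory.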

Risks

Since the proposed API already exists internally, exposing it on newer platforms such as .NET 5 should carry no implementation risk. Implementing the alternative design, which uses BidiCategory rather than StrongBidiCategory, may require renaming and therefore changes to any code that calls the internal APIs.

@OronDF343 OronDF343 added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Nov 10, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Globalization untriaged New issue has not been triaged by the area owner labels Nov 10, 2020
@ghost

ghost commented Nov 10, 2020

Tagging subscribers to this area: @tarekgh, @safern, @krwq
See info in area-owners.md if you want to be subscribed.



@GrabYourPitchforks
Member

FWIW, .NET Framework has the internal BidiCategory enum, while .NET Core / .NET 5 has the internal StrongBidiCategory enum. The runtime does not carry full UCD bidi information; we only carry enough information to make our own use cases (mainly some System.Uri functionality) work correctly. Carrying full bidi information would require reworking the internal CharUnicodeInfo backing tables, as currently we don't have enough spare bits to carry that information.

@tarekgh tarekgh removed the untriaged New issue has not been triaged by the area owner label Nov 10, 2020
@tarekgh tarekgh added this to the Future milestone Nov 10, 2020

4 participants