Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simple case folding #8120

Closed
iSazonov opened this issue Oct 25, 2018 · 32 comments
Closed

Simple case folding #8120

iSazonov opened this issue Oct 25, 2018 · 32 comments
Labels
Issue-Discussion the issue may not have a clear classification yet. The issue may generate an RFC or may be reclassif WG-Engine-Performance core PowerShell engine, interpreter, and runtime performance
Milestone

Comments

@iSazonov
Copy link
Collaborator

From http://unicode.org/faq/casemap_charprop.html :

Case folding is mostly used for caseless comparison of text, such as identifiers in a computer program, rather than actual text transformation.

I opened https://github.com/dotnet/corefx/issues/33047 to implement simple case folding.

We could use this for improving performance in:

  • PowerShell identifer comparisons
  • Select-String cmdlet
  • some more (???)
@iSazonov
Copy link
Collaborator Author

iSazonov commented Oct 27, 2018

I started. Alfa code https://github.com/iSazonov/PowerShell/tree/add-unicode3/src/System.Management.Automation/utils/unicode
The code doesn't depend on PowerShell so you can freely copy-past the files to your local folder and run with dotnet run -c release.

Better result is 10x win in best case and -4x in worst case. It is worth exploring further!

@mklement0 Have you an experience with Unicode? I'll need help to create a test set and review API design.

@mklement0
Copy link
Contributor

mklement0 commented Oct 27, 2018

(Somewhat ironically, the U in Unicode in your URL should be lowercase u - else the link doesn't work: https://github.com/iSazonov/PowerShell/tree/add-unicode3/src/System.Management.Automation/utils/unicode)

Thanks for looking into this, looks promising.

I think I have a basic understanding of Unicode fundamentals, but no experience relating to implementation aspects.

Can you elaborate on how I could help? Not sure what you mean by test site.

As for implementing the Unicode-version-dependent case-folding tables ourselves, please note the following comments from https://github.com/dotnet/corefx/issues/17233:

I think the open issue is how we support this on Windows (since I think that we can get Case Folding functionality from ICU).

In general, we've tried to move away from the framework itself shipping globalization data and used the OS provided APIs instead. Since Windows doesn't currently expose a way to do case folding, we'd have to figure out what to do.

So it sounds like we should take advantage of OS support where available (ICU on Linux; ?? on macOS).
As for Windows, and therefore overall: Perhaps it's worth waiting for CoreFx to support this?

@SteveL-MSFT SteveL-MSFT added the Issue-Discussion the issue may not have a clear classification yet. The issue may generate an RFC or may be reclassif label Oct 27, 2018
@iSazonov
Copy link
Collaborator Author

Not sure what you mean by test site.

Test set :-) It is Friday evening. If we would develop the APIs we should create a full set of tests. (We could look and grab CoreFX/CoreCLR tests - possible start search point dotnet/corefx#32968).

Can you elaborate on how I could help?

  1. We need right tests to get correct code.
  2. I would like to receive confirmation from you, community and MSFT team that we are moving in the right direction - can we use simple case folding for:
    • case-insensitive search in:
      • string.IndexOf( "...", StringComparison.OrdinalIgnoreCase/CurrentCultureIgnoreCase/InvarianCultureIgnoreCase) and Select-String cmdlet respectively.
      • RegEx engine
    • case-insensitive comparison of language (VB, PowerShell, etc.) identifiers.

http://unicode.org/ contains standard papers. I could share more tomorrow.

So it sounds like we should take advantage of OS support where available (ICU on Linux; ?? on macOS).
As for Windows, and therefore overall: Perhaps it's worth waiting for CoreFx to support this?

  1. Label on the issue is "Future" - no immediate plans. If they started today it would appear only in version 3.0 and we would get it only a year later in PowerShell Core 6.4. But no immediate plans exist and we will wait too long.
    I believe that we could do this work partly here to get the benefits faster and at the same time work on it in CoreCRL/CoreFX. This practice was already used when porting to PowerShell Core.
  2. ICU is native code. My understanding is:
    • performance penalty is very large for p/invoke.
    • currently CoreFX/CoreCLR uses p/invoke to do culture specific string handling. On Windows it is p/invokes to Kernel32.
    • so moving from Kernel32 to ICU makes no sense. I see that .Net Core team is consistently removing p/invokes from .Net Core code in favor of the managed code implementation and get performance wins.
    • It was these considerations that inspired me to begin this work of implementing "simple case folding" on managed code.

Inital plan is:

  • design and implement internally-only simple APIs for char and string conversion, equality, comparison and search based on simple case folding.
  • use the APIs in Select-String -SimpleMatch.

Ideal plan is to use the APIs in PowerShell engine for identifers.

A perfect plan :-) is to get full featured API in CoreFX/CoreCLR.

@iSazonov
Copy link
Collaborator Author

Today I have done some performance tests and it seems it will not be easy to overtake Kernel32 p/invokes. Though I have no experience with low level optimizations at all.

@iSazonov
Copy link
Collaborator Author

From: http://unicode.org/faq/casemap_charprop.html

Q: What is the difference between case mapping and case folding?
A: Case mapping or case conversion is a process whereby strings are converted to a particular form—uppercase, lowercase, or titlecase—possibly for display to the user. Case folding is mostly used for caseless comparison of text, such as identifiers in a computer program, rather than actual text transformation. Case folding in Unicode is primarily based on the lowercase mapping, but includes additional changes to the source text to help make it language-insensitive and consistent. As a result, case-folded text should be used solely for internal processing and generally should not be stored or displayed to the end user.

From http://www.unicode.org/reports/tr31/tr31-29.html#R5

Equivalent Case-Insensitive Identifiers: To meet this requirement, an implementation shall specify either simple or full case folding, and adhere to the Unicode specification for that folding. Any two identifiers that have the same case-folded form shall be treated as equivalent by the implementation.

From http://www.unicode.org/reports/tr21/tr21-5.html

1.3 Caseless Matching
Caseless matching is implemented using case-folding. The latter is the process of mapping strings to a canonical form where case differences are erased. Case-folding allows for fast caseless matches in lookups, since only binary comparison is required. Case-folding is more than just conversion to lowercase. For example, it handles cases such as the Greek sigma, so that "Μάϊος" and "ΜΆΪΟΣ" will match correctly.

From http://unicode.org/reports/tr18/#Folded_Matching

3.10 Folded Matching (Retracted)
RL3.10 Folded Matching
Previous versions of RL3.10 described tailored folding. However, for most full-featured regular expression engines, it is quite difficult to match under folding equivalences that are not 1:1. For more discussion of this, see 1.5 Simple Loose Matches and 2.1 Canonical Equivalents. Thus RL3.10 has been retracted.

From http://unicode.org/reports/tr18/#Simple_Loose_Matches

1.5 Simple Loose Matches
Most regular expression engines offer caseless matching as the only loose matching. If the engine does offers this, then it needs to account for the large range of cased Unicode characters outside of ASCII.
RL1.5 Simple Loose Matches
To meet this requirement, if an implementation provides for case-insensitive matching, then it shall provide at least the simple, default Unicode case-insensitive matching, and specify which properties are closed and which are not.
To meet this requirement, if an implementation provides for case conversions, then it shall provide at least the simple, default Unicode case folding.

From http://www.unicode.org/versions/Unicode11.0.0/ch03.pdf

3.13 Default Case Algorithms

Default Case Folding
Default Case Folding is based on the full case conversion operations without the contextdependent
mappings sensitive to the casing context.
R4 toCasefold(X): Map each character C in X to Case_Folding(C).
• Case_Folding(C) uses the mappings with the status field value “C” or “F” in the
data file CaseFolding.txt in the Unicode Character Database.

A modified form of Default Case Folding is designed for best behavior when doing caseless
matching of strings interpreted as identifiers.
(See above http://www.unicode.org/reports/tr31/tr31-29.html#R5)

Default Caseless Matching
Default caseless matching is the process of comparing two strings for case-insensitive
equality. The definitions of Unicode Default Caseless Matching build on the definitions of
Unicode Default Case Folding.
Default Caseless Matching uses full case folding:
D144 A string X is a caseless match for a string Y if and only if:
toCasefold(X) = toCasefold(Y)

Not related but interesting http://www.unicode.org/notes/tn31/tn31-2.html
Fast Compression Algorithm for Unicode Text (compression ratio ~20-50%, speed ratio ~15 times)

@iSazonov
Copy link
Collaborator Author

@iSazonov
Copy link
Collaborator Author

iSazonov commented Nov 1, 2018

First alfa code performance results (yes, I got the code fast!):
/ * Summary *

BenchmarkDotNet=v0.11.1, OS=Windows 10.0.10240.17146 (1507/RTM/Threshold1)
Intel Core i3-4150 CPU 3.50GHz (Haswell), 1 CPU, 4 logical and 2 physical cores
Frequency=3410090 Hz, Resolution=293.2474 ns, Timer=TSC
.NET Core SDK=2.1.403
[Host] : .NET Core 2.1.5 (CoreCLR 4.6.26919.02, CoreFX 4.6.26919.02), 64bit RyuJIT
DefaultJob : .NET Core 2.1.5 (CoreCLR 4.6.26919.02, CoreFX 4.6.26919.02), 64bit RyuJIT

Method StrA StrB Mean Error StdDev Median Scaled ScaledSD Gen 0 Allocated
ToLowerRussian CaseFolding1 CaseFolding 73.28 ns 2.129 ns 3.120 ns 72.32 ns 0.99 0.06 0.0660 104 B
ToLowerInvariant CaseFolding1 CaseFolding 73.85 ns 1.660 ns 3.978 ns 72.09 ns 1.00 0.00 0.0660 104 B
StringFold CaseFolding1 CaseFolding 65.15 ns 1.372 ns 3.068 ns 63.81 ns 0.88 0.06 0.0660 104 B
ToLowerRussian ЯЯЯЯЯЯЯЯЯЯЯ1 ЯЯЯЯЯЯЯЯЯЯЯ 188.23 ns 5.924 ns 17.185 ns 177.66 ns 1.09 0.11 0.0660 104 B
ToLowerInvariant ЯЯЯЯЯЯЯЯЯЯЯ1 ЯЯЯЯЯЯЯЯЯЯЯ 173.58 ns 3.699 ns 8.646 ns 169.48 ns 1.00 0.00 0.0660 104 B
StringFold ЯЯЯЯЯЯЯЯЯЯЯ1 ЯЯЯЯЯЯЯЯЯЯЯ 77.15 ns 1.580 ns 2.054 ns 76.17 ns 0.45 0.02 0.0660 104 B

@vexx32
Copy link
Collaborator

vexx32 commented Nov 1, 2018

Given how frequently PS uses case-insensitive comparisons, that will likely make a massive difference if it can be utilised widely in the PS code base.

Looking good @iSazonov!!

@iSazonov
Copy link
Collaborator Author

iSazonov commented Nov 1, 2018

Yes, it is encouraging. I updated link above to the alfa code. Now I started to work on fold comparison methods and perf tests.

@iSazonov
Copy link
Collaborator Author

iSazonov commented Nov 1, 2018

This can not be trusted but the first results are impressive.
// * Summary *

BenchmarkDotNet=v0.11.1, OS=Windows 10.0.10240.17146 (1507/RTM/Threshold1)
Intel Core i3-4150 CPU 3.50GHz (Max: 0.80GHz) (Haswell), 1 CPU, 4 logical and 2 physical cores
Frequency=3410090 Hz, Resolution=293.2474 ns, Timer=TSC
.NET Core SDK=2.1.403
[Host] : .NET Core 2.1.5 (CoreCLR 4.6.26919.02, CoreFX 4.6.26919.02), 64bit RyuJIT
DefaultJob : .NET Core 2.1.5 (CoreCLR 4.6.26919.02, CoreFX 4.6.26919.02), 64bit RyuJIT

Method StrA StrB Mean Error StdDev Median Allocated
CoreFXCompareOrdinal CaseFolding1 CaseFolding 15.97 ns 0.4665 ns 0.4581 ns 15.75 ns 0 B
CoreFXCompareOrdinalIgnoreCase CaseFolding1 CaseFolding 28.47 ns 0.6793 ns 1.1535 ns 27.98 ns 0 B
CoreFXCompareInvariantCulture CaseFolding1 CaseFolding 78.80 ns 1.5962 ns 3.6354 ns 77.11 ns 0 B
CoreFXCompareRussianCulture CaseFolding1 CaseFolding 205.74 ns 4.0894 ns 3.8252 ns 203.90 ns 0 B
CoreFXCompareRussianCultureIgnoreCase CaseFolding1 CaseFolding 213.12 ns 17.6287 ns 17.3137 ns 204.75 ns 0 B
CompareFolded CaseFolding1 CaseFolding 23.20 ns 0.4771 ns 0.5495 ns 22.90 ns 0 B
CoreFXCompareOrdinal ЯЯЯЯЯЯЯЯЯЯЯ1 ЯЯЯЯЯЯЯЯЯЯЯ 16.27 ns 0.3593 ns 0.3690 ns 16.21 ns 0 B
CoreFXCompareOrdinalIgnoreCase ЯЯЯЯЯЯЯЯЯЯЯ1 ЯЯЯЯЯЯЯЯЯЯЯ 35.92 ns 1.1880 ns 1.3205 ns 35.35 ns 0 B
CoreFXCompareInvariantCulture ЯЯЯЯЯЯЯЯЯЯЯ1 ЯЯЯЯЯЯЯЯЯЯЯ 101.51 ns 2.7983 ns 7.8468 ns 97.63 ns 0 B
CoreFXCompareRussianCulture ЯЯЯЯЯЯЯЯЯЯЯ1 ЯЯЯЯЯЯЯЯЯЯЯ 205.93 ns 3.9585 ns 4.2356 ns 204.45 ns 0 B
CoreFXCompareRussianCultureIgnoreCase ЯЯЯЯЯЯЯЯЯЯЯ1 ЯЯЯЯЯЯЯЯЯЯЯ 210.18 ns 15.8387 ns 14.8155 ns 204.99 ns 0 B
CompareFolded ЯЯЯЯЯЯЯЯЯЯЯ1 ЯЯЯЯЯЯЯЯЯЯЯ 22.83 ns 0.1892 ns 0.1477 ns 22.79 ns 0 B

@iSazonov
Copy link
Collaborator Author

iSazonov commented Nov 2, 2018

I updated my work branch. Now the code is placed in correct (I hope :-) ) folders and you can compile PowerShell and run xUnit and performance tests.

Next step is to create a comparer based on simple folded strings. Then we could try it in PowerShell engine.
Update: comparer was pushed.

@mklement0
Copy link
Contributor

mklement0 commented Nov 2, 2018

Great stuff, @iSazonov; thanks also for the helpful collection of quotes from the Unicode standard.

I've never had to deal with Unicode normalization problems in the context of .NET (e.g., that accented letter ü can be represented either as single code point U+00FC or as two code points U+0075 + U+0308 - u followed by combining diacritic ̈): do they come into play here?

@iSazonov
Copy link
Collaborator Author

iSazonov commented Nov 2, 2018

There is case mapping and case folding.
Case mapping is a conversion like ToLowerCase() and ToUpperCase(). Case mapping is culture aware.
Case folding is still another conversion and it is culture-independent. The last feature is key for us here.
Case folding can be full and simple.
Full case folding can change string length (i.e. replace 2-byte to 4 byte code point).
Simple case folding always preserves the string length. This is a key for us here too.
Case folding is 1:1 char replacement - no chars added, no chars removed.

Unicode standard exposes CaseFolding.txt file containing a table for case folding (full and simple in one table). We parse the file (we created a PowerShell script for this) and one-time generate C# file with two static char arrays - first for 2-byte code points and second for 4-byte code points - this is important because chars in C# is Utf16 2-byte values, it also means that we have to process surrogates. These array is used for one-step very fast char conversions. Surrogates require additional calculations but surrogates is very rare and the calculations is very simple - this should not degrade performance.

@mklement0
Copy link
Contributor

mklement0 commented Nov 2, 2018

Thanks, @iSazonov.
Minor addition: default case folding - which is the one we're concerned with - is culture-invariant, but case folding can be culture-sensitive (tailored), if needed (the T status values in the case-folding table, and additional language-specific rules not covered there).

While the normalization problem I've mentioned is unrelated to casing per se, I still wonder if it affects us here:

# Composed normalization form: 'ö' as a *single* code point.
$oUmlautNFC = [string] [char] 0xf6

# Decomposed normalization form: 'ö' as *two* code points:
# The base character, 'o', followed by a combining diacritic.
$oUmlautNFD = -join [char[]] (0x6f, 0x308)

# String comparison *does* recognize the equivalence, even though
# the strings have distinct code points:
$oUmlautNFC -eq $oUmlautNFD  # -> $True!

Does your code recognize this kind of equivalence too?

@vexx32
Copy link
Collaborator

vexx32 commented Nov 2, 2018

I would expect that it doesn't need to, @mklement0. In that example, let's consider how case-folding would treat the characters:

I would expect that folding the single code-point variant would consider both 0xD6 and 0xF6 (upper and lowercase code points of Ö) the same.

In the case of the regular character and the combining diacritic, it only needs to fold case for the regular character, and the combining diacritic would be unaffected.

@mklement0
Copy link
Contributor

@vexx32: That applies if both input strings are in the same normalization form (and only differ by case), but the question is what happens if they're not:

With -eq (.Equals() with option InvariantCultureIgnoreCase - also works with CurrentCultureIgnoreCase), the equivalence is recognized:

# NFC 'Ö' (LATIN CAPITAL LETTER O WITH DIAERESIS)
$oUmlautNFC_Upper = [string] [char] 0xd6

# Compare uppercase (single-code-point) NFC Ö to lowercase (2-code-point) NFD ö
$oUmlautNFC_Upper -eq $oUmlautNFD # -> !True!

By contrast - much to my surprise - Select-String does not recognize this equivalence:

$oUmlautNFC_Upper | Select-String $oUmlautNFD # !! NO OUTPUT

Is it by design that Select-String compares strings differently? (Never looked at the internals).

Either way, while the answer may be that we cannot handle normalization differences for performance reasons, we need to have clarity on that decision and to document it accordingly.

@iSazonov
Copy link
Collaborator Author

iSazonov commented Nov 2, 2018

@mklement0 Your question is open question. I think the question is "What is PowerShell identifier" in Unicode terms and our answer should be based on http://www.unicode.org/reports/tr31/
As I mentioned above there is a rule that we could use for PowerShell:

UAX31-R5. Equivalent Case-Insensitive Identifiers: To meet this requirement, an implementation shall specify either simple or full case folding, and adhere to the Unicode specification for that folding. Any two identifiers that have the same case-folded form shall be treated as equivalent by the implementation.

@mklement0
Copy link
Contributor

There's a chapter on normalization in the document you link to: http://www.unicode.org/reports/tr31/#normalization_and_case

Have only glanced at it at this point, but this caught my eye:

Implementations that take normalization and case into account have two choices: to treat variants as equivalent, or to disallow variants.

So it sounds like disallowing variants (making no attempt to recognize different normalization forms as equivalent) is an option - again, a decision we should be clear about and document.

@iSazonov
Copy link
Collaborator Author

iSazonov commented Nov 3, 2018

Yes, the standard describes all common possibilities. And it says about normalization of identifiers. This allows "human" identifiers. I believe it make sense for DSL or AI and make no sense for strong typed languages like C# and perhaps for PowerShell too - I'd very wonder if PowerShell consider "$Ёлка" and "$Елка" as equal although both are Christmas tree (the sample is only "display difference"! not Unicode specific).

@mklement0
Copy link
Contributor

I agree that for identifiers disallowing normalization variants makes sense.

(As an aside: your Ёлка example is not normalization-related and also not covered by the case-folding table - is it an example of language-specific case-folding? Or is it something more informal?)

Select-String - and anywhere outside input is processed - is a different story, however, and we already have a split there, based on CoreFx behavior:

  • [string] IS normalization-aware
  • [regex] is NOT normalization-aware

That explains why $oUmlautNFC_Upper | Select-String $oUmlautNFD (regex) does not match, whereas $oUmlautNFC_Upper | Select-String $oUmlautNFD -SimpleMatch (string) does.

@iSazonov
Copy link
Collaborator Author

iSazonov commented Nov 4, 2018

In Russian we can freely replace Ё - 0401 with Е - 0415. (For lower case too). But "Ёлка" -eq "Елка" returns false - I don't know why. Perhaps because of Unicode backword compatibility and stability only.
Update: Unicode equivalence is defined for the same abstract character. Ё and Е is different chars.

I don't wonder about [string]. By default it use a culture overload for Equals bool Equals(string value, System.StringComparison comparisonType) which makes p/invoke of OS culture API.
I am happy that [rexeg] is not normalization-aware - it makes things simple for us.

I've been going to do this for a long time #8180 for Select-String -SimpleMatch.

@mklement0
Copy link
Contributor

mklement0 commented Nov 5, 2018

Thanks for the background on Е / Ё. There are cases in the official case-folding table where distinct characters are treated the same, and there are additional, language-specific ones in SpecialCasing.txt, but I guess we needn't worry about cases that are defined in neither.

-eq consistently uses InvariantCultureIgnoreCase, from what I gather, which aligns with PowerShell's overall policy of using the invariant culture in explicit string operations.

It is unfortunate that -like, -match / -replace / split and Select-String all use culture-sensitive comparison (which is documented for Select-String, but not the others).

The difference rarely matters, but there is the infamous Turkish "I" problem in the tr and az cultures.

[The following is at least partially incorrect.]
Even more curiously, -match / -replace / split apply the Turkish / Azerbaijani case mapping irrespective of what culture is in effect: "ı" -match "I" yields $True in any culture (don't try this on macOS, where the Turkish rules are fundamentally unsupported).

Haven't looked at the implementation yet - any idea what's going on?
I'm also baffled that on macOS -match / -replace / split do not support Turkish case rules, seemingly due to lack of OS support, whereas Select-String (without -SimpleMatch) does.

@iSazonov
Copy link
Collaborator Author

iSazonov commented Nov 5, 2018

Seems the Turkish case mapping is not related with case folding. I believe simple case folding works with Turkish well because the folding is intended to be language-neutral. This is just the reason why we can get a win in regex and PowerShell identifier comparisons internally and can not in -match / -replace / split.

@mklement0
Copy link
Contributor

Seems the Turkish case mapping is not related with case folding.

It is part of the case-folding table, but of the language-specific variety: status T in CaseFolding.txt, where T, I presume, stands for tailored.

#    The mappings with status T can be used or omitted depending on the desired case-folding
#    behavior. (The default option is to exclude them.)
[...]
0049; T; 0131; # LATIN CAPITAL LETTER I
0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE

To reiterate: case folding can be language-neutral or language-specific (tailored), as needed.

But I agree that for identifiers it definitely makes sense to stay language-neutral.

get a win in regex and PowerShell identifier comparisons internally and can not in -match / -replace / split.

Understood re identifiers, but what regex used do you mean in this case, given that regexes are also used for -match / -replace, / -split?

@iSazonov
Copy link
Collaborator Author

iSazonov commented Nov 6, 2018

It is part of the case-folding table, but of the language-specific variety: status T in CaseFolding.txt, where T, I presume, stands for tailored.

-match / -replace, / -split

  1. Seems it would be a breaking change if we make them language-neutral.
  2. Without rewriting regex code we can only add some optimizations before invoking a regex. Today I don't see how we could get a win here. Perhaps only for -imatch.

@iSazonov
Copy link
Collaborator Author

iSazonov commented Nov 7, 2018

Today I measured Import-Csv on large file (vs 6.1.0 RTM ) and discover up to 8% win with new SimpleFoldedStringComparer.

@iSazonov
Copy link
Collaborator Author

iSazonov commented Dec 5, 2018

I opened new issue in CoreFXlab repo to start a work to add the enhancement in .Net Core.
The main problem now is code performance. A large cache is now used in my alfa code and it is fast. The proposed version with a 3-level cache (size in 10x times less) is too slow (2-3 times). Any help is appreciated how make the code compact and fastest (comparable to ToLowerInvariant() for folding and OrdinalIgnoreCase for comparer).

@vexx32 @HumanEquivalentUnit @mklement0 @daxian-dbw

@iSazonov
Copy link
Collaborator Author

@SteveL-MSFT I am ready to create a nuget package in CoreFXlab dotnet/corefxlab#2610
Could we use the package after it will created and tested?

@SteveL-MSFT
Copy link
Member

@iSazonov I think we can use it with 6.3-preview.1

@iSazonov
Copy link
Collaborator Author

@SteveL-MSFT Thanks! Currently the code was merged in Corefxlab but nuget package is not still created. Targeting is to .Net Core 3.0 only. Now there need to review public api and to make performance optimizations. This is a work for MSFT experts. It would be nice if you could sync internally with Core team.
As for PowerShell we will need approval using SCF because folding is not full equivalent of ToLowerCase() and correspondingly there is a littel difference with OrdinalIgnoreCase.

@SteveL-MSFT SteveL-MSFT added the WG-Engine-Performance core PowerShell engine, interpreter, and runtime performance label Mar 23, 2019
@SteveL-MSFT SteveL-MSFT added this to the 7.0-Consider milestone May 6, 2019
@SteveL-MSFT
Copy link
Member

Moving out of 7.0 as it seems .NET 5 may support this

@iSazonov
Copy link
Collaborator Author

Yes, they plan to move ICU internally in Core and unify Core behavior (to fall back to OS with p/invoke sometimes). In the case OrdinalIgnoreCase will implement the SCF, I hope with great performance.

While waiting Core 5.0 we could resolve #8180 without risk to conflict with Core.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Issue-Discussion the issue may not have a clear classification yet. The issue may generate an RFC or may be reclassif WG-Engine-Performance core PowerShell engine, interpreter, and runtime performance
Projects
None yet
Development

No branches or pull requests

5 participants