Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Optimize Char.GetUnicodeCategory and related checks #20864

Merged
merged 1 commit into from
Nov 9, 2018
Merged

Optimize Char.GetUnicodeCategory and related checks #20864

merged 1 commit into from
Nov 9, 2018

Conversation

pentp
Copy link

@pentp pentp commented Nov 7, 2018

The static data change is inspired by #20768 and range checks are changed to a single conditional instead of multiple branches.
I didn't do perf tests, but asm diffs had only wins.

@pentp
Copy link
Author

pentp commented Nov 8, 2018

@dotnet-bot test Ubuntu x64 Checked CoreFX Tests

@ufcpp
Copy link

ufcpp commented Nov 8, 2018

Can this be also applicable to CharUnicodeInfoData and avoid pinning by using MemoryMarshal.AsBytes?

@pentp
Copy link
Author

pentp commented Nov 8, 2018

Only partially - currently only byte[] is supported by the compiler/runtime, other types still have to use regular arrays.

@@ -75,20 +75,20 @@ namespace System
// Return true for all characters below or equal U+00ff, which is ASCII + Latin-1 Supplement.
private static bool IsLatin1(char ch)
{
return (ch <= '\x00ff');
return (uint)ch <= '\x00ff';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may be able to shave a bounds check by comparing with 0x100 here (have not verified it).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried it, didn't help. Also tried to switch the order. I think there was some special way to write this comparison so that Roslyn encodes it in a way that jit recognises (it's used in number parsing if remember correctly).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pentp How do those casts to uint help? Disassembly seems almost identical: SharpLab.io.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping they would help avoid a range check, but it didn't work.

@jkotas
Copy link
Member

jkotas commented Nov 8, 2018

@dotnet-bot test this please

@pentp
Copy link
Author

pentp commented Nov 9, 2018

@dotnet-bot test OSX10.12 x64 Checked Innerloop Build and Test

@jkotas jkotas merged commit 9406df8 into dotnet:master Nov 9, 2018
dotnet-maestro-bot pushed a commit to dotnet-maestro-bot/corefx that referenced this pull request Nov 9, 2018
…0864)

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
@pentp pentp deleted the char branch November 9, 2018 15:36
stephentoub pushed a commit to dotnet/corefx that referenced this pull request Nov 9, 2018
…0864)

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
dotnet-maestro-bot pushed a commit to dotnet-maestro-bot/corert that referenced this pull request Nov 9, 2018
…0864)

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
@pentp
Copy link
Author

pentp commented Nov 10, 2018

@jkotas CharUnicodeInfoData.cs file mentions a script named GenUnicodeProp.pl that is used to generate the file - where could I find this script?

@jkotas
Copy link
Member

jkotas commented Nov 10, 2018

@jkotas CharUnicodeInfoData.cs file mentions a script named GenUnicodeProp.pl that is used to generate the file - where could I find this script?

@tarekgh ?

@tarekgh
Copy link
Member

tarekgh commented Nov 10, 2018

@pentp you can find the script here https://1drv.ms/f/s!AhP2SwMuINnCjORSJkPdx8NQfMDf9A
If you are going to generate the Unicode data, please try to use the latest UnicodeData.txt file from https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt

@jkotas
Copy link
Member

jkotas commented Nov 10, 2018

@tarekgh Would it make sense to commit the script into the repo?

@tarekgh
Copy link
Member

tarekgh commented Nov 10, 2018

I was holding on doing that because I wanted to re-write and also not using using perl. Note that this script produce the desktop generated bin files too.

if you think it is ok to have in the repo with the current state I don't mind putting there.

@jkotas
Copy link
Member

jkotas commented Nov 10, 2018

Perfect is the enemy of good enough.

@pentp
Copy link
Author

pentp commented Nov 10, 2018

I'll try rewriting it in C# (excluding desktop bin parts) and then try to refactor the output to use ROS<byte> indexes (probably split the index levels) and maybe make it endianess neutral also.

MichalStrehovsky pushed a commit to dotnet/corert that referenced this pull request Nov 12, 2018
…0864)

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants