Optimizing Int32 Primitive Parsers and clean up #1616

ahsonkhan · 2017-06-16T10:32:11Z

Also adding some performance tests and additional test cases.
Optimized Invariant UTF-8 and Non-Invariant (for Int32). I would imagine similar optimizations can be applied to the rest.

Here are the key optimizations:

TBD

cc @KrzysztofCwalina, @shiftylogic

Edit: Updated the results with latest changes

ahsonkhan · 2017-06-17T00:58:19Z

cc @jkotas

Any thoughts on how I can improve the performance for small integers/strings for Invariant UTF8 TryParse?

public static bool TryParseInt32(ReadOnlySpan<byte> text, out int value, out int bytesConsumed)

I think we can get some more savings to avoid the ~10% perf regression for a single digit.

jkotas · 2017-06-17T01:24:15Z

First, you should fix the buffer overruns in your unsafe code before trying to figure out how to optimize it.

I do not think you should be using unsafe code for number parsing. The overhead of the bounds check should be negligible - assuming that the JIT works as expected.

jkotas · 2017-06-17T01:25:59Z

And yes ... it can be optimized. You can start by reading the first character from memory no more than once.

jkotas · 2017-06-20T14:58:39Z

@alexandrnikitin is working on improving Int32 parser in CoreLib: dotnet/coreclr#12196. Some of the tricks can be shared.

ahsonkhan · 2017-06-21T05:51:40Z

The PR is ready for review. I have updated the performance results in the original post.

jkotas · 2017-06-21T13:18:23Z

src/System.Text.Primitives/System/Text/Parsing/InvariantSigned.cs

-                if (text[0] == '-')
+                sbyte sign = 1;
+                int index = 0;
+                byte num = text[index];


It will likely work better for this to be int, so that you pay for byte->int zero extension just once.

jkotas · 2017-06-21T13:19:15Z

src/System.Text.Primitives/System/Text/Parsing/InvariantSigned.cs

-                // Parse the first digit separately. If invalid here, we need to return false.
-                int firstDigit = text[indexOfFirstDigit] - 48; // '0'
-                if (firstDigit < 0 || firstDigit > 9)
+                if (num >= '0' && num <= '9')


You can use a trick like (uint)(num - '0') <= ('9' - '0') to save an extra conditional branch in these.
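A minimal sketch of the single-comparison range check, assuming the subtraction is compared as unsigned. The helper name `IsAsciiDigit` is hypothetical and is not taken from the PR.

```csharp
using System;

// Hypothetical helper: one unsigned comparison replaces the pair
// (b >= '0' && b <= '9'). Bytes below '0' make the subtraction negative,
// and the uint cast turns that into a very large value that fails the test.
bool IsAsciiDigit(byte b) => unchecked((uint)(b - '0')) <= (uint)('9' - '0');

Console.WriteLine(IsAsciiDigit((byte)'7')); // a digit
Console.WriteLine(IsAsciiDigit((byte)'/')); // the byte just below '0'
Console.WriteLine(IsAsciiDigit((byte)':')); // the byte just above '9'
```

Whether this actually saves a branch depends on the JIT version; the disassembly is worth checking before relying on it.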

It'd be nice if the JIT would do this (can't remember if there's an issue tracking it). Better yet, the C# compiler or an IL optimizer.

if there's an issue tracking it

I am not able to find one (it was discussed in dotnet/coreclr#4881 and other PRs). @ahsonkhan Could you please make one?

Logged https://github.com/dotnet/roslyn/issues/20375 on the C# side.

Can't remember if there's an issue tracking it

AFAIK there's no such issue. Though I happen to have an experimental JIT change that does this optimization. It remains to be seen what will come out of it.

VSadov · 2017-06-21T20:54:05Z

src/System.Text.Primitives/System/Text/Parsing/InvariantSigned.cs

                if (text.Length < 1)
                {
+                    charsConsumed = 0;


default(int) ? is that just 0 ?

yes, though default implies it isn't necessarily set vs explicitly setting to 0 which suggests a chosen value? So semantically using value = default(int); makes more sense?

Yes, of course. This code will likely change quite a bit, so I will address it then.
I will change it to value=default in the InvariantUTF8.TryParse case, since that is the candidate method I am optimizing.

VSadov · 2017-06-21T21:00:53Z

src/System.Text.Primitives/System/Text/Parsing/Signed.cs

+        [MethodImpl(MethodImplOptions.AggressiveInlining)]
+        private static bool IsDigit(int i)
+        {
+            return i >= 0 && i <= 9;


(uint)i <= 9

Why? The first check should already have guaranteed the value is not negative.

private static bool IsDigitA(int i) { return (uint)i <= 9; }
private static bool IsDigitB(int i) { return i >= 0 && i <= 9; }

IsDigitA(Int32)
L0000: cmp ecx, 0x9
L0003: setbe al
L0006: movzx eax, al
L0009: ret

IsDigitB(Int32)
L0000: test ecx, ecx
L0002: jl L000e
L0004: cmp ecx, 0x9
L0007: setle al
L000a: movzx eax, al
L000d: ret
L000e: xor eax, eax
L0010: ret

Ah, I misread as return i >=0 && (uint)i <= 9 rather than just replace with return (uint)i <= 9.

VSadov · 2017-06-21T21:14:29Z

src/System.Text.Primitives/System/Text/Parsing/Signed.cs

+        }
+
+        // If parsedValue > (sbyte.MaxValue / 10), any more appended digits will cause overflow.
+        // if parsedValue == (sbyte.MaxValue / 10), any nextDigit greater than 7 or 8 (depending on sign) implies overflow.


the second line of the comment does not match the implementation. not sure which is right. the comment?

Is this the part you are referring to?

// if parsedValue == (sbyte.MaxValue / 10)

Are you suggesting something like this:

// If parsedValue > (sbyte.MaxValue / 10), any more appended digits will cause overflow.
// Else, any nextDigit greater than 7 or 8 (depending on sign) implies overflow.

Hmm, just realized we still need this check (i.e. the comment is correct):

if parsedValue == (sbyte.MaxValue / 10)

Will fix.
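To make the rule in that comment concrete, here is a small sketch for the sbyte case. `WillOverflowSByte` is a hypothetical stand-in for the helper under review, and the real implementation in the PR may differ. It assumes `parsedValue` is the non-negative magnitude parsed so far, `nextDigit` is in 0..9, and `sign` is +1 or -1.

```csharp
using System;

// Hypothetical stand-in for the overflow pre-check discussed above.
bool WillOverflowSByte(int parsedValue, int nextDigit, int sign)
{
    const int limit = sbyte.MaxValue / 10;   // 12
    if (parsedValue > limit) return true;    // 13x is already past 127 / 128
    if (parsedValue < limit) return false;   // at most 119, always fits
    // parsedValue == 12: the last digit may be at most 7 (127), or 8 for -128.
    int maxLastDigit = sign > 0 ? 7 : 8;
    return nextDigit > maxLastDigit;
}

Console.WriteLine(WillOverflowSByte(12, 8, 1));   // 128 does not fit in sbyte
Console.WriteLine(WillOverflowSByte(12, 8, -1));  // -128 does fit
```

The equality case is the one that needs the sign: dropping it would either reject -128 or accept 128.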

VSadov · 2017-06-21T22:28:29Z

src/System.Text.Primitives/System/Text/Parsing/InvariantSigned.cs

+                                if (num >= '0' && num <= '9')
+                                {
+                                    num -= (byte)'0';
+                                    if (WillOverFlow(answer, num, sign)) goto FalseExit;


This seems very complicated. My suggestion would be

separate parsing of input longer than Int32OverflowLength into a separate method. Shorter numbers are much more common.

WillOverFlow seems pretty complex for what it does. a * 10 + b is not so expensive nowadays. Perhaps it is cheaper to just add and see if sign flips ?

Here is basically the idea for parsing numbers longer than Int32OverflowLength, I wonder if it will work better:

static void Main(string[] args)
{
    for (long i = int.MaxValue - 1000; i < int.MaxValue + 10L; i++)
    {
        // lets parse "chars"
        string chars = i.ToString();

        // this will need to come from parsing leading '-'
        int sign = 1;

        // here is our result
        int accumulator = 0;

        foreach (var c in chars)
        {
            // will need to check if 'c' is a digit. Assume it is..
            var d = c - '0';

            // skip leading 0s
            if ((d | accumulator) == 0) continue;

            // try add a digit
            if (!TryAddNoOverflow(ref accumulator, d, sign))
            {
                System.Console.WriteLine("not representable " + (i));
            }
        }
    }
}

// accumulate the value and ensure the sign
// returns false when number is no longer representable in 32bit
static bool TryAddNoOverflow(ref int value, int nextDigit, int sign)
{
    var newValue = value * 10 + (nextDigit * sign);

    // opposite sign check for {newValue, sign}
    if ((newValue ^ sign) < 0)
    {
        return false;
    }

    value = newValue;
    return true;
}

// opposite sign check for {newValue, sign}

This won't always give you the right result.

If value becomes > 2* int.MaxValue when a new digit gets added, the sign doesn't change.

TryAddNoOverflow(214748364, 7, 1); => Returns True, correct.
TryAddNoOverflow(514748364, 7, 1); => Returns True, wrong.

right, i did not realize that. Most of this solution would not be applicable then.
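The counterexample can be reproduced directly. The sketch below repeats the sign-flip check and sets it next to a widened variant. `TryAddWidened` is an assumption added for contrast and is not code from the PR.

```csharp
using System;

// The sign-flip check from the sketch above, reproduced to show where it breaks.
bool TryAddSignFlip(ref int value, int nextDigit, int sign)
{
    int newValue = unchecked(value * 10 + nextDigit * sign);
    if ((newValue ^ sign) < 0) return false;
    value = newValue;
    return true;
}

// Hypothetical alternative: do the arithmetic in 64 bits and compare against
// the Int32 range, so wrap-around cannot hide an overflow.
bool TryAddWidened(ref int value, int nextDigit, int sign)
{
    long newValue = (long)value * 10 + nextDigit * sign;
    if (newValue < int.MinValue || newValue > int.MaxValue) return false;
    value = (int)newValue;
    return true;
}

int a = 514748364;
Console.WriteLine(TryAddSignFlip(ref a, 7, 1)); // accepted; a now holds a wrapped value
int b = 514748364;
Console.WriteLine(TryAddWidened(ref b, 7, 1));  // rejected; b is left unchanged
```

The product 5147483647 wraps past 2^32 and lands on a positive number, so the sign never flips and the first check accepts it.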

ahsonkhan · 2017-06-23T08:20:51Z

@dotnet-bot test Innerloop OSX10.12 Debug Build and Test

ahsonkhan · 2017-06-23T09:02:30Z

@dotnet-bot test Innerloop OSX10.12 Debug Build and Test

ahsonkhan · 2017-06-23T20:19:27Z

Here is what I have in mind (other suggestions are incorporated as well):

index++;
if (index < textLength) goto Done;
if (!IsDigit(text[index])) goto Done;

Should line 2 above be if (index <= textLength)? Otherwise, there is an argument out of range exception.
Also, after changing line 2, a test like this would fail: Parse("21474836471") - should return false, it returns true since if (lAnswer > Int32.MaxValue) is false.

ahsonkhan · 2017-06-23T20:19:56Z

@jkotas, @VSadov
Thoughts on a loop unrolled version? It seems quite promising:
https://gist.github.com/ahsonkhan/dba0afb3304de15be844111d0f8a3ea6

jkotas · 2017-06-23T20:55:18Z

Should line 2 above be if (index <= textLength)?

Yes, it should be if (index >= textLength).

jkotas · 2017-06-23T21:03:45Z

Thoughts on a loop unrolled version? It seems quite promising:

The switch works great for microbenchmarks that are not sensitive to code size (the manual unrolling is like 10x code bloat); and that are always parsing numbers with exact same number of digits - the indirect jump that switch compiles into is well predicted. Once you throw more realistic mix on it - where the number of digits is not always the same and there is a bunch of other code running - it becomes not so good. We used to have similar switch in the memcpy implementation in CoreLib. dotnet/coreclr#9786 got rid of it because it did not work for real workloads.

So it is tricky to see whether the switch is a good thing on microbenchmarks.

jkotas · 2017-06-23T21:17:21Z

Loop unrolled version without a switch may be interesting though. Something like:

   if (index >= textLength) goto Done;
   num = text[index];
   if (!IsDigit(num)) goto Done;
   index++;
   answer = 10 * answer + num - '0';

   ... same block repeated ...

   if (index >= textLength) goto Done;
   num = text[index];
   if (!IsDigit(num)) goto Done;
   index++;
   answer = 10 * answer + num - '0';

   // Potential overflow
   if (index >= textLength) goto Done;
   num = text[index];
   if (!IsDigit(num)) goto Done;
   long lAnswer = (long)answer * 10 + num - '0';
   if (sign < 0)
   {
       if (lAnswer > (long)Int32.MaxValue + 1) goto FalseExit;
   }
   else
   {
       if (lAnswer > (long)Int32.MaxValue) goto FalseExit;
   }
   answer = (int)lAnswer;
   index++;

   if (index >= textLength) goto Done;
   if (!IsDigit(text[index])) goto Done;

   // Guaranteed overflow
   goto FalseExit;

You can also tune the code bloat factor by unrolling just the first few digits - where it improves the microbenchmark significantly, and run a regular loop for the rest - where the unrolling benefit starts to diminish.
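A rough sketch of that partial-unrolling idea: the first two digits are handled inline, where overflow is impossible, and the rest go through a regular loop with a range check. The method name, the unroll depth of two, and the use of a 64-bit accumulator are all assumptions for illustration; this is not the code that was merged.

```csharp
using System;
using System.Text;

bool IsDigit(int c) => unchecked((uint)(c - '0')) <= 9;

// Hypothetical sketch: parses an optional '-' followed by decimal digits.
bool TryParseInt32Sketch(ReadOnlySpan<byte> text, out int value, out int bytesConsumed)
{
    value = 0;
    bytesConsumed = 0;
    if (text.Length < 1) return false;

    int index = 0;
    int sign = 1;
    int num = text[0];                 // read the first byte only once
    if (num == '-')
    {
        sign = -1;
        index++;
        if ((uint)index >= (uint)text.Length) return false;
        num = text[index];
    }
    if (!IsDigit(num)) return false;

    // First digit (already loaded into num); cannot overflow.
    long answer = num - '0';
    index++;

    // Second digit, unrolled; still cannot overflow.
    if ((uint)index >= (uint)text.Length) goto Done;
    num = text[index];
    if (!IsDigit(num)) goto Done;
    answer = answer * 10 + (num - '0');
    index++;

    // Remaining digits: regular loop with a range check on the magnitude.
    long limit = sign < 0 ? (long)int.MaxValue + 1 : int.MaxValue;
    while ((uint)index < (uint)text.Length)
    {
        num = text[index];
        if (!IsDigit(num)) break;
        answer = answer * 10 + (num - '0');
        if (answer > limit) return false;
        index++;
    }

Done:
    value = (int)(answer * sign);
    bytesConsumed = index;
    return true;
}

byte[] input = Encoding.UTF8.GetBytes("-12345abc");
bool ok = TryParseInt32Sketch(input, out int parsed, out int consumed);
Console.WriteLine($"{ok} {parsed} {consumed}");
```

How many digits to unroll is a tuning choice; it would need measuring on a mix of input lengths rather than a fixed-length microbenchmark.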

jkotas · 2017-06-23T21:21:14Z

src/System.Text.Primitives/System/Text/Parsing/Signed.cs


-            int sign = 1;
-            if ((TextEncoder.Symbol)nextSymbol == TextEncoder.Symbol.MinusSign)
+            ref byte textByte = ref text.DangerousGetPinnableReference();


This should not be using unsafe code either.

jkotas · 2017-06-23T22:56:08Z

Any ideas on how such an optimization would work for Int64 parsing

The following modification should work equally well for both Int32 and Int64.

// Potential overflow 
if (index >= textLength) goto Done;
num = text[index];
if (!IsDigit(num)) goto Done; 
if (answer > Int32.MaxValue/10 + 1) goto FalseExit; // Overflow
answer = answer * 10 + num - '0'; 

if ((uint)answer > (uint)Int32.MaxValue + (-1 * sign + 1) / 2) goto FalseExit; // Overflow
index++;
if (index >= textLength) goto Done;
if (!IsDigit(text[index])) goto Done; 

// Guaranteed overflow 
goto FalseExit;
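The two checks in that snippet can be isolated into a helper to see how they cooperate. This is a sketch under assumptions: `TryAppendTenthDigit` is a hypothetical name, `answer` is taken to hold the first nine digits as a non-negative magnitude, `digit` is 0..9, and `sign` is +1 or -1.

```csharp
using System;

// Hypothetical helper for the "potential overflow" step on the 10th digit.
bool TryAppendTenthDigit(int answer, int digit, int sign, out int result)
{
    result = 0;
    // Above this bound, answer * 10 + digit could wrap past 2^32 and look small again.
    if (answer > int.MaxValue / 10 + 1) return false;
    answer = unchecked(answer * 10 + digit);
    // Reinterpreted as unsigned, the magnitude may be at most 2147483647 for
    // positive input and 2147483648 for negative input.
    if (unchecked((uint)answer) > (uint)int.MaxValue + (-1 * sign + 1) / 2) return false;
    result = unchecked(answer * sign);
    return true;
}

Console.WriteLine(TryAppendTenthDigit(214748364, 8, -1, out int min) && min == int.MinValue);
Console.WriteLine(TryAppendTenthDigit(214748364, 8, 1, out _));
```

The first check keeps the product below 2^32, which is what allows the second, unsigned comparison to see every overflow instead of a wrapped value.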

jkotas · 2017-06-23T23:02:37Z

Is this optimization in the right direction? We save one conditional and substitute with some arithmetic instructions

It looks better - do you see any measurable perf difference? It will be different if you take the unification for Int32/Int64 above. And for 32-bit vs. 64-bit platforms.

It would be interesting to see the code for the full parse method - for both x86 and x64. Could you please post it into a gist once you get something you are happy with?

ahsonkhan · 2017-06-23T23:17:33Z

The following modification should work equally well for both Int32 and Int64.

Ah, lol, leveraging unsigned int works :)

What about unsigned long parsing?

jkotas · 2017-06-23T23:31:21Z

What about unsigned long parsing?

Check that the answer got smaller (same for uint32 parsing):

if (answer > UInt64.MaxValue/10 + 1) goto FalseExit; // Overflow
ulong answer2 = answer * 10 + num - '0'; 

if (answer2 < answer) goto FalseExit; // Overflow
answer = answer2;
index++;

BTW: long/ulong should not have the loop fully unrolled - having the same bit repeated ~18x would be too much, I think.
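The "answer got smaller" test can be sketched as a standalone helper. `TryAppendDigitUInt64` is a hypothetical name, and `digit` is assumed to be in 0..9.

```csharp
using System;

// Hypothetical helper showing the wrap-around test for UInt64.
bool TryAppendDigitUInt64(ulong answer, uint digit, out ulong result)
{
    result = 0;
    // Beyond this bound the product can wrap far enough to exceed the old value again.
    if (answer > ulong.MaxValue / 10 + 1) return false;
    ulong answer2 = unchecked(answer * 10 + digit);
    // With the bound above, any wrap past 2^64 leaves a value below the old answer.
    if (answer2 < answer) return false;
    result = answer2;
    return true;
}

Console.WriteLine(TryAppendDigitUInt64(1844674407370955161UL, 5, out ulong max) && max == ulong.MaxValue);
Console.WriteLine(TryAppendDigitUInt64(1844674407370955161UL, 6, out _));
```

The first guard matters: without it, a value such as 3000000000000000000 multiplied by 10 wraps to something larger than the original and the second check would miss it.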

ahsonkhan · 2017-06-23T23:37:06Z

BTW: long/ulong should have the loop fully unrolled - having the same bit repeated ~18x would be too much, I think.

should NOT have the loop fully unrolled?

jkotas · 2017-06-23T23:38:00Z

right

ahsonkhan · 2017-06-24T04:25:31Z

It will be different if you take the unification for Int32/Int64 above. And for 32-bit vs. 64-bit platforms.
It would be interesting to see the code for the full parse method - for both x86 and x64. Could you please post it into a gist once you get something you are happy with?

@jkotas, here is the data and disassembly:
x64: https://gist.github.com/ahsonkhan/dba0afb3304de15be844111d0f8a3ea6#gistcomment-2131566
x86: https://gist.github.com/ahsonkhan/dba0afb3304de15be844111d0f8a3ea6#gistcomment-2131567

ahsonkhan · 2017-06-24T04:38:43Z

Is the PR good to go or should I update the overflow checks to one of the following options?
(This is currently in the PR)

if (lAnswer > (long)Int32.MaxValue + (-1 * sign + 1) / 2) goto FalseExit;

(Option 1)

if (sign < 0)
{
    if (lAnswer > (long)Int32.MaxValue + 1) goto FalseExit;
}
else
{
    if (lAnswer > Int32.MaxValue) goto FalseExit;
}

(Option 2)

if ((uint)answer > (uint)Int32.MaxValue + (-1 * sign + 1) / 2) goto FalseExit;

(Option 3)

if (sign < 0)
{
    if ((uint)answer > (uint)Int32.MaxValue + 1) goto FalseExit;
}
else
{
    if ((uint)answer > Int32.MaxValue) goto FalseExit;
}

jkotas · 2017-06-24T05:03:33Z

update the overflow checks to one of the following options?

I do not think that it matters a lot which option you pick. Option 2 is probably smallest/fastest code.

jkotas · 2017-06-24T05:09:47Z

04BB16A4  cmp         edx,ebx  
04BB16A6  jge         04BB1923  
04BB16AC  cmp         edx,edi  
04BB16AE  jae         04BB1943

It looks like caching the length in the textLength local is hurting, because the JIT just burns a register for it. I think you should use text.Length instead everywhere, and get rid of the local.

Also, these bounds checks should not be duplicated. I know it was discussed before, but I do not remember the conclusion ... you may need to help the JIT to eliminate the redundant check by using unsigned comparison like if ((uint)index >= (uint)text.Length).
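For reference, the suggested pattern looks roughly like this. `SumDigits` is a hypothetical example, not code from the PR, and whether the JIT actually drops the span's own bounds check with this pattern depends on the JIT version, so the disassembly should be checked.

```csharp
using System;

// Hypothetical example: the loop condition compares index and Length as unsigned,
// using text.Length directly instead of a cached local.
int SumDigits(ReadOnlySpan<byte> text)
{
    int sum = 0;
    int index = 0;
    while ((uint)index < (uint)text.Length)
    {
        int num = text[index];
        if (unchecked((uint)(num - '0')) > 9) break; // stop at the first non-digit
        sum += num - '0';
        index++;
    }
    return sum;
}

Console.WriteLine(SumDigits(new byte[] { (byte)'1', (byte)'2', (byte)'x' }));
```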

ahsonkhan · 2017-06-26T21:17:15Z

you may need to help the JIT to eliminate the redundant check by using unsigned comparison

That did not help remove the duplicated bounds check.

ahsonkhan · 2017-06-26T21:37:16Z

@shiftylogic, @jkotas - good to go?

shiftylogic · 2017-06-26T21:45:32Z

This code makes assumptions that the values in the SymbolTable.Symbol enum are very specific. I suggest adding a comment to that enum stating that the parsing code depends on these specific value settings and that they should not be touched without being very careful of breaking the parsing code.

ahsonkhan · 2017-06-26T21:56:58Z

I suggest adding a comment to that enum stating that the parsing code depends on these specific value settings and that they should not be touched without being very careful of breaking the parsing code.

I believe the tests would fail if the enums changed. I will definitely add a comment though.
However, this PR did not introduce this dependency on the SymbolTable, it already existed.

Optimizing some int32 parsers and clean up

171a23a

ahsonkhan requested review from KrzysztofCwalina and shiftylogic June 16, 2017 10:32

dnfclas added the cla-already-signed label Jun 16, 2017

wip - adding tests and fixing impl bugs

befd080

ahsonkhan added 4 commits June 20, 2017 14:44

WIP - new implementation and tests to compare with previous

174e2f8

WIP - fixing non-invariant parser bugs and adding tests

568e736

Cleaning up functional and performance tests and removing old code.

c8f775d

Removing unnecessary using directive

e5c0cac

jkotas reviewed Jun 21, 2017

ahsonkhan mentioned this pull request Jun 21, 2017

Api brotli changes #1621

Merged

VSadov reviewed Jun 21, 2017

Addressing PR comments and adding more tests.

09cd24c

Merge branch 'master' of https://github.com/dotnet/corefxlab into OptimizingParser

0e0ff41

jkotas reviewed Jun 23, 2017

Addressing PR comments and adding loop unrolling.

67183fe

Merge branch 'master' of https://github.com/dotnet/corefxlab into OptimizingParser

ba83ad1

Fixing issues missed after merge

91a6191

ahsonkhan added 2 commits June 23, 2017 18:47

Fixing non-invariant int32 parser

65b8221

Addressing PR comment, removing unused DangerousGetPinnableReference

cfa230f

Cleanup and updating test helpers

86f4969

shiftylogic and others added 2 commits June 26, 2017 14:08

Patch to dotnet install scripts for downloading via new blob URL. (dotnet#1632)

e89a809

Removing text.Length cache and changing to unsigned comparison

f4c8e83

ahsonkhan and others added 3 commits June 26, 2017 16:51

Adding comment and removing unnecessary math operations using 0 (D0)

b7ce275

Merge branch 'master' into OptimizingParser

b7800f8

ahsonkhan merged commit 5dbb7ad into dotnet:master Jun 27, 2017

ahsonkhan deleted the OptimizingParser branch June 27, 2017 03:33

ahsonkhan mentioned this pull request Jun 28, 2017

Combining the T[] and OwnedBuffer<T> fields of Buffer<T> into a single object #1634

Merged

ahsonkhan mentioned this pull request Nov 8, 2017

Unroll loop for Utf8Parser unsigned integer parsers dotnet/corefx#25130

Merged

VSadov mentioned this pull request Jan 31, 2020

Consider detecting and optimizing common range check patterns dotnet/runtime#13347

Closed
