New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of BigInteger.ToString("x") #25353

Merged
merged 5 commits into from Nov 20, 2017

Conversation

@stephentoub
Member

stephentoub commented Nov 19, 2017

Benchmark:

using System;
using System.Numerics;
using BenchmarkDotNet.Attributes.Jobs;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]
[InProcess]
public class Program
{
    public static void Main() => BenchmarkRunner.Run<Program>();

    private static BigInteger s_smallBigInt = new BigInteger(100);
    private static BigInteger s_largeBigInt = BigInteger.Parse(new string('9', 2000));

    [Benchmark]
    public void BigIntegerToString_Small()
    {
        BigInteger small = s_smallBigInt;
        for (int i = 0; i < 100_000; i++) small.ToString("x");
    }

    [Benchmark]
    public void BigIntegerToString_Large()
    {
        BigInteger large = s_largeBigInt;
        for (int i = 0; i < 1_000; i++) large.ToString("x");
    }
}

Before:

 |                   Method |     Mean |     Error |    StdDev |     Gen 0 | Allocated |
 |------------------------- |---------:|----------:|----------:|----------:|----------:|
 | BigIntegerToString_Small | 20.01 ms | 0.4187 ms | 0.6519 ms | 6843.7500 |  27.47 MB |
 | BigIntegerToString_Large | 29.62 ms | 0.3435 ms | 0.3045 ms | 8468.7500 |  33.91 MB |

After:

 |                   Method |     Mean |     Error |    StdDev |     Gen 0 | Allocated |
 |------------------------- |---------:|----------:|----------:|----------:|----------:|
 | BigIntegerToString_Small | 9.445 ms | 0.1153 ms | 0.0834 ms |  750.0000 |   3.05 MB |
 | BigIntegerToString_Large | 3.946 ms | 0.0603 ms | 0.0534 ms | 1000.0000 |   4.01 MB |

So for this small value, an ~2x throughput improvement, and for this large value, an ~8x throughput improvement, and for both, an ~9x reduction in allocation.

cc: @bartonjs

// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.
using System.Buffers;

This comment has been minimized.

@danmosemsft

danmosemsft Nov 20, 2017

Member

is this worth tests of its own?

This comment has been minimized.

@danmosemsft

danmosemsft Nov 20, 2017

Member

especially as the growth path is not getting exercised, unless tests need more than 128 hex chars?

This comment has been minimized.

@stephentoub

stephentoub Nov 20, 2017

Member

unless tests need more than 128 hex chars?

There are tests that need more than 128 hex chars.

is this worth tests of its own?

I can add some.

bool success = _chars.TryCopyTo(poolArray);
Debug.Assert(success);
char[] toReturn = _arrayToReturnToPool;

This comment has been minimized.

@danmosemsft

danmosemsft Nov 20, 2017

Member

why not just

 +            if (_arrayToReturnToPool != null)
 +            {
 +                ArrayPool<char>.Shared.Return(_arrayToReturnToPool);
 +            }
 +            _chars = _arrayToReturnToPool = poolArray;

We are only doing the "cache field as local" in the most perf sensitive cases?

This comment has been minimized.

@stephentoub

stephentoub Nov 20, 2017

Member

As a defensive mechanism, because it's too easy to accidentally get something wrong returning a buffer to the pool while still holding on to that buffer. There's essentially no difference in perf between your version and mine, but mine guarantees our field no longer has a reference to the buffer when it's returned.

_pos = 0;
}
public string GetString()

This comment has been minimized.

@danmosemsft

danmosemsft Nov 20, 2017

Member

Why not ToString()? To avoid virtual dispatch?

This comment has been minimized.

@stephentoub

stephentoub Nov 20, 2017

Member

@jkotas, are virtual overrides supported on ref-like types? I thought "no" (which is why I used a different name, and I didn't want to use "new" to avoid confusion), but a simple test shows it works. Is that supported, or is it playing with fire?

This comment has been minimized.

@jkotas

jkotas Nov 20, 2017

Member

Yes, they are supported.

They are just always called non-virtually because of you cannot get a boxed copy.

This comment has been minimized.

@stephentoub

stephentoub Nov 20, 2017

Member

Yes, they are supported.

Ok, good to know. I'll just make it ToString then.

{
internal ref struct ValueStringBuilder
{
private char[] _arrayToReturnToPool;

This comment has been minimized.

@danmosemsft

danmosemsft Nov 20, 2017

Member

it would be more readable if instead of char[] and Span you just had Span plus a boolean indicating whether it wrapped an array from the pool. But I guess there is no reasonable way to get the array out of the Span? (Perhaps with some Unsafe magic)

This comment has been minimized.

@stephentoub

stephentoub Nov 20, 2017

Member

it would be more readable if instead of char[] and Span you just had Span plus a boolean indicating whether it wrapped an array from the pool

You can't get the array out of the span. And any efforts to do so would be way less readable ;)

Debug.Assert(format == 'x' || format == 'X');
// Get the bytes that make up the BigInteger.
Span<byte> bits = stackalloc byte[64]; // arbitrary limit to switch from stack to heap

This comment has been minimized.

@danmosemsft

danmosemsft Nov 20, 2017

Member

can we call this bytes, not bits? I was confused at first.

This comment has been minimized.

@stephentoub

stephentoub Nov 20, 2017

Member

This is the same name that was there; I've just changed its type from byte[] to Span<byte>. Someone more familiar with BigInteger can choose to rename this as part of a subsequent cleanup if desired, but this bits terminology is used in other places.

bits.Slice(0, bytesWritten) :
value.ToByteArray();
Span<char> stackSpace = stackalloc char[128];

This comment has been minimized.

@danmosemsft

danmosemsft Nov 20, 2017

Member

if bits.Length is large, we can predict that 128 won't be enough. We could tell VSB a good guess at the capacity we will need, and it will save it growing (potentially pathologically, reallocating every character)

This comment has been minimized.

@stephentoub

stephentoub Nov 20, 2017

Member

potentially pathologically, reallocating every character

As I noted above, that will currently only happen after 2M characters.

We could tell VSB a good guess at the capacity we will need

I can add a method to be called ahead of the loops below.

This comment has been minimized.

@stephentoub

stephentoub Nov 20, 2017

Member

I can add a method to be called ahead of the loops below.

Actually, I already have that: AppendSpan. I can just use it here.

[MethodImpl(MethodImplOptions.NoInlining)]
private void GrowAndAppend(char c)
{
Grow(1);

This comment has been minimized.

@danmosemsft

danmosemsft Nov 20, 2017

Member

reallocating and copying for just one extra char seems potentially inefficient if there are many one char appends. why not use a doubling strategy, eg?

This comment has been minimized.

@stephentoub

stephentoub Nov 20, 2017

Member

why not use a doubling strategy, eg?

It does use a doubling strategy, it's just not explicit. The call site is saying that requiredAdditionalCapacity==1, and it's up to the Grow method to ensure there's enough space. The Grow method then rents a buffer of at least the necessary size from the default ArrayPool, which itself employs a doubling scheme in that it only has buckets for arrays of powers of 2 lengths.

We can be explicit about it in Grow if desired. What I said above is true for the vast majority of use cases, but the default ArrayPool currently only has buffers for sizes up to 2MB, so if you used ValueStringBuilder to create strings larger than that, you would end up with lots of growing by 1 char.

value.ToByteArray();
Span<char> stackSpace = stackalloc char[128];
var sb = new ValueStringBuilder(stackSpace);

This comment has been minimized.

@danmosemsft

danmosemsft Nov 20, 2017

Member

given the semantics are not quite the same, and ToString doesn't work, maybe vsb instead of sb?

}
private static char GetUpperCaseHexValue(int i) => i < 10 ? (char)(i + '0') : (char)(i - 10 + 'A');

This comment has been minimized.

@danmosemsft

danmosemsft Nov 20, 2017

Member

might be clearer to just manually inline both of these. jit will duplicate them anyway presumably

This comment has been minimized.

@stephentoub

stephentoub Nov 20, 2017

Member

Each is used twice. Why write the code twice?

public bool TryCopyTo(Span<char> destination, out int charsWritten)
{
if (_pos > destination.Length)

This comment has been minimized.

@danmosemsft

danmosemsft Nov 20, 2017

Member

could be just

 +            if (! _chars.Slice(0, _pos).TryCopyTo(destination))
 +            {
 +                charsWritten = 0;
 +                return false;
               }
             

as span repeats the check ?

This comment has been minimized.

@stephentoub

stephentoub Nov 20, 2017

Member

I'll change it. The current approach slightly benefits the case where the destination isn't long enough, whereas your suggestion slightly benefits the case where the destination is long enough, and presumably the latter is more common.

}
Span<char> dst = _chars.Slice(_pos, length);
for (int i = 0; i < dst.Length; i++)

This comment has been minimized.

@davidfowl

davidfowl Nov 20, 2017

Contributor
new Span<char>(value, length).CopyTo(dst)

This comment has been minimized.

@stephentoub

stephentoub Nov 20, 2017

Member

This method is currently only used in one place, in coreclr, with a call site that passes in at most 10 characters but generally just 1 or a handful. The suggest approach to copying is much faster for larger pieces of memory, but it's slower for just a few characters; on my machine, the crossover point is at ~10 chars, with it being ~3x slower for 1 or 2 chars, so I've left it as is for now. There's an open issue dotnet/coreclr#15076 about the overheads of CopyTo.

_pos += count;
}
public unsafe void Append(char* value, int length)

This comment has been minimized.

@davidfowl

davidfowl Nov 20, 2017

Contributor

Replace this with Append(Span<char>) instead?

This comment has been minimized.

@stephentoub

stephentoub Nov 20, 2017

Member

I only added Append methods where existing callers needed them (as part of swapping this in for StringBuilder), and no existing callers had spans. Making this Span<char> wouldn't help perf. If this type is ever exposed, its surface area will need to be rationalized.

@@ -29,14 +29,19 @@
<Compile Include="$(CommonPath)\System\Globalization\FormatProvider.Number.cs">
<Link>System\Globalization\FormatProvider.Number.cs</Link>

This comment has been minimized.

@jkotas

jkotas Nov 20, 2017

Member

FormatProvider.Number.cs may benefit from ValueStringBuilder too. It is similar to Number.Formatting.cs in CoreCLR.

This comment has been minimized.

@stephentoub

stephentoub Nov 20, 2017

Member

Yup. That's necessary in support of #25336.

}
int remaining = _pos - index;
_chars.Slice(index, remaining).CopyTo(_chars.Slice(index + count));

This comment has been minimized.

@bartonjs

bartonjs Nov 20, 2017

Member

If index > _pos, is the exception here useful?

This comment has been minimized.

@stephentoub

stephentoub Nov 20, 2017

Member

You get an argument or index out of range exception, I forget which. But this is all internal code and should never incur exceptions.

for (int i = 0; i < dst.Length; i++)
{
dst[i] = c;
}

This comment has been minimized.

@bartonjs

bartonjs Nov 20, 2017

Member

This is just dst.Fill(c), right? You've used Fill elsewhere, why not here?

This comment has been minimized.

@stephentoub

stephentoub Nov 20, 2017

Member

I actually should remove the Fill I added elsewhere; at the moment it'll make it difficult to share these changes back to coreclr, as Fill is currently an extension method implemented in corefx. More generally, though, I didn't use Fill here because based on current usage count is expected to be very small, and this manual loop is currently faster for very small counts.

Append(c);
}
[MethodImpl(MethodImplOptions.NoInlining)]

This comment has been minimized.

@bartonjs

bartonjs Nov 20, 2017

Member

Why NoInlining? (Also, why all the AggressiveInlining?)

This comment has been minimized.

@stephentoub

stephentoub Nov 20, 2017

Member

Why NoInlining?

Because it's the slow path, and exactly because other things are marked AggressiveInlining... we don't want to have all of the slow path code inlined.

Also, why all the AggressiveInlining?

Because these operations are done on hot paths very frequently, e.g. per char. We want it to be as close to just storing a char into an array as possible.

}
[MethodImpl(MethodImplOptions.NoInlining)]
private void Grow(int requiredAdditionalCapacity)

This comment has been minimized.

@bartonjs

bartonjs Nov 20, 2017

Member

FWIW, I like "EnsureCapacity" better than "Grow". Grow sounds like a command ("get bigger"). EnsureCapacity just says "I need this much, I trust you to do the right thing".

This comment has been minimized.

@stephentoub

stephentoub Nov 20, 2017

Member

Grow sounds like a command ("get bigger")

That's what it is. There's even an assert below to make sure it's only being called when getting bigger is warranted.

{
Debug.Assert(requiredAdditionalCapacity > _chars.Length - _pos);
char[] poolArray = ArrayPool<char>.Shared.Rent(Math.Max(_pos + requiredAdditionalCapacity, _chars.Length * 2));

This comment has been minimized.

@bartonjs

bartonjs Nov 20, 2017

Member

Is doubling actually the right answer? Growing by some sort of fixed "page" size seems more appropriate; especially if we can get the capacity predicted up front.

Maybe it doesn't matter here for the predicted load; but I get skeptical when 16k becomes 32k and the final answer was 16k+1.

This comment has been minimized.

@stephentoub

stephentoub Nov 20, 2017

Member

Is doubling actually the right answer?

As I noted in response to Dan's question earlier, doubling is what we'll get anyway due to the default ArrayPool implementation storing only arrays of power-of-two sizes.

_chars = _arrayToReturnToPool = poolArray;
if (toReturn != null)
{
ArrayPool<char>.Shared.Return(toReturn);

This comment has been minimized.

@bartonjs

bartonjs Nov 20, 2017

Member

Do we have guidelines on clearing things before returning? In System.Security.Cryptography the answer is "always clear"; but I can appreciate that I have to be more defensive than the general case.

This comment has been minimized.

@stephentoub

stephentoub Nov 20, 2017

Member

We've iterated on that a bit, but our current stance is that you only need to clear if you want to take extra precaution because you know the data to be very sensitive, e.g. in cryptography. In general, though, we haven't been clearing. It's all same process, single appdomain, etc. There's no trust boundary.

@@ -514,11 +502,18 @@ internal static char ParseFormatSpecifier(string format, out int digits)
return (char)0; // Custom format
}
private static string FormatBigIntegerToHexString(BigInteger value, char format, int digits, NumberFormatInfo info)
private static string FormatBigIntegerToHex(BigInteger value, char format, int digits, NumberFormatInfo info)

This comment has been minimized.

@bartonjs

bartonjs Nov 20, 2017

Member

It's still returning a string... any particular driver of the rename?

This comment has been minimized.

@stephentoub

stephentoub Nov 20, 2017

Member

a) Nothing else in the class that returns a String includes that in the name, e.g. FormatBigInteger rather than FormatBigIntegerString.
b) It'll probably change anyway for #25336

{
byte b = bits[cur--];
chars[charsPos++] = GetLowerCaseHexValue(b >> 4);
chars[charsPos++] = GetLowerCaseHexValue(b & 0xF);

This comment has been minimized.

@bartonjs

bartonjs Nov 20, 2017

Member

Instead of two functions and splitting the format; would it make more sense to have a local char for 'A' vs 'a'?

This comment has been minimized.

@stephentoub

stephentoub Nov 20, 2017

Member

Instead of two functions and splitting the format; would it make more sense to have a local char for 'A' vs 'a'?

Can you elaborate?

This comment has been minimized.

@bartonjs

bartonjs Nov 20, 2017

Member
var sb = new ValueStringBuilder(stackSpace);
char alphaBase = format == 'x' ? 'a' : 'A';

...
                 if (head < 0x08 || clearHighF)
                 {
                     sb.Append(head < 10 ? (char)(head + '0') : (char)((head-10) + alphaBase));
                 }

            ...

            chars[charPos++] = GetHexValue(b >> 4, alphaBase);
            ...

private static char GetHexValue(int i, char alphaBase) => i < 10 ? (char)(i + '0') : (char)(i - 10 + alphaBase);

Depending on how optimized string or span indexing is, it might be better to dump the ternary expression math and just index into two strings.

private const string LowerHex = "0123456789abcdef";
private const string UpperHex = "0123456789ABCDEF";

...
var sb = new ValueStringBuilder(stackSpace);
string basis = format == 'x' ? LowerHex : UpperHex;

chars[charsPos++] = basis[b >> 4];
chars[charsPos++] = basis[b & 0xF];

This comment has been minimized.

@stephentoub

stephentoub Nov 20, 2017

Member

I like that string version. Will switch.

@stephentoub

This comment has been minimized.

Member

stephentoub commented Nov 20, 2017

Thanks for the feedback.

@stephentoub stephentoub merged commit 785a57f into dotnet:master Nov 20, 2017

12 checks passed

CROSS Check Build finished.
Details
Linux arm Release Build Build finished.
Details
Linux x64 Release Build Build finished.
Details
NETFX x86 Release Build Build finished.
Details
OSX x64 Debug Build Build finished.
Details
Packaging All Configurations x64 Debug Build Build finished.
Details
Tizen armel Debug Build Build finished.
Details
UWP CoreCLR x64 Debug Build Build finished.
Details
UWP NETNative x86 Release Build Build finished.
Details
WIP ready for review
Details
Windows x64 Debug Build Build finished.
Details
Windows x86 Release Build Build finished.
Details

@stephentoub stephentoub deleted the stephentoub:biginteger_tryformat branch Nov 20, 2017

@karelz karelz added this to the 2.1.0 milestone Dec 4, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment