Boxing Cache? #8423

Closed
benaadams opened this Issue Dec 2, 2016 · 16 comments

@benaadams
Collaborator

benaadams commented Dec 2, 2016

Revisiting #111

Using a repo with example managed code, pre-cached boxes for common values are shown to improve the performance of boxing (see below). Is there a way of building this into the JIT or runtime?

Or is this just madness?

Integer Box Caching

                   Method |      Mean |    StdDev |    Median | Scaled |            RPS |
------------------------- |---------- |---------- |---------- |------- |--------------- |
      Int32UncachedBoxing | 7.4069 ns | 0.0498 ns | 7.3884 ns |   1.00 | 135,008,742.93 |
        Int32CachedBoxing | 5.3065 ns | 0.0495 ns | 5.3282 ns |   0.72 | 188,448,857.58 |
 Int32CachedBoxExtenstion | 6.7465 ns | 0.0880 ns | 6.7784 ns |   0.91 | 148,226,092.97 |

Boolean Box Caching

                  Method |      Mean |    StdDev |    Median | Scaled |            RPS |
------------------------ |---------- |---------- |---------- |------- |--------------- |
      BoolUncachedBoxing | 7.3923 ns | 0.0391 ns | 7.3866 ns |   1.00 | 135,276,250.40 |
        BoolCachedBoxing | 4.5859 ns | 0.0310 ns | 4.5954 ns |   0.62 | 218,057,656.08 |
 BoolCachedBoxExtenstion | 4.5874 ns | 0.0428 ns | 4.5988 ns |   0.62 | 217,986,777.33 |

Suggestion, cache boxes for:

bool: true, false
byte: 0 to 255
char: 0 to 127
int/short: -128 to 127

Maybe others?
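
A minimal sketch of what such a cache could look like in user code, assuming the bool and int ranges listed above. `BoxCache` and its `Box` overloads are hypothetical names for illustration; they are not an existing runtime or framework API. The boxes are shared, so callers would have to treat them as read-only.

```csharp
using System;

// Hypothetical user-code box cache; not part of the runtime or framework.
public static class BoxCache
{
    private const int Min = -128;
    private const int Max = 127;

    // Boxed once, then shared by every caller.
    private static readonly object True = true;
    private static readonly object False = false;
    private static readonly object[] Int32Boxes = CreateInt32Boxes();

    private static object[] CreateInt32Boxes()
    {
        var boxes = new object[Max - Min + 1];
        for (int i = 0; i < boxes.Length; i++)
        {
            boxes[i] = i + Min; // boxing happens here, once per cached value
        }
        return boxes;
    }

    public static object Box(bool value) => value ? True : False;

    public static object Box(int value)
    {
        // One unsigned comparison covers both ends of the cached range.
        return unchecked((uint)(value - Min)) <= (uint)(Max - Min)
            ? Int32Boxes[value - Min]
            : value; // outside the cached range: allocate a fresh box
    }
}

public static class BoxCacheDemo
{
    public static void Main()
    {
        // Inside the range both calls return the same object.
        Console.WriteLine(object.ReferenceEquals(BoxCache.Box(42), BoxCache.Box(42)));
        // Outside the range each call boxes again.
        Console.WriteLine(object.ReferenceEquals(BoxCache.Box(1000), BoxCache.Box(1000)));
        Console.WriteLine(object.ReferenceEquals(BoxCache.Box(true), BoxCache.Box(true)));
    }
}
```

The range check and the array lookup are the extra work a cached box pays compared with a plain allocation, which is the trade-off the numbers above are measuring.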

Gave it a go benaadams@2f7726d, but not entirely sure what I'm doing, so you probably have a bunch of really weird Dr Watson reports...

Edit: Updated with the metrics posted in #8423 (comment)

@mikedn

Contributor

mikedn commented Dec 2, 2016

I would guess that this makes boxing slower. Who's benefiting from that? 👽s?

@jakobbotsch

Collaborator

jakobbotsch commented Dec 2, 2016

Even if it was faster, it is a bad idea since boxed value types can have their values modified. Not expressible in C# (without interface tricks), but possible in IL and C++/CLI:

void Test(Object^ obj)
{
	Int32^ i = (Int32^)obj;
	*i = 30;
}
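
The same hazard can be sketched in C# using the "interface tricks" mentioned above. `ICounter` and `Counter` are hypothetical types made up for this illustration; primitives such as `int` expose no mutating interface, which is why IL, C++/CLI, or reflection is needed for them. The sketch only shows that a box is an ordinary heap object, so a write through one reference is visible through every other reference to it, as it would be for a shared cached box.

```csharp
using System;

// Hypothetical mutable struct, for illustration only.
public interface ICounter
{
    int Value { get; }
    void Increment();
}

public struct Counter : ICounter
{
    private int _value;
    public int Value => _value;
    public void Increment() => _value++;
}

public static class SharedBoxDemo
{
    public static void Main()
    {
        object shared = new Counter(); // boxed once
        object alias = shared;         // second reference to the same box, as a cache would hand out

        // Calling through the interface operates on the box itself, not on a copy.
        ((ICounter)shared).Increment();
        Console.WriteLine(((ICounter)alias).Value); // prints 1: the change is visible via 'alias'

        // Unboxing copies the value out, so this increment does not touch the box.
        Counter copy = (Counter)shared;
        copy.Increment();
        Console.WriteLine(((ICounter)shared).Value); // prints 1: the box is unchanged
    }
}
```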
@jkotas

Member

jkotas commented Dec 2, 2016

Right, I was about to write the same comment as @Janiels

ldc.i4 0
box bool // cached box for false
unbox bool // address of the bool in the cached box
ldc.i4 1
stobj bool // cached box for false is true now
@mikedn

Contributor

mikedn commented Dec 2, 2016

> Not expressible in C# (without interface tricks),

Or reflection:

object x = 1;
x.GetType().GetField("m_value", BindingFlags.Instance | BindingFlags.NonPublic).SetValue(x, 42);
Console.WriteLine(x); // prints 42
@benaadams

Collaborator

benaadams commented Dec 2, 2016

Is changing the value of a boxed value directly that common?

To prevent subtle errors, the runtime could allocate the cached boxes from a single page and then mark that page read-only (PAGE_READONLY with VirtualProtect, or PROT_READ with mprotect), so you'd get an access violation if you tried... 😉

Obviously, if it's common practice then that couldn't be done...

@mikedn

Contributor

mikedn commented Dec 2, 2016

> Is that common?

I'd say not. But is it worth the risk? What kind of code does a lot of boxing, uses a limited enough range of numeric values and expects good performance?

@jakobbotsch

Collaborator

jakobbotsch commented Dec 2, 2016

unbox is actually specced to return a controlled-mutability managed pointer in ECMA-335. Reading about those it means that the unboxed pointer is actually read-only for the types you outlined, so that makes this slightly more interesting...

I don't really think it matters all that much, but spec-wise, modifying these unboxed values seems to be disallowed.

@benaadams

Collaborator

benaadams commented Dec 2, 2016

> What kind of code does a lot of boxing, uses a limited enough range of numeric values

String.Format: dotnet/corefx#1514
Common usages of string interpolation
Passing values to SQL: dotnet/corefx#8955
Reading values from SQL: SqlClient/SqlBuffer.cs
Structured logging: aspnet/Logging#523
Reflection: dotnet/corefx#14021
Typed parsing: System/Json/JavaScriptReader.cs
Typed JSON data: Newtonsoft.Json/JsonReader.cs

> expects good performance

Well, that works both ways: introducing a cache would likely slow down boxing slightly, but ease the load on the GC. For reference, I think the assembly helper JIT_BoxFastMP_InlineGetThread is the fastest version of boxing?

Is common practice in Java; so

Integer i0 = 127;
Integer i1 = 127;
System.out.println(i0 == i1); // Prints true, reference equality

Integer i2 = 128;
Integer i3 = 128;
System.out.println(i2 == i3); // Prints false, different references

That confuses people, but Integer has different equality semantics from int, which I think also confuses people. In C#, by contrast, reference equality would require an explicit object == object test, where it is clearer that the comparison is by reference.
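
For comparison, a small C# sketch of the same experiment. It assumes today's behaviour, in which every boxing operation produces a distinct object. With a built-in cache as proposed here, the first comparison would start returning true for small values. `BoxEqualityDemo` is just an illustrative name.

```csharp
using System;

public static class BoxEqualityDemo
{
    public static void Main()
    {
        object a = 127; // each assignment boxes into its own object today
        object b = 127;

        Console.WriteLine(a == b);           // False: '==' on object compares references
        Console.WriteLine(a.Equals(b));      // True: Int32.Equals compares values
        Console.WriteLine((int)a == (int)b); // True: unboxed ints compare by value
    }
}
```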

@mikedn

Contributor

mikedn commented Dec 2, 2016

> String.Format

String.Format is far from having stellar performance anyway. And not because of boxing. If you try something like:

int x = 42;
for (int i = 0; i < 10000000; i++)
{
    String.Format("hello {0}", x);
}

and change the type of x from int to object you won't notice any difference.

> but ease up on GC

Eh, the old story about GC. Let's cache everything because GC can't handle it. But GC handles short lived objects pretty well so it's not clear how much this will "ease up on GC".
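
One way to put a number on the allocation side of this argument is sketched below. It assumes a runtime that provides `GC.GetAllocatedBytesForCurrentThread`; `BoxAllocationDemo`, `MeasureAllocatedBytes` and the one-entry `CachedZero` cache are hypothetical names for illustration. It only counts bytes allocated and says nothing about throughput or GC pause times, which would need separate measurement.

```csharp
using System;

public static class BoxAllocationDemo
{
    private static readonly object CachedZero = 0; // hypothetical one-entry box cache

    // Returns the bytes allocated on the current thread while running 'action'.
    public static long MeasureAllocatedBytes(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    public static void Main()
    {
        const int N = 100_000;
        var sink = new object[N]; // storing the boxes keeps them from being optimized away

        long uncached = MeasureAllocatedBytes(() =>
        {
            for (int i = 0; i < N; i++) sink[i] = 0;          // boxes on every iteration
        });
        long cached = MeasureAllocatedBytes(() =>
        {
            for (int i = 0; i < N; i++) sink[i] = CachedZero; // reuses a single box
        });

        Console.WriteLine($"uncached: {uncached} bytes, cached: {cached} bytes");
    }
}
```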

> Is common practice in Java; so

Considering that even with generics Java can't store integers in collections without boxing, I'd say that it needs such caching more than .NET does. Besides, that Stack Overflow question is a good enough reason to not attempt to implement such a trick in .NET. Because this is a trick and a rather ugly one.

This is something that user code can do reasonably well when truly needed. For example, WPF does this for bool and a couple of WPF-specific enums (AFAIR Visibility is one of those enums). It does this because of the way it stores property values: always in boxed form. That's a case where having just 2 instances of bool instead of a zillion does actually save something: memory, GC time, etc.

@benaadams

Collaborator

benaadams commented Dec 2, 2016

> String.Format is far from having stellar performance anyway.

Added a few more examples. Longer-lived cases would be typed documents held in memory (JSON, CSV, XAML, etc.), as in your WPF example.

> This is something that user code can do reasonably well when truly needed. For example WPF does this for bool and ...

And people create their own boxing caches again and again dotnet/corefx#6533, LinqExpressions

@jkotas

Member

jkotas commented Dec 2, 2016

> Eh, the old story about GC. Let's cache everything because GC can't handle it.

Exactly. You do not actually care whether there are thousands of GCs or no GC. What you care about is how fast a thing runs and what the GC pauses are. And caching short-lived objects does not improve either.

> And people create their own boxing caches again and again

People who do not measure do... The example from Expressions makes sense because these objects seem to be long-lived. It is a rare case. The CustomAttributeDecoder example makes less sense.

@benaadams

Collaborator

benaadams commented Dec 2, 2016

Maybe I should have led with a repo you can try locally and some metrics for it

Integer Box Caching

                   Method |      Mean |    StdDev |    Median | Scaled |            RPS |
------------------------- |---------- |---------- |---------- |------- |--------------- |
      Int32UncachedBoxing | 7.4069 ns | 0.0498 ns | 7.3884 ns |   1.00 | 135,008,742.93 |
        Int32CachedBoxing | 5.3065 ns | 0.0495 ns | 5.3282 ns |   0.72 | 188,448,857.58 |
 Int32CachedBoxExtenstion | 6.7465 ns | 0.0880 ns | 6.7784 ns |   0.91 | 148,226,092.97 |

Boolean Box Caching

                  Method |      Mean |    StdDev |    Median | Scaled |            RPS |
------------------------ |---------- |---------- |---------- |------- |--------------- |
      BoolUncachedBoxing | 7.3923 ns | 0.0391 ns | 7.3866 ns |   1.00 | 135,276,250.40 |
        BoolCachedBoxing | 4.5859 ns | 0.0310 ns | 4.5954 ns |   0.62 | 218,057,656.08 |
 BoolCachedBoxExtenstion | 4.5874 ns | 0.0428 ns | 4.5988 ns |   0.62 | 217,986,777.33 |

Updated summary

@jkotas

Member

jkotas commented Dec 2, 2016

A microbenchmark is good, but it does not tell the full story. This would need to be looked at in the context of real workloads like Roslyn or ASP.NET.

@benaadams

Collaborator

benaadams commented Dec 3, 2016

Roslyn caches boxes for true, false, all zeros, int32 1, and chars 0 to 127: https://github.com/dotnet/roslyn/blob/master/src/Compilers/Core/Portable/Collections/Boxes.cs

But I take your point

@mikedn

Contributor

mikedn commented Dec 3, 2016

> And people create their own boxing caches again and again dotnet/corefx#6533, LinqExpressions

That might mean that the framework should provide some kind of mechanism to help with this issue. But it doesn't necessarily mean that the mechanism should be built into the boxing operation itself. A Box method that can be called from user code as needed might be just enough. People who need this and are happy with what it offers will use it. Those who aren't happy with what it offers may still do their own thing.

@benaadams

Collaborator

benaadams commented Dec 3, 2016

> That might mean that the framework should provide some kind of mechanism to help with this issue.

From my benchmark tests, what I was attempting was the wrong approach, as it was a (virtual) call in the runtime, and about half of the gain is already lost if it's not inlined.

So (if automatic) it would need to be either a code replacement in the JIT or something like a Roslyn generator, depending on how that plays out (dotnet/roslyn#5561)?

And... I'd guess the second would be preferred as it would be a user choice/reference install?
