(Discussion) Lightweight Boxing? #111

Closed
AdamSpeight2008 opened this Issue Feb 6, 2015 · 43 comments

Comments

10 participants
@AdamSpeight2008
Contributor

AdamSpeight2008 commented Feb 6, 2015

The current box used in IL / .NET is more akin to a shipping container (roughly 20x the cost of the value itself).
What if it were possible to have a less costly, lightweight form of box (1x .. 3x)?

What would be needed?

@Alexx999
Contributor

Alexx999 commented Feb 6, 2015

I don't think this is really possible. What boxing does, basically, is allocate memory on the managed heap (as opposed to the thread stack, where structs live) and copy the struct's value there. So you must pay for allocation, copying, and garbage collection, just as you would if you used an object instead of a struct in the first place. It can even cost more: an object is allocated only once, while a struct may be boxed many times during its lifetime.

It's likely possible to make some optimizations here, but I highly doubt they would yield 10x (or it would have been done by now).
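To make that cost concrete, here is a minimal C# sketch of the allocate-and-copy work each box/unbox pair implies (the non-generic ArrayList is just a convenient untyped container that forces boxing):

```csharp
using System.Collections;

// Storing an int in an untyped container boxes it: each Add allocates a
// fresh heap object and copies the value in; each cast back unboxes it
// with a runtime type check and a copy out.
var list = new ArrayList();
int total = 0;
for (int i = 0; i < 1000; i++)
{
    list.Add(i);            // box: heap allocation + copy
    total += (int)list[i];  // unbox: type check + copy back
}
// total is 499500, and the loop made 1000 heap allocations along the way.
```

With a `List<int>` instead, the same loop performs no per-element allocations, which is the generics point raised later in this thread.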

@mikedn
Contributor

mikedn commented Feb 6, 2015

It is possible to convert certain heap allocations to stack allocations, but the analysis required to do so is pretty expensive for a JIT compiler. This optimization is more likely to appear in an AOT compiler, such as .NET Native.

@Alexx999
Contributor

Alexx999 commented Feb 6, 2015

I think Java has this optimization in its JIT: instead of having "classes" and "structs", they have a JIT that is able to convert heap allocations to stack allocations when possible.

As for AOT: performance tests actually show that, for now, it is inferior to the JIT. And, in theory, a JIT has a wider set of optimizations available, since it can see the whole set of loaded libraries and optimize with knowledge of the inner workings of that code, while AOT in general can't do that.
I've seen much better AOT compilers. Believe it or not, Adobe AIR (the desktop version of Flash) has an absolutely brilliant AOT for iOS (well, it's slow as hell, or at least was, but other than that it's great). That compiler just bakes the app, the runtime, and all required libraries into one huge executable, so it's able to optimize very well.

@mikedn
Contributor

mikedn commented Feb 6, 2015

> As of AOT - actually performance tests show that now it is inferior to JIT.

Presumably you're referring to .NET Native, not AOT in general. I'm not sure what that has to do with anything; it was just an example in case some readers don't know what AOT is. Not to mention that it's a preview.

> And, it theory, JIT has wider set of optimizations since it is able to see whole set of libraries loaded and make optimizations with knowledge of inner workings of that code, while AOT in general can't do that.

It's the other way around. The AOT compiler sees all libraries and can do whole-program optimization. The JIT compiler couldn't care less about all the libraries; it has time and space constraints that prevent it from seeing beyond the function it compiles (except when inlining). And that's exactly why the escape analysis required for stack allocation is problematic: to be of practical use, it requires the compiler to look beyond the current function. I suppose a JIT compiler could do this by placing a limit on the depth of call-tree exploration, but that means it will miss optimization opportunities. And it's not like there are a lot of them to begin with...

> That compiler just bakes app, runtime and all required libraries into one huge executable, so it's able to optimize very well.

Now it seems that you're contradicting yourself. Just earlier you stated that AOT compilers can't see the whole set of libraries.

@Alexx999
Contributor

Alexx999 commented Feb 6, 2015

> Presumably you're referring to .NET Native, not AOT in general. I'm not sure what has to do with anything, it was just an example in case some readers don't know what AOT is. Not to mention that it's a preview.

Actually, both NGEN and .NET Native have worse run-time performance compared to the JIT.

> It's the other way around. The AOT compiler sees all libraries and can do whole program optimization.

When AOT runs, there is no way it can know the real context. For example, say you have an app that calls int.ToString() in a tight loop. You say "inline it!"? Nope, it can't do that, because mscorlib may get a patch that changes the behavior of int.ToString(), and your app would not pick up that change because the code was inlined. (And that's why we have the TargetedPatchingOptOut attribute.)
Plus, some code may be loaded dynamically: all sorts of plugins, for example.

> Now it seems that you're contradicting yourself. Just earlier you stated the AOT compilers can't see the whole set of libraries.

Well, I was probably unclear. I meant that AOT compilers for .NET (and Mono too), for some unknown reason, apply only a "per assembly" approach: they don't bake all dependencies into one big blob. Probably that's because baking everything would result in a blob that's way too big (since the .NET BCL is fairly big).
That particular AOT implementation [Adobe AIR's], on the other hand, actually does bake everything, and it results in much better performance (but loading code at runtime is not supported, so no plugins).

We're probably just talking about different things. I mean the situation we currently have with .NET Native and NGEN. It's possible to have a much more effective AOT, but for now there is none. Maybe .NET Native will go for it; we'll see.

@mikedn
Contributor

mikedn commented Feb 6, 2015

> I meant that AOT compilers for .NET (and Mono too) for some unknown reason apply only "per assembly" approach - they don't bake all dependencies into one big blob. Probably, that's because baking everything will result in blob that's way too big (since .NET BCL is fairly big).

But that's exactly what .NET Native does: it takes all your code and the BCL code and produces a single executable. And that's exactly why I used it as an example to begin with; because of the way it works, it is in a much better position to perform escape analysis. Is it too much to ask to do some basic research before posting?

@Alexx999
Contributor

Alexx999 commented Feb 6, 2015

OK, I admit it: I haven't had a chance to research how .NET Native works (since it is only for WinRT apps now), but the point was that it doesn't optimize well (at least not yet).

Also, this is off-topic anyway, since .NET Native is not part of this repo; this repo only contains the JIT.

So this discussion should probably move in the direction of making boxing faster in the JIT.

@OtherCrashOverride

OtherCrashOverride commented Feb 7, 2015

Perhaps someone should present a scenario where the overhead of boxing is a significant issue. The introduction of generics in .NET 2.0 removed the need to box/unbox in many cases, so it may be possible to solve any remaining issues by introducing new generic overloads to the corelib and corefx libraries.

An example of this is Object.Equals(object other): IEquatable<T>.Equals(T other) avoids the boxing operation.
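To illustrate that point, a small C# sketch (the Point struct here is hypothetical, used only for demonstration):

```csharp
using System;

// The typed IEquatable<T>.Equals(Point) receives its argument by value, so
// nothing is boxed. The inherited Object.Equals(object) forces the caller
// to box the struct argument before the call.
var a = new Point(1, 2);
var b = new Point(1, 2);

bool typed = a.Equals(b);           // IEquatable<Point>.Equals: no boxing
bool untyped = a.Equals((object)b); // Object.Equals: 'b' is boxed here

public readonly struct Point : IEquatable<Point>
{
    public readonly int X, Y;
    public Point(int x, int y) { X = x; Y = y; }

    // Strongly typed, allocation-free comparison.
    public bool Equals(Point other) => X == other.X && Y == other.Y;

    // Still overridden for correctness, but its argument arrives boxed.
    public override bool Equals(object obj) => obj is Point p && Equals(p);
    public override int GetHashCode() => (X * 397) ^ Y;
}
```

This is the same pattern the BCL uses for the primitive types, which all implement IEquatable<T>.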

@AdamSpeight2008
Contributor

AdamSpeight2008 commented Feb 7, 2015

Roslyn

@OtherCrashOverride

OtherCrashOverride commented Feb 7, 2015

Without more information it's difficult to determine whether this is a runtime issue or a Roslyn issue. It may simply be that Roslyn has not yet been optimized to avoid boxing. Maybe they simply need to change some structs (ValueType) to classes (Object) where the primary use of the construct is as an object; this would avoid unnecessary boxing/unboxing. It may also be that Roslyn depends on core API calls that would benefit from generic overloads. As stated originally, a more detailed analysis is required.

@mikedn
Contributor

mikedn commented Feb 7, 2015

> Perhaps someone should present the scenario where the overhead of boxing is a significant issue.

This isn't really about boxing. The actual problem is allocation, and the only optimization you can do is not specific to boxing: any heap allocation can be converted to a stack allocation in certain cases. And there's no need for a scenario; anyone with enough .NET experience knows that heap allocations in applications with large GC heaps have a significant cost.

> Without more information its difficult to determine if this is a runtime issue or a Roslyn issue. It may simply be that Roslyn has not yet been optimized to avoid boxing.

I assume you never looked at the Roslyn code. What started this issue is the fact that Roslyn tries to avoid allocations at the cost of contorting the code in various ways.

@OtherCrashOverride

OtherCrashOverride commented Feb 7, 2015

If this is not about boxing and is about Roslyn benefiting from stack allocations, then perhaps this issue should be raised there instead. The runtime already provides a solution for this that they could incorporate:
https://msdn.microsoft.com/en-us/library/cx9s2sy4.aspx
"The stackalloc keyword is used in an unsafe code context to allocate a block of memory on the stack."

The runtime itself serves many languages and now many platforms. The impact of auto-magically moving objects from heap to stack allocation would have to be studied. Does it have behavioral side effects for languages other than C#? Does it need to be reimplemented for each platform?

The immediate consequence I see is that it destroys the established behavior of object finalizers. This would imply a mandatory GC.Collect() at the end of scope to allow the stack objects to be properly destroyed. That would have the opposite of the desired effect of the intended 'optimization', and at best it would mean the optimization could only be performed in special cases without negative consequences.
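For reference, a minimal stackalloc sketch. The classic form from that MSDN page (`int* buffer = stackalloc int[16];`) requires an `unsafe` context and the AllowUnsafeBlocks compiler option; assigning into a `Span<int>`, available since C# 7.2, needs neither:

```csharp
using System;

// stackalloc reserves a raw buffer on the current stack frame: no heap
// allocation and no GC involvement. The buffer dies with the frame and
// only works for unmanaged element types, which is why it covers only a
// narrow slice of the allocations discussed in this thread.
Span<int> buffer = stackalloc int[16];
int sum = 0;
for (int i = 0; i < buffer.Length; i++)
{
    buffer[i] = i * i;
    sum += buffer[i];
}
// sum is 1240, computed without touching the managed heap.
```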

@mikedn
Contributor

mikedn commented Feb 7, 2015

> The runtime already provides a solution for this they could incorporate:...

That isn't a solution by any means. It's only useful in very limited circumstances, and it's a bigger contortion than what Roslyn does anyway.

> Does it have behavior side-effects for languages other than C#?

A properly implemented compiler optimization doesn't have visible side effects.

> Does it need to be reimplemented for each platform?

Why would it need to be reimplemented? Are you actually familiar with what's being discussed here, or are you just making wild speculations?

> The immediate consequence I see is that it destroys the established behavior of object finalizers.

How so? Most objects that are interesting when it comes to stack allocation aren't finalizable anyway. And if an object is finalizable, then it's not a candidate for stack allocation, because a reference to it must be stored in the finalization queue.

> This would imply a mandatory GC.Collect() at the end of scope to allow the heap [should say stack] objects to be properly destroyed.

What does GC.Collect() have to do with this? The whole purpose of the optimization is to avoid involving the GC.

@OtherCrashOverride

OtherCrashOverride commented Feb 7, 2015

> Are you actually familiar with what's being discussed here or are you just making wild speculations?

The element of the discussion that I am not familiar with is Roslyn; however, this is a common language runtime (CLR) discussion area where first a boxing optimization was asked for and now stack allocation of objects is being asked for.

For those like myself who use the CLR for things other than Roslyn, the question becomes: how do your Roslyn-specific requests affect us? Anything affecting object placement or destruction behavior has consequences. This is not speculation; it has already occurred during the early days of Mono.

So what I am asking is for someone to quantify the benefit of undertaking this request so it can be weighed against the consequences it would have. What profiling data is there to show that either boxing or heap allocation is the bottleneck in Roslyn? What performance gains can then be estimated for your proposed solutions?

I will end by saying that since both Roslyn and the CLR are open source, you could submit a proof-of-concept patch for review showing the benefits of your proposal.

@mikedn
Contributor

mikedn commented Feb 7, 2015

> first boxing optimization was asked for and now stack object allocation is being asked for.

No. A boxing optimization is what was asked for, and I explained why boxing is potentially slow and what can be done about it. As far as I'm concerned, I didn't ask for anything; I even said that this optimization is potentially problematic for a JIT compiler.

> For those like myself that use the CLR for things other than Roslyn, the question becomes: how do your Roslyn specific requests affect us?

Why do you keep dragging Roslyn into this? Simply because Adam Speight used it as an example? It's just an example; it's not the only .NET application in the world with a GC heap that can grow to hundreds of megabytes and where GC time can be significant.

> What performance gains can then be estimated by your proposed solutions?

Certainly not 20x. Not even 2x. 20% would be awesome; 5-10% is more likely.

> I will end with saying that since both Roslyn and the CLR are open source, you could submit a proof-of-concept patch for review showing the benefits of your proposal.

I'd love to do that if I had the time. Since I don't, I limit myself to explaining to others what the situation is and what can be done. That seems more constructive to me than dismissing everything and sending people on a wild goose chase with a stackalloc gun in their hands.

@OtherCrashOverride

OtherCrashOverride commented Feb 7, 2015

> Why do you keep dragging Roslyn into this? Simply because Adam Speight used it as an example?

Because I am trying to understand the segment where this is a real-world problem.

I do have quite a bit of experience with boxing, stack allocations, and the pitfalls of garbage collection, because these are all issues game developers solved many years ago. So it's a bit of a mystery that the knowledge never propagated outside that domain. It doesn't require any modifications to the runtime; it just requires updating your coding practices.

@OtherCrashOverride

OtherCrashOverride commented Feb 7, 2015

Did you know...
The use of a "for" loop in place of a "foreach" can have an impact on garbage generation and boxing?
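One common case behind that remark: List<T>.GetEnumerator() returns a struct enumerator, so foreach over a variable typed List<int> allocates nothing, while the same loop over an IEnumerable<int> reference boxes that struct enumerator on every foreach. A plain for loop with indexing sidesteps enumerators entirely.

```csharp
using System.Collections.Generic;

var list = new List<int> { 1, 2, 3 };

int viaList = 0;
foreach (var x in list) viaList += x;       // struct enumerator: no allocation

IEnumerable<int> seq = list;
int viaInterface = 0;
foreach (var x in seq) viaInterface += x;   // struct enumerator boxed to IEnumerator<int>

int viaFor = 0;
for (int i = 0; i < list.Count; i++)
    viaFor += list[i];                      // plain indexing: no enumerator at all
```

All three loops compute the same sum; only their allocation behavior differs.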

@Sebazzz

Sebazzz commented Feb 7, 2015

Wouldn't it be possible to use a similar approach to Java's when it comes to boxing? For example, Java caches a certain range (-256 to 256, I believe) of Integer instances. It is not a real solution to the problem when it comes to user-defined types, but Boolean, Int32, Int64, Int16, and similar integral types (including enums) would benefit from it.
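For reference, Java's Integer.valueOf caches the -128..127 range by default. A user-level C# equivalent might look like the following sketch; BoxCache is a hypothetical helper, not a runtime feature, and as the reply below points out, the scheme is only safe as long as nothing mutates the shared boxes (e.g. via reflection):

```csharp
// A small-integer box cache, similar in spirit to Java's Integer.valueOf
// cache. Callers must route boxing through Box() explicitly; ints outside
// the cached range are boxed normally.
public static class BoxCache
{
    private const int Min = -128, Max = 127;
    private static readonly object[] Cache = CreateCache();

    private static object[] CreateCache()
    {
        var cache = new object[Max - Min + 1];
        for (int i = 0; i < cache.Length; i++)
            cache[i] = Min + i;            // box each value once, up front
        return cache;
    }

    public static object Box(int value) =>
        value >= Min && value <= Max ? Cache[value - Min] : (object)value;
}
```

With this, `ReferenceEquals(BoxCache.Box(1), BoxCache.Box(1))` is true: repeated boxing of small values allocates nothing after the cache is built.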

@mikedn
Contributor

mikedn commented Feb 7, 2015

> So its a bit of a mystery that the knowledge never propagated outside that domain. It doesn't require any modifications to the runtime. It just requires updating your coding practices.

What's so mysterious about it? Games have different allocation patterns. A lot of allocations probably happen when a "level" is loaded. Some per-frame allocations can be avoided by object pooling. The GC heap is less likely to reach into the gigabyte range, and if it does, it's more likely due to texture/geometry/sound data, and that kind of data has a smaller impact on the GC. As for updating coding practices, I mentioned earlier that some of these allocation-avoiding practices lead to less than normal code. The question is whether there's something that can be done to write normal code and get good performance at the same time.

> The use of a "for" loop in place of a "for each" can have an impact on garbage generation and boxing?

Maybe yes, maybe no. It's not like I can read people's minds to know what you mean by that.

> Wouldn't it be possible to use a similiar approach as Java when it comes to boxing? For example, Java caches a certain range of (-256 - 256 I believe) Integer instances.

Hmm, that's possible in theory but very dangerous in practice, at least in the current implementation. A bozo could use reflection to modify the value field of those cached integers. There would need to be a way to prevent that from happening.

As a side note, WPF does something like this for bools. It doesn't avoid the boxing from happening in the first place when you call SetValue, but it avoids retaining multiple bool instances with the same value.

@mburbea

mburbea Feb 7, 2015

WPF was another place where I saw a lot of developers do something like declare a boolconst class with boxed true and false.
How about a pool of boxes for the common built-in types? So rather than allocating a new boxed int every time, it just pulls one out of the pool?
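A hedged sketch of that pool idea, written here in Java for brevity (the names `SmallIntPool` and `box` are illustrative, not any real API): boxes for a small range are allocated once up front, and every later box of the same value returns the shared instance.

```java
// A fixed pool of pre-boxed small ints, handed out instead of
// allocating a fresh box for every boxing operation.
final class SmallIntPool {
    private static final int MIN = -128, MAX = 127;
    private static final Object[] POOL = new Object[MAX - MIN + 1];

    static {
        for (int i = MIN; i <= MAX; i++) {
            POOL[i - MIN] = i;  // boxed once, up front
        }
    }

    static Object box(int value) {
        if (value >= MIN && value <= MAX) {
            return POOL[value - MIN];  // shared instance, no allocation
        }
        return value;  // falls back to a normal boxing allocation
    }
}
```

As noted elsewhere in the thread, sharing boxes like this is only safe if nothing can mutate the boxed value (e.g. via reflection).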


@Sebazzz

Sebazzz Feb 7, 2015

Hmm, that's possible in theory but very dangerous in practice, at least in the current implementation. A bozo could use reflection to modify the value field of those cached integers. There needs to be a way to prevent this from happening.

I wouldn't worry too much about that. We have partial trust and security sandboxing to prevent that from happening. Also, the solution would probably be better suited to implementation at the CLR level than at the BCL level.


@AdamSpeight2008

AdamSpeight2008 Feb 7, 2015

Contributor

Firstly, this wasn't a proposal but a discussion (hence the "(Discussion)" in the title). Yes, the project may be open source, but that doesn't necessarily guarantee you automatically get access to all of the previous discussions (design meeting notes, API review notes, etc.) that happened before it was open sourced.

Avoid boxing seems to be the mantra for "performance code",
when what it actually seems to mean is avoid unnecessary allocations.


Falls through soapbox

Would an additional smaller heap (for small structs) help?


@Alexx999

Alexx999 Feb 7, 2015

Contributor

@AdamSpeight2008 I believe an additional heap won't help, because we already have that idea implemented in the generational GC


@OtherCrashOverride

OtherCrashOverride Feb 7, 2015

Maybe yes, maybe no. It's not like I can read people minds to know what you mean by that

That pretty much sums up this discussion. Nobody is putting forth any test cases or profiling data to show that this is actually a problem worth solving. We are depending on psychic profiling and hearsay.

There are tons of kids out there that can teach you how to get every last drop of perf out of the CLR. The same issues mentioned here were magnified 1000x on the Xbox 360, so "games are different than real work!" is not a valid point. When you have to meet a hard deadline of doing everything in 1/60 of a second and then doing it ALL again, you discover every possible way to 'game' the compiler and runtime. Every single one of them will also tell you to start with a perf tool to discover where you are actually spending your time.


@OtherCrashOverride

OtherCrashOverride Feb 7, 2015

It's also worth pointing out that, regardless of runtime or language, there is always a difference between code written for clarity and maintenance and code written for performance. It's unrealistic to expect a runtime that serves different languages with different paradigms to be an "all-knowing panacea of optimization". It's worth considering that the types of optimizations proposed are, ironically, better suited for implementation in the code Roslyn itself produces. It would certainly have much more knowledge about when an object is a good candidate for stack allocation or when a box operation could be replaced with a non-boxing equivalent.


@mikedn

mikedn Feb 7, 2015

Contributor

There are tons of kids out there than can teach you how to get every last drop of perf out of the CLR.

OK, I take it that you think everyone here, including the Roslyn team, is a fool who doesn't know how to write efficient code, and that only the kids out there do. That indeed sums it up.

It worth considering that the types of optimizations proposed are ironically better suited for implementation in the code Roslyn itself produces. It would certainly have much more knowledge about when an object is a good candidate for stack allocation or when a box operation could be replaced with a non-boxing equivalent.

By saying that you effectively admit that you have no clue about what the discussion is about, and that includes CLR inner workings.

I have no idea why you are trying so hard to dismiss an idea that's not even presented as a proposal.


@OtherCrashOverride

OtherCrashOverride Feb 7, 2015

I have no idea why you are trying so hard to dismiss an idea that's not even presented as a proposal.

Because the .Net team has already been here almost a decade ago.

As I did not attend the "Linus Torvalds Finishing School of Social Etiquette", I will bow out of this discussion so as not to further disrupt it.


@Alexx999

Alexx999 Feb 7, 2015

Contributor

I agree with @OtherCrashOverride - in game development GC is your worst enemy.
It becomes more obvious when you have really high memory pressure due to having hundreds of megabytes of textures and artwork that you can't unload (because it's all on screen right now) and every allocation may trigger full GC. And don't forget that you only have 16ms for doing EVERYTHING including all calculations and rendering, so full GC will make your game stutter.
In other areas it's non-critical - in web you just purchase more hardware, in desktop apps you show some nice progress indicator, but in games you can't do that - users tend to play games on various crappy hardware, and low performance just kills game experience.

For now, there are areas that allocate extra objects: for example, Delegate.Invoke, parsing of enums, and probably lots of others. I think many extra boxing operations and allocations could be avoided by revising those.

As for escape analysis as such: it's probably a good long-term goal, but it is surely too complex to be implemented in the near future.

And as for speeding up boxing, we need some analysis showing that it can be made faster than it is now. I suppose we're not the first to say "hey, boxing is slow"; that's why generics were implemented in the first place.


@waneck

waneck Feb 10, 2015

I'm not sure I understand why escape analysis would be too complex. Tagging each function argument with whether it escapes can be enough to cheaply determine which arguments can be allocated on the stack.
Things do get trickier when dealing with virtual functions, and I don't know how well CoreCLR handles virtual-call optimizations, but if all else fails, escape analysis could work only on non-virtual calls at the start.


@mikedn

mikedn Feb 10, 2015

Contributor

I'm not sure I understand why escape analysis would be too complex.

Well, "complex" is probably not the best term. Most compiler optimizations have a certain degree of complexity. The problem with escape analysis is that it should be interprocedural, and that's a bit too much to ask from a JIT compiler. It requires more time, because multiple methods have to be analyzed to compile just one, and it may also require more memory, because you probably want to store the summary information of previously analyzed methods for future use.

Of course, you can skip the interprocedural part and simply mark as escaped any objects that end up as arguments of another method. But that means that there will be fewer opportunities for optimizations and IMO there aren't that many to begin with. It appears that's exactly what Java does.

The article on which Java's escape analysis/stack allocation is based claims some very high reductions in object allocation, nearly 80% for certain programs. I find that difficult to believe; maybe that's because Java doesn't have value types and as a result needs more heap allocations to begin with. The fact that the article is quite old (1999) doesn't help; trying to find the various samples that were used for benchmarks is not trivial.

Things do get trickier when dealing with virtual functions, and I don't know how much CoreCLR handle virtual function calls' optimizations,

It doesn't. It eliminates virtual calls in the case of value types and nothing else. It doesn't even eliminate virtual calls in the case of sealed classes.


@jkotas jkotas added the hard problem label Mar 14, 2015

@mattwarren

mattwarren Jul 3, 2015

Collaborator

Roslyn

Just to add: the "contortions" that the Roslyn developers had to perform to minimise allocations are nicely outlined in this talk.

But it would be nice if the compiler, JIT and/or runtime could handle some of these issues for us (I believe at least one of the scenarios from that talk was fixed in the compiler, I just can't remember which one).


@OtherCrashOverride

OtherCrashOverride Jul 3, 2015

But it would be nice if the complier, JIT and/or runtime could handle some of these issues for us

I think it should be the responsibility of an offline tool or the IDE to handle the situations outlined in that presentation (none of which required changes to the runtime).

public void Log(int id, int size)
{
     string.Format("{0}:{1}", id, size);
}

The tool or IDE should report:

'string.Format("{0}:{1}", id, size);' results in unnecessary boxing, consider
'string.Format("{0}:{1}", id.ToString(), size.ToString());' instead.

It would be nice if there was a tool like StyleCop that focused on these types of optimizations.
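The same hidden box exists in Java, where `String.format`'s varargs parameter is `Object...`, so a sketch of the trade-off reads almost identically (the example values are hypothetical):

```java
public class FormatBoxing {
    public static void main(String[] args) {
        int id = 7, size = 42;

        // Each int is boxed to an Integer to fit the Object... varargs.
        String viaFormat = String.format("%d:%d", id, size);

        // Converting first avoids the boxes, at the cost of
        // temporary strings instead.
        String viaToString = Integer.toString(id) + ":" + Integer.toString(size);

        System.out.println(viaFormat.equals(viaToString));  // true
    }
}
```

Whether the temporary strings end up cheaper than the boxes is exactly the kind of question that profiling, not a style rule, has to answer.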


@mikedn

mikedn Jul 4, 2015

Contributor

And of course, now you have a string temporary that could be eliminated by some fancy escape analysis except that it won't since it's the return of ToString. Hurray for missing the point (again) and turning something that could be optimized into something that can't be optimized.


@mattwarren

mattwarren Jul 4, 2015

Collaborator

There's already a Roslyn analyzer that tries to highlight all of these scenarios, see https://github.com/mjsabby/RoslynClrHeapAllocationAnalyzer



@mattwarren

mattwarren Jul 4, 2015

Collaborator

Although it doesn't offer a fix, it just highlights hidden allocations



@OtherCrashOverride

OtherCrashOverride Jul 4, 2015

@mikedn

Hurray for missing the point (again) and turning something that could be optimized into something that can't be optimized.

Your continued personal attacks are both unwelcome and unwarranted. Perhaps a better use of that energy and time would be producing benchmarks and pull requests. I recommend watching the video linked earlier to understand the optimization process.

As for 'missing the point', you have made your position clear: you want stack allocation of heap objects. This issue topic, however, is "(Discussion) Lightweight Boxing?". Since it is a discussion, you are likely to encounter dissenting viewpoints. Please be respectful of others.


@OtherCrashOverride

OtherCrashOverride Jul 4, 2015

For the sake of completeness in evaluation, I feel I should add:

And of course, now you have a string temporary that could be eliminated by some fancy escape analysis except that it won't since it's the return of ToString.

Whether your function calls ToString() or string.Format() calls ToString() on your behalf, either way it still gets called. Escape analysis is not going to change that. Therefore, the solution I mentioned (as presented by the Roslyn team) is the only solution that will give you any optimization as the box is avoided entirely. With escape analysis, the object would still be created (on the stack instead of the heap) and the value copied to it. With the code modification, the value already exists on the stack and no additional allocation for boxing is performed.

[Edit]
This underscores why it's important to test and measure rather than just present a hypothesis as fact.


@sharwell

sharwell Jul 5, 2015

Member

The CLR already has "lightweight boxing" through the following:

  1. User defined value types and generics which work with them (i.e. avoid boxing altogether)
  2. The constrained. instruction prefix

Java needs escape analysis much more than C# because it does not have either of these, but it does not do nearly as good a job as careful use of value types in code running on the CLR.
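The contrast being drawn can be seen in a short Java sketch: erased generics can only hold references, so every primitive stored in a collection is boxed, whereas the CLR's reified generics (e.g. `List<int>`) store the values unboxed.

```java
import java.util.ArrayList;
import java.util.List;

public class GenericsBoxing {
    public static void main(String[] args) {
        // Java generics erase to Object, so adding an int autoboxes it.
        List<Integer> list = new ArrayList<>();
        list.add(5);
        System.out.println(list.get(0).getClass() == Integer.class);  // true

        // A primitive array stores values inline with no boxing,
        // the closest Java analogue of the CLR's unboxed List<int>.
        int[] array = {5};
        System.out.println(array[0] + list.get(0));  // 10 (get(0) unboxes)
    }
}
```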


@AdamSpeight2008

AdamSpeight2008 Jul 5, 2015

Contributor

String Format Diagnostics shows you how to do analysis of format strings. Note that it's in need of an update/rewrite, as it was developed before interpolated strings.


@mikedn

mikedn Jul 5, 2015

Contributor

Your continued personal attacks are both unwelcome and unwarranted.

That's news to me: stating facts constitutes a personal attack? And it is rather disingenuous to claim "personal attack" when you seem to assume that everyone else is stupid and doesn't know how to optimize code.

As for 'missing the point', you have made your position clear: you want stack allocation of heap objects.

I don't want anything. Someone asked if X could be improved and I answered that yes, Y could be done to improve X. It's difficult to implement and of questionable effectiveness, but the answer was, is, and always will be yes, something can be done to improve X.

Since it is a discussion, you are likely to encounter dissenting viewpoints. Please be respectful of others.

Thoughtful and on-topic dissent is welcome, but I've yet to see that from you. Same for respect. Until now your dissent has consisted of workarounds that nobody asked for, questioning others' ability to optimize code, and un-implementable suggestions. In general, your attitude in this discussion seems to be "you guys are not allowed to discuss this because I don't like it". You are not the owner of this repository and you haven't made any contributions to it, so you're not in a position to decide what people can discuss here. If the owners of the repository consider that this discussion is not constructive then they can close the issue. Probably they should.

Whether your function calls ToString() or string.Format() calls ToString() on your behalf.

I know all that, and I also know that it is an unfortunate implementation detail of String.Format. As an implementation detail it is subject to change, and it can and should be changed for common types such as the numeric types and DateTime. The framework's inability to parse and format numbers without allocating a temporary string is rather well known, at least to some.


@OtherCrashOverride


OtherCrashOverride Jul 5, 2015

when you seem to assume that everyone else is stupid and doesn't know how to optimize code.

https://en.wikipedia.org/wiki/Straw_man

We really should put this issue to bed.

Can we (CoreCLR) improve boxing? No, and nobody has suggested otherwise.

As @mikedn stated:

It is possible to convert certain heap allocations to stack allocations but the analysis required to do so is pretty expensive for a JIT compiler. It's more likely to see this optimization in an AOT compiler, such as .NET Native.

So there too, there is nothing for CoreCLR to do. The discussion should be taken up on dotnet/llilc.

As for the possibility of the compiler outputting non-boxing equivalents automatically, that discussion belongs on dotnet/roslyn.

The only real solution presented is the one that I stated from the start of this fateful discussion and that is: Don't Box!

@mikedn said:

I know all that and I also know that it is an unfortunate implementation detail of String.Format. As an implementation detail is subject to change, and it can and should be changed for common types such as numeric types and DateTime. The framework's inability to parse and format numbers without allocating a temporary string is rather well known, at least to some.

Now that is just really reaching. Having just previously stated:

And of course, now you have a string temporary that could be eliminated by some fancy escape analysis except that it won't since it's the return of ToString. Hurray for missing the point (again) and turning something that could be optimized into something that can't be optimized.

It takes a person of character to be able to admit they were wrong and clearly that is an area you are lacking.

@AdamSpeight2008 unless there is some area that you feel has not been covered by this discussion that is not better suited to some other discussion area, please consider closing this issue.
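The "Don't Box!" advice above can be made concrete with a small sketch (Java, hypothetical names; the same trade-off applies to .NET's object vs. struct). Collecting primitives through a boxed collection allocates a heap object per element, while working on primitives directly allocates nothing per element.

```java
import java.util.ArrayList;
import java.util.List;

public class DontBoxDemo {
    // Boxed version: every add() autoboxes the int into an Integer
    // (a heap allocation outside the small-value Integer cache), and
    // the sum loop unboxes each element again.
    static long sumBoxed(int count) {
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < count; i++) values.add(i);
        long sum = 0;
        for (Integer v : values) sum += v;
        return sum;
    }

    // Primitive version: same result, no boxing, no per-element
    // heap allocation.
    static long sumPrimitive(int count) {
        long sum = 0;
        for (int i = 0; i < count; i++) sum += i;
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumBoxed(1000) == sumPrimitive(1000)); // prints "true"
    }
}
```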


@mikedn

mikedn Jul 5, 2015

Contributor

https://en.wikipedia.org/wiki/Straw_man

There's a difference between saying "when you seem to assume" and "when you assume". You could be more careful with the way you phrase your replies and avoid potentially inflammatory remarks, perhaps that will avoid misunderstandings.

Can we (CoreCLR) improve boxing? No, and nobody has suggested otherwise.

I actually suggested otherwise. The fact that it's difficult and potentially problematic in a JIT compiler doesn't imply that it is also impossible. Java somehow managed to do it.

As @mikedn stated:

Great strategy. When you're out of on-topic arguments, you resort to using other people's replies.

The only real solution presented is the one that I stated from the start of this fateful discussion and that is: Don't Box!

That's not the solution for this discussion. People ask you what's the result of 2+2 and you answer "an integer". Indeed, it is an integer but that's not the answer people were looking for.

It takes a person of character to be able to admit they were wrong and clearly that is an area you are lacking.

Except that I wasn't wrong. I knew all along what String.Format does under the covers because I looked at the code long before this repository even existed. I left out that little detail to see how you'll react. As expected, you're again resorting to questioning my technical abilities and even my character.

We really should put this issue to bed.

I thought you did so. At least that's what you claimed in an earlier post. If you can't keep your word then fine, I'll do it myself.
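For reference, the JIT optimization alluded to above ("Java somehow managed to do it") is escape analysis with scalar replacement. A minimal sketch of the kind of allocation it targets (hypothetical names; whether the allocation is actually eliminated depends on the JIT and its heuristics):

```java
public class EscapeAnalysisDemo {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // The Point never escapes this method, so an escape-analysis-capable
    // JIT (e.g. HotSpot's C2) may replace it with two scalar locals and
    // allocate nothing on the heap. The program's result is identical
    // either way; only the allocation behavior differs.
    static long sumOfCoords(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1); // candidate for scalar replacement
            total += p.x + p.y;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumOfCoords(3)); // prints "9"
    }
}
```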


@OtherCrashOverride


OtherCrashOverride commented Jul 5, 2015

Beating a dead horse
