Clear entire range on decommit #128032
Tagging subscribers to this area: @JulieLeeMSFT, @dotnet/gc
decommit_succeeded_p));
if (require_clearing_memory_p)
{
    uint8_t* clear_end = never_decommit_p ? heap_segment_used (region) : heap_segment_committed (region);
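For readers following the thread, here is a minimal sketch of what the choice of `clear_end` means. It is a toy model, not the runtime's code: `ToyRegion`, `clear_on_decommit`, and `stale_bytes` are hypothetical names, and it simply assumes that non-zero bytes can exist above the recorded "used" mark, which is the situation the discussion is concerned with.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy stand-in for a GC region. Field names are illustrative only and do
// not mirror the real heap_segment layout.
struct ToyRegion {
    std::vector<uint8_t> mem;  // backing memory for the region
    size_t used;               // offset of the recorded "used" mark
    size_t committed;          // offset of the committed end
};

// Zero [start, clear_end) and return the number of bytes cleared.
// clear_to_used == true mimics the ternary in the hunk above when
// never_decommit_p is set; false always clears up to the committed end.
size_t clear_on_decommit(ToyRegion& r, size_t start, bool clear_to_used) {
    size_t clear_end = clear_to_used ? r.used : r.committed;
    if (clear_end <= start) {
        return 0;
    }
    std::memset(r.mem.data() + start, 0, clear_end - start);
    return clear_end - start;
}

// Count the non-zero bytes left in [start, committed).
size_t stale_bytes(const ToyRegion& r, size_t start) {
    size_t n = 0;
    for (size_t i = start; i < r.committed; ++i) {
        if (r.mem[i] != 0) {
            ++n;
        }
    }
    return n;
}
```

In this model, clearing only up to `used` leaves every non-zero byte between `used` and `committed` in place, while clearing up to `committed` zeroes more memory but leaves nothing behind. That is the trade-off being weighed in the comments on this hunk.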
@BenV I am not 100% sure the actual problem is at this place. Last week I tried adding clearing to all the places where we skip it for large pages, and I found that if I keep it at all the other places except this one, the test you shared still passes. I am going to look into it more today, but that seems to indicate that the clearing we are adding here may just be making up for clearing that should have been done at another place.
I haven't had a chance to read through @cshung's analysis yet, though; I am going to do that first.
@cshung after reading your analysis (thank you for that!), I wonder - why is clearing the memory here better than clearing the necessary part after the compaction / replanning instead? Wouldn't we end up zeroing less memory that way?
That sounds good, let me know what you find and if you need anything else from us. Thanks again.
@janvorli, the change will stop the bleeding for now; we can definitely try to optimize later to zero less memory.
I am admittedly biased but I agree it would be great to get the simple fix in and optimize later (particularly if we could backport this to .NET 10) - I'd rather zero a bit of extra memory if it is guaranteed to avoid corruption as GCLargePages is essentially a landmine right now. That being said if some additional investigation reveals a similarly simple fix elsewhere I would be all for that as well.
Closing this in favor of #128217
Fixes: #127892 (comment)
Detailed analysis from @cshung here: https://cshung.github.io/posts/memory-corruption-5/
This logic was also very recently touched by #127328, so we may want @pavelsavara to look as well. I suspect WASM builds would see the same heap corruption that was happening with large pages.
I didn't add a comment for the change because it felt a bit odd to reference never_decommit_p or huge pages when the code was being removed, but I could add something to try and prevent this optimization from being re-introduced if you all feel appropriate (or feel free to just edit an appropriate comment in).
Thanks again for taking this seriously and helping us track this down!
CC: @mangod9