Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ArrayPool<T>.Shared.Return should clear buffer when T is reference or contains references #25378

Closed
GrabYourPitchforks opened this issue Mar 9, 2018 · 13 comments
Assignees
Labels
area-System.Buffers enhancement Product code improvement that does NOT require public API changes/additions tenet-performance Performance related issue
Milestone

Comments

@GrabYourPitchforks
Copy link
Member

The current implementation of ArrayPool<T>.Shared is that there are per-thread buckets of arrays waiting to be re-rented. The default implementation of Return is that the arrays aren't cleared before being thrown into the TLS buckets. This leads to situations where if an application uses scratch buffers of reference types, the elements in those scratch buffers can stick around forever once they've been returned to the pool.

Example:

var buffer = ArrayPool<string>.Shared.Rent(16);
buffer[0] = new string('x', 300);
var weakRef = new WeakReference<string>(buffer[0]);
ArrayPool<string>.Shared.Return(buffer);
buffer = null;

// mimic doing a lot of work
for (int i = 0; i < 100; i++) {
  GC.Collect();
}

Console.WriteLine($"Is alive = {weakRef.TryGetTarget(out var target)}");
Console.WriteLine($"Generation = {GC.GetGeneration(target)}");

This prints "Is alive = True / Generation = 2" to the console. The expected behavior is that the object would be eligible for collection once the containing array has been returned to the pool.

The API could still retain the behavior where the buffer doesn't get cleared if Retain is just going to throw away the array without putting it back in the pool.

@stephentoub
Copy link
Member

Return takes a clearArray argument. If it's set to false, since that's the default value (for better or worse), it can't tell the difference between whether that was an explicit choice by the developer ("I don't want to pay for clearing, I know what I'm doing") or whether it was done unknowingly ("Look, I can call Return with just the buffer, that's easy"). Given the design is to leave it up to the caller to decide, it seems strange to me then that we'd override the caller's choice, making the argument effectively meaningless.

@GrabYourPitchforks
Copy link
Member Author

GrabYourPitchforks commented Mar 9, 2018

I'm only advocating auto clearing on return for when T is a reference type. Otherwise the references in the array can stick around forever, causing the referenced objects eventually to get promoted to Gen 2, when the buffer was ostensibly for scratch work.

@jkotas
Copy link
Member

jkotas commented Mar 9, 2018

ArrayPool tends to give you much larger buffers than what you actually need and use. Return does not know how much of the buffer was actually used. It would be pretty unfortunate for Return to clear say 1000 elements when you have actually only used 10 elements.

If we want to fix this, we should rather do that by treating array pool caches in a special way by the GC. It is something we will likely need to do anyway to stop array pool from leaking/hoarding too much memory.

@GrabYourPitchforks
Copy link
Member Author

I've updated the repro code to be a bit clearer since this is different from the other "Rent should zero-init" issues I've been opening. @jkotas's suggestion of special-casing this in the GC would also work. We could also use weak references under the covers to mimic this behavior until the GC changes come online. Clearing the buffer is just one alternative.

@stephentoub
Copy link
Member

stephentoub commented Mar 12, 2018

I'm only advocating auto clearing on return for when T is a reference type. Otherwise the references in the array can stick around forever, causing the referenced objects eventually to get promoted to Gen 2, when the buffer was ostensibly for scratch work.

The way the API is designed, it's up to the caller of Return. If they opt not for the array to be cleared, who are we to ignore them and do it anyway? They could actually want the behavior, e.g. using the pool not only as a cache of arrays but also as a cache of arrays containing cached objects, and we'd be causing them to incur more cost by clearing automatically.

@GrabYourPitchforks
Copy link
Member Author

GrabYourPitchforks commented Mar 15, 2018

If they opt not for the array to be cleared

That is the fundamental problem I have with this argument. The parameter is optional and defaults to false, so in the majority case the caller did not opt in to or out of anything. I would believe that most callers would be surprised by this "the objects live forever" behavior.

@stephentoub
Copy link
Member

That's an arguable flaw in the API design, but it's what the API is.

@GrabYourPitchforks
Copy link
Member Author

You could always change the default. It would not affect existing compiled code, and anybody who is passing an expicit parameter anyway would retain the same behavior even after recompilation.

@ViktorHofer
Copy link
Member

I stumbled upon this issue a few weeks ago during some perf related work in Regex. I agree with what Levi says that the default is dangerous but also agree with what Jan says that Rent could return larger arrays than what you actually requested and clearing those could cause severe bottlenecks.

We could change the current design from:

namespace System.Buffers
{
    public abstract class ArrayPool<T>
    {
        // Only clears the array when clearArray: true is explitiely passed in.
        public abstract void Return(T[] array, bool clearArray = false);
    }
}

to

namespace System.Buffers
{
    public abstract class ArrayPool<T>
    {
        // Only clears the array if T is a reference type.
        public abstract void Return(T[] array);
        // Clears the array if clearArray is true for both value and reference types.
        public abstract void Return(T[] array, bool clearArray);
    }
}

and adapt the the existing uses in the BCL.

@danmoseley
Copy link
Member

@GrabYourPitchforks can you please drive htis to consensus one way or another so we can get it closed by 30th?

@joshfree
Copy link
Member

Flagging this as post-zbb. We'll prioritize this once the Memory/OwnedMemory lifetime management discussions conclude.

@joshfree
Copy link
Member

You could always change the default.

Please note that these APIs shipped long ago in .NET Core 1.0. We shouldn't be making breaking changes to the default overload behavior. Let's stick to docs / guidance or analyzers for existing shipped APIs.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@GrabYourPitchforks
Copy link
Member Author

Closing stale issues that aren't actionable.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 17, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Buffers enhancement Product code improvement that does NOT require public API changes/additions tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

7 participants