[Go] Release() on C Data-imported arrays and batches does not ensure release() is called #38281
Comments
cc @zeroshade
It seems we could instead use the same approach as C++ and Java do: when importing buffers, bind them to a ref-counted |
While there is no guarantee that a particular finalizer will run before a program exits, it will eventually be called over the lifetime of a program or the program will exit (at which point it becomes moot since it would be released like any other memory from the program). Even though it's not guaranteed on any individual GC-run, in practice I've found that running the GC is enough to trigger the finalizer (and is how the python <--> Go integration tests in that dir work).
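For anyone who wants to observe this behaviour locally, here is a minimal sketch (not the Arrow code). It attaches a finalizer to a throwaway object and counts how many explicit `runtime.GC()` calls pass before the finalizer is seen to run. The `handle` type, the `gcCyclesUntilFinalized` helper, and the 10 ms wait are all assumptions made for illustration, and the result can differ between Go versions and machines.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// handle is a hypothetical stand-in for an object that owns foreign memory.
type handle struct{ payload [1 << 10]byte }

// gcCyclesUntilFinalized returns how many runtime.GC calls were made before
// the finalizer was observed to run, or -1 if it did not run within maxCycles.
func gcCyclesUntilFinalized(maxCycles int) int {
	done := make(chan struct{})
	func() {
		h := &handle{}
		// The finalizer runs on a separate goroutine at some point after
		// h becomes unreachable; the runtime gives no stronger guarantee.
		runtime.SetFinalizer(h, func(*handle) { close(done) })
	}()
	for i := 1; i <= maxCycles; i++ {
		runtime.GC()
		select {
		case <-done:
			return i
		case <-time.After(10 * time.Millisecond):
			// Not finalized yet; try another GC cycle.
		}
	}
	return -1
}

func main() {
	fmt.Println("GC cycles until finalizer ran:", gcCyclesUntilFinalized(10))
}
```

The number printed is not deterministic, which is the point under discussion: the caller cannot know in advance how many cycles are needed.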
I can see a way in which we could possibly do this because |
Indeed. However, in the real world, it can be annoying that a large piece of data doesn't get collected even after a GC run. My experience with C Data integration is that I have to run the GC up to 10 times. (and I suppose Arrow Go users would like |
That's absolutely interesting and not what I expected (though it's also not unexpected), I'll see what I can come up with.
This is a classic issue with GC finalizers: they can't run at a deterministic time because the GC has to prove the object is no longer referenced (by the end of a tracing cycle), and even then it can decide to delay the finalization/collection to a later collection cycle to keep the cost of GC pauses low. This is a fundamental issue of tracing-based GCs, so calling finalizers "unreliable" is a bit unfair. :-) Go's GC is also incremental: a tracing cycle doesn't traverse the whole graph at once. So these 10 invocations are necessary due to the size of the object graph. The bigger the graph, the higher the number of incremental steps necessary to reach a specific object in the graph.
That's very unsafe because code retaining references to the object won't know whether Release has already been called on the shared reference.
I don't think so. Python finalizers work reliably after a GC pass, they don't stay pending until several other GC passes, waiting for a goroutine to pick them up. (Python is reference-counted but also has a mark-and-sweep GC for reference cycles) Go finalizers being executed "eventually" is a design choice, not a fundamental obligation.
Go's GC may be incremental, but `runtime.GC` is not supposed to be. At least I couldn't find a source stating that `runtime.GC` only performs an incremental GC pass.
Code retaining references to the object should have called Retain() to ensure they own the reference to the object.
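The ownership convention described here can be sketched as follows. This is an illustrative sketch only, not the Arrow Go implementation: `importedData` and `newImportedData` are hypothetical names, and the `release` field stands in for the C Data interface release callback. Each owner holds one reference, and the callback fires exactly once, deterministically, when the last owner calls `Release()`.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// importedData is a hypothetical stand-in for data imported from C.
type importedData struct {
	refs    int64  // number of owners, updated atomically
	release func() // stand-in for the producer's release callback
}

// newImportedData returns data owned by the caller (reference count 1).
func newImportedData(release func()) *importedData {
	return &importedData{refs: 1, release: release}
}

// Retain registers an additional owner.
func (d *importedData) Retain() { atomic.AddInt64(&d.refs, 1) }

// Release drops one owner and runs the release callback when none remain.
func (d *importedData) Release() {
	n := atomic.AddInt64(&d.refs, -1)
	if n == 0 {
		d.release()
	} else if n < 0 {
		panic("importedData: Release called more times than Retain")
	}
}

func main() {
	released := 0
	d := newImportedData(func() { released++ })

	d.Retain()            // a second piece of code takes ownership
	d.Release()           // the first owner is done
	fmt.Println(released) // 0: the second owner still holds a reference

	d.Release()           // the last owner is done
	fmt.Println(released) // 1: the callback ran without any GC involvement
}
```

Code that keeps a pointer without calling `Retain()` is borrowing, not owning, and has to finish using the data before the owner releases it.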
If you're keeping ref-counts, your GC can be precise. Even though you trace to break cycles, "tracing GC" refers to GCs that don't have ref-counts and always have to trace.
Interesting. Docs indeed say that, but apparently there is work being delayed if we're observing that an arbitrary number of cycles is necessary to get a collection.
Well... sure. If that is a public API this will be easier said than done.
This is where we disagree on the interpretation :-). You are saying that My interpretation reading the
AFAICT, all Go Arrow objects that expose a refcounted |
Haha. I wasn't interpreting the docs, just commenting about tracing GCs in general (the ones without ref-counting) based on what I've read in the past about Go's GC. You convinced me that the docs give very mixed signals.
It's possible, but extremely costly, so people developing these runtimes tend to cheat to prevent their customers from killing their servers' throughput by abusing the manual GC API. Designs that rely on finalizers are very frowned upon.
Yes. It's a common pattern. It would be a problem if the API was already being used publicly relying solely on GC's semantics.
The reason why the I leveraged this for #38314 (review) |
### Rationale for this change

The usage of `SetFinalizer` means that it's not *guaranteed* that calling `Release()` on an imported Record or Array will actually free the memory during the lifetime of the process. Instead we can leverage a shared buffer count, atomic ref counting and a custom allocator to ensure proper and more timely memory releasing when importing from C Data interface.

### What changes are included in this PR?

* Some simplifications of code to use `unsafe.Slice` instead of the deprecated handling of `reflect.SliceHeader` to improve readability
* Updating tests using `mallocator.Mallocator` in order to easily allow testing to ensure that memory is being cleaned up and freed
* Fixing a series of memory leaks subsequently found by the previous change of using the `mallocator.Mallocator` to track the allocations used for testing arrays.

### Are these changes tested?

Yes, unit tests are updated and included.

* Closes: #38281

Authored-by: Matt Topol <zotthewizard@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
Describe the bug, including details regarding any error messages, version, and platform.
The current implementation of array and record batch importation in Go sets a finalizer to trigger the release callback when any reference to the underlying `ArrayData` is collected.
Unfortunately, Go finalizers are unreliable and even a GC run is not necessarily sufficient to get them invoked.
Relevant excerpts from https://pkg.go.dev/runtime#SetFinalizer :
Component(s)
Go