[Go] Release() on C Data-imported arrays and batches does not ensure release() is called #38281

pitrou · 2023-10-16T14:15:41Z

Describe the bug, including details regarding any error messages, version, and platform.

The current implementation of array and record batch importation in Go sets a finalizer to trigger the release callback when any reference to the underlying ArrayData is collected.

Unfortunately, Go finalizers are unreliable and even a GC run is not necessarily sufficient to get them invoked.

Relevant excerpts from https://pkg.go.dev/runtime#SetFinalizer :

The finalizer is scheduled to run at some arbitrary time after the program can no longer reach the object to which obj points. There is no guarantee that finalizers will run before a program exits, so typically they are useful only for releasing non-memory resources associated with an object during a long-running program.

A single goroutine runs all finalizers for a program, sequentially.

Component(s)

Go

The text was updated successfully, but these errors were encountered:

pitrou · 2023-10-16T14:15:50Z

cc @zeroshade

pitrou · 2023-10-16T14:21:36Z

It seems we could instead use the same approach as C++ and Java do: when importing buffers, bind them to a ref-counted Allocator that will decref on Free and call the release callback when the decref drops to zero.

zeroshade · 2023-10-16T14:25:05Z

While there is no guarantee that a particular finalizer will run before a program exits, it will eventually be called over the lifetime of a program or the program will exit (at which point it becomes moot since it would be released like any other memory from the program). Even though it's not guaranteed on any individual GC-run, in practice I've found that running the GC is enough to trigger the finalizer (and is how the python <--> Go integration tests in that dir work).

It seems we could instead use the same approach as C++ and Java do: when importing buffers, bind them to a ref-counted Allocator that will decref on Free and call the release callback when the decref drops to zero.

I can see a way in which we could possibly do this because Allocator is an interface, i'll see if i can sketch something out that can work for this.

pitrou · 2023-10-16T14:28:49Z

Even though it's not guaranteed on any individual GC-run, in practice I've found that running the GC is enough to trigger the finalizer (and is how the python <--> Go integration tests in that dir work).

Indeed. However, in the real world, it can be annoying that a large piece of data doesn't get collected even after a GC run. My experience with C Data integration is that I have to run the GC up to 10 times.

(and I suppose Arrow Go users would like Release() to work correctly?)

zeroshade · 2023-10-16T14:33:03Z

That's absolutely interesting and not what I expected (though it's also not unexpected), I'll see what I can come up with.

felipecrv · 2023-10-17T18:03:39Z

That's absolutely interesting and not what I expected (though it's also not unexpected), I'll see what I can come up with.

This is a classic issue with GC finalizers: they can't run at a deterministic time because the GC has to prove it's not referenced anymore (by the end of a tracing cycle) and even then it can decide to delay the finalization/collection to a later collection cycle to keep the cost of GC pauses low. This is a fundamental issue of tracing-based GCs, so calling finalizers "unreliable" is a bit unfair. :-)

Go's GC is also incremental: a tracing cycle doesn't traverse the whole graph at once. So these 10 invocations are necessary due to the size of the object graph. The bigger the graph, the higher the number of incremental steps necessary to reach an specific object in the graph.

(and I suppose Arrow Go users would like Release() to work correctly?)

That's very unsafe because code retaining references to the object won't know whether Release has been already called on the shared reference.

pitrou · 2023-10-17T18:10:58Z

This is a fundamental issue of tracing-based GCs, so calling finalizers "unreliable" is a bit unfair.

I don't think so. Python finalizers work reliably after a GC pass, they don't stay pending until several other GC passes, waiting for a goroutine to pick them up.

(Python is reference-counted but also has a mark-and-sweep GC for reference cycles)

Go finalizers being executed "eventually" is a design choice, not a fundamental obligation.

Go's GC is also incremental: a tracing cycle doesn't traverse the whole graph at once. So these 10 invocations are necessary due to the size of the object graph.

Go's GC may be incremental, but Runtime.GC is not supposed to be. At least I couldn't find a source stating that Runtime.GC only performs an incremental GC pass.

That's very unsafe because code retaining references to the object won't know whether Release has been already called on the shared reference.

Code retaining references to the object should have called Retain() to ensure they own the reference to the object.

felipecrv · 2023-10-17T18:34:01Z

(Python is reference-counted but also has a mark-and-sweep GC for reference cycles)

If you're keeping ref-counts, your GC can be precise. Even though you trace to break cycles, "tracing GC" refers to GCs that don't have ref-counts and always have to trace.

Go's GC may be incremental, but Runtime.GC is not supposed to be. At least I couldn't find a source stating that Runtime.GC only performs an incremental GC pass.

Interesting. Docs indeed say that, but apparently there is work being delayed if we're observing that an arbitrary number of cycles is necessary to get a collection.

Code retaining references to the object should have called Retain() to ensure they own the reference to the object.

Well... sure. If that is a public API this will be easier said than done.

pitrou · 2023-10-17T18:47:19Z

Interesting. Docs indeed say that, but apparently there is work being delayed if we're observing that an arbitrary number of cycles is necessary to get a collection.

This is where we disagree on the interpretation :-). You are saying that Runtime.GC does not reliably ensure a full GC pass occurred. I could not find a reference to support that point. Other languages like Java or C# with sophisticated GCs also expose functions to ensure a full GC pass, so it's at least conceptually possible to do so.

My interpretation reading the Runtime.SetFinalizer doc is that, even if the GC detects that an object does not have any external references anymore, its finalization might be arbitrarily delayed (until the end of the process if you're unlucky).

Well... sure. If that is a public API this will be easier said than done.

AFAICT, all Go Arrow objects that expose a refcounted Release method also expose a Retain counterpart. But @zeroshade should be able to confirm or infirm this.

felipecrv · 2023-10-17T19:15:11Z

Haha. I wasn't interpreting the docs, just commenting about tracing GCs in general (the ones without ref-counting) based on what I've read in the past about Go's GC. You convinced me that the docs give very mixed signals.

Other languages like Java or C# with sophisticated GCs also expose functions to ensure a full GC pass, so it's at least conceptually possible to do so.

It's possible, but extremely costly, so people developing these runtimes tend to cheat to prevent their customers from killing their servers' throughput by abusing the manual GC API. Designs that rely on finalizers are very frowned upon.

AFAICT, all Go Arrow objects that expose a refcounted Release method also expose a Retain counterpart. But @zeroshade should be able to confirm or infirm this.

Yes. It's a common pattern. It would be a problem if the API was already being used publicly relying solely on GC's semantics.

zeroshade · 2023-10-17T19:39:48Z

The reason why the Release/Retain API was put forth for the Arrow objects is because we have the Allocator interface to allow for allocations which don't rely on the Go GC, so we maintain the reference counts in order to ensure proper cleanup calls to Free on the allocator.

I leveraged this for #38314 (review)

) ### Rationale for this change The usage of `SetFinalizer` means that it's not *guaranteed* that calling `Release()` on an imported Record or Array will actually free the memory during the lifetime of the process. Instead we can leverage a shared buffer count, atomic ref counting and a custom allocator to ensure proper and more timely memory releasing when importing from C Data interface. ### What changes are included in this PR? * Some simplifications of code to use `unsafe.Slice` instead of the deprecated handling of `reflect.SliceHeader` to improve readability * Updating tests using `mallocator.Mallocator` in order to easily allow testing to ensure that memory is being cleaned up and freed * Fixing a series of memory leaks subsequently found by the previous change of using the `mallocator.Mallocator` to track the allocations used for testing arrays. ### Are these changes tested? Yes, unit tests are updated and included. * Closes: #38281 Authored-by: Matt Topol <zotthewizard@gmail.com> Signed-off-by: Matt Topol <zotthewizard@gmail.com>

apache#38314) ### Rationale for this change The usage of `SetFinalizer` means that it's not *guaranteed* that calling `Release()` on an imported Record or Array will actually free the memory during the lifetime of the process. Instead we can leverage a shared buffer count, atomic ref counting and a custom allocator to ensure proper and more timely memory releasing when importing from C Data interface. ### What changes are included in this PR? * Some simplifications of code to use `unsafe.Slice` instead of the deprecated handling of `reflect.SliceHeader` to improve readability * Updating tests using `mallocator.Mallocator` in order to easily allow testing to ensure that memory is being cleaned up and freed * Fixing a series of memory leaks subsequently found by the previous change of using the `mallocator.Mallocator` to track the allocations used for testing arrays. ### Are these changes tested? Yes, unit tests are updated and included. * Closes: apache#38281 Authored-by: Matt Topol <zotthewizard@gmail.com> Signed-off-by: Matt Topol <zotthewizard@gmail.com>

pitrou added the Type: bug label Oct 16, 2023

github-actions bot added the Component: Go label Oct 16, 2023

pitrou added this to the 15.0.0 milestone Oct 16, 2023

zeroshade added a commit to zeroshade/arrow that referenced this issue Oct 16, 2023

apacheGH-38281: [Go] Ensure CData imported arrays are freed on release

85a3e79

github-actions bot mentioned this issue Oct 17, 2023

GH-38281: [Go] Ensure CData imported arrays are freed on release #38314

Merged

github-actions bot assigned zeroshade Oct 17, 2023

zeroshade closed this as completed in #38314 Oct 18, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Go] Release() on C Data-imported arrays and batches does not ensure release() is called #38281

[Go] Release() on C Data-imported arrays and batches does not ensure release() is called #38281

pitrou commented Oct 16, 2023 •

edited

pitrou commented Oct 16, 2023

pitrou commented Oct 16, 2023

zeroshade commented Oct 16, 2023

pitrou commented Oct 16, 2023

zeroshade commented Oct 16, 2023

felipecrv commented Oct 17, 2023

pitrou commented Oct 17, 2023

felipecrv commented Oct 17, 2023

pitrou commented Oct 17, 2023

felipecrv commented Oct 17, 2023 •

edited

zeroshade commented Oct 17, 2023

[Go] Release() on C Data-imported arrays and batches does not ensure release() is called #38281

[Go] Release() on C Data-imported arrays and batches does not ensure release() is called #38281

Comments

pitrou commented Oct 16, 2023 • edited

Describe the bug, including details regarding any error messages, version, and platform.

Component(s)

pitrou commented Oct 16, 2023

pitrou commented Oct 16, 2023

zeroshade commented Oct 16, 2023

pitrou commented Oct 16, 2023

zeroshade commented Oct 16, 2023

felipecrv commented Oct 17, 2023

pitrou commented Oct 17, 2023

felipecrv commented Oct 17, 2023

pitrou commented Oct 17, 2023

felipecrv commented Oct 17, 2023 • edited

zeroshade commented Oct 17, 2023

pitrou commented Oct 16, 2023 •

edited

felipecrv commented Oct 17, 2023 •

edited