ARROW-14061: [Go][C++] Add Cgo Arrow Memory Pool Allocator #11206

zeroshade · 2021-09-21T20:15:33Z

Continuing with the idea of exposing the Compute APIs within the Go implementation via CGO: to ensure safer memory handling, there should be an allocator implementation that uses CGO to allocate memory from the C++ memory pool, along with utilities for tracking memory leaks.
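To make the "utilities for tracking memory leaks" part concrete, here is a minimal sketch of a wrapper that records how many bytes are still outstanding. It is not the code from this PR. The Allocator interface is redeclared locally with the same three-method shape as the Go Arrow memory.Allocator interface, and goAllocator, trackingAllocator and Outstanding are hypothetical names used only for illustration. A Go-backed allocator stands in for the C++ memory pool so the example needs no C toolchain.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Allocator is redeclared here so the sketch has no dependencies; it follows
// the three-method shape of the Go Arrow memory.Allocator interface.
type Allocator interface {
	Allocate(size int) []byte
	Reallocate(size int, b []byte) []byte
	Free(b []byte)
}

// goAllocator is a stand-in backend that uses Go-managed memory.
type goAllocator struct{}

func (goAllocator) Allocate(size int) []byte { return make([]byte, size) }

func (goAllocator) Reallocate(size int, b []byte) []byte {
	if size <= cap(b) {
		return b[:size]
	}
	nb := make([]byte, size)
	copy(nb, b)
	return nb
}

func (goAllocator) Free([]byte) {}

// trackingAllocator (hypothetical) wraps another Allocator and keeps a running
// total of bytes that have been allocated but not yet freed.
type trackingAllocator struct {
	bytes int64 // kept first so atomic access is 64-bit aligned on 32-bit platforms
	inner Allocator
}

func (t *trackingAllocator) Allocate(size int) []byte {
	atomic.AddInt64(&t.bytes, int64(size))
	return t.inner.Allocate(size)
}

func (t *trackingAllocator) Reallocate(size int, b []byte) []byte {
	atomic.AddInt64(&t.bytes, int64(size-len(b)))
	return t.inner.Reallocate(size, b)
}

func (t *trackingAllocator) Free(b []byte) {
	atomic.AddInt64(&t.bytes, -int64(len(b)))
	t.inner.Free(b)
}

// Outstanding reports the bytes currently allocated and not yet freed.
func (t *trackingAllocator) Outstanding() int64 { return atomic.LoadInt64(&t.bytes) }

func main() {
	mem := &trackingAllocator{inner: goAllocator{}}
	a := mem.Allocate(64)
	b := mem.Allocate(32)
	b = mem.Reallocate(128, b)
	mem.Free(a)
	fmt.Println("outstanding:", mem.Outstanding()) // 128, because b is still live
	mem.Free(b)
	fmt.Println("outstanding:", mem.Outstanding()) // 0, nothing leaked
}
```

A test can assert that Outstanding returns 0 once it finishes; a non-zero value points at a buffer that was never released, which matters most when the memory lives outside the Go heap and the garbage collector cannot reclaim it.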

github-actions · 2021-09-21T20:15:55Z

https://issues.apache.org/jira/browse/ARROW-14061

github-actions · 2021-09-21T20:15:56Z

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

zeroshade · 2021-09-28T14:52:03Z

CC @emkornfield

We really need to get someone else who can review Go stuff so I don't keep overwhelming you with so many PRs haha.

Also, @pitrou may want to look at this or know who else I can tag given the connection with the C++ memory pool stuff here.


pitrou · 2021-09-29T08:14:25Z

This PR isn't based on git master, is it?

ci/docker/debian-10-go-cgo.dockerfile

pitrou · 2021-09-29T08:16:20Z

go/arrow/memory/cgo_allocator.go

+// the context of a single function call. If the memory in use isn't allocated
+// on the C side, then it is not safe for any pointers to data to be held outside
+// of Go beyond the context of a single Cgo function call as it will be invisible
+// to the Go garbage collector and could potentially get moved without being updated.


I don't understand this comment. In #11220 your releaseData function seems to take care of this correctly (by defining a global container of exported arrays).

zeroshade · 2021-09-29

During a CGO call, the Go runtime keeps the Go memory passed to that call pinned, so any pointers to Go memory remain valid for the duration of the call. But the documentation is very clear that it is not safe to let C/C++ hold pointers to Go memory beyond that single call: once the CGO call returns, the Go garbage collector is free to move memory around as necessary. Normally the garbage collector can update any references held in Go when it does this, so everything continues to work just fine, but it cannot update pointers held in C/C++.

So the global container of exported arrays used by releaseData serves two purposes:

1. Ensuring that, for the duration of a single CGO call, a reference to the Go objects is maintained so that the garbage collector does not clean them up.

2. Allowing C/C++ to call releaseData to free the memory if we're handing it off, or handing the memory off so that C/C++ owns it and controls when it is released. This is only safe if the underlying buffers were allocated by C rather than by Go.

For more information about passing pointers specifically, https://pkg.go.dev/cmd/cgo#hdr-Passing_pointers is the CGO documentation I'm referencing.
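The "global container" idea can be sketched in pure Go as a registry that hands out opaque integer handles. This is an illustration under assumptions, not the releaseData code from #11220: registry, Export, Lookup and Release are hypothetical names. The point is that foreign code would only ever see the integer handle, never a Go pointer, while the registry keeps the Go value reachable until it is released.

```go
package main

import (
	"fmt"
	"sync"
)

// registry (hypothetical) maps opaque integer handles to Go values. While a
// value is stored here it stays reachable, so the garbage collector will not
// reclaim it, and C code only needs to hold the integer.
type registry struct {
	mu     sync.Mutex
	next   uintptr
	values map[uintptr]interface{}
}

func newRegistry() *registry {
	return &registry{next: 1, values: make(map[uintptr]interface{})}
}

// Export stores v and returns a handle that can be passed across the boundary.
func (r *registry) Export(v interface{}) uintptr {
	r.mu.Lock()
	defer r.mu.Unlock()
	h := r.next
	r.next++
	r.values[h] = v
	return h
}

// Lookup returns the value for h, if it is still registered.
func (r *registry) Lookup(h uintptr) (interface{}, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	v, ok := r.values[h]
	return v, ok
}

// Release drops the value for h. It reports false for an unknown handle,
// which makes a double release detectable instead of silently ignored.
func (r *registry) Release(h uintptr) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.values[h]; !ok {
		return false
	}
	delete(r.values, h)
	return true
}

func main() {
	reg := newRegistry()
	h := reg.Export([]int64{1, 2, 3})
	v, ok := reg.Lookup(h)
	fmt.Println(v, ok)                          // [1 2 3] true
	fmt.Println(reg.Release(h), reg.Release(h)) // true false
}
```

The standard library's runtime/cgo.Handle type (Go 1.17 and later) packages the same pattern. Note that a handle only keeps the Go object alive; it does not make it safe for C to keep raw pointers into Go-allocated buffers, which is why the buffers themselves need to come from C when ownership is handed off.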

pitrou · 2021-09-29

Ok, so what it seems to mean is that Go should always use its own allocator when allocating array data, rather than the Go GC. Would there be a problem in doing that?

zeroshade · 2021-09-29

That's also a possibility: rather than using the memory pool exposed by the Arrow library, we could create our own allocator (or use an existing one) that always allocates memory through C, instead of the DefaultGoAllocator, which uses Go-allocated memory. But that changes the semantics of memory usage and potentially the performance characteristics, because CGO calls have a higher overhead.

If you aren't making calls to C or passing data around, then using the Go allocator is perfectly fine and performant. (If you're doing a lot of allocations, a few libraries have shown significant benefits from using something like jemalloc and managing the memory manually via C calls, but this has to be done carefully to avoid paying large overheads in CGO calls and to avoid introducing extra dependencies that may not be needed.)

I can definitely see a future enhancement to make the default allocator do this, but I didn't want to change the default allocation behavior yet.

zeroshade · 2021-09-29

I don't want a vanilla go get github.com/apache/arrow/go/arrow to require the C++ libarrow to be available to link against.

pitrou · 2021-09-29

I don't think the C++ libarrow allocator is necessary, just any C-based allocator (or perhaps there's already a public Go package for this?)

zeroshade · 2021-09-29

That's fair, and I was considering something like that. The drawback is that we don't want every allocation to go through C, because CGO has a higher overhead: if you're making a lot of smaller allocations, the overhead of the CGO calls could be a problem for making it the default from a performance standpoint. So at a minimum I'd want to hide it behind a build tag, just in case.

pitrou · 2021-09-29

Ok, cgo performance does seem horrible: https://about.sourcegraph.com/go/gophercon-2018-adventures-in-cgo-performance/

zeroshade · 2021-09-29

Yea, I figured it was simplest to use the memory pool that already exists in libarrow (as I use libarrow in my future changes to connect to the compute API), but I could potentially use something like https://github.com/spinlock/jemalloc-go to just include jemalloc as a go-gettable dependency. I'll play around with that and see how it does. I liked the libarrow memory pool because of the consistency it provides and features such as the logging proxy and the tracking of how many bytes have been allocated, but it is likely pretty easy to build similar functionality on top of jemalloc directly. As long as consumers are using Reserve and otherwise pre-allocating memory, that would cut down the number of CGO calls and mitigate the performance hit.

Alternatively, I could look into a slab-style allocator that manages much larger chunks of memory and hands out pieces of them, as a way to amortize the cost of the calls, but that's a separate thing to look into later.
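The amortization argument behind the slab idea can be illustrated with a small pure-Go sketch. Everything here is an assumption made for illustration: backend stands in for an allocator whose every call would cross the CGO boundary (it simply counts calls), and slab is a hypothetical type that carves small allocations out of large chunks. The sketch ignores freeing, alignment (Arrow buffers are normally 64-byte aligned) and the unused tail of a chunk that is abandoned when a new chunk is requested.

```go
package main

import "fmt"

// backend counts how often it is asked for memory. It stands in for an
// allocator where each call would be an expensive CGO call.
type backend struct{ calls int }

func (b *backend) Allocate(size int) []byte {
	b.calls++
	return make([]byte, size)
}

// slab (hypothetical) requests large chunks from the backend and hands out
// small pieces of them, so many small allocations share one backend call.
type slab struct {
	src       *backend
	chunkSize int
	current   []byte // unused remainder of the current chunk
}

func (s *slab) Allocate(size int) []byte {
	if size > s.chunkSize {
		// Oversized requests bypass the slab and go straight to the backend.
		return s.src.Allocate(size)
	}
	if size > len(s.current) {
		// Not enough room left: start a new chunk (the old tail is wasted).
		s.current = s.src.Allocate(s.chunkSize)
	}
	// The three-index slice caps capacity so that an append on the returned
	// slice cannot spill into a neighbouring allocation.
	out := s.current[:size:size]
	s.current = s.current[size:]
	return out
}

func main() {
	direct := &backend{}
	for i := 0; i < 1000; i++ {
		direct.Allocate(64)
	}

	pooled := &backend{}
	s := &slab{src: pooled, chunkSize: 64 * 1024}
	for i := 0; i < 1000; i++ {
		s.Allocate(64)
	}

	fmt.Println(direct.calls, pooled.calls) // 1000 1
}
```

With 64 KiB chunks, a thousand 64-byte allocations fit in a single chunk, so the backend is called once instead of a thousand times. Pre-allocating with Reserve has a similar effect from the consumer side: fewer, larger requests reach the underlying allocator.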

go/arrow/memory/internal/cgoalloc/helpers.h

go/arrow/memory/internal/cgoalloc/allocator.go

go/arrow/memory/internal/cgoalloc/allocator.cc

pitrou · 2021-09-29T08:31:47Z

@kou Do you want to take a look at the MinGW CI additions?

zeroshade · 2021-09-29T13:48:31Z

@pitrou this is based on master, though it wasn't up to date until just now, when I re-fetched and merged upstream.


.github/workflows/go.yml

ci/docker/debian-go-cgo.dockerfile

zeroshade · 2021-09-30T16:20:49Z

@pitrou @kou Do either of you have any further comments / issues with this as it stands?

pitrou · 2021-09-30T16:29:12Z

No, feel free to move forward.

Continuing with the idea of exposing the Compute APIs within the Go implementation via CGO, in order to ensure safer memory handling there should be an allocator implementation which uses CGO in order to allocate memory via the C++ memory pool along with utilities for tracking memory leaks.

Closes apache#11206 from zeroshade/arrow-14061

Lead-authored-by: Matthew Topol <mtopol@factset.com>
Co-authored-by: Matt Topol <zotthewizard@gmail.com>
Signed-off-by: Matthew Topol <mtopol@factset.com>

Matthew Topol added 3 commits September 21, 2021 15:56

implementation

e91c738

ensure base package builds with cgo disabled and cgo image builds

1c6e8c7

add tag ccalloc to properly test

558a3e3

github-actions bot added the Component: Go label Sep 21, 2021

Matthew Topol and others added 8 commits September 21, 2021 16:22

macos using old bash doesn't have -v test

570e648

forgot to fix go_test for macos

842c5f8

macos test with cgo

aed85a3

add -std flag to cxx compiler

e7bf430

don't static link on windows yet

6cf5559

setup ming paths

3a48d1a

fixup tests

e3f36f8

Merge branch 'apache:master' into arrow-14061

968d3c3

zeroshade mentioned this pull request Sep 27, 2021

ARROW-14106: [Go][C] Implement Exporting to the C Data Interface #11220

Closed

zeroshade added 2 commits September 27, 2021 12:20

Merge branch 'apache:master' into arrow-14061

37fd8fb

Merge branch 'apache:master' into arrow-14061

c37df43

Matthew Topol added 3 commits September 28, 2021 11:58

adding a bunch of comments and docs for the allocator and helpers

b0e643d

Merge branch 'arrow-14061' of https://github.com/zeroshade/arrow into…

21d159f

… arrow-14061

comment about costs of small allocations with cgo

70770e8

pitrou reviewed Sep 29, 2021


Merge branch 'apache:master' into arrow-14061

90fbeab

Matthew Topol added 4 commits September 29, 2021 10:45

default memory pool and comments for clarification. don't leak 0 byte…

8edd1f9

… allocations

Merge branch 'arrow-14061' of https://github.com/zeroshade/arrow into…

3608d0a

… arrow-14061

use default allocator, simplify the mem_holder

3afa43b

cleanup dockerfile and comments

9558fc8

update to debian 11 via base argument

6547b24

kou reviewed Sep 29, 2021


.github/workflows/go.yml (outdated)

ci/docker/debian-go-cgo.dockerfile

Matthew Topol added 2 commits September 30, 2021 12:05

move cgo env vars to msys2_setup.sh

b33c65f

add ming_prefix/bin to path

a13bacd

asfgit closed this in 8568942 Sep 30, 2021

zeroshade deleted the arrow-14061 branch September 30, 2021 16:50

asfimport mentioned this pull request Sep 30, 2021

[Go] Add Cgo Arrow Memory Pool Allocator #29657

Closed
