-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-13816: [Go][C] Implement Consumer APIs for C Data Interface in Go #11037
Conversation
The failure in the integration test is an issue with the rust implementation that is unrelated to this change it seems. |
Yes, the integration test failure is unrelated. |
Hmm, I feel incompetent to review Go code.
|
@pitrou I'm rather happy with myself at the moment since I've managed to actually pull your suggestion off, though i'm not sure the best way to actually incorporate such a test into the current build system. For posterity, here's some snippets showing what I did: Created a go file with the following: package main
import (
"fmt"
"unsafe"
"github.com/apache/arrow/go/arrow/cdata"
)
// #include <stdint.h>
import "C"
//export importSchema
func importSchema(ptr C.uintptr_t) {
sc := (*cdata.CArrowSchema)(unsafe.Pointer(uintptr(ptr)))
fmt.Println(cdata.ImportCArrowSchema(sc))
}
func main() {} In order to build a c-shared library it apparently needs to be a main package. Anyway, the above file will export a function "importSchema" which takes a uintptr_t which is a pointer to an >>> import pyarrow as pa
>>> from pyarrow.cffi import ffi
>>> c_schema = ffi.new("struct ArrowSchema*")
>>> ptr_schema = int(ffi.cast("uintptr_t", c_schema))
>>> pa.schema([('ints', pa.list_(pa.int32()))], metadata={b'key1': b'value1'})._export_to_c(ptr_schema)
>>> from ctypes import *
>>> gotest = cdll.LoadLibrary('gotest.dll') # or gotest.so
>>> gotest.importSchema(c_ulonglong(ptr_schema))
schema:
fields: 1
- ints: type=list<item: int32>, nullable
metadata: ["key1": "value1"]
0 So all in all, really cool. But I don't know where in the tests would make sense for how I should incorporate such a test where I build the go dynamic lib and then run the a python script with pyarrow to call the function. Any ideas? |
Well, I don't know how Go runs tests, so I'm not able to answer that question. Perhaps @sbinet or @fsaintjacques have an idea. |
Note that the CI runs the scripts https://github.com/apache/arrow/blob/master/ci/scripts/go_build.sh and https://github.com/apache/arrow/blob/master/ci/scripts/go_test.sh as part of its Docker routine: https://github.com/apache/arrow/blob/master/docker-compose.yml#L1196-L1199 (which you can also reproduce using |
so, the go tests just run by running |
Ah, you can just |
Ok, i'll muck around with the tests and see if i can easily put something together to do that as an integration test then. |
@pitrou I've updated the test cases to add a test which installs pyarrow and runs a python script against a built golang shared object module to test and confirm exporting a schema and record batch from python to Go and Go importing it correctly. 😄 |
@pitrou @emkornfield @sbinet @fsaintjacques Any chance I might be able to get an approval on this? I have POC for implementing the producer APIs but it relies a lot on this same consumer code so I'd rather file it as a subsequent PR rather than make this one bigger. |
I don't really read Go. I can try to give general advice but some Go developer will have to review the implementation. |
@zeroshade I pushed some changes to improve the Python tests and try to fix some allocation issues. Can you double-check them? |
def run_gc(self): | ||
# Several Go GC runs can be required to run all finalizers | ||
for i in range(5): | ||
cgotest.runGC() | ||
gc.collect() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fsaintjacques Does this look reasonable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pitrou rather than doing this, would it make more sense for me to simply include calls to runtime.GC()
in the test module's import functions? that way the python doesn't need to explicitly call the Go garbage collector?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now this would be sufficient, but when we test the other way (export from Go to Python) we may need to invoke the Go GC from Python after the Python Schema or RecordBatch is garbage-collected.
(sorry: may, not will)
@pitrou I'm updating the comments on the exported Import functions in Otherwise your changes look fine from my end. Thanks for the assistance. |
@emkornfield @sbinet @fsaintjacques would one of you be willing to just give this a once over and approve? While I am a committer now, I'm not sure I wanna go merging this without at least one approval on it haha. 😄 |
@pitrou I know you're not completely comfortable reviewing the Go code, but given that i've been unable to get someone else to come comment on it, do you have any particular qualms with me merging this as it's holding up some other development that I want to get PRs up for with the Go library. |
@zeroshade I'm fine with you merging this. Go ahead, and let's move forward :-) |
This supercedes some portions of apache#10995 as I think it's better to centralize the cdata interface interactions in a specific module that is part of the arrow module rather than it being a part of the dataset module. So this should get reviewed and merged first. CC @pitrou @emkornfield Closes apache#11037 from zeroshade/go-cdata Lead-authored-by: Matthew Topol <mtopol@factset.com> Co-authored-by: Antoine Pitrou <antoine@python.org> Co-authored-by: Matt Topol <mtopol@factset.com> Signed-off-by: Matthew Topol <mtopol@factset.com>
This supercedes some portions of #10995 as I think it's better to centralize the cdata interface interactions in a specific module that is part of the arrow module rather than it being a part of the dataset module. So this should get reviewed and merged first.
CC @pitrou @emkornfield