GH-37789: [Integration][Go] Go C Data Interface integration testing #37788

Merged 7 commits on Sep 26, 2023

pitrou
Member

@pitrou pitrou commented Sep 19, 2023

Rationale for this change

We want to enable integration testing of the Arrow Go implementation of the C Data Interface, so as to ensure interoperability.

What changes are included in this PR?

  1. Enable C Data Interface integration testing for the Arrow Go implementation
  2. Fix compatibility issues found by the integration tests

Are these changes tested?

Yes, by construction.

Are there any user-facing changes?

Bugfixes in the Arrow Go C Data Interface implementation.

@github-actions

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
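
The three title templates above can be checked mechanically. Here is a minimal sketch in Python, assuming a hypothetical helper `is_valid_pr_title`; the regular expression is inferred from the templates for illustration and is not the check the Arrow bot actually runs.

```python
import re

# Hypothetical pattern inferred from the templates above: a "GH-<id>",
# "MINOR" or "PARQUET-<id>" prefix, one or more "[Component]" tags,
# then a non-empty summary.
_TITLE_RE = re.compile(r"^(GH-\d+|MINOR|PARQUET-\d+): (\[[^\]]+\])+ \S.*$")


def is_valid_pr_title(title: str) -> bool:
    """Return True if `title` matches one of the three documented formats."""
    return _TITLE_RE.match(title) is not None
```

With this sketch, the final title of this PR ("GH-37789: [Integration][Go] Go C Data Interface integration testing") is accepted, while the initial "WIP: Go C Data Integration" is rejected.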


@pitrou
Member Author

pitrou commented Sep 19, 2023

@zeroshade It would be nice if you could take a look at this. This is draft as my Go skills are almost non-existent :-)

@zeroshade
Member

@pitrou I'll definitely take a look at this today. Thanks for drafting it out in the first place; I was planning on doing this myself after I saw your other PR get merged!

@pitrou pitrou changed the title from "WIP: Go C Data Integration" to "GH-37789: [Integration][Go] Go C Data Interface integration testing" Sep 19, 2023
@github-actions

⚠️ GitHub issue #37789 has been automatically assigned in GitHub to PR creator.

}

func newJsonReader(cJsonPath *C.char) (*arrjson.Reader, error) {
jsonPath := C.GoString(cJsonPath)
Member Author

@zeroshade I don't really understand our Go coding style. Should I use camelCase or snake_case for variables?

Member

The general style for Go is camelCase.

Member Author

Ok, thanks for confirming my intuition. Is there any way to enforce that?

Member

I'll see if there's an option I'm missing in the linter workflow that would enforce it.

@github-actions github-actions bot added the "awaiting committer review" label and removed the "awaiting review" label Sep 19, 2023
@github-actions github-actions bot added the "awaiting changes" label and removed the "awaiting committer review" label Sep 20, 2023
@pitrou pitrou marked this pull request as ready for review September 20, 2023 14:59
@github-actions github-actions bot added the "awaiting change review" label and removed the "awaiting changes" label Sep 20, 2023
@pitrou
Member Author

pitrou commented Sep 20, 2023

@zeroshade This is ready for review now.

Member

@zeroshade zeroshade left a comment

Overall LGTM, just some stylistic nitpicks.

go_lib="arrow_go_integration.so"
;;
Darwin)
go_lib="arrow_go_integration.so"
Member

We don't use .dylib for Mac?

Member Author

Ahah, yes, we might!
Though, admittedly this entire condition is a bit pedantic as the library is only loaded explicitly using its full pathname :-)

Member

Fair, though looking at my existing cdata/test/test_export_to_cgo.py it looks like I do the same thing as you're doing here. I guess the default output name is still .so on mac unless you provide a -o option to give it the dylib explicitly. 😄

Member Author

Ok, according to https://stackoverflow.com/a/29226313, there's a theoretical difference between the two which doesn't seem to matter in practice:

The difference between .dylib and .so on mac os x is how they are compiled. For .so files you use -shared and for .dylib you use -dynamiclib. Both .so and .dylib are interchangeable as dynamic library files and either have a type as DYLIB or BUNDLE. [...] The reason the two are equivalent on Mac OS X is for backwards compatibility with other UNIX OS programs that compile to the .so file type.
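
Since the library is only ever loaded by its full pathname, the suffix choice can be sketched as a small helper. This is a minimal illustration in Python, assuming a hypothetical function `go_lib_path`; the base name `arrow_go_integration` comes from the snippet under review, while the Windows `.dll` branch is an assumption that does not appear in this conversation.

```python
import os
import platform


def go_lib_path(build_dir, system=None):
    """Return the full path of the Go integration library (hypothetical helper)."""
    system = system or platform.system()
    if system == "Windows":
        # Assumption: the reviewed snippet only shows the ".so" branches.
        lib_name = "arrow_go_integration.dll"
    else:
        # Darwin keeps ".so" too, matching the snippet under review.
        lib_name = "arrow_go_integration.so"
    return os.path.join(build_dir, lib_name)
```

Passing `system` explicitly keeps the helper easy to exercise on any platform; by default it uses `platform.system()`.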

dev/archery/archery/integration/tester_go.py (outdated, resolved)
Comment on lines +201 to +202
# Note: the Arrow Go C Data export functions expect their output
# ArrowStream or ArrowArray argument to be zero-initialized.
# This is currently ensured through the use of `ffi.new`.
Member

Huh, I thought trampoline.c zero-initialized everything, but you're right: we only trampoline to zero-initialize for streamGetSchema and streamGetNext. Darn.

Member Author

I'm not sure what trampoline.c is, but I directly call into CGo-exported functions here.

Member

See apache/arrow-adbc#729 for the full discussion. Essentially, the callbacks populated into the C Data ArrowArrayStream struct for getting the schema and getting the next record batch made the same assumption, namely that their output arguments were zero-initialized. So @lidavidm created trampoline.c, which provides wrappers around the Go-exported streamGetSchema and streamGetNext functions, and we use those trampoline functions as the function pointers (see exportStream in cdata/exports.go).

But like I said, it looks like we only did that for the function pointers we set as callbacks in the ArrowArrayStream struct, and not for the other exported functions used for regular import/export.
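
The trampoline idea can be sketched outside of C and Go. The following is a minimal Python/ctypes analogue, not the actual trampoline.c: `ArrowSchemaLike` is a hypothetical, cut-down stand-in for the real struct, and both export functions are illustrative names. It only shows the pattern of zeroing the output argument before delegating to a function that assumes zero-initialized memory.

```python
import ctypes


class ArrowSchemaLike(ctypes.Structure):
    """Hypothetical, cut-down stand-in for a C Data Interface struct."""
    _fields_ = [
        ("format", ctypes.c_char_p),
        ("n_children", ctypes.c_int64),
        ("release", ctypes.c_void_p),
    ]


def export_assuming_zeroed(out):
    """Stand-in for an export function that fills only some fields and
    relies on the remaining ones already being zero."""
    out.format = b"i"


def export_via_trampoline(out):
    """Zero the output struct first, then delegate, so that callers may
    pass uninitialized memory."""
    ctypes.memset(ctypes.addressof(out), 0, ctypes.sizeof(out))
    export_assuming_zeroed(out)


def make_dirty_struct():
    """Simulate a caller that does not zero-initialize its struct."""
    s = ArrowSchemaLike()
    ctypes.memset(ctypes.addressof(s), 0xFF, ctypes.sizeof(s))
    return s
```

Calling `export_assuming_zeroed(make_dirty_struct())` leaves garbage in `n_children` and `release`, whereas `export_via_trampoline` yields a struct whose untouched fields are zero.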

Member

It would be good to apply the same protections for the rest.

Notably, Arrow-C++ does NOT zero-initialize structs before using them so we should try to reflect reality here.

Member

I agree. I was just thinking it should be done in a different PR than this one, unless @pitrou feels like adding the same protections to the existing exported funcs here 😄

Member Author

No, let's defer this to another issue and PR.

- defaulttz := "UTC"
+ defaulttz := ""
Member

Was this incorrect per the spec? I thought I pulled this from the C bridge, which had the same default. (I could be misremembering, though.)

Member Author

Yes, it was incorrect and failing the integration tests between C++ and Go.
A timestamp with a missing timezone is not the same as a timestamp with the UTC timezone, as explained in the format spec:

/// Timestamp is a 64-bit signed integer representing an elapsed time since a
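
In the C Data Interface, a timestamp type travels as a format string such as `tsu:UTC`, and the colon is kept even when the timezone is empty (`tsu:`). A minimal Python sketch of a parser, assuming a hypothetical helper `parse_timestamp_format` (this is not the Arrow Go code), illustrates why substituting "UTC" for an empty timezone yields a different type:

```python
# Unit letters used after the "ts" prefix in C Data Interface format strings.
_UNITS = {"s": "second", "m": "millisecond", "u": "microsecond", "n": "nanosecond"}


def parse_timestamp_format(fmt):
    """Split a timestamp format string into (unit, timezone).

    The timezone is returned exactly as found after the first colon; an
    empty string means "no timezone" and is NOT replaced by "UTC".
    """
    prefix, sep, tz = fmt.partition(":")
    if sep != ":" or len(prefix) != 3 or prefix[:2] != "ts" or prefix[2] not in _UNITS:
        raise ValueError(f"not a timestamp format string: {fmt!r}")
    return _UNITS[prefix[2]], tz
```

Here `parse_timestamp_format("tsu:")` and `parse_timestamp_format("tsu:UTC")` return different tuples, which is the mismatch the integration tests between C++ and Go caught.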

Member

Bah, must have mistaken that when I originally coded this 😦

go/arrow/cdata/cdata_exports.go (outdated, resolved)
go/arrow/internal/cdata_integration/entrypoints.go (outdated, resolved)
go/arrow/internal/cdata_integration/entrypoints.go (outdated, resolved)
go/arrow/internal/cdata_integration/entrypoints.go (outdated, resolved)
go/arrow/internal/cdata_integration/entrypoints.go (outdated, resolved)
go/arrow/internal/cdata_integration/entrypoints.go (outdated, resolved)
@github-actions github-actions bot added the "awaiting changes" label and removed the "awaiting change review" label Sep 22, 2023
@github-actions github-actions bot added the "awaiting change review" label and removed the "awaiting changes" label Sep 25, 2023
@pitrou
Member Author

pitrou commented Sep 25, 2023

The CI failures are unrelated. The Go and integration tests are all green, and I believe I've addressed all your comments, @zeroshade. Is it ok if I merge this?

@pitrou pitrou merged commit cc51e68 into apache:main Sep 26, 2023
52 of 56 checks passed
@pitrou pitrou removed the "awaiting change review" label Sep 26, 2023
@pitrou pitrou deleted the go-c-data-integration branch September 26, 2023 07:14
@zeroshade
Member

Sorry for not responding yesterday, I was out sick. But this all looked good to me! Thanks!

@conbench-apache-arrow

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit cc51e68.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.

etseidl pushed a commit to etseidl/arrow that referenced this pull request Sep 28, 2023
…ting (apache#37788)

### Rationale for this change

We want to enable integration testing of the Arrow Go implementation of the C Data Interface, so as to ensure interoperability.

### What changes are included in this PR?

1. Enable C Data Interface integration testing for the Arrow Go implementation
2. Fix compatibility issues found by the integration tests

### Are these changes tested?

Yes, by construction.

### Are there any user-facing changes?

Bugfixes in the Arrow Go C Data Interface implementation.

* Closes: apache#37789

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
JerAguilon pushed a commit to JerAguilon/arrow that referenced this pull request Oct 23, 2023
loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
Development

Successfully merging this pull request may close these issues.

[Integration][Go] Implement Go C Data Interface integration testing
3 participants