GH-37537: [Integration][C++] Add C Data Interface integration testing #37769

pitrou · 2023-09-18T14:57:39Z

Rationale for this change

Currently there are no systematic integration tests between implementations of the C Data Interface, only a couple ad-hoc tests.

What changes are included in this PR?

1. Add Archery infrastructure for integration testing of the C Data Interface
2. Add implementation of this interface for Arrow C++

Are these changes tested?

Yes, by construction.

Are there any user-facing changes?

No.

Closes: [Integration] Add C Data Interface integration testing #37537


pitrou · 2023-09-18T15:34:26Z

@wjones127 @tustvold, FYI, following this PR, you'll need to add the --run-ipc option in these two places:

tustvold · 2023-09-18T15:43:44Z

Thank you for the heads up - created a tracking issue so this doesn't get missed - apache/arrow-rs#4828

bkietz

Looks solid

bkietz · 2023-09-18T17:26:07Z

dev/archery/archery/integration/tester.py

+        raise NotImplementedError
+
+    def compare_allocation_state(self, recorded: object,
+                                 gc_until: typing.Callable[[_Predicate], bool]


I find this parameter's type quite confusing. IIUC it's a cancel token to cut off long-running GC? I'm not sure why this control is given to an exporter or importer. Instead, I would think it makes more sense for the runner's check_memory_released to construct this token. It could be a Callable[[], bool] that returns true if GC is taking too long and the runner is aborting the current case.

pitrou · 2023-09-19

No, really, it's as the doc states. Some runtimes may need several GC calls to properly release memory, hence the name.

bkietz · 2023-09-19

I'm probably missing something, but it seems like this could be done more simply with something like:

    @contextmanager
    def check_memory_released(exporter: CDataExporter, importer: CDataImporter,
                              gc_timeout: Callable[[], bool] = _default_timeout):
        do_check = (exporter.supports_releasing_memory and
                    importer.supports_releasing_memory)
        if not do_check:
            yield
            return
        before = exporter.record_allocation_state()
        yield
        while exporter.record_allocation_state() != before:
            if gc_timeout():
                raise ...
            importer.gc_once()

pitrou · 2023-09-19

It probably could, though I'm not sure it's simpler :-). Both solutions (yours and mine) are IMHO not very elegant, and we may have to revisit this if making two GCs coexist ends up being more complicated...

bkietz · 2023-09-19

I prefer mine because it has less inversion of control: the exporter doesn't need to manage the importer's garbage collection.

pitrou · 2023-09-19

I'll go with the current version. This can evolve quite easily since it is internal tooling, so there are no API guarantees.
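For readers following the thread, here is a minimal, self-contained sketch of the shape that was kept: the exporter's compare_allocation_state receives a gc_until callable and hands it a predicate, and the importer decides how many GC passes to attempt. This is not the Archery code. ToyExporter, ToyImporter, the integer allocation counter, max_passes, and the RuntimeError message are all assumptions made up for illustration; only the method names record_allocation_state, compare_allocation_state, gc_until, gc_once, supports_releasing_memory and check_memory_released come from the discussion above.

```python
import typing
from contextlib import contextmanager

_Predicate = typing.Callable[[], bool]


class ToyExporter:
    """Hypothetical exporter tracking outstanding allocations as a count."""

    supports_releasing_memory = True

    def __init__(self) -> None:
        self.allocated = 0

    def record_allocation_state(self) -> object:
        return self.allocated

    def compare_allocation_state(
            self, recorded: object,
            gc_until: typing.Callable[[_Predicate], bool]) -> bool:
        # The exporter only supplies the predicate; the importer's
        # gc_until decides how hard to try to release memory.
        return gc_until(lambda: self.allocated == recorded)


class ToyImporter:
    """Hypothetical importer whose GC frees one buffer per pass."""

    supports_releasing_memory = True

    def __init__(self, exporter: ToyExporter, max_passes: int = 10) -> None:
        self.exporter = exporter
        self.max_passes = max_passes  # assumed bound, not from the PR

    def gc_once(self) -> None:
        if self.exporter.allocated > 0:
            self.exporter.allocated -= 1

    def gc_until(self, predicate: _Predicate) -> bool:
        # One GC pass may not be enough, so retry a bounded number of times
        # and report failure by returning False rather than raising.
        for _ in range(self.max_passes):
            if predicate():
                return True
            self.gc_once()
        return predicate()


@contextmanager
def check_memory_released(exporter: ToyExporter, importer: ToyImporter):
    do_check = (exporter.supports_releasing_memory and
                importer.supports_releasing_memory)
    if do_check:
        before = exporter.record_allocation_state()
    yield
    # Runs only if the enclosed block did not raise.
    if do_check and not exporter.compare_allocation_state(
            before, importer.gc_until):
        raise RuntimeError("memory was not released")


exporter = ToyExporter()
importer = ToyImporter(exporter)
with check_memory_released(exporter, importer):
    exporter.allocated += 3  # simulate three buffers handed to the importer
```

In this toy setup the importer needs three passes to get back to the recorded state, which is the situation "several GC calls" refers to. The alternative proposed in the thread would move the retry loop into check_memory_released itself and leave the importer with only gc_once.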

dev/archery/archery/integration/tester_cpp.py

wjones127

Overall this looks good. Have a few minor comments.

dev/archery/archery/integration/runner.py

dev/archery/archery/integration/tester.py

Co-authored-by: Will Jones <willjones127@gmail.com>

pitrou · 2023-09-19T14:41:17Z

I'll merge this PR now, thanks for the reviews!

conbench-apache-arrow · 2023-09-19T17:54:22Z

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 3b646ad.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.

…esting (apache#37769)

### Rationale for this change

Currently there are no systematic integration tests between implementations of the C Data Interface, only a couple ad-hoc tests.

### What changes are included in this PR?

1. Add Archery infrastructure for integration testing of the C Data Interface
2. Add implementation of this interface for Arrow C++

### Are these changes tested?

Yes, by construction.

### Are there any user-facing changes?

No.

* Closes: apache#37537

Lead-authored-by: Antoine Pitrou <antoine@python.org>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Co-authored-by: Will Jones <willjones127@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>

github-actions bot added Component: C++ awaiting review Awaiting review labels Sep 18, 2023

apacheGH-37537: [Integration][C++] Add C Data Interface integration t…

92cb90e

…esting

pitrou force-pushed the gh37537-c-data-integration branch from ce9186f to 92cb90e Compare September 18, 2023 15:29

pitrou marked this pull request as ready for review September 18, 2023 15:29

pitrou requested review from assignUser, kou and raulcd as code owners September 18, 2023 15:29

tustvold mentioned this pull request Sep 18, 2023

Enable New Integration Tests apache/arrow-rs#4828

Closed

pitrou requested review from bkietz and wjones127 and removed request for raulcd and assignUser September 18, 2023 15:54

pitrou added Component: Archery Component: Integration labels Sep 18, 2023

bkietz requested changes Sep 18, 2023


github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Sep 18, 2023

wjones127 approved these changes Sep 18, 2023


github-actions bot added awaiting merge Awaiting merge and removed awaiting changes Awaiting changes labels Sep 18, 2023

pitrou and others added 3 commits September 19, 2023 08:16

Update dev/archery/archery/integration/runner.py

0886841

Co-authored-by: Will Jones <willjones127@gmail.com>

Update dev/archery/archery/integration/runner.py

11b707b

Co-authored-by: Will Jones <willjones127@gmail.com>

Add default implementation for gc_until()

f22bc37

github-actions bot added awaiting changes Awaiting changes and removed awaiting merge Awaiting merge labels Sep 19, 2023

pitrou merged commit 3b646ad into apache:main Sep 19, 2023
40 checks passed

pitrou removed the awaiting changes Awaiting changes label Sep 19, 2023

pitrou deleted the gh37537-c-data-integration branch September 19, 2023 14:41

github-actions bot added the awaiting committer review Awaiting committer review label Sep 19, 2023

pitrou mentioned this pull request Sep 27, 2023

[Integration][Java] Implement C Data Interface integration testing for Java #37910

Closed
