GH-37537: [Integration][C++] Add C Data Interface integration testing #37769

Merged
pitrou merged 4 commits into apache:main from gh37537-c-data-integration on Sep 19, 2023

Conversation

Member

@pitrou pitrou commented Sep 18, 2023

Rationale for this change

Currently there are no systematic integration tests between implementations of the C Data Interface, only a couple of ad hoc tests.

What changes are included in this PR?

  1. Add Archery infrastructure for integration testing of the C Data Interface
  2. Add implementation of this interface for Arrow C++

Are these changes tested?

Yes, by construction.

Are there any user-facing changes?

No.

@pitrou pitrou marked this pull request as ready for review September 18, 2023 15:29
@pitrou
Member Author

pitrou commented Sep 18, 2023

@wjones127 @tustvold, FYI, following this PR, you'll need to add the --run-ipc option in these two places:

@tustvold
Contributor

Thank you for the heads up - created a tracking issue so this doesn't get missed - apache/arrow-rs#4828

Member

@bkietz bkietz left a comment

Looks solid

raise NotImplementedError

def compare_allocation_state(self, recorded: object,
gc_until: typing.Callable[[_Predicate], bool]
Member

I find this parameter's type quite confusing. IIUC, it's a cancel token to cut off long-running GC? I'm not sure why this control is given to an exporter or importer. I would think it makes more sense for the runner's check_memory_released to construct this token. It could be a Callable[[], bool] that returns True if GC is taking too long and the runner is aborting the current case.

Member Author

No, really, it's as the doc states. Some runtimes may need several GC calls to properly release memory, hence the name.
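
For readers following the thread, here is a minimal sketch of the "call GC repeatedly until a condition holds" idea described in the reply above. It is an illustration under assumptions, not the code from this PR: the `gc_until` helper, its `max_attempts` parameter, and the `ToyExporter` class are hypothetical, and only the `compare_allocation_state(recorded, gc_until)` shape is taken from the diff excerpt.

```python
import gc
from typing import Callable

# A predicate takes no arguments and reports whether memory has been released.
Predicate = Callable[[], bool]


def gc_until(predicate: Predicate, max_attempts: int = 10) -> bool:
    """Run GC passes until predicate() holds or attempts run out.

    Hypothetical helper: some runtimes may need several passes before
    all memory is actually released, hence the loop.
    """
    for _ in range(max_attempts):
        if predicate():
            return True
        gc.collect()
    return predicate()


class ToyExporter:
    """Hypothetical exporter that tracks a single byte counter."""

    def __init__(self) -> None:
        self.allocated_bytes = 0

    def record_allocation_state(self) -> object:
        return self.allocated_bytes

    def compare_allocation_state(
        self, recorded: object, gc_until: Callable[[Predicate], bool]
    ) -> bool:
        # The exporter supplies the condition; the caller-supplied
        # gc_until decides how to collect garbage until it holds.
        return gc_until(lambda: self.allocated_bytes == recorded)
```

In this shape the exporter only knows how to describe "memory is back to the recorded state", while the importer side supplies the garbage collection strategy.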

Member

I'm probably missing something, but it seems like this could be done more simply with something like:

@contextmanager
def check_memory_released(exporter: CDataExporter, importer: CDataImporter,
                          gc_timeout: Callable[[], bool] = _default_timeout):
    do_check = (exporter.supports_releasing_memory and
                importer.supports_releasing_memory)
    if not do_check:
        yield
        return
    before = exporter.record_allocation_state()
    yield
    while exporter.record_allocation_state() != before:
        if gc_timeout():
            raise ...
        importer.gc_once()
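
To make the suggestion above concrete, here is a self-contained sketch that can be run as-is. It rests on several assumptions: `FakeExporter`, `FakeImporter`, and `attempt_limit` are hypothetical stand-ins, `gc_once` is the method name proposed in the snippet rather than an existing API, and `RuntimeError` is an arbitrary choice for the elided `raise ...`.

```python
from contextlib import contextmanager
from typing import Callable


class FakeExporter:
    """Hypothetical stand-in for an exporter; tracks one counter."""
    supports_releasing_memory = True

    def __init__(self) -> None:
        self.allocated = 0

    def record_allocation_state(self) -> object:
        return self.allocated


class FakeImporter:
    """Hypothetical importer whose GC needs several passes to free memory."""
    supports_releasing_memory = True

    def __init__(self, exporter: FakeExporter, passes_needed: int) -> None:
        self._exporter = exporter
        self._passes_left = passes_needed

    def gc_once(self) -> None:
        self._passes_left -= 1
        if self._passes_left <= 0:
            self._exporter.allocated = 0


def attempt_limit(max_attempts: int) -> Callable[[], bool]:
    """Build a 'timeout' token that trips after max_attempts checks."""
    remaining = max_attempts

    def timed_out() -> bool:
        nonlocal remaining
        remaining -= 1
        return remaining < 0

    return timed_out


@contextmanager
def check_memory_released(exporter, importer, gc_timeout: Callable[[], bool]):
    do_check = (exporter.supports_releasing_memory and
                importer.supports_releasing_memory)
    if not do_check:
        yield
        return
    before = exporter.record_allocation_state()
    yield
    # The runner drives the importer's GC until the exporter's state
    # matches the recorded one, or until the timeout token trips.
    while exporter.record_allocation_state() != before:
        if gc_timeout():
            raise RuntimeError("memory was not released")
        importer.gc_once()
```

A possible usage: wrap the export/import step in `with check_memory_released(exporter, importer, attempt_limit(5)):`. The block exits normally if the importer frees the memory within five GC passes, and raises otherwise.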

Member Author

It probably could, though I'm not sure it's simpler :-). Both solutions (yours and mine) are IMHO not very elegant, and we may have to revisit if making two GCs coexist ends up more complicated...

Member

I prefer mine for less inversion of control since the exporter doesn't need to manage the importer's garbage collection.

Member Author

I'll go with the current version. This can evolve quite easily as this is internal tooling, so no API guarantees.

dev/archery/archery/integration/tester_cpp.py (resolved)
@github-actions github-actions bot added the awaiting changes label and removed the awaiting review label Sep 18, 2023
Member

@wjones127 wjones127 left a comment

Overall this looks good. Have a few minor comments.

dev/archery/archery/integration/runner.py (outdated, resolved)
dev/archery/archery/integration/runner.py (outdated, resolved)
dev/archery/archery/integration/tester.py (resolved)
@github-actions github-actions bot added the awaiting merge label and removed the awaiting changes label Sep 18, 2023
pitrou and others added 3 commits September 19, 2023 08:16
Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
@github-actions github-actions bot added the awaiting changes label and removed the awaiting merge label Sep 19, 2023
@pitrou
Member Author

pitrou commented Sep 19, 2023

I'll merge this PR now, thanks for the reviews!

@pitrou pitrou merged commit 3b646ad into apache:main Sep 19, 2023
40 checks passed
@pitrou pitrou removed the awaiting changes label Sep 19, 2023
@pitrou pitrou deleted the gh37537-c-data-integration branch September 19, 2023 14:41
@github-actions github-actions bot added the awaiting committer review label Sep 19, 2023
@conbench-apache-arrow

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 3b646ad.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.

loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…esting (apache#37769)

### Rationale for this change

Currently there are no systematic integration tests between implementations of the C Data Interface, only a couple of ad hoc tests.

### What changes are included in this PR?

1. Add Archery infrastructure for integration testing of the C Data Interface
2. Add implementation of this interface for Arrow C++

### Are these changes tested?

Yes, by construction.

### Are there any user-facing changes?

No.
* Closes: apache#37537

Lead-authored-by: Antoine Pitrou <antoine@python.org>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Co-authored-by: Will Jones <willjones127@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…esting (apache#37769)
Development

Successfully merging this pull request may close these issues.

[Integration] Add C Data Interface integration testing
4 participants