-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-37537: [Integration][C++] Add C Data Interface integration testing #37769
Conversation
ce9186f
to
92cb90e
Compare
@wjones127 @tustvold, FYI, following this PR, you'll need to add the |
Thank you for the heads up - created a tracking issue so this doesn't get missed - apache/arrow-rs#4828 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks solid
raise NotImplementedError | ||
|
||
def compare_allocation_state(self, recorded: object, | ||
gc_until: typing.Callable[[_Predicate], bool] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this parameter's type quite confusing. IIUC it's a cancel token to cut off long-running gc? I'm not sure why this control is given to an exporter or importer. Instead, I would think it would make more sense for the runner check_memory_released
to construct this token. It can be Callable[[], bool]
and returns true if GC is taking too long and the runner is aborting the current case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, really, it's as the doc states. Some runtimes may need several GC calls to properly release memory, hence the name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm probably missing something, but it seems like this could be done more simply with something like:
@contextmanager
def check_memory_released(exporter: CDataExporter, importer: CDataImporter,
gc_timeout: Callable[[], bool] = _default_timeout):
do_check = (exporter.supports_releasing_memory and
importer.supports_releasing_memory)
if not do_check:
yield; return
before = exporter.record_allocation_state()
yield
while exporter.record_allocation_state() != before:
if gc_timeout():
raise ...
importer.gc_once()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably could, though I'm not sure it's simpler :-). Both solutions (yours and mine) are IMHO not very elegant, and we may have to revisit if making two GCs coexist ends up more complicated...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer mine for less inversion of control since the exporter doesn't need to manage the importer's garbage collection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll go with the current version. This can evolve quite easily as this is internal tooling, so no API guarantees.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks good. Have a few minor comments.
Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
I'll merge this PR now, thanks for the reviews! |
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 3b646ad. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them. |
…esting (apache#37769) ### Rationale for this change Currently there are no systematic integration tests between implementations of the C Data Interface, only a couple ad-hoc tests. ### What changes are included in this PR? 1. Add Archery infrastructure for integration testing of the C Data Interface 2. Add implementation of this interface for Arrow C++ ### Are these changes tested? Yes, by construction. ### Are there any user-facing changes? No. * Closes: apache#37537 Lead-authored-by: Antoine Pitrou <antoine@python.org> Co-authored-by: Antoine Pitrou <pitrou@free.fr> Co-authored-by: Will Jones <willjones127@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
…esting (apache#37769) ### Rationale for this change Currently there are no systematic integration tests between implementations of the C Data Interface, only a couple ad-hoc tests. ### What changes are included in this PR? 1. Add Archery infrastructure for integration testing of the C Data Interface 2. Add implementation of this interface for Arrow C++ ### Are these changes tested? Yes, by construction. ### Are there any user-facing changes? No. * Closes: apache#37537 Lead-authored-by: Antoine Pitrou <antoine@python.org> Co-authored-by: Antoine Pitrou <pitrou@free.fr> Co-authored-by: Will Jones <willjones127@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
Rationale for this change
Currently there are no systematic integration tests between implementations of the C Data Interface, only a couple ad-hoc tests.
What changes are included in this PR?
Are these changes tested?
Yes, by construction.
Are there any user-facing changes?
No.