GH-41771: [C++] Iterator releases its resource immediately when it reads all values #41824

Merged
merged 1 commit into from
May 28, 2024

@kou kou commented May 25, 2024

Rationale for this change

Iterator keeps its resource (ptr_) until it's deleted but we can release its resource immediately when it reads all values. If Iterator keeps its resource until it's deleted, it may block closing a file. See GH-41771 for this case.

What changes are included in this PR?

Releases ptr_ when Next() returns the end.
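
As a rough illustration of the idea (not the actual Arrow implementation), the sketch below uses hypothetical names, `CountingSource` and `SketchIterator`, and uses `std::nullopt` as a stand-in for the end-of-iteration sentinel. The wrapper drops its `ptr_` as soon as the source reports the end, so the resource is destroyed before the wrapper itself is.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a resource such as an open file reader.
// `alive` lets a caller observe when the resource has been destroyed.
struct CountingSource {
  std::vector<int> values;
  std::size_t pos = 0;
  std::shared_ptr<bool> alive;

  CountingSource(std::vector<int> v, std::shared_ptr<bool> flag)
      : values(std::move(v)), alive(std::move(flag)) {
    *alive = true;
  }
  ~CountingSource() { *alive = false; }

  // std::nullopt plays the role of the end-of-iteration sentinel (assumption).
  std::optional<int> Next() {
    if (pos < values.size()) return values[pos++];
    return std::nullopt;
  }
};

// Simplified wrapper: owns the source through ptr_ and drops it as soon as
// the source reports the end, instead of waiting for its own destructor.
class SketchIterator {
 public:
  explicit SketchIterator(std::unique_ptr<CountingSource> source)
      : ptr_(std::move(source)) {}

  std::optional<int> Next() {
    if (!ptr_) return std::nullopt;       // already exhausted and released
    auto next = ptr_->Next();
    if (!next.has_value()) ptr_.reset();  // release at end of iteration
    return next;
  }

 private:
  std::unique_ptr<CountingSource> ptr_;
};
```

With this shape, a caller that keeps the exhausted iterator object around no longer keeps the underlying resource alive, and further `Next()` calls keep returning the end sentinel.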

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes.

@kou kou requested a review from bkietz May 25, 2024 03:50

⚠️ GitHub issue #41771 has been automatically assigned in GitHub to PR creator.

@bkietz bkietz left a comment

LGTM

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting committer review Awaiting committer review labels May 28, 2024
@bkietz bkietz merged commit e6e00e7 into apache:main May 28, 2024
38 of 39 checks passed
@bkietz bkietz removed the awaiting merge Awaiting merge label May 28, 2024
@kou kou deleted the cpp-dataset-scanner-reader branch May 28, 2024 23:38
Result<T> Next() {
  if (ptr_) {
    auto next_result = next_(ptr_.get());
    if (next_result.ok() && IsIterationEnd(next_result.ValueUnsafe())) {
Contributor

@kou @bkietz This extra check and release should be the responsibility of the underlying iterator instead of forcing every abstract iterator to behave this way.

Member Author

Could you explain the reason? Performance?

For the #41771 case:

This is the underlying iterator:

Result<TaggedRecordBatchIterator> AsyncScanner::ScanBatches() {
  return ::arrow::internal::IterateSynchronously<TaggedRecordBatch>(
      [this](::arrow::internal::Executor* executor) {
        return ScanBatchesAsync(executor);
      },
      scan_options_->use_threads);
}

We want to release the IPC reader created at:

auto open_reader = OpenReaderAsync(source);

It's referenced indirectly by the underlying iterator via Future/std::function.

It seems that we can't remove a reference from Future/std::function without deleting Future/std::function and we can't delete Future/std::function without deleting the underlying iterator.
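
The ownership problem described above can be reproduced in isolation. This is only a sketch with hypothetical names (`FakeReader`, `MakeReadTask`), not Arrow code: a `std::shared_ptr` captured by a lambda stored in a `std::function` stays alive until the `std::function` itself is destroyed or reassigned, and there is no way to drop just the captured reference from the outside.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for the IPC reader we would like to release early.
struct FakeReader {
  int Read() { return 42; }
};

// Returns a callable that keeps `reader` alive through its by-value capture.
// Nothing outside the callable can clear that captured reference.
inline std::function<int()> MakeReadTask(std::shared_ptr<FakeReader> reader) {
  return [reader]() { return reader->Read(); };
}
```

For example, after `auto task = MakeReadTask(std::move(reader));` a `std::weak_ptr` observing the reader only expires once `task` is destroyed or reset with `task = nullptr;`. This mirrors why the wrapper has to drop the whole underlying iterator in order to release the reader.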

Contributor

> Could you explain the reason? Performance?

Performance, binary code size, and overall elegance of the iterator tree.

But I see that it's not possible, because these iterators are like C++ iterators in that the iterator itself is also the value, rather than something that produces the value. So just ignore my comment.

@github-actions github-actions bot added the awaiting changes Awaiting changes label Jun 3, 2024

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit e6e00e7.

There were 6 benchmark results indicating a performance regression:

The full Conbench report has more details. It also includes information about 102 possible false positives for unstable benchmarks that are known to sometimes produce them.
