GH-41771: [C++] Iterator releases its resource immediately when it reads all values #41824
…l values Iterator keeps its resource (ptr_) until it's deleted but we can release its resource immediately when it reads all values. If Iterator keeps its resource until it's deleted, it may block closing file. See apacheGH-41771 for this case.
LGTM
Result<T> Next() {
  if (ptr_) {
    auto next_result = next_(ptr_.get());
    if (next_result.ok() && IsIterationEnd(next_result.ValueUnsafe())) {
Could you explain the reason? Performance?
For the #41771 case:
This is the underlying iterator:
arrow/cpp/src/arrow/dataset/scanner.cc
Lines 380 to 386 in 7f0c407
Result<TaggedRecordBatchIterator> AsyncScanner::ScanBatches() {
  return ::arrow::internal::IterateSynchronously<TaggedRecordBatch>(
      [this](::arrow::internal::Executor* executor) {
        return ScanBatchesAsync(executor);
      },
      scan_options_->use_threads);
}
We want to release the IPC reader created at:
arrow/cpp/src/arrow/dataset/file_ipc.cc
Line 144 in 7bc2452
auto open_reader = OpenReaderAsync(source);
It's referenced by the underlying iterator indirectly via Future/std::function.
It seems that we can't remove a reference from Future/std::function without deleting Future/std::function, and we can't delete Future/std::function without deleting the underlying iterator.
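To illustrate the ownership chain described above, here is a minimal sketch in plain C++ (C++14 or later). It is not Arrow code: FakeReader, FakeUnderlyingIterator, ChainDemoResult and RunChainDemo are hypothetical names, and a std::function capturing a std::shared_ptr stands in for the Future/std::function chain. It only shows that the captured reader stays alive after the last value is produced and is released once the object owning the callback is destroyed.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for the IPC reader we want to release.
struct FakeReader {};

// Hypothetical stand-in for the underlying iterator: it owns a callback,
// and the callback's captured state owns the reader.
struct FakeUnderlyingIterator {
  std::function<bool()> generator;
};

struct ChainDemoResult {
  bool alive_after_last_value;
  bool released_after_delete;
};

inline ChainDemoResult RunChainDemo() {
  auto reader = std::make_shared<FakeReader>();
  std::weak_ptr<FakeReader> watcher = reader;  // observes without owning

  auto it = std::make_unique<FakeUnderlyingIterator>();
  int remaining = 2;
  // The lambda becomes the only owner of the reader.
  it->generator = [reader = std::move(reader), remaining]() mutable {
    return reader != nullptr && remaining-- > 0;
  };

  // Drain the generator: it yields two values, then reports the end.
  while (it->generator()) {
  }

  ChainDemoResult result;
  // Reaching the end does not drop the capture held inside std::function.
  result.alive_after_last_value = !watcher.expired();

  // Only destroying the owner of the std::function releases the reader.
  it.reset();
  result.released_after_delete = watcher.expired();
  return result;
}
```

In this sketch both fields of the result come back true: the reader survives the end of iteration and goes away only when the iterator object is deleted, which is why the release has to happen at the level that owns the underlying iterator.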
Could you explain the reason? Performance?
Performance, binary code size, and overall elegance of the iterator tree.
But I see that it's not possible because these iterators are like C++ in that the iterator itself is the value as well instead of being something that produces the value. So just ignore my comment.
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit e6e00e7. There were 6 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about 102 possible false positives for unstable benchmarks that are known to sometimes produce them.
Rationale for this change
Iterator keeps its resource (ptr_) until it's deleted, but we can release its resource immediately when it reads all values. If Iterator keeps its resource until it's deleted, it may block closing a file. See GH-41771 for this case.
What changes are included in this PR?
Releases ptr_ when Next() returns the end.
Are these changes tested?
Yes.
Are there any user-facing changes?
Yes.
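The behavior described under "What changes are included in this PR?" can be sketched with a simplified, self-contained iterator (C++17). This is an illustration under stated assumptions, not the Arrow implementation: SketchIterator, CountingSource, holds_resource() and ReleasedAtEnd() are hypothetical names, and std::optional with std::nullopt replaces Result<T> and the IsIterationEnd() check. Only the idea is taken from the PR: reset ptr_ as soon as Next() observes the end.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical source; `handle` models a resource such as an open file.
struct CountingSource {
  std::vector<int> values;
  std::size_t pos = 0;
  std::shared_ptr<int> handle;

  std::optional<int> Next() {
    if (pos < values.size()) return values[pos++];
    return std::nullopt;  // end of iteration
  }
};

// Simplified type-erased iterator; std::nullopt is the end sentinel here.
template <typename T>
class SketchIterator {
 public:
  template <typename Wrapped>
  explicit SketchIterator(Wrapped wrapped)
      : ptr_(new Wrapped(std::move(wrapped)), &Delete<Wrapped>),
        next_(&CallNext<Wrapped>) {}

  std::optional<T> Next() {
    if (ptr_) {
      auto next = next_(ptr_.get());
      if (!next.has_value()) {
        ptr_.reset();  // release the wrapped object once the end is read
      }
      return next;
    }
    return std::nullopt;  // already exhausted; nothing left to call
  }

  bool holds_resource() const { return ptr_ != nullptr; }

 private:
  template <typename Wrapped>
  static void Delete(void* p) {
    delete static_cast<Wrapped*>(p);
  }
  template <typename Wrapped>
  static std::optional<T> CallNext(void* p) {
    return static_cast<Wrapped*>(p)->Next();
  }

  std::unique_ptr<void, void (*)(void*)> ptr_;
  std::optional<T> (*next_)(void*) = nullptr;
};

// Returns true if the resource is gone while the iterator is still alive.
inline bool ReleasedAtEnd() {
  auto handle = std::make_shared<int>(0);
  std::weak_ptr<int> watcher = handle;  // observes without owning

  CountingSource source;
  source.values = {1, 2};
  source.handle = std::move(handle);
  SketchIterator<int> it(std::move(source));

  std::optional<int> first = it.Next();
  std::optional<int> second = it.Next();
  bool alive_before_end = !watcher.expired();
  std::optional<int> end = it.Next();

  assert(first == 1 && second == 2 && !end.has_value());
  return alive_before_end && watcher.expired() && !it.holds_resource();
}
```

In this sketch the iterator object is still in scope when ReleasedAtEnd() checks the watcher, so the release is tied to reading the end and not to destruction. Further calls to Next() after the end keep returning the end sentinel.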