Describe the bug, including details regarding any error messages, version, and platform.
Code to reproduce as a unit test that I added to cpp/src/arrow/dataset/dataset_test.cc, which logs the open files in the dataset directory (only works on Linux). This needs some extra headers:
When I run this (on Fedora 39, using GCC 13) I get output like:
```
Open files in directory /tmp/dataset-scan-test-268jyz3s/ after read:
/tmp/dataset-scan-test-268jyz3s/data.arrow
Open files in directory /tmp/dataset-scan-test-268jyz3s/ after close:
/tmp/dataset-scan-test-268jyz3s/data.arrow
Open files in directory /tmp/dataset-scan-test-268jyz3s/ after reader destruct:
Open files in directory /tmp/dataset-scan-test-268jyz3s/ after scanner destruct:
```
This shows that neither consuming the `RecordBatchReader` by reading it into a table nor calling the `Close` method results in the IPC file being closed; the file is only closed after the reader is destroyed. The `Close` implementation doesn't do anything other than consume all the data.
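To make the described behavior concrete, here is a minimal toy model, not Arrow's scanner code. `ToyHandle`, `ToyReader`, and `DemoCloseKeepsHandle` are hypothetical names invented for this sketch. It assumes only what the report states: `Close` drains the remaining data, and the underlying handle is released when the reader is destroyed.

```cpp
#include <cassert>
#include <memory>
#include <optional>

// Toy stand-in for an open file: counts how many handles are alive.
struct ToyHandle {
  explicit ToyHandle(int* open_count) : open_count_(open_count) { ++*open_count_; }
  ~ToyHandle() { --*open_count_; }
  ToyHandle(const ToyHandle&) = delete;
  ToyHandle& operator=(const ToyHandle&) = delete;
  int* open_count_;
};

// Hypothetical reader (not an Arrow class) that owns a handle for its whole lifetime.
class ToyReader {
 public:
  ToyReader(int* open_count, int batches)
      : handle_(std::make_unique<ToyHandle>(open_count)), remaining_(batches) {}

  // Returns the next "batch", or std::nullopt once everything has been read.
  std::optional<int> ReadNext() {
    if (remaining_ == 0) return std::nullopt;
    return remaining_--;
  }

  // Models the reported behavior: consume everything, release nothing.
  void Close() {
    while (ReadNext()) {
    }
  }

 private:
  std::unique_ptr<ToyHandle> handle_;
  int remaining_;
};

// Usage sketch: the handle survives Close() and goes away with the reader.
inline void DemoCloseKeepsHandle() {
  int open_count = 0;
  {
    ToyReader reader(&open_count, 3);
    reader.Close();
    assert(open_count == 1);  // still open after Close()
  }
  assert(open_count == 0);  // released only by the destructor
}
```

In a garbage-collected binding, the point at which the destructor runs is not under the caller's control, which is why a `Close` that only drains is not enough there.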
For context, this causes errors trying to remove the dataset directory in Windows when using the GLib bindings via Ruby, where there isn't a way to force destruction of the reader and we have to rely on GC (#41750).
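For anyone wanting to check this on Linux, one way to list the files a process holds open under a directory is to walk `/proc/self/fd`. The following is a minimal sketch of that idea, not the test code from this report. `OpenFilesInDirectory` and `DemoOpenFiles` are hypothetical names, and the approach assumes a Linux `/proc` filesystem.

```cpp
#include <cassert>
#include <cstdio>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Returns the targets of this process's open file descriptors that live
// under `dir`. Linux-only: relies on the symlinks in /proc/self/fd.
// Returns an empty list if `dir` cannot be resolved.
std::vector<std::string> OpenFilesInDirectory(const fs::path& dir) {
  std::vector<std::string> open_files;
  std::error_code ec;
  std::string prefix = fs::canonical(dir, ec).string();
  if (ec) return open_files;
  if (prefix.empty() || prefix.back() != '/') prefix += '/';

  for (const auto& entry : fs::directory_iterator("/proc/self/fd", ec)) {
    std::error_code link_ec;
    const std::string target = fs::read_symlink(entry.path(), link_ec).string();
    if (link_ec) continue;  // descriptor vanished or is not readable
    if (target.compare(0, prefix.size(), prefix) == 0) {
      open_files.push_back(target);
    }
  }
  return open_files;
}

// Usage sketch: a file is listed while its handle is open and gone after fclose.
inline void DemoOpenFiles() {
  const fs::path dir = fs::temp_directory_path() / "open-files-sketch";
  fs::create_directories(dir);
  std::FILE* f = std::fopen((dir / "data.arrow").c_str(), "w");
  assert(f != nullptr);
  assert(OpenFilesInDirectory(dir).size() == 1);
  std::fclose(f);
  assert(OpenFilesInDirectory(dir).empty());
  fs::remove_all(dir);
}
```

Calling such a helper after each step (read, close, reader destruction, scanner destruction) would show when the descriptor actually goes away.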
Component(s)
C++
kou added a commit to kou/arrow that referenced this issue on May 25, 2024
…l values
Iterator keeps its resource (ptr_) until it's deleted but we can
release its resource immediately when it reads all values. If Iterator
keeps its resource until it's deleted, it may block closing a file. See
apacheGH-41771 for this case.
…ads all values (#41824)
### Rationale for this change
`Iterator` keeps its resource (`ptr_`) until it's deleted but we can release its resource immediately when it reads all values. If `Iterator` keeps its resource until it's deleted, it may block closing a file. See GH-41771 for this case.
### What changes are included in this PR?
Releases `ptr_` when `Next()` returns the end.
### Are these changes tested?
Yes.
### Are there any user-facing changes?
Yes.
* GitHub Issue: #41771
Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
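The idea in the commit message above (release the held resource as soon as `Next()` reports the end) can be illustrated with a small stand-alone sketch. This is not Arrow's `Iterator` class. `ReleasingIterator`, `FakeFile`, and `DemoReleaseAtEnd` are hypothetical names, and the end of iteration is modeled with an empty `std::optional` purely for illustration.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <optional>
#include <utility>

// Hypothetical type-erased iterator: `next_` owns whatever state the
// generator captured (for example, a file handle).
template <typename T>
class ReleasingIterator {
 public:
  using NextFn = std::function<std::optional<T>()>;

  explicit ReleasingIterator(NextFn next) : next_(std::move(next)) {}

  // Returns the next value, or std::nullopt at the end. On reaching the end,
  // the generator and everything it captured are destroyed immediately
  // instead of waiting for the iterator's destructor.
  std::optional<T> Next() {
    if (!next_) return std::nullopt;
    std::optional<T> value = next_();
    if (!value) next_ = nullptr;
    return value;
  }

  bool released() const { return !next_; }

 private:
  NextFn next_;
};

// Stand-in for a resource that should be closed once reading finishes.
struct FakeFile {
  int remaining = 2;
};

// Usage sketch: the resource is freed when the end is reached,
// while the iterator object itself is still alive.
inline void DemoReleaseAtEnd() {
  auto file = std::make_shared<FakeFile>();
  std::weak_ptr<FakeFile> watcher = file;
  ReleasingIterator<int> it([file]() -> std::optional<int> {
    if (file->remaining == 0) return std::nullopt;
    return file->remaining--;
  });
  file.reset();  // the iterator now holds the only reference

  assert(it.Next() == 2);
  assert(it.Next() == 1);
  assert(!watcher.expired());      // still held mid-iteration
  assert(!it.Next().has_value());  // end reached
  assert(watcher.expired());       // resource released right away
}
```

With this shape, a consumer that reads to the end (or a `Close` that drains) releases the resource without needing the iterator to be destroyed.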
The `Close` implementation referred to in the description: arrow/cpp/src/arrow/dataset/scanner.cc, lines 113 to 120 in 37e5240.