[Python][C++] Crash when reading from closed RecordBatchReader backed by C Stream #34906
Comments
…from a closed ArrayStreamBatchReader (#35016)

### Rationale for this change
Segfaults should be avoided, even on improper calling patterns.

### What changes are included in this PR?
Attempting to use a record batch reader, sourced from a C API stream, after it had been closed, will now return an invalid status instead of a segmentation fault.

### Are these changes tested?
Yes, new unit tests were added to regress this usage.

### Are there any user-facing changes?
No

* Closes: #34906

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
It may be the same bug, but a really simple repro of this for me is tab-completion on an empty reader object.
That's a different issue I think, since this one was specific to a RecordBatchReader backed by a C Stream. And I can still reproduce this on the latest main. Do you know a way to trigger this from a script? (without the tab-completion action, to be able to run it in a debugger)
From https://docs.python.org/3.11/tutorial/interactive.html#tab-completion-and-history-editing
So it looks like this is because
> python
Python 3.11.1 (main, Jan 27 2023, 14:02:47) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow as pa
>>> reader = pa.RecordBatchReader()
>>> reader.schema
[1] 7847 segmentation fault python
If I mock out that method, it works:
> python
Python 3.11.1 (main, Jan 27 2023, 14:02:47) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow as pa
>>> class MyReader(pa.RecordBatchReader):
... @property
... def schema(self):
... return None
...
>>> reader = MyReader()
>>> reader.
reader.close( reader.read_all( reader.read_pandas(
reader.from_batches( reader.read_next_batch( reader.schema
reader.iter_batches_with_custom_metadata( reader.read_next_batch_with_custom_metadata(
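For context on why pressing Tab alone can be enough to hit the crash: the standard-library completer (`rlcompleter`) may read attributes of the object while it builds its suggestions. The sketch below is only an illustration with hypothetical stand-in classes (`RecordingAttribute`, `FakeReader`); it does not use pyarrow and assumes nothing about how pyarrow implements `schema`, beyond it not being a plain Python `property`.

```python
import rlcompleter


class RecordingAttribute:
    """Hypothetical descriptor (not a `property`) that counts instance reads."""

    def __init__(self):
        self.reads = 0

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class itself: return the descriptor, do not count.
            return self
        self.reads += 1
        return "schema-value"


class FakeReader:
    """Hypothetical stand-in for an object with a non-trivial attribute."""

    schema = RecordingAttribute()


reader = FakeReader()

# Ask the completer for the first suggestion for "reader.", which is roughly
# what the interactive prompt does when Tab is pressed after the dot.
completer = rlcompleter.Completer({"reader": reader})
first_match = completer.complete("reader.", 0)

# The attribute was read even though no code ever spelled out `reader.schema`.
reads_during_completion = FakeReader.__dict__["schema"].reads
print(first_match, reads_during_completion)
```

If an attribute read can crash the process, as `reader.schema` does in the session above, completion can therefore crash it too. A script that only accesses the attribute directly should be an equivalent trigger for use under a debugger.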
Thanks! Backtrace for this:
Now, the docstring of
Describe the bug, including details regarding any error messages, version, and platform.
The following snippet reproduces the issue:
Backtrace:
So either on the C++ side, `ArrayStreamBatchReader::ReadNext` should check whether the stream was closed, or on the Python side we could avoid reading once it is closed (although I assume you would run into this when using it from C++ as well, so it should be fixed at that level).

Component(s)
C++, Python
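The check suggested in the description (refuse to read once the stream has been closed) can be illustrated with a small pure-Python analogy. This is a sketch of the guard pattern only, not Arrow's actual C++ code: `GuardedStreamReader`, `ClosedStreamError`, and the error message are hypothetical names made up for the example.

```python
class ClosedStreamError(RuntimeError):
    """Hypothetical error raised when a released stream is read."""


class GuardedStreamReader:
    """Toy stand-in for a reader that wraps a releasable stream."""

    def __init__(self, batches):
        self._batches = iter(batches)
        self._released = False

    def close(self):
        # After release, the underlying resource must never be touched again.
        self._batches = None
        self._released = True

    def read_next(self):
        # The guard: check the state first and report an error, instead of
        # dereferencing a resource that no longer exists.
        if self._released:
            raise ClosedStreamError("cannot read from a closed stream")
        # Returns None once the stream is exhausted.
        return next(self._batches, None)


reader = GuardedStreamReader(["batch-0", "batch-1"])
first = reader.read_next()
reader.close()

try:
    reader.read_next()
    error_after_close = None
except ClosedStreamError as exc:
    error_after_close = str(exc)
```

Placing the guard in the lowest-level read method means every caller gets an error rather than a crash, which matches the reasoning in the description for fixing this at the C++ level rather than only in the Python bindings.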