[R] Reading large Parquet files from a seekable connection fails #36819
paleolimbot added a commit that referenced this issue on Sep 5, 2023:

### Rationale for this change

When we first added RunWithCapturedR to support reading files from R connections, none of the Parquet tests seemed to call R from another thread. Because RunWithCapturedR comes with some complexity, I didn't add it anywhere it wasn't strictly needed. A recent StackOverflow post exposed that reading very large Parquet files does use multiple threads and thus needs RunWithCapturedR.

### What changes are included in this PR?

The two most common calls to read a Parquet file in which a user might trigger this failure are now wrapped in RunWithCapturedR.

### Are these changes tested?

The changes are tested in the current suite.

### Are there any user-facing changes?

No.

* Closes: #36819

Lead-authored-by: Dewey Dunnington <dewey@voltrondata.com>
Co-authored-by: Dewey Dunnington <dewey@fishandwhistle.net>
Signed-off-by: Dewey Dunnington <dewey@voltrondata.com>
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue on Nov 13, 2023 (…pache#37274), with the same commit message as above.
dgreiss pushed a commit to dgreiss/arrow that referenced this issue on Feb 19, 2024 (…pache#37274), with the same commit message as above.
Describe the bug, including details regarding any error messages, version, and platform.
To support IO that calls into R (e.g., using connection objects as Input/Output streams), we wrap most of our calls to the various file readers in `RunWithCapturedR()`. We didn't do this for Parquet because it didn't seem to require it; however, it seems that for Parquet files that are large enough we actually do (https://stackoverflow.com/questions/76739590/r-arrow-read-parquet-call-to-r-seek-on-r-connection-from-a-non-r-thread-fro).

At least the `ReadTable()` calls below should be wrapped in `RunWithCapturedR()` to support this behaviour:

arrow/r/src/parquet.cpp, lines 98 to 148 in b557e85
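For readers unfamiliar with the pattern, here is a minimal, standard-library-only sketch of the general idea: the real work runs on a worker thread, and any callback that must run on the original thread (standing in for an R connection's `seek`) is posted back to it. This is not Arrow's implementation. `MainThreadQueue`, `RunWithCapturedMain`, and `DemoCallbackRunsOnMainThread` are hypothetical names invented for illustration, and the actual `RunWithCapturedR()` may work quite differently.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical illustration only; none of these names come from Arrow.
class MainThreadQueue {
 public:
  // Called from a worker thread: run fn on the main thread and wait for it.
  int RunOnMain(std::function<int()> fn) {
    std::packaged_task<int()> task(std::move(fn));
    std::future<int> result = task.get_future();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      assert(!done_ && "callback posted after the worker finished");
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
    return result.get();  // blocks until the main thread has run the task
  }

  // Called from the worker thread once its work is complete.
  void Finish() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      done_ = true;
    }
    cv_.notify_one();
  }

  // Called from the main thread: service callbacks until Finish() is seen.
  void Drain() {
    std::unique_lock<std::mutex> lock(mutex_);
    while (true) {
      cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
      if (tasks_.empty()) return;  // finished and nothing left to run
      std::packaged_task<int()> task = std::move(tasks_.front());
      tasks_.pop();
      lock.unlock();
      task();  // executes on the main thread
      lock.lock();
    }
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::packaged_task<int()>> tasks_;
  bool done_ = false;
};

// Runs work on a worker thread while the calling thread services callbacks.
template <typename Work>
int RunWithCapturedMain(Work work) {
  MainThreadQueue queue;
  int result = 0;
  std::thread worker([&] {
    result = work(queue);
    queue.Finish();
  });
  queue.Drain();
  worker.join();
  return result;
}

// Checks that a callback requested by the worker ran on the calling thread.
bool DemoCallbackRunsOnMainThread() {
  const std::thread::id main_id = std::this_thread::get_id();
  std::thread::id callback_id;
  int result = RunWithCapturedMain([&](MainThreadQueue& queue) {
    // Worker thread: stands in for a reader that needs a main-thread seek.
    return queue.RunOnMain([&] {
      callback_id = std::this_thread::get_id();
      return 42;
    });
  });
  return result == 42 && callback_id == main_id;
}
```

In this sketch, a read call that is not wrapped has no main-thread loop to service its callbacks. That is analogous to the failure described above, where a large Parquet read ends up calling into R from a non-R thread.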
Component(s)
R