[R] R hangs when read_csv_arrow after set_io_thread_count(1) #36121
Comments
The Arrow documentation page on the threading model, https://arrow.apache.org/docs/cpp/threading.html, does not mention any minimum number of IO threads.
Can confirm that this can be reproduced with arrow 12.0.1 on Ubuntu 22.04, and I agree that we should warn and document this better. Thanks for reporting this!
This one is my fault 😬 ... we hijack the IO thread pool to make it possible to call into R (e.g., user-defined functions, R connections as input) while doing certain Arrow tasks (https://github.com/apache/arrow/blob/main/r/src/safe-call-into-r.h#L315). I imagine that some Arrow code makes the usually safe assumption that at least one IO thread is available.
Yes 😆
…#36304)

### Rationale for this change

Setting the number of threads in the IO thread pool to 1 causes a hang or crash when using some functions (notably: any Acero exec plan).

### What changes are included in this PR?

`set_io_thread_count()` now warns for `num_threads == 1`:

``` r
library(arrow, warn.conflicts = FALSE)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.

# Already errors from C++
set_io_thread_count(0)
#> Error: Invalid: ThreadPool capacity must be > 0

# New warning!
set_io_thread_count(1)
#> Warning: `arrow::set_io_thread_count()` with num_threads < 2 may
#> cause certain operations to hang or crash.
#> ℹ Use num_threads >= 2 to support all operations

# No warning!
set_io_thread_count(2)
```

<sup>Created on 2023-06-26 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>

### Are these changes tested?

Yes

### Are there any user-facing changes?

Yes: some existing code may issue a warning that previously did not. Documentation was added.

* Closes: #36121

Authored-by: Dewey Dunnington <dewey@voltrondata.com>
Signed-off-by: Dewey Dunnington <dewey@voltrondata.com>
Describe the bug, including details regarding any error messages, version, and platform.
I set the number of IO threads to 1 and then tried to read a CSV file. Instead of reading the file, the R interpreter hangs (perhaps in an infinite loop) and cannot even be interrupted with Control-C. I expected to be able to cancel this command with Control-C.
If one IO thread is not supported, I would have at least expected an error message after running `arrow::set_io_thread_count(1)`, such as "one IO thread is not allowed, please use at least two IO threads." I would also have expected the man page for `read_csv_arrow` to explain how to control the number of threads used for CSV reading, but that man page does not mention threads at all. A sentence such as "use `arrow::set_cpu_count(N_CPUS)` to tell arrow to use N_CPUS for reading the CSV file" on that man page would be useful.
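Both requests above (fail loudly on a single IO thread, and make thread settings explicit) can be sketched as a small user-side wrapper. This is a minimal sketch, not part of arrow: `configure_arrow_threads()` and its `set_cpu` / `set_io` arguments are hypothetical names, and it assumes only the existing `arrow::set_cpu_count()` and `arrow::set_io_thread_count()` setters. It raises an error, as suggested in this report, whereas the merged fix (#36304) only warns.

```r
# Hypothetical helper (not part of arrow): validate thread counts before
# handing them to arrow's setters. The setters are injectable so the
# validation logic can be exercised without arrow installed.
configure_arrow_threads <- function(cpu_threads,
                                    io_threads,
                                    set_cpu = function(n) arrow::set_cpu_count(n),
                                    set_io = function(n) arrow::set_io_thread_count(n)) {
  cpu_threads <- as.integer(cpu_threads)
  io_threads <- as.integer(io_threads)

  # Validate everything first so that nothing is changed on failure.
  if (length(cpu_threads) != 1L || is.na(cpu_threads) || cpu_threads < 1L) {
    stop("`cpu_threads` must be a single integer >= 1", call. = FALSE)
  }
  if (length(io_threads) != 1L || is.na(io_threads) || io_threads < 2L) {
    # Per the discussion in this issue, the R package uses the IO thread
    # pool to call back into R, so a pool of one thread may hang.
    stop("`io_threads` must be >= 2; a single IO thread can make some ",
         "arrow operations (e.g. read_csv_arrow()) hang", call. = FALSE)
  }

  set_cpu(cpu_threads)
  set_io(io_threads)
  invisible(list(cpu = cpu_threads, io = io_threads))
}

# Example usage (requires arrow; "data.csv" is a placeholder path):
# configure_arrow_threads(cpu_threads = 4, io_threads = 2)
# df <- arrow::read_csv_arrow("data.csv")
```

Injecting the setters keeps the checks testable in isolation; in normal use the defaults call arrow directly, and because all validation happens before either setter runs, a rejected call leaves arrow's thread pools untouched.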
Here is a minimal reproducible example R script:
Output when running on Linux laptop:
Output when running on Linux server:
On both computers the last command hangs (infinite loop?) and cannot be interrupted, even with Control-C.
Component(s)
R