Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-36121: [R] Warn for set_io_thread_count() with num_threads < 2 #36304

Merged
merged 2 commits into from
Jun 27, 2023

Conversation

paleolimbot
Copy link
Member

@paleolimbot paleolimbot commented Jun 26, 2023

Rationale for this change

Setting the number of threads in the IO thread pool to 1 causes a hang or crash when using some functions (notably: any Acero exec plan).

What changes are included in this PR?

set_io_thread_count() now warns for num_threads == 1:

library(arrow, warn.conflicts = FALSE)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.

# Already errors from C++
set_io_thread_count(0)
#> Error: Invalid: ThreadPool capacity must be > 0
# New warning!
set_io_thread_count(1)
#> Warning: `arrow::set_io_thread_count()` with num_threads < 2 may
#> cause certain operations to hang or crash.
#> ℹ Use num_threads >= 2 to support all operations
# No warning!
set_io_thread_count(2)

Created on 2023-06-26 with reprex v2.0.2

Are these changes tested?

Yes

Are there any user-facing changes?

Yes: some existing code may issue a warning that previously did not. Documentation was added.

@paleolimbot paleolimbot marked this pull request as draft June 26, 2023 13:40
@github-actions
Copy link

⚠️ GitHub issue #36121 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions bot added Component: R awaiting review Awaiting review awaiting committer review Awaiting committer review labels Jun 26, 2023
@paleolimbot paleolimbot marked this pull request as ready for review June 26, 2023 17:02
@github-actions github-actions bot added awaiting review Awaiting review and removed awaiting review Awaiting review awaiting committer review Awaiting committer review labels Jun 26, 2023
@tdhock
Copy link
Contributor

tdhock commented Jun 27, 2023

hi @paleolimbot thanks writing a PR that partially addresses the issue I created.
To fully address that issue, can you please add documentation to (1) clarify the difference between IO threads and CPU threads, and (2) to explain which function should be called to control the CSV reading operation?
For (1) a link to the C++ doc web page https://arrow.apache.org/docs/cpp/threading.html would be very helpful.
Could a link to that page be added on the R man pages for arrow::cpu_count and arrow::io_thread_count?
For (2) I would have expected some mention of how to control number of threads used for CSV reading on the man page for read_csv_arrow, but there is no mention of threads on that man page. Something like "use arrow::set_cpu_count(N_CPUS) to tell arrow to use N_CPUS for reading the CSV file" on that man page would be useful.

@paleolimbot
Copy link
Member Author

I'd like to keep the scope of this particular PR to the immediate concern, which is that somebody might set the number of IO threads to 1, observe a hang/crash, and not know why! I promise to follow up on this in #36324 because you're right: we don't dedicate much space to documenting these options and they can have a rather large impact on performance.

Copy link
Member

@thisisnic thisisnic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good to me!

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting review Awaiting review labels Jun 27, 2023
@paleolimbot paleolimbot merged commit 4a93a37 into apache:main Jun 27, 2023
13 checks passed
@paleolimbot paleolimbot removed the awaiting merge Awaiting merge label Jun 27, 2023
@conbench-apache-arrow
Copy link

Conbench analyzed the 5 benchmark runs on commit 4a93a37a.

There were 3 benchmark results indicating a performance regression:

The full Conbench report has more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[R] R hangs when read_csv_arrow after set_io_thread_count(1)
3 participants