[Rust] [DataFusion] Improve threading model #24921
Comments
Adam Lippai / @alippai: Here is what I've found on the topic (https://users.rust-lang.org/t/dealing-with-work-priority-and-rayon/30954/2): Rayon doesn't support setting priorities for tasks, but as a workaround we could create two thread pools, e.g. one with <=10 threads for file reading and one with CPU_NUM threads for the computation. If you need to fine-tune the workload (S3, HDFS and NFS behave differently, and local HDD vs. SSD is a different topic again), you could either configure the thread pool sizes (even down to 1 thread) or set "nice" for the thread pool threads. Should I add this to the design doc, or is it out of scope for now?
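The two-pool workaround can be sketched with the standard library alone. This is a hypothetical illustration, not DataFusion code: with Rayon itself each pool would be built with `rayon::ThreadPoolBuilder`. Here a hand-rolled `Pool` stands in for it. The names `Pool`, `handle` and `run_pipeline` are made up for this sketch, and the "file read" is simulated by building a `Vec`. The I/O pool is capped at 10 threads and the compute pool is sized to the core count, as suggested above.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical fixed-size pool: `size` workers pull boxed jobs from one shared queue.
struct Pool {
    sender: Option<Sender<Job>>,
    workers: Vec<JoinHandle<()>>,
}

impl Pool {
    fn new(size: usize) -> Pool {
        assert!(size > 0, "a pool needs at least one thread");
        let (sender, receiver) = mpsc::channel::<Job>();
        let receiver: Arc<Mutex<Receiver<Job>>> = Arc::new(Mutex::new(receiver));
        let workers = (0..size)
            .map(|_| {
                let rx = Arc::clone(&receiver);
                thread::spawn(move || loop {
                    // The lock guard is a temporary, so it is released before the job runs.
                    let next = rx.lock().unwrap().recv();
                    match next {
                        Ok(job) => job(),
                        Err(_) => break, // every sender is gone: shut down
                    }
                })
            })
            .collect();
        Pool { sender: Some(sender), workers }
    }

    /// Cloneable handle for submitting jobs, e.g. from another pool's threads.
    fn handle(&self) -> Sender<Job> {
        self.sender.as_ref().unwrap().clone()
    }
}

impl Drop for Pool {
    fn drop(&mut self) {
        // Close our end of the queue; workers exit once all handles are dropped too.
        drop(self.sender.take());
        for worker in self.workers.drain(..) {
            worker.join().unwrap();
        }
    }
}

/// Hypothetical pipeline: the I/O pool "reads" each partition (simulated by
/// building a Vec) and hands the batch to the compute pool, which sums it.
fn run_pipeline(partitions: usize, io_threads: usize, cpu_threads: usize) -> i64 {
    let (result_tx, result_rx) = mpsc::channel::<i64>();
    let compute = Pool::new(cpu_threads);
    let io = Pool::new(io_threads);
    let to_io = io.handle();
    let to_compute = compute.handle();
    for p in 0..partitions {
        let to_compute = to_compute.clone();
        let result_tx = result_tx.clone();
        to_io
            .send(Box::new(move || {
                // Stand-in for reading one partition from S3, HDFS, NFS or local disk.
                let batch: Vec<i64> = (0..1000).map(|x| x + p as i64).collect();
                to_compute
                    .send(Box::new(move || {
                        let _ = result_tx.send(batch.iter().sum());
                    }))
                    .unwrap();
            }))
            .unwrap();
    }
    // Drop order matters: a handle that outlives its pool's Drop would hang the join.
    drop(to_io);
    drop(to_compute);
    drop(io); // joins the I/O workers, so every compute job is now queued
    drop(compute); // drains the compute queue and joins its workers
    drop(result_tx);
    result_rx.iter().sum()
}

fn main() {
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    // Cap the I/O pool at 10 threads; size the compute pool to the core count.
    println!("total = {}", run_pipeline(32, cores.min(10), cores));
}
```

Tuning for a particular backend then only means changing the two pool sizes (down to 1 I/O thread), which is the knob the comment describes. Setting OS-level "nice" values is not covered here.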
Wes McKinney / @wesm:
Andy Grove / @andygrove:
Antoine Pitrou / @pitrou:
Adam Lippai / @alippai:
Andy Grove / @andygrove:
Andy Grove / @andygrove:
DataFusion currently spawns one thread per partition, which results in poor performance when there are more partitions than available cores/threads. It would be better to use a thread pool whose size defaults to the number of available cores.
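A minimal sketch of the proposed model, assuming only the Rust standard library (it is not DataFusion's implementation). The function names `default_threads` and `execute_partitions` are hypothetical. The sketch spawns a fixed number of scoped threads, defaulting to `std::thread::available_parallelism()`. Each thread repeatedly claims the next unprocessed partition index from a shared atomic counter, so 64 partitions on an 8-core machine use 8 threads rather than 64.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Default pool size: the number of available cores, falling back to 1.
fn default_threads() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

/// Processes every partition with at most `threads` OS threads instead of one
/// thread per partition. `f` is a hypothetical per-partition operator.
fn execute_partitions<T, R, F>(partitions: &[T], threads: usize, f: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    let next = AtomicUsize::new(0);
    let results: Mutex<Vec<Option<R>>> =
        Mutex::new((0..partitions.len()).map(|_| None).collect());
    // Never start more threads than there are partitions (but at least one).
    let workers = threads.max(1).min(partitions.len().max(1));
    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                // fetch_add hands out each index exactly once across all workers.
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= partitions.len() {
                    break;
                }
                let r = f(&partitions[i]);
                results.lock().unwrap()[i] = Some(r);
            });
        }
    });
    // scope() has joined every worker, so each slot is filled, in partition order.
    results
        .into_inner()
        .unwrap()
        .into_iter()
        .map(|r| r.unwrap())
        .collect()
}

fn main() {
    // 64 partitions, but only as many threads as there are cores.
    let partitions: Vec<Vec<u64>> = (0..64u64)
        .map(|p| (0..100).map(|x| x * p).collect())
        .collect();
    let sums = execute_partitions(&partitions, default_threads(), |part| {
        part.iter().sum::<u64>()
    });
    println!(
        "{} partitions on {} threads, total = {}",
        sums.len(),
        default_threads(),
        sums.iter().sum::<u64>()
    );
}
```

Pulling work from a shared counter, rather than pre-assigning partitions to threads, keeps all cores busy when partitions differ in size. A long-lived pool (as in the design doc) would additionally avoid re-spawning threads for every query.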
Here is a Google Doc where we can collaborate on the design:
https://docs.google.com/document/d/1_wc6diy3YrRgEIhVIGzrO5AK8yhwfjWlmKtGnvbsrrY/edit?usp=sharing
Reporter: Andy Grove / @andygrove
Assignee: Andy Grove / @andygrove
Note: This issue was originally created as ARROW-8774. Please see the migration documentation for further details.