Description
If we are reading over HTTP (e.g. S3), we generally want high parallelism in the I/O thread pool.
If we are reading from disk, high parallelism is usually harmless but ineffective. Most of the I/O threads spend their time in a waiting state, so the cores remain available for other work.
However, it appears that when we are reading locally and the data is already cached in memory, some parallelism is beneficial but too much is harmful. Once the DRAM <-> CPU bandwidth limit is reached, all reading threads experience high DRAM latency. Unlike an I/O bottleneck, where a blocked thread yields its core, a RAM bottleneck wastes cycles on the physical core because the thread stays scheduled while it stalls on memory.
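As a rough illustration of the three cases above, the following sketch picks an I/O thread count based on where the data lives. The function name `suggest_io_threads`, the source categories, and the specific multipliers and caps are hypothetical and chosen only for illustration; they are not taken from Arrow and would need to be tuned by benchmarking.

```python
import os
from typing import Optional


def suggest_io_threads(source: str, cpu_count: Optional[int] = None) -> int:
    """Hypothetical heuristic for sizing an I/O thread pool.

    The categories and numbers below are illustrative assumptions only.
    """
    cores = cpu_count or os.cpu_count() or 1
    if source == "remote":
        # HTTP / S3: threads mostly wait on the network, so oversubscribe.
        return max(8, cores * 4)
    if source == "local_disk":
        # Extra threads mostly sit in a waiting state; harmless but not useful.
        return cores
    if source == "local_cached":
        # Data is in the page cache: a few threads help, but too many
        # saturate DRAM bandwidth and stall every reader.
        return max(1, min(4, cores // 2))
    raise ValueError(f"unknown source kind: {source!r}")


if __name__ == "__main__":
    for kind in ("remote", "local_disk", "local_cached"):
        print(kind, suggest_io_threads(kind, cpu_count=8))
```

In PyArrow, a value chosen this way could be applied with `pyarrow.set_io_thread_count`. Deciding whether local data is actually cached in memory is the hard part and is not addressed by this sketch.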
Reporter: Weston Pace / @westonpace
Related issues:
Note: This issue was originally created as ARROW-14354. Please see the migration documentation for further details.