You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The current implementation of the physical query plan uses "thread::spawn" which is expensive. We should switch to using Futures, async!/await!, and tokio so that we are launching tasks in a thread pool instead and writing idiomatic Rust code with futures combinators to chain actions together.
Adam Lippai / @alippai:
I've tried to add this feature, but I didn't mange to get all the types right in merge.rs. I still need more experience in Rust :). Notes:
Ben Sully:
My hunch is also that rayon would be better suited to this part ('working' on lots of things in a threadpool, rather than 'waiting' for lots of things) than tokio + async/await, although it does sound like doing IO using gRPC might be well suited to tokio. I'm not familiar with the Datafusion codebase at all unfortunately, how approachable would this issue be for a first time Arrow contributor? :D
Andy Grove / @andygrove: [~sd2k] You might be right, but given that we now have a flight server example using tokio, it seems ideal to use tokio all the way through to query execution to have a complete async implementation? I agree that this is perhaps not an ideal issue to try to take on for a new contributor though!
Andy Grove / @andygrove:
I spent a little time working on a PoC of this [1] but the advice from the Tokio team is that we are better off just using dedicated threads for file io and just use tokio for network io, which we already do in the flight server (thanks to Tonic) so I am closing this.
Adam Lippai / @alippai: @andygrove Skipping Tokio can be reasonable, but we still need a threadpool or limiting the number of opened files somehow. Currently it opens and read all the files in parallel, then it does all the computation in parallel (and AFAIK reading a lot of files concurrently won't produce optimal performance on linux).
Andy Grove / @andygrove:
Thanks. yes. I've been looking at Rayon and threadpool so far this morning.
I think it might be worth me creating a Google doc for some collaboration
on a design here.
The current implementation of the physical query plan uses "thread::spawn" which is expensive. We should switch to using Futures, async!/await!, and tokio so that we are launching tasks in a thread pool instead and writing idiomatic Rust code with futures combinators to chain actions together.
Reporter: Andy Grove / @andygrove
Assignee: Andy Grove / @andygrove
Original Issue Attachments:
Note: This issue was originally created as ARROW-6691. Please see the migration documentation for further details.
The text was updated successfully, but these errors were encountered: