Should GetIOThreadPool() be accessible from installed headers?
#15151
Comments
Yes, I think it's entirely appropriate to put a RecordBatchReader -> source node in the C++ code. On the output side we have collector variants for tables (`DeclarationToTable`), vectors of record batches (`DeclarationToBatches`) and record batch readers (`DeclarationToReader`). We already have source node variants for accepting data from a table (…). However, I'm also not sure why we wouldn't expose the default I/O pool (…). As a short-term hack you can do:
We already do this in the R package, in the place where we need the IO thread pool to submit jobs... the problem here is that we need a
That would be my preferred solution. I'd rather not maintain the best way to do that in the R package, and it has come up on the mailing list in a context unrelated to the R package as well (https://lists.apache.org/thread/zo9qq0pntkrt2vnczoxx7hfsl6k233zy).
take
@paleolimbot The reference link here, which refers to this code block, is outdated AFAIU.
Yes, sorry! The block was this one: Lines 459 to 468 in 63b91cc
Thanks @paleolimbot, I will work on this.
@westonpace @paleolimbot I created a draft PR to use a
…R API (#15183)
This PR includes the factory `record_batch_reader_source` for Acero. This is a source node that takes a `RecordBatchReader` as the data source, along with an executor that gives the freedom to choose the thread pool used for execution. An example also shows how this can be used in Acero.
- [x] Self-review
* Closes: #15151
Lead-authored-by: vibhatha <vibhatha@gmail.com>
Co-authored-by: Vibhatha Lakmal Abeykoon <vibhatha@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
Describe the enhancement requested
In #14582 it was found that using the CPU thread pool in `arrow::compute::MakeReaderGenerator()` caused problems when the number of CPU threads was limited (as it often is on CI machines with few available cores). The solution was to use the IO thread pool instead; however, `arrow::io::internal::GetIOThreadPool()` is not available in any installed headers. I don't know the best way to make this available (or whether creating a source node from a record batch reader should be baked into the internals somewhere); however, my hack of: ... in the R package should almost certainly not exist.
Component(s)
C++