Activity
fnicastri commented on Aug 6, 2024
It would be very helpful.
Any chance of seeing this in the future?
wangxiaoying commented on Aug 14, 2024
We have initialized the Arrow batch iterator for the Rust and C++ libraries. It needs more work in terms of testing and exposing it to the Python library.
iter_batches support for read_database_uri with connectorx pola-rs/polars#21041
chitralverma commented on May 11, 2025
I think it would be an awesome addition to be able to get a RecordBatchReader directly from read_sql, one that only materializes the record batches (sends queries to the DB) when the user requests read_next_batch.
@wangxiaoying do you think testing is far enough along to expose the Arrow record batch iterator to the Python side? Is there any way I can help with this?
wangxiaoying commented on Jun 11, 2025
Yes, I think we can definitely enable the record batch reader. Please feel free to submit a PR!
chitralverma commented on Jun 19, 2025
@wangxiaoying I checked this out today and here are my findings:
- arrow_rb can easily be added on the Python side as a return_type and can return a generator of RecordBatches. I did this and it works as expected.
- In order to make the record batch path truly lazy, I'm thinking:
  - The dispatcher can have an alternate implementation of run where the operations don't happen eagerly but are backed by an iterator. [already available]
  - This iterator is also exposed on the Python side and passed to the RecordBatchReader.
  - When this RecordBatchReader is consumed, the operations happen at that time by calling next() on the iterator.
This seems quite complicated to me considering my limited understanding of this code base. I'll still try to give this a shot, but if you have any suggestions please let me know.
chitralverma commented on Jun 20, 2025
Actually, scratch the above. I managed to get this working exactly as expected and will raise a PR today for your review. :D
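
For anyone following the thread, here is a rough, standard-library-only sketch of the lazy idea discussed above: an eager run next to an iterator-backed alternative, with a reader whose read_next_batch() pulls from that iterator. It is not connectorx code. Dispatcher, run_iter, LazyBatchReader and fake_fetch are hypothetical names made up for illustration, and a plain list of tuples stands in for an Arrow RecordBatch.

```python
from typing import Callable, Iterator, List

Batch = List[tuple]  # assumption: stand-in for an Arrow RecordBatch


class Dispatcher:
    """Hypothetical stand-in for the dispatcher; not connectorx's real class."""

    def __init__(self, queries: List[str], fetch: Callable[[str], Batch]):
        self.queries = queries
        self.fetch = fetch

    def run(self) -> List[Batch]:
        # Eager path: every query is sent before anything is returned.
        return [self.fetch(q) for q in self.queries]

    def run_iter(self) -> Iterator[Batch]:
        # Lazy path: a generator, so each query is sent only when the
        # consumer asks for the next batch.
        for q in self.queries:
            yield self.fetch(q)


class LazyBatchReader:
    """Minimal reader exposing read_next_batch() over an iterator."""

    def __init__(self, batches: Iterator[Batch]):
        self._batches = batches

    def read_next_batch(self) -> Batch:
        # Raises StopIteration once the underlying iterator is exhausted.
        return next(self._batches)


sent: List[str] = []  # records which queries have reached the fake "database"


def fake_fetch(query: str) -> Batch:
    # Pretend to run the query and return one batch for it.
    sent.append(query)
    return [(query, len(sent))]


queries = ["SELECT 1", "SELECT 2", "SELECT 3"]
reader = LazyBatchReader(Dispatcher(queries, fake_fetch).run_iter())

sent_before_read = list(sent)            # snapshot before any batch is requested
first_batch = reader.read_next_batch()   # triggers only the first query
sent_after_first = list(sent)            # snapshot after the first batch
```

On the Python side, pyarrow.RecordBatchReader.from_batches(schema, batches) accepts an iterable of record batches, so a generator like run_iter could presumably be handed to it in place of the toy LazyBatchReader here.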