Provide interface stream out arrow RecordBatch #278

@wangxiaoying

Description

@wangxiaoying
Contributor
No description provided.

Activity

fnicastri

fnicastri commented on Aug 6, 2024

@fnicastri

it would be very helpful.
Any chances to see this in the future?

wangxiaoying

wangxiaoying commented on Aug 14, 2024

@wangxiaoying
Contributor, Author

We have an initial implementation of the Arrow batch iterator for the Rust and C++ libraries. More work is needed on testing and on exposing it to the Python library.

chitralverma

chitralverma commented on May 11, 2025

@chitralverma
Contributor

I think it would be an awesome addition to be able to get a RecordBatchReader directly from read_sql, one that only materializes the record batches (sends queries to the DB) when the user requests read_next_batch.

@wangxiaoying do you think the testing is far enough along to expose the Arrow record batch iterator on the Python side? Is there any way I can help with this?

wangxiaoying

wangxiaoying commented on Jun 11, 2025

@wangxiaoying
Contributor, Author

> I think it would be an awesome addition to be able to get a RecordBatchReader directly from read_sql, one that only materializes the record batches (sends queries to the DB) when the user requests read_next_batch.
>
> @wangxiaoying do you think the testing is far enough along to expose the Arrow record batch iterator on the Python side? Is there any way I can help with this?

Yes, I think we can definitely enable the record batch reader. Please feel free to submit a PR!

chitralverma

chitralverma commented on Jun 19, 2025

@chitralverma
Contributor

@wangxiaoying I checked this out today and here are my findings:

  1. arrow_rb can easily be added on the Python side as a return_type and can return a generator of RecordBatches. I did this and it works as expected.
  2. After doing this there was no performance benefit, because the dispatcher is eager.

In order to make the record batch path truly lazy, I'm thinking:

  • The dispatcher can have an alternate implementation of run in which the operations don't happen eagerly but are backed by an iterator. [already available]
  • This iterator is also exposed on the Python side and passed to the RecordBatchReader.
  • When this RecordBatchReader is consumed, the operations happen at that time, by calling next() on the iterator.

This seems quite complicated to me considering my limited understanding of this code base. I'll still try to give this a shot, but if you have any suggestions please let me know.
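
To make the eager vs. lazy distinction concrete, here is a minimal pure-Python sketch of the idea in the bullets above. It is not connector-x code: `fetch_partition`, `run_eager`, `run_lazy`, and `BatchReader` are hypothetical names invented for illustration, and a plain list of dicts stands in for an Arrow RecordBatch. The only thing it shows is that a generator-backed reader sends a query when `read_next_batch` is called, not before.

```python
from typing import Iterator, List

Batch = List[dict]  # assumption: stand-in for an Arrow RecordBatch


def fetch_partition(query: str, log: List[str]) -> Batch:
    """Hypothetical stand-in for sending one partition query to the DB."""
    log.append(query)  # record that the query was actually sent
    return [{"query": query, "row": i} for i in range(2)]


def run_eager(queries: List[str], log: List[str]) -> Iterator[Batch]:
    """Eager dispatcher: every query runs before the first batch is handed out."""
    batches = [fetch_partition(q, log) for q in queries]
    return iter(batches)


def run_lazy(queries: List[str], log: List[str]) -> Iterator[Batch]:
    """Lazy dispatcher: a generator, so each query runs only on next()."""
    for q in queries:
        yield fetch_partition(q, log)


class BatchReader:
    """Hypothetical reader with the read_next_batch call mentioned in the thread."""

    def __init__(self, batches: Iterator[Batch]) -> None:
        self._batches = batches

    def read_next_batch(self) -> Batch:
        # Raises StopIteration once the underlying iterator is exhausted.
        return next(self._batches)


queries = [
    "SELECT * FROM t WHERE id < 10",
    "SELECT * FROM t WHERE id >= 10",
]

eager_log: List[str] = []
eager_reader = BatchReader(run_eager(queries, eager_log))
print(len(eager_log))  # both queries have already been sent

lazy_log: List[str] = []
lazy_reader = BatchReader(run_lazy(queries, lazy_log))
print(len(lazy_log))  # nothing has been sent yet
lazy_reader.read_next_batch()
print(len(lazy_log))  # exactly one query has been sent
```

With the eager path, wrapping the result in a generator or reader changes the return type but not when the work happens, which matches finding 2 above. The lazy path defers each query until the consumer asks for the next batch.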

chitralverma

chitralverma commented on Jun 20, 2025

@chitralverma
Contributor

actually, scratch the above, I managed to get this working exactly as expected. will raise a PR today for your review. :D

        Provide interface stream out arrow RecordBatch · Issue #278 · sfu-db/connector-x