Proposal for Implementing Streaming Query Support in chDB #322

@wudidapaopao

Currently, chDB executes a query by fetching the entire result set at once through the query_conn interface. For large result sets, this can lead to high memory usage and high latency. To address this, we propose adding streaming query capabilities to chDB.

The existing LocalServer in chDB initializes the execution engine via Connection::sendQuery and retrieves all results in one go using receiveResult, storing them in WriteBufferFromVector.
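The eager path described above can be modelled roughly as follows. This is only a sketch in plain Python, not chDB code: `simulated_engine` and `query_all` are hypothetical names, and an `io.BytesIO` stands in for WriteBufferFromVector. It shows that the caller sees nothing until every chunk has been written into a single buffer.

```python
import io
import json
from typing import Iterator, List


def simulated_engine(total_rows: int, chunk_size: int) -> Iterator[List[dict]]:
    """Stand-in for the execution engine: yields rows in fixed-size chunks."""
    for start in range(0, total_rows, chunk_size):
        stop = min(start + chunk_size, total_rows)
        yield [{"n": i} for i in range(start, stop)]


def query_all(total_rows: int, chunk_size: int = 3) -> bytes:
    """Eager model: drain every chunk into one buffer before returning."""
    buffer = io.BytesIO()  # assumption: plays the role of WriteBufferFromVector
    for chunk in simulated_engine(total_rows, chunk_size):
        for row in chunk:
            buffer.write(json.dumps(row).encode() + b"\n")
    # The whole result set is held in memory at this point.
    return buffer.getvalue()


payload = query_all(total_rows=4)
print(len(payload.splitlines()), "rows buffered before the caller sees any")
```

In this model the buffer size grows with the result set, which is the memory cost the proposal aims to avoid.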

Proposed Changes

  1. chDB Interface Modifications
    New send_query Interface: Introduce a send_query method that initializes a streaming query. It returns a stream_local_result object with a fetch method.
    fetch Method in stream_local_result: Each call to fetch returns a single row (or a chunk) in the specified format (e.g., JSON, Arrow), enabling incremental data consumption.

  2. LocalServer (ClientBase) Adjustments
    Deferred Result Retrieval: When the query is initialized, call only Connection::sendQuery to set up the execution engine, without fetching results immediately.
    On-Demand receiveResult Calls: When fetch is invoked, trigger receiveResult to retrieve a chunk of data. Once that chunk is exhausted, call receiveResult again for the next chunk.
    Handling Blocking: If receiveResult is not called for an extended period, the execution engine may block; this case needs to be handled.

This proposal could also address #265.
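The proposed send_query / fetch flow could look roughly like the sketch below. It is a plain-Python model under stated assumptions, not the chDB API: `StreamLocalResult`, `simulated_engine`, the `chunks_received` counter, and the JSON Lines output are all hypothetical, and a generator stands in for the execution engine so that each `fetch` pulls exactly one chunk, mirroring an on-demand receiveResult call. The blocking case is not modelled here.

```python
import json
from typing import Iterator, List, Optional


def simulated_engine(total_rows: int, chunk_size: int) -> Iterator[List[dict]]:
    """Stand-in for the execution engine: yields rows in fixed-size chunks."""
    for start in range(0, total_rows, chunk_size):
        stop = min(start + chunk_size, total_rows)
        yield [{"n": i} for i in range(start, stop)]


class StreamLocalResult:
    """Hypothetical model of the proposed stream_local_result object."""

    def __init__(self, chunks: Iterator[List[dict]]) -> None:
        self._chunks = chunks      # lazy: nothing is pulled until fetch()
        self.chunks_received = 0   # counts simulated receiveResult calls

    def fetch(self) -> Optional[str]:
        """Return the next chunk as JSON Lines, or None once exhausted."""
        chunk = next(self._chunks, None)  # plays the role of receiveResult
        if chunk is None:
            return None
        self.chunks_received += 1
        return "\n".join(json.dumps(row) for row in chunk)


def send_query(total_rows: int, chunk_size: int = 3) -> StreamLocalResult:
    """Hypothetical send_query: sets up the stream but fetches no rows."""
    return StreamLocalResult(simulated_engine(total_rows, chunk_size))


result = send_query(total_rows=7, chunk_size=3)
while (chunk := result.fetch()) is not None:
    print(chunk)
```

Because only one chunk is materialized per `fetch` call, peak memory in this model is bounded by the chunk size rather than by the full result set.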
