Add poll method to Python bindings #152

@fresh-borzoni

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

The Python bindings currently expose only the to_pandas() and to_arrow() methods. Both poll internally until all data has been fetched, then load the entire result into memory at once.

Current Behavior

# Only batch API available - loads ALL data into memory
scanner = await table.new_log_scanner()
scanner.subscribe(None, None)
df = scanner.to_pandas()  # Internally polls in a loop until complete

Under the hood, to_pandas() already polls repeatedly (see bindings/python/src/table.rs:378-414), but that loop is hidden from users, so streaming consumption patterns are not possible.

Solution

Add an explicit poll() method to LogScanner that matches the C++ API and Rust core:

# Streaming API - poll repeatedly with timeout
scanner = await table.new_log_scanner()
scanner.subscribe(None, None)

while True:
    records = await scanner.poll(timeout_ms=5000)
    if records.num_rows > 0:
        process_in_realtime(records)
    else:
        break
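
To make the intended consumption pattern easy to try without a running cluster, here is a minimal sketch of the same loop against an in-memory stand-in. FakeLogScanner, FakeRecords, consume, and run_demo are hypothetical names invented for illustration and are not part of the bindings. The sketch assumes only what the proposal above shows: an awaitable poll(timeout_ms=...) whose result has a num_rows attribute, and an empty result used as the stop condition.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeRecords:
    """Hypothetical stand-in for whatever poll() would return."""
    rows: list = field(default_factory=list)

    @property
    def num_rows(self) -> int:
        return len(self.rows)


class FakeLogScanner:
    """Hypothetical in-memory scanner; NOT the real binding."""

    def __init__(self, rows, batch_size=2):
        self._rows = list(rows)
        self._batch_size = batch_size

    async def poll(self, timeout_ms: int) -> FakeRecords:
        # A real scanner would wait up to timeout_ms for data;
        # this fake yields control once and returns immediately.
        await asyncio.sleep(0)
        batch = self._rows[:self._batch_size]
        self._rows = self._rows[self._batch_size:]
        return FakeRecords(batch)


async def consume(scanner, handle, timeout_ms=5000):
    """Poll until an empty batch arrives, passing each batch to `handle`."""
    total = 0
    while True:
        records = await scanner.poll(timeout_ms=timeout_ms)
        if records.num_rows == 0:
            break
        handle(records)  # only one batch is held at a time
        total += records.num_rows
    return total


def run_demo():
    seen = []
    scanner = FakeLogScanner(range(5), batch_size=2)
    total = asyncio.run(consume(scanner, lambda r: seen.append(r.rows)))
    return total, seen
```

The point of the sketch is that the caller, not the binding, decides what to do with each batch and when to stop, so memory use is bounded by the batch size rather than by the size of the log.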

Anything else?

No response

Willingness to contribute

  • I'm willing to submit a PR!
