Open
Description
Search before asking
- I searched in the issues and found nothing similar.
Motivation
The Python bindings currently expose only the to_pandas() and to_arrow() methods, which poll internally until all data has been fetched and then hold the entire result in memory at once.
Current Behavior
# Only batch API available - loads ALL data into memory
scanner = await table.new_log_scanner()
scanner.subscribe(None, None)
df = scanner.to_pandas()  # Internally polls in a loop until complete
Under the hood, to_pandas() does poll repeatedly (see bindings/python/src/table.rs:378-414), but this is hidden from users and doesn't allow for streaming consumption patterns.
Solution
Add an explicit poll() method to LogScanner that matches the C++ API and Rust core:
# Streaming API - poll repeatedly with timeout
scanner = await table.new_log_scanner()
scanner.subscribe(None, None)
while True:
    records = await scanner.poll(timeout_ms=5000)
    if records.num_rows > 0:
        process_in_realtime(records)
    else:
        break
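To illustrate how the proposed loop could be consumed, here is a minimal, self-contained sketch that wraps a poll()-style call in an async generator. It does not use the real bindings: FakeScanner, FakeBatch, and stream_batches are hypothetical names made up for this example, and the only assumptions taken from the proposal are that poll() is awaitable, accepts timeout_ms, and returns something with a num_rows attribute.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeBatch:
    """Hypothetical stand-in for the value poll() would return."""
    rows: list = field(default_factory=list)

    @property
    def num_rows(self) -> int:
        return len(self.rows)


class FakeScanner:
    """Hypothetical scanner: serves pre-loaded batches, then empty ones."""

    def __init__(self, batches):
        self._batches = list(batches)

    async def poll(self, timeout_ms: int) -> FakeBatch:
        # A real scanner would wait up to timeout_ms for data;
        # this fake only yields control to the event loop.
        await asyncio.sleep(0)
        if self._batches:
            return FakeBatch(self._batches.pop(0))
        return FakeBatch()


async def stream_batches(scanner, timeout_ms=5000):
    """Yield non-empty batches; stop at the first empty poll."""
    while True:
        records = await scanner.poll(timeout_ms=timeout_ms)
        if records.num_rows == 0:
            return
        yield records


async def consume(scanner) -> int:
    # Process each batch as it arrives instead of materializing everything.
    total = 0
    async for batch in stream_batches(scanner):
        total += batch.num_rows
    return total


result = asyncio.run(consume(FakeScanner([[1, 2], [3], [4, 5, 6]])))
print(result)
```

Stopping at the first empty poll mirrors the loop in the proposal. A long-running consumer would more likely treat an empty result as "no data yet" and keep polling.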
Anything else?
No response
Willingness to contribute
- I'm willing to submit a PR!