[python] Add paimon tail CLI for streaming table reads #7344

Draft
tub wants to merge 3 commits into apache:master from tub:python-streaming-3-cli

Conversation

@tub commented Mar 4, 2026

Summary

This PR is an optional addition to the Python streaming feature; I found it useful for debugging data issues locally without having to run Flink or Spark.

This is PR 3 of 3 for pure-Python streaming reads; it adds the paimon tail CLI command:

  • paimon tail: stream data from Paimon tables, similar to kafka-console-consumer
    • Multiple output formats: jsonl, json, csv, table
    • Filtering and column projection
    • Consumer IDs for checkpointing
    • Flexible start position: earliest, latest, snapshot:ID, time:-1h
    • --to end position for bounded reads
    • --follow for continuous streaming
  • CLI utilities for time parsing and output formatting
  • console_scripts entry point in setup.py
  • Documentation: CLI section and supported features list
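The start-position grammar listed above (earliest, latest, snapshot:ID, time:-1h) can be sketched in plain Python. This is an illustrative parser only; the function name, return shape, and supported time units are assumptions, not the PR's actual implementation:

```python
import re
import time

# Hypothetical sketch of start-position parsing; NOT the PR's code.
def parse_start_position(spec: str):
    """Parse specs like 'earliest', 'latest', 'snapshot:42', 'time:-1h'."""
    if spec in ("earliest", "latest"):
        return (spec, None)
    if spec.startswith("snapshot:"):
        return ("snapshot", int(spec.split(":", 1)[1]))
    if spec.startswith("time:"):
        value = spec.split(":", 1)[1]
        # Relative offsets like '-1h'; assumed units: s/m/h/d.
        m = re.fullmatch(r"(-?\d+)([smhd])", value)
        if m:
            seconds = int(m.group(1)) * {"s": 1, "m": 60, "h": 3600, "d": 86400}[m.group(2)]
            return ("time", int((time.time() + seconds) * 1000))  # epoch millis
        return ("time", int(value))  # assume an absolute epoch-millis value
    raise ValueError(f"unrecognized start position: {spec}")
```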

7 files changed, +1396 lines (incremental)

PR Stack

  1. Streaming infrastructure (scanners, consumers, caching, sharding)
  2. Core streaming (StreamReadBuilder, AsyncStreamingTableScan, table integration)
  3. 👉 this PR — CLI (paimon tail command)

Incremental diff (just this PR's changes): python-streaming-2-core...tub:paimon:python-streaming-3-cli

Merge workflow: Merge PR 1, rebase PR 2 onto updated master (PR 1 commits drop out), merge PR 2, repeat for PR 3.

Test plan

  • python -m pytest pypaimon/tests — 630 passed (9 pre-existing lance failures)
  • python -c "from pypaimon import CatalogFactory" — no import errors
  • Unit tests for CLI utilities (time parsing, output formatting)

🤖 Generated with Claude Code

tub added 3 commits March 4, 2026 15:09
…sharding

Add foundational infrastructure for pure-Python streaming reads:

- Follow-up scanners (delta, changelog, incremental diff) for
  continuous snapshot polling
- Consumer manager for persisting read progress
- LRU caching for snapshots, manifests, and manifest lists
- Batch existence checks for efficient file IO
- Bucket-based sharding for parallel consumption
- Row kind support in table reads
- Streaming-related core options
- Backtick support for identifier parsing

Includes unit tests for all new components.
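The LRU caching the commit above describes for snapshots, manifests, and manifest lists can be illustrated with a minimal stdlib sketch; the class and method names here are assumptions, not the PR's actual code:

```python
from collections import OrderedDict

# Illustrative bounded LRU cache; pypaimon's real cache may differ.
class LruCache:
    def __init__(self, capacity: int = 128):
        self.capacity = capacity
        self._data: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used
```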
…ation

Add core streaming read API:

- StreamReadBuilder for configuring streaming reads with predicates,
  projections, consumer IDs, sharding, and poll intervals
- AsyncStreamingTableScan with async generator and sync wrapper
  for continuous snapshot polling
- Table interface additions: new_stream_read_builder() on Table,
  FileStoreTable, FormatTable (stub), IcebergTable (stub)
- Minor fix in split_read.py
- Add oss/lance extras to setup.py
- Streaming reads documentation and supported features list

Includes streaming table scan and sharding unit tests.

Add command-line interface for streaming Paimon table data:

- paimon tail command: stream data from tables similar to
  kafka-console-consumer, with support for multiple output formats
  (jsonl, json, csv, table), filtering, column projection, consumer
  IDs, flexible start position (earliest, latest, snapshot:ID,
  time:-1h), and --to end position
- CLI utilities for time parsing and output formatting
- console_scripts entry point in setup.py
- CLI documentation and supported features list

Includes CLI utility tests.
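The jsonl and csv output modes described above can be sketched with the standard library; the helper names below are illustrative, not the PR's API:

```python
import csv
import io
import json

# Hypothetical sketches of two of the CLI's output formats.
def format_jsonl(rows):
    """One JSON object per line, as kafka-console-consumer-style output."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)

def format_csv(rows, columns):
    """CSV with a header row for the projected columns."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```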
