Search before asking
Motivation
Daft's Paimon reader currently decides internally whether each split can use Daft's native Parquet reader or must fall back to the pypaimon reader.
This decision is important for performance, but it is not observable through a public diagnostic API. Users cannot easily tell whether a slow scan is caused by PK merge, deletion vectors, BLOB columns, non-Parquet format, or unsupported filter pushdown.
Although pypaimon already has ReadBuilder.explain(), it explains the Paimon scan plan only. It does not expose Daft-specific reader routing, such as native Parquet vs pypaimon fallback.
Solution
Add a Daft-side structured scan explain API, for example:
- explain_paimon_scan(...), or
- PaimonTable.explain_scan(...), or
- a structured explain method on PaimonDataSource, exposed through the public API
The explain result should include at least:
- Paimon scan explain information from ReadBuilder.explain()
- native Parquet split count
- pypaimon fallback split count
- fallback reason summary, e.g. PK merge, deletion vectors, BLOB columns, non-Parquet format
- pushed and remaining Daft filters
- projection and limit pushdown status
- optional verbose per-split reader mode and fallback reason
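To make the fields listed above concrete, here is a minimal sketch of what a structured result could look like. Every name in it (PaimonScanExplain, SplitExplain, the field names, the reader mode strings) is hypothetical and not an existing Daft or pypaimon API. It also assumes the output of ReadBuilder.explain() can be carried as text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SplitExplain:
    """Per-split detail, intended for the optional verbose mode (hypothetical)."""
    split_id: str
    reader_mode: str                       # e.g. "native_parquet" or "pypaimon_fallback"
    fallback_reason: Optional[str] = None  # None when the native reader is used


@dataclass
class PaimonScanExplain:
    """Hypothetical structured result of a Daft-side Paimon scan explain."""
    paimon_explain: str                    # assumed: ReadBuilder.explain() output as text
    native_parquet_splits: int
    fallback_splits: int
    fallback_reasons: Dict[str, int] = field(default_factory=dict)
    pushed_filters: List[str] = field(default_factory=list)
    remaining_filters: List[str] = field(default_factory=list)
    projection_pushed: bool = False
    limit_pushed: bool = False
    splits: List[SplitExplain] = field(default_factory=list)  # filled only when verbose

    @property
    def total_splits(self) -> int:
        return self.native_parquet_splits + self.fallback_splits

    def summary(self) -> str:
        """Render a one-line, human-readable overview of the routing."""
        reasons = ", ".join(
            f"{reason}={count}" for reason, count in sorted(self.fallback_reasons.items())
        ) or "none"
        return (
            f"splits: {self.total_splits} "
            f"(native={self.native_parquet_splits}, fallback={self.fallback_splits}); "
            f"fallback reasons: {reasons}; "
            f"pushed filters: {len(self.pushed_filters)}, "
            f"remaining filters: {len(self.remaining_filters)}"
        )
```

Keeping the counts and reasons as plain fields rather than preformatted text would let users assert on them in tests or dashboards, while summary() covers the quick interactive case.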
The implementation should reuse the same native/fallback decision logic as PaimonDataSource.get_tasks() to avoid divergence between diagnostics and actual execution.
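One way to keep diagnostics and execution from diverging is to route both through a single decision function. The sketch below illustrates that shape only. SplitInfo, fallback_reason, plan_tasks, and explain_routing are hypothetical names, the split properties are stand-ins for whatever PaimonDataSource actually inspects, and the order of the checks is an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class SplitInfo:
    """Hypothetical stand-in for the split properties the router inspects."""
    file_format: str
    needs_pk_merge: bool = False
    has_deletion_vectors: bool = False
    has_blob_columns: bool = False


def fallback_reason(split: SplitInfo) -> Optional[str]:
    """Single source of truth: return None for native Parquet, else the reason."""
    if split.file_format.lower() != "parquet":
        return "non-Parquet format"
    if split.needs_pk_merge:
        return "PK merge"
    if split.has_deletion_vectors:
        return "deletion vectors"
    if split.has_blob_columns:
        return "BLOB columns"
    return None


def plan_tasks(splits: Iterable[SplitInfo]) -> List[Tuple[SplitInfo, str]]:
    """Execution path: choose a reader mode per split."""
    return [
        (s, "native_parquet" if fallback_reason(s) is None else "pypaimon_fallback")
        for s in splits
    ]


def explain_routing(splits: Iterable[SplitInfo]) -> Dict[str, object]:
    """Diagnostic path: summarize the same decisions without reading any data."""
    reasons = Counter()
    native = 0
    for s in splits:
        reason = fallback_reason(s)
        if reason is None:
            native += 1
        else:
            reasons[reason] += 1
    return {
        "native_parquet": native,
        "pypaimon_fallback": sum(reasons.values()),
        "fallback_reasons": dict(reasons),
    }
```

Because both paths call fallback_reason, a new fallback condition added for execution shows up in the explain output without a second code change.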
Anything else?
Relevant code paths:
paimon-python/pypaimon/daft/daft_datasource.py
paimon-python/pypaimon/read/read_builder.py
paimon-python/pypaimon/daft/daft_predicate_visitor.py
Are you willing to submit a PR?