[Feature] Add Daft-side scan explain for native Parquet and pypaimon fallback diagnostics #7998

@QuakeWang

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

Daft's Paimon reader currently decides internally whether each split can use Daft's native Parquet reader or must fall back to the pypaimon reader.

This decision matters for performance, but it is not observable through any public diagnostic API. Users cannot easily tell whether a slow scan is caused by PK merge, deletion vectors, BLOB columns, a non-Parquet file format, or unsupported filter pushdown.

Although pypaimon already has ReadBuilder.explain(), it explains only the Paimon scan plan. It does not expose Daft-specific reader routing, such as native Parquet vs. pypaimon fallback.

Solution

Add a Daft-side structured scan explain API, for example:

  • explain_paimon_scan(...)
  • or PaimonTable.explain_scan(...)
  • or a structured explain method on PaimonDataSource exposed through the public API

The explain result should include at least:

  • Paimon scan explain information from ReadBuilder.explain()
  • native Parquet split count
  • pypaimon fallback split count
  • fallback reason summary, e.g. PK merge, deletion vectors, BLOB columns, non-Parquet format
  • pushed and remaining Daft filters
  • projection and limit pushdown status
  • optional verbose per-split reader mode and fallback reason
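
To make the list above concrete, here is a minimal sketch of what such a structured result could look like. Every name in it (`PaimonScanExplain`, `SplitExplain`, the field names, and the reason strings) is hypothetical and only illustrates the shape; none of it is an existing pypaimon or Daft API, and the filter strings are placeholders.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SplitExplain:
    """Per-split detail, intended for the optional verbose mode (hypothetical)."""
    split_id: str
    reader_mode: str                       # "native_parquet" or "pypaimon_fallback"
    fallback_reason: Optional[str] = None  # e.g. "pk_merge"; None for native splits


@dataclass
class PaimonScanExplain:
    """Hypothetical structured result of a Daft-side scan explain."""
    paimon_explain: Any                    # whatever ReadBuilder.explain() returns, kept as-is
    pushed_filters: List[str] = field(default_factory=list)
    remaining_filters: List[str] = field(default_factory=list)
    projection_pushed: bool = False
    limit_pushed: bool = False
    splits: List[SplitExplain] = field(default_factory=list)

    @property
    def native_split_count(self) -> int:
        return sum(1 for s in self.splits if s.reader_mode == "native_parquet")

    @property
    def fallback_split_count(self) -> int:
        return len(self.splits) - self.native_split_count

    def fallback_reasons(self) -> Dict[str, int]:
        """Summarize how many splits fell back for each reason."""
        return dict(Counter(s.fallback_reason for s in self.splits
                            if s.fallback_reason is not None))


# Example with made-up splits, purely for illustration.
report = PaimonScanExplain(
    paimon_explain="<output of ReadBuilder.explain()>",
    pushed_filters=["id > 10"],
    remaining_filters=["lower(name) = 'a'"],
    projection_pushed=True,
    splits=[
        SplitExplain("s0", "native_parquet"),
        SplitExplain("s1", "pypaimon_fallback", "pk_merge"),
        SplitExplain("s2", "pypaimon_fallback", "deletion_vectors"),
        SplitExplain("s3", "pypaimon_fallback", "pk_merge"),
    ],
)
print(report.native_split_count, report.fallback_split_count, report.fallback_reasons())
```

Deriving the counts and the reason summary from the per-split list keeps the summary and the verbose output from disagreeing; a non-verbose mode could compute the same aggregates and then drop the list.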

The implementation should reuse the same native/fallback decision logic as PaimonDataSource.get_tasks() to avoid divergence between diagnostics and actual execution.
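
One way to get that guarantee is to route both paths through a single decision function. The sketch below is an assumption about how this could be structured, not a description of the current code in daft_datasource.py: `SplitInfo`, `decide_reader`, `plan_tasks`, and `explain_splits` are hypothetical names, and the order of the checks is arbitrary.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SplitInfo:
    """Hypothetical stand-in for the per-split facts the router inspects."""
    file_format: str
    needs_pk_merge: bool = False
    has_deletion_vectors: bool = False
    has_blob_columns: bool = False


def decide_reader(split: SplitInfo) -> Tuple[str, Optional[str]]:
    """Single source of truth: returns (reader_mode, fallback_reason)."""
    if split.file_format.lower() != "parquet":
        return "pypaimon_fallback", "non_parquet_format"
    if split.needs_pk_merge:
        return "pypaimon_fallback", "pk_merge"
    if split.has_deletion_vectors:
        return "pypaimon_fallback", "deletion_vectors"
    if split.has_blob_columns:
        return "pypaimon_fallback", "blob_columns"
    return "native_parquet", None


def plan_tasks(splits: List[SplitInfo]) -> List[str]:
    """Stands in for the execution path (get_tasks): only the mode is needed."""
    return [decide_reader(s)[0] for s in splits]


def explain_splits(splits: List[SplitInfo]) -> List[Tuple[str, Optional[str]]]:
    """Stands in for the diagnostic path: mode plus the reason."""
    return [decide_reader(s) for s in splits]


# Made-up splits for illustration.
example_splits = [
    SplitInfo("parquet"),
    SplitInfo("orc"),
    SplitInfo("parquet", needs_pk_merge=True),
    SplitInfo("parquet", has_blob_columns=True),
]
print(plan_tasks(example_splits))
print(explain_splits(example_splits))
```

Because the explain path never re-implements the conditions, a new fallback rule added to the decision function shows up in both execution and diagnostics at once.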

Anything else?

Relevant code paths:

  • paimon-python/pypaimon/daft/daft_datasource.py
  • paimon-python/pypaimon/read/read_builder.py
  • paimon-python/pypaimon/daft/daft_predicate_visitor.py

Are you willing to submit a PR?

  • I'm willing to submit a PR!

    Labels

    enhancement (New feature or request)
