Flink: Expose scan planning metrics on ContinuousIcebergEnumerator #16589

@2dmurali

Feature Request / Improvement

The Flink ContinuousIcebergEnumerator currently provides no observability into scan planning efficiency. Spark exposes scan metrics via its native metrics system, but Flink has no equivalent.

Additionally, BaseIncrementalScan.planFiles() does not emit a ScanReport via metricsReporter(), unlike SnapshotScan (batch), which already does.

Problem
When running Flink streaming jobs on large Iceberg tables, operators have no visibility into:

  • Whether partition pruning is effective (skipped manifests/files)
  • How long scan planning takes per cycle (latency spikes)
  • How much data is being scanned (file sizes)
  • Whether compaction is needed (growing result file counts)

This makes it difficult to diagnose slow streaming pipelines, validate table maintenance effectiveness, or set meaningful SLOs.

Proposed Solution

  1. Wire metricsReporter() support into BaseIncrementalScan.planFiles(), bringing incremental scans to parity with batch scans (SnapshotScan).
  2. Expose all ScanMetricsResult fields as Flink gauges on the ContinuousIcebergEnumerator's coordinator metric group, reporting per-scan (last-value) snapshots.
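
To make the two steps concrete, here is a rough, dependency-free sketch of the intended flow: planning reports a result to a reporter, and the reporter keeps last-value gauges. Everything in it is a hypothetical stand-in and not the actual Iceberg or Flink API: `ScanResult` stands in for a few ScanMetricsResult fields, `Reporter` for the metrics reporter callback, `GaugeRegistry` for the coordinator metric group, and the gauge names are only illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

final class ScanMetricsSketch {

  /** Hypothetical stand-in for a handful of ScanMetricsResult fields. */
  static final class ScanResult {
    final long planningNanos, resultDataFiles, skippedDataFiles, skippedDataManifests, totalBytes;

    ScanResult(long planningNanos, long resultDataFiles, long skippedDataFiles,
               long skippedDataManifests, long totalBytes) {
      this.planningNanos = planningNanos;
      this.resultDataFiles = resultDataFiles;
      this.skippedDataFiles = skippedDataFiles;
      this.skippedDataManifests = skippedDataManifests;
      this.totalBytes = totalBytes;
    }
  }

  /** Hypothetical stand-in for the reporter that planFiles() would call. */
  interface Reporter {
    void report(ScanResult result);
  }

  /** Hypothetical stand-in for a metric group: gauge name -> value supplier. */
  static final class GaugeRegistry {
    private final Map<String, LongSupplier> gauges = new LinkedHashMap<>();

    void gauge(String name, LongSupplier supplier) { gauges.put(name, supplier); }

    long read(String name) { return gauges.get(name).getAsLong(); }
  }

  /** Keeps the values of the most recent scan and exposes them as gauges. */
  static final class LastScanGauges implements Reporter {
    private final AtomicLong planningNanos = new AtomicLong();
    private final AtomicLong resultDataFiles = new AtomicLong();
    private final AtomicLong skippedDataFiles = new AtomicLong();
    private final AtomicLong skippedDataManifests = new AtomicLong();
    private final AtomicLong totalBytes = new AtomicLong();

    LastScanGauges(GaugeRegistry registry) {
      // Register each gauge once; it always reads the latest stored value.
      registry.gauge("planningDurationNanos", planningNanos::get);
      registry.gauge("resultDataFiles", resultDataFiles::get);
      registry.gauge("skippedDataFiles", skippedDataFiles::get);
      registry.gauge("skippedDataManifests", skippedDataManifests::get);
      registry.gauge("totalFileSizeInBytes", totalBytes::get);
    }

    @Override
    public void report(ScanResult r) {
      // Last-value semantics: each report overwrites the previous scan's numbers.
      planningNanos.set(r.planningNanos);
      resultDataFiles.set(r.resultDataFiles);
      skippedDataFiles.set(r.skippedDataFiles);
      skippedDataManifests.set(r.skippedDataManifests);
      totalBytes.set(r.totalBytes);
    }
  }

  /** Simulated planning cycle: times itself, then reports once when done. */
  static void planAndReport(long[] matchedFileSizes, long skippedFiles,
                            long skippedManifests, Reporter reporter) {
    long start = System.nanoTime();
    long bytes = 0;
    for (long size : matchedFileSizes) {
      bytes += size;
    }
    long elapsed = System.nanoTime() - start;
    reporter.report(new ScanResult(
        elapsed, matchedFileSizes.length, skippedFiles, skippedManifests, bytes));
  }

  public static void main(String[] args) {
    GaugeRegistry registry = new GaugeRegistry();
    LastScanGauges gauges = new LastScanGauges(registry);
    planAndReport(new long[] {100L, 200L, 300L}, 5L, 2L, gauges);
    planAndReport(new long[] {50L}, 7L, 3L, gauges); // second cycle replaces the first
    System.out.println("resultDataFiles=" + registry.read("resultDataFiles"));
    System.out.println("totalFileSizeInBytes=" + registry.read("totalFileSizeInBytes"));
  }
}
```

Because the gauges hold last values rather than running totals, each one describes only the most recent planning cycle. That matches the per-scan snapshot semantics proposed in item 2, and it means trends such as a growing result file count would be observed over time in the metrics backend rather than accumulated in the enumerator.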

Query engine

Flink

Willingness to contribute

  • I can contribute this improvement/feature independently
  • I would be willing to contribute this improvement/feature with guidance from the Iceberg community
  • I cannot contribute this improvement/feature at this time

Labels: improvement
