Skip to content

Expose ExecutionPlan::metrics() across the FFI boundary #22135

@mailmindlin

Description

@mailmindlin

Is your feature request related to a problem or challenge?

ExecutionPlan::metrics() is not currently reachable through datafusion-ffi: FFI_ExecutionPlan exposes most of the trait (properties, children, with_new_children, name, execute, repartitioned) but has no function pointer for metrics(), so ForeignExecutionPlan::metrics() falls through to the trait default implementation and always returns None.

The user-visible consequences:

  • EXPLAIN ANALYZE produces empty metric blocks for FFI plans.
    AnalyzeExec exhausts the input stream and then formats metrics via
    DisplayableExecutionPlan::with_metrics(...). For any plan node loaded
    through datafusion-ffi, the rendered output shows no output_rows,
    elapsed_compute, etc. — even though those metrics are being recorded
    on the producer side.
  • Anything calling DisplayableExecutionPlan::with_metrics(...) on a
    plan tree containing a foreign node is similarly affected
    — tooling,
    dashboards, tests, etc.

This makes foreign-loaded operators behave noticeably differently from
locally-defined operators in a way that's surprising and not documented.
The expectation, especially given the FFI crate's stated goal of allowing
extensions to "interface to DataFusion" transparently, is that a foreign
plan should be observationally indistinguishable from a local plan with
respect to instrumentation.

Describe the solution you'd like

Pass metrics across the FFI boundary as a snapshot: read all
atomic-backed counters/gauges/timers into plain integer fields at marshal
time. Specifically:

  1. Add a new function pointer to FFI_ExecutionPlan for metrics
  2. Define FFI-stable mirrors of MetricsSet, Metric, MetricValue, etc. plus bidirectional From
    conversions.
  3. For MetricValue::Custom { value: Arc<dyn CustomMetricValue> },
    marshal (name, Display output, as_usize()) and reconstruct on the
    consumer side via a small FfiCustomMetricValue shim. This preserves
    Display (the common case for EXPLAIN ANALYZE) and as_usize()
    (used for sorting), at the cost of losing aggregate and
    as_any downcasting.

Snapshot semantics are sufficient because every in-tree caller of
.metrics() reads after the input stream has been exhausted — see
AnalyzeExec exhausting its input before formatting, and
DisplayableExecutionPlan rendering during post-execution display.

Describe alternatives you've considered

  • Share live atomics across the FFI boundary. Expose the underlying
    Arc<AtomicUsize> such that the consumer can poll updated values without
    re-calling metrics(). Rejected: no in-tree caller polls metrics during
    streaming, and sharing Arcs across the FFI boundaries is complex for little benefit.

  • A separate FFI_ExecutionPlanMetrics struct queried independently
    from FFI_ExecutionPlan.
    Preserves the existing ABI but forces
    consumers to coordinate two structs and complicates lifetime/identity
    reasoning. The function-pointer-on-the-struct pattern matches every
    other method exposed on FFI_ExecutionPlan and reads more naturally.

  • A full FFI trait object for CustomMetricValue. Define an
    FFI_CustomMetricValue with function pointers for display, as_usize,
    aggregate, clone, and release. Rejected for the initial pass:
    significant surface area, while Display + as_usize() covers the only
    in-tree consumer of Custom metrics (display via EXPLAIN ANALYZE).
    Can be added later if extension authors need round-trippable custom
    metrics.

Additional context

I have a PR written for this issue that I will file shortly.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request
    No fields configured for Feature.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions