Skip to content

feat(metrics): per-query memory accounting and Tokio runtime metrics #82

@LantaoJin

Description

@LantaoJin

Is your feature request related to a problem or challenge?

For any embedder running multi-tenant DataFusion workloads, two operational signals are missing from the Java binding today:

  1. Per-query memory — there is no way to ask DataFusion how many bytes a specific in-flight query is currently holding, or what its peak was. SessionContextBuilder.memoryLimit(...) (PR feat: configure SessionContext and RuntimeEnv via builder #28) lets you cap the global pool, but if a tenant blows past their fair-share allocation you can't tell whose query it is. Without this you cannot do fair-share scheduling, abuse detection, or correctly attribute OOMs.

  2. Tokio runtime stats — the JNI library drives a single shared multi-threaded Tokio runtime in lib.rs. There is no Java-side window into how busy that runtime is right now: how many worker threads exist, how saturated they are, how deep the global queue is. Embedders that report node-level health (e.g. an OpenSearch _nodes/stats endpoint) need these numbers; today they have to hand-roll a parallel native bridge to surface them.

Both problems share the same FFI shape: snapshot a small struct of counters across the boundary on demand. They are bundled here so the design conversation only needs to happen once.

Describe the solution you'd like

Two narrow additions, both opt-in.

1. Per-query memory: MemoryUsage snapshot from a DataFrame

DataFrame df = ctx.sql(...);
ArrowReader r = df.executeStream(allocator);
// ... while another thread drains r ...
MemoryUsage usage = df.memoryUsage();   // peakBytes / currentBytes / consumerCount

The Java surface is a single new accessor on DataFrame. The Rust side wraps the session's MemoryPool with a per-query tracking adapter (modelled on TrackConsumersPool upstream) so the snapshot reports only the consumers tied to this query's MemoryConsumers, not the whole pool. The accessor is non-consuming, can be polled from any thread, and returns immutable values.

For most users the wrapper is harmless and adds no measurable overhead; it can be inserted automatically by SessionContext so callers don't have to opt in per query.

2. Tokio runtime stats: SessionContext.runtimeStats()

RuntimeStats stats = ctx.runtimeStats();
// numWorkers, liveTasksCount, globalQueueDepth, elapsed,
// totalBusyDuration, totalParkCount, totalPollsCount, ...

A small immutable POJO mirroring the subset of tokio_metrics::RuntimeMetrics that operations dashboards typically render. Snapshotted under the JNI guard; cheap (one struct copy across FFI).

Because tokio_metrics requires the --cfg tokio_unstable build flag to compile in, runtime stats are gated behind a default-off runtime-metrics Cargo feature on datafusion-jni. Default cargo build is unaffected (no new build flags, no new dep). To enable:

RUSTFLAGS="--cfg tokio_unstable" cargo build --features runtime-metrics

The Java method always exists; if invoked against a build that compiled the feature off it throws a clear RuntimeException pointing at the flag. Same shape as the just-shipped substrait feature (PR #75).

The two pieces are intentionally bundled. Per-query memory tracking and runtime stats both need the same JNI plumbing pattern (snapshot a struct of counters across FFI), the same lifecycle considerations (polling from another thread while a query runs), and the same opt-in gating story for builds that don't want the extra dependency. Splitting them would mean two near-identical reviews.

Describe alternatives you've considered

  • Always-on Tokio metrics, set --cfg tokio_unstable globally. Would force every contributor and every CI environment to override RUSTFLAGS. Pollutes the build env and breaks anyone who already has their own RUSTFLAGS. Rejected.

Additional context

  • tokio-metrics 0.5.0 is the right version against the Tokio currently used in native/Cargo.toml (tokio = "1"). It exposes RuntimeMonitor whose intervals() iterator yields RuntimeMetrics snapshots; the JNI handler grabs one snapshot per call.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request
    No fields configured for Feature.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions