[api][runtime][python] Propagate metric groups to cross-language reso… by goutamadwant · Pull Request #860 · apache/flink-agents

goutamadwant · 2026-06-26T04:13:24Z

Linked issue: #857

Purpose of change

This propagates metric group binding through cross-language resource wrappers.

Java wrappers around Python resources now forward setMetricGroup(...) to the underlying Python resource through PythonResourceAdapter.

Python wrappers around Java resources now forward set_metric_group(...) to the underlying Java resource, unwrapping the Python FlinkMetricGroup to its Java metric group.

This keeps provider-owned metrics such as latency, request counts, error counts, cache metrics, backend dimensions, and nested resource metrics under the correct action metric scope for cross-language resources.

Tests

mvn --batch-mode --no-transfer-progress -pl api -Dtest=PythonChatModelSetupTest test
mvn --batch-mode --no-transfer-progress -pl runtime -am -Dtest=PythonResourceAdapterImplTest -Dsurefire.failIfNoSpecifiedTests=false test
mvn --batch-mode --no-transfer-progress -pl api,plan,runtime spotless:check
cd python && uv run --python 3.12 --extra test python -m pytest flink_agents/runtime/tests/test_cross_language_metric_group.py
cd python && uv run --python 3.12 --extra lint ruff check flink_agents/runtime/java/java_chat_model.py flink_agents/runtime/java/java_embedding_model.py flink_agents/runtime/java/java_vector_store.py flink_agents/runtime/java/java_resource_wrapper.py flink_agents/runtime/python_java_utils.py flink_agents/runtime/tests/test_cross_language_metric_group.py

API

This touches public API interfaces by adding default metric-group forwarding hooks to PythonResourceAdapter and PythonResourceWrapper.

The change is backward compatible because the new interface methods have default implementations.

Documentation

doc-needed
doc-not-needed
doc-included

…urces

weiqingy

Thanks for taking this on — a few questions inline.

weiqingy · 2026-06-30T22:19:27Z


+    @Override
+    public void setMetricGroup(Object pythonResource, FlinkAgentsMetricGroup metricGroup) {
+        interpreter.invoke(SET_METRIC_GROUP, pythonResource, metricGroup);


This now forwards the action metric group through to the Python provider, which is exactly what issue #857 asked for — and #857 also called out "define metric ownership clearly to avoid double-counting when action-level and provider-level metrics both observe the same response." From what I can see, ownership already looks singular: token metrics are recorded in exactly one place per language — the action handler (ChatModelAction.recordChatTokenMetrics → chatModel.recordTokenMetrics on the Java side, chat_model_action → _record_token_metrics on the Python side), each reading the resource's bound group. The provider integrations under integrations/ record nothing, so there's no second recorder to collide with — this PR widens that single existing recording across the language boundary rather than introducing a second one. So the #857 concern reads as already closed by construction. Would it be worth a sentence in the PR description making that explicit, so the "no double-counting" reasoning is traceable back to the issue?

weiqingy · 2026-06-30T22:19:27Z

    public void setMetricGroup(FlinkAgentsMetricGroup metricGroup) {
        this.metricGroup = metricGroup;
+        if (this instanceof PythonResourceWrapper) {
+            ((PythonResourceWrapper) this).setPythonResourceMetricGroup(metricGroup);


The abstract base Resource (package ...api.resource) now imports and instanceof-checks PythonResourceWrapper from the more specialized ...api.resource.python sub-package. That inverts the usual dependency direction — every Java-only resource (chat models, tools, prompts, vector stores) now carries a compile-time edge to the Python-bridge interface in its base type, and a hypothetical third forwarding flavor would mean editing this base method again rather than overriding it.

One option that keeps the base oblivious to the bridge: have each Python* wrapper override setMetricGroup to call super.setMetricGroup(...) and then setPythonResourceMetricGroup(...), pushing the Python concern down into the wrappers where getPythonResourceAdapter() already lives. The PR already adds a new getPythonResourceAdapter() override to all eight wrappers, so a per-wrapper setMetricGroup override would touch the same set of files — just in the Python layer instead of the base class. Was the centralized instanceof chosen deliberately, or mainly to avoid touching the wrappers?

weiqingy · 2026-06-30T22:19:27Z

+        ),
+    ],
+)
+def test_java_resource_wrappers_forward_metric_group(resource):


This new file covers the Python→Java direction well, but only that direction — set_java_resource_metric_group plus the four Java*Impl wrappers. The Java→Python direction's Python half, python_java_utils.set_metric_group (python_java_utils.py:382), has no direct test: it's the function that wraps the raw Java group in FlinkMetricGroup and owns the only None branch in the feature, and the Java half is mock-verify only (testSetMetricGroup asserts the invoke string, it doesn't execute Python). So that seam isn't exercised end-to-end today.

Would a small unit test in this file's style be worth adding — a fake resource that captures its set_metric_group arg, then set_metric_group(fake, sentinel_j_group) asserting the captured arg is a FlinkMetricGroup whose _j_metric_group is sentinel_j_group, plus a None case asserting None is forwarded? That would pin the wrap-and-None behavior that's currently uncovered.

weiqingy · 2026-06-30T22:19:27Z

+    """Bind the underlying Java metric group to a wrapped Java resource."""
+    if j_resource is None:
+        return
+    j_metric_group = getattr(metric_group, "_j_metric_group", metric_group)


In the real flow this is correct — metric_group is always a FlinkMetricGroup (it comes from action_metric_group), so the unwrap hits _j_metric_group and forwards the genuine Java group. Low confidence that the edge below matters in practice, so genuinely a question rather than a flag.

RunnerContext.get_resource(name, type, metric_group=...) is a documented public API accepting any MetricGroup — an abstract base a user could subclass without a _j_metric_group. If such a custom group reaches a Java resource wrapper, the silent , metric_group fallback would forward a raw Python object into Java setMetricGroup(FlinkAgentsMetricGroup), failing opaquely at the pemja boundary rather than with a clear error. Worth making the fallback explicit (unwrap only a real FlinkMetricGroup, else fail loudly), or is that out of scope today given no in-tree caller hits it? (The None case is benign — get_resource substitutes action_metric_group first, and a direct None just clears the field — so no concern there.)

[api][runtime][python] Propagate metric groups to cross-language reso…

5f17848

…urces

github-actions Bot added doc-not-needed Your PR changes do not impact docs fixVersion/0.4.0 priority/major Default priority of the PR or issue. labels Jun 26, 2026

weiqingy reviewed Jun 30, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[api][runtime][python] Propagate metric groups to cross-language reso…#860

[api][runtime][python] Propagate metric groups to cross-language reso…#860
goutamadwant wants to merge 1 commit into
apache:mainfrom
goutamadwant:codex/flink-agents-857-cross-language-metrics

goutamadwant commented Jun 26, 2026

Uh oh!

weiqingy left a comment

Uh oh!

weiqingy Jun 30, 2026

Uh oh!

weiqingy Jun 30, 2026

Uh oh!

weiqingy Jun 30, 2026

Uh oh!

weiqingy Jun 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

goutamadwant commented Jun 26, 2026

Purpose of change

Tests

API

Documentation

Uh oh!

weiqingy left a comment

Choose a reason for hiding this comment

Uh oh!

weiqingy Jun 30, 2026

Choose a reason for hiding this comment

Uh oh!

weiqingy Jun 30, 2026

Choose a reason for hiding this comment

Uh oh!

weiqingy Jun 30, 2026

Choose a reason for hiding this comment

Uh oh!

weiqingy Jun 30, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants