Skip to content

[api][runtime][python] Propagate metric groups to cross-language reso…#860

Open
goutamadwant wants to merge 1 commit into
apache:mainfrom
goutamadwant:codex/flink-agents-857-cross-language-metrics
Open

[api][runtime][python] Propagate metric groups to cross-language reso…#860
goutamadwant wants to merge 1 commit into
apache:mainfrom
goutamadwant:codex/flink-agents-857-cross-language-metrics

Conversation

@goutamadwant

Copy link
Copy Markdown

Linked issue: #857

Purpose of change

This propagates metric group binding through cross-language resource wrappers.

Java wrappers around Python resources now forward setMetricGroup(...) to the underlying Python resource through PythonResourceAdapter.

Python wrappers around Java resources now forward set_metric_group(...) to the underlying Java resource, unwrapping the Python FlinkMetricGroup to its Java metric group.

This keeps provider-owned metrics such as latency, request counts, error counts, cache metrics, backend dimensions, and nested resource metrics under the correct action metric scope for cross-language resources.

Tests

  • mvn --batch-mode --no-transfer-progress -pl api -Dtest=PythonChatModelSetupTest test
  • mvn --batch-mode --no-transfer-progress -pl runtime -am -Dtest=PythonResourceAdapterImplTest -Dsurefire.failIfNoSpecifiedTests=false test
  • mvn --batch-mode --no-transfer-progress -pl api,plan,runtime spotless:check
  • cd python && uv run --python 3.12 --extra test python -m pytest flink_agents/runtime/tests/test_cross_language_metric_group.py
  • cd python && uv run --python 3.12 --extra lint ruff check flink_agents/runtime/java/java_chat_model.py flink_agents/runtime/java/java_embedding_model.py flink_agents/runtime/java/java_vector_store.py flink_agents/runtime/java/java_resource_wrapper.py flink_agents/runtime/python_java_utils.py flink_agents/runtime/tests/test_cross_language_metric_group.py

API

This touches public API interfaces by adding default metric-group forwarding hooks to PythonResourceAdapter and PythonResourceWrapper.

The change is backward compatible because the new interface methods have default implementations.

Documentation

  • doc-needed
  • doc-not-needed
  • doc-included

@github-actions github-actions Bot added doc-not-needed Your PR changes do not impact docs fixVersion/0.4.0 priority/major Default priority of the PR or issue. labels Jun 26, 2026

@weiqingy weiqingy left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on — a few questions inline.


@Override
public void setMetricGroup(Object pythonResource, FlinkAgentsMetricGroup metricGroup) {
interpreter.invoke(SET_METRIC_GROUP, pythonResource, metricGroup);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now forwards the action metric group through to the Python provider, which is exactly what issue #857 asked for — and #857 also called out "define metric ownership clearly to avoid double-counting when action-level and provider-level metrics both observe the same response." From what I can see, ownership already looks singular: token metrics are recorded in exactly one place per language — the action handler (ChatModelAction.recordChatTokenMetricschatModel.recordTokenMetrics on the Java side, chat_model_action_record_token_metrics on the Python side), each reading the resource's bound group. The provider integrations under integrations/ record nothing, so there's no second recorder to collide with — this PR widens that single existing recording across the language boundary rather than introducing a second one. So the #857 concern reads as already closed by construction. Would it be worth a sentence in the PR description making that explicit, so the "no double-counting" reasoning is traceable back to the issue?

public void setMetricGroup(FlinkAgentsMetricGroup metricGroup) {
this.metricGroup = metricGroup;
if (this instanceof PythonResourceWrapper) {
((PythonResourceWrapper) this).setPythonResourceMetricGroup(metricGroup);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The abstract base Resource (package ...api.resource) now imports and instanceof-checks PythonResourceWrapper from the more specialized ...api.resource.python sub-package. That inverts the usual dependency direction — every Java-only resource (chat models, tools, prompts, vector stores) now carries a compile-time edge to the Python-bridge interface in its base type, and a hypothetical third forwarding flavor would mean editing this base method again rather than overriding it.

One option that keeps the base oblivious to the bridge: have each Python* wrapper override setMetricGroup to call super.setMetricGroup(...) and then setPythonResourceMetricGroup(...), pushing the Python concern down into the wrappers where getPythonResourceAdapter() already lives. The PR already adds a new getPythonResourceAdapter() override to all eight wrappers, so a per-wrapper setMetricGroup override would touch the same set of files — just in the Python layer instead of the base class. Was the centralized instanceof chosen deliberately, or mainly to avoid touching the wrappers?

),
],
)
def test_java_resource_wrappers_forward_metric_group(resource):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new file covers the Python→Java direction well, but only that direction — set_java_resource_metric_group plus the four Java*Impl wrappers. The Java→Python direction's Python half, python_java_utils.set_metric_group (python_java_utils.py:382), has no direct test: it's the function that wraps the raw Java group in FlinkMetricGroup and owns the only None branch in the feature, and the Java half is mock-verify only (testSetMetricGroup asserts the invoke string, it doesn't execute Python). So that seam isn't exercised end-to-end today.

Would a small unit test in this file's style be worth adding — a fake resource that captures its set_metric_group arg, then set_metric_group(fake, sentinel_j_group) asserting the captured arg is a FlinkMetricGroup whose _j_metric_group is sentinel_j_group, plus a None case asserting None is forwarded? That would pin the wrap-and-None behavior that's currently uncovered.

"""Bind the underlying Java metric group to a wrapped Java resource."""
if j_resource is None:
return
j_metric_group = getattr(metric_group, "_j_metric_group", metric_group)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the real flow this is correct — metric_group is always a FlinkMetricGroup (it comes from action_metric_group), so the unwrap hits _j_metric_group and forwards the genuine Java group. Low confidence that the edge below matters in practice, so genuinely a question rather than a flag.

RunnerContext.get_resource(name, type, metric_group=...) is a documented public API accepting any MetricGroup — an abstract base a user could subclass without a _j_metric_group. If such a custom group reaches a Java resource wrapper, the silent , metric_group fallback would forward a raw Python object into Java setMetricGroup(FlinkAgentsMetricGroup), failing opaquely at the pemja boundary rather than with a clear error. Worth making the fallback explicit (unwrap only a real FlinkMetricGroup, else fail loudly), or is that out of scope today given no in-tree caller hits it? (The None case is benign — get_resource substitutes action_metric_group first, and a direct None just clears the field — so no concern there.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

doc-not-needed Your PR changes do not impact docs fixVersion/0.4.0 priority/major Default priority of the PR or issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants