[api][runtime][python] Propagate metric groups to cross-language reso…#860
[api][runtime][python] Propagate metric groups to cross-language reso…#860goutamadwant wants to merge 1 commit into
Conversation
weiqingy
left a comment
There was a problem hiding this comment.
Thanks for taking this on — a few questions inline.
|
|
||
| @Override | ||
| public void setMetricGroup(Object pythonResource, FlinkAgentsMetricGroup metricGroup) { | ||
| interpreter.invoke(SET_METRIC_GROUP, pythonResource, metricGroup); |
There was a problem hiding this comment.
This now forwards the action metric group through to the Python provider, which is exactly what issue #857 asked for — and #857 also called out "define metric ownership clearly to avoid double-counting when action-level and provider-level metrics both observe the same response." From what I can see, ownership already looks singular: token metrics are recorded in exactly one place per language — the action handler (ChatModelAction.recordChatTokenMetrics → chatModel.recordTokenMetrics on the Java side, chat_model_action → _record_token_metrics on the Python side), each reading the resource's bound group. The provider integrations under integrations/ record nothing, so there's no second recorder to collide with — this PR widens that single existing recording across the language boundary rather than introducing a second one. So the #857 concern reads as already closed by construction. Would it be worth a sentence in the PR description making that explicit, so the "no double-counting" reasoning is traceable back to the issue?
| public void setMetricGroup(FlinkAgentsMetricGroup metricGroup) { | ||
| this.metricGroup = metricGroup; | ||
| if (this instanceof PythonResourceWrapper) { | ||
| ((PythonResourceWrapper) this).setPythonResourceMetricGroup(metricGroup); |
There was a problem hiding this comment.
The abstract base Resource (package ...api.resource) now imports and instanceof-checks PythonResourceWrapper from the more specialized ...api.resource.python sub-package. That inverts the usual dependency direction — every Java-only resource (chat models, tools, prompts, vector stores) now carries a compile-time edge to the Python-bridge interface in its base type, and a hypothetical third forwarding flavor would mean editing this base method again rather than overriding it.
One option that keeps the base oblivious to the bridge: have each Python* wrapper override setMetricGroup to call super.setMetricGroup(...) and then setPythonResourceMetricGroup(...), pushing the Python concern down into the wrappers where getPythonResourceAdapter() already lives. The PR already adds a new getPythonResourceAdapter() override to all eight wrappers, so a per-wrapper setMetricGroup override would touch the same set of files — just in the Python layer instead of the base class. Was the centralized instanceof chosen deliberately, or mainly to avoid touching the wrappers?
| ), | ||
| ], | ||
| ) | ||
| def test_java_resource_wrappers_forward_metric_group(resource): |
There was a problem hiding this comment.
This new file covers the Python→Java direction well, but only that direction — set_java_resource_metric_group plus the four Java*Impl wrappers. The Java→Python direction's Python half, python_java_utils.set_metric_group (python_java_utils.py:382), has no direct test: it's the function that wraps the raw Java group in FlinkMetricGroup and owns the only None branch in the feature, and the Java half is mock-verify only (testSetMetricGroup asserts the invoke string, it doesn't execute Python). So that seam isn't exercised end-to-end today.
Would a small unit test in this file's style be worth adding — a fake resource that captures its set_metric_group arg, then set_metric_group(fake, sentinel_j_group) asserting the captured arg is a FlinkMetricGroup whose _j_metric_group is sentinel_j_group, plus a None case asserting None is forwarded? That would pin the wrap-and-None behavior that's currently uncovered.
| """Bind the underlying Java metric group to a wrapped Java resource.""" | ||
| if j_resource is None: | ||
| return | ||
| j_metric_group = getattr(metric_group, "_j_metric_group", metric_group) |
There was a problem hiding this comment.
In the real flow this is correct — metric_group is always a FlinkMetricGroup (it comes from action_metric_group), so the unwrap hits _j_metric_group and forwards the genuine Java group. Low confidence that the edge below matters in practice, so genuinely a question rather than a flag.
RunnerContext.get_resource(name, type, metric_group=...) is a documented public API accepting any MetricGroup — an abstract base a user could subclass without a _j_metric_group. If such a custom group reaches a Java resource wrapper, the silent , metric_group fallback would forward a raw Python object into Java setMetricGroup(FlinkAgentsMetricGroup), failing opaquely at the pemja boundary rather than with a clear error. Worth making the fallback explicit (unwrap only a real FlinkMetricGroup, else fail loudly), or is that out of scope today given no in-tree caller hits it? (The None case is benign — get_resource substitutes action_metric_group first, and a direct None just clears the field — so no concern there.)
Linked issue: #857
Purpose of change
This propagates metric group binding through cross-language resource wrappers.
Java wrappers around Python resources now forward
setMetricGroup(...)to the underlying Python resource throughPythonResourceAdapter.Python wrappers around Java resources now forward
set_metric_group(...)to the underlying Java resource, unwrapping the PythonFlinkMetricGroupto its Java metric group.This keeps provider-owned metrics such as latency, request counts, error counts, cache metrics, backend dimensions, and nested resource metrics under the correct action metric scope for cross-language resources.
Tests
mvn --batch-mode --no-transfer-progress -pl api -Dtest=PythonChatModelSetupTest testmvn --batch-mode --no-transfer-progress -pl runtime -am -Dtest=PythonResourceAdapterImplTest -Dsurefire.failIfNoSpecifiedTests=false testmvn --batch-mode --no-transfer-progress -pl api,plan,runtime spotless:checkcd python && uv run --python 3.12 --extra test python -m pytest flink_agents/runtime/tests/test_cross_language_metric_group.pycd python && uv run --python 3.12 --extra lint ruff check flink_agents/runtime/java/java_chat_model.py flink_agents/runtime/java/java_embedding_model.py flink_agents/runtime/java/java_vector_store.py flink_agents/runtime/java/java_resource_wrapper.py flink_agents/runtime/python_java_utils.py flink_agents/runtime/tests/test_cross_language_metric_group.pyAPI
This touches public API interfaces by adding default metric-group forwarding hooks to
PythonResourceAdapterandPythonResourceWrapper.The change is backward compatible because the new interface methods have default implementations.
Documentation
doc-neededdoc-not-neededdoc-included