[BEAM-1866] Plumb user metrics through Fn API.#4344
Conversation
The SDK worker is now periodically querried for progress and user metrics.
|
R: @pabloem |
pabloem
left a comment
There was a problem hiding this comment.
Thanks Robert. This is very cool. I left a couple nits.
I don't know the code in fn_api_runner.py well, but all metrics changes look good.
Approving.
| for transform_id, op in self.ops.items()}, | ||
| user=sum( | ||
| [op.metrics_container.to_runner_api() for op in self.ops.values()], | ||
| [])) |
There was a problem hiding this comment.
Nit: Seems like you have an extra empty list here?
There was a problem hiding this comment.
This empty list is the second argument to sum (which defaults to 0, but that doesn't work for lists).
| self.step_name = operation_name | ||
| self.metrics_container = MetricsContainer(self.step_name) | ||
| self.scoped_metrics_container = ScopedMetricsContainer( | ||
| self.metrics_container) |
There was a problem hiding this comment.
Nice. This bothered me for a while : )
Can you then remove the setting of these variables in operations.py:407, and operations.py:615-617?
There was a problem hiding this comment.
Unfortunately I can't remove them from operations.py as the step and stage names can't be unified in that case (with the legacy harness). But this code will all be going away soon.
pabloem
left a comment
There was a problem hiding this comment.
I'm not sure if the comment was recorded:
I left a couple nits, and I can't speak for fn_api_runner.py. All else looks good.
Thanks!
|
Thanks. The fn_api_runner stuff is mostly refactoring. |
| // A key for identifying a metric at the most granular level. | ||
| message MetricKey { | ||
| // The step, if any, this metric is associated with. | ||
| string step = 1; |
| string step = 1; | ||
|
|
||
| // (Required): The namespace of this metric. | ||
| string namespace = 2; |
| string namespace = 2; | ||
|
|
||
| // (Required): The name of this metric. | ||
| string name = 3; |
There was a problem hiding this comment.
I'm assuming this is the user supplied name.
|
|
||
| map<string, PTransform> ptransforms = 1; | ||
| map<string, User> user = 2; | ||
| repeated User user = 2; |
There was a problem hiding this comment.
If we could keep the map here from ptransform id to User and have User have a repeated field allowing to store multiple metrics.
| } | ||
|
|
||
| // Data associated with a counter metric. | ||
| message CounterData { |
There was a problem hiding this comment.
I only just thought of it while trying to find the JIRA associated with Fn API metrics, but I don't think user-defined metrics should necessarily be conflated with progress reporting. For some contexts, a runner may just support attempted metrics via a proxy over dropwizard. It might be nice to be able to stand this up as an independent service that runners can share.
There was a problem hiding this comment.
I'm not sure what your trying to get at here with an independent service?
If you are saying that the Metrics type should be defined outside of beam_fn_api.proto in some place that is shared between job management and job execution then I agree.
There was a problem hiding this comment.
I mean that the message sent from SDK harness to runner should be over an independent channel from progress reporting.
There was a problem hiding this comment.
The context is that there has been an ask to be able to publish metrics to a metrics pusher that is either independent of the runner (can only support attempted metrics) or supported by a runner. In either case both APIs can share this metrics type or the metrics pusher API can define a new API which is relevant for its use case.
|
By the way, it would actually be nice to have the proto changes in a parent commit for history. I expect they will be a bit more stable. And if you merged that earlier it would make it a little smoother for other language SDKs to get started, vs patching this PR. |
The SDK worker is now periodically querried for progress
and user metrics.
Follow this checklist to help us incorporate your contribution quickly and easily:
[BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replaceBEAM-XXXwith the appropriate JIRA issue.mvn clean verifyto make sure basic checks pass. A more thorough check will be performed on your pull request automatically.