What would you like to happen?
Hi team,
It would be really nice if the MetricsOptions class and metrics export were available to the Dataflow runner. Currently, only the Flink and Spark runners support metrics export.
This would help export metrics outside of GCP when running on Dataflow, specifically when running custom Flex Templates in streaming mode. In this scenario the metrics results aren't queryable at all (from what I can tell).
Happy to add more details; this is my first Apache Beam issue.
Thanks!
Edit: Might be useful to add an example here.
- We have a family of Dataflow Flex Templates that run Beam Java (version 2.51.0).
- We want to export metrics from these jobs to external providers; in our scenario we don't use Google Cloud Monitoring.
- We wrote an implementation of `MetricsSink` to collect metrics in 15-second push periods and submit them to the external provider, but then realized that it's not compatible with the `DataflowRunner`.
- We tried to collect metrics using `PipelineResult`, but that doesn't seem to be possible on the Dataflow runner either, specifically in streaming pipelines.

The only workaround is to bypass Beam metrics, collect aggregations of data during processing, and then submit them via API. This is doable but seems like it shouldn't be needed; the Beam metrics package works nicely up until export. It would be great if we could export metrics using the Dataflow runner.
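For illustration, here is a rough sketch of the kind of workaround I mean. It is plain Java with no Beam dependency, and every name in it (`ManualMetricsBuffer`, `increment`, `flush`) is hypothetical, made up for this issue rather than taken from Beam or from our actual code. It assumes the external provider can accept a map of counter names to deltas.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Consumer;

// Hypothetical helper, not part of Apache Beam: buffers counter deltas in
// memory so they can be pushed to an external provider by our own code.
final class ManualMetricsBuffer {
    private final ConcurrentHashMap<String, LongAdder> counters = new ConcurrentHashMap<>();

    // Called from processing code (for example, per element) to record a delta.
    void increment(String name, long delta) {
        counters.computeIfAbsent(name, k -> new LongAdder()).add(delta);
    }

    // Drains the accumulated deltas and hands them to the sender, which stands
    // in for the call to the external provider's API (an assumption here).
    // Note: sumThenReset() is not atomic with respect to concurrent adds, so an
    // increment racing with a flush may be dropped; this is only a sketch.
    Map<String, Long> flush(Consumer<Map<String, Long>> sender) {
        Map<String, Long> snapshot = new HashMap<>();
        for (Map.Entry<String, LongAdder> entry : counters.entrySet()) {
            long value = entry.getValue().sumThenReset();
            if (value != 0) {
                snapshot.put(entry.getKey(), value);
            }
        }
        // Skip the push entirely when nothing changed since the last flush.
        if (!snapshot.isEmpty()) {
            sender.accept(snapshot);
        }
        return snapshot;
    }
}
```

In practice `flush` would be called on a timer (every 15 seconds in our case, for example from a `ScheduledExecutorService`) on each worker, which is exactly the bookkeeping that a supported metrics export on the Dataflow runner would make unnecessary.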
I hope this outline helps.
Issue Priority
Priority: 3 (nice-to-have improvement)
Issue Components