[BEAM-4776] Add metrics support to Java PortableRunner #10105
9f683e7
to
5d0dd59
Compare
Run Java PreCommit
Thanks @mwalenia. Looks good. Do we have any integration tests that we can enable to test this end to end?
@mxm I don't think so; we don't run portable e2e tests in Java yet.
I think we do through the Portable ValidatesRunner tests, we might want to enable these:
|
You're right, thanks for pointing it out. I'll delete the exclusions and run the ValidatesRunner.
Run Java Flink PortableValidatesRunner Batch
@mxm I'm going to fix these failures and get back to you. Thanks again for pointing out those tests :)
Sounds good. Thanks!
Run Java Flink PortableValidatesRunner Streaming
Run Java Flink PortableValidatesRunner Batch
@mxm I excluded tests regarding committed metrics, as they are not supported.
Run Java PreCommit
Run Java Flink PortableValidatesRunner Batch
Run Java Flink PortableValidatesRunner Streaming
Gauges should be supported. I'm using them on a production system. Beam by default doesn't expose any gauges though, so you might have to add some manually.
How would I go about that? I'm not sure how exposing the metrics is done. Can you point me in the right direction?
@echauchot Hi, I've stumbled upon a MetricsPusherTest failure in this PR. The runner reports more than just the user metric defined in the test, but TestMetricsSink returns only the first metric from the list to the test. Since there's no guarantee that it will be the user metric, the assert is likely to pick up a wrong value and fail. Do you think this is a good reason to make the test account for such a situation? I hope you're the right person to reach out to in this case.
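To illustrate the ordering problem described in this comment, here is a minimal, self-contained sketch. It does not use Beam's actual `MetricResult` or `TestMetricsSink` classes; the `Counter` type, the `find` helper, and the namespace and metric names are all hypothetical stand-ins. The idea is simply to look up the metric under test by namespace and name instead of reading the first element of the pushed list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class MetricSelection {
  /** Hypothetical stand-in for a pushed counter result; not Beam's MetricResult. */
  static final class Counter {
    final String namespace;
    final String name;
    final long value;

    Counter(String namespace, String name, long value) {
      this.namespace = namespace;
      this.name = name;
      this.value = value;
    }
  }

  /** Finds a counter by namespace and name, independent of its position in the list. */
  static Optional<Counter> find(List<Counter> counters, String namespace, String name) {
    return counters.stream()
        .filter(c -> c.namespace.equals(namespace) && c.name.equals(name))
        .findFirst();
  }

  public static void main(String[] args) {
    // The runner may report system metrics ahead of the user metric (names are made up).
    List<Counter> pushed =
        Arrays.asList(
            new Counter("system", "element_count", 42L),
            new Counter("MetricsPusherTest", "counter", 10L));

    // pushed.get(0) would read the system metric here; a lookup by name does not.
    long value = find(pushed, "MetricsPusherTest", "counter").map(c -> c.value).orElse(-1L);
    System.out.println(value); // prints 10
  }
}
```

Whether the lookup belongs in the sink or in the test is a design choice this sketch leaves open.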
@@ -139,12 +139,9 @@ def portableValidatesRunnerTask(String name, Boolean streaming) {
includeCategories 'org.apache.beam.sdk.testing.ValidatesRunner'
excludeCategories 'org.apache.beam.sdk.testing.FlattenWithHeterogeneousCoders'
excludeCategories 'org.apache.beam.sdk.testing.LargeKeys$Above100MB'
excludeCategories 'org.apache.beam.sdk.testing.UsesAttemptedMetrics'
Not a blocker for this PR, but out of curiosity, do these tests pass when enabled in the Portable Spark Runner? It would be a good idea to enable them there too if so, or to report the errors so they can be fixed if not.
I'm not sure. I can create a PR to check this, that's a topic worth investigating.
It's here: #10198
Hmm, now that I think of it, you probably wanted to check the impact of my changes on the Spark runner, right?
@iemejia The tests fail when enabled on the Portable Spark runner. I'd have to investigate further to pinpoint the areas that fail.
@mxm How can I go about manually adding gauges? Does that mean changing the FlinkRunner to publish gauge metrics?
Run Java Spark PortableValidatesRunner Batch
Gauges are reported here: beam/runners/flink/src/main/java/org/apache/beam/runners/flink/metrics/FlinkMetricContainer.java Line 149 in 885ecbf
Also they are added to the accumulator here: beam/runners/flink/src/main/java/org/apache/beam/runners/flink/metrics/FlinkMetricContainer.java Line 106 in 885ecbf
I don't know why the tests are not passing, but we can also fix gauges in a follow-up.
Comment about enhancement of MetricsPusher
private Iterable<MetricResult<DistributionResult>> distributions;
private Iterable<MetricResult<GaugeResult>> gauges;

private PortableMetrics(
@mwalenia to answer your question, I'm indeed the correct person for MetricsPusher related questions.
Regarding MetricsPusher, the problem goes beyond the test itself. For now, the whole MetricsPusher feature reports only user metrics (which is why the test sink tailored for it reads only user metrics). But from the beginning of the architectural design (essentially pull vs. push), the aim was to support system metrics in the future. Here is the design I wrote at the time: https://s.apache.org/runner_independent_metrics_extraction.
Long story short, the right thing to do IMHO is to enhance MetricsPusher to support system metrics as well and, of course, to update the test and the sink.
@mxm you're right, it will be simpler to figure out gauges in another PR.
@Category({ValidatesRunner.class, UsesAttemptedMetrics.class, UsesCounterMetrics.class})
@Test
public void pushesSystemMetrics() throws InterruptedException {
@echauchot I added a test that checks whether system metrics are supported by MetricsPusher. It seems that they work :)
I also fixed TestMetricsSink to account for this.
Run Java PreCommit
e5c819f
to
7517573
Compare
@mxm It seems that there is no support for gauges in portability. I didn't find a proper MonitoringInfo type in metrics.proto.
Run Java PreCommit
@mxm the tests are green :) I think we need to take the gauge issue elsewhere, as it seems the gauges aren't portable at all.
Gauges are portable. The type is |
Could you squash the commits?
For the Portable Spark runner, the issue that tracks passing metrics from the SDK harness to Spark is https://issues.apache.org/jira/browse/BEAM-7219
7517573
to
4575e1c
Compare
@mxm the commits are squashed.
It seems this has broken the Flink runner (https://issues.apache.org/jira/browse/BEAM-8869). It is also failing in some of Google's internal testing. I am still investigating that but will try to summarize and reproduce it externally.
pipeline.run();
// give metrics pusher time to push
Thread.sleep(
    (pipeline.getOptions().as(MetricsOptions.class).getMetricsPushPeriod() + 1L) * 1000);
We should probably lower this interval and build in retry logic. Otherwise this is prone to breaking.
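One way to read the suggestion above is to replace the fixed sleep with a bounded poll. The following is only a sketch of that idea, not the change made in Beam: the `PollUntil` class, the `await` helper, and its parameters are hypothetical, and the condition passed in `main` is a placeholder for "the sink has received the expected metric".

```java
import java.util.function.BooleanSupplier;

public class PollUntil {
  /**
   * Polls the condition every pollMillis until it holds or timeoutMillis elapses.
   * Returns true if the condition held, false on timeout or interruption.
   */
  static boolean await(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        // Restore the interrupt flag and give up instead of swallowing the interrupt.
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }

  public static void main(String[] args) {
    long start = System.currentTimeMillis();
    // Placeholder condition: becomes true roughly 50 ms after start.
    boolean pushed = await(() -> System.currentTimeMillis() - start >= 50, 2000, 10);
    System.out.println(pushed);
  }
}
```

With a helper like this, a test returns as soon as the metric shows up and fails only after the full timeout, so the push period no longer has to be matched by an equally long sleep.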
This PR adds conversion of portable MonitoringInfos to MetricResults in Java's PortableRunner.
R: @lgajowy @mxm @angoenka @iemejia
Can you take a look, guys? Thanks!
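As a rough illustration of the conversion this description mentions, the sketch below sorts incoming metric records into counters, distributions, and gauges, mirroring the three result collections held by `PortableMetrics`. It is deliberately simplified and is not the PR's code: the `Info` class and the `Kind` enum are hypothetical stand-ins, whereas real MonitoringInfos identify their type through URN and type strings and carry encoded payloads and labels.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MonitoringInfoGrouping {
  /** Hypothetical metric kinds; the real protocol uses type strings instead. */
  enum Kind { COUNTER, DISTRIBUTION, GAUGE }

  /** Hypothetical, heavily simplified stand-in for a MonitoringInfo. */
  static final class Info {
    final Kind kind;
    final String name;
    final long value;

    Info(Kind kind, String name, long value) {
      this.kind = kind;
      this.name = name;
      this.value = value;
    }
  }

  /** Holds the three groups, similar in spirit to the fields of PortableMetrics. */
  static final class Grouped {
    final List<Info> counters = new ArrayList<>();
    final List<Info> distributions = new ArrayList<>();
    final List<Info> gauges = new ArrayList<>();
  }

  /** Routes each record to the list that matches its kind. */
  static Grouped group(List<Info> infos) {
    Grouped grouped = new Grouped();
    for (Info info : infos) {
      switch (info.kind) {
        case COUNTER:
          grouped.counters.add(info);
          break;
        case DISTRIBUTION:
          grouped.distributions.add(info);
          break;
        case GAUGE:
          grouped.gauges.add(info);
          break;
      }
    }
    return grouped;
  }

  public static void main(String[] args) {
    Grouped grouped =
        group(
            Arrays.asList(
                new Info(Kind.COUNTER, "elements", 5L),
                new Info(Kind.GAUGE, "watermark", 17L),
                new Info(Kind.COUNTER, "bytes", 128L)));
    System.out.println(grouped.counters.size()); // prints 2
    System.out.println(grouped.gauges.size()); // prints 1
  }
}
```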