Backlog metrics do not show up in FlinkRunner #25554
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
assign set of reviewers |
No reviewers could be found from any of the labels on the PR or in the fallback reviewers list. Check the config file to make sure reviewers are configured |
Run Java PreCommit |
Could we add a test to verify that the change brings the expected behavior?
The test could probably just verify that the metrics are available after running a simple pipeline with a source that updates the metrics.
There is an analogous test in |
@@ -391,6 +391,9 @@ public void snapshotState(FunctionSnapshotContext functionSnapshotContext) throw
return;
}

ReaderInvocationUtil<OutputT, UnboundedSource.UnboundedReader<OutputT>> readerInvoker =
    new ReaderInvocationUtil<>(stepName, serializedOptions.get(), metricContainer);
This can be moved to a field and reused between calls to run and snapshotState.
I wonder, why do we need to write metrics into the context on checkpoint? The current code does so in the call to |
Hey @je-ik
The current code does not emit backlog metrics on advance. As far as I can see, that happens in the checkpointing phase: https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaUnboundedReader.java#L255 Am I missing anything?
If I understand that correctly (and I'm definitely not sure about that :)), then you refer to the time when the metrics are computed in the source (KafkaSource in this case). But the wrapper should update the metric container on each call to |
Backlog is reported from getCheckpointMark(), which is called by another thread. I am not sure why it is done there, but this is the main issue. The advance function runs on the main thread, which has the metric context, so I am able to see the element count metrics. The checkpoint thread, however, does not have the context, which is why it cannot emit metrics. If I move the reportBacklog() call into advance(), I am able to see the backlog metrics too. We could do that in advance(), but that would be unnecessary overhead for every single record. :)
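To make the thread-context problem described above concrete, here is a minimal, self-contained sketch in plain Java. It does not use Beam or Flink classes: `Container`, `reportGauge`, and `reportScoped` are hypothetical stand-ins, and the only assumption carried over from the discussion is that the "current" metrics container is bound per thread. Under that assumption, an update made on a thread without a binding is silently dropped, while binding the container around the call makes the same update visible.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadScopedMetricsSketch {

  // Hypothetical stand-in for a per-step metrics container; not a Beam class.
  static final class Container {
    final Map<String, Long> gauges = new ConcurrentHashMap<>();
  }

  // The "current" container is thread-local: only the thread that set it can see it.
  private static final ThreadLocal<Container> CURRENT = new ThreadLocal<>();

  // Records a gauge if the calling thread has a container bound; returns false if the update is dropped.
  static boolean reportGauge(String name, long value) {
    Container container = CURRENT.get();
    if (container == null) {
      return false;
    }
    container.gauges.put(name, value);
    return true;
  }

  // Binds the container to the calling thread for the duration of one report, then restores the old binding.
  static boolean reportScoped(Container container, String name, long value) {
    Container previous = CURRENT.get();
    CURRENT.set(container);
    try {
      return reportGauge(name, value);
    } finally {
      if (previous == null) {
        CURRENT.remove();
      } else {
        CURRENT.set(previous);
      }
    }
  }

  // Returns {reported on main thread, reported unscoped on checkpoint thread, reported scoped on checkpoint thread}.
  static boolean[] demo(Container container) {
    // A separate single thread plays the role of the checkpoint thread.
    ExecutorService checkpointThread = Executors.newSingleThreadExecutor();
    try {
      boolean onMain = reportScoped(container, "elementCount", 1L);
      Callable<Boolean> unscoped = () -> reportGauge("backlog", 42L);
      Callable<Boolean> scoped = () -> reportScoped(container, "backlog", 42L);
      boolean unscopedResult = checkpointThread.submit(unscoped).get();
      boolean scopedResult = checkpointThread.submit(scoped).get();
      return new boolean[] {onMain, unscopedResult, scopedResult};
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      checkpointThread.shutdown();
    }
  }

  public static void main(String[] args) {
    Container container = new Container();
    boolean[] results = demo(container);
    System.out.println("main thread reported: " + results[0]);
    System.out.println("checkpoint thread, no context: " + results[1]);
    System.out.println("checkpoint thread, scoped context: " + results[2]);
  }
}
```

In this sketch the unscoped report from the second thread returns false and leaves the container untouched, which mirrors the symptom of element counts being visible while backlog is not. Whether scoping the container inside snapshotState is safe in the real wrapper is a separate question that the sketch does not answer.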
No, I think there is a good reason the backlog is computed on checkpoint only. :) |
retest this please |
@je-ik It looks like this change is not working, based on an initial experiment. Whenever I enable beammetrics, the pipeline stops processing. Do you have any suggestions? I believe there is a concurrency issue, but I still could not pin it down.
What you describe sounds like a deadlock. That was what I was trying to figure out, the metrics are generated in the checkpoint thread, but should be read in the thread that calls |
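One way to picture the split mentioned above (a value produced on the checkpoint thread, consumed on the processing thread) is a lock-free handoff. The sketch below is plain Java with hypothetical names (`BacklogHandoffSketch`, `onCheckpoint`, `publishIfPending`); it is not the wrapper's actual code and is not a claim about where the observed hang comes from. It only shows that the two threads can exchange the latest backlog value without sharing a lock.

```java
import java.util.concurrent.atomic.AtomicLong;

public class BacklogHandoffSketch {
  // Sentinel meaning "no new backlog value is waiting"; assumes real backlog values are non-negative.
  static final long UNSET = -1L;

  // Written by the checkpoint thread, drained by the processing thread.
  private final AtomicLong pendingBacklog = new AtomicLong(UNSET);

  // Only touched by the processing thread, so it needs no synchronization.
  private long publishedBacklog = UNSET;

  // Would be called from the checkpoint thread: store the latest value, overwriting any unread one.
  void onCheckpoint(long backlog) {
    pendingBacklog.set(backlog);
  }

  // Would be called from the processing thread: take the pending value, if any, and publish it.
  // Returns true only when a new value was published.
  boolean publishIfPending() {
    long value = pendingBacklog.getAndSet(UNSET);
    if (value == UNSET) {
      return false;
    }
    publishedBacklog = value;
    return true;
  }

  long published() {
    return publishedBacklog;
  }

  public static void main(String[] args) throws InterruptedException {
    BacklogHandoffSketch sketch = new BacklogHandoffSketch();
    Thread checkpointThread = new Thread(() -> sketch.onCheckpoint(7L));
    checkpointThread.start();
    checkpointThread.join();
    if (sketch.publishIfPending()) {
      System.out.println("published backlog: " + sketch.published());
    }
  }
}
```

The processing thread pays only one atomic read per poll, and it could poll on a timer or every N records rather than on every record, which would keep the per-record overhead concern raised earlier in mind.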
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions. |
This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
We have a Flink job which does not emit backlog metrics. The metrics are actually emitted in KafkaIO, but I could not see them in the Flink metric system. It looks like the Beam -> Flink wiring is broken. I set the metric context in the checkpointing phase, which is where the metrics are emitted in UnboundedReader.
@mxm @tweise @angoenka Could you please review my MR? I tested it in our environment, and it is working as expected.