New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-24864][metrics] Release TaskManagerJobMetricGroup with the last slot rather than task #17387
Conversation
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community Automated ChecksLast check on commit 3825d66 (Wed Sep 29 13:46:18 UTC 2021) Warnings:
Mention the bot in a comment to re-run the automated checks. Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commandsThe @flinkbot bot supports the following commands:
|
1736c54
to
d8834de
Compare
fc8a945
to
2db1ab7
Compare
Build failure unrelated (FLINK-24806). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say the change to the TMJMG lifecycle exceeds the scope of a hotfix by quite a bit.
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/groups/TaskManagerMetricGroup.java
Outdated
Show resolved
Hide resolved
...runtime/src/main/java/org/apache/flink/runtime/metrics/groups/TaskManagerJobMetricGroup.java
Outdated
Show resolved
Hide resolved
public void removeJobMetricsGroup(JobID jobId, TaskManagerJobMetricGroup group) { | ||
if (jobId == null || group == null || !group.isClosed()) { | ||
return; | ||
if (jobGroup == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you no longer checking whether the jobGroup
is closed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check is already done in close()
method itself (and it cannot be removed from there because of concurrency).
And IIUC, this method removeJobMetricsGroup
can not be called after close
because they are called from
TaskExecutor.releaseJobResources
and stopTaskExecutorServices
respectively.
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/groups/TaskManagerMetricGroup.java
Show resolved
Hide resolved
flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskExecutor.java
Outdated
Show resolved
Hide resolved
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/groups/TaskManagerMetricGroup.java
Outdated
Show resolved
Hide resolved
flink-runtime/src/test/java/org/apache/flink/runtime/taskexecutor/TaskExecutorTest.java
Show resolved
Hide resolved
Thanks for the feedback @zentol !
Would you prefer this change to be in a separate ticket and PR? |
.../flink-dstl-dfs/src/main/java/org/apache/flink/changelog/fs/ChangelogStorageMetricGroup.java
Outdated
Show resolved
Hide resolved
A separate ticket for sure; whether we need a separate PR is up to you (as you'll need to manage the discussions/commits, so pick whatever works best for you). |
I've created a separate ticket (FLINK-24864) and PR (#17757) for changelog-related changes. I've also rebased it onto latest master and addressed your feedback, PTAL. |
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/groups/TaskManagerMetricGroup.java
Show resolved
Hide resolved
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/groups/TaskManagerMetricGroup.java
Outdated
Show resolved
Hide resolved
...untime/src/test/java/org/apache/flink/runtime/metrics/groups/TaskManagerMetricGroupTest.java
Outdated
Show resolved
Hide resolved
...runtime/src/main/java/org/apache/flink/runtime/metrics/groups/TaskManagerJobMetricGroup.java
Outdated
Show resolved
Hide resolved
flink-runtime/src/test/java/org/apache/flink/runtime/taskexecutor/TaskExecutorTest.java
Outdated
Show resolved
Hide resolved
flink-runtime/src/test/java/org/apache/flink/runtime/taskexecutor/TaskExecutorTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
…t slot rather than task Motivation: TaskManagerJobMetricGroup might be used by components shared across tasks of a job (e.g. ChangelogStorage). The lifecycle of those is bound to slots rather than tasks (for various reasons including performance). Releasing them differently causes those metrics to be not reported. Besides that, this simplifies release code in TaskManagerJobMetricGroup and makes its lifecycle more consistent (creation and deletion in the same place).
What is the purpose of the change
Release
TaskManagerJobMetricGroup
on last job slot release to avoid issues when using it withStateChangelogStorage
, which is released on last slot release (please see ticket description for more details).Verifying this change
TaskExecutorTest.testReleasingJobResources
TaskManagerMetricGroupTest
Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: noDocumentation