-
Notifications
You must be signed in to change notification settings - Fork 9.2k
HDFS-16867. Exiting Mover due to an exception in MoverMetrics.create #5203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
💔 -1 overall
This message was automatically generated. |
|
@Jing9 Hello, could you please help review it |
|
💔 -1 overall
This message was automatically generated. |
Jing9
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for reporting the issue and providing the fix! Some comments regarding the detailed implementation for the fix.
| private final Path idPath; | ||
| private OutputStream out; | ||
| private final List<Path> targetPaths; | ||
| private final MoverMetrics moverMetrics; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NameNodeConnector will also be used by Balancer, while MoverMetrics is only used by Mover. So not sure if placing MoverMetrics directly in NameNodeConnector is a good way from the semantic perspective.
| BlockStoragePolicySuite.ID_BIT_LENGTH]; | ||
| this.excludedPinnedBlocks = excludedPinnedBlocks; | ||
| this.nnc = nnc; | ||
| this.metrics = MoverMetrics.create(this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the main issue if the potential naming conflict caused by multiple mover instances, can we track the existing MoverMetrics instances and their NNC mappings at the class level (i.e. through a class static field) to avoid the duplication?
| this.excludedPinnedBlocks = excludedPinnedBlocks; | ||
| this.nnc = nnc; | ||
| this.metrics = MoverMetrics.create(this); | ||
| this.metrics = nnc.getMoverMetrics(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need to add some UTs to reproduce the issue (without your fix) and validate the fix.
|
@Jing9 @Happy-shi hello, anyone follow up on this issue? I had the same problem with balancer. 2023-03-06 17:40:53,264 ERROR org.apache.hadoop.hdfs.server.balancer.Balancer: Exiting balancer due an exception
org.apache.hadoop.metrics2.MetricsException: Metrics source Balancer-BP-332003681-10.196.164.22-1648632173322 already exists!
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:225)
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:198)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
at org.apache.hadoop.hdfs.server.balancer.BalancerMetrics.create(BalancerMetrics.java:55)
at org.apache.hadoop.hdfs.server.balancer.Balancer.<init>(Balancer.java:344)
at org.apache.hadoop.hdfs.server.balancer.Balancer.doBalance(Balancer.java:809)
at org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:847)
at org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:952)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:1102) |
|
We're closing this stale PR because it has been open for 100 days with no activity. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
JIRA: HDFS-16867
Exiting Mover due to an exception in MoverMetrics.create