New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDDS-7602. Adding metrics to display Pipeline leader Id's. #4054
Conversation
@sadanand48 @symious @dombizita do take a look |
.../server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/PipelineLeaderInfoMetrics.java
Outdated
Show resolved
Hide resolved
.../server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/PipelineLeaderInfoMetrics.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ArafatKhan2198 for updating the PR. LGTM, +1.
...p-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/PipelineManagerImpl.java
Outdated
Show resolved
Hide resolved
UUID pipelineId = pipeline.getId().getId(); | ||
|
||
try { | ||
leaderNode = pipeline.getLeaderNode().getHostName(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that for users the datanode hostname would be more informative, but as I looked into it in the DatanodeDetails class only the UUID is unique, based on this code snippet. Maybe we should use that here or add both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ArafatKhan2198 , I think we should show Datanode UUID as well, since we might want to run hundreds of (mock) Ozone datanodes in the future for stress testing. Recon would need to be patched accordingly too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes its probably a good idea then, I have added it to the metrics!
@smengcl
cc @xBis7 can you please take a look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The format should be like PipelineInfo
. We shouldn't be adding "{
by hand, there must be a cleaner way to do this and have jmx take care of it.
"PipelineInfo" : [ {
"key" : "CLOSED",
"value" : 0
}, {
"key" : "ALLOCATED",
"value" : 1
}, {
"key" : "OPEN",
"value" : 3
}, {
"key" : "DORMANT",
"value" : 0
} ],
"PipelineLeaders" : [ {
"key" : "446dbffe-45d9-4c3d-b7f5-d8ad562a7dcb",
"value" : [ "{ HostName : ozone_datanode_2.ozone_default | Role : Leader }" ]
}, {
"key" : "17c663fc-e4c2-4244-840f-3c0fdc6ea7a0",
"value" : [ "{ HostName : ozone_datanode_2.ozone_default | Role : Leader }", "{ HostName : ozone_datanode_1.ozone_default | Role : Follower }", "{ HostName : ozone_datanode_3.ozone_default | Role : Follower }" ]
}, {
"key" : "9d1ec9e6-8d0a-45c0-b4bf-e1d50006ae30",
"value" : [ "{ HostName : ozone_datanode_1.ozone_default | Role : Leader }" ]
}, {
"key" : "09af3c56-2768-46f3-8316-d6fde2262f21",
"value" : [ "{ HostName : ozone_datanode_3.ozone_default | Role : Leader }" ]
} ]
},
{
"key" : "446dbffe-45d9-4c3d-b7f5-d8ad562a7dcb",
"value" : [ "{ HostName : ozone_datanode_2.ozone_default | Role : Leader }" ]
}
Should look like
{
"key" : "446dbffe-45d9-4c3d-b7f5-d8ad562a7dcb",
"value" : [
{
"HostName" : "ozone_datanode_2.ozone_default",
"Role" : "Leader"
}
]
}
...p-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/PipelineManagerImpl.java
Outdated
Show resolved
Hide resolved
...hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/PipelineManagerMXBean.java
Outdated
Show resolved
Hide resolved
...p-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/pipeline/MockPipelineManager.java
Outdated
Show resolved
Hide resolved
Thanks for the suggestion @xBis7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ArafatKhan2198 Thanks for making the changes. I left a few more comments inline.
Although, it looks much better, I find it hard to distinguish between the different values without looking at the code. That will also be the case for an average user.
In many cases hostname is another random sequence of letters and numbers which makes it hard to tell apart from the UUID. I left some suggestions to simplify the output and improve user readability.
This is what it looks like with the suggested changes.
{
"key" : "37b62661-4f40-45b8-9955-03217a139322",
"value" : [ {
"key" : "Role",
"value" : "Leader"
}, {
"key" : "UUID",
"value" : "a8ddb277-5bbc-4e51-967d-ef041b2c196e"
}, {
"key" : "HostName",
"value" : "ozone_datanode_1.ozone_default"
} ]
},
@@ -36,4 +36,7 @@ public interface PipelineManagerMXBean { | |||
*/ | |||
Map<String, Integer> getPipelineInfo() throws NotLeaderException; | |||
|
|||
Map<String, Map<String, String[]>> getPipelineLeaders() | |||
throws NotLeaderException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throws NotLeaderException
is redundant.
@@ -697,6 +699,13 @@ public Map<String, Integer> getPipelineInfo() throws NotLeaderException { | |||
return pipelineInfo; | |||
} | |||
|
|||
@Override | |||
public Map<String, Map<String, String[]>> getPipelineLeaders() | |||
throws NotLeaderException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throws NotLeaderException
is redundant.
* @param pipelines input pipeline | ||
* @return map containing pipeline details | ||
*/ | ||
public static Map<String, Map<String, String[]>> pipelineLeaderFormat( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could simplify it and add the hostname
or role
or uuid
as keys to improve user readability.
public static Map<String, Map<String, String[]>> pipelineLeaderFormat( | |
public static Map<String, Map<String, String>> pipelineLeaderFormat( |
*/ | ||
public static Map<String, Map<String, String[]>> pipelineLeaderFormat( | ||
List<Pipeline> pipelines) { | ||
final Map<String, Map<String, String[]>> pipelineInfo = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also change initialization to new ConcurrentHashMap<>()
to improve thread-safety.
final Map<String, Map<String, String[]>> pipelineInfo = new HashMap<>(); | |
final Map<String, Map<String, String>> pipelineInfo = new ConcurrentHashMap<>(); |
} | ||
|
||
int numOfNodes = dataNodes.size(); | ||
Map<String, String[]> nodeInfo = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Map<String, String[]> nodeInfo = new HashMap<>(); | |
Map<String, String> nodeInfo = new ConcurrentHashMap<>(); |
node.getHostName().equals(leaderNode) ? "Leader" : "Follower"; | ||
String dataNodeUUID = node.getUuidString(); | ||
String hostName = node.getHostName(); | ||
info[0] = role; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
info[0] = role; | |
nodeInfo.put("Role", role); |
String dataNodeUUID = node.getUuidString(); | ||
String hostName = node.getHostName(); | ||
info[0] = role; | ||
info[1] = dataNodeUUID; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
info[1] = dataNodeUUID; | |
nodeInfo.put("UUID", dataNodeUUID); | |
nodeInfo.put("HostName", hostName); |
@@ -36,4 +36,7 @@ public interface PipelineManagerMXBean { | |||
*/ | |||
Map<String, Integer> getPipelineInfo() throws NotLeaderException; | |||
|
|||
Map<String, Map<String, String[]>> getPipelineLeaders() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Map<String, Map<String, String[]>> getPipelineLeaders() | |
Map<String, Map<String, String>> getPipelineLeaders() |
@@ -697,6 +699,13 @@ public Map<String, Integer> getPipelineInfo() throws NotLeaderException { | |||
return pipelineInfo; | |||
} | |||
|
|||
@Override | |||
public Map<String, Map<String, String[]>> getPipelineLeaders() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public Map<String, Map<String, String[]>> getPipelineLeaders() | |
public Map<String, Map<String, String>> getPipelineLeaders() |
@@ -296,6 +296,11 @@ public Map<String, Integer> getPipelineInfo() { | |||
return null; | |||
} | |||
|
|||
@Override | |||
public Map<String, Map<String, String[]>> getPipelineLeaders() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public Map<String, Map<String, String[]>> getPipelineLeaders() { | |
public Map<String, Map<String, String>> getPipelineLeaders() { |
Do we really need another format of pipeline information? Can same Pipeline object be used and information can be used at caller to display things required? |
None of the metrics across ozone expose the entire objects via the JMX, and also all information given about by a single pipeline object will be cluttered and would definitely require some format refinement, and moreover jmx does not support exposing entire objects. I tried to return a single pipeline object via a getDummyData() the jmx could not register it, and threw this error !
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ArafatKhan2198 I proposed these changes to avoid having another nested Map.
Right now you have this structure
key:
value:
key:
value:
key:
value:
IMHO, it's too complicated and difficult for a user to read. Also the Datanode counting Datanode-cnt
is random and misleading.
@smengcl @sadanand48 @dombizita Any thoughts on this?
Hi, @xBis7. Thanks for the review! As far as I know, the only way to display the metrics in the format that you had suggested was to use a nested map. I tried my best to see if we could display the information without using one. But I could not find any. The only other way to display such information with little complexity is to represent it as a string. Like I had done earlier.
The reason I chose it to be represented as a string earlier, is because we have done something similar for OM-HA #3520 and SCM-HA #2260, where we displayed the various roles of OM and SCM as strings, which was pretty simple for the user to understand and get information out of it.
But since we wanted to get the information in key-value pairs, I could not see any other options other than using a nested map. |
@ArafatKhan2198 Sorry, I should have explained it better. I meant removing the third nested Map.
Right now you have
If we scale to more than 3 datanodes, it gets really messy. |
@ArafatKhan2198 If you also add UUID to this, how is it going to look?
|
@ArafatKhan2198 I didn't know about that. I looked around and thought it's best to keep everything as key-value pairs to be more consistent and make it look like |
So at the moment, a pipeline can have either 1 or 3 datanodes hence the reason why we have datanode-0, datanode-1 & datanode-2 therefore they are not generated randomly.
So if we remove the third map, we would probably lose out the information we needed to display in the first place. |
I absolutely agree with you :) |
@ArafatKhan2198 I know that the number is not random and it's referring to the datanode's position on the map but from a user's perspective it's unrelated to the rest of the info.
I'm saying to move the Anyway, I'm fine with either using a String or key-value pairs. |
@sadanand48 @ArafatKhan2198
|
HDDS-7602 Adding metrics to display Pipeline leader Id's will no longer be pursued due to the following reasons mentioned by Sumit.
For example if you have a 50 node cluster than the number of ratis pipelines would be 50C3 = 19,600 pipelines which is too much for the JMX to handle. Hence @nandakumar131 has agreed to close HDDS-7602 and HDDS-899 |
What changes were proposed in this pull request?
Adding metrics for displaying Leader DataNode for all the pipelines.
Was created to help with the following Jira :- https://issues.apache.org/jira/browse/HDDS-899
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-7602
How was this patch tested?
Metrics are getting displayed, the Leader node will get updated in the metrics upon new election