Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-7602. Adding metrics to display Pipeline leader Id's. #4054

Closed
wants to merge 13 commits into from

Conversation

ArafatKhan2198
Copy link
Contributor

@ArafatKhan2198 ArafatKhan2198 commented Dec 7, 2022

What changes were proposed in this pull request?

Adding metrics for displaying Leader DataNode for all the pipelines.
Was created to help with the following Jira :- https://issues.apache.org/jira/browse/HDDS-899

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-7602

How was this patch tested?

Metrics are getting displayed, the Leader node will get updated in the metrics upon new election

image

@ArafatKhan2198 ArafatKhan2198 marked this pull request as ready for review December 7, 2022 14:44
@ArafatKhan2198
Copy link
Contributor Author

@sadanand48 @symious @dombizita do take a look

@ArafatKhan2198 ArafatKhan2198 requested review from sadanand48 and removed request for sadanand48 December 8, 2022 15:15
Copy link
Contributor

@sadanand48 sadanand48 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ArafatKhan2198 for updating the PR. LGTM, +1.

UUID pipelineId = pipeline.getId().getId();

try {
leaderNode = pipeline.getLeaderNode().getHostName();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that for users the datanode hostname would be more informative, but as I looked into it in the DatanodeDetails class only the UUID is unique, based on this code snippet. Maybe we should use that here or add both?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed a similar approach to how recon display's the Leader DataNode of the pipeline, by displaying the HostName, so I believe HostName should be alright

image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ArafatKhan2198 , I think we should show Datanode UUID as well, since we might want to run hundreds of (mock) Ozone datanodes in the future for stress testing. Recon would need to be patched accordingly too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes its probably a good idea then, I have added it to the metrics!
@smengcl

@kerneltime
Copy link
Contributor

cc @xBis7 can you please take a look.

@ArafatKhan2198 ArafatKhan2198 requested review from sadanand48 and dombizita and removed request for sadanand48 and dombizita December 12, 2022 18:59
@ArafatKhan2198 ArafatKhan2198 requested review from dombizita and sadanand48 and removed request for sadanand48 and dombizita December 12, 2022 20:45
Copy link
Contributor

@xBis7 xBis7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The format should be like PipelineInfo. We shouldn't be adding "{ by hand, there must be a cleaner way to do this and have jmx take care of it.

    "PipelineInfo" : [ {
      "key" : "CLOSED",
      "value" : 0
    }, {
      "key" : "ALLOCATED",
      "value" : 1
    }, {
      "key" : "OPEN",
      "value" : 3
    }, {
      "key" : "DORMANT",
      "value" : 0
    } ],
    "PipelineLeaders" : [ {
      "key" : "446dbffe-45d9-4c3d-b7f5-d8ad562a7dcb",
      "value" : [ "{ HostName : ozone_datanode_2.ozone_default | Role : Leader }" ]
    }, {
      "key" : "17c663fc-e4c2-4244-840f-3c0fdc6ea7a0",
      "value" : [ "{ HostName : ozone_datanode_2.ozone_default | Role : Leader }", "{ HostName : ozone_datanode_1.ozone_default | Role : Follower }", "{ HostName : ozone_datanode_3.ozone_default | Role : Follower }" ]
    }, {
      "key" : "9d1ec9e6-8d0a-45c0-b4bf-e1d50006ae30",
      "value" : [ "{ HostName : ozone_datanode_1.ozone_default | Role : Leader }" ]
    }, {
      "key" : "09af3c56-2768-46f3-8316-d6fde2262f21",
      "value" : [ "{ HostName : ozone_datanode_3.ozone_default | Role : Leader }" ]
    } ]
  },
  {
      "key" : "446dbffe-45d9-4c3d-b7f5-d8ad562a7dcb",
      "value" : [ "{ HostName : ozone_datanode_2.ozone_default | Role : Leader }" ]
  }

Should look like

   {
      "key" : "446dbffe-45d9-4c3d-b7f5-d8ad562a7dcb",
      "value" : [
         {
            "HostName" : "ozone_datanode_2.ozone_default",
            "Role" : "Leader"
         }
      ]
   }

@ArafatKhan2198
Copy link
Contributor Author

The format should be like PipelineInfo. We shouldn't be adding "{ by hand, there must be a cleaner way to do this and have jmx take care of it.

    "PipelineInfo" : [ {
      "key" : "CLOSED",
      "value" : 0
    }, {
      "key" : "ALLOCATED",
      "value" : 1
    }, {
      "key" : "OPEN",
      "value" : 3
    }, {
      "key" : "DORMANT",
      "value" : 0
    } ],
    "PipelineLeaders" : [ {
      "key" : "446dbffe-45d9-4c3d-b7f5-d8ad562a7dcb",
      "value" : [ "{ HostName : ozone_datanode_2.ozone_default | Role : Leader }" ]
    }, {
      "key" : "17c663fc-e4c2-4244-840f-3c0fdc6ea7a0",
      "value" : [ "{ HostName : ozone_datanode_2.ozone_default | Role : Leader }", "{ HostName : ozone_datanode_1.ozone_default | Role : Follower }", "{ HostName : ozone_datanode_3.ozone_default | Role : Follower }" ]
    }, {
      "key" : "9d1ec9e6-8d0a-45c0-b4bf-e1d50006ae30",
      "value" : [ "{ HostName : ozone_datanode_1.ozone_default | Role : Leader }" ]
    }, {
      "key" : "09af3c56-2768-46f3-8316-d6fde2262f21",
      "value" : [ "{ HostName : ozone_datanode_3.ozone_default | Role : Leader }" ]
    } ]
  },
  {
      "key" : "446dbffe-45d9-4c3d-b7f5-d8ad562a7dcb",
      "value" : [ "{ HostName : ozone_datanode_2.ozone_default | Role : Leader }" ]
  }

Should look like

   {
      "key" : "446dbffe-45d9-4c3d-b7f5-d8ad562a7dcb",
      "value" : [
         {
            "HostName" : "ozone_datanode_2.ozone_default",
            "Role" : "Leader"
         }
      ]
   }

Thanks for the suggestion @xBis7
I have made the required format changes !!

@ArafatKhan2198 ArafatKhan2198 requested review from xBis7 and smengcl and removed request for sadanand48, xBis7 and smengcl December 20, 2022 11:19
Copy link
Contributor

@xBis7 xBis7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ArafatKhan2198 Thanks for making the changes. I left a few more comments inline.

Although, it looks much better, I find it hard to distinguish between the different values without looking at the code. That will also be the case for an average user.

In many cases hostname is another random sequence of letters and numbers which makes it hard to tell apart from the UUID. I left some suggestions to simplify the output and improve user readability.

This is what it looks like with the suggested changes.

{
      "key" : "37b62661-4f40-45b8-9955-03217a139322",
      "value" : [ {
        "key" : "Role",
        "value" : "Leader"
      }, {
        "key" : "UUID",
        "value" : "a8ddb277-5bbc-4e51-967d-ef041b2c196e"
      }, {
        "key" : "HostName",
        "value" : "ozone_datanode_1.ozone_default"
      } ]
    },

@@ -36,4 +36,7 @@ public interface PipelineManagerMXBean {
*/
Map<String, Integer> getPipelineInfo() throws NotLeaderException;

Map<String, Map<String, String[]>> getPipelineLeaders()
throws NotLeaderException;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throws NotLeaderException is redundant.

@@ -697,6 +699,13 @@ public Map<String, Integer> getPipelineInfo() throws NotLeaderException {
return pipelineInfo;
}

@Override
public Map<String, Map<String, String[]>> getPipelineLeaders()
throws NotLeaderException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throws NotLeaderException is redundant.

* @param pipelines input pipeline
* @return map containing pipeline details
*/
public static Map<String, Map<String, String[]>> pipelineLeaderFormat(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could simplify it and add the hostname or role or uuid as keys to improve user readability.

Suggested change
public static Map<String, Map<String, String[]>> pipelineLeaderFormat(
public static Map<String, Map<String, String>> pipelineLeaderFormat(

*/
public static Map<String, Map<String, String[]>> pipelineLeaderFormat(
List<Pipeline> pipelines) {
final Map<String, Map<String, String[]>> pipelineInfo = new HashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also change initialization to new ConcurrentHashMap<>() to improve thread-safety.

Suggested change
final Map<String, Map<String, String[]>> pipelineInfo = new HashMap<>();
final Map<String, Map<String, String>> pipelineInfo = new ConcurrentHashMap<>();

}

int numOfNodes = dataNodes.size();
Map<String, String[]> nodeInfo = new HashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Map<String, String[]> nodeInfo = new HashMap<>();
Map<String, String> nodeInfo = new ConcurrentHashMap<>();

node.getHostName().equals(leaderNode) ? "Leader" : "Follower";
String dataNodeUUID = node.getUuidString();
String hostName = node.getHostName();
info[0] = role;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
info[0] = role;
nodeInfo.put("Role", role);

String dataNodeUUID = node.getUuidString();
String hostName = node.getHostName();
info[0] = role;
info[1] = dataNodeUUID;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
info[1] = dataNodeUUID;
nodeInfo.put("UUID", dataNodeUUID);
nodeInfo.put("HostName", hostName);

@@ -36,4 +36,7 @@ public interface PipelineManagerMXBean {
*/
Map<String, Integer> getPipelineInfo() throws NotLeaderException;

Map<String, Map<String, String[]>> getPipelineLeaders()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Map<String, Map<String, String[]>> getPipelineLeaders()
Map<String, Map<String, String>> getPipelineLeaders()

@@ -697,6 +699,13 @@ public Map<String, Integer> getPipelineInfo() throws NotLeaderException {
return pipelineInfo;
}

@Override
public Map<String, Map<String, String[]>> getPipelineLeaders()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public Map<String, Map<String, String[]>> getPipelineLeaders()
public Map<String, Map<String, String>> getPipelineLeaders()

@@ -296,6 +296,11 @@ public Map<String, Integer> getPipelineInfo() {
return null;
}

@Override
public Map<String, Map<String, String[]>> getPipelineLeaders() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public Map<String, Map<String, String[]>> getPipelineLeaders() {
public Map<String, Map<String, String>> getPipelineLeaders() {

@sumitagrawl
Copy link
Contributor

Do we really need another format of pipeline information? Can same Pipeline object be used and information can be used at caller to display things required?
All information is present in Pipeline, but where creating another output view in pipelineFormat.

@ArafatKhan2198
Copy link
Contributor Author

Do we really need another format of pipeline information? Can same Pipeline object be used and information can be used at caller to display things required? All information is present in Pipeline, but where creating another output view in pipelineFormat.

None of the metrics across ozone expose the entire objects via the JMX, and also all information given about by a single pipeline object will be cluttered and would definitely require some format refinement, and moreover jmx does not support exposing entire objects. I tried to return a single pipeline object via a getDummyData() the jmx could not register it, and threw this error !

org.apache.hadoop.hdds.scm.pipeline.PipelineManagerMXBean: Method org.apache.hadoop.hdds.scm.pipeline.PipelineManagerMXBean.getDummyData has parameter or return type that cannot be translated into an open type

Copy link
Contributor

@xBis7 xBis7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ArafatKhan2198 I proposed these changes to avoid having another nested Map.

Right now you have this structure

    key: 
    value: 
        key:
        value:
            key:
            value:

IMHO, it's too complicated and difficult for a user to read. Also the Datanode counting Datanode-cnt is random and misleading.

@smengcl @sadanand48 @dombizita Any thoughts on this?

@ArafatKhan2198
Copy link
Contributor Author

Hi, @xBis7. Thanks for the review! As far as I know, the only way to display the metrics in the format that you had suggested was to use a nested map. I tried my best to see if we could display the information without using one. But I could not find any. The only other way to display such information with little complexity is to represent it as a string. Like I had done earlier.

  {
      "key" : "446dbffe-45d9-4c3d-b7f5-d8ad562a7dcb",
      "value" : [ "{ HostName : ozone_datanode_2.ozone_default | Role : Leader }" ]
  }

The reason I chose it to be represented as a string earlier, is because we have done something similar for OM-HA #3520 and SCM-HA #2260, where we displayed the various roles of OM and SCM as strings, which was pretty simple for the user to understand and get information out of it.

  {
      "RatisRoles": " { HostName: om1 | Node-Id: om1 | Ratis-Port : 9872 | Role: FOLLOWER} { HostName: om2 | Node-Id: om2 | Ratis-Port : 9872 | Role: LEADER} { HostName: om3 | Node-Id: om3 | Ratis-Port : 9872 | Role: FOLLOWER} "
  }
  {
      ScmRatisRoles": "{ HostName : scm1, Ratis Port : 9894, Role : LEADER } { HostName : scm3, Ratis Port : 9894, Role : FOLLOWER } { HostName : scm2, Ratis Port : 9894, Role : FOLLOWER } "
  }

But since we wanted to get the information in key-value pairs, I could not see any other options other than using a nested map.

@xBis7
Copy link
Contributor

xBis7 commented Dec 21, 2022

@ArafatKhan2198 Sorry, I should have explained it better. I meant removing the third nested Map.

   key: pipeline id
   value: {
        key: 
        value: 
        }

Right now you have

   key: pipeline id
   value: {
        key: Datanode-0
        value: {
                key:
                value:
            } 
        }

If we scale to more than 3 datanodes, it gets really messy.

@xBis7
Copy link
Contributor

xBis7 commented Dec 21, 2022

@ArafatKhan2198 If you also add UUID to this, how is it going to look?

{
      "key" : "446dbffe-45d9-4c3d-b7f5-d8ad562a7dcb",
      "value" : [ "{ HostName : ozone_datanode_2.ozone_default | Role : Leader }" ]
}

@xBis7
Copy link
Contributor

xBis7 commented Dec 21, 2022

The reason I chose it to be represented as a string earlier, is because we have done something similar for OM-HA #3520 and SCM-HA #2260, where we displayed the various roles of OM and SCM as strings, which was pretty simple for the user to understand and get information out of it.

@ArafatKhan2198 I didn't know about that. I looked around and thought it's best to keep everything as key-value pairs to be more consistent and make it look like PipelineInfo. I don't know what the rest think about it. Based on all that, I'm fine with using a String.

@ArafatKhan2198
Copy link
Contributor Author

ArafatKhan2198 commented Dec 21, 2022

So at the moment, a pipeline can have either 1 or 3 datanodes hence the reason why we have datanode-0, datanode-1 & datanode-2 therefore they are not generated randomly.

First Map -> Pipeline-Id as the Key & a Map of all DataNodes part of the pipeline as Value 
Second Map -> Datanode number as the key & a Map of all the information about the specific datanode as Value
Third Map -> Role, UUID & Hostname as key & their respective metric as Value  

   key: pipeline id
   value: {
        key: Datanode-0
        value: {
                key:
                value:
            } 
        }

So if we remove the third map, we would probably lose out the information we needed to display in the first place.

@ArafatKhan2198
Copy link
Contributor Author

The reason I chose it to be represented as a string earlier, is because we have done something similar for OM-HA #3520 and SCM-HA #2260, where we displayed the various roles of OM and SCM as strings, which was pretty simple for the user to understand and get information out of it.

@ArafatKhan2198 I didn't know about that. I looked around and thought it's best to keep everything as key-value pairs to be more consistent and make it look like PipelineInfo. I don't know what the rest think about it. Based on all that, I'm fine with using a String.

I absolutely agree with you :)
I think we should wait for what others think about it !!
@sadanand48 @smengcl @dombizita

@xBis7
Copy link
Contributor

xBis7 commented Dec 21, 2022

therefore they are not generated randomly.

@ArafatKhan2198 I know that the number is not random and it's referring to the datanode's position on the map but from a user's perspective it's unrelated to the rest of the info.

So if we remove the third map, we would probably lose out the information we needed to display in the first place.

I'm saying to move the Role, UUID & Hostname in the outer Map.

Anyway, I'm fine with either using a String or key-value pairs.

@sumitagrawl
Copy link
Contributor

@sadanand48 @ArafatKhan2198
I think this is not required as,

  1. Recon already have this interface as separate api, this is not required at SCM page also, user can refer recon itself.
  2. Amount of information is very big for big cluster, and jmx do not support pagination, it will be difficult to scale
  3. serviceip:port/jmx itself gives a lot of information, I think should avoid over-populate this

@ArafatKhan2198
Copy link
Contributor Author

HDDS-7602 Adding metrics to display Pipeline leader Id's will no longer be pursued due to the following reasons mentioned by Sumit.

  1. Recon already have this interface as separate api, this is not required at SCM page also, user can refer recon itself.
  2. Amount of information is very big for big cluster, and jmx do not support pagination, it will be difficult to scale
  3. serviceip:port/jmx itself gives a lot of information, I think we should avoid over-populating it

For example if you have a 50 node cluster than the number of ratis pipelines would be 50C3 = 19,600 pipelines which is too much for the JMX to handle. Hence @nandakumar131 has agreed to close HDDS-7602 and HDDS-899

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants