HDDS-3867. Extend the chunkinfo tool to display information from all nodes in the pipeline. #1154

Merged: 11 commits merged into apache:master on Aug 31, 2020

Conversation

@sadanand48 (Contributor) commented Jul 1, 2020:

What changes were proposed in this pull request?

Currently the chunk-info tool (HDDS-3134) inside ozone debug displays information (the chunk/block files that constitute a key) only from the first node of the pipeline. This change extends it to the replicas, so the tool displays all the chunks/blocks in all the replicas.
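
To make the mechanism concrete, here is a minimal sketch of the per-replica iteration, assuming real Ozone types (OmKeyLocationInfo, DatanodeDetails) but a hypothetical ReplicaReader callback standing in for the actual per-datanode read in the patch:

import java.util.List;

import org.apache.hadoop.hdds.protocol.DatanodeDetails;
import org.apache.hadoop.ozone.om.helpers.OmKeyLocationInfo;

// Hypothetical callback standing in for the per-datanode read in the patch.
interface ReplicaReader {
  String readChunkInfo(DatanodeDetails dn, OmKeyLocationInfo location);
}

class AllReplicaChunkInfo {
  // Before this change only the first node of the pipeline was consulted;
  // here every node in each block's pipeline is queried.
  static void printChunkInfo(List<OmKeyLocationInfo> locations,
      ReplicaReader reader) {
    for (OmKeyLocationInfo location : locations) {
      for (DatanodeDetails dn : location.getPipeline().getNodes()) {
        System.out.println(dn.getHostName() + " (" + dn.getIpAddress()
            + "): " + reader.readChunkInfo(dn, location));
      }
    }
  }
}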

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-3867

How was this patch tested?

Tested manually on Docker.

Tool output:

{
  "KeyLocations": [
    [
      {
        "Datanode-HostName": "ozone_datanode_2.ozone_default",
        "Datanode-IP": "172.25.0.5",
        "Container-ID": 1,
        "Block-ID": 104477715974127616,
        "Locations": {
          "files": [
            "/data/hdds/hdds/5d3f3f5d-90a5-49ee-a6ee-8c1c79d045ea/current/containerDir0/1/chunks/104477715974127616.block"
          ],
          "pipelineID": "9e626531-3538-4930-bd5f-aa90e6fe9d74"
        }
      },
      {
        "Datanode-HostName": "ozone_datanode_3.ozone_default",
        "Datanode-IP": "172.25.0.8",
        "Container-ID": 1,
        "Block-ID": 104477715974127616,
        "Locations": {
          "files": [
            "/data/hdds/hdds/5d3f3f5d-90a5-49ee-a6ee-8c1c79d045ea/current/containerDir0/1/chunks/104477715974127616.block"
          ],
          "pipelineID": "9e626531-3538-4930-bd5f-aa90e6fe9d74"
        }
      },
      {
        "Datanode-HostName": "ozone_datanode_1.ozone_default",
        "Datanode-IP": "172.25.0.6",
        "Container-ID": 1,
        "Block-ID": 104477715974127616,
        "Locations": {
          "files": [
            "/data/hdds/hdds/5d3f3f5d-90a5-49ee-a6ee-8c1c79d045ea/current/containerDir0/1/chunks/104477715974127616.block"
          ],
          "pipelineID": "9e626531-3538-4930-bd5f-aa90e6fe9d74"
        }
      }
    ],
    [
      {
        "Datanode-HostName": "ozone_datanode_2.ozone_default",
        "Datanode-IP": "172.25.0.5",
        "Container-ID": 1,
        "Block-ID": 104477715974586369,
        "Locations": {
          "files": [
            "/data/hdds/hdds/5d3f3f5d-90a5-49ee-a6ee-8c1c79d045ea/current/containerDir0/1/chunks/104477715974586369.block"
          ],
          "pipelineID": "9e626531-3538-4930-bd5f-aa90e6fe9d74"
        }
      },
      {
        "Datanode-HostName": "ozone_datanode_3.ozone_default",
        "Datanode-IP": "172.25.0.8",
        "Container-ID": 1,
        "Block-ID": 104477715974586369,
        "Locations": {
          "files": [
            "/data/hdds/hdds/5d3f3f5d-90a5-49ee-a6ee-8c1c79d045ea/current/containerDir0/1/chunks/104477715974586369.block"
          ],
          "pipelineID": "9e626531-3538-4930-bd5f-aa90e6fe9d74"
        }
      },
      {
        "Datanode-HostName": "ozone_datanode_1.ozone_default",
        "Datanode-IP": "172.25.0.6",
        "Container-ID": 1,
        "Block-ID": 104477715974586369,
        "Locations": {
          "files": [
            "/data/hdds/hdds/5d3f3f5d-90a5-49ee-a6ee-8c1c79d045ea/current/containerDir0/1/chunks/104477715974586369.block"
          ],
          "pipelineID": "9e626531-3538-4930-bd5f-aa90e6fe9d74"
        }
      }
    ]
  ]
}

@elek (Member) left a comment:

Thanks for creating this patch, @sadanand48.

I have one design concern related to this change. I am not sure what the good answer is, but let me share my view:

This patch modifies the client interface to provide more functionality for a debug tool. I like the extra functionality, but I think we shouldn't add any debug-specific feature to the main client API. The client API should be backward compatible between releases, and it might have multiple implementations.

I would prefer to separate the client API from the admin API (used by this tool).

As an example: let's say I would like to implement a new client for Erasure Coding. If the client interface contains debug-specific methods, I need to implement all of them, which makes introducing any new feature harder.

I am not sure how it is possible to keep this separation (I need your feedback). One possible solution is to move the logic of looping over the datanodes to the ChunkKeyHandler. But it's not clear how that can work with a RATIS pipeline. Or do you use a STANDALONE client to read the data which is persisted with RATIS?
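
For illustration, the separation described above might look roughly like the following; the interface names and signatures are hypothetical, not taken from the Ozone codebase:

// Stable, backward-compatible surface that every client implementation
// (including a future Erasure Coding client) would have to provide.
interface ClientApi {
  byte[] readBlock(long containerId, long localId);
}

// Debug/admin-only surface, kept out of the client API so that it can
// evolve freely; only the debug tooling would depend on it.
interface DebugApi {
  // Query every datanode in the pipeline, not just the first one.
  java.util.Map<String, String> readBlockFromAllNodes(long containerId,
      long localId);
}

With such a split, a new client implementation only needs ClientApi, while the chunkinfo tool is handed an object that also implements DebugApi.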

@bshashikant (Contributor) commented Jul 30, 2020:

> One possible solution is to move the logic of looping over the datanodes to the ChunkKeyHandler. But it's not clear how that can work with a RATIS pipeline. Or do you use a STANDALONE client to read the data which is persisted with RATIS?

I think we should remove the XceiverClient interface altogether. The read-path and write-path client functionality seem to diverge a lot. Adding the new functionality in XceiverClientGrpc will be just fine then.

There is already a JIRA filed for the same: https://issues.apache.org/jira/browse/HDDS-998

@bshashikant (Contributor) commented:

@elek, I feel we can go ahead with this change and then try to remove the XceiverClient interface altogether in a separate JIRA. Your thoughts?

@elek (Member) commented Aug 11, 2020:

> @elek, I feel we can go ahead with this change and then try to remove the XceiverClient interface altogether in a separate JIRA. Your thoughts?

We need some kind of interface if we would like to support different types of write path. This interface might not be as low-level as today's, but something is required (maybe just an OutputStream factory?).

But that doesn't answer my original concern about mixing client-specific and admin-specific methods in the same interface. Even with a dedicated interface, if it's part of the client API we need to keep it backward compatible forever.

I am not fully against it, but I think we need a strong argument to extend the client interface. For example, why don't we use a CLI tool on the datanode which can read the db and block data directly? In that case we wouldn't need to add complexity to the RPC interface at all.
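
For illustration, the "OutputStream factory" idea mentioned above could be as small as the following sketch; the names are hypothetical:

import java.io.IOException;
import java.io.OutputStream;

// Hypothetical factory: the client API only exposes a way to obtain a
// stream for a block, hiding the low-level XceiverClient-style details of
// whichever write path (RATIS, STANDALONE, a future EC path) is in use.
interface BlockOutputStreamFactory {
  OutputStream newBlockStream(long containerId, long localId)
      throws IOException;
}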

@bshashikant (Contributor) commented:

Let's define a new admin interface and add the method there.

The idea of the tool is to read the info and do some sort of validation on each replica of the block, and it should be possible to run it from any node.

@elek (Member) commented Aug 25, 2020:

> it should be possible to run it from any node.

Do you have any specific reason for this requirement? Admins can easily log in to the datanodes and execute the validation. (Not against it, just curious...)

@bshashikant (Contributor) commented Aug 26, 2020:

> Do you have any specific reason for this requirement? Admins can easily log in to the datanodes and execute the validation.

The command needs to connect to OM first to fetch the block info, and then to the datanodes to read the chunk info, in order to operate successfully.

Secondly, similar to HDFS, admin commands can be run from any node. The idea is to mimic the same here.
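
A rough sketch of this two-step flow, assuming the public OzoneClient API and placeholder volume/bucket/key names; the datanode read in step 2 is elided:

import org.apache.hadoop.hdds.conf.OzoneConfiguration;
import org.apache.hadoop.ozone.client.OzoneClient;
import org.apache.hadoop.ozone.client.OzoneClientFactory;
import org.apache.hadoop.ozone.client.OzoneKeyDetails;

public class ChunkInfoFlow {
  public static void main(String[] args) throws Exception {
    try (OzoneClient client =
        OzoneClientFactory.getRpcClient(new OzoneConfiguration())) {
      // Step 1: ask OM for the key's block locations.
      OzoneKeyDetails key = client.getObjectStore()
          .getVolume("vol1").getBucket("bucket1").getKey("key1");
      key.getOzoneKeyLocations().forEach(loc ->
          // Step 2 (elided): contact each datanode in the block's pipeline
          // and read the chunk info for this container/block.
          System.out.println("container=" + loc.getContainerID()
              + " localId=" + loc.getLocalID()));
    }
  }
}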

@elek (Member) commented Aug 27, 2020:

Thanks for explaining it, @bshashikant. Let me ask differently:

> Secondly, similar to HDFS, admin commands can be run from any node. The idea is to mimic the same here.

Well, if there is no strong argument, we can follow different patterns IMHO.

> The command needs to connect to OM first to fetch the block info, and then to the datanodes to read the chunk info, in order to operate successfully.

This seems to be a strong argument (thanks for sharing). We could do an offline tool, but it would require a different model (knowing the address of the datanode and the block ID). If we already use OM, I understand why this is the plan.

@elek (Member) left a comment:

OK, I checked the code again and I can acknowledge that I was wrong.

It's hard to say whether the added call is admin-related; it is a very generic one, and in the meantime I learned that we might need something similar for EC (for non-admin calls).

The client RPC API is not changed; this is just a client-side optimization to initialize multiple, existing client calls.

I apologize for the delay I caused. Or: thanks for the discussion. I like to have interesting discussions about possible designs (as we have no other forum for that), but I know it may require patience...

In exchange, I am happy to merge this one (if @bshashikant has no other concerns).

@bshashikant (Contributor) commented:

@elek, please go ahead with the merge.

elek merged commit 0ec1a8a into apache:master on Aug 31, 2020
rakeshadr pushed a commit to rakeshadr/hadoop-ozone that referenced this pull request Sep 3, 2020
errose28 added a commit to errose28/ozone that referenced this pull request Sep 11, 2020