Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-3981. Add more debug level log to XceiverClientGrpc for debug purpose #1214

Merged
merged 8 commits into from Sep 17, 2020

Conversation

maobaolong
Copy link
Member

What changes were proposed in this pull request?

When I profile the s3g performance, i find in some log of XceiverClientGrpc, I cannot find the related datanode ip, so i use datanode instead of the UUID and networkFullPath.

Also, i add a debug level log to show the request has been executed and output the try counts.

What is the link to the Apache JIRA

HDDS-3981

How was this patch tested?

No need, all changes only related to log.

@@ -339,6 +343,11 @@ private XceiverClientReply sendCommandWithRetry(
// in case these don't exist for the specific datanode.
reply.addDatanode(dn);
responseProto = sendCommandAsync(request, dn).getResponse().get();
if (LOG.isDebugEnabled()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In sendCommandAsync() below we already have a LOG.debug(cmdType, Dn). Can you move this there to avoid duplicate debug log?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiaoyuyao I hope so, but I think it would be difficult, because, i want to log the retry index and cost time.
when the sendCommandAsync return back, the client still doesn't get the result from datanode. the result return until the get return.

If i have something wrong, please correct me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean if we could move the LOG for latency around Line 464 where the container op latency metrics are updated. This way, we can piggyback LOG with the existing metrics update.

metrics.addContainerOpsLatency(request.getCmdType(),
System.nanoTime() - requestTime);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiaoyuyao Thanks you for your reply and suggestion. I tried to add the LOG into the L464, like following

              public void onNext(ContainerCommandResponseProto value) {
                replyFuture.complete(value);
                metrics.decrPendingContainerOpsMetrics(request.getCmdType());
                long cost = System.nanoTime() - requestTime;
                metrics.addContainerOpsLatency(request.getCmdType(),
                    cost);
                if (LOG.isDebugEnabled()) {
                  LOG.debug("Executed command {} on datanode {}, cost = {}, "
                          + "retryCount = {}" + ", cmdType = {}", request, dn,
                      cost, index, request.getCmdType());
                }
                semaphore.release();
              }

But, there are some question

  • It cannot print the Index which stand for the retried datanode index of the datanodeList.
  • I should add the same log logic to the onError also? Otherwise, when we met error, the onNext won't be executed.

As index can be calculate from the log context, I drop it.

Copy link
Contributor

@xiaoyuyao xiaoyuyao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, a few comments inline...

@maobaolong
Copy link
Member Author

@xiaoyuyao Thanks to review my PR, i reply to your comments, please check it again, thanks.

@@ -433,7 +442,7 @@ public XceiverClientReply sendCommandAsync(
UUID dnId = dn.getUuid();
if (LOG.isDebugEnabled()) {
LOG.debug("Send command {} to datanode {}",
request.getCmdType(), dn.getNetworkFullPath());
request.getCmdType(), dn);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I'm wrong, I think we have already print the same debug message on Line 338-340 from the caller. Print just the IP address should be sufficient.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, you are right, the redundancy log content is useless, I've change it to dn.getIpAddress().

@maobaolong
Copy link
Member Author

@xiaoyuyao Thank you for the review suggestion, PTAL.

@elek
Copy link
Member

elek commented Sep 7, 2020

@bharatviswa504 What do we need to make it possible to merge this commit?

Copy link
Contributor

@xiaoyuyao xiaoyuyao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @maobaolong for the update. One more comment added inline.

@maobaolong
Copy link
Member Author

@xiaoyuyao Thanks @xiaoyuyao for the suggestion, I pushed a new commit to address your comment, I reply your comment and ask some question. PTAL.

@xiaoyuyao
Copy link
Contributor

Thanks @maobaolong for the update. LGTM, +1. Will merge it shortly.

@xiaoyuyao xiaoyuyao merged commit 68d1ab0 into apache:master Sep 17, 2020
errose28 added a commit to errose28/ozone that referenced this pull request Sep 28, 2020
* master:
  HDDS-4102. Normalize Keypath for lookupKey. (apache#1328)
  HDDS-4263. ReplicatiomManager shouldn't consider origin node Id for CLOSED containers. (apache#1438)
  HDDS-4282. Improve the emptyDir syntax (apache#1450)
  HDDS-4194. Create a script to check AWS S3 compatibility (apache#1383)
  HDDS-4270. Add more reusable byteman scripts to debug ofs/o3fs performance (apache#1443)
  HDDS-2660. Create insight point for datanode container protocol (apache#1272)
  HDDS-3297. Enable TestOzoneClientKeyGenerator. (apache#1442)
  HDDS-4324. Add important comment to ListVolumes logic (apache#1417)
  HDDS-4236. Move "Om*Codec.java" to new project hadoop-ozone/interface-storage (apache#1424)
  HDDS-4254. Bucket space: add usedBytes and update it when create and delete key. (apache#1431)
  HDDS-2766. security/SecuringDataNodes.md (apache#1175)
  HDDS-4206. Attempt pipeline creation more frequently in acceptance tests (apache#1389)
  HDDS-4233. Interrupted exeception printed out from DatanodeStateMachine (apache#1416)
  HDDS-3947: Sort DNs for client when the key is a file for #getFileStatus #listStatus APIs (apache#1385)
  HDDS-3102. ozone getconf command should use the GenericCli parent class (apache#1410)
  HDDS-3981. Add more debug level log to XceiverClientGrpc for debug purpose (apache#1214)
  HDDS-4255. Remove unused Ant and Jdiff dependency versions (apache#1433)
  HDDS-4247. Fixed log4j usage in some places (apache#1426)
  HDDS-4241. Support HADOOP_TOKEN_FILE_LOCATION for Ozone token CLI. (apache#1422)
errose28 added a commit to errose28/ozone that referenced this pull request Sep 28, 2020
* HDDS-4122-remove-code-consolidation: (21 commits)
  Restore files that had deduplicated code from master
  Revert other delete request/response files back to their original states on master
  HDDS-4102. Normalize Keypath for lookupKey. (apache#1328)
  HDDS-4263. ReplicatiomManager shouldn't consider origin node Id for CLOSED containers. (apache#1438)
  HDDS-4282. Improve the emptyDir syntax (apache#1450)
  HDDS-4194. Create a script to check AWS S3 compatibility (apache#1383)
  HDDS-4270. Add more reusable byteman scripts to debug ofs/o3fs performance (apache#1443)
  HDDS-2660. Create insight point for datanode container protocol (apache#1272)
  HDDS-3297. Enable TestOzoneClientKeyGenerator. (apache#1442)
  HDDS-4324. Add important comment to ListVolumes logic (apache#1417)
  HDDS-4236. Move "Om*Codec.java" to new project hadoop-ozone/interface-storage (apache#1424)
  HDDS-4254. Bucket space: add usedBytes and update it when create and delete key. (apache#1431)
  HDDS-2766. security/SecuringDataNodes.md (apache#1175)
  HDDS-4206. Attempt pipeline creation more frequently in acceptance tests (apache#1389)
  HDDS-4233. Interrupted exeception printed out from DatanodeStateMachine (apache#1416)
  HDDS-3947: Sort DNs for client when the key is a file for #getFileStatus #listStatus APIs (apache#1385)
  HDDS-3102. ozone getconf command should use the GenericCli parent class (apache#1410)
  HDDS-3981. Add more debug level log to XceiverClientGrpc for debug purpose (apache#1214)
  HDDS-4255. Remove unused Ant and Jdiff dependency versions (apache#1433)
  HDDS-4247. Fixed log4j usage in some places (apache#1426)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants