New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDDS-3981. Add more debug level log to XceiverClientGrpc for debug purpose #1214
Conversation
hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientGrpc.java
Show resolved
Hide resolved
hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientGrpc.java
Show resolved
Hide resolved
@@ -339,6 +343,11 @@ private XceiverClientReply sendCommandWithRetry( | |||
// in case these don't exist for the specific datanode. | |||
reply.addDatanode(dn); | |||
responseProto = sendCommandAsync(request, dn).getResponse().get(); | |||
if (LOG.isDebugEnabled()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In sendCommandAsync() below we already have a LOG.debug(cmdType, Dn). Can you move this there to avoid duplicate debug log?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiaoyuyao I hope so, but I think it would be difficult, because, i want to log the retry index and cost time.
when the sendCommandAsync
return back, the client still doesn't get the result from datanode. the result return until the get
return.
If i have something wrong, please correct me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean if we could move the LOG for latency around Line 464 where the container op latency metrics are updated. This way, we can piggyback LOG with the existing metrics update.
metrics.addContainerOpsLatency(request.getCmdType(),
System.nanoTime() - requestTime);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiaoyuyao Thanks you for your reply and suggestion. I tried to add the LOG into the L464, like following
public void onNext(ContainerCommandResponseProto value) {
replyFuture.complete(value);
metrics.decrPendingContainerOpsMetrics(request.getCmdType());
long cost = System.nanoTime() - requestTime;
metrics.addContainerOpsLatency(request.getCmdType(),
cost);
if (LOG.isDebugEnabled()) {
LOG.debug("Executed command {} on datanode {}, cost = {}, "
+ "retryCount = {}" + ", cmdType = {}", request, dn,
cost, index, request.getCmdType());
}
semaphore.release();
}
But, there are some question
- It cannot print the
Index
which stand for the retried datanode index of thedatanodeList
. - I should add the same log logic to the
onError
also? Otherwise, when we met error, theonNext
won't be executed.
As index
can be calculate from the log context, I drop it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall, a few comments inline...
@xiaoyuyao Thanks to review my PR, i reply to your comments, please check it again, thanks. |
@@ -433,7 +442,7 @@ public XceiverClientReply sendCommandAsync( | |||
UUID dnId = dn.getUuid(); | |||
if (LOG.isDebugEnabled()) { | |||
LOG.debug("Send command {} to datanode {}", | |||
request.getCmdType(), dn.getNetworkFullPath()); | |||
request.getCmdType(), dn); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct me if I'm wrong, I think we have already print the same debug message on Line 338-340 from the caller. Print just the IP address should be sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, you are right, the redundancy log content is useless, I've change it to dn.getIpAddress()
.
@xiaoyuyao Thank you for the review suggestion, PTAL. |
@bharatviswa504 What do we need to make it possible to merge this commit? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @maobaolong for the update. One more comment added inline.
@xiaoyuyao Thanks @xiaoyuyao for the suggestion, I pushed a new commit to address your comment, I reply your comment and ask some question. PTAL. |
Thanks @maobaolong for the update. LGTM, +1. Will merge it shortly. |
* master: HDDS-4102. Normalize Keypath for lookupKey. (apache#1328) HDDS-4263. ReplicatiomManager shouldn't consider origin node Id for CLOSED containers. (apache#1438) HDDS-4282. Improve the emptyDir syntax (apache#1450) HDDS-4194. Create a script to check AWS S3 compatibility (apache#1383) HDDS-4270. Add more reusable byteman scripts to debug ofs/o3fs performance (apache#1443) HDDS-2660. Create insight point for datanode container protocol (apache#1272) HDDS-3297. Enable TestOzoneClientKeyGenerator. (apache#1442) HDDS-4324. Add important comment to ListVolumes logic (apache#1417) HDDS-4236. Move "Om*Codec.java" to new project hadoop-ozone/interface-storage (apache#1424) HDDS-4254. Bucket space: add usedBytes and update it when create and delete key. (apache#1431) HDDS-2766. security/SecuringDataNodes.md (apache#1175) HDDS-4206. Attempt pipeline creation more frequently in acceptance tests (apache#1389) HDDS-4233. Interrupted exeception printed out from DatanodeStateMachine (apache#1416) HDDS-3947: Sort DNs for client when the key is a file for #getFileStatus #listStatus APIs (apache#1385) HDDS-3102. ozone getconf command should use the GenericCli parent class (apache#1410) HDDS-3981. Add more debug level log to XceiverClientGrpc for debug purpose (apache#1214) HDDS-4255. Remove unused Ant and Jdiff dependency versions (apache#1433) HDDS-4247. Fixed log4j usage in some places (apache#1426) HDDS-4241. Support HADOOP_TOKEN_FILE_LOCATION for Ozone token CLI. (apache#1422)
* HDDS-4122-remove-code-consolidation: (21 commits) Restore files that had deduplicated code from master Revert other delete request/response files back to their original states on master HDDS-4102. Normalize Keypath for lookupKey. (apache#1328) HDDS-4263. ReplicatiomManager shouldn't consider origin node Id for CLOSED containers. (apache#1438) HDDS-4282. Improve the emptyDir syntax (apache#1450) HDDS-4194. Create a script to check AWS S3 compatibility (apache#1383) HDDS-4270. Add more reusable byteman scripts to debug ofs/o3fs performance (apache#1443) HDDS-2660. Create insight point for datanode container protocol (apache#1272) HDDS-3297. Enable TestOzoneClientKeyGenerator. (apache#1442) HDDS-4324. Add important comment to ListVolumes logic (apache#1417) HDDS-4236. Move "Om*Codec.java" to new project hadoop-ozone/interface-storage (apache#1424) HDDS-4254. Bucket space: add usedBytes and update it when create and delete key. (apache#1431) HDDS-2766. security/SecuringDataNodes.md (apache#1175) HDDS-4206. Attempt pipeline creation more frequently in acceptance tests (apache#1389) HDDS-4233. Interrupted exeception printed out from DatanodeStateMachine (apache#1416) HDDS-3947: Sort DNs for client when the key is a file for #getFileStatus #listStatus APIs (apache#1385) HDDS-3102. ozone getconf command should use the GenericCli parent class (apache#1410) HDDS-3981. Add more debug level log to XceiverClientGrpc for debug purpose (apache#1214) HDDS-4255. Remove unused Ant and Jdiff dependency versions (apache#1433) HDDS-4247. Fixed log4j usage in some places (apache#1426) ...
What changes were proposed in this pull request?
When I profile the s3g performance, i find in some log of XceiverClientGrpc, I cannot find the related datanode ip, so i use datanode instead of the UUID and networkFullPath.
Also, i add a debug level log to show the request has been executed and output the try counts.
What is the link to the Apache JIRA
HDDS-3981
How was this patch tested?
No need, all changes only related to log.