-
Notifications
You must be signed in to change notification settings - Fork 490
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDDS-4285. Read is slow due to frequent calls to UGI.getCurrentUser() and getTokens() #1454
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix looks good to me, @adoroszlai .
One minor comment below, : ).
@@ -84,7 +86,8 @@ private ContainerProtocolCalls() { | |||
* @throws IOException if there is an I/O error while performing the call | |||
*/ | |||
public static GetBlockResponseProto getBlock(XceiverClientSpi xceiverClient, | |||
DatanodeBlockID datanodeBlockID) throws IOException { | |||
DatanodeBlockID datanodeBlockID, | |||
Collection<Token<? extends TokenIdentifier>> tokens) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we update the javadoc comment of this method as we new added param tokens here?
The same suggestion for methods we changed below:
- putBlockAsync
- readChunk
- writeChunkAsync
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @linyiqun for the suggestion. Would
@param tokens list of tokens the current user has, possibly including a token for this block
be OK, or do you have a better description for this param?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adoroszlai , above description looks good to me. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can directly encode the pass in token for the specific block id without additional token selection.
@adoroszlai thanks for identify this performance issue. I'm not quite familar with tokens, since it's a user token, suppose it has a much longer liveness than input and output stream object instance, right? |
Credit goes to @elek who reported the problem, see HDDS-4285.
Neither am I, but we have some doc.
It is a block token, specific to the block being written/read. I think it's short-lived. |
A good point that @ChenSammi mentioned, there is a setting to decide the expired time of block token and it's a long time as 1 day. This should be an enough time for client to read/write block data. <property>
<name>hdds.block.token.expiry.time</name>
<value>1d</value>
<tag>OZONE, HDDS, SECURITY, TOKEN</tag>
<description>
Default value for expiry time of block token. This
setting supports multiple time unit suffixes as described in
dfs.heartbeat.interval. If no suffix is specified, then milliseconds is
assumed.
</description>
</property>
From this comment of OzoneBlockTokenIdentifier, current change also looks safe. In additional, I'm +1 for current change. Thanks @adoroszlai for addressing the comment. |
@@ -193,10 +196,11 @@ public synchronized void initialize() throws IOException { | |||
if (token != null) { | |||
UserGroupInformation.getCurrentUser().addToken(token); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just pass down the token from the caller directly and bypass the UGI addToken/getTokens completely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @adoroszlai for working on this. The change LGTM overall. Just have few comments added inline.
@@ -84,7 +86,8 @@ private ContainerProtocolCalls() { | |||
* @throws IOException if there is an I/O error while performing the call | |||
*/ | |||
public static GetBlockResponseProto getBlock(XceiverClientSpi xceiverClient, | |||
DatanodeBlockID datanodeBlockID) throws IOException { | |||
DatanodeBlockID datanodeBlockID, | |||
Collection<Token<? extends TokenIdentifier>> tokens) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can directly encode the pass in token for the specific block id without additional token selection.
Thanks @xiaoyuyao for the comments. I was also wondering about this, but I didn't have enough background info to decide. If you feel comfortable with bypassing the UGI for the block token, I'm fine with making this simplification. |
Thanks @xiaoyuyao for the suggestion to directly pass down the block token. I have updated the patch accordingly. |
@adoroszlai , thanks for the explanation. +1. |
Thanks @adoroszlai for the update. LGTM, +1. |
Thanks @xiaoyuyao for reviewing and committing it, @ChenSammi and @linyiqun for the review, and @elek for finding the issue. |
* master: (23 commits) HDDS-4122. Implement OM Delete Expired Open Key Request and Response (apache#1435) HDDS-4336. ContainerInfo does not persist BCSID (sequenceId) leading to failed replica reports (apache#1488) Remove extra serialization from getBlockID (apache#1470) HDDS-4262. Use ClientID and CallID from Rpc Client to detect retry requests (apache#1436) HDDS-4285. Read is slow due to frequent calls to UGI.getCurrentUser() and getTokens() (apache#1454) HDDS-4312. findbugs check succeeds despite compile error (apache#1476) HDDS-4311. Type-safe config design doc points to OM HA (apache#1477) HDDS-3814. Drop a column family through debug cli tool (apache#1083) HDDS-3728. Bucket space: check quotaUsageInBytes when write key and allocate block. (apache#1458) HDDS-4316. Upgrade to angular 1.8.0 due to CVE-2020-7676 (apache#1481) HDDS-4325. Incompatible return codes from Ozone getconf -confKey (apache#1485). Contributed by Doroszlai, Attila. HDDS-4309. Fix inconsistency in recon config keys starting with recon and not ozone (apache#1478) HDDS-4310: Ozone getconf broke the compatibility (apache#1475) HDDS-4298. Use an interface in Ozone client instead of XceiverClientManager (apache#1460) HDDS-4280. Document notable configurations for Recon. (apache#1448) HDDS-4156. add hierarchical layout to Chinese doc (apache#1368) HDDS-4242. Copy PrefixInfo proto to new project hadoop-ozone/interface-storage (apache#1444) HDDS-4264. Uniform naming conventions of Ozone Shell Options. (apache#1447) HDDS-4271. Avoid logging chunk content in Ozone Insight (apache#1466) HDDS-4299. Display Ratis version with ozone version (apache#1464) ...
What changes were proposed in this pull request?
Reduce the number of
getCurrentUser()
andgetTokens()
calls performed during someContainerProtocolCalls
operations. This is achieved by getting the tokens once inBlockInputStream
andBlockOutputStream
initialization, and passing them togetBlock
,putBlock
,readChunk
,writeChunk
calls.https://issues.apache.org/jira/browse/HDDS-4285
How was this patch tested?
Verified that time required for the repro unit test is improved.
Without the patch:
With the patch:
Regular CI:
https://github.com/adoroszlai/hadoop-ozone/runs/1181129921