Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-4285. Read is slow due to frequent calls to UGI.getCurrentUser() and getTokens() #1454

Merged
merged 6 commits into from
Oct 9, 2020

Conversation

adoroszlai
Copy link
Contributor

What changes were proposed in this pull request?

Reduce the number of getCurrentUser() and getTokens() calls performed during some ContainerProtocolCalls operations. This is achieved by getting the tokens once in BlockInputStream and BlockOutputStream initialization, and passing them to getBlock, putBlock, readChunk, writeChunk calls.

https://issues.apache.org/jira/browse/HDDS-4285

How was this patch tested?

Verified that time required for the repro unit test is improved.

Without the patch:

[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 112.829 s - in org.apache.hadoop.ozone.client.io.TestKeyOutputStreamUnit

With the patch:

[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 30.293 s - in org.apache.hadoop.ozone.client.io.TestKeyOutputStreamUnit

Regular CI:
https://github.com/adoroszlai/hadoop-ozone/runs/1181129921

@adoroszlai adoroszlai self-assigned this Sep 29, 2020
@adoroszlai adoroszlai changed the title HDDS-4285. Read is slow due to the frequent usage of UGI.getCurrentUser Call() HDDS-4285. Read is slow due to frequent calls to UGI.getCurrentUser() and getTokens() Sep 29, 2020
Copy link
Contributor

@linyiqun linyiqun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix looks good to me, @adoroszlai .
One minor comment below, : ).

@@ -84,7 +86,8 @@ private ContainerProtocolCalls() {
* @throws IOException if there is an I/O error while performing the call
*/
public static GetBlockResponseProto getBlock(XceiverClientSpi xceiverClient,
DatanodeBlockID datanodeBlockID) throws IOException {
DatanodeBlockID datanodeBlockID,
Collection<Token<? extends TokenIdentifier>> tokens) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we update the javadoc comment of this method as we new added param tokens here?
The same suggestion for methods we changed below:

  • putBlockAsync
  • readChunk
  • writeChunkAsync

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @linyiqun for the suggestion. Would

@param tokens list of tokens the current user has, possibly including a token for this block

be OK, or do you have a better description for this param?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adoroszlai , above description looks good to me. Thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can directly encode the pass in token for the specific block id without additional token selection.

@ChenSammi
Copy link
Contributor

@adoroszlai thanks for identify this performance issue. I'm not quite familar with tokens, since it's a user token, suppose it has a much longer liveness than input and output stream object instance, right?

@adoroszlai
Copy link
Contributor Author

thanks for identify this performance issue

Credit goes to @elek who reported the problem, see HDDS-4285.

I'm not quite familar with tokens

Neither am I, but we have some doc.

since it's a user token, suppose it has a much longer liveness than input and output stream object instance, right?

It is a block token, specific to the block being written/read. I think it's short-lived.

https://github.com/apache/hadoop-ozone/blob/d1ac423b3fe35c5d363b348108512440f08ee5f1/hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/security/token/OzoneBlockTokenIdentifier.java#L41-L45

@linyiqun
Copy link
Contributor

linyiqun commented Sep 30, 2020

since it's a user token, suppose it has a much longer liveness than input and output stream object instance, right?

A good point that @ChenSammi mentioned, there is a setting to decide the expired time of block token and it's a long time as 1 day. This should be an enough time for client to read/write block data.

  <property>
    <name>hdds.block.token.expiry.time</name>
    <value>1d</value>
    <tag>OZONE, HDDS, SECURITY, TOKEN</tag>
    <description>
      Default value for expiry time of block token. This
      setting supports multiple time unit suffixes as described in
      dfs.heartbeat.interval. If no suffix is specified, then milliseconds is
      assumed.
    </description>
  </property>

No need to renew or revoke a block access token. when a cached block access token expires, the client simply get a new one. Block access token should be cached only in memory and never write to disk.

From this comment of OzoneBlockTokenIdentifier, current change also looks safe.

In additional, I'm +1 for current change. Thanks @adoroszlai for addressing the comment.

@@ -193,10 +196,11 @@ public synchronized void initialize() throws IOException {
if (token != null) {
UserGroupInformation.getCurrentUser().addToken(token);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just pass down the token from the caller directly and bypass the UGI addToken/getTokens completely?

Copy link
Contributor

@xiaoyuyao xiaoyuyao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @adoroszlai for working on this. The change LGTM overall. Just have few comments added inline.

@@ -84,7 +86,8 @@ private ContainerProtocolCalls() {
* @throws IOException if there is an I/O error while performing the call
*/
public static GetBlockResponseProto getBlock(XceiverClientSpi xceiverClient,
DatanodeBlockID datanodeBlockID) throws IOException {
DatanodeBlockID datanodeBlockID,
Collection<Token<? extends TokenIdentifier>> tokens) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can directly encode the pass in token for the specific block id without additional token selection.

@adoroszlai
Copy link
Contributor Author

Should we just pass down the token from the caller directly and bypass the UGI addToken/getTokens completely?
We can directly encode the pass in token for the specific block id without additional token selection.

Thanks @xiaoyuyao for the comments. I was also wondering about this, but I didn't have enough background info to decide. If you feel comfortable with bypassing the UGI for the block token, I'm fine with making this simplification.

@adoroszlai adoroszlai added enhancement New feature or request performance labels Oct 7, 2020
@adoroszlai
Copy link
Contributor Author

Thanks @xiaoyuyao for the suggestion to directly pass down the block token. I have updated the patch accordingly.

@ChenSammi
Copy link
Contributor

@adoroszlai , thanks for the explanation. +1.

@xiaoyuyao
Copy link
Contributor

Thanks @adoroszlai for the update. LGTM, +1.

@xiaoyuyao xiaoyuyao merged commit 35cc6b0 into apache:master Oct 9, 2020
@adoroszlai adoroszlai deleted the HDDS-4285 branch October 9, 2020 21:24
@adoroszlai
Copy link
Contributor Author

Thanks @xiaoyuyao for reviewing and committing it, @ChenSammi and @linyiqun for the review, and @elek for finding the issue.

errose28 added a commit to errose28/ozone that referenced this pull request Oct 14, 2020
* master: (23 commits)
  HDDS-4122. Implement OM Delete Expired Open Key Request and Response (apache#1435)
  HDDS-4336. ContainerInfo does not persist BCSID (sequenceId) leading to failed replica reports (apache#1488)
  Remove extra serialization from getBlockID (apache#1470)
  HDDS-4262. Use ClientID and CallID from Rpc Client to detect retry requests (apache#1436)
  HDDS-4285. Read is slow due to frequent calls to UGI.getCurrentUser() and getTokens() (apache#1454)
  HDDS-4312. findbugs check succeeds despite compile error (apache#1476)
  HDDS-4311. Type-safe config design doc points to OM HA (apache#1477)
  HDDS-3814. Drop a column family through debug cli tool (apache#1083)
  HDDS-3728. Bucket space: check quotaUsageInBytes when write key and allocate block. (apache#1458)
  HDDS-4316. Upgrade to angular 1.8.0 due to CVE-2020-7676 (apache#1481)
  HDDS-4325. Incompatible return codes from Ozone getconf -confKey (apache#1485). Contributed by Doroszlai, Attila.
  HDDS-4309. Fix inconsistency in recon config keys starting with recon and not ozone (apache#1478)
  HDDS-4310: Ozone getconf broke the compatibility (apache#1475)
  HDDS-4298. Use an interface in Ozone client instead of XceiverClientManager (apache#1460)
  HDDS-4280. Document notable configurations for Recon. (apache#1448)
  HDDS-4156. add hierarchical layout to Chinese doc (apache#1368)
  HDDS-4242. Copy PrefixInfo proto to new project hadoop-ozone/interface-storage (apache#1444)
  HDDS-4264. Uniform naming conventions of Ozone Shell Options. (apache#1447)
  HDDS-4271. Avoid logging chunk content in Ozone Insight (apache#1466)
  HDDS-4299. Display Ratis version with ozone version (apache#1464)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request performance
Projects
None yet
4 participants