
HDDS-3903. OzoneRpcClient support batch rename keys. #1150

Merged
merged 15 commits on Aug 28, 2020

Conversation

captainzmc
Member

What changes were proposed in this pull request?

Currently, renaming a folder means listing all of its keys and then renaming them one by one, which performs poorly.

HDDS-2939 would optimize this part, but at present HDDS-2939 is progressing slowly and still has a long way to go. So we optimized the batch operation based on the current interface. This PR gives us better performance until HDDS-2939 comes in.

This PR is a subtask of Batch Rename; it first makes OzoneRpcClient support batch renaming of keys.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-3903

How was this patch tested?

Unit tests have been added.

@captainzmc
Member Author

Hi @xiaoyuyao, this PR's implementation is consistent with HDDS-3286 batchDelete, and I split batchRename into two subtasks. This one is mainly the OM-side implementation. Could you help review it?

* @param keyMap The key is the new key name and the value is the original key's OmKeyArgs.
* @throws IOException
*/
void renameKeys(Map<String, OmKeyArgs> keyMap) throws IOException;
Contributor
@xiaoyuyao xiaoyuyao Jun 30, 2020

Should we make this Map<OmKeyArgs, String> so that it is consistent with the renameKey() API, and so a rename batch does not need to have the same volume/bucket name, e.g. /vol1/bucket1/key1->/vol1/bucket2/key2 and /vol2/bucket1/key1->/vol2/bucket1/key2?
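For illustration, a minimal sketch of this point using a stand-in class (not the real org.apache.hadoop.ozone.om.helpers.OmKeyArgs, which has many more fields): keying the map by the source key lets each entry carry its own volume and bucket, matching the shape of renameKey(OmKeyArgs, String).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RenameMapShape {
    // Stand-in for OmKeyArgs, kept to the three fields relevant here.
    static final class KeyArgs {
        final String volume, bucket, key;
        KeyArgs(String volume, String bucket, String key) {
            this.volume = volume; this.bucket = bucket; this.key = key;
        }
    }

    public static void main(String[] args) {
        // Each entry carries its own volume/bucket, so one batch can
        // span different volumes and buckets.
        Map<KeyArgs, String> renames = new LinkedHashMap<>();
        renames.put(new KeyArgs("vol1", "bucket1", "key1"), "key2");
        renames.put(new KeyArgs("vol2", "bucket1", "key1"), "key2");
        for (Map.Entry<KeyArgs, String> e : renames.entrySet()) {
            System.out.println("/" + e.getKey().volume + "/" + e.getKey().bucket
                + "/" + e.getKey().key + " -> " + e.getValue());
        }
    }
}
```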

Member Author

I'll change this to Map<OmKeyArgs, String> to be consistent with the renameKey() API.

In addition, the API was added to OzoneBucket and currently only supports renaming keys under the same bucket.
OzoneBucket bucket = volume.getBucket(bucketName);
Map<String, String> keyMap = new HashMap<>();
keyMap.put(keyName1, newKeyName1);
bucket.renameKeys(keyMap);

.setBucketName(args.getBucketName())
.setKeyName(args.getKeyName())
.setModificationTime(Time.now())
.setDataSize(args.getDataSize()).build();
Contributor

Do we need to update the datasize here?

*/
public class OmRenameKeyInfo {

private String toKeyName;
Contributor

Doesn't newKeyInfo.keyName already cover toKeyName?

Preconditions.checkNotNull(renameKeysRequest);

return getOmRequest().toBuilder()
.setRenameKeysRequest(renameKeysRequest.toBuilder())
Contributor

I think we need to set the modification time here.

String objectKey = omMetadataManager.getOzoneKey(volumeName, bucketName,
fromKeyName);
OmKeyInfo omKeyInfo = omMetadataManager.getKeyTable().get(objectKey);
unRenamedKeys.add(omKeyInfo);
Contributor

unRenamedKeys is not updated inside the loop. Can we add a unit test for the returned unRenamedKeys upon failure?
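As a plain-JDK illustration of the semantics under discussion (not the Ozone implementation): a batch rename that fails partway should report every key that was never renamed, which is what a test on unRenamedKeys would assert.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartialRenameDemo {
    // Hypothetical sketch: apply renames one by one against a simple
    // key table; on the first missing source key, stop and return every
    // key that was not renamed (mirroring the unRenamedKeys idea).
    static List<String> renameAll(Map<String, String> table,
                                  Map<String, String> renames) {
        List<String> unRenamed = new ArrayList<>(renames.keySet());
        for (Map.Entry<String, String> e : renames.entrySet()) {
            String value = table.remove(e.getKey());
            if (value == null) {
                return unRenamed;     // failure: nothing from here on was renamed
            }
            table.put(e.getValue(), value);
            unRenamed.remove(e.getKey());
        }
        return unRenamed;             // empty on full success
    }

    public static void main(String[] args) {
        Map<String, String> table = new HashMap<>();
        table.put("key1", "data1");
        Map<String, String> renames = new LinkedHashMap<>();
        renames.put("key1", "key1renamed");
        renames.put("missing", "other"); // this entry fails
        System.out.println(renameAll(table, renames)); // [missing]
    }
}
```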

Member Author
@captainzmc captainzmc Jul 2, 2020

Although we have put the lists of unDeletedKeys and unRenamedKeys into the response, renameKeys and deleteKeys in OzoneBucket.java and ClientProtocol.java still return void.
I have added a TODO to OMClientRequest and created a JIRA (HDDS-3916); I'll change those separately.

Contributor
@xiaoyuyao xiaoyuyao left a comment

LGTM overall, a few comments added inline...

@captainzmc
Member Author

Thanks @xiaoyuyao for the review. I have fixed the review issues. Can you take another look?

Comment on lines 257 to 258
auditLog(auditLogger, buildAuditMessage(OMAction.RENAME_KEY, auditMap,
exception, getOmRequest().getUserInfo()));
Contributor

Shouldn't all renamed keys be logged to audit?

Member Author

Thanks @adoroszlai for the review. Fixed this issue.

Contributor

Thanks, but the new code only logs key names; other info, e.g. volume/bucket, is missing. Also, if the result is failure, then all keys are logged as "failure", although only the last key failed (and we don't even know which one is last, since auditMap is a hashmap).

Member Author
@captainzmc captainzmc Jul 3, 2020

Thanks @adoroszlai for the tip, I'll add the volume/bucket to the key.
As for locating the failed key: we can see it in the exception on the client. Typically, the failed keys are displayed in the log with an exception, such as:

17:45:30.550 [OM StateMachine ApplyTransaction Thread - 0] ERROR OMAudit - user=micahzhao | ip=127.0.0.1 | op=RENAME_KEY {dir/key2=dir/file2, dir/file1=dir/file2} | ret=FAILURE
org.apache.hadoop.ozone.om.exceptions.OMException: Key not found /a88fb570-5b5d-43db-81e0-d6597f4ea81f/4ad1e6c3-41c3-439d-8082-ae24a601ba38/dir/file1
at org.apache.hadoop.ozone.om.request.key.OMKeysRenameRequest.validateAndUpdateCache……

@captainzmc captainzmc force-pushed the batch-rename-1 branch 2 times, most recently from f09420c to fe8c69a Compare July 3, 2020 11:22
@captainzmc captainzmc closed this Jul 3, 2020
@captainzmc captainzmc reopened this Jul 3, 2020
// If toKeyName is null, then we need to only delete the fromKeyName
// from KeyTable. This is the case of replay where toKey exists but
// fromKey has not been deleted.
if (deleteFromKeyOnly()) {
Contributor

Adding to the cache should be done as part of validateAndUpdateCache.
In HA, we return the response before the DB flush completes. So, if we don't update the cache before returning the response, subsequent requests might think the key still exists.

Posted PR for DeleteKeys regarding the same issue #1169

Contributor

And also replay code is now not required, as it is taken care of by HDDS-3354.

Member Author

Thanks Bharat for the suggestion. I have taken a close look at the implementation of #1169, which has some very nice changes. In this PR I will synchronize the #1169 changes to make sure they are implemented the same way.

Contributor
@bharatviswa504 bharatviswa504 left a comment

I have one comment regarding updateCache which I saw during OMKeysDeleteRequest; the same applies to OMKeysRenameRequest also.

@captainzmc
Member Author

I have one comment regarding updateCache which i have seen during OMKeysDeleteRequest, same applies to OMKeysRenameRequest also.

Hi @bharatviswa504. Thanks for fixing OMKeysDeleteRequest. I have updated this PR based on DeleteKeys #1195 to make sure RenameKeys and DeleteKeys are implemented in the same way. Could you please review this PR again?

@captainzmc
Member Author

Hi @xiaoyuyao @adoroszlai. I have updated this PR based on DeleteKeys #1195, in which Bharat fixed some issues. Could you please review this PR again?

Contributor
@adoroszlai adoroszlai left a comment

Thanks @captainzmc for updating the PR based on #1195. I would like to suggest waiting until that PR is merged before updating this one again, to avoid having to resolve merge conflicts multiple times.

Comment on lines 200 to 233
// Check if this transaction is a replay of ratis logs.
if (isReplay(ozoneManager, toKeyValue, trxnLogIndex)) {

// Check if fromKey is still in the DB and created before this
// replay.
// For example, lets say we have the following sequence of
// transactions.
// Trxn 1 : Create Key1
// Trnx 2 : Rename Key1 to Key2 -> Deletes Key1 and Creates Key2
// Now if these transactions are replayed:
// Replay Trxn 1 : Creates Key1 again it does not exist in DB
// Replay Trxn 2 : Key2 is not created as it exists in DB and
// the request would be deemed a replay. But
// Key1 is still in the DB and needs to be
// deleted.
fromKeyValue = omMetadataManager.getKeyTable().get(fromKey);
if (fromKeyValue != null) {
// Check if this replay transaction was after the fromKey was
// created. If so, we have to delete the fromKey.
if (ozoneManager.isRatisEnabled() &&
trxnLogIndex > fromKeyValue.getUpdateID()) {
acquiredLock =
omMetadataManager.getLock().acquireWriteLock(BUCKET_LOCK,
volumeName, bucketName);
// Add to cache. Only fromKey should be deleted. ToKey already
// exists in DB as this transaction is a replay.
Table<String, OmKeyInfo> keyTable = omMetadataManager
.getKeyTable();
keyTable.addCacheEntry(new CacheKey<>(fromKey),
new CacheValue<>(Optional.absent(), trxnLogIndex));
renameKeyInfoList.add(new OmRenameKeyInfo(
null, fromKeyValue));
}
}
Contributor

Replay logic was removed in 90e8211. Can you please update the PR (also to resolve conflicts)?

@captainzmc
Member Author

Thanks @captainzmc for updating the PR based on #1195. I would like to suggest waiting until that PR is merged before updating this one again, to avoid having to resolve merge conflicts multiple times.

Thanks @adoroszlai for your suggestion. I'll update this PR after #1195 is merged, and the conflicts will be resolved together.

@captainzmc
Member Author

Hi @adoroszlai @xiaoyuyao @bharatviswa504, #1195 has now been merged. I have updated this PR, removed the replay logic (as in #1082), and resolved the conflicts. Please take another look?

Contributor
@adoroszlai adoroszlai left a comment

Thanks @captainzmc for updating the patch and resolving merge conflicts after lots of recent changes.

@Override
public void renameKeys(Map<OmKeyArgs, String> omKeyArgsMap)
throws IOException {
throw new NotImplementedException("OzoneManager does not require this to " +
Contributor

Please consider changing to UnsupportedOperationException as

NotImplementedException represents the case where the author has yet to implement the logic at this point in the program
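A minimal sketch of the suggested change (class name and message here are illustrative, not the PR's actual code): the stub throws java.lang.UnsupportedOperationException, signalling an intentionally unsupported operation rather than missing work.

```java
import java.util.Collections;
import java.util.Map;

public class UnsupportedOpDemo {
    // Server-side stub: this method is deliberately unsupported on this
    // interface, so UnsupportedOperationException fits better than
    // commons-lang's NotImplementedException.
    static void renameKeys(Map<String, String> keyMap) {
        throw new UnsupportedOperationException(
            "OzoneManager does not require this to be implemented");
    }

    public static void main(String[] args) {
        try {
            renameKeys(Collections.emptyMap());
        } catch (UnsupportedOperationException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```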

.setVolumeName(args.getVolumeName())
.setBucketName(args.getBucketName())
.setKeyName(args.getKeyName())
.setModificationTime(Time.now()).build();
Contributor

Shouldn't modification time be updated on OM server-side?

Comment on lines 1290 to 1309
// rename nonexistent key
Map<String, String> keyMap1 = new HashMap();
keyMap1.put(keyName1, keyName2);
keyMap1.put(newKeyName2, keyName2);
try {
bucket.renameKeys(keyMap1);
} catch (OMException ex) {
Assert.assertEquals(PARTIAL_RENAME, ex.getResult());
}
Contributor

I think this belongs to a separate test case.

Comment on lines 1301 to 1303
// Listing all volumes in the cluster feature has to be fixed after HDDS-357.
// TODO: fix this
@Ignore
Contributor

This was recently removed in HDDS-3062, so it seems to be accidentally being added back (due to merge conflict?).

Comment on lines 137 to 147
volumeName = keyArgs.getVolumeName();
bucketName = keyArgs.getBucketName();
Contributor

Comment on lines 1255 to 1259
OzoneOutputStream out1 = bucket.createKey(keyName1,
value.getBytes().length, STAND_ALONE,
ONE, new HashMap<>());
out1.write(value.getBytes());
out1.close();
Contributor

Can you please extract to a helper method (eg. createTestKey) to avoid duplication? (Also include following bucket.getKey and following assertEquals calls.)

Comment on lines 1278 to 1283
// old key should not exist
try {
bucket.getKey(keyName1);
} catch (OMException ex) {
Assert.assertEquals(KEY_NOT_FOUND, ex.getResult());
}
Contributor

Can you please extract to helper method (eg. assertKeyRenamed), and also include preceding assertEquals(newKeyName...) call?

Contributor
@adoroszlai adoroszlai left a comment

Thanks @captainzmc for continuing to update the patch.

Contributor
@adoroszlai adoroszlai left a comment

Thanks @captainzmc for changing the protocol. I think it is much improved now. I only have a couple of minor comments left.

captainzmc and others added 2 commits July 30, 2020 22:15
…ne/om/request/key/OMKeysRenameRequest.java


avoid use toString.

Co-authored-by: Doroszlai, Attila <6454655+adoroszlai@users.noreply.github.com>
…ne/om/request/key/OMKeysRenameRequest.java


avoid use toString.

Co-authored-by: Doroszlai, Attila <6454655+adoroszlai@users.noreply.github.com>
@captainzmc
Member Author

Thanks @captainzmc for changing the protocol. I think it is much improved now. I only have a couple of minor comments left.

Thanks for @adoroszlai's feedback. I have updated the PR and accepted the suggestions.

fromKeyValue.setModificationTime(Time.now());

acquiredLock =
omMetadataManager.getLock().acquireWriteLock(BUCKET_LOCK,
Contributor

I think here we should acquire the lock at the start and release it at the end of the operation.

Let's say this thread checked the fromKey and toKey, and another delete thread acquired the lock and deleted the fromKey while this thread was waiting for the lock; once the delete completes, we update the cache and rename it to toKey.

So now we have renamed a deleted key.

There can also be other scenarios, e.g. commitKey commits toKey while this thread assumes there is no such key.

To avoid these kinds of scenarios, the check of the key from the table and the add to the response should be done while holding the lock.
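The race described above can be sketched with plain JDK locks (illustrative only, not the Ozone code): the fix is to do the existence check and the mutation inside the same critical section, so no other thread can delete or create a key between the check and the rename.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

public class LockedRenameDemo {
    static final ReentrantLock bucketLock = new ReentrantLock();
    static final Map<String, String> keyTable = new HashMap<>();

    // Check-then-act done under one lock acquisition: the fromKey lookup,
    // the toKey existence check, and the mutation all happen while the
    // bucket lock is held.
    static boolean rename(String from, String to) {
        bucketLock.lock();
        try {
            String value = keyTable.get(from);        // check...
            if (value == null || keyTable.containsKey(to)) {
                return false;
            }
            keyTable.remove(from);                    // ...and act, same critical section
            keyTable.put(to, value);
            return true;
        } finally {
            bucketLock.unlock();
        }
    }

    public static void main(String[] args) {
        keyTable.put("key1", "data");
        System.out.println(rename("key1", "key2")); // true
        System.out.println(rename("key1", "key3")); // false: key1 no longer exists
    }
}
```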

Member Author

Thanks @bharatviswa504. Agreed; I updated this PR to fix the problem.

Contributor
@bharatviswa504 bharatviswa504 left a comment

I have one comment, Overall LGTM.
Thank You @captainzmc for updating the patch.

Just a question: will this change cause any compatibility issues? (As this is a new API, only new client code will use it; just checking whether I missed anything.)

cc @avijayanhwx

@captainzmc
Member Author

Just a question: will this change cause any compatibility issues? (As this is a new API, only new client code will use it; just checking whether I missed anything.)

Thanks for @bharatviswa504's review. RenameKeys is a new API and does not cause compatibility issues; older clients can continue to use renameKey.

Contributor
@bharatviswa504 bharatviswa504 left a comment

One thing with this approach is that we might soon need HDDS-2939: in non-HA, while a batch rename is happening, it might block/affect other write operations.

In your tests, was only the rename happening, or were there other write requests happening as well?

As this is the same problem with batch delete also, I am fine with it. (These kinds of operations will become a problem in HA, as it is a single-thread executor. I think @xiaoyuyao also brought this up during the review of HDDS-3930.)

Overall changes LGTM. (As HDDS-2939 will solve rename for the directory, I am okay to get this in.)

@captainzmc
Member Author

captainzmc commented Aug 10, 2020

Hi @adoroszlai @bharatviswa504, is there any progress here?

@arp7
Contributor

arp7 commented Aug 10, 2020

The downside of the new API is it will hold the lock for a long time. This will have a significant perf impact on concurrent operations and that will come back to bite us. I think the correct long term fix for this is HDDS-2939 which will support atomic rename and delete operations.

Contributor
@arp7 arp7 left a comment

@captainzmc is it a viable option to wait for HDDS-2939 which will add atomic rename capability to OM?

@captainzmc
Member Author

@captainzmc is it a viable option to wait for HDDS-2939 which will add atomic rename capability to OM?

Thanks @arp7 for discussing this PR.
Yes, HDDS-2939 may be a good solution to the above problem, but it still has a long way to go before merging into master. Currently our cluster uses rename and delete on old directories, and each operation takes a long time, so we modified the existing interface to get better performance. This can be seen as an interim solution.
We talked about this when batch delete was added.

@xiaoyuyao
Contributor

I'm +1 to merge this patch as-is and we can revisit this when HDDS-2939 is merged.

@arp7 we can mitigate the long lock holding issue by adding a limit on the max batch size (e.g., default to 1000). This can be added as follow up JIRA for both batch rename and delete.
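One way the suggested limit could work on the client side (a sketch under the assumption that batches are simply submitted sequentially; the helper and the limit's placement are hypothetical, only the default of 1000 comes from this discussion):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BatchLimitDemo {
    // Split a large rename map into batches of at most maxBatch entries,
    // so each request holds the bucket lock for a bounded amount of work.
    static <K, V> List<Map<K, V>> partition(Map<K, V> all, int maxBatch) {
        List<Map<K, V>> batches = new ArrayList<>();
        Map<K, V> current = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : all.entrySet()) {
            current.put(e.getKey(), e.getValue());
            if (current.size() == maxBatch) {
                batches.add(current);
                current = new LinkedHashMap<>();
            }
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        Map<String, String> renames = new LinkedHashMap<>();
        for (int i = 0; i < 2500; i++) {
            renames.put("key" + i, "new" + i);
        }
        List<Map<String, String>> batches = partition(renames, 1000);
        System.out.println(batches.size());        // 3
        System.out.println(batches.get(2).size()); // 500
    }
}
```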

@arp7
Contributor

arp7 commented Aug 28, 2020

Thanks @xiaoyuyao, let's get this in. Can we file a follow up jira for the periodic lock release? HDDS-2939 will eventually implement atomic and efficient rename however it will take some time.

@bharatviswa504
Contributor

bharatviswa504 commented Aug 28, 2020

When ratis is enabled, periodic lock release on the server is of no help, as we have a single thread executor. (It can help readers, but not writers)

Only the batch size from the client should be reduced that will help when ratis is enabled in OM. Right now that batch is defaulted to 1000.

@xiaoyuyao
Contributor

When ratis is enabled, periodic lock release on the server is of no help, as we have a single thread executor. (It can help readers, but not writers)

Good point. I agree with that. Blocking readers is not good for throughput as well. Let's open a separate JIRA to tune the default batch size to minimize the perf impact on readers and non-HA cases.

I will merge this one shortly. Thanks @captainzmc for the patience/contribution and all for the reviews and discussion.

@xiaoyuyao xiaoyuyao merged commit d34ab29 into apache:master Aug 28, 2020
@bharatviswa504
Contributor

Good point. I agree with that. Blocking readers is not good for throughput as well. Let's open a separate JIRA to tune the default batch size to minimize the perf impact on readers and non-HA cases.

I have not understood what we want to change. Should the default batch size be controlled in the client, or should there be another parameter on the server to release the lock?

rakeshadr pushed a commit to rakeshadr/hadoop-ozone that referenced this pull request Sep 3, 2020
errose28 added a commit to errose28/ozone that referenced this pull request Sep 11, 2020
* master: (26 commits)
  HDDS-4167. Acceptance test logs missing if fails during cluster startup (apache#1366)
  HDDS-4121. Implement OmMetadataMangerImpl#getExpiredOpenKeys. (apache#1351)
  HDDS-3867. Extend the chunkinfo tool to display information from all nodes in the pipeline. (apache#1154)
  HDDS-4077. Incomplete OzoneFileSystem statistics (apache#1329)
  HDDS-3903. OzoneRpcClient support batch rename keys. (apache#1150)
  HDDS-4151. Skip the inputstream while offset larger than zero in s3g (apache#1354)
  HDDS-4147. Add OFS to FileSystem META-INF (apache#1352)
  HDDS-4137. Turn on the verbose mode of safe mode check on testlib (apache#1343)
  HDDS-4146. Show the ScmId and ClusterId in the scm web ui. (apache#1350)
  HDDS-4145. Bump version to 1.1.0-SNAPSHOT on master (apache#1349)
  HDDS-4109. Tests in TestOzoneFileSystem should use the existing MiniOzoneCluster (apache#1316)
  HDDS-4149. Implement OzoneFileStatus#toString (apache#1356)
  HDDS-4153. Increase default timeout in kubernetes tests (apache#1357)
  HDDS-2411. add a datanode chunk validator fo datanode chunk generator (apache#1312)
  HDDS-4140. Auto-close /pending pull requests after 21 days of inactivity (apache#1344)
  HDDS-4152. Archive container logs for kubernetes check (apache#1355)
  HDDS-4056. Convert OzoneAdmin to pluggable model (apache#1285)
  HDDS-3972. Add option to limit number of items displaying through ldb tool. (apache#1206)
  HDDS-4068. Client should not retry same OM on network connection failure (apache#1324)
  HDDS-4062. Non rack aware pipelines should not be created if multiple racks are alive. (apache#1291)
  ...