HDDS-3903. OzoneRpcClient support batch rename keys. #1150
Conversation
(force-pushed 1976e7d to af9aedb)
Hi @xiaoyuyao, this PR's implementation is consistent with HDDS-3286 batch delete, and I split batch rename into two subtasks. This one covers mainly the OM-side implementation. Could you help review it?
 * @param keyMap The key is the new key name and the value is the original key's OmKeyArgs.
 * @throws IOException
 */
void renameKeys(Map<String, OmKeyArgs> keyMap) throws IOException;
Should we make this Map<OmKeyArgs, String> so that it is consistent with the renameKey() API and the batch of renames does not have to share the same volume/bucket name, e.g. /vol1/bucket1/key1 -> /vol1/bucket2/key2 and /vol2/bucket1/key1 -> /vol2/bucket1/key2?

I'll make this Map<OmKeyArgs, String> to be consistent with the renameKey() API.
In addition, the API was added to OzoneBucket and currently only supports renaming keys under the same bucket.
OzoneBucket bucket = volume.getBucket(bucketName);
Map<String, String> keyMap = new HashMap<>();
keyMap.put(keyName1, newKeyName1);
bucket.renameKeys(keyMap);
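The map orientation under discussion can be sketched outside Ozone with a hypothetical stand-in for OmKeyArgs (KeyRef, describe, and every name below are illustrative, not the real client API): keying the batch by the source reference lets each entry carry its own volume/bucket, mirroring renameKey(args, toKeyName).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for OmKeyArgs: identifies the source key,
// including its volume and bucket, so each entry in the batch may
// live in a different volume/bucket.
final class KeyRef {
    final String volume, bucket, key;
    KeyRef(String volume, String bucket, String key) {
        this.volume = volume; this.bucket = bucket; this.key = key;
    }
    @Override public String toString() {
        return "/" + volume + "/" + bucket + "/" + key;
    }
}

public class BatchRenameShape {
    // Map<source, newName> mirrors renameKey(args, toKeyName): the key
    // carries the full source address, the value is only the new name.
    public static Map<String, String> describe(Map<KeyRef, String> batch) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<KeyRef, String> e : batch.entrySet()) {
            KeyRef src = e.getKey();
            out.put(src.toString(),
                "/" + src.volume + "/" + src.bucket + "/" + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<KeyRef, String> batch = new LinkedHashMap<>();
        batch.put(new KeyRef("vol1", "bucket1", "key1"), "key2");
        batch.put(new KeyRef("vol2", "bucket1", "key1"), "key2");
        System.out.println(describe(batch));
    }
}
```

With the reversed Map<String, OmKeyArgs> orientation, two renames targeting the same new name would collide in the map even when they belong to different buckets.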
    .setBucketName(args.getBucketName())
    .setKeyName(args.getKeyName())
    .setModificationTime(Time.now())
    .setDataSize(args.getDataSize()).build();
Do we need to update the data size here?
 */
public class OmRenameKeyInfo {

  private String toKeyName;

Doesn't newKeyInfo.keyName already cover toKeyName?
Preconditions.checkNotNull(renameKeysRequest);

return getOmRequest().toBuilder()
    .setRenameKeysRequest(renameKeysRequest.toBuilder())

I think we need to set the modification time here.
String objectKey = omMetadataManager.getOzoneKey(volumeName, bucketName,
    fromKeyName);
OmKeyInfo omKeyInfo = omMetadataManager.getKeyTable().get(objectKey);
unRenamedKeys.add(omKeyInfo);

unRenamedKeys is not updated inside the loop. Can we add a unit test for the returned unRenamedKeys upon failure?
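The contract such a unit test would pin down can be modelled against a toy in-memory key table. The sketch below (renameKeys and all names are illustrative, assuming the batch aborts at the first failure): on failure, the returned list holds every key that was not renamed, including the failing one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the reviewer's point: on any failure, every key that was
// not renamed (including the one that failed) should be reported back.
public class RenameBatch {
    public static List<String> renameKeys(Map<String, String> table,
                                          Map<String, String> batch) {
        List<String> unRenamed = new ArrayList<>(batch.keySet());
        for (Map.Entry<String, String> e : batch.entrySet()) {
            String from = e.getKey(), to = e.getValue();
            if (!table.containsKey(from) || table.containsKey(to)) {
                return unRenamed;     // abort: remaining keys stay unrenamed
            }
            table.put(to, table.remove(from));
            unRenamed.remove(from);   // this key made it
        }
        return unRenamed;             // empty on full success
    }
}
```

A test would then seed the table, submit a batch containing one nonexistent source, and assert exactly the un-renamed keys come back.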
Although we have put the lists of unDeletedKeys and unRenamedKeys into the response, renameKeys and deleteKeys in OzoneBucket.java and ClientProtocol.java currently still return void.
I added a TODO to OMClientRequest and created a jira (HDDS-3916); I'll change those separately.
LGTM overall, a few comments added inline...
Thanks @xiaoyuyao for the review. I have fixed the review issues. Can you take another look?
auditLog(auditLogger, buildAuditMessage(OMAction.RENAME_KEY, auditMap,
    exception, getOmRequest().getUserInfo()));

Shouldn't all renamed keys be logged to audit?
Thanks @adoroszlai for the review. Fixed this issue.
Thanks, but the new code only logs key names; other info, e.g. volume/bucket, is missing. Also, if the result is failure, then all keys are logged as "failure", although only the last key failed (and we don't even know which one is last, since auditMap is a hashmap).
Thanks @adoroszlai for the tip, I'll add the volume/bucket to the key.
On how to locate the failed key: we can see it in the exception on the client side. Typically, the failed keys are displayed in the log with an exception, such as:
17:45:30.550 [OM StateMachine ApplyTransaction Thread - 0] ERROR OMAudit - user=micahzhao | ip=127.0.0.1 | op=RENAME_KEY {dir/key2=dir/file2, dir/file1=dir/file2} | ret=FAILURE
org.apache.hadoop.ozone.om.exceptions.OMException: Key not found /a88fb570-5b5d-43db-81e0-d6597f4ea81f/4ad1e6c3-41c3-439d-8082-ae24a601ba38/dir/file1
at org.apache.hadoop.ozone.om.request.key.OMKeysRenameRequest.validateAndUpdateCache……
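The fix being discussed, qualifying each audited key with its volume/bucket and preserving insertion order so the failing (last) entry is identifiable, could look roughly like this (buildAuditMap is a hypothetical helper, not the actual OM audit code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the reviewer's suggestion: qualify each audited key with its
// volume and bucket so a failure log line identifies the exact key.
public class RenameAudit {
    public static Map<String, String> buildAuditMap(
            String volume, String bucket, Map<String, String> renames) {
        // LinkedHashMap preserves insertion order, unlike a HashMap,
        // so the last (failing) entry remains identifiable in the log.
        Map<String, String> audit = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : renames.entrySet()) {
            audit.put("/" + volume + "/" + bucket + "/" + e.getKey(),
                      "/" + volume + "/" + bucket + "/" + e.getValue());
        }
        return audit;
    }
}
```

The audit message would then show fully qualified paths like the OMException above, instead of bare key names.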
(force-pushed f09420c to fe8c69a)
// If toKeyName is null, then we need to only delete the fromKeyName
// from KeyTable. This is the case of replay where toKey exists but
// fromKey has not been deleted.
if (deleteFromKeyOnly()) {
Adding to the cache should be done as part of validateAndUpdateCache. In HA, we return the response before the DB flush completes, so if we don't update the cache before returning the response, subsequent requests might think the key still exists.
Posted PR #1169 for DeleteKeys regarding the same issue.
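The HA point, that validateAndUpdateCache must publish the rename through the table cache before the DB flush, can be modelled with a minimal cache-over-table sketch (TableWithCache and its methods are illustrative, not Ozone's Table/CacheValue API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class TableWithCache {
    private final Map<String, String> db = new HashMap<>();
    // Optional.empty() marks a key deleted in the cache but not yet in the DB.
    private final Map<String, Optional<String>> cache = new HashMap<>();

    public void dbPut(String key, String value) { db.put(key, value); }

    // Reads consult the cache first, so un-flushed changes are visible.
    public String get(String key) {
        Optional<String> cached = cache.get(key);
        return cached != null ? cached.orElse(null) : db.get(key);
    }

    // What validateAndUpdateCache would do for one rename, before any flush:
    // record the deletion of fromKey and the creation of toKey in the cache.
    public void cacheRename(String fromKey, String toKey) {
        String value = get(fromKey);            // assumes fromKey exists
        cache.put(fromKey, Optional.empty());
        cache.put(toKey, Optional.ofNullable(value));
    }
}
```

Without the cacheRename step, a request arriving between the response and the flush would still see fromKey in the DB, the stale-read scenario described above.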
Also, the replay code is now not required, as it is taken care of by HDDS-3354.
I have one comment regarding updateCache which I noticed during OMKeysDeleteRequest; the same applies to OMKeysRenameRequest.
(force-pushed bde1daa to 56e7b56)
Hi @bharatviswa504, thanks for fixing OMKeysDeleteRequest. I have updated this PR based on DeleteKeys #1195 to make sure RenameKeys and DeleteKeys are implemented the same way. Could you please review this PR again?
Hi @xiaoyuyao @adoroszlai, I have updated this PR based on DeleteKeys #1195, in which Bharat fixed some issues. Could you please review this PR again?
Thanks @captainzmc for updating the PR based on #1195. I would like to suggest waiting until that PR is merged before updating this one again, to avoid having to resolve merge conflicts multiple times.
// Check if this transaction is a replay of ratis logs.
if (isReplay(ozoneManager, toKeyValue, trxnLogIndex)) {

  // Check if fromKey is still in the DB and created before this
  // replay.
  // For example, lets say we have the following sequence of
  // transactions.
  //   Trxn 1 : Create Key1
  //   Trxn 2 : Rename Key1 to Key2 -> Deletes Key1 and Creates Key2
  // Now if these transactions are replayed:
  //   Replay Trxn 1 : Creates Key1 again as it does not exist in DB
  //   Replay Trxn 2 : Key2 is not created as it exists in DB and
  //                   the request would be deemed a replay. But
  //                   Key1 is still in the DB and needs to be
  //                   deleted.
  fromKeyValue = omMetadataManager.getKeyTable().get(fromKey);
  if (fromKeyValue != null) {
    // Check if this replay transaction was after the fromKey was
    // created. If so, we have to delete the fromKey.
    if (ozoneManager.isRatisEnabled() &&
        trxnLogIndex > fromKeyValue.getUpdateID()) {
      acquiredLock =
          omMetadataManager.getLock().acquireWriteLock(BUCKET_LOCK,
              volumeName, bucketName);
      // Add to cache. Only fromKey should be deleted. ToKey already
      // exists in DB as this transaction is a replay.
      Table<String, OmKeyInfo> keyTable = omMetadataManager
          .getKeyTable();
      keyTable.addCacheEntry(new CacheKey<>(fromKey),
          new CacheValue<>(Optional.absent(), trxnLogIndex));
      renameKeyInfoList.add(new OmRenameKeyInfo(
          null, fromKeyValue));
    }
  }
Replay logic was removed in 90e8211. Can you please update the PR (also to resolve conflicts)?
Thanks @adoroszlai for your suggestion. I'll update this PR after #1195 is merged, and the conflict will be resolved together.
(force-pushed 235fdb0 to 2b0d6a6)
(force-pushed aa42088 to e698a7c)
Hi @adoroszlai @xiaoyuyao @bharatviswa504, #1195 has now been merged. I have updated this PR, removed the replay logic (as in #1082), and resolved the conflicts. Please take another look.
Thanks @captainzmc for updating the patch and resolving merge conflicts after lots of recent changes.
@Override
public void renameKeys(Map<OmKeyArgs, String> omKeyArgsMap)
    throws IOException {
  throw new NotImplementedException("OzoneManager does not require this to " +
Please consider changing this to UnsupportedOperationException, as NotImplementedException represents the case where the author has yet to implement the logic at this point in the program.
    .setVolumeName(args.getVolumeName())
    .setBucketName(args.getBucketName())
    .setKeyName(args.getKeyName())
    .setModificationTime(Time.now()).build();
Shouldn't the modification time be updated on the OM server side?
// rename nonexistent key
Map<String, String> keyMap1 = new HashMap<>();
keyMap1.put(keyName1, keyName2);
keyMap1.put(newKeyName2, keyName2);
try {
  bucket.renameKeys(keyMap1);
} catch (OMException ex) {
  Assert.assertEquals(PARTIAL_RENAME, ex.getResult());
}
I think this belongs in a separate test case.
// Listing all volumes in the cluster feature has to be fixed after HDDS-357.
// TODO: fix this
@Ignore
This was recently removed in HDDS-3062, so it seems to be accidentally added back (due to a merge conflict?).
volumeName = keyArgs.getVolumeName();
bucketName = keyArgs.getBucketName();
Please add support for bucket links. Example:

auditMap needs to be created beforehand, eg.:
OzoneOutputStream out1 = bucket.createKey(keyName1,
    value.getBytes().length, STAND_ALONE,
    ONE, new HashMap<>());
out1.write(value.getBytes());
out1.close();
Can you please extract this to a helper method (eg. createTestKey) to avoid duplication? (Also include the following bucket.getKey and assertEquals calls.)
// old key should not exist
try {
  bucket.getKey(keyName1);
} catch (OMException ex) {
  Assert.assertEquals(KEY_NOT_FOUND, ex.getResult());
}
Can you please extract this to a helper method (eg. assertKeyRenamed), and also include the preceding assertEquals(newKeyName...) call?
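The two helpers suggested in these comments could be shaped as below. This is only a sketch using a plain Map as a stand-in bucket, since the real versions would call bucket.createKey/bucket.getKey and JUnit's assertEquals (names follow the reviewer's suggestions; the bodies are illustrative):

```java
import java.util.Map;

public class RenameTestHelpers {
    // Creates a key and verifies it is readable, like the repeated
    // createKey/write/close/getKey/assertEquals sequence in the test.
    static void createTestKey(Map<String, String> bucket, String key, String value) {
        bucket.put(key, value);
        if (!value.equals(bucket.get(key))) {
            throw new AssertionError("key not readable after create: " + key);
        }
    }

    // Verifies a rename: the old key is gone and the new key holds the value.
    static void assertKeyRenamed(Map<String, String> bucket,
                                 String oldKey, String newKey, String value) {
        if (bucket.containsKey(oldKey)) {
            throw new AssertionError("old key still exists: " + oldKey);
        }
        if (!value.equals(bucket.get(newKey))) {
            throw new AssertionError("new key missing or wrong value: " + newKey);
        }
    }
}
```

Each rename scenario in the test then collapses to one createTestKey call per source key and one assertKeyRenamed call per expected rename.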
(force-pushed 18d5e1f to 6e6fd89)
(force-pushed de460c7 to e1070e9)
Thanks @captainzmc for continuing to update the patch.
(force-pushed ad1a789 to 5e3fe89)
(force-pushed 5e3fe89 to 9f5fda0)
Thanks @captainzmc for changing the protocol. I think it is much improved now; I only have a couple of minor comments left.
…ne/om/request/key/OMKeysRenameRequest.java avoid use toString. Co-authored-by: Doroszlai, Attila <6454655+adoroszlai@users.noreply.github.com>
Thanks for @adoroszlai's feedback. I updated the PR and accepted the suggestions.
fromKeyValue.setModificationTime(Time.now());

acquiredLock =
    omMetadataManager.getLock().acquireWriteLock(BUCKET_LOCK,
I think we should acquire the lock here and release it at the end of the operation.
Let's say this thread checked the fromKey and toKey while another thread, holding the lock, deleted the fromKey; once the delete completes and this thread acquires the lock, it updates the cache and renames to toKey.
So now we have renamed a deleted key.
There can also be other scenarios, e.g. commitKey commits toKey while this thread assumes there is no such key.
To avoid these kinds of scenarios, the check of the key in the table and the add to the response should be done while holding the lock.
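The race described here is a classic check-then-act problem. A minimal sketch (a plain ReentrantLock standing in for the OM bucket lock; all names illustrative) shows why the existence checks and the mutation must sit inside the same critical section:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Model of the race: if the fromKey/toKey checks ran before lock(), a
// concurrent delete or commit could invalidate them before the rename.
public class LockedRename {
    private final ReentrantLock bucketLock = new ReentrantLock();
    private final Map<String, String> keys = new HashMap<>();

    public void put(String key, String value) { keys.put(key, value); }

    public boolean rename(String from, String to) {
        bucketLock.lock();
        try {
            // Check and mutate atomically with respect to other writers.
            if (!keys.containsKey(from) || keys.containsKey(to)) {
                return false;
            }
            keys.put(to, keys.remove(from));
            return true;
        } finally {
            bucketLock.unlock();
        }
    }

    public boolean contains(String key) { return keys.containsKey(key); }
}
```

Moving the containsKey checks above bucketLock.lock() reintroduces exactly the "renamed a deleted key" window described in the comment.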
Thanks @bharatviswa504. Agreed; I updated this PR to fix the problem.
I have one comment; overall LGTM.
Thank you @captainzmc for updating the patch.
Just a question: will this change cause any compatibility issues? (As this is a new API, only new client code will use it; just checking whether I missed anything.)
cc @avijayanhwx
Thanks for @bharatviswa504's review. RenameKeys is a new API and does not cause compatibility issues; older clients can continue to use renameKey.
One thing with this approach is that we might soon need HDDS-2939: in non-HA, while a batch delete/rename is happening, it might block/affect other write operations.
In your tests, was the test performed with only the rename happening, or were other write requests happening too?
As this is the same problem batch delete has, I am fine with it. (These kinds of operations will become a problem in HA, as it is a single-thread executor; I think @xiaoyuyao also brought this up during the review of HDDS-3930.)
Overall changes LGTM. (As HDDS-2939 will solve rename for directories, I am okay to get this in.)
Hi @adoroszlai @bharatviswa504, is there any progress here?
The downside of the new API is that it will hold the lock for a long time. This will have a significant perf impact on concurrent operations, and that will come back to bite us. I think the correct long-term fix is HDDS-2939, which will support atomic rename and delete operations.
@captainzmc is it a viable option to wait for HDDS-2939 which will add atomic rename capability to OM?
Thanks @arp7 for taking the time to discuss this PR.
Thanks @xiaoyuyao, let's get this in. Can we file a follow-up jira for the periodic lock release? HDDS-2939 will eventually implement atomic and efficient rename, but it will take some time.
When ratis is enabled, periodic lock release on the server is of no help, as we have a single-thread executor. (It can help readers, but not writers.) Only the batch size from the client should be reduced; that will help when ratis is enabled in OM. Right now that batch size defaults to 1000.
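Client-side, the tuning discussed here amounts to chopping a large batch into smaller requests so each one holds the bucket lock only briefly. A generic sketch (split is a hypothetical helper; the 1000 default is the batch size mentioned in this thread):

```java
import java.util.ArrayList;
import java.util.List;

// Splits one large rename batch into smaller sub-batches, so each OM
// request holds the bucket lock for a shorter time.
public class BatchSplitter {
    public static <T> List<List<T>> split(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }
}
```

With a default of 1000, a 5000-key rename would become five sequential requests; lowering the batch size trades total latency for shorter lock-hold times per request.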
> When ratis is enabled, periodic lock release on the server is of no help, as we have a single thread executor. (It can help readers, but not writers)

Good point, I agree with that. Blocking readers is not good for throughput either. Let's open a separate jira to tune the default batch size to minimize the perf impact on readers and on non-HA cases. I will merge this one shortly. Thanks @captainzmc for the patience/contribution, and all for the reviews and discussion.
I have not understood what we want to change.
* master: (26 commits)
  HDDS-4167. Acceptance test logs missing if fails during cluster startup (apache#1366)
  HDDS-4121. Implement OmMetadataMangerImpl#getExpiredOpenKeys. (apache#1351)
  HDDS-3867. Extend the chunkinfo tool to display information from all nodes in the pipeline. (apache#1154)
  HDDS-4077. Incomplete OzoneFileSystem statistics (apache#1329)
  HDDS-3903. OzoneRpcClient support batch rename keys. (apache#1150)
  HDDS-4151. Skip the inputstream while offset larger than zero in s3g (apache#1354)
  HDDS-4147. Add OFS to FileSystem META-INF (apache#1352)
  HDDS-4137. Turn on the verbose mode of safe mode check on testlib (apache#1343)
  HDDS-4146. Show the ScmId and ClusterId in the scm web ui. (apache#1350)
  HDDS-4145. Bump version to 1.1.0-SNAPSHOT on master (apache#1349)
  HDDS-4109. Tests in TestOzoneFileSystem should use the existing MiniOzoneCluster (apache#1316)
  HDDS-4149. Implement OzoneFileStatus#toString (apache#1356)
  HDDS-4153. Increase default timeout in kubernetes tests (apache#1357)
  HDDS-2411. add a datanode chunk validator fo datanode chunk generator (apache#1312)
  HDDS-4140. Auto-close /pending pull requests after 21 days of inactivity (apache#1344)
  HDDS-4152. Archive container logs for kubernetes check (apache#1355)
  HDDS-4056. Convert OzoneAdmin to pluggable model (apache#1285)
  HDDS-3972. Add option to limit number of items displaying through ldb tool. (apache#1206)
  HDDS-4068. Client should not retry same OM on network connection failure (apache#1324)
  HDDS-4062. Non rack aware pipelines should not be created if multiple racks are alive. (apache#1291)
  ...
What changes were proposed in this pull request?
Currently, renaming a folder means getting all the keys and then renaming them one by one, which makes for poor performance.
HDDS-2939 will be able to optimize this part, but it is progressing slowly and still has a long way to go. So we optimized the batch operation based on the current interface; with this PR we get better performance until HDDS-2939 comes in.
This PR is a subtask of batch rename and first makes OzoneRpcClient support batch renaming of keys.
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-3903
How was this patch tested?
A unit test has been added.