HDDS-4122. Implement OM Delete Expired Open Key Request and Response #1435

Merged: 74 commits into apache:master on Oct 13, 2020

Conversation

Contributor

@errose28 errose28 commented Sep 17, 2020

What changes were proposed in this pull request?

Implement the OM request and response for moving keys from the open key table to the deleted table. These will be used as part of the parent JIRA HDDS-4120 to implement the open key cleanup service. This pull request also moves code that was duplicated between the existing OM key(s) delete request/response and the new open keys delete request/response into shared classes, and refactors the OM key(s) delete request/response to use this shared code.

Volume and bucket level byte usage will eventually need to be updated by the new open keys delete request/response, but this is not done in this pull request. The current method used to do this in HDDS-541 can lead to DB inconsistency. When this is resolved in HDDS-4308, it can be added to the open key request/responses.
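
As a rough illustration of the core table move the new response performs (a simplified sketch, not the PR's actual code: plain maps stand in for the RocksDB-backed tables and a String stands in for OmKeyInfo):

import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: HashMaps stand in for the open key table and
// the deleted table, and a String stands in for the serialized OmKeyInfo.
public class OpenKeyMoveSketch {
  public static void main(String[] args) {
    Map<String, String> openKeyTable = new HashMap<>();
    Map<String, String> deletedTable = new HashMap<>();

    // Open keys are stored under a name that includes the client ID.
    String dbOpenKey = "/vol1/bucket1/key1/12345";
    openKeyTable.put(dbOpenKey, "keyInfo-for-key1");

    // Deleting an expired open key: drop it from the open key table and
    // record it in the deleted table so its blocks can be reclaimed later.
    String keyInfo = openKeyTable.remove(dbOpenKey);
    if (keyInfo != null) {
      deletedTable.put(dbOpenKey, keyInfo);
    }

    System.out.println("open keys:    " + openKeyTable);
    System.out.println("deleted keys: " + deletedTable);
  }
}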

What is the link to the Apache JIRA

HDDS-4122

How was this patch tested?

Unit tests were added for the new OMOpenKeysDeleteRequest and OMOpenKeysDeleteResponse classes.

* master: (26 commits)
  HDDS-4167. Acceptance test logs missing if fails during cluster startup (apache#1366)
  HDDS-4121. Implement OmMetadataMangerImpl#getExpiredOpenKeys. (apache#1351)
  HDDS-3867. Extend the chunkinfo tool to display information from all nodes in the pipeline. (apache#1154)
  HDDS-4077. Incomplete OzoneFileSystem statistics (apache#1329)
  HDDS-3903. OzoneRpcClient support batch rename keys. (apache#1150)
  HDDS-4151. Skip the inputstream while offset larger than zero in s3g (apache#1354)
  HDDS-4147. Add OFS to FileSystem META-INF (apache#1352)
  HDDS-4137. Turn on the verbose mode of safe mode check on testlib (apache#1343)
  HDDS-4146. Show the ScmId and ClusterId in the scm web ui. (apache#1350)
  HDDS-4145. Bump version to 1.1.0-SNAPSHOT on master (apache#1349)
  HDDS-4109. Tests in TestOzoneFileSystem should use the existing MiniOzoneCluster (apache#1316)
  HDDS-4149. Implement OzoneFileStatus#toString (apache#1356)
  HDDS-4153. Increase default timeout in kubernetes tests (apache#1357)
  HDDS-2411. add a datanode chunk validator fo datanode chunk generator (apache#1312)
  HDDS-4140. Auto-close /pending pull requests after 21 days of inactivity (apache#1344)
  HDDS-4152. Archive container logs for kubernetes check (apache#1355)
  HDDS-4056. Convert OzoneAdmin to pluggable model (apache#1285)
  HDDS-3972. Add option to limit number of items displaying through ldb tool. (apache#1206)
  HDDS-4068. Client should not retry same OM on network connection failure (apache#1324)
  HDDS-4062. Non rack aware pipelines should not be created if multiple racks are alive. (apache#1291)
  ...
* HDDS-4122-abstract-om-response:
  Move common code from key delete requests into abstract class
  Create first draft of AbstractOMKeyDeleteResponse
…unit-tests

* HDDS-4122-unit-tests-new-openkey-proto:
  All unit tests pass with new proto structure
  Update proto layout and incorporate changes to OMOpenKeyDeleteRequest
  Add protos to better represent open keys to delete
The tests uncover a failure to delete keys from the open key table.
…be fully recreated from OmKeyInfo

ClientID is not stored in OmKeyInfo, but is needed for the open key's full name as stored in the DB.
* HDDS-4122-response-unit-tests:
  Add documentation to OMOpenKeyDeleteResponse unit tests
  Pass key names to OMResponse for open keys, since their names cannot be fully recreated from OmKeyInfo
  Implement unit tests for OMOpenKeyDeleteResponse
* HDDS-4122-unit-tests:
  Add documentation to OMOpenKeyDeleteResponse unit tests
  Pass key names to OMResponse for open keys, since their names cannot be fully recreated from OmKeyInfo
  Implement unit tests for OMOpenKeyDeleteResponse
  All unit tests pass with new proto structure
  Update proto layout and incorporate changes to OMOpenKeyDeleteRequest
  Add protos to better represent open keys to delete
  Comment out unfinished audit log and metrics code for testing
  First implementation of unit tests for OMOpenKeyDeleteRequest
  Refactor testing utils to more easily manipulate keys
  Add helper methods needed to write unit tests
* HDDS-4122-log-and-metrics:
  Add unit test for OMOpenKeyDeleteRequest metrics, and fix metrics bugs
  Implement metrics for OMOpenKeyDeleteRequest
  Add logging to OMOpenKeyDeleteRequest
* HDDS-4122-volume-usage:
  Update docs and method names in unit tests
  Update documentation and method names in TestOMOpenKeyDeleteRequest
  Add passing test for volume byte usage update
  Move creation of blocks for keyinfo to TestOMRequestUtils
  Remove unneeded exception in method signature
  Add volume update test to unit tests for OMOpenKeyDeleteResponse
  Add todo comments with testing plans
…tes on master

They no longer call the shared helper methods for operations.
* master:
  HDDS-4102. Normalize Keypath for lookupKey. (apache#1328)
  HDDS-4263. ReplicatiomManager shouldn't consider origin node Id for CLOSED containers. (apache#1438)
  HDDS-4282. Improve the emptyDir syntax (apache#1450)
  HDDS-4194. Create a script to check AWS S3 compatibility (apache#1383)
  HDDS-4270. Add more reusable byteman scripts to debug ofs/o3fs performance (apache#1443)
  HDDS-2660. Create insight point for datanode container protocol (apache#1272)
  HDDS-3297. Enable TestOzoneClientKeyGenerator. (apache#1442)
  HDDS-4324. Add important comment to ListVolumes logic (apache#1417)
  HDDS-4236. Move "Om*Codec.java" to new project hadoop-ozone/interface-storage (apache#1424)
  HDDS-4254. Bucket space: add usedBytes and update it when create and delete key. (apache#1431)
  HDDS-2766. security/SecuringDataNodes.md (apache#1175)
  HDDS-4206. Attempt pipeline creation more frequently in acceptance tests (apache#1389)
  HDDS-4233. Interrupted exeception printed out from DatanodeStateMachine (apache#1416)
  HDDS-3947: Sort DNs for client when the key is a file for #getFileStatus #listStatus APIs (apache#1385)
  HDDS-3102. ozone getconf command should use the GenericCli parent class (apache#1410)
  HDDS-3981. Add more debug level log to XceiverClientGrpc for debug purpose (apache#1214)
  HDDS-4255. Remove unused Ant and Jdiff dependency versions (apache#1433)
  HDDS-4247. Fixed log4j usage in some places (apache#1426)
  HDDS-4241. Support HADOOP_TOKEN_FILE_LOCATION for Ozone token CLI. (apache#1422)
Removes calls to new helper methods.
* HDDS-4122-remove-code-consolidation: (21 commits)
  Restore files that had deduplicated code from master
  Revert other delete request/response files back to their original states on master
  HDDS-4102. Normalize Keypath for lookupKey. (apache#1328)
  HDDS-4263. ReplicatiomManager shouldn't consider origin node Id for CLOSED containers. (apache#1438)
  HDDS-4282. Improve the emptyDir syntax (apache#1450)
  HDDS-4194. Create a script to check AWS S3 compatibility (apache#1383)
  HDDS-4270. Add more reusable byteman scripts to debug ofs/o3fs performance (apache#1443)
  HDDS-2660. Create insight point for datanode container protocol (apache#1272)
  HDDS-3297. Enable TestOzoneClientKeyGenerator. (apache#1442)
  HDDS-4324. Add important comment to ListVolumes logic (apache#1417)
  HDDS-4236. Move "Om*Codec.java" to new project hadoop-ozone/interface-storage (apache#1424)
  HDDS-4254. Bucket space: add usedBytes and update it when create and delete key. (apache#1431)
  HDDS-2766. security/SecuringDataNodes.md (apache#1175)
  HDDS-4206. Attempt pipeline creation more frequently in acceptance tests (apache#1389)
  HDDS-4233. Interrupted exeception printed out from DatanodeStateMachine (apache#1416)
  HDDS-3947: Sort DNs for client when the key is a file for #getFileStatus #listStatus APIs (apache#1385)
  HDDS-3102. ozone getconf command should use the GenericCli parent class (apache#1410)
  HDDS-3981. Add more debug level log to XceiverClientGrpc for debug purpose (apache#1214)
  HDDS-4255. Remove unused Ant and Jdiff dependency versions (apache#1433)
  HDDS-4247. Fixed log4j usage in some places (apache#1426)
  ...
@errose28 errose28 marked this pull request as ready for review September 28, 2020 20:04
Contributor

@avijayanhwx avijayanhwx left a comment

Thanks for working on this @errose28. Clean implementation. I am +1 on these changes as they are. Let's wait for a review from the OM team.

long bytesUsed = 0;
int keyFactor = omKeyInfo.getFactor().getNumber();
OmKeyLocationInfoGroup keyLocationGroup =
    omKeyInfo.getLatestVersionLocations();
Contributor

Good point to update the used bytes while doing this cleanup. I am wondering what this would mean with multiple key version support in the future. We do not seem to store the "version" of the current open key.

@bharatviswa504
Contributor

I will take a look at it today.
Thanks, @avijayanhwx for tagging.

List<OpenKeyBucket> submittedOpenKeyBucket =
    deleteOpenKeysRequest.getOpenKeysPerBucketList();

long numSubmittedOpenKeys = submittedOpenKeyBucket.stream()
Contributor

Instead of streams, use a for loop and compute the count, since request execution is on the hot code path.
I have seen a few recent JIRAs that moved away from streams and improved performance.
Can we use a good old for loop here?
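
For illustration, a plain-loop version of that count (a sketch under the assumption that OpenKeyBucket is the generated protobuf message and getKeysCount() its repeated-field count accessor, not necessarily the PR's final code):

// Sketch: compute the submitted key count with a for loop instead of a
// stream. getKeysCount() is assumed to be the generated accessor for the
// repeated keys field of OpenKeyBucket.
long numSubmittedOpenKeys = 0;
for (OpenKeyBucket bucket : submittedOpenKeyBucket) {
  numSubmittedOpenKeys += bucket.getKeysCount();
}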

// keys deleted.
// The volume args object being updated is a reference from the
// cache, so this serves as a cache update.
if (volumeArgs != null) {
Contributor

Looks like we need a volume lock here, as we are updating the bytesUsed.

Contributor Author

I think we are safe because the bytes used value is stored internally in a thread-safe LongAdder. See the original PR where this was introduced. If there is still an issue with this approach, then most of the request classes will need to be modified after HDDS-4053. We should discuss further, as this is really an issue with the design already introduced on master for HDDS-4053 rather than with this PR.
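
As a small, self-contained illustration of why concurrent decrements of the counter itself are safe (the flush-ordering concern discussed below is a separate issue), here is a sketch using LongAdder; the class and method names are hypothetical, not the OmVolumeArgs API:

import java.util.concurrent.atomic.LongAdder;

// Self-contained sketch: LongAdder tolerates concurrent updates without
// explicit locking, which is why the in-memory decrement itself is safe.
// The names here are hypothetical, not the OmVolumeArgs API.
public class UsedBytesSketch {
  private final LongAdder usedBytes = new LongAdder();

  void releaseBytes(long keyBytes) {
    // Concurrent callers may decrement at the same time; the adder keeps
    // the running total consistent.
    usedBytes.add(-keyBytes);
  }

  public static void main(String[] args) throws InterruptedException {
    UsedBytesSketch volume = new UsedBytesSketch();
    volume.usedBytes.add(10_000);

    Thread t1 = new Thread(() -> volume.releaseBytes(3_000));
    Thread t2 = new Thread(() -> volume.releaseBytes(2_000));
    t1.start();
    t2.start();
    t1.join();
    t2.join();

    System.out.println(volume.usedBytes.sum()); // prints 5000
  }
}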

Contributor Author

Volume byte usage updates will be removed from the open keys delete request and response classes. See this comment.

private void subtractUsedBytes(OmVolumeArgs volumeArgs,
    Collection<OmKeyInfo> keyInfos) {

long quotaReleased = keyInfos.stream()
Contributor

Same here: avoid the stream.

if (volumeArgs != null) {
// If we already encountered the volume, it was a reference to
// the same object from the cache, so this will update it.
modifiedVolumes.put(volumeName, volumeArgs);
Contributor

Also, getting a cached object here will cause issues.
If the double buffer flush has not yet flushed this to the DB, and another thread uses the same volumeArgs reference and updates it, we will be writing an inconsistent state to the DB.

So getVolumeInfo should use the Table#get API.

Contributor Author

Just to clarify, is this the execution you are talking about?

  1. Request1 deletes key1 from volume1 in cache.
  2. Request2 deletes key2 from volume1 in cache.
  3. Request1 sets cached VolumeArgs object volArgs.bytesUsed -= key1.bytesUsed.
    • divergence 1: The cache shows key1 and key2 as deleted, but cache byte usage only reflects key1's deletion.
  4. Request2 sets cached VolumeArgs object volArgs.bytesUsed -= key2.bytesUsed.
    • At this point, byte usage in the cache is consistent with the keys it shows as deleted.
  5. Response1 is processed, committing volArgs and the deletion of key1 to the DB.
    • divergence 2: the DB shows only key1 deleted, but volume byte usage has been set as if both key1 and key2 were deleted.
  6. Response2 is processed, committing volArgs to the DB again, and committing the deletion of key2 to the DB.
    • Now the keys deleted and bytes used align in the DB.

IIRC the entire volume table is stored in memory and only persisted to the DB to save state, so reads of volume metadata only happen from the in-memory cache. In this case, divergence 2 will never be detected by callers since it only exists at the DB level. Divergence 1 may exist briefly and be detected by callers. Again, this is really an issue with all requests modified in HDDS-4053 and not just this PR. We should discuss whether the slight inconsistency warrants a whole-volume lock on all requests that modify byte usage.

Contributor

@bharatviswa504 bharatviswa504 Oct 5, 2020

Not exactly. What I mean is that by the time we process request1 we have added the same object to the double buffer; if another thread processing request2 updates it, there is a chance the DB state gets updated as well (technically this should only happen after the response has been added to the double buffer).

The cache is for holding in-flight updates that have not yet been committed to the DB; I see no issue with that, it is by design.

Divergence 1 should not exist if volume locks are held.

Contributor

https://issues.apache.org/jira/browse/HDDS-2344 is a related JIRA; here it is just a value being updated (so it might not produce a ConcurrentModificationException), but it can provide some context.

Contributor Author

@errose28 errose28 Oct 6, 2020

Thanks for the explanation @bharatviswa504. I now see that divergence 2 in the above example poses an issue if an OM crash happens between steps 5 and 6: the byte usage update would be applied twice in the DB after the OM restarts. Volume byte usage updates will be removed from the open key requests and responses. Since this is really a larger problem with all requests/responses operating this way under HDDS-541, the byte usage updates can be added when a solution is developed for all requests/responses as part of HDDS-4308.

result = Result.FAILURE;
exception = ex;
omClientResponse =
    new OMKeyDeleteResponse(createErrorOMResponse(omResponse, exception));
Contributor

OMKeyDeleteResponse -> OMOpenKeysDeleteResponse

* @param keyInfo
* @return if empty true, else false.
*/
private boolean isKeyEmpty(@Nullable OmKeyInfo keyInfo) {
Contributor

It looks like some of the logic, such as isKeyEmpty and deleteFromTable, is common to OMKeyDeleteResponse and OMOpenKeysDeleteResponse.
Can we consolidate it and use this AbstractOMKeyDeleteResponse as the base class for both?

Contributor Author

Yes, the idea behind creating this abstract class was to eventually consolidate the duplicate code between OMOpenKeysDeleteResponse, OMKeyDeleteResponse, and OMKeysDeleteResponse. I had originally refactored the other classes to use this code as well, but since HDDS-541 (quota support) is moving along at a brisk pace, I could not keep up with the merge conflicts as the other response classes kept changing, and decided it was better to do this in a later PR.
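
For illustration, a rough sketch of the shape such a shared base class could take; the class and field names below are hypothetical stand-ins, not the PR's actual AbstractOMKeyDeleteResponse:

import java.util.List;

// Illustrative sketch only: a shared base class hosting helpers that both
// delete responses could reuse. KeyInfoSketch is a hypothetical stand-in
// for OmKeyInfo.
abstract class AbstractKeyDeleteResponseSketch {

  static final class KeyInfoSketch {
    final List<Long> blockIds;

    KeyInfoSketch(List<Long> blockIds) {
      this.blockIds = blockIds;
    }
  }

  // A key with no blocks holds no data, so it can be removed without
  // queuing anything in the deleted table for block reclamation.
  protected boolean isKeyEmpty(KeyInfoSketch keyInfo) {
    return keyInfo == null || keyInfo.blockIds.isEmpty();
  }
}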

Contributor Author

Looks like development on the key(s) delete request and response classes has taken a break. I have refactored them to use these shared methods now.

Contributor

@bharatviswa504 bharatviswa504 left a comment

Thank you @errose28 for the contribution, and nice to see such extensive test coverage.
I have a few comments, but the overall approach LGTM.

Accidentally deleted the volume and bucket info update code from
OMKeysDeleteResponse when refactoring earlier.
* origin/HDDS-4122: (51 commits)
  Rename method getBytesUsed -> sumBlockLengths
  Restore files that had deduplicated code from master
  Revert other delete request/response files back to their original states on master
  Make plural of keys consistent in file and class names
  Fix checkstyle errors
  Update docs and method names in unit tests
  Update documentation and method names in TestOMOpenKeyDeleteRequest
  Add passing test for volume byte usage update
  Move creation of blocks for keyinfo to TestOMRequestUtils
  Remove unneeded exception in method signature
  Add volume update test to unit tests for OMOpenKeyDeleteResponse
  Rename method for updating volume args, and remove check in response for volume existing
  Fix cache update for volume info in OMOpenKeyDeleteRequest
  Remove unused imports
  Add check for volume existence before updating volume byte usage
  Fix unit test for OMKeysDeleteResponse that expected empty keys to be in the delete table
  Add table cleanup annotation to AbstractOMKeyDeleteResponse
  Add DeleteOpenKeys as a non-readonly operation
  Add todo comments with testing plans
  Only update volume quota if volume still exists.
  ...
Contributor

@bharatviswa504 bharatviswa504 left a comment

One minor comment.
Once that is addressed, this is good to commit.

* {@code fromTable} to the batch operation {@code batchOperation}. The
* batch operation is not committed, so no changes are persisted to disk.
*/
protected void addDeletionToBatch(
Contributor

Minor: can we merge these two functions?

Contributor Author

This is actually related to a mistake I made in OMKeysDeleteResponse. The original implementation used one trxnLogIndex for all the keys, while all other calls to this method use the updateID of the provided keyInfo as the trxnLogIndex. If the way I am doing it currently is acceptable (OMKeysDeleteResponse uses the updateID of each key as its trxnLogIndex instead of one value for all deleted keys), then I can remove the overload. If not, I can fix OMKeysDeleteResponse to call the overload, giving it behavior identical to its original implementation.

Contributor

Previously we used the updateID for detecting replay of transactions; we no longer use the updateID for that.
But from my understanding, the updateID should have been set to the transactionIndex even before.

Contributor Author

Got it. A closer look at OMKeysDeleteRequest shows this was happening anyway: instead of setting the key info's update ID to the trxnLogIndex and submitting the key info to the response, it was passing the trxnLogIndex to the response separately alongside the key info. I will update OMKeysDeleteRequest/Response to be consistent with the other requests/responses in how they handle this, and remove the overload of this method.
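
A minimal sketch of the pattern described above, with hypothetical names standing in for the OM classes: the request stamps the key info with the transaction log index, and the response later reads it back instead of taking a separate trxnLogIndex parameter.

// Illustrative sketch only: names and signatures are hypothetical
// stand-ins for OmKeyInfo and the delete request/response classes.
final class KeyInfoSketch {
  private long updateID;

  void setUpdateID(long id) {
    this.updateID = id;
  }

  long getUpdateID() {
    return updateID;
  }
}

final class DeleteRequestSketch {
  KeyInfoSketch prepareForResponse(KeyInfoSketch keyInfo, long trxnLogIndex) {
    // Stamp the key info with the transaction index here in the request;
    // the response can then use keyInfo.getUpdateID() when writing to the
    // batch, so a single deleteFromTable overload suffices.
    keyInfo.setUpdateID(trxnLogIndex);
    return keyInfo;
  }
}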

Contributor

@bharatviswa504 bharatviswa504 left a comment

+1 LGTM

@bharatviswa504 bharatviswa504 merged commit 7db0ea8 into apache:master Oct 13, 2020
@bharatviswa504
Contributor

Thank you @errose28 for the contribution and @avijayanhwx for the review.

errose28 added a commit to errose28/ozone that referenced this pull request Oct 14, 2020
* master: (23 commits)
  HDDS-4122. Implement OM Delete Expired Open Key Request and Response (apache#1435)
  HDDS-4336. ContainerInfo does not persist BCSID (sequenceId) leading to failed replica reports (apache#1488)
  Remove extra serialization from getBlockID (apache#1470)
  HDDS-4262. Use ClientID and CallID from Rpc Client to detect retry requests (apache#1436)
  HDDS-4285. Read is slow due to frequent calls to UGI.getCurrentUser() and getTokens() (apache#1454)
  HDDS-4312. findbugs check succeeds despite compile error (apache#1476)
  HDDS-4311. Type-safe config design doc points to OM HA (apache#1477)
  HDDS-3814. Drop a column family through debug cli tool (apache#1083)
  HDDS-3728. Bucket space: check quotaUsageInBytes when write key and allocate block. (apache#1458)
  HDDS-4316. Upgrade to angular 1.8.0 due to CVE-2020-7676 (apache#1481)
  HDDS-4325. Incompatible return codes from Ozone getconf -confKey (apache#1485). Contributed by Doroszlai, Attila.
  HDDS-4309. Fix inconsistency in recon config keys starting with recon and not ozone (apache#1478)
  HDDS-4310: Ozone getconf broke the compatibility (apache#1475)
  HDDS-4298. Use an interface in Ozone client instead of XceiverClientManager (apache#1460)
  HDDS-4280. Document notable configurations for Recon. (apache#1448)
  HDDS-4156. add hierarchical layout to Chinese doc (apache#1368)
  HDDS-4242. Copy PrefixInfo proto to new project hadoop-ozone/interface-storage (apache#1444)
  HDDS-4264. Uniform naming conventions of Ozone Shell Options. (apache#1447)
  HDDS-4271. Avoid logging chunk content in Ozone Insight (apache#1466)
  HDDS-4299. Display Ratis version with ozone version (apache#1464)
  ...