HDDS-8233. ReplicationManager: Throttle delete container commands from over-replication handlers #4447
Merged
adoroszlai merged 5 commits into apache:master on Mar 22, 2023
added 5 commits
March 22, 2023 11:45
adoroszlai
approved these changes
Mar 22, 2023
Contributor
adoroszlai
left a comment
LGTM. The only issue I found is a stale javadoc comment, but to save CI time it can be fixed later in any upcoming patch.
| @@ -22,9 +22,9 @@ | |||
| /** | |||
| * Exception class used to indicate that all sources are overloaded. | |||
Contributor
nit: stale javadoc comment
errose28
added a commit
to errose28/ozone
that referenced
this pull request
Mar 23, 2023
* master: (43 commits)
  HDDS-8148. Improve log for Pipeline creation failure (apache#4385)
  HDDS-7853. Add support for RemoveSCM in SCMRatisServer. (apache#4358)
  HDDS-8042. Display certificate issuer in cert list command. (apache#4429)
  HDDS-8189. [Snapshot] renamedKeyTable should only track keys in buckets that has at least one active snapshot. (apache#4436)
  HDDS-8154. Perf: Reuse Mac instances in S3 token validation (apache#4433)
  HDDS-8245. Info log for keyDeletingService when nonzero number of keys are deleted. (apache#4451)
  HDDS-8233. ReplicationManager: Throttle delete container commands from over-replication handlers (apache#4447)
  HDDS-8220. [Ozone-Streaming] Trigger volume check on IOException in StreamDataChannelBase (apache#4428)
  HDDS-8173. Fix to remove enrties from RocksDB after container gets deleted. (apache#4445)
  HDDS-7975. Rebalance acceptance tests (apache#4437)
  HDDS-8152. Reduce S3 acceptance test setup time (apache#4393)
  HDDS-8172. ECUnderReplicationHandler should consider commands already sent when processing the container (apache#4435)
  HDDS-7883. [Snapshot] Accommodate FSO, key renames and implement OMSnapshotPurgeRequest for SnapshotDeletingService (apache#4407)
  HDDS-8168. Make deadlines inside MoveManager for move commands configurable (apache#4415)
  HDDS-7918. EC: ECBlockReconstructedStripeInputStream should check for spare replicas before failing an index (apache#4441)
  HDDS-8222. EndpointBase#getBucket should handle BUCKET_NOT_FOUND (apache#4431)
  HDDS-8068. Fix Exception: JMXJsonServlet, getting attribute RatisRoles of Hadoop:service=OzoneManager. (apache#4352)
  HDDS-8139. Datanodes should not drop block delete transactions based on transaction ID (apache#4384)
  HDDS-8216. EC: OzoneClientConfig is overwritten in ECKeyOutputStream (apache#4425)
  HDDS-8054. Fix NPE in metrics for failed volume (apache#4340)
  ...
What changes were proposed in this pull request?
Similar to ReplicateContainerCommands, we should limit the number of delete commands queued on a given datanode at any time. This PR enforces the limit with a static config variable, with a view to making it more dynamic later.
This change does not limit delete container commands sent from the health check chain in RM. It only affects deletes from the Ratis and EC Over Replication Handlers, which should drive the bulk of the deletes.
Note that container replica deletes from the balancer are not throttled. The balancer issues moves in a controlled way, and its deletes are triggered when a replication completes. Its deletes are therefore naturally throttled by the rate at which the replication commands complete. It will not flood the cluster with deletes, as could happen when a couple of dead nodes are brought back into the cluster with their containers still in place.
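For illustration only, the per-datanode limit described above could be sketched roughly as below. This is not the code from this PR: the class `DeleteThrottleSketch`, its methods, and the use of a plain string as the datanode identifier are all hypothetical names chosen for the example, and the limit is passed to the constructor rather than read from the real config.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a per-datanode cap on queued delete commands.
 * None of these names come from the Ozone code base.
 */
public class DeleteThrottleSketch {
  // Static limit, standing in for the config value mentioned in the PR.
  private final int limit;
  // Number of delete commands currently queued for each datanode.
  private final Map<String, Integer> queued = new HashMap<>();

  public DeleteThrottleSketch(int limit) {
    if (limit <= 0) {
      throw new IllegalArgumentException("limit must be positive");
    }
    this.limit = limit;
  }

  /**
   * Records a delete command for the datanode if it is below the limit.
   * Returns false when the datanode is already at the limit, in which case
   * the caller would retry the container on a later pass.
   */
  public synchronized boolean tryQueueDelete(String datanode) {
    int current = queued.getOrDefault(datanode, 0);
    if (current >= limit) {
      return false;
    }
    queued.put(datanode, current + 1);
    return true;
  }

  /** Frees one slot once a delete command has completed or expired. */
  public synchronized void onDeleteCompleted(String datanode) {
    // Decrement the count, dropping the entry when it reaches zero.
    queued.computeIfPresent(datanode, (dn, n) -> n > 1 ? n - 1 : null);
  }

  /** Returns how many delete commands are currently counted for the datanode. */
  public synchronized int queuedFor(String datanode) {
    return queued.getOrDefault(datanode, 0);
  }
}
```

In this sketch only the over-replication handlers would call `tryQueueDelete`; deletes from the health check chain and the balancer would bypass it, matching the scope described above.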
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-8233
How was this patch tested?
New unit tests added.