HDDS-9208. Add queue limit in ReplicationServer.#5216
HDDS-9208. Add queue limit in ReplicationServer.#5216ChenSammi merged 1 commit intoapache:masterfrom
Conversation
|
@z-bb , the patch looks good. Wait for the CI result. |
|
@z-bb , can you do a master rebase? The failed UT looks like be fixed by master branch. |
okey, I will do it |
|
@z-bb , do you think the failed "EC-recovery :: Wait for replication to succeed" is relevant? |
@ChenSammi I'm not sure if it's related. I'm going to look into this, see the log is waiting for the container to replicate for more than 5 minutes |
sumitagrawl
left a comment
There was a problem hiding this comment.
@z-bb Thanks for working over this, have a query
There was a problem hiding this comment.
How this default value 4096 is derived for queue limit?
There was a problem hiding this comment.
@sumitagrawl, Do you think this has something to do with the failure of ci
There was a problem hiding this comment.
@z-bb not related to CI failure, I mean what should be appropriate default value for nettyServer handling upload/download container ... as this is given to NettyServer
There was a problem hiding this comment.
I guess 4096 this value is picked for the moment based on experience, @z-bb , right?
And Guangbao, do you remember the the biggest queue length you have ever seen before without the queue limitation imposed?
There was a problem hiding this comment.
I guess 4096 this value is picked for the moment based on experience, @z-bb , right? And Guangbao, do you remember the the biggest queue length you have ever seen before without the queue limitation imposed?
@ChenSammi Yes, based on experience, we plan to add a monitor for the queue length. I will confirm this value later.
There was a problem hiding this comment.
@z-bb , I see. We can revisit the default value of this "queue.limit" property once you got more info from the monitor history.
* master: (55 commits) HDDS-9236. Fix snapdiff output for key modification (apache#5258) HDDS-8013. Freon S3 bucket creation test should use unique prefix (apache#5282) HDDS-9228. Poor S3G read performance (apache#5274) HDDS-8941. Disable flaky TestContainerBalancerTask#testDelayedStart HDDS-1159. Remove flaky tag from TestContainerStateManagerIntegration (apache#5291) HDDS-6077. Remove flaky tag from TestAddRemoveOzoneManager (apache#5290) HDDS-6610. Remove support for recursive volume list/delete using ozone fs command (apache#5264) HDDS-7752. GetS3SecretRequest API should not return secret if secret of user already exists (apache#4538) HDDS-9173. Invalidate snapshot cache once snapshot gets purged (apache#5248) HDDS-8920. Ozone is supporting unicode volume and bucket names, unintentionally (apache#5276) HDDS-9275. LegacyReplicationManager: Delete excess unhealthy with force=true (apache#5286) HDDS-9264. Execute EC acceptance test in secure environment (apache#5279) HDDS-9161. Recon Pipelines datanode columns search does not work (apache#5213) HDDS-9107. Reduce the granularity of Container locks for BlockDeletingService (apache#5149) HDDS-9270. Create a script to list all acceptance test splits (apache#5281) HDDS-9220. Let ContainerBalancerConfiguration#toString print more info (apache#5228) HDDS-9208. Add queue limit in ReplicationServer. (apache#5216) HDDS-9268. [Snapshot] Update list of snapshot apis to include lsDiff details in docs. (apache#5278) HDDS-9234. OM should shutdown immediately if certificate durations are invalid (apache#5243) HDDS-9136. Throw exception when rename fails during moveToTrash. (apache#5253) ...
What changes were proposed in this pull request?
Add queue limit in ReplicationServer
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-9208