Skip to content

HDDS-9208. Add queue limit in ReplicationServer.#5216

Merged
ChenSammi merged 1 commit intoapache:masterfrom
z-bb:HDDS-9208
Sep 13, 2023
Merged

HDDS-9208. Add queue limit in ReplicationServer.#5216
ChenSammi merged 1 commit intoapache:masterfrom
z-bb:HDDS-9208

Conversation

@z-bb
Copy link
Contributor

@z-bb z-bb commented Aug 24, 2023

What changes were proposed in this pull request?

Add queue limit in ReplicationServer

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-9208

@ChenSammi
Copy link
Contributor

@z-bb , the patch looks good. Wait for the CI result.

@ChenSammi
Copy link
Contributor

@z-bb , can you do a master rebase? The failed UT looks like be fixed by master branch.

@z-bb
Copy link
Contributor Author

z-bb commented Aug 28, 2023

@z-bb , can you do a master rebase? The failed UT looks like be fixed by master branch.

okey, I will do it

@ChenSammi
Copy link
Contributor

@z-bb , do you think the failed "EC-recovery :: Wait for replication to succeed" is relevant?

@z-bb
Copy link
Contributor Author

z-bb commented Aug 28, 2023

@z-bb , do you think the failed "EC-recovery :: Wait for replication to succeed" is relevant?

@ChenSammi I'm not sure if it's related. I'm going to look into this, see the log is waiting for the container to replicate for more than 5 minutes

Copy link
Contributor

@sumitagrawl sumitagrawl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@z-bb Thanks for working over this, have a query

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How this default value 4096 is derived for queue limit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sumitagrawl, Do you think this has something to do with the failure of ci

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@z-bb not related to CI failure, I mean what should be appropriate default value for nettyServer handling upload/download container ... as this is given to NettyServer

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess 4096 this value is picked for the moment based on experience, @z-bb , right?
And Guangbao, do you remember the the biggest queue length you have ever seen before without the queue limitation imposed?

Copy link
Contributor Author

@z-bb z-bb Sep 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess 4096 this value is picked for the moment based on experience, @z-bb , right? And Guangbao, do you remember the the biggest queue length you have ever seen before without the queue limitation imposed?

@ChenSammi Yes, based on experience, we plan to add a monitor for the queue length. I will confirm this value later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@z-bb , I see. We can revisit the default value of this "queue.limit" property once you got more info from the monitor history.

@ChenSammi
Copy link
Contributor

@z-bb , would you do a master rebase again? There is one EC acceptance test #5260 fixed in master.

@z-bb
Copy link
Contributor Author

z-bb commented Sep 12, 2023

@z-bb , would you do a master rebase again? There is one EC acceptance test #5260 fixed in master.

okey, I have already rebased.

Copy link
Contributor

@ChenSammi ChenSammi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @z-bb .

@ChenSammi ChenSammi merged commit 8e46d4e into apache:master Sep 13, 2023
errose28 added a commit to errose28/ozone that referenced this pull request Sep 14, 2023
* master: (55 commits)
  HDDS-9236. Fix snapdiff output for key modification (apache#5258)
  HDDS-8013. Freon S3 bucket creation test should use unique prefix (apache#5282)
  HDDS-9228. Poor S3G read performance (apache#5274)
  HDDS-8941. Disable flaky TestContainerBalancerTask#testDelayedStart
  HDDS-1159. Remove flaky tag from TestContainerStateManagerIntegration (apache#5291)
  HDDS-6077. Remove flaky tag from TestAddRemoveOzoneManager (apache#5290)
  HDDS-6610. Remove support for recursive volume list/delete using ozone fs command (apache#5264)
  HDDS-7752. GetS3SecretRequest API should not return secret if secret of user already exists (apache#4538)
  HDDS-9173. Invalidate snapshot cache once snapshot gets purged (apache#5248)
  HDDS-8920. Ozone is supporting unicode volume and bucket names, unintentionally (apache#5276)
  HDDS-9275. LegacyReplicationManager: Delete excess unhealthy with force=true (apache#5286)
  HDDS-9264. Execute EC acceptance test in secure environment (apache#5279)
  HDDS-9161. Recon Pipelines datanode columns search does not work (apache#5213)
  HDDS-9107. Reduce the granularity of Container locks for BlockDeletingService (apache#5149)
  HDDS-9270. Create a script to list all acceptance test splits (apache#5281)
  HDDS-9220. Let ContainerBalancerConfiguration#toString print more info (apache#5228)
  HDDS-9208. Add queue limit in ReplicationServer. (apache#5216)
  HDDS-9268. [Snapshot] Update list of snapshot apis to include lsDiff details in docs. (apache#5278)
  HDDS-9234. OM should shutdown immediately if certificate durations are invalid (apache#5243)
  HDDS-9136. Throw exception when rename fails during moveToTrash. (apache#5253)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments