Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CN performance stabilization tweaks #4145

Merged
merged 3 commits into from
Oct 17, 2022
Merged

Conversation

SidSethi
Copy link
Contributor

@SidSethi SidSethi commented Oct 17, 2022

Description

  1. Reduce maxWaitingJobs for recurringSyncQueue and updateReplicaSetQueue by 10x
  2. Reduce queue history (num completed/failed jobs) for multiple queues from 100k to 1k
  3. set default num workers to half num cores
  4. Add ability to pause recurringSyncQueue and updateReplicaSetQueue based on envvars

Tests

automated sufficient
also tested manually on staging node

Monitoring - How will this change be monitored? Are there sufficient logs / alerts?

Easy to monitor all above via bull dashboard, grafana bull dashboard, and grafana profiling dashboard

@@ -48,7 +48,7 @@ class ClusterUtils {

// This is called `cpus()` but it actually returns the # of logical cores, which is possibly higher than # of physical cores if there's hyperthreading
const logicalCores = cpus().length
return config.get('expressAppConcurrency') || logicalCores
return (config.get('expressAppConcurrency') || logicalCores) / 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was thinking this but w/e

Suggested change
return (config.get('expressAppConcurrency') || logicalCores) / 2
return config.get('expressAppConcurrency') || (logicalCores / 2)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah, good point

@@ -48,7 +48,7 @@ class ClusterUtils {

// This is called `cpus()` but it actually returns the # of logical cores, which is possibly higher than # of physical cores if there's hyperthreading
const logicalCores = cpus().length
return config.get('expressAppConcurrency') || logicalCores
return (config.get('expressAppConcurrency') || logicalCores) / 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering if this should Math.ceil for machines with 1 core so it's not 0.5 which truncates to 0? approving for either way since none of our local, staging, or prod nodes need to worry about that scenario

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call, added

@SidSethi SidSethi changed the title CN SM tweaks CN performance stabilization tweaks Oct 17, 2022
@SidSethi SidSethi merged commit a641a26 into master Oct 17, 2022
@SidSethi SidSethi deleted the ss-slow-down-cn-statemachine branch October 17, 2022 22:26
SidSethi added a commit that referenced this pull request Oct 17, 2022
SidSethi added a commit that referenced this pull request Oct 17, 2022
SidSethi added a commit that referenced this pull request Oct 17, 2022
dmanjunath pushed a commit that referenced this pull request Oct 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants