
Scale celery memory use to system's available memory #2407

Merged
merged 9 commits into dev from celery-process-scaling on May 11, 2023

Conversation

Contributor

@IsaacMilarky IsaacMilarky commented May 11, 2023

Description

  • Reduce or increase the number of processes Celery is allowed to use depending on the amount of available memory on the system; the memory data is gathered with psutil (see the sketch after this list).
  • Hard-cap the maximum number of processes for each worker type at the previous default values.
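A minimal sketch of that idea, assuming psutil is installed; the 30% ratio and the per-process memory budget below are illustrative assumptions, not necessarily the values used in this PR:

import psutil

# Illustrative assumption: budget roughly 500 MB of RAM per worker process.
MEMORY_PER_PROCESS_BYTES = 500 * 1024 ** 2

def estimate_max_processes(memory_ratio=0.3):
    # Total system memory, scaled down to the share Celery is allowed to use.
    usable_bytes = psutil.virtual_memory().total * memory_ratio
    return int(usable_bytes // MEMORY_PER_PROCESS_BYTES)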

Notes for Reviewers
These changes should help preserve memory on systems with smaller amounts of it, without hampering collection on instances with more memory.

Signed commits

  • Yes, I signed my commits.

Member

@sgoggins sgoggins left a comment

Let me know what the max percent of available memory Celery will reserve. I'm guessing ~20% makes sense. We need to assume the database and RabbitMQ need memory too; I'm just not sure what this does in total, percentage-wise.

Member

I think this is good. Does Celery reserve only a percentage of available memory? We wouldn't want it to take it all.

Contributor Author

It for sure won't take up all of the memory. I don't know the exact proportion at the moment given the custom celery settings we have.

Member

Can we add some checks to ensure there’s some kind of cap? Is the underlying issue that celery isn’t grabbing enough memory?

Contributor Author

I just added a flat 30% cap of the total memory to keep it simple.


Contributor

@ABrain7710 ABrain7710 left a comment

I really like this. I just think we should subtract 2 from max_process_estimate to make it more accurate.

augur/application/cli/backend.py
…otal virtual memory

Signed-off-by: Isaac Milarsky <imilarsky@gmail.com>
# Get a subset of the maximum processes available using a ratio, not exceeding a maximum value
def determine_worker_processes(ratio, maximum):
    return min(round(max_process_estimate * ratio), maximum)
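For a bit of context on the hunk above, here is a brief usage sketch. The value of max_process_estimate, the per-type ratios, and the hard caps are hypothetical stand-ins rather than the values from this diff; the worker types (core, secondary, facade) are the ones discussed in this thread.

max_process_estimate = 20  # assume the psutil-based estimate came out to 20 processes

def determine_worker_processes(ratio, maximum):
    # Give each worker type its share of the estimate, never exceeding its hard cap.
    return min(round(max_process_estimate * ratio), maximum)

core_processes = determine_worker_processes(0.6, 45)       # -> 12
secondary_processes = determine_worker_processes(0.3, 25)  # -> 6
facade_processes = determine_worker_processes(0.1, 20)     # -> 2

On a large machine the hard caps keep the counts at the previous defaults, while a small machine scales everything down.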

Contributor

We need to either add a min of 1 to this, or return an error if the system does not have enough memory to run Augur. A system with 8 GB of RAM, with Augur configured to use 30 percent of virtual memory, will currently get 2 scheduling processes, 1 core process, 0 secondary processes, and 0 facade processes. Note: at 30 percent usage, around 8.2 GB is needed for 2 scheduling processes and 1 of each of the others; at 20 percent usage, 12.3 GB is needed.

Contributor Author

I think we should bump it up to at least 40% and then add a minimum of 1 process.

Contributor

@sgoggins was suggesting dropping the limit to 20% to account for the RabbitMQ, Redis, and database memory usage. I also don't think 40% is a reasonable number for larger servers where multiple instances are being run. Given that, I think it should be configurable as a number between 0 and 1. Then we can just raise an error if not enough memory is allocated, and notify them to change the config. This allows for the flexibility of small systems where one Augur instance is the only thing running, and larger systems where many instances are running.

Contributor Author

We already have a hard limit on the maximum number of processes each worker can have for larger instances. I think a default cap of 20-30% would be fine for now. I do think it is a good idea to put the percentage in the config, though.
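A hedged sketch of where the thread lands: the existing per-worker hard caps, a configurable memory percentage defaulting to the 20-30% range, an error when the allocation is too small, and a floor of one process per worker type. The names, the config default, and the per-process budget here are assumptions for illustration only.

import psutil

MEMORY_PER_PROCESS_BYTES = 500 * 1024 ** 2  # illustrative per-process budget

def worker_process_count(ratio, maximum, memory_ratio=0.3):
    # memory_ratio would come from the Augur config in this proposal.
    if not 0 < memory_ratio <= 1:
        raise ValueError("memory percentage must be a number between 0 and 1")
    usable_bytes = psutil.virtual_memory().total * memory_ratio
    max_process_estimate = int(usable_bytes // MEMORY_PER_PROCESS_BYTES)
    if max_process_estimate < 1:
        raise RuntimeError(
            "Not enough memory allocated to Augur; increase the configured memory percentage")
    # Floor of 1 so small systems still run every worker type; cap at the previous default.
    return max(1, min(round(max_process_estimate * ratio), maximum))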

Contributor

Yeah I didn't notice that before, this sounds good to me

@IsaacMilarky IsaacMilarky merged commit 49645a2 into dev May 11, 2023
1 check passed
@IsaacMilarky IsaacMilarky deleted the celery-process-scaling branch May 11, 2023 21:12