Allow Users to Obtain Execution Node CPU and Memory Capacities #762
ansible_runner/__main__.py
Outdated
cpu = get_cpu_capacity()
mem = get_mem_capacity()
if cpu or mem > 0:
    print("\nExecution Node Info\n"
How are you planning to parse this data? It might be easier to just make this print json.
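A minimal sketch of what machine-readable output could look like. The key names `cpu_count` and `mem_in_bytes` and the helper `worker_info_json` are hypothetical, chosen for illustration only; they are not taken from this PR.

```python
import json


def worker_info_json(cpu_count, mem_in_bytes):
    """Serialize worker resource info as one JSON document.

    The key names are assumptions for this sketch, not the PR's format.
    """
    info = {"cpu_count": cpu_count, "mem_in_bytes": mem_in_bytes}
    # sort_keys keeps the output stable, which makes it easier to diff and test.
    return json.dumps(info, sort_keys=True)


if __name__ == "__main__":
    # The consumer can then parse stdout with json.loads() instead of
    # scraping a human-oriented banner.
    print(worker_info_json(2, 8 * 1024 ** 3))
```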
ansible_runner/utils/capacity.py
Outdated

def get_cpu_capacity():
    # `multiprocessing` info: https://docs.python.org/3/library/multiprocessing.html
    forkcpu = 4
Should we return the raw CPU count here and then, on the control plane side, calculate the capacity based on the user's setting for SYSTEM_TASK_FORKS_CPU?
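The split being proposed could be sketched roughly as follows. The function name `cpu_capacity` and the parameter `forks_per_cpu` are hypothetical; the default of 4 mirrors the hardcoded `forkcpu = 4` in the diff, and on the control plane the value would presumably come from SYSTEM_TASK_FORKS_CPU.

```python
def cpu_capacity(cpu_count, forks_per_cpu=4):
    """Control-plane side: turn a raw CPU count into a fork capacity.

    `forks_per_cpu` stands in for the SYSTEM_TASK_FORKS_CPU setting
    (an assumption for this sketch).
    """
    return cpu_count * forks_per_cpu


# The worker reports only the raw count; the multiplier stays configurable
# where the setting lives.
raw_cpu_count = 2
capacity = cpu_capacity(raw_cpu_count)
```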
It would be nice to have unit tests for the new functions.
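One possible shape for such a test, assuming the function ends up counting CPUs via `os.sched_getaffinity(0)`. The `get_cpu_count` body here is a stand-in written for the sketch, not the code in this PR.

```python
import os
import unittest
from unittest import mock


def get_cpu_count():
    # Stand-in for the function under review (hypothetical body).
    return len(os.sched_getaffinity(0))


class TestGetCpuCount(unittest.TestCase):
    def test_counts_cpus_in_affinity_mask(self):
        # create=True lets the patch work on platforms where
        # os.sched_getaffinity does not exist (for example macOS).
        with mock.patch("os.sched_getaffinity", return_value={0, 1}, create=True):
            self.assertEqual(get_cpu_count(), 2)
```

Patching the OS call keeps the test independent of the machine it runs on; it can be run with `python -m unittest`.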
ansible_runner/utils/capacity.py
Outdated
def get_cpu_capacity():
    # `multiprocessing` info: https://docs.python.org/3/library/multiprocessing.html
    forkcpu = 4
    cpu_capacity = multiprocessing.cpu_count() * forkcpu
This isn't necessarily the number of CPUs available to the worker. It may be more accurate to use os.sched_getaffinity(0), especially if the goal is to report the resources available to the worker. Depending on the hypervisor or container platform, cpu_count() may report the host's resources rather than the guest's or container's.
I confirmed that it's more accurate to use os.sched_getaffinity(0).
> docker run --rm -it \
--cpuset-cpus 0-1 \
quay.io/samdoran/fedora34-ansible:latest \
python3 -c 'import os, multiprocessing, resource; \
print("Affinity: %s" % len(os.sched_getaffinity(0))); \
print("Multiprocessing: %s" % multiprocessing.cpu_count())'
Affinity: 2
Multiprocessing: 40
Granted, the accuracy depends on how the container was invoked. If the --cpus flag was used, os.sched_getaffinity(0) reports the number of CPUs on the container host, not the CPU share available to the container. Getting accurate information from inside a container is always challenging.
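A rough sketch of one way to account for both cases: start from the affinity mask and, where a cgroup v2 CPU quota is visible, cap the result by it. This assumes the quota is exposed at `/sys/fs/cgroup/cpu.max` in the `<quota> <period>` format; that path, and the helper names `parse_cpu_max` and `get_cpu_count`, are assumptions for illustration and will not hold on every platform (cgroup v1 uses different files).

```python
import math
import multiprocessing
import os


def parse_cpu_max(text):
    """Parse cgroup v2 cpu.max content: "<quota> <period>" or "max <period>"."""
    quota, period = text.split()
    if quota == "max":
        return None  # no quota configured
    # Round up so a fractional limit such as 1.5 CPUs still counts as 2.
    return max(1, math.ceil(int(quota) / int(period)))


def get_cpu_count(cpu_max_path="/sys/fs/cgroup/cpu.max"):
    """Best-effort count of CPUs usable by this process."""
    try:
        count = len(os.sched_getaffinity(0))
    except AttributeError:
        # sched_getaffinity is not available on every platform.
        count = multiprocessing.cpu_count()
    try:
        with open(cpu_max_path) as f:
            limit = parse_cpu_max(f.read())
    except (OSError, ValueError):
        limit = None  # file missing or in an unexpected format
    return min(count, limit) if limit else count
```

Rounding the quota up is a design choice for the sketch; a scheduler that prefers to undercommit could round down instead.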
ansible_runner/utils/capacity.py
Outdated
def get_mem_capacity():
    # `resource` info: https://docs.python.org/3/library/resource.html
    byte_denom = 1024
    mem = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
I think this is probably an accurate way to get the memory available. I'm doing some testing to confirm.
This one is trickier. I haven't been able to find a way to detect limited memory from within the container. I imagine the most accurate way would be inspecting the cgroup from /proc/1/cgroup, but I didn't find anything definitive.
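As a starting point for that investigation, here is a hedged sketch that reads the memory limit files cgroups commonly expose. The two default paths (cgroup v2 `memory.max`, cgroup v1 `memory.limit_in_bytes`) are assumptions about the mount layout, and the function name `read_cgroup_memory_limit` is hypothetical; nothing here is confirmed by this thread.

```python
def read_cgroup_memory_limit(paths=(
    "/sys/fs/cgroup/memory.max",                    # cgroup v2 (assumed path)
    "/sys/fs/cgroup/memory/memory.limit_in_bytes",  # cgroup v1 (assumed path)
)):
    """Return the first memory limit found, in bytes, or None if unlimited/unknown."""
    for path in paths:
        try:
            with open(path) as f:
                raw = f.read().strip()
        except OSError:
            continue  # file not present on this system
        if raw == "max":
            return None  # cgroup v2 spelling of "no limit"
        try:
            return int(raw)
        except ValueError:
            continue  # unexpected content, try the next candidate
    return None
```

Note that cgroup v1 typically reports a very large number rather than `max` when no limit is set, so a caller would likely want to compare the result against the host's total memory before trusting it.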
For some background, this would replace the logic we have in AWX:
for CPU:
and for memory:
I've dug enough into the psutil implementation to say that it just gives the value from /proc/meminfo (for Linux). We can't use the old logic here, because we just removed psutil as a dependency of ansible-runner, with much enthusiasm for it.
I only mean to say - if our implementation is as bad as the old psutil one, it's not a regression. Doesn't mean we can't improve, but that's where we're coming from, because this is for AWX capacity-based job allocation.
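For reference, reading the same figure without psutil could look roughly like this on Linux. The helper name `parse_meminfo_total` is hypothetical, and the sketch assumes the usual `MemTotal:  <n> kB` line format of /proc/meminfo.

```python
def parse_meminfo_total(text):
    """Extract MemTotal from /proc/meminfo content and return it in bytes."""
    for line in text.splitlines():
        if line.startswith("MemTotal:"):
            # The value is reported in kB, e.g. "MemTotal:  16384 kB".
            return int(line.split()[1]) * 1024
    return None  # field not found


def get_total_memory(path="/proc/meminfo"):
    """Return total system memory in bytes, or None when it cannot be read."""
    try:
        with open(path) as f:
            return parse_meminfo_total(f.read())
    except OSError:
        return None  # not Linux, or /proc is not mounted
```

Like the psutil approach, this reports what the kernel shows in /proc/meminfo, so inside a container it may still reflect the host rather than the container's limit.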
ansible_runner/utils/capacity.py
Outdated
import resource


def get_cpu_capacity():
We are talking about having this return "raw data", as opposed to the capacity number as it will appear in AWX /api/v2/instances/. Since that's the case, I would prefer to use a method name other than get_cpu_capacity. Call it get_cpu_count or something like that to make it clearer. That will also allow us to import this freely in the AWX code.
… outputs to a json file
For now, I can hardcode a filename instead if we want to only have one file in the
ansible_runner/__main__.py
Outdated
if cpu or mem > 0:
    base = Path('worker_info')
    info_file = str(uuid4().hex)
    jsonpath = base / info_file
I had no idea you could do this. 👍
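For anyone else who hasn't seen it: `pathlib.Path` overloads the `/` operator to join path segments. A small standalone illustration, reusing the names from the hunk above:

```python
from pathlib import Path
from uuid import uuid4

base = Path("worker_info")
info_file = uuid4().hex  # .hex is already a str, so str() is not needed

# Path.__truediv__ joins segments, equivalent to base.joinpath(info_file).
jsonpath = base / info_file
```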
As of the latest commit (cdc3372), the data is no longer written to a file; it is instead printed to the terminal in YAML format:
If the
The changes in this commit also enable the total memory capacity to be discovered, vs the available memory capacity.
… and surface the error if it comes up
LGTM!
Cherry pick some things from devel into release_2.0
This is basically everything new on devel except for the code in #762
These things applied cleanly and:
- Seem likely to cause conflicts next time we need to cherry-pick something
- Are largely aesthetical (not new features or bug fixes)
Reviewed-by: None <None>
Connecting AWX Issue ansible/awx#10693
The CPU/memory capacity checks implemented in this PR are based off of the get_cpu_capacity and get_mem_capacity functions found in awx/main/utils/common.py: https://github.com/ansible/awx/blob/devel/awx/main/utils/common.py#L702-L749
Since psutil was removed from Ansible Runner last year, multiprocessing and resource were utilized in order to surface the relevant information.
Now, the command ansible-runner worker capacity can be run, with the resulting output as shown below: