Allow Users to Obtain Execution Node CPU and Memory Capacities #762

beeankha · 2021-07-29T14:13:48Z

The CPU/memory capacity checks implemented in this PR are based on the get_cpu_capacity and get_mem_capacity functions found in awx/main/utils/common.py:
https://github.com/ansible/awx/blob/devel/awx/main/utils/common.py#L702-L749

Since psutil was removed from Ansible Runner last year, this PR uses the multiprocessing and resource modules to surface the relevant information.

Now, the command ansible-runner worker capacity can be run, with the resulting output as shown below:

shanemcd · 2021-07-29T16:16:25Z

ansible_runner/__main__.py

+            cpu = get_cpu_capacity()
+            mem = get_mem_capacity()
+            if cpu or mem > 0:
+                print("\nExecution Node Info\n"


How are you planning to parse this data? It might be easier to just make this print json.
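A minimal sketch of the JSON suggestion above, assuming the two capacity values have already been computed. The function name `format_worker_info` and the key names are hypothetical and are not taken from the PR:

```python
import json


def format_worker_info(cpu, mem):
    """Serialize capacity figures as a single JSON object.

    The key names here are illustrative assumptions, not the PR's.
    """
    return json.dumps({"cpu_capacity": cpu, "mem_capacity": mem}, sort_keys=True)


# A consumer can parse this with json.loads() instead of scraping text.
print(format_worker_info(8, 16384))
```

Emitting one JSON object keeps the consumer side to a single `json.loads()` call, rather than matching on a human-readable banner.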

fosterseth · 2021-07-29T16:17:38Z

ansible_runner/utils/capacity.py

+
+def get_cpu_capacity():
+    # `multiprocessing` info: https://docs.python.org/3/library/multiprocessing.html
+    forkcpu = 4


should we return raw cpu count here, and then on the control plane side calculate the capacity based on the user settings for SYSTEM_TASK_FORKS_CPU?

https://github.com/ansible/awx/blob/d89719c7409528006a65378c3e6ae1a4b2666ec0/awx/main/utils/common.py#L719
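The split proposed above could look roughly like this on the control plane side: the worker reports only a raw CPU count, and the multiplier is applied where the user setting is known. The function name is hypothetical; the default of 4 mirrors the `forkcpu = 4` value in the diff, and in AWX the multiplier would presumably come from SYSTEM_TASK_FORKS_CPU:

```python
def cpu_capacity_from_count(cpu_count, forks_per_cpu=4):
    """Turn a raw CPU count into a fork capacity.

    forks_per_cpu stands in for a user setting such as
    SYSTEM_TASK_FORKS_CPU; 4 matches the hard-coded value in the diff.
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return cpu_count * forks_per_cpu


print(cpu_capacity_from_count(3))                   # default multiplier
print(cpu_capacity_from_count(3, forks_per_cpu=2))  # user override
```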

samdoran

It would be nice to have unit tests for the new functions.

samdoran · 2021-07-29T18:00:31Z

ansible_runner/utils/capacity.py

+def get_cpu_capacity():
+    # `multiprocessing` info: https://docs.python.org/3/library/multiprocessing.html
+    forkcpu = 4
+    cpu_capacity = multiprocessing.cpu_count() * forkcpu


This isn't necessarily the number of CPUs available to the worker. It may be more accurate to use os.sched_getaffinity(0), especially if the goal is reporting resources available to the worker. Depending on the hypervisor or container platform, multiprocessing.cpu_count() may report the host resources rather than the guest/container resources.

I confirmed that it's more accurate to use os.sched_getaffinity(0).

> docker run --rm -it \
    --cpuset-cpus 0-1 \
    quay.io/samdoran/fedora34-ansible:latest \
    python3 -c 'import os, multiprocessing, resource; \
      print("Affinity: %s" % len(os.sched_getaffinity(0))); \
      print("Multiprocessing: %s" % multiprocessing.cpu_count())'
Affinity: 2
Multiprocessing: 40

Granted, the accuracy depends on how the container was invoked. If the --cpus flag was used, then it reports the number of CPUs on the container host and not the units available to the container. Trying to get accurate information inside containers is always challenging.
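A sketch of how the affinity-based count could be used with a fallback, since os.sched_getaffinity is only available on some Unix platforms. The function name `get_cpu_count` is an assumption and not necessarily what the PR ends up using:

```python
import multiprocessing
import os


def get_cpu_count():
    """Return the number of CPUs this process may run on.

    Prefers the scheduler affinity mask, which respects limits such as
    docker's --cpuset-cpus. Falls back to multiprocessing.cpu_count()
    where os.sched_getaffinity does not exist (e.g. macOS, Windows).
    """
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:
        return multiprocessing.cpu_count()


print(get_cpu_count())
```

As noted above, neither call accounts for quota-based limits such as docker's --cpus flag.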

samdoran · 2021-07-29T19:22:48Z

ansible_runner/utils/capacity.py

+def get_mem_capacity():
+    # `resource` info: https://docs.python.org/3/library/resource.html
+    byte_denom = 1024
+    mem = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss 


I think this is probably an accurate way to get the memory available. I'm doing some testing to confirm.

This one is more tricky. I haven't been able to find a way to detect limited memory from within the container. I imagine the most accurate way would be inspecting the cgroup from /proc/1/cgroup, but I didn't find anything definitive.

For some background, this would replace the logic we have in AWX:

for CPU:

https://github.com/ansible/awx/blob/747231d3500d86ba07006a6927498d8ec31bf930/awx/main/utils/common.py#L716

and for memory:

https://github.com/ansible/awx/blob/747231d3500d86ba07006a6927498d8ec31bf930/awx/main/utils/common.py#L748

I've dug enough into the psutil implementation to say that it just gives the value from /proc/meminfo (for Linux). We can't use the old logic here, because we just removed psutil as a dependency of ansible-runner, with much enthusiasm for it.

I only mean to say - if our implementation is as bad as the old psutil one, it's not a regression. Doesn't mean we can't improve, but that's where we're coming from, because this is for AWX capacity-based job allocation.
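For illustration, reading the same /proc/meminfo value without psutil can be sketched as below. This assumes the field of interest is MemTotal, which the kernel reports in kB; the function name is hypothetical:

```python
def parse_mem_total_kb(meminfo_text):
    """Extract the MemTotal value (in kB) from /proc/meminfo contents.

    Returns None if no MemTotal line is present.
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line format: "MemTotal:       32615876 kB"
            return int(line.split()[1])
    return None


# Sample text standing in for open("/proc/meminfo").read() on Linux.
sample = "MemTotal:       32615876 kB\nMemFree:         1234567 kB\n"
print(parse_mem_total_kb(sample))
```

Like the psutil approach, this reports what the kernel exposes in /proc/meminfo and does not detect container memory limits.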

AlanCoding · 2021-08-02T18:12:55Z

ansible_runner/utils/capacity.py

+import resource
+
+
+def get_cpu_capacity():


We are talking about having this return "raw data", as opposed to the capacity number as it will appear in AWX /api/v2/instances/

Since that's the case, I would prefer to use a method name other than get_cpu_capacity. Call it get_cpu_count or something like that to make it more clear. That will also help to allow us to import this freely in the AWX code.

beeankha · 2021-08-03T13:46:10Z

For now, a worker_info directory gets created wherever the ansible-runner worker --worker-info command is run from, and the files that contain the version, cpu and mem capacity info are generated like so:

I can hardcode a filename instead if we want to only have one file in the worker_info directory at a time, to possibly make it easier to pick up the most current worker info text (or not generate a directory/file at all).

shanemcd · 2021-08-04T15:51:55Z

ansible_runner/__main__.py

+            if cpu or mem > 0:
+                base = Path('worker_info')
+                info_file = str(uuid4().hex)
+                jsonpath = base / info_file


I had no idea you could do this. 👍
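For readers unfamiliar with the idiom in the diff above: pathlib.Path overloads the `/` operator to join path segments. A small standalone illustration using the same names as the diff:

```python
from pathlib import Path
from uuid import uuid4

base = Path("worker_info")
info_file = uuid4().hex        # .hex is already a str, so str() is not needed
jsonpath = base / info_file    # equivalent to base.joinpath(info_file)

print(jsonpath.parent)         # the directory component
print(len(jsonpath.name))      # a UUID in hex form is 32 characters
```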

beeankha · 2021-08-04T16:08:28Z

The latest commit (cdc3372) no longer writes the data to a file. Instead, it prints the data to the terminal in YAML format:

$ ansible-runner worker --worker-info
{CPU Capacity: 12, Errors: [], Memory Capacity: 32615876, Version: 2.0.0.0a4.dev61}

If the meminfo file can't be found, an error is surfaced and appended to the list under the Errors key:

$ ansible-runner worker --worker-info
{CPU Capacity: 12, Errors: ['The /proc/meminfo file could not found, memory capacity
      undiscoverable.'], Memory Capacity: null, Version: 2.0.0.0a4.dev61}

The changes in this commit also enable the total memory capacity to be discovered, vs the available memory capacity.
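The error-surfacing behaviour described above can be sketched as follows. This is an assumption-laden illustration rather than the PR's code: the helper names and the exact error text are hypothetical, and serialization (YAML in the PR) is left out so the example needs only the standard library:

```python
def read_mem_total_kb(meminfo_path):
    """Read MemTotal (kB) from a meminfo-style file; raise if it is missing."""
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1])
    raise ValueError("MemTotal not found in %s" % meminfo_path)


def collect_mem_info(meminfo_path="/proc/meminfo"):
    """Return memory capacity plus any errors, instead of raising."""
    info = {"Memory Capacity": None, "Errors": []}
    try:
        info["Memory Capacity"] = read_mem_total_kb(meminfo_path)
    except (OSError, ValueError) as exc:
        # Record the problem and keep going so other fields still report.
        info["Errors"].append(str(exc))
    return info


print(collect_mem_info("/nonexistent/meminfo"))
```

Collecting errors in a list lets the caller still receive the fields that could be discovered, matching the shape of the output shown above.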

ansible-zuul

LGTM!

Cherry pick some things from devel into release_2.0

This is basically everything new on devel except for the code in #762.

These things applied cleanly and:

- Seem likely to cause conflicts next time we need to cherry-pick something
- Are largely aesthetical (not new features or bug fixes)

Reviewed-by: None <None>

Enable checking of execution node mem and cpu capacities

b1b11bf

beeankha marked this pull request as draft July 29, 2021 14:22

shanemcd reviewed Jul 29, 2021

fosterseth reviewed Jul 29, 2021

samdoran reviewed Jul 29, 2021

AlanCoding reviewed Aug 2, 2021

Update worker info command to be a flag that requires no argument and outputs to a json file

4424f07

shanemcd reviewed Aug 4, 2021

shanemcd approved these changes Aug 4, 2021

Load data in yaml format, wrap memory capacity discovery in exception and surface the error if it comes up

1176c41

shanemcd approved these changes Aug 4, 2021

beeankha marked this pull request as ready for review August 4, 2021 18:45

jladdjr approved these changes Aug 4, 2021

jladdjr added the gate label Aug 4, 2021

ansible-zuul bot approved these changes Aug 4, 2021

shanemcd added gate and removed gate labels Aug 4, 2021

ansible-zuul bot merged commit 66ba47f into ansible:devel Aug 4, 2021

AlanCoding mentioned this pull request Aug 6, 2021

Use the ansible-runner worker --worker-info to perform execution node capacity checks ansible/awx#10825

Merged

shanemcd mentioned this pull request Aug 9, 2021

Cherry pick some things from devel into release_2.0 #779

Merged
