Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow Users to Obtain Execution Node CPU and Memory Capacities #762

Merged
merged 3 commits into from
Aug 4, 2021
Merged

Allow Users to Obtain Execution Node CPU and Memory Capacities #762

merged 3 commits into from
Aug 4, 2021

Conversation

beeankha
Copy link
Contributor

@beeankha beeankha commented Jul 29, 2021

Connecting AWX Issue ansible/awx#10693


The CPU/memory capacity checks implemented in this PR are based off of the get_cpu_capacity and get_mem_capacity functions found in awx/main/utils/common.py:
https://github.com/ansible/awx/blob/devel/awx/main/utils/common.py#L702-L749

Since psutil was removed from Ansible Runner last year, multiprocessing and resource were utilized in order to surface the relevant information.

Now, the command ansible-runner worker capacity can be run, with the resulting output as shown below:

Screenshot from 2021-07-28 21-39-50

@beeankha beeankha marked this pull request as draft July 29, 2021 14:22
cpu = get_cpu_capacity()
mem = get_mem_capacity()
if cpu or mem > 0:
print("\nExecution Node Info\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are you planning to parse this data? It might be easier to just make this print json.


def get_cpu_capacity():
# `multiprocessing` info: https://docs.python.org/3/library/multiprocessing.html
forkcpu = 4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we return raw cpu count here, and then on the control plane side calculate the capacity based on the user settings for SYSTEM_TASK_FORKS_CPU?

https://github.com/ansible/awx/blob/d89719c7409528006a65378c3e6ae1a4b2666ec0/awx/main/utils/common.py#L719

Copy link
Contributor

@samdoran samdoran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have unit tests for the new functions.

def get_cpu_capacity():
# `multiprocessing` info: https://docs.python.org/3/library/multiprocessing.html
forkcpu = 4
cpu_capacity = multiprocessing.cpu_count() * forkcpu
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't necessarily the number of CPUs available to the worker. It may be more accurate to use os.sched_getaffinity(0) especially If the goal is reporting resources available to the worker. Depending on the hypervisor or container platform, it may display the host resources not the guest/container resources.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirmed that it's more accurate to use os.sched_getaffinity(0).

> docker run --rm -it \
    --cpuset-cpus 0-1 \
    quay.io/samdoran/fedora34-ansible:latest \
    python3 -c 'import os, multiprocessing, resource; \
    print("Affinity: %s" % len(os.sched_getaffinity(0))); \
    print("Multiprocessing: %s" % multiprocessing.cpu_count())'

Affinity: 2
Multiprocessing: 40

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Granted, the accuracy depends on how the container was invoked. If the --cpus flag was used, then it reports the number of CPUs on the container host and not the units available to the container. Trying to get accurate information inside containers is always challenging.

def get_mem_capacity():
# `resource` info: https://docs.python.org/3/library/resource.html
byte_denom = 1024
mem = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is probably an accurate way to get the memory available. I'm doing some testing to confirm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is more tricky. I haven't been able to find a way to detect limited memory from within the container. I imagine the most accurate way would be inspecting the cgroup from /proc/1/cgroup, but I didn't find anything definitive.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some background, this would replace the logic we have in AWX:

for CPU:

https://github.com/ansible/awx/blob/747231d3500d86ba07006a6927498d8ec31bf930/awx/main/utils/common.py#L716

and for memory:

https://github.com/ansible/awx/blob/747231d3500d86ba07006a6927498d8ec31bf930/awx/main/utils/common.py#L748

I've dug enough into the psutil implementation to say that it just gives the value from /proc/meminfo (for Linux). We can't use the old logic here, because we just removed psutil as a dependency of ansible-runner, with much enthusiasm for it.

I only mean to say - if our implementation is as bad as the old psutil one, it's not a regression. Doesn't mean we can't improve, but that's where we're coming from, because this is for AWX capacity-based job allocation.

import resource


def get_cpu_capacity():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are talking about having this return "raw data", as opposed to the capacity number as it will appear in AWX /api/v2/instances/

Since that's the case, I would prefer to use a method name other than get_cpu_capacity. Call it get_cpu_count or something like that to make it more clear. That will also help to allow us to import this freely in the AWX code.

@beeankha
Copy link
Contributor Author

beeankha commented Aug 3, 2021

For now, a worker_info directory gets created wherever the ansible-runner worker --worker-info command is run from, and the files that contain the version, cpu and mem capacity info are generated like so:

Screenshot from 2021-08-03 09-14-26

I can hardcode a filename instead if we want to only have one file in the worker_info directory at a time, to possibly make it easier to pick up the most current worker info text (or not generate a directory/file at all).

if cpu or mem > 0:
base = Path('worker_info')
info_file = str(uuid4().hex)
jsonpath = base / info_file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had no idea you could do this. 👍

@beeankha
Copy link
Contributor Author

beeankha commented Aug 4, 2021

Latest commit (cdc3372) makes it so that the data doesn't output to a file anymore, it instead prints to the terminal and is in YAML format:

$ ansible-runner worker --worker-info
{CPU Capacity: 12, Errors: [], Memory Capacity: 32615876, Version: 2.0.0.0a4.dev61}

If the meminfo file can't be found, an error will surface and get appended to a list that is under the key error:

$ ansible-runner worker --worker-info
{CPU Capacity: 12, Errors: ['The /proc/meminfo file could not found, memory capacity
      undiscoverable.'], Memory Capacity: null, Version: 2.0.0.0a4.dev61}

The changes in this commit also enable the total memory capacity to be discovered, vs the available memory capacity.

@beeankha beeankha marked this pull request as ready for review August 4, 2021 18:45
@jladdjr jladdjr added the gate label Aug 4, 2021
Copy link
Contributor

@ansible-zuul ansible-zuul bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@shanemcd shanemcd added gate and removed gate labels Aug 4, 2021
@ansible-zuul ansible-zuul bot merged commit 66ba47f into ansible:devel Aug 4, 2021
ansible-zuul bot added a commit that referenced this pull request Aug 9, 2021
Cherry pick some things from devel into release_2.0

This is basically everything new on devel except for the code in #762
These things applied cleanly and:

Seem likely to cause conflicts next time we need to cherry-pick something
Are largely aesthetical (not new features or bug fixes)

Reviewed-by: None <None>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants