Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Partially improve support for --cpus from Docker CLI #23747

Merged
merged 2 commits into from
Apr 5, 2019

Conversation

luhenry
Copy link

@luhenry luhenry commented Apr 4, 2019

This focuses on better supporting Docker CLI's parameter --cpus, which limits the amount of CPU time available to the container (ex: 1.8 means 180% CPU time, ie on 2 cores 90% for each core, on 4 cores 45% on each core, etc.) in the case of the ThreadPool and in the case of calculating the CPU limit.

@luhenry
Copy link
Author

luhenry commented Apr 4, 2019

This spins off part of #23398 for the parts which are not controversial and do not change the API behavior.

/cc @tmds

In the case where `--cpus` is set to a value very close to the smaller
integer (ex: 1.499999999), it would previously be rounded down. This
would mean that the runtime would only try to take advantage of 1 CPU in
this example, leading to underutilization.

By rounding it up, we augment the pressure on the OS threads scheduler,
but even in the worst case scenario (`--cpus=1.000000001` previously
being rounded to 1, now rounded to 2), we do not observe any
overutilization of the CPU leading to performance degradation.
By making sure we do take the CPU limits into account when computing the
CPU busy time, we ensure we do not have the various heuristic of the
threadpool competing with each other: one trying to allocate more
threads to increase the CPU busy time, and the other one trying to
allocate less threads because there adding more doesn't improve the
throughput.

Let's take the example of a system with 20 cores, and a docker container
with `--cpus=2`. It would mean the total CPU usage of the machine is
2000%, while the CPU limit is 200%. Because the OS scheduler would never
allocate more than 200% of its total CPU budget to the docker container,
the CPU busy time would never get over 200%. From `PAL_GetCpuBusyTime`,
this would indicate that we threadpool threads are mostly doing non-CPU
bound work, meaning we could launch more threads.
cpu_count = quota / period;
if (cpu_count < UINT32_MAX)
cpu_count = (double) quota / period;
if (cpu_count < UINT32_MAX - 1)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be '<=' otherwise if cpu_count is equal to (UINT32_MAX - 1) then this function will return UINT32_MAX instead of (UINT32_MAX - 1), right?
In general, I agree with @tmds here that rounding up before the 'if' statement might make the code simple. Would something like this work or I missed some details?

// Calculate cpu count based on quota and round it up
cpu_count = (double) quota / period  + 0.999999999;
*val = (cpu_count < UINT32_MAX) ? (uint32_t)cpu_count : UINT32_MAX;

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ok to return UINT32_MAX. I didn't adopt @tmds approach mostly to reduce code change noise.

@luhenry luhenry merged commit aea3b1a into dotnet:master Apr 5, 2019
@bourquep
Copy link

In which dotnet core runtime version is this PR available?

@jkotas
Copy link
Member

jkotas commented Jul 10, 2019

It is available in the .NET Core 3.0 preview6. https://dotnet.microsoft.com/download/dotnet-core/3.0

picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…r#23747)

* Round up the value of the CPU limit

In the case where `--cpus` is set to a value very close to the smaller
integer (ex: 1.499999999), it would previously be rounded down. This
would mean that the runtime would only try to take advantage of 1 CPU in
this example, leading to underutilization.

By rounding it up, we augment the pressure on the OS threads scheduler,
but even in the worst case scenario (`--cpus=1.000000001` previously
being rounded to 1, now rounded to 2), we do not observe any
overutilization of the CPU leading to performance degradation.

* Teach the ThreadPool of CPU limits

By making sure we do take the CPU limits into account when computing the
CPU busy time, we ensure we do not have the various heuristic of the
threadpool competing with each other: one trying to allocate more
threads to increase the CPU busy time, and the other one trying to
allocate less threads because there adding more doesn't improve the
throughput.

Let's take the example of a system with 20 cores, and a docker container
with `--cpus=2`. It would mean the total CPU usage of the machine is
2000%, while the CPU limit is 200%. Because the OS scheduler would never
allocate more than 200% of its total CPU budget to the docker container,
the CPU busy time would never get over 200%. From `PAL_GetCpuBusyTime`,
this would indicate that we threadpool threads are mostly doing non-CPU
bound work, meaning we could launch more threads.


Commit migrated from dotnet/coreclr@aea3b1a
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants