Add optional cpu limit to spawned action containers #5443
Conversation
Nice work!
This generally looks good to me.
```scala
import scala.concurrent.duration.DurationInt

@RunWith(classOf[JUnitRunner])
class ContainerPoolConfigTests extends FlatSpec with Matchers {
```
👍
common/scala/src/main/scala/org/apache/openwhisk/core/containerpool/ContainerFactory.scala
It seems the code is wrongly formatted.
Co-authored-by: Dominic Kim <style9595@gmail.com>
```scala
def cpuLimit(reservedMemory: ByteSize): Option[Double] = {
  userCpus.map(c => {
    val containerCpus = c / (userMemory.toBytes / reservedMemory.toBytes)
    val roundedContainerCpus = round(containerCpus * roundingMultiplier).toDouble / roundingMultiplier // Only use decimal precision of 5
```
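As a hedged sketch of the proportional allocation above (the pool sizes and the `roundingMultiplier` value of 100000 are illustrative assumptions, not values taken from this PR):

```scala
import scala.math.round

// Illustrative sketch of the proportional cpu allocation from the PR.
// Assumed values (not from the PR): a 2048 MiB user-memory pool, 2 shared
// cores, and a roundingMultiplier of 100000 (5 decimal places).
object CpuLimitSketch {
  val roundingMultiplier = 100000L
  val userMemoryBytes: Long = 2048L * 1024 * 1024
  val userCpus: Option[Double] = Some(2.0)

  def cpuLimit(reservedMemoryBytes: Long): Option[Double] =
    userCpus.map { c =>
      // each container gets a share of the cores proportional to its memory
      val containerCpus = c / (userMemoryBytes.toDouble / reservedMemoryBytes)
      round(containerCpus * roundingMultiplier).toDouble / roundingMultiplier
    }

  def main(args: Array[String]): Unit = {
    // a 256 MiB container claims 256/2048 of the 2 cores = 0.25 cpus
    println(cpuLimit(256L * 1024 * 1024)) // Some(0.25)
  }
}
```

A 256 MiB action in a 2048 MiB pool with 2 shared cores would thus be capped at 0.25 cpus, mirroring how `user-memory` is already rationed.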
👍
@quintenp01
Thank you for your contribution.
It looks good to me 👍
```diff
@@ -101,6 +102,7 @@ object DockerContainer {
     dnsSearch.flatMap(d => Seq("--dns-search", d)) ++
     dnsOptions.flatMap(d => Seq(dnsOptString, d)) ++
     name.map(n => Seq("--name", n)).getOrElse(Seq.empty) ++
+    cpuLimit.map(c => Seq("--cpus", c.toString)).getOrElse(Seq.empty) ++
```
I wasn't aware that you could pass both `--cpus` and `--cpu-shares` to `docker run`. How does this look in practice when both are set?
`--cpu-shares` still provides the weight/priority for cpu cycles for the container, while `--cpus` just provides the cap.
The tradeoff here should be obvious. This will prevent user functions from bursting to take an entire node's cpu when capacity is available, but that bursting is entirely dependent on what else is running on the node, resulting in inconsistent performance for the user. It also creates an uncontrollable noisy-neighbor issue, where a high-cpu function can impact the performance of the overall node and of the other functions executing on it. Users likely prefer reliable, consistent performance for each execution over cpu bursting, unless you run a highly controlled cluster where you control all of the functions.

I think this should be considered as a default for the 2.0 major version, though I'm sure that would be controversial: it would be extremely hard to apply to public clouds, where there's no guarantee this change wouldn't break a user's function with lowered cpu. It would require a secondary deployment for 2.0 at that point, where a function user consciously upgrades their function to the 2.0 cluster (which may already have to be the upgrade model for public clouds to 2.0 anyway, with the new scheduler).

Also, the way this was implemented is really clean. I've wanted to do this in the past, and my idea didn't include having the config be the number of available cpu cores and rationing that in exactly the same way the `user-memory` config is set. That makes it very easy for an operator to understand and modify.
Codecov Report

```
@@            Coverage Diff             @@
##           master    #5443      +/-   ##
==========================================
- Coverage   76.80%   76.53%   -0.28%
==========================================
  Files         241      241
  Lines       14634    14646      +12
  Branches      607      617      +10
==========================================
- Hits        11240    11209      -31
- Misses       3394     3437      +43
```

... and 6 files with indirect coverage changes
```diff
@@ -71,6 +71,8 @@ whisk {
     prewarm-promotion: false # if true, action can take prewarm container which has bigger memory
     memory-sync-interval: 1 second # period to sync memory info to etcd
     batch-deletion-size: 10 # batch size for removing containers when disable invoker, too big value may cause docker/k8s overload
+    # optional setting to specify the total allocatable cpus for all action containers, each container will get a fraction of this proportional to its allocated memory to limit the cpu
+    # user-cpus: 1
```
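To make the proportionality concrete, a hedged illustration with made-up values (the numbers are not defaults, and the exact key nesting is assumed from the snippet above):

```
whisk {
  container-pool {
    user-memory: 2048 m  # total memory available to action containers
    user-cpus: 2         # if set, a 256 MiB action gets 2 * (256 / 2048) = 0.25 cpus
  }
}
```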
BTW, could you add a configuration like this?
https://github.com/apache/openwhisk/blob/master/ansible/roles/invoker/tasks/deploy.yml#L304
It would be easier to configure it with ansible.
@style95 I just pushed some changes to `ansible/roles/invoker/tasks/deploy.yml` and `ansible/group_vars/all`. I'm not super familiar with ansible and was trying to keep it optional. Can you let me know how that looks?
```diff
@@ -258,6 +258,7 @@
       "CONFIG_whisk_containerFactory_containerArgs_network": "{{ invoker_container_network_name | default('bridge') }}"
       "INVOKER_CONTAINER_POLICY": "{{ invoker_container_policy_name | default()}}"
       "CONFIG_whisk_containerPool_userMemory": "{{ hostvars[groups['invokers'][invoker_index | int]].user_memory | default(invoker.userMemory) }}"
+      "CONFIG_whisk_containerPool_userCpus": "{{ invoker.userCpus | default() }}"
```
I could deploy this branch w/ and w/o this configuration using ansible.
I confirmed that NanoCpus is properly configured in containers according to this config.
Without the configuration, `NanoCpus` stays at 0 (no cap):

```
...
"CpuShares": 25,
"Memory": 268435456,
"NanoCpus": 0,
...
```

With the configuration, `NanoCpus` reflects the computed limit:

```
...
"CpuShares": 25,
"Memory": 268435456,
"NanoCpus": 150000000,
...
```
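For reference, Docker stores a fractional `--cpus` limit as `NanoCpus` (1 cpu = 10^9 nano-cpus), which is how the `150000000` above corresponds to a limit of 0.15 cpus. A minimal sketch of that conversion (the helper name is mine, not from the PR):

```scala
object NanoCpusSketch {
  // Docker represents a fractional --cpus limit as nano-cpus: 1 cpu = 1e9
  def toNanoCpus(cpus: Double): Long = math.round(cpus * 1e9)

  def main(args: Array[String]): Unit = {
    println(toNanoCpus(0.15)) // 150000000, matching the capped container above
    println(toNanoCpus(0.0))  // 0, i.e. no cap, as in the first container
  }
}
```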
LGTM
* Add optional cpu limit to spawned action containers
* conf
* formatting
* Update core/invoker/src/main/resources/application.conf

Co-authored-by: Dominic Kim <style9595@gmail.com>

* formatting
* formatting
* ansible

---------

Co-authored-by: Dominic Kim <style9595@gmail.com>

(cherry picked from commit 0c27a65)
Description
This change provides the option to limit action container cpu usage proportional to the allocated memory. This helps to prevent an invoker from being overloaded by action cpu usage, and also provides more predictable performance for actions.
Related issue and scope
My changes affect the following components
Types of changes
Checklist: