
Add optional cpu limit to spawned action containers #5443

Merged: 7 commits merged into apache:master from quintenp/container-cpu-limit on Sep 29, 2023

Conversation

@quintenp01 (Contributor):

Description

This change adds an option to limit an action container's CPU usage in proportion to its allocated memory. This helps prevent an invoker from being overloaded by action CPU usage, and also provides more predictable performance for actions.
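To make the proportionality concrete, here is a minimal sketch with purely hypothetical numbers (an invoker configured with 2 CPUs and 8192 MB of user memory); the real logic lives in ContainerPoolConfig.cpuLimit, shown further down in this thread:

object CpuLimitExample {
  // Hypothetical invoker settings: 2 CPUs shared across 8192 MB of user memory.
  val userCpus: Double = 2.0
  val userMemoryMb: Long = 8192L

  // Each action container gets the same fraction of the CPU pool as it gets of the memory pool.
  def cpuLimit(actionMemoryMb: Long): Double =
    userCpus / (userMemoryMb.toDouble / actionMemoryMb)

  val small = cpuLimit(512)  // 0.125, i.e. the container would be started with --cpus 0.125
  val large = cpuLimit(2048) // 0.5
}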

Related issue and scope

My changes affect the following components

  • API
  • Controller
  • Message Bus (e.g., Kafka)
  • Loadbalancer
  • Scheduler
  • Invoker
  • Intrinsic actions (e.g., sequences, conductors)
  • Data stores (e.g., CouchDB)
  • Tests
  • Deployment
  • CLI
  • General tooling
  • Documentation

Types of changes

  • Bug fix (generally a non-breaking change which closes an issue).
  • Enhancement or new feature (adds new functionality).
  • Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

  • I signed an Apache CLA.
  • I reviewed the style guides and followed the recommendations (Travis CI will check :).
  • I added tests to cover my changes.
  • My changes require further changes to the documentation.
  • I updated the documentation where necessary.

@style95 (Member) left a comment:
Nice work!
This generally looks good to me.

import scala.concurrent.duration.DurationInt

@RunWith(classOf[JUnitRunner])
class ContainerPoolConfigTests extends FlatSpec with Matchers {
@style95 (Member): 👍

core/invoker/src/main/resources/application.conf (review thread resolved, outdated)
@style95 (Member) commented on Sep 26, 2023:

It seems the code is wrongly formatted.

> Task :tests:checkTestScalafmt FAILED
FAILURE: Build failed with an exception.

* What went wrong:

Execution failed for task ':tests:checkTestScalafmt'.
Deprecated Gradle features were used in this build, making it incompatible with Gradle 7.0.
> Files incorrectly formatted: /home/runner/work/openwhisk/openwhisk/tests/src/test/scala/common/LoggedFunction.scala,/home/runner/work/openwhisk/openwhisk/tests/src/test/scala/org/apache/openwhisk/core/containerpool/v2/test/FunctionPullingContainerProxyTests.scala,/home/runner/work/openwhisk/openwhisk/tests/src/test/scala/org/apache/openwhisk/core/containerpool/test/ContainerProxyTests.scala

quintenp01 and others added 2 commits September 26, 2023 16:44
Co-authored-by: Dominic Kim <style9595@gmail.com>
def cpuLimit(reservedMemory: ByteSize): Option[Double] = {
  userCpus.map(c => {
    val containerCpus = c / (userMemory.toBytes / reservedMemory.toBytes)
    val roundedContainerCpus = round(containerCpus * roundingMultiplier).toDouble / roundingMultiplier // Only use decimal precision of 5
@style95 (Member): 👍

@style95 (Member) left a comment:
@quintenp01
Thank you for your contribution.
It looks good to me 👍

@@ -101,6 +102,7 @@ object DockerContainer {
    dnsSearch.flatMap(d => Seq("--dns-search", d)) ++
    dnsOptions.flatMap(d => Seq(dnsOptString, d)) ++
    name.map(n => Seq("--name", n)).getOrElse(Seq.empty) ++
    cpuLimit.map(c => Seq("--cpus", c.toString)).getOrElse(Seq.empty) ++
@bdoyle0182 (Contributor):
I wasn't aware that you could set both --cpus and --cpu-shares on docker run. How does this behave in practice when both are set?

@quintenp01 (Contributor, Author):
--cpu-shares still provides the relative weight/priority for the container's CPU cycles; --cpus just provides the hard cap.
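To illustrate the combination (hypothetical values; this sketch only roughly mirrors how DockerContainer assembles its argument list): --cpu-shares is a relative weight that only matters when the host CPU is contended, while --cpus becomes a hard CFS quota that applies even when the host is idle.

object CpuFlagsExample {
  // Hypothetical per-container values.
  val cpuShares = 256  // relative weight against other containers, used only under CPU contention
  val cpus = 0.25      // hard ceiling: at most a quarter of a core, even if the host is otherwise idle

  // Roughly how the docker run arguments end up looking when both are set.
  val cpuArgs: Seq[String] =
    Seq("--cpu-shares", cpuShares.toString) ++
      Seq("--cpus", cpus.toString)
  // -> docker run --cpu-shares 256 --cpus 0.25 ...
}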

@bdoyle0182 (Contributor) commented:
The tradeoff here should be obvious: this prevents user functions from bursting to take an entire node's CPU when capacity is available, but that bursting is entirely dependent on what else is running on the node, resulting in inconsistent performance for the user. It also creates an uncontrollable noisy-neighbor problem, where a high-CPU function can hurt the performance of the overall node and of the other functions executing on it. It's likely preferable for users to have reliable, consistent performance on each execution rather than CPU bursting, unless they are in a highly controlled cluster where they have control over all the functions running. I think this should be considered as a default for the 2.0 major version, though I'm sure that would be controversial, since it would be extremely hard to apply to public clouds where there's no guarantee the change wouldn't break a user's function with the lowered CPU. It would require a secondary deployment for 2.0 at that point, where a function user consciously upgrades their function to the 2.0 cluster (which may already have to be the upgrade model for public clouds moving to 2.0 anyway, given the new scheduler).

Also, the way this was implemented is really clean. I've wanted to do this in the past, and my idea didn't include making the config the number of available CPU cores and rationing it out in exactly the same way the user-memory config is handled; that makes it very easy for an operator to understand and modify.

@codecov-commenter commented on Sep 27, 2023:

Codecov Report

Merging #5443 (3a75379) into master (6375c96) will decrease coverage by 0.28%.
The diff coverage is 92.85%.

❗ Current head 3a75379 differs from pull request most recent head 8e0cf4e. Consider uploading reports for the commit 8e0cf4e to get more accurate results

@@            Coverage Diff             @@
##           master    #5443      +/-   ##
==========================================
- Coverage   76.80%   76.53%   -0.28%     
==========================================
  Files         241      241              
  Lines       14634    14646      +12     
  Branches      607      617      +10     
==========================================
- Hits        11240    11209      -31     
- Misses       3394     3437      +43     
Files Coverage Δ
...penwhisk/core/containerpool/ContainerFactory.scala 92.30% <100.00%> (+2.83%) ⬆️
...che/openwhisk/core/yarn/YARNContainerFactory.scala 87.36% <ø> (ø)
.../openwhisk/core/containerpool/ContainerProxy.scala 95.22% <100.00%> (+0.02%) ⬆️
...sk/core/containerpool/docker/DockerContainer.scala 98.80% <100.00%> (+0.01%) ⬆️
.../containerpool/docker/DockerContainerFactory.scala 69.69% <ø> (ø)
...erpool/kubernetes/KubernetesContainerFactory.scala 0.00% <ø> (ø)
...ntainerpool/v2/FunctionPullingContainerProxy.scala 78.72% <100.00%> (+0.06%) ⬆️
...pool/docker/StandaloneDockerContainerFactory.scala 0.00% <0.00%> (ø)

... and 6 files with indirect coverage changes


@@ -71,6 +71,8 @@ whisk {
    prewarm-promotion: false # if true, action can take prewarm container which has bigger memory
    memory-sync-interval: 1 second # period to sync memory info to etcd
    batch-deletion-size: 10 # batch size for removing containers when disable invoker, too big value may cause docker/k8s overload
    # optional setting to specify the total allocatable cpus for all action containers, each container will get a fraction of this proportional to its allocated memory to limit the cpu
    # user-cpus: 1
@style95 (Member):
BTW, could you add a configuration like this?
https://github.com/apache/openwhisk/blob/master/ansible/roles/invoker/tasks/deploy.yml#L304

It would be easier to configure it with ansible.

@quintenp01 (Contributor, Author):
@style95 I just pushed some changes to ansible/roles/invoker/tasks/deploy.yml and ansible/group_vars/all. I'm not super familiar with ansible and was trying to keep it optional. Can you let me know how that looks?

@@ -258,6 +258,7 @@
"CONFIG_whisk_containerFactory_containerArgs_network": "{{ invoker_container_network_name | default('bridge') }}"
"INVOKER_CONTAINER_POLICY": "{{ invoker_container_policy_name | default()}}"
"CONFIG_whisk_containerPool_userMemory": "{{ hostvars[groups['invokers'][invoker_index | int]].user_memory | default(invoker.userMemory) }}"
"CONFIG_whisk_containerPool_userCpus": "{{ invoker.userCpus | default() }}"
@style95 (Member):
I could deploy this branch with and without this configuration using ansible, and I confirmed that NanoCpus is properly configured in the containers according to this config.

...
            "CpuShares": 25,
            "Memory": 268435456,
            "NanoCpus": 0,
...
...
            "CpuShares": 25,
            "Memory": 268435456,
            "NanoCpus": 150000000,
...
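For reference (and assuming, as the comment above suggests, that the first snippet is the deployment without user-cpus and the second is the one with it): Docker stores the --cpus value as NanoCpus, i.e. the CPU fraction multiplied by 10^9, so 150000000 corresponds to a cap of 0.15 CPUs and 0 means no cap. A minimal conversion sketch:

object NanoCpusExample {
  // Docker represents `--cpus` internally as NanoCpus: the CPU fraction times 1e9.
  def toNanoCpus(cpus: Double): Long = math.round(cpus * 1e9)

  val capped = toNanoCpus(0.15) // 150000000, matching the second inspect snippet above
  val uncapped = 0L             // NanoCpus of 0 means no CPU cap was set
}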

@style95 (Member) left a comment:
LGTM

@bdoyle0182 merged commit 0c27a65 into apache:master on Sep 29, 2023 (6 checks passed).
@bdoyle0182 deleted the quintenp/container-cpu-limit branch on September 29, 2023 at 13:28.
mtt-merz pushed a commit to mtt-merz/openwhisk that referenced this pull request Oct 22, 2023
* Add optional cpu limit to spawned action containers

* conf

* formatting

* Update core/invoker/src/main/resources/application.conf

Co-authored-by: Dominic Kim <style9595@gmail.com>

* formatting

* formatting

* ansible

---------

Co-authored-by: Dominic Kim <style9595@gmail.com>
(cherry picked from commit 0c27a65)
Labels: none yet · Projects: none yet · Linked issues: none yet · 4 participants