
Add optional cpu limit to spawned action containers #5443

Merged: 7 commits merged into apache:master from quintenp/container-cpu-limit on Sep 29, 2023

Conversation

@quintenp01 (Contributor):

Description

This change adds an option to limit an action container's CPU usage in proportion to its allocated memory. This helps prevent an invoker from being overloaded by action CPU usage, and also provides more predictable performance for actions.
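To make the proportionality concrete, here is a minimal sketch with purely hypothetical numbers (an invoker configured with 2 CPUs and 8192 MB of user memory); the real logic lives in ContainerPoolConfig.cpuLimit, shown further down in this thread:

object CpuLimitExample {
  // Hypothetical invoker settings: 2 CPUs shared across 8192 MB of user memory.
  val userCpus: Double = 2.0
  val userMemoryMb: Long = 8192L

  // Each action container gets the same fraction of the CPU pool as it gets of the memory pool.
  def cpuLimit(actionMemoryMb: Long): Double =
    userCpus / (userMemoryMb.toDouble / actionMemoryMb)

  val small = cpuLimit(512)  // 0.125, i.e. the container would be started with --cpus 0.125
  val large = cpuLimit(2048) // 0.5
}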

Related issue and scope

My changes affect the following components

  • API
  • Controller
  • Message Bus (e.g., Kafka)
  • Loadbalancer
  • Scheduler
  • Invoker
  • Intrinsic actions (e.g., sequences, conductors)
  • Data stores (e.g., CouchDB)
  • Tests
  • Deployment
  • CLI
  • General tooling
  • Documentation

Types of changes

  • Bug fix (generally a non-breaking change which closes an issue).
  • Enhancement or new feature (adds new functionality).
  • Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

  • I signed an Apache CLA.
  • I reviewed the style guides and followed the recommendations (Travis CI will check :).
  • I added tests to cover my changes.
  • My changes require further changes to the documentation.
  • I updated the documentation where necessary.

@style95 (Member) left a comment:
Nice work!
This generally looks good to me.

import scala.concurrent.duration.DurationInt

@RunWith(classOf[JUnitRunner])
class ContainerPoolConfigTests extends FlatSpec with Matchers {
@style95 (Member): 👍

core/invoker/src/main/resources/application.conf (review thread resolved, outdated)
@style95 (Member) commented on Sep 26, 2023:

It seems the code is wrongly formatted.

> Task :tests:checkTestScalafmt FAILED
FAILURE: Build failed with an exception.

* What went wrong:

Execution failed for task ':tests:checkTestScalafmt'.
Deprecated Gradle features were used in this build, making it incompatible with Gradle 7.0.
> Files incorrectly formatted: /home/runner/work/openwhisk/openwhisk/tests/src/test/scala/common/LoggedFunction.scala,/home/runner/work/openwhisk/openwhisk/tests/src/test/scala/org/apache/openwhisk/core/containerpool/v2/test/FunctionPullingContainerProxyTests.scala,/home/runner/work/openwhisk/openwhisk/tests/src/test/scala/org/apache/openwhisk/core/containerpool/test/ContainerProxyTests.scala

quintenp01 and others added 2 commits September 26, 2023 16:44
Co-authored-by: Dominic Kim <style9595@gmail.com>
def cpuLimit(reservedMemory: ByteSize): Option[Double] = {
  userCpus.map(c => {
    val containerCpus = c / (userMemory.toBytes / reservedMemory.toBytes)
    val roundedContainerCpus = round(containerCpus * roundingMultiplier).toDouble / roundingMultiplier // Only use decimal precision of 5
@style95 (Member): 👍

@style95 (Member) left a comment:
@quintenp01
Thank you for your contribution.
It looks good to me 👍

@@ -101,6 +102,7 @@ object DockerContainer {
    dnsSearch.flatMap(d => Seq("--dns-search", d)) ++
    dnsOptions.flatMap(d => Seq(dnsOptString, d)) ++
    name.map(n => Seq("--name", n)).getOrElse(Seq.empty) ++
    cpuLimit.map(c => Seq("--cpus", c.toString)).getOrElse(Seq.empty) ++
@bdoyle0182 (Contributor):
I wasn't aware that you could set both --cpus and --cpu-shares on docker run. How does this behave in practice when both are set?

@quintenp01 (Contributor, Author):
--cpu-shares still provides the relative weight/priority for the container's CPU cycles; --cpus just provides the hard cap.
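To illustrate the combination (hypothetical values; this sketch only roughly mirrors how DockerContainer assembles its argument list): --cpu-shares is a relative weight that only matters when the host CPU is contended, while --cpus becomes a hard CFS quota that applies even when the host is idle.

object CpuFlagsExample {
  // Hypothetical per-container values.
  val cpuShares = 256  // relative weight against other containers, used only under CPU contention
  val cpus = 0.25      // hard ceiling: at most a quarter of a core, even if the host is otherwise idle

  // Roughly how the docker run arguments end up looking when both are set.
  val cpuArgs: Seq[String] =
    Seq("--cpu-shares", cpuShares.toString) ++
      Seq("--cpus", cpus.toString)
  // -> docker run --cpu-shares 256 --cpus 0.25 ...
}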

@bdoyle0182 (Contributor) commented:
The tradeoff here should be obvious: this prevents user functions from bursting to take an entire node's CPU when capacity is available, but that bursting is entirely dependent on what else is running on the node, resulting in inconsistent performance for the user. It also creates an uncontrollable noisy-neighbor problem, where a high-CPU function can hurt the performance of the overall node and of the other functions executing on it. It's likely preferable for users to have reliable, consistent performance on each execution rather than CPU bursting, unless they are in a highly controlled cluster where they have control over all the functions running. I think this should be considered as a default for the 2.0 major version, though I'm sure that would be controversial, since it would be extremely hard to apply to public clouds where there's no guarantee the change wouldn't break a user's function with the lowered CPU. It would require a secondary deployment for 2.0 at that point, where a function user consciously upgrades their function to the 2.0 cluster (which may already have to be the upgrade model for public clouds moving to 2.0 anyway, given the new scheduler).

Also, the way this was implemented is really clean. I've wanted to do this in the past, and my idea didn't include making the config the number of available CPU cores and rationing it out in exactly the same way the user-memory config is handled; that makes it very easy for an operator to understand and modify.

@codecov-commenter commented on Sep 27, 2023:

Codecov Report

Merging #5443 (3a75379) into master (6375c96) will decrease coverage by 0.28%.
The diff coverage is 92.85%.

❗ Current head 3a75379 differs from pull request most recent head 8e0cf4e. Consider uploading reports for the commit 8e0cf4e to get more accurate results

@@            Coverage Diff             @@
##           master    #5443      +/-   ##
==========================================
- Coverage   76.80%   76.53%   -0.28%     
==========================================
  Files         241      241              
  Lines       14634    14646      +12     
  Branches      607      617      +10     
==========================================
- Hits        11240    11209      -31     
- Misses       3394     3437      +43     
Files Coverage Δ
...penwhisk/core/containerpool/ContainerFactory.scala 92.30% <100.00%> (+2.83%) ⬆️
...che/openwhisk/core/yarn/YARNContainerFactory.scala 87.36% <ø> (ø)
.../openwhisk/core/containerpool/ContainerProxy.scala 95.22% <100.00%> (+0.02%) ⬆️
...sk/core/containerpool/docker/DockerContainer.scala 98.80% <100.00%> (+0.01%) ⬆️
.../containerpool/docker/DockerContainerFactory.scala 69.69% <ø> (ø)
...erpool/kubernetes/KubernetesContainerFactory.scala 0.00% <ø> (ø)
...ntainerpool/v2/FunctionPullingContainerProxy.scala 78.72% <100.00%> (+0.06%) ⬆️
...pool/docker/StandaloneDockerContainerFactory.scala 0.00% <0.00%> (ø)

... and 6 files with indirect coverage changes


@@ -71,6 +71,8 @@ whisk {
    prewarm-promotion: false # if true, action can take prewarm container which has bigger memory
    memory-sync-interval: 1 second # period to sync memory info to etcd
    batch-deletion-size: 10 # batch size for removing containers when disable invoker, too big value may cause docker/k8s overload
    # optional setting to specify the total allocatable cpus for all action containers, each container will get a fraction of this proportional to its allocated memory to limit the cpu
    # user-cpus: 1
@style95 (Member):
BTW, could you add a configuration like this?
https://github.com/apache/openwhisk/blob/master/ansible/roles/invoker/tasks/deploy.yml#L304

It would be easier to configure it with ansible.

@quintenp01 (Contributor, Author):
@style95 I just pushed some changes to ansible/roles/invoker/tasks/deploy.yml and ansible/group_vars/all. I'm not super familiar with ansible and was trying to keep it optional. Can you let me know how that looks?

@@ -258,6 +258,7 @@
"CONFIG_whisk_containerFactory_containerArgs_network": "{{ invoker_container_network_name | default('bridge') }}"
"INVOKER_CONTAINER_POLICY": "{{ invoker_container_policy_name | default()}}"
"CONFIG_whisk_containerPool_userMemory": "{{ hostvars[groups['invokers'][invoker_index | int]].user_memory | default(invoker.userMemory) }}"
"CONFIG_whisk_containerPool_userCpus": "{{ invoker.userCpus | default() }}"
@style95 (Member):
I could deploy this branch with and without this configuration using ansible, and I confirmed that NanoCpus is properly configured in the containers according to this config.

...
            "CpuShares": 25,
            "Memory": 268435456,
            "NanoCpus": 0,
...
...
            "CpuShares": 25,
            "Memory": 268435456,
            "NanoCpus": 150000000,
...
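For reference (and assuming, as the comment above suggests, that the first snippet is the deployment without user-cpus and the second is the one with it): Docker stores the --cpus value as NanoCpus, i.e. the CPU fraction multiplied by 10^9, so 150000000 corresponds to a cap of 0.15 CPUs and 0 means no cap. A minimal conversion sketch:

object NanoCpusExample {
  // Docker represents `--cpus` internally as NanoCpus: the CPU fraction times 1e9.
  def toNanoCpus(cpus: Double): Long = math.round(cpus * 1e9)

  val capped = toNanoCpus(0.15) // 150000000, matching the second inspect snippet above
  val uncapped = 0L             // NanoCpus of 0 means no CPU cap was set
}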

@style95 (Member) left a comment:
LGTM

@bdoyle0182 merged commit 0c27a65 into apache:master on Sep 29, 2023 (6 checks passed).
@bdoyle0182 deleted the quintenp/container-cpu-limit branch on September 29, 2023 at 13:28.
mtt-merz pushed a commit to mtt-merz/openwhisk that referenced this pull request Oct 22, 2023
* Add optional cpu limit to spawned action containers

* conf

* formatting

* Update core/invoker/src/main/resources/application.conf

Co-authored-by: Dominic Kim <style9595@gmail.com>

* formatting

* formatting

* ansible

---------

Co-authored-by: Dominic Kim <style9595@gmail.com>
(cherry picked from commit 0c27a65)
Labels: none yet · Projects: none yet · Linked issues: none yet · 4 participants