Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix memory resource accounting for multiple containers in single task #3782

Conversation

prateekchaudhry
Copy link
Contributor

@prateekchaudhry prateekchaudhry commented Jul 5, 2023

Summary

This PR fixes how task memory is accounted when task level memory limit is not specified in task definition. The existing implementation fails to account correctly in cases where containers with MemoryReservation field present in HostConfig might occur before containers without such field in the task.Containers slice. In such cases, MemoryReservation of previous container 'persists', as the object into which the HostConfig is being unmarshalled is declared outside of the loop which iterates over the containers.

Main cause of this change is that the ACS payload agent receives has HostConfig like {\"NetworkMode\":\"bridge\",\"CapAdd\":[],\"CapDrop\":[]} with missing fields in HostConfig struct. That is, if fields like MemoryReservation are not specified, they are not present in the payload, and json.Unmarshaldoes not make any changes to the object. Using a new object for every container ensures this persistence does not happen.

Implementation details

Fix:

  • Define new &dockercontainer.HostConfig{} object for each container when mapping task to host resources
  • rawHostConfig* in related unit tests have been modified to be directly defined using a string instead of marshaling a HostConfig object. This is necessary, because if we use a HostConfig object, it will use the zero value (int64(0)) of fields like MemoryReservation if they are not specified

Related Containers Roadmap Issue

aws/containers-roadmap#325

Testing

Verified -

  • With manual tests this behavior is consistent
  • With manual tests resources allocated in ECS back end are same as ECS Agent for each task
  • Added new unit tests with multiple containers and a test with awsvpc network mode
  • Verified test with multiple containers - with containers with MemoryReservation followed by containers with memory fail for existing implementation - and pass with the fix -testTask5 here

New tests cover the changes:
Yes

Description for the changelog

Fix memory resource accounting for multiple containers in single task

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@prateekchaudhry prateekchaudhry requested a review from a team as a code owner July 5, 2023 08:20
@@ -4983,6 +4983,35 @@ func TestToHostResources(t *testing.T) {
},
}

// A combination of containers with different configs with memory sourcing from different sources
testTask5 := &Task{
Copy link
Contributor

@yinyic yinyic Jul 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test may not be able to catch the gap that we are fixing, since the test uses a single hostConfig for all containers, In that case, the existing implementation would've worked fine. We should create at least two hostConfigs with different memoryReservation, such that the existing implementation would fail

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack - additionally, existing implementation fails due to payload Agent receives does not have all the HostConfig fields. So the unit test and it's hostConfig need to be defined in such a way as well to actually repro the case with failure for existing impl and success for new.

If we define HostConfig with a struct in unit test, the zero value (=int64(0)) gets used instead which does not create actual scenario, in which the MemoryReservation field itself might not exist. So both implementations succeed in such case.

Pushed changes to correct this.

@prateekchaudhry prateekchaudhry force-pushed the taskAccountingMemoryAccountingFix branch 5 times, most recently from 1c95f25 to b737dcf Compare July 10, 2023 19:26
Realmonia
Realmonia previously approved these changes Jul 10, 2023
Comment on lines -3560 to +3564
// To parse memory reservation / soft limit
hostConfig := &dockercontainer.HostConfig{}

for _, c := range task.Containers {
// To parse memory reservation / soft limit
hostConfig := &dockercontainer.HostConfig{}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update in description about how this is causing the problem? i.e. json.Unmarshal will not override previously assigned value if the field is not specified in the new json source.

@prateekchaudhry prateekchaudhry changed the title fix memory resource accounting for multiple containers fix memory resource accounting for multiple containers in single task Jul 10, 2023
@prateekchaudhry prateekchaudhry force-pushed the feature/task-resource-accounting branch from 9aa7d55 to 058983a Compare July 12, 2023 00:44
@prateekchaudhry prateekchaudhry dismissed Realmonia’s stale review July 12, 2023 00:53

The merge-base changed after approval.

@prateekchaudhry prateekchaudhry force-pushed the feature/task-resource-accounting branch from 058983a to 1e72259 Compare July 12, 2023 01:35
@prateekchaudhry
Copy link
Contributor Author

  • Rebased feature branch against dev
  • Reverted task resource accounting revert changes in feature branch to put it back in
  • Rebased taskAccountingMemoryAccountingFix in fork against feature branch

Copy link
Contributor

@SreeeS SreeeS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@prateekchaudhry prateekchaudhry merged commit 2b0ff57 into aws:feature/task-resource-accounting Jul 12, 2023
34 of 35 checks passed
singholt pushed a commit to singholt/amazon-ecs-agent that referenced this pull request Jul 20, 2023
…aws#3782)

* fix memory resource accounting for multiple containers

* change unit tests for multiple containers, add unit test for awsvpc
Realmonia pushed a commit that referenced this pull request Jul 20, 2023
* Revert reverted changes for task resource accounting (#3796)

* Revert "Revert "host resource manager initialization""

This reverts commit dafb967.

* Revert "Revert "Add method to get host resources reserved for a task (#3706)""

This reverts commit 8d824db.

* Revert "Revert "Add host resource manager methods (#3700)""

This reverts commit bec1303.

* Revert "Revert "Remove task serialization and use host resource manager for task resources (#3723)""

This reverts commit cb54139.

* Revert "Revert "add integ tests for task accounting (#3741)""

This reverts commit 61ad010.

* Revert "Revert "Change reconcile/container update order on init and waitForHostResources/emitCurrentStatus order (#3747)""

This reverts commit 60a3f42.

* Revert "Revert "Dont consume host resources for tasks getting STOPPED while waiting in waitingTasksQueue (#3750)""

This reverts commit 8943792.

* fix memory resource accounting for multiple containers in single task (#3782)

* fix memory resource accounting for multiple containers

* change unit tests for multiple containers, add unit test for awsvpc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants