greedy task resource accounting instead of blocking on tasks ahead in the queue #3822

@singholt commented Jul 20, 2023

Summary

Traditionally, the ECS agent didn't start new tasks on a container instance until all stopping tasks on that instance had stopped.

With the task resource accounting project (#3819), we replaced this serialization with a resource accounting strategy that uses a FIFO queue to serve incoming tasks. If the host doesn't have the resources a task needs, the task is queued until those resources become available.

As soon as resources become available, the first task in the queue gets priority, and the other waiting tasks continue to wait until that first task has been served. In some corner cases, the first task in the queue is large enough that it blocks all other tasks from being served indefinitely.

This PR changes that behavior. Tasks are still given preference in FIFO order, but the agent now tries to start as many tasks from the queue as resources allow, instead of blocking on tasks ahead in the queue that cannot start yet.

Example:
Consider a container instance with 1024 MiB of memory. Three tasks land on the instance: task T1 with 512 MiB of memory, task T2 with 1024 MiB, and task T3 with 256 MiB.

T1 will run, since the requested memory is available. T2 needs more memory (1024 MiB) than what's available (512 MiB), so it is placed in the wait queue. T3 will also run, because the 256 MiB it requests is available. Prior to this change, T3 would not run until T2 had been served first (and the only way T2 could run is if T1 stopped).
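To make the difference concrete, here is a minimal Go sketch of both admission policies applied to the example above. It is a memory-only model that treats T1, T2, and T3 as arriving together; the `task` type and the `strictFIFO` and `greedy` functions are hypothetical illustrations, not the agent's code.

```go
package main

import "fmt"

// task is a hypothetical, memory-only model of a task; the real agent
// accounts for more resource types than memory.
type task struct {
	name   string
	memMiB int
}

// strictFIFO admits tasks in arrival order and stops at the first task that
// does not fit, so that task blocks every task behind it.
func strictFIFO(capacityMiB int, arrivals []task) (started, waiting []string) {
	free := capacityMiB
	for i, t := range arrivals {
		if t.memMiB > free {
			// Head-of-line blocking: everything from here on keeps waiting.
			for _, w := range arrivals[i:] {
				waiting = append(waiting, w.name)
			}
			return started, waiting
		}
		free -= t.memMiB
		started = append(started, t.name)
	}
	return started, waiting
}

// greedy still visits tasks in arrival order, but a task that does not fit
// is left waiting while the tasks behind it are still considered.
func greedy(capacityMiB int, arrivals []task) (started, waiting []string) {
	free := capacityMiB
	for _, t := range arrivals {
		if t.memMiB > free {
			waiting = append(waiting, t.name)
			continue
		}
		free -= t.memMiB
		started = append(started, t.name)
	}
	return started, waiting
}

func main() {
	tasks := []task{{"T1", 512}, {"T2", 1024}, {"T3", 256}}

	// Prints: strict FIFO: started [T1] waiting [T2 T3]
	s, w := strictFIFO(1024, tasks)
	fmt.Println("strict FIFO: started", s, "waiting", w)

	// Prints: greedy: started [T1 T3] waiting [T2]
	s, w = greedy(1024, tasks)
	fmt.Println("greedy: started", s, "waiting", w)
}
```

Under strict FIFO, T3 is stuck behind T2 even though 512 MiB is free; under the greedy pass, T3 starts and only T2 waits.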

Note:

  • The target branch for this PR is feature/greedy-task-resource-accounting.
  • This approach has corner cases as well: for example, smaller tasks may keep starting while a large task stays stuck in the queue. After speaking with product, we plan to introduce a mechanism that removes a task from the queue if it has been waiting for too long (a wait timeout). This will be addressed in a follow-up PR.

Implementation details

  • Updated dequeueTask() to dequeue the task at a given index, instead of always dequeuing the first task.
  • Replaced topTask() with getTaskByIndex(), which returns the task at a given index instead of always returning the first task.
  • Updated monitorQueuedTasks() to dequeue as many tasks as possible, instead of stopping when the first task in the queue cannot be dequeued.
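Here is a minimal Go sketch of how index-based getTaskByIndex() and dequeueTask() can support a greedy monitorQueuedTasks() pass. The method names come from this PR, but the `waitingQueue` type, the signatures, and the memory-only resource model are assumptions for illustration, not the agent's actual implementation. The main subtlety is the loop index: after the task at index i is dequeued, later tasks shift down by one, so i advances only when a task is skipped.

```go
package main

import (
	"errors"
	"fmt"
)

// task is a hypothetical, memory-only model of a queued task.
type task struct {
	name   string
	memMiB int
}

// waitingQueue is a hypothetical stand-in for the queue of tasks waiting for
// host resources; it is not the agent's actual type.
type waitingQueue struct {
	tasks []task
}

var errIndexOutOfRange = errors.New("index out of range")

// getTaskByIndex returns the task at position i without removing it.
func (q *waitingQueue) getTaskByIndex(i int) (task, error) {
	if i < 0 || i >= len(q.tasks) {
		return task{}, errIndexOutOfRange
	}
	return q.tasks[i], nil
}

// dequeueTask removes and returns the task at position i, preserving the
// relative order of the remaining tasks.
func (q *waitingQueue) dequeueTask(i int) (task, error) {
	t, err := q.getTaskByIndex(i)
	if err != nil {
		return task{}, err
	}
	q.tasks = append(q.tasks[:i], q.tasks[i+1:]...)
	return t, nil
}

// monitorQueuedTasks makes one greedy pass over the queue: each task that
// fits in freeMiB is dequeued and counted as started, and tasks that do not
// fit stay queued without blocking the tasks behind them. It returns the
// names of the started tasks and the memory left over.
func (q *waitingQueue) monitorQueuedTasks(freeMiB int) ([]string, int) {
	var started []string
	for i := 0; i < len(q.tasks); {
		t, _ := q.getTaskByIndex(i) // i is always in range inside this loop
		if t.memMiB > freeMiB {
			i++ // does not fit yet; try the next task in the queue
			continue
		}
		q.dequeueTask(i) // later tasks shift down, so i is not advanced
		freeMiB -= t.memMiB
		started = append(started, t.name)
	}
	return started, freeMiB
}

func main() {
	// Mirrors the example: T1 is running, so 512 of 1024 MiB is free.
	q := &waitingQueue{tasks: []task{{"T2", 1024}, {"T3", 256}}}
	started, free := q.monitorQueuedTasks(512)
	fmt.Println(started, free, len(q.tasks)) // [T3] 256 1
}
```

T2 stays at the front of the queue, so it keeps its FIFO preference on the next pass, while T3 no longer has to wait behind it.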

Testing

New tests cover the changes: yes

  • Updated the unit test to mimic how tasks are queued when their resource requirements are identical and when they differ.
  • Integration tests will be updated/added in a follow-up PR.

Description for the changelog

enhancement: switch to greedy task resource accounting instead of blocking on tasks ahead in the queue.

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@singholt requested a review from a team as a code owner July 20, 2023 22:49
@singholt changed the title from "switch to greedy task resource accounting instead of blocking on tasks ahead in the queue" to "greedy task resource accounting instead of blocking on tasks ahead in the queue" Jul 20, 2023
@singholt force-pushed the feature/greedy-task-resource-accounting branch from 5812b5d to ffc7974 July 21, 2023 00:23
@singholt force-pushed the feature/greedy-task-resource-accounting branch from ffc7974 to 56fac99 July 21, 2023 18:10
@singholt requested a review from mye956 July 21, 2023 18:14
@singholt force-pushed the feature/greedy-task-resource-accounting branch from 56fac99 to 3997ebd July 21, 2023 18:25
ubhattacharjya previously approved these changes Jul 21, 2023
mye956 previously approved these changes Jul 21, 2023
@singholt dismissed stale reviews from mye956 and ubhattacharjya via 187e9bb July 21, 2023 18:33
@singholt force-pushed the feature/greedy-task-resource-accounting branch from 3997ebd to 187e9bb July 21, 2023 18:33
@singholt force-pushed the feature/greedy-task-resource-accounting branch from 187e9bb to 3ea8de0 July 21, 2023 18:52
@singholt force-pushed the feature/greedy-task-resource-accounting branch from 3ea8de0 to 9c322f7 July 21, 2023 19:29
@singholt merged commit 3b6c4f1 into aws:feature/greedy-task-resource-accounting Jul 21, 2023
34 of 35 checks passed