Tasks created in parallel might lead to FAILED tasks that are still executed #4619

@johha

Issue

A race condition exists during task creation when multiple tasks are started in parallel in a space whose org or space memory quota is restrictive enough that the tasks together would exceed it. Cloud Controller can then mark a task as FAILED even though it has already been sent to Diego and is executing successfully.

Context

The issue stems from two separate quota validations:

  • An initial validation occurs when the task is first created in a PENDING state. For parallel requests, this check can pass for multiple tasks before the quota is consumed.
  • A second validation is triggered when the task state is updated to RUNNING after being submitted to Diego.

When tasks are triggered in parallel, a task can pass the first check and be sent to Diego, which then executes it. However, by the time its state is updated to RUNNING, another parallel task may have already consumed the available quota. The second validation then fails, and Cloud Controller sets the task's state to FAILED, although Diego actually executed the task successfully.
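The interleaving described above can be replayed with a small toy model. This is only a sketch and not Cloud Controller code: the `simulate` function, the `used_mb` counter, and the assumption that memory is only counted against the quota at the second check are all hypothetical and chosen to mirror the description. The numbers match the reproduction below (a 1G quota, i.e. 1024 MB, and two 600M tasks).

```shell
#!/usr/bin/env bash
# Toy model of the check-then-act race; NOT Cloud Controller code.
# simulate QUOTA_MB TASK_MB replays two parallel task requests.
simulate() {
  local quota_mb=$1 task_mb=$2
  local used_mb=0   # memory already counted against the quota (assumption)
  local t

  # Check 1 (task created as PENDING): both requests are validated
  # before either one has been counted, so both can pass.
  for t in task1 task2; do
    if (( used_mb + task_mb <= quota_mb )); then
      echo "$t: check 1 passed, submitted to Diego"
    else
      echo "$t: rejected before reaching Diego"
    fi
  done

  # Check 2 (state update to RUNNING): memory is counted here,
  # so the second task no longer fits.
  for t in task1 task2; do
    if (( used_mb + task_mb <= quota_mb )); then
      used_mb=$(( used_mb + task_mb ))
      echo "$t: RUNNING"
    else
      echo "$t: FAILED in Cloud Controller, still executing on Diego"
    fi
  done
}

simulate 1024 600
```

In this model both tasks pass the first check, and only the second check notices that 600 MB + 600 MB exceeds 1024 MB, at which point the task is already on Diego.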

Steps to Reproduce

Create a space with a quota and assign the quota to it

cf create-space task-race-condition-test -o <some org>

# push a dummy app and stop it

cf create-space-quota repro-quota -m 1G -a 5

cf set-space-quota task-race-condition-test repro-quota

Run Two Tasks in Parallel

COMMAND="for i in {1..12}; do echo \"Task is running at \$(date)\"; sleep 5; done"
APP_NAME="task-app"

cf run-task "$APP_NAME" --command "$COMMAND" -m 600M --name task1 & cf run-task "$APP_NAME" --command "$COMMAND" -m 600M --name task2 &

Expected Result

The second task should not be forwarded to Diego, because it would exceed the memory quota.

Current Result

Run Task

cf run-task "$APP_NAME" --command "$COMMAND" -m 600M --name task1 & cf run-task "$APP_NAME" --command "$COMMAND" -m 600M --name task2 &

Creating task for app task-app in org <org> / space task-race-condition-test ...
Creating task for app task-app in org <org> / space task-race-condition-test ...
Task has been submitted successfully for execution.
OK

task name:   task1
task id:     3
OK

memory_in_mb exceeds space memory quota

Task Status

cf tasks task-app
Getting tasks for app task-app in org <org> / space task-race-condition-test as ...

id   name       state       start time                      command
4    task2      FAILED      Fri, 24 Oct 2025 14:54:41 UTC   for i in {1..12}; do echo "Task is running at $(date)"; sleep 5; done
3    task1      RUNNING     Fri, 24 Oct 2025 14:54:41 UTC   for i in {1..12}; do echo "Task is running at $(date)"; sleep 5; done

Task Status After Completion

cf tasks task-app
Getting tasks for app task-app in org <org> / space task-race-condition-test as ...

id   name       state       start time                      command
4    task2      FAILED      Fri, 24 Oct 2025 14:54:41 UTC   for i in {1..12}; do echo "Task is running at $(date)"; sleep 5; done
3    task1      SUCCEEDED   Fri, 24 Oct 2025 14:54:41 UTC   for i in {1..12}; do echo "Task is running at $(date)"; sleep 5; done

CF Logs Output

   2025-10-24T16:54:45.33+0200 [APP/TASK/task1/0] OUT Invoking pre-start scripts.
   2025-10-24T16:54:45.37+0200 [APP/TASK/task1/0] OUT Invoking start command.
   2025-10-24T16:54:45.37+0200 [APP/TASK/task1/0] OUT Task is running at Fri Oct 24 02:54:45 PM UTC 2025
   2025-10-24T16:54:45.49+0200 [APP/TASK/task2/0] OUT Invoking pre-start scripts.
   2025-10-24T16:54:45.52+0200 [APP/TASK/task2/0] OUT Invoking start command.
   2025-10-24T16:54:45.53+0200 [APP/TASK/task2/0] OUT Task is running at Fri Oct 24 02:54:45 PM UTC 2025
   2025-10-24T16:54:50.38+0200 [APP/TASK/task1/0] OUT Task is running at Fri Oct 24 02:54:50 PM UTC 2025
   2025-10-24T16:54:50.53+0200 [APP/TASK/task2/0] OUT Task is running at Fri Oct 24 02:54:50 PM UTC 2025
   2025-10-24T16:54:55.38+0200 [APP/TASK/task1/0] OUT Task is running at Fri Oct 24 02:54:55 PM UTC 2025
   2025-10-24T16:54:55.53+0200 [APP/TASK/task2/0] OUT Task is running at Fri Oct 24 02:54:55 PM UTC 2025
   2025-10-24T16:55:00.38+0200 [APP/TASK/task1/0] OUT Task is running at Fri Oct 24 02:55:00 PM UTC 2025
   2025-10-24T16:55:00.54+0200 [APP/TASK/task2/0] OUT Task is running at Fri Oct 24 02:55:00 PM UTC 2025
   2025-10-24T16:55:05.39+0200 [APP/TASK/task1/0] OUT Task is running at Fri Oct 24 02:55:05 PM UTC 2025
   2025-10-24T16:55:05.54+0200 [APP/TASK/task2/0] OUT Task is running at Fri Oct 24 02:55:05 PM UTC 2025
   2025-10-24T16:55:10.39+0200 [APP/TASK/task1/0] OUT Task is running at Fri Oct 24 02:55:10 PM UTC 2025
   2025-10-24T16:55:10.55+0200 [APP/TASK/task2/0] OUT Task is running at Fri Oct 24 02:55:10 PM UTC 2025
   2025-10-24T16:55:15.39+0200 [APP/TASK/task1/0] OUT Task is running at Fri Oct 24 02:55:15 PM UTC 2025
   2025-10-24T16:55:15.55+0200 [APP/TASK/task2/0] OUT Task is running at Fri Oct 24 02:55:15 PM UTC 2025
   2025-10-24T16:55:20.40+0200 [APP/TASK/task1/0] OUT Task is running at Fri Oct 24 02:55:20 PM UTC 2025
   2025-10-24T16:55:20.55+0200 [APP/TASK/task2/0] OUT Task is running at Fri Oct 24 02:55:20 PM UTC 2025
   2025-10-24T16:55:25.40+0200 [APP/TASK/task1/0] OUT Task is running at Fri Oct 24 02:55:25 PM UTC 2025
   2025-10-24T16:55:25.56+0200 [APP/TASK/task2/0] OUT Task is running at Fri Oct 24 02:55:25 PM UTC 2025
   2025-10-24T16:55:30.41+0200 [APP/TASK/task1/0] OUT Task is running at Fri Oct 24 02:55:30 PM UTC 2025
   2025-10-24T16:55:30.56+0200 [APP/TASK/task2/0] OUT Task is running at Fri Oct 24 02:55:30 PM UTC 2025
   2025-10-24T16:55:35.41+0200 [APP/TASK/task1/0] OUT Task is running at Fri Oct 24 02:55:35 PM UTC 2025
   2025-10-24T16:55:35.56+0200 [APP/TASK/task2/0] OUT Task is running at Fri Oct 24 02:55:35 PM UTC 2025
   2025-10-24T16:55:40.41+0200 [APP/TASK/task1/0] OUT Task is running at Fri Oct 24 02:55:40 PM UTC 2025
   2025-10-24T16:55:40.57+0200 [APP/TASK/task2/0] OUT Task is running at Fri Oct 24 02:55:40 PM UTC 2025
   2025-10-24T16:55:45.42+0200 [APP/TASK/task1/0] OUT Exit status 0
   2025-10-24T16:55:45.57+0200 [APP/TASK/task2/0] OUT Exit status 0
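To spot the mismatch without reading the two outputs side by side, the task list and the logs can be cross-checked with a small helper. This is a rough sketch: `failed_but_ran` and the file names are hypothetical, and it assumes the column layout of `cf tasks` and the `[APP/TASK/<name>/0]` log tag exactly as they appear in the output above.

```shell
#!/usr/bin/env bash
# failed_but_ran TASKS_FILE LOGS_FILE
# Prints the name of every task that the saved `cf tasks` output lists as
# FAILED although the saved logs contain an "Exit status 0" line for it.
failed_but_ran() {
  local tasks_file=$1 logs_file=$2 name
  # In the `cf tasks` table, column 2 is the task name and column 3 the state.
  awk '$3 == "FAILED" { print $2 }' "$tasks_file" |
  while read -r name; do
    # -F matches the log tag literally, so the brackets need no escaping.
    if grep -qF "[APP/TASK/${name}/0] OUT Exit status 0" "$logs_file"; then
      echo "$name"
    fi
  done
}

# Possible usage once the tasks have finished:
#   cf tasks task-app > tasks.txt
#   cf logs task-app --recent > logs.txt
#   failed_but_ran tasks.txt logs.txt
```

With the outputs shown above, this would be expected to report task2, the task that Cloud Controller marked FAILED while Diego ran it to completion.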

Possible Fix

No response
