Feature Request Description
My Azure Batch-based application uses a constraint-based scheduling convention wherein task slots represent vCPUs. If I have a pool using a 4-vCPU VM size then I set "Task slots per node" to 4, and when I submit a task that needs access to 2 vCPUs I set the task's "Required slots" to 2.
I have been unable to get this scheme to work reliably with autoscaling, because the autoscale formula language is unaware of tasks' task slot requirements. Information on task slots per node is available in the $TaskSlotsPerNode variable, but there does not appear to be any information on the slot requirements of existing tasks. This means that any autoscaling formula implicitly bakes in the assumption that every task requires exactly one slot.
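To illustrate the mismatch, here is a minimal sketch in plain Python (not autoscale formula syntax). The function names and the workload are hypothetical; the point is only that a count of tasks and a sum of required slots give different node estimates as soon as tasks need more than one slot.

```python
import math


def nodes_by_task_count(pending_tasks: int, slots_per_node: int) -> int:
    """Node estimate available to logic that only sees task counts.

    This implicitly treats every task as needing exactly one slot.
    """
    return math.ceil(pending_tasks / slots_per_node)


def nodes_by_slot_count(required_slots: list[int], slots_per_node: int) -> int:
    """Lower bound on nodes when each task's required slots are known."""
    return math.ceil(sum(required_slots) / slots_per_node)


# Hypothetical workload: six pending tasks, each needing 2 of a node's 4 slots.
required = [2] * 6
print(nodes_by_task_count(len(required), 4))  # 2
print(nodes_by_slot_count(required, 4))       # 3
```

With only task counts, the estimate is two nodes, but the six 2-slot tasks actually occupy three 4-slot nodes.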
Describe Preferred Solution
I think the ideal solution is to introduce a task-slot-wise version of each task metric variable. This might look like:
| Existing task-wise metric | New task-slot-wise metric |
| --- | --- |
| $ActiveTasks | $ActiveTaskSlots |
| $RunningTasks | $RunningTaskSlots |
| $PendingTasks | $PendingTaskSlots |
| $SucceededTasks | $SucceededTaskSlots |
| $FailedTasks | $FailedTaskSlots |
Describe Alternatives Considered
The current task-wise metric variables could be changed to reflect task slots instead of whole tasks. This would be a breaking change.
Additional Context
The proposed addition of task-slot-wise metrics would be analogous to the addition of the TaskSlotCounts object to the response of the Job_GetTaskCounts operation in API version 2020-09-01.12.0.
Hi @alfpark
Does "known issue" mean this will not get fixed? This sounds more like a feature request to me, and a very valuable one!
I have another use case where I run into the exact same problem.
I have a job whose tasks depend on each other. One task per job is more compute-intensive than the others, so I usually set its required slots to the maximum slots per node so that it runs on one VM with all the resources available.
The dependent tasks that follow are 10x more numerous, but they run faster and only need one CPU, so I set required slots to 1 for those tasks.
Now it's impossible to calculate how many nodes are actually needed with autoscaling...
So my question is: will this feature be added soon, so that task slot variables can be used in the formula instead of task counts?
Or is there another solution or workaround for these use cases that I should consider instead of autoscaling?
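To make the scenario above concrete, here is a rough sketch of the calculation that task counts alone cannot support. It is plain Python, not autoscale formula syntax, and everything in it is an assumption for illustration: the `nodes_needed` helper is hypothetical, the per-task slot requirements are assumed to come from your own bookkeeping of submitted tasks, and first-fit decreasing packing is just one simple heuristic, not how Batch schedules tasks.

```python
import math


def nodes_needed(required_slots: list[int], slots_per_node: int) -> int:
    """Estimate the node count using first-fit decreasing packing.

    A task cannot span nodes, so each task is placed on the first node
    that still has enough free slots; a new node is opened otherwise.
    """
    free: list[int] = []  # free slots remaining on each node opened so far
    for need in sorted(required_slots, reverse=True):
        if need > slots_per_node:
            raise ValueError("task requires more slots than a node provides")
        for i, avail in enumerate(free):
            if avail >= need:
                free[i] -= need
                break
        else:
            # No open node has room, so open a new one for this task.
            free.append(slots_per_node - need)
    return len(free)


# Hypothetical mix on 4-slot nodes: three heavy tasks that each take a
# whole node, plus ten light tasks that need one slot each.
workload = [4, 4, 4] + [1] * 10
print(math.ceil(len(workload) / 4))  # 4 (estimate from task count alone)
print(nodes_needed(workload, 4))     # 6 (estimate from slot requirements)
```

Counting tasks suggests four nodes, while the slot requirements call for six, which is the gap a slot-aware metric would close.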