Skip to content

Commit 1bfd8a0

Browse files
committed
Addressed comments
1 parent 61ec86f commit 1bfd8a0

File tree

1 file changed

+19
-13
lines changed
  • keps/sig-scheduling/5194-reserved-for-workloads

1 file changed

+19
-13
lines changed

keps/sig-scheduling/5194-reserved-for-workloads/README.md

Lines changed: 19 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -177,7 +177,7 @@ KEP, the hard limit on the number of pods will be removed.
177177
Training workloads that uses TPUs can be very large, requiring over 9,000
178178
TPUs for a single training job. The number of TPUs for each node is usually 4, meaning
179179
that the job will run across more than 2,000 nodes. Due to topology constraints, TPU slices
180-
are usually modelled in DRA as multi-host devices, meaning that a single DRA device
180+
are usually modeled in DRA as multi-host devices, meaning that a single DRA device
181181
can represent thousands of TPUs. As a result, all pods running the workload will
182182
therefore share a single ResourceClaim. The current limit of 256 pods sharing a
183183
ResourceClaim is therefore too low.
@@ -237,7 +237,7 @@ provide guidance as to what is a safe number.
237237
The `ReservedFor` field on the `ResourceClaimStatus` is currently used for two purposes:
238238

239239
#### Deallocation
240-
Devices are allocated to `ResourceClaim`s when the first pod referencing the claim is
240+
Devices are allocated to a `ResourceClaim` when the first pod referencing the claim is
241241
scheduled. Other pods can also share the `ResourceClaim` in which case they share the
242242
devices. Once no pods are consuming the claim, the devices should be deallocated to they
243243
can be allocted to other claims. The `ReservedFor` list is used to keep track of pods
@@ -256,7 +256,7 @@ controller to find pods that are using the ResourceClaim:
256256
1. The DRA scheduler plugin uses the list to find claims that have zero or only
257257
a single pod using it, and is therefore a candidate for deallocation in the `PostFilter` function.
258258

259-
1. The device_taint_eviction controller uses the `ReservedFor` list to find the pods that needs to be evicted
259+
1. The device_taint_eviction controller uses the `ReservedFor` list to find the pods that need to be evicted
260260
when one or more of the devices allocated to a ResourceClaim is tainted (and the ResourceClaim
261261
does not have a toleration).
262262

@@ -284,6 +284,9 @@ type ResourceClaimSpec struct {
284284
// ResourceClaim to remove the reference from the status.ReservedFor list when there
285285
// are no longer any pods consuming the claim.
286286
//
287+
// Most user-created ResourceClaims should not set this field. It is more typically
288+
// used by ResourceClaims created and managed by controllers.
289+
//
287290
// +featureGate=DRAReservedForWorkloads
288291
// +optional
289292
ReservedFor *ResourceClaimConsumerReference
@@ -303,7 +306,7 @@ type AllocationResult struct {
303306
}
304307
```
305308

306-
The `ResourceClaimConsumerReference type already exists:
309+
The `ResourceClaimConsumerReference` type already exists:
307310

308311
```go
309312
// ResourceClaimConsumerReference contains enough information to let you
@@ -345,21 +348,24 @@ list, it will not add a reference to the pod.
345348

346349
##### Deallocation
347350
The resourceclaim controller will remove Pod references from the `ReservedFor` list just
348-
like it does now, but it will never remove references to non-Pod resources. Instead, it
349-
will be the responsibility of the controller/user that created the `ResourceClaim` to
350-
remove the reference to the non-Pod resource from the `ReservedFor` list when no pods
351+
like it does now using the same logic. For non-Pod references, the controller will recognize
352+
a small number of built-in types, starting with Deployment, StatefulSet and Job, and will
353+
remove the reference from the list when those resources are removed. For other types,
354+
it will be the responsibility of the workload controller/user that created the `ResourceClaim`
355+
to remove the reference to the non-Pod resource from the `ReservedFor` list when no pods
351356
are consuming the `ResourceClaim` and no new pods will be created that references
352357
the `ResourceClaim`.
353358

354359
The resourceclaim controller will then discover that the `ReservedFor` list is empty
355360
and therefore know that it is safe to deallocate the `ResourceClaim`.
356361

357-
This requires that the controller/user has permissions to update the status
358-
subresource of the `ResourceClaim`. The resourceclaim controller will also try to detect if
359-
the resource referenced in the `ReservedFor` list has been deleted from the cluster, but
360-
that requires that the controller has permissions to get or list resources of the type. If the
361-
resourceclaim controller is not able to check, it will just wait until the reference in
362-
the `ReservedFor` list is removed.
362+
This requires that the resourceclaim controller watches the workload types that will
363+
be supported. For other types of workloads, there will be a requirement that the workload
364+
controller has permissions to update the status subresource of the `ResourceClaim`. The
365+
resourceclaim controller will also try to detect if an unknown resource referenced in the
366+
`ReservedFor` list has been deleted from the cluster, but that requires that the controller
367+
has permissions to get or list resources of the type. If the resourceclaim controller is
368+
not able to check, it will just wait until the reference in the `ReservedFor` list is removed.
363369

364370
##### Finding pods using a ResourceClaim
365371
If the reference in the `ReservedFor` list is to a non-Pod resource, controllers can no longer

0 commit comments

Comments
 (0)