@@ -177,7 +177,7 @@ KEP, the hard limit on the number of pods will be removed.
 Training workloads that use TPUs can be very large, requiring over 9,000
 TPUs for a single training job. The number of TPUs for each node is usually 4, meaning
 that the job will run across more than 2,000 nodes. Due to topology constraints, TPU slices
-are usually modelled in DRA as multi-host devices, meaning that a single DRA device
+are usually modeled in DRA as multi-host devices, meaning that a single DRA device
 can represent thousands of TPUs. As a result, all pods running the workload will
 share a single ResourceClaim. The current limit of 256 pods sharing a
 ResourceClaim is therefore too low.
@@ -237,7 +237,7 @@ provide guidance as to what is a safe number.
 The `ReservedFor` field on the `ResourceClaimStatus` is currently used for two purposes:

 #### Deallocation
-Devices are allocated to `ResourceClaim`s when the first pod referencing the claim is
+Devices are allocated to a `ResourceClaim` when the first pod referencing the claim is
 scheduled. Other pods can also share the `ResourceClaim`, in which case they share the
 devices. Once no pods are consuming the claim, the devices should be deallocated so they
 can be allocated to other claims. The `ReservedFor` list is used to keep track of pods
@@ -256,7 +256,7 @@ controller to find pods that are using the ResourceClaim:
 1. The DRA scheduler plugin uses the list to find claims that have zero or only
 a single pod using them, and are therefore candidates for deallocation in the `PostFilter` function.

-1. The device_taint_eviction controller uses the `ReservedFor` list to find the pods that needs to be evicted
+1. The device_taint_eviction controller uses the `ReservedFor` list to find the pods that need to be evicted
 when one or more of the devices allocated to a ResourceClaim is tainted (and the ResourceClaim
 does not have a toleration).
@@ -284,6 +284,9 @@ type ResourceClaimSpec struct {
 	// ResourceClaim to remove the reference from the status.ReservedFor list when there
 	// are no longer any pods consuming the claim.
 	//
+	// Most user-created ResourceClaims should not set this field. It is more typically
+	// used by ResourceClaims created and managed by controllers.
+	//
 	// +featureGate=DRAReservedForWorkloads
 	// +optional
 	ReservedFor *ResourceClaimConsumerReference
@@ -303,7 +306,7 @@ type AllocationResult struct {
 }
 ```
-The `ResourceClaimConsumerReference type already exists:
+The `ResourceClaimConsumerReference` type already exists:

 ```go
 // ResourceClaimConsumerReference contains enough information to let you
@@ -345,21 +348,24 @@ list, it will not add a reference to the pod.

 ##### Deallocation
 The resourceclaim controller will remove Pod references from the `ReservedFor` list just
-like it does now, but it will never remove references to non-Pod resources. Instead, it
-will be the responsibility of the controller/user that created the `ResourceClaim` to
-remove the reference to the non-Pod resource from the `ReservedFor` list when no pods
+like it does now, using the same logic. For non-Pod references, the controller will recognize
+a small number of built-in types, starting with Deployment, StatefulSet, and Job, and will
+remove the reference from the list when those resources are removed. For other types,
+it will be the responsibility of the workload controller/user that created the `ResourceClaim`
+to remove the reference to the non-Pod resource from the `ReservedFor` list when no pods
 are consuming the `ResourceClaim` and no new pods will be created that reference
 the `ResourceClaim`.

 The resourceclaim controller will then discover that the `ReservedFor` list is empty
 and therefore know that it is safe to deallocate the `ResourceClaim`.

-This requires that the controller/user has permissions to update the status
-subresource of the `ResourceClaim`. The resourceclaim controller will also try to detect if
-the resource referenced in the `ReservedFor` list has been deleted from the cluster, but
-that requires that the controller has permissions to get or list resources of the type. If the
-resourceclaim controller is not able to check, it will just wait until the reference in
-the `ReservedFor` list is removed.
+This requires that the resourceclaim controller watches the workload types that will
+be supported. For other types of workloads, there will be a requirement that the workload
+controller has permissions to update the status subresource of the `ResourceClaim`. The
+resourceclaim controller will also try to detect if an unknown resource referenced in the
+`ReservedFor` list has been deleted from the cluster, but that requires that the controller
+has permissions to get or list resources of the type. If the resourceclaim controller is
+not able to check, it will just wait until the reference in the `ReservedFor` list is removed.

 ##### Finding pods using a ResourceClaim
 If the reference in the `ReservedFor` list is to a non-Pod resource, controllers can no longer
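The responsibility split introduced in the deallocation hunk above can be sketched in Go. This is an illustrative sketch, not code from the KEP: the real `ResourceClaimConsumerReference` type lives in the `resource.k8s.io` API group, and the helper name and the set of plural resource names below are assumptions based on the built-in types the diff names (Deployment, StatefulSet, Job).

```go
package main

import "fmt"

// ResourceClaimConsumerReference is a minimal local mirror of the existing
// API type shown in the diff; the real type also carries a UID field.
type ResourceClaimConsumerReference struct {
	APIGroup string
	Resource string // plural, lowercase, e.g. "jobs"
	Name     string
}

// builtInWorkloads approximates the non-Pod types the resourceclaim
// controller would recognize per the proposal (hypothetical spelling).
var builtInWorkloads = map[string]bool{
	"deployments":  true,
	"statefulsets": true,
	"jobs":         true,
}

// controllerHandlesDeallocation reports whether the resourceclaim controller
// itself removes the ReservedFor reference (built-in workload types), or
// whether the workload controller/user that created the claim must do it.
func controllerHandlesDeallocation(ref ResourceClaimConsumerReference) bool {
	return builtInWorkloads[ref.Resource]
}

func main() {
	job := ResourceClaimConsumerReference{APIGroup: "batch", Resource: "jobs", Name: "training-job"}
	custom := ResourceClaimConsumerReference{APIGroup: "example.com", Resource: "trainingruns", Name: "run-1"}
	fmt.Println(controllerHandlesDeallocation(job))    // true
	fmt.Println(controllerHandlesDeallocation(custom)) // false
}
```

For the second case, the workload controller needs RBAC permission to update the `ResourceClaim` status subresource, as the diff notes.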