Description
Hi,
Apologies if this isn't the right forum for a feature request/question.
I've been poking around, trying to find ways to automatically assign a number of Nvidia accelerators (unique devices connected to the host) to a specific bundle that runs under the Nvidia Container Runtime (NVCR). So far I'm coming up with very little, and I'm not certain whether this is in or out of scope for NVCR itself.
Before I jump into writing a new OCI shim that could work together with NVCR envvars, such as NVIDIA_VISIBLE_DEVICES, and other features of the container toolkit (discovery and filtering on other envvars) to maintain a bundle-to-device(s) "lease", I'd like to ask:
- if this is already easily attainable and I just haven't searched thoroughly enough,
- if this should live in NVCR or elsewhere in the toolkit, or
- if it should essentially be pre-processing of the bundle JSON, so that by the time the OCI create reaches NVCR the bundle already has a specific NVIDIA_VISIBLE_DEVICES value set, even though the overlap with NVCR (parsing requirements expressed as envvars) may be large.
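
To make the pre-processing option concrete, here is a minimal sketch of what I have in mind. It assumes the bundle JSON follows the OCI runtime spec layout, where `process.env` is a list of `KEY=VALUE` strings, and that devices are selected through `NVIDIA_VISIBLE_DEVICES` as a comma-separated list. The function names, the in-memory `leases` dict, and the bundle IDs are hypothetical, made up for illustration; a real shim would need persistent, locked lease storage and would read and write the bundle file on disk.

```python
import copy

ENV_KEY = "NVIDIA_VISIBLE_DEVICES"


def lease_devices(leases, bundle_id, all_devices, count):
    """Reserve `count` free devices for `bundle_id` and return their IDs.

    `leases` maps bundle ID -> list of device IDs (hypothetical in-memory
    store). Calling again for the same bundle returns the existing lease.
    """
    if bundle_id in leases:
        return leases[bundle_id]
    taken = {dev for ids in leases.values() for dev in ids}
    free = [dev for dev in all_devices if dev not in taken]
    if len(free) < count:
        raise RuntimeError(
            f"need {count} devices for {bundle_id}, only {len(free)} free"
        )
    leases[bundle_id] = free[:count]
    return leases[bundle_id]


def release_devices(leases, bundle_id):
    """Drop the lease for `bundle_id`, if any."""
    leases.pop(bundle_id, None)


def set_visible_devices(config, device_ids):
    """Return a copy of an OCI config dict with ENV_KEY set in process.env.

    Any existing ENV_KEY entry is replaced; other entries are kept in order.
    """
    patched = copy.deepcopy(config)
    process = patched.setdefault("process", {})
    env = [e for e in process.get("env", []) if not e.startswith(ENV_KEY + "=")]
    env.append(ENV_KEY + "=" + ",".join(device_ids))
    process["env"] = env
    return patched
```

The shim would call `lease_devices` and `set_visible_devices` before handing the create over to NVCR, and `release_devices` once the container is deleted.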
My use case is coordinating containerized workloads on nodes with many accelerators, workloads that are a poor fit for Kubernetes scheduling at the moment (partly because some of them are containerized kubelets themselves, though I'd also accept an "all of this should really be done over Kubernetes" answer).
Thanks!
Andras