Description
Steps to reproduce:
Run the following command from a dev environment or a task:
pip install vllm
Actual behavior:
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/cuda/bin/nvcc'
Notes:
This happens because dstack's base CUDA image does not include the full CUDA toolkit. For example, it does not include the nvcc compiler.
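To confirm the cause before running the install, one can check whether nvcc is present. The following is a minimal sketch, not part of dstack; the helper name `find_nvcc` is hypothetical, and the default location `/usr/local/cuda` is taken from the error message above.

```python
import os
import shutil


def find_nvcc(cuda_home="/usr/local/cuda"):
    """Return the path to nvcc, or None if it cannot be found."""
    # Look on PATH first.
    on_path = shutil.which("nvcc")
    if on_path:
        return on_path
    # Fall back to the location reported in the error message.
    candidate = os.path.join(cuda_home, "bin", "nvcc")
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    return None


if __name__ == "__main__":
    path = find_nvcc()
    if path:
        print(f"nvcc found at {path}")
    else:
        print("nvcc not found; packages that compile CUDA code will fail to build")
```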
Below are some of the possible solutions:
Solution 1: Make it easy to install CUDA via conda
If we pre-configure the nvidia/label/cuda-11.4.3 channel, the user will be able to install nvcc via conda install cuda.
This solution won't solve other related important problems:
- [UX] The user will still have to find instructions in the documentation on why and how to install cuda.
- [UX] Intuitively, the user may want to use an existing Docker image that already has all the dependencies pre-installed.
- [Performance, UX] Installing cuda on each startup will take significant time, pushing the user to pre-build it for each configuration. This inherits other problems related to build: (a) the same build cannot be reused across configurations; (b) Docker images are easier to reuse as they have unique repository and tag names; (c) build must be invoked manually each time before the run.
- [Performance, UX] If the user installs cuda via conda in build, the build image will be at least 6.7GB even if no other libraries are installed. Combined with the performance problems related to build, it can be easier to install cuda on each run.
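Until the channel is pre-configured, the equivalent manual step for Solution 1 can be sketched as follows. This is an illustration only: the helper names are hypothetical, and it assumes conda is on PATH and that the channel label matches the CUDA version of the base image.

```python
import subprocess


def conda_cuda_install_cmd(label="cuda-11.4.3"):
    """Build the conda command that installs the `cuda` package from an NVIDIA channel label."""
    channel = f"nvidia/label/{label}"
    # -y skips the confirmation prompt; -c selects the channel explicitly.
    return ["conda", "install", "-y", "-c", channel, "cuda"]


def install_cuda(label="cuda-11.4.3"):
    """Run the install; raises CalledProcessError if conda exits with an error."""
    subprocess.run(conda_cuda_install_cmd(label), check=True)
```

Calling `install_cuda()` downloads several gigabytes, which is the startup cost described in the list above.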
Solution 2: Use the "devel" version of the CUDA image
The "devel" version of the CUDA image includes the full CUDA toolkit, including nvcc.
This solution won't solve other related important problems:
- [UX] The user will have to somehow configure via YAML whether the configuration requires nvcc (or other CUDA toolkit components).
- [Implementation] We'll have to maintain more base Docker images.
- [UX] Intuitively, the user may want to use an existing Docker image that already has all the dependencies pre-installed.
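The choice between base images in Solution 2 could be sketched as below. This is a hypothetical helper, not dstack code: it assumes the base images follow the `nvidia/cuda` tag scheme on Docker Hub (`<version>-<flavor>-<os>`), and the `needs_nvcc` flag stands in for whatever YAML setting would be introduced. Whether a given tag exists would still need to be checked.

```python
def cuda_image(version: str, needs_nvcc: bool, os_tag: str = "ubuntu20.04") -> str:
    """Pick an nvidia/cuda image tag based on whether the compiler toolchain is required."""
    # "devel" images ship nvcc and headers; "runtime" images ship only the runtime libraries.
    flavor = "devel" if needs_nvcc else "runtime"
    return f"nvidia/cuda:{version}-{flavor}-{os_tag}"


print(cuda_image("11.4.3", needs_nvcc=True))   # nvidia/cuda:11.4.3-devel-ubuntu20.04
print(cuda_image("11.4.3", needs_nvcc=False))  # nvidia/cuda:11.4.3-runtime-ubuntu20.04
```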
Solution 3: Make it easy to use existing Docker images
This solution won't solve other related important problems:
- [Implementation] Installing openssh-server into an existing Docker image may be challenging.