No such file or directory: '/usr/local/cuda/bin/nvcc' #539

@peterschmidt85

Steps to reproduce:

Run the following command from a dev environment or a task:

pip install vllm

Actual behavior:

FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/cuda/bin/nvcc'

Notes:

This happens because dstack's base CUDA image does not include the full CUDA toolkit. For example, it does not include the nvcc compiler, which some packages (such as vllm) need in order to build at install time.
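A quick way to confirm the cause from inside the environment is to check for nvcc before installing. The snippet below is a minimal sketch: `find_nvcc` is a hypothetical helper (not part of dstack or vllm), and the default `/usr/local/cuda` location is taken from the error message above.

```python
import os
import shutil
from typing import Optional


def find_nvcc(cuda_home: str = "/usr/local/cuda",
              search_path: Optional[str] = None) -> Optional[str]:
    """Return the path to nvcc, or None if it cannot be found.

    Hypothetical helper for illustration. It first checks
    <cuda_home>/bin/nvcc (the path from the error message), then falls
    back to a PATH lookup. `search_path=None` means "use $PATH".
    """
    candidate = os.path.join(cuda_home, "bin", "nvcc")
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    return shutil.which("nvcc", path=search_path)


if __name__ == "__main__":
    nvcc = find_nvcc()
    if nvcc is None:
        print("nvcc not found: packages that compile CUDA code will fail to install")
    else:
        print(f"nvcc found at {nvcc}")
```

If this reports that nvcc is missing, `pip install vllm` is expected to fail with the error above.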

Below are some of the possible solutions:

Solution 1: Make it easy to install CUDA via conda

If we pre-configure the nvidia/label/cuda-11.4.3 channel, the user will be able to install nvcc by running conda install cuda.
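Until the channel is pre-configured, the user would have to pass it explicitly. A minimal sketch of the command, built in Python so it can be run from a setup script: `conda_cuda_install_command` is a hypothetical helper, and the label default simply mirrors the channel named above.

```python
from typing import List


def conda_cuda_install_command(label: str = "cuda-11.4.3") -> List[str]:
    """Build the conda command that installs the CUDA toolkit (incl. nvcc).

    Hypothetical helper for illustration. `-c` selects the NVIDIA channel
    with the given label; `-y` skips the confirmation prompt.
    """
    return ["conda", "install", "-y", "-c", f"nvidia/label/{label}", "cuda"]


if __name__ == "__main__":
    # Print the command instead of running it, since the download is large.
    print(" ".join(conda_cuda_install_command()))
```

To actually run it, the list can be passed to `subprocess.run(..., check=True)`.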

This solution won't solve other related important problems:

  1. [UX] The user will still have to find the instructions in the documentation on why and how to install CUDA.
  2. [UX] Intuitively, the user may want to use an existing Docker image that already has all the dependencies pre-installed.
  3. [Performance, UX] Installing CUDA on each startup takes significant time, pushing the user to pre-build it for each configuration. That brings in the other problems related to build: (a) the same build cannot be reused across configurations; (b) Docker images are easier to reuse because they have unique repository and tag names; (c) the build must be invoked manually before each run.
  4. [Performance, UX] If the user installs CUDA via conda in build, the build image will be at least 6.7GB even if no other libraries are installed. Combined with the performance problems related to build, it may be easier to install CUDA on each run.

Solution 2: Use the "devel" version of the CUDA image

The "devel" version of the CUDA image includes the full CUDA toolkit, including nvcc.

This solution won't solve other related important problems:

  1. [UX] The user will have to somehow specify via YAML whether the configuration requires nvcc (or other CUDA toolkit components).
  2. [Implementation] We'll have to maintain more base Docker images.
  3. [UX] Intuitively, the user may want to use an existing Docker image that already has all the dependencies pre-installed.
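To make the first point concrete, the image choice could be driven by a single flag in the configuration. This is only a sketch under assumptions: the `requires_nvcc` flag is hypothetical, and the tags follow the public `nvidia/cuda` naming scheme rather than dstack's own base images, whose names may differ.

```python
def cuda_image_tag(requires_nvcc: bool,
                   cuda_version: str = "11.4.3",
                   os_tag: str = "ubuntu20.04") -> str:
    """Pick a CUDA image flavor based on whether nvcc is needed.

    Hypothetical helper for illustration. "devel" images ship the
    compiler toolchain (including nvcc); "runtime" images are smaller
    and contain only the runtime libraries.
    """
    flavor = "devel" if requires_nvcc else "runtime"
    return f"nvidia/cuda:{cuda_version}-{flavor}-{os_tag}"


if __name__ == "__main__":
    print(cuda_image_tag(requires_nvcc=True))
    print(cuda_image_tag(requires_nvcc=False))
```

Defaulting to the smaller runtime flavor keeps startup fast for configurations that never compile CUDA code, at the cost of maintaining both variants.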

Solution 3: Make it easy to use existing Docker images

This solution won't solve other related important problems:

  1. [Implementation] Installing openssh-server into an existing Docker image may be challenging.
