No such file or directory: '/usr/local/cuda/bin/nvcc' #539

**Steps to reproduce:**

Run the following command from a dev environment or a task:

```shell
pip install vllm
```

**Actual behavior:**

```
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/cuda/bin/nvcc'
```

**Notes:**

This happens because `dstack`'s base CUDA image does not include the full CUDA toolkit. For example, it does not include the `nvcc` compiler.
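To confirm that a missing `nvcc` is the cause, a quick check like the following can be run in the same environment. This is a minimal sketch: the `find_nvcc` helper is hypothetical, and falling back to a `CUDA_HOME` variable is an assumption based on common convention, not something `dstack` is known to set.

```shell
# Hypothetical helper: print the path to nvcc if it can be found.
# Looks on PATH first, then under CUDA_HOME (assumed convention),
# defaulting to /usr/local/cuda, the location named in the error.
find_nvcc() {
  if command -v nvcc >/dev/null 2>&1; then
    command -v nvcc
  elif [ -x "${CUDA_HOME:-/usr/local/cuda}/bin/nvcc" ]; then
    echo "${CUDA_HOME:-/usr/local/cuda}/bin/nvcc"
  else
    return 1
  fi
}

if nvcc_path="$(find_nvcc)"; then
  echo "nvcc found at ${nvcc_path}"
else
  echo "nvcc not found; the CUDA toolkit is probably not installed"
fi
```

If the second message is printed, any package that compiles CUDA extensions during installation is likely to fail in the same way.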

Below are some of the possible solutions:

**Solution 1: Make it easy to install `cuda` via `conda`**

If we pre-configure the `nvidia/label/cuda-11.4.3` channel, the user will be able to install `nvcc` via `conda install cuda`.
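Until the channel is pre-configured, the same result can be sketched by passing it explicitly. The `install_cuda_toolkit` function and the `CUDA_CHANNEL` variable below are hypothetical names used for illustration, and the sketch assumes `conda` is already available in the environment.

```shell
# Channel named in this issue; pinning it keeps the toolkit version explicit.
CUDA_CHANNEL="nvidia/label/cuda-11.4.3"

# Hypothetical helper: install the CUDA toolkit (including nvcc) via conda.
install_cuda_toolkit() {
  if ! command -v conda >/dev/null 2>&1; then
    echo "conda not found; skipping CUDA toolkit install" >&2
    return 1
  fi
  # -y answers prompts automatically; -c selects the channel for this install.
  conda install -y -c "${CUDA_CHANNEL}" cuda
}

# Usage (not run automatically, since the download is large):
#   install_cuda_toolkit && nvcc --version
```

The function is only defined, not called, so sourcing the snippet has no side effects.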

This solution won't solve other related important problems:

1. [UX] The user will still have to find the instructions in the documentation on why and how to install `cuda`.
2. [UX] Intuitively, the user may want to use an existing Docker image that already has all the dependencies pre-installed.
3. [Performance, UX] Installing `cuda` on each startup will take significant time, pushing the user to pre-build it for each configuration. This inherits other problems related to `build`: (a) the same build cannot be reused across configurations; (b) Docker images are easier to reuse because they have unique repository and tag names; (c) the build must be invoked manually before each run.
4. [Performance, UX] If the user installs `cuda` via `conda` in `build`, the build image will be at least 6.7GB even if no other libraries are installed. Combined with the performance problems related to `build`, it can be easier to install `cuda` on each run.

**Solution 2: Use the "devel" version of the CUDA image**

The "devel" version of the CUDA image includes the full CUDA toolkit, including `nvcc`.

This solution won't solve other related important problems:

1. [UX] The user will have to specify somehow in YAML whether the configuration requires `nvcc` (or other CUDA toolkit components).
2. [Implementation] We'll have to maintain more base Docker images.
3. [UX] Intuitively, the user may want to use an existing Docker image that already has all the dependencies pre-installed.

**Solution 3: Make it easy to use existing Docker images**

This solution won't solve other related important problems:

1. [Implementation] Installing `openssh-server` into an existing Docker image may be challenging.
