TGI-2.0.2 encounters "CUDA is not available" #1861

Open
2 of 4 tasks
Cucunnber opened this issue May 6, 2024 · 0 comments

System Info

torch install path ............... ['/home/chatgpt/.local/lib/python3.10/site-packages/torch']
torch version .................... 2.1.2+cu121
deepspeed install path ........... ['/home/chatgpt/.local/lib/python3.10/site-packages/deepspeed']
deepspeed info ................... 0.12.3, unknown, unknown
torch cuda version ............... 12.1
torch hip version ................ None
nvcc version ..................... 12.1
deepspeed wheel compiled w. ...... torch 2.1, cuda 12.1
shared memory (/dev/shm) size .... 1007.76 GB

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-80GB          Off | 00000000:21:00.0 Off |                    0 |
| N/A   30C    P0              66W / 400W |  77901MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM4-80GB          Off | 00000000:27:00.0 Off |                    0 |
| N/A   32C    P0              65W / 400W |  77901MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-SXM4-80GB          Off | 00000000:51:00.0 Off |                    0 |
| N/A   32C    P0              63W / 400W |  74443MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-SXM4-80GB          Off | 00000000:56:00.0 Off |                    0 |
| N/A   30C    P0              64W / 400W |  74443MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA A100-SXM4-80GB          Off | 00000000:8E:00.0 Off |                    0 |
| N/A   29C    P0              58W / 400W |      4MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   5  NVIDIA A100-SXM4-80GB          Off | 00000000:93:00.0 Off |                    0 |
| N/A   31C    P0              56W / 400W |      4MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   6  NVIDIA A100-SXM4-80GB          Off | 00000000:CA:00.0 Off |                    0 |
| N/A   32C    P0              60W / 400W |      7MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   7  NVIDIA A100-SXM4-80GB          Off | 00000000:D0:00.0 Off |                    0 |
| N/A   30C    P0              56W / 400W |      7MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
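For reference, the torch figures above can be reproduced on the host with a one-liner like the following (a minimal sanity check, assuming the same Python environment the report was generated from):

# Host-side check: confirm the installed torch sees the driver and reports its CUDA build.
python3 -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available(), torch.cuda.device_count())"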

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

tgi version

ghcr.io/huggingface/text-generation-inference:sha-bb2b295-rocm

docker command

external_port=${1:-9198}
num_shard=2
# convert to safetensors
model_path=/var/mntpkg/llama3-70b-instruct/Meta-Llama-3-70B-Instruct
docker run -d \
--gpus '"device=4,5"' \
--shm-size 1g \
--name llama3-chat \
-p ${external_port}:80 -v $model_path:/data/CmwCoder \
-e WEIGHTS_CACHE_OVERRIDE="/data/CmwCoder" \
text-generation-inference:2.0.2 \
--weights-cache-override="/data/CmwCoder" \
--model-id "/data/CmwCoder" --num-shard $num_shard \
--max-input-length 6000 \
--max-total-tokens 8000 \
--max-batch-prefill-tokens 8000

echo "gRPC 127.0.0.1:${external_port} is running..."
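As a quick way to narrow this down (not part of the original script), the same device selection can be tested directly against the PyTorch bundled in the image; this is a sketch that assumes python is on PATH inside the container and uses --entrypoint to bypass the default launcher:

# Check whether the 2.0.2 image sees GPUs 4 and 5 at all, independent of TGI itself.
docker run --rm --gpus '"device=4,5"' \
  --entrypoint python \
  text-generation-inference:2.0.2 \
  -c "import torch; print(torch.version.cuda, torch.cuda.is_available(), torch.cuda.device_count())"

If this already prints False, the problem is GPU visibility inside the container rather than anything TGI-specific.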

Error Message

2024-05-06T08:14:37.322566Z ERROR text_generation_launcher: Shard 1 failed to start
2024-05-06T08:14:37.322594Z  INFO text_generation_launcher: Shutting down shards
2024-05-06T08:14:37.324258Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

Traceback (most recent call last):

  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())

  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 71, in serve
    from text_generation_server import server

  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 17, in <module>
    from text_generation_server.models.vlm_causal_lm import VlmCausalLMBatch

  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/vlm_causal_lm.py", line 14, in <module>
    from text_generation_server.models.flash_mistral import (

  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_mistral.py", line 18, in <module>
    from text_generation_server.models.custom_modeling.flash_mistral_modeling import (

  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_mistral_modeling.py", line 29, in <module>
    from text_generation_server.utils import paged_attention, flash_attn

  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/flash_attn.py", line 24, in <module>
    raise ImportError("CUDA is not available")

ImportError: CUDA is not available
 rank=0

Error: ShardCannotStart
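For context, the raise comes from the file named in the last traceback frame; it can be inspected straight from the image without starting a shard (a sketch, assuming sed is available in the image, which it normally is):

# Print the top of flash_attn.py to see the exact guard behind "CUDA is not available".
docker run --rm --entrypoint sed text-generation-inference:2.0.2 \
  -n '1,30p' /opt/conda/lib/python3.10/site-packages/text_generation_server/utils/flash_attn.py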

Expected behavior

Deploying with the tgi-2.0.1 image works without issues, but that version does not support llama3-instruct very well. I'm not sure why the "CUDA is not available" error occurs when deploying tgi-2.0.2. Is it because the local CUDA version (12.1) is too low?
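One way to answer that empirically is to compare what each image's own torch build reports; the torch inside the container ships its own CUDA runtime, so the host toolkit version usually matters less than the driver. A hypothetical comparison, reusing the device selection from the reproduction above:

# Compare torch version, CUDA build, and GPU visibility between the working and failing tags.
for tag in 2.0.1 2.0.2; do
  echo "== text-generation-inference:${tag} =="
  docker run --rm --gpus '"device=4,5"' --entrypoint python \
    text-generation-inference:${tag} \
    -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
done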
