TGI-2.0.2 encounters "CUDA is not available" #1861

Open
2 of 4 tasks
Cucunnber opened this issue May 6, 2024 · 0 comments

System Info

torch install path ............... ['/home/chatgpt/.local/lib/python3.10/site-packages/torch']
torch version .................... 2.1.2+cu121
deepspeed install path ........... ['/home/chatgpt/.local/lib/python3.10/site-packages/deepspeed']
deepspeed info ................... 0.12.3, unknown, unknown
torch cuda version ............... 12.1
torch hip version ................ None
nvcc version ..................... 12.1
deepspeed wheel compiled w. ...... torch 2.1, cuda 12.1
shared memory (/dev/shm) size .... 1007.76 GB

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-80GB          Off | 00000000:21:00.0 Off |                    0 |
| N/A   30C    P0              66W / 400W |  77901MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM4-80GB          Off | 00000000:27:00.0 Off |                    0 |
| N/A   32C    P0              65W / 400W |  77901MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-SXM4-80GB          Off | 00000000:51:00.0 Off |                    0 |
| N/A   32C    P0              63W / 400W |  74443MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-SXM4-80GB          Off | 00000000:56:00.0 Off |                    0 |
| N/A   30C    P0              64W / 400W |  74443MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA A100-SXM4-80GB          Off | 00000000:8E:00.0 Off |                    0 |
| N/A   29C    P0              58W / 400W |      4MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   5  NVIDIA A100-SXM4-80GB          Off | 00000000:93:00.0 Off |                    0 |
| N/A   31C    P0              56W / 400W |      4MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   6  NVIDIA A100-SXM4-80GB          Off | 00000000:CA:00.0 Off |                    0 |
| N/A   32C    P0              60W / 400W |      7MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   7  NVIDIA A100-SXM4-80GB          Off | 00000000:D0:00.0 Off |                    0 |
| N/A   30C    P0              56W / 400W |      7MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
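For reference, the torch figures above can be reproduced on the host with a one-liner like the following (a minimal sanity check, assuming the same Python environment the report was generated from):

# Host-side check: confirm the installed torch sees the driver and reports its CUDA build.
python3 -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available(), torch.cuda.device_count())"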

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

tgi version

ghcr.io/huggingface/text-generation-inference:sha-bb2b295-rocm

docker command

external_port=${1:-9198}
num_shard=2
# convert to safetensors
model_path=/var/mntpkg/llama3-70b-instruct/Meta-Llama-3-70B-Instruct
docker run -d \
--gpus '"device=4,5"' \
--shm-size 1g \
--name llama3-chat \
-p ${external_port}:80 -v $model_path:/data/CmwCoder \
-e WEIGHTS_CACHE_OVERRIDE="/data/CmwCoder" \
text-generation-inference:2.0.2 \
--weights-cache-override="/data/CmwCoder" \
--model-id "/data/CmwCoder" --num-shard $num_shard \
--max-input-length 6000 \
--max-total-tokens 8000 \
--max-batch-prefill-tokens 8000

echo "gRPC 127.0.0.1:${external_port} is running..."
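As a quick way to narrow this down (not part of the original script), the same device selection can be tested directly against the PyTorch bundled in the image; this is a sketch that assumes python is on PATH inside the container and uses --entrypoint to bypass the default launcher:

# Check whether the 2.0.2 image sees GPUs 4 and 5 at all, independent of TGI itself.
docker run --rm --gpus '"device=4,5"' \
  --entrypoint python \
  text-generation-inference:2.0.2 \
  -c "import torch; print(torch.version.cuda, torch.cuda.is_available(), torch.cuda.device_count())"

If this already prints False, the problem is GPU visibility inside the container rather than anything TGI-specific.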

Error Message

2024-05-06T08:14:37.322566Z ERROR text_generation_launcher: Shard 1 failed to start
2024-05-06T08:14:37.322594Z  INFO text_generation_launcher: Shutting down shards
2024-05-06T08:14:37.324258Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

Traceback (most recent call last):

  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())

  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 71, in serve
    from text_generation_server import server

  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 17, in <module>
    from text_generation_server.models.vlm_causal_lm import VlmCausalLMBatch

  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/vlm_causal_lm.py", line 14, in <module>
    from text_generation_server.models.flash_mistral import (

  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_mistral.py", line 18, in <module>
    from text_generation_server.models.custom_modeling.flash_mistral_modeling import (

  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_mistral_modeling.py", line 29, in <module>
    from text_generation_server.utils import paged_attention, flash_attn

  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/flash_attn.py", line 24, in <module>
    raise ImportError("CUDA is not available")

ImportError: CUDA is not available
 rank=0

Error: ShardCannotStart
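For context, the raise comes from the file named in the last traceback frame; it can be inspected straight from the image without starting a shard (a sketch, assuming sed is available in the image, which it normally is):

# Print the top of flash_attn.py to see the exact guard behind "CUDA is not available".
docker run --rm --entrypoint sed text-generation-inference:2.0.2 \
  -n '1,30p' /opt/conda/lib/python3.10/site-packages/text_generation_server/utils/flash_attn.py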

Expected behavior

Deploying with the tgi-2.0.1 image works without issues, but that version does not support llama3-instruct very well. I'm not sure why the "CUDA is not available" error occurs when deploying tgi-2.0.2. Is it because the local CUDA version (12.1) is too low?
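One way to answer that empirically is to compare what each image's own torch build reports; the torch inside the container ships its own CUDA runtime, so the host toolkit version usually matters less than the driver. A hypothetical comparison, reusing the device selection from the reproduction above:

# Compare torch version, CUDA build, and GPU visibility between the working and failing tags.
for tag in 2.0.1 2.0.2; do
  echo "== text-generation-inference:${tag} =="
  docker run --rm --gpus '"device=4,5"' --entrypoint python \
    text-generation-inference:${tag} \
    -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
done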
