We are trying to deploy a Llama 2 model with the Triton server TensorRT-LLM backend, using the nvcr.io/nvidia/tritonserver:23.10-trtllm-python-py3 container image.
Everything works fine in Docker. However, when we run the same container image in Kubernetes, we get a bus error:
```
[TensorRT-LLM][INFO] Initializing MPI with thread mode 1
I1215 18:15:20.401963 1034 python_be.cc:2199] TRITONBACKEND_ModelInstanceInitialize: preprocessing_0_0 (CPU device 0)
I1215 18:15:20.402378 1034 python_be.cc:2199] TRITONBACKEND_ModelInstanceInitialize: postprocessing_0_0 (CPU device 0)
[tensor-rt-llm:1034 :0:1040] Caught signal 7 (Bus error: nonexistent physical address)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x5d4b95 vs 0x436758)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x5d4b95 vs 0x436758)
```
We tried the workaround mentioned in NVIDIA/nccl-tests#143, increasing /dev/shm to 1 GB, but it didn't help.
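For reference, this is roughly how we applied that workaround in the pod spec: a memory-backed emptyDir mounted over /dev/shm (the pod/container/volume names and the GPU limit below are illustrative, not the exact values from our manifest):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: triton-trtllm              # illustrative name
spec:
  containers:
    - name: triton
      image: nvcr.io/nvidia/tritonserver:23.10-trtllm-python-py3
      resources:
        limits:
          nvidia.com/gpu: 1        # illustrative; single-GPU deployment
      volumeMounts:
        - name: dshm
          mountPath: /dev/shm      # override the default 64Mi shared memory
  volumes:
    - name: dshm
      emptyDir:
        medium: Memory             # tmpfs-backed volume
        sizeLimit: 1Gi             # the 1 GB size from the nccl-tests workaround
```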
Here is our deployment YAML.

Full error log: trt_llm_bus_error_k8s.txt

Good log when running with Docker: trt_llm_good_docker.txt