
Problem running Triton Server with TensorRT-LLM backend and Llama 2 in Kubernetes #674

Closed
onlygo opened this issue Dec 16, 2023 · 2 comments


onlygo commented Dec 16, 2023

We are trying to deploy a Llama 2 model with Triton Inference Server and the TensorRT-LLM backend, using the nvcr.io/nvidia/tritonserver:23.10-trtllm-python-py3 container image.

In Docker everything works fine. However, when we run the same container image in Kubernetes, we get a bus error:

[TensorRT-LLM][INFO] Initializing MPI with thread mode 1
I1215 18:15:20.401963 1034 python_be.cc:2199] TRITONBACKEND_ModelInstanceInitialize: preprocessing_0_0 (CPU device 0)
I1215 18:15:20.402378 1034 python_be.cc:2199] TRITONBACKEND_ModelInstanceInitialize: postprocessing_0_0 (CPU device 0)
[tensor-rt-llm:1034 :0:1040] Caught signal 7 (Bus error: nonexistent physical address)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x5d4b95 vs 0x436758)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x5d4b95 vs 0x436758)

We tried the workaround mentioned in NVIDIA/nccl-tests#143 of increasing /dev/shm to 1 GiB (the dshm volume in the manifest below), but it didn't help.

Here is our deployment YAML:

apiVersion: v1
kind: Pod
metadata:
  name: tensor-rt-llm
spec:
  hostIPC: true
  nodeSelector:
    kubernetes.io/hostname: c300-11 
  volumes:
    - name: model-store
      persistentVolumeClaim:
        claimName: pv-claim
    - name: dshm
      emptyDir:
        medium: Memory
        sizeLimit: 1Gi
  
  containers:
    - name: tensor-rt-llm-dev
      image: nvcr.io/nvidia/tritonserver:23.10-trtllm-python-py3
      #securityContext:
      #  privileged: true #doesn't help
      command: [ "sleep" ]
      args: [ "infinity" ]
      volumeMounts:
        - mountPath: "/mnt/pvc"
          name: model-store
        - mountPath: /dev/shm
          name: dshm
      resources:
        limits:
          memory: "64Gi"
          cpu: "8"
          nvidia.com/gpu: "4"

Full error log: trt_llm_bus_error_k8s.txt

Good log when running with Docker: trt_llm_good_docker.txt


onlygo commented Dec 17, 2023

We figured out that the issue was caused by huge pages not being enabled in our k8s cluster. Closing the issue.
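
For anyone hitting the same thing, here is a minimal sketch of what requesting huge pages from a pod can look like, assuming 2 MiB pages have already been pre-allocated on the node (for example via sysctl -w vm.nr_hugepages=1024 followed by a kubelet restart, so that hugepages-2Mi shows up as allocatable). The pod name and sizes below are placeholders, not necessarily our exact configuration:

apiVersion: v1
kind: Pod
metadata:
  name: tensor-rt-llm-hugepages # hypothetical name
spec:
  volumes:
    - name: hugepage
      emptyDir:
        medium: HugePages
  containers:
    - name: tensor-rt-llm-dev
      image: nvcr.io/nvidia/tritonserver:23.10-trtllm-python-py3
      command: [ "sleep" ]
      args: [ "infinity" ]
      volumeMounts:
        - mountPath: /dev/hugepages
          name: hugepage
      resources:
        limits:
          # hugepage requests and limits must be equal; the amount is a placeholder
          hugepages-2Mi: 2Gi
          memory: "64Gi"
          cpu: "8"
          nvidia.com/gpu: "4"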

onlygo closed this as completed Dec 17, 2023
@Wenhan-Tan

Hi @onlygo, could you please explain in more detail how you fixed the issue? I'm having the same problem in Kubernetes as well.
