I built TensorRT-LLM on one host machine by running make -C docker release_build, which completed without errors and produced the TensorRT-LLM image. However, after copying the image to another host machine and starting a container from it, I got the following error:
sudo docker run --rm -it --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --gpus=all --volume /app/xueht/llm/TensorRT-LLM-release-0.5.0:/code/tensorrt_llm --workdir /code/tensorrt_llm --hostname sftech-27-release --name tensorrt_llm-release-appdeploy --tmpfs /tmp:exec tensorrt_llm/release:latest
=============
== PyTorch ==
NVIDIA Release 23.08 (build 66128610)
PyTorch Version 2.1.0a0+29c30b1
Container image Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Copyright (c) 2014-2023 Facebook Inc.
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006 Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
Copyright (c) 2015 Google Inc.
Copyright (c) 2015 Yangqing Jia
Copyright (c) 2013-2016 The Caffe contributors
All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
ERROR: The NVIDIA Driver is present, but CUDA failed to initialize. GPU functionality will not be available.
[[ System has unsupported display driver / cuda driver combination (error 803) ]]
The driver information is as follows:
nvidia-smi
Mon Oct 30 02:09:16 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.12 Driver Version: 535.104.12 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
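To compare the driver on this host with the one on the build host, a small helper can pull the driver and CUDA versions out of the nvidia-smi header line shown above. This is only a minimal sketch: parse_smi_header is a hypothetical name, not part of TensorRT-LLM or any NVIDIA tool, and the regular expression assumes the header looks like the one pasted here.

```python
import re


def parse_smi_header(line):
    """Extract the driver and CUDA versions from an nvidia-smi header line.

    Hypothetical helper for illustration; returns None if the line does not
    contain the expected "Driver Version: ... CUDA Version: ..." fields.
    """
    match = re.search(
        r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", line
    )
    if match is None:
        return None
    return {"driver": match.group(1), "cuda": match.group(2)}


# Header line copied from the nvidia-smi output in this report.
header = "| NVIDIA-SMI 535.104.12 Driver Version: 535.104.12 CUDA Version: 12.2 |"
print(parse_smi_header(header))
```

Running the same helper on the header from the build host, and on the header printed by nvidia-smi inside the container, would show whether the versions differ between the environments.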