
Image build fails (镜像编译失败) #6

Closed
zhanghui-china opened this issue Oct 24, 2023 · 10 comments
@zhanghui-china

[screenshots of the build error attached]

Why would a `.a` file have a format error?
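For reference, a healthy static archive is recognized by both `file` and `ar`. A minimal sketch (file names here are made up for illustration) of what a well-formed `.a` looks like, to compare against the one that fails:

```shell
# Build a tiny known-good archive; `ar` accepts arbitrary files as members.
printf 'placeholder' > member.txt
ar rcs libexample.a member.txt

# A valid archive reports "current ar archive"; a truncated or corrupted one
# typically makes `ar t` (list members) fail with a format error.
file libexample.a
ar t libexample.a
```

If the `.a` shipped in the image fails these checks, it was likely corrupted during download or an interrupted `COPY`.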


zhanghui-china commented Oct 24, 2023

root@zhanghui-OMEN-by-HP-Laptop-17-ck0xxx:/home/zhanghui/Qwen-7B-Chat-TensorRT-LLM/TensorRT-LLM/docker# make release_build
Building docker image: tensorrt_llm/release:latest
DOCKER_BUILDKIT=1 docker build --pull \
    --progress auto \
    --build-arg BASE_IMAGE=nvcr.io/nvidia/pytorch \
    --build-arg BASE_TAG=23.08-py3 \
    --build-arg BUILD_WHEEL_ARGS="--clean --trt_root /usr/local/tensorrt" \
    --build-arg TORCH_INSTALL_TYPE="skip" \
    --target release \
    --file Dockerfile.multi \
    --tag tensorrt_llm/release:latest \
    ..
[+] Building 4192.7s (25/30)
=> [internal] load build definition from Dockerfile.multi 0.0s
=> => transferring dockerfile: 1.78kB 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 233B 0.0s
=> [internal] load metadata for nvcr.io/nvidia/pytorch:23.08-py3 3.3s
=> [internal] load build context 0.1s
=> => transferring context: 647.00kB 0.1s
=> [base 1/1] FROM nvcr.io/nvidia/pytorch:23.08-py3@sha256:12a39f22d6e3a3cfcb285a238b6219475181672ff41a557a75bdeeef6d630740 0.0s
=> CACHED [devel 1/10] COPY docker/common/install_base.sh install_base.sh 0.0s
=> CACHED [devel 2/10] RUN bash ./install_base.sh && rm install_base.sh 0.0s
=> CACHED [devel 3/10] COPY docker/common/install_cmake.sh install_cmake.sh 0.0s
=> CACHED [devel 4/10] RUN bash ./install_cmake.sh && rm install_cmake.sh 0.0s
=> CACHED [devel 5/10] COPY docker/common/install_tensorrt.sh install_tensorrt.sh 0.0s
=> CACHED [devel 6/10] RUN bash ./install_tensorrt.sh && rm install_tensorrt.sh 0.0s
=> CACHED [devel 7/10] COPY docker/common/install_polygraphy.sh install_polygraphy.sh 0.0s
=> CACHED [devel 8/10] RUN bash ./install_polygraphy.sh && rm install_polygraphy.sh 0.0s
=> CACHED [devel 9/10] COPY docker/common/install_pytorch.sh install_pytorch.sh 0.0s
=> CACHED [devel 10/10] RUN bash ./install_pytorch.sh skip && rm install_pytorch.sh 0.0s
=> CACHED [release 1/6] WORKDIR /app/tensorrt_llm 0.0s
=> CACHED [wheel 1/9] WORKDIR /src/tensorrt_llm 0.0s
=> CACHED [wheel 2/9] COPY benchmarks benchmarks 0.0s
=> CACHED [wheel 3/9] COPY cpp cpp 0.0s
=> CACHED [wheel 4/9] COPY benchmarks benchmarks 0.0s
=> CACHED [wheel 5/9] COPY scripts scripts 0.0s
=> CACHED [wheel 6/9] COPY tensorrt_llm tensorrt_llm 0.0s
=> CACHED [wheel 7/9] COPY 3rdparty 3rdparty 0.0s
=> CACHED [wheel 8/9] COPY setup.py requirements.txt ./ 0.0s
=> ERROR [wheel 9/9] RUN python3 scripts/build_wheel.py --clean --trt_root /usr/local/tensorrt 4189.2s

[wheel 9/9] RUN python3 scripts/build_wheel.py --clean --trt_root /usr/local/tensorrt:
#0 0.486 Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com, https://pypi.ngc.nvidia.com
#0 4.536 Collecting build (from -r requirements.txt (line 1))
#0 4.536 Obtaining dependency information for build from https://files.pythonhosted.org/packages/93/dd/b464b728b866aaa62785a609e0dd8c72201d62c5f7c53e7c20f4dceb085f/build-1.0.3-py3-none-any.whl.metadata
#0 6.029 Downloading build-1.0.3-py3-none-any.whl.metadata (4.2 kB)
#0 6.036 Requirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 2)) (2.1.0a0+29c30b1)
#0 7.706 Collecting transformers==4.31.0 (from -r requirements.txt (line 3))
#0 7.706 Obtaining dependency information for transformers==4.31.0 from https://files.pythonhosted.org/packages/21/02/ae8e595f45b6c8edee07913892b3b41f5f5f273962ad98851dc6a564bbb9/transformers-4.31.0-py3-none-any.whl.metadata
#0 8.032 Downloading transformers-4.31.0-py3-none-any.whl.metadata (116 kB)
#0 8.535 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 116.9/116.9 kB 245.3 kB/s eta 0:00:00
#0 10.05 Collecting diffusers==0.15.0 (from -r requirements.txt (line 4))
#0 10.37 Downloading diffusers-0.15.0-py3-none-any.whl (851 kB)
#0 11.20 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 851.8/851.8 kB 1.0 MB/s eta 0:00:00
#0 12.69 Collecting accelerate==0.20.3 (from -r requirements.txt (line 5))
#0 12.69 Obtaining dependency information for accelerate==0.20.3 from https://files.pythonhosted.org/packages/10/d3/5382aa337d3e67214003a17b06bfc07cf0334356b4e8aaf3b12b0d38c83f/accelerate-0.20.3-py3-none-any.whl.metadata
#0 13.02 Downloading accelerate-0.20.3-py3-none-any.whl.metadata (17 kB)
#0 14.62 Collecting colored (from -r requirements.txt (line 6))
#0 14.62 Obtaining dependency information for colored from https://files.pythonhosted.org/packages/6f/0d/a10351ef1a98e0b03d66887ec2d87c261f9a0fbff8f2bdb75614cc0a2850/colored-2.2.3-py3-none-any.whl.metadata
#0 14.94 Downloading colored-2.2.3-py3-none-any.whl.metadata (3.6 kB)
#0 14.94 Requirement already satisfied: polygraphy in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 7)) (0.48.1)
#0 14.95 Requirement already satisfied: onnx>=1.12.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 8)) (1.14.0)
#0 14.95 Requirement already satisfied: mpi4py in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 9)) (3.1.5)
#0 14.95 Requirement already satisfied: tensorrt>=8.6.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 10)) (9.1.0.post12.dev4)
#0 14.95 Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 11)) (1.22.2)
#0 16.56 Collecting cuda-python==12.2.0 (from -r requirements.txt (line 12))
#0 16.56 Obtaining dependency information for cuda-python==12.2.0 from https://files.pythonhosted.org/packages/2b/5c/ad3b6bee78f95134834dc2ce8805fc12ac70d7a9ba065e2d7c05263549dc/cuda_python-12.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata
#0 16.88 Downloading cuda_python-12.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (749 bytes)
#0 18.76 Collecting sentencepiece>=0.1.99 (from -r requirements.txt (line 13))
#0 19.09 Downloading sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
#0 19.64 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 2.4 MB/s eta 0:00:00
#0 19.64 Requirement already satisfied: wheel in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 14)) (0.41.1)
#0 21.12 Collecting lark (from -r requirements.txt (line 15))
#0 21.12 Obtaining dependency information for lark from https://files.pythonhosted.org/packages/99/ca/f3532a61dce7dd52fbd38737a12e16cdc7699697e23287eb7addfdd93e3f/lark-1.1.8-py3-none-any.whl.metadata
#0 21.44 Downloading lark-1.1.8-py3-none-any.whl.metadata (1.9 kB)
#0 21.75 Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0->-r requirements.txt (line 3)) (3.12.2)
#0 23.35 Collecting huggingface-hub<1.0,>=0.14.1 (from transformers==4.31.0->-r requirements.txt (line 3))
#0 23.35 Obtaining dependency information for huggingface-hub<1.0,>=0.14.1 from https://files.pythonhosted.org/packages/ef/b5/b6107bd65fa4c96fdf00e4733e2fe5729bb9e5e09997f63074bb43d3ab28/huggingface_hub-0.18.0-py3-none-any.whl.metadata
#0 23.67 Downloading huggingface_hub-0.18.0-py3-none-any.whl.metadata (13 kB)
#0 23.68 Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0->-r requirements.txt (line 3)) (23.1)
#0 23.68 Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0->-r requirements.txt (line 3)) (6.0.1)
#0 23.68 Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0->-r requirements.txt (line 3)) (2023.6.3)
#0 23.68 Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0->-r requirements.txt (line 3)) (2.31.0)
#0 26.45 Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers==4.31.0->-r requirements.txt (line 3))
#0 26.78 Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
#0 29.52 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.8/7.8 MB 2.8 MB/s eta 0:00:00
#0 31.66 Collecting safetensors>=0.3.1 (from transformers==4.31.0->-r requirements.txt (line 3))
#0 31.66 Obtaining dependency information for safetensors>=0.3.1 from https://files.pythonhosted.org/packages/20/4e/878b080dbda92666233ec6f316a53969edcb58eab1aa399a64d0521cf953/safetensors-0.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata
#0 31.99 Downloading safetensors-0.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.8 kB)
#0 31.99 Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0->-r requirements.txt (line 3)) (4.65.0)
#0 32.05 Requirement already satisfied: Pillow in /usr/local/lib/python3.10/dist-packages (from diffusers==0.15.0->-r requirements.txt (line 4)) (9.2.0)
#0 32.05 Requirement already satisfied: importlib-metadata in /usr/local/lib/python3.10/dist-packages (from diffusers==0.15.0->-r requirements.txt (line 4)) (6.8.0)
#0 32.08 Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from accelerate==0.20.3->-r requirements.txt (line 5)) (5.9.4)
#0 32.08 Requirement already satisfied: cython in /usr/local/lib/python3.10/dist-packages (from cuda-python==12.2.0->-r requirements.txt (line 12)) (3.0.0)
#0 33.23 Collecting pyproject_hooks (from build->-r requirements.txt (line 1))
#0 33.56 Downloading pyproject_hooks-1.0.0-py3-none-any.whl (9.3 kB)
#0 33.57 Requirement already satisfied: tomli>=1.1.0 in /usr/local/lib/python3.10/dist-packages (from build->-r requirements.txt (line 1)) (2.0.1)
#0 33.58 Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch->-r requirements.txt (line 2)) (4.7.1)
#0 33.58 Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch->-r requirements.txt (line 2)) (1.12)
#0 33.58 Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch->-r requirements.txt (line 2)) (2.6.3)
#0 33.58 Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch->-r requirements.txt (line 2)) (3.1.2)
#0 33.58 Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch->-r requirements.txt (line 2)) (2023.6.0)
#0 33.59 Requirement already satisfied: protobuf>=3.20.2 in /usr/local/lib/python3.10/dist-packages (from onnx>=1.12.0->-r requirements.txt (line 8)) (4.21.12)
#0 33.76 Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.10/dist-packages (from importlib-metadata->diffusers==0.15.0->-r requirements.txt (line 4)) (3.16.2)
#0 33.76 Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch->-r requirements.txt (line 2)) (2.1.3)
#0 33.80 Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.31.0->-r requirements.txt (line 3)) (3.2.0)
#0 33.80 Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.31.0->-r requirements.txt (line 3)) (3.4)
#0 33.80 Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.31.0->-r requirements.txt (line 3)) (1.26.16)
#0 33.80 Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.31.0->-r requirements.txt (line 3)) (2023.7.22)
#0 33.80 Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch->-r requirements.txt (line 2)) (1.3.0)
#0 34.17 Downloading transformers-4.31.0-py3-none-any.whl (7.4 MB)
#0 36.62 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.4/7.4 MB 3.0 MB/s eta 0:00:00
#0 36.95 Downloading accelerate-0.20.3-py3-none-any.whl (227 kB)
#0 37.03 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 227.6/227.6 kB 2.8 MB/s eta 0:00:00
#0 37.36 Downloading cuda_python-12.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (20.1 MB)
#0 44.00 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 20.1/20.1 MB 3.1 MB/s eta 0:00:00
#0 44.33 Downloading build-1.0.3-py3-none-any.whl (18 kB)
#0 44.66 Downloading colored-2.2.3-py3-none-any.whl (16 kB)
#0 44.99 Downloading lark-1.1.8-py3-none-any.whl (111 kB)
#0 45.02 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 111.6/111.6 kB 3.6 MB/s eta 0:00:00
#0 45.35 Downloading huggingface_hub-0.18.0-py3-none-any.whl (301 kB)
#0 45.46 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 302.0/302.0 kB 2.7 MB/s eta 0:00:00
#0 45.79 Downloading safetensors-0.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
#0 46.24 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 3.0 MB/s eta 0:00:00
#0 47.65 Installing collected packages: tokenizers, sentencepiece, safetensors, pyproject_hooks, lark, cuda-python, colored, huggingface-hub, build, transformers, diffusers, accelerate
#0 47.84 Attempting uninstall: cuda-python
#0 47.84 Found existing installation: cuda-python 12.1.0rc5+1.g994d8d0
#0 47.84 Uninstalling cuda-python-12.1.0rc5+1.g994d8d0:
#0 47.88 Successfully uninstalled cuda-python-12.1.0rc5+1.g994d8d0
#0 50.91 Successfully installed accelerate-0.20.3 build-1.0.3 colored-2.2.3 cuda-python-12.2.0 diffusers-0.15.0 huggingface-hub-0.18.0 lark-1.1.8 pyproject_hooks-1.0.0 safetensors-0.4.0 sentencepiece-0.1.99 tokenizers-0.13.3 transformers-4.31.0
#0 50.91 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
#0 54.23
#0 54.23 [notice] A new release of pip is available: 23.2.1 -> 23.3.1
#0 54.23 [notice] To update, run: python3 -m pip install --upgrade pip
#0 54.69 -- The CXX compiler identification is GNU 11.4.0
#0 54.69 -- Detecting CXX compiler ABI info
#0 54.76 -- Detecting CXX compiler ABI info - done
#0 54.76 -- Check for working CXX compiler: /usr/bin/c++ - skipped
#0 54.76 -- Detecting CXX compile features
#0 54.76 -- Detecting CXX compile features - done
#0 54.76 -- NVTX is disabled
#0 54.76 -- Importing batch manager
#0 54.76 -- Building PyTorch
#0 54.76 -- Building Google tests
#0 54.76 -- Building benchmarks
#0 54.76 -- Looking for a CUDA compiler
#0 55.96 -- Looking for a CUDA compiler - /usr/local/cuda/bin/nvcc
#0 55.96 -- CUDA compiler: /usr/local/cuda/bin/nvcc
#0 55.97 -- GPU architectures: 70-real;80-real;86-real;89-real;90-real
#0 56.55 -- The CUDA compiler identification is NVIDIA 12.2.128
#0 56.55 -- Detecting CUDA compiler ABI info
#0 57.89 -- Detecting CUDA compiler ABI info - done
#0 57.93 -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
#0 57.93 -- Detecting CUDA compile features
#0 57.93 -- Detecting CUDA compile features - done
#0 57.93 -- Found CUDAToolkit: /usr/local/cuda/include (found version "12.2.128")
#0 57.94 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
#0 57.99 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
#0 57.99 -- Found Threads: TRUE
#0 58.02 -- ========================= Importing and creating target nvinfer ==========================
#0 58.02 -- Looking for library nvinfer
#0 58.02 -- Library that was found /usr/local/tensorrt/targets/x86_64-linux-gnu/lib/libnvinfer.so
#0 58.02 -- ==========================================================================================
#0 58.02 -- ========================= Importing and creating target nvuffparser ==========================
#0 58.02 -- Looking for library nvparsers
#0 58.02 -- Library that was found nvparsers_LIB_PATH-NOTFOUND
#0 58.02 -- ==========================================================================================
#0 58.02 -- CUDAToolkit_VERSION 12.2 is greater or equal than 11.0, enable -DENABLE_BF16 flag
#0 58.02 -- CUDAToolkit_VERSION 12.2 is greater or equal than 11.8, enable -DENABLE_FP8 flag
#0 58.21 -- Found MPI_CXX: /opt/hpcx/ompi/lib/libmpi.so (found version "3.1")
#0 58.21 -- Found MPI: TRUE (found version "3.1")
#0 58.21 -- COMMON_HEADER_DIRS: /src/tensorrt_llm/cpp;/usr/local/cuda/include
#0 58.21 -- TORCH_CUDA_ARCH_LIST: 7.0;8.0;8.6;8.9;9.0
#0 58.21 CMake Warning at CMakeLists.txt:248 (message):
#0 58.21 Ignoring environment variable TORCH_CUDA_ARCH_LIST=5.2 6.0 6.1 7.0 7.5 8.0
#0 58.21 8.6 9.0+PTX
#0 58.21
#0 58.21
#0 58.37 -- Found Python3: /usr/bin/python3.10 (found version "3.10.12") found components: Interpreter Development Development.Module Development.Embed
#0 58.37 -- Found Python executable at /usr/bin/python3.10
#0 58.37 -- Found Python libraries at /usr/lib/x86_64-linux-gnu
#0 60.51 -- Found CUDA: /usr/local/cuda (found version "12.2")
#0 60.51 -- Found CUDAToolkit: /usr/local/cuda/include (found version "12.2.128")
#0 60.54 -- Caffe2: CUDA detected: 12.2
#0 60.54 -- Caffe2: CUDA nvcc is: /usr/local/cuda/bin/nvcc
#0 60.54 -- Caffe2: CUDA toolkit directory: /usr/local/cuda
#0 60.59 -- Caffe2: Header version is: 12.2
#0 60.68 -- /usr/local/cuda/lib64/libnvrtc.so shorthash is eaa826f0
#0 60.68 -- USE_CUDNN is set to 0. Compiling without cuDNN support
#0 60.68 -- Added CUDA NVCC flags for: -gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_89,code=sm_89;-gencode;arch=compute_90,code=sm_90
#0 60.69 CMake Warning at /usr/local/lib/python3.10/dist-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
#0 60.69 static library kineto_LIBRARY-NOTFOUND not found.
#0 60.69 Call Stack (most recent call first):
#0 60.69 /usr/local/lib/python3.10/dist-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
#0 60.69 CMakeLists.txt:281 (find_package)
#0 60.69
#0 60.69
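A sketch of how one might surface the full error from the failing `[wheel 9/9]` step: BuildKit's collapsed view truncates long output, so re-running the same command from the Makefile with `--progress plain` (a standard BuildKit option; everything else is copied from the log above) prints the complete compiler output:

```shell
# Run from TensorRT-LLM/docker, as in the log above.
DOCKER_BUILDKIT=1 docker build --pull \
    --progress plain \
    --build-arg BASE_IMAGE=nvcr.io/nvidia/pytorch \
    --build-arg BASE_TAG=23.08-py3 \
    --build-arg BUILD_WHEEL_ARGS="--clean --trt_root /usr/local/tensorrt" \
    --build-arg TORCH_INSTALL_TYPE="skip" \
    --target release \
    --file Dockerfile.multi \
    --tag tensorrt_llm/release:latest \
    ..
```

The last lines before the non-zero exit should then show which compile or link step actually failed.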

@zhanghui-china

#0 60.69 -- Found Torch: /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch.so
#0 60.69 -- TORCH_CXX_FLAGS: -D_GLIBCXX_USE_CXX11_ABI=1
#0 60.69 -- Building for TensorRT version: 9.1.0, library version: 9
#0 60.69 -- Using MPI_CXX_INCLUDE_DIRS: /opt/hpcx/ompi/include;/opt/hpcx/ompi/include/openmpi;/opt/hpcx/ompi/include/openmpi/opal/mca/hwloc/hwloc201/hwloc/include;/opt/hpcx/ompi/include/openmpi/opal/mca/event/libevent2022/libevent;/opt/hpcx/ompi/include/openmpi/opal/mca/event/libevent2022/libevent/include
#0 60.69 -- Using MPI_CXX_LIBRARIES: /opt/hpcx/ompi/lib/libmpi.so
#0 60.70 -- CMAKE_SYSTEM_PROCESSOR: x86_64
#0 61.77 -- USE_CXX11_ABI: True
#0 71.40 -- The C compiler identification is GNU 11.4.0
#0 71.40 -- Detecting C compiler ABI info
#0 71.46 -- Detecting C compiler ABI info - done
#0 71.47 -- Check for working C compiler: /usr/bin/cc - skipped
#0 71.47 -- Detecting C compile features
#0 71.47 -- Detecting C compile features - done
#0 71.60 -- Found Python: /usr/bin/python3.10 (found version "3.10.12") found components: Interpreter
#0 71.60 -- ========================= Importing and creating target nvonnxparser ==========================
#0 71.60 -- Looking for library nvonnxparser
#0 71.60 -- Library that was found /usr/local/tensorrt/targets/x86_64-linux-gnu/lib/libnvonnxparser.so
#0 71.60 -- ==========================================================================================
#0 71.61 -- Configuring done
#0 71.73 -- Generating done

@zhanghui-china

#0 71.74 -- Build files have been written to: /src/tensorrt_llm/cpp/build
#0 71.77 [ 0%] Building CXX object tensorrt_llm/common/CMakeFiles/common_src.dir/cublasMMWrapper.cpp.o
#0 71.77 [ 0%] Building CXX object tensorrt_llm/layers/CMakeFiles/layers_src.dir/baseSamplingLayer.cpp.o
#0 71.77 [ 0%] Building CXX object tensorrt_llm/common/CMakeFiles/common_src.dir/cudaAllocator.cpp.o
#0 71.77 [ 0%] Building CXX object tensorrt_llm/common/CMakeFiles/common_src.dir/cudaDriverWrapper.cpp.o
#0 71.77 [ 0%] Building CUDA object tensorrt_llm/layers/CMakeFiles/layers_src.dir/baseBeamSearchLayer.cu.o
#0 71.77 [ 0%] Building CXX object tensorrt_llm/runtime/CMakeFiles/runtime_src.dir/bufferManager.cpp.o
#0 71.77 [ 0%] Building CXX object tensorrt_llm/layers/CMakeFiles/layers_src.dir/dynamicDecodeLayer.cpp.o
#0 71.77 [ 1%] Building CXX object tensorrt_llm/runtime/CMakeFiles/runtime_src.dir/utils/sessionUtils.cpp.o
#0 71.77 [ 1%] Building CUDA object tensorrt_llm/layers/CMakeFiles/layers_src.dir/onlineBeamSearchLayer.cu.o
#0 71.77 [ 2%] Building CXX object tensorrt_llm/common/CMakeFiles/common_src.dir/logger.cpp.o
#0 71.77 [ 2%] Building CXX object tensorrt_llm/common/CMakeFiles/common_src.dir/mpiUtils.cpp.o
#0 71.77 [ 3%] Building CXX object tensorrt_llm/runtime/CMakeFiles/runtime_src.dir/decodingOutput.cpp.o
#0 71.77 [ 3%] Building CXX object tensorrt_llm/common/CMakeFiles/common_src.dir/stringUtils.cpp.o
#0 71.77 [ 3%] Building CXX object tensorrt_llm/runtime/CMakeFiles/runtime_src.dir/gptDecoder.cpp.o
#0 71.77 [ 3%] Building CUDA object tensorrt_llm/layers/CMakeFiles/layers_src.dir/topKSamplingLayer.cu.o
#0 71.79 [ 3%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_bf16_128_32_ldgsts_sm90.cubin.cpp.o
#0 72.15 [ 3%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_bf16_128_64_ldgsts_sm90.cubin.cpp.o
#0 72.34 [ 3%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_bf16_256_32_ldgsts_sm90.cubin.cpp.o
#0 72.48 [ 3%] Building CXX object tensorrt_llm/common/CMakeFiles/common_src.dir/tensor.cpp.o
#0 72.49 [ 5%] Building CXX object tensorrt_llm/common/CMakeFiles/common_src.dir/tllmException.cpp.o
#0 72.49 [ 6%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_bf16_256_64_ldgsts_sm90.cubin.cpp.o
#0 72.90 [ 6%] Building CUDA object tensorrt_llm/common/CMakeFiles/common_src.dir/cudaFp8Utils.cu.o
#0 72.90 [ 6%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_bf16_384_32_ldgsts_sm90.cubin.cpp.o
#0 73.09 [ 6%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_bf16_384_64_ldgsts_sm90.cubin.cpp.o
#0 73.24 [ 6%] Building CUDA object tensorrt_llm/common/CMakeFiles/common_src.dir/memoryUtils.cu.o
#0 73.24 [ 6%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_bf16_512_32_ldgsts_sm90.cubin.cpp.o
#0 73.37 [ 7%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_bf16_512_64_ldgsts_sm90.cubin.cpp.o
#0 73.40 [ 7%] Building CXX object tensorrt_llm/runtime/CMakeFiles/runtime_src.dir/gptDecoderBatch.cpp.o
#0 73.42 [ 7%] Building CXX object tensorrt_llm/runtime/CMakeFiles/runtime_src.dir/gptJsonConfig.cpp.o
#0 73.60 [ 7%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_bf16_64_32_ldgsts_sm90.cubin.cpp.o
#0 73.81 [ 7%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_bf16_64_64_ldgsts_sm90.cubin.cpp.o
#0 73.83 [ 7%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_128_128_S_16_sm80.cubin.cpp.o
#0 73.99 [ 8%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_128_128_S_16_sm86.cubin.cpp.o
#0 74.04 [ 8%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_128_128_S_16_sm89.cubin.cpp.o
#0 74.12 [ 8%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_128_128_S_32_sm80.cubin.cpp.o
#0 74.29 [ 8%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_128_128_S_32_sm86.cubin.cpp.o
#0 74.34 [ 10%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_128_128_S_32_sm89.cubin.cpp.o
#0 74.36 [ 11%] Building CXX object tensorrt_llm/runtime/CMakeFiles/runtime_src.dir/gptSession.cpp.o
#0 74.40 [ 11%] Building CXX object tensorrt_llm/runtime/CMakeFiles/runtime_src.dir/iBuffer.cpp.o
#0 74.45 [ 11%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_128_128_S_40_sm80.cubin.cpp.o
#0 74.62 [ 11%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_128_128_S_40_sm86.cubin.cpp.o
#0 74.66 [ 11%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_128_128_S_40_sm89.cubin.cpp.o
#0 74.66 [ 11%] Building CUDA object tensorrt_llm/layers/CMakeFiles/layers_src.dir/topPSamplingLayer.cu.o
#0 74.79 [ 12%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_128_128_S_64_sm80.cubin.cpp.o
#0 74.98 [ 12%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_128_128_S_64_sm86.cubin.cpp.o
#0 75.01 [ 12%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_128_128_S_64_sm89.cubin.cpp.o
#0 75.14 [ 14%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_128_S_128_sm80.cubin.cpp.o
#0 75.33 [ 14%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_128_S_128_sm86.cubin.cpp.o
#0 75.34 [ 14%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_128_S_128_sm89.cubin.cpp.o
#0 75.38 [ 14%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_128_S_128_tma_ws_sm90.cubin.cpp.o
#0 75.42 [ 15%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_128_S_160_sm80.cubin.cpp.o
#0 75.56 In file included from /src/tensorrt_llm/cpp/tensorrt_llm/runtime/gptSession.cpp:23:
#0 75.56 /src/tensorrt_llm/cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h: In member function ‘bool tensorrt_llm::batch_manager::kv_cache_manager::BlockManager::schedulingHasFreeBlocks(std::size_t) const’:
#0 75.56 /src/tensorrt_llm/cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h:183:41: warning: comparison of integer expressions of different signedness: ‘const SizeType’ {aka ‘const int’} and ‘std::size_t’ {aka ‘long unsigned int’} [-Wsign-compare]
#0 75.56   183 |         return mSchedulingNumFreeBlocks >= numRequired;
#0 75.56       |                ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~
#0 75.60 [ 15%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_128_S_160_sm86.cubin.cpp.o
#0 75.63 [ 15%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_128_S_160_sm89.cubin.cpp.o
#0 75.63 [ 15%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_128_S_256_sm80.cubin.cpp.o
#0 75.75 [ 16%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_128_S_256_sm86.cubin.cpp.o
#0 75.82 [ 16%] Building CXX object tensorrt_llm/runtime/CMakeFiles/runtime_src.dir/iTensor.cpp.o
#0 75.94 [ 16%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_128_S_256_sm89.cubin.cpp.o
#0 75.98 [ 16%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_128_S_80_sm80.cubin.cpp.o
#0 76.02 [ 16%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_128_S_80_sm86.cubin.cpp.o
#0 76.12 [ 17%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_128_S_80_sm89.cubin.cpp.o
#0 76.25 [ 17%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_16_S_160_sm80.cubin.cpp.o
#0 76.26 [ 17%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_16_S_160_sm86.cubin.cpp.o
#0 76.26 [ 17%] Building CXX object tensorrt_llm/runtime/CMakeFiles/runtime_src.dir/ipcUtils.cpp.o
#0 76.27 [ 17%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_16_S_160_sm89.cubin.cpp.o
#0 76.34 [ 19%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_16_S_256_sm80.cubin.cpp.o
#0 76.37 [ 19%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_16_S_256_sm86.cubin.cpp.o
#0 76.44 [ 19%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_16_S_256_sm89.cubin.cpp.o
#0 76.44 [ 19%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_256_S_32_tma_ws_sm90.cubin.cpp.o
#0 76.44 [ 20%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_256_S_64_tma_ws_sm90.cubin.cpp.o
#0 76.54 [ 20%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_32_S_128_sm80.cubin.cpp.o
#0 76.57 [ 20%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_32_S_128_sm86.cubin.cpp.o
#0 76.64 [ 20%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_32_S_128_sm89.cubin.cpp.o
#0 76.71 [ 21%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_32_S_40_sm80.cubin.cpp.o
#0 76.74 [ 21%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_32_S_40_sm86.cubin.cpp.o
#0 76.79 [ 21%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_32_S_40_sm89.cubin.cpp.o
#0 76.83 [ 21%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_32_S_64_sm80.cubin.cpp.o
#0 76.86 [ 23%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_32_S_64_sm86.cubin.cpp.o
#0 76.91 [ 23%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_32_S_64_sm89.cubin.cpp.o
#0 76.95 [ 23%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_32_S_80_sm80.cubin.cpp.o
#0 76.98 [ 23%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_32_S_80_sm86.cubin.cpp.o
#0 77.03 [ 24%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_32_S_80_sm89.cubin.cpp.o
#0 77.09 [ 24%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_64_S_16_sm80.cubin.cpp.o
#0 77.11 [ 24%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_64_S_16_sm86.cubin.cpp.o
#0 77.18 [ 24%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_64_S_16_sm89.cubin.cpp.o
#0 77.22 [ 25%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_64_S_256_tma_ws_sm90.cubin.cpp.o
#0 77.22 [ 25%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_64_S_32_sm80.cubin.cpp.o
#0 77.29 [ 26%] Building CXX object tensorrt_llm/runtime/CMakeFiles/runtime_src.dir/memoryCounters.cpp.o
#0 77.32 [ 26%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_64_S_32_sm86.cubin.cpp.o
#0 77.36 [ 26%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_bf16_64_64_S_32_sm89.cubin.cpp.o
#0 77.46 [ 28%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_128_128_S_16_sm80.cubin.cpp.o
#0 77.50 [ 28%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_128_128_S_16_sm86.cubin.cpp.o
#0 77.61 [ 28%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_128_128_S_16_sm89.cubin.cpp.o
#0 77.67 [ 28%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_128_128_S_32_sm80.cubin.cpp.o
#0 77.78 [ 28%] Building CXX object tensorrt_llm/runtime/CMakeFiles/runtime_src.dir/ncclCommunicator.cpp.o
#0 77.79 [ 29%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_128_128_S_32_sm86.cubin.cpp.o
#0 77.81 [ 29%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_128_128_S_32_sm89.cubin.cpp.o
#0 77.92 [ 29%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_128_128_S_40_sm80.cubin.cpp.o
#0 77.96 [ 29%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_128_128_S_40_sm86.cubin.cpp.o
#0 78.02 [ 30%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_128_128_S_40_sm89.cubin.cpp.o
#0 78.14 [ 30%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_128_128_S_64_sm80.cubin.cpp.o
#0 78.14 [ 30%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_128_128_S_64_sm86.cubin.cpp.o
#0 78.28 [ 30%] Building CXX object tensorrt_llm/runtime/CMakeFiles/runtime_src.dir/runtimeBuffers.cpp.o
#0 78.28 [ 30%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_128_128_S_64_sm89.cubin.cpp.o
#0 78.31 [ 32%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_128_S_128_sm80.cubin.cpp.o
#0 78.38 [ 32%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_128_S_128_sm86.cubin.cpp.o
#0 78.48 [ 32%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_128_S_128_sm89.cubin.cpp.o
#0 78.52 [ 33%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_128_S_128_tma_ws_sm90.cubin.cpp.o
#0 78.60 [ 33%] Building CXX object
tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_128_S_160_sm80.cubin.cpp.o #0 78.64 [ 33%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_128_S_160_sm86.cubin.cpp.o #0 78.68 [ 33%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_128_S_160_sm89.cubin.cpp.o #0 78.75 [ 34%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_128_S_256_sm80.cubin.cpp.o #0 78.94 [ 34%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_128_S_256_sm86.cubin.cpp.o #0 78.98 [ 34%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_128_S_256_sm89.cubin.cpp.o #0 79.00 [ 34%] Building CUDA object tensorrt_llm/runtime/CMakeFiles/runtime_src.dir/runtimeKernels.cu.o #0 79.01 [ 34%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_128_S_80_sm80.cubin.cpp.o #0 79.13 [ 35%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_128_S_80_sm86.cubin.cpp.o #0 79.14 [ 37%] Building CXX object tensorrt_llm/runtime/CMakeFiles/runtime_src.dir/statefulGptDecoder.cpp.o #0 79.28 [ 37%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_128_S_80_sm89.cubin.cpp.o #0 79.32 [ 37%] Building CXX object tensorrt_llm/runtime/CMakeFiles/runtime_src.dir/tllmRuntime.cpp.o #0 79.32 [ 37%] Building CXX object 
tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_16_S_160_sm80.cubin.cpp.o #0 79.33 [ 37%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_16_S_160_sm86.cubin.cpp.o #0 79.38 [ 38%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_16_S_160_sm89.cubin.cpp.o #0 79.39 [ 38%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_16_S_256_sm80.cubin.cpp.o #0 79.42 In file included from /src/tensorrt_llm/cpp/tensorrt_llm/runtime/runtimeBuffers.cpp:26: #0 79.42 /src/tensorrt_llm/cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h: In member function ‘bool tensorrt_llm::batch_manager::kv_cache_manager::BlockManager::schedulingHasFreeBlocks(std::size_t) const’: #0 79.42 /src/tensorrt_llm/cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h:183:41: warning: comparison of integer expressions of different signedness: ‘const SizeType’ {aka ‘const int’} and ‘std::size_t’ {aka ‘long unsigned int’} [-Wsign-compare] #0 79.42 183 | return mSchedulingNumFreeBlocks >= numRequired; #0 79.42 | ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~ #0 79.47 [ 38%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_16_S_256_sm86.cubin.cpp.o #0 79.48 [ 38%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_16_S_256_sm89.cubin.cpp.o #0 79.54 [ 39%] Building CXX object tensorrt_llm/runtime/CMakeFiles/runtime_src.dir/tllmLogger.cpp.o #0 79.54 [ 39%] Building CXX object 
tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_256_S_32_tma_ws_sm90.cubin.cpp.o #0 79.55 [ 39%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_256_S_64_tma_ws_sm90.cubin.cpp.o #0 79.64 [ 39%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_32_S_128_sm80.cubin.cpp.o #0 79.65 [ 39%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_32_S_128_sm86.cubin.cpp.o #0 79.79 [ 41%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_32_S_128_sm89.cubin.cpp.o #0 79.81 [ 41%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_32_S_40_sm80.cubin.cpp.o #0 79.93 [ 41%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_32_S_40_sm86.cubin.cpp.o #0 79.94 [ 41%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_32_S_40_sm89.cubin.cpp.o #0 79.99 [ 41%] Building CXX object tensorrt_llm/runtime/CMakeFiles/runtime_src.dir/worldConfig.cpp.o #0 80.05 [ 42%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_32_S_64_sm80.cubin.cpp.o #0 80.06 [ 42%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_32_S_64_sm86.cubin.cpp.o #0 80.18 [ 42%] Building CXX object 
tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_32_S_64_sm89.cubin.cpp.o #0 80.18 [ 42%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_32_S_80_sm80.cubin.cpp.o #0 80.30 [ 43%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_32_S_80_sm86.cubin.cpp.o #0 80.32 [ 43%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_32_S_80_sm89.cubin.cpp.o #0 80.44 [ 43%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_64_S_16_sm80.cubin.cpp.o #0 80.46 [ 43%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_64_S_16_sm86.cubin.cpp.o #0 80.54 [ 44%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_64_S_16_sm89.cubin.cpp.o #0 80.57 [ 44%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_64_S_256_tma_ws_sm90.cubin.cpp.o #0 80.59 [ 44%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_64_S_32_sm80.cubin.cpp.o #0 80.66 [ 44%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_64_S_32_sm86.cubin.cpp.o #0 80.74 [ 46%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_64_64_S_32_sm89.cubin.cpp.o #0 80.79 [ 46%] Building CXX object 
tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_128_128_S_16_sm80.cubin.cpp.o #0 80.88 [ 46%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_128_128_S_16_sm86.cubin.cpp.o #0 80.89 [ 46%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_128_128_S_16_sm89.cubin.cpp.o #0 80.90 [ 47%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_128_128_S_32_sm80.cubin.cpp.o #0 80.93 [ 47%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_128_128_S_32_sm86.cubin.cpp.o #0 81.10 [ 47%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_128_128_S_32_sm89.cubin.cpp.o #0 81.19 [ 47%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_128_128_S_40_sm80.cubin.cpp.o #0 81.20 [ 48%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_128_128_S_40_sm86.cubin.cpp.o #0 81.24 [ 48%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_128_128_S_40_sm89.cubin.cpp.o #0 81.25 [ 48%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_128_128_S_64_sm80.cubin.cpp.o #0 81.25 [ 48%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_128_128_S_64_sm86.cubin.cpp.o 
#0 81.44 [ 50%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_128_128_S_64_sm89.cubin.cpp.o #0 81.55 [ 50%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_128_S_128_sm80.cubin.cpp.o #0 81.56 [ 50%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_128_S_128_sm86.cubin.cpp.o #0 81.60 [ 50%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_128_S_128_sm89.cubin.cpp.o

@zhanghui-china
Author

`#0 81.61 [ 51%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_128_S_128_tma_ws_sm90.cubin.cpp.o
#0 81.66 [ 51%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_128_S_160_sm80.cubin.cpp.o
#0 81.76 [ 51%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_128_S_160_sm86.cubin.cpp.o
#0 81.83 [ 52%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_128_S_160_sm89.cubin.cpp.o
#0 81.84 [ 52%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_128_S_256_sm80.cubin.cpp.o
#0 81.85 [ 52%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_128_S_256_sm86.cubin.cpp.o
#0 81.86 [ 52%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_128_S_256_sm89.cubin.cpp.o
#0 81.90 [ 53%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_128_S_80_sm80.cubin.cpp.o
#0 82.01 [ 53%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_128_S_80_sm86.cubin.cpp.o
#0 82.08 [ 53%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_128_S_80_sm89.cubin.cpp.o
#0 82.11 [ 53%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_16_S_160_sm80.cubin.cpp.o
#0 82.19 [ 55%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_16_S_160_sm86.cubin.cpp.o
#0 82.20 [ 55%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_16_S_160_sm89.cubin.cpp.o
#0 82.27 [ 55%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_16_S_256_sm80.cubin.cpp.o
#0 82.29 [ 55%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_16_S_256_sm86.cubin.cpp.o
#0 82.30 [ 56%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_16_S_256_sm89.cubin.cpp.o
#0 82.30 [ 56%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_256_S_32_tma_ws_sm90.cubin.cpp.o
#0 82.30 [ 56%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_256_S_64_tma_ws_sm90.cubin.cpp.o
#0 82.36 [ 56%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_128_sm80.cubin.cpp.o
#0 82.37 [ 57%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_128_sm86.cubin.cpp.o
#0 82.38 [ 57%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_128_sm89.cubin.cpp.o
#0 82.48 [ 57%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_40_sm80.cubin.cpp.o
#0 82.49 [ 57%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_40_sm86.cubin.cpp.o
#0 82.50 [ 58%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_40_sm89.cubin.cpp.o
#0 82.51 [ 58%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_64_sm80.cubin.cpp.o
#0 82.54 [ 58%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_64_sm89.cubin.cpp.o
#0 82.54 [ 58%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_64_sm86.cubin.cpp.o
#0 82.57 [ 60%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_80_sm80.cubin.cpp.o
#0 82.61 [ 60%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_80_sm86.cubin.cpp.o
#0 82.62 [ 60%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_32_S_80_sm89.cubin.cpp.o
#0 82.63 [ 60%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_16_sm80.cubin.cpp.o
#0 82.65 [ 61%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_16_sm86.cubin.cpp.o
#0 82.67 [ 61%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_16_sm89.cubin.cpp.o
#0 82.67 [ 61%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_256_tma_ws_sm90.cubin.cpp.o
#0 82.70 [ 61%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_32_sm80.cubin.cpp.o
#0 82.75 [ 62%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_32_sm86.cubin.cpp.o
#0 82.76 [ 62%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_flash_attention_fp16_fp32_64_64_S_32_sm89.cubin.cpp.o
#0 82.77 [ 62%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_128_32_ldgsts_sm90.cubin.cpp.o
#0 82.78 [ 62%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_128_64_ldgsts_sm90.cubin.cpp.o
#0 82.79 [ 64%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_256_32_ldgsts_sm90.cubin.cpp.o
#0 82.83 [ 64%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_256_64_ldgsts_sm90.cubin.cpp.o
#0 82.83 [ 64%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_384_32_ldgsts_sm90.cubin.cpp.o
#0 82.90 [ 64%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_384_64_ldgsts_sm90.cubin.cpp.o
#0 82.90 [ 65%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_512_32_ldgsts_sm90.cubin.cpp.o
#0 83.12 [ 65%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_512_64_ldgsts_sm90.cubin.cpp.o
#0 83.12 [ 65%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_64_32_ldgsts_sm90.cubin.cpp.o
#0 83.13 [ 65%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_64_64_ldgsts_sm90.cubin.cpp.o
#0 83.33 [ 66%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_fp32_128_32_ldgsts_sm90.cubin.cpp.o
#0 83.34 [ 66%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_fp32_128_64_ldgsts_sm90.cubin.cpp.o
#0 83.35 [ 66%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_fp32_256_32_ldgsts_sm90.cubin.cpp.o
#0 83.37 [ 66%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_fp32_256_64_ldgsts_sm90.cubin.cpp.o
#0 83.37 [ 67%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_fp32_384_32_ldgsts_sm90.cubin.cpp.o
#0 83.43 [ 67%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_fp32_384_64_ldgsts_sm90.cubin.cpp.o
#0 83.44 [ 67%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_fp32_512_32_ldgsts_sm90.cubin.cpp.o
#0 83.44 [ 67%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_fp32_512_64_ldgsts_sm90.cubin.cpp.o
#0 83.50 [ 69%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_fp32_64_32_ldgsts_sm90.cubin.cpp.o
#0 83.54 [ 69%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/cubin/fmha_v2_fp16_fp32_64_64_ldgsts_sm90.cubin.cpp.o
#0 83.64 [ 69%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/contextFusedMultiHeadAttention/fmhaRunner.cpp.o
#0 83.69 [ 69%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/cutlass_kernels/cutlass_heuristic.cpp.o
#0 83.73 [ 70%] Building CXX object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/cutlass_kernels/cutlass_preprocessors.cpp.o
#0 83.73 [ 70%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/banBadWords.cu.o
#0 83.79 [ 70%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/banRepeatNgram.cu.o
#0 83.85 [ 71%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/beamSearchPenaltyKernels.cu.o
#0 83.89 [ 71%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/beamSearchTopkKernels.cu.o
#0 83.95 [ 71%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/customAllReduceKernels.cu.o
#0 83.98 [ 71%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/cutlass_kernels/fpA_intB_gemm/bf16_int4_gemm_fg_scalebias.cu.o
#0 84.05 [ 73%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/cutlass_kernels/fpA_intB_gemm/bf16_int4_gemm_fg_scaleonly.cu.o
#0 84.07 [ 73%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/cutlass_kernels/fpA_intB_gemm/bf16_int4_gemm_per_col.cu.o
#0 85.02 [ 73%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/cutlass_kernels/fpA_intB_gemm/bf16_int8_gemm_fg_scalebias.cu.o
#0 86.41 [ 73%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/cutlass_kernels/fpA_intB_gemm/bf16_int8_gemm_fg_scaleonly.cu.o
#0 86.57 [ 74%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/cutlass_kernels/fpA_intB_gemm/bf16_int8_gemm_per_col.cu.o
#0 88.02 [ 74%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/cutlass_kernels/fpA_intB_gemm/fp16_int4_gemm_fg_scalebias.cu.o
#0 93.29 [ 74%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/cutlass_kernels/fpA_intB_gemm/fp16_int4_gemm_fg_scaleonly.cu.o
#0 93.67 [ 74%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/cutlass_kernels/fpA_intB_gemm/fp16_int4_gemm_per_col.cu.o
#0 97.45 [ 74%] Built target layers_src
#0 97.46 [ 75%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/cutlass_kernels/fpA_intB_gemm/fp16_int8_gemm_fg_scalebias.cu.o
#0 100.4 [ 75%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/cutlass_kernels/fpA_intB_gemm/fp16_int8_gemm_fg_scaleonly.cu.o
#0 102.2 [ 75%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/cutlass_kernels/fpA_intB_gemm/fp16_int8_gemm_per_col.cu.o
#0 105.1 [ 75%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/cutlass_kernels/int8_gemm/int8_gemm_bf16.cu.o
#0 105.6 [ 75%] Built target common_src
#0 105.6 [ 76%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/cutlass_kernels/int8_gemm/int8_gemm_fp16.cu.o
#0 112.4 [ 76%] Built target runtime_src
#0 112.4 [ 76%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/cutlass_kernels/int8_gemm/int8_gemm_fp32.cu.o
#0 167.9 [ 76%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/cutlass_kernels/int8_gemm/int8_gemm_int32.cu.o
#0 168.1 [ 76%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention.cu.o
#0 172.8 [ 78%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention112_bf16.cu.o
#0 173.7 [ 78%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention112_float.cu.o
#0 175.5 [ 78%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention112_half.cu.o
#0 177.1 [ 78%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention128_bf16.cu.o
#0 177.3 [ 79%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention128_float.cu.o
#0 188.4 [ 79%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention128_half.cu.o
#0 190.2 [ 79%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention144_bf16.cu.o
#0 191.9 [ 79%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention144_float.cu.o
#0 197.0 [ 80%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention144_half.cu.o
#0 201.1 [ 80%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention160_bf16.cu.o
#0 212.2 [ 80%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention160_float.cu.o
#0 547.0 [ 80%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention160_half.cu.o
#0 662.7 [ 82%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention192_bf16.cu.o
#0 679.9 [ 82%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention192_float.cu.o
#0 680.6 [ 82%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention192_half.cu.o
#0 738.1 [ 82%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention224_bf16.cu.o
#0 1184.7 [ 83%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention224_float.cu.o
#0 1196.1 [ 83%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention224_half.cu.o
#0 1337.2 [ 83%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention256_bf16.cu.o
#0 1396.2 [ 83%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention256_float.cu.o
#0 1599.4 [ 84%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention256_half.cu.o
#0 1601.6 [ 84%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention32_bf16.cu.o
#0 1938.7 [ 84%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention32_float.cu.o
#0 1969.2 [ 84%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention32_half.cu.o
#0 2039.6 [ 85%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention48_bf16.cu.o
#0 2045.3 [ 85%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention48_float.cu.o
#0 2406.6 [ 85%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention48_half.cu.o
#0 2439.4 [ 85%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention64_bf16.cu.o
#0 2506.9 [ 87%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention64_float.cu.o
#0 2519.7 [ 87%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention64_half.cu.o
#0 2530.0 [ 87%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention80_bf16.cu.o
#0 2615.0 [ 87%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention80_float.cu.o
#0 2639.4 [ 88%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention80_half.cu.o
#0 2962.0 [ 88%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention96_bf16.cu.o
#0 3008.6 [ 88%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention96_float.cu.o
#0 3038.5 [ 89%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention96_half.cu.o
#0 3079.1 [ 89%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decodingKernels.cu.o
#0 3103.2 [ 89%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/gptKernels.cu.o
#0 3128.1 [ 89%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/layernormKernels.cu.o
#0 3155.7 [ 91%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/lookupKernels.cu.o
#0 3167.2 [ 91%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/onlineSoftmaxBeamsearchKernels.cu.o
#0 3177.8 [ 91%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/onlineSoftmaxBeamsearchKernels/onlineSoftmaxBeamsearchKernels16.cu.o
#0 3224.4 [ 91%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/onlineSoftmaxBeamsearchKernels/onlineSoftmaxBeamsearchKernels32.cu.o
#0 3311.5 [ 92%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/onlineSoftmaxBeamsearchKernels/onlineSoftmaxBeamsearchKernels4.cu.o
#0 3311.9 [ 92%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/onlineSoftmaxBeamsearchKernels/onlineSoftmaxBeamsearchKernels64.cu.o
#0 3422.3 [ 92%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/onlineSoftmaxBeamsearchKernels/onlineSoftmaxBeamsearchKernels8.cu.o
#0 3424.5 [ 92%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/preQuantScaleKernel.cu.o
#0 3430.7 [ 93%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/quantization.cu.o
#0 3434.1 [ 93%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/rmsnormKernels.cu.o
#0 3454.0 [ 93%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/samplingPenaltyKernels.cu.o
#0 3458.6 [ 93%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/samplingTopKKernels.cu.o
#0 3466.4 [ 94%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/samplingTopPKernels.cu.o
#0 3489.2 [ 94%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/stopCriteriaKernels.cu.o
#0 3511.2 [ 94%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/unfusedAttentionKernels.cu.o
#0 3573.6 [ 94%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/unfusedAttentionKernels_2.cu.o
#0 3630.0 [ 96%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/kernelLauncher.cu.o
#0 3638.9 [ 96%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/weightOnlyBatchedGemvBs1Int4b.cu.o
#0 3642.8 [ 96%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/weightOnlyBatchedGemvBs1Int8b.cu.o
#0 3669.7 [ 96%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/weightOnlyBatchedGemvBs2Int4b.cu.o
#0 3671.1 [ 97%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/weightOnlyBatchedGemvBs2Int8b.cu.o
#0 3680.1 [ 97%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/weightOnlyBatchedGemvBs3Int4b.cu.o
#0 3681.7 [ 97%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/weightOnlyBatchedGemvBs3Int8b.cu.o
#0 3702.0 [ 97%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/weightOnlyBatchedGemvBs4Int4b.cu.o
#0 3702.3 [ 98%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/weightOnlyBatchedGemvBs4Int8b.cu.o
#0 4185.4 [ 98%] Built target kernels_src
#0 4185.4 [ 98%] Linking CXX static library libtensorrt_llm_static.a
#0 4189.0 [ 98%] Built target tensorrt_llm_static
#0 4189.0 [100%] Linking CXX shared library libtensorrt_llm.so
#0 4189.1 /usr/bin/ld:/src/tensorrt_llm/cpp/tensorrt_llm/batch_manager/x86_64-linux-gnu/libtensorrt_llm_batch_manager_static.a: file format not recognized; treating as linker script
#0 4189.1 /usr/bin/ld:/src/tensorrt_llm/cpp/tensorrt_llm/batch_manager/x86_64-linux-gnu/libtensorrt_llm_batch_manager_static.a:1: syntax error
#0 4189.1 collect2: error: ld returned 1 exit status
#0 4189.1 gmake[3]: *** [tensorrt_llm/CMakeFiles/tensorrt_llm.dir/build.make:714: tensorrt_llm/libtensorrt_llm.so] Error 1
#0 4189.1 gmake[2]: *** [CMakeFiles/Makefile2:677: tensorrt_llm/CMakeFiles/tensorrt_llm.dir/all] Error 2
#0 4189.1 gmake[1]: *** [CMakeFiles/Makefile2:684: tensorrt_llm/CMakeFiles/tensorrt_llm.dir/rule] Error 2
#0 4189.1 gmake: *** [Makefile:179: tensorrt_llm] Error 2
#0 4189.1 Traceback (most recent call last):
#0 4189.1 File "/src/tensorrt_llm/scripts/build_wheel.py", line 248, in
#0 4189.1 main(**vars(args))
#0 4189.1 File "/src/tensorrt_llm/scripts/build_wheel.py", line 152, in main
#0 4189.1 build_run(
#0 4189.1 File "/usr/lib/python3.10/subprocess.py", line 526, in run
#0 4189.1 raise CalledProcessError(retcode, process.args,
#0 4189.1 subprocess.CalledProcessError: Command 'cmake --build . --config Release --parallel 16 --target tensorrt_llm tensorrt_llm_static nvinfer_plugin_tensorrt_llm th_common ' returned non-zero exit status 2.

Dockerfile.multi:48

46 |
47 | ARG BUILD_WHEEL_ARGS="--clean --trt_root /usr/local/tensorrt"
48 | >>> RUN python3 scripts/build_wheel.py ${BUILD_WHEEL_ARGS}
49 |
50 | FROM devel as release

ERROR: failed to solve: process "/bin/bash -c python3 scripts/build_wheel.py ${BUILD_WHEEL_ARGS}" did not complete successfully: exit code: 1
Makefile:46: recipe for target 'release_build' failed
make: *** [release_build] Error 1
`

@Tlntin
Owner

Tlntin commented Oct 25, 2023

It looks like the Git LFS files were not pulled, which is why the .a file is missing. Try pulling them again:
git lfs install
git lfs pull

@Tlntin
Owner

Tlntin commented Oct 25, 2023

Troubleshooting steps

  1. Enter the TensorRT-LLM directory of this project:
cd TensorRT-LLM
  2. Make sure the following four .a files exist; you can check with the ls command:
ls -lh  cpp/tensorrt_llm/batch_manager/aarch64-linux-gnu/libtensorrt_llm_batch_manager_static.a
ls -lh  cpp/tensorrt_llm/batch_manager/aarch64-linux-gnu/libtensorrt_llm_batch_manager_static.pre_cxx11.a
ls -lh  cpp/tensorrt_llm/batch_manager/x86_64-linux-gnu/libtensorrt_llm_batch_manager_static.a
ls -lh  cpp/tensorrt_llm/batch_manager/x86_64-linux-gnu/libtensorrt_llm_batch_manager_static.pre_cxx11.a
  3. If any of them is missing, or smaller than 1 MB, pull the files with git-lfs as described above, then verify again:
git lfs install
git lfs pull
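When an LFS object has not been pulled, the file on disk is a small text pointer rather than the real binary archive, which is exactly why the linker reports "file format not recognized; treating as linker script". As a rough sketch (the helper name `is_lfs_pointer` is mine, not part of this project or of git-lfs), you can tell the two apart by looking at the first bytes of the file:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a file that should be a binary .a
# archive is actually an un-pulled Git LFS pointer. Pointer files are
# small text files beginning with "version https://git-lfs.github.com/spec/v1".
is_lfs_pointer() {
  local f="$1"
  [ -f "$f" ] || { echo "missing: $f"; return 2; }
  if head -c 7 "$f" | grep -q '^version'; then
    echo "LFS pointer (not pulled): $f"
    return 1
  fi
  echo "looks like a real file: $f"
  return 0
}
```

A real static archive starts with the ar magic `!<arch>`, while an un-pulled pointer starts with the `version https://git-lfs.github.com/spec/v1` header, so inspecting the first bytes is enough to distinguish them before kicking off another long docker build.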

@zhanghui-china
Author

OK, I'll give it a try.

@zhanghui-china
Author

The downloaded files were indeed incomplete. After fixing that, the build succeeded. Thanks.

@Boom199801

The downloaded files were indeed incomplete. After fixing that, the build succeeded. Thanks.

微信图片_20231120173318 Hello, I ran into the same situation, and the rebuild eventually failed with this error. How did you solve it?

@KKenny0

KKenny0 commented Dec 26, 2023

The downloaded files were indeed incomplete. After fixing that, the build succeeded. Thanks.

微信图片_20231120173318 Hello, I ran into the same situation, and the rebuild eventually failed with this error. How did you solve it?

Hi, I ran into this problem too. Did you solve it?
