The TF installation fails with No such file #include <cudnn_frontend.h> #261

@kaixih

With the latest code, it seems we can no longer build in the TF containers. Running pip install . fails with:

FAILED: common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_fp8.cu.o
      /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -Dtransformer_engine_EXPORTS -I/home/workspace/repo_zoo/TransformerEngine/transformer_engine -I/home/workspace/repo_zoo/TransformerEngine/transformer_engine/common/include -I/usr/local/cuda/targets/x86_64-linux/include -I/home/workspace/repo_zoo/TransformerEngine/transformer_engine/../3rdparty/cudnn-frontend/include -I/tmp/tmpy_w9ikze/common/string_headers -isystem=/usr/local/cuda/include --threads 4 --expt-relaxed-constexpr -O3 -O3 -DNDEBUG --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_80,code=[compute_80,sm_80] --generate-code=arch=compute_89,code=[compute_89,sm_89] --generate-code=arch=compute_90,code=[compute_90,sm_90] -Xcompiler=-fPIC -std=c++17 -MD -MT common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_fp8.cu.o -MF common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_fp8.cu.o.d -x cu -c /home/workspace/repo_zoo/TransformerEngine/transformer_engine/common/fused_attn/fused_attn_fp8.cu -o common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_fp8.cu.o
      In file included from /home/workspace/repo_zoo/TransformerEngine/transformer_engine/common/fused_attn/fused_attn_fp8.cu:10:
      /home/workspace/repo_zoo/TransformerEngine/transformer_engine/common/fused_attn/utils.h:14:10: fatal error: cudnn_frontend.h: No such file or directory
         14 | #include <cudnn_frontend.h>
            |          ^~~~~~~~~~~~~~~~~~
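
To narrow down a failure like the one above, it can help to list the include directories the compiler was actually given and check which of them contain the missing header. The following is only a diagnostic sketch, not part of the TransformerEngine build: the helper names include_dirs_from_command and dirs_containing are hypothetical, and the parsing assumes the flags appear as -I<dir>, -I <dir>, -isystem=<dir>, or -isystem <dir>.

```python
import shlex
from pathlib import Path


def include_dirs_from_command(command):
    """Return the directories passed via -I / -isystem in a compiler command.

    Hypothetical helper for diagnosis only; it handles the flag spellings
    -I<dir>, -I <dir>, -isystem=<dir> and -isystem <dir>.
    """
    tokens = shlex.split(command)
    dirs = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        # Flag and directory given as two separate tokens.
        if tok in ("-I", "-isystem") and i + 1 < len(tokens):
            dirs.append(tokens[i + 1])
            i += 2
            continue
        # Flag and directory fused into one token.
        if tok.startswith("-isystem="):
            dirs.append(tok[len("-isystem="):])
        elif tok.startswith("-I") and len(tok) > 2:
            dirs.append(tok[2:])
        i += 1
    return dirs


def dirs_containing(header, dirs):
    """Return the subset of dirs in which the given header file exists."""
    return [d for d in dirs if (Path(d) / header).is_file()]
```

Pasting the failing nvcc command into include_dirs_from_command and passing the result to dirs_containing("cudnn_frontend.h", ...) returns an empty list when none of the searched directories holds the header on that machine.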

It seems setup.py needs a setup_tensorflow_extension function similar to the ones for PyTorch and Paddle, which adds /opt/tensorflow/cudnn-frontend/include/ to the include paths.
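
The suggestion above could be sketched roughly as follows. This is not the project's actual setup.py code: the function name cudnn_frontend_include_dirs and its parameters are hypothetical, and the only paths used are the 3rdparty/cudnn-frontend/include directory visible in the failing command and the container path quoted in this issue. How the resulting directories are handed to the build (CMake arguments, extension include_dirs, etc.) depends on the real setup.py and is left out.

```python
from pathlib import Path

HEADER = "cudnn_frontend.h"
# Path quoted in this issue for the TF containers (assumed to exist only there).
TF_CONTAINER_INCLUDE = "/opt/tensorflow/cudnn-frontend/include"


def cudnn_frontend_include_dirs(repo_root, extra_candidates=(TF_CONTAINER_INCLUDE,)):
    """Return candidate directories that actually contain cudnn_frontend.h.

    Hypothetical helper: checks the in-repo 3rdparty checkout first, then any
    extra candidate directories, and fails early with a clear message instead
    of letting the compiler report the missing header.
    """
    candidates = [Path(repo_root) / "3rdparty" / "cudnn-frontend" / "include"]
    candidates += [Path(p) for p in extra_candidates]
    found = [str(p) for p in candidates if (p / HEADER).is_file()]
    if not found:
        searched = ", ".join(str(p) for p in candidates)
        raise FileNotFoundError(f"{HEADER} not found in any of: {searched}")
    return found
```

Failing in setup.py with the list of searched directories makes the cause visible before the long nvcc command is ever run.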

@trevor-m Can you take a look when you have bandwidth?

Also curious: is this kind of installation breakage caught by our CI/CD?
