missing cicc? #9

Open
1 task done
hmaarrfk opened this issue Dec 30, 2023 · 18 comments
Labels
bug Something isn't working

Comments

@hmaarrfk

Solution to issue cannot be found in the documentation.

  • I checked the documentation.

Issue

cicc seems to be in ${PREFIX}/nvvm/bin instead of ${PREFIX}/bin

So does libdevice.10.bc.
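
To confirm where these files actually live in a given environment, a small helper along these lines may be useful. It is only a sketch: `locate_cuda_file` is a hypothetical name, not part of any CUDA or conda tooling, and it simply wraps `find`.

```shell
# locate_cuda_file PREFIX NAME   (hypothetical helper, not a CUDA/conda tool)
# Print every regular file or symlink matching NAME below PREFIX,
# one path per line, sorted.
locate_cuda_file() {
    find "$1" \( -type f -o -type l \) -name "$2" 2>/dev/null | sort
}
```

For example, `locate_cuda_file "${CONDA_PREFIX}" cicc` and `locate_cuda_file "${CONDA_PREFIX}" 'libdevice*.bc'` should list the paths discussed in this issue, assuming the environment is activated.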

xref: conda-forge/tensorflow-feedstock#296

Installed packages

linux-64/cuda-nvcc-tools-12.0.76-h59595ed_1.conda
linux-64/cuda-nvcc-tools-12.1.105-hd3aeb46_0.conda
linux-64/cuda-nvcc-tools-12.0.76-h59595ed_0.conda

Environment info

.
@hmaarrfk added the bug label Dec 30, 2023
@jakirkham
Member

Thanks for raising Mark! 🙏

nvvm is actually the expected location for these files

In CUDA 12, the nvvm contents match the CUDA Toolkit layout. In CUDA 11, the cudatoolkit package does not match this layout ( conda-forge/cudatoolkit-feedstock#96 )

Hence libdevice and other bits wind up in the wrong place in the cudatoolkit package. Think we discussed this before in issue ( conda-forge/tensorflow-feedstock#296 ) where cudatoolkit package layout issues had cropped up

cicc itself is typically invoked by nvcc (not usually by external programs)

Have seen one other case where cicc was not found, but after further investigation it was due to some build configuration issues ( scopetools/cudadecon#29 )

So am wondering if there is a similar issue here. Do you have more context on the issue that came up?

@hmaarrfk
Author

hmaarrfk commented Jan 4, 2024

It came up in the TensorFlow 2.15 CUDA builds.

@jakirkham
Member

Ok is there a log or something we could look at?

@hmaarrfk
Author

hmaarrfk commented Jan 4, 2024

Not really, since we disable building TF on the CIs. You can see how I modified the build script, though.

But I'll upload something tomorrow.

conda-forge/tensorflow-feedstock#366

edit: this comment in particular shows a small portion of the log
conda-forge/tensorflow-feedstock#366 (comment)

@jakirkham
Member

jakirkham commented Jan 4, 2024

Completely understandable

An uploaded log would work. Happy to look at snippets too

We might consider setting up TensorFlow on the Quansight CI as well to make that a bit easier to manage

@jakirkham
Member

Found this (admittedly old) thread, which mentions cicc may need to be in the search path

@jakirkham
Member

This logic should add NVVM's bin directory to the $PATH

+PATH += $(TOP)/bin:$(TOP)/$(_NVVM_BRANCH_)/bin:$(TOP)/../../bin:$(TOP)/../../$(_NVVM_BRANCH_)/bin:
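
To see which directories that line contributes, the expansion can be reproduced by hand. The sketch below is illustrative only: `expand_nvcc_path` is a hypothetical helper, and the default branch name `nvvm` is an assumption taken from the `_NVVM_BRANCH_=nvvm` value that appears in nvcc's verbose output later in this thread.

```shell
# expand_nvcc_path TOP [NVVM_BRANCH]   (hypothetical helper)
# Print the four directories the quoted PATH line expands to,
# one per line, in the order they appear in that line.
expand_nvcc_path() {
    top="$1"
    branch="${2:-nvvm}"   # assumed default, matching _NVVM_BRANCH_=nvvm
    printf '%s\n' \
        "$top/bin" \
        "$top/$branch/bin" \
        "$top/../../bin" \
        "$top/../../$branch/bin"
}
```

With TOP set to `${CONDA_PREFIX}/targets/x86_64-linux`, the last entry points at `${CONDA_PREFIX}/nvvm/bin`, which is where this issue reports cicc to be installed.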

@leofang
Member

leofang commented May 7, 2024

> This logic should add NVVM's bin directory to the $PATH

Is there any action we need in this feedstock?

@hmaarrfk
Author

hmaarrfk commented May 7, 2024

I'm not sure. Happy to revisit in the future.

I haven't had time to go through the tensorflow builds in a long time.

@hmaarrfk closed this as completed May 7, 2024
@LourensVeen

I'm seeing the same issue as in scopetools/cudadecon#29, in a similar situation with old CUDA code using CMake. And I can reproduce it without calling cicc directly:

conda create -n test
conda activate test
conda install cuda-toolkit

touch source.cu
${CONDA_PREFIX}/bin/nvcc -c source.cu        # works

${CONDA_PREFIX}/targets/x86_64-linux/bin/nvcc -c source.cu
<command-line>: fatal error: cuda_runtime.h: No such file or directory

${CONDA_PREFIX}/targets/x86_64-linux/bin/nvcc -c -I${CONDA_PREFIX}/targets/x86_64-linux/include source.cu
sh: 1: cicc: not found

For the failing call, strace tells me:

openat(AT_FDCWD, "${CONDA_PREFIX}/targets/x86_64-linux/bin/nvcc.profile", O_RDONLY) = -1 ENOENT (No such file or directory)

while for the successful one it says:

openat(AT_FDCWD, "${CONDA_PREFIX}/bin/nvcc.profile", O_RDONLY) = 3

This latter file contains the line

CICC_PATH        = $(TOP)/nvvm/bin

which explains why cicc isn't found, I think.

So if nvcc tries to find its configuration in a location relative to itself, perhaps the symlink for nvcc should be accompanied by one for nvcc.profile?
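
A quick way to test that hypothesis for any nvcc path is to check whether an nvcc.profile sits in the same directory, without resolving symlinks. This is a minimal sketch: `profile_beside` is a hypothetical helper, and it assumes nvcc looks for nvcc.profile next to the path it was invoked as, which is what the strace output above suggests.

```shell
# profile_beside NVCC_PATH   (hypothetical helper)
# Succeed if an nvcc.profile exists in the same directory as NVCC_PATH.
# The path is taken as written; symlinks are deliberately not resolved.
profile_beside() {
    [ -e "$(dirname "$1")/nvcc.profile" ]
}
```

For example, `profile_beside "${CONDA_PREFIX}/targets/x86_64-linux/bin/nvcc" || echo "no nvcc.profile next to this nvcc"` would be expected to print the message in the failing setup described above.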

@LourensVeen

LourensVeen commented Jul 3, 2024

And a little more digging: CMake runs nvcc -v __cmake_determine_cuda, which prints the configuration as created from nvcc.profile and then errors out. This has the line

#$ TOP=${CONDA_PREFIX}/bin/../targets/x86_64-linux

which CMake then uses to locate nvcc at ${CONDA_PREFIX}/targets/x86_64-linux/bin/nvcc, from where it can't find its configuration.

So it's the nvcc.profile itself that points CMake to a version of nvcc that cannot read nvcc.profile 😄.
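
The TOP value can be pulled out of the verbose output with a one-line filter. This is a rough sketch, not what CMake itself does: `extract_top` is a hypothetical helper that only matches lines of the form `#$ TOP=...` as quoted above.

```shell
# extract_top   (hypothetical helper)
# Read `nvcc -v` style output on stdin and print the value of the
# first "#$ TOP=" line; print nothing if there is no such line.
extract_top() {
    sed -n 's/^#\$ TOP=//p' | head -n 1
}
```

A possible invocation is `"${CONDA_PREFIX}/bin/nvcc" -v __cmake_determine_cuda 2>&1 | extract_top`; the `2>&1` is an assumption in case the verbose lines are written to stderr.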

Adding a symlink at ${CONDA_PREFIX}/targets/x86_64-linux/bin/nvcc.profile to ${CONDA_PREFIX}/bin/nvcc.profile fixes the problem.
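
As a local workaround, that symlink could be created with something like the following. It is a sketch under assumptions: `link_nvcc_profile` is a hypothetical helper, the target directory defaults to the `targets/x86_64-linux` path named in this thread, and it patches an installed environment rather than fixing the package.

```shell
# link_nvcc_profile PREFIX [TARGET]   (hypothetical helper)
# Create PREFIX/TARGET/bin/nvcc.profile as a symlink to
# PREFIX/bin/nvcc.profile. Does nothing if the link already exists.
link_nvcc_profile() {
    prefix="$1"
    target="${2:-targets/x86_64-linux}"   # assumed default from this thread
    src="$prefix/bin/nvcc.profile"
    dst="$prefix/$target/bin/nvcc.profile"
    # Refuse to create a dangling link when the real profile is absent.
    [ -f "$src" ] || { echo "missing $src" >&2; return 1; }
    [ -e "$dst" ] || [ -L "$dst" ] || ln -s "$src" "$dst"
}
```

Calling `link_nvcc_profile "${CONDA_PREFIX}"` in an activated environment would apply it. The link is absolute, so it would need to be recreated if the environment were moved.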

@leofang reopened this Jul 3, 2024
@leofang
Member

leofang commented Jul 3, 2024

@robertmaynard @adibbley do you have any insight into what Lourens brought up above?

@LourensVeen

I've now also added a symlink for the bin/crt directory, to avoid errors linking code that uses the driver API with the stubs. I have more issues still, but the code I'm working on is also messy so they may be unrelated.

@robertmaynard

What CMake version are you using? This sounds like an older version of CMake that didn't properly handle symlinks inside TOP; that has since been fixed.

@LourensVeen

This is a new CMake, but with an old configuration that uses the now-obsolete FindCUDA module.

But my first example reproduces the problem without CMake being involved in any way. Are you saying that users are expected to first resolve the symlink at ${CONDA_PREFIX}/targets/x86_64-linux/bin/nvcc, rather than trying to run it directly as if it were the linked-to executable?

@robertmaynard

> But my first example reproduces the problem without CMake being involved in any way. Are you saying that users are expected to first resolve the symlink at ${CONDA_PREFIX}/targets/x86_64-linux/bin/nvcc, rather than trying to run it directly as if it were the linked-to executable?

After looking at this more, the issue is entirely due to a bad setup by conda. You are correct that an nvcc.profile needs to be beside the nvcc symlink in ${CONDA_PREFIX}/targets/x86_64-linux/bin/.

In its current form, the nvcc at ${CONDA_PREFIX}/targets/x86_64-linux/bin/ is broken, and the verbose output from the compiler looks like:

#$ NVCC_PREPEND_FLAGS=" -ccbin=/home/rmaynard/miniconda3/envs/cuda_stub_env/bin/x86_64-conda-linux-gnu-c++"
#$ _NVVM_BRANCH_=nvvm
#$ _SPACE_=
#$ _CUDART_=cudart
#$ _HERE_=/home/rmaynard/miniconda3/envs/cuda_stub_env/targets/x86_64-linux/bin
#$ _THERE_=/home/rmaynard/miniconda3/envs/cuda_stub_env/targets/x86_64-linux/bin
#$ _TARGET_SIZE_=
#$ _TARGET_DIR_=
#$ _TARGET_SIZE_=64
#$ "/home/rmaynard/miniconda3/envs/cuda_stub_env/bin"/x86_64-conda-linux-gnu-c++ ....

When I symlink the nvcc.profile as well into targets/x86_64-linux/bin I see proper paths for the crt headers being included and a simple test case properly finds them.

@leofang @adibbley We need to create a nvcc.profile symlink like we do for targets/x86_64-linux/bin/nvcc

@robertmaynard

robertmaynard commented Jul 8, 2024

@LourensVeen The only reason that ${CONDA_PREFIX}/targets/x86_64-linux/bin/nvcc exists is to support legacy CMake versions in which FindCUDA or FindCUDAToolkit would validate the CUDA Toolkit layout by searching for an nvcc executable under bin. We therefore have that symlink so that targets/x86_64-linux/ matches the checked layout.

But I also believe that if we are going to offer a symlink to the compiler, it should work, so we don't give footguns to users.

Edit: So at some point expect targets/x86_64-linux/bin/nvcc to go away and the only nvcc compiler to be in <prefix>/bin

@LourensVeen

Okay, that makes sense to me. I'll be updating that CMake config.

You need to symlink bin/crt as well to make nvcc work if you want a temporary solution.
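
To check whether a given environment already has all the pieces mentioned in this thread next to the target nvcc, a small check like the one below could help. It is only a sketch: `missing_in_target_bin` is a hypothetical helper, and the list of entries (nvcc, nvcc.profile, crt) is taken from the comments above rather than from any official layout specification.

```shell
# missing_in_target_bin PREFIX [TARGET]   (hypothetical helper)
# Print each of nvcc, nvcc.profile and crt that is absent from
# PREFIX/TARGET/bin; print nothing when all three are present.
missing_in_target_bin() {
    dir="$1/${2:-targets/x86_64-linux}/bin"   # assumed default target dir
    for name in nvcc nvcc.profile crt; do
        [ -e "$dir/$name" ] || echo "$name"
    done
}
```

Running `missing_in_target_bin "${CONDA_PREFIX}"` lists what still needs linking. Assuming the crt directory lives at `${CONDA_PREFIX}/bin/crt`, the counterpart to the nvcc.profile link would be `ln -s "${CONDA_PREFIX}/bin/crt" "${CONDA_PREFIX}/targets/x86_64-linux/bin/crt"`.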
