compile tensornet_example.cu #2

balewski · 2022-02-22T23:20:19Z

Hi,
I'm trying to follow the instructions on how to compile tensornet_example.cu:
https://github.com/NVIDIA/cuQuantum/tree/main/samples/cutensornet
I think it is inconsistent because it says:
export CUTENSORNET_ROOT=<path_to_custatevec_root>

but currently, custatevec is not part of CUTENSORNET anymore. It is a part of CUQUANTUM.
I'm referring to this pair:

https://developer.download.nvidia.com/compute/cuquantum/redist/cuquantum/linux-x86_64/cuquantum-linux-x86_64-0.1.0.30-archive.tar.xz

https://developer.download.nvidia.com/compute/cutensor/redist/libcutensor/linux-x86_64/libcutensor-linux-x86_64-1.4.0.6-archive.tar.xz
Can you please clarify how to configure the Makefile to work with these 2 libs and compile tensornet_example.cu?
Thanks
Jan Balewski, NERSC

leofang · 2022-02-23T02:51:55Z

Hi @balewski Thanks for asking. I think you caught another documentation issue 🙂 This doc is maybe less confusing?
https://docs.nvidia.com/cuda/cuquantum/custatevec/getting_started.html#installation-and-compilation

leofang · 2022-02-23T02:52:21Z

Oops sorry, wrong link: https://docs.nvidia.com/cuda/cuquantum/cutensornet/getting_started.html#installation-and-compilation

balewski · 2022-02-23T04:36:41Z

Hi Leo,
I started with that other instruction. It has 2 issues:
a) There is no ${CUQUANTUM_ROOT}/lib64, only ${CUQUANTUM_ROOT}/lib. I have fixed that.
b) When I execute the binary I get this error:

$ ./tensornet_example
===== device info ======
GPU-name:A100-PCIE-40GB
GPU-clock:1410000
GPU-memoryClock:1215000
GPU-nSM:108
GPU-major:8
GPU-minor:0
========================
Include headers and define data types
Define network, modes, and extents
Total memory: 0.28 GiB
Allocate memory for data and workspace, and initialize data.
Error: CUTENSORNET_STATUS_CUTENSOR_VERSION_MISMATCH in line 199

Would it be possible for you to try these instructions and confirm that they work? Also, please tell me exactly what the names of both tar.xz files were and which CUTENSOR_ROOT/lib version you used.
At this moment I'm testing the code, and any version is fine as long as it works.
Thanks
Jan

leofang · 2022-02-23T04:47:24Z

Right, we're aware of the lib64 vs lib issue. The clarification is currently pending in our pipeline; sorry for the confusion. A simple rename or symlink should fix it quickly.
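
One way to apply the symlink workaround, as a minimal sketch: the helper name link_lib64 is made up for illustration, and it assumes CUQUANTUM_ROOT points at an extracted archive that contains lib but not lib64.

```shell
# Hypothetical helper (not part of cuQuantum): make ROOT/lib64 point at ROOT/lib
# so that instructions written for lib64 also work with an archive that ships lib.
link_lib64() {
    if [ -d "$1/lib" ] && [ ! -e "$1/lib64" ]; then
        # A relative target keeps the link valid if the archive directory is moved.
        ln -s lib "$1/lib64"
    fi
}

# Usage (assumes CUQUANTUM_ROOT is already exported):
#   link_lib64 "$CUQUANTUM_ROOT"
```

The function does nothing when lib64 already exists, so it is safe to call more than once.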

As for CUTENSORNET_STATUS_CUTENSOR_VERSION_MISMATCH: cuTensorNet 0.1.0 (the version that you downloaded) requires cuTENSOR 1.4.0. Could you check which cuTENSOR version you have?

balewski · 2022-02-23T04:57:12Z

Hi Leo,
I think I got the right one: 1.4.0.6. Or is that not 1.4.0?
Your page https://developer.nvidia.com/cutensor/downloads gave me this tarball:
wget https://developer.download.nvidia.com/compute/cutensor/redist/libcutensor/linux-x86_64/libcutensor-linux-x86_64-1.4.0.6-archive.tar.xz
And for completeness, I got this cuQuantum:
wget https://developer.download.nvidia.com/compute/cuquantum/redist/cuquantum/linux-x86_64/cuquantum-linux-x86_64-0.1.0.30-archive.tar.xz
I'm not sure how to resolve this mismatch. Please advise.
Jan

leofang · 2022-02-23T05:02:21Z

Hi Jan, inspecting your log I found something odd:

Include headers and define data types
Define network, modes, and extents
Total memory: 0.28 GiB
Allocate memory for data and workspace, and initialize data.
Error: CUTENSORNET_STATUS_CUTENSOR_VERSION_MISMATCH in line 199

Line 199 does not contain any code. Could you check?

Also, what's your CUDA Toolkit version (which can be checked by nvcc --version) and the driver version (by nvidia-smi)?
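
The two version numbers asked about here come from different places: nvcc --version reports the installed CUDA Toolkit release, while the "CUDA Version" field in the nvidia-smi header reports the newest CUDA version that the installed driver supports. A small sketch for pulling both out of the command output (the function names are illustrative, and the sed patterns assume the output layouts that appear elsewhere in this thread):

```shell
# Illustrative helpers; both read command output on stdin.

# Extract the toolkit release (for example 11.4) from `nvcc --version` output.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Extract the driver-supported CUDA version (for example 11.3) from the
# header of `nvidia-smi` output.
smi_cuda_version() {
    sed -n 's/.*CUDA Version: \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Usage on a machine with the CUDA Toolkit and driver installed:
#   nvcc --version | nvcc_release
#   nvidia-smi | smi_cuda_version
```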

balewski · 2022-02-23T16:44:40Z

Hi Leo,
we are home. I was using CUDA Toolkit 11.3 instead of the required 11.4. The crash was in this line:
HANDLE_ERROR(cutensornetCreate(&handle));
Now all works for me. Thank you for your help and patience. You can close this ticket.
Thanks
Jan

P.S.: For the record, I list below all ingredients for the successful execution of tensornet_example.cu

Required CUDA Version: 11.4 (nvidia-smi)

Binaries:
wget https://developer.download.nvidia.com/compute/cutensor/redist/libcutensor/linux-x86_64/libcutensor-linux-x86_64-1.4.0.6-archive.tar.xz

Test code:
https://github.com/NVIDIA/cuQuantum/blob/main/samples/cutensornet/tensornet_example.cu

Local setup:
export CUQUANTUM_ROOT=/xxx/cuquantum-linux-x86_64-0.1.0.30-archive
export CUTENSOR_ROOT=/xxx/libcutensor-linux-x86_64-1.4.0.6-archive
export LD_LIBRARY_PATH=${CUQUANTUM_ROOT}/lib:${CUTENSOR_ROOT}/lib/11:${LD_LIBRARY_PATH}

Compilation:
nvcc tensornet_example.cu -I${CUQUANTUM_ROOT}/include -I${CUTENSOR_ROOT}/include -L${CUQUANTUM_ROOT}/lib -L${CUTENSOR_ROOT}/lib/11 -lcutensor -lcutensornet -o tensornet_example

Execution:
./tensornet_example
cuTensorNet-vers:1
===== device info ======
GPU-name:A100-PCIE-40GB
GPU-clock:1410000
GPU-memoryClock:1215000
GPU-nSM:108
GPU-major:8
GPU-minor:0
========================
Include headers and define data types
Define network, modes, and extents
Total memory: 0.28 GiB
Allocate memory for data and workspace, and initialize data.
Initialize the cuTensorNet library and create a network descriptor.
Find an optimized contraction path with cuTensorNet optimizer.
Create a contraction plan for cuTENSOR and optionally auto-tune it.
Contract the network, each slice uses the same contraction plan.
numSlices: 1
1.96 ms / slice
7405.34 GFLOPS/s
Free resource and exit.

leofang · 2022-02-23T17:39:29Z

Nice, thanks for reporting @balewski. So all you did to make it work was to use CUDA Toolkit 11.4 instead of 11.3 to compile? And you mentioned nvidia-smi shows your driver version is 11.4?

If that's the case, this is not what we would expect. I will check internally. Let me keep this ticket open for the time being.

balewski · 2022-02-23T18:19:51Z

Hi Leo,
yes - the too-old CUDA version was the root cause of my issue. I'm working at NERSC, and the default deployment at our facility was CUDA Version: 11.3. So I switched to a Docker image derived from your Docker image nvcr.io/nvidia/pytorch:21.08, which contains CUDA Version: 11.4. Then all went smoothly.
Thanks again for helping
Jan

balewski · 2022-02-23T18:25:49Z

To clarify, I'm not in control of the software stack at my facility. I noticed the CUDA version provided to me was too old and decided to switch to this Docker image, which had a newer CUDA. But most likely there were other updates in this Docker image too. So I can only say this Docker image works with my instructions. It is now a fuzzier answer.
Jan

leofang · 2022-02-23T18:38:25Z

Haha sure it helps, thanks Jan. I was also a NERSC user so I understand the constraints. I'll get back once we understand the situation better.

balewski · 2022-02-23T18:46:23Z

To be more precise, this is how the local Docker image (deployed under Shifter at NERSC) was constructed:
NVC_TAG=21.08
nvc_tag=$NVC_TAG-py3
FROM nvcr.io/nvidia/pytorch:$nvc_tag
The full Dockerfile is here: https://github.com/NERSC/nersc-ml-images/blob/master/pytorch/Dockerfile
Jan

leofang · 2022-02-24T18:41:51Z

Hi @balewski, just to confirm: Have you checked the output of ldd ./tensornet_example with the failing executable? If not, could you check and share the output? Thanks.
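
The point of the ldd check is to see which libcutensor the dynamic loader actually resolves, since a copy from another installation on the library path can shadow the one the binary was meant to use. A minimal sketch that filters ldd output for one library (resolved_lib is a made-up helper name; it assumes the usual "name => path (address)" line layout of ldd):

```shell
# Hypothetical helper: read `ldd` output on stdin and print the path that the
# given library name resolves to. The pattern requires ".so" right after the
# name, so "libcutensor" does not also match "libcutensornet".
resolved_lib() {
    awk -v lib="$1" '$1 ~ ("^" lib "\\.so") && $2 == "=>" { print $3 }'
}

# Usage:
#   ldd ./tensornet_example | resolved_lib libcutensor
```

If the printed path is not under the intended CUTENSOR_ROOT, the executable is picking up a different cuTENSOR build at run time.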

balewski · 2022-02-24T19:43:12Z

Hi Leo,
sorry - I can't reproduce my previous mismatch error:
Error: CUTENSORNET_STATUS_CUTENSOR_VERSION_MISMATCH in line 199
even though I went back to the tensornet_example.cu code I got from a user. (I had tried 10 different setups that day.)
I started over today, and without the Shifter image and with CUDA 11.3 I'm getting only the seg-fault error:

========================
Include headers and define data types
Define network, modes, and extents
Total memory: 0.28 GiB
Allocate memory for data and workspace, and initialize data.
Initialize the cuTensorNet library and create a network descriptor.
Find an optimized contraction path with cuTensorNet optimizer.
Segmentation fault

Below is more information for this crashing config, if it is of any use to you.
Sorry
Jan

= = = = = = = =
***@***.***:login15:/pscratch/sd/b/balewski/tmp_cuQuant/cuQuantum/samples/cutensornet> module list

Currently Loaded Modules:
 1) craype-x86-rome
 2) libfabric/1.11.0.4.75
 3) craype-network-ofi
 4) perftools-base/21.12.0
 5) xpmem/2.2.40-2.1_3.9__g3cf3325.shasta
 6) craype/2.7.13
 7) cray-dsmml/0.2.2
 8) cray-libsci/21.08.1.2
 9) PrgEnv-nvidia/8.2.0
10) xalt/2.10.2
11) darshan/3.3.1 (io)
12) gcc/9.3.0 (c)
13) nccl/2.11.4
14) pytorch/1.10.0
15) cray-mpich/8.1.12 (mpi)

Where:
 mpi: MPI Providers
 io: Input/output software
 c: Compiler

***@***.***:login15:/pscratch/sd/b/balewski/tmp_cuQuant/cuQuantum/samples/cutensornet> nvidia-smi
Thu Feb 24 11:34:43 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.162      Driver Version: 450.162      CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A100-PCIE-40GB      On   | 00000000:C3:00.0 Off |                    0 |
| N/A   36C    P0    35W / 250W |  35599MiB / 40537MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

***@***.***:login15:/pscratch/sd/b/balewski/tmp_cuQuant/cuQuantum/samples/cutensornet> nvcc tensornet_example.cu -I${CUQUANTUM_ROOT}/include -I${CUTENSOR_ROOT}/include -L${CUQUANTUM_ROOT}/lib -L${CUTENSOR_ROOT}/lib/11 -lcutensor -lcutensornet -o tensornet_example

***@***.***:login15:/pscratch/sd/b/balewski/tmp_cuQuant/cuQuantum/samples/cutensornet> ./tensornet_example
cuTensorNet-vers:1
===== device info ======
GPU-name:A100-PCIE-40GB
GPU-clock:1410000
GPU-memoryClock:1215000
GPU-nSM:108
GPU-major:8
GPU-minor:0
========================
Include headers and define data types
Define network, modes, and extents
Total memory: 0.28 GiB
Allocate memory for data and workspace, and initialize data.
Initialize the cuTensorNet library and create a network descriptor.
Find an optimized contraction path with cuTensorNet optimizer.
Segmentation fault

***@***.***:login15:/pscratch/sd/b/balewski/tmp_cuQuant/cuQuantum/samples/cutensornet> ldd ./tensornet_example
linux-vdso.so.1 (0x00007ffe36afb000)
libcutensor.so.1 => /pscratch/sd/b/balewski/tmp_cuQuant/libcutensor-linux-x86_64-1.4.0.6-archive/lib/11/libcutensor.so.1 (0x00007f26ca9b6000)
libcutensornet.so.0 => /pscratch/sd/b/balewski/tmp_cuQuant/cuquantum-linux-x86_64-0.1.0.30-archive/lib/libcutensornet.so.0 (0x00007f26ca553000)
librt.so.1 => /lib64/librt.so.1 (0x00007f26ca34b000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f26ca12c000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f26c9f28000)
libstdc++.so.6 => /opt/cray/pe/gcc/9.3.0/snos/lib64/libstdc++.so.6 (0x00007f26c9b49000)
libm.so.6 => /lib64/libm.so.6 (0x00007f26c9811000)
libgcc_s.so.1 => /opt/cray/pe/gcc/9.3.0/snos/lib64/libgcc_s.so.1 (0x00007f26c95f9000)
libc.so.6 => /lib64/libc.so.6 (0x00007f26c923e000)
/lib64/ld-linux-x86-64.so.2 (0x00007f26d2214000)
libcublasLt.so.11 => /global/common/software/nersc/cos1.3/cuda/11.3.0/lib64/libcublasLt.so.11 (0x00007f26bd660000)
***@***.***:login15:/pscratch/sd/b/balewski/tmp_cuQuant/cuQuantum/samples/cutensornet>

balewski · 2022-02-24T21:01:22Z

Hi Leo,
success! We have an internal ticket system at NERSC, and it has the ldd output you asked for, from when the mismatch error occurred. But I can't reproduce it any more.
Jan

= = = = = =
***@***.***:login38:/pscratch/sd/b/balewski/tmp_cuQuant/cuquantum-linux-x86_64-0.1.0.30-archive/question_sjeffrey> ./compile.bash
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Wed_Jul_14_19:41:19_PDT_2021
Cuda compilation tools, release 11.4, V11.4.100
Build cuda_11.4.r11.4/compiler.30188945_0
linux-vdso.so.1 (0x00007ffda539b000)
libcutensornet.so.0 => /pscratch/sd/b/balewski/tmp_cuQuant/cuquantum-linux-x86_64-0.1.0.30-archive/lib/libcutensornet.so.0 (0x00007f2188f59000)
librt.so.1 => /lib64/librt.so.1 (0x00007f2188d51000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f2188b32000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f218892e000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f2188553000)
libm.so.6 => /lib64/libm.so.6 (0x00007f218821b000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f2188002000)
libc.so.6 => /lib64/libc.so.6 (0x00007f2187c47000)
/lib64/ld-linux-x86-64.so.2 (0x00007f21893bc000)
libcublasLt.so.11 => /opt/nvidia/hpc_sdk/Linux_x86_64/21.9/math_libs/11.4/lib64/libcublasLt.so.11 (0x00007f2175b38000)
libcutensor.so.1 => /opt/nvidia/hpc_sdk/Linux_x86_64/21.9/math_libs/11.4/lib64/libcutensor.so.1 (0x00007f216f087000)
cuTensorNet-vers:1
===== device info ======
GPU-name:A100-PCIE-40GB
GPU-clock:1410000
GPU-memoryClock:1215000
GPU-nSM:108
GPU-major:8
GPU-minor:0
========================
Include headers and define data types
Define network, modes, and extents
Total memory: 0.28 GiB
Allocate memory for data and workspace, and initialize data.
Error: CUTENSORNET_STATUS_CUTENSOR_VERSION_MISMATCH in line 199

mtjrider · 2022-02-26T03:47:21Z

Hi @balewski

So, you can't reproduce the issue? Or is the problem resolved?
Some additional notes below.

As long as you've loaded the NVIDIA HPC SDK 21.9 and correctly overridden cuTENSOR, the example should compile without issue (NVHPC 21.9 comes with cuTENSOR 1.3.1; see the NVHPC release notes).
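
Overriding the SDK's bundled cuTENSOR at run time largely comes down to search order: the dynamic loader walks LD_LIBRARY_PATH from left to right, so the directory holding the wanted libcutensor has to come before the SDK's. A minimal sketch, assuming the binary has no embedded RPATH that takes precedence (prepend_ld_path is an illustrative name; the CUTENSOR_ROOT/lib/11 layout is the one used earlier in this thread):

```shell
# Illustrative helper: put DIR at the front of LD_LIBRARY_PATH so libraries in
# DIR are found before same-named libraries in directories listed later.
prepend_ld_path() {
    LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH
}

# Usage (assumes CUTENSOR_ROOT is already exported):
#   prepend_ld_path "${CUTENSOR_ROOT}/lib/11"
```

Running ldd on the executable afterwards is a quick way to confirm which libcutensor.so.1 is picked up.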

If you need a different toolchain, you may have to add -gencode options to nvcc in the Makefile.
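
For reference, nvcc's -gencode option takes an arch/code pair per target. The sample log in this thread reports GPU-major:8 and GPU-minor:0 for the A100, which corresponds to compute_80/sm_80. A small sketch that builds the flag from those two numbers (gencode_flag is a made-up helper; where the flag belongs in the Makefile is not shown because the Makefile's contents are not part of this thread):

```shell
# Hypothetical helper: build an nvcc -gencode flag from the compute capability
# major and minor numbers, as printed by the sample's "device info" section.
gencode_flag() {
    printf '%s\n' "-gencode arch=compute_$1$2,code=sm_$1$2"
}

# Usage for compute capability 8.0:
#   nvcc $(gencode_flag 8 0) <other flags> tensornet_example.cu -o tensornet_example
```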

balewski · 2022-02-26T17:25:44Z

The problem is resolved. Thank you Jan

mtjrider · 2022-02-26T19:18:34Z

Thanks. Closing this issue. Please feel free to reopen if you need to.

mtjrider closed this as completed Feb 26, 2022
