compile tensornet_example.cu #2
Hi @balewski Thanks for asking. I think you caught another documentation issue 🙂 This doc is maybe less confusing? |
Hi Leo,
I started with those other instructions. They have 2 issues:
a) there is no ${CUQUANTUM_ROOT}/lib64
but only ${CUQUANTUM_ROOT}/lib
I have fixed that.
b) But when I execute the binary I get the error:
$ ./tensornet_example
===== device info ======
GPU-name:A100-PCIE-40GB
GPU-clock:1410000
GPU-memoryClock:1215000
GPU-nSM:108
GPU-major:8
GPU-minor:0
========================
Include headers and define data types
Define network, modes, and extents
Total memory: 0.28 GiB
Allocate memory for data and workspace, and initialize data.
Error: CUTENSORNET_STATUS_CUTENSOR_VERSION_MISMATCH in line 199
Would it be possible for you to try these instructions and confirm that they work? Also, please tell me the exact names of both tar.xz files and
which CUTENSOR_ROOT/lib version you used. At the moment I'm only testing the code, so any version is fine as long as it works.
Thanks
Jan
… On Feb 22, 2022, at 6:52 PM, Leo Fang ***@***.***> wrote:
Oops sorry, wrong link: https://docs.nvidia.com/cuda/cuquantum/cutensornet/getting_started.html#installation-and-compilation
|
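For readers hitting issue (a) above: a build script can tolerate either directory layout. A minimal sketch, assuming a hypothetical helper named pick_libdir (it is not part of cuQuantum or cuTENSOR) that prefers lib64 and falls back to lib:

```shell
# pick_libdir ROOT: print ROOT/lib64 if it exists, otherwise ROOT/lib.
# Hypothetical helper for illustration; not shipped with cuQuantum.
pick_libdir() {
    root=$1
    if [ -d "$root/lib64" ]; then
        printf '%s\n' "$root/lib64"
    elif [ -d "$root/lib" ]; then
        printf '%s\n' "$root/lib"
    else
        echo "no lib or lib64 directory under $root" >&2
        return 1
    fi
}
```

It could then be used on the compile line as -L"$(pick_libdir "$CUQUANTUM_ROOT")", so the same script works whichever directory the archive provides.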
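Right, we're aware of the [...]. As for CUTENSORNET_STATUS_CUTENSOR_VERSION_MISMATCH, cuTensorNet 0.1.0 (the version that you downloaded) requires cuTENSOR 1.4.0. Could you check which cuTENSOR version you have? |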
Hi Leo,
I think I got the right one: 1.4.0.6. Or is that not 1.4.0? Your page:
https://developer.nvidia.com/cutensor/downloads
gave me this tar-ball :
wget https://developer.download.nvidia.com/compute/cutensor/redist/libcutensor/linux-x86_64/libcutensor-linux-x86_64-1.4.0.6-archive.tar.xz
And for completeness, I got this cuQuantum
wget https://developer.download.nvidia.com/compute/cuquantum/redist/cuquantum/linux-x86_64/cuquantum-linux-x86_64-0.1.0.30-archive.tar.xz
I'm not sure how to resolve this mismatch.
Please advise.
Jan
… On Feb 22, 2022, at 8:47 PM, Leo Fang ***@***.***> wrote:
As for CUTENSORNET_STATUS_CUTENSOR_VERSION_MISMATCH, for cuTensorNet 0.1.0 (the version that you downloaded) it requires cuTENSOR 1.4.0. Could you check which cuTENSOR version that you have?
|
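Since cuTensorNet 0.1.0 is stated above to require cuTENSOR 1.4.0, one quick sanity check is to read the version out of the unpacked archive names. A sketch with two hypothetical helpers, archive_version and major_minor. It assumes the archive naming seen in the download URLs in this thread, and it only tells you what was downloaded, not which library the loader picks up at run time:

```shell
# archive_version PATH: print the version embedded in an archive name such
# as libcutensor-linux-x86_64-1.4.0.6-archive (hypothetical helper).
archive_version() {
    basename "$1" | sed -n 's/.*-\([0-9][0-9.]*\)-archive.*$/\1/p'
}

# major_minor VERSION: keep only the first two components (hypothetical helper).
major_minor() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Example (paths are whatever you exported as CUTENSOR_ROOT):
#   major_minor "$(archive_version "$CUTENSOR_ROOT")"
```

Comparing only the major.minor part mirrors the way the requirement is phrased in the thread (1.4.0 versus the 1.4.0.6 tarball).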
Hi Jan, inspecting your log I found something odd:
line 199 does not contain any code. Could you check? Also, what's your CUDA Toolkit version (which can be checked by |
Hi Leo,
we are home. I was using CUDA Toolkit 11.3 instead of the required 11.4.
The crash was in this line: HANDLE_ERROR(cutensornetCreate(&handle));
Now all works for me. Thank you for your help and patience.
You can close this ticket.
Thanks
Jan
P.S.: For the record, I list below all ingredients for the successful execution of tensornet_example.cu
Required CUDA Version: 11.4 (nvidia-smi)
Binaries:
wget https://developer.download.nvidia.com/compute/cutensor/redist/libcutensor/linux-x86_64/libcutensor-linux-x86_64-1.4.0.6-archive.tar.xz
wget https://developer.download.nvidia.com/compute/cuquantum/redist/cuquantum/linux-x86_64/cuquantum-linux-x86_64-0.1.0.30-archive.tar.xz
test code:
https://github.com/NVIDIA/cuQuantum/blob/main/samples/cutensornet/tensornet_example.cu
Local setup:
export CUQUANTUM_ROOT=/xxx/cuquantum-linux-x86_64-0.1.0.30-archive
export CUTENSOR_ROOT=/xxx/libcutensor-linux-x86_64-1.4.0.6-archive
export LD_LIBRARY_PATH=${CUQUANTUM_ROOT}/lib:${CUTENSOR_ROOT}/lib/11:${LD_LIBRARY_PATH}
Compilation:
nvcc tensornet_example.cu -I${CUQUANTUM_ROOT}/include -I${CUTENSOR_ROOT}/include -L${CUQUANTUM_ROOT}/lib -L${CUTENSOR_ROOT}/lib/11 -lcutensor -lcutensornet -o tensornet_example
Execution: ./tensornet_example
cuTensorNet-vers:1
===== device info ======
GPU-name:A100-PCIE-40GB
GPU-clock:1410000
GPU-memoryClock:1215000
GPU-nSM:108
GPU-major:8
GPU-minor:0
========================
Include headers and define data types
Define network, modes, and extents
Total memory: 0.28 GiB
Allocate memory for data and workspace, and initialize data.
Initialize the cuTensorNet library and create a network descriptor.
Find an optimized contraction path with cuTensorNet optimizer.
Create a contraction plan for cuTENSOR and optionally auto-tune it.
Contract the network, each slice uses the same contraction plan.
numSlices: 1
1.96 ms / slice
7405.34 GFLOPS/s
Free resource and exit.
|
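Before running the binary, it may help to confirm that both library directories from the record above really are on LD_LIBRARY_PATH. A small sketch; path_contains is a hypothetical helper, not an existing tool:

```shell
# path_contains LIST DIR: succeed if DIR is one of the colon-separated
# entries of LIST, for example the value of LD_LIBRARY_PATH.
# Hypothetical helper; it compares strings only and does not touch the disk.
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   for d in "$CUQUANTUM_ROOT/lib" "$CUTENSOR_ROOT/lib/11"; do
#       path_contains "$LD_LIBRARY_PATH" "$d" || echo "not on LD_LIBRARY_PATH: $d" >&2
#   done
```

Wrapping both sides in colons makes the match entry-exact, so a directory such as lib does not accidentally match lib/11.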
Nice, thanks for reporting @balewski. So all you did to make it work was to use CUDA Toolkit 11.4 instead of 11.3 to compile? And you mentioned nvidia-smi shows your driver version is 11.4? If it's the case, this is not what we would expect. I will check internally. Let me keep this ticket open for the time being. |
Hi Leo,
yes - the too-old CUDA version was the root cause of my issue.
I'm working at NERSC, and the default deployment at our facility was CUDA Version: 11.3.
So I switched to a Docker image derived from your Docker image nvcr.io/nvidia/pytorch:21.08, which contains CUDA Version: 11.4.
Then all went smoothly.
Thanks again for helping
Jan
… On Feb 23, 2022, at 9:39 AM, Leo Fang ***@***.***> wrote:
Nice, thanks for reporting @balewski. So all you did to make it work was to use CUDA Toolkit 11.4 instead of 11.3 to compile? And you mentioned nvidia-smi shows your driver version is 11.4?
If it's the case, this is not what we would expect. I will check internally. Let me keep this ticket open for the time being.
|
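Part of the confusion here is that the two tools report different things: nvcc --version prints the release of the CUDA Toolkit used to compile, while the CUDA Version field in the nvidia-smi header is the highest CUDA version the installed driver supports. A sketch that extracts both for a side-by-side comparison; nvcc_release and smi_cuda_version are hypothetical helper names, and the patterns assume the output formats seen in this thread:

```shell
# nvcc_release: read `nvcc --version` output on stdin and print the toolkit
# release, e.g. 11.4 (hypothetical helper).
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# smi_cuda_version: read `nvidia-smi` output on stdin and print the
# "CUDA Version" field of the header, e.g. 11.3 (hypothetical helper).
smi_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Example:
#   echo "toolkit: $(nvcc --version | nvcc_release)"
#   echo "driver supports up to: $(nvidia-smi | smi_cuda_version)"
```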
To clarify, I'm not in control of the software stack at my facility. I noticed that the CUDA version provided to me was too old and decided to switch to this Docker image, which had a newer CUDA.
But most likely there were other updates in this Docker image too.
So I can only say this Docker image works with my instructions.
That makes my answer fuzzier.
Jan
… On Feb 23, 2022, at 9:39 AM, Leo Fang ***@***.***> wrote:
this is not what we would expect.
|
Haha sure it helps, thanks Jan. I was also a NERSC user so I understand the constraints. I'll get back once we understand the situation better. |
To be more precise, this is how the local Docker image (deployed under Shifter at NERSC) was constructed:
NVC_TAG=21.08
nvc_tag=$NVC_TAG-py3
FROM nvcr.io/nvidia/pytorch:$nvc_tag
The full Dockerfile is here:
https://github.com/NERSC/nersc-ml-images/blob/master/pytorch/Dockerfile
Jan
|
Hi @balewski, just to confirm: Have you checked the output of |
Hi Leo,
sorry - I can't reproduce my previous mismatch error:
Error: CUTENSORNET_STATUS_CUTENSOR_VERSION_MISMATCH in line 199
even though I went back to the tensornet_example.cu code I got from a user.
(I had tried 10 different setups that day.)
I started over today, and without the Shifter image and with CUDA 11.3 I now get only the segmentation fault.
========================
Include headers and define data types
Define network, modes, and extents
Total memory: 0.28 GiB
Allocate memory for data and workspace, and initialize data.
Initialize the cuTensorNet library and create a network descriptor.
Find an optimized contraction path with cuTensorNet optimizer.
Segmentation fault
Below is more information for this crashing config - if it is of any use for you
Sorry
Jan
= = = = = = = =
***@***.***:login15:/pscratch/sd/b/balewski/tmp_cuQuant/cuQuantum/samples/cutensornet> module list
Currently Loaded Modules:
1) craype-x86-rome 4) perftools-base/21.12.0 7) cray-dsmml/0.2.2 10) xalt/2.10.2 13) nccl/2.11.4
2) libfabric/1.11.0.4.75 5) xpmem/2.2.40-2.1_3.9__g3cf3325.shasta 8) cray-libsci/21.08.1.2 11) darshan/3.3.1 (io) 14) pytorch/1.10.0
3) craype-network-ofi 6) craype/2.7.13 9) PrgEnv-nvidia/8.2.0 12) gcc/9.3.0 (c) 15) cray-mpich/8.1.12 (mpi)
Where:
mpi: MPI Providers
io: Input/output software
c: Compiler
***@***.***:login15:/pscratch/sd/b/balewski/tmp_cuQuant/cuQuantum/samples/cutensornet> nvidia-smi
Thu Feb 24 11:34:43 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.162 Driver Version: 450.162 CUDA Version: 11.3 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 A100-PCIE-40GB On | 00000000:C3:00.0 Off | 0 |
| N/A 36C P0 35W / 250W | 35599MiB / 40537MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
***@***.***:login15:/pscratch/sd/b/balewski/tmp_cuQuant/cuQuantum/samples/cutensornet> nvcc tensornet_example.cu -I${CUQUANTUM_ROOT}/include -I${CUTENSOR_ROOT}/include -L${CUQUANTUM_ROOT}/lib -L${CUTENSOR_ROOT}/lib/11 -lcutensor -lcutensornet -o tensornet_example
***@***.***:login15:/pscratch/sd/b/balewski/tmp_cuQuant/cuQuantum/samples/cutensornet> ./tensornet_example
cuTensorNet-vers:1
===== device info ======
GPU-name:A100-PCIE-40GB
GPU-clock:1410000
GPU-memoryClock:1215000
GPU-nSM:108
GPU-major:8
GPU-minor:0
========================
Include headers and define data types
Define network, modes, and extents
Total memory: 0.28 GiB
Allocate memory for data and workspace, and initialize data.
Initialize the cuTensorNet library and create a network descriptor.
Find an optimized contraction path with cuTensorNet optimizer.
Segmentation fault
***@***.***:login15:/pscratch/sd/b/balewski/tmp_cuQuant/cuQuantum/samples/cutensornet> ldd ./tensornet_example
linux-vdso.so.1 (0x00007ffe36afb000)
libcutensor.so.1 => /pscratch/sd/b/balewski/tmp_cuQuant/libcutensor-linux-x86_64-1.4.0.6-archive/lib/11/libcutensor.so.1 (0x00007f26ca9b6000)
libcutensornet.so.0 => /pscratch/sd/b/balewski/tmp_cuQuant/cuquantum-linux-x86_64-0.1.0.30-archive/lib/libcutensornet.so.0 (0x00007f26ca553000)
librt.so.1 => /lib64/librt.so.1 (0x00007f26ca34b000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f26ca12c000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f26c9f28000)
libstdc++.so.6 => /opt/cray/pe/gcc/9.3.0/snos/lib64/libstdc++.so.6 (0x00007f26c9b49000)
libm.so.6 => /lib64/libm.so.6 (0x00007f26c9811000)
libgcc_s.so.1 => /opt/cray/pe/gcc/9.3.0/snos/lib64/libgcc_s.so.1 (0x00007f26c95f9000)
libc.so.6 => /lib64/libc.so.6 (0x00007f26c923e000)
/lib64/ld-linux-x86-64.so.2 (0x00007f26d2214000)
libcublasLt.so.11 => /global/common/software/nersc/cos1.3/cuda/11.3.0/lib64/libcublasLt.so.11 (0x00007f26bd660000)
***@***.***:login15:/pscratch/sd/b/balewski/tmp_cuQuant/cuQuantum/samples/cutensornet>
|
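To see at a glance which cuTENSOR the binary picks up, the ldd output above can be filtered. A sketch, assuming the usual "name => path (address)" ldd line format; resolved_lib is a hypothetical helper:

```shell
# resolved_lib NAME: read `ldd` output on stdin and print the path of the
# first library whose soname starts with NAME (hypothetical helper).
# Pass the name with its ".so" part, e.g. libcutensor.so, so that
# libcutensornet.so.0 does not match as well.
resolved_lib() {
    awk -v name="$1" '!found && index($1, name) == 1 && $2 == "=>" { print $3; found = 1 }'
}

# Example:
#   ldd ./tensornet_example | resolved_lib libcutensor.so
```

For the log above this would print the path inside the libcutensor-linux-x86_64-1.4.0.6-archive directory.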
Hi Leo,
Success! We have an internal ticket system at NERSC, and it contains the ldd output you asked for, captured when the mismatch error occurred.
But I can't reproduce it any more.
Jan
= = = = = =
***@***.***:login38:/pscratch/sd/b/balewski/tmp_cuQuant/cuquantum-linux-x86_64-0.1.0.30-archive/question_sjeffrey> ./compile.bash
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Wed_Jul_14_19:41:19_PDT_2021
Cuda compilation tools, release 11.4, V11.4.100
Build cuda_11.4.r11.4/compiler.30188945_0
linux-vdso.so.1 (0x00007ffda539b000)
libcutensornet.so.0 => /pscratch/sd/b/balewski/tmp_cuQuant/cuquantum-linux-x86_64-0.1.0.30-archive/lib/libcutensornet.so.0 (0x00007f2188f59000)
librt.so.1 => /lib64/librt.so.1 (0x00007f2188d51000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f2188b32000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f218892e000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f2188553000)
libm.so.6 => /lib64/libm.so.6 (0x00007f218821b000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f2188002000)
libc.so.6 => /lib64/libc.so.6 (0x00007f2187c47000)
/lib64/ld-linux-x86-64.so.2 (0x00007f21893bc000)
libcublasLt.so.11 => /opt/nvidia/hpc_sdk/Linux_x86_64/21.9/math_libs/11.4/lib64/libcublasLt.so.11 (0x00007f2175b38000)
libcutensor.so.1 => /opt/nvidia/hpc_sdk/Linux_x86_64/21.9/math_libs/11.4/lib64/libcutensor.so.1 (0x00007f216f087000)
cuTensorNet-vers:1
===== device info ======
GPU-name:A100-PCIE-40GB
GPU-clock:1410000
GPU-memoryClock:1215000
GPU-nSM:108
GPU-major:8
GPU-minor:0
========================
Include headers and define data types
Define network, modes, and extents
Total memory: 0.28 GiB
Allocate memory for data and workspace, and initialize data.
Error: CUTENSORNET_STATUS_CUTENSOR_VERSION_MISMATCH in line 199
|
Hi @balewski So, you can't reproduce the issue? Or is the problem resolved? As long as you've loaded the NVIDIA HPC SDK 21.9 and correctly override cuTENSOR, the example should compile without issue (NVHPC21.9 comes with cuTENSOR 1.3.1, see release notes here). If you need a different toolchain, you may have to add |
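In the log above that shows the mismatch, ldd resolves libcutensor.so.1 to the HPC SDK math_libs directory (cuTENSOR 1.3.1 according to this comment) rather than to the downloaded 1.4.0.6 archive. A sketch of a guard for that situation; under_root is a hypothetical helper that only compares path strings and does not resolve symlinks:

```shell
# under_root PATH ROOT: succeed if PATH lies inside directory ROOT.
# Hypothetical helper; pure string comparison, no filesystem access.
under_root() {
    case "$1" in
        "${2%/}"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example, with LIB holding the libcutensor path reported by ldd:
#   under_root "$LIB" "$CUTENSOR_ROOT" || echo "cuTENSOR comes from elsewhere: $LIB" >&2
```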
The problem is resolved.
Thank you
Jan
|
Thanks. Closing this issue. Please feel free to reopen if you need to. |
Hi,
I'm trying to follow the instructions for compiling tensornet_example.cu:
https://github.com/NVIDIA/cuQuantum/tree/main/samples/cutensornet
I think it is inconsistent because it says:
export CUTENSORNET_ROOT=<path_to_custatevec_root>
but currently, custatevec is not part of CUTENSORNET anymore. It is a part of CUQUANTUM.
I'm referring to this pair:
https://developer.download.nvidia.com/compute/cuquantum/redist/cuquantum/linux-x86_64/cuquantum-linux-x86_64-0.1.0.30-archive.tar.xz
https://developer.download.nvidia.com/compute/cutensor/redist/libcutensor/linux-x86_64/libcutensor-linux-x86_64-1.4.0.6-archive.tar.xz
Can you please clarify how to configure the Makefile to work with these 2 libs and compile tensornet_example.cu ?
Thanks
Jan Balewski, NERSC