compile tensornet_example.cu #2

Closed
balewski opened this issue Feb 22, 2022 · 18 comments

Comments

@balewski

Hi,
I'm trying to follow the instructions for compiling tensornet_example.cu:
https://github.com/NVIDIA/cuQuantum/tree/main/samples/cutensornet
I think they are inconsistent, because they say:
export CUTENSORNET_ROOT=<path_to_custatevec_root>

but custatevec is no longer part of CUTENSORNET; it is now part of CUQUANTUM.
I'm referring to this pair of archives:

https://developer.download.nvidia.com/compute/cuquantum/redist/cuquantum/linux-x86_64/cuquantum-linux-x86_64-0.1.0.30-archive.tar.xz

https://developer.download.nvidia.com/compute/cutensor/redist/libcutensor/linux-x86_64/libcutensor-linux-x86_64-1.4.0.6-archive.tar.xz
Could you please clarify how to configure the Makefile to work with these two libraries and compile tensornet_example.cu?
Thanks
Jan Balewski, NERSC
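For reference, a minimal sketch of the kind of compile line the question is about, assuming both archives linked above have been unpacked locally. The helper name, the placeholder paths, and the library subdirectories passed as arguments are assumptions, not taken from the sample's Makefile. The function only prints the nvcc command so it can be inspected before running it.

```shell
#!/bin/sh
# Hypothetical helper: print an nvcc command for tensornet_example.cu.
# $1 = cuQuantum root, $2 = cuQuantum library dir name (lib or lib64),
# $3 = cuTENSOR root,  $4 = cuTENSOR library subdir (assumed, e.g. lib/11)
print_build_cmd() {
    cuq_root=$1; cuq_lib=$2; cut_root=$3; cut_lib=$4
    printf 'nvcc tensornet_example.cu -I%s/include -I%s/include -L%s/%s -L%s/%s -lcutensornet -lcutensor -o tensornet_example\n' \
        "$cuq_root" "$cut_root" "$cuq_root" "$cuq_lib" "$cut_root" "$cut_lib"
}

# Placeholder paths; substitute wherever the archives were unpacked.
print_build_cmd "$HOME/cuquantum" lib "$HOME/libcutensor" lib/11
```

Passing the library directories as arguments keeps the sketch independent of whether a given archive uses lib or lib64.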

@leofang
Member

leofang commented Feb 23, 2022

Hi @balewski, thanks for asking. I think you caught another documentation issue 🙂 This doc may be less confusing:
https://docs.nvidia.com/cuda/cuquantum/custatevec/getting_started.html#installation-and-compilation

@leofang
Member

leofang commented Feb 23, 2022

@balewski
Author

balewski commented Feb 23, 2022 via email

@leofang
Member

leofang commented Feb 23, 2022

Right, we're aware of the lib64 vs lib issue. The clarification is currently pending in our pipeline; sorry for the confusion. A simple rename or symlink should fix it quickly.
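A minimal sketch of the symlink workaround, assuming the mismatch is that one of lib or lib64 exists under the install root while the build expects the other. The helper name and the example path are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: under <root>, create whichever of lib / lib64 is
# missing as a relative symlink to the one that exists.
link_libdirs() {
    root=$1
    if [ -d "$root/lib" ] && [ ! -e "$root/lib64" ]; then
        ln -s lib "$root/lib64"
    elif [ -d "$root/lib64" ] && [ ! -e "$root/lib" ]; then
        ln -s lib64 "$root/lib"
    fi
}

# Example (placeholder path):
#   link_libdirs "$HOME/cuquantum"
```

A relative link keeps working if the unpacked tree is moved later.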

As for CUTENSORNET_STATUS_CUTENSOR_VERSION_MISMATCH: cuTensorNet 0.1.0 (the version you downloaded) requires cuTENSOR 1.4.0. Could you check which cuTENSOR version you have?
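One rough way to check, sketched under assumptions: read the version from the name of the downloaded archive and compare it with the required 1.4.0. The helper names are hypothetical, and treating a fourth build component (as in 1.4.0.6) as compatible is an assumption. This only inspects the archive name; it says nothing about which library is actually loaded at run time.

```shell
#!/bin/sh
# Hypothetical helper: extract the version from a libcutensor archive name,
# e.g. libcutensor-linux-x86_64-1.4.0.6-archive.tar.xz -> 1.4.0.6
cutensor_archive_version() {
    printf '%s\n' "$1" |
        sed -n 's/^libcutensor-.*-\([0-9][0-9.]*\)-archive\.tar\.xz$/\1/p'
}

# Succeeds when version $1 equals $2 or extends it with more components.
matches_required() {
    case "$1" in
        "$2"|"$2".*) return 0 ;;
        *) return 1 ;;
    esac
}

v=$(cutensor_archive_version "libcutensor-linux-x86_64-1.4.0.6-archive.tar.xz")
if matches_required "$v" 1.4.0; then
    echo "cuTENSOR $v matches the required 1.4.0"
else
    echo "cuTENSOR $v does not match the required 1.4.0"
fi
```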

@balewski
Author

balewski commented Feb 23, 2022 via email

@leofang
Member

leofang commented Feb 23, 2022

Hi Jan, inspecting your log I found something odd:

Include headers and define data types
Define network, modes, and extents
Total memory: 0.28 GiB
Allocate memory for data and workspace, and initialize data.
Error: CUTENSORNET_STATUS_CUTENSOR_VERSION_MISMATCH in line 199

Line 199 does not contain any code. Could you check?

Also, what are your CUDA Toolkit version (check with nvcc --version) and your driver version (check with nvidia-smi)?
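A small sketch that collects both numbers in one go, assuming nvcc and nvidia-smi are on PATH (it reports when they are not). The function name is hypothetical. Note that the "CUDA Version" shown in the nvidia-smi banner is the highest CUDA version the driver supports, which is distinct from the driver version number queried here.

```shell
#!/bin/sh
# Hypothetical helper: print the CUDA Toolkit release and the driver version.
report_cuda_versions() {
    if command -v nvcc >/dev/null 2>&1; then
        # The "release" line of nvcc --version carries the toolkit version.
        nvcc --version | grep -i release || echo "nvcc: no release line found"
    else
        echo "nvcc: not found on PATH"
    fi
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=driver_version --format=csv,noheader ||
            echo "nvidia-smi: query failed"
    else
        echo "nvidia-smi: not found on PATH"
    fi
}

report_cuda_versions
```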

@balewski
Author

balewski commented Feb 23, 2022 via email

@leofang
Member

leofang commented Feb 23, 2022

Nice, thanks for reporting @balewski. So all you did to make it work was to use CUDA Toolkit 11.4 instead of 11.3 to compile? And you mentioned nvidia-smi shows your driver version is 11.4?

If that's the case, it is not what we would expect. I will check internally. Let me keep this ticket open for the time being.

@balewski
Author

balewski commented Feb 23, 2022 via email

@balewski
Author

balewski commented Feb 23, 2022 via email

@leofang
Member

leofang commented Feb 23, 2022

Haha, sure it helps, thanks Jan. I was also a NERSC user, so I understand the constraints. I'll get back to you once we understand the situation better.

@balewski
Author

balewski commented Feb 23, 2022 via email

@leofang
Member

leofang commented Feb 24, 2022

Hi @balewski, just to confirm: Have you checked the output of ldd ./tensornet_example with the failing executable? If not, could you check and share the output? Thanks.
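To narrow the ldd output down to the relevant entries, a minimal sketch with a hypothetical filter name. The path after "=>" on each line is the copy the dynamic loader resolves, so a cuTENSOR from an unexpected location shows up there.

```shell
#!/bin/sh
# Hypothetical helper: keep only the ldd lines that mention the
# cuTENSOR or cuTensorNet shared libraries.
cutensor_libs() {
    grep -E 'libcutensor(net)?\.so'
}

# Usage, assuming ./tensornet_example has been built:
#   ldd ./tensornet_example | cutensor_libs
```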

@balewski
Author

balewski commented Feb 24, 2022 via email

@balewski
Author

balewski commented Feb 24, 2022 via email

@mtjrider
Collaborator

Hi @balewski

So, you can't reproduce the issue? Or is the problem resolved?
Some additional notes below.

As long as you've loaded the NVIDIA HPC SDK 21.9 and correctly overridden its cuTENSOR, the example should compile without issue (NVHPC 21.9 ships with cuTENSOR 1.3.1, per its release notes).
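The comment does not say how the override is done. One common approach, shown here purely as an assumption, is to put the directory holding the newer cuTENSOR ahead of the SDK's copy on LD_LIBRARY_PATH. The helper name and the example path are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: put a directory at the front of LD_LIBRARY_PATH so
# the dynamic loader searches it before any later entries.
prepend_ld_path() {
    LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH
}

# Example (placeholder path to the unpacked cuTENSOR 1.4.0 libraries):
#   prepend_ld_path "$HOME/libcutensor/lib/11"
```

Checking the result with ldd afterwards confirms which copy is actually picked up.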

If you need a different toolchain, you may have to add -gencode flags to the nvcc command in the Makefile.
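A minimal sketch of generating such flags, using nvcc's `-gencode arch=compute_XX,code=sm_XX` syntax. The helper name is hypothetical, and the compute capabilities 70 and 80 are example values only; use the ones that match your GPUs.

```shell
#!/bin/sh
# Hypothetical helper: emit one -gencode flag per compute capability given.
gencode_flags() {
    flags=""
    for cc in "$@"; do
        flags="$flags${flags:+ }-gencode arch=compute_$cc,code=sm_$cc"
    done
    printf '%s\n' "$flags"
}

# Example targets (assumed): compute capability 7.0 and 8.0.
gencode_flags 70 80
```

The printed string can be appended to the nvcc flags variable in the Makefile.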

@balewski
Author

balewski commented Feb 26, 2022 via email

@mtjrider
Collaborator

Thanks. Closing this issue. Please feel free to reopen if you need to.


3 participants