nvqpp_SampleAsync fails on machines with multiple GPUs #1374

Closed
3 of 4 tasks
bmhowe23 opened this issue Mar 12, 2024 · 0 comments · Fixed by #1379
Labels
bug Something isn't working

Comments

@bmhowe23
Collaborator

Required prerequisites

  • Consult the security policy. If reporting a security vulnerability, do not report the bug using this form. Use the process described in the policy to report the issue.
  • Make sure you've read the documentation. Your issue may be addressed there.
  • Search the issue tracker to verify that this hasn't already been reported. +1 or comment there if it has.
  • If possible, make a PR with a failing test to give us a starting point to work on!

Describe the bug

The nvqpp_SampleAsync test (included in the cuda-quantum repo) aborts with the following error when run on a machine with multiple GPUs: double free or corruption (out)

Steps to reproduce the bug

On a machine with multiple GPUs, build cuda-quantum and then run this command (or something similar):

$ CUDA_VISIBLE_DEVICES=2,3 ctest --test-dir build -R "^nvqpp_SampleAsync$" -V

It will return something like this:

Internal ctest changing into directory: /workspaces/tmp/cuda-quantum/build
UpdateCTestConfiguration  from :/workspaces/tmp/cuda-quantum/build/DartConfiguration.tcl
Parse Config file:/workspaces/tmp/cuda-quantum/build/DartConfiguration.tcl
UpdateCTestConfiguration  from :/workspaces/tmp/cuda-quantum/build/DartConfiguration.tcl
Parse Config file:/workspaces/tmp/cuda-quantum/build/DartConfiguration.tcl
Test project /workspaces/tmp/cuda-quantum/build
Constructing a list of tests
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 491
    Start 491: nvqpp_SampleAsync

491: Test command: /usr/bin/bash "-c" "rm -f SampleAsync; /workspaces/tmp/cuda-quantum/build/bin/nvq++  --target nvidia-mqpu /workspaces/tmp/cuda-quantum/docs/sphinx/snippets/cpp/using/cudaq/platform/sample_async.cpp -o SampleAsync &&                /workspaces/tmp/cuda-quantum/build/docs/SampleAsync"
491: Working Directory: /workspaces/tmp/cuda-quantum/build/docs
491: Test timeout computed to be: 1500
491: double free or corruption (out)
491: /usr/bin/bash: line 1:  6582 Aborted                 (core dumped) /workspaces/tmp/cuda-quantum/build/docs/SampleAsync
1/1 Test #491: nvqpp_SampleAsync ................***Failed    5.77 sec

0% tests passed, 1 tests failed out of 1

Label Time Summary:
gpu_required    =   5.77 sec*proc (1 test)

Total Test time (real) =   5.78 sec

The following tests FAILED:
        491 - nvqpp_SampleAsync (Failed)
Errors while running CTest
Output from these tests are in: /workspaces/tmp/cuda-quantum/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
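
To check whether the failure depends on how many GPUs are visible, the same test can be repeated under different CUDA_VISIBLE_DEVICES settings. The following is only a sketch: the device lists (2 and 2,3), the BUILD_DIR default, and the run_with_devices helper are assumptions for illustration and should be adapted to the machine at hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run nvqpp_SampleAsync under several GPU visibility
# settings. BUILD_DIR and the device lists below are assumptions.
BUILD_DIR="${BUILD_DIR:-build}"
CTEST="${CTEST:-ctest}"   # overridable, e.g. CTEST=echo for a dry run

run_with_devices() {
  local devices="$1"
  # Restrict the GPUs visible to the test process, then run the single test.
  CUDA_VISIBLE_DEVICES="$devices" "$CTEST" --test-dir "$BUILD_DIR" \
    -R '^nvqpp_SampleAsync$' --output-on-failure
}

# One visible GPU versus two visible GPUs.
for devices in 2 2,3; do
  if run_with_devices "$devices"; then
    echo "CUDA_VISIBLE_DEVICES=$devices: passed"
  else
    echo "CUDA_VISIBLE_DEVICES=$devices: FAILED"
  fi
done
```

If the single-GPU run passes and the two-GPU run fails, that points at the multi-GPU code path rather than at the build itself.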

Expected behavior

The test should pass regardless of how many GPUs are visible.

Is this a regression? If it is, put the last known working version (or commit) here.

Yes, this is a regression. git bisect identifies a8a8ba5 as the first bad commit.
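
For anyone who wants to repeat the bisection, a step function for git bisect run might look like the sketch below. It is not the script used for this report: BUILD_CMD, TEST_CMD, DEVICES, and the bisect_step name are assumptions, and the build command in particular must be replaced with whatever builds cuda-quantum in your environment. It relies only on the documented git bisect run exit codes (0 = good, 125 = skip, other values from 1 to 127 = bad).

```shell
#!/usr/bin/env bash
# Sketch of a `git bisect run` step. BUILD_CMD and TEST_CMD are placeholders
# (assumptions); substitute the real build and test invocations.
BUILD_CMD="${BUILD_CMD:-cmake --build build}"
TEST_CMD="${TEST_CMD:-ctest --test-dir build -R ^nvqpp_SampleAsync\$}"

bisect_step() {
  # 125 tells git bisect to skip a commit that cannot be built.
  sh -c "$BUILD_CMD" || return 125
  # 0 marks the commit good, 1 marks it bad.
  if CUDA_VISIBLE_DEVICES="${DEVICES:-2,3}" sh -c "$TEST_CMD"; then
    return 0
  fi
  return 1
}
```

Saved as a script that ends by calling bisect_step, this could be driven with git bisect start <bad-commit> <good-commit> followed by git bisect run ./bisect_step.sh, where both commits and the script name are placeholders.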

Environment

  • CUDA Quantum version: a8a8ba5
  • Python version:
  • C++ compiler:
  • Operating system: Ubuntu

Suggestions

No response
