Fixing issue with sample_async failing on machines with multiple GPUs #1379

Merged: 3 commits, Mar 12, 2024

@1tnguyen (Collaborator) commented Mar 12, 2024

Description

In library mode, we need to run a tracing pass before the actual execution.
With multi-QPU, we call set_exec_ctx on a specific QPU id but did not reset the context on that same QPU id: the reset always fell back to QPU 0.

Resolves #1374
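
To illustrate the failure mode described above, here is a minimal sketch of the pattern. It is not the CUDA-Q code: `ExecContext`, `ToyPlatform`, `setContext`, `resetContext`, `tracePassBuggy`, and `tracePassFixed` are hypothetical names invented for this example, and the real change in sample.h may look different. The sketch only assumes that a platform keeps one context slot per QPU and that the reset call defaults to QPU 0 when no id is passed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an execution context (not the CUDA-Q type).
struct ExecContext {
  const char *name;
};

// Hypothetical toy platform that keeps one context slot per QPU.
class ToyPlatform {
public:
  explicit ToyPlatform(std::size_t numQpus) : contexts(numQpus, nullptr) {
    assert(numQpus > 0 && "need at least one QPU");
  }

  // Install a context on the given QPU (defaults to QPU 0).
  void setContext(ExecContext *ctx, std::size_t qpuId = 0) {
    contexts.at(qpuId) = ctx;
  }

  // Clear the context on the given QPU (defaults to QPU 0).
  void resetContext(std::size_t qpuId = 0) { contexts.at(qpuId) = nullptr; }

  // True if a context is still installed on the given QPU.
  bool hasContext(std::size_t qpuId) const {
    return contexts.at(qpuId) != nullptr;
  }

private:
  std::vector<ExecContext *> contexts;
};

// Buggy pattern: the tracing context is set on qpuId, but the reset omits
// the id and therefore clears QPU 0. Returns true if a stale context is
// left on qpuId. The stale pointer is only compared to null, never used.
bool tracePassBuggy(ToyPlatform &platform, std::size_t qpuId) {
  ExecContext tracer{"tracer"};
  platform.setContext(&tracer, qpuId);
  // ... the kernel would be traced here ...
  platform.resetContext(); // falls back to QPU 0
  return platform.hasContext(qpuId);
}

// Fixed pattern: reset the context on the same QPU id it was set on.
bool tracePassFixed(ToyPlatform &platform, std::size_t qpuId) {
  ExecContext tracer{"tracer"};
  platform.setContext(&tracer, qpuId);
  // ... the kernel would be traced here ...
  platform.resetContext(qpuId);
  return platform.hasContext(qpuId);
}
```

In this sketch the buggy variant goes unnoticed when `qpuId` is 0, because the default reset happens to hit the right slot. That is consistent with the problem only showing up on machines with more than one GPU.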

@bmhowe23 (Collaborator) left a comment

👍

@1tnguyen 1tnguyen enabled auto-merge (squash) March 12, 2024 20:15
@1tnguyen 1tnguyen merged commit 0c8e28b into NVIDIA:main Mar 12, 2024
133 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Mar 12, 2024
@bettinaheim bettinaheim changed the title from "Fix a bug in sample.h affecting sample_async in library mode" to "Fixing issue with sample_async failing on machines with multiple GPUs" Apr 17, 2024
@bettinaheim bettinaheim added the "bug fix" label (To be listed under Bug Fixes in the release notes) Apr 17, 2024
@bettinaheim bettinaheim added this to the release 0.7.1 milestone Apr 17, 2024
Labels
bug fix To be listed under Bug Fixes in the release notes
Development

Successfully merging this pull request may close these issues.

nvqpp_SampleAsync fails on machines with multiple GPUs
3 participants