Skip to content

Conversation

leofang
Copy link
Member

@leofang leofang commented Oct 2, 2025

Description

Close #1065
Close #1063

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@leofang leofang self-assigned this Oct 2, 2025
Copy link
Contributor

copy-pr-bot bot commented Oct 2, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@leofang leofang added enhancement Any code-related improvements triage Needs the team's attention cuda.core Everything related to the cuda.core module blocked This task is currently blocked by other tasks labels Oct 2, 2025
@leofang

This comment was marked as resolved.

@leofang leofang added P0 High priority - Must do! and removed blocked This task is currently blocked by other tasks labels Oct 3, 2025
@leofang
Copy link
Member Author

leofang commented Oct 3, 2025

/ok to test b713c5c

This comment has been minimized.

Copy link
Collaborator

@rparolin rparolin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a comment asking about pointer data type changes and pointer value conversions.

cdef int _get_device_and_context(self) except?-1:
cdef cydriver.CUcontext curr_ctx
if self._device_id == cydriver.CU_DEVICE_INVALID:
# TODO: It is likely faster/safer to call cuCtxGetCurrent?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is universally true, where a context can be non-current without being destroyed. Retrieving the device and context from that stream object would be valid.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not bother doing a profiling here, just thought that it might be good to reduce a few CUDA calls... 😛

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this TODO before merge? It's not logically correct to call cuCtxGetCurrent here as the Stream object could be from a different context.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see why this TODO is confusing. We actually need the current context in order to do a safe cuCtxGetDevice dance (set the stream ctx to current -> get device -> revert to the current ctx). Let me clarify in the code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just use cuCtxGetDevice_v2 which allows passing a ctx explicitly and doesn't require messing with the current ctx?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See commit 22b0e2e

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just use cuCtxGetDevice_v2 which allows passing a ctx explicitly and doesn't require messing with the current ctx?

Unfortunately it seems to be CUDA 13 only... 😢

@leofang
Copy link
Member Author

leofang commented Oct 6, 2025

/ok to test a5e6bcf

@leofang
Copy link
Member Author

leofang commented Oct 7, 2025

/ok to test 3809a33

kkraus14
kkraus14 previously approved these changes Oct 7, 2025
Copy link
Collaborator

@kkraus14 kkraus14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving as the last change I'd like is trivial

@leofang
Copy link
Member Author

leofang commented Oct 7, 2025

/ok to test 22b0e2e

@leofang leofang merged commit 5669118 into NVIDIA:main Oct 7, 2025
71 checks passed
@leofang leofang deleted the cythonize_more branch October 7, 2025 18:02
Copy link

github-actions bot commented Oct 7, 2025

Doc Preview CI
Preview removed because the pull request was closed or merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuda.core Everything related to the cuda.core module enhancement Any code-related improvements P0 High priority - Must do!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cuda-core: Release GIL when calling cimport'd CUDA APIs [FEA]: Update the memory module to use __dealloc__.
8 participants