

@leofang (Member) commented Nov 19, 2025

Description

A few pure-Python constructs in the memory-resource attribute implementation are getting in the way: they are slow, and they force every attribute to share a single value type (`int`, which #1272 is changing to `uint64_t`). Now that LLMs make generating this boilerplate cheap, we should simply spell out each attribute once and hand-tune it, as we already do for device attributes and program options.
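To make the trade-off concrete, here is a minimal Python sketch contrasting the two styles. It is not the cuda.core implementation (which is Cython and queries the CUDA driver); `_query`, `_FAKE_POOL`, `GenericAttributes`, and `ExplicitAttributes` are hypothetical names, and `release_threshold` is an assumed second attribute. The generic class routes every lookup through one `__getattr__` path that returns one value type, while the explicit class spells out each property with its own return type.

```python
from typing import Dict

# Hypothetical stand-in for the driver query; real code would ask the
# CUDA driver for the memory pool attribute instead of reading a dict.
_FAKE_POOL: Dict[str, int] = {
    "reuse_allow_internal_dependencies": 1,
    "release_threshold": 1 << 20,
}


def _query(name: str) -> int:
    """Return the raw integer value of a (fake) pool attribute."""
    return _FAKE_POOL[name]


class GenericAttributes:
    """Generic style (hypothetical): one dispatch path, one value type."""

    _names = ("reuse_allow_internal_dependencies", "release_threshold")

    def __getattr__(self, name: str) -> int:
        # Only reached when normal attribute lookup fails.
        if name in self._names:
            return int(_query(name))  # every attribute comes back as int
        raise AttributeError(name)


class ExplicitAttributes:
    """Spelled-out style (hypothetical): one property per attribute."""

    @property
    def reuse_allow_internal_dependencies(self) -> bool:
        # A flag attribute can be exposed as a real bool.
        return bool(_query("reuse_allow_internal_dependencies"))

    @property
    def release_threshold(self) -> int:
        # A byte count stays an integer.
        return _query("release_threshold")
```

With explicit properties, each attribute carries its own type (here `bool` versus `int`), and accessing it skips the generic name check that `__getattr__` performs on every lookup.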

Before (main):

In [6]: %timeit mr.attributes.reuse_allow_internal_dependencies
265 ns ± 6.03 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

After this PR:

In [6]: %timeit mr.attributes.reuse_allow_internal_dependencies
116 ns ± 0.63 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

By construction this also fixes https://github.com/NVIDIA/cuda-python-private/issues/197.

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@copy-pr-bot (bot) commented Nov 19, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@leofang (Member, Author) commented Nov 19, 2025

/ok to test 74c6611


@rwgk (Collaborator) left a comment

Oh, this is awesome! Please disregard my comment on #1272. I hadn't seen this before, and I didn't look around enough to realize it could be done this elegantly.

@Andy-Jost (Contributor) left a comment

Thanks for putting this together, Leo. This looks like the right approach to me.

@leofang added this to the cuda.core beta 10 milestone Nov 20, 2025
@leofang added the bug (Something isn't working), enhancement (Any code-related improvements), P0 (High priority - Must do!), and cuda.core (Everything related to the cuda.core module) labels Nov 20, 2025
@leofang (Member, Author) commented Nov 20, 2025

Thanks guys! Merging.

@leofang merged commit 1f67ea2 into NVIDIA:main Nov 20, 2025
65 checks passed
@leofang deleted the unfold_mr_attr branch November 20, 2025 01:30
@github-actions commented
Doc Preview CI
Preview removed because the pull request was closed or merged.


Labels

bug, cuda.core, enhancement, P0


3 participants