
Backport checkpoint restore arg layout handling to 12.9.x #2145

Open
kkraus14 wants to merge 1 commit into NVIDIA:12.9.x from kkraus14:codex/backport-checkpoint-restoreargs-12.9.x

Conversation

@kkraus14
Collaborator

Backport of the CUDA checkpoint restore argument layout handling from #2144 to the 12.9.x branch.

This keeps the rendering of CUcheckpointRestoreArgs version-flexible across the three checkpoint restore argument layouts:

  • CUDA 12.9: reserved remains cuuint64_t[8]
  • CUDA 13.1/13.2: gpuPairs, gpuPairsCount, reserved as char[44], and reserved1
  • CUDA 13.3: gpuPairs, gpuPairsCount, and reserved as char[52]
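To make the three shapes concrete, here is a minimal Python sketch that uses `ctypes` to mirror each layout and select one by CUDA version. It is illustrative only, not the generated cuda.bindings code. The class names, the `LAYOUTS` table, and the `restore_args_layout` helper are hypothetical. The field types for `gpuPairs`, `gpuPairsCount`, and `reserved1` (pointer, `unsigned int`, 64-bit integer) are also assumptions, because the description above gives only the field names and the reserved array sizes.

```python
import ctypes

# Hypothetical mirrors of the CUcheckpointRestoreArgs layouts described above.
# ASSUMPTION: gpuPairs is a pointer, gpuPairsCount is an unsigned int and
# reserved1 is a 64-bit integer; the PR text only names these fields.


class RestoreArgs129(ctypes.Structure):
    # CUDA 12.9: only the reserved block, cuuint64_t[8].
    _fields_ = [("reserved", ctypes.c_uint64 * 8)]


class RestoreArgs131(ctypes.Structure):
    # CUDA 13.1/13.2: gpuPairs, gpuPairsCount, char[44] reserved, reserved1.
    _fields_ = [
        ("gpuPairs", ctypes.c_void_p),
        ("gpuPairsCount", ctypes.c_uint),
        ("reserved", ctypes.c_char * 44),
        ("reserved1", ctypes.c_uint64),
    ]


class RestoreArgs133(ctypes.Structure):
    # CUDA 13.3: gpuPairs, gpuPairsCount, char[52] reserved (no reserved1).
    _fields_ = [
        ("gpuPairs", ctypes.c_void_p),
        ("gpuPairsCount", ctypes.c_uint),
        ("reserved", ctypes.c_char * 52),
    ]


# Only the CUDA versions named in the PR description are mapped.
LAYOUTS = {
    (12, 9): RestoreArgs129,
    (13, 1): RestoreArgs131,
    (13, 2): RestoreArgs131,
    (13, 3): RestoreArgs133,
}


def restore_args_layout(major, minor):
    """Return the mirror struct for a listed CUDA version, else raise ValueError."""
    try:
        return LAYOUTS[(major, minor)]
    except KeyError:
        raise ValueError(
            f"no restore-args layout recorded for CUDA {major}.{minor}"
        ) from None


if __name__ == "__main__":
    for version, layout in sorted(LAYOUTS.items()):
        print(version, layout.__name__, ctypes.sizeof(layout))
```

Under those assumptions, every mirror comes out to 64 bytes on a typical 64-bit platform: the reserved space shrinks by exactly the bytes that `gpuPairs`, `gpuPairsCount`, and (for 13.1/13.2) `reserved1` take up. Versions not listed in this PR raise `ValueError` instead of guessing a layout.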

Unlike main, which already had the generated CUcheckpointGpuPair blocks before #2144, the 12.9.x branch lacked them, so this backport adds those conditional template blocks as well.

Validation:

  • pre-commit hooks passed during commit
  • rendered checkpoint template sections for 12.9, 13.2, and 13.3 shapes

@copy-pr-bot
Contributor

copy-pr-bot Bot commented May 27, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions Bot added the cuda.bindings (Everything related to the cuda.bindings module) label May 27, 2026
@kkraus14 added the bug (Something isn't working) and P0 (High priority - Must do!) labels May 27, 2026
@kkraus14 self-assigned this May 27, 2026
@kkraus14 force-pushed the codex/backport-checkpoint-restoreargs-12.9.x branch from 5df5e59 to b34a70f May 27, 2026 21:40
@kkraus14 force-pushed the codex/backport-checkpoint-restoreargs-12.9.x branch 2 times, most recently from 2dd69d8 to 440be94 May 28, 2026 02:13
@kkraus14
Collaborator Author

/ok to test

@kkraus14 force-pushed the codex/backport-checkpoint-restoreargs-12.9.x branch from 440be94 to c924a85 May 28, 2026 05:13
@kkraus14 force-pushed the codex/backport-checkpoint-restoreargs-12.9.x branch from c924a85 to 648c67f May 28, 2026 05:14

Labels

bug (Something isn't working), cuda.bindings (Everything related to the cuda.bindings module), P0 (High priority - Must do!)

2 participants