
Conversation

@mdboom (Contributor) commented Oct 17, 2025

This is an alternative to #1154. I'm experimenting here; we should pick only one of the two.

@mdboom requested a review from cpcloud October 17, 2025 16:45
@copy-pr-bot (bot) commented Oct 17, 2025

Auto-sync is disabled for ready-for-review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@mdboom (Contributor, Author) commented Oct 17, 2025

/ok to test

@leofang (Member) commented Oct 17, 2025

Looking at the post-action cleanup log, we don't seem to be building the cache at all. I suspect that on Linux this has to do with cibuildwheel launching a manylinux container; if so, we need to mount sccache into the container. But that doesn't explain Windows, which doesn't use a container: there we build wheels on the bare VM.
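
For the Linux side, a rough sketch of what mounting sccache into the container could look like (the option names are cibuildwheel's; the mount path and the exact list of forwarded variables are my assumptions, not necessarily what this PR ends up doing):

  env:
    # Make the host's sccache binary visible inside the manylinux container
    # (sccache ships as a static binary, so bind-mounting it should work).
    CIBW_CONTAINER_ENGINE: "docker; create_args: --volume /usr/local/bin/sccache:/usr/local/bin/sccache"
    # Forward the credentials sccache's GitHub Actions cache backend reads
    # from the environment (the exact set depends on the sccache version).
    CIBW_ENVIRONMENT_PASS_LINUX: "SCCACHE_GHA_ENABLED ACTIONS_RESULTS_URL ACTIONS_RUNTIME_TOKEN"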

@leofang (Member) commented Oct 17, 2025

Do we need to convince setuptools to prepend sccache to CC and CXX?

@mdboom (Contributor, Author) commented Oct 17, 2025

Do we need to convince setuptools to prepend sccache to CC and CXX?

Yeah, it looks like this action (unlike the ccache-based one) doesn't install the symlinks. The docs have examples setting CC and CXX. I'll try that and see if setuptools picks it up (I think it normally does).
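
For reference, a minimal sketch of that pattern (the action version and compiler names are illustrative, not the exact contents of this PR):

  - name: Set up sccache
    uses: mozilla-actions/sccache-action@v0.0.9

  - name: Build wheels
    env:
      # setuptools/distutils splits CC and CXX on whitespace, so prefixing
      # the compiler with sccache works without wrapper symlinks.
      CC: sccache gcc
      CXX: sccache g++
    run: python -m cibuildwheel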

@mdboom (Contributor, Author) commented Oct 17, 2025

/ok to test

@mdboom (Contributor, Author) commented Oct 17, 2025

/ok to test

@mdboom (Contributor, Author) commented Oct 17, 2025

/ok to test

@mdboom (Contributor, Author) commented Oct 17, 2025

/ok to test

@mdboom (Contributor, Author) commented Oct 17, 2025

/ok to test

@mdboom (Contributor, Author) commented Oct 17, 2025

/ok to test

@mdboom (Contributor, Author) commented Oct 17, 2025

/ok to test

@mdboom (Contributor, Author) commented Oct 17, 2025

/ok to test

@cpcloud (Contributor) previously approved these changes Oct 17, 2025 and left a comment

My only question is about a possibly-unused variable, but that's not blocking.

@leofang (Member) commented Oct 23, 2025

/ok to test 7cac30d

@leofang (Member) commented Oct 23, 2025

/ok to test 6be8df7

@leofang (Member) commented Oct 23, 2025

It seems neither renaming (6be8df7) nor wrapping (8f6f008) works on Windows...

@leofang (Member) commented Oct 23, 2025

/ok to test 1eb6a1b

@leofang (Member) commented Oct 23, 2025

I am thinking we should purge all unsuccessful Windows attempts and get this merged.

@mdboom (Contributor, Author) commented Oct 23, 2025

I am thinking we should purge all unsuccessful Windows attempts and get this merged.

I agree.

@mdboom (Contributor, Author) commented Oct 23, 2025

/ok to test

@copy-pr-bot (bot) commented Oct 23, 2025

/ok to test

@mdboom, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

@mdboom (Contributor, Author) commented Oct 23, 2025

/ok to test 6cfe70d

@mdboom (Contributor, Author) commented Oct 23, 2025

This is working with the Windows stuff removed.

However, I'm not sure how to confirm it's actually working. The stats at the end of the log are (I assume) from the host, so they're basically irrelevant.

But I'm also not seeing any speed improvement. For example:

  • This run from this PR, presumably with a hot cache, took 227s to build cuda_bindings
  • This run from another PR, took 211s to build cuda_bindings

I think the difference is within the noise, so we aren't seeing a notable speedup. If it really is caching, maybe the overhead of pushing large files to the remote cache is erasing any benefit?

@leofang added this to the cuda-python 13-next, 12-next milestone Oct 23, 2025
@leofang (Member) commented Oct 23, 2025

I saw Linux builds decrease from ~9 min to ~5 min as of commit c10dc60 (link). Let me check what happened since then.

The cache stats need to be accessed in each cibuildwheel test step (example), not at the end of the build workflow. As already noted, the stats on the host are irrelevant.
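
A minimal sketch of one way to do that (the hook is cibuildwheel's CIBW_BEFORE_TEST; assuming sccache is on PATH in the test environment):

  env:
    # Print hit/miss counters right after the compile step, while we are
    # still inside the environment whose sccache server did the work.
    CIBW_BEFORE_TEST: "sccache --show-stats"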

@leofang (Member) commented Oct 23, 2025

@mdboom Something caused cache misses in the most recent CI run:

  Compile requests                      41
  Compile requests executed             21
  Cache hits                             2
  Cache hits (C/C++)                     2
  Cache misses                          19
  Cache misses (C/C++)                  19
  Cache hits rate                     9.52 %
  Cache hits rate (C/C++)             9.52 %
  Cache timeouts                         0
  Cache read errors                      0
  Forced recaches                        0
  Cache write errors                     0
  Cache errors                           0
  Compilations                          19
  Compilation failures                   0
  Non-cacheable compilations             0
  Non-cacheable calls                    0
  Non-compilation calls                 20
  Unsupported compiler calls             0
  Average cache write                0.360 s
  Average compiler                  20.770 s
  Average cache read hit             0.173 s
  Failed distributed compilations        0
  Cache location                  ghac, name: c53e4962b46713a17b429097c76c3a9dfe6c94ba4556b3b544ce89beb6872fc9, prefix: /sccache/
  Version (client)                0.12.0

https://github.com/NVIDIA/cuda-python/actions/runs/18753688569/job/53499960916?pr=1156#step:17:1336

I dunno what happened but I see many new caches generated in the last hour or so:
https://github.com/NVIDIA/cuda-python/actions/caches?query=sort%3Acreated-desc

Let me kick off another run to ensure we get cache hits.

@leofang closed this Oct 23, 2025
@leofang reopened this Oct 23, 2025
@leofang (Member) commented Oct 23, 2025

/ok to test 6cfe70d

@greptile-apps (bot) left a comment

1 file reviewed, no comments


@mdboom (Contributor, Author) commented Oct 23, 2025

Let me kick off another run to ensure we get cache hits.

Cool. Maybe the cache is somehow keyed off the contents of the build-wheel.yml file. It wouldn't be a crazy thing to do.

@leofang (Member) commented Oct 23, 2025

It wouldn't be a crazy thing to do.

Sounds crazy to me lol... I thought it was only sensitive to compiler versions, flags, args, and .cpp file contents. If the cache can be invalidated that easily by workflow changes, that's a bit annoying: GitHub cache space is only 10 GB and we've been managing it carefully so far. It's a scarce resource.

Anyway, cache hits are back: https://github.com/NVIDIA/cuda-python/actions/runs/18755592039/job/53506633614?pr=1156#step:17:1334

@mdboom (Contributor, Author) commented Oct 23, 2025

Anyway, cache hits are back: https://github.com/NVIDIA/cuda-python/actions/runs/18755592039/job/53506633614?pr=1156#step:17:1334

Great. And it looks like a cuda_bindings build is now 55s vs. 220s (approx). So, around 3 minutes saved (+ the time saved on the 2 cuda_core builds). Not bad.

@cpcloud (Contributor) commented Oct 23, 2025

I tinkered with sccache locally, and the remaining 55s is mostly spent in the linker, which is a whole other can of worms :)

@leofang merged commit 848ecd2 into NVIDIA:main Oct 23, 2025
119 of 128 checks passed
@github-actions (bot)

Doc Preview CI
Preview removed because the pull request was closed or merged.


Labels

CI/CD (CI/CD infrastructure), enhancement (Any code-related improvements), P1 (Medium priority - Should do)
