Skip to content

Conversation

@leofang
Copy link
Member

@leofang leofang commented Oct 24, 2025

Description

Follow-up of #1156.

In order to use sccache without patching setuptools (which IMHO is a long-term solution), we need to

  1. convince setuptools to pick up a custom wrapper instead of using the bare cl.exe directly
    • The regular path for setuptools to find cl.exe is via vswhere/vcvarsall, this is important for the below discussion.
  2. call sccache cl ... in the wrapper
    • setuptools invokes cl.exe via a subprocess

It turns out that I did the above two steps wrong in #1156.

  1. It is critical to bypass the vswhere/vcvarsall mechanism, and setuptools does provide a shortcut through DISTUTILS_USE_SDK=1. By setting it, setuptools will only rely on whatever is found in %PATH%.
  2. On Windows, we cannot rename a .bat or .cmd file to "cl.exe" and expect that it is invokable. Unlike on Linux it is not. So we need a real shim executable that invokes sccache for us. This is done by compiling a tiny cl_shim.c added in this PR and generating an executable named "cl.exe". When launching cibuildwheels, we prepend its path to %PATH% so that it is found the first.

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Added steps to find cl.exe and generate sccache wrapper script for Windows builds.
@copy-pr-bot
Copy link
Contributor

copy-pr-bot bot commented Oct 24, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@leofang
Copy link
Member Author

leofang commented Oct 24, 2025

/ok to test

@github-actions
Copy link

@leofang
Copy link
Member Author

leofang commented Oct 24, 2025

/ok to test

@leofang
Copy link
Member Author

leofang commented Oct 24, 2025

/ok to test

@leofang
Copy link
Member Author

leofang commented Oct 24, 2025

/ok to test

@leofang
Copy link
Member Author

leofang commented Oct 24, 2025

/ok to test

@leofang
Copy link
Member Author

leofang commented Oct 24, 2025

/ok to test

@leofang
Copy link
Member Author

leofang commented Oct 24, 2025

/ok to test

@leofang
Copy link
Member Author

leofang commented Oct 24, 2025

/ok to test

@leofang
Copy link
Member Author

leofang commented Oct 24, 2025

/ok to test

@leofang
Copy link
Member Author

leofang commented Oct 24, 2025

/ok to test

@leofang
Copy link
Member Author

leofang commented Oct 24, 2025

/ok to test

@leofang
Copy link
Member Author

leofang commented Oct 24, 2025

/ok to test

@leofang
Copy link
Member Author

leofang commented Oct 24, 2025

/ok to test

@leofang leofang self-assigned this Oct 24, 2025
@leofang leofang added enhancement Any code-related improvements P1 Medium priority - Should do CI/CD CI/CD infrastructure labels Oct 24, 2025
@leofang leofang added this to the cuda-python 13-next, 12-next milestone Oct 24, 2025
@leofang leofang changed the title [WIP] Revisit sccache for Windows Revisit sccache for Windows Oct 24, 2025
@leofang
Copy link
Member Author

leofang commented Oct 24, 2025

/ok to test

@leofang leofang marked this pull request as ready for review October 24, 2025 13:54
@copy-pr-bot
Copy link
Contributor

copy-pr-bot bot commented Oct 24, 2025

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@leofang
Copy link
Member Author

leofang commented Oct 24, 2025

PR description is updated

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR adds Windows sccache support to speed up CI builds by implementing a C-based compiler shim that intercepts MSVC compiler invocations. The change adds ci/tools/cl_shim.c, which is compiled into a custom cl.exe executable that wraps calls through sccache cl.exe <args>, and modifies the build workflow to compile the shim, prepend it to PATH, and set DISTUTILS_USE_SDK=1 to force setuptools to bypass its normal vswhere/vcvarsall mechanism. This integrates with the existing CI infrastructure by reusing sccache setup steps already present for Linux, extending them to Windows with the necessary GitHub Actions cache environment variables. The PR also fixes a minor Linux path typo (/host${{ ... }}/host/${{ ... }}).

Important Files Changed

Filename Score Overview
ci/tools/cl_shim.c 2/5 New C shim that wraps MSVC compiler calls through sccache; contains critical memory safety issues including unchecked realloc, buffer overflow risk in argument quoting, and incomplete Windows argument escaping
.github/workflows/build-wheel.yml 3/5 Adds Windows sccache support by finding real cl.exe, compiling shim, configuring PATH/environment variables for cibuildwheel; includes path handling concerns between Git Bash and Windows

Confidence score: 2/5

  • This PR has multiple critical memory safety issues in the C shim code that could cause crashes or undefined behavior in edge cases, and path handling between Unix-style shells and Windows may cause failures.
  • Score reflects serious bugs in cl_shim.c: unchecked realloc return (lines 16-17) that will corrupt memory on allocation failure, buffer overflow vulnerability in argument quoting logic (lines 37-50) when arguments contain multiple quotes, and incomplete Windows backslash-before-quote escaping that violates MSDN CommandLineToArgvW rules.
  • ci/tools/cl_shim.c requires immediate attention for memory safety fixes; the workflow file needs verification that which cl.exe in Git Bash produces paths compatible with the PowerShell compilation step and that cygpath -w conversions work correctly in all cibuildwheel contexts.

Sequence Diagram

sequenceDiagram
    participant GHA as GitHub Actions
    participant MSVC as MSVC Setup
    participant Compiler as Compile cl_shim.c
    participant CIBW as cibuildwheel
    participant Setuptools as setuptools
    participant Shim as cl.exe (shim)
    participant Sccache as sccache
    participant RealCL as cl.exe (real MSVC)

    GHA->>MSVC: Set up MSVC (ilammy/msvc-dev-cmd)
    MSVC->>GHA: MSVC environment ready, cl.exe in PATH
    
    GHA->>GHA: Find cl.exe location
    GHA->>GHA: Store path in CL_EXE env var
    GHA->>GHA: Set SCCACHE_WRAPPER_DIR
    
    GHA->>Compiler: Compile ci/tools/cl_shim.c
    Compiler->>GHA: Generate cl.exe shim executable
    
    GHA->>CIBW: Launch cibuildwheel with env vars
    Note over GHA,CIBW: DISTUTILS_USE_SDK=1<br/>PATH with SCCACHE_WRAPPER_DIR prepended<br/>CL_EXE pointing to real compiler
    
    CIBW->>Setuptools: Build wheel (setup.py)
    Note over Setuptools: DISTUTILS_USE_SDK=1 bypasses vswhere/vcvarsall
    
    Setuptools->>Shim: Invoke "cl.exe" from PATH
    Note over Shim: Found in SCCACHE_WRAPPER_DIR first
    
    Shim->>Shim: Read CL_EXE environment variable
    Shim->>Sccache: Execute "sccache <CL_EXE> <args>"
    
    Sccache->>Sccache: Check cache for compilation
    
    alt Cache hit
        Sccache->>Shim: Return cached result
    else Cache miss
        Sccache->>RealCL: Forward compilation to real cl.exe
        RealCL->>Sccache: Compilation result
        Sccache->>Sccache: Store in cache
        Sccache->>Shim: Return result
    end
    
    Shim->>Setuptools: Return exit code
    Setuptools->>CIBW: Build complete
    
    CIBW->>Sccache: Show cache statistics (CIBW_BEFORE_TEST)
    Sccache->>CIBW: Display stats
    
    CIBW->>GHA: Wheel artifacts ready
Loading

2 files reviewed, 6 comments

Edit Code Review Agent Settings | Greptile

static void ensure_capacity(char **buf, size_t *cap, size_t need) {
if (*cap >= need) return;
while (*cap < need) *cap *= 2;
*buf = (char*)realloc(*buf, *cap);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: realloc can fail and return NULL, which would overwrite the original pointer and leak memory. Check the return value before assignment.

static void append_arg(char **buf, size_t *len, size_t *cap, const char *arg) {
int quote = needs_quotes(arg);
if (quote) {
size_t need = *len + 3 + strlen(arg); /* space + quotes + content */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Size calculation underestimates space needed: doesn't account for existing characters that might need escaping (each " becomes \"). This can cause buffer overflow if arg contains multiple quotes.

Comment on lines +42 to +44
if (*p == '"') {
(*buf)[(*len)++] = '\\';
(*buf)[(*len)++] = '"';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Windows argument escaping is incomplete. Backslashes before quotes need special handling (\" requires \\\"). Consider referencing MSDN's CommandLineToArgvW rules or using a tested escaping library.

(*buf)[(*len)++] = '"';
(*buf)[*len] = '\0';
} else {
size_t need = *len + 1 + strlen(arg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: When not quoting, size calculation doesn't account for initial space character (should be *len + 1 + strlen(arg) + 1 for null terminator safety).

Comment on lines +69 to +72
char *cmd = (char*)malloc(cap);
if (!cmd) {
fprintf(stderr, "out of memory\n");
return 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Check malloc return value before dereferencing cmd on line74.

Comment on lines +77 to +78
strcpy(cmd, "sccache");
len = strlen(cmd);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Hardcoding "sccache" prevents users from overriding the sccache binary path. Consider adding a SCCACHE_EXE environment variable for flexibility.

@leofang
Copy link
Member Author

leofang commented Oct 24, 2025

@copilot please address review comments

@leofang leofang removed their assignment Oct 24, 2025
@leofang leofang requested a review from Copilot October 24, 2025 13:57
@leofang
Copy link
Member Author

leofang commented Oct 24, 2025

@copilot please address review comments by pushing to this PR directly

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements sccache support for Windows builds by introducing a compiled C shim that wraps the MSVC compiler. The approach bypasses setuptools' default compiler discovery mechanism and ensures sccache is invoked for all compilation commands.

Key changes:

  • Added a C-based shim executable that intercepts cl.exe calls and routes them through sccache
  • Modified GitHub Actions workflow to compile the shim and configure environment variables for sccache integration
  • Set DISTUTILS_USE_SDK=1 to make setuptools use the shim from PATH instead of discovering the compiler via vswhere/vcvarsall

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
ci/tools/cl_shim.c New C shim that wraps cl.exe invocations with sccache, handling argument quoting and process spawning
.github/workflows/build-wheel.yml Added steps to locate cl.exe, compile the shim, and configure environment variables for sccache on Windows

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

static void append_arg(char **buf, size_t *len, size_t *cap, const char *arg) {
int quote = needs_quotes(arg);
if (quote) {
size_t need = *len + 3 + strlen(arg); /* space + quotes + content */
Copy link

Copilot AI Oct 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The calculation for need doesn't account for escaped quotes within the argument. If the argument contains N double-quotes, you'll need N additional bytes for the backslashes. This could lead to buffer overflow. Calculate need by iterating through arg to count quotes, or use a more conservative estimate like *len + 3 + 2*strlen(arg).

Suggested change
size_t need = *len + 3 + strlen(arg); /* space + quotes + content */
size_t need = *len + 3 + 2*strlen(arg); /* space + quotes + content (worst case: every char is a quote) */

Copilot uses AI. Check for mistakes.
Comment on lines +41 to +48
for (const char *p = arg; *p; ++p) {
if (*p == '"') {
(*buf)[(*len)++] = '\\';
(*buf)[(*len)++] = '"';
} else {
(*buf)[(*len)++] = *p;
}
}
Copy link

Copilot AI Oct 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This escaping strategy is incomplete. According to Windows command-line parsing rules, backslashes before quotes need special handling - a sequence of N backslashes followed by a quote requires 2N+1 backslashes in the escaped form. The current implementation will fail for arguments like path\to\"file\". Consider implementing proper backslash escaping or documenting this limitation.

Suggested change
for (const char *p = arg; *p; ++p) {
if (*p == '"') {
(*buf)[(*len)++] = '\\';
(*buf)[(*len)++] = '"';
} else {
(*buf)[(*len)++] = *p;
}
}
const char *p = arg;
while (*p) {
size_t num_backslashes = 0;
while (*p == '\\') {
++num_backslashes;
++p;
}
if (*p == '"') {
/* Output 2N+1 backslashes, then the quote */
for (size_t i = 0; i < num_backslashes * 2 + 1; ++i)
(*buf)[(*len)++] = '\\';
(*buf)[(*len)++] = '"';
++p;
} else {
/* Output N backslashes, then the non-quote character (if any) */
for (size_t i = 0; i < num_backslashes; ++i)
(*buf)[(*len)++] = '\\';
if (*p) {
(*buf)[(*len)++] = *p;
++p;
}
}
}
/* At the end, double all trailing backslashes */
size_t num_trailing_backslashes = 0;
for (const char *q = arg + strlen(arg); q > arg && *(q-1) == '\\'; --q)
++num_trailing_backslashes;
for (size_t i = 0; i < num_trailing_backslashes; ++i)
(*buf)[(*len)++] = '\\';

Copilot uses AI. Check for mistakes.
static void ensure_capacity(char **buf, size_t *cap, size_t need) {
if (*cap >= need) return;
while (*cap < need) *cap *= 2;
*buf = (char*)realloc(*buf, *cap);
Copy link

Copilot AI Oct 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing null check after realloc. If realloc fails and returns NULL, the original pointer in *buf is leaked and subsequent operations will crash. Store the result in a temporary variable and check before assigning to *buf.

Suggested change
*buf = (char*)realloc(*buf, *cap);
char *tmp = (char*)realloc(*buf, *cap);
if (!tmp) {
fprintf(stderr, "Out of memory in ensure_capacity\n");
exit(1);
}
*buf = tmp;

Copilot uses AI. Check for mistakes.
@leofang leofang self-assigned this Oct 24, 2025
@leofang leofang closed this Oct 24, 2025
@leofang leofang reopened this Oct 24, 2025
@leofang
Copy link
Member Author

leofang commented Oct 24, 2025

/ok to test

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR enables sccache-based build caching for Windows CI by creating a native C shim that intercepts MSVC compiler invocations. The shim (ci/tools/cl_shim.c) is compiled to a binary named cl.exe and prepended to PATH, allowing it to wrap the real cl.exe with sccache calls. The GitHub Actions workflow (.github/workflows/build-wheel.yml) is updated to locate the real compiler, compile the shim, set DISTUTILS_USE_SDK=1 to force setuptools to use PATH-based compiler discovery, and propagate all necessary sccache environment variables (GitHub Actions cache tokens, SCCACHE_DIR, SCCACHE_GHA_ENABLED) to the cibuildwheel containers for all three Windows builds (cuda.bindings and two cuda.core variants). This approach avoids patching setuptools directly by leveraging its extension points, integrating with the existing CI infrastructure that already uses sccache on Linux. The shim reads the real compiler path from the CL_EXE environment variable and builds a properly escaped command line to invoke sccache with all forwarded arguments. The workflow also adds CIBW_BEFORE_TEST_WINDOWS to display cache statistics, providing visibility into caching performance.

Important Files Changed

Filename Score Overview
ci/tools/cl_shim.c 2/5 Adds a C shim executable that wraps cl.exe invocations with sccache by building an escaped command line and spawning the process via CreateProcessA
.github/workflows/build-wheel.yml 4/5 Updates Windows workflow to locate real cl.exe, compile the shim, set DISTUTILS_USE_SDK=1, prepend shim to PATH, and propagate sccache environment variables to all three cibuildwheel invocations

Confidence score: 2/5

  • This PR introduces significant correctness risks in the shim implementation that could cause build failures or silent misbehavior in production
  • Score reflects critical unresolved issues in cl_shim.c memory safety and argument escaping (unchecked realloc, insufficient buffer size calculations accounting for quote escaping, incomplete Windows backslash-before-quote handling per CommandLineToArgvW rules, missing malloc null check), any of which could cause crashes or incorrect compiler invocations; the workflow changes are well-structured but depend entirely on the shim's correctness
  • Pay very close attention to ci/tools/cl_shim.c lines 13-17, 34-58, and 67-78 where memory allocation, size calculations, and argument escaping occur—these sections contain multiple unfixed issues flagged in previous reviews

Sequence Diagram

sequenceDiagram
    participant GHA as GitHub Actions
    participant Workflow as build-wheel.yml
    participant MSVC as MSVC Setup
    participant Shim as cl_shim.c
    participant Python as Python/cibuildwheel
    participant Compiler as sccache + cl.exe

    GHA->>Workflow: Trigger workflow (win-64 platform)
    Workflow->>Workflow: Enable sccache action
    Workflow->>Workflow: Export GHA cache env vars
    Workflow->>Python: Set up Python 3.12
    Workflow->>MSVC: Set up MSVC dev environment
    MSVC->>Workflow: cl.exe now in PATH
    Workflow->>Workflow: Find cl.exe location
    Workflow->>Workflow: Set CL_EXE and SCCACHE_WRAPPER_DIR env vars
    Workflow->>Shim: Compile cl_shim.c to cl.exe
    Note over Shim: Creates shim cl.exe that wraps sccache
    Workflow->>Workflow: Set DISTUTILS_USE_SDK=1
    Workflow->>Workflow: Prepend SCCACHE_WRAPPER_DIR to PATH
    Workflow->>Python: Run cibuildwheel
    Python->>Python: Build wheel (setuptools invokes compiler)
    Python->>Shim: Execute "cl.exe" (finds shim first in PATH)
    Shim->>Shim: Read CL_EXE environment variable
    Shim->>Compiler: Execute "sccache <CL_EXE> <args>"
    Compiler->>Compiler: Check cache / compile
    Compiler->>Shim: Return exit code
    Shim->>Python: Return exit code
    Python->>Workflow: Wheel build complete
    Workflow->>Workflow: Show sccache stats
    Workflow->>GHA: Upload artifacts
Loading

2 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@leofang
Copy link
Member Author

leofang commented Oct 24, 2025

Observations

  • Good news: the "Post Enable sccache" step can report accurate (aggregated) stats on Windows (because there is no container)
  • Bad news: PY39-311 have poor cache hits (~8%)
    • Good news: other Python versions reach 100%
  • Bad news: sccache helps very little on Windows in terms of reducing the end-to-end time spent on the build stage. Unlike on Linux where we had a 2x speedup (10 mins -> 5 mins), on Windows it's 10% speedup only (10 mins -> 9 mins). I suspect what @cpcloud said that the linker also takes a good chunk of time is worse with MSVC.

It makes me question if this is worth pursuing.

cc @mdboom @cpcloud

@cpcloud
Copy link
Contributor

cpcloud commented Oct 24, 2025

Savings of only 1 minute while introducing 200 lines of C seems like a costly tradeoff. I'll wait an extra minute to avoid having to maintain a compiler wrapper written in C, every day of the week.

@leofang
Copy link
Member Author

leofang commented Oct 24, 2025

The C code should be re-written/fixed if we want to merge this, no doubt, and eventually removed in favor of pushing a patch to upstream setuptools for wrapping cl.exe. I can live with this crap for now. I was trying to say even that would not help. Getting 1 min speedup by stressing further our scarce cache space is a poor tradeoff to me, not to mention there are still mysterious cache misses.

@leofang leofang marked this pull request as draft October 24, 2025 14:42
@leofang
Copy link
Member Author

leofang commented Oct 24, 2025

(btw the C code was vibe-coded in commit 4f29e8f 😆 it looks ugly but it's nice that it was done in one shot to deliver a working prototype and unblock the experiment...)

@cpcloud
Copy link
Contributor

cpcloud commented Oct 24, 2025

I'm -1 on merging this for the above stated reasons.

Comment on lines +67 to +86
/* Build command line: sccache "<cl_path>" arg1 arg2 ... */
size_t cap = 1024;
char *cmd = (char*)malloc(cap);
if (!cmd) {
fprintf(stderr, "out of memory\n");
return 1;
}
cmd[0] = '\0';
size_t len = 0;
/* start with 'sccache' */
strcpy(cmd, "sccache");
len = strlen(cmd);

/* append the compiler path (quoted if necessary) */
append_arg(&cmd, &len, &cap, cl_path);

/* append the rest of the args (skip argv[0]) */
for (int i = 1; i < argc; ++i) {
append_arg(&cmd, &len, &cap, argv[i]);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot keeps insisting that this is the right thing to do. But I read the Windows API reference and I am going to remove all these stupid parsing in favor of a simple GetCommandLineW call. This really shouldn't be that difficult, and I suspect the cache misses might have to do with incorrect forwarding of the cmdline arguments (passed by setuptools).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI/CD CI/CD infrastructure enhancement Any code-related improvements P1 Medium priority - Should do

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants