Copilot AI (Contributor) commented Aug 25, 2025

Fixes #810 by updating both the Linux and Windows test workflows to always fetch the cuda-pathfinder wheel from the latest successful run on the main branch, rather than from the current branch's build artifacts. It also removes redundant tests for components that are not actively maintained in this branch.

Problem

The current CI design has issues with outdated artifacts and redundant testing:

  1. Outdated pathfinder artifacts: The 12.9.x branch test workflows download cuda-pathfinder from the same branch as other components. However, cuda-pathfinder is only actively maintained on the main branch, so using artifacts from the 12.9.x branch could lead to outdated or inconsistent pathfinder functionality.

  2. Redundant component testing: The workflows run tests for cuda-pathfinder and cuda.core on backport branches, but since these components are only developed on main, these tests are redundant and don't add value.

Solution

This PR modifies both test-wheel-linux.yml and test-wheel-windows.yml to:

  1. Replace the existing artifact download step for cuda-pathfinder with new steps that fetch from the main branch
  2. Follow the existing pattern used by the "Download cuda-python & cuda.bindings build artifacts from the prior branch" steps
  3. Ensure cross-platform compatibility with proper bash/PowerShell implementations
  4. Remove all pathfinder and cuda.core test steps since the focus should be on ensuring cuda.bindings 12.x works correctly with the latest versions of these components, not testing them independently

The new pathfinder download steps:

  • Install GitHub CLI if not available
  • Query the latest successful CI run on the main branch
  • Download the cuda-pathfinder-wheel artifact from that run
  • Place it in the expected ./cuda_pathfinder directory
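
The steps above can be sketched in shell roughly as follows. This is an illustrative sketch, not the code from the PR: the function names and the default workflow file name `ci.yml` are assumptions, while the artifact name `cuda-pathfinder-wheel` and the target directory `./cuda_pathfinder` come from the description above. It relies on `gh run list` and `gh run download`, and includes the run ID check mentioned under error handling.

```shell
#!/usr/bin/env bash
# Sketch: fetch the cuda-pathfinder wheel from the latest successful run on main.
# Assumptions (not taken from the PR): function names, workflow file "ci.yml".
# Inside GitHub Actions, gh needs GH_TOKEN set in the step's environment.

# Print the ID of the latest successful run of a workflow on a branch.
latest_successful_run_id() {
  local repo="$1" branch="$2" workflow="$3"
  gh run list --repo "$repo" --branch "$branch" --workflow "$workflow" \
    --status success --limit 1 --json databaseId --jq '.[0].databaseId'
}

# Download the wheel artifact from that run into ./cuda_pathfinder.
fetch_pathfinder_wheel() {
  local repo="$1" workflow="${2:-ci.yml}" run_id
  run_id="$(latest_successful_run_id "$repo" main "$workflow")" || return 1
  # Validate the run ID before attempting the download.
  if [ -z "$run_id" ] || [ "$run_id" = "null" ]; then
    echo "Could not find a successful run on main" >&2
    return 1
  fi
  mkdir -p ./cuda_pathfinder
  gh run download "$run_id" --repo "$repo" \
    --name cuda-pathfinder-wheel --dir ./cuda_pathfinder
}
```

A workflow step could then call `fetch_pathfinder_wheel NVIDIA/cuda-python`, passing the real workflow file name as the second argument if it differs from the assumed default.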

Key Features

  • Consistent behavior: Both workflows now guarantee they use the same, up-to-date pathfinder from main
  • Focused testing: Removed redundant component tests; workflows focus on cuda.bindings compatibility with latest versions
  • Error handling: Proper validation that the latest run ID is found before attempting download
  • Backward compatibility: Maintains the same directory structure expected by downstream workflow steps
  • Follows existing patterns: Uses the same approach as the existing cross-branch artifact fetching logic

This change ensures that 12.9.x branch testing always uses the maintained implementations from main for compatibility testing with cuda.bindings, while eliminating redundant testing and the risk of using stale code from components not maintained in this branch.


copy-pr-bot bot (Contributor) commented Aug 25, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
@Copilot Copilot AI changed the title Aug 25, 2025
  from: [WIP] Please fix NVIDIA/cuda-python#810 targeting the 12.9.x branch. There should be a job step in both Linux/Windows tests that looks similar to this step "Download cuda-python & cuda.bindings build artifacts from the prior branch", but instead always fetch...
  to: CI: Fix pathfinder CI infrastructure to fetch from main branch
Copilot finished work on behalf of leofang August 25, 2025 22:09
@Copilot Copilot AI requested a review from leofang August 25, 2025 22:09
leofang (Member) left a comment

leofang (Member) commented Aug 25, 2025

@copilot address the comments above

Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
Copilot AI (Contributor, Author) commented Aug 25, 2025

@copilot address the comments above

Addressed both comments: consolidated GitHub CLI installation in Linux workflow (removed duplication) and fixed step ordering in Windows workflow to ensure gh is available before pathfinder download. Changes made in commit c110dcf.
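
One way to keep a single install step safe to run ahead of every step that needs gh is a small guard that skips the install when the tool is already on the PATH. The helper below is a hypothetical sketch (the name `ensure_tool` is not from the PR); it only wraps whatever install command is passed to it.

```shell
# Hypothetical helper: run the given install command only if the tool is missing.
ensure_tool() {
  local tool="$1"
  shift
  # Nothing to do when the tool is already available on the PATH.
  if command -v "$tool" >/dev/null 2>&1; then
    return 0
  fi
  # Otherwise run the remaining arguments as the install command.
  "$@"
}
```

For example, `ensure_tool gh sh -c 'apt update && apt install gh -y'` would install gh only on runners that lack it, assuming the GitHub CLI apt repository has already been configured.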

Copilot finished work on behalf of leofang August 25, 2025 22:30
@Copilot Copilot AI requested a review from leofang August 25, 2025 22:30
leofang (Member) commented Aug 25, 2025

/ok to test c110dcf

@leofang leofang added the labels bug (Something isn't working), P0 (High priority - Must do!), CI/CD (CI/CD infrastructure), and cuda.pathfinder (Everything related to the cuda.pathfinder module) Aug 25, 2025
@leofang leofang added this to the cuda-python 13-next, 12-next milestone Aug 25, 2025
leofang (Member) commented Aug 25, 2025

@copilot please remove all pathfinder test steps from the test workflows. As mentioned, we only develop (and thus test) pathfinder on the main branch. So for the backport branch, the focus is to ensure cuda.bindings 12.x can run correctly with the latest pathfinder.

Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
Copilot AI (Contributor, Author) commented Aug 25, 2025

@copilot please remove all pathfinder test steps from the test workflows. As mentioned we only develop (and thus) pathfinder on the main branch. So for the backport branch, the focus is to ensure cuda...

Removed all pathfinder test steps from both Linux and Windows test workflows as requested. The workflows now focus on ensuring cuda.bindings 12.x works correctly with the latest pathfinder from main, without redundantly testing pathfinder itself. Fixed in commit 7f46e6c.

@Copilot Copilot AI changed the title Aug 25, 2025
  from: CI: Fix pathfinder CI infrastructure to fetch from main branch
  to: CI: Fix pathfinder CI infrastructure and remove redundant pathfinder testing
Copilot finished work on behalf of leofang August 25, 2025 23:04
leofang (Member) commented Aug 25, 2025

/ok to test 7f46e6c

leofang (Member) commented Aug 25, 2025

@copilot for the same reasons, we should remove the cuda.core tests from both test workflows. Only cuda-bindings and cuda-python 12.x are maintained in this branch.

Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
Copilot AI (Contributor, Author) commented Aug 25, 2025

@copilot for the same reasons, we should remove the cuda.core tests from both test workflows. Only cuda-bindings and cuda-python 12.x are maintained in this branch.

Removed all cuda.core test steps from both Linux and Windows test workflows as requested. The workflows now focus only on cuda-bindings and cuda-python 12.x testing, which are the components maintained in this branch. Fixed in commit 402759e.

@Copilot Copilot AI changed the title Aug 25, 2025
  from: CI: Fix pathfinder CI infrastructure and remove redundant pathfinder testing
  to: CI: Fix pathfinder CI infrastructure and remove redundant component testing
Copilot finished work on behalf of leofang August 25, 2025 23:53
@leofang leofang requested review from cryos and rwgk August 26, 2025 00:06
@leofang leofang marked this pull request as ready for review August 26, 2025 00:06
rwgk (Collaborator) left a comment

Just a question about a potential simplification. If it looks interesting, we could do that in a separate PR. I see the code is just moved around in this PR.

Comment on lines +152 to +162
      - name: Install GitHub CLI
        # gh is needed for artifact fetching in subsequent steps
        run: |
          # See https://github.com/cli/cli/blob/trunk/docs/install_linux.md#debian-ubuntu-linux-raspberry-pi-os-apt.
          mkdir -p -m 755 /etc/apt/keyrings \
            && out=$(mktemp) && wget -nv -O$out https://cli.github.com/packages/githubcli-archive-keyring.gpg \
            && cat $out | tee /etc/apt/keyrings/githubcli-archive-keyring.gpg > /dev/null \
            && chmod go+r /etc/apt/keyrings/githubcli-archive-keyring.gpg \
            && echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" | tee /etc/apt/sources.list.d/github-cli.list > /dev/null \
            && apt update \
            && apt install gh -y
Collaborator

Did you look into using an existing action?

This is from ChatGPT, I didn't try it out:

      - name: Setup GitHub CLI
        uses: actions4gh/setup-gh@v1
        with:
          gh-version: latest         # or specify "2.65.0", etc.
          token: ${{ secrets.GITHUB_TOKEN }}

Member

It seems setup-gh is no longer maintained... 😢 https://github.com/actions4gh/setup-gh

Member

FWIW we already have an install_unix_deps action. Perhaps we could just move these bits there to allow using it to install gh?

leofang (Member) commented Aug 26, 2025

/ok to test 402759e

@leofang leofang enabled auto-merge (squash) August 26, 2025 17:57
@leofang leofang merged commit 6a83341 into 12.9.x Aug 26, 2025
42 checks passed
@leofang leofang deleted the copilot/fix-13ebb0da-7044-4524-bd83-05314e7e7e53 branch August 26, 2025 18:19
@leofang leofang linked an issue Aug 26, 2025 that may be closed by this pull request
@cpcloud cpcloud mentioned this pull request Oct 8, 2025
Labels

bug (Something isn't working), CI/CD (CI/CD infrastructure), cuda.pathfinder (Everything related to the cuda.pathfinder module), P0 (High priority - Must do!)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CI: Fix pathfinder CI infrastructure

3 participants