@jameslamb
Member

Description

This project runs nightly builds and tests on a cron schedule:

name: Trigger Nightly cuOpt Pipeline
on:
  workflow_dispatch:
  schedule:
    - cron: "0 5 * * *" # 5am UTC / 1am EST

Tests need to wait for builds to finish, and that's currently done with some shell scripts that hit the GitHub API, using a mix of sleep and polling.

This has sometimes resulted in nightly failures (network errors, timeouts, etc.). This PR proposes reducing the risk of such failures by moving that logic into GitHub Actions configuration directly, specifically:

  • making build.yaml and test.yaml re-usable workflows and having nightly.yaml call them with uses:
  • blocking test.yaml until build.yaml passes using needs:
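
As a rough sketch of what those two bullets could look like in `nightly.yaml` (not the actual diff from this PR): the job names `build` and `test` and the use of `secrets: inherit` are assumptions for illustration, and both called workflows would need to declare `on: workflow_call`.

```yaml
# nightly.yaml (illustrative sketch, not the real file)
name: Trigger Nightly cuOpt Pipeline
on:
  workflow_dispatch:
  schedule:
    - cron: "0 5 * * *"

jobs:
  build:
    # Calls build.yaml as a reusable workflow.
    # build.yaml must include `workflow_call:` under its `on:` key.
    uses: ./.github/workflows/build.yaml
    secrets: inherit  # assumption: the called workflow needs the caller's secrets

  test:
    # GitHub Actions will not start this job until `build` has succeeded,
    # so no sleep-and-poll script is needed.
    needs: build
    uses: ./.github/workflows/test.yaml
    secrets: inherit
```

With this layout the ordering is enforced by the Actions scheduler itself, so a failed or cancelled `build` job causes `test` to be skipped rather than timing out.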

Issue

Closes #122

Checklist

  • I am familiar with the Contributing Guidelines.
  • Testing
    • New or existing tests cover these changes
    • Added tests
    • Created an issue to follow-up
    • NA
  • Documentation
    • The documentation is up to date with these changes
    • Added new documentation
    • NA

@jameslamb added the "non-breaking" (Introduces a non-breaking change) and "improvement" (Improves an existing functionality) labels Aug 28, 2025
@copy-pr-bot

copy-pr-bot bot commented Aug 28, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@rgsl888prabhu
Collaborator

@jameslamb Please let me know if it is in a good state to be moved to review

@jameslamb
Member Author

> @jameslamb Please let me know if it is in a good state to be moved to review

Not yet, will @ when it's ready.

@jameslamb jameslamb mentioned this pull request Sep 22, 2025
8 tasks
@jameslamb
Member Author

/ok to test

rapids-bot bot pushed a commit that referenced this pull request Sep 22, 2025
I'd missed `cuopt` in rapidsai/shared-workflows#376, and also want to pull these changes off of #359 to shrink the diff there.

## Issue

Helps with #122 (via making #359 a bit easier to review)

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Ramakrishnap (https://github.com/rgsl888prabhu)

URL: #407
@jameslamb
Member Author

Closing this. I think the combination of #409 and #408 will be a simpler, more resilient solution.

@jameslamb jameslamb closed this Sep 23, 2025
@jameslamb jameslamb deleted the fix/nightly-tests branch September 23, 2025 04:08
rapids-bot bot pushed a commit that referenced this pull request Sep 24, 2025
Replaces #359 (my more-complicated earlier attempt at this)

This project runs nightly builds and tests on a cron schedule:

https://github.com/NVIDIA/cuopt/blob/36a6a1c0edf42cec2cf07c6be3f16531f33515de/.github/workflows/nightly.yaml#L1-L6

Tests need to wait for builds to finish, and that's currently done with some shell scripts that hit the GitHub API, using a mix of `sleep` and polling.

This has sometimes resulted in nightly failures (network errors, timeouts, etc.). This PR proposes reducing the risk of such failures by moving that logic into GitHub Actions configuration directly, specifically:

* making `build.yaml` trigger `test.yaml` with the GitHub CLI **only after all package builds and publishing have finished**
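
A minimal sketch of that idea, not the merged change itself: the job names (`package-build`, `docker-build`, `trigger-tests`) are hypothetical placeholders, and it assumes `test.yaml` accepts `workflow_dispatch` events.

```yaml
# build.yaml (illustrative sketch; job names are hypothetical)
jobs:
  package-build:
    runs-on: ubuntu-latest
    steps:
      - run: echo "build and publish packages here"

  docker-build:
    # Image builds still wait for packages, but nothing waits on them.
    needs: package-build
    runs-on: ubuntu-latest
    steps:
      - run: echo "build docker images here"

  trigger-tests:
    # List every package build/publish job here, but NOT the image builds,
    # so tests start as soon as the packages are available.
    needs: [package-build]
    runs-on: ubuntu-latest
    permissions:
      actions: write  # needed to dispatch another workflow
    steps:
      - name: Trigger test.yaml
        env:
          GH_TOKEN: ${{ github.token }}
          REPO: ${{ github.repository }}
          REF: ${{ github.ref_name }}
        run: |
          gh workflow run test.yaml --repo "${REPO}" --ref "${REF}"
```

Because the dispatch only happens after `needs:` is satisfied, a failed package build means `test.yaml` is never started, instead of starting and then failing while waiting for artifacts.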

## Issue

Contributes to #122

## Notes for Reviewers

### How I tested this

I manually triggered this run of the "Trigger Nightly cuOpt Pipeline": https://github.com/NVIDIA/cuopt/actions/runs/17935159871

Which triggered this `build` run: https://github.com/NVIDIA/cuopt/actions/runs/17935161536

Which triggered this `test` run:  https://github.com/NVIDIA/cuopt/actions/runs/17936474025

Things look ok to me!

The `test` run wasn't triggered until after all the relevant package builds and uploads were done, and it started BEFORE the docker image builds were done (as intended, to not be delayed waiting on them).

There are some test failures from artifact-downloading, like this:

```text
[rapids-github-run-id] Querying the GitHub API to determine relevant run of 'build.yaml'.
Downloading and decompressing cuopt_wheel_python_cuopt_server_cu12_py312_x86_64 from Run ID 17936253863 into /tmp/tmp.pqrBXIhMlP
```

But I think they'll be fixed by merging #409 

And the naming changes for the image builds look good 😁 

<img width="317" height="203" alt="image" src="https://github.com/user-attachments/assets/31bac7bd-1c4d-4c31-9ce9-9863778c2e89" />

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Ramakrishnap (https://github.com/rgsl888prabhu)

URL: #408

Development

Successfully merging this pull request may close these issues.

[BUG] Nightly build and testing is failing