CI: test case prioritization in macOS workflow #7335
Conversation
@@ -0,0 +1 @@
+ This PR intends to implement testcase prioritization in macos workflow. (no newline at end of file)
[pre-commit] reported by reviewdog 🐶
Suggested change: end the file with a trailing newline after "This PR intends to implement testcase prioritization in macos workflow."
What about using the "debug logging" flag that you can use for manual reruns in GitHub Actions? I think it might be useful for that kind of fallback, providing an escape hatch. We might want to have clean runs for the main branch and the release branches that we backport to, especially when doing releases. We would want to avoid cache poisoning.

Your idea also makes me think of Codecov ATS (automated test selection); I don't know where it's at now, or if it still exists at all.

I think you forgot to commit some files to your PR.
        name: testreport-macOS
        path: testreport
        retention-days: 3   (no newline at end of file)
[pre-commit] reported by reviewdog 🐶
Suggested change: add a trailing newline after "retention-days: 3".
@echoix @wenzeslaus I have pushed the code and tested it against my fork. I have added flag structures, so if anything goes wrong we can simply enable the flags and avoid any downtime. I also used AI to help with the scripting, so if there is anything you don't like, please let me know and I will change it. I am open to suggestions, thoughts, and critical points.

I haven't implemented handling for specific branches yet, but I can do that; for that as well we can use flags.
Hello everyone,
The intention of this PR is to reduce CI time by implementing test case prioritization in the macOS workflow.
Recently, my paper ["PrioTestCI: Efficient Test Case Prioritization in GitHub Workflows for CI Optimization"](https://ieeexplore.ieee.org/document/11334426) was accepted at the IEEE/ACM International Conference on Automated Software Engineering (ASE) 2025. In that paper, we applied test case prioritization on a fork of the pytest repository and saw a significant reduction in CI runtime (81.55% on average). Detailed results can be found in the paper.
I believe the same approach could be applied to the GRASS repo, so over the past few months I have been working on implementing it here.
How it works
- First run on a PR: all test cases run as usual. The results (passed and failed) are stored as GitHub Artifacts.
- On subsequent commits to the same PR:
  - Check the artifacts from the previous run for any failed test cases.
  - If failed tests exist, re-run those first.
  - If they still fail → stop early and provide immediate feedback to the developer.
  - If they now pass → continue running the remaining test cases.
Why run failed tests first? When a contributor pushes follow-up commits, they are usually trying to fix an issue. There is a high chance the same test case will fail again. By running those first, we can stop early and free up the runner so other PRs get a chance to run.
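To make the flow concrete, here is a minimal sketch of what the ordering could look like as workflow steps. The artifact layout (a `previous_results/failed_tests.txt` file with one pytest node ID per line) and the step names are illustrative assumptions, not the exact implementation in this PR:

```yaml
# Illustrative sketch only; file names and layout are assumptions.
- name: Re-run previously failed tests first
  run: |
    # failed_tests.txt: one pytest node ID per line, restored from the
    # previous run's artifact for this PR. If any of these still fail,
    # this step fails and the job stops early.
    if [ -s previous_results/failed_tests.txt ]; then
      xargs pytest < previous_results/failed_tests.txt
    fi

- name: Run remaining tests
  run: |
    # Deselect the tests already re-run above; if there are no previous
    # failures (or no previous run), just run the full suite.
    if [ -s previous_results/failed_tests.txt ]; then
      sed 's/^/--deselect=/' previous_results/failed_tests.txt | xargs pytest
    else
      pytest
    fi
```

Whether the remaining tests are selected with `--deselect`, a test list, or a custom ordering script is an implementation detail; the key point is that the previously failing tests run (and can fail the job) before anything else.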
No mixing of results across PRs
Each PR gets its own isolated artifact storage scoped by PR ID. For example, if there are two open PRs (PR-1135 and PR-1127), the artifacts are stored in separate folders (pr_1135/ and pr_1127/). When fetching previous results, we only retrieve the artifacts for that specific PR.
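One way to get this isolation (the names here are illustrative; the actual naming scheme in this PR may differ) is to include the PR number directly in the artifact name when uploading:

```yaml
# Illustrative: one artifact per PR, e.g. testreport-pr-1135 and
# testreport-pr-1127, so a later run only fetches its own PR's results.
- name: Upload test results for this PR
  uses: actions/upload-artifact@v4
  with:
    name: testreport-pr-${{ github.event.pull_request.number }}
    path: testreport
    retention-days: 3
```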
Deployment strategy — looking for your opinions
The main concern is stability: what if something goes wrong with the prioritization logic and CI itself breaks? I want to minimize downtime and would appreciate your feedback on the following strategies:
Strategy A (Simple rollback): If anything goes wrong, revert to the previous macOS workflow and disable the new one. This is the simplest approach.
Strategy B (Automatic fallback flag): Use a fallback flag that starts as false at the beginning of each run. The workflow performs a series of checks (Is the GitHub API responding? Did the previous artifact download successfully? Did the JSON parse correctly?). If any check fails, the flag flips to true and the workflow automatically falls back to running tests using the original logic — no manual intervention needed. This works independently for each test section (pytest and gunittest), so a failure in one section doesn't affect the other.
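As a rough illustration of Strategy B (the step ID, helper script names, and the specific check shown are assumptions for the sketch, not the exact logic in this PR), the fallback can be driven by the outcome of a validation step:

```yaml
# Illustrative sketch: if fetching or validating the previous results fails,
# fall back to the original full-suite logic instead of failing the job.
- name: Validate previous results
  id: previous
  continue-on-error: true
  run: |
    # Hypothetical check: the downloaded JSON must exist and parse cleanly.
    python3 -c "import json; json.load(open('previous_results/results.json'))"

- name: Prioritized test run
  if: steps.previous.outcome == 'success'
  run: ./ci/run_prioritized_tests.sh   # hypothetical helper script

- name: Full test run (fallback)
  if: steps.previous.outcome != 'success'
  run: ./ci/run_all_tests.sh           # hypothetical helper script
```

The same pattern can be repeated independently for the pytest and gunittest sections so that a fallback in one does not affect the other.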
These are the deployment strategies I have so far. I am also actively looking into other approaches. Please let me know your thoughts and suggestions.
References
📄 Paper: [PrioTestCI — IEEE Xplore](https://ieeexplore.ieee.org/document/11334426)
📊 Workflow Diagram: View Diagram