chore: run all tests even on failure by olivermeyer · Pull Request #481 · aignostics/python-sdk

olivermeyer · 2026-03-17T14:57:51Z

Currently if one test suite fails, the following suites never run (e.g. if integration fails we never run E2E regular + long-running). This leads to under-reported code coverage.

Instead, we now run all tests regardless of whether the previous suite failed and then catch all failures in a single step.

Copilot

Pull request overview

Adjusts CI execution flow so coverage from long-running E2E tests is still produced on main/release tags even when regular E2E tests fail (e.g., due to transient 500/503s), helping keep Codecov coverage reporting stable for default-branch builds.

Changes:

Adds an explicit if: condition so the “E2E / long running” step runs on main and refs/tags/v* even after earlier step failures.
Preserves existing PR behavior by only running long-running E2E tests when earlier steps succeeded.

claude · 2026-03-17T15:02:29Z

Claude finished @olivermeyer's task in 2m 11s —— View job

PR Review: chore: run all tests even on failure

I'll analyze this PR and provide a comprehensive review.

Checklist:

Summary

✅ Excellent change! This PR solves a real problem with CodeCov coverage reporting by ensuring all test suites run to completion even when earlier suites fail. The implementation is well-designed and follows best practices.

Review Findings

✅ Strengths

Problem Clearly Identified: Under-reported coverage due to incomplete test execution is a legitimate issue that affects metrics accuracy.
Well-Structured Solution: The continue-on-error: true + final assertion gate pattern is industry standard and appropriate for this use case.
Comprehensive Coverage: All 5 test stages properly configured (unit, integration, e2e, long_running, very_long_running).
Clear Documentation: Inline comments explain the rationale effectively.
Proper Conditional Logic: Uses !cancelled() to allow manual cancellation while continuing on test failures.

🔍 Observations

1. Test Failures (Unrelated to PR Changes)

Two failing tests appear to be flaky/transient and not caused by this PR:

test_gui_cli_submit_to_run_result_delete - Assertion timing issue ("Application run cancelled!" found but assertion failed)
test_cli_run_submit_and_describe_and_cancel_and_download_and_delete - Timeout after 600s reading from SSL socket

These are environmental/timing issues, not workflow logic issues.

2. Conditional Logic is Correct

The PR description mentions ensuring E2E long-running tests run on main/tags even after regular E2E failures. Let me verify this is achieved:

Current logic (lines 207, 219):

# Long running step
if: ${{ !cancelled() }}

# Very long running step  
if: |
  !cancelled() && (
    contains(inputs.commit_message, 'enable:test:very_long_running') ||
    contains(github.event.pull_request.labels.*.name, 'enable:test:very_long_running')
  )

Analysis: ✅ The !cancelled() condition combined with continue-on-error: true on all preceding steps means:

Long-running tests will run on main/tags even if regular E2E fails
Very long-running tests will run (if enabled) even if all prior tests fail
This achieves the stated goal in the PR description

3. Documentation Update Needed

The .github/CLAUDE.md currently states (lines 215-222):

CI Behavior:

Draft PRs: Always skipped

Non-draft PRs: Run by default UNLESS: [skip conditions]

Main branch: Always run

Releases: Always run

This is still accurate but could be enhanced to clarify the new behavior that tests run even after failures on main/tags.

📝 Suggestions

1. Documentation Enhancement (Non-blocking)

Recommendation: Add a subsection to .github/CLAUDE.md around line 328 (after "Test Execution Flow in CI") to document this execution strategy:

### Test Execution Resilience

**All test steps use `continue-on-error: true`** to ensure every suite runs regardless of failures in earlier suites. This guarantees:

1. **Complete Coverage Reports**: CodeCov always receives data from all test suites, preventing under-reported coverage
2. **Full Visibility**: All test failures are visible in CI logs, not just the first failure
3. **Efficient Debugging**: Developers see all issues at once rather than fixing one at a time

**Final Gate**: A single "Assert no test failures" step at the end checks all test outcomes and fails the job if any suite failed. This preserves the expected CI behavior (job fails on test failure) while allowing reporting steps (Codecov, SonarQube, artifact upload) to complete.

**Execution Flow**:
```text
Unit Tests → Integration Tests → E2E Tests → Long Running → Very Long Running
     ↓              ↓                ↓             ↓              ↓
  (failures)   (failures)        (failures)    (failures)    (failures)
     ↓              ↓                ↓             ↓              ↓
     └──────────────┴────────────────┴─────────────┴──────────────┘
                                    ↓
                       Coverage Upload (always runs)
                       SonarQube Upload (always runs)
                       Artifact Upload (always runs)
                                    ↓
                       Assert no test failures
                         (fails job if any suite failed)


**Why this helps**: Future maintainers will understand the execution strategy's purpose and won't accidentally remove `continue-on-error` thinking it's a mistake.

#### 2. Consider Adding Job Summary Context (Optional Enhancement)

The final assertion step could optionally add context to the job summary showing which suites passed/failed:

```yaml
- name: Assert no test failures
  if: ${{ !cancelled() }}
  shell: bash
  run: |
    failed=()
    [[ "${{ steps.unit.outcome }}"              == "failure" ]] && failed+=("unit")
    [[ "${{ steps.integration.outcome }}"       == "failure" ]] && failed+=("integration")
    [[ "${{ steps.e2e.outcome }}"               == "failure" ]] && failed+=("e2e")
    [[ "${{ steps.long_running.outcome }}"      == "failure" ]] && failed+=("long-running")
    [[ "${{ steps.very_long_running.outcome }}" == "failure" ]] && failed+=("very-long-running")
    
    # Add summary to job output
    echo "## Test Suite Results" >> $GITHUB_STEP_SUMMARY
    echo "| Suite | Status |" >> $GITHUB_STEP_SUMMARY
    echo "|-------|--------|" >> $GITHUB_STEP_SUMMARY
    echo "| Unit | ${{ steps.unit.outcome }} |" >> $GITHUB_STEP_SUMMARY
    echo "| Integration | ${{ steps.integration.outcome }} |" >> $GITHUB_STEP_SUMMARY
    echo "| E2E | ${{ steps.e2e.outcome }} |" >> $GITHUB_STEP_SUMMARY
    echo "| Long Running | ${{ steps.long_running.outcome }} |" >> $GITHUB_STEP_SUMMARY
    echo "| Very Long Running | ${{ steps.very_long_running.outcome }} |" >> $GITHUB_STEP_SUMMARY
    
    if [[ ${#failed[@]} -gt 0 ]]; then
      echo "The following test suites failed: ${failed[*]}"
      exit 1
    fi
    echo "All test suites passed."

Benefit: Provides at-a-glance visibility of all suite results in the GitHub Actions summary.

Note: This is optional and not required for the PR to be approved.

Edge Cases Considered

✅ Manual Cancellation: !cancelled() allows users to cancel workflow without triggering unwanted test runs

✅ Skipped Steps: If a step is skipped (e.g., due to skip markers), outcome will be skipped not failure, so the assertion won't falsely fail

✅ PR vs Main Behavior: The logic works identically on PRs and main/tags, which aligns with the goal (complete coverage on both)

✅ Very Long Running Opt-in: Correctly preserves the enable-only behavior while applying the continue-on-error pattern

Workflow Validation

Commit Message: ✅ Follows conventional commits (chore: ...)

Changed Files: ✅ Only workflow file (.github/workflows/_test.yml)

No Breaking Changes: ✅ PR and main branch CI behavior preserved, only execution strategy changed

Verdict

✅ APPROVED (with documentation suggestion)

Blocking Issues: None

Suggestions:

📚 Documentation enhancement (non-blocking) - Add execution resilience section to .github/CLAUDE.md
🎯 Optional improvement - Add test suite summary table to final assertion step

Ready to Merge: Yes, pending CI completion (the two failing tests appear to be flaky environmental issues unrelated to this change)

Next Steps

For this PR: Monitor CI completion. If the two flaky tests continue failing, consider:
- Adding @pytest.mark.flaky(retries=3) to both tests
- Investigating root cause of timing/network issues
Follow-up PR (optional): Add documentation enhancement to .github/CLAUDE.md as suggested above

Excellent work on improving CI reliability and coverage reporting accuracy! 🎉

codecov · 2026-03-17T15:47:04Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.
see 8 files with indirect coverage changes

Copilot

Pull request overview

Updates the reusable CI test workflow to ensure all test suites run even if an earlier suite fails, improving the completeness of coverage/reporting uploads (e.g., to Codecov) while still failing the job at the end when any suite fails.

Changes:

Mark Unit/Integration/E2E/long-running/very-long-running test steps as continue-on-error: true and assign step IDs.
Add a final “Assert no test failures” gate step that fails the job if any suite’s step outcome is failure.
Refine the conditional for the “very long running” suite to include !cancelled() alongside the enable markers.

sonarqubecloud · 2026-03-18T07:58:35Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

olivermeyer requested a review from helmut-hoffer-von-ankershoffen as a code owner March 17, 2026 14:57

Copilot AI review requested due to automatic review settings March 17, 2026 14:57

olivermeyer added the claude Trigger Claude Code automation label Mar 17, 2026

Copilot started reviewing on behalf of olivermeyer March 17, 2026 14:58 View session

Copilot AI reviewed Mar 17, 2026

View reviewed changes

Comment thread .github/workflows/_test.yml Outdated

claude Bot reviewed Mar 17, 2026

View reviewed changes

Comment thread .github/workflows/_test.yml Outdated

olivermeyer changed the title ~~chore: run long running tests even if regular tests fail on main~~ chore: run all tests even on failure Mar 18, 2026

olivermeyer force-pushed the chore/fix-coverage branch from 1fbf8c4 to bda9f00 Compare March 18, 2026 07:05

chore: run all tests even on failure in previous steps

3ff7f16

olivermeyer force-pushed the chore/fix-coverage branch from bda9f00 to 3ff7f16 Compare March 18, 2026 07:17

Copilot AI review requested due to automatic review settings March 18, 2026 07:17

Copilot started reviewing on behalf of olivermeyer March 18, 2026 07:18 View session

sentry Bot reviewed Mar 18, 2026

View reviewed changes

Comment thread .github/workflows/_test.yml

Copilot AI reviewed Mar 18, 2026

View reviewed changes

Comment thread .github/workflows/_test.yml

Comment thread .github/workflows/_test.yml

olivermeyer merged commit 485f86f into main Mar 18, 2026
35 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: run all tests even on failure#481

chore: run all tests even on failure#481
olivermeyer merged 1 commit intomainfrom
chore/fix-coverage

olivermeyer commented Mar 17, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

claude Bot commented Mar 17, 2026 •

edited

Loading

Uh oh!

Uh oh!

codecov Bot commented Mar 17, 2026 •

edited

Loading

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

sonarqubecloud Bot commented Mar 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

olivermeyer commented Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

claude Bot commented Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: chore: run all tests even on failure

Summary

Review Findings

✅ Strengths

🔍 Observations

1. Test Failures (Unrelated to PR Changes)

2. Conditional Logic is Correct

3. Documentation Update Needed

📝 Suggestions

1. Documentation Enhancement (Non-blocking)

Edge Cases Considered

Workflow Validation

Verdict

✅ APPROVED (with documentation suggestion)

Next Steps

Uh oh!

Uh oh!

codecov Bot commented Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

sonarqubecloud Bot commented Mar 18, 2026

Quality Gate passed

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

olivermeyer commented Mar 17, 2026 •

edited

Loading

claude Bot commented Mar 17, 2026 •

edited

Loading

codecov Bot commented Mar 17, 2026 •

edited

Loading