CI hygiene: fail loudly, gate the merge queue, log on failure #876

@kovtcharov

Part of #875 (Tier 1).

Goal

Restore CI signal integrity. Today, multiple workflows swallow failures, lack timeouts, or delete logs, and only two workflows actually gate the merge queue.

Why

Per CLAUDE.md's "no silent fallbacks — fail loudly" rule, hiding failures at the workflow layer is the same anti-pattern as catching exceptions in code. When failures are hidden or non-blocking, contributors learn to ignore a red CI run.

Scope (single PR)

A. Stop swallowing failures

  • .github/workflows/test_code_agent.yml:98,107,120,128,143,153,179 — remove || true and [WARNING] non-blocking for now echoes; mark genuinely-broken tests xfail with reason
  • .github/workflows/test_gaia_cli_linux.yml:169,183,198 — replace || TEST_EXIT=$? + echo with proper exit $TEST_EXIT
  • .github/workflows/test_mcp.yml:201,226 — drop || true from gaia mcp test and gaia mcp stop
  • .github/workflows/build_cpp.yml:374-379 — promote integration/benchmark ::warning:: to job failure (or move to a separate non-required job with explicit naming)
  • .github/workflows/test_sd.yml:46,80,116,148 — remove three continue-on-error: true blocks; if SD requires Lemonade ≥ 9.2.0, fail loudly with an actionable error instead of silently SKIP_SD_TESTS=true
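
For illustration, a minimal shell sketch of the difference between swallowing and propagating an exit status. The helper name run_and_propagate is hypothetical and not taken from the workflows. It only shows the shape the fixed steps could take, assuming each step runs a single test command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the repo): run a command, log a failure,
# and return the command's own exit status so the CI step fails with it.
run_and_propagate() {
  local status=0
  "$@" || status=$?
  if [ "$status" -ne 0 ]; then
    echo "[ERROR] '$*' exited with status $status" >&2
  fi
  return "$status"
}

# Swallowed: the step reports success even though the command failed.
#   gaia mcp test || true
# Propagated: the step fails with the command's exit status.
#   run_and_propagate gaia mcp test || exit $?
```

The same effect is available without a helper by simply dropping the || true suffix. The helper is only useful where the step still wants to print a message before failing.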

B. Tighten the merge-queue gate

  • .github/workflows/merge-queue-notify.yml:18-21 — expand the workflow_run.workflows watch list to include the integration suites that should block merges, OR wire test_gaia_cli.yml's existing test-summary (line 84) as the single required check by adding the right pull_request trigger and removing the workflow_call-only constraint
  • Document the merge-gate policy in docs/reference/dev.mdx

C. Add timeouts and log uploads

  • Add timeout-minutes to: lint.yml, test_unit.yml:42, test_chat_agent.yml:46, test_code_agent.yml:40, test_security.yml:42,133, test_rag.yml:39,73, pypi.yml:16. Use a default of 60 minutes or less.
  • Add if: failure() artifact upload of lemonade.log, pytest XML, and any agent stdout to: test_gaia_cli_linux.yml, test_chat_agent.yml, test_code_agent.yml, test_security.yml, test_unit.yml
  • Remove the rm -f lemonade.log in .github/workflows/test_gaia_cli_linux.yml:213

D. Re-enable Dependabot

  • .github/dependabot.yml lines 21, 37, 57, 74, 89, 106, 121 — change all seven open-pull-requests-limit: 0 to a real number (suggest 5)
  • Add assignees: [kovtcharov-amd] to security ecosystems

E. Post-publish smoke

  • Append a job to .github/workflows/publish.yml that runs on a fresh hosted Ubuntu after the PyPI upload completes: pip install gaia==<published-version> + gaia --version + gaia --help. Fail the workflow (and ideally roll back the GitHub release tag) if this fails.
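
A rough sketch of what the smoke step could run, assuming the published version is passed in as an argument. The function name smoke_test and the PIP and CLI override variables are hypothetical. They exist only so the logic can be exercised without touching PyPI, and each must name a single executable. Rolling back the release tag is not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical smoke check (names are illustrative, not from the repo).
# PIP and CLI default to the real tools and can be overridden for testing.
smoke_test() {
  local version="${1:-}"
  local pip="${PIP:-pip}"
  local cli="${CLI:-gaia}"
  if [ -z "$version" ]; then
    echo "usage: smoke_test <published-version>" >&2
    return 2
  fi
  # Each step fails loudly and stops the check at the first error.
  "$pip" install "gaia==$version" || {
    echo "[ERROR] install of gaia==$version failed" >&2
    return 1
  }
  "$cli" --version || {
    echo "[ERROR] '$cli --version' failed" >&2
    return 1
  }
  "$cli" --help >/dev/null || {
    echo "[ERROR] '$cli --help' failed" >&2
    return 1
  }
  echo "smoke test passed for gaia==$version"
}
```

In the workflow, the job would call smoke_test with the version produced by the upload job and let a non-zero return fail the run.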

Acceptance criteria

  • grep -RE '\|\| true|TEST_EXIT' .github/workflows/test_*.yml returns 0 hits
  • A deliberately failing test in any of the integration sub-workflows blocks the merge queue
  • A failing CI run leaves logs in artifacts (no need to re-run locally to debug)
  • A Dependabot PR opens within one week of merge
  • Deliberately publishing a broken wheel fails the post-publish smoke before the release is announced
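
The first criterion could be scripted so it can run as a lint step. This is a sketch only: the function name check_no_swallowed_failures is hypothetical, and the directory argument defaults to the path named above. It treats grep's "no match" status (1) as success and any other grep error as a failure, so a missing directory does not pass silently.

```shell
#!/usr/bin/env bash
# Hypothetical check (not from the repo): pass only when no test workflow
# still contains '|| true' or 'TEST_EXIT'.
check_no_swallowed_failures() {
  local dir="${1:-.github/workflows}"
  local hits status=0
  hits=$(grep -nHE '\|\| true|TEST_EXIT' "$dir"/test_*.yml) || status=$?
  case "$status" in
    0) # grep found matches: report them and fail.
       echo "Swallowed failures remain:" >&2
       echo "$hits" >&2
       return 1 ;;
    1) # grep found nothing: this is the passing case.
       echo "0 hits" ;;
    *) # grep itself failed (for example, no files matched the glob).
       echo "[ERROR] grep failed with status $status" >&2
       return "$status" ;;
  esac
}
```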

Out of scope

  • New tests (separate sub-issues)
  • Coverage aggregation (separate sub-issue)
  • macOS / Python matrix (separate sub-issue)

Metadata

Assignees

No one assigned

Labels

  • consumer: Blocks consumer adoption; must ship for the v0.20.0 consumer launch window
  • devops: DevOps/infrastructure changes
  • domain:quality: Tests, CI/CD, security, performance, evals
  • p0: high priority
  • tech debt
  • tier-0: Tier 0, must ship before parallel agent work begins (test/CI prerequisites)
  • track:platform: Foundation that both consumer-app and oem-pc tracks consume
