feat(ci): add post-publish smoke test to validate PyPI install #1239

Merged
kovtcharov-amd merged 3 commits into main from feat/1156-post-publish-smoke
May 29, 2026

Conversation

@kovtcharov-amd
Collaborator

After publish.yml publishes to PyPI, there's no verification that the package actually installs and works — a broken wheel can reach users undetected. This adds a post-publish-smoke job that installs amd-gaia from PyPI on a fresh runner and verifies all CLI entry points and the Python import work with the correct version.

Test plan

  • Review the new post-publish-smoke job in .github/workflows/publish.yml
  • Confirm needs: [validate, publish-pypi] and if: needs.publish-pypi.result == 'success' are correct
  • Confirm retry loop handles PyPI propagation delay (5 attempts × 30s)
  • Confirm no continue-on-error: true — failure is loud
  • Verify the job runs in parallel with github-release (not blocking it)

Closes #1156

@github-actions github-actions Bot added the devops DevOps/infrastructure changes label May 29, 2026
@github-actions
Contributor

The "Verify Python import" step will throw an AttributeError on every publish run because gaia/__init__.py doesn't export __version__, so the smoke test that's supposed to catch broken releases will itself be broken from day one. One blocking fix needed; everything else looks solid.


Summary

The new post-publish-smoke job correctly chains off publish-pypi, retries for PyPI propagation delay, and fails loudly (no continue-on-error). The dependency graph is clean — github-release runs in parallel since it doesn't need post-publish-smoke. Only one step has a functional bug, but it's in the critical verification path.


Issues Found

🔴 Critical — gaia.__version__ doesn't exist (publish.yml:449–454)

The "Verify Python import" step runs:

python -c "import gaia; print(gaia.__version__)"

gaia/__init__.py exports Agent, DatabaseAgent, tool, etc. — but not __version__. The version constant lives in gaia.version.__version__ (src/gaia/version.py:9). This step will always raise AttributeError: module 'gaia' has no attribute '__version__', causing the smoke test to fail on every release.

Two equally valid fixes — pick one:

Option A — import from the correct submodule (tests the internal constant):

          IMPORT_VERSION=$(python -c "from gaia.version import __version__; print(__version__)")

Option B — use importlib.metadata (tests what PyPI registered, arguably more appropriate for a post-publish check):

          IMPORT_VERSION=$(python -c "from importlib.metadata import version; print(version('amd-gaia'))")
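
Option B can be sketched as a small Python helper. This is a minimal illustration under stated assumptions, not the workflow's actual step: `check_installed_version` and its `lookup` parameter are hypothetical names introduced here. The only library call relied on is `importlib.metadata.version`, which raises `PackageNotFoundError` when the distribution is not installed.

```python
from importlib.metadata import PackageNotFoundError, version


def check_installed_version(dist_name, expected, lookup=version):
    """Compare the installed distribution's version with the expected one.

    `lookup` defaults to importlib.metadata.version. It is injectable only so
    the check can be exercised without installing anything (an assumption of
    this sketch, not part of the workflow).
    """
    try:
        found = lookup(dist_name)
    except PackageNotFoundError:
        # Fail loudly with a specific reason instead of returning "".
        return False, f"{dist_name} is not installed"
    if found != expected:
        return False, f"version mismatch: expected {expected}, found {found}"
    return True, f"{dist_name}=={found}"
```

Because the comparison is an exact string equality, a tag of 0.18.1 does not accept an installed 0.18.10.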

🟢 Minor — gaia-emr not exercised (publish.yml:440–444)

setup.py registers five entry points: gaia, gaia-cli, gaia-mcp, gaia-code, and gaia-emr. The step tests three of them:

gaia --help
gaia-code --help
gaia-mcp --help

gaia-emr --help is missing. A broken gaia-emr install (e.g. a missing optional dependency imported at module load) would go undetected.

          gaia --help
          gaia-code --help
          gaia-mcp --help
          gaia-emr --help
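
The same check can be expressed as a loop, so that adding an entry point means adding one name to a list. The sketch below is a hypothetical Python equivalent of the shell lines above; `failing_commands` is an illustrative name, and it treats a missing executable, a timeout, or a non-zero exit code as a failure.

```python
import subprocess
import sys


def failing_commands(commands, args=("--help",), timeout=60):
    """Run each command with `args` and return the ones that did not succeed."""
    failed = []
    for cmd in commands:
        try:
            result = subprocess.run(
                [cmd, *args], capture_output=True, timeout=timeout
            )
        except (FileNotFoundError, subprocess.TimeoutExpired):
            # Entry point not on PATH, or it hung: both count as broken.
            failed.append(cmd)
            continue
        if result.returncode != 0:
            failed.append(cmd)
    return failed


if __name__ == "__main__":
    # sys.executable stands in for a real entry point so the demo runs anywhere.
    print(failing_commands([sys.executable, "no-such-command-xyz"]))
```

In the smoke job the list would be the entry-point names (for example gaia, gaia-code, gaia-mcp, gaia-emr), and a non-empty result would fail the step.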

Strengths

  • Dependency wiring is correct: needs: [validate, publish-pypi] with if: needs.publish-pypi.result == 'success' is the right pattern. github-release doesn't depend on this job, so they run in parallel — the release isn't blocked while the smoke test runs.
  • Retry loop is well-constructed: for attempt in 1 2 3 4 5 with a 30 s sleep handles typical PyPI CDN propagation delays (~2.5 min window) without over-engineering.
  • Fails loudly: No continue-on-error, no silent degradation — consistent with CLAUDE.md's "fail loudly" convention.

Verdict

Request changes — the blocking gaia.__version__ bug will cause the smoke test to fail on the first release that uses this workflow, which defeats its purpose. Fix the import path (either option works), then this is ready to merge.

@github-actions
Contributor

Solid, focused addition that closes a real gap in the release pipeline. Two minor nits before this is pristine.

Summary

The post-publish-smoke job correctly catches broken PyPI installs that would otherwise reach users silently. The retry loop, dual-verification approach (CLI + import), and loud-failure posture all follow GAIA conventions. Two small issues in the install retry and version-grep logic are worth fixing.

Issues Found

🟢 Minor — grep -q uses regex matching on the version string (.github/workflows/publish.yml:52)

grep -q "$TAG_VERSION" treats the version as a regex, so each dot (.) matches any character: 0.18.1 would match 0X18Y1. More practically, it's a substring match, so version 0.18.1 would also match a gaia --version output of 0.18.10, giving a false positive if a different patch release is somehow installed.

The import-version step already does a precise != equality check, so the grep step's false-positive window is narrow — but it's easy to close:

          if ! echo "$VERSION_OUTPUT" | grep -qF "$TAG_VERSION"; then

-F treats the pattern as a fixed string, but it is still a substring match. If you want an exact line match (ruling out the case where 0.18.1 matches inside 0.18.10), use grep -qxF instead and ensure the version occupies its own line in the output.
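
The three matching modes can be compared side by side. The following is a Python analogue, not grep itself: the function names are hypothetical, and Python's `re` syntax differs from grep's basic regular expressions in general, although for a dotted version string the behaviour shown here is the same.

```python
import re


def regex_match(pattern, text):
    """Like `grep -q PATTERN`: the pattern is a regex and may match anywhere."""
    return re.search(pattern, text) is not None


def fixed_match(needle, text):
    """Like `grep -qF`: literal substring match, so dots are plain dots."""
    return needle in text


def exact_line_match(needle, text):
    """Like `grep -qxF`: some whole line must equal the needle exactly."""
    return any(line == needle for line in text.splitlines())
```

With these definitions, `regex_match("0.18.1", "0X18Y1")` is true, `fixed_match("0.18.1", "0X18Y1")` is false, `fixed_match("0.18.1", "0.18.10")` is still true, and only `exact_line_match("0.18.1", "0.18.10")` rejects the longer version.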


🟢 Minor — sleep 30 fires on the 5th failed attempt, adding 30 s of dead time (.github/workflows/publish.yml:38–42)

On the final loop iteration, pip fails → the script prints "PyPI package not available yet, waiting 30s…" and sleeps before the loop exits. There is no 6th attempt; the sleep is wasted and the message is misleading.

      - name: Install from PyPI (with retry for propagation delay)
        env:
          TAG_VERSION: ${{ needs.validate.outputs.tag_version }}
        run: |
          for attempt in 1 2 3 4 5; do
            echo "Attempt $attempt: pip install amd-gaia==$TAG_VERSION"
            if pip install "amd-gaia==$TAG_VERSION"; then
              echo "Installed amd-gaia==$TAG_VERSION successfully"
              exit 0
            fi
            if [ "$attempt" -lt 5 ]; then
              echo "PyPI package not available yet, waiting 30s..."
              sleep 30
            fi
          done
          echo "ERROR: Failed to install amd-gaia==$TAG_VERSION after 5 attempts"
          exit 1
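
The control flow of the corrected loop, reduced to its essentials, looks like the Python sketch below. `retry` and its parameters are hypothetical names for illustration; the `sleep` argument is injectable only so the behaviour can be checked without actually waiting.

```python
import time


def retry(action, attempts=5, delay=30.0, sleep=time.sleep):
    """Call `action` until it returns True, at most `attempts` times.

    Sleeps `delay` seconds between attempts, but not after the last one,
    since no further attempt would follow.
    """
    for attempt in range(1, attempts + 1):
        if action():
            return True
        if attempt < attempts:
            sleep(delay)
    return False
```

With five attempts that all fail, this sleeps four times rather than five, which matches the `[ "$attempt" -lt 5 ]` guard in the shell version.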

Strengths

  • Fail-loud by design. No continue-on-error: true, and the loop's explicit exit 1 after all retries are exhausted is exactly what GAIA expects per the "No Silent Fallbacks" rule.
  • Two independent verification paths. CLI version via gaia --version and Python import via importlib.metadata catch different classes of failure (entry-point misconfiguration vs. metadata packaging bugs). The importlib.metadata approach also avoids the deprecated pkg_resources API.
  • Correct parallelism. The job runs after publish-pypi but does not block github-release, which is the right call — once the wheel is on PyPI it can't be rolled back, so blocking the release page serves no purpose. The design is intentional and sound.

Verdict

Approve with suggestions. The PR is safe to merge as-is, but the grep -qF fix in particular takes 5 seconds and eliminates a latent false-positive class. Recommend applying before merge.

…bypass, gate release

grep -qF prevents dots in version strings from being treated as regex
wildcards. --no-cache-dir ensures retries actually re-query PyPI instead
of serving a cached 404. The final-attempt sleep guard avoids a wasted
30s wait when all retries are exhausted. Most critically, github-release
now depends on post-publish-smoke instead of publish-pypi so a broken
wheel actually blocks the release.
@github-actions
Contributor

Solid addition that closes a real gap: before this, a broken wheel could reach users with no automated catch between pypa/gh-action-pypi-publish and the GitHub Release. The implementation is clean and all the previously-reviewed edge cases (final-sleep skip, cache bypass, grep -qF, gaia-emr entry point, release gate) are already resolved in the current HEAD.

Issues Found

🟢 Minor — Test plan item is backwards (description, not code)

The test plan says "Verify the job runs in parallel with github-release (not blocking it)" — but the diff shows the opposite: github-release now needs post-publish-smoke, so the release is explicitly gated on the smoke test passing.

That's actually the correct design (catching a bad wheel before publishing the Release page is the whole point), but the test plan checkbox describes it wrong. Worth correcting in the PR description so reviewers tick the right mental box.
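
The effect of the change in the needs list can be modelled in a few lines of Python. This is a deliberately simplified sketch of the default GitHub Actions behaviour (a job runs only when every job it needs succeeded), ignoring `if:` expressions such as `always()`. `will_run` is a hypothetical helper, and the dependency lists below include only the jobs named in this PR; any other entries in the real workflow are omitted.

```python
def will_run(job, needs, results):
    """Return True if every job listed in needs[job] finished with 'success'."""
    return all(results.get(dep) == "success" for dep in needs.get(job, []))


# Dependency lists as described in this PR (other jobs omitted, an assumption).
NEEDS_BEFORE = {
    "post-publish-smoke": ["validate", "publish-pypi"],
    "github-release": ["publish-pypi"],
}
NEEDS_AFTER = {
    "post-publish-smoke": ["validate", "publish-pypi"],
    "github-release": ["post-publish-smoke"],
}

# A release where the wheel uploads fine but then fails the smoke test.
RESULTS = {
    "validate": "success",
    "publish-pypi": "success",
    "post-publish-smoke": "failure",
}
```

Under `NEEDS_BEFORE`, `will_run("github-release", NEEDS_BEFORE, RESULTS)` is true, so the Release page would be published despite the broken wheel; under `NEEDS_AFTER` it is false.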


🟢 Minor — Only Python 3.11 tested

The smoke test runs on Python 3.11. If GAIA supports 3.10 (the version in docs/reference/dev.mdx setup instructions), a wheel that breaks on 3.10 would pass the smoke test.

For a smoke test this is acceptable, but noting it so the team can decide whether to add a 3.10 leg in a follow-up.


🟢 Nit — Pre-existing silent fallback in version.py (not introduced here)

src/gaia/version.py:24 returns "" on any importlib.metadata exception — a silent fallback that CLAUDE.md flags as tech debt. The smoke test would still catch it (empty version fails grep -qF "$TAG_VERSION"), but the root cause would be obscure. Out of scope for this PR; flagging so it gets tracked.


Strengths

  • Retry loop is correct: skips the final sleep 30 before the failure message — no pointless 30-second pause on attempt 5 — and the loop exits 0 immediately on first success.
  • --no-cache-dir + grep -qF: both are the right choices here. Cache bypass prevents a stale pip cache returning an old wheel; fixed-string grep avoids version numbers being mis-interpreted as regex metacharacters.
  • Dual verification (CLI output + importlib.metadata) catches both "wrong wheel installed" and "entry point broken" independently.
  • Gate wired correctly: replacing publish-pypi with post-publish-smoke in github-release's needs list means a bad PyPI upload now aborts the Release rather than just printing a warning. This is exactly the fail-loud pattern CLAUDE.md asks for.
  • No continue-on-error, timeout-minutes: 15, and no checkout step (correct — smoke test should prove PyPI install, not the repo checkout).

Verdict

Approve. One inaccurate test-plan bullet is the only thing worth fixing, and that's a description edit. The CI logic is correct and the smoke-test coverage is a genuine improvement over the prior state.

@kovtcharov-amd kovtcharov-amd added the p1 medium priority label May 29, 2026
Collaborator

@itomek itomek left a comment

The earlier blocking bot finding (gaia.__version__ AttributeError) is stale: head verifies via importlib.metadata and exercises gaia-emr. Verified the version logic lines up: validate.outputs.tag_version is ${TAG_NAME#v} (bare), gaia --version prints the bare importlib.metadata version, and the import step does an exact != check, so both the grep -qF and the equality check behave correctly. One description fix noted inline (the test-plan "runs in parallel" bullet is the opposite of the diff, which correctly gates the Release on the smoke test). Approving.
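
For readers less familiar with the shell expansion mentioned above, `${TAG_NAME#v}` removes a single leading `v` if one is present. A Python equivalent, with the hypothetical name `bare_version`, might look like this:

```python
def bare_version(tag_name):
    """Drop one leading 'v', like the shell expansion ${TAG_NAME#v}."""
    return tag_name[1:] if tag_name.startswith("v") else tag_name
```

A tag such as v0.18.1 becomes 0.18.1, and a tag that is already bare is returned unchanged.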

Comment thread .github/workflows/publish.yml
@kovtcharov-amd kovtcharov-amd enabled auto-merge May 29, 2026 16:21
@kovtcharov-amd kovtcharov-amd added this pull request to the merge queue May 29, 2026
Merged via the queue into main with commit f8ff144 May 29, 2026
27 checks passed
@kovtcharov-amd kovtcharov-amd deleted the feat/1156-post-publish-smoke branch May 29, 2026 16:23
@kovtcharov-amd kovtcharov-amd mentioned this pull request Jun 1, 2026
6 tasks

Labels

devops DevOps/infrastructure changes p1 medium priority

Development

Successfully merging this pull request may close these issues.

CI: post-publish smoke test (validate PyPI install + CLI entry points)

2 participants