Skip to content

ci: add Python version matrix, macOS smoke lane, and path-filter fixes#1247

Open
kovtcharov-amd wants to merge 3 commits into
mainfrom
feat/ci-matrix-881
Open

ci: add Python version matrix, macOS smoke lane, and path-filter fixes#1247
kovtcharov-amd wants to merge 3 commits into
mainfrom
feat/ci-matrix-881

Conversation

@kovtcharov-amd
Copy link
Copy Markdown
Collaborator

setup.py claims python_requires>=3.10 but CI only tested 3.12 — compatibility with 3.10/3.11 was never validated. Unit tests now run against all three supported versions via a strategy matrix, and a continue-on-error macOS smoke job catches Darwin-specific failures before users hit them. MCP tests missed cli.py changes that affect MCP subcommands, and the Lemonade smoke test fired on every PR regardless of what changed.

Test plan

  • Verify Unit Tests (py3.10), Unit Tests (py3.11), Unit Tests (py3.12) appear as separate CI jobs on this PR
  • Verify Unit Tests (macOS smoke) appears and runs (green or yellow, not red — continue-on-error: true)
  • Confirm MCP tests trigger when src/gaia/cli.py is in the changeset
  • Confirm Lemonade smoke test does NOT trigger on this PR (no src/gaia/llm/ changes)

Closes #881

@github-actions github-actions Bot added the devops DevOps/infrastructure changes label May 29, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Clean, targeted CI fix with no blocking issues — approving. Three small nits worth addressing, one of which is a genuine coverage gap.

Summary

This PR patches three real CI problems in one shot: the Lemonade smoke test was burning expensive STX hardware time on every PR regardless of what changed; MCP tests didn't trigger when cli.py (which registers all gaia mcp subcommands) was edited; and setup.py claims python_requires>=3.10 but CI only validated 3.12. All three fixes are correctly scoped. The macOS smoke lane is a useful addition and the continue-on-error: true flag is appropriate for an explicitly experimental job.

Issues Found

🟢 macOS job skips packaging validation (test_unit.yml:164)

The Linux unit-tests job runs pytest tests/unit/test_packaging.py as a dedicated gate before the main suite. The macOS smoke job jumps straight to python -m pytest tests/unit/ -x and skips it entirely. Since packaging validation checks entry-point wiring and __init__.py completeness — both of which are platform-independent — there's no reason to omit it. Any macOS-specific import issue that causes the packaging test to fail (e.g., a platform-conditional import inside a __init__.py) would be silently missed.

      - name: Validate packaging integrity
        run: python -m pytest tests/unit/test_packaging.py -v --tb=short

      - name: Run unit tests (macOS smoke)

🟢 python -m pytest vs bare pytest style inconsistency (test_unit.yml:177)

Every other step in this repo's CI invokes pytest directly after installing into the system environment. The macOS job uses python -m pytest. Neither is wrong, but the inconsistency could mask a misconfigured PATH on macOS runners where the installed pytest shim isn't on $PATH. Either standardise on pytest (matching the rest of the file) or use python -m pytest everywhere — the latter is the safer choice since it guarantees the correct interpreter's pytest is invoked.

🟢 pytest-timeout installed on macOS but not Linux (test_unit.yml:166)

The macOS job installs pytest-timeout (needed for --timeout=60) while the Linux matrix doesn't, so a test that hangs without a timeout on Linux won't be caught until it surfaces on macOS. This isn't a blocker, but it means the two suites are diverging in a way that will make failures harder to diagnose. Consider adding --timeout=60 to the Linux unit-test run as well, or documenting the intentional divergence in a comment.

Strengths

  • Path-filter coverage on test_lemonade_server.yml is complete. The filter covers src/gaia/llm/**, src/gaia/installer/**, both custom actions, and the workflow file itself — nothing relevant is missing, and the expensive STX runner stays idle on doc/agent/UI-only PRs.
  • fail-fast: false on the Python matrix is the right call. Lets all three versions run to completion so a 3.10-only failure isn't hidden by an earlier 3.11 failure.
  • continue-on-error: true on the macOS job is correctly justified in both the inline comment and PR description — it follows the established GAIA pattern for experimental CI lanes that shouldn't block until they're fully green.

Verdict

Approve with suggestions. The three findings above are all minor. The packaging-validation gap (test_unit.yml:164) is the most meaningful one and is trivially fixed with the suggestion block above. The core improvements in this PR — path filtering on Lemonade, cli.py in MCP triggers, Python version matrix — are correct and ship real value.

@github-actions
Copy link
Copy Markdown
Contributor

Solid CI hygiene across the board — the path filters, version matrix, and macOS smoke lane all address real gaps that setup.py's python_requires>=3.10 declaration had left unvalidated. Two minor observations worth fixing, one informational gap worth noting.


Issues

🟢 Minor — Lemonade path filters don't cover the Lemonade test files (test_lemonade_server.yml)

The new path list covers source and infra, but not the three root-level test files that exercise Lemonade:

  • tests/test_lemonade_client.py
  • tests/test_lemonade_embeddings.py
  • tests/test_lemonade_health.py

If any of those are modified in isolation, the Lemonade smoke test won't trigger — the same gap this PR fixes for MCP (tests/mcp/**). Add a parallel entry to both push and pull_request paths:

      - 'src/gaia/llm/**'
      - 'src/gaia/installer/**'
      - 'setup.py'
      - '.github/workflows/test_lemonade_server.yml'
      - '.github/actions/install-lemonade/**'
      - '.github/actions/setup-venv/**'
      - 'installer/**'
      - 'tests/test_lemonade*.py'

(Same addition needed in the pull_request block.)


🟢 Minor — pytest-timeout missing from Linux matrix job (test_unit.yml:69)

The macOS smoke job correctly installs pytest-timeout and passes --timeout=60, but the Linux matrix job skips the package. Any test decorated with @pytest.mark.timeout(...) (or the --timeout default in a pytest.ini) silently has no enforcement on Linux. Adding it keeps the two job definitions consistent:

          uv pip install --system pytest pytest-cov pytest-asyncio pytest-mock pytest-timeout pyfakefs \
                                  keyring httpx respx

Informational — continue-on-error: true is at the job level, not the step level (test_unit.yml:152)

CLAUDE.md documents step-level continue-on-error for permission-warning tolerance; this PR uses it at the job level for a different purpose (experimental Darwin coverage). The intent is clearly documented in the inline comment ("continue-on-error until the full suite is macOS-clean"), so this is not a blocker — but the author should be aware that a broken macOS job produces no PR status signal until the flag is removed. Tracking that graduation in issue #881 (or a follow-up) would help avoid the flag living indefinitely.


Strengths

  • fail-fast: false on the Python matrix is the right call — a 3.10 failure shouldn't mask whether 3.12 also breaks.
  • Adding src/gaia/cli.py to the MCP trigger is a precise, correct fix: cli.py owns the MCP subcommand dispatch, so a change there that breaks gaia mcp previously flew under the radar.
  • merge_group and workflow_dispatch correctly carry no path restriction — always-run semantics for the merge queue are preserved.

Verdict

Approve with suggestions. The two minor issues are independent and both have one-line fixes; the path-filter gap for Lemonade test files is the one worth landing before this merges.

@kovtcharov-amd
Copy link
Copy Markdown
Collaborator Author

The Unit Tests (py3.10) failure is a real bug this PR intentionally exposed: test_symlink_to_blocked_directory_is_blocked fails on Python 3.10 because Path.resolve() handles symlinks differently. PR #1256 fixes the underlying is_write_blocked function to work correctly on Python 3.10+ — once it merges and this PR rebases, py3.10 will pass.

@kovtcharov-amd kovtcharov-amd added the p2 low priority label May 29, 2026
@kovtcharov-amd kovtcharov-amd self-assigned this May 29, 2026
itomek
itomek previously approved these changes May 29, 2026
Copy link
Copy Markdown
Collaborator

@itomek itomek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The path-filtering, version matrix, and macOS lane all address real gaps. One merge-ordering point worth calling out (not a code change here): this PR's premise is validating 3.10/3.11, but the py3.10 leg is currently red and you've noted it only goes green after #1256 lands and this rebases. Since that matrix leg has no continue-on-error (unlike the macOS lane), merging this before #1256 turns py3.10 into a permanently-failing required check — recommend sequencing #1256 first, then rebase and confirm green. The Lemonade path-filter gap is noted inline. pytest-timeout is enforced on macOS but not the Linux matrix — minor consistency nit. Approving on the understanding that the merge order is handled.


Generated by Claude Code

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new path filter covers src/gaia/llm, installer, and actions, but not the root-level Lemonade test files. A PR editing only tests/test_lemonade_client.py (or _embeddings/_health) won't trigger this smoke test — the same gap you just fixed for MCP by adding src/gaia/cli.py. Add tests/test_lemonade*.py to both the push and pull_request paths blocks.


Generated by Claude Code

@itomek itomek enabled auto-merge May 29, 2026 13:47
@itomek itomek disabled auto-merge May 29, 2026 13:47
@github-actions github-actions Bot added tests Test changes security Security-sensitive changes labels May 29, 2026
itomek-amd
itomek-amd previously approved these changes May 29, 2026
Ovtcharov added 2 commits May 29, 2026 09:33
setup.py claims python_requires>=3.10 but CI only tested 3.12. Unit tests
now run against 3.10/3.11/3.12 via a strategy matrix. A macOS smoke job
(continue-on-error) catches platform-specific failures early. test_mcp.yml
gains cli.py in its path filter so MCP subcommand changes trigger the suite,
and test_lemonade_server.yml gets a path filter scoped to src/gaia/llm/
instead of firing on every PR.

Closes #881
Add packaging validation step to the macOS smoke job matching the Linux
job. Switch from `python -m pytest` to bare `pytest` for consistency
with the rest of the workflow.
Lemonade path filters missed tests/test_lemonade*.py, so edits to those
files wouldn't trigger the smoke test. Linux matrix was also missing
pytest-timeout, creating a divergence with the macOS job.
pull Bot pushed a commit to bhardwajRahul/gaia that referenced this pull request May 29, 2026
…#1256)

Symlinks pointing into blocked directories (`.ssh`, `C:\Windows`,
`/etc`, …) were not detected by `is_write_blocked()` on Python <3.12 —
the write guardrail returned `False` when it should have returned
`True`. The Python 3.10/3.11 version matrix (amd#1247) surfaced this via
`test_symlink_to_blocked_directory_is_blocked`. On 3.12+ the test passed
by accident because `Path.resolve()` internally delegates to
`os.path.realpath`; on older versions it uses a separate resolver that
can disagree after symlink traversal.

The fix drops the redundant `.resolve()` call so `os.path.realpath` is
the single source of truth for symlink resolution across all supported
Python versions.

## Test plan
- [ ] `python -m pytest tests/unit/test_security_edge_cases.py
tests/unit/test_file_write_guardrails.py -xvs` — all 132 pass, 6 skipped
(platform-specific)
- [ ] CI matrix passes on Python 3.10, 3.11, 3.12, 3.13

Co-authored-by: Ovtcharov <kovtchar@amd.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

devops DevOps/infrastructure changes p2 low priority security Security-sensitive changes tests Test changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CI: macOS lane + Python version matrix + path-filter audit

3 participants