From 8ad92ae53f9ae475f5f6ded2de27930565ad6ada Mon Sep 17 00:00:00 2001 From: Abhijeet Prasad Date: Fri, 10 Apr 2026 18:30:00 -0400 Subject: [PATCH] ci(checks): move static checks out of sharded nox Run pylint and test_types together in a dedicated Ubuntu Python matrix job instead of repeating them inside every sharded nox job. This should reduce duplicated work across OSes and shards, improving CI wall time and total runner usage while preserving multi-version static-check coverage. Pin the checks workflow runners to ubuntu-24.04 and windows-2025, add nox-matrix session exclusions so shard reproduction matches CI, and update the SDK agent docs and CI triage skill to reflect the new static_checks workflow path. --- .agents/skills/sdk-ci-triage/SKILL.md | 49 ++++++++++++++++++------ .github/workflows/adk-py-test.yaml | 2 +- .github/workflows/checks.yaml | 44 +++++++++++++++------ .github/workflows/langchain-py-test.yaml | 2 +- AGENTS.md | 7 +++- py/scripts/nox-matrix.py | 10 ++++- 6 files changed, 87 insertions(+), 27 deletions(-) diff --git a/.agents/skills/sdk-ci-triage/SKILL.md b/.agents/skills/sdk-ci-triage/SKILL.md index ddb2f270..b1b57953 100644 --- a/.agents/skills/sdk-ci-triage/SKILL.md +++ b/.agents/skills/sdk-ci-triage/SKILL.md @@ -34,13 +34,14 @@ Read when relevant: - `lint`: pre-commit diff-based checks - `ensure-pinned-actions`: workflow hygiene +- `static_checks`: Ubuntu-only Python matrix for `pylint` and `test_types` - `smoke`: install/import matrix across Python and OS - `nox`: provider and core test matrix, sharded through `py/scripts/nox-matrix.py` - `adk-py`: reusable workflow for ADK coverage - `langchain-py`: reusable workflow for LangChain coverage - `upload-wheel`: build wheel sanity check -The most common failure source is the `nox` matrix job. +The most common failure source is still the `nox` matrix job, but `pylint` and `test_types` failures now surface through `static_checks`, not through `nox`. 
## Standard Workflow @@ -48,15 +49,17 @@ The most common failure source is the `nox` matrix job. 2. Inspect the failing job logs with `gh`. 3. Determine which workflow branch failed: - `lint` + - `static_checks` - `smoke` - `nox` - reusable workflow (`adk-py`, `langchain-py`) - `upload-wheel` 4. For `nox` failures, map the matrix job to the exact nox session and pinned provider version from the logs. -5. Reproduce the narrowest failing command locally. -6. Fix the bug. -7. Re-run the narrowest failing command first. -8. Expand only if shared code changed. +5. For `static_checks` failures, identify whether `pylint` or `test_types` failed under the reported Python version. +6. Reproduce the narrowest failing command locally. +7. Fix the bug. +8. Re-run the narrowest failing command first. +9. Expand only if shared code changed. Do not start by running the whole suite locally unless the failure genuinely spans many sessions. @@ -94,25 +97,29 @@ gh api repos/braintrustdata/braintrust-sdk-python/actions/jobs//logs Job names look like this: ```text -nox (3.10, ubuntu-latest, 0) +nox (3.10, ubuntu-24.04, 0) ``` That means: - Python `3.10` -- OS `ubuntu-latest` +- OS `ubuntu-24.04` - shard `0` out of 4 The workflow runs: ```bash -mise exec python@ -- python ./py/scripts/nox-matrix.py 4 +mise exec python@ -- python ./py/scripts/nox-matrix.py 4 \ + --exclude-session pylint \ + --exclude-session test_types ``` Use a dry run first to see which sessions belong to the shard: ```bash -mise exec python@3.10 -- python ./py/scripts/nox-matrix.py 0 4 --dry-run +mise exec python@3.10 -- python ./py/scripts/nox-matrix.py 0 4 --dry-run \ + --exclude-session pylint \ + --exclude-session test_types ``` Then inspect the failing logs to find the exact session name, for example: @@ -161,6 +168,23 @@ make lint make pylint ``` +### `static_checks` + +The `static_checks` job is an Ubuntu-only Python matrix that runs `pylint` and `test_types` together for each configured Python version. 
+ +Local equivalents: + +```bash +mise exec python@3.10 -- nox -f ./py/noxfile.py -s pylint test_types +``` + +If only one of the two sessions failed in CI, narrow locally to that specific session: + +```bash +mise exec python@3.10 -- nox -f ./py/noxfile.py -s pylint +mise exec python@3.10 -- nox -f ./py/noxfile.py -s test_types +``` + ### `smoke` The smoke job validates install + import across OS and Python versions. @@ -276,7 +300,9 @@ Preferred progression: ```bash # 1. Inspect the failing shard -mise exec python@3.10 -- python ./py/scripts/nox-matrix.py 0 4 --dry-run +mise exec python@3.10 -- python ./py/scripts/nox-matrix.py 0 4 --dry-run \ + --exclude-session pylint \ + --exclude-session test_types # 2. Reproduce the exact session cd py @@ -299,7 +325,7 @@ When answering a CI-triage question, report: Good example structure: ```text -The failing job is `nox (3.10, ubuntu-latest, 0)`. +The failing job is `nox (3.10, ubuntu-24.04, 0)`. Within that shard, the failing session is `test_google_genai(1.30.0)`. The root cause is that the tests import a symbol that does not exist in google-genai 1.30.0, even though it exists in newer versions. You can reproduce it locally with `cd py && nox -s "test_google_genai(1.30.0)"`. 
@@ -311,6 +337,7 @@ The fix is to gate the behavior for older versions or stop assuming the newer AP Avoid these common mistakes: - guessing the session from the provider name without checking `py/noxfile.py` +- forgetting that CI excludes `pylint` and `test_types` from the sharded `nox` job - reproducing with `latest` when CI failed on an older pinned version - running from repo root when the real SDK command belongs in `py/` - fixing the symptom in tests without understanding the provider-version contract diff --git a/.github/workflows/adk-py-test.yaml b/.github/workflows/adk-py-test.yaml index adbfa294..85a4d832 100644 --- a/.github/workflows/adk-py-test.yaml +++ b/.github/workflows/adk-py-test.yaml @@ -9,7 +9,7 @@ on: jobs: test: - runs-on: ubuntu-latest + runs-on: ubuntu-24.04 timeout-minutes: 15 steps: diff --git a/.github/workflows/checks.yaml b/.github/workflows/checks.yaml index 4d4b1812..0f373e77 100644 --- a/.github/workflows/checks.yaml +++ b/.github/workflows/checks.yaml @@ -10,7 +10,7 @@ permissions: jobs: lint: - runs-on: ubuntu-latest + runs-on: ubuntu-24.04 timeout-minutes: 10 steps: - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1 @@ -26,13 +26,31 @@ jobs: mise exec -- pre-commit run --from-ref origin/${{ github.base_ref || 'main' }} --to-ref HEAD ensure-pinned-actions: - runs-on: ubuntu-latest + runs-on: ubuntu-24.04 timeout-minutes: 5 steps: - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1 - name: Ensure SHA pinned actions uses: zgosalvez/github-actions-ensure-sha-pinned-actions@70c4af2ed5282c51ba40566d026d6647852ffa3e # v5.0.1 + static_checks: + runs-on: ubuntu-24.04 + timeout-minutes: 20 + strategy: + fail-fast: false + matrix: + python-version: ["3.10", "3.11", "3.12", "3.13", "3.14"] + steps: + - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1 + - name: Setup Python environment + uses: ./.github/actions/setup-python-env + with: + python-version: ${{ 
matrix.python-version }} + - name: Run pylint and type tests + shell: bash + run: | + mise exec python@${{ matrix.python-version }} -- nox -f ./py/noxfile.py -s pylint test_types + smoke: runs-on: ${{ matrix.os }} timeout-minutes: 20 @@ -41,7 +59,7 @@ jobs: fail-fast: false matrix: python-version: ["3.10", "3.11", "3.12", "3.13", "3.14"] - os: [ubuntu-latest, windows-latest] + os: [ubuntu-24.04, windows-2025] steps: - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1 @@ -66,7 +84,7 @@ jobs: fail-fast: false matrix: python-version: ["3.10", "3.11", "3.12", "3.13", "3.14"] - os: [ubuntu-latest, windows-latest] + os: [ubuntu-24.04, windows-2025] shard: [0, 1, 2, 3] steps: @@ -78,7 +96,9 @@ jobs: - name: Run nox tests (shard ${{ matrix.shard }}/4) shell: bash run: | - mise exec python@${{ matrix.python-version }} -- python ./py/scripts/nox-matrix.py ${{ matrix.shard }} 4 + mise exec python@${{ matrix.python-version }} -- python ./py/scripts/nox-matrix.py ${{ matrix.shard }} 4 \ + --exclude-session pylint \ + --exclude-session test_types adk-py: uses: ./.github/workflows/adk-py-test.yaml @@ -90,7 +110,7 @@ jobs: needs: - smoke - nox - runs-on: ubuntu-latest + runs-on: ubuntu-24.04 timeout-minutes: 10 steps: - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1 @@ -114,12 +134,13 @@ jobs: needs: - lint - ensure-pinned-actions + - static_checks - smoke - nox - adk-py - langchain-py - upload-wheel - runs-on: ubuntu-latest + runs-on: ubuntu-24.04 timeout-minutes: 5 if: always() steps: @@ -138,12 +159,13 @@ jobs: } check_result "lint" "${{ needs.lint.result }}" - check_result "ensure-pinned-actions" "${{ needs.ensure-pinned-actions.result }}" + check_result "ensure-pinned-actions" "${{ needs['ensure-pinned-actions'].result }}" + check_result "static_checks" "${{ needs.static_checks.result }}" check_result "smoke" "${{ needs.smoke.result }}" check_result "nox" "${{ needs.nox.result }}" - check_result "adk-py" "${{ 
needs.adk-py.result }}" - check_result "langchain-py" "${{ needs.langchain-py.result }}" - check_result "upload-wheel" "${{ needs.upload-wheel.result }}" + check_result "adk-py" "${{ needs['adk-py'].result }}" + check_result "langchain-py" "${{ needs['langchain-py'].result }}" + check_result "upload-wheel" "${{ needs['upload-wheel'].result }}" if [ "$FAILED" -ne 0 ]; then echo "One or more required checks failed" diff --git a/.github/workflows/langchain-py-test.yaml b/.github/workflows/langchain-py-test.yaml index c49495f6..e53f1342 100644 --- a/.github/workflows/langchain-py-test.yaml +++ b/.github/workflows/langchain-py-test.yaml @@ -5,7 +5,7 @@ on: jobs: test: - runs-on: ubuntu-latest + runs-on: ubuntu-24.04 timeout-minutes: 15 steps: diff --git a/AGENTS.md b/AGENTS.md index 6da9e3a1..ca3b2a72 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -15,7 +15,8 @@ Use this file as the default playbook for work in this repository. 2. **Use `mise` as the source of truth for tools and environment.** 3. **Do not guess test commands or version coverage.** - - `py/noxfile.py` is the source of truth for nox session names, provider/version matrices, and CI coverage. + - `py/noxfile.py` is the source of truth for nox session names, provider/version matrices, and local reproduction commands. + - `.github/workflows/checks.yaml` is the source of truth for which sessions run in CI, on which Python versions, and outside vs. inside the nox shard matrix. - For provider and integration work, also check `py/src/braintrust/integrations/versioning.py`. 4. **Keep changes narrow and validate with the smallest relevant test first.** @@ -116,7 +117,7 @@ Do not guess: - supported provider versions - which tests a provider session runs -Check `py/noxfile.py` and reproduce with the exact local session CI uses. +Check `py/noxfile.py` and `.github/workflows/checks.yaml`, then reproduce with the exact local session CI uses. 
### Run the smallest relevant test first @@ -143,6 +144,8 @@ Before changing provider/integration behavior: - `test_core` runs without optional vendor packages. - `test_types` runs pyright, mypy, and pytest on `py/src/braintrust/type_tests/`. +- CI runs `pylint` and `test_types` via the dedicated `static_checks` workflow job on Ubuntu across the configured Python matrix, not inside the sharded `nox` job. +- The sharded `nox` workflow excludes `pylint` and `test_types`; use `py/scripts/nox-matrix.py --exclude-session ...` when reproducing shard membership locally. - wrapper coverage is split across dedicated nox sessions by provider/version. - `test-wheel` is a wheel sanity check and requires a built wheel first. diff --git a/py/scripts/nox-matrix.py b/py/scripts/nox-matrix.py index 4c9ff0c3..11460abf 100644 --- a/py/scripts/nox-matrix.py +++ b/py/scripts/nox-matrix.py @@ -6,7 +6,7 @@ by weight descending and greedily assigns each to the lightest shard. Usage: - python nox-matrix.py [--dry-run] + python nox-matrix.py [--dry-run] [--exclude-session ...] """ import argparse @@ -80,6 +80,12 @@ def main() -> None: parser.add_argument("shard_index", type=int, help="Zero-based shard index") parser.add_argument("num_shards", type=int, help="Total number of shards") parser.add_argument("--dry-run", action="store_true", help="Print assignment without running nox") + parser.add_argument( + "--exclude-session", + action="append", + default=[], + help="Exclude a nox session from shard assignment. 
May be passed multiple times.", + ) parser.add_argument( "--output-durations", type=Path, @@ -108,6 +114,8 @@ def main() -> None: weights_file = root_dir / "py" / "scripts" / "session-weights.json" all_sessions = get_nox_sessions(noxfile) + excluded_sessions = set(args.exclude_session) + all_sessions = [session for session in all_sessions if session not in excluded_sessions] weights, default_weight = load_weights(weights_file) shard_assignments = assign_shards(all_sessions, args.num_shards, weights, default_weight)
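
Note on the `nox-matrix.py` hunk: the change filters excluded sessions out before shard assignment, so removing `pylint` and `test_types` also changes which shard each remaining session lands in. The sketch below illustrates that interaction under stated assumptions. `sketch_assign_shards` is a hypothetical stand-in written from the docstring's description ("by weight descending and greedily assigns each to the lightest shard"); it is not the script's actual `assign_shards`, and the weights are made up for illustration.

```python
import heapq


def filter_sessions(all_sessions, excluded):
    """Drop excluded session names while preserving the original order."""
    excluded_set = set(excluded)
    return [s for s in all_sessions if s not in excluded_set]


def sketch_assign_shards(sessions, num_shards, weights, default_weight):
    """Hypothetical greedy assignment: heaviest session first, lightest shard wins."""
    # Sort by weight descending; break ties by name so the result is deterministic.
    ordered = sorted(sessions, key=lambda s: (-weights.get(s, default_weight), s))
    # Min-heap of (current load, shard index).
    heap = [(0.0, i) for i in range(num_shards)]
    heapq.heapify(heap)
    shards = [[] for _ in range(num_shards)]
    for session in ordered:
        load, idx = heapq.heappop(heap)
        shards[idx].append(session)
        heapq.heappush(heap, (load + weights.get(session, default_weight), idx))
    return shards


# Illustrative inputs only: the weights are assumptions, not values from
# session-weights.json.
all_sessions = ["pylint", "test_types", "test_core", "test_google_genai(1.30.0)", "test-wheel"]
weights = {"test_core": 30.0, "test_google_genai(1.30.0)": 20.0}

remaining = filter_sessions(all_sessions, ["pylint", "test_types"])
shards = sketch_assign_shards(remaining, 2, weights, default_weight=10.0)
print(shards)
```

Because assignment depends on the full filtered list, a local `--dry-run` only matches CI shard membership when it is given the same `--exclude-session` flags as the workflow.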