fix(ci): pin baseline x86-64 compiler target to prevent SIGILL on cached venvs #1785

Merged
Pouyanpi merged 1 commit into develop from fix/ci-venv-cache-sigill on Apr 14, 2026

@Pouyanpi (Collaborator) commented Apr 13, 2026

Description

CI tests intermittently crash with Fatal Python error: Illegal instruction (SIGILL, exit code 132) on Ubuntu runners.
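Exit code 132 follows the shell convention of reporting a process killed by a signal as 128 plus the signal number; SIGILL is signal 4 on Linux, so 132 = 128 + 4. The sketch below reproduces that status without any native code by having a child shell send SIGILL to itself (an illustration only, not how the CI failure was actually triggered):

```bash
#!/usr/bin/env bash
# Reproduce the exit status CI saw: a child shell sends SIGILL to itself, the
# same signal the kernel delivers when a CPU meets an instruction it lacks.
status=0
sh -c 'kill -s ILL $$' || status=$?

# Shells report "terminated by signal N" as exit status 128 + N.
signal_number=$((status - 128))
echo "exit status:   ${status}"
echo "signal number: ${signal_number}"
echo "signal name:   SIG$(kill -l "${signal_number}")"
```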

GitHub Actions ubuntu-latest runners have heterogeneous CPU microarchitectures: some support AVX-512, others only AVX2. When native C extensions (e.g. annoy, numpy) are compiled from source on a runner that supports newer SIMD instructions and the resulting .venv is cached, restoring that cache on a runner with an older CPU makes the native code execute instructions the CPU does not support, which crashes the process with SIGILL.

The cache key includes OS, arch (x86_64), and Python version — but not the CPU model. Since uname -m returns x86_64 regardless of instruction set support, different CPUs produce identical cache keys.
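The collision can be illustrated by comparing what uname -m reports with the CPU feature flags Linux exposes in /proc/cpuinfo. A minimal sketch; simd_tier is a hypothetical helper, and the two flag strings are made-up stand-ins for real runners (real flag lists are much longer):

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a /proc/cpuinfo "flags" line by the widest
# SIMD extension it advertises. Padding with spaces makes the match whole-word.
simd_tier() {
  local flags=" $* "
  case "$flags" in
    *" avx512f "*) echo "avx512" ;;
    *" avx2 "*)    echo "avx2" ;;
    *)             echo "baseline" ;;
  esac
}

# Two made-up runners: identical `uname -m`, different instruction sets.
runner_a_flags="fpu sse sse2 avx avx2 avx512f"
runner_b_flags="fpu sse sse2 avx avx2"
echo "runner A: arch=x86_64 tier=$(simd_tier "$runner_a_flags")"
echo "runner B: arch=x86_64 tier=$(simd_tier "$runner_b_flags")"

# On a real Linux runner, inspect the current machine instead:
if [ -r /proc/cpuinfo ]; then
  echo "this host: arch=$(uname -m) tier=$(simd_tier "$(grep -m1 '^flags' /proc/cpuinfo)")"
fi
```

Both runners report x86_64, so a key built from uname -m is identical for both, even though only runner A can execute AVX-512 code.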

Fix

Set CFLAGS and CXXFLAGS to -march=x86-64 on Linux runners before poetry install. This forces all native extensions to compile for the baseline x86-64 instruction set, making cached .so files safe across any x86-64 runner.

The change is scoped to Linux runners only.
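One way to apply the flags only on Linux in a GitHub Actions step is to append them to the file named by $GITHUB_ENV, which exposes them to every later step in the job, including poetry install. The PR's actual step is not reproduced here, so treat this as a sketch: pin_baseline_march is a hypothetical name, and the RUNNER_ARCH check (X64) is an extra guard assumed here, since a compiler targeting ARM rejects -march=x86-64.

```bash
#!/usr/bin/env bash
# Hypothetical helper: pin native builds to the baseline x86-64 ISA by appending
# CFLAGS/CXXFLAGS to a GitHub Actions env file. Later steps in the job (e.g.
# `poetry install`) inherit them; the step that writes them does not.
pin_baseline_march() {
  local runner_os="$1" runner_arch="$2" env_file="$3"
  # Linux only (as in the PR), plus an extra x86-64 guard assumed here:
  # -march=x86-64 is rejected by compilers targeting ARM.
  if [ "$runner_os" = "Linux" ] && [ "$runner_arch" = "X64" ]; then
    echo "CFLAGS=-march=x86-64" >> "$env_file"
    echo "CXXFLAGS=-march=x86-64" >> "$env_file"
  fi
}

# In a workflow step this would be:
#   pin_baseline_march "$RUNNER_OS" "$RUNNER_ARCH" "$GITHUB_ENV"
# Local demonstration with throwaway files standing in for $GITHUB_ENV:
linux_env="$(mktemp)"
macos_env="$(mktemp)"
trap 'rm -f "$linux_env" "$macos_env"' EXIT

pin_baseline_march Linux X64 "$linux_env"
pin_baseline_march macOS ARM64 "$macos_env"
echo "--- Linux/X64 ---";   cat "$linux_env"
echo "--- macOS/ARM64 ---"; cat "$macos_env"
```

Because values written to $GITHUB_ENV only reach subsequent steps, the flags have to be set in a step that runs before poetry install. They only affect builds that honor CFLAGS/CXXFLAGS, and a build script that appends its own -march option later on the compiler command line could still override them.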

closes #1659

Summary by CodeRabbit

  • Chores
    • Updated GitHub Actions workflow cache verification logic to improve environment validation.

@Pouyanpi Pouyanpi added this to the v0.22.0 milestone Apr 13, 2026
@Pouyanpi Pouyanpi self-assigned this Apr 13, 2026
@Pouyanpi Pouyanpi added the CI label Apr 13, 2026
@greptile-apps Bot (Contributor) commented Apr 13, 2026

Greptile Summary

This PR fixes intermittent SIGILL (exit code 132) crashes on GitHub Actions by setting CFLAGS/CXXFLAGS=-march=x86-64 before poetry install, forcing native extensions to compile against the baseline x86-64 ISA so cached .so files are safe across heterogeneous runners. It also includes runner architecture in cache keys to differentiate x86_64 from arm64 runners.
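A sketch of assembling a cache key of that shape in a shell step from the OS, the architecture, the Python version and a hash of poetry.lock. build_cache_key and the venv- prefix are hypothetical; the real workflow may instead build the key with GitHub expressions such as hashFiles(). sha256sum assumes GNU coreutils, as on Linux runners (macOS provides shasum -a 256 instead):

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the key layout described above:
# OS + arch + Python version + lock-file hash.
build_cache_key() {
  local os="$1" arch="$2" py="$3" lockfile="$4"
  local lock_hash
  lock_hash="$(sha256sum "$lockfile" | cut -d' ' -f1)"
  echo "venv-${os}-${arch}-py${py}-${lock_hash}"
}

lock="$(mktemp)"
printf 'example lock contents\n' > "$lock"

# X64 and ARM64 runners now get different keys for the same lock file.
key_x64="$(build_cache_key Linux X64 3.11 "$lock")"
key_arm="$(build_cache_key Linux ARM64 3.11 "$lock")"
echo "$key_x64"
echo "$key_arm"
rm -f "$lock"
```

The arch component separates x86_64 from arm64 caches, but two x86_64 runners with different SIMD support still share a key; that remaining gap is what the -march=x86-64 pin addresses.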

  • P1 — full-tests.yml trigger: The original scoped triggers (branch-filtered push, PR review requests, workflow_dispatch) were replaced with a bare push: that runs the full Windows + macOS matrix (8 jobs) on every push to every branch. Combined with pr-tests.yml also using push:, this can significantly inflate runner costs if merged to develop.

Confidence Score: 4/5

Core SIGILL fix is correct; the bare push: trigger in full-tests.yml should be restored to branch-scoped filters before merging to develop.

The compiler flag approach in _test.yml and lint.yml is sound and directly addresses the root cause. However, full-tests.yml drops all branch/tag/PR filters and workflow_dispatch, leaving a bare push: that would run an expensive 8-job Windows+macOS matrix on every branch push if merged — a real CI cost and reliability concern.

full-tests.yml — the trigger block needs branch filters, PR review triggers, and workflow_dispatch restored before this lands in develop.

Important Files Changed

  • .github/workflows/_test.yml: Adds a CFLAGS/CXXFLAGS=-march=x86-64 step before poetry install (Linux-only, correctly gated) and includes the runner arch in the cache key; this is the core fix for cross-CPU SIGILL crashes.
  • .github/workflows/lint.yml: Sets CFLAGS/CXXFLAGS in the workflow-level env (safe since lint always runs on ubuntu-latest) and adds arch to the cache key, consistent with the fix in _test.yml.
  • .github/workflows/full-tests.yml: Replaced the scoped triggers (branch-filtered push, PR review requests, workflow_dispatch) with a bare push: that fires the full Windows+macOS matrix on every push to every branch, which means excessive CI spend if merged.
  • .github/workflows/pr-tests.yml: Changed from a pull_request to a push trigger (the developer noted this is a temporary testing commit); removes fork-PR CI coverage.

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Push / PR Event] --> B{Workflow trigger}
    B --> C[pr-tests.yml\npush only]
    B --> D[full-tests.yml\npush only]
    B --> E[lint.yml\npull_request + push to main/develop]
    C --> F[Ubuntu matrix\n4 Python versions]
    D --> G[Windows + macOS matrix\n4 Python versions each]
    F --> H{Cache hit?}
    G --> H
    H -->|Yes| I[Ensure cache healthy\npoetry run pip --version]
    H -->|No| J[Set CFLAGS/CXXFLAGS\n-march=x86-64\nLinux only]
    I -->|Healthy| K[poetry install]
    I -->|Unhealthy| L[rm -rf .venv]
    L --> J
    J --> K
    K --> M[Native extensions compiled\nwith baseline x86-64 ISA]
    M --> N[Cache venv\nkey: OS + arch + py-version + lock-hash]
    N --> O[Run tests]
    style J fill:#90EE90
    style M fill:#90EE90
    style D fill:#FFB6C1
```

Comments Outside Diff (1)

  1. .github/workflows/full-tests.yml, lines 4-5

    P1 Bare push: trigger runs full matrix on every branch push

    The original trigger had branch filters (main, develop, v* tags), PR review-request triggers, and workflow_dispatch. The new bare push: fires the full Windows + macOS matrix (8 jobs: 2 OS × 4 Python versions) on every push to every feature branch, hotfix branch, or draft branch — with no way to manually re-trigger via the Actions UI either (workflow_dispatch was removed). Combined with pr-tests.yml also using a bare push:, both workflows now fire simultaneously on every branch push. If this is a temporary testing commit, it should not land in develop as-is.

Suggested change:
```suggestion
on:
  pull_request:
    types: [review_requested, ready_for_review]
    paths-ignore:
      - "**/*.md"
      - ".github/**"
  push:
    branches:
      - main
      - develop
    tags:
      - "v*"
  workflow_dispatch:
```


Reviews (4): Last reviewed commit: "fix(ci): pin baseline x86-64 compiler ta..."

Comment thread .github/workflows/_test.yml Outdated
@coderabbitai Bot commented Apr 13, 2026

📝 Walkthrough

The GitHub Actions workflow cache health check was modified to verify the cached virtual environment by attempting to import the annoy module instead of checking the pip version. This more accurately validates whether required dependencies are available.

Changes

  • Workflow Cache Verification (.github/workflows/_test.yml): Changed the cache health verification command from poetry run pip --version to poetry run python -c "import annoy" to better detect missing or corrupted dependency installations.
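pip is pure Python, so poetry run pip --version can succeed even when a compiled extension in the venv is incompatible with the CPU. Importing annoy at least loads its compiled module, so a bad build has a chance to fail during the health check rather than during the tests (only if unsupported instructions run at import time, which is presumably why a later suggestion also builds a small index). A minimal sketch of the "check, and discard the venv on failure" pattern; ensure_cache_healthy is hypothetical, and the failing check is simulated with a shell that sends itself SIGILL:

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a health-check command against a cached venv and
# delete the venv if the check fails for any reason: a non-zero exit, or a
# crash by signal (SIGILL shows up as exit status 128 + 4 = 132).
ensure_cache_healthy() {
  local venv_dir="$1"
  shift
  "$@"
  local status=$?
  if [ "$status" -ne 0 ]; then
    echo "health check failed (exit status ${status}); removing ${venv_dir}" >&2
    rm -rf "$venv_dir"
    return 1
  fi
}

# In CI the call might look like:
#   ensure_cache_healthy .venv poetry run python -c "import annoy"
# Local demonstration with temporary directories standing in for .venv:
healthy_venv="$(mktemp -d)"
broken_venv="$(mktemp -d)"
ensure_cache_healthy "$healthy_venv" true
ensure_cache_healthy "$broken_venv" sh -c 'kill -s ILL $$' || echo "cache discarded; poetry install will rebuild it"
```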

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

  • Title check (⚠️ Warning): The PR title mentions pinning a baseline x86-64 compiler target, but the actual change validates native extensions in the cache health check by importing annoy instead of checking the pip version. Resolution: Update the title to reflect the actual change; something like 'fix(ci): validate native extensions in venv cache health check' would better describe the fallback verification command change from pip --version to importing annoy.

✅ Passed checks (5 passed)

  • Linked Issues check (✅ Passed): The PR partially addresses the linked issue by improving cache health detection, but does not implement all proposed mitigations (NPY_DISABLE_CPU_FEATURES, faulthandler) from issue #1659.
  • Out of Scope Changes check (✅ Passed): The change is narrowly scoped to the cache health check validation and directly addresses the root cause of SIGILL crashes by detecting incompatible native extensions.
  • Docstring Coverage (✅ Passed): No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Test Results For Major Changes (✅ Passed): This PR contains a minor, single-line change to a GitHub Actions workflow file that improves cache health validation by checking for native extension compatibility.
  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.


@coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
.github/workflows/_test.yml (1)

Line 83: Guard timeout by runner OS in this reusable workflow.

At Line 83, timeout is Linux/GNU-specific; if callers use macOS/Windows images, this can invalidate .venv on every cache hit even when healthy.

Proposed fix

```diff
-        run: timeout 10s poetry run python -c "import annoy" || rm -rf .venv
+        run: |
+          CHECK='from annoy import AnnoyIndex; i=AnnoyIndex(8,"angular"); i.add_item(0,[0.0]*8); i.build(1)'
+          if [[ "$RUNNER_OS" == "Linux" ]]; then
+            timeout 10s poetry run python -c "$CHECK" || rm -rf .venv
+          else
+            poetry run python -c "$CHECK" || rm -rf .venv
+          fi
```

Verification script:

```bash
#!/bin/bash
# Inspect callers of this reusable workflow and what images they pass.
# Expected: if non-Linux images are used, timeout guard is required.
rg -n -C3 --iglob '*.yml' --iglob '*.yaml' '_test\.yml|uses:.*_test\.yml|image:' .github/workflows
```
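Instead of branching on $RUNNER_OS, the wrapper could detect which timeout command is available: GNU coreutils provides timeout on Linux, and Homebrew's coreutils package installs it as gtimeout on macOS. A sketch under those assumptions; run_with_timeout is a hypothetical name, and when neither command is found it runs the check without a time limit. On Windows, a timeout found on PATH may be the unrelated timeout.exe, so this is aimed at Linux and macOS runners.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: apply a time limit when a GNU-style timeout command is
# available, otherwise run the command without one.
run_with_timeout() {
  local limit="$1"
  shift
  if command -v timeout >/dev/null 2>&1; then
    timeout "$limit" "$@"      # GNU coreutils (Linux runners)
  elif command -v gtimeout >/dev/null 2>&1; then
    gtimeout "$limit" "$@"     # Homebrew coreutils on macOS
  else
    "$@"                       # no timeout command found: run unbounded
  fi
}

# The health check could then become:
#   run_with_timeout 10s poetry run python -c "import annoy" || rm -rf .venv
run_with_timeout 5 true && echo "check finished within the limit"
```

GNU timeout exits with status 124 when the limit is hit, which the || rm -rf .venv fallback treats like any other failed check.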

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: c48424a5-bd84-4df6-a6e7-8ce6cb0c28f2

📥 Commits

Reviewing files that changed from the base of the PR and between 3a9cbad and 28a27a0.

📒 Files selected for processing (1)
  • .github/workflows/_test.yml

Comment thread .github/workflows/_test.yml Outdated
@Pouyanpi Pouyanpi force-pushed the fix/ci-venv-cache-sigill branch from 28a27a0 to fb3339d April 14, 2026 19:24
@Pouyanpi Pouyanpi changed the title from "fix(ci): validate native extensions in venv cache health check" to "fix(ci): pin baseline x86-64 compiler target to prevent SIGILL on cached venvs" Apr 14, 2026
@Pouyanpi Pouyanpi force-pushed the fix/ci-venv-cache-sigill branch from b29ae7a to 1cf7365 April 14, 2026 19:27
Comment thread .github/workflows/pr-tests.yml Outdated
@Pouyanpi Pouyanpi requested a review from Copilot April 14, 2026 19:33
@Pouyanpi Pouyanpi marked this pull request as draft April 14, 2026 19:35
Copilot AI left a comment

Pull request overview

Pins Linux compiler target to baseline x86-64 in CI to prevent cached virtualenvs from containing native extensions compiled with newer SIMD instructions that can SIGILL on older GitHub-hosted runners.

Changes:

  • Set CFLAGS/CXXFLAGS to -march=x86-64 on Linux runners before installing dependencies.
  • Modify PR test workflow trigger configuration.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

  • .github/workflows/_test.yml: Exports baseline x86-64 compiler flags for Linux builds before poetry install.
  • .github/workflows/pr-tests.yml: Changes the workflow trigger from pull_request to push and adjusts ignored paths.


Comment thread .github/workflows/pr-tests.yml Outdated
Comment thread .github/workflows/_test.yml
@codecov Bot commented Apr 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@Pouyanpi Pouyanpi force-pushed the fix/ci-venv-cache-sigill branch 2 times, most recently from a62b2dd to 1a5cc57 April 14, 2026 19:50
@Pouyanpi Pouyanpi marked this pull request as ready for review April 14, 2026 19:58
@Pouyanpi Pouyanpi force-pushed the fix/ci-venv-cache-sigill branch from 69c563f to fe12b3e April 14, 2026 19:58
@Pouyanpi (Collaborator, Author) commented

The caches were deleted and recreated on this PR, so we should be OK. We must watch for a similar issue in case it reappears.

@Pouyanpi Pouyanpi merged commit 6a7be49 into develop Apr 14, 2026
2 checks passed
@Pouyanpi Pouyanpi deleted the fix/ci-venv-cache-sigill branch April 14, 2026 20:02

Development

Successfully merging this pull request may close these issues.

bug: CI failing on Python 3.10 PR tests

2 participants