fix(ci): pin baseline x86-64 compiler target to prevent SIGILL on cached venvs by Pouyanpi · Pull Request #1785 · NVIDIA-NeMo/Guardrails

Pouyanpi · 2026-04-13T14:39:11Z

Description

CI tests intermittently crash with Fatal Python error: Illegal instruction (SIGILL, exit code 132) on Ubuntu runners.
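Exit code 132 is how the shell encodes a process killed by a signal: 128 plus the signal number, and SIGILL is signal 4 on Linux. The minimal sketch below reproduces that status without any native code. It assumes bash, because the `kill -l` and `$?` behavior shown is bash's.

```bash
#!/usr/bin/env bash
# bash reports a child killed by signal N as exit status 128 + N.
sigill_num=$(kill -l ILL)   # signal number for SIGILL (4 on Linux)

# Simulate the crash: a child shell disables core dumps, then sends SIGILL to itself.
status=0
bash -c 'ulimit -c 0; kill -ILL $$' || status=$?

echo "SIGILL=$sigill_num exit_status=$status expected=$((128 + sigill_num))"
```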

GitHub Actions ubuntu-latest runners have heterogeneous CPU microarchitectures: some support AVX-512, others only AVX2. When native C extensions (e.g. annoy, numpy) are compiled from source on a runner whose CPU supports newer SIMD instructions and the resulting .venv is cached, restoring that cache on a runner with an older CPU makes the native code execute instructions that CPU does not support.

The cache key includes OS, arch (x86_64), and Python version — but not the CPU model. Since uname -m returns x86_64 regardless of instruction set support, different CPUs produce identical cache keys.
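To see why the key collides, compare what `uname -m` reports with the SIMD flags the kernel exposes in `/proc/cpuinfo`. The sketch below assumes a Linux runner. The function name `simd_level` is illustrative and not part of the PR's workflows.

```bash
#!/usr/bin/env bash
# Classify the highest SIMD tier advertised in a cpuinfo file.
# Illustrative helper only; not part of the PR's workflows.
simd_level() {
  local cpuinfo="${1:-/proc/cpuinfo}"
  local flags
  # First "flags" line only; empty if the file is missing (e.g. on macOS).
  flags=$(grep -m1 '^flags' "$cpuinfo" 2>/dev/null || true)
  if grep -qw 'avx512f' <<<"$flags"; then
    echo avx512
  elif grep -qw 'avx2' <<<"$flags"; then
    echo avx2
  else
    echo baseline
  fi
}

# On two different x86-64 runners the first value is the same (x86_64),
# while the second can differ; only the first feeds into the cache key.
echo "uname -m: $(uname -m)  SIMD tier: $(simd_level)"
```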

Fix

Set CFLAGS and CXXFLAGS to -march=x86-64 on Linux runners before poetry install. This forces all native extensions to compile for the baseline x86-64 instruction set, making cached .so files safe across any x86-64 runner.
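The same idea can be written as the body of a bash step. This minimal sketch assumes the step runs on a GitHub-hosted runner, where Actions sets `RUNNER_OS` and `GITHUB_ENV`. Appending `NAME=value` lines to the file at `$GITHUB_ENV` is what makes the flags visible to the later `poetry install` step, because a plain `export` only lasts for the current step. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical step body: pin native builds to the baseline x86-64 ISA on Linux only.
pin_baseline_march() {
  if [[ "${RUNNER_OS:-}" != "Linux" ]]; then
    return 0  # macOS/Windows runners keep their default compiler flags
  fi
  # Each NAME=value line appended to $GITHUB_ENV becomes an environment
  # variable for every subsequent step in the job (e.g. `poetry install`).
  {
    echo "CFLAGS=-march=x86-64"
    echo "CXXFLAGS=-march=x86-64"
  } >> "${GITHUB_ENV:?GITHUB_ENV is not set}"
}
```

Environment `CFLAGS` only sets a default. A package whose build adds its own `-march=native` can still override it, which is what the follow-up #1839 had to address for annoy.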

The flags are set on Linux runners only.

closes #1659

Summary by CodeRabbit

Chores
- Updated GitHub Actions workflow cache verification logic to improve environment validation.

greptile-apps · 2026-04-13T14:42:04Z

Greptile Summary

This PR fixes intermittent SIGILL (exit code 132) crashes on GitHub Actions by setting CFLAGS/CXXFLAGS=-march=x86-64 before poetry install, forcing native extensions to compile against the baseline x86-64 ISA so cached .so files are safe across heterogeneous runners. It also includes runner architecture in cache keys to differentiate x86_64 from arm64 runners.

P1 — full-tests.yml trigger: The original scoped triggers (branch-filtered push, PR review requests, workflow_dispatch) were replaced with a bare push: that runs the full Windows + macOS matrix (8 jobs) on every push to every branch. Combined with pr-tests.yml also using push:, this can significantly inflate runner costs if merged to develop.

Confidence Score: 4/5

Core SIGILL fix is correct; the bare push: trigger in full-tests.yml should be restored to branch-scoped filters before merging to develop.

The compiler flag approach in _test.yml and lint.yml is sound and directly addresses the root cause. However, full-tests.yml drops all branch/tag/PR filters and workflow_dispatch, leaving a bare push: that would run an expensive 8-job Windows+macOS matrix on every branch push if merged — a real CI cost and reliability concern.

full-tests.yml — the trigger block needs branch filters, PR review triggers, and workflow_dispatch restored before this lands in develop.

Important Files Changed

Filename	Overview
.github/workflows/_test.yml	Adds CFLAGS/CXXFLAGS=-march=x86-64 step before poetry install (Linux-only, correctly gated) and includes runner arch in cache key — the core fix for cross-CPU SIGILL crashes.
.github/workflows/lint.yml	Sets CFLAGS/CXXFLAGS at workflow-level env (safe since lint always runs on ubuntu-latest) and adds arch to cache key — consistent with the fix in _test.yml.
.github/workflows/full-tests.yml	Replaced scoped triggers (branch-filtered push, PR review requests, workflow_dispatch) with a bare push: that fires the full Windows+macOS matrix on every push to every branch — excessive CI spend if merged.
.github/workflows/pr-tests.yml	Changed from pull_request to push trigger (developer noted this is a temporary testing commit); removes fork-PR CI coverage.

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Push / PR Event] --> B{Workflow trigger}
    B --> C[pr-tests.yml\npush only]
    B --> D[full-tests.yml\npush only]
    B --> E[lint.yml\npull_request + push to main/develop]
    C --> F[Ubuntu matrix\n4 Python versions]
    D --> G[Windows + macOS matrix\n4 Python versions each]
    F --> H{Cache hit?}
    G --> H
    H -->|Yes| I[Ensure cache healthy\npoetry run pip --version]
    H -->|No| J[Set CFLAGS/CXXFLAGS\n-march=x86-64\nLinux only]
    I -->|Healthy| K[poetry install]
    I -->|Unhealthy| L[rm -rf .venv]
    L --> J
    J --> K
    K --> M[Native extensions compiled\nwith baseline x86-64 ISA]
    M --> N[Cache venv\nkey: OS + arch + py-version + lock-hash]
    N --> O[Run tests]
    style J fill:#90EE90
    style M fill:#90EE90
    style D fill:#FFB6C1
```

Comments Outside Diff (1)

.github/workflows/full-tests.yml, lines 4-5

Bare push: trigger runs full matrix on every branch push

The original trigger had branch filters (main, develop, v* tags), PR review-request triggers, and workflow_dispatch. The new bare push: fires the full Windows + macOS matrix (8 jobs: 2 OS × 4 Python versions) on every push to every feature branch, hotfix branch, or draft branch — with no way to manually re-trigger via the Actions UI either (workflow_dispatch was removed). Combined with pr-tests.yml also using a bare push:, both workflows now fire simultaneously on every branch push. If this is a temporary testing commit, it should not land in develop as-is.

```suggestion
on:
  pull_request:
    types: [review_requested, ready_for_review]
    paths-ignore:
      - "**/*.md"
      - ".github/**"
  push:
    branches:
      - main
      - develop
    tags:
      - "v*"
  workflow_dispatch:
```


Reviews (4): Last reviewed commit: "fix(ci): pin baseline x86-64 compiler ta..."

coderabbitai · 2026-04-13T14:43:33Z


Walkthrough

The GitHub Actions workflow cache health check was modified to verify the cached virtual environment by attempting to import the annoy module instead of checking the pip version. This more accurately validates whether required dependencies are available.

Changes

Cohort / File(s)	Summary
Workflow Cache Verification `.github/workflows/_test.yml`	Changed cache health verification command from `poetry run pip --version` to `poetry run python -c "import annoy"` to better detect missing or corrupted dependency installations.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Title check	⚠️ Warning	The PR title mentions pinning a baseline x86-64 compiler target, but the actual change validates native extensions in the cache health check by importing annoy instead of checking pip version.	Update the title to reflect the actual change: something like 'fix(ci): validate native extensions in venv cache health check' would better describe the fallback verification command change from pip --version to importing annoy.

✅ Passed checks (5 passed)

Check name	Status	Explanation
Linked Issues check	✅ Passed	The PR partially addresses the linked issue by improving cache health detection, but does not implement all proposed mitigations (NPY_DISABLE_CPU_FEATURES, faulthandler) from issue `#1659`.
Out of Scope Changes check	✅ Passed	The change is narrowly scoped to the cache health check validation and directly addresses the root cause of SIGILL crashes by detecting incompatible native extensions.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Test Results For Major Changes	✅ Passed	This PR contains a minor, single-line change to a GitHub Actions workflow file that improves cache health validation by checking for native extension compatibility.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

.github/workflows/_test.yml (1)

Line 83: Guard `timeout` by runner OS in this reusable workflow.

At line 83, `timeout` is Linux/GNU-specific; if callers use macOS or Windows images, this can invalidate .venv on every cache hit, even when the venv is healthy.

Proposed fix

```diff
-        run: timeout 10s poetry run python -c "import annoy" || rm -rf .venv
+        run: |
+          CHECK='from annoy import AnnoyIndex; i=AnnoyIndex(8,"angular"); i.add_item(0,[0.0]*8); i.build(1)'
+          if [[ "$RUNNER_OS" == "Linux" ]]; then
+            timeout 10s poetry run python -c "$CHECK" || rm -rf .venv
+          else
+            poetry run python -c "$CHECK" || rm -rf .venv
+          fi
```

```bash
#!/bin/bash
# Inspect callers of this reusable workflow and what images they pass.
# Expected: if non-Linux images are used, timeout guard is required.
rg -n -C3 --iglob '*.yml' --iglob '*.yaml' '_test\.yml|uses:.*_test\.yml|image:' .github/workflows
```
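An alternative to branching on `RUNNER_OS` is to branch on whether a `timeout` binary exists at all. This sketch assumes the `timeout` resolved on `PATH` is GNU coreutils' version. On Windows, `C:\Windows\System32\timeout.exe` is an unrelated command, so check which one the shell picks up. The helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: apply a 10s limit when a timeout binary is available,
# otherwise run the command unbounded. Returns the command's exit status
# (or 124 from GNU timeout if the limit is hit).
run_with_optional_timeout() {
  if command -v timeout >/dev/null 2>&1; then
    timeout 10s "$@"
  else
    "$@"
  fi
}

# Cache health check in the spirit of the proposed fix above: a failed
# import, or one that crashes with SIGILL (status 132), discards the venv.
# run_with_optional_timeout poetry run python -c "import annoy" || rm -rf .venv
```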


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: c48424a5-bd84-4df6-a6e7-8ce6cb0c28f2

📥 Commits

Reviewing files that changed from the base of the PR and between 3a9cbad and 28a27a0.

📒 Files selected for processing (1)

.github/workflows/_test.yml

Copilot

Pull request overview

Pins Linux compiler target to baseline x86-64 in CI to prevent cached virtualenvs from containing native extensions compiled with newer SIMD instructions that can SIGILL on older GitHub-hosted runners.

Changes:

- Set CFLAGS/CXXFLAGS to -march=x86-64 on Linux runners before installing dependencies.
- Modify PR test workflow trigger configuration.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
.github/workflows/_test.yml	Exports baseline x86-64 compiler flags for Linux builds before `poetry install`.
.github/workflows/pr-tests.yml	Changes workflow trigger from `pull_request` to `push` and adjusts ignored paths.


codecov · 2026-04-14T19:42:41Z

Codecov Report

✅ All modified and coverable lines are covered by tests.



Pouyanpi · 2026-04-14T20:02:34Z

The caches were deleted and recreated on this PR, so we should be OK. We must watch for similar issues in case this reappears.

Pouyanpi added this to the v0.22.0 milestone Apr 13, 2026

Pouyanpi self-assigned this Apr 13, 2026

Pouyanpi added the CI label Apr 13, 2026

greptile-apps Bot reviewed Apr 13, 2026


Comment thread .github/workflows/_test.yml Outdated

coderabbitai Bot reviewed Apr 13, 2026


Comment thread .github/workflows/_test.yml Outdated

Pouyanpi force-pushed the fix/ci-venv-cache-sigill branch from 28a27a0 to fb3339d April 14, 2026 19:24

Pouyanpi changed the title ~~fix(ci): validate native extensions in venv cache health check~~ fix(ci): pin baseline x86-64 compiler target to prevent SIGILL on cached venvs Apr 14, 2026

Pouyanpi force-pushed the fix/ci-venv-cache-sigill branch from b29ae7a to 1cf7365 April 14, 2026 19:27

greptile-apps Bot reviewed Apr 14, 2026


Comment thread .github/workflows/pr-tests.yml Outdated

Pouyanpi requested a review from Copilot April 14, 2026 19:33

Copilot started reviewing on behalf of Pouyanpi April 14, 2026 19:33

Pouyanpi marked this pull request as draft April 14, 2026 19:35

Copilot AI reviewed Apr 14, 2026


Comment thread .github/workflows/pr-tests.yml Outdated

Comment thread .github/workflows/_test.yml

Pouyanpi force-pushed the fix/ci-venv-cache-sigill branch 2 times, most recently from a62b2dd to 1a5cc57 April 14, 2026 19:50

Pouyanpi marked this pull request as ready for review April 14, 2026 19:58

fix(ci): pin baseline x86-64 compiler target to prevent SIGILL on cached venvs

fe12b3e

Pouyanpi force-pushed the fix/ci-venv-cache-sigill branch from 69c563f to fe12b3e April 14, 2026 19:58

Pouyanpi merged commit 6a7be49 into develop Apr 14, 2026
2 checks passed

Pouyanpi deleted the fix/ci-venv-cache-sigill branch April 14, 2026 20:02

Pouyanpi mentioned this pull request Apr 29, 2026

fix(ci): override annoy's -march=native to actually enforce baseline x86-64 #1839

Merged

3 tasks

Pouyanpi merged 1 commit into develop from fix/ci-venv-cache-sigill
