wip ci(smoke): enable agent isolation on smoke runs #525

Closed
akshaylive wants to merge 2 commits into main from akshaya/isolate

@akshaylive
Collaborator

Summary

  • Set `agent.isolation: true` in `tests/experiments/default.yaml` so smoke tasks exercise coder_eval's filesystem-perimeter sandbox
  • Pin the e2e smoke job to `ubuntu-24.04` and install `bubblewrap` + `socat` (with the AppArmor sysctl + `aa-disable` workarounds) so the SDK's bwrap sandbox actually engages instead of silently falling back to no-op
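
One way to catch the silent no-op fallback described above is a pre-flight probe that fails loudly when the sandbox cannot start. The following is a minimal sketch, not part of this PR: `sandbox_probe` is a hypothetical helper name, and it assumes that being able to launch a trivial `bwrap` sandbox is a reasonable proxy for the SDK's sandbox engaging.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight probe (not from the PR): fail loudly if the
# bwrap sandbox cannot start, instead of letting a run fall back silently.
sandbox_probe() {
  local tool
  # Both tools are named in the PR summary as sandbox dependencies.
  for tool in bwrap socat; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "sandbox probe: missing dependency: $tool" >&2
      return 1
    fi
  done
  # Launching a trivial sandbox exercises namespace creation, which is
  # what the AppArmor unprivileged-userns restriction can block.
  if ! bwrap --ro-bind / / true; then
    echo "sandbox probe: bwrap is installed but could not create a sandbox" >&2
    return 1
  fi
  echo "sandbox probe: ok"
}
```

Calling `sandbox_probe` as an early step and treating a non-zero status as a failure would surface a broken runner setup before any smoke task runs.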

Mirrors UiPath/coder_eval#207.

Test plan

  • Smoke Skill Tests workflow passes on this PR
  • Spot-check one task's `sdk_options.json` (or task.json) to confirm `sandbox.enabled=true` was sent to the SDK
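
The spot-check in the test plan could be scripted roughly as follows. This is a sketch under stated assumptions: the layout of `sdk_options.json` is not shown in this thread, so the code assumes a `"sandbox"` object whose direct keys include `"enabled"`, and `sandbox_enabled` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical check (file layout is an assumption): succeed only if the
# JSON file contains "sandbox": { ... "enabled": true ... }.
sandbox_enabled() {
  local file=$1
  [ -f "$file" ] || { echo "no such file: $file" >&2; return 2; }
  # Strip whitespace so the pattern also matches pretty-printed JSON.
  # This is a textual match, not a JSON parse: it only sees "enabled"
  # when no nested object precedes it inside "sandbox".
  tr -d '[:space:]' <"$file" | grep -Eq '"sandbox":\{[^{}]*"enabled":true'
}
```

Where `jq` is available, `jq -e '.sandbox.enabled == true' sdk_options.json` would be a more robust version of the same check, again assuming that key path.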

🤖 Generated with Claude Code

Set agent.isolation=true in tests/experiments/default.yaml and update
the e2e smoke job to ubuntu-24.04 with bubblewrap+socat installed and
AppArmor's unprivileged-userns restriction relaxed, so the SDK's bwrap
sandbox actually engages instead of silently falling back to no-op.

Mirrors UiPath/coder_eval#207.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@akshaylive akshaylive requested a review from uipreliga as a code owner May 1, 2026 22:51
@github-actions
Contributor

github-actions Bot commented May 1, 2026

Claude finished @akshaylive's task in 1m 39s


Review: PR #525 — ci(smoke): enable agent isolation on smoke runs

  • Read review criteria and project conventions
  • Read full diff and changed files
  • Check PR conversation for existing feedback
  • Perform review against criteria
  • Post review

Summary

Enables filesystem-perimeter sandboxing (agent.isolation: true) for all smoke test tasks by updating the default experiment config, and pins the CI runner to ubuntu-24.04 with bubblewrap + socat installed (plus AppArmor workarounds) so the sandbox actually engages.

Change-by-Change Review

1. .github/workflows/smoke-skills.yml

Severity: OK

Clean, well-scoped changes:

  • ubuntu-latest → ubuntu-24.04 (line 157): Correct. ubuntu-latest currently resolves to 22.04, which ships bubblewrap 0.6.1 lacking the mount overlay features the SDK requires. The explanatory comment is thorough.
  • New step: Install OS sandbox dependencies (lines 183–197): Installs bubblewrap and socat with --no-install-recommends (minimal footprint). The AppArmor workarounds are properly guarded:
    • echo 0 | sudo tee /proc/sys/kernel/apparmor_restrict_unprivileged_userns || true — gracefully no-ops if the sysctl doesn't exist.
    • The for loop over AppArmor profiles checks [ -f "$prof" ] before disabling, with || true catching both missing-file and aa-disable failures. Correct bash short-circuit.
  • The detect job stays on ubuntu-latest — appropriate since it only does git operations and doesn't need bwrap.
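
The guarded install step described in the bullets above can be sketched as below. This is an illustration reconstructed from the review text, not the PR's actual workflow step: the `DRY_RUN` switch and the `run` helper are additions so the sketch is safe to execute anywhere, and the AppArmor profile paths are taken as arguments because the PR's real list is not shown in this thread.

```shell
#!/usr/bin/env bash
# Sketch of the sandbox-dependency step. DRY_RUN=1 (the default here) only
# prints the privileged commands instead of running them.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

install_sandbox_deps() {
  # --no-install-recommends keeps the footprint minimal.
  run sudo apt-get install -y --no-install-recommends bubblewrap socat
}

relax_apparmor() {
  local knob=/proc/sys/kernel/apparmor_restrict_unprivileged_userns
  # "|| true" makes this a no-op when the sysctl does not exist.
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ echo 0 | sudo tee $knob || true"
  else
    echo 0 | sudo tee "$knob" >/dev/null || true
  fi
  # Profile paths come from the caller; none are hard-coded because the
  # PR's actual list is not visible in this thread.
  local prof
  for prof in "$@"; do
    # Skip missing files; tolerate aa-disable failures.
    [ -f "$prof" ] && run sudo aa-disable "$prof" || true
  done
}
```

With `DRY_RUN=0`, `install_sandbox_deps` followed by `relax_apparmor <profile paths>` would perform the real changes; the `[ -f ... ] && ... || true` form is the short-circuit pattern the review refers to.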

One observation: once ubuntu-latest resolves to 24.04 (GitHub began migrating the label in late 2024), the pin becomes redundant but harmless. No action needed now; noting it for future cleanup.

2. tests/experiments/default.yaml

Severity: OK

isolation: true added at line 21, correctly placed under agent: at the same indentation level as max_turns, allowed_tools, and plugins. The YAML structure is valid.
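
The placement described above can be checked mechanically. The sketch below is a line-based approximation rather than a YAML parser: `agent_isolation_enabled` is a hypothetical helper name, and it accepts `isolation: true` at any indentation inside the top-level `agent:` block, so it does not verify that the key sits at exactly the same level as `max_turns`.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed if "isolation: true" appears inside the
# top-level "agent:" block of the given YAML file.
agent_isolation_enabled() {
  awk '
    /^agent:[ \t]*$/ { in_agent = 1; next }   # enter the agent: block
    /^[^ \t#]/       { in_agent = 0 }         # another top-level key ends it
    in_agent && /^[ \t]+isolation:[ \t]*true[ \t]*(#.*)?$/ { found = 1 }
    END { exit (found ? 0 : 1) }
  ' "$1"
}
```

For example, `agent_isolation_enabled tests/experiments/default.yaml && echo "isolation on"` would print the message only when the flag is set under `agent:`.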

What's Missing

Nothing identified. This is a CI infrastructure change — no skill files are modified, so no CODEOWNERS, frontmatter, or skill-body checks apply.

Area Ratings

| Area | Status | Notes |
| --- | --- | --- |
| Frontmatter | N/A | No skill files changed |
| E2E Tests | N/A | Test config updated, not test tasks |
| Skill Body | N/A | No skill files changed |
| References & Assets | N/A | No skill files changed |
| Repo Hygiene | OK | Scoped to CI + test config, no secrets, no cross-skill refs |

Issues for Manual Review

  • Confirm the upstream UiPath/coder_eval#207 change is merged/available so the SDK actually honors isolation: true when it receives it.
  • Per the PR's test plan: spot-check one task's sdk_options.json (or task.json) in a CI run to confirm sandbox.enabled=true was sent to the SDK.

Conclusion

Clean, well-documented CI change. The runner pin and sandbox dependency installation are correctly implemented with appropriate fallbacks. No issues found — approve.

@akshaylive akshaylive changed the title ci(smoke): enable agent isolation on smoke runs wip ci(smoke): enable agent isolation on smoke runs May 3, 2026
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@akshaylive
Collaborator Author

After some back and forth: this change works, but it is very brittle. Closing this PR without merging.

@akshaylive akshaylive closed this May 5, 2026