fix(sandbox): validate tar entries before host-side extraction#2163
Conversation
backupSandboxState() extracted sandbox-produced tar archives on the host with no entry validation, enabling path traversal sandbox escape. Add safeTarExtract() which validates all entry paths stay within the target directory, extracts with --no-same-owner, and audits symlinks post-extraction.
📝 WalkthroughWalkthroughAdded tar-safety utilities: validation, hard-link rejection, and guarded extraction that runs tar list/extract, blocks unsafe archives, audits post-extraction symlinks, and updates backup logic to use the safe extractor. A new security test suite exercises these behaviors. Changes
Sequence DiagramsequenceDiagram
participant Caller
participant Validator as validateTarEntries / rejectHardLinks
participant TarList as "tar -tf"
participant TarExtract as "tar -xf --no-same-owner"
participant FS as Filesystem
participant Auditor as auditExtractedSymlinks
Caller->>Validator: validateTarEntries(tarBuffer, targetDir)
Validator->>TarList: run tar -tf on tarBuffer
TarList-->>Validator: entry list
alt validation or hard-link violations
Validator-->>Caller: TarValidationResult { safe: false, violations... }
else validations pass
Caller->>TarExtract: safeTarExtract(tarBuffer, targetDir)
TarExtract->>FS: write files into targetDir
TarExtract-->>Caller: extraction status
Caller->>Auditor: auditExtractedSymlinks(targetDir)
Auditor->>FS: resolve symlinks and check targets
alt symlink escapes found
Auditor-->>Caller: fail, remove/recreate targetDir
else all within target
Auditor-->>Caller: success
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lib/sandbox-state.ts`:
- Around line 117-163: validateTarEntries currently uses "tar -tf" which only
lists names and misses hard-link metadata; update validateTarEntries to
enumerate tar members with metadata (e.g. use "tar -tvf -" / verbose listing or
otherwise invoke tar to expose typeflag/linkname) and explicitly reject
hard-link entries by detecting members whose metadata indicate a hard link
(typeflag '1' or a non-empty linkname), pushing a violation like "hard link:
<entry>"; keep existing checks (null bytes, absolute paths, containment) and add
a regression test that creates a hard-link-to-traversal archive and asserts
validateTarEntries returns unsafe so the attack is blocked at validation time
(also note related symbols safeTarExtract(), auditExtractedSymlinks(), and
sanitizeBackupDirectory() for context).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 5ca1e362-3532-4f2e-8c80-95a15184ce27
📒 Files selected for processing (2)
src/lib/sandbox-state.tstest/security-sandbox-tar-traversal.test.ts
Add rejectHardLinks() that uses tar -tvf to detect hard-link entries before extraction. Hard links have no legitimate use in sandbox state backups and can reference files outside the extraction root.
There was a problem hiding this comment.
🧹 Nitpick comments (2)
test/security-sandbox-tar-traversal.test.ts (2)
145-155: Make the PoC assertions prove the dangerous path exactly.These checks still pass if the tar tool normalizes the malicious prefix away but leaves
evil.txtoretc/cron.din the listing, so they stop proving that the archive retained a traversal/absolute entry. Reading the header name from the tar buffer directly would make the PoC deterministic and avoid depending on hosttarformatting here.Also applies to: 162-169
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/security-sandbox-tar-traversal.test.ts` around lines 145 - 155, The test currently relies on spawning the system tar and checking list.stdout (variables: spawnSync call, list, entries) which can normalize paths; instead parse the in-memory tar Buffer (variable tar) to read header name bytes directly and assert the exact header name strings (e.g., "../evil.txt" or "/etc/cron.d/...") appear in the header name fields; update both occurrences (the block using spawnSync and the similar block at lines 162-169) to extract and assert header names from the tar buffer rather than from list.stdout so the PoC deterministically proves the traversal/absolute entry is preserved.
1-1: Import the source module directly and drop@ts-nocheck.Right now the behavior checks run against
dist/lib/sandbox-state.js, while the regression guards inspectsrc/lib/sandbox-state.ts. That split can let this suite validate a stale build instead of the code under review, and Line 1 hides any shape drift by disabling TS for the whole file. Prefer importing the source exports directly, or at least typing the module shape so the test stays checked.Also applies to: 118-127
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/security-sandbox-tar-traversal.test.ts` at line 1, Remove the top-line "// `@ts-nocheck`" and modify the test import so it references the source module (e.g., import the exports from the src/lib/sandbox-state TypeScript module instead of dist/lib/sandbox-state.js), or alternatively add an explicit typed module declaration for the tested exports (the sandbox-state exports used in this file and around lines 118-127) so TypeScript can verify shapes; ensure the test uses the actual source exports (same named functions/classes) rather than the built dist artifact and run the test type-check to confirm no shape drift.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@test/security-sandbox-tar-traversal.test.ts`:
- Around line 145-155: The test currently relies on spawning the system tar and
checking list.stdout (variables: spawnSync call, list, entries) which can
normalize paths; instead parse the in-memory tar Buffer (variable tar) to read
header name bytes directly and assert the exact header name strings (e.g.,
"../evil.txt" or "/etc/cron.d/...") appear in the header name fields; update
both occurrences (the block using spawnSync and the similar block at lines
162-169) to extract and assert header names from the tar buffer rather than from
list.stdout so the PoC deterministically proves the traversal/absolute entry is
preserved.
- Line 1: Remove the top-line "// `@ts-nocheck`" and modify the test import so it
references the source module (e.g., import the exports from the
src/lib/sandbox-state TypeScript module instead of dist/lib/sandbox-state.js),
or alternatively add an explicit typed module declaration for the tested exports
(the sandbox-state exports used in this file and around lines 118-127) so
TypeScript can verify shapes; ensure the test uses the actual source exports
(same named functions/classes) rather than the built dist artifact and run the
test type-check to confirm no shape drift.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 9260e932-ff8d-4b2d-9aaa-217cc10daee4
📒 Files selected for processing (2)
src/lib/sandbox-state.tstest/security-sandbox-tar-traversal.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- src/lib/sandbox-state.ts
…2268) (#2308) ## Summary Closes #2268 — P1 regression in v0.0.22. \`safeTarExtract\`'s post-extraction symlink audit (added in #2163 to block tar path-traversal attacks) flagged every symlink whose target resolved outside the extraction temp directory. The sandbox base image legitimately ships intra-sandbox symlinks like \`/sandbox/.openclaw → /sandbox/.openclaw-data\` — those resolve *outside* the host temp dir but *inside* the canonical \`/sandbox\` root, where they'll be correctly resolved when the backup is restored into a fresh sandbox. Result: the audit nuked every backup, breaking \`nemoclaw rebuild\` and \`nemoclaw snapshot create\` on v0.0.22. The only workaround was \`destroy + onboard\` (loses all state). ## Fix - \`auditExtractedSymlinks(dirPath, allowedRoots: string[])\` — now takes an array of allowed roots instead of a single one - \`safeTarExtract\` passes \`[targetDir, "/sandbox"]\` so intra-sandbox absolute symlinks are honored while genuine escape attempts (e.g. symlink to \`/etc/passwd\` or \`../../.ssh/authorized_keys\`) still abort the extraction The security guardrail is intact: \`/sandbox\` is a subtree root, so crafted symlinks with absolute targets outside \`/sandbox\` (e.g. \`/etc/passwd\`) are still rejected. ## Test plan - [x] New test: \`allows symlinks whose target resolves within /sandbox (intra-sandbox layout)\` — locks the regression fix - [x] New test: \`blocks symlinks that escape /sandbox even with an absolute target\` — confirms \`/etc/passwd\` symlinks still rejected - [x] Existing \`blocks symlink escaping target directory\` (relative \`../../\`) still passes - [x] All 21 sandbox-tar security tests pass - [x] Full CLI suite: 1783 passed 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Security & Bug Fixes** * Improved symlink validation during tar archive extraction to correctly allow absolute symlink targets that resolve within the sandbox environment. * Enhanced security checks now reject symlinks with targets outside permitted directories. <!-- end of auto-generated comment: release notes by coderabbit.ai --> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
backupSandboxState()extracted sandbox-produced tar archives on the host with no entry validation — a compromised sandbox could craft path-traversal entries (../../.ssh/authorized_keys) to write arbitrary files on the host filesystemsafeTarExtract()which validates all entry paths stay within the target directory, extracts with--no-same-owner, and audits symlinks post-extractiontest/security-sandbox-tar-traversal.test.ts) covering PoC, fix verification, and source-code regression guardsTest plan
npm run build:clicompiles successfullynpx vitest run test/security-sandbox-tar-traversal.test.ts(14/14)npx vitest run(1925 passed, 5 pre-existing failures in onboard.test.ts)tar -xfwithout validation remains in sandbox-state.tsSummary by CodeRabbit
New Features
Bug Fixes
Tests