fix(sandbox): validate tar entries before host-side extraction#2163

Merged
cv merged 4 commits into main from fix/sandbox-tar-traversal
Apr 21, 2026
Conversation

@ericksoa ericksoa commented Apr 21, 2026

Summary

  • Security fix (HIGH): backupSandboxState() extracted sandbox-produced tar archives on the host with no entry validation — a compromised sandbox could craft path-traversal entries (../../.ssh/authorized_keys) to write arbitrary files on the host filesystem
  • Adds safeTarExtract() which validates all entry paths stay within the target directory, extracts with --no-same-owner, and audits symlinks post-extraction
  • Adds 14-test security regression suite (test/security-sandbox-tar-traversal.test.ts) covering PoC, fix verification, and source-code regression guards
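
The containment rule in the second bullet can be sketched as a pure function. This is a minimal illustration, not the code from `src/lib/sandbox-state.ts`: the name `findEntryViolations` and the violation strings are hypothetical, and it assumes POSIX-style paths on the host.

```typescript
import * as path from "node:path";

/**
 * Hypothetical helper (not the PR's implementation): returns the reasons an
 * archive entry name is unsafe to extract into targetDir. Empty array = safe.
 */
export function findEntryViolations(entryName: string, targetDir: string): string[] {
  const violations: string[] = [];

  // A NUL byte can truncate the name in lower-level APIs, so reject it outright.
  if (entryName.includes("\0")) {
    violations.push(`null byte in entry name: ${JSON.stringify(entryName)}`);
    return violations;
  }

  if (path.posix.isAbsolute(entryName)) {
    violations.push(`absolute path: ${entryName}`);
  }

  // Resolve the entry against the target and require the result to stay inside it.
  // Comparing against root + "/" avoids treating /tmp/extract-evil as inside /tmp/extract.
  const root = path.posix.resolve(targetDir);
  const resolved = path.posix.resolve(root, entryName);
  if (resolved !== root && !resolved.startsWith(root + "/")) {
    violations.push(`escapes target directory: ${entryName}`);
  }

  return violations;
}
```

With a target of `/tmp/extract`, an entry such as `../../.ssh/authorized_keys` resolves outside the root and is reported, while `state/config.json` produces no violations.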

Test plan

  • npm run build:cli compiles successfully
  • New test passes: npx vitest run test/security-sandbox-tar-traversal.test.ts (14/14)
  • Full suite passes: npx vitest run (1925 passed, 5 pre-existing failures in onboard.test.ts)
  • Manual review: confirm no raw tar -xf without validation remains in sandbox-state.ts

Summary by CodeRabbit

  • New Features

    • Stronger tar extraction safeguards: archives are validated and unsafe contents (path traversal, absolute paths, hidden escapes, hard links) are blocked; post-extract checks remove unsafe results.
  • Bug Fixes

    • Backup restoration now uses guarded extraction and logs blocked extractions as security failures.
  • Tests

    • Added security regression tests covering traversal, absolute paths, symlink escapes, hard-link detection, and extraction safeguards.

backupSandboxState() extracted sandbox-produced tar archives on the host
with no entry validation, enabling path traversal sandbox escape. Add
safeTarExtract() which validates all entry paths stay within the target
directory, extracts with --no-same-owner, and audits symlinks post-extraction.

coderabbitai Bot commented Apr 21, 2026

📝 Walkthrough

Added tar-safety utilities: validation, hard-link rejection, and guarded extraction that runs tar list/extract, blocks unsafe archives, audits post-extraction symlinks, and updates backup logic to use the safe extractor. A new security test suite exercises these behaviors.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Tar Security Implementation: src/lib/sandbox-state.ts | Added TarValidationResult and SafeExtractResult types; implemented validateTarEntries(), rejectHardLinks(), and safeTarExtract() which run tar listing, reject null/absolute/path-traversal entries and hard links, perform tar -xf - --no-same-owner -C targetDir, audit extracted symlinks, and update backupSandboxState to use the new extractor. |
| Security Test Coverage: test/security-sandbox-tar-traversal.test.ts | New Vitest suite creating in-memory ustar tars to assert detection of traversal/absolute paths, symlink escape detection and directory cleanup, hard-link rejection, and source-level regression guards verifying safeTarExtract usage and extraction flags. |

Sequence Diagram

sequenceDiagram
    participant Caller
    participant Validator as validateTarEntries / rejectHardLinks
    participant TarList as "tar -tf"
    participant TarExtract as "tar -xf --no-same-owner"
    participant FS as Filesystem
    participant Auditor as auditExtractedSymlinks

    Caller->>Validator: validateTarEntries(tarBuffer, targetDir)
    Validator->>TarList: run tar -tf on tarBuffer
    TarList-->>Validator: entry list
    alt validation or hard-link violations
        Validator-->>Caller: TarValidationResult { safe: false, violations... }
    else validations pass
        Caller->>TarExtract: safeTarExtract(tarBuffer, targetDir)
        TarExtract->>FS: write files into targetDir
        TarExtract-->>Caller: extraction status
        Caller->>Auditor: auditExtractedSymlinks(targetDir)
        Auditor->>FS: resolve symlinks and check targets
        alt symlink escapes found
            Auditor-->>Caller: fail, remove/recreate targetDir
        else all within target
            Auditor-->>Caller: success
        end
    end
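
The list, validate, extract, audit order in the diagram can be sketched roughly as follows. This is an assumption-laden outline rather than the merged code: `guardedExtract`, `Validator`, `Auditor`, and `defaultRunTar` are hypothetical names, and the validation and audit steps are passed in as callbacks. Only the `tar -tf -` and `tar -xf - --no-same-owner -C targetDir` invocations come from the walkthrough.

```typescript
import { spawnSync } from "node:child_process";

interface ExtractResult {
  success: boolean;
  violations: string[];
}

type Validator = (entries: string[], targetDir: string) => string[];
type Auditor = (targetDir: string) => string[];
type TarRunner = (args: string[], input: Buffer) => { status: number | null; stdout: string };

// Hypothetical orchestration: nothing is written to disk until the listing passes validation.
export function guardedExtract(
  tar: Buffer,
  targetDir: string,
  validate: Validator,
  audit: Auditor,
  runTar: TarRunner = defaultRunTar,
): ExtractResult {
  // 1. List entry names from the archive on stdin; this step writes nothing.
  const list = runTar(["-tf", "-"], tar);
  if (list.status !== 0) return { success: false, violations: ["tar listing failed"] };
  const entries = list.stdout.split("\n").filter(Boolean);

  // 2. Refuse to extract if any entry name is unsafe.
  const violations = validate(entries, targetDir);
  if (violations.length > 0) return { success: false, violations };

  // 3. Extract without restoring archive ownership.
  const extract = runTar(["-xf", "-", "--no-same-owner", "-C", targetDir], tar);
  if (extract.status !== 0) return { success: false, violations: ["tar extraction failed"] };

  // 4. Audit what landed on disk. The diagram also removes and recreates
  //    targetDir on failure; that cleanup is left out of this sketch.
  const escapes = audit(targetDir);
  if (escapes.length > 0) return { success: false, violations: escapes };

  return { success: true, violations: [] };
}

// Default runner: pipes the archive to the system tar binary.
function defaultRunTar(args: string[], input: Buffer): { status: number | null; stdout: string } {
  const result = spawnSync("tar", args, { input, encoding: "utf8" });
  return { status: result.status, stdout: result.stdout ?? "" };
}
```

Injecting the tar runner keeps the control flow testable without a tar binary; a caller would normally rely on the default.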

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

A rabbit nibbles through tar and code, 🐇
Sniffing paths that try to go rogue,
Blocks the tricks and seals each gate,
Rebuilds the nest when links misbehave,
Hooray — safe packs now hop home! 🎉

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main security fix: validating tar entries before extraction to prevent host-side vulnerabilities. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 91.67% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@ericksoa ericksoa self-assigned this Apr 21, 2026
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/sandbox-state.ts`:
- Around line 117-163: validateTarEntries currently uses "tar -tf" which only
lists names and misses hard-link metadata; update validateTarEntries to
enumerate tar members with metadata (e.g. use "tar -tvf -" / verbose listing or
otherwise invoke tar to expose typeflag/linkname) and explicitly reject
hard-link entries by detecting members whose metadata indicate a hard link
(typeflag '1' or a non-empty linkname), pushing a violation like "hard link:
<entry>"; keep existing checks (null bytes, absolute paths, containment) and add
a regression test that creates a hard-link-to-traversal archive and asserts
validateTarEntries returns unsafe so the attack is blocked at validation time
(also note related symbols safeTarExtract(), auditExtractedSymlinks(), and
sanitizeBackupDirectory() for context).
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 5ca1e362-3532-4f2e-8c80-95a15184ce27

📥 Commits

Reviewing files that changed from the base of the PR and between 5c4c48a and d0d1383.

📒 Files selected for processing (2)
  • src/lib/sandbox-state.ts
  • test/security-sandbox-tar-traversal.test.ts

Comment thread src/lib/sandbox-state.ts
Add rejectHardLinks() that uses tar -tvf to detect hard-link entries
before extraction. Hard links have no legitimate use in sandbox state
backups and can reference files outside the extraction root.
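
A rough sketch of how hard-link lines might be picked out of a verbose listing. It assumes the `tar -tvf` output marks hard links with a leading `h` in the mode column, as GNU tar does; other tar implementations may format the listing differently. `findHardLinkViolations` is a hypothetical name, not the PR's `rejectHardLinks()`.

```typescript
/**
 * Hypothetical helper: scans `tar -tvf` verbose output and returns one
 * violation per line whose mode column starts with "h" (hard link, per the
 * GNU tar listing convention assumed here).
 */
export function findHardLinkViolations(verboseListing: string): string[] {
  return verboseListing
    .split("\n")
    .map((line) => line.trimEnd())
    .filter((line) => line.startsWith("h"))
    .map((line) => `hard link: ${line}`);
}
```

Because this depends on the host tar's output format, checking the header typeflag in the archive bytes is a more deterministic alternative.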
@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (2)
test/security-sandbox-tar-traversal.test.ts (2)

145-155: Make the PoC assertions prove the dangerous path exactly.

These checks still pass if the tar tool normalizes the malicious prefix away but leaves evil.txt or etc/cron.d in the listing, so they stop proving that the archive retained a traversal/absolute entry. Reading the header name from the tar buffer directly would make the PoC deterministic and avoid depending on host tar formatting here.

Also applies to: 162-169

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/security-sandbox-tar-traversal.test.ts` around lines 145 - 155, The test
currently relies on spawning the system tar and checking list.stdout (variables:
spawnSync call, list, entries) which can normalize paths; instead parse the
in-memory tar Buffer (variable tar) to read header name bytes directly and
assert the exact header name strings (e.g., "../evil.txt" or "/etc/cron.d/...")
appear in the header name fields; update both occurrences (the block using
spawnSync and the similar block at lines 162-169) to extract and assert header
names from the tar buffer rather than from list.stdout so the PoC
deterministically proves the traversal/absolute entry is preserved.

1-1: Import the source module directly and drop @ts-nocheck.

Right now the behavior checks run against dist/lib/sandbox-state.js, while the regression guards inspect src/lib/sandbox-state.ts. That split can let this suite validate a stale build instead of the code under review, and Line 1 hides any shape drift by disabling TS for the whole file. Prefer importing the source exports directly, or at least typing the module shape so the test stays checked.

Also applies to: 118-127

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/security-sandbox-tar-traversal.test.ts` at line 1, Remove the top-line
"// `@ts-nocheck`" and modify the test import so it references the source module
(e.g., import the exports from the src/lib/sandbox-state TypeScript module
instead of dist/lib/sandbox-state.js), or alternatively add an explicit typed
module declaration for the tested exports (the sandbox-state exports used in
this file and around lines 118-127) so TypeScript can verify shapes; ensure the
test uses the actual source exports (same named functions/classes) rather than
the built dist artifact and run the test type-check to confirm no shape drift.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 9260e932-ff8d-4b2d-9aaa-217cc10daee4

📥 Commits

Reviewing files that changed from the base of the PR and between d0d1383 and 1dd388b.

📒 Files selected for processing (2)
  • src/lib/sandbox-state.ts
  • test/security-sandbox-tar-traversal.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/lib/sandbox-state.ts

Comment thread test/security-sandbox-tar-traversal.test.ts
Comment thread test/security-sandbox-tar-traversal.test.ts
@wscurran wscurran added the security, priority: high, NemoClaw CLI, and fix labels Apr 21, 2026
@cv cv merged commit 4eca8ea into main Apr 21, 2026
15 checks passed
@cv cv added the v0.0.22 Release target label Apr 21, 2026
ericksoa pushed a commit that referenced this pull request Apr 23, 2026
…2268) (#2308)

## Summary

Closes #2268 — P1 regression in v0.0.22.

`safeTarExtract`'s post-extraction symlink audit (added in #2163 to
block tar path-traversal attacks) flagged every symlink whose target
resolved outside the extraction temp directory. The sandbox base image
legitimately ships intra-sandbox symlinks like `/sandbox/.openclaw →
/sandbox/.openclaw-data`. Those resolve *outside* the host temp dir
but *inside* the canonical `/sandbox` root, where they'll be correctly
resolved when the backup is restored into a fresh sandbox.

Result: the audit nuked every backup, breaking `nemoclaw rebuild` and
`nemoclaw snapshot create` on v0.0.22. The only workaround was
`destroy + onboard` (loses all state).

## Fix

- `auditExtractedSymlinks(dirPath, allowedRoots: string[])` now
takes an array of allowed roots instead of a single one
- `safeTarExtract` passes `[targetDir, "/sandbox"]` so intra-sandbox
absolute symlinks are honored while genuine escape attempts (e.g.
symlink to `/etc/passwd` or `../../.ssh/authorized_keys`) still
abort the extraction

The security guardrail is intact: `/sandbox` is a subtree root, so
crafted symlinks with absolute targets outside `/sandbox` (e.g.
`/etc/passwd`) are still rejected.
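
The allowed-roots check described in this fix can be sketched as a lexical path test. This is an illustration only: `isSymlinkTargetAllowed` is a hypothetical name, and it compares normalized paths without following symlink chains, which the real audit may handle differently.

```typescript
import * as path from "node:path";

/**
 * Hypothetical check: does a symlink's target resolve inside any allowed root?
 * linkPath is where the symlink lives; linkTarget is what readlink returned.
 */
export function isSymlinkTargetAllowed(
  linkPath: string,
  linkTarget: string,
  allowedRoots: string[],
): boolean {
  // Relative targets resolve against the directory containing the link;
  // absolute targets replace it entirely.
  const resolved = path.posix.resolve(path.posix.dirname(linkPath), linkTarget);
  return allowedRoots.some((root) => {
    const normalized = path.posix.resolve(root);
    // Require an exact match or a "/" boundary so /sandbox-evil does not pass as /sandbox.
    return resolved === normalized || resolved.startsWith(normalized + "/");
  });
}
```

With roots `["/tmp/extract", "/sandbox"]`, a link to `/sandbox/.openclaw-data` passes, while links to `/etc/passwd` or `../../.ssh/authorized_keys` do not.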

## Test plan

- [x] New test: `allows symlinks whose target resolves within /sandbox
(intra-sandbox layout)`, which locks the regression fix
- [x] New test: `blocks symlinks that escape /sandbox even with an
absolute target`, which confirms `/etc/passwd` symlinks are still rejected
- [x] Existing `blocks symlink escaping target directory` (relative
`../../`) still passes
- [x] All 21 sandbox-tar security tests pass
- [x] Full CLI suite: 1783 passed

🤖 Generated with [Claude Code](https://claude.com/claude-code)

## Summary by CodeRabbit

* **Security & Bug Fixes**
  * Improved symlink validation during tar archive extraction to correctly
    allow absolute symlink targets that resolve within the sandbox
    environment.
  * Enhanced security checks now reject symlinks with targets outside
    permitted directories.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>