fix(sandbox): validate tar entries before host-side extraction by ericksoa · Pull Request #2163 · NVIDIA/NemoClaw

ericksoa · 2026-04-21T11:13:31Z

Summary

- Security fix (HIGH): backupSandboxState() extracted sandbox-produced tar archives on the host with no entry validation, so a compromised sandbox could craft path-traversal entries (../../.ssh/authorized_keys) to write arbitrary files on the host filesystem
- Adds safeTarExtract(), which validates that all entry paths stay within the target directory, extracts with --no-same-owner, and audits symlinks post-extraction
- Adds a 14-test security regression suite (test/security-sandbox-tar-traversal.test.ts) covering the PoC, fix verification, and source-code regression guards
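
The entry-path check described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code in src/lib/sandbox-state.ts: the function name `findUnsafeEntries` and the violation strings are hypothetical, and the entry list is assumed to have been obtained already (for example from a tar listing).

```typescript
import * as path from "node:path";

/**
 * Hypothetical helper: returns one violation string per unsafe entry name.
 * An entry is unsafe if it contains a NUL byte, is absolute, or resolves
 * outside targetDir once joined to it.
 */
function findUnsafeEntries(entries: string[], targetDir: string): string[] {
  const root = path.resolve(targetDir);
  const violations: string[] = [];
  for (const entry of entries) {
    if (entry.includes("\0")) {
      violations.push(`null byte: ${JSON.stringify(entry)}`);
      continue;
    }
    if (path.isAbsolute(entry)) {
      violations.push(`absolute path: ${entry}`);
      continue;
    }
    // Resolve lexically, then check that the result is still under root.
    const rel = path.relative(root, path.resolve(root, entry));
    if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
      violations.push(`path traversal: ${entry}`);
    }
  }
  return violations;
}
```

An entry such as `a/../b` passes because it still resolves inside the target directory, while `../../.ssh/authorized_keys` is reported as a traversal.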

Test plan

- npm run build:cli compiles successfully
- New test passes: npx vitest run test/security-sandbox-tar-traversal.test.ts (14/14)
- Full suite passes: npx vitest run (1925 passed, 5 pre-existing failures in onboard.test.ts)
- Manual review: confirm no raw tar -xf without validation remains in sandbox-state.ts

Summary by CodeRabbit

New Features
- Stronger tar extraction safeguards: archives are validated and unsafe contents (path traversal, absolute paths, hidden escapes, hard links) are blocked; post-extract checks remove unsafe results.
Bug Fixes
- Backup restoration now uses guarded extraction and logs blocked extractions as security failures.
Tests
- Added security regression tests covering traversal, absolute paths, symlink escapes, hard-link detection, and extraction safeguards.

backupSandboxState() extracted sandbox-produced tar archives on the host with no entry validation, enabling path traversal sandbox escape. Add safeTarExtract() which validates all entry paths stay within the target directory, extracts with --no-same-owner, and audits symlinks post-extraction.

coderabbitai · 2026-04-21T11:13:48Z

Walkthrough

Added tar-safety utilities: validation, hard-link rejection, and guarded extraction that runs tar list/extract, blocks unsafe archives, audits post-extraction symlinks, and updates backup logic to use the safe extractor. A new security test suite exercises these behaviors.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Tar Security Implementation `src/lib/sandbox-state.ts` | Added `TarValidationResult` and `SafeExtractResult` types; implemented `validateTarEntries()`, `rejectHardLinks()`, and `safeTarExtract()` which run tar listing, reject null/absolute/path-traversal entries and hard links, perform `tar -xf - --no-same-owner -C targetDir`, audit extracted symlinks, and update `backupSandboxState` to use the new extractor. |
| Security Test Coverage `test/security-sandbox-tar-traversal.test.ts` | New Vitest suite creating in-memory ustar tars to assert detection of traversal/absolute paths, symlink escape detection and directory cleanup, hard-link rejection, and source-level regression guards verifying `safeTarExtract` usage and extraction flags. |

Sequence Diagram

sequenceDiagram
    participant Caller
    participant Validator as validateTarEntries / rejectHardLinks
    participant TarList as "tar -tf"
    participant TarExtract as "tar -xf --no-same-owner"
    participant FS as Filesystem
    participant Auditor as auditExtractedSymlinks

    Caller->>Validator: validateTarEntries(tarBuffer, targetDir)
    Validator->>TarList: run tar -tf on tarBuffer
    TarList-->>Validator: entry list
    alt validation or hard-link violations
        Validator-->>Caller: TarValidationResult { safe: false, violations... }
    else validations pass
        Caller->>TarExtract: safeTarExtract(tarBuffer, targetDir)
        TarExtract->>FS: write files into targetDir
        TarExtract-->>Caller: extraction status
        Caller->>Auditor: auditExtractedSymlinks(targetDir)
        Auditor->>FS: resolve symlinks and check targets
        alt symlink escapes found
            Auditor-->>Caller: fail, remove/recreate targetDir
        else all within target
            Auditor-->>Caller: success
        end
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

A rabbit nibbles through tar and code, 🐇
Sniffing paths that try to go rogue,
Blocks the tricks and seals each gate,
Rebuilds the nest when links misbehave,
Hooray — safe packs now hop home! 🎉

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main security fix: validating tar entries before extraction to prevent host-side vulnerabilities. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 91.67% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/sandbox-state.ts`:
- Around line 117-163: validateTarEntries currently uses "tar -tf" which only
lists names and misses hard-link metadata; update validateTarEntries to
enumerate tar members with metadata (e.g. use "tar -tvf -" / verbose listing or
otherwise invoke tar to expose typeflag/linkname) and explicitly reject
hard-link entries by detecting members whose metadata indicate a hard link
(typeflag '1' or a non-empty linkname), pushing a violation like "hard link:
<entry>"; keep existing checks (null bytes, absolute paths, containment) and add
a regression test that creates a hard-link-to-traversal archive and asserts
validateTarEntries returns unsafe so the attack is blocked at validation time
(also note related symbols safeTarExtract(), auditExtractedSymlinks(), and
sanitizeBackupDirectory() for context).
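
The typeflag-based detection suggested in this comment could look roughly like the sketch below, which reads ustar header blocks straight from the archive buffer instead of relying on tar's listing output. The helper names `readField` and `findHardLinkEntries` are hypothetical. The sketch assumes plain ustar headers with octal size fields and does not handle the ustar prefix field or pax/GNU long-name extension headers.

```typescript
const BLOCK = 512;

/** Reads a NUL-terminated string field from a tar header block. */
function readField(block: Buffer, start: number, length: number): string {
  const raw = block.subarray(start, start + length);
  const end = raw.indexOf(0);
  return raw.subarray(0, end === -1 ? raw.length : end).toString("utf8");
}

/** Hypothetical helper: returns one violation string per hard-link member. */
function findHardLinkEntries(tar: Buffer): string[] {
  const violations: string[] = [];
  let offset = 0;
  while (offset + BLOCK <= tar.length) {
    const block = tar.subarray(offset, offset + BLOCK);
    if (block.every((byte) => byte === 0)) break; // zero block marks end of archive

    const name = readField(block, 0, 100); // name field
    const typeflag = readField(block, 156, 1); // '1' means hard link
    const linkname = readField(block, 157, 100); // link target
    if (typeflag === "1") {
      violations.push(`hard link: ${name} -> ${linkname}`);
    }

    // The size field is octal text; skip the member's data blocks.
    const size = parseInt(readField(block, 124, 12).trim() || "0", 8);
    if (Number.isNaN(size)) {
      throw new Error(`unparseable size field at offset ${offset}`);
    }
    offset += BLOCK + Math.ceil(size / BLOCK) * BLOCK;
  }
  return violations;
}
```

Failing closed on an unparseable size field is a deliberate choice in this sketch: an archive that cannot be walked reliably should not be treated as safe.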

📥 Commits

Reviewing files that changed from the base of the PR and between 5c4c48a and d0d1383.

📒 Files selected for processing (2)

src/lib/sandbox-state.ts
test/security-sandbox-tar-traversal.test.ts

Add rejectHardLinks() that uses tar -tvf to detect hard-link entries before extraction. Hard links have no legitimate use in sandbox state backups and can reference files outside the extraction root.
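
A listing-based check along the lines of this commit message might be sketched as follows. This is not the actual rejectHardLinks() implementation. The names `listVerbose` and `findHardLinksInListing` are hypothetical, and the parsing assumes a verbose listing format in which hard-link members start with `h` (as in GNU tar); other tar implementations may format the listing differently.

```typescript
import { spawnSync } from "node:child_process";

/** Runs `tar -tvf -` on an in-memory archive and returns the verbose listing. */
function listVerbose(tarBuffer: Buffer): string {
  const result = spawnSync("tar", ["-tvf", "-"], { input: tarBuffer, encoding: "utf8" });
  if (result.status !== 0) {
    throw new Error(`tar -tvf failed: ${result.stderr}`);
  }
  return result.stdout;
}

/**
 * Hypothetical helper: scans a verbose listing for hard-link members.
 * Assumption: the first character of each line is the member type,
 * with "h" marking a hard link.
 */
function findHardLinksInListing(listing: string): string[] {
  const violations: string[] = [];
  for (const line of listing.split("\n")) {
    if (line.startsWith("h")) {
      violations.push(`hard link: ${line.trim()}`);
    }
  }
  return violations;
}
```

A caller would pass `listVerbose(tarBuffer)` to `findHardLinksInListing` and refuse to extract when the result is non-empty.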

coderabbitai

🧹 Nitpick comments (2)

test/security-sandbox-tar-traversal.test.ts (2)
145-155: Make the PoC assertions prove the dangerous path exactly.

These checks still pass if the tar tool normalizes the malicious prefix away but leaves evil.txt or etc/cron.d in the listing, so they stop proving that the archive retained a traversal/absolute entry. Reading the header name from the tar buffer directly would make the PoC deterministic and avoid depending on host tar formatting here.

Also applies to: 162-169
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/security-sandbox-tar-traversal.test.ts` around lines 145 - 155, The test
currently relies on spawning the system tar and checking list.stdout (variables:
spawnSync call, list, entries) which can normalize paths; instead parse the
in-memory tar Buffer (variable tar) to read header name bytes directly and
assert the exact header name strings (e.g., "../evil.txt" or "/etc/cron.d/...")
appear in the header name fields; update both occurrences (the block using
spawnSync and the similar block at lines 162-169) to extract and assert header
names from the tar buffer rather than from list.stdout so the PoC
deterministically proves the traversal/absolute entry is preserved.
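
As an illustration of the approach suggested here, the sketch below builds a single ustar member in memory and reads the name field back from the header bytes, so the assertion does not depend on how the host tar prints paths. `buildUstarEntry` and `readHeaderName` are hypothetical helpers, not the ones in the test file, and the entry names are illustrative.

```typescript
const BLOCK_SIZE = 512;

/** Hypothetical test helper: builds one ustar member (header + padded data). */
function buildUstarEntry(name: string, content: string): Buffer {
  const data = Buffer.from(content, "utf8");
  const header = Buffer.alloc(BLOCK_SIZE);
  header.write(name, 0, 100, "utf8"); // name (truncated past 100 bytes)
  header.write("0000644\0", 100, 8, "ascii"); // mode
  header.write("0000000\0", 108, 8, "ascii"); // uid
  header.write("0000000\0", 116, 8, "ascii"); // gid
  header.write(data.length.toString(8).padStart(11, "0") + "\0", 124, 12, "ascii"); // size (octal)
  header.write("00000000000\0", 136, 12, "ascii"); // mtime
  header.fill(0x20, 148, 156); // checksum field counts as spaces while summing
  header.write("0", 156, 1, "ascii"); // typeflag: regular file
  header.write("ustar\0", 257, 6, "ascii"); // magic
  header.write("00", 263, 2, "ascii"); // version

  // Checksum: sum of all header bytes, stored as 6 octal digits, NUL, space.
  const sum = header.reduce((acc, byte) => acc + byte, 0);
  header.write(sum.toString(8).padStart(6, "0") + "\0 ", 148, 8, "ascii");

  const padding = Buffer.alloc((BLOCK_SIZE - (data.length % BLOCK_SIZE)) % BLOCK_SIZE);
  return Buffer.concat([header, data, padding]);
}

/** Reads the raw name field of the header block starting at headerOffset. */
function readHeaderName(tar: Buffer, headerOffset = 0): string {
  const field = tar.subarray(headerOffset, headerOffset + 100);
  const end = field.indexOf(0);
  return field.subarray(0, end === -1 ? field.length : end).toString("utf8");
}
```

A PoC test can then assert `readHeaderName(tar) === "../evil.txt"` directly, which proves the archive itself carries the traversal path regardless of how the host tar normalizes its listing.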
1-1: Import the source module directly and drop @ts-nocheck.

Right now the behavior checks run against dist/lib/sandbox-state.js, while the regression guards inspect src/lib/sandbox-state.ts. That split can let this suite validate a stale build instead of the code under review, and Line 1 hides any shape drift by disabling TS for the whole file. Prefer importing the source exports directly, or at least typing the module shape so the test stays checked.

Also applies to: 118-127
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/security-sandbox-tar-traversal.test.ts` at line 1, Remove the top-line
"// `@ts-nocheck`" and modify the test import so it references the source module
(e.g., import the exports from the src/lib/sandbox-state TypeScript module
instead of dist/lib/sandbox-state.js), or alternatively add an explicit typed
module declaration for the tested exports (the sandbox-state exports used in
this file and around lines 118-127) so TypeScript can verify shapes; ensure the
test uses the actual source exports (same named functions/classes) rather than
the built dist artifact and run the test type-check to confirm no shape drift.

📥 Commits

Reviewing files that changed from the base of the PR and between d0d1383 and 1dd388b.

📒 Files selected for processing (2)

src/lib/sandbox-state.ts
test/security-sandbox-tar-traversal.test.ts

🚧 Files skipped from review as they are similar to previous changes (1)

src/lib/sandbox-state.ts

…2268) (#2308)

## Summary

Closes #2268, a P1 regression in v0.0.22.

`safeTarExtract`'s post-extraction symlink audit (added in #2163 to block tar path-traversal attacks) flagged every symlink whose target resolved outside the extraction temp directory. The sandbox base image legitimately ships intra-sandbox symlinks like `/sandbox/.openclaw → /sandbox/.openclaw-data`; those resolve *outside* the host temp dir but *inside* the canonical `/sandbox` root, where they'll be correctly resolved when the backup is restored into a fresh sandbox.

Result: the audit nuked every backup, breaking `nemoclaw rebuild` and `nemoclaw snapshot create` on v0.0.22. The only workaround was `destroy + onboard` (loses all state).

## Fix

- `auditExtractedSymlinks(dirPath, allowedRoots: string[])`: now takes an array of allowed roots instead of a single one
- `safeTarExtract` passes `[targetDir, "/sandbox"]` so intra-sandbox absolute symlinks are honored while genuine escape attempts (e.g. symlink to `/etc/passwd` or `../../.ssh/authorized_keys`) still abort the extraction

The security guardrail is intact: `/sandbox` is a subtree root, so crafted symlinks with absolute targets outside `/sandbox` (e.g. `/etc/passwd`) are still rejected.

## Test plan

- [x] New test: `allows symlinks whose target resolves within /sandbox (intra-sandbox layout)`: locks the regression fix
- [x] New test: `blocks symlinks that escape /sandbox even with an absolute target`: confirms `/etc/passwd` symlinks still rejected
- [x] Existing `blocks symlink escaping target directory` (relative `../../`) still passes
- [x] All 21 sandbox-tar security tests pass
- [x] Full CLI suite: 1783 passed

🤖 Generated with [Claude Code](https://claude.com/claude-code)

## Summary by CodeRabbit

* **Security & Bug Fixes**
  * Improved symlink validation during tar archive extraction to correctly allow absolute symlink targets that resolve within the sandbox environment.
  * Enhanced security checks now reject symlinks with targets outside permitted directories.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
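
The allowed-roots audit described in this commit message can be sketched as below. This is an illustration under stated assumptions, not the code from #2308: `findEscapingSymlinks` and `isWithin` are hypothetical names, and the sketch resolves link targets lexically (without following them on disk), on the assumption that `/sandbox` targets usually do not exist on the host.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

/** True if candidate is root itself or lies underneath it (lexical check). */
function isWithin(root: string, candidate: string): boolean {
  const rel = path.relative(path.resolve(root), candidate);
  return rel === "" || (rel !== ".." && !rel.startsWith(".." + path.sep) && !path.isAbsolute(rel));
}

/**
 * Hypothetical helper: walks dirPath and returns every symlink whose target
 * resolves outside all of the allowed roots.
 */
function findEscapingSymlinks(dirPath: string, allowedRoots: string[]): string[] {
  const violations: string[] = [];
  for (const entry of fs.readdirSync(dirPath, { withFileTypes: true })) {
    const fullPath = path.join(dirPath, entry.name);
    if (entry.isSymbolicLink()) {
      const target = fs.readlinkSync(fullPath);
      // Relative targets are resolved against the directory holding the link.
      const resolved = path.resolve(path.dirname(fullPath), target);
      if (!allowedRoots.some((root) => isWithin(root, resolved))) {
        violations.push(`${fullPath} -> ${target}`);
      }
    } else if (entry.isDirectory()) {
      // Only real directories are descended into; symlinked ones are not followed.
      violations.push(...findEscapingSymlinks(fullPath, allowedRoots));
    }
  }
  return violations;
}
```

Called as `findEscapingSymlinks(targetDir, [targetDir, "/sandbox"])`, a link to `/sandbox/.openclaw-data` is accepted while links to `/etc/passwd` or `../../.ssh/authorized_keys` are reported.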

Merge branch 'main' into fix/sandbox-tar-traversal

d0d1383

ericksoa self-assigned this Apr 21, 2026

coderabbitai Bot reviewed Apr 21, 2026

Comment thread src/lib/sandbox-state.ts

fix(sandbox): reject hard-link entries in tar validation

1dd388b

coderabbitai Bot reviewed Apr 21, 2026

ericksoa commented Apr 21, 2026

Comment thread test/security-sandbox-tar-traversal.test.ts

ericksoa commented Apr 21, 2026

Comment thread test/security-sandbox-tar-traversal.test.ts

Merge branch 'main' into fix/sandbox-tar-traversal

a973e1a

wscurran added the security, priority: high, NemoClaw CLI, and fix labels Apr 21, 2026

cv approved these changes Apr 21, 2026

cv merged commit 4eca8ea into main Apr 21, 2026
15 checks passed

cv added the v0.0.22 Release target label Apr 21, 2026

zNeill mentioned this pull request Apr 22, 2026

[brev][Sandbox] nemoclaw rebuild and snapshot create fail — safeTarExtract rejects sandbox symlinks as escape violations #2268

Closed

cjagwani mentioned this pull request Apr 22, 2026

fix(sandbox): allow intra-sandbox symlinks in safeTarExtract audit (#2268) #2308

Merged

5 tasks

shidsaa mentioned this pull request Apr 23, 2026

[NemoClaw 0.0.22] snapshot create/rebuild/backup-all blocked by safeTarExtract symlink audit on internal .openclaw-data/* symlinks (PR #2163) #2317

Open

BenediktSchackenberg mentioned this pull request Apr 26, 2026

fix(snapshot): allow /sandbox/.openclaw-data symlinks in safeTarExtract #2488

Open

cv merged 4 commits into main from fix/sandbox-tar-traversal
