Add kernel-level sandbox via nono-py for scan command by christophetd · Pull Request #712 · DataDog/guarddog

christophetd · 2026-04-10T15:12:39Z

Summary

- Adds kernel-level sandboxing (via nono-py) to the scan command as defense-in-depth against archive extraction vulnerabilities (path traversal, zip bombs) that led to CVE-2022-23530, CVE-2022-23531, CVE-2026-22870, and CVE-2026-22871.
- Sandboxing is on by default when nono supports the platform; otherwise a warning is logged and the scan runs unsandboxed. --sandbox forces it and exits if nono can't set up the sandbox on the platform; --no-sandbox disables it.
- Local scans (directory or archive): the full sandbox is applied to the main process (network blocked, filesystem restricted to the scanned path and a temp dir) before extraction and analysis.
- Remote scans use a three-phase approach:
  1. Download and metadata analysis run unsandboxed, since they need network access for the package registry and DNS/email checks.
  2. Archive extraction runs in a sandboxed subprocess (python -m guarddog.sandbox) with network blocked and filesystem restricted.
  3. Source code analysis (YARA/Semgrep) runs in the main process after a sandbox is applied (network blocked, filesystem restricted to the extracted files).
- Nested .gem archives are also extracted inside the subprocess.
- Scope: scan command only; verify is deferred to a follow-up.
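The summary mentions that nested .gem archives are extracted inside the subprocess. Under the usual RubyGems layout (an assumption here, not something this PR spells out), a .gem is a plain tar whose package files sit in an inner data.tar.gz. The sketch below uses only the standard library to pull out the inner archive and to refuse links and members that would land outside the destination. The names safe_extract_tar and extract_gem are hypothetical; guarddog's real extraction goes through safe_extract and tarsafe, and this sketch does not address zip bombs.

```python
import os
import tarfile


def safe_extract_tar(tar: tarfile.TarFile, dest: str) -> None:
    """Extract `tar` into `dest`, refusing links and paths that escape `dest`."""
    dest_real = os.path.realpath(dest)
    members = tar.getmembers()
    # Validate every member before anything is written to disk.
    for member in members:
        if member.issym() or member.islnk():
            raise ValueError(f"refusing link member: {member.name}")
        target = os.path.realpath(os.path.join(dest_real, member.name))
        if os.path.commonpath([dest_real, target]) != dest_real:
            raise ValueError(f"path traversal blocked: {member.name}")
    # On Pythons that ship tar extraction filters, also apply the "data" filter.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    tar.extractall(dest_real, members=members, **extra)


def extract_gem(gem_path: str, dest: str) -> None:
    """Extract the payload of a .gem: an outer tar wrapping data.tar.gz (assumed layout)."""
    with tarfile.open(gem_path, mode="r:") as outer:
        inner_file = outer.extractfile("data.tar.gz")  # KeyError if the member is absent
        if inner_file is None:
            raise ValueError("data.tar.gz is not a regular file")
        with tarfile.open(fileobj=inner_file, mode="r:gz") as inner:
            safe_extract_tar(inner, dest)
```

All members are checked before extractall runs, so a malicious entry aborts the extraction without writing any of the archive's files.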

New files

guarddog/sandbox.py -- is_available(), apply_sandbox(), extract_sandboxed(), and subprocess entry point
tests/core/test_sandbox.py -- 9 unit tests

How sandboxing works

| Scan type | Extraction | Metadata analysis | Source code analysis | Network |
|---|---|---|---|---|
| Local directory | N/A | N/A | Main process (sandboxed) | Blocked |
| Local archive | Main process (sandboxed) | N/A | Main process (sandboxed) | Blocked |
| Remote package | Sandboxed subprocess | Unsandboxed | Main process (sandboxed) | Blocked after metadata analysis |
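For remote packages, the ordering in the table is what matters: everything that needs the network has to finish before the main process is sandboxed. The sketch below expresses that ordering with injected callables. RemoteScanPlan and scan_remote_sandboxed are hypothetical names, not guarddog's API; the callables stand in for the real download, metadata analysis, subprocess extraction, sandbox setup and YARA/Semgrep steps.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RemoteScanPlan:
    # Hypothetical hooks standing in for guarddog's real phase implementations.
    download: Callable[[], str]                  # returns the archive path
    analyze_metadata: Callable[[], dict]         # needs network (registry, DNS)
    extract_in_subprocess: Callable[[str], str]  # returns the extraction dir
    apply_sandbox: Callable[[str], None]         # restricts the current process
    analyze_source: Callable[[str], dict]        # YARA/Semgrep, runs offline


def scan_remote_sandboxed(plan: RemoteScanPlan) -> dict:
    # Phase 1: unsandboxed, network-dependent work.
    archive = plan.download()
    results = dict(plan.analyze_metadata())
    # Phase 2: extraction happens in a separate, sandboxed child process.
    extracted = plan.extract_in_subprocess(archive)
    # Phase 3: lock down the main process, then analyze the untrusted files.
    plan.apply_sandbox(extracted)
    results.update(plan.analyze_source(extracted))
    return results
```

Keeping the phases as plain callables makes the ordering easy to test without a real registry or a real sandbox.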

Demo 1: RCE in the unsandboxed code path

Simulates a vulnerability in the --no-sandbox code path:

```diff
diff --git a/guarddog/cli.py b/guarddog/cli.py
index a28b5e5..cb9d5ef 100644
--- a/guarddog/cli.py
+++ b/guarddog/cli.py
@@ -254,6 +254,7 @@ def _scan(
                 )
             else:
                 result |= scanner.scan_remote(identifier, version, rule_param)
+                import subprocess; subprocess.run(["touch", "/tmp/pwned"])
 
     except Exception as e:
         log.error(f"Error occurred while scanning target {identifier}: '{e}'\n")
```

```console
$ git apply demo1.diff
$ uv run guarddog npm scan requests && cat /tmp/pwned
cat: /tmp/pwned: No such file or directory   # --no-sandbox path not reached

$ uv run guarddog npm scan requests --no-sandbox && cat /tmp/pwned
# no error -- file was created
```

Demo 2: RCE during source code analysis

Simulates a vulnerability in YARA analysis (e.g. a crafted file exploiting a parser bug):

```diff
diff --git a/guarddog/analyzer/analyzer.py b/guarddog/analyzer/analyzer.py
index 78a36a6..289e5e3 100644
--- a/guarddog/analyzer/analyzer.py
+++ b/guarddog/analyzer/analyzer.py
@@ -376,6 +376,7 @@ class Analyzer:
             dict[str]: map from each IOC rule and their corresponding output
         """
         log.debug(f"Running yara rules against directory '{path}'")
+        import subprocess; subprocess.run(["touch", "/tmp/pwned-during-analysis"])
 
         all_rules = self.yara_ruleset
         if rules is not None:
```

```console
$ git apply demo2.diff
$ uv run guarddog pypi scan requests
$ ls /tmp/pwned-during-analysis
ls: /tmp/pwned-during-analysis: No such file or directory   # blocked by sandbox
```

Defense-in-depth against archive extraction vulnerabilities (path traversal, zip bombs) that led to CVE-2022-23530, CVE-2022-23531, CVE-2026-22870, CVE-2026-22871.
- New guarddog/sandbox.py wrapping nono-py: filesystem restrictions and network blocking via CapabilitySet
- --sandbox (default) / --no-sandbox CLI flag on scan command
- Fail-safe: exits if sandbox cannot be set up on the platform
- Local scans: full sandbox (network blocked, filesystem restricted) before extraction and analysis
- Remote scans: two-phase approach. Download + metadata analysis run unsandboxed (need network), then sandbox is applied before source code analysis (YARA/Semgrep)
- New download_package() on PackageScanner for the two-phase flow
- Scope: scan command only; verify deferred to follow-up

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 513309fa58

- Extract archives in a sandboxed subprocess (python -m guarddog.sandbox) that applies nono before calling safe_extract. This closes the TOCTOU gap for remote scans: extraction is now fully sandboxed regardless of scan type.
- Simplify _scan_remote_sandboxed: monkey-patch scanner._extract_archive to route through the sandboxed subprocess, then delegate to scan_remote. No more duplicated merge/risk-formatting logic.
- Remove download_package() from PackageScanner (no longer needed).
- Move sandbox imports to top-level in cli.py.
- Add DEBUG logging of full sandbox capability set.
- Handle nested .gem extraction in subprocess.
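Patching scanner._extract_archive lets the existing scan_remote flow run unchanged while extraction is routed elsewhere. A minimal sketch of that pattern with a stand-in class: FakeScanner, its method signatures and route_extraction are hypothetical, since the real PackageScanner interface is not shown in this PR. Assigning a function to the instance attribute shadows the class method for that one object only, so other scanner instances keep in-process extraction.

```python
from typing import Callable


class FakeScanner:
    """Hypothetical stand-in for a guarddog PackageScanner."""

    def _extract_archive(self, archive_path: str, target_dir: str) -> None:
        raise RuntimeError("in-process extraction should not run here")

    def scan_remote(self, archive_path: str, target_dir: str) -> dict:
        # Unchanged flow: it calls whatever self._extract_archive resolves to.
        self._extract_archive(archive_path, target_dir)
        return {"scanned": target_dir}


def route_extraction(scanner, extract_sandboxed: Callable[[str, str], None]):
    # Instance attributes take precedence over plain methods defined on the class.
    scanner._extract_archive = extract_sandboxed
    return scanner
```

Because the patched function is stored on the instance, it is not bound, so it receives only (archive_path, target_dir).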

Three issues with the sandboxed extraction subprocess:
- Writable paths must exist before allow_path (nono-py requirement)
- Remove dry-run QueryContext validation that gave false denials
- Set subprocess CWD to temp dir so tarsafe's os.getcwd() works under the sandbox (the inherited CWD was outside allowed paths)
- Don't pass archive as scan_path in subprocess since the temp dir already provides access
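The CWD fix matters because a child process inherits its parent's working directory, which may lie outside the paths the sandbox allows, and tarsafe calls os.getcwd(). A minimal sketch of launching the child with an explicit working directory: run_child_in is a hypothetical helper, and it runs an inline python -c snippet instead of guarddog's python -m guarddog.sandbox entry point, whose arguments are not shown here. In guarddog, the child would apply the sandbox before extracting.

```python
import subprocess
import sys


def run_child_in(workdir: str, code: str) -> str:
    """Run `code` in a fresh interpreter whose working directory is `workdir`."""
    completed = subprocess.run(
        [sys.executable, "-c", code],
        cwd=workdir,          # don't inherit the parent's CWD
        check=True,           # raise CalledProcessError on a non-zero exit
        capture_output=True,
        text=True,
    )
    return completed.stdout
```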

- Add guarddog package directory to sandbox read paths so YARA/Semgrep rule files are accessible when running from source (outside sys.prefix)
- Pass tmpdir as writable in remote sandboxed scans so temp dir cleanup and Semgrep temp writes work correctly
- Remove leftover test code (touch /tmp/pwned)
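The path rules scattered across the commits in this PR (allow the parent directory when the scan target is a file, create writable paths before allowing them, add the guarddog package directory for rule files, resolve symlinks) can be gathered into one helper. compute_allowed_paths and AllowedPaths are hypothetical names; the result would then be handed to nono-py's CapabilitySet, whose exact calls are not reproduced here. Including sys.prefix as a read path is an assumption inferred from the note about running from source outside sys.prefix.

```python
import os
import sys
from dataclasses import dataclass, field


@dataclass
class AllowedPaths:
    read: list[str] = field(default_factory=list)
    write: list[str] = field(default_factory=list)


def compute_allowed_paths(scan_path: str, tmpdir: str, package_dir: str) -> AllowedPaths:
    paths = AllowedPaths()
    # Resolve symlinks up front (e.g. macOS /var -> /private/var).
    scan_real = os.path.realpath(scan_path)
    # Rules are directory-based here, so a file target grants its parent directory.
    paths.read.append(scan_real if os.path.isdir(scan_real) else os.path.dirname(scan_real))
    # Rule files (YARA/Semgrep) when running from a source checkout.
    paths.read.append(os.path.realpath(package_dir))
    # Interpreter and installed packages (assumption, see lead-in).
    paths.read.append(os.path.realpath(sys.prefix))
    # Writable paths must exist before they are allowed.
    tmp_real = os.path.realpath(tmpdir)
    os.makedirs(tmp_real, exist_ok=True)
    paths.write.append(tmp_real)
    return paths
```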

christophetd requested a review from a team as a code owner April 10, 2026 15:12

chatgpt-codex-connector Bot reviewed Apr 10, 2026

Comment thread guarddog/cli.py Outdated

christophetd added 3 commits April 10, 2026 17:22

Fix sandbox crash on file paths, add PEP 621 project metadata

f1ddeae

- Allow parent directory when scan_path is a file (nono-py expects directories)
- Add [project] table to pyproject.toml so uv can install the package

sobregosodd reviewed Apr 10, 2026

Comment thread pyproject.toml Outdated

Fix incorrect pyproject.toml

622ab66

sobregosodd approved these changes Apr 10, 2026

christophetd added 12 commits April 10, 2026 20:50

Document sandboxed scanning in README

06086ce

Fix sandbox platform details and nono repo URL in README

ed3b12b

Clarify sandbox default behavior on unsupported platforms

2ce32e9

Sandbox source code analysis for remote scans

038bac5

Previously only archive extraction ran in a sandbox for remote packages. Now the main process also enters a sandbox (network blocked, filesystem restricted) before running YARA/Semgrep analysis.

Fix macOS symlink issue in sandbox temp dir cleanup

85614d0

Use mkdtemp + realpath instead of TemporaryDirectory context manager. On macOS /var symlinks to /private/var, and nono doesn't resolve symlinks, so cleanup via the symlink path was blocked by the sandbox.
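A minimal sketch of the mkdtemp + realpath pattern wrapped in a context manager, so that every later use of the directory (sandbox rules, cleanup) goes through the same resolved path. resolved_tempdir and the guarddog- prefix are hypothetical names chosen for illustration.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def resolved_tempdir(prefix: str = "guarddog-"):
    # Resolve symlinks once (macOS: /var -> /private/var) so the path used for
    # sandbox rules and for cleanup is the same canonical path.
    path = os.path.realpath(tempfile.mkdtemp(prefix=prefix))
    try:
        yield path
    finally:
        shutil.rmtree(path)
```

Unlike TemporaryDirectory, nothing here ever refers to the unresolved symlink path, which is the path the sandbox would block.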

Run metadata analysis before sandbox in remote scans

b7b57f2

Metadata rules (e.g. unclaimed_maintainer_email_domain) need network access for DNS checks. Split the analyze() call so metadata runs in phase 1 (unsandboxed) and source code analysis runs in phase 2 (sandboxed).

Update README sandbox section with accurate phase descriptions

2c03089

Auto-detect sandbox availability by default

56666cf

- Default (no flag): use sandbox if available, warn if not
- --sandbox: force sandbox, hard-fail if unavailable
- --no-sandbox: skip sandbox entirely
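The three modes map onto a tri-state flag. A minimal sketch, assuming the CLI yields None when neither flag is passed, True for --sandbox and False for --no-sandbox; resolve_sandbox and SandboxUnavailableError are hypothetical names, and is_available is injected so the logic can be exercised without nono-py installed.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger("sandbox-sketch")


class SandboxUnavailableError(RuntimeError):
    """Raised when --sandbox is forced on a platform without sandbox support."""


def resolve_sandbox(flag: Optional[bool], is_available: Callable[[], bool]) -> bool:
    """Return True if the scan should run sandboxed."""
    if flag is False:  # --no-sandbox: skip entirely, don't even probe
        return False
    available = is_available()
    if flag is True and not available:  # --sandbox: hard-fail
        raise SandboxUnavailableError("--sandbox requested but not supported on this platform")
    if not available:  # no flag: degrade gracefully, but say so
        log.warning("Sandbox not supported on this platform; scanning without it")
    return available
```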

Let ImportError propagate if nono-py is missing

d99cfcb

nono-py is a required dependency. ImportError means a broken install, not an unsupported platform. is_available() now only checks nono.is_supported().

Remove leftover test code (touch /tmp/pwned)

f688c82

add poetry.lock

141840d

sobregosodd approved these changes Apr 10, 2026

Comment thread guarddog/cli.py

christophetd merged commit acf7b4e into s.obregoso/v3 Apr 10, 2026
1 check passed

christophetd deleted the christophe.tafanidereeper/nono-sandbox branch April 10, 2026 20:11

christophetd added a commit that referenced this pull request Apr 10, 2026

Add sandboxing via nono-py for the scanning process (#712)

6a6b5f6

christophetd merged 17 commits into s.obregoso/v3 from christophe.tafanidereeper/nono-sandbox
