security: replace shell string interpolation with argv arrays#148
security: replace shell string interpolation with argv arrays#148gn00295120 wants to merge 3 commits intoNVIDIA:mainfrom
Conversation
There was a problem hiding this comment.
Copilot wasn't able to review any files in this pull request.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
1158c9b to
c9cc2df
Compare
All user-controlled values (sandbox names, instance names, container
names, Docker images) were previously interpolated into shell command
strings passed to `bash -c` via the `run()` helper. This enabled
command injection through any of these values (e.g., a sandbox name
containing `; malicious-command`).
Changes:
- Add `runArgv()` and `runCaptureArgv()` to runner.js — these call
`spawnSync(prog, args)` and `execFileSync(prog, args)` directly
with no shell interpolation.
- Add `assertSafeName()` validator that enforces `^[a-zA-Z0-9][a-zA-Z0-9_-]{0,62}$`
on all sandbox, instance, and container names at entry points.
- Convert all call sites in nemoclaw.js, onboard.js, nim.js, policies.js,
and credentials.js that pass user-controlled values to use the new
argv-based functions.
- Replace shell `cp -r` / `rm -rf` in onboard.js with Node.js
`fs.cpSync()` / `fs.rmSync()`.
- Keep `run()` / `runCapture()` available for hardcoded-only commands
(e.g., `bash scripts/setup.sh`) with security comments.
- Write policy temp files with mode 0o600 instead of default permissions.
The existing `run()` function is preserved (with a security warning in
its JSDoc) for commands that are built entirely from hardcoded strings
and require shell features (pipes, redirects, backgrounding).
Signed-off-by: Lucas Wang <lucas_wang@lucas-futures.com>
7 test suites (28 tests) that demonstrate: - PoC 1-5: OLD bash -c pattern IS vulnerable to semicolons, $(), backticks, pipes, && chains — NEW argv pattern treats them as literals - PoC 6: assertSafeName rejects 11 attack payloads and accepts 5 safe names - PoC 7: nemoclaw CLI deploy rejects malicious instance names at entry point Signed-off-by: Lucas Wang <lucas_wang@lucas-futures.com>
- Fix setupSpark(): pass NVIDIA_API_KEY via env option, not shell string
- Remove dead sandboxEnv variable in createSandbox(), clarify comment
on openshell env subcommand protocol
- Add assertSafeName() at dispatch entry before registry lookup,
blocking metacharacters even for unregistered sandbox names
- Validate NEMOCLAW_GPU with dedicated regex allowing colons/dots,
instead of stripping chars before assertSafeName
- Remove no-op local `execSync("echo ok")` from SSH wait loop
- Convert waitForNimHealth() curl to runCaptureArgv for consistency
Signed-off-by: Lucas Wang <lucas_wang@lucas-futures.com>
c9cc2df to
bff823f
Compare
📝 WalkthroughWalkthroughThe changes migrate shell-based command execution to argv-based execution across multiple library and main modules, introducing new runner utilities (runArgv, runCaptureArgv, assertSafeName). Input validation for sandbox and preset names is added throughout. Security tests validate command-injection defenses using the new patterns. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Comment Tip CodeRabbit can use TruffleHog to scan for secrets in your code with verification capabilities.Add a TruffleHog config file (e.g. trufflehog-config.yml, trufflehog.yml) to your project to customize detectors and scanning behavior. The tool runs only when a config file is present. |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
bin/nemoclaw.js (1)
124-145:⚠️ Potential issue | 🔴 CriticalShell sourcing of unescaped credentials enables command execution on the remote host.
Lines 124–128 write raw credential values directly to the
.envfile without shell escaping. When the file is sourced via. .envon lines 135, 139, and 145, any$(), backticks, quotes, or newlines inNVIDIA_API_KEY,GITHUB_TOKEN, orTELEGRAM_BOT_TOKENwill be evaluated and executed by the remote shell. Useprintf '%s\n' "KEY=$(printf '%s\n' "$VALUE" | sed 's/[&/\]/\\&/g')"or similar escaping before writing, or load the file with a format parser instead of sourcing it.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/nemoclaw.js` around lines 124 - 145, The code currently writes raw credential values into envLines (NVIDIA_API_KEY, GITHUB_TOKEN, TELEGRAM_BOT_TOKEN via getCredential) and then writes envTmp which is later sourced with ". .env" in the runArgv SSH calls, enabling remote shell execution if values contain shell metacharacters; instead, sanitize or avoid sourcing raw env: before pushing envTmp, escape each credential value (e.g., safely quote or escape single quotes/newlines or encode values such as base64 and decode on the remote side) when building envLines, or switch to a non-sourced parser approach (export via ssh with a safe transport or use a dotenv parser on the remote side) so the ". .env" invocations in the runArgv SSH commands no longer execute unescaped content.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bin/lib/onboard.js`:
- Around line 352-354: The onboarding code is hiding failures from the final
"openshell inference set" call by passing ignoreError: true and silencing stdio
in runArgv; change the runArgv invocation(s) that call ("openshell",
["inference","set",...]) to not use ignoreError: true (remove that option) and
either let stdio inherit (so errors are visible) or capture and check the
returned exit status from runArgv, then gate the registry update / success
message on a non-zero exit (throw or return error on failure). Specifically
update the runArgv calls that set inference (the ones building args
["inference","set", "--no-verify","--provider","nvidia-nim","--model", model])
to propagate failures instead of swallowing them.
In `@bin/lib/runner.js`:
- Around line 67-73: The runner helpers (run, runArgv, runCapture,
runCaptureArgv) currently build an env merge but then spread ...opts afterwards
which can overwrite that env merge when callers pass opts.env, causing inherited
vars like PATH/HOME to be lost; fix by extracting env from opts first (e.g.,
const { env: optsEnv, ...restOpts } = opts) or by placing the final env property
after the ...opts spread so env = { ...process.env, ...opts.env } is assigned
last, and update run(), runArgv(), runCapture(), and runCaptureArgv() to use
this pattern so callers' opts no longer drop inherited environment variables.
In `@test/security-poc.test.js`:
- Around line 47-58: The test currently quotes ${maliciousName} so the semicolon
is literal; change the cmd construction in the OLD pattern test (the const cmd
used with oldRun) to remove the surrounding double quotes around
${maliciousName} so the injected semicolon is interpreted by the shell (e.g.,
`echo getting sandbox: ${maliciousName}`), and update the nearby comment that
incorrectly states "The semicolon splits the command" to reflect that removal of
quotes is required for a real injection; keep the existing assertion that checks
result.stdout for "INJECTED".
---
Outside diff comments:
In `@bin/nemoclaw.js`:
- Around line 124-145: The code currently writes raw credential values into
envLines (NVIDIA_API_KEY, GITHUB_TOKEN, TELEGRAM_BOT_TOKEN via getCredential)
and then writes envTmp which is later sourced with ". .env" in the runArgv SSH
calls, enabling remote shell execution if values contain shell metacharacters;
instead, sanitize or avoid sourcing raw env: before pushing envTmp, escape each
credential value (e.g., safely quote or escape single quotes/newlines or encode
values such as base64 and decode on the remote side) when building envLines, or
switch to a non-sourced parser approach (export via ssh with a safe transport or
use a dotenv parser on the remote side) so the ". .env" invocations in the
runArgv SSH commands no longer execute unescaped content.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 66e410c3-4611-479a-b879-3895145d9beb
📒 Files selected for processing (7)
bin/lib/credentials.jsbin/lib/nim.jsbin/lib/onboard.jsbin/lib/policies.jsbin/lib/runner.jsbin/nemoclaw.jstest/security-poc.test.js
| runArgv("openshell", [ | ||
| "inference", "set", "--no-verify", "--provider", "nvidia-nim", "--model", model, | ||
| ], { ignoreError: true, stdio: ["ignore", "ignore", "ignore"] }); |
There was a problem hiding this comment.
Don't hide openshell inference set failures.
Lines 352, 361, and 370 all run the final inference wiring with ignoreError: true and silenced stdio. If that step fails, onboarding still updates the registry and prints success, leaving a broken sandbox in a false-success state. Let inference set fail hard, or gate the later success path on its exit status.
Also applies to: 361-363, 370-372
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@bin/lib/onboard.js` around lines 352 - 354, The onboarding code is hiding
failures from the final "openshell inference set" call by passing ignoreError:
true and silencing stdio in runArgv; change the runArgv invocation(s) that call
("openshell", ["inference","set",...]) to not use ignoreError: true (remove that
option) and either let stdio inherit (so errors are visible) or capture and
check the returned exit status from runArgv, then gate the registry update /
success message on a non-zero exit (throw or return error on failure).
Specifically update the runArgv calls that set inference (the ones building args
["inference","set", "--no-verify","--provider","nvidia-nim","--model", model])
to propagate failures instead of swallowing them.
| function runArgv(prog, args, opts = {}) { | ||
| const result = spawnSync(prog, args, { | ||
| stdio: "inherit", | ||
| cwd: ROOT, | ||
| env: { ...process.env, ...opts.env }, | ||
| ...opts, | ||
| }); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n bin/lib/runner.jsRepository: NVIDIA/NemoClaw
Length of output: 4605
🏁 Script executed:
rg -n "runArgv\(|runCaptureArgv\(|run\(|runCapture\(" --type js -A 3 bin/Repository: NVIDIA/NemoClaw
Length of output: 14536
🏁 Script executed:
rg -n "env.*process\.env" bin/lib/runner.jsRepository: NVIDIA/NemoClaw
Length of output: 243
opts.env currently drops the inherited environment in all four runner functions.
Lines 52, 71, 93, and 113 build a merged env, but the later ...opts spread overwrites it whenever callers pass opts.env. This makes the helpers lose inherited variables like PATH, HOME, and DOCKER_HOST unless every caller manually re-spreads process.env. The pattern is fragile: extract env from opts before spreading, or assign env after ...opts to ensure inherited variables are always preserved regardless of how callers construct their options.
Affected functions: run(), runArgv(), runCapture(), runCaptureArgv()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@bin/lib/runner.js` around lines 67 - 73, The runner helpers (run, runArgv,
runCapture, runCaptureArgv) currently build an env merge but then spread ...opts
afterwards which can overwrite that env merge when callers pass opts.env,
causing inherited vars like PATH/HOME to be lost; fix by extracting env from
opts first (e.g., const { env: optsEnv, ...restOpts } = opts) or by placing the
final env property after the ...opts spread so env = { ...process.env,
...opts.env } is assigned last, and update run(), runArgv(), runCapture(), and
runCaptureArgv() to use this pattern so callers' opts no longer drop inherited
environment variables.
| it("OLD pattern: bash -c IS vulnerable — injected command executes", () => { | ||
| // This simulates what the old code did: | ||
| // run(`openshell sandbox connect ${sandboxName}`) | ||
| // We use echo to safely demonstrate injection via stdout | ||
| const cmd = `echo "getting sandbox: ${maliciousName}"`; | ||
| const result = oldRun(cmd); | ||
|
|
||
| // The semicolon splits the command — "echo INJECTED" runs separately | ||
| assert.ok( | ||
| result.stdout.includes("INJECTED"), | ||
| `Expected stdout to contain "INJECTED" from injected command, got: "${result.stdout}"` | ||
| ); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cd /dev/shm && find . -name "security-poc.test.js" -o -name "*security*" | head -20Repository: NVIDIA/NemoClaw
Length of output: 117
🏁 Script executed:
find . -name "*.test.js" -type f | head -20Repository: NVIDIA/NemoClaw
Length of output: 205
🏁 Script executed:
git ls-files | grep -i security | head -20Repository: NVIDIA/NemoClaw
Length of output: 96
🏁 Script executed:
git ls-files | grep -i test | head -20Repository: NVIDIA/NemoClaw
Length of output: 410
🏁 Script executed:
ls -laRepository: NVIDIA/NemoClaw
Length of output: 1951
🏁 Script executed:
find . -type f -name "*.test.js" -o -name "*poc*" 2>/dev/null | head -20Repository: NVIDIA/NemoClaw
Length of output: 205
🏁 Script executed:
git ls-files | grep test | head -30Repository: NVIDIA/NemoClaw
Length of output: 410
🏁 Script executed:
cat -n test/security-poc.test.js | head -100Repository: NVIDIA/NemoClaw
Length of output: 4738
🏁 Script executed:
cat -n test/security-poc.test.js | tail -n +100 | head -100Repository: NVIDIA/NemoClaw
Length of output: 4458
🏁 Script executed:
cat -n test/security-poc.test.js | tail -n +200Repository: NVIDIA/NemoClaw
Length of output: 4375
🏁 Script executed:
# Verify bash quoting behavior to confirm the analysis
bash -c 'maliciousName="test; echo INJECTED; echo"; cmd="echo \"getting sandbox: $maliciousName\""; eval "$cmd"'Repository: NVIDIA/NemoClaw
Length of output: 101
🏁 Script executed:
# Now verify without quotes to show actual injection
bash -c 'maliciousName="test; echo INJECTED; echo"; cmd="echo getting sandbox: $maliciousName"; eval "$cmd"'Repository: NVIDIA/NemoClaw
Length of output: 90
The PoC test doesn't demonstrate command injection in its current form.
Line 51 wraps ${maliciousName} in double quotes, so semicolons remain literal. The assertion on line 56 passes because the literal string "INJECTED" is echoed as part of the argument, not because a second command executed. Additionally, the comment on line 54 is incorrect—it claims the semicolon splits the command, but semicolons inside double quotes are literal characters, not separators.
Remove the quotes to properly demonstrate command injection:
Suggested fix
- const cmd = `echo "getting sandbox: ${maliciousName}"`;
+ const cmd = `echo getting sandbox: ${maliciousName}`;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/security-poc.test.js` around lines 47 - 58, The test currently quotes
${maliciousName} so the semicolon is literal; change the cmd construction in the
OLD pattern test (the const cmd used with oldRun) to remove the surrounding
double quotes around ${maliciousName} so the injected semicolon is interpreted
by the shell (e.g., `echo getting sandbox: ${maliciousName}`), and update the
nearby comment that incorrectly states "The semicolon splits the command" to
reflect that removal of quotes is required for a real injection; keep the
existing assertion that checks result.stdout for "INJECTED".
Closes shell-injection attack surface in the legacy CJS layer by replacing
all user-controlled run() / runCapture() shell strings with the new argv-safe
runArgv() / runCaptureArgv() helpers. assertSafeName() guards every
user-supplied sandbox/instance/preset name before it enters any command.
bin/lib/onboard.js -- all openshell/bash/brew calls -> runArgv;
file copies -> fs.cpSync/fs.rmSync (no cp shell)
bin/lib/nim.js -- docker pull/rm/run/stop/inspect -> runArgv/runCaptureArgv;
assertSafeName guard on sandboxName
bin/lib/policies.js -- openshell policy get/set -> runCaptureArgv/runArgv;
assertSafeName on sandboxName and presetName;
temp policy file written with mode 0o600
bin/nemoclaw.js -- setupSpark: remove inline NVIDIA_API_KEY=VALUE from
sudo argv (sudo -E already inherits env);
deploy: assertSafeName on instanceName;
sandbox connect/status/logs/destroy -> runArgv
Supersedes PRs: NVIDIA#148 (shell injection), part of NVIDIA#330 (credential leak).
|
Closing — the command injection hardening here is valuable but #330 addresses the actual ps aux credential exposure issue (#325) by switching to openshell's env-lookup form for --credential. The argv array approach in this PR doesn't prevent the key from appearing in process args. Thanks for the thorough work @gn00295120. |
Closes shell-injection attack surface in the legacy CJS layer by replacing
all user-controlled run() / runCapture() shell strings with the new argv-safe
runArgv() / runCaptureArgv() helpers. assertSafeName() guards every
user-supplied sandbox/instance/preset name before it enters any command.
bin/lib/onboard.js -- all openshell/bash/brew calls -> runArgv;
file copies -> fs.cpSync/fs.rmSync (no cp shell)
bin/lib/nim.js -- docker pull/rm/run/stop/inspect -> runArgv/runCaptureArgv;
assertSafeName guard on sandboxName
bin/lib/policies.js -- openshell policy get/set -> runCaptureArgv/runArgv;
assertSafeName on sandboxName and presetName;
temp policy file written with mode 0o600
bin/nemoclaw.js -- setupSpark: remove inline NVIDIA_API_KEY=VALUE from
sudo argv (sudo -E already inherits env);
deploy: assertSafeName on instanceName;
sandbox connect/status/logs/destroy -> runArgv
Supersedes PRs: NVIDIA#148 (shell injection), part of NVIDIA#330 (credential leak).
Summary
runArgv()andrunCaptureArgv()torunner.js— safe alternatives torun()that usespawnSync(prog, args)/execFileSync(prog, args)with no shell interpolationassertSafeName()validator enforcing^[a-zA-Z0-9][a-zA-Z0-9_-]{0,62}$on all sandbox, instance, and container namesnemoclaw.js,onboard.js,nim.js,policies.js, andcredentials.jsto use argv-based functionscp -r/rm -rfinonboard.jswith Node.jsfs.cpSync()/fs.rmSync()0o600instead of default permissionsrun()/runCapture()(with security warning JSDoc) for hardcoded-only shell commandsMotivation
The
run()helper passes commands tospawnSync("bash", ["-c", cmd]). Any user-controlled value interpolated into the command string enables OS command injection. For example:This affected sandbox names, instance names, Docker container names, GPU specifications, and model names across the entire CLI.
Related Issues & PRs
This PR provides a comprehensive, root-cause fix for the shell injection attack surface. It overlaps with several other PRs that address subsets of the same problem:
deploy()— 3execSync→execFileSyncdeploy()— shell quoting +deploy.jshelperdeploy()changes +assertSafeName()nemoclaw-start.sh), addresses #117const→letinpolicies.jsconstbugWhy this PR over #110 / #143?
runArgv()/runCaptureArgv()infrastructure inrunner.js— all current and future callers benefitassertSafeName()provides defense-in-depth at CLI entry points;,$(),`,|,&&) are mitigatedTest plan
nemoclaw onboardcompletes successfullynemoclaw <sandbox> connectworksnemoclaw deploy "foo;id"→Invalid instance namenemoclaw deploy "$(whoami)"→Invalid instance namePoC Results (actual test output)
Summary by CodeRabbit
Release Notes
Security
Improvements