fix: HIGH-tier batch 2 — receipt/audit-chain determinism, command injection, Solidity overflow#6
Closed
New1Direction wants to merge 4 commits into
Closed
fix: HIGH-tier batch 2 — receipt/audit-chain determinism, command injection, Solidity overflow#6New1Direction wants to merge 4 commits into
New1Direction wants to merge 4 commits into
Conversation
added 4 commits
June 1, 2026 12:54
ExecutionReceipt.compute_hash folded in resource_usage (cpu_ms, a wall-clock
float) and status_detail (which for TIMEOUT embeds 'Elapsed {ms}ms > max ...').
Receipts are chained + verify_chain recomputes the hash, so machine-speed jitter
changed the whole ledger chain. compute_hash now covers only logical identity
(status enum kept; usage/detail stay as stored metadata). + determinism tests.
…ock from hash) AuditLogger.log folded the wall-clock audit_id (time.time()*1e6) and a datetime.now() timestamp into the hashed payload, so replaying the same logical sequence produced a different audit_hash chain — the 'immutable, replayable audit ledger' was not reproducible. The chained hash now covers only the logical event + prev_hash; audit_id/timestamp remain stored columns. + reproducibility test.
execute() interpolated context values (incl. agent-controlled tool_input) into a string run under subprocess.run(shell=True) with no escaping, so a value with shell metacharacters (;, |, $(), backticks) executed arbitrary commands. Each interpolated value is now shlex.quote-d (_interpolate gains quote_values=True), so untrusted values become a single literal argument. + injection tests.
… panic) PiSolidityCompilerBugsSentry did parse::<i64>().unwrap() on each version component, so an oversized component (e.g. a 20-digit major in 'pragma solidity 99999999999999999999.8.13;') overflowed and panicked — a Rust<->Python divergence (Python's int() is arbitrary-precision) and, pre-fix, a DoS that only fell back via run_agent_safe. The components are only compared to small buggy-release constants, so unparseable/oversized -> sentinel(-1) that can never match, reproducing Python's 'not flagged' outcome without panicking. Also refactor run_agent_safe to delegate to an agent-independent catch_panic() helper, so the panic-safety test no longer depends on this (now-fixed) overflow.
Owner
Author
|
Consolidated. All commits from this PR are now in |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Second (final) batch of HIGH-severity audit fixes — completes the tier. Stacked on #5; base is
fix/high-tier.Fixed in this PR
ExecutionReceipt.compute_hashincludedresource_usage(cpu_ms) andstatus_detail(which for TIMEOUT embeds"Elapsed {ms}ms > max ..."). Receipts are chained +verify_chainrecomputes, so machine-speed jitter changed the whole ledger chain. Hash now covers only logical identity (statusenum kept; usage/detail stay as stored metadata).AuditLogger.logfolded the wall-clockaudit_id(time.time()*1e6) and adatetime.now()timestamp into the hashed payload, so replaying the same logical sequence produced a differentaudit_hashchain. The chain now hashes only the logical event +prev_hash;audit_id/timestampremain stored columns.BashCommandHookcommand injection (security) — interpolated context values (incl. agent-controlledtool_input) were spliced into asubprocess.run(shell=True)string with no escaping. Each interpolated value is nowshlex.quote-d, so a value with;,|,$(...), backticks is a single literal argument.i64overflow (Rust parity) —parse::<i64>().unwrap()panicked on an oversized version component (e.g. a 20-digit major); Python'sint()is arbitrary-precision. Oversized/unparseable components → sentinel that can't match the small buggy-release constants, reproducing Python's "not flagged" outcome with no panic. (The panic was already caught by PR fix: resolve 3 critical security findings (sandbox RCE, unauth console, Rust panic fail-safe) #3'srun_agent_safe; this removes it at the source and restores byte-parity.) Also refactoredrun_agent_safeto an agent-independentcatch_panichelper so its test no longer depends on this bug.Verification
PYTHONHASHSEED=0) · Rust: 784 + 13 passed · ruff check + format clean.HIGH tier status
With this PR, all HIGH findings from the audit are addressed (the prior 4 in #5; these 4 here). The one remaining audit item —
pi-semantic-governance.yml"dependency-confusion gate" — was found to be untracked/never-committed (a latent, not live, risk), documented in #5.🤖 Generated with Claude Code