Emit GitHub annotations for drift findings #2

Merged
Conalh merged 1 commit into main from codex/github-annotations on May 21, 2026

Conalh commented May 21, 2026

Summary

  • Add --format github to emit GitHub workflow warning annotations for each ScopeTrail finding.
  • Wire the composite action to emit annotations in addition to Markdown step summaries and JSON rating output.
  • Update README text and tests for annotation behavior.
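
As a rough illustration of what `--format github` output could look like, the sketch below turns one finding into a GitHub workflow `::warning` command. The `Finding` shape and the `toWarningAnnotation` helper are hypothetical and are not taken from the ScopeTrail source; the escaping rules follow the documented workflow-command conventions for message data and properties.

```typescript
// Hypothetical finding shape, for illustration only.
type Finding = { kind: string; message: string; file?: string; line?: number };

// Workflow-command message data must escape %, CR, and LF.
function escapeData(value: string): string {
  return value.replace(/%/g, "%25").replace(/\r/g, "%0D").replace(/\n/g, "%0A");
}

// Property values additionally escape ':' and ','.
function escapeProperty(value: string): string {
  return escapeData(value).replace(/:/g, "%3A").replace(/,/g, "%2C");
}

// Build a single `::warning ...::message` line for one finding.
export function toWarningAnnotation(finding: Finding): string {
  const props: string[] = [];
  if (finding.file) props.push(`file=${escapeProperty(finding.file)}`);
  if (finding.line !== undefined) props.push(`line=${finding.line}`);
  props.push(`title=${escapeProperty(finding.kind)}`);
  return `::warning ${props.join(",")}::${escapeData(finding.message)}`;
}
```

Printing one such line per finding to stdout inside a workflow step is what makes GitHub render it as an annotation on the run.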

Validation

  • npm run build
  • npm test (9 tests, 0 failures)
  • node dist/index.js diff --old test/fixtures/combined/old --new test/fixtures/combined/new --format github

Conalh marked this pull request as ready for review May 21, 2026 17:51
Conalh merged commit 0c78fdc into main May 21, 2026
1 check passed
Conalh deleted the codex/github-annotations branch May 21, 2026 17:51
Conalh added a commit that referenced this pull request May 29, 2026
…ion-drift blind spots (#49)

* fix: run CLI bin through symlinks and stop false Codex baseline drift

Two trust-critical fixes flagged in external review.

1. npm installs the `scopetrail` bin as a symlink, so the entrypoint's
   `import.meta.url === process.argv[1]` main-module check was false when
   launched via `npx`/global install: main() never ran and the CLI exited
   0 with no output -- a CI gate that silently passed every PR. Resolve both
   sides through realpathSync. Add a test that runs the built bin through a
   symlink (the existing CLI tests only ran `node dist/index.js` directly).

2. Codex sandbox/approval drift ranked a missing base value at -1, so a
   brand-new .codex/config.toml that merely set the narrowest posture
   (read-only sandbox, untrusted approval) was reported as a high-severity
   widening/weakening -- a false positive on the safest possible config.
   Anchor the missing baseline at Codex's safe default so only settings
   genuinely wider than that default surface; a brand-new danger-full-access
   sandbox or `never` approval still fires.
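
The symlink fix in item 1 can be sketched roughly as follows. This is a minimal illustration, not the actual ScopeTrail entrypoint: `isMainModule` and `demoSymlinkLaunch` are hypothetical names, and the demo assumes a platform where creating symlinks is permitted. It relies on Node resolving symlinks for the loaded module's URL while `process.argv[1]` keeps the path as invoked.

```typescript
import { mkdtempSync, realpathSync, rmSync, symlinkSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { fileURLToPath, pathToFileURL } from "node:url";

// Compare both sides after resolving symlinks, so a bin symlink still
// counts as the main module. In an entrypoint this would be called as
// isMainModule(import.meta.url, process.argv[1]).
export function isMainModule(moduleUrl: string, argv1: string | undefined): boolean {
  if (!argv1) return false;
  try {
    return realpathSync(fileURLToPath(moduleUrl)) === realpathSync(argv1);
  } catch {
    return false; // unresolvable path: treat as "not main"
  }
}

// Simulate an npm-style bin symlink pointing at the built entry file.
export function demoSymlinkLaunch(): { naive: boolean; resolved: boolean } {
  const dir = mkdtempSync(join(tmpdir(), "bin-symlink-demo-"));
  try {
    const entry = join(dir, "index.js");
    const bin = join(dir, "scopetrail");
    writeFileSync(entry, "");
    symlinkSync(entry, bin);
    // The module URL reflects the real file; argv[1] would be the symlink.
    const moduleUrl = pathToFileURL(realpathSync(entry)).href;
    return {
      naive: moduleUrl === pathToFileURL(bin).href,
      resolved: isMainModule(moduleUrl, bin),
    };
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
}
```

In the demo, the naive comparison fails because the two paths differ as strings, while the resolved comparison succeeds because both sides point at the same file on disk.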

Add unit tests for both Codex directions and a benign benchmark fixture
(codex-baseline-narrowest); regenerate RESULTS.md (29 cases; precision and
recall unchanged at 100%, false-positive rate 0%) and sync the README corpus counts.
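
The Codex baseline fix described in item 2 above can be sketched as a rank comparison in which a missing value falls back to a default rather than to -1. Everything here is illustrative: the ordering array, the `ASSUMED_DEFAULT` constant, and `sandboxWidened` are hypothetical, and the choice of `read-only` as the default is an assumption made for the example, not a statement of what ScopeTrail or Codex actually uses.

```typescript
// Sandbox modes ordered from narrowest to widest (illustrative ordering).
const SANDBOX_ORDER = ["read-only", "workspace-write", "danger-full-access"] as const;
type Sandbox = (typeof SANDBOX_ORDER)[number];

// ASSUMPTION: the safe default is treated as "read-only" for this sketch.
const ASSUMED_DEFAULT: Sandbox = "read-only";

// A missing value ranks as the default instead of -1.
function rank(value: Sandbox | undefined): number {
  return SANDBOX_ORDER.indexOf(value ?? ASSUMED_DEFAULT);
}

// Report drift only when head is strictly wider than base.
export function sandboxWidened(base: Sandbox | undefined, head: Sandbox | undefined): boolean {
  return rank(head) > rank(base);
}
```

With this anchoring, `sandboxWidened(undefined, "read-only")` is false, so a brand-new config that only states the narrowest posture stays quiet, while `sandboxWidened(undefined, "danger-full-access")` is true.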

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat: close remaining permission-drift blind spots and tighten docs

Implements the rest of the external review (the trust-critical bin/Codex
pair landed in the previous commit).

Detector blind spots:
- #4 Merge all recognized MCP server maps instead of first-map-wins, so an
  empty `mcpServers: {}` can no longer shadow a populated `servers: {}` in
  Cursor/VS Code configs.
- #3 Diff sensitive MCP fields (env, headers, cwd) on existing servers, which
  serverCommand()-based comparison ignored. A server keeping the same command
  but gaining a secret env var or auth header now surfaces; secret-bearing key
  names escalate to high. Applies to .mcp.json and Codex [mcp_servers.NAME].
  Removals stay silent (narrowing, not widening). New kinds:
  scope_trail.mcp_server_sensitive_field_changed and
  scope_trail.codex_mcp_sensitive_field_changed.
- #5 Model Claude hook entries by (matcher, type, command) rather than command
  alone, so rebinding a guard from one tool to another (same command) is
  caught. NOTE: this reverses the prior "matcher change is noise" decision and
  its test — a PreToolUse guard moved from Bash to Read stops guarding Bash,
  which is a real enforcement-surface change.
- #7 Expand the critical removed-deny pattern set beyond .env/secret/
  credential/.pem to SSH keys, *.key, cloud credentials, registry tokens, and
  kube configs.
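
To make the two MCP-related items above (#4 and #3) concrete, here is a minimal sketch under stated assumptions. The types, the `collectServers` and `diffSensitiveFields` helpers, the secret-name regex, and the `medium` baseline severity are all hypothetical; the commit only says that recognized maps are merged, that added or changed `env`, `headers`, and `cwd` values surface, that secret-bearing key names escalate to high, and that removals stay silent.

```typescript
// Hypothetical shapes; the real ScopeTrail types are not shown in this PR.
type McpServer = {
  command?: string;
  env?: Record<string, string>;
  headers?: Record<string, string>;
  cwd?: string;
};
type ServerMap = Record<string, McpServer>;
export type FieldFinding = { server: string; field: string; key: string; severity: "medium" | "high" };

// The commit names `mcpServers` and `servers`; other keys may exist.
const SERVER_MAP_KEYS = ["mcpServers", "servers"] as const;

// Merge every recognized map so an empty one cannot shadow a populated one.
export function collectServers(config: Record<string, unknown>): ServerMap {
  const merged: ServerMap = {};
  for (const key of SERVER_MAP_KEYS) {
    const value = config[key];
    if (value && typeof value === "object" && !Array.isArray(value)) {
      Object.assign(merged, value as ServerMap);
    }
  }
  return merged;
}

// ASSUMPTION: a simple name heuristic for "secret-bearing" keys.
const SECRET_KEY = /token|secret|key|password|authorization/i;

// Report added or changed env/header entries and cwd on servers present in
// both snapshots. Removed entries are ignored (narrowing, not widening).
export function diffSensitiveFields(base: ServerMap, head: ServerMap): FieldFinding[] {
  const findings: FieldFinding[] = [];
  for (const [server, next] of Object.entries(head)) {
    const prev = base[server];
    if (!prev) continue; // a brand-new server is a different finding kind
    for (const field of ["env", "headers"] as const) {
      const before = prev[field] ?? {};
      for (const [key, value] of Object.entries(next[field] ?? {})) {
        if (before[key] !== value) {
          findings.push({ server, field, key, severity: SECRET_KEY.test(key) ? "high" : "medium" });
        }
      }
    }
    if (next.cwd !== undefined && next.cwd !== prev.cwd) {
      findings.push({ server, field: "cwd", key: "cwd", severity: "medium" });
    }
  }
  return findings;
}
```

In this sketch, a later map wins when the same server name appears in more than one map; whether the real detector resolves such collisions the same way is not stated in the commit.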

Docs & supply-chain:
- #2/#8 Qualify the README "local-only" claim: the scanner uploads nothing,
  but the Action's setup step runs `npm ci` from the registry.
- #9 Pin agent-gov-core to exact 1.3.0 (lockfile kept in sync) so npm
  consumers do not silently resolve a newer parser/schema.
- #10 Note that the benchmark figures bound regressions, not real-world field
  accuracy.

Each change ships unit tests and a labeled benchmark fixture. Corpus grows to
35 cases (27 rogue, 8 benign) across 21 detector kinds; precision and recall
hold at 100% with a 0% false-positive rate. RESULTS.md is regenerated and
README counts are synced.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix: clean up base snapshot when head materialization fails

runDiff materialized the base snapshot, then the head snapshot, and only
assigned `cleanup` after both succeeded. If head materialization threw — an
unresolvable head ref (a shallow checkout missing it), or a max-buffer error
on an oversized config — the base snapshot's temp dir was already on disk and
leaked, since cleanup was never wired up. Clean the base snapshot explicitly
before the error propagates.
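
The leak and its fix can be sketched like this. It is an illustration only: `materialize` is a hypothetical stand-in that fails for refs starting with `missing`, and `materializePair` is not the real `runDiff`. The temp-dir prefix mirrors the `scopetrail-snapshot` name mentioned in this commit.

```typescript
import { existsSync, mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

type Snapshot = { dir: string; cleanup: () => void };
type Materialize = (ref: string) => Snapshot;

// Hypothetical stand-in for snapshot materialization.
export function materialize(ref: string): Snapshot {
  if (ref.startsWith("missing")) throw new Error(`cannot resolve ref: ${ref}`);
  const dir = mkdtempSync(join(tmpdir(), "scopetrail-snapshot-"));
  return { dir, cleanup: () => rmSync(dir, { recursive: true, force: true }) };
}

// Materialize base, then head. If head fails, remove the base snapshot
// before rethrowing so no temp dir is left behind.
export function materializePair(baseRef: string, headRef: string, make: Materialize = materialize) {
  const base = make(baseRef);
  try {
    const head = make(headRef);
    return {
      base,
      head,
      cleanup: () => {
        base.cleanup();
        head.cleanup();
      },
    };
  } catch (error) {
    base.cleanup();
    throw error;
  }
}
```

Passing the factory as a parameter is a choice made for the sketch so that a test can record which directories were created and then assert with `existsSync` that none survive a head-side failure.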

Found during a fresh correctness pass over the codebase (not part of the
external review). The existing unresolvable-ref test fails on the *base* ref,
before any dir exists, so it never exercised this path. Adds a regression test
that asserts no scopetrail-snapshot temp dir survives a head-side failure.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>