Betterleaks migration #93

Merged
Raftersecurity merged 7 commits into main from betterleaks-migration
May 9, 2026


@Raftersecurity
Owner

No description provided.

Rome-1 and others added 7 commits May 7, 2026 01:41
Replace the secret-scanning binary in both Node and Python with betterleaks
(github.com/betterleaks/betterleaks v1.1.2), maintained by the gitleaks
authors. The JSON report shape is unchanged, so the result-mapping logic carries
over as-is; the only wiring changes are the CLI subcommand
(`detect --no-git -s` → `dir <path>`), the release URL pattern, and the
checksum filename.

Internal renames:
- GitleaksScanner → BetterleaksScanner (node + python)
- BinaryManager.get/is/verify/find/download_gitleaks → *_betterleaks
- GITLEAKS_VERSION → BETTERLEAKS_VERSION
- update-gitleaks command → update-betterleaks (with deprecated alias)

User-facing back-compat:
- `--with-gitleaks` accepted as alias of `--with-betterleaks`
- `--engine gitleaks` accepted as alias of `--engine betterleaks`
- `agent update-gitleaks` accepted as hidden alias

Dropped the parenthetical in the `rafter secrets` help description so the
"Secrets only" phrase no longer wraps mid-line at Typer's default width.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Update README, CLAUDE.md, SKILL.md, llms.txt, shared-docs/CLI_SPEC.md, recipes/*,
and the bundled python skill/agent resources to refer to Betterleaks (the
gitleaks successor) as the canonical name. Each occurrence of the legacy
flag/value/subcommand explicitly notes the back-compat alias so existing
docs/scripts/agents that still say `gitleaks` keep working.

Historical docs under docs/ (audits, code reviews, proposals, research) are
left as-is — they describe state at a point in time.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ning)

Functional fixes (reported by 4 reviewers):
- Node BetterleaksScanner now falls back to a PATH-installed binary (Homebrew
  users were silently demoted to regex-only scanning; Python already did this).
- agent verify/status soft-degrade when only legacy gitleaks is present —
  emit "run rafter agent update-betterleaks" hint instead of hard-failing.
- Update remaining user-facing surfaces: action.yml, .pre-commit-hooks.yaml,
  Dockerfile (was `--with-gitleaks 2>/dev/null || true`), python/README.md,
  .github/copilot-instructions.md.
- Python scanner now logs warnings on JSON/timeout errors (previously these
  failed silently, producing false negatives).
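The PATH fallback in the first bullet can be sketched as a two-step lookup. `resolve_betterleaks` is a hypothetical name; the managed location `~/.rafter/bin/betterleaks` (with `.exe` on Windows) is taken from paths mentioned elsewhere in this PR, and the lookup order (managed copy first, then PATH) is an assumption. Returning `None` is the point where a caller would drop to regex-only scanning.

```python
import os
import shutil
import sys
from pathlib import Path
from typing import Optional


def _binary_name(base: str) -> str:
    # Windows builds carry an .exe suffix; elsewhere the bare name is used.
    return base + ".exe" if sys.platform == "win32" else base


def resolve_betterleaks(home: Optional[Path] = None) -> Optional[str]:
    """Hypothetical helper: return a runnable betterleaks path, or None.

    Assumed order: the managed copy under ~/.rafter/bin, then PATH.
    """
    home = home if home is not None else Path.home()
    managed = home / ".rafter" / "bin" / _binary_name("betterleaks")
    if managed.is_file() and os.access(managed, os.X_OK):
        return str(managed)
    # shutil.which consults PATHEXT on Windows, so the bare name suffices here.
    return shutil.which("betterleaks")
```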

Supply-chain hardening:
- Pin SHA256 hashes for the bundled BETTERLEAKS_VERSION (1.1.2) in source.
  Default install no longer trusts the release-page checksums.txt to
  authenticate itself; --version <other> still falls back to the upstream
  file (TOFU at install time).
- Reject symlink/hardlink/device tar entries on extract — without this,
  a malicious release could ship `betterleaks` as a symlink to e.g.
  ~/.ssh/authorized_keys, and the subsequent chmod +x would mode-flip the
  target. Same defense for zip extraction (Unix mode bits in external_attr).
  Post-extract lstat confirms the result is a regular file.
- Refuse non-https redirects (Node) and non-https final URLs (Python).
- Validate `--version` against /^[A-Za-z0-9._-]+$/ to neutralize URL
  injection attempts like `1.1.2/../evil`.
- Add `--` separator before user-supplied paths in betterleaks invocations
  so a path beginning with `-` isn't parsed as a flag.
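The tar half of the extraction defence above can be sketched in Python. `extract_binary` is hypothetical, it assumes the binary sits at the archive root as `betterleaks`, and the zip path (checking Unix mode bits in `external_attr`) is not shown. Every member is vetted before anything is written, the output file is created with `O_EXCL` so a symlink planted at the target cannot redirect the write, and the `lstat` check runs before `chmod`, which is the mode-flip the bullet warns about. Python 3.12+ also ships `tarfile` extraction filters (`filter="data"`), but those still permit in-tree links, so the stricter rule is spelled out by hand here.

```python
import os
import shutil
import stat
import tarfile
from pathlib import Path


def extract_binary(archive: Path, dest_dir: Path, member_name: str = "betterleaks") -> Path:
    """Hypothetical helper: extract one executable, refusing links and device nodes."""
    with tarfile.open(archive, "r:*") as tar:
        for m in tar.getmembers():
            # Only regular files and directories pass; symlinks, hardlinks,
            # character/block devices and FIFOs are all rejected.
            if not (m.isfile() or m.isdir()):
                raise ValueError(f"refusing non-regular tar entry {m.name!r}")
        member = tar.getmember(member_name)  # KeyError if the binary is missing
        src = tar.extractfile(member)
        if src is None:
            raise ValueError(f"{member_name!r} is not a regular file")
        target = dest_dir / Path(member.name).name  # drop any directory prefix
        if os.path.lexists(target):
            os.unlink(target)  # removes a stale file, or a planted symlink itself
        # O_CREAT | O_EXCL fails instead of following a symlink raced into place.
        fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        with src, os.fdopen(fd, "wb") as out:
            shutil.copyfileobj(src, out)
    # Post-extract lstat: confirm a regular file before flipping the mode bits.
    if not stat.S_ISREG(os.lstat(target).st_mode):
        raise ValueError(f"{target} is not a regular file after extraction")
    os.chmod(target, 0o755)
    return target
```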

Align the Node scanFile timeout with scanDirectory (60s, matching Python).
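The `--` separator bullet in the supply-chain list amounts to a particular argv layout. `build_dir_scan_argv` is a hypothetical helper; the report options the real scanner passes are not listed in this PR, so they are left as a generic `flags` parameter. It assumes, as the PR does, that betterleaks' argument parser honours the conventional `--` end-of-options marker.

```python
from typing import List, Sequence


def build_dir_scan_argv(binary: str, target: str, flags: Sequence[str] = ()) -> List[str]:
    """Hypothetical helper: options first, then `--`, then the user-supplied path."""
    # Everything after "--" is positional, so a target named "--help" or "-x"
    # is treated as a path rather than parsed as a flag.
    return [binary, "dir", *flags, "--", target]


argv = build_dir_scan_argv("betterleaks", "--help")
print(argv)  # ['betterleaks', 'dir', '--', '--help']
```

Because the list would go straight to `subprocess.run` without `shell=True`, no shell quoting is involved; `--` only guards against the scanner's own option parser.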

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Parity:
- Python BetterleaksScanner.getSeverity now inspects Tags (key+secret =>
  critical, api => high, generic => medium), matching Node; before this, the
  same rule could be classified differently depending on the runtime.
- Tighten Node BetterleaksResult interface: git-history-only fields
  (Commit/Author/Email/Date/Message/Fingerprint) are absent on `dir` mode
  output, so mark them optional rather than lying about the contract.
- Python User-Agent now uses rafter-cli __version__ instead of accidentally
  reusing BETTERLEAKS_VERSION.
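The tag rule in the first Parity bullet can be read as a small ordered lookup. This is a hedged sketch, not the Node or Python source: `severity_from_tags` is a hypothetical name, `key+secret` is interpreted as both tags being present (swap the subset test for an `or` if the real rule is either-or), tags are compared case-insensitively, and the `low` fallback is an assumption since the PR does not state one.

```python
from typing import Iterable


def severity_from_tags(tags: Iterable[str], default: str = "low") -> str:
    """Hypothetical mapping from betterleaks rule tags to a severity label."""
    normalized = {t.lower() for t in tags}  # assumption: case-insensitive tags
    if {"key", "secret"} <= normalized:  # read as "both tags present"
        return "critical"
    if "api" in normalized:
        return "high"
    if "generic" in normalized:
        return "medium"
    return default  # hypothetical fallback; not stated in the PR
```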

Tests for the legacy alias surface (previously uncovered, so it could have
silently regressed in the next refactor):
- node: --engine gitleaks (scan), --with-gitleaks (init), legacy gitleaks
  detection in agent status with the upgrade hint.
- python: --engine gitleaks (secrets), --with-gitleaks (agent init).
- python: tag-based severity (critical/high/medium) parametrized.
- python: new test_binary_manager.py covers --version validation
  (URL-injection guards), pinned-hash table completeness for the bundled
  BETTERLEAKS_VERSION, and non-https refusal in _download_file.
- node: matching --version validation tests in binary-manager.test.ts.
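A minimal version of the `--version` guard and its tests might look like this. `validate_version` and `test_version_validation` are hypothetical; the character class is the one quoted earlier in this PR. Two Python-specific details: `re.fullmatch` is used because `$` in a `^...$` pattern also matches just before a trailing newline, so `"1.1.2\n"` would otherwise slip through; and the extra rejection of all-dot values such as `..` is an addition not described in the PR, since the character class alone admits them.

```python
import re

_VERSION_CHARS = re.compile(r"[A-Za-z0-9._-]+")


def validate_version(version: str) -> str:
    """Hypothetical guard for a user-supplied --version value."""
    # fullmatch rather than ^...$: Python's $ also matches before a final "\n".
    if not _VERSION_CHARS.fullmatch(version):
        raise ValueError(f"invalid --version value: {version!r}")
    # Extra guard (not in the PR text): the class alone admits "." and "..".
    if set(version) == {"."}:
        raise ValueError(f"invalid --version value: {version!r}")
    return version


def test_version_validation() -> None:
    for ok in ("1.1.2", "1.2.0-rc.1", "v1.1.2"):
        assert validate_version(ok) == ok
    for bad in ("1.1.2/../evil", "1.1.2?x=1", "1.1.2\n", "", "..", "1.1.2%2F..", "1 .1"):
        try:
            validate_version(bad)
        except ValueError:
            continue
        raise AssertionError(f"accepted {bad!r}")
```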

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The migration kept --with-gitleaks, --engine gitleaks, and the
update-gitleaks subcommand as silent aliases. Per product direction,
move off gitleaks entirely.

Removed (now hard-error):
- `rafter agent init --with-gitleaks`            → "unknown option"
- `rafter agent scan --engine gitleaks`          → "Invalid engine"
- `rafter agent baseline create --engine gitleaks` → same
- `rafter agent update-gitleaks` (hidden alias)  → "unknown command"
- `gitleaks` value in mcp `scan_secrets` enum    → schema rejection

Kept (read-only legacy detection):
- `rafter agent verify` and `rafter agent status` continue to probe for
  ~/.rafter/bin/gitleaks and gitleaks on PATH so users with leftover
  installs see "legacy gitleaks at X; run: rafter agent update-betterleaks"
  instead of a confusing "not found". This is the soft-degrade that
  prevents a verify hard-fail regression on upgrade.
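The read-only legacy detection can be sketched as a single helper shared by verify and status. Function names are hypothetical; the hint text is copied from the description above, and the `betterleaks: not found` wording is invented for illustration. Detection only reports; it never runs or deletes the old binary.

```python
import shutil
import sys
from pathlib import Path
from typing import Optional


def find_legacy_gitleaks(home: Optional[Path] = None) -> Optional[str]:
    """Hypothetical helper: locate a leftover gitleaks (managed copy, then PATH)."""
    name = "gitleaks.exe" if sys.platform == "win32" else "gitleaks"
    home = home if home is not None else Path.home()
    managed = home / ".rafter" / "bin" / name
    if managed.is_file():
        return str(managed)
    return shutil.which("gitleaks")


def legacy_hint(path: str) -> str:
    # Wording taken from the PR description.
    return f"legacy gitleaks at {path}; run: rafter agent update-betterleaks"


def betterleaks_status(betterleaks_path: Optional[str], home: Optional[Path] = None) -> str:
    """Hypothetical status line: soft-degrade to a hint instead of hard-failing."""
    if betterleaks_path:
        return f"betterleaks: {betterleaks_path}"
    legacy = find_legacy_gitleaks(home)
    if legacy:
        return legacy_hint(legacy)
    return "betterleaks: not found"  # illustrative wording
```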

Docs scrubbed of "legacy alias accepted" copy across README, CLAUDE.md,
llms.txt, shared-docs/CLI_SPEC.md, recipes/gemini-cli.md, and the
bundled python skill cli-reference. Tests flipped from "alias is
accepted" to "alias is rejected".

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Audit run by 2 Codex agents and 2 Claude agents on the
betterleaks-migration branch (post-alias-removal). Fixes:

Bundled Node skill resources (P0, both reviewers flagged):
The npm-bundled docs at node/resources/** still said "Gitleaks" and
documented the removed `update-gitleaks` subcommand and `--with-gitleaks`
flag — a clean install would land docs that recommend commands the CLI
now hard-rejects. Mirrored the rename across:
- node/resources/skills/rafter/SKILL.md
- node/resources/skills/rafter/docs/cli-reference.md
- node/resources/agents/rafter.md
- node/resources/rafter-security-skill.md
- node/.claude/skills/rafter/docs/cli-reference.md (dev-side mirror)
- fixtures/vulnerable-repo/README.md (`--engine gitleaks` example)

Python silent-failure bug (P1, codex-functional flagged):
`_run_scan()` ignored the betterleaks subprocess return code, so any
non-zero exit other than 1 (panic, OOM, malformed args) returned []
and the scan looked clean. Mirrored Node's contract: 0=clean, 1=findings,
anything else => RuntimeError with stderr tail. Catches OSError on the
subprocess call too.
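The exit-code contract described above maps onto a short wrapper. `run_scan` is a hypothetical name; it returns only whether findings were reported (parsing the JSON report is left out), reuses the 60 s timeout mentioned earlier in this PR, and truncates stderr to an arbitrary 500-character tail. `subprocess.TimeoutExpired` is deliberately left to propagate.

```python
import subprocess
from typing import Sequence

STDERR_TAIL_CHARS = 500  # hypothetical excerpt length


def run_scan(argv: Sequence[str], timeout: float = 60) -> bool:
    """Hypothetical wrapper: True if findings were reported, False if clean.

    Contract from the PR: 0 = clean, 1 = findings, anything else = error.
    """
    try:
        proc = subprocess.run(list(argv), capture_output=True, text=True, timeout=timeout)
    except OSError as exc:  # e.g. binary missing or not executable
        raise RuntimeError(f"could not start scanner: {exc}") from exc
    if proc.returncode == 0:
        return False
    if proc.returncode == 1:
        return True
    # Panics, OOM kills (negative codes on POSIX) and bad arguments all land here.
    tail = proc.stderr[-STDERR_TAIL_CHARS:].strip()
    raise RuntimeError(f"scanner exited with code {proc.returncode}: {tail}")
```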

Windows .exe extension parity (P2, codex-functional flagged):
Several legacy-detection / status / init paths hardcoded the binary
name without `.exe`, so a managed `~/.rafter/bin/betterleaks.exe` (or
leftover `gitleaks.exe`) was missed on Windows.
- python/rafter_cli/commands/agent.py: agent init, _check_betterleaks,
  status output all now select binary name by sys.platform.
- node/src/commands/agent/status.ts: same — derive `.exe` once and reuse.

Test cleanup (P2, codex-functional flagged):
node/tests/mcp-server.test.ts handler mock kept the
`engineRaw === "gitleaks" ? "betterleaks" : engineRaw` normalization that
production no longer has — removed so the test reflects current behavior.

Copy polish (both reviewers flagged):
- recipes/pre-commit.md sentence "21+ patterns via Betterleaks" wrongly
  attributed all 21 patterns to betterleaks. Now: "21+ built-in
  credential patterns plus optional Betterleaks integration".
- shared-docs/SHOW_HN_DRAFT.md, drafts/show-hn/post.md, drafts/show-hn/faq.md
  marketing drafts updated to mention Betterleaks (kept gitleaks references
  where they're historical comparisons or the original FAQ question).

CHANGELOG entry:
Added an [Unreleased] entry for the full migration: scanner change, breaking
removal of legacy aliases, soft-degrade detection, supply-chain hardening,
and the alias-removal-as-breaking note.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Security hardening (Claude security reviewer flagged 3 real issues; Codex
security review confirmed the layered hardening is otherwise solid):

- Node download: add MAX_REDIRECTS=10 (previously unbounded recursion on 3xx
  redirect loops), a MAX_BYTES=200MB body cap with a Content-Length precheck
  plus per-chunk enforcement (previously a mirror serving e.g. a 50GB body
  could DoS the install), and a REQUEST_TIMEOUT_MS=60s socket timeout
  (previously a slow-loris server could hang the download). All in
  node/src/utils/binary-manager.ts downloadFile.
- Python download: matching MAX_BYTES=200MB body cap. Python already had
  urllib's internal redirect cap and timeout=60.

Simplicity (Claude simplicity reviewer):

- Extract findLegacyGitleaks() onto BinaryManager (Node) and
  find_legacy_gitleaks() onto BinaryManager (Python). Was duplicated 3x
  per runtime (verify + status + agent init); now one canonical
  implementation each side, used everywhere. Drops the duplicated
  Windows .exe extension handling and the homedir traversal.
- Inline _update_betterleaks_impl() back into update_betterleaks() —
  the wrapper existed for the (now-removed) update-gitleaks alias and
  was only called once.

Functional correctness (the Codex functional reviewer ran the full test
suite but exhausted its context window before completing the manual test matrix):

- Confirmed: 73/73 node + 164/164 python tests pass.
- Confirmed live: aliases hard-error correctly (--with-gitleaks,
  --engine gitleaks, update-gitleaks), legacy detection emits the
  upgrade hint identically across both runtimes, real download still
  works end-to-end after the hardening additions.

Items deferred (low value or trade-off):
- Re-verify on-disk binary hash before each scan (TOFU drift) — local
  malware threat outside our model.
- Race on parallel agent init writing to same archive path — local DoS
  only, no security boundary.
- Switch Node verify_betterleaks_verbose from execAsync (shell quoted)
  to execFile (argv) — fragile but trusted inputs only.
- Drop post-extract lstat from Python tarball extract — kept for
  parity with Node where node-tar's filter typing is loose.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Raftersecurity merged commit cbfe354 into main on May 9, 2026
3 of 4 checks passed
Rome-1 added a commit that referenced this pull request May 10, 2026
Minor bump (not patch) because the betterleaks migration removed
user-facing CLI surface — `--with-gitleaks`, `--engine gitleaks`, and
`rafter agent update-gitleaks` now hard-error. On a 0.x line, removing
documented CLI flags is the textbook MINOR trigger; burying it in a
patch would mislead anyone scripting against those flags.

Bumps:
- node/package.json: 0.7.9 → 0.8.0
- python/pyproject.toml: 0.7.9 → 0.8.0
- node/resources/rafter-security-skill.md: 0.7.9 → 0.8.0  (ClawHub publish)
- python/rafter_cli/resources/rafter-security-skill.md: 0.7.9 → 0.8.0
- recipes/openclaw.md example frontmatter: 0.7.9 → 0.8.0

CHANGELOG cleanup:
- Move betterleaks bullets from [0.7.9] (where the PR #93 octopus merge
  parked them under main's `### Changed` heading) into the new [0.8.0]
  section. The published v0.7.9 (dc81574) does not contain betterleaks
  code; npm/PyPI 0.7.9 still ships gitleaks.
- New [0.8.0] groups all unreleased bullets (rf-hrtd dry-run, ClawHub
  auto-publish) and adds three Fixed entries: rf-cfjc (action.yml jq
  parser, fixed in 941879f), #96 (ClawHub owner handle), rf-z6sv (#97,
  pre-commit rev pin bump).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2 participants