Correctness fixes + offline catalog (add/search) + pre-commit support (0.3.0) by DataDave-Dev · Pull Request #38 · DataDave-Dev/becwright

DataDave-Dev · 2026-07-01T04:17:48Z

Summary

Hardens the enforcement path so becwright keeps its "guaranteed, not probabilistic" promise, and lowers adoption friction with an offline catalog and pre-commit/Husky support. Bumps to 0.3.0.

Correctness (P0)

Check staged content, not the working tree. The pre-commit path listed staged file names but opened the working-tree copy, so an unstaged edit could flag a violation that isn't being committed, or hide a staged one. Now the staged blobs are materialized into a temp mirror and checked there. --all still inspects the working tree on purpose.
Per-check timeout. A hung/runaway check (bundles can carry third-party code) no longer freezes the commit. 30s cap, override via BECWRIGHT_CHECK_TIMEOUT (0 disables).
Validated rules.yaml. A misspelled severity silently downgraded a rule to a warning; malformed YAML dumped a raw traceback. Both now raise a readable RulesError and exit 2.

Adoption (P1)

Offline catalog: becwright add <name> and becwright search [query]. The catalog now ships inside the package (src/becwright/becs/, via package-data), so installing a ready-made BEC needs no URL and works offline. Shares the trust-gated install flow with import.
pre-commit framework + Husky. New .pre-commit-hooks.yaml and README setup for both.

Polish (P2)

Memoize compiled glob patterns (check --all perf).
CI now also tests Python 3.14.

Test plan

pytest — 152 passing (new: staged-content both directions, timeout, rules validation, catalog/add/search).
Wheel inspected: catalog .bec.yaml files are included.
Dogfooding (becwright check --all) green; the pre-commit hook validated each commit using staged content.

Summary by CodeRabbit

New Features
- Added built-in catalog browsing and installation commands for ready-to-use rules.
- Offline rule installation is now supported from the packaged catalog.
- CI now runs on Python 3.14.
Bug Fixes
- Rule checks now use staged content for pre-commit-style runs.
- Invalid rule definitions are reported more clearly, including malformed YAML and unsupported severities.
- Long-running checks now time out cleanly instead of hanging.
Documentation
- Updated usage guides and command references to match the new catalog workflow.

coderabbitai · 2026-07-01T04:18:07Z

Warning

Review limit reached

@DataDave-Dev, you've reached your PR review limit, so we couldn't start this review.

Next review available in: 48 minutes

Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available.
You're only billed for reviews past your plan's rate limits ($0.25/file).

How can I continue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews.

How do review limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please refer docs for additional details.

Review details

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 46aa4f6b-00c6-4d41-9b2e-8a7c7ed838af

📥 Commits

Reviewing files that changed from the base of the PR and between edf81c9 and 09c5fa7.

📒 Files selected for processing (42)

.github/workflows/ci.yml
.pre-commit-hooks.yaml
CLAUDE.md
README.es.md
README.md
becs/README.md
documentation/portability.es.md
documentation/portability.md
documentation/usage.es.md
documentation/usage.md
integrations/claude-code/skills/becwright/SKILL.md
npm/becwright/package.json
npm/platforms/darwin-arm64/package.json
npm/platforms/darwin-x64/package.json
npm/platforms/linux-arm64/package.json
npm/platforms/linux-x64/package.json
npm/platforms/win32-x64/package.json
pyproject.toml
src/becwright/__init__.py
src/becwright/becs/README.md
src/becwright/becs/no-console-log-js.bec.yaml
src/becwright/becs/no-dangerous-eval.bec.yaml
src/becwright/becs/no-debug-go.bec.yaml
src/becwright/becs/no-debug-remnants.bec.yaml
src/becwright/becs/no-debug-rust.bec.yaml
src/becwright/becs/no-debugger-js.bec.yaml
src/becwright/becs/no-hardcoded-secrets.bec.yaml
src/becwright/becs/no-token-in-logs.bec.yaml
src/becwright/becs/no-wildcard-imports.bec.yaml
src/becwright/catalog.py
src/becwright/cli.py
src/becwright/engine.py
src/becwright/git.py
src/becwright/report.py
src/becwright/rules.py
tests/test_bundle.py
tests/test_catalog.py
tests/test_cli_and_git.py
tests/test_engine_integration.py
tests/test_list.py
tests/test_rules.py
tests/test_staged_content.py

📝 Walkthrough

Walkthrough

This PR adds an offline, package-bundled BEC catalog with new add/search CLI commands, hardens rule-file validation with a RulesError exception, introduces a check timeout and glob caching in the engine, adds staged-worktree evaluation for git checks, bumps package versions to 0.3.0, and updates documentation/CI accordingly.

Changes

Catalog, rules hardening, engine/staging, and packaging

Layer / File(s)	Summary
Catalog module and packaging `src/becwright/catalog.py`, `src/becwright/becs/README.md`, `pyproject.toml`, `tests/test_bundle.py`, `tests/test_catalog.py`	Adds `CatalogError`, `catalog_names()`, and `read_bec()` to enumerate/read bundled `.bec.yaml` files, ships catalog docs, packages catalog data in the wheel, and updates/adds tests.
CLI add/search commands and bundle install refactor `src/becwright/cli.py`, `tests/test_catalog.py`, `tests/test_list.py`	Adds `catalog` import, refactors install logic into `_install_bundle()`, adds `_cmd_add`/`_cmd_search` and parser wiring, updates list/rules-header text, and tests these commands.
Rules validation hardening `src/becwright/rules.py`, `tests/test_rules.py`, `tests/test_cli_and_git.py`	Adds `RulesError` and `_VALID_SEVERITIES`, validates rule shape/required fields/severity in `_to_rule`, wraps YAML parsing errors in `load_rules`, and adds failure-path tests including a CLI-level invalid severity test.
Check timeout and glob caching in engine `src/becwright/engine.py`, `tests/test_engine_integration.py`	Adds `BECWRIGHT_CHECK_TIMEOUT`-driven timeout handling around subprocess calls, caches compiled glob regexes via `lru_cache`, and adds a timeout integration test.
Staged-content evaluation for checks `src/becwright/git.py`, `src/becwright/report.py`, `tests/test_staged_content.py`	Adds `_staged_blob`/`staged_worktree` to materialize staged file content into a temp directory, updates `gather()` to evaluate staged content unless `all_files` is set, and adds staged-vs-working-tree tests.
Version bumps across packages `pyproject.toml`, `src/becwright/__init__.py`, `npm/becwright/package.json`, `npm/platforms/*/package.json`	Bumps version from 0.2.2 to 0.3.0 across Python and npm package manifests.
Documentation and CI updates for catalog workflow `README.md`, `README.es.md`, `CLAUDE.md`, `documentation/usage.md`, `documentation/portability.md`, `integrations/claude-code/skills/becwright/SKILL.md`, `.pre-commit-hooks.yaml`, `.github/workflows/ci.yml`	Updates docs to describe `search`/`add` catalog workflow, retargets catalog directory links, adds a pre-commit hook definition, and expands CI Python matrix to include 3.14.

Estimated code review effort: 3 (Moderate) | ~30 minutes

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant CLI
  participant catalog
  participant bundle
  participant report
  participant git
  participant engine

  User->>CLI: becwright search term
  CLI->>catalog: catalog_names()
  CLI->>bundle: parse_bundle(read_bec(name))
  CLI-->>User: matching BECs

  User->>CLI: becwright add name
  CLI->>catalog: read_bec(name)
  CLI->>bundle: _install_bundle(root, text)
  CLI-->>User: install result

  User->>CLI: becwright check
  CLI->>report: gather(rules, files, root, all_files)
  report->>git: staged_worktree(root, files)
  git-->>report: temp staged dir
  report->>engine: evaluate(rules, files, staged_root)
  engine-->>report: results (pass/fail/timeout)
  report-->>CLI: summary
  CLI-->>User: exit code + output

Possibly related PRs

DataDave-Dev/becwright#6: Both PRs extend CLI subcommand wiring in src/becwright/cli.py (adding init versus adding add/search).
DataDave-Dev/becwright#12: Both PRs touch the list subcommand in src/becwright/cli.py, one implementing it and this one changing its output text.
DataDave-Dev/becwright#2: Both PRs update README install/usage documentation reflecting distribution changes.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The PR description has useful detail, but it does not follow the required What, Why, and Checklist template headings.	Rewrite the description using the template's What, Why, and Checklist sections, and include the required test and docs notes.
Docstring Coverage	⚠️ Warning	Docstring coverage is 5.26% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main changes: correctness fixes, offline catalog/search, pre-commit support, and the 0.3.0 release.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/staged-content-and-robustness

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

The pre-commit path listed staged file names but the checks opened the working-tree copy, so the result could diverge from what the commit records: an unstaged edit could flag a violation that is not being committed, or hide a staged one behind a clean working tree. Materialize the staged blobs (git show :0:<file>) into a temp mirror, copy the working-tree .bec/ tooling alongside so custom checks resolve, and run the checks there. `--all` still inspects the working tree on purpose.

A check runs on every commit and may carry imported third-party code (from a bundle). Without a timeout a hung or runaway check freezes the commit forever. Add a 30s cap, overridable via BECWRIGHT_CHECK_TIMEOUT (0 disables it for slow whole-repo runs), and treat a timeout as a failed check with a clear message.

A misspelled severity ("blockign", "bloqueante") silently downgraded a rule to a warning, quietly disabling enforcement; a malformed rules.yaml surfaced a raw PyYAML traceback in the middle of the user's commit. Raise a readable RulesError for invalid YAML, a non-list `rules:`, a missing required field, or an invalid severity. The CLI prints it and exits 2.

Importing a ready-made BEC required knowing and typing a raw GitHub URL, and the catalog was not shipped with the package, so it could not work offline. Bundle the catalog inside the wheel (src/becwright/becs/, packaged via package-data) and read it through a new catalog module. Add two commands: `becwright search [query]` lists the catalog with each BEC's intent, and `becwright add <name>` installs one with the same trust-gated flow as `import` (shared _install_bundle helper). No network, no URL. Docs, the init header, `list`, and the repo layout are updated to point at the new commands and location.

Teams that already manage git hooks could not plug becwright in without letting it own the pre-commit hook. Ship a .pre-commit-hooks.yaml so becwright can be added as a pre-commit repo (it determines the staged files itself, so pass_filenames is false and always_run is true), and document both the pre-commit and Husky setups in the README.

matches() rebuilt and recompiled each path glob on every (file, pattern) pair, which adds up on `check --all` over a large repo. Memoize the compiled regex.

New since 0.2.2: staged-content checking, per-check timeout, validated rules.yaml, the offline catalog with `add`/`search`, and pre-commit/Husky support. Bump the version across pyproject, the npm launcher and platform packages, and pin the README pre-commit examples to v0.3.0 (the first tag that ships .pre-commit-hooks.yaml).

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (5)

tests/test_staged_content.py (1)

1-30: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Duplicate test helpers across files.

_git, _init_repo, and _rules_yaml re-implement helpers that already exist in tests/test_cli_and_git.py. Consider extracting these into a shared conftest.py fixture/helper module to avoid drift between the two copies.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_staged_content.py` around lines 1 - 30, The staged-content tests
duplicate the same git/setup helpers already defined in
tests/test_cli_and_git.py, so please remove the copy in this test module and
reuse shared helpers instead. Move or import _git, _init_repo, and _rules_yaml
from a common test support location such as conftest.py or a shared helper
module, then update _setup to call those shared symbols so both test files stay
in sync.

src/becwright/git.py (1)

63-84: 🚀 Performance & Scalability | 🔵 Trivial | 🏗️ Heavy lift

Consider batching staged-file materialization instead of one git show per file.

staged_worktree spawns one git show :0:<path> subprocess per staged file. For large commits this means N process spawns just to build the temp worktree. git checkout-index --prefix=<dir>/ -z --stdin can materialize all staged files into a directory with a single git invocation, reading NUL-separated paths from stdin.
♻️ Illustrative sketch (needs adaptation for `.bec/` copy + error handling)
def staged_worktree(root: Path, files: list[str]) -> Iterator[Path]:
    tmp = Path(tempfile.mkdtemp(prefix="becwright-"))
    try:
        subprocess.run(
            ["git", "checkout-index", f"--prefix={tmp}/", "-z", "--stdin"],
            cwd=root, input="\0".join(files).encode(), check=False,
            timeout=_check_timeout(),
        )
        bec = root / ".bec"
        if bec.is_dir():
            shutil.copytree(bec, tmp / ".bec", dirs_exist_ok=True)
        yield tmp
    finally:
        shutil.rmtree(tmp, ignore_errors=True)
Please confirm git checkout-index --stdin behaves as expected for the repo's supported git version range before adopting.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/becwright/git.py` around lines 63 - 84, The staged_worktree helper is
currently materializing each file with a separate _staged_blob/git show call,
which is inefficient for large file sets. Update staged_worktree to batch
staged-file checkout using a single git checkout-index invocation with --prefix
and --stdin, while preserving the existing .bec copy behavior and temp-dir
cleanup in the try/finally block. Verify the supported Git version range for
checkout-index --stdin and keep the function’s path handling compatible with the
current root/files workflow.

src/becwright/engine.py (1)

85-96: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Redundant re-parsing of _check_timeout().

Called once for timeout= and again for the timeout message, re-reading/re-parsing the env var twice per rule for no benefit.

♻️ Suggested fix

+        timeout = _check_timeout()
         try:
             proc = subprocess.run(
                 rule.check, shell=True, cwd=root,
                 input="\n".join(relevant), capture_output=True, text=True,
-                timeout=_check_timeout(),
+                timeout=timeout,
             )
         except subprocess.TimeoutExpired:
             results.append(RuleResult(
                 rule=rule, passed=False,
-                output=f"check timed out after {_check_timeout():g}s (its command hung)",
+                output=f"check timed out after {timeout:g}s (its command hung)",
             ))
             continue

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/becwright/engine.py` around lines 85 - 96, The timeout handling in the
rule execution path re-calls _check_timeout() twice, causing unnecessary
re-reading/parsing of the env var. In the subprocess.run block inside the
rule-checking logic, compute the timeout once in the surrounding scope and reuse
that same value for both the timeout= argument and the TimeoutExpired message;
keep the change localized to the rule execution flow and the RuleResult append
path.

src/becwright/catalog.py (1)

26-33: 🔒 Security & Privacy | 🔵 Trivial | 💤 Low value

Consider sanitizing name against path traversal.

read_bec joins user-supplied name directly into the resource path without checking for /, .., or absolute path components. Since name originates from the local becwright add <name> / becwright search CLI, the practical exploitability is minimal (the user only affects their own filesystem), but a stray ../ could resolve outside the becs/ directory.
🛡️ Optional hardening
 def read_bec(name: str) -> str:
+    if "/" in name or "\\" in name or name in ("..", "."):
+        raise CatalogError(f"Invalid BEC name: {name!r}")
     entry = _catalog_dir().joinpath(name + _SUFFIX)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/becwright/catalog.py` around lines 26 - 33, `read_bec` currently joins
the user-controlled `name` directly into the catalog path, so harden it against
path traversal before resolving the file. Update `read_bec` to validate or
normalize `name` so it cannot contain `/`, `..`, or absolute path components,
and ensure the final resolved path stays under `_catalog_dir()`. Keep the
existing `CatalogError` flow in `read_bec` for invalid or missing entries.

src/becwright/rules.py (1)

54-63: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Chain the original YAML error for a proper traceback.

Ruff (B904) flags line 60: re-raising inside an except block without from e/from None discards the original traceback context.

🔧 Proposed fix

     try:
         data = yaml.safe_load(rules_path.read_text(encoding="utf-8")) or {}
     except yaml.YAMLError as e:
-        raise RulesError(f"{rules_path}: invalid YAML ({e}).")
+        raise RulesError(f"{rules_path}: invalid YAML ({e}).") from e

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/becwright/rules.py` around lines 54 - 63, The YAML parse failure handling
in load_rules currently raises RulesError without preserving the original
traceback. Update the except yaml.YAMLError block to chain the caught exception
when raising RulesError so the original YAML error context is retained; use the
existing load_rules function and RulesError symbol to locate the change.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/becwright/git.py`:
- Around line 53-60: The staged file read in _staged_blob currently runs git
show without any timeout, so a single slow or stuck subprocess can still hang
the commit path. Update the subprocess.run call in _staged_blob to use the same
BECWRIGHT_CHECK_TIMEOUT protection as the rest of the check flow, and handle the
timeout case by returning None or otherwise failing safely so the commit does
not freeze.

In `@src/becwright/rules.py`:
- Around line 32-51: Require the rule parser in the Rule creation flow to reject
missing or empty paths instead of defaulting to an empty tuple. In the parsing
logic that builds the Rule object, validate the raw mapping for a non-empty
paths field alongside id and check, and raise RulesError when paths is absent or
empty so evaluate() never receives a rule that matches nothing. Use the existing
parse/validation function that returns Rule to make this change.

---

Nitpick comments:
In `@src/becwright/catalog.py`:
- Around line 26-33: `read_bec` currently joins the user-controlled `name`
directly into the catalog path, so harden it against path traversal before
resolving the file. Update `read_bec` to validate or normalize `name` so it
cannot contain `/`, `..`, or absolute path components, and ensure the final
resolved path stays under `_catalog_dir()`. Keep the existing `CatalogError`
flow in `read_bec` for invalid or missing entries.

In `@src/becwright/engine.py`:
- Around line 85-96: The timeout handling in the rule execution path re-calls
_check_timeout() twice, causing unnecessary re-reading/parsing of the env var.
In the subprocess.run block inside the rule-checking logic, compute the timeout
once in the surrounding scope and reuse that same value for both the timeout=
argument and the TimeoutExpired message; keep the change localized to the rule
execution flow and the RuleResult append path.

In `@src/becwright/git.py`:
- Around line 63-84: The staged_worktree helper is currently materializing each
file with a separate _staged_blob/git show call, which is inefficient for large
file sets. Update staged_worktree to batch staged-file checkout using a single
git checkout-index invocation with --prefix and --stdin, while preserving the
existing .bec copy behavior and temp-dir cleanup in the try/finally block.
Verify the supported Git version range for checkout-index --stdin and keep the
function’s path handling compatible with the current root/files workflow.

In `@src/becwright/rules.py`:
- Around line 54-63: The YAML parse failure handling in load_rules currently
raises RulesError without preserving the original traceback. Update the except
yaml.YAMLError block to chain the caught exception when raising RulesError so
the original YAML error context is retained; use the existing load_rules
function and RulesError symbol to locate the change.

In `@tests/test_staged_content.py`:
- Around line 1-30: The staged-content tests duplicate the same git/setup
helpers already defined in tests/test_cli_and_git.py, so please remove the copy
in this test module and reuse shared helpers instead. Move or import _git,
_init_repo, and _rules_yaml from a common test support location such as
conftest.py or a shared helper module, then update _setup to call those shared
symbols so both test files stay in sync.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 67e00cea-b541-42c6-8a98-55eadf27ccb8

📥 Commits

Reviewing files that changed from the base of the PR and between 0b13502 and edf81c9.

📒 Files selected for processing (42)

.github/workflows/ci.yml
.pre-commit-hooks.yaml
CLAUDE.md
README.es.md
README.md
becs/README.md
documentation/portability.es.md
documentation/portability.md
documentation/usage.es.md
documentation/usage.md
integrations/claude-code/skills/becwright/SKILL.md
npm/becwright/package.json
npm/platforms/darwin-arm64/package.json
npm/platforms/darwin-x64/package.json
npm/platforms/linux-arm64/package.json
npm/platforms/linux-x64/package.json
npm/platforms/win32-x64/package.json
pyproject.toml
src/becwright/__init__.py
src/becwright/becs/README.md
src/becwright/becs/no-console-log-js.bec.yaml
src/becwright/becs/no-dangerous-eval.bec.yaml
src/becwright/becs/no-debug-go.bec.yaml
src/becwright/becs/no-debug-remnants.bec.yaml
src/becwright/becs/no-debug-rust.bec.yaml
src/becwright/becs/no-debugger-js.bec.yaml
src/becwright/becs/no-hardcoded-secrets.bec.yaml
src/becwright/becs/no-token-in-logs.bec.yaml
src/becwright/becs/no-wildcard-imports.bec.yaml
src/becwright/catalog.py
src/becwright/cli.py
src/becwright/engine.py
src/becwright/git.py
src/becwright/report.py
src/becwright/rules.py
tests/test_bundle.py
tests/test_catalog.py
tests/test_cli_and_git.py
tests/test_engine_integration.py
tests/test_list.py
tests/test_rules.py
tests/test_staged_content.py

💤 Files with no reviewable changes (1)

becs/README.md

coderabbitai · 2026-07-01T04:29:44Z

+def _staged_blob(root: Path, path: str) -> bytes | None:
+    # `:0:<path>` is the staged (index) version of the file, which is exactly
+    # what the commit will record — not the working-tree copy that may differ.
+    res = subprocess.run(
+        ["git", "show", f":0:{path}"],
+        cwd=root, capture_output=True,
+    )
+    return res.stdout if res.returncode == 0 else None


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

No timeout on the per-file git show subprocess call.

This PR adds BECWRIGHT_CHECK_TIMEOUT specifically to stop a hung check from freezing a commit, but _staged_blob's git show call (invoked once per staged file) has no timeout at all. A stuck git process (lock contention, slow clean/smudge filters, network-mounted repo) here would hang the same commit path this feature is meant to protect.

🔒 Suggested fix

def _staged_blob(root: Path, path: str) -> bytes | None: # `:0:<path>` is the staged (index) version of the file, which is exactly # what the commit will record — not the working-tree copy that may differ. res = subprocess.run( ["git", "show", f":0:{path}"], - cwd=root, capture_output=True, + cwd=root, capture_output=True, timeout=_check_timeout(), ) return res.stdout if res.returncode == 0 else None

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def _staged_blob(root: Path, path: str) -> bytes | None:

# `:0:<path>` is the staged (index) version of the file, which is exactly

# what the commit will record — not the working-tree copy that may differ.

res = subprocess.run(

["git", "show", f":0:{path}"],

cwd=root, capture_output=True,

)

return res.stdout if res.returncode == 0 else None

def _staged_blob(root: Path, path: str) -> bytes | None:

# `:0:<path>` is the staged (index) version of the file, which is exactly

# what the commit will record — not the working-tree copy that may differ.

res = subprocess.run(

["git", "show", f":0:{path}"],

cwd=root, capture_output=True, timeout=_check_timeout(),

)

return res.stdout if res.returncode == 0 else None

🧰 Tools

🪛 ast-grep (0.44.0)

[error] 55-58: Command coming from incoming request
Context: subprocess.run(
["git", "show", f":0:{path}"],
cwd=root, capture_output=True,
)
Note: [CWE-78] Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection').

(subprocess-from-request)

🪛 Ruff (0.15.20)

[error] 56-56: subprocess call: check for execution of untrusted input

(S603)

[error] 57-57: Starting a process with a partial executable path

(S607)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/becwright/git.py` around lines 53 - 60, The staged file read in _staged_blob currently runs git show without any timeout, so a single slow or stuck subprocess can still hang the commit path. Update the subprocess.run call in _staged_blob to use the same BECWRIGHT_CHECK_TIMEOUT protection as the rest of the check flow, and handle the timeout case by returning None or otherwise failing safely so the commit does not freeze.

coderabbitai · 2026-07-01T04:29:44Z

+    if not isinstance(raw, dict):
+        raise RulesError(f"each rule must be a mapping, got {type(raw).__name__}.")
+    for field in ("id", "check"):
+        if not raw.get(field):
+            raise RulesError(f"a rule is missing the required field '{field}'.")
+    severity = raw.get("severity", "blocking")
+    if severity not in _VALID_SEVERITIES:
+        raise RulesError(
+            f"rule '{raw['id']}': invalid severity {severity!r} "
+            f"(use one of: {', '.join(_VALID_SEVERITIES)})."
+        )
    return Rule(
        id=raw["id"],
        paths=tuple(raw.get("paths", [])),
        check=raw["check"],
        intent=(raw.get("intent") or "").strip(),
        why_it_matters=(raw.get("why_it_matters") or "").strip(),
        rejected_alternatives=tuple(raw.get("rejected_alternatives", [])),
-        severity=raw.get("severity", "blocking"),
+        severity=severity,
    )


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Confirm how empty/missing `paths` is handled during matching in engine.py rg -n "paths" src/becwright/engine.py -C3

Repository: DataDave-Dev/becwright

Length of output: 435

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect the matching logic and the Rule model to determine what empty paths do. ast-grep outline src/becwright/engine.py --view expanded || true printf '\n--- engine.py relevant lines ---\n' rg -n "def matches|matches\(|paths" src/becwright/engine.py -C4 printf '\n--- rules.py model lines ---\n' rg -n "class Rule|paths:" src/becwright/rules.py -C4

Repository: DataDave-Dev/becwright

Length of output: 1895

Require paths during rule parsing. matches() returns False for an empty tuple, so a rule without paths is accepted as paths=() and then skipped by evaluate() before check ever runs. That turns a bad config into a no-op instead of a RulesError.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/becwright/rules.py` around lines 32 - 51, Require the rule parser in the Rule creation flow to reject missing or empty paths instead of defaulting to an empty tuple. In the parsing logic that builds the Rule object, validate the raw mapping for a non-empty paths field alongside id and check, and raise RulesError when paths is absent or empty so evaluate() never receives a rule that matches nothing. Use the existing parse/validation function that returns Rule to make this change.

Source: Learnings

DataDave-Dev added 8 commits June 30, 2026 22:22

perf(engine): cache compiled glob patterns

15fb7fc

matches() rebuilt and recompiled each path glob on every (file, pattern) pair, which adds up on `check --all` over a large repo. Memoize the compiled regex.

ci: test on Python 3.14

ee7f68e

DataDave-Dev force-pushed the fix/staged-content-and-robustness branch from edf81c9 to 09c5fa7 Compare July 1, 2026 04:23

DataDave-Dev merged commit 732b97d into main Jul 1, 2026
3 of 4 checks passed

DataDave-Dev deleted the fix/staged-content-and-robustness branch July 1, 2026 04:25

coderabbitai Bot reviewed Jul 1, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Correctness fixes + offline catalog (add/search) + pre-commit support (0.3.0)#38

Correctness fixes + offline catalog (add/search) + pre-commit support (0.3.0)#38
DataDave-Dev merged 8 commits into
mainfrom
fix/staged-content-and-robustness

DataDave-Dev commented Jul 1, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jul 1, 2026 •

edited

Loading

Review limit reached

Walkthrough

Changes

Sequence Diagram(s)

❌ Failed checks (2 warnings)

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jul 1, 2026

Uh oh!

coderabbitai Bot Jul 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

DataDave-Dev commented Jul 1, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Correctness (P0)

Adoption (P1)

Polish (P2)

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jul 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Sequence Diagram(s)

❌ Failed checks (2 warnings)

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

DataDave-Dev commented Jul 1, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jul 1, 2026 •

edited

Loading