Correctness fixes + offline catalog (add/search) + pre-commit support (0.3.0)#38
Conversation
|
Warning Review limit reached
Next review available in: 48 minutes Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available. How can I continue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews. How do review limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please refer docs for additional details. Review details⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (42)
📝 WalkthroughWalkthroughThis PR adds an offline, package-bundled BEC catalog with new ChangesCatalog, rules hardening, engine/staging, and packaging
Estimated code review effort: 3 (Moderate) | ~30 minutes Sequence Diagram(s)sequenceDiagram
participant User
participant CLI
participant catalog
participant bundle
participant report
participant git
participant engine
User->>CLI: becwright search term
CLI->>catalog: catalog_names()
CLI->>bundle: parse_bundle(read_bec(name))
CLI-->>User: matching BECs
User->>CLI: becwright add name
CLI->>catalog: read_bec(name)
CLI->>bundle: _install_bundle(root, text)
CLI-->>User: install result
User->>CLI: becwright check
CLI->>report: gather(rules, files, root, all_files)
report->>git: staged_worktree(root, files)
git-->>report: temp staged dir
report->>engine: evaluate(rules, files, staged_root)
engine-->>report: results (pass/fail/timeout)
report-->>CLI: summary
CLI-->>User: exit code + output
Possibly related PRs
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
The pre-commit path listed staged file names but the checks opened the working-tree copy, so the result could diverge from what the commit records: an unstaged edit could flag a violation that is not being committed, or hide a staged one behind a clean working tree. Materialize the staged blobs (git show :0:<file>) into a temp mirror, copy the working-tree .bec/ tooling alongside so custom checks resolve, and run the checks there. `--all` still inspects the working tree on purpose.
A check runs on every commit and may carry imported third-party code (from a bundle). Without a timeout a hung or runaway check freezes the commit forever. Add a 30s cap, overridable via BECWRIGHT_CHECK_TIMEOUT (0 disables it for slow whole-repo runs), and treat a timeout as a failed check with a clear message.
A misspelled severity ("blockign", "bloqueante") silently downgraded a rule
to a warning, quietly disabling enforcement; a malformed rules.yaml surfaced a
raw PyYAML traceback in the middle of the user's commit.
Raise a readable RulesError for invalid YAML, a non-list `rules:`, a missing
required field, or an invalid severity. The CLI prints it and exits 2.
Importing a ready-made BEC required knowing and typing a raw GitHub URL, and the catalog was not shipped with the package, so it could not work offline. Bundle the catalog inside the wheel (src/becwright/becs/, packaged via package-data) and read it through a new catalog module. Add two commands: `becwright search [query]` lists the catalog with each BEC's intent, and `becwright add <name>` installs one with the same trust-gated flow as `import` (shared _install_bundle helper). No network, no URL. Docs, the init header, `list`, and the repo layout are updated to point at the new commands and location.
Teams that already manage git hooks could not plug becwright in without letting it own the pre-commit hook. Ship a .pre-commit-hooks.yaml so becwright can be added as a pre-commit repo (it determines the staged files itself, so pass_filenames is false and always_run is true), and document both the pre-commit and Husky setups in the README.
matches() rebuilt and recompiled each path glob on every (file, pattern) pair, which adds up on `check --all` over a large repo. Memoize the compiled regex.
New since 0.2.2: staged-content checking, per-check timeout, validated rules.yaml, the offline catalog with `add`/`search`, and pre-commit/Husky support. Bump the version across pyproject, the npm launcher and platform packages, and pin the README pre-commit examples to v0.3.0 (the first tag that ships .pre-commit-hooks.yaml).
edf81c9 to
09c5fa7
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (5)
tests/test_staged_content.py (1)
1-30: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winDuplicate test helpers across files.
_git,_init_repo, and_rules_yamlre-implement helpers that already exist intests/test_cli_and_git.py. Consider extracting these into a sharedconftest.pyfixture/helper module to avoid drift between the two copies.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_staged_content.py` around lines 1 - 30, The staged-content tests duplicate the same git/setup helpers already defined in tests/test_cli_and_git.py, so please remove the copy in this test module and reuse shared helpers instead. Move or import _git, _init_repo, and _rules_yaml from a common test support location such as conftest.py or a shared helper module, then update _setup to call those shared symbols so both test files stay in sync.src/becwright/git.py (1)
63-84: 🚀 Performance & Scalability | 🔵 Trivial | 🏗️ Heavy liftConsider batching staged-file materialization instead of one
git showper file.
staged_worktreespawns onegit show :0:<path>subprocess per staged file. For large commits this means N process spawns just to build the temp worktree.git checkout-index --prefix=<dir>/ -z --stdincan materialize all staged files into a directory with a single git invocation, reading NUL-separated paths from stdin.♻️ Illustrative sketch (needs adaptation for `.bec/` copy + error handling)
def staged_worktree(root: Path, files: list[str]) -> Iterator[Path]: tmp = Path(tempfile.mkdtemp(prefix="becwright-")) try: subprocess.run( ["git", "checkout-index", f"--prefix={tmp}/", "-z", "--stdin"], cwd=root, input="\0".join(files).encode(), check=False, timeout=_check_timeout(), ) bec = root / ".bec" if bec.is_dir(): shutil.copytree(bec, tmp / ".bec", dirs_exist_ok=True) yield tmp finally: shutil.rmtree(tmp, ignore_errors=True)Please confirm
git checkout-index --stdinbehaves as expected for the repo's supported git version range before adopting.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/becwright/git.py` around lines 63 - 84, The staged_worktree helper is currently materializing each file with a separate _staged_blob/git show call, which is inefficient for large file sets. Update staged_worktree to batch staged-file checkout using a single git checkout-index invocation with --prefix and --stdin, while preserving the existing .bec copy behavior and temp-dir cleanup in the try/finally block. Verify the supported Git version range for checkout-index --stdin and keep the function’s path handling compatible with the current root/files workflow.src/becwright/engine.py (1)
85-96: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueRedundant re-parsing of
_check_timeout().Called once for
timeout=and again for the timeout message, re-reading/re-parsing the env var twice per rule for no benefit.♻️ Suggested fix
+ timeout = _check_timeout() try: proc = subprocess.run( rule.check, shell=True, cwd=root, input="\n".join(relevant), capture_output=True, text=True, - timeout=_check_timeout(), + timeout=timeout, ) except subprocess.TimeoutExpired: results.append(RuleResult( rule=rule, passed=False, - output=f"check timed out after {_check_timeout():g}s (its command hung)", + output=f"check timed out after {timeout:g}s (its command hung)", )) continue🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/becwright/engine.py` around lines 85 - 96, The timeout handling in the rule execution path re-calls _check_timeout() twice, causing unnecessary re-reading/parsing of the env var. In the subprocess.run block inside the rule-checking logic, compute the timeout once in the surrounding scope and reuse that same value for both the timeout= argument and the TimeoutExpired message; keep the change localized to the rule execution flow and the RuleResult append path.src/becwright/catalog.py (1)
26-33: 🔒 Security & Privacy | 🔵 Trivial | 💤 Low valueConsider sanitizing
nameagainst path traversal.
read_becjoins user-suppliednamedirectly into the resource path without checking for/,.., or absolute path components. Sincenameoriginates from the localbecwright add <name>/becwright searchCLI, the practical exploitability is minimal (the user only affects their own filesystem), but a stray../could resolve outside thebecs/directory.🛡️ Optional hardening
def read_bec(name: str) -> str: + if "/" in name or "\\" in name or name in ("..", "."): + raise CatalogError(f"Invalid BEC name: {name!r}") entry = _catalog_dir().joinpath(name + _SUFFIX)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/becwright/catalog.py` around lines 26 - 33, `read_bec` currently joins the user-controlled `name` directly into the catalog path, so harden it against path traversal before resolving the file. Update `read_bec` to validate or normalize `name` so it cannot contain `/`, `..`, or absolute path components, and ensure the final resolved path stays under `_catalog_dir()`. Keep the existing `CatalogError` flow in `read_bec` for invalid or missing entries.src/becwright/rules.py (1)
54-63: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winChain the original YAML error for a proper traceback.
Ruff (B904) flags line 60: re-raising inside an
exceptblock withoutfrom e/from Nonediscards the original traceback context.🔧 Proposed fix
try: data = yaml.safe_load(rules_path.read_text(encoding="utf-8")) or {} except yaml.YAMLError as e: - raise RulesError(f"{rules_path}: invalid YAML ({e}).") + raise RulesError(f"{rules_path}: invalid YAML ({e}).") from e🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/becwright/rules.py` around lines 54 - 63, The YAML parse failure handling in load_rules currently raises RulesError without preserving the original traceback. Update the except yaml.YAMLError block to chain the caught exception when raising RulesError so the original YAML error context is retained; use the existing load_rules function and RulesError symbol to locate the change.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/becwright/git.py`:
- Around line 53-60: The staged file read in _staged_blob currently runs git
show without any timeout, so a single slow or stuck subprocess can still hang
the commit path. Update the subprocess.run call in _staged_blob to use the same
BECWRIGHT_CHECK_TIMEOUT protection as the rest of the check flow, and handle the
timeout case by returning None or otherwise failing safely so the commit does
not freeze.
In `@src/becwright/rules.py`:
- Around line 32-51: Require the rule parser in the Rule creation flow to reject
missing or empty paths instead of defaulting to an empty tuple. In the parsing
logic that builds the Rule object, validate the raw mapping for a non-empty
paths field alongside id and check, and raise RulesError when paths is absent or
empty so evaluate() never receives a rule that matches nothing. Use the existing
parse/validation function that returns Rule to make this change.
---
Nitpick comments:
In `@src/becwright/catalog.py`:
- Around line 26-33: `read_bec` currently joins the user-controlled `name`
directly into the catalog path, so harden it against path traversal before
resolving the file. Update `read_bec` to validate or normalize `name` so it
cannot contain `/`, `..`, or absolute path components, and ensure the final
resolved path stays under `_catalog_dir()`. Keep the existing `CatalogError`
flow in `read_bec` for invalid or missing entries.
In `@src/becwright/engine.py`:
- Around line 85-96: The timeout handling in the rule execution path re-calls
_check_timeout() twice, causing unnecessary re-reading/parsing of the env var.
In the subprocess.run block inside the rule-checking logic, compute the timeout
once in the surrounding scope and reuse that same value for both the timeout=
argument and the TimeoutExpired message; keep the change localized to the rule
execution flow and the RuleResult append path.
In `@src/becwright/git.py`:
- Around line 63-84: The staged_worktree helper is currently materializing each
file with a separate _staged_blob/git show call, which is inefficient for large
file sets. Update staged_worktree to batch staged-file checkout using a single
git checkout-index invocation with --prefix and --stdin, while preserving the
existing .bec copy behavior and temp-dir cleanup in the try/finally block.
Verify the supported Git version range for checkout-index --stdin and keep the
function’s path handling compatible with the current root/files workflow.
In `@src/becwright/rules.py`:
- Around line 54-63: The YAML parse failure handling in load_rules currently
raises RulesError without preserving the original traceback. Update the except
yaml.YAMLError block to chain the caught exception when raising RulesError so
the original YAML error context is retained; use the existing load_rules
function and RulesError symbol to locate the change.
In `@tests/test_staged_content.py`:
- Around line 1-30: The staged-content tests duplicate the same git/setup
helpers already defined in tests/test_cli_and_git.py, so please remove the copy
in this test module and reuse shared helpers instead. Move or import _git,
_init_repo, and _rules_yaml from a common test support location such as
conftest.py or a shared helper module, then update _setup to call those shared
symbols so both test files stay in sync.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 67e00cea-b541-42c6-8a98-55eadf27ccb8
📒 Files selected for processing (42)
.github/workflows/ci.yml.pre-commit-hooks.yamlCLAUDE.mdREADME.es.mdREADME.mdbecs/README.mddocumentation/portability.es.mddocumentation/portability.mddocumentation/usage.es.mddocumentation/usage.mdintegrations/claude-code/skills/becwright/SKILL.mdnpm/becwright/package.jsonnpm/platforms/darwin-arm64/package.jsonnpm/platforms/darwin-x64/package.jsonnpm/platforms/linux-arm64/package.jsonnpm/platforms/linux-x64/package.jsonnpm/platforms/win32-x64/package.jsonpyproject.tomlsrc/becwright/__init__.pysrc/becwright/becs/README.mdsrc/becwright/becs/no-console-log-js.bec.yamlsrc/becwright/becs/no-dangerous-eval.bec.yamlsrc/becwright/becs/no-debug-go.bec.yamlsrc/becwright/becs/no-debug-remnants.bec.yamlsrc/becwright/becs/no-debug-rust.bec.yamlsrc/becwright/becs/no-debugger-js.bec.yamlsrc/becwright/becs/no-hardcoded-secrets.bec.yamlsrc/becwright/becs/no-token-in-logs.bec.yamlsrc/becwright/becs/no-wildcard-imports.bec.yamlsrc/becwright/catalog.pysrc/becwright/cli.pysrc/becwright/engine.pysrc/becwright/git.pysrc/becwright/report.pysrc/becwright/rules.pytests/test_bundle.pytests/test_catalog.pytests/test_cli_and_git.pytests/test_engine_integration.pytests/test_list.pytests/test_rules.pytests/test_staged_content.py
💤 Files with no reviewable changes (1)
- becs/README.md
| def _staged_blob(root: Path, path: str) -> bytes | None: | ||
| # `:0:<path>` is the staged (index) version of the file, which is exactly | ||
| # what the commit will record — not the working-tree copy that may differ. | ||
| res = subprocess.run( | ||
| ["git", "show", f":0:{path}"], | ||
| cwd=root, capture_output=True, | ||
| ) | ||
| return res.stdout if res.returncode == 0 else None |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
No timeout on the per-file git show subprocess call.
This PR adds BECWRIGHT_CHECK_TIMEOUT specifically to stop a hung check from freezing a commit, but _staged_blob's git show call (invoked once per staged file) has no timeout at all. A stuck git process (lock contention, slow clean/smudge filters, network-mounted repo) here would hang the same commit path this feature is meant to protect.
🔒 Suggested fix
def _staged_blob(root: Path, path: str) -> bytes | None:
# `:0:<path>` is the staged (index) version of the file, which is exactly
# what the commit will record — not the working-tree copy that may differ.
res = subprocess.run(
["git", "show", f":0:{path}"],
- cwd=root, capture_output=True,
+ cwd=root, capture_output=True, timeout=_check_timeout(),
)
return res.stdout if res.returncode == 0 else None📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def _staged_blob(root: Path, path: str) -> bytes | None: | |
| # `:0:<path>` is the staged (index) version of the file, which is exactly | |
| # what the commit will record — not the working-tree copy that may differ. | |
| res = subprocess.run( | |
| ["git", "show", f":0:{path}"], | |
| cwd=root, capture_output=True, | |
| ) | |
| return res.stdout if res.returncode == 0 else None | |
| def _staged_blob(root: Path, path: str) -> bytes | None: | |
| # `:0:<path>` is the staged (index) version of the file, which is exactly | |
| # what the commit will record — not the working-tree copy that may differ. | |
| res = subprocess.run( | |
| ["git", "show", f":0:{path}"], | |
| cwd=root, capture_output=True, timeout=_check_timeout(), | |
| ) | |
| return res.stdout if res.returncode == 0 else None |
🧰 Tools
🪛 ast-grep (0.44.0)
[error] 55-58: Command coming from incoming request
Context: subprocess.run(
["git", "show", f":0:{path}"],
cwd=root, capture_output=True,
)
Note: [CWE-78] Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection').
(subprocess-from-request)
🪛 Ruff (0.15.20)
[error] 56-56: subprocess call: check for execution of untrusted input
(S603)
[error] 57-57: Starting a process with a partial executable path
(S607)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/becwright/git.py` around lines 53 - 60, The staged file read in
_staged_blob currently runs git show without any timeout, so a single slow or
stuck subprocess can still hang the commit path. Update the subprocess.run call
in _staged_blob to use the same BECWRIGHT_CHECK_TIMEOUT protection as the rest
of the check flow, and handle the timeout case by returning None or otherwise
failing safely so the commit does not freeze.
| if not isinstance(raw, dict): | ||
| raise RulesError(f"each rule must be a mapping, got {type(raw).__name__}.") | ||
| for field in ("id", "check"): | ||
| if not raw.get(field): | ||
| raise RulesError(f"a rule is missing the required field '{field}'.") | ||
| severity = raw.get("severity", "blocking") | ||
| if severity not in _VALID_SEVERITIES: | ||
| raise RulesError( | ||
| f"rule '{raw['id']}': invalid severity {severity!r} " | ||
| f"(use one of: {', '.join(_VALID_SEVERITIES)})." | ||
| ) | ||
| return Rule( | ||
| id=raw["id"], | ||
| paths=tuple(raw.get("paths", [])), | ||
| check=raw["check"], | ||
| intent=(raw.get("intent") or "").strip(), | ||
| why_it_matters=(raw.get("why_it_matters") or "").strip(), | ||
| rejected_alternatives=tuple(raw.get("rejected_alternatives", [])), | ||
| severity=raw.get("severity", "blocking"), | ||
| severity=severity, | ||
| ) |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm how empty/missing `paths` is handled during matching in engine.py
rg -n "paths" src/becwright/engine.py -C3Repository: DataDave-Dev/becwright
Length of output: 435
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the matching logic and the Rule model to determine what empty paths do.
ast-grep outline src/becwright/engine.py --view expanded || true
printf '\n--- engine.py relevant lines ---\n'
rg -n "def matches|matches\(|paths" src/becwright/engine.py -C4
printf '\n--- rules.py model lines ---\n'
rg -n "class Rule|paths:" src/becwright/rules.py -C4Repository: DataDave-Dev/becwright
Length of output: 1895
Require paths during rule parsing. matches() returns False for an empty tuple, so a rule without paths is accepted as paths=() and then skipped by evaluate() before check ever runs. That turns a bad config into a no-op instead of a RulesError.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/becwright/rules.py` around lines 32 - 51, Require the rule parser in the
Rule creation flow to reject missing or empty paths instead of defaulting to an
empty tuple. In the parsing logic that builds the Rule object, validate the raw
mapping for a non-empty paths field alongside id and check, and raise RulesError
when paths is absent or empty so evaluate() never receives a rule that matches
nothing. Use the existing parse/validation function that returns Rule to make
this change.
Source: Learnings
Summary
Hardens the enforcement path so becwright keeps its "guaranteed, not probabilistic" promise, and lowers adoption friction with an offline catalog and pre-commit/Husky support. Bumps to 0.3.0.
Correctness (P0)
--allstill inspects the working tree on purpose.BECWRIGHT_CHECK_TIMEOUT(0disables).rules.yaml. A misspelledseveritysilently downgraded a rule to a warning; malformed YAML dumped a raw traceback. Both now raise a readableRulesErrorand exit 2.Adoption (P1)
becwright add <name>andbecwright search [query]. The catalog now ships inside the package (src/becwright/becs/, via package-data), so installing a ready-made BEC needs no URL and works offline. Shares the trust-gated install flow withimport..pre-commit-hooks.yamland README setup for both.Polish (P2)
check --allperf).Test plan
pytest— 152 passing (new: staged-content both directions, timeout, rules validation, catalog/add/search)..bec.yamlfiles are included.becwright check --all) green; the pre-commit hook validated each commit using staged content.Summary by CodeRabbit
New Features
Bug Fixes
Documentation