ci: guard oversized test files by cv · Pull Request #4905 · NVIDIA/NemoClaw

cv · 2026-06-07T05:58:25Z

Summary

Adds a test-file size budget so new or touched test files cannot keep growing into oversized catch-all suites. This is a trimmed replacement for #4899 with a clean branch from current main, reducing the review diff from ~1k lines to about 520 added lines.

Related Issue

Related to #4892. Replaces #4899.

Changes

Add scripts/check-test-file-size-budget.ts and ci/test-file-size-budget.json to enforce a 1,500-line default budget with legacy ratchets for existing oversized tests.
Wire npm run test-size:check into pre-commit and PR static checks.
Add a compact trusted pull_request_target guard for changed test files and budget-policy tampering without using PR-controlled code or raw_url token fetches.
Add focused Vitest coverage and workflow-contract assertions for the new budget and trusted guard behavior.

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

Focused local checks passed: npx prek run --files ... --skip test-cli --skip test-plugin, npx vitest run --project cli test/pr-workflow-contract.test.ts test/test-file-size-budget.test.ts, npm run test-size:check, npm run build:cli, npm run typecheck:cli, npm run source-shape:check, and git diff --check. Full local commit hooks were attempted in the temporary clean worktree but the CLI coverage hook hit/stuck on local coverage-process instability, so the full matrix is left to CI.

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
npm run docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Carlos Villela cvillela@nvidia.com

Summary by CodeRabbit

New Features
- Introduced test file size budgeting system with configurable maximum line count limits for test files.
- Legacy grandfathering support allows existing large test files to remain within their own budgets while preventing further growth.
- Integrated automatic size validation into CI/CD workflow and pre-commit hooks.

coderabbitai · 2026-06-07T05:58:37Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a7e70190-7f8b-4a4d-9dd1-26c60b54aaea

📥 Commits

Reviewing files that changed from the base of the PR and between d7aa5e0 and 8c1d8fc.

📒 Files selected for processing (8)

.github/workflows/codebase-growth-guardrails.yaml
.github/workflows/pr.yaml
.pre-commit-config.yaml
ci/test-file-size-budget.json
package.json
scripts/check-test-file-size-budget.ts
test/pr-workflow-contract.test.ts
test/test-file-size-budget.test.ts

📝 Walkthrough

Walkthrough

This PR introduces an end-to-end test file size budget enforcement system. It includes a TypeScript CLI script that scans, counts lines in test files, and validates them against a JSON budget with default and per-file legacy limits. The system integrates into pre-commit hooks, npm scripts, and GitHub Actions workflows with comprehensive validation of budget monotonicity.

Changes

Test File Size Budget System

Layer / File(s)	Summary
Budget enforcement script and core logic `scripts/check-test-file-size-budget.ts`	Implements exported data types (`TestFileSizeBudget`, `TestFileSizeEntry`, `TestFileSizeViolation`) and utility functions for counting lines, scanning test directories, parsing JSON budgets with validation, evaluating violations (oversized, legacy-ratchet, stale-legacy-budget), and formatting error messages.
Budget configuration and npm script wiring `ci/test-file-size-budget.json`, `package.json`	Introduces JSON budget config with `defaultMaxLines: 1500` and per-file `legacyMaxLines` overrides; adds SPDX attribution to `package.json` and wires `test-size:check` npm script to run the budget checker.
Pre-commit hook integration `.pre-commit-config.yaml`	Adds SPDX header and registers `test-file-size-budget` hook with priority 20 to enforce budget checks locally on test/spec files and budget configuration changes.
GitHub Actions workflows integration `.github/workflows/codebase-growth-guardrails.yaml`, `.github/workflows/pr.yaml`	Adds `permissions.contents: read` and a new "Require changed test files to stay within size budget" step to `codebase-growth-guardrails` with inline script that validates budget monotonicity via GitHub API, retrieves changed test files, and enforces size limits. Updates `pr.yaml` to skip the hook during pre-push and invoke the check separately as a dedicated step.
Unit tests and workflow contract validation `test/test-file-size-budget.test.ts`, `test/pr-workflow-contract.test.ts`	Adds Vitest suite covering line counting, violation detection (oversized, legacy-ratchet, stale-legacy-budget), legacy ratcheting, and JSON parsing. Adds workflow contract tests validating the codebase-growth-guardrails job structure and ensuring `test-file-size-budget` is in skipped hooks.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

CI/CD, area: ci, chore

Suggested reviewers

prekshivyas
cjagwani
jyaunches

Poem

🐰 Hops through test files with glee,
Counting lines for all to see,
"1500 lines is the way!"
No more test bloat here to stay,
Budget ratchets down each day! 📊✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'ci: guard oversized test files' directly and clearly describes the main change: introducing a test-file size budget mechanism to prevent oversized test files.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch codex/test-size-guard-trimmed

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-06-07T05:59:10Z

🌿 Preview your docs: https://nvidia-preview-pr-4905.docs.buildwithfern.com/nemoclaw

github-actions · 2026-06-07T05:59:30Z

E2E Advisor Recommendation

Required E2E: None
Optional E2E: None

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

None. No E2E is recommended. The PR is limited to CI guardrails, pre-commit configuration, package scripts, a test-size budget utility, and unit/contract tests. It does not change NemoClaw runtime code or behavior for installer/onboarding, sandbox lifecycle, credentials, network policy, inference routing, deployment, or real assistant user flows. Existing static checks, typecheck, CLI/plugin unit tests, and the added workflow-contract tests are the appropriate validation layer.

Optional E2E

None.

New E2E recommendations

None.

github-actions · 2026-06-07T05:59:31Z

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

None. Changes are limited to PR/static CI guardrails, pre-commit configuration, package scripts, a test-file-size budget checker, and unit/contract tests outside test/e2e-scenario/. They do not modify scenario workflows, scenario metadata, expected states, validation suites, runtime/runner code, onboarding/install helpers, or suite scripts used by scenario E2E.

Optional scenario E2E

None.

Relevant changed files

None.

github-actions · 2026-06-07T06:03:45Z

PR Review Advisor

Findings: 0 needs attention, 2 worth checking, 0 nice ideas
Top item: Behavior-test the trusted test-size guard fallback and policy-tamper paths

Review findings

🛠️ Needs attention

None.

🔎 Worth checking

Source-of-truth review needed: .github/workflows/codebase-growth-guardrails.yaml FALLBACK_BUDGET/baseWasFallback: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: The workflow defines FALLBACK_BUDGET, computes baseWasFallback = baseText === null, and validateBudget returns early when baseWasFallback is true.
Behavior-test the trusted test-size guard fallback and policy-tamper paths (.github/workflows/codebase-growth-guardrails.yaml:123): The pull_request_target guard correctly avoids checking out or executing PR code, but its inline JavaScript duplicates policy logic from scripts/check-test-file-size-budget.ts. The unit tests exercise the local script, while the workflow contract test mostly asserts that key substrings are present. That leaves the trusted copy's monotonic-budget and fallback behavior vulnerable to behavior-preserving-looking edits that still satisfy substring checks. The initial FALLBACK_BUDGET/baseWasFallback compatibility path is reasonable for introducing the file, but its removal condition is not documented or behaviorally tested.
- Recommendation: Add focused behavioral tests for the trusted workflow guard logic, or factor the pure policy logic into a tested generator/shared fixture that emits the workflow script from trusted source. At minimum, document that FALLBACK_BUDGET can be removed once supported base branches contain ci/test-file-size-budget.json.
- Evidence: The workflow embeds FALLBACK_BUDGET and validateBudget logic inline, while test/test-file-size-budget.test.ts covers only scripts/check-test-file-size-budget.ts. test/pr-workflow-contract.test.ts asserts strings such as HEAD_REPO, HEAD_SHA, no .raw_url, previous_filename, budgetChanged, and a stale-budget message, but does not execute mocked API scenarios.

🌱 Nice ideas

None.

Consider writing more tests for

**Mocked behavioral coverage** — trusted pull_request_target guard rejects increased defaultMaxLines. The local scanner has useful unit coverage, but the most sensitive changed behavior is an inline pull_request_target script that performs GitHub API I/O and policy decisions. It is currently covered mostly by substring contract checks rather than mocked behavioral scenarios.
**Mocked behavioral coverage** — trusted pull_request_target guard rejects adding a new above-default legacy budget. The local scanner has useful unit coverage, but the most sensitive changed behavior is an inline pull_request_target script that performs GitHub API I/O and policy decisions. It is currently covered mostly by substring contract checks rather than mocked behavioral scenarios.
**Mocked behavioral coverage** — trusted pull_request_target guard rejects removing a legacy budget while the head file still exceeds defaultMaxLines. The local scanner has useful unit coverage, but the most sensitive changed behavior is an inline pull_request_target script that performs GitHub API I/O and policy decisions. It is currently covered mostly by substring contract checks rather than mocked behavioral scenarios.
**Mocked behavioral coverage** — trusted pull_request_target guard rejects stale legacy budget entries at the PR head. The local scanner has useful unit coverage, but the most sensitive changed behavior is an inline pull_request_target script that performs GitHub API I/O and policy decisions. It is currently covered mostly by substring contract checks rather than mocked behavioral scenarios.
**Mocked behavioral coverage** — trusted pull_request_target guard checks renamed test files using previous_filename. The local scanner has useful unit coverage, but the most sensitive changed behavior is an inline pull_request_target script that performs GitHub API I/O and policy decisions. It is currently covered mostly by substring contract checks rather than mocked behavioral scenarios.
**Behavior-test the trusted test-size guard fallback and policy-tamper paths** — Add focused behavioral tests for the trusted workflow guard logic, or factor the pure policy logic into a tested generator/shared fixture that emits the workflow script from trusted source. At minimum, document that FALLBACK_BUDGET can be removed once supported base branches contain ci/test-file-size-budget.json.
**Acceptance clause:** Related to ci: safely split slow CLI coverage suites #4892. Replaces ci: guard oversized test files #4899. — add test evidence or identify existing coverage. Deterministic context reported linkedIssues: [], so there were no linked issue clauses or issue comments available to map. Open overlap context shows ci: guard oversized test files #4899 touching the same files, consistent with the replacement claim, but issue ci: safely split slow CLI coverage suites #4892 acceptance text was not provided.
**Acceptance clause:** Add a compact trusted `pull_request_target` guard for changed test files and budget-policy tampering without using PR-controlled code or `raw_url` token fetches. — add test evidence or identify existing coverage. .github/workflows/codebase-growth-guardrails.yaml adds a pull_request_target guard that does not checkout or execute PR code, uses read-only permissions, reads GitHub API metadata/content by HEAD_REPO and HEAD_SHA, and does not use .raw_url. The main remaining gap is test depth for the inline trusted guard's behavior and fallback path.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

ci: guard oversized test files

8c1d8fc

cv self-assigned this Jun 7, 2026

This was referenced Jun 7, 2026

ci: guard oversized test files #4899

Closed

ci: safely split slow CLI coverage suites #4892

Closed

cv merged commit 29b0d5f into main Jun 7, 2026
41 checks passed

cv deleted the codex/test-size-guard-trimmed branch June 7, 2026 06:36

This was referenced Jun 7, 2026

chore(cli): clean up unused CodeQL declarations #4906

Merged

test(cli): split sandbox connect inference tests #4907

Merged

cv added the v0.0.61 Release target label Jun 7, 2026

coderabbitai Bot mentioned this pull request Jun 7, 2026

test(cli): speed up startup auto-pair tests #4913

Merged

12 tasks

wscurran added the chore Build, CI, dependency, or tooling maintenance label Jun 8, 2026

coderabbitai Bot mentioned this pull request Jun 9, 2026

test(onboard): split vLLM selection suite #5011

Merged

12 tasks

Conversation

cv commented Jun 7, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issue

Changes

Type of Change

Verification

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented Jun 7, 2026

Uh oh!

github-actions Bot commented Jun 7, 2026

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Uh oh!

github-actions Bot commented Jun 7, 2026

E2E Scenario Advisor Recommendation

E2E Scenario Advisor

Required scenario E2E

Optional scenario E2E

Relevant changed files

Uh oh!

github-actions Bot commented Jun 7, 2026

PR Review Advisor

🛠️ Needs attention

🔎 Worth checking

🌱 Nice ideas

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cv commented Jun 7, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 7, 2026 •

edited

Loading