Skip to content

ci: guard oversized test files#4905

Merged
cv merged 1 commit into
mainfrom
codex/test-size-guard-trimmed
Jun 7, 2026
Merged

ci: guard oversized test files#4905
cv merged 1 commit into
mainfrom
codex/test-size-guard-trimmed

Conversation

@cv

@cv cv commented Jun 7, 2026

Copy link
Copy Markdown
Collaborator

Summary

Adds a test-file size budget so new or touched test files cannot keep growing into oversized catch-all suites. This is a trimmed replacement for #4899 with a clean branch from current main, reducing the review diff from ~1k lines to about 520 added lines.

Related Issue

Related to #4892. Replaces #4899.

Changes

  • Add scripts/check-test-file-size-budget.ts and ci/test-file-size-budget.json to enforce a 1,500-line default budget with legacy ratchets for existing oversized tests.
  • Wire npm run test-size:check into pre-commit and PR static checks.
  • Add a compact trusted pull_request_target guard for changed test files and budget-policy tampering without using PR-controlled code or raw_url token fetches.
  • Add focused Vitest coverage and workflow-contract assertions for the new budget and trusted guard behavior.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

Focused local checks passed: npx prek run --files ... --skip test-cli --skip test-plugin, npx vitest run --project cli test/pr-workflow-contract.test.ts test/test-file-size-budget.test.ts, npm run test-size:check, npm run build:cli, npm run typecheck:cli, npm run source-shape:check, and git diff --check. Full local commit hooks were attempted in the temporary clean worktree but the CLI coverage hook hit/stuck on local coverage-process instability, so the full matrix is left to CI.

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Carlos Villela cvillela@nvidia.com

Summary by CodeRabbit

  • New Features
    • Introduced test file size budgeting system with configurable maximum line count limits for test files.
    • Legacy grandfathering support allows existing large test files to remain within their own budgets while preventing further growth.
    • Integrated automatic size validation into CI/CD workflow and pre-commit hooks.

@cv cv self-assigned this Jun 7, 2026
@coderabbitai

coderabbitai Bot commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a7e70190-7f8b-4a4d-9dd1-26c60b54aaea

📥 Commits

Reviewing files that changed from the base of the PR and between d7aa5e0 and 8c1d8fc.

📒 Files selected for processing (8)
  • .github/workflows/codebase-growth-guardrails.yaml
  • .github/workflows/pr.yaml
  • .pre-commit-config.yaml
  • ci/test-file-size-budget.json
  • package.json
  • scripts/check-test-file-size-budget.ts
  • test/pr-workflow-contract.test.ts
  • test/test-file-size-budget.test.ts

📝 Walkthrough

Walkthrough

This PR introduces an end-to-end test file size budget enforcement system. It includes a TypeScript CLI script that scans, counts lines in test files, and validates them against a JSON budget with default and per-file legacy limits. The system integrates into pre-commit hooks, npm scripts, and GitHub Actions workflows with comprehensive validation of budget monotonicity.

Changes

Test File Size Budget System

Layer / File(s) Summary
Budget enforcement script and core logic
scripts/check-test-file-size-budget.ts
Implements exported data types (TestFileSizeBudget, TestFileSizeEntry, TestFileSizeViolation) and utility functions for counting lines, scanning test directories, parsing JSON budgets with validation, evaluating violations (oversized, legacy-ratchet, stale-legacy-budget), and formatting error messages.
Budget configuration and npm script wiring
ci/test-file-size-budget.json, package.json
Introduces JSON budget config with defaultMaxLines: 1500 and per-file legacyMaxLines overrides; adds SPDX attribution to package.json and wires test-size:check npm script to run the budget checker.
Pre-commit hook integration
.pre-commit-config.yaml
Adds SPDX header and registers test-file-size-budget hook with priority 20 to enforce budget checks locally on test/spec files and budget configuration changes.
GitHub Actions workflows integration
.github/workflows/codebase-growth-guardrails.yaml, .github/workflows/pr.yaml
Adds permissions.contents: read and a new "Require changed test files to stay within size budget" step to codebase-growth-guardrails with inline script that validates budget monotonicity via GitHub API, retrieves changed test files, and enforces size limits. Updates pr.yaml to skip the hook during pre-push and invoke the check separately as a dedicated step.
Unit tests and workflow contract validation
test/test-file-size-budget.test.ts, test/pr-workflow-contract.test.ts
Adds Vitest suite covering line counting, violation detection (oversized, legacy-ratchet, stale-legacy-budget), legacy ratcheting, and JSON parsing. Adds workflow contract tests validating the codebase-growth-guardrails job structure and ensuring test-file-size-budget is in skipped hooks.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

CI/CD, area: ci, chore

Suggested reviewers

  • prekshivyas
  • cjagwani
  • jyaunches

Poem

🐰 Hops through test files with glee,
Counting lines for all to see,
"1500 lines is the way!"
No more test bloat here to stay,
Budget ratchets down each day! 📊✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'ci: guard oversized test files' directly and clearly describes the main change: introducing a test-file size budget mechanism to prevent oversized test files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/test-size-guard-trimmed

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

@github-actions

github-actions Bot commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: None
Optional E2E: None

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • None. No E2E is recommended. The PR is limited to CI guardrails, pre-commit configuration, package scripts, a test-size budget utility, and unit/contract tests. It does not change NemoClaw runtime code or behavior for installer/onboarding, sandbox lifecycle, credentials, network policy, inference routing, deployment, or real assistant user flows. Existing static checks, typecheck, CLI/plugin unit tests, and the added workflow-contract tests are the appropriate validation layer.

Optional E2E

  • None.

New E2E recommendations

  • None.

@github-actions

github-actions Bot commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • None. Changes are limited to PR/static CI guardrails, pre-commit configuration, package scripts, a test-file-size budget checker, and unit/contract tests outside test/e2e-scenario/. They do not modify scenario workflows, scenario metadata, expected states, validation suites, runtime/runner code, onboarding/install helpers, or suite scripts used by scenario E2E.

Optional scenario E2E

  • None.

Relevant changed files

  • None.

@github-actions

github-actions Bot commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

PR Review Advisor

Findings: 0 needs attention, 2 worth checking, 0 nice ideas
Top item: Behavior-test the trusted test-size guard fallback and policy-tamper paths

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • Source-of-truth review needed: .github/workflows/codebase-growth-guardrails.yaml FALLBACK_BUDGET/baseWasFallback: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: The workflow defines FALLBACK_BUDGET, computes baseWasFallback = baseText === null, and validateBudget returns early when baseWasFallback is true.
  • Behavior-test the trusted test-size guard fallback and policy-tamper paths (.github/workflows/codebase-growth-guardrails.yaml:123): The pull_request_target guard correctly avoids checking out or executing PR code, but its inline JavaScript duplicates policy logic from scripts/check-test-file-size-budget.ts. The unit tests exercise the local script, while the workflow contract test mostly asserts that key substrings are present. That leaves the trusted copy's monotonic-budget and fallback behavior vulnerable to behavior-preserving-looking edits that still satisfy substring checks. The initial FALLBACK_BUDGET/baseWasFallback compatibility path is reasonable for introducing the file, but its removal condition is not documented or behaviorally tested.
    • Recommendation: Add focused behavioral tests for the trusted workflow guard logic, or factor the pure policy logic into a tested generator/shared fixture that emits the workflow script from trusted source. At minimum, document that FALLBACK_BUDGET can be removed once supported base branches contain ci/test-file-size-budget.json.
    • Evidence: The workflow embeds FALLBACK_BUDGET and validateBudget logic inline, while test/test-file-size-budget.test.ts covers only scripts/check-test-file-size-budget.ts. test/pr-workflow-contract.test.ts asserts strings such as HEAD_REPO, HEAD_SHA, no .raw_url, previous_filename, budgetChanged, and a stale-budget message, but does not execute mocked API scenarios.

🌱 Nice ideas

  • None.
Consider writing more tests for
  • **Mocked behavioral coverage** — trusted pull_request_target guard rejects increased defaultMaxLines. The local scanner has useful unit coverage, but the most sensitive changed behavior is an inline pull_request_target script that performs GitHub API I/O and policy decisions. It is currently covered mostly by substring contract checks rather than mocked behavioral scenarios.
  • **Mocked behavioral coverage** — trusted pull_request_target guard rejects adding a new above-default legacy budget. The local scanner has useful unit coverage, but the most sensitive changed behavior is an inline pull_request_target script that performs GitHub API I/O and policy decisions. It is currently covered mostly by substring contract checks rather than mocked behavioral scenarios.
  • **Mocked behavioral coverage** — trusted pull_request_target guard rejects removing a legacy budget while the head file still exceeds defaultMaxLines. The local scanner has useful unit coverage, but the most sensitive changed behavior is an inline pull_request_target script that performs GitHub API I/O and policy decisions. It is currently covered mostly by substring contract checks rather than mocked behavioral scenarios.
  • **Mocked behavioral coverage** — trusted pull_request_target guard rejects stale legacy budget entries at the PR head. The local scanner has useful unit coverage, but the most sensitive changed behavior is an inline pull_request_target script that performs GitHub API I/O and policy decisions. It is currently covered mostly by substring contract checks rather than mocked behavioral scenarios.
  • **Mocked behavioral coverage** — trusted pull_request_target guard checks renamed test files using previous_filename. The local scanner has useful unit coverage, but the most sensitive changed behavior is an inline pull_request_target script that performs GitHub API I/O and policy decisions. It is currently covered mostly by substring contract checks rather than mocked behavioral scenarios.
  • **Behavior-test the trusted test-size guard fallback and policy-tamper paths** — Add focused behavioral tests for the trusted workflow guard logic, or factor the pure policy logic into a tested generator/shared fixture that emits the workflow script from trusted source. At minimum, document that FALLBACK_BUDGET can be removed once supported base branches contain ci/test-file-size-budget.json.
  • **Acceptance clause:** Related to ci: safely split slow CLI coverage suites #4892. Replaces ci: guard oversized test files #4899. — add test evidence or identify existing coverage. Deterministic context reported linkedIssues: [], so there were no linked issue clauses or issue comments available to map. Open overlap context shows ci: guard oversized test files #4899 touching the same files, consistent with the replacement claim, but issue ci: safely split slow CLI coverage suites #4892 acceptance text was not provided.
  • **Acceptance clause:** Add a compact trusted `pull_request_target` guard for changed test files and budget-policy tampering without using PR-controlled code or `raw_url` token fetches. — add test evidence or identify existing coverage. .github/workflows/codebase-growth-guardrails.yaml adds a pull_request_target guard that does not checkout or execute PR code, uses read-only permissions, reads GitHub API metadata/content by HEAD_REPO and HEAD_SHA, and does not use .raw_url. The main remaining gap is test depth for the inline trusted guard's behavior and fallback path.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@cv cv merged commit 29b0d5f into main Jun 7, 2026
41 checks passed
@cv cv deleted the codex/test-size-guard-trimmed branch June 7, 2026 06:36
@cv cv added the v0.0.61 Release target label Jun 7, 2026
@wscurran wscurran added the chore Build, CI, dependency, or tooling maintenance label Jun 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

chore Build, CI, dependency, or tooling maintenance v0.0.61 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants