docs: fix audit-detected drift and add docs-drift CI check #3868
Merged
Audit findings against current code at v20.0.0:

- `synthesize` feature was missing from FEATURES.md (bullet + 3 enumerations), README qsvmcp description, and PROJECT_TECHNICAL_OVERVIEW.md qsvmcp bullet.
- `sample` was missing the 🪄 in the README command table even though it consults the stats cache (COMMAND_DEPENDENCIES.md already lists it).
- PERFORMANCE.md said "35 more" stats were added; the enumeration was short by 4 (n_negative, n_zero, n_positive, percentiles) and the total drifted from the canonical MAX_STAT_COLUMNS. Replaced with "up to 47 total" and a link to STATS_DEFINITIONS.md.
- 🤯 OOM list in PERFORMANCE.md and PERFORMANCE_TLDR.md was missing `color` and didn't note transpose's `--long` streaming mode.
- Contributor quick references had stale line counts (stats.rs 4,873→5,638; frequency.rs 3,826→4,287; test_stats.rs 6,054→7,066) and stale source-line ranges for Config::autoindex_file()/index_files() in INDEX_TECHNICAL_GUIDE.md.
- STATS_QUICK_REFERENCE.md / STATS_TECHNICAL_GUIDE.md showed `RUST_LOG=debug` examples, but qsv reads `QSV_LOG_LEVEL`, not `RUST_LOG`.
- dotenv.template `QSV_USER_AGENT` example still said qsv/8.0.0.

To prevent these patterns from recurring, add scripts/docs-drift-check.py plus a GitHub Actions workflow (docs-drift.yml) that re-derives feature-set membership, source-file line counts, MAX_STAT_COLUMNS, and the qsv version from Cargo.toml + the .rs files, and fails the build on any contradicting doc line. Runs on relevant PRs and nightly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Up to standards ✅🟢 Issues
| Metric | Results |
|---|---|
| Complexity | 64 |
Roborev review 2209 flagged four LOW issues in the new docs-drift script
and workflow:
- STAT_COLUMN_CHECKS regexes for docs/STATS_DEFINITIONS.md and
docs/PERFORMANCE.md were too generic ("(\d+)\s+statistics" /
"up to (\d+) total"). STATS_DEFINITIONS.md already contains a second
matching "up to 17 statistics" line at L991, so first-match ordering
silently masked future drift. Anchor each pattern to the canonical
wording ("Total: NN statistics", "up to NN total columns") so the
check fails loudly when the anchor moves.
- SPLIT_RE was dead code; remove it.
- docs/PERFORMANCE_TLDR.md was edited in the previous commit but was
neither audited nor in the workflow trigger paths, so future drift
between the prose 🤯 list in PERFORMANCE.md and the bullet 🤯 list
in PERFORMANCE_TLDR.md would slip past. Add check_oom_lists_in_sync()
that extracts the bare command set from each doc and reports any
asymmetry, and add PERFORMANCE_TLDR.md to the workflow paths.
- Action pinning was inconsistent (actions/checkout@v6 floating-major,
actions/setup-python@v6.2.0 exact-patch). Normalize to @v6 to match
the project's prevailing style (typos.yml, wiki-lint.yml).
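The first fix can be illustrated with a small, self-contained sketch. The doc text below is invented and only mimics the situation the review describes, and `claimed_total` is a hypothetical helper rather than a function from the script. It contrasts a generic pattern with one anchored to the canonical "Total: NN statistics" wording when that wording changes.

```python
import re

# Hypothetical doc text: the canonical total line comes first and an unrelated
# "up to 17 statistics" mention appears later (the PR cites one at L991).
DOC = "Total: 47 statistics\n\n... up to 17 statistics ...\n"
# The same doc after someone rewords the canonical line.
REWORDED = DOC.replace("Total: 47 statistics", "Total: 47 stats")

GENERIC_RE = re.compile(r"(\d+)\s+statistics")
ANCHORED_RE = re.compile(r"Total:\s+(\d+)\s+statistics")


def claimed_total(pattern, text):
    """Return the first captured number, or None when the pattern is absent."""
    m = pattern.search(text)
    return int(m.group(1)) if m else None


for name, pattern in (("generic", GENERIC_RE), ("anchored", ANCHORED_RE)):
    print(name, claimed_total(pattern, DOC), claimed_total(pattern, REWORDED))
# generic 47 17     <- falls through to the unrelated line
# anchored 47 None  <- anchor missing, so the checker can report it
```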
Verified locally: the script still reports no drift, and a simulated
removal of `color` from PERFORMANCE_TLDR.md is now detected.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
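The idea behind `check_oom_lists_in_sync()` can be sketched as a set comparison. The excerpts below are made up (the real docs' wording differs), and `extract_commands`/`oom_list_asymmetry` are hypothetical names. The sketch assumes commands appear as backticked bare names and that flags such as `--long` start with `-`, so they are skipped. The last line reproduces a simulated removal of `color`.

```python
import re

# Hypothetical excerpts; the real PERFORMANCE.md / PERFORMANCE_TLDR.md text differs.
PERFORMANCE_PROSE = (
    "🤯 commands load the whole file into memory: `color`, `sort`, and "
    "`transpose` (unless `--long` streaming mode is used)."
)
TLDR_BULLETS = "- `color`\n- `sort`\n- `transpose`\n"

# Backticked bare command names; `--long` starts with '-' and is not matched.
CMD_RE = re.compile(r"`([a-z][a-z0-9_]*)`")


def extract_commands(text):
    return set(CMD_RE.findall(text))


def oom_list_asymmetry(prose, bullets):
    """Return human-readable problems; an empty list means the lists agree."""
    a, b = extract_commands(prose), extract_commands(bullets)
    problems = [f"`{c}` only in PERFORMANCE.md" for c in sorted(a - b)]
    problems += [f"`{c}` only in PERFORMANCE_TLDR.md" for c in sorted(b - a)]
    return problems


print(oom_list_asymmetry(PERFORMANCE_PROSE, TLDR_BULLETS))  # []
# Simulate dropping `color` from the TLDR bullet list.
print(oom_list_asymmetry(PERFORMANCE_PROSE, TLDR_BULLETS.replace("- `color`\n", "")))
```

Reporting both set differences, rather than only checking one list against the other, catches drift in either direction.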
Pull request overview
This PR updates qsv’s documentation to reflect the current v20.0.0 codebase and introduces an automated “docs drift” CI workflow to catch recurring documentation/code mismatches (features, line-count references, MAX_STAT_COLUMNS, and version strings).
Changes:
- Refresh multiple docs to reflect current feature sets (incl. `synthesize`), stats column limits, OOM/🤯 command lists, logging env var usage, and version examples.
- Add `scripts/docs-drift-check.py` to re-derive “sources of truth” from `Cargo.toml` and key source files and flag contradictory doc lines (with optional `--json` output).
- Add a GitHub Actions workflow to run the drift check on relevant PR path changes, nightly, and via manual dispatch.
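One way a checker like this can serve both humans and CI is to collect findings, print them as text or JSON, and exit non-zero when any exist, which is what makes a workflow step fail. The `Finding` fields, the JSON layout, the sample finding, and the function names below are hypothetical; the PR text does not show the script's actual `--json` schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Finding:
    path: str      # doc file containing the contradicting line
    line: int      # 1-based line number in that file
    message: str   # what disagrees with the derived source of truth


def render(findings, as_json=False):
    if as_json:
        payload = {"count": len(findings), "findings": [asdict(f) for f in findings]}
        return json.dumps(payload, indent=2)
    if not findings:
        return "no drift detected"
    return "\n".join(f"{f.path}:{f.line}: {f.message}" for f in findings)


def exit_code(findings):
    # A non-zero exit status is what fails the CI job.
    return 1 if findings else 0


# Made-up finding for illustration only.
findings = [Finding("dotenv.template", 1, "qsv/8.0.0 != 20.0.0")]
print(render(findings))  # dotenv.template:1: qsv/8.0.0 != 20.0.0
print(exit_code(findings), exit_code([]))  # 1 0
```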
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| scripts/docs-drift-check.py | New Python drift-check script that validates docs against Cargo/features, line counts, MAX_STAT_COLUMNS, and crate version |
| .github/workflows/docs-drift.yml | CI workflow to run the drift-check on PRs/nightly/manual |
| README.md | Update command table metadata and qsvmcp feature list to include synthesize |
| dotenv.template | Update QSV_USER_AGENT example version to 20.0.0 |
| docs/FEATURES.md | Document synthesize and update meta-feature enumerations |
| docs/PERFORMANCE.md | Update stats “up to 47” claim and expand 🤯 OOM list (incl. color, transpose --long nuance) |
| docs/PERFORMANCE_TLDR.md | Add color to the 🤯 list |
| docs/contributor/STATS_TECHNICAL_GUIDE.md | Replace RUST_LOG examples with QSV_LOG_LEVEL |
| docs/contributor/STATS_QUICK_REFERENCE.md | Update stats column-count claim, line-count references, and logging examples |
| docs/contributor/FREQUENCY_QUICK_REFERENCE.md | Update frequency.rs line-count reference |
| docs/contributor/PROJECT_TECHNICAL_OVERVIEW.md | Update qsvmcp feature list to include synthesize |
| docs/contributor/INDEX_TECHNICAL_GUIDE.md | Refresh referenced source line ranges for index-related config APIs |
Comments suppressed due to low confidence (1)
docs/contributor/STATS_QUICK_REFERENCE.md:206
- These table entries repeat the stats/test file line-count claims, and they’re currently off by 1 (`src/cmd/stats.rs` is 5,639 lines; `tests/test_stats.rs` is 7,067). Consider updating both table counts to stay consistent with the source files.
| File | Purpose |
|------|---------|
| `src/cmd/stats.rs` | Main implementation (~5,638 lines) |
| `src/config.rs` | CSV reader configuration |
| `src/select.rs` | Column selection logic |
| `src/util.rs` | Utility functions |
| `tests/test_stats.rs` | Comprehensive test suite (~7,066 lines) |
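Off-by-one disagreements over line counts can come down to what "a line" means. `wc -l` counts newline characters, so a file ending in a newline has as many lines as newlines, while naively splitting on `\n` yields one extra empty element. The sketch below shows that difference plus a relative tolerance check in the spirit of the script's `--line-tolerance` option (default 10%, per the PR summary). The `~N lines` regex, the function names, and measuring the tolerance against the actual count are assumptions.

```python
import re


def wc_l(text):
    """Count newline characters, which is what `wc -l` reports."""
    return text.count("\n")


# A three-line file that ends with a trailing newline.
SAMPLE = "fn a() {}\nfn b() {}\nfn c() {}\n"
print(wc_l(SAMPLE), len(SAMPLE.splitlines()), len(SAMPLE.split("\n")))  # 3 3 4

# Matches claims such as "(~5,638 lines)" in the contributor docs.
CLAIM_RE = re.compile(r"~([\d,]+) lines")


def claim_within_tolerance(doc_line, actual, tolerance=0.10):
    """True/False if the line carries a ~N claim, None if it has none."""
    m = CLAIM_RE.search(doc_line)
    if m is None:
        return None
    claimed = int(m.group(1).replace(",", ""))
    # Assumed semantics: allowed drift is a fraction of the real line count.
    return abs(claimed - actual) <= tolerance * actual


row = "| `src/cmd/stats.rs` | Main implementation (~5,638 lines) |"
print(claim_within_tolerance(row, 5638))                            # True
print(claim_within_tolerance(row.replace("5,638", "4,873"), 5638))  # False
```

With a 10% band, the old 4,873 claim for stats.rs falls outside tolerance against 5,638, while an off-by-one count would not.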
Copilot review on PR #3868 flagged two misleading docstrings/comments in docs-drift-check.py:

- `_parse_feature_blob` docstring claimed "comma/space/'and'-separated" but the function only splits on commas (after substituting "and"). Whitespace-only separators were never supported and aren't used by any audited doc anchor. Rewrite the docstring to describe actual behavior.
- The comment on the slash filter in `get_features` mentioned dropping a "dep:" prefix that the code doesn't actually strip. The audited meta-features (distrib_features, all_features, qsvmcp) only enumerate named features and don't use "dep:" entries; those appear only in leaf feature definitions, which this code never reads. Clarify the comment accordingly.

The three line-count comments (5,638→5,639, 4,287→4,288, 7,066→7,067) were declined as false positives; `wc -l` on this branch and on origin/master tip 64c0bf4 confirms the documented values are exact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
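The corrected docstring's behavior (split only on commas, after turning "and" into a comma) can be illustrated with a short re-implementation. This is a hypothetical sketch, not the script's `_parse_feature_blob`: the backtick stripping and the `named_features` slash filter are assumptions about how such a parser might handle doc enumerations and `dep/feature`-style entries.

```python
import re


def parse_feature_blob(blob):
    """Split an enumeration on commas, treating the word "and" as a comma.

    Whitespace alone is NOT a separator, matching the corrected docstring.
    """
    blob = re.sub(r"\band\b", ",", blob)
    names = []
    for part in blob.split(","):
        name = part.strip().strip("`").strip()
        if name:  # skip empty pieces such as the one left by ", and"
            names.append(name)
    return names


def named_features(entries):
    """Drop entries containing '/', e.g. hypothetical "some_dep/feature" items."""
    return [e for e in entries if "/" not in e]


print(parse_feature_blob("`apply`, `fetch`, and `synthesize`"))  # ['apply', 'fetch', 'synthesize']
print(parse_feature_blob("apply fetch"))  # ['apply fetch']: no whitespace splitting
print(named_features(["synthesize", "some_dep/feature"]))  # ['synthesize']
```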
Summary
- Docs fixes: missing `synthesize` feature in `FEATURES.md`/`README.md`/`PROJECT_TECHNICAL_OVERVIEW.md`, missing 🪄 on `sample` in README, stale stats-count claim in `PERFORMANCE.md` (rewritten as "up to 47 total" with link to `STATS_DEFINITIONS.md`), `color`/`transpose --long` missing from 🤯 OOM lists, stale `~NNN lines` counts in contributor quick references, stale source-line ranges in `INDEX_TECHNICAL_GUIDE.md`, `RUST_LOG=debug` examples (qsv uses `QSV_LOG_LEVEL`), and `qsv/8.0.0` user-agent example in `dotenv.template`.
- New `scripts/docs-drift-check.py` that re-derives feature-set membership (from `Cargo.toml`), source-file line counts (from `wc -l`), the `MAX_STAT_COLUMNS` constant (from `src/cmd/stats.rs`), and the qsv version (from `Cargo.toml`), and flags any doc line that contradicts them. Supports `--json` and configurable `--line-tolerance` (default 10%).
- New `.github/workflows/docs-drift.yml` to run the check on relevant PR paths, nightly @ 04:37 UTC, and via `workflow_dispatch`.

Test plan
- `python3 scripts/docs-drift-check.py` exits 0 against the updated docs
- `wc -l` on the audited `.rs` files matches the new contributor-doc claims exactly (stats.rs=5,638; frequency.rs=4,287; test_stats.rs=7,066)
- `MAX_STAT_COLUMNS = 47` confirmed in `src/cmd/stats.rs:772`
- `synthesize` feature confirmed present in `Cargo.toml` (`distrib_features`, `all_features`, and `qsvmcp`)
- `--json` output structurally valid
- Script passes `ast.parse`

🤖 Generated with Claude Code