
docs: fix audit-detected drift and add docs-drift CI check #3868

Merged
jqnatividad merged 3 commits into master from docs/audit-drift-fixes-and-ci
May 17, 2026


@jqnatividad
Collaborator

Summary

  • Fix 10 documentation files for drift detected by the documentation-audit skill against current code at v20.0.0: missing synthesize feature in FEATURES.md/README.md/PROJECT_TECHNICAL_OVERVIEW.md, missing 🪄 on sample in README, stale stats-count claim in PERFORMANCE.md (rewritten as "up to 47 total" with link to STATS_DEFINITIONS.md), color/transpose --long missing from 🤯 OOM lists, stale ~NNN lines counts in contributor quick references, stale source-line ranges in INDEX_TECHNICAL_GUIDE.md, RUST_LOG=debug examples (qsv uses QSV_LOG_LEVEL), and qsv/8.0.0 user-agent example in dotenv.template.
  • Add scripts/docs-drift-check.py that re-derives feature-set membership (from Cargo.toml), source-file line counts (from wc -l), the MAX_STAT_COLUMNS constant (from src/cmd/stats.rs), and the qsv version (from Cargo.toml), and flags any doc line that contradicts them. Supports --json and configurable --line-tolerance (default 10%).
  • Add .github/workflows/docs-drift.yml to run the check on relevant PR paths, nightly @ 04:37 UTC, and via workflow_dispatch.
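
The line-count check with a tolerance could look roughly like the sketch below. It is illustrative only and is not the code in scripts/docs-drift-check.py: the names `LINE_CLAIM_RE`, `within_tolerance`, and `check_line_claim` and the claim regex are assumptions; only the 10% default tolerance comes from the description above.

```python
import re

# Hypothetical pattern for doc claims such as "(~5,638 lines)"; the real
# script's regex is not shown in this conversation.
LINE_CLAIM_RE = re.compile(r"~?(\d[\d,]*)\s+lines")


def within_tolerance(claimed: int, actual: int, tolerance: float = 0.10) -> bool:
    """Return True if the claimed count is within `tolerance` of the actual count."""
    if actual == 0:
        return claimed == 0
    return abs(claimed - actual) / actual <= tolerance


def check_line_claim(doc_line: str, actual: int, tolerance: float = 0.10):
    """Return a drift message, or None if the line has no claim or is close enough."""
    match = LINE_CLAIM_RE.search(doc_line)
    if match is None:
        return None
    claimed = int(match.group(1).replace(",", ""))
    if within_tolerance(claimed, actual, tolerance):
        return None
    return f"claims {claimed} lines, source has {actual}"
```

With the default tolerance, a stale claim of 4,873 lines against an actual 5,638 is flagged (about 13.6% off), while a claim that differs by a single line is not.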

Test plan

  • python3 scripts/docs-drift-check.py exits 0 against the updated docs
  • wc -l on the audited .rs files matches the new contributor-doc claims exactly (stats.rs=5,638; frequency.rs=4,287; test_stats.rs=7,066)
  • MAX_STAT_COLUMNS = 47 confirmed in src/cmd/stats.rs:772
  • synthesize feature confirmed present in Cargo.toml distrib_features, all_features, and qsvmcp
  • Script --json output structurally valid
  • Script syntax checked via ast.parse
  • Verify the docs-drift workflow runs green in CI on this PR
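
The two script sanity checks in the test plan (syntax via `ast.parse`, structurally valid `--json` output) can be reproduced with a few lines of standard-library Python. This is a minimal sketch; the helper names are hypothetical, and the JSON shape in the usage note is made up because the script's real output format is not shown here.

```python
import ast
import json


def python_syntax_ok(source: str) -> bool:
    """Return True if `source` parses as Python; the code is never executed."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


def json_output_ok(text: str) -> bool:
    """Return True if `text` is well-formed JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True
```

For example, `python_syntax_ok(open("scripts/docs-drift-check.py").read())` checks the script without running it, and `json_output_ok('{"findings": []}')` accepts a (hypothetical) empty report.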

🤖 Generated with Claude Code

Audit findings against current code at v20.0.0:
- `synthesize` feature was missing from FEATURES.md (bullet + 3 enumerations),
  README qsvmcp description, and PROJECT_TECHNICAL_OVERVIEW.md qsvmcp bullet.
- `sample` was missing the 🪄 in the README command table even though it
  consults the stats cache (COMMAND_DEPENDENCIES.md already lists it).
- PERFORMANCE.md said "35 more" stats were added; the enumeration was short
  by 4 (n_negative, n_zero, n_positive, percentiles) and the total drifted
  from the canonical MAX_STAT_COLUMNS — replaced with "up to 47 total" and
  a link to STATS_DEFINITIONS.md.
- 🤯 OOM list in PERFORMANCE.md and PERFORMANCE_TLDR.md was missing `color`
  and didn't note transpose's `--long` streaming mode.
- Contributor quick references had stale line counts (stats.rs 4,873→5,638;
  frequency.rs 3,826→4,287; test_stats.rs 6,054→7,066) and stale source-
  line ranges for Config::autoindex_file()/index_files() in
  INDEX_TECHNICAL_GUIDE.md.
- STATS_QUICK_REFERENCE.md / STATS_TECHNICAL_GUIDE.md showed `RUST_LOG=debug`
  examples — qsv reads `QSV_LOG_LEVEL`, not `RUST_LOG`.
- dotenv.template `QSV_USER_AGENT` example still said qsv/8.0.0.

To prevent these patterns from recurring, add scripts/docs-drift-check.py
plus a GitHub Actions workflow (docs-drift.yml) that re-derives feature-set
membership, source-file line counts, MAX_STAT_COLUMNS, and the qsv version
from Cargo.toml + the .rs files, and fails the build on any contradicting
doc line. Runs on relevant PRs and nightly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codacy-production

codacy-production Bot commented May 17, 2026

Up to standards ✅

🟢 Issues: 0 new issues

🟢 Metrics: complexity 64


Roborev review 2209 flagged four LOW issues in the new docs-drift script
and workflow:

- STAT_COLUMN_CHECKS regexes for docs/STATS_DEFINITIONS.md and
  docs/PERFORMANCE.md were too generic ("(\d+)\s+statistics" /
  "up to (\d+) total"). STATS_DEFINITIONS.md already contains a second
  matching "up to 17 statistics" line at L991, so first-match ordering
  silently masked future drift. Anchor each pattern to the canonical
  wording ("Total: NN statistics", "up to NN total columns") so the
  check fails loudly when the anchor moves.
- SPLIT_RE was dead code; remove it.
- docs/PERFORMANCE_TLDR.md was edited in the previous commit but was
  neither audited nor in the workflow trigger paths, so future drift
  between the prose 🤯 list in PERFORMANCE.md and the bullet 🤯 list
  in PERFORMANCE_TLDR.md would slip past. Add check_oom_lists_in_sync()
  that extracts the bare command set from each doc and reports any
  asymmetry, and add PERFORMANCE_TLDR.md to the workflow paths.
- Action pinning was inconsistent (actions/checkout@v6 floating-major,
  actions/setup-python@v6.2.0 exact-patch). Normalize to @v6 to match
  the project's prevailing style (typos.yml, wiki-lint.yml).
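
The difference between the generic and the anchored pattern can be illustrated with a small sketch. The doc excerpt is made up, and `first_claim` is a hypothetical helper; only the two wordings ("NN statistics" versus "Total: NN statistics") come from the commit message.

```python
import re

# Made-up excerpt: the canonical line plus an unrelated later match.
DOC = """\
Total: 47 statistics
Some columns report up to 17 statistics in the default mode.
"""

GENERIC_RE = re.compile(r"(\d+)\s+statistics")
ANCHORED_RE = re.compile(r"Total:\s+(\d+)\s+statistics")


def first_claim(pattern: re.Pattern, text: str):
    """Return the first captured number as an int, or None if nothing matches."""
    match = pattern.search(text)
    return int(match.group(1)) if match else None


# Simulate someone rewording the canonical line.
REWORDED = DOC.replace("Total: 47 statistics", "47 stats in total")
```

On `REWORDED`, the generic pattern quietly falls through to the unrelated "17", whereas the anchored pattern returns `None`, which a checker can treat as a hard failure ("anchor not found").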

Verified locally: the script still reports no drift, and a simulated
removal of `color` from PERFORMANCE_TLDR.md is now detected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
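
A sync check like the `check_oom_lists_in_sync()` described above boils down to a set comparison. The sketch below is an assumption about its shape, not the real implementation: the backtick-based extraction regex, the helper names, and both doc excerpts are illustrative.

```python
import re

# Assumed convention: commands appear as backticked lowercase words, while
# flags such as `--long` start with a dash and are therefore skipped.
CMD_RE = re.compile(r"`([a-z][a-z0-9]*)`")


def bare_commands(text: str) -> set[str]:
    """Extract the set of bare command names from a doc excerpt."""
    return set(CMD_RE.findall(text))


def oom_list_asymmetry(perf_text: str, tldr_text: str) -> dict[str, list[str]]:
    """Report commands present in one 🤯 list but not in the other."""
    perf, tldr = bare_commands(perf_text), bare_commands(tldr_text)
    return {
        "only_in_PERFORMANCE.md": sorted(perf - tldr),
        "only_in_PERFORMANCE_TLDR.md": sorted(tldr - perf),
    }


# Made-up excerpts simulating `color` being dropped from the TLDR list.
PERF = "🤯 commands: `color`, `transpose` (streams with `--long`)."
TLDR = "- `transpose`\n"
```

Here `oom_list_asymmetry(PERF, TLDR)` reports `color` as present only in PERFORMANCE.md, mirroring the simulated removal mentioned in the commit message.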
Contributor

Copilot AI left a comment


Pull request overview

This PR updates qsv’s documentation to reflect the current v20.0.0 codebase and introduces an automated “docs drift” CI workflow to catch recurring documentation/code mismatches (features, line-count references, MAX_STAT_COLUMNS, and version strings).

Changes:

  • Refresh multiple docs to reflect current feature sets (incl. synthesize), stats column limits, OOM/🤯 command lists, logging env var usage, and version examples.
  • Add scripts/docs-drift-check.py to re-derive “sources of truth” from Cargo.toml and key source files and flag contradictory doc lines (with optional --json output).
  • Add a GitHub Actions workflow to run the drift check on relevant PR path changes, nightly, and via manual dispatch.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.

| File | Description |
|------|-------------|
| `scripts/docs-drift-check.py` | New Python drift-check script that validates docs against Cargo/features, line counts, MAX_STAT_COLUMNS, and crate version |
| `.github/workflows/docs-drift.yml` | CI workflow to run the drift-check on PRs/nightly/manual |
| `README.md` | Update command table metadata and qsvmcp feature list to include synthesize |
| `dotenv.template` | Update QSV_USER_AGENT example version to 20.0.0 |
| `docs/FEATURES.md` | Document synthesize and update meta-feature enumerations |
| `docs/PERFORMANCE.md` | Update stats “up to 47” claim and expand 🤯 OOM list (incl. color, transpose --long nuance) |
| `docs/PERFORMANCE_TLDR.md` | Add color to the 🤯 list |
| `docs/contributor/STATS_TECHNICAL_GUIDE.md` | Replace RUST_LOG examples with QSV_LOG_LEVEL |
| `docs/contributor/STATS_QUICK_REFERENCE.md` | Update stats column-count claim, line-count references, and logging examples |
| `docs/contributor/FREQUENCY_QUICK_REFERENCE.md` | Update frequency.rs line-count reference |
| `docs/contributor/PROJECT_TECHNICAL_OVERVIEW.md` | Update qsvmcp feature list to include synthesize |
| `docs/contributor/INDEX_TECHNICAL_GUIDE.md` | Refresh referenced source line ranges for index-related config APIs |
Comments suppressed due to low confidence (1)

docs/contributor/STATS_QUICK_REFERENCE.md:206

  • These table entries repeat the stats/test file line-count claims, and they’re currently off by 1 (src/cmd/stats.rs is 5,639 lines; tests/test_stats.rs is 7,067). Consider updating both table counts to stay consistent with the source files.
| File | Purpose |
|------|---------|
| `src/cmd/stats.rs` | Main implementation (~5,638 lines) |
| `src/config.rs` | CSV reader configuration |
| `src/select.rs` | Column selection logic |
| `src/util.rs` | Utility functions |
| `tests/test_stats.rs` | Comprehensive test suite (~7,066 lines) |

Copilot review on PR #3868 flagged two misleading docstrings/comments in
docs-drift-check.py:

- `_parse_feature_blob` docstring claimed "comma/space/'and'-separated"
  but the function only splits on commas (after substituting "and").
  Whitespace-only separators were never supported and aren't used by any
  audited doc anchor. Rewrite the docstring to describe actual behavior.
- The comment on the slash filter in `get_features` mentioned dropping a
  "dep:" prefix that the code doesn't actually strip. The audited
  meta-features (distrib_features, all_features, qsvmcp) only enumerate
  named features and don't use "dep:" entries — those appear only in
  leaf feature definitions which this code never reads. Clarify the
  comment accordingly.
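
The behavior the corrected docstring describes (split on commas after substituting "and", with no whitespace-only splitting) could be sketched like this. It is not the actual `_parse_feature_blob`; the function name, the stripping of backticks and periods, and the placeholder feature names are assumptions.

```python
import re


def parse_feature_blob(blob: str) -> set[str]:
    """Split a feature list on commas, treating the word "and" as a comma.

    Whitespace alone is not a separator, so "feature_a feature_b" stays one item.
    """
    normalized = re.sub(r"\band\b", ",", blob)
    # Assumed cleanup: trim spaces, backticks, and trailing periods.
    parts = (part.strip(" `.") for part in normalized.split(","))
    return {part for part in parts if part}
```

For instance, "`feature_a`, `feature_b` and `synthesize`" yields three names, while "feature_a feature_b" stays a single item.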

The three line-count comments (5,638→5,639, 4,287→4,288, 7,066→7,067)
were declined as false positives; `wc -l` on this branch and on
origin/master tip 64c0bf4 confirms the documented values are exact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
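
The thread does not say why the reviewer's counts were one higher than `wc -l`. One general cause of such off-by-one differences is that `wc -l` counts newline characters, so a final line without a trailing newline is not counted, while tools that count "logical" lines do include it. The sketch below only illustrates that general behavior; whether it explains this particular discrepancy is an assumption.

```python
def wc_l(data: bytes) -> int:
    """Count newline bytes, which is what `wc -l` reports."""
    return data.count(b"\n")


def logical_lines(data: bytes) -> int:
    """Count lines the way an editor would, including an unterminated last line."""
    return len(data.splitlines())


WITH_TRAILING_NEWLINE = b"fn main() {\n}\n"
WITHOUT_TRAILING_NEWLINE = b"fn main() {\n}"
```

Both functions agree on `WITH_TRAILING_NEWLINE`, but differ by one on `WITHOUT_TRAILING_NEWLINE`.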
jqnatividad merged commit 1ca3bb3 into master on May 17, 2026
18 checks passed
jqnatividad deleted the docs/audit-drift-fixes-and-ci branch on May 17, 2026 11:58

2 participants