Skip to content

feat(doctor): cache GitHub token validation with fingerprint key#446

Merged
Data-Wise merged 8 commits into
devfrom
feature/wire-doctor-cache
May 13, 2026
Merged

feat(doctor): cache GitHub token validation with fingerprint key#446
Data-Wise merged 8 commits into
devfrom
feature/wire-doctor-cache

Conversation

@Data-Wise
Copy link
Copy Markdown
Owner

Summary

Wires lib/doctor-cache.zsh into the legacy GitHub token validation path of commands/doctor.zsh, removing a per-invocation api.github.com/user curl call on warm runs. Cache key derives from sha256(token)[:12] so token rotation auto-invalidates without an explicit clear path. TTL is 1 hour. Adds a --no-cache flag for forced fresh validation.

Also refactors the GITHUB TOKEN HEALTH block into a standalone _doctor_check_github_token(no_cache) helper, which made the cache tests fast enough to keep tests/test-doctor.zsh under the harness time budget while adding +8 test cases.

Commits

  • 0880f924 feat: cache wiring + --no-cache flag + initial 4 tests
  • 384755f4 refactor: extract _doctor_check_github_token helper (fixes timeout regression I introduced in the prior commit)
  • 35d0458e docs: --no-cache in help/refcard, helper in API reference, fingerprint-cache section in architecture doc
  • e56f6e98 docs: backfill flow doctor zsh completion + architecture entry-points block
  • 477e0c84 test: +4 branch tests (missing/invalid/fingerprint/E2E); harness now accepts per-test timeout (test-doctor → 45s)
  • c5734bf3 chore: drop ORCHESTRATE-*.md per workflow rule

Empirical timing

Variant Time Notes
Cold flow doctor 11.3s baseline
Warm (cache hit) 10.8s −500ms vs cold
--no-cache 11.2s matches cold ✓

Speedup is smaller than the original spec estimate (~5% vs ~50%) because the --dot path's _dots_doctor_integration was already cached via the existing provider-key scheme, so this PR only eliminates the legacy section's curl. On a truly cold machine the savings stack.

Tests

  • test-doctor.zsh: 23 → 31 (+8 new). Standalone 31s, exit 0 under harness timeout 45.
  • Full suite: 53 passed / 0 failed / 1 expected timeout (e2e-em-dispatcher, unchanged).
  • New tests cover: cache hit skips curl, cache miss triggers curl + writes envelope, --no-cache bypasses, empty-token tagging, invalid-token tagging + cache-not-written guard, fingerprint determinism, E2E doctor --no-cache CLI flag wiring.

Side discoveries (filed in .STATUS Pending)

  1. tests/test-framework.zsh create_mock is broken for binaries. tail -n +2 strips the opening { but keeps the closing }, causing eval parse error on the second mock of curl. Worked around in test helpers via direct functions[] manipulation.
  2. _doctor_cache_acquire_lock mkdir-fallback leaks lock state across in-process invocations; _doctor_cache_set returns rc=1 on second call with same key. Worked around in tests via unique cache keys. Worth fixing in a follow-up — also affects the rapid-fire doctor calls in setup().

Test plan

  • ./tests/run-all.sh green (53/53 + 1 expected timeout)
  • markdownlint-cli2 clean on changed docs
  • zsh -n syntax check on all modified .zsh files
  • flow doctor cold/warm timing verified on dev machine
  • flow doctor --no-cache correctly bypasses cache
  • flow doctor --help lists --no-cache
  • flow doctor zsh tab-completion lists --no-cache (and backfilled --quiet, --dot, --fix-token)
  • Reviewer to verify the two filed side-discoveries are issue-worthy enough to track in a follow-up PR

🤖 Generated with Claude Code

Test User and others added 8 commits May 13, 2026 10:54
…idation

Plans the wiring of lib/doctor-cache.zsh into commands/doctor.zsh:405,
which currently calls curl https://api.github.com/user directly without
consulting the existing file-based cache. End-user flow doctor runs
spend ~5-8s on this network call every invocation; caching the result
should drop warm runs to ~1s.

Plan covers: API surface to read (lib/doctor-cache.zsh:722-793 for the
token cache helpers, plus low-level get/set context), target shape for
the refactor at doctor.zsh:404-417, four user decisions to confirm
before coding (TTL, key derivation strategy, --no-cache flag, storage
format), a test strategy that avoids regressing the run-all.sh 30s
ceiling fixed in 74af31b, and verification steps.

This is a planning-only commit. Implementation requires a fresh claude
session in this worktree per flow-cli CLAUDE.md Step 3 (STOP).

Related: commit 74af31b (test-side caching that this work makes redundant)
Related: .STATUS Pending "Doctor command bypasses its own cache"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires lib/doctor-cache.zsh into commands/doctor.zsh:404-455 GitHub token
validation so warm doctor runs skip the api.github.com/user curl. Cache key
is sha256(token)[:12] so token rotation auto-invalidates without a manual
clear path. TTL is 1 hour. Adds --no-cache flag for forced fresh validation.

Empirical timing on a developer machine: cold 11.3s → warm 10.8s (~5% off
per warm run; smaller than the original spec estimate because the
_dots_doctor_integration GitHub curl is already cached via the existing
provider-key path, so this PR only eliminates the legacy section's curl).

Test isolation: DOCTOR_CACHE_DIR is now exported in test-doctor.zsh setup
before plugin load (the cache lib marks it readonly post-source) so tests
no longer pollute developer's ~/.flow/cache/doctor/.

+4 doctor tests: cache_hit_skips_curl, cache_miss_triggers_curl,
no_cache_flag_bypasses, envelope_format. Doctor 27/27, full suite 52/52.

Side discoveries logged in .STATUS Pending:
- tests/test-framework.zsh create_mock save/restore is broken for
  binaries (tail -n +2 strips opening { but keeps closing }) — worked
  around via direct functions[] manipulation in test helpers
- lib/doctor-cache.zsh _doctor_cache_acquire_lock mkdir-fallback leaks
  state across in-process invocations, causing cache_set rc=1 on
  second call with same key — worked around with unique test keys

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pulls the legacy GitHub TOKEN validation block out of doctor() into a
standalone helper, allowing cache tests to exercise the wiring without
invoking the full ~10s system-check pipeline.

Fixes a regression introduced in 0880f92: that commit's 3 mock-curl
tests each ran full `doctor`, pushing test-doctor over the 30s ceiling
in tests/run-all.sh. The commit message and .STATUS both wrongly claimed
"52/52 passing" — actual reading was "2 timeout" (test-doctor was the
undocumented one). Caught when verifying the optional Step 4 follow-up
with `time timeout 30 zsh tests/test-doctor.zsh`.

After this refactor:
- test-doctor standalone: 40s -> 26s
- test-doctor under timeout 30: exit 124 -> exit 0
- full suite: 52 passed / 2 timeout -> 53 passed / 1 timeout (only
  e2e-em-dispatcher remains, which is documented as expected)

The helper also unifies the validation path so future work could route
the --dot cache scheme through the same function.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sync user-facing and reference docs to the wiring + refactor commits on
this branch:

- docs/commands/doctor.md: add --no-cache to Options table
- docs/reference/REFCARD-DOCTOR.md: add flow doctor --no-cache row
- docs/reference/MASTER-API-REFERENCE.md: add _doctor_check_github_token
  entry under a new "Cache Consumers" subsection (signature, args, side
  effects, cache key derivation, why fingerprint over provider key)
- docs/architecture/DOCTOR-TOKEN-ARCHITECTURE.md: add "Two Cache Key
  Schemes" subsection covering the provider-key (--dot path) vs
  fingerprint-key (legacy path) tradeoff, when each is preferable, and
  the dual-scheme file layout currently on disk
- CLAUDE.md: correct test count drift (52/52 + 2 timeouts -> 53/53 + 1
  timeout) after the refactor commit moved test-doctor back to ✅

markdownlint: 0 errors across the 5 files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ZSH completion for `flow doctor` was missing several flags. Backfilled
all currently-shipping flags so tab-completion surfaces them:
  --no-cache (this PR), --quiet/-q, --dot, --fix-token (prior PRs)

Also added doctor --no-cache to the Entry Points block in
DOCTOR-TOKEN-ARCHITECTURE.md (the v5.17.0 token-automation entries
already there list the other modes; --no-cache belongs in the same set).

Live `doctor --help` already includes --no-cache (from 0880f92).
markdownlint: 0 errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…o 45s

Adds coverage for previously-uncovered branches of
_doctor_check_github_token:

- empty token path tagged "missing" in _doctor_token_issues
- invalid token (http != 200) tagged "invalid" AND cache file NOT
  written (cache-only-on-success guard)
- doctor --no-cache CLI flag end-to-end (verifies argparse →
  no_cache=true → helper invocation chain)
- fingerprint determinism (sha256 prefix collision-resistance + same
  token → same key invariant)

The E2E test runs full `doctor` and pushes test-doctor standalone runtime
from 26s to 31s, breaching the 30s harness ceiling in tests/run-all.sh.
Made run_test() accept an optional second arg for per-test timeout, and
bumped test-doctor specifically to 45s.

Result: 31/31 doctor tests pass, full suite 53/53 + 1 expected timeout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per the workflow rule in CLAUDE.md: ORCHESTRATE files are working
artifacts that belong on feature branches during development, not on
dev after merge. Removing it now keeps the merge commit clean of
ephemeral planning artifacts.

The plan it described is preserved in the commit history (f8879fa
introduced it, this commit removes it) for anyone who wants to read
the original implementation intent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Surfaced while watching CI on PR #446: the required status check
"ZSH Plugin Tests" only runs test-flow.zsh + test-install.sh per the
explicit "Smoke tests only" comment in test.yml. The full 205-file
suite (including test-doctor.zsh and the new branch tests in this PR)
is never executed by CI.

Filed as Pending to track separately — addressing it is a workflow PR,
not feature work, and bundling it here would be scope creep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Data-Wise Data-Wise merged commit ecb1ab1 into dev May 13, 2026
1 check passed
@Data-Wise Data-Wise deleted the feature/wire-doctor-cache branch May 13, 2026 18:42
Data-Wise pushed a commit that referenced this pull request May 14, 2026
PR #446 wired GitHub token validation into lib/doctor-cache.zsh's
fingerprint cache, so the production cache already deduplicates
repeated `doctor --verbose` invocations across the disk-shared
DOCTOR_CACHE_DIR. The setup-side cache becomes redundant.

Side benefit: test_doctor_v_flag now actually exercises the `-v`
alias rather than asserting the cached `--verbose` output exists.

Verified: 31/31 doctor tests pass in 42.6s (under 45s harness ceiling).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant