# Testing
English | 繁體中文
LiveDocs has two independent test suites, 151 tests in total, all green. The Swift suite verifies
the engine. The Python suite verifies that the look-up skill actually triggers a
LiveDocs query and answers with current information, and it also covers the vs-context7 comparison harness. The counts below
are a snapshot; the source of truth is running the suites.
```sh
swift test                              # 110 Swift tests
python3 -m pytest evals/look-up/tests/  # 41 Python eval tests
```

## Swift (110 tests)

The Swift suite is split along the pure-core / MCP-shell architecture. `LiveDocsCore` is dependency-free and
tested without a network (HTTP is injected as a fake); `CheLiveDocsMCP` covers the process
and file-system layer.
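Injecting HTTP as a fake is what keeps the core tests offline. The sketch below illustrates that seam in Python for brevity; LiveDocs itself is Swift and its real types are not shown here. `HTTPClient`, `FakeHTTP`, and `ETagCache` are hypothetical names, and the cache applies the revalidation rules listed for ETagCacheTests: a 304 reuses the cached body, a 200 refreshes it, a cached copy is never returned without asking the server, and POST is never cached.

```python
from dataclasses import dataclass, field
from typing import Optional, Protocol


@dataclass
class Response:
    status: int
    body: str = ""
    etag: Optional[str] = None


class HTTPClient(Protocol):
    # Hypothetical transport seam: the core only talks to this protocol.
    def request(self, method: str, url: str, headers: dict) -> Response: ...


@dataclass
class FakeHTTP:
    """Scripted responses plus a log of every request; no network involved."""
    responses: list
    sent: list = field(default_factory=list)

    def request(self, method: str, url: str, headers: dict) -> Response:
        self.sent.append((method, url, dict(headers)))
        return self.responses.pop(0)


class ETagCache:
    """Hypothetical revalidating cache: every GET asks the server first."""

    def __init__(self, http: HTTPClient):
        self.http = http
        self.entries: dict = {}  # url -> (etag, body)

    def fetch(self, method: str, url: str) -> str:
        if method != "GET":
            # Only GET is cached; POST always goes straight through.
            return self.http.request(method, url, {}).body
        headers = {}
        cached = self.entries.get(url)
        if cached:
            headers["If-None-Match"] = cached[0]
        resp = self.http.request(method, url, headers)
        if resp.status == 304 and cached:
            return cached[1]  # server confirmed our copy is current
        if resp.status == 200:
            if resp.etag:
                self.entries[url] = (resp.etag, resp.body)
            else:
                self.entries.pop(url, None)  # no validator: drop the old copy
        return resp.body


# Usage: the whole exchange runs against the fake, with no network.
fake = FakeHTTP([Response(200, "v1", etag='"a"'), Response(304), Response(200, "v2", etag='"b"')])
cache = ETagCache(fake)
assert cache.fetch("GET", "https://example.org/llms.txt") == "v1"  # first fetch stores the entry
assert cache.fetch("GET", "https://example.org/llms.txt") == "v1"  # 304: cached body is reused
assert cache.fetch("GET", "https://example.org/llms.txt") == "v2"  # 200: entry is refreshed
```

Because the fake records every request, a test can also assert on what was sent, for example that the second GET carried `If-None-Match`.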
### LiveDocsCore (99 tests)

| File | Tests | Covers |
|---|---|---|
| RuntimeIntrospectionTests | 25 | Language-runtime detection: pin parsers (`.tool-versions`, mise, idiomatic `.<lang>-version`), the precedence engine (active toolchain authoritative), mise inline-comment / array rejection, multi-line / alias rejection. |
| RegistryAdaptersTests | 11 | Parsers for all 9 registries (npm / PyPI / crates / go / rubygems / JSR / packagist / maven / CRAN). |
| URLSafetyTests | 9 | SSRF guard: scheme allowlist + loopback / link-local (169.254) / RFC 1918 / ULA / metadata host classification. |
| TextSanitizeTests | 7 | Control / ANSI / OSC / bidi / zero-width stripping of fetched content + UTF-8 byte truncation. |
| LLMSTxtTests | 6 | `llms.txt` candidate ordering, soft-404 content-type guard, index/full flavor split. |
| IntrospectionTests | 6 | OpenAPI / GraphQL schema parsing (method allowlist, shape-only, deterministic ordering). |
| RIntrospectionTests | 6 | Installed-R-package version parsing + safe-name validation. |
| EngineTests | 5 | Discovery chain over injected HTTP fakes (registry → `llms.txt` → repo). |
| ETagCacheTests | 5 | ETag revalidation semantics (304 → cached, 200 → refresh, never blind-stale, POST never cached). |
| RegistryTests | 5 | Per-ecosystem registry resolution + version pinning. |
| ClassificationTests | 4 | Soft-404 hit/miss classification. |
| ETagCacheLRUTests | 4 | Cache bounding (LRU eviction, byte budget, oversized-entry refusal). |
| ValidationTests | 4 | Boundary validation (package/version strings can't inject URL structure). |
| RankingTests | 2 | Fidelity-then-freshness ranking with a stable tiebreak. |
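The SSRF guard that URLSafetyTests covers reduces to two checks: the scheme is on an allowlist, and the host is not an internal address. A minimal Python sketch using the standard `urllib.parse` and `ipaddress` modules follows; it is not the Swift implementation. `is_safe_url`, the https-only allowlist, and the blocked-host set are assumptions, and the sketch only classifies literal IP hosts (a real guard must also check the addresses a DNS name resolves to).

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"https"}  # assumption: plain http is not on the allowlist

# Blocked ranges: unspecified, loopback, link-local (incl. the 169.254.169.254
# cloud metadata IP), RFC 1918 private IPv4, and IPv6 unique-local (ULA).
BLOCKED_NETWORKS = [ipaddress.ip_network(n) for n in (
    "0.0.0.0/8", "::/128",
    "127.0.0.0/8", "::1/128",
    "169.254.0.0/16", "fe80::/10",
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
    "fc00::/7",
)]

# Hypothetical set of hostnames that name internal services without being IP literals.
BLOCKED_HOSTS = {"localhost", "metadata.google.internal"}


def is_safe_url(url: str) -> bool:
    parts = urlsplit(url)
    if parts.scheme.lower() not in ALLOWED_SCHEMES:
        return False
    host = (parts.hostname or "").rstrip(".")  # .hostname is already lowercased
    if not host or host in BLOCKED_HOSTS or host.endswith(".localhost"):
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return True  # a DNS name: passes this literal check only
    if addr.version == 6 and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped  # ::ffff:127.0.0.1 must not slip past the IPv4 rules
    # Membership is simply False across IP versions, so one list covers both.
    return not any(addr in net for net in BLOCKED_NETWORKS)
```

With these rules, `https://169.254.169.254/latest/meta-data/` and `https://[::ffff:127.0.0.1]/` are both rejected, while `https://docs.example.org/` passes the literal check and would still need its resolved addresses classified the same way.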
### CheLiveDocsMCP (11 tests)

| File | Tests | Covers |
|---|---|---|
| RuntimeIntrospectTests | 7 | Symlink version-file refusal (secret-exfil guard), uncovered-language fallback to the universal pin layer, canonical `mise.toml`, `PATH`-first executable resolution. |
| ProcessRunnerTests | 4 | Large output doesn't deadlock (concurrent pipe drain), exit code surfaces, timeout reported, SIGTERM→SIGKILL escalation. |
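The four ProcessRunner properties (no deadlock on large output, exit code surfaced, timeout reported, SIGTERM then SIGKILL) follow a common pattern. Here it is sketched with Python's `subprocess` module rather than the Swift code: `communicate()` drains stdout and stderr concurrently, and on POSIX `terminate()` sends SIGTERM while `kill()` sends SIGKILL. `run_with_timeout`, `RunResult`, and the 2-second grace period are hypothetical.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class RunResult:
    exit_code: int  # negative on POSIX when the child died from a signal
    stdout: bytes
    stderr: bytes
    timed_out: bool


def run_with_timeout(argv, timeout: float, grace: float = 2.0) -> RunResult:
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        # communicate() reads both pipes concurrently, so a child that fills
        # one pipe buffer cannot block while we wait on the other.
        out, err = proc.communicate(timeout=timeout)
        return RunResult(proc.returncode, out, err, timed_out=False)
    except subprocess.TimeoutExpired:
        proc.terminate()  # SIGTERM first: give the child a chance to clean up
        try:
            # Retrying communicate() after a timeout does not lose buffered output.
            out, err = proc.communicate(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()  # escalate to SIGKILL if SIGTERM was ignored
            out, err = proc.communicate()
        return RunResult(proc.returncode, out, err, timed_out=True)
```

Reporting `timed_out` separately from `exit_code` lets a caller distinguish "the tool failed" from "we stopped waiting", which a bare exit status cannot express.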
## Python (41 tests)

The look-up skill eval harness plus the vs-context7 comparison harness. These do not test the Swift engine;
they test whether the skill fires a LiveDocs query for varied prompts and answers with current information. See
[`evals/look-up/README.md`](evals/look-up/README.md).
| File | Tests | Covers |
|---|---|---|
| test_run_eval.py | 12 | Rate-threshold judging, the N=3 threshold-collapse guard, failed-run / inconclusive handling. |
| test_compare.py | 12 | vs-context7 freshness harness: version-token matching (boundary forms), symmetric scoring, table render, corpus↔sample sync. |
| test_oracle.py | 8 | `self_check` (fetches the registry at eval time, so it is rot-proof) / structural / golden oracles. |
| test_detect.py | 7 | `claude -p` stream-json parsing, `is_error` detection, LiveDocs trigger-signal recognition. |
| test_corpus.py | 2 | Corpus coverage guards (a golden case exists + a library-named adversarial negative exists). |
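Version-token matching has to respect token boundaries: an answer that mentions `14.2.0` must not count as naming `4.2.0`. One way to do that with regex lookarounds is sketched below; `mentions_version` and its exact boundary rules (optional leading `v`, no adjacent word characters, no further `.digit` component) are assumptions for illustration, not the rules `test_compare.py` actually checks.

```python
import re


def mentions_version(text: str, version: str) -> bool:
    """True if `version` appears in `text` as a standalone token.

    Accepts an optional "v" prefix (v4.2.0) and surrounding punctuation such
    as backticks, parentheses, or a sentence-ending period, but rejects a
    match embedded in a longer version (14.2.0, 4.2.0.1, 4.2.01).
    """
    pattern = (
        r"(?<![\w.])"         # not preceded by a word char or a dot
        r"v?"                 # optional "v" prefix
        + re.escape(version)  # dots in the version match literally
        + r"(?!\w|\.\d)"      # not followed by a word char or another ".digit"
    )
    return re.search(pattern, text, flags=re.IGNORECASE) is not None
```

Note that a pre-release suffix such as `4.2.0-rc1` still matches `4.2.0` here, because `-` counts as a boundary; whether it should is a policy choice for the harness.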
## CI

- CI runs `swift build` + `swift test` on every push and pull request.
- `scripts/release.sh` gates a release on a green `swift test`, a version-source match against the tag, and Developer ID signing + notarization.
- The Python eval is periodic / manual, not per-PR CI: it makes real `claude -p` calls (costly and stochastic), so it runs as a maintainer baseline rather than on the critical path.
Security- and robustness-critical surfaces were built test-first (TDD): URLSafety,
TextSanitize, ProcessRunner, and the eval harness each had a failing test before the
implementation existed. The Swift suite grew from 72 to 110 tests during the v0.7.0 hardening (which added
tests for the previously untested MCP shell layer), and the Python evals grew from 0 to 41 (skill eval + vs-context7).