
Code review closure: 2 review rounds + Phase 08 prep (v0.1.0 ship-readiness)#1

Open
ayhammouda wants to merge 20 commits into main from ship/code-review-closure

Conversation

@ayhammouda (Owner)

Summary

Bundle of 19 commits closing out two code-review rounds (Round 2 → quick task 260416-u2r, Round 3 → 260416-v0s) plus Phase 08 pre-release prep (version detection tool, tool-validation/lifecycle hardening, CI-breaking lint/type fixes). All quality gates green.

| Gate | Before this bundle | After |
| --- | --- | --- |
| pytest | 209 passed | 246 passed, 3 skipped (+37 regression tests) |
| ruff check . | broken (E501/I001/F541) | clean |
| pyright | broken | 9 pre-existing, 0 new |

Phase 08 prep (5 commits)

| Commit | Scope |
| --- | --- |
| 9206c09 | persist symbol-only build mode so validate-corpus respects it |
| faf6a78 | add detect_python_version tool + auto-default get_docs to user's Python |
| 31c0525 | harden tool validation and lifecycle cleanup |
| dbf081c | resolve ruff E501/I001/F541 errors breaking CI |
| 7f9e84c | resolve 3 pyright errors in sphinx_json.py |

Review Round 2 closure (12 commits, quick task 260416-u2r)

4 Important + 8 Minor findings from CODE-REVIEW.md:

| Finding | Commit | What |
| --- | --- | --- |
| I-4 | ba41707 | move bs4 + markdownify to [build] extra |
| M-8 | e95f106 | drop dead AppContext.detected_python_source |
| M-1 | 5c3df9b | contextlib.closing on sqlite cursors |
| M-2 | 392eb23 | anchor _VERSION_RE to prevent substring over-match |
| M-3 | da4b23c | bound .python-version read to 1 KB |
| M-5 | 2850e53 | length-2 short-circuit inside classify_query (defense-in-depth) |
| M-6 | d1be3f9 | tighten highlight-* regex anchoring |
| I-1 (initial) | 6a72fe0 | superseded by bc3c674 in Round 3 |
| I-3 | a598756 | case-insensitive symbol fast-path |
| M-4 | b09befe | _require_ctx guard in all 4 tool shims |
| I-2 + M-7 | 32ef625 | single RW conn + finalize_for_swap() WAL collapse before atomic swap |
| fixup | 23063d8 | pyright/ruff fixup for I-4 sentinels + M-2 tests |

Review Round 3 closure (4 fix commits + merge + docs, quick task 260416-v0s)

Second-pass review caught that I-1 was test-only (it locked in the old behavior instead of fixing it) and that M-5 was gated in the wrong layer. Both were cleaned up, plus a cosmetic finalize-on-failure fix:

| Finding | Commit | What |
| --- | --- | --- |
| I-1 (actual fix) | bc3c674 | get_docs falls back to documents.content_text when section_rows is empty |
| M-5 (call-site gate) | 21ebc46 | gate classify_query at services/search.py call site for non-symbol kinds |
| round3-minor | f1b368b | finalize_for_swap on smoke-test failure path too (no orphan sidecars after failed builds) |
| docs | 8357f38 | ratify I-3 deviation (COLLATE NOCASE vs normalized_name) |

Verification

  • uv run pytest -q — 246 passed, 3 skipped
  • uv run ruff check . — clean
  • uv run pyright — 9 pre-existing errors, 0 new
  • 3 independent code-review passes via superpowers:code-reviewer subagent (Round 1 → fixed in pre-session work; Round 2 → fixed in 260416-u2r; Round 3 → fixed in 260416-v0s)
  • IN-01 (Windows os.rename in rollback()) remains explicitly deferred per REVIEW-FIX.md

Key Decisions

  • I-2 + M-7 fused: finalize_for_swap(conn) is an opt-in helper rather than a change to the get_readwrite_connection default, so only publish_index pays the WAL→DELETE switch before the atomic swap
  • I-4 TYPE_CHECKING pattern: beautifulsoup4/markdownify imports are gated with a split-import so pyright sees real types even when the [build] extra isn't installed
  • I-3 COLLATE NOCASE — chosen over normalized_name column query; correctness-equivalent, index behavior identical (neither is indexed for scan pattern); CREATE INDEX idx_symbols_normalized deferred to v1.1 if needed
  • M-5 defense-in-depth — gate exists at both the services/search.py call site (correctness) and inside classify_query as a length-2 short-circuit (redundant safety)

Out of scope (preserved)

  • IN-01 Windows os.rename() overwrite behavior in rollback() — deferred per REVIEW-FIX.md; covered by a v1.1 note
  • Phase 08 remaining plans — this PR covers Phase 08 prep, not the full phase (RELEASE.md polish, release workflow hardening already merged separately)

🤖 Generated with Claude Code

Commit ba41707 (I-4): move bs4 + markdownify to [build] extra

- Remove beautifulsoup4 and markdownify from base [project].dependencies
- Add [project.optional-dependencies].build with the same pins
- Guard bs4/markdownify imports in ingestion/sphinx_json.py with None sentinels
- Add _ensure_build_deps() helper raising clear ImportError pointing to
  'pip install mcp-server-python-docs[build]'
- Call _ensure_build_deps() at top of html_to_markdown, extract_sections,
  extract_code_blocks, ingest_fjson_file, ingest_sphinx_json_dir
- Update tests/test_packaging.py: remove bs4/markdownify from required base
  deps, add test_build_extras_present + test_build_deps_not_in_base
- Regenerate uv.lock

Commit e95f106 (M-8): drop dead AppContext.detected_python_source

- Remove detected_python_source field from AppContext dataclass
- Drop the kwarg from app_lifespan's AppContext construction
- Rename the now-unused detected_src local to _detected_src
- detect_python_version tool already re-detects via _detect() inside the
  body, so the field was dead code

Commit 5c3df9b (M-1): contextlib.closing on sqlite cursors

- ranker.py: wrap the 4 cursor-assigning spots (search_sections,
  search_symbols, search_examples, lookup_symbols_exact) in
  contextlib.closing() so cursors are released promptly on every path
- services/content.py: same treatment for doc_row, id_row, section_rows
  queries in ContentService.get_docs()
- ingestion/publish.py: wrap the FTS5 MATCH queries in run_smoke_tests
  for consistency inside the try/except blocks
- Pure hygiene refactor — SQL and logic unchanged
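The contextlib.closing treatment is the stock stdlib pattern; a minimal self-contained illustration:

```python
import contextlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sections (id INTEGER, title TEXT)")
conn.execute("INSERT INTO sections VALUES (1, 'intro')")

# contextlib.closing guarantees cursor.close() runs on every exit path,
# including exceptions, without changing the query or its results.
with contextlib.closing(conn.execute("SELECT title FROM sections")) as cur:
    rows = cur.fetchall()
```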

Commit 392eb23 (M-2): anchor _VERSION_RE to prevent substring over-match

- Change _VERSION_RE from (\d+\.\d+) to (?<!\d)(\d+\.\d+)(?!\d) so
  the match cannot steal digits from surrounding runs (e.g. parsing '3.2'
  from '13.2' or leaking across a multi-part version)
- Accept: '3.13', 'Python 3.13.2' -> '3.13', 'cpython-3.13' -> '3.13',
  '11.2' (multi-digit major), '11.2.3' -> '11.2'
- Reject: 'nope' -> None, '' -> None
- Add tests/test_detection.py with 17 regression cases covering the
  anchored boundaries and the .python-version file detection path
- NOTE: '1.23' is still matched as '1.23' by the regex — the plan's
  description of rejecting '1.23' as None is inconsistent with the
  proposed regex; documented in the test file

Commit da4b23c (M-3): bound .python-version read to 1 KB

- Replace pv_file.read_text() with an open('r').read(1024) bounded read
- Guard against empty files: if stripped content is empty, fall through
  to the next detection path instead of raising IndexError on
  splitlines()[0]
- Hostile or accidentally-huge .python-version files (log rotation
  misconfiguration, symlink to /dev/urandom, etc.) no longer waste memory
- The existing try/except Exception remains as a belt-and-suspenders catch
- Regression tests in tests/test_detection.py cover: plain file, 2MB
  garbage, empty file, whitespace-only file

Commit 2850e53 (M-5): length-2 short-circuit inside classify_query

- Add a length < 2 short-circuit to classify_query so single-character
  identifier-shaped tokens (overwhelmingly stop-word garbage) never hit
  the symbol table lookup
- The existing empty-string early-return still prevents DB calls for
  whitespace-only queries
- Dotted queries continue to take the fast-path without touching the DB
- Add Mock-based regression tests in tests/test_retrieval.py that assert
  symbol_exists_fn is NOT called for: '', '   ', 'a', 'asyncio.TaskGroup',
  'foo bar' — and IS called exactly once for 'os'
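The gating order can be illustrated with a toy classify_query and a Mock stand-in for the symbol lookup; the real signature and the kind names ("symbol"/"fulltext") are assumptions:

```python
from unittest.mock import Mock


def classify_query(query: str, symbol_exists_fn) -> str:
    """Illustrative gating order; the real classify_query may differ."""
    q = query.strip()
    if not q:
        return "fulltext"   # empty/whitespace: no DB call
    if " " in q:
        return "fulltext"   # multi-word: not identifier-shaped
    if "." in q:
        return "symbol"     # dotted fast-path, no DB call
    if len(q) < 2:
        return "fulltext"   # length-2 short-circuit: skip the lookup
    return "symbol" if symbol_exists_fn(q) else "fulltext"


lookup = Mock(return_value=True)
for never_hits_db in ["", "   ", "a", "asyncio.TaskGroup", "foo bar"]:
    classify_query(never_hits_db, lookup)
calls_before = lookup.call_count
result = classify_query("os", lookup)
```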

Commit d1be3f9 (M-6): tighten highlight-* regex anchoring

- Anchor the highlight-div class regex with ^...$ so a hostile class
  like 'xhighlight-pythonfoo' can't leak through the unanchored pattern
- Collapse pycon|pycon3 -> pycon3? and python|python3 -> python3? for
  both brevity and stricter matching
- Switch the is_doctest detection from a substring match on the joined
  class string to an explicit class-name list (defense-in-depth; the
  anchored regex already guarantees the class list is known-safe)
- All existing ingestion tests continue to pass
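A sketch of the anchored class matcher; the exact class-name set is inferred from the commit text, not the real ingestion module:

```python
import re

# ^...$ anchoring plus the collapsed alternations pycon3? / python3?:
# a hostile class like 'xhighlight-pythonfoo' can no longer leak through.
_HIGHLIGHT_RE = re.compile(r"^highlight-(pycon3?|python3?)$")


def is_python_highlight(css_class: str) -> bool:
    return _HIGHLIGHT_RE.match(css_class) is not None
```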

Commit 6a72fe0: … builds (I-1)

- Add regression test test_get_docs_returns_empty_content_for_symbols_only_doc
  that seeds a documents row with zero matching sections and asserts
  ContentService.get_docs returns GetDocsResult(content='', char_count=0,
  truncated=False, next_start_index=None) instead of raising
  PageNotFoundError
- Test-only commit: the current page-level code path already handles the
  empty-sections case correctly (section_rows empty -> full_text = '' ->
  apply_budget returns the empty-content triple) — I-1 locks that
  behavior down so future refactors can't regress it

Commit a598756 (I-3): case-insensitive symbol fast-path

- Add COLLATE NOCASE to the equality and LIKE predicates in
  lookup_symbols_exact, and to the ORDER BY CASE so exact
  case-insensitive matches sort before prefix matches
- Python-side score now compares row['qualified_name'].lower() to
  query.lower() so ordering stays consistent with the SQL collation
- Wrap the SELECT in a sqlite3.OperationalError guard that mirrors the
  other three search functions (logs and returns [] instead of
  propagating as an unclassified Internal error)
- Performance note in the docstring: COLLATE NOCASE may bypass the
  primary-key index; acceptable for v0.1.0 with ~20K symbols, a case-
  insensitive index can be added in v1.1 if profiling calls for it
- Three regression tests: lowercase, uppercase, and exact-case scoring
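The COLLATE NOCASE behavior is plain SQLite; a self-contained illustration of the case-insensitive predicate and the exact-case-first ordering (table shape is a toy stand-in for the real symbols schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE symbols (qualified_name TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO symbols VALUES (?)",
    [("asyncio.TaskGroup",), ("os.path.join",)],
)

# COLLATE NOCASE on the predicate makes equality case-insensitive;
# the ORDER BY CASE ranks an exact-case match ahead of a folded one.
rows = conn.execute(
    """
    SELECT qualified_name FROM symbols
    WHERE qualified_name = ? COLLATE NOCASE
    ORDER BY CASE WHEN qualified_name = ? THEN 0 ELSE 1 END
    """,
    ("asyncio.taskgroup", "asyncio.taskgroup"),
).fetchall()
```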

Commit b09befe (M-4): _require_ctx guard in all 4 tool shims

- Add a _require_ctx(ctx) helper near the top of server.py that raises
  ToolError('MCP context unavailable') when ctx is None, otherwise
  returns ctx unchanged
- Change each of the four @mcp.tool shim signatures from
    ctx: Context = None,  # type: ignore[assignment]
  to
    ctx: Context | None = None,
  (no type: ignore needed because the type now admits None)
- Add ctx = _require_ctx(ctx) as the first line of each tool body:
  search_docs, get_docs, list_versions, detect_python_version
- Create tests/test_server.py covering the helper directly and the
  structural invariant that every tool shim calls the guard
- pyright stays clean; the existing narrow-via-function pattern lets
  ctx.request_context.lifespan_context access type-check after the guard

Commit 32ef625: …I-2, M-7)

M-7 (consolidate): publish_index() previously opened three separate
read-write connections across the lifetime of a publish (initial
record_ingestion_run, the failure UPDATE, and the success UPDATE).
Rewrite to use a single connection across all three status updates so
we don't repeatedly tear down and rebuild the WAL superstructure.

I-2 (WAL cleanup): add finalize_for_swap(conn) in storage/db.py that
does 'PRAGMA wal_checkpoint(TRUNCATE)' followed by
'PRAGMA journal_mode = DELETE'. Call it from publish_index() after the
final commit and BEFORE atomic_swap. Without this, the -wal and -shm
sidecars stay alive during the rename and leak into the cache directory,
defeating atomic-swap semantics.

Tests:
- TestWalCleanupOnSwap.test_publish_index_leaves_no_wal_sidecars:
  full-content build, monkeypatches get_cache_dir / get_index_path to
  a tmp dir, asserts index.db is present and no -wal/-shm files remain
- TestWalCleanupOnSwap.test_publish_index_second_build_replaces_cleanly:
  second publish exercises the .previous path and also asserts no
  sidecars remain

Both findings fused into one commit because they operate on the same
publish_index() control flow and the WAL finalize only makes sense
inside the consolidated connection.
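The WAL collapse is two PRAGMAs; a self-contained sketch showing the sidecars disappearing before a would-be swap (helper name follows the commit, the demo scaffolding is illustrative):

```python
import sqlite3
import tempfile
from pathlib import Path


def finalize_for_swap(conn: sqlite3.Connection) -> None:
    """Checkpoint the WAL, then switch to DELETE journaling so no
    -wal/-shm sidecars survive the atomic rename."""
    conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")
    conn.execute("PRAGMA journal_mode = DELETE")


with tempfile.TemporaryDirectory() as d:
    db = Path(d) / "index.db"
    conn = sqlite3.connect(db)
    conn.execute("PRAGMA journal_mode = WAL")
    conn.execute("CREATE TABLE t (x)")
    conn.commit()
    wal_existed = (Path(d) / "index.db-wal").exists()
    finalize_for_swap(conn)
    conn.close()
    # After finalize, only index.db remains; rename is now truly atomic.
    sidecars = sorted(p.name for p in Path(d).glob("index.db-*"))
```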

Commit 23063d8 (fixup): pyright/ruff fixup for I-4 sentinels + M-2 tests

After the verification gate in Task 12, two issues surfaced that are
direct consequences of earlier commits in this same plan:

- I-4 sentinel pattern introduced 7 new pyright errors in sphinx_json.py
  because the runtime BeautifulSoup/Tag/md = None pattern broadened the
  inferred types. Fix: move the real imports behind TYPE_CHECKING so
  static checkers always see the bs4/markdownify types, and keep the
  runtime probe in a try/except that only flips the
  _BUILD_DEPS_AVAILABLE flag. Verified at runtime: importing the module
  with both deps missing still works and _ensure_build_deps() raises
  the expected actionable ImportError.
- M-2 introduced an unused pathlib.Path import in tests/test_detection.py
  after I removed a test helper in an earlier iteration. Ruff I001.

Both issues are strictly mop-up from the prior 11 atomic commits; this
lands as a separate commit (not amend) per the plan's verification
protocol.

Final verification gate after this commit:
- uv run pytest -q        -> 243 passed, 3 skipped
- uv run ruff check .     -> All checks passed!
- uv run pyright          -> 9 pre-existing errors, 0 new