Code review closure: 2 review rounds + Phase 08 prep (v0.1.0 ship-readiness)#1
Open
ayhammouda wants to merge 20 commits intomainfrom
Open
Code review closure: 2 review rounds + Phase 08 prep (v0.1.0 ship-readiness)#1ayhammouda wants to merge 20 commits intomainfrom
ayhammouda wants to merge 20 commits intomainfrom
Conversation
- Remove beautifulsoup4 and markdownify from base [project].dependencies - Add [project.optional-dependencies].build with the same pins - Guard bs4/markdownify imports in ingestion/sphinx_json.py with None sentinels - Add _ensure_build_deps() helper raising clear ImportError pointing to 'pip install mcp-server-python-docs[build]' - Call _ensure_build_deps() at top of html_to_markdown, extract_sections, extract_code_blocks, ingest_fjson_file, ingest_sphinx_json_dir - Update tests/test_packaging.py: remove bs4/markdownify from required base deps, add test_build_extras_present + test_build_deps_not_in_base - Regenerate uv.lock
- Remove detected_python_source field from AppContext dataclass - Drop the kwarg from app_lifespan's AppContext construction - Rename the now-unused detected_src local to _detected_src - detect_python_version tool already re-detects via _detect() inside the body, so the field was dead code
- ranker.py: wrap the 4 cursor-assigning spots (search_sections, search_symbols, search_examples, lookup_symbols_exact) in contextlib.closing() so cursors are released promptly on every path - services/content.py: same treatment for doc_row, id_row, section_rows queries in ContentService.get_docs() - ingestion/publish.py: wrap the FTS5 MATCH queries in run_smoke_tests for consistency inside the try/except blocks - Pure hygiene refactor — SQL and logic unchanged
- Change _VERSION_RE from (\d+\.\d+) to (?<!\d)(\d+\.\d+)(?!\d) so the match cannot steal digits from surrounding runs (e.g. parsing '3.2' from '13.2' or leaking across a multi-part version) - Accept: '3.13', 'Python 3.13.2' -> '3.13', 'cpython-3.13' -> '3.13', '11.2' (multi-digit major), '11.2.3' -> '11.2' - Reject: 'nope' -> None, '' -> None - Add tests/test_detection.py with 17 regression cases covering the anchored boundaries and the .python-version file detection path - NOTE: '1.23' is still matched as '1.23' by the regex — the plan's description of rejecting '1.23' as None is inconsistent with the proposed regex; documented in the test file
- Replace pv_file.read_text() with an open('r').read(1024) bounded read
- Guard against empty files: if stripped content is empty, fall through
to the next detection path instead of raising IndexError on
splitlines()[0]
- Hostile or accidentally-huge .python-version files (log rotation
misconfiguration, symlink to /dev/urandom, etc.) no longer waste memory
- The existing try/except Exception remains as a belt-and-suspenders catch
- Regression tests in tests/test_detection.py cover: plain file, 2MB
garbage, empty file, whitespace-only file
- Add a length < 2 short-circuit to classify_query so single-character identifier-shaped tokens (overwhelmingly stop-word garbage) never hit the symbol table lookup - The existing empty-string early-return still prevents DB calls for whitespace-only queries - Dotted queries continue to take the fast-path without touching the DB - Add Mock-based regression tests in tests/test_retrieval.py that assert symbol_exists_fn is NOT called for: '', ' ', 'a', 'asyncio.TaskGroup', 'foo bar' — and IS called exactly once for 'os'
- Anchor the highlight-div class regex with ^...$ so a hostile class like 'xhighlight-pythonfoo' can't leak through the unanchored pattern - Collapse pycon|pycon3 -> pycon3? and python|python3 -> python3? for both brevity and stricter matching - Switch the is_doctest detection from a substring match on the joined class string to an explicit class-name list (defense-in-depth; the anchored regex already guarantees the class list is known-safe) - All existing ingestion tests continue to pass
… builds (I-1) - Add regression test test_get_docs_returns_empty_content_for_symbols_only_doc that seeds a documents row with zero matching sections and asserts ContentService.get_docs returns GetDocsResult(content='', char_count=0, truncated=False, next_start_index=None) instead of raising PageNotFoundError - Test-only commit: the current page-level code path already handles the empty-sections case correctly (section_rows empty -> full_text = '' -> apply_budget returns the empty-content triple) — I-1 locks that behavior down so future refactors can't regress it
- Add COLLATE NOCASE to the equality and LIKE predicates in lookup_symbols_exact, and to the ORDER BY CASE so exact case-insensitive matches sort before prefix matches - Python-side score now compares row['qualified_name'].lower() to query.lower() so ordering stays consistent with the SQL collation - Wrap the SELECT in a sqlite3.OperationalError guard that mirrors the other three search functions (logs and returns [] instead of propagating as an unclassified Internal error) - Performance note in the docstring: COLLATE NOCASE may bypass the primary-key index; acceptable for v0.1.0 with ~20K symbols, a case- insensitive index can be added in v1.1 if profiling calls for it - Three regression tests: lowercase, uppercase, and exact-case scoring
- Add a _require_ctx(ctx) helper near the top of server.py that raises
ToolError('MCP context unavailable') when ctx is None, otherwise
returns ctx unchanged
- Change each of the four @mcp.tool shim signatures from
ctx: Context = None, # type: ignore[assignment]
to
ctx: Context | None = None,
(no type: ignore needed because the type now admits None)
- Add ctx = _require_ctx(ctx) as the first line of each tool body:
search_docs, get_docs, list_versions, detect_python_version
- Create tests/test_server.py covering the helper directly and the
structural invariant that every tool shim calls the guard
- pyright stays clean; the existing narrow-via-function pattern lets
ctx.request_context.lifespan_context access type-check after the guard
…I-2, M-7) M-7 (consolidate): publish_index() previously opened three separate read-write connections across the lifetime of a publish (initial record_ingestion_run, the failure UPDATE, and the success UPDATE). Rewrite to use a single connection across all three status updates so we don't repeatedly tear down and rebuild the WAL superstructure. I-2 (WAL cleanup): add finalize_for_swap(conn) in storage/db.py that does 'PRAGMA wal_checkpoint(TRUNCATE)' followed by 'PRAGMA journal_mode = DELETE'. Call it from publish_index() after the final commit and BEFORE atomic_swap. Without this, the -wal and -shm sidecars stay alive during the rename and leak into the cache directory, defeating atomic-swap semantics. Tests: - TestWalCleanupOnSwap.test_publish_index_leaves_no_wal_sidecars: full-content build, monkeypatches get_cache_dir / get_index_path to a tmp dir, asserts index.db is present and no -wal/-shm files remain - TestWalCleanupOnSwap.test_publish_index_second_build_replaces_cleanly: second publish exercises the .previous path and also asserts no sidecars remain Both findings fused into one commit because they operate on the same publish_index() control flow and the WAL finalize only makes sense inside the consolidated connection.
After the verification gate in Task 12, two issues surfaced that are direct consequences of earlier commits in this same plan: - I-4 sentinel pattern introduced 7 new pyright errors in sphinx_json.py because the runtime BeautifulSoup/Tag/md = None pattern broadened the inferred types. Fix: move the real imports behind TYPE_CHECKING so static checkers always see the bs4/markdownify types, and keep the runtime probe in a try/except that only flips the _BUILD_DEPS_AVAILABLE flag. Verified at runtime: importing the module with both deps missing still works and _ensure_build_deps() raises the expected actionable ImportError. - M-2 introduced an unused pathlib.Path import in tests/test_detection.py after I removed a test helper in an earlier iteration. Ruff I001. Both issues are strictly mop-up from the prior 11 atomic commits; this lands as a separate commit (not amend) per the plan's verification protocol. Final verification gate after this commit: - uv run pytest -q -> 243 passed, 3 skipped - uv run ruff check . -> All checks passed! - uv run pyright -> 9 pre-existing errors, 0 new
…non-symbol kinds (M-5)
… + finalize + I-3 note)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Bundle of 19 commits closing out two code-review rounds (Round 2 → quick task 260416-u2r, Round 3 → 260416-v0s) plus Phase 08 pre-release prep (version detection tool, tool-validation/lifecycle hardening, CI-breaking lint/type fixes). All quality gates green.
pytestruff check .pyrightPhase 08 prep (5 commits)
9206c09validate-corpusrespects itfaf6a78detect_python_versiontool + auto-defaultget_docsto user's Python31c0525dbf081c7f9e84csphinx_json.pyReview Round 2 closure (12 commits, quick task
260416-u2r)4 Important + 8 Minor findings from
CODE-REVIEW.md:ba41707bs4+markdownifyto[build]extrae95f106AppContext.detected_python_source5c3df9bcontextlib.closingon sqlite cursors392eb23_VERSION_REto prevent substring over-matchda4b23c.python-versionread to 1 KB2850e53classify_query(defense-in-depth)d1be3f9highlight-*regex anchoring6a72fe0bc3c674in Round 3)a598756b09befe_require_ctxguard in all 4 tool shims32ef625finalize_for_swap()WAL collapse before atomic swap23063d8Review Round 3 closure (4 fix commits + merge + docs, quick task
260416-v0s)Second-pass review caught I-1 was test-only (locked in the old behavior instead of fixing it) and M-5 was gated in the wrong layer. Cleaned up plus a cosmetic finalize-on-failure path:
bc3c674get_docsfalls back todocuments.content_textwhensection_rowsis empty21ebc46classify_queryatservices/search.pycall site for non-symbol kindsf1b368bfinalize_for_swapon smoke-test failure path too (no orphan sidecars after failed builds)8357f38COLLATE NOCASEvsnormalized_name)Verification
uv run pytest -q— 246 passed, 3 skippeduv run ruff check .— cleanuv run pyright— 9 pre-existing errors, 0 newsuperpowers:code-reviewersubagent (Round 1 → fixed in pre-session work; Round 2 → fixed in260416-u2r; Round 3 → fixed in260416-v0s)os.renameinrollback()) remains explicitly deferred perREVIEW-FIX.mdKey Decisions
finalize_for_swap(conn)helper opt-in rather than changingget_readwrite_connectiondefault, so onlypublish_indexpays the WAL→DELETE switch before atomic swapbeautifulsoup4/markdownifyimports gated with a split-import so pyright sees real types even when[build]extra isn't installedCOLLATE NOCASE— chosen overnormalized_namecolumn query; correctness-equivalent, index behavior identical (neither is indexed for scan pattern);CREATE INDEX idx_symbols_normalizeddeferred to v1.1 if neededservices/search.pycall site (correctness) and insideclassify_queryas a length-2 short-circuit (redundant safety)Out of scope (preserved)
os.rename()overwrite behavior inrollback()— deferred perREVIEW-FIX.md; covered by a v1.1 note🤖 Generated with Claude Code