feat: Index/Map Architecture — SQLite as index, .md files as content store #41
Merged
…s content store

Phase 1 of the Index/Map Architecture:
- SQLite stores empty post_content for markdown-sourced posts
- New _markdown_file_index table maps post_id → file_path
- Boot parses frontmatter only (skip body), cutting boot I/O ~90%
- Driver lazy-loads content from .md files on SELECT queries
- Write engine updates file index after writing .md files
- All existing functionality preserved; lazy-load is transparent
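
The frontmatter-only boot and the lazy body read described in this commit could look roughly like the sketch below. The function names are borrowed from the PR description, but the bodies are assumptions: this version only handles flat `key: value` frontmatter and is not the PR's actual implementation.

```php
<?php
// Hypothetical sketch: names follow the PR description, bodies are assumed.

/**
 * Read only the frontmatter block, line by line, stopping at the closing
 * '---' so the body is never read during boot.
 *
 * @return array<string,string> Flat key => value pairs (no nested YAML).
 */
function read_frontmatter_only(string $path): array
{
    if (!is_readable($path)) {
        return [];
    }
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return [];
    }

    $meta  = [];
    $first = fgets($handle);
    // A file without an opening '---' has no frontmatter.
    if ($first === false || rtrim($first) !== '---') {
        fclose($handle);
        return [];
    }

    while (($line = fgets($handle)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line === '---') {
            break; // Closing delimiter: stop before the body.
        }
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $meta[trim($parts[0])] = trim($parts[1]);
        }
    }

    fclose($handle);
    return $meta;
}

/**
 * Lazy-load the body: everything after the closing '---'.
 */
function read_content_from_file(string $path): string
{
    $raw = is_readable($path) ? file_get_contents($path) : false;
    if ($raw === false) {
        return '';
    }
    // Opening '---' line, frontmatter, closing '---' line.
    if (preg_match('/\A---\r?\n.*?\r?\n---\r?\n?/s', $raw, $m)) {
        return ltrim(substr($raw, strlen($m[0])), "\r\n");
    }
    return $raw; // No frontmatter: the whole file is content.
}
```

Boot would call only the first function for every file; the second runs on demand when a query actually needs `post_content`.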
Switch from :memory: to persistent on-disk SQLite index file.

On cold boot (no file), full load from disk as before.
On warm boot (file exists), incremental sync only:
- _json_file_manifest tracks JSON file mtimes: only reload changed tables
- _markdown_file_index tracks .md file mtimes: only re-parse changed posts
- Detect new files (INSERT), changed files (UPDATE), deleted files (DELETE)
- Falls back to full reload if incremental sync fails

SQLite index file: wp-content/markdown-index.sqlite (~700KB for 43 posts)
WAL journal mode for concurrent read/write safety.
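
The warm-boot classification step can be illustrated with a small mtime diff. This is a minimal sketch under stated assumptions: `plan_incremental_sync()` and `scan_markdown_mtimes()` are hypothetical helper names, and the real sync reads its recorded mtimes from `_markdown_file_index` rather than from a plain array.

```php
<?php
// Hypothetical helpers illustrating the warm-boot diff; not the PR's code.

/**
 * Compare recorded mtimes with current ones and classify each file.
 *
 * @param array<string,int> $indexed file_path => mtime recorded at last boot
 * @param array<string,int> $on_disk file_path => current mtime
 * @return array{insert: string[], update: string[], delete: string[]}
 */
function plan_incremental_sync(array $indexed, array $on_disk): array
{
    $plan = ['insert' => [], 'update' => [], 'delete' => []];

    foreach ($on_disk as $path => $mtime) {
        if (!array_key_exists($path, $indexed)) {
            $plan['insert'][] = $path; // New file: INSERT
        } elseif ($indexed[$path] !== $mtime) {
            $plan['update'][] = $path; // mtime changed: UPDATE
        }
    }

    foreach (array_keys($indexed) as $path) {
        if (!array_key_exists($path, $on_disk)) {
            $plan['delete'][] = $path; // File gone: DELETE
        }
    }

    return $plan;
}

/** Collect current mtimes for every .md file directly inside a directory. */
function scan_markdown_mtimes(string $dir): array
{
    $mtimes = [];
    foreach (glob(rtrim($dir, '/') . '/*.md') ?: [] as $path) {
        $mtimes[$path] = filemtime($path);
    }
    return $mtimes;
}
```

An empty plan means the warm boot has nothing to re-parse; if any step of applying the plan fails, the commit's fallback is a full reload.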
…systems

WAL mode requires shared memory files (-shm) that don't work across container/host filesystem boundaries (e.g. Studio). The PRAGMA journal_mode query was corrupting the SQLite file on warm boot.

Changed to null (SQLite default DELETE mode), which is safe everywhere. WAL can still be enabled via the SQLITE_JOURNAL_MODE constant if needed.
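
Opting back into WAL would then be a one-line configuration change. The fragment below is a sketch: the constant name comes from the commit message, while its placement in `wp-config.php` and the exact accepted value (`'WAL'`) are assumptions.

```php
<?php
// Assumed placement: wp-config.php. The value format is an assumption.
// Only enable WAL when the index file sits on a filesystem that supports
// SQLite's shared-memory (-shm) file, i.e. not across a container/host boundary.
define('SQLITE_JOURNAL_MODE', 'WAL');
```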
This was referenced Apr 16, 2026
When the persistent SQLite index file is corrupted (e.g. unclean shutdown, filesystem issues), the site now self-heals:
1. Detects corruption via 'file is not a database' exception
2. Deletes the corrupted .sqlite file + any journal files
3. Falls back to cold boot (full rebuild from .md/JSON files)
4. Logs the recovery for admin visibility

Extracted boot_connection() from db_connect() so the connection setup can be retried cleanly after deleting the corrupted file.
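
The four recovery steps can be sketched as a retry wrapper. This is an illustration under assumptions: `connect_with_self_heal()` is a hypothetical name, the `$connect` callable stands in for `boot_connection()`, and the list of journal suffixes is a guess at which side files might exist.

```php
<?php
// Hypothetical wrapper; $connect stands in for the PR's boot_connection().

/**
 * Open the index; if it is corrupted, delete it and retry once.
 *
 * @param string   $db_path Path to the persistent .sqlite index file.
 * @param callable $connect Takes the path, returns a connection, throws on failure.
 * @return mixed Whatever $connect returns.
 */
function connect_with_self_heal(string $db_path, callable $connect)
{
    try {
        return $connect($db_path);
    } catch (Throwable $e) {
        // Step 1: only the specific corruption signature is recoverable.
        if (stripos($e->getMessage(), 'file is not a database') === false) {
            throw $e;
        }

        // Step 2: remove the index and any side files left next to it.
        foreach (['', '-journal', '-wal', '-shm'] as $suffix) {
            if (file_exists($db_path . $suffix)) {
                unlink($db_path . $suffix);
            }
        }

        // Step 4: leave a trace for the site admin.
        error_log("Markdown index corrupted; rebuilding from source files: {$db_path}");

        // Step 3: with no file present, this connect is a cold boot.
        return $connect($db_path);
    }
}
```

Any other exception is rethrown untouched, so unrelated failures (permissions, missing extension) are not masked by a silent rebuild.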
chubes4 added a commit that referenced this pull request Apr 21, 2026
After the Index/Map Architecture (PR #41), post_content is stored as an empty string in SQLite and lazy-loaded from .md files on demand. WHERE post_content LIKE '%foo%' silently matches nothing, breaking WP default search (?s=foo) and any plugin that queries post_content directly.

Fix it without reintroducing the coupling PR #41 removed: grep the .md files on disk instead of rebuilding a full-text index inside SQLite.

Changes:
- New WP_Markdown_Search class encapsulates the grep logic. Iterates _markdown_file_index (post_id -> file_path) and case-insensitively matches the needle against each source file. Per-request cache keyed by lowercased needle.
- Driver intercepts SELECT queries with post_content LIKE clauses and rewrites each one into (table.)?ID IN (1,2,3) based on grep results, or 0=1 when nothing matches. Only the %needle% contains-match shape is rewritten; prefix, suffix, or embedded-wildcard patterns are left untouched for SQLite to handle.
- Extension point: markdown_db_search_matching_ids filter lets an FTS5, Meilisearch, or Elasticsearch backend short-circuit the default grep without patching core.
- 20 pure-PHP smoke tests covering grep correctness, multi-word AND queries, escaped LIKE wildcards, table prefix preservation, and unsupported pattern passthrough. Run via: php tests/smoke-search.php

Closes #43
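
The rewrite described in this commit can be sketched as a grep plus a regex substitution. The function names here are hypothetical, and the sketch is deliberately narrower than the commit: it skips the per-request cache, the `markdown_db_search_matching_ids` filter, and escaped LIKE wildcards (needles containing `%`, `_`, a quote, or a backslash are simply left untouched).

```php
<?php
// Hypothetical sketch of the grep-and-rewrite idea; not WP_Markdown_Search itself.

/**
 * Return the post IDs whose source file contains $needle (case-insensitive).
 *
 * @param array<int,string> $file_index post_id => file_path
 * @return int[]
 */
function grep_post_ids(array $file_index, string $needle): array
{
    $ids = [];
    foreach ($file_index as $post_id => $path) {
        $text = is_readable($path) ? file_get_contents($path) : false;
        if ($text !== false && stripos($text, $needle) !== false) {
            $ids[] = (int) $post_id;
        }
    }
    return $ids;
}

/**
 * Rewrite each post_content LIKE '%needle%' clause into ID IN (...),
 * or 0=1 when nothing matches. Other LIKE shapes pass through unchanged.
 */
function rewrite_content_like(string $sql, array $file_index): string
{
    // Optional table qualifier, then a pure contains-match with no inner
    // wildcards, quotes, or backslashes in the needle.
    $pattern = "/((?:\\w+\\.)?)post_content\\s+LIKE\\s+'%([^%_'\\\\]+)%'/i";

    return preg_replace_callback(
        $pattern,
        function (array $m) use ($file_index): string {
            $ids = grep_post_ids($file_index, $m[2]);
            if ($ids === []) {
                return '0=1';
            }
            // Keep the table qualifier (e.g. "wp_posts.") in front of ID.
            return $m[1] . 'ID IN (' . implode(',', $ids) . ')';
        },
        $sql
    );
}
```

Because the IDs come from integer casts of index keys, the generated `IN (...)` list needs no further escaping.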
Summary
Phase 1 of the Index/Map Architecture: SQLite becomes an index, markdown files become the only content store. No more duplicating post_content into SQLite.
- post_content is stored as empty string in SQLite for markdown-sourced posts
- _markdown_file_index table maps post_id → file_path, file_mtime, file_size
- post_content … .md files
Changes
- class-wp-markdown-storage.php: parse_file() gains $metadata_only flag; new read_frontmatter_only() (line-by-line, stops at closing ---); new public read_content_from_file() for lazy-loading; posts carry _source_filepath
- class-wp-markdown-loader.php: load_posts() calls get_all_posts(true), inserts empty post_content, creates & populates _markdown_file_index
- class-wp-markdown-driver.php: query() intercepts SELECT results on wp_posts and resolves content via resolve_content(); file index cache loaded once into memory; update_file_index() / remove_from_file_index() for write-path
- class-wp-markdown-write-engine.php: updates file index after writing .md files; removes index entry on DELETE
Design doc
See wiki article: Markdown DB: Index Architecture Design (ID 128)
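
As a rough illustration of the driver-side lazy load listed under Changes, the sketch below fills `post_content` on result rows from an in-memory file index. The name `resolve_content()` comes from the PR description, but this signature, the row shape, and the injected `$read_body` reader are all assumptions.

```php
<?php
// Hypothetical signature; the PR's resolve_content() may differ.

/**
 * Fill in post_content for result rows whose content lives in a .md file.
 *
 * @param array<int,array<string,mixed>> $rows       Rows from a SELECT on the posts table.
 * @param array<int,string>              $file_index post_id => file_path, loaded once per request.
 * @param callable                       $read_body  Assumed reader: path => body string.
 * @return array<int,array<string,mixed>>
 */
function resolve_content(array $rows, array $file_index, callable $read_body): array
{
    foreach ($rows as $i => $row) {
        // Skip rows that did not select ID or post_content, or already carry content.
        if (
            !isset($row['ID'])
            || !array_key_exists('post_content', $row)
            || $row['post_content'] !== ''
        ) {
            continue;
        }

        $path = $file_index[(int) $row['ID']] ?? null;
        if ($path !== null) {
            $rows[$i]['post_content'] = $read_body($path);
        }
    }
    return $rows;
}
```

Queries that never select `post_content` (counts, ID lists) touch no files at all, which is why the lazy load can stay transparent to callers.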
Testing
All existing functionality preserved — lazy-load is transparent:
- wp post list --post_type=wiki --format=count → 42 ✅
- wp post get 58 --field=post_content → full article content ✅
- wp intelligence wiki read karpathy-llm-wiki-pattern → full content ✅
- wp intelligence wiki tree → full tree ✅
Next phases
read_frontmatter_only())