ChWehner/EnerMind

LLM Wiki

A CLI tool that ingests PDF documents into a structured, LLM-maintained Markdown wiki. An LLM reads each document, extracts key concepts, and produces interlinked wiki pages with source citations — building a knowledge base incrementally.

How It Works

PDF ─→ Marker (OCR + layout) ─→ structured blocks ─→ LLM prompt ─→ wiki pages
  1. Parse: Marker extracts text with OCR, layout detection, and page numbers
  2. Prompt: The parsed content is combined with the wiki schema and the current index into a single LLM prompt
  3. Generate: The LLM (served by Ollama) produces wiki pages with YAML frontmatter, cross-links ([[page-name]]), and source citations (source.pdf, p.3)
  4. Store: Pages are written to a flat Markdown wiki with an auto-maintained index.md and an append-only log.md
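
The Store step can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual wiki_store.py: the helper name store_pages, the index line format, and the log line format are all hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path


def store_pages(wiki_dir: Path, pages: dict[str, str], source: str) -> None:
    """Write pages into a flat wiki, rebuild index.md, append to log.md.

    Hypothetical helper: the name and the file formats are illustrative only.
    """
    wiki_dir.mkdir(parents=True, exist_ok=True)

    # One Markdown file per page, all in the same flat directory.
    for name, body in pages.items():
        (wiki_dir / f"{name}.md").write_text(body, encoding="utf-8")

    # Rebuild the index from what is on disk, skipping bookkeeping files.
    reserved = {"index", "log", "_schema"}
    names = sorted(p.stem for p in wiki_dir.glob("*.md") if p.stem not in reserved)
    index = "# Index\n\n" + "".join(f"- [[{n}]]\n" for n in names)
    (wiki_dir / "index.md").write_text(index, encoding="utf-8")

    # Append-only log: "a" mode adds a line and never rewrites earlier entries.
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with (wiki_dir / "log.md").open("a", encoding="utf-8") as log:
        log.write(f"- {stamp} ingested {source}: {', '.join(sorted(pages))}\n")
```

Rebuilding the index from the directory listing, rather than editing it incrementally, keeps index.md consistent with the files on disk even if an earlier run was interrupted.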

Prerequisites

  • Python 3.12+
  • uv
  • Ollama running locally (recommended: native install on macOS for Apple Silicon GPU support)

Setup

# Install dependencies
uv sync

# Start Ollama and pull the model
ollama serve &
ollama pull qwen2.5:7b

# Or use Docker (note: requires sufficient memory allocation)
docker compose up -d

Configuration lives in config.yml:

wiki:
  dir: wiki

ollama:
  model: qwen2.5:7b
  url: http://localhost:11434
  timeout: 300
  num_ctx: 16384

Environment variables override config values (precedence: CLI flag > env var > config.yml > defaults):

Variable               Default                 Description
LLM_WIKI_OLLAMA_MODEL  qwen2.5:7b              Ollama model name
LLM_WIKI_OLLAMA_URL    http://localhost:11434  Ollama API base URL
LLM_WIKI_DIR           wiki                    Wiki output directory
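
The precedence chain above (CLI flag > env var > config.yml > defaults) could be resolved roughly like this. It is a sketch, not the project's config.py: the resolve function, the short keys (model, url, wiki_dir), and the flattened file_config mapping are assumptions made for illustration. Only the environment variable names and default values come from the table.

```python
import os
from typing import Mapping, Optional

# Defaults and env var names taken from the table; the short keys are hypothetical.
DEFAULTS = {
    "model": "qwen2.5:7b",
    "url": "http://localhost:11434",
    "wiki_dir": "wiki",
}
ENV_VARS = {
    "model": "LLM_WIKI_OLLAMA_MODEL",
    "url": "LLM_WIKI_OLLAMA_URL",
    "wiki_dir": "LLM_WIKI_DIR",
}


def resolve(
    key: str,
    cli_value: Optional[str],
    file_config: Mapping[str, str],
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return the first value found: CLI flag, env var, config file, default."""
    if cli_value is not None:
        return cli_value
    env_value = env.get(ENV_VARS[key])
    if env_value:  # treat an unset or empty variable as "not provided"
        return env_value
    if key in file_config:
        return file_config[key]
    return DEFAULTS[key]
```

For example, resolve("model", None, {"model": "from-file"}, env={}) falls through to the config file value, while passing cli_value="qwen2.5:3b" wins regardless of the other sources.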

Usage

# Ingest a single PDF
uv run llm-wiki ingest path/to/document.pdf

# Ingest all PDFs in a directory
uv run llm-wiki ingest-all path/to/pdfs/

# Show wiki status
uv run llm-wiki status

Configuration values can be overridden per invocation via CLI flags:

uv run llm-wiki ingest doc.pdf --model qwen2.5:3b --wiki-dir output/

Wiki Structure

wiki/
├── _schema.md          # LLM instructions (page format, rules, output delimiters)
├── index.md            # Auto-maintained page catalog
├── log.md              # Append-only ingestion log
├── source-summary.md   # One per ingested document
├── concept-page.md     # Topics spanning multiple sources
└── entity-page.md      # Named things (models, standards, locations)

Pages use YAML frontmatter with source tracking and [[wiki-links]] for cross-references.
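
To illustrate how [[wiki-links]] can be used for cross-referencing, the sketch below extracts link targets from page text and reports links that point at pages which do not exist yet. The function names and the sample page names are hypothetical, and the exact link syntax accepted by the tool is assumed to be the plain [[page-name]] form shown above.

```python
import re

# Matches [[page-name]]; the captured group is the link target.
WIKI_LINK = re.compile(r"\[\[([^\[\]|]+)\]\]")


def extract_links(markdown: str) -> list[str]:
    """Return link targets in order of first appearance, without duplicates."""
    seen: dict[str, None] = {}
    for match in WIKI_LINK.finditer(markdown):
        seen.setdefault(match.group(1).strip(), None)
    return list(seen)


def dangling_links(pages: dict[str, str]) -> set[str]:
    """Return link targets that have no corresponding page.

    `pages` maps a page name (file stem) to its Markdown text.
    """
    targets = {link for text in pages.values() for link in extract_links(text)}
    return targets - set(pages)
```

A check like dangling_links is one way to spot cross-references that the LLM produced before the target page was written.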

Project Structure

src/llm_wiki/
├── cli.py          # Typer CLI (ingest, ingest-all, status)
├── config.py       # YAML config loading with env var overrides
├── parser.py       # Marker PDF parsing + LLM text formatting
├── llm.py          # LLM Protocol + OllamaLLM implementation
├── ingestion.py    # Orchestrator: parse → prompt → generate → store
└── wiki_store.py   # Flat-file wiki (read/write pages, index, log)
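
The "LLM Protocol + OllamaLLM implementation" split in llm.py might look something like the sketch below. The class names come from the listing above, but the generate method, its signature, and the constructor parameters are assumptions. The request targets Ollama's /api/generate endpoint with streaming disabled. FakeLLM and summarize are hypothetical additions that show why a Protocol is convenient for tests.

```python
import json
import urllib.request
from typing import Protocol


class LLM(Protocol):
    """Anything with a generate(prompt) -> str method satisfies this."""

    def generate(self, prompt: str) -> str: ...


class OllamaLLM:
    """Assumed shape of an Ollama-backed implementation (not the project's code)."""

    def __init__(self, model: str = "qwen2.5:7b", url: str = "http://localhost:11434",
                 timeout: int = 300, num_ctx: int = 16384) -> None:
        self.model = model
        self.url = url.rstrip("/")
        self.timeout = timeout
        self.num_ctx = num_ctx

    def generate(self, prompt: str) -> str:
        payload = json.dumps({
            "model": self.model,
            "prompt": prompt,
            "stream": False,  # ask for one JSON object instead of a stream
            "options": {"num_ctx": self.num_ctx},
        }).encode("utf-8")
        request = urllib.request.Request(
            f"{self.url}/api/generate",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request, timeout=self.timeout) as response:
            return json.loads(response.read())["response"]


class FakeLLM:
    """Hypothetical test double: no network, deterministic output."""

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


def summarize(llm: LLM, text: str) -> str:
    """Hypothetical caller that depends only on the Protocol."""
    return llm.generate(f"Summarize:\n{text}")
```

Because summarize accepts any object matching the Protocol, unit tests can pass FakeLLM() while the CLI passes OllamaLLM(), with no inheritance required.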

Development

# Run tests
uv run pytest

# Run only unit tests
uv run pytest -m unit

# Lint
uv run ruff check src/ tests/

# Format
uv run ruff format src/ tests/

# Type check
uv run ty check src/
