Turn any source tree into a local SQLite database. Full-text search 89,000 files in seconds. One file. No server. No network.
SQL-ManyThing builds a FTS5 trigram index of your entire codebase, adds optional symbol/graph enrichment, and records every query so agents get smarter with each session.
π¨π³ δΈζη
A full Unreal Engine 5.8 install β indexed locally, queried locally:
89,203 files indexed
~3.0 GB single SQLite database
Full-text search: seconds
UHT reflection symbols: 4,455 classes Β· 3,247 structs Β· 1,590 enums Β· 8,902 functions
This is the stress test. The framework works on anything with files: JS/TS libraries, Python tools, Java projects, monorepos, generated code, build outputs β whatever you point it at.
Most AI agent code search is still grep + cat: linear scan, whole-file reads, repeated loading, exploding token budgets. Every alternative has friction:
| grep | LSP | Cloud RAG | SQL-ManyThing | |
|---|---|---|---|---|
| Offline | β | β | β needs network | β |
| Query speed | O(n) scan | O(1) jump | ms + network | ms local FTS5 |
| Token cost | whole-file reads | precise but narrow | retrieval + stitching | bounded substr() |
| Auditable | β | β black box | β black box | β pure SQL |
| Language-agnostic | β | β language-locked | β | β any file |
| Self-built index | β | β auto | β external service | β local SQLite |
Any index beats grep. SQLite FTS5 is the simplest one that works at scale.
Model code search as A* search:
state space = files + rows + symbols + graph nodes + trace history
g(n) = queries / tool calls / tokens already spent
h(n) = remaining cost estimated by rank, symbol precision, graph coverage, trace reuse
operator = one SQL query or one bounded source extract
goal = evidence-rich answer with minimal source text
Narrow first. Extract second. Answer from evidence.
This is nearly inverted from mainstream RAG: instead of retrieving chunks and stuffing context, FTS5 locates targets, substr() extracts proof, and full files never enter the context window.
# Any project
python3 scripts/phase1/manything_build_db.py /path/to/project \
--git --ext .ts,.tsx,.js,.jsx,.json,.md
# Plain directory with .gitignore
python3 scripts/phase1/manything_build_db.py /path/to/project \
--gitignore /path/to/project/.gitignore
# Unreal Engine installed build
python3 scripts/phase1/manything_build_db.py /path/to/Engine \
--gitignore /path/to/Engine/.gitignore \
--profile unreal-installed-coreOutput: <project>/.srcidx/source.db
# Symbol enrichment
python3 scripts/phase2/enrich_cymbal.py /path/to/project
# Graph/document enrichment
python3 scripts/phase2/enrich_graphify.py /path/to/project
# Unreal UHT reflection metadata
python3 scripts/phase2/uht_enrich.py \
--db /path/to/Engine/.srcidx/source.db \
--uht-dir /path/to/Engine/Intermediate/Build/Win64/UnrealEditor/Inc \
--source-prefix Engine/ --batch 500# Initialize trace database
python3 scripts/phase3/manything_query_log.py init
# Install the sqlite3 wrapper
mkdir -p ~/.local/bin
cp scripts/phase3/sqlite3_wrapper.sh ~/.local/bin/sqlite3
cp scripts/phase3/SQL-ManyThing-query-log ~/.local/bin/SQL-ManyThing-query-log
chmod +x ~/.local/bin/sqlite3 ~/.local/bin/SQL-ManyThing-query-logEnsure ~/.local/bin precedes /usr/bin in PATH.
# Register a project
echo 'MANYTHING_myproject="/path/to/project"' >> ~/.hermes/manything/aliases.sh
# Query through the virtual path
sqlite3 /manything/myproject/source.db "SELECT COUNT(*) FROM files"
# Review trace history
SQL-ManyThing-query-log import
sqlite3 :trace "SELECT id, project, tag, substr(sql_text,1,120) FROM query_trace ORDER BY id DESC LIMIT 10"Per project:
<project>/.srcidx/source.db
Schema:
files β file metadata + full text
files_fts β FTS5 trigram index over path + content
file_enrich β symbol/domain enrich JSON per file
enrich_graphify_nodes β AST/document nodes
enrich_graphify_edges β graph/document edges
Global (Phase 3):
~/.hermes/manything/query_log.db β query trace database
~/.hermes/manything/aliases.sh β project aliases
~/.hermes/manything/pending.jsonl β pending query log buffer
Find files by content:
SELECT path, rank FROM files_fts
WHERE files_fts MATCH 'layout prepare'
ORDER BY rank LIMIT 20;Project shape at a glance:
SELECT ext, COUNT(*) FROM files
GROUP BY ext ORDER BY COUNT(*) DESC;Bounded source extraction (never read the whole file):
SELECT instr(content, 'export function layout') FROM files WHERE path='src/layout.ts';
SELECT substr(content, 1200, 1600) FROM files WHERE path='src/layout.ts';Symbol search across enrichment:
SELECT f.path,
json_extract(s.value, '$.name') AS name,
json_extract(s.value, '$.kind') AS kind
FROM file_enrich e
JOIN files f ON f.id = e.file_id,
json_each(e.symbols) AS s
WHERE json_extract(s.value, '$.name') LIKE '%layout%'
LIMIT 50;Reuse past queries as agent memory:
WITH intent(term) AS (
VALUES ('files'), ('symbols'), ('graph'), ('README'), ('package'), ('src')
)
SELECT id, project, tag, note, substr(sql_text, 1, 180)
FROM query_trace
WHERE project = 'myproject'
AND (tag IS NOT NULL OR EXISTS (
SELECT 1 FROM intent WHERE lower(sql_text) LIKE '%' || lower(term) || '%'
))
ORDER BY tag IS NULL, id DESC LIMIT 12;Tag a useful query for future sessions:
INSERT INTO query_notes (log_id, note, tag, created_at)
VALUES (42, 'overview entrypoint query', 'useful_pattern', strftime('%s','now'));You can reproduce this project with three prompts executed in sequence:
- FTS5 + trigram full-text index the target project; design your own filter rules
- Interactively query the DB, discover enrichment table designs, write batch enrichment scripts
- Auto-ingest SQL queries into the trace database, enabling historical query exploration before running new searches
- SQLite first. Query everything with SQL. One file, fully inspectable.
- Build once, reuse forever. Index cost is paid once; queries are free.
- Trace behavior, not just answers. Every session leaves navigable breadcrumbs for the next.
- Never read whole files. Bounded
substr()proves the answer without blowing context. - Profile policies over
.gitignoreassumptions. Control what gets indexed explicitly. - Project-agnostic by default. Unreal-specific and other project lessons live in
references/, not the core.
Every script is a stable entrypoint: python3 scripts/phase1/manything_build_db.py ...
A unified manything build wrapper would shift every token position in every command string. Transformer positional encoding is sensitive to displacement; even small shifts introduce noise in agent reasoning. By keeping raw scripts:
- Token positions across phases stay predictable
- Agent-issued commands in query traces are reproducible verbatim
- Zero cost forcing an agent to learn wrapper conventions
Same principle drives the Phase 3 sqlite3 wrapper: intercept at the binary level, never modify the query string reaching the LLM context.
For Windows-hosted repositories, run Phase 1 indexing with Windows Python when possible β DrvFs writes from WSL are slower. WSL can query the resulting database fine.
Template included: templates/run_phase1_unreal_windows.bat
Start here:
.hermes/ β Hermes Agent project context
references/INDEX.md
scripts/INDEX.md
Key references:
references/phase1/phase1-setup.md
references/phase1/gitignore-enumeration.md
references/phase2/enrich-cymbal.md
references/phase2/enrich-graphify.md
references/phase2/ue-uht-generated-files.md
references/phase3/phase3-design-rationale.md
references/unreal/unreal-installed-indexing-profiles.md
references/unreal/ue58-full-phase123-run.md
MIT