Persistent, incremental, searchable persona knowledge base — the data layer between raw sources and persona training.
```
Data sources                  persona-knowledge             Downstream consumers
───────────────          →    ──────────────────────    →   ──────────────────────
Obsidian vault                Storage:   MemPalace          anyone-skill
GBrain export                 Graph:     Knowledge Graph    (4D extraction)
WhatsApp / Telegram           Knowledge: Karpathy Wiki      persona-model-trainer
X (Twitter) / Instagram       Export:    training/          (fine-tuning)
iMessage / Signal
.md / .txt / .csv / .pdf
.jsonl / .json
```
```
┌─────────────────────────────────────────────────┐
│                persona-knowledge                │
│                                                 │
│  ┌──────────┐   ┌──────────┐   ┌──────────────┐ │
│  │ MemPalace│   │Knowledge │   │   Karpathy   │ │
│  │ (ChromaDB│   │  Graph   │   │   LLM Wiki   │ │
│  │ +SQLite) │   │ (SQLite) │   │  (Markdown)  │ │
│  └────┬─────┘   └────┬─────┘   └──────┬───────┘ │
│       │              │                │         │
│       └──────────────┼────────────────┘         │
│                      │                          │
│              ┌───────┴───────┐                  │
│              │    Export     │                  │
│              │   training/   │                  │
│              └───────────────┘                  │
└─────────────────────────────────────────────────┘
```
Four layers:
| Layer | Technology | Role |
|---|---|---|
| Storage | MemPalace (ChromaDB + SQLite) | Verbatim content, semantic search |
| Graph | MemPalace Knowledge Graph | Entity-relationship graph with temporal validity |
| Knowledge | Karpathy LLM Wiki (interlinked .md) | LLM-maintained structured knowledge accumulation |
| Export | export_training.py |
Generate training/ for persona-model-trainer |
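The Graph layer's "temporal validity" means each relationship carries a validity window, so facts can expire without being deleted. A minimal SQLite sketch of the idea (illustrative only — the actual MemPalace Knowledge Graph schema is not shown in this document and likely differs):

```python
import sqlite3

# Illustrative entity-relationship schema with temporal validity.
# Table and column names here are assumptions, not MemPalace's schema.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE entities (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE relations (
    src INTEGER REFERENCES entities(id),
    dst INTEGER REFERENCES entities(id),
    kind TEXT,
    valid_from TEXT,   -- ISO date the fact became true
    valid_to TEXT      -- NULL while the fact still holds
);
""")
db.execute("INSERT INTO entities (name) VALUES ('Tom'), ('Alice')")
db.execute("""INSERT INTO relations VALUES
    (1, 2, 'colleague_of', '2021-03-01', NULL)""")

# Facts valid "now" are those whose validity window is still open.
current = db.execute(
    "SELECT kind FROM relations WHERE valid_to IS NULL").fetchall()
```

Expiring a fact is then an UPDATE that sets valid_to, preserving history for timeline queries.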
- Python >= 3.11
- ~1-2 GB disk for ChromaDB

```
pip install mempalace
```
```
python scripts/init_knowledge.py --slug sam --name "Samantha"
```

```
# WhatsApp chat
python scripts/ingest.py --slug sam --source ~/whatsapp-export.txt --persona-name "Samantha"

# Twitter archive
python scripts/ingest.py --slug sam --source ~/twitter-archive/ --persona-name "Sam"

# Obsidian vault
python scripts/ingest.py --slug sam --source ~/obsidian-vault/

# Generic JSONL
python scripts/ingest.py --slug sam --source data.jsonl --persona-name "Sam"

# Dry run (parse without writing)
python scripts/ingest.py --slug sam --source data.txt --dry-run
```

After ingestion, the agent (Cursor / Claude Code) reads MemPalace content and updates the wiki pages following the Karpathy LLM Wiki pattern. This is an agent-driven task, not a script.
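The schema the generic JSONL adapter expects is not spelled out above; a plausible shape is one JSON object per line with author, timestamp, and text fields. The field names below are hypothetical — check what scripts/ingest.py actually accepts:

```python
import json
from pathlib import Path

# Hypothetical record shape for the generic JSONL adapter.
# These field names are an assumption, not a documented schema.
records = [
    {"author": "Sam", "timestamp": "2024-01-15T09:30:00Z", "text": "Morning!"},
    {"author": "Tom", "timestamp": "2024-01-15T09:31:10Z", "text": "Hey Sam."},
]
path = Path("data.jsonl")
path.write_text("\n".join(json.dumps(r) for r in records) + "\n")

# JSONL means exactly one JSON object per line.
lines = path.read_text().splitlines()
```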
```
python scripts/export_training.py --slug sam --output training/
```

Output:

```
training/
  raw/                  # authentic source files
  conversations.jsonl   # distilled Q-A pairs
  profile.md            # character sheet
  metadata.json         # stats
```
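Downstream consumers read conversations.jsonl line by line. A minimal loader sketch, assuming each record holds question/answer fields (the actual export field names are not documented here):

```python
import json
import pathlib
import tempfile

def load_pairs(path):
    """Yield (question, answer) tuples from a conversations.jsonl file.
    The field names are an assumption; inspect your actual export."""
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            yield rec["question"], rec["answer"]

# Demo on a synthetic file (the real one lives under training/).
tmp = pathlib.Path(tempfile.mkdtemp()) / "conversations.jsonl"
tmp.write_text(json.dumps({"question": "Hi?", "answer": "Hello."}) + "\n")
pairs = list(load_pairs(tmp))
```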
```
python scripts/lint_wiki.py --slug sam
```

```
python scripts/query_kg.py --slug sam --entity "Tom"
python scripts/query_kg.py --slug sam --path "Tom" "Alice"
python scripts/query_kg.py --slug sam --stats
```

Three adapters cover all formats:
| Source | Adapter | Auto-detected by |
|---|---|---|
| Obsidian vault | universal | .obsidian/ or *.md directory |
| GBrain export | universal | Markdown dir with .raw/ sidecars |
| .md / .txt / .csv / .pdf | universal | File extension |
| .jsonl / .json | universal | File extension |
| WhatsApp .txt | chat_export | Timestamp pattern |
| Telegram result.json | chat_export | chats JSON key |
| Signal JSON | chat_export | sender+body format |
| iMessage .db | chat_export | SQLite tables |
| X (Twitter) archive | social | data/tweets.js |
| Instagram archive | social | content/posts_1.json |
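The detection rules in the table can be sketched as a dispatch function. This is a reconstruction for illustration — scripts/ingest.py's real logic may order or phrase these checks differently:

```python
import json
import re
from pathlib import Path

# Rough WhatsApp export timestamp, e.g. "[12/01/24, 09:15]".
WHATSAPP_TS = re.compile(r"^\[?\d{1,2}[./-]\d{1,2}[./-]\d{2,4},? \d{1,2}:\d{2}")

def detect_adapter(path: Path) -> str:
    """Map a source path to an adapter name per the table above (sketch)."""
    if path.is_dir():
        if (path / "data" / "tweets.js").exists():
            return "social"        # X (Twitter) archive
        if (path / "content" / "posts_1.json").exists():
            return "social"        # Instagram archive
        if (path / ".obsidian").exists() or any(path.glob("*.md")):
            return "universal"     # Obsidian vault / GBrain export
    elif path.suffix == ".txt":
        head = path.read_text(errors="ignore").splitlines()[:10]
        if any(WHATSAPP_TS.match(line) for line in head):
            return "chat_export"   # WhatsApp chat export
    elif path.suffix == ".json":
        data = json.loads(path.read_text())
        if isinstance(data, dict) and "chats" in data:
            return "chat_export"   # Telegram result.json
    return "universal"             # extension-based fallback

# Example: a WhatsApp-style line triggers the chat_export adapter.
sample = Path("wa.txt")
sample.write_text("[12/01/24, 09:15] Sam: hello\n")
adapter = detect_adapter(sample)
```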
```
~/.openpersona/knowledge/{slug}/
  dataset.json             # metadata + stats
  .mempalace/              # MemPalace local data
    palace/                # ChromaDB + KG
  sources/                 # immutable source backups (JSONL)
    .source-index.json     # per-file metadata
  wiki/                    # Karpathy wiki (derived from MemPalace)
    _schema.md
    identity.md
    voice.md
    values.md
    thinking.md
    relationships.md       # KG-generated
    timeline.md            # KG-generated
    _contradictions.md
    _changelog.md
    _evidence.md
```
```
persona-knowledge  →  anyone-skill  →  persona-model-trainer
(data management)     (distillation)    (fine-tuning)
```

- persona-knowledge is optional; anyone-skill works standalone
- When present, anyone-skill uses persona-knowledge for persistent storage and semantic search
- persona-model-trainer consumes the training/ export directly
MIT