persona-knowledge

A persistent, incremental, searchable persona knowledge base: the data layer between raw sources and persona training.

What it does

Data sources                  persona-knowledge                 Downstream consumers
───────────────          →   ──────────────────────      →   ──────────────────────
Obsidian vault                Storage: MemPalace              anyone-skill
GBrain export                 Graph: Knowledge Graph            (4D extraction)
WhatsApp / Telegram           Knowledge: Karpathy Wiki        persona-model-trainer
X (Twitter) / Instagram       Export: training/                 (fine-tuning)
iMessage / Signal
.md / .txt / .csv / .pdf
.jsonl / .json

Architecture

┌──────────────────────────────────────────────────┐
│                persona-knowledge                 │
│                                                  │
│  ┌──────────┐   ┌──────────┐   ┌──────────────┐  │
│  │ MemPalace│   │Knowledge │   │  Karpathy    │  │
│  │ (ChromaDB│   │  Graph   │   │  LLM Wiki    │  │
│  │ +SQLite) │   │ (SQLite) │   │  (Markdown)  │  │
│  └────┬─────┘   └────┬─────┘   └──────┬───────┘  │
│       │              │                │          │
│       └──────────────┼────────────────┘          │
│                      │                           │
│              ┌───────┴───────┐                   │
│              │   Export      │                   │
│              │  training/    │                   │
│              └───────────────┘                   │
└──────────────────────────────────────────────────┘

Four layers:

Layer	Technology	Role
Storage	MemPalace (ChromaDB + SQLite)	Verbatim content, semantic search
Graph	MemPalace Knowledge Graph	Entity-relationship graph with temporal validity
Knowledge	Karpathy LLM Wiki (interlinked .md)	LLM-maintained structured knowledge accumulation
Export	`export_training.py`	Generate `training/` for persona-model-trainer
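
To make "temporal validity" in the Graph layer concrete, here is a minimal sketch of an entity-relationship fact that carries a validity window. The `Fact` class, its field names, and the sample data are hypothetical illustrations; they are not MemPalace's actual schema or API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Fact:
    """One edge in a hypothetical entity-relationship graph."""
    subject: str
    relation: str
    obj: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still valid"

    def holds_on(self, day: date) -> bool:
        """True if the fact was valid on `day` (the end date is exclusive)."""
        if day < self.valid_from:
            return False
        return self.valid_to is None or day < self.valid_to


def facts_on(facts: list[Fact], day: date) -> list[Fact]:
    """Return only the facts whose validity window covers `day`."""
    return [f for f in facts if f.holds_on(day)]


# Made-up sample data: the same relation changes value over time.
facts = [
    Fact("Sam", "lives_in", "Berlin", date(2019, 1, 1), date(2022, 6, 1)),
    Fact("Sam", "lives_in", "Lisbon", date(2022, 6, 1)),
]
current = facts_on(facts, date(2023, 1, 1))
```

Storing the window on each edge lets a query ask "what was true on this date" without deleting superseded facts.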

Quick start

Requirements

Python >= 3.11
MemPalace: `pip install mempalace` (allow roughly 1-2 GB of disk for ChromaDB)

1. Initialize

python scripts/init_knowledge.py --slug sam --name "Samantha"

2. Ingest data

# WhatsApp chat
python scripts/ingest.py --slug sam --source ~/whatsapp-export.txt --persona-name "Samantha"

# Twitter archive
python scripts/ingest.py --slug sam --source ~/twitter-archive/ --persona-name "Sam"

# Obsidian vault
python scripts/ingest.py --slug sam --source ~/obsidian-vault/

# Generic JSONL
python scripts/ingest.py --slug sam --source data.jsonl --persona-name "Sam"

# Dry run (parse without writing)
python scripts/ingest.py --slug sam --source data.txt --dry-run
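
When several sources need ingesting, the commands above can be driven from a small Python wrapper. This is a sketch that uses only the flags shown above (`--slug`, `--source`, `--persona-name`, `--dry-run`). The helper names `build_ingest_cmd` and `ingest_all` are hypothetical, and it is an assumption that `--dry-run` can be combined with `--persona-name`.

```python
import subprocess
import sys
from pathlib import Path
from typing import Optional


def build_ingest_cmd(slug: str, source: str,
                     persona_name: Optional[str] = None,
                     dry_run: bool = False) -> list[str]:
    """Assemble the argv for scripts/ingest.py from the documented flags."""
    cmd = [sys.executable, "scripts/ingest.py",
           "--slug", slug,
           "--source", str(Path(source).expanduser())]
    if persona_name:
        cmd += ["--persona-name", persona_name]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def ingest_all(slug: str, sources: list[tuple[str, Optional[str]]],
               dry_run: bool = True) -> None:
    """Run one ingest per (source, persona_name) pair; stop on the first failure."""
    for source, persona_name in sources:
        subprocess.run(build_ingest_cmd(slug, source, persona_name, dry_run),
                       check=True)
```

For example, `ingest_all("sam", [("~/whatsapp-export.txt", "Samantha"), ("~/obsidian-vault/", None)])` would parse both sources in dry-run mode; pass `dry_run=False` to write.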

3. Build wiki

After ingestion, an agent (Cursor or Claude Code) reads the MemPalace content and updates the wiki pages following the Karpathy LLM Wiki pattern. This step is agent-driven; there is no script for it.

4. Export for training

python scripts/export_training.py --slug sam --output training/

Output:

training/
  raw/                    # authentic source files
  conversations.jsonl     # distilled Q-A pairs
  profile.md              # character sheet
  metadata.json           # stats
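
Before handing the export to persona-model-trainer, it can help to confirm that the layout above is complete. The sketch below checks for the four listed entries and counts the records in `conversations.jsonl`. The function names are hypothetical, and nothing is assumed about the fields inside each JSONL record beyond it being one JSON value per line.

```python
import json
from pathlib import Path

# Entries listed in the export layout above.
EXPECTED = ("raw", "conversations.jsonl", "profile.md", "metadata.json")


def missing_entries(training_dir) -> list[str]:
    """Return the expected entries that are absent from the export directory."""
    root = Path(training_dir)
    return [name for name in EXPECTED if not (root / name).exists()]


def count_jsonl_records(path) -> int:
    """Count non-empty lines, raising ValueError if a line is not valid JSON."""
    count = 0
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                json.loads(line)  # validates the line; the record schema is not assumed
                count += 1
    return count
```

An empty list from `missing_entries("training/")` means all four entries are present.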

5. Lint wiki

python scripts/lint_wiki.py --slug sam

6. Query Knowledge Graph

python scripts/query_kg.py --slug sam --entity "Tom"
python scripts/query_kg.py --slug sam --path "Tom" "Alice"
python scripts/query_kg.py --slug sam --stats
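
As an illustration of what a path query between two entities can mean, the sketch below runs a breadth-first search over relationship triples. It is an assumption that `--path` returns a shortest connection; the real `query_kg.py` may behave differently. The function name, relation labels, and sample edges are hypothetical.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Breadth-first search over an undirected graph of (a, relation, b) triples."""
    neighbours = {}
    for a, _relation, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    if start not in neighbours or goal not in neighbours:
        return None  # at least one entity is unknown to the graph
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # sorted() keeps the traversal order deterministic
        for nxt in sorted(neighbours[path[-1]] - seen):
            seen.add(nxt)
            queue.append(path + [nxt])
    return None  # both entities exist but are not connected


# Made-up sample edges for illustration only.
edges = [("Tom", "friend_of", "Sam"), ("Sam", "sibling_of", "Alice")]
path = shortest_path(edges, "Tom", "Alice")
```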

Supported sources

Three adapters (`universal`, `chat_export`, `social`) cover all supported formats:

Source	Adapter	Auto-detected
Obsidian vault	`universal`	`.obsidian/` or `*.md` directory
GBrain export	`universal`	Markdown dir with `.raw/` sidecars
`.md` / `.txt` / `.csv` / `.pdf`	`universal`	File extension
`.jsonl` / `.json`	`universal`	File extension
WhatsApp `.txt`	`chat_export`	Timestamp pattern
Telegram `result.json`	`chat_export`	`chats` JSON key
Signal JSON	`chat_export`	`sender`+`body` format
iMessage `.db`	`chat_export`	SQLite tables
X (Twitter) archive	`social`	`data/tweets.js`
Instagram archive	`social`	`content/posts_1.json`
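
The auto-detection rules in the table can be sketched roughly as follows. This is an illustrative reading of the table, not the project's actual detection code. The function names and the WhatsApp timestamp regex are assumptions, and real exports vary more than this covers.

```python
import json
import re
from pathlib import Path

# Assumed pattern for lines such as "12/31/21, 10:15 PM - Name: text".
WHATSAPP_LINE = re.compile(r"^\[?\d{1,2}[/.]\d{1,2}[/.]\d{2,4},? \d{1,2}:\d{2}")


def _looks_like_whatsapp(path: Path) -> bool:
    """Check the first few lines for a WhatsApp-style timestamp prefix."""
    with path.open(encoding="utf-8", errors="replace") as fh:
        head = [next(fh, "") for _ in range(5)]
    return any(WHATSAPP_LINE.match(line) for line in head)


def _looks_like_chat_json(path: Path) -> bool:
    """Telegram: top-level `chats` key. Signal: records with `sender` and `body`."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return False
    if isinstance(data, dict):
        return "chats" in data
    if isinstance(data, list) and data and isinstance(data[0], dict):
        return {"sender", "body"}.issubset(data[0])
    return False


def guess_adapter(source) -> str:
    """Map a source path to one of the three adapter names from the table."""
    path = Path(source).expanduser()
    if path.is_dir():
        if (path / "data" / "tweets.js").is_file():
            return "social"  # X (Twitter) archive
        if (path / "content" / "posts_1.json").is_file():
            return "social"  # Instagram archive
        return "universal"  # Obsidian vault, GBrain export, Markdown directory
    suffix = path.suffix.lower()
    if suffix == ".db":
        return "chat_export"  # iMessage SQLite database
    if suffix == ".txt" and _looks_like_whatsapp(path):
        return "chat_export"
    if suffix == ".json" and _looks_like_chat_json(path):
        return "chat_export"
    return "universal"  # plain .md / .txt / .csv / .pdf / .jsonl / .json
```

Checking the more specific signatures first and falling back to `universal` keeps unknown files ingestible instead of rejecting them.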

Data storage

~/.openpersona/knowledge/{slug}/
  dataset.json                # metadata + stats
  .mempalace/                 # MemPalace local data
    palace/                   # ChromaDB + KG
  sources/                    # immutable source backups (JSONL)
    .source-index.json        # per-file metadata
  wiki/                       # Karpathy wiki (derived from MemPalace)
    _schema.md
    identity.md
    voice.md
    values.md
    thinking.md
    relationships.md          # KG-generated
    timeline.md               # KG-generated
    _contradictions.md
    _changelog.md
    _evidence.md
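
For scripts that need to locate these files, the layout above can be captured in a small path helper. This is a sketch: the function names and dictionary keys are hypothetical, while the directory and file names are taken from the tree above.

```python
from pathlib import Path


def knowledge_root(slug: str, home=None) -> Path:
    """Return ~/.openpersona/knowledge/{slug}; `home` can be overridden for tests."""
    base = Path(home) if home is not None else Path.home()
    return base / ".openpersona" / "knowledge" / slug


def layout(slug: str, home=None) -> dict[str, Path]:
    """Map short names to the main locations in the documented tree."""
    root = knowledge_root(slug, home)
    return {
        "dataset": root / "dataset.json",
        "palace": root / ".mempalace" / "palace",
        "sources": root / "sources",
        "source_index": root / "sources" / ".source-index.json",
        "wiki": root / "wiki",
    }
```

For instance, `layout("sam")["wiki"] / "identity.md"` points at the identity page for the `sam` persona.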

Dependency chain

persona-knowledge   →   anyone-skill   →   persona-model-trainer
(data management)     (distillation)     (fine-tuning)

persona-knowledge is optional: anyone-skill works standalone.
When persona-knowledge is present, anyone-skill uses it for persistent storage and semantic search.
persona-model-trainer consumes the training/ export directly.

License

MIT
