Leda Agents

Parameterized personality architecture and Letta evals for testing how agent behavior changes across models.

Core result

The most important result in this repo is not that personality matters. It's that stronger models can regress under heavier personality structure.

From evals/results.md:

Form        Auto/Letta  M2.5  M2.7
Stealth     0.73        0.77  0.82
Compressed  0.80        0.70  0.75
Full        0.63        0.70  0.67

Stealth improved monotonically with stronger models (0.73, 0.77, 0.82). Full did not: it rose from 0.63 to 0.70, then fell to 0.67 on M2.7.

That points to a more interesting failure mode than "long prompts bad": instruction hierarchy conflict.
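
The monotonicity claim can be checked directly from the table. The sketch below is not part of the repo: the scores are copied from the table above, the helper names are hypothetical, and it assumes the columns are ordered from weakest to strongest model, which is how the text reads them.

```python
# Scores copied from the results table; column order as shown there.
# Assumption: columns run from weakest to strongest model.
MODELS = ["Auto/Letta", "M2.5", "M2.7"]
SCORES = {
    "Stealth": [0.73, 0.77, 0.82],
    "Compressed": [0.80, 0.70, 0.75],
    "Full": [0.63, 0.70, 0.67],
}


def is_monotonic_increasing(values):
    """True if every score is strictly higher than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


def regressions(values):
    """List (from_model, to_model, delta) for each drop between adjacent columns."""
    return [
        (MODELS[i], MODELS[i + 1], round(values[i + 1] - values[i], 2))
        for i in range(len(values) - 1)
        if values[i + 1] < values[i]
    ]


if __name__ == "__main__":
    for form, values in SCORES.items():
        print(form, is_monotonic_increasing(values), regressions(values))
```

Only Stealth passes the monotonic check. Compressed drops between the first two columns and Full drops between the last two.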

Architecture

constitution.json    ← canonical behavioral source
       ↓
personality/         ← parameter schema, profiles, lexicons, render templates
       ↓
generated/           ← rendered forms, system overlays, candidate payloads
       ↓
forms/               ← legacy synced forms
       ↓
evals/               ← benchmark data, result artifacts, slot specs
       ↓
search/              ← candidate generation, runners, reports

Personality is data, not prose.

The repo is moving toward:

semantic parameters
deterministic rendering
model-specific evaluation
static eval slots instead of disposable eval agents
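
To make "semantic parameters" and "deterministic rendering" concrete, here is a minimal sketch of the idea. It does not reflect the actual schema in personality/ or the behavior of scripts/render_profiles.py: the field names, template, and helper functions are all hypothetical. The point is only that the same parameters always produce the same text, so a rendered form can be fingerprinted and compared across runs.

```python
import hashlib

# Hypothetical profile: these field names are illustrative, not the repo's schema.
PROFILE = {
    "name": "stealth",
    "warmth": 0.3,
    "verbosity": 0.2,
    "traits": ["direct", "curious"],
}

# Hypothetical render template.
TEMPLATE = (
    "Profile: {name}\n"
    "Warmth: {warmth:.1f}\n"
    "Verbosity: {verbosity:.1f}\n"
    "Traits: {traits}"
)


def render(profile):
    """Render a profile to text with no randomness, timestamps, or order dependence."""
    # Sorting the traits makes the output independent of list order.
    traits = ", ".join(sorted(profile["traits"]))
    return TEMPLATE.format(
        name=profile["name"],
        warmth=profile["warmth"],
        verbosity=profile["verbosity"],
        traits=traits,
    )


def fingerprint(profile):
    """Short hash of the rendered text, useful for detecting unintended changes."""
    return hashlib.sha256(render(profile).encode("utf-8")).hexdigest()[:12]


if __name__ == "__main__":
    print(render(PROFILE))
    print(fingerprint(PROFILE))
```

Because the render depends only on parameter values, two profiles that differ only in key order or trait order produce the same fingerprint, while any change to a parameter value produces a different one.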

Quick start

git clone https://github.com/ameno-/leda-agents
cd leda-agents
python3 scripts/render_profiles.py --sync-legacy
cat evals/results.md

Repo highlights

constitution.json — behavioral source of truth
personality/profiles/base/ — baseline personality profiles
generated/ — rendered outputs from parameterized profiles
evals/results.md — current cross-model result summary
evals/rubric.txt — grader rubric
search/run_experiment.py — static-slot experiment runner scaffold

Current findings

Stealth scales with stronger models
Compressed is not universally best
Full can regress on stronger models
Scope-respect is the main regression surface
Eval environment contamination is real — fixture isolation matters
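
The finding that Compressed is not universally best follows from picking the top-scoring form per model. A small sketch, using the numbers reported in evals/results.md as quoted in this README; the function name is hypothetical and not part of the repo:

```python
# Scores as reported in this README's results table.
MODELS = ["Auto/Letta", "M2.5", "M2.7"]
SCORES = {
    "Stealth": [0.73, 0.77, 0.82],
    "Compressed": [0.80, 0.70, 0.75],
    "Full": [0.63, 0.70, 0.67],
}


def best_form_per_model(scores, models):
    """Map each model to the form with the highest score in that model's column."""
    best = {}
    for i, model in enumerate(models):
        best[model] = max(scores, key=lambda form: scores[form][i])
    return best


if __name__ == "__main__":
    for model, form in best_form_per_model(SCORES, MODELS).items():
        print(f"{model}: {form}")
```

Compressed wins only in the Auto/Letta column. Stealth is the top form on both M2.5 and M2.7.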

Why this exists

Most prompt tuning work still treats personality as prose. This repo treats it as a system:

source
rendering
evaluation
regression detection

That makes the failures legible.

License

MIT
