ameno-/leda-agents

Leda Agents

Parameterized personality architecture and Letta evals for testing how agent behavior changes across models.

Core result

The most important result in this repo is not that personality matters. It's that stronger models can regress under heavier personality structure.

From evals/results.md:

Form        Auto/Letta  M2.5  M2.7
Stealth     0.73        0.77  0.82
Compressed  0.80        0.70  0.75
Full        0.63        0.70  0.67

Stealth improved monotonically with stronger models (0.73, 0.77, 0.82). Full did not: it rose from 0.63 to 0.70, then fell to 0.67 on M2.7.

That points to a more interesting failure mode than "long prompts bad": instruction hierarchy conflict.
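The regression can be checked directly from the table above. This is a minimal sketch, assuming the columns run from weakest to strongest model and using only the scores quoted from evals/results.md. The `is_monotonic` and `regressions` helpers are hypothetical and not part of this repo.

```python
# Scores copied from the table above. Column order is ASSUMED to run
# from the weakest model to the strongest.
MODELS = ["Auto/Letta", "M2.5", "M2.7"]
SCORES = {
    "Stealth": [0.73, 0.77, 0.82],
    "Compressed": [0.80, 0.70, 0.75],
    "Full": [0.63, 0.70, 0.67],
}


def is_monotonic(values):
    """Return True if no score is lower than the one before it."""
    return all(b >= a for a, b in zip(values, values[1:]))


def regressions(values, models=MODELS):
    """Return (from_model, to_model, delta) for every step where the score dropped."""
    drops = []
    for i in range(1, len(values)):
        if values[i] < values[i - 1]:
            delta = round(values[i] - values[i - 1], 2)
            drops.append((models[i - 1], models[i], delta))
    return drops


if __name__ == "__main__":
    for form, values in SCORES.items():
        print(form, "monotonic:", is_monotonic(values), "drops:", regressions(values))
```

On these numbers only Stealth passes the monotonic check. Compressed drops at the first step and Full drops at the second, which is the step between the two stronger models.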

Architecture

constitution.json    ← canonical behavioral source
       ↓
personality/         ← parameter schema, profiles, lexicons, render templates
       ↓
generated/           ← rendered forms, system overlays, candidate payloads
       ↓
forms/               ← legacy synced forms
       ↓
evals/               ← benchmark data, result artifacts, slot specs
       ↓
search/              ← candidate generation, runners, reports

Personality is data, not prose.

The repo is moving toward:

  • semantic parameters
  • deterministic rendering
  • model-specific evaluation
  • static eval slots instead of disposable eval agents
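To make "semantic parameters" and "deterministic rendering" concrete, here is a minimal sketch, assuming a flat parameter dict and a single string template. The parameter names, the template text, and the `render_profile` and `fingerprint` functions are all hypothetical. They are not taken from personality/ or from scripts/render_profiles.py.

```python
import hashlib
import json
from string import Template

# Hypothetical render template; the real templates live under personality/.
TEMPLATE = Template(
    "Tone: $tone. Verbosity: $verbosity. Stay within requested scope: $scope_respect."
)


def render_profile(params: dict) -> str:
    """Render a prompt form from parameters; same parameters, same text."""
    return TEMPLATE.substitute(params)


def fingerprint(params: dict) -> str:
    """Hash a canonical JSON encoding so key order does not change the ID."""
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


if __name__ == "__main__":
    # Hypothetical parameter values for illustration only.
    profile = {"tone": "dry", "verbosity": "low", "scope_respect": "strict"}
    print(fingerprint(profile), render_profile(profile))
```

Because the output depends only on the parameters, a rendered form can be tied back to the exact profile that produced it, which is what makes a score change attributable to a parameter change rather than to hand-edited prose.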

Quick start

git clone https://github.com/ameno-/leda-agents
cd leda-agents
python3 scripts/render_profiles.py --sync-legacy
cat evals/results.md

Repo highlights

Current findings

  1. Stealth scales with stronger models
  2. Compressed is not universally best
  3. Full can regress on stronger models
  4. Scope-respect is the main regression surface
  5. Eval environment contamination is real — fixture isolation matters

Why this exists

Most prompt tuning work still treats personality as prose. This repo treats it as a system:

  • source
  • rendering
  • evaluation
  • regression detection

That makes the failures legible.

License

MIT

About

Personality-driven Letta agents with measurable behavioral differences. Three agents, three configs, same tasks.
