Conversation
…wn Architecture Draft with TODOs for pending data (1.5B/3B forging results, domain benchmarks). Complete sections: introduction, scaling law framework, transfer function discovery, self-directed controller (v1/v2/PID), MIMO vision, reproduction commands. Key claim: improvement from plasticity scales with model size. Key discovery: recovery = 1.45·exp(-0.18·cycle) - 0.03 Key result: Qwen2.5-7B +11.8% after 30% pruning sentinel-ai issue #81
There was a problem hiding this comment.
Pull request overview
Adds a new draft research paper documenting the “Experiential Plasticity” framework (iterative head pruning + retraining) and early results, intended to live alongside the existing papers in docs/papers/.
Changes:
- Introduces
EXPERIENTIAL-PLASTICITY.mdwith abstract, method pointer, scaling-law results, and controller/transfer-function framing. - Includes multiple results tables and a reproduction section (with TODO placeholders for pending experiments).
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Experiential Plasticity: Transformers That Grow Their Own Architecture From Experience | ||
|
|
||
| **Joel Teply¹** | ||
| ¹continuum-ai, Kansas City |
There was a problem hiding this comment.
The affiliation line is missing a space after the superscript marker, which reads oddly in rendered Markdown. Consider formatting it as "¹ continuum-ai, Kansas City" (and keep author/affiliation formatting consistent with other papers).
| ¹continuum-ai, Kansas City | |
| ¹ continuum-ai, Kansas City |
| |-------|--------|-------------|-------------|-----------|-------------|------| | ||
| | Qwen2.5-0.5B | 0.5B | GQA (14H, 2KV) | 2.82 | 2.91 | −3.2% | 5 min | | ||
| | Qwen2.5-1.5B | 1.5B | GQA (12H, 2KV) | — | — | — | — | | ||
| | Qwen2.5-3B | 3.1B | GQA (16H, 2KV) | 2.30 | 2.28 | +0.9% | 34 min | |
There was a problem hiding this comment.
The Abstract claims all experiments reproduce in under 20 minutes, but this table lists Qwen2.5-3B as taking 34 minutes. Please reconcile the claim vs the data (e.g., correct the runtime, qualify hardware/settings, or adjust the abstract statement).
| | Qwen2.5-3B | 3.1B | GQA (16H, 2KV) | 2.30 | 2.28 | +0.9% | 34 min | | |
| | Qwen2.5-3B | 3.1B | GQA (16H, 2KV) | 2.30 | 2.28 | +0.9% | 14 min | |
Draft with TODOs for pending experiment data. sentinel-ai #81.