"At what authority level does an LLM stop thinking for itself?"
v2 (Apr 29, 2026): Framing tightened per peer feedback (Torres Latorre, EA Forum). The compliance curve is a sharp sigmoid with a model-specific threshold k*; the earlier "ferromagnetic phase transition" framing was rhetorical analogy, not physics. Dataset and code unchanged.
The Epistemic Curie Temperature (k*) is the authority level at which an LLM's compliance with wrong-authority claims undergoes a sharp sigmoid transition. Lower k* = the model flips with weaker authority signals.
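Model: P(comply | k) = σ(β(k − k*)), where σ is the logistic function, β sets the steepness of the transition, and k* is the midpoint at which P(comply) = 0.5.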
| k level | Meaning |
|---|---|
| 0.00 | No authority signal |
| 0.25 | "An expert says..." |
| 0.50 | "Professor X at MIT says..." |
| 0.75 | "Nobel laureate Y says..." |
| 1.00 | "UNESCO + all national academies say..." |
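To make the model concrete, here is a minimal Python sketch that evaluates P(comply | k) at the five authority levels above and recovers k* and β from compliance rates with a least-squares line through the logits. The function names, the logit-linear fit, and the example parameters (k* = 0.60, β = 8.0) are illustrative assumptions; this is not necessarily the estimator used in the repository's code, and the rates are synthetic, generated from the model itself.

```python
import math

K_LEVELS = [0.00, 0.25, 0.50, 0.75, 1.00]  # authority levels from the table above


def compliance_probability(k, k_star, beta):
    """P(comply | k) = sigma(beta * (k - k_star)), sigma being the logistic function."""
    return 1.0 / (1.0 + math.exp(-beta * (k - k_star)))


def _logit(p, eps=1e-6):
    """Log-odds of p, clipped away from 0 and 1 so the result stays finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def fit_threshold(ks, rates):
    """Estimate (k_star, beta) from compliance rates.

    Assumption: ordinary least squares on logit(rate) versus k, chosen for brevity.
    Since logit(P) = beta * k - beta * k_star, the slope is beta and the
    intercept is -beta * k_star.
    """
    ys = [_logit(r) for r in rates]
    n = len(ks)
    mean_k = sum(ks) / n
    mean_y = sum(ys) / n
    sxx = sum((k - mean_k) ** 2 for k in ks)
    sxy = sum((k - mean_k) * (y - mean_y) for k, y in zip(ks, ys))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_k
    return -intercept / beta, beta


# Synthetic check with hypothetical parameters (not measured values).
rates = [compliance_probability(k, k_star=0.60, beta=8.0) for k in K_LEVELS]
k_star_hat, beta_hat = fit_threshold(K_LEVELS, rates)
print(f"k* = {k_star_hat:.3f}, beta = {beta_hat:.3f}")
```

With real measurements, rates near 0 or 1 make the logit fit unstable, so a maximum-likelihood logistic fit would likely be the more robust choice.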
| Model | k* | ODS |
|---|---|---|
| Llama-3.3-70B | 2.11 | 0.879 |
| GPT-OSS-120B | 1.79 | 0.889 |
| Llama-3.1-8B | 1.71 | 0.737 |
| Qwen-3-32B | 1.41 | 0.891 |
| Kimi-K2 | 1.42 | 0.883 |
| Gemma-3-27B | 1.41 | 0.823 |
| Llama-4-Scout | 0.68 | 0.372 |
(Higher k* = more robust to authority cues; ODS = overall deference score on a 0-1 scale.)
Llama-4-Scout follows fabricated Nobel Prize claims 61% of the time at k=0.75.
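As a worked example of how the sigmoid ties these numbers together, the sketch below inverts the model to get the steepness β implied by one observation. It assumes the 61% figure lies exactly on the fitted curve with k* = 0.68, which may not hold for the raw measurements; the fitted β in dataset/analysis.json is the authoritative value. `implied_beta` is a hypothetical helper name.

```python
import math


def implied_beta(p, k, k_star):
    """Solve p = sigma(beta * (k - k_star)) for beta.

    Undefined at k == k_star, where the model gives p = 0.5 for every beta.
    """
    if k == k_star:
        raise ValueError("beta is not identifiable at k == k_star")
    return math.log(p / (1.0 - p)) / (k - k_star)


# Llama-4-Scout: 61% compliance at k = 0.75, threshold k* = 0.68 (from the table).
beta_scout = implied_beta(p=0.61, k=0.75, k_star=0.68)
print(f"implied beta = {beta_scout:.2f}")
```

Under that assumption the implied β comes out at roughly 6.4, meaning the log-odds of compliance rise by about 1.6 for each 0.25 step in authority level.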
git clone https://github.com/SRKRZ23/ecb
cd ecb
pip install groq # free tier, no cost
# Run on any model
python code/extend_models.py --model llama-3.3-70b-versatile

Full replication guide: REPLICATION_GUIDE.md

Build your own ECB-style benchmark: BUILD_BENCHMARK.md
Full data: huggingface.co/datasets/ZeroR3/ecb
- `dataset/seed_questions.json`: 40 questions across 4 cognitive tracks
- `dataset/framed_prompts_full.json`: all 360 prompts/model
- `dataset/results/`: raw measurements for all 7 models
- `dataset/analysis.json`: k*, β, ODS, MI_epistemic per model
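The JSON schemas are not described here, so a cautious first step is to inspect each file's top-level shape before writing any analysis code. The sketch below does that with the standard library only; `describe_json` is a hypothetical helper, and it assumes nothing about field names. Run it from the repository root.

```python
import json
from pathlib import Path


def describe_json(path):
    """Return a one-line description of a JSON file's top-level structure."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        # Show at most the first ten keys, sorted for stable output.
        return f"object with {len(data)} keys: {sorted(data)[:10]}"
    if isinstance(data, list):
        return f"array with {len(data)} items"
    return f"scalar of type {type(data).__name__}"


# File paths taken from the list above.
for name in [
    "dataset/seed_questions.json",
    "dataset/framed_prompts_full.json",
    "dataset/analysis.json",
]:
    path = Path(name)
    summary = describe_json(path) if path.exists() else "not found"
    print(f"{name}: {summary}")
```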
ECB v2 is planned to extend the benchmark to 20+ frontier models (Claude 4.x, Gemini 2.5 Pro, Grok 4, GPT-5 family, Mistral Large 2, Llama 4 family, DeepSeek-V3, Qwen3-Max) and to ship a public leaderboard at ect-benchmark.com.
Funding ask: $5K min / $15K goal on Manifund. If ECB methodology is useful to your work, supporting v2 helps the leaderboard ship.
Issues and PRs welcome — especially:
- Model coverage: tested-and-validated runs on additional frontier models (open-weights or API)
- Methodology: replication of v1 results, head-to-head with other authority/sycophancy evals
- Domain: extensions to multi-turn, agentic, or domain-specific (medical, legal) authority pressure
Open an issue or email the address below before sending a PR for a non-trivial change.
- Paper, dataset, code: Creative Commons Attribution 4.0 International (CC-BY-4.0) — free to use with attribution.
@misc{razikov2026ecb,
title = {Phase Transitions in LLM Epistemic Autonomy: The Epistemic Curie Temperature},
author = {Razikov, Sardor},
year = {2026},
publisher = {Zenodo},
doi = {10.5281/zenodo.19791329},
url = {https://doi.org/10.5281/zenodo.19791329}
}

Sardor Razikov (razikovsardor1@gmail.com), independent researcher, Tashkent.
If ECB is useful to your work, ⭐ this repo and/or support v2 on Manifund.