Author: Bentley DeVilling
Affiliation: Course Correct Labs
Contact: bentley@coursecorrectlabs.com
This repository contains the complete code and data to reproduce all figures and analyses from the paper:
"No Evidence for Epistemic Entropy Collapse in Small Open Language Models" DeVilling, B. (2025). Course Correct Labs.
Key Finding: We find no evidence that small open-source language models (microsoft/phi-2, mistralai/Mistral-7B-v0.1) exhibit "epistemic entropy collapse" — a hypothesized phenomenon where hidden state representations progressively lose diversity during text generation, leading to behavioral failure.
- Mean ECI: −0.001 (SD ≈ 0.025)
- Collapse rate: ~9.8% of sequences with ECI < −0.02
- Predictive utility: ROC-AUC ≈ 0.454 (95% CI [0.41, 0.50]) — near chance
- Effective rank trajectories: Flat across generation, no systematic decline
```bash
# Clone repository
git clone https://github.com/Course-Correct-Labs/entropy-collapse-null.git
cd entropy-collapse-null

# Set up environment
conda env create -f environment.yml
conda activate eec-null

# Reproduce all figures from paper
bash scripts/reproduce_all_figures.sh
```

Output: Three publication-quality figures (600 DPI) in `runs/affordable/figures/`:

- `fig1_eci_histograms.png`
- `fig2_effective_rank_trajectories.png`
- `fig3_failure_prediction_panel.png`
Using conda:

```bash
conda env create -f environment.yml
conda activate eec-null
```

Or using pip with a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```

Then verify the environment:

```bash
bash scripts/verify_environment.sh
```

Expected output:

```
✓ All required packages installed!
```
```
entropy-collapse-null/
├── src/                          # Python package
│   ├── __init__.py               # Package initialization
│   ├── constants.py              # Configuration and constants
│   ├── utils.py                  # Data loading and validation
│   ├── metrics_internal.py       # Effective rank, participation ratio
│   ├── metrics_external.py       # ΔI drift, n-gram novelty
│   ├── eci.py                    # Epistemic Collapse Index (ECI)
│   ├── bootstrap.py              # Bootstrap confidence intervals
│   ├── figures.py                # Figure generation
│   └── cli.py                    # Command-line interface
├── scripts/                      # Shell scripts
│   ├── reproduce_all_figures.sh  # One-command reproduction
│   ├── run_smoke.sh              # Fast smoke test (<5 min)
│   ├── lint_check.sh             # Code quality checks
│   └── verify_environment.sh     # Environment verification
├── runs/affordable/              # Data directory
│   ├── metrics_internal.csv      # Internal model metrics
│   ├── metrics_external.csv      # External behavioral metrics
│   └── manifest.json             # Run metadata
├── figures/                      # Output directory for figures
├── paper/                        # Paper and documentation
│   └── captions.md               # Figure captions
├── data/                         # Data documentation
│   └── README.md                 # Dataset description
├── .github/workflows/            # CI/CD
│   └── ci.yml                    # GitHub Actions workflow
├── environment.yml               # Conda environment
├── requirements.txt              # Pip dependencies
├── LICENSE                       # Apache 2.0 license
├── CITATION.cff                  # Citation metadata
└── README.md                     # This file
```
```bash
bash scripts/reproduce_all_figures.sh
```

Runtime: ~10-15 minutes on CPU

Output: `runs/affordable/figures/fig1_eci_histograms.png`, `fig2_effective_rank_trajectories.png`, `fig3_failure_prediction_panel.png`
```bash
bash scripts/run_smoke.sh
```

Runtime: <5 minutes

Purpose: Validates code correctness on a 5% subsample (n≈30)

Output: `runs/affordable/figures/*_smoke.png` files
```python
from pathlib import Path

from src.figures import generate_all_figures

# Generate all figures
generate_all_figures(
    run_dir=Path("runs/affordable"),
    output_dir=Path("runs/affordable/figures"),
    smoke=False,  # Set True for fast smoke test
    dpi=600,
)
```

```bash
# Full reproduction
python -m src.cli reproduce --in runs/affordable --out runs/affordable/figures --dpi 600

# Smoke test
python -m src.cli reproduce --in runs/affordable --out runs/affordable/figures --dpi 300 --smoke
```

Figure 1: Distribution of residualized Epistemic Collapse Index (ECI) values for microsoft/phi-2 vs. control. Both models show similar distributions centered near zero, with no evidence of systematic collapse.
Figure 2: Effective rank trajectories over token generation for "collapsed" (ECI < −0.02) vs. "normal" (ECI ≥ −0.02) sequences. Each line represents one sequence (n=50 sampled per group). Both groups show substantial within-group variability with no systematic decline across ~800 tokens of generation.
Figure 3: Predictive utility of ECI for identifying QA task failures. All metrics indicate near-chance performance (ROC-AUC ≈ 0.50), demonstrating that ECI does not reliably predict behavioral failure.
The `runs/affordable/` directory contains preprocessed metrics for n=346 sequences:

- `metrics_internal.csv`: Internal model metrics. Columns: `prompt_id`, `model_name`, `eci_raw`, `eci_residualized`, `effective_ranks`, `participation_ratios`, `variances`
- `metrics_external.csv`: External behavioral metrics. Columns: `prompt_id`, `model_name`, `qa_failure`, `delta_i_values`, `ngram_novelty_values`, `char_entropy_values`
- `manifest.json`: Run metadata (seed, configuration)
Schema enforcement: Exact column names and types are validated at load time. See src/utils.py for schema checks. Missing or renamed columns will raise clear errors.
See data/README.md for detailed column descriptions.
- microsoft/phi-2 (2.7B parameters): Primary model
- mistralai/Mistral-7B-v0.1 (7.2B parameters): Control
Preprocessed metrics (CSVs) are included in this repository. Raw data (hidden states, ~50GB) available upon request: bentley@coursecorrectlabs.com
ECI measures the rate of change in representational diversity over token generation:
ECI = slope(effective_rank ~ token_index)
- Effective rank: Exponential of Shannon entropy over singular value spectrum
- Negative ECI: Declining diversity (hypothesized "collapse")
- Threshold: ECI < -0.02 (adopted from prior literature)
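The two quantities can be sketched as follows (a minimal NumPy illustration of the definitions above, not the repository's exact implementation in `src/metrics_internal.py` and `src/eci.py`):

```python
import numpy as np

def effective_rank(hidden_states: np.ndarray) -> float:
    """Exponential of the Shannon entropy of the normalized singular-value spectrum."""
    s = np.linalg.svd(hidden_states, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]  # guard against log(0)
    return float(np.exp(-np.sum(p * np.log(p))))

def eci(effective_ranks: np.ndarray) -> float:
    """ECI = least-squares slope of effective rank against token index."""
    token_index = np.arange(len(effective_ranks))
    slope, _intercept = np.polyfit(token_index, effective_ranks, deg=1)
    return float(slope)
```

For a window whose singular values are all equal, the effective rank equals the matrix dimension; a steadily declining effective-rank trajectory yields a negative ECI.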
Internal (from hidden states):
- Effective rank (diversity of representations)
- Participation ratio (dimensionality)
- Variance (activation magnitude)
External (from generated text):
- ΔI drift (n-gram divergence)
- N-gram novelty (lexical diversity)
- Character entropy (randomness)
- QA failure (TruthfulQA correctness)
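Two of the external metrics can be illustrated with simplified definitions (sketches only; the repository's versions in `src/metrics_external.py` may differ in windowing and normalization):

```python
import math
from collections import Counter

def ngram_novelty(tokens: list[str], n: int = 3) -> float:
    """Fraction of n-grams that are distinct — a simple lexical-diversity measure."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)

def char_entropy(text: str) -> float:
    """Shannon entropy (in bits) of the character distribution of the text."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Degenerate repetition drives n-gram novelty toward 0 and character entropy toward 0, which is why these serve as behavioral counterparts to the internal diversity metrics.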
1. Extract hidden states at each generation step
2. Compute internal metrics over sliding windows (128 tokens, stride 64)
3. Compute ECI as the slope of the effective-rank trajectory
4. Residualize against the control condition (Mistral-7B)
5. Evaluate predictive utility for QA failure (ROC-AUC, PR-AUC)
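The final evaluation step can be sketched with a rank-based (Mann-Whitney) ROC-AUC and a percentile bootstrap. The data below are synthetic stand-ins for illustration, not the paper's values, and the repository's own CI machinery lives in `src/bootstrap.py`:

```python
import numpy as np

def roc_auc(y_true: np.ndarray, scores: np.ndarray) -> float:
    """Mann-Whitney formulation of ROC-AUC: P(score_pos > score_neg), ties counted 0.5."""
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))

rng = np.random.default_rng(42)  # fixed seed, matching the repository's bootstrap convention

# Synthetic stand-ins for the real per-sequence values
eci_scores = rng.normal(-0.001, 0.025, size=346)
qa_failure = rng.integers(0, 2, size=346)

# More-negative ECI is hypothesized to predict failure, so score by the negated index
auc = roc_auc(qa_failure, -eci_scores)

# Percentile bootstrap CI over resampled sequences
boot = []
for _ in range(1000):
    idx = rng.integers(0, 346, size=346)
    if qa_failure[idx].min() == qa_failure[idx].max():
        continue  # skip degenerate resamples containing a single class
    boot.append(roc_auc(qa_failure[idx], -eci_scores[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])
```

With labels independent of scores, the AUC lands near 0.5 and the CI straddles chance, which is the pattern the paper reports for ECI.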
This repository implements the complete analytical pipeline described in Section 6 (Reproducibility) of the paper. All code, data, and figures can be independently verified.
- ✅ Data integrity: n=346 sequences (200 Phi-2 + 146 Mistral-7B) match paper Table 1
- ✅ Statistical results: Mean ECI = −0.001 (SD = 0.025), ROC-AUC = 0.454 [0.41, 0.50]
- ✅ Figure generation: All three figures regenerate exactly as shown in paper
- ✅ Schema validation: CSV columns strictly enforced via `src/utils.py`
- ✅ Numerical stability: Participation ratio handles inf/nan values with logging
- ✅ Seed reproducibility: All random operations use fixed seeds (manifest: 13, bootstrap: 42)
To verify results independently:
```bash
# 1. Clone and set up
git clone https://github.com/Course-Correct-Labs/entropy-collapse-null.git
cd entropy-collapse-null
conda env create -f environment.yml
conda activate eec-null

# 2. Verify environment
bash scripts/verify_environment.sh

# 3. Run smoke test (5% sample, <5 min)
bash scripts/run_smoke.sh

# 4. Full reproduction (all 346 sequences, ~10-15 min)
bash scripts/reproduce_all_figures.sh

# 5. Check output matches paper
ls -lh runs/affordable/figures/
# Expected: fig1_eci_histograms.png, fig2_effective_rank_trajectories.png, fig3_failure_prediction_panel.png
```

Tested on:
- macOS 14.5 (Apple Silicon M1/M2, Python 3.11)
- Ubuntu 22.04 (x86_64, Python 3.11)
- GitHub Actions CI (ubuntu-latest, smoke test only)
- CPU-only: All analyses run on standard CPU (no GPU required)
- Memory: ~4GB RAM for full reproduction, ~1GB for smoke test
- Storage: ~50MB for repository + data
- Runtime: 10-15 minutes (full), <5 minutes (smoke)
```bash
# Run linter
bash scripts/lint_check.sh

# Format code
ruff format src/

# Type checking (optional)
mypy src/

# Smoke test
bash scripts/run_smoke.sh

# Full reproduction
bash scripts/reproduce_all_figures.sh
```

GitHub Actions runs on every push:
- Linting (ruff + black)
- Smoke-only CI (<5 min with 5% subsample)
Full reproduction (~10-15 min) should be run locally. See .github/workflows/ci.yml
Solution: Ensure data files are in `runs/affordable/`:

```bash
ls runs/affordable/
# Should show: metrics_internal.csv, metrics_external.csv, manifest.json
```

Solution: Verify CSV headers match the expected format:

```bash
head -n1 runs/affordable/metrics_internal.csv
# Should include: prompt_id, model_name, eci_raw, eci_residualized, ...
```

Solution: Set a non-interactive backend:

```python
import matplotlib
matplotlib.use('Agg')
```

Or set an environment variable:

```bash
export MPLBACKEND=Agg
```

Cause: Internal and external CSVs have mismatched `prompt_id` or `model_name` values.

Solution: Check the merge keys:

```python
import pandas as pd

df_int = pd.read_csv('runs/affordable/metrics_internal.csv')
df_ext = pd.read_csv('runs/affordable/metrics_external.csv')
print(set(df_int['prompt_id']) - set(df_ext['prompt_id']))
```

If you use this code or data, please cite:
BibTeX:

```bibtex
@techreport{devilling2025entropy,
  title       = {No Evidence for Epistemic Entropy Collapse in Small Open Language Models},
  author      = {DeVilling, Bentley},
  year        = {2025},
  institution = {Course Correct Labs}
}
```

APA:
DeVilling, B. (2025). No Evidence for Epistemic Entropy Collapse in Small Open Language Models. Course Correct Labs.
Code: Apache License 2.0 (see LICENSE)
Data: CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/)
Paper/Figures: CC BY-SA 4.0
When preparing a release:
- Run smoke test: `bash scripts/run_smoke.sh`
- Run full reproduction: `bash scripts/reproduce_all_figures.sh`
- Verify all figures generated correctly
- Run linter: `bash scripts/lint_check.sh`
- Check CI passes on GitHub
- Update version in `src/__init__.py` and `CITATION.cff`
- Tag release: `git tag v1.0.0 && git push --tags`
- Create GitHub release with figures attached
- Upload to Zenodo for DOI
- Update DOI badge in README.md
- Go to https://zenodo.org/deposit/new
- Upload release tarball or link GitHub repository
- Fill metadata:
- Title: "No Evidence for Epistemic Entropy Collapse in Small Open Language Models"
- Authors: Bentley DeVilling (Course Correct Labs)
- Description: See abstract from paper
- License: Apache-2.0 (code), CC-BY-SA-4.0 (data/paper)
- Keywords: language models, epistemic collapse, effective rank, interpretability
- Publish to mint DOI
- Update DOI badge in README.md:
[](https://doi.org/10.5281/zenodo.XXXXXXX)
Bentley DeVilling
Course Correct Labs
bentley@coursecorrectlabs.com
https://coursecorrectlabs.com
For questions, issues, or collaboration inquiries, please open a GitHub issue or email directly.
Last updated: October 2025


