ECL: Effective Context Length

A perturbation-variance framework for estimating and comparing context utilization in sequence models.

Overview

Modern sequence models accept nominal context windows spanning hundreds of kilobases, yet only a fraction of the input context is functionally utilized. ECL provides a rigorous, estimable, and comparable notion of effective context length -- the input span that materially affects a model's embedding at a reference locus.

Key Quantities

Quantity	Description
ECL_beta	Minimum radius capturing beta-fraction of total influence
ECP	Effective Context Profile: full beta -> ECL curve
ECD	Effective Context Dimension: entropy-based count of contributing positions
AECP	Area Under the ECP (mean influence distance)
CDS	Context Decay Spectroscopy: mixture-of-exponentials decomposition
gNIAH	Genomic Needle-in-a-Haystack protocol

Supported Models

Enformer, Borzoi, HyenaDNA, Caduceus, Evo 2, DNABERT-2 (via model wrappers), plus synthetic models for testing.

Installation

# Create environment
bash create_env.sh

# Or install manually
pip install -e ".[all]"

Quick Start

from ecl.models.base import SyntheticModel
from ecl.influence import compute_influence_profile
from ecl.ecl import ECL, ECP, ECD
from ecl.perturbations import RandomSubstitution
import numpy as np

# Create model and sequences
model = SyntheticModel(seq_length=500, embed_dim=64, decay_length=50)
rng = np.random.default_rng(42)
sequences = rng.integers(0, 4, size=(20, 500)).astype(np.int8)

# Compute influence profile
distances, influence = compute_influence_profile(
    model_fn=model, sequences=sequences, reference=250,
    max_distance=200, perturbation=RandomSubstitution(), rng=rng,
)

# Compute ECL
ecl_90 = ECL(distances, influence, beta=0.9)
print(f"ECL_0.9 = {ecl_90} bp")

Project Structure

ECL/
├── src/ecl/                    # Core library
│   ├── perturbations.py        # Perturbation kernels (substitution, shuffle, Markov, generative)
│   ├── metrics.py              # Embedding discrepancy metrics
│   ├── influence.py            # Influence energy computation (Algorithms 1-2)
│   ├── ecl.py                  # ECL, ECP, ECD, AECP, directional ECL
│   ├── estimation.py           # Bernstein bounds, bootstrap CI, permutation tests
│   ├── cds.py                  # Context Decay Spectroscopy
│   ├── gniah.py                # Genomic Needle-in-a-Haystack
│   └── models/                 # Model wrappers
│       ├── base.py             # SyntheticModel, LocalModel, AdditiveModel
│       ├── enformer.py         # Enformer wrapper
│       ├── borzoi.py           # Borzoi wrapper
│       ├── hyenadna.py         # HyenaDNA wrapper
│       ├── caduceus.py         # Caduceus wrapper
│       ├── evo2.py             # Evo 2 wrapper
│       └── dnabert2.py         # DNABERT-2 wrapper
├── scripts/                    # One script per figure/table
│   ├── fig01..fig12_*.py       # 12 figure scripts
│   └── tab01..tab07_*.py       # 7 table scripts
├── tests/                      # Comprehensive test suite (106 tests)
├── docs/
│   ├── report/                 # LaTeX paper source
│   ├── tutorials/              # Python tutorials
│   ├── api.md                  # API reference
│   ├── theory.md               # Mathematical background
│   ├── getting_started.md      # Getting started guide
│   ├── model_guide.md          # Model wrapper guide
│   └── experiments.md          # Experiment reproduction guide
├── notebooks/                  # Jupyter tutorial notebooks
├── pyproject.toml
├── Makefile
├── create_env.sh
└── .pre-commit-config.yaml

Running

# Run tests
make test

# Run linter
make lint

# Generate all figures
make figures

# Generate all tables
make tables

# Run everything
make all-experiments

Citation

@inproceedings{shamssolari2026ecl,
  title={Effective Context Length: A Perturbation--Variance Framework for
         Estimating and Comparing Context Utilization in Sequence Models},
  author={Shams Solari, Omid},
  year={2026}
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ECL: Effective Context Length

Overview

Key Quantities

Supported Models

Installation

Quick Start

Project Structure

Running

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.github/workflows		.github/workflows
docs		docs
notebooks		notebooks
outputs		outputs
scripts		scripts
src/ecl		src/ecl
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CLAUDE.md		CLAUDE.md
Makefile		Makefile
README.md		README.md
create_env.sh		create_env.sh
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation

ECL: Effective Context Length

Overview

Key Quantities

Supported Models

Installation

Quick Start

Project Structure

Running

Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages