ABI: Frozen-Module Domain Transfer Across LLM Architectures

ABI turns domain adaptation from "fine-tune the whole model" into "move a frozen domain module and calibrate the interface."

Frozen ABI domain modules migrate across tested LLM architecture boundaries with NIB-verified behavioral equivalence, while keeping backbones entirely frozen and calibrating only interface projections. Validated across decoder-only and encoder-decoder architectures, 4 model families, 117M–774M parameter scale, with domain modules surviving backbone updates at 65–304% transfer efficacy.

Building a new model family from scratch? See the companion LayerCake repository — a native modular architecture where domain modules paste bit-exactly by design, no alignment needed. ABI is for post-hoc alignment of existing or cross-architecture models. LayerCake is the clean-design starting point.

Start Here

Your goal	Action
Skeptical?	`python verify_result.py` — verifies published result in < 5 seconds, no GPU
Researcher	Read PROOF.md, ABI_EXPERIMENTS.md
Engineer	Read ABI_ARCHITECTURE.md, run `python run_abi.py`
Reproducing	Read ABI_REPRODUCE.md for exact commands and expected output
Contributing	Read CONTRIBUTING.md
Citing	See Citation below

# Fastest path: verify the published result right now
git clone https://github.com/Yoder23/abi
cd abi
pip install -e .
python verify_result.py    # < 5 seconds, no GPU, no model download

What This Does Not Claim

Before anything else:

Not claimed	Why
Universal transfer across all possible models	Only tested: T5-large, GPT-2 family, Qwen2.5-0.5B, Pythia-410m
7B+ scale	Not yet evaluated
All domains	Core results are Python-focused; WikiText used for backbone-update tests
Zero calibration	Cross-architecture transfer requires KD/projection calibration steps
Production-ready deployment	Research prototype — no inference optimisation, no serving infrastructure
Theorem-level mathematical proof	Formal verification artifact for the specific Path 2C claim; broader results are empirically validated

This is a research preview: the goal is that strangers can verify the result in 5 seconds, reproduce it in a few hours, and understand exactly what is and is not claimed.

Two Validated Results

Result 1 — Same-Backbone ABI Reconstruction (Path 2C)

Two independently initialized ABI modules on a frozen T5-large backbone pass the Non-Inferiority Benchmark. No backbone modification. No shared module weights. Different random seeds.

Criterion	Value	Threshold	Status
Top-5 token agreement	0.8725	≥ 0.860	PASS
Top-1 token agreement	0.8508	≥ 0.680	PASS
Jensen-Shannon divergence	0.01391	< 0.100	PASS
Entropy difference	0.2256	< 0.350	PASS

Extended evaluation (n=25, 5 independent seeds): mean top-5 = 0.8549, 95% CI = [0.8425, 0.8673]. Reported transparently — 3 of 5 seeds fall below the single-run threshold; the mean is above the noise floor.

Result 2 — Cross-Architecture Frozen-Module Migration (Exp 39)

A T5-large-trained frozen domain module transfers to GPT-2-medium. Only interface projections (proj_in, proj_out) are calibrated. NIB evaluated in GPT-2's native 50,257-token vocabulary.

Source	Target	Method	Top-5	Top-1	JS	Ent	Status
T5-large (enc-dec, 730M, 32K vocab)	GPT-2-medium (dec-only, 354M, 50K vocab)	Procrustes + KD	0.8699	0.9252	0.01787	0.2819	PASS

Source and target differ in: architecture class (enc-dec vs. dec-only), tokenizer family (SentencePiece vs. BPE), vocabulary size, and position encoding. The domain module weights are not modified during calibration.

Also Validated

Claim	Key metric	Result file
GPT-2 → Qwen2.5 cross-family (Exp 32)	top-5 = 0.8701	`cross_family_nib_results.json`
T5-large backbone-update invariance (Exp 40)	efficacy = 304.3%	`cross_arch_t5_succession_results.json`
GPT-2-medium backbone-update invariance	efficacy = 65.3%	`scale_validation_results.json`
Cross-size 117M–774M all NIB PASS	top-5 = 0.862–0.870	`cross_size_large_nib_v9_results.json`
Pythia → GPT-2 cross-lineage	efficacy = 91.1%	`cross_lineage_results.json`

Claim Ladder

✅  Same-backbone ABI reconstruction (Path 2C, T5-large)
✅  Decoder-only cross-family transfer (GPT-2 → Qwen2.5)
✅  Encoder-decoder → decoder-only frozen-module migration (T5-large → GPT-2-medium)
✅  Backbone-update invariance — encoder-decoder (T5-large, 304%)
✅  Backbone-update invariance — decoder-only (GPT-2-medium, 65%)
✅  Cross-lineage transfer (Pythia → GPT-2, 91.1%)
✅  Cross-size NIB PASS, 117M–774M
✅  Calibration scaling law (R² ≈ 0.97)
⚠️  7B+ scale — not yet tested
⚠️  Multilingual / medical / legal domains — not yet tested
⚠️  Production deployment — research prototype only

Claim-to-File Map

Claim	Script	Result file
T5 same-backbone NIB	`cross_arch_t5_nib_v53.py`	`cross_arch_t5_nib_v53_results.json`
GPT-2 → Qwen2.5 transfer	`cross_family_nib.py`	`cross_family_nib_results.json`
T5 → GPT-2 migration	`cross_arch_enc_dec_nib.py`	`cross_arch_enc_dec_nib_results.json`
T5 backbone-update invariance	`cross_arch_t5_succession.py`	`cross_arch_t5_succession_results.json`
GPT-2 backbone-update invariance	`scale_validation_test.py`	`scale_validation_results.json`
Cross-lineage Pythia → GPT-2	`cross_lineage_transfer_test.py`	`cross_lineage_results.json`
Cross-size 117M–774M	`cross_size_large_nib_v9.py`	`cross_size_large_nib_v9_results.json`
Calibration scaling law	`calibration_scaling_law_b.py`	`calibration_scaling_law_b_results.json`

Instant Verification (< 5 seconds, no GPU required)

python verify_result.py

Expected output: all checks green, final line Path 2C -- T5-large ABI domain reconstruction -- VERIFIED.

The result file cross_arch_t5_nib_v53_results.json is the immutable published record. verify_result.py checks every value against embedded expected constants.

Full Reproduction (no pre-computed results needed)

# One-time: download t5-large (~2.9 GB)
python -c "from transformers import T5ForConditionalGeneration, T5TokenizerFast; T5ForConditionalGeneration.from_pretrained('t5-large'); T5TokenizerFast.from_pretrained('t5-large'); print('ready')"

# Full training + evaluation (~4 hours on RTX 3080 Laptop)
$env:PYTHONIOENCODING="utf-8"; $env:PYTHONUTF8="1"
python run_abi.py

Results are written to abi_result.json.

How It Works

T5-large's decoder has 24 transformer layers. ABI taps the last 6 layers (19-24) simultaneously, applies per-layer normalization, and learns a correction vector in the backbone's residual stream:

For each tap k in [19, 20, 21, 22, 23, 24]:
    h_k' = LayerNorm_k(hidden_states[k])       per-tap normalisation

h_tap    = concat([h_19', ..., h_24'])          [B, T, 6144]
h_abi    = LayerNorm(Linear_6144->4096(h_tap))
h_out    = h_abi + alpha * Domain(h_abi)        learnable domain gate
correction = Linear_4096->1024(h_out)

h_final  = correction + h_24                   residual injection
h_final *= d_model^-0.5                        T5 tie_word_embeddings scaling
logits   = lm_head(h_final)                    frozen backbone head

The backbone -- all 730M parameters of T5-large -- is never updated. Only the ABI module (163.6M parameters) is trained.

Why D_ABI = 4096?

proj_out maps 4096 -> 1024. This map has a 3072-dimensional null space. After convergence, residual calibration error lives in directions orthogonal to the top-5 token embedding rows in the lm_head matrix -- producing zero logit change for those tokens. This null-space geometry is why corrMSE = 0.003047 produces top-5 agreement of 0.8725 rather than the ~0.848 a naive model would predict.

Why corrMSE?

Seven objectives were tested. corrMSE (MSE of the hidden-space correction vector) is the only one that works:

Raw KL: 12.7% lower loss than corrMSE -- yet NIB top-5 is 0.030 lower. Logit-level supervision causes gradient interference with top-5 geometry.
Weighted corrMSE: any per-position weighting destroys easy-seed performance. Uniform is optimal.
logit-MSE: catastrophic (top-5 drops to ~0.6).

Repository Structure

abi/                                   <- production package
    __init__.py
    models.py                          <- AnchorABI, CandidateABI, DomainModule
    training.py                        <- Stages A, C, D; data pipeline
    evaluation.py                      <- NIB evaluation (official + extended)

run_abi.py                             <- entry point: full training + evaluation
verify_result.py                       <- standalone result verifier (no GPU)
reproduce_abi.py                       <- one-command reproduction of 4 core claims
wikitext_cache.py                      <- WikiText-2 data loader
baseline_transformer_lm.py             <- baseline transformer

# Core validated claims (9 experiments — each has a .py script + _results.json)
cross_arch_t5_nib_v53.py              <- Claim 1: Path 2C (top-5=0.8725)
cross_family_nib.py                    <- Claim 2: GPT-2 → Qwen2.5 (top-5=0.8701)
cross_arch_enc_dec_nib.py              <- Claim 3: T5-large → GPT-2-medium (top-5=0.8699)
cross_arch_t5_succession.py           <- Claim 4: T5 backbone-update (efficacy=304%)
scale_validation_test.py               <- Claim 5: GPT-2-medium backbone-update (65%)
succession_test_v2.py                  <- Claim 6: 3-round multi-domain succession
calibration_scaling_law_b.py           <- Claim 7: calibration scaling law (R²=0.97)
cross_lineage_transfer_test.py         <- Claim 8: Pythia → GPT-2 (91.1%)
cross_size_large_nib_v9.py             <- Claim 9: cross-size 117M–774M (all PASS)

experiments/                           <- supporting validation experiments (13 scripts)
    abi_ablation_test.py               <- objective ablation
    knowledge_non_interference.py      <- general capability preservation
    procrustes_full_nib.py             <- Procrustes alignment baseline
    generation_equivalence_test.py     <- generation-level equivalence
    precision_parity.py                <- fp32/bf16 parity
    ... (+ 8 more, each with _results.json)

README.md                              <- this file
PROOF.md                               <- verified experimental proof artifact (Path 2C)
ABI_ARCHITECTURE.md                    <- full technical specification
ABI_EXPERIMENTS.md                     <- complete experimental ledger
ABI_REPRODUCE.md                       <- step-by-step replication guide
ABI_START_HERE.md                      <- developer onboarding (10 steps)
EXPERIMENTS_INDEX.md                   <- map of every claim to script/result/metric

Architecture Diagram

graph TD
    A[frozen backbone hidden states h_19..h_24] --> B[per-tap LayerNorm × 6]
    B --> C[concat → 6144-dim]
    C --> D[proj_in: Linear 6144→4096]
    D --> E[ABI LayerNorm]
    E --> F[Domain module: 4096→16384→4096]
    F --> G[alpha gate]
    E --> G
    G --> H[proj_out: Linear 4096→1024]
    H --> I[residual injection into h_24]
    I --> J[÷ √d_model]
    J --> K[frozen lm_head → logits]

graph LR
    subgraph "Cross-Architecture Migration (Exp 39)"
        T5[T5-large domain module\nfrozen weights]
        P[orthogonal Procrustes\n2000 sentences]
        KD[KD calibration\n1200 steps\nonly proj_in + proj_out trained]
        GPT[GPT-2-medium\nNIB PASS\ntop-5=0.8699]
        T5 --> P --> KD --> GPT
    end

Reproducibility Matrix

Command	Time	GPU	Verifies
`python verify_result.py`	< 5 sec	no	Path 2C constants (25 checks)
`python cross_arch_enc_dec_nib.py`	~8 min	yes	Exp 39 enc-dec → dec-only
`python cross_arch_t5_succession.py`	~10 min	yes	Exp 40 backbone-update invariance
`python run_abi.py`	~4 hr	yes	Full T5 Path 2C reconstruction

Hardware tested: RTX 3080 Laptop (16 GB VRAM), Python 3.10, CUDA 12.x.

Documentation Map

Document	Audience	Contents
PROOF.md	Researchers	Verified experimental proof artifact, what was ruled out
CLAIMS.md	Everyone	Canonical claim map — validated and not yet validated
SKEPTICS.md	Skeptics	Direct answers to the hard questions
ABI_ARCHITECTURE.md	Engineers	NIB math, model code, training theory, corrMSE floor
ABI_EXPERIMENTS.md	Engineers	Full experimental ledger (45AN–45AZ and cross-arch)
ABI_REPRODUCE.md	Anyone	Exact commands and expected output at every stage
ABI_START_HERE.md	New developers	10-step onboarding from zero
EXPERIMENTS_INDEX.md	Everyone	Map of every claim to script / result JSON / metric
FAQ.md	Everyone	Common questions answered directly
CONTRIBUTING.md	Contributors	What we want, what we don't, PR requirements
ROADMAP.md	Everyone	v0.1 → v0.5 plan

Requirements

torch>=2.1  (with CUDA for training; CPU ok for verify_result.py)
transformers>=4.38
sentencepiece>=0.1.99
numpy
tqdm

Or install as a package:

pip install -e .

GPU with ≥ 10 GB VRAM required for training. verify_result.py runs on CPU only.

What This Means

The standard approach to domain adaptation is fine-tuning: update a pretrained model's weights on new data. ABI demonstrates a different regime: the backbone's weights need never change. Two models can agree on domain-specific predictions while maintaining completely independent learned representations — connected only through a correction to the shared residual stream.

Standard fine-tuning:  domain A weights → baked into backbone → hard to remove
ABI:                   domain A module  → frozen, portable   → hot-swap at zero cost

Implications:

Multi-tenant deployment: one frozen backbone, many independent ABI modules, hot-swappable
Continual learning: inject and revoke domain knowledge without touching the core model
Federated settings: agents align domain representations without sharing raw data or weights
Auditable AI: behavioral changes are fully attributable to the ABI module — not the backbone

Citation

@software{yoder2026abi,
  author  = {Yoder, Sam},
  title   = {Autonomous Basis Injection: Frozen-Module Domain Transfer Across LLM Architectures},
  year    = {2026},
  url     = {https://github.com/Yoder23/abi},
  version = {0.1.0}
}

If citing a specific result:

Path 2C (T5 same-backbone): cite cross_arch_t5_nib_v53_results.json, experiment 45AS
Exp 39 (enc-dec → dec-only migration): cite cross_arch_enc_dec_nib_results.json
Exp 40 (backbone-update invariance): cite cross_arch_t5_succession_results.json

License

Apache 2.0 — see LICENSE.

Research preview. Verified experimental results. Honest boundaries. Reproducible.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ABI: Frozen-Module Domain Transfer Across LLM Architectures

Start Here

What This Does Not Claim

Two Validated Results

Result 1 — Same-Backbone ABI Reconstruction (Path 2C)

Result 2 — Cross-Architecture Frozen-Module Migration (Exp 39)

Also Validated

Claim Ladder

Claim-to-File Map

Instant Verification (< 5 seconds, no GPU required)

Full Reproduction (no pre-computed results needed)

How It Works

Why D_ABI = 4096?

Why corrMSE?

Repository Structure

Architecture Diagram

Reproducibility Matrix

Documentation Map

Requirements

What This Means

Citation

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.github		.github
abi		abi
experiments		experiments
notebooks		notebooks
tests		tests
.gitignore		.gitignore
ABI_ARCHITECTURE.md		ABI_ARCHITECTURE.md
ABI_EXPERIMENTS.md		ABI_EXPERIMENTS.md
ABI_REPRODUCE.md		ABI_REPRODUCE.md
ABI_START_HERE.md		ABI_START_HERE.md
CITATION.cff		CITATION.cff
CLAIMS.md		CLAIMS.md
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
EXPERIMENTS_INDEX.md		EXPERIMENTS_INDEX.md
FAQ.md		FAQ.md
FILE_MANIFEST.md		FILE_MANIFEST.md
LICENSE		LICENSE
OPEN_SOURCE_README.md		OPEN_SOURCE_README.md
PROOF.md		PROOF.md
README.md		README.md
ROADMAP.md		ROADMAP.md
SECURITY.md		SECURITY.md
SKEPTICS.md		SKEPTICS.md
baseline_transformer_lm.py		baseline_transformer_lm.py
calibration_scaling_law_b.py		calibration_scaling_law_b.py
calibration_scaling_law_b_results.json		calibration_scaling_law_b_results.json
cross_arch_enc_dec_nib.py		cross_arch_enc_dec_nib.py
cross_arch_enc_dec_nib_results.json		cross_arch_enc_dec_nib_results.json
cross_arch_t5_nib_v53.py		cross_arch_t5_nib_v53.py
cross_arch_t5_nib_v53_results.json		cross_arch_t5_nib_v53_results.json
cross_arch_t5_succession.py		cross_arch_t5_succession.py
cross_arch_t5_succession_results.json		cross_arch_t5_succession_results.json
cross_family_nib.py		cross_family_nib.py
cross_family_nib_results.json		cross_family_nib_results.json
cross_lineage_results.json		cross_lineage_results.json
cross_lineage_transfer_test.py		cross_lineage_transfer_test.py
cross_size_large_nib_v9.py		cross_size_large_nib_v9.py
cross_size_large_nib_v9_results.json		cross_size_large_nib_v9_results.json
cross_size_transfer_results.json		cross_size_transfer_results.json
cross_size_transfer_test.py		cross_size_transfer_test.py
multi_domain_atlas.py		multi_domain_atlas.py
multi_domain_atlas_results.json		multi_domain_atlas_results.json
pyproject.toml		pyproject.toml
reproduce_abi.py		reproduce_abi.py
requirements.txt		requirements.txt
run_abi.py		run_abi.py
scale_validation_results.json		scale_validation_results.json
scale_validation_test.py		scale_validation_test.py
succession_results_v2.json		succession_results_v2.json
succession_test_v2.py		succession_test_v2.py
verify_result.py		verify_result.py
wikitext_cache.py		wikitext_cache.py

Folders and files

Latest commit

History

Repository files navigation

ABI: Frozen-Module Domain Transfer Across LLM Architectures

Start Here

What This Does Not Claim

Two Validated Results

Result 1 — Same-Backbone ABI Reconstruction (Path 2C)

Result 2 — Cross-Architecture Frozen-Module Migration (Exp 39)

Also Validated

Claim Ladder

Claim-to-File Map

Instant Verification (< 5 seconds, no GPU required)

Full Reproduction (no pre-computed results needed)

How It Works

Why D_ABI = 4096?

Why corrMSE?

Repository Structure

Architecture Diagram

Reproducibility Matrix

Documentation Map

Requirements

What This Means

Citation

License

About

Topics

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages