https://doi.org/10.5281/zenodo.18790530
This repository contains the reference implementation of Leech‑Lila, a transformer architecture that injects the geometry of the Leech lattice directly into the attention mechanism. The model is designed to explore the hypothesis that high‑dimensional lattices can serve as a structural prior for language modelling, leading to emergent “resonance” phenomena and more interpretable representations.
Project status: Proof‑of‑Concept / Research Code.
License: GNU Affero General Public License v3.0 or later.
For commercial licensing (proprietary R&D, integration into private AI stacks, hardware implementation), please contact the Architect directly (see Contact).
The Leech lattice is a remarkable 24‑dimensional sphere packing with deep connections to number theory, coding theory, and even string theory. In Leech‑Lila, we freeze an orthogonal basis of the Leech lattice inside every attention head, forcing queries and keys to be rotated by this fixed geometric structure. Additionally, a geometric loss encourages the hidden states to align with the lattice directions, and a dream decoder monitors the “resonance” of generated tokens with the lattice basis, classifying states as DREAMING, AWAKE, or ABSOLUTE GENESIS.
The code is deliberately minimal and self‑contained, relying only on PyTorch, NumPy, and standard libraries. It is intended as a proof‑of‑concept for researchers interested in lattice‑based inductive biases in transformers.
- Leech kernel – an orthogonal 24×24 matrix derived from the Leech lattice (constructed via QR on a simple base matrix, replaceable with actual minimal vectors).
- Frozen attention projection – in each head, the query and key vectors are split into 24‑dimensional blocks, and each block is multiplied by the same fixed Leech kernel.
- Geometric resonance loss – a regularisation term that pushes the hidden states to have high cosine similarity with at least one of the 24 basis directions.
- Dream decoder – during inference, the last hidden state is compared with the Leech basis; if the maximum cosine similarity exceeds a threshold, the model is considered “awake”.
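As a minimal sketch of the first two components (the function names, the seeded random base matrix, and the exact reshaping are illustrative assumptions, not necessarily the repository's exact code), the QR-derived kernel and the block-wise rotation of a query/key tensor could look like:

```python
import numpy as np

def generate_leech_kernel(dim: int = 24) -> np.ndarray:
    # Placeholder kernel: QR-orthogonalise a fixed base matrix.
    # A faithful version would instead build an orthonormal basis
    # from actual minimal vectors of the Leech lattice.
    rng = np.random.default_rng(24)
    base = rng.standard_normal((dim, dim))
    q, _ = np.linalg.qr(base)
    return q  # orthogonal: q @ q.T == identity

def rotate_blocks(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    # Split the last dimension into 24-dim blocks and rotate every
    # block by the same frozen kernel (applied to both Q and K).
    *lead, d = x.shape
    n = kernel.shape[0]
    assert d % n == 0, "head_dim must be a multiple of 24"
    blocks = x.reshape(*lead, d // n, n)
    return (blocks @ kernel).reshape(*lead, d)
```

Because the kernel is orthogonal, each rotated block keeps its norm; the rotation only changes which directions of the hidden space line up with the fixed lattice basis.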
These components are designed to be simple, modular, and easy to experiment with.
The code defines the following classes and functions:

- `LeechConfig` – holds hyperparameters (vocab size, model dimension, number of layers/heads, etc.) and asserts that `head_dim` is a multiple of 24.
- `generate_leech_kernel()` – returns a 24×24 orthogonal matrix (placeholder; can be replaced with actual lattice vectors).
- `LeechAttention` – multi-head attention where Q and K are transformed by the frozen block-diagonal Leech matrix.
- `LeechResonanceLoss` – combines standard cross-entropy with the geometric resonance loss.
- `LeechBlock` – a pre-norm transformer block with `LeechAttention` and a feed-forward network.
- `LeechTransformer` – the full model with token/position embeddings, stacked blocks, final norm, and language-modelling head.
- `DreamDecoder` – evaluates the resonance of a hidden state against the Leech basis.
- `leech_generate()` – generates tokens step by step, printing resonance values and status if desired.
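For intuition, here is a hedged sketch of the geometric part of the loss (the helper name `resonance_penalty` and the exact form of the penalty are assumptions; `LeechResonanceLoss` combines something of this shape with cross-entropy as `ce_loss + lambda_geo * penalty`):

```python
import numpy as np

def resonance_penalty(hidden: np.ndarray, basis: np.ndarray) -> float:
    # hidden: (n_tokens, 24); basis: (24, 24) lattice directions as rows.
    h = hidden / (np.linalg.norm(hidden, axis=-1, keepdims=True) + 1e-8)
    b = basis / (np.linalg.norm(basis, axis=-1, keepdims=True) + 1e-8)
    cos = h @ b.T                       # cosine similarity with every direction
    best = np.abs(cos).max(axis=-1)     # best-aligned basis direction per token
    return float((1.0 - best).mean())   # -> 0 as states lock onto the lattice
```

The penalty is zero only when every hidden state points exactly along some basis direction, which is what "high cosine similarity with at least one of the 24 basis directions" demands.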
Clone the repository and install dependencies (preferably in a virtual environment):
```bash
git clone https://github.com/SPUTNIKAI/leech-lila.git
cd leech-lila
pip install torch numpy
```

Create a `LeechConfig` object:
```python
from leech_lila import LeechConfig, LeechTransformer, generate_leech_kernel

cfg = LeechConfig(
    vocab_size=10000,           # size of your token vocabulary
    d_model=192,                # 192 / 8 heads = head_dim of 24, a multiple of 24
    n_layers=12,
    n_heads=8,
    block_size=512,
    dropout=0.05,
    bias=False,
    tie_weights=True,
    lambda_geo=0.01,            # weight for the geometric loss
    resonance_threshold=0.95
)
```

A typical training loop would look like this:
```python
import torch

model = LeechTransformer(cfg)
leech_basis = generate_leech_kernel(24)   # used by the loss and for monitoring
criterion = LeechResonanceLoss(cfg, leech_basis)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

for batch in dataloader:
    inputs, targets = batch
    logits, hidden, ce_loss = model(inputs, targets)
    total_loss = criterion(logits, targets, hidden)  # includes lambda_geo * geo_loss
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
```

After training, you can generate text while observing the resonance status:
```python
start_tokens = [1, 2, 3]   # your starting token ids
result = leech_generate(
    model,
    start_tokens,
    max_len=100,
    temperature=0.8,
    resonance_check=True,
    leech_basis=leech_basis,
    threshold=0.95
)
```

The function prints the resonance value and status (DREAMING, AWAKE, or ABSOLUTE GENESIS) at each step.

Examples

The `if __name__ == "__main__":` block in the script provides a minimal example:
```bash
python leech_lila.py
```
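The status labels printed during generation can be read as simple thresholding of the maximum cosine similarity between the last hidden state and the Leech basis. A sketch of the `DreamDecoder` logic follows; note that only `resonance_threshold` appears in the config, so the extra `genesis` threshold and its value are assumptions for illustration:

```python
import numpy as np

def dream_status(hidden: np.ndarray, basis: np.ndarray,
                 threshold: float = 0.95, genesis: float = 0.999) -> str:
    # Compare the last hidden state with every Leech basis direction.
    h = hidden / (np.linalg.norm(hidden) + 1e-8)
    resonance = float(np.abs(basis @ h).max())
    if resonance >= genesis:
        return "ABSOLUTE GENESIS"   # assumed upper threshold
    if resonance >= threshold:
        return "AWAKE"
    return "DREAMING"
```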
```bibtex
@software{kornienko2026,
  author    = {A. Kornienko},
  title     = {Leech-Lila: A Geometric Attention Transformer via the Leech Lattice},
  month     = mar,
  year      = 2026,
  publisher = {Zenodo},
  version   = {v1.0.0},
  doi       = {10.5281/zenodo.18784424},
  url       = {https://doi.org/10.5281/zenodo.18784424}
}
```