This repository implements adaptive transformer layers (ACT / Ponder-style) with a shared KV-cache, trained on:
- Synthetic arithmetic dataset (addition & multiplication)
- BabyLM language modeling dataset
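For background, adaptive computation in the ACT / Ponder family lets each token decide how many times a weight-shared layer is applied before halting, with the final state taken as a halting-probability-weighted mixture of the intermediate states. The following is a minimal, self-contained sketch of that halting loop, not this repo's actual implementation (module names, the halting parameterization, and the KV-cache sharing across steps in `models/` may differ; this sketch omits attention and the cache entirely):

```python
import torch
import torch.nn as nn

class ACTBlock(nn.Module):
    """One weight-shared layer applied up to max_steps times per token.
    The output is a halting-probability-weighted sum of intermediate
    states, in the spirit of Adaptive Computation Time (illustrative only)."""

    def __init__(self, d_model: int, max_steps: int = 8, eps: float = 0.01):
        super().__init__()
        # Stand-in for a transformer block; a real model would use attention + MLP.
        self.layer = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU())
        self.halt = nn.Linear(d_model, 1)  # per-token halting logit
        self.max_steps = max_steps
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        cum = torch.zeros(x.shape[0], x.shape[1], 1)  # cumulative halting prob
        out = torch.zeros_like(x)                     # weighted output accumulator
        for step in range(self.max_steps):
            x = self.layer(x)
            h = torch.sigmoid(self.halt(x))
            is_last = step == self.max_steps - 1
            # Tokens halt when cumulative probability would exceed 1 - eps,
            # or unconditionally on the final step; they spend their remainder.
            if is_last:
                halting = torch.ones_like(h, dtype=torch.bool)
            else:
                halting = cum + h > 1 - self.eps
            p = torch.where(halting, 1.0 - cum, h)
            p = p * (cum < 1 - self.eps).float()  # already-halted tokens contribute 0
            out = out + p * x
            cum = cum + p
            if (cum >= 1 - self.eps).all():
                break
        return out
```

Tokens that halt early stop accumulating updates, which is what saves computation once the per-token step counts are used to mask work in a batched implementation.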
Repository layout:

- `train.py`: Main training script (loads config, builds model, trains & evaluates)
- `train_earlyexit.py`: Training script for models with early-exit heads
- `configs/`: YAML configs for different models/datasets (e.g. `babylm_act.yaml`, `ponder.yaml`)
- `models/`: Transformer blocks, ACT/Ponder modules, early-exit heads, shared KV-cache
- `dataset/`: Dataset loaders for synthetic arithmetic and BabyLM
- `scripts/`: Helper scripts (e.g. `download_babylm.sh` for dataset download)
- `Notebooks/`: Demo Jupyter notebooks with sample runs, inputs and outputs
Tested with Python ≥ 3.9.
Clone and install:

```bash
git clone https://github.com/ACharacterInASimulation/adaptive-computation.git
cd adaptive-computation
pip install -r requirements.txt
```

Download BabyLM using the provided script:

```bash
./scripts/download_babylm.sh
```

This downloads BabyLM from publicly available links into the expected `data/` folder (see the script for exact paths).
The synthetic addition/multiplication data is generated on-the-fly inside the dataset loader.
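Because the arithmetic data is generated rather than downloaded, a loader only needs to sample operand pairs and render them as text. A minimal sketch of such on-the-fly generation (the exact sequence format, operand ranges, and function names used in `dataset/` may differ):

```python
import random
from typing import Iterator, Optional

def make_arithmetic_example(max_operand: int = 99,
                            rng: Optional[random.Random] = None) -> str:
    """Sample one addition or multiplication problem as a text
    sequence, e.g. '23+7=30'. Illustrative format only."""
    rng = rng or random.Random()
    a = rng.randint(0, max_operand)
    b = rng.randint(0, max_operand)
    op = rng.choice(["+", "*"])
    result = a + b if op == "+" else a * b
    return f"{a}{op}{b}={result}"

def arithmetic_stream(n: int, seed: int = 0) -> Iterator[str]:
    """Yield n reproducible examples; nothing is read from disk."""
    rng = random.Random(seed)
    for _ in range(n):
        yield make_arithmetic_example(rng=rng)
```

Seeding the generator keeps epochs reproducible while still requiring no dataset files on disk.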
BabyLM + ACT model:
```bash
python train.py --configs ./configs/babylm_act.yaml
```

Synthetic arithmetic + Ponder model:

```bash
python train.py --configs ./configs/ponder.yaml
```

All available experiment configs are in `./configs`.
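The configs are plain YAML, so a new experiment is just a new file passed via `--configs`. A hypothetical fragment showing the general shape (all key names below are illustrative; consult the actual files in `./configs` for the real schema):

```yaml
# Illustrative only -- not the real babylm_act.yaml schema.
dataset: babylm
model:
  type: act            # or: ponder
  d_model: 512
  max_ponder_steps: 8  # cap on per-token layer reuse
train:
  batch_size: 32
  lr: 3.0e-4
```

Copying an existing config and editing a few fields is usually the easiest way to start a variant run.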