Unofficial implementation of TinyLoRA from the paper "Learning to Reason in 13 Parameters" by Morris et al.
This repository provides a clean, documented implementation of the TinyLoRA technique, which achieves extreme parameter efficiency by replacing trainable matrices with tiny projected vectors.
Key Result: Fine-tune Mistral-7B with only ~14,000 trainable parameters and achieve 53.6% accuracy on GSM8K.
TinyLoRA builds on LoRA-XS but takes parameter efficiency to the extreme:
| Method | Trainable Component | Parameters (7B model) |
|---|---|---|
| LoRA | B and A matrices | ~40M |
| LoRA-XS | r×r matrix R | ~900K |
| TinyLoRA | u-dim vector v | ~14K |
```
Standard LoRA:  ΔW = B @ A              (trains B, A)
LoRA-XS:        ΔW = B @ R @ A          (trains the r×r matrix R)
TinyLoRA:       ΔW = B @ (Σᵢ vᵢPᵢ) @ A  (trains the u-dim vector v)
```
Where:
- B, A: Frozen matrices initialized via SVD of pretrained weights
- v: Trainable vector of size u (default: 64 parameters)
- P: Fixed random tensor of shape (u, r, r)
- The projection Σᵢ vᵢPᵢ creates an r×r matrix from the tiny vector v (see the sketch below)
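For intuition, here is a minimal PyTorch sketch of that projection (illustrative shapes only, not the repository's actual module):

```python
import torch

r, u = 64, 64                 # LoRA rank and TinyLoRA vector size
d_out, d_in = 4096, 4096      # example weight-matrix shape

B = torch.randn(d_out, r)     # frozen, from the SVD of the pretrained weight
A = torch.randn(r, d_in)      # frozen, from the SVD of the pretrained weight
P = torch.randn(u, r, r)      # fixed random projection tensor
v = torch.zeros(u, requires_grad=True)   # the only trainable parameters

R = torch.einsum("u,urs->rs", v, P)      # Σᵢ vᵢPᵢ -> (r, r) matrix
delta_W = B @ R @ A                      # TinyLoRA update ΔW, shape (d_out, d_in)
```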
Use n_tie to share v vectors across multiple layers (see the parameter-count sketch after this list):
- n_tie=1: Each layer has its own v (default)
- n_tie=8: Every 8 layers share one v (8x fewer parameters)
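As a back-of-the-envelope check on the parameter counts quoted in this README, here is a sketch that assumes 32 decoder layers (Mistral-7B), the 7 target modules used in the programmatic-usage example below, and one v vector per targeted weight matrix before tying; the helper name is hypothetical:

```python
# Rough TinyLoRA trainable-parameter count.
# Assumption: 32 layers x 7 target modules = 224 weight groups before tying.
N_LAYERS, N_MODULES = 32, 7

def tinylora_param_count(u: int, n_tie: int) -> int:
    groups = (N_LAYERS * N_MODULES) // n_tie   # distinct v vectors after tying
    return groups * u

print(tinylora_param_count(u=64, n_tie=1))    # 14336 -> the ~14K headline config
print(tinylora_param_count(u=64, n_tie=8))    # 1792  -> the "minimal" ~1,800-param config
print(tinylora_param_count(u=256, n_tie=1))   # 57344 -> "maximum capacity" ~57,000 params
```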
```bash
git clone https://github.com/your-repo/TinyLoRA.git
cd TinyLoRA

# Create virtual environment
python -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```

Single GPU:

```bash
python train_tinylora.py \
--base_model mistralai/Mistral-7B-v0.1 \
--dataset meta-math/MetaMathQA \
--output_dir ./output
```

Multi-GPU (8 GPUs):

```bash
torchrun --nproc_per_node=8 train_tinylora.py \
--base_model mistralai/Mistral-7B-v0.1 \
--dataset meta-math/MetaMathQA \
--output_dir ./output
```

Evaluation:

```bash
# Evaluate merged model
python eval_tinylora.py --model ./output/*/merged
# Or evaluate from adapter (merges automatically)
python eval_tinylora.py \
--adapter ./output/*/final \
--base_model mistralai/Mistral-7B-v0.1
```

TinyLoRA parameters:

| Parameter | Default | Description |
|---|---|---|
| --tinylora_u | 64 | Trainable parameters per weight group |
| --tinylora_n_tie | 1 | Weight tying factor (higher = fewer params) |
| --lora_r | 64 | LoRA rank for SVD initialization |

Training parameters:
| Parameter | Default | Description |
|---|---|---|
| --base_model | mistralai/Mistral-7B-v0.1 | Base model to fine-tune |
| --dataset | meta-math/MetaMathQA | HuggingFace dataset |
| --dataset_split | train[:50000] | Dataset split |
| --num_train_epochs | 3 | Number of epochs |
| --learning_rate | 2e-4 | Learning rate |
| --per_device_train_batch_size | 16 | Batch size per GPU |
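For example, the adapter and training flags from the two tables above compose into a single invocation (a sketch; adjust the batch size and dataset split to your hardware and needs):

```bash
python train_tinylora.py \
  --base_model mistralai/Mistral-7B-v0.1 \
  --dataset meta-math/MetaMathQA \
  --dataset_split "train[:50000]" \
  --lora_r 64 \
  --tinylora_u 64 \
  --tinylora_n_tie 1 \
  --num_train_epochs 3 \
  --learning_rate 2e-4 \
  --per_device_train_batch_size 16 \
  --output_dir ./output
```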
Minimal parameters (~1,800 params):

```bash
python train_tinylora.py --tinylora_u 64 --tinylora_n_tie 8
```

Balanced (~14,000 params):

```bash
python train_tinylora.py --tinylora_u 64 --tinylora_n_tie 1
```

Maximum capacity (~57,000 params):

```bash
python train_tinylora.py --tinylora_u 256 --tinylora_n_tie 1
```

Programmatic usage:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model
from tinylora import initialize_tinylora
# Load base model
model = AutoModelForCausalLM.from_pretrained(
"mistralai/Mistral-7B-v0.1",
torch_dtype=torch.bfloat16,
)
# Apply LoRA config
lora_config = LoraConfig(
r=64,
lora_alpha=64,
target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj"],
lora_dropout=0,
task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
# Initialize TinyLoRA (freezes LoRA A/B, creates trainable v vectors)
model = initialize_tinylora(
model,
lora_config,
u=64, # trainable params per group
n_tie=1, # weight tying factor
)
# Train as usual with HuggingFace Trainer
# Only v vectors will be updated!
```
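For instance, a standard Trainer loop works unchanged (a minimal sketch, not part of this repository's API; `tokenized_dataset` is a placeholder for data you have already tokenized into input_ids/labels):

```python
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./output",
    num_train_epochs=3,
    learning_rate=2e-4,
    per_device_train_batch_size=16,
    bf16=True,
)

trainer = Trainer(
    model=model,                   # the TinyLoRA-initialized model from above
    args=training_args,
    train_dataset=tokenized_dataset,  # placeholder: your pre-tokenized dataset
)
trainer.train()                    # only the TinyLoRA v vectors receive gradients
```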
Save a checkpoint:

```python
from tinylora import save_tinylora_checkpoint

# Save adapter checkpoint
save_tinylora_checkpoint(model, "./checkpoint")
# Creates: tinylora_params.pt, lora_weights.pt
```

Load a checkpoint:

```python
from tinylora import load_tinylora_checkpoint

model = get_peft_model(base_model, lora_config)
model = load_tinylora_checkpoint(
model,
"./checkpoint",
lora_config,
u=64,
n_tie=1,
)
```

Merge the adapter into the base model:

```python
from tinylora import merge_tinylora_to_base

# Merge adapter into base model
merge_tinylora_to_base(
base_model_name="mistralai/Mistral-7B-v0.1",
checkpoint_dir="./checkpoint",
output_dir="./merged_model",
lora_r=64,
u=64,
n_tie=1,
)
# Load merged model (no adapter overhead)
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("./merged_model")
```

Project structure:

```
TinyLoRA/
├── tinylora.py            # Core TinyLoRA implementation
├── train_tinylora.py      # Training script
├── eval_tinylora.py       # Evaluation script (vLLM)
├── requirements.txt       # Dependencies
├── utils/
│   ├── svd_utils.py       # SVD utilities for initialization
│   └── ...
└── output/                # Training outputs
    └── <run_name>/
        ├── config.json    # Run configuration
        ├── final/         # TinyLoRA adapter
        │   ├── tinylora_params.pt
        │   └── lora_weights.pt
        └── merged/        # Standalone merged model
```
TinyLoRA is fully compatible with PEFT's merge_and_unload():
- During training: The forward pass computes B @ (Σᵢ vᵢPᵢ) @ A
- For merging: get_delta_weight() computes the same projection
- After merge: ΔW is added to the base weights, with no adapter overhead

```python
# After training
projection = model.tinylora_projection.get_projection(group_id)
delta_W = lora_B @ projection @ lora_A # Full weight update
# Merge into base
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./merged")
```

| Model | Method | Trainable Params | GSM8K Accuracy |
|---|---|---|---|
| Mistral-7B | Base (no tuning) | 0 | ~0% |
| Mistral-7B | TinyLoRA (u=64, n_tie=1) | 14,336 | 53.6% |
This is an unofficial implementation. Please cite the original paper:
```bibtex
@article{morris2026learning,
  title={Learning to Reason in 13 Parameters},
  author={Morris, John X. and Mireshghallah, Niloofar and Ibrahim, Mark and Mahloujifar, Saeed},
  journal={arXiv preprint arXiv:2602.04118},
  year={2026}
}
```

This implementation builds on LoRA-XS:

```bibtex
@article{balazy2024lora,
  title={LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters},
  author={Ba{\l}azy, Klaudia and Banaei, Mohammadreza and Aberer, Karl and Tabor, Jacek},
  journal={arXiv preprint arXiv:2405.17604},
  year={2024}
}
```

This project builds on LoRA-XS. See LICENSE for details.