TinyLoRA: Extreme Parameter-Efficient Fine-Tuning

Unofficial implementation of TinyLoRA from the paper "Learning to Reason in 13 Parameters" by Morris et al.

This repository provides a clean, documented implementation of the TinyLoRA technique, which achieves extreme parameter efficiency by replacing LoRA's trainable low-rank matrices with tiny projected vectors.

Key Result: Fine-tune Mistral-7B with only ~14,000 trainable parameters and achieve 53.6% accuracy on GSM8K.

How It Works

TinyLoRA builds on LoRA-XS but takes parameter efficiency to the extreme:

Method     Trainable Component    Parameters (7B model)
LoRA       B and A matrices       ~40M
LoRA-XS    r×r matrix R           ~900K
TinyLoRA   u-dim vector v         ~14K

Architecture

Standard LoRA:   ΔW = B @ A              (trains B, A)
LoRA-XS:         ΔW = B @ R @ A          (trains r×r matrix R)
TinyLoRA:        ΔW = B @ (Σᵢ vᵢPᵢ) @ A  (trains u-dim vector v)

Where:

  • B, A: Frozen matrices initialized via SVD of pretrained weights
  • v: Trainable vector of size u (default: 64 parameters)
  • P: Fixed random tensor of shape (u, r, r)
  • The projection Σᵢ vᵢPᵢ creates an r×r matrix from the tiny v vector
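
A minimal sketch of this projection step in plain PyTorch (shapes and tensor names here are illustrative, not the repository's exact API):

import torch

r, u = 64, 64
B = torch.randn(4096, r)                 # frozen, from the SVD of the pretrained weight
A = torch.randn(r, 4096)                 # frozen, from the SVD of the pretrained weight
P = torch.randn(u, r, r)                 # fixed random projection tensor
v = torch.zeros(u, requires_grad=True)   # the only trainable parameters

R = torch.einsum("u,urs->rs", v, P)      # Σᵢ vᵢPᵢ, an r×r matrix built from v
delta_W = B @ R @ A                      # same shape as the frozen weight it adapts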

Weight Tying

Use n_tie to share v vectors across multiple layers:

  • n_tie=1: Each layer has its own v (default)
  • n_tie=8: Every 8 layers share one v (8x fewer parameters)
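
A rough sketch of how tied groups can be assigned (the indexing below is illustrative, not necessarily the repository's exact scheme):

# Layers that share a v vector fall into the same group: group = layer_idx // n_tie
n_layers, n_tie, u = 32, 8, 64

n_groups = (n_layers + n_tie - 1) // n_tie     # 4 groups for 32 layers
group_of = [layer_idx // n_tie for layer_idx in range(n_layers)]
print(n_groups, group_of[:10])                 # 4 [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# Per target module: n_groups * u trainable values instead of n_layers * u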

Quick Start

Installation

git clone https://github.com/RobotSail/TinyLoRA.git
cd TinyLoRA

# Create virtual environment
python -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

Training

Single GPU:

python train_tinylora.py \
    --base_model mistralai/Mistral-7B-v0.1 \
    --dataset meta-math/MetaMathQA \
    --output_dir ./output

Multi-GPU (8 GPUs):

torchrun --nproc_per_node=8 train_tinylora.py \
    --base_model mistralai/Mistral-7B-v0.1 \
    --dataset meta-math/MetaMathQA \
    --output_dir ./output

Evaluation

# Evaluate merged model
python eval_tinylora.py --model ./output/*/merged

# Or evaluate from adapter (merges automatically)
python eval_tinylora.py \
    --adapter ./output/*/final \
    --base_model mistralai/Mistral-7B-v0.1

Configuration Options

TinyLoRA Parameters

Parameter          Default   Description
--tinylora_u       64        Trainable parameters per weight group
--tinylora_n_tie   1         Weight tying factor (higher = fewer params)
--lora_r           64        LoRA rank for SVD initialization

Training Parameters

Parameter                       Default                     Description
--base_model                    mistralai/Mistral-7B-v0.1   Base model to fine-tune
--dataset                       meta-math/MetaMathQA        HuggingFace dataset
--dataset_split                 train[:50000]               Dataset split
--num_train_epochs              3                           Number of epochs
--learning_rate                 2e-4                        Learning rate
--per_device_train_batch_size   16                          Batch size per GPU

Example Configurations

Minimal parameters (~1,800 params):

python train_tinylora.py --tinylora_u 64 --tinylora_n_tie 8

Balanced (~14,000 params):

python train_tinylora.py --tinylora_u 64 --tinylora_n_tie 1

Maximum capacity (~57,000 params):

python train_tinylora.py --tinylora_u 256 --tinylora_n_tie 1
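
These rough counts follow from having one v vector per (layer, target module) group. A back-of-the-envelope check, assuming Mistral-7B's 32 decoder layers and the seven target modules configured in the usage example below:

n_layers = 32    # Mistral-7B decoder layers
n_modules = 7    # q/k/v/o_proj + gate/up/down_proj

def tinylora_params(u, n_tie):
    groups = (n_layers // n_tie) * n_modules
    return groups * u

print(tinylora_params(64, 8))     # 1792  (~1,800 params)
print(tinylora_params(64, 1))     # 14336 (~14,000 params)
print(tinylora_params(256, 1))    # 57344 (~57,000 params)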

Using TinyLoRA in Your Code

Basic Usage

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model
from tinylora import initialize_tinylora

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.bfloat16,
)

# Apply LoRA config
lora_config = LoraConfig(
    r=64,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Initialize TinyLoRA (freezes LoRA A/B, creates trainable v vectors)
model = initialize_tinylora(
    model,
    lora_config,
    u=64,        # trainable params per group
    n_tie=1,     # weight tying factor
)

# Train as usual with HuggingFace Trainer
# Only v vectors will be updated!
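
To confirm that only the v vectors remain trainable, a quick sanity check using standard PyTorch (no TinyLoRA-specific API involved):

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {trainable:,} / {total:,}")
# With u=64 and n_tie=1 this should report roughly 14K trainable parameters.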

Saving Checkpoints

from tinylora import save_tinylora_checkpoint

# Save adapter checkpoint
save_tinylora_checkpoint(model, "./checkpoint")
# Creates: tinylora_params.pt, lora_weights.pt

Loading Checkpoints

from tinylora import load_tinylora_checkpoint

model = get_peft_model(base_model, lora_config)
model = load_tinylora_checkpoint(
    model,
    "./checkpoint",
    lora_config,
    u=64,
    n_tie=1,
)

Merging for Deployment

from tinylora import merge_tinylora_to_base

# Merge adapter into base model
merge_tinylora_to_base(
    base_model_name="mistralai/Mistral-7B-v0.1",
    checkpoint_dir="./checkpoint",
    output_dir="./merged_model",
    lora_r=64,
    u=64,
    n_tie=1,
)

# Load merged model (no adapter overhead)
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("./merged_model")

File Structure

TinyLoRA/
├── tinylora.py           # Core TinyLoRA implementation
├── train_tinylora.py     # Training script
├── eval_tinylora.py      # Evaluation script (vLLM)
├── requirements.txt      # Dependencies
├── utils/
│   ├── svd_utils.py      # SVD utilities for initialization
│   └── ...
└── output/               # Training outputs
    └── <run_name>/
        ├── config.json   # Run configuration
        ├── final/        # TinyLoRA adapter
        │   ├── tinylora_params.pt
        │   └── lora_weights.pt
        └── merged/       # Standalone merged model

How Merging Works

TinyLoRA is fully compatible with PEFT's merge_and_unload():

  1. During training: The forward pass computes B @ (Σᵢ vᵢPᵢ) @ A
  2. For merging: get_delta_weight() computes the same projection
  3. After merge: ΔW is added to base weights, no adapter overhead

# After training
projection = model.tinylora_projection.get_projection(group_id)
delta_W = lora_B @ projection @ lora_A  # Full weight update

# Merge into base
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./merged")

Results

Model        Method                     Trainable Params   GSM8K Accuracy
Mistral-7B   Base (no tuning)           0                  ~0%
Mistral-7B   TinyLoRA (u=64, n_tie=1)   14,336             53.6%

Citation

This is an unofficial implementation. Please cite the original paper:

@article{morris2026learning,
  title={Learning to Reason in 13 Parameters},
  author={Morris, John X. and Mireshghallah, Niloofar and Ibrahim, Mark and Mahloujifar, Saeed},
  journal={arXiv preprint arXiv:2602.04118},
  year={2026}
}

This implementation builds on LoRA-XS:

@article{balazy2024lora,
  title={LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters},
  author={Ba{\l}azy, Klaudia and Banaei, Mohammadreza and Aberer, Karl and Tabor, Jacek},
  journal={arXiv preprint arXiv:2405.17604},
  year={2024}
}

License

This project builds on LoRA-XS. See LICENSE for details.
