GitHub - alif-munim/minOFT: A minimal re-implementation of orthogonal fine-tuning (OFT) for LLMs. Based on nanoGPT and minLoRA.

MinOFT

An extension of karpathy/nanoGPT that adds orthogonal fine-tuning (OFT), introduced in the paper Controlling Text-to-Image Diffusion by Orthogonal Finetuning. LoRA updates the pretrained weight matrix by adding the product of two low-rank matrices. OFT instead learns an orthogonal matrix that transforms all neurons of a layer in the same way; the paper reports stronger generalization and more stable training than LoRA.
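
The difference between the two updates can be sketched in a few lines of PyTorch. This is a minimal illustration, not the code in this repository: it follows the paper's convention that the columns of W are neurons, builds the orthogonal matrix with a Cayley transform, and uses made-up sizes (`d`, `n`, `rank`) and random tensors in place of real pretrained weights.

```python
import torch

torch.manual_seed(0)
d, n, rank = 8, 6, 2  # illustrative sizes, not taken from the repository

# Stand-in for a frozen pretrained weight; each of the n columns is one neuron.
W = torch.randn(d, n, dtype=torch.float64)

# LoRA: add the product of two low-rank matrices to the frozen weight.
B = torch.randn(d, rank, dtype=torch.float64)
A = torch.randn(rank, n, dtype=torch.float64)
W_lora = W + B @ A

# OFT: multiply the frozen weight by an orthogonal matrix R.
# Cayley transform: R = (I + S)(I - S)^{-1} is orthogonal when S is skew-symmetric.
M = torch.randn(d, d, dtype=torch.float64)  # unconstrained trainable matrix
S = 0.5 * (M - M.T)                         # skew-symmetric part of M
I = torch.eye(d, dtype=torch.float64)
R = (I + S) @ torch.linalg.inv(I - S)
W_oft = R @ W

# Because R is orthogonal, inner products between neurons are unchanged.
gram_before = W.T @ W
gram_after = W_oft.T @ W_oft
print(torch.allclose(R.T @ R, I, atol=1e-8))
print(torch.allclose(gram_before, gram_after, atol=1e-8))
```

The additive LoRA update generally changes the angles between neurons, while the orthogonal transform leaves them intact, which is the property the OFT paper builds on.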

Main Additions

These are the main files that build upon karpathy/nanoGPT and cccntu/minLoRA.

finetuning/modular_oft.py is the OFT implementation simplified to ~500 lines of code.
train_nanogpt.py extends the original train.py script from karpathy/nanoGPT. Its config files accept additional flags: use_mlora and use_moft (modular LoRA and OFT) and use_plora (parametrized LoRA, from cccntu/minLoRA).
train_huggingface.py is a demo script for fine-tuning a roberta-base model on the eli5 dataset.
push_to_hub.py allows you to push your fine-tuned GPT2 models to the huggingface hub with the same config file used for training.
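
As a rough illustration of how these flags might appear in a nanoGPT-style config file (which is a plain Python file of variable assignments), here is a hypothetical excerpt. Only the three flag names come from this README; the other settings are standard nanoGPT options with made-up values, and enabling exactly one method at a time is an assumption rather than documented behavior.

```python
# Hypothetical excerpt of a config file such as config/finetune_shakespeare.py.
# Values are illustrative assumptions, not copied from the repository.
init_from = "gpt2"        # standard nanoGPT option: start from pretrained GPT-2
dataset = "shakespeare"   # standard nanoGPT option: folder name under data/
max_iters = 100           # standard nanoGPT option: number of training iterations

# Fine-tuning method flags named in this README.
use_moft = True    # modular OFT
use_mlora = False  # modular LoRA
use_plora = False  # parametrized LoRA (from cccntu/minLoRA)
```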

Training on Tiny Shakespeare

To prepare the tiny-shakespeare text dataset for GPT2 and start training, run the following commands:

python data/shakespeare/prepare.py
python train_nanogpt.py config/finetune_shakespeare.py

To push your model to Hugging Face Hub after training, run the following command:

python push_to_hub.py config/finetune_shakespeare.py

Below is a sample training run on tiny shakespeare (100 iterations) comparing regular LoRA and OFT.

Training on Guanaco

To prepare the openassistant-guanaco instruction tuning dataset for GPT2 and start training, run the following commands:

python data/guanaco/prepare.py
python train_nanogpt.py config/finetune_guanaco.py

To push your model to Hugging Face Hub after training, run the following command:

python push_to_hub.py config/finetune_guanaco.py

Below is a sample training run on guanaco (100 iterations) comparing regular fine-tuning, LoRA, and OFT.

General Usage

You can install minOFT using the commands below:

git clone https://github.com/alif-munim/minOFT.git
cd minOFT/
pip install -e .

Then, you can inject OFT into any PyTorch model with linear layers.

import torch
import torch.nn as nn

from minoft.modular_oft import inject_trainable_oft

# Load model and tokenizer, then freeze model weights
# ...
model.requires_grad_(False)

# Set OFT parameters
oft_r = 4
oft_eps = 1e-3
oft_coft = False
oft_block_share = False
normalize = False
search_class = [nn.Linear]  # Default is only nn.Linear, but you can also pass nn.Conv2d

# Set training and optimization parameters
learning_rate = 2e-5
weight_decay = 0.01
beta1 = 0.9
beta2 = 0.95

# Replace modules with trainable OFT linear modules
ft_modules = ["CausalSelfAttention"]  # Modules are specific to your model; you can target any number of them
oft_params, train_names = inject_trainable_oft(
    model,
    target_replace_module=ft_modules,
    verbose=False,
    r=oft_r,
    eps=oft_eps,
    is_coft=oft_coft,
    block_share=oft_block_share,
    normalize=normalize,
    search_class=search_class,
)

# Set optimizer
optim_groups = [
    {
        "params": oft_params,
        "weight_decay": weight_decay
    }
]
optimizer = torch.optim.AdamW(optim_groups, lr=learning_rate, betas=(beta1, beta2))

# Train as usual
# ...
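
Before training, it can be useful to confirm that only the injected parameters are trainable. The helper below is a generic PyTorch sketch and not part of minOFT: `count_parameters` is a hypothetical name, and the toy model simply unfreezes one bias to stand in for injected OFT parameters.

```python
from typing import Tuple

import torch.nn as nn


def count_parameters(model: nn.Module) -> Tuple[int, int]:
    """Return (trainable, total) parameter counts for a model."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total


# Toy model standing in for a real network (assumption for illustration).
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
model.requires_grad_(False)

# Stand-in for adapter injection: unfreeze a small subset of parameters.
model[2].bias.requires_grad_(True)

trainable, total = count_parameters(model)
print(f"trainable: {trainable} / {total} ({100 * trainable / total:.2f}%)")
```

On a real model, calling the same helper after `inject_trainable_oft` should show a trainable count that is a small fraction of the total.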

You can also try fine-tuning a roberta-base model on the eli5 dataset using the train_huggingface.py script. Make sure that the language model you use actually has nn.Linear layers (Hugging Face GPT-2 style models use Conv1D instead of Linear). If you're not sure, set debug = True in the training script to check the modules.

python train_huggingface.py
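
If you want to check a model's layer types outside the training script, a generic PyTorch sketch like the following can help. It is independent of the script's debug option; `count_module_types` and `has_linear_layers` are hypothetical helper names, and the toy models stand in for a real network.

```python
from collections import Counter

import torch.nn as nn


def count_module_types(model: nn.Module) -> Counter:
    """Count how often each module class appears in the model."""
    return Counter(type(m).__name__ for m in model.modules())


def has_linear_layers(model: nn.Module) -> bool:
    """True if at least one submodule is an nn.Linear."""
    return any(isinstance(m, nn.Linear) for m in model.modules())


# Toy models for illustration only.
mlp = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))
conv_only = nn.Sequential(nn.Conv1d(4, 4, kernel_size=3), nn.ReLU())

print(count_module_types(mlp))
print(has_linear_layers(mlp), has_linear_layers(conv_only))
```

A model for which `has_linear_layers` returns False would give the default `search_class=[nn.Linear]` nothing to replace.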

Evaluations

For downstream evaluation, I highly recommend EleutherAI/lm-evaluation-harness. It is compatible with Hugging Face models; after installing it, you can run evals as follows:

python lm-evaluation-harness/main.py --model hf-causal --model_args pretrained=alif-munim/gpt2-small-guanaco,tokenizer=gpt2 --tasks hellaswag --device cuda:0 --limit 0.05 --num_fewshot 0

Special Thanks

This project is an extension of the karpathy/nanoGPT repository created by Andrej Karpathy.
It is also heavily inspired by cccntu/minLoRA, although much of the actual LoRA and OFT implementation is borrowed from cloneofsimo/lora and Zeju1997/oft respectively.
The original minLoRA implementation has been moved to finetuning.parametrized_lora and can be enabled with use_plora in your config file. The modular versions of LoRA and OFT can be enabled with the use_mlora and use_moft flags.

