# MoE-PEFT: An Efficient LLM Fine-Tuning Factory for Mixture of Expert (MoE) Parameter-Efficient Fine-Tuning.
[![](https://github.com/TUDB-Labs/MoE-PEFT/actions/workflows/python-test.yml/badge.svg)](https://github.com/TUDB-Labs/MoE-PEFT/actions/workflows/python-test.yml)
[![](https://img.shields.io/github/stars/TUDB-Labs/MoE-PEFT?logo=GitHub&style=flat)](https://github.com/TUDB-Labs/MoE-PEFT/stargazers)
[![](https://img.shields.io/github/v/release/TUDB-Labs/MoE-PEFT?logo=Github)](https://github.com/TUDB-Labs/MoE-PEFT/releases/latest)
[![](https://img.shields.io/pypi/v/moe_peft?logo=pypi)](https://pypi.org/project/moe_peft/)
[![](https://img.shields.io/docker/v/mikecovlee/moe_peft?logo=Docker&label=docker)](https://hub.docker.com/r/mikecovlee/moe_peft/tags)
[![](https://img.shields.io/github/license/TUDB-Labs/MoE-PEFT)](http://www.apache.org/licenses/LICENSE-2.0)

MoE-PEFT is an open-source *LLMOps* framework built on [m-LoRA](https://github.com/TUDB-Labs/mLoRA). It is designed for high-throughput fine-tuning, evaluation, and inference of Large Language Models (LLMs) using MoE combined with other PEFT methods such as LoRA and DoRA. Key features of MoE-PEFT include:

- Concurrent fine-tuning, evaluation, and inference of multiple adapters with a shared pre-trained model.

- **MoE PEFT** optimization, mainly for [MixLoRA](https://github.com/TUDB-Labs/MixLoRA) and other MoLE implementations.

- Support for multiple PEFT algorithms and various pre-trained models.

- Seamless integration with the [HuggingFace](https://huggingface.co) ecosystem.

## About this notebook

This is a simple Jupyter notebook that showcases the basic process of fine-tuning **TinyLLaMA** with dummy data.

## Install MoE-PEFT

In [None]:
! pip uninstall torchvision torchaudio -y
! pip install moe_peft

Found existing installation: torchvision 0.21.0+cu124
Uninstalling torchvision-0.21.0+cu124:
  Successfully uninstalled torchvision-0.21.0+cu124
Found existing installation: torchaudio 2.6.0+cu124
Uninstalling torchaudio-2.6.0+cu124:
  Successfully uninstalled torchaudio-2.6.0+cu124
Collecting moe_peft
  Downloading moe_peft-2.0.2-py3-none-any.whl.metadata (12 kB)
Collecting torch<2.6.0,>=2.4.0 (from moe_peft)
  Downloading torch-2.5.1-cp311-cp311-manylinux1_x86_64.whl.metadata (28 kB)
Collecting datasets (from moe_peft)
  Downloading datasets-3.5.0-py3-none-any.whl.metadata (19 kB)
Collecting evaluate (from moe_peft)
  Downloading evaluate-0.4.3-py3-none-any.whl.metadata (9.2 kB)
Collecting transformers<4.47.0,>=4.44.0 (from moe_peft)
  Downloading transformers-4.46.3-py3-none-any.whl.metadata (44 kB)
Collecting tiktoken (from moe_peft)
  Downloading tiktoken-0.9.0-c
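
After the install finishes, it can help to confirm which versions were actually resolved, since the log above shows `moe_peft` constraining `torch` to `>=2.4.0,<2.6.0` and `transformers` to `>=4.44.0,<4.47.0`. The following is a minimal sketch that uses only the Python standard library; the helper name `installed_version` is an illustrative assumption and is not part of MoE-PEFT.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the packages whose versions matter for this notebook.
for name in ("moe_peft", "torch", "transformers"):
    print(f"{name}: {installed_version(name)}")
```

A result of `None` for any of these packages suggests the install cell did not complete and should be re-run.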

## Loading the base model

In [None]:
import torch

import moe_peft

moe_peft.setup_logging("INFO")

base_model = "DavidAU/Llama-3.2-8X3B-MOE-Dark-Champion-Instruct-uncensored-abliterated-18.4B"

# Load the base model weights in bfloat16 on the default device.
model = moe_peft.LLMModel.from_pretrained(
    base_model,
    device=moe_peft.executor.default_device_name(),
    load_dtype=torch.bfloat16,
)
tokenizer = moe_peft.Tokenizer(base_model)
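
Before running the cell above, a rough estimate of the memory needed for the weights alone can indicate whether the model will fit on the target device. The sketch below is plain arithmetic, not a MoE-PEFT API: the helper name `weight_memory_gb` is hypothetical, and the parameter count of about 18.4 billion is an assumption read off the "18.4B" in the checkpoint name. Activations, optimizer state, and adapter weights need additional memory on top of this figure.

```python
# Bytes per element for common load dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gb(num_params: float, dtype: str = "bfloat16") -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Assumption: ~18.4e9 parameters, taken from the checkpoint name.
print(f"{weight_memory_gb(18.4e9):.1f} GB")  # prints "36.8 GB"
```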

## Training a dummy LoRA adapter

In [None]:
lora_config = moe_peft.adapter_factory(
    peft_type="LORA",
    adapter_name="lora_0",
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
    ],
)

model.init_adapter(lora_config)

train_config = moe_peft.TrainConfig(
    adapter_name="lora_0",
    data_path="ANTEGRAL/korean-persona-chat-v1",
    num_epochs=10,
    batch_size=16,
    micro_batch_size=8,
    learning_rate=1e-4,
)

moe_peft.train(model=model, tokenizer=tokenizer, configs=[train_config])
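
To make the hyperparameters in this cell more concrete, the sketch below works out three derived quantities. It is illustrative arithmetic only and does not call MoE-PEFT. Two points are assumptions: the 3072 x 3072 projection size is a hypothetical example rather than a value read from the model, and the reading of `batch_size` / `micro_batch_size` as a gradient-accumulation ratio is the common convention, which this notebook does not confirm. All helper names are hypothetical.

```python
def lora_scaling(r: int, lora_alpha: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update B @ A."""
    return lora_alpha / r


def lora_params_per_module(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters for one adapted linear layer.

    A has shape (r, d_in) and B has shape (d_out, r).
    """
    return r * (d_in + d_out)


def accumulation_steps(batch_size: int, micro_batch_size: int) -> int:
    """Assumed convention: micro-batches accumulated per optimizer step."""
    if batch_size % micro_batch_size != 0:
        raise ValueError("batch_size must be a multiple of micro_batch_size")
    return batch_size // micro_batch_size


print(lora_scaling(r=8, lora_alpha=16))          # 2.0
print(lora_params_per_module(3072, 3072, r=8))   # 49152 (hypothetical layer size)
print(accumulation_steps(16, 8))                 # 2
```

With `r=8` and `lora_alpha=16`, the low-rank update is scaled by 2, and each adapted projection adds only a small number of trainable parameters relative to the frozen weight matrix.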

## Validating the LoRA adapter

In [None]:
# Generate with the trained adapter; stop_token is set to a newline.
generate_config = moe_peft.GenerateConfig(
    adapter_name="lora_0",
    prompts=["Could you provide an introduction to MoE-PEFT?"],
    stop_token="\n",
)

output = moe_peft.generate(
    model=model, tokenizer=tokenizer, configs=[generate_config], max_gen_len=128
)

print(output["lora_0"][0])
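
The effect of `stop_token="\n"` can be pictured as cutting the completion at the first newline. The sketch below illustrates that idea on plain strings; it is not MoE-PEFT's implementation, which presumably works on token IDs during decoding and may treat the stop token differently. The helper name `truncate_at_stop` and the sample text are hypothetical.

```python
def truncate_at_stop(text: str, stop_token: str) -> str:
    """Return the text up to, but not including, the first stop token."""
    index = text.find(stop_token)
    return text if index == -1 else text[:index]


# Hypothetical completion used only to demonstrate the truncation.
sample = "First line of the answer.\nSecond line that would be dropped."
print(truncate_at_stop(sample, "\n"))  # prints "First line of the answer."
```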