# MoE-PEFT: An Efficient LLM Fine-Tuning Factory for Mixture of Experts (MoE) Parameter-Efficient Fine-Tuning
[![](https://github.com/TUDB-Labs/MoE-PEFT/actions/workflows/python-test.yml/badge.svg)](https://github.com/TUDB-Labs/MoE-PEFT/actions/workflows/python-test.yml)
[![](https://img.shields.io/github/stars/TUDB-Labs/MoE-PEFT?logo=GitHub&style=flat)](https://github.com/TUDB-Labs/MoE-PEFT/stargazers)
[![](https://img.shields.io/github/v/release/TUDB-Labs/MoE-PEFT?logo=Github)](https://github.com/TUDB-Labs/MoE-PEFT/releases/latest)
[![](https://img.shields.io/pypi/v/moe_peft?logo=pypi)](https://pypi.org/project/moe_peft/)
[![](https://img.shields.io/docker/v/mikecovlee/moe_peft?logo=Docker&label=docker)](https://hub.docker.com/r/mikecovlee/moe_peft/tags)
[![](https://img.shields.io/github/license/TUDB-Labs/MoE-PEFT)](http://www.apache.org/licenses/LICENSE-2.0)

MoE-PEFT is an open-source *LLMOps* framework built on [m-LoRA](https://github.com/TUDB-Labs/mLoRA). It is designed for high-throughput fine-tuning, evaluation, and inference of Large Language Models (LLMs) using MoE combined with other PEFT methods such as LoRA and DoRA. Key features of MoE-PEFT include:

- Concurrent fine-tuning, evaluation, and inference of multiple adapters with a shared pre-trained model.

- **MoE PEFT** optimization, mainly for [MixLoRA](https://github.com/TUDB-Labs/MixLoRA) and other MoLE implementations.

- Support for multiple PEFT algorithms and various pre-trained models.

- Seamless integration with the [HuggingFace](https://huggingface.co) ecosystem.

## About this notebook

This is a simple Jupyter notebook that showcases the basic process of building a chatbot with **Gemma-2 2B**.

## Install MoE-PEFT

In [None]:
! pip uninstall torchvision torchaudio -y
! pip install moe_peft

## Loading the model

In [None]:
import torch

import moe_peft

base_model = "google/gemma-2-2b-it"

model = moe_peft.LLMModel.from_pretrained(
    base_model,
    device=moe_peft.executor.default_device_name(),
    load_dtype=torch.bfloat16,
)
tokenizer = moe_peft.Tokenizer(base_model)

model.init_adapter(moe_peft.AdapterConfig(adapter_name="default"))

gen_config = moe_peft.GenerateConfig(adapter_name="default")


## Build a chatbot

In [11]:
from IPython.display import Markdown
import textwrap


def display_chat(prompt, text):
    formatted_prompt = (
        "<font color='brown'>🙋‍♂️<blockquote>" + prompt + "</blockquote></font>"
    )
    text = text.replace("•", "  *")
    text = textwrap.indent(text, "> ", predicate=lambda _: True)
    formatted_text = "<font color='teal'>🤖\n\n" + text + "\n</font>"
    return Markdown(formatted_prompt + formatted_text)


def to_markdown(text):
    text = text.replace("•", "  *")
    return Markdown(textwrap.indent(text, "> ", predicate=lambda _: True))


class ChatState:
    """
    Manages the conversation history for a turn-based chatbot.
    Follows the turn-based conversation guidelines for the Gemma family of models
    documented at https://ai.google.dev/gemma/docs/formatting
    """

    __START_TURN_USER__ = "<start_of_turn>user\n"
    __START_TURN_MODEL__ = "<start_of_turn>model\n"
    __END_TURN__ = "<end_of_turn>"

    def __init__(
        self,
        model: moe_peft.LLMModel,
        tokenizer: moe_peft.Tokenizer,
        gen_config: moe_peft.GenerateConfig,
        system: str = "",
    ):
        """
        Initializes the chat state.

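        Args:
            model: The language model to use for generating responses.
            tokenizer: The tokenizer that matches the model.
            gen_config: The generation config; its adapter_name selects the adapter used for generation.
            system: (Optional) System instructions or bot description.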
        """
        self.model = model
        self.tokenizer = tokenizer
        self.gen_config = gen_config
        self.system = system
        self.history = []

    def add_to_history_as_user(self, message):
        """
        Adds a user message to the history with start/end turn markers.
        """
        self.history.append(
            self.__START_TURN_USER__ + message + self.__END_TURN__ + "\n"
        )

    def add_to_history_as_model(self, message):
        """
        Adds a model response to the history with start/end turn markers.
        """
        # Close the model turn too, as the Gemma formatting guide requires.
        self.history.append(
            self.__START_TURN_MODEL__ + message + self.__END_TURN__ + "\n"
        )

    def get_history(self):
        """
        Returns the entire chat history as a single string.
        """
        return "".join(self.history)

    def get_full_prompt(self):
        """
        Builds the prompt for the language model, including history and system description.
        """
        prompt = self.get_history() + self.__START_TURN_MODEL__
        if len(self.system) > 0:
            prompt = self.system + "\n" + prompt
        return prompt

    def send_message(self, message):
        """
        Handles sending a user message and getting a model response.

        Args:
            message: The user's message.

        Returns:
            The model's response.
        """
        self.add_to_history_as_user(message)
        prompt = self.get_full_prompt()
        self.gen_config.prompts = [prompt]
        response = moe_peft.generate(
            self.model, self.tokenizer, [self.gen_config], max_gen_len=2048
        )[self.gen_config.adapter_name][0]
        result = response.replace(prompt, "").replace(
            self.__END_TURN__, ""
        )  # Extract only the new response
        self.add_to_history_as_model(result)
        return result
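
To inspect the kind of prompt that `get_full_prompt` sends to the model without loading any weights, the turn formatting can be reproduced in plain Python. This is a minimal sketch that assumes the same turn markers as `ChatState`; the helper name `build_gemma_prompt` and the `(role, text)` tuple layout are illustrative and not part of MoE-PEFT.

```python
START_TURN_USER = "<start_of_turn>user\n"
START_TURN_MODEL = "<start_of_turn>model\n"
END_TURN = "<end_of_turn>"


def build_gemma_prompt(turns, system=""):
    """Format (role, text) turns and leave an open model turn at the end.

    Hypothetical helper for illustration only. It follows the Gemma
    formatting guide by closing every completed turn with <end_of_turn>.
    """
    parts = []
    for role, text in turns:
        # Any role other than "user" is treated as a model turn.
        start = START_TURN_USER if role == "user" else START_TURN_MODEL
        parts.append(start + text + END_TURN + "\n")
    # The trailing open model turn asks the model to produce the next reply.
    prompt = "".join(parts) + START_TURN_MODEL
    if system:
        prompt = system + "\n" + prompt
    return prompt


demo_prompt = build_gemma_prompt([("user", "Hello!")])
print(demo_prompt)
```

Printing the prompt this way is a cheap check that every earlier turn is closed and that the prompt ends with an open model turn before any tokens are generated.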

## Chat with the model

In [12]:
chat = ChatState(model, tokenizer, gen_config)
message = "Tell me, in a few words,  how to compute all prime numbers up to 1000?"
display_chat(message, chat.send_message(message))

<font color='brown'>🙋‍♂️<blockquote>Tell me, in a few words,  how to compute all prime numbers up to 1000?</blockquote></font><font color='teal'>🤖

> Sieve of Eratosthenes.
> 
> **Explanation:** This is an efficient algorithm for finding prime numbers.  

</font>

In [13]:
message = "Now in Python! No numpy, please!"
display_chat(message, chat.send_message(message))

<font color='brown'>🙋‍♂️<blockquote>Now in Python! No numpy, please!</blockquote></font><font color='teal'>🤖

> ```python
> def sieve_of_eratosthenes(n):
>   """Returns a list of prime numbers up to n."""
>   primes = [True] * (n + 1)
>   primes[0] = primes[1] = False
> 
>   for i in range(2, int(n ** 0.5) + 1):
>     if primes[i]:
>       for j in range(i*i, n + 1, i):
>         primes[j] = False
> 
>   return [i for i, is_prime in enumerate(primes) if is_prime]
> 
> print(sieve_of_eratosthenes(1000))
> ```
> 
> 
> **Explanation:**
> 
> 1. **Initialization:** A boolean list `primes` is created with size `n+1`, representing potential primes from 0 to n. Initially, both 0 and 1 are marked as non-primes.
> 2. **Iteration:** The loop starts from 2 up to the square root of `n`. We only need to check divisors up to the square root because any composite number has a prime factor less than or equal to its square root. 
> 3. **Marking Non-Primes:** For each prime `i`, its multiples starting from `i*i` are marked as non-primes in the `primes` array.
> 4. **Returning Primes:** Finally, we create a new list by filtering the `primes` array using list comprehension, keeping only those indices where `primes[i]` is True, indicating that the corresponding index corresponds to a prime number. 
> 
> 
> 
> Let me know if you have any further questions or need additional explanations.
</font>

In [14]:
message = "Thank you, it works! Can you explain the code in French?"
display_chat(message, chat.send_message(message))

<font color='brown'>🙋‍♂️<blockquote>Thank you, it works! Can you explain the code in French?</blockquote></font><font color='teal'>🤖

> La fonction `sieve_of_eratosthenes(n)` renvoie une liste de nombres premiers jusqu'à `n`.
> 
> ### Explication:
>  La fonction utilise la méthode de Sieve d’Erathostène pour trouver les nombres premiers. Voici comment ça marche: 
> 
> 1. **Initialisation**: 
>    -  On crée une liste booléenne `primes` de taille `n+1` représentant des nombres potentiels qui sont premiers. Les éléments initiaux sont tous définis comme `True`.  Nous assignons les valeurs `False` à 0 et 1 car ils ne sont pas premiers.
> 
> 2. **Itération**: 
>     - On commence par le nombre 2 jusqu'au carré root de `n` (inclusif). Cela signifie que nous allons vérifier seulement les diviseurs potentiels jusqu'à la racine carrée de `n`. 
> 3. **Marquer les non-premiers**: 
>     - Pour chaque nombre premier `i`, on vérifie ses multiples (commençant par `i*i`) en plaçant les valeurs de `primes[j]` à `False`.
> 
> 
> 4. **Retourner les nombres premiers**: 
>    -  Enfin, un nouvel array est construit en filtrant la liste `primes`  en utilisant la syntaxe de liste comprehension. On sélectionne uniquement les indices correspondant à des nombres premiers où `primes[i]` est `True`. 
> 
> 
> 
> 
> 
>  
> 
> J'espère que cela vous aide à comprendre la fonction plus clairement! Si vous avez des questions supplémentaires, n'hésitez pas à les poser. 

</font>

In [15]:
message = "Great! Now add those explanations as comments in the code."
display_chat(message, chat.send_message(message))

<font color='brown'>🙋‍♂️<blockquote>Great! Now add those explanations as comments in the code.</blockquote></font><font color='teal'>🤖

> ```python
> def sieve_of_eratosthenes(n):
>   """Returns a list of prime numbers up to n."""
>   #Initialize a boolean list 'primes' with size n+1 to represent all possible prime numbers.
>   primes = [True] * (n + 1) 
>   primes[0] = primes[1] = False # 0 and 1 are not primes.
> 
>   # Iterate through the integers from 2 up to the square root of n.
>   for i in range(2, int(n ** 0.5) + 1): 
>     if primes[i]:
>       # If the current integer 'i' is prime
>       # Mark all multiples of 'i' as non-prime by setting their corresponding value in 'primes' to False
>       for j in range(i * i, n + 1, i): 
>           primes[j] = False 
> 
>   #Filter the 'primes' list to obtain a list of prime numbers
>   return [i for i, is_prime in enumerate(primes) if is_prime]
> 
> print(sieve_of_eratosthenes(1000))
> 
> ```
> 
> 
> 
> 
> Let me know if you would like further details or clarifications! 😄 
</font>
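
Model-generated code is worth checking before it is reused. The sketch below copies the sieve from the responses above and compares its output with a naive trial-division test; `is_prime_naive` is a helper introduced here for illustration only.

```python
def sieve_of_eratosthenes(n):
    """Returns a list of prime numbers up to n (copied from the model's answer)."""
    primes = [True] * (n + 1)
    primes[0] = primes[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if primes[i]:
            for j in range(i * i, n + 1, i):
                primes[j] = False
    return [i for i, is_prime in enumerate(primes) if is_prime]


def is_prime_naive(k):
    """Slow reference check: k is prime if no d in [2, k) divides it."""
    if k < 2:
        return False
    return all(k % d != 0 for d in range(2, k))


sieve_result = sieve_of_eratosthenes(1000)
naive_result = [k for k in range(1001) if is_prime_naive(k)]

# Both methods must agree on every number up to 1000.
assert sieve_result == naive_result
print(len(sieve_result))  # 168 primes up to 1000
```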