The following is based on a rUv tutorial

Reference:  https://gist.github.com/ruvnet/5cf24851841a120198f43e9639dba7a5

A `LionAGI` Implementation

## Mixture of Experts (MoE)

A **Mixture of Experts (MoE)** is a machine learning approach designed to enhance model performance by using multiple specialized models, called "experts." A gating model dynamically selects the most relevant expert(s) for each input, allowing the system to leverage the most appropriate expertise, thus improving overall accuracy and efficiency.

#### How MoE Works

1. **Training Experts**:
   - Multiple expert models are trained, each specializing in different aspects of the input data or different tasks.
   
2. **Gating Model**:
   - A neural network that dynamically assigns weights to these experts based on the input's features.
   - Routes the data to the most suitable expert(s).

3. **Combining Outputs**:
   - The outputs from the selected experts are combined, usually through a weighted sum, to produce the final result.
   - This enables the system to handle complex and diverse tasks more effectively than a single model.
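
To make the three steps above concrete, here is a minimal sketch of a dense mixture in plain Python. The toy experts and the `gate` function are made up for illustration (they are not part of `lionagi` or of the tutorial's later code); a real gate would be a trained network.

```python
import math

def softmax(scores):
    """Turn raw gating scores into weights that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mixture_output(x, experts, gate):
    """Combine expert outputs as a weighted sum, with weights from the gate."""
    weights = softmax(gate(x))
    outputs = [expert(x) for expert in experts]
    return sum(w * y for w, y in zip(weights, outputs)), weights

# Hypothetical toy experts: each one is a simple function of x.
experts = [lambda x: 2.0 * x, lambda x: x + 10.0, lambda x: -x]

# Hypothetical toy gate: scores depend on the input, so routing is dynamic.
def gate(x):
    return [x, 0.0, -x]

y, weights = mixture_output(3.0, experts, gate)
print("weights:", weights)
print("combined output:", y)
```

For a positive input, the first expert receives most of the weight, so the combined output stays close to that expert's answer.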

#### Benefits of MoE

- **Scalability**:
  - MoE can scale model capacity without a proportional increase in computational cost.
  - Activates only a subset of experts for each input, maintaining high performance while managing resource usage.
  
- **Efficiency**:
  - Ideal for applications requiring high throughput and low latency, such as real-time translation and large-scale recommendation systems.
  
- **Accuracy**:
  - Achieves more accurate and tailored outputs by leveraging specialized knowledge from different experts.
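
The scalability point rests on sparse activation: only the top-scoring experts run for a given input. A rough sketch of top-k routing, assuming hypothetical gate scores for four experts, might look like this:

```python
import math

def top_k_route(scores, k):
    """Keep the k highest-scoring experts and renormalize their weights."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    shift = max(scores[i] for i in top)  # for numerical stability
    exps = {i: math.exp(scores[i] - shift) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

gate_scores = [0.1, 2.0, -1.0, 1.5]  # hypothetical scores for 4 experts
routing = top_k_route(gate_scores, k=2)

# Only the experts present in `routing` would be evaluated for this input.
print(routing)
```

With `k=2`, half of the experts are skipped entirely, which is where the savings in compute come from.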

MoE is a versatile tool that improves the efficiency and effectiveness of AI models in a wide range of applications, making it a powerful approach in modern machine learning.

In this tutorial, we will walk through how to implement such a system using the agentic framework `lionagi`.

First, we need to install the package using `pip`.

In [None]:
%pip install lionagi==0.2.4     # if in ipynb
# !pip install lionagi==0.2.4       # if on colab

In [1]:
# let us import lionagi and check the version
import lionagi as li

print(li.__version__)

0.2.4


In [2]:
# configure logging, which helps with debugging
import logging

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

In this tutorial, the mixture of experts works by having multiple models work on the same task, each with a different configuration. We will create these configurations at random, including
- model (e.g. `openai/gpt-4o`, `google/gemini-pro-1.5`, `meta-llama/llama-3.1-405b-instruct`, accessed through OpenRouter)
- temperature
- top_p
- frequency and presence penalties
- ...

In [3]:
import random

models = [
    "google/gemini-pro-1.5",
    "google/gemini-flash-1.5",
    "openai/gpt-4o",
    "openai/gpt-4o-mini",
    "meta-llama/llama-3.1-405b-instruct"
]

def get_random_config(
    temperature_range = (0.7, 1.2),
    top_p_range = (0.8, 1.0),
    frequency_penalty_range = (0.0, 0.5),
    presence_penalty_range = (0.0, 0.5),
    max_tokens = 200,  
):
    return {
        "provider": "openrouter",
        "api_key_schema": "OPENROUTER_API_KEY",
        "model": random.choice(models),
        "temperature": random.uniform(*temperature_range),
        "top_p": random.uniform(*top_p_range),
        "frequency_penalty": random.uniform(*frequency_penalty_range),
        "presence_penalty": random.uniform(*presence_penalty_range),
        "max_tokens": max_tokens,
    }

# a helper function to check whether the model output ends with a complete sentence

def check_output_completeness(output: str) -> bool:
    # str.endswith accepts a tuple of suffixes
    return output.endswith((".", "!", "?"))
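
The completeness check is only a heuristic, and it helps to see where it gives surprising answers. The sketch below restates the same rule under a hypothetical name (`ends_with_sentence_punctuation`) so that it runs on its own, and applies it to a few sample strings.

```python
def ends_with_sentence_punctuation(text: str) -> bool:
    # Same rule as the helper above, restated so this snippet is self-contained.
    return text.endswith((".", "!", "?"))

samples = [
    "The AI market is projected to grow.",   # complete sentence
    "- **Applications Across Industrie",     # cut off mid-word
    "Done.\n",                               # complete, but with a trailing newline
    'She said "stop."',                      # complete, but ends with a quote mark
]

results = {s: ends_with_sentence_punctuation(s) for s in samples}
for text, ok in results.items():
    print(repr(text), "->", ok)
```

The last two samples are reported as incomplete even though the sentence is finished. If that matters for your outputs, stripping trailing whitespace with `text.rstrip()` before the check is one possible adjustment.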

### Expert Models

We will use `iModel` from `lionagi` as the base class for the expert model and extend its functionality.

`iModel` in `lionagi` manages the system's interaction with AI models such as LLMs. It can
- call API endpoints, such as chat completions and embeddings
- calculate perplexity scores (a measure from information theory)
- control token rate limits
- customize configuration
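
For readers unfamiliar with perplexity, the sketch below computes it from per-token log-probabilities using the standard definition (the exponential of the average negative log-probability). This illustrates the measure itself and is not a description of how `iModel` computes it internally; the sample probabilities are made up.

```python
import math

def perplexity(token_logprobs):
    """exp of the average negative log-probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    avg_neg_logprob = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_neg_logprob)

# Hypothetical per-token probabilities for two short generations.
confident = [math.log(0.9), math.log(0.8), math.log(0.95)]
uncertain = [math.log(0.2), math.log(0.1), math.log(0.25)]

print("confident:", perplexity(confident))
print("uncertain:", perplexity(uncertain))
```

Lower perplexity means the model assigned higher probability to the tokens it produced.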

#### Defining the `ExpertModel` class


In [4]:
import asyncio

class ExpertModel(li.iModel):
    
    def __init__(self, total_reward=0, **kwargs):
        super().__init__(**kwargs)
        
        # total reward is the sum of intrinsic and extrinsic rewards
        # it is used as a reinforcement signal
        self.total_reward = total_reward
    
    # a class method to generate a random expert model instance
    @classmethod
    def random_expert(
        cls, 
        temperature_range = (0.7, 1.2),
        top_p_range = (0.8, 1.0),
        frequency_penalty_range = (0.0, 0.5),
        presence_penalty_range = (0.0, 0.5),
        max_tokens = 1000,  
    ):
        config = get_random_config(
            temperature_range=temperature_range,
            top_p_range=top_p_range,
            frequency_penalty_range=frequency_penalty_range,
            presence_penalty_range=presence_penalty_range,
            max_tokens=max_tokens,
        )
        return cls(**config)

    # the main function to call chat completion
    # where expert generates output based on instruction and context
    async def generate_complete_output(
        self,
        instruction=None, 
        context=None,
        system=None,
        idx=0,
    ):
        # in a Branch, we can chat with the model by
        # passing it as the imodel parameter, which in this case is self,
        # since the class inherits from iModel
        branch = li.Branch(system=system, imodel=self)
        print(f"Generating output for Expert {idx+1}...")
        
        # this is the chat function to get model output
        output = await branch.chat(instruction=instruction, context=context)
        
        # we check if the output is complete
        if not check_output_completeness(output):
            
            # if not complete, we ask the model to continue and complete the sentence
            output += await branch.chat('continue and complete the previous sentence')

        # we now display the configuration of the expert model
        # and show the output
        config = self.to_dict()
        print(f"\nLLM Parameters for Expert {idx+1}:")
        print("------------------------")
        print(f"Model: {config['model']}")
        print(f"Max Tokens per chunk: {config['max_tokens']}")
        print(f"Temperature: {config['temperature']}")
        print(f"Top P: {config['top_p']}")
        print(f"Frequency Penalty: {config['frequency_penalty']}")
        print(f"Presence Penalty: {config['presence_penalty']}")
        print("------------------------")
        
        return output
    
    # a second component of mixture of experts is to choose the best output
    # out of a group of expert outputs
    # here we will use lionagi's branch.direct to score the outputs
    async def select_best_output(self, candidates, context, n_judge=3):
        
        context = {"background": context}
        
        async def inner_score(candidate):
            # since we are using asyncio, we need to copy the context
            _context = context.copy()
            
            # we add the candidate to the context
            _context["candidate"] = candidate

            branch = li.Branch(imodel=self)
            form = await branch.direct(
                system="Act as a critical judge", 
                instruction="Based on the context, score the text for relevance and coherence",
                context = _context,
                score=True,
                score_range=(0, 1),
                score_num_digits=3,
                retries=3,
            )
            return form.score
        
        # we use a number of judges to score the outputs
        # and take the average score to compare with other outputs
        # we run the scoring in parallel for all judges
        async def get_avg_score(candidate):
            task = [inner_score(candidate) for _ in range(n_judge)]
            return sum(await asyncio.gather(*task)) / n_judge
        
        print("Scoring the outputs...")
        
        # we run the scoring in parallel for all candidates
        # you can use alcall to run a function across all inputs in parallel
        tasks = [get_avg_score(candidate) for candidate in candidates]
        scores = await asyncio.gather(*tasks)
        
        # we now have the scores for all candidates
        # let us sort them and return the best output
        outputs = []
        for idx, candidate in enumerate(candidates):
            outputs.append((idx, candidate, scores[idx]))
        
        return sorted(outputs, key=lambda x: x[2], reverse=True)[0]
    
    
    # a third component of mixture of experts is to assign intrinsic reward
    # to the expert outputs from the scores and additional reward evaluation
    async def assign_intrinsic_reward(self, expert_output, context):
        
        form = await li.direct.select(
            system="Act as a critical judge",
            imodel=self,
            instruction="Based on the context, select a judgement for the model output",
            context={"background": context, "candidate": expert_output},
            choices=["highly effective", "effective", "moderate", "poor", "bad"],
        )

        reward_selection = form.selection.lower().strip()
        match reward_selection:
            case "highly effective":
                return 1.0
            case "effective":
                return 0.7
            case "moderate":
                return 0.5
            case "poor":
                return 0.2
            case _:
                return 0.0
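
The scoring pattern used in `select_best_output` (several judges per candidate, all run concurrently, then averaged) can be tried without any API calls. In the sketch below, `stub_judge` is a hypothetical stand-in for the LLM judge, with a toy rule that longer text scores higher; only the concurrency and averaging structure mirrors the class above.

```python
import asyncio
import random

async def stub_judge(candidate: str, rng: random.Random) -> float:
    """Hypothetical judge standing in for an LLM call; returns a score in [0, 1]."""
    await asyncio.sleep(0)  # yield control, as a real API call would
    base = min(len(candidate) / 100, 1.0)  # toy rule: longer text scores higher
    noise = rng.uniform(-0.05, 0.05)       # judges disagree slightly
    return min(max(base + noise, 0.0), 1.0)

async def average_score(candidate, n_judge, rng):
    # all judges for one candidate run concurrently
    scores = await asyncio.gather(*(stub_judge(candidate, rng) for _ in range(n_judge)))
    return sum(scores) / n_judge

async def pick_best(candidates, n_judge=3, seed=0):
    rng = random.Random(seed)
    # all candidates are scored concurrently as well
    averages = await asyncio.gather(*(average_score(c, n_judge, rng) for c in candidates))
    best_idx = max(range(len(candidates)), key=lambda i: averages[i])
    return best_idx, candidates[best_idx], averages[best_idx]

candidates = [
    "Too short.",
    "A somewhat longer answer that covers more ground in detail.",
]
best_idx, best_text, best_score = asyncio.run(pick_best(candidates))
print(best_idx, round(best_score, 3))
```

Inside a notebook, where an event loop is already running, use `await pick_best(candidates)` instead of `asyncio.run(...)`.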

### Mixture of Experts

Now we are ready to create a `MixtureOfExperts` class to optimize the output.

In [5]:
class MixtureOfExperts:
    
    # we initialize the mixture of experts with a number of experts
    # then we include the gating model and reward model
    # the gating model is used to select the best output from the experts
    # the reward model is used to assign intrinsic reward to the expert outputs
    def __init__(
        self,
        num_experts: int = 4, 
        min_iterations: int = 3, 
        learning_rate: float = 0.1, 
        discount_factor: float = 0.95, 
        exploration_rate: float = 0.2, 
        max_tokens: int = 1000,
    ):
        self.num_experts=num_experts
        self.experts = [
            ExpertModel.random_expert(max_tokens=max_tokens) for _ in range(num_experts)
        ]
        self.gating_model = ExpertModel.random_expert()
        self.reward_model = ExpertModel.random_expert()
        self.min_iterations = min_iterations
        self.learning_rate = learning_rate
        self.discount_factor = discount_factor
        self.exploration_rate = exploration_rate
        self.selection_history = []
    
    # the main function to generate expert outputs
    async def generate_expert_outputs(
        self, instruction, context, system=None, n_judge=3):
        tasks = [
            expert.generate_complete_output(instruction, context, system, idx) 
            for idx, expert in enumerate(self.experts)]
        candidates = await asyncio.gather(*tasks)
        
        # first we select the best output from the experts
        selected_idx, selected_output, _ = await self.gating_model.select_best_output(
            candidates=candidates, 
            context=context, 
            n_judge=n_judge)
        
        # if the selected expert has been selected before,
        # we randomly pick an expert to discourage repetition,
        # and keep the returned output in sync with the new index
        if selected_idx in self.selection_history:
            selected_idx = random.randint(0, self.num_experts - 1)
            selected_output = candidates[selected_idx]
        
        # then we calculate the intrinsic reward for the selected output
        intrinsic_reward = await self.reward_model.assign_intrinsic_reward(
            expert_output=selected_output, context=context)
        
        self.selection_history.append(selected_idx)
        print("Expert output generation complete!")
        logging.info("All expert outputs have been generated and the most relevant expert has been selected.")

        return selected_output, selected_idx, intrinsic_reward

    # we add a check to determine if the training should stop
    def check_termination_condition(self, iteration: int, total_reward: float) -> bool:
        if iteration >= self.min_iterations and total_reward >= 10.0:
            return True
        return False

    # exploration rate manages the exploration-exploitation trade-off by varying 
    # the system's willingness to try different experts.
    def update_exploration_rate(self, iteration: int):
        self.exploration_rate = max(0.1, 1.0 - (iteration / self.min_iterations))

    def update_expert_rewards(self, selected_expert_index: int, reward: float):
        experts_values = [expert.total_reward for expert in self.experts]
        reward = self.learning_rate * (
            reward 
            + self.discount_factor * max(experts_values) 
            - self.experts[selected_expert_index].total_reward
        )
        self.experts[selected_expert_index].total_reward += reward
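
To see what `update_expert_rewards` and `update_exploration_rate` do numerically, here is a standalone sketch of the same two formulas. The function names `td_update` and `exploration_rate` and the reward values are illustrative assumptions; the arithmetic follows the methods above with their default `learning_rate`, `discount_factor`, and `min_iterations`.

```python
def td_update(values, selected, reward, learning_rate=0.1, discount_factor=0.95):
    """Return a copy of `values` with the selected expert's value updated."""
    # target = observed reward + discounted best current value
    target = reward + discount_factor * max(values)
    updated = list(values)
    # move the selected expert's value a fraction of the way toward the target
    updated[selected] += learning_rate * (target - values[selected])
    return updated

def exploration_rate(iteration, min_iterations=3, floor=0.1):
    """Decay linearly with the iteration count, but never below `floor`."""
    return max(floor, 1.0 - iteration / min_iterations)

values = [0.0, 0.0, 0.0, 0.0]
values = td_update(values, selected=0, reward=1.6)  # expert 0 moves to about 0.16
values = td_update(values, selected=2, reward=1.2)  # expert 2 moves to about 0.1352
rates = [exploration_rate(i) for i in range(4)]

print("values:", values)
print("exploration rates:", rates)
```

Because the target includes the best current value across all experts, an expert selected later benefits slightly from earlier successes of the others.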

        
        

In [6]:
context = "Acme Corporation is exploring investment opportunities in emerging technologies. The board seeks insights into which technologies could potentially transform their industry over the next decade."
instruction = "Evaluate the potential impact and investment viability of artificial intelligence (AI), blockchain, quantum computing, and biotechnology."

In [7]:
max_tokens = 1000
moe = MixtureOfExperts(max_tokens=max_tokens)

for iteration in range(moe.min_iterations):
    final_output, selected_expert_index, intrinsic_reward = await moe.generate_expert_outputs(instruction, context)

    print(f"Iteration {iteration+1} - Selected Expert: {selected_expert_index}, Intrinsic Reward: {intrinsic_reward}")
    print("Expert Values:", [expert.total_reward for expert in moe.experts])
    print("Final Expert Output:")
    print(final_output)
    print("------------------------")

    # # Get reward from an external expert
    # expert_reward = float(input(f"Enter expert reward for iteration {iteration+1}: "))

    expert_reward = random.uniform(0.3, 1)

    # Combine intrinsic and expert rewards
    total_reward = intrinsic_reward + expert_reward

    print(f"Expert Reward: {expert_reward}, Total Reward: {total_reward}")

    # Update the value estimate of the selected expert
    moe.update_expert_rewards(selected_expert_index, total_reward)

    # Update exploration rate based on the current iteration
    moe.update_exploration_rate(iteration)

    # Check termination condition
    if moe.check_termination_condition(iteration, total_reward):
        print("Termination condition met. Stopping the process.")
        break

Generating output for Expert 1...
Generating output for Expert 2...
Generating output for Expert 3...
Generating output for Expert 4...

LLM Parameters for Expert 3:
------------------------
Model: google/gemini-pro-flash
Max Tokens per chunk: 1000
Temperature: 0.8958890919885891
Top P: 0.8717363999866303
Frequency Penalty: 0.31753362792610845
Presence Penalty: 0.18864066902256277
------------------------

LLM Parameters for Expert 4:
------------------------
Model: openai/gpt-4o-mini
Max Tokens per chunk: 1000
Temperature: 0.9787904352444845
Top P: 0.9068555273368921
Frequency Penalty: 0.18585972529939276
Presence Penalty: 0.027041010312771396
------------------------

LLM Parameters for Expert 2:
------------------------
Model: meta-llama/llama-3.1-405b-instruct
Max Tokens per chunk: 1000
Temperature: 1.188752399240009
Top P: 0.8016784950838419
Frequency Penalty: 0.4295613205518591
Presence Penalty: 0.08466937287002219
------------------------

LLM Parameters for Expert 1:
----------

2024-08-09 11:51:54,976 - INFO - All expert outputs have been generated and the most relevant expert has been selected.


Expert output generation complete!
Iteration 1 - Selected Expert: 0, Intrinsic Reward: 1.0
Expert Values: [0, 0, 0, 0]
Final Expert Output:
### Evaluation of Emerging Technologies for Acme Corporation

#### 1. Artificial Intelligence (AI)

**Potential Impact:**
- **Automation and Efficiency:** AI can significantly enhance productivity by automating routine tasks, leading to cost savings and increased efficiency.
- **Data Analytics:** AI-driven data analytics can provide deeper insights into customer behavior, market trends, and operational inefficiencies.
- **Personalization:** AI enables hyper-personalized customer experiences, driving higher engagement and satisfaction.
- **Innovation:** AI fosters innovation in product development, supply chain optimization, and decision-making processes.

**Investment Viability:**
- **Market Growth:** The AI market is projected to grow substantially over the next decade, offering lucrative investment opportunities.
- **Applications Across Industrie

CancelledError: 