<a href="https://colab.research.google.com/gist/ruvnet/5cf24851841a120198f43e9639dba7a5/ruv-final.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction to the Recursive Unified Validators (rUv) MoE Toolkit

Reimagined by @rUv, the Recursive Unified Validators (rUv) toolkit redefines the landscape of AI optimization.

## **What does the future of software look like?**
### Introducing rUv MoE Toolkit: Powering software with self-learning & auto-enhancement. Supercharging older models, drastically boosting AI performance and significantly reducing costs.

Imagine if you could give superpowers to older, less capable AI models. The newest models differ drastically from smaller or older models in performance, cost, and capabilities.

**What if there was a way to automatically & easily tune older, cheaper, less capable models to greatly improve them?**

My approach uses what I'm calling Recursive Unified Validators (rUv). It's an AI optimization framework that leverages a Mixture of Experts (MoE) with a self-optimization & training methodology. It reimagines AI optimization by combining reinforcement learning, self-optimization/hyper-tuning, and an autonomous, self-evolving architecture.

Built using DSPy, rUv allows for seamless integration of expert modules and facilitates the creation of powerful AI systems.

# Core Benefits
- **It evolves as it learns, auto-optimizing itself:** Using an internal teleprompter, it can create its own internal prompts on the fly, learning new things based on any information, data, or requests made to it.

- **Efficiency through Resource Optimization:**  rUv optimizes computational resources by dynamically selecting the most relevant & intelligent expert models for each task.
- **Hyper-Tuning:** Each model is hyper-optimized for a specific topic or domain using automatic fine-tuning based on an internal reward system (reinforcement learning with human feedback).

- **Accuracy via Tailored Outputs:**  The framework generates tailored outputs by leveraging the specialized knowledge of multiple expert models. It automatically selects the best expert by testing for the best results.

- **Flexibility with Versatile Application:**  rUv can be applied to a wide range of domains and tasks, making it a versatile tool for various AI applications.

- **Insight Generation through Continuous Learning:** The self-learning capabilities of rUv enable it to continuously generate valuable insights and improve its performance over time.
- **Automation:** rUv is great for automating actions or tasks that require the application to learn and adjust. Think self-driving software.

## Novel Features
- **Reinforcement Learning and Self-Optimization**
  - **Self-learning capabilities:** rUv continuously learns and improves its performance through reinforcement learning techniques.
  - **Self-optimizing architecture:** The framework dynamically adjusts its architecture to adapt to different tasks and optimize its performance.
- **Dynamic Expert Model Selection**
  - **Mixture of Experts (MoE) approach:** rUv employs a MoE approach, where multiple expert models are trained to specialize in different domains or tasks.
  - **Context-aware selection:** The gating model dynamically selects the most relevant expert model based on the input context.

- **Enhanced Performance through Hyper-Tuning:** rUv allows for fine-grained control over various hyperparameters, enabling users to tune the system for optimal performance based on their specific requirements.

- **Adaptable Architecture for Output Generation:** The framework generates comprehensive outputs by combining the knowledge and capabilities of multiple expert models, resulting in more accurate and diverse results.

- **Auto Completion of Content or Code:** The rUv MoE Toolkit supports output continuation to generate more comprehensive responses. It recursively prompts the expert models to extend their outputs until a satisfactory level of completeness is achieved, as determined by checking for proper conclusion markers like context, grammar or code syntax, periods, exclamation points, or question marks at the end of the generated text.
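The output-continuation idea described in the last bullet can be sketched in a few lines. This is only an illustration under stated assumptions: `looks_complete`, `continue_until_complete`, the `generate` callback, and the `max_rounds` cap are hypothetical names, and the check covers only the punctuation markers mentioned above, not context, grammar, or code syntax.

```python
from typing import Callable

# Assumed conclusion markers; the toolkit's actual check may be broader.
TERMINATORS = (".", "!", "?")


def looks_complete(text: str) -> bool:
    """Return True when the text ends with a sentence-final punctuation mark."""
    return text.rstrip().endswith(TERMINATORS)


def continue_until_complete(
    generate: Callable[[str], str], prompt: str, max_rounds: int = 3
) -> str:
    """Call `generate` again with the text so far until the output looks complete."""
    output = generate(prompt)
    rounds = 0
    while not looks_complete(output) and rounds < max_rounds:
        # Ask the (hypothetical) expert to pick up where it left off.
        output += generate(prompt + output)
        rounds += 1
    return output
```

The `max_rounds` cap is a design choice for the sketch: it keeps the loop from running indefinitely when an expert never produces a terminating marker.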

## Uses

- **Business Analysis**: Offers detailed evaluations of market trends, investment opportunities, and technology impacts.
- **Code Development**: Assists in the generation, review, and optimization of code across various programming languages.
- **Creative Writing**: Enhances story creation, scriptwriting, and content development with innovative AI insights.
- **Academic Research**: Supports comprehensive analyses of complex topics, backed by up-to-date references and data.

# Technical Configuration Overview

## The Recursive Unified Validators (rUv) Toolkit operates under a set of technical parameters critical to its functionality.

The rUv parameters are pivotal in dictating the system's behavior and output quality.

Adjusting these settings enables precise control over the toolkit, ensuring optimal operation across various scenarios by fine-tuning efficiency, enhancing accuracy, and maintaining flexibility for a range of applications.


### 🤖 Number of Expert Models

- **Purpose**: Determines the range and specialization of the expert models within the system.
- **Impact**: More experts increase topic coverage but require additional computational resources.
- **Configurable Range**: Typically from 3 to 12, with higher values offering greater diversity and specialization.

### 🔄 Minimum Number of Iterations

- **Purpose**: Ensures a meaningful exploration and refinement process by running the system for a sufficient number of iterations.
- **Impact**: Higher iteration counts allow for more thorough output refinement and system adaptation.
- **Configurable Range**: Common settings range from 3 for quick tasks to 15 for in-depth refinement.

### 📈 Learning Rate

- **Purpose**: Adjusts the speed at which the system adapts by controlling the step size of expert value updates.
- **Impact**: Balances between fast adaptation and stability. Higher rates increase speed but may lead to instability.
- **Configurable Range**: Ranges from 0.05 for slow, stable learning to 0.5 for rapid adaptation.
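To make the step-size idea concrete, here is a minimal sketch of an incremental value update. It assumes each expert keeps a single scalar value that is nudged toward the latest reward; the function name `update_expert_value` is hypothetical and the toolkit's actual update rule may differ.

```python
def update_expert_value(value: float, reward: float, learning_rate: float) -> float:
    """Move the stored value a fraction `learning_rate` of the way toward the reward."""
    return value + learning_rate * (reward - value)


# Compare how quickly two learning rates approach a constant reward of 1.0.
slow, fast = 0.0, 0.0
for _ in range(5):
    slow = update_expert_value(slow, reward=1.0, learning_rate=0.05)
    fast = update_expert_value(fast, reward=1.0, learning_rate=0.5)

print(f"after 5 updates: slow={slow:.4f}, fast={fast:.4f}")
```

With a constant reward, the higher rate closes most of the gap within a few updates, while the lower rate moves slowly but is less sensitive to a single noisy reward.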

### 💰 Discount Factor

- **Purpose**: Weighs the importance of future rewards in the system's decision-making process.
- **Impact**: Higher factors prioritize long-term success, while lower factors focus on immediate outcomes.
- **Configurable Range**: From 0.8, emphasizing short-term gains, to 0.99, focusing on long-term rewards.
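The effect of the discount factor is easiest to see on a short reward sequence. The sketch below uses the standard discounted-return formula from reinforcement learning; `discounted_return` is a hypothetical helper for illustration, not a function from the toolkit.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], discount_factor: float) -> float:
    """Sum rewards, weighting the reward at step t by discount_factor ** t."""
    total = 0.0
    # Work backwards so each step folds in the already-discounted future.
    for reward in reversed(rewards):
        total = reward + discount_factor * total
    return total


rewards = [1.0, 1.0, 1.0]
short_term = discounted_return(rewards, 0.8)   # later rewards count for less
long_term = discounted_return(rewards, 0.99)   # later rewards count almost fully
print(f"0.8 -> {short_term:.4f}, 0.99 -> {long_term:.4f}")
```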

### 🔍 Exploration Rate

- **Purpose**: Manages the exploration-exploitation trade-off by varying the system's willingness to try different experts.
- **Impact**: Higher exploration rates foster diversity and adaptability, whereas lower rates optimize for current knowledge.
- **Configurable Range**: Ranges from 0.05 for minimal exploration to 0.5 for aggressive exploration of new strategies.
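One common way to implement this trade-off is epsilon-greedy selection, sketched below. This is an assumption about how the exploration rate could be applied, not a copy of the toolkit's selector; `select_expert` and its `rng` parameter are hypothetical names.

```python
import random
from typing import Optional, Sequence


def select_expert(
    values: Sequence[float],
    exploration_rate: float,
    rng: Optional[random.Random] = None,
) -> int:
    """Return an expert index: random with probability `exploration_rate`, else the best."""
    if rng is None:
        rng = random.Random()
    if rng.random() < exploration_rate:
        # Explore: pick any expert uniformly at random.
        return rng.randrange(len(values))
    # Exploit: pick the expert with the highest current value.
    return max(range(len(values)), key=lambda i: values[i])


expert_values = [0.1, 0.9, 0.3]
greedy_choice = select_expert(expert_values, exploration_rate=0.0)
```

Passing a seeded `random.Random` makes a run reproducible, which helps when comparing exploration settings.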

These parameters provide the foundation for tailoring the rUv Toolkit to specific needs, ensuring optimal performance across a wide array of applications.
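The five parameters can be gathered into one validated configuration object, as in the sketch below. The class name `RuvConfig` and the specific sanity checks are assumptions for illustration; the defaults follow the slider defaults used in this notebook's configuration cells.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuvConfig:
    """Hypothetical container for the rUv tuning parameters described above."""

    num_experts: int = 3
    min_iterations: int = 3
    learning_rate: float = 0.1
    discount_factor: float = 0.99
    exploration_rate: float = 0.2

    def __post_init__(self) -> None:
        # Basic sanity checks only; the typical ranges in the text are narrower.
        if self.num_experts < 1:
            raise ValueError("num_experts must be at least 1")
        if self.min_iterations < 1:
            raise ValueError("min_iterations must be at least 1")
        for name in ("learning_rate", "discount_factor", "exploration_rate"):
            value = getattr(self, name)
            if not 0.0 < value <= 1.0:
                raise ValueError(f"{name} must be in (0, 1], got {value}")


# A configuration aimed at in-depth refinement with more experts.
thorough = RuvConfig(num_experts=8, min_iterations=15, learning_rate=0.05)
```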

### License

rUv is made available under the MIT License, supporting open-source collaboration and innovation.



# Getting Set-Up

As we'll start to see below, **DSPy** can routinely teach powerful models like `GPT-3.5` and local models like `T5-base` or `Llama2-13b` to be much more reliable at complex tasks. **DSPy** will compile the _same program_ into different few-shot prompts and/or finetunes for each LM.

Let's begin by setting things up.

## The snippet below will also install **DSPy** if it's not there already.

In [None]:
%load_ext autoreload
%autoreload 2

import sys
import os

try:  # On Google Colab, clone the DSPy repo so the notebook cache is downloaded.
    import google.colab
    repo_path = 'dspy'
    !git -C $repo_path pull origin || git clone https://github.com/stanfordnlp/dspy $repo_path
except ImportError:  # Not running on Colab; use the current directory.
    repo_path = '.'

if repo_path not in sys.path:
    sys.path.append(repo_path)

# Set up the cache for this notebook
os.environ["DSP_NOTEBOOK_CACHEDIR"] = os.path.join(repo_path, 'cache')

import pkg_resources  # Install the package if it's not installed
if "dspy-ai" not in {pkg.key for pkg in pkg_resources.working_set}:
    !pip install -U pip
    !pip install dspy-ai
    !pip install openai~=0.28.1
    # !pip install -e $repo_path

import dspy

## Configure your OpenAI Key or use another LLM

In [None]:
# Install the OpenAI library (uncomment if needed)
# !pip install openai

# Import necessary libraries
import openai
from google.colab import userdata

# Retrieve and set the API key
api_key = userdata.get('OPENAI_API_KEY')
openai.api_key = api_key

# Verify the API key is set (this is just for demonstration and should not be used in production code)
if openai.api_key:
    print("OpenAI API key is set. Ready to proceed!")
else:
    print("OpenAI API key is not set. Please check your setup.")


OpenAI API key is set. Ready to proceed!


### 1] Getting Started

We'll start by setting up the language model (LM) and retrieval model (RM). **DSPy** supports multiple API and local models. In this notebook, we'll work with GPT-3.5 (`gpt-3.5-turbo`) and the retriever `ColBERTv2`.

To make things easy, we've set up a ColBERTv2 server hosting a Wikipedia 2017 "abstracts" search index (i.e., containing the first paragraph of each article from this [2017 dump](https://hotpotqa.github.io/wiki-readme.html)), so you don't need to worry about setting one up! It's free.

In [None]:
turbo = dspy.OpenAI(model='gpt-3.5-turbo')
colbertv2_wiki17_abstracts = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')

dspy.settings.configure(lm=turbo, rm=colbertv2_wiki17_abstracts)

### Recursive Unified Validators (rUv) with Mixture of Experts (MoE) Approach

The Recursive Unified Validators (rUv) concept, when applied within the DSPy framework, introduces a sophisticated approach to leveraging a Mixture of Experts (MoE) for complex problem-solving. This methodology is particularly effective in scenarios where a single model or "expert" is insufficient to address the multifaceted nature of the task at hand. By orchestrating a dynamic selection among a pool of specialized experts based on the input context, rUv aims to enhance both the accuracy and efficiency of the solution.

In the context of DSPy, each expert is encapsulated within a declarative signature, defining the expected inputs and outputs for that expert's domain of knowledge. Here's an example of how an expert signature might be defined:

```python
class rUv(dspy.Signature):
    input_field = dspy.InputField()
    output_field = dspy.OutputField(desc="Expert output")
```

This `rUv` serves as a blueprint for individual expert models, specifying the structure of the data they will receive and produce. The `input_field` represents the data or context that is fed into the expert, while the `output_field` describes the expert's output, which in this case, is generically described as "Expert output". This abstraction allows for a modular design where experts can be seamlessly integrated into the rUv system.

The rUv system itself is designed to recursively validate and refine the outputs of these experts. It employs a selector mechanism to dynamically choose the most appropriate expert(s) for a given input. This selection process is crucial for handling diverse and complex inputs that may require specialized knowledge or processing. Once an expert is selected, its output is then subject to validation and refinement processes, ensuring that the final output meets the desired criteria of accuracy and coherence.

The integration of rUv with MoE in DSPy facilitates a powerful, flexible approach to tackling challenging problems. It allows for the leveraging of specialized knowledge across various domains, ensuring that the most suitable expertise is applied to each aspect of the problem. This methodology not only enhances the system's overall performance but also its adaptability to new, unforeseen challenges.


# Configure your Experts

## Click the ▶️ below to activate each setting. Look for a ✅ *mark*

In [None]:
#@title Configuration Settings - Expert Models and Iterations { display-mode: "form" }
import ipywidgets as widgets
from IPython.display import display

num_experts_widget = widgets.IntSlider(
    value=3, min=1, max=32, step=1,
    description='Number of expert models to use',
    style={'description_width': 'initial'},
    layout=widgets.Layout(width='40%')
)

min_iterations_widget = widgets.IntSlider(
    value=3, min=1, max=100, step=1,
    description='Minimum number of iterations to run',
    style={'description_width': 'initial'},
    layout=widgets.Layout(width='40%')
)

config_form_1 = widgets.VBox([
    widgets.HTML("<h2>🤖 Number of Expert Models</h2>"),
    widgets.HTML("<p>Determines the diversity and specialization of the expert models. More experts can cover a wider range of topics but may require more computational resources.</p>"),
    num_experts_widget,
    widgets.HTML("<p><em>Examples:</em></p>"),
    widgets.HTML("<ul>"
                 "<li><code>num_experts = 3</code>: Use a small number of expert models for a focused approach. Suitable for simpler tasks or when computational resources are limited.</li>"
                 "<li><code>num_experts = 5</code>: Use a moderate number of expert models to balance diversity and computational efficiency. Appropriate for most general-purpose applications.</li>"
                 "<li><code>num_experts = 8</code>: Use a higher number of expert models for increased diversity and specialization. Beneficial for complex tasks that require expertise in multiple domains.</li>"
                 "<li><code>num_experts = 12</code>: Use a large number of expert models for highly diverse and specialized knowledge. Suitable for advanced applications with ample computational resources.</li>"
                 "</ul>"),
    widgets.HTML("<h2>🔄 Minimum Number of Iterations</h2>"),
    widgets.HTML("<p>Ensures that the system runs for a sufficient number of iterations to generate meaningful outputs and updates.</p>"),
    min_iterations_widget,
    widgets.HTML("<p><em>Examples:</em></p>"),
    widgets.HTML("<ul>"
                 "<li><code>min_iterations = 3</code>: Run the system for a minimum of 3 iterations. Suitable for quick prototyping or when a small number of iterations is sufficient.</li>"
                 "<li><code>min_iterations = 6</code>: Run the system for at least 6 iterations. Provides a balance between efficiency and allowing the system to refine its outputs.</li>"
                 "<li><code>min_iterations = 10</code>: Run the system for a minimum of 10 iterations. Allows for more comprehensive refinement and adaptation of the expert models.</li>"
                 "<li><code>min_iterations = 15</code>: Run the system for an extended number of iterations. Beneficial when the task requires significant fine-tuning and improvement over time.</li>"
                 "</ul>")
])

display(config_form_1)


In [None]:
#@title Configuration Settings - Learning Parameters { display-mode: "form" }
learning_rate_widget = widgets.FloatSlider(
    value=0.1, min=0.01, max=0.5, step=0.01,
    description='Learning rate for updating expert values',
    style={'description_width': 'initial'},
    layout=widgets.Layout(width='40%')
)

discount_factor_widget = widgets.FloatSlider(
    value=0.99, min=0.8, max=0.99, step=0.01,
    description='Discount factor for future rewards',
    style={'description_width': 'initial'},
    layout=widgets.Layout(width='40%')
)

exploration_rate_widget = widgets.FloatSlider(
    value=0.2, min=0.05, max=0.5, step=0.05,
    description='Exploration rate for selecting experts',
    style={'description_width': 'initial'},
    layout=widgets.Layout(width='40%')
)

config_form_2 = widgets.VBox([
    widgets.HTML("<h2>📈 Learning Rate</h2>"),
    widgets.HTML("<p>Controls the step size of the value updates. Higher learning rates lead to faster adaptation but may cause instability, while lower learning rates result in slower but more stable learning.</p>"),
    learning_rate_widget,
    widgets.HTML("<p><em>Examples:</em></p>"),
    widgets.HTML("<ul>"
                 "<li><code>learning_rate = 0.05</code>: Use a low learning rate for cautious and gradual updates. Suitable when stability is a priority and slower adaptation is acceptable.</li>"
                 "<li><code>learning_rate = 0.1</code>: Use a moderate learning rate for balanced updates. Provides a good trade-off between adaptation speed and stability (default).</li>"
                 "<li><code>learning_rate = 0.2</code>: Use a higher learning rate for faster adaptation. Beneficial when quick adjustments are needed, but may introduce some instability.</li>"
                 "<li><code>learning_rate = 0.5</code>: Use an aggressive learning rate for rapid adaptation. Suitable for scenarios where fast convergence is desired, but careful monitoring is required to avoid instability.</li>"
                 "</ul>"),
    widgets.HTML("<h2>💰 Discount Factor</h2>"),
    widgets.HTML("<p>Determines the importance of future rewards. Higher discount factors give more weight to future rewards, while lower discount factors prioritize immediate rewards.</p>"),
    discount_factor_widget,
    widgets.HTML("<p><em>Examples:</em></p>"),
    widgets.HTML("<ul>"
                 "<li><code>discount_factor = 0.8</code>: Use a lower discount factor to prioritize short-term rewards. Suitable when immediate outcomes are more important than long-term considerations.</li>"
                 "<li><code>discount_factor = 0.9</code>: Use a moderate discount factor to balance short-term and long-term rewards. Provides a good trade-off for most applications.</li>"
                 "<li><code>discount_factor = 0.95</code>: Use a higher discount factor to give more emphasis to future rewards. Beneficial when long-term performance is a key objective.</li>"
                 "<li><code>discount_factor = 0.99</code>: Use a very high discount factor to strongly prioritize future rewards. Suitable for tasks where long-term success is crucial (default).</li>"
                 "</ul>"),
    widgets.HTML("<h2>🔍 Exploration Rate</h2>"),
    widgets.HTML("<p>Balances the trade-off between exploiting the current best expert and exploring potentially better experts. Higher exploration rates encourage trying different experts, while lower rates focus on the current best expert.</p>"),
    exploration_rate_widget,
    widgets.HTML("<p><em>Examples:</em></p>"),
    widgets.HTML("<ul>"
                 "<li><code>exploration_rate = 0.05</code>: Use a low exploration rate to heavily focus on the current best expert. Suitable when the system has already converged to a good solution and stability is desired.</li>"
                 "<li><code>exploration_rate = 0.1</code>: Use a moderate exploration rate to occasionally explore alternative experts. Provides a balance between exploitation and exploration.</li>"
                 "<li><code>exploration_rate = 0.3</code>: Use a higher exploration rate to more frequently try different experts. Beneficial when the optimal expert is uncertain and more exploration is needed.</li>"
                 "<li><code>exploration_rate = 0.5</code>: Use an aggressive exploration rate to prioritize exploring new experts over exploiting the current best. Suitable for tasks with a high degree of uncertainty or when the system needs to adapt to changing conditions.</li>"
                 "</ul>")
])

display(config_form_2)


## The Prompts
To introduce a dynamic and user-friendly template selection for configuring context, prompt, and guidance settings, the following code integrates a dropdown widget with pre-defined templates.

These templates cover various scenarios like Business Analysis, Code Development, Story Creation, and TV/Movie Script writing, offering a structured approach to initializing the inputs for generating specialized content.

The dropdown selection triggers an update in the context, prompt, and guidance text areas, reflecting the specific requirements of the chosen template. This setup not only streamlines the configuration process but also ensures that users can easily tailor the system to their immediate needs without manual input adjustments.


In [None]:
import ipywidgets as widgets

# Define the templates
templates = {
    "Business Analysis": {
        "context": "Acme Corporation, a leading multinational conglomerate, is actively exploring strategic investment opportunities in emerging technologies to maintain its competitive edge and drive future growth. The board of directors has convened a special committee to conduct a comprehensive analysis of the technological landscape and identify the most promising areas for investment. The committee seeks in-depth insights and recommendations on which cutting-edge technologies have the potential to revolutionize Acme's core industries and create new market opportunities over the next decade.",
        "prompt": "Conduct a thorough evaluation of the potential impact and investment viability of four key emerging technologies: artificial intelligence (AI), blockchain, quantum computing, and biotechnology. For each technology, provide a detailed assessment of its current state of development, major players in the field, and projected market growth. Analyze the specific applications and use cases within Acme's core industries, highlighting the potential benefits, challenges, and disruptions each technology could bring. Consider factors such as scalability, regulatory landscape, talent availability, and competitive dynamics when assessing the investment viability of each technology. Provide clear recommendations on which technologies Acme should prioritize for investment, along with a proposed allocation of resources and a high-level roadmap for integration into the company's existing operations.",
        "guidance": "Provide a comprehensive and well-structured analysis, focusing on delivering clear, concise, and actionable insights. Use industry-specific terminology and cite relevant data and examples to support your recommendations. Maintain an objective and analytical tone throughout the report."
    },
    "Application Planning": {
        "context": "Acme Software Solutions is developing a new web application for task management and collaboration. The application aims to streamline project management processes and enhance team productivity. The development team is in the early stages of the project and seeks guidance on architecting a scalable and maintainable solution.",
        "prompt": "Design a high-level architecture for the task management and collaboration web application. Consider factors such as user authentication, data storage, real-time updates, and integration with third-party services. Provide recommendations on the choice of frontend and backend technologies, along with a justification for each selection. Outline the key components of the application, including the user interface, database schema, and API endpoints. Discuss potential challenges and propose strategies for addressing them, such as performance optimization, security considerations, and error handling. Finally, provide a roadmap for the development process, including milestones and deliverables.",
        "guidance": "Provide a clear and concise architectural overview, focusing on the key design decisions and their rationale. Use technical terminology and diagrams where appropriate to illustrate the system architecture. Ensure that the recommendations align with industry best practices and consider the long-term maintainability and scalability of the application."
    },
    "Source Code Generation": {
        "context": "The development team at Acme Software Solutions is tasked with automating parts of their workflow, specifically focusing on generating source code for repetitive tasks and common software patterns. The team aims to enhance productivity and reduce manual coding errors.",
        "prompt": "Create a Python script that automates the generation of source code for a simple REST API. The API should support basic CRUD (Create, Read, Update, Delete) operations for managing user information. Consider aspects such as request handling, response formatting, and data storage. Include error handling and input validation to ensure the robustness of the API. Provide comments within the code to explain the functionality and decisions made during development.",
        "guidance": "Ensure the source code is clean, modular, and follows Python best practices. Use appropriate libraries and frameworks, such as Flask or FastAPI, to simplify the implementation. Structure the code to allow for easy extension and maintenance. Include detailed comments to aid understanding and future development. The final script should offer a clear example of how to structure a basic REST API in Python, serving as a template for further customization and expansion."
    },
    "SQL Generation": {
        "context": "The analytics team at Acme Corporation needs to frequently extract insights from their customer database. To streamline their analysis, they require an automated solution for generating SQL queries based on specific analytical requirements. This solution should accommodate various types of queries, such as data retrieval, aggregation, and filtering based on dynamic inputs.",
        "prompt": "Develop a Python function that generates SQL queries for extracting user data from a 'customers' table. The function should accept parameters for selecting fields, setting conditions, and defining aggregation operations (e.g., COUNT, AVG). For example, if the user needs to find the average age of users in New York, the function should produce the appropriate SQL query. Include error handling to manage invalid inputs and ensure the generated SQL is valid and efficient.",
        "guidance": "Focus on creating a flexible and robust function capable of handling a variety of query requirements. Ensure the function is well-documented, with examples demonstrating how to call it with different parameters. Use string formatting or templating libraries like Jinja2 to construct the SQL queries dynamically. Incorporate best practices for avoiding SQL injection vulnerabilities, such as using parameterized queries. The output should be an executable SQL query string, ready for use with a database connection."
    },
    "Story Creation": {
        "context": "Acme Publishing House is seeking fresh and engaging story ideas for its upcoming anthology series. The anthology will feature short stories across various genres, including science fiction, fantasy, mystery, and romance. The editorial team is looking for unique and captivating storylines that will resonate with a diverse audience.",
        "prompt": "Generate a collection of five original story ideas, each belonging to a different genre. For each story idea, provide a brief synopsis that captures the main plot, characters, and themes. The stories should have compelling hooks, well-developed protagonists, and unexpected twists. Consider the target audience for each genre and tailor the stories accordingly. Provide a title for each story and a short explanation of why it would be a good fit for the anthology. Additionally, suggest potential authors or writing styles that could bring each story to life.",
        "guidance": "Deliver creative and imaginative story ideas that showcase originality and depth. Use vivid descriptions and engaging language to capture the essence of each story. Ensure that the stories have a clear structure and narrative arc, with well-defined conflicts and resolutions. Provide enough detail to give the editorial team a strong sense of each story's potential, while leaving room for further development and interpretation."
    },
    "TV/Movie Script": {
        "context": "Acme Productions is developing a new television series that explores the lives of a group of friends navigating their careers, relationships, and personal growth in a bustling city. The series aims to capture the authentic experiences and challenges faced by young professionals in contemporary society. The writing team is brainstorming ideas for the pilot episode and seeks guidance on crafting a compelling script.",
        "prompt": "Develop a detailed outline for the pilot episode of the television series. Introduce the main characters, their backgrounds, and their relationships with each other. Establish the central conflict or theme that will drive the narrative throughout the episode. Create a series of scenes that showcase the characters' personalities, aspirations, and struggles. Incorporate realistic dialogue and relatable situations that resonate with the target audience. Consider the pacing and structure of the episode, including key moments of tension, humor, and emotional depth. Provide a clear resolution or cliffhanger that sets the stage for future episodes.",
        "guidance": "Craft a script outline that balances character development, plot progression, and thematic exploration. Use a mix of dialogue, action, and description to bring the scenes to life. Ensure that the characters have distinct voices and motivations that fuel their actions and interactions. Pay attention to the overall tone and style of the series, creating a consistent and engaging narrative. Provide enough detail to guide the writing process while allowing room for creative interpretation and collaboration among the writing team."
    }
}

# Create the dropdown widget
template_dropdown = widgets.Dropdown(
    options=list(templates.keys()),
    value=list(templates.keys())[0],
    description='Template:',
    layout=widgets.Layout(width='40%')
)

# Create the context, prompt, and guidance widgets
context_widget = widgets.Textarea(
    value=templates[template_dropdown.value]['context'],
    placeholder='Enter the context here',
    description='Context:',
    layout=widgets.Layout(width='40%', height='150px')
)

prompt_widget = widgets.Textarea(
    value=templates[template_dropdown.value]['prompt'],
    placeholder='Enter the prompt here',
    description='Prompt:',
    layout=widgets.Layout(width='40%', height='200px')
)

max_tokens_widget = widgets.IntSlider(
    value=100,
    min=100,
    max=2000,
    step=50,
    description='Max Tokens:',
    layout=widgets.Layout(width='40%'),
    style={'description_width': 'initial'}
)

guidance_widget = widgets.Textarea(
    value=templates[template_dropdown.value]['guidance'],
    placeholder='Enter guidance for the model',
    description='Guidance:',
    layout=widgets.Layout(width='40%', height='100px')
)

# Define the on_template_change function
def on_template_change(change):
    context_widget.value = templates[change.new]['context']
    prompt_widget.value = templates[change.new]['prompt']
    guidance_widget.value = templates[change.new]['guidance']

# Observe the dropdown value change
template_dropdown.observe(on_template_change, names='value')

# Create the configuration form
config_form_3 = widgets.VBox([
    template_dropdown,
    context_widget,
    prompt_widget,
    max_tokens_widget,
    guidance_widget
])

display(config_form_3)


# Recursive Unified Validators (rUv) MoE Toolkit Source Code

This version maintains the foundational structure from the previous iteration, continuing to leverage DSPy for dynamic interaction with AI models like GPT-3.5-turbo and ColBERTv2.

Notable enhancements include refined mechanisms for generating and evaluating expert model outputs, with improved logging for monitoring and debugging.

The introduction of advanced features such as output continuation and intrinsic reward assessment underscores the toolkit's evolution towards more autonomous, context-sensitive operations. The system's architecture has been further optimized for adaptive learning, enabling more sophisticated and nuanced expert model integration and selection processes.
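
The output-continuation idea mentioned above can be sketched in isolation. This is a minimal illustration, not the toolkit's implementation: `generate_chunk` is a hypothetical stand-in for the model call, and the `max_rounds` cap is an assumption added here so the loop always ends.

```python
from typing import Callable

def is_complete(text: str) -> bool:
    """Treat text that ends in sentence-final punctuation as complete."""
    return text.rstrip().endswith((".", "!", "?"))

def generate_with_continuation(
    generate_chunk: Callable[[str], str],  # hypothetical stand-in for the model call
    prompt: str,
    max_rounds: int = 5,                   # assumed cap, not a toolkit parameter
) -> str:
    """Call generate_chunk repeatedly, feeding back the text produced so far."""
    output = ""
    current_prompt = prompt
    for _ in range(max_rounds):
        output += generate_chunk(current_prompt)
        if is_complete(output):
            break
        # Ask the model to continue from the partial output
        current_prompt = f"{prompt}\n{output}"
    return output

# Stub model that returns one fixed piece per call
chunks = iter(["AI adoption is", " accelerating."])
result = generate_with_continuation(lambda _prompt: next(chunks), "Summarize the trend")
print(result)
```

With the stub above, the first chunk does not end in punctuation, so a second call is made and the two pieces are joined.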


In [None]:
import dspy
import logging
import time
import random
from typing import List


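# Initialize DSPy
turbo = dspy.OpenAI(model='gpt-3.5-turbo')
colbertv2_wiki17_abstracts = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
# Register the language model and retriever so dspy.Predict has a model to call
dspy.settings.configure(lm=turbo, rm=colbertv2_wiki17_abstracts)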

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

class rUv(dspy.Signature):
    """
    Recursive Unified Validators (rUv): Generate expert model outputs.
    This is the primary function that generates outputs from multiple expert models.
    """
    context = dspy.InputField(desc="The current context")
    prompt = dspy.InputField(desc="A prompt to guide the language model")
    max_tokens = dspy.InputField(desc="Maximum number of tokens to generate", default="1500")
    temperature = dspy.InputField(desc="Temperature for sampling (higher values make output more random)", default="0.1")
    top_k = dspy.InputField(desc="Top K words to sample from (higher values consider more words)", default="100")
    top_p = dspy.InputField(desc="Top P probability threshold (higher values make output more diverse)", default="0.9")
    frequency_penalty = dspy.InputField(desc="Frequency penalty (higher values penalize frequent words)", default="0.0")
    presence_penalty = dspy.InputField(desc="Presence penalty (higher values penalize repeated words)", default="0.0")
    output = dspy.OutputField(desc="The generated expert model output")
    teleprompter = dspy.InputField(desc="Additional context or instructions for the language model", default="matter of fact")

class GatingModel(dspy.Signature):
    """Assess expert model relevance."""
    context = dspy.InputField(desc="The current context")
    expert_outputs = dspy.InputField(desc="Serialized string of generated expert model outputs")
    selected_expert = dspy.OutputField(desc="The index of the selected expert model")
    teleprompter = dspy.InputField(desc="Additional context or instructions for the language model", default="Select the index of the most relevant expert for the given context.")

class IntrinsicRewardModel(dspy.Signature):
    """Evaluate the agent's performance intrinsically."""
    context = dspy.InputField(desc="The current context")
    expert_outputs = dspy.InputField(desc="Serialized string of generated expert model outputs")
    selected_expert_index = dspy.InputField(desc="The index of the selected expert model")
    intrinsic_reward = dspy.OutputField(desc="The intrinsic reward for the agent's performance")

class MixtureOfExperts:
    def __init__(self, num_experts: int = 4, min_iterations: int = 4, learning_rate: float = 0.1, discount_factor: float = 0.99, exploration_rate: float = 0.2):
        """
        Initialize the MixtureOfExperts class with default values if the previous code cell isn't run.

        Args:
            num_experts (int, optional): Number of expert models to use. Defaults to 4.
            min_iterations (int, optional): Minimum number of iterations to run. Defaults to 4.
            learning_rate (float, optional): Learning rate for updating expert values. Defaults to 0.1.
            discount_factor (float, optional): Discount factor for future rewards. Defaults to 0.99.
            exploration_rate (float, optional): Exploration rate for selecting experts. Defaults to 0.2.
        """
        self.num_experts: int = num_experts
        self.expert_outputs: List[str] = []
        self.min_iterations: int = min_iterations
        self.learning_rate: float = learning_rate
        self.discount_factor: float = discount_factor
        self.exploration_rate: float = exploration_rate
        self.expert_values: List[float] = [0.0] * num_experts
        self.expert_architectures: List[dict] = [self.initialize_expert_architecture() for _ in range(num_experts)]
        self.gating_architecture: dict = self.initialize_gating_architecture()
        self.selected_experts_history: List[int] = []

    def initialize_expert_architecture(self) -> dict:
        """Initialize the architecture of an expert model."""
        return {"num_layers": random.randint(1, 5), "hidden_size": random.randint(32, 256)}

    def initialize_gating_architecture(self) -> dict:
        """Initialize the architecture of the gating model."""
        return {"num_layers": random.randint(1, 5), "hidden_size": random.randint(32, 256)}

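    def generate_expert_outputs(self, context: str, prompt: str, max_tokens: int, guidance: str) -> tuple:
        """Generate expert outputs and return (selected output, selected expert index, intrinsic reward)."""
        logging.info("Starting to generate expert outputs...")
        print("Generating expert outputs...")

        # Start each call with an empty list so indices refer to this iteration's experts
        self.expert_outputs = []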

        try:
            generate_expert = dspy.Predict(rUv)
            select_expert = dspy.Predict(GatingModel)
            evaluate_intrinsic_reward = dspy.Predict(IntrinsicRewardModel)
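        except Exception as e:
            logging.error("Error initializing DSPy Predict functions: %s", e)
            # Raise instead of returning a bare string, because the caller unpacks three values
            raise RuntimeError("Failed to initialize expert models.") from e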

        for i in range(self.num_experts):
            print(f"Generating output for Expert {i+1}/{self.num_experts}...")
            logging.info(f"Generating output for Expert {i+1}...")

            try:
                expert_prompt = f"Expert {i+1}: {prompt}"

                # Determine the desired output length based on the previous values
                if self.expert_values[i] < 0.2:
                    output_length = "short"
                elif self.expert_values[i] < 0.5:
                    output_length = "medium"
                else:
                    output_length = "long"

                # Sample the generation parameters once so the values printed below are the ones actually used
                temperature = random.uniform(0.7, 1.2)
                top_k = random.randint(30, 70)
                top_p = random.uniform(0.8, 1.0)
                frequency_penalty = random.uniform(0.0, 0.5)
                presence_penalty = random.uniform(0.0, 0.5)
                tone = random.choice(['formal', 'casual', 'technical'])

                expert_output = ""
                # Cap the continuation calls (5 is an arbitrary limit) so an output that never ends in punctuation cannot loop forever
                for _ in range(5):
                    partial_output = generate_expert(
                        context=context,
                        prompt=expert_prompt,
                        max_tokens=str(max_tokens),
                        temperature=str(temperature),
                        top_k=str(top_k),
                        top_p=str(top_p),
                        frequency_penalty=str(frequency_penalty),
                        presence_penalty=str(presence_penalty),
                        teleprompter=f"{guidance} Focus on your area of expertise. Provide a {output_length} response using a {tone} tone."
                    ).output
                    expert_output += partial_output

                    if self.check_output_completeness(expert_output):
                        break

                    expert_prompt = f"Expert {i+1} (continued): {prompt}\n{expert_output}"

                self.expert_outputs.append(expert_output)

                print(f"LLM Parameters for Expert {i+1}:")
                print(f"Max Tokens per chunk: {max_tokens}")
                print(f"Temperature: {temperature}")
                print(f"Top K: {top_k}")
                print(f"Top P: {top_p}")
                print(f"Frequency Penalty: {frequency_penalty}")
                print(f"Presence Penalty: {presence_penalty}")
                print("------------------------")

            except Exception as e:
                logging.error("Error generating output for Expert %d: %s", i+1, e)
                continue

            logging.info(f"Output for Expert {i+1} generated.")
            time.sleep(1)

        try:
            serialized_expert_outputs = ','.join(self.expert_outputs)

            if random.random() < self.exploration_rate:
                selected_expert_index = random.randint(0, self.num_experts - 1)
            else:
                selected_expert_index = select_expert(context=context, expert_outputs=serialized_expert_outputs, teleprompter="Select the index of the most relevant expert for the given context.").selected_expert
                selected_expert_index = int(selected_expert_index) if selected_expert_index.isdigit() else 0

                # Penalize selection of recently chosen experts
                if selected_expert_index in self.selected_experts_history:
                    selected_expert_index = random.randint(0, self.num_experts - 1)
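        except Exception as e:
            logging.error("Error selecting the most relevant expert: %s", e)
            # Raise instead of returning a bare string, because the caller unpacks three values
            raise RuntimeError("Failed to select the most relevant expert.") from e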

        if selected_expert_index < 0 or selected_expert_index >= len(self.expert_outputs):
            selected_expert_index = 0

        self.selected_experts_history.append(selected_expert_index)

        try:
            intrinsic_reward_str = evaluate_intrinsic_reward(context=context, expert_outputs=serialized_expert_outputs, selected_expert_index=str(selected_expert_index)).intrinsic_reward

            # Extract numeric reward value from the string
            if "highly effective" in intrinsic_reward_str.lower():
                intrinsic_reward = 1.0
            elif "effective" in intrinsic_reward_str.lower():
                intrinsic_reward = 0.7
            elif "moderate" in intrinsic_reward_str.lower():
                intrinsic_reward = 0.5
            elif "poor" in intrinsic_reward_str.lower():
                intrinsic_reward = 0.2
            else:
                intrinsic_reward = 0.0
        except Exception as e:
            logging.error("Error evaluating intrinsic reward: %s", e)
            intrinsic_reward = 0.0

        print("Expert output generation complete!")
        logging.info("All expert outputs have been generated and the most relevant expert has been selected.")

        return self.expert_outputs[selected_expert_index], selected_expert_index, intrinsic_reward

    def check_output_completeness(self, output: str) -> bool:
        """Check if the output ends with sentence-final punctuation, ignoring trailing whitespace."""
        return output.rstrip().endswith((".", "!", "?"))

    def update_expert_values(self, selected_expert_index: int, reward: float):
        """Update the value estimate of the selected expert based on the received reward."""
        self.expert_values[selected_expert_index] += self.learning_rate * (reward + self.discount_factor * max(self.expert_values) - self.expert_values[selected_expert_index])

    def update_expert_architecture(self, expert_index: int):
        """Update the architecture of the specified expert model."""
        self.expert_architectures[expert_index] = self.initialize_expert_architecture()

    def update_gating_architecture(self):
        """Update the architecture of the gating model."""
        self.gating_architecture = self.initialize_gating_architecture()

    def check_termination_condition(self, iteration: int, total_reward: float) -> bool:
        """Check if the termination condition is met based on the iteration and total reward."""
        if iteration >= self.min_iterations and total_reward >= 10.0:
            return True
        return False

    def update_exploration_rate(self, iteration: int):
        """Update the exploration rate based on the current iteration."""
        self.exploration_rate = max(0.1, 1.0 - (iteration / self.min_iterations))

# Example usage with adjustments for self-improvement and intrinsic motivation
context = "Acme Corporation is exploring investment opportunities in emerging technologies. The board seeks insights into which technologies could potentially transform their industry over the next decade."
prompt = "Evaluate the potential impact and investment viability of artificial intelligence (AI), blockchain, quantum computing, and biotechnology."

# Get values from widgets
num_experts = num_experts_widget.value
min_iterations = min_iterations_widget.value
learning_rate = learning_rate_widget.value
discount_factor = discount_factor_widget.value
exploration_rate = exploration_rate_widget.value
context = context_widget.value
prompt = prompt_widget.value
max_tokens = max_tokens_widget.value
guidance = guidance_widget.value

# Instantiate MixtureOfExperts with widget values
moe = MixtureOfExperts(
    num_experts=num_experts,
    min_iterations=min_iterations,
    learning_rate=learning_rate,
    discount_factor=discount_factor,
    exploration_rate=exploration_rate
)
for iteration in range(moe.min_iterations):
    final_output, selected_expert_index, intrinsic_reward = moe.generate_expert_outputs(context, prompt, max_tokens, guidance)

    print(f"Iteration {iteration+1} - Selected Expert: {selected_expert_index}, Intrinsic Reward: {intrinsic_reward}")
    print("Expert Values:", moe.expert_values)
    print("Final Expert Output:")
    print(final_output)
    print("------------------------")

    # Get reward from an external expert
    expert_reward = float(input(f"Enter expert reward for iteration {iteration+1}: "))

    # Combine intrinsic and expert rewards
    total_reward = intrinsic_reward + expert_reward

    print(f"Expert Reward: {expert_reward}, Total Reward: {total_reward}")

    # Update the value estimate of the selected expert
    moe.update_expert_values(selected_expert_index, total_reward)

    # Update expert and gating architectures for self-improvement
    if random.random() < 0.2:
        moe.update_expert_architecture(selected_expert_index)
    if random.random() < 0.1:
        moe.update_gating_architecture()

    # Update exploration rate based on the current iteration
    moe.update_exploration_rate(iteration)

    # Check termination condition (iteration is 0-based, so pass the number of completed iterations)
    if moe.check_termination_condition(iteration + 1, total_reward):
        print("Termination condition met. Stopping the process.")
        break

Generating expert outputs...
Generating output for Expert 1/5...
LLM Parameters for Expert 1:
Max Tokens per chunk: 100
Temperature: 1.1332587161457235
Top K: 32
Top P: 0.9830388581598246
Frequency Penalty: 0.4846254455168328
Presence Penalty: 0.46250359452569334
------------------------
Generating output for Expert 2/5...
LLM Parameters for Expert 2:
Max Tokens per chunk: 100
Temperature: 1.100431727680847
Top K: 57
Top P: 0.8261864052371326
Frequency Penalty: 0.05766838068219726
Presence Penalty: 0.23045925903293224
------------------------
Generating output for Expert 3/5...
LLM Parameters for Expert 3:
Max Tokens per chunk: 100
Temperature: 0.7101652480793152
Top K: 31
Top P: 0.8054642673532337
Frequency Penalty: 0.018006329559420386
Presence Penalty: 0.08152148485919464
------------------------
Generating output for Expert 4/5...
LLM Parameters for Expert 4:
Max Tokens per chunk: 100
Temperature: 0.7818407384390405
Top K: 55
Top P: 0.82234975542765
Frequency Penalty: 0.07942774549

## What's happening?
The expert reward is an external evaluation of the quality and relevance of the output generated by the selected expert model during each iteration of the Recursive Unified Validators (rUv) process.

It plays a crucial role in guiding the learning and adaptation of the expert models over time. Here's how the expert reward affects the output:

- **Feedback mechanism:** The expert reward serves as a feedback signal that indicates how well the selected expert model performed in generating a relevant and high-quality output for the given context and prompt. It allows the system to assess the effectiveness of each expert model based on external evaluation.

- **Updating expert values:** The expert reward is used to update the value estimate of the selected expert model. The `update_expert_values` method in the `MixtureOfExperts` class moves the selected expert's value toward a target, namely the received reward plus the discount factor times the highest current expert value, by a step proportional to the learning rate. This update helps the system learn which experts are more reliable and valuable for specific contexts over time.

- **Reinforcement learning:** The expert reward is combined with the intrinsic reward (generated by the IntrinsicRewardModel) to calculate the total reward for each iteration. This total reward is used to guide the reinforcement learning process, where the system learns to select the most appropriate expert models based on their historical performance and the current context.

- **Balancing exploration and exploitation:** The expert reward feeds the expert values, which in the code above set the response length requested from each expert (short, medium, or long). Selection itself balances exploration and exploitation: with probability `exploration_rate` the system picks a random expert to explore potentially better options (exploration); otherwise the gating model picks the expert it judges most relevant to the context (exploitation).

- **Termination condition:** The expert reward contributes to the total reward, which is used to check the termination condition for the rUv process. Once the minimum number of iterations has been reached and the total reward for an iteration is at least 10.0, the process stops, indicating that a satisfactory output has been generated.
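
The value update described in the list above can be worked through by hand. The sketch below restates the rule from `update_expert_values` as a standalone function; the function name `update_value` and the sample rewards are illustrative assumptions, while the default learning rate (0.1) and discount factor (0.99) match the class defaults.

```python
def update_value(values, selected, reward, learning_rate=0.1, discount_factor=0.99):
    """Return a copy of values with the selected entry moved toward the target."""
    # Target = reward plus the discounted best current value
    target = reward + discount_factor * max(values)
    updated = list(values)
    updated[selected] += learning_rate * (target - values[selected])
    return updated

values = [0.0, 0.0, 0.0, 0.0]

# First iteration: expert 2 is selected and receives a total reward of 1.5
values = update_value(values, selected=2, reward=1.5)
print(values)  # expert 2 rises to roughly 0.15, the others stay at 0.0

# Second iteration: expert 0 is selected and receives a total reward of 1.0
values = update_value(values, selected=0, reward=1.0)
print(values)  # expert 0 rises to roughly 0.115, helped by expert 2's value
```

Because the target includes the highest current value, a strong expert slightly raises the update for whichever expert is selected next.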

By providing an external evaluation of the generated outputs, the expert reward helps the rUv system learn and adapt over time. It guides the selection and improvement of expert models, ensuring that the most relevant and high-quality outputs are generated for the given context and prompt.
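
How selection and the exploration schedule interact can be sketched as follows. This is a simplified illustration under stated assumptions: `gate_choice` is a hypothetical name for the index returned by the gating model, and the decay function mirrors the formula in `update_exploration_rate`.

```python
import random

def select_expert(num_experts, exploration_rate, gate_choice, rng=random):
    """Pick a random expert with probability exploration_rate, else keep the gate's choice."""
    if rng.random() < exploration_rate:
        return rng.randint(0, num_experts - 1)
    return gate_choice

def decayed_exploration_rate(iteration, min_iterations, floor=0.1):
    """Linear decay from 1.0 toward a floor of 0.1, as in update_exploration_rate."""
    return max(floor, 1.0 - iteration / min_iterations)

# Exploration rate after each of four iterations (iteration is 0-based)
rates = [decayed_exploration_rate(i, min_iterations=4) for i in range(4)]
print(rates)  # [1.0, 0.75, 0.5, 0.25]

# With no exploration the gate's choice is always kept
print(select_expert(num_experts=4, exploration_rate=0.0, gate_choice=3))  # 3
```

Note that the schedule resets the rate to 1.0 after the first iteration, so the second iteration always selects a random expert regardless of the initial setting.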

The expert reward is typically provided by a human evaluator or a separate evaluation model that assesses the quality and relevance of the generated outputs.
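
When the reward comes from a human evaluator, the raw input is worth validating before it reaches the value update. The helper below is a hedged sketch: `parse_reward` is a hypothetical name, and the 0 to 10 range is an assumption chosen for illustration, since the toolkit does not define a reward scale.

```python
import math

def parse_reward(raw: str, low: float = 0.0, high: float = 10.0) -> float:
    """Convert raw text to a float clamped to [low, high].

    Raises ValueError if the text is not a finite number.
    The [0, 10] range is an assumed scale, not one defined by the toolkit.
    """
    value = float(raw.strip())  # raises ValueError on non-numeric text
    if math.isnan(value) or math.isinf(value):
        raise ValueError(f"Reward must be a finite number, got {raw!r}")
    return min(high, max(low, value))

print(parse_reward(" 7.5 "))  # 7.5
print(parse_reward("42"))     # 10.0 (clamped to the upper bound)
print(parse_reward("-3"))     # 0.0 (clamped to the lower bound)
```

In the interactive loop, a call such as `parse_reward(input(...))` inside a retry loop would avoid aborting the run on a typing mistake.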

The reward value is usually a numeric score that represents the level of satisfaction or effectiveness of the output. It's important to note that the expert reward is specific to each iteration and expert model. It allows for fine-grained feedback and adaptation, enabling the system to continuously improve its performance and generate more relevant and coherent outputs over time.
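
The path from the intrinsic reward text to a numeric score, and from there to the termination check, can be sketched as below. The keyword scores and the 10.0 threshold are taken from the code above; the function names and the sample evaluator reward of 9.0 are illustrative assumptions.

```python
# Ordered so that "highly effective" is tested before the shorter "effective"
REWARD_KEYWORDS = [
    ("highly effective", 1.0),
    ("effective", 0.7),
    ("moderate", 0.5),
    ("poor", 0.2),
]

def intrinsic_reward_from_text(text: str) -> float:
    """Map the reward model's free-text answer to a score; 0.0 if no keyword matches."""
    lowered = text.lower()
    for keyword, score in REWARD_KEYWORDS:
        if keyword in lowered:
            return score
    return 0.0

def should_terminate(completed_iterations, total_reward, min_iterations=4, threshold=10.0):
    """Stop once enough iterations have run and the total reward reaches the threshold."""
    return completed_iterations >= min_iterations and total_reward >= threshold

intrinsic = intrinsic_reward_from_text("The response was Highly Effective overall.")
total = intrinsic + 9.0  # 9.0 stands in for the evaluator's reward
print(intrinsic, total, should_terminate(4, total))  # 1.0 10.0 True
```

Because the match is on substrings, an answer containing "ineffective" also matches "effective" and scores 0.7, which is worth keeping in mind when wording the reward model's instructions.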

# Simple rUv Version
The basic version, well suited to frameworks like FastAPI.

Note that `max_tokens` may need to be adjusted to the desired output length, because this version has no automatic output continuation. Experiment with different values to find a suitable token limit for your use case.
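
One rough way to decide whether the limit is too low is to check whether the output stops mid-sentence. The sketch below is only a heuristic under stated assumptions: `looks_truncated` and `suggest_max_tokens` are hypothetical helpers, the step of 250 is arbitrary, and the ceiling of 2000 mirrors the upper bound of the Max Tokens slider.

```python
def looks_truncated(text: str) -> bool:
    """Heuristic: output that does not end in sentence-final punctuation was probably cut off."""
    return not text.rstrip().endswith((".", "!", "?"))

def suggest_max_tokens(current: int, text: str, step: int = 250, ceiling: int = 2000) -> int:
    """Suggest a higher token limit when the output looks cut off, up to a ceiling."""
    if looks_truncated(text):
        return min(ceiling, current + step)
    return current

print(suggest_max_tokens(1000, "problems that are beyond the c"))  # 1250
print(suggest_max_tokens(1000, "The analysis is complete."))       # 1000
```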


In [None]:
import dspy
import logging
import random

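# Initialize DSPy
turbo = dspy.OpenAI(model='gpt-3.5-turbo')
colbertv2_wiki17_abstracts = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
# Register the language model and retriever so dspy.Predict has a model to call
dspy.settings.configure(lm=turbo, rm=colbertv2_wiki17_abstracts)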

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

class rUv(dspy.Signature):
    """
    Recursive Unified Validators (rUv): Generate expert model outputs.
    This is the primary function that generates outputs from multiple expert models.
    """
    context = dspy.InputField(desc="The current context")
    prompt = dspy.InputField(desc="A prompt to guide the language model")
    max_tokens = dspy.InputField(desc="Maximum number of tokens to generate", default="1500")
    temperature = dspy.InputField(desc="Temperature for sampling (higher values make output more random)", default="0.1")
    top_k = dspy.InputField(desc="Top K words to sample from (higher values consider more words)", default="100")
    top_p = dspy.InputField(desc="Top P probability threshold (higher values make output more diverse)", default="0.9")
    frequency_penalty = dspy.InputField(desc="Frequency penalty (higher values penalize frequent words)", default="0.0")
    presence_penalty = dspy.InputField(desc="Presence penalty (higher values penalize repeated words)", default="0.0")
    output = dspy.OutputField(desc="The generated expert model output")
    teleprompter = dspy.InputField(desc="Additional context or instructions for the language model", default="matter of fact")

def generate_expert_output(context: str, prompt: str, max_tokens: int, guidance: str) -> str:
    """Generate expert output based on the given context and prompt."""
    logging.info("Starting to generate expert output...")
    print("Generating expert output...")

    try:
        generate_expert = dspy.Predict(rUv)
    except Exception as e:
        logging.error("Error initializing DSPy Predict function: %s", e)
        return "Failed to initialize expert model."

    try:
        expert_output = generate_expert(
            context=context,
            prompt=prompt,
            max_tokens=str(max_tokens),
            temperature=str(random.uniform(0.7, 1.2)),
            top_k=str(random.randint(30, 70)),
            top_p=str(random.uniform(0.8, 1.0)),
            frequency_penalty=str(random.uniform(0.0, 0.5)),
            presence_penalty=str(random.uniform(0.0, 0.5)),
            # Include the caller's guidance, which was otherwise accepted but never used
            teleprompter=f"{guidance} Focus on your area of expertise. Provide a response using a {random.choice(['formal', 'casual', 'technical'])} tone."
        ).output
    except Exception as e:
        logging.error("Error generating expert output: %s", e)
        return "Failed to generate expert output."

    print("Expert output generation complete!")
    logging.info("Expert output has been generated.")

    return expert_output

# Example usage
context = "Acme Corporation is exploring investment opportunities in emerging technologies. The board seeks insights into which technologies could potentially transform their industry over the next decade."
prompt = "Evaluate the potential impact and investment viability of artificial intelligence (AI), blockchain, quantum computing, and biotechnology."
max_tokens = 1000
guidance = "Provide a detailed analysis of each technology, considering factors such as market potential, adoption rates, and regulatory landscape."

expert_output = generate_expert_output(context, prompt, max_tokens, guidance)
print("Expert Output:")
print(expert_output)

Generating expert output...
Expert output generation complete!
Expert Output:
Artificial Intelligence (AI) has already shown significant potential to transform industries across the board. From improving customer service through chatbots to optimizing supply chains with predictive analytics, AI is poised to revolutionize how businesses operate. In terms of investment viability, AI startups continue to attract substantial funding, indicating strong market interest in this technology.

Blockchain, known for its secure and transparent nature, has the potential to disrupt industries like finance, healthcare, and supply chain management. Its decentralized ledger system can enhance data security, streamline transactions, and reduce fraud. As more companies explore blockchain applications, investing in this technology could yield substantial returns in the long run.

Quantum computing, although still in its early stages, holds immense promise for solving complex problems that are beyond the c