### Introduction


In this notebook, we demonstrate how a mixture of agents (MoA) can improve response quality by combining the outputs of multiple LLMs. The code proceeds in order: first a single agent/LLM, then a mixture of agents/LLMs, then multiple iterations of a mixture of agents/LLMs. The basic architecture is to prompt each LLM independently, then pass their responses to an aggregator LLM, together with a final prompt, to produce the final output.

![alt text](8c88157-image.png "Title")
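
To make the architecture concrete before any API calls are involved, here is a minimal sketch of the proposer/aggregator pattern with an optional number of rounds. It is an illustration under stated assumptions, not the notebook's implementation: `Model`, `mixture_of_agents`, and the three stub functions are hypothetical names, and a "model" here is just a plain Python function from prompt to reply.

```python
from typing import Callable, List

# Hypothetical type: a "model" is any function mapping a prompt to a reply.
Model = Callable[[str], str]


def mixture_of_agents(prompt: str, proposers: List[Model],
                      aggregator: Model, rounds: int = 1) -> str:
    """Run `rounds` layers of proposers, then a single aggregation call."""
    previous: List[str] = []
    for _ in range(rounds):
        layer_prompt = prompt
        if previous:
            # From the second round on, proposers also see the earlier answers.
            layer_prompt += "\n\nPrevious answers:\n" + "\n".join(previous)
        previous = [model(layer_prompt) for model in proposers]
    # The aggregator synthesizes the last layer's answers into one reply.
    return aggregator(prompt + "\n\nResponses from models:\n" + "\n".join(previous))


# Stub models so the sketch runs without API keys.
def upper(p: str) -> str:
    return p.splitlines()[0].upper()


def length(p: str) -> str:
    return f"{len(p.splitlines()[0])} chars"


def pick_longest(p: str) -> str:
    candidates = p.split("Responses from models:\n")[1].splitlines()
    return max(candidates, key=len)


answer = mixture_of_agents("plan a trip", [upper, length], pick_longest, rounds=2)
print(answer)
```

In the cells that follow, the stubs correspond to real chat-completion calls and the aggregator is itself an LLM with a synthesis prompt.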

In this notebook, we will build a travel itinerary generator. It emulates the common dilemma of having to plan flights, hotels, food, attractions, and so on, and delegates these subtasks to a mixture of agents. The main request is split into subtasks, a mixture of agents tackles each one, and the results are aggregated into a refined answer for each task. The per-task results are then aggregated to build the final itinerary.

At each step, we will apply Judgment's scoring and tracing to evaluate the quality of the outputs.

### Setup

In [1]:
import asyncio
import os
import together
import json
from together import AsyncTogether, Together
from judgeval.common.tracer import Tracer, wrap
from judgeval.scorers import AnswerRelevancyScorer, SummarizationScorer, FaithfulnessScorer
from tavily import TavilyClient





In [None]:
client = wrap(Together(api_key=os.environ.get("TOGETHER_API_KEY")))
async_client = AsyncTogether(api_key=os.environ.get("TOGETHER_API_KEY"))
judgment = Tracer(project_name="travel_agent")


In [12]:
user_prompt = "Make me an itinerary to Spain for one week from Feb 20 to March 1"

system_prompt = """
You are an AI assistant that breaks down user requests into specific subtasks for different specialized agents.
For each task description provided, create a clear, detailed prompt that the specialized agent can work with.
Each prompt should be self-contained with all necessary information from the original request.
Do not add any explanations or commentary - only output the JSON object.
"""

tasks = [
        "Find flights and hotels for the trip",
        "Find the best food spots for breakfast, lunch, and dinner",
        "Find the best attractions and things to do"
    ]

user_message = f"""
Original user request: "{user_prompt}"

Break this down into separate prompts for the following specialized agents:
{', '.join(tasks)}

Return the result as a JSON string where the keys are the task descriptions that I provided and the values are the specialized and more refined prompts that you came up with to solve the task. Don't include any ```json tags, just return the JSON that has simple key-value pairs.
"""

@judgment.observe(span_type="tool", overwrite=True)
def get_task_breakdown():
    response = client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct-Turbo", 
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message}
        ],
        stream=False,
    )

    
    # Get the response content which should include the task breakdown
    return response.choices[0].message.content

@judgment.observe(span_type="tool", overwrite=True)
def main():
    breakdown = get_task_breakdown()

    judgment.get_current_trace().async_evaluate(
        scorers=[AnswerRelevancyScorer(threshold=0.5)],
        input=user_message,
        actual_output=breakdown,
        model="gpt-4",
    )
    
    breakdown = json.loads(breakdown)
    return breakdown

# Run the traced pipeline; main() returns the breakdown parsed into a dict
breakdown = main()

in traced create
response
id='91ab7db53e6724f3' object=<ObjectType.ChatCompletion: 'chat.completion'> created=1741029625 model='meta-llama/Llama-3.3-70B-Instruct-Turbo' choices=[ChatCompletionChoicesData(index=0, logprobs=None, seed=2723265465984095700, finish_reason=<FinishReason.StopSequence: 'stop'>, message=ChatCompletionMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content='{\n  "Find flights and hotels for the trip": "Book flights from the user\'s preferred airport to Spain from February 20 to March 1 and find available hotels in the desired location for the entire duration of the trip, considering factors such as budget, location, and user reviews",\n  "Find the best food spots for breakfast, lunch, and dinner": "Research top-rated restaurants and cafes in Spain, providing recommendations for breakfast, lunch, and dinner options, including traditional Spanish cuisine and local specialties, and considering factors such as price range, dietary restrictions, and user reviews",
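
The `json.loads` call in `main()` assumes the model returns bare JSON. Models sometimes wrap the object in a Markdown fence despite the instruction, so a small defensive parser can help. The sketch below is one possible approach; `parse_json_response` is a hypothetical helper, not part of any library used in this notebook.

```python
import json
import re

# Matches an opening fence (optionally tagged "json") at the start of the text,
# or a closing fence at the end of the text.
_FENCE = re.compile(r"^\s*`{3}(?:json)?\s*|\s*`{3}\s*$")


def parse_json_response(text: str) -> dict:
    """Parse a model reply as a JSON object, tolerating a surrounding fence."""
    cleaned = _FENCE.sub("", text.strip())
    parsed = json.loads(cleaned)  # raises json.JSONDecodeError on malformed output
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object mapping tasks to prompts")
    return parsed


fence = "`" * 3
wrapped = fence + 'json\n{"Find flights": "Search flights to Spain"}\n' + fence
print(parse_json_response(wrapped))
```

With such a helper, `json.loads(breakdown)` could be swapped for `parse_json_response(breakdown)`; malformed output still raises, which keeps failures visible in the trace.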

In [15]:
def search_tavily(query):
    """Fetch travel data using Tavily API."""
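    api_key = os.getenv("TAVILY_API_KEY")
    # A distinct name avoids shadowing the module-level Together `client`
    tavily_client = TavilyClient(api_key=api_key)
    # Tavily's parameter for limiting the number of results is `max_results`
    results = tavily_client.search(query, max_results=3)
    return results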
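
The search response is later interpolated into the prompt as a raw Python object, which spends tokens on metadata. A compact text rendering may work better. The sketch below assumes the response is a dict with a `results` list whose items carry `title`, `url`, and `content` keys; that shape is an assumption to verify against the Tavily documentation, and `format_context` is a hypothetical helper.

```python
def format_context(search_response: dict, max_chars: int = 500) -> str:
    """Render a search response as a short, numbered text block."""
    lines = []
    for i, item in enumerate(search_response.get("results", []), start=1):
        snippet = item.get("content", "")[:max_chars]  # cap each snippet's length
        title = item.get("title", "untitled")
        url = item.get("url", "")
        lines.append(f"[{i}] {title} ({url}): {snippet}")
    return "\n".join(lines)


# Fake response in the assumed shape, so the sketch runs without an API key.
fake = {
    "results": [
        {"title": "Madrid hotels", "url": "https://example.com",
         "content": "Central options near Sol."},
    ]
}
context_text = format_context(fake)
print(context_text)
```

The per-snippet cap is a blunt way to keep prompts small, which matters here because the reference models are called with `max_tokens=512`.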

In [16]:
print(json.dumps(breakdown, indent=4))

task_outputs = {}
for task in breakdown:
    print(f"Working on task: {task}")
    task_prompt = breakdown[task]
    reference_models = [
        "Qwen/Qwen2-72B-Instruct",
        "meta-llama/Llama-3.3-70B-Instruct-Turbo",
        "mistralai/Mixtral-8x22B-Instruct-v0.1",
        "databricks/dbrx-instruct",
    ]

    context = search_tavily(task_prompt)

    aggregator_model = "mistralai/Mixtral-8x22B-Instruct-v0.1"
    aggregator_system_prompt = """You have been provided with a set of responses from various open-source models to the latest user query. Your task is to synthesize these responses into a single, high-quality response. It is crucial to critically evaluate the information provided in these responses, recognizing that some of it may be biased or incorrect. Your response should not simply replicate the given answers but should offer a refined, accurate, and comprehensive reply to the instruction. Ensure your response is well-structured, coherent, and adheres to the highest standards of accuracy and reliability.

    Responses from models:"""

    async def run_llm(model):
        """Run a single LLM call with a reference model."""
        response = await async_client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": f"{task_prompt}. Here is some additional context to help you: {context}"}],
            temperature=0.7,
            max_tokens=512,
        )
        return response.choices[0].message.content
    
    @judgment.observe(span_type="tool", overwrite=True)
    async def run_aggregator():
        # Query all reference models concurrently
        results = await asyncio.gather(*[run_llm(model) for model in reference_models])

        finalStream = client.chat.completions.create(
            model=aggregator_model,
            messages=[
                {"role": "system", "content": aggregator_system_prompt},
                {"role": "user", "content": ",".join(str(element) for element in results)},
            ],
        )

        judgment.get_current_trace().async_evaluate(
            scorers=[AnswerRelevancyScorer(threshold=0.5)],
            input=task_prompt,
            actual_output=finalStream.choices[0].message.content,
            model="gpt-4",
        )
        return finalStream.choices[0].message.content

    taskOutput = await run_aggregator()
    # print(taskOutput)
    task_outputs[task] = taskOutput


{
    "Find flights and hotels for the trip": "Book flights from the user's preferred airport to Spain from February 20 to March 1 and find available hotels in the desired location for the entire duration of the trip, considering factors such as budget, location, and user reviews",
    "Find the best food spots for breakfast, lunch, and dinner": "Research top-rated restaurants and cafes in Spain, providing recommendations for breakfast, lunch, and dinner options, including traditional Spanish cuisine and local specialties, and considering factors such as price range, dietary restrictions, and user reviews",
    "Find the best attractions and things to do": "Create a list of must-visit attractions and activities in Spain, including cultural landmarks, historical sites, and popular events, and provide a daily schedule of things to do from February 20 to March 1, taking into account the user's interests and travel style"
}
Working on task: Find flights and hotels for the trip
in traced cr



Working on task: Find the best food spots for breakfast, lunch, and dinner
in traced create
response
id='91ab7f3c4b9724f3' object=<ObjectType.ChatCompletion: 'chat.completion'> created=1741029687 model='mistralai/Mixtral-8x22B-Instruct-v0.1' choices=[ChatCompletionChoicesData(index=0, logprobs=None, seed=7300190049374206000, finish_reason=<FinishReason.StopSequence: 'stop'>, message=ChatCompletionMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content=" After analyzing the provided responses, here are some top-rated restaurants and cafes in Spain for breakfast, lunch, and dinner options, considering factors such as price range, dietary restrictions, and user reviews:\n\n1. Mugaritz (San Sebastian) - Known for its seasonal tasting menu of 14 dishes showcasing the best local ingredients and creativity of chef Álvaro Garrido, Mugaritz is one of the best-rated restaurants in Spain. Price range: High. Dietary restrictions: Vegetarian options available. User reviews: Excellent.\n2. Desbor

Working on task: Find the best attractions and things to do
in traced create
response
id='91ab7fb018cb24f3' object=<ObjectType.ChatCompletion: 'chat.completion'> created=1741029706 model='mistralai/Mixtral-8x22B-Instruct-v0.1' choices=[ChatCompletionChoicesData(index=0, logprobs=None, seed=3510985803642738700, finish_reason=<FinishReason.StopSequence: 'stop'>, message=ChatCompletionMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content=" Based on the provided information, here is a synthesized response for a must-visit list and a daily schedule for Spain from February 20 to March 1, 2025:\n\n### Must-Visit Attractions and Activities\n1. **Basílica de la Sagrada Familia** - A stunning architectural masterpiece by Antoni Gaudí in Barcelona.\n2. **The Alhambra** - A Moorish palace and fortress in Granada with breathtaking gardens and architecture.\n3. **Nature and Wildlife Tours** - Explore Spain's natural beauty with guided tours.\n4. **Cultural Tours** - Dive into Spain's rich histo
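
The loop above relies on `asyncio.gather` with its default behavior, so one failing reference model aborts the whole task, and on top-level `await`, which works in a notebook but not in a plain script. The sketch below shows one way to tolerate failures and to run outside a notebook. The stub coroutines stand in for `run_llm(model)`, and `gather_successful` is a hypothetical helper.

```python
import asyncio


async def gather_successful(coros):
    """Await all coroutines concurrently and keep only results that did not raise."""
    results = await asyncio.gather(*coros, return_exceptions=True)
    return [r for r in results if not isinstance(r, Exception)]


# Stub coroutines standing in for run_llm(model); one of them fails.
async def ok(name):
    await asyncio.sleep(0)
    return f"answer from {name}"


async def broken(name):
    raise RuntimeError(f"{name} is unavailable")


# In a plain script, asyncio.run replaces the notebook's top-level await.
answers = asyncio.run(
    gather_successful([ok("model-a"), broken("model-b"), ok("model-c")])
)
print(answers)
```

Dropping failed responses silently is a design choice; logging the exceptions before filtering them out would make missing models easier to notice.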

In [17]:
system_prompt = """
You are an expert travel planner who creates cohesive, well-structured itineraries.
Your task is to create a final, comprehensive response that combines specialized information
from different agents into a single, flowing itinerary that addresses the user's original request.

The final response should:
1. Start with a brief introduction to the trip
2. Organize information in a logical, chronological structure (day by day)
3. Seamlessly integrate travel logistics, accommodations, meals, and activities
4. Ensure there are no scheduling conflicts or logistical impossibilities
5. Add transitions between sections to create a natural flow
6. End with a brief conclusion

Format the itinerary professionally, with clear headings, and make it easy to follow.
"""

user_message = f"""
Original user request: "{user_prompt}"

Specialized agent responses:

{json.dumps(task_outputs, indent=2)}

Please create a cohesive, well-structured final response that combines all this information
into a comprehensive itinerary. Organize it in a logical way (day by day) and ensure the whole
itinerary flows naturally and makes logistical sense.
"""

@judgment.observe(span_type="tool", overwrite=True)
def compile_final_itinerary():
    response = client.chat.completions.create(
        # model="gpt-4", 
        model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message}
        ]
    )


    judgment.get_current_trace().async_evaluate(
        scorers=[FaithfulnessScorer(threshold=0.5)],
        input=user_prompt,
        actual_output=response.choices[0].message.content,
        retrieval_context=list(task_outputs.values()),  # pass a list rather than a dict view
        model="gpt-4",
    )


    return response.choices[0].message.content

final_itinerary = compile_final_itinerary()
print(final_itinerary)

in traced create
response
id='91ab867d4ed2ce5c' object=<ObjectType.ChatCompletion: 'chat.completion'> created=1741029985 model='meta-llama/Llama-3.3-70B-Instruct-Turbo' choices=[ChatCompletionChoicesData(index=0, logprobs=None, seed=5635809096046488000, finish_reason=<FinishReason.StopSequence: 'stop'>, message=ChatCompletionMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content="**Introduction to Spain Trip**\nFrom February 20 to March 1, embark on a journey through the vibrant country of Spain, exploring its rich history, stunning architecture, and delectable cuisine. This one-week itinerary is carefully crafted to provide a mix of cultural experiences, breathtaking landscapes, and delicious food, ensuring an unforgettable adventure.\n\n**Day 1 (February 20): Arrival in Barcelona**\nArrive in Barcelona, the capital of Catalonia, and check-in to your hotel. Spend the day resting and getting accustomed to your surroundings. In the evening, head to **Casa Lucio** for a traditional S



**Introduction to Spain Trip**
From February 20 to March 1, embark on a journey through the vibrant country of Spain, exploring its rich history, stunning architecture, and delectable cuisine. This one-week itinerary is carefully crafted to provide a mix of cultural experiences, breathtaking landscapes, and delicious food, ensuring an unforgettable adventure.

**Day 1 (February 20): Arrival in Barcelona**
Arrive in Barcelona, the capital of Catalonia, and check-in to your hotel. Spend the day resting and getting accustomed to your surroundings. In the evening, head to **Casa Lucio** for a traditional Spanish dinner, famous for its huevos rotos (broken eggs). Book your flights and hotels in advance using websites like FlightsFinder.com, easyJet.com, and Skyscanner.com for the best deals.

**Day 2 (February 21): Barcelona**
Start the day with a visit to the iconic **Basílica de la Sagrada Familia**, a stunning architectural masterpiece by Antoni Gaudí. Afterward, explore the **Park Güell