# Automatic Agent Optimization with MIPRO

MIPRO (Multiprompt Instruction PRoposal Optimizer) is an optimizer that searches for the best prompts for your task. You can read the original paper [here](https://arxiv.org/abs/2406.11695).

The optimizer was popularized by the [DSPy library](https://dspy.ai/), which provided the first and most commonly used implementation.

In this notebook, we will re-implement the algorithm from scratch to get a better understanding of how it works.

## Optimization strategy

The MIPRO algorithm was developed to optimize multi-step LLM applications when you don't have labels for each step, only a global label for the entire task. This makes it a good fit for automatically optimizing agents!

## Define the agent

To test the optimization algorithm, we are going to start by creating a very simple agent.

In [1]:
import os
import getpass

%pip install openai

from openai import OpenAI  # used by the tool and agent classes defined below

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OPENAI_API_KEY")


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0[0m[39;49m -> [0m[32;49m25.0.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [34]:
# Create a sentiment analysis tool
sentiment_prompt = "Analyze the sentiment of this text and classify it as POSITIVE or NEGATIVE. Don't return any other words."


class SentimentAnalysisTool:
    def __init__(self, name: str = "sentiment", prompt: str = ""):
        self.name = name
        self.openai_client = OpenAI()
        self.sentiment_prompt = prompt

    def __call__(self, text: str) -> str:
        completion = self.openai_client.chat.completions.create(
            messages=[
                {"role": "system", "content": self.sentiment_prompt},
                {"role": "user", "content": text},
            ],
            model="gpt-4o",
        )

        return completion.choices[0].message.content

In [35]:
summarize_prompt = "Summarize the text provided below."


class SummarizeTool:
    def __init__(self, name: str = "summarization", prompt: str = ""):
        self.name = name
        self.openai_client = OpenAI()
        self.summarization_prompt = prompt

    def __call__(self, text: str) -> str:
        completion = self.openai_client.chat.completions.create(
            messages=[
                {"role": "system", "content": self.summarization_prompt},
                {"role": "user", "content": text},
            ],
            model="gpt-4o",
        )

        return completion.choices[0].message.content

In [46]:
orchestrator_prompt = """
You are an LLM agent that can both summarize text and analyze its sentiment.

You have access to the tools:
- sentiment: Return a sentiment analysis score
- summarization: Summarize some text

To call a tool, return the text:
Action: <tool_name>: <tool_parameter>

Return STOP when finished
"""


class Orchestrator:
    def __init__(self, name: str = "orchestrator", prompt: str = ""):
        self.name = name
        self.openai_client = OpenAI()
        self.orchestrator_prompt = prompt

    def __call__(self, messages: list) -> str:
        completion = self.openai_client.chat.completions.create(
            messages=[
                {"role": "system", "content": self.orchestrator_prompt},
            ]
            + messages,
            model="gpt-4o",
        )

        return completion.choices[0].message.content

In [56]:
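from openai import OpenAI
import re
from typing import Optional


class Agent:
    def __init__(
        self, client: OpenAI, orchestrator: Orchestrator, tools: Optional[list] = None
    ) -> None:
        self.client = client
        # Avoid a mutable default argument: create a fresh list per instance
        self.tools = tools if tools is not None else []
        self.orchestrator = orchestrator

    def call_agent(
        self, message: str = "", max_iterations: int = 10
    ) -> tuple[str, list]: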
        i = 0
        module_calls = []
        messages = [{"role": "user", "content": message}]

        next_prompt = message
        while i < max_iterations:
            i += 1
            orchestrator_response = self.orchestrator(messages)

            module_calls.append(
                {
                    "module_name": "orchestrator",
                    # Copy the list: `messages` is mutated on later iterations
                    "input": list(messages),
                    "output": orchestrator_response,
                }
            )

            if "STOP" in orchestrator_response:
                break
            elif "Action" in orchestrator_response:
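                action = re.findall(
                    r"Action: ([a-z_]+): (.+)", orchestrator_response, re.IGNORECASE
                )
                if not action:
                    # "Action" was mentioned but not in the expected format
                    raise ValueError(
                        f"Could not parse an action from: {orchestrator_response}"
                    )
                chosen_tool = action[0][0]
                arg = action[0][1]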

                tool_names = [x.name for x in self.tools]
                if chosen_tool in tool_names:
                    tool = [x for x in self.tools if x.name == chosen_tool][0]

                    result_tool = tool(arg)
                    next_prompt = f"{chosen_tool}: {result_tool}"

                    messages += [{"role": "assistant", "content": next_prompt}]

                    module_calls.append(
                        {
                            "module_name": chosen_tool,
                            "input": arg,
                            "output": result_tool,
                        }
                    )
                else:
                    raise ValueError(f"Could not find tool {chosen_tool}")

        return next_prompt, module_calls


agent = Agent(
    client=OpenAI(),
    orchestrator=Orchestrator(prompt=orchestrator_prompt),
    tools=[
        SentimentAnalysisTool(prompt=sentiment_prompt),
        SummarizeTool(prompt=summarize_prompt),
    ],
)


print("----- Questions chatbot ------")
example = """Summarize the following customer reviews and analyze their overall sentiment:

The laptop is fantastic! The battery life lasts all day, and the display is beautiful. However, the keyboard feels a bit flimsy.
"""


print(example + "\n")
res, module_calls = agent.call_agent(example)
print("Response: " + res)

----- Questions chatbot ------
Summarize the following customer reviews and analyze their overall sentiment:

The laptop is fantastic! The battery life lasts all day, and the display is beautiful. However, the keyboard feels a bit flimsy.


Response: sentiment: POSITIVE
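
The orchestrator calls a tool by returning `Action: <tool_name>: <tool_parameter>`, and the agent extracts both parts with a regular expression. The snippet below isolates that parsing step so it can be checked without any API call. It is a minimal sketch: the helper name `parse_action` is hypothetical and only mirrors the regex used inside `call_agent`.

```python
import re

# Same pattern as in `Agent.call_agent`
ACTION_PATTERN = re.compile(r"Action: ([a-z_]+): (.+)", re.IGNORECASE)


def parse_action(response: str):
    """Return (tool_name, tool_parameter), or None if no action is found."""
    match = ACTION_PATTERN.search(response)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_action("Action: summarization: The laptop is fantastic!"))
# ('summarization', 'The laptop is fantastic!')
print(parse_action("STOP"))
# None
```

Note that `.` does not match newlines by default, so with this pattern the tool parameter stops at the end of the line on which the action appears.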


The MIPRO algorithm relies on the following:
1. A dataset: labelled examples, each pairing an input with the expected final output of the task (here, reviews labelled POSITIVE or NEGATIVE)
2. A metric: a function that scores the agent's output against the expected output
3. A task: the agent to optimize

For the purposes of this notebook, we define them as follows:

In [57]:
dataset = [
    {
        "input": """Summarize the following customer reviews and analyze their overall sentiment:
        I just love waiting in long lines at the DMV... said no one ever.""",
        "expected_output": "Result: NEGATIVE",
    },
    {
        "input": """Summarize the following customer reviews and analyze their overall sentiment:
        The food was absolutely terrible, but at least the waiter was friendly.""",
        "expected_output": "Result: NEGATIVE",
    },
    {
        "input": """Summarize the following customer reviews and analyze their overall sentiment:
        This movie was so bad that it’s actually hilarious. I had a great time watching it!""",
        "expected_output": "Result: POSITIVE",
    },
    {
        "input": """Summarize the following customer reviews and analyze their overall sentiment:
        Wow, I can't believe how amazing this service is! It only took them two hours to get my order wrong.""",
        "expected_output": "Result: NEGATIVE",
    },
    {
        "input": """Summarize the following customer reviews and analyze their overall sentiment:
        The software crashes every time I open it. But hey, at least the icon looks nice!""",
        "expected_output": "Result: NEGATIVE",
    },
]


def equals_metric(output, expected_output):
    """Return 1 on a case-insensitive exact match, 0 otherwise."""
    return int(expected_output.lower() == output.lower())

In [58]:
for item in dataset:
    res, module_calls = agent.call_agent(item["input"])
    print(res)
    print(equals_metric(res, item["expected_output"]))
    print("---")

sentiment: NEGATIVE
0
---
sentiment: NEGATIVE
0
---
sentiment: POSITIVE
0
---
sentiment: NEGATIVE
0
---
sentiment: NEGATIVE
0
---
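
Every example scores 0 even though the predicted labels match the expected ones: the agent returns `sentiment: NEGATIVE`, while the dataset expects `Result: NEGATIVE`, so the exact-match metric fails on the prefix alone. As an illustration of how much the metric definition matters, here is a more lenient alternative that compares only the sentiment label. This is a sketch under assumptions: `label_metric` is a hypothetical helper and is not part of the original notebook.

```python
import re


def label_metric(output: str, expected_output: str) -> int:
    """Return 1 if both strings contain the same sentiment label, else 0."""

    def extract_label(text: str):
        # Look for a standalone POSITIVE or NEGATIVE, ignoring case
        match = re.search(r"\b(POSITIVE|NEGATIVE)\b", text, re.IGNORECASE)
        return match.group(1).upper() if match else None

    expected_label = extract_label(expected_output)
    return int(expected_label is not None and extract_label(output) == expected_label)


print(label_metric("sentiment: NEGATIVE", "Result: NEGATIVE"))  # 1
print(label_metric("sentiment: POSITIVE", "Result: NEGATIVE"))  # 0
```

Whether to relax the metric or to let the optimizer learn the expected output format is a design choice; the code that follows keeps using `equals_metric`.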


## The MIPRO algorithm

The MIPRO algorithm has 3 phases:
1. Initialize

### Initialization

For the initialization step, we will start by bootstrapping a set of N few-shot examples for each module. In the agent defined above, we have three different modules:

1. Orchestrator
2. Sentiment analysis tool
3. Summarization tool

For this, we are going to use two techniques: bootstrap demonstrations and grounding.

#### Bootstrap demonstrations

The concept of bootstrap demonstrations is remarkably simple. The idea is that if the agent as a whole returns the correct output, then we can assume that the input/output pair recorded for each module along the way is correct as well. This allows us to create a dataset of "labelled" data for each module without needing to specify a dataset for each module.

In [43]:
def bootstrap_demonstrations(agent, dataset, metric):
    positive_samples = []
    for item in dataset:
        output, module_calls = agent.call_agent(item["input"])

        score = metric(output=output, expected_output=item["expected_output"])

        if score >= 1:
            positive_samples += module_calls

    return positive_samples


positive_samples = bootstrap_demonstrations(agent, dataset, equals_metric)

In [44]:
positive_samples

[{'module_name': 'orchestrator',
  'input': 'Summarize the following customer reviews and analyze their overall sentiment:\n        I just love waiting in long lines at the DMV... said no one ever.',
  'output': 'Action: summarization: I just love waiting in long lines at the DMV... said no one ever.'},
 {'module_name': 'summarization',
  'input': 'I just love waiting in long lines at the DMV... said no one ever.',
  'output': 'This statement humorously expresses the common frustration people feel about waiting in long lines at the DMV. Nobody actually enjoys the experience.'},
 {'module_name': 'orchestrator',
  'input': 'Result: This statement humorously expresses the common frustration people feel about waiting in long lines at the DMV. Nobody actually enjoys the experience.',
  'output': 'Action: sentiment: This statement humorously expresses the common frustration people feel about waiting in long lines at the DMV. Nobody actually enjoys the experience.'},
 {'module_name': 'sentime
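
Since the goal of this step is a set of N few-shot examples per module, the flat list of bootstrapped calls needs to be grouped by module. The snippet below is a minimal sketch of that grouping, assuming the record format shown above (`module_name`, `input`, `output`). The helper name `group_demonstrations`, the `n_per_module` parameter, and the toy records are hypothetical and not part of the original notebook.

```python
import random
from collections import defaultdict


def group_demonstrations(positive_samples, n_per_module=3, seed=0):
    """Group bootstrapped module calls by module and keep up to N of each."""
    by_module = defaultdict(list)
    for call in positive_samples:
        by_module[call["module_name"]].append(
            {"input": call["input"], "output": call["output"]}
        )

    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    return {
        name: rng.sample(calls, min(n_per_module, len(calls)))
        for name, calls in by_module.items()
    }


# Toy records in the same shape as `positive_samples` (made-up values)
toy_samples = [
    {"module_name": "orchestrator", "input": "review A", "output": "Action: sentiment: review A"},
    {"module_name": "sentiment", "input": "review A", "output": "POSITIVE"},
    {"module_name": "orchestrator", "input": "review B", "output": "Action: sentiment: review B"},
    {"module_name": "sentiment", "input": "review B", "output": "NEGATIVE"},
]

demos = group_demonstrations(toy_samples, n_per_module=1)
print(sorted(demos))  # ['orchestrator', 'sentiment']
```

Sampling at random is only one possible choice; any selection rule that yields at most N demonstrations per module would fit the description above.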

#### Grounding

In order to propose a better prompt template for each module, we need the LLM to write a set of instructions based on what each module specifically does. While the authors created a zero-shot LLM program for this, we are going to take a simpler approach: we will craft a prompt that takes the code of each module, as well as the code of the agent as a whole, and returns a set of instructions on how to improve each module.
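
One way such a grounding prompt could be assembled is sketched below. It only builds the chat messages and does not call the LLM. Everything here is an assumption made for illustration: the function name `build_grounding_messages`, its parameters, and the wording of the prompt are hypothetical and are not taken from the paper or from DSPy.

```python
def build_grounding_messages(
    agent_source: str, module_name: str, module_source: str, current_prompt: str
) -> list:
    """Build chat messages asking an LLM for instructions to improve one module."""
    system = (
        "You are helping to optimize a multi-step LLM agent. "
        "Given the agent's code and the code of one module, describe what the "
        "module does and propose instructions to improve its prompt."
    )
    user = (
        f"Agent code:\n{agent_source}\n\n"
        f"Module name: {module_name}\n"
        f"Module code:\n{module_source}\n\n"
        f"Current prompt:\n{current_prompt}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]


# Short placeholder strings keep the sketch self-contained; in the notebook
# the real source code of the agent and of each module would be passed in.
messages = build_grounding_messages(
    agent_source="class Agent: ...",
    module_name="sentiment",
    module_source="class SentimentAnalysisTool: ...",
    current_prompt="Analyze the sentiment of this text.",
)
print(messages[1]["content"])
```

The resulting list has the same shape as the `messages` argument passed to `chat.completions.create` in the tool classes above, so it could be sent to the same model once per module.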