# Prompt optimization

- ❌ Prompt engineering... sucks. It's an ad hoc process that relies heavily on trial and error and is difficult to standardize
- 🤩 Luckily, we can automate it using ✨prompt optimization✨, investigated in recent works such as [_Self-Supervised Prompt Optimization_](https://arxiv.org/pdf/2502.06855)
- 🎯 In essence, Prompt Optimization (PO) takes a prompt written for a given task and iteratively refines it so that it performs better on that specific problem.
- ✅ This notebook gives an overview of how to use PO with Mistral models

<div style="text-align: center;">
  <a href="https://ibb.co/GQKqGGvm"><img src="https://i.ibb.co/603kGGRQ/promptopt.png" alt="promptopt" border="0"></a>
</div>

# Problem setting

- You have put up a form and collected many more answers than you can read.
- Your survey got popular (very popular 😅) and you now need to sift through the answers. To keep things accessible, you allowed (and will continue to allow!) plain-text responses.
- Filtering on the raw answers is therefore _impossible_. Still, you need a strategy to sift through the applications received and identify the most promising profiles.
- Let's define a few prompts that process the answers and produce outputs we can filter on effectively.

### Task prompts

- Each prompt below processes one field of the form.
- These prompts are purposely not optimized; they serve as an example of the quick-and-dirty prompts we want to start from.
- For this example, we will consider answers collected as part of the applications for our [Ambassadorship Program](https://docs.mistral.ai/guides/contribute/ambassador/).

In [1]:
# overarching prompt, giving context
context = (
    "I am working on recruiting people to advocate about the products of an AI company. "
    "The position is in close contact with the DevRel team, and we are looking at having people "
    "share on their own personal social media more about the company and its products. "
    "The company I work at produces Large Language Models and is very followed, "
    "therefore I got a sheer amount of applications that I need to process "
    "very soon. I won't be able to process them by hand, and there is little structure in the "
    "form that we sent out to applicants. Therefore, I am expecting you to assist me into processing the "
    "information these people gave to make it much more structured. This means that you do read "
    "what applicants declared and extract key information based on the context of the question asked."
)

# classifying job titles
job_prompt = lambda job_title: (
    "Your task is to provide me with a direct classification of the person's job title into one of 4 categories. "
    "The categories you can decide are always: 'RESEARCH', 'ENGINEERING', 'BUSINESS', 'FOUNDER'. "
    "There is no possibility for mixed assignments. You always assign one and one only category to each subject. "
    "When in doubt, assign to 'OTHER'. You must strictly adhere to the categories I have mentioned, and nothing more. "
    "This means that you cannot use any other output apart from 'RESEARCH', 'ENGINEERING', 'BUSINESS', 'FOUNDER', 'OTHER'. "
    "Keep your answer very, very concise. Don't give context on your answer. As a matter of fact, only answer with one word "
    "based on the category you deem the most appropriate. Absolutely don't change this. You will be penalized if "
    "(1) you use a category outside of the ones I have mentioned and (2) you use more than 1 word in your output. "
    f"# INPUT declared title: the person job title is {job_title}"
)

# getting the location in an easy way
location_prompt = lambda location: (
    "Your task is basic. Your task is to disambiguate the respondent's answer in terms of the location used. "
    "Your output is always CITY, COUNTRY. Use always the English name of a city. Also, always use the international "
    "country code. Nothing else. For instance, if a user answered with 'Rome', you would output 'Rome, IT'. "
    "In the rare case when someone puts down multiple locations, make sure you always select the first one. Nothing more"
    f" #INPUT declared location: the respondent declared being located in {location}"
)

# unpacking the links someone gave
social_prompt = lambda answer: (
    "Your task is to unpack the answer users gave to questions into a list of links, using '[SEP]' as a separator. "
    "Users have answered to a basic question like 'Social Media Links' using in all sorts of way. Some answered with one link "
    "while someone else answered with many (which is very problematic). Some people answered in a complex way, adding many links. "
    "Some people answered in a basic way, some used a format, some a list---quite of a mess. "
    "Your task is to turn whatever answer user gave into a precise format, consisting of the '[SEP]'-concatenation of the links used. "
    "For instance, if you see something on the lines of 'X: https://x.com/username Linkedin: https://linkedin.com/in/username' "
    "you would output something like 'https://x.com/username[SEP]https://linkedin.com/in/username'. My end goal is to unpack this list further "
    "and I am planning on doing so using a split over [SEP]. You will be penalized if you change the original links. "
    "You must make sure the links you see in your input are exactly the ones you see in the output. "
    f" #INPUT declared social media links: {answer}"
)

# classifying people based on amount of experience
experience_prompt = lambda experience: (
    "Your task is to understand whether the respondent has some real experience with AI or not. "
    "As a very technical company with a strong engineering and science team, we are only interested in talking with people "
    "who have a good understanding of the field of machine learning and artificial intelligence. "
    "Because of this, we need to understand if people have (1) 'LOW' (2) 'MEDIUM' or (3) 'HIGH' levels of experience with AI. "
    "This means that you need to read the respondent's answer and classify it in one of the three tiers I have mentioned. "
    "You always answer with one word only, associated with the tier considered. "
    "You don't do anything but using words in 'LOW', 'MEDIUM', 'HIGH'. As a matter of fact, you exclusively know these words. "
    "Your entire vocabulary comprises only these three words. Nothing more. Absolutely nothing more. "
    "You would classify someone with research experience coming from papers or a thesis on AI as 'HIGH' experience, "
    "a software engineer that uses AI via external APIs---thus, as a user rather than a developer---as a 'MEDIUM' and "
    "someone with a background in business or a non-technical profile as 'LOW'. Those are the only categories you can choose."
    f" #INPUT declared ai experience: {experience}"
)

# understanding whether they have advocated or not
advocacy_prompt = lambda advocacy: (
    "Your task is to disambiguate whether the person's answer actually---bs asides---mentions advocacy of our products or not. "
    "People gave all sorts of answers. We are interested in public acts of advocacy, as well as enterprise-levels of advocacy "
    "and by this talking about our products to others. People who did talk about our products in public events and used them in content "
    "are strong advocators and we really need them to be surfaced. On the contrary, people who are somewhat indifferent to our products are "
    "not very strong advocators and we should signal them as such. It is important we signal this. "
    "As a general and very rigid piece of guidance, consider your answer is very direct. You exclusively answer with a 'Yes' or 'No', "
    "based on whether the person declares having spent time advocating or not. This would help us understand whether the person "
    "actually cares about products and offering or if is in only for the clout. For this reason, you need to disambiguate between those "
    "who genuinely made some contributions and have spent time advocating for our models (for these, strictly and only answer 'Yes') and "
    "these who did not (for these, strictly and only answer 'No')."
    f" #INPUT declared advocacy: {advocacy}"
)

# extract a bullet-point synthetic and structured summary
bullets_prompt = lambda content: (
    "Your task is to extract a synthetic, very synthetic and structured bullet-points summary of the content considered. "
    "In doing this, you must produce a very set of actionable insights. Beware the person who has written the content you are seeing "
    "might as well have very poor communication skills. Hence, it is very, very important that you do absolutely your best to extract "
    "a short and clear and synthetic and very, very structured bullet-point like summary of their answer to the question. "
    f" #INPUT possibly-unstructured content from user: {content}"
)
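The prompts above ask for tightly constrained outputs (a single label, or links joined by `[SEP]`), so the raw model replies still need a small validation step before you can filter on them. The sketch below shows one way to do that; the helper names `parse_label` and `parse_links` and the fallback behaviour are assumptions for illustration, not part of the original notebook.

```python
from typing import Optional

# Label sets copied from the prompts above.
JOB_CATEGORIES = {"RESEARCH", "ENGINEERING", "BUSINESS", "FOUNDER", "OTHER"}
EXPERIENCE_LEVELS = {"LOW", "MEDIUM", "HIGH"}


def parse_label(raw: str, allowed: set, default: Optional[str] = None) -> Optional[str]:
    """Normalize a one-word model answer and check it against the allowed labels.

    Hypothetical helper: returns `default` when the model strays from the label set.
    """
    label = raw.strip().strip("'\"").upper()
    return label if label in allowed else default


def parse_links(raw: str, sep: str = "[SEP]") -> list:
    """Split the '[SEP]'-joined output of the social media prompt into links."""
    return [link.strip() for link in raw.split(sep) if link.strip()]


job = parse_label(" engineering ", JOB_CATEGORIES, default="OTHER")
links = parse_links("https://x.com/username[SEP]https://linkedin.com/in/username")
```

Falling back to a default label (rather than raising) keeps a large batch of answers flowing; whether that is the right trade-off depends on how strictly you want to audit the model's output.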

### Installing dependencies

To use SPO via MetaGPT, you need to clone the repository and move this notebook inside it. The dependencies are not easy to work with, but hacking around that is fairly straightforward 😉

Just run:

```bash
# clone the repo
git clone https://github.com/geekan/MetaGPT
# move this notebook & path inside the directory, and...

# ... install dependencies
pip install -qUr requirements.txt
```

## Create instruction files

After installing `metagpt`, we can perform prompt optimization by creating a YAML file that specifies the task at hand.

According to the `metagpt` [documentation](https://github.com/geekan/MetaGPT/tree/main/examples/spo), this YAML file needs the following structure:

```yaml
prompt: |
  Please solve the following problem.

requirements: |
  ...

count: None

qa:
  - question: |
      ...
    answer: |
      ...

  - question: |
      ...
    answer: |
      ...
```

We will need to generate one of these template files **for each** of the prompts we want to optimize. Luckily, we can do so automatically.

Also, as the tasks we're dealing with are fairly straightforward, we can spare ourselves the effort of providing few-shot examples in the form of Q&As 🤩

Still, these template files offer a very straightforward way to provide real-world few-shot examples, so they are definitely worth looking into.

In [2]:
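from typing import Callable, Optional, Union

def prompt_to_dict(
        prompt: Union[str, Callable[[str], str]],
        requirements: Optional[str],
        questions: list[str],
        answers: list[str],
        count: Optional[int] = None,
) -> dict:
    """Build the SPO template dictionary for one prompt.

    `prompt` is either a string or a one-argument prompt builder (like the lambdas above),
    which is called with an empty input to obtain the static prompt text.
    """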
    return {
        "prompt": prompt if isinstance(prompt, str) else prompt(""),
        "requirements": requirements,
        "count": count,
        "qa": [
            {
                "question": question,
                "answer": answer
            } for question, answer in zip(questions, answers)
        ]
    }
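If you do want to supply few-shot examples, the `qa` section is just a list of question/answer pairs. The standalone sketch below builds that layout for the job title task; the example titles, their labels, the shortened prompt text and the `qa_entries` helper are all made up for illustration. In the notebook itself you would pass the same questions and answers to `prompt_to_dict` instead.

```python
# Illustrative few-shot pairs; these job titles and labels are invented for the example.
few_shot = [
    ("Machine Learning Engineer", "ENGINEERING"),
    ("PhD student in NLP", "RESEARCH"),
    ("CEO and co-founder", "FOUNDER"),
]


def qa_entries(pairs: list) -> list:
    """Turn (question, answer) tuples into the list-of-dicts layout used under `qa`."""
    return [{"question": question, "answer": answer} for question, answer in pairs]


template = {
    # Placeholder text; the real template would carry the full job title prompt.
    "prompt": "Classify the job title into exactly one category.",
    "requirements": "The job title, categorized",
    "count": None,
    "qa": qa_entries(few_shot),
}
```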

In [3]:
import yaml

prompts = {
    "job": job_prompt,
    "location": location_prompt,
    "social": social_prompt,
    "experience": experience_prompt,
    "advocacy": advocacy_prompt
}

requirements = [
    "The job title, categorized",
    "The location, disambiguated",
    "The social media links, unpacked",
    "The AI experience, classified",
    "The advocacy level, disambiguated"
]
path = "metagpt/ext/spo/settings"  # this is the path where the template files need to be saved

for (name, prompt), requirement in zip(prompts.items(), requirements):
    # creating template files for each prompt
    with open(f"{path}/{name}.yaml", "w") as f:
        yaml.dump(
            prompt_to_dict(
                prompt, 
                requirement,
                [""], 
                [""]
            ),
            f,
        )

## Creating model files

Once you have created template files for the different prompts, you need to specify which models to use as (1) executors, (2) evaluators and (3) optimizers.

metagpt's SPO requires you to provide these models in a specific `.yaml` file. You can use the following snippet to create this file with your own Mistral API key ([get one!](https://console.mistral.ai/api-keys)).

In [4]:
def models_dict(mistral_api_key: str) -> dict:
    """Build the MetaGPT model configuration pointing at the Mistral API."""
    return {
        "llm": {
            "api_type": "openai",
            "model": "mistral-small-latest",
            "base_url": "https://api.mistral.ai/v1/",
            "api_key": mistral_api_key,
            "temperature": 0
        },
        "models": {
            "mistral-small-latest": {
                "api_type": "openai",
                "base_url": "https://api.mistral.ai/v1/",
                "api_key": mistral_api_key,
                "temperature": 0
            },
            "mistral-large-latest": {
                "api_type": "openai",
                "base_url": "https://api.mistral.ai/v1/",
                "api_key": mistral_api_key,
                "temperature": 0
            }
        }
    }

In [5]:
path = "config/config2.yaml" # saving the models file here

MISTRAL_API_KEY = "..."  # your api key

with open(path, "w") as f:
    yaml.dump(models_dict(MISTRAL_API_KEY), f)
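Rather than pasting the key into the notebook, one option is to read it from an environment variable. This is a minimal sketch: the variable name `MISTRAL_API_KEY` and the `load_api_key` helper are assumptions, and the key still ends up in plain text inside `config/config2.yaml`, so keep that file out of version control.

```python
import os


def load_api_key(var_name: str = "MISTRAL_API_KEY") -> str:
    """Read the API key from the environment (hypothetical helper).

    Raises a RuntimeError with an explicit message when the variable is
    missing or empty, instead of silently writing an unusable config file.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# Usage in the cell above (left commented out so this block has no side effects):
# MISTRAL_API_KEY = load_api_key()
```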

**We're good! 🎉** 

Once you have (1) template files for your candidate prompts and (2) a `config/config2.yaml` file identifying the models you wish to use, you can start running rounds and optimizing the prompts 😊

### A little hack: jupyter notebooks don't really work with `asyncio` 🫠

...if only they did 😂 The little hack here is to export the code that runs prompt optimization to a `.py` file and then run that file from the command line.

For the sake of demonstration, we only create one file, for the job title classification prompt, but you can easily switch to the other prompts yourself. Exporting each optimization process to its own file also allows for parallel execution (💨, right?).

In [6]:
%%writefile spo.py

from metagpt.ext.spo.components.optimizer import PromptOptimizer
from metagpt.ext.spo.utils.llm_client import SPO_LLM

# Initialize LLM settings
SPO_LLM.initialize(
    # same temperature settings as metagpt's default!
    optimize_kwargs={
        "model": "mistral-large-latest", 
        "temperature": 0.6
    },
    evaluate_kwargs={
        "model": "mistral-small-latest", 
        "temperature": 0.3
    },
    execute_kwargs={
        "model": "mistral-small-latest", 
        "temperature": 0
    }
)

template_name = "job.yaml"  # change this for each prompt!

# Create and run optimizer
optimizer = PromptOptimizer(
    optimized_path="workspace",  # Output directory
    initial_round=1,  # Starting round
    max_rounds=5,  # Maximum optimization rounds
    template=template_name,  # Template file - Change this for each prompt!
    name="Mistral-Prompt-Opt",  # Project name
)

optimizer.optimize()

Overwriting spo.py
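To optimize the other prompts, one option is to generate one script per template from the same body as `spo.py`. The sketch below only reuses the calls and arguments shown in the cell above; the `spo_<name>.py` file names, the per-prompt project name and the helper functions are assumptions made for illustration.

```python
from pathlib import Path

# Same body as spo.py, with a placeholder for the template name.
# Literal braces are doubled because the text goes through str.format.
SCRIPT_TEMPLATE = '''\
from metagpt.ext.spo.components.optimizer import PromptOptimizer
from metagpt.ext.spo.utils.llm_client import SPO_LLM

SPO_LLM.initialize(
    optimize_kwargs={{"model": "mistral-large-latest", "temperature": 0.6}},
    evaluate_kwargs={{"model": "mistral-small-latest", "temperature": 0.3}},
    execute_kwargs={{"model": "mistral-small-latest", "temperature": 0}},
)

optimizer = PromptOptimizer(
    optimized_path="workspace",
    initial_round=1,
    max_rounds=5,
    template="{name}.yaml",
    name="Mistral-Prompt-Opt-{name}",  # assumption: one project name per prompt
)

optimizer.optimize()
'''


def render_script(name: str) -> str:
    """Return the source of an optimization script for one template name."""
    return SCRIPT_TEMPLATE.format(name=name)


def write_scripts(names: list, out_dir: str = ".") -> list:
    """Write one spo_<name>.py file per template and return the created paths."""
    paths = []
    for name in names:
        path = Path(out_dir) / f"spo_{name}.py"
        path.write_text(render_script(name))
        paths.append(path)
    return paths


# Usage (commented out so that nothing is written by default):
# write_scripts(["job", "location", "social", "experience", "advocacy"])
```

Each generated file can then be launched the same way as `spo.py`, for instance from separate terminals if you want the runs to proceed in parallel.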


Now, let's run prompt optimization ☀️

In [7]:
!python spo.py

2025-04-13 01:10:47.728 | INFO     | metagpt.const:get_metagpt_package_root:15 - Package root set to /Users/fracapuano/Documents/mistral/promptopt/MetaGPT
2025-04-13 01:10:48.902 | INFO     | metagpt.ext.spo.components.optimizer:_handle_first_round:80 -
⚡ RUNNING Round 1 PROMPT ⚡

2025-04-13 01:10:49.193 | INFO     | metagpt.utils.cost_manager:update_cost:57 - Total running cost: $0.000 | Max budget: $10.000 | Current cost: $0.000, prompt_tokens: 226, completion_tokens: 2
2025-04-13 01:10:49.376 | INFO     | metagpt.ext.spo.components.optimizer:_generate_optimized_prompt:97 -
🚀Rou