# Generating the prompts for experiments

In the following notebooks, we will compare the performance of `prompto` against a traditional synchronous approach to querying LLM endpoints. Before doing this, we need a small sample of prompts to send to each model or API. In this notebook, we will generate a sample of prompts from the [Stanford Alpaca project](https://github.com/tatsu-lab/stanford_alpaca) [1].

## Downloading and sampling the data

First, download [`alpaca_data.json`](https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json) from the [`tatsu-lab/stanford_alpaca` GitHub repo](https://github.com/tatsu-lab/stanford_alpaca) and save it in the current working directory so that the sample can be regenerated from it.

In [1]:
import json
import random


def sample_alpaca_data(n: int, seed: int) -> list[dict]:
    """Return a reproducible random sample of n items from alpaca_data.json."""
    # set seed for sampling
    random.seed(seed)

    # read in alpaca_data.json file
    with open("./alpaca_data.json", encoding="utf-8") as f:
        alpaca_data = json.load(f)

    # randomly sample n items (without replacement)
    sampled_prompts = random.sample(alpaca_data, n)

    return sampled_prompts
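
As a side note, `random.seed` resets Python's global random state, which other code in the same session may also rely on. A minimal sketch of an alternative is shown below. It assumes a hypothetical helper `sample_items` and toy data, neither of which is part of this notebook, and uses a local `random.Random` instance so that sampling is reproducible without touching the global state:

```python
import random


def sample_items(items: list, n: int, seed: int) -> list:
    # hypothetical helper: a local generator leaves the global random state untouched
    rng = random.Random(seed)
    return rng.sample(items, n)


# toy data standing in for the contents of alpaca_data.json (assumption)
toy_data = [
    {"instruction": f"task {i}", "input": "", "output": ""} for i in range(20)
]

first = sample_items(toy_data, n=5, seed=42)
second = sample_items(toy_data, n=5, seed=42)

# the same seed gives the same items in the same order
print(first == second)
```

Whether this matters depends on what else runs in the session; for a short notebook like this one, seeding the global generator is usually sufficient.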

We generate a sample of 100 prompts with a seed of 42:

In [2]:
sample_alpaca = sample_alpaca_data(n=100, seed=42)

In [3]:
sample_alpaca[:5]

[{'instruction': 'What are the advantages of using a Scrum Agile methodology?',
  'input': '',
  'output': 'The advantages of using a Scrum Agile methodology include improved communication, increased accountability, faster project completion, better quality projects, greater customer satisfaction, and better team morale. Additionally, using Scrum Agile allows teams to more easily pivot and respond to changing requirements, which is often necessary in software development.'},
 {'instruction': 'Design an advertisement for a home security product.',
  'input': '',
  'output': 'Stay secure and monitored with [Product Name]! Our cutting-edge home security system keeps you connected no matter where you are in the world. With built-in sensors and 24/7 monitoring, you can rest easy knowing that your home is safe and sound. Get [Product Name] today and take complete control over your home security.'},
 {'instruction': 'Cite a relevant quote that reflects the message of the message.',
  'input':

## Using the prompt templates

We use the prompt templates outlined in the [README](https://github.com/tatsu-lab/stanford_alpaca?tab=readme-ov-file#data-release) of the [`tatsu-lab/stanford_alpaca` GitHub repo](https://github.com/tatsu-lab/stanford_alpaca) to create the prompts for the model.

In [4]:
prompt_template_input = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
"""

In [5]:
prompt_template_no_input = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
"""

We use a list comprehension to choose between the `prompt_template_input` and `prompt_template_no_input` templates, depending on whether the `input` field of each sampled item is non-empty:

In [6]:
sample_prompts = [
    (
        prompt_template_input.format(
            instruction=prompt["instruction"], input=prompt["input"]
        )
        if prompt["input"] != ""
        else prompt_template_no_input.format(instruction=prompt["instruction"])
    )
    for prompt in sample_alpaca
]
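
The same selection logic can also be written as a small function, which is easier to check on a single item. The sketch below is an illustration rather than part of the original workflow: `build_prompt` is a hypothetical name, and treating a missing `input` key like an empty string (via `dict.get`) is an assumption. The two templates are repeated so that the block runs on its own.

```python
prompt_template_input = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
"""

prompt_template_no_input = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
"""


def build_prompt(item: dict) -> str:
    # hypothetical helper; a missing "input" key is treated like an empty string (assumption)
    if item.get("input", ""):
        return prompt_template_input.format(
            instruction=item["instruction"], input=item["input"]
        )
    return prompt_template_no_input.format(instruction=item["instruction"])


# one item with an input and one without, taken from the sample shown earlier
with_input = build_prompt(
    {
        "instruction": "Cite a relevant quote that reflects the message of the message.",
        "input": "Message: Never give up on your goals.",
    }
)
without_input = build_prompt(
    {"instruction": "Design an advertisement for a home security product.", "input": ""}
)

print(with_input)
print(without_input)
```

With this helper, the list comprehension would reduce to `[build_prompt(prompt) for prompt in sample_alpaca]`.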

In [7]:
sample_prompts

['Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat are the advantages of using a Scrum Agile methodology?\n\n### Response:\n',
 'Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nDesign an advertisement for a home security product.\n\n### Response:\n',
 'Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nCite a relevant quote that reflects the message of the message.\n\n### Input:\nMessage: Never give up on your goals.\n\n### Response:\n',
 'Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nGenerate two similar sounding but semantically different words to contrast this word.\n

We write the prompts to a `sample_prompts.json` file so that they can be loaded in other notebooks:

In [8]:
# write prompts to json
with open("./sample_prompts.json", "w") as f:
    json.dump(sample_prompts, f, indent=4)
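
In another notebook, the file can be read back with `json.load`. The following is a minimal sketch of the round trip. It uses a temporary directory and two placeholder prompts, both of which are assumptions made so that the example runs without the real `sample_prompts.json`:

```python
import json
import os
import tempfile

# placeholder prompts standing in for the real sample (assumption)
example_prompts = [
    "### Instruction:\nFirst example task\n\n### Response:\n",
    "### Instruction:\nSecond example task\n\n### Response:\n",
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample_prompts.json")

    # write the prompts in the same way as above
    with open(path, "w") as f:
        json.dump(example_prompts, f, indent=4)

    # read them back as a list of strings
    with open(path) as f:
        loaded_prompts = json.load(f)

# the loaded list matches what was written, including the order
print(loaded_prompts == example_prompts)
```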

## References

[1]: Stanford Alpaca: An Instruction-following LLaMA model. Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Carlos Guestrin, Percy Liang, Tatsunori B. Hashimoto. 2023. https://github.com/tatsu-lab/stanford_alpaca.