In [2]:
# Set your OpenAI API key (required for openai/gpt-4o-mini).
# When you run this cell, a prompt will appear—paste your key there (it won't be shown on screen).
import os
import getpass

if "OPENAI_API_KEY" not in os.environ:
    key = getpass.getpass("OpenAI API key: ")
    if key:
        os.environ["OPENAI_API_KEY"] = key
    else:
        print("No key entered. Set OPENAI_API_KEY in this notebook or in your shell.")

In [3]:
import dspy

lm = dspy.LM('openai/gpt-4o-mini')
dspy.configure(lm=lm)



In [15]:

qa = dspy.Predict('question: str -> response: str')
response = qa(question="what are high memory and low memory on linux?")

print(response.response)

In Linux, "high memory" and "low memory" refer to different regions of the system's memory address space, particularly in the context of how the kernel manages memory for processes.

- **Low Memory**: This typically refers to the memory that is directly accessible by the kernel and can be used by processes without any special handling. In a 32-bit system, this is usually the first 896 MB of RAM (though this can vary based on the architecture and configuration). Low memory is used for kernel data structures and for user-space processes that require direct access to memory.

- **High Memory**: This refers to memory that is above the low memory limit and is not directly accessible by the kernel in a 32-bit system. Processes can use this memory, but the kernel must use special mechanisms (like paging) to access it. High memory is often used in systems with large amounts of RAM, allowing more memory to be allocated to user-space processes while still maintaining a limited address space for 

You can easily inspect the last n prompts sent by DSPy with dspy.inspect_history. Alternatively, if you enabled MLflow Tracing above, you can see the full LLM interactions for each program execution in a tree view.

In [16]:
dspy.inspect_history(n=1)





[2026-02-10T13:30:58.857597]

System message:

Your input fields are:
1. `question` (str):
Your output fields are:
1. `response` (str):
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## response ## ]]
{response}

[[ ## completed ## ]]
In adhering to this structure, your objective is: 
        Given the fields `question`, produce the fields `response`.


User message:

[[ ## question ## ]]
what are high memory and low memory on linux?

Respond with the corresponding output fields, starting with the field `[[ ## response ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## response ## ]]
In Linux, "high memory" and "low memory" refer to different regions of the system's memory address space, particularly in the context of how the kernel manages memory for processes.

- **Low Memory**: This typically refers to the memory tha

DSPy has various built-in modules, e.g. dspy.ChainOfThought, dspy.ProgramOfThought, and dspy.ReAct. These are interchangeable with basic dspy.Predict: they take your signature, which is specific to your task, and they apply general-purpose prompting techniques and inference-time strategies to it.

For example, dspy.ChainOfThought is an easy way to elicit reasoning out of your LM before it commits to the outputs requested in your signature.

In the example below, we'll omit str types (as the default type is string). You should feel free to experiment with other fields and types, e.g. try topics: list[str] or is_realistic: bool.


In [19]:
# cot = dspy.ChainOfThought('question -> response: list[str]')
# cot(question="should curly braces appear on their own line?")

# cot = dspy.ChainOfThought('question -> response: bool')
# cot(question="should curly braces appear on their own line?")

cot = dspy.ChainOfThought('question -> response')
cot(question="should curly braces appear on their own line?")

Prediction(
    reasoning='The placement of curly braces on their own line is often a matter of coding style and conventions. In many programming languages, such as Java, C#, and JavaScript, it is common to place opening curly braces on the same line as the preceding statement, while closing curly braces may be placed on their own line. This style enhances readability and maintains a clean structure. However, some coding standards, like those used in Python or certain configurations of C++, may prefer braces to be on their own lines for clarity. Ultimately, it depends on the specific coding guidelines being followed.',
    response="Curly braces do not necessarily have to appear on their own line; it depends on the coding style and conventions being used. Many developers prefer to keep opening braces on the same line as the preceding statement for readability, while closing braces are often placed on their own line. It's best to follow the coding standards of the project or language yo

In [24]:
dspy.inspect_history(n=2)





[2026-02-10T13:36:09.478108]

System message:

Your input fields are:
1. `question` (str):
Your output fields are:
1. `reasoning` (str): 
2. `response` (bool):
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## reasoning ## ]]
{reasoning}

[[ ## response ## ]]
{response}        # note: the value you produce must be True or False

[[ ## completed ## ]]
In adhering to this structure, your objective is: 
        Given the fields `question`, produce the fields `response`.


User message:

[[ ## question ## ]]
should curly braces appear on their own line?

Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## response ## ]]` (must be formatted as a valid Python bool), and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## reasoning ## ]]
The placement of curly braces on their own line 

In [5]:
import orjson
import requests
from pathlib import Path

# Save in this repo so the path is predictable (run this cell from workspace/dspy or set path below)
DATA_PATH = Path("ragqa_arena_tech_examples.jsonl").resolve()
print(f"Data file: {DATA_PATH}")

Data file: /Users/eddiej@nisos.com/mespace/dspy/ragqa_arena_tech_examples.jsonl


Evaluation

In [None]:
URL = "https://huggingface.co/dspy/cache/resolve/main/ragqa_arena_tech_examples.jsonl"
if not DATA_PATH.exists():
    print(f"Downloading to {DATA_PATH}...")
    r = requests.get(URL, stream=True)
    r.raise_for_status()
    with open(DATA_PATH, "wb") as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)
else:
    print(f"Using existing file: {DATA_PATH}")

In [12]:
with open(DATA_PATH) as f:
    data = [orjson.loads(line) for line in f]
    
data[0]

{'question': 'why igp is used in mpls?',
 'response': "An IGP exchanges routing prefixes between gateways/routers.  \nWithout a routing protocol, you'd have to configure each route on every router and you'd have no dynamic updates when routes change because of link failures. \nFuthermore, within an MPLS network, an IGP is vital for advertising the internal topology and ensuring connectivity for MP-BGP inside the network.",
 'gold_doc_ids': [2822, 2823]}

In [13]:
import dspy
data = [dspy.Example(**d).with_inputs('question') for d in data]

In [14]:
data

[Example({'question': 'why igp is used in mpls?', 'response': "An IGP exchanges routing prefixes between gateways/routers.  \nWithout a routing protocol, you'd have to configure each route on every router and you'd have no dynamic updates when routes change because of link failures. \nFuthermore, within an MPLS network, an IGP is vital for advertising the internal topology and ensuring connectivity for MP-BGP inside the network.", 'gold_doc_ids': [2822, 2823]}) (input_keys={'question'}),
 Example({'question': 'how do you achieve a numeric versioning scheme with git?', 'response': "Take a look at the process of using 'git-describe' by way of GIT-VERSION-GEN and how you can add this via your build process when you tag your release. \nFor guidance, can also refer to the provided blog at http://cd34.com/blog/programming/using-git-to-generate-an-automatic-version-number/.", 'gold_doc_ids': [3965]}) (input_keys={'question'}),
 Example({'question': 'why are my text messages coming up as maybe

In [15]:
import random

random.Random(0).shuffle(data)
trainset, devset, testset = data[:200], data[200:500], data[500:1000]

len(trainset), len(devset), len(testset)



(200, 300, 500)

In [21]:
example = data[4]
example

Example({'question': 'how create a temporary file in shell script?', 'response': 'Use mktemp to create a temporary file `temp_file=$(mktemp)` or, alternatively, to create a temporary directory: `temp_dir=$(mktemp -d)`. \nAt the end of the script you have to delete the temporary file or directory `rm ${temp_file} rm -R ${temp_dir} mktemp`.', 'gold_doc_ids': [4276]}) (input_keys={'question'})

In [22]:
from dspy.evaluate import SemanticF1

# Instantiate the metric.
metric = SemanticF1(decompositional=True)

# Produce a prediction from our `cot` module, using the `example` above as input.
pred = cot(**example.inputs())

# Compute the metric score for the prediction.
score = metric(example, pred)

print(f"Question: \t {example.question}\n")
print(f"Gold Response: \t {example.response}\n")
print(f"Predicted Response: \t {pred.response}\n")
print(f"Semantic F1 Score: {score:.2f}")

Question: 	 how create a temporary file in shell script?

Gold Response: 	 Use mktemp to create a temporary file `temp_file=$(mktemp)` or, alternatively, to create a temporary directory: `temp_dir=$(mktemp -d)`. 
At the end of the script you have to delete the temporary file or directory `rm ${temp_file} rm -R ${temp_dir} mktemp`.

Predicted Response: 	 You can create a temporary file in a shell script using the following command:

```bash
temp_file=$(mktemp /tmp/mytempfile.XXXXXX)
```

This command creates a temporary file in the `/tmp` directory with a name that starts with `mytempfile.` followed by six random characters. You can then use `$temp_file` to refer to the temporary file in your script. Remember to clean up the temporary file after use by removing it with `rm $temp_file`.

Semantic F1 Score: 0.67


In [35]:
dspy.inspect_history(n=1)






[2026-02-10T13:56:01.591461]

System message:

Your input fields are:
1. `question` (str): 
2. `ground_truth` (str): 
3. `system_response` (str):
Your output fields are:
1. `reasoning` (str): 
2. `ground_truth_key_ideas` (str): enumeration of key ideas in the ground truth
3. `system_response_key_ideas` (str): enumeration of key ideas in the system response
4. `discussion` (str): discussion of the overlap between ground truth and system response
5. `recall` (float): fraction (out of 1.0) of ground truth covered by the system response
6. `precision` (float): fraction (out of 1.0) of system response covered by the ground truth
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## ground_truth ## ]]
{ground_truth}

[[ ## system_response ## ]]
{system_response}

[[ ## reasoning ## ]]
{reasoning}

[[ ## ground_truth_key_ideas ## ]]
{ground_truth_key_ideas}

[[ ## system_response_key_ideas
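The `recall` and `precision` fields that the judge LM fills in above are then combined into a single score. Here is a minimal sketch of that last step, assuming the metric reports the usual F1 (harmonic mean) of the two values. The function name `f1_from_judgement` and the sample numbers are illustrative; they are not taken from the DSPy source or from this run.

```python
def f1_from_judgement(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both expected in [0, 1]."""
    # Clamp in case the judge returns a value slightly outside the range.
    precision = max(0.0, min(1.0, precision))
    recall = max(0.0, min(1.0, recall))
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical judgement: everything in the response is supported by the
# gold answer (precision 1.0), but only half of the gold ideas are covered.
print(round(f1_from_judgement(precision=1.0, recall=0.5), 2))  # 0.67
```

Because the harmonic mean is pulled toward the smaller of the two values, a response that is accurate but incomplete still loses a noticeable share of the score.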

In [23]:
# Define an evaluator that we can re-use.
evaluate = dspy.Evaluate(devset=devset, metric=metric, num_threads=24,
                         display_progress=True, display_table=2)

# Evaluate the Chain-of-Thought program.
evaluate(cot)


Average Metric: 125.66 / 300 (41.9%): 100%|██████████| 300/300 [00:00<00:00, 480.16it/s]

2026/02/13 11:53:01 INFO dspy.evaluate.evaluate: Average Metric: 125.65754569486975 / 300 (41.9%)





Unnamed: 0,question,example_response,gold_doc_ids,reasoning,pred_response,SemanticF1
0,"when to use c over c++, and c++ over c?","If you are equally familiar with both C++ and C, it's advisable to...",[733],"C and C++ are both powerful programming languages, but they serve ...","Use C when you need low-level system programming, performance, and...",
1,should images be stored in a git repository?,"One viewpoint expresses that there is no significant downside, esp...","[6253, 6254, 6275, 6278, 8215]",Storing images in a Git repository can be problematic for several ...,Images should generally not be stored in a Git repository due to i...,✔️ [0.333]


41.89

Improve it with RAG

Let's build a RAG program and see whether it performs better.

In [6]:
max_characters = 6000  # for truncating >99th percentile of documents
topk_docs_to_retrieve = 5  # number of documents to retrieve per search query

with open("ragqa_arena_tech_corpus.jsonl") as f:
    corpus = [orjson.loads(line)['text'][:max_characters] for line in f]
    print(f"Loaded {len(corpus)} documents. Will encode them below.")

embedder = dspy.Embedder('openai/text-embedding-3-small', dimensions=512)
search = dspy.retrievers.Embeddings(embedder=embedder, corpus=corpus, k=topk_docs_to_retrieve)

Loaded 28436 documents. Will encode them below.
Training a 32-byte FAISS index with 337 partitions, based on 28436 x 512-dim embeddings
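To make the retrieval step less of a black box, here is a small pure-Python sketch of what an embedding retriever does conceptually: score every document vector against the query vector and keep the `k` best. The three-dimensional vectors, the `cosine` and `top_k` helpers and the toy corpus are all made up for illustration. The real `dspy.retrievers.Embeddings` call above works on 512-dimensional embeddings and, as the log line shows, builds a FAISS index for this corpus instead of scanning it by brute force.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, docs, k=2):
    """Return the k documents whose vectors are most similar to the query."""
    scored = sorted(
        zip(docs, doc_vecs),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return [doc for doc, _ in scored[:k]]


# Toy corpus with hand-written 3-d "embeddings" (purely illustrative).
docs = ["kernel memory zones", "git tagging", "kmap and highmem"]
doc_vecs = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.8, 0.3, 0.1]]
query_vec = [1.0, 0.2, 0.0]  # stands in for an embedded question

passages = top_k(query_vec, doc_vecs, docs, k=2)
print(passages)  # ['kernel memory zones', 'kmap and highmem']
```

In the notebook, the retrieved passages play the role of `context` in the `'context, question -> response'` signature.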


In [7]:
class RAG(dspy.Module):
    def __init__(self):
        self.respond = dspy.ChainOfThought('context, question -> response')

    def forward(self, question):
        context = search(question).passages
        return self.respond(context=context, question=question)

In [8]:
rag = RAG()
rag(question="what are high memory and low memory on linux?")

Prediction(
    reasoning="High Memory and Low Memory are two segments of memory management in the Linux kernel, particularly relevant in 32-bit architectures. Low Memory refers to the portion of memory that the kernel can access directly, which is always mapped in the kernel's address space. This allows the kernel to perform operations without needing to map memory pages explicitly. High Memory, on the other hand, is a segment of memory that is not permanently mapped in the kernel's address space. When the kernel needs to access High Memory, it must temporarily map it into its address space using functions like `kmap`. This distinction is crucial for managing memory efficiently, especially when dealing with large amounts of RAM, as it allows the kernel to handle more memory than what is directly addressable in its space.",
    response="In Linux, High Memory refers to the segment of memory that is not permanently mapped into the kernel's address space, requiring the kernel to use func

In [24]:
evaluate(RAG())

Average Metric: 170.06 / 300 (56.7%): 100%|██████████| 300/300 [03:00<00:00,  1.66it/s]

2026/02/13 11:56:10 INFO dspy.evaluate.evaluate: Average Metric: 170.05686452754802 / 300 (56.7%)





Unnamed: 0,question,example_response,gold_doc_ids,reasoning,pred_response,SemanticF1
0,"when to use c over c++, and c++ over c?","If you are equally familiar with both C++ and C, it's advisable to...",[733],C should be used over C++ primarily in scenarios where simplicity ...,"Use C over C++ when working on embedded systems, requiring low-lev...",✔️ [0.500]
1,should images be stored in a git repository?,"One viewpoint expresses that there is no significant downside, esp...","[6253, 6254, 6275, 6278, 8215]",Storing images in a Git repository can be problematic due to Git's...,While it is technically possible to store images in a Git reposito...,✔️ [0.500]


56.69

Off the shelf, our RAG module scores 56.7% on the devset, up from 41.9% for the plain Chain-of-Thought program.

Next, I will show you a further way to improve the quality of the system that DSPy provides: prompt optimisation.

Here the optimiser tunes the prompt behind self.respond = dspy.ChainOfThought('context, question -> response'), the only predictor in our program.
However, if there are many sub-modules in your program, all of them will be optimised together.
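An optimisation run like this makes many LM calls, so it is worth estimating the budget first. The sketch below reproduces the arithmetic that MIPROv2 prints in its log for this run (the "Projected Language Model (LM) Calls" section of the output further down). The variable names are my own, and the numbers are copied from that log rather than obtained from any DSPy API.

```python
# Values copied from the MIPROv2 log of this run (auto="medium").
data_summarizer_calls = 10
num_instruct_candidates = 6
predictors_in_program = 1        # RAG has a single ChainOfThought predictor
program_aware_proposer_calls = 2

minibatch_size = 35
num_minibatch_trials = 18
valset_size = 160
num_full_evals = 4

# Calls made by the prompt model while proposing instructions.
prompt_model_calls = (
    data_summarizer_calls
    + num_instruct_candidates * predictors_in_program
    + program_aware_proposer_calls
)

# Executions of the program itself while scoring candidates.
program_calls = (
    minibatch_size * num_minibatch_trials
    + valset_size * num_full_evals
)

print(f"prompt model calls: {prompt_model_calls}")  # 18
print(f"LM program calls:   {program_calls}")       # 1270
```

The metric used here is itself LM-based, so the number of actual API requests is likely higher than these figures.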

In [25]:
tp = dspy.MIPROv2(metric=metric, auto="medium", num_threads=24)  # use fewer threads if your rate limit is small

optimized_rag = tp.compile(RAG(), trainset=trainset,
                           max_bootstrapped_demos=2, max_labeled_demos=2)

2026/02/13 12:02:47 INFO dspy.teleprompt.mipro_optimizer_v2: 
RUNNING WITH THE FOLLOWING MEDIUM AUTO RUN SETTINGS:
num_trials: 18
minibatch: True
num_fewshot_candidates: 12
num_instruct_candidates: 6
valset size: 160



[93m[1mProjected Language Model (LM) Calls[0m

Based on the parameters you have set, the maximum number of LM calls is projected as follows:

[93m- Prompt Generation: [94m[1m10[0m[93m data summarizer calls + [94m[1m6[0m[93m * [94m[1m1[0m[93m lm calls in program + ([94m[1m2[0m[93m) lm calls in program-aware proposer = [94m[1m18[0m[93m prompt model calls[0m
[93m- Program Evaluation: [94m[1m35[0m[93m examples in minibatch * [94m[1m18[0m[93m batches + [94m[1m160[0m[93m examples in val set * [94m[1m4[0m[93m full evals = [94m[1m1270[0m[93m LM Program calls[0m

[93m[1mEstimated Cost Calculation:[0m

[93mTotal Cost = (Number of calls to task model * (Avg Input Token Length per Call * Task Model Price per Input Token + Avg Output Token Length per Call * Task Model Price per Output Token)
            + (Number of program calls * (Avg Input Token Length per Call * Task Prompt Price per Input Token + Avg Output Token Length per Call * Prompt Model

2026/02/13 12:03:08 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==
2026/02/13 12:03:08 INFO dspy.teleprompt.mipro_optimizer_v2: These will be used as few-shot example candidates for our program and for creating instructions.

2026/02/13 12:03:08 INFO dspy.teleprompt.mipro_optimizer_v2: Bootstrapping N=12 sets of demonstrations...



No input received within 20 seconds. Proceeding with execution...
Bootstrapping set 1/12
Bootstrapping set 2/12
Bootstrapping set 3/12


  5%|▌         | 2/40 [00:30<09:42, 15.33s/it]


Bootstrapped 2 full traces after 2 examples for up to 1 rounds, amounting to 2 attempts.
Bootstrapping set 4/12


 15%|█▌        | 6/40 [01:27<08:13, 14.53s/it]


Bootstrapped 1 full traces after 6 examples for up to 1 rounds, amounting to 6 attempts.
Bootstrapping set 5/12


  5%|▌         | 2/40 [00:29<09:13, 14.55s/it]


Bootstrapped 1 full traces after 2 examples for up to 1 rounds, amounting to 2 attempts.
Bootstrapping set 6/12


  5%|▌         | 2/40 [00:41<13:13, 20.88s/it]


Bootstrapped 1 full traces after 2 examples for up to 1 rounds, amounting to 2 attempts.
Bootstrapping set 7/12


  5%|▌         | 2/40 [00:31<09:52, 15.58s/it]


Bootstrapped 1 full traces after 2 examples for up to 1 rounds, amounting to 2 attempts.
Bootstrapping set 8/12


 10%|█         | 4/40 [00:52<07:55, 13.21s/it]


Bootstrapped 1 full traces after 4 examples for up to 1 rounds, amounting to 4 attempts.
Bootstrapping set 9/12


  5%|▌         | 2/40 [00:30<09:48, 15.50s/it]


Bootstrapped 2 full traces after 2 examples for up to 1 rounds, amounting to 2 attempts.
Bootstrapping set 10/12


  5%|▌         | 2/40 [00:20<06:38, 10.47s/it]


Bootstrapped 1 full traces after 2 examples for up to 1 rounds, amounting to 2 attempts.
Bootstrapping set 11/12


  2%|▎         | 1/40 [00:12<08:09, 12.56s/it]


Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.
Bootstrapping set 12/12


  5%|▌         | 2/40 [00:25<08:01, 12.66s/it]
2026/02/13 12:09:10 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 2: PROPOSE INSTRUCTION CANDIDATES <==
2026/02/13 12:09:10 INFO dspy.teleprompt.mipro_optimizer_v2: We will use the few-shot examples from the previous step, a generated dataset summary, a summary of the program code, and a randomly selected prompting tip to propose instructions.


Bootstrapped 1 full traces after 2 examples for up to 1 rounds, amounting to 2 attempts.


2026/02/13 12:09:36 INFO dspy.teleprompt.mipro_optimizer_v2: 
Proposing N=6 instructions...

2026/02/13 12:10:29 INFO dspy.teleprompt.mipro_optimizer_v2: Proposed Instructions for Predictor 0:

2026/02/13 12:10:29 INFO dspy.teleprompt.mipro_optimizer_v2: 0: Given the fields `context`, `question`, produce the fields `response`.

2026/02/13 12:10:29 INFO dspy.teleprompt.mipro_optimizer_v2: 1: In a high-stakes situation where you are facing a critical technical issue on macOS or Linux, you need to quickly troubleshoot and resolve the problem. Given the provided `context` that contains relevant troubleshooting information and the `question` detailing your specific issue, produce a detailed and coherent `response`. This response should guide the user through the necessary steps to solve the issue effectively, ensuring they can regain functionality without further delay.

2026/02/13 12:10:29 INFO dspy.teleprompt.mipro_optimizer_v2: 2: You are a technical support specialist assisting users wi

Average Metric: 88.81 / 160 (55.5%): 100%|██████████| 160/160 [01:27<00:00,  1.83it/s]

2026/02/13 12:11:57 INFO dspy.evaluate.evaluate: Average Metric: 88.81023055967225 / 160 (55.5%)
2026/02/13 12:11:57 INFO dspy.teleprompt.mipro_optimizer_v2: Default program score: 55.51

2026/02/13 12:11:57 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 2 / 23 - Minibatch ==



Average Metric: 21.51 / 35 (61.4%): 100%|██████████| 35/35 [00:26<00:00,  1.31it/s]

2026/02/13 12:12:24 INFO dspy.evaluate.evaluate: Average Metric: 21.50658845868181 / 35 (61.4%)
2026/02/13 12:12:24 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 61.45 on minibatch of size 35 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 6'].
2026/02/13 12:12:24 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [61.45]
2026/02/13 12:12:24 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [55.51]
2026/02/13 12:12:24 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 55.51


2026/02/13 12:12:24 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 3 / 23 - Minibatch ==



Average Metric: 19.48 / 35 (55.6%): 100%|██████████| 35/35 [00:22<00:00,  1.58it/s]

2026/02/13 12:12:46 INFO dspy.evaluate.evaluate: Average Metric: 19.47621983861602 / 35 (55.6%)
2026/02/13 12:12:46 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 55.65 on minibatch of size 35 with parameters ['Predictor 0: Instruction 4', 'Predictor 0: Few-Shot Set 2'].
2026/02/13 12:12:46 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [61.45, 55.65]
2026/02/13 12:12:46 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [55.51]
2026/02/13 12:12:46 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 55.51


2026/02/13 12:12:46 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 4 / 23 - Minibatch ==



Average Metric: 22.35 / 35 (63.9%): 100%|██████████| 35/35 [00:26<00:00,  1.34it/s]

2026/02/13 12:13:12 INFO dspy.evaluate.evaluate: Average Metric: 22.354205752328923 / 35 (63.9%)
2026/02/13 12:13:13 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 63.87 on minibatch of size 35 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 6'].





2026/02/13 12:13:13 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [61.45, 55.65, 63.87]
2026/02/13 12:13:13 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [55.51]
2026/02/13 12:13:13 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 55.51


2026/02/13 12:13:13 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 5 / 23 - Minibatch ==


Average Metric: 21.16 / 35 (60.4%): 100%|██████████| 35/35 [00:30<00:00,  1.13it/s]

2026/02/13 12:13:44 INFO dspy.evaluate.evaluate: Average Metric: 21.157303053486697 / 35 (60.4%)
2026/02/13 12:13:44 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 60.45 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 4'].
2026/02/13 12:13:44 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [61.45, 55.65, 63.87, 60.45]
2026/02/13 12:13:44 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [55.51]
2026/02/13 12:13:44 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 55.51


2026/02/13 12:13:44 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 6 / 23 - Minibatch ==



Average Metric: 19.62 / 35 (56.1%): 100%|██████████| 35/35 [00:29<00:00,  1.18it/s]

2026/02/13 12:14:13 INFO dspy.evaluate.evaluate: Average Metric: 19.623056895603526 / 35 (56.1%)
2026/02/13 12:14:13 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 56.07 on minibatch of size 35 with parameters ['Predictor 0: Instruction 3', 'Predictor 0: Few-Shot Set 5'].
2026/02/13 12:14:13 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [61.45, 55.65, 63.87, 60.45, 56.07]
2026/02/13 12:14:13 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [55.51]
2026/02/13 12:14:13 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 55.51


2026/02/13 12:14:13 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 7 / 23 - Full Evaluation =====
2026/02/13 12:14:13 INFO dspy.teleprompt.mipro_optimizer_v2: Doing full eval on next top averaging program (Avg Score: 63.87) from minibatch trials...



Average Metric: 95.27 / 160 (59.5%): 100%|██████████| 160/160 [01:27<00:00,  1.83it/s]

2026/02/13 12:15:41 INFO dspy.evaluate.evaluate: Average Metric: 95.27402803389197 / 160 (59.5%)
2026/02/13 12:15:41 INFO dspy.teleprompt.mipro_optimizer_v2: [92mNew best full eval score![0m Score: 59.55
2026/02/13 12:15:41 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [55.51, 59.55]
2026/02/13 12:15:41 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 59.55
2026/02/13 12:15:41 INFO dspy.teleprompt.mipro_optimizer_v2: 

2026/02/13 12:15:41 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 8 / 23 - Minibatch ==



Average Metric: 22.00 / 35 (62.8%): 100%|██████████| 35/35 [00:23<00:00,  1.48it/s]

2026/02/13 12:16:05 INFO dspy.evaluate.evaluate: Average Metric: 21.99591415854313 / 35 (62.8%)
2026/02/13 12:16:05 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 62.85 on minibatch of size 35 with parameters ['Predictor 0: Instruction 4', 'Predictor 0: Few-Shot Set 6'].
2026/02/13 12:16:05 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [61.45, 55.65, 63.87, 60.45, 56.07, 62.85]
2026/02/13 12:16:05 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [55.51, 59.55]
2026/02/13 12:16:05 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 59.55


2026/02/13 12:16:05 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 9 / 23 - Minibatch ==



Average Metric: 20.56 / 35 (58.7%): 100%|██████████| 35/35 [00:30<00:00,  1.17it/s]

2026/02/13 12:16:35 INFO dspy.evaluate.evaluate: Average Metric: 20.55897267808781 / 35 (58.7%)
2026/02/13 12:16:35 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 58.74 on minibatch of size 35 with parameters ['Predictor 0: Instruction 5', 'Predictor 0: Few-Shot Set 1'].
2026/02/13 12:16:35 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [61.45, 55.65, 63.87, 60.45, 56.07, 62.85, 58.74]
2026/02/13 12:16:35 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [55.51, 59.55]
2026/02/13 12:16:35 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 59.55


2026/02/13 12:16:35 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 10 / 23 - Minibatch ==



Average Metric: 19.36 / 35 (55.3%): 100%|██████████| 35/35 [00:24<00:00,  1.41it/s]

2026/02/13 12:17:00 INFO dspy.evaluate.evaluate: Average Metric: 19.363254139344495 / 35 (55.3%)
2026/02/13 12:17:00 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 55.32 on minibatch of size 35 with parameters ['Predictor 0: Instruction 3', 'Predictor 0: Few-Shot Set 3'].
2026/02/13 12:17:00 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [61.45, 55.65, 63.87, 60.45, 56.07, 62.85, 58.74, 55.32]
2026/02/13 12:17:00 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [55.51, 59.55]
2026/02/13 12:17:00 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 59.55


2026/02/13 12:17:00 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 11 / 23 - Minibatch ==



Average Metric: 20.12 / 35 (57.5%): 100%|██████████| 35/35 [00:37<00:00,  1.06s/it]

2026/02/13 12:17:37 INFO dspy.evaluate.evaluate: Average Metric: 20.12047252118904 / 35 (57.5%)
2026/02/13 12:17:37 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 57.49 on minibatch of size 35 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 9'].
2026/02/13 12:17:37 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [61.45, 55.65, 63.87, 60.45, 56.07, 62.85, 58.74, 55.32, 57.49]
2026/02/13 12:17:37 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [55.51, 59.55]
2026/02/13 12:17:37 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 59.55


2026/02/13 12:17:37 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 12 / 23 - Minibatch ==



Average Metric: 21.96 / 35 (62.7%): 100%|██████████| 35/35 [00:25<00:00,  1.36it/s]

2026/02/13 12:18:03 INFO dspy.evaluate.evaluate: Average Metric: 21.955208844747204 / 35 (62.7%)
2026/02/13 12:18:03 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 62.73 on minibatch of size 35 with parameters ['Predictor 0: Instruction 4', 'Predictor 0: Few-Shot Set 6'].
2026/02/13 12:18:03 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [61.45, 55.65, 63.87, 60.45, 56.07, 62.85, 58.74, 55.32, 57.49, 62.73]
2026/02/13 12:18:03 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [55.51, 59.55]
2026/02/13 12:18:03 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 59.55


2026/02/13 12:18:03 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 13 / 23 - Full Evaluation =====
2026/02/13 12:18:03 INFO dspy.teleprompt.mipro_optimizer_v2: Doing full eval on next top averaging program (Avg Score: 62.79) from minibatch trials...



Average Metric: 96.46 / 160 (60.3%): 100%|██████████| 160/160 [01:16<00:00,  2.10it/s]

2026/02/13 12:19:19 INFO dspy.evaluate.evaluate: Average Metric: 96.46223639410225 / 160 (60.3%)
2026/02/13 12:19:19 INFO dspy.teleprompt.mipro_optimizer_v2: [92mNew best full eval score![0m Score: 60.29
2026/02/13 12:19:19 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [55.51, 59.55, 60.29]
2026/02/13 12:19:19 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 60.29
2026/02/13 12:19:19 INFO dspy.teleprompt.mipro_optimizer_v2: 
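The two "Full Evaluation" entries in the log above suggest how candidates get promoted: minibatch scores appear to be averaged per (instruction, few-shot set) combination, and the best-averaging combination that has not yet had a full evaluation is then run on the whole validation set. The sketch below replays that bookkeeping on the minibatch scores logged for trials 2 to 12. It is my reading of the log, not the MIPROv2 source, and `next_full_eval_candidate` is a hypothetical helper.

```python
from collections import defaultdict

# (instruction index, few-shot set index) -> minibatch score, trials 2-12.
minibatch_trials = [
    ((1, 6), 61.45), ((4, 2), 55.65), ((0, 6), 63.87), ((2, 4), 60.45),
    ((3, 5), 56.07), ((4, 6), 62.85), ((5, 1), 58.74), ((3, 3), 55.32),
    ((0, 9), 57.49), ((4, 6), 62.73),
]
already_fully_evaluated = {(0, 6)}  # promoted at trial 7


def next_full_eval_candidate(trials, done):
    """Pick the not-yet-promoted combination with the best mean minibatch score."""
    scores = defaultdict(list)
    for combo, score in trials:
        scores[combo].append(score)
    averages = {
        combo: sum(vals) / len(vals)
        for combo, vals in scores.items()
        if combo not in done
    }
    best = max(averages, key=averages.get)
    return best, round(averages[best], 2)


print(next_full_eval_candidate(minibatch_trials, already_fully_evaluated))
# ((4, 6), 62.79)
```

The result, Instruction 4 with Few-Shot Set 6 at an average of 62.79, matches the "Avg Score: 62.79" reported at trial 13.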

2026/02/13 12:19:19 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 14 / 23 - Minibatch ==



Average Metric: 19.95 / 35 (57.0%): 100%|██████████| 35/35 [00:33<00:00,  1.06it/s]

2026/02/13 12:19:52 INFO dspy.evaluate.evaluate: Average Metric: 19.950705531189197 / 35 (57.0%)
2026/02/13 12:19:52 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 57.0 on minibatch of size 35 with parameters ['Predictor 0: Instruction 3', 'Predictor 0: Few-Shot Set 6'].
2026/02/13 12:19:52 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [61.45, 55.65, 63.87, 60.45, 56.07, 62.85, 58.74, 55.32, 57.49, 62.73, 57.0]
2026/02/13 12:19:52 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [55.51, 59.55, 60.29]
2026/02/13 12:19:52 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 60.29


2026/02/13 12:19:52 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 15 / 23 - Minibatch ==



Average Metric: 22.51 / 35 (64.3%): 100%|██████████| 35/35 [00:41<00:00,  1.18s/it]

2026/02/13 12:20:33 INFO dspy.evaluate.evaluate: Average Metric: 22.512206781207883 / 35 (64.3%)
2026/02/13 12:20:33 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 64.32 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 6'].
2026/02/13 12:20:33 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [61.45, 55.65, 63.87, 60.45, 56.07, 62.85, 58.74, 55.32, 57.49, 62.73, 57.0, 64.32]
2026/02/13 12:20:33 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [55.51, 59.55, 60.29]
2026/02/13 12:20:33 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 60.29


2026/02/13 12:20:33 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 16 / 23 - Minibatch ==



Average Metric: 22.52 / 35 (64.3%): 100%|██████████| 35/35 [00:34<00:00,  1.00it/s]

2026/02/13 12:21:08 INFO dspy.evaluate.evaluate: Average Metric: 22.519941063221182 / 35 (64.3%)
2026/02/13 12:21:08 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 64.34 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 6'].
2026/02/13 12:21:08 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [61.45, 55.65, 63.87, 60.45, 56.07, 62.85, 58.74, 55.32, 57.49, 62.73, 57.0, 64.32, 64.34]
2026/02/13 12:21:08 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [55.51, 59.55, 60.29]
2026/02/13 12:21:08 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 60.29


2026/02/13 12:21:08 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 17 / 23 - Minibatch ==



Average Metric: 21.36 / 35 (61.0%): 100%|██████████| 35/35 [00:38<00:00,  1.10s/it]

2026/02/13 12:21:47 INFO dspy.evaluate.evaluate: Average Metric: 21.36156748904507 / 35 (61.0%)
2026/02/13 12:21:47 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 61.03 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 6'].
2026/02/13 12:21:47 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [61.45, 55.65, 63.87, 60.45, 56.07, 62.85, 58.74, 55.32, 57.49, 62.73, 57.0, 64.32, 64.34, 61.03]
2026/02/13 12:21:47 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [55.51, 59.55, 60.29]
2026/02/13 12:21:47 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 60.29


2026/02/13 12:21:47 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 18 / 23 - Minibatch ==



Average Metric: 21.67 / 35 (61.9%): 100%|██████████| 35/35 [00:39<00:00,  1.14s/it]

2026/02/13 12:22:27 INFO dspy.evaluate.evaluate: Average Metric: 21.673829269341656 / 35 (61.9%)
2026/02/13 12:22:27 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 61.93 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 10'].
2026/02/13 12:22:27 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [61.45, 55.65, 63.87, 60.45, 56.07, 62.85, 58.74, 55.32, 57.49, 62.73, 57.0, 64.32, 64.34, 61.03, 61.93]
2026/02/13 12:22:27 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [55.51, 59.55, 60.29]
2026/02/13 12:22:27 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 60.29


2026/02/13 12:22:27 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 19 / 23 - Full Evaluation =====
2026/02/13 12:22:27 INFO dspy.teleprompt.mipro_optimizer_v2: Doing full eval on next top averaging program (Avg Score: 63.23) from minibatch trials...



Average Metric: 96.17 / 160 (60.1%): 100%|██████████| 160/160 [01:14<00:00,  2.15it/s]

2026/02/13 12:23:41 INFO dspy.evaluate.evaluate: Average Metric: 96.17359005513732 / 160 (60.1%)
2026/02/13 12:23:41 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [55.51, 59.55, 60.29, 60.11]
2026/02/13 12:23:41 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 60.29
2026/02/13 12:23:41 INFO dspy.teleprompt.mipro_optimizer_v2: 

2026/02/13 12:23:41 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 20 / 23 - Minibatch ==



Average Metric: 20.39 / 35 (58.2%): 100%|██████████| 35/35 [00:41<00:00,  1.19s/it]

2026/02/13 12:24:23 INFO dspy.evaluate.evaluate: Average Metric: 20.386243699426224 / 35 (58.2%)
2026/02/13 12:24:23 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 58.25 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 7'].
2026/02/13 12:24:23 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [61.45, 55.65, 63.87, 60.45, 56.07, 62.85, 58.74, 55.32, 57.49, 62.73, 57.0, 64.32, 64.34, 61.03, 61.93, 58.25]
2026/02/13 12:24:23 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [55.51, 59.55, 60.29, 60.11]
2026/02/13 12:24:23 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 60.29


2026/02/13 12:24:23 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 21 / 23 - Minibatch ==



Average Metric: 18.92 / 35 (54.1%): 100%|██████████| 35/35 [00:39<00:00,  1.12s/it]

2026/02/13 12:25:02 INFO dspy.evaluate.evaluate: Average Metric: 18.91922188774477 / 35 (54.1%)
2026/02/13 12:25:02 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 54.05 on minibatch of size 35 with parameters ['Predictor 0: Instruction 5', 'Predictor 0: Few-Shot Set 11'].
2026/02/13 12:25:02 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [61.45, 55.65, 63.87, 60.45, 56.07, 62.85, 58.74, 55.32, 57.49, 62.73, 57.0, 64.32, 64.34, 61.03, 61.93, 58.25, 54.05]
2026/02/13 12:25:02 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [55.51, 59.55, 60.29, 60.11]
2026/02/13 12:25:02 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 60.29


2026/02/13 12:25:02 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 22 / 23 - Minibatch ==



Average Metric: 19.77 / 35 (56.5%): 100%|██████████| 35/35 [00:01<00:00, 26.81it/s]

2026/02/13 12:25:04 INFO dspy.evaluate.evaluate: Average Metric: 19.77398428662599 / 35 (56.5%)
2026/02/13 12:25:04 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 56.5 on minibatch of size 35 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 6'].
2026/02/13 12:25:04 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [61.45, 55.65, 63.87, 60.45, 56.07, 62.85, 58.74, 55.32, 57.49, 62.73, 57.0, 64.32, 64.34, 61.03, 61.93, 58.25, 54.05, 56.5]
2026/02/13 12:25:04 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [55.51, 59.55, 60.29, 60.11]
2026/02/13 12:25:04 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 60.29


2026/02/13 12:25:04 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 23 / 23 - Full Evaluation =====
2026/02/13 12:25:04 INFO dspy.teleprompt.mipro_optimizer_v2: Doing full eval on next top averaging program (Avg Score: 61.93) from minibatch trials...



Average Metric: 96.19 / 160 (60.1%): 100%|██████████| 160/160 [01:30<00:00,  1.76it/s]

2026/02/13 12:26:35 INFO dspy.evaluate.evaluate: Average Metric: 96.19174065637468 / 160 (60.1%)
2026/02/13 12:26:35 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [55.51, 59.55, 60.29, 60.11, 60.12]
2026/02/13 12:26:35 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 60.29
2026/02/13 12:26:35 INFO dspy.teleprompt.mipro_optimizer_v2: 

2026/02/13 12:26:35 INFO dspy.teleprompt.mipro_optimizer_v2: Returning best identified program with score 60.29!
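The "Avg Score" reported before each full evaluation appears to be the mean of a candidate's minibatch scores. The sketch below recomputes it from the minibatch trials visible in the log above (trials 14 to 22), assuming no earlier trial used the same parameter combinations. The helper `average_by_config` is a hypothetical name for illustration, not part of DSPy.

```python
from collections import defaultdict

# Minibatch trials 14-22 as (instruction, few-shot set, score), copied from the log above.
trials = [
    (3, 6, 57.0),
    (2, 6, 64.32),
    (2, 6, 64.34),
    (2, 6, 61.03),
    (2, 10, 61.93),
    (2, 7, 58.25),
    (5, 11, 54.05),
    (0, 6, 56.5),
]


def average_by_config(trials):
    """Return the mean minibatch score for each (instruction, few-shot set) pair."""
    scores = defaultdict(list)
    for instruction, fewshot_set, score in trials:
        scores[(instruction, fewshot_set)].append(score)
    return {config: sum(vals) / len(vals) for config, vals in scores.items()}


averages = average_by_config(trials)

# Print candidates from highest to lowest average minibatch score.
for config, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(config, round(avg, 2))
```

Under that assumption, the pair (Instruction 2, Few-Shot Set 6) averages 63.23 and (Instruction 2, Few-Shot Set 10) averages 61.93, which matches the values logged at trials 19 and 23.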





In [26]:
baseline = rag(question="cmd+tab does not work on hidden or minimized windows")
print(baseline.response)

You are correct that cmd+tab does not work on hidden or minimized windows. It is primarily used to switch between active applications. To manage minimized windows, you may need to use other key combinations or adjust your system settings.


In [27]:
pred = optimized_rag(question="cmd+tab does not work on hidden or minimized windows")
print(pred.response)

The Command + Tab functionality on Mac does not work with hidden or minimized windows directly. When you use Command + Tab, it allows you to cycle through your most recently used applications, but if an application is minimized, you cannot switch back to it using this shortcut unless you first switch to another application and let it take focus. This means that minimized windows will not be activated by Command + Tab until they are restored manually or until you switch focus away from them. If you want to manage minimized windows, you may need to use other shortcuts or methods, such as Command + Option + H + M to hide others and minimize the most recent item.


In [30]:
dspy.inspect_history(n=2)





[2026-02-13T13:47:25.196387]

System message:

Your input fields are:
1. `context` (str): 
2. `question` (str):
Your output fields are:
1. `reasoning` (str): 
2. `response` (str):
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## context ## ]]
{context}

[[ ## question ## ]]
{question}

[[ ## reasoning ## ]]
{reasoning}

[[ ## response ## ]]
{response}

[[ ## completed ## ]]
In adhering to this structure, your objective is: 
        Given the fields `context`, `question`, produce the fields `response`.


User message:

[[ ## context ## ]]
[1] «If you hold command and quickly tap tab you will cycle between your two most recently used applications without bringing up the heads up display. If you want to use a third party app Witch has options to ignore minimized windows.»
[2] «Try this: On your Mac, Navigate to System Preferences Go to Mission Control Uncheck When switching to an application, switch to a

In [31]:
evaluate(optimized_rag)

Average Metric: 183.67 / 300 (61.2%): 100%|██████████| 300/300 [02:44<00:00,  1.82it/s]

2026/02/13 14:00:57 INFO dspy.evaluate.evaluate: Average Metric: 183.66667203555758 / 300 (61.2%)





Unnamed: 0,question,example_response,gold_doc_ids,reasoning,pred_response,SemanticF1
0,"when to use c over c++, and c++ over c?","If you are equally familiar with both C++ and C, it's advisable to...",[733],The context provides insights into the strengths and weaknesses of...,Use C over C++ when working on embedded systems or projects where ...,✔️ [0.545]
1,should images be stored in a git repository?,"One viewpoint expresses that there is no significant downside, esp...","[6253, 6254, 6275, 6278, 8215]",The context presents various perspectives on whether images should...,Storing images in a Git repository is generally not recommended du...,✔️ [0.421]


61.22

In [32]:
cost = sum(x['cost'] for x in lm.history if x['cost'] is not None)  # in USD, as calculated by LiteLLM for certain providers

In [33]:
cost

1.8327576000000048
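The total above can hide how many calls actually reported a cost. The sketch below is a slightly more defensive summary, assuming each history entry is a dict that may or may not carry a `'cost'` key. The helper `summarize_cost` and the sample entries are hypothetical; only the `'cost'` field comes from the cell above.

```python
def summarize_cost(history):
    """Total the 'cost' field over history entries, skipping missing or None costs."""
    costs = [entry.get("cost") for entry in history]
    known = [c for c in costs if c is not None]
    return {
        "calls": len(history),
        "calls_with_cost": len(known),
        "total_usd": sum(known),
    }


# Made-up entries standing in for lm.history; real entries carry many more keys.
sample_history = [{"cost": 0.002}, {"cost": None}, {"cost": 0.001}, {}]
summary = summarize_cost(sample_history)
print(summary)
```

In a live session, `summarize_cost(lm.history)` should report the same total as the cell above, along with how many calls had no cost attached.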

In [34]:
lm.history

[{'prompt': None,
  'messages': [{'role': 'system',
    'content': 'Your input fields are:\n1. `context` (str): \n2. `question` (str):\nYour output fields are:\n1. `reasoning` (str): \n2. `response` (str):\nAll interactions will be structured in the following way, with the appropriate values filled in.\n\n[[ ## context ## ]]\n{context}\n\n[[ ## question ## ]]\n{question}\n\n[[ ## reasoning ## ]]\n{reasoning}\n\n[[ ## response ## ]]\n{response}\n\n[[ ## completed ## ]]\nIn adhering to this structure, your objective is: \n        Given the fields `context`, `question`, produce the fields `response`.'},
   {'role': 'user',
    'content': '[[ ## context ## ]]\n[1] «As far as I remember, High Memory is used for application space and Low Memory for the kernel. Advantage is that (user-space) applications cant access kernel-space memory.»\n[2] «HIGHMEM is a range of kernels memory space, but it is NOT memory you access but its a place where you put what you want to access. A typical 32bit Linu

In [36]:
optimized_rag.save("optimized_rag.json")

loaded_rag = RAG()
loaded_rag.load("optimized_rag.json")

loaded_rag(question="cmd+tab does not work on hidden or minimized windows")

Prediction(
    reasoning="The context provides various insights into how the Command + Tab functionality works on Mac systems, particularly regarding switching between applications. It highlights that Command + Tab allows users to cycle through their most recently used applications, but it does not activate minimized or hidden windows unless specific conditions are met. For instance, if an application is minimized, the user must first switch to another application and let it take focus before returning to the minimized app. This indicates that Command + Tab does not directly interact with minimized windows, which aligns with the user's observation that it does not work on hidden or minimized windows.",
    response='The Command + Tab functionality on Mac does not work with hidden or minimized windows directly. When you use Command + Tab, it allows you to cycle through your most recently used applications, but if an application is minimized, you cannot switch back to it using this shor

In [37]:
rag.save("rag.json")

Compare the saved prompts before and after optimization: rag.json holds the baseline program and optimized_rag.json holds the optimized one.
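One way to compare the two saved programs is a plain textual diff. The sketch below assumes only that both files are valid JSON, as their extension suggests, and makes no assumption about their internal structure. The helper `diff_json_files` is a hypothetical name, not a DSPy function.

```python
import difflib
import json
from pathlib import Path


def diff_json_files(path_a, path_b):
    """Return a unified diff of two JSON files after pretty-printing them with sorted keys."""

    def pretty(path):
        data = json.loads(Path(path).read_text(encoding="utf-8"))
        return json.dumps(data, indent=2, sort_keys=True, ensure_ascii=False).splitlines()

    return list(
        difflib.unified_diff(
            pretty(path_a),
            pretty(path_b),
            fromfile=str(path_a),
            tofile=str(path_b),
            lineterm="",
        )
    )


# Only runs the comparison if both files from the cells above are present.
if Path("rag.json").exists() and Path("optimized_rag.json").exists():
    print("\n".join(diff_json_files("rag.json", "optimized_rag.json")))
```

Lines prefixed with `-` come from the baseline file and lines prefixed with `+` from the optimized one, so changed instructions and added demonstrations should stand out.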
