<img src="../../docs/images/DSPy8.png" alt="DSPy8 Image" height="120"/>

### Multi-Agent DSPy Programs: Bootstrapping & Aggregating Multiple `ReAct` Agents

This is a quick (somewhat advanced) example of DSPy. Suppose you're given a hard QA task and an agent architecture (`dspy.ReAct`). How do you get high scores without tinkering with prompts?

There are many ways, but this notebook shows one complex strategy that DSPy makes near-trivial to achieve: we'll automatically bootstrap five different, highly effective prompts for ReAct, then optimize an aggregator that combines their powers.

As is usually the case with DSPy, the code to do this is probably shorter than describing it in English, so let's jump right into that.

### 0) TLDR.

We'll build a ReAct agent in DSPy that scores 30% accuracy on a retrieval-based question answering task.

Then, we'll optimize it with `BootstrapFewShotWithRandomSearch` to get 46% accuracy.

Then, we'll build a multi-agent aggregator over five different optimized versions of the agent.

Our unoptimized aggregator will score 26%. It doesn't understand the task. Hence, we'll optimize the aggregator too.

We'll end up with an optimized multi-agent system that scores a whopping 60% accuracy on the same task.

The core of the code for this fits in 10 lines of DSPy, but we'll sprinkle in some short explanations below.
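To make the aggregation idea concrete before any DSPy code, here is a minimal sketch of the simplest possible aggregator: a majority vote over the answers of several agents. This is a stand-in chosen for illustration, not the LM-based aggregator this notebook optimizes; `majority_vote` and the sample answers are hypothetical.

```python
from collections import Counter

def majority_vote(answers):
    """Return the most frequent answer after light normalization (hypothetical helper)."""
    normalized = [a.strip().lower() for a in answers if a and a.strip()]
    if not normalized:
        return None
    # most_common(1) returns [(answer, count)]; ties go to the answer seen first
    return Counter(normalized).most_common(1)[0][0]

# Five made-up agent outputs for a single question
candidate_answers = [
    "Townes Van Zandt",
    "townes van zandt",
    "Bob Dylan",
    "Townes Van Zandt ",
    "Guy Clark",
]
winner = majority_vote(candidate_answers)
```

A vote like this only helps when the agents disagree in useful ways, which is why the notebook bootstraps several differently prompted versions of the same agent.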

### 1) Setting Up.

We'll configure the language model (GPT-3.5) and the retrieval model (ColBERTv2 over Wikipedia).

In [1]:
import dspy
from dspy.evaluate import Evaluate
from dspy.datasets.hotpotqa import HotPotQA
from dspy.teleprompt import BootstrapFewShotWithRandomSearch

gpt3 = dspy.OpenAI('gpt-3.5-turbo-0125', max_tokens=1000)
colbert = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
dspy.configure(lm=gpt3, rm=colbert)



### 2) Loading some data.

We'll load 150 examples for training (`trainset`), 50 examples for validation & optimization (`valset`), and 300 examples for evaluation (`devset`).

In [2]:
dataset = HotPotQA(train_seed=1, train_size=200, eval_seed=2023, dev_size=300, test_size=0)
trainset = [x.with_inputs('question') for x in dataset.train[0:150]]
valset = [x.with_inputs('question') for x in dataset.train[150:200]]
devset = [x.with_inputs('question') for x in dataset.dev]

# show an example datapoint; it's just a question-answer pair
trainset[0]



Example({'question': 'At My Window was released by which American singer-songwriter?', 'answer': 'John Townes Van Zandt'}) (input_keys={'question'})
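The slicing in the cell above can be sanity-checked without downloading anything. The sketch below mimics the split on a stand-in list of 200 records (`fake_train` is hypothetical, not the HotPotQA loader) and confirms that the two slices have the expected sizes and do not overlap.

```python
# Stand-in for dataset.train: 200 fake question-answer records (hypothetical data)
fake_train = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(200)]

trainset = fake_train[0:150]    # used to bootstrap demonstrations
valset = fake_train[150:200]    # used to score candidate programs

# Any shared question would leak validation data into training
train_questions = {ex["question"] for ex in trainset}
val_questions = {ex["question"] for ex in valset}
overlap = train_questions & val_questions

print(len(trainset), len(valset), len(overlap))
```

Keeping the validation slice disjoint from the training slice matters here because the optimizer selects among candidate programs by their validation score.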

### 3) ReAct Agent.

Our agent will just be a DSPy ReAct agent that takes a `question` and outputs the `answer` by using a ColBERTv2 retrieval tool.

In [3]:
agent = dspy.ReAct("question -> answer", tools=[dspy.Retrieve(k=1)])
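For intuition about what a ReAct agent does at run time, here is a toy loop in plain Python. It is a conceptual sketch under loud assumptions: `toy_policy` and `toy_search` are hypothetical stand-ins for the language model and the ColBERTv2 retriever, and the real `dspy.ReAct` module handles prompting and parsing that are not shown here.

```python
def toy_search(query, corpus):
    """Hypothetical retriever: return the first passage sharing a word with the query."""
    words = set(query.lower().split())
    for passage in corpus:
        if words & set(passage.lower().split()):
            return passage
    return ""

def toy_policy(question, observations):
    """Hypothetical stand-in for the LM: search once, then answer from the observation."""
    if not observations:
        return ("Search", question)
    # Pretend the model extracts the answer as the text before " released"
    return ("Finish", observations[-1].split(" released")[0])

def react_loop(question, corpus, max_iters=5):
    """Alternate between choosing an action and observing its result."""
    observations = []
    for _ in range(max_iters):
        action, argument = toy_policy(question, observations)
        if action == "Finish":
            return argument
        observations.append(toy_search(argument, corpus))
    return None  # gave up without finishing

# One-passage toy corpus
corpus = ["Townes Van Zandt released At My Window in 1987."]
answer = react_loop("At My Window was released by which American singer-songwriter?", corpus)
```

The point of the sketch is the control flow: the agent decides on an action, receives an observation from the tool, and repeats until it chooses to finish.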

Let's evaluate this **unoptimized** ReAct agent on the `devset`.

In [5]:
# Set up an evaluator on the first 300 examples of the devset.
config = dict(num_threads=8, display_progress=True, display_table=5)
evaluate = Evaluate(devset=devset, metric=dspy.evaluate.answer_exact_match, **config)

evaluate(agent)



ValueError: not enough values to unpack (expected 2, got 1)
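As a rough picture of what the evaluator computes, the sketch below scores predictions with a simplified exact-match check and treats any example that raises an exception as incorrect. This is an assumption-laden illustration: `normalize`, `exact_match`, and `evaluate_program` are hypothetical helpers, and the normalization inside `dspy.evaluate.answer_exact_match` may differ in detail.

```python
import string

def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace (simplified normalization)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())

def exact_match(gold, predicted):
    return normalize(gold) == normalize(predicted)

def evaluate_program(program, examples):
    """Return accuracy in percent; an example that raises is scored as wrong."""
    correct = 0
    for question, gold in examples:
        try:
            correct += exact_match(gold, program(question))
        except Exception as err:
            print(f"Error for example: {err}")
    return 100.0 * correct / len(examples)

def flaky_program(question):
    # Hypothetical program: answers one question, crashes on another
    if "crash" in question:
        raise ValueError("not enough values to unpack (expected 2, got 1)")
    return "Paris." if "France" in question else "unknown"

examples = [
    ("Capital of France?", "paris"),
    ("crash please", "x"),
    ("Capital of Peru?", "Lima"),
]
accuracy = evaluate_program(flaky_program, examples)
```

Scoring a failed example as wrong keeps one malformed model output from aborting a whole evaluation run, at the cost of slightly understating accuracy.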

### 4) Optimized ReAct.

Let's use DSPy's simple `BootstrapFewShotWithRandomSearch` optimizer to create successful examples of the ReAct program and attempt to optimize the prompts using those constructed examples. In the future, we could try more sophisticated DSPy optimizers too, like `MIPRO`.

We'll bootstrap 20 programs that way. Demonstrations will be bootstrapped from the `trainset`, and each candidate program will be scored on our tiny `valset`. We'll evaluate later on the `devset`.
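The procedure can be pictured as a small random search: collect demonstrations from training examples the current program already answers correctly, sample a few per candidate, score each candidate on the validation set, and keep the best. The sketch below is a toy version of that idea, not DSPy's implementation; every name in it (`toy_program`, `bootstrap_demos`, `random_search`) is hypothetical.

```python
import random

def toy_program(demos, question):
    """Hypothetical stand-in for a prompted LM.

    It answers small questions unaided and larger ones only when a
    demonstration of the same 'kind' (question % 3) is in the prompt.
    """
    kinds_seen = {q % 3 for q, _ in demos}
    if question < 6 or question % 3 in kinds_seen:
        return question * 2   # the correct answer
    return -1                 # a wrong answer

def accuracy(demos, dataset):
    return sum(toy_program(demos, q) == a for q, a in dataset) / len(dataset)

def bootstrap_demos(trainset, max_demos, rng):
    """Collect up to max_demos training examples the demo-free program gets right."""
    shuffled = trainset[:]
    rng.shuffle(shuffled)
    demos = []
    for q, a in shuffled:
        if toy_program([], q) == a:
            demos.append((q, a))
        if len(demos) == max_demos:
            break
    return demos

def random_search(trainset, valset, num_candidates, max_demos):
    # Start from the zero-shot program, then try one demo set per seed
    best_demos, best_score = [], accuracy([], valset)
    for seed in range(num_candidates):
        demos = bootstrap_demos(trainset, max_demos, random.Random(seed))
        score = accuracy(demos, valset)
        if score > best_score:
            best_demos, best_score = demos, score
    return best_demos, best_score

trainset = [(q, 2 * q) for q in range(30)]
valset = [(q, 2 * q) for q in range(30, 60)]
baseline = accuracy([], valset)
best_demos, best_score = random_search(trainset, valset, num_candidates=8, max_demos=2)
```

Different seeds yield different demonstration sets, so the candidates differ in which validation questions they get right. That diversity is what the later aggregation step relies on.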

In [6]:
config = dict(max_bootstrapped_demos=2, max_labeled_demos=0, num_candidate_programs=20, num_threads=8)
tp = BootstrapFewShotWithRandomSearch(metric=dspy.evaluate.answer_exact_match, **config)
optimized_react = tp.compile(agent, trainset=trainset, valset=valset)

Going to sample between 1 and 2 traces per predictor.
Will attempt to train 20 candidate sets.



Error for example in dev set: 		 not enough values to unpack (expected 2, got 1)



Error for example in dev set: 		 not enough values to unpack (expected 2, got 1)



Average Metric: 15.0 / 50  (30.0): 100%|██████████| 50/50 [00:21<00:00,  2.35it/s]


Average Metric: 15.0 / 50  (30.0%)
Score: 30.0 for set: [0, 0, 0, 0, 0]
New best score: 30.0 for seed -3
Scores so far: [30.0]
Best score: 30.0
Error for example in dev set: 		 not enough values to unpack (expected 2, got 1)



Average Metric: 15.0 / 50  (30.0): 100%|██████████| 50/50 [00:00<00:00, 1628.27it/s]


Error for example in dev set: 		 not enough values to unpack (expected 2, got 1)
Average Metric: 15.0 / 50  (30.0%)
Score: 30.0 for set: [0, 0, 0, 0, 0]
Scores so far: [30.0, 30.0]
Best score: 30.0



  3%|▎         | 5/150 [00:17<08:28,  3.51s/it]


Bootstrapped 2 full traces after 6 examples in round 0.




Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}



Average Metric: 25 / 50  (50.0): 100%|██████████| 50/50 [01:54<00:00,  2.28s/it]


Average Metric: 25 / 50  (50.0%)
Score: 50.0 for set: [2, 2, 1, 0, 0]
New best score: 50.0 for seed -1
Scores so far: [30.0, 30.0, 50.0]
Best score: 50.0
Average of max per entry across top 1 scores: 0.5
Average of max per entry across top 2 scores: 0.62
Average of max per entry across top 3 scores: 0.72
Average of max per entry across top 5 scores: 0.72
Average of max per entry across top 8 scores: 0.72
Average of max per entry across top 9999 scores: 0.72
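The "Average of max per entry across top K scores" lines appear to report an oracle-style upper bound: for each validation example, take the best result among the top K candidate programs, then average over examples. A gap between this number and the best single score suggests the candidates succeed on different questions, which is what makes aggregation promising. The sketch below reproduces that calculation on made-up per-example results; the function name and data are hypothetical.

```python
def avg_of_max_per_entry(programs, k):
    """programs: list of (overall_score, per_example_scores); use the k best by overall score."""
    top_k = sorted(programs, key=lambda p: p[0], reverse=True)[:k]
    # Transpose so each entry holds one example's scores across the top-k programs
    per_example = zip(*(subscores for _, subscores in top_k))
    best_per_example = [max(entry) for entry in per_example]
    return sum(best_per_example) / len(best_per_example)

# Three made-up programs evaluated on four validation examples (1 = correct, 0 = wrong)
programs = [
    (50.0, [1, 1, 0, 0]),
    (50.0, [0, 1, 1, 0]),
    (25.0, [0, 0, 0, 1]),
]
top1 = avg_of_max_per_entry(programs, 1)
top2 = avg_of_max_per_entry(programs, 2)
top_all = avg_of_max_per_entry(programs, 9999)
```

In this toy data no single program exceeds 50%, yet a perfect selector over all three would answer every example correctly.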



  3%|▎         | 4/150 [00:12<07:26,  3.06s/it]


Bootstrapped 2 full traces after 5 examples in round 0.



Average Metric: 19 / 50  (38.0): 100%|██████████| 50/50 [01:46<00:00,  2.12s/it]


Average Metric: 19 / 50  (38.0%)
Score: 38.0 for set: [2, 2, 1, 0, 0]
Scores so far: [30.0, 30.0, 50.0, 38.0]
Best score: 50.0
Average of max per entry across top 1 scores: 0.5
Average of max per entry across top 2 scores: 0.68
Average of max per entry across top 3 scores: 0.76
Average of max per entry across top 5 scores: 0.78
Average of max per entry across top 8 scores: 0.78
Average of max per entry across top 9999 scores: 0.78



  3%|▎         | 4/150 [00:15<09:31,  3.91s/it]


Bootstrapped 1 full traces after 5 examples in round 0.



Average Metric: 20 / 50  (40.0): 100%|██████████| 50/50 [00:45<00:00,  1.11it/s]


Average Metric: 20 / 50  (40.0%)
Score: 40.0 for set: [1, 1, 1, 0, 0]
Scores so far: [30.0, 30.0, 50.0, 38.0, 40.0]
Best score: 50.0
Average of max per entry across top 1 scores: 0.5
Average of max per entry across top 2 scores: 0.78
Average of max per entry across top 3 scores: 0.86
Average of max per entry across top 5 scores: 0.9
Average of max per entry across top 8 scores: 0.9
Average of max per entry across top 9999 scores: 0.9



  1%|▏         | 2/150 [00:01<02:15,  1.09it/s]


Bootstrapped 1 full traces after 3 examples in round 0.






[A
[A
[A
[A

Backing off 9.0 seconds after 6 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 31.7 seconds after 6 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}



[A
[A

Backing off 13.7 seconds after 5 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.2 seconds after 5 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}



[A
[A

Backing off 30.5 seconds after 6 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}



[A
[A
[A
[A
[A
[A
[A
[A

Backing off 46.2 seconds after 7 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}



[A
[A

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}



[A
[A
[A

Backing off 0.1 seconds after 6 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}



[A
[A

Backing off 62.9 seconds after 7 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}



[A
[A
[A
[A

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 46.0 seconds after 7 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}



[A
[A
[A
[A
[A
[A
[A
[A
[A
[A
[A
[A
[A
Average Metric: 14 / 50  (28.0): 100%|██████████| 50/50 [02:05<00:00,  2.52s/it]


Average Metric: 14 / 50  (28.0%)
Score: 28.0 for set: [1, 1, 0, 0, 0]
Scores so far: [30.0, 30.0, 50.0, 38.0, 40.0, 28.0]
Best score: 50.0
Average of max per entry across top 1 scores: 0.5
Average of max per entry across top 2 scores: 0.78
Average of max per entry across top 3 scores: 0.86
Average of max per entry across top 5 scores: 0.9
Average of max per entry across top 8 scores: 0.9
Average of max per entry across top 9999 scores: 0.9



  3%|▎         | 4/150 [00:13<08:12,  3.38s/it]


Bootstrapped 1 full traces after 5 examples in round 0.

Error for example in dev set: 		 HTTPConnectionPool(host='20.102.90.50', port=2017): Read timed out. (read timeout=10)



Average Metric: 22.0 / 50  (44.0): 100%|██████████| 50/50 [00:24<00:00,  2.01it/s]


Average Metric: 22.0 / 50  (44.0%)
Score: 44.0 for set: [1, 1, 1, 0, 0]
Scores so far: [30.0, 30.0, 50.0, 38.0, 40.0, 28.0, 44.0]
Best score: 50.0
Average of max per entry across top 1 scores: 0.5
Average of max per entry across top 2 scores: 0.7
Average of max per entry across top 3 scores: 0.86
Average of max per entry across top 5 scores: 0.92
Average of max per entry across top 8 scores: 0.92
Average of max per entry across top 9999 scores: 0.92



  4%|▍         | 6/150 [00:16<06:35,  2.75s/it]


Bootstrapped 1 full traces after 7 examples in round 0.

Average Metric: 18 / 50  (36.0): 100%|██████████| 50/50 [01:55<00:00,  2.32s/it]


Average Metric: 18 / 50  (36.0%)
Score: 36.0 for set: [1, 1, 0, 0, 0]
Scores so far: [30.0, 30.0, 50.0, 38.0, 40.0, 28.0, 44.0, 36.0]
Best score: 50.0
Average of max per entry across top 1 scores: 0.5
Average of max per entry across top 2 scores: 0.7
Average of max per entry across top 3 scores: 0.86
Average of max per entry across top 5 scores: 0.94
Average of max per entry across top 8 scores: 0.96
Average of max per entry across top 9999 scores: 0.96




Failed to run or to evaluate example Example({'question': 'Are Princess Sumaya University for Technology and Tennessee Technological University from the same country?', 'answer': 'no'}) (input_keys={'question'}) with <function answer_exact_match at 0x76ac97b17a60> due to not enough values to unpack (expected 2, got 1).



  5%|▍         | 7/150 [00:24<08:24,  3.52s/it]


Bootstrapped 2 full traces after 8 examples in round 0.

Average Metric: 21 / 50  (42.0): 100%|██████████| 50/50 [00:57<00:00,  1.15s/it]


Average Metric: 21 / 50  (42.0%)
Score: 42.0 for set: [2, 2, 1, 0, 0]
Scores so far: [30.0, 30.0, 50.0, 38.0, 40.0, 28.0, 44.0, 36.0, 42.0]
Best score: 50.0
Average of max per entry across top 1 scores: 0.5
Average of max per entry across top 2 scores: 0.7
Average of max per entry across top 3 scores: 0.86
Average of max per entry across top 5 scores: 0.94
Average of max per entry across top 8 scores: 0.96
Average of max per entry across top 9999 scores: 0.96




Failed to run or to evaluate example Example({'question': 'Michael Giacchino composed the scores to many films such as a 2015 computer-animated film produced by what studio?', 'answer': 'Pixar Animation Studios'}) (input_keys={'question'}) with <function answer_exact_match at 0x76ac97b17a60> due to not enough values to unpack (expected 2, got 1).



  8%|▊         | 12/150 [00:37<07:15,  3.16s/it]


Bootstrapped 1 full traces after 13 examples in round 0.


Average Metric: 17 / 50  (34.0): 100%|██████████| 50/50 [01:08<00:00,  1.37s/it]


Average Metric: 17 / 50  (34.0%)
Score: 34.0 for set: [1, 1, 0, 0, 0]
Scores so far: [30.0, 30.0, 50.0, 38.0, 40.0, 28.0, 44.0, 36.0, 42.0, 34.0]
Best score: 50.0
Average of max per entry across top 1 scores: 0.5
Average of max per entry across top 2 scores: 0.7
Average of max per entry across top 3 scores: 0.86
Average of max per entry across top 5 scores: 0.94
Average of max per entry across top 8 scores: 0.98
Average of max per entry across top 9999 scores: 0.98


  5%|▍         | 7/150 [00:16<05:28,  2.29s/it]


Bootstrapped 2 full traces after 8 examples in round 0.


Backing off 6.3 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 9 / 16  (56.2):  32%|███▏      | 16/50 [00:17<01:15,  2.22s/it]

Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.7 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.9 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 9 / 17  (52.9):  34%|███▍      | 17/50 [00:18<01:05,  1.99s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 1.2 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.2 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 2.8 seconds after 5 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 10 / 18  (55.6):  36%|███▌      | 18/50 [00:21<01:08,  2.13s/it]

Backing off 0.5 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 4.1 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.0 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 10 / 19  (52.6):  38%|███▊      | 19/50 [00:22<00:57,  1.86s/it]

Backing off 14.4 seconds after 5 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.3 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 10 / 20  (50.0):  40%|████      | 20/50 [00:23<00:50,  1.68s/it]

Backing off 1.3 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 1.6 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 12 / 22  (54.5):  44%|████▍     | 22/50 [00:26<00:41,  1.48s/it]

Backing off 4.4 seconds after 5 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 1.5 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 5.9 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 12 / 23  (52.2):  46%|████▌     | 23/50 [00:28<00:46,  1.72s/it]

Backing off 0.9 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.8 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.0 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 12 / 24  (50.0):  48%|████▊     | 24/50 [00:31<00:57,  2.21s/it]

Backing off 3.0 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 6.2 seconds after 6 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 12 / 25  (48.0):  50%|█████     | 25/50 [00:34<00:55,  2.23s/it]

Backing off 3.9 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 14.4 seconds after 5 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 12 / 26  (46.2):  52%|█████▏    | 26/50 [00:36<00:53,  2.24s/it]

Backing off 2.0 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 12 / 27  (44.4):  54%|█████▍    | 27/50 [00:37<00:45,  2.00s/it]

Backing off 1.0 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 12 / 28  (42.9):  56%|█████▌    | 28/50 [00:39<00:44,  2.04s/it]

Backing off 21.7 seconds after 6 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 4.1 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 13 / 29  (44.8):  58%|█████▊    | 29/50 [00:41<00:40,  1.92s/it]

Backing off 1.1 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 2.2 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.8 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 27.0 seconds after 7 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 13 / 30  (43.3):  60%|██████    | 30/50 [00:43<00:41,  2.09s/it]

Backing off 0.4 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.5 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 4.1 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 13 / 32  (40.6):  64%|██████▍   | 32/50 [00:48<00:38,  2.15s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 14.8 seconds after 5 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 7.2 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 13 / 34  (38.2):  68%|██████▊   | 34/50 [00:50<00:24,  1.51s/it]

Backing off 0.6 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 10.4 seconds after 6 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 14 / 35  (40.0):  70%|███████   | 35/50 [00:53<00:30,  2.01s/it]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.2 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 14 / 36  (38.9):  72%|███████▏  | 36/50 [00:58<00:41,  2.97s/it]

Backing off 1.0 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.9 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 15 / 37  (40.5):  74%|███████▍  | 37/50 [01:00<00:32,  2.51s/it]

Backing off 0.9 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.5 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.1 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 15 / 38  (39.5):  76%|███████▌  | 38/50 [01:02<00:30,  2.53s/it]

Backing off 6.4 seconds after 7 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 15 / 39  (38.5):  78%|███████▊  | 39/50 [01:06<00:29,  2.72s/it]

Backing off 45.0 seconds after 7 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 5.4 seconds after 6 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 7.9 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 15 / 40  (37.5):  80%|████████  | 40/50 [01:08<00:26,  2.61s/it]

Backing off 1.9 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 16 / 41  (39.0):  82%|████████▏ | 41/50 [01:12<00:27,  3.08s/it]

Backing off 0.1 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 50.5 seconds after 8 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 17 / 42  (40.5):  84%|████████▍ | 42/50 [01:13<00:20,  2.51s/it]

Backing off 0.8 seconds after 8 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 17 / 43  (39.5):  86%|████████▌ | 43/50 [01:15<00:15,  2.18s/it]

Backing off 1.5 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 18 / 44  (40.9):  88%|████████▊ | 44/50 [01:15<00:10,  1.71s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 19 / 45  (42.2):  90%|█████████ | 45/50 [01:17<00:09,  1.81s/it]

Backing off 3.8 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 191.5 seconds after 9 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 21 / 50  (42.0): 100%|██████████| 50/50 [04:30<00:00,  5.42s/it]


Average Metric: 21 / 50  (42.0%)
Score: 42.0 for set: [2, 2, 1, 0, 0]
Scores so far: [30.0, 30.0, 50.0, 38.0, 40.0, 28.0, 44.0, 36.0, 42.0, 34.0, 42.0]
Best score: 50.0
Average of max per entry across top 1 scores: 0.5
Average of max per entry across top 2 scores: 0.7
Average of max per entry across top 3 scores: 0.86
Average of max per entry across top 5 scores: 0.94
Average of max per entry across top 8 scores: 0.98
Average of max per entry across top 9999 scores: 0.98
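
The "Backing off" lines are retry messages: a request that fails (most likely on a rate limit) is retried after a randomized wait that grows with the number of tries. The waits in this log appear consistent with exponential backoff with full jitter, since each wait stays at or below 2^(tries - 1) seconds (for example, 191.5 seconds after 9 tries, under the 256-second ceiling). The sketch below illustrates that policy using only the standard library. `retry_with_backoff` and its parameters are hypothetical names chosen for illustration and are not the client's actual implementation.

```python
import random
import time


def retry_with_backoff(fn, max_tries=8, base=1.0, factor=2.0,
                       sleep=time.sleep, rng=random.random):
    """Call fn(), retrying failures with exponential backoff and full jitter."""
    for tries in range(1, max_tries + 1):
        try:
            return fn()
        except Exception:  # assumption: real code would catch only rate-limit errors
            if tries == max_tries:
                raise
            # Full jitter: wait a uniform random time in [0, base * factor**(tries - 1)).
            wait = rng() * base * factor ** (tries - 1)
            print(f"Backing off {wait:0.1f} seconds after {tries} tries")
            sleep(wait)


# Demo: a call that fails twice and then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"


waits = []  # record the waits instead of actually sleeping
result = retry_with_backoff(flaky, sleep=waits.append)
print(result, waits)
```

Randomizing the wait keeps parallel evaluation threads from retrying in lockstep. That matters here because many examples are evaluated concurrently against the same rate limit.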


  1%|          | 1/150 [00:00<00:00, 1775.74it/s]


Bootstrapped 1 full traces after 2 examples in round 0.


Average Metric: 22 / 50  (44.0): 100%|██████████| 50/50 [00:02<00:00, 24.89it/s]  


Average Metric: 22 / 50  (44.0%)
Score: 44.0 for set: [1, 1, 1, 0, 0]
Scores so far: [30.0, 30.0, 50.0, 38.0, 40.0, 28.0, 44.0, 36.0, 42.0, 34.0, 42.0, 44.0]
Best score: 50.0
Average of max per entry across top 1 scores: 0.5
Average of max per entry across top 2 scores: 0.7
Average of max per entry across top 3 scores: 0.86
Average of max per entry across top 5 scores: 0.92
Average of max per entry across top 8 scores: 0.98
Average of max per entry across top 9999 scores: 0.98


  0%|          | 0/150 [00:00<?, ?it/s]

Failed to run or to evaluate example Example({'question': 'Are Princess Sumaya University for Technology and Tennessee Technological University from the same country?', 'answer': 'no'}) (input_keys={'question'}) with <function answer_exact_match at 0x76ac97b17a60> due to not enough values to unpack (expected 2, got 1).


  5%|▍         | 7/150 [00:11<03:49,  1.61s/it]


Bootstrapped 2 full traces after 8 examples in round 0.


Average Metric: 13 / 36  (36.1):  72%|███████▏  | 36/50 [00:13<00:04,  3.38it/s]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 13 / 37  (35.1):  74%|███████▍  | 37/50 [00:14<00:06,  2.01it/s]

Backing off 1.8 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 14 / 38  (36.8):  76%|███████▌  | 38/50 [00:16<00:08,  1.46it/s]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 1.8 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 14 / 39  (35.9):  78%|███████▊  | 39/50 [00:17<00:08,  1.23it/s]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 1.5 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 14 / 40  (35.0):  80%|████████  | 40/50 [00:19<00:11,  1.20s/it]

Backing off 0.5 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 1.1 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 1.1 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 2.0 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 2.3 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 1.4 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kw

Average Metric: 15 / 41  (36.6):  82%|████████▏ | 41/50 [00:22<00:15,  1.73s/it]

Backing off 7.4 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 1.3 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 15 / 42  (35.7):  84%|████████▍ | 42/50 [00:24<00:15,  1.88s/it]

Backing off 0.5 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 3.8 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.6 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 16 / 43  (37.2):  86%|████████▌ | 43/50 [00:26<00:12,  1.74s/it]

Backing off 0.2 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 9.8 seconds after 5 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 16 / 44  (36.4):  88%|████████▊ | 44/50 [00:27<00:10,  1.72s/it]

Backing off 0.7 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 3.6 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 17 / 47  (36.2):  94%|█████████▍| 47/50 [00:32<00:04,  1.49s/it]

Backing off 10.7 seconds after 5 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 19 / 50  (38.0): 100%|██████████| 50/50 [00:44<00:00,  1.11it/s]


Average Metric: 19 / 50  (38.0%)
Score: 38.0 for set: [2, 2, 1, 0, 0]
Scores so far: [30.0, 30.0, 50.0, 38.0, 40.0, 28.0, 44.0, 36.0, 42.0, 34.0, 42.0, 44.0, 38.0]
Best score: 50.0
Average of max per entry across top 1 scores: 0.5
Average of max per entry across top 2 scores: 0.7
Average of max per entry across top 3 scores: 0.86
Average of max per entry across top 5 scores: 0.92
Average of max per entry across top 8 scores: 0.98
Average of max per entry across top 9999 scores: 0.98


  1%|▏         | 2/150 [00:07<09:27,  3.83s/it]


Bootstrapped 1 full traces after 3 examples in round 0.


Average Metric: 4 / 6  (66.7):  10%|█         | 5/50 [00:03<00:25,  1.78it/s] 

Backing off 1.0 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 6 / 9  (66.7):  18%|█▊        | 9/50 [00:06<00:30,  1.35it/s]

Backing off 1.7 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.9 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.9 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 6 / 10  (60.0):  20%|██        | 10/50 [00:08<00:37,  1.06it/s]

Backing off 0.9 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 6 / 11  (54.5):  22%|██▏       | 11/50 [00:08<00:34,  1.13it/s]

Backing off 1.8 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 1.4 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 7 / 12  (58.3):  24%|██▍       | 12/50 [00:11<00:49,  1.31s/it]

Backing off 1.3 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 2.1 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 7 / 13  (53.8):  26%|██▌       | 13/50 [00:12<00:50,  1.35s/it]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 8 / 14  (57.1):  28%|██▊       | 14/50 [00:14<00:50,  1.42s/it]

Backing off 3.7 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 1.0 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 8 / 15  (53.3):  30%|███       | 15/50 [00:16<00:54,  1.57s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 1.7 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 1.2 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 9 / 16  (56.2):  32%|███▏      | 16/50 [00:18<01:01,  1.80s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 9 / 17  (52.9):  34%|███▍      | 17/50 [00:19<00:49,  1.51s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.6 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 2.9 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 6.7 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 6.7 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 10 / 18  (55.6):  36%|███▌      | 18/50 [00:21<00:57,  1.78s/it]

Backing off 1.5 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 1.2 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.0 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 11 / 20  (55.0):  40%|████      | 20/50 [00:24<00:50,  1.69s/it]

Backing off 4.1 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 2.1 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 11 / 21  (52.4):  42%|████▏     | 21/50 [00:26<00:51,  1.76s/it]

Backing off 3.9 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 15.4 seconds after 5 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 11 / 22  (50.0):  44%|████▍     | 22/50 [00:28<00:50,  1.79s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 11 / 23  (47.8):  46%|████▌     | 23/50 [00:30<00:49,  1.84s/it]

Backing off 2.7 seconds after 5 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 8.9 seconds after 5 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 6.2 seconds after 5 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 1.5 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 11 / 24  (45.8):  48%|████▊     | 24/50 [00:33<00:57,  2.21s/it]

Backing off 7.3 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 12 / 25  (48.0):  50%|█████     | 25/50 [00:36<01:00,  2.42s/it]

Backing off 0.2 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 12 / 26  (46.2):  52%|█████▏    | 26/50 [00:39<01:02,  2.60s/it]

Backing off 30.0 seconds after 6 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 12 / 27  (44.4):  54%|█████▍    | 27/50 [00:40<00:46,  2.00s/it]

Backing off 2.7 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 31.5 seconds after 6 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 13 / 29  (44.8):  58%|█████▊    | 29/50 [00:44<00:42,  2.05s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 12.8 seconds after 5 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 14.2 seconds after 6 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.1 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 1.9 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 14 / 30  (46.7):  60%|██████    | 30/50 [00:47<00:43,  2.18s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 15 / 32  (46.9):  64%|██████▍   | 32/50 [00:50<00:33,  1.88s/it]

Backing off 0.3 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 15 / 33  (45.5):  66%|██████▌   | 33/50 [00:51<00:30,  1.80s/it]

Backing off 1.3 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 15 / 34  (44.1):  68%|██████▊   | 34/50 [00:54<00:34,  2.13s/it]

Backing off 1.0 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 16 / 35  (45.7):  70%|███████   | 35/50 [00:57<00:35,  2.35s/it]

Backing off 1.6 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 16 / 36  (44.4):  72%|███████▏  | 36/50 [00:58<00:24,  1.74s/it]

Backing off 0.1 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 16 / 37  (43.2):  74%|███████▍  | 37/50 [01:00<00:26,  2.03s/it]

Backing off 2.6 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 11.9 seconds after 6 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 21.7 seconds after 7 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 16 / 38  (42.1):  76%|███████▌  | 38/50 [01:01<00:21,  1.78s/it]

Backing off 11.6 seconds after 5 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 16 / 40  (40.0):  80%|████████  | 40/50 [01:05<00:17,  1.70s/it]

Backing off 3.2 seconds after 5 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 17 / 43  (39.5):  86%|████████▌ | 43/50 [01:11<00:12,  1.79s/it]

Backing off 57.7 seconds after 7 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.9 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 19 / 45  (42.2):  90%|█████████ | 45/50 [01:14<00:09,  1.81s/it]

Backing off 0.4 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 19 / 47  (40.4):  94%|█████████▍| 47/50 [01:16<00:04,  1.35s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 20 / 50  (40.0): 100%|██████████| 50/50 [02:16<00:00,  2.74s/it]


Average Metric: 20 / 50  (40.0%)
Score: 40.0 for set: [1, 1, 1, 0, 0]
Scores so far: [30.0, 30.0, 50.0, 38.0, 40.0, 28.0, 44.0, 36.0, 42.0, 34.0, 42.0, 44.0, 38.0, 40.0]
Best score: 50.0
Average of max per entry across top 1 scores: 0.5
Average of max per entry across top 2 scores: 0.7
Average of max per entry across top 3 scores: 0.86
Average of max per entry across top 5 scores: 0.92
Average of max per entry across top 8 scores: 0.98
Average of max per entry across top 9999 scores: 0.98


  2%|▏         | 3/150 [00:06<05:19,  2.18s/it]

Failed to run or to evaluate example Example({'question': 'Are Princess Sumaya University for Technology and Tennessee Technological University from the same country?', 'answer': 'no'}) (input_keys={'question'}) with <function answer_exact_match at 0x76ac97b17a60> due to not enough values to unpack (expected 2, got 1).


  7%|▋         | 10/150 [00:16<03:27,  1.48s/it]

Failed to run or to evaluate example Example({'question': 'Michael Giacchino composed the scores to many films such as a 2015 computer-animated film produced by what studio?', 'answer': 'Pixar Animation Studios'}) (input_keys={'question'}) with <function answer_exact_match at 0x76ac97b17a60> due to not enough values to unpack (expected 2, got 1).


  8%|▊         | 12/150 [00:18<03:31,  1.53s/it]


Bootstrapped 2 full traces after 13 examples in round 0.
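
The "not enough values to unpack (expected 2, got 1)" failures are the error Python raises when a two-target assignment receives a one-element sequence. One plausible source, stated here as an assumption rather than a confirmed diagnosis, is splitting a ReAct action of the form `Search[query]` on `[` when the model emitted an action with no bracket. The sketch below reproduces the error and shows a tolerant alternative. `parse_action` is a hypothetical helper, not part of DSPy.

```python
def parse_action(action):
    """Split 'Name[argument]' into (name, argument); return None if malformed.

    Hypothetical helper for illustration only.
    """
    first_line = action.strip().split("\n")[0]
    name, sep, rest = first_line.partition("[")
    if not sep:
        # No '[' found: report the malformed action instead of raising.
        return None
    # Drop the closing bracket if one is present.
    return name.strip(), rest.rsplit("]", 1)[0]


# Strict unpacking fails when the bracket is missing.
try:
    name, value = "Finish no".split("[", 1)
except ValueError as err:
    print(err)

print(parse_action("Search[Tennessee Technological University]"))
print(parse_action("Finish no"))
```

As the progress bars above show, bootstrapping skips an example that fails this way and moves on to the next one, so these messages cost a training example but do not stop the run.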


Average Metric: 12 / 50  (24.0): 100%|██████████| 50/50 [00:29<00:00,  1.67it/s]


Average Metric: 12 / 50  (24.0%)
Score: 24.0 for set: [2, 1, 0, 0, 0]
Scores so far: [30.0, 30.0, 50.0, 38.0, 40.0, 28.0, 44.0, 36.0, 42.0, 34.0, 42.0, 44.0, 38.0, 40.0, 24.0]
Best score: 50.0
Average of max per entry across top 1 scores: 0.5
Average of max per entry across top 2 scores: 0.7
Average of max per entry across top 3 scores: 0.86
Average of max per entry across top 5 scores: 0.92
Average of max per entry across top 8 scores: 0.98
Average of max per entry across top 9999 scores: 0.98


  1%|▏         | 2/150 [00:09<10:44,  4.35s/it]

Failed to run or to evaluate example Example({'question': 'Are Princess Sumaya University for Technology and Tennessee Technological University from the same country?', 'answer': 'no'}) (input_keys={'question'}) with <function answer_exact_match at 0x76ac97b17a60> due to not enough values to unpack (expected 2, got 1).


 11%|█▏        | 17/150 [00:33<04:24,  1.99s/it]


Bootstrapped 2 full traces after 18 examples in round 0.


Average Metric: 19 / 50  (38.0): 100%|██████████| 50/50 [01:34<00:00,  1.90s/it]


Average Metric: 19 / 50  (38.0%)
Score: 38.0 for set: [2, 2, 1, 0, 0]
Scores so far: [30.0, 30.0, 50.0, 38.0, 40.0, 28.0, 44.0, 36.0, 42.0, 34.0, 42.0, 44.0, 38.0, 40.0, 24.0, 38.0]
Best score: 50.0
Average of max per entry across top 1 scores: 0.5
Average of max per entry across top 2 scores: 0.7
Average of max per entry across top 3 scores: 0.86
Average of max per entry across top 5 scores: 0.92
Average of max per entry across top 8 scores: 0.98
Average of max per entry across top 9999 scores: 1.0


  5%|▌         | 8/150 [00:09<02:47,  1.18s/it]


Bootstrapped 2 full traces after 9 examples in round 0.


Average Metric: 18 / 50  (36.0): 100%|██████████| 50/50 [00:52<00:00,  1.06s/it]


Average Metric: 18 / 50  (36.0%)
Score: 36.0 for set: [2, 2, 0, 0, 0]
Scores so far: [30.0, 30.0, 50.0, 38.0, 40.0, 28.0, 44.0, 36.0, 42.0, 34.0, 42.0, 44.0, 38.0, 40.0, 24.0, 38.0, 36.0]
Best score: 50.0
Average of max per entry across top 1 scores: 0.5
Average of max per entry across top 2 scores: 0.7
Average of max per entry across top 3 scores: 0.86
Average of max per entry across top 5 scores: 0.92
Average of max per entry across top 8 scores: 0.98
Average of max per entry across top 9999 scores: 1.0


  1%|▏         | 2/150 [00:05<06:40,  2.71s/it]


Bootstrapped 1 full traces after 3 examples in round 0.


Average Metric: 22 / 50  (44.0): 100%|██████████| 50/50 [00:00<00:00, 4995.48it/s]


Average Metric: 22 / 50  (44.0%)
Score: 44.0 for set: [1, 1, 1, 0, 0]
Scores so far: [30.0, 30.0, 50.0, 38.0, 40.0, 28.0, 44.0, 36.0, 42.0, 34.0, 42.0, 44.0, 38.0, 40.0, 24.0, 38.0, 36.0, 44.0]
Best score: 50.0
Average of max per entry across top 1 scores: 0.5
Average of max per entry across top 2 scores: 0.7
Average of max per entry across top 3 scores: 0.86
Average of max per entry across top 5 scores: 0.94
Average of max per entry across top 8 scores: 0.98
Average of max per entry across top 9999 scores: 1.0


  1%|          | 1/150 [00:00<00:00, 1997.29it/s]


Bootstrapped 1 full traces after 2 examples in round 0.


Average Metric: 19 / 50  (38.0): 100%|██████████| 50/50 [01:12<00:00,  1.44s/it]


Average Metric: 19 / 50  (38.0%)
Score: 38.0 for set: [1, 1, 0, 0, 0]
Scores so far: [30.0, 30.0, 50.0, 38.0, 40.0, 28.0, 44.0, 36.0, 42.0, 34.0, 42.0, 44.0, 38.0, 40.0, 24.0, 38.0, 36.0, 44.0, 38.0]
Best score: 50.0
Average of max per entry across top 1 scores: 0.5
Average of max per entry across top 2 scores: 0.7
Average of max per entry across top 3 scores: 0.86
Average of max per entry across top 5 scores: 0.94
Average of max per entry across top 8 scores: 0.98
Average of max per entry across top 9999 scores: 1.0


  3%|▎         | 4/150 [00:12<07:27,  3.07s/it]


Bootstrapped 2 full traces after 5 examples in round 0.



Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 1.3 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 14 / 35  (40.0):  70%|███████   | 35/50 [00:28<00:21,  1.46s/it]

Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 14 / 36  (38.9):  72%|███████▏  | 36/50 [00:29<00:19,  1.42s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.3 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.3 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.3 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 3.3 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 4.7 seconds after 5 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 9.9 seconds after 6 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 1.0 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kw

Average Metric: 14 / 37  (37.8):  74%|███████▍  | 37/50 [00:39<00:52,  4.07s/it]

Backing off 1.3 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 1.7 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.5 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 14 / 38  (36.8):  76%|███████▌  | 38/50 [00:43<00:47,  3.98s/it]

Backing off 3.1 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 3.2 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.2 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 14 / 39  (35.9):  78%|███████▊  | 39/50 [00:45<00:35,  3.20s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 14 / 40  (35.0):  80%|████████  | 40/50 [00:46<00:27,  2.73s/it]

Backing off 0.4 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 46.3 seconds after 7 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 14 / 41  (34.1):  82%|████████▏ | 41/50 [00:47<00:20,  2.25s/it]

Backing off 4.0 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 15 / 42  (35.7):  84%|████████▍ | 42/50 [00:49<00:15,  1.97s/it]

Backing off 6.0 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 15 / 50  (30.0): 100%|██████████| 50/50 [01:34<00:00,  1.90s/it]


Average Metric: 15 / 50  (30.0%)
Score: 30.0 for set: [2, 2, 0, 0, 0]
Scores so far: [30.0, 30.0, 50.0, 38.0, 40.0, 28.0, 44.0, 36.0, 42.0, 34.0, 42.0, 44.0, 38.0, 40.0, 24.0, 38.0, 36.0, 44.0, 38.0, 30.0]
Best score: 50.0
Average of max per entry across top 1 scores: 0.5
Average of max per entry across top 2 scores: 0.7
Average of max per entry across top 3 scores: 0.86
Average of max per entry across top 5 scores: 0.94
Average of max per entry across top 8 scores: 0.98
Average of max per entry across top 9999 scores: 1.0


  3%|▎         | 4/150 [00:13<08:05,  3.33s/it]


Bootstrapped 2 full traces after 5 examples in round 0.


Average Metric: 12 / 23  (52.2):  46%|████▌     | 23/50 [00:10<00:14,  1.83it/s]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 1.3 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.9 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 12 / 24  (50.0):  48%|████▊     | 24/50 [00:16<00:57,  2.23s/it]

Backing off 0.7 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 2.6 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.9 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.7 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 12 / 25  (48.0):  50%|█████     | 25/50 [00:21<01:12,  2.92s/it]

Backing off 1.6 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.7 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 2.0 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 3.1 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 2.0 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 1.6 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 2.4 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.9 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kw

Average Metric: 12.0 / 26  (46.2):  52%|█████▏    | 26/50 [00:29<01:51,  4.66s/it]

Error for example in dev set: 		 not enough values to unpack (expected 2, got 1)
Backing off 0.6 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 1.7 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 1.0 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 2.6 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.6 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 12.0 / 28  (42.9):  56%|█████▌    | 28/50 [00:33<01:10,  3.21s/it]

Backing off 3.5 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 2.2 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 12.0 / 29  (41.4):  58%|█████▊    | 29/50 [00:35<01:01,  2.93s/it]

Backing off 1.3 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 12.0 / 30  (40.0):  60%|██████    | 30/50 [00:37<00:51,  2.57s/it]

Backing off 10.6 seconds after 5 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 2.1 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 12.0 / 31  (38.7):  62%|██████▏   | 31/50 [00:38<00:37,  1.99s/it]

Backing off 0.6 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 8.9 seconds after 5 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 11.6 seconds after 5 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 3.9 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 13.0 / 32  (40.6):  64%|██████▍   | 32/50 [00:43<00:54,  3.04s/it]

Backing off 7.9 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 2.4 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 1.6 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 13.0 / 34  (38.2):  68%|██████▊   | 34/50 [00:47<00:38,  2.40s/it]

Backing off 0.2 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 1.3 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 23.2 seconds after 6 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 29.6 seconds after 6 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 7.3 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 12.9 seconds after 5 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 3.9 seconds after 5 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 13.0 / 35  (37.1):  70%|███████   | 35/50 [00:55<01:02,  4.13s/it]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 1.9 seconds after 6 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 1.0 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 13.0 / 36  (36.1):  72%|███████▏  | 36/50 [01:01<01:03,  4.55s/it]

Error for example in dev set: 		 not enough values to unpack (expected 2, got 1)


Average Metric: 14.0 / 37  (37.8):  74%|███████▍  | 37/50 [01:02<00:47,  3.62s/it]

Backing off 19.4 seconds after 6 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 2.0 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 14.0 / 38  (36.8):  76%|███████▌  | 38/50 [01:05<00:40,  3.40s/it]

Backing off 1.4 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 7.8 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 23.4 seconds after 6 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 3.7 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 14.0 / 39  (35.9):  78%|███████▊  | 39/50 [01:12<00:50,  4.58s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 15.0 / 40  (37.5):  80%|████████  | 40/50 [01:15<00:39,  3.99s/it]

Backing off 20.1 seconds after 7 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 16.0 / 41  (39.0):  82%|████████▏ | 41/50 [01:18<00:33,  3.73s/it]

Backing off 1.0 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 16.0 / 42  (38.1):  84%|████████▍ | 42/50 [01:19<00:24,  3.05s/it]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 16.0 / 44  (36.4):  88%|████████▊ | 44/50 [01:24<00:15,  2.59s/it]

Backing off 44.4 seconds after 7 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 56.8 seconds after 7 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 20.0 / 50  (40.0): 100%|██████████| 50/50 [02:22<00:00,  2.86s/it]


Average Metric: 20.0 / 50  (40.0%)
Score: 40.0 for set: [2, 1, 1, 1, 1]
Scores so far: [30.0, 30.0, 50.0, 38.0, 40.0, 28.0, 44.0, 36.0, 42.0, 34.0, 42.0, 44.0, 38.0, 40.0, 24.0, 38.0, 36.0, 44.0, 38.0, 30.0, 40.0]
Best score: 50.0
Average of max per entry across top 1 scores: 0.5
Average of max per entry across top 2 scores: 0.7
Average of max per entry across top 3 scores: 0.86
Average of max per entry across top 5 scores: 0.94
Average of max per entry across top 8 scores: 0.98
Average of max per entry across top 9999 scores: 1.0


  1%|▏         | 2/150 [00:00<00:00, 449.96it/s]


Bootstrapped 1 full traces after 3 examples in round 0.


Average Metric: 14 / 46  (30.4):  92%|█████████▏| 46/50 [00:17<00:02,  1.82it/s]

Backing off 1.0 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 15 / 47  (31.9):  94%|█████████▍| 47/50 [00:19<00:02,  1.04it/s]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 15 / 50  (30.0): 100%|██████████| 50/50 [00:30<00:00,  1.66it/s]


Average Metric: 15 / 50  (30.0%)
Score: 30.0 for set: [1, 1, 0, 0, 0]
Scores so far: [30.0, 30.0, 50.0, 38.0, 40.0, 28.0, 44.0, 36.0, 42.0, 34.0, 42.0, 44.0, 38.0, 40.0, 24.0, 38.0, 36.0, 44.0, 38.0, 30.0, 40.0, 30.0]
Best score: 50.0
Average of max per entry across top 1 scores: 0.5
Average of max per entry across top 2 scores: 0.7
Average of max per entry across top 3 scores: 0.86
Average of max per entry across top 5 scores: 0.94
Average of max per entry across top 8 scores: 0.98
Average of max per entry across top 9999 scores: 1.0


  2%|▏         | 3/150 [00:07<06:20,  2.59s/it]


Bootstrapped 1 full traces after 4 examples in round 0.


Average Metric: 2 / 4  (50.0):   8%|▊         | 4/50 [00:02<00:21,  2.12it/s] 

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 6 / 9  (66.7):  16%|█▌        | 8/50 [00:04<00:18,  2.22it/s]

Backing off 0.9 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 1.1 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 6 / 10  (60.0):  20%|██        | 10/50 [00:04<00:17,  2.32it/s]

Backing off 1.0 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 6 / 11  (54.5):  22%|██▏       | 11/50 [00:05<00:17,  2.17it/s]

Backing off 0.0 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 6 / 12  (50.0):  24%|██▍       | 12/50 [00:06<00:20,  1.89it/s]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 1.8 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.4 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 1.6 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 7 / 13  (53.8):  26%|██▌       | 13/50 [00:07<00:28,  1.29it/s]

Backing off 0.9 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 8 / 14  (57.1):  28%|██▊       | 14/50 [00:09<00:32,  1.11it/s]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 3.2 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 10 / 17  (58.8):  34%|███▍      | 17/50 [00:10<00:22,  1.44it/s]

Backing off 3.7 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 7.4 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 12 / 19  (63.2):  38%|███▊      | 19/50 [00:12<00:22,  1.41it/s]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 12 / 20  (60.0):  40%|████      | 20/50 [00:13<00:28,  1.05it/s]

Backing off 3.3 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.4 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 12 / 21  (57.1):  42%|████▏     | 21/50 [00:14<00:28,  1.02it/s]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 1.6 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 12 / 22  (54.5):  44%|████▍     | 22/50 [00:16<00:32,  1.16s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 4.1 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.4 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.3 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.2 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 12 / 24  (50.0):  48%|████▊     | 24/50 [00:19<00:34,  1.32s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 10.0 seconds after 5 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 15.3 seconds after 5 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 3.3 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.7 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 13 / 26  (50.0):  52%|█████▏    | 26/50 [00:23<00:37,  1.57s/it]

Backing off 6.7 seconds after 5 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 13 / 27  (48.1):  54%|█████▍    | 27/50 [00:24<00:30,  1.34s/it]

Backing off 9.9 seconds after 5 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 3.8 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 13 / 29  (44.8):  58%|█████▊    | 29/50 [00:25<00:21,  1.05s/it]

Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 14 / 31  (45.2):  62%|██████▏   | 31/50 [00:28<00:20,  1.07s/it]

Backing off 1.0 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 2.1 seconds after 6 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 14 / 32  (43.8):  64%|██████▍   | 32/50 [00:32<00:38,  2.12s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 1.0 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 14 / 34  (41.2):  68%|██████▊   | 34/50 [00:34<00:22,  1.44s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 14 / 35  (40.0):  70%|███████   | 35/50 [00:36<00:23,  1.55s/it]

Backing off 0.9 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 42.2 seconds after 7 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 2.0 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 29.0 seconds after 6 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 14 / 36  (38.9):  72%|███████▏  | 36/50 [00:39<00:29,  2.08s/it]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 1.7 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 16 / 38  (42.1):  76%|███████▌  | 38/50 [00:40<00:16,  1.35s/it]

Backing off 1.3 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 11.2 seconds after 5 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 16 / 39  (41.0):  78%|███████▊  | 39/50 [00:42<00:16,  1.49s/it]

Backing off 3.3 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 17 / 40  (42.5):  80%|████████  | 40/50 [00:44<00:17,  1.71s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 18 / 42  (42.9):  84%|████████▍ | 42/50 [00:46<00:11,  1.39s/it]

Backing off 1.8 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}
Backing off 0.2 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 19 / 43  (44.2):  86%|████████▌ | 43/50 [00:47<00:08,  1.16s/it]

Backing off 7.8 seconds after 4 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 20 / 44  (45.5):  88%|████████▊ | 44/50 [00:50<00:09,  1.65s/it]

Backing off 1.8 seconds after 3 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 20 / 45  (44.4):  90%|█████████ | 45/50 [00:53<00:11,  2.24s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 20 / 46  (43.5):  92%|█████████▏| 46/50 [00:55<00:08,  2.18s/it]

Backing off 1.0 seconds after 2 tries calling function <function GPT3.request at 0x76aceb702700> with kwargs {}


Average Metric: 20 / 50  (40.0): 100%|██████████| 50/50 [01:20<00:00,  1.61s/it]

Average Metric: 20 / 50  (40.0%)
Score: 40.0 for set: [1, 1, 0, 0, 0]
Scores so far: [30.0, 30.0, 50.0, 38.0, 40.0, 28.0, 44.0, 36.0, 42.0, 34.0, 42.0, 44.0, 38.0, 40.0, 24.0, 38.0, 36.0, 44.0, 38.0, 30.0, 40.0, 30.0, 40.0]
Best score: 50.0
Average of max per entry across top 1 scores: 0.5
Average of max per entry across top 2 scores: 0.7
Average of max per entry across top 3 scores: 0.86
Average of max per entry across top 5 scores: 0.94
Average of max per entry across top 8 scores: 0.98
Average of max per entry across top 9999 scores: 1.0
23 candidate programs found.
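A note on reading the log above: the "Average of max per entry across top k scores" lines act as an oracle-style upper bound. For each validation example, take the best score any of the top-k candidate programs achieved, then average over examples. In this run the top 5 candidates jointly cover 0.94 of the 50 validation questions, even though the single best program scores only 50%, which is what motivates aggregating several agents. The sketch below reproduces that calculation on made-up data; the function name `avg_of_max_per_entry` and the toy scores are assumptions for illustration, not DSPy's API.

```python
def avg_of_max_per_entry(score_data, k):
    """score_data: list of (overall_score, per_example_subscores) pairs."""
    # Keep the k candidates with the highest overall score.
    top_k = sorted(score_data, key=lambda pair: pair[0], reverse=True)[:k]
    # Regroup the subscores so each tuple holds one example's scores across candidates.
    per_example = zip(*(subscores for _, subscores in top_k))
    # An example counts as solved if any of the top-k candidates solved it.
    return sum(max(entry) for entry in per_example) / len(top_k[0][1])

# Toy data (made up): three candidate programs scored on four validation examples.
toy = [
    (50.0, [1, 1, 0, 0]),
    (50.0, [0, 1, 1, 0]),
    (25.0, [0, 0, 0, 1]),
]

top1 = avg_of_max_per_entry(toy, 1)
top2 = avg_of_max_per_entry(toy, 2)
top_all = avg_of_max_per_entry(toy, 9999)
print(top1, top2, top_all)  # 0.5 0.75 1.0
```

The gap between the top-1 value and the top-k value is the headroom an ideal aggregator could recover.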





In [None]:
evaluate(optimized_react)

### 5) Zero-Shot Aggregator.

Let's now extract the best five bootstrapped ReAct programs. We'll build a simple DSPy aggregator that runs all of them, then produces a final answer from the passages they retrieved.

In [None]:
from dsp.utils import flatten, deduplicate

# the best-performing five ReAct programs from the optimization process
AGENTS = [x[-1] for x in optimized_react.candidate_programs[:5]]

class Aggregator(dspy.Module):
	def __init__(self, temperature=0.0):
		super().__init__()
		self.aggregate = dspy.ChainOfThought('context, question -> answer')
		self.temperature = temperature

	def forward(self, question):
		# Run all five agents at the configured temperature (0.0 by default),
		# then extract and deduplicate the passages they observed
		with dspy.context(lm=gpt3.copy(temperature=self.temperature)):
			preds = [agent(question=question) for agent in AGENTS]
			context = deduplicate(flatten([flatten(p.observations) for p in preds]))

		# Run the aggregation step to produce a final answer
		return self.aggregate(context=context, question=question)
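To make the context-building step concrete, here is a minimal pure-Python sketch of what the nested `flatten` calls followed by `deduplicate` accomplish. The helpers `flatten_once` and `dedupe_keep_order` are hypothetical stand-ins written for this illustration (not the `dsp.utils` functions themselves), and the shape of the toy observations (one list of passages per retrieval step) is an assumption.

```python
def flatten_once(nested):
    """Collapse one level of nesting: [[a], [b, c]] -> [a, b, c]."""
    return [item for sub in nested for item in sub]

def dedupe_keep_order(items):
    """Drop repeated items while keeping the first occurrence of each."""
    seen = set()
    unique = []
    for item in items:
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique

# Toy observations (made up): two agents, each with two retrieval steps.
agent_observations = [
    [["passage A"], ["passage B"]],
    [["passage B"], ["passage C"]],
]

# Inner flatten: merge each agent's steps. Outer flatten: merge across agents.
context = dedupe_keep_order(
    flatten_once([flatten_once(obs) for obs in agent_observations])
)
print(context)  # ['passage A', 'passage B', 'passage C']
```

Deduplicating matters here because different agents often retrieve the same passage, and repeating it would only waste space in the aggregator's prompt.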

Let's quickly evaluate the aggregator prior to optimization.

In [None]:
aggregator = Aggregator()
evaluate(aggregator)

### 6) Optimized Aggregator.

In [None]:
kwargs = dict(max_bootstrapped_demos=2, max_labeled_demos=6, num_candidate_programs=10, num_threads=8)
tp = BootstrapFewShotWithRandomSearch(metric=dspy.evaluate.answer_exact_match, **kwargs)
optimized_aggregator = tp.compile(aggregator, trainset=trainset, valset=valset)
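The optimizer above scores candidates with an exact-match metric. As a rough illustration of the general idea, here is a minimal sketch of SQuAD-style exact-match scoring. The normalization rules (lowercasing, stripping punctuation and English articles) and the function names are assumptions for this example; they are not claimed to be the exact behavior of `dspy.evaluate.answer_exact_match`.

```python
import re
import string

def normalize(text):
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold_answers):
    """True if the normalized prediction equals any normalized gold answer."""
    return any(normalize(prediction) == normalize(gold) for gold in gold_answers)

hit = exact_match("The Eiffel Tower.", ["Eiffel Tower"])
miss = exact_match("Paris, France", ["Paris"])
print(hit, miss)  # True False
```

Because a metric like this is all-or-nothing, an aggregator that answers in a verbose style can score poorly even when it has the right information, which is one reason optimizing the aggregator with demonstrations can help.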

In [None]:
optimized_aggregator2 = optimized_aggregator.deepcopy()
optimized_aggregator2.temperature = 0.7

evaluate(optimized_aggregator2)

### 7) Conclusion.

Normally, we like to release notebooks with pre-computed caches and to inspect the prompts with `gpt3.inspect_history` to explore the behavior of optimization. See the intro notebook (or any of the Colab notebooks on the README) for such annotated examples!

To keep the current release super quick, Omar will extend this notebook into an annotated version if there's significant interest.

### 8) Post-Conclusion Note.

With a little bit of syntactic sugar, the main code in this notebook could be as short as 10 lines excluding whitespace:

```python
agent = dspy.ReAct("question -> answer", tools=[dspy.Retrieve(k=1)])

optimizer = BootstrapFewShotWithRandomSearch(metric=dspy.evaluate.answer_exact_match)
optimized_react = optimizer.compile(agent, trainset=trainset, valset=valset)

class Aggregator(dspy.Module):
	def __init__(self):
		self.aggregate = dspy.ChainOfThought('context, question -> answer')

	def forward(self, question):
		preds = [agent(question=question) for agent in optimized_react.best_programs[:5]]
		return self.aggregate(context=deduplicate(flatten([p.observations for p in preds])), question=question)
	
optimized_aggregator = optimizer.compile(Aggregator(), trainset=trainset, valset=valset)

# Use it!
optimized_aggregator(question="How many storeys are in the castle that David Gregory inherited?")
```