# Optimization Code

In this sandbox, I optimize the prompts for the LLM chain using DSPy.


## Settings

This cell imports the packages and helper functions, and defines the language models available to the chain from the main module.

In [5]:
import groq
import os
from dotenv import load_dotenv
import dspy
import numpy as np
from dspy.evaluate.metrics import answer_exact_match
import pandas as pd
from IPython.core.display import Markdown
from dspy import Example
from dspy.teleprompt import BootstrapFewShot
from dspy.teleprompt import BootstrapFewShotWithRandomSearch
from dspy.evaluate import Evaluate
from module_v002 import FullLLMChain
from optimize import passage_similarity_metric, custom_evaluation_function, similar_score_metric, evaluate_expectations_metric
from data.preprocess import create_dspy_examples_train_test_validation_sets
import json

load_dotenv()
openai_api_key = os.getenv("OPENAI_API_KEY")


# Local Llama 3 8B served through Ollama
llama3_8b = dspy.OllamaLocal(model="llama3:8b",
                             temperature=0,
                             max_tokens=800)

# OpenAI GPT-3.5 Turbo (chat endpoint)
gpt35turbo = dspy.OpenAI(model="gpt-3.5-turbo-0125",
                         api_key=openai_api_key,
                         temperature=0,
                         max_tokens=800,
                         model_type="chat")


## Preprocess Data

In [6]:
data = pd.read_excel("data/300_snippets_transcripts_all_labeled_v002.xlsx")

In [7]:
train_set, test_set, validation_set = create_dspy_examples_train_test_validation_sets(
    data=data, 
    train_size=50, 
    test_size=50,
    validation_size=50
)
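The internals of `create_dspy_examples_train_test_validation_sets` live in `data/preprocess.py` and are not shown here. As a rough illustration only, a three-way split of this kind could be done by shuffling the rows once and cutting non-overlapping slices. The function name `split_disjoint`, the `seed` parameter, and the `snippet_id` field below are hypothetical and not taken from the project.

```python
import random


def split_disjoint(rows, train_size, test_size, validation_size, seed=0):
    """Shuffle rows once, then cut three non-overlapping slices.

    Hypothetical helper: the project's real preprocessing may differ.
    """
    total = train_size + test_size + validation_size
    if total > len(rows):
        raise ValueError(f"requested {total} rows but only {len(rows)} available")

    shuffled = list(rows)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducible splits

    train = shuffled[:train_size]
    test = shuffled[train_size:train_size + test_size]
    validation = shuffled[train_size + test_size:total]
    return train, test, validation


# Stand-in for the 300 labeled snippets (field name is made up).
rows = [{"snippet_id": i} for i in range(300)]
train_rows, test_rows, validation_rows = split_disjoint(rows, 50, 50, 50)
```

In the actual notebook each row would presumably also be wrapped in a `dspy.Example` with its input fields marked, which this sketch leaves out.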

## Optimization

This step optimizes the prompts and saves the optimized LLM chain, along with the last 10 prompts sent to the LLM.
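To help read the optimizer output further down, here is a minimal sketch of how the `Scores so far`, `Best score`, and `Average of max per entry across top K scores` lines can be interpreted. It is my reading of the log, not DSPy's actual implementation: each candidate program is scored on the evaluation set, the highest-scoring one is kept, and the "average of max" figure asks how well the top K candidates would do if the best of them could be picked per example. The candidate labels and per-example scores below are invented toy data.

```python
def score_candidates(candidates):
    """Turn (label, per_example_scores) pairs into (percent, label, scores)."""
    scored = []
    for label, subscores in candidates:
        percent = 100.0 * sum(subscores) / len(subscores)
        scored.append((percent, label, subscores))
    return scored


def average_of_max(scored, top_k):
    """Per example, take the best score among the top_k candidates, then average."""
    top = sorted(scored, key=lambda item: item[0], reverse=True)[:top_k]
    per_example = zip(*(subscores for _, _, subscores in top))
    maxima = [max(entry) for entry in per_example]
    return sum(maxima) / len(maxima)


# Toy data: three candidate programs scored on four examples (0, 0.5, or 1 each).
candidates = [
    ("zero-shot", [1.0, 0.0, 0.0, 0.5]),
    ("labeled-only", [0.0, 1.0, 0.0, 0.5]),
    ("bootstrapped", [1.0, 1.0, 0.0, 0.0]),
]

scored = score_candidates(candidates)
best_score, best_label, _ = max(scored, key=lambda item: item[0])

print("Scores so far:", [s for s, _, _ in scored])
print("Best score:", best_score, "from", best_label)
for k in (1, 2, 3):
    print(f"Average of max per entry across top {k} scores:", average_of_max(scored, k))
```

A large gap between the best single score and the top-K figure, as in the run below (53.0 versus 0.89 for the top 5), suggests that different candidates succeed on different examples.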

In [8]:
dspy.settings.configure(lm=gpt35turbo)
full_llm_chain = FullLLMChain()

In [9]:
config = dict(max_bootstrapped_demos=2, max_labeled_demos=1, max_rounds=1, max_errors=1, num_candidate_programs=4)
teleprompter = BootstrapFewShotWithRandomSearch(metric=evaluate_expectations_metric, **config)
optimized_llm = teleprompter.compile(full_llm_chain, trainset=train_set, valset=test_set)

Going to sample between 1 and 2 traces per predictor.
Will attempt to train 4 candidate sets.


Average Metric: 22.0 / 50  (44.0): 100%|██████████| 50/50 [00:24<00:00,  2.03it/s]


Average Metric: 22.0 / 50  (44.0%)
Score: 44.0 for set: [0, 0, 0]
New best score: 44.0 for seed -3
Scores so far: [44.0]
Best score: 44.0


Average Metric: 21.0 / 47  (44.7):  94%|█████████▍| 47/50 [00:24<00:02,  1.36it/s]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000861588493462}


Average Metric: 23.0 / 50  (46.0): 100%|██████████| 50/50 [00:25<00:00,  1.95it/s]


Average Metric: 23.0 / 50  (46.0%)
Score: 46.0 for set: [1, 1, 1]
New best score: 46.0 for seed -2
Scores so far: [44.0, 46.0]
Best score: 46.0


 10%|█         | 5/50 [00:12<01:55,  2.56s/it]


Bootstrapped 2 full traces after 6 examples in round 0.


Average Metric: 1.5 / 5  (30.0):  10%|█         | 5/50 [00:02<00:21,  2.10it/s]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000830910832171}


Average Metric: 2.5 / 7  (35.7):  12%|█▏        | 6/50 [00:04<00:33,  1.33it/s]

Backing off 0.0 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000046818144722}
Backing off 0.0 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000443977300059}
Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000918079761941}
Backing off 1.9 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000830910832171}


Average Metric: 3.5 / 8  (43.8):  16%|█▌        | 8/50 [00:05<00:31,  1.34it/s]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000573580416064}
Backing off 1.9 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000443977300059}
Backing off 0.8 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000918079761941}


Average Metric: 3.5 / 9  (38.9):  18%|█▊        | 9/50 [00:08<00:52,  1.28s/it]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000046818144722}
Backing off 0.3 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000573580416064}


Average Metric: 4.5 / 10  (45.0):  20%|██        | 10/50 [00:10<00:58,  1.47s/it]

Backing off 0.1 seconds after 3 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000918079761941}
Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.700075073737791}
Backing off 0.1 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000046818144722}


Average Metric: 5.0 / 12  (41.7):  24%|██▍       | 12/50 [00:12<00:48,  1.28s/it]

Backing off 0.0 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000731129226365}


Average Metric: 6.0 / 13  (46.2):  26%|██▌       | 13/50 [00:13<00:44,  1.20s/it]

Backing off 0.1 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000731129226365}
Backing off 4.8 seconds after 4 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000918079761941}
Backing off 1.0 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000002154634184}
Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000867486203216}


Average Metric: 6.0 / 14  (42.9):  28%|██▊       | 14/50 [00:16<01:02,  1.73s/it]

Backing off 3.4 seconds after 3 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000731129226365}
Backing off 1.5 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000867486203216}


Average Metric: 6.0 / 15  (40.0):  30%|███       | 15/50 [00:18<01:01,  1.76s/it]

Backing off 1.2 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000002154634184}


Average Metric: 6.5 / 16  (40.6):  32%|███▏      | 16/50 [00:19<00:50,  1.50s/it]

Backing off 0.3 seconds after 3 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000867486203216}
Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000218666568226}


Average Metric: 6.5 / 17  (38.2):  34%|███▍      | 17/50 [00:21<00:53,  1.61s/it]

Backing off 7.0 seconds after 4 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000731129226365}
Backing off 0.9 seconds after 3 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000002154634184}


Average Metric: 8.5 / 19  (44.7):  38%|███▊      | 19/50 [00:23<00:42,  1.38s/it]

Backing off 1.6 seconds after 4 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000867486203216}
Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000508214286507}


Average Metric: 9.5 / 20  (47.5):  40%|████      | 20/50 [00:24<00:39,  1.31s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000147956470134}


Average Metric: 10.5 / 21  (50.0):  42%|████▏     | 21/50 [00:26<00:40,  1.38s/it]

Backing off 5.8 seconds after 5 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000867486203216}
Backing off 0.1 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000508214286507}
Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000002154634184}
Backing off 1.0 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000147956470134}
Backing off 1.6 seconds after 3 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000508214286507}


Average Metric: 11.5 / 23  (50.0):  46%|████▌     | 23/50 [00:30<00:43,  1.63s/it]

Backing off 5.8 seconds after 5 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000731129226365}
Backing off 0.5 seconds after 3 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000147956470134}
Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000161182678746}
Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000781303796165}


Average Metric: 12.5 / 24  (52.1):  48%|████▊     | 24/50 [00:35<01:06,  2.55s/it]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000508214286507}


Average Metric: 13.0 / 25  (52.0):  50%|█████     | 25/50 [00:36<00:54,  2.20s/it]

Backing off 0.5 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000161182678746}


Average Metric: 13.0 / 26  (50.0):  52%|█████▏    | 26/50 [00:37<00:46,  1.95s/it]

Backing off 0.3 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000508214286507}
Backing off 4.1 seconds after 6 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000731129226365}
Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000867486203216}
Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000058123062843}
Backing off 3.3 seconds after 3 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000161182678746}


Average Metric: 14.0 / 27  (51.9):  54%|█████▍    | 27/50 [00:39<00:43,  1.89s/it]

Backing off 1.0 seconds after 3 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000508214286507}


Average Metric: 15.0 / 28  (53.6):  56%|█████▌    | 28/50 [00:41<00:38,  1.76s/it]

Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000951439797125}
Backing off 0.9 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000372896251305}
Backing off 3.1 seconds after 4 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000508214286507}


Average Metric: 15.0 / 29  (51.7):  58%|█████▊    | 29/50 [00:43<00:38,  1.84s/it]

Backing off 5.2 seconds after 4 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000161182678746}
Backing off 37.8 seconds after 7 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000731129226365}
Backing off 1.9 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000372896251305}


Average Metric: 15.5 / 30  (51.7):  60%|██████    | 30/50 [00:45<00:37,  1.88s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000653312585743}


Average Metric: 15.5 / 31  (50.0):  62%|██████▏   | 31/50 [00:48<00:43,  2.30s/it]

Backing off 0.8 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000653312585743}


Average Metric: 15.5 / 32  (48.4):  64%|██████▍   | 32/50 [00:49<00:35,  1.97s/it]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000372896251305}
Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000628863505403}
Backing off 4.2 seconds after 5 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000161182678746}


Average Metric: 17.0 / 34  (50.0):  68%|██████▊   | 34/50 [00:51<00:24,  1.55s/it]

Backing off 1.9 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000628863505403}
Backing off 1.3 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000372896251305}
Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000871636345122}


Average Metric: 18.0 / 36  (50.0):  72%|███████▏  | 36/50 [00:55<00:21,  1.51s/it]

Backing off 0.0 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000747210817275}
Backing off 4.5 seconds after 6 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000161182678746}


Average Metric: 18.0 / 37  (48.6):  74%|███████▍  | 37/50 [00:57<00:22,  1.72s/it]

Backing off 3.9 seconds after 3 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000628863505403}


Average Metric: 18.0 / 38  (47.4):  76%|███████▌  | 38/50 [00:58<00:17,  1.44s/it]

Backing off 0.9 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000747210817275}


Average Metric: 19.0 / 40  (47.5):  80%|████████  | 40/50 [01:00<00:13,  1.32s/it]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000622048499311}
Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000429107718127}


Average Metric: 19.0 / 42  (45.2):  84%|████████▍ | 42/50 [01:03<00:10,  1.33s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000959745507211}
Backing off 1.7 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000429107718127}
Backing off 7.8 seconds after 4 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000628863505403}
Backing off 62.0 seconds after 7 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000161182678746}


Average Metric: 20.5 / 44  (46.6):  88%|████████▊ | 44/50 [01:06<00:08,  1.39s/it]

Backing off 1.5 seconds after 3 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000429107718127}


Average Metric: 26.5 / 50  (53.0): 100%|██████████| 50/50 [02:07<00:00,  2.55s/it]


Average Metric: 26.5 / 50  (53.0%)
Score: 53.0 for set: [2, 2, 1]
New best score: 53.0 for seed -1
Scores so far: [44.0, 46.0, 53.0]
Best score: 53.0
Average of max per entry across top 1 scores: 0.53
Average of max per entry across top 2 scores: 0.69
Average of max per entry across top 3 scores: 0.74
Average of max per entry across top 5 scores: 0.74
Average of max per entry across top 8 scores: 0.74
Average of max per entry across top 9999 scores: 0.74


  6%|▌         | 3/50 [00:10<02:46,  3.55s/it]


Bootstrapped 2 full traces after 4 examples in round 0.


Average Metric: 12.0 / 29  (41.4):  58%|█████▊    | 29/50 [00:16<00:08,  2.35it/s]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000841651670915}


Average Metric: 15.0 / 34  (44.1):  68%|██████▊   | 34/50 [00:21<00:12,  1.24it/s]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000689378198371}
Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000767384656629}
Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000367853147456}


Average Metric: 16.0 / 35  (45.7):  70%|███████   | 35/50 [00:22<00:15,  1.05s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.700035822123253}
Backing off 1.4 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000689378198371}
Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000899410502422}
Backing off 2.0 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000367853147456}
Backing off 0.2 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000767384656629}


Average Metric: 16.0 / 36  (44.4):  72%|███████▏  | 36/50 [00:26<00:24,  1.72s/it]

Backing off 2.2 seconds after 3 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000689378198371}
Backing off 1.6 seconds after 3 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000767384656629}
Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000040916634462}


Average Metric: 17.0 / 37  (45.9):  74%|███████▍  | 37/50 [00:29<00:29,  2.24s/it]

Backing off 2.7 seconds after 3 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000367853147456}
Backing off 0.8 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000040916634462}
Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000551245177778}


Average Metric: 18.0 / 38  (47.4):  76%|███████▌  | 38/50 [00:31<00:23,  1.98s/it]

Backing off 6.3 seconds after 4 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000767384656629}


Average Metric: 18.5 / 40  (46.2):  80%|████████  | 40/50 [00:33<00:15,  1.51s/it]

Backing off 0.5 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000551245177778}
Backing off 1.4 seconds after 3 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000040916634462}


Average Metric: 18.5 / 41  (45.1):  82%|████████▏ | 41/50 [00:34<00:13,  1.47s/it]

Backing off 7.8 seconds after 4 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000367853147456}
Backing off 0.2 seconds after 3 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000551245177778}
Backing off 4.9 seconds after 4 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000040916634462}


Average Metric: 18.5 / 42  (44.0):  84%|████████▍ | 42/50 [00:39<00:18,  2.32s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000759572844936}


Average Metric: 19.0 / 43  (44.2):  86%|████████▌ | 43/50 [00:41<00:15,  2.16s/it]

Backing off 0.9 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000217087495961}
Backing off 0.5 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000759572844936}


Average Metric: 20.5 / 45  (45.6):  90%|█████████ | 45/50 [00:44<00:08,  1.75s/it]

Backing off 1.0 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000217087495961}


Average Metric: 21.5 / 46  (46.7):  92%|█████████▏| 46/50 [00:45<00:06,  1.60s/it]

Backing off 10.0 seconds after 5 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000040916634462}


Average Metric: 21.5 / 50  (43.0): 100%|██████████| 50/50 [00:57<00:00,  1.14s/it]


Average Metric: 21.5 / 50  (43.0%)
Score: 43.0 for set: [2, 2, 2]
Scores so far: [44.0, 46.0, 53.0, 43.0]
Best score: 53.0
Average of max per entry across top 1 scores: 0.53
Average of max per entry across top 2 scores: 0.69
Average of max per entry across top 3 scores: 0.74
Average of max per entry across top 5 scores: 0.86
Average of max per entry across top 8 scores: 0.86
Average of max per entry across top 9999 scores: 0.86


  4%|▍         | 2/50 [00:06<02:44,  3.43s/it]


Bootstrapped 1 full traces after 3 examples in round 0.


Average Metric: 3.0 / 5  (60.0):  10%|█         | 5/50 [00:03<00:23,  1.88it/s]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000764488006934}


Average Metric: 4.0 / 6  (66.7):  12%|█▏        | 6/50 [00:04<00:29,  1.50it/s]

Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000655695756235}


Average Metric: 4.0 / 7  (57.1):  14%|█▍        | 7/50 [00:06<00:45,  1.06s/it]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000920331075348}
Backing off 0.5 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000764488006934}


Average Metric: 4.0 / 8  (50.0):  16%|█▌        | 8/50 [00:07<00:45,  1.07s/it]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000685412389009}
Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000029014551009}


Average Metric: 4.5 / 9  (50.0):  18%|█▊        | 9/50 [00:10<01:05,  1.59s/it]

Backing off 1.7 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000029014551009}


Average Metric: 4.5 / 10  (45.0):  20%|██        | 10/50 [00:12<01:03,  1.59s/it]

Backing off 1.0 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000685412389009}


Average Metric: 5.5 / 12  (45.8):  22%|██▏       | 11/50 [00:13<00:59,  1.52s/it]

Backing off 0.1 seconds after 3 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000029014551009}


Average Metric: 5.5 / 13  (42.3):  26%|██▌       | 13/50 [00:15<00:47,  1.29s/it]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000630863897989}
Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000824136570934}
Backing off 2.0 seconds after 3 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000685412389009}


Average Metric: 6.5 / 14  (46.4):  28%|██▊       | 14/50 [00:18<01:01,  1.70s/it]

Backing off 1.0 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000505412363753}
Backing off 1.0 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000008055591119}


Average Metric: 6.5 / 15  (43.3):  30%|███       | 15/50 [00:20<01:02,  1.79s/it]

Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000029014551009}
Backing off 3.7 seconds after 4 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000685412389009}
Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000505412363753}
Backing off 1.8 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000008055591119}
Backing off 0.0 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000505412363753}


Average Metric: 7.5 / 16  (46.9):  32%|███▏      | 16/50 [00:24<01:19,  2.34s/it]

Backing off 1.1 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000029014551009}


Average Metric: 7.5 / 17  (44.1):  34%|███▍      | 17/50 [00:25<01:05,  1.97s/it]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000585918821479}
Backing off 1.9 seconds after 3 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000008055591119}
Backing off 0.0 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000250295546432}
Backing off 2.3 seconds after 3 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000029014551009}


Average Metric: 8.0 / 19  (42.1):  38%|███▊      | 19/50 [00:29<00:56,  1.83s/it]

Backing off 0.2 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000585918821479}
Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000744451195007}
Backing off 5.8 seconds after 4 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000008055591119}
Backing off 3.3 seconds after 3 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000585918821479}


Average Metric: 8.0 / 20  (40.0):  40%|████      | 20/50 [00:32<01:03,  2.13s/it]

Backing off 2.0 seconds after 4 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000029014551009}
Backing off 0.3 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000744451195007}


Average Metric: 9.0 / 22  (40.9):  44%|████▍     | 22/50 [00:36<01:00,  2.17s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.700003278867638}
Backing off 0.5 seconds after 4 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000585918821479}
Backing off 15.3 seconds after 5 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000029014551009}


Average Metric: 9.0 / 23  (39.1):  46%|████▌     | 23/50 [00:38<00:52,  1.95s/it]

Backing off 4.3 seconds after 5 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000008055591119}


Average Metric: 10.0 / 24  (41.7):  48%|████▊     | 24/50 [00:39<00:46,  1.78s/it]

Backing off 13.9 seconds after 5 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000585918821479}


Average Metric: 10.5 / 26  (40.4):  52%|█████▏    | 26/50 [00:42<00:37,  1.54s/it]

Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000208853690072}


Average Metric: 12.0 / 29  (41.4):  58%|█████▊    | 29/50 [00:45<00:28,  1.36s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000904891826528}


Average Metric: 13.0 / 30  (43.3):  60%|██████    | 30/50 [00:47<00:26,  1.33s/it]

Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000863606094162}


Average Metric: 13.0 / 31  (41.9):  62%|██████▏   | 31/50 [00:51<00:43,  2.31s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.700049223788767}


Average Metric: 14.0 / 33  (42.4):  64%|██████▍   | 32/50 [00:53<00:36,  2.03s/it]

Backing off 1.7 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000863606094162}
Backing off 15.3 seconds after 6 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000029014551009}


Average Metric: 15.0 / 34  (44.1):  68%|██████▊   | 34/50 [00:54<00:23,  1.48s/it]

Backing off 0.1 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.700049223788767}
Backing off 24.7 seconds after 6 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000585918821479}


Average Metric: 16.0 / 35  (45.7):  70%|███████   | 35/50 [00:56<00:24,  1.63s/it]

Backing off 2.4 seconds after 3 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.700049223788767}
Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000294652014744}


Average Metric: 17.0 / 36  (47.2):  72%|███████▏  | 36/50 [00:57<00:18,  1.35s/it]

Backing off 0.0 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000294652014744}


Average Metric: 18.0 / 38  (47.4):  76%|███████▌  | 38/50 [01:02<00:20,  1.72s/it]

Backing off 2.8 seconds after 3 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000294652014744}


Average Metric: 18.0 / 40  (45.0):  80%|████████  | 40/50 [01:05<00:15,  1.58s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.700074857218905}


Average Metric: 18.0 / 41  (43.9):  82%|████████▏ | 41/50 [01:07<00:16,  1.81s/it]

Backing off 7.7 seconds after 4 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000294652014744}
Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000106635928416}
Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000364343397876}


Average Metric: 18.0 / 42  (42.9):  84%|████████▍ | 42/50 [01:09<00:15,  1.92s/it]

Backing off 1.9 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000364343397876}


Average Metric: 18.0 / 43  (41.9):  86%|████████▌ | 43/50 [01:11<00:13,  1.93s/it]

Backing off 62.3 seconds after 7 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000029014551009}


Average Metric: 18.5 / 44  (42.0):  88%|████████▊ | 44/50 [01:14<00:12,  2.10s/it]

Backing off 3.2 seconds after 3 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000364343397876}


Average Metric: 20.0 / 50  (40.0): 100%|██████████| 50/50 [02:16<00:00,  2.73s/it]


Average Metric: 20.0 / 50  (40.0%)
Score: 40.0 for set: [1, 1, 1]
Scores so far: [44.0, 46.0, 53.0, 43.0, 40.0]
Best score: 53.0
Average of max per entry across top 1 scores: 0.53
Average of max per entry across top 2 scores: 0.69
Average of max per entry across top 3 scores: 0.74
Average of max per entry across top 5 scores: 0.89
Average of max per entry across top 8 scores: 0.89
Average of max per entry across top 9999 scores: 0.89


  2%|▏         | 1/50 [00:03<02:35,  3.16s/it]


Bootstrapped 1 full traces after 2 examples in round 0.


Average Metric: 24.0 / 50  (48.0): 100%|██████████| 50/50 [00:22<00:00,  2.25it/s]


Average Metric: 24.0 / 50  (48.0%)
Score: 48.0 for set: [1, 1, 1]
Scores so far: [44.0, 46.0, 53.0, 43.0, 40.0, 48.0]
Best score: 53.0
Average of max per entry across top 1 scores: 0.53
Average of max per entry across top 2 scores: 0.76
Average of max per entry across top 3 scores: 0.81
Average of max per entry across top 5 scores: 0.92
Average of max per entry across top 8 scores: 0.94
Average of max per entry across top 9999 scores: 0.94


  4%|▍         | 2/50 [00:06<02:25,  3.04s/it]


Bootstrapped 1 full traces after 3 examples in round 0.


Average Metric: 2 / 4  (50.0):   8%|▊         | 4/50 [00:01<00:16,  2.77it/s] 

Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000287617851588}


Average Metric: 3 / 5  (60.0):  10%|█         | 5/50 [00:02<00:26,  1.68it/s]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000559161067597}


Average Metric: 4 / 6  (66.7):  12%|█▏        | 6/50 [00:03<00:20,  2.15it/s]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000964760855238}


Average Metric: 6 / 9  (66.7):  18%|█▊        | 9/50 [00:05<00:23,  1.77it/s]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000398559506662}
Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000499059254902}


Average Metric: 6 / 10  (60.0):  20%|██        | 10/50 [00:06<00:36,  1.08it/s]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000205846660467}
Backing off 0.2 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000499059254902}
Backing off 0.3 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000398559506662}
Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000964760855238}


Average Metric: 7 / 11  (63.6):  22%|██▏       | 11/50 [00:09<00:55,  1.43s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000887730381801}
Backing off 0.5 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000205846660467}


Average Metric: 7 / 12  (58.3):  24%|██▍       | 12/50 [00:10<00:54,  1.43s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000964760855238}
Backing off 1.7 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000887730381801}


Average Metric: 9 / 14  (64.3):  28%|██▊       | 14/50 [00:12<00:41,  1.16s/it]

Backing off 2.0 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000964760855238}


Average Metric: 10 / 15  (66.7):  30%|███       | 15/50 [00:13<00:39,  1.13s/it]

Backing off 0.0 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000956374000822}
Backing off 1.4 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000956374000822}


Average Metric: 11 / 17  (64.7):  32%|███▏      | 16/50 [00:15<00:46,  1.36s/it]

Backing off 0.3 seconds after 3 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000887730381801}
Backing off 1.9 seconds after 3 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000964760855238}
Backing off 5.7 seconds after 4 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000887730381801}
Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000849083725643}
Backing off 0.9 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000344745963888}


Average Metric: 12 / 19  (63.2):  38%|███▊      | 19/50 [00:18<00:35,  1.15s/it]

Backing off 0.1 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000344745963888}
Backing off 0.3 seconds after 4 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000964760855238}


Average Metric: 13 / 20  (65.0):  40%|████      | 20/50 [00:21<00:44,  1.47s/it]

Backing off 0.9 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000656664557154}


Average Metric: 15 / 22  (68.2):  44%|████▍     | 22/50 [00:22<00:32,  1.17s/it]

Backing off 1.8 seconds after 5 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000964760855238}
Backing off 1.0 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000048211210192}
Backing off 1.6 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000656664557154}
Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000344745963888}


Average Metric: 16 / 24  (66.7):  48%|████▊     | 24/50 [00:26<00:39,  1.51s/it]

Backing off 30.1 seconds after 6 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000964760855238}
Backing off 2.6 seconds after 3 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000656664557154}
Backing off 0.3 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000344745963888}


Average Metric: 16 / 25  (64.0):  50%|█████     | 25/50 [00:28<00:37,  1.50s/it]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000048211210192}


Average Metric: 17 / 26  (65.4):  52%|█████▏    | 26/50 [00:29<00:31,  1.32s/it]

Backing off 2.4 seconds after 3 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000344745963888}


Average Metric: 18 / 27  (66.7):  54%|█████▍    | 27/50 [00:30<00:28,  1.23s/it]

Backing off 0.2 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000048211210192}


Average Metric: 19 / 28  (67.9):  56%|█████▌    | 28/50 [00:31<00:29,  1.36s/it]

Backing off 0.6 seconds after 4 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000656664557154}
Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000083328992484}


Average Metric: 19.5 / 29  (67.2):  58%|█████▊    | 29/50 [00:34<00:35,  1.70s/it]

Backing off 1.8 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000083328992484}
Backing off 0.2 seconds after 4 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000344745963888}
Backing off 13.8 seconds after 5 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000656664557154}


Average Metric: 19.5 / 31  (62.9):  62%|██████▏   | 31/50 [00:37<00:30,  1.61s/it]

Backing off 14.8 seconds after 5 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000344745963888}
Backing off 3.0 seconds after 3 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000083328992484}


Average Metric: 23.5 / 36  (65.3):  72%|███████▏  | 36/50 [00:42<00:15,  1.09s/it]

Backing off 0.9 seconds after 4 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000083328992484}


Average Metric: 25.5 / 38  (67.1):  76%|███████▌  | 38/50 [00:45<00:13,  1.10s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000124110942705}


Average Metric: 26.5 / 40  (66.2):  80%|████████  | 40/50 [00:47<00:10,  1.03s/it]

Backing off 1.4 seconds after 2 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000124110942705}


Average Metric: 27.5 / 43  (64.0):  86%|████████▌ | 43/50 [00:51<00:08,  1.21s/it]

Backing off 28.4 seconds after 6 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000656664557154}


Average Metric: 27.5 / 44  (62.5):  88%|████████▊ | 44/50 [00:52<00:06,  1.14s/it]

Backing off 2.8 seconds after 3 tries calling function <function GPT3.request at 0x10a33ed40> with kwargs {'temperature': 0.7000124110942705}


Average Metric: 30.0 / 50  (60.0): 100%|██████████| 50/50 [01:22<00:00,  1.65s/it]

Average Metric: 30.0 / 50  (60.0%)
Score: 60.0 for set: [1, 1, 1]
New best score: 60.0 for seed 3
Scores so far: [44.0, 46.0, 53.0, 43.0, 40.0, 48.0, 60.0]
Best score: 60.0
Average of max per entry across top 1 scores: 0.6
Average of max per entry across top 2 scores: 0.78
Average of max per entry across top 3 scores: 0.9
Average of max per entry across top 5 scores: 0.93
Average of max per entry across top 8 scores: 0.96
Average of max per entry across top 9999 scores: 0.96
7 candidate programs found.
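
A note on reading the "Average of max per entry across top k scores" lines above. As far as I can tell, the optimizer ranks the candidate programs by their overall score, keeps the best k, takes the best per-example score among those k for each example, and averages the result. The sketch below reproduces that calculation on toy data. The function name `avg_of_max_per_entry` and the numbers are my own illustration, not part of DSPy or of this run.

```python
def avg_of_max_per_entry(score_data, k):
    """Average, over examples, of the best per-example score among the top-k programs.

    score_data: list of (overall_score, per_example_scores) tuples,
    one tuple per candidate program (hypothetical structure for this sketch).
    """
    # Rank candidate programs by overall score and keep the k best.
    top_k = sorted(score_data, key=lambda item: item[0], reverse=True)[:k]
    # Transpose so that each entry holds the scores of all kept programs on one example.
    per_example = zip(*(subscores for _, subscores in top_k))
    best_per_example = [max(entry) for entry in per_example]
    return sum(best_per_example) / len(best_per_example)


# Toy data: three candidate programs evaluated on four examples.
toy_score_data = [
    (50.0, [1, 0, 1, 0]),
    (50.0, [0, 1, 1, 0]),
    (25.0, [0, 0, 0, 1]),
]

top_1 = avg_of_max_per_entry(toy_score_data, 1)
top_2 = avg_of_max_per_entry(toy_score_data, 2)
top_all = avg_of_max_per_entry(toy_score_data, 9999)
print(top_1, top_2, top_all)
```

Under this reading, the "top 1" value is simply the best score divided by 100, and the value stops growing once k exceeds the number of candidates evaluated so far, which matches the plateaus in the log. A large gap between "top 1" and "top 9999" suggests that the candidates solve different examples, so no single program captures everything the set could in principle get right.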





In [10]:
optimized_llm.save("optimized_llm_chains/optimized_gpt_chain_v010.json")

In [11]:
llama3_8b.inspect_history(n=1)

In [12]:
gpt35turbo.inspect_history(n=2)





---CONTEXT---
    You are an experienced financial analyst known for your expertise in evaluating and interpreting expectations related to the financial stability and solvency of countries.

    ---TASK---
    Please assess the expectation towards the solvency of the country mentioned in the given text excerpt. This includes evaluating the country's financial stability and ability to meet its obligations.    
    
    ---GUIDELINES---
    - Use the following scale for your assessment: -2 = very negative, -1 = somewhat negative, 0 = neutral, 1 = somewhat positive, 2 = very positive

---

Follow the following format.

Country Keyword: keyword that represents a country

Country Role: role of the country in the excerpt of a financial services company's earnings call transcript

Excerpt: excerpt from a financial services company's earnings conference call

Reasoning: Let's think step by step in order to ${produce the answer}. We ...

Answer: one of: [-2, -1, 0, 1, 2]. Only respond wiht 

## Custom Evaluation

Here, I manually inspect a few predictions on the validation set to check that the workflow behaves as expected.

In [13]:
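# The validation split created in the preprocessing step is named `validation_set`;
# the previous reference to `valset` raised a NameError.
custom_evaluation_function(validation_set=validation_set,
                           llm=full_llm_chain,
                           metric_for_evaluation="evaluate_expectations",
                           show_examples=2)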
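
For orientation, a manual inspection loop of this kind could look roughly like the sketch below. It does not reproduce `custom_evaluation_function` from `optimize.py`, whose internals are not shown in this notebook. `ToyExample`, `toy_predict`, `exact_match`, `inspect_predictions`, and the excerpts are all hypothetical stand-ins that only illustrate the idea of scoring every example while printing the first few.

```python
from dataclasses import dataclass


@dataclass
class ToyExample:
    excerpt: str
    answer: int  # gold label on the -2..2 scale


def toy_predict(excerpt):
    # Hypothetical stand-in for the LLM chain: a keyword rule instead of a model call.
    if "strong" in excerpt:
        return 1
    if "default" in excerpt:
        return -1
    return 0


def exact_match(gold, predicted):
    # Hypothetical metric: 1.0 for an exact label match, otherwise 0.0.
    return 1.0 if gold == predicted else 0.0


def inspect_predictions(examples, predict, metric, show_examples=2):
    """Score every example, print the first `show_examples`, return the mean score."""
    scores = []
    for index, example in enumerate(examples):
        predicted = predict(example.excerpt)
        score = metric(example.answer, predicted)
        scores.append(score)
        if index < show_examples:
            print(f"excerpt={example.excerpt!r} gold={example.answer} "
                  f"predicted={predicted} score={score}")
    return sum(scores) / len(scores)


toy_examples = [
    ToyExample("Growth in the region remains strong.", 1),
    ToyExample("We see rising default risk there.", -2),
    ToyExample("Operations continued as usual.", 0),
]

mean_score = inspect_predictions(toy_examples, toy_predict, exact_match, show_examples=2)
print(f"mean score: {mean_score:.2f}")
```

Printing gold label, prediction, and score side by side for a handful of examples makes it easy to see whether a low score comes from the model or from the metric.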

In [None]:
gpt35turbo.inspect_history(n=2)

In [None]:
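# Use `validation_set` from the preprocessing step; `valset` is not defined in this notebook.
evaluate = Evaluate(devset=validation_set,
                    metric=evaluate_expectations_metric,
                    num_threads=4,
                    display_progress=True,
                    display_table=4)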