In [1]:
# Define initial prompt and output format
initial_prompt = (
    "Please look for the following risk factor: Is the client at risk of self-harm? For instance, do they mention suicidal thoughts or ideation? Do they imply they might do physical damage to themselves or to property? Do they reference wanting to 'end it all' or say it's 'not worth living'? Please output: 1. risk_type: // suicidality, 2. risk_output: // 'risk present' : this means there is evidence this risk is present in the case 'risk not present' : there is evidence the risk is NOT present or there is no evidence whether the case contains that risk or not. (If in doubt, it is better to err on the side of caution and say 'risk present') 3. explanation: // State words/terms that indicate the reason the risk_output was chosen. Be brief in your explanation. State facts found in the text, do not infer. E.g. 'Client expressed suicidal ideation'. Leave blank for 'risk not present.'"
)

output_format_prompt = (
    # Strict JSON requires double-quoted keys and strings, which is also what regex_pattern in the output schema expects.
    'Output should be STRICT JSON: a dictionary containing the risk type with its output and explanation, formatted like this: {"risk_type": "suicidality", "risk_output": str, "explanation": str}'
)

In [2]:
# initial_prompt = (
#         "You are a sentiment analysis classifier. Determine whether the provided text expresses a positive sentiment. "
#         "Respond with '1' if it is positive, or '0' if it is negative."
#     )

# # Output format prompt
# output_format_prompt = (
#     "You are to act as a binary responder. For every question asked, reply strictly with '1' for positive or '0' for negative. "
#     "Do NOT include any additional text or explanation."
# )

In [3]:
# Define output schema
output_schema = {
    'key_to_extract': 'risk_output',
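    # Keys are matched against the extracted risk_output string; note that the
    # prompt above spells the labels with spaces ('risk present'), not underscores.
    'value_mapping': {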
        'risk_present': 1,
        'risk_not_present': 0
    },
    'regex_pattern': r'"risk_output":\s*"(.*?)"'
}
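
How a schema like this might be applied to a raw model response can be sketched as follows. This is a minimal illustration, not the implementation inside `optimize_prompt`: the helper name `extract_label` and the sample responses are hypothetical, and it assumes the first capture group of `regex_pattern` is looked up verbatim in `value_mapping`.

```python
import re

# Same schema as defined above, repeated so the sketch is self-contained.
output_schema = {
    'key_to_extract': 'risk_output',
    'value_mapping': {'risk_present': 1, 'risk_not_present': 0},
    'regex_pattern': r'"risk_output":\s*"(.*?)"',
}

def extract_label(raw_output, schema):
    """Hypothetical helper: return the mapped label, or None if the output is invalid."""
    match = re.search(schema['regex_pattern'], raw_output)
    if match is None:
        return None  # the key was not found in the raw output
    # Values missing from value_mapping (e.g. 'not present') yield None.
    return schema['value_mapping'].get(match.group(1))

# Hypothetical responses: one well-formed, one with an unmapped label,
# and one using single quotes, which the regex does not match.
print(extract_label('{"risk_type": "suicidality", "risk_output": "risk_present", "explanation": "x"}', output_schema))
print(extract_label('{"risk_output": "not present"}', output_schema))
print(extract_label("{'risk_output': 'risk_present'}", output_schema))
```

Under these assumptions, a response that uses single quotes, or a label that is not a key of `value_mapping`, produces `None` rather than a 0/1 prediction.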

In [4]:
# output_schema = {
#     'key_to_extract': None,  # Set to None for direct output
#     'value_mapping': None,   # Set to None for direct mapping
#     'regex_pattern': r'^(0|1)$'  # Match the entire output for binary classification
# }

In [5]:
# Set number of optimization iterations
iterations = 3

In [6]:
# Define model providers and models for evaluation and optimization
eval_provider = "ollama"
eval_model = "llama3.1"
optim_provider = "ollama"
optim_model = "llama3.1"

In [7]:
# Path to the CSV file containing review data for evaluation
eval_datapath = "reviews.csv"

------------------------------------------------------------------------------------------

In [8]:
# Import necessary libraries
import pandas as pd
import sys
import os
# Add the parent directory to sys.path
# Use getcwd() to get the current working directory for Jupyter notebooks
current_dir = os.getcwd()
parent_dir = os.path.dirname(current_dir)
sys.path.append(parent_dir)
from src.iterative_prompt_optimization import optimize_prompt

In [9]:
# Load and prepare data
eval_data = pd.read_csv(eval_datapath, encoding='ISO-8859-1', usecols=['Text', 'Sentiment'])
eval_data.columns = ['text', 'label']
# Randomly select 5 samples per label (10 rows in total)
eval_data = (
    eval_data.groupby('label')
    .apply(lambda x: x.sample(n=5, random_state=42))
    .reset_index(drop=True)
)
# Shuffle the DataFrame randomly
eval_data = eval_data.sample(frac=1, random_state=42).reset_index(drop=True)
print(f"Evaluation data shape: {eval_data.shape}")
print(eval_data.head())

Evaluation data shape: (10, 2)
                                                text  label
0  Was this based on a comic-book? A video-game? ...      1
1  If you ask me the first one was really better ...      0
2  When I was a kid, I loved "Tiny Toons". I espe...      1
3  I hate guns and have never murdered anyone, bu...      0
4  I do not have much to say than this is a great...      1


In [10]:
# Run the prompt optimization process
best_prompt, best_metrics = optimize_prompt(
    initial_prompt,
    output_format_prompt,
    eval_data,
    iterations,
    eval_provider=eval_provider,
    eval_model=eval_model,
    optim_provider=optim_provider,
    optim_model=optim_model,
    output_schema=output_schema
)
print("\nBest Prompt:")
print(best_prompt)
print("\nBest Metrics:")
print(best_metrics)

# After running the optimization process, you can analyze the results by checking 
# the generated log files in the `runs/prompt_optimization_logs_YYYYMMDD_HHMMSS` directory.

Selected evaluation provider: ollama
Selected evaluation model: llama3.1
Selected optimization provider: ollama
Selected optimization model: llama3.1
Estimated token usage: 112440
Estimated cost: $0 API Costs - Running on Local Hardware

Do you want to proceed with the optimization? (Y/N): 
Iteration 1/3
Processing text 1/10
Invalid output: Extracted value 'not present' not found in value_mapping
Prediction 1/10: None | Ground Truth: 1 🛠️ (Invalid Output Format)
Processing text 2/10
JSON Decode Error: Unable to parse extracted JSON
Prediction 2/10: None | Ground Truth: 0 🛠️ (Invalid Output Format)
Processing text 3/10
Prediction 3/10: 0 | Ground Truth: 1 ❌ (FN)
Processing text 4/10
Prediction 4/10: 0 | Ground Truth: 0 ✅ (TN)
Processing text 5/10
Prediction 5/10: 0 | Ground Truth: 1 ❌ (FN)
Processing text 6/10
Prediction 6/10: 0 | Ground Truth: 0 ✅ (TN)
Processing text 7/10
Prediction 7/10: 0 | Ground Truth: 1 ❌ (FN)
Processing text 8/10
Prediction 8/10: 0 | Ground Truth: 0 ✅ (TN)
Proce


Analyzing misclassifications...



Generating new prompt...

Iteration 2/3
Processing text 1/10
Invalid output: Extracted value 'not present' not found in value_mapping
Prediction 1/10: None | Ground Truth: 1 🛠️ (Invalid Output Format)
Processing text 2/10
Unable to find risk_output in the raw output
Prediction 2/10: None | Ground Truth: 0 🛠️ (Invalid Output Format)
Processing text 3/10
JSON Decode Error: Unable to parse extracted JSON
Prediction 3/10: None | Ground Truth: 1 🛠️ (Invalid Output Format)
Processing text 4/10
JSON Decode Error: Unable to parse extracted JSON
Prediction 4/10: None | Ground Truth: 0 🛠️ (Invalid Output Format)
Processing text 5/10
Invalid output: Extracted value 'not present' not found in value_mapping
Prediction 5/10: None | Ground Truth: 1 🛠️ (Invalid Output Format)
Processing text 6/10
Invalid output: Extracted value 'not present' not found in value_mapping
Prediction 6/10: None | Ground Truth: 0 🛠️ (Invalid Output Format)
Processing text 7/10
Invalid output: Extracted value 'not present' 


Analyzing misclassifications...



Generating new prompt...

Iteration 3/3
Processing text 1/10
Unable to find risk_output in the raw output
Prediction 1/10: None | Ground Truth: 1 🛠️ (Invalid Output Format)
Processing text 2/10
Unable to find risk_output in the raw output
Prediction 2/10: None | Ground Truth: 0 🛠️ (Invalid Output Format)
Processing text 3/10
Invalid output: Extracted value 'None' not found in value_mapping
Prediction 3/10: None | Ground Truth: 1 🛠️ (Invalid Output Format)
Processing text 4/10
JSON Decode Error: Unable to parse extracted JSON
Prediction 4/10: None | Ground Truth: 0 🛠️ (Invalid Output Format)
Processing text 5/10
Invalid output: Extracted value 'Not found' not found in value_mapping
Prediction 5/10: None | Ground Truth: 1 🛠️ (Invalid Output Format)
Processing text 6/10
JSON Decode Error: Unable to parse extracted JSON
Prediction 6/10: None | Ground Truth: 0 🛠️ (Invalid Output Format)
Processing text 7/10
JSON Decode Error: Unable to parse extracted JSON
Prediction 7/10: None | Ground Tr


All logs saved in directory: /Users/danielfiuzadosil/Documents/GitHub/Data-Science/LLMs/iterative_prompt_optimisation/runs/prompt_optimization_logs_20240925_152258
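
The per-item lines printed during each iteration can be tallied to see how many predictions were unusable versus wrong. The following is a minimal sketch that assumes the line format shown in the output above; the `tally` helper and the regular expression are illustrative and are not part of the optimization library.

```python
import re
from collections import Counter

# Sample lines copied from the iteration 1 output above.
log_lines = [
    "Prediction 1/10: None | Ground Truth: 1 🛠️ (Invalid Output Format)",
    "Prediction 2/10: None | Ground Truth: 0 🛠️ (Invalid Output Format)",
    "Prediction 3/10: 0 | Ground Truth: 1 ❌ (FN)",
    "Prediction 4/10: 0 | Ground Truth: 0 ✅ (TN)",
    "Prediction 5/10: 0 | Ground Truth: 1 ❌ (FN)",
]

# Captures the predicted value (None, 0 or 1) and the ground-truth label.
LINE_RE = re.compile(r"Prediction \d+/\d+: (None|\d) \| Ground Truth: (\d)")

def tally(lines):
    """Count TP/TN/FP/FN and invalid outputs from prediction log lines."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip lines that are not prediction records
        pred, truth = m.groups()
        if pred == "None":
            counts["invalid"] += 1
        elif pred == truth:
            counts["TP" if pred == "1" else "TN"] += 1
        else:
            counts["FP" if pred == "1" else "FN"] += 1
    return counts

counts = tally(log_lines)
print(dict(counts))
```

Counting invalid outputs separately keeps formatting failures from being mistaken for classification errors when comparing iterations.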

Best Prompt:
Please look for the following risk factor: Is the client at risk of self-harm? For instance, do they mention suicidal thoughts or ideation? Do they imply they might do physical damage to themselves or to property? Do they reference wanting to 'end it all or say it's 'not worth living'? Please output: 1. risk_type: // suicidality, 2. risk_output: // 'risk present' : this means there is evidence this risk is present in the case 'risk not present' : there is evidence the risk is NOT present or there is no evidence whether the case contains that risk or not. (If in doubt, it is better to err on the side of caution and say 'risk present') 3. explanation: // State words/terms that indicate the reason the risk_output was chosen. Be brief in your explanation. State facts found in the text, do not infer. E.g. 'Client e