In [1]:
# Define initial prompt and output format
initial_prompt = (
    "Please look for the following risk factor: Is the client at risk of self-harm? For instance, do they mention suicidal thoughts or ideation? Do they imply they might do physical damage to themselves or to property? Do they reference wanting to 'end it all' or say it's 'not worth living'? Provide a step-by-step chain of thought in your analysis. Please output: 1. risk_type: // suicidality, 2. chain_of_thought: // Provide your step-by-step reasoning process. 3. risk_output: // 'risk present' : this means there is evidence this risk is present in the case 'risk not present' : there is evidence the risk is NOT present or there is no evidence whether the case contains that risk or not. (If in doubt, it is better to err on the side of caution and say 'risk present') 4. explanation: // State words/terms that indicate the reason the risk_output was chosen. Be brief in your explanation. State facts found in the text, do not infer. E.g. 'Client expressed suicidal ideation'. Leave blank for 'risk not present.'"
)

output_format_prompt = (
    'Output must be STRICT JSON: a single object containing the risk type, chain of thought, risk output, and explanation, formatted like this: {"risk_type": "suicidality", "chain_of_thought": str, "risk_output": str, "explanation": str}'
)

In [2]:
invalid_analysis_comments = '''
Make sure the output is STRICT JSON following the format {"risk_type": "suicidality", "chain_of_thought": str, "risk_output": str, "explanation": str}
The only valid value for risk_type is "suicidality"; do not create a new key.
'''

In [3]:
# Define output schema
output_schema = {
    'key_to_extract': 'risk_output',
    'value_mapping': {
        'risk_present': 1,
        'risk_not_present': 0
    },
    'regex_pattern': r'"risk_output":\s*"(.*?)"',
    'chain_of_thought_key': 'chain_of_thought',  
    'chain_of_thought_regex': r'"chain_of_thought":\s*"(.*?)"',
    'use_json_mode': True,
}
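
The schema above pairs a regex with a value mapping. As a rough illustration of how those fields could be applied to a raw model response, here is a minimal sketch. `extract_label` is a hypothetical helper and not part of `src.iterative_prompt_optimization`; treating spaces and underscores as equivalent is also an assumption, made here because the prompt asks for 'risk present' while the mapping keys use 'risk_present'.

```python
import json
import re

# Hypothetical helper: illustrates how the output_schema fields could be used.
# It is NOT the library's actual implementation.
schema = {
    "key_to_extract": "risk_output",
    "value_mapping": {"risk_present": 1, "risk_not_present": 0},
    "regex_pattern": r'"risk_output":\s*"(.*?)"',
}


def extract_label(raw_output, schema):
    """Return the mapped integer label, or None when nothing usable is found."""
    value = None
    try:
        # First attempt: the whole response is valid JSON.
        parsed = json.loads(raw_output)
        if isinstance(parsed, dict):
            value = parsed.get(schema["key_to_extract"])
    except json.JSONDecodeError:
        # Fallback: pull the field out of the text with the schema's regex.
        match = re.search(schema["regex_pattern"], raw_output, re.DOTALL)
        if match:
            value = match.group(1)
    if not isinstance(value, str):
        return None
    # Assumption: "risk present" and "risk_present" should map to the same label.
    key = value.strip().lower().replace(" ", "_")
    return schema["value_mapping"].get(key)
```

A response that contains neither valid JSON nor a quoted `"risk_output"` field yields `None`, which corresponds to the "Invalid Output Format" case reported during evaluation.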

In [4]:
# Set number of optimization iterations
iterations = 3

In [5]:
# Define model providers and models for evaluation and optimization
eval_provider = "ollama"
eval_model = "llama3.1"
optim_provider = "ollama"
optim_model = "llama3.1"

In [6]:
# Path to the CSV file containing the labelled risk cases for evaluation
eval_datapath = "risks.csv"

------------------------------------------------------------------------------------------

In [7]:
# Import necessary libraries
import pandas as pd
import sys
import os
# Add the parent directory to sys.path
# Use getcwd() to get the current working directory for Jupyter notebooks
current_dir = os.getcwd()
parent_dir = os.path.dirname(current_dir)
sys.path.append(parent_dir)
from src.iterative_prompt_optimization import optimize_prompt

In [8]:
# Load and prepare data
eval_data = pd.read_csv(eval_datapath, encoding='ISO-8859-1', usecols=['text', 'label'])
# Randomly select 10 positive and 10 negative samples
eval_data = (
    eval_data.groupby('label')
    .apply(lambda x: x.sample(n=10, random_state=42))
    .reset_index(drop=True)
)
# Shuffle the DataFrame randomly
eval_data = eval_data.sample(frac=1, random_state=42).reset_index(drop=True)
print(f"Evaluation data shape: {eval_data.shape}")
print(eval_data.head())

Evaluation data shape: (20, 2)
                                                text  label
0  Good afternoon <person>, I refer to the compla...      0
1  C said to <supplier> yesterday that she wishes...      1
2  Health concerns - consumer has mental health i...      1
3  Please confirm if the final balance on the acc...      0
4  I am writing to request a status update on the...      0


In [9]:
# Run the prompt optimization process
best_prompt, best_metrics = optimize_prompt(
    initial_prompt=initial_prompt,
    output_format_prompt=output_format_prompt,
    eval_data=eval_data,
    iterations=iterations,
    eval_provider=eval_provider,
    eval_model=eval_model,
    eval_temperature=0.7,
    optim_provider=optim_provider,
    optim_model=optim_model,
    optim_temperature=0,
    output_schema=output_schema,
    use_cache=True,  # Set to False to disable caching
    fp_comments="",
    fn_comments="",
    tp_comments="",
    invalid_comments=invalid_analysis_comments,
    validation_comments="",
)
# After running the optimization process, you can analyze the results by checking 
# the generated log files in the `runs/prompt_optimization_logs_YYYYMMDD_HHMMSS` directory.

Selected evaluation provider: ollama
Selected evaluation model: llama3.1
Evaluation temperature: 0.7
Selected optimization provider: ollama
Selected optimization model: llama3.1
Optimization temperature: 0
Estimated token usage: 51600
Estimated cost: $0 API Costs - Running on Local Hardware

Do you want to proceed with the optimization? (Y/N): 
Iteration 1/3


Processing text 1/20
Using cached output for text 1/20
Python literal evaluation failed...
Trying JSON parsing...
JSON parsing failed...
Trying JSON-like structure extraction...
All parsing methods failed!
Prediction 1/20: None | Ground Truth: 0 🛠️ (Invalid Output Format) - Raw output: There doesn't appear to be any mention of self-harm or suicidal thoughts in this conversation. 1. risk_type: 2. chain_of_thought:	* Did the client express any desire for physical harm towards themselves?	* Was there a mention of suicidal ideation or a statement about wanting to 'end it all'?	* Were there implications of property damage?3. risk_output: risk not present4. explanation:
Processing text 2/20
Using cached output for text 2/20
Python literal evaluation failed...
Trying JSON parsing...
JSON parsing failed...
Trying JSON-like structure extraction...
All parsing methods failed!
Prediction 2/20: None | Ground Truth: 1 🛠️ (Invalid Output Format) - Raw output: Based on the information provided, here 


Analyzing misclassifications, true positives, and invalid outputs...


KeyboardInterrupt: 
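
The log above reports three parsing attempts in sequence (Python literal evaluation, JSON parsing, JSON-like structure extraction) before the output is marked invalid. A minimal sketch of such a fallback chain is shown below, using only the standard library. `parse_model_output` is a hypothetical name, and the third step (grabbing the first `{...}` span) is an assumption about what "JSON-like structure extraction" might mean, not the library's actual code.

```python
import ast
import json
import re


def parse_model_output(raw):
    """Try progressively looser parsers; return a dict, or None if all fail."""
    # 1. Python literal evaluation (accepts single-quoted dicts).
    try:
        result = ast.literal_eval(raw.strip())
        if isinstance(result, dict):
            return result
    except (ValueError, SyntaxError, TypeError):
        pass

    # 2. Strict JSON parsing of the whole response.
    try:
        result = json.loads(raw)
        if isinstance(result, dict):
            return result
    except json.JSONDecodeError:
        pass

    # 3. Assumed "JSON-like" step: parse the first {...} span inside prose.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match:
        try:
            result = json.loads(match.group(0))
            if isinstance(result, dict):
                return result
        except json.JSONDecodeError:
            pass

    # Nothing worked: the caller can treat this as an invalid output format.
    return None
```

Under this sketch, a free-text response laid out as a numbered list (such as the raw output logged for text 1/20) contains no braces at all, so every step fails and the function returns `None`.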