In [1]:
initial_prompt = (
    "You are a sentiment analysis classifier. Determine whether the provided text expresses a positive sentiment. "
    "Respond with '1' if it is positive, or '0' if it is negative."
)

# Output format prompt
output_format_prompt = (
    "You are to act as a binary responder. For every question asked, reply strictly with '1' for positive or '0' for negative. "
    "Do NOT include any additional text or explanation."
)

In [2]:
output_schema = {
    'key_to_extract': None,  # Set to None for direct output
    'value_mapping': None,   # Set to None for direct mapping
    'regex_pattern': r'^(0|1)$',  # Match the entire output for binary classification
    'use_json_mode': False
}
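
The `regex_pattern` above accepts only a bare `0` or `1`. As a minimal sketch of what that implies for raw model outputs, the snippet below applies the same pattern with Python's `re` module. The helper name `parse_binary_output` and the whitespace stripping are assumptions made for illustration; they are not taken from the library, which may validate outputs differently.

```python
import re

# Same pattern as output_schema['regex_pattern'].
BINARY_PATTERN = re.compile(r'^(0|1)$')

def parse_binary_output(raw_output):
    """Hypothetical helper: return 0 or 1 if the output matches the pattern, else None."""
    # Stripping surrounding whitespace is an assumption, not confirmed library behaviour.
    match = BINARY_PATTERN.match(raw_output.strip())
    return int(match.group(1)) if match else None

# "-1" is the raw output that appears as invalid in the run logs further down.
for raw in ["1", "0", "-1", "1 (positive)"]:
    print(repr(raw), "->", parse_binary_output(raw))
```

Anything that is not exactly `0` or `1`, such as `-1` or a label followed by an explanation, is rejected rather than coerced, which is why such outputs are reported as an invalid output format instead of being scored.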

In [3]:
# Set number of optimization iterations
iterations = 10

In [4]:
# Define model providers and models for evaluation and optimization
eval_provider = "ollama"
eval_model = "llama3.1"
optim_provider = "ollama"
optim_model = "llama3.1"

In [5]:
# Path to the CSV file containing review data for evaluation
eval_datapath = "sentiments.csv"
sample_size = 20

------------------------------------------------------------------------------------------

In [6]:
# Import necessary libraries
import pandas as pd
import sys
import os
# Add the parent directory to sys.path
# Use getcwd() to get the current working directory for Jupyter notebooks
current_dir = os.getcwd()
parent_dir = os.path.dirname(current_dir)
sys.path.append(parent_dir)
from src.iterative_prompt_optimization import optimize_prompt

In [7]:
# Load and prepare data
eval_data = pd.read_csv(eval_datapath, encoding='ISO-8859-1', usecols=['Text', 'Sentiment'])
eval_data.columns = ['text', 'label']
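# Randomly select sample_size / 2 rows per class (10 positive and 10 negative here).
# Sampling each group explicitly keeps the 'label' column, which newer pandas
# versions exclude from groupby().apply().
eval_data = pd.concat(
    [group.sample(n=sample_size // 2, random_state=42) for _, group in eval_data.groupby('label')]
).reset_index(drop=True)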
# Shuffle the DataFrame randomly
eval_data = eval_data.sample(frac=1, random_state=42).reset_index(drop=True)
print(f"Evaluation data shape: {eval_data.shape}")
print(eval_data.head())

Evaluation data shape: (20, 2)
                                                text  label
0  I hate guns and have never murdered anyone, bu...      0
1  "A Cry in the Dark" is a masterful piece of ci...      1
2  I was watching the Perfect Storm, and thought ...      1
3  If you ask me the first one was really better ...      0
4  The movie 'Heart of Darkness', based on the 18...      0
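
Before starting a long optimization run, it can help to confirm that the sampled frame really contains a balanced 0/1 split. The following is a small sketch of such a check on a toy frame with the same `text` and `label` columns; `check_balanced_binary` is a hypothetical helper, not part of the notebook or the library.

```python
import pandas as pd

def check_balanced_binary(df, label_col="label"):
    """Hypothetical helper: return per-class counts, raising if labels are not a balanced 0/1 split."""
    counts = df[label_col].value_counts().to_dict()
    if set(counts) != {0, 1}:
        raise ValueError(f"expected labels 0 and 1, got {sorted(counts)}")
    if counts[0] != counts[1]:
        raise ValueError(f"unbalanced classes: {counts}")
    return counts

# Toy data standing in for eval_data; the real frame comes from sentiments.csv.
toy = pd.DataFrame({"text": ["good", "bad", "great", "awful"], "label": [1, 0, 1, 0]})
print(check_balanced_binary(toy))
```

Calling `check_balanced_binary(eval_data)` after the sampling cell would apply the same check to the real data.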


In [8]:
# best_prompt, best_metrics = optimize_prompt(
#     initial_prompt,
#     output_format_prompt,
#     eval_data,
#     iterations)

In [9]:
# Run the prompt optimization process
best_prompt, best_metrics = optimize_prompt(
    initial_prompt,
    output_format_prompt,
    eval_data,
    iterations,
    eval_provider=eval_provider,
    eval_model=eval_model,
    optim_provider=optim_provider,
    optim_model=optim_model,
    output_schema=output_schema,
    use_cache=True  # Set to False if you want to disable caching
)
# After running the optimization process, you can analyze the results by checking 
# the generated log files in the `runs/prompt_optimization_logs_YYYYMMDD_HHMMSS` directory.

Selected evaluation provider: ollama
Selected evaluation model: llama3.1
Selected optimization provider: ollama
Selected optimization model: llama3.1
Estimated token usage: 1071600
Estimated cost: $0 API Costs - Running on Local Hardware

Do you want to proceed with the optimization? (Y/N): 
Iteration 1/10


Processing text 1/20
Using cached output for text 1/20
Prediction 1/20: 1 | Ground Truth: 0 ❌ (FP)
Processing text 2/20
Using cached output for text 2/20
Prediction 2/20: 1 | Ground Truth: 1 ✅ (TP)
Processing text 3/20
Using cached output for text 3/20
Prediction 3/20: 1 | Ground Truth: 1 ✅ (TP)
Processing text 4/20
Using cached output for text 4/20
Prediction 4/20: 0 | Ground Truth: 0 ✅ (TN)
Processing text 5/20
Using cached output for text 5/20
Prediction 5/20: 1 | Ground Truth: 0 ❌ (FP)
Processing text 6/20
Using cached output for text 6/20
Prediction 6/20: 0 | Ground Truth: 0 ✅ (TN)
Processing text 7/20
Using cached output for text 7/20
Prediction 7/20: 1 | Ground Truth: 1 ✅ (TP)
Processing text 8/20
Using cached output for text 8/20
Prediction 8/20: 0 | Ground Truth: 0 ✅ (TN)
Processing text 9/20
Using cached output for text 9/20
Prediction 9/20: 1 | Ground Truth: 1 ✅ (TP)
Processing text 10/20
Using cached output for text 10/20
Prediction 10/20: 1 | Ground Truth: 1 ✅ (TP)
Process


Analyzing misclassifications, true positives, and invalid outputs...



Iteration 2/10


Processing text 1/20
Prediction 1/20: 0 | Ground Truth: 0 ✅ (TN)
Processing text 2/20
Prediction 2/20: 1 | Ground Truth: 1 ✅ (TP)
Processing text 3/20
Prediction 3/20: 1 | Ground Truth: 1 ✅ (TP)
Processing text 4/20
Unable to match the output pattern: ^(0|1)$
Prediction 4/20: None | Ground Truth: 0 🛠️ (Invalid Output Format) - Raw output: -1
Processing text 5/20
Unable to match the output pattern: ^(0|1)$
Prediction 5/20: None | Ground Truth: 0 🛠️ (Invalid Output Format) - Raw output: -1
Processing text 6/20
Unable to match the output pattern: ^(0|1)$
Prediction 6/20: None | Ground Truth: 0 🛠️ (Invalid Output Format) - Raw output: -1
Processing text 7/20
Prediction 7/20: 1 | Ground Truth: 1 ✅ (TP)
Processing text 8/20
Unable to match the output pattern: ^(0|1)$
Prediction 8/20: None | Ground Truth: 0 🛠️ (Invalid Output Format) - Raw output: -1
Processing text 9/20
Prediction 9/20: 1 | Ground Truth: 1 ✅ (TP)
Processing text 10/20
Prediction 10/20: 1 | Ground Truth: 1 ✅ (TP)
Processing t


Analyzing misclassifications, true positives, and invalid outputs...



Iteration 3/10


Processing text 1/20
Prediction 1/20: 1 | Ground Truth: 0 ❌ (FP)
Processing text 2/20
Prediction 2/20: 1 | Ground Truth: 1 ✅ (TP)
Processing text 3/20
Prediction 3/20: 1 | Ground Truth: 1 ✅ (TP)
Processing text 4/20
Unable to match the output pattern: ^(0|1)$
Prediction 4/20: None | Ground Truth: 0 🛠️ (Invalid Output Format) - Raw output: -1
Processing text 5/20
Unable to match the output pattern: ^(0|1)$
Prediction 5/20: None | Ground Truth: 0 🛠️ (Invalid Output Format) - Raw output: -1
Processing text 6/20
Unable to match the output pattern: ^(0|1)$
Prediction 6/20: None | Ground Truth: 0 🛠️ (Invalid Output Format) - Raw output: -1
Processing text 7/20
Prediction 7/20: 1 | Ground Truth: 1 ✅ (TP)
Processing text 8/20
Unable to match the output pattern: ^(0|1)$
Prediction 8/20: None | Ground Truth: 0 🛠️ (Invalid Output Format) - Raw output: -1
Processing text 9/20
Prediction 9/20: 1 | Ground Truth: 1 ✅ (TP)
Processing text 10/20
Prediction 10/20: 1 | Ground Truth: 1 ✅ (TP)
Processing t


Analyzing misclassifications, true positives, and invalid outputs...
