## 📚 Step 1: Import Everything

This imports all the functions we need to run this notebook.

In [1]:
from prompt_runner import (
    load_environment,
    load_configuration, 
    load_dataset,
    tasks_to_dataframe,
    run_all_tasks,
    results_summary_dataframe,
    save_results,
    create_results_table
)
import os
import pandas as pd
from litellm import supports_response_schema
pd.set_option('display.max_colwidth', None)


print("✅ Imports complete!")

✅ Imports complete!


## 🔍 Step 2: Check Which Models Support Structured Output

Before we start, let's see which models work with our Pydantic structured output. Structured output constrains a model's reply to the format we want, such as answering only "positive" or "negative".
This helps you choose the right models for your config.json file, which controls which models we use.
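
To make the idea of a constrained reply concrete, here is a minimal sketch using only the standard library rather than Pydantic. The names `ALLOWED_LABELS` and `parse_sentiment`, and the `sentiment` key, are assumptions for illustration and are not part of `prompt_runner`.

```python
import json

# Hypothetical label set; the notebook's real schema is defined in prompt_runner.
ALLOWED_LABELS = {"positive", "negative"}


def parse_sentiment(raw_reply: str) -> str:
    """Parse a JSON reply and reject any label outside the allowed set."""
    data = json.loads(raw_reply)
    label = data.get("sentiment")  # assumed field name
    if label not in ALLOWED_LABELS:
        raise ValueError(f"Unexpected label: {label!r}")
    return label


print(parse_sentiment('{"sentiment": "positive"}'))
```

A model that supports structured output is asked to produce replies that already satisfy this kind of check, so free-form answers such as "mostly good" should not reach your results.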

In [2]:
print("🔍 Checking model support for structured output (Pydantic)...")
# Test models for json_schema support (what we actually need)
test_models = [
    ("gpt-4o-mini", "openai"),
    ("gpt-4o", "openai"), 
    ("gpt-3.5-turbo", "openai"),
    ("claude-3-haiku-20240307", "anthropic"),
    ("claude-3-5-sonnet-20241022", "anthropic"),
]

supported = []
for model, provider in test_models:
    try:
        # Use the correct function for checking Pydantic/json_schema support
        has_support = supports_response_schema(model=model, custom_llm_provider=provider)
        if has_support:
            supported.append(f"{model} ({provider})")
            print(f"✅ {model} - Supports Pydantic structured output")
        else:
            print(f"❌ {model} - No Pydantic structured output support")
    except Exception as e:
        print(f"❓ {model} - Could not check: {str(e)}")
        
print(f"\n💡 Supported models for config.json: {supported}")

🔍 Checking model support for structured output (Pydantic)...
✅ gpt-4o-mini - Supports Pydantic structured output
✅ gpt-4o - Supports Pydantic structured output
❌ gpt-3.5-turbo - No Pydantic structured output support
✅ claude-3-haiku-20240307 - Supports Pydantic structured output
✅ claude-3-5-sonnet-20241022 - Supports Pydantic structured output

💡 Supported models for config.json: ['gpt-4o-mini (openai)', 'gpt-4o (openai)', 'claude-3-haiku-20240307 (anthropic)', 'claude-3-5-sonnet-20241022 (anthropic)']


## 🔑 Step 3: Load Environment Variables

This loads your API keys from the .env file.
Make sure you have the API keys you need for the providers you want to use.
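
If you want to check for missing keys yourself, the following is a small sketch of one way to do it. `missing_keys` is a hypothetical helper, not a `prompt_runner` function, and it takes an explicit mapping so the example does not depend on your real environment.

```python
import os


def missing_keys(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Example with a fake environment: one key set, one empty, one absent.
fake_env = {"OPENAI_API_KEY": "sk-example", "ANTHROPIC_API_KEY": ""}
print(missing_keys(["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY"], fake_env))
```

You only need keys for the providers whose models appear in your config.json.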

In [3]:
load_environment()

# Warn about any expected API key that is still missing after loading
api_keys_to_check = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY"]

for key in api_keys_to_check:
    if not os.getenv(key):
        print(f"⚠️ {key} is not set")

✓ Found 3 API key(s) in .env:
   - OPENAI_API_KEY
   - ANTHROPIC_API_KEY
   - GEMINI_API_KEY
✓ Environment variables loaded from .env


## ⚙️ Step 4: Load Configuration

This reads your config.json file which contains:
- Which prompts to test
- Which models to use
- What temperatures to try
- How to format the output
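
For orientation, here is a sketch of what such a file might contain, written as a Python dict and printed as JSON. The overall layout is an assumption: this notebook only confirms the `data_settings.sample_size` and `data_settings.label_column` keys, and the real config.json may name or nest the other settings differently.

```python
import json

# Hypothetical shape. Only data_settings.sample_size and
# data_settings.label_column are confirmed by this notebook.
example_config = {
    "prompts": {
        "sentiment_basic": "Classify this movie review as either positive or negative: '{text}'",
    },
    "models": ["gpt-4o-mini", "claude-3-haiku-20240307"],
    "temperatures": [0.0, 0.7],
    "data_settings": {"sample_size": 10, "label_column": "sentiment"},
}

print(json.dumps(example_config, indent=2))
```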

In [4]:
config = load_configuration("config.json")

print(f"\n💡 To modify your experiment:")
print(f"   📝 Edit config.json to change:")
print(f"      - prompts: Add new prompt templates")
print(f"      - models: Try different LLMs")
print(f"      - temperatures: Test the effect on consistency")
print(f"      - sample_size: Test on more data (currently {config['data_settings']['sample_size']})")
print(f"\n🔄 After editing config.json, restart this notebook to see changes")

✓ 3 prompts, 2 models, 2 temperatures

📝 Prompts:
   1. sentiment_basic: Classify this movie review as either positive or negative: '{text}'

   2. sentiment_instruction: Read the following movie review and determine if it's positive or negative.

Review: '{text}'

   3. sentiment_bad: hey is this good or bad idk: '{text}'


💡 To modify your experiment:
   📝 Edit config.json to change:
      - prompts: Add new prompt templates
      - models: Try different LLMs
      - temperatures: Test the effect on consistency
      - sample_size: Test on more data (currently 10)

🔄 After editing config.json, restart this notebook to see changes


## 📁 Step 5: Load Your Data

This loads your CSV file and samples data for testing.
It automatically balances the classes (equal positive/negative samples).
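
The balancing idea can be sketched in plain Python as follows. This is not how `load_dataset` is implemented (its source is not shown here); `balanced_sample` and its parameters are hypothetical, and the sketch assumes the sample size divides evenly across the labels.

```python
import random
from collections import defaultdict


def balanced_sample(rows, label_key, sample_size, seed=0):
    """Draw `sample_size` rows with an equal number from each label."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    per_label = sample_size // len(by_label)
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    picked = []
    for label in sorted(by_label):
        # random.sample raises ValueError if a label has too few rows
        picked.extend(rng.sample(by_label[label], per_label))
    return picked


rows = [
    {"review": f"review {i}", "sentiment": "positive" if i % 2 else "negative"}
    for i in range(100)
]
sample = balanced_sample(rows, "sentiment", 10)
print(len(sample))
```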

In [5]:
samples = load_dataset("IMDB Dataset.csv", config)

print(f"📊 Loaded {len(samples)} samples")
print(f"📈 Label distribution:")
label_counts = samples[config['data_settings']['label_column']].value_counts()
for label, count in label_counts.items():
    print(f"   - {label}: {count}")

# Show sample data
samples.head(3)

✓ 10 samples loaded
📊 Loaded 10 samples
📈 Label distribution:
   - positive: 5
   - negative: 5


,review,sentiment
0,"Some films just simply should not be remade. This is one of them. In and of itself it is not a bad film. But it fails to capture the flavor and the terror of the 1963 film of the same title. Liam Neeson was excellent as he always is, and most of the cast holds up, with the exception of Owen Wilson, who just did not bring the right feel to the character of Luke. But the major fault with this version is that it strayed too far from the Shirley Jackson story in it's attempts to be grandiose and lost some of the thrill of the earlier film in a trade off for snazzier special effects. Again I will say that in and of itself it is not a bad film. But you will enjoy the friction of terror in the older version much more.",positive
1,"""Ardh Satya"" is one of the finest film ever made in Indian Cinema. Directed by the great director Govind Nihalani, this one is the most successful Hard Hitting Parallel Cinema which also turned out to be a Commercial Success. Even today, Ardh Satya is an inspiration for all leading directors of India.<br /><br />The film tells the Real-life Scenario of Mumbai Police of the 70s. Unlike any Police of other cities in India, Mumbai Police encompasses a Different system altogether. Govind Nihalani creates a very practical Outlay with real life approach of Mumbai Police Environment.<br /><br />Amongst various Police officers & colleagues, the film describes the story of Anand Velankar, a young hot-blooded Cop coming from a poor family. His father is a harsh Police Constable. Anand himself suffers from his father's ideologies & incidences of his father's Atrocities on his mother. Anand's approach towards immediate action against crime, is an inert craving for his own Job satisfaction. The film is here revolved in a Plot wherein Anand's constant efforts against crime are trampled by his seniors.This leads to frustrations, as he cannot achieve the desired Job-satisfaction. Resulting from the frustrations, his anger is expressed in excessive violence in the remand rooms & bars, also turning him to an alcoholic.<br /><br />The Spirit within him is still alive, as he constantly fights the system. He is aware of the system of the Metro, where the Police & Politicians are a inertly associated by far end. His compromise towards unethical practice is negative. Finally he gets suspended.<br /><br />The Direction is a master piece & thoroughly hard core. One of the best memorable scenes is when Anand breaks in the Underworld gangster Rama Shetty's house to arrest him, followed by short conversation which is fantastic. At many scenes, the film has Hair-raising moments.<br /><br />The Practical approach of Script is a major Punch. 
Alcoholism, Corruption, Political Influence, Courage, Deceptions all are integral part of Mumbai police even today. Those aspects are dealt brilliantly.<br /><br />Finally, the films belongs to the One man show, Om Puri portraying Anand Velankar traversing through all his emotions absolutely brilliantly.",positive
2,"One of the other reviewers has mentioned that after watching just 1 Oz episode you'll be hooked. They are right, as this is exactly what happened with me.<br /><br />The first thing that struck me about Oz was its brutality and unflinching scenes of violence, which set in right from the word GO. Trust me, this is not a show for the faint hearted or timid. This show pulls no punches with regards to drugs, sex or violence. Its is hardcore, in the classic use of the word.<br /><br />It is called OZ as that is the nickname given to the Oswald Maximum Security State Penitentary. It focuses mainly on Emerald City, an experimental section of the prison where all the cells have glass fronts and face inwards, so privacy is not high on the agenda. Em City is home to many..Aryans, Muslims, gangstas, Latinos, Christians, Italians, Irish and more....so scuffles, death stares, dodgy dealings and shady agreements are never far away.<br /><br />I would say the main appeal of the show is due to the fact that it goes where other shows wouldn't dare. Forget pretty pictures painted for mainstream audiences, forget charm, forget romance...OZ doesn't mess around. The first episode I ever saw struck me as so nasty it was surreal, I couldn't say I was ready for it, but as I watched more, I developed a taste for Oz, and got accustomed to the high levels of graphic violence. Not just violence, but injustice (crooked guards who'll be sold out for a nickel, inmates who'll kill on order and get away with it, well mannered, middle class inmates being turned into prison bitches due to their lack of street skills or prison experience) Watching Oz, you may become comfortable with what is uncomfortable viewing....thats if you can get in touch with your darker side.",positive


## 🎯 Step 6: Create All Task Combinations

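This creates one task for every combination of prompt × model × temperature × sample. With the configuration above (3 prompts, 2 models, 2 temperatures) and 10 samples, that is 3 × 2 × 2 × 10 = 120 tasks.

This is where we define exactly which experiments to run!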
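
A cross product like this can be sketched with `itertools.product`. The task fields below, in particular `sample_id`, are illustrative assumptions; `tasks_to_dataframe` may build its rows differently.

```python
from itertools import product

prompts = ["sentiment_basic", "sentiment_instruction", "sentiment_bad"]
models = ["gpt-4o-mini", "claude-3-haiku-20240307"]
temperatures = [0.0, 0.7]
sample_ids = range(10)  # stand-in for the 10 sampled reviews

# One task per (prompt, model, temperature, sample) combination.
tasks = [
    {"prompt_id": p, "model": m, "temperature": t, "sample_id": s}
    for p, m, t, s in product(prompts, models, temperatures, sample_ids)
]

print(len(tasks))  # 120
```

Because the count is a product, doubling any one setting doubles the number of API calls, which is worth keeping in mind before editing config.json.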

In [7]:
tasks_df = tasks_to_dataframe(config, samples)

print(f"🎯 Total: {len(tasks_df)} tasks")
print(f"⏱️  Estimated time: {tasks_df.attrs['estimated_time_min']:.1f} minutes")

# Show sample tasks
tasks_df[['prompt_id', 'model', 'temperature', 'text']].head()

🎯 Total: 120 tasks
⏱️  Estimated time: 1.0 minutes


,prompt_id,model,temperature,text
0,sentiment_basic,gpt-4o-mini,0.0,"Some films just simply should not be remade. This is one of them. In and of itself it is not a bad film. But it fails to capture the flavor and the terror of the 1963 film of the same title. Liam Neeson was excellent as he always is, and most of the cast holds up, with the exception of Owen Wilson, who just did not bring the right feel to the character of Luke. But the major fault with this version is that it strayed too far from the Shirley Jackson story in it's attempts to be grandiose and lost some of the thrill of the earlier film in a trade off for snazzier special effects. Again I will say that in and of itself it is not a bad film. But you will enjoy the friction of terror in the older version much more."
1,sentiment_basic,gpt-4o-mini,0.0,"""Ardh Satya"" is one of the finest film ever made in Indian Cinema. Directed by the great director Govind Nihalani, this one is the most successful Hard Hitting Parallel Cinema which also turned out to be a Commercial Success. Even today, Ardh Satya is an inspiration for all leading directors of India.<br /><br />The film tells the Real-life Scenario of Mumbai Police of the 70s. Unlike any Police of other cities in India, Mumbai Police encompasses a Different system altogether. Govind Nihalani creates a very practical Outlay with real life approach of Mumbai Police Environment.<br /><br />Amongst various Police officers & colleagues, the film describes the story of Anand Velankar, a young hot-blooded Cop coming from a poor family. His father is a harsh Police Constable. Anand himself suffers from his father's ideologies & incidences of his father's Atrocities on his mother. Anand's approach towards immediate action against crime, is an inert craving for his own Job satisfaction. The film is here revolved in a Plot wherein Anand's constant efforts against crime are trampled by his seniors.This leads to frustrations, as he cannot achieve the desired Job-satisfaction. Resulting from the frustrations, his anger is expressed in excessive violence in the remand rooms & bars, also turning him to an alcoholic.<br /><br />The Spirit within him is still alive, as he constantly fights the system. He is aware of the system of the Metro, where the Police & Politicians are a inertly associated by far end. His compromise towards unethical practice is negative. Finally he gets suspended.<br /><br />The Direction is a master piece & thoroughly hard core. One of the best memorable scenes is when Anand breaks in the Underworld gangster Rama Shetty's house to arrest him, followed by short conversation which is fantastic. At many scenes, the film has Hair-raising moments.<br /><br />The Practical approach of Script is a major Punch. 
Alcoholism, Corruption, Political Influence, Courage, Deceptions all are integral part of Mumbai police even today. Those aspects are dealt brilliantly.<br /><br />Finally, the films belongs to the One man show, Om Puri portraying Anand Velankar traversing through all his emotions absolutely brilliantly."
2,sentiment_basic,gpt-4o-mini,0.0,"One of the other reviewers has mentioned that after watching just 1 Oz episode you'll be hooked. They are right, as this is exactly what happened with me.<br /><br />The first thing that struck me about Oz was its brutality and unflinching scenes of violence, which set in right from the word GO. Trust me, this is not a show for the faint hearted or timid. This show pulls no punches with regards to drugs, sex or violence. Its is hardcore, in the classic use of the word.<br /><br />It is called OZ as that is the nickname given to the Oswald Maximum Security State Penitentary. It focuses mainly on Emerald City, an experimental section of the prison where all the cells have glass fronts and face inwards, so privacy is not high on the agenda. Em City is home to many..Aryans, Muslims, gangstas, Latinos, Christians, Italians, Irish and more....so scuffles, death stares, dodgy dealings and shady agreements are never far away.<br /><br />I would say the main appeal of the show is due to the fact that it goes where other shows wouldn't dare. Forget pretty pictures painted for mainstream audiences, forget charm, forget romance...OZ doesn't mess around. The first episode I ever saw struck me as so nasty it was surreal, I couldn't say I was ready for it, but as I watched more, I developed a taste for Oz, and got accustomed to the high levels of graphic violence. Not just violence, but injustice (crooked guards who'll be sold out for a nickel, inmates who'll kill on order and get away with it, well mannered, middle class inmates being turned into prison bitches due to their lack of street skills or prison experience) Watching Oz, you may become comfortable with what is uncomfortable viewing....thats if you can get in touch with your darker side."
3,sentiment_basic,gpt-4o-mini,0.0,"Return to the 36th Chamber is one of those classic Kung-Fu movies which Shaw produces back in the 70s and 80s, whose genre is equivalent to the spaghetti westerns of Hollywood, and the protagonist Gordon Liu, the counterpart to the western's Clint Eastwood. Digitally remastered and a new print made for the Fantastic Film Fest, this is ""Presented in Shaw Scope"", just like the good old days.<br /><br />This film is a simple story of good versus evil, told in 3 acts, which more or less sums up the narrative of martial arts films in that era.<br /><br />Act One sets up the premise. Workers in a dye-mill of a small village are unhappy with their lot, having their wages cut by 20% by incoming manchu gangsters. They can't do much about their exploitation because none of them are martial arts skilled to take on the gangsters, and their boss. At first they had a minor success in getting Liu to impersonate a highly skilled Shaolin monk (one of the best comedy sequences), but their rouse got exposed when they pushed the limit of credibility by impersonating one too many times.<br /><br />Act Two shows the protagonist wanting to get back at the mob. However, without real martial arts, he embarks on a journey to Shaolin Temple, to try and infiltrate and learn martial arts on the sly. After some slapstick moments, he finally gets accepted by the abbot (whom he impersonated!) but is disappointed at the teaching methods - kinda like Mr Miyagi's style in Karate Kid, but instead of painting fences, he gets to erect scaffoldings all around the temple. Nothing can keep a good man down, and he unwittingly builds strength, endurance and learns kung-fu the unorthodox way.<br /><br />Act Three is where the fight fest begins. With cheesy sound effects, each obvious non-contact on film is given the maximum impact treatment. 
But it is rather refreshing watching the fight scenes here, with its wide angled shots to highlight clarity and detail between the sparring partners, and the use of slow-motion only to showcase stunts in different angles. You may find the speed of fights a tad too slow, with some pause in between moves, but with Yuen Wo Ping and his style being used ad-nausem in Hollywood flicks, they sure don't make fight scenes like they used to! Return to the 36th chamber gets a repeat screening on Monday, so, if you're game for a nostalgic trip down memory lane, what are you waiting for?"
4,sentiment_basic,gpt-4o-mini,0.0,"What an absolutely stunning movie, if you have 2.5 hrs to kill, watch it, you won't regret it, it's too much fun! Rajnikanth carries the movie on his shoulders and although there isn't anything more other than him, I still liked it. The music by A.R.Rehman takes time to grow on you but after you heard it a few times, you really start liking it."


## 🚀 Step 7: Run All Prompt Tasks

This is where the magic happens! 
- Makes API calls to test each prompt
- Uses parallel processing to go faster
- Shows real-time progress
- Handles errors gracefully

⚠️ This will cost money (API calls) and take a few minutes!
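
The parallel pattern with per-task error handling can be sketched with the standard library's thread pool. This is an illustration under assumptions, not the implementation of `run_all_tasks`: `fake_api_call` is a stand-in that makes no network requests, and the result fields are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_api_call(task):
    """Stand-in for a real model call; fails on one task to show error handling."""
    if task["id"] == 3:
        raise RuntimeError("simulated API error")
    return "positive"


def run_tasks(tasks, max_workers=4):
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fake_api_call, task): task for task in tasks}
        for done, future in enumerate(as_completed(futures), start=1):
            task = futures[future]
            try:
                results.append({**task, "prediction": future.result(), "error": None})
            except Exception as exc:
                # Record the failure instead of aborting the whole run.
                results.append({**task, "prediction": None, "error": str(exc)})
            print(f"Progress: {done}/{len(tasks)}")
    return results


results = run_tasks([{"id": i} for i in range(8)])
failed = [r for r in results if r["error"]]
print(f"{len(results)} finished, {len(failed)} failed")
```

Threads suit this workload because each task spends most of its time waiting on a network response. Results arrive in completion order, so sort them afterwards if order matters.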

In [8]:
print("🚀 Starting prompt execution...")
print("💰 Note: This will make API calls and cost money!")

# Convert DataFrame to tasks list for execution
tasks = tasks_df.to_dict('records')
results = run_all_tasks(tasks, config, max_workers=4)

print(f"\n🎉 Completed {len(results)} tasks!")

# Show sample results
results_df = results_summary_dataframe(results)
print(f"✅ Success rate: {results_df.attrs['success_rate']:.1f}%")

results_df[['prompt_id', 'model', 'temperature', 'true_label', 'prediction']].head()

🚀 Starting prompt execution...
💰 Note: This will make API calls and cost money!
Running 120 tasks with 4 parallel workers...
Progress: 120/120 (100%)
✓ Completed all 120 tasks

🎉 Completed 120 tasks!
✅ Success rate: 100.0%


,prompt_id,model,temperature,true_label,prediction
0,sentiment_basic,gpt-4o-mini,0.0,positive,positive
1,sentiment_basic,gpt-4o-mini,0.0,positive,negative
2,sentiment_basic,gpt-4o-mini,0.0,positive,positive
3,sentiment_basic,gpt-4o-mini,0.0,positive,positive
4,sentiment_basic,gpt-4o-mini,0.0,positive,positive


## 📈 Step 8: Quick Analysis

This gives you a fast overview of how well each combination performed.
Look for patterns:
- Which prompts work better?
- Does temperature matter?
- Are there differences between models?
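
The per-combination accuracy behind such a table can be computed along these lines. The sketch assumes each result is a dict with the `prompt_id`, `model`, `temperature`, `true_label`, and `prediction` fields seen in the results DataFrame; `accuracy_by_combination` is a hypothetical helper, not part of `prompt_runner`.

```python
from collections import defaultdict


def accuracy_by_combination(results):
    """Map (prompt_id, model, temperature) to (correct, total, accuracy)."""
    counts = defaultdict(lambda: [0, 0])  # key -> [correct, total]
    for r in results:
        key = (r["prompt_id"], r["model"], r["temperature"])
        counts[key][0] += int(r["prediction"] == r["true_label"])
        counts[key][1] += 1
    return {key: (c, n, c / n) for key, (c, n) in counts.items()}


toy_results = [
    {"prompt_id": "sentiment_basic", "model": "gpt-4o-mini", "temperature": 0.0,
     "true_label": "positive", "prediction": "positive"},
    {"prompt_id": "sentiment_basic", "model": "gpt-4o-mini", "temperature": 0.0,
     "true_label": "positive", "prediction": "negative"},
]
summary = accuracy_by_combination(toy_results)
print(summary)
```

With only 10 samples per combination, a single misclassified review moves accuracy by 10 percentage points, so treat small differences between rows with caution.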

In [9]:
create_results_table(results)

Prompt Testing Results
12 combinations tested
Prompt,Model,Temperature,Accuracy,Correct,Total
sentiment_bad,claude-3-haiku-20240307,0.0,90.0%,9,10
sentiment_bad,claude-3-haiku-20240307,0.7,90.0%,9,10
sentiment_bad,gpt-4o-mini,0.0,90.0%,9,10
sentiment_bad,gpt-4o-mini,0.7,90.0%,9,10
sentiment_basic,claude-3-haiku-20240307,0.0,90.0%,9,10
sentiment_basic,claude-3-haiku-20240307,0.7,90.0%,9,10
sentiment_basic,gpt-4o-mini,0.0,90.0%,9,10
sentiment_basic,gpt-4o-mini,0.7,90.0%,9,10
sentiment_instruction,claude-3-haiku-20240307,0.0,90.0%,9,10
sentiment_instruction,claude-3-haiku-20240307,0.7,90.0%,9,10


## 💾 Step 9 (Optional): Save Results to Files

This saves your results as a CSV file.
Files are timestamped so you won't overwrite previous experiments.
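
Timestamped file names can be sketched as below. The `results_YYYYMMDD_HHMMSS.csv` pattern and the `save_rows` helper are assumptions for illustration; `save_results` may use a different naming scheme.

```python
import csv
import tempfile
from datetime import datetime
from pathlib import Path


def save_rows(rows, output_dir, now=None):
    """Write rows (a list of dicts) to a CSV file named after the current time."""
    now = now or datetime.now()
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"results_{now:%Y%m%d_%H%M%S}.csv"
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path


# Write into a temporary directory with a fixed time so the name is predictable.
with tempfile.TemporaryDirectory() as tmp:
    path = save_rows(
        [{"prompt_id": "sentiment_basic", "prediction": "positive"}],
        tmp,
        now=datetime(2024, 1, 2, 3, 4, 5),
    )
    print(path.name)  # results_20240102_030405.csv
```

A second-level timestamp only avoids collisions when runs start at least one second apart.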

In [10]:
file_path = save_results(results, output_path="experiment_results")