# Export and Reuse Optimized Prompts

This notebook shows how to export GEPA-optimized prompts so they can be reused in production. You will learn how to:

1. Export as Karenina verification presets
2. Export as lightweight JSON files
3. Load saved prompts
4. Generate comparison reports
5. Use optimized prompts in verification

---

## Setup

In [None]:
import json
import sys
import tempfile
from pathlib import Path

sys.path.insert(0, str(Path.cwd().parent.parent.parent / "src"))

from karenina import Benchmark
from karenina.integrations.gepa import (
    OptimizationTarget,
    export_comparison_report,
    export_prompts_json,
    export_to_preset,
    load_prompts_json,
)
from karenina.schemas import ModelConfig, VerificationConfig

# Create temp directory for exports
temp_dir = Path(tempfile.mkdtemp(prefix="gepa_export_"))
print(f"Using temp directory: {temp_dir}")

In [None]:
# Load benchmark (adjust this path to wherever your checkpoint file lives)
benchmark_path = Path.home() / "Projects/karenina-monorepo/local_data/data/checkpoints/aime_2025.jsonld"
benchmark = Benchmark.load(benchmark_path)
print(f"Loaded: {benchmark.name}")

---

## Sample Optimized Prompts

Create sample optimized prompts that stand in for the output of a GEPA optimization run:

In [None]:
# Simulated optimization result
optimized_prompts = {
    "answering_system_prompt": """You are an expert competition mathematician specializing in AIME problems.

IMPORTANT GUIDELINES:
1. AIME answers are ALWAYS integers from 0 to 999
2. Show complete step-by-step reasoning
3. Verify your answer by checking edge cases
4. State your final answer clearly: "The answer is [N]"

Solve the following problem systematically:""",
    "parsing_instructions": """Extract the final integer answer from the response.

Look for patterns like:
- "The answer is [N]"
- "Therefore, [N]"
- Boxed answers: \\boxed{N}
- Final integer mentioned at the end

Return only the integer value (0-999).""",
}

print("Optimized prompts:")
for key, value in optimized_prompts.items():
    print(f"\n{key}:")
    print(f"  {value[:100]}...")
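
The answer patterns listed in `parsing_instructions` can also be checked deterministically. The sketch below is a standard-library illustration of those patterns only; `extract_aime_answer` is a hypothetical helper, not part of Karenina, and it is not how the parsing model works.

```python
import re
from typing import Optional

# Patterns mirror the hints in the parsing instructions, tried in priority order.
_PATTERNS = [
    re.compile(r"[Tt]he answer is\s*\[?(\d{1,3})\]?(?!\d)"),
    re.compile(r"\\boxed\{\s*(\d{1,3})\s*\}"),
    re.compile(r"[Tt]herefore,?\s*(\d{1,3})(?!\d)"),
]


def extract_aime_answer(response: str) -> Optional[int]:
    """Return an integer in 0-999 found in the response, or None (hypothetical helper)."""
    for pattern in _PATTERNS:
        matches = pattern.findall(response)
        if matches:
            # Use the last occurrence, since final answers come at the end
            return int(matches[-1])
    # Fallback: last standalone integer with at most three digits
    fallback = re.findall(r"(?<!\d)(\d{1,3})(?!\d)", response)
    return int(fallback[-1]) if fallback else None
```

A check like this may be handy as a quick sanity test on model responses before running full verification.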

---

## export_to_preset(): Full Verification Preset

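Export optimized prompts as a complete Karenina verification preset. The optimized system prompt replaces the placeholder in the base config, and export metadata is stored under the `_gepa_optimization` key.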

In [None]:
# Create base verification config
base_config = VerificationConfig(
    answering_models=[
        ModelConfig(
            id="claude-haiku",
            model_provider="anthropic",
            model_name="claude-haiku-4-5",
            temperature=0.0,
            interface="langchain",
            system_prompt="PLACEHOLDER",  # Will be replaced
        )
    ],
    parsing_models=[
        ModelConfig(
            id="parser",
            model_provider="anthropic",
            model_name="claude-haiku-4-5",
            temperature=0.0,
            interface="langchain",
        )
    ],
    evaluation_mode="template_only",
    replicate_count=1,
)

print("Base config created")

In [None]:
# Export as preset
preset_path = export_to_preset(
    optimized_prompts=optimized_prompts,
    base_config=base_config,
    output_path=temp_dir / "aime_optimized.json",
    targets=[
        OptimizationTarget.ANSWERING_SYSTEM_PROMPT,
        OptimizationTarget.PARSING_INSTRUCTIONS,
    ],
)

print(f"Exported preset to: {preset_path}")
print(f"File size: {preset_path.stat().st_size} bytes")

In [None]:
# View the preset content
with open(preset_path) as f:
    preset_data = json.load(f)

print("Preset structure:")
print(f"  Keys: {list(preset_data.keys())}")

# Check the injected system prompt
answering_models = preset_data.get("answering_models", [])
if answering_models:
    system_prompt = answering_models[0].get("system_prompt", "")
    print("\nInjected system prompt:")
    print(f"  {system_prompt[:100]}...")

# Check GEPA metadata
gepa_meta = preset_data.get("_gepa_optimization", {})
print("\nGEPA metadata:")
print(f"  Exported at: {gepa_meta.get('exported_at')}")
print(f"  Targets: {gepa_meta.get('targets')}")
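
Before shipping a preset, it may be worth confirming that no answering model still carries the placeholder prompt. The sketch below assumes the preset layout inspected above (an `answering_models` list whose entries have `id` and `system_prompt`); `find_placeholder_prompts` and the sample dictionaries are hypothetical, and real presets contain more fields.

```python
from typing import Any


def find_placeholder_prompts(preset: dict[str, Any], marker: str = "PLACEHOLDER") -> list[str]:
    """Return ids of answering models whose system_prompt still equals the marker."""
    stale = []
    for model in preset.get("answering_models", []):
        if model.get("system_prompt") == marker:
            stale.append(model.get("id", "<unknown>"))
    return stale


# Hypothetical preset fragments for illustration only
sample_ok = {
    "answering_models": [{"id": "claude-haiku", "system_prompt": "You are an expert..."}]
}
sample_stale = {
    "answering_models": [{"id": "claude-haiku", "system_prompt": "PLACEHOLDER"}]
}

print(find_placeholder_prompts(sample_ok))
print(find_placeholder_prompts(sample_stale))
```

In the notebook, the same function could be called on `preset_data` right after loading it.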

### Using the Preset with CLI

The exported preset can be used with the Karenina CLI:

In [None]:
print("""
# Use the preset with karenina CLI:

karenina verify aime_2025.jsonld --preset aime_optimized.json

# Or with additional options:
karenina verify aime_2025.jsonld \\
    --preset aime_optimized.json \\
    --output results.json \\
    --format json
""")

---

## export_prompts_json(): Lightweight Export

Export only the prompts and their metadata. The resulting file is smaller and more portable than a full preset.

In [None]:
# Export as lightweight JSON
metadata = {
    "benchmark": "AIME 2025",
    "train_score": 0.80,
    "val_score": 0.75,
    "test_score": 0.72,
    "improvement": 0.25,
    "reflection_model": "anthropic/claude-haiku-4-5",
    "total_generations": 15,
}

prompts_path = export_prompts_json(
    optimized_prompts=optimized_prompts,
    metadata=metadata,
    output_path=temp_dir / "optimized_prompts.json",
)

print(f"Exported prompts to: {prompts_path}")
print(f"File size: {prompts_path.stat().st_size} bytes")

In [None]:
# View the prompts file
with open(prompts_path) as f:
    prompts_data = json.load(f)

print("Prompts file structure:")
print(json.dumps(prompts_data, indent=2)[:500])

---

## load_prompts_json(): Loading Saved Prompts

Load prompts from a saved file for reuse. `load_prompts_json()` returns a `(prompts, metadata)` tuple.

In [None]:
# Load prompts
loaded_prompts, loaded_metadata = load_prompts_json(prompts_path)

print("Loaded prompts:")
for key in loaded_prompts:
    print(f"  - {key}")

print("\nLoaded metadata:")
for key, value in loaded_metadata.items():
    print(f"  {key}: {value}")
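
Loaded prompts are indexed by key later on, so a missing or empty entry would only surface when the config is built. A minimal guard, assuming the prompts are a plain `dict` of strings as used in this notebook, might look like the following; `missing_prompt_keys` is a hypothetical helper, not a Karenina function.

```python
from typing import Optional


def missing_prompt_keys(
    prompts: dict[str, Optional[str]], required: tuple[str, ...]
) -> list[str]:
    """Return required keys that are absent or map to empty or whitespace-only text."""
    return [key for key in required if not (prompts.get(key) or "").strip()]


# Hypothetical loaded prompts for illustration only
example_prompts = {
    "answering_system_prompt": "You are an expert competition mathematician.",
    "parsing_instructions": "   ",
}

missing = missing_prompt_keys(
    example_prompts, ("answering_system_prompt", "parsing_instructions")
)
print(missing)
```

Raising an error when the returned list is non-empty keeps a bad export from silently producing a config with a blank prompt.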

In [None]:
# Use loaded prompts in a new verification config
new_config = VerificationConfig(
    answering_models=[
        ModelConfig(
            id="claude-haiku",
            model_provider="anthropic",
            model_name="claude-haiku-4-5",
            temperature=0.0,
            interface="langchain",
            system_prompt=loaded_prompts["answering_system_prompt"],  # Use loaded prompt
        )
    ],
    parsing_models=[
        ModelConfig(
            id="parser",
            model_provider="anthropic",
            model_name="claude-haiku-4-5",
            temperature=0.0,
            interface="langchain",
        )
    ],
    evaluation_mode="template_only",
    replicate_count=1,
)

print("Created new config with loaded prompts")
print(f"  System prompt: {new_config.answering_models[0].system_prompt[:50]}...")

---

## export_comparison_report(): Multi-Run Reports

Generate comparison reports across multiple optimization runs.

In [None]:
# Simulate multiple runs
runs_data = [
    {
        "run_id": "run_001",
        "benchmark_name": "AIME 2025",
        "targets": ["answering_system_prompt"],
        "train_score": 0.70,
        "val_score": 0.65,
        "improvement": 0.08,
        "reflection_model": "claude-haiku",
        "metric_calls": 50,
    },
    {
        "run_id": "run_002",
        "benchmark_name": "AIME 2025",
        "targets": ["answering_system_prompt"],
        "train_score": 0.75,
        "val_score": 0.70,
        "improvement": 0.17,
        "reflection_model": "claude-haiku",
        "metric_calls": 75,
    },
    {
        "run_id": "run_003",
        "benchmark_name": "AIME 2025",
        "targets": ["answering_system_prompt", "parsing_instructions"],
        "train_score": 0.80,
        "val_score": 0.75,
        "improvement": 0.25,
        "reflection_model": "claude-sonnet",
        "metric_calls": 100,
    },
]

# Export comparison report
report_path = export_comparison_report(
    runs=runs_data,
    output_path=temp_dir / "comparison_report.json",
)

print(f"Exported comparison report to: {report_path}")

In [None]:
# View the report
with open(report_path) as f:
    report = json.load(f)

print("Comparison Report:")
print(f"  Generated at: {report['generated_at']}")
print(f"  Number of runs: {report['num_runs']}")

print("\nSummary:")
for key, value in report["summary"].items():
    if isinstance(value, float):
        print(f"  {key}: {value:.2%}")
    else:
        print(f"  {key}: {value}")
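
The run dictionaries can also be compared directly, without going through the report file. The sketch below ranks runs by validation score and reports the train/validation gap of the best one. It reuses a subset of the field names from `runs_data`; `rank_runs` is a hypothetical helper and says nothing about how `export_comparison_report()` computes its summary.

```python
from typing import Any


def rank_runs(runs: list[dict[str, Any]], key: str = "val_score") -> list[dict[str, Any]]:
    """Return runs sorted by the given score, best first (hypothetical helper)."""
    return sorted(runs, key=lambda run: run[key], reverse=True)


# Scores copied from the simulated runs in this notebook
runs = [
    {"run_id": "run_001", "train_score": 0.70, "val_score": 0.65, "metric_calls": 50},
    {"run_id": "run_002", "train_score": 0.75, "val_score": 0.70, "metric_calls": 75},
    {"run_id": "run_003", "train_score": 0.80, "val_score": 0.75, "metric_calls": 100},
]

best = rank_runs(runs)[0]
# A large train/validation gap can hint at overfitting to the training split
gap = round(best["train_score"] - best["val_score"], 4)
print(f"Best run: {best['run_id']} (val={best['val_score']:.2%}, gap={gap:.2%})")
```

Ranking on validation rather than training score is a deliberate choice here, since the training split is what the optimizer saw.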

---

## Using Optimized Prompts in Verification

The following cell loads the saved prompts, builds a verification config around them, and runs verification on the first three questions:

In [None]:
# Load optimized prompts
prompts, meta = load_prompts_json(prompts_path)

print(f"Loaded prompts from run with {meta['improvement']:.2%} improvement")

# Create verification config
optimized_config = VerificationConfig(
    answering_models=[
        ModelConfig(
            id="claude-haiku",
            model_provider="anthropic",
            model_name="claude-haiku-4-5",
            temperature=0.0,
            interface="langchain",
            system_prompt=prompts["answering_system_prompt"],
        )
    ],
    parsing_models=[
        ModelConfig(
            id="parser",
            model_provider="anthropic",
            model_name="claude-haiku-4-5",
            temperature=0.0,
            interface="langchain",
        )
    ],
    evaluation_mode="template_only",
    replicate_count=1,
)

# Run verification
print("\nRunning verification with optimized prompts...")
question_ids = benchmark.get_question_ids()[:3]
results = benchmark.run_verification(optimized_config, question_ids=question_ids)

# Show results
passed = sum(1 for r in results.results if r.template and r.template.verify_result)
print(f"\nResults: {passed}/{len(results.results)} passed")

---

## Cleanup

In [None]:
# List exported files
print("Exported files:")
for f in temp_dir.iterdir():
    print(f"  {f.name}: {f.stat().st_size} bytes")

In [None]:
# Clean up
import shutil

shutil.rmtree(temp_dir, ignore_errors=True)
print(f"Cleaned up: {temp_dir}")
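
As an alternative to calling `shutil.rmtree()` by hand, `tempfile.TemporaryDirectory` removes the directory when its `with` block exits. The sketch below uses only the standard library; `write_and_measure` and the payload are hypothetical and stand in for the export calls above.

```python
import json
import tempfile
from pathlib import Path


def write_and_measure(payload: dict) -> tuple[int, Path]:
    """Write payload to a temp directory, return its size and the (now removed) directory."""
    with tempfile.TemporaryDirectory(prefix="gepa_export_") as tmp:
        out = Path(tmp) / "optimized_prompts.json"
        out.write_text(json.dumps(payload, indent=2), encoding="utf-8")
        size = out.stat().st_size
    # The directory and its contents are deleted once the with block exits
    return size, Path(tmp)


size, removed_dir = write_and_measure({"answering_system_prompt": "example"})
print(f"Wrote {size} bytes; directory still exists: {removed_dir.exists()}")
```

This pattern suits scripts; in a notebook where later cells need the files, the explicit cleanup cell above remains the simpler choice.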

---

## Summary

| Function | Output | Use Case |
|----------|--------|----------|
| `export_to_preset()` | Full VerificationConfig JSON | CLI usage with `--preset` |
| `export_prompts_json()` | Prompts + metadata JSON | Portable, lightweight sharing |
| `load_prompts_json()` | (prompts, metadata) tuple | Load for programmatic use |
| `export_comparison_report()` | Multi-run analysis JSON | Compare optimization strategies |

## Next Steps

- [09_full_optimization_workflow.ipynb](09_full_optimization_workflow.ipynb) - Complete end-to-end example