# OpenAI DPO Fine-tuning

This notebook implements the fine-tuning process for OpenAI 4o-mini using DPO (Direct Preference Optimization) with your preference data.

## Process Overview
1. **Setup** OpenAI client and authentication
2. **Extract** transformed data from `dpo_training_data` table
3. **Format** data for OpenAI DPO training
4. **Upload** training file to OpenAI
5. **Start** DPO fine-tuning job
6. **Monitor** training progress
7. **Deploy** the fine-tuned model


## 1. Setup and Dependencies


In [1]:
# Import required libraries
import os
import json
import time
from datetime import datetime
from typing import List, Dict, Any
import warnings
warnings.filterwarnings('ignore')

# OpenAI
import openai
from openai import OpenAI

# Database
from supabase import create_client, Client

# Environment
from dotenv import load_dotenv

# Load environment variables
load_dotenv('.env.local')

print("✅ All libraries imported successfully")


✅ All libraries imported successfully


## 2. Initialize OpenAI Client


In [2]:
# Initialize OpenAI client
def initialize_openai():
    """Initialize OpenAI client using environment variables"""
    api_key = os.getenv("OPENAI_API_KEY")
    
    if not api_key:
        raise ValueError("Missing OPENAI_API_KEY in .env.local")
    
    client = OpenAI(api_key=api_key)
    return client

# Initialize OpenAI connection
try:
    openai_client = initialize_openai()
    print("✅ OpenAI client initialized successfully")
    
    # Test the connection
    models = openai_client.models.list()
    print(f"✅ Connection verified - {len(models.data)} models available")
    
except Exception as e:
    print(f"❌ Error initializing OpenAI: {e}")
    print("Please check your OPENAI_API_KEY in .env.local")


✅ OpenAI client initialized successfully
✅ Connection verified - 86 models available


## 3. Initialize Supabase Connection


In [3]:
# Initialize Supabase client
def initialize_supabase():
    """Initialize Supabase client using environment variables"""
    url = os.getenv("NEXT_PUBLIC_SUPABASE_URL")
    key = os.getenv("SUPABASE_SERVICE_ROLE_KEY")
    
    if not url or not key:
        raise ValueError("Missing Supabase credentials in .env.local")
    
    supabase: Client = create_client(url, key)
    return supabase

# Initialize Supabase connection
try:
    supabase = initialize_supabase()
    print("✅ Supabase client initialized successfully")
except Exception as e:
    print(f"❌ Error initializing Supabase: {e}")


✅ Supabase client initialized successfully


## 4. Extract DPO Training Data


In [4]:
# Extract DPO training data from Supabase
def extract_dpo_training_data(limit=None):
    """
    Extract DPO training data from the dpo_training_data table
    
    Args:
        limit: Optional limit for number of records to extract
    
    Returns:
        List of DPO training records
    """
    try:
        query = supabase.table('dpo_training_data').select('*')
        
        if limit:
            query = query.limit(limit)
        
        response = query.execute()
        
        if response.data:
            print(f"✅ Extracted {len(response.data)} DPO training records")
            return response.data
        else:
            print("⚠️ No data found in dpo_training_data table")
            print("Please run the ETL pipeline first to populate the table")
            return []
            
    except Exception as e:
        print(f"❌ Error extracting DPO training data: {e}")
        return []

# Extract the data
print("📊 Extracting DPO training data...")
dpo_data = extract_dpo_training_data()

# Show sample data structure
if dpo_data:
    print(f"\n📝 Sample DPO training record:")
    sample = dpo_data[0]
    for key, value in sample.items():
        if key == 'input_message':
            print(f"  {key}: {value[:100]}... (length: {len(value)})")
        else:
            print(f"  {key}: {value[:100]}{'...' if len(value) > 100 else ''}")
else:
    print("❌ No DPO training data available")


📊 Extracting DPO training data...
✅ Extracted 46 DPO training records

📝 Sample DPO training record:
  uuid: 8e99580f-45ee-432d-93c2-55200a6a67f4
  input_message: Jun 28, 2018 ... Engineering CRISPR/Cpf1 with tRNA promotes genome ... molecules enhance CRISPR/Cas9... (length: 160)
  preferred_output: How to create state-of-the-art genetic model systems: strategies for …
This article outlines practic...
  non_preferred_output: Summary of Synthetic Biology Article


Summary of "How to create state-of-the-art genetic model syst...
  created_at: 2025-09-12T22:28:49.699077+00:00


## 5. Format Data for OpenAI DPO


In [5]:
# Format data for OpenAI DPO training
# 
# IMPORTANT: Differences from standard fine-tuning:
# 1. We're using OpenAI SDK (openai_client) - YES, correct
# 2. Hyperparameters: DPO uses "n_epochs": "auto" (not learning_rate_multiplier)
# 3. Data format: We only need preferred responses, not both preferred/rejected
# 4. OpenAI DPO handles preference optimization internally
def format_dpo_training_data(dpo_records):
    """
    Format DPO training data for OpenAI fine-tuning
    
    Creates the exact format required by OpenAI's DPO fine-tuning API.
    Each JSONL line contains BOTH preferred and non-preferred outputs.
    
    Args:
        dpo_records: List of DPO training records from Supabase
    
    Returns:
        List of formatted training examples for OpenAI DPO
    """
    formatted_examples = []
    
    for record in dpo_records:
        # Create DPO training example with BOTH preferred and non-preferred outputs
        dpo_example = {
            "input": {
                "messages": [
                    {
                        "role": "system",
                        "content": "You are an expert science educator who specializes in making complex scientific concepts accessible to college students. Your audience is a College sophomore with a good foundation in basic biology. Create a concise summary of this synthetic biology article with the following requirements:\n\n1. Use simple, clear language\n2. Explain complex terms when they appear\n3. Focus on the main findings and their importance\n4. Make it engaging and interesting\n5. Keep it concise (2-3 paragraphs)\n6. Content of your response will be used in an email newsletter\n\nPlease provide a simplified explanation that maintains scientific accuracy while being accessible to a college sophomore."
                    },
                    {
                        "role": "user",
                        "content": record['input_message']
                    }
                ],
                "tools": [],
                "parallel_tool_calls": True
            },
            "preferred_output": [
                {
                    "role": "assistant",
                    "content": record['preferred_output']
                }
            ],
            "non_preferred_output": [
                {
                    "role": "assistant",
                    "content": record['non_preferred_output']
                }
            ]
        }
        formatted_examples.append(dpo_example)
    
    return formatted_examples

def save_training_file(formatted_examples, filename=None):
    """
    Save formatted examples to JSONL file for OpenAI upload
    
    Args:
        formatted_examples: List of formatted training examples
        filename: Optional custom filename (if None, auto-generates with timestamp)
    
    Returns:
        Path to saved file
    """
    try:
        import os
        from datetime import datetime
        
        # Create TrainingData directory if it doesn't exist
        training_dir = "TrainingData"
        os.makedirs(training_dir, exist_ok=True)
        
        # Generate filename with timestamp if not provided
        if filename is None:
            timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
            filename = f"dpo_training_data_{timestamp}.jsonl"
        
        # Ensure filename has .jsonl extension
        if not filename.endswith('.jsonl'):
            filename += '.jsonl'
        
        # Create full path
        file_path = os.path.join(training_dir, filename)
        
        with open(file_path, 'w', encoding='utf-8') as f:
            for example in formatted_examples:
                f.write(json.dumps(example) + '\n')
        
        print(f"✅ Training file saved: {file_path}")
        print(f"📊 Total examples: {len(formatted_examples)}")
        print(f"📁 Directory: {os.path.abspath(training_dir)}")
        return file_path
        
    except Exception as e:
        print(f"❌ Error saving training file: {e}")
        return None

def list_training_files():
    """
    List all existing training files in the TrainingData directory
    
    Returns:
        List of training file paths
    """
    import os
    import glob
    
    training_dir = "TrainingData"
    if not os.path.exists(training_dir):
        print(f"📁 TrainingData directory doesn't exist yet")
        return []
    
    # Find all .jsonl files in TrainingData directory
    pattern = os.path.join(training_dir, "*.jsonl")
    training_files = glob.glob(pattern)
    
    if training_files:
        print(f"📁 Found {len(training_files)} training file(s) in TrainingData:")
        for i, file_path in enumerate(sorted(training_files), 1):
            file_size = os.path.getsize(file_path)
            print(f"  {i}. {os.path.basename(file_path)} ({file_size:,} bytes)")
    else:
        print(f"📁 No training files found in TrainingData directory")
    
    return training_files

# Format the data for OpenAI DPO fine-tuning
if dpo_data:
    print("🔄 Formatting data for OpenAI DPO fine-tuning...")
    formatted_examples = format_dpo_training_data(dpo_data)
    
    print(f"✅ Formatted {len(formatted_examples)} training examples")
    print(f"📊 Training pairs: {len(dpo_data)}")
    
    # Save to file with timestamp
    training_file = save_training_file(formatted_examples)
    
    # Show sample formatted example
    if formatted_examples:
        print(f"\n📝 Sample DPO training example:")
        sample = formatted_examples[0]
        print(json.dumps(sample, indent=2)[:500] + "...")
        
        print(f"\n🔍 DPO format structure:")
        print(f"  - input.messages: Contains system + user messages")
        print(f"  - input.tools: Empty array []")
        print(f"  - input.parallel_tool_calls: True")
        print(f"  - preferred_output: Array with assistant message")
        print(f"  - non_preferred_output: Array with assistant message")
        
    # List all training files
    print(f"\n📁 All training files:")
    list_training_files()
else:
    print("❌ No data to format")


🔄 Formatting data for OpenAI DPO fine-tuning...
✅ Formatted 46 training examples
📊 Training pairs: 46
✅ Training file saved: TrainingData/dpo_training_data_20250912_185907.jsonl
📊 Total examples: 46
📁 Directory: /Users/ashtekar/SourceCode/FineTuning-BioSummary/TrainingData

📝 Sample DPO training example:
{
  "input": {
    "messages": [
      {
        "role": "system",
        "content": "You are an expert science educator who specializes in making complex scientific concepts accessible to college students. Your audience is a College sophomore with a good foundation in basic biology. Create a concise summary of this synthetic biology article with the following requirements:\n\n1. Use simple, clear language\n2. Explain complex terms when they appear\n3. Focus on the main findings and their impor...

🔍 DPO format structure:
  - input.messages: Contains system + user messages
  - input.tools: Empty array []
  - input.parallel_tool_calls: True
  - preferred_output: Array with assistant me

In [6]:
# List all existing training files in TrainingData directory
print("📁 Checking existing training files...")
existing_files = list_training_files()


📁 Checking existing training files...
📁 Found 1 training file(s) in TrainingData:
  1. dpo_training_data_20250912_185907.jsonl (177,280 bytes)


## 6. Upload Training File to OpenAI & start the fine tuning


In [13]:
# Upload training file to OpenAI
def upload_and_start_dpo_finetuning(file_path, model="gpt-4.1-nano-2025-04-14"):
    """
    Upload training file and start DPO fine-tuning job in one step
    
    Based on OpenAI documentation example:
    https://platform.openai.com/docs/guides/direct-preference-optimization
    
    Args:
        file_path: Path to the JSONL training file
        model: Base model to fine-tune (default: gpt-4o-mini)
    
    Returns:
        Fine-tuning job object
    """
    try:
        print(f"📤 Uploading training file: {file_path}")
        
        # Upload the training file
        with open(file_path, 'rb') as f:
            file_response = openai_client.files.create(
                file=f,
                purpose='fine-tune'
            )
        
        file_id = file_response.id
        print(f"✅ Training file uploaded successfully")
        print(f"📋 File ID: {file_id}")
        
        print(f"🚀 Starting DPO fine-tuning job...")
        print(f"📋 Model: {model}")
        
        # Start the DPO fine-tuning job using the new format
        job_response = openai_client.fine_tuning.jobs.create(
            training_file=file_id,
            model=model,
            method={
                "type": "dpo",
                "dpo": {
                    "hyperparameters": {"beta": 0.1}
                }
            }
        )
        
        job_id = job_response.id
        print(f"✅ DPO fine-tuning job started successfully")
        print(f"📋 Job ID: {job_id}")
        print(f"📊 Status: {job_response.status}")
        
        return job_response
        
    except Exception as e:
        print(f"❌ Error uploading file and starting fine-tuning: {e}")
        return None

# Upload training file and start DPO fine-tuning job
if 'training_file' in locals() and training_file:
    print("🚀 Uploading training file and starting DPO fine-tuning...")
    finetuning_job = upload_and_start_dpo_finetuning(training_file)
    
    if finetuning_job:
        print(f"\n🎉 DPO fine-tuning job initiated!")
        print(f"📋 Job ID: {finetuning_job.id}")
        print(f"📊 Status: {finetuning_job.status}")
        print(f"\n⏳ Training will take some time to complete...")
        print(f"📱 You can monitor progress in the OpenAI dashboard or run the monitoring cell below")
    else:
        print("❌ Failed to upload file and start fine-tuning job")
else:
    print("❌ No training file available to upload and start fine-tuning")


🚀 Uploading training file and starting DPO fine-tuning...
📤 Uploading training file: TrainingData/dpo_training_data_20250912_185907.jsonl
✅ Training file uploaded successfully
📋 File ID: file-3APfFWrkPtdwFQqBpRXwsi
🚀 Starting DPO fine-tuning job...
📋 Model: gpt-4.1-nano-2025-04-14
✅ DPO fine-tuning job started successfully
📋 Job ID: ftjob-ZQWn425zo10RD80XraLFv1Ms
📊 Status: validating_files

🎉 DPO fine-tuning job initiated!
📋 Job ID: ftjob-ZQWn425zo10RD80XraLFv1Ms
📊 Status: validating_files

⏳ Training will take some time to complete...
📱 You can monitor progress in the OpenAI dashboard or run the monitoring cell below


In [19]:
# Enhanced DPO Training Metrics
def calculate_training_metrics(response, start_time=None):
    """
    Calculate enhanced DPO training metrics
    
    Args:
        response: OpenAI fine-tuning job response
        start_time: Job start time for duration calculation
    
    Returns:
        Dictionary of calculated metrics
    """
    import time
    from datetime import datetime
    
    metrics = {}
    
    # Training Loss (DPO-specific)
    if hasattr(response, 'training_loss'):
        metrics['training_loss'] = response.training_loss
    elif hasattr(response, 'result_files'):
        # If training loss is in result files, we'd need to download and parse
        metrics['training_loss'] = "Available in result files"
    
    # Preference Accuracy (DPO-specific)
    if hasattr(response, 'preference_accuracy'):
        metrics['preference_accuracy'] = response.preference_accuracy
    else:
        metrics['preference_accuracy'] = "Calculated post-training"
    
    # Cost Estimation
    if hasattr(response, 'trained_tokens') and response.trained_tokens:
        # OpenAI DPO pricing (approximate)
        dpo_cost_per_1k_tokens = 0.008  # $8 per 1M tokens for DPO
        estimated_cost = (response.trained_tokens / 1000) * dpo_cost_per_1k_tokens
        metrics['estimated_cost'] = f"${estimated_cost:.4f}"
    else:
        metrics['estimated_cost'] = "Calculating..."
    
    # Validation Loss
    if hasattr(response, 'validation_loss'):
        metrics['validation_loss'] = response.validation_loss
    else:
        metrics['validation_loss'] = "Not available during training"
    
    # Preference Learning Rate
    if hasattr(response, 'learning_rate'):
        metrics['preference_learning_rate'] = response.learning_rate
    elif hasattr(response, 'hyperparameters'):
        metrics['preference_learning_rate'] = getattr(response.hyperparameters, 'learning_rate', 'Auto')
    else:
        metrics['preference_learning_rate'] = "Auto (DPO default)"
    
    # Training Duration
    if start_time and hasattr(response, 'finished_at') and response.finished_at:
        duration = response.finished_at - start_time
        metrics['training_duration'] = f"{duration:.0f} seconds"
    elif start_time:
        current_time = time.time()
        duration = current_time - start_time
        metrics['training_duration'] = f"{duration:.0f} seconds (ongoing)"
    
    return metrics

def monitor_finetuning_job(job_id, start_time=None):
    """
    Monitor the progress of a DPO fine-tuning job with enhanced metrics
    
    Args:
        job_id: ID of the fine-tuning job
        start_time: Job start time for duration calculation
    
    Returns:
        Job status and details
    """
    try:
        print(f"📊 Checking DPO fine-tuning job status...")
        print(f"📋 Job ID: {job_id}")
        
        # Get job status
        response = openai_client.fine_tuning.jobs.retrieve(job_id)
        
        status = response.status
        print(f"📊 Status: {status}")
        
        # Calculate enhanced metrics
        metrics = calculate_training_metrics(response, start_time)
        
        # Display metrics
        print(f"\n📈 DPO Training Metrics:")
        print(f"  🔥 Training Loss: {metrics.get('training_loss', 'In progress...')}")
        print(f"  🎯 Preference Accuracy: {metrics.get('preference_accuracy', 'Calculating...')}")
        print(f"  💰 Estimated Cost: {metrics.get('estimated_cost', 'Calculating...')}")
        print(f"  📊 Validation Loss: {metrics.get('validation_loss', 'Not available')}")
        print(f"  📚 Learning Rate: {metrics.get('preference_learning_rate', 'Auto')}")
        
        if 'training_duration' in metrics:
            print(f"  ⏱️ Training Duration: {metrics['training_duration']}")
        
        if hasattr(response, 'fine_tuned_model') and response.fine_tuned_model:
            print(f"\n✅ Fine-tuned model: {response.fine_tuned_model}")
        
        if hasattr(response, 'error') and response.error:
            print(f"\n❌ Error: {response.error}")
        
        return response
        
    except Exception as e:
        print(f"❌ Error monitoring job: {e}")
        return None

print("✅ Enhanced DPO monitoring functions created!")
print("📊 New metrics: Training Loss, Preference Accuracy, Cost Estimation, Validation Loss, Learning Rate")
print("🚫 Removed: Trained tokens, Completion time")


✅ Enhanced DPO monitoring functions created!
📊 New metrics: Training Loss, Preference Accuracy, Cost Estimation, Validation Loss, Learning Rate
🚫 Removed: Trained tokens, Completion time


In [15]:
# Test the fine-tuned model
def test_finetuned_model(model_name, test_prompt="Please summarize this article about machine learning."):
    """
    Test the fine-tuned model with a sample prompt
    
    Args:
        model_name: Name of the fine-tuned model
        test_prompt: Test prompt to send to the model
    
    Returns:
        Model response
    """
    try:
        print(f"🧪 Testing fine-tuned model: {model_name}")
        print(f"📝 Test prompt: {test_prompt}")
        
        response = openai_client.chat.completions.create(
            model=model_name,
            messages=[
                {"role": "user", "content": test_prompt}
            ],
            max_tokens=500,
            temperature=0.7
        )
        
        model_response = response.choices[0].message.content
        print(f"✅ Model response:")
        print(f"{model_response}")
        
        return model_response
        
    except Exception as e:
        print(f"❌ Error testing model: {e}")
        return None

def compare_models(original_model, finetuned_model, test_prompt):
    """
    Compare responses from original and fine-tuned models
    
    Args:
        original_model: Original model name (e.g., 'gpt-4o-mini')
        finetuned_model: Fine-tuned model name
        test_prompt: Test prompt for comparison
    """
    print(f"🔍 Comparing model responses...")
    print(f"📝 Test prompt: {test_prompt}")
    
    # Test original model
    print(f"\n🤖 Original Model ({original_model}):")
    original_response = test_finetuned_model(original_model, test_prompt)
    
    # Test fine-tuned model
    print(f"\n🎯 Fine-tuned Model ({finetuned_model}):")
    finetuned_response = test_finetuned_model(finetuned_model, test_prompt)
    
    return {
        'original': original_response,
        'finetuned': finetuned_response
    }

# Test the model (run this after fine-tuning completes)
if 'finetuning_job' in locals() and finetuning_job:
    # Get the fine-tuned model name from the job
    job_status = monitor_finetuning_job(finetuning_job.id)
    
    if hasattr(job_status, 'fine_tuned_model') and job_status.fine_tuned_model:
        model_name = job_status.fine_tuned_model
        print(f"\n🎯 Testing fine-tuned model: {model_name}")
        
        # Test with a sample prompt (formatted as user message)
        test_prompt = {
            "role": "user",
            "content": "You are an expert science educator who specializes in making complex scientific concepts accessible to college students. Your audience is a College sophomore with a good foundation in basic biology. Create a concise summary of this synthetic biology article with the following requirements:\n\n1. Use simple, clear language\n2. Explain complex terms when they appear\n3. Focus on the main findings and their importance\n4. Make it engaging and interesting\n5. Keep it concise (2-3 paragraphs)\n6. Content of your response will be used in an email newsletter\n\nPlease provide a simplified explanation that maintains scientific accuracy while being accessible to a college sophomore."
        }
        
        # Test the model with proper message format
        response = openai_client.chat.completions.create(
            model=model_name,
            messages=[test_prompt],
            max_tokens=500,
            temperature=0.7
        )
        
        print(f"✅ Fine-tuned model response:")
        print(f"{response.choices[0].message.content}")
        
        print(f"\n🔍 To compare with original model, run:")
        print(f"compare_models('gpt-4o-mini', '{model_name}', 'Your test prompt here')")
    else:
        print("❌ No fine-tuned model available yet")
        print("Please wait for the fine-tuning job to complete")
else:
    print("❌ No fine-tuning job to test")
    print("Please run the previous cells to start and complete a fine-tuning job")


📊 Checking DPO fine-tuning job status...
📋 Job ID: ftjob-ZQWn425zo10RD80XraLFv1Ms
📊 Status: validating_files

📈 DPO Training Metrics:
  🔥 Training Loss: Available in result files
  🎯 Preference Accuracy: Calculated post-training
  💰 Estimated Cost: Calculating...
  📊 Validation Loss: Not available during training
  📚 Learning Rate: Auto

❌ Error: Error(code=None, message=None, param=None)
❌ No fine-tuned model available yet
Please wait for the fine-tuning job to complete


In [20]:
# Use monitoring for DPO fine-tuning jobs
if 'finetuning_job' in locals() and finetuning_job:
    job_id = finetuning_job.id
    print("📊 Current DPO job status with enhanced metrics:")
    monitor_finetuning_job(job_id)
    
    print(f"\n🔄 To wait for completion with monitoring, run:")
    print(f"wait_for_completion('{job_id}')")
else:
    print("❌ No fine-tuning job to monitor")
    print("Please run the previous cells to start a fine-tuning job first")


📊 Current DPO job status with enhanced metrics:
📊 Checking DPO fine-tuning job status...
📋 Job ID: ftjob-ZQWn425zo10RD80XraLFv1Ms
📊 Status: running

📈 DPO Training Metrics:
  🔥 Training Loss: Available in result files
  🎯 Preference Accuracy: Calculated post-training
  💰 Estimated Cost: Calculating...
  📊 Validation Loss: Not available during training
  📚 Learning Rate: Auto

❌ Error: Error(code=None, message=None, param=None)

🔄 To wait for completion with monitoring, run:
wait_for_completion('ftjob-ZQWn425zo10RD80XraLFv1Ms')


In [21]:
# Wait for completion with DPO monitoring
def wait_for_completion(job_id, check_interval=60, start_time=None):
    """
    Wait for DPO fine-tuning job to complete with enhanced monitoring
    
    Args:
        job_id: ID of the fine-tuning job
        check_interval: Seconds between status checks
        start_time: Job start time for duration calculation
    """
    print(f"⏳ Waiting for DPO fine-tuning job to complete...")
    print(f"📋 Job ID: {job_id}")
    print(f"🔄 Checking every {check_interval} seconds with enhanced metrics")
    
    while True:
        job_status = monitor_finetuning_job(job_id, start_time)
        
        if not job_status:
            break
            
        if job_status.status == 'succeeded':
            print(f"\n🎉 DPO Fine-tuning completed successfully!")
            if hasattr(job_status, 'fine_tuned_model'):
                print(f"✅ Fine-tuned model: {job_status.fine_tuned_model}")
            break
        elif job_status.status == 'failed':
            print(f"\n❌ DPO Fine-tuning failed!")
            if hasattr(job_status, 'error'):
                print(f"Error: {job_status.error}")
            break
        elif job_status.status in ['cancelled', 'cancelling']:
            print(f"\n⚠️ DPO Fine-tuning was cancelled")
            break
        else:
            print(f"⏳ Status: {job_status.status} - waiting...")
            time.sleep(check_interval)

print("✅ Enhanced wait_for_completion function created!")
print("📊 Uses monitor_finetuning_job() with DPO-specific metrics")


✅ Enhanced wait_for_completion function created!
📊 Uses monitor_finetuning_job() with DPO-specific metrics


In [22]:
# IMPORTANT: Function Usage Guide
# ================================
# 
# ✅ USE THESE DPO FUNCTIONS:
# - monitor_finetuning_job(job_id, start_time=None)
# - wait_for_completion(job_id, check_interval=60, start_time=None)
# - calculate_training_metrics(response, start_time=None)
#
# 📊 DPO functions provide:
# - Training Loss, Preference Accuracy, Cost Estimation
# - Validation Loss, Learning Rate, Training Duration
# - DPO-specific metrics and better insights

wait_for_completion('ftjob-ZQWn425zo10RD80XraLFv1Ms')


⏳ Waiting for DPO fine-tuning job to complete...
📋 Job ID: ftjob-ZQWn425zo10RD80XraLFv1Ms
🔄 Checking every 60 seconds with enhanced metrics
📊 Checking DPO fine-tuning job status...
📋 Job ID: ftjob-ZQWn425zo10RD80XraLFv1Ms
📊 Status: running

📈 DPO Training Metrics:
  🔥 Training Loss: Available in result files
  🎯 Preference Accuracy: Calculated post-training
  💰 Estimated Cost: Calculating...
  📊 Validation Loss: Not available during training
  📚 Learning Rate: Auto

❌ Error: Error(code=None, message=None, param=None)
⏳ Status: running - waiting...
📊 Checking DPO fine-tuning job status...
📋 Job ID: ftjob-ZQWn425zo10RD80XraLFv1Ms
📊 Status: running

📈 DPO Training Metrics:
  🔥 Training Loss: Available in result files
  🎯 Preference Accuracy: Calculated post-training
  💰 Estimated Cost: Calculating...
  📊 Validation Loss: Not available during training
  📚 Learning Rate: Auto

❌ Error: Error(code=None, message=None, param=None)
⏳ Status: running - waiting...
📊 Checking DPO fine-tuning job s