# Transcription Testing with Whisper API

This notebook tests audio transcription capabilities using OpenAI's Whisper API.

## Objectives
1. Test Whisper API integration
2. Benchmark transcription accuracy
3. Test different audio formats and quality levels
4. Measure processing time

## Setup
- Requires an OpenAI API key in `.env`, stored as `OPENAI_API_KEY`
- Place test audio files in the `test_audio/` directory
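
Before running the tests, it can help to confirm which files in `test_audio/` are usable. The sketch below is one possible way to do that. The helper names (`find_test_audio`, `oversized`), the extension list, and the 25 MB upload limit are assumptions for illustration; check the current OpenAI documentation for the formats and size limits that actually apply.

```python
from pathlib import Path

# Assumption: extensions accepted by the transcription endpoint; verify against current docs.
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}

# Assumption: 25 MB upload limit per file; verify against current docs.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def find_test_audio(directory: str = "test_audio") -> list[Path]:
    """Return audio files in `directory` whose extension looks supported."""
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )


def oversized(files: list[Path], limit: int = MAX_UPLOAD_BYTES) -> list[Path]:
    """Return the files that exceed the assumed upload limit."""
    return [p for p in files if p.stat().st_size > limit]
```

Files flagged by `oversized` would need to be compressed or split before upload.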

In [None]:
import os
import sys
from pathlib import Path
from dotenv import load_dotenv
from openai import OpenAI
import time

# Load environment variables
load_dotenv()

# Initialize OpenAI client
client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

print("✅ OpenAI client initialized")

## Test 1: Basic Transcription

In [None]:
from typing import Optional


def transcribe_audio(audio_file_path: str, language: Optional[str] = None) -> dict:
    """
    Transcribe an audio file using the Whisper API.

    Args:
        audio_file_path: Path to the audio file
        language: Optional ISO-639-1 language code (e.g., 'en', 'es').
            When omitted, Whisper detects the language automatically.

    Returns:
        dict: Transcript text, detected language, audio duration (seconds),
            and wall-clock processing time (seconds)
    """
    # Build the request once; pass `language` only when the caller supplied it
    request_kwargs = {"model": "whisper-1", "response_format": "verbose_json"}
    if language:
        request_kwargs["language"] = language

    # perf_counter is monotonic, so it is better suited to timing than time.time()
    start_time = time.perf_counter()

    with open(audio_file_path, 'rb') as audio_file:
        transcript = client.audio.transcriptions.create(file=audio_file, **request_kwargs)

    processing_time = time.perf_counter() - start_time

    return {
        'text': transcript.text,
        'language': getattr(transcript, 'language', 'unknown'),
        'duration': getattr(transcript, 'duration', 0),
        'processing_time': processing_time
    }

# Test with sample audio file (you'll need to provide your own)
# audio_path = "test_audio/sample_medical_interpretation.mp3"
# result = transcribe_audio(audio_path, language='en')
# print(f"Transcription: {result['text']}")
# print(f"Language: {result['language']}")
# print(f"Duration: {result['duration']:.2f}s")
# print(f"Processing Time: {result['processing_time']:.2f}s")

## Test 2: Bilingual Medical Interpretation

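Test transcription of a medical interpretation session with code-switching between English and Spanish. No `language` hint is passed here, so Whisper detects the language on its own. The response carries a single `language` value for the whole file, so inspect the transcript text itself to confirm that both languages were captured.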

In [None]:
# Example: Transcribe bilingual interpretation
# bilingual_audio = "test_audio/bilingual_interpretation.mp3"
# result = transcribe_audio(bilingual_audio)
# print(f"Detected Language: {result['language']}")
# print(f"Full Transcript:\n{result['text']}")

## Test 3: Accuracy Benchmarking

Compare Whisper transcription against ground truth transcripts.
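
Whisper output usually includes punctuation and capitalization that a ground truth transcript may format differently. Because the `calculate_wer` function in this section splits on whitespace, `"diabetes."` and `"diabetes"` would count as different words. A minimal normalization sketch, assuming punctuation and case should not count as errors, is shown here; `normalize_for_wer` is a hypothetical helper, not part of the notebook.

```python
import re
import unicodedata


def normalize_for_wer(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace before scoring."""
    # NFKC folds compatibility characters (e.g. full-width forms) into canonical ones
    text = unicodedata.normalize("NFKC", text).lower()
    # Replace everything except word characters, whitespace, and apostrophes with a space.
    # \w is Unicode-aware in Python 3, so accented Spanish letters are preserved.
    text = re.sub(r"[^\w\s']", " ", text)
    # Collapse runs of whitespace into single spaces
    return " ".join(text.split())
```

Both the reference and the hypothesis would be passed through the same function before scoring, for example `calculate_wer(normalize_for_wer(ref), normalize_for_wer(hyp))`. Whether to strip punctuation at all is a design choice: for medical text, some punctuation (such as decimal points in dosages) may carry meaning, and this sketch would split `2.5` into `2 5`.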

In [None]:
def calculate_wer(reference: str, hypothesis: str) -> float:
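    """
    Calculate Word Error Rate (WER)

    WER = (S + D + I) / N
    S = substitutions, D = deletions, I = insertions, N = total words in reference

    Both strings are lowercased and split on whitespace, so punctuation attached
    to a word is treated as part of that word. WER can exceed 1.0 when the
    hypothesis contains many insertions.
    """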
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    
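    # Word-level Levenshtein (edit) distance via dynamic programming:
    # d[i][j] = edits needed to turn the first i reference words into the first j hypothesis words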
    d = [[0] * (len(hyp_words) + 1) for _ in range(len(ref_words) + 1)]
    
    for i in range(len(ref_words) + 1):
        d[i][0] = i
    for j in range(len(hyp_words) + 1):
        d[0][j] = j
    
    for i in range(1, len(ref_words) + 1):
        for j in range(1, len(hyp_words) + 1):
            if ref_words[i-1] == hyp_words[j-1]:
                d[i][j] = d[i-1][j-1]
            else:
                substitution = d[i-1][j-1] + 1
                insertion = d[i][j-1] + 1
                deletion = d[i-1][j] + 1
                d[i][j] = min(substitution, insertion, deletion)
    
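    if not ref_words:
        # WER is undefined for an empty reference. Report 0 only when the hypothesis
        # is also empty; otherwise flag the spurious output as infinite error.
        return 0.0 if not hyp_words else float('inf')

    wer = d[len(ref_words)][len(hyp_words)] / len(ref_words)
    return wer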

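# Example usage. A medically equivalent paraphrase still counts as errors here:
# 1 substitution + 2 insertions over 6 reference words gives a WER of 50.00%.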
# ground_truth = "The patient has hypertension and diabetes."
# whisper_output = "The patient has high blood pressure and diabetes."
# wer = calculate_wer(ground_truth, whisper_output)
# print(f"Word Error Rate: {wer:.2%}")

## Next Steps

1. Create a collection of test audio files (various accents, audio quality, medical terminology)
2. Benchmark WER across different scenarios
3. Test edge cases (background noise, multiple speakers, heavy accents)
4. Implement speaker diarization if needed
5. Move successful patterns to `app/nlp/transcription.py`
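
Steps 2 and 3 could be organized around a small aggregation helper. The sketch below assumes each run is recorded as a dict with the hypothetical keys `scenario`, `wer`, `duration`, and `processing_time` (the last two matching what `transcribe_audio` returns). `summarize_runs` is an illustrative name, and the real-time factor (processing time divided by audio duration) is one possible way to report the processing-time objective, not something the notebook currently computes.

```python
from collections import defaultdict
from statistics import mean


def summarize_runs(runs):
    """Group benchmark runs by scenario and report mean WER and real-time factor.

    `runs` is an iterable of dicts with keys 'scenario', 'wer',
    'duration' (seconds of audio), and 'processing_time' (seconds).
    """
    grouped = defaultdict(list)
    for run in runs:
        grouped[run["scenario"]].append(run)

    summary = {}
    for scenario, items in grouped.items():
        # Skip runs with unknown (zero) duration when computing the real-time factor
        timed = [r for r in items if r["duration"] > 0]
        summary[scenario] = {
            "n": len(items),
            "mean_wer": mean(r["wer"] for r in items),
            "mean_rtf": (
                mean(r["processing_time"] / r["duration"] for r in timed)
                if timed else None
            ),
        }
    return summary
```

A real-time factor below 1.0 means transcription finished faster than the audio's playback length. Note that a mean over per-file WER weights short and long files equally; pooling edit counts across files would be an alternative.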