---
output-file: ai.html
title: AI
---

## AI Module
This module implements the core AI-powered transcription editing system. It distinguishes between regular speech transcription and natural language edit commands, enabling a conversational editing workflow.

### Architecture

The system calls a chat-based LLM (via lisette) that: <br>
- Receives the full current transcript plus the latest chunk on every call, so no conversation history is needed <br>
- Distinguishes between new content and edit instructions <br>
- Returns either "APPEND" for new text or the complete edited transcript <br>
- Reports token usage, which the editor accumulates across the entire session <br>

### Multi-Provider Support

Works with multiple LLM providers through lisette: <br>
- **OpenAI**: GPT-4o, GPT-4o-mini, GPT-3.5-turbo <br>
- **Anthropic**: Claude 3.5 Sonnet, Claude 3 Opus/Sonnet/Haiku <br>
- **Google**: Gemini 1.5 Pro/Flash <br>

```python
#| default_exp ai
```

```python
#| export
from lisette import *
import asyncio
import re
```

## TranscriptEditor Class

The `TranscriptEditor` class manages live transcription with AI-assisted editing capabilities. It uses keyword detection to call the AI only when a chunk contains a likely edit keyword, so ordinary speech is appended without any API call, which keeps it fast and cost-effective.

### Initialization

```python
editor = TranscriptEditor(model="openai/gpt-4o-mini", temperature=0.1)
```

**Parameters:** <br>
- `model`: Full model identifier in the format "provider/model-name" (e.g., "openai/gpt-4o-mini") <br>
- `temperature`: Sampling temperature passed to the LLM; low values keep its edits consistent from call to call (default: 0.1) <br>
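The `model` string is passed to lisette's `Chat` unchanged, and this module expects it in "provider/model-name" form. A small check like the one below could catch a malformed identifier before the first AI call. This is a minimal sketch; `parse_model_id` is a hypothetical helper, not part of this module or of lisette.

```python
def parse_model_id(model: str) -> tuple[str, str]:
    """Split a "provider/model-name" identifier into (provider, model_name).

    Hypothetical helper for illustration only. Splitting on the first "/"
    keeps any further slashes inside the model name.
    """
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider/model-name', got {model!r}")
    return provider, name

print(parse_model_id("openai/gpt-4o-mini"))  # ('openai', 'gpt-4o-mini')
```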

### Smart Triggering Strategy

The editor uses keyword detection to determine when to call the AI: <br>
1. **Fast path**: No trigger words → instant append (no AI call) <br>
2. **AI path**: Trigger word detected → AI analyzes full transcript <br>
3. **Stateless**: Each AI call receives the complete transcript (no dependence on conversation history) <br>

**Trigger words**: change, replace, delete, remove, fix, correct, modify, edit, scratch, actually, wait, no, instead, undo, oops, mistake, wrong
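How the trigger words are matched matters. A plain substring test such as `keyword in text_lower` also fires on longer words that merely contain a trigger, e.g. "no" inside "know" or "wait" inside "waiter", which sends ordinary speech down the slower AI path. The sketch below contrasts that with a whole-word check built on `re.findall`; both functions are illustrative, not part of the module's API.

```python
import re

# Same trigger words as TranscriptEditor.EDIT_KEYWORDS
EDIT_KEYWORDS = {
    "change", "replace", "delete", "remove", "fix", "correct",
    "modify", "edit", "scratch", "actually", "wait",
    "no", "instead", "undo", "oops", "mistake", "wrong",
}

def substring_match(text: str) -> bool:
    """Naive check: a keyword anywhere in the text, even inside a longer word."""
    lower = text.lower()
    return any(k in lower for k in EDIT_KEYWORDS)

def word_match(text: str) -> bool:
    """Whole-word check: split into words, then intersect with the keyword set."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return not EDIT_KEYWORDS.isdisjoint(words)

sample = "I know the waiter is nice."
print(substring_match(sample))  # True ("know" contains "no", "waiter" contains "wait")
print(word_match(sample))       # False
print(word_match("Actually, change pizza to hamburgers."))  # True
```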

### AI Decision Process

When triggered, the AI receives the full transcript plus the latest chunk and decides: <br>
- **Edit detected**: Returns complete corrected transcript <br>
- **False alarm**: Returns "APPEND" (user wasn't actually editing) <br>

With this two-tier approach, chunks without a trigger word (typically most of them) are appended instantly with no API call, while chunks that do contain one still get a full AI check for edit commands.
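The decision rule above can be sketched as a pure function that applies the model's reply to the transcript. This is a minimal illustration, assuming the reply has already been fetched; `apply_reply` is a hypothetical name. One further assumption: after an edit the sketch adds a trailing newline so the next appended chunk starts on its own line.

```python
def apply_reply(transcript: str, chunk: str, reply: str) -> tuple[str, str]:
    """Apply the model's reply and return (new_transcript, action)."""
    reply = reply.strip()
    if reply.startswith("APPEND"):
        # False alarm: keep the chunk as ordinary speech on its own line
        return transcript + chunk + "\n", "append"
    # Real edit: the reply replaces the whole transcript
    return (reply + "\n" if reply else ""), "edit"

t = "I love pizza.\n"
print(apply_reply(t, "No doubt it is the best.", "APPEND"))
# ('I love pizza.\nNo doubt it is the best.\n', 'append')
print(apply_reply(t, "Actually, change pizza to hamburgers.", "I love hamburgers."))
# ('I love hamburgers.\n', 'edit')
```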

```python
#| export
class TranscriptEditor:
    """Manages live transcription with AI-assisted editing capabilities."""
    
    # Keywords that suggest the user wants to edit
    EDIT_KEYWORDS = {
        'change', 'replace', 'delete', 'remove', 'fix', 'correct', 
        'modify', 'edit', 'scratch', 'actually', 'wait',
        'no', 'instead', 'undo', 'oops', 'mistake', 'wrong'
    }
    
    def __init__(self, model: str, temperature: float = 0.1):
        self.model = model
        self.temperature = temperature
        self.full_transcript = ""
        self.total_tokens = 0
    
    def _contains_edit_keyword(self, text: str) -> bool:
        """Check if text contains any edit keyword as a whole word.

        Whole-word matching avoids false triggers such as "no" inside
        "know" or "wait" inside "waiter".
        """
        words = set(re.findall(r"[a-z']+", text.lower()))
        return not self.EDIT_KEYWORDS.isdisjoint(words)
    
    async def process_chunk(self, chunk: str) -> dict:
        """Process a transcription chunk - only calls AI if edit keywords detected."""
        # Strip surrounding whitespace so chunks ending in "\n" don't create blank lines
        chunk = chunk.strip()
        
        # Check if this looks like an edit command
        if self._contains_edit_keyword(chunk):
            # Create a fresh Chat instance for this call (stateless)
            chat = Chat(
                self.model,
                sp="""You are helping with live transcription editing.

You will receive:
1. The current full transcript
2. The latest chunk of text the user just said

Determine if the user wants to edit the transcript:
- If YES: Return ONLY the complete corrected transcript (preserve all newlines)
- If NO (false alarm, they're just speaking normally): Return ONLY the word "APPEND"

Be decisive and fast. Most of the time it's a real edit if you're being called.""",
                temp=self.temperature
            )
            
            # Provide full context to AI
            prompt = f"""Current transcript:
{self.full_transcript}

Latest speech:
{chunk}"""
            
            tokens = 0
            try:
                # Call AI in a worker thread so the event loop isn't blocked
                response = await asyncio.to_thread(chat, prompt)
                # usage may be missing or None depending on the provider
                usage = getattr(response, "usage", None)
                tokens = getattr(usage, "total_tokens", 0) or 0
                result = response.choices[0].message.content.strip()
            except Exception:
                # Graceful degradation: if the AI call fails, treat the chunk as plain speech
                result = "APPEND"
            self.total_tokens += tokens
            
            # Check if AI confirmed it's an edit
            if result.startswith("APPEND"):
                # False alarm (or AI failure) - just append with newline
                self.full_transcript += chunk + "\n"
                action = "append"
            else:
                # Real edit - replace transcript, keeping a trailing newline for the next append
                self.full_transcript = result + "\n" if result else ""
                action = "edit"
            
            return {
                "transcript": self.full_transcript,
                "action": action,
                "tokens_used": tokens,
                "total_tokens": self.total_tokens,
                "ai_called": True
            }
        else:
            # No edit keywords - just append with newline
            self.full_transcript += chunk + "\n"
            return {
                "transcript": self.full_transcript,
                "action": "append",
                "tokens_used": 0,
                "total_tokens": self.total_tokens,
                "ai_called": False
            }
    
    def get_transcript(self) -> str:
        """Get the current full transcript."""
        return self.full_transcript
    
    async def reset(self):
        """Reset the transcript and token counter."""
        self.full_transcript = ""
        self.total_tokens = 0
```

## How It Works

### Process Flow

1. **User speaks**: "I love pizza" <br>
2. **Transcriber produces**: "I love pizza." <br>
3. **Keyword check**: No trigger word found → text is appended directly, with no AI call <br>
4. **Transcript updated**: Appends the new text <br>

Then later: <br>

1. **User speaks**: "Actually, change pizza to hamburgers" <br>
2. **Transcriber produces**: "Actually, change pizza to hamburgers." <br>
3. **AI processes**: Trigger words ("Actually", "change") detected → AI returns the full edited transcript <br>
4. **Transcript updated**: Replaces entire transcript with edited version <br>
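Both flows can be replayed offline by swapping the real model for a stub. The sketch below assumes a hypothetical `fake_llm` coroutine that only knows the pizza-to-hamburgers edit; `process` is a simplified, illustrative stand-in for `TranscriptEditor.process_chunk`, not the class itself.

```python
import asyncio
import re

# Same trigger words as TranscriptEditor.EDIT_KEYWORDS
EDIT_KEYWORDS = {
    "change", "replace", "delete", "remove", "fix", "correct",
    "modify", "edit", "scratch", "actually", "wait",
    "no", "instead", "undo", "oops", "mistake", "wrong",
}

async def fake_llm(transcript: str, chunk: str) -> str:
    """Hypothetical stand-in for the real model: handles only the pizza edit."""
    if "change pizza to hamburgers" in chunk.lower():
        return transcript.replace("pizza", "hamburgers")
    return "APPEND"

async def process(transcript: str, chunk: str, llm) -> tuple[str, str, bool]:
    """Return (new_transcript, action, ai_called) for one chunk."""
    words = set(re.findall(r"[a-z']+", chunk.lower()))
    if EDIT_KEYWORDS.isdisjoint(words):
        # Fast path: no trigger word, append without calling the model
        return transcript + chunk + "\n", "append", False
    # AI path: the model sees the full transcript plus the new chunk
    reply = (await llm(transcript, chunk)).strip()
    if reply.startswith("APPEND"):
        return transcript + chunk + "\n", "append", True
    return (reply + "\n" if reply else ""), "edit", True

async def main() -> str:
    transcript = ""
    for chunk in ["I love pizza.", "Actually, change pizza to hamburgers."]:
        transcript, action, ai_called = await process(transcript, chunk, fake_llm)
        print(f"{chunk!r}: action={action}, ai_called={ai_called}")
    return transcript

print(asyncio.run(main()), end="")
# 'I love pizza.': action=append, ai_called=False
# 'Actually, change pizza to hamburgers.': action=edit, ai_called=True
# I love hamburgers.
```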

### Context Awareness

Each AI call uses a fresh `Chat` instance, so no `hist` (history) is carried from one call to the next. Instead, every prompt includes the complete current transcript alongside the latest chunk. This allows the model to: <br>
- Understand references to "that", "it", "the last part" <br>
- See everything that has been said so far in the session <br>
- Make informed decisions about edit scope <br>
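To see why no history is needed, it helps to look at what the model actually receives. The function below rebuilds the prompt string the same way `process_chunk` formats it; `build_prompt` is a hypothetical name used only for this illustration. A reference like "that first sentence" can be resolved because the first sentence is right there in the prompt.

```python
def build_prompt(transcript: str, chunk: str) -> str:
    """Rebuild the user prompt exactly as process_chunk's f-string formats it."""
    return f"Current transcript:\n{transcript}\n\nLatest speech:\n{chunk}"

transcript = "My name is Batman.\nI love hamburgers.\n"
chunk = "Maybe even delete that first sentence about my name."
prompt = build_prompt(transcript, chunk)
print(prompt)
```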

### Token Management

The editor tracks token usage across all API calls: <br>
- `tokens_used`: Tokens consumed by the current chunk (0 when no AI call was made) <br>
- `total_tokens`: Cumulative tokens across the entire session <br>

This helps users monitor API costs and understand the computational overhead of AI-assisted editing.
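The bookkeeping behind these two fields can be sketched as below. `SimpleNamespace` objects only mimic the attribute shape of a response (`response.usage.total_tokens`), and the token counts are made up for illustration; the helper `tokens_from` is hypothetical. The sketch assumes a missing or empty `usage` block should count as 0 rather than raise.

```python
from types import SimpleNamespace

def tokens_from(response) -> int:
    """Return usage.total_tokens, or 0 when there is no response or no usage info."""
    usage = getattr(response, "usage", None)
    return getattr(usage, "total_tokens", 0) or 0

# Stand-ins for one session (made-up numbers)
session = [
    None,                                                      # fast path: no AI call
    SimpleNamespace(usage=SimpleNamespace(total_tokens=120)),  # AI call
    SimpleNamespace(usage=None),                               # AI call, usage missing
    SimpleNamespace(usage=SimpleNamespace(total_tokens=80)),   # AI call
]

total_tokens = 0
for response in session:
    tokens_used = tokens_from(response)
    total_tokens += tokens_used
    print(f"tokens_used={tokens_used}, total_tokens={total_tokens}")
# tokens_used=0, total_tokens=0
# tokens_used=120, total_tokens=120
# tokens_used=0, total_tokens=120
# tokens_used=80, total_tokens=200
```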

### Error Handling

If the AI processing fails, the system gracefully degrades to simple append mode, ensuring transcription continues even if the AI service is unavailable.
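One way to get this fallback is to wrap the model call in `try`/`except` and treat any failure as an "APPEND" reply. The sketch below assumes a hypothetical `flaky_llm` coroutine that always raises, standing in for an unreachable provider; `process_with_fallback` is likewise illustrative, not the module's API.

```python
import asyncio

async def flaky_llm(transcript: str, chunk: str) -> str:
    """Hypothetical stand-in model call that always fails."""
    raise ConnectionError("provider unavailable")

async def process_with_fallback(transcript: str, chunk: str, llm) -> tuple[str, str]:
    """Return (new_transcript, action); any AI failure degrades to a plain append."""
    try:
        reply = (await llm(transcript, chunk)).strip()
    except Exception as exc:
        print(f"AI call failed ({exc}); appending instead")
        reply = "APPEND"
    if reply.startswith("APPEND"):
        return transcript + chunk + "\n", "append"
    return (reply + "\n" if reply else ""), "edit"

print(asyncio.run(process_with_fallback("I love pizza.\n", "Actually, make it tacos.", flaky_llm)))
# AI call failed (provider unavailable); appending instead
# ('I love pizza.\nActually, make it tacos.\n', 'append')
```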

```python
#| eval: false
from dotenv import load_dotenv
load_dotenv()  # load provider API keys (e.g. OPENAI_API_KEY) from a .env file

# Test the TranscriptEditor
import asyncio

async def test_editor():
    editor = TranscriptEditor("openai/gpt-4o-mini")

    # Simulate transcription chunks (process_chunk adds the newline itself)
    chunks = [
        "My name is Batman.",
        "I love pizza.",
        "This transcriber is working quite well.",
        "Actually, change pizza to hamburgers.",
        "Maybe even delete that first sentence about my name."
    ]

    for chunk in chunks:
        result = await editor.process_chunk(chunk)
        print(f"\n--- Chunk: {chunk.strip()}")
        print(f"AI Called: {result['ai_called']}")
        print(f"Action: {result['action']}")
        print(f"Tokens used: {result['tokens_used']}")
        print(f"Current transcript:\n{result['transcript']}")

# Run the test (top-level await works inside Jupyter)
await test_editor()
```


--- Chunk: My name is Batman.

Action: append
Current transcript:

Tokens used: 155

--- Chunk: I love pizza.

Action: append
Current transcript:

Tokens used: 169

--- Chunk: This transcriber is working quite well.
Action: append
Current transcript:

Tokens used: 187

--- Chunk: Actually, change pizza to hamburgers.
Action: edit
Current transcript:
My name is Batman.  
I love hamburgers.  
This transcriber is working quite well.
Tokens used: 223

--- Chunk: Maybe even delete that first sentence about my name.
Action: edit
Current transcript:
I love hamburgers.  
This transcriber is working quite well.
Tokens used: 255
