# cjm-transcription-plugin-system

> A flexible plugin system for audio transcription intended to make it easy to add support for multiple backends.

## Install

```bash
pip install cjm_transcription_plugin_system
```

## Project Structure

```
nbs/
├── core.ipynb             # DTOs for audio transcription with FileBackedDTO support for zero-copy transfer
└── plugin_interface.ipynb # Domain-specific plugin interface for audio transcription
```

Total: 2 notebooks

## Module Dependencies

```mermaid
graph LR
    core[core<br/>Core Data Structures]
    plugin_interface[plugin_interface<br/>Transcription Plugin Interface]

    plugin_interface --> core
```

*1 cross-module dependency detected*

## CLI Reference

No CLI commands found in this project.

## Module Overview

Detailed documentation for each module in the project:

### Core Data Structures (`core.ipynb`)
> DTOs for audio transcription with FileBackedDTO support for zero-copy transfer

#### Import

```python
from cjm_transcription_plugin_system.core import (
    AudioData,
    TranscriptionResult
)
```

#### Classes

```python
@dataclass
class AudioData:
    """
    Container for raw audio data.
    Implements FileBackedDTO for zero-copy transfer between Host and Worker processes.
    """
    
    samples: np.ndarray  # Audio sample data as numpy array
    sample_rate: int  # Sample rate in Hz (e.g., 16000, 44100)
    
    def to_temp_file(self) -> str: # Absolute path to temporary WAV file
        """
        Save audio to a temp file for zero-copy transfer to Worker process.
        
        The file is created with delete=False so the Worker can read it,
        and samples are converted to float32 if needed.
        """
    
    def to_dict(self) -> Dict[str, Any]: # Serialized representation
        "Convert to dictionary for smaller payloads (samples are serialized as a list)."
    
    @classmethod
    def from_file(
        cls,
        filepath: str # Path to audio file
    ) -> "AudioData": # AudioData instance
        "Load audio from a file."
```
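
The file-backed hand-off described in the `AudioData` docstring can be sketched with the standard library alone. The `PcmAudio` class and `worker_read` function below are hypothetical stand-ins, not part of this package: the real `AudioData` holds a NumPy array and targets float32, whereas this sketch writes 16-bit PCM through the `wave` module so that it runs without third-party dependencies.

```python
import array
import os
import tempfile
import wave
from dataclasses import dataclass
from typing import List


@dataclass
class PcmAudio:
    """Hypothetical stand-in for AudioData that holds plain Python floats."""

    samples: List[float]  # Mono samples in the range [-1.0, 1.0]
    sample_rate: int      # Sample rate in Hz

    def to_temp_file(self) -> str:
        """Write the samples to a temporary WAV file and return its absolute path."""
        # delete=False keeps the file on disk after close so another process can open it
        tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
        tmp.close()
        # Clip to [-1, 1] and scale to signed 16-bit integers (assumption: 16-bit PCM)
        pcm = array.array(
            "h", (int(max(-1.0, min(1.0, s)) * 32767) for s in self.samples)
        )
        with wave.open(tmp.name, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)
            wav.setframerate(self.sample_rate)
            wav.writeframes(pcm.tobytes())
        return os.path.abspath(tmp.name)


def worker_read(path: str) -> int:
    """Simulate the Worker side: open the file by path and return the frame count."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes()


audio = PcmAudio(samples=[0.0, 0.5, -0.5, 1.0], sample_rate=16000)
path = audio.to_temp_file()
frame_count = worker_read(path)
os.remove(path)  # The caller cleans up, because delete=False leaves the file behind
```

Only the path string crosses the process boundary in this pattern, and the sample data stays on disk until the receiver opens it. Because the file is created with `delete=False`, something has to remove it explicitly once the receiver is done.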

```python
@dataclass
class TranscriptionResult:
    "Standardized output for all transcription plugins."
    
    text: str  # The transcribed text
    confidence: Optional[float]  # Overall confidence (0.0 to 1.0)
    segments: Optional[List[Dict[str, Any]]]  # Timestamped segments
    metadata: Dict[str, Any] = field(...)  # Additional metadata
```
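
As a rough illustration of how these fields fit together, the sketch below mirrors them in a local dataclass. `ResultSketch` is a hypothetical stand-in; the `None` defaults, the `default_factory=dict` default, and the `start`/`end`/`text` segment keys are assumptions made for the example, not the package's documented schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ResultSketch:
    """Hypothetical mirror of the TranscriptionResult fields listed above."""

    text: str                                        # The transcribed text
    confidence: Optional[float] = None               # Assumed default
    segments: Optional[List[Dict[str, Any]]] = None  # Assumed default
    metadata: Dict[str, Any] = field(default_factory=dict)  # Assumed default


result = ResultSketch(
    text="hello world",
    confidence=0.92,
    segments=[
        # Segment keys are illustrative; each plugin may use its own layout
        {"start": 0.0, "end": 0.4, "text": "hello"},
        {"start": 0.4, "end": 0.9, "text": "world"},
    ],
    metadata={"backend": "example"},
)

# Rebuild the full text from the timestamped segments
joined = " ".join(seg["text"] for seg in result.segments)
```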


### Transcription Plugin Interface (`plugin_interface.ipynb`)
> Domain-specific plugin interface for audio transcription

#### Import

```python
from cjm_transcription_plugin_system.plugin_interface import (
    TranscriptionPlugin
)
```

#### Classes

```python
class TranscriptionPlugin(PluginInterface):
    """
    Abstract base class for all transcription plugins.
    
    Extends PluginInterface with transcription-specific requirements:
    - `supported_formats`: List of audio file extensions this plugin can handle
    - `execute`: Accepts audio path (str) or AudioData, returns TranscriptionResult
    
    NOTE: When running via RemotePluginProxy, AudioData objects are automatically
    serialized to temp files via FileBackedDTO, so the Worker receives a file path.
    """
    
    def supported_formats(self) -> List[str]: # e.g., ['wav', 'mp3', 'flac']
        "List of supported audio file extensions (without the dot)."
    
    @abstractmethod
    def execute(
        self,
        audio: Union[AudioData, str, Path], # Audio data or file path
        **kwargs
    ) -> TranscriptionResult: # Transcription result with text, confidence, segments
        """
        Transcribe audio to text.
        
        When called via Proxy, AudioData is auto-converted to a file path string
        before reaching this method in the Worker process.
        """
```
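
A backend built on this interface might look roughly like the sketch below. `PluginSketch`, `ResultSketch`, and `EchoPlugin` are hypothetical stand-ins defined locally so the example runs without the package installed. Marking `supported_formats` as abstract, the extension check inside `execute`, and the `language` keyword are likewise assumptions, not documented behavior.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Union


@dataclass
class ResultSketch:
    """Hypothetical stand-in for TranscriptionResult (subset of fields)."""

    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class PluginSketch(ABC):
    """Hypothetical stand-in for TranscriptionPlugin."""

    @abstractmethod
    def supported_formats(self) -> List[str]:
        """List of supported audio file extensions (without the dot)."""

    @abstractmethod
    def execute(self, audio: Union[str, Path], **kwargs) -> ResultSketch:
        """Transcribe audio to text."""


class EchoPlugin(PluginSketch):
    """Toy backend that returns a placeholder transcript instead of running a model."""

    def supported_formats(self) -> List[str]:
        return ["wav", "flac"]

    def execute(self, audio: Union[str, Path], **kwargs) -> ResultSketch:
        # In the Worker process the audio argument arrives as a file path
        path = Path(audio)
        ext = path.suffix.lstrip(".").lower()
        if ext not in self.supported_formats():
            raise ValueError(f"Unsupported format: {ext!r}")
        return ResultSketch(
            text=f"<transcript of {path.name}>",
            metadata={"format": ext},
        )


plugin = EchoPlugin()
result = plugin.execute("clips/intro.WAV", language="en")
```

Normalizing the extension to lower case before comparing it with `supported_formats()` keeps the check independent of how the file happens to be named on disk.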
