# Technical Documentation
CaptionMate follows the Model-View-ViewModel (MVVM) architecture pattern for clean separation of concerns and maintainability.
```
┌─────────────────────────────────────────────────┐
│                   View Layer                    │
│         (SwiftUI Views - Presentation)          │
└────────────────┬────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────┐
│                 ViewModel Layer                 │
│       (ContentViewModel - Business Logic)       │
└────────────────┬────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────┐
│                   Model Layer                   │
│          (Data Models, Services, APIs)          │
└─────────────────────────────────────────────────┘
```
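In code, this layering looks roughly like the following minimal sketch (the type names and view contents here are illustrative, not the actual CaptionMate sources):

```swift
import SwiftUI

// Model layer: plain data owned by services.
struct Transcript {
    var text: String = ""
}

// ViewModel layer: observable state plus business logic.
@MainActor
final class ExampleViewModel: ObservableObject {
    @Published var transcript = Transcript()

    func transcribe() {
        // Delegate to a service (e.g., WhisperService) and publish the result.
        transcript.text = "..."
    }
}

// View layer: declarative presentation, no business logic.
struct ExampleView: View {
    @StateObject private var viewModel = ExampleViewModel()

    var body: some View {
        VStack {
            Text(viewModel.transcript.text)
            Button("Transcribe") { viewModel.transcribe() }
        }
    }
}
```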
**Project Structure**

```
CaptionMate/
├── App/
│   └── CaptionMateApp.swift             # App entry point
├── Configuration/
│   ├── Constants/
│   │   └── Constants.swift              # App constants
│   └── Extensions/
│       ├── Ex+Animation.swift           # Animation helpers
│       ├── Ex+Color.swift               # Color extensions
│       ├── Ex+NSSavePanel.swift         # File save dialog
│       ├── Ex+Theme.swift               # Theme management
│       ├── Ex+TimeFormatter.swift       # Time formatting
│       └── Ex+UTType.swift              # File type utilities
├── Source/
│   ├── Data/
│   │   ├── Models/
│   │   │   ├── StateModels.swift        # State management
│   │   │   └── DownloadError.swift      # Error types
│   │   └── Services/
│   │       ├── WhisperService.swift     # Whisper integration
│   │       ├── ExportService.swift      # Subtitle export
│   │       └── FileTypeService.swift    # File type handling
│   └── Presentation/
│       ├── ViewModels/
│       │   └── ContentViewModel.swift   # Main ViewModel
│       └── Views/
│           ├── ContentView.swift        # Main view
│           ├── AudioViews/              # Audio UI components
│           ├── ModelManagementViews/    # Model management UI
│           └── TranscriptionViews/      # Transcription UI
└── Tests/
    ├── CaptionMateTests/                # Unit tests
    └── CaptionMateUITests/              # UI tests
```
**ContentViewModel**

Purpose: Central ViewModel managing all app state and business logic.
Key Responsibilities:
- Model lifecycle management
- Audio file processing
- Transcription orchestration
- Export coordination
- User settings persistence
Published Properties:

```swift
@Published var whisperKit: WhisperKit?
@Published var transcriptionState: TranscriptionState
@Published var modelManagementState: ModelManagementState
@Published var audioState: AudioState
@Published var uiState: UIState
@Published var transcriptionResult: TranscriptionResult?
@Published var audioPlayer: AVAudioPlayer?
```

Key Methods:

```swift
// Model Management
func loadModel(_ model: String, redownload: Bool = false)
func releaseModel() async
func downloadModel(_ model: String)
func cancelDownload(_ model: String)
func deleteModel(_ model: String)
// Transcription
func transcribeFile(path: String)
func transcribeAudioSamples(_ samples: [Float]) async throws -> TranscriptionResult?
// Audio Control
func playImportedAudio()
func pauseImportedAudio()
func seekToPosition(_ position: Double)
func setVolume(_ volume: Double)
// Export
func exportTranscription() async
```

**State Models**

```swift
struct TranscriptionState {
    var currentText: String
    var currentChunks: [Int: (chunkText: [String], fallbacks: Int)]
    var confirmedSegments: [TranscriptionSegment]

    // Performance metrics
    var tokensPerSecond: TimeInterval
    var effectiveRealTimeFactor: TimeInterval
    var effectiveSpeedFactor: TimeInterval
    var totalInferenceTime: TimeInterval
}

struct ModelManagementState {
    var modelState: ModelState
    var availableModels: [String]
    var localModels: [String]
    var downloadProgress: [String: Float]
    var currentDownloadingModels: Set<String>
    var modelSizes: [String: Int64]
}

struct AudioState {
    var importedAudioURL: URL?
    var audioFileName: String
    var isPlaying: Bool
    var isTranscribing: Bool
    var totalDuration: Double
    var waveformSamples: [Float]
}

struct UIState {
    var showAdvancedOptions: Bool
    var isModelmanagerViewPresented: Bool
    var showDownloadErrorAlert: Bool
    var downloadError: DownloadError?
}
```

Transcription pipeline:

```
Audio File → AudioProcessor → WhisperKit → DecodingOptions → TranscriptionResult → Export
```
Process Flow:

1. **Audio Loading**

   ```swift
   let samples = try AudioProcessor.loadAudioAsFloatArray(fromPath: path)
   ```

2. **Model Inference**

   ```swift
   let results = try await whisperKit.transcribe(
       audioArray: samples,
       decodeOptions: options,
       callback: progressCallback
   )
   ```

3. **DecodingOptions Configuration**

   ```swift
   DecodingOptions(
       task: .transcribe,
       language: languageCode,
       temperature: Float(temperatureStart),
       temperatureFallbackCount: Int(fallbackCount),
       sampleLength: sampleLength,
       usePrefillPrompt: enablePromptPrefill,
       usePrefillCache: enableCachePrefill,
       skipSpecialTokens: !enableSpecialCharacters,
       withoutTimestamps: !enableTimestamps,
       wordTimestamps: enableWordTimestamp,
       concurrentWorkerCount: Int(concurrentWorkerCount),
       chunkingStrategy: chunkingStrategy
   )
   ```

4. **Result Processing** (see the sketch after this list)
   - Segment cleaning and sorting
   - Overlap adjustment
   - Empty segment removal

5. **Export**
   - Format conversion (SRT/WebVTT/JSON/FCPXML)
   - File writing with user-selected path
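Steps 1-3 are shown with code above; step 4 has no code in this document, so here is a plausible sketch of the cleanup it describes (the `Segment` type and helper name are illustrative, not CaptionMate's actual implementation):

```swift
import Foundation

// Hypothetical segment shape; WhisperKit's TranscriptionSegment differs in details.
struct Segment {
    var start: Double
    var end: Double
    var text: String
}

// Post-processing pass: drop empty segments, sort by start time,
// and clamp overlapping boundaries to the next segment's start.
func cleanSegments(_ segments: [Segment]) -> [Segment] {
    var sorted = segments
        .filter { !$0.text.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty }
        .sorted { $0.start < $1.start }

    guard sorted.count > 1 else { return sorted }
    for i in 0..<(sorted.count - 1) where sorted[i].end > sorted[i + 1].start {
        sorted[i].end = sorted[i + 1].start  // overlap adjustment
    }
    return sorted
}
```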
Supported Models:
| Model | Size | Speed | Memory | Use Case |
|---|---|---|---|---|
| Tiny | ~150MB | ~10x RT | ~500MB | Quick drafts |
| Base | ~250MB | ~8x RT | ~800MB | General use |
| Small | ~500MB | ~5x RT | ~1.5GB | Balanced |
| Medium | ~1.5GB | ~3x RT | ~3GB | High quality |
| Large-v3 | ~3GB | ~2x RT | ~6GB | Maximum accuracy |
RT = Real-time factor (1x = same duration as audio)
Model States:
```swift
enum ModelState: String {
    case unloaded     // No model in memory
    case downloading  // Downloading from remote
    case downloaded   // Downloaded, not loaded
    case prewarming   // Optimizing for device
    case loading      // Loading into memory
    case loaded       // Ready for use
    case unloading    // Releasing resources
}
```

Download Management:
- Concurrent downloads (max 3)
- Progress tracking (weighted by model size)
- Automatic disk space verification (sketched below)
- Graceful cancellation handling
- Partial download cleanup
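The disk-space check can be implemented with standard URL resource values; a minimal sketch (the function name and threshold handling are assumptions, not the actual CaptionMate code):

```swift
import Foundation

// Returns true if the volume holding `directory` has at least `required` bytes free.
func hasEnoughDiskSpace(at directory: URL, required: Int64) -> Bool {
    let values = try? directory.resourceValues(
        forKeys: [.volumeAvailableCapacityForImportantUsageKey]
    )
    guard let available = values?.volumeAvailableCapacityForImportantUsage else {
        return false  // capacity unknown: fail safe and refuse the download
    }
    return available >= required
}
```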
**Export Format Examples**

SRT (SubRip):

```
1
00:00:00,000 --> 00:00:05,000
Subtitle text here

2
00:00:05,000 --> 00:00:10,000
Next subtitle
```
Timestamp Format:
```swift
func toSRTTimestamp() -> String {
    let hours = Int(self) / 3600
    let minutes = (Int(self) % 3600) / 60
    let seconds = Int(self) % 60
    let milliseconds = Int((self - Double(Int(self))) * 1000)
    return String(format: "%02d:%02d:%02d,%03d", hours, minutes, seconds, milliseconds)
}
```

WebVTT:

```
WEBVTT
00:00:00.000 --> 00:00:05.000
Subtitle text here
00:00:05.000 --> 00:00:10.000
Next subtitle
```
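WebVTT differs from SRT mainly in the millisecond separator (`.` instead of `,`), and the export code below reuses exactly that substitution. A VTT counterpart to the SRT formatter could therefore be as small as this sketch (the helper name is an assumption):

```swift
import Foundation

extension TimeInterval {
    // Same layout as toSRTTimestamp(), but WebVTT uses '.' before milliseconds.
    func toVTTTimestamp() -> String {
        toSRTTimestamp().replacingOccurrences(of: ",", with: ".")
    }
}
```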
JSON:

```json
{
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 5.0,
      "text": "Subtitle text here"
    }
  ],
  "text": "Full transcription text",
  "language": "en"
}
```

FCPXML:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE fcpxml>
<fcpxml version="1.11">
    <resources>
        <format id="r1" frameDuration="100/3000s" width="1920" height="1080"/>
    </resources>
    <library>
        <event name="Subtitles">
            <project name="Subtitle Project">
                <sequence format="r1">
                    <spine>
                        <title offset="0/30s" duration="150/30s">
                            <text>Subtitle text here</text>
                        </title>
                    </spine>
                </sequence>
            </project>
        </event>
    </library>
</fcpxml>
```

**Advanced Settings**

| Setting | Type | Range | Default | Description |
|---|---|---|---|---|
| Temperature | Double | 0.0 - 1.0 | 0.0 | Sampling randomness |
| Fallback Count | Int | 0 - 10 | 5 | Temperature fallback attempts |
| Sample Length | Int | 224 - 448 | 224 | Audio sample window |
| Compression Check | Int | 20 - 100 | 60 | Token window for compression check |
| Concurrent Workers | Int | 1 - 8 | 4 | Parallel processing workers |
**Chunking Strategies**

```swift
enum ChunkingStrategy {
    case vad             // Voice Activity Detection
    case none            // No chunking
    case timeBasedSplit  // Fixed time intervals
}
```

VAD (Voice Activity Detection):
- Automatically detects speech segments
- Skips silent portions
- More efficient processing
- Better segment boundaries
Time-Based Split:
- Fixed duration chunks
- Predictable processing time
- Useful for continuous speech
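Either strategy is selected through the same `DecodingOptions` shown earlier; for example, a minimal sketch assuming WhisperKit's defaulted initializer:

```swift
import WhisperKit

// VAD chunking suits long recordings with pauses; .none suits short clips.
let options = DecodingOptions(
    task: .transcribe,
    chunkingStrategy: .vad
)
```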
**Theme Management**

```swift
enum AppTheme: String, CaseIterable, Identifiable {
    case light = "Light"
    case dark = "Dark"
    case auto = "Auto"

    var id: String { rawValue }  // Identifiable conformance

    var colorScheme: ColorScheme? {
        switch self {
        case .light: return .light
        case .dark: return .dark
        case .auto:
            let appearance = NSApp.effectiveAppearance
            return appearance.bestMatch(from: [.darkAqua, .aqua]) == .darkAqua ? .dark : .light
        }
    }
}
```

All custom colors use `NSColor.dynamicProvider` for automatic theme adaptation:

```swift
static var transcriptionBackground: Color {
    Color(nsColor: NSColor(name: nil, dynamicProvider: { appearance in
        appearance.bestMatch(from: [.darkAqua, .aqua]) == .darkAqua
            ? NSColor(Color(red: 0.12, green: 0.12, blue: 0.14)) // Dark
            : NSColor(Color(red: 0.99, green: 0.99, blue: 1.0))  // Light
    }))
}
```

Custom Colors:
- `transcriptionBackground`
- `audioControlBackground`
- `audioControlSectionBackground`
- `audioControlContainerBackground`
- `audioControlDivider`
- `audioPlaceholderBackground`
- `waveformBackground`
- `controlButtonForeground`
- `modelSelectorText`
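Views can then use these directly; the dynamic provider resolves the color per appearance at draw time. A usage sketch (the view content is illustrative):

```swift
// The background adapts to light/dark mode without any extra code.
Text(segment.text)
    .padding()
    .background(Color.transcriptionBackground)
```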
Test Suites:

- `ContentViewModelTests`: ViewModel state management
- `FileTypeServiceTests`: File type conversions
- `TimeIntervalExtensionTests`: Time formatting
- `ColorExtensionTests`: Theme colors
- `StateModelsTests`: State initialization
- `ExportServiceTests`: Subtitle export
Example Test:

```swift
@Test("SRT timestamp conversion test")
func testToSRTTimestamp() throws {
    let result = TimeInterval(3661.5).toSRTTimestamp()
    #expect(result == "01:01:01,500")
}
```

Test Classes:
- `BasicUITests`: App launch and UI elements (single launch)
- `InteractionTests`: User interactions (fresh launch per test)
- `PerformanceTests`: Launch performance metrics
Test Strategy:

- Use `waitForExistence(timeout:)` for reliable element detection
- Support localization with `NSPredicate` queries
- Minimize app launches for speed
- Use `accessibilityIdentifier` for stable element references (sketched below)
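A sketch of that strategy in XCTest (the identifier and label strings are illustrative, not the app's actual values):

```swift
import XCTest

final class ExampleUITests: XCTestCase {
    func testTranscribeControlsAppear() {
        let app = XCUIApplication()
        app.launch()

        // Stable lookup via accessibilityIdentifier rather than a localized label.
        let button = app.buttons["transcribeButton"]
        XCTAssertTrue(button.waitForExistence(timeout: 5))

        // Case-insensitive NSPredicate query that tolerates label variations.
        let transcribeTexts = app.staticTexts
            .matching(NSPredicate(format: "label CONTAINS[c] %@", "transcribe"))
        XCTAssertGreaterThan(transcribeTexts.count, 0)
    }
}
```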
**Performance Benchmarks**

| Model | Real-Time Factor | 1 Hour Audio | Memory Usage |
|---|---|---|---|
| Tiny | ~10x | ~6 minutes | ~500MB |
| Base | ~8x | ~7.5 minutes | ~800MB |
| Small | ~5x | ~12 minutes | ~1.5GB |
| Medium | ~3x | ~20 minutes | ~3GB |
| Large | ~2x | ~30 minutes | ~6GB |
Memory Management:

```swift
let samples = try await Task {
    try autoreleasepool {
        try AudioProcessor.loadAudioAsFloatArray(fromPath: path)
    }
}.value
```

UI Performance:
- `LazyVStack` for long lists
- Combine timers for smooth updates (30fps)
- Debounced progress updates (0.5s interval; sketched below)
- Background task execution
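The debounced updates can be expressed with Combine; a minimal sketch (the subject and property names are assumptions):

```swift
import Combine
import Foundation

final class ProgressReporter: ObservableObject {
    @Published var displayedProgress: Double = 0
    let rawProgress = PassthroughSubject<Double, Never>()
    private var cancellables = Set<AnyCancellable>()

    init() {
        // Coalesce bursts of progress callbacks: the UI only updates once
        // the stream has been quiet for 0.5s.
        rawProgress
            .debounce(for: .seconds(0.5), scheduler: RunLoop.main)
            .sink { [weak self] value in self?.displayedProgress = value }
            .store(in: &cancellables)
    }
}
```

`throttle` would be an alternative if updates should keep firing during a continuous stream rather than after it settles.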
Audio Processing:
- Chunking strategies (VAD/time-based)
- Parallel worker processing
- Automatic volume normalization
**Model Loading Pipeline**

```
1. Check Disk Space
         ↓
2. Initialize WhisperKit
         ↓
3. Download Model (if needed)
         ↓
4. Prewarm Model (optimize for device)
         ↓
5. Load Model (into memory)
         ↓
6. Ready for Transcription
```
Progress Stages:
- Initialization: 0% - 20%
- Prewarming: 20% - 70%
- Loading: 70% - 100%
Features:
- Concurrent downloads (max 3 simultaneous)
- Weighted progress calculation by model size
- Automatic retry on transient failures
- Cancellation with cleanup
- Network error detection
Error Handling:
```swift
enum DownloadError {
    case networkOffline         // No internet
    case networkLost            // Connection dropped
    case cannotConnectToServer  // Server unreachable
    case timeout                // Request timeout
    case diskSpaceFull          // Insufficient storage
    case permissionDenied       // File access denied
    case fileNotFound           // Missing file
    case unknown(String)        // Other errors
}
```

**Waveform Visualization**

Process:
- Load audio as Float array
- Compute RMS (Root Mean Square) values
- Chunk into 1024-sample segments
- Generate visual representation
Implementation:

```swift
private func computeWaveform(from samples: [Float]) -> [Float] {
    let chunkSize = 1024
    var rmsValues = [Float]()
    var index = 0
    while index < samples.count {
        let chunk = samples[index..<min(index + chunkSize, samples.count)]
        let sumSquares = chunk.reduce(0) { $0 + $1 * $1 }
        let rms = sqrt(sumSquares / Float(chunk.count))
        rmsValues.append(rms)
        index += chunkSize
    }
    return rmsValues
}
```

**Audio Normalization**

Target: -14 LUFS (Loudness Units relative to Full Scale)
Process:
- Sample audio at multiple points
- Calculate average power (dB)
- Estimate LUFS
- Compute gain adjustment
- Apply attenuation only (no amplification)
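A sketch of the gain computation under those rules (the LUFS estimate itself is simplified here; the real pipeline derives it from the averaged power measurements described above):

```swift
import Foundation

// Attenuation-only normalization toward a -14 LUFS target.
func normalizationGain(estimatedLUFS: Double, target: Double = -14.0) -> Float {
    let gainDB = target - estimatedLUFS      // negative when the audio is too loud
    guard gainDB < 0 else { return 1.0 }     // never amplify, only attenuate
    return Float(pow(10.0, gainDB / 20.0))   // dB → linear amplitude factor
}

// e.g. audio measured at -8 LUFS → gain of 10^(-6/20) ≈ 0.5 (about -6 dB)
```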
**Localization**

File: `Localizable.xcstrings`
Structure:

```json
{
  "sourceLanguage": "en",
  "strings": {
    "key": {
      "localizations": {
        "en": { "stringUnit": { "value": "English text" } },
        "ko": { "stringUnit": { "value": "한국어 텍스트" } }
      }
    }
  }
}
```

Implementation:
```swift
.environment(\.locale, Locale(identifier: appLanguage))
```

Features:
- No app restart required
- Instant UI update
- Menu bar localization
- Sheet content localization
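Putting the pieces together, the restart-free switch is an `@AppStorage` value driving the environment; a sketch (the storage key and view contents are assumptions):

```swift
import SwiftUI

struct RootView: View {
    // Persisted language code; changing it re-renders the hierarchy immediately.
    @AppStorage("appLanguage") private var appLanguage = "en"

    var body: some View {
        VStack {
            Text("greeting_key")  // resolved from Localizable.xcstrings per locale
            Picker("Language", selection: $appLanguage) {
                Text("English").tag("en")
                Text("한국어").tag("ko")
            }
        }
        .environment(\.locale, Locale(identifier: appLanguage))
    }
}
```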
**Error Mapping**

Network Errors:

```swift
if nsError.domain == NSURLErrorDomain {
    switch nsError.code {
    case NSURLErrorNotConnectedToInternet:
        return .networkOffline
    case NSURLErrorNetworkConnectionLost:
        return .networkLost
    // ...
    }
}
```

File System Errors:
```swift
if nsError.domain == NSCocoaErrorDomain {
    switch nsError.code {
    case NSFileWriteOutOfSpaceError:
        return .diskSpaceFull
    case NSFileWriteNoPermissionError:
        return .permissionDenied
    // ...
    }
}
```

**Recovery Mechanisms**

Automatic:
- CoreML cache clearing on space issues
- Model reload retry on MPSGraph errors
- Partial download cleanup on cancellation
User-Initiated:
- Retry button on download failures
- Manual cache clearing
- State reset option
**Memory Optimizations**

Autoreleasepool:

```swift
try autoreleasepool {
    try AudioProcessor.loadAudioAsFloatArray(fromPath: path)
}
```

Resource Cleanup:
```swift
func releaseModel() async {
    await whisperKit?.unloadModels()
    whisperKit = nil
    clearCoreMLRuntimeCache()
}
```

Lazy Loading:
```swift
LazyVStack {
    ForEach(models) { model in
        ModelRowView(model: model)
    }
}
```

Efficient Updates:
```swift
// 30fps playback timer
Timer.publish(every: 0.033, on: .main, in: .common)
    .autoconnect()
    .sink { [weak self] _ in
        self?.currentPlayerTime = player.currentTime
    }
```

**Build Requirements**

- Xcode: 15.0+
- macOS SDK: 15.0+
- Swift: 5.0+
- Swift Package Manager: For dependencies
```swift
dependencies: [
    .package(url: "https://github.com/argmaxinc/WhisperKit", from: "0.9.0")
]
```

Frameworks:
- SwiftUI
- AVFoundation
- Core ML
- AppKit
- Combine
- UniformTypeIdentifiers
SwiftFormat:

```bash
mint run swiftformat .
```

Validation Scripts:

```bash
scripts/check_whitespace.sh       # Trailing whitespace check
scripts/check_filename_spaces.sh  # Filename validation
scripts/check_copyright.sh        # Copyright header verification
scripts/style.sh                  # Code style check
```

CI Workflows:

```
name: CaptionMate CI
on: [push, pull_request]
jobs:
  build:
    - Checkout code
    - Setup Xcode
    - Build macOS app
    - Run unit tests
    - Run UI tests
```

```
name: Code Quality Check
on: [push, pull_request]
jobs:
  quality:
    - Check trailing whitespace
    - Validate filenames
    - Verify copyright headers
    - Run SwiftFormat
```

**User Interface**

Main Window:

- Split view with sidebar and main content
- Model selector in sidebar
- Audio control and transcription in main area
Model Management View:

- Model search and filtering
- Download progress visualization
- Model size and status display
- Grouped sections (Downloaded/Available)
Settings View:

- Grouped settings (Format/Quality/Performance)
- Slider controls with info buttons
- Real-time preview of settings
Audio Control View:

- Waveform visualization
- Playback controls
- Volume and speed adjustment
- File information display
Transcription View:

- Scrollable segment list
- Timestamped text display
- Text selection enabled
- Auto-scroll to latest segment
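The main-window arrangement maps naturally onto SwiftUI's `NavigationSplitView`; a rough sketch of the shape (view names and contents are illustrative, not the actual CaptionMate views):

```swift
import SwiftUI

struct MainWindow: View {
    var body: some View {
        NavigationSplitView {
            // Sidebar: model selector.
            List {
                Text("Model selector goes here")
            }
        } detail: {
            // Main area: audio controls above the transcription list.
            VStack {
                Text("Audio controls")
                Divider()
                Text("Transcription segments")
            }
        }
    }
}
```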
**WhisperKit Integration**

Initialization:

```swift
let config = WhisperKitConfig(
    computeOptions: ModelComputeOptions(
        audioEncoderCompute: .cpuAndNeuralEngine,
        textDecoderCompute: .cpuAndNeuralEngine
    ),
    verbose: true,
    logLevel: .debug
)
let whisperKit = try await WhisperKit(config)
```

Download:
```swift
let folder = try await WhisperKit.download(
    variant: modelName,
    from: repositoryName,
    progressCallback: { progress in
        // Update UI with progress.fractionCompleted
    }
)
```

Load:
```swift
try await whisperKit.prewarmModels()  // Optimize
try await whisperKit.loadModels()     // Load
```

Transcribe:
```swift
let results = try await whisperKit.transcribe(
    audioArray: samples,
    decodeOptions: options,
    callback: decodingCallback
)
```

**Data Flow**

```
User Action (Import File)
↓
AudioProcessor.loadAudioAsFloatArray()
↓
ContentViewModel.transcribeAudioSamples()
↓
WhisperKit.transcribe()
↓
DecodingCallback (Progress Updates)
↓
TranscriptionResult
↓
UI Update (TranscriptionView)
↓
User Action (Export)
↓
ExportService.exportTranscriptionResult()
↓
NSSavePanel (File Selection)
↓
File Written to Disk
```
State Persistence:

```
User Input → ViewModel (@Published) → SwiftUI View
                     ↓
          UserDefaults (@AppStorage)
                     ↓
           Persistent Storage
```
Weighted Download Progress:

```swift
private var totalDownloadProgress: Double {
    let downloadingModels = viewModel.modelManagementState.currentDownloadingModels
    if downloadingModels.isEmpty { return 0 }

    var totalSize: Int64 = 0
    var downloadedSize: Double = 0
    for model in downloadingModels {
        let modelSize = viewModel.modelManagementState.modelSizes[model] ?? 0
        let progress = Double(viewModel.modelManagementState.downloadProgress[model] ?? 0)
        totalSize += modelSize
        downloadedSize += Double(modelSize) * progress
    }
    return totalSize > 0 ? downloadedSize / Double(totalSize) : 0
}
```

Loading Progress Animation:

```swift
func updateProgressBar(startProgress: Float, targetProgress: Float, maxTime: TimeInterval) async {
    let progressRange = targetProgress - startProgress
    let decayConstant = Float(-log(1 - 0.95)) / Float(maxTime)
    let startTime = Date()

    while !Task.isCancelled {
        let elapsedTime = Date().timeIntervalSince(startTime)
        let decayFactor = exp(-decayConstant * Float(elapsedTime))
        let progressIncrement = progressRange * (1 - decayFactor)
        let currentProgress = startProgress + progressIncrement

        await MainActor.run {
            modelManagementState.loadingProgressValue = min(currentProgress, targetProgress)
        }

        if currentProgress >= targetProgress { break }
        try? await Task.sleep(nanoseconds: 200_000_000) // 0.2s
    }
}
```

**Supported Languages**

UI Languages:

- English (en)
- Korean (ko)
Transcription languages: all languages supported by the Whisper model:
- English, Spanish, French, German, Italian, Portuguese
- Chinese, Japanese, Korean
- Russian, Arabic, Hindi
- And 90+ more languages
**Supported Audio Formats**

- MP3
- M4A (AAC)
- WAV
- FLAC
- AIFF
- CAF
- Any format supported by AVFoundation
**Export Formats**

- SRT (SubRip)
- WebVTT (Web Video Text Tracks)
- JSON (JavaScript Object Notation)
- FCPXML (Final Cut Pro XML 1.11)
Custom UTTypes:

```swift
extension UTType {
    static var srt: UTType {
        UTType(exportedAs: "com.CaptionMate.srtsubtitle")
    }

    static var vtt: UTType {
        UTType(exportedAs: "com.CaptionMate.vttsubtitle")
    }

    static var fcpxml: UTType {
        UTType(importedAs: "com.apple.finalcutpro.fcpxml")
    }
}
```

**Build Configurations**

Debug:

- Optimization Level: None (-Onone)
- Debug Symbols: Yes
- Assertions: Enabled
- Dead Code Stripping: Yes
Release:

- Optimization Level: Aggressive (-O)
- Debug Symbols: No
- Assertions: Disabled
- Dead Code Stripping: Yes
- Whole Module Optimization: Yes
Deployment Target:

- macOS: 15.0 (Sequoia)
- Architecture: Universal (Apple Silicon + Intel)
**Key Patterns**

State publishing:

```swift
@Published var audioState: AudioState
// Automatically triggers UI updates when changed
```

ViewModel injection:

```swift
struct ContentView: View {
    @ObservedObject var viewModel: ContentViewModel
    // ViewModel injected from parent
}
```

Persistent settings:

```swift
@AppStorage("selectedModel") var selectedModel: String
// Persistent storage with UserDefaults
```

Async work:

```swift
func transcribeFile(path: String) {
    Task {
        try await transcribeCurrentFile(path: path)
    }
}
```

**UX Principles**

Progressive Disclosure:

- Basic controls visible by default
- Advanced settings in separate view
- Info buttons for detailed explanations
Visual Feedback:

- Progress bars for long operations
- Loading indicators for async tasks
- Visual state changes (colors, animations)
Drag and Drop:

```swift
.onDrop(of: [.fileURL], isTargeted: $isTargeted) { providers in
    handleDroppedFiles(providers: providers)
    return true
}
```

**API Reference: ContentViewModel**

Model Management:

```swift
func loadModel(_ model: String, redownload: Bool = false)
func releaseModel() async
func downloadModel(_ model: String)
func cancelDownload(_ model: String)
func deleteModel(_ model: String)
func fetchModels()
```

Transcription:

```swift
func transcribeFile(path: String)
func transcribeCurrentFile(path: String) async throws
func transcribeAudioSamples(_ samples: [Float]) async throws -> TranscriptionResult?
```

Audio Control:

```swift
func playImportedAudio()
func pauseImportedAudio()
func stopImportedAudio()
func seekToPosition(_ position: Double)
func seekToPositionInLine(lineIndex: Int, secondsPerLine: Double, ratio: Double, totalDuration: Double)
func setVolume(_ volume: Double)
func toggleMute()
func changePlaybackRate(faster: Bool)
func currentPlaybackRateText() -> String
func skipForward()
func skipBackward()
```

File Handling:

```swift
func selectFile()
func handleFilePicker(result: Result<[URL], Error>)
func handleDroppedFiles(providers: [NSItemProvider])
func deleteImportedAudio()
```

Export:

```swift
func exportTranscription() async
```

Settings:

```swift
func changeAppLanguage(to language: String)
func getCurrentLanguageDisplayName() -> String
var appTheme: AppTheme { get set }
```

**Property Wrappers**

`@Published` automatically triggers UI updates when changed:

```swift
@Published var transcriptionState = TranscriptionState()
// Any change to transcriptionState updates all observing views
```

`@AppStorage` persists values across app launches:

```swift
@AppStorage("enableTimestamps") var enableTimestamps: Bool = true
// Saved to UserDefaults, survives app restart
```

Environment values propagate through the view hierarchy:

```swift
.environment(\.locale, Locale(identifier: appLanguage))
// All child views receive updated locale
```

**Time Formatting**

SRT timestamp:

```swift
// Input: 3661.5 seconds
// Output: "01:01:01,500"
func toSRTTimestamp() -> String {
    let hours = Int(self) / 3600
    let minutes = (Int(self) % 3600) / 60
    let seconds = Int(self) % 60
    let milliseconds = Int((self - Double(Int(self))) * 1000)
    return String(format: "%02d:%02d:%02d,%03d", hours, minutes, seconds, milliseconds)
}
```

Time range:

```swift
// Input: start 3661.5 s, end 3666.5 s
// Output: "01:01:01.500 - 01:01:06.500"
func formatTimeRange(to endTime: TimeInterval) -> String {
    let startFormatted = self.toSRTTimestamp().replacingOccurrences(of: ",", with: ".")
    let endFormatted = endTime.toSRTTimestamp().replacingOccurrences(of: ",", with: ".")
    return "\(startFormatted) - \(endFormatted)"
}
```

**Export Implementation**

SRT writer:

```swift
func writeSRT(segments: [TranscriptionSegment], to url: URL) throws {
    var srtContent = ""
    for (index, segment) in segments.enumerated() {
        srtContent += "\(index + 1)\n"
        srtContent += "\(segment.start.toSRTTimestamp()) --> \(segment.end.toSRTTimestamp())\n"
        srtContent += "\(segment.text)\n\n"
    }
    try srtContent.write(to: url, atomically: true, encoding: .utf8)
}
```

FCPXML writer:

```swift
func writeFCPXML(segments: [TranscriptionSegment], frameRate: Double, to url: URL) throws {
    let frameDuration = "100/\(Int(frameRate * 100))s"
    // Generate XML structure with proper timing
    // Convert seconds to frame-based timing
    // Write to file
}
```

**Debugging**

WhisperKit Verbose Mode:
```swift
let config = WhisperKitConfig(
    verbose: true,
    logLevel: .debug
)
```

Console Output:
- Model loading progress
- Transcription metrics
- Error details
- Performance statistics
Available Metrics:
```swift
struct TranscriptionState {
    var tokensPerSecond: TimeInterval
    var effectiveRealTimeFactor: TimeInterval
    var effectiveSpeedFactor: TimeInterval
    var totalInferenceTime: TimeInterval
    var currentFallbacks: Int
    var currentEncodingLoops: Int
    var currentDecodingLoops: Int
}
```

**Release Process**

- Clean build folder
- Update version and build number
- Archive for distribution
- Automatic code signing
- Notarization submission
- App Store Connect upload
Format: MAJOR.MINOR.PATCH
- MAJOR: Breaking changes
- MINOR: New features (backward compatible)
- PATCH: Bug fixes
**Bug Reports**

Include the following details:

- macOS Version: [e.g., 15.0]
- App Version: [e.g., 1.0.0]
- Model Used: [e.g., openai_whisper-small]
- Audio Format: [e.g., MP3, 44.1kHz]
- File Duration: [e.g., 30 minutes]
- Error Message: [exact error text]
Email: parfume407@gmail.com
Response Time: 3-5 business days
© 2025 Harrison Cho. All rights reserved.