Technical Documentation

📐 Architecture Overview

MVVM Pattern

CaptionMate follows the Model-View-ViewModel (MVVM) architecture pattern for clean separation of concerns and maintainability.

┌─────────────────────────────────────────────────┐
│                    View Layer                    │
│         (SwiftUI Views - Presentation)          │
└────────────────┬────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────┐
│                 ViewModel Layer                  │
│         (ContentViewModel - Business Logic)     │
└────────────────┬────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────┐
│                  Model Layer                     │
│         (Data Models, Services, APIs)           │
└─────────────────────────────────────────────────┘
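The layers communicate through Combine's ObservableObject/@Published machinery. A minimal sketch of the wiring (property names here are illustrative, not the app's exact API):

import SwiftUI

// Illustrative MVVM wiring; the real state lives in the structs
// described under "Core Components" below.
final class ContentViewModel: ObservableObject {
    @Published var audioFileName: String = ""

    func transcribeFile(path: String) {
        // Orchestrates Model-layer services (WhisperService, ExportService, ...)
    }
}

struct ContentView: View {
    @ObservedObject var viewModel: ContentViewModel   // injected by the parent

    var body: some View {
        Text(viewModel.audioFileName)   // re-renders whenever @Published state changes
    }
}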

🗂️ Project Structure

CaptionMate/
├── App/
│   └── CaptionMateApp.swift          # App entry point
├── Configuration/
│   ├── Constants/
│   │   └── Constants.swift                 # App constants
│   └── Extensions/
│       ├── Ex+Animation.swift              # Animation helpers
│       ├── Ex+Color.swift                  # Color extensions
│       ├── Ex+NSSavePanel.swift            # File save dialog
│       ├── Ex+Theme.swift                  # Theme management
│       ├── Ex+TimeFormatter.swift          # Time formatting
│       └── Ex+UTType.swift                 # File type utilities
├── Source/
│   ├── Data/
│   │   ├── Models/
│   │   │   ├── StateModels.swift           # State management
│   │   │   └── DownloadError.swift         # Error types
│   │   └── Services/
│   │       ├── WhisperService.swift        # Whisper integration
│   │       ├── ExportService.swift         # Subtitle export
│   │       └── FileTypeService.swift       # File type handling
│   └── Presentation/
│       ├── ViewModels/
│       │   └── ContentViewModel.swift      # Main ViewModel
│       └── Views/
│           ├── ContentView.swift           # Main view
│           ├── AudioViews/                 # Audio UI components
│           ├── ModelManagementViews/       # Model management UI
│           └── TranscriptionViews/         # Transcription UI
└── Tests/
    ├── CaptionMateTests/             # Unit tests
    └── CaptionMateUITests/           # UI tests

🧩 Core Components

1. ContentViewModel

Purpose: Central ViewModel managing all app state and business logic

Key Responsibilities:

  • Model lifecycle management
  • Audio file processing
  • Transcription orchestration
  • Export coordination
  • User settings persistence

Published Properties:

@Published var whisperKit: WhisperKit?
@Published var transcriptionState: TranscriptionState
@Published var modelManagementState: ModelManagementState
@Published var audioState: AudioState
@Published var uiState: UIState
@Published var transcriptionResult: TranscriptionResult?
@Published var audioPlayer: AVAudioPlayer?

Key Methods:

// Model Management
func loadModel(_ model: String, redownload: Bool = false)
func releaseModel() async
func downloadModel(_ model: String)
func cancelDownload(_ model: String)
func deleteModel(_ model: String)

// Transcription
func transcribeFile(path: String)
func transcribeAudioSamples(_ samples: [Float]) async throws -> TranscriptionResult?

// Audio Control
func playImportedAudio()
func pauseImportedAudio()
func seekToPosition(_ position: Double)
func setVolume(_ volume: Double)

// Export
func exportTranscription() async

2. State Models

TranscriptionState

struct TranscriptionState {
    var currentText: String
    var currentChunks: [Int: (chunkText: [String], fallbacks: Int)]
    var confirmedSegments: [TranscriptionSegment]
    
    // Performance metrics
    var tokensPerSecond: TimeInterval
    var effectiveRealTimeFactor: TimeInterval
    var effectiveSpeedFactor: TimeInterval
    var totalInferenceTime: TimeInterval
}

ModelManagementState

struct ModelManagementState {
    var modelState: ModelState
    var availableModels: [String]
    var localModels: [String]
    var downloadProgress: [String: Float]
    var currentDownloadingModels: Set<String>
    var modelSizes: [String: Int64]
}

AudioState

struct AudioState {
    var importedAudioURL: URL?
    var audioFileName: String
    var isPlaying: Bool
    var isTranscribing: Bool
    var totalDuration: Double
    var waveformSamples: [Float]
}

UIState

struct UIState {
    var showAdvancedOptions: Bool
    var isModelmanagerViewPresented: Bool
    var showDownloadErrorAlert: Bool
    var downloadError: DownloadError?
}

🎯 Key Features Implementation

Speech-to-Text Pipeline

Audio File → AudioProcessor → WhisperKit → DecodingOptions → TranscriptionResult → Export

Process Flow:

  1. Audio Loading

    let samples = try AudioProcessor.loadAudioAsFloatArray(fromPath: path)
  2. Model Inference

    let results = try await whisperKit.transcribe(
        audioArray: samples,
        decodeOptions: options,
        callback: progressCallback
    )
  3. DecodingOptions Configuration

    DecodingOptions(
        task: .transcribe,
        language: languageCode,
        temperature: Float(temperatureStart),
        temperatureFallbackCount: Int(fallbackCount),
        sampleLength: sampleLength,
        usePrefillPrompt: enablePromptPrefill,
        usePrefillCache: enableCachePrefill,
        skipSpecialTokens: !enableSpecialCharacters,
        withoutTimestamps: !enableTimestamps,
        wordTimestamps: enableWordTimestamp,
        concurrentWorkerCount: Int(concurrentWorkerCount),
        chunkingStrategy: chunkingStrategy
    )
  4. Result Processing (see the sketch after this list)

    • Segment cleaning and sorting
    • Overlap adjustment
    • Empty segment removal
  5. Export

    • Format conversion (SRT/WebVTT/JSON/FCPXML)
    • File writing with user-selected path
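A sketch of what the result-processing step might look like. The Segment type and the overlap rule (clamping each start to the previous segment's end) are assumptions for illustration, not the app's exact algorithm:

import Foundation

struct Segment { var start: Double; var end: Double; var text: String }

// Assumed cleanup: drop empty segments, sort by start time,
// then clamp overlapping boundaries.
func cleanSegments(_ segments: [Segment]) -> [Segment] {
    var cleaned = segments
        .filter { !$0.text.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty }
        .sorted { $0.start < $1.start }
    guard cleaned.count > 1 else { return cleaned }
    for i in 1..<cleaned.count where cleaned[i].start < cleaned[i - 1].end {
        cleaned[i].start = cleaned[i - 1].end   // resolve overlap
    }
    return cleaned
}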

Model Management System

Supported Models:

Model      Size     Speed     Memory   Use Case
Tiny       ~150MB   ~10x RT   ~500MB   Quick drafts
Base       ~250MB   ~8x RT    ~800MB   General use
Small      ~500MB   ~5x RT    ~1.5GB   Balanced
Medium     ~1.5GB   ~3x RT    ~3GB     High quality
Large-v3   ~3GB     ~2x RT    ~6GB     Maximum accuracy

RT = Real-time factor (1x = same duration as audio)

Model States:

enum ModelState: String {
    case unloaded      // No model in memory
    case downloading   // Downloading from remote
    case downloaded    // Downloaded, not loaded
    case prewarming    // Optimizing for device
    case loading       // Loading into memory
    case loaded        // Ready for use
    case unloading     // Releasing resources
}

Download Management:

  • Concurrent downloads (max 3)
  • Progress tracking (weighted by model size)
  • Automatic disk space verification (sketched after this list)
  • Graceful cancellation handling
  • Partial download cleanup
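A sketch of the disk space check, assuming the standard URL resource-values API (the app's actual threshold and location may differ):

import Foundation

// Assumed pre-download check: is there room for the model plus a safety margin?
func hasEnoughDiskSpace(for modelSize: Int64, margin: Int64 = 500_000_000) -> Bool {
    let home = URL(fileURLWithPath: NSHomeDirectory())
    guard let values = try? home.resourceValues(
              forKeys: [.volumeAvailableCapacityForImportantUsageKey]),
          let available = values.volumeAvailableCapacityForImportantUsage
    else { return false }
    return available > modelSize + margin
}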

Export Service

SRT (SubRip) Format

1
00:00:00,000 --> 00:00:05,000
Subtitle text here

2
00:00:05,000 --> 00:00:10,000
Next subtitle

Timestamp Format:

func toSRTTimestamp() -> String {
    let hours = Int(self) / 3600
    let minutes = (Int(self) % 3600) / 60
    let seconds = Int(self) % 60
    let milliseconds = Int((self - Double(Int(self))) * 1000)
    return String(format: "%02d:%02d:%02d,%03d", hours, minutes, seconds, milliseconds)
}

WebVTT Format

WEBVTT

00:00:00.000 --> 00:00:05.000
Subtitle text here

00:00:05.000 --> 00:00:10.000
Next subtitle
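A hedged sketch of a WebVTT writer, mirroring the SRT writer shown under "Export Implementation" below; the format differences are the WEBVTT header, the omitted cue numbers, and '.' instead of ',' in timestamps:

// Assumed writer; reuses the toSRTTimestamp() extension shown later.
func writeVTT(segments: [TranscriptionSegment], to url: URL) throws {
    var vtt = "WEBVTT\n\n"
    for segment in segments {
        let start = segment.start.toSRTTimestamp().replacingOccurrences(of: ",", with: ".")
        let end = segment.end.toSRTTimestamp().replacingOccurrences(of: ",", with: ".")
        vtt += "\(start) --> \(end)\n\(segment.text)\n\n"
    }
    try vtt.write(to: url, atomically: true, encoding: .utf8)
}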

JSON Format

{
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 5.0,
      "text": "Subtitle text here"
    }
  ],
  "text": "Full transcription text",
  "language": "en"
}
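The JSON shape above maps naturally onto Codable. A sketch with assumed type names:

import Foundation

struct JSONSegment: Codable { let id: Int; let start: Double; let end: Double; let text: String }
struct JSONTranscript: Codable { let segments: [JSONSegment]; let text: String; let language: String }

// Assumed writer: pretty-printed to match the example above.
func writeJSON(_ transcript: JSONTranscript, to url: URL) throws {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.prettyPrinted, .sortedKeys]
    try encoder.encode(transcript).write(to: url)
}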

Final Cut Pro XML (FCPXML 1.11)

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE fcpxml>
<fcpxml version="1.11">
  <resources>
    <format id="r1" frameDuration="100/3000s" width="1920" height="1080"/>
  </resources>
  <library>
    <event name="Subtitles">
      <project name="Subtitle Project">
        <sequence format="r1">
          <spine>
            <title offset="0/30s" duration="150/30s">
              <text>Subtitle text here</text>
            </title>
          </spine>
        </sequence>
      </project>
    </event>
  </library>
</fcpxml>

🔧 Configuration & Settings

Transcription Settings

Setting              Type     Range       Default   Description
Temperature          Double   0.0 - 1.0   0.0       Sampling randomness
Fallback Count       Int      0 - 10      5         Temperature fallback attempts
Sample Length        Int      224 - 448   224       Audio sample window
Compression Check    Int      20 - 100    60        Token window for compression check
Concurrent Workers   Int      1 - 8       4         Parallel processing workers

Chunking Strategies

enum ChunkingStrategy {
    case vad              // Voice Activity Detection
    case none             // No chunking
    case timeBasedSplit   // Fixed time intervals
}

VAD (Voice Activity Detection):

  • Automatically detects speech segments
  • Skips silent portions
  • More efficient processing
  • Better segment boundaries

Time-Based Split:

  • Fixed duration chunks
  • Predictable processing time
  • Useful for continuous speech

🎨 Theme System

AppTheme Implementation

enum AppTheme: String, CaseIterable, Identifiable {
    case light = "Light"
    case dark = "Dark"
    case auto = "Auto"
    
    var colorScheme: ColorScheme? {
        switch self {
        case .light: return .light
        case .dark: return .dark
        case .auto:
            let appearance = NSApp.effectiveAppearance
            return appearance.bestMatch(from: [.darkAqua, .aqua]) == .darkAqua ? .dark : .light
        }
    }
}

Dynamic Color System

All custom colors use NSColor.dynamicProvider for automatic theme adaptation:

static var transcriptionBackground: Color {
    Color(nsColor: NSColor(name: nil, dynamicProvider: { appearance in
        appearance.bestMatch(from: [.darkAqua, .aqua]) == .darkAqua
            ? NSColor(Color(red: 0.12, green: 0.12, blue: 0.14))  // Dark
            : NSColor(Color(red: 0.99, green: 0.99, blue: 1.0))   // Light
    }))
}

Custom Colors:

  • transcriptionBackground
  • audioControlBackground
  • audioControlSectionBackground
  • audioControlContainerBackground
  • audioControlDivider
  • audioPlaceholderBackground
  • waveformBackground
  • controlButtonForeground
  • modelSelectorText

🧪 Testing

Unit Tests

Test Suites:

  • ContentViewModelTests: ViewModel state management
  • FileTypeServiceTests: File type conversions
  • TimeIntervalExtensionTests: Time formatting
  • ColorExtensionTests: Theme colors
  • StateModelsTests: State initialization
  • ExportServiceTests: Subtitle export

Example Test:

@Test("SRT 타임스탬프 변환 테스트")
func testToSRTTimestamp() throws {
    let result = TimeInterval(3661.5).toSRTTimestamp()
    #expect(result == "01:01:01,500")
}

UI Tests

Test Classes:

  • BasicUITests: App launch and UI elements (single launch)
  • InteractionTests: User interactions (fresh launch per test)
  • PerformanceTests: Launch performance metrics

Test Strategy:

  • Use waitForExistence(timeout:) for reliable element detection
  • Support localization with NSPredicate queries (see the example after this list)
  • Minimize app launches for speed
  • Use accessibilityIdentifier for stable element references
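An illustrative test following this strategy (the accessibility identifier here is an assumption, not the app's real one):

import XCTest

final class ExampleUITests: XCTestCase {
    func testTranscribeButtonExists() {
        let app = XCUIApplication()
        app.launch()
        // Locale-independent lookup via NSPredicate on the identifier
        let predicate = NSPredicate(format: "identifier == %@", "transcribeButton")
        let button = app.buttons.element(matching: predicate)
        XCTAssertTrue(button.waitForExistence(timeout: 5))
    }
}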

📊 Performance Characteristics

Transcription Speed (Apple M1)

Model    Real-Time Factor   1 Hour Audio   Memory Usage
Tiny     ~10x               ~6 minutes     ~500MB
Base     ~8x                ~7.5 minutes   ~800MB
Small    ~5x                ~12 minutes    ~1.5GB
Medium   ~3x                ~20 minutes    ~3GB
Large    ~2x                ~30 minutes    ~6GB

Optimization Techniques

Memory Management:

let samples = try await Task {
    try autoreleasepool {
        try AudioProcessor.loadAudioAsFloatArray(fromPath: path)
    }
}.value

UI Performance:

  • LazyVStack for long lists
  • Combine timers for smooth updates (30fps)
  • Debounced progress updates (0.5s interval; sketched after this list)
  • Background task execution
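A sketch of the debounced updates, assuming Combine wiring (subject and property names are illustrative):

import Combine
import Foundation

let progressSubject = PassthroughSubject<Float, Never>()
var cancellables = Set<AnyCancellable>()

// Coalesce rapid download callbacks into at most ~2 UI updates per second.
progressSubject
    .debounce(for: .seconds(0.5), scheduler: DispatchQueue.main)
    .sink { progress in
        // e.g. update modelManagementState.downloadProgress here
        print("UI update: \(progress)")
    }
    .store(in: &cancellables)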

Audio Processing:

  • Chunking strategies (VAD/time-based)
  • Parallel worker processing
  • Automatic volume normalization

🔄 Model Lifecycle

Loading Process

1. Check Disk Space
   ↓
2. Initialize WhisperKit
   ↓
3. Download Model (if needed)
   ↓
4. Prewarm Model (optimize for device)
   ↓
5. Load Model (into memory)
   ↓
6. Ready for Transcription

Progress Stages:

  • Initialization: 0% - 20%
  • Prewarming: 20% - 70%
  • Loading: 70% - 100%

Download System

Features:

  • Concurrent downloads (max 3 simultaneous)
  • Weighted progress calculation by model size
  • Automatic retry on transient failures (sketched after this list)
  • Cancellation with cleanup
  • Network error detection
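A sketch of the retry behavior, assuming a simple exponential-backoff policy (the app's actual rules may differ):

import Foundation

// Assumed helper: retry an async operation up to `attempts` times,
// backing off 1s, 2s, 4s, ... between tries.
func withRetry<T>(attempts: Int = 3, _ operation: () async throws -> T) async throws -> T {
    var lastError: Error?
    for attempt in 0..<attempts {
        do { return try await operation() }
        catch {
            lastError = error
            try? await Task.sleep(nanoseconds: UInt64(pow(2.0, Double(attempt)) * 1_000_000_000))
        }
    }
    throw lastError ?? CancellationError()
}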

Error Handling:

enum DownloadError {
    case networkOffline          // No internet
    case networkLost             // Connection dropped
    case cannotConnectToServer   // Server unreachable
    case timeout                 // Request timeout
    case diskSpaceFull           // Insufficient storage
    case permissionDenied        // File access denied
    case fileNotFound            // Missing file
    case unknown(String)         // Other errors
}

🎵 Audio Processing

Waveform Generation

Process:

  1. Load audio as Float array
  2. Compute RMS (Root Mean Square) values
  3. Chunk into 1024-sample segments
  4. Generate visual representation

Implementation:

private func computeWaveform(from samples: [Float]) -> [Float] {
    let chunkSize = 1024
    var rmsValues = [Float]()
    var index = 0
    while index < samples.count {
        let chunk = samples[index..<min(index + chunkSize, samples.count)]
        let sumSquares = chunk.reduce(0) { $0 + $1 * $1 }
        let rms = sqrt(sumSquares / Float(chunk.count))
        rmsValues.append(rms)
        index += chunkSize
    }
    return rmsValues
}

Volume Normalization

Target: -14 LUFS (Loudness Units relative to Full Scale)

Process:

  1. Sample audio at multiple points
  2. Calculate average power (dB)
  3. Estimate LUFS
  4. Compute gain adjustment
  5. Apply attenuation only (no amplification)
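A sketch of the gain computation under these rules; the LUFS estimate itself (steps 1-3) is assumed to come from the averaged power measurement:

import Foundation

// Assumed math: convert the dB gap to a linear gain, capped at unity
// so quiet audio is never amplified.
func normalizationGain(estimatedLUFS: Double, targetLUFS: Double = -14.0) -> Float {
    let gainDB = targetLUFS - estimatedLUFS
    let linear = pow(10.0, gainDB / 20.0)
    return Float(min(1.0, linear))
}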

🌐 Localization System

String Catalog

File: Localizable.xcstrings

Structure:

{
  "sourceLanguage": "en",
  "strings": {
    "key": {
      "localizations": {
        "en": { "stringUnit": { "value": "English text" } },
        "ko": { "stringUnit": { "value": "한국어 텍스트" } }
      }
    }
  }
}

Dynamic Language Switching

Implementation:

.environment(\.locale, Locale(identifier: appLanguage))

Features:

  • No app restart required
  • Instant UI update
  • Menu bar localization
  • Sheet content localization

🔍 Error Handling

Error Detection

Network Errors:

if nsError.domain == NSURLErrorDomain {
    switch nsError.code {
    case NSURLErrorNotConnectedToInternet:
        return .networkOffline
    case NSURLErrorNetworkConnectionLost:
        return .networkLost
    // ...
    }
}

File System Errors:

if nsError.domain == NSCocoaErrorDomain {
    switch nsError.code {
    case NSFileWriteOutOfSpaceError:
        return .diskSpaceFull
    case NSFileWriteNoPermissionError:
        return .permissionDenied
    // ...
    }
}

Error Recovery

Automatic:

  • CoreML cache clearing on space issues
  • Model reload retry on MPSGraph errors
  • Partial download cleanup on cancellation

User-Initiated:

  • Retry button on download failures
  • Manual cache clearing
  • State reset option

📈 Performance Optimization

Memory Optimization

Autoreleasepool:

try autoreleasepool {
    try AudioProcessor.loadAudioAsFloatArray(fromPath: path)
}

Resource Cleanup:

func releaseModel() async {
    await whisperKit?.unloadModels()
    whisperKit = nil
    clearCoreMLRuntimeCache()
}

UI Optimization

Lazy Loading:

LazyVStack {
    ForEach(models) { model in
        ModelRowView(model: model)
    }
}

Efficient Updates:

// ~30fps playback timer; the returned cancellable must be retained
// (stored-property name here is illustrative)
playbackTimerCancellable = Timer.publish(every: 0.033, on: .main, in: .common)
    .autoconnect()
    .sink { [weak self] _ in
        guard let self, let player = self.audioPlayer else { return }
        self.currentPlayerTime = player.currentTime
    }

🛠️ Development

Build Requirements

  • Xcode: 15.0+
  • macOS SDK: 15.0+
  • Swift: 5.0+
  • Swift Package Manager: For dependencies

Dependencies

dependencies: [
    .package(url: "https://github.com/argmaxinc/WhisperKit", from: "0.9.0")
]

Frameworks:

  • SwiftUI
  • AVFoundation
  • Core ML
  • AppKit
  • Combine
  • UniformTypeIdentifiers

Code Quality Tools

SwiftFormat:

mint run swiftformat .

Validation Scripts:

scripts/check_whitespace.sh      # Trailing whitespace check
scripts/check_filename_spaces.sh # Filename validation
scripts/check_copyright.sh       # Copyright header verification
scripts/style.sh                 # Code style check

🔄 CI/CD Pipeline

GitHub Actions

Build & Test Workflow

name: CaptionMate CI
on: [push, pull_request]
jobs:
  build:
    - Checkout code
    - Setup Xcode
    - Build macOS app
    - Run unit tests
    - Run UI tests

Code Quality Workflow

name: Code Quality Check
on: [push, pull_request]
jobs:
  quality:
    - Check trailing whitespace
    - Validate filenames
    - Verify copyright headers
    - Run SwiftFormat

📱 User Interface Components

Main Views

ContentView

  • Split view with sidebar and main content
  • Model selector in sidebar
  • Audio control and transcription in main area

ModelManagerView

  • Model search and filtering
  • Download progress visualization
  • Model size and status display
  • Grouped sections (Downloaded/Available)

SettingsView

  • Grouped settings (Format/Quality/Performance)
  • Slider controls with info buttons
  • Real-time preview of settings

AudioControlView

  • Waveform visualization
  • Playback controls
  • Volume and speed adjustment
  • File information display

TranscriptionView

  • Scrollable segment list
  • Timestamped text display
  • Text selection enabled
  • Auto-scroll to latest segment
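A sketch of the auto-scroll behavior (view and row names are illustrative):

ScrollViewReader { proxy in
    List(segments) { segment in
        SegmentRowView(segment: segment)
            .id(segment.id)
    }
    .onChange(of: segments.count) { _ in
        // Jump to the newest segment as results stream in
        if let last = segments.last {
            withAnimation { proxy.scrollTo(last.id, anchor: .bottom) }
        }
    }
}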

🔗 WhisperKit Integration

Initialization

let config = WhisperKitConfig(
    computeOptions: ModelComputeOptions(
        audioEncoderCompute: .cpuAndNeuralEngine,
        textDecoderCompute: .cpuAndNeuralEngine
    ),
    verbose: true,
    logLevel: .debug
)
let whisperKit = try await WhisperKit(config)

Model Operations

Download:

let folder = try await WhisperKit.download(
    variant: modelName,
    from: repositoryName,
    progressCallback: { progress in
        // Update UI with progress.fractionCompleted
    }
)

Load:

try await whisperKit.prewarmModels()  // Optimize
try await whisperKit.loadModels()     // Load

Transcribe:

let results = try await whisperKit.transcribe(
    audioArray: samples,
    decodeOptions: options,
    callback: decodingCallback
)

📊 Data Flow

Transcription Flow

User Action (Import File)
    ↓
AudioProcessor.loadAudioAsFloatArray()
    ↓
ContentViewModel.transcribeAudioSamples()
    ↓
WhisperKit.transcribe()
    ↓
DecodingCallback (Progress Updates)
    ↓
TranscriptionResult
    ↓
UI Update (TranscriptionView)
    ↓
User Action (Export)
    ↓
ExportService.exportTranscriptionResult()
    ↓
NSSavePanel (File Selection)
    ↓
File Written to Disk

State Management Flow

User Input → ViewModel (@Published) → SwiftUI View
                ↓
         UserDefaults (@AppStorage)
                ↓
         Persistent Storage

🎯 Key Algorithms

Progress Calculation (Weighted Average)

private var totalDownloadProgress: Double {
    let downloadingModels = viewModel.modelManagementState.currentDownloadingModels
    if downloadingModels.isEmpty { return 0 }
    
    var totalSize: Int64 = 0
    var downloadedSize: Double = 0
    
    for model in downloadingModels {
        let modelSize = viewModel.modelManagementState.modelSizes[model] ?? 0
        let progress = Double(viewModel.modelManagementState.downloadProgress[model] ?? 0)
        
        totalSize += modelSize
        downloadedSize += Double(modelSize) * progress
    }
    
    return totalSize > 0 ? downloadedSize / Double(totalSize) : 0
}

Exponential Progress Bar

func updateProgressBar(startProgress: Float, targetProgress: Float, maxTime: TimeInterval) async {
    let progressRange = targetProgress - startProgress
    let decayConstant = -log(1 - 0.95) / Float(maxTime)
    let startTime = Date()
    
    while !Task.isCancelled {
        let elapsedTime = Date().timeIntervalSince(startTime)
        let decayFactor = exp(-decayConstant * Float(elapsedTime))
        let progressIncrement = progressRange * (1 - decayFactor)
        let currentProgress = startProgress + progressIncrement
        
        await MainActor.run {
            modelManagementState.loadingProgressValue = min(currentProgress, targetProgress)
        }
        
        if currentProgress >= targetProgress { break }
        try? await Task.sleep(nanoseconds: 200_000_000) // 0.2s
    }
}

🌐 Supported Languages

Interface Languages

  • English (en)
  • Korean (ko)

Transcription Languages

All languages supported by Whisper model:

  • English, Spanish, French, German, Italian, Portuguese
  • Chinese, Japanese, Korean
  • Russian, Arabic, Hindi
  • And 90+ more languages

📦 File Type Support

Input Formats (Audio/Video)

  • MP3
  • M4A (AAC)
  • WAV
  • FLAC
  • AIFF
  • CAF
  • Any format supported by AVFoundation

Output Formats (Subtitles)

  • SRT (SubRip)
  • WebVTT (Web Video Text Tracks)
  • JSON (JavaScript Object Notation)
  • FCPXML (Final Cut Pro XML 1.11)

UTType Declarations

extension UTType {
    static var srt: UTType {
        UTType(exportedAs: "com.CaptionMate.srtsubtitle")
    }
    
    static var vtt: UTType {
        UTType(exportedAs: "com.CaptionMate.vttsubtitle")
    }
    
    static var fcpxml: UTType {
        UTType(importedAs: "com.apple.finalcutpro.fcpxml")
    }
}

🔧 Build Configuration

Debug Configuration

  • Optimization Level: None (-Onone)
  • Debug Symbols: Yes
  • Assertions: Enabled
  • Dead Code Stripping: Yes

Release Configuration

  • Optimization Level: Aggressive (-O)
  • Debug Symbols: No
  • Assertions: Disabled
  • Dead Code Stripping: Yes
  • Whole Module Optimization: Yes

Deployment Target

  • macOS: 15.0 (Sequoia)
  • Architecture: Universal (Apple Silicon + Intel)

📐 Design Patterns

Reactive Programming

@Published var audioState: AudioState
// Automatically triggers UI updates when changed

Dependency Injection

struct ContentView: View {
    @ObservedObject var viewModel: ContentViewModel
    // ViewModel injected from parent
}

State Management

@AppStorage("selectedModel") var selectedModel: String
// Persistent storage with UserDefaults

Async/Await

func transcribeFile(path: String) {
    Task {
        try await transcribeCurrentFile(path: path)
    }
}

🎨 UI/UX Patterns

Progressive Disclosure

  • Basic controls visible by default
  • Advanced settings in separate view
  • Info buttons for detailed explanations

Immediate Feedback

  • Progress bars for long operations
  • Loading indicators for async tasks
  • Visual state changes (colors, animations)

Drag and Drop

.onDrop(of: [.fileURL], isTargeted: $isTargeted) { providers in
    handleDroppedFiles(providers: providers)
    return true
}
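A sketch of resolving the dropped providers into file URLs (assumed handler body; the real implementation also validates types via FileTypeService):

import Foundation
import UniformTypeIdentifiers

func handleDroppedFiles(providers: [NSItemProvider]) {
    for provider in providers
        where provider.hasItemConformingToTypeIdentifier(UTType.fileURL.identifier) {
        provider.loadItem(forTypeIdentifier: UTType.fileURL.identifier, options: nil) { item, _ in
            guard let data = item as? Data,
                  let url = URL(dataRepresentation: data, relativeTo: nil) else { return }
            DispatchQueue.main.async {
                // import `url` (e.g. set audioState.importedAudioURL)
            }
        }
    }
}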

📚 API Reference

ContentViewModel Public Interface

Model Management

func loadModel(_ model: String, redownload: Bool = false)
func releaseModel() async
func downloadModel(_ model: String)
func cancelDownload(_ model: String)
func deleteModel(_ model: String)
func fetchModels()

Transcription

func transcribeFile(path: String)
func transcribeCurrentFile(path: String) async throws
func transcribeAudioSamples(_ samples: [Float]) async throws -> TranscriptionResult?

Audio Control

func playImportedAudio()
func pauseImportedAudio()
func stopImportedAudio()
func seekToPosition(_ position: Double)
func seekToPositionInLine(lineIndex: Int, secondsPerLine: Double, ratio: Double, totalDuration: Double)
func setVolume(_ volume: Double)
func toggleMute()
func changePlaybackRate(faster: Bool)
func currentPlaybackRateText() -> String
func skipForward()
func skipBackward()

File Management

func selectFile()
func handleFilePicker(result: Result<[URL], Error>)
func handleDroppedFiles(providers: [NSItemProvider])
func deleteImportedAudio()

Export

func exportTranscription() async

Settings

func changeAppLanguage(to language: String)
func getCurrentLanguageDisplayName() -> String
var appTheme: AppTheme { get set }

🔄 State Synchronization

@Published Properties

Automatically trigger UI updates when changed:

@Published var transcriptionState = TranscriptionState()
// Any change to transcriptionState updates all observing views

@AppStorage Properties

Persist across app launches:

@AppStorage("enableTimestamps") var enableTimestamps: Bool = true
// Saved to UserDefaults, survives app restart

Environment Values

Propagate through view hierarchy:

.environment(\.locale, Locale(identifier: appLanguage))
// All child views receive updated locale

🧮 Time Formatting

SRT Timestamp Format

// Input: 3661.5 seconds
// Output: "01:01:01,500"

func toSRTTimestamp() -> String {
    let hours = Int(self) / 3600
    let minutes = (Int(self) % 3600) / 60
    let seconds = Int(self) % 60
    let milliseconds = Int((self - Double(Int(self))) * 1000)
    return String(format: "%02d:%02d:%02d,%03d", hours, minutes, seconds, milliseconds)
}

Display Format

// Input: start 3661.5s, end 3666.5s
// Output: "01:01:01.500 - 01:01:06.500"

func formatTimeRange(to endTime: TimeInterval) -> String {
    let startFormatted = self.toSRTTimestamp().replacingOccurrences(of: ",", with: ".")
    let endFormatted = endTime.toSRTTimestamp().replacingOccurrences(of: ",", with: ".")
    return "\(startFormatted) - \(endFormatted)"
}

🎬 Export Implementation

SRT Writer

func writeSRT(segments: [TranscriptionSegment], to url: URL) throws {
    var srtContent = ""
    for (index, segment) in segments.enumerated() {
        srtContent += "\(index + 1)\n"
        srtContent += "\(segment.start.toSRTTimestamp()) --> \(segment.end.toSRTTimestamp())\n"
        srtContent += "\(segment.text)\n\n"
    }
    try srtContent.write(to: url, atomically: true, encoding: .utf8)
}

FCPXML Writer

func writeFCPXML(segments: [TranscriptionSegment], frameRate: Double, to url: URL) throws {
    let frameDuration = "100/\(Int(frameRate * 100))s"   // e.g. 30fps -> "100/3000s"
    // Each segment becomes a <title> clip timed in whole frames
    let titles = segments.map { seg -> String in
        let offset = "\(Int(seg.start * frameRate))/\(Int(frameRate))s"
        let duration = "\(Int((seg.end - seg.start) * frameRate))/\(Int(frameRate))s"
        return "<title offset=\"\(offset)\" duration=\"\(duration)\"><text>\(seg.text)</text></title>"
    }.joined()
    let xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><!DOCTYPE fcpxml><fcpxml version=\"1.11\"><resources><format id=\"r1\" frameDuration=\"\(frameDuration)\" width=\"1920\" height=\"1080\"/></resources><library><event name=\"Subtitles\"><project name=\"Subtitle Project\"><sequence format=\"r1\"><spine>\(titles)</spine></sequence></project></event></library></fcpxml>"
    try xml.write(to: url, atomically: true, encoding: .utf8)
}

🔍 Debugging

Logging

WhisperKit Verbose Mode:

let config = WhisperKitConfig(
    verbose: true,
    logLevel: .debug
)

Console Output:

  • Model loading progress
  • Transcription metrics
  • Error details
  • Performance statistics

Performance Metrics

Available Metrics:

struct TranscriptionState {
    var tokensPerSecond: TimeInterval
    var effectiveRealTimeFactor: TimeInterval
    var effectiveSpeedFactor: TimeInterval
    var totalInferenceTime: TimeInterval
    var currentFallbacks: Int
    var currentEncodingLoops: Int
    var currentDecodingLoops: Int
}

🚀 Deployment

Build Process

  1. Clean build folder
  2. Update version and build number
  3. Archive for distribution
  4. Automatic code signing
  5. Notarization submission
  6. App Store Connect upload

Version Numbering

Format: MAJOR.MINOR.PATCH

  • MAJOR: Breaking changes
  • MINOR: New features (backward compatible)
  • PATCH: Bug fixes

📞 Technical Support

System Information for Bug Reports

- macOS Version: [e.g., 15.0]
- App Version: [e.g., 1.0.0]
- Model Used: [e.g., openai_whisper-small]
- Audio Format: [e.g., MP3, 44.1kHz]
- File Duration: [e.g., 30 minutes]
- Error Message: [exact error text]

Contact

Email: parfume407@gmail.com
Response Time: 3-5 business days

