Investigate Nvidia Parakeet MLX model for offline transcription #4

@eddmann

Overview

Explore integrating Nvidia's Parakeet model via MLX as an additional offline transcription option alongside WhisperKit.

Background

Parakeet is Nvidia's ASR (Automatic Speech Recognition) model that has been ported to MLX for Apple Silicon. This could provide an alternative local transcription option with potentially different accuracy/performance characteristics compared to WhisperKit.

Investigation Tasks

1. Technical Feasibility

  • Review the Swift Parakeet MLX library's compatibility with our current stack (Swift 6.0, macOS 14.0+)
  • Assess model sizes and download requirements
  • Compare performance characteristics with WhisperKit (speed, accuracy, memory usage)
  • Verify Apple Silicon compatibility and requirements
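
For the speed comparison, one option is to report a real-time factor (processing time divided by audio duration) for each engine. The following is a minimal sketch only: `BenchmarkResult` and `benchmark` are hypothetical names, and the `transcribe` closure stands in for whichever engine call is being measured, since neither the WhisperKit nor the Parakeet API is assumed here.

```swift
import Foundation

/// Timing for one transcription run (hypothetical type for this sketch).
struct BenchmarkResult {
    let engineName: String
    let audioDuration: TimeInterval   // length of the input audio, in seconds
    let processingTime: TimeInterval  // wall-clock time spent transcribing, in seconds

    /// Real-time factor: below 1.0 means faster than real time.
    /// Assumes audioDuration is greater than zero.
    var realTimeFactor: Double { processingTime / audioDuration }
}

/// Times a single transcription call. The closure is a placeholder for the
/// engine under test; no real WhisperKit or Parakeet call is assumed.
func benchmark(
    engineName: String,
    audioDuration: TimeInterval,
    transcribe: () async throws -> String
) async rethrows -> BenchmarkResult {
    // systemUptime is monotonic, so wall-clock adjustments do not skew the result.
    let start = ProcessInfo.processInfo.systemUptime
    _ = try await transcribe()
    let elapsed = ProcessInfo.processInfo.systemUptime - start
    return BenchmarkResult(
        engineName: engineName,
        audioDuration: audioDuration,
        processingTime: elapsed
    )
}
```

Running the same audio files through both engines and comparing `realTimeFactor` would give a like-for-like speed figure; memory usage would need separate tooling such as Instruments.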

2. Architecture Integration

  • Determine how to implement the TranscriptionService protocol for Parakeet
  • Identify any conflicts or integration challenges with the existing WhisperKit service
  • Plan model download and management workflow (similar to WhisperKit)
  • Consider shared MLX infrastructure between Parakeet transcription and MLX post-processing
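
As a starting point for discussion, a Parakeet-backed service might look roughly like the sketch below. Everything here is an assumption: `TranscriptionServiceSketch` is a stand-in for the real protocol in Core/Domain/TranscriptionService.swift (whose actual requirements may differ), and `runModel` is an injected placeholder rather than a call into the Parakeet MLX library.

```swift
import Foundation

/// Hypothetical stand-in for the project's TranscriptionService protocol.
/// The real protocol may declare different requirements.
protocol TranscriptionServiceSketch {
    var displayName: String { get }
    func transcribe(audioFileURL: URL) async throws -> String
}

/// Errors specific to this sketch.
enum ParakeetSketchError: Error {
    case modelNotDownloaded
}

/// Sketch of a Parakeet-backed service. Model availability and inference are
/// injected as closures so that no Parakeet MLX API is assumed.
struct ParakeetServiceSketch: TranscriptionServiceSketch {
    let displayName = "Parakeet (MLX)"
    let isModelDownloaded: () -> Bool
    let runModel: (URL) async throws -> String

    func transcribe(audioFileURL: URL) async throws -> String {
        // Fail early if the model has not been fetched yet, so the UI can
        // prompt for a download instead of surfacing a low-level error.
        guard isModelDownloaded() else {
            throw ParakeetSketchError.modelNotDownloaded
        }
        return try await runModel(audioFileURL)
    }
}
```

Injecting the inference step keeps the service testable without a downloaded model, and it leaves room to share MLX setup with the post-processing service if that turns out to be practical.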

3. User Experience

  • Design UI for Parakeet model selection in Settings
  • Plan user messaging for when to choose Parakeet vs WhisperKit
  • Consider download size and storage implications for users
  • Determine default model selection strategy
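
One possible default selection strategy is sketched below, purely as an illustration. The type and function names are hypothetical, and the rules (honour the saved preference when it is usable, otherwise fall back to WhisperKit) are assumptions to be validated, including the assumption that WhisperKit is always available as the fallback.

```swift
/// Engines the user could pick between (hypothetical enum for this sketch).
enum TranscriptionEngine: String, CaseIterable {
    case whisperKit
    case parakeet
}

/// Inputs to the default-selection decision (hypothetical type).
struct EngineEnvironment {
    let isAppleSilicon: Bool
    let downloadedEngines: Set<TranscriptionEngine>
    let userPreference: TranscriptionEngine?
}

/// Whether an engine can run right now.
/// Assumption: Parakeet needs Apple Silicon and a downloaded model.
func isUsable(_ engine: TranscriptionEngine, in env: EngineEnvironment) -> Bool {
    switch engine {
    case .whisperKit:
        return true  // assumed to be the always-available fallback
    case .parakeet:
        return env.isAppleSilicon && env.downloadedEngines.contains(.parakeet)
    }
}

/// Honour the saved preference when it is usable, otherwise use WhisperKit.
func defaultEngine(for env: EngineEnvironment) -> TranscriptionEngine {
    if let preferred = env.userPreference, isUsable(preferred, in: env) {
        return preferred
    }
    return .whisperKit
}
```

Keeping the decision in a pure function makes it straightforward to unit test and to adjust once benchmarking results are in.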

4. Testing Requirements

  • Identify test scenarios for accuracy comparison
  • Plan performance benchmarking approach
  • Consider edge cases (background noise, accents, technical jargon)
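
For the accuracy comparison, word error rate (WER) is a common metric: the word-level edit distance between a reference transcript and the engine output, divided by the number of reference words. The sketch below is a minimal implementation under stated assumptions; the function names are hypothetical, and the normalisation (lowercasing, splitting on non-alphanumeric characters) is deliberately crude, so contractions such as "don't" are split into two tokens.

```swift
import Foundation

/// Lowercases the text and splits it on any non-alphanumeric character.
func normalizedWords(_ text: String) -> [String] {
    text.lowercased()
        .components(separatedBy: CharacterSet.alphanumerics.inverted)
        .filter { !$0.isEmpty }
}

/// Word error rate: (substitutions + deletions + insertions) / reference word count.
/// Convention used in this sketch for an empty reference: 0 if the hypothesis is
/// also empty, otherwise 1.
func wordErrorRate(reference: String, hypothesis: String) -> Double {
    let ref = normalizedWords(reference)
    let hyp = normalizedWords(hypothesis)
    guard !ref.isEmpty else { return hyp.isEmpty ? 0 : 1 }

    // Levenshtein distance over words, keeping only the previous row.
    var previous = Array(0...hyp.count)
    for (i, refWord) in ref.enumerated() {
        var current = [i + 1] + Array(repeating: 0, count: hyp.count)
        for (j, hypWord) in hyp.enumerated() {
            let substitution = previous[j] + (refWord == hypWord ? 0 : 1)
            let deletion = previous[j + 1] + 1
            let insertion = current[j] + 1
            current[j + 1] = min(substitution, deletion, insertion)
        }
        previous = current
    }
    return Double(previous[hyp.count]) / Double(ref.count)
}
```

Scoring each engine on the same recordings (clean speech, background noise, accented speech, jargon-heavy speech) would give directly comparable WER figures per scenario.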

Success Criteria

  • Parakeet model integrates as an additional transcription option without disrupting the existing WhisperKit service
  • Users can switch between WhisperKit and Parakeet models easily
  • Performance metrics documented for both models
  • Clear guidance provided for users on which model to choose

Out of Scope

  • Replacing WhisperKit entirely (keep both options available)
  • Supporting Intel Macs (Parakeet requires MLX and Apple Silicon, like the current MLX service)

References

  • Current WhisperKit implementation: Services/WhisperKitService.swift
  • TranscriptionService protocol: Core/Domain/TranscriptionService.swift
  • MLX service pattern: Services/MLXService.swift

Priority: Medium
Effort: Medium-Large (requires research, prototyping, and integration)
Type: Enhancement
