swift-tts-wrapper

A native Swift package that provides a unified interface for working with local offline and cloud-based Text-to-Speech (TTS) services on iOS and macOS. Inspired by js-tts-wrapper, it simplifies speech synthesis across multiple engines with a single consistent API.

Features

Unified API: A single protocol (TTSClient) with consistent methods for speech synthesis and playback control.
21 Engines: System, 19 cloud REST APIs, and on-device sherpa-onnx (VITS/Kokoro/Matcha/MMS).
Zero Dependencies: Core SwiftTTSWrapper target has no third-party dependencies (Polly signs requests with CryptoKit).
Real-Time Streaming: 16 cloud engines stream audio incrementally; system engine streams from synthesizer callbacks.
Word-Level Timing: ElevenLabs and System engines provide real word timestamps from the API; all others use a heuristic estimator.
On-Device TTS: Optional sherpa-onnx integration for fully offline synthesis with 1300+ models.
macOS 13+ & iOS 16+: Supports both platforms with swift-tools-version: 5.9.

Installation

Core Package (zero dependencies)

// Package.swift
dependencies: [
    .package(url: "https://github.com/willwade/swift-tts-wrapper.git", from: "0.1.0"),
],
targets: [
    .target(name: "YourApp", dependencies: ["SwiftTTSWrapper"]),
]

With sherpa-onnx (on-device TTS)

Add both packages:

dependencies: [
    .package(url: "https://github.com/willwade/swift-tts-wrapper.git", from: "0.1.0"),
    .package(url: "https://github.com/willwade/sherpa-onnx-spm.git", from: "1.13.3"),
],
targets: [
    .target(name: "YourApp", dependencies: ["SwiftTTSWrapperSherpaOnnx"]),
]

Supported Engines

System & On-Device

Engine ID	Client Class	Streaming	Word Events	Notes
`system`	`SystemTTSClient`	Real (AVSpeechSynthesizer.write)	Real (delegate callbacks)	Offline, 44 languages
`sherpaonnx`	`SherpaOnnxTTSClient`	Chunked (generates fully then yields)	Estimated	On-device, 1300+ models

Cloud REST APIs

Engine ID	Client Class	Streaming	Word Events	Audio Format
`elevenlabs`	`ElevenLabsTTSClient`	Real	Real (character alignment API)	MP3
`openai`	`OpenAITTSClient`	Real	Estimated	MP3/Opus/AAC/FLAC/WAV
`cartesia`	`CartesiaTTSClient`	Real	Estimated	PCM WAV
`playht`	`PlayHTTTSClient`	Real	Estimated	MP3
`deepgram`	`DeepgramTTSClient`	Real	Estimated	varies
`fishaudio`	`FishAudioTTSClient`	Real	Estimated	varies
`hume`	`HumeTTSClient`	Real	Estimated	varies
`mistral`	`MistralTTSClient`	Real (SSE)	Estimated	MP3
`murf`	`MurfTTSClient`	Real	Estimated	MP3
`polly`	`PollyTTSClient`	Real	Estimated	MP3/PCM/OGG
`resemble`	`ResembleTTSClient`	Real	Estimated	WAV/PCM
`unrealspeech`	`UnrealSpeechTTSClient`	Real	Estimated	MP3
`upliftai`	`UpliftAITTSClient`	Real	Estimated	MP3
`watson`	`WatsonTTSClient`	Real	Estimated	WAV
`witai`	`WitAITTSClient`	Real	Estimated	PCM/MP3/WAV
`xai`	`XAITTSClient`	Real	Estimated	varies
`azure`	`AzureTTSClient`	No (collects full response)	Estimated	MP3
`google`	`GoogleTTSClient`	No (collects full response)	Estimated	MP3
`modelslab`	`ModelsLabTTSClient`	Buffered (may poll)	Estimated	varies

Streaming: "Real" = audio chunks arrive incrementally from the API. "Buffered" = full audio collected before yielding. "Chunked" = audio generated locally then split into chunks.

Word Events: "Real" = the API returns native word/character timestamps. "Estimated" = timing is approximated using WordTimingEstimator (assumes ~150 WPM, scaled by word length).

Quick Start

Instantiating a Client

import SwiftTTSWrapper

// System (offline, no credentials needed)
let client = TTSClientFactory.create(engine: .system)

// Cloud engine
let client = TTSClientFactory.create(
    engine: .elevenlabs,
    credentials: ["apiKey": "your-api-key"]
)

// sherpa-onnx (on-device)
let client = TTSClientFactory.create(engine: .sherpaonnx)

Speech with Word Boundaries

let options = SpeakOptions(useWordBoundary: true)

client.onBoundary = { boundary in
    print("Word: \(boundary.text) at \(boundary.offset)ms")
}
client.onEnd = { print("Done") }

try await client.speak("Hello, world!", options: options)

Direct Audio Generation

let audioBytes = try await client.synthToBytes("Generate audio data")
// Returns raw Data (MP3, WAV, etc. depending on engine)

Voice Listing

let voices = try await client.getVoices()
for voice in voices {
    print("\(voice.name) - \(voice.languageCodes.first?.display ?? "")")
}

sherpa-onnx On-Device

import SwiftTTSWrapperSherpaOnnx

// Download a model
let manager = SherpaOnnxModelManager()
let catalog = SherpaOnnxModelsCatalog.loadBundled()
let entry = catalog["piper-en-ryan-low"]!
try await manager.downloadAndExtract(entry: entry)

// Create engine
let engine = SherpaOnnxDefaultEngine()
let paths = manager.resolveModelPaths(modelId: "piper-en-ryan-low")
try engine.initialize(
    modelPath: paths.modelPath,
    tokensPath: paths.tokensPath,
    voiceDir: paths.voiceDir,
    modelType: .vits,
    dataDir: paths.dataDir,
    lexiconPath: paths.lexiconPath,
    voicesPath: paths.voicesPath,
    vocoderPath: paths.vocoderPath,
    dictDir: paths.dictDir
)

// Use with the client
let client = SherpaOnnxTTSClient(engine: engine)
try await client.speak("Hello from on-device TTS!")

Structure

Sources/
├── SwiftTTSWrapper/
│   ├── Types.swift                    // Shared options, formats, voices, boundaries
│   ├── TTSClient.swift               // Unified protocol & abstract base client
│   ├── TTSClientFactory.swift        // Factory enum with all 21 engine cases
│   ├── Engines/
│   │   ├── SystemTTSClient.swift     // AVSpeechSynthesizer (offline, streaming)
│   │   ├── OpenAITTSClient.swift     // OpenAI REST API
│   │   ├── ElevenLabsTTSClient.swift // ElevenLabs REST API (real word timestamps)
│   │   ├── AzureTTSClient.swift      // Azure Cognitive Services
│   │   ├── GoogleTTSClient.swift     // Google Cloud TTS
│   │   ├── CartesiaTTSClient.swift   // Cartesia REST API
│   │   ├── PlayHTTTSClient.swift     // PlayHT REST API
│   │   ├── DeepgramTTSClient.swift   // Deepgram REST API
│   │   ├── FishAudioTTSClient.swift  // Fish Audio REST API
│   │   ├── HumeTTSClient.swift       // Hume REST API
│   │   ├── MistralTTSClient.swift    // Mistral SSE streaming
│   │   ├── ModelsLabTTSClient.swift  // ModelsLab REST API
│   │   ├── MurfTTSClient.swift       // Murf REST API
│   │   ├── PollyTTSClient.swift      // Amazon Polly (native SigV4, no AWS SDK)
│   │   ├── ResembleTTSClient.swift   // Resemble AI REST API
│   │   ├── UnrealSpeechTTSClient.swift
│   │   ├── UpliftAITTSClient.swift
│   │   ├── WatsonTTSClient.swift     // IBM Watson (IAM token refresh)
│   │   ├── WitAITTSClient.swift      // Wit.ai REST API
│   │   ├── XAITTSClient.swift        // xAI REST API
│   │   └── SherpaOnnxTTSClient.swift // On-device sherpa-onnx
│   ├── Utils/
│   │   ├── AudioPlayer.swift         // AVAudioPlayer with boundary timer
│   │   ├── WordTimingEstimator.swift // Heuristic timing for engines without API support
│   │   ├── SherpaOnnxEngine.swift    // Engine protocol, stub, WAV converter
│   │   ├── SherpaOnnxModels.swift    // Model catalog types & loader
│   │   └── SherpaOnnxModelManager.swift // Download/extract/cache models
│   └── Resources/
│       └── merged_models.json        // 1300+ sherpa-onnx model catalog
└── SwiftTTSWrapperSherpaOnnx/
    └── SherpaOnnxDefaultEngine.swift // Default engine calling C API directly

Requirements

Swift 5.9+
macOS 13+ / iOS 16+
Xcode 15+ (for XCTest runner)

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
Sources		Sources
Tests/SwiftTTSWrapperTests		Tests/SwiftTTSWrapperTests
.gitattributes		.gitattributes
.gitignore		.gitignore
.spi.yml		.spi.yml
LICENSE		LICENSE
Package.swift		Package.swift
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

swift-tts-wrapper

Features

Installation

Core Package (zero dependencies)

With sherpa-onnx (on-device TTS)

Supported Engines

System & On-Device

Cloud REST APIs

Quick Start

Instantiating a Client

Speech with Word Boundaries

Direct Audio Generation

Voice Listing

sherpa-onnx On-Device

Structure

Requirements

License

About

Uh oh!

Releases 2

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

swift-tts-wrapper

Features

Installation

Core Package (zero dependencies)

With sherpa-onnx (on-device TTS)

Supported Engines

System & On-Device

Cloud REST APIs

Quick Start

Instantiating a Client

Speech with Word Boundaries

Direct Audio Generation

Voice Listing

sherpa-onnx On-Device

Structure

Requirements

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages