Skip to content

cornball-ai/tts.api

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

43 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

tts.api

An R client for Text-to-Speech APIs.

Supports multiple backends:

  • OpenAI-compatible: OpenAI, Chatterbox, Qwen3-TTS, LM Studio, OpenWebUI, AnythingLLM
  • ElevenLabs: Separate API with voice cloning and multilingual models

Installation

# From CRAN (once available)
install.packages("tts.api")

# Development version
remotes::install_github("cornball-ai/tts.api")

Backend Setup

Chatterbox (Local, OpenAI-compatible)

Use the cornball-ai/chatterbox-tts-api fork:

git clone https://github.com/cornball-ai/chatterbox-tts-api.git
cd chatterbox-tts-api

# For newer Nvidia GPUs (Blackwell/50xx series)
docker build -f docker/Dockerfile.blackwell -t chatterbox-tts:blackwell .

docker run -d \
  --name chatterbox-blackwell \
  --gpus all \
  -p 7810:4123 \
  -v $(pwd)/cache:/cache \
  -v $(pwd)/voices:/voices \
  chatterbox-tts:blackwell

See upstream repo for CPU and other GPU options.

Qwen3-TTS (Local, OpenAI-compatible)

Qwen3-TTS supports voice cloning, voice design, and multilingual synthesis.

docker run -d --gpus all --network=host --name qwen3-tts-api \
  -v ~/.cache/huggingface:/cache \
  -e PORT=7811 \
  -e USE_FLASH_ATTENTION=false \
  qwen3-tts-api:blackwell

Built-in voices: Vivian, Serena, Uncle_Fu, Dylan, Eric, Ryan, Aiden, Ono_Anna, Sohee

OpenAI

  1. Create an account at https://platform.openai.com
  2. Generate an API key at https://platform.openai.com/api-keys
  3. Set the environment variable OPENAI_API_KEY

ElevenLabs

  1. Create an account at https://elevenlabs.io
  2. Get your API key from https://elevenlabs.io/app/settings/api-keys
  3. Set the environment variable ELEVENLABS_API_KEY

Usage

Setup

library(tts.api)

# For local Chatterbox server (OpenAI-compatible)
set_tts_base("http://localhost:7810")

# For local Qwen3-TTS server
set_tts_base("http://localhost:7811")

# For OpenAI
set_tts_base("https://api.openai.com")
set_tts_key(Sys.getenv("OPENAI_API_KEY"))

# For ElevenLabs (separate API key)
set_elevenlabs_key(Sys.getenv("ELEVENLABS_API_KEY"))

Check server health

tts_health()
#> $ok
#> [1] TRUE
#>
#> $status
#> [1] "OK (/health)"

List available voices

voices()

Generate speech

# Basic usage (uses configured base URL)
tts(
  input = "Hello, world!",
  voice = "alloy",
  file = "hello.mp3"
)

# OpenAI with voice instructions
tts(
  input = "Today is a wonderful day to build something people love!",
  voice = "coral",
  file = "speech.mp3",
  backend = "openai",
  model = "gpt-4o-mini-tts",
  instructions = "Speak in a cheerful and positive tone."
)

# Chatterbox with custom parameters
tts(
  input = "Hello with my custom voice!",
  voice = "MyCustomVoice",
  file = "speech.wav",
  temperature = 0.9,
  exaggeration = 1.2,
  cfg_weight = 0.3
)

# ElevenLabs (different API, not OpenAI-compatible)
tts(
  input = "Hello from ElevenLabs!",
  voice = "21m00Tcm4TlvDq8ikWAM",  # Rachel voice ID
  file = "hello_eleven.mp3",
  backend = "elevenlabs",
  stability = 0.5,
  similarity_boost = 0.75
)

# Qwen3-TTS with built-in voice
tts(
  input = "Hello from Qwen3!",
  voice = "Vivian",
  file = "hello_qwen3.wav",
  backend = "qwen3"
)

# Return raw bytes (useful for Shiny)
audio_bytes <- tts(
  input = "Hello!",
  voice = "alloy"
)

Voice cloning (Qwen3-TTS)

# Fast mode (x-vector only, no transcript needed)
speech_clone(
  input = "Hello in my cloned voice!",
  voice_file = "reference.wav",
  x_vector_only = TRUE,
  file = "cloned.wav",
  backend = "qwen3"
)

# High quality mode (with transcript)
speech_clone(
  input = "Hello in my cloned voice!",
  voice_file = "reference.wav",
  ref_text = "This is what I said in the recording.",
  file = "cloned.wav",
  backend = "qwen3"
)

Voice design (Qwen3-TTS only)

Create a custom voice from a natural language description:

speech_design(
  input = "Hello, I am your AI assistant.",
  voice_description = "A warm, professional female voice with a slight British accent",
  file = "designed_voice.wav"
)

Voice management (Chatterbox)

Upload a voice to the library for reuse:

# Upload once
voice_upload(
  voice_file = "my_voice.wav",
  voice_name = "my-custom-voice"
)

# With language
voice_upload(
  voice_file = "french_voice.wav",
  voice_name = "french-speaker",
  language = "fr"
)

# Use the saved voice by name
tts(
  input = "Hello with my custom voice!",
  voice = "my-custom-voice",
  file = "output.wav"
)

# Or for one-off cloning (uploads and generates in one call)
speech_clone(
  input = "Hello with my custom voice!",
  voice_file = "my_voice.mp3",
  file = "output.wav",
  exaggeration = 0.8
)

Parameters

tts()

Parameter Backend Description
input All Text to convert to speech
voice All Voice name or ID
file All Output file path (NULL returns raw bytes)
backend - "auto", "native", "chatterbox", "qwen3", "openai", "elevenlabs", or "fal"
model OpenAI, ElevenLabs Model name
instructions OpenAI Voice style instructions
temperature Chatterbox Sampling temperature
speed OpenAI, Chatterbox Playback speed multiplier
exaggeration Chatterbox Voice exaggeration
cfg_weight Chatterbox CFG weight
stability ElevenLabs Voice stability (0-1)
similarity_boost ElevenLabs Similarity boost (0-1)
seed Chatterbox Random seed for reproducibility
response_format OpenAI, Chatterbox Audio format

voice_upload() (Chatterbox)

Parameter Description
voice_file Path to voice sample file
voice_name Name to save the voice as
language Language code (e.g., "en", "fr")

speech_clone() (Chatterbox/Qwen3-TTS)

Parameter Backend Description
input All Text to convert to speech
voice_file All Path to voice sample file
file All Output file path (NULL returns raw bytes)
backend - "auto", "chatterbox", or "qwen3"
ref_text Qwen3 Transcript of reference audio (high quality)
x_vector_only Qwen3 Use only speaker embedding (faster)
language Qwen3 Language for synthesis
exaggeration Chatterbox Voice exaggeration
temperature All Sampling temperature
cfg_weight Chatterbox CFG weight
speed All Playback speed multiplier
seed All Random seed for reproducibility

speech_design() (Qwen3-TTS only)

Parameter Description
input Text to convert to speech
voice_description Natural language description of desired voice
file Output file path (NULL returns raw bytes)
language Language for synthesis (default "English")

Configuration functions

Function Purpose
set_tts_base() Set OpenAI-compatible API base URL
set_tts_key() Set OpenAI-compatible API key
set_elevenlabs_key() Set ElevenLabs API key

Health checks

Function Description
tts_health() Check server health (uses configured base URL)
chatterbox_available() Check if Chatterbox is running on port 7810
qwen3_available() Check if Qwen3-TTS is running on port 7811

Other functions

  • voices() - List available voices (OpenAI-compatible backends)
  • languages() - List supported languages

Dependencies

  • curl
  • jsonlite

About

No description, website, or topics provided.

Resources

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages