Mattbusel/llm-audio

llm-audio

Whisper transcription, translation, and TTS in a single C++ header.

Features

  • Transcribe audio files (mp3, wav, m4a, ogg, webm) via OpenAI Whisper
  • Translate audio to English
  • Text-to-speech with 6 voices and 4 formats
  • Single-header, C++17, namespace llm
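Because the feature list names a fixed set of input formats, it can be convenient to reject other files before making a network call. The helper below is a minimal sketch of such a pre-flight check. The name has_supported_audio_extension is hypothetical and not part of llm_audio.hpp, and the library may well do its own validation.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <string>

// Hypothetical helper, NOT part of llm_audio.hpp: returns true when the file
// name ends in one of the extensions listed under Features.
inline bool has_supported_audio_extension(const std::string& filename) {
    const auto dot = filename.find_last_of('.');
    if (dot == std::string::npos || dot + 1 == filename.size()) return false;

    // Lower-case the extension so "CLIP.MP3" is treated like "clip.mp3".
    std::string ext = filename.substr(dot + 1);
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    static const std::array<const char*, 5> kSupported = {"mp3", "wav", "m4a", "ogg", "webm"};
    return std::any_of(kSupported.begin(), kSupported.end(),
                       [&ext](const char* s) { return ext == s; });
}
```

The check looks only at the file name, not the file contents, so a mislabeled file would still pass.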

Quick Start

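#define LLM_AUDIO_IMPLEMENTATION
#include "llm_audio.hpp"
#include <iostream>

int main() {
    llm::TranscribeConfig cfg;
    cfg.api_key = "sk-...";  // designated initializers ({ .api_key = ... }) need C++20; assignment works in C++17
    auto result = llm::transcribe("audio.mp3", cfg);
    std::cout << result.text << '\n';
}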

API

TranscribeResult transcribe(const std::string& filepath, const TranscribeConfig&);
TranscribeResult transcribe_bytes(const std::vector<uint8_t>&, const std::string& filename, const TranscribeConfig&);
void             text_to_speech(const std::string& text, const std::string& output_path, const TTSConfig&);
std::vector<uint8_t> text_to_speech_bytes(const std::string& text, const TTSConfig&);
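The two *_bytes functions exchange audio as std::vector<uint8_t>, so callers typically need a way to move such buffers to and from disk. The following is a minimal sketch of two file helpers for that purpose. Both read_file_bytes and write_file_bytes are hypothetical names that do not come from llm_audio.hpp, and they use only the standard library.

```cpp
#include <cstdint>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper, NOT part of llm_audio.hpp: load a whole file into memory.
inline std::vector<std::uint8_t> read_file_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open for reading: " + path);
    return std::vector<std::uint8_t>(std::istreambuf_iterator<char>(in),
                                     std::istreambuf_iterator<char>());
}

// Hypothetical helper, NOT part of llm_audio.hpp: write a buffer to disk,
// replacing any existing file at that path.
inline void write_file_bytes(const std::string& path,
                             const std::vector<std::uint8_t>& bytes) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) throw std::runtime_error("cannot open for writing: " + path);
    out.write(reinterpret_cast<const char*>(bytes.data()),
              static_cast<std::streamsize>(bytes.size()));
    if (!out) throw std::runtime_error("write failed: " + path);
}
```

Given the signatures above, a buffer from read_file_bytes("audio.mp3") could be passed as the first argument of transcribe_bytes along with the file name and a TranscribeConfig, and the result of text_to_speech_bytes could be saved with write_file_bytes.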

Build

cmake -B build && cmake --build build

Requires libcurl (vcpkg: vcpkg install curl).

License

MIT — Mattbusel, 2026

See Also

Repo          Purpose
llm-stream    SSE streaming
llm-cache     Response caching
llm-cost      Token cost estimation
llm-retry     Retry + circuit breaker
llm-format    Markdown/code formatting
llm-embed     Embeddings + cosine similarity
llm-pool      Connection pooling
llm-log       Structured logging
llm-template  Prompt templates
llm-agent     Tool-use agent loop
llm-rag       Retrieval-augmented generation
llm-eval      Output evaluation
llm-chat      Multi-turn chat
llm-vision    Vision/image inputs
llm-mock      Mock LLM for testing
llm-router    Model routing
llm-guard     Content moderation
llm-compress  Prompt compression
llm-batch     Batch processing
llm-audio     Audio transcription/TTS
llm-finetune  Fine-tuning jobs
llm-rank      Passage reranking
llm-parse     HTML/markdown parsing
llm-trace     Distributed tracing
llm-ab        A/B testing
llm-json      JSON parsing/building

About

Single-header C++ library for audio transcription, translation, and text-to-speech via LLM APIs. Its only external dependency is libcurl. Supports Whisper and compatible endpoints.
