Audiobook Maker

A modern desktop application for creating audiobooks with advanced text-to-speech and voice cloning capabilities

v1.1.1 - Hotfix for remote backend connectivity. See Release Notes.

v1.1.0 - Docker-based deployment, Remote GPU hosts, Engine variants. See Release Notes.

Overview

Audiobook Maker is a Tauri 2 desktop application that converts text into audiobooks using pluggable text-to-speech engines with voice cloning. The frontend is built with React and TypeScript and talks to a Python FastAPI backend that manages the engines.

Key Features

Docker-Based Deployment - One-command setup with prebuilt containers for backend and engines
Remote GPU Hosts - Offload GPU-intensive engines to dedicated servers via SSH
Multi-Engine Architecture - 4 engine types (TTS, STT, Text Processing, Audio Analysis)
Engine Variants - Run engines locally (subprocess), in Docker, or on remote hosts
Voice Cloning - Create custom voices using XTTS, Chatterbox, or VibeVoice with speaker samples
Quality Assurance - Whisper-based transcription analysis and Silero-VAD audio quality detection
Pronunciation Rules - Pattern-based text transformation to fix mispronunciations
Project Organization - Hierarchical structure with Projects, Chapters, and Segments
Drag & Drop Interface - Intuitive content organization and reordering
Multi-Language Support - 17+ languages including English, German, Spanish, French, Chinese, Japanese
Multiple Export Formats - Export to MP3, M4A, or WAV with quality presets
Smart Text Segmentation - Automatic text splitting using spaCy NLP engine
Real-Time Updates - Server-Sent Events for instant UI feedback
Job Management - Database-backed queue with progress tracking and resumable cancelled jobs
Markdown or EPUB Import - Import entire projects from structured files

Sample audio

Moby Dick Sample (Chatterbox)

Moby Dick Sample (VibeVoice)

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    Audiobook Maker Desktop App                   │
│                     (Tauri + React Frontend)                     │
└───────────────────────────┬─────────────────────────────────────┘
                            │ HTTP/REST API + SSE
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                 Backend Container (Port 8765)                    │
│              ghcr.io/digijoe79/audiobook-maker/backend           │
├─────────────────────────────────────────────────────────────────┤
│  FastAPI │ SQLite │ TTS/Quality Workers │ Engine Managers        │
│          │        │                     │ (Docker Runner)        │
└───────────────────────────┬─────────────────────────────────────┘
                            │ Docker API
        ┌───────────────────┼───────────────────┐
        ▼                   ▼                   ▼
┌───────────────┐   ┌───────────────┐   ┌───────────────┐
│  Local Docker │   │  Local Docker │   │ Remote Docker │
│    Engines    │   │    Engines    │   │  Host (GPU)   │
│ xtts, spacy   │   │whisper,silero │   │ xtts,whisper  │
└───────────────┘   └───────────────┘   └───────────────┘

Key Architecture Features:

Backend and engines run as Docker containers
GPU engines can run on remote hosts via SSH tunnel
Automatic engine discovery from online catalog
Engine enable/disable with auto-stop after inactivity
Real-time updates via Server-Sent Events (SSE)
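
To make the SSE channel more concrete, here is a minimal sketch of how a client could parse a `text/event-stream` payload into events. It follows the standard SSE line format only; the event name and JSON fields in the sample are hypothetical, since this README does not document the actual event schema.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per SSE event from an iterable of text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        else:
            field, _, value = line.partition(":")
            # The format allows one optional space after the colon.
            value = value[1:] if value.startswith(" ") else value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Hypothetical stream; the real event names and fields may differ.
sample = [
    ": keep-alive\n",
    "event: segment.completed\n",
    'data: {"segmentId": "s1", "progress": 1.0}\n',
    "\n",
]
events = list(parse_sse(sample))
payload = json.loads(events[0]["data"])
```

In a real client the lines would come from a streaming HTTP response rather than a list.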

Quick Start

Prerequisites

Requirement	Purpose	Installation
Docker Desktop	Run backend and engines	Download
NVIDIA Container Toolkit	GPU support (optional)	Install Guide

Note: For GPU-accelerated engines (XTTS, Chatterbox, Whisper), you need an NVIDIA GPU with CUDA support and the NVIDIA Container Toolkit installed.

Installation

1. Download the Desktop App

Download the latest Windows release from GitHub Releases:

Windows: Audiobook-Maker_1.1.1_x64-setup.exe

Linux/macOS: No prebuilt binaries available. See Development Setup to build from source.

2. Pull the Backend Container

docker pull ghcr.io/digijoe79/audiobook-maker/backend:latest

3. Start the Backend

docker run -d \
  --name audiobook-maker-backend \
  -p 8765:8765 \
  --add-host=host.docker.internal:host-gateway \
  -e DOCKER_ENGINE_HOST=host.docker.internal \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v audiobook-data:/app/data \
  -v audiobook-media:/app/media \
  ghcr.io/digijoe79/audiobook-maker/backend:latest

Important: The container must be named audiobook-maker-backend. On startup, the backend cleans up orphaned engine containers (prefix audiobook-) from previous sessions. Containers matching this prefix are stopped unless explicitly excluded by name.
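
The naming rule above can be sketched as a simple filter. This is only an illustration of the rule as described here, assuming the backend excludes itself by its exact name; the function and the sample container names are hypothetical and are not taken from the backend's source.

```python
BACKEND_NAME = "audiobook-maker-backend"
ENGINE_PREFIX = "audiobook-"


def orphan_candidates(container_names, excluded=(BACKEND_NAME,)):
    """Return names that match the engine prefix and are not excluded."""
    return [
        name
        for name in container_names
        if name.startswith(ENGINE_PREFIX) and name not in excluded
    ]


# Hypothetical container names for illustration.
running = ["audiobook-maker-backend", "audiobook-xtts", "audiobook-backend", "postgres"]
print(orphan_candidates(running))
# -> ['audiobook-xtts', 'audiobook-backend']
```

Under this assumption, a backend started as `audiobook-backend` matches the prefix but not the excluded name, so it would be treated as an orphan.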

4. Launch the App

Start the Audiobook Maker desktop app
Connect to backend: http://localhost:8765
Go to Settings → Engines and install engines from the catalog
Create a speaker and start creating audiobooks!

Installing Engines

Engines are installed from the online catalog:

Open Settings → Engines
Browse available engines in the catalog
Click Install to pull the Docker image
Enable the engine and it starts automatically

See audiobook-maker-engines for the full list of available engines.

GPU Offloading to Remote Hosts

Run GPU-intensive engines on a dedicated server:

1. Prepare the Remote Host

# On the remote GPU server
# Install Docker and NVIDIA Container Toolkit
curl -fsSL https://get.docker.com | sh
# Follow NVIDIA Container Toolkit installation guide

2. Add Host in Audiobook Maker

Open Settings → Hosts
Click Add Host
Enter connection details:
- Host Name: e.g., "GPU Server"
- SSH URL: e.g., ssh://user@192.168.1.100
Click Generate SSH Key
Copy the displayed install command and run it on the remote host
Click Test Connection to verify
Click Save
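
If the connection test fails, it can help to check that the SSH URL has the expected shape. The sketch below splits an `ssh://` URL into user, host, and port using only the Python standard library. The helper name is hypothetical, and defaulting to port 22 is an assumption based on the standard SSH port rather than on the app's code.

```python
from urllib.parse import urlsplit


def parse_ssh_url(url: str) -> dict:
    """Split an ssh:// URL into user, host, and port (default 22)."""
    parts = urlsplit(url)
    if parts.scheme != "ssh" or not parts.hostname:
        raise ValueError(f"not an ssh:// URL: {url!r}")
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port or 22,
    }


print(parse_ssh_url("ssh://user@192.168.1.100"))
# -> {'user': 'user', 'host': '192.168.1.100', 'port': 22}
```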

3. Install Engines on Remote Host

Go to Settings → Hosts
Click on + for your remote host
Install GPU engines (XTTS, Whisper, etc.)
Engines run on the remote host; audio streams back to your machine

Usage Guide

Creating Your First Audiobook

Create a Project - Click "+" in the sidebar
Add Chapters - Organize your content
Add Segments - Upload text or type manually
Configure Voice - Select speaker and language
Generate Audio - Click "Generate All"
Export - Download as MP3/M4A/WAV

Voice Cloning

Navigate to Speakers view (Ctrl+3)
Click Add Speaker
Upload 1-3 WAV samples (3-30 seconds each)
Use the speaker in your segments
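
Before uploading, samples can be checked against the 3-30 second range locally. The following is a minimal sketch using Python's standard `wave` module, which reads PCM WAV files only. The helper names are hypothetical, and the app may apply additional checks of its own.

```python
import io
import wave


def wav_duration_seconds(source) -> float:
    """Return the duration of a PCM WAV file (path or binary file object)."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_sample(source, min_s: float = 3.0, max_s: float = 30.0) -> bool:
    """Check the duration against the 3-30 second range stated above."""
    return min_s <= wav_duration_seconds(source) <= max_s


def make_silent_wav(seconds: float, rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)
    return buf


print(is_valid_sample(make_silent_wav(5)))  # 5 s is inside the range
print(is_valid_sample(make_silent_wav(1)))  # 1 s is too short
```

For real files, pass a path such as `is_valid_sample("speaker_sample.wav")`.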

Quality Analysis

Generate audio for segments
Click the quality indicator or use Analyze Chapter
Review transcription accuracy and audio metrics
Re-generate segments with issues
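
One common way to quantify transcription accuracy is the word error rate (WER) between the segment text and the transcript. The sketch below shows that general technique. It is an assumption for illustration, not necessarily the metric Audiobook Maker computes.

```python
import re


def _words(text: str) -> list[str]:
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[\w']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = _words(reference), _words(hypothesis)
    if not ref:
        return 0.0 if not hyp else 1.0
    # Two-row Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of three gives a WER of 1/3.
print(word_error_rate("Call me Ishmael.", "call me Israel"))
```

A segment with a high WER is a reasonable candidate for re-generation.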

Pronunciation Rules

Navigate to Pronunciation view (Ctrl+4)
Create rules for mispronounced words
Rules are automatically applied during generation
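
Conceptually, a pronunciation rule is a pattern and a replacement applied to the segment text before it reaches the TTS engine. The sketch below illustrates that idea with regular expressions. The `PronunciationRule` class and its fields are hypothetical and do not reflect the app's actual rule format.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PronunciationRule:
    pattern: str              # regular expression to match
    replacement: str          # text sent to the TTS engine instead
    ignore_case: bool = True


def apply_rules(text: str, rules: list[PronunciationRule]) -> str:
    """Apply each rule in order and return the transformed text."""
    for rule in rules:
        flags = re.IGNORECASE if rule.ignore_case else 0
        text = re.sub(rule.pattern, rule.replacement, text, flags=flags)
    return text


rules = [
    PronunciationRule(r"\bDr\.", "Doctor"),
    PronunciationRule(r"\bSQL\b", "sequel", ignore_case=False),
]
print(apply_rules("Dr. Smith wrote the SQL query.", rules))
# -> Doctor Smith wrote the sequel query.
```

Rules run in order, so a later rule sees the output of earlier ones.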

Development Setup

For contributors who want to develop locally without Docker:

Prerequisites

Node.js 18+ - Download
Python 3.12+ - Download
Rust 1.70+ - Install
FFmpeg - Install Guide

Backend Setup

cd backend
python -m venv venv
venv\Scripts\activate      # Windows
source venv/bin/activate   # Linux/Mac
pip install -r requirements.txt

Engine Setup (Subprocess Mode)

Clone the engines repository:

git clone https://github.com/DigiJoe79/audiobook-maker-engines backend/engines

Set up individual engines:

cd backend/engines/tts/xtts
setup.bat   # Windows
./setup.sh  # Linux/Mac

Frontend Setup

cd frontend
npm install
npm run dev:tauri

Project Structure

audiobook-maker/
├── frontend/                 # Tauri + React desktop app
│   ├── src/                  # React components, hooks, stores
│   ├── src-tauri/            # Rust backend (Tauri)
│   └── e2e/                  # Playwright E2E tests
│
├── backend/                  # Python FastAPI backend
│   ├── api/                  # REST endpoints
│   ├── core/                 # Engine managers, Docker runner
│   ├── services/             # Business logic
│   └── Dockerfile            # Backend container definition
│
└── .github/workflows/        # CI/CD for container builds

API Documentation

When the backend is running:

Swagger UI: http://localhost:8765/docs
ReDoc: http://localhost:8765/redoc
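
The same information is available programmatically. FastAPI serves its schema at `/openapi.json` by default, and the sketch below assumes the backend has not changed that default. The helper names and the sample paths are hypothetical.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://localhost:8765"


def list_endpoints(schema: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs from an OpenAPI schema dict."""
    http_methods = {"get", "put", "post", "delete", "patch", "head", "options"}
    return sorted(
        (method.upper(), path)
        for path, item in schema.get("paths", {}).items()
        for method in item
        if method in http_methods
    )


def fetch_schema(base_url: str = BASE_URL) -> dict:
    """Download the OpenAPI schema from a running backend."""
    # /openapi.json is FastAPI's default schema URL; assumed unchanged here.
    with urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return json.load(response)


# Hypothetical schema fragment, so the example runs without a backend.
example = {"paths": {"/api/x": {"get": {}, "post": {}}, "/api/a": {"delete": {}}}}
print(list_endpoints(example))
```

Calling `list_endpoints(fetch_schema())` against a running backend would list the routes it actually serves.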

Troubleshooting

Backend container won't start

# Check logs
docker logs audiobook-maker-backend

# Check whether another container already publishes port 8765
docker ps -a | grep 8765
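
The `docker ps` command above only lists containers. To see whether any process on the machine is already listening on the port, a small socket probe is one option. This is a minimal sketch, and the helper name is hypothetical.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0
```

If `port_in_use(8765)` returns True before the backend container has started, another process holds the port.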

Backend container stops immediately

The backend cleans up orphaned engine containers on startup. If your backend container has any name other than audiobook-maker-backend, it may be stopped as an orphan. Always use the exact name audiobook-maker-backend.

GPU not detected in containers

# Verify the NVIDIA driver on the host
nvidia-smi
# Verify GPU access from inside a container (requires NVIDIA Container Toolkit)
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

Engine fails to start

Check engine logs in Monitoring → Activity
Verify Docker has enough resources (memory, disk)
For GPU engines, ensure NVIDIA Container Toolkit is installed

Remote host connection fails

Verify SSH key is in remote ~/.ssh/authorized_keys
Check firewall allows SSH (port 22)
Test manually: ssh user@host

Tech Stack

Frontend

Tauri 2.9 - Desktop framework
React 19 + TypeScript 5.9 - UI framework
Material-UI 7 - Component library
React Query 5 - Server state
Zustand 5 - Local state

Backend

Python 3.12 - Runtime
FastAPI - Web framework
SQLite 3 - Database
Docker SDK - Container management

Engines

TTS: XTTS v2, Chatterbox, VibeVoice
STT: Whisper (5 model sizes)
Text: spaCy (11 languages)
Audio: Silero-VAD

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

TTS Engines

Coqui TTS - XTTS v2 voice cloning engine
Chatterbox - Expressive TTS by Resemble AI
VibeVoice - Long-form multi-speaker TTS by Microsoft

Analysis Engines

OpenAI Whisper - Speech recognition
Silero VAD - Voice activity detection
spaCy - NLP text segmentation

Frameworks

Tauri - Desktop app framework
FastAPI - Python web framework

Support

Issues: GitHub Issues

Made with care by DigiJoe79
