AutoAudio converts a book file (EPUB, TXT, Markdown, or RST) into chapter and part audiobook files using ComfyUI + VibeVoice.
Install project dependencies:

    python -m pip install -r requirements.txt

AutoAudio uses ffmpeg and ffprobe for stitching audio and writing metadata. Make sure both are installed and on your PATH.
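A quick preflight check can confirm that both tools are discoverable before starting a long run. The sketch below uses only the Python standard library; the helper name `missing_tools` is illustrative and is not part of AutoAudio.

```python
import shutil

REQUIRED_TOOLS = ("ffmpeg", "ffprobe")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("ffmpeg and ffprobe are both on PATH.")
```

The `which` parameter exists only so the lookup can be swapped out in tests; in normal use the default `shutil.which` is enough.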
AutoAudio expects a running ComfyUI server and a compatible workflow/node setup:
- ComfyUI server reachable at 127.0.0.1:8188 by default (or set --comfyui-server-address)
- The VibeVoice Single Speaker custom node available in ComfyUI (VibeVoiceSingleSpeakerNode)
- A reference voice file available in ComfyUI's input files as default_voice.wav
  - The bundled workflow resources/workflows/vibevoice_single_speaker.json loads this filename by default.
If you do not have a live ComfyUI runtime yet, you can still run pipeline logic with --comfyui-mode spoof for testing/development.
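To decide between network and spoof mode, it can help to check whether anything is listening at the ComfyUI address. The following is a minimal sketch using a plain TCP connection; `split_address` and `is_reachable` are hypothetical helpers, not AutoAudio functions. A successful connection only shows that the port is open. It does not confirm that the listener is ComfyUI or that the VibeVoice node loaded.

```python
import socket


def split_address(address):
    """Split a 'host:port' string into (host, port)."""
    host, _, port = address.rpartition(":")
    return host, int(port)


def is_reachable(address="127.0.0.1:8188", timeout=2.0):
    """Return True if a TCP connection to `address` succeeds."""
    host, port = split_address(address)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if is_reachable():
        print("ComfyUI port is accepting connections.")
    else:
        print("No listener found; consider --comfyui-mode spoof for testing.")
```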
- Start ComfyUI and verify the VibeVoice node loads correctly.
- Put your reference voice clip in ComfyUI input files as default_voice.wav.
- Choose an input book (.epub, .txt, .md, .markdown, or .rst).
- Run AutoAudio from CLI or GUI.
- Collect generated chapter/part files from your output directory (default: audiobook_output/).
Basic run:

    python auto_audiobook.py --input-book /path/to/book.epub --output-dir /path/to/output

Run with metadata fetch and MP3 output:

    python auto_audiobook.py \
      --input-book /path/to/book.epub \
      --output-dir /path/to/output \
      --fetch-metadata \
      --output-format mp3

Resume a prior compatible run checkpoint:

    python auto_audiobook.py --input-book /path/to/book.epub --output-dir /path/to/output --resume yes

Launch desktop app:

    python auto_audiobook.py --gui

Notes:
- GUI mode requires PySide6 (already included in requirements.txt).
- In GUI, pick input/output paths, optionally enable Fetch metadata, then click Start.
- If a compatible checkpoint exists, the GUI enables Resume automatically.
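
The CLI invocations above can also be assembled from Python, for example to queue several books from one folder. This is a sketch under assumptions: `find_books` and `build_command` are hypothetical helpers, the script is assumed to be run from the project directory, and only flags documented in this README are used.

```python
import sys
from pathlib import Path

# Input extensions listed in this README.
SUPPORTED_SUFFIXES = {".epub", ".txt", ".md", ".markdown", ".rst"}


def find_books(folder):
    """Return supported book files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix in SUPPORTED_SUFFIXES
    )


def build_command(book, output_dir, fetch_metadata=False, output_format=None, resume=None):
    """Build an argument list for one auto_audiobook.py run."""
    cmd = [
        sys.executable, "auto_audiobook.py",
        "--input-book", str(book),
        "--output-dir", str(output_dir),
    ]
    if fetch_metadata:
        cmd.append("--fetch-metadata")
    if output_format is not None:
        cmd.extend(["--output-format", output_format])  # flac, mp3, or m4b
    if resume is not None:
        cmd.extend(["--resume", resume])  # auto, yes, or no
    return cmd


if __name__ == "__main__":
    # Print the command for a single book without running it.
    print(build_command("/path/to/book.epub", "/path/to/output", output_format="mp3"))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. Giving each book its own output directory keeps segment caches and logs from different books apart.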
- --input-book <path>: input book file path.
- --output-dir <path>: output directory for generated files.
- --source-mode {auto,epub,text}: force source parser mode.
- --pages-per-chapter <int>: EPUB chapter grouping helper.
- --target-words-per-chapter <int>: text chapter sizing target.
- --min-paragraphs-per-chapter <int>: lower bound when grouping text chapters.
- --chapters-per-part <int>: how many chapter files per final "part" file.

- --max-words-per-chunk <int>
- --diffusion-steps <int>
- --temperature <float>
- --top-p <float>
- --cfg-scale <float>
- --free-memory-after-generate (flag)

- --output-format {flac,mp3,m4b}
- --fetch-metadata (flag; optional online Gutenberg/Gutendex lookup)
- --gutenberg-id <id> (manual Gutenberg ID override)
- --title <value> (manual title override)
- --author <value> (manual author override)

Metadata precedence, from highest to lowest priority:
- User overrides (--title, --author)
- Embedded source metadata
- Fetched online metadata (if enabled)
- Fallback defaults
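
The precedence order above amounts to "first non-empty value wins". The sketch below illustrates that rule only; it is not AutoAudio's actual implementation. The function names, the dictionary layout, the fallback strings, and the choice to treat blank strings as missing are all assumptions.

```python
def pick_first(*candidates, fallback):
    """Return the first candidate that is a non-blank string, else `fallback`."""
    for value in candidates:
        if value is not None and value.strip():
            return value.strip()
    return fallback


def resolve_metadata(overrides, embedded, fetched, fallback):
    """Resolve each field: overrides, then embedded, then fetched, then fallback."""
    return {
        field: pick_first(
            overrides.get(field),
            embedded.get(field),
            fetched.get(field),
            fallback=fallback[field],
        )
        for field in fallback
    }


if __name__ == "__main__":
    resolved = resolve_metadata(
        overrides={"author": "H. Melville"},  # e.g. from --author
        embedded={"title": "Moby-Dick", "author": ""},
        fetched={"title": "Moby Dick; Or, The Whale", "author": "Melville, Herman"},
        fallback={"title": "Untitled", "author": "Unknown"},  # hypothetical defaults
    )
    print(resolved)  # {'title': 'Moby-Dick', 'author': 'H. Melville'}
```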

- --comfyui-mode {network,spoof}
- --comfyui-server-address <host:port>
- --comfyui-timeout-seconds <float>
- --comfyui-spoof-scenario {success,timeout,malformed_history,missing_view_payload,connection_error}

- --resume {auto,yes,no}
- --gui (launches desktop GUI instead of CLI pipeline run)
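
For development, the spoof scenarios listed above can be exercised one after another without a live ComfyUI server. The sketch below only builds the command lines. `spoof_commands` is a hypothetical helper, and two choices in it are assumptions rather than requirements: writing each scenario to its own output subdirectory, and passing `--resume no` so runs do not pick up each other's checkpoints.

```python
import sys
from pathlib import Path

# Scenario names accepted by --comfyui-spoof-scenario, per this README.
SPOOF_SCENARIOS = (
    "success",
    "timeout",
    "malformed_history",
    "missing_view_payload",
    "connection_error",
)


def spoof_commands(book, output_root):
    """Build one auto_audiobook.py command per spoof scenario."""
    commands = []
    for scenario in SPOOF_SCENARIOS:
        commands.append([
            sys.executable, "auto_audiobook.py",
            "--input-book", str(book),
            "--output-dir", str(Path(output_root) / scenario),
            "--comfyui-mode", "spoof",
            "--comfyui-spoof-scenario", scenario,
            "--resume", "no",
        ])
    return commands


if __name__ == "__main__":
    for cmd in spoof_commands("/path/to/book.txt", "/path/to/spoof_runs"):
        print(" ".join(cmd))
```

The failure scenarios presumably make the run fail, so when executing these with `subprocess.run`, inspect the return code instead of passing `check=True`.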
- Chapter files: Chapter_###_<title>.<format>
- Part files: <book title> - Part_###.<format>
- Segment cache: <output-dir>/.segments/
- Run log: <output-dir>/autoaudio_debug.log
- Resume checkpoint state: resources/.autoaudio_state/checkpoint_state.json
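
Post-processing scripts can pick out chapter files by the naming pattern above. The sketch below assumes that `###` stands for a run of decimal digits and that `<format>` is one of the documented output formats. `parse_chapter_name` and `list_chapters` are hypothetical helpers, not part of AutoAudio.

```python
import re
from pathlib import Path

# Chapter_###_<title>.<format>, with ### assumed to be decimal digits.
CHAPTER_RE = re.compile(r"^Chapter_(\d+)_(.+)\.(flac|mp3|m4b)$")


def parse_chapter_name(name):
    """Return (index, title, format) for a chapter filename, or None."""
    match = CHAPTER_RE.match(name)
    if match is None:
        return None
    index, title, fmt = match.groups()
    return int(index), title, fmt


def list_chapters(output_dir):
    """Return chapter files in `output_dir`, ordered by chapter number."""
    found = []
    for path in Path(output_dir).iterdir():
        parsed = parse_chapter_name(path.name)
        if path.is_file() and parsed is not None:
            found.append((parsed[0], path))
    return [path for _, path in sorted(found)]


if __name__ == "__main__":
    print(parse_chapter_name("Chapter_001_Loomings.mp3"))  # (1, 'Loomings', 'mp3')
```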
- Cannot connect to ComfyUI: verify the server is running and the address matches --comfyui-server-address.
- No audio generated: verify the VibeVoice node is installed and workflow-compatible.
- Missing reference voice: ensure default_voice.wav exists in ComfyUI input files.
- Metadata fetch gives nothing: this is optional; run without --fetch-metadata to stay fully offline.
AutoAudio source code is licensed under the MIT License. See LICENSE.
Third-party dependencies are licensed under their own terms. See THIRD_PARTY_DEPENDENCIES.md.