AutoAudio

AutoAudio converts a book file (EPUB, TXT, Markdown, or RST) into per-chapter audio files and combined multi-chapter "part" files using ComfyUI + VibeVoice.

What you need before running

1) Python and dependencies

Install project dependencies:

python -m pip install -r requirements.txt

2) System tools

AutoAudio uses ffmpeg and ffprobe for stitching audio and writing metadata. Make sure both are installed and on your PATH.
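
As a quick pre-flight check, the sketch below reports which of the two tools cannot be found on PATH. It relies only on the standard-library shutil.which; the helper name missing_tools is illustrative and is not part of AutoAudio.

```python
import shutil


def missing_tools(names):
    """Return the tool names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(["ffmpeg", "ffprobe"])
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("ffmpeg and ffprobe were found on PATH.")
```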

3) ComfyUI runtime requirements (required for real generation)

AutoAudio expects a running ComfyUI server and a compatible workflow/node setup:

- ComfyUI server reachable at 127.0.0.1:8188 by default (or set --comfyui-server-address)
- The VibeVoice Single Speaker custom node available in ComfyUI (VibeVoiceSingleSpeakerNode)
- A reference voice file available in ComfyUI's input files as default_voice.wav
  - The bundled workflow resources/workflows/vibevoice_single_speaker.json loads this filename by default.

If you do not have a live ComfyUI runtime yet, you can still run pipeline logic with --comfyui-mode spoof for testing/development.
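
To confirm that something is listening at the ComfyUI address before starting a real run, a minimal sketch like the following may help. It only tests whether the TCP port accepts a connection; it does not verify that the listener is ComfyUI or that the VibeVoice node is loaded. The helper names split_address and is_reachable are hypothetical, not AutoAudio functions.

```python
import socket


def split_address(address):
    """Split a 'host:port' string into (host, port)."""
    host, _, port = address.rpartition(":")
    return host, int(port)


def is_reachable(address="127.0.0.1:8188", timeout=2.0):
    """Return True if a TCP connection to the address succeeds within the timeout."""
    host, port = split_address(address)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    print("ComfyUI port open:", is_reachable())
```

If the check fails, running with --comfyui-mode spoof remains an option for exercising the pipeline.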

Quick usage flow

1. Start ComfyUI and verify the VibeVoice node loads correctly.
2. Put your reference voice clip in ComfyUI input files as default_voice.wav.
3. Choose an input book (.epub, .txt, .md, .markdown, or .rst).
4. Run AutoAudio from CLI or GUI.
5. Collect generated chapter/part files from your output directory (default: audiobook_output/).

Run methods

CLI

Basic run:

python auto_audiobook.py --input-book /path/to/book.epub --output-dir /path/to/output

Run with metadata fetch and MP3 output:

python auto_audiobook.py \
  --input-book /path/to/book.epub \
  --output-dir /path/to/output \
  --fetch-metadata \
  --output-format mp3

Resume from a compatible checkpoint of a prior run:

python auto_audiobook.py --input-book /path/to/book.epub --output-dir /path/to/output --resume yes
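
When AutoAudio is driven from another script, the same invocations can be assembled as an argument list. The sketch below uses only flags shown in this README; build_command is a hypothetical helper, and it assumes the script is launched from the repository root where auto_audiobook.py lives.

```python
import sys


def build_command(input_book, output_dir, fetch_metadata=False,
                  output_format=None, resume=None):
    """Build the argv list for one AutoAudio CLI run."""
    cmd = [sys.executable, "auto_audiobook.py",
           "--input-book", str(input_book),
           "--output-dir", str(output_dir)]
    if fetch_metadata:
        cmd.append("--fetch-metadata")
    if output_format is not None:
        cmd += ["--output-format", output_format]
    if resume is not None:
        cmd += ["--resume", resume]
    return cmd


if __name__ == "__main__":
    cmd = build_command("/path/to/book.epub", "/path/to/output",
                        fetch_metadata=True, output_format="mp3")
    print(" ".join(cmd))
    # To launch the run, pass the list to subprocess.run(cmd, check=True).
```

Passing the arguments as a list avoids shell quoting problems with book paths that contain spaces.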

GUI

Launch the desktop app:

python auto_audiobook.py --gui

Notes:

GUI mode requires PySide6 (already included in requirements.txt).
In GUI, pick input/output paths, optionally enable Fetch metadata, then click Start.
If a compatible checkpoint exists, the GUI enables Resume automatically.

CLI arguments

Input/output and source parsing

--input-book <path>: input book file path.
--output-dir <path>: output directory for generated files.
--source-mode {auto,epub,text}: force source parser mode.
--pages-per-chapter <int>: EPUB chapter grouping helper.
--target-words-per-chapter <int>: text chapter sizing target.
--min-paragraphs-per-chapter <int>: lower bound when grouping text chapters.
--chapters-per-part <int>: how many chapter files per final "part" file.
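
To illustrate what --chapters-per-part controls, here is a minimal sketch of grouping chapters into parts. It assumes consecutive grouping with a possibly shorter final part; AutoAudio's actual grouping logic is not documented here and may differ. The function name group_into_parts is hypothetical.

```python
def group_into_parts(chapters, chapters_per_part):
    """Split a list of chapters into consecutive groups of a fixed size.

    Assumption: the last part simply holds whatever chapters remain.
    """
    if chapters_per_part < 1:
        raise ValueError("chapters_per_part must be at least 1")
    return [chapters[i:i + chapters_per_part]
            for i in range(0, len(chapters), chapters_per_part)]


if __name__ == "__main__":
    chapter_names = ["ch1", "ch2", "ch3", "ch4", "ch5"]
    for number, part in enumerate(group_into_parts(chapter_names, 2), start=1):
        print(number, part)
```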

Generation tuning

--max-words-per-chunk <int>
--diffusion-steps <int>
--temperature <float>
--top-p <float>
--cfg-scale <float>
--free-memory-after-generate (flag)

Output and metadata

--output-format {flac,mp3,m4b}
--fetch-metadata (flag; optional online Gutenberg/Gutendex lookup)
--gutenberg-id <id> (manual Gutenberg ID override)
--title <value> (manual title override)
--author <value> (manual author override)

Metadata precedence is:

1. User overrides (--title, --author)
2. Embedded source metadata
3. Fetched online metadata (if enabled)
4. Fallback defaults
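
The precedence order can be sketched as a first-match lookup. This is an illustration under assumptions, not AutoAudio's implementation: resolve_field is a hypothetical name, empty values are treated as missing, and the fallback string "Unknown" is a placeholder.

```python
def resolve_field(user=None, embedded=None, fetched=None, fallback="Unknown"):
    """Return the first non-empty value in precedence order.

    Order: user override, embedded source metadata, fetched online
    metadata, then the fallback default (placeholder value here).
    """
    for candidate in (user, embedded, fetched):
        if candidate:
            return candidate
    return fallback


if __name__ == "__main__":
    # The user override wins even when other sources provide a value.
    print(resolve_field(user="My Title", embedded="Embedded Title"))
    # With no override, embedded metadata outranks fetched metadata.
    print(resolve_field(embedded="Embedded Title", fetched="Fetched Title"))
```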

ComfyUI connection/runtime controls

--comfyui-mode {network,spoof}
--comfyui-server-address <host:port>
--comfyui-timeout-seconds <float>
--comfyui-spoof-scenario {success,timeout,malformed_history,missing_view_payload,connection_error}

Run control

--resume {auto,yes,no}
--gui (launches desktop GUI instead of CLI pipeline run)

Outputs and run artifacts

Chapter files: Chapter_###_<title>.<format>
Part files: <book title> - Part_###.<format>
Segment cache: <output-dir>/.segments/
Run log: <output-dir>/autoaudio_debug.log
Resume checkpoint state: resources/.autoaudio_state/checkpoint_state.json
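
For scripts that need to predict or match output names, the patterns above can be sketched as format strings. This assumes ### means a zero-padded three-digit number and that titles are used as-is; any title sanitization AutoAudio applies is not covered. Both function names are hypothetical.

```python
def chapter_filename(index, title, fmt):
    """Build a name following Chapter_###_<title>.<format>."""
    return f"Chapter_{index:03d}_{title}.{fmt}"


def part_filename(book_title, index, fmt):
    """Build a name following <book title> - Part_###.<format>."""
    return f"{book_title} - Part_{index:03d}.{fmt}"


if __name__ == "__main__":
    print(chapter_filename(7, "The Storm", "mp3"))
    print(part_filename("Moby Dick", 2, "flac"))
```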

Troubleshooting

Cannot connect to ComfyUI: verify that the server is running and that its address matches --comfyui-server-address.
No audio generated: verify that the VibeVoice node is installed and compatible with the workflow.
Missing reference voice: ensure default_voice.wav exists in ComfyUI's input files.
Metadata fetch returns nothing: metadata fetching is optional; run without --fetch-metadata to stay fully offline.

License

AutoAudio source code is licensed under the MIT License. See LICENSE.

Third-party dependencies are licensed under their own terms. See THIRD_PARTY_DEPENDENCIES.md.
