# EPUB to Audiobook — Google Colab Pipeline

Convert EPUB files to `.m4b` audiobooks with chapter markers using
[Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) TTS.

**What you get:** An `.m4b` audiobook file with chapter markers and cover art,
ready for any audiobook player (Apple Books, Smart Audiobook Player, etc.).

**Requirements:** A Google account and an EPUB file. That's it — Colab provides the GPU.

### How to use

1. **Runtime → Change runtime type → T4 GPU** (if not already set)
2. Run each cell in order from top to bottom
3. Upload your EPUB when prompted
4. Configure voice and speed
5. Preview the book info
6. Run the conversion
7. Download your audiobook

A typical novel (80K–120K words) takes **5–15 minutes** on a T4 GPU.

---

In [None]:
#@title 1. Setup — Install dependencies and clone pipeline { display-mode: "form" }
#@markdown Run this cell once at the start of each Colab session.
#@markdown
#@markdown Set `REPO_URL` to your fork/clone of the repository.

REPO_URL = "https://github.com/Wondermonger-daydreaming/Claude-Code-Lab"  #@param {type:"string"}
SUBDIR = "corpus/code/audiblez-kokoro-pipeline"  #@param {type:"string"}

import subprocess, sys, os

# --- System dependencies ---
print("Installing system dependencies...")
subprocess.run(
    ["apt-get", "-qq", "install", "-y", "espeak-ng"],
    capture_output=True, check=True,
)
print("  espeak-ng installed")

# ffmpeg is pre-installed in Colab; just verify it is on PATH.
# (Calling a missing binary would raise FileNotFoundError before any check ran.)
import shutil
if shutil.which("ffmpeg") is None:
    raise RuntimeError("ffmpeg not found; this shouldn't happen in Colab")
print("  ffmpeg verified")

# --- Python packages ---
# Install kokoro first, then audiblez with --no-deps to avoid a version conflict.
# torch/numpy are pre-installed in Colab with CUDA support; don't reinstall them.
print("\nInstalling Python packages (this takes 1-2 minutes)...")
subprocess.run(
    [sys.executable, "-m", "pip", "install", "-q", "kokoro>=0.9.4"],
    check=True,
)
subprocess.run(
    [sys.executable, "-m", "pip", "install", "-q", "audiblez", "--no-deps"],
    check=True,
)
subprocess.run(
    [sys.executable, "-m", "pip", "install", "-q",
     "ebooklib>=0.18", "beautifulsoup4>=4.12", "lxml>=5.0",
     "soundfile>=0.13.1", "misaki>=0.7.10"],
    check=True,
)
print("  Python packages installed")

# --- Clone repository ---
CLONE_DIR = "/content/repo"
PIPELINE_DIR = f"{CLONE_DIR}/{SUBDIR}"
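if not os.path.exists(CLONE_DIR):
    print("\nCloning repository (shallow)...")
    subprocess.run(["git", "clone", "--depth", "1", REPO_URL, CLONE_DIR], check=True)
else:
    print("\nUpdating existing clone...")
    pull = subprocess.run(["git", "-C", CLONE_DIR, "pull", "--ff-only"],
                          capture_output=True)
    if pull.returncode != 0:
        # Don't fail the whole setup; the existing checkout is still usable.
        print("  Warning: git pull failed; using the existing clone as-is")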

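if not os.path.isdir(PIPELINE_DIR):
    raise RuntimeError(f"{PIPELINE_DIR} not found; check REPO_URL and SUBDIR")
if PIPELINE_DIR not in sys.path:  # avoid duplicate entries when the cell is re-run
    sys.path.insert(0, PIPELINE_DIR)
print(f"  Pipeline path added: {PIPELINE_DIR}")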

# --- GPU check ---
print("\n--- Device ---")
import torch
if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f"  GPU: {gpu_name} ({vram_gb:.1f} GB VRAM)")
    print(f"  Expected synthesis speed: ~500 chars/sec")
else:
    print("  CPU only (no GPU detected)")
    print("  Expected synthesis speed: ~60 chars/sec")
    print("  Tip: Runtime > Change runtime type > T4 GPU")

print("\nSetup complete!")

## 2. Upload your EPUB

Click **Choose Files** below to upload an `.epub` file from your computer.

In [None]:
#@title Upload EPUB file { display-mode: "form" }

import os
from pathlib import Path
from google.colab import files

INPUT_DIR = Path("/content/input")
INPUT_DIR.mkdir(exist_ok=True)

print("Select your EPUB file:")
uploaded = files.upload()

if not uploaded:
    raise RuntimeError("No file uploaded. Please upload an .epub file.")

# Save uploaded file(s)
for filename, content in uploaded.items():
    dest = INPUT_DIR / filename
    dest.write_bytes(content)
    size_mb = len(content) / 1e6
    print(f"  Saved: {dest.name} ({size_mb:.1f} MB)")

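# Use the first uploaded file and make sure it looks like an EPUB
epub_path = INPUT_DIR / list(uploaded.keys())[0]
if epub_path.suffix.lower() != ".epub":
    raise RuntimeError(f"Expected an .epub file, got: {epub_path.name}")
print(f"\nReady to process: {epub_path.name}")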

## 3. Configure voice and speed

Choose a narrator voice and playback speed. Click the form fields below to change settings.

In [None]:
#@title Conversion settings { display-mode: "form" }

#@markdown ### Voice
voice = "af_sky"  #@param ["af_sky", "af_heart", "af_bella", "af_nicole", "af_sarah", "am_adam", "am_echo", "am_liam", "am_michael", "bf_emma", "bf_alice", "bf_lily", "bm_george", "bm_lewis", "bm_daniel"] {type:"string"}

#@markdown ### Speed
speed = 1.0  #@param {type:"slider", min:0.5, max:2.0, step:0.1}

#@markdown ### Text preprocessing
preprocess_text = True  #@param {type:"boolean"}
#@markdown Expand abbreviations (`Dr.` → `Doctor`), numbers (`42` → `forty-two`), etc.

print(f"Voice:         {voice}")
print(f"Speed:         {speed}x")
print(f"Preprocessing: {'enabled' if preprocess_text else 'disabled'}")
print()
print("Voice naming: {lang}{gender}_{name}")
print("  a = American, b = British")
print("  f = female,   m = male")
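
The naming convention printed by the settings cell can be decoded mechanically. The sketch below is illustrative only: `describe_voice` is a hypothetical helper, not part of the pipeline, and it covers just the two accent prefixes and two gender codes listed in this notebook.

```python
# Hypothetical helper: decode a voice ID of the form {lang}{gender}_{name}.
LANGS = {"a": "American", "b": "British"}
GENDERS = {"f": "female", "m": "male"}


def describe_voice(voice_id: str) -> dict:
    """Split a voice ID such as 'bm_george' into accent, gender, and name."""
    prefix, sep, name = voice_id.partition("_")
    if not sep or len(prefix) != 2 or not name:
        raise ValueError(f"Unexpected voice ID format: {voice_id!r}")
    lang, gender = prefix[0], prefix[1]
    if lang not in LANGS or gender not in GENDERS:
        raise ValueError(f"Unknown language/gender prefix in {voice_id!r}")
    return {"accent": LANGS[lang], "gender": GENDERS[gender], "name": name}


print(describe_voice("bm_george"))
# {'accent': 'British', 'gender': 'male', 'name': 'george'}
```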

## 4. Preview your book

Parse the EPUB and display chapter info before committing GPU time.
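
The preview cell reads a handful of attributes from what `parse_epub` returns. The parser itself is not shown in this notebook, so the sketch below only mirrors the attribute names used in the cell. The class names, the way `word_count` is computed, and the 150 words-per-minute pace are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

WORDS_PER_MINUTE = 150  # assumed narration pace, not taken from the pipeline


@dataclass
class ChapterSketch:
    """Hypothetical stand-in for a parsed chapter."""
    index: int
    title: str
    text: str

    @property
    def word_count(self) -> int:
        # Whitespace-separated tokens; the real parser may count differently.
        return len(self.text.split())

    @property
    def estimated_duration_sec(self) -> float:
        return self.word_count / WORDS_PER_MINUTE * 60


@dataclass
class MetadataSketch:
    """Hypothetical stand-in for the book-level metadata."""
    title: str
    author: str
    language: str
    cover_image: Optional[bytes] = None  # raw image bytes, or None if absent


ch = ChapterSketch(index=0, title="Prologue", text="word " * 300)
print(ch.word_count, ch.estimated_duration_sec)  # 300 120.0
```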

In [None]:
#@title Preview book info { display-mode: "form" }

from src.epub_parser import parse_epub

metadata, chapters = parse_epub(epub_path)

total_words = sum(ch.word_count for ch in chapters)
total_duration = sum(ch.estimated_duration_sec for ch in chapters)
total_chars = sum(len(ch.text) for ch in chapters)
hours = int(total_duration // 3600)
minutes = int((total_duration % 3600) // 60)

print(f"{'Title:':<16} {metadata.title}")
print(f"{'Author:':<16} {metadata.author}")
print(f"{'Language:':<16} {metadata.language}")
print(f"{'Chapters:':<16} {len(chapters)}")
print(f"{'Words:':<16} {total_words:,}")
print(f"{'Characters:':<16} {total_chars:,}")
print(f"{'Est. duration:':<16} {hours}h {minutes:02d}m (at 1.0x speed)")
print(f"{'Cover image:':<16} {'yes' if metadata.cover_image else 'no'}")
print()

# Chapter table
print(f"{'#':<5} {'Chapter':<50} {'Words':>8}  {'Est.':>8}")
print("-" * 75)
for ch in chapters:
    title = (ch.title[:47] + "...") if len(ch.title) > 50 else ch.title
    dur_min = ch.estimated_duration_sec / 60
    print(f"{ch.index:<5} {title:<50} {ch.word_count:>8,}  {dur_min:>6.1f}m")

# Estimate conversion time
import torch
rate = 500 if torch.cuda.is_available() else 60
est_mins = total_chars / rate / 60
print(f"\nEstimated conversion time: ~{est_mins:.0f} minutes")

## 5. Convert

Run the full pipeline: **parse → preprocess → synthesize → assemble M4B**.

This is the GPU-intensive step. A typical novel takes 5–15 minutes on a T4.
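
To make the preprocess stage concrete: the settings cell describes it as expanding abbreviations (`Dr.` → `Doctor`) and numbers (`42` → `forty-two`). The pipeline's own `preprocess` function is not shown in this notebook, so the following is only a toy sketch of the idea under simplifying assumptions: a three-entry abbreviation table and standalone whole numbers from 0 to 99. All names in it are hypothetical.

```python
import re

# Tiny illustrative table; a real preprocessor would need many more entries.
ABBREVIATIONS = {"Dr.": "Doctor", "Mr.": "Mister", "Mrs.": "Missus"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..99."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only handles 0..99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def expand_for_tts(text: str) -> str:
    """Expand known abbreviations, then standalone one- or two-digit numbers."""
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    # \b on both sides keeps longer numbers such as 1984 untouched.
    return re.sub(r"\b\d{1,2}\b",
                  lambda m: number_to_words(int(m.group())), text)


print(expand_for_tts("Dr. Smith is 42."))  # Doctor Smith is forty-two.
```

Decimals, ordinals, years, and currency are deliberately left out; each would need its own rule.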

In [None]:
#@title Convert EPUB to M4B audiobook { display-mode: "form" }

import time
import tempfile
from pathlib import Path

from src.epub_parser import parse_epub
from src.preprocess import preprocess
from src.kokoro_direct import init_pipeline, synthesize_chapter
from src.audio_utils import create_m4b, cleanup_chapter_wavs

OUTPUT_DIR = Path("/content/output")
OUTPUT_DIR.mkdir(exist_ok=True)

# --- Parse ---
print("Parsing EPUB...")
metadata, chapters = parse_epub(epub_path)
print(f"  {metadata.title} by {metadata.author} — {len(chapters)} chapters")

if not chapters:
    raise RuntimeError(
        "No chapters extracted. The EPUB may have an unusual structure "
        "or contain only images."
    )

# --- Preprocess ---
if preprocess_text:
    print("Preprocessing text...")
    for ch in chapters:
        ch.text = preprocess(ch.text)
    print("  Done")

# --- Init TTS ---
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"\nInitializing Kokoro TTS on {device}...")
pipeline = init_pipeline(lang_code=metadata.language, device=device)
print(f"  Voice: {voice}, Speed: {speed}x")

# --- Synthesize chapters ---
tmp_dir = Path(tempfile.mkdtemp(prefix="colab_tts_"))
wav_paths = []
total_chars = sum(len(ch.text) for ch in chapters)
processed_chars = 0
start_time = time.time()

print(f"\nSynthesizing {len(chapters)} chapters ({total_chars:,} characters)...")
print()

for i, ch in enumerate(chapters):
    ch_path = tmp_dir / f"chapter_{i:02d}.wav"

    synthesize_chapter(
        pipeline=pipeline,
        text=ch.text,
        voice=voice,
        speed=speed,
        output_path=ch_path,
    )
    wav_paths.append(ch_path)

    processed_chars += len(ch.text)
    elapsed = time.time() - start_time
    rate = processed_chars / elapsed if elapsed > 0 else 0
    remaining = (total_chars - processed_chars) / rate if rate > 0 else 0
    pct = processed_chars / total_chars * 100 if total_chars else 100.0

    title_display = (ch.title[:35] + "...") if len(ch.title) > 38 else ch.title
    print(
        f"  [{i+1:>{len(str(len(chapters)))}}/{len(chapters)}] "
        f"{title_display:<38}  {pct:5.1f}%  ~{remaining/60:.0f}m left"
    )

synth_elapsed = time.time() - start_time
print(f"\nSynthesis complete in {synth_elapsed/60:.1f} minutes")

# --- Assemble M4B ---
print("\nAssembling M4B with chapter markers...")

cover_path = None
if metadata.cover_image:
    cover_path = tmp_dir / "cover.jpg"
    cover_path.write_bytes(metadata.cover_image)

m4b_filename = f"{epub_path.stem}.m4b"
m4b_path = OUTPUT_DIR / m4b_filename

chapter_titles = [ch.title for ch in chapters]
create_m4b(
    chapter_wav_files=wav_paths,
    chapter_titles=chapter_titles,
    output_path=m4b_path,
    cover_image=cover_path,
)

# --- Cleanup ---
cleanup_chapter_wavs(wav_paths)
if cover_path and cover_path.exists():
    cover_path.unlink()
try:
    tmp_dir.rmdir()
except OSError:
    pass

size_mb = m4b_path.stat().st_size / 1e6
total_elapsed = time.time() - start_time

print(f"\nDone!")
print(f"  Output: {m4b_path.name} ({size_mb:.1f} MB)")
print(f"  Total time: {total_elapsed/60:.1f} minutes")
print(f"  Avg speed: {total_chars / synth_elapsed:.0f} chars/sec")
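
The chapter titles passed to `create_m4b` have to end up as timestamped markers in the output file. `create_m4b` is not shown in this notebook, so the following is only a sketch of one common approach: accumulate per-chapter durations into start and end times and write them in ffmpeg's FFMETADATA text format. `build_ffmetadata`, `escape_value`, and the example durations are hypothetical.

```python
def escape_value(value: str) -> str:
    """Backslash-escape the characters FFMETADATA treats as special."""
    for ch in ("\\", "=", ";", "#", "\n"):  # backslash first, so it isn't doubled later
        value = value.replace(ch, "\\" + ch)
    return value


def build_ffmetadata(titles: list[str], durations_sec: list[float]) -> str:
    """Return an FFMETADATA1 document with one [CHAPTER] block per chapter."""
    if len(titles) != len(durations_sec):
        raise ValueError("titles and durations_sec must have the same length")
    lines = [";FFMETADATA1"]
    start_ms = 0
    for title, duration in zip(titles, durations_sec):
        end_ms = start_ms + round(duration * 1000)
        lines += [
            "[CHAPTER]",
            "TIMEBASE=1/1000",  # START/END below are in milliseconds
            f"START={start_ms}",
            f"END={end_ms}",
            f"title={escape_value(title)}",
        ]
        start_ms = end_ms  # the next chapter begins where this one ends
    return "\n".join(lines) + "\n"


print(build_ffmetadata(["Prologue", "Chapter 1"], [12.5, 90.0]))
```

A file like this can be given to ffmpeg as an extra input and applied with `-map_metadata`; whether the pipeline does exactly that is not confirmed here.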

## 6. Download your audiobook

Run the cell below to download the `.m4b` file through your browser.

In [None]:
#@title Download M4B { display-mode: "form" }

from google.colab import files

print(f"Downloading {m4b_path.name} ({m4b_path.stat().st_size / 1e6:.1f} MB)...")
files.download(str(m4b_path))

---

## (Optional) Multi-Voice Conversion

Assign different voices to different chapters. Edit the JSON config below,
then run the multi-voice conversion cell.

### Voice config format

```json
{
  "default_voice": "af_sky",
  "default_speed": 1.0,
  "chapters": {
    "default": {"voice": "af_sky", "speed": 1.0},
    "0":       {"voice": "am_adam", "speed": 0.9},
    "1-3":     {"voice": "af_heart"},
    "7":       {"voice": "bf_emma", "speed": 1.1}
  }
}
```

**Resolution order:** exact match (`"5"`) → range match (`"1-3"`) → `"default"` entry → top-level `default_voice`.
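
The resolution order can be sketched as a small lookup function. This is not the code in `src.multi_voice`, which is not shown here; `find_entry` and `resolve_voice` are hypothetical names. Two behaviours are assumptions: ranges are inclusive at both ends, and fields missing from a matched entry (such as `speed` in `"1-3"`) are filled from the `"default"` entry and then from the top-level defaults.

```python
def find_entry(chapters: dict, index: int) -> dict:
    """Return the chapter entry for index: exact key, then range key, then 'default'."""
    key = str(index)
    if key in chapters:
        return chapters[key]
    for name, entry in chapters.items():
        start, sep, end = name.partition("-")
        if sep and start.isdigit() and end.isdigit():
            if int(start) <= index <= int(end):  # inclusive range (assumed)
                return entry
    return chapters.get("default", {})


def resolve_voice(config: dict, index: int) -> dict:
    """Merge settings from least to most specific for one chapter."""
    chapters = config.get("chapters", {})
    top_level = {
        "voice": config.get("default_voice"),
        "speed": config.get("default_speed", 1.0),
    }
    return {**top_level, **chapters.get("default", {}), **find_entry(chapters, index)}


config = {
    "default_voice": "af_sky",
    "default_speed": 1.0,
    "chapters": {
        "default": {"voice": "af_sky", "speed": 1.0},
        "0": {"voice": "am_adam", "speed": 0.9},
        "1-3": {"voice": "af_heart"},
        "7": {"voice": "bf_emma", "speed": 1.1},
    },
}

print(resolve_voice(config, 0))  # {'voice': 'am_adam', 'speed': 0.9}
print(resolve_voice(config, 2))  # {'voice': 'af_heart', 'speed': 1.0}
print(resolve_voice(config, 5))  # {'voice': 'af_sky', 'speed': 1.0}
```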

In [None]:
#@title Edit multi-voice config { display-mode: "form" }
#@markdown Edit the JSON string below, then run this cell to save it.

import json
from pathlib import Path

voice_config_json = '{"default_voice": "af_sky", "default_speed": 1.0, "chapters": {"default": {"voice": "af_sky", "speed": 1.0}, "0": {"voice": "am_adam", "speed": 0.9}}}' #@param {type:"string"}

# Parse and pretty-print to verify
voice_config = json.loads(voice_config_json)
print("Voice config:")
print(json.dumps(voice_config, indent=2))

# Save to file
config_path = Path("/content/voice_config.json")
config_path.write_text(json.dumps(voice_config, indent=2))
print(f"\nSaved to {config_path}")

In [None]:
#@title Run multi-voice conversion { display-mode: "form" }
#@markdown Uses the voice config saved above. Run steps 1 to 3 first: this cell
#@markdown needs the uploaded EPUB (step 2) and the `preprocess_text` setting (step 3).

from pathlib import Path
from src.multi_voice import convert_with_multi_voice

config_path = Path("/content/voice_config.json")
if not config_path.exists():
    raise RuntimeError("Run the 'Edit multi-voice config' cell first.")

OUTPUT_DIR = Path("/content/output")
OUTPUT_DIR.mkdir(exist_ok=True)

import torch
device = "cuda" if torch.cuda.is_available() else "cpu"

print(f"Running multi-voice conversion on {device}...")
print(f"  EPUB: {epub_path.name}")
print(f"  Config: {config_path}")
print()

m4b_path = convert_with_multi_voice(
    epub_path=epub_path,
    config_path=config_path,
    output_dir=OUTPUT_DIR,
    use_preprocessing=preprocess_text,
    device=device,
)

size_mb = m4b_path.stat().st_size / 1e6
print(f"\nDone! {m4b_path.name} ({size_mb:.1f} MB)")
print("\nRun the Download cell above to save the file.")