# Free EPUB to Audiobook Converter using KOKORO TTS

## ⚡ IMPORTANT: Enable GPU Runtime

**Before running this notebook:**
1. Go to **Runtime → Change runtime type → Hardware accelerator → GPU**
2. Click **Save**

### Performance Benefits:
- **With GPU:** 500 pages ≈ 16-18 minutes
- **Without GPU:** 500 pages ≈ 90+ minutes (at least 5x slower)
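
To confirm that the runtime change took effect, a check along these lines can be run in a code cell. This is a minimal sketch: `describe_accelerator` is a hypothetical helper name, and it assumes PyTorch is the library that decides whether the GPU is used.

```python
import importlib.util

def describe_accelerator():
    """Report whether PyTorch can see a CUDA GPU (hypothetical helper)."""
    # torch may be missing if the install cell has not been run yet
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    try:
        import torch
    except ImportError:
        return "torch not installed"
    return "GPU available" if torch.cuda.is_available() else "CPU only"

print(describe_accelerator())
```

If this prints `CPU only`, the notebook will still run, just at the slower speed listed above.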

### What This Notebook Does:
1. **Extract text** from EPUB files
2. **Convert text to speech** using KOKORO TTS (one or more audio segments per text chunk)
3. **Combine WAV files** into a single audiobook file

In [None]:
# Install required libraries for EPUB processing
!pip install ebooklib beautifulsoup4

In [None]:
from ebooklib import epub, ITEM_DOCUMENT
from bs4 import BeautifulSoup

def epub_to_text(epub_path):
    """
    Extracts plain text from an EPUB file.
    
    Args:
        epub_path (str): Path to the EPUB file
        
    Returns:
        str: Complete text content of the EPUB
    """
    # Read the EPUB file
    book = epub.read_epub(epub_path)
    
    texts = []
    
    # Iterate through all items in the EPUB
    for item in book.get_items():
        # Filter for document items (actual content)
        if item.get_type() == ITEM_DOCUMENT:
            # Parse HTML content using BeautifulSoup
            soup = BeautifulSoup(item.get_content(), "html.parser")
            
            # Extract clean text, removing HTML tags
            text = soup.get_text(separator=" ", strip=True)
            
            # Add non-empty text to our collection
            if text:
                texts.append(text)
    
    # Join all text chunks with double newlines for paragraph separation
    return "\n\n".join(texts)

# Path to your EPUB file - UPDATE THIS with your actual file path
epub_path = "/content/Higher Vibrations for Health, Happine... (Z-Library).epub"

# Extract text from EPUB
full_text = epub_to_text(epub_path)

# Display basic statistics
print(f"Total characters: {len(full_text)}")
print(full_text[:2000])  # Show first 2000 characters as preview

## Text Chunking

The text is split into smaller chunks to:
- Stay within TTS model input limits
- Allow progress tracking
- Enable parallel processing if needed
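
Splitting only on `". "` misses sentences that end with `?` or `!`. The sketch below shows one possible alternative that splits after any of the three marks using a regular expression. The function name `chunk_by_sentences` is hypothetical, and the rule of keeping an oversized sentence whole is an assumption, not something the TTS model requires.

```python
import re

def chunk_by_sentences(text, max_chars=1000):
    """Group whole sentences into chunks of at most max_chars characters."""
    # Split after ., ! or ? when followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        # A single sentence longer than max_chars is kept whole
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks

demo = chunk_by_sentences("Is it late? Yes! Then we should go home now.", max_chars=20)
print(demo)  # ['Is it late? Yes!', 'Then we should go home now.']
```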

In [None]:
def chunk_text(text, max_chars=1000):
    """
    Splits text into manageable chunks for TTS processing.
    
    Args:
        text (str): Full text to chunk
        max_chars (int): Maximum characters per chunk
        
    Returns:
        list: List of text chunks
    """
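    chunks = []
    current = ""
    
    # Split by sentences to maintain natural breaks
    for sentence in text.split(". "):
        sentence = sentence.strip()
        if not sentence:
            continue
        # Restore the period removed by split(), unless punctuation is already there
        if sentence[-1] not in ".!?":
            sentence += "."
        # Check if adding this sentence (plus a space) would exceed the limit
        if len(current) + len(sentence) + 1 <= max_chars:
            current = f"{current} {sentence}".strip()
        else:
            # Save the current chunk (never an empty one) and start a new one.
            # A single sentence longer than max_chars becomes its own chunk.
            if current:
                chunks.append(current)
            current = sentence
    
    # Add the final chunk if it's not empty
    if current:
        chunks.append(current)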
    
    return chunks

# Split the full text into chunks
chunks = chunk_text(full_text)
print(f"Total chunks: {len(chunks)}")

## Text-to-Speech Conversion

This section uses **KOKORO TTS** to convert text chunks into audio.

### What Happens:
- Each text chunk is processed individually
- KOKORO yields one or more audio segments per chunk
- The segments are kept in memory and combined in the next section
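
The collection loop can be tried out without loading the model. The sketch below uses a stand-in synthesizer that yields `(graphemes, phonemes, audio)` tuples, the same shape the notebook unpacks from the pipeline. Both `collect_segments` and `fake_synthesize` are hypothetical names, and the "audio" here is just a list of zeros.

```python
def collect_segments(chunks, synthesize):
    """Run `synthesize` on each non-empty chunk and gather the audio segments."""
    segments = []
    for idx, chunk in enumerate(chunks, start=1):
        if not chunk.strip():
            continue  # nothing to speak in a blank chunk
        for _, _, audio in synthesize(chunk):
            segments.append(audio)
        print(f"chunk {idx}/{len(chunks)}: {len(segments)} segments so far")
    return segments

def fake_synthesize(chunk):
    # Stand-in for the TTS call: one silent "segment" per line of text
    for line in chunk.splitlines():
        yield line, "", [0.0] * len(line)

segments = collect_segments(["Hello there.\nSecond line.", "   ", "Bye."], fake_synthesize)
print(len(segments))  # 3
```

With the real pipeline, the second argument would be something like `lambda chunk: pipeline(chunk, voice="af_heart")`.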

In [None]:
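# Install espeak-ng (system package used by KOKORO for phonemization fallback)
!apt-get -qq -y install espeak-ng > /dev/null 2>&1
# Install KOKORO TTS and audio processing libraries
!pip install kokoro torch soundfile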

In [None]:
from kokoro import KPipeline
import numpy as np
import soundfile as sf
import torch

# Initialize the KOKORO TTS pipeline
# lang_code="a" = American English
# Other options: 'b' (British English), 'e' (Spanish), 'f' (French), 'i' (Italian), 'j' (Japanese), etc.
pipeline = KPipeline(lang_code="a")

# Store all audio segments
all_audio = []

print("\n" + "="*60)
print("STARTING TEXT-TO-SPEECH CONVERSION")
print("Audio segments are generated chunk by chunk and kept in memory")
print("="*60 + "\n")

In [None]:
# Process each text chunk and convert to speech
for idx, chunk in enumerate(chunks):
    print(f"Processing chunk {idx+1}/{len(chunks)}")
    
    # Generate audio using KOKORO TTS
    # voice="af_heart" uses a female American English voice
    generator = pipeline(chunk, voice="af_heart")
    
    # Process the generated audio segments
    for _, _, audio in generator:
        all_audio.append(audio)

print("\n" + "="*60)
print("COMBINING ALL AUDIO CHUNKS INTO SINGLE FILE")
print("="*60 + "\n")

## Combine Audio Files

All individual audio chunks are now combined into a single audiobook file.
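
Before writing the file, it can be useful to know how long the finished audiobook will be. The length follows directly from the number of samples and the 24000 Hz sample rate used in this notebook. A small sketch, with `format_duration` as a hypothetical helper:

```python
SAMPLE_RATE = 24000  # samples per second, the rate used when saving the WAV

def format_duration(num_samples, sample_rate=SAMPLE_RATE):
    """Convert a sample count into an H:MM:SS string."""
    total_seconds = num_samples // sample_rate
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"

print(format_duration(24000 * 3725))  # 1:02:05
```

In the notebook this could be called as `format_duration(len(final_audio))` once the segments have been concatenated.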

In [None]:
# Concatenate all audio segments into one waveform
final_audio = np.concatenate(all_audio)

# Save the complete audiobook as WAV file
sf.write("audiobook.wav", final_audio, 24000)
print("✓ Complete audiobook saved as 'audiobook.wav'")

# Convert WAV to M4B format (audiobook format compatible with most players)
print("\nConverting to M4B format for better compatibility...")
# -y overwrites an existing output file instead of prompting when the cell is re-run
!ffmpeg -y -i audiobook.wav -c:a aac -b:a 48k audiobook.m4b
print("✓ Audiobook saved as 'audiobook.m4b'")

## ✅ Processing Complete!

Your audiobook files are ready:
- **audiobook.wav** - High quality WAV format
- **audiobook.m4b** - Compressed M4B format for easy sharing

### Download Instructions:
1. Look at the file browser on the left side of Colab
2. Find `audiobook.wav` and `audiobook.m4b`
3. Right-click and select "Download"
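
The download can also be started from code instead of the file browser. The sketch below uses Colab's `google.colab.files.download`; the wrapper `download_if_in_colab` is a hypothetical helper that simply does nothing when the notebook is run outside Colab.

```python
def download_if_in_colab(path):
    """Start a browser download of `path` when running inside Colab."""
    try:
        from google.colab import files  # only importable inside Colab
    except ImportError:
        print(f"Not running in Colab; '{path}' was not downloaded")
        return False
    files.download(path)
    return True

started = download_if_in_colab("audiobook.m4b")
```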

### Processing Summary:
- **Input:** EPUB file with ~500 pages
- **Output:** Complete audiobook in multiple formats
- **Processing time:** ~16-18 minutes with GPU

## Alternative Method: Individual WAV Files

This section shows how to create and combine individual WAV files for each segment.
Useful if you want to:
- Review individual segments
- Edit specific parts
- Process segments separately
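
One detail matters once there are ten or more segments: names such as `segment_10.wav` sort before `segment_2.wav` when ordered as plain strings. Zero-padding the index is one way to keep alphabetical order and playback order identical. A sketch, with `segment_filename` and the padding width of 4 as assumptions:

```python
def segment_filename(index, prefix="segment_", width=4):
    """Build a zero-padded file name, e.g. segment_0007.wav."""
    return f"{prefix}{index:0{width}d}.wav"

padded = sorted(segment_filename(i) for i in (0, 2, 10))
plain = sorted(f"segment_{i}.wav" for i in (0, 2, 10))

print(padded)  # ['segment_0000.wav', 'segment_0002.wav', 'segment_0010.wav']
print(plain)   # ['segment_0.wav', 'segment_10.wav', 'segment_2.wav']
```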

In [None]:
# Install espeak-ng (system package) and the Python libraries
!apt-get -qq -y install espeak-ng > /dev/null 2>&1
!pip install kokoro torch soundfile

In [None]:
from kokoro import KPipeline
from IPython.display import display, Audio
import soundfile as sf
import torch

# Initialize the Kokoro TTS pipeline
# 'a' = American English. You can switch this later to 'b' (British), 'f' (French), etc.
pipeline = KPipeline(lang_code='a')

# Sample text for demonstration
text = '''This is the first short test paragraph. It introduces the idea clearly and simply.

This is the second paragraph. It adds a small amount of continuation without extra detail.

This is the third paragraph. It closes the example in a clean, minimal way.
'''

print("Processing sample text and creating individual WAV files...")

In [None]:
# Generate audio and save each segment as a separate WAV file
generator = pipeline(text, voice='af_heart')

for i, (gs, ps, audio) in enumerate(generator):
    print(f"Creating segment {i}.wav")
    print("Graphemes:", gs)
    print("Phonemes:", ps)
    
    # Display audio player in notebook
    display(Audio(data=audio, rate=24000, autoplay=False))
    
    # Save individual WAV file
    sf.write(f'segment_{i}.wav', audio, 24000)

print("\n✓ Individual WAV files created")

## Combine WAV Files Function

This function combines multiple WAV files into a single audiobook file.
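
For very long books, loading every segment into memory before concatenating can become expensive. As an alternative, the sketch below appends files one at a time using only Python's standard `wave` module. It assumes the segments are integer PCM WAV files that all share the same format; `combine_wav_streaming` and `write_silence` are hypothetical names used for the demonstration.

```python
import os
import tempfile
import wave

def combine_wav_streaming(paths, output_path):
    """Append the frames of each WAV in `paths` to `output_path`, one file at a time."""
    if not paths:
        raise ValueError("no input files given")
    total_frames = 0
    first_fmt = None
    with wave.open(output_path, "wb") as out:
        for path in paths:
            with wave.open(path, "rb") as src:
                fmt = tuple(src.getparams()[:3])  # (channels, sample width, frame rate)
                if first_fmt is None:
                    first_fmt = fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != first_fmt:
                    raise ValueError(f"{path} has a different audio format")
                out.writeframes(src.readframes(src.getnframes()))
                total_frames += src.getnframes()
    return total_frames

def write_silence(path, num_frames, rate=24000):
    # Demo helper: 16-bit mono silence
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * num_frames)

with tempfile.TemporaryDirectory() as tmp:
    parts = [os.path.join(tmp, f"part_{i}.wav") for i in range(2)]
    write_silence(parts[0], 100)
    write_silence(parts[1], 50)
    combined = os.path.join(tmp, "combined.wav")
    total = combine_wav_streaming(parts, combined)
    with wave.open(combined, "rb") as check:
        combined_frames = check.getnframes()

print(total, combined_frames)  # 150 150
```

Only one segment is held in memory at a time, at the cost of requiring every input file to have the same channel count, sample width, and sample rate.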

In [None]:
# Function to combine multiple WAV files into a single file
def combine_wav_files(file_prefix, output_filename="combined_audiobook.wav"):
    """
    Combines multiple WAV files with a common prefix into a single WAV file.
    
    Args:
        file_prefix (str): Prefix of WAV files to combine (e.g., 'segment_' for segment_0.wav, segment_1.wav)
        output_filename (str): Name of the output combined WAV file
    """
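    import glob
    import re
    import numpy as np
    import soundfile as sf
    
    # Sort by the numeric index in the file name so that segment_10.wav comes
    # after segment_2.wav (a plain sorted() would order the names as strings)
    def numeric_index(path):
        match = re.search(r"(\d+)\.wav$", path)
        return int(match.group(1)) if match else -1
    
    # Find all WAV files matching the prefix
    wav_files = sorted(glob.glob(f"{file_prefix}*.wav"), key=numeric_index)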
    
    if not wav_files:
        print(f"No WAV files found with prefix '{file_prefix}'")
        return
    
    print(f"\nFound {len(wav_files)} WAV files to combine:")
    for file in wav_files:
        print(f"  - {file}")
    
    # Read and combine all audio files
    combined_audio = []
    sample_rate = None
    
    for wav_file in wav_files:
        audio, sr = sf.read(wav_file)
        if sample_rate is None:
            sample_rate = sr
        combined_audio.append(audio)
        print(f"✓ Loaded {wav_file} ({len(audio)} samples)")
    
    # Concatenate all audio arrays
    final_audio = np.concatenate(combined_audio)
    
    # Save the combined audio
    sf.write(output_filename, final_audio, sample_rate)
    print(f"\n✓ Combined audiobook saved as '{output_filename}'")
    print(f"  Total samples: {len(final_audio)}")
    print(f"  Duration: {len(final_audio)/sample_rate:.2f} seconds")
    
    return output_filename

In [None]:
# Combine the individual segment files
print("\n" + "="*60)
print("COMBINING INDIVIDUAL WAV FILES")
print("="*60)

combined_file = combine_wav_files("segment_", "final_audiobook.wav")

print("\n" + "="*60)
print("ALL TASKS COMPLETED SUCCESSFULLY!")
print("="*60)
print("\nYour audiobook files are ready:")
print("  - audiobook.wav (direct method)")
print("  - audiobook.m4b (M4B format)")
print(f"  - {combined_file} (combined from segments)")
print("\nYou can download these files from the Colab file browser on the left.")