LucaCappelletti94/talking-snake

Talking Snake

Converts PDFs and web pages to speech using Qwen3-TTS. Upload a document or provide a URL and have it read aloud in one of 9 natural voices across English, Chinese, Japanese, and Korean. Audio streams progressively while generation continues.

Deploy Your Own

Deploy on Hugging Face Spaces

Click the button above to deploy your own GPU-powered instance. You'll be prompted to create a Hugging Face account and select hardware (L4 or A100 recommended for speed, ~$0.80-$4/hr).

Run Locally

Requires Python 3.11+, an NVIDIA GPU (~6GB VRAM), and SoX (apt install sox libsox-dev). GPU memory is freed automatically once the app has been idle for 5+ minutes. The app can also run on CPU without a GPU, but much more slowly.

uv sync && uv run --no-sync talking-snake --port 8888  # Open http://localhost:8888
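Before the first run, it can help to confirm the requirements listed above. The following is a minimal sketch, not part of talking-snake: the function name check_prerequisites and its injectable arguments are hypothetical, and it assumes that the absence of nvidia-smi on PATH means no NVIDIA GPU is available.

```python
import shutil
import sys


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all good.

    Hypothetical helper: `version_info` and `which` are injectable only so the
    check can be exercised without changing the real environment.
    """
    problems = []

    # The README asks for Python 3.11 or newer.
    major, minor = version_info[0], version_info[1]
    if (major, minor) < (3, 11):
        problems.append(f"Python 3.11+ required, found {major}.{minor}")

    # SoX must be installed (apt install sox libsox-dev).
    if which("sox") is None:
        problems.append("SoX not found on PATH (apt install sox libsox-dev)")

    # Assumption: no nvidia-smi on PATH means no usable NVIDIA GPU.
    if which("nvidia-smi") is None:
        problems.append("nvidia-smi not found: without a GPU, expect much slower CPU runs")

    return problems
```

Calling check_prerequisites() with no arguments inspects the current interpreter and PATH; printing each returned string gives a quick checklist before running the command above.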

Flash Attention (Optional, ~2x faster)

The Flash Attention wheel must match the CUDA version supported by your driver. Check yours with nvidia-smi (the top right of the output shows "CUDA Version").
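If you prefer to read the version programmatically, here is a minimal sketch. It assumes the nvidia-smi header contains the text "CUDA Version: X.Y", as it does in current driver releases; the helper names parse_cuda_version and detect_cuda_version are hypothetical and not part of talking-snake.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_cuda_version(smi_output: str) -> Optional[str]:
    """Extract the 'CUDA Version' field (e.g. '13.0') from nvidia-smi output."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", smi_output)
    return match.group(1) if match else None


def detect_cuda_version() -> Optional[str]:
    """Run nvidia-smi if it is on PATH; return None when it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=False
    )
    return parse_cuda_version(result.stdout)
```

detect_cuda_version() returns a string such as "13.0" on a machine with a working driver, and None on a machine without nvidia-smi.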

  1. Find a prebuilt wheel at flashattn.dev matching your:

    • CUDA version (e.g., cu130 for CUDA 13.0)
    • PyTorch version (e.g., torch2.10)
    • Python version (e.g., cp312 for Python 3.12)
  2. Install matching torch, torchaudio, and flash-attn:

    # Example for CUDA 13.0 + PyTorch 2.10 + Python 3.12
    uv pip install torch==2.10.0+cu130 torchaudio==2.10.0+cu130 --index-url https://download.pytorch.org/whl/cu130
    uv pip install <flash-attn-wheel-url>
  3. Run with --no-sync to prevent uv from removing the manually installed packages:

    uv run --no-sync talking-snake --port 8888
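The three tags from step 1 can be derived mechanically from your versions. This is a minimal sketch that follows the tag examples given above (cu130, torch2.10, cp312); the function wheel_tags is hypothetical, and the exact wheel naming on flashattn.dev is an assumption that should be checked against the wheel you actually pick.

```python
from typing import Dict, Tuple


def wheel_tags(
    cuda_version: str, torch_version: str, python_version: Tuple[int, int]
) -> Dict[str, str]:
    """Build the CUDA, PyTorch, and Python tags to look for in a wheel name.

    cuda_version   e.g. "13.0"          (as reported by nvidia-smi)
    torch_version  e.g. "2.10.0+cu130"  (any "+local" suffix is ignored)
    python_version e.g. (3, 12)
    """
    cuda_major, cuda_minor = cuda_version.split(".")[:2]
    # Drop a local version suffix such as "+cu130" before splitting.
    torch_major, torch_minor = torch_version.split("+")[0].split(".")[:2]
    return {
        "cuda": f"cu{cuda_major}{cuda_minor}",
        "torch": f"torch{torch_major}.{torch_minor}",
        "python": f"cp{python_version[0]}{python_version[1]}",
    }
```

For example, wheel_tags("13.0", "2.10.0+cu130", (3, 12)) yields the tags cu130, torch2.10, and cp312, which match the example install commands in step 2.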

▶️ Listen to a sample

The website looks like this:

[Screenshot: upload interface]

[Screenshot: audio playback with progress]

License

This project is licensed under the MIT License. Dependencies and third-party components (e.g., Qwen3-TTS, SoX) are subject to their own licenses.
