VoiceEditor

A web-based voice extraction and editing tool. Collect audio from YouTube or system output, generate word-level transcripts with STT, edit with a synchronized waveform + text UI, cut and rearrange segments, and remove background music with AI — all in one place.

Features

YouTube Audio Extraction — Download audio from any YouTube URL (yt-dlp)
System Audio Recording — Record system output via BlackHole (macOS)
File Upload — Drag-and-drop local audio files
Speech-to-Text — Word-level timestamps via faster-whisper
Waveform + Text Sync — wavesurfer.js waveform with per-word highlighting and click-to-seek
Inline Text Editing — Edit transcript text per segment
Cut & Rearrange — Select regions on the waveform, cut into segments, drag to reorder
Background Removal — Vocal/music separation with Demucs AI, switch between stems
Export — Download as WAV/MP3 audio or TXT/SRT transcript
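
The Speech-to-Text and Export features together imply turning word-level timestamps into subtitle cues. The following is a minimal sketch of that step, not the project's actual code: the `Word` type, the function names, and the choice to group a fixed number of words per cue are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical word record: start/end in seconds plus the word text."""
    start: float
    end: float
    text: str


def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[Word], max_words: int = 8) -> str:
    """Group consecutive words into numbered SRT cues of at most max_words."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        index = i // max_words + 1
        start = format_srt_time(chunk[0].start)
        end = format_srt_time(chunk[-1].end)
        text = " ".join(w.text.strip() for w in chunk)
        cues.append(f"{index}\n{start} --> {end}\n{text}\n")
    # Cues are separated by a blank line, as the SRT format expects.
    return "\n".join(cues)
```

A real exporter would more likely split cues at the transcript's segment boundaries than at a fixed word count; the fixed count keeps the sketch short.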

Tech Stack

Layer	Technology
Backend	Python + FastAPI + SQLite (SQLAlchemy async)
Frontend	React + TypeScript + Vite + TailwindCSS v4
State	Zustand
Audio	yt-dlp, sounddevice, ffmpeg
STT	faster-whisper (word_timestamps)
Separation	Demucs (htdemucs) + torchcodec
Waveform	wavesurfer.js + RegionsPlugin
Drag & Drop	@dnd-kit/core + @dnd-kit/sortable

Prerequisites

Python 3.11–3.13 (recommended) or Python 3.14+ (requires a separate Python 3.11–3.13 interpreter for Demucs)
Node.js 18+
ffmpeg (brew install ffmpeg / sudo apt install ffmpeg)
BlackHole (macOS only, for system audio recording)

Python version note: With Python 3.11–3.13, all dependencies (including Demucs) are installed in a single venv. Python 3.14+ is incompatible with Demucs, so the setup script automatically creates a separate venv.
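
The version rule in the note above can be expressed as a small decision function. This is an illustrative sketch only: the real logic lives in `scripts/setup.sh`, the function names here are hypothetical, and treating versions below 3.11 as unsupported is an inference from the Prerequisites list rather than something the script is known to enforce.

```python
import sys


def needs_separate_demucs_venv(version: tuple[int, int]) -> bool:
    """Return True when Demucs needs its own venv (Python 3.14+)."""
    return version >= (3, 14)


def describe_venv_plan(version: tuple[int, int]) -> str:
    """Describe which venv layout applies to a (major, minor) version."""
    if version < (3, 11):
        # Assumption: anything older than the documented range is rejected.
        return "unsupported: install Python 3.11 or newer"
    if needs_separate_demucs_venv(version):
        return "dual venv: .venv for the backend, .venv-demucs for Demucs"
    return "single venv: .venv holds the backend and Demucs"


if __name__ == "__main__":
    print(describe_venv_plan(sys.version_info[:2]))
```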

Quick Start

Automated Setup

git clone https://github.com/chadingTV/voiceeditor.git
cd voiceeditor
./scripts/setup.sh

The setup script automatically:

Checks prerequisites (python3, node, npm, ffmpeg)
Detects Python version → single venv or separate Demucs venv
Creates backend Python venv and installs dependencies
(Python 3.14+ only) Creates Demucs venv with compatible Python
Installs SwitchAudioSource (macOS, for system audio recording)
Installs frontend npm packages

Run

# Start both backend and frontend
./scripts/dev.sh

Open http://localhost:5173 in your browser.

Manual Setup

Backend

cd backend
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# If Python 3.11-3.13, also install demucs:
pip install -r requirements-demucs.txt

Demucs Separate Env (Python 3.14+ only)

cd backend
python3.12 -m venv .venv-demucs  # or python3.11, python3.13
source .venv-demucs/bin/activate
pip install -r requirements-demucs.txt
deactivate

Frontend

cd frontend
npm install

Run Individually

# Backend
cd backend && source .venv/bin/activate && uvicorn main:app --reload --port 8000

# Frontend
cd frontend && npm run dev

Project Structure

voiceeditor/
├── backend/
│   ├── main.py                  # FastAPI app
│   ├── config.py                # Configuration
│   ├── requirements.txt         # Main backend dependencies
│   ├── requirements-demucs.txt  # Demucs-specific dependencies
│   ├── routers/                 # API routers
│   │   ├── projects.py          # Project CRUD
│   │   ├── audio.py             # Audio import (YouTube/upload/recording)
│   │   ├── transcription.py     # STT + text editing + TXT/SRT download
│   │   ├── separation.py        # Background removal (Demucs subprocess)
│   │   └── editor.py            # Segment editing & export
│   ├── services/                # Business logic
│   ├── models/                  # DB models & schemas
│   └── tasks/                   # Background task manager
├── frontend/
│   └── src/
│       ├── api/                 # API client modules
│       ├── stores/              # Zustand stores
│       ├── components/
│       │   ├── layout/          # AppShell, Header, Sidebar
│       │   ├── import/          # YouTube, upload, recording UI
│       │   └── editor/          # Waveform editor, transcript panel, segment timeline
│       ├── hooks/               # Custom hooks
│       └── types/               # TypeScript types
└── scripts/
    ├── setup.sh                 # Automated setup script
    └── dev.sh                   # Dev server launcher

Usage

Create a project — Click "New Project" in the sidebar
Import audio — Paste a YouTube URL, upload a file, or record system audio
Generate transcript — Click "Generate STT" in the editor
Review & edit text — Click the pencil icon on any segment to edit inline
Cut segments — Drag-select a region on the waveform → "Cut Selection"
Reorder — Drag segments in the timeline to rearrange
Remove background — Click "Remove Background" → select Vocals/No Vocals stem
Export — Download audio (WAV/MP3) or transcript (TXT/SRT)
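
The cut, reorder, and export steps above amount to trimming regions out of one audio file and joining them in the order shown in the timeline. Below is a hedged sketch of how such an export could be assembled as a single ffmpeg invocation using the `atrim`, `asetpts`, and `concat` filters. The `Segment` type and `build_export_command` are hypothetical names, and the project's own export code may work differently.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical segment: a region of the source audio, in seconds."""
    start: float
    end: float


def build_export_command(source: str, segments: list[Segment], output: str) -> list[str]:
    """Build an ffmpeg command that trims each segment and joins them in list order."""
    if not segments:
        raise ValueError("at least one segment is required")
    parts = []
    labels = []
    for i, seg in enumerate(segments):
        if seg.end <= seg.start:
            raise ValueError(f"segment {i} has non-positive duration")
        # Trim one region and reset its timestamps so concat sees it starting at 0.
        parts.append(
            f"[0:a]atrim=start={seg.start}:end={seg.end},asetpts=PTS-STARTPTS[s{i}]"
        )
        labels.append(f"[s{i}]")
    # Join the trimmed pieces in the order given (audio only).
    parts.append(f"{''.join(labels)}concat=n={len(segments)}:v=0:a=1[out]")
    return [
        "ffmpeg", "-y", "-i", source,
        "-filter_complex", ";".join(parts),
        "-map", "[out]", output,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Because the list order drives the filter graph, reordering segments in the UI only requires passing them in the new order.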

Architecture Notes

Demucs Execution

Demucs always runs as a subprocess. The Python executable is auto-detected based on the environment:

System Python	Demucs Strategy
3.11–3.13	Runs directly from the main venv (single venv)
3.14+	Runs from `.venv-demucs` with compatible Python (dual venv)

Backend (separation.py)
    │
    ├── _find_demucs_python()  ← auto-detect
    │       │
    │       ├── .venv-demucs exists? → .venv-demucs/bin/python3
    │       └── otherwise → try current python's demucs
    │
    └── subprocess.run([python, "-m", "demucs", ...])
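
The detection flow in the diagram can be sketched roughly as follows. This is an assumption-laden illustration, not the contents of `separation.py`: the function signatures are hypothetical, the Windows `Scripts/python.exe` path is a guess at what "cross-platform" detection might check, and the Demucs arguments are left to the caller because the source elides them.

```python
import sys
from pathlib import Path


def find_demucs_python(backend_dir: Path) -> str:
    """Prefer the interpreter inside .venv-demucs; otherwise use the current one."""
    venv = backend_dir / ".venv-demucs"
    candidates = [
        venv / "bin" / "python3",         # POSIX layout, as in the diagram
        venv / "Scripts" / "python.exe",  # assumed Windows layout
    ]
    for candidate in candidates:
        if candidate.exists():
            return str(candidate)
    # No dedicated venv: fall back to the interpreter running the backend.
    return sys.executable


def build_demucs_command(python: str, audio_path: str, extra_args: list[str]) -> list[str]:
    """Assemble the `python -m demucs` invocation; extra_args are caller-supplied."""
    return [python, "-m", "demucs", *extra_args, audio_path]
```

Running Demucs through `python -m demucs` in a chosen interpreter means the backend process never has to import Demucs or PyTorch itself, which is what lets the two venvs use different Python versions.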

Changelog

Bug Fixes

Stem selector reset — Switching between Original/Vocals/No Vocals no longer resets to Original
Export wrong audio — Export now correctly uses the current audio file, not the first one in the project
Export ignoring reorder — Exported audio now respects the drag-and-drop segment order
Export ignoring active stem — Exporting in Vocals mode now exports vocals only, not the original
pydub crash on Python 3.14 — Replaced pydub (broken audioop module) with direct ffmpeg subprocess
Demucs torchcodec missing — Added torchcodec to demucs dependencies for audio saving
System recording silence — Auto-switch to multi-output device when recording starts
Audio output stuck on multi-output — Fallback to built-in speaker when previous output device is disconnected
Audio output not restored on crash — Added atexit handler to restore output on server shutdown
DndContext hijacking clicks — Added pointer distance threshold so buttons work alongside drag-and-drop
Transcript edit not displaying — Edited text now correctly shown instead of original words
Download encoding error — Fixed Korean filename encoding in Content-Disposition header (RFC 5987)
STT infinite loading — Added error handling for failed background tasks
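
As an illustration of the Content-Disposition fix listed above, the sketch below builds a header that carries a non-ASCII filename using the RFC 5987 `filename*` parameter alongside an ASCII fallback. The helper name `content_disposition` is hypothetical, and the project's actual fix may differ in detail.

```python
from urllib.parse import quote


def content_disposition(filename: str) -> str:
    """Build an attachment header with an ASCII fallback and a UTF-8 filename*."""
    # Fallback for clients that ignore filename*: non-ASCII characters become "?",
    # and characters that would break the quoted string are replaced.
    fallback = (
        filename.encode("ascii", "replace")
        .decode("ascii")
        .replace('"', "_")
        .replace("\\", "_")
    )
    # Percent-encode the UTF-8 bytes of the full name for the extended parameter.
    encoded = quote(filename, safe="")
    return f"attachment; filename=\"{fallback}\"; filename*=UTF-8''{encoded}"
```

In a FastAPI response, the returned string would be set as the `Content-Disposition` header value, for example via the `headers` argument of a file or streaming response.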

Features Added

Audio file rename (inline edit) and delete
Transcript download in TXT and SRT formats
Audio file download
Automated setup script with Python version detection
Cross-platform Demucs path auto-detection

License

This project is licensed under the MIT License.

If you redistribute or use this project in derivative works, please include the following attribution:

Original project: VoiceEditor by chadingTV https://github.com/chadingTV/voiceeditor
