Voice-Pro repacked for Pinokio — 1-click install
This is a repackaged version of ABUS's Voice-Pro AI voice app, modified to run cleanly through Pinokio's launcher system. The original project's conda-based installer has been replaced with Pinokio scripts that handle virtual environments, dependency installation, and GPU setup automatically.
All credit for Voice-Pro goes to ABUS / abus-aikorea. This is a repackaging for Pinokio compatibility, licensed under GPL-3.0 per the original project.
📄 Original Voice-Pro README
The best AI speech recognition, translation, and multilingual dubbing solution 🚀
Voice-Pro is a state-of-the-art web app that transforms multimedia content creation. It integrates YouTube video downloading, voice separation, speech recognition, translation, and text-to-speech into a single, powerful tool for creators, researchers, and multilingual professionals.
- 🔊 Top-tier speech recognition: Whisper, Faster-Whisper, Whisper-Timestamped, WhisperX
- 🎤 Zero-shot voice cloning: F5-TTS, E2-TTS, CosyVoice
- 📢 Multilingual text-to-speech: Edge-TTS, kokoro (Paid version includes Azure TTS)
- 🎥 YouTube processing & audio extraction: yt-dlp
- 🌍 Instant translation for 100+ languages: Deep-Translator (Paid version includes Azure Translator)
A robust alternative to ElevenLabs, Voice-Pro empowers podcasters, developers, and creators with advanced voice solutions.
- Because the team is currently focused on WeConnect development, Voice-Pro development and updates are paused for the time being.
- We have made all Voice-Pro code open source and completely free. Voice-Pro can now be freely distributed and modified by anyone.
- It works well on Windows with NVIDIA GPU. Operation on Mac and Linux has not been verified.
- Please leave your requests on the
or
pages.
version 3.2
- We have been focused on WeConnect development for the past few months and have not been able to maintain Voice-Pro during that time.
- We have decided to open source all Voice-Pro code.
- Voice-Pro is completely free and supports Windows, Mac, and Linux.
- WeConnect is an application for global cultural exchange.
- Connect with people from all over the world for meaningful cultural exchanges, language learning, and international friendships.
version 3.1
- 🪄 Support for fine-tuned models of F5-TTS
- 🌍 Supported languages
English & Chinese: SWivid/F5-TTS_v1
Finnish: AsmoKoskinen/F5-TTS_Finnish_Model
French: RASPIAUDIO/F5-French-MixedSpeakers-reduced
Hindi: SPRINGLab/F5-Hindi-24KHz
Italian: alien79/F5-TTS-italian
Japanese: Jmica/F5TTS/JA_21999120
Russian: hotstone228/F5-TTS-Russian
Spanish: jpgallegoar/F5-Spanish
version 3.0
- 🔥 Removed the AI Cover feature.
- 🚀 Added support for m-bain/whisperX.
version 2.0
- 🐍 Built with Python 3.10.15, Torch 2.5.1+cu124, and Gradio 5.14.0.
- 🆓 Free trial supports media up to 60 seconds in length.
- 🔥 Added the AI Cover feature.
- 🎤 Introduced support for CosyVoice and kokoro.
- ⏳ Initial run downloads CosyVoice2-0.5B (9GB), which may take over an hour depending on network speed.
- 🎧 Voice samples for cloning will be continuously updated.
- 📝 Added spaCy for natural sentence-by-sentence translation and TTS.
- ☁️ Subscription version includes Microsoft Azure Translator and TTS.
- 🏪 Subscription offers unlimited usage (no 60-second limit) during the subscription period, available via
.
- YouTube video downloads & audio extraction
- Voice separation with Demucs
- Supports 100+ languages for speech recognition & translation
- Speech-to-Text: Whisper, Faster-Whisper, Whisper-Timestamped, WhisperX
- Text-to-Speech:
- Edge-TTS: 100+ languages, 400+ voices
- E2-TTS, F5-TTS, CosyVoice: Zero-shot cloning
- kokoro: Ranked #2 in HuggingFace TTS Arena
- Instant speech recognition
- Multilingual translation on the fly
- Customizable audio inputs
- All-in-one hub: YouTube downloads, noise removal, subtitles, translation, & TTS
- Supports all ffmpeg-compatible formats
- Output options: WAV, FLAC, MP3
- Subtitles & recognition for 100+ languages
- TTS with speed, volume, & pitch controls
- Subtitle-focused: 90+ languages
- Video-integrated subtitle display
- Word-level highlighting & denoise options
- Translation for 100+ languages
- Supports subtitle files (ASS, SSA, SRT, etc.)
- Real-time voice recognition & translation
- Options: Edge-TTS, F5-TTS, CosyVoice, kokoro
- Celeb voice podcasts & multilingual support
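To make the subtitle features above more concrete, here is a minimal sketch of how recognized speech segments could be written out in SRT, one of the subtitle formats listed. It is not taken from the Voice-Pro code base: the `Segment` class and the `srt_timestamp` and `to_srt` helpers are hypothetical names introduced for illustration, and the sketch assumes each segment carries a start time, an end time (both in seconds), and its text.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One recognized span of speech (hypothetical structure for this sketch)."""
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = max(0, round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as SRT cues: index, time range, text, blank separator."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    # Joining with "\n" leaves one blank line between consecutive cues.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [Segment(0.0, 1.25, "Hello"), Segment(1.5, 3.0, "Welcome to the demo")]
    print(to_srt(demo))
```

The output of a speech recognizer would first need to be mapped onto `Segment` objects; how that mapping looks depends on which recognizer is used and is left out here.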
- OS: Windows 10/11 (64-bit)
- GPU: NVIDIA with CUDA 12.4 (recommended)
- VRAM: 4GB+ (8GB+ preferred)
- RAM: 4GB+
- Storage: 20GB+ free space
- Internet: Required
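Before installing, part of the requirements list above can be checked with a few lines of standard-library Python. This is a rough sketch, not part of Voice-Pro or the Pinokio scripts: `check_requirements` and `MIN_FREE_GB` are hypothetical names, and finding `nvidia-smi` on the PATH is only assumed to be a reasonable hint that an NVIDIA driver is installed. It does not verify the CUDA version, VRAM, or RAM.

```python
import platform
import shutil

MIN_FREE_GB = 20  # mirrors "Storage: 20GB+ free space" in the list above


def check_requirements(path: str = ".") -> dict:
    """Report the OS, free disk space at `path`, and whether nvidia-smi is on PATH."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    system = platform.system()
    return {
        "os": system,
        "is_windows": system == "Windows",
        "free_gb": round(free_gb, 1),
        "enough_storage": free_gb >= MIN_FREE_GB,
        # Assumption: nvidia-smi being present is only a proxy for a usable GPU.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for key, value in check_requirements().items():
        print(f"{key}: {value}")
```

Pass the intended install location as `path` so the free-space figure refers to the drive that will actually hold the models.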
- Voice-Pro: https://github.com/abus-aikorea/voice-pro
- Demucs: https://github.com/facebookresearch/demucs
- yt-dlp: https://github.com/yt-dlp/yt-dlp
- gradio: https://github.com/gradio-app/gradio
- edge-TTS: https://github.com/rany2/edge-tts
- F5-TTS: https://github.com/SWivid/F5-TTS.git
- openai-whisper: https://github.com/openai/whisper
- faster-whisper: https://github.com/SYSTRAN/faster-whisper
- whisper-timestamped: https://github.com/linto-ai/whisper-timestamped
- whisperX: https://github.com/m-bain/whisperX
- CosyVoice: https://github.com/FunAudioLLM/CosyVoice
- kokoro: https://github.com/hexgrad/kokoro
- Deep-Translator: https://github.com/nidhaloff/deep-translator
- spaCy: https://github.com/explosion/spaCy
by ABUS