A simple yet powerful web application for converting text into natural-sounding speech using the Kokoro AI model. This project uses Streamlit for an interactive user interface that lets users enter text, choose from various voices (American English and British English are currently available), and generate downloadable audio.
- Intuitive User Interface: Built with Streamlit for a clean and easy-to-use experience.
- Multiple Voice Options: Choose between several male and female voices for American English and British English.
- Adjustable Speech Speed: Control the pace of the generated speech with a slider.
- GPU Acceleration: Option to utilize GPU for faster audio generation if available.
- Downloadable Audio: Download the generated speech as a WAV file.
- Phoneme Display: View the phonetic transcription (tokens) of the generated speech.
- Pre-loaded Text Examples: Quick buttons to load sample texts about Kokoro AI, Natural Language Processing (NLP), and Text-to-Speech.
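
To illustrate the "Downloadable Audio" feature, here is a minimal sketch of turning generated samples into WAV bytes in memory using only the Python standard library. It is not taken from this app's source (the app may well use `soundfile`, which is listed in its requirements). The function name `samples_to_wav_bytes`, the 24 kHz sample rate, and the sine tone standing in for model output are all assumptions made for illustration.

```python
import io
import math
import struct
import wave

# Assumption: 24 kHz mono output; adjust if your model's rate differs.
SAMPLE_RATE = 24_000


def samples_to_wav_bytes(samples, sample_rate=SAMPLE_RATE):
    """Encode mono float samples in [-1.0, 1.0] as 16-bit PCM WAV bytes."""
    # Clip each sample, then scale it to the signed 16-bit range.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono
        wav_file.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return buffer.getvalue()


# Hypothetical stand-in for model output: half a second of a 440 Hz tone.
tone = [
    0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
    for n in range(SAMPLE_RATE // 2)
]
wav_bytes = samples_to_wav_bytes(tone)
```

In a Streamlit app, bytes like these could be passed to `st.download_button(..., data=wav_bytes, file_name="speech.wav", mime="audio/wav")` to offer the file for download.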
To run this application on your local machine, follow these steps:
- Clone the repository:

      git clone https://github.com/Pixel4bit/pxd-tts.git
      cd pxd-tts

- Create a virtual environment (recommended):

      python -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies: Create a `requirements.txt` file in the root of your project with the following content:

      streamlit
      kokoro>=0.9.2
      soundfile
      torch
      https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1.tar.gz#egg=en_core_web_sm

  Then install them:

      pip install -r requirements.txt
- Install system-level dependencies: Kokoro uses the `espeak-ng` library, which must be installed on your system.
  - On Debian/Ubuntu:

        sudo apt-get update
        sudo apt-get install espeak-ng

  - On macOS (using Homebrew):

        brew install espeak-ng

  - On Windows: You might need to find a pre-compiled binary or build it from source.
- Prepare example text files: Create `kokoro.md`, `nlp.md`, and `tts.md` files in the same directory as `pxd-kokoro.py`. You can fill them with any text you like. For example:

  `kokoro.md`

      Kokoro AI is an advanced text-to-speech system designed to convert written text into natural-sounding human speech. It leverages deep learning models to synthesize high-quality audio, making it suitable for various applications, including accessibility, content creation, and interactive voice assistants.

  `nlp.md`

      Natural Language Processing (NLP) is a subfield of artificial intelligence that focuses on enabling computers to understand, interpret, and generate human language. NLP techniques are crucial for tasks such as sentiment analysis, machine translation, speech recognition, and text summarization.

  `tts.md`

      Text-to-Speech (TTS) technology is the process of converting written language into spoken words. A TTS system is composed of several components, including text analysis, phonetic transcription, and waveform generation, to create an audible output that mimics human speech.
- Run the Streamlit application:

      streamlit run pxd-kokoro.py

  The application will open in your web browser.
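
If the app fails to start, a quick environment check can show which of the installation steps above did not take effect. The following is a minimal sketch, not part of the project: the helper names `missing_modules` and `has_espeak_ng` are hypothetical, and the import names in `REQUIRED_MODULES` are assumed from the `requirements.txt` entries listed earlier.

```python
import importlib.util
import shutil

# Assumption: these import names correspond to the requirements listed above.
REQUIRED_MODULES = ["streamlit", "kokoro", "soundfile", "torch", "en_core_web_sm"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def has_espeak_ng():
    """Return True if an `espeak-ng` executable is available on PATH."""
    return shutil.which("espeak-ng") is not None


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing Python packages:", ", ".join(missing))
if not has_espeak_ng():
    print("espeak-ng was not found on PATH")
if not missing and has_espeak_ng():
    print("Environment looks ready")
```

Run it inside the activated virtual environment so that it inspects the same interpreter Streamlit will use.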
This application can be easily deployed on Streamlit Community Cloud. Ensure you have the following files in your repository:
- `app.py` (your main application file)
- `requirements.txt` (as detailed in "How to Run Locally" step 3)
- `packages.txt` with the content: `espeak-ng`
- `kokoro.md`, `nlp.md`, `tts.md` (your example text files)
Streamlit Community Cloud will automatically detect these files and install the necessary dependencies, including the espeak-ng system package and the spaCy model.
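
Before pushing, the file list above can be checked with a short script. This is a sketch under stated assumptions rather than a tool shipped with the project: `find_problems` and `check_repo` are hypothetical helpers, and the file names are the ones listed above.

```python
from pathlib import Path

# File names taken from the deployment checklist above.
REQUIRED_FILES = [
    "app.py",
    "requirements.txt",
    "packages.txt",
    "kokoro.md",
    "nlp.md",
    "tts.md",
]


def find_problems(present_files, packages_text):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = [f"missing file: {name}" for name in REQUIRED_FILES if name not in present_files]
    # packages.txt should list espeak-ng as one of its whitespace-separated entries.
    if "packages.txt" in present_files and "espeak-ng" not in packages_text.split():
        problems.append("packages.txt does not list espeak-ng")
    return problems


def check_repo(repo_dir="."):
    """Collect the files present in repo_dir and run find_problems on them."""
    root = Path(repo_dir)
    present = {p.name for p in root.iterdir() if p.is_file()}
    packages = root / "packages.txt"
    packages_text = packages.read_text() if packages.is_file() else ""
    return find_problems(present, packages_text)


for problem in check_repo("."):
    print(problem)
```

Keeping the checks in a pure function (`find_problems`) separate from the file-system access makes them easy to test without creating files.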
Feel free to fork this repository, open issues, or submit pull requests. Any contributions to improve the project are welcome!
This project is open-source and available under the Apache License 2.0.