A unified Text-to-Speech service supporting OpenAI, Gemini, and Local (Coqui TTS/XTTS) models. Built with FastAPI.
- Unified API: switch between providers (OpenAI, Gemini, Local) with a simple JSON parameter (a brief sketch of this dispatch follows the list).
- Local Model Support: run privately using Coqui XTTS, which supports voice cloning and multiple languages.
- FastAPI: a reliable, high-performance Python web framework.
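To make the "unified API" idea concrete, here is a minimal, hypothetical sketch of how a FastAPI endpoint can dispatch on a `provider` field. The names, stubs, and response media type below are illustrative only, not the repository's actual implementation.

```python
# Hypothetical sketch of provider dispatch behind a unified TTS endpoint.
# All names and signatures are illustrative, not taken from this repository.
from fastapi import FastAPI, HTTPException
from fastapi.responses import Response
from pydantic import BaseModel

app = FastAPI()

class TTSRequest(BaseModel):
    text: str
    provider: str  # "openai", "gemini", or "local"
    voice_id: str

def synthesize_openai(text: str, voice_id: str) -> bytes:
    raise NotImplementedError  # placeholder: call the OpenAI TTS API here

def synthesize_gemini(text: str, voice_id: str) -> bytes:
    raise NotImplementedError  # placeholder: call the Gemini TTS API here

def synthesize_local(text: str, voice_id: str) -> bytes:
    raise NotImplementedError  # placeholder: run Coqui XTTS locally here

PROVIDERS = {
    "openai": synthesize_openai,
    "gemini": synthesize_gemini,
    "local": synthesize_local,
}

@app.post("/v1/tts/generate")
def generate(req: TTSRequest) -> Response:
    backend = PROVIDERS.get(req.provider)
    if backend is None:
        raise HTTPException(status_code=400, detail=f"Unknown provider: {req.provider}")
    audio = backend(req.text, req.voice_id)
    return Response(content=audio, media_type="audio/mpeg")
```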
- Clone the repository

  ```bash
  git clone https://github.com/akashdeepiim/tts.git
  cd tts-service
  ```

- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```

  Note: For local TTS support, ensure you have the necessary system libraries (e.g., `espeak`).

- Configuration

  Create a `.env` file in the root directory (a short sketch of reading these values from Python follows these steps):

  ```
  OPENAI_API_KEY=your_openai_key
  GEMINI_API_KEY=your_gemini_key
  USE_GPU=True  # Set to False if you don't have a GPU/Metal
  ```

- Start the server

  ```bash
  python app/main.py
  ```

  The API will run at http://localhost:8000.
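The server reads the API keys from the `.env` file above. As a sanity check, here is a minimal sketch of loading those values in Python; it assumes `python-dotenv`, which may differ from what the project actually uses.

```python
# Minimal sketch for loading the .env values above; assumes python-dotenv
# (`pip install python-dotenv`), which may not match the project's own setup.
import os

from dotenv import load_dotenv

load_dotenv()  # reads variables from .env in the current directory

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")
# Environment variables are strings, so "True"/"False" must be parsed explicitly.
USE_GPU = os.getenv("USE_GPU", "False").lower() == "true"
```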
Endpoint: `POST /v1/tts/generate`

Example (OpenAI):

```bash
curl -X POST http://localhost:8000/v1/tts/generate \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello world!",
    "provider": "openai",
    "voice_id": "alloy"
  }' \
  --output output.mp3
```
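The same request can be issued from Python. This sketch assumes the `requests` package is installed; it is just an illustration, not necessarily a dependency of the service.

```python
# Python equivalent of the curl example above; assumes `pip install requests`.
import requests

resp = requests.post(
    "http://localhost:8000/v1/tts/generate",
    json={"text": "Hello world!", "provider": "openai", "voice_id": "alloy"},
    timeout=60,
)
resp.raise_for_status()

with open("output.mp3", "wb") as f:
    f.write(resp.content)  # the response body is the generated audio
```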
Example (Local):

```bash
curl -X POST http://localhost:8000/v1/tts/generate \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello running locally.",
    "provider": "local",
    "voice_id": "default"
  }' \
  --output local.wav
```

Endpoint: `GET /v1/tts/providers`

```bash
curl http://localhost:8000/v1/tts/providers
```

Interactive API documentation is available at http://localhost:8000/docs.
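The providers endpoint can also be queried from Python (again assuming `requests`, as in the earlier example):

```python
# List the available TTS providers; assumes `pip install requests`.
import requests

resp = requests.get("http://localhost:8000/v1/tts/providers", timeout=10)
resp.raise_for_status()
print(resp.json())  # the exact response shape depends on the service
```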