This is a FastAPI-based server that acts as an interface between your application and cloud-based AI services. It focuses on three main tasks:
- Converting speech to text (transcription)
- Converting text to speech
- Converting speech to speech (a combination of the above two)
Currently, it uses OpenAI's API for these services, but it's designed so we can add other providers in the future.
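Conceptually, speech-to-speech is just the first two tasks composed: transcribe the audio, then synthesize the resulting text. A minimal sketch with stand-in provider functions (the real calls go through the OpenAI handler; these stubs only illustrate the composition):

```python
def transcribe(audio: bytes) -> str:
    # Stand-in for the provider's speech-to-text call.
    return audio.decode("utf-8")

def synthesize(text: str) -> bytes:
    # Stand-in for the provider's text-to-speech call.
    return text.encode("utf-8")

def speech_to_speech(audio: bytes) -> bytes:
    # The third task is the first two composed: STT, then TTS.
    return synthesize(transcribe(audio))
```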
## Features

### Transcription (Speech-to-Text)
- Asynchronous file upload and transcription
- Streaming transcription via WebSocket

### Text-to-Speech
- Convert text to speech with various voice options

### Speech-to-Speech
- Convert speech input to text and then back to speech
- Support for both file upload and streaming via WebSocket
## Project Structure

```
.
├── cloud_providers/
│   ├── base.py
│   └── openai_api_handler.py
├── server/
│   ├── main.py
│   ├── routers/
│   │   ├── transcribe.py
│   │   ├── tts.py
│   │   └── speech_to_speech.py
│   └── utils/
│       └── logger.py
├── requirements.txt
└── README.md
```
## Setup

1. Clone the repository.

2. Create a virtual environment:
   ```bash
   python -m venv venv
   source venv/bin/activate
   ```

3. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```

4. Set up environment variables:
   ```bash
   export OPENAI_API_KEY=your_openai_api_key
   ```
## Running the Server

To start the server, navigate to the project directory and run:

```bash
python server/main.py
```

This will start the FastAPI server, typically on http://localhost:8000.

For details about the API, see the interactive documentation FastAPI serves at http://localhost:8000/docs.
## Logging

The application uses rotating file handlers for logging, with separate log files for different components:

- `logs/main.log`: Main application logs
- `logs/transcription.log`: Transcription-specific logs
- `logs/tts.log`: Text-to-speech logs
- `logs/speech_to_speech.log`: Speech-to-speech logs
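A rotating file handler of the kind `logger.py` presumably configures can be built with the standard library alone. The size limit, backup count, and format below are illustrative assumptions, not the project's actual settings:

```python
import logging
from logging.handlers import RotatingFileHandler

def get_logger(name: str, log_file: str) -> logging.Logger:
    # Rotate after ~1 MB, keeping 3 old files (illustrative limits).
    handler = RotatingFileHandler(log_file, maxBytes=1_000_000, backupCount=3)
    handler.setFormatter(
        logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    )
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

Each component (transcription, TTS, speech-to-speech) would call `get_logger` with its own file, which is what keeps the logs separated.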
## Error Handling

The application includes error handling for various scenarios, including API errors and WebSocket disconnections. Errors are logged and appropriate HTTP exceptions are raised.
## Extensibility

The project is designed with extensibility in mind. The `CloudProviderBase` abstract base class in `base.py` allows for easy integration of additional cloud providers beyond OpenAI.
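The README doesn't show `CloudProviderBase`'s actual interface, but an abstract base class of roughly this shape (the method names and signatures here are assumptions) is what lets a new provider plug in by subclassing:

```python
from abc import ABC, abstractmethod

class CloudProviderBase(ABC):
    """Contract every cloud provider must satisfy (method names are hypothetical)."""

    @abstractmethod
    def transcribe(self, audio: bytes) -> str:
        """Speech-to-text: return a transcript for the given audio."""

    @abstractmethod
    def synthesize(self, text: str, voice: str) -> bytes:
        """Text-to-speech: return audio bytes for the given text and voice."""

class EchoProvider(CloudProviderBase):
    # Toy provider used only to show that a subclass fulfils the contract;
    # a real Google Cloud or Azure handler would call its SDK here.
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")

    def synthesize(self, text: str, voice: str) -> bytes:
        return text.encode("utf-8")
```

The routers would depend only on the base class, so swapping providers never touches the route code.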
## Security Notes

- Ensure that your OpenAI API key is kept secure and not exposed in the code or version control.
- The server currently allows all origins in CORS settings. In a production environment, you should restrict this to specific allowed origins.
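Restricting CORS is a one-line change to the middleware configuration. A sketch of what that looks like in FastAPI; the origin URL is a placeholder, not this project's actual front end:

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Replace the wildcard "*" with the specific origins your front end uses.
# "https://app.example.com" below is a placeholder for illustration.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://app.example.com"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```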
## Future Improvements

- Add support for additional cloud providers (e.g., Google Cloud, Azure)
- Add more configuration options for the AI models
- Improve error handling and provide more detailed error messages
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
## License

[Specify your license here]