A Python FastAPI server providing speech-to-text (using OpenAI's Whisper) and text-to-speech (using Piper TTS) functionality. This server acts as the Python backend that runs the local machine-learning models for a Unity-based language learning prototype.
- Speech Synthesis French (TTS): `/synthesize` endpoint - Converts text into speech (WAV audio) using Piper TTS and the tom-medium model.
- Speech Synthesis German (TTS): `/synthesizeDeutsch` endpoint - Converts text into speech (WAV audio) using Piper TTS and the thorsten-high model.
- Speech Recognition (STT): `/transcribe` endpoint - Converts speech (WAV audio) into text using Whisper, supporting code-switching (FR/DE).
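For orientation, below is a minimal sketch of how such a server can be wired up with FastAPI, Piper TTS, and faster-whisper. It is not the repository's `whisper_server.py`; the Whisper model size, helper names, and exact Piper calls are illustrative assumptions.

```python
# Sketch only: shows how the three endpoints could be implemented.
import io
import wave

from fastapi import FastAPI, File, Request, UploadFile
from fastapi.responses import Response
from faster_whisper import WhisperModel
from piper.voice import PiperVoice

app = FastAPI()
tts_french = PiperVoice.load("model/fr_FR-tom-medium.onnx")
tts_german = PiperVoice.load("model/de_DE-thorsten-high.onnx")
stt = WhisperModel("small", device="cpu", compute_type="int8")  # "small" is an assumption

def synthesize_wav(voice: PiperVoice, text: str) -> bytes:
    """Render text to a WAV byte string with the given Piper voice."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav_file:
        voice.synthesize(text, wav_file)
    return buf.getvalue()

@app.post("/synthesize")
async def synthesize(request: Request):
    text = (await request.body()).decode("utf-8")
    return Response(content=synthesize_wav(tts_french, text), media_type="audio/wav")

@app.post("/synthesizeDeutsch")
async def synthesize_deutsch(request: Request):
    text = (await request.body()).decode("utf-8")
    return Response(content=synthesize_wav(tts_german, text), media_type="audio/wav")

@app.post("/transcribe")
async def transcribe(file: UploadFile = File(...)):
    segments, _info = stt.transcribe(io.BytesIO(await file.read()))
    return {"text": "".join(segment.text for segment in segments).strip()}
```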
- Prerequisites: Ensure you have Python 3.8+ and `pip` installed.
- Create a virtual environment and activate it:

  ```
  python -m venv .venv
  ```

  On Windows:

  ```
  .\.venv\Scripts\Activate.ps1
  # If you encounter execution policy issues, run:
  # Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
  ```

  On Linux/Mac:

  ```
  source .venv/bin/activate
  ```
- Install dependencies:

  ```
  pip install -r requirements.txt
  ```
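  The exact pins live in `requirements.txt`. For reference, a server like this typically depends on packages along these lines (an illustrative assumption, not the repository's actual file):

  ```
  fastapi
  uvicorn
  python-multipart    # needed by FastAPI for form-data file uploads
  faster-whisper
  piper-tts
  ```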
- Download voice models:

  As the voice models are too large for Git (about 200 MB), you have to download them manually:

  On Linux/Mac:

  ```
  curl -L -o model/de_DE-thorsten-high.onnx "https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/de/de_DE/thorsten/high/de_DE-thorsten-high.onnx?download=true"
  curl -L -o model/fr_FR-tom-medium.onnx "https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/fr/fr_FR/tom/medium/fr_FR-tom-medium.onnx?download=true"
  ```

  On Windows:

  ```
  curl.exe -L -o model/de_DE-thorsten-high.onnx "https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/de/de_DE/thorsten/high/de_DE-thorsten-high.onnx?download=true"
  curl.exe -L -o model/fr_FR-tom-medium.onnx "https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/fr/fr_FR/tom/medium/fr_FR-tom-medium.onnx?download=true"
  ```
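  If curl is not available, the same downloads can be scripted in Python using only the standard library (a sketch; the URLs are the ones above):

  ```python
  # Sketch: download both Piper voice models into the model/ directory.
  import urllib.request
  from pathlib import Path

  MODELS = {
      "model/de_DE-thorsten-high.onnx": "https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/de/de_DE/thorsten/high/de_DE-thorsten-high.onnx?download=true",
      "model/fr_FR-tom-medium.onnx": "https://huggingface.co/rhasspy/piper-voices/resolve/v1.0.0/fr/fr_FR/tom/medium/fr_FR-tom-medium.onnx?download=true",
  }

  Path("model").mkdir(exist_ok=True)
  for destination, url in MODELS.items():
      print(f"Downloading {destination} ...")
      urllib.request.urlretrieve(url, destination)  # follows the Hugging Face redirect
  ```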
Code changes for CUDA support:

```python
tts_french = PiperVoice.load(model_path, use_cuda=True)    # change to True
tts_german = PiperVoice.load(model_path2, use_cuda=True)   # change to True
model = WhisperModel(model_size, device="cuda", compute_type="float16")  # instead of device="cpu", compute_type="int8"
```
Note: The latest versions of ctranslate2 only support CUDA 12 and cuDNN 9. For CUDA 11 and cuDNN 8, the current workaround is to downgrade to version 3.24.0 of ctranslate2; for CUDA 12 and cuDNN 8, downgrade to version 4.4.0 (for example with `pip install --force-reinstall ctranslate2==4.4.0`, or by pinning the version in `requirements.txt`).
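Instead of editing the lines above by hand, the GPU path can also be gated behind a flag, for example an environment variable (a sketch, reusing the variable names shown above; the `USE_CUDA` variable is hypothetical):

```python
import os

# Hypothetical toggle: set USE_CUDA=1 in the environment to enable the GPU path.
USE_CUDA = os.getenv("USE_CUDA", "0") == "1"

tts_french = PiperVoice.load(model_path, use_cuda=USE_CUDA)
tts_german = PiperVoice.load(model_path2, use_cuda=USE_CUDA)
model = WhisperModel(
    model_size,
    device="cuda" if USE_CUDA else "cpu",
    compute_type="float16" if USE_CUDA else "int8",
)
```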
To start the server in development mode with auto-reload (picking up code changes), run:

```
uvicorn whisper_server:app --host 0.0.0.0 --port 65432 --reload
```
`/synthesize` and `/synthesizeDeutsch` - Synthesize speech from the given text.

- Request Body: Plain text string to be synthesized.
- Response: An `audio/wav` file.
`/transcribe` - Transcribes speech from the given audio file.

- Request Form-Data: A `file` parameter containing a WAV audio file.
- Response: A JSON object with the transcribed text:

  ```json
  {"text": "Bonjour le monde"}
  ```
You can test the server from a different terminal with curl:

On Linux/Mac:

```
curl -X POST --data "Bonjour le monde" http://localhost:65432/synthesize -o "bonjour.wav"
curl -X POST -F "file=@bonjour.wav" http://localhost:65432/transcribe
```

On Windows:

```
curl.exe -X POST --data "Bonjour le monde" http://localhost:65432/synthesize -o "bonjour.wav"
curl.exe -X POST -F "file=@bonjour.wav" http://localhost:65432/transcribe
```
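The same round trip can also be scripted in Python, for example with the `requests` library (a sketch mirroring the curl calls above; `requests` is not part of the server's dependencies and would need to be installed separately):

```python
# Sketch: synthesize a WAV file, then send it back for transcription.
import requests

BASE_URL = "http://localhost:65432"

# Text in, WAV out.
wav = requests.post(f"{BASE_URL}/synthesize", data="Bonjour le monde".encode("utf-8"))
with open("bonjour.wav", "wb") as f:
    f.write(wav.content)

# WAV in, JSON out.
with open("bonjour.wav", "rb") as f:
    result = requests.post(f"{BASE_URL}/transcribe", files={"file": ("bonjour.wav", f, "audio/wav")})
print(result.json())  # e.g. {"text": "Bonjour le monde"}
```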