The Interpreter: An English-to-French Speech Translator

🌍 Overview

Welcome to The Interpreter! This demo uses machine learning models to translate spoken English directly into spoken French, so that speakers can communicate across the language barrier. It builds on OpenAI's Whisper and Facebook's SeamlessM4T.

🚀 Getting Started

Prerequisites

Python 3.10+
FFmpeg for audio processing
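Before installing, it can help to confirm both prerequisites are present. The snippet below is a minimal sketch, not part of the repository: the function name `check_prerequisites` is hypothetical, and it assumes the FFmpeg executable is named `ffmpeg` and is expected on your PATH.

```python
import shutil
import sys
from typing import List, Tuple


def check_prerequisites(min_python: Tuple[int, int] = (3, 10)) -> List[str]:
    """Return a list of human-readable problems; an empty list means all good."""
    problems: List[str] = []
    # sys.version_info compares element-wise against a plain tuple.
    if sys.version_info < min_python:
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ is required, found {found}")
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("FFmpeg executable 'ffmpeg' was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(f"Missing prerequisite: {problem}")
```

If the script prints nothing, both checks passed.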

MODES

The Interpreter can be run in one of three modes: 2STEP, API_2STEP and 3STEP.

2STEP: In this mode, all models are downloaded on initialization and loaded locally. Speech-to-speech translation is performed in 2 steps: first the input speech is transcribed into text, then the text is translated and synthesized into the output speech.
API_2STEP: In this mode, all models are used via API calls. It uses the same models as 2STEP mode but is considerably faster, since the models are already loaded, and it requires less RAM on startup. However, this speed advantage may disappear in deployment if the interpreter is itself exposed via an API endpoint in 2STEP mode.
3STEP: In this mode, speech-to-speech translation is broken down into 3 steps: first the input speech is transcribed, then the text is translated, and finally the output speech is synthesized. The models for the first 2 steps are accessed via APIs, which avoids the latency and RAM cost of loading and initializing them. The model for the third step is loaded and initialized locally, since it requires minimal RAM and is relatively fast.
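The three modes above can be summarized as a lookup from mode name to an ordered list of stages. This is only an illustrative sketch: the `PIPELINES` table, the stage labels and the `stages_for` helper are hypothetical and do not reflect how `interpreter.py` is actually organized.

```python
from typing import Dict, List

# Hypothetical stage labels; each entry records what runs and where it runs.
PIPELINES: Dict[str, List[str]] = {
    "2STEP": [
        "transcribe input speech (local model)",
        "translate text and synthesize output speech (local model)",
    ],
    "API_2STEP": [
        "transcribe input speech (API)",
        "translate text and synthesize output speech (API)",
    ],
    "3STEP": [
        "transcribe input speech (API)",
        "translate text (API)",
        "synthesize output speech (local model)",
    ],
}


def stages_for(mode: str) -> List[str]:
    """Return the ordered stages for a mode name, ignoring case and padding."""
    key = mode.strip().upper()
    if key not in PIPELINES:
        valid = ", ".join(sorted(PIPELINES))
        raise ValueError(f"Unknown MODE {mode!r}; expected one of: {valid}")
    return PIPELINES[key]


if __name__ == "__main__":
    for number, stage in enumerate(stages_for("3STEP"), start=1):
        print(f"Step {number}: {stage}")
```

Rejecting an unknown mode up front makes a typo in the configuration fail immediately rather than partway through a translation.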

Installation

Clone the repo:

git clone https://github.com/NITHUB-AI/TheInterpreter.git

Navigate to the project directory:
```
cd TheInterpreter
```
Install the required packages:
```
pip install -r requirements.txt
```
Add the MODE you wish to use, your OpenAI and Hugging Face API keys, and your YouTube stream key to a .env file in the project directory, in the form below:
```
OPENAI_API_KEY=<YOUR_API_KEY>
HF_API_KEY=<YOUR_API_KEY>
MODE=3STEP
STREAM_KEY=<YOUTUBE_STREAM_KEY>
```
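To sanity-check a .env file before running the app, something like the following could be used. It is a sketch using only the standard library, and the project itself may load these values differently (for example with a dotenv package). The helper names are hypothetical, and treating `STREAM_KEY` as optional is an assumption, not something the repository states.

```python
from typing import Dict

VALID_MODES = {"2STEP", "API_2STEP", "3STEP"}
# Assumption: these three keys are always needed; STREAM_KEY is treated as optional.
REQUIRED_KEYS = ("OPENAI_API_KEY", "HF_API_KEY", "MODE")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def validate_settings(values: Dict[str, str]) -> None:
    """Raise ValueError if a required key is missing or MODE is not recognized."""
    missing = [key for key in REQUIRED_KEYS if not values.get(key)]
    if missing:
        raise ValueError("Missing .env entries: " + ", ".join(missing))
    if values["MODE"] not in VALID_MODES:
        valid = ", ".join(sorted(VALID_MODES))
        raise ValueError(f"MODE must be one of: {valid}")
```

In practice you would read the file first, for example `parse_env(open(".env").read())`, and pass the result to `validate_settings`.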

Usage

Run the application to interpret an audio file:

python interpreter.py audio_path.m4a

Run the Streamlit web app:

streamlit run webapp.py

🛠️ Technologies Used

OpenAI's Whisper: As the backbone for audio translation.
Facebook's SeamlessM4T: For translation and speech generation.
