SLT.KIT

This repository contains a toolkit for speech translation. It provides a Docker container with a ready-to-use pipeline containing the following components:

- a neural speech recognition system
- a sentence segmentation system
- an attention-based translation system

The speech recognition system processes the audio files and creates a transcription in the source language. Afterwards, the sentence segmentation system adds punctuation and recases the output. Finally, the output is translated by the machine translation system. We provide pipelines to train these models, as well as pre-trained models for all components for the task of translating English lectures to German.
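The data flow of the cascade can be pictured as three stages chained with pipes. The following is only a toy sketch: `asr`, `segment` and `translate` are hypothetical stand-ins that are not SLT.KIT commands, the file name `talk.wav` is made up, and the "models" are fixed one-line rules chosen to show what each stage contributes.

```shell
#!/bin/sh
# Toy stand-ins for the three stages (none of these are SLT.KIT commands).

# "ASR": pretend the given audio file was recognized as lowercase,
# unpunctuated text in the source language.
asr() { echo "hello world"; }

# "Sentence segmentation": recase the first letter and add a final period.
segment() { awk '{ print toupper(substr($0, 1, 1)) substr($0, 2) "." }'; }

# "MT": a fixed English-to-German lookup for the one example sentence.
translate() { sed 's/^Hello world\.$/Hallo Welt./'; }

# Chain the stages: audio -> transcript -> punctuated text -> translation.
asr talk.wav | segment | translate
```

Each stage only needs the text output of the previous one, which is why the components can be trained and swapped independently.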

The system uses the following software:

Requirements:

Docker

Updates

2019-09: Recipe for the How2 dataset (https://github.com/srvk/how2-dataset) using the transformer architecture for ASR, MT and end-to-end SLT.

Installation

    git clone https://github.com/isl-mt/SLT.KIT.git
    cd SLT.KIT
    docker build --build-arg CUDA=$CUDAVERSION -t slt.kit -f Dockerfile.ST-Baseline .

Set CUDAVERSION to 8.0, 9.0 or 9.1 before running the build.
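Since only three CUDA versions are listed, it can help to check the value before Docker starts. The following is a minimal sketch under that assumption; the function name `build_cmd` is hypothetical and not part of the repository. It only prints the build command so it can be inspected first.

```shell
#!/bin/sh
# Hypothetical helper (not part of SLT.KIT): print the docker build command
# for one of the CUDA versions listed above, or fail for anything else.
build_cmd() {
    case "$1" in
        8.0|9.0|9.1)
            echo "docker build --build-arg CUDA=$1 -t slt.kit -f Dockerfile.ST-Baseline ."
            ;;
        *)
            echo "unsupported CUDA version: $1 (expected 8.0, 9.0 or 9.1)" >&2
            return 1
            ;;
    esac
}

# Example: print the command for CUDA 9.0.
build_cmd 9.0
```

To run the build instead of printing it, the output could be passed to `sh`, for example `build_cmd 9.0 | sh`, from inside the cloned SLT.KIT directory.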

Run

Start the Docker container, then set the source and target languages inside it (e.g. source language English (en) and target language German (de)):

    docker run -ti --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=$gpuid slt.kit
    export sl=en
    export tl=de
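As an alternative to exporting the variables by hand, the language pair could be passed with Docker's `-e` option, which sets environment variables in the container. This is a sketch and not the documented workflow of this repository: the function name `run_cmd` is hypothetical, and it assumes the scripts in the container read `sl` and `tl` from the environment as the exports above suggest.

```shell
#!/bin/sh
# Hypothetical helper (not part of SLT.KIT): print a docker run command that
# selects a GPU and passes the language pair into the container via -e.
run_cmd() {
    gpuid="$1"; sl="$2"; tl="$3"
    echo "docker run -ti --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=$gpuid -e sl=$sl -e tl=$tl slt.kit"
}

# Example: GPU 0, English to German.
run_cmd 0 en de
```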

File Structure

The general file structure used by all models and systems is described in File structure

System

This repository contains several systems that can be used for speech translation:
- Cascaded systems: Systems that combine an ASR, a sentence segmentation/punctuation and an MT component
  - ctc-tedlium2.smallTED: Combination of the ctc-tedlium2 ASR system and the smallTED system for sentence segmentation and MT
  - ctc-tedlium2.midSize: Combination of the ctc-tedlium2 ASR system and the midSize system for sentence segmentation and MT
- ASR systems: Systems to transcribe the audio
  - ctc-tedlium2: Simple LSTM network trained with the CTC loss that outputs BPE units
  - las-tedlium2: Attention-based ASR system
- Sentence segmentation/MT
  - ted: System trained on the TED corpus
  - midSize: System trained on TED and EPPS corpus

Test sets

English to German
- dev2010
- tst2010
- tst2013
- tst2014
- tst2015

Results

The results reported here were generated by combining the output of the three ASR systems (CTC 300, CTC 10k and the attention-based ASR system) with ROVER and translating it with the MT system trained on the TED corpus.

English to German

SET	BLEU	TER	BEER	CharacTER	BLEU(ci)	TER(ci)
dev2010	13.98	71.78	45.88	78.50	15.05	69.68
tst2010	14.08	71.66	44.40	77.66	15.12	69.36
tst2013	13.73	72.81	44.02	71.45	14.61	70.78
tst2014	13.28	74.34	42.43	78.38	14.01	72.62

Furthermore, results for the MT system can be found here.
