Speech Translation Error Labelling (STEL)

Error span annotation + direct assessment for speech translation output.

Annotation protocol and UI: in this branch of a fork of pearmut
Dataset: under data
Automatic systems: baselines

Installation

pip install -e .

Project structure

src/                        ← source code
  baselines/                ← baseline models (xcomet, qwen25omni)
  data/                     ← data preprocessing
  meta-eval/                ← meta-evaluation
  pretty-print/             ← pretty-printing for visual inspection
scripts/                    ← pipeline entry points (run from project root)
  01_prepare_data/
  02_run_baselines/
  03_analysis/
data/                       ← processed data files
  <lang>_annotated_data.json            ← base annotations (gold transcript)
  <lang>_annotated_data_asr.json        ← ASR input + ASR-aligned spans
  <lang>_annotated_data_asr+spans-wer.json  ← ASR data with per-span WER
outputs/                    ← model outputs and evaluation results
  meta-eval/                ← annotation1 results
  meta-eval/annotation2/   ← annotation2 results (en_cs, en_de only)
  meta-eval/wer-split/      ← annotation1 WER-split results

Languages: cs_en, en_cs, en_de, en_he. Second annotation round available for en_cs and en_de only (annotation2).
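
The file naming under data/ can be sketched as a small lookup. This is only an illustration, not code from the repository: the helper name data_files and the keys gold, asr and asr_wer are hypothetical, while the file name patterns and language pairs are the ones listed above.

```python
from pathlib import Path

# Language pairs listed in this README.
LANG_PAIRS = ("cs_en", "en_cs", "en_de", "en_he")


def data_files(lang: str, data_dir: str = "data") -> dict:
    """Return the three processed files for one language pair.

    Hypothetical helper: only the file name patterns come from the README.
    """
    if lang not in LANG_PAIRS:
        raise ValueError(f"unknown language pair: {lang}")
    root = Path(data_dir)
    return {
        "gold": root / f"{lang}_annotated_data.json",  # gold transcript
        "asr": root / f"{lang}_annotated_data_asr.json",  # ASR input + aligned spans
        "asr_wer": root / f"{lang}_annotated_data_asr+spans-wer.json",  # per-span WER
    }


if __name__ == "__main__":
    # Report which of the expected files are present (run from project root).
    for lang in LANG_PAIRS:
        for kind, path in data_files(lang).items():
            status = "found" if path.exists() else "missing"
            print(f"{lang:6} {kind:8} {path} ({status})")
```

Run from the project root, this prints one line per expected file, which is a quick way to check that data preparation has produced everything for all four pairs.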

Pipeline

1. Prepare data

Already done: the processed files are available under data/. Re-run only if the source pearmut data or the annotations change.

bash scripts/01_prepare_data/01_prepare_pearmut.sh   # build base annotated data files
bash scripts/01_prepare_data/02_merge_annotations.sh  # merge second annotation round

2. Run baselines

Every script runs all four language pairs. Each modality has variants for gold vs. ASR input, with or without severity labels, and context sizes 0, 1, 2 and 5 (the *_ctx_all.sh scripts cover context sizes 1, 2 and 5).

# XCOMET
bash scripts/02_run_baselines/comet/run_xcomet*.sh

# Qwen2.5-Omni — text input
bash scripts/02_run_baselines/qwen/text/run_qwen25omni*.sh

# Qwen2.5-Omni — audio input
bash scripts/02_run_baselines/qwen/audio/run_qwen25omni_audio*.sh

# Qwen2.5-Omni — text+audio input
bash scripts/02_run_baselines/qwen/textaudio/run_qwen25omni_textaudio*.sh
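
The variant dimensions described above (input type, severity labels, context size) form a small grid. The sketch below just enumerates that grid. It assumes the three dimensions combine freely, which this README does not state, and the names INPUT_TYPES, SEVERITY, CONTEXT_SIZES and variant_grid are hypothetical.

```python
from itertools import product

# Variant dimensions as described in this README.
INPUT_TYPES = ("gold", "asr")   # gold transcript vs ASR input
SEVERITY = (True, False)        # with / without severity labels
CONTEXT_SIZES = (0, 1, 2, 5)    # context sizes


def variant_grid():
    """Enumerate every combination of the three dimensions.

    Assumption: all combinations are meaningful; the repository may
    provide scripts for only a subset of them.
    """
    return [
        {"input": inp, "severity": sev, "context": ctx}
        for inp, sev, ctx in product(INPUT_TYPES, SEVERITY, CONTEXT_SIZES)
    ]


if __name__ == "__main__":
    for variant in variant_grid():
        print(variant)
```

Under that assumption there are 2 x 2 x 4 = 16 combinations per modality, which gives an upper bound on the number of runs to expect for each baseline.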

3. Analysis

Set ROUND=annotation1 or ROUND=annotation2 at the top of each script. annotation2 is restricted to en_cs and en_de. ASR model outputs use *_annotated_data_asr+spans-wer.json as the annotation source; non-ASR outputs use *_annotated_data.json.

bash scripts/03_analysis/01_run_meta_eval.sh       # compute F1 and correlations
bash scripts/03_analysis/02_print_results_table.sh  # print summary table + CSVs
bash scripts/03_analysis/03_run_prettyprint.sh      # visual inspection of predictions
bash scripts/03_analysis/04_run_wer_analysis.sh     # WER-split meta-eval

Results are written to outputs/meta-eval/ (annotation1) or outputs/meta-eval/annotation2/ (annotation2).
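
The rules in this section (which annotation file a model output is scored against, and where results land for each round) can be summarised in a short sketch. It is illustrative only: the function names annotation_source and results_dir are hypothetical, and it assumes that annotation1 covers all four language pairs and that annotation files live under data/ as shown in the project structure.

```python
from pathlib import Path

# Language pairs available per annotation round.
# Assumption: annotation1 covers all four pairs listed in this README.
ROUND_LANGS = {
    "annotation1": ("cs_en", "en_cs", "en_de", "en_he"),
    "annotation2": ("en_cs", "en_de"),
}


def annotation_source(lang: str, asr_input: bool) -> Path:
    """Pick the annotation file for a model output (hypothetical helper)."""
    # ASR outputs are scored against the ASR file with per-span WER,
    # non-ASR outputs against the base annotations.
    suffix = "_asr+spans-wer" if asr_input else ""
    return Path("data") / f"{lang}_annotated_data{suffix}.json"


def results_dir(round_name: str, lang: str) -> Path:
    """Return the meta-eval output directory for a round (hypothetical helper)."""
    if round_name not in ROUND_LANGS:
        raise ValueError(f"unknown round: {round_name}")
    if lang not in ROUND_LANGS[round_name]:
        raise ValueError(f"{lang} is not available in {round_name}")
    base = Path("outputs") / "meta-eval"
    # annotation1 results sit directly in meta-eval/, annotation2 in a subfolder.
    return base if round_name == "annotation1" else base / round_name
```

For example, results_dir("annotation2", "en_he") raises a ValueError, mirroring the restriction of the second round to en_cs and en_de.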
