Error span annotation + direct assessment for speech translation output.
- Annotation protocol and UI: in this branch of a fork of pearmut
- Dataset: under data
- Automatic systems: baselines
pip install -e .

src/ ← source code
  baselines/ ← baseline models (xcomet, qwen25omni)
  data/ ← data preprocessing
  meta-eval/ ← meta-evaluation
  pretty-print/ ← pretty-printing for visual inspection
scripts/ ← pipeline entry points (run from project root)
  01_prepare_data/
  02_run_baselines/
  03_analysis/
data/ ← processed data files
  <lang>_annotated_data.json ← base annotations (gold transcript)
  <lang>_annotated_data_asr.json ← ASR input + ASR-aligned spans
  <lang>_annotated_data_asr+spans-wer.json ← ASR data with per-span WER
outputs/ ← model outputs and evaluation results
  meta-eval/ ← annotation1 results
  meta-eval/annotation2/ ← annotation2 results (en_cs, en_de only)
  meta-eval/wer-split/ ← annotation1 WER-split results
Language pairs: cs_en, en_cs, en_de, en_he.
A second annotation round (annotation2) is available for en_cs and en_de only.
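As a quick sanity check before running anything, the file-naming scheme above can be expanded into the full list of expected data files. This is a minimal sketch, not part of the repository: the function names `expected_data_files` and `missing_data_files` are hypothetical, and it assumes the files sit directly in `data/` relative to the project root.

```shell
# Language pairs listed in this README.
LANGS="cs_en en_cs en_de en_he"

# Print the three data files expected for each language pair.
# The directory defaults to "data" (assumed to be relative to the project root).
expected_data_files() {
    data_dir="${1:-data}"
    for lang in $LANGS; do
        for suffix in "" "_asr" "_asr+spans-wer"; do
            echo "${data_dir}/${lang}_annotated_data${suffix}.json"
        done
    done
}

# Report which expected files are missing (prints nothing if all are present).
missing_data_files() {
    expected_data_files "${1:-data}" | while IFS= read -r path; do
        [ -f "$path" ] || echo "missing: $path"
    done
}
```

Calling `missing_data_files` from the project root should print nothing when all twelve files (four language pairs, three variants each) are in place.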
Data preparation is already done. Re-run it only if the source pearmut data or the annotations change.
bash scripts/01_prepare_data/01_prepare_pearmut.sh # build base annotated data files
bash scripts/01_prepare_data/02_merge_annotations.sh # merge second annotation round
All baseline scripts run all four language pairs. Each modality has variants for gold vs. ASR input, with or without severity labels,
and context sizes 0/1/2/5 (*_ctx_all.sh covers ctx=1/2/5).
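The baseline commands that follow use a wildcard to stand for these variants. If such a pattern is pasted literally, the shell expands it to several paths and `bash` executes only the first one, treating the rest as its arguments. A minimal sketch of running every variant in turn is shown here; the helper name `run_each` is hypothetical and not part of the repository.

```shell
# Run each script given as an argument, one at a time,
# stopping at the first script that fails.
run_each() {
    for script in "$@"; do
        # An unmatched glob is passed through literally; skip it instead of failing.
        if [ ! -f "$script" ]; then
            echo "skip (no such file): $script" >&2
            continue
        fi
        echo "==> $script"
        bash "$script" || return 1
    done
}

# Example (from the project root):
#   run_each scripts/02_run_baselines/comet/run_xcomet*.sh
```

Running the variants sequentially keeps their logs separate and makes it clear which one failed; whether they can safely run in parallel depends on available GPU memory and is not covered here.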
# XCOMET
bash scripts/02_run_baselines/comet/run_xcomet*.sh
# Qwen2.5-Omni — text input
bash scripts/02_run_baselines/qwen/text/run_qwen25omni*.sh
# Qwen2.5-Omni — audio input
bash scripts/02_run_baselines/qwen/audio/run_qwen25omni_audio*.sh
# Qwen2.5-Omni — text+audio input
bash scripts/02_run_baselines/qwen/textaudio/run_qwen25omni_textaudio*.sh
Set ROUND=annotation1 or ROUND=annotation2 at the top of each script.
annotation2 is restricted to en_cs and en_de. ASR model outputs use
*_annotated_data_asr+spans-wer.json as the annotation source; non-ASR outputs
use *_annotated_data.json.
bash scripts/03_analysis/01_run_meta_eval.sh # compute F1 and correlations
bash scripts/03_analysis/02_print_results_table.sh # print summary table + CSVs
bash scripts/03_analysis/03_run_prettyprint.sh # visual inspection of predictions
bash scripts/03_analysis/04_run_wer_analysis.sh # WER-split meta-eval
Results are written to outputs/meta-eval/ (annotation1) or
outputs/meta-eval/annotation2/ (annotation2).
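The conventions for ROUND, annotation source, and results directory described above can be summarized as a small lookup. This is a sketch under stated assumptions, not the logic the analysis scripts actually contain: the function names are hypothetical, the input-type labels `asr` and `gold` are invented for illustration, and the annotation files are assumed to live in `data/` relative to the project root.

```shell
# Language pairs available for a given annotation round.
langs_for_round() {
    case "$1" in
        annotation1) echo "cs_en en_cs en_de en_he" ;;
        annotation2) echo "en_cs en_de" ;;
        *) echo "unknown round: $1" >&2; return 1 ;;
    esac
}

# Directory where meta-evaluation results for a round are written.
results_dir_for_round() {
    case "$1" in
        annotation1) echo "outputs/meta-eval/" ;;
        annotation2) echo "outputs/meta-eval/annotation2/" ;;
        *) echo "unknown round: $1" >&2; return 1 ;;
    esac
}

# Annotation file used to score a model output.
# $1 = language pair, $2 = input type ("asr" or "gold"; hypothetical labels).
annotation_source() {
    case "$2" in
        asr)  echo "data/${1}_annotated_data_asr+spans-wer.json" ;;
        gold) echo "data/${1}_annotated_data.json" ;;
        *) echo "unknown input type: $2" >&2; return 1 ;;
    esac
}
```

For example, `annotation_source en_de asr` prints `data/en_de_annotated_data_asr+spans-wer.json`, and `langs_for_round annotation2` prints `en_cs en_de`.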