CoderPat/MuDA

Multilingual Discourse-Aware (MuDA) Benchmark

The Multilingual Discourse-Aware (MuDA) Benchmark is a comprehensive suite of taggers and evaluators aimed at advancing the field of context-aware Machine Translation (MT).

Traditional translation quality metrics output uninterpretable scores and fail to accurately measure performance on context-aware discourse phenomena. MuDA takes a different direction, relying on neural syntactic and morphological analysers to measure the performance of translation models on specific words and discourse phenomena.

The MuDA taggers currently support 14 language pairs (see this directory), and new languages can be added easily.

Installation

The tagger relies on PyTorch (<1.10) to run its models, so install PyTorch first. You can find instructions for your system here.

For example, to install PyTorch on a Linux system with CUDA support in a conda environment, run:

conda install pytorch==1.9.1 torchvision==0.10.1 torchaudio==0.9.1 cudatoolkit=11.3 -c pytorch -c conda-forge

Then, to install the rest of the dependencies, run:

pip install -r requirements.txt
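
After installing, it can be worth confirming that the PyTorch in your environment falls inside the `<1.10` range stated above. The following is a minimal sketch, not part of MuDA: the helper name `is_supported_torch_version` is hypothetical, and it only inspects the leading `major.minor` part of the version string.

```python
import re


def is_supported_torch_version(version: str) -> bool:
    """Return True if `version` is below 1.10, the bound stated in this README."""
    # Keep only the leading "major.minor"; local suffixes such as "+cu111" are ignored.
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Compare as integer tuples: a plain string comparison would rank "1.10" below "1.9".
    return (major, minor) < (1, 10)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        ok = is_supported_torch_version(torch.__version__)
        print(torch.__version__, "supported" if ok else "outside the <1.10 range")
```

The comparison is done on integer tuples because version strings do not sort correctly as text.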

Example Usage

To tag an existing dataset and save the tags for later use, run:

python muda/main.py \
    --src /path/to/src \
    --tgt /path/to/tgt \
    --docids /path/to/docids \
    --dump-tags /tmp/maia_ende.tags \
    --tgt-lang "$lang"

To evaluate models on a particular dataset (reporting per-tag metrics such as precision and recall), run:

python muda/main.py \
    --src /path/to/src \
    --tgt /path/to/tgt \
    --docids /path/to/docids \
    --hyps /path/to/hyps.m1 /path/to/hyps.m2 \
    --tgt-lang "$lang"
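
To make "per-tag precision and recall" concrete, here is a generic sketch of how such metrics can be computed. It is not MuDA's implementation: the function name, the input format (one set of tag names per line, for the reference and for a hypothesis) and the tag names in the usage example are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def per_tag_precision_recall(
    ref_tags: List[Set[str]], hyp_tags: List[Set[str]]
) -> Dict[str, Tuple[float, float]]:
    """Return {tag: (precision, recall)} over aligned lines of tag sets."""
    if len(ref_tags) != len(hyp_tags):
        raise ValueError("reference and hypothesis must have the same number of lines")
    tp, fp, fn = Counter(), Counter(), Counter()
    for ref, hyp in zip(ref_tags, hyp_tags):
        for tag in ref & hyp:  # tag present in both: true positive
            tp[tag] += 1
        for tag in hyp - ref:  # tag only in the hypothesis: false positive
            fp[tag] += 1
        for tag in ref - hyp:  # tag only in the reference: false negative
            fn[tag] += 1
    scores = {}
    for tag in set(tp) | set(fp) | set(fn):
        predicted = tp[tag] + fp[tag]
        expected = tp[tag] + fn[tag]
        # Report 0.0 when a denominator is empty rather than dividing by zero.
        precision = tp[tag] / predicted if predicted else 0.0
        recall = tp[tag] / expected if expected else 0.0
        scores[tag] = (precision, recall)
    return scores


if __name__ == "__main__":
    # Illustrative tag names only.
    ref = [{"pronouns"}, {"pronouns", "formality"}, {"formality"}]
    hyp = [{"pronouns"}, {"pronouns"}, {"formality"}]
    print(per_tag_precision_recall(ref, hyp))
```

Treating each line as a set of tags is a simplification; a token-level comparison would follow the same counting scheme with finer-grained items.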

Note that MuDA relies on a docids file with the same number of lines as the src/tgt files, where each line contains the id of the document to which the corresponding source/target line belongs.
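
The line-by-line alignment described above can be checked before running the tagger. The sketch below is illustrative and not part of MuDA: `read_lines` and `group_by_document` are hypothetical helpers, and the grouping assumes that all lines of one document are contiguous in the files.

```python
from itertools import groupby
from typing import List, Tuple


def read_lines(path: str) -> List[str]:
    """Read a text file into a list of lines without trailing newlines."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def group_by_document(
    src: List[str], tgt: List[str], docids: List[str]
) -> List[Tuple[str, List[Tuple[str, str]]]]:
    """Group aligned (source, target) pairs under their document id."""
    if not (len(src) == len(tgt) == len(docids)):
        raise ValueError(
            f"line counts differ: src={len(src)}, tgt={len(tgt)}, docids={len(docids)}"
        )
    documents = []
    rows = zip(docids, src, tgt)
    # groupby merges consecutive rows only, hence the contiguity assumption.
    for doc_id, group in groupby(rows, key=lambda row: row[0]):
        documents.append((doc_id, [(s, t) for _, s, t in group]))
    return documents


if __name__ == "__main__":
    docs = group_by_document(
        ["Hello.", "How are you?", "Bye."],
        ["Hallo.", "Wie geht es dir?", "Tschüss."],
        ["doc1", "doc1", "doc2"],
    )
    for doc_id, pairs in docs:
        print(doc_id, len(pairs))
```

With real data, the three lists would come from `read_lines` applied to the src, tgt and docids paths; a `ValueError` then signals a misaligned docids file.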
