consolidate evaluation (=https://github.com/cdli-gh/Semi-Supervised-NMT-for-Sumerian-English/issues/17) #2

chiarcos · 2020-09-27T15:52:19Z

At the moment, every MTAAC/CDLI MT system is evaluated independently, which makes it impossible to track progress across systems.

For example, Rachit's (2020) "mu usz-bar x 2(disz) tug2 usz-bar tur" seems to correspond to two independent (!) lines in Ravneet's (2019) system:

544,mu ucbar X
542, NUMB tug ucbar tur sumun

But these are likely completely different texts: "sumun" is not in Rachit's text, so there is probably no overlap for the phrase "ucbar tur" / "usz-bar tur" in their data. In that case, the systems are simply incomparable.
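To check whether two systems' data overlap at all, both transliteration styles could first be mapped to a common key. The following is a minimal Python sketch of that idea. The normalization rules (`sz` becomes `c`, hyphens and sign indices are dropped, numerals become `NUMB`, case is folded) are inferred only from the single example above and are an assumption, not the actual preprocessing of either system. `normalize` and `shared_lines` are hypothetical names.

```python
import re

# Numerals such as 2(disz); the pattern is an assumption based on one example.
NUMERAL = re.compile(r"\d+\([^()\s]+\)")

def normalize(line: str) -> str:
    """Map a transliterated line to a rough comparison key (assumed rules)."""
    tokens = []
    for tok in line.split():
        if tok == "NUMB" or NUMERAL.fullmatch(tok):
            tokens.append("NUMB")          # keep one shared numeral placeholder
            continue
        tok = tok.lower()                  # X -> x
        tok = tok.replace("sz", "c")       # usz -> uc
        tok = tok.replace("-", "")         # uc-bar -> ucbar
        tok = re.sub(r"\d+", "", tok)      # tug2 -> tug (drop sign index)
        if tok:
            tokens.append(tok)
    return " ".join(tokens)

def shared_lines(lines_a, lines_b):
    """Return the normalized lines that occur in both collections."""
    return {normalize(a) for a in lines_a} & {normalize(b) for b in lines_b}

rachit = ["mu usz-bar x 2(disz) tug2 usz-bar tur"]
ravneet = ["mu ucbar X", "NUMB tug ucbar tur sumun"]

print(normalize(rachit[0]))            # mu ucbar x NUMB tug ucbar tur
print(shared_lines(rachit, ravneet))   # set()
```

On this example no normalized line is shared, which is consistent with the suspicion that the two systems were evaluated on different texts.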

Establish a consistent train/test set and replicate.

chiarcos · 2020-09-28T09:03:12Z

Official train/dev/test split for parallel data is under
https://github.com/cdli-gh/mtaac_cdli_ur3_corpus/blob/master/ur3_corpus_data/corpus_split_translated_20180514-125709.json

Official train/dev/test split for all data is under
https://github.com/cdli-gh/mtaac_cdli_ur3_corpus/blob/master/ur3_corpus_data/corpus_split_20180418-225438.json
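Before replicating results against a shared split, it may help to verify that the partitions are disjoint and that each system was scored only on test items. The sketch below assumes the split has already been loaded into a dict mapping partition names to lists of text IDs. That layout is hypothetical: the actual structure of the JSON files is not described here. The IDs are made up, and `find_leaks` and `outside_test` are illustrative names.

```python
from __future__ import annotations
from itertools import combinations

def find_leaks(split: dict[str, list[str]]) -> dict[tuple[str, str], set[str]]:
    """Return IDs shared by any two partitions; an empty dict means disjoint."""
    leaks = {}
    for a, b in combinations(sorted(split), 2):
        shared = set(split[a]) & set(split[b])
        if shared:
            leaks[(a, b)] = shared
    return leaks

def outside_test(evaluated_ids: list[str], split: dict[str, list[str]]) -> set[str]:
    """Return evaluated IDs that are not part of the test partition."""
    return set(evaluated_ids) - set(split["test"])

# Made-up IDs, for illustration only.
split = {"train": ["T1", "T2"], "dev": ["T3"], "test": ["T4", "T5"]}

print(find_leaks(split))                   # {}
print(outside_test(["T4", "T2"], split))   # {'T2'}
```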
