eng-dra

opus-2020-06-28.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): kan mal tam tel
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<< (id = a valid target language ID)
  • download: opus-2020-06-28.zip
  • test set translations: opus-2020-06-28.test.txt
  • test set scores: opus-2020-06-28.eval.txt
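Because this is a multi-target model (kan, mal, tam, tel), each input sentence must start with the >>id<< token that selects the output language. The helper below is a minimal sketch of that preprocessing step. The function name add_target_token is hypothetical and is not part of any OPUS-MT tool. It also assumes that a single space separates the token from the sentence text.

```python
def add_target_token(sentence: str, target_lang: str) -> str:
    """Prepend the >>id<< target-language token (hypothetical helper).

    Assumes one space between the token and the sentence text.
    """
    token = f">>{target_lang}<<"
    return f"{token} {sentence.strip()}"


# Example: request a Tamil translation of an English sentence.
print(add_target_token("How are you?", "tam"))  # >>tam<< How are you?
```

The token is added before SentencePiece segmentation, so it is the first item the model sees.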

Benchmarks

testset BLEU chr-F
Tatoeba-test.eng-kan.eng.kan 6.3 0.348
Tatoeba-test.eng-mal.eng.mal 12.7 0.511
Tatoeba-test.eng.multi 10.5 0.461
Tatoeba-test.eng-tam.eng.tam 8.2 0.434
Tatoeba-test.eng-tel.eng.tel 7.7 0.377

opus-2020-07-26.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): kan mal tam tel
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<< (id = a valid target language ID)
  • download: opus-2020-07-26.zip
  • test set translations: opus-2020-07-26.test.txt
  • test set scores: opus-2020-07-26.eval.txt

Benchmarks

testset BLEU chr-F
Tatoeba-test.eng-kan.eng.kan 4.7 0.348
Tatoeba-test.eng-mal.eng.mal 13.1 0.515
Tatoeba-test.eng.multi 10.7 0.463
Tatoeba-test.eng-tam.eng.tam 9.0 0.444
Tatoeba-test.eng-tel.eng.tel 7.1 0.363

opus4m+btTCv20210807-2021-09-30.zip

  • dataset: opus4m+btTCv20210807
  • model: transformer
  • source language(s): eng
  • target language(s): kan mal tam tcy tel
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<< (id = a valid target language ID)
  • valid language labels: >>aaf<< >>all<< >>bfq<< >>brh<< >>brw<< >>cde<< >>ctt<< >>daq<< >>emu<< >>era<< >>fmu<< >>gau<< >>gdb<< >>ggo<< >>gno<< >>gon<< >>hoy<< >>ima<< >>iru<< >>kan<< >>kej<< >>kep<< >>kev<< >>kfa<< >>kfb<< >>kfc<< >>kfd<< >>kfe<< >>kff<< >>kfg<< >>kfh<< >>kmj<< >>kpb<< >>kru<< >>kwx<< >>kxl<< >>kxu<< >>kxv<< >>mal<< >>mha<< >>mjo<< >>mjp<< >>mjq<< >>mjr<< >>mjt<< >>mju<< >>mjv<< >>mmk<< >>mrr<< >>mut<< >>muv<< >>nit<< >>oty<< >>pcf<< >>pcg<< >>pci<< >>peg<< >>pkr<< >>pty<< >>qkn<< >>sle<< >>tam<< >>tam_Deva<< >>tcx<< >>tcy<< >>tel<< >>thn<< >>udg<< >>ull<< >>url<< >>vis<< >>vmd<< >>wbq<< >>wkl<< >>wku<< >>xua<< >>xub<< >>xuj<< >>yea<< >>yeu<< >>ymr<<
  • download: opus4m+btTCv20210807-2021-09-30.zip
  • test set translations: opus4m+btTCv20210807-2021-09-30.test.txt
  • test set scores: opus4m+btTCv20210807-2021-09-30.eval.txt

Benchmarks

testset BLEU chr-F #sent #words BP
Tatoeba-test-v2021-08-07.eng-multi 11.2 0.425 1541 7928 1.000
Tatoeba-test-v2021-08-07.multi-multi 11.2 0.425 1541 7928 1.000
tico19-test.eng-tam 10.9 0.417 2100 52966 1.000
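This table has a BP column in addition to BLEU and chr-F. It is assumed here to be BLEU's brevity penalty. The penalty is 1.0 when the system output is at least as long as the reference and falls below 1.0 for shorter output. The function below is a sketch of the standard BLEU definition, as also used by sacreBLEU. It is not taken from the evaluation scripts behind these scores.

```python
import math


def brevity_penalty(hyp_len: int, ref_len: int) -> float:
    """Standard BLEU brevity penalty.

    hyp_len: total length of the system output (in tokens)
    ref_len: total length of the reference (in tokens)
    """
    if hyp_len == 0:
        return 0.0
    if hyp_len >= ref_len:
        return 1.0  # no penalty when output is not shorter than the reference
    return math.exp(1.0 - ref_len / hyp_len)
```

Under this definition, every BP value of 1.000 in the table means the translations were not shorter than the references, so BLEU was not reduced for length on these test sets.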