eng-itc

opus-2020-07-14.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): arg ast cat cos egl ext fra frm_Latn gcf_Latn glg hat ind ita lad lad_Latn lat_Latn lij lld_Latn lmo max_Latn mfe min mwl oci pap pms por roh ron scn spa tmw_Latn vec wln zlm_Latn zsm_Latn
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<< (id = a valid target language ID)
  • download: opus-2020-07-14.zip
  • test set translations: opus-2020-07-14.test.txt
  • test set scores: opus-2020-07-14.eval.txt
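
The language-token requirement above can be illustrated with a small helper. This is a minimal sketch and not part of the OPUS-MT tooling: the function name `add_language_token`, the `TARGET_LANGUAGES` set (copied from the target language list above), and the single space between the token and the sentence are assumptions made for illustration.

```python
# Target language IDs copied from the "target language(s)" list above.
TARGET_LANGUAGES = set(
    "arg ast cat cos egl ext fra frm_Latn gcf_Latn glg hat ind ita lad "
    "lad_Latn lat_Latn lij lld_Latn lmo max_Latn mfe min mwl oci pap pms "
    "por roh ron scn spa tmw_Latn vec wln zlm_Latn zsm_Latn".split()
)


def add_language_token(sentence: str, target_id: str) -> str:
    """Prefix a source sentence with the >>id<< token (hypothetical helper)."""
    if target_id not in TARGET_LANGUAGES:
        raise ValueError(f"unknown target language ID: {target_id!r}")
    # Assumption: the token is separated from the sentence by one space.
    return f">>{target_id}<< {sentence.strip()}"


print(add_language_token("How are you?", "fra"))
```

Rejecting unknown IDs before translation is a design choice of this sketch; it is meant to catch typos in the target language ID early.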

Benchmarks

testset BLEU chr-F
Tatoeba-test.eng-arg.eng.arg 1.5 0.120
Tatoeba-test.eng-ast.eng.ast 17.1 0.384
Tatoeba-test.eng-cat.eng.cat 47.1 0.666
Tatoeba-test.eng-cos.eng.cos 3.1 0.274
Tatoeba-test.eng-egl.eng.egl 0.2 0.105
Tatoeba-test.eng-ext.eng.ext 4.9 0.243
Tatoeba-test.eng-fra.eng.fra 44.1 0.629
Tatoeba-test.eng-frm.eng.frm 1.2 0.207
Tatoeba-test.eng-gcf.eng.gcf 0.3 0.092
Tatoeba-test.eng-glg.eng.glg 43.1 0.635
Tatoeba-test.eng-hat.eng.hat 28.3 0.509
Tatoeba-test.eng-ita.eng.ita 44.8 0.669
Tatoeba-test.eng-lad.eng.lad 5.2 0.276
Tatoeba-test.eng-lat.eng.lat 11.9 0.376
Tatoeba-test.eng-lij.eng.lij 1.3 0.172
Tatoeba-test.eng-lld.eng.lld 0.9 0.211
Tatoeba-test.eng-lmo.eng.lmo 0.3 0.150
Tatoeba-test.eng-mfe.eng.mfe 68.0 0.848
Tatoeba-test.eng.multi 37.2 0.583
Tatoeba-test.eng-mwl.eng.mwl 2.7 0.356
Tatoeba-test.eng-oci.eng.oci 7.7 0.286
Tatoeba-test.eng-pap.eng.pap 43.9 0.641
Tatoeba-test.eng-pms.eng.pms 1.8 0.177
Tatoeba-test.eng-por.eng.por 40.7 0.632
Tatoeba-test.eng-roh.eng.roh 2.2 0.247
Tatoeba-test.eng-ron.eng.ron 39.7 0.626
Tatoeba-test.eng-scn.eng.scn 0.7 0.132
Tatoeba-test.eng-spa.eng.spa 48.8 0.679
Tatoeba-test.eng-vec.eng.vec 2.2 0.222
Tatoeba-test.eng-wln.eng.wln 6.2 0.213

opus-2020-07-20.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): arg ast cat cos egl ext fra frm_Latn gcf_Latn glg hat ind ita lad lad_Latn lat_Latn lij lld_Latn lmo max_Latn mfe min mwl oci pap pms por roh ron scn spa tmw_Latn vec wln zlm_Latn zsm_Latn
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<< (id = a valid target language ID)
  • download: opus-2020-07-20.zip
  • test set translations: opus-2020-07-20.test.txt
  • test set scores: opus-2020-07-20.eval.txt

Benchmarks

testset BLEU chr-F
Tatoeba-test.eng-arg.eng.arg 1.5 0.117
Tatoeba-test.eng-ast.eng.ast 17.7 0.382
Tatoeba-test.eng-cat.eng.cat 47.4 0.665
Tatoeba-test.eng-cos.eng.cos 3.1 0.297
Tatoeba-test.eng-egl.eng.egl 0.9 0.113
Tatoeba-test.eng-ext.eng.ext 7.9 0.277
Tatoeba-test.eng-fra.eng.fra 44.6 0.632
Tatoeba-test.eng-frm.eng.frm 1.1 0.214
Tatoeba-test.eng-gcf.eng.gcf 0.4 0.101
Tatoeba-test.eng-glg.eng.glg 43.1 0.638
Tatoeba-test.eng-hat.eng.hat 30.0 0.528
Tatoeba-test.eng-ita.eng.ita 45.0 0.670
Tatoeba-test.eng-lad.eng.lad 6.2 0.285
Tatoeba-test.eng-lat.eng.lat 11.9 0.376
Tatoeba-test.eng-lij.eng.lij 1.7 0.189
Tatoeba-test.eng-lld.eng.lld 0.5 0.201
Tatoeba-test.eng-lmo.eng.lmo 0.8 0.192
Tatoeba-test.eng-mfe.eng.mfe 83.6 0.909
Tatoeba-test.eng-msa.eng.msa 30.9 0.546
Tatoeba-test.eng.multi 37.6 0.585
Tatoeba-test.eng-mwl.eng.mwl 3.2 0.327
Tatoeba-test.eng-oci.eng.oci 7.8 0.286
Tatoeba-test.eng-pap.eng.pap 41.4 0.613
Tatoeba-test.eng-pms.eng.pms 2.0 0.182
Tatoeba-test.eng-por.eng.por 40.8 0.633
Tatoeba-test.eng-roh.eng.roh 4.0 0.262
Tatoeba-test.eng-ron.eng.ron 40.1 0.628
Tatoeba-test.eng-scn.eng.scn 1.6 0.175
Tatoeba-test.eng-spa.eng.spa 48.8 0.680
Tatoeba-test.eng-vec.eng.vec 2.6 0.237
Tatoeba-test.eng-wln.eng.wln 6.8 0.228

opus-2020-07-27.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): arg ast cat cos egl ext fra frm_Latn gcf_Latn glg hat ind ita lad lad_Latn lat_Latn lij lld_Latn lmo max_Latn mfe min mwl oci pap pms por roh ron scn spa tmw_Latn vec wln zlm_Latn zsm_Latn
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<< (id = a valid target language ID)
  • download: opus-2020-07-27.zip
  • test set translations: opus-2020-07-27.test.txt
  • test set scores: opus-2020-07-27.eval.txt

Benchmarks

testset BLEU chr-F
newsdev2016-enro-engron.eng.ron 26.9 0.562
newsdiscussdev2015-enfr-engfra.eng.fra 29.7 0.572
newsdiscusstest2015-enfr-engfra.eng.fra 34.9 0.607
newssyscomb2009-engfra.eng.fra 27.6 0.565
newssyscomb2009-engita.eng.ita 28.7 0.586
newssyscomb2009-engspa.eng.spa 29.3 0.567
news-test2008-engfra.eng.fra 25.0 0.535
news-test2008-engspa.eng.spa 26.9 0.546
newstest2009-engfra.eng.fra 26.3 0.555
newstest2009-engita.eng.ita 28.4 0.581
newstest2009-engspa.eng.spa 28.6 0.566
newstest2010-engfra.eng.fra 29.2 0.572
newstest2010-engspa.eng.spa 33.5 0.597
newstest2011-engfra.eng.fra 30.7 0.589
newstest2011-engspa.eng.spa 34.6 0.597
newstest2012-engfra.eng.fra 29.0 0.572
newstest2012-engspa.eng.spa 34.6 0.598
newstest2013-engfra.eng.fra 29.6 0.563
newstest2013-engspa.eng.spa 31.5 0.574
newstest2016-enro-engron.eng.ron 25.4 0.544
Tatoeba-test.eng-arg.eng.arg 1.6 0.126
Tatoeba-test.eng-ast.eng.ast 18.0 0.399
Tatoeba-test.eng-cat.eng.cat 47.7 0.669
Tatoeba-test.eng-cos.eng.cos 2.9 0.284
Tatoeba-test.eng-egl.eng.egl 0.2 0.076
Tatoeba-test.eng-ext.eng.ext 11.0 0.280
Tatoeba-test.eng-fra.eng.fra 44.5 0.632
Tatoeba-test.eng-frm.eng.frm 0.8 0.214
Tatoeba-test.eng-gcf.eng.gcf 0.4 0.108
Tatoeba-test.eng-glg.eng.glg 43.7 0.641
Tatoeba-test.eng-hat.eng.hat 29.6 0.525
Tatoeba-test.eng-ita.eng.ita 45.0 0.670
Tatoeba-test.eng-lad.eng.lad 6.2 0.286
Tatoeba-test.eng-lat.eng.lat 11.9 0.377
Tatoeba-test.eng-lij.eng.lij 1.7 0.178
Tatoeba-test.eng-lld.eng.lld 0.8 0.201
Tatoeba-test.eng-lmo.eng.lmo 1.1 0.201
Tatoeba-test.eng-mfe.eng.mfe 91.9 0.956
Tatoeba-test.eng-msa.eng.msa 30.9 0.546
Tatoeba-test.eng.multi 37.5 0.585
Tatoeba-test.eng-mwl.eng.mwl 3.8 0.339
Tatoeba-test.eng-oci.eng.oci 7.7 0.290
Tatoeba-test.eng-pap.eng.pap 42.0 0.626
Tatoeba-test.eng-pms.eng.pms 2.0 0.184
Tatoeba-test.eng-por.eng.por 41.0 0.634
Tatoeba-test.eng-roh.eng.roh 3.8 0.245
Tatoeba-test.eng-ron.eng.ron 40.4 0.630
Tatoeba-test.eng-scn.eng.scn 1.6 0.177
Tatoeba-test.eng-spa.eng.spa 48.9 0.681
Tatoeba-test.eng-vec.eng.vec 3.1 0.232
Tatoeba-test.eng-wln.eng.wln 5.1 0.218

opus2m-2020-08-01.zip

  • dataset: opus2m
  • model: transformer
  • source language(s): eng
  • target language(s): arg ast cat cos egl ext fra frm_Latn gcf_Latn glg hat ind ita lad lad_Latn lat_Latn lij lld_Latn lmo max_Latn mfe min mwl oci pap pms por roh ron scn spa tmw_Latn vec wln zlm_Latn zsm_Latn
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<< (id = a valid target language ID)
  • download: opus2m-2020-08-01.zip
  • test set translations: opus2m-2020-08-01.test.txt
  • test set scores: opus2m-2020-08-01.eval.txt

Benchmarks

testset BLEU chr-F
newsdev2016-enro-engron.eng.ron 27.1 0.565
newsdiscussdev2015-enfr-engfra.eng.fra 29.9 0.574
newsdiscusstest2015-enfr-engfra.eng.fra 35.3 0.609
newssyscomb2009-engfra.eng.fra 27.7 0.567
newssyscomb2009-engita.eng.ita 28.6 0.586
newssyscomb2009-engspa.eng.spa 29.8 0.569
news-test2008-engfra.eng.fra 25.0 0.536
news-test2008-engspa.eng.spa 27.1 0.548
newstest2009-engfra.eng.fra 26.7 0.557
newstest2009-engita.eng.ita 28.9 0.583
newstest2009-engspa.eng.spa 28.9 0.567
newstest2010-engfra.eng.fra 29.6 0.574
newstest2010-engspa.eng.spa 33.8 0.598
newstest2011-engfra.eng.fra 30.9 0.590
newstest2011-engspa.eng.spa 34.8 0.598
newstest2012-engfra.eng.fra 29.1 0.574
newstest2012-engspa.eng.spa 34.9 0.600
newstest2013-engfra.eng.fra 30.1 0.567
newstest2013-engspa.eng.spa 31.8 0.576
newstest2016-enro-engron.eng.ron 25.9 0.548
Tatoeba-test.eng-arg.eng.arg 1.6 0.120
Tatoeba-test.eng-ast.eng.ast 17.2 0.389
Tatoeba-test.eng-cat.eng.cat 47.6 0.668
Tatoeba-test.eng-cos.eng.cos 4.3 0.287
Tatoeba-test.eng-egl.eng.egl 0.9 0.101
Tatoeba-test.eng-ext.eng.ext 8.7 0.287
Tatoeba-test.eng-fra.eng.fra 44.9 0.635
Tatoeba-test.eng-frm.eng.frm 1.0 0.225
Tatoeba-test.eng-gcf.eng.gcf 0.7 0.115
Tatoeba-test.eng-glg.eng.glg 44.9 0.648
Tatoeba-test.eng-hat.eng.hat 30.9 0.533
Tatoeba-test.eng-ita.eng.ita 45.4 0.673
Tatoeba-test.eng-lad.eng.lad 5.6 0.279
Tatoeba-test.eng-lat.eng.lat 12.1 0.380
Tatoeba-test.eng-lij.eng.lij 1.4 0.183
Tatoeba-test.eng-lld.eng.lld 0.5 0.199
Tatoeba-test.eng-lmo.eng.lmo 0.7 0.187
Tatoeba-test.eng-mfe.eng.mfe 83.6 0.909
Tatoeba-test.eng-msa.eng.msa 31.3 0.549
Tatoeba-test.eng.multi 38.0 0.588
Tatoeba-test.eng-mwl.eng.mwl 2.7 0.322
Tatoeba-test.eng-oci.eng.oci 8.2 0.293
Tatoeba-test.eng-pap.eng.pap 46.7 0.663
Tatoeba-test.eng-pms.eng.pms 2.1 0.194
Tatoeba-test.eng-por.eng.por 41.2 0.635
Tatoeba-test.eng-roh.eng.roh 2.6 0.237
Tatoeba-test.eng-ron.eng.ron 40.6 0.632
Tatoeba-test.eng-scn.eng.scn 1.6 0.181
Tatoeba-test.eng-spa.eng.spa 49.5 0.685
Tatoeba-test.eng-vec.eng.vec 1.6 0.223
Tatoeba-test.eng-wln.eng.wln 7.1 0.250

opus1m+bt-2021-04-10.zip

  • dataset: opus1m+bt
  • model: transformer-align
  • source language(s): eng
  • target language(s): arg ast cat cbk cos egl ext fra frm gcf glg hat ita lad lat lij lld lmo mfe mol mwl oci osp pap pms pob por roh ron scn spa vec wln
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required in the form >>id<< (id = a valid target language ID)
  • valid language labels: >>acf<< >>aoa<< >>arg<< >>ast<< >>cat<< >>cbk<< >>cbk_Latn<< >>ccd<< >>cks<< >>cos<< >>cri<< >>crs<< >>dlm<< >>drc<< >>egl<< >>ext<< >>fab<< >>fax<< >>fra<< >>frc<< >>frm<< >>frm_Latn<< >>fro<< >>frp<< >>fur<< >>gcf<< >>gcf_Latn<< >>gcr<< >>glg<< >>hat<< >>idb<< >>ist<< >>ita<< >>itk<< >>kea<< >>kmv<< >>lad<< >>lad_Latn<< >>lat<< >>lat_Latn<< >>lij<< >>lld<< >>lld_Latn<< >>lmo<< >>lou<< >>mcm<< >>mfe<< >>mol<< >>mwl<< >>mxi<< >>mzs<< >>nap<< >>nrf<< >>oci<< >>osc<< >>osp<< >>osp_Latn<< >>pap<< >>pcd<< >>pln<< >>pms<< >>pob<< >>por<< >>pov<< >>pre<< >>pro<< >>qbb<< >>qhr<< >>rcf<< >>rgn<< >>roh<< >>ron<< >>ruo<< >>rup<< >>ruq<< >>scf<< >>scn<< >>sdc<< >>sdn<< >>spa<< >>spq<< >>spx<< >>src<< >>srd<< >>sro<< >>tmg<< >>tvy<< >>vec<< >>vkp<< >>wln<< >>xfa<< >>xum<<
  • download: opus1m+bt-2021-04-10.zip
  • test set translations: opus1m+bt-2021-04-10.test.txt
  • test set scores: opus1m+bt-2021-04-10.eval.txt
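
Since this release lists its valid language labels explicitly, input can be checked against them before translation. The following is a rough sketch under stated assumptions: `LABEL_LINE` holds only an excerpt of the labels listed above, and the helper names (`parse_labels`, `leading_label`, `has_valid_label`) are hypothetical, not part of any OPUS-MT package.

```python
import re
from typing import Optional, Set

# Excerpt of the "valid language labels" line above; the full list is longer.
LABEL_LINE = (
    ">>arg<< >>ast<< >>cat<< >>cbk<< >>cbk_Latn<< "
    ">>fra<< >>ita<< >>por<< >>ron<< >>spa<<"
)

# A label is an ID of letters and underscores wrapped in >> and <<.
LABEL_PATTERN = re.compile(r">>([A-Za-z_]+)<<")


def parse_labels(line: str) -> Set[str]:
    """Collect the bare IDs from a line of >>id<< labels."""
    return set(LABEL_PATTERN.findall(line))


def leading_label(sentence: str) -> Optional[str]:
    """Return the ID of a sentence-initial >>id<< token, or None if absent."""
    match = re.match(r"\s*>>([A-Za-z_]+)<<", sentence)
    return match.group(1) if match else None


def has_valid_label(sentence: str, valid: Set[str]) -> bool:
    """True only if the sentence starts with a label found in `valid`."""
    return leading_label(sentence) in valid


valid_labels = parse_labels(LABEL_LINE)
print(has_valid_label(">>fra<< Good morning.", valid_labels))
print(has_valid_label("Good morning.", valid_labels))
```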

Benchmarks

testset BLEU chr-F #sent #words BP
newsdev2016-enro.eng-ron 21.4 0.524 1999 51566 0.971
newsdiscussdev2015-enfr.eng-fra 27.7 0.556 1500 27986 1.000
newsdiscusstest2015-enfr.eng-fra 32.1 0.588 1500 28027 0.994
newssyscomb2009.eng-fra 26.6 0.558 502 12334 1.000
newssyscomb2009.eng-ita 27.4 0.578 502 11551 1.000
newssyscomb2009.eng-spa 28.8 0.565 502 12506 0.983
news-test2008.eng-fra 23.8 0.527 2051 52685 0.995
news-test2008.eng-spa 26.3 0.541 2051 52596 0.997
newstest2009.eng-fra 24.9 0.544 2525 69278 0.976
newstest2009.eng-ita 27.3 0.572 2525 63474 1.000
newstest2009.eng-spa 27.8 0.560 2525 68114 0.998
newstest2010.eng-fra 27.1 0.559 2489 66043 0.985
newstest2010.eng-spa 32.2 0.588 2489 65522 0.993
newstest2011.eng-fra 29.2 0.576 3003 80626 0.969
newstest2011.eng-spa 33.8 0.591 3003 79476 0.978
newstest2012.eng-fra 27.3 0.560 3003 78011 0.984
newstest2012.eng-spa 33.5 0.590 3003 79006 0.962
newstest2013.eng-fra 27.7 0.549 3000 70037 0.972
newstest2013.eng-spa 30.3 0.566 3000 70528 0.948
newstest2016-enro.eng-ron 20.8 0.510 1999 49094 0.984
Tatoeba-test.eng-arg 12.4 0.328 105 405 1.000
Tatoeba-test.eng-ast 24.4 0.476 99 720 0.980
Tatoeba-test.eng-cat 44.5 0.648 1631 12342 0.989
Tatoeba-test.eng-cbk 4.4 0.253 1498 10591 0.968
Tatoeba-test.eng-cos 39.5 0.680 5 45 0.931
Tatoeba-test.eng-egl 0.4 0.118 84 438 1.000
Tatoeba-test.eng-ext 11.4 0.345 69 353 1.000
Tatoeba-test.eng-fra 39.8 0.605 10000 80759 0.974
Tatoeba-test.eng-frm 2.1 0.221 18 211 1.000
Tatoeba-test.eng-gcf 0.8 0.118 99 560 0.989
Tatoeba-test.eng-glg 41.5 0.627 1008 7828 0.986
Tatoeba-test.eng-hat 33.1 0.549 64 416 0.978
Tatoeba-test.eng-ita 42.5 0.651 10000 65498 0.953
Tatoeba-test.eng-lad 7.5 0.288 629 3354 1.000
Tatoeba-test.eng-lad_Latn 8.0 0.314 582 3097 1.000
Tatoeba-test.eng-lat 10.4 0.371 10000 74902 0.930
Tatoeba-test.eng-lij 4.0 0.278 94 711 0.983
Tatoeba-test.eng-lld 1.0 0.213 21 228 0.973
Tatoeba-test.eng-lmo 8.8 0.317 17 124 1.000
Tatoeba-test.eng-mfe 83.6 0.905 7 36 1.000
Tatoeba-test.eng-multi 35.1 0.564 10000 74243 0.964
Tatoeba-test.eng-mwl 7.8 0.505 4 21 1.000
Tatoeba-test.eng-oci 9.9 0.330 841 5219 0.910
Tatoeba-test.eng-osp 13.9 0.331 3 20 1.000
Tatoeba-test.eng-pap 49.0 0.673 70 376 1.000
Tatoeba-test.eng-pms 14.3 0.359 268 2244 0.944
Tatoeba-test.eng-por 41.6 0.640 10000 75353 0.971
Tatoeba-test.eng-roh 22.5 0.476 16 198 1.000
Tatoeba-test.eng-ron 33.6 0.580 5000 36833 0.970
Tatoeba-test.eng-scn 38.9 0.482 4 42 1.000
Tatoeba-test.eng-spa 45.4 0.657 10000 77291 0.974
Tatoeba-test.eng-vec 5.6 0.315 19 127 0.927
Tatoeba-test.eng-wln 11.9 0.299 89 520 0.951
tico19-test.eng-fra 33.2 0.588 2100 64655 0.978
tico19-test.eng-pob 41.4 0.686 2100 62729 0.947
tico19-test.eng-por 40.7 0.683 2100 62729 0.959
tico19-test.eng-spa 42.4 0.681 2100 66591 0.950
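
The extended table above includes a #sent column, and several Tatoeba test sets contain only a handful of sentences (for example 7 for eng-mfe and 5 for eng-cos), so their scores rest on very small samples. The sketch below parses rows in the column layout shown above and keeps only the larger test sets. The `Score` type, the function names, and the 1000-sentence threshold are assumptions chosen for illustration; `ROWS` repeats four rows from the table.

```python
from typing import List, NamedTuple


class Score(NamedTuple):
    testset: str
    bleu: float
    chrf: float
    sentences: int
    words: int
    bp: float


# Four rows copied from the table above (testset BLEU chr-F #sent #words BP).
ROWS = """\
Tatoeba-test.eng-cos 39.5 0.680 5 45 0.931
Tatoeba-test.eng-fra 39.8 0.605 10000 80759 0.974
Tatoeba-test.eng-mfe 83.6 0.905 7 36 1.000
Tatoeba-test.eng-spa 45.4 0.657 10000 77291 0.974
"""


def parse_scores(text: str) -> List[Score]:
    """Parse whitespace-separated score rows, skipping headers and blanks."""
    scores = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue
        name, bleu, chrf, sents, words, bp = fields
        try:
            scores.append(Score(name, float(bleu), float(chrf),
                                int(sents), int(words), float(bp)))
        except ValueError:
            # Non-numeric fields, e.g. the header row.
            continue
    return scores


def large_enough(scores: List[Score], min_sentences: int = 1000) -> List[Score]:
    """Keep test sets with at least `min_sentences` sentences (arbitrary cut-off)."""
    return [s for s in scores if s.sentences >= min_sentences]


for score in large_enough(parse_scores(ROWS)):
    print(score.testset, score.bleu, score.chrf)
```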