Prediction errors using pre-trained eng-kor model #21

Open
amesval opened this issue Jan 12, 2022 · 1 comment

Comments

amesval commented Jan 12, 2022

Hi everyone. I would like to run inference with the English-to-Korean translation model (https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/eng-kor) and replicate its reported BLEU score.
I downloaded the files from there and installed marian-nmt on Ubuntu 20.04.3, including protobuf so that SentencePiece can be used, as required by https://marian-nmt.github.io/docs/ .
I ran preprocess.sh, passed its output to marian-decoder to get the translations, and finally ran postprocess.sh.
The results were unexpected: there were no Korean characters at all.

Am I doing something wrong?

@jorgtied
Member

Ah, I forgot to include the vocab files that are mentioned in the decoder.yml. Thanks for pointing me in that direction. The *.vocab.yml file is not the correct one here, as this model comes with separate vocabularies for the source and target languages; you can see this in the decoder.yml file. The *.vocab files mentioned there are missing, however. As a workaround, you can use the spm files directly. Edit the decoder.yml file to look like this:

relative-paths: true
models:
  - opusTCv20210807+bt.spm32k-spm32k.transformer-align.model1.npz.best-perplexity.npz
vocabs:
  - source.spm
  - target.spm
beam-size: 6
normalize: 1
word-penalty: 0
mini-batch: 1
maxi-batch: 1
maxi-batch-sort: src

Then run something like this:

echo "This is a test." | ./preprocess.sh eng source.spm | marian-decoder -c decoder.yml

Does that work for you?
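
Since the problem here came from decoder.yml pointing at files that were not shipped with the model, it may help to check that every file listed in the config actually exists before decoding. The following is only a rough sketch, not part of Marian or the model package: `check_decoder_files` is a hypothetical helper name, and it assumes the simple layout shown above, where `- item` list entries appear only under `models:` and `vocabs:` and, because of `relative-paths: true`, are resolved relative to the directory containing the config file.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not a Marian tool): report which files
# listed as "- item" entries in a decoder config are missing on disk.
# The function body runs in a subshell, so its variables stay local.
check_decoder_files() (
  config="$1"
  dir="$(dirname "$config")"   # paths are assumed relative to the config file
  missing=0
  while IFS= read -r entry; do
    [ -n "$entry" ] || continue          # skip the empty line of an empty list
    if [ -f "$dir/$entry" ]; then
      echo "ok       $entry"
    else
      echo "MISSING  $entry"
      missing=1
    fi
  done <<EOF
$(sed -n -E 's/^[[:space:]]*-[[:space:]]+//p' "$config")
EOF
  return "$missing"                      # non-zero if any file was not found
)
```

Running `check_decoder_files decoder.yml` in the model directory prints one line per listed file and returns a non-zero status if any of them is missing, so it can be chained in front of the decoder call with `&&`.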
