Is it possible to train with just src, mt, ter? #46

jimbogill · 2019-11-08T14:01:27Z

Hi,
Thanks for making openkiwi available!
Can any of the OpenKiwi models be trained when the only data available is
[source sentence], [machine translation sentence], [TER score]?
As far as I can tell, all the examples need more data, for example the tags. But maybe I missed something.
Thanks,
James

captainvera · 2019-11-11T14:52:40Z

Hi James!

Although this was not something we considered at first, we had some requests for it and added the functionality. That is why all of the examples assume you have more data. Even so, you can train with just src, mt and TER.

To do so, you need to specify the following in your config file (in addition to the rest of the parameters):

sentence-level: True
predict-gaps: False
predict-target: False
predict-source: False

Keep in mind that there are some config options based on word-level tags that might not work when training just for sentence-level.

Let us know if you find any errors while training only with sentences!
Miguel

captainvera · 2019-11-25T11:42:45Z

Closing this since there have been no updates. Feel free to re-open if you have further questions!

jimbogill · 2019-11-28T10:10:20Z

Thanks Miguel. I successfully trained a predictor-estimator model on WMT data following your advice. I did this using a modified version of the config file in the experiments directory. In case it's useful, here are the modifications I made:

OpenKiwi/experiments/train_estimator.yaml

Line 32 in 715eba7

sentence-level: False

Changed to true

OpenKiwi/experiments/train_estimator.yaml

Line 46 in 715eba7

predict-target: true

Changed to false

OpenKiwi/experiments/train_estimator.yaml

Line 108 in 715eba7

train-target-tags: data/WMT17/word_level/train.tags

Changed to train-sentence-scores

OpenKiwi/experiments/train_estimator.yaml

Line 114 in 715eba7

valid-target-tags: data/WMT17/word_level/dev.tags

Changed to valid-sentence-scores
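
Putting the two comments above together, a sentence-level-only setup might look roughly like the fragment below. This is a sketch, not a tested config: the two score-file paths are hypothetical placeholders (the thread does not say which files were used), and every other option in train_estimator.yaml is assumed to stay as it is.

```yaml
# Flags suggested by Miguel for training with only src, mt and TER
sentence-level: true
predict-gaps: false
predict-target: false
predict-source: false

# Keys that replace train-target-tags / valid-target-tags in James's changes.
# The paths below are hypothetical; point them at your own sentence score files.
train-sentence-scores: path/to/train.scores
valid-sentence-scores: path/to/dev.scores
```

As Miguel notes, other options that depend on word-level tags might not work in this setup, so they may need to be reviewed case by case.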

captainvera self-assigned this Nov 11, 2019

captainvera added the "question" label (Further information is requested) Nov 11, 2019

captainvera closed this as completed Nov 25, 2019

lluisg mentioned this issue Mar 1, 2021: Poor results when training Estimator with parallel data and TER scores #92 (Open)
