Is it possible to train with just src, mt, ter? #46

Closed
jimbogill opened this issue Nov 8, 2019 · 3 comments
Labels
question Further information is requested


@jimbogill

Hi,
Thanks for making OpenKiwi available!
Can any of the OpenKiwi models be trained when the only data available is [source sentence], [machine translation sentence], [TER score]? As far as I can tell, all the examples need more data, for example the tags. But maybe I missed something.
Thanks,
James
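A minimal sketch of what sentence-level-only training data might look like, assuming the three inputs are line-aligned plain-text files (one sentence or one TER score per line, as in the WMT quality estimation data). The helper `load_triples` is hypothetical and not part of OpenKiwi; it only checks that the three inputs line up before you point a config file at them.

```python
from typing import Iterable, List, Tuple


def load_triples(
    src_lines: Iterable[str],
    mt_lines: Iterable[str],
    ter_lines: Iterable[str],
) -> List[Tuple[str, str, float]]:
    """Pair source, MT and TER lines (hypothetical helper, not OpenKiwi API)."""
    src = [line.rstrip("\n") for line in src_lines]
    mt = [line.rstrip("\n") for line in mt_lines]
    ter = [line.strip() for line in ter_lines]

    # Every source sentence needs exactly one translation and one score.
    if not (len(src) == len(mt) == len(ter)):
        raise ValueError(
            f"line counts differ: src={len(src)}, mt={len(mt)}, ter={len(ter)}"
        )

    triples = []
    for number, (s, m, t) in enumerate(zip(src, mt, ter), start=1):
        score = float(t)  # raises ValueError if the line is not a number
        if score < 0:
            # TER is an edit rate: it can exceed 1 but is never negative.
            raise ValueError(f"line {number}: negative TER {score}")
        triples.append((s, m, score))
    return triples
```

Since the function accepts any iterable of lines, you can pass it three open file handles directly (the file names you use are up to you).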

@captainvera
Contributor

Hi James!

This wasn't something we originally considered, which is why all of the examples assume you have more data. However, we did get some requests for it and added that functionality, so you can train with just src, mt and TER.

To do so you need to specify the following in your config file (besides the rest of the parameters):

sentence-level: True
predict-gaps: False
predict-target: False
predict-source: False

Keep in mind that there are some config options based on word-level tags that might not work when training just for sentence-level.

Let us know if you find any errors while training only with sentences!
Miguel
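The flags above can also be applied programmatically. This is a minimal sketch, assuming the config file has already been loaded into a plain Python dict; the helper names are hypothetical, and the `"tags" in key` check is only a rough, assumed way of spotting the word-level options mentioned above, not a list taken from OpenKiwi.

```python
from typing import Dict, List

# Flags from the comment above for training on src, mt and TER only.
SENTENCE_ONLY_FLAGS: Dict[str, bool] = {
    "sentence-level": True,
    "predict-gaps": False,
    "predict-target": False,
    "predict-source": False,
}


def to_sentence_level_only(config: Dict[str, object]) -> Dict[str, object]:
    """Return a copy of config with the sentence-level-only flags applied."""
    updated = dict(config)
    updated.update(SENTENCE_ONLY_FLAGS)
    return updated


def possible_word_level_keys(config: Dict[str, object]) -> List[str]:
    """Hypothetical heuristic: list keys that mention word-level tags."""
    return sorted(key for key in config if "tags" in key)


def render(config: Dict[str, object]) -> str:
    """Write the config back out as simple `key: value` lines."""
    lines = []
    for key, value in config.items():
        if isinstance(value, bool):
            value = "True" if value else "False"
        lines.append(f"{key}: {value}")
    return "\n".join(lines)
```

Returning a copy leaves the loaded config untouched, so you can compare the two and review any remaining tag-based keys before writing the new file.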

@captainvera captainvera self-assigned this Nov 11, 2019
@captainvera captainvera added the question Further information is requested label Nov 11, 2019
@captainvera
Contributor

Closing this since there have been no updates, feel free to re-open if you have further questions!

@jimbogill
Author

Thanks Miguel. I successfully trained a predictor-estimator model on WMT data following your advice. I did this using a modified version of the config file in the experiments directory. In case it's useful, here are the modifications I made:

sentence-level: False (changed to True)
predict-target: true (changed to false)
train-target-tags: data/WMT17/word_level/train.tags (changed to train-sentence-scores)
valid-target-tags: data/WMT17/word_level/dev.tags (changed to valid-sentence-scores)
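These four changes can be sketched the same way, assuming the config is a plain Python dict. The sentence-score paths are placeholders, since the comment doesn't say which files were used, and `apply_sentence_level_changes` is a hypothetical helper, not part of OpenKiwi.

```python
from typing import Dict

# Word-level tag keys and the sentence-level keys that replaced them.
RENAMED_KEYS: Dict[str, str] = {
    "train-target-tags": "train-sentence-scores",
    "valid-target-tags": "valid-sentence-scores",
}


def apply_sentence_level_changes(
    config: Dict[str, object], score_paths: Dict[str, str]
) -> Dict[str, object]:
    """Return a copy of config with the modifications listed above."""
    updated = dict(config)
    updated["sentence-level"] = True
    updated["predict-target"] = False
    for old_key, new_key in RENAMED_KEYS.items():
        if old_key in updated:
            del updated[old_key]
            # score_paths must supply a file for every renamed key
            # (a missing entry raises KeyError).
            updated[new_key] = score_paths[new_key]
    return updated
```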
