
Question of out of vocabulary words #733

Closed
gmramaswamy opened this issue May 23, 2018 · 1 comment

@gmramaswamy

Hello,
I am just getting started using OpenNMT-py for translation; pardon me if I have missed something basic here.
The issue I am facing concerns the handling of OOV words. My requirement is that OOV words be passed back exactly as they appear in the source, without translation, but I am unable to achieve that.
From what I could see in past issues, one recommendation was to preprocess the data with two additional parameters, -dynamic_dict and -share_vocab, and to run translation with -replace_unk. This did not solve the problem.
I also tried preprocessing with the above-mentioned parameters while also passing -copy_attn during model training (roughly the pipeline sketched below). One past ticket mentioned that this is not needed, but I tried it anyway. It did not work either.
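For reference, this is roughly the pipeline described above, using OpenNMT-py's standard preprocess.py/train.py/translate.py entry points; all file names and the checkpoint name are placeholders, not the exact commands from my run:

```
# Preprocess: build a shared vocabulary and the dynamic dict needed for copying
python preprocess.py -train_src src-train.txt -train_tgt tgt-train.txt \
    -valid_src src-val.txt -valid_tgt tgt-val.txt \
    -save_data data/demo -dynamic_dict -share_vocab

# Train with the copy-attention mechanism
python train.py -data data/demo -save_model demo-model -copy_attn

# Translate, replacing <unk> outputs using attention
python translate.py -model model.pt -src src-test.txt \
    -output pred.txt -replace_unk
```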

Can someone please help me understand what I need to do to ensure that OOV words are simply passed through without translation?

Thanks,
Ganesh

@tulongthienvu

I don't have any experience with -dynamic_dict and -share_vocab, so I will just explain what happens when you use -replace_unk.

A small question before I explain the -replace_unk mechanism: how can a translation model know exactly which word in the source sentence corresponds to an OOV word in the target sentence? A human can tell this easily, but it is very hard for the model.
-replace_unk in OpenNMT works based on the result of the attention mechanism. During translation, whenever the model emits an unknown word, it looks at the attention distribution of that unknown word over the source words and picks the source word with the maximum attention weight (see the sketch below).
As explained above, -replace_unk did not copy the source words exactly as you expected because the attention weights learned by the model were not good enough.
If you want to guarantee that the model behaves as you expect, you need to give it word-alignment information between source and target sentences, but that takes a lot of effort.
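Here is a minimal Python sketch of that idea (not OpenNMT-py's actual implementation); the function name and the toy sentences and weights are made up for illustration:

```python
import numpy as np

def replace_unk(src_tokens, tgt_tokens, attn):
    """attn[j][i] is the attention weight of decoding step j on source token i."""
    out = []
    for j, tok in enumerate(tgt_tokens):
        if tok == "<unk>":
            # Copy the source token that received the most attention at this step.
            out.append(src_tokens[int(np.argmax(attn[j]))])
        else:
            out.append(tok)
    return out

# Toy example: the model attends mostly to "Ganesh" when it emits <unk>.
src = ["hello", "Ganesh", "!"]
tgt = ["bonjour", "<unk>", "!"]
attn = np.array([[0.8, 0.1, 0.1],
                 [0.2, 0.7, 0.1],
                 [0.1, 0.1, 0.8]])
print(replace_unk(src, tgt, attn))  # ['bonjour', 'Ganesh', '!']
```

If the attention at the <unk> step is diffuse or points at the wrong source token, the copied word will be wrong, which is exactly the failure mode you are seeing.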

Hope my explanation helps you.

@vince62s closed this as completed Aug 3, 2018