Getting an empty gzipped phrase table #1

Closed
gvskalyan opened this issue Mar 20, 2020 · 1 comment

@gvskalyan gvskalyan commented Mar 20, 2020

Hi @jsenellart @srush,
What N value should be passed to Docker when creating the phrase table? Both omitting it and passing a large value such as 128 (assuming a phrase contains about 10 words of around 10 characters each) produce an empty file after more than 40 minutes on a 16-core machine.

Input: training files containing SentencePiece-tokenized sentences.

While processing: [screenshot not preserved]

Ref: https://github.com/OpenNMT/papers/tree/master/WNMT2018/vmap#building-phrase-table

Member

@guillaumekln guillaumekln commented Mar 24, 2020

For reference, this was discussed on the forum: https://forum.opennmt.net/t/get-vmap-from-the-corpus-to-be-used-in-ctranslate2/3573

There were some issues with the training data (mostly empty lines). This is fixed by c44b9ff, which adds basic filtering.
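For anyone who wants to clean their corpus before building the phrase table, the kind of filtering described above can be sketched as follows. This is only an illustration of dropping sentence pairs where either side is empty. It is not the code from c44b9ff, and the function names (`keep_pair`, `filter_pairs`, `filter_files`) and file arguments are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def keep_pair(src: str, tgt: str) -> bool:
    """Return True when both sides contain at least one non-whitespace character."""
    return bool(src.strip()) and bool(tgt.strip())


def filter_pairs(
    src_lines: Iterable[str], tgt_lines: Iterable[str]
) -> Iterator[Tuple[str, str]]:
    """Yield stripped (source, target) pairs, skipping pairs with an empty side.

    Note: zip() stops at the shorter input, so files of unequal length
    are silently truncated to the common prefix.
    """
    for src, tgt in zip(src_lines, tgt_lines):
        if keep_pair(src, tgt):
            yield src.strip(), tgt.strip()


def filter_files(src_in: str, tgt_in: str, src_out: str, tgt_out: str) -> int:
    """Write the filtered corpus to new files and return the number of pairs kept.

    All four paths are hypothetical; the inputs are expected to be
    line-aligned, already tokenized training files.
    """
    kept = 0
    with open(src_in, encoding="utf-8") as fs, \
         open(tgt_in, encoding="utf-8") as ft, \
         open(src_out, "w", encoding="utf-8") as out_s, \
         open(tgt_out, "w", encoding="utf-8") as out_t:
        for src, tgt in filter_pairs(fs, ft):
            out_s.write(src + "\n")
            out_t.write(tgt + "\n")
            kept += 1
    return kept
```

Filtering both files together keeps them line-aligned. Removing empty lines from each file separately would shift the sentence pairs out of alignment.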
