Out of vocabulary word vectors #25
Hi @Henry-E, have you read the paper? This is a question that's easier to answer by looking at the paper than at the code. For the pointer-generator model, OOV words in the source text are represented by the word vector for the UNK token (because, as they're OOV, there is no word vector to use). In the paper and the code, we refer to e.g. "50k + number of unknown words in the article" as the extended vocabulary because that's the set of words that can be produced by the decoder. This doesn't mean that all the words in the extended vocabulary have word vectors, though; only the words in the original vocabulary have word vectors. Hope this answers your question.
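To make the distinction concrete, here's a minimal sketch of the preprocessing idea described above (variable and function names are illustrative, not taken from this repo): each source word gets one id used for the embedding lookup, where every OOV word collapses to UNK, and one id in the extended vocabulary, where each distinct OOV word in the article gets a temporary slot so the copy mechanism can point to it.

```python
# Illustrative sketch only; ids and names are assumptions, not the repo's code.
VOCAB_SIZE = 50_000
UNK_ID = 0  # assumed position of the UNK token in the fixed vocabulary

def article_to_ids(article_words, word2id):
    enc_ids = []       # ids fed to the embedding lookup (OOVs -> UNK_ID)
    enc_ids_ext = []   # ids used by the copy distribution (OOVs -> 50000, 50001, ...)
    article_oovs = []  # this article's OOV words, in order of first appearance
    for w in article_words:
        if w in word2id:
            enc_ids.append(word2id[w])
            enc_ids_ext.append(word2id[w])
        else:
            enc_ids.append(UNK_ID)  # no word vector exists for this word
            if w not in article_oovs:
                article_oovs.append(w)
            enc_ids_ext.append(VOCAB_SIZE + article_oovs.index(w))
    return enc_ids, enc_ids_ext, article_oovs
```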
Yep, that sounds good. I was just trying to figure out how the probability distribution over the extended vocabulary is obtained in practice. For some reason I thought that, in order to attend properly, the attention mechanism needed to be able to distinguish between unique OOV words.
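For anyone with the same question, here's a hedged NumPy sketch of how that distribution can be formed, following the shape of the paper's final distribution: the generation probability p_gen weights the fixed-vocabulary softmax, and 1 − p_gen weights the attention-based copy distribution, which is scattered onto extended-vocab ids. The function and argument names are mine, not the repo's.

```python
import numpy as np

def final_distribution(p_vocab, attention, enc_ids_ext, p_gen, num_article_oovs):
    """Combine generation and copy probabilities over the extended vocabulary.

    p_vocab:          softmax over the fixed vocabulary, shape (vocab_size,)
    attention:        attention weights over source positions, shape (src_len,)
    enc_ids_ext:      extended-vocab id of each source token, shape (src_len,)
    p_gen:            scalar in [0, 1], probability of generating vs. copying
    num_article_oovs: number of distinct OOV words in this article
    """
    p_vocab = np.asarray(p_vocab, dtype=float)
    attention = np.asarray(attention, dtype=float)
    enc_ids_ext = np.asarray(enc_ids_ext, dtype=int)

    p_final = np.zeros(len(p_vocab) + num_article_oovs)
    p_final[:len(p_vocab)] = p_gen * p_vocab          # "generate" portion
    # "copy" portion: scatter attention mass onto extended-vocab ids;
    # repeated source tokens accumulate their attention weights.
    np.add.at(p_final, enc_ids_ext, (1.0 - p_gen) * attention)
    return p_final
```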
Ah ok, and the final bit I was also confused about, for anyone who comes across this later, was how the loss is calculated. I had thought that maybe the copy mechanism was only applied at test time, but it appears that the loss is calculated over the extended vocabulary.
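Continuing the sketch above, the training loss can then be the average negative log-likelihood of the target words under that extended-vocabulary distribution; an OOV target is addressed by its temporary extended id, so its probability comes entirely from the copy part. Again an illustrative NumPy sketch, not the repo's implementation.

```python
import numpy as np

def sequence_loss(step_distributions, target_ids_ext, eps=1e-10):
    """Average negative log-likelihood of the target words.

    step_distributions: one extended-vocab distribution per decoder step
                        (e.g. outputs of final_distribution above)
    target_ids_ext:     target ids in the extended vocabulary; an OOV target
                        has an id >= vocab_size
    """
    losses = [-np.log(dist[t] + eps)  # eps guards against log(0)
              for dist, t in zip(step_distributions, target_ids_ext)]
    return float(np.mean(losses))
```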
I'm interested in implementing something similar in a different framework, and I was wondering how out-of-vocabulary word vectors are handled. From the code I can see that for each example there's a vocab size of 50k + the number of unknown words in the article. I'm not super familiar with TensorFlow, and I was wondering what the out-of-vocabulary word vectors look like. Is the vocab size really 50k + the maximum number of unknown words in an article? Or are the word vectors for the unknowns set to zero / randomized for every example?
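On the original question of what the OOV word vectors look like: per the answer above, the embedding table only needs rows for the fixed 50k vocabulary; there are no vectors for OOV words, and any extended id is mapped back to UNK before the lookup. A tiny illustrative sketch (sizes and names are assumptions, not the repo's code):

```python
import numpy as np

EMB_DIM, VOCAB_SIZE, UNK_ID = 128, 50_000, 0      # illustrative values
embedding = np.random.randn(VOCAB_SIZE, EMB_DIM)  # rows exist only for in-vocab words

def embed_source(enc_ids_ext):
    """Embed source tokens; any extended id (>= VOCAB_SIZE) falls back to UNK."""
    ids = [i if i < VOCAB_SIZE else UNK_ID for i in enc_ids_ext]
    return embedding[ids]  # shape (src_len, EMB_DIM)
```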