Out of vocabulary word vectors #25

Closed
Henry-E opened this issue Jun 18, 2017 · 4 comments


Henry-E commented Jun 18, 2017

I'm interested in implementing something similar in a different framework, and I was wondering how out-of-vocabulary word vectors are handled. From the code I can see that each example gets a vocab size of 50k + the number of unknown words in the article. I'm not super familiar with TensorFlow, so I was wondering what the out-of-vocabulary word vectors look like. Is the vocab size really 50k + the maximum number of unknown words in an article? Or are the word vectors for the unknowns set to zero / randomised for every example?


abisee commented Jun 20, 2017

Hi @Henry-E,

Have you read the paper? This is a question that would be easier to answer by looking at the paper than the code.

For the pointer-generator model, OOV words in the source text are represented by the word vector for the UNK token (because as they're OOV, there is no word vector to use). In the paper and the code, we refer to e.g. "50k + number of unknown words in the article" as the extended vocabulary because that's the set of words that can be produced by the decoder. This doesn't mean that all the words in the extended vocabulary have word vectors, though. Only the words in the original vocabulary have word vectors.
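
For anyone porting this to another framework, here is a minimal sketch of that id-mapping step (plain NumPy with made-up names; the repo's actual TensorFlow code differs): before the embedding lookup, any id beyond the fixed vocabulary is replaced with the UNK id, since only the original 50k words have trained vectors.

```python
import numpy as np

VOCAB_SIZE = 50000   # fixed vocabulary with trained embeddings
UNK_ID = 0           # assumed id of the UNK token (illustrative)

def ids_for_embedding(extended_ids):
    """Map extended-vocab ids (>= VOCAB_SIZE) back to UNK before the
    embedding lookup, since article OOVs have no word vectors."""
    ids = np.asarray(extended_ids)
    return np.where(ids >= VOCAB_SIZE, UNK_ID, ids)

# An article whose two OOVs were given temporary ids 50000 and 50001:
print(ids_for_embedding([12, 50000, 731, 50001]))  # -> [ 12   0 731   0]
```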

Hope this answers your question.


Henry-E commented Jun 21, 2017

Yep, that sounds good. I was just trying to figure out how the probability distribution over the extended vocabulary is obtained in practice. For some reason I thought that, in order to attend properly, the attention mechanism needed to be able to distinguish between unique OOV words.
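
For later readers, a rough single-example sketch of how that extended distribution can be built (plain NumPy with illustrative names, not the repo's TensorFlow code): the generator's vocab distribution is zero-padded to cover the article's OOV ids, and the attention weights are scatter-added onto the source tokens' extended ids, which is equation (9) in the paper.

```python
import numpy as np

def extended_distribution(p_gen, vocab_dist, attn_dist, src_ext_ids, num_oov):
    """P(w) = p_gen * P_vocab(w) + (1 - p_gen) * attention mass on the
    source positions where w occurs (equation 9, for one example)."""
    # Zero-pad so the distribution covers the article's temporary OOV ids.
    extended = np.concatenate([p_gen * vocab_dist, np.zeros(num_oov)])
    # Scatter-add copy probabilities; repeated source words accumulate.
    np.add.at(extended, np.asarray(src_ext_ids),
              (1.0 - p_gen) * np.asarray(attn_dist))
    return extended
```

Note that repeated occurrences of the same OOV word share one extended id, so their attention mass accumulates on a single entry; the attention itself never needs to distinguish unique OOV words.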

Henry-E closed this as completed Jun 21, 2017

abisee commented Jun 21, 2017

@Henry-E re: distinguishing between unique OOV words, also check out the discussion in the comments started by RobinChen here.


Henry-E commented Jun 21, 2017

Ah OK. The final bit I was also confused about, for anyone who comes across this later, was how the loss is calculated. I subsequently thought that maybe the copy mechanism was only applied at test time, but it appears the loss is calculated over the extended vocabulary. As the paper says:

> The loss function is as described in equations (6) and (7), but with respect to our modified probability distribution P(w) given in equation (9).
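
Under the same assumptions as the sketches above (illustrative names, not the repo's code), that training loss just gathers P(w) at each target word's extended-vocab id, so copied OOV targets contribute gradient too:

```python
import numpy as np

def sequence_loss(final_dists, target_ext_ids, pad_mask, eps=1e-10):
    """Equations (6)-(7) applied to P(w) from equation (9): average
    -log P(w*_t) over the decoder steps, ignoring padding."""
    steps = np.arange(len(target_ext_ids))
    gold_probs = final_dists[steps, target_ext_ids]  # P(w*_t) at each step
    losses = -np.log(gold_probs + eps) * pad_mask    # zero out padded steps
    return losses.sum() / pad_mask.sum()
```

Because the target ids use the extended vocabulary, the copy mechanism is trained end to end rather than only being applied at test time.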
