Conversation
Job PR-153/1 is complete.
"\n", | ||
"def get_knn(vocab, k, word):\n", | ||
" word_vec = vocab.embedding[word].reshape((-1, 1))\n", | ||
" vocab_vecs = norm_vecs_by_row(vocab.embedding.idx_to_vec)\n", | ||
" dot_prod = nd.dot(vocab_vecs, word_vec)\n", | ||
" indices = nd.topk(dot_prod.reshape((len(vocab), )), k=k+5, ret_typ='indices')\n", |
We need to keep k+5 to avoid retrieving special tokens.
The reason why special tokens were retrieved is that their word vectors were nan and the topk operator considered nan as the highest ranking. With the changes here the special tokens will not be among the top-k elements; consequently, instead of k + 5 we only need k + 1 to exclude the input token.
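For illustration, a minimal sketch of the k + 1 variant discussed here (assuming the notebook's norm_vecs_by_row no longer yields nan for the zero-vector special tokens; the names mirror the notebook but this is not the exact patch):

from mxnet import nd

def get_knn(vocab, k, word):
    word_vec = vocab.embedding[word].reshape((-1, 1))
    vocab_vecs = norm_vecs_by_row(vocab.embedding.idx_to_vec)
    dot_prod = nd.dot(vocab_vecs, word_vec)
    # k + 1: the query word is always its own nearest neighbour, and the
    # special tokens no longer rank on top once their vectors are not nan.
    indices = nd.topk(dot_prod.reshape((len(vocab),)), k=k + 1, ret_typ='indices')
    indices = [int(i.asscalar()) for i in indices]
    # Drop only the input token itself.
    return vocab.to_tokens(indices[1:])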
With mxnet 1.3, however, the topk operator gets confused by the nan values and returns some 'random' words instead of the top-k words if nan is present in the input.
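A minimal sketch of the regression (the exact misbehaviour depends on the mxnet version; see apache/mxnet#11271):

from mxnet import nd

scores = nd.array([0.2, float('nan'), 0.9, 0.5])
# With the regression present, the nan entry can be ranked on top or the
# ordering can come back scrambled instead of the true top-2 indices (2, 3).
print(nd.topk(scores, k=2, ret_typ='indices'))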
OK. Do you know why nan is returned for the special tokens? Is it because they are initialized as zero vectors, so the denominator of the cosine similarity is 0?
Yes, they are initialized as 0, and consequently there was a division by 0 in norm_vecs_by_row.
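A sketch of the epsilon fix being discussed (the epsilon value here is illustrative; the exact constant is in this PR's diff):

from mxnet import nd

def norm_vecs_by_row(x, eps=1e-10):
    # eps keeps all-zero rows (the special tokens) from dividing by zero
    # and turning into nan during normalization.
    return x / nd.sqrt(nd.sum(x * x, axis=1) + eps).reshape((-1, 1))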
" indices = [int(i.asscalar()) for i in indices]\n", | ||
" # Remove unknown and input tokens.\n", | ||
" return vocab.to_tokens(indices[5:])" |
Same as above. Besides, please update.
Otherwise, LGTM.
Started review by accident
"\n", | ||
"def get_knn(vocab, k, word):\n", | ||
" word_vec = vocab.embedding[word].reshape((-1, 1))\n", | ||
" vocab_vecs = norm_vecs_by_row(vocab.embedding.idx_to_vec)\n", | ||
" dot_prod = nd.dot(vocab_vecs, word_vec)\n", | ||
" indices = nd.topk(dot_prod.reshape((len(vocab), )), k=k+5, ret_typ='indices')\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they are initialized as 0 and consequently there was a division by 0 in norm_vecs_by_row
The get_top_k_by_analogy in your last comment doesn't work due to some missing functionality in TokenEmbedding. Please confirm that you are fine with ed0b9e5.
LGTM
Job PR-153/5 is complete.
* Workaround mxnet nd.topk regression apache/mxnet#11271
* Simplify get_top_k_by_analogy
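For context, a rough sketch of what a simplified get_top_k_by_analogy could look like, reusing the fixed norm_vecs_by_row above (a hypothetical reconstruction, not necessarily the code merged in ed0b9e5):

from mxnet import nd

def get_top_k_by_analogy(vocab, k, word1, word2, word3):
    # Solve "word1 is to word2 as word3 is to ?" via vector arithmetic.
    word_vecs = vocab.embedding[word1, word2, word3]
    word_diff = (word_vecs[1] - word_vecs[0] + word_vecs[2]).reshape((-1, 1))
    vocab_vecs = norm_vecs_by_row(vocab.embedding.idx_to_vec)
    dot_prod = nd.dot(vocab_vecs, word_diff)
    indices = nd.topk(dot_prod.reshape((len(vocab),)), k=k, ret_typ='indices')
    return vocab.to_tokens([int(i.asscalar()) for i in indices])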
apache/mxnet#11271
Description
The behavior of nd.topk under the presence of nan values changed, causing wrong results in the pretrained word embeddings notebook, due to norm_vecs_by_row(x) inducing nan values for 0 word vectors. This PR adds a small epsilon to norm_vecs_by_row(x) so that nan values are avoided.