Strange result when retrieving a not existing word embedding for a document #520

Closed
claudiogreco opened this issue Nov 10, 2015 · 5 comments
@claudiogreco

Hello,

After creating a doc2vec model using the code from the tutorial (Python 3.5, gensim 0.12.3), I mistakenly tried to retrieve a document that does not exist. Instead of an error, I received a strange numpy.ndarray of shape (1, 1235, 300), where 1235 is the number of documents and 300 is the embedding size. Why does this happen? Shouldn't an exception be raised? And how can we check whether a document is missing?

Thank you in advance,
Claudio

@gojomo
Collaborator

gojomo commented Nov 10, 2015

This is caused by an unfortunate interaction between our internal method that converts a string tag to an int index (which returns None when the tag is not present) and numpy array indexing, which returns that nested 1xCOUNTxSIZE result when passed None.
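
The interaction described above can be reproduced with numpy alone, because numpy treats None in an index as np.newaxis and prepends a length-1 axis. This is a minimal sketch: `doc_vectors` and `tag_to_index` are hypothetical stand-ins for illustration, not gensim's internal names.

```python
import numpy as np

# Stand-in for the trained document vectors: 1235 documents, 300 dimensions.
doc_vectors = np.zeros((1235, 300), dtype=np.float32)

# Hypothetical tag-to-index mapping; dict.get returns None for an unknown tag.
tag_to_index = {"doc_0": 0, "doc_1": 1}

index = tag_to_index.get("missing_doc")  # None, because the tag is absent
result = doc_vectors[index]              # None behaves like np.newaxis here

print(result.shape)  # (1, 1235, 300)
```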

Was a KeyError what you'd expected?

Until that's fixed, you can check whether a tag is in the trained set with key in model.docvecs.
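
One way to apply that membership check is a small guard around the lookup. This is a sketch only: `lookup_docvec` is a hypothetical helper, and a plain dict stands in for `model.docvecs` so the example runs without a trained model. It assumes nothing beyond support for `in` and `[]`.

```python
def lookup_docvec(docvecs, tag, default=None):
    """Return docvecs[tag] if the tag is known, otherwise `default`."""
    if tag in docvecs:
        return docvecs[tag]
    return default


# A dict stands in for model.docvecs here (assumption for illustration).
fake_docvecs = {"doc_0": [0.1, 0.2, 0.3]}

print(lookup_docvec(fake_docvecs, "doc_0"))        # the stored vector
print(lookup_docvec(fake_docvecs, "missing_doc"))  # None
```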

gojomo self-assigned this Nov 10, 2015
@claudiogreco
Author

OK, I'll use your suggestion to check whether a tag is in the trained set for now. Thank you for your help.

@tmylk
Contributor

tmylk commented Jan 9, 2016

@gojomo should this be closed, since a workaround exists?

@gojomo
Collaborator

gojomo commented Jan 10, 2016

@tmylk no, there should really be a KeyError (rather than a giant nested array) when the requested key isn't present. I'll prep a fix before next week.
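
The behavior asked for here could look roughly like the following. This is a sketch under stated assumptions, not gensim's actual implementation or the code of the eventual fix: `TinyDocvecs` and its attributes are hypothetical names used only to show the None check before the array is indexed.

```python
import numpy as np


class TinyDocvecs:
    """Hypothetical tag-addressable vector store (not gensim's class)."""

    def __init__(self, tags, vectors):
        self.index_of = {tag: i for i, tag in enumerate(tags)}
        self.vectors = vectors

    def __contains__(self, tag):
        return tag in self.index_of

    def __getitem__(self, tag):
        index = self.index_of.get(tag)
        if index is None:
            # Fail loudly instead of passing None through to numpy indexing.
            raise KeyError("tag {!r} not seen in training corpus".format(tag))
        return self.vectors[index]


docvecs = TinyDocvecs(["doc_0", "doc_1"], np.zeros((2, 300)))
print(docvecs["doc_1"].shape)    # (300,)
print("missing_doc" in docvecs)  # False
```

With this check in place, `docvecs["missing_doc"]` raises KeyError rather than returning the whole array with an extra leading axis.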

gojomo added a commit that referenced this issue Jan 12, 2016
gojomo added a commit that referenced this issue Jan 16, 2016
fix for #520: raise KeyError when no matching doctag
@gojomo
Collaborator

gojomo commented Jan 16, 2016

Fixed by #582.

gojomo closed this as completed Jan 16, 2016