Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LsiModel: Only log top words that actually exist in the dictionary #3091

Merged
merged 5 commits into from
Apr 9, 2021

Conversation

kmurphy4
Copy link
Contributor

In some pathological cases, we might try to log the top N words, even
though we haven't seen N words yet. In these cases, we can just exit
the loop early.

Closes #3090.

@kmurphy4
Copy link
Contributor Author

@piskvorky should I add any documentation or test cases?

@piskvorky
Copy link
Owner

Thanks! Maybe add a code comment that links to #3090.

So that future people reading the code are not confused or don't accidentally delete that code, thinking it's useless.

In some pathological cases, we might try to log the top N words, even
though we haven't seen N words yet.  In these cases, we can just exit
the loop early.

Closes piskvorky#3090.
@kmurphy4
Copy link
Contributor Author

Ok, force-pushed a comment. Thanks for being so responsive 😄

In 8f8cb49, I added a check that checks

    val in self.id2word

When testing, `id2word` is actually an instance of `FakeDict`, which
doesn't implement `__contains__()` (so Python falls back to calling
`__getitem__()`[1]).  The tests didn't like this[2].

[1] https://docs.python.org/3.6/reference/datamodel.html#object.__contains__
[2] https://github.com/RaRe-Technologies/gensim/runs/2197137529
@piskvorky piskvorky changed the title lsimodel: Only log top words that actually exist in <id2word> LsiModel: Only log top words that actually exist in the dictionary Mar 25, 2021
@kmurphy4
Copy link
Contributor Author

kmurphy4 commented Apr 9, 2021

@piskvorky anything else you need from me here? No rush, just trying to make sure it doesn't get lost in the shuffle 😄

@piskvorky
Copy link
Owner

piskvorky commented Apr 9, 2021

Thanks, a reminder is definitely a good idea :)

PR already approved by me; the next step is a review by @mpenkov, who will merge if all OK.

@piskvorky piskvorky requested a review from mpenkov April 9, 2021 10:48
@mpenkov
Copy link
Collaborator

mpenkov commented Apr 9, 2021

Looks good to me! Thank you @kmurphy4 !

@mpenkov mpenkov merged commit 840df94 into piskvorky:develop Apr 9, 2021
@kmurphy4
Copy link
Contributor Author

kmurphy4 commented Apr 9, 2021

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

KeyError in LsiModel constructor
3 participants