load_word2vec_format(): new parameter to skip init_sims() call #545
…skips the init_sims call when False
Sounds useful and clean, +1. A unit test for this new parameter would be useful. Let's wait for @gojomo's review and then merge.
Also @svenkreiss, can you commit a brief description of this change in
@piskvorky thanks for the comments. Done.
Perfect, thanks a lot @svenkreiss!
Looks fine, but another more radical simplification that reduces parameters & codepaths would also be worth considering: just don't do any automatic norming in the Then the ( If the user starts making similarity calls, If they instead want to convert to a compact, normed-only model, they'd call
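As far as the surviving text of this comment shows, the idea is to skip norming at load time and let it happen only when similarity calls begin. A minimal sketch of that lazy pattern in plain Python follows; `LazyToyVectors` and `_unit` are hypothetical names for illustration and are not gensim's implementation.

```python
import math


def _unit(vec):
    """Return an L2-normalized copy of vec (hypothetical helper)."""
    length = math.sqrt(sum(x * x for x in vec))
    return [x / length for x in vec] if length else list(vec)


class LazyToyVectors:
    """Hypothetical model that defers normalization until it is needed."""

    def __init__(self, vectors):
        self.vectors = vectors  # raw, unnormalized vectors keyed by word
        self.normed = None      # stays None until a similarity call needs it

    def init_sims(self):
        # Build unit-length copies of all vectors.
        self.normed = {word: _unit(vec) for word, vec in self.vectors.items()}

    def similarity(self, a, b):
        # The first similarity call triggers norming; loading never does.
        if self.normed is None:
            self.init_sims()
        return sum(x * y for x, y in zip(self.normed[a], self.normed[b]))


model = LazyToyVectors({"cat": [3.0, 4.0], "dog": [4.0, 3.0]})
norms_before = model.normed          # nothing has been normalized yet
cosine = model.similarity("cat", "dog")
```

With this shape, callers who only ever read `vectors` never pay for the normalized copies, and no extra loader parameter is needed.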
Oh yes, that makes sense too. I actually like @gojomo's option better -- simpler is better.
I also agree with @gojomo. I can prepare a PR next week that removes the init_sims and norm_only parameters. This would be a backwards-incompatible API change and would change the default behavior of this function.
Yes, we'll need a prominent warning in CHANGELOG :) Thanks again @svenkreiss, you're really helpful!
In certain use cases (custom doc2vec-type computations) only unnormalized vectors are used. The `init_sims()` call at the end of `load_word2vec_format` takes a lot of memory (even with `norm_only=True`) and is unnecessary in this scenario. This PR allows skipping the call, which improves performance.
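To make the trade-off concrete, here is a minimal sketch in plain Python rather than gensim's actual code. `ToyVectors`, `load_toy_format`, and the simple `word v1 v2 ...` text format are hypothetical stand-ins; only the `init_sims` name is taken from this PR's discussion, and the exact signature in gensim may differ.

```python
import math


class ToyVectors:
    """Hypothetical stand-in for a word-vector model (not gensim's class)."""

    def __init__(self, vectors):
        self.vectors = vectors  # raw, unnormalized vectors keyed by word
        self.normed = None      # L2-normalized copies, built by init_sims()

    def init_sims(self):
        # In this toy version, norming builds a second, unit-length copy of
        # every vector, which is where the extra memory and time go.
        self.normed = {}
        for word, vec in self.vectors.items():
            length = math.sqrt(sum(x * x for x in vec))
            self.normed[word] = [x / length for x in vec] if length else list(vec)


def load_toy_format(lines, init_sims=True):
    """Parse 'word v1 v2 ...' lines; skip normalization when init_sims is False."""
    vectors = {}
    for line in lines:
        word, *values = line.split()
        vectors[word] = [float(v) for v in values]
    model = ToyVectors(vectors)
    if init_sims:
        model.init_sims()
    return model


data = ["cat 3.0 4.0", "dog 1.0 0.0"]
raw_only = load_toy_format(data, init_sims=False)  # no normalized copy built
with_norms = load_toy_format(data)                 # default: norms computed
```

A caller that only needs the raw vectors would pass `init_sims=False` and read `raw_only.vectors` directly.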