load_word2vec_format(): new parameter to skip init_sims() call #545

svenkreiss · 2015-11-24T16:55:31Z

In certain use cases (custom doc2vec-type computations), only unnormalized vectors are used. The init_sims() call at the end of load_word2vec_format() takes a lot of memory (even with norm_only=True) and is unnecessary in this scenario. This PR makes it possible to skip that call, which improves performance.
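To make the trade-off concrete, here is a minimal pure-Python sketch of what an `init_sims` switch on a loader does. It assumes the word2vec text format (a `<vocab_size> <dim>` header line, then one `<word> <v1> ... <vdim>` line per word). `load_word2vec_text` and `_unit` are hypothetical helpers, not gensim's API, and plain lists stand in for the numpy arrays gensim actually uses.

```python
import math


def _unit(vec):
    """Return an L2-normalized copy of vec (zero vectors are copied unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def load_word2vec_text(lines, init_sims=True):
    """Hypothetical loader (not gensim's API) with an init_sims switch.

    Returns (vocab, syn0, syn0norm); syn0norm is None when init_sims=False.
    """
    header = lines[0].split()
    vocab_size, dim = int(header[0]), int(header[1])
    vocab, syn0 = {}, []
    for line in lines[1:1 + vocab_size]:
        parts = line.split()
        word, vec = parts[0], [float(x) for x in parts[1:]]
        if len(vec) != dim:
            raise ValueError("expected %d values for %r, got %d" % (dim, word, len(vec)))
        vocab[word] = len(syn0)   # word -> row index into syn0
        syn0.append(vec)          # raw, unnormalized vector
    syn0norm = None
    if init_sims:
        # A second, normalized copy of every vector: this extra allocation
        # is the cost that init_sims=False lets the caller avoid.
        syn0norm = [_unit(v) for v in syn0]
    return vocab, syn0, syn0norm


lines = ["2 2", "cat 3.0 4.0", "dog 0.0 2.0"]
vocab, syn0, syn0norm = load_word2vec_text(lines, init_sims=False)
```

With `init_sims=False` the caller gets only the raw `syn0` rows; the default path additionally materializes the normalized copy.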


piskvorky · 2015-11-25T02:51:14Z

Sounds useful and clean, +1. A unit test for this new parameter would be useful.

Let's wait for @gojomo review & then merge.

piskvorky · 2015-11-25T02:52:39Z

Also @svenkreiss , can you commit a brief description of this change in CHANGELOG.txt?


svenkreiss · 2015-11-25T16:21:18Z

@piskvorky thanks for the comments. Done.

piskvorky · 2015-11-25T23:20:02Z

Perfect, thanks a lot @svenkreiss !


gojomo · 2015-11-25T23:54:49Z

Looks fine, but another, more radical simplification that reduces parameters & codepaths would also be worth considering: just don't do any automatic norming in load_word2vec_format(). (That is, remove the init_sims() call rather than make it switchable.)

Then the (syn0) result of a load is really just a load, not a load-and-do-other-stuff. This would also be roughly consistent with the syn0 state after native gensim training: you have the raw vectors, and what you do with them next is up to your explicit further steps.

If the user starts making similarity calls, syn0norm would be automatically backfilled, as with natively trained vectors, but someone who doesn't need it would simply never trigger it.

If they instead want to convert to a compact, normed-only model, they'd call init_sims(norm_only=True) themselves, just as if they'd trained the vectors with gensim (rather than merely loaded them).
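The proposal above can be sketched as follows. `RawVectors` is a hypothetical, pure-Python stand-in for a gensim model (plain lists instead of numpy arrays); only the names `syn0`, `syn0norm`, `init_sims` and `norm_only` come from this discussion, and the method bodies are assumptions, not gensim's implementation.

```python
import math


class RawVectors:
    """Hypothetical sketch: a load keeps only raw vectors; norming is deferred."""

    def __init__(self, vocab, syn0):
        self.vocab = vocab      # word -> row index
        self.syn0 = syn0        # raw vectors, exactly as loaded
        self.syn0norm = None    # deliberately not computed at load time

    @staticmethod
    def _unit(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec] if norm else list(vec)

    def init_sims(self, norm_only=False):
        """Compute unit-length vectors; with norm_only=True, drop the raw ones."""
        if self.syn0norm is None:
            self.syn0norm = [self._unit(v) for v in self.syn0]
        if norm_only:
            # Keep a single normalized copy and release the raw vectors.
            self.syn0 = self.syn0norm

    def similarity(self, w1, w2):
        """Cosine similarity; backfills syn0norm lazily on first use."""
        self.init_sims()
        a = self.syn0norm[self.vocab[w1]]
        b = self.syn0norm[self.vocab[w2]]
        return sum(x * y for x, y in zip(a, b))


model = RawVectors({"cat": 0, "dog": 1}, [[3.0, 4.0], [0.0, 2.0]])
print(model.syn0norm is None)             # nothing normed yet
print(model.similarity("cat", "dog"))     # first similarity call triggers norming
```

In this sketch a load just builds `RawVectors(vocab, syn0)`: the first `similarity()` call pays for the normalized copy, a caller that only needs raw vectors never does, and `init_sims(norm_only=True)` is the explicit opt-in for a compact, normed-only model.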

piskvorky · 2015-11-26T01:45:27Z

Oh yes, that makes sense too. I actually like @gojomo 's option better -- simpler is better.

svenkreiss · 2015-11-26T12:13:00Z

I also agree with @gojomo. I can prepare a PR that removes the init_sims and norm_only parameters next week.

This would be a backwards-incompatible API change and would alter the default behavior of this function.

piskvorky · 2015-11-26T13:03:47Z

Yes, we'll need a prominent warning in CHANGELOG :)

Thanks again @svenkreiss , you're really helpful!

svenkreiss added 2 commits November 12, 2015 09:12

simply comment out L2 norm calculation for now

558cb71

load_word2vec_format(): add parameter init_sims (default true) which skips the init_sims call when False

6dba02b

svenkreiss added 3 commits November 25, 2015 09:20

changelog entry: new parameter to skip init_sims() call in load_word2vec_format()

8863005

Merge branch 'develop' into skip-l2-norm-calc

0196366

Conflicts: CHANGELOG.txt

unittest for init_sims=False in load_word2vec_format()

34a9da9

piskvorky added a commit that referenced this pull request Nov 25, 2015

Merge pull request #545 from svenkreiss/skip-l2-norm-calc

5535fcf

load_word2vec_format(): new parameter to skip init_sims() call

piskvorky merged commit 5535fcf into piskvorky:develop Nov 25, 2015

svenkreiss deleted the skip-l2-norm-calc branch November 30, 2015 16:06

svenkreiss mentioned this pull request Nov 30, 2015

load_word2vec_format(): remove init_sims() call #555

Merged
