FastTextKeyedVectors.word_vec(...,use_norm=True) nonsensically summing ngrams norms #2667

Closed
gojomo opened this issue Nov 3, 2019 · 1 comment
gojomo commented Nov 3, 2019

The logic in https://github.com/RaRe-Technologies/gensim/blob/ee6169100d13d7f684b96ac137065e302aeb7b1e/gensim/models/keyedvectors.py#L2090 regarding the summing of normed ngram-vectors looks like nonsense to me.

A caller supplying use_norm expects (per the doc-comment) a unit-normed vector back. But summing a bunch of individually unit-normed ngram vectors won't achieve that: you'd need to unit-norm the final result. And it's unlikely you'd want unit-normed summands in the first place: the original raw ngram vectors are what training has optimized for composing word vectors from fragments.
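
For example, a quick numpy illustration (a toy sketch, not the actual gensim code paths):

```python
import numpy as np

# two toy "ngram vectors", each individually unit-normed
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

summed = a + b
print(np.linalg.norm(summed))  # ~1.414, not 1.0: the sum of unit vectors is not itself unit-length
```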

Of course, matching the FB reference implementation's behavior should be a top goal, but I doubt it performs any such summing of normed ngram vectors.

If this current behavior is replaced with the more sensible "add raw vecs, unit-normalize final result", then there's no need to ever calculate/store the vectors_ngrams_norm array.
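
Roughly, the replacement would look like this sketch (the function name and arguments here are simplified stand-ins, not the actual gensim internals):

```python
import numpy as np

def word_vec_from_ngrams(ngram_vectors, use_norm=False):
    """Sum the *raw* ngram vectors; unit-norm only the final result if requested."""
    result = np.sum(ngram_vectors, axis=0)
    if use_norm:
        result = result / np.linalg.norm(result)  # single normalization at the end
    return result
```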

gojomo commented Jul 14, 2020

Fixed in #2698.
