Word2Vec ns_exponent cannot be changed from default #2746

Closed
coopwilliams opened this issue Feb 6, 2020 · 3 comments
Labels
bug Issue described a bug

Comments

@coopwilliams

coopwilliams commented Feb 6, 2020

Problem description

I am trying to train Word2Vec and tune the ns_exponent hyperparameter. When I initialize the model, I set ns_exponent = 0.5, but find that it has reset to the default of ns_exponent = 0.75 immediately after initializing.

I looked through the Word2Vec source code for any mentions of ns_exponent, but found no reason for the class to ignore my argument. I suspected the Vocabulary initialization may have something to do with it, but that seems to take its argument straight from the __init__. Neither do I believe that I am overriding the ns_exponent setting with one of the other parameters, because this occurs even when ns_exponent is the only one explicitly set.

Steps/code/corpus to reproduce

from gensim.models import Word2Vec

model = Word2Vec(ns_exponent=0.5)
print(model.ns_exponent)

The printed output is:

0.75

and the resulting model's ns_exponent attribute is set to 0.75 as well.

Versions

Windows-10-10.0.18362-SP0
Python 3.7.4 (default, Aug  9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)]
NumPy 1.16.0
SciPy 1.1.0
gensim 3.6.0
FAST_VERSION 0
@gojomo
Collaborator

gojomo commented Feb 6, 2020

Thanks for the clear report! The problem can be even more compactly demonstrated:

In [1]: from gensim.models import Word2Vec
In [2]: model = Word2Vec(ns_exponent=0.1)
In [3]: model.ns_exponent
Out[3]: 0.75

While this is a confusing bit of model state, I'm pretty sure your intended value still took effect; it's just that it was passed into a separate model.vocabulary.ns_exponent property, where it was consulted to build the scaled-cumulative-proportions table being used by the model. If I continue the reproduction REPL above:

In [4]: model.vocabulary.ns_exponent
Out[4]: 0.1
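
To make concrete why the value in model.vocabulary.ns_exponent is the one that matters, here is a simplified sketch of how an exponent shapes a negative-sampling distribution and its cumulative table. This is not gensim's actual implementation; the function names and the toy counts are made up for illustration, and the table here uses plain floating-point proportions.

```python
def negative_sampling_probs(counts, ns_exponent=0.75):
    """Probability of drawing each word as a negative sample.

    Each raw count is raised to ns_exponent and the results are normalized.
    An exponent of 1.0 samples in proportion to frequency; 0.0 samples
    all words uniformly.
    """
    weights = [count ** ns_exponent for count in counts]
    total = sum(weights)
    return [weight / total for weight in weights]


def cumulative_table(counts, ns_exponent=0.75):
    """Running sum of the sampling probabilities (illustrative only)."""
    table = []
    running = 0.0
    for prob in negative_sampling_probs(counts, ns_exponent):
        running += prob
        table.append(running)
    return table


# Hypothetical word counts: one frequent, one medium, one rare word.
toy_counts = [100, 10, 1]
for exponent in (1.0, 0.75, 0.1):
    probs = negative_sampling_probs(toy_counts, exponent)
    print(exponent, [round(p, 3) for p in probs])
```

Lower exponents flatten the distribution, so rare words are drawn as negative samples more often. That is why it matters which stored exponent the table was actually built from.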

The code problem is that Word2Vec.__init__() isn't passing the provided non-default ns_exponent value in its call to its abstract superclass's __init__(), which then caches the default value in the (redundant and not-consulted) self.ns_exponent property, while having no effect on the actually-operative self.vocabulary.ns_exponent.
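
The pattern described above can be reproduced without gensim. The classes below are hypothetical stand-ins (BaseModel, Vocab, BuggyModel, FixedModel are not gensim names), a minimal sketch assuming only what this comment says: the subclass hands the value to its vocabulary helper but does not forward it to the superclass.

```python
class BaseModel:
    """Stand-in for the abstract superclass that caches ns_exponent."""

    def __init__(self, ns_exponent=0.75):
        self.ns_exponent = ns_exponent


class Vocab:
    """Stand-in for the vocabulary helper that actually uses the value."""

    def __init__(self, ns_exponent=0.75):
        self.ns_exponent = ns_exponent


class BuggyModel(BaseModel):
    def __init__(self, ns_exponent=0.75):
        self.vocabulary = Vocab(ns_exponent=ns_exponent)
        # ns_exponent is not forwarded, so the superclass stores its default.
        super().__init__()


class FixedModel(BaseModel):
    def __init__(self, ns_exponent=0.75):
        self.vocabulary = Vocab(ns_exponent=ns_exponent)
        # Forwarding the argument keeps both attributes in agreement.
        super().__init__(ns_exponent=ns_exponent)


buggy = BuggyModel(ns_exponent=0.1)
fixed = FixedModel(ns_exponent=0.1)
print(buggy.ns_exponent, buggy.vocabulary.ns_exponent)
print(fixed.ns_exponent, fixed.vocabulary.ns_exponent)
```

In the buggy variant the top-level attribute reports the default while the vocabulary holds the requested value, which matches the REPL output shown earlier.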

There's a refactor-in-progress (#2698) that will resolve this, making the model.ns_exponent the sole operative location, but in the meantime - your requested alternate value should still be taking effect, and be visible in model.vocabulary.ns_exponent, so no explicit workaround is necessary.
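
If you want code that reports the operative value both before and after the refactor, something like the helper below might do. It is only a sketch: effective_ns_exponent is a hypothetical name, and it assumes (not confirmed in this thread) that a refactored model either has no vocabulary attribute or one without ns_exponent. The demo uses SimpleNamespace objects as stand-ins rather than real gensim models.

```python
from types import SimpleNamespace


def effective_ns_exponent(model):
    """Return the exponent the model is assumed to actually use.

    Prefers model.vocabulary.ns_exponent when that attribute exists
    (the pre-refactor layout described above), otherwise falls back
    to model.ns_exponent.
    """
    vocabulary = getattr(model, "vocabulary", None)
    if vocabulary is not None and hasattr(vocabulary, "ns_exponent"):
        return vocabulary.ns_exponent
    return model.ns_exponent


# Stand-in for the pre-refactor layout: stale top-level value, real value on vocabulary.
old_style = SimpleNamespace(
    ns_exponent=0.75,
    vocabulary=SimpleNamespace(ns_exponent=0.1),
)
# Stand-in for the assumed post-refactor layout: a single top-level value.
new_style = SimpleNamespace(ns_exponent=0.1)

print(effective_ns_exponent(old_style))
print(effective_ns_exponent(new_style))
```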

gojomo added the bug label Feb 6, 2020
@coopwilliams
Author

Thank you for the thorough and prompt response! This sets my heart at ease. I'll edit the issue as you suggested and use your fix.

@gojomo
Collaborator

gojomo commented Jul 14, 2020

Fixed by #2698.
