
Allow initialization with max_final_vocab in lieu of min_count for gensim.models.Word2Vec. Fix #465 #1915

Merged
merged 29 commits on Mar 22, 2018

Conversation

aneesh-joshi
Contributor

This is a rough implementation of the feature described in issue #465.

Test code:

import gensim
from nltk.corpus import brown

sentences = brown.sents()
sentences = sentences[0:1000]

model = gensim.models.Word2Vec(sentences, min_count=1, use_max_vocab=True, max_vocab=8000)
print(model.most_similar('the'))
print(len(model.wv.vocab))

Output:

[('to', 0.9996629953384399), ('a', 0.9996598362922668), ('and', 0.9996397495269775), ('on', 0.999603271484375), ('by', 0.9995988607406616), ('that', 0.9995942115783691), ('of', 0.9995760321617126), (',', 0.9995731115341187), ("''", 0.9995728731155396), ('in', 0.9995712637901306)]
23

@aneesh-joshi
Contributor Author

@gojomo please review

Contributor

@menshikh-iv menshikh-iv left a comment


In addition to Gordon's suggestion - #465 (comment)

@@ -425,7 +425,8 @@ class Word2Vec(BaseWordEmbeddingsModel):
def __init__(self, sentences=None, size=100, alpha=0.025, window=5, min_count=5,
max_vocab_size=None, sample=1e-3, seed=1, workers=3, min_alpha=0.0001,
sg=0, hs=0, negative=5, cbow_mean=1, hashfxn=hash, iter=5, null_word=0,
trim_rule=None, sorted_vocab=1, batch_words=MAX_WORDS_IN_BATCH, compute_loss=False, callbacks=()):
trim_rule=None, sorted_vocab=1, batch_words=MAX_WORDS_IN_BATCH, compute_loss=False, callbacks=(),
use_max_vocab=False, max_vocab=None):
Contributor


Should it be implemented only for word2vec (or for other *2vec models too)?
CC: @gojomo

self.max_vocab_size = max_vocab_size
self.min_count = min_count
self.sample = sample
self.sorted_vocab = sorted_vocab
self.null_word = null_word
self.cum_table = None # for negative sampling
self.raw_vocab = None
self.use_max_vocab = use_max_vocab
Contributor

There's a problem with backward compatibility, here and above: when you add a new attribute, you should modify the load function for the case when a user loads an old model (without this attribute) with new code (with the new attribute).

Contributor Author

Is this where I should make changes?

        try:
            return super(Word2Vec, cls).load(*args, **kwargs)
        except AttributeError:
            logger.info('Model saved using code from earlier Gensim Version. Re-loading old model in a compatible way.')
            from gensim.models.deprecated.word2vec import load_old_word2vec
            return load_old_word2vec(*args, **kwargs)

@@ -1131,14 +1134,17 @@ def __iter__(self):


class Word2VecVocab(utils.SaveLoad):
def __init__(self, max_vocab_size=None, min_count=5, sample=1e-3, sorted_vocab=True, null_word=0):
def __init__(self, max_vocab_size=None, min_count=5, sample=1e-3, sorted_vocab=True, null_word=0,
use_max_vocab=False, max_vocab=None):
Contributor

No need to add 2 parameters; max_vocab alone is enough.

if self.max_vocab is not None:
    import operator

    sorted_vocab = sorted(self.raw_vocab.items(), key=operator.itemgetter(1), reverse=True)
Collaborator

This might be clearer if you sort only the keys and use a lambda – as is already done in the sibling method sort_vocab().

calc_min_count = 0

for item in sorted_vocab:
    curr_count += item[1]
Collaborator

Each word only counts as 1 in final-vocabulary size, so its actual occurrence count shouldn't be part of any tallying. (If max_vocab=10, you just need to throw out all words with the same or fewer occurrences as the 11th word, sorted_vocab[10].)
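
(Illustration only, not part of the thread: a minimal sketch of the rule described above, using a made-up raw_vocab; all names here are hypothetical.)

# Cap the number of *unique* words at max_vocab by dropping every word whose
# count is <= the count of the word at 0-based rank max_vocab.
raw_vocab = {'the': 50, 'of': 40, 'and': 40, 'to': 30, 'a': 20, 'in': 10}
max_vocab = 3

sorted_words = sorted(raw_vocab, key=lambda w: raw_vocab[w], reverse=True)

if max_vocab < len(sorted_words):
    cutoff_count = raw_vocab[sorted_words[max_vocab]]  # count of the first word over the cap
    # Keep only words occurring strictly more often than the cutoff word;
    # ties with it are dropped too, so the cap is never exceeded.
    kept = [w for w in sorted_words if raw_vocab[w] > cutoff_count]
else:
    kept = sorted_words

print(kept)  # ['the', 'of', 'and'] -- never more than max_vocab unique words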

Contributor Author

Oh!
I took max_vocab to mean the maximum total number of words, whereas it is the maximum number of unique words.

I should've realised this sooner!

I thought it would make sense for the user to choose a maximum total number of words they'd like. (Would that be a good idea as another parameter/option for the user?)

calc_min_count = item[1]
else:
break
min_count = calc_min_count
Collaborator

This clobbers any other min_count provided – rather than respecting both min_count and max_vocab if both are supplied. As per my comment in #465, "If both a min_count and max_vocab are specified, they should both be satisfied - which in practice would mean whichever implies the higher min_count."

Collaborator

prepare_vocab() logging & return-value should provide the same visibility into this parameter's effects (including in a dry_run) as is available for min_count.

Contributor Author

I have changed the lines to:

            if calc_min_count > min_count:
                min_count = calc_min_count

Contributor Author

prepare_vocab() logging & return-value should provide the same visibility into this parameter's effects (including in a dry_run) as is available for min_count.

Do you mean I should add comments describing max_vocab and logging comments describing the outcome of the max_vocab processing?

@aneesh-joshi
Contributor Author

aneesh-joshi commented Feb 20, 2018

This commit moves the code to prepare_vocab and introduces a single parameter, max_vocab, which is None by default.

The code calculates the min_count required to reach the given max_vocab.
It should be noted, however, that it doesn't do so perfectly, only as well as it can, considering it works solely through min_count.

Use the following code to test it:

import gensim
from nltk.corpus import brown
import logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

sentences = brown.sents()
sentences = sentences[0:1000]

model = gensim.models.Word2Vec(sentences, max_vocab=15000)

Varying max_vocab will give different results, like:

max_vocab = 15000 gives the following log:

2018-02-21 00:08:43,196 : INFO : collecting all words and their counts
2018-02-21 00:08:43,196 : INFO : PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
2018-02-21 00:08:43,268 : INFO : collected 4641 word types from a corpus of 22079 raw words and 1000 sentences
2018-02-21 00:08:43,272 : INFO : Loading a fresh vocabulary
2018-02-21 00:08:43,276 : INFO : min_count=6 retains 516 unique words (11% of original 4641, drops 4125)
2018-02-21 00:08:43,276 : INFO : min_count=6 leaves 15228 word corpus (68% of original 22079, drops 6851)
2018-02-21 00:08:43,276 : INFO : deleting the raw counts dictionary of 4641 items
2018-02-21 00:08:43,276 : INFO : sample=0.001 downsamples 46 most-common words
2018-02-21 00:08:43,276 : INFO : downsampling leaves estimated 8737 word corpus (57.4% of prior 15228)
2018-02-21 00:08:43,280 : INFO : estimated required memory for 516 words and 100 dimensions: 670800 bytes
2018-02-21 00:08:43,280 : INFO : resetting layer weights

and

max_vocab = 20000 gives

2018-02-21 00:09:52,816 : INFO : collecting all words and their counts
2018-02-21 00:09:52,816 : INFO : PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
2018-02-21 00:09:52,887 : INFO : collected 4641 word types from a corpus of 22079 raw words and 1000 sentences
2018-02-21 00:09:52,902 : INFO : Loading a fresh vocabulary
2018-02-21 00:09:52,902 : INFO : min_count=1 retains 4641 unique words (100% of original 4641, drops 0)
2018-02-21 00:09:52,902 : INFO : min_count=1 leaves 22079 word corpus (100% of original 22079, drops 0)
2018-02-21 00:09:52,918 : INFO : deleting the raw counts dictionary of 4641 items
2018-02-21 00:09:52,918 : INFO : sample=0.001 downsamples 38 most-common words
2018-02-21 00:09:52,918 : INFO : downsampling leaves estimated 16317 word corpus (73.9% of prior 22079)
2018-02-21 00:09:52,934 : INFO : estimated required memory for 4641 words and 100 dimensions: 6033300 bytes

As can be seen, although max_vocab is 20,000, it can only go down to a min_count of 1, which results in the full vocabulary being retained.

Working under the constraint of using min_count, this seems to be the only solution.

@aneesh-joshi
Contributor Author

@gojomo

I have made most of the changes you suggested.

when max_vocab = 64

2018-02-21 01:46:30,315 : INFO : Loading a fresh vocabulary
2018-02-21 01:46:30,315 : INFO : min_count=31 retains 66 unique words (1% of original 4641, drops 4575)
2018-02-21 01:46:30,315 : INFO : min_count=31 leaves 10129 word corpus (45% of original 22079, drops 11950)
2018-02-21 01:46:30,315 : INFO : deleting the raw counts dictionary of 4641 items
2018-02-21 01:46:30,315 : INFO : sample=0.001 downsamples 66 most-common words
2018-02-21 01:46:30,315 : INFO : downsampling leaves estimated 2880 word corpus (28.4% of prior 10129)

when max_vocab = 11

2018-02-21 01:47:31,570 : INFO : min_count=202 retains 12 unique words (0% of original 4641, drops 4629)
2018-02-21 01:47:31,570 : INFO : min_count=202 leaves 6395 word corpus (28% of original 22079, drops 15684)
2018-02-21 01:47:31,570 : INFO : deleting the raw counts dictionary of 4641 items

At higher values of max_vocab, like max_vocab = 3000, the min_count defaults to the specified min_count, which in this case is 5:

2018-02-21 01:49:04,985 : INFO : min_count=5 retains 671 unique words (14% of original 4641, drops 3970)
2018-02-21 01:49:04,985 : INFO : min_count=5 leaves 16003 word corpus (72% of original 22079, drops 6076)
2018-02-21 01:49:04,985 : INFO : deleting the raw counts dictionary of 4641 items

@gojomo
Collaborator

gojomo commented Feb 20, 2018

If max_vocab=64, then a retained vocabulary size of 66 is an error: it's over the specified max. Similarly, if max_vocab=11, then a retained vocabulary size of 12 is an error.

Also, the logging should indicate the effect of max_vocab, and especially if it causes the min_count to go higher than the otherwise-specified (or default) value.

@aneesh-joshi
Contributor Author

@gojomo
The problem of max_vocab = 11 retaining 12 words was fixed by taking the min_count of the word just before the condition stopped being met.

The problem with max_vocab=64 giving 66 was that the 64th, 65th and 66th words had the same frequency.
... ('their', 33), ('It', 32), ('but', 31), ('new', 31), ('plan', 31), ...

I implemented a simple check for this, and now the highest min_count is selected such that the vocab will be at most max_vocab but never more.

The previous examples with the same code and logging:

max_vocab = 11

2018-02-21 18:42:46,507 : INFO : min_count was set to 207 due to max_vocab being set to 11
2018-02-21 18:42:46,507 : INFO : Loading a fresh vocabulary
2018-02-21 18:42:46,507 : INFO : min_count=207 retains 11 unique words (0% of original 4641, drops 4630)
2018-02-21 18:42:46,507 : INFO : min_count=207 leaves 6193 word corpus (28% of original 22079, drops 15886)

max_vocab = 64

2018-02-21 18:43:30,308 : INFO : min_count was set to 32 due to max_vocab being set to 64
2018-02-21 18:43:30,308 : INFO : Loading a fresh vocabulary
2018-02-21 18:43:30,308 : INFO : min_count=32 retains 63 unique words (1% of original 4641, drops 4578)
2018-02-21 18:43:30,308 : INFO : min_count=32 leaves 10036 word corpus (45% of original 22079, drops 12043)
2018-02-21 18:43:30,308 : INFO : deleting the raw counts dictionary of 4641 items

max_vocab = 65

2018-02-21 18:44:16,365 : INFO : min_count was set to 32 due to max_vocab being set to 65
2018-02-21 18:44:16,365 : INFO : Loading a fresh vocabulary
2018-02-21 18:44:16,381 : INFO : min_count=32 retains 63 unique words (1% of original 4641, drops 4578)
2018-02-21 18:44:16,381 : INFO : min_count=32 leaves 10036 word corpus (45% of original 22079, drops 12043)
2018-02-21 18:44:16,381 : INFO : deleting the raw counts dictionary of 4641 items
2018-02-21 18:44:16,381 : INFO : sample=0.001 downsamples 63 most-common words

max_vocab = 66

2018-02-21 18:44:49,974 : INFO : min_count was set to 31 due to max_vocab being set to 66
2018-02-21 18:44:49,974 : INFO : Loading a fresh vocabulary
2018-02-21 18:44:49,974 : INFO : min_count=31 retains 66 unique words (1% of original 4641, drops 4575)
2018-02-21 18:44:49,974 : INFO : min_count=31 leaves 10129 word corpus (45% of original 22079, drops 11950)

max_vocab = 2000

2018-02-21 19:18:33,296 : INFO : specified min_count = 5 is larger that min_count calculated by max_vocab = 2, using specified min_count
2018-02-21 19:18:33,296 : INFO : Loading a fresh vocabulary
2018-02-21 19:18:33,312 : INFO : min_count=5 retains 671 unique words (14% of original 4641, drops 3970)
2018-02-21 19:18:33,312 : INFO : min_count=5 leaves 16003 word corpus (72% of original 22079, drops 6076)
2018-02-21 19:18:33,312 : INFO : deleting the raw counts dictionary of 4641 items
2018-02-21 19:18:33,312 : INFO : sample=0.001 downsamples 46 most-common words

@@ -1216,12 +1216,20 @@ def prepare_vocab(self, hs, negative, wv, update=False, keep_raw_vocab=False, tr
sorted_vocab = sorted(sorted_vocab_list, key=lambda word: word[1], reverse=True)
Collaborator

@gojomo gojomo Feb 21, 2018


I would still suggest the sorted list be word (keys) only, with counts retrieved via dict-lookups. Constantly accessing the 2nd-item-of-a-tuple, via [1], is less clear about intent. That is:

sorted_vocab = sorted(self.raw_vocab.keys(), key=lambda word: self.raw_vocab[word], reverse=True)

@@ -1216,12 +1216,20 @@ def prepare_vocab(self, hs, negative, wv, update=False, keep_raw_vocab=False, tr
sorted_vocab = sorted(sorted_vocab_list, key=lambda word: word[1], reverse=True)

if self.max_vocab < len(sorted_vocab):
    calc_min_count = sorted_vocab[self.max_vocab][1]
    if sorted_vocab[self.max_vocab][1] != sorted_vocab[self.max_vocab - 1][1]:
Collaborator

@gojomo gojomo Feb 21, 2018


There's no need for this if-branch; calc_min_count = self.raw_vocab[sorted_vocab[self.max_vocab] + 1] will always set the threshold to the exact level necessary to eliminate the words at max_vocab and later ranks.

Contributor Author

I think you meant: calc_min_count = self.raw_vocab[sorted_vocab[self.max_vocab]] + 1
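
(Illustration only, not part of the thread: a tiny worked example of why the corrected one-liner handles frequency ties, reusing the counts from the max_vocab=64 case above.)

# Hypothetical tail of a sorted vocabulary where several words tie at count 31.
raw_vocab = {'their': 33, 'It': 32, 'but': 31, 'new': 31, 'plan': 31}
sorted_vocab = sorted(raw_vocab, key=lambda w: raw_vocab[w], reverse=True)

max_vocab = 2  # pretend only 2 words may survive
# One more than the count of the first word past the cap...
calc_min_count = raw_vocab[sorted_vocab[max_vocab]] + 1   # 31 + 1 = 32
# ...so 'but', 'new' and 'plan' (all tied at 31) are dropped together and the
# retained vocabulary can never exceed max_vocab.
kept = [w for w in sorted_vocab if raw_vocab[w] >= calc_min_count]
print(kept)  # ['their', 'It']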

@aneesh-joshi
Contributor Author

@gojomo I am sorry for this taking so long and my code being inefficient/unreadable. I am still learning.

@gojomo
Collaborator

gojomo commented Feb 23, 2018

No worries! The progress has been good, and I believe the functionality is now correct. So the focus now should be docs/unit-testing/clarity. Specifically: (1) a clear doc-comment explanation of max_vocab's effects; (2) A test method (or two) which checks the proper handling of max_vocab values that are both more-restricting than min_count (forcing a larger effective min_count) and less-restricting (specified but essentially having no effect).

For maximal clarity-of-the-code, it may also help to draw a bigger distinction in variable names between the user-specified min_count, the max_vocab-implied calc_min_count, and the ultimate effective_min_count, which is essentially max(min_count, calc_min_count) and could be maintained as a separate property, to avoid clobbering min_count (so that even after a model goes through other steps, or is saved/reloaded much later, it's still clear what was specified & what was made-effective).
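
(Illustration only, not the PR's actual diff: one possible shape of what's being suggested, sketched as a standalone helper; the names min_count, calc_min_count and effective_min_count simply follow the comment above.)

import logging

logger = logging.getLogger(__name__)

def resolve_min_count(raw_vocab, min_count=5, max_vocab=None):
    """Return the threshold actually applied, without clobbering the user-specified min_count."""
    calc_min_count = 1
    if max_vocab is not None:
        sorted_vocab = sorted(raw_vocab, key=lambda w: raw_vocab[w], reverse=True)
        if max_vocab < len(sorted_vocab):
            calc_min_count = raw_vocab[sorted_vocab[max_vocab]] + 1

    # Keep the specified min_count and the max_vocab-implied threshold separate,
    # and record only their combination as the value actually used.
    effective_min_count = max(min_count, calc_min_count)
    logger.info("min_count=%d, calc_min_count=%d, effective_min_count=%d",
                min_count, calc_min_count, effective_min_count)
    return effective_min_count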

@aneesh-joshi
Contributor Author

@gojomo
Sorry it took so long.
I have made the specified changes.

I have introduced effective_min_count as you specified and set it to None if it isn't used.
This, however, has forced me to use self.min_count instead of min_count in the remaining places. I'm not sure this is what is expected.

I have also added tests.

@aneesh-joshi
Contributor Author

The Travis CI build seems to be failing because of a timeout:

gensim/test/test_sklearn_api.py .......................................................
The job exceeded the maximum time limit for jobs, and has been terminated.

@menshikh-iv
Contributor

menshikh-iv commented Mar 6, 2018

@aneesh-joshi I re-ran the test; this happens when one of the tests runs for more than 10 minutes (or Travis gets stuck).

@menshikh-iv menshikh-iv changed the title from "Addresses #465 : allow initialization with max_vocab in lieu of min_count" to "Addresses #465 : allow initialization with max_vocab in lieu of min_count. Fix #465" Mar 9, 2018
@menshikh-iv menshikh-iv changed the title from "Addresses #465 : allow initialization with max_vocab in lieu of min_count. Fix #465" to "Allow initialization with max_vocab in lieu of min_count for gensim.models.Word2Vec. Fix #465" Mar 9, 2018
@aneesh-joshi aneesh-joshi changed the title from "Allow initialization with max_vocab in lieu of min_count for gensim.models.Word2Vec. Fix #465" to "Allow initialization with max_final_vocab in lieu of min_count for gensim.models.Word2Vec. Fix #465" Mar 11, 2018
@aneesh-joshi
Contributor Author

@gojomo please review the changes

@menshikh-iv
I couldn't produce a case where loading failed.
I am not sure it would fail anywhere, considering that max_final_vocab is used only when prepare_vocab is called, and that won't happen for old models that are simply loaded.

I loaded the models in both my version and the PyPI version
and did the following:

- most_similar
- scan_vocab
- prepare_vocab with update = True

and the results were the same.

@menshikh-iv
Contributor

menshikh-iv commented Mar 12, 2018

@aneesh-joshi try to load an old model & call build_vocab(..., update=True) (without the change to the load function, this should fail)

@aneesh-joshi
Contributor Author

Hey @menshikh-iv
The current code should work without any extra effort, as backward compatibility comes by default.

I will try to explain why:

Whenever the load function is called, it tries to load the model. If it is an older version, an exception is caught and the load_old_word2vec function is called from deprecated/word2vec.py https://github.com/aneesh-joshi/gensim/blob/8578e3d14f57b1c3274bd37fd48f8ba4b3efa597/gensim/models/word2vec.py#L985

return load_old_word2vec(*args, **kwargs)

In that function, a new model is instantiated:
new_model = NewWord2Vec(**params)
https://github.com/aneesh-joshi/gensim/blob/8578e3d14f57b1c3274bd37fd48f8ba4b3efa597/gensim/models/word2vec.py#L985

This model already has max_final_vocab=None on both the model and model.vocabulary.

Thus, when the old model is loaded and this conversion takes place, max_final_vocab is already set, and there is no need even to check for it with:
if not hasattr(old_model, 'max_final_vocab'):

I tested this theory by adding print statements in deprecated/word2vec.py when the new model is created:

.
.
.
new_model = NewWord2Vec(**params)

print('***************new model*************')
print(new_model.__dir__())
print('***************vocab*************')
print(new_model.vocabulary.__dir__())
print('****************************')
.
.
.

This resulted in:

***************new model*************
[    'max_final_vocab'     , 'callbacks', 'load', 'wv', 'vocabulary', 'trainables', . . . ,'__dir__', '__class__']
***************vocab*************
['max_vocab_size', 'max_final_vocab', 'min_count', 'sample', 'sorted_vocab', 'null_word', 'cum_table', 'raw_vocab', '__module__', . . . ,  '__dir__', '__class__']
****************************

This was further corroborated by my tests which were unable to cause any error when loading gensim models of versions 3.1 and 3.2 and calling the functions

  • build_vocab
  • scan_vocab
  • prepare_vocab

I will make a commit to remove the check for if not hasattr(old_model, 'max_final_vocab'):

Hopefully, the PR will now be merge-ready.

What do you think, @gojomo?

@aneesh-joshi
Contributor Author

ping @menshikh-iv

@menshikh-iv
Contributor

Looks slightly strange to me; I'll check it manually later (to be fully sure).
@aneesh-joshi @gojomo is the current PR ready to merge?

calc_min_count = self.raw_vocab[sorted_vocab[self.max_final_vocab]] + 1

self.effective_min_count = max(calc_min_count, min_count)
logger.info("max_final_vocab=%d and min_count=%d resulted in calc_min_count=%d, effective_min_count=%d",
Collaborator

I would put this outside the max_final_vocab branch, so effective_min_count logged same way even in the simple case of max_final_vocab unset.

Owner

@menshikh-iv was this comment addressed?

@gojomo
Collaborator

gojomo commented Mar 14, 2018

If a model was saved from gensim 3.3, what AttributeError would be raised in load() that causes load_old_word2vec() to even be run?

@aneesh-joshi
Contributor Author

Nice catch, @gojomo
Hadn't considered that

@aneesh-joshi try to load an old model & call build_vocab(..., update=True) (without the change to the load function, this should fail)
-- @menshikh-iv

I was finally able to generate the error with the 3.3 models and fix it!

As for

I would put this outside the max_final_vocab branch, so effective_min_count logged same way even in the simple case of max_final_vocab unset.

This isn't entirely possible, as calc_min_count does not exist outside of this scope.
We could add an extra log outside, but I don't think it's needed, as effective_min_count is already captured on its own in the next few lines:

"effective_min_count=%d retains %i unique words (%i%% of original %i, drops %i)",

https://github.com/aneesh-joshi/gensim/blob/340a8cf158d33aca2b4700f9f0f3fa4c8b6c60e5/gensim/models/word2vec.py#L1263

@menshikh-iv
Contributor

menshikh-iv commented Mar 15, 2018

@aneesh-joshi good job! You added a backward-compatibility change, but I don't see the needed test (one that fails without this change); please add it too.

@menshikh-iv menshikh-iv removed the RFM label Mar 15, 2018
@aneesh-joshi
Contributor Author

aneesh-joshi commented Mar 18, 2018

Hi @menshikh-iv
Sorry for the delay. I was trying to figure out what you meant and the correct way of getting there.

The problem is that I cannot add a simple, straightforward test for the backward compatibility using something like:

    def testLoadOldModel(self):
        """Test loading word2vec models from previous version"""

        model_file = 'word2vec_old'
        model = word2vec.Word2Vec.load(datapath(model_file))

        self.assertEqual(model.max_final_vocab, None)
        self.assertEqual(model.vocabulary.max_final_vocab, None)
        .
        .

since this has no relevance to old models (3.1 and 3.2):
when Word2Vec.load is called, an exception is raised for old versions and load_old_word2vec is called.
load_old_word2vec returns a new_model which already has max_final_vocab set.

The code I wrote:

# for backward compatibility for `max_final_vocab` feature
if not hasattr(model, 'max_final_vocab'):
    model.max_final_vocab = None
    model.vocabulary.max_final_vocab = None

comes into effect only for models made in 3.3, which do not trigger the exception on load.

Thus, the only way I see of adding a test for the above code I wrote would be to include a model trained in 3.3 and then load it. I will have to upload my model to the repo.

If that's ok, I will proceed.

what do you think @gojomo ?

@gojomo
Collaborator

gojomo commented Mar 18, 2018

Yes, a (tiniest-possible) model that was saved from gensim 3.3.0 would need to be included as test material to be sure models from that version load properly.
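
(Illustration only, not part of the thread: a sketch of what such a test might look like. The fixture name 'word2vec_3.3' is a placeholder, not the file actually committed, and the imports of word2vec and datapath follow the earlier test snippet.)

def testLoadOldModel_3_3(self):
    """A model saved with gensim 3.3 must load and accept further vocab updates."""
    model = word2vec.Word2Vec.load(datapath('word2vec_3.3'))  # placeholder fixture name

    # The attribute missing from 3.3 models should be back-filled on load.
    self.assertEqual(model.max_final_vocab, None)
    self.assertEqual(model.vocabulary.max_final_vocab, None)

    # And further training preparation should not raise.
    model.build_vocab([['graph', 'minors', 'survey']], update=True)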

@aneesh-joshi
Contributor Author

@menshikh-iv
Please review.

@menshikh-iv
Contributor

@aneesh-joshi LGTM 👍. If @gojomo has no more suggestions, I'll merge it (please let me know, Gordon).

@aneesh-joshi
Contributor Author

ping @gojomo

@gojomo
Collaborator

gojomo commented Mar 22, 2018

Looks good to me! @aneesh-joshi thanks for your persistence!
