Word2Vec.build_vocab(…, update=True) gives "ValueError: all the input array dimensions except for the concatenation axis must match exactly" #1162

Closed
chaxor opened this issue Feb 22, 2017 · 3 comments

Comments


chaxor commented Feb 22, 2017

The build_vocab function does not work when updating the vocabulary online.

i.e.:

model.build_vocab(sentences, update=False)

works fine, but

model.build_vocab(sentences, update=True)

does not.

It provides the following error:

ValueError: all the input array dimensions except for the concatenation axis must match exactly
Collaborator

gojomo commented Feb 23, 2017

Are you by chance passing the same sentences in consecutively, and thus on the 'update', there are no new words?

There is somewhat of an expectation that the corpus for an update would contain new words, or else you wouldn't need to be calling build_vocab(…, update=True) at all. But it would probably be better for this to be a no-op, ideally with some return value hinting that no update occurred, than give an error.
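
Until such a no-op exists, a caller could check for new words before requesting an update. The following is a minimal sketch using only the standard library; `count_new_words`, `known_vocab`, and the toy batches are hypothetical names for illustration, not part of gensim, and the known vocabulary is assumed to be available as a plain set of strings.

```python
from collections import Counter


def count_new_words(known_vocab, sentences, min_count=1):
    """Count words in `sentences` that are not in `known_vocab`.

    Hypothetical helper, not a gensim API. Only words seen at least
    `min_count` times in the new sentences are returned.
    """
    counts = Counter(word for sentence in sentences for word in sentence)
    return {
        word: count
        for word, count in counts.items()
        if word not in known_vocab and count >= min_count
    }


# Toy data standing in for a trained model's vocabulary and two new batches.
known_vocab = {"the", "cat", "sat"}
same_batch = [["the", "cat", "sat"]]
new_batch = [["the", "dog", "sat"], ["a", "dog", "ran"]]

print(count_new_words(known_vocab, same_batch))              # {}
print(count_new_words(known_vocab, new_batch, min_count=2))  # {'dog': 2}
```

If the result is empty, the build_vocab(…, update=True) call can simply be skipped. How to obtain the known vocabulary from a trained model depends on the gensim version in use.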

Separately, those cases where there are no new words – perhaps because the new material is very small or very redundant with previous training – are also cases where subsequent training may not be a net benefit to the model. (This incremental-vocabulary-expansion option is best considered an experimental feature that should be evaluated carefully after each use – not a true 'online' training option where incremental new examples always or even usually lead to net improvements.) So I'd be wary of any process where an update-vocab is being casually called with small/redundant new batches.

gojomo changed the title from "Online Training does not work" to "Word2Vec.build_vocab(…, update=True) gives 'ValueError: all the input array dimensions except for the concatenation axis must match exactly'" Feb 23, 2017
Author

chaxor commented Feb 23, 2017

Thank you for your response. I did not realize I was using this improperly.
I thought this feature let a trained model incorporate new data that it encounters while in use.
I am hoping to use word embeddings in a domain where new words appear in the text fairly often, so I was hoping to find a way to do this without it throwing an error and halting the program at every new word it finds after the model has been trained.
I apologize if I misunderstood the use for this feature.
Thank you for your help, I will continue searching.
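
One way to avoid halting on every unseen word is to separate known from unknown tokens before any lookup. This is only a rough sketch: `lookup_known` is a hypothetical helper, and a plain dict (`toy_vectors`) stands in for whatever word-to-vector mapping the trained model exposes.

```python
def lookup_known(tokens, vectors):
    """Split tokens into those with a vector and those without.

    Hypothetical helper: `vectors` is any mapping from word to vector.
    Unknown tokens are collected instead of raising an error.
    """
    found = {}
    missing = []
    for token in tokens:
        if token in vectors:
            found[token] = vectors[token]
        else:
            missing.append(token)
    return found, missing


# A plain dict standing in for a trained model's word vectors.
toy_vectors = {"cat": [0.1, 0.2], "sat": [0.3, 0.4]}
found, missing = lookup_known(["cat", "zorp", "sat"], toy_vectors)

print(found)    # {'cat': [0.1, 0.2], 'sat': [0.3, 0.4]}
print(missing)  # ['zorp']
```

The collected unknown words could be logged and saved for a later, deliberate retraining step rather than triggering a vocabulary update each time.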

tmylk closed this as completed Mar 3, 2017
Contributor

tmylk commented Mar 3, 2017

It is not an issue, but a better error message is needed.
