vocabulary_inv #18

Closed
linWujl opened this issue May 9, 2017 · 2 comments

@linWujl

linWujl commented May 9, 2017

Hello, thanks for sharing the implementation of the paper. While reading w2v.py, I noticed that vocabulary_inv is a list, although the docstring says it is a dict. Is something wrong?

Another question: the word2vec model is trained on sentences made up of words. So why does the code first load the data by converting the words into integer indices and then convert the indices back into words for training? Is that necessary?

Thank you!

@alexander-rakhlin
Owner

alexander-rakhlin commented May 9, 2017

Hi,
Thank you for your interest. vocabulary_inv is a list; there was a typo in the docstring.
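
To illustrate the distinction, here is a minimal sketch with a made-up toy vocabulary (not taken from the repository). An index-to-word mapping stored as a list and one stored as a dict with integer keys both support `vocabulary_inv[i]` lookups, so code that only indexes by integer behaves the same with either type.

```python
# Hypothetical toy vocabulary; the real one is built from the dataset.
vocabulary_inv_list = ["<PAD/>", "the", "movie", "was", "great"]

# The same mapping expressed as a dict keyed by integer index.
vocabulary_inv_dict = dict(enumerate(vocabulary_inv_list))

index = 2
word_from_list = vocabulary_inv_list[index]
word_from_dict = vocabulary_inv_dict[index]
```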

As for the second question: it is not necessary in general, but it is done to reuse the Keras IMDB dataset functionality, which loads sentences as numeric arrays.
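
The round trip can be sketched roughly as follows. This is an illustration only, not the repository's code: the helper names `build_vocab`, `encode`, and `decode` are hypothetical, and the sentences are invented. The decoded word lists are the kind of input a word2vec trainer would consume.

```python
def build_vocab(sentences):
    """Return (word -> index dict, index -> word list) for tokenized sentences."""
    vocabulary_inv = sorted({word for sentence in sentences for word in sentence})
    vocabulary = {word: i for i, word in enumerate(vocabulary_inv)}
    return vocabulary, vocabulary_inv


def encode(sentences, vocabulary):
    """Turn words into integer indices, as a numeric dataset loader would."""
    return [[vocabulary[word] for word in sentence] for sentence in sentences]


def decode(sentence_matrix, vocabulary_inv):
    """Turn integer indices back into words before word2vec training."""
    return [[vocabulary_inv[i] for i in row] for row in sentence_matrix]


sentences = [["the", "movie", "was", "great"], ["the", "plot", "was", "thin"]]
vocabulary, vocabulary_inv = build_vocab(sentences)
sentence_matrix = encode(sentences, vocabulary)
restored = decode(sentence_matrix, vocabulary_inv)
```

Keeping the data as indices lets both a numeric dataset and raw text share one pipeline; the cost is the extra decoding step before training the embeddings.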

@alexander-rakhlin
Owner

Please see the updated version.
