word2vec processing procedure #41

Closed
ay27 opened this issue Jan 3, 2018 · 2 comments

Comments


ay27 commented Jan 3, 2018

I am new to word2vec, and I'm confused about the complete process behind "Download pre-trained word embeddings such as word2vec; make it into text format" in the classification task. I cloned the word2vec repo and followed the quick start at https://code.google.com/archive/p/word2vec/, then ran the demo scripts ./demo-word.sh and ./demo-phrases.sh. But I don't know how to "make it into text format". Could you please give a more precise description?
Thanks!

Contributor

taoleicn commented Jan 7, 2018

The pre-trained word embeddings are available at:
https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing

Download that file and uncompress it. The result is a ".bin" binary file.

You need to read the binary file and write it to a text file, with each word and its vector on its own line:

word_1 \t 0.01 0.02 ... 0.12
word_2 \t 0.03 0.02 ... 0.22
...

Here is an example that reads the binary file:
https://github.com/harvardnlp/sent-conv-torch/blob/master/preprocess.py#L8-L30
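
For reference, the conversion described above can be sketched in a few lines of Python. This is a minimal sketch, not the linked script: it assumes the standard word2vec binary layout (a text header line with the vocabulary size and vector dimension, then for each word its bytes terminated by a space, followed by that many little-endian 32-bit floats, optionally followed by a newline). The function names `read_word2vec_bin` and `convert_bin_to_text` and the six-decimal output precision are choices made for this illustration.

```python
import struct


def read_word2vec_bin(path):
    """Yield (word, vector) pairs from a word2vec-style binary file.

    Assumed layout: header line "<vocab_size> <dim>\n", then for each word
    its bytes terminated by a space, followed by <dim> float32 values.
    """
    with open(path, "rb") as f:
        vocab_size, dim = map(int, f.readline().split())
        for _ in range(vocab_size):
            chars = []
            while True:
                ch = f.read(1)
                if ch == b" ":
                    break
                if ch == b"":
                    raise EOFError("unexpected end of file while reading a word")
                if ch != b"\n":  # some files put a newline after each vector
                    chars.append(ch)
            word = b"".join(chars).decode("utf-8", errors="replace")
            # Read exactly dim floats, little-endian, 4 bytes each.
            vector = struct.unpack("<%df" % dim, f.read(4 * dim))
            yield word, vector


def convert_bin_to_text(bin_path, txt_path):
    """Write one 'word<TAB>v1 v2 ... vN' line per word; return the word count."""
    count = 0
    with open(txt_path, "w", encoding="utf-8") as out:
        for word, vector in read_word2vec_bin(bin_path):
            values = " ".join("%.6f" % v for v in vector)
            out.write(word + "\t" + values + "\n")
            count += 1
    return count
```

Calling `convert_bin_to_text("vectors.bin", "vectors.txt")` (both paths are placeholders) would then produce a text file in the tab-separated format shown above. The file is streamed word by word, so the full embedding table never has to fit in memory.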

Author

ay27 commented Jan 16, 2018

Thanks

ay27 closed this as completed Jan 16, 2018