word2vec processing procedure #41

ay27 · 2018-01-03T11:37:29Z

I am new to word2vec, so I'm confused about the complete process behind "Download pre-trained word embeddings such as word2vec; make it into text format" in the classification task. I have cloned the word2vec repo and followed the quick start at https://code.google.com/archive/p/word2vec/, then ran the demo scripts ./demo-word.sh and ./demo-phrases.sh. But I don't know how to "make it into text format". Could you please give a more precise description?
Thanks!

taoleicn · 2018-01-07T22:30:04Z

The pre-trained word embeddings are available at:
https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing

Download that file and uncompress it. The uncompressed file is a ".bin" binary file.
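Assuming the download is gzip-compressed (a `.gz` file), the uncompress step can be done from Python with the standard library. This is a minimal sketch; the helper name `gunzip` and the file names are hypothetical. It streams the data, so the multi-gigabyte file is never loaded into memory at once.

```python
import gzip
import shutil


def gunzip(src_path: str, dst_path: str) -> None:
    """Stream-decompress a gzip file at src_path into dst_path."""
    # gzip.open decompresses on the fly; copyfileobj copies in chunks.
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

For example, `gunzip("vectors.bin.gz", "vectors.bin")`. On Unix-like systems, running `gunzip vectors.bin.gz` in a shell does the same thing.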

You need to read the binary file and write it to a text file, with each word and its vector on a separate line:

```
word_1 \t 0.01 0.02 ... 0.12
word_2 \t 0.03 0.02 ... 0.22
...
```
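To check that a converted file matches this layout, here is a minimal sketch of a loader (the name `load_text_embeddings` is hypothetical). It splits each line on any whitespace, so it accepts either a tab or a space between the word and its vector, and it assumes words themselves contain no whitespace. It also raises an error if vectors have inconsistent lengths, which usually means the conversion went wrong.

```python
from typing import Dict, Iterable, List, Optional


def load_text_embeddings(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse 'word<TAB>v1 v2 ... vd' lines into a {word: vector} dict."""
    embeddings: Dict[str, List[float]] = {}
    dim: Optional[int] = None
    for lineno, line in enumerate(lines, start=1):
        parts = line.split()  # splits on tabs and spaces alike
        if not parts:
            continue  # skip blank lines
        word, values = parts[0], [float(x) for x in parts[1:]]
        # Every vector must have the same dimension as the first one.
        if dim is None:
            dim = len(values)
        elif len(values) != dim:
            raise ValueError(
                f"line {lineno}: expected {dim} values, got {len(values)}"
            )
        embeddings[word] = values
    return embeddings
```

Because a text file object iterates line by line, `with open("vectors.txt", encoding="utf-8") as f: emb = load_text_embeddings(f)` works directly on the converted file (the file name is hypothetical).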

Here is an example that reads the binary file:
https://github.com/harvardnlp/sent-conv-torch/blob/master/preprocess.py#L8-L30
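For readers who want to avoid extra dependencies, below is a standard-library-only sketch of reading the binary file and writing the text format above. It is not the linked script. It assumes the layout written by the original word2vec C tool: an ASCII header `<vocab_size> <dim>`, then for each word its bytes terminated by a space, followed by `dim` float32 values, assumed little-endian as on x86 machines. Words are decoded as UTF-8, and any invalid bytes are replaced. The function name is hypothetical.

```python
import struct
from typing import BinaryIO, TextIO


def convert_word2vec_bin_to_text(src: BinaryIO, dst: TextIO) -> int:
    """Write each vector in src as a 'word<TAB>v1 v2 ... vd' line to dst.

    Returns the number of vectors written.
    """
    header = src.readline().split()  # e.g. b"3000000 300\n"
    vocab_size, dim = int(header[0]), int(header[1])
    vec_bytes = 4 * dim  # each float32 takes 4 bytes
    for _ in range(vocab_size):
        # Read the word byte by byte up to its terminating space,
        # skipping the newline that follows the previous vector.
        chars = bytearray()
        while True:
            ch = src.read(1)
            if not ch:
                raise ValueError("unexpected end of file while reading a word")
            if ch == b" ":
                break
            if ch != b"\n":
                chars.extend(ch)
        word = chars.decode("utf-8", errors="replace")
        # Assumed little-endian float32 values ("<" in struct notation).
        values = struct.unpack(f"<{dim}f", src.read(vec_bytes))
        dst.write(word + "\t" + " ".join(f"{v:.7g}" for v in values) + "\n")
    return vocab_size
```

Usage: open the input with `open("vectors.bin", "rb")` and the output with `open("vectors.txt", "w", encoding="utf-8")`, then pass both to the function (file names are hypothetical). Formatting with `.7g` keeps roughly the precision that float32 actually carries.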

ay27 · 2018-01-16T01:15:38Z

Thanks

ay27 closed this as completed Jan 16, 2018
