Experiments on English Wikipedia with GloVe and word2vec.


Evaluation of word embeddings

Code for the blog post evaluating implementations of word2vec in gensim and GloVe in text2vec.

Running

  1. Create the vocabulary, train the text2vec GloVe model, and evaluate it:

     ```
     bash -i -c "./memusg Rscript ./run_glove_text2vec.R ~/Downloads/datasets/enwiki_splits/ ~/Downloads/datasets/questions-words.txt ./enwiki_dim=600_vocab=30k/" > ./enwiki_dim=600_vocab=30k/glove.log 2>&1 &
     ```

  2. Train the gensim word2vec model and evaluate its accuracy:

     ```
     bash -i -c "./memusg python ./run_word2vec.py ~/Downloads/datasets/title_tokens.txt.gz ~/Downloads/datasets/questions-words.txt ./enwiki_dim=600_vocab=30k" > ./enwiki_dim=600_vocab=30k/word2vec.log 2>&1 &
     ```
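Both runs score the trained vectors on the questions-words analogy task, which solves *a : b :: c : ?* by vector arithmetic (3CosAdd: find the word whose vector is most cosine-similar to *b − a + c*, excluding the three question words). A minimal sketch of that scoring step, using toy hypothetical vectors rather than the real trained 600-dimensional embeddings:

```python
import numpy as np

# Toy embedding table (made-up vectors); the real runs use the
# GloVe / word2vec vectors trained over the 30k Wikipedia vocabulary.
vecs = {
    "king":  np.array([0.9, 0.1, 0.0]),
    "queen": np.array([0.9, 0.1, 1.0]),
    "man":   np.array([0.1, 0.9, 0.0]),
    "woman": np.array([0.1, 0.9, 1.0]),
}

def analogy(a, b, c, vocab=vecs):
    """Solve a : b :: c : ? by 3CosAdd (nearest neighbor of b - a + c)."""
    target = vocab[b] - vocab[a] + vocab[c]
    target = target / np.linalg.norm(target)
    best, best_sim = None, -np.inf
    for word, vec in vocab.items():
        if word in (a, b, c):          # exclude the question words themselves
            continue
        sim = float(target @ (vec / np.linalg.norm(vec)))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

print(analogy("man", "woman", "king"))  # → queen
```

The benchmark's accuracy figure is simply the fraction of questions-words analogies answered correctly this way.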

Pretrained vectors

You can download pretrained vectors from my Google Drive.
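The download format isn't specified here; if the files use the plain-text word2vec format (a header line `<vocab_size> <dim>` followed by one `<word> v1 v2 ...` line per word), a minimal loader looks like the sketch below. `load_word2vec_text` is a hypothetical helper written for illustration, not part of this repo:

```python
import io
import numpy as np

def load_word2vec_text(fileobj):
    """Parse the plain-text word2vec format: a header line
    "<vocab_size> <dim>" followed by "<word> v1 v2 ..." lines.
    (Assumption: the downloaded files use this format.)"""
    n_words, dim = map(int, fileobj.readline().split())
    vectors = {}
    for line in fileobj:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = np.asarray(parts[1:1 + dim], dtype=np.float32)
    return vectors

# Tiny in-memory example standing in for a downloaded vectors file.
sample = io.StringIO("2 3\nking 0.9 0.1 0.0\nqueen 0.9 0.1 1.0\n")
vecs = load_word2vec_text(sample)
print(len(vecs), vecs["queen"].shape)  # → 2 (3,)
```

If the files are instead saved in gensim's native format, `gensim.models.KeyedVectors.load` is the appropriate entry point.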

To replicate my results from the blog article, download and preprocess Wikipedia using this code.