Experiments on English Wikipedia with GloVe and word2vec.


Evaluation of word embeddings

Code for the blog post evaluating implementations of word2vec in gensim and GloVe in text2vec.

Running

  1. Create the vocabulary, train the text2vec GloVe model, and evaluate it:

     ```
     bash -i -c "./memusg Rscript ./run_glove_text2vec.R ~/Downloads/datasets/enwiki_splits/ ~/Downloads/datasets/questions-words.txt ./enwiki_dim=600_vocab=30k/" > ./enwiki_dim=600_vocab=30k/glove.log 2>&1 &
     ```

  2. Train the gensim word2vec model and evaluate its accuracy:

     ```
     bash -i -c "./memusg python ./run_word2vec.py ~/Downloads/datasets/title_tokens.txt.gz ~/Downloads/datasets/questions-words.txt ./enwiki_dim=600_vocab=30k" > ./enwiki_dim=600_vocab=30k/word2vec.log 2>&1 &
     ```
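Both runs score the trained vectors on the questions-words analogy task, which solves *a : b :: c : ?* by vector arithmetic (3CosAdd: find the word whose vector is most cosine-similar to *b − a + c*, excluding the three question words). A minimal sketch of that scoring step, using toy hypothetical vectors rather than the real trained 600-dimensional embeddings:

```python
import numpy as np

# Toy embedding table (made-up vectors); the real runs use the
# GloVe / word2vec vectors trained over the 30k Wikipedia vocabulary.
vecs = {
    "king":  np.array([0.9, 0.1, 0.0]),
    "queen": np.array([0.9, 0.1, 1.0]),
    "man":   np.array([0.1, 0.9, 0.0]),
    "woman": np.array([0.1, 0.9, 1.0]),
}

def analogy(a, b, c, vocab=vecs):
    """Solve a : b :: c : ? by 3CosAdd (nearest neighbor of b - a + c)."""
    target = vocab[b] - vocab[a] + vocab[c]
    target = target / np.linalg.norm(target)
    best, best_sim = None, -np.inf
    for word, vec in vocab.items():
        if word in (a, b, c):          # exclude the question words themselves
            continue
        sim = float(target @ (vec / np.linalg.norm(vec)))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

print(analogy("man", "woman", "king"))  # → queen
```

The benchmark's accuracy figure is simply the fraction of questions-words analogies answered correctly this way.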

Pretrained vectors

You can download pretrained vectors from my Google Drive.
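The download format isn't specified here; if the files use the plain-text word2vec format (a header line `<vocab_size> <dim>` followed by one `<word> v1 v2 ...` line per word), a minimal loader looks like the sketch below. `load_word2vec_text` is a hypothetical helper written for illustration, not part of this repo:

```python
import io
import numpy as np

def load_word2vec_text(fileobj):
    """Parse the plain-text word2vec format: a header line
    "<vocab_size> <dim>" followed by "<word> v1 v2 ..." lines.
    (Assumption: the downloaded files use this format.)"""
    n_words, dim = map(int, fileobj.readline().split())
    vectors = {}
    for line in fileobj:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = np.asarray(parts[1:1 + dim], dtype=np.float32)
    return vectors

# Tiny in-memory example standing in for a downloaded vectors file.
sample = io.StringIO("2 3\nking 0.9 0.1 0.0\nqueen 0.9 0.1 1.0\n")
vecs = load_word2vec_text(sample)
print(len(vecs), vecs["queen"].shape)  # → 2 (3,)
```

If the files are instead saved in gensim's native format, `gensim.models.KeyedVectors.load` is the appropriate entry point.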

To replicate my results from the blog article, download and preprocess Wikipedia using this code.