
Evaluation of word embeddings

Code for the blog post evaluating implementations of word2vec in gensim and GloVe in text2vec.

Running

  1. Create the vocabulary, train the text2vec GloVe model, and evaluate it on the analogy questions:

         bash -i -c "./memusg Rscript ./run_glove_text2vec.R ~/Downloads/datasets/enwiki_splits/ ~/Downloads/datasets/questions-words.txt ./enwiki_dim=600_vocab=30k/" > ./enwiki_dim=600_vocab=30k/glove.log 2>&1 &

  2. Train the gensim word2vec model and evaluate its accuracy (a sketch of what this step does follows this list):

         bash -i -c "./memusg python ./run_word2vec.py ~/Downloads/datasets/title_tokens.txt.gz ~/Downloads/datasets/questions-words.txt ./enwiki_dim=600_vocab=30k" > ./enwiki_dim=600_vocab=30k/word2vec.log 2>&1 &
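For reference, here is a minimal sketch of what the gensim step boils down to: stream the tokenized corpus, train word2vec, and score the model on the analogy test set. The parameters mirror the dim=600 / vocab=30k setup in the command above, but the exact options live in run_word2vec.py; this sketch targets the current gensim API (vector_size replaced size in gensim 4), so treat it as an approximation of the script, not the script itself.

```python
# Sketch only -- an approximation of run_word2vec.py, not the script itself.
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# LineSentence streams one whitespace-tokenized sentence per line and reads
# gzipped input such as title_tokens.txt.gz transparently.
corpus = LineSentence('title_tokens.txt.gz')

model = Word2Vec(
    corpus,
    vector_size=600,        # embedding dimensionality (dim=600 above)
    max_final_vocab=30000,  # cap the vocabulary near 30k words (vocab=30k above);
                            # the original script may cap it via min_count instead
    workers=4,              # parallel training threads
)

# Overall accuracy on the standard word-analogy questions, plus per-section results.
score, sections = model.wv.evaluate_word_analogies('questions-words.txt')
print(f'analogy accuracy: {score:.3f}')
```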

Pretrained vectors

You can download pretrained vectors from my Google Drive.

To replicate my results from the blog article, download and preprocess Wikipedia using this code.
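If the downloaded vectors are stored in the plain word2vec text format (an assumption; check the archive for the actual layout and file name), they can be loaded and queried with gensim's KeyedVectors:

```python
# Assumption: word2vec text format; the file name below is hypothetical.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format('enwiki_dim=600_vocab=30k.txt',
                                            binary=False)

# Sanity check with the classic analogy: king - man + woman ~ queen
print(vectors.most_similar(positive=['king', 'woman'], negative=['man'], topn=3))
```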
