This branch is 8 commits ahead of piskvorky:master.


Evaluation of word embeddings

Code for the blog post evaluating implementations of word2vec in gensim and GloVe in text2vec.

Running

Both commands redirect their logs into ./enwiki_dim=600_vocab=30k/, so that directory must exist before you run them.

  1. Create the vocabulary, train text2vec GloVe, and evaluate it:
     bash -i -c "./memusg Rscript ./run_glove_text2vec.R ~/Downloads/datasets/enwiki_splits/ ~/Downloads/datasets/questions-words.txt ./enwiki_dim=600_vocab=30k/" > ./enwiki_dim=600_vocab=30k/glove.log 2>&1 &
  2. Train gensim word2vec and evaluate its accuracy:
     bash -i -c "./memusg python ./run_word2vec.py ~/Downloads/datasets/title_tokens.txt.gz ~/Downloads/datasets/questions-words.txt ./enwiki_dim=600_vocab=30k" > ./enwiki_dim=600_vocab=30k/word2vec.log 2>&1 &
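
Both steps score the trained vectors against questions-words.txt. The sketch below shows one way such an analogy score could be computed, assuming the file uses the usual layout of `: section` header lines followed by four-word questions (`a b c expected`). It is not taken from run_word2vec.py or run_glove_text2vec.R: the function names, the lowercasing, the toy vectors, and the sample questions are all hypothetical, and it uses only the Python standard library rather than gensim.

```python
import math


def parse_questions(lines):
    """Yield (section, a, b, c, expected) from questions-words.txt-style lines."""
    section = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(":"):
            # Section header, e.g. ": capital-common-countries"
            section = line[1:].strip()
            continue
        words = line.lower().split()  # lowercasing is an assumption of this sketch
        if len(words) == 4:
            yield (section, *words)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def analogy_accuracy(vectors, questions):
    """Return (correct, total); questions with out-of-vocabulary words are skipped."""
    correct = total = 0
    for _section, a, b, c, expected in questions:
        if any(w not in vectors for w in (a, b, c, expected)):
            continue
        # Predict the word closest to b - a + c, excluding the three question words.
        target = [vb - va + vc
                  for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
        candidates = (w for w in vectors if w not in (a, b, c))
        predicted = max(candidates, key=lambda w: cosine(vectors[w], target))
        correct += predicted == expected
        total += 1
    return correct, total


# Hypothetical toy data: dimensions are (is_capital, greece, iraq).
toy_vectors = {
    "athens": [1.0, 1.0, 0.0],
    "greece": [0.0, 1.0, 0.0],
    "baghdad": [1.0, 0.0, 1.0],
    "iraq": [0.0, 0.0, 1.0],
    "paris": [1.0, 0.0, 0.0],
}
sample = """\
: capital-common-countries
Athens Greece Baghdad Iraq
Baghdad Iraq Athens Greece
Athens Greece Oslo Norway
"""

correct, total = analogy_accuracy(toy_vectors, parse_questions(sample.splitlines()))
print(f"analogy accuracy: {correct}/{total}")
```

The third sample question is skipped because `oslo` and `norway` are missing from the toy vocabulary. With a restricted vocabulary, such as the 30k words the directory name suggests, how out-of-vocabulary questions are handled can noticeably change the reported accuracy.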

Pretrained vectors

You can download pretrained vectors from my Google Drive.

To replicate my results from the blog article, download and preprocess Wikipedia using this code.

About

Experiments on English Wikipedia with GloVe and word2vec.
