Skip to content

akb89/word2vec

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Word2Vec

GitHub release PyPI release Build MIT License

This is a re-implementation of Word2Vec relying on Tensorflow Estimators and Datasets.

Works with python >= 3.6 and Tensorflow v2.0.

Install

via pip:

pip3 install tf-word2vec

or, after a git clone:

python3 setup.py install

Get data

You can download a sample of the English Wikipedia here:

wget http://129.194.21.122/~kabbach/enwiki.20190120.sample10.0.balanced.txt.7z

Train Word2Vec

w2v train \
  --data /absolute/path/to/enwiki.20190120.sample10.0.balanced.txt \
  --outputdir /absolute/path/to/word2vec/models \
  --alpha 0.025 \
  --neg 5 \
  --window 2 \
  --epochs 5 \
  --size 300 \
  --min-count 50 \
  --sample 1e-5 \
  --train-mode skipgram \
  --t-num-threads 20 \
  --p-num-threads 25 \
  --keep-checkpoint-max 3 \
  --batch 1 \
  --shuffling-buffer-size 10000 \
  --save-summary-steps 10000 \
  --save-checkpoints-steps 100000 \
  --log-step-count-steps 10000