PSDVec

Source code for "A Generative Word Embedding Model and its Low Rank Positive Semidefinite Solution" (accepted by EMNLP'15) and "PSDVec: Positive Semidefinite Word Embedding" (about the use of this toolset, under review).

Update v0.4: Online block-wise factorization:

1. Obtain 25000 core embeddings, into 25000-500-EM.vec:
    - `python factorize.py -w 25000 top2grams-wiki.txt`
2. Obtain 45000 noncore embeddings, totaling 70000 (25000 core + 45000 noncore), into 25000-70000-500-BLKEM.vec:
    - `python factorize.py -v 25000-500-EM.vec -o 45000 top2grams-wiki.txt`
3. Incrementally learn another 50000 noncore embeddings (based on the 25000 core embeddings), into 25000-120000-500-BLKEM.vec:
    - `python factorize.py -v 25000-70000-500-BLKEM.vec -b 25000 -o 50000 top2grams-wiki.txt`
4. Repeat step 3 a few times to obtain embeddings for more, rarer words.
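
The steps above can be chained by a small driver script. The following is only a sketch: it prints the command lines rather than running them, and it assumes that output files follow the naming pattern seen in the examples (`<core>-<dim>-EM.vec` and `<core>-<total>-<dim>-BLKEM.vec`, with dimension 500) and that each repetition of step 3 passes the previous output file to `-v`. The helper names `blkem_name` and `plan_commands` are hypothetical and not part of this toolset.

```python
# Sketch of a planner for the factorize.py steps listed above.
# ASSUMPTIONS (inferred from the example file names, not confirmed):
#   - core output is named "<core>-<dim>-EM.vec"
#   - block-wise output is named "<core>-<total>-<dim>-BLKEM.vec"
#   - each repeat of step 3 feeds the previous output file to -v


def blkem_name(core, total, dim=500):
    """Hypothetical helper: assumed name of a block-wise output file."""
    return f"{core}-{total}-{dim}-BLKEM.vec"


def plan_commands(core=25000, first_noncore=45000, later_noncore=50000,
                  rounds=1, dim=500, corpus="top2grams-wiki.txt"):
    """Return (list of argv lists, assumed name of the final output file)."""
    commands = []

    # Step 1: core embeddings.
    commands.append(["python", "factorize.py", "-w", str(core), corpus])
    core_file = f"{core}-{dim}-EM.vec"

    # Step 2: first batch of noncore embeddings, based on the core file.
    commands.append(["python", "factorize.py", "-v", core_file,
                     "-o", str(first_noncore), corpus])
    total = core + first_noncore
    previous = blkem_name(core, total, dim)

    # Step 3, repeated `rounds` times: further noncore batches.
    for _ in range(rounds):
        commands.append(["python", "factorize.py", "-v", previous,
                         "-b", str(core), "-o", str(later_noncore), corpus])
        total += later_noncore
        previous = blkem_name(core, total, dim)

    return commands, previous


if __name__ == "__main__":
    commands, final_file = plan_commands(rounds=1)
    for argv in commands:
        print(" ".join(argv))
    print("# assumed final output file:", final_file)
```

With `rounds=1` the planner reproduces the three commands listed above. To execute the commands instead of printing them, each argument list could be passed to `subprocess.run(argv, check=True)`, after confirming that the files named by `-v` actually exist under the assumed names.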

Pretrained 120,000 embeddings and evaluation results are uploaded.

Update v0.3: Block-wise factorization

Pretrained 100,000 embeddings and evaluation results are uploaded (now replaced by an expanded set of 120,000 embeddings).

Test sets are courtesy of Omer Levy (https://bitbucket.org/omerlevy/hyperwords/src).
