ZhouJiaLinmumu/topicvec

PSDVec

Source code for "A Generative Word Embedding Model and its Low Rank Positive Semidefinite Solution" (accepted by EMNLP'15) and "PSDVec: Positive Semidefinite Word Embedding" (describing the use of this toolset; under review).
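The "low rank positive semidefinite solution" in the EMNLP'15 title refers to approximating a symmetric word-association matrix by a PSD product V Vᵀ, where the rows of V are the word embeddings. As a rough illustration only, the NumPy sketch below computes the plain, unweighted version of that idea by eigendecomposition. The function name `low_rank_psd_embeddings` and the input matrix `G` are hypothetical. The toolset's factorize.py may use a different procedure (for example, one that weights matrix entries), and this sketch does not reproduce it.

```python
import numpy as np


def low_rank_psd_embeddings(G, d):
    """Hypothetical helper: return V (n x d) such that V @ V.T approximates symmetric G.

    Keeps the d largest eigenvalues and clips any negative ones to zero. In the
    unweighted case this gives the nearest PSD matrix of rank <= d in Frobenius norm.
    """
    G = np.asarray(G, dtype=float)
    G = (G + G.T) / 2.0  # symmetrize to guard against small numerical asymmetry
    eigvals, eigvecs = np.linalg.eigh(G)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:d]  # indices of the d largest eigenvalues
    lam = np.clip(eigvals[top], 0.0, None)  # negative eigenvalues contribute nothing
    return eigvecs[:, top] * np.sqrt(lam)  # scale each eigenvector column by sqrt(eigenvalue)
```

For an exactly rank-3 PSD matrix G = X Xᵀ, `low_rank_psd_embeddings(G, 3)` recovers a V with V Vᵀ equal to G up to floating-point error. Negative eigenvalues are dropped rather than kept, because V Vᵀ can never be indefinite.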

Update v0.4: Online block-wise factorization:

  1. Obtain 25000 core embeddings, saved to 25000-500-EM.vec:
    • python factorize.py -w 25000 top2grams-wiki.txt
  2. Obtain 45000 noncore embeddings, for a total of 70000 (25000 core + 45000 noncore), saved to 25000-70000-500-BLKEM.vec:
    • python factorize.py -v 25000-500-EM.vec -o 45000 top2grams-wiki.txt
  3. Incrementally learn another 50000 noncore embeddings (based on the 25000 core embeddings), saved to 25000-120000-500-BLKEM.vec:
    • python factorize.py -v 25000-70000-500-BLKEM.vec -b 25000 -o 50000 top2grams-wiki.txt
  4. Repeat step 3 a few times to obtain embeddings for rarer words.
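Repeating step 3 by hand means updating the input and output file names on each round. The sketch below builds that command sequence in plain Python. The `-v`, `-b` and `-o` flags are copied from step 3. Everything else is an assumption: the helper `continuation_commands` is hypothetical, `-b` is taken to be the number of core words, and each new output file is assumed to follow the `<core>-<total>-<dim>-BLKEM.vec` pattern seen in steps 2 and 3, growing by 50000 words per round.

```python
import shlex


def continuation_commands(start_total, rounds, core=25000, dim=500, block=50000,
                          corpus="top2grams-wiki.txt"):
    """Hypothetical helper: (command, expected_output) pairs for repeating step 3.

    The output file name is inferred from the README's naming pattern and may not
    match what factorize.py actually writes.
    """
    pairs = []
    total = start_total
    for _ in range(rounds):
        vec_in = f"{core}-{total}-{dim}-BLKEM.vec"  # embeddings from the previous round
        cmd = ["python", "factorize.py", "-v", vec_in,
               "-b", str(core), "-o", str(block), corpus]
        total += block  # this round adds `block` more noncore words
        pairs.append((cmd, f"{core}-{total}-{dim}-BLKEM.vec"))
    return pairs


if __name__ == "__main__":
    # Print the commands only. To execute them, use subprocess.run(cmd, check=True).
    for cmd, out in continuation_commands(70000, 2):
        print(shlex.join(cmd), "->", out)
```

With `start_total=70000`, the first generated command is exactly the step 3 command above. Later rounds feed each assumed output file back in as the next `-v` input.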

Pretrained embeddings for 120,000 words and their evaluation results have been uploaded.

Update v0.3: Block-wise factorization

Pretrained embeddings for 100,000 words and their evaluation results were uploaded (now superseded by the expanded set of 120,000 embeddings).

Test sets are courtesy of Omer Levy (https://bitbucket.org/omerlevy/hyperwords/src).

Languages

  • Python 72.3%
  • Perl 25.4%
  • C 1.6%
  • Batchfile 0.7%