Generating count-based Distributional Semantic Models

akb89/counterix

counterix

GitHub release PyPI release Build MIT License

A small toolkit to generate count-based Distributional Semantic Models, weighted with PPMI (positive pointwise mutual information) and reduced with SVD (singular value decomposition).

Install

pip install counterix

or, after a git clone:

python3 setup.py install

Use

Generate

To generate a raw count matrix from a tokenized corpus, run:

counterix generate \
  --corpus /abs/path/to/corpus/txt/file \
  --min-count frequency_threshold \
  --win-size window_size

If the --output parameter is not set, the output files will be saved to the corpus directory.
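To make the roles of the frequency threshold and the window size concrete, here is a minimal pure-Python sketch of symmetric window-based co-occurrence counting. It is an illustration only, not counterix's actual implementation: the function name `count_cooccurrences`, the choice to drop rare words before windowing, and the dictionary output (rather than an npz matrix) are all assumptions.

```python
from collections import Counter


def count_cooccurrences(sentences, min_count=1, win_size=2):
    """Count symmetric window co-occurrences between frequent words.

    Hypothetical helper for illustration; not part of counterix.
    """
    # Word frequencies over the whole corpus.
    freqs = Counter(tok for sent in sentences for tok in sent)
    # Keep only words at or above the frequency threshold.
    kept = sorted(w for w, c in freqs.items() if c >= min_count)
    vocab = {w: i for i, w in enumerate(kept)}
    counts = Counter()
    for sent in sentences:
        # Assumption: rare words are removed before the window is applied.
        ids = [vocab[t] for t in sent if t in vocab]
        for pos, target in enumerate(ids):
            lo = max(0, pos - win_size)
            hi = min(len(ids), pos + win_size + 1)
            for ctx_pos in range(lo, hi):
                if ctx_pos != pos:
                    counts[(target, ids[ctx_pos])] += 1
    return vocab, counts


corpus = [line.split() for line in ["the cat sat on the mat",
                                    "the dog sat on the rug"]]
vocab, counts = count_cooccurrences(corpus, min_count=2, win_size=1)
```

With `min_count=2`, only "the", "sat" and "on" survive in this toy corpus, and each (row, column) key in `counts` maps to the number of times the two words appear within one position of each other.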

Weigh

To weigh a raw count model with PPMI, run:

counterix weigh --model /abs/path/to/raw/count/npz/model
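As a reminder of what PPMI weighting computes, the sketch below applies PPMI(w, c) = max(0, log2(P(w, c) / (P(w) P(c)))) to a small dictionary of raw counts. It is a minimal illustration under stated assumptions, not counterix's code: the function name `ppmi`, the base-2 logarithm, and the dictionary representation are all hypothetical choices.

```python
import math
from collections import Counter


def ppmi(counts):
    """Return PPMI weights for a {(row, col): count} mapping.

    Hypothetical helper for illustration; not part of counterix.
    """
    total = sum(counts.values())
    row_sums = Counter()
    col_sums = Counter()
    for (w, c), n in counts.items():
        row_sums[w] += n
        col_sums[c] += n
    weighted = {}
    for (w, c), n in counts.items():
        if n == 0:
            continue  # log of zero is undefined; PPMI is 0 for such cells
        # PMI = log2( P(w, c) / (P(w) * P(c)) ), written with raw counts.
        pmi = math.log2(n * total / (row_sums[w] * col_sums[c]))
        if pmi > 0:
            # Only positive values are stored, which keeps the result sparse.
            weighted[(w, c)] = pmi
    return weighted


toy_counts = {(0, 0): 4, (0, 1): 1, (1, 0): 1, (1, 1): 4}
weighted = ppmi(toy_counts)
```

In this toy matrix the off-diagonal cells occur less often than chance would predict, so their PMI is negative and they are dropped from the weighted model.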

SVD

To apply SVD to a PPMI-weighted model, keeping k=10000 dimensions, run:

counterix svd \
  --model /abs/path/to/ppmi/npz/model \
  --dim 10000

To control the number of threads used during SVD, set the OMP_NUM_THREADS environment variable when running counterix (for example, OMP_NUM_THREADS=1 to use a single thread).
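For readers unfamiliar with the reduction step, the following sketch shows truncated SVD on a tiny dense matrix with NumPy. It is an illustration under assumptions, not counterix's implementation: the helper name `truncated_svd` is hypothetical, the toy values are made up, and whether the reduced vectors should be scaled by the singular values is a design choice left open here. A real PPMI model is large and sparse, where a sparse solver such as `scipy.sparse.linalg.svds` would be the more usual tool.

```python
import numpy as np


def truncated_svd(matrix, dim):
    """Return the top-`dim` left singular vectors and singular values.

    Hypothetical helper for illustration; not part of counterix.
    """
    # numpy returns singular values in descending order.
    U, S, _ = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :dim], S[:dim]


# Toy symmetric matrix standing in for a PPMI-weighted model.
ppmi_matrix = np.array([
    [0.0, 1.2, 0.0, 0.3],
    [1.2, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.9],
    [0.3, 0.0, 0.9, 0.0],
])
U, S = truncated_svd(ppmi_matrix, dim=2)
# One common option: scale each dimension by its singular value.
embeddings = U * S
```

Each row of `embeddings` is then a 2-dimensional vector for the corresponding word, in place of its original 4-dimensional PPMI row.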
