recski/wordsim

wordsim

Preparations

Building the components requires the build-essential and python-dev packages, which can be installed with sudo apt-get install build-essential python-dev. The setuptools package must also be installed for Python.

Dependencies

4lang

Install the newest version of 4lang. Notes:

  • downloadable pre-compiled graphs are sufficient
  • you don't have to modify the config files
  • set only the FOURLANGPATH and HUNTOOLSBINPATH environment variables
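
Before running anything, it may help to confirm that both variables are visible to Python. The check below is a minimal sketch and not part of wordsim or 4lang: the helper name missing_env_vars is hypothetical, and only the two variable names come from the notes above.

```python
import os

# Variable names taken from the 4lang notes; everything else is illustrative.
REQUIRED_VARS = ("FOURLANGPATH", "HUNTOOLSBINPATH")


def missing_env_vars(environ=None, required=REQUIRED_VARS):
    """Return the names in `required` that are unset or empty in `environ`."""
    if environ is None:
        environ = os.environ
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("FOURLANGPATH and HUNTOOLSBINPATH are both set.")
```

The check only confirms that the variables are non-empty. It does not verify that they point to valid 4lang or hunmorph installations.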

Additional libraries

Install the newest version of:

Resources

After preparing the resources you should get the following directory structure:

wordsim  
└───resources
    ├───embeddings
    │   ├───senna
    │   │   └───combined.txt
    │   ├───huang
    │   │   └───combined.txt
    │   ├───word2vec
    │   │   └───GoogleNews-vectors-negative300.bin
    │   ├───glove
    │   │   └───glove.840B.300d.w2v
    │   ├───sympat
    │   │   └───sp_plus_embeddings_500.w2v
    │   └───paragram_300
    │       └───paragram_300_sl999.txt
    └───sim_data
        └───simlex
            └───SimLex-999.txt
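
The layout above can be verified with a short script before starting a long run. This is a sketch under the assumption that it is executed from the wordsim root directory. The function name missing_resources is hypothetical, and the file list is copied from the tree above.

```python
import os

# Relative paths copied from the directory tree above.
EXPECTED_FILES = [
    "embeddings/senna/combined.txt",
    "embeddings/huang/combined.txt",
    "embeddings/word2vec/GoogleNews-vectors-negative300.bin",
    "embeddings/glove/glove.840B.300d.w2v",
    "embeddings/sympat/sp_plus_embeddings_500.w2v",
    "embeddings/paragram_300/paragram_300_sl999.txt",
    "sim_data/simlex/SimLex-999.txt",
]


def missing_resources(resources_dir, expected=EXPECTED_FILES):
    """Return the expected relative paths that are not files under resources_dir."""
    return [
        rel for rel in expected
        if not os.path.isfile(os.path.join(resources_dir, *rel.split("/")))
    ]


if __name__ == "__main__":
    # Assumes the current working directory is the wordsim root.
    for rel in missing_resources("resources"):
        print("missing: " + rel)
```

An empty output means every file in the tree was found. The script checks only that the files exist, not their contents or format.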

Embeddings

SimLex data

Usage

Run python src/wordsim/regression.py configs/default.cfg to run a regression over features from 6 embeddings (6 features), WordNet metrics (4 features), and 4lang (2 features), 12 features in total. The expected result is average correlation: 0.755074732764.
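
To make the feature bookkeeping and the notion of an averaged correlation concrete, here is a small illustration. It is not the code in regression.py. It assumes the reported figure is a correlation averaged over several evaluation folds, and it uses Pearson correlation as a stand-in, since the text above does not say which coefficient or averaging scheme is used. The fold data are made up.

```python
import math

# Feature groups as listed above: 6 embeddings, 4 WordNet metrics, 2 4lang features.
FEATURE_GROUPS = {"embeddings": 6, "wordnet": 4, "4lang": 2}
TOTAL_FEATURES = sum(FEATURE_GROUPS.values())


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_correlation(folds):
    """Mean correlation over folds; each fold is (predicted_scores, gold_scores)."""
    scores = [pearson(predicted, gold) for predicted, gold in folds]
    return sum(scores) / len(scores)


# Made-up predictions and gold similarity scores for two folds.
example_folds = [
    ([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]),
    ([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0]),
]

if __name__ == "__main__":
    print("total features:", TOTAL_FEATURES)
    print("average correlation:", average_correlation(example_folds))
```

In this toy input the first fold is perfectly correlated and the second is not, so the average falls between the two per-fold values.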

NOTE: wordsim requires ca. 15 GB of RAM to load all models

Citing

If you use the wordsim system in your experiments, please cite:

Gábor Recski, Eszter Iklódi, Katalin Pajkossy, András Kornai: Measuring semantic similarity of words using concept networks. In: Proceedings of the 1st Workshop on Representation Learning for NLP, 2016.

@InProceedings{Recski:2016c,
  author    = {Recski, G\'{a}bor  and  Ikl\'{o}di, Eszter  and  Pajkossy, Katalin  and  Kornai, Andr\'{a}s},
  title     = {Measuring Semantic Similarity of Words Using Concept Networks},
  booktitle = {Proceedings of the 1st Workshop on Representation Learning for NLP},
  year      = {2016},
  address   = {Berlin, Germany},
  publisher = {Association for Computational Linguistics},
  pages     = {193--200}
}
