dice-group/Palmetto

Palmetto

Palmetto is a quality measuring tool for topics

This repository contains an implementation of coherence calculations for evaluating the quality of topics. To learn more about coherence calculations and their meaning for topic evaluation, see the project page or our publication "Exploring the Space of Topic Coherence Measures".

Palmetto from DICE is licensed under an AGPL v3.0 license.

Please take a look at the wiki page to read how Palmetto can be used. If you would like to use a different index than the one we provide, you can create your own index.

If you are using Palmetto for an experiment or similar work that leads to a publication, please cite the paper "Exploring the Space of Topic Coherence Measures" (the BibTeX entry is given below). A link to the project website is welcome as well 🙂

Applicability

The coherence measures implemented in Palmetto mainly build on a reference index. This index is used to derive the counts needed for calculating the coherence values. These values can be used to measure the human interpretability of topics based on the topics' top words. Note that the preprocessing of the index influences the results.
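To illustrate how counts from a reference index can feed a coherence value, the following minimal sketch computes an average pairwise NPMI over a toy corpus. It is not Palmetto's implementation: the in-memory document list stands in for the index, and the names `doc_count` and `npmi_coherence` as well as the smoothing constant `eps` are assumptions made for this example.

```python
import math
from itertools import combinations

# Toy stand-in for a reference index: each document is a set of
# already preprocessed words. Real indexes are far larger.
DOCS = [
    {"apple", "banana", "fruit"},
    {"apple", "fruit", "juice"},
    {"banana", "fruit"},
    {"car", "engine", "wheel"},
    {"car", "wheel"},
    {"engine", "car", "apple"},
]


def doc_count(words, docs=DOCS):
    """Number of documents that contain all of the given words."""
    return sum(1 for d in docs if all(w in d for w in words))


def npmi_coherence(top_words, docs=DOCS, eps=1e-12):
    """Average NPMI over all pairs of top words, based on document counts."""
    n = len(docs)
    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1 = doc_count([w1], docs) / n
        p2 = doc_count([w2], docs) / n
        p12 = doc_count([w1, w2], docs) / n
        if p12 == 0:
            # The words never co-occur: use the lowest NPMI value.
            scores.append(-1.0)
            continue
        pmi = math.log(p12 / (p1 * p2))
        # eps avoids a division by zero when p12 == 1.
        scores.append(pmi / (-math.log(p12) + eps))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(npmi_coherence(["apple", "banana", "fruit"]))  # related words
    print(npmi_coherence(["apple", "car", "fruit"]))     # mixed words
```

In this toy setting the related word group scores higher than the mixed one, which is the behaviour a coherence measure is meant to capture.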

It is highly recommended to use an index that matches the preprocessing that has been applied to the corpus on which the topics have been generated.

The index we provide is built from the English Wikipedia, which has been preprocessed with a lemmatizer. In practice, this means that word groups containing non-lemmatized words may lead to unintuitive results simply because these word forms are underrepresented or even missing in our index (e.g., #57). In these cases, it is recommended to create your own index.
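The effect of a preprocessing mismatch can be sketched with a small check that flags top words which are rare or absent in a lemmatized index. The document frequencies below are invented toy numbers, and `underrepresented` is a hypothetical helper, not part of Palmetto.

```python
# Hypothetical document frequencies taken from a lemmatized index (toy numbers).
LEMMATIZED_DOC_FREQ = {"run": 120, "city": 300, "child": 210, "game": 180}


def underrepresented(top_words, doc_freq=LEMMATIZED_DOC_FREQ, min_count=1):
    """Return the top words whose document frequency is below min_count."""
    return [w for w in top_words if doc_freq.get(w, 0) < min_count]


if __name__ == "__main__":
    # "running" and "cities" are inflected forms that a lemmatized
    # index stores as "run" and "city", so they are flagged here.
    topic = ["running", "cities", "child", "game"]
    print(underrepresented(topic))
```

Words flagged this way contribute zero or near-zero counts, which is one way the unintuitive coherence values described above can arise.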

Directories

The palmetto directory contains the Palmetto library.

The webApp directory contains a web application offering a small demo as well as a web service API for using Palmetto.

Docker

Palmetto can be used as a Docker container.

The index should be downloaded and extracted to some path (for example, /path/to/indexes). After extraction, the directory should contain the wikipedia_bd directory and the wikipedia_bd.histogram file.

path
+- to
  +- indexes
    +- wikipedia_bd
    +- wikipedia_bd.histogram
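Before starting the container, the expected layout can be verified with a short script. This is a minimal sketch: `check_index_dir` is a hypothetical helper, and `/path/to/indexes` is the placeholder path used above.

```python
from pathlib import Path


def check_index_dir(index_root):
    """Return a list of problems found in the index directory layout."""
    root = Path(index_root)
    problems = []
    if not (root / "wikipedia_bd").is_dir():
        problems.append("missing directory: wikipedia_bd")
    if not (root / "wikipedia_bd.histogram").is_file():
        problems.append("missing file: wikipedia_bd.histogram")
    return problems


if __name__ == "__main__":
    # Replace the placeholder with the directory the index was extracted to.
    for problem in check_index_dir("/path/to/indexes"):
        print(problem)
```

An empty result means both the `wikipedia_bd` directory and the `wikipedia_bd.histogram` file were found.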

Once the index is in place, the container can be started as follows:

docker run -p 7777:8080 -d -v /path/to/indexes/:/usr/local/indexes/:ro dicegroup/palmetto-service

The demo application is then available at http://localhost:7777/.
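Since the web application also offers a web service API, a client could request coherence values from the running container. The sketch below rests on several assumptions that are not confirmed by this README and should be checked against the wiki: the service path `/service`, the measure name `cv`, the `words` query parameter with space-separated words, and a plain-text numeric response. Only the host and port are taken from the `docker run` command above.

```python
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:7777"  # host port from the docker run command
SERVICE_PATH = "/service"           # ASSUMPTION: path of the web service API


def coherence_url(measure, words, base_url=BASE_URL):
    """Build the request URL for one coherence measure and one word group."""
    # ASSUMPTION: words are passed space-separated in a "words" parameter.
    query = quote(" ".join(words))
    return f"{base_url}{SERVICE_PATH}/{quote(measure)}?words={query}"


def fetch_coherence(measure, words, base_url=BASE_URL, timeout=30):
    """Request a coherence value from a running container."""
    with urlopen(coherence_url(measure, words, base_url), timeout=timeout) as resp:
        # ASSUMPTION: the response body is a single number in plain text.
        return float(resp.read().decode("utf-8").strip())


if __name__ == "__main__":
    # Only builds and prints the URL; call fetch_coherence once the
    # container is running and the assumptions above have been verified.
    print(coherence_url("cv", ["apple", "banana", "fruit"]))
```

Keeping URL construction separate from the network call makes it easy to adjust the path or parameter names if they differ from the assumptions.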

Adapted Docker image

If the Palmetto code has been adapted locally, the Docker image can be built with the following command:

make build dockerize

Citation

@inproceedings{roeder2015palmetto,
    title = {{Exploring the Space of Topic Coherence Measures}},
    author = {R\"{o}der, Michael and Both, Andreas and Hinneburg, Alexander},
    year = {2015},
    isbn = {9781450333177},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/2684822.2685324},
    doi = {10.1145/2684822.2685324},
    booktitle = {Proceedings of the Eighth ACM International Conference on Web Search and Data Mining},
    pages = {399--408},
    numpages = {10},
    keywords = {topic coherence, topic evaluation, topic model},
    location = {Shanghai, China},
    series = {WSDM '15}
}