How Palmetto can be used
If you use Palmetto for an experiment or similar work that leads to a publication, please cite the paper "Exploring the Space of Topic Coherence Measures", which you can find on the project website.
Palmetto can be used in three different ways.
As web service
You only want to evaluate your topics or word sets? Then you should simply program a client for the REST interface of our web service. The coherence for a word set can be requested using a URL of the form
<words> are the space-separated words and <coherence> is the name of the coherence. At the moment, the following values can be used:
umass
The response contains the coherence.
If you want to request the C_V coherence for the word set "cake","apple","banana","cherry","chocolate", the URL should look like this
and the response should be
An alternative URL that can be used is
Thanks to Ivan Ermilov, there is a Python client available at https://github.com/earthquakesan/palmetto-py
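For readers who prefer to write their own client, the following is a minimal Python sketch of how such a request URL could be assembled. It only illustrates joining the words with spaces and escaping them. The function name build_coherence_url, the placeholder host, and the path layout <base_url>/<coherence>?words=<words> are assumptions made for this example, so check them against the URL form given by the service before relying on them.

```python
from urllib.parse import quote, urlencode


def build_coherence_url(base_url, coherence, words):
    """Build a request URL for one word set.

    ASSUMPTION: the layout <base_url>/<coherence>?words=<words> is
    hypothetical and must be checked against the real service.
    """
    if not words:
        raise ValueError("the word set must not be empty")
    # The words travel as one space-separated string; urlencode escapes
    # the spaces (as '+') and any characters that are not URL-safe.
    query = urlencode({"words": " ".join(words)})
    return f"{base_url.rstrip('/')}/{quote(coherence, safe='')}?{query}"


# Placeholder host for illustration only, not the address of the service.
url = build_coherence_url(
    "http://localhost:8080/service",
    "umass",
    ["cake", "apple", "banana", "cherry", "chocolate"],
)
print(url)
```

The resulting string can be passed to any HTTP client; the response body then contains the coherence.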
As Java program
You would like to use Palmetto locally? No problem, it can be built as a runnable jar.
1. Download and extract the index
You will have to download a Lucene index containing the preprocessed Wikipedia from here. After extracting the files, you should get a wikipedia_bd directory and a wikipedia_bd.histogramm file. Note that the file has to be in the same directory as the
There is a Dutch index that has been created by van der Zwaan, Marx and Kamps. It can be downloaded here.
2. Download the program
You can either download the runnable jar file from here, or you can check out the master branch and build it yourself using
cd palmetto
mvn clean compile assembly:single
3. Run Palmetto
The program can be started using
java -jar palmetto-0.1.0-jar-with-dependencies.jar <some-path>/wikipedia_bd <coherence> <topics-file>
You have to insert the path to the wikipedia_bd directory (the program will assume that the histogramm file can be found under
The last two parameters are the coherence type and a file containing your topics (see below).
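If you want to start the jar from a script, the command line above can be assembled programmatically. The following Python sketch shows one way to do this. The helper names build_palmetto_command and run_palmetto are hypothetical and not part of Palmetto, and the sketch assumes that java is available on your PATH.

```python
import subprocess
from pathlib import Path


def build_palmetto_command(jar_path, index_dir, coherence, topics_file):
    """Return the argument list for the command line shown above."""
    return [
        "java", "-jar", str(jar_path),
        str(index_dir),    # path to the wikipedia_bd directory
        coherence,         # coherence type
        str(topics_file),  # file with one topic per line
    ]


def run_palmetto(jar_path, index_dir, coherence, topics_file):
    """Run the jar and return whatever it prints to standard output."""
    # Fail early with a clear error instead of a Java stack trace.
    for path in (jar_path, index_dir, topics_file):
        if not Path(path).exists():
            raise FileNotFoundError(path)
    cmd = build_palmetto_command(jar_path, index_dir, coherence, topics_file)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems when the index path contains spaces.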
At the moment, there are six common coherence types that you can run directly with this jar.
The file containing your topics should have one single topic per line. In every line the top words of your topic are listed, separated by a single space. Your file should look like this:
company sell corporation own acquire purchase buy business sale owner
age population household female family census live average median income
The jar will simply print out the topics' coherences.
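If your topics come out of a topic model as lists of words, the topics file described above can be generated with a few lines of code. The following Python sketch writes one topic per line with the words separated by a single space, and reads the file back for a sanity check. The function names and the file name topics.txt are chosen for this example only.

```python
import tempfile
from pathlib import Path


def write_topics_file(topics, path):
    """Write one topic per line, top words separated by a single space."""
    # Validate everything first so that no partial file is written.
    for words in topics:
        if not words:
            raise ValueError("a topic needs at least one word")
        for word in words:
            if not word or any(ch.isspace() for ch in word):
                raise ValueError(f"invalid word in topic: {word!r}")
    with open(path, "w", encoding="utf-8") as handle:
        for words in topics:
            handle.write(" ".join(words) + "\n")


def read_topics_file(path):
    """Read the file back into a list of word lists, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.split() for line in handle if line.strip()]


topics = [
    ["company", "sell", "corporation", "own", "acquire",
     "purchase", "buy", "business", "sale", "owner"],
    ["age", "population", "household", "female", "family",
     "census", "live", "average", "median", "income"],
]

# Write to a temporary directory and read the file back as a check.
with tempfile.TemporaryDirectory() as tmp:
    topics_path = Path(tmp) / "topics.txt"
    write_topics_file(topics, topics_path)
    round_trip = read_topics_file(topics_path)
```

Rejecting words that contain whitespace keeps a multi-word term from being silently split into several words when the file is read.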
As Java library
You want to include Palmetto into your own project? You can check out the last stable version using
git clone -b v0.1.1 https://github.com/dice-group/Palmetto.git
install it locally using
cd Palmetto
mvn clean install
and add it as a Maven dependency
<dependency>
  <groupId>org.aksw</groupId>
  <artifactId>palmetto</artifactId>
  <version>0.1.1</version>
</dependency>
Another way is to download the necessary files from here:
palmetto-0.1.0-sources.jar (optional)
If you are using Maven, you can install these files to your local repository using
mvn install:install-file -Dfile=./target/palmetto-0.1.0.jar -Dpackaging=jar -Djavadoc=./target/palmetto-0.1.0-javadoc.jar -Dsources=./target/palmetto-0.1.0-sources.jar
If you want to know how to use the coherence inside your source code, you should 1) read the paper to understand the parts a coherence comprises and 2) take a look into the