A fast version of the Rapid Automatic Keyword Extraction (RAKE) algorithm
Assuming you're using Maven, follow these two steps to use rakidrake
in your Java project:
- Include a dependency on
rapidrake
in your POM:
<dependency>
<groupId>io.github.crew102</groupId>
<artifactId>rapidrake</artifactId>
<version>0.1.4</version>
</dependency>
- Download the
opennlp
trained models for sentence detection and part-of-speech tagging. You can find these two models (trained on various languages) on opennlp's model page. For example, you could use the English versions of the sentence detection and POS-tagger models. You'll specify the file paths to these models when you instantiate aRakeAlgorithm
object (see below for example).
import io.github.crew102.rapidrake.RakeAlgorithm;
import io.github.crew102.rapidrake.data.SmartWords;
import io.github.crew102.rapidrake.model.RakeParams;
import io.github.crew102.rapidrake.model.Result;
public class Example {
public static void main(String[] args) throws java.io.IOException {
// Create an object to hold algorithm parameters
String[] stopWords = new SmartWords().getSmartWords();
String[] stopPOS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"};
int minWordChar = 1;
boolean shouldStem = true;
String phraseDelims = "[-,.?():;\"!/]";
RakeParams params = new RakeParams(stopWords, stopPOS, minWordChar, shouldStem, phraseDelims);
// Create a RakeAlgorithm object
// You can use the RakeAlgorithm(RakeParams, POSTaggerME, SentenceDetectorME)
// constructor instead of the one shown below if you want to pass in
// pre-initialized opennlp models.
String POStaggerURL = "model-bin/en-pos-maxent.bin"; // The path to your POS tagging model
String SentDetectURL = "model-bin/en-sent.bin"; // The path to your sentence detection model
RakeAlgorithm rakeAlg = new RakeAlgorithm(params, POStaggerURL, SentDetectURL);
// Call the rake method
String txt = "dogs are great, don't you agree? I love dogs, especially big dogs";
Result result = rakeAlg.rake(txt);
// Print the resulting keywords (not stemmed)
System.out.println(result.distinct());
}
}
// [dogs (1.33), great (1), big dogs (3.33)]
You can learn more about how RAKE works and the various parameters you can set by visiting slowraker
's website.