RapidMiner Diffbot Extension
A Diffbot client for RapidMiner 6.1 or above for analyzing web pages. It supports the following Diffbot automatic APIs: analyze (general automatic API wrapper) and article (analysis of article web pages; its support in RapidMiner is experimental).
### Prerequisites
RapidMiner Studio 6.1 with Text Processing. The Starter license is sufficient.
Install the Diffbot extension from the RapidMiner marketplace (or by copying the plugin to the
`lib/plugins` folder of RapidMiner Studio or RapidMiner Server). It requires the Text Processing (and the included Cloud Connectivity) plugins.
Use the token you received from Diffbot to analyze web pages with the operators available under
`Text Processing/Diffbot`. The results are presented as JSON documents. To extract information in tabular form, you can use the
`JSON To Data` operator.
### Development getting started
1. Check out RapidMiner (e.g. to `~/git/rapidminer`; https://github.com/aborg0/rapidminer/tree/extension_java7 is the preferred branch).
2. Install RapidMiner Studio 6.1 (e.g. to `~/rapidminer-studio`).
3. Run the `./setup.sh` script like this:
   `RM_SOURCES=$HOME/git/rapidminer RAPIDMINER_HOME=$HOME/rapidminer-studio ./setup.sh`. It will create a folder named `RM_61` in the parent folder (required for
4. Build and install your extension by executing the Ant target "install".
5. Start RapidMiner and check whether your extension has been loaded.
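
The development steps above can be sketched as a single shell script. This is only an illustration under a few assumptions: the clone URL is inferred from the GitHub link above, the script is run from the extension's root folder (where `setup.sh` lives), and the `run` helper and `DRY_RUN` variable are hypothetical names introduced here. By default it only prints the commands; set `DRY_RUN=0` to actually execute them.

```shell
#!/bin/sh
# Sketch of the development setup; paths are the examples used in this README.
set -e

RM_SOURCES="${RM_SOURCES:-$HOME/git/rapidminer}"               # RapidMiner sources checkout
RAPIDMINER_HOME="${RAPIDMINER_HOME:-$HOME/rapidminer-studio}"  # RapidMiner Studio 6.1 install
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 = print only, 0 = execute

# Print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Check out the preferred branch (clone URL is an assumption based on the link above).
run git clone --branch extension_java7 https://github.com/aborg0/rapidminer.git "$RM_SOURCES"

# RapidMiner Studio 6.1 must already be installed in $RAPIDMINER_HOME (manual step).

# Run setup.sh with both locations set.
run env RM_SOURCES="$RM_SOURCES" RAPIDMINER_HOME="$RAPIDMINER_HOME" ./setup.sh

# Build and install the extension with the Ant target "install".
run ant install
```

Reviewing the printed commands first makes it easy to adjust the two paths before anything is cloned or built.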
If you prefer, you can replace the diffbot-java library (the file
`lib/diffbot-java-1.0-SNAPSHOT.jar`) with the version you want to use. The current version was compiled from https://github.com/aborg0/diffbot-java-client/releases/tag/vknime0.1