RefineOnSpark

RefineOnSpark is a driver program that runs OpenRefine jobs on a Spark cluster.

  1. Prerequisites on the cluster

  • An instance of OpenRefine is running and bound to the default address, localhost:3333.
  • Input files are served via HDFS. Local files are also accepted, but they must be located under the same path on all worker nodes.
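
The two prerequisites above can be checked before submitting a job. The following is a minimal sketch, not part of RefineOnSpark: the class `PrereqCheck`, its method names, and the sample path `hdfs://namenode:8020/data/input.csv` are hypothetical and used for illustration only. It assumes that a successful TCP connection to localhost:3333 means OpenRefine is up (it only proves that something is listening on that port) and that HDFS inputs are given as `hdfs://` URIs.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.URI;

/** Hypothetical pre-flight helper; not part of RefineOnSpark. */
final class PrereqCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host could not be resolved.
            return false;
        }
    }

    /** Returns true if the input is given as an hdfs:// URI (assumed convention). */
    static boolean isHdfsInput(String input) {
        return "hdfs".equalsIgnoreCase(URI.create(input).getScheme());
    }

    public static void main(String[] args) {
        // Placeholder input path; replace with the real input location.
        String input = args.length > 0 ? args[0] : "hdfs://namenode:8020/data/input.csv";

        boolean up = isListening("localhost", 3333, 2000);
        System.out.println("localhost:3333 is " + (up ? "reachable" : "not reachable"));

        if (isHdfsInput(input)) {
            System.out.println("Input is on HDFS.");
        } else {
            System.out.println("Input is treated as a local file; "
                    + "it must exist under the same path on every worker node.");
        }
    }
}
```

Since each worker needs its own OpenRefine instance and, for local inputs, its own copy of the file, a check like this would have to run on every worker node, not only on the machine that submits the job.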
  2. Application taxonomy

TODO