Using Jython as your Expression Language
Full docs on the Jython language are at its official site http://www.jython.org.
Note: The Jython extension has been bundled with OpenRefine since 2.1. Before that it was an extension which needed to be installed separately.
Note: You can use almost any Python file (.py or .pyc) that is compatible with the bundled Jython 2.5.1 by dropping it into the path. For instance, download and extract BeautifulSoup.py, drop it in, and use it to parse HTML and extract tags or content with Jython as your expression language in OpenRefine. Since Jython runs on the Java Virtual Machine, you can even import and use Java libraries!
OpenRefine now has most of the Jsoup.org library built in for parsing HTML, working with HTML elements, and extracting content.
Remember to restart OpenRefine, so that new Jython/Python libraries are initialized during Butterfly's startup.
A few HTML parsing Python libraries to experiment with:
A few XML parsing Python libraries:
- ElementTree (bundled with Jython in Refine)
- lxml will NOT work in Jython: it relies on C bindings for CPython (regular Python), and OpenRefine runs only Jython on Java, with no CPython interpreter built in.
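Since ElementTree is bundled, a minimal sketch of using it on a cell that holds an XML fragment might look like the following. The `expression` wrapper, the sample XML, and the `title` element are hypothetical and exist only so the snippet runs outside OpenRefine; in the expression editor you would enter just the indented body.

```python
# Hypothetical wrapper standing in for the function OpenRefine builds
# around a Jython expression; only the body goes into the expression editor.
def expression(value):
    # Import inside the body, as you would within an OpenRefine expression.
    from xml.etree import ElementTree
    # Parse the cell value as XML and return the text of its <title> child.
    root = ElementTree.fromstring(value)
    return root.findtext("title")

# Sample cell value, assumed for illustration.
sample = "<book><title>Dune</title><author>Frank Herbert</author></book>"
result = expression(sample)
```

`findtext` returns `None` when the element is missing, so cells without a `title` element would come back empty rather than raising an error.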
Expressions in Jython must have a return statement:
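As a minimal sketch: OpenRefine appears to treat a Jython expression as the body of a function, so the cell receives whatever `return` hands back. The `expression` wrapper and the sample string below are hypothetical, added only so the snippet runs outside OpenRefine; in the expression editor you would enter just the indented body.

```python
# Hypothetical wrapper standing in for the function OpenRefine builds
# around a Jython expression; only the body is typed into OpenRefine.
def expression(value):
    # Drop the first and last character of the cell value.
    return value[1:-1]

# Sample cell value, assumed for illustration.
result = expression('"quoted"')
```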
Fields have to be accessed using the bracket operator rather than the dot operator:
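A small sketch of bracket access, using plain dictionaries as a stand-in for the `cells` variable that OpenRefine supplies. The column names `col1` and `col2`, the sample values, and the `expression` wrapper are all hypothetical; inside OpenRefine only the `return` line would be entered.

```python
# Hypothetical wrapper; OpenRefine supplies `value` and `cells` itself.
def expression(value, cells):
    # Bracket operator: cells["col1"]["value"], not cells.col1.value
    return cells["col1"]["value"]

# Plain dicts standing in for a row's cells (assumed data).
cells = {"col1": {"value": "Paris"}, "col2": {"value": "France"}}
result = expression("France", cells)
```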
To access the Levenshtein distance between the reconciled value and the cell value (?) use the Recon variable:
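One way this might look, as a sketch only: the field names `features` and `nameLevenshtein` are taken from OpenRefine's documented recon variables and should be checked against your version, and the assumption that an unreconciled cell has a recon of `None` is mine. The `expression` wrapper and the sample cells are hypothetical stand-ins so the snippet runs outside OpenRefine.

```python
# Hypothetical wrapper; OpenRefine supplies `value` and `cell` itself.
def expression(value, cell):
    recon = cell["recon"]
    # Assumption: cells that were never reconciled carry no recon data.
    if recon is None:
        return None
    # Assumed field names, based on OpenRefine's recon variable docs.
    return recon["features"]["nameLevenshtein"]

# Plain dicts standing in for OpenRefine cells (assumed data).
reconciled = {"value": "Pariss", "recon": {"features": {"nameLevenshtein": 1}}}
unreconciled = {"value": "x", "recon": None}

distance = expression(reconciled["value"], reconciled)
missing = expression(unreconciled["value"], unreconciled)
```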
To return the lower case of value (if the value is not null):
    if value is not None:
        return value.lower()
    else:
        return None