GREL Other Functions

Thad Guidry edited this page Sep 23, 2016 · 10 revisions
Clone this wiki locally

Other functions supported by the OpenRefine Expression Language (GREL)

See also: All GREL functions.

type(o)

Returns the type of o, such as undefined, string, number, etc.

hasField(o, string name)

Returns a boolean indicating whether o has a field called name. For example, cell.hasField("value") always returns true, as every cell has a value field.

jsonize(value)

Quotes a value as a JSON literal value.

parseJson(string s)

Parses s as JSON. get can then be used with parseJson, e.g.,

parseJson(" { 'a' : 1 } ").get("a")

returns 1.

To get all instances from a JSON array called "keywords" having the same object string name of "text", combine with the forEach() function to iterate over the array, e.g.,

JSON:

{
   "status":"OK",
   "url":"",
   "language":"english",
   "keywords":[
      {
         "text":"York en route",
         "relevance":"0.974363"
      },
      {
         "text":"Anthony Eden",
         "relevance":"0.814394"
      },
      {
         "text":"President Eisenhower",
         "relevance":"0.700189"
      }
   ]
}

GREL expression:

forEach(value.parseJson().keywords,v,v.text).join(":::")

will output like this:

York en route:::Anthony Eden:::President Eisenhower

cross(cell c, string projectName, string columnName)

Returns an array of zero or more rows in the project projectName for which the cells in their column columnName have the same content as cell c (similar to a lookup).

NOTE: There is also a VIB-BITS OpenRefine extension that automates some of the GREL cross() functionality for ease of use with its "Add column(s) from other projects..." function.

Consider 2 projects with the following data:

My Address Book

friend address
john 120 Main St.
mary 50 Broadway Ave.
anne 17 Morning Crescent

Christmas Gifts

gift recipient
lamp mary
clock john

Now in the project "Christmas Gifts", we want to add a column containing the addresses of the recipients, which are in project "My Address Book". So we can invoke the command Add column based on this column on the "recipient" column in "Christmas Gifts" and enter this expression:

  cell.cross("My Address Book", "friend")[0].cells["address"].value

When that command is applied, the result is

Christmas Gifts

gift recipient address
lamp mary 50 Broadway Ave.
clock john 120 Main St.

facetCount(choiceValue, string facetExpression, string columnName)

Returns the facet count corresponding to the given choice value

gift recipient price
lamp mary 20
clock john 54
watch amit 80
clock claire 62

If we wanted to show how many of each gift we had given, we could use the following expression:

  facetCount(value, "value", "gift")

This would then complete the table as so:

gift recipient price count
lamp mary 20 1
clock john 54 2
watch amit 80 1
clock claire 62 2

Jsoup HTML parsing functions


A tutorial of the below HTML functions is shown here.

parseHtml(string s)

returns a full HTML document and adds any missing closing tags.

You can then extract or .select() which portions of the HTML document you need for further splitting, partitioning, etc.

An example of extracting all table rows <tr> from a <div> using parseHtml().select() together is described more in Extract HTML attributes, text, links with integrated GREL commands.

select(Element e, String s)

returns an element from an HTML doc element using selector syntax.

Example:

value.parseHtml().select("div#content")[0].select("tr").toString()

This function can be used with most of the Jsoup selector syntax as documented here: http://jsoup.org/cookbook/extracting-data/selector-syntax

htmlAttr(Element e, String s)

returns a value from an attribute on an HTML Element.

htmlText(Element e)

returns the text from within an element (including all child elements).

innerHtml(Element e)

returns the innerHtml of an HTML element.

outerHtml(Element e)

returns the outerHtml of an HTML element. (Note: outerHtml Indicates the rendered text and HTML tags including the start and end tags, of the current element.)

ownText(Element e)

Gets the text owned by this HTML element only; does not get the combined text of all children.