FAQ

Thad Guidry edited this page Jun 3, 2016 · 27 revisions
Clone this wiki locally

Frequently Asked Questions

This is a collection of frequently asked questions on OpenRefine. Feel free to ask your own question on the OpenRefine mailing list and we'll try to answer to the best of our abilities and add them to this list.

OpenRefine does not start after clicking the .exe, it only opens and closes a window

  • Ensure that you have a Java JRE installed on your system. And have at least 1 GB of RAM available for it.

Out of Memory Errors - Feels Slow - Could not reserve enough space for object heap

Where is my data stored?

I have a question. Where do I ask?

Send your question to the OpenRefine mailing list.

I've found a bug or want a new feature. What should I do?

Consider first discussing it on the mailing list. This will likely help characterize the issue for a good quality bug report or feature request which you can file on the issue tracker.

How do I change the workspace directory that I want Refine to use for its project storage ?

  • On Linux, If you run Refine from the terminal, you can point to the workspace directory through the -d parameter, e.g.,
  ./refine -p 3333 -i 0.0.0.0 -m 6000M -d /where/you/want/the/workspace
  KEY = refine.data_dir
  VALUE = T:\MyOpenRefineDataFolder

How do I change the IP address that OpenRefine uses?

On Linux, Mac from the command line,

./refine -i 127.0.0.1.

On Windows use a slash character like,

C:>refine /i 127.0.0.1:8088)

How do I change the Port that OpenRefine uses?

On Linux, Mac from the command line,

./refine -i 127.0.0.1 -p 3334

On Windows, use a slash character like,

C:>refine /i 127.0.0.1 /p 3334)

You can also edit the refine.ini file to permanently set the IP Address and Port.

I am having trouble connecting to OpenRefine with my browser

You might need to double check your Chrome or Firefox proxy settings. In Firefox options->advanced->network->connection->settings and switch from "use system proxy settings" to "auto-detect proxy settings for this network".

If you get a message "Network Error (tcp_error)" in your browser, you might also try to uncheck "automatically detect settings" and also add an exception to your firewall rules to allow 127.0.0.1 (or whatever IP address you decide to run OpenRefine with)

What syntax of regular expression (regex) does OpenRefine support?

The regular expression syntax for GREL is that of Java regex, not of Javascript. See GREL Regular Expressions.

You can also use Jython Regex instead of GREL functions and use a Custom Text Facet with something like this:

import re
g = re.search(ur"\u2014 (.*),\s*BWV", value)
return g.group(1)

What syntax should I use with GREL for constructing URLs correctly and avoid HTTP errors and other pitfalls, for instance, when working with JSON strings within a URL or to create a HYPERLINK, etc ?

A good practice is to use ' single quotes for your Refine Expression syntax and reserve " double quotes for the URL syntax parts. Also make sure to escape() your cell values used, where necessary.

EXAMPLES:

'https://www.googleapis.com/freebase/v1/mqlread?query={"mid":null,"/type/object/key":{"namespace":"/authority/fmd/model","value":"'+escape(cells.ModelName.value, "url")+'"}}'
'=HYPERLINK("http://listings.listhub.net/pages/BHAMMLSAL/' + value + '",' + value + ')'

How can I delete a whole row or several rows?

  1. Flag (or star) the row(s)
  2. In the dropdown above the flag you can get a facet, by going to Facet > Facet by flag.
  3. From the facet that opens select the 'true' option.
  4. In the dropdown menu above the flag you can go to Edit Rows > Remove all matching rows.

How do I make a Text Facet show more than 2000 choices?

You can go to http://127.0.0.1:3333/preferences and set the facet limit using the preference key "ui.browsing.listFacet.limit".

How do I find duplicates in a column?

Several options:

  • There is a shortcut for this, Facet → Customized facets → Duplicates facet
  • Create a Text Facet on a column and then in the facet click "Sort by: count". Any facets with a count of 2 or more are duplicates
  • Use the facetCount() function like facetCount(value, 'value', 'column name') > 1 and select 'true' to show all rows that have duplicates

Can OpenRefine be used as a piece of a larger ETL pipeline?

You can use one of the OpenRefine client libraries for automating OpenRefine programatically.

It's worth pointing out that not all Refine features can work unsupervised and without human interaction (clustering, for example), but some can.

Here is some further discussion and a project: