Martin Magdinier edited this page Aug 29, 2023 · 20 revisions

Frequently Asked Questions

This is a collection of frequently asked questions on OpenRefine. Feel free to ask your own question on the OpenRefine mailing list and we'll try to answer to the best of our abilities and add them to this list.

Can I host OpenRefine for others to access?

OpenRefine has no built-in security for multi-user or multi-tenant scenarios. It has a single data model that is not shared, so one user's column operations risk being overwritten by another's, and users must take care. Having said that, if you are inclined to proceed at your own risk, you can get some security by using a proxy.

Notes about this were discussed on our mailing list at one time HERE.

OpenRefine does not start after clicking the .exe, it only opens and closes a window

Ensure that you have a Java JRE installed on your system and at least 1 GB of RAM available for it.

Only shows Logo when starting, or Open Project doesn't work, or shows mixed up HTML content in my browser

This might be due to a path naming issue. See https://github.com/OpenRefine/OpenRefine/issues/3277. Are there special characters in your Windows user path? What is the output in a Windows terminal when you run echo %USERPROFILE%?

Or this might be due to an OpenRefine extension that was installed previously. See https://github.com/OpenRefine/OpenRefine/issues/3326. Extensions are installed in the OpenRefine folder under AppData, which is also the default location of the workspace that holds your project folders and files. To avoid losing your projects while deleting an extension, follow these steps in order:

  1. Back up the OpenRefine folder under your AppData folder. (Use Zip or another archive tool that you have on Windows, such as 7z, then move the .zip file to a safe location, or additionally upload it to the cloud somewhere.)
  2. Delete the OpenRefine folder under your AppData folder.
  3. Start OpenRefine, which will recreate the folder and workspace folder within it.
  4. Optionally, move your workspace.json file and selected project folders back into place from your previously saved .zip backup file.
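The backup in step 1 can also be scripted. The following is a minimal sketch, not an official OpenRefine tool: `backup_folder` is a hypothetical helper name, and the demo runs against a throwaway temporary folder rather than your real AppData location, which you would need to supply yourself.

```python
import shutil
import tempfile
from pathlib import Path


def backup_folder(folder: Path, backup_dir: Path) -> Path:
    """Zip `folder` into `backup_dir` and return the path of the archive.

    `backup_dir` should live outside `folder`, so the archive does not
    end up inside the folder that is being archived.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    archive_base = backup_dir / (folder.name + "-backup")
    # make_archive appends ".zip" and returns the archive path as a string.
    archive = shutil.make_archive(str(archive_base), "zip", root_dir=str(folder))
    return Path(archive)


if __name__ == "__main__":
    # Demo on a throwaway folder so that nothing real is touched.
    with tempfile.TemporaryDirectory() as tmp:
        fake_workspace = Path(tmp) / "OpenRefine"  # stand-in for the real folder
        fake_workspace.mkdir()
        (fake_workspace / "workspace.json").write_text("{}")
        print(backup_folder(fake_workspace, Path(tmp) / "backups"))
```

Only delete the original folder (step 2) after confirming that the archive exists and opens correctly.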

Or this might arise because you are running Java 16.0.1, which is not currently supported. You can try setting your JAVA_HOME to a lower version of Oracle's Java JRE, or use the prebuilt Temurin OpenJDK binaries from Adoptium.net. Alternatively, Windows users can try downloading our Windows kit with embedded Java, which does not depend on any Java installation.

HTTP ERROR 500 Butterfly Error when starting

This can happen for a few reasons, but the most likely one is an older extension that is not compatible with the OpenRefine version you are trying to run. Delete the extension folder and try starting OpenRefine again.

Out of Memory Errors - Feels Slow - Could not reserve enough space for object heap

OpenRefine needs enough computer memory to work effectively. As a general rule, the larger your data set, the more memory OpenRefine needs. The amount of memory available to OpenRefine is a setting that you can change. If you are getting "out of memory" errors (java.lang.OutOfMemoryError), or generally feel that OpenRefine is slow, you can try allocating more memory to OpenRefine.

I have a question. Where do I ask?

Send your question to the OpenRefine mailing list.

I've found a bug or want a new feature. What should I do?

Consider first discussing it on the mailing list. This will likely help characterize the issue for a good quality bug report or feature request which you can file on the issue tracker.

Where is my data stored?

OpenRefine project data is stored in the 'workspace directory'. A default workspace directory is set up on your local computer when you first run OpenRefine, or you can set it yourself through a setting. For more information read Where is the data stored?.

How do I change the workspace directory that OpenRefine uses for its project storage?

  • On Linux, if you run OpenRefine from the terminal, you can point to the workspace directory through the -d parameter, e.g.,
  ./refine -p 3333 -i 0.0.0.0 -m 6000M -d /where/you/want/the/workspace
  • On Windows, add this line to the file openrefine.l4j.ini, then save:

-Drefine.data_dir=T:\MyOpenRefineDataFolder

(Of course, replace T:\MyOpenRefineDataFolder with your actual directory)

How do I change the IP address that OpenRefine uses?

On Linux or Mac, from the command line:

./refine -i 127.0.0.1

On Windows, use a slash character instead of a dash:

C:\>refine /i 127.0.0.1

How do I change the Port that OpenRefine uses?

On Linux or Mac, from the command line:

./refine -i 127.0.0.1 -p 3334

On Windows, use a slash character instead of a dash:

C:\>refine /i 127.0.0.1 /p 3334

You can also edit the refine.ini file to set the IP address and port permanently.

I am having trouble connecting to OpenRefine with my browser

  • You might need to double-check your Chrome or Firefox proxy settings. In Firefox, go to Options > Advanced > Network > Connection > Settings and switch from "use system proxy settings" to "auto-detect proxy settings for this network".

  • If you get the message "Network Error (tcp_error)" in your browser, you might also try unchecking "automatically detect settings" and adding an exception to your firewall rules to allow 127.0.0.1 (or whatever IP address you decide to run OpenRefine with).

  • On Windows, OpenRefine sometimes looks like it is starting up but cannot be reached at 127.0.0.1. In that case, you might try configuring OpenRefine to run on a different IP address and port.

  • On Windows, you might be missing the Loopback Adapter for some reason - see https://github.com/datacarpentry/OpenRefine-ecology-lesson/issues/29

What syntax of regular expression (regex) does OpenRefine support?

The regular expression syntax for GREL is that of Java regex, not of JavaScript. See GREL Regular Expressions.

You can also use Jython regular expressions instead of GREL functions, for example in a Custom Text Facet with something like this:

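import re
# Capture the text between the em dash and ", BWV"
g = re.search(ur"\u2014 (.*),\s*BWV", value)
# Return None instead of raising an error when the cell does not match
return g.group(1) if g else None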
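To try the pattern outside OpenRefine first, here is a standalone sketch for Python 3. It is an illustration only: `extract_title` is a hypothetical helper name, the sample cell value is made up, and the pattern is written without the `ur""` prefix because Python 3 does not accept it (Jython 2.7 inside OpenRefine does).

```python
import re

# Same pattern as in the Jython expression, written for Python 3.
PATTERN = re.compile("\u2014 (.*),\\s*BWV")


def extract_title(value):
    """Return the text between the em dash and ', BWV', or None if absent."""
    match = PATTERN.search(value)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical cell value used only for this demonstration.
    sample = "J.S. Bach \u2014 Toccata and Fugue in D minor, BWV 565"
    print(extract_title(sample))
    print(extract_title("no dash here"))
```

Returning None for non-matching cells avoids the error that calling group(1) on a failed match would raise.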

What syntax should I use with GREL to construct URLs correctly and avoid HTTP errors and other pitfalls, for instance when working with JSON strings within a URL or creating a HYPERLINK?

A good practice is to use ' single quotes for your Refine Expression syntax and reserve " double quotes for the URL syntax parts. Also make sure to escape() the cell values you use, where necessary.

EXAMPLES:

'https://www.googleapis.com/freebase/v1/mqlread?query={"mid":null,"/type/object/key":{"namespace":"/authority/fmd/model","value":"'+escape(cells.ModelName.value, "url")+'"}}'
'=HYPERLINK("http://listings.listhub.net/pages/BHAMMLSAL/' + value + '",' + value + ')'
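To see why escaping matters, the sketch below shows the same idea in plain Python. It is only an analogy, not GREL: `urllib.parse.quote` is Python's percent-encoding function and may encode some characters differently from GREL's escape(value, "url"), and both the `BASE` endpoint and the `build_url` helper are hypothetical.

```python
from urllib.parse import quote

BASE = "https://example.org/lookup?q="  # hypothetical endpoint


def build_url(cell_value):
    """Append a percent-encoded cell value to the base URL."""
    # safe="" encodes every reserved character, including "/" and "&",
    # so the value cannot be mistaken for URL structure.
    return BASE + quote(cell_value, safe="")


if __name__ == "__main__":
    print(build_url("Model A&B/2"))
```

Without the encoding step, the "&" and "/" in the value would be read as part of the URL's own syntax.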

How can I delete a whole row or several rows?

  • Flag (or star) the row(s) you wish to delete.
  • In the All column dropdown menu (above the flags) you can get a facet, by going to Facet > Facet by flag.
  • From the facet that opens click on the 'true' option.
  • In the All column dropdown menu (above the flags) you can go to Edit Rows > Remove all matching rows.

How do I make a Text Facet show more than 2000 choices?

You can go to http://127.0.0.1:3333/preferences and set the facet limit using the preference key ui.browsing.listFacet.limit.

How do I find duplicates in a column?

Several options:

  • There is a shortcut for this, Facet → Customized facets → Duplicates facet
  • Create a Text Facet on a column and then in the facet click "Sort by: count". Any facet choice with a count of 2 or more is a duplicate.
  • Use the facetCount() function like (facetCount(value, 'value', 'column name') > 1).toString() and select true to show all rows that have duplicates
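The logic behind the facetCount() approach can be sketched in plain Python: count how often each value occurs in the column, then flag every row whose value occurs more than once. This is an illustration of the idea only, not how OpenRefine implements it, and `flag_duplicates` is a hypothetical helper name.

```python
from collections import Counter


def flag_duplicates(values):
    """Return one boolean per value: True when the value occurs more than once."""
    counts = Counter(values)
    return [counts[v] > 1 for v in values]


if __name__ == "__main__":
    column = ["a", "b", "a", "c"]  # made-up column values
    print(flag_duplicates(column))
```

Note that every occurrence of a repeated value is flagged, including the first one, which matches what selecting true in the facet shows.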

Can OpenRefine be used as a piece of a larger ETL pipeline?

You can use one of the OpenRefine client libraries to automate OpenRefine programmatically. If you like Docker, then you might like this container approach to batch processing.

It's worth pointing out that not all OpenRefine features can work unsupervised and without human interaction (clustering, for example), but some can.

Here is some further discussion and a project:

Can I run OpenRefine headless without a browser and what options are there?

In refine.ini you can add the following:

JAVA_OPTIONS=-Drefine.headless=true

You can also select headless mode at runtime using -x refine.headless=true. Some additional arguments are listed at https://github.com/OpenRefine/OpenRefine/issues/1677#issuecomment-648335037.

cross() function does not work for me

You might be missing a few steps that need to be performed before you can use the cross() function and expect it to match the keys between the 2 projects correctly.

  • trim() your key column before doing cross()
  • De-duplicate values in your key column if necessary
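The effect of stray whitespace on key matching can be shown with a small Python sketch. It assumes that keys are compared by exact string equality and does not reproduce the cross() implementation; `match_keys` and the sample keys are hypothetical.

```python
def match_keys(left_keys, right_keys, trim=True):
    """Return the left keys that have an exact match among the right keys."""
    clean = (lambda s: s.strip()) if trim else (lambda s: s)
    right = {clean(k) for k in right_keys}
    return [k for k in left_keys if clean(k) in right]


if __name__ == "__main__":
    left = ["Paris ", "Rome"]   # note the trailing space in "Paris "
    right = ["Paris", "Rome"]
    print(match_keys(left, right, trim=False))  # untrimmed keys
    print(match_keys(left, right, trim=True))   # trimmed keys
```

With trimming disabled, the key with the trailing space finds no partner even though it looks identical on screen.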

Importing large files - "Memory usage: 100%"

When importing large data files, OpenRefine may consume all available memory, in which case the import never finishes.

  • It may help to increase the amount of memory available to OpenRefine.
  • It may also help to uncheck the "Parse cell text into numbers, dates, ..." option in the import preview.

WARNING about Python/Jython illegal reflective access operation has occurred

  • This is likely because your JAVA_HOME environment variable is not set to Java 1.8, for example JAVA_HOME=C:\Program Files\Java\jdk-1.8.0_191. For further details see Issue #1741.
  • This might also happen if you are using Python 3+ rather than Python 2.7+, because OpenRefine uses the Jython 2.7.1 library, which does not currently support Python 3+. You will have to set your default Python environment temporarily to Python 2.7+ to use OpenRefine successfully. On Windows, this can be done by temporarily modifying your PATH environment variable to include the location where you have Python 2.7+ installed instead of where Python 3+ is installed.

Accessibility

OpenRefine was designed as a traditional desktop application that happens to run in your browser. Because of this, we unfortunately did not invest in meeting any accessibility guidelines (such as WCAG or others). We have lots of labels that text-to-speech tools can use, but that's about it. That is not to say that we would stop anyone from coming in and helping us with accessibility efforts. However, because of the way OpenRefine was designed, many of its features are hard to make more accessible, certainly for those with visual impairments, since a multitude of OpenRefine's features were initially designed around visual acuity and accuracy for making human judgements.

But again, nothing is impossible with enough time and focus from others willing to volunteer and code to make more of OpenRefine's features accessible for all.
