Frequently Asked Questions
This is a collection of frequently asked questions about OpenRefine. Feel free to ask your own question on the OpenRefine mailing list; we'll try to answer it as best we can and add it to this list.
OpenRefine does not start after clicking the .exe; a window only opens and closes
- Ensure that a Java JRE is installed on your system and that at least 1 GB of RAM is available to it.
Out of Memory Errors - Feels Slow - Could not reserve enough space for object heap
Where is my data stored?
I have a question. Where do I ask?
Send your question to the OpenRefine mailing list.
I've found a bug or want a new feature. What should I do?
How do I change the workspace directory that Refine uses for its project storage?
- On Linux, if you run Refine from the terminal, you can point to the workspace directory with the -d parameter, e.g.,
./refine -p 3333 -i 0.0.0.0 -m 6000M -d /where/you/want/the/workspace
- Alternatively, you can add or update a preference at http://127.0.0.1:3333/preferences:
KEY = refine.data_dir VALUE = T:\MyOpenRefineDataFolder
- On Windows, add this line to the file openrefine.l4j.ini, then save it:
-Drefine.data_dir=T:\MyOpenRefineDataFolder
(Of course, replace T:\MyOpenRefineDataFolder with your actual directory.)
How do I change the IP address that OpenRefine uses?
On Linux or Mac, from the command line:
./refine -i 127.0.0.1
On Windows, use a slash character instead of a dash:
C:\>refine /i 127.0.0.1
How do I change the port that OpenRefine uses?
On Linux or Mac, from the command line:
./refine -i 127.0.0.1 -p 3334
On Windows, use a slash character instead of a dash:
C:\>refine /i 127.0.0.1 /p 3334
You can also edit the refine.ini file to set the IP address and port permanently.
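If you launch OpenRefine from a script, a small helper can pick the right option style for the platform. This is a minimal sketch under assumptions: `refine_command` is a hypothetical name, it only builds the argument list (it does not start anything), and it uses only the -i/-p and /i//p options shown above.

```python
import sys


def refine_command(host, port, windows=None):
    """Build an OpenRefine launch command as a list of arguments (hypothetical helper).

    Linux/Mac: ./refine with dash options, e.g. -i 127.0.0.1 -p 3334
    Windows:   refine with slash options, e.g. /i 127.0.0.1 /p 3334
    """
    if windows is None:
        # Guess the platform when the caller does not say.
        windows = sys.platform.startswith("win")
    if windows:
        return ["refine", "/i", host, "/p", str(port)]
    return ["./refine", "-i", host, "-p", str(port)]


print(" ".join(refine_command("127.0.0.1", 3334, windows=False)))  # ./refine -i 127.0.0.1 -p 3334
print(" ".join(refine_command("127.0.0.1", 3334, windows=True)))   # refine /i 127.0.0.1 /p 3334
```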
I am having trouble connecting to OpenRefine with my browser
You might need to double-check your Chrome or Firefox proxy settings. In Firefox, go to Options > Advanced > Network > Connection > Settings and switch from "Use system proxy settings" to "Auto-detect proxy settings for this network".
If you get the message "Network Error (tcp_error)" in your browser, you might also try unchecking "Automatically detect settings" and adding an exception to your firewall rules to allow 127.0.0.1 (or whatever IP address you run OpenRefine with).
What syntax of regular expression (regex) does OpenRefine support?
You can also use Jython's re module instead of GREL functions, for example in a Custom Text Facet with an expression like this:
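import re
# Capture the text between an em dash (U+2014) and the following ", BWV"
g = re.search(ur"\u2014 (.*),\s*BWV", value)
# Return nothing for cells that do not match, instead of raising an error
return g.group(1) if g else None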
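Before pasting a pattern into a Jython expression, it can help to try it in plain Python 3, where string literals are already Unicode and no ur prefix is needed. A minimal sketch: `extract_title` and the sample strings are hypothetical, and Python's re module is assumed to behave like Jython's for this simple pattern.

```python
import re

# Same pattern as the Jython expression above, written as a Python 3 raw string;
# the re module itself interprets \u2014 (em dash) in the pattern.
PATTERN = re.compile(r"\u2014 (.*),\s*BWV")


def extract_title(value):
    """Return the text between an em dash and ', BWV', or None if there is no match."""
    g = PATTERN.search(value)
    return g.group(1) if g else None


print(extract_title("Bach \u2014 Mass in B minor, BWV 232"))  # Mass in B minor
print(extract_title("No dash here, BWV 1"))                  # None
```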
What syntax should I use with GREL to construct URLs correctly and avoid HTTP errors and other pitfalls, for instance when working with JSON strings within a URL or creating a HYPERLINK?
A good practice is to use single quotes (') for your Refine expression syntax and reserve double quotes (") for the URL and formula parts. Also make sure to escape() the cell values you use, where necessary.
'=HYPERLINK("http://listings.listhub.net/pages/BHAMMLSAL/' + value + '","' + value + '")'
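The same quoting split can be sketched in Python: single quotes delimit the program's own strings, double quotes are emitted as part of the spreadsheet formula, and the cell value is percent-encoded for the URL. This is an illustration under assumptions: `hyperlink_formula` is a hypothetical helper, and urllib.parse.quote only plays the role of GREL's escape(); the two may not encode every character (for example spaces) identically.

```python
from urllib.parse import quote

BASE = "http://listings.listhub.net/pages/BHAMMLSAL/"


def hyperlink_formula(value):
    """Build a =HYPERLINK("url","label") formula for one cell value (hypothetical helper)."""
    # Percent-encode the value for the URL path (safe="" also encodes "/").
    url = BASE + quote(value, safe="")
    # Inside a spreadsheet string literal, a double quote is written as "".
    label = value.replace('"', '""')
    return '=HYPERLINK("' + url + '","' + label + '")'


print(hyperlink_formula("12345"))
# =HYPERLINK("http://listings.listhub.net/pages/BHAMMLSAL/12345","12345")
print(hyperlink_formula("A B"))
# =HYPERLINK("http://listings.listhub.net/pages/BHAMMLSAL/A%20B","A B")
```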
How can I delete a whole row or several rows?
- Flag (or star) the row(s)
- From the dropdown menu above the flag, create a facet via Facet > Facet by flag (use Facet by star if you starred the rows).
- In the facet that opens, select the 'true' option.
- In the dropdown menu above the flag you can go to Edit Rows > Remove all matching rows.
How do I make a Text Facet show more than 2000 choices?
Go to http://127.0.0.1:3333/preferences and set the preference key "ui.browsing.listFacet.limit" to the maximum number of choices you want a facet to show.
How do I find duplicates in a column?
- There is a shortcut for this: Facet → Customized facets → Duplicates facet.
- Create a Text Facet on the column and then click "Sort by: count" in the facet. Any choices with a count of 2 or more are duplicates.
- Use the facetCount() function in a custom facet, e.g. facetCount(value, 'value', 'column name') > 1, and select 'true' to show all rows that have duplicates.
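The idea behind facetCount(value, 'value', 'column name') > 1 can be sketched as: count every value in the column once, then mark each cell whose count exceeds 1. A minimal Python sketch, not OpenRefine's implementation; `duplicate_flags` and the sample names are hypothetical.

```python
from collections import Counter


def duplicate_flags(column_values):
    """Return True for each value that occurs more than once in the column."""
    counts = Counter(column_values)  # value -> number of occurrences
    return [counts[v] > 1 for v in column_values]


names = ["Ada", "Grace", "Ada", "Linus"]
print(duplicate_flags(names))  # [True, False, True, False]
```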
Can OpenRefine be used as a piece of a larger ETL pipeline?
It's worth pointing out that not all Refine features can work unsupervised and without human interaction (clustering, for example), but some can.
Here is some further discussion and a project:
cross() function does not work for me
You might be missing a few steps that need to be performed before you can use the cross() function and expect it to match the keys between the two projects correctly.
- trim() your key column before using cross()
- De-duplicate values in your key column if necessary
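Why both steps matter can be seen in a minimal sketch of a cross()-style lookup that assumes exact string equality between keys. The helper names and sample rows are hypothetical and this is not OpenRefine's implementation; it only shows that stray whitespace yields no match and that a duplicated key yields several matches.

```python
from collections import defaultdict


def build_index(rows, key, trim=False):
    """Map each key value to the rows that carry it (hypothetical helper)."""
    index = defaultdict(list)
    for row in rows:
        k = row[key].strip() if trim else row[key]
        index[k].append(row)
    return index


def cross_lookup(value, index, trim=False):
    """Return all rows whose key equals value, or [] when nothing matches."""
    k = value.strip() if trim else value
    return index.get(k, [])  # .get does not add missing keys to a defaultdict


# Hypothetical second project: one key has a trailing space, one key is duplicated.
other_project = [
    {"id": "A1 ", "city": "Paris"},
    {"id": "B2", "city": "Oslo"},
    {"id": "B2", "city": "Rome"},
]

raw = build_index(other_project, "id")
print(len(cross_lookup("A1", raw)))  # 0, because "A1" != "A1 "

trimmed = build_index(other_project, "id", trim=True)
print(len(cross_lookup("A1", trimmed, trim=True)))  # 1
print(len(cross_lookup("B2", trimmed, trim=True)))  # 2, one per duplicated key
```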
Importing large files - "Memory usage: 100%"
When importing large data files, OpenRefine may consume all available memory, and the import may never finish.
- It may help to increase the amount of memory available to OpenRefine (for example with the -m parameter, as in ./refine -m 6000M).
- It may also help to uncheck the "Parse cell text into numbers, dates, ..." option in the import preview.