This is a collection of frequently asked questions on OpenRefine. Feel free to ask your own question on the OpenRefine mailing list and we'll try to answer to the best of our abilities and add them to this list.
OpenRefine has no built-in security for multi-user or multi-tenant scenarios. It has a single data model that is not shared, so one user's column operations can be overwritten by another user's, and users must take care. That said, if you are inclined to proceed at your own risk, you can get some security by using a proxy.
Notes about this were discussed on our mailing list at one time HERE.
Ensure that you have a Java JRE installed on your system and that at least 1 GB of RAM is available for it.
Only shows Logo when starting, or Open Project doesn't work, or shows mixed up HTML content in my browser
This might be due to a path naming issue. See https://github.com/OpenRefine/OpenRefine/issues/3277 . Are there special characters in your Windows User path? What is the output in a Windows terminal if you do echo %USERPROFILE% ?
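To check the user profile path for unusual characters, a minimal Python sketch might look like the following. The helper names `find_non_ascii` and `check_user_profile` are hypothetical. Treating any non-ASCII character as "special" is also an assumption, so see the linked issue for the cases that actually cause trouble.

```python
import os

def find_non_ascii(path):
    """Return the characters in `path` that fall outside the ASCII range."""
    return [ch for ch in path if ord(ch) > 127]

def check_user_profile():
    # USERPROFILE is the Windows variable mentioned above; fall back to the
    # home directory on systems where it is not set.
    profile = os.environ.get("USERPROFILE", os.path.expanduser("~"))
    return profile, find_non_ascii(profile)
```

Calling `check_user_profile()` returns the path together with a list of suspicious characters. An empty list suggests the path is plain ASCII.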
Or this might be due to an extension previously installed for OpenRefine. See https://github.com/OpenRefine/OpenRefine/issues/3326 . If so, the extension is probably installed in the AppData folder for OpenRefine, which is also the default location of the workspace that holds your project folders and files. We don't want to risk losing your projects when deleting an extension to fix things, so do these steps in order:
- Back up the OpenRefine folder under your AppData folder. (Use Zip or another archive tool such as 7z, then move the .zip file to a safe location, or additionally upload it to the cloud.)
- Delete the OpenRefine folder under your AppData folder.
- Start OpenRefine, which will recreate the folder and workspace folder within it.
- Optionally, selectively move your workspace.json file and project folders back into place from your previously saved .zip backup file.
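The backup step above can also be scripted. This is a minimal sketch using Python's standard library. The function name `backup_folder` and the `-backup` suffix are made up for illustration, and you must supply the real location of your OpenRefine folder yourself.

```python
import shutil
from pathlib import Path

def backup_folder(folder, backup_dir):
    """Zip `folder` into `backup_dir` and return the path of the archive."""
    folder = Path(folder)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    archive_base = backup_dir / (folder.name + "-backup")
    # make_archive appends ".zip" to the base name itself.
    return shutil.make_archive(
        str(archive_base),
        "zip",
        root_dir=str(folder.parent),  # archive paths are relative to this
        base_dir=folder.name,         # only this folder is archived
    )
```

Keep `backup_dir` outside the folder being archived, and check that the archive opens correctly before deleting anything.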
Or this might arise because you are running Java 16.0.1, which is not currently supported. You can try setting your JAVA_HOME to a lower version of Oracle's Java JRE, or use the prebuilt Temurin OpenJDK binaries from Adoptium.net. Alternatively, Windows users can try downloading our Windows kit with embedded Java, which does not depend on any Java installation.
This can happen for a few reasons but the likely reason is that there is an older extension that might not be compatible with the OpenRefine version you are trying to run. Delete the extension folder and try starting OpenRefine again.
OpenRefine relies on having computer memory available to it to work effectively. As a general rule, the larger your data set, the more memory OpenRefine will need to be able to work with it effectively. The amount of memory available to OpenRefine is a setting which you can change if you need to. If you are getting "out of memory" errors (java.lang.OutOfMemoryError), or generally feel that OpenRefine is slow, you can try allocating more memory to OpenRefine.
Send your question to the OpenRefine mailing list.
Consider first discussing it on the mailing list. This will likely help characterize the issue for a good quality bug report or feature request which you can file on the issue tracker.
OpenRefine project data is stored in the 'workspace directory'. A default workspace directory is set up on your local computer when you first run OpenRefine, or you can set it yourself through a setting. For more information read Where is the data stored?.
- On Linux, if you run OpenRefine from the terminal, you can point to the workspace directory through the -d parameter, e.g.,
./refine -p 3333 -i 0.0.0.0 -m 6000M -d /where/you/want/the/workspace
- On Windows, add this line to the file openrefine.l4j.ini, then save:
-Drefine.data_dir=T:\MyOpenRefineDataFolder
(Of course, replace T:\MyOpenRefineDataFolder with your actual directory)
On Linux, Mac from the command line,
./refine -i 127.0.0.1
On Windows use a slash character like,
C:>refine /i 127.0.0.1:8088
On Linux, Mac from the command line,
./refine -i 127.0.0.1 -p 3334
On Windows, use a slash character like,
C:>refine /i 127.0.0.1 /p 3334
You can also edit the refine.ini file to permanently set the IP address and port.
- You might need to double check your Chrome or Firefox proxy settings. In Firefox, go to options->advanced->network->connection->settings and switch from "use system proxy settings" to "auto-detect proxy settings for this network".
- If you get a message "Network Error (tcp_error)" in your browser, you might also try unchecking "automatically detect settings" and adding an exception to your firewall rules to allow 127.0.0.1 (or whatever IP address you decide to run OpenRefine with).
- On Windows, OpenRefine will sometimes look like it's starting up but won't connect at 127.0.0.1. In that case, try configuring OpenRefine to run on a different IP address and port.
- On Windows, you might be missing the Loopback Adapter for some reason; see https://github.com/datacarpentry/OpenRefine-ecology-lesson/issues/29
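Before changing browser or firewall settings, it can help to confirm that something is actually listening on the address and port you started OpenRefine with. The following is a minimal sketch. The function name `is_listening` is hypothetical, and the defaults simply reuse the 127.0.0.1 address and 3333 port that appear elsewhere on this page.

```python
import socket

def is_listening(host="127.0.0.1", port=3333, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable addresses.
        return False
```

If `is_listening()` returns False while OpenRefine appears to be running, the problem is likely on the server or network side rather than in the browser. A True result only shows that some process accepts connections there, not that it is OpenRefine.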
The regular expression syntax for GREL is that of Java regex, not of JavaScript. See GREL Regular Expressions.
You can also use Jython Regex instead of GREL functions and use a Custom Text Facet with something like this:
import re
g = re.search(ur"\u2014 (.*),\s*BWV", value)
return g.group(1) if g else None
What syntax should I use with GREL to construct URLs correctly and avoid HTTP errors and other pitfalls, for instance when working with JSON strings within a URL or creating a HYPERLINK?
A good practice is to use ' single quotes for your Refine Expression syntax and reserve " double quotes for the URL syntax parts. Also make sure to escape() your cell values where necessary.
EXAMPLES:
'https://www.googleapis.com/freebase/v1/mqlread?query={"mid":null,"/type/object/key":{"namespace":"/authority/fmd/model","value":"'+escape(cells.ModelName.value, "url")+'"}}'
'=HYPERLINK("http://listings.listhub.net/pages/BHAMMLSAL/' + value + '",' + value + ')'
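If you build such URLs outside OpenRefine, the same escaping concern applies. Below is a minimal Python sketch. The function name `build_query_url` and the example.org endpoint are placeholders. Unlike the GREL example above, which escapes only the cell value, this sketch percent-encodes the whole JSON string.

```python
import json
from urllib.parse import quote

def build_query_url(base_url, query):
    """Serialize `query` to JSON and percent-encode it for use in a URL."""
    # Compact separators keep the JSON free of optional spaces.
    as_json = json.dumps(query, separators=(",", ":"))
    # safe="" also encodes "/" so that every reserved character is escaped.
    return base_url + "?query=" + quote(as_json, safe="")
```

For example, `build_query_url("https://example.org/read", {"value": "Model A&B"})` encodes the quotes, the space and the ampersand so that they cannot be mistaken for URL syntax.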
- Flag (or star) the row(s) you wish to delete.
- In the All column dropdown menu (above the flags) you can get a facet, by going to Facet > Facet by flag.
- From the facet that opens click on the 'true' option.
- In the All column dropdown menu (above the flags) you can go to Edit Rows > Remove all matching rows.
You can go to http://127.0.0.1:3333/preferences and set the facet limit using the preference key ui.browsing.listFacet.limit.
Several options:
- There is a shortcut for this, Facet → Customized facets → Duplicates facet
- Create a Text Facet on a column and then in the facet click "Sort by: count". Any values with a count of 2 or more are duplicates
- Use the facetCount() function like
(facetCount(value, 'value', 'column name') > 1).toString()
and select true to show all rows that have duplicates
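The idea behind the facetCount() option is to count how often each value occurs and flag every row whose value occurs more than once. This Python sketch is an analogy only, not OpenRefine's implementation, and `flag_duplicates` is a hypothetical name.

```python
from collections import Counter

def flag_duplicates(values):
    """Return one boolean per value: True when the value occurs more than once."""
    counts = Counter(values)
    return [counts[v] > 1 for v in values]
```

Note that every occurrence of a repeated value is flagged, including the first one, which matches selecting true in the facet.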
Can OpenRefine be used as a piece of a larger ETL pipeline?
You can use one of the OpenRefine client libraries for automating OpenRefine programmatically. If you like Docker, you might like this container approach to batch processing.
It's worth pointing out that not all Refine features can work unsupervised and without human interaction (clustering, for example), but some can.
Here is some further discussion and a project:
- https://groups.google.com/group/openrefine/msg/ee29cf8d660e66a9?hl=en
- https://groups.google.com/group/openrefine-dev/browse_thread/thread/33374842ccfebfcd#
- https://github.com/dfhuynh/grefine-proxy
In refine.ini you can add the following:
JAVA_OPTIONS=-Drefine.headless=true
You can also select headless mode at runtime using -x refine.headless=true. Some additional arguments are listed at https://github.com/OpenRefine/OpenRefine/issues/1677#issuecomment-648335037.
You might be missing a few steps that need to be performed before you can use the cross() function and expect it to match the keys between the 2 projects correctly.
- trim() your key column before doing cross()
- De-duplicate values in your key column if necessary
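To see why these steps matter, consider what an exact key match does with stray whitespace. The sketch below is a Python analogy under that assumption, not how cross() is implemented, and the helper names `prepare_keys` and `unmatched_keys` are hypothetical.

```python
def prepare_keys(values):
    """Trim whitespace from each key and drop repeats, keeping first-seen order."""
    seen = set()
    keys = []
    for v in values:
        key = v.strip()
        if key not in seen:
            seen.add(key)
            keys.append(key)
    return keys

def unmatched_keys(left, right):
    """Return the keys in `left` that have no exact match in `right`."""
    right_set = set(right)
    return [k for k in left if k not in right_set]
```

With raw keys, `unmatched_keys(["Paris ", "Rome"], ["Paris", "Rome"])` reports `"Paris "` as unmatched because of the trailing space. After `prepare_keys` is applied to both sides, nothing is left unmatched.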
When importing large data files, it may happen that OpenRefine consumes all available memory and the import will never finish.
- It may help to increase the amount of memory available to OpenRefine.
- It may also help to uncheck the "Parse cell text into numbers, dates, ..." option in the import preview.
- This is likely because your JAVA_HOME environment variable is not set to a Java 1.8 installation, such as
JAVA_HOME=C:\Program Files\Java\jdk-1.8.0_191
For further details see Issue #1741
- This also might happen if you are using Python 3+ rather than Python 2.7+, because the Jython 2.7.1 library we use does not currently support Python 3+. You will have to set your default Python environment temporarily to Python 2.7+ to use OpenRefine successfully. On Windows, this can be done by temporarily modifying your PATH environment variable to include the location where you have Python 2.7+ installed instead of where Python 3+ is installed.
OpenRefine was designed as a traditional desktop application that happens to run in your browser. Because of this, we unfortunately did not invest in meeting any accessibility guidelines (such as WCAG or others). We have lots of labels that text-to-speech tools can use, but that's about it. That is not to say that we would stop anyone from coming in and helping us with accessibility efforts. However, because of the way OpenRefine was designed, many of its features are difficult to make more accessible, certainly for those with visual impairments, since a multitude of them were initially designed around visual acuity and accuracy for making human judgements.
But again, nothing is impossible with enough time and focus from others willing to volunteer and code to make more of OpenRefine's features more accessible for all.