# Using the openrefine-client in a Linux Bash environment

## Preparations

First we need an OpenRefine server running and the openrefine-client installed.

### Option 1: binder

This [binder](https://github.com/betatim/openrefineder) has OpenRefine, the openrefine-client and a Jupyter server proxy preinstalled. The OpenRefine server proxy starts when the urlpath `/openrefine` is opened. Doing that by hand from this notebook is cumbersome, so the following command does it for you.

In [1]:
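if [[ $HOSTNAME = *openrefineder* ]] ; then
    # URL of the running notebook server, including its access token
    notebook_url="$(jupyter notebook list | grep -o -E 'http\S+')"
    # Insert the /openrefine urlpath in front of the token parameter
    openrefine_url="${notebook_url/?token/openrefine?token}"
    # Poll until the proxied OpenRefine start page responds
    until wget -q -O - "${openrefine_url}" | grep -q "OpenRefine" ; do sleep 1; done
    # Rewrite the internal address to the public binder host
    openrefine_url="${openrefine_url/http:\/\/0.0.0.0:8888/https:\/\/hub.gke.mybinder.org}"
    echo "OpenRefine is available at $openrefine_url"
else
    echo "not running in binder environment"
fi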

not running in binder environment
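
The `${variable/pattern/replacement}` expansions in the cell above rewrite the notebook URL without calling external tools. Here is a minimal sketch of the first one, using a made-up URL and token (both are assumptions for illustration). Note that `?` in the pattern is a Bash wildcard for any single character, which here happens to match the literal `?`.

```shell
# Hypothetical notebook URL; a real one comes from `jupyter notebook list`.
notebook_url='http://0.0.0.0:8888/?token=abc123'

# Replace the first match of "?token" (any single character followed by
# "token") with "openrefine?token", which inserts the /openrefine urlpath.
openrefine_url="${notebook_url/?token/openrefine?token}"

echo "$openrefine_url"
# prints: http://0.0.0.0:8888/openrefine?token=abc123
```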


### Option 2: Local environment

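Ensure you have an OpenRefine server running. Then install the openrefine-client as follows. The later commands call `openrefine-client` directly, so `~/.local/bin` needs to be on your `PATH`.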

In [2]:
# Create the target directory first; wget does not create it
mkdir -p ~/.local/bin
wget -nv https://github.com/opencultureconsulting/openrefine-client/releases/download/v0.3.8/openrefine-client_0-3-8_linux -O ~/.local/bin/openrefine-client
chmod +x ~/.local/bin/openrefine-client

2019-08-22 03:52:15 URL:https://github-production-release-asset-2e65be.s3.amazonaws.com/80617276/93779a80-c48e-11e9-816c-36bb4c5c3bbb?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20190822%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20190822T015213Z&X-Amz-Expires=300&X-Amz-Signature=acfeea5f81c161678cacbaaba886c012615722ad68090edea09bba0253836513&X-Amz-SignedHeaders=host&actor_id=0&response-content-disposition=attachment%3B%20filename%3Dopenrefine-client_0-3-8_linux&response-content-type=application%2Foctet-stream [4450056/4450056] -> "/home/felix/.local/bin/openrefine-client" [1]
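
Before continuing, it can help to confirm that the OpenRefine server actually answers. The following is a sketch, not part of the client: `refine_is_up` is a hypothetical helper name, and the defaults are taken from the client's documented defaults (host 127.0.0.1, port 3333).

```shell
# Return success if an HTTP request to the OpenRefine server succeeds.
# Arguments: host (default 127.0.0.1) and port (default 3333).
refine_is_up() {
  host="${1:-127.0.0.1}"
  port="${2:-3333}"
  # One attempt, two-second timeout, response body discarded
  wget -q --tries=1 --timeout=2 -O /dev/null "http://${host}:${port}/"
}

if refine_is_up; then
  echo "OpenRefine is reachable"
else
  echo "OpenRefine is not reachable; start the server first"
fi
```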


## Create a directory

We will create several files, so it is tidier to work in a new directory.

In [3]:
workspace=$(date +%Y%m%d_%H%M%S)
mkdir -p ~/$workspace && cd ~/$workspace && pwd

/home/felix/20190822_035215


## Create project

Download sample data

In [4]:
openrefine-client --download "https://git.io/fj5hF" --output=duplicates.csv

Download to file duplicates.csv complete


Import file into OpenRefine

In [5]:
openrefine-client --create duplicates.csv

id: 1986280251125
rows: 10


## List all projects

In [6]:
openrefine-client --list

 1986280251125: duplicates


## Show project metadata

In [7]:
openrefine-client --info "duplicates"

                  id: 1986280251125
                 url: http://127.0.0.1:3333/project?project=1986280251125
                name: duplicates
            modified: 2019-08-22T01:52:16Z
             created: 2019-08-22T01:52:16Z
            rowCount: 10
importOptionMetadata: [{u'storeEmptyStrings': True, u'fileSource': u'duplicates.csv', u'storeBlankRows': True, u'encoding': u'', u'projectName': u'duplicates', u'processQuotes': True, u'separator': u',', u'trimStrings': False, u'limit': -1, u'storeBlankCellsAsNulls': True, u'guessCellValueTypes': False, u'includeFileSources': False}]
          column 001: email
          column 002: name
          column 003: state
          column 004: gender
          column 005: purchase


## Export project to terminal

In [8]:
openrefine-client --export "duplicates"

email	name	state	gender	purchase
danny.baron@example1.com	Danny Baron	CA	M	TV
melanie.white@example2.edu	Melanie White	NC	F	iPhone
danny.baron@example1.com	D. Baron	CA	M	Winter jacket
ben.tyler@example3.org	Ben Tyler	NV	M	Flashlight
arthur.duff@example4.com	Arthur Duff	OR	M	Dining table
danny.baron@example1.com	Daniel Baron	CA	M	Bike
jean.griffith@example5.org	Jean Griffith	WA	F	Power drill
melanie.white@example2.edu	Melanie White	NC	F	iPad
ben.morisson@example6.org	Ben Morisson	FL	M	Amplifier
arthur.duff@example4.com	Arthur Duff	OR	M	Night table


## Apply rules from JSON file

Download a sample JSON file (its content was previously extracted via the Undo/Redo history in the OpenRefine graphical user interface)

In [9]:
openrefine-client --download "https://git.io/fj5ju" --output=duplicates-deletion.json

Download to file duplicates-deletion.json complete


Apply transformation rules

In [10]:
openrefine-client --apply duplicates-deletion.json "duplicates"

File duplicates-deletion.json has been successfully applied to project 1986280251125


Export project to terminal again

In [11]:
openrefine-client --export "duplicates"

email	count	name	state	gender	purchase
arthur.duff@example4.com	2	Arthur Duff	OR	M	Dining table
ben.morisson@example6.org	1	Ben Morisson	FL	M	Amplifier
ben.tyler@example3.org	1	Ben Tyler	NV	M	Flashlight
danny.baron@example1.com	3	Danny Baron	CA	M	TV
jean.griffith@example5.org	1	Jean Griffith	WA	F	Power drill
melanie.white@example2.edu	2	Melanie White	NC	F	iPhone


## Export project to file

Export data in Excel (.xls) format

In [12]:
openrefine-client --export "duplicates" --output deduped.xls

Export to file deduped.xls complete


## Delete project

In [13]:
openrefine-client --delete "duplicates"

Project 1986280251125 has been successfully deleted


## Advanced templating

Create another project from the example file above

In [14]:
openrefine-client --create duplicates.csv --projectName=advanced

id: 1781116687450
rows: 10


The following example exports the columns "name" and "purchase" in JSON format from the project "advanced" for rows matching the regex text filter `^F$` in column "gender".

In [15]:
openrefine-client "advanced" \
--prefix='{ "events" : [
' \
--template='    { "name" : {{jsonize(cells["name"].value)}}, "purchase" : {{jsonize(cells["purchase"].value)}} }' \
--rowSeparator=',
' \
--suffix='
] }' \
--filterQuery='^F$' \
--filterColumn='gender'

{ "events" : [
    { "name" : "Melanie White", "purchase" : "iPhone" },
    { "name" : "Jean Griffith", "purchase" : "Power drill" },
    { "name" : "Melanie White", "purchase" : "iPad" }
] }
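
To see how the four templating options fit together, the following sketch rebuilds the same output by hand. It imitates the result shown above and is not the client's implementation. The rows are written out as they look once the template has been evaluated, and the variable names are illustrative.

```shell
# The same strings that were passed to --prefix, --rowSeparator and --suffix.
prefix='{ "events" : [
'
row_separator=',
'
suffix='
] }'

# Start with the prefix, then append the evaluated rows one by one.
output="$prefix"
first=1
for row in \
  '    { "name" : "Melanie White", "purchase" : "iPhone" }' \
  '    { "name" : "Jean Griffith", "purchase" : "Power drill" }' \
  '    { "name" : "Melanie White", "purchase" : "iPad" }'
do
  # The separator goes between rows, not after the last one.
  if [ "$first" -eq 0 ]; then output="${output}${row_separator}"; fi
  output="${output}${row}"
  first=0
done
output="${output}${suffix}"

printf '%s\n' "$output"
```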

There is also an option to store the results in multiple files. Each file will contain the prefix, one processed row, and the suffix.

In [16]:
openrefine-client "advanced" \
--prefix='{ "events" : [
' \
--template='    { "name" : {{jsonize(cells["name"].value)}}, "purchase" : {{jsonize(cells["purchase"].value)}} }' \
--rowSeparator=',
' \
--suffix='
] }' \
--filterQuery='^F$' \
--filterColumn='gender' \
--output=advanced.json \
--splitToFiles=true

Export to files complete. Last file: advanced_3.json


Filenames are suffixed with the row number by default (e.g. `advanced_1.json`, `advanced_2.json` etc.). There is another option to use the value in the first column instead:

In [17]:
openrefine-client "advanced" \
--prefix='{ "events" : [
' \
--template='    { "name" : {{jsonize(cells["name"].value)}}, "purchase" : {{jsonize(cells["purchase"].value)}} }' \
--rowSeparator=',
' \
--suffix='
] }' \
--filterQuery='^F$' \
--filterColumn='gender' \
--output=advanced.json \
--splitToFiles=true \
--suffixById=true

Export to files complete. Last file: advanced_melanie.white@example2.edu.json


Check the results in the current directory.

In [18]:
ls

advanced_1.json
advanced_2.json
advanced_3.json
advanced_jean.griffith@example5.org.json
advanced_melanie.white@example2.edu.json
deduped.xls
duplicates.csv
duplicates-deletion.json


Because our project "advanced" contains duplicates in the first column "email", this command overwrote files (e.g. `advanced_melanie.white@example2.edu.json`): only two files were written for three matching rows. When using this option, the first column should contain unique identifiers.
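
A quick uniqueness check on the first column can catch this before exporting. The sketch below uses the first-column values of the three rows matched by the gender filter. In practice the values could come from something like `openrefine-client --export "advanced" | tail -n +2 | cut -f1`, which is untested here and would cover all rows rather than only the filtered ones.

```shell
# First-column values of the three rows matched by the filter.
ids=$(printf '%s\n' \
  melanie.white@example2.edu \
  jean.griffith@example5.org \
  melanie.white@example2.edu)

# uniq -d prints each value that occurs more than once in sorted input.
duplicates=$(printf '%s\n' "$ids" | sort | uniq -d)

if [ -n "$duplicates" ]; then
  echo "Not unique, files would be overwritten for:"
  echo "$duplicates"
fi
```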

## Delete project

In [19]:
openrefine-client --delete "advanced"

Project 1781116687450 has been successfully deleted


## Getting help

In [20]:
openrefine-client --help

Usage: openrefine-client [--help | OPTIONS]

Script to provide a command line interface to an OpenRefine server.

Options:
  -h, --help            show this help message and exit

  Connection options:
    -H 127.0.0.1, --host=127.0.0.1
                        OpenRefine hostname (default: 127.0.0.1)
    -P 3333, --port=3333
                        OpenRefine port (default: 3333)

  Commands:
    -c [FILE], --create=[FILE]
                        Create project from file. The filename ending (e.g.
                        .csv) defines the input format
                        (csv,tsv,xml,json,txt,xls,xlsx,ods)
    -l, --list          List projects
    --download=[URL]    Download file from URL (e.g. example data). Combine
                        with --output to specify a filename.

  Commands with argument [PROJECTID/PROJECTNAME]:
    -d, --delete        Delete project
    -f [FILE], --apply=[FILE]
                        Apply JSON rules to OpenRefine project
    -E, --export        

The [openrefine-client](https://github.com/opencultureconsulting/openrefine-client) is available as a single-file executable for Windows, macOS and Linux. Client and server can run on different machines (host and port of the OpenRefine server can be specified, e.g. `-H 127.0.0.1 -P 80`).
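
When working against a remote server, repeating `-H` and `-P` on every call gets tedious. One possible approach is a small wrapper function. `refine`, `REFINE_HOST` and `REFINE_PORT` are hypothetical names chosen for this sketch; the client itself does not read these variables.

```shell
# Hypothetical environment variables with the client's documented defaults.
refine_host="${REFINE_HOST:-127.0.0.1}"
refine_port="${REFINE_PORT:-3333}"

# Hypothetical wrapper that passes the connection options on every call.
refine() {
  openrefine-client -H "$refine_host" -P "$refine_port" "$@"
}

# Usage (requires the client and a reachable server):
#   refine --list
```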

Please file an [issue](https://github.com/opencultureconsulting/openrefine-client/issues) if you are missing a feature in the command line interface or have found a bug. You are also welcome to ask questions!