# Using the openrefine-client in a Python 2 environment

## Preparations

First we need an OpenRefine server running and the openrefine-client installed.

### Option 1: binder

This [binder](https://github.com/betatim/openrefineder) comes with OpenRefine, the openrefine-client and a Jupyter server proxy preinstalled. The OpenRefine server proxy starts when the URL path `/openrefine` is opened. Doing that directly from this notebook is a bit involved, so the following cell does it for you. This may take up to 30 seconds.

In [1]:
import os
# Only run inside the binder, whose hostname contains 'openrefineder'
if 'openrefineder' in os.environ.get('HOSTNAME', ''):
    # Look up the URL of the running notebook server (IPython shell escape)
    notebook = !jupyter notebook list | grep -o -E 'http\S+'
    openrefine_url = notebook[0].replace('?token', 'openrefine?token')
    from urllib import urlopen
    from time import sleep
    from IPython.core.display import display, HTML
    # Poll the proxy once per second, for at most 30 attempts
    for i in range(30):
        response = urlopen(openrefine_url).read()
        if 'openrefine' in response:
            openrefine_url = openrefine_url.replace('http://0.0.0.0:8888','')
            display(HTML('<a href="' + openrefine_url + '">Click here to open OpenRefine</a>'))
            break
        sleep(1)
    else:
        # The loop finished without a break, so OpenRefine never answered
        print('timeout')
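
The cell above is a retry loop with a fixed number of attempts. As a minimal sketch of that pattern on its own, the helper below polls an arbitrary check function. `wait_until` and `ready_on_third_call` are hypothetical names used for illustration only and are not part of the openrefine-client.

```python
import time

def wait_until(check, attempts=30, delay=1.0):
    """Call check() up to `attempts` times; return True as soon as it succeeds."""
    for _ in range(attempts):
        if check():
            return True
        time.sleep(delay)
    # Every attempt failed
    return False

# Hypothetical stand-in for "does the OpenRefine page answer yet?"
state = {'calls': 0}

def ready_on_third_call():
    state['calls'] += 1
    return state['calls'] >= 3

ok = wait_until(ready_on_third_call, attempts=5, delay=0)
print(ok)
```

In the binder cell, the check would be the `urlopen` request and the test for `'openrefine'` in the response.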

### Option 2: Local environment

Ensure you have an OpenRefine server running. Then install the OpenRefine client as follows.

```
pip install openrefine-client
```

## Create a directory

We will store a few files, so we create a new folder and work in it to keep them together.

In [2]:
import os, datetime
# Name the new folder after the current timestamp, inside the home directory
path = os.path.join(os.path.expanduser('~'), datetime.datetime.now().strftime('%Y%m%d_%H%M%S'))
try:
    os.mkdir(path)
    os.chdir(path)
except OSError:
    print ("Creation of the directory %s failed" % path)
else:
    print (os.getcwd())

## Import module

In [3]:
from google.refine import cli

## Create project

Download sample data

In [4]:
cli.download('https://git.io/fj5hF','duplicates.csv')

Import file into OpenRefine (and store returned project)

In [5]:
p1 = cli.create('duplicates.csv')

## List all projects

In [6]:
cli.ls()

## Show project metadata

In [7]:
cli.info(p1.project_id)

## Export project to terminal

In [8]:
cli.export(p1.project_id)

## Apply rules from JSON file

Download sample JSON file (its content was previously extracted from the Undo/Redo history in the OpenRefine graphical user interface)

In [9]:
cli.download('https://git.io/fj5ju','duplicates-deletion.json')

Apply transformation rules

In [10]:
cli.apply(p1.project_id, 'duplicates-deletion.json')
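
Before applying a rules file, it can help to see which operations it contains. The sketch below assumes the usual shape of an OpenRefine operation history, a JSON array of objects that each carry an `op` and a `description` key. The sample text is illustrative only and is not the content of `duplicates-deletion.json`; `summarize_operations` is a hypothetical helper.

```python
import json

# Illustrative sample, NOT the content of duplicates-deletion.json
sample_rules = '''[
  {"op": "core/row-reorder", "description": "Reorder rows"},
  {"op": "core/row-removal", "description": "Remove rows"}
]'''

def summarize_operations(text):
    """Return (op, description) pairs for each operation in a JSON rules text."""
    operations = json.loads(text)
    return [(o.get('op'), o.get('description')) for o in operations]

summary = summarize_operations(sample_rules)
for op, description in summary:
    print('%s: %s' % (op, description))
```

For the downloaded file, the text could be read with `open('duplicates-deletion.json').read()` and passed to the same helper.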

Export project to terminal again

In [11]:
cli.export(p1.project_id)

## Export project to file

Export data in Excel (.xls) format

In [12]:
cli.export(p1.project_id, 'deduped.xls')

## Delete project

In [13]:
cli.delete(p1.project_id)

## Advanced templating

Create another project from the example file above

In [14]:
p2 = cli.create('duplicates.csv')

The following example exports the columns "name" and "purchase" in JSON format from the project created above, restricted to the rows that match the regex text filter `^F$` in the column "gender".

In [15]:
cli.templating(p2.project_id,
prefix='''{ "events" : [
''',
template='    { "name" : {{jsonize(cells["name"].value)}}, "purchase" : {{jsonize(cells["purchase"].value)}} }',
rowSeparator=''',
''',
suffix='''
] }''',
filterQuery='^F$',
filterColumn='gender')
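
To see how the four text parameters fit together, here is a minimal pure-Python emulation: the rendered rows are joined with the row separator and wrapped in the prefix and suffix. It is a sketch of the idea, not the server's implementation. The rows are hypothetical, and `render` and `row_to_text` are illustrative names, with `json.dumps` standing in for `jsonize()`.

```python
import json
import re

def render(rows, prefix, row_to_text, row_separator, suffix):
    """Join the rendered rows with the separator and wrap them in prefix and suffix."""
    return prefix + row_separator.join(row_to_text(row) for row in rows) + suffix

def row_to_text(row):
    # json.dumps plays the role that jsonize() has in the OpenRefine template
    return '    { "name" : %s, "purchase" : %s }' % (
        json.dumps(row['name']), json.dumps(row['purchase']))

# Hypothetical rows, not taken from duplicates.csv
rows = [
    {'name': 'Ann Example', 'gender': 'F', 'purchase': 'TV'},
    {'name': 'Bob Example', 'gender': 'M', 'purchase': 'Radio'},
    {'name': 'Cleo Example', 'gender': 'F', 'purchase': 'Phone'},
]

# Keep only the rows whose "gender" value matches the regex filter
selected = [row for row in rows if re.search('^F$', row['gender'])]

output = render(selected, '{ "events" : [\n', row_to_text, ',\n', '\n] }')
print(output)

# The assembled text is valid JSON
events = json.loads(output)['events']
```

Because the separator is only placed between rows, the last row has no trailing comma and the result parses as JSON.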

There is also an option to store the results in multiple files. Each file will contain the prefix, one processed row, and the suffix.

In [16]:
cli.templating(p2.project_id,
prefix='''{ "events" : [
''',
template='    { "name" : {{jsonize(cells["name"].value)}}, "purchase" : {{jsonize(cells["purchase"].value)}} }',
rowSeparator=''',
''',
suffix='''
] }''',
filterQuery='^F$',
filterColumn='gender',
output_file='advanced.json',
splitToFiles=True)

Filenames are suffixed with the row number by default (e.g. `advanced_1.json`, `advanced_2.json` etc.). There is another option to use the value in the first column instead:

In [17]:
cli.templating(p2.project_id,
prefix='''{ "events" : [
''',
template='    { "name" : {{jsonize(cells["name"].value)}}, "purchase" : {{jsonize(cells["purchase"].value)}} }',
rowSeparator=''',
''',
suffix='''
] }''',
filterQuery='^F$',
filterColumn='gender',
output_file='advanced.json',
splitToFiles=True,
suffixById=True)

Check the results in the current directory

In [18]:
os.listdir(os.getcwd())

Because our project contains duplicates in the first column "email", this command will overwrite files (e.g. `advanced_melanie.white@example2.edu.json`). When using this option, the first column should contain unique identifiers.
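
One way to spot this problem in advance is to check the first column for repeated values before exporting. The following sketch uses only the standard library and assumes a CSV with a header row. The sample lines are hypothetical, and `duplicate_ids` is an illustrative helper, not part of the openrefine-client.

```python
import csv
from collections import Counter

def duplicate_ids(lines):
    """Return the first-column values that occur more than once."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    counts = Counter(row[0] for row in reader if row)
    return sorted(value for value, n in counts.items() if n > 1)

# Hypothetical sample data, not taken from duplicates.csv
sample = [
    'email,name',
    'a@example.org,Ann',
    'b@example.org,Bob',
    'a@example.org,Ann E.',
]
print(duplicate_ids(sample))
```

On the real data, something like `duplicate_ids(open('duplicates.csv'))` should work the same way. An empty result would suggest that `suffixById=True` does not overwrite any files.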

## Delete project

In [19]:
cli.delete(p2.project_id)

## Getting help

In [20]:
help(cli)

Client and server can run on different machines. The host and port of the OpenRefine server can be set as follows:

In [21]:
cli.refine.REFINE_HOST = 'localhost'
cli.refine.REFINE_PORT = '3333'
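
When the server runs on another machine, a quick reachability test can rule out network problems before any client call is made. This is a minimal sketch using only the standard `socket` module. `can_connect` is a hypothetical helper, and it only checks that something accepts TCP connections on that port, not that it is OpenRefine.

```python
import socket

def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        connection = socket.create_connection((host, int(port)), timeout=timeout)
    except (socket.error, ValueError):
        # Connection refused, timed out, unknown host, or non-numeric port
        return False
    connection.close()
    return True

# Same values as in the cell above; the port is given as a string there
print(can_connect('localhost', '3333'))
```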

Please file an [issue](https://github.com/opencultureconsulting/openrefine-client/issues) if you are missing a feature in the command line interface or have found a bug. You are also welcome to ask questions there!