Skip to content
CDS/ISIS to JSON database converter (compatible with CouchDB and MongoDB)
Python Shell
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.
fixtures
lib
.gitignore
COPYING
COPYING.LESSER
HISTORY
README.rst
argparse.py
bulkup.sh
isis2json.py
iso2709.py
iso2709_test.txt
subfield.py

README.rst

isis2json: CDS/ISIS to JSON database converter

The isis2json.py is a Python/Jython script to export ISIS (MST+XRF) or ISO-2709 databases to JSON files, optionally compatible with CouchDB and MongoDB.

Running under Jython, both MST+XRF and ISO-2709 files can be read, thanks to the Bruma Java library found in the lib/ directory.

Running under Python, only ISO-2709 files can be read.

Usage

$ ./isis2json.py -h
usage: isis2json.py [-h] [-o OUTPUT.json] [-c] [-m] [-t ISIS_JSON_TYPE]
                    [-q QTY] [-s SKIP] [-i TAG_NUMBER] [-u] [-p PREFIX]
                    [-n] [-k TAG:VALUE]
                    INPUT.(mst|iso)

Convert an ISIS .mst or .iso file to a JSON array

positional arguments:
  INPUT.(mst|iso)     .mst or .iso file to read

optional arguments:
  -h, --help          show this help message and exit
  -o OUTPUT.json, --out OUTPUT.json
                      the file where the JSON output should be written
                        (default: write to stdout)
  -c, --couch         output array within a "docs" item in a JSON document
                        for bulk insert to CouchDB via POST to db/_bulk_docs
  -m, --mongo         output individual records as separate JSON objects,
                        one per line for bulk insert to MongoDB via
                        mongoimport utility
  -t ISIS_JSON_TYPE, --type ISIS_JSON_TYPE
                      ISIS-JSON type, sets field structure:
                        1=string, 2=alist, 3=dict
  -q QTY, --qty QTY   maximum quantity of records to read (default=ALL)
  -s SKIP, --skip SKIP  records to skip from start of .mst (default=0)
  -i TAG_NUMBER, --id TAG_NUMBER
                      generate an "_id" from the given unique TAG field
                        number for each record
  -u, --uuid          generate an "_id" with a random UUID for each record
  -p PREFIX, --prefix PREFIX
                      concatenate prefix to every numeric field tag
                        (ex. 99 becomes "v99")
  -n, --mfn           generate an "_id" from the MFN of each record
                        (available only for .mst input)
  -k TAG:VALUE, --constant TAG:VALUE
                      Include a constant tag:value in every record
                        (ex. -k type:AS)

ISIS-JSON Record Types

There are many ways to represent CDS/ISIS records in JSON [1]. This utility currently exports ISIS-JSON types 1, 2 and 3.

Given an ISIS record with this strcuture:

 2 «538886»
10 «Kanda, Paulo Afonso^1USP^2FMUSP^3CRDC^pBrasil^cSão Paulo^rorg»
10 «Smidth, Magali Taino^1USP^2FMUSP^3CRDC^pBrasil^cSão Paulo^rorg»

Below are the three supported representations of that record in JSON:

ISIS-JSON type 1

{"10":
    ["Kanda, Paulo Afonso^1USP^2FMUSP^3CRDC^pBrasil^cSão Paulo^rorg",
     "Smidth, Magali Taino^1USP^2FMUSP^3CRDC^pBrasil^cSão Paulo^rorg"],
 "2":
    ["538886"]
}

ISIS-JSON type 2

{"10":
    [
        [
            ("_", "Kanda, Paulo Afonso"),
            ("1", "USP"),
            ("2", "FMUSP"),
            ("3", "CRDC"),
            ("p", "Brasil"),
            ("c", "São Paulo"),
            ("r", "org")
        ],
        [
            ("_", "Smidth, Magali Taino"),
            ("1", "USP"),
            ("2", "FMUSP"),
            ("3", "CRDC"),
            ("p", "Brasil"),
            ("c", "São Paulo"),
            ("r", "org")
        ]
    ],
 "2":
    [
        [
            ("_", "538886")
        ]
    ]
}

ISIS-JSON type 3

{"10":
    [
        {
            "_": "Kanda, Paulo Afonso",
            "1": "USP",
            "2": "FMUSP",
            "3": "CRDC",
            "c": "São Paulo",
            "p": "Brasil",
            "r": "org"
        },
        {
            "_": "Smidth, Magali Taino",
            "1": "USP",
            "2": "FMUSP",
            "3": "CRDC",
            "c": "São Paulo",
            "p": "Brasil",
            "r": "org"
        }
    ],
 "2":
    [
        {
            "_": "538886"
        }
    ]
}
[1]See section 4.1 of http://journal.code4lib.org/articles/4893

Dependencies

Under Python, isis2json.py depends on:

  • Python2.6 or 2.7
  • argparse.py (bundled; also part of the CPython 2.7 distribution)

Under Jython, isis2json.py depends on:

  • Jython 2.5;
  • argparse.py (bundled)
  • Bruma.jar on the CLASSPATH (bundled);
  • jyson-1.0.1.jar on the CLASSPATH (bundled);

Example CLASSPATH:

export CLASSPATH=/home/luciano/lib/Bruma.jar:/home/luciano/lib/jyson-1.0.1.jar

Troubleshooting

SyntaxError on yield fields running isis2json.py under Jython

If you see this:

Traceback (innermost last):
  (no code object) at line 0
  File "./isis2json.py", line 84
        yield fields
            ^
SyntaxError: invalid syntax

You are probably running Jython 2.2, an old version that is packaged with several Linux distributions such as Debian and Ubuntu. To verify, type:

$ jython --version
Jython 2.2.1 on java1.6.0_20

To fix, download and install Jython 2.5 or later from the Jython project on SourceForge.

IMPORT ERROR: Jython 2.5 and Bruma.jar are required to read .mst files

Check if Jython 2.5 or later is installed:

$ jython --version
Jython 2.5.2

If it is not, se issue above. If it is, add the path to Bruma.jar to the CLASSPATH environment variable, or pass it via the jython -J-cp command line option when running isis2json.py, like this:

$ jython -J-cp lib/jyson-1.0.1.jar:lib/Bruma.jar isis2json.py fixtures/LILACS1.mst
Something went wrong with that request. Please try again.