This repository has been archived. It is now read-only.
isis2json.py is a Python/Jython script that exports ISIS (MST+XRF) or ISO-2709 databases to JSON files, optionally in a layout compatible with CouchDB or MongoDB.
Under Jython, the script can read both MST+XRF and ISO-2709 files, thanks to the Bruma Java library found in the lib/ directory.
Under Python, it can read only ISO-2709 files.
$ ./isis2json.py -h
usage: isis2json.py [-h] [-o OUTPUT.json] [-c] [-m] [-t ISIS_JSON_TYPE]
                    [-q QTY] [-s SKIP] [-i TAG_NUMBER] [-u] [-p PREFIX]
                    [-n] [-k TAG:VALUE]
                    INPUT.(mst|iso)

Convert an ISIS .mst or .iso file to a JSON array

positional arguments:
  INPUT.(mst|iso)       .mst or .iso file to read

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT.json, --out OUTPUT.json
                        the file where the JSON output should be written
                        (default: write to stdout)
  -c, --couch           output array within a "docs" item in a JSON document
                        for bulk insert to CouchDB via POST to db/_bulk_docs
  -m, --mongo           output individual records as separate JSON objects,
                        one per line for bulk insert to MongoDB via
                        mongoimport utility
  -t ISIS_JSON_TYPE, --type ISIS_JSON_TYPE
                        ISIS-JSON type, sets field structure:
                        1=string, 2=alist, 3=dict
  -q QTY, --qty QTY     maximum quantity of records to read (default=ALL)
  -s SKIP, --skip SKIP  records to skip from start of .mst (default=0)
  -i TAG_NUMBER, --id TAG_NUMBER
                        generate an "_id" from the given unique TAG field
                        number for each record
  -u, --uuid            generate an "_id" with a random UUID for each record
  -p PREFIX, --prefix PREFIX
                        concatenate prefix to every numeric field tag
                        (ex. 99 becomes "v99")
  -n, --mfn             generate an "_id" from the MFN of each record
                        (available only for .mst input)
  -k TAG:VALUE, --constant TAG:VALUE
                        Include a constant tag:value in every record
                        (ex. -k type:AS)
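The difference between the default output, -c and -m is only in how the records are wrapped. The following is a minimal sketch of the three layouts as the help text describes them, not the script's own code; the sample records and the function names as_array, as_couch and as_mongo are made up for illustration.

```python
import json

# Hypothetical records (ISIS-JSON type 1), standing in for converter output.
records = [
    {"2": ["538886"], "10": ["Kanda, Paulo Afonso^1USP"]},
    {"2": ["538887"], "10": ["Smidth, Magali Taino^1USP"]},
]


def as_array(records):
    """Default layout: one JSON array holding every record."""
    return json.dumps(records)


def as_couch(records):
    """-c layout: the array inside a "docs" item, for POST to db/_bulk_docs."""
    return json.dumps({"docs": records})


def as_mongo(records):
    """-m layout: one JSON object per line, for the mongoimport utility."""
    return "\n".join(json.dumps(record) for record in records)
```

In this sketch the -m layout is not a single JSON document: each line has to be parsed on its own, which is the shape the help text describes for mongoimport.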
There are many ways to represent CDS/ISIS records in JSON [1]. This utility currently exports ISIS-JSON types 1, 2 and 3.
Given an ISIS record with this structure:
2 «538886»
10 «Kanda, Paulo Afonso^1USP^2FMUSP^3CRDC^pBrasil^cSão Paulo^rorg»
10 «Smidth, Magali Taino^1USP^2FMUSP^3CRDC^pBrasil^cSão Paulo^rorg»
Below are the three supported representations of that record in JSON, in order: type 1 (string), type 2 (alist) and type 3 (dict):
{"10":
    ["Kanda, Paulo Afonso^1USP^2FMUSP^3CRDC^pBrasil^cSão Paulo^rorg",
     "Smidth, Magali Taino^1USP^2FMUSP^3CRDC^pBrasil^cSão Paulo^rorg"],
 "2":
    ["538886"]
}
{"10":
    [
        [
            ["_", "Kanda, Paulo Afonso"],
            ["1", "USP"],
            ["2", "FMUSP"],
            ["3", "CRDC"],
            ["p", "Brasil"],
            ["c", "São Paulo"],
            ["r", "org"]
        ],
        [
            ["_", "Smidth, Magali Taino"],
            ["1", "USP"],
            ["2", "FMUSP"],
            ["3", "CRDC"],
            ["p", "Brasil"],
            ["c", "São Paulo"],
            ["r", "org"]
        ]
    ],
 "2":
    [
        [
            ["_", "538886"]
        ]
    ]
}
{"10":
    [
        {
            "_": "Kanda, Paulo Afonso",
            "1": "USP",
            "2": "FMUSP",
            "3": "CRDC",
            "c": "São Paulo",
            "p": "Brasil",
            "r": "org"
        },
        {
            "_": "Smidth, Magali Taino",
            "1": "USP",
            "2": "FMUSP",
            "3": "CRDC",
            "c": "São Paulo",
            "p": "Brasil",
            "r": "org"
        }
    ],
 "2":
    [
        {
            "_": "538886"
        }
    ]
}
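The three types differ only in how each field occurrence is stored: as the raw string, as an ordered list of subfield key/value pairs, or as a dictionary. The sketch below shows one way to derive types 2 and 3 from the raw string, assuming "^" introduces a one-character subfield key and "_" labels the text before the first subfield, as in the examples above. The functions expand and convert are illustrative and not necessarily how isis2json.py implements this; the sample occurrence is shortened.

```python
def expand(content, delimiter="^", main_key="_"):
    """Split 'main^1aaa^2bbb' into [('_', 'main'), ('1', 'aaa'), ('2', 'bbb')]."""
    parts = content.split(delimiter)
    pairs = []
    if parts[0]:
        # Text before the first delimiter has no key of its own.
        pairs.append((main_key, parts[0]))
    for part in parts[1:]:
        if part:
            # The first character after the delimiter is the subfield key.
            pairs.append((part[0], part[1:]))
    return pairs


def convert(raw_record, isis_json_type=1):
    """Convert {tag: [occurrence strings]} to ISIS-JSON type 1, 2 or 3."""
    result = {}
    for tag, occurrences in raw_record.items():
        if isis_json_type == 1:
            result[tag] = list(occurrences)
        elif isis_json_type == 2:
            result[tag] = [[list(pair) for pair in expand(occ)]
                           for occ in occurrences]
        elif isis_json_type == 3:
            result[tag] = [dict(expand(occ)) for occ in occurrences]
        else:
            raise ValueError("isis_json_type must be 1, 2 or 3")
    return result


raw = {"2": ["538886"], "10": ["Kanda, Paulo Afonso^1USP^2FMUSP"]}
type2 = convert(raw, 2)
type3 = convert(raw, 3)
```

In this sketch, type 2 keeps the subfields in their original order, while type 3 does not guarantee order and keeps only the last value if a subfield key repeats within one occurrence.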
Under Python, isis2json.py depends on:
- Python 2.6 or 2.7
- argparse.py (bundled; also part of the CPython 2.7 distribution)
Under Jython, isis2json.py depends on:
- Jython 2.5
- argparse.py (bundled)
- Bruma.jar on the CLASSPATH (bundled)
- jyson-1.0.1.jar on the CLASSPATH (bundled)
Example CLASSPATH:
export CLASSPATH=/home/luciano/lib/Bruma.jar:/home/luciano/lib/jyson-1.0.1.jar
If you see this:
Traceback (innermost last):
(no code object) at line 0
File "./isis2json.py", line 84
yield fields
^
SyntaxError: invalid syntax
You are probably running Jython 2.2, an old version that is packaged with several Linux distributions such as Debian and Ubuntu. To verify, type:
$ jython --version
Jython 2.2.1 on java1.6.0_20
To fix, download and install Jython 2.5 or later from the Jython project on SourceForge.
Check if Jython 2.5 or later is installed:
$ jython --version
Jython 2.5.2
If it is not, see the issue above. If it is, add the path to Bruma.jar to the CLASSPATH environment variable, or pass it via the jython -J-cp command line option when running isis2json.py, like this:
$ jython -J-cp lib/jyson-1.0.1.jar:lib/Bruma.jar isis2json.py fixtures/LILACS1.mst
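When it is unclear whether the CLASSPATH variable actually points at the bundled jars, a small check can help. This is a hedged sketch, separate from isis2json.py: the helper name missing_jars is made up, and it only looks for CLASSPATH entries whose file name matches the two jars listed in the dependencies above.

```python
import os

# Jar file names taken from the dependency list in this README.
REQUIRED_JARS = ("Bruma.jar", "jyson-1.0.1.jar")


def missing_jars(classpath, required=REQUIRED_JARS, exists=os.path.isfile):
    """Return the required jar names that no existing CLASSPATH entry provides."""
    entries = [entry for entry in classpath.split(os.pathsep) if entry]
    # Keep only entries that exist on disk, reduced to their file names.
    found = set(os.path.basename(entry) for entry in entries if exists(entry))
    return [jar for jar in required if jar not in found]


if __name__ == "__main__":
    missing = missing_jars(os.environ.get("CLASSPATH", ""))
    if missing:
        print("Not found via CLASSPATH: " + ", ".join(missing))
    else:
        print("All required jars found via CLASSPATH")
```

Jars passed with jython -J-cp are not visible in the CLASSPATH variable, so this check applies only to the environment-variable approach.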
[1] See section 4.1 of http://journal.code4lib.org/articles/4893