The ISO DSSSL group defines a number of XML schema definition languages. Amongst them:
- XSD is the most verbose and unwieldy way for understanding a XML schema
- Then comes RelaxNG XML (RNG).
- RelaxNG Compact (RNC) is the most compact and easy to understand. RNC is even easier to read than XMLspy diagrams, because it gives you a ready overview of the schema and convenient search. RNC clearly shows deeply nested “matrioshka” style schemas, like the screen shots below.
Tools for working with RNC include:
- XSDtoRNG.xsl, which converts XSD->RNG. It still does not cover all XSD cases and is in development, but does a decent job.
- In particular, it converts all
xsd:include
torng:include
without caring whether the included schema is converted or not. And since it cannot handle some included schemas (eg OGC Features, which is a complex multi-file schema), you’ll get errors about them. Comment out such includes. - It also can’t reliably identify which is the start element. I’ve added a string param for this, but if there are multiple start elements, you’ll need to touch the schema manually. See LIDO below.
- In particular, it converts all
- trang (manual), which converts RNG->RNC and other conversions
- jing (doc), which validates RNG and RNC.
- The binaries are at google code, the source of both trang and jing is moved to https://github.com/relaxng/jing-trang
- Note: if you get errors like this:
Exception in thread "main" java.lang.StackOverflowError at com.thaiopensource.relaxng.pattern.BinaryPattern.checkRecursion(BinaryPattern.java:16)
for deeply nested schemas then you need to increase the java stack, eg
java -Xss8M -jar c:/prog/jing-20091111/bin/jing.jar -c schema.rnc
jing usage:
java -cp ... com.thaiopensource.relaxng.util.Driver [opts] RNGFile XMLFile...
-c The schema is compact (RNC not RNG)
-i Disable checking of ID/IDREF/IDREFS.
-s ???
-t Prints the time used by Jing
-f Checks that the document is feasibly valid (wraps each element in "optional")
-C catalogFile
-e encoding
I use a batch like this:
java -Xss8M -jar c:\prog\jing-20091111\bin\jing.jar -c -i %*
In emacs I use rnc-mode, flymake and jing to do syntax highlighting and “on the fly” syntax checking. Use a patched rnc-mode to set -Xss for jing as described above. I use a setup like this:
(setq
rnc-enable-flymake t
rnc-jing-jar-file "c:/prog/jing-20091111/bin/jing.jar"
rnc-jing-java-options "-Xss8M" ; increase stack size to 8M, else get
;; Exception in thread "main" java.lang.StackOverflowError
;; at com.thaiopensource.relaxng.pattern.BinaryPattern.checkRecursion(BinaryPattern.java:16)
rnc-enable-imenu t
rnc-indent-level 2
)
Below you see the results of “on the fly” syntax checking at the word “JUNK”:
I also use Emacs imenu to get a “table of contents” of the schema, and jump to the definition of the symbol at cursor.
rnc-mode
uses flymake
to check syntax on the fly.
However, flymake has been superseded by flycheck, because it makes it a lot easier to define checkers and supersedes packages like smart-compile
.
See these issues (code already replicated below)
- TreeRex/rnc-mode#9: enable flycheck
- TreeRex/rnc-mode#10: really disable flymake
Install flycheck
with the package manager, then set it up like this:
;; flycheck is a replacement of flymake & smart-compile
;; https://www.masteringemacs.org/article/spotlight-flycheck-a-flymake-replacement
(require 'flycheck)
(flycheck-define-checker rnc-jing
"RNC syntax checker using jing.
Home: https://github.com/relaxng/jing-trang (was https://jing-trang.googlecode.com).
Binary: https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/jing-trang/jing-20091111.zip.
Manual: https://relaxng.org/jclark/jing.html"
:command ("java"
"-Xss8M" ; increase stack size to 8M, else get
;; Exception in thread "main" java.lang.StackOverflowError
;; at com.thaiopensource.relaxng.pattern.BinaryPattern.checkRecursion(BinaryPattern.java:16)
"-jar" "c:/prog/jing-20091111/bin/jing.jar"
"-c" ; RelaxNG Compact
"-i" ; else any xsd:ID element returns error
;; `a "data" or "value" pattern with non-null ID-type must occur as the child of an "attribute" pattern`,
;; see http://blog.jclark.com/2009/01/relax-ng-and-xmlid.html
source)
:error-patterns
((error line-start (file-name) ":" line ":" column ": " (or "error" "fatal") ": " (message) line-end)
(error "fatal" ":" (message) line-end)
(info line-start (file-name) ":" line ":" column ": " (message) line-end))
:modes rnc-mode)
(add-to-list 'flycheck-checkers 'rnc-jing)
(global-flycheck-mode)
(setq rnc-enable-flymake nil ; now do it with flycheck
(defadvice rnc-configure-flymake (around rnc-enable-flymake activate)
"Don't run rnc-configure-flymake unless rnc-enable-flymake is set.
Else the function produces a distracting error message"
(if rnc-enable-flymake ad-do-it))
./bat/rncfix.pl converts ugly RNC annotations, eg:
a:documentation [ "\x{a}" ~ " \x{a}" ~ " See http://www.w3.org/XML/1998/namespace.html and\x{a}" ~ " http://www.w3.org/TR/REC-xml for information about this namespace.\x{a}" ~ "\x{a}" ~
to nice RNC comments, eg:
## See http://www.w3.org/XML/1998/namespace.html and ## http://www.w3.org/TR/REC-xml for information about this namespace.
It also does a bunch of other cosmetic fixes that hopefully make the RNC easier to read, eg:
- put trailing
}+*
to the line above - remove superfluous empty lines (commented or not)
- put empty line before definitions (word or comment at beginning of line)
./bat/ includes batch files for Windows (the horror!) that I use under cygwin.
- ./bat/xsd2rng.bat: XSD->RNC using xsltproc and XSDtoRNG
xsd2rng ead
- ./bat/xsd2rnc.bat: XSD->RNC using xsltproc, XSDtoRNG.xsl, trang and rncfix
xsd2rnc ead
- ./bat/rncfix.bat: runs rncfix
rncfix ead-tmp.rnc > ead.rnc
- ./bat/rng2rnc.bat: RNG->RNC using trang and rncfix
rng2rnc ead
- ./bat/jing.bat: runs jing to validate a RNG or RNC
jing -c ead.rnc
- ./bat/trang.bat: runs trang to convert RNG->RNC
trang ead.rng ead.rnc
- ./bat/rnc-nocomment.bat: removes all comments from RNC, making it more compact and easier to see the structure. But you need to already know what the elements mean
rnc-nocomment ead.rnc > ead-nocomment.rnc
They assume all files and trang.jar are put in c:\prog\bin; except jing in c:\prog\jing-20091111\bin (has several dependencies): so you need to modify them for your setup.
I have collected or converted the following RNC schemas related to GLAM (galleries, libraries, archives and museums)
- ead.rnc: EAD 2002 version 20080421 (Encoded Archival Description) by Society of American Archivists and Library of Congress, converted to RNC by Vladimir Alexiev.
- [ead-nocomment.rnc]: includes 387 elements or attributes
EAD is used widely by archival institutions and projects, including APex (Arvhives Portal Europe) and EHRI (European Holocaust Research Infrastructure). Uses the following prefixes:
prefix | what | example |
---|---|---|
e. | element definition | e.chronlist is element chronlist , which includes various attributes and a sequence of elements chronitem |
a. | attribute definition | a.identifier is a simple attribute identifier that consists of an xsd:token |
m. | element model | m.inter.noquote is alternative of e.chronlist e.list e.table |
m.mixed | mixed element model | m.mixed.basic consists of text and/or e.abbr e.emph e.expan etc |
am. | attribute model (group) | am.common.empty consists of elements id altrender audience |
EAD3: upcoming revised version, developed natively in RNG. Schemas listed in increasing recency:
- https://raw.githubusercontent.com/SAA-SDT/EAD-Revision/develop/ead3.rng: the namespace http://ead3.archivists.org/schema/ redirects to this
- https://raw.githubusercontent.com/SAA-SDT/EAD2002toEAD3/develop/ead3.rng
- https://raw.github.com/SAA-SDT/EAD-Revision/master/ead3.rng
- [ead3.rnc]: converted to RNC from the most recent schema by Vladimir Alexiev
- [ead3-nocomment.rnc]: includes 264 attributes or elements
CPF is a complement to EAD, describing agents that archival materials originate from.
- [[https://github.com/SAA-SDT/eac-cpf-schema/blob/master/cpf.rnc]cpf.rnc]: EAC CPF version 20100301 (Encoded Archival Context: Corporations, People, Families) by Society of American Archivists, converted to RNC by Vladimir Alexiev
- [cpf-nocomment.rnc]: includes 133 elements or attributes
EAG is used for describing archival institutions, see description.
- [eag.rnc]: EAG 2012 version 0.1e 20120828 (Encoded Archival Guide), APEx project (www.apex-project.eu), converted to RNC by Vladimir Alexiev.
- [eag-nocomment.rnc]: includes 338 elements or attributes
The above is generated from eag_2012.xsd. An alternative official RNC exists, marked as follows:
# Schema generated from ODD source 2015-03-06T09:33:00Z. # Edition: Version 2.7.0. Last updated on # 16th September 2014, revision 13036 # Edition Location: http://www.tei-c.org/Vault/P5/Version 2.7.0/
CDWA is used for describing museum objects and works of art, corresponding to the CCO content standard.
- [CDWAlite.rnc]: CDWA version 1.1 20060712 (Categories for the Description of Works of Art) by ARTstor and J Paul Getty Trust, converted to RNC by Vladimir Alexiev.
- [CDWAlite-nocomment.rnc]: includes 121 elements or attributes
CONA (Cultural Object Names Authority) is an aggregation of art object data by the Getty Research Institute.
- [CONA.rnc]: Getty Vocabulary CONA Contribution Schema - Release 1.0, 09/28/2010, converted to RNC by Vladimir Alexiev on 22-Jun-2015.
- [CONA-nocomment.rnc]: includes 212 elements
CONA also includes a number of lookup lists that are omitted here.
These include cona_associative_type.rnc cona_class.rnc cona_contrib.rnc cona_event.rnc cona_language.rnc cona_lookup_lists.rnc cona_nationality.rnc cona_role.rnc cona_source.rnc
and look like this (example from cona_associative_type.rnc
i.e. associative relations):
ar_code = "4000/related to" | "4001/miscellaneous" | "4100/distinguished from" ... | "4516/was architectural context for"
LIDO is used to describe museum objects and works of art. It’s based on CDWA and MuseumDat and is more complex.
- [lido.rnc]: LIDO version 1.0 20101108 (Lightweight Information Describing Objects) by ICOM-CIDOC Working Group Data Harvesting and Interchange, converted to RNC by Vladimir Alexiev
- [lido-nocomment.rnc]:
- [xml.rnc]: defines
xml:
attributeslang, base, space
. Used by LIDO & EAG.
For LIDO and CDWA I made some manual corrections
- This sets one start element, and introduces the parasitic name “starting_lidoWrap”
start |= starting_lidoWrap starting_lidoWrap =
Corrected to two start elements:
start = lido | lidoWrap
- XSDtoRNG currently can’t grok the OGC GML schema so I’ve commented out
# rng:include href="http://schemas.opengis.net/gml/3.1.1/base/feature.rng"
You’ll get 3 errors at
gmlComplexType = Point*, LineString*, Polygon*
- Moved some comments up, and collapsed simple definitions into one line, eg:
administrativeMetadata = element administrativeMetadata { ## Definition: Holds the administrative metadata for an object / work record. ## How to record: The attribute xml:lang is mandatory ... administrativeMetadataComplexType}
becomes
## Definition: Holds the administrative metadata for an object / work record. ## How to record: The attribute xml:lang is mandatory ... administrativeMetadata = element administrativeMetadata {administrativeMetadataComplexType}
SPECTRUM is a UK museum standard (also used in other countries) that defines museum processes and data (Units of Information)
- [spectrum.rnc]: Collections Trust XML Schema for SPECTRUM version 4.0.4 (26 June 2012), converted to RNC by Vladimir Alexiev.
It uses special documentation properties
spectrum:unitName, spectrum:description
- [spectrum-nocomment.rnc]: 595 elements, of which the majority (490) are about
Object
(artwork)
The Strategy Markup Language (StratML) enables “A worldwide web of intentions, stakeholders, and results”.
- StratML-part1-20140401.rnc (ISO 17469) captures the core elements: Vision, Mission, Value, Goal, Objective and Stakeholder
- StratML-part2-20200123.rnc (ANSI/AIIM 22) captures Performance Plans and Reports, stakeholder Roles, and Performance Indicators
- StratML-part3-20170815.rnc captures additional context like SWOT and PESTLE analysis, RACI, etc