Skip to content

VladimirAlexiev/rnc

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 

Repository files navigation

RNC

The ISO DSSSL group defines a number of XML schema definition languages. Amongst them:

  • XSD is the most verbose and unwieldy way for understanding a XML schema
  • Then comes RelaxNG XML (RNG).
  • RelaxNG Compact (RNC) is the most compact and easy to understand. RNC is even easier to read than XMLspy diagrams, because it gives you a ready overview of the schema and convenient search. RNC clearly shows deeply nested “matrioshka” style schemas, like the screen shots below.

RNC Tools

Tools for working with RNC include:

  • XSDtoRNG.xsl, which converts XSD->RNG. It still does not cover all XSD cases and is in development, but does a decent job.
    • In particular, it converts all xsd:include to rng:include without caring whether the included schema is converted or not. And since it cannot handle some included schemas (eg OGC Features, which is a complex multi-file schema), you’ll get errors about them. Comment out such includes.
    • It also can’t reliably identify which is the start element. I’ve added a string param for this, but if there are multiple start elements, you’ll need to touch the schema manually. See LIDO below.
  • trang (manual), which converts RNG->RNC and other conversions
  • jing (doc), which validates RNG and RNC.
    • The binaries are at google code, the source of both trang and jing is moved to https://github.com/relaxng/jing-trang
    • Note: if you get errors like this:
      Exception in thread "main" java.lang.StackOverflowError
       at com.thaiopensource.relaxng.pattern.BinaryPattern.checkRecursion(BinaryPattern.java:16)
              

      for deeply nested schemas then you need to increase the java stack, eg

      java -Xss8M -jar c:/prog/jing-20091111/bin/jing.jar -c schema.rnc
              

jing usage:

java -cp ... com.thaiopensource.relaxng.util.Driver [opts] RNGFile XMLFile...
  -c     The schema is compact (RNC not RNG)
  -i     Disable checking of ID/IDREF/IDREFS.
  -s     ???
  -t     Prints the time used by Jing
  -f     Checks that the document is feasibly valid (wraps each element in "optional")
  -C catalogFile
  -e encoding

I use a batch like this:

java -Xss8M -jar c:\prog\jing-20091111\bin\jing.jar -c -i %*

Emacs rnc-mode

In emacs I use rnc-mode, flymake and jing to do syntax highlighting and “on the fly” syntax checking. Use a patched rnc-mode to set -Xss for jing as described above. I use a setup like this:

(setq
  rnc-enable-flymake t
  rnc-jing-jar-file "c:/prog/jing-20091111/bin/jing.jar"
  rnc-jing-java-options "-Xss8M" ; increase stack size to 8M, else get
    ;; Exception in thread "main" java.lang.StackOverflowError
    ;; at com.thaiopensource.relaxng.pattern.BinaryPattern.checkRecursion(BinaryPattern.java:16)
  rnc-enable-imenu t
  rnc-indent-level 2
)

Below you see the results of “on the fly” syntax checking at the word “JUNK”:

./img/RNC-flymake.png

I also use Emacs imenu to get a “table of contents” of the schema, and jump to the definition of the symbol at cursor.

./img/RNC-imenu.png

Emacs flycheck

rnc-mode uses flymake to check syntax on the fly. However, flymake has been superseded by flycheck, because it makes it a lot easier to define checkers and supersedes packages like smart-compile. See these issues (code already replicated below)

Install flycheck with the package manager, then set it up like this:

;; flycheck is a replacement of flymake & smart-compile
;; https://www.masteringemacs.org/article/spotlight-flycheck-a-flymake-replacement

(require 'flycheck)
(flycheck-define-checker rnc-jing
  "RNC syntax checker using jing.
Home: https://github.com/relaxng/jing-trang (was https://jing-trang.googlecode.com).
Binary: https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/jing-trang/jing-20091111.zip.
Manual: https://relaxng.org/jclark/jing.html"
  :command ("java"
            "-Xss8M" ; increase stack size to 8M, else get
            ;; Exception in thread "main" java.lang.StackOverflowError
            ;; at com.thaiopensource.relaxng.pattern.BinaryPattern.checkRecursion(BinaryPattern.java:16)
            "-jar" "c:/prog/jing-20091111/bin/jing.jar"
            "-c" ; RelaxNG Compact
            "-i" ; else any xsd:ID element returns error
            ;; `a "data" or "value" pattern with non-null ID-type must occur as the child of an "attribute" pattern`,
            ;; see http://blog.jclark.com/2009/01/relax-ng-and-xmlid.html 
            source)
  :error-patterns
  ((error line-start (file-name) ":" line ":" column ": " (or "error" "fatal") ": " (message) line-end)
   (error "fatal" ":" (message) line-end)
   (info line-start (file-name) ":" line ":" column ": " (message) line-end))
  :modes rnc-mode)
(add-to-list 'flycheck-checkers 'rnc-jing)

(global-flycheck-mode)

(setq rnc-enable-flymake nil ; now do it with flycheck
(defadvice rnc-configure-flymake (around rnc-enable-flymake activate)
  "Don't run rnc-configure-flymake unless rnc-enable-flymake is set.
Else the function produces a distracting error message"
  (if rnc-enable-flymake ad-do-it))

rncfix

./bat/rncfix.pl converts ugly RNC annotations, eg:

a:documentation [
"\x{a}" ~
"  \x{a}" ~
"   See http://www.w3.org/XML/1998/namespace.html and\x{a}" ~
"   http://www.w3.org/TR/REC-xml for information about this namespace.\x{a}" ~
"\x{a}" ~

to nice RNC comments, eg:

##   See http://www.w3.org/XML/1998/namespace.html and
##   http://www.w3.org/TR/REC-xml for information about this namespace.

It also does a bunch of other cosmetic fixes that hopefully make the RNC easier to read, eg:

  • put trailing }+* to the line above
  • remove superfluous empty lines (commented or not)
  • put empty line before definitions (word or comment at beginning of line)

Scripts

./bat/ includes batch files for Windows (the horror!) that I use under cygwin.

  • ./bat/xsd2rng.bat: XSD->RNC using xsltproc and XSDtoRNG
    xsd2rng ead
        
  • ./bat/xsd2rnc.bat: XSD->RNC using xsltproc, XSDtoRNG.xsl, trang and rncfix
    xsd2rnc ead
        
  • ./bat/rncfix.bat: runs rncfix
    rncfix ead-tmp.rnc > ead.rnc
        
  • ./bat/rng2rnc.bat: RNG->RNC using trang and rncfix
    rng2rnc ead
        
  • ./bat/jing.bat: runs jing to validate a RNG or RNC
    jing -c ead.rnc
        
  • ./bat/trang.bat: runs trang to convert RNG->RNC
    trang ead.rng ead.rnc
        
  • ./bat/rnc-nocomment.bat: removes all comments from RNC, making it more compact and easier to see the structure. But you need to already know what the elements mean
    rnc-nocomment ead.rnc > ead-nocomment.rnc
        

They assume all files and trang.jar are put in c:\prog\bin; except jing in c:\prog\jing-20091111\bin (has several dependencies): so you need to modify them for your setup.

RNC Schemas

I have collected or converted the following RNC schemas related to GLAM (galleries, libraries, archives and museums)

EAD

  • ead.rnc: EAD 2002 version 20080421 (Encoded Archival Description) by Society of American Archivists and Library of Congress, converted to RNC by Vladimir Alexiev.
  • [ead-nocomment.rnc]: includes 387 elements or attributes

EAD is used widely by archival institutions and projects, including APex (Arvhives Portal Europe) and EHRI (European Holocaust Research Infrastructure). Uses the following prefixes:

prefixwhatexample
e.element definitione.chronlist is element chronlist, which includes various attributes and a sequence of elements chronitem
a.attribute definitiona.identifier is a simple attribute identifier that consists of an xsd:token
m.element modelm.inter.noquote is alternative of e.chronlist e.list e.table
m.mixedmixed element modelm.mixed.basic consists of text and/or e.abbr e.emph e.expan etc
am.attribute model (group)am.common.empty consists of elements id altrender audience

EAD3

EAD3: upcoming revised version, developed natively in RNG. Schemas listed in increasing recency:

EAC CPF

CPF is a complement to EAD, describing agents that archival materials originate from.

EAG

EAG is used for describing archival institutions, see description.

The above is generated from eag_2012.xsd. An alternative official RNC exists, marked as follows:

# Schema generated from ODD source 2015-03-06T09:33:00Z.
# Edition: Version 2.7.0. Last updated on
#	16th September 2014, revision 13036
# Edition Location: http://www.tei-c.org/Vault/P5/Version 2.7.0/

CDWA

CDWA is used for describing museum objects and works of art, corresponding to the CCO content standard.

  • [CDWAlite.rnc]: CDWA version 1.1 20060712 (Categories for the Description of Works of Art) by ARTstor and J Paul Getty Trust, converted to RNC by Vladimir Alexiev.
  • [CDWAlite-nocomment.rnc]: includes 121 elements or attributes

CONA

CONA (Cultural Object Names Authority) is an aggregation of art object data by the Getty Research Institute.

  • [CONA.rnc]: Getty Vocabulary CONA Contribution Schema - Release 1.0, 09/28/2010, converted to RNC by Vladimir Alexiev on 22-Jun-2015.
  • [CONA-nocomment.rnc]: includes 212 elements

CONA also includes a number of lookup lists that are omitted here. These include cona_associative_type.rnc cona_class.rnc cona_contrib.rnc cona_event.rnc cona_language.rnc cona_lookup_lists.rnc cona_nationality.rnc cona_role.rnc cona_source.rnc and look like this (example from cona_associative_type.rnc i.e. associative relations):

ar_code =
    "4000/related to"
  | "4001/miscellaneous"
  | "4100/distinguished from"
  ...
  | "4516/was architectural context for"

LIDO

LIDO is used to describe museum objects and works of art. It’s based on CDWA and MuseumDat and is more complex.

  • [lido.rnc]: LIDO version 1.0 20101108 (Lightweight Information Describing Objects) by ICOM-CIDOC Working Group Data Harvesting and Interchange, converted to RNC by Vladimir Alexiev
  • [lido-nocomment.rnc]:
  • [xml.rnc]: defines xml: attributes lang, base, space. Used by LIDO & EAG.

For LIDO and CDWA I made some manual corrections

  • This sets one start element, and introduces the parasitic name “starting_lidoWrap”
    start |= starting_lidoWrap
    starting_lidoWrap =
        

    Corrected to two start elements:

    start = lido | lidoWrap
        
  • XSDtoRNG currently can’t grok the OGC GML schema so I’ve commented out
    # rng:include href="http://schemas.opengis.net/gml/3.1.1/base/feature.rng"
        

    You’ll get 3 errors at

    gmlComplexType = Point*, LineString*, Polygon*
        
  • Moved some comments up, and collapsed simple definitions into one line, eg:
    administrativeMetadata =
      element administrativeMetadata {
             ## Definition: Holds the administrative metadata for an object / work record. 
             ## How to record: The attribute xml:lang is mandatory ...
        administrativeMetadataComplexType}
        

    becomes

    ## Definition: Holds the administrative metadata for an object / work record. 
    ## How to record: The attribute xml:lang is mandatory ...
    administrativeMetadata = element administrativeMetadata {administrativeMetadataComplexType}
        

SPECTRUM

SPECTRUM is a UK museum standard (also used in other countries) that defines museum processes and data (Units of Information)

  • [spectrum.rnc]: Collections Trust XML Schema for SPECTRUM version 4.0.4 (26 June 2012), converted to RNC by Vladimir Alexiev. It uses special documentation properties spectrum:unitName, spectrum:description
  • [spectrum-nocomment.rnc]: 595 elements, of which the majority (490) are about Object (artwork)

StratML

The Strategy Markup Language (StratML) enables “A worldwide web of intentions, stakeholders, and results”.

About

RelaxNG Compact schemas for some GLAM-related XML schemas

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published