schemaSpec should provide a mechanism for specifying Schematron query language binding #2330

Open
martindholmes opened this issue Aug 10, 2022 · 10 comments

@martindholmes
Contributor

Schematron schemas can specify a range of different values for the query language binding (typically xslt, xslt2, or xslt3, but many more are listed here: https://archive.xmlprague.cz/2022/files/presentations/schematron-qlb.pdf). The ATOP team believes that TEI ODD should have a mechanism for specifying this, perhaps through an attribute on the root schemaSpec element called schQueryLanguageBinding.
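
For context, in a standalone Schematron schema the binding is declared with the `queryBinding` attribute on the root `sch:schema` element. The sketch below is a minimal illustration only; the rule and its message are invented for the example and do not come from the TEI sources.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            queryBinding="xslt2">
  <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  <sch:pattern>
    <!-- Hypothetical rule, for illustration only -->
    <sch:rule context="tei:date[@when]">
      <!-- matches() is an XPath 2.0 function, so this test
           needs a binding of xslt2 or later -->
      <sch:assert test="matches(@when, '^[0-9]{4}')">
        The value of @when should begin with a four-digit year.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

An ODD currently has no place to record the `xslt2` value shown here, which is the gap this issue describes.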

@martindholmes
Contributor Author

The ATOP team is thinking of two distinct approaches to this:

  1. (Simple): There is a single @schQueryLanguageBinding attribute available on <schemaSpec>, meaning that all Schematron rules must be compiled into a single Schematron output file and use the same query language binding.
  2. (Adventurous): We create a <constraintDecl> element in the header, where various things can be defined and described, including the query language binding value, as well as namespace declarations etc. <constraintSpec> would then be added to att.declaring, and each <constraintSpec> could point up to a <constraintDecl>. For each distinct <constraintDecl> element, a distinct Schematron file would be created using the specified content, including the query language binding, and including all the constraints which point to it.

The second option would obviously provide much more flexibility, but would require non-trivial fixes to the existing stylesheets, and there may not be many real use-cases for it; generally, we would expect people to want to use the most advanced query language binding that their processor can support (why not?), and there's no particular reason to want to use an earlier language binding. There's also the question then of how you might override the binding for existing constraints in a downstream customization.
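
To make the contrast concrete: the simple option would amount to something like `<schemaSpec ident="tei_custom" schQueryLanguageBinding="xslt3">`, while the adventurous option might look like the sketch below. Everything in it is an assumption for illustration: the `xml:id` values, the `scheme` and `queryBinding` attributes on `<constraintDecl>`, the placement in `<encodingDesc>`, and the two constraints are hypothetical. Only `@decls`, which att.declaring already provides, is existing TEI.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <teiHeader>
    <!-- fileDesc omitted for brevity -->
    <encodingDesc>
      <!-- Hypothetical: one declaration per desired Schematron output file -->
      <constraintDecl xml:id="sch-xslt1" scheme="schematron" queryBinding="xslt">
        <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
      </constraintDecl>
      <constraintDecl xml:id="sch-xslt3" scheme="schematron" queryBinding="xslt3">
        <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
      </constraintDecl>
    </encodingDesc>
  </teiHeader>
  <text>
    <body>
      <schemaSpec ident="tei_custom" start="TEI">
        <moduleRef key="tei"/>
        <moduleRef key="core"/>
        <moduleRef key="header"/>
        <moduleRef key="textstructure"/>
        <elementSpec ident="date" module="core" mode="change">
          <!-- Would be compiled into the file generated for #sch-xslt1 -->
          <constraintSpec ident="date-has-when" scheme="schematron" decls="#sch-xslt1">
            <constraint>
              <sch:assert test="@when">A date should carry @when.</sch:assert>
            </constraint>
          </constraintSpec>
          <!-- Would be compiled into the file generated for #sch-xslt3 -->
          <constraintSpec ident="date-year-first" scheme="schematron" decls="#sch-xslt3">
            <constraint>
              <sch:assert test="not(@when) or matches(@when, '^[0-9]{4}')">
                @when should begin with a four-digit year.
              </sch:assert>
            </constraint>
          </constraintSpec>
        </elementSpec>
      </schemaSpec>
    </body>
  </text>
</TEI>
```

Under this sketch a processor would emit two Schematron files, one per `<constraintDecl>`, which is where the extra stylesheet work mentioned above comes from.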

@ebeshero
Member

ebeshero commented Sep 1, 2022

I like the adventurous option in theory, but I’m struggling to imagine a practical use-case for generating multiple different Schematron files. I also wonder whether we run a risk of over-complicating schema validation: could multiple Schematron files potentially conflict with each other’s validation when Schematron rules from two different files are triggered in an overlapping context? I guess I wonder if the adventurous path leads to too much potential trouble.

@lb42
Member

lb42 commented Sep 1, 2022

If you think of the generated schematron file as an output from the odd rather than as input to the validator, surely it makes sense to maximize flexibility for its format?

@jamescummings
Member

The separation of concerns provided by the second option seems more in keeping with the TEI approach to such things.

@martindholmes martindholmes added the atop another TEI ODD processor label Sep 12, 2022
@sydb
Member

sydb commented Dec 1, 2022

Also note that (per #335) we should be documenting, somewhere, that one does not get a query binding if using Schematron embedded in RELAX NG.
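
A rough illustration of why, as far as I understand it: when Schematron is embedded in RELAX NG, the patterns sit directly inside the grammar and there is no `sch:schema` root element on which a `queryBinding` attribute could be placed. The grammar below is a made-up minimal example, not output from the TEI stylesheets.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:sch="http://purl.oclc.org/dsdl/schematron"
         ns="http://www.tei-c.org/ns/1.0">
  <!-- No sch:schema wrapper exists here, so there is nowhere to put
       a queryBinding attribute; the validator picks the binding -->
  <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  <start>
    <element name="date">
      <sch:pattern>
        <sch:rule context="tei:date">
          <sch:assert test="@when">A date should carry @when.</sch:assert>
        </sch:rule>
      </sch:pattern>
      <optional>
        <attribute name="when"/>
      </optional>
      <text/>
    </element>
  </start>
</grammar>
```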

@sydb
Member

sydb commented Feb 10, 2023

Council thinks a simple, expandable approach is the way to go: a <constraintDecl> in the TEI Header of the base ODD that is not repeatable (and not a member of att.declarable). If and when there is user demand to be able to express constraints in a variety of language bindings, we can make it repeatable, add it to att.declarable, and add <constraintSpec> to att.declaring.

@sydb
Member

sydb commented Mar 10, 2023

A first crack at a specification of the new <constraintDecl>:

<?xml version="1.0" encoding="UTF-8"?>
<!--
Copyright TEI Consortium. 
Dual-licensed under CC-by and BSD2 licences 
See the file COPYING.txt for details
$Date$
$Id$
-->
<?xml-model href="https://jenkins.tei-c.org/job/TEIP5-dev/lastSuccessfulBuild/artifact/P5/release/xml/tei/odd/p5.nvdl" type="application/xml" schematypens="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0"?>
<elementSpec xmlns="http://www.tei-c.org/ns/1.0" xmlns:sch="http://purl.oclc.org/dsdl/schematron" module="tagdocs" ident="constraintDecl">
  <gloss versionDate="2023-03-09" xml:lang="en">constraint declaration</gloss>
  <desc versionDate="2023-03-09" xml:lang="en">contains declarations pertaining to formal constraints expressed elsewhere in <gi>constraintSpec</gi> elements</desc>
  <classes>
    <memberOf key="att.global"/>
  </classes>
  <content>
    <sequence>
      <alternate minOccurs="0" maxOccurs="unbounded">
        <classRef key="model.identEquiv"/>
        <classRef key="model.descLike"/>
      </alternate>
      <anyElement/>             <!-- typically <sch:ns> elements -->
    </sequence>
  </content>
  <attList>
    <attDef ident="scheme" usage="req">
      <desc versionDate="2023-03-09" xml:lang="en">supplies the name of the language to which the declarations herein apply</desc>
      <datatype><dataRef key="teidata.enumerated"/></datatype>
      <valList type="semi">
        <valItem ident="schematron">
          <gloss versionDate="2016-09-27" xml:lang="en">ISO Schematron</gloss>
        </valItem>
      </valList>
      <remarks versionDate="2023-03-09" xml:lang="en">
        <p>The declarations contained in a particular
        <gi>constraintDecl</gi> apply to the <gi>constraintSpec</gi>
        elements whose <att>scheme</att> matches the <att>scheme</att>
        of the <gi>constraintDecl</gi>.</p>
      </remarks>
    </attDef>
    <attDef ident="queryBinding" usage="rec">
      <gloss xml:lang="en" versionDate="2023-03-09">query language binding</gloss>
      <desc xml:lang="en" versionDate="2023-03-09">specifies the query
      language binding for rule-based schema expressions in
      <gi>constraintSpec</gi> elements that have a matching
      <att>scheme</att> attribute</desc>
      <datatype><dataRef key="teidata.enumerated"/></datatype>
      <valList type="semi">
        <valItem ident="exslt"/>
        <valItem ident="stx"/>
        <valItem ident="xslt"/>
        <valItem ident="xslt2"/>
        <valItem ident="xslt3"/>
        <valItem ident="xpath"/>
        <valItem ident="xpath2"/>
        <valItem ident="xpath3"/>
        <valItem ident="xpath31"/>
        <valItem ident="xquery"/>
        <valItem ident="xquery3"/>
        <valItem ident="xquery31"/>
      </valList>
      <remarks versionDate="2023-03-09" xml:lang="en">
        <p>The suggested values above are the values reserved by the
        Schematron specification. Only <val>exslt</val>,
        <val>stx</val>, <val>xslt</val>, <val>xslt2</val>,
        <val>xslt3</val>, <val>xpath2</val>, and <val>xpath3</val> are
        defined by the specification. Most processors only support a
        subset of <val>xslt</val>, <val>xslt2</val>, and
        <val>xslt3</val>.</p>
      </remarks>
    </attDef>
  </attList>
  <exemplum xml:lang="en">
    <egXML xmlns="http://www.tei-c.org/ns/Examples">
      <constraintDecl scheme="schematron" queryBinding="xslt3">
        <sch:ns prefix="wwp" uri="http://www.wwp.northeastern.edu/ns/textbase"/>
      </constraintDecl>
    </egXML>
  </exemplum>
  <listRef>
    <ptr target="#?????"/>
  </listRef>
</elementSpec>

@ebeshero
Member

ebeshero commented Mar 10, 2023

Discussion after Council meeting 2023-03-10 of @sydb @hcayless @ebeshero @martinascholger

Are we in a rush? Only insofar as the ATOP TF wants to know that there is going to be a <constraintDecl>; if the exact XPath to the query language binding changes later, no big deal.

Where do we put <constraintDecl>?

Two possibilities jump to mind:

  1. Make it an option to put it in either an <encodingDesc> in the <teiHeader> or in a <schemaSpec>.
  2. It goes only in <schemaSpec>. In which case you would generally need a <schemaSpec> to appear in a base ODD (rather than just a customization ODD).

Note that if we choose option 2 (<constraintDecl> only goes in <schemaSpec>), then we would need to add a <schemaSpec> to the TEI Guidelines, because they do not currently have one. We think that new <schemaSpec> should show up in the driver files (i.e. P5/source/guidelines-en.xml and P5/source/guidelines-fr.xml), but no one has any idea what kind of havoc adding a <schemaSpec> might wreak on the build process.
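
The two placements could be sketched as follows. This is only an illustration of where the element would sit; the `ident` value and the module selection are hypothetical, and the `<constraintDecl>` follows the draft specification posted earlier in this thread.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <teiHeader>
    <!-- fileDesc omitted for brevity -->
    <encodingDesc>
      <!-- Possibility 1 also permits the declaration here, in the header -->
      <constraintDecl scheme="schematron" queryBinding="xslt3">
        <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
      </constraintDecl>
    </encodingDesc>
  </teiHeader>
  <text>
    <body>
      <schemaSpec ident="tei_custom" start="TEI">
        <!-- Possibility 2 permits the declaration only here, as a
             child of schemaSpec, instead of in the header:
        <constraintDecl scheme="schematron" queryBinding="xslt3">
          <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
        </constraintDecl>
        -->
        <moduleRef key="tei"/>
        <moduleRef key="core"/>
        <moduleRef key="header"/>
        <moduleRef key="textstructure"/>
      </schemaSpec>
    </body>
  </text>
</TEI>
```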

@sydb sydb added this to the Guidelines 4.7.0 milestone Mar 10, 2023
@lb42
Member

lb42 commented Mar 14, 2023

I used to know why there is no <schemaSpec> in the TEI Guidelines and it was for a plausible reason. Something to do with the fact that P5 itself isn't an ODD (though tei_all is). So I'd definitely vote for putting this thing inside the encodingDesc.

@sydb
Member

sydb commented Apr 14, 2023

Council 2023-04-14 agrees that there will be a mechanism for an ODD writer to specify the query language binding, and that it will be accessible by XPath, without actually committing to <constraintDecl> in any particular place.
(This means ATOP group can move forward, and update the XPath to access the desired binding later.)
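
As a sketch of what "accessible by XPath" could mean for an ODD processor, the XSLT fragment below looks up the binding and writes it onto the generated Schematron root. The XPath `//tei:constraintDecl[@scheme = 'schematron']` is an assumption based on the draft earlier in this thread, not a location Council has committed to; if the location changes, only that expression would need updating.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns:sch="http://purl.oclc.org/dsdl/schematron"
    exclude-result-prefixes="tei">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- ASSUMPTION: the declaration is a constraintDecl somewhere in the
         ODD; take the first one that applies to Schematron -->
    <xsl:variable name="decl"
        select="(//tei:constraintDecl[@scheme = 'schematron'])[1]"/>
    <sch:schema>
      <!-- Emit queryBinding only when the ODD actually declares one -->
      <xsl:if test="$decl/@queryBinding">
        <xsl:attribute name="queryBinding" select="$decl/@queryBinding"/>
      </xsl:if>
      <!-- Carry over namespace declarations from the same element -->
      <xsl:copy-of select="$decl/sch:ns" copy-namespaces="no"/>
      <!-- Patterns gathered from constraintSpec elements would follow here -->
    </sch:schema>
  </xsl:template>
</xsl:stylesheet>
```

When no declaration is present the attribute is simply omitted, which leaves the choice of binding to the Schematron processor.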
