add <taxonomy> and <category> to att.datcat #2419

Closed
bansp opened this issue Apr 18, 2023 · 20 comments · Fixed by #2422

Comments

@bansp
Member

bansp commented Apr 18, 2023

This is a request coming from the Lexical Resources Summit 2023 (DARIAH, Berlin), convened by @ttasovac and @laurentromary . The immediate context is the TEI Lex0 customisation but @JessedeDoes suggests that the request could also be helpful in the ParlaMint project.

Essence: it would be most useful to be able to use the datcat attributes for taxonomies of any sort (the datcat attributes are no longer only a grammatical device). For that purpose, the <taxonomy> and <category> elements should be members of att.datcat.

When discussing the initial bundle of elements for the re-written datcat, a few months ago, we decided to start small and expand when there is a need. Now, the need comes from two well-established projects, and there is a good chance that the addition is going to be useful elsewhere as well.

Also, we said that, since taxonomies may use <equiv>, we'd see if a genuine need for the attribute class exists. Neither the ODD for Lex-0 nor the ODD for ParlaMint uses the tagdocs module. I think that this may qualify as a genuine need, because requiring these carefully crafted ODDs to use the tagdocs module only for the sake of equiv/@uri as the mechanism for aligning with external taxonomies is definitely far-fetched.

We're going to add a PR to this ticket, with some examples added to (at least) the att.datcat spec.

@ttasovac

ttasovac commented Apr 20, 2023

Just to add a small detail to what @bansp was saying. We are working on an edition of a Portuguese monolingual dictionary from the 18th century, and we are using taxonomy in the header to organize a hierarchy of domain labels in the dictionary. Each usg label in the dictionary points to a category in the taxonomy, but each category in the taxonomy points to an externally hosted ontology.

At the moment, we are using @corresp on each category to point to the external ontology, but I do believe that having @datcat in this context would be semantically more precise and therefore a better encoding choice than the generic @corresp mechanism.
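
A minimal sketch of the contrast, borrowing the `domain.mathematics` category and its vocabulary URI from the Morais example further down this thread. The use of `valueDatcat` (rather than another att.datcat attribute) is an assumption here, and the two `<category>` elements are alternatives, not siblings in one document:

```xml
<!-- Current encoding: a generic pointer to the external vocabulary -->
<category xml:id="domain.mathematics"
   corresp="http://vocabs.rossio.fcsh.unl.pt/morais_domains/0024">
   <catDesc xml:lang="en">
      <term>Mathematics</term>
   </catDesc>
</category>

<!-- Proposed encoding (assumes <category> is a member of att.datcat):
     the same pointer, but explicitly marked as a data category reference -->
<category xml:id="domain.mathematics"
   valueDatcat="http://vocabs.rossio.fcsh.unl.pt/morais_domains/0024">
   <catDesc xml:lang="en">
      <term>Mathematics</term>
   </catDesc>
</category>
```

The second form only validates once `<category>` has been added to att.datcat, which is what this ticket requests.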

@sydb
Member

sydb commented Apr 24, 2023

Sounds quite reasonable to this linguistic ignoramus.

  1. Could someone post (or point to) an example of @datcat on <category>?
  2. Roughly how long do you think it might be before the PR is ready?

@bansp
Member Author

bansp commented Apr 24, 2023

The PR needs a bit more love: where I mention "Morais dictionary", we probably want a project reference or at least a few more words and a link.

The resulting att.datcat is here: https://jenkins-paderborn.tei-c.org/view/LingSIG/job/TEIP5-LingSIG-tests/lastSuccessfulBuild/artifact/P5/release/doc/tei-p5-doc/en/html/ref-att.datcat.html

@bansp
Member Author

bansp commented Sep 7, 2023

@anacastrosalgado is going to help and provide the missing reference, she said.

@anacastrosalgado

anacastrosalgado commented Sep 7, 2023

@bansp Morais Silva, A. M. (1789). Diccionario da lingua portugueza composto pelo padre D. Rafael Bluteau, reformado, e accrescentado por Antonio de Moraes Silva, natural do Rio de Janeiro (vols. 1–2). Officina 730 de Simão Thaddeo Ferreira. MORDigital project (https://mordigital.fcsh.unl.pt/en/homepage/). The digital edition will be available via TEI Lex-0 Publisher at the end of the project.
 
Example: (image not preserved)

@bansp
Member Author

bansp commented Sep 7, 2023

Thanks, Ana. I've added the reference. Are you OK with the example used there? I think it came from you directly, but that was something like a year ago, so maybe you'd rather see some changes there while the Council is still in the process.

@anacastrosalgado

anacastrosalgado commented Sep 7, 2023

@bansp Please see if it can be like this (@ttasovac, @laurentromary, please also take a look). It is the example that we used yesterday in our presentation during the TEI conference.

The att.datcat attributes can be used for any sort of taxonomy. The example below illustrates their usefulness for describing usage domain labels in dictionaries, showing lexicographic articles from a Portuguese legacy dictionary, the Morais dictionary [Morais Silva, A. M. (1789). Diccionario da lingua portugueza composto pelo padre D. Rafael Bluteau, reformado, e accrescentado por Antonio de Moraes Silva, natural do Rio de Janeiro, vols. 1–2. Officina 730 de Simão Thaddeo Ferreira]. The digital edition will be available via TEI Lex-0 Publisher at the end of the MORDigital project (https://mordigital.fcsh.unl.pt/en/homepage/).

<!--  in the dictionary header    -->
<encodingDesc>
   <classDecl>
      <taxonomy xml:id="domains">
         <!--...-->
         <category xml:id="domain.mathematical_sciences"
            valueDatcat="http://www.semanticweb.org/OntoDomLab-Math#MathematicalSciences http://vocabs.rossio.fcsh.unl.pt/morais_domains/0036">
            <catDesc xml:lang="en">
               <term>Mathematical Sciences</term>
               <gloss>Group of areas of study that includes, in addition to mathematics, those
                  academic disciplines that are primarily mathematical in nature but may not
                  be universally considered subfields of mathematics proper.</gloss>
            </catDesc>
            <catDesc xml:lang="pt">
               <term>Ciências Matemáticas</term>
               <gloss>
                  <!--...-->
               </gloss>
            </catDesc>
            <category xml:id="domain.mathematics"
               valueDatcat="http://www.semanticweb.org/OntoDomLab-Math#Mathematics http://vocabs.rossio.fcsh.unl.pt/morais_domains/0024">
               <catDesc xml:lang="en">
                  <term>Mathematics</term>
                  <gloss>
                     <!--...-->
                  </gloss>
               </catDesc>
               <catDesc xml:lang="pt">
                  <term>Matemática</term>
                  <gloss>
                     <!--...-->
                  </gloss>
               </catDesc>
               <category xml:id="domain.arithmetic"
                  valueDatcat="http://www.semanticweb.org/OntoDomLab-Math#Arithmetic http://vocabs.rossio.fcsh.unl.pt/morais_domains/0003">
                  <catDesc xml:lang="en">
                     <term>Arithmetic</term>
                     <gloss>
                        <!--...-->
                     </gloss>
                  </catDesc> 
                  <catDesc xml:lang="pt">
                     <term>Aritmética</term>
                     <gloss>
                        <!--...-->
                     </gloss>
                  </catDesc>
               </category>
               <category xml:id="domain.geometry"
                  valueDatcat="http://www.semanticweb.org/OntoDomLab-Math#Geometry http://vocabs.rossio.fcsh.unl.pt/morais_domains/0018">
                  <catDesc xml:lang="en">
                     <term>Geometry</term>
                     <gloss>
                        <!--...-->
                     </gloss>
                  </catDesc>
                  <catDesc xml:lang="pt">
                     <term>Geometria</term>
                     <gloss>
                        <!--...-->
                     </gloss>
                  </catDesc>
               </category>
            </category>
         </category>
         <!--...-->
      </taxonomy>
   </classDecl>
</encodingDesc>
<!-- inside an <entry> element: -->
<usg type="domain" valueDatcat="#domain.mathematics">Mathem.</usg>
<entry xmlns="http://www.tei-c.org/ns/1.0" xml:id="MORAIS.DLP.1.ORDENADA" type="mainEntry" xml:lang="pt">
   <form type="lemma">
      <orth>ORDENADA</orth>
   </form>
   <metamark function="lemmaDelimiter">,</metamark>
   <gramGrp>
      <gram type="pos" norm="NOUN">ſ.</gram>
      <gram type="gen">f.</gram>
   </gramGrp>
   <sense xml:id="MORAIS.DLP.1.ORDENADA.s.1">
      <usg type="domain" valueDatcat="#domain.mathematics">Mathem.</usg>
      <def>linha recta tirada perpendicularmente do ponto da curva a ſeu eixo</def>
   </sense>
   <metamark function="senseDelimiter">.</metamark>
</entry>
<entry xmlns="http://www.tei-c.org/ns/1.0" xml:id="MORAIS.DLP.1.TRIGONOMETRIA" type="mainEntry" xml:lang="pt">
   <form type="lemma">
      <orth>TRIGONOMETRIA</orth>
   </form>
   <metamark function="lemmaDelimiter">,</metamark>
   <gramGrp>
      <gram type="pos" norm="NOUN">ſ.</gram>
      <gram type="gen">f.</gram>
   </gramGrp>
   <sense xml:id="MORAIS.DLP.1.TRIGONOMETRIA.s.1">
      <!-- invisible domain -->
      <usg type="domain" valueDatcat="#domain.mathematics" resp="#Salgado"/>
      <def>parte da Mathematica , que enſina a reſolver os triangulos planos , e esfericos</def>
   </sense>
   <metamark function="senseDelimiter">.</metamark>
</entry>

In the Morais dictionary, the relevant domain labels are organised in the header and are referenced from usg elements inside the dictionary. The vocabulary used for dictionary-internal labelling is in turn anchored in the MORDigital controlled vocabulary service of the NOVA University of Lisbon – School of Social Sciences and Humanities (NOVA FCSH).
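
The two-step anchoring described above (usg label to header category, category to external vocabulary) can be followed mechanically. The XSLT 1.0 stylesheet below is only an illustrative sketch, not part of the proposed spec text: it assumes that the header and the entries sit in one document, that all elements are in the TEI namespace, and that each `usg` points to a single local category with a `#`-prefixed reference, as in the example. The key name `category-by-id` is made up for the illustration.

```xml
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:tei="http://www.tei-c.org/ns/1.0">

   <xsl:output method="text"/>

   <!-- Index every header category by its xml:id (hypothetical key name) -->
   <xsl:key name="category-by-id" match="tei:category" use="@xml:id"/>

   <xsl:template match="/">
      <!-- Visit each domain label that points to a local category -->
      <xsl:for-each select="//tei:usg[@type = 'domain'][starts-with(@valueDatcat, '#')]">
         <!-- Step 1: follow the local pointer into the taxonomy -->
         <xsl:variable name="category"
            select="key('category-by-id', substring-after(@valueDatcat, '#'))"/>
         <!-- Step 2: report the external URIs recorded on that category -->
         <xsl:value-of select="ancestor::tei:entry/@xml:id"/>
         <xsl:text> -> </xsl:text>
         <xsl:value-of select="$category/@xml:id"/>
         <xsl:text> -> </xsl:text>
         <xsl:value-of select="$category/@valueDatcat"/>
         <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
   </xsl:template>
</xsl:stylesheet>
```

For each matching `usg`, the stylesheet writes one line of text with the entry identifier, the local category identifier, and the space-separated external URIs taken from that category's `valueDatcat`.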

@bansp
Member Author

bansp commented Sep 7, 2023

I should have phrased my last comment differently :-) Like "is there something in the example right now that makes it utterly wrong (rather than not beautiful enough)" ;-)
Because if the example and text are "not super" but nevertheless not blatantly lying, then I would dearly prefer not to edit anything there right now, because I simply don't have the time for it -- maybe in December, but maybe only in January, if I can help it.
One very important thing to remember is that the use of Morais in that piece of documentation should be treated as accidental -- it is used only to illustrate one case where the <taxonomy> element uses DCR attributes. One short example, because the spec needs to be readable, rather than TL;DR-able.

The last Jenkins build shows some errors, but I'm not at all sure that the errors come from the newly added reference. I have now pushed a new commit and hope that it lights green, and the ticket/PR gets accepted for merging.

@bansp
Member Author

bansp commented Sep 7, 2023

Update: the Jenkins build keeps failing, but it looks a bit like an incompatibility between some configuration item (path?) and some backwards-incompatible modification in a new release of the Guidelines.

I wonder if lines such as

[xslt] WARNING: file https://www.tei-c.org/Vault/P5/4.5.0/VERSION cannot be read, so links will probably be broken

can be taken as indicative of what's wrong (see the console output).

Will pester @peterstadler about this at some point, but only after he's had a bit of a breather after the conference...

bansp added a commit to LingSIG/TEI that referenced this issue Sep 12, 2023
Thanks, Peter Stadler!
Fixing a compatibility failure to allow a build to pass for issue TEIC#2419
@bansp
Member Author

bansp commented Sep 12, 2023

... and, as usual, Peter has not failed. Thanks!
I've put the fix in just so the build doesn't fail and we can finally see the result at https://jenkins-paderborn.tei-c.org/view/LingSIG/job/TEIP5-LingSIG-tests/lastSuccessfulBuild/artifact/P5/release/doc/tei-p5-doc/en/html/ref-att.datcat.html .

See issue #2472 for progress on the Council side.

@bansp
Member Author

bansp commented Sep 12, 2023

That hasn't worked as planned, compare

old console output, 6 errors: https://jenkins-paderborn.tei-c.org/job/TEIP5-LingSIG-tests/13/parsed_console/

new output, 8 errors: https://jenkins-paderborn.tei-c.org/job/TEIP5-LingSIG-tests/14/parsed_console/

-- so let me just wait for a fix by the Council.

@ebeshero
Member

@bansp I took a very quick look at the bug report and saw this issue right away: "ERROR: Guidelines.epub: OPS/XHTML file OPS/ref-att.calendarSystem.html is missing"

It looks like the build is missing a crucial file for ref-att.calendarSystem ?

@ebeshero
Member

ebeshero commented Sep 12, 2023

@bansp That's not your doing, of course--just a recognition that the build problem is likely to do with activity last week and this on a different PR (uh oh...): #2435

@raffazizzi and @sydb should be able to help here! I'll look in later--I'm headed back to the university trenches for the next several hours.

@bansp
Member Author

bansp commented Sep 12, 2023

@ebeshero Thanks for giving it a check :-)
I've withdrawn the modification and will just wait for whatever you guys end up doing, and will update the fork then. There's no need to divert Raff's or Syd's attention.

@ebeshero
Member

@bansp I think the proverbial dust has settled from yesterday's activities on the other PR! It's probably safe to update your branch now. But I also think it may be safe to ask our Council reviewers to check things out too.

@bansp
Member Author

bansp commented Sep 13, 2023

Thanks, Elisa. No hurry on this end. I'll do my best to react to potential comments by the reviewers.

@bansp
Member Author

bansp commented Sep 13, 2023

OTOH, there's no movement yet, in the dev branch of either TEI or Stylesheets. I'll just check back in a day or two :-)

@bansp
Member Author

bansp commented Oct 6, 2023

Updated the pull request with the content coming from issue #2480 but still no go, at least not in the Paderborn Jenkins. It's the first time since I can remember that the build tree has been broken for so long. Feels weird. I understand it's because we're waiting for some upstream sanity but I'm not sure that that is sensible. They must know they've broken stuff and since they haven't bothered to fix it, shouldn't we go around them? As Martin suggests in #2472 .

@ebeshero
Member

ebeshero commented Oct 6, 2023

@bansp Sorry for the long wait, but for us it is only the documentation build that breaks. The "upstream sanity" you refer to is a decision that we will make when Council meets next Friday October 13. We need Council discussion and consensus on the best path forward to resolve #2473.

@bansp
Member Author

bansp commented Oct 6, 2023

Thanks, Elisa. By upstream sanity, I was referring to what seems to be a happy-go-lucky move by the Debian team, if I understood Martin correctly, and to leaving matters unchanged despite a hiccup.

I realise that, for the Council, waiting is a reasonable option, up to a limit, and the costs are arguably low in this case.

Maybe a different make flow is what can be done in our case. Will ask Peter about that.

HelenaSabel pushed a commit that referenced this issue Oct 26, 2023
* add category and taxonomy to att.datcat; add to the spec (references #2419 )

* minor corrections

* correction of attribute names

* add the Morais dictionary reference

With thanks to Ana Salgado

* cosmetic changes to restart Jenkins

* Update att.datcat.xml

use title rather than hi

* "Junicode" to "Junicode Two Beta"

Thanks, Peter Stadler!
Fixing a compatibility failure to allow a build to pass for issue #2419

* revert
@ebeshero ebeshero added this to the Guidelines 4.7.0 milestone Nov 17, 2023
6 participants