New element annotatedU #539

Closed
TEITechnicalCouncil opened this issue Dec 5, 2014 · 35 comments

Comments

@TEITechnicalCouncil

[This is the second of a few tickets related to the TEI/ISO standard for transcriptions of spoken language: see http://bit.ly/1jyZC37 ]

It is usual to segment transcribed speech into smaller chunks for which the existing <u> element is appropriate. This proposal suggests a way of grouping each such chunk with one or more tiers of annotation, as is common practice.

Original comment by: @lb42

@TEITechnicalCouncil
Author

We should probably see how we could also deal with such cases by means of the stand-off element. I see the two options as complementary flavors: for many pieces of speech annotation software an interleaved representation à la annotatedU is easier, whereas for some other use cases it is better to leave the primary transcription "untouched".

Original comment by: @laurentromary

@TEITechnicalCouncil
Author

After going back and forth between the ISO proposal and the stdf proposal, I see the possibility of creating an element slightly more generic than annotatedU, which we could call annotationGrp. This element could be used to group together a series of annotations associated with the same primary object (e.g. the same u element), either by having this object as a child (i.e. what we wanted with annotatedU: a u with a series of spanGrp, for instance) or in stand-off mode within the annotations sub-element of stdf. The specification of this element could be as follows:


<elementSpec ident="annotationGrp" mode="add" ns="http://standoff.proposal" xmlns:rng="http://relaxng.org/ns/structure/1.0">
   <desc>Groups together various annotations, for instance for parallel interpretations of a spoken segment</desc>
   <classes>
      <memberOf key="model.annotationPart"/>
      <memberOf key="model.divPart.spoken"/>
      <memberOf key="att.timed"/>
      <memberOf key="att.global"/>
      <memberOf key="att.ascribed"/>
   </classes>
   <content>
      <rng:zeroOrMore>
         <rng:choice>
            <rng:ref name="u"/>
            <rng:ref name="model.global.meta"/>
            <rng:ref name="model.annotationPart"/>
         </rng:choice>
      </rng:zeroOrMore>
   </content>
</elementSpec>

with the idea that model.annotationPart would be the hook where one could add any kind of internal or external annotation object. For instance, in my tests I make model.global.meta a member of this class to get spanGrp and the like in it.
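
As a rough sketch of how that hook might be declared in an ODD: this assumes the class is introduced with its own classSpec and that model.global.meta is made a member through a change-mode classSpec. The description text is illustrative and not taken from the stdf proposal.

```xml
<!-- Hypothetical declaration of the new model class used as the hook -->
<classSpec ident="model.annotationPart" type="model" mode="add">
   <desc>groups elements that may appear as annotations inside annotationGrp</desc>
</classSpec>

<!-- Assumption: expose the members of model.global.meta (span, spanGrp,
     interp, fs, ...) through the hook by adding one class membership -->
<classSpec ident="model.global.meta" type="model" mode="change">
   <classes mode="change">
      <memberOf key="model.annotationPart"/>
   </classes>
</classSpec>
```

A customization wanting other annotation objects would then only need to add further members to model.annotationPart, leaving the elementSpec itself untouched.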

Original comment by: @laurentromary

@TEITechnicalCouncil
Author

Generalizing is always nice. But what is "stdf" please?

Original comment by: @lb42

@TEITechnicalCouncil
Author

stdf is a proposed element badly in need of a name approved for all audiences.

Please see ticket #378, then the google doc linked from there, then Peter Stadler's ODD proposal for standoff annotations, linked from the google doc...

Original comment by: @bansp

@TEITechnicalCouncil
Author

There is also a GitHub project (https://github.com/laurentromary/stdfSpec), where I maintain updates on the stdf proposal and some samples, which show how annotatedU can be used inline or stand-off in relation to speech transcription.

Original comment by: @laurentromary

@TEITechnicalCouncil
Author

  • assigned_to: Lou Burnard

Original comment by: @lb42

@TEITechnicalCouncil
Author

Referring to the document at https://docs.google.com/document/d/1BTjYHSiPjD6GhKMNFmZrrvCkLQAa1RK7aGbG5K50uN4

Section 6.5.2 ("Representation as unclear or gap") says that when a string of words is unclear, and alternatives are proposed, the strings should each be wrapped in a separate span element (within choice, within unclear). I think this was meant to say "a separate seg element"; and indeed the examples given two sections later (6.5.4) use seg, not span. Probably just the usual code-switching problem between HTML span and TEI seg.

Section 5.7 (6.7 as listed in the TOC) on "Global divisions" proposes that divisions of the transcription at levels superordinate to the utterance should be accomplished by the use of non-tessellating divs. Unless utterance and annotated utterance themselves are regarded as syntactic sugar for div type="utterance", this is surely a very un-TEI way of doing things. Do we really mean to slip floating divs into the scheme by this means?

Original comment by: @PFSchaffner

@TEITechnicalCouncil
Author

I have suggested a revision to the document precluding non-tessellating divs. In the meantime, do we have agreement on introducing a new <annotatedU> element, a spec for which would look something like this:

<elementSpec ident="annotatedU" ns="http://iso-tei-spoken.org/ns/1.0">
   <desc>groups an utterance with the annotation layers associated with it</desc>
   <classes>
      <memberOf key="model.divPart.spoken"/>
   </classes>
   <content>
      <group xmlns="http://relaxng.org/ns/structure/1.0">
         <ref name="u"/>
         <oneOrMore>
            <ref name="spanGrp"/>
         </oneOrMore>
      </group>
   </content>
</elementSpec>
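
For illustration only, an instance that this content model would presumably accept might look like the following. The speaker id, word ids and tag values are invented, and the iso prefix is simply bound to the namespace given in the spec.

```xml
<!-- Sketch only: speaker id, word ids and tags are invented -->
<iso:annotatedU xmlns:iso="http://iso-tei-spoken.org/ns/1.0"
                xmlns="http://www.tei-c.org/ns/1.0">
   <!-- exactly one utterance comes first -->
   <u who="#SPK0">
      <w xml:id="w1">good</w>
      <w xml:id="w2">morning</w>
   </u>
   <!-- followed by at least one annotation tier -->
   <spanGrp type="pos">
      <span from="#w1" to="#w1">ADJ</span>
      <span from="#w2" to="#w2">NOUN</span>
   </spanGrp>
</iso:annotatedU>
```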

Original comment by: @lb42

@TEITechnicalCouncil
Author

@Lou: please see above the new name + specification for annotationGrp, including the creation of a class model.annotationPart that allows easy customization of the content depending on the kind of annotation object people will use (e.g. term entries, NER, open annotation objects, what have you)

Original comment by: @laurentromary

@TEITechnicalCouncil
Author

So you want to replace "annotatedU" with "annotationGrp" ?

Original comment by: @lb42

@TEITechnicalCouncil
Author

Yes. See Thomas' last document.

Original comment by: @laurentromary

@TEITechnicalCouncil
Author

For the benefit of others trying to follow this ticket, "Thomas' last document" is an entirely new docx version of the googledoc, the existence of which I learned of about 20 minutes ago when he sent me a copy !

Original comment by: @lb42

@TEITechnicalCouncil
Author

The current version of this latest draft is now available from
https://sourceforge.net/p/tei/code/HEAD/tree/trunk/Incubator/Spoken/ISO-TEI-Transcription_of_spoken_language_FINAL_DRAFT_EDIT2_LR.docx

Original comment by: @lb42

@TEITechnicalCouncil
Author

Could we put this somewhere password-protected? We may have a problem with ISO-copyrighted documents. (I am +not+ opening a debate, just mentioning it.)

Original comment by: @laurentromary

@TEITechnicalCouncil
Author

Well, we have the wiki, but that is hardly secure. If you want to restrict access to this document, then clearly it is not yet ready for discussion by the TEI, so I will remove it.

Original comment by: @lb42

@TEITechnicalCouncil
Author

This issue was originally assigned to SF user: louburnard
Current user is: lb42

@lb42
Member

lb42 commented Oct 30, 2015

The latest version of the ISO proposal has apparently renamed this element as "annotationGrp". Unfortunately, TEI naming conventions require that an element named xxxGrp contains only xxx elements, which is not the case here. Perhaps a better name might be "annotationUnit" or "annotationBlock" ?

@laurentromary
Contributor

I must say I like both (annotationUnit or annotationBlock). If a decision could be taken quickly by the Council, we would make sure that the final ISO publication refers to it. We actually presented the case in ISO as pending the naming decision by the TEI Council.

@lb42
Member

lb42 commented Feb 3, 2016

So are we agreed on the following:
a) we add a new element <annotationBlock> with a structure like that proposed above (under the name "annotationGrp")
b) we add some discussion and examples of its use to the current TS (transcribed speech) chapter, and probably also refer to it in the current AI (analytic info) chapter.

If so, I'd appreciate some help confecting the latter. Laurent? Thomas?

@laurentromary
Contributor

I am sending to Hugh the ISO document which is under balloting and from which the council can take up examples. Come back to me and Thomas (now subscribed to both tickets) for any additional information.
We are about to lock the element name to <annotationBlock> (the ballot finishes in one week), so it would be best not to move to another name. In any case, the content model should validate all examples from the ISO document, which means making <u> optional to allow a stand-off mechanism and relying on a class (model.annotationBlockPart?) to allow more than just <span> and make further customization easy.

@lb42
Member

lb42 commented Feb 25, 2016

I've now seen the PDF of the draft: it still says "annotationGrp" rather than "annotationBlock", but on the assumption that you will change that, I can do my best to get "annotationBlock" into the next release of TEI P5 (due around Easter). I will also check the examples in the PDF file (it would be easier if I had the source): how do you want to be notified of any problems that show up?

@laurentromary
Contributor

Of course, since it is under ballot. We have already filed a comment requesting the change to annotationBlock. So please go ahead with the implementation. Please notify me and Thomas if anything is wrong.

@lb42
Member

lb42 commented Feb 28, 2016

In which TEI module should <annotationBlock> be defined? In spoken or in analysis ?

@laurentromary
Contributor

Clearly analysis. It is potentially a tool for grouping annotations related to quite a range of objects, and of course an essential piece for standOff.

@bansp
Member

bansp commented Feb 28, 2016

I concur -- it would be ideal for it to sit in a standoff module, but since there is no such module (yet?), analysis is definitely the way to go.

@lb42
Member

lb42 commented Feb 28, 2016

Some simple usage examples would be very helpful, if anyone has them.
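
One possible shape for such an example, offered as a sketch rather than as something taken from the ISO document: it assumes that who, start and end are available on <annotationBlock> through att.ascribed and att.timed, as in the annotationGrp proposal earlier in this ticket. All ids, timeline anchors and annotation values are invented.

```xml
<!-- Sketch only: ids, anchors and annotation values are invented -->
<annotationBlock xmlns="http://www.tei-c.org/ns/1.0"
                 who="#SPK0" start="#T0" end="#T1">
   <!-- the primary transcription -->
   <u xml:id="u1">
      <w xml:id="u1w1">guten</w>
      <w xml:id="u1w2">Morgen</w>
   </u>
   <!-- tier 1: one lemma per word -->
   <spanGrp type="lemma">
      <span from="#u1w1" to="#u1w1">gut</span>
      <span from="#u1w2" to="#u1w2">Morgen</span>
   </spanGrp>
   <!-- tier 2: a free translation spanning the whole utterance -->
   <spanGrp type="translation">
      <span from="#u1w1" to="#u1w2">good morning</span>
   </spanGrp>
</annotationBlock>
```

Each spanGrp is one annotation tier, and each span selects its stretch of the utterance with from and to.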

@laurentromary
Contributor

Following a more in-depth discussion with @lb42, we suggest making the content model of <annotationBlock> more flexible by means of two sub-classes:

  • model.annotableSegment: groups together elements that may be annotated within an <annotationBlock> element
  • model.annotation: groups together any kind of element that may be used to annotate an annotable segment

The content model of <annotationBlock> would be something like:
(model.annotableSegment?, model.annotation*)
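
Written out in the RELAX NG notation used for the specs earlier in this ticket, that content model might read roughly as follows. This is a sketch; the two class names are the ones proposed here and exist in no released schema as far as this ticket shows.

```xml
<content>
   <rng:group xmlns:rng="http://relaxng.org/ns/structure/1.0">
      <!-- model.annotableSegment? : at most one annotated object -->
      <rng:optional>
         <rng:ref name="model.annotableSegment"/>
      </rng:optional>
      <!-- model.annotation* : any number of annotations -->
      <rng:zeroOrMore>
         <rng:ref name="model.annotation"/>
      </rng:zeroOrMore>
   </rng:group>
</content>
```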

These classes could be bootstrapped with typical TEI elements that would have the appropriate semantic for the corresponding function in annotationBlock:

  • model.annotableSegment: <u> (as in the ISO standard proposal), <seg> (for written texts), <zone> (when the annotation is directly about an image)
  • model.annotation: <span> and <spanGrp> (cf. ISO document), <interp>, and <interpGrp> (obvious...), <fs> (generic purpose FS based annotation)

In the case of a stand-off use of annotationBlock, we may consider either to make the annotableSegment optional or use <span> to point to the annotated object.
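
A sketch of the second option, with no annotable segment at all and <span> doing the pointing. The file name transcript.xml and the word ids are hypothetical.

```xml
<!-- Sketch only: transcript.xml and its ids are hypothetical -->
<annotationBlock xmlns="http://www.tei-c.org/ns/1.0">
   <!-- no <u> child: the primary transcription stays untouched elsewhere -->
   <spanGrp type="ner">
      <!-- the span points from the first to the last word of the entity -->
      <span from="transcript.xml#w3" to="transcript.xml#w4">PERSON</span>
   </spanGrp>
</annotationBlock>
```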

@sydb
Member

sydb commented Apr 27, 2016

@sydb wonders aloud (for @laurentromary to answer) if requiring the model.annotableSegment bit would get rid of the ambiguity that occurs when you want to annotate (with <seg>) a segment (encoded with <seg>). Add <ptr> to model.annotableSegment, so if you want to annotate something indicated by a pointer, put in a pointer to it!
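
To make the suggestion concrete, here is a sketch of what it would allow if <ptr> were a member of model.annotableSegment and the segment slot were required. This is only what the suggestion implies, not something any released schema is known to accept; the target and the annotation value are invented.

```xml
<!-- Hypothetical: assumes ptr has been added to model.annotableSegment -->
<annotationBlock xmlns="http://www.tei-c.org/ns/1.0">
   <!-- first child: the thing being annotated, here indicated by a pointer -->
   <ptr target="transcript.xml#u1"/>
   <!-- everything after the first child is read as annotation -->
   <interp type="topic">greeting</interp>
</annotationBlock>
```

With the first slot always filled, position alone would tell the annotated object apart from its annotations, even when both happen to be <seg> elements.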

@laurentromary
Contributor

The issue of ambiguity is one for which I do not have an answer. In theory (if XML schemas were no headache), I would like to have the two model classes above. But in practice, we may just resolve to have one and provide written guidelines as to proper usage: for instance mapping this to the Open Annotation model as already alluded to in https://hal.inria.fr/hal-01254365
I should push myself to submit an abstract on all this for Vienna...

@laurentromary
Contributor

@hcayless : getting tired? The comment is not related to the ticket, is it?

@hcayless
Member

@laurentromary Wrong ticket. Deleted.

@sydb
Member

sydb commented Sep 27, 2016

Council to prod LB to prod LR.

@lb42
Member

lb42 commented Jun 6, 2017

@laurentromary The element <annotationBlock> is now in the Guidelines. Can we close this issue?

@laurentromary
Contributor

Yes. There will be a specific ticket for updating the content model of annotationBlock.

@lb42
Member

lb42 commented Jun 6, 2017

OK, thanks. Closing this one.

@lb42 lb42 closed this as completed Jun 6, 2017
@hcayless hcayless added this to the Guidelines 3.2.0 milestone Jul 8, 2017