New element annotatedU #539

TEITechnicalCouncil · 2014-12-05T12:31:38Z

[This is the second of a few tickets related to the TEI/ISO standard for transcriptions of spoken language: see http://bit.ly/1jyZC37 ]

It is usual to segment transcribed speech into smaller chunks for which the existing element is appropriate. This proposal suggests a way of grouping each such chunk with one or more tiers of annotation, as is common practice.

Original comment by: @lb42

TEITechnicalCouncil · 2014-12-05T13:53:09Z

We should probably see how we could also deal with such cases by means of the stand-off element. I see the two options as complementary flavors: for many pieces of speech annotation software an interleaved representation à la annotatedU is easier, whereas for some other use cases it is better to leave the primary transcription "untouched".

Original comment by: @laurentromary

TEITechnicalCouncil · 2014-12-17T08:34:47Z

After going back and forth between the ISO proposal and the stdf proposal, I see the possibility of creating an element that would be slightly more generic than annotatedU, which we could call annotationGrp. This element could be used to group together a series of annotations associated with the same primary object (e.g. the same u element), either by having this object as a child (i.e. what we wanted with annotatedU: a u with a series of spanGrp, for instance) or in a stand-off mode within the annotations sub-element of stdf. The specification of this element could be as follows:


<elementSpec ident="annotationGrp" mode="add" ns="http://standoff.proposal">
   <desc>Groups together various annotations, for instance for parallel interpretations of a spoken segment</desc>
   <classes>
      <memberOf key="model.annotationPart"/>
      <memberOf key="model.divPart.spoken"/>
      <memberOf key="att.timed"/>
      <memberOf key="att.global"/>
      <memberOf key="att.ascribed"/>
   </classes>
   <content>
      <rng:zeroOrMore>
         <rng:choice>
            <rng:ref name="u"/>
            <rng:ref name="model.global.meta"/>
            <rng:ref name="model.annotationPart"/>
         </rng:choice>
      </rng:zeroOrMore>
   </content>
</elementSpec>

with the idea that model.annotationPart would be the hook where one could add any kind of internal or external annotation object. For instance, in my tests I make model.global.meta a member of this class so that spanGrp and the like are allowed in it.
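
For illustration, an instance conforming to this spec might look like the sketch below. It is not taken from the ISO document: the `so` prefix, the utterance text, the identifiers and the `type` value are all invented, and it assumes that spanGrp is reachable through model.global.meta as described above.

```xml
<!-- Hypothetical instance: one utterance grouped with one annotation tier. -->
<so:annotationGrp xmlns:so="http://standoff.proposal"
                  xmlns="http://www.tei-c.org/ns/1.0"
                  who="#SPK1" start="#T0" end="#T1">
   <!-- the primary object, held inline as a child -->
   <u xml:id="u1">
      <w xml:id="w1">good</w>
      <w xml:id="w2">morning</w>
   </u>
   <!-- one tier of annotation pointing back at the tokens -->
   <spanGrp type="pos">
      <span from="#w1">ADJ</span>
      <span from="#w2">NOUN</span>
   </spanGrp>
</so:annotationGrp>
```

The who, start and end attributes are the ones the element would inherit from att.ascribed and att.timed in the spec above.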

Original comment by: @laurentromary

TEITechnicalCouncil · 2015-01-29T19:22:27Z

Generalizing is always nice. But what is "stdf" please?

Original comment by: @lb42

TEITechnicalCouncil · 2015-01-29T21:01:07Z

stdf is a proposed element badly in need of a name approved for all audiences.

Please see ticket #378, then the google doc linked from there, then Peter Stadler's ODD proposal for standoff annotations, linked from the google doc...

Original comment by: @bansp

TEITechnicalCouncil · 2015-01-29T23:25:36Z

There is also a GitHub project (https://github.com/laurentromary/stdfSpec), where I maintain updates on the stdf proposal and some samples, which show how annotatedU can be used inline or stand-off in relation to speech transcription.

Original comment by: @laurentromary

TEITechnicalCouncil · 2015-01-30T14:41:17Z

assigned_to: Lou Burnard

Original comment by: @lb42

TEITechnicalCouncil · 2015-03-02T16:16:07Z

Referring to the document at https://docs.google.com/document/d/1BTjYHSiPjD6GhKMNFmZrrvCkLQAa1RK7aGbG5K50uN4

Section 6.5.2 ("Representation as unclear or gap") says that when a string of words is unclear, and alternatives are proposed, the strings should each be wrapped in a separate span element (within choice, within unclear). I think this was meant to say "a separate seg element"; and indeed the examples given two sections later (6.5.4) use seg, not span. Probably just the usual code-switching problem between HTML span and TEI seg.

Section 5.7 (6.7 as listed in the TOC) on "Global divisions" proposes that divisions of the transcription at levels superordinate to the utterance should be accomplished by the use of non-tessellating divs. Unless utterance and annotated utterance themselves are regarded as syntactic sugar for div type="utterance", this is surely a very un-TEI way of doing things. Do we really mean to slip floating divs into the scheme by this means?

Original comment by: @PFSchaffner

TEITechnicalCouncil · 2015-03-16T13:13:11Z

I have suggested a revision to the document precluding non-tessellating divs. In the meantime, do we have agreement on introducing a new <annotatedU> element, a spec for which would look something like this:

<elementSpec ident="annotatedU" ns="http://iso-tei-spoken.org/ns/1.0">
   <desc>groups an utterance with the annotation layers associated with it</desc>
   <classes>
      <memberOf key="model.divPart.spoken"/>
   </classes>
   <content>
      <group xmlns="http://relaxng.org/ns/structure/1.0">
         <ref name="u"/>
         <oneOrMore>
            <ref name="spanGrp"/>
         </oneOrMore>
      </group>
   </content>
</elementSpec>
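
As a sketch of what this content model would accept (exactly one u followed by at least one spanGrp), an instance might look as follows. The `iso` prefix, the text, the identifiers and the tier names are invented for illustration and do not come from the ISO draft.

```xml
<!-- Hypothetical instance: one utterance with two annotation tiers. -->
<iso:annotatedU xmlns:iso="http://iso-tei-spoken.org/ns/1.0"
                xmlns="http://www.tei-c.org/ns/1.0">
   <u xml:id="u2" who="#SPK2">
      <w xml:id="w3">thank</w>
      <w xml:id="w4">you</w>
   </u>
   <!-- first tier: lemmas -->
   <spanGrp type="lemma">
      <span from="#w3">thank</span>
      <span from="#w4">you</span>
   </spanGrp>
   <!-- second tier: part of speech -->
   <spanGrp type="pos">
      <span from="#w3">VERB</span>
      <span from="#w4">PRON</span>
   </spanGrp>
</iso:annotatedU>
```

Under this spec an annotatedU with no spanGrp, or with a spanGrp before the u, would not validate.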

Original comment by: @lb42

TEITechnicalCouncil · 2015-03-16T13:19:29Z

@Lou: please see above the new name + specification for annotationGrp, comprising the creation of a class model.annotationPart allowing an easy customization of the content depending on the kind of annotation object people will use (e.g. term entries, NER, open annotation objects, what have you).

Original comment by: @laurentromary

TEITechnicalCouncil · 2015-03-16T13:53:21Z

So you want to replace "annotatedU" with "annotationGrp" ?

Original comment by: @lb42

TEITechnicalCouncil · 2015-03-16T14:12:10Z

Yes. See Thomas' last document.

Original comment by: @laurentromary

TEITechnicalCouncil · 2015-03-16T14:15:26Z

For the benefit of others trying to follow this ticket, "Thomas' last document" is an entirely new docx version of the googledoc, the existence of which I learned of about 20 minutes ago when he sent me a copy !

Original comment by: @lb42

TEITechnicalCouncil · 2015-03-24T12:37:11Z

The current version of this latest draft is now available from
https://sourceforge.net/p/tei/code/HEAD/tree/trunk/Incubator/Spoken/ISO-TEI-Transcription_of_spoken_language_FINAL_DRAFT_EDIT2_LR.docx

Original comment by: @lb42

TEITechnicalCouncil · 2015-03-24T12:39:58Z

Could we put this in a password-protected place? We may have a problem with ISO copyrighted documents. (I am +not+ opening a debate, just mentioning it.)

Original comment by: @laurentromary

TEITechnicalCouncil · 2015-03-24T12:44:35Z

Well, we have the wiki, but that is hardly secure. If you want to restrict access to this document, then clearly it is not yet ready for discussion by the TEI, so I will remove it.

Original comment by: @lb42

TEITechnicalCouncil · 2015-10-02T17:55:59Z

This issue was originally assigned to SF user: louburnard
Current user is: lb42

lb42 · 2015-10-30T08:48:09Z

The latest version of the ISO proposal has apparently renamed this element as "annotationGrp". Unfortunately, TEI naming conventions require that an element named xxxGrp contains only xxx elements, which is not the case here. Perhaps a better name might be "annotationUnit" or "annotationBlock" ?

laurentromary · 2015-10-30T08:52:33Z

I must say I like both (annotationUnit or annotationBlock). If a decision could be taken quickly by the council, we would make sure that the final ISO publication refers to it. We actually presented the case in ISO as pending the naming decision by the TEI council.

lb42 · 2016-02-03T12:22:26Z

So are we agreed on the following:
a) we add a new element <annotationBlock> with a structure like that proposed above (under the name "annotationGrp")
b) we add some discussion and examples of its use to the current TS (transcribed speech) chapter, and probably also refer to it in the current AI (analytic info) chapter.

If so, I'd appreciate some help confecting the latter. Laurent? Tomas?

laurentromary · 2016-02-08T09:00:05Z

I am sending to Hugh the ISO document which is under balloting and from which the council can take up examples. Come back to me and Thomas (now subscribed to both tickets) for any additional information.
We are about (ballot finishing in one week) to lock the element name to <annotationBlock>, so it would actually be optimal not to go towards another name. The content model should validate all examples from the ISO document in any case (thus making  optional to allow a stand-off mechanism), and rely on the use of a class (model.annotationBlockPart ?) to allow more than just  and make further customization easy.

lb42 · 2016-02-25T18:12:54Z

I've now seen the PDF of the draft: it still says "annotationGrp" rather than "annotationBlock", but on the assumption that you will change that, I can do my best to get "annotationBlock" into the next release of TEI P5 (due around Easter time). I will also check the examples in the PDF file (would be easier if I had the source): how do you want to be notified of any problems that show up?

laurentromary · 2016-02-25T18:28:33Z

Of course, since it is under ballot. We have already filed a comment requesting the change to annotationBlock. So please go ahead with the implementation. Please notify me and Tomas if anything is wrong.

lb42 · 2016-02-28T17:21:27Z

In which TEI module should <annotationBlock> be defined? In spoken or in analysis ?

laurentromary · 2016-02-28T18:43:32Z

Clearly analysis. It is potentially a tool for grouping annotations related to quite a range of objects and of course an essential piece for standOff.

bansp · 2016-02-28T19:37:20Z

I concur -- it would be ideal for it to sit in a standoff module, but since there is no such module (yet?), analysis is definitely the way to go.

lb42 · 2016-02-28T20:17:54Z

some simple usage examples would be very helpful, if anyone has them.

laurentromary · 2016-03-11T16:33:32Z

Following a more in-depth discussion with @lb42, we suggest making the content model of <annotationBlock> more flexible by means of two sub-classes:

model.annotableSegment: groups together elements that may be annotated within an <annotationBlock> element
model.annotation: groups together any kind of element that may be used to annotate an annotable segment

The content model of <annotationBlock> would be something like:
(model.annotableSegment?, model.annotation*)

These classes could be bootstrapped with typical TEI elements that have the appropriate semantics for the corresponding function in annotationBlock:

model.annotableSegment:  (as in the ISO standard proposal), <seg> (for written texts), <zone> (when the annotation is directly about an image)
model.annotation:  and <spanGrp> (cf. ISO document), <interp>, and <interpGrp> (obvious...), <fs> (generic purpose FS based annotation)

In the case of a stand-off use of annotationBlock, we may consider either to make the annotableSegment optional or use  to point to the annotated object.
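
The two classes and the content model above could be sketched in ODD roughly as follows. This is only an illustration in the style of the specs earlier in this ticket, not an agreed specification: the desc wording is paraphrased from this comment, the rng prefix is assumed to be bound to the RELAX NG namespace, and only two of the candidate members (seg and spanGrp) are shown.

```xml
<!-- Hypothetical ODD fragment: two new model classes -->
<classSpec ident="model.annotableSegment" type="model" mode="add">
   <desc>groups elements that may be annotated within an annotationBlock element</desc>
</classSpec>
<classSpec ident="model.annotation" type="model" mode="add">
   <desc>groups elements that may be used to annotate an annotable segment</desc>
</classSpec>

<!-- (model.annotableSegment?, model.annotation*) -->
<elementSpec ident="annotationBlock" mode="add">
   <desc>groups a segment with the annotations associated with it</desc>
   <content>
      <rng:optional>
         <rng:ref name="model.annotableSegment"/>
      </rng:optional>
      <rng:zeroOrMore>
         <rng:ref name="model.annotation"/>
      </rng:zeroOrMore>
   </content>
</elementSpec>

<!-- Bootstrapping: add existing elements to the new classes -->
<elementSpec ident="seg" module="linking" mode="change">
   <classes mode="change">
      <memberOf key="model.annotableSegment" mode="add"/>
   </classes>
</elementSpec>
<elementSpec ident="spanGrp" module="analysis" mode="change">
   <classes mode="change">
      <memberOf key="model.annotation" mode="add"/>
   </classes>
</elementSpec>
```

Keeping the membership declarations separate from the annotationBlock spec is what would let a project add its own annotation objects without touching the content model.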

sydb · 2016-04-27T19:56:41Z

@sydb wonders aloud (for @laurentromary to answer) if requiring the model.annotableSegment bit would get rid of the ambiguity that occurs when you want to annotate (with <seg>) a segment (encoded with <seg>). Add <ptr> to model.annotableSegment, so if you want to annotate something indicated by a pointer, put in a pointer to it!
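
A sketch of what that suggestion might look like in an instance, assuming <ptr> were a member of model.annotableSegment and <spanGrp> a member of model.annotation. The identifiers and annotation values are invented, and the utterance being pointed at is assumed to live elsewhere in the document.

```xml
<!-- Hypothetical stand-off use: the annotated object is pointed to, not contained. -->
<annotationBlock xmlns="http://www.tei-c.org/ns/1.0">
   <!-- required first child identifies the annotated object -->
   <ptr target="#u1"/>
   <!-- everything after it is annotation -->
   <spanGrp type="pos">
      <span from="#w1">ADJ</span>
      <span from="#w2">NOUN</span>
   </spanGrp>
</annotationBlock>
```

With the first child required, a leading <seg> would always be read as the thing annotated and any later <seg> as annotation.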

laurentromary · 2016-04-27T20:02:09Z

The issue of ambiguity is one for which I do not have an answer. In theory (if XML schemas were no headache), I would like to have the two model classes above. But in practice, we may just resolve to have one and provide written guidelines as to proper usage: for instance mapping this to the Open Annotation model as already alluded to in https://hal.inria.fr/hal-01254365
I should push myself to submit an abstract on all this for Vienna...

laurentromary · 2016-04-27T20:06:10Z

@hcayless : getting tired? The comment is not related to the ticket, is it?

hcayless · 2016-04-27T20:07:45Z

@laurentromary Wrong ticket. Deleted.

sydb · 2016-09-27T14:45:47Z

Council to prod LB to prod LR.

lb42 · 2017-06-06T10:46:05Z

@laurentromary The element <annotationBlock> is now in the Guidelines. Can we close this issue?

laurentromary · 2017-06-06T13:27:13Z

Yes. There will be a specific ticket for updating the content model of annotationBlock

lb42 · 2017-06-06T14:00:38Z

OK, thanks. Closing this one.

TEITechnicalCouncil added Type: FeatureRequest sf-automigrated Status: Needs Discussion labels Oct 2, 2015

TEITechnicalCouncil assigned lb42 Oct 2, 2015

raffazizzi added the TEI: Schema label Dec 18, 2015

lb42 mentioned this issue Feb 3, 2016

New element <transcriptionDesc> as a possible child of <encodingDesc> #511

Closed

sydb mentioned this issue Apr 27, 2016

Encoding of Standoff annotations #374

Closed

lb42 closed this as completed Jun 6, 2017

hcayless added this to the Guidelines 3.2.0 milestone Jul 8, 2017
