Standoff: annotation microstructure #1745

Closed
tuurma opened this issue Feb 25, 2018 · 13 comments

Comments

@tuurma
Contributor

tuurma commented Feb 25, 2018

related to #374

History

The subject of standoff has been discussed in this tracker since 2012 and must have surfaced much earlier in unrecorded conversations. The initial agreement was on a <standoff> element, envisioned as a sibling of <teiHeader> and <text>, which would serve as a wrapper for individual annotations. A convincing statement of what is meant by annotation is given in #374 (comment), and the Council meeting of 2012-09 agreed with it without determining the exact content model, except for postulating:

The thing should have a very simple content model based on a model class, so that external vocabularies can be included easily as a customization.

Further work on standoff has been delegated to workgroups and LingSiG, but so far we have not reached a conclusion. Meanwhile, other attempts at standardizing annotations have been carried out by the Open Annotation Community Group, later resulting in the W3C specification of the Web Annotation Data Model
https://www.w3.org/TR/annotation-model. Council agrees that the TEI annotation model should be compatible with the OA recommendations.

The current standoff proposal can be found at http://htmlpreview.github.io/?https://github.com/laurentromary/stdfSpec/blob/AnnArbor/Scenarios/StandOffScenarios.html

@tuurma tuurma self-assigned this Feb 25, 2018
@tuurma
Contributor Author

tuurma commented Feb 25, 2018

Web Annotation Data Model (WADM)

The primary aim of the Web Annotation Data Model is to provide a standard description model and format to enable annotations to be shared between systems (...) The model should cover as many annotation use cases as possible, while keeping the simple annotations easy and expanding from that baseline to make complex uses possible.

The Web Annotation Data Model is a single, consistent model that can be used by all interested parties. A single method of fulfilling a use case is strongly preferred over multiple methods (...) the Data Model is built using Linked Data fundamentals (...)

Web Annotation Principles

The Web Annotation Data Model is defined using the following basic principles:

  • An Annotation represents a relationship between resources.
  • There are two primary types of resource that participate in this relationship, Bodies and Targets.
  • The content of the Body resources is related to, and typically "about", the content of the Target resources.
  • Annotations, Bodies and Targets may have their own properties and relationships, typically including creation and descriptive information.

The Web is distributed, with different systems working together to provide access to content. Annotations can be used to link those resources together, being referenced as the Body and Target. The Target resource is always an External Web Resource, but the Body may also be embedded within the Annotation.

An Annotation is a Web Resource. Typically, an Annotation has a single Body, which is a comment or other descriptive resource, and a single Target that the Body is somehow "about". The Annotation likely also has additional descriptive properties.

Annotation model

| Term | Type | Description |
|---|---|---|
| context | Property | The context that determines the meaning of the serialization format as an Annotation. |
| id | Property | The identity of the Annotation. An Annotation must have exactly 1 IRI that identifies it. |
| type | Relationship | The type of the Annotation. An Annotation must have 1 or more types, and the Annotation class must be one of them. |
| Annotation | Class | The class for Web Annotations. The Annotation class must be associated with an Annotation using type. |
| body | Relationship | The relationship between an Annotation and its Body. There should be 1 or more body relationships associated with an Annotation but there may be 0. |
| target | Relationship | The relationship between an Annotation and its Target. There must be 1 or more target relationships associated with an Annotation. |

Example

Use Case: Alice has written a post that makes a comment about a particular web page. Her client creates an Annotation with the post as the body resource, and the web page as the target resource.

{
  "@context": "http://www.w3.org/ns/anno.jsonld",
  "id": "http://example.org/anno1",
  "type": "Annotation",
  "body": "http://example.org/post1",
  "target": "http://example.com/page1"
}
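
As a rough illustration of the cardinality rules in the table above, here is a minimal Python sketch that checks a parsed annotation against them. The function names `as_list` and `check_annotation` are my own and not part of WADM or of any library; the checks cover only id, type, body and target, and treating a missing body as a warning is my reading of "should ... but there may be 0".

```python
from typing import Any


def as_list(value: Any) -> list:
    """Normalize a JSON-LD value that may be absent, single, or an array."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def check_annotation(anno: dict) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    # id: exactly one IRI identifying the Annotation
    if not isinstance(anno.get("id"), str) or not anno["id"]:
        problems.append("id: must be exactly one IRI")
    # type: one or more types, and "Annotation" must be among them
    if "Annotation" not in as_list(anno.get("type")):
        problems.append("type: must include 'Annotation'")
    # target: one or more target relationships are required
    if not as_list(anno.get("target")):
        problems.append("target: at least one is required")
    # body: recommended but optional, so only reported as a warning
    if not as_list(anno.get("body")):
        problems.append("warning: no body (allowed, but one or more is recommended)")
    return problems


alice = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "http://example.org/anno1",
    "type": "Annotation",
    "body": "http://example.org/post1",
    "target": "http://example.com/page1",
}

print(check_annotation(alice))
print(check_annotation({"type": "Annotation"}))
```

This is not a JSON-LD processor: it does not expand the context or verify that the IRIs resolve, it only mirrors the counts stated in the table.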

@tuurma
Contributor Author

tuurma commented Feb 25, 2018

More complex annotation models

Real-life situations require mechanisms to cope with other factors:

  • it is beneficial to specify what types of resources the bodies and targets represent
  • bodies and targets may be fragments of external resources (though annotation bodies may often just be created as part of an annotation, without separate IRIs)
  • an annotation may involve multiple bodies and/or targets
  • it is often important to have information about the context in which the annotation was created, modified and used
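
To make these factors concrete, here is a small Python sketch that assembles one annotation with a typed embedded body, a second body given by IRI, two fragment targets and basic provenance. `TextualBody`, `FragmentSelector`, `creator` and `created` are WADM terms; the helper names and all example.org IRIs are hypothetical and only serve the illustration.

```python
import json


def textual_body(text, fmt="text/plain"):
    """An embedded body that has no IRI of its own."""
    return {"type": "TextualBody", "value": text, "format": fmt}


def fragment_target(source, fragment):
    """A target that is a fragment of an external resource."""
    return {
        "source": source,
        "selector": {"type": "FragmentSelector", "value": fragment},
    }


def make_annotation(anno_id, bodies, targets, creator, created):
    """Assemble an annotation with several bodies/targets plus provenance."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        "creator": creator,
        "created": created,
        "body": list(bodies),
        "target": list(targets),
    }


# All IRIs below are placeholders, not real resources.
anno = make_annotation(
    "http://example.org/anno2",
    bodies=[textual_body("Named person"), "http://example.org/people#Lancelot"],
    targets=[
        fragment_target("http://example.org/text.xml", "w132"),
        fragment_target("http://example.org/text.xml", "w320"),
    ],
    creator="http://example.org/users/alice",
    created="2018-02-25T12:00:00Z",
)
print(json.dumps(anno, indent=2))
```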

@tuurma
Contributor Author

tuurma commented Feb 25, 2018

An attempt to nest TEI markup inside OA mapped to XML (from the earlyPrint project).
The first annotation deals with regularization (Pauls to Paul's) and the second with a correction of the base text (to Widower). Targets, which are always document fragments, are identified by IdSelectors using @xml:id values from the target document.

<annotation-list>
    <annotation-item id="A77567_04-000340" creator="earlyPrint" visibility="public" generator="import" 
        status="accepted" created="2016-01-01">
        <annotation-body type="TEI" subtype="regularization" format="text/xml">
            <orig>Pauls</orig>
            <reg>Paul's</reg>
        </annotation-body>
        <annotation-target source="A7e0c97f9-60a1-43d7-9b0e-7f4ec7ad39ac" version="1">
            <target-selector type="IdSelector" value="A77567_04-000340"/>
        </annotation-target>
    </annotation-item>
    <annotation-item generator="earlyPrint" id="Ad93fcf5b-7174-4883-9934-06209eaa80ee" status="pending" visibility="public" creator="shcuser" created="2016-07-05T19:38:09.944Z" modified="2016-07-05T19:38:09.944Z" class="style-scope annotation-list">
        <annotation-body subtype="update" type="TEI" format="text/xml" class="style-scope annotation-list">
            <w class="style-scope annotation-list">Widower</w>
        </annotation-body>
        <annotation-target source="A7e0c97f9-60a1-43d7-9b0e-7f4ec7ad39ac" class="style-scope annotation-list">
            <target-selector type="IdSelector" value="A77567_04-000640" class="style-scope annotation-list"/>
        </annotation-target>
    </annotation-item>
</annotation-list>
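
For comparison with the JSON serialization, the following Python sketch maps one `annotation-item` of this kind onto a WADM-shaped dictionary. The mapping is my own assumption, not something earlyPrint defines: the IdSelector is rendered as a WADM FragmentSelector, the TEI payload is kept as an XML string in a TextualBody, and the bare identifiers are carried over unchanged even though WADM expects IRIs. The sample is a trimmed copy of the first item above.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<annotation-list>
  <annotation-item id="A77567_04-000340" creator="earlyPrint" status="accepted" created="2016-01-01">
    <annotation-body type="TEI" subtype="regularization" format="text/xml"><orig>Pauls</orig><reg>Paul's</reg></annotation-body>
    <annotation-target source="A7e0c97f9-60a1-43d7-9b0e-7f4ec7ad39ac" version="1">
      <target-selector type="IdSelector" value="A77567_04-000340"/>
    </annotation-target>
  </annotation-item>
</annotation-list>"""


def item_to_wadm(item):
    """Map one annotation-item element onto a WADM-like dict (assumed mapping)."""
    body = item.find("annotation-body")
    target = item.find("annotation-target")
    selector = target.find("target-selector")
    # Serialize the TEI children of the body back into an XML string.
    body_xml = "".join(ET.tostring(child, encoding="unicode") for child in body)
    return {
        "id": item.get("id"),  # a bare identifier here; WADM would want an IRI
        "type": "Annotation",
        "creator": item.get("creator"),
        "created": item.get("created"),
        "body": {
            "type": "TextualBody",
            "format": body.get("format"),
            "value": body_xml,
        },
        "target": {
            "source": target.get("source"),
            # Assumption: an IdSelector corresponds to a fragment identifier.
            "selector": {"type": "FragmentSelector", "value": selector.get("value")},
        },
    }


annotations = [item_to_wadm(i) for i in ET.fromstring(SAMPLE).iter("annotation-item")]
print(annotations[0]["body"]["value"])
```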

@jamescummings
Member

Shouldn't the @format in these examples be "application/tei+xml" (i.e. the TEI mimetype)?

@bansp
Member

bansp commented Feb 25, 2018

Just a note: the principles worked out in stdfSpec have now been put to use in several projects. They are not at odds with the WADM and are compatible with ISO proposals. I trust that these principles will form the core of whatever is being proposed here.

@tuurma
Contributor Author

tuurma commented Feb 26, 2018

Another example, from a standoff proposal of ~2015 that was never really used in practice. The idea was to use TEI markup enhanced with @stf_target or @stf_from/@stf_to, anchoring it to the base text, to encode different layers (each a valid OHCO).

<standoff>
    <stf xml:id="stf_name">
        <persName stf_target="#w119" ref="#Morgain"/>
        <persName stf_target="#w132" ref="#Lancelot"/>
        <persName stf_target="#w320" ref="#Yvain"/>
        <persName stf_from="#w323" stf_to="#w325" ref="#DukeClarence"/>
    </stf>
    <stf xml:id="stf_hi">
        <hi stf_from="#w1" stf_to="#w10" rend="rubric"/>
    </stf>
</standoff>
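
To show how such layers could be resolved against the base text, here is a minimal Python sketch. It assumes the base text is tokenized into `<w>` elements carrying `xml:id`, which the proposal implies but the example does not show; the six-word base text, the shortened layers and the function name `resolve` are all made up for the illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical tokenized base text.
BASE = """<text>
  <w xml:id="w1">Here</w> <w xml:id="w2">begins</w> <w xml:id="w3">the</w>
  <w xml:id="w4">tale</w> <w xml:id="w5">of</w> <w xml:id="w6">Lancelot</w>
</text>"""

# Hypothetical standoff layers in the style of the ~2015 proposal.
STANDOFF = """<standoff>
  <stf xml:id="stf_name">
    <persName stf_target="#w6" ref="#Lancelot"/>
  </stf>
  <stf xml:id="stf_hi">
    <hi stf_from="#w1" stf_to="#w3" rend="rubric"/>
  </stf>
</standoff>"""


def resolve(base_xml, standoff_xml):
    """Return (layer id, element name, covered text) for every standoff element."""
    words = list(ET.fromstring(base_xml).iter("w"))
    ids = [w.get(XML_ID) for w in words]
    spans = []
    for layer in ET.fromstring(standoff_xml).iter("stf"):
        for ann in layer:
            if "stf_target" in ann.attrib:
                # Single-token anchor.
                start = end = ids.index(ann.get("stf_target").lstrip("#"))
            else:
                # Inclusive range from stf_from to stf_to.
                start = ids.index(ann.get("stf_from").lstrip("#"))
                end = ids.index(ann.get("stf_to").lstrip("#"))
            text = " ".join(w.text for w in words[start:end + 1])
            spans.append((layer.get(XML_ID), ann.tag, text))
    return spans


for span in resolve(BASE, STANDOFF):
    print(span)
```

Because each layer is resolved independently, overlapping spans from different layers pose no problem here, which is the point of keeping each layer a separate hierarchy.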

@laurentromary
Contributor

As @bansp alluded to, we already have quite a few projects implementing the current proposal. You can find a NER scenario under https://github.com/laurentromary/stdfSpec/tree/AnnArbor/Scenarios . In particular, it would be good to keep to the naming agreed at the Ann Arbor council meeting (standOff, listAnnotation, annotationBlock). We should focus on defining a stable content model (see the analysis at the end of the scenario document).

@tuurma
Contributor Author

tuurma commented Feb 26, 2018

Standoff representation of an apparatus variorum in the Digital Mishnah project

                <app xml:id="app.4.1.1.2.5">
                    <rdgGrp n="1">
                        <rdg wit="#P00001">
                            <ptr target="#P00001.4.1.1.2.5"/>
                        </rdg>
                        <rdg wit="#P00002">
                            <ptr target="#P00002.4.1.1.2.5"/>
                        </rdg>
                        <rdg wit="#S01520">
                            <ptr target="#S01520.4.1.1.2.5"/>
                        </rdg>
                        <rdg wit="#S07106">
                            <ptr target="#S07106.4.1.1.2.5"/>
                        </rdg>
                        <rdg wit="#S07204">
                            <ptr target="#S07204.4.1.1.2.4"/>
                        </rdg>
                        <rdg wit="#S07319">
                            <ptr target="#S07319.4.1.1.2.6"/>
                        </rdg>
                        <rdg wit="#S07326">
                            <ptr target="#S07326.4.1.1.2.5"/>
                        </rdg>
                        <rdg wit="#S08174">
                            <ptr target="#S08174.4.1.1.2.5"/>
                        </rdg>
                    </rdgGrp>
                    <rdgGrp n="empty">
                        <rdg wit="#S00483"/>
                    </rdgGrp>
                </app>
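
A short Python sketch of how such an apparatus entry might be read: it collects, for each witness, the reading group it belongs to and the fragment its pointer targets, with `None` for a witness that has no reading. The sample is a trimmed copy of the entry above; the function name `readings` is hypothetical, and resolving the pointers into the witness transcriptions is left out because those files are not shown here.

```python
import xml.etree.ElementTree as ET

APP = """<app xml:id="app.4.1.1.2.5">
  <rdgGrp n="1">
    <rdg wit="#P00001"><ptr target="#P00001.4.1.1.2.5"/></rdg>
    <rdg wit="#S07204"><ptr target="#S07204.4.1.1.2.4"/></rdg>
  </rdgGrp>
  <rdgGrp n="empty">
    <rdg wit="#S00483"/>
  </rdgGrp>
</app>"""


def readings(app_xml):
    """Map each witness siglum to (reading group, target id or None)."""
    result = {}
    for grp in ET.fromstring(app_xml).iter("rdgGrp"):
        for rdg in grp.iter("rdg"):
            ptr = rdg.find("ptr")
            # A witness without a <ptr> has no text at this location.
            target = ptr.get("target").lstrip("#") if ptr is not None else None
            result[rdg.get("wit").lstrip("#")] = (grp.get("n"), target)
    return result


for witness, (group, target) in readings(APP).items():
    print(witness, group, target)
```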

@tuurma
Contributor Author

tuurma commented Feb 26, 2018

@laurentromary @bansp I was charged by the TEI Council yesterday with gathering existing standoff approaches for further discussion. I would appreciate having examples from the projects you mention here, ideally making it clear how they are WADM conformant.

@laurentromary
Contributor

This is what you will find under https://github.com/laurentromary/stdfSpec/tree/AnnArbor/Scenarios
with a group of WADM compatible examples and a couple of other ones.

@ebeshero
Member

ebeshero commented Nov 3, 2018

I was reviewing this issue in the context of #1833 and accidentally managed to close it. Sorry! It is reopened now. I do agree with @sydb that the use case @joeytakeda is providing on that ticket relates to the examples being reviewed here as stand-off annotation.

@tuurma
Contributor Author

tuurma commented May 7, 2019

Status update: we have a workgroup on standoff that involves @laurentromary and Council representatives. Specific discussion is taking place in the standoff proposal issue tracker, https://github.com/laurentromary/stdfSpec/issues. We are currently creating good examples for several specific use cases we have defined, and we intend to elaborate the standoff and listAnnotation content models incrementally.

@chiarcos

chiarcos commented Apr 12, 2020

As this discussion is motivated by Web Annotation, and it seems it will converge on yet another insular TEI solution: what is the rationale for not following the W3C recommendations on embedding Web Annotation in markup languages? See https://www.w3.org/TR/annotation-html/.
Two of the three approaches it describes can readily be applied to TEI: using RDFa (the natural solution for elements in the TEI body), or putting JSON-LD into <script> elements (which corresponds to a TEI approach that places standoff data in <xenoData> in the TEI header). Note that both possibilities have also been discussed for TEI independently of standoff: #1860.
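
A minimal Python sketch of the second option, assuming the JSON-LD is simply stored as the text content of a <xenoData> child of <teiHeader>. The helper names `embed` and `extract` are hypothetical, and the header built here is not a complete, valid TEI header; it only shows the round trip.

```python
import json
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialize TEI elements without a prefix.
ET.register_namespace("", TEI_NS)


def embed(header, annotation):
    """Append a xenoData element holding the annotation as JSON-LD text."""
    xeno = ET.SubElement(header, f"{{{TEI_NS}}}xenoData")
    xeno.text = json.dumps(annotation, indent=2)
    return xeno


def extract(header):
    """Parse every xenoData element in the header back into a dict."""
    return [json.loads(x.text) for x in header.iter(f"{{{TEI_NS}}}xenoData")]


anno = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "http://example.org/anno1",
    "type": "Annotation",
    "body": "http://example.org/post1",
    "target": "http://example.com/page1",
}

header = ET.Element(f"{{{TEI_NS}}}teiHeader")
embed(header, anno)
serialized = ET.tostring(header, encoding="unicode")
print(serialized)

# Round trip: reparse the serialized header and recover the annotation.
recovered = extract(ET.fromstring(serialized))
```

The sketch assumes every xenoData element in the header contains JSON; a real document could hold other foreign vocabularies there as well, so a production version would need to distinguish them.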

@tuurma tuurma closed this as completed May 8, 2021
@martinascholger martinascholger added this to the Guidelines 4.3.0 milestone Aug 12, 2021