Encoding RDF relationships in TEI (TEI+RDFa and alternatives) #1860

chiarcos · 2019-03-03T14:31:42Z

The motivation is to achieve a representation of RDF relations in TEI that is unambiguous in vocabulary and semantics. Note that this does not pertain to cases where native TEI vocabulary elements could be interpreted as triples, but to cases that are not covered by TEI semantics, e.g., the linking between a passage in an edition and a terminology repository or a CTS URN. A similar restriction can be found in the definition of <link>.

At the moment, there are at least three different possibilities to express RDF triples inline in TEI:
<relation> (#311)
<fs>
<link>

Each of these is problematic: they conflate pre-RDF and RDF semantics, and they are analogy-driven ("tag abuse") rather than explicitly defined. The currently preferred solution, <relation>, is restricted to named entities, so example 4 in the Guidelines breaks the TEI schema (see my comment on #311).

Several alternatives are possible (see the email thread at http://tei-l.970651.n3.nabble.com/Best-practice-for-W3C-Web-Annotations-generated-based-on-TEI-names-and-dates-module-tags-td4031445.html). One of them, RDFa, is appealing because it is an established W3C standard that comes with off-the-shelf tooling (e.g., https://www.w3.org/2012/pyRdfa/ and http://www.sparql.org/sparql.html, which can run directly against TEI documents or against derived XML formats that preserve [rather than generate] RDFa information).

In the past, RDFa was ruled out, partly out of fear that it would keep evolving and that this would have a negative impact on the TEI (http://tei-l.970651.n3.nabble.com/TEI-and-RDFa-was-Re-SAWS-and-LOD-was-Re-Cross-references-among-segs-in-TEI-td4025195.html). Since its W3C standardization (2015, https://www.w3.org/TR/rdfa-core/), this risk no longer exists.

In 2018, successful applications of TEI+RDFa were reported by two independent projects (http://lrec-conf.org/workshops/lrec2018/W23/pdf/10_W23.pdf, http://e-spacio.uned.es/fez/eserv/bibliuned:363-Pruiz3/Ruiz_Fabo_Pablo_DISCO.pdf), which motivates a project-independent specification, ideally as part of the TEI. I suggest following the modeling of https://github.com/postdataproject/disco/#rdfa-attributes.
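For readers less familiar with RDFa, here is a minimal sketch of how triples can be read off RDFa attributes placed directly on TEI elements, in the spirit of the attribute-based modeling proposed here. It assumes a deliberately tiny subset of RDFa 1.1 (only unnamespaced @about, @property, @resource and @content, all with absolute IRIs) and uses hypothetical example.org IRIs. It is not a conforming RDFa processor; real data should go through a tool such as pyRdfa.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical TEI+RDFa fragment; all example.org IRIs are placeholders.
SAMPLE = f"""<TEI xmlns="{TEI_NS}">
  <text><body>
    <p>
      <seg xml:id="s1" about="http://example.org/edition#s1"
           property="http://www.w3.org/2000/01/rdf-schema#seeAlso"
           resource="http://example.org/lexicon/entry42">gloss</seg>
      <seg about="http://example.org/edition#s2"
           property="http://www.w3.org/2000/01/rdf-schema#label"
           content="second passage">text</seg>
    </p>
  </body></text>
</TEI>"""


def extract_triples(xml_text):
    """Return (subject, predicate, object) tuples from a tiny RDFa subset.

    Simplifications: only elements carrying both @about and @property are
    considered; the object is @resource if present, else @content, else the
    element's text. IRIs and literals are not distinguished, and CURIEs,
    subject inheritance, @typeof and @rel are ignored.
    """
    root = ET.fromstring(xml_text)
    triples = []
    for el in root.iter():  # document order
        subject = el.get("about")
        predicate = el.get("property")
        if subject is None or predicate is None:
            continue
        obj = el.get("resource") or el.get("content") or (el.text or "")
        triples.append((subject, predicate, obj))
    return triples


for t in extract_triples(SAMPLE):
    print(t)
```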

Note1: This is a follow-up to #311, but a different approach.

Note2: One possible alternative is to redefine <link>, <fs> or (but not and) <relation> to provide unambiguous RDF semantics, and to couple this with GRDDL/XSLT scripts that generate RDFa attributes (cf. http://www.ancientwisdoms.ac.uk/media/ontology/tei_to_rdf.xsl).
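To illustrate what "unambiguous RDF semantics" for <relation> would require, the sketch below performs such a conversion in Python rather than XSLT. It assumes a hypothetical project-level table mapping relation/@name values to property IRIs, treats @ref (when present) as the property IRI, and resolves local #id pointers against a placeholder base IRI. It is an illustration of the mapping problem, not the SAWS stylesheet.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical mapping from relation/@name values to property IRIs.
# Making this explicit is the point: @name alone carries no RDF semantics.
NAME_TO_PROPERTY = {
    "translationOf": "http://example.org/vocab#translationOf",
    "quotes": "http://example.org/vocab#quotes",
}

SAMPLE = """<listRelation xmlns="http://www.tei-c.org/ns/1.0">
  <relation name="quotes" active="#passage1" passive="http://example.org/bible#Mt5.3"/>
  <relation name="similar" active="#p2" passive="#p3"/>
  <relation ref="http://example.org/vocab#alludesTo" active="#p4" passive="#p5 #p6"/>
</listRelation>"""


def relations_to_triples(xml_text, base="http://example.org/edition"):
    """Map <relation> elements to (s, p, o) triples.

    @ref, if present, is taken as the property IRI; otherwise @name is looked
    up in NAME_TO_PROPERTY. Unmapped @name values are reported rather than
    guessed. @active/@passive may hold several whitespace-separated pointers;
    relations using @mutual instead yield no triples in this sketch.
    """
    def resolve(ptr):
        # Local pointers (#id) are resolved against the hypothetical base IRI.
        return base + ptr if ptr.startswith("#") else ptr

    triples, unmapped = [], []
    for rel in ET.fromstring(xml_text).iter(TEI + "relation"):
        prop = rel.get("ref") or NAME_TO_PROPERTY.get(rel.get("name", ""))
        if prop is None:
            unmapped.append(rel.get("name"))
            continue
        for s in rel.get("active", "").split():
            for o in rel.get("passive", "").split():
                triples.append((resolve(s), prop, resolve(o)))
    return triples, unmapped


triples, unmapped = relations_to_triples(SAMPLE)
for t in triples:
    print(t)
print("unmapped @name values:", unmapped)
```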

Note3: A third possibility is to sandbox RDFa attributes by restricting them to <ab> and <seg> (i.e., the same contexts as for <relation> in the SAWS proposal: http://www.ancientwisdoms.ac.uk/media/documents/Markup_Guidelines_for_Gnomologia.html#TEI.relation).
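A sandbox of this kind could be checked mechanically. The sketch below lists every RDFa 1.1 attribute found on an element other than <seg> or <ab>. It assumes the RDFa attributes are unnamespaced and that no TEI attribute of the same name occurs in the document; the allowed-host set is hypothetical and would be fixed by the actual customization.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Attribute names defined by RDFa 1.1 Core.
RDFA_ATTRIBUTES = {
    "about", "content", "datatype", "href", "inlist", "prefix",
    "property", "rel", "resource", "rev", "src", "typeof", "vocab",
}

# Hypothetical sandbox: RDFa is permitted only on these TEI elements.
ALLOWED_HOSTS = {f"{{{TEI_NS}}}seg", f"{{{TEI_NS}}}ab"}


def sandbox_violations(xml_text):
    """Return (element local name, attribute) pairs that break the sandbox."""
    violations = []
    for el in ET.fromstring(xml_text).iter():
        if el.tag in ALLOWED_HOSTS:
            continue
        for attr in el.attrib:
            if attr in RDFA_ATTRIBUTES:
                violations.append((el.tag.split("}")[-1], attr))
    return violations


doc = ('<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>'
       '<ab about="#a1" property="http://example.org/vocab#p">ok</ab>'
       '<p property="http://example.org/vocab#q">not ok</p>'
       '</body></text></TEI>')
print(sandbox_violations(doc))  # [('p', 'property')]
```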

lb42 · 2019-03-08T18:25:58Z

Just for completeness, I'll ask again: what about <graph>? (Especially since, as I understand it, graph-theoretic ontologies are replacing RDF in some ecosystems.)

chiarcos · 2019-03-08T20:46:13Z

RDF and graphs are indeed closely related. On a theoretical level, RDF formalizes labelled directed multigraphs. A technical difference is that RDF is based on URIs and W3C standards, whereas graph databases usually are not.
But <graph> in TEI is not meant to provide graphs as a data structure, only visualizations of such data structures. At least, this is what the examples under https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-graph.html and https://www.tei-c.org/release/doc/tei-p5-doc/en/html/GD.html suggest. It is more like Graphviz/DOT than like RDF, and of course both could be used to draw RDF graphs as illustrations.
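The point that RDF formalizes labelled directed multigraphs can be illustrated in a few lines: each triple is an edge from subject to object, labelled with the predicate. Because an RDF graph is a set, the same (s, p, o) edge occurs at most once; "multi" refers to several differently labelled edges between the same two nodes. The ex: names below are placeholders.

```python
from collections import defaultdict

# Hypothetical triples: two differently labelled edges connect the same pair
# of nodes, which a simple (non-multi) graph could not represent directly.
TRIPLES = {
    ("ex:passage1", "ex:quotes", "ex:source1"),
    ("ex:passage1", "ex:translationOf", "ex:source1"),
    ("ex:source1", "ex:partOf", "ex:bible"),
}


def as_multigraph(triples):
    """Index triples as a labelled directed multigraph: (s, o) -> {labels}."""
    edges = defaultdict(set)
    for s, p, o in triples:
        edges[(s, o)].add(p)
    return dict(edges)


graph = as_multigraph(TRIPLES)
print(sorted(graph[("ex:passage1", "ex:source1")]))  # ['ex:quotes', 'ex:translationOf']
```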

lb42 · 2019-03-08T20:50:28Z

Why do you think tei:graph is not intended to provide a way of encoding a graph data structure? The beginning of chapter 19 would seem to indicate that it is: "Among the types of qualitative relations often represented by graphs are organizational hierarchies, flow charts, genealogies, semantic networks, transition networks, grammatical relations, tournament schedules, seating plans, and directions to people's houses. In developing recommendations for the encoding of graphs of various types, we have relied on their formal mathematical definitions and on the most common conventions for representing them visually. However, it must be emphasized that these recommendations do not provide for the full range of possible graphical representations, and deal only partially with questions of design, layout, and placement."

PietroLiuzzo · 2019-03-08T20:56:12Z

In Beta Masaheft we also transform the TEI into RDF triples of different flavours, as SAWS does. However, I now think that these semantic mappings could perhaps be defined in a project ODD rather than in the transformation, with something like models and @behaviour, and that an XSLT (or any other script) transforming TEI into RDF should be able to rely on that information in the ODD.
Because in most cases people will make different decisions about which classes and properties to use in their RDF, even where their TEI is the same, it would be even nicer if the TEI modules already contained suggested associations for each element, customisable in the same ways as all other parts of TEI: including them or not, adding or changing them. I could then define the precise semantics in my ODD, opt for seg, relation, link, etc., and have this clearly defined in my custom ODD and in relation to the standard set in the original modules.

chiarcos · 2019-03-08T20:59:01Z

Regarding <graph>: my interpretation of "it must be emphasized that these recommendations do not provide for the full range of possible graphical representations, and deal only partially with questions of design, layout, and placement" is indeed that <graph> deals with graphical representations of graphs, with the "partially" clause referring to the fact that the rendering itself is beyond the scope of TEI (as it is beyond the DOT language).
We should probably elicit feedback on actual uses of <graph>, but it should definitely not be used for both purposes, because their functions differ: a conceptual graph is normally not meant to be rendered, whereas a graph visualization has to be.

chiarcos · 2019-09-11T16:22:08Z

> We should probably elicit feedback on actual uses of <graph>, but it should definitely not be used for both purposes, because of their different functions: A conceptual graph is normally not to be rendered whereas graph visualizations have to.

Public responses can be found at http://tei-l.970651.n3.nabble.com/Current-and-historical-uses-of-lt-graph-gt-td4031618.html. Neither there nor in the private responses has any actual, current use of <graph> been confirmed; respondents mentioned only its historical use for drawing network graphs and its potential use for representing graph data structures. If the TEI does endorse the use of <graph> as a data structure (rather than a graphical representation), I would strongly suggest rephrasing its definition accordingly and providing alternative vocabulary for the representation use (e.g., by recommending/enabling the embedding of SVG [or GraphML], in the spirit of the suggestion in https://wiki.tei-c.org/index.php/TEI_to_SVG#Using_SVG_with_TEI).

For pragmatic reasons, I would prefer an RDFa-compliant solution (even if it is sandboxed by restricting it to container elements such as <seg> and <ab>), because it comes with off-the-shelf tooling, whereas anything based on <graph> would have to be rebuilt by every data provider individually (and, as a new XML-based solution, it would be unlikely to find any support outside the DH community). More important than this (personal) preference, however, is to have clear instructions for expressing RDF triples (or at least RDF properties and objects) in TEI, and to have them in the Guidelines. In this respect, I'd be happy with any clear guidance.

martindholmes · 2019-09-16T08:54:55Z

@chiarcos For a very straightforward solution, have you considered just putting RDFa inside a <xenoData> element and pointing to/from the TEI? That would leave your RDFa clean, straightforward and easily processable, while tightly linking it to the TEI content.

chiarcos · 2019-09-16T10:35:24Z

On .09.2019, at 10:55, Martin Holmes <notifications@github.com> wrote:

> @chiarcos For a very straightforward solution, have you considered just putting RDFa inside a <xenoData> element and pointing to/from the TEI? That would leave your RDFa clean, straightforward and easily processable, while tightly linking it to the TEI content.

Yes, but <xenoData> is a header element that can be used for RDF *meta*data (this is the first example in the Guidelines), and I see no easy way to use it for annotating *content* elements with RDF links. Problems:

- <xenoData> is for (document) metadata, not for annotating specific parts of the document. The kind of information we want to express is linking with external dictionaries, term bases or ontologies, which is not typical document metadata. Semantically, it would be closer to <link> than to anything in the header.
- <xenoData> is a header element, so any RDF data in it would be detached from the content it refers to. This is technically possible, but it is effectively standoff, and where standoff can be applied, it is much safer, more standards-conformant, and better supported by existing tools to use Web Annotation to bridge between TEI and RDF. (This is provided by the Recogito tool, recently also for TEI documents.) So there is a good solution for standoff use cases, but where standoff isn't a good option (i.e., if the content you're pointing to is still evolving), we need an inline solution.
- RDFa itself only provides attributes, so we would also need additional elements to anchor these attributes to. Thus <xenoData> should not be used with RDFa alone, but it could be used, e.g., with XHTML+RDFa. Infusing XHTML into TEI would be strange, though, because XHTML is a semantically weaker formalism, and the TEI already has adequate data structures to which RDFa attributes could be attached.
- The content of <xenoData> must be validated (e.g., whether local URIs resolve against elements in the body), and I see no convenient way to parse and validate that with off-the-shelf tooling. It's not at all hard to build something, but this already represents a technical hurdle that a user who expects something to work right after download might not be willing to take.

This is a problem that most TEI-native ways of encoding RDF triples would have as well.
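To illustrate the validation point raised above (whether local URIs resolve against elements in the body): such a check is easy to write, but generic RDF tooling does not perform it for you. This minimal sketch assumes that only bare fragment identifiers (#s1) in @about, @resource, @href and @src count as local references, and that each must match an @xml:id somewhere in the same document; CURIEs, safe CURIEs and absolute IRIs are ignored.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes @xml:id under the XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# RDFa attributes whose (single) value may be an IRI pointing into the document.
IRI_ATTRIBUTES = ("about", "resource", "href", "src")


def dangling_local_refs(xml_text):
    """Return (attribute, value) pairs whose '#id' matches no @xml:id."""
    root = ET.fromstring(xml_text)
    ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    dangling = []
    for el in root.iter():
        for attr in IRI_ATTRIBUTES:
            value = el.get(attr)
            if value and value.startswith("#") and value[1:] not in ids:
                dangling.append((attr, value))
    return dangling


doc = ('<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader/><text><body>'
       '<seg xml:id="s1">a</seg>'
       '<seg about="#s1" property="http://example.org/vocab#p" resource="#s9">b</seg>'
       '</body></text></TEI>')
print(dangling_local_refs(doc))  # [('resource', '#s9')]
```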

martindholmes · 2019-09-16T10:41:16Z

@chiarcos Thanks for the clarification.

chiarcos · 2020-01-18T12:43:25Z

As an afterthought: where it is not possible or necessary to provide RDF statements in inline XML, the standard solution (i.e., the only solution that is both TEI-compliant and W3C- [or otherwise] standardized) would be standoff annotation with Web Annotation (JSON-LD) over a TEI/XML document. This works nicely as long as the underlying TEI/XML no longer changes (so that the URIs, XPaths or offsets, whichever selector is used for Web Annotation, still point to the right element), but it is not feasible for content that is still in production.
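For comparison, a standoff Web Annotation of the kind described above might be built as follows. This is a sketch with placeholder example.org IRIs and a hypothetical XPath; it makes the fragility concrete, since the annotation is only valid as long as the XPath still identifies the same element after any later edit of the TEI file.

```python
import json


def make_annotation(anno_id, tei_url, xpath, body_iri):
    """Build a Web Annotation (JSON-LD) linking a TEI element to an external resource.

    The target is addressed with an XPathSelector, so the annotation silently
    points to the wrong place if the TEI document is later restructured.
    """
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        "motivation": "linking",
        "body": body_iri,
        "target": {
            "source": tei_url,
            "selector": {"type": "XPathSelector", "value": xpath},
        },
    }


# All identifiers below are hypothetical placeholders.
anno = make_annotation(
    "http://example.org/anno/1",
    "http://example.org/edition.xml",
    "/TEI/text/body/div[1]/p[3]",
    "http://example.org/lexicon/entry42",
)
print(json.dumps(anno, indent=2))
```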

Permitting RDFa in TEI is actually conceptually compatible with the recommendation to use Web Annotation for standoff annotation, since an RDFa serialization of Web Annotation has also been developed: https://www.w3.org/community/openannotation/wiki/RDFa and https://www.w3.org/TR/annotation-html/#annotations-embedded-as-rdfa

peterstadler · 2020-05-04T13:38:59Z

We discussed that issue briefly during our virtual f2f this weekend. If I understand correctly, the current issue is about expressing "RDF triples inline in TEI" where the straightforward solution would be to add RDFa attributes to (nearly?) all TEI elements. While this might not be a proper solution to be incorporated into the TEI standard, would it still be helpful to have that as an example customization at https://tei-c.org/guidelines/customization/ (in analogy to TEI + SVG or TEI + Math)?

chiarcos · 2020-05-05T11:29:15Z

> If I understand correctly, the current issue is about expressing "RDF triples *inline* in TEI" where the straightforward solution would be to add RDFa attributes to (nearly?) all TEI elements.

Let's call that the maximal solution; it is clearly not the best candidate for incorporation into the TEI standard.*

> would it still be helpful to have that as an example customization at https://tei-c.org/guidelines/customization/ (in analogy to TEI + SVG or TEI + Math)?

Very much so, *if*

- this is presented as a TEI-endorsed approach (i.e., under "Customizations provided by the TEI Consortium"), *and*
- the candidate elements for a native TEI encoding of RDF triples (all discussed in this thread) are complemented in the Guidelines with a link to the TEI+RDFa customization (something like "Note that this element should not be used for the encoding of RDF graphs in inline TEI, instead, see the ..."), *and*
- the examples that use <relation> for encoding RDF triples are deprecated in the Guidelines (and replaced by [or at least complemented with] a reference to the TEI+RDFa customization).

I think these conditions are necessary to give TEI users *clear guidance* and to guarantee interoperability among different projects and between the TEI and LOD communities. As long as TEI users see their graphs as independent from RDF, they remain free to model them however they like, but if an RDF interpretation is intended, it should be marked as such. I would be happy to contribute to the development of such a customization and its documentation.

A disadvantage of the customization approach is that customizations seem to be monolithic. As I am less into TEI than into LOD: is it possible to combine different customizations with each other? In the TEI-Drama customization, RDFa would be useful for entity linking; in TEI-Corpus, it could complement standoff markup and feature structures; in the TEI-MS customization, it would be useful for intertextual relations; and in other existing customizations, it would be useful for object metadata. For lexical resources, a novel Dict+RDFa customization that combines TEI Dict with OntoLex could be useful. In the end, we might end up with a very large number of customizations: basically, every customization in one variant with and one without RDF(a).
Thanks a lot,
Christian

* An alternative approach, more likely to be integrated into the TEI standard, is to sandbox RDFa attributes by introducing them *for certain elements* only, such as <seg> and <ab>. This would roughly follow the model of the current use of <relation> in SAWS (there as a child element of <seg> and <ab>), except that the @active and @passive attributes are replaced by their RDFa counterparts, as direct attributes of <seg>, etc. But a customization will do, as long as people are likely to find it when they search *on the TEI pages* for a solution for encoding RDF triples inline.

martinascholger · 2020-07-03T19:05:54Z

@peterstadler and I discussed the issue in a meeting on July 1. Based on that discussion, Peter has started a first draft of an example customization.

chiarcos · 2020-07-07T08:41:08Z

On .07.2020, at 21:06, Martina Scholger <notifications@github.com> wrote:

> @peterstadler and I discussed the issue in a meeting on July, 1. Based on the discussion, Peter started with a first draft for an example customization.

Great news! Let me know how to help.

peterstadler · 2020-07-31T10:31:17Z

Just for the record: the current draft of the customisation ODD has been added to the branch issue-1860 in commit 151136c.
It simply adds all RDFa attributes to a new class att.global.analytic.rdfa and hooks this class into att.global.analytic.
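Since the draft class adds all RDFa attributes, including @prefix, a processor has to expand CURIEs such as dc:subject using prefix declarations that may sit on an ancestor element. The sketch below shows that inheritance under simplifying assumptions: no RDFa initial context, no @vocab, no safe CURIEs, and a hypothetical ex: namespace. Whether @prefix may appear on <TEI> itself, as in the sample, depends on how the customization is configured.

```python
import xml.etree.ElementTree as ET


def parse_prefix_attr(value):
    """Parse an RDFa @prefix value such as 'dc: http://purl.org/dc/terms/'."""
    tokens = value.split()
    mapping = {}
    for name, iri in zip(tokens[0::2], tokens[1::2]):
        if name.endswith(":"):
            mapping[name[:-1].lower()] = iri  # RDFa prefixes are case-insensitive
    return mapping


def expand_properties(xml_text):
    """Return (element local name, expanded @property IRIs) pairs,
    honouring @prefix declarations inherited from ancestor elements."""
    results = []

    def walk(el, prefixes):
        if el.get("prefix"):
            # Inner declarations extend or override the inherited ones.
            prefixes = {**prefixes, **parse_prefix_attr(el.get("prefix"))}
        if el.get("property"):
            expanded = []
            for term in el.get("property").split():
                pfx, sep, ref = term.partition(":")
                if sep and pfx.lower() in prefixes:
                    expanded.append(prefixes[pfx.lower()] + ref)
                else:
                    expanded.append(term)  # absolute IRI or unknown prefix: left as-is
            results.append((el.tag.split("}")[-1], expanded))
        for child in el:
            walk(child, prefixes)

    walk(ET.fromstring(xml_text), {})
    return results


SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"
     prefix="dc: http://purl.org/dc/terms/ ex: http://example.org/vocab#">
  <text><body>
    <seg about="#s1" property="dc:subject ex:pos">x</seg>
  </body></text>
</TEI>"""

print(expand_properties(SAMPLE))
```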

RobertoRDT · 2021-07-02T15:36:15Z

Dear all, are there any new developments on this? Has anyone tested the new customisation? Would you say that the RDFa attributes are a good solution? I would like to do some experimental work with ontologies and RDF-like triples, and I hope that the "clear guidance" mentioned by Christian will arrive at some point.

Thank you for your work,

R

chiarcos · 2021-07-05T09:47:06Z

AFAIK, the status so far is that there were two concrete applications of TEI+RDFa that motivated the customization: data under https://github.com/pruizf/disco (which includes TEI+RDFa raw data) and http://www.deaf-page.de/guichaulmTel/edition.html (HTML that preserves the RDFa from the TEI+RDFa source; the RDF can be read off with https://www.w3.org/2012/pyRdfa/extract?uri=http%3A%2F%2Fwww.deaf-page.de%2FguichaulmTel%2Fedition.html, and the latter link can be used to explore the graph, e.g., via the FROM keyword of the web service at http://www.sparql.org/sparql.html). Links to descriptions can be found in this thread. However, both precede the customization.

Following our 2018 experiments, I applied for a 3-year project on 16th-century Lithuanian postils, in which the customization is foreseen to be used at scale for linking between the edition and dictionaries, as well as for intertextual links between the Old Lithuanian texts and their German or biblical sources. The project was approved in Dec 2020, but due to administrative delays at my university it has not started yet; otherwise, it would have been the demonstrator you're asking for. Even though delayed, it will follow the agenda we laid out for it, including a broad-scale application (and validation) of the TEI+RDFa customization.

I know that the colleagues at the Heidelberg Academy of Sciences were very much interested in continuing the work on http://www.deaf-page.de/guichaulmTel/edition.html for other Romance data, but I don't think that a concrete follow-up project has materialized yet. You might want to reach out to Sabine Tittel (contact details in the TEI+RDFa paper) for confirmation.

I'm not exactly unbiased, but I think it is fair to say that TEI+RDFa works as a representation formalism (to the same extent as most "TEI-native" alternatives, except that the latter have ambiguous semantics). How it performs in established TEI workflows (rather than in newly created ones) will depend on the specifics of the project.
And if you plan to either embed RDF(a) in your generated markup or extract it from your TEI+RDFa source data, the technology to do so is at hand, so for a new project in which the text edition and the RDF annotation evolve simultaneously, it would be my first choice for this very reason. If, in your scenario, the text edition is completed before RDF annotation begins, Web Annotation+TEI (as used in Recogito) would be more established, but it is standoff and therefore both somewhat brittle (in terms of data consistency) and technically challenging (you need to set up and synchronize an XML and a JSON-LD workflow, unless you're happy with what Recogito can already do).

If you're looking for a place to discuss this, please feel free to reach out to https://www.w3.org/community/ld4lt/, where we are in the process of harmonizing linguistic annotations in an RDF-compliant way (https://github.com/ld4lt/linguistic-annotation). The focus of that group is not TEI but (annotation with) RDF; still, TEI (TEI with JSON-LD markup or inline RDF annotations) is a key aspect of the discussion. Apart from adhering to independently established standards for RDF data for which independent tooling is available (which does rule out the earlier TEI practices), no clear recommendation has come out of that yet, because there are multiple candidate vocabularies that need to be harmonized (TEI among them), and harmonization is a long-term effort that will take time to arrive at a consolidated model. I expect that this will be an extension of Web Annotation that supports RDF inline annotations in accordance with https://www.w3.org/TR/annotation-html/ [this is not a standard, just a working note], i.e., as a special case of TEI+RDFa, but this is only an educated guess.

JanelleJenstad · 2023-05-08T16:38:02Z

Revisited at Guelph 2023 F2F. Peter Stadler has rotated off Council. @HelenaSabel (whose work is mentioned in this ticket) will review the draft ODD and get things moving again.

chiarcos mentioned this issue Mar 4, 2019

Encoding RDF relationships in TEI #311

Closed

ebeshero assigned martinascholger Mar 15, 2019

martinascholger added the Status: Needs Discussion label May 5, 2019

chiarcos mentioned this issue Apr 12, 2020

Standoff: annotation microstructure #1745

Closed

peterstadler self-assigned this May 3, 2020

peterstadler removed their assignment Sep 12, 2022

sydb assigned HelenaSabel May 8, 2023

HelenaSabel mentioned this issue May 8, 2023

first draft for a TEI with RDFa exemplar #2431

Draft

ebeshero added the Status: Go label Jul 17, 2023
