Encoding RDF relationships in TEI (TEI+RDFa and alternatives) #1860
Just for completeness, I ask again: what about |
RDF and graphs are closely related, indeed. On a theoretical level, RDF formalizes labelled directed multi-graphs. A technical difference is that RDF is based on URIs and W3C standards whereas graph databases are usually not. |
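To make the "labelled directed multi-graph" reading concrete, here is a minimal sketch in plain Python. The `http://example.org/` URIs and the helper `edges_between` are made up for illustration and do not come from this thread.

```python
# Hypothetical example namespace; nothing here is taken from the thread.
EX = "http://example.org/"

# An RDF graph is a set of (subject, predicate, object) triples.
# Each triple is one directed edge, labelled with its predicate URI.
triples = {
    (EX + "passage1", EX + "cites", EX + "passage2"),
    (EX + "passage1", EX + "paraphrases", EX + "passage2"),  # second edge, same node pair
    (EX + "passage2", EX + "mentions", EX + "term42"),
}


def edges_between(graph, source, target):
    """Return the sorted labels of all directed edges from source to target."""
    return sorted(p for s, p, o in graph if s == source and o == target)


# Two differently labelled edges between the same pair of nodes: a multigraph.
labels = edges_between(triples, EX + "passage1", EX + "passage2")
# Direction matters: there is no edge back from passage2 to passage1.
reverse = edges_between(triples, EX + "passage2", EX + "passage1")
```

Nodes and edge labels are all URIs here, which is the technical difference to most graph databases mentioned above.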
Why do you think tei:graph is not intended to provide a way of encoding a graph data structure? The beginning of chapter 19 would seem to indicate that it is: "Among the types of qualitative relations often represented by graphs are organizational hierarchies, flow charts, genealogies, semantic networks, transition networks, grammatical relations, tournament schedules, seating plans, and directions to people's houses. In developing recommendations for the encoding of graphs of various types, we have relied on their formal mathematical definitions and on the most common conventions for representing them visually. However, it must be emphasized that these recommendations do not provide for the full range of possible graphical representations, and deal only partially with questions of design, layout, and placement." |
In Beta Masaheft we also transform the TEI into RDF triples of different flavours, as SAWS does. However, I now think that perhaps these semantic mappings could be defined in a project ODD rather than in the transformation, with something like |
Wrt <graph>: In fact, my interpretation of "it must be emphasized that these recommendations do not provide for the full range of possible graphical representations, and deal only partially with questions of design, layout, and placement" would indeed be that <graph> deals with graphical representations of graphs, with the "partially" clause referring to the fact that the rendering itself is beyond TEI (as it is beyond the dot language). |
Public responses are at http://tei-l.970651.n3.nabble.com/Current-and-historical-uses-of-lt-graph-gt-td4031618.html. Neither there nor in the private responses has any actual, current use of <graph> been confirmed, only its historical use for drawing network graphs and its potential use for representing graph data structures. If the TEI does endorse the use of <graph> as a data structure (rather than a graphical representation), I would strongly suggest rephrasing its definition accordingly and providing alternative vocabulary for the representation use (e.g., by recommending/enabling the embedding of SVG [or GraphML], following the spirit of the suggestion in https://wiki.tei-c.org/index.php/TEI_to_SVG#Using_SVG_with_TEI). For pragmatic reasons, I would prefer an RDFa-compliant solution (even if possibly sandboxed by restricting it to container elements such as <seg> and <ab>) because it comes with off-the-shelf tooling, whereas anything based on <graph> would have to be rebuilt by every data provider individually (and, as a new XML-based solution, it is highly unlikely to find any support outside the DH community). More important than this (personal) preference, however, is to have clear instructions for expressing RDF triples (or at least RDF properties and objects) in TEI, and to have them in the guidelines; in this respect, I'd be happy with any clear guidance. |
@chiarcos For a very straightforward solution, have you considered just putting RDFa inside a |
On .09.2019, at 10:55, Martin Holmes <notifications@github.com> wrote:
> @chiarcos For a very straightforward solution, have you considered just
> putting RDFa inside a <xenoData> element and pointing to/from the TEI?
> That would leave your RDFa clean, straightforward and easily
> processable, while tightly linking it to the TEI content.
Yes, but <xenoData> is a header element that can be used for RDF
*meta*data (and this is the first example in the guidelines), and I see no
easy way to use it for annotating *content* elements with RDF links.
Problems:
- <xenoData> is for (document) metadata, not for annotation (of specific
parts of the document). The kind of information we would want to express
would be linking with external dictionaries, term bases or ontologies, so
this would not be typical document metadata. Semantically, it would be
closer to <link> than to anything in the header.
- <xenoData> is a header element, so any kind of RDF data would be
detached from the content it refers to. This is technically possible, but
it is effectively standoff, and where standoff can be applied, it's way
safer, more standards-conformant and better supported by existing tools to
work with WebAnnotation to bridge between TEI and RDF. (This is provided
by the Recogito tool, recently also for TEI documents.) So there is a good
solution for standoff use cases, but where standoff isn't a good option
(i.e., if the content you're pointing to is still evolving), we need an
inline solution.
- For the specific case of RDFa, this only provides attributes, so we
would also need additional elements to anchor these attributes to. Thus
<xenoData> should not be used with RDFa, but could be used, e.g., with
XHTML+RDFa. But infusing XHTML into TEI would be strange, because it's a
semantically weaker formalism and there would be adequate data structures
within the TEI that we can attach RDFa attributes to.
- The content of <xenoData> must be validated (e.g., whether local URIs
resolve against elements in the body), and I see no convenient way to parse
and to validate that with off-the-shelf tooling. It's not at all hard to
build something, but this already represents a technical hurdle that a user
who expects something to work right after download might not be willing to
take. This is a problem that most TEI-native ways to encode RDF triples
would have, as well.
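
To illustrate the kind of check meant in the last point, here is a minimal sketch using only the Python standard library. It assumes RDF/XML inside <xenoData> whose `rdf:about` / `rdf:resource` values use local fragment references (`#id`) to point into the body; the sample document, the `ex:` vocabulary and the function name `dangling_local_refs` are hypothetical, and the header is deliberately incomplete (not schema-valid TEI).

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample: "#s1" exists in the body, "#s9" does not.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:ex="http://example.org/ns#">
  <teiHeader>
    <xenoData>
      <rdf:RDF>
        <rdf:Description rdf:about="#s1">
          <ex:refersTo rdf:resource="http://example.org/term/42"/>
        </rdf:Description>
        <rdf:Description rdf:about="#s9">
          <ex:refersTo rdf:resource="http://example.org/term/7"/>
        </rdf:Description>
      </rdf:RDF>
    </xenoData>
  </teiHeader>
  <text><body><ab><seg xml:id="s1">some passage</seg></ab></body></text>
</TEI>"""


def dangling_local_refs(tei_xml):
    """List '#id' references inside <xenoData> that match no xml:id in the document."""
    root = ET.fromstring(tei_xml)
    # Collect every xml:id that occurs anywhere in the document.
    ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    dangling = []
    for xeno in root.iter(f"{{{TEI_NS}}}xenoData"):
        for el in xeno.iter():
            for attr in (f"{{{RDF_NS}}}about", f"{{{RDF_NS}}}resource"):
                value = el.get(attr, "")
                # Only local fragment references are checked; external URIs are ignored.
                if value.startswith("#") and value[1:] not in ids:
                    dangling.append(value)
    return dangling


unresolved = dangling_local_refs(SAMPLE)
```

This is exactly the sort of project-specific glue code that every data provider would have to write and maintain for a header-based solution.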
|
@chiarcos Thanks for the clarification. |
As an afterthought: Where it is not possible/necessary to provide RDF statements in inline XML, the standard solution (i.e., the only solution that is both TEI-compliant and W3C- [or otherwise] standardized) would be to use a standoff annotation with Web Annotation (JSON-LD) over a TEI/XML document. This works nicely as long as the underlying TEI/XML doesn't change anymore (such that URIs, resp. XPaths or offsets -- whatever selector is used for Web Annotation -- still point to the right element), but it is not feasible for content under production. Permitting RDFa in TEI is actually conceptually compatible with the recommendation to use Web Annotation for standoff annotation, as an RDFa serialization of Web Annotation has been developed, too: https://www.w3.org/community/openannotation/wiki/RDFa, resp. https://www.w3.org/TR/annotation-html/#annotations-embedded-as-rdfa |
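The brittleness of standoff over evolving content can be shown with a small sketch. It builds a Web Annotation as a Python dict (JSON-LD keys as in the Web Annotation Data Model, with an `XPathSelector`) and evaluates the selector against two versions of a document. Everything concrete is an assumption for illustration: the URIs are made up, the sample omits the TEI namespace, and the selector is a relative path so that the standard library's limited XPath subset can evaluate it directly.

```python
import json
import xml.etree.ElementTree as ET


def make_annotation(source_uri, xpath, body_uri):
    """Build a minimal Web Annotation that targets one element via an XPath selector."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "body": body_uri,
        "target": {
            "source": source_uri,
            "selector": {"type": "XPathSelector", "value": xpath},
        },
    }


def select_text(xml_text, path):
    """Return the text of the element the selector currently points to, or None."""
    hit = ET.fromstring(xml_text).find(path)
    return None if hit is None else hit.text


# Hypothetical document before and after an editor inserts a new <seg>.
V1 = "<body><ab><seg>alpha</seg><seg>beta</seg></ab></body>"
V2 = "<body><ab><seg>new</seg><seg>alpha</seg><seg>beta</seg></ab></body>"

annotation = make_annotation(
    "http://example.org/edition.xml",  # made-up source document
    "./ab/seg[2]",                     # second <seg> in the <ab>
    "http://example.org/term/42",      # made-up dictionary entry
)
serialized = json.dumps(annotation, indent=2)

path = annotation["target"]["selector"]["value"]
before = select_text(V1, path)  # the annotated element
after = select_text(V2, path)   # a different element after the edit
```

The annotation itself is unchanged between the two runs, yet it silently attaches to a different segment once the text is edited, which is the synchronization problem described above.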
We discussed that issue briefly during our virtual f2f this weekend. If I understand correctly, the current issue is about expressing "RDF triples inline in TEI" where the straightforward solution would be to add RDFa attributes to (nearly?) all TEI elements. While this might not be a proper solution to be incorporated into the TEI standard, would it still be helpful to have that as an example customization at https://tei-c.org/guidelines/customization/ (in analogy to |
> If I understand correctly, the current issue is about expressing "RDF
> triples *inline* in TEI" where the straightforward solution would be to
> add RDFa attributes to (nearly?) all TEI elements.
Let's call that the maximum solution, and it is clearly not the best way
for incorporation into the TEI standard.*
> would it still be helpful to have that as an example customization at
> https://tei-c.org/guidelines/customization/ (in analogy to TEI + SVG or TEI
> + Math)?
Very much so, *if*
- this is presented as a TEI-endorsed approach (i.e., under
"Customizations provided by the TEI Consortium"), *and*
- candidate elements for a native TEI encoding of RDF triples (all
discussed in this thread) are complemented with a link to the TEI+RDFa
customization in the guidelines (something like "Note that this element
should not be used for the encoding of RDF graphs in inline TEI, instead,
see the ..."), *and*
- the examples for using <relation> for encoding RDF triples are
deprecated in the guidelines (and replaced by [or at least, complemented
with] a reference to the TEI+RDFa customization)
I think these conditions are necessary to give TEI users *clear guidance*
and to guarantee interoperability among different projects and between TEI
and LOD communities. As long as TEI users see their graphs as independent
from RDF, they remain free to model them however they like, but if an RDF
interpretation is intended, it should be marked as such.
I would be happy to contribute to the development of such a customization
and its documentation.
A disadvantage of the customization approach is that customizations seem to
be monolithic. As I am less into TEI than into LOD, is it possible to
combine different customizations with each other? In the TEI-Drama
customization, RDFa would be useful for entity linking, in TEI-Corpus, it
could complement standoff markup and feature structures, and in the TEI-MS
customization, it would be useful for intertextual relations, in other
existing customizations, it would be useful for object metadata. For
lexical resources, a novel Dict+RDFa customization that combines TEI Dict
with OntoLex could be useful. In the end we might end up with a very large
number of customizations, basically every customization with and without
RDF(a), respectively.
Thanks a lot,
Christian
* An alternative approach, more likely to be integrated with the TEI
standard, is to sandbox RDFa attributes by introducing them as attributes *for
certain elements* such as <seg> and <ab>. This would roughly follow the
model of the current use of <relation> in SAWS (there as child element of
<seg> and <ab>), except that the @active and @passive attributes are
replaced by their RDFa counterparts, given as direct attributes of <seg>, etc.
But a customization will do, as long as people are likely to find it when
they search *on the TEI pages* for a solution for encoding RDF triples
inline.
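
As a rough sketch of what "replacing @active and @passive by their RDFa counterparts" could look like, the following Python rewrites a SAWS-style <relation> child into RDFa attributes on its parent <seg>. The mapping is an assumption made for illustration, not something fixed in this thread: @active becomes @about, @ref (taken here to carry the property URI) becomes @property, and @passive becomes @resource. The sample markup, the `ex:` prefix and the function name are hypothetical, and the sample omits the TEI namespace to keep it short.

```python
import xml.etree.ElementTree as ET

# Hypothetical SAWS-style input: <relation> as a child of <seg>.
SAWS_STYLE = (
    '<seg xml:id="s1">Know thyself.'
    '<relation active="#s1" ref="ex:isVariantOf" '
    'passive="http://example.org/text2#s7"/></seg>'
)


def relation_to_rdfa(seg_xml):
    """Move the triple of a single <relation> child onto RDFa attributes of its <seg>."""
    seg = ET.fromstring(seg_xml)
    relations = seg.findall("relation")
    if len(relations) != 1:
        # One element can only carry one about/property/resource combination.
        raise ValueError("this sketch handles exactly one <relation> child")
    rel = relations[0]
    seg.set("about", rel.get("active"))      # subject
    seg.set("property", rel.get("ref"))      # predicate (assumed to be in @ref)
    seg.set("resource", rel.get("passive"))  # object
    seg.remove(rel)
    return seg


converted = relation_to_rdfa(SAWS_STYLE)
```

The restriction to one relation per element is a limitation of this sketch; segments that take part in several relations would need nested elements or a different design.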
|
@peterstadler and I discussed the issue in a meeting on July 1. Based on the discussion, Peter started with a first draft for an example customization. |
On .07.2020, at 21:06, Martina Scholger <notifications@github.com> wrote:
> @peterstadler and I discussed the issue in a meeting on July 1. Based
> on the discussion, Peter started with a first draft for an example
> customization.
Great news! Let me know how to help.
|
Just for the record: The current draft of the customisation ODD is added in the branch |
Dear all, any new developments on this? Has anyone tested the new customisation? Would you suggest that the RDFa attributes are a good solution? I would like to do some experimental work with ontologies and RDF-like triples, and I hope that the "clear guidance" mentioned by Christian arrives at some point in time. Thank you for your work, R |
AFAIK, the status so far is that there were two concrete applications of
TEI+RDFa that motivated the customization. Data under
https://github.com/pruizf/disco (includes TEI+RDFa raw data) and
http://www.deaf-page.de/guichaulmTel/edition.html (HTML with RDFa from
TEI+RDFa preserved, read off RDF with
https://www.w3.org/2012/pyRdfa/extract?uri=http%3A%2F%2Fwww.deaf-page.de%2FguichaulmTel%2Fedition.html,
use the latter link to explore the graph, e.g. using the FROM keyword of
the web service at http://www.sparql.org/sparql.html). Links for
descriptions can be found in this thread. However, both precede the
customization.
Following our 2018 experiments, I applied for a 3-year project on 16th c.
Lithuanian postils where the customization is foreseen to be used
wide-scale for linking between edition and dictionaries, as well as for
intertextual links between the Old Lithuanian texts and their German or
biblical sources. This was approved in Dec 2020, but due to administrative
delays at my university, it has not started yet. Otherwise, this would have
been the demonstrator you're asking for. Anyway, even though delayed, it
will follow the agenda we laid out for it, including a broad-scale
application (and validation) of the TEI+RDFa customization.
I know that the colleagues at the Heidelberg Academy of Sciences were very
much interested in continuing the work on
http://www.deaf-page.de/guichaulmTel/edition.html for other Romance data,
but I don't think that a concrete follow-up project has yet manifested
itself. You might want to reach out to Sabine Tittel (contact details in
the TEI+RDFa paper) for confirmation.
I'm not exactly unbiased, but I guess it is fair to say that TEI+RDFa will
work as a representation formalism (to the same extent as most "TEI-native"
alternatives, except that the latter have ambiguous semantics). How it
performs in established TEI workflows (rather than in newly created ones)
will depend on specifics of the project. And if you plan to either embed RDF(a)
into your generated markup or extract it from your TEI+RDFa source data,
there is technology at hand to do so, so for a new project where text
edition and RDF annotation evolve simultaneously, it would be my first
choice for this very reason.
If in your scenario, text edition is completed before RDF annotation
begins, Web Annotation+TEI (as used in Recogito) would be more established,
but it's standoff and therefore both a bit brittle (in terms of data
consistency) and technically challenging (you need to set up and
synchronize an XML and a JSON-LD workflow -- unless you're happy with what
Recogito can already do).
If you're looking for a place to discuss that, please feel free to reach
out to https://www.w3.org/community/ld4lt/, where we are in the process of
harmonizing linguistic annotations in an RDF-compliant way (
https://github.com/ld4lt/linguistic-annotation). The focus of that group is
not TEI, but (annotation with) RDF, but TEI (TEI with JSON-LD markup or
inline RDF annotations) is a key aspect in the discussion. Except for
adhering to independently established standards for RDF data for which
there is independent tooling available (this does rule out the earlier TEI
practices), there's no clear recommendation coming out of that, yet,
because there are multiple candidate vocabularies that need to be
harmonized (TEI among them) and harmonization is a long-term effort that
will take some time to arrive at any consolidated model. I expect that this
will be an extension of Web Annotation, and support RDF inline annotations
in accordance with https://www.w3.org/TR/annotation-html/ [this is not a
standard, but just a working note], i.e., as a special case of TEI+RDFa,
but this is an educated guess, only.
|
Revisited at Guelph 2023 F2F. Peter Stadler has rotated off Council. @HelenaSabel (whose work is mentioned in this ticket) will review the draft ODD and get things moving again. |
The motivation is to achieve a representation of RDF relations in the TEI which is unambiguous in vocabulary and semantics. Note that this does not pertain to cases where native TEI vocabulary elements could be interpreted as triples, but to cases that are not covered by TEI semantics, e.g., the linking between a passage in an edition and a terminology repository or a CTS URN. A similar restriction can be found in the definition of <link>.
At the moment, there are at least three different possibilities to express RDF triples inline in TEI:
<relation> (#311)
<fs>
<link>
Each of these is problematic because it conflates pre-RDF and RDF semantics and is analogy-driven ("tag abuse") rather than explicitly defined. The currently preferred solution with <relation> is restricted to named entities; example 4 in the guidelines thus breaks the TEI schema (see my comment on #311).
Several alternatives are possible (see email thread in http://tei-l.970651.n3.nabble.com/Best-practice-for-W3C-Web-Annotations-generated-based-on-TEI-names-and-dates-module-tags-td4031445.html). One possibility, RDFa, has great appeal due to being an established W3C standard that comes with off-the-shelf tooling (e.g., https://www.w3.org/2012/pyRdfa/ and http://www.sparql.org/sparql.html which can directly run against TEI documents or derived XML formats that maintain [rather than generate] RDFa information).
In the past, RDFa has been ruled out, partially because of fears it would evolve and this would have a negative impact on the TEI (http://tei-l.970651.n3.nabble.com/TEI-and-RDFa-was-Re-SAWS-and-LOD-was-Re-Cross-references-among-segs-in-TEI-td4025195.html). Since its W3C standardization (2015, https://www.w3.org/TR/rdfa-core/), this risk no longer exists.
In 2018, two successful applications of TEI+RDFa in two independent projects were reported (http://lrec-conf.org/workshops/lrec2018/W23/pdf/10_W23.pdf, http://e-spacio.uned.es/fez/eserv/bibliuned:363-Pruiz3/Ruiz_Fabo_Pablo_DISCO.pdf), thus motivating project-independent specifications, ideally as part of the TEI. I suggest following the modeling of https://github.com/postdataproject/disco/#rdfa-attributes.
Note1: This is a follow-up to #311, but a different approach.
Note2: One possible alternative is to redefine <link>, <fs> or (not and) <relation> to provide unambiguous RDF semantics and to couple this with GRDDL/XSLT scripts to generate RDFa attributes (cf. http://www.ancientwisdoms.ac.uk/media/ontology/tei_to_rdf.xsl).
Note3: A third possibility is to sandbox RDFa attributes by restricting them to <ab> and <seg> (i.e., same contexts as for <relation> in the SAWS proposal: http://www.ancientwisdoms.ac.uk/media/documents/Markup_Guidelines_for_Gnomologia.html#TEI.relation)
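
A minimal sketch of how such sandboxed markup could be read, assuming that @about, @property and @resource are given explicitly as full URIs on <ab> and <seg> only. This is not a conformant RDFa processor (no prefixes or CURIEs, no subject inheritance, no literals); an off-the-shelf tool such as pyRdfa would be the real choice. The sample document and all URIs in it are invented for illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Only these elements may carry RDFa attributes in the sandboxed variant.
SANDBOX = {f"{{{TEI_NS}}}seg", f"{{{TEI_NS}}}ab"}

# Hypothetical sample; the <p> carries RDFa-like attributes outside the sandbox.
SAMPLE = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <ab about="http://example.org/edition#ab1"
      property="http://example.org/ns#quotes"
      resource="http://example.org/source#line1">
    <seg about="http://example.org/edition#s1"
         property="http://example.org/ns#denotes"
         resource="http://example.org/term/42">wrath</seg>
  </ab>
  <p property="http://example.org/ns#ignored"
     resource="http://example.org/x">outside the sandbox</p>
</body>"""


def sandboxed_triples(tei_xml):
    """Collect (subject, predicate, object) from <ab>/<seg> with explicit RDFa attributes."""
    root = ET.fromstring(tei_xml)
    triples = []
    for el in root.iter():  # document order
        if el.tag not in SANDBOX:
            continue  # attributes on any other element are ignored
        s, p, o = el.get("about"), el.get("property"), el.get("resource")
        if s and p and o:
            triples.append((s, p, o))
    return triples


found = sandboxed_triples(SAMPLE)
```

Because the attributes are ordinary RDFa, the same markup could be handed to existing RDFa tooling unchanged, and the sandbox only limits where the attributes may appear.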