feat(standoff)!: return XML alongside HTML for textValue with custom standoff mapping and default XSL transformation (DEV-201) (#1991)

* testing: add stubs for StandoffModels

* test: add test data for standoff custom mapping

* expand standoff ontology

* feat: return XML even with custom mapping

* test: start working on a proper E2E test for standoff with custom mapping

* test: add TODO for TEI related task

* test: add more TODOs

* test: add some more stubs for StandoffModels

* Update standoffModelsUtil.scala

* remove unused StandoffModelsUtil file

* test: revert unnecessary base ontology changes

* test: fix sample xml file

* test: add E2E test for standoff with custom mapping

* test: adjust FileModels

* test: clean up Standoff E2E test

* test: add tests for StandoffModels

* refactor: minor cleaning up

* test: add mock sipi to standoff e2e test

* tests: start with standoff E2E tests

* Update StandoffModels.scala

* tests: check if sipi is available in E2E test

* tests: enable SIPI in E2E tests

* test: clean up

* test: move SIPI utils to E2ESpec

* test: rename E2E test from R2R to E2E

* test: reasonably test creating a standoff mapping in a unit test

* refactor: tidy up

* test: add e2e test for standard mapping

* refactor: tidy up unit tests

* refactor: remove potentially unused files

* testdata: add gitignore

* test: add standoff example to freetest test data

* refactor: clean up after merging Bazel-to-SBT PR

* docs: start documenting standoff

* docs: update documentation

* docs: update documentation

* docs: update documentation

* refactor: final tidy up

* refactor: changes according to review

* docs: add scaladoc

* refactor: minor cleanup according to review

* refactor: rename variable to be more clear what it actually is

* docs: update documentation to make creating text values with custom standoff more clear

* docs: fix typo

* refactor: move sipi messages from test into the sipi messages package
BalduinLandolt committed Mar 7, 2022
1 parent eac0049 commit 2548b8f2cc75e8350091aefd70f62d66d8605428
Showing with 1,494 additions and 2,160 deletions.
  1. +16 −21 docs/01-introduction/standoff-rdf.md
  2. +70 −158 docs/02-knora-ontologies/knora-base.md
  3. +23 −2 docs/03-apis/api-v2/editing-values.md
  4. +44 −4 docs/03-apis/api-v2/reading-and-searching-resources.md
  5. +4 −1 sipi/images/0001/.gitignore
  6. +7 −0 sipi/images/0001/Cpl5d73kOLz-FclZg2VVf6r.info
  7. +26 −0 sipi/images/0001/Cpl5d73kOLz-FclZg2VVf6r.xsl
  8. +2 −0 sipi/images/0801/.gitignore
  9. +0 −68 sipi/images/0801/24PaAn6Qs6y-BM61fJZSUqv.txt
  10. +0 −250 sipi/images/0801/3oFWiltb5K8-CkJAyd1xKzq.xsl
  11. +0 −250 sipi/images/0801/GYcgKQrbSUo-CpXlnPubcWs.xsl
  12. +0 −471 sipi/images/0801/HOX3ZO0bmkn-DtOIUZBpvrP.xsl
  13. +0 −471 sipi/images/0801/IhH6F5GD9K7-D1hIZjjmcUq.xsl
  14. +0 −68 sipi/images/0801/J6cQCyIFSZ4-EOdiCxSsgR5.txt
  15. +2 −0 sipi/images/originals/0001/.gitignore
  16. +258 −99 test_data/all_data/freetest-data.ttl
  17. +155 −124 test_data/ontologies/freetest-onto.ttl
  18. +48 −0 test_data/test_route/texts/freetestCustomMapping.xml
  19. +25 −0 test_data/test_route/texts/freetestCustomMappingTransformation.xsl
  20. +48 −0 test_data/test_route/texts/freetestCustomMappingWithTransformation.xml
  21. +4 −0 test_data/test_route/texts/freetestXMLTextValue.xml
  22. +1 −1 test_data/test_route/texts/mappingForHTML.xml
  23. +1 −1 test_data/test_route/texts/mappingForLetter.xml
  24. +1 −1 test_data/test_route/texts/mappingForLetterWithXSLTransformation.xml
  25. +1 −1 test_data/test_route/texts/mappingForStandardHTML.xml
  26. +31 −0 webapi/src/main/scala/org/knora/webapi/messages/store/sipimessages/SipiMessages.scala
  27. +12 −8 webapi/src/main/scala/org/knora/webapi/messages/v2/responder/valuemessages/ValueMessagesV2.scala
  28. +2 −2 webapi/src/test/scala/org/knora/webapi/E2ESpec.scala
  29. +2 −31 webapi/src/test/scala/org/knora/webapi/ITKnoraLiveSpec.scala
  30. +355 −0 webapi/src/test/scala/org/knora/webapi/e2e/v2/StandoffRouteV2E2ESpec.scala
  31. +0 −123 webapi/src/test/scala/org/knora/webapi/e2e/v2/StandoffRouteV2R2RSpec.scala
  32. +2 −0 webapi/src/test/scala/org/knora/webapi/it/v1/DrawingsGodsV1ITSpec.scala
  33. +2 −0 webapi/src/test/scala/org/knora/webapi/it/v1/KnoraSipiIntegrationV1ITSpec.scala
  34. +4 −1 webapi/src/test/scala/org/knora/webapi/it/v2/KnoraSipiIntegrationV2ITSpec.scala
  35. +11 −4 webapi/src/test/scala/org/knora/webapi/models/filemodels/FileModels.scala
  36. +116 −0 webapi/src/test/scala/org/knora/webapi/models/standoffmodels/StandoffModels.scala
  37. +99 −0 webapi/src/test/scala/org/knora/webapi/models/standoffmodels/StandoffModelsSpec.scala
  38. +122 −0 webapi/src/test/scala/org/knora/webapi/responders/v2/StandoffResponderV2Spec.scala
@@ -5,15 +5,14 @@

# Standoff/RDF Text Markup

[Standoff markup](https://lexiconse.uantwerpen.be/index.php/lexicon/markup-standoff/)
is text markup that is stored separately from the content it describes. Knora's
[Standoff markup](https://lexiconse.uantwerpen.be/lexicon/markupStandoff.html)
is text markup that is stored separately from the content it describes. DSP-API's
Standoff/RDF markup stores content as a simple Unicode string, and represents markup
separately as RDF data. This approach has some advantages over commonly used markup systems
such as XML:

First, XML and other hierarchical markup systems assume that a document is a hierarchy, and
have difficulty representing
[non-hierarchical structures](http://www.tei-c.org/release/doc/tei-p5-doc/en/html/NH.html)
have difficulty representing [non-hierarchical structures](http://www.tei-c.org/release/doc/tei-p5-doc/en/html/NH.html)
or multiple overlapping hierarchies. Standoff markup can easily represent these structures.
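
To illustrate why overlap is unproblematic in standoff markup, here is a minimal sketch in Python. It is not DSP-API's data model: the `StandoffTag` class, the helper functions, and the sample sentence are hypothetical, and the only idea taken from the text is that a tag refers to a substring of a plain Unicode string by position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandoffTag:
    """Hypothetical tag: a name plus a half-open character range [start, end)."""
    name: str
    start: int
    end: int


def covered_text(text: str, tag: StandoffTag) -> str:
    """Return the substring of `text` that `tag` points to."""
    return text[tag.start:tag.end]


def overlaps(a: StandoffTag, b: StandoffTag) -> bool:
    """True if the two ranges intersect but neither contains the other."""
    intersect = a.start < b.end and b.start < a.end
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return intersect and not (a_in_b or b_in_a)


# The content is stored once, as a plain string without inline markup.
text = "The quick brown fox jumps"

# Two tags whose ranges cross each other. In XML this would require
# improperly nested elements; here they are just two independent records.
bold = StandoffTag("bold", 4, 15)      # covers "quick brown"
italic = StandoffTag("italic", 10, 19)  # covers "brown fox"

print(covered_text(text, bold))
print(covered_text(text, italic))
print(overlaps(bold, italic))
```

Because each tag is an independent record, adding a crossing range never forces the other tags to be restructured.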

Second, markup languages are typically designed to be used in text files. But there is no
@@ -22,43 +21,39 @@ markup. It is possible to do this in a non-standard way by using an XML database
such as [eXist](http://exist-db.org), but this still does not allow for queries that include
text as well as non-textual data not stored in XML.

By storing markup as RDF, Knora can search for markup structures in the same way that it
By storing markup as RDF, DSP-API can search for markup structures in the same way as it
searches for any RDF data structure. This makes it possible to do searches that combine
text-related criteria with other sorts of criteria. For example, if persons and events are
represented as Knora resources, and texts are represented in Standoff/RDF, a text can contain
represented as resources, and texts are represented in Standoff/RDF, a text can contain
tags representing links to persons or events. You could then search for a text that mentions a
person who lived in the same city as another person who is the author of a text that mentions an
event that occurred during a certain time period.

In Knora's Standoff/RDF, a tag is an RDF entity that is linked to a
In DSP-API's Standoff/RDF, a tag is an RDF entity that is linked to a
[text value](../02-knora-ontologies/knora-base.md#textvalue). Each tag points to a substring
of the text, and has semantic properties of its own. You can define your own tag classes
in your ontology by making subclasses of `knora-base:StandoffTag`, and attach your own
properties to them. You can then search for those properties using Knora's search language,
properties to them. You can then search for those properties using DSP-API's search language,
[Gravsearch](../03-apis/api-v2/query-language.md).

The built-in [knora-base](../02-knora-ontologies/knora-base.md) and `standoff` ontologies
provide some basic tags that can be reused or extended. These include tags that represent
Knora data types. For example, `knora-base:StandoffDateTag` represents a date in exactly the
same way as a Knora [date value](../02-knora-ontologies/knora-base.md#datevalue), i.e. as a
DSP-API data types. For example, `knora-base:StandoffDateTag` represents a date in exactly the
same way as a [date value](../02-knora-ontologies/knora-base.md#datevalue), i.e. as a
calendar-independent astronomical date. You can use this tag as-is, or extend it by making
a subclass, to represent dates in texts. Gravsearch includes built-in functionality for
searching for these data type tags. For example, you can search for text containing a date that
falls within a certain [date range](../03-apis/api-v2/query-language.md#matching-standoff-dates).

Knora's APIs support automatic conversion between XML and Standoff/RDF. To make this work,
DSP-API supports automatic conversion between XML and Standoff/RDF. To make this work,
Standoff/RDF stores the order of tags and their hierarchical relationships. You must define an
[XML-to-Standoff Mapping](../03-apis/api-v2/xml-to-standoff-mapping.md) for your standoff tag classes and properties.
Then you can import an XML document into Knora, which will store it as Standoff/RDF. The text and markup
can then be searched using Gravsearch. When you retrieve the document, Knora converts it back to the
Then you can import an XML document into DSP-API, which will store it as Standoff/RDF. The text and markup
can then be searched using Gravsearch. When you retrieve the document, DSP-API converts it back to the
original XML.
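
The conversion idea can be sketched as follows. This is an illustrative toy, assuming a well-formed XML string and ignoring attributes and mappings entirely; it is not how DSP-API implements the conversion. The `Tag` class and the `xml_to_standoff` function are hypothetical names. The sketch separates an XML document into plain text plus tags, and records each tag's document-order index and parent, which is the kind of information needed to rebuild the original nesting later.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Tag:
    """Hypothetical standoff tag produced from one XML element."""
    name: str
    start: int             # offset of the first covered character
    end: int               # offset just past the last covered character
    index: int             # position of the tag in document order
    parent: Optional[int]  # index of the enclosing tag, None for the root


def xml_to_standoff(xml: str) -> Tuple[str, List[Tag]]:
    """Split an XML string into its text content and a list of tags."""
    root = ET.fromstring(xml)
    chunks: List[str] = []
    tags: List[Tag] = []
    pos = 0

    def visit(elem: ET.Element, parent: Optional[int]) -> None:
        nonlocal pos
        tag = Tag(elem.tag, pos, pos, len(tags), parent)
        tags.append(tag)
        if elem.text:
            chunks.append(elem.text)
            pos += len(elem.text)
        for child in elem:
            visit(child, tag.index)
            # Text after a child's closing tag belongs to the current element.
            if child.tail:
                chunks.append(child.tail)
                pos += len(child.tail)
        tag.end = pos

    visit(root, None)
    return "".join(chunks), tags


text, tags = xml_to_standoff("<text>Hello <b>bold <i>world</i></b>!</text>")
print(text)
for t in tags:
    print(t)
```

Keeping `index` and `parent` alongside the offsets is what lets two tags with identical ranges still be re-nested in the right order.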

To represent overlapping or non-hierarchical markup in exported and imported XML, Knora supports
[CLIX](http://conferences.idealliance.org/extreme/html/2004/DeRose01/EML2004DeRose01.html#t6) tags.
To represent overlapping or non-hierarchical markup in exported and imported XML, DSP-API supports
[CLIX](https://web.archive.org/web/20171222112655/http://conferences.idealliance.org/extreme/html/2004/DeRose01/EML2004DeRose01.html) tags.

Future plans for Standoff/RDF include:

- Creation and retrieval of standoff markup as such via the DSP-API,
without using XML as an input/output format.
- A user interface for editing standoff markup.
- The ability to create resources that cite particular standoff tags in other resources.
Because XML-to-Standoff conversion has proved complicated and performs poorly, the use of standoff with custom mappings is discouraged.
Improved integration of text with XML markup, particularly TEI-XML, is planned.
