# Distributed Text Services (DTS) in DraCor

by Ingo Börner (University of Potsdam, Germany)

This notebook is an adapted executable version of the chapter *Implementing the Standard API Specification DTS in DraCor for Enhanced Data Access* of the [CLS INFRA](https://clsinfra.io) Deliverable D7.4. [Report on the implementation of Programmable Corpora](https://doi.org/10.5281/zenodo.15301341). For the more comprehensive version see the report, pp. 56–94.

## Introduction

The implementation of the “Distributed Text Services” (DTS) Specification ([Cayless et al. 2024](https://doi.org/10.4000/jtei.4352)) within the DraCor platform represents an important development in enhancing the generic capabilities of DraCor and the concept of “Programmable Corpora”, making it more suitable for a broader range of applications in Computational Literary Studies. 

While on the one side the DraCor API serves as the backbone of the platform, enabling researchers to interact with dramatic texts in programmatic ways, it is on the other side crucial to recognize that the DraCor API is a custom solution tailored specifically to the needs and structure of DraCor. In the end, it is not a standardized API that can be universally applied across different projects or initiatives. This custom API, while beneficial for DraCor's specific use cases, limits its reusability for other purposes without significant adaptation.

As the ecosystem of “Programmable Corpora” continues to expand, both in general and within the specific context of DraCor, there is a growing need for more generic and widely applicable APIs. These APIs should ideally cover the document-driven functions that are currently handled by the DraCor API, but in a manner that allows for broader adoption and integration across diverse projects, and possibly, in generic client software. For a discussion of other standard APIs see the report D7.4, pp. 56-57.

## Setting up the notebook

The following code cells import some necessary Python packages, especially `requests` which is used throughout the notebook to fetch data from the DraCor API. Uncomment the cells if the packages and install them using `pip install {package}` if you encounter problems executing code cells.

In [1]:
# Uncomment to install requests
#!pip install requests

In [2]:
import requests, json, logging

In [3]:
#!pip install treelib

In [4]:
# Quick drawing of trees used in an example
from treelib import Tree

In [5]:
# Need to install the package lxml to have etree class available
#!pip install lxml

In [6]:
# Needed for parsing XML
from lxml import etree

In [7]:
# Log info to notebook
logging.basicConfig(level=logging.INFO)
#logging.basicConfig(level=logging.DEBUG)

At the time of writing the version of the DraCor API ([version 1.1](https://github.com/dracor-org/dracor-api/releases/tag/untagged-ef9a152d71f78ea9043a)) that includes the DTS endpoints has not been deployed on the production server, but is available on the staging server.

You can set the server to use by changing the value of the variable `api_base` in the cell below:

In [8]:
# Base URL of the API used in this notebook
# For production change to https://dracor.org/api/v1/

api_base = "https://staging.dracor.org/api/v1/"

In [9]:
# We can not use PyDraCor at the moment because the methods to retrieve data via th DTS endpoint
# have not been implemented in PyDraCor yet and it is up do discussion, if integrating them makes much sense. 
# We use a generic function with the Python package 'requests' to connect to the API

def get(corpusname:str = None, 
        playname:str = None, 
        apibase:str = "https://staging.dracor.org/api/v1",
        method:str = None,
        parse_json:bool = True):
    """
    Generic Method to retrieve data from the DraCor API
    """
    
    # Remove tailing slash in apibase if not set, otherwhise concatinating url parameters would not work as expected
    
    if apibase is not None:
        if apibase.endswith("/"):
            apibase = apibase[:-1]
            

    # Both parameters corpusname an playname are supplied
    if corpusname is not None and playname is not None :
        # used for /api/corpora/{corpusname}/plays/{playname}/
        if method is not None:
            request_url = f"{apibase}/corpora/{corpusname}/plays/{playname}/{method}"
        else:
            request_url = f"{apibase}/corpora/{corpusname}/plays/{playname}"

    # no playname set, use the .../corpora/{method} routes 
    elif corpusname is not None and playname is None:
        if method is not None:
            request_url = f"{apibase}/corpora/{corpusname}/{method}"
        else:
            request_url = f"{apibase}/corpora/{corpusname}"
    
    # only a method is set
    elif method is not None and corpusname is None and playname is None:
            request_url = f"{apibase}/{method}"
    else: 
        #nothing is set, return information on the API
        request_url = f"{apibase}/info"

    logging.info(f"Sending request to: {request_url}")
    
    #send the response
    r = requests.get(request_url)
    if r.status_code == 200:
        # successful request, decide if response need to be parsed
        if parse_json is True:
            json_data = json.loads(r.text)
            return json_data
        else:
            return r.text
    else:
        raise Exception(f"Request was not successful. Server returned status code: {str(r.status_code)}")

In [10]:
# We only need the generic function later when comparing DTS to the default DraCor API
# Get Information on the API (and test if everything works as expected)
get()

INFO:root:Sending request to: https://staging.dracor.org/api/v1/info


{'openapi': 'https://staging.dracor.org/api/v1/openapi.yaml',
 'existdb': '6.4.0',
 'version': '1.1.0-rc.1',
 'name': 'DraCor API v1',
 'base': 'https://staging.dracor.org/api/v1',
 'status': 'beta'}

## The DraCor DTS Implementation

We implemented the API endpoints defined in the DTS Specification to the codebase of the DraCor API eXist-DB application in the module [`dts.xqm`](https://github.com/dracor-org/dracor-api/blob/f2a9c451b2080f9d2bcbd54a221a38da9c086c3d/modules/dts.xqm). At the time of writing the final stable version 1 of DTS has not been released. We implemented the Specification in a version that was tagged “unstable”, the version that followed “1-alpha” (see [Snapshot of the DTS Spec in the Internet Archive](https://web.archive.org/web/20250317100350/https://distributed-text-services.github.io/specifications/versions/unstable/)).

The DraCor API now provides the four additional endpoints specified by DTS (see also schema below, adapted from Almas et al. 2023).

* `api/v1/dts`: DTS Entrypoint
* `api/v1/dts/collection`: Collection endpoint
* `api/v1/dts/navigation`: Navigation endpoint
* `api/v1/dts/document`: Document endpoint

![Schema representing the resource model underlying each DTS model, the type of data it exposes, as well as the required format of expression (Almas et al. 2023, adapted)](DTS-Endpoint-Overview.png)

In the following we discuss these endpoints in more detail. Based on the explanation of the quite simple to understand “Entrypoint” endpoint we offer some information on the general principles and technical solutions used in DTS (e.g. *URI Templates* and *JSON-LD* as return format to support Linked Data). The other, more complex endpoints “Collection”, “Navigation”, and “Document” are explained further below by providing practical examples of how they can be used. 

A note on naming conventions: In the DTS Specification properties start with a lower case letter, objects (classes) with a capital letter. Unfortunately, there are properties and classes that differ only in the first letter, e.g. `citeStructure` – a property - and `CiteStructure` – an object; `citationTrees` – the property, – `citationTree` – the object. Additionally, the use of plural and singular forms for properties is not always strictly consistent.

## The “Entrypoint” endpoint and some general DTS design principles

### DraCor DTS Entrypoint: `api/v1/dts`

This is the main entry point for accessing the DTS API within DraCor. It provides an overview of the other available DTS endpoints  and can serve as a starting point for navigating the API's capabilities. Users can retrieve metadata about the DTS implementation, such as the version, and, in theory, this endpoint should allow a DTS client to self configure by evaluating the provided links to the other endpoints.

A basic call to the DTS Entrypoint using Python works as is shown in the followin cell:

In [11]:
# Retrieve information on the DTS implementation of DraCor by calling 
# the DTS Entrypoint, e.g https://dracor.org/api/v1/dts
# The variable request_url attaches the path of the dts entrypoint 
# to the base url of the API

request_url = api_base + "dts"

# Use the Python library 'requests' to send a HTTP request
r = requests.get(request_url)
if r.status_code == 200:
    # the request to the endpoint was successful, parse the returned JSON object
    response_data = json.loads(r.text)

# Output the response data in the notebook
response_data

{'@context': 'https://distributed-text-services.github.io/specifications/context/1-alpha1.json',
 '@type': 'EntryPoint',
 'document': 'https://staging.dracor.org/api/v1/dts/document{?resource,ref,start,end,mediaType}',
 'navigation': 'https://staging.dracor.org/api/v1/dts/navigation{?resource,ref,start,end,down,tree}',
 '@id': 'https://staging.dracor.org/api/v1/dts',
 'collection': 'https://staging.dracor.org/api/v1/dts/collection{?id,nav}',
 'dtsVersion': 'unstable'}

The property `dtsVersion` specifies the version of the specification hat is implemented, the values of the properties `collection`, `navigation` and `document`contain so-called *URI Templates* of the Collection, the Navigation and the Document endpoints. 

### URI Templates

The DTS Specification uses URI Templates as defined in [RFC 6570](https://www.rfc-editor.org/rfc/rfc6570): "A URI Template is a compact sequence of characters for describing a range of Uniform Resource Identifiers through variable expansion." These patterns allow a client to construct URIs dynamically by expanding variables marked in curly brackets `{}` (see also https://distributed-text-services.github.io/specifications/versions/unstable/#about-uri-templates). 

For example, the response of the DraCor DTS Entrypoint contains the URI Template of the Collection endpoint

`https://staging.dracor.org/api/v1/dts/collection{?id,nav}`. 

From this we can deduce that there a two supported query parameters `id` and `nav`. When values for these two variables are provided, a client should expand the URI Template containing a question mark `?` followed by a sequence of parameter names separated by comma `,` into a valid URI by substituting the first parameter with the sequence `?` followed by the variable name, equals sign `=` and the the value of the first variable. For every other variable name in the comma separated list contained in the curly brackets, a client should substitute the variable name with an ampersant `&` followed by the variable name, an equals sign `=` and the value of the given variable.

In the case of above included URI Template and the supplied values `ger000001` as the first variable, the identifier the identifier `id` of a single resource, and `parents` as the value of the second parameter `nav`, a client should construct the following URI:

[`https://staging.dracor.org/api/v1/dts/collection?id=ger000001&nav=parents`](https://staging.dracor.org/api/v1/dts/collection?id=ger000001&nav=parents)


### JSON-LD

As with the other endpoints that return metadata, the response of the Entrypoint is in the [“JSON-LD”](https://www.w3.org/TR/json-ld) format. Using JSON-LD as a return format in DTS offers several significant benefits, particularly in terms of data interoperability and usability. One of the primary advantages is the embedded documentation it provides. The context within a JSON-LD document allows for the definition of terms, enabling users and systems to easily look up and understand the meaning of each term. This self-describing nature enhances the clarity and usability of the data, making it more accessible to developers and researchers alike.

In the “JavaScript Object Notation for Linked Data”, the key-value pairs within the returned JSON object are called properties. Each key represents a property name, and the corresponding value is the property value. The first three properties included in the response (see above) are defined by the JSON-LD specification:

* `context`: defines the context for interpreting the JSON-LD data. The value is a URI pointing to a document that maps the terms used in the data to their respective URIs (Uniform Resource Identifiers). For most of the properties the developers of the DTS specification provide these mappings under the URL `https://distributed-text-services.github.io/specifications/context/1-alpha1.json`, but in the case of the “Entrypoint” the properties `dtsVersion`, `collection`, `document` and `navigation` have not been defined. Currently, the process of releasing a new version of the specification is ongoing which will include an overhauled version of the JSON-LD context file including references to classes and properties in an *DTS Ontology* (see [issue on GitHub](https://github.com/distributed-text-services/specifications/issues/271) and the corresponding merge [commit](https://github.com/distributed-text-services/specifications/commit/460ccfd4c35002676f60b893f505513d73badb1a))
* `@id`: This default JSON-LD property provides a unique identifier for the resource being described.
* `@type`: This standard property specifies the type of the resource returned. In the case of the response of the Entrypoint endpoint the described resource is of the class `EntryPoint` as defined by the URI `https://w3id.org/dts/api#EntryPoint`. Because JSON-LD is a serialization of RDF the function of this property is the same as in Turtle stating that a thing is an instance of a class: `ex:Thing rdf:type ex:Class.`

The following code cells demonstrate how dereferencing concepts of the DTS Specification currently does not work as one would expect:

In [12]:
# test how the dereferencing of DTS on GitHub works, look at the HTTP headers
# do not allow a redirect by setting allow_redirects=False 

request_url = "https://w3id.org/dts/api#EntryPoint"
r = requests.get(request_url, allow_redirects=False)
status_code = r.status_code
headers = r.headers

In [13]:
# Output the HTTP Status Code
status_code

302

In [14]:
# Look at the HTTP Headers

headers

{'Date': 'Tue, 01 Jul 2025 12:26:39 GMT', 'Server': 'Apache/2.4.29 (Ubuntu)', 'Access-Control-Allow-Origin': '*', 'Location': 'https://github.com/distributed-text-services/specifications/', 'Content-Length': '319', 'Keep-Alive': 'timeout=2, max=100', 'Connection': 'Keep-Alive', 'Content-Type': 'text/html; charset=iso-8859-1'}

In [15]:
# Output the Location as the target of the redirect

headers["Location"]

'https://github.com/distributed-text-services/specifications/'

In [16]:
# explicitly ask for RDF in any serialization (XML, Turtle, Notation 3)
# Probably GitHub (Pages) do not handle such requests

request_headers = {"Accept" : "application/rdf+xml, application/x-turtle, text/turtle, text/n3, text/rdf+n3"}
r = requests.get(request_url, headers=request_headers)

In [17]:
# The returned format is still text/html

r.headers['Content-Type']

'text/html; charset=utf-8'

When dreferencing the URI of the `EntryPoint` class a HTTP Status Code 302 "Found" is returned. This instructs the browser to follow the link supplied in the `Location` field of the header and thus redirects the client to another URL. In the case of `https://w3id.org/dts/api#EntryPoint` the client is redirected to https://github.com/distributed-text-services/specifications/ and always returns a HTML document, even if explicitly requesting RDF (in any serialization) via the HTTP Accept header in the request. This shows that the current implementation of how the DTS Context is served does not support "real" linked data applications that would ask for Linked data in a RDF format and would digest the returned RDF accordingly.

In [18]:
# Just test with DraCor's /id endpoint if this code would work at all

request_url = "https://staging.dracor.org/id/ger000001"
r = requests.get(request_url, allow_redirects=False, headers=request_headers)
status_code = r.status_code
headers = r.headers

In [19]:
# Output the HTTP Status Code

status_code

303

In [20]:
r.headers["Location"]

'https://staging.dracor.org/api/v1/corpora/ger/plays/goethe-iphigenie-auf-tauris/rdf'

In [21]:
request_headers = {"Accept" : "text/html"}
r = requests.get(request_url, allow_redirects=False, headers=request_headers)
status_code = r.status_code
r.headers["Location"]

'https://staging.dracor.org/ger/goethe-iphigenie-auf-tauris'

In contrast, DraCor, which implements [Cool URIs for the semantic web](https://www.w3.org/2001/sw/sweo/public/2007/cooluris/doc-20071008.html#r303uri), issues a HTTP Status Code or 303 "See other" and provides the URL of the RDF if this format is specified in the accept header of the request.

Still, using JSON-LD as a return format in DTS offers several significant benefits, particularly in terms of data interoperability and usability. One of the primary advantages is the embedded documentation it provides. The context within a JSON-LD document allows for the definition of terms, enabling users and systems to easily look up and understand the meaning of each term. This self-describing nature enhances the clarity and usability of the data, making it more accessible to developers and researchers alike.

See Chapter "DTS and Linked Data" in *D7.4 Report on the Implementation of Programmable Corpora* ([Börner and Trilcke (eds.) 2025](https://doi.org/10.5281/zenodo.15301341)) for a detailed analysis of the current state of the DTS Specification (version "unstable") and Linked Data.

## The DTS endpoints “Collection”, “Navigation”, and “Document”

Having discussed more general considerations and theoretical foundations of the DTS Specification, particularly in relation to Linked Data and its implications, we now shift our focus to a more practical perspective: In this part, we will delve into the other DTS endpoints (Collection, Navigation and Document), providing practical examples and demonstrations of how these endpoints can be utilized within DraCor. By exploring real-world use cases and specific functionalities, we aim to illustrate the tangible benefits and capabilities that the DTS implementation brings to the DraCor platform.

### DraCor DTS Collection Endpoint `api/v1/dts/collection`

This endpoint allows users to access information about the “collections” available. Users can retrieve a list of available corpora, including metadata such as title, corpus maintainer, and corpus language, similar to the regular DraCor API endpoints `api/v1/corpora` and `api/v1/corpora/{corpusname}`. This is useful for discovering and selecting specific collections for further exploration or analysis. 

For example, to list all available corpora, the following request URL can be used:

[`https://staging.dracor.org/api/v1/dts/collection`](https://staging.dracor.org/api/v1/dts/collection)

However, through the Collection endpoint, users can not only access lists of available drama corpora but also retrieve detailed metadata for individual resources, i.e. plays. This functionality is similar to the regular DraCor endpoint `api/v1/corpora/{corpusname}/plays/{playname}`.

To control this behaviour of the endpoint, the specification defines three query parameters for the Collection endpoint `id`, `nav` and `page`, whereas the DraCor implementation does not provide the paging functionality and therefore the parameter `page` is not available.

* **Parameter `id`**: The query parameter `id` is used to identify a collection or a single resource, i.e. in the case of DraCor either a corpus or a single play. The value of the parameter should be a URI, which, in the case of DraCor can either be the corpus name (Feature C1: [`corpus_name`](https://dracor.org/doc/odd#corpus_name)), e.g. `ger` of the *German Drama Corpus*, or a http(s) URI, e.g. `https://dracor.org/id/ger`.
We encourage using full HTTP-URIs when possible, because in most Linked Data applications this type of unique identifiers are commonly used. In the case of DraCor http(s) URIs will properly dereference in accordance with the concept of [Cool URIs for the Semantic Web](https://www.w3.org/TR/cooluris). For a user this means, that navigating to the given URI with a web browser will show the HTML page with information on the play, e.g. [https://dracor.org/id/ger000088](https://dracor.org/id/ger000088) will be redirected to [https://dracor.org/ger/lessing-emilia-galotti](https://dracor.org/ger/lessing-emilia-galotti).
If a Linked Data aware client requests RDF of the server for the URI using the HTTP Accept Header including the XML serialization of RDF `application/rdf+xml` the server will send a 303 “See other” HTTP Status code and provide the URL of the RDF serialization in the `Location` field of the response header. In the case of the play Emilia Galotti this is [https://dracor.org/api/v1/corpora/ger/plays/lessing-emilia-galotti/rdf](https://dracor.org/api/v1/corpora/ger/plays/lessing-emilia-galotti/rdf).

* **Parameter `nav`**: This query parameter can be used to navigate the collection hierarchy. It can take on a single value only – `parents`. When supplied the Collection endpoint returns not the included sub-collections or resources in the `member` array, but the parent collection. For example, when requesting information on the *German Shakespeare Drama Corpus* (GershDraCor) and providing the `nav`parameter, the cAPI includes the metadata on the root collection “DraCor Corpora”: [https://staging.dracor.org/api/v1/dts/collection?id=https://staging.dracor.org/id/gersh&nav=parents](https://staging.dracor.org/api/v1/dts/collection?id=https://staging.dracor.org/id/gersh&nav=parents)

The following code cell demonstrates how the data can be requested:

In [22]:
# If you change the api base to production, make sure to change the URI of the 
# corpus provided as the value of the id parameter as well

request_url = "https://staging.dracor.org/api/v1/dts/collection?id=https://staging.dracor.org/id/gersh&nav=parents"
r = requests.get(request_url)
r.json()

{'@context': 'https://distributed-text-services.github.io/specifications/context/1-alpha1.json',
 '@type': 'Collection',
 'description': "Edited by [Frank Fischer](https://lehkost.github.io/). This corpus contains all 37 of Shakespeare's plays in their German translations published by Schlegel and Tieck, in the edition of Aufbau-Verlag Berlin/Weimar (3rd edition 1975), which is based on the last edition published during Schlegel's lifetime (3rd edition 1843/44). The digitised print edition was procured from [Zeno.org](http://www.zeno.org/nid/20005683920) (via TextGrid Repository), which also provided an additional play (»Die beiden edlen Vettern«) that Shakespeare is now considered to have co-authored. For a full description please see the [README on GitHub](https://github.com/dracor-org/gershdracor).",
 'title': 'German Shakespeare Drama Corpus',
 'dtsVersion': 'unstable',
 'totalParents': 1,
 'totalChildren': 38,
 '@id': 'https://staging.dracor.org/id/gersh',
 'member': [{'@type': 'C

The JSON-LD response of the API is – in this case a `Collection` object (see value of property `@type`). 

Without the parameter `nav` supplied, and the parameter `id` having as its value the identifier of a existing corpus, e.g. `https://staging.dracor.org/id/tat` for the *Tatar Drama Corpus*, the API returns a `Collection` object with information on this corpus. 

Metadata on the plays of this corpus are included as `Resource` objects in the `member` array. The following request URL can be used to fetch information on the *Tatar Drama Corpus*: [https://staging.dracor.org/api/v1/dts/collection?id=https://staging.dracor.org/id/tat](https://staging.dracor.org/api/v1/dts/collection?id=https://staging.dracor.org/id/tat)

The following code cell requests the data on the *Tatar Drama Corpus* (TatDraCor) from the DTS Collection endpoint of the DraCor Staging server:

In [23]:
request_url = "https://staging.dracor.org/api/v1/dts/collection?id=https://staging.dracor.org/id/tat"
r = requests.get(request_url)
r.json()

{'@context': 'https://distributed-text-services.github.io/specifications/context/1-alpha1.json',
 '@type': 'Collection',
 'description': 'Edited by Daniil Skorinkin and Frank Fischer. Features a handful of plays in Tatar language, provided through Tatar Electronic Library.',
 'title': 'Tatar Drama Corpus',
 'dtsVersion': 'unstable',
 'totalParents': 1,
 'totalChildren': 3,
 '@id': 'https://staging.dracor.org/id/tat',
 'member': [{'@type': 'Resource',
   'download': 'https://staging.dracor.org/api/v1/corpora/tat/plays/qamal-berenche-teatr/tei',
   'dublinCore': {'language': 'tat', 'creator': 'Камал, Галиәсгар'},
   'collection': 'https://staging.dracor.org/api/v1/dts/collection?id=https://staging.dracor.org/id/tat000001{&nav}',
   'navigation': 'https://staging.dracor.org/api/v1/dts/navigation?resource=https://staging.dracor.org/id/tat000001{&ref,start,end,down}',
   'totalParents': 1,
   'totalChildren': 0,
   '@id': 'https://staging.dracor.org/id/tat000001',
   'extensions': {'@contex

The properties `@context`, `@id` and `@type` are key components of the JSON-LD standard. 

The JSON-LD context linked in `@context` can be used (by a JSON-LD aware client) to expand the other property names included in the response object to full URIs. This mapping allows applications to understand the semantics of the terms by referencing their definitions in an ontology. How this works can be explained with the property `title` that is included in the response of the Collection endpoint.

With the help of the JSON-LD context that includes the mapping

```
"title": "hydra:title" 
```

and the definition of the prefix `hydra:` as 

```
"hydra": "https://www.w3.org/ns/hydra/core#" 
```

allows a client to expand the property name `title` to `https://www.w3.org/ns/hydra/core#title`.

This full URI refers to the term `title` as defined in the [Hydra Core Vocabulary](https://www.hydra-cg.com/spec/latest/core). Hydra is available as an OWL Ontology and defines its terms accordingly. A client can use the URI of the term title to retrieve the definition of the term, which is included in the Hydra Core Vocabulary as a Data Property. 

The other two property that re-used from the *Hydra Core Vocabulary* contained in the `Collection` object returned by the DTS Collection endpoint are [`hydra:description`](https://www.hydra-cg.com/spec/latest/core/#hydra:description), a specialized sub-property of the more general `rdfs:comment` and the object property `hydra:member` that in DTS, in case of the Collection endpoint, is used to include either sub-collections or resources as parts of a collection.

**Note:** The version of the DTS specification implemented to DraCor and described in the [Report On the Implementation of Programmable Corpora](10.5281/zenodo.15301341) reused several properties from the Hydra Core Vocabulary. With the upcoming stable version of the specification all properties and classes used in DTS will be defined by a DTS Ontology.

Specific to DTS are the properties `dtsVersion`, `totalParents`, `totalChildren` and `extensions` that are returned as part of the `Collection` object in the DraCor DTS implementation. There are additional properties that are defined by the DTS Specification, but are not used (see [https://distributed-text-services.github.io/specifications/versions/unstable/#scheme-for-collection-api-responses](https://distributed-text-services.github.io/specifications/versions/unstable/#scheme-for-collection-api-responses)). The used properties contain the following information:

* **`totalParents`**: This property contains the number of parent collections the current collection (or Resource) is part of. In the example from the Tatar Drama Corpus the number is `1` (as for all other corpora in DraCor), because the collection hierarchy of DraCor is flat: all corpora are part of a single parent collection that is “DraCor corpora”. Plays are always members of a single corpus only. Therefore the value of this always equals `1` when requesting a play via the Collection endpoint.

* **`totalChildren`**: This property contains the number of resources that are children of the given collection. In the case of the *Tatar Drama Corpus* the number is `3`, which means that there are three plays contained in this corpus. In case of a single play the value of the property is always `0`.

* **`member`**: This property links the collection with the data on the resources contained therein. In the case of the JSON-LD response of *Tatar Drama Corpus* the “members” are included as instances of the class `Resource` (`dts:Resource`) as items of an JSON array.

The following example shows the first item contained in the "member" array:

In [24]:
# The data of the Tatar Drama Corpus was requested in the previous code cell 
# and is available as a requests response object assigned to the variable r

r.json()["member"][0]

{'@type': 'Resource',
 'download': 'https://staging.dracor.org/api/v1/corpora/tat/plays/qamal-berenche-teatr/tei',
 'dublinCore': {'language': 'tat', 'creator': 'Камал, Галиәсгар'},
 'collection': 'https://staging.dracor.org/api/v1/dts/collection?id=https://staging.dracor.org/id/tat000001{&nav}',
 'navigation': 'https://staging.dracor.org/api/v1/dts/navigation?resource=https://staging.dracor.org/id/tat000001{&ref,start,end,down}',
 'totalParents': 1,
 'totalChildren': 0,
 '@id': 'https://staging.dracor.org/id/tat000001',
 'extensions': {'@context': 'https://raw.githubusercontent.com/dracor-org/dracor-ontology/refs/heads/main/json-ld-contexts/dts-extension-context.json',
  'numOfSegments': 13,
  'yearNormalized': 1908,
  'wordCountText': 4505,
  'numOfSpeakers': 7,
  'wikidataUri': 'http://www.wikidata.org/entity/Q25556355'},
 'document': 'https://staging.dracor.org/api/v1/dts/document?resource=https://staging.dracor.org/id/tat000001{&ref,start,end,mediaType}',
 'title': 'Беренче театр'

The items which represent the plays in a given DraCor corpus are included as `Resource` objects. This is marked by the value of the property `@type`, which is `"Resource"`. 

In [25]:
# Get the value of the @type field of the first play

r.json()["member"][0]["@type"]

'Resource'

Some of the other properties that appear on the `Resource` object have already been discussed above.

The properties `collection`, `navigation` and `document` contain URI Templates based on which a client can construct URLs to access the resource via the Collection, the Navigation and the Document endpoints.

In [26]:
# Print the URI Templates of the resource

for prop in ["collection", "navigation", "document"]:
    print(f"{prop.title()}: {r.json()["member"][0][prop]}\n")

Collection: https://staging.dracor.org/api/v1/dts/collection?id=https://staging.dracor.org/id/tat000001{&nav}

Navigation: https://staging.dracor.org/api/v1/dts/navigation?resource=https://staging.dracor.org/id/tat000001{&ref,start,end,down}

Document: https://staging.dracor.org/api/v1/dts/document?resource=https://staging.dracor.org/id/tat000001{&ref,start,end,mediaType}



The pattern listing the parameters is slightly different than the ones described when discussing the Entrypoint above: The URI Templates already include a fixed query parameter resource which holds the identifier of the given resource, e.g. `?resource=https://staging.dracor.org/id/tat000001`, the other available query parameters are put in curly brackets `{}` with an ampersand `&` before the first parameter name, e.g. in the URI Template of the document endpoint: {&ref,start,end,mediaType}. When expanding the pattern to a full URL a client should for each parameter name create a sequence of ampersand symbol `&`, parameter name, equals sign `=` and supplied value of the given parameter.

For example, when a client wants to use the Document endpoint to request the text proper of the first scene of the first act. This segment is identified by the fragment identifier `body/div[1]/div[1]` which must be provided as the value of the `ref` parameter. If a client wants to retrieve the content of this fragment as plaintext, this can be controlled in the request by setting the value of the parameter `mediaType`  to `text/plain`. Accordingly, the variables in URI Template 

```
{&ref,start,end,mediaType}` 
```

can be expanded into 

```
&ref=body/div[1]/div[1]&mediaType=text/plain
```

and thus resulting in  the following request URL: 

[`https://staging.dracor.org/api/v1/dts/document?resource=https://staging.dracor.org/id/tat000001&ref=body/div[1]/div[1]&mediaType=text/plain`](https://staging.dracor.org/api/v1/dts/document?resource=https://staging.dracor.org/id/tat000001&ref=body/div[1]/div[1]&mediaType=text/plain)

Another property included in the Resource object is `download`. It holds a link to download the resource – in the case of DraCor it is the URL of the API endpoint that serves the TEI file `api/v1/copora/{corpusname}/plays/{playname}/tei`.

In [27]:
# Get the download link from the previously fetched play

print(r.json()["member"][0]["download"])

https://staging.dracor.org/api/v1/corpora/tat/plays/qamal-berenche-teatr/tei


Additional metadata can be added to a resource in two sections or “metadata zones” ([Almas et al. 2023](https://doi.org/10.4000/jtei.4352)). The Metadata objects are connected to the resource with the properties `dublinCore` and `extensions`. While in the first Metadata object terms of the [*Dublin Core Vocabulary*](https://www.dublincore.org/specifications/dublin-core/dcmi-terms ) (DC Terms) can be used, the latter `Metadata` object connected via `extensions` that can hold additional metadata using terms of arbitrary schemas. The terms must be defined in a separate JSON-LD context.

The DraCor implementation adds additional metadata in `extensions`. The respective terms are included in a JSON-LD context that is currently available from a DraCor GitHub repository at [https://raw.githubusercontent.com/dracor-org/dracor-ontology/refs/heads/main/json-ld-contexts/dts-extension-context.json](https://raw.githubusercontent.com/dracor-org/dracor-ontology/refs/heads/main/json-ld-contexts/dts-extension-context.json), but will be later provided directly via the DraCor infrastructure to be properly dereferencable. In the JSON-LD context properties already defined in the  [*DraCor API Ontology*](https://github.com/dracor-org/dracor-ontology/blob/main/v1/dracor_api_ontology.ttl) are used. However, In the current version of the DraCor DTS implementation only a couple of specialized data properties are present, as is shown in the following example:

In [28]:
# Get the extensions metadata object from the previously fetched play

r.json()["member"][0]["extensions"]

{'@context': 'https://raw.githubusercontent.com/dracor-org/dracor-ontology/refs/heads/main/json-ld-contexts/dts-extension-context.json',
 'numOfSegments': 13,
 'yearNormalized': 1908,
 'wordCountText': 4505,
 'numOfSpeakers': 7,
 'wikidataUri': 'http://www.wikidata.org/entity/Q25556355'}

The following list includes the links to the descriptions of the respective API features referenced in the context:

* `numberOfSegments`: Feature P37 [play_num_of_segments](https://staging.dracor.org/doc/odd#play_num_of_segments)
* `yearNormalized`: Feature P27 [play_year_normalized](https://staging.dracor.org/doc/odd#play_year_normalized)
* `wordCountText`: Feature P41 [play_num_of_word_tokens_in_text_elements](https://staging.dracor.org/doc/odd#play_num_of_word_tokens_in_text_elements)
* `numOfSpeakers`: Feature P45 [play_num_of_speakers](https://staging.dracor.org/doc/odd#play_num_of_speakers)
* `wikidataUri`: Here no Feature defined in the *DraCor API Ontology* could be re-used. The full URI included here is based on Feature P4 [play_wikidata_id](https://staging.dracor.org/doc/odd#play_wikidata_id) prepended by the Wikidata Base URI `http://www.wikidata.org/entity/`

Because it is possible to attach custom metadata as well to a collection as to a resource via the `extensions` property, it is possible to cover the functionality provided by the DraCor APIs endpoints `api/v1/corpora`, `api/v1/corpora/{corpusname}`, `api/v1/corpora/{corpusname}/metadata` and, to some extent also endpoints, that provide metadata on a single play, e.g. `api/v1/corpora/{corpusname}/plays/{playname}`.

#### Metadata on a single play via the Collection endpoint

When requesting metadata on a corpus via the Collection endpoint, the plays included in a corpus are listed in the member array, but, as has been said, it is also possible to retrieve metadata on a single play via the Collection endpoint. 

The following request URL can be used to fetch the data on the play “Lessing: Emilia Galotti” from the German Drama Corpus:

[https://staging.dracor.org/api/v1/dts/collection?id=https://staging.dracor.org/id/ger000088](https://staging.dracor.org/api/v1/dts/collection?id=https://staging.dracor.org/id/ger000088)

In the following code cell the data about this play is retrieved via the DTS Collection endpoint of the DraCor Staging Server:

In [29]:
# Get information on the play and output a JSON serialization

request_url = "https://staging.dracor.org/api/v1/dts/collection?id=https://staging.dracor.org/id/ger000088"
r = requests.get(request_url)
r.json()

{'@context': 'https://distributed-text-services.github.io/specifications/context/1-alpha1.json',
 'citationTrees': [{'@type': 'CitationTree',
   'maxCiteDepth': 4,
   'citeStructure': [{'@type': 'CiteStructure',
     'citeType': 'front',
     'citeStructure': [{'@type': 'CiteStructure', 'citeType': 'title_page'}]},
    {'@type': 'CiteStructure',
     'citeType': 'body',
     'citeStructure': [{'@type': 'CiteStructure',
       'citeType': 'act',
       'citeStructure': [{'@type': 'CiteStructure',
         'citeType': 'scene',
         'citeStructure': [{'@type': 'CiteStructure', 'citeType': 'speech'},
          {'@type': 'CiteStructure', 'citeType': 'stage_direction'}]}]}]}]}],
 '@type': 'Resource',
 'download': 'https://staging.dracor.org/api/v1/corpora/ger/plays/lessing-emilia-galotti/tei',
 'dublinCore': {'language': 'ger', 'creator': 'Lessing, Gotthold Ephraim'},
 'navigation': 'https://staging.dracor.org/api/v1/dts/navigation?resource=https://staging.dracor.org/id/ger000088{&ref,st

A single `Resource` object – not `Collection` as in the case of a corpus – is returned by the API that represents the play. It still resembles the objects included as members of the collection that was shown in the previous example from the *Tatar Drama Corpus*, but includes an additional nested object: the “Citation Tree”.

#### Citation Tree

A “Citation Tree” in the context of DTS refers to the hierarchical structure used to identify and reference specific parts of a text. This structure allows for the organization of textual content into so-called “Citable Units”, which can be cited and navigated.

The “Citation Tree” serves as an abstraction or generalization of the structure of a play, providing a standardized way to represent and navigate its hierarchical components, such as acts, scenes, speech acts and stage directions. Citation trees are essential for scholarly referencing, allowing researchers to accurately point to specific parts of a text. In the Collection endpoint (but also the Navigation endpoint) `CitationTree` objects are included for Resources. The property `citeStructure` is used for nesting child `CiteStructure` objects in the parent object. 

In the case of the DraCor implementation of DTS a single “Citation Tree” is included as a `CitationTree` object in the JSON array with the key `citationTrees`. 

It organizes a text into nested sections, such as the top level divisions front matter (identifier: `front`), text proper (`body`) and back matter (`back`) on the upper most level. A Citation Tree of a text that has only these divisions is one level deep and allows only for the retrieval of the before mentioned sections. It has a so-called maximum cite depth (deprecated property `maxCiteDepth`) of `1`.

The text proper of texts on DraCor are commonly structured into nested segments, most prominently, but not always referred to as “acts” and “scenes”. There are common different labels depending on language and dramatic traditions, e.g. “configuration”. 

A play that is structured into scenes only – and these scenes could be addressed and retrieved via the DTS implementation – would have a maximum cite depth of `2`. 

If a play is divided into acts and therein into scenes the cite depth would be `3`. In the current DraCor implementation it is possible to address segments of the play down to the level of the individual speech act and the stage direction, thus we support a maximum cite depth of `4` in the text proper.

The following code cell shows the CitationTree object with the `@type` property value `CitationTree` is included in the play “Emilia Galotti” requested with the URL above:

In [30]:
# Get the CitationTree object of the play Emilia Galotti 
# that was requested in the previous code cell 
# and is available in the variable r

r.json()["citationTrees"]

[{'@type': 'CitationTree',
  'maxCiteDepth': 4,
  'citeStructure': [{'@type': 'CiteStructure',
    'citeType': 'front',
    'citeStructure': [{'@type': 'CiteStructure', 'citeType': 'title_page'}]},
   {'@type': 'CiteStructure',
    'citeType': 'body',
    'citeStructure': [{'@type': 'CiteStructure',
      'citeType': 'act',
      'citeStructure': [{'@type': 'CiteStructure',
        'citeType': 'scene',
        'citeStructure': [{'@type': 'CiteStructure', 'citeType': 'speech'},
         {'@type': 'CiteStructure', 'citeType': 'stage_direction'}]}]}]}]}]

In [31]:
# Get the part of the tree representing the front matter

r.json()["citationTrees"][0]["citeStructure"][0]

{'@type': 'CiteStructure',
 'citeType': 'front',
 'citeStructure': [{'@type': 'CiteStructure', 'citeType': 'title_page'}]}

In [32]:
# output the citation tree part representing the text proper
# output the contents of the field with the key `citeStructure`:

r.json()["citationTrees"][0]["citeStructure"][1]

{'@type': 'CiteStructure',
 'citeType': 'body',
 'citeStructure': [{'@type': 'CiteStructure',
   'citeType': 'act',
   'citeStructure': [{'@type': 'CiteStructure',
     'citeType': 'scene',
     'citeStructure': [{'@type': 'CiteStructure', 'citeType': 'speech'},
      {'@type': 'CiteStructure', 'citeType': 'stage_direction'}]}]}]}

In [33]:
# here we are at level 4 of the CitationTree object

r.json()["citationTrees"][0]["citeStructure"][1]["citeStructure"][0]["citeStructure"][0]["citeStructure"]

[{'@type': 'CiteStructure', 'citeType': 'speech'},
 {'@type': 'CiteStructure', 'citeType': 'stage_direction'}]

From this `CitationTree` we can understand that the play “Emilia Galotti”:

* can be navigated, referenced, cited on four levels, the maximum cite depth is `4`,
* on the top level the play is structured into two main parts: front matter and text proper (there is no back matter),
* the text proper is structured into acts and scenes,
* it is possible to address (reference, cite) individual speech acts or stage directions inside the scenes.

### Practical Examples using the data returned by the DraCor DTS Collection endpoint

The following section demonstrates how the DraCor DTS Collection endpoint can be used in practice.

#### Order the corpora by number of plays

The following code cell retrieves all corpora via the DraCor DTS Collection endpoint and sorts them by the number of plays included. It then outputs the name of the largest corpus together with the number of plays:

In [34]:
# We currently use the Staging Sever
request_url = "https://staging.dracor.org/api/v1/dts/collection"

r = requests.get(request_url)
if r.status_code == 200:
    # the request to the endpoint was successful, parse the returned JSON object
    collection_response_data = r.json()

# Print the number of corpora (= 'totalChildren' of the root collection)
num_corpora = collection_response_data["totalChildren"] 
print(f"There are currently {num_corpora} corpora published on the DraCor Staging Server.")

# A test if the number of items in member equals the number of corpora (as 'totalChildren' )
# This should not raise an Assertion Error
assert len(collection_response_data["member"]) == num_corpora, "The number of items in 'member' and the value of 'totalChildren' does not match!"

# Sort the corpora by the value of 'totalChildren'; 
# the list should be in descending order therefore the flag reverse need to be set
# The filtering is done with a lambda function
sorted_corpora = sorted(collection_response_data["member"], key=lambda corpus: corpus['totalChildren'], reverse=True)

# Print the title (key  'title') and the number of plays (key 'totalChildren') of the biggest corpus 
# which is the first item in the list 'sorted_corpora'
print(f"The biggest corpus by number of plays is the {sorted_corpora[0]["title"]} with {sorted_corpora[0]["totalChildren"]} plays.")

There are currently 26 corpora published on the DraCor Staging Server.
The biggest corpus by number of plays is the French Drama Corpus with 1940 plays.


#### Filter the plays of a corpus

The properties currently included in the `extensions` metadata of the "Resources" (i.e. plays) in a "Collection" (i.e. corpus) can be used for filtering. 

The following cell retrieves the data of a corpus from the DraCor DTS Collection endpoint of a single corpus identified by the value in the query parameter `id`.

In [35]:
# Set the corpus name.
# This will be used as identifier as the value of the query parameter 'id'
# We use the German Drama Corpus here
corpus_name = "ger"
request_url = f"https://staging.dracor.org/api/v1/dts/collection?id={corpus_name}"

r = requests.get(request_url)
if r.status_code == 200:
    # the request to the endpoint was successful, parse the returned JSON object
    corpus_response_data = json.loads(r.text)

print(f"Retrieved data of the {corpus_response_data["title"]} including {corpus_response_data["totalChildren"]} plays.")

Retrieved data of the German Drama Corpus including 735 plays.


In the following sections we filter the retrieved corpus data based on the metadata available in the `extensions` metadata zone.

##### Filter for plays in a given corpus within a certain date range

The code in the following cell filters the plays included in the *German Drama Corpus* by a date range. We will get only the plays with a `yearNormalized` value between 1730 and 1930. The actual filtering is done with a lambda function that checks if the value of the property falls within the range.

In [36]:
# Filter the list to include only plays with a normalized date between 1730 and 1930 
# (approx. the time span covered by the original GerDraCor based on the DLINA Corpus)
plays_filtered_by_year_normalized = list(filter(lambda item: 1730 <= item["extensions"]["yearNormalized"] <= 1900, corpus_response_data["member"]))

print(f"There are {len(plays_filtered_by_year_normalized)} plays of which the value of 'yearNormalized' is between 1730 and 1930.")

# Sort the plays by the value of 'yearNormalized'
plays_filtered_by_year_normalized = sorted(plays_filtered_by_year_normalized, key=lambda play: play["extensions"]["yearNormalized"])

print(f"The play with the lowest yearNormalized value whithin this set is '{plays_filtered_by_year_normalized[0]["title"]}' ({plays_filtered_by_year_normalized[0]["extensions"]["yearNormalized"]}).")

There are 566 plays of which the value of 'yearNormalized' is between 1730 and 1930.
The play with the lowest yearNormalized value whithin this set is 'Der sterbende Cato' (1731).


##### Filter by the number of top level segments

The `extensions` metadata object of the DraCor DTS implementation also includes information on the number of top level segments a play is structured. We will retrieve plays that have 5 top level segments:

In [37]:
# Filter for 5 act plays: the property 'numOfSegments' in 'extensions' should equal 5

# Caveat: Actually, these are plays that have 5 segments (see https://staging.dracor.org/doc/odd#play_num_of_segments) 
# Theoretically, these can also be plays that have 5 scenes or other other 5 top level <div>s inside <body>
# It would be better to rely on Feature P38 play_num_of_acts (see https://staging.dracor.org/doc/odd#play_num_of_acts) 
# but this is not included in the 'extensions' Metadata object in the DTS implementation

# Filter the list to include only items where 'numOfSegments' is equal to 5
plays_filtered_by_num_of_segments = list(filter(lambda item: item["extensions"]["numOfSegments"] == 5, corpus_response_data["member"]))

print(f"There are {len(plays_filtered_by_num_of_segments)} plays that have 5 top level segments (acts?).")

# uncomment and change the following line to get the first play from the filtered list:
# plays_filtered_by_num_of_segments[0]

There are 61 plays that have 5 top level segments (acts?).


We will later demonstrate how to use a DTS specific feature – the *CitationTree* – to check if the segments are really of the type act. 

##### Evaluating a play's structure based on the `CitationTree`

In Example above we used the Collection endpoint to retrieve plays that consist of 5 segments. The code in the following cell retrieves the distinct types of citation trees of these plays:

In [38]:
%%time
#Create a list of distinct CiteStructures in in the list of 5 segment thingis
# This takes some time, it fetches data from the API in turns for each of the 61 plays

distinct_body_cite_structures = []

# The previous example assigned the filtered data to a variable plays_filtered_by_num_of_segments

for play in plays_filtered_by_num_of_segments:
    request_url = f"https://staging.dracor.org/api/v1/dts/collection?id={play["@id"]}"
    r = requests.get(request_url)
    if r.status_code == 200:
        data = json.loads(r.text)

        citation_tree_top_level_structures = data["citationTrees"][0]["citeStructure"]
        # get the body structure
        body_structure = list(filter(lambda item: item["citeType"] == "body", citation_tree_top_level_structures))
        
        
        if body_structure in distinct_body_cite_structures:
            pass
        else:
            distinct_body_cite_structures.append(body_structure)

CPU times: user 635 ms, sys: 148 ms, total: 783 ms
Wall time: 1min 1s


In [39]:
print(f"There are {len(distinct_body_cite_structures)} distinct structures.")

There are 5 distinct structures.


In [40]:
# Output the distinct types of CiteStructures

distinct_body_cite_structures

[[{'@type': 'CiteStructure',
   'citeType': 'body',
   'citeStructure': [{'@type': 'CiteStructure', 'citeType': 'act'}]}],
 [{'@type': 'CiteStructure',
   'citeType': 'body',
   'citeStructure': [{'@type': 'CiteStructure',
     'citeType': 'scene',
     'citeStructure': [{'@type': 'CiteStructure', 'citeType': 'speech'},
      {'@type': 'CiteStructure', 'citeType': 'stage_direction'}]}]}],
 [{'@type': 'CiteStructure',
   'citeType': 'body',
   'citeStructure': [{'@type': 'CiteStructure',
     'citeType': 'act',
     'citeStructure': [{'@type': 'CiteStructure', 'citeType': 'location'}]}]}],
 [{'citeType': 'body'}],
 [{'@type': 'CiteStructure',
   'citeType': 'body',
   'citeStructure': [{'@type': 'CiteStructure',
     'citeType': 'act',
     'citeStructure': [{'@type': 'CiteStructure',
       'citeType': 'scene',
       'citeStructure': [{'@type': 'CiteStructure', 'citeType': 'speech'},
        {'@type': 'CiteStructure', 'citeType': 'stage_direction'}]}]}]}]]

In [41]:
# If it does not work, the package treelib needs to be installed
# uncomment the following line and run it
# The installation should have happend at the beginning of the notebook already

#!pip install treelib

The following function uses the package [treelib](https://treelib.readthedocs.io/) to transform the `CitationTree` object returned by the API into an instance of the class `Tree` which can then be displayed:

In [42]:
def cite_structure_to_tree(cite_structure, parent_id:str = None, tree:Tree = None):
    """Transform a DTS CiteStructure to a tree that can be visualized""" 
    logging.debug(cite_structure)

    if tree is None:
        tree = Tree()
        logging.debug("Created tree")
    else: 
        logging.debug("There is a tree already")
        tree = tree

    logging.debug(type(cite_structure))
    if type(cite_structure) == list:
        logging.debug("Is a list, process each item.")
        for item in cite_structure:
            cite_structure_to_tree(item, parent_id, tree)
            
        
    elif type(cite_structure) == dict:
        logging.debug("Is a dictionary, process this item.")
        logging.debug(f"the cite type is: {cite_structure["citeType"]}")
        logging.debug(f"the parent is: {parent_id}")

        if parent_id is None:
            tree.create_node(cite_structure["citeType"].capitalize().replace("_"," "), cite_structure["citeType"])
        else: 
            tree.create_node(cite_structure["citeType"].capitalize().replace("_"," "), cite_structure["citeType"], parent=parent_id)

        if "citeStructure" in cite_structure:
            logging.debug("There is a nested structure!")
            logging.debug("parent_id should be self!")
            parent_id = cite_structure["citeType"]
            logging.debug(f"parent_id is now: {parent_id}")
            for item in cite_structure["citeStructure"]:
                tree = cite_structure_to_tree(item, parent_id, tree)
                

    # return the tree
    return tree
            
    

The code in the next cell iterates about the collected structures and visualizes them as trees:

In [43]:
n = 1
for item in distinct_body_cite_structures:
    print(f"Structure type {n}:\n")
    tree = cite_structure_to_tree(item)
    tree.show()
    print("----------------------\n")
    n=n+1

Structure type 1:

Body
└── Act

----------------------

Structure type 2:

Body
└── Scene
    ├── Speech
    └── Stage direction

----------------------

Structure type 3:

Body
└── Act
    └── Location

----------------------

Structure type 4:

Body

----------------------

Structure type 5:

Body
└── Act
    └── Scene
        ├── Speech
        └── Stage direction

----------------------



The Collection endpoint in the DTS API is primarily designed to facilitate the browsing of text collections, providing users with a catalog-like entry point to explore available resources.

However, in the context of the DraCor implementation, this endpoint goes beyond mere discovery functionality by including additional, DraCor-specific metadata in the extensions object which not only allow for filtering a corpus by certain criteria as is already possible with other DraCor API endpoints, e.g. `api/v1/corpora/{corpusname}`, but can also be used to answer research questions specific to the study of drama.

Moreover, DTS specific metadata, like the information in the `CitationTree` object can become particularly relevant for scholars studying dramatic texts, as it provides a clear view of how plays are organized and segmented. Thus, the DraCor DTS Collection endpoint could enable comparative studies across multiple plays, allowing for the examination of structural patterns and their variations. This capability is invaluable for exploring how different playwrights approach the organization of their works and how these choices might influence the dramatic impact and audience engagement.

### DraCor DTS Navigation Endpoint `api/v1/dts/navigation`

While the Collections endpoint helps users locate individual resources within a collection – in the case of DraCor – a corpus, the DTS Navigation endpoint `/dts/navigation` returns data on the hierarchical relationships between different structural divisions of a text. It provides references that can be used to retrieve specific parts of the text from the DTS Document endpoint, enabling precise navigation and retrieval of text segments. It also lists available identifiers of these segments that can be used to unambiquously reference it.

#### Parameters of the Navigation Endpoint

DTS Specification defines the query paramerters `resource`, `ref`, `start`, `end`, `down`, `tree` and `page`. Because the pageing functionality has not been implemented in DraCor and there is only a single reference system in place (see `CitationTree`), the latter two parameters are not or not fully implemented in the DraCor endpoint.

* `resource`: This parameter is used to provide the unique identifier (URI) of the Resource being navigated. In the case of DraCor it is the HTTP URI of a corpus or a document that is also used as the value of the parameter `id` in the Collection endpoint. E.g. the identifier `https://staging.dracor.org/id/ger000088` is used in the request to the Collection endpoint

```
https://staging.dracor.org/api/v1/dts/collection?id=https://staging.dracor.org/id/ger000088
```

This same identifier should be used as the value of the parameter `resource` of the Navigation endpoint as well to navigate the structural segments of this play. For a request to the server to be valid it is not sufficient to use the parameter `resource` alone, but in combination with the following parameters.

* `ref`: This parameter is used to supply an identifier of a certain segment of a text (a *Citeable Unit*). The specification does not restrict what this identifier should be or how it should be minted. In the case of the DraCor implementation of DTS we use designated Fragment Identifiers (see section below)

By using the Navigation endpoint it is possible to *either* address a single Citeable Unit *or* a segment of the text, consisting of all Citeable Unites between two nodes. The parameter `ref` must not be combined in a request with any of the following two parameters that have alway to be used together:

* `start` and `end`:  When requesting a range of text, the parameters `start` and `end` include the identifiers of the Citeable Units that define the boundaries of a sequence. This sequence encompasses all Citeable Units between these two points, including the starting and ending points themselves.

Requesting ranges is still an experimental feature in the DraCor implementation. A limitation is that the starting and ending points of a requested range must share the same parent. For example, a user can request the first three acts of a five-act play, or the second to seventh speeches within the second scene of the third act. However, he or she cannot request a range that spans from the last scene of the first act to the second scene of the third act, as these do not share a common parent.

* `down`: This query parameter controls to which maximum depth the citation subtree should be returned. The depth is relative to the specified reference (identified with `ref` or `start` and `end`), e.g. if a Resource is 4 levels deep (body, act, scene, speech act/stage direction) then a value of `2` of the parameter `down` and starting from the body (`ref=body`), the Navigation endpointreturn all acts and all scenes therein. When starting from the first act (`ref=body/div[1]`) it would return all scenes from the first act only including also speech acts and stage directions. A value of -1 indicates that the entire citation tree, down to its deepest level, should be returned. If the parameter is not set, the Navigation endpoint will not return any Citeable Units of the Citation tree, but only the references identfied by `ref` or `start` and `end`.

#### DraCor Fragment Identifiers

In the DraCor implementation these text fragment identifiers resemble [xPath](https://www.w3.org/TR/xpath20/) expressions. XPath is a W3C recommended language used to navigate and select nodes within XML documents. Using XPath expressions, users can address resources or nodes within an XML tree by specifying paths that traverse the hierarchical structure of the document. The slash character `/` is used to separate the different levels or steps in a path expression, allowing to navigate down the hierarchical structure of XML. For example in the following XML snippet the element with the text value `B` can be "reached" starting from the element `<root>` and going down into the first nested element and then into the second nested element, which results in the path: `/root/element[1]/element[2]`

```xml
<root>
  <element>
    <element>A</element>
  </element>
  <element>B</element>
</root>
```

In the DraCor DTS implementation a similar notation is used to address segments of the text, that are – in the source TEI-XML File more deeply nested than in the simple example above:

```xml
<TEI>
  <teiHeader>
    <!-- ... -->
  </teiHeader>
  <text>
    <front>
      <div>
        <!-- ... -->
      </div>
    <front>
    <body>
      <div type="act">
        <head>First act</head>
        <div type="scene">
          <head>First scene</head>
          <stage>
            <!-- ... -->
          </stage>
          <sp>
            <!-- ... -->
          <sp>
        </div>
        <div>
         <head>Second scene</head>
         <!-- ... -->
        </div>
      </div>
      <div>
       <head>Second act</head>
       <!-- ... -->
      </div> 
    <body>
  </text>
</TEI>
```

A valid xPath expression to select the single first speech act in the first scene of the first act would be:

```
/TEI[1]/text[1]/body[1]/div[1]/div[1]/sp[1]
```

However, if we assume, that some of the elements (`<TEI>`, `<text>` and `<body>`) are included only once in a (DraCor) TEI file, the number in brackets – `[1]` – meaning the position in a sequence of sibling elements does not necessarily needs to be included. Thus, the xPath results in 

```
/TEI/text/body/div[1]/div[1]/sp[1]
```

In the DrCor DTS implementation not this whole xPath would be used as identifier, but only the part starting from the element `<text>` and thus would result in an identifier that could be used as the value of the parameter `ref` of 

```
body/div[1]/div[1]/sp[1]
```

 Likewise, the whole second act in the example above would be identified with `body/div[2]` and so on.

#### Response of the Navigation Endpoint

The following code can be used to retrieve a single play via the Navigation endpoint. 

Apart from the mandatory parameter `resource` which holds the ID of the play, as a bare minimum it is necessary to provide either a reference to a citeable unit as the value of the `ref` parameter or the parameter `down` specifying up until to which level citeable units should be returned. 

 In the following example information about the first act (`ref=body/div[1]`) of *Goethe: Iphigenie auf Tauris* (URI: [https://staging.dracor.org/id/ger000001](https://staging.dracor.org/id/ger000001)) are requested with the following URL:

[https://staging.dracor.org/api/v1/dts/navigation?resource=https://staging.dracor.org/id/ger000001&ref=body/div[1]](https://staging.dracor.org/api/v1/dts/navigation?resource=https://staging.dracor.org/id/ger000001&ref=body/div[1])

In [44]:
resource = "https://staging.dracor.org/id/ger000001"
ref = "body/div[1]" # Identifier of the first act

request_url = f"{api_base}dts/navigation?resource={resource}&ref={ref}"
r = requests.get(request_url)
if r.status_code == 200:
    navigation_response_data = r.json()

# Output the retrieved data
navigation_response_data

{'@context': 'https://distributed-text-services.github.io/specifications/context/1-alpha1.json',
 '@type': 'Navigation',
 'ref': {'identifier': 'body/div[1]',
  '@type': 'CitableUnit',
  'dublinCore': {'title': 'Erster Aufzug'},
  'level': 2,
  'parent': 'body',
  'citeType': 'act'},
 'dtsVersion': 'unstable',
 'resource': {'citationTrees': [{'@type': 'CitationTree',
    'maxCiteDepth': 4,
    'citeStructure': [{'@type': 'CiteStructure',
      'citeType': 'front',
      'citeStructure': [{'@type': 'CiteStructure', 'citeType': 'front'},
       {'@type': 'CiteStructure', 'citeType': 'dramatis_personae'},
       {'@type': 'CiteStructure', 'citeType': 'setting'}]},
     {'@type': 'CiteStructure',
      'citeType': 'body',
      'citeStructure': [{'@type': 'CiteStructure',
        'citeType': 'act',
        'citeStructure': [{'@type': 'CiteStructure',
          'citeType': 'scene',
          'citeStructure': [{'@type': 'CiteStructure', 'citeType': 'speech'},
           {'@type': 'CiteStruct

The DTS Navigation endpoint returnes a `Navigation` object (see `@type` property), identified with the request URL as the value of the `@id` property. The property `dtsVersion` holds the version of the DTS Specification, that is the basis for the navigation response object. Additionally, it includes a `Resource` object (linked via property `resource`) as well as a reference to a Citable Unit (property `ref`) (in the following example replaced with placeholder `{ ... }`): 

```json
{
  "@context" : "https://distributed-text-services.github.io/specifications/context/1-alpha1.json",
  "@id" : "https://staging.dracor.org/api/v1/dts/navigation?resource=https://staging.dracor.org/id/ger000001&ref=body/div[1]",
  "@type" : "Navigation",
  "dtsVersion" : "unstable",
  "resource" : { … } ,
  "ref" : { … }
}
```

In [45]:
# Output the value of the @type property

navigation_response_data["@type"]

'Navigation'

The included `Resource` object lists the URI Templates for the Document, the Collection and the Navigation endpoints (properties `document`, `collection`, `navigation`), the Citation Tree(s) (property `citationTrees`) and – as in optional property (`mediaType`) carries the information in which formats the resource is available. Below is an example showing only the resource part:

```json
"resource" : {
    "@id" : "https://staging.dracor.org/id/ger000001",
    "@type" : "Resource",
    "document" : "https://staging.dracor.org/api/v1/dts/document?resource=https://staging.dracor.org/id/ger000001{&ref,start,end,mediaType}",
    "collection" : "https://staging.dracor.org/api/v1/dts/collection?id=https://staging.dracor.org/id/ger000001{&nav}",
    "navigation" : "https://staging.dracor.org/api/v1/dts/navigation?resource=https://staging.dracor.org/id/ger000001{&ref,start,end,down}",
    "citationTrees" : [ … ],
    "mediaTypes" : [ "application/tei+xml", "text/plain" ]
  }
```

In [46]:
# Print the @type value

print(f"The returned object is of type: {navigation_response_data["resource"]["@type"]}\n\n")

# Output the included Resource object

navigation_response_data["resource"]

The returned object is of type: Resource




{'citationTrees': [{'@type': 'CitationTree',
   'maxCiteDepth': 4,
   'citeStructure': [{'@type': 'CiteStructure',
     'citeType': 'front',
     'citeStructure': [{'@type': 'CiteStructure', 'citeType': 'front'},
      {'@type': 'CiteStructure', 'citeType': 'dramatis_personae'},
      {'@type': 'CiteStructure', 'citeType': 'setting'}]},
    {'@type': 'CiteStructure',
     'citeType': 'body',
     'citeStructure': [{'@type': 'CiteStructure',
       'citeType': 'act',
       'citeStructure': [{'@type': 'CiteStructure',
         'citeType': 'scene',
         'citeStructure': [{'@type': 'CiteStructure', 'citeType': 'speech'},
          {'@type': 'CiteStructure', 'citeType': 'stage_direction'}]}]}]}]}],
 '@type': 'Resource',
 'mediaTypes': ['application/tei+xml', 'text/plain'],
 'document': 'https://staging.dracor.org/api/v1/dts/document?resource=https://staging.dracor.org/id/ger000001{&ref,start,end,mediaType}',
 'collection': 'https://staging.dracor.org/api/v1/dts/collection?id=https://st

A resource can be available in various serializations. The property `mediaTypes` links to an JSON array includes the identifiers that can be used as the value of the query parameter `mediaType` of the Document endpoint to retrieve the given representation. In the case of DraCor either TEI-XML as the default, or Plaintext.

In [47]:
# display the contents of 'mediaTypes'
navigation_response_data["resource"]["mediaTypes"]

['application/tei+xml', 'text/plain']

When requesting information on a single citable unit providing its identifier in the query parameter `ref` the returned data includes this referenced citable unit. It is linked with the property `ref`. An example is included below:

```json
"ref" : {
    "@type" : "CitableUnit",
    "identifier" : "body/div[1]",
    "level" : 2,
    "parent" : "body",
    "citeType" : "act",
    "dublinCore" : {
      "title" : "Erster Aufzug"
    }
  }

```

The `CiteableUnit` object has the following properties:

* **`@type`**: This JSON-LD specific property indicates the class of the object. The only allowed value is `CiteableUnit`.
* **`identifier`**: The identifier of the citable unit. It is used as the value of the query parameter `ref` to identify it across the Navigation and Document endpoints. In the DraCor implementation xPath-like path expressions are used as identifiers (see section *DraCor Fragment Identifiers* above).
* **`level`**: This property indicates the depth at which the citable unit is found within the citation tree of the Resource.
* **`parent`**: This property contains the identifier of the parent of the citable unit in the tree of the Resource.
* **`citeType`**: This property contains a string identifying the type of the citable unit, e.g. "act", "scene", "speech", "stage_direction". The values correspond to the types used in the `CiteStructure` objects in the `CitationTree`.
* **`dublinCore`**: This property is used to include a `MetadataObject` using terms from the *Dublin Core Terms Vocabulary*. In the DraCor implementation currently only the term `dct:title` is used. In acts and scenes it includes the value of the very first `<tei:head>` element found in the corresponding `<tei:div>`.
* **`extensions`**: This property is used to include a `MetadataObject` with DraCor-specific properties.

The following examples illustrate how to use the Navigation endpoint to retrieve information about a single play.

In [48]:
# display the citeable unit identified by the identifier in the parameter ref
navigation_response_data["ref"]

{'identifier': 'body/div[1]',
 '@type': 'CitableUnit',
 'dublinCore': {'title': 'Erster Aufzug'},
 'level': 2,
 'parent': 'body',
 'citeType': 'act'}

#### Navigating a resource

The following examples demonstrate how to use the Navigation endpoint to retrieve information about the citable units of a Resource. As an example we use the play Goethe: “Iphigenie auf Tauris” (`ger000001`) from the *German Drama Corpus*.

The code in the following cell retrieves the top level structural segments of a play. Setting the query parameter `down` to the value of `1` tells the API to get the strucutral segments one level deep. Because the parameter `ref` is not used, the starting point is the top most element of the tree – the whole text. 

In [49]:
# list the top level structural segments of a play
# by setting down to "1"
# If you use the production server, change the resource to https://dracor.org/id/ger000001

resource = "https://staging.dracor.org/id/ger000001"
request_url = f"{api_base}dts/navigation?resource={resource}&down=1"
r = requests.get(request_url)
response_data = r.json()
response_data

{'@context': 'https://distributed-text-services.github.io/specifications/context/1-alpha1.json',
 '@type': 'Navigation',
 'dtsVersion': 'unstable',
 'resource': {'citationTrees': [{'@type': 'CitationTree',
    'maxCiteDepth': 4,
    'citeStructure': [{'@type': 'CiteStructure',
      'citeType': 'front',
      'citeStructure': [{'@type': 'CiteStructure', 'citeType': 'front'},
       {'@type': 'CiteStructure', 'citeType': 'dramatis_personae'},
       {'@type': 'CiteStructure', 'citeType': 'setting'}]},
     {'@type': 'CiteStructure',
      'citeType': 'body',
      'citeStructure': [{'@type': 'CiteStructure',
        'citeType': 'act',
        'citeStructure': [{'@type': 'CiteStructure',
          'citeType': 'scene',
          'citeStructure': [{'@type': 'CiteStructure', 'citeType': 'speech'},
           {'@type': 'CiteStructure', 'citeType': 'stage_direction'}]}]}]}]}],
  '@type': 'Resource',
  'mediaTypes': ['application/tei+xml', 'text/plain'],
  'document': 'https://staging.dracor.o

The `Navigation` object returned by the API includes a property `member` which holds an array including two citable units – the front matter (identifier: `front`) and the text proper (identifier: `body`):

In [50]:
# display the list containing the citeable units
response_data["member"]

[{'identifier': 'front',
  '@type': 'CitableUnit',
  'level': 1,
  'parent': None,
  'citeType': 'front'},
 {'identifier': 'body',
  '@type': 'CitableUnit',
  'level': 1,
  'parent': None,
  'citeType': 'body'}]

In the cell above, the citeable units are transformed into Python dictionaries, which means that in the data structure datetypes native to Python are included. Therefore in the example above the value of the property `parent` is `None`, while in in the actual JSON response the value is `null`:

```json
{
    "@type" : "CitableUnit",
    "identifier" : "front",
    "level" : 1,
    "parent" : null,
    "citeType" : "front"
}
```
This means that the citable unit is at the most upper level in the citation tree.

The values of the `identifier` properties can be used to construct request urls to retrieve information about these parts of the text. For example, the request URL to retrieve the top structural divisions of the front matter can be constructed by using the value of the property identifier of the first included citable unit `front` as the value of the query parameter `ref` (`ref=front`) and setting the parameter `down` to the value `1` to tell the API to include the nested structural segments one level down in the citation tree:

In [51]:
# list the structural elements in the front matter

resource = "https://staging.dracor.org/id/ger000001"
ref = "front"
down = "1"
request_url = f"{api_base}dts/navigation?resource={resource}&ref={ref}&down={down}"
r = requests.get(request_url)
response_data = r.json()
response_data["member"]

[{'identifier': 'front/div[1]',
  '@type': 'CitableUnit',
  'dublinCore': {'title': 'Johann Wolfgang Goethe'},
  'level': 2,
  'parent': 'front',
  'citeType': 'front'},
 {'identifier': 'front/div[2]',
  '@type': 'CitableUnit',
  'level': 2,
  'parent': 'front',
  'citeType': 'dramatis_personae'},
 {'identifier': 'front/set[1]',
  '@type': 'CitableUnit',
  'level': 2,
  'parent': 'front',
  'citeType': 'setting'}]

The front matter of the play contains three structural parts that can be adressed and cited via DTS by using the identifiers included in the property `identifier`. For example using `front/div[2]` can be used to adress the dramatis personae of the play. 

The navigation endpoint can be used for that as is shown in the code cell below:

In [52]:
# Use the identifier of the structural division representing the 
# front matter as the value of the query parameter ref

resource = "https://staging.dracor.org/id/ger000001"
ref = "front/div[2]"
request_url = f"{api_base}dts/navigation?resource={resource}&ref={ref}"
r = requests.get(request_url)
response_data = r.json()
response_data

{'@context': 'https://distributed-text-services.github.io/specifications/context/1-alpha1.json',
 '@type': 'Navigation',
 'ref': {'identifier': 'front/div[2]',
  '@type': 'CitableUnit',
  'level': 2,
  'parent': 'front',
  'citeType': 'dramatis_personae'},
 'dtsVersion': 'unstable',
 'resource': {'citationTrees': [{'@type': 'CitationTree',
    'maxCiteDepth': 4,
    'citeStructure': [{'@type': 'CiteStructure',
      'citeType': 'front',
      'citeStructure': [{'@type': 'CiteStructure', 'citeType': 'front'},
       {'@type': 'CiteStructure', 'citeType': 'dramatis_personae'},
       {'@type': 'CiteStructure', 'citeType': 'setting'}]},
     {'@type': 'CiteStructure',
      'citeType': 'body',
      'citeStructure': [{'@type': 'CiteStructure',
        'citeType': 'act',
        'citeStructure': [{'@type': 'CiteStructure',
          'citeType': 'scene',
          'citeStructure': [{'@type': 'CiteStructure', 'citeType': 'speech'},
           {'@type': 'CiteStructure', 'citeType': 'stage_dir

A link to cite the dramatis personae of the play would thus be [`https://staging.dracor.org/api/v1/dts/navigation?resource=https://staging.dracor.org/id/ger000001&ref=front/div[2]`](https://staging.dracor.org/api/v1/dts/navigation?resource=https://staging.dracor.org/id/ger000001&ref=front/div[2]). 

The fragment identifier `front/div[2]` can also be used to retrieve the acutal text from the Document endpoint (which will be shown later).

The Navigation endpoint can also be used to fetch the citable units down from a certain element. To understand what is available we can have a look at the citation tree:

In [53]:
# Get the first citation tree from the Resource object included in the response

response_data["resource"]["citationTrees"][0]

{'@type': 'CitationTree',
 'maxCiteDepth': 4,
 'citeStructure': [{'@type': 'CiteStructure',
   'citeType': 'front',
   'citeStructure': [{'@type': 'CiteStructure', 'citeType': 'front'},
    {'@type': 'CiteStructure', 'citeType': 'dramatis_personae'},
    {'@type': 'CiteStructure', 'citeType': 'setting'}]},
  {'@type': 'CiteStructure',
   'citeType': 'body',
   'citeStructure': [{'@type': 'CiteStructure',
     'citeType': 'act',
     'citeStructure': [{'@type': 'CiteStructure',
       'citeType': 'scene',
       'citeStructure': [{'@type': 'CiteStructure', 'citeType': 'speech'},
        {'@type': 'CiteStructure', 'citeType': 'stage_direction'}]}]}]}]}

In [54]:
# Visualize the cite structure 
# using the designated fuction cite_structure_to_tree developed in an example above:

tree = cite_structure_to_tree(response_data["resource"]["citationTrees"][0]["citeStructure"][1])
tree.show()

Body
└── Act
    └── Scene
        ├── Speech
        └── Stage direction



We could now list the top level structural elements of the text proper – the acts by setting the parameter value of `ref` to the identifier of the text proper (`body`) and going one level down by setting `down` to `1`:

In [55]:
# list the scenes in the second act

resource = "https://staging.dracor.org/id/ger000001"
ref = "body"
down = "1"
request_url = f"{api_base}dts/navigation?resource={resource}&ref={ref}&down={down}"
r = requests.get(request_url)
response_data = r.json()

# Output only the member part
response_data["member"]

[{'identifier': 'body/div[1]',
  '@type': 'CitableUnit',
  'dublinCore': {'title': 'Erster Aufzug'},
  'level': 2,
  'parent': 'body',
  'citeType': 'act'},
 {'identifier': 'body/div[2]',
  '@type': 'CitableUnit',
  'dublinCore': {'title': 'Zweiter Aufzug'},
  'level': 2,
  'parent': 'body',
  'citeType': 'act'},
 {'identifier': 'body/div[3]',
  '@type': 'CitableUnit',
  'dublinCore': {'title': 'Dritter Aufzug'},
  'level': 2,
  'parent': 'body',
  'citeType': 'act'},
 {'identifier': 'body/div[4]',
  '@type': 'CitableUnit',
  'dublinCore': {'title': 'Vierter Aufzug'},
  'level': 2,
  'parent': 'body',
  'citeType': 'act'},
 {'identifier': 'body/div[5]',
  '@type': 'CitableUnit',
  'dublinCore': {'title': 'Fünfter Aufzug'},
  'level': 2,
  'parent': 'body',
  'citeType': 'act'}]

By starting at the second act with the identifier `body/div[2]` we can list the structural elements one level down – the scenes of the second act as is shown in the next cell:

In [56]:
# list the scenes in the second act

resource = "https://staging.dracor.org/id/ger000001"
ref = "body/div[2]"
down = "1"
request_url = f"{api_base}dts/navigation?resource={resource}&ref={ref}&down={down}"
r = requests.get(request_url)
response_data = r.json()
response_data["member"]

[{'identifier': 'body/div[2]/div[1]',
  '@type': 'CitableUnit',
  'dublinCore': {'title': 'Erster Auftritt'},
  'level': 3,
  'parent': 'body/div[2]',
  'citeType': 'scene'},
 {'identifier': 'body/div[2]/div[2]',
  '@type': 'CitableUnit',
  'dublinCore': {'title': 'Zweiter Auftritt'},
  'level': 3,
  'parent': 'body/div[2]',
  'citeType': 'scene'}]

There are two scenes in the second act (`body/div[2]`) which are here labeled "Auftritt" (appearance). 

The DTS Navigation endpoint can be used to fetch the scenes and also include the next level, i.e. the speech acts and stage directions by providing a value of `2` in the parameter `down`. We only show the first 6 items included as memebers.

In [57]:
# list the scenes in the second act and include speeches and stage directions
# We include only the first 6 items [0:5]

resource = "https://staging.dracor.org/id/ger000001"
ref = "body/div[2]"
down = "2"
request_url = f"{api_base}dts/navigation?resource={resource}&ref={ref}&down={down}"
r = requests.get(request_url)
response_data = r.json()
# We restrict the output to the first five citeable units
response_data["member"][0:5]

[{'identifier': 'body/div[2]/div[1]',
  '@type': 'CitableUnit',
  'dublinCore': {'title': 'Erster Auftritt'},
  'level': 3,
  'parent': 'body/div[2]',
  'citeType': 'scene'},
 {'identifier': 'body/div[2]/div[1]/stage[1]',
  '@type': 'CitableUnit',
  'extensions': {'@context': 'https://raw.githubusercontent.com/dracor-org/dracor-ontology/refs/heads/main/json-ld-contexts/dts-extension-context.json',
   'snippet': 'Orest. Pylades.'},
  'level': 4,
  'parent': 'body/div[2]/div[1]',
  'citeType': 'stage_direction'},
 {'identifier': 'body/div[2]/div[1]/sp[1]',
  '@type': 'CitableUnit',
  'extensions': {'@context': 'https://raw.githubusercontent.com/dracor-org/dracor-ontology/refs/heads/main/json-ld-contexts/dts-extension-context.json',
   'speakers': ['orest'],
   'snippet': 'Es ist der Weg des [...] Furcht.'},
  'level': 4,
  'parent': 'body/div[2]/div[1]',
  'citeType': 'speech'},
 {'identifier': 'body/div[2]/div[1]/sp[2]',
  '@type': 'CitableUnit',
  'extensions': {'@context': 'https://ra

The citable units representing the individual speech acts include addtional DraCor-specific metadata in the `extensions` field:

In [58]:
response_data["member"][3]["extensions"]

{'@context': 'https://raw.githubusercontent.com/dracor-org/dracor-ontology/refs/heads/main/json-ld-contexts/dts-extension-context.json',
 'speakers': ['pylades'],
 'snippet': 'Ich bin noch nicht, Orest, [...] wähnt.'}

The citable unit with the identifier (property `identifier`) of `body/div[2]/div[1]/sp[2]` as is shown in the example above, contains additional information on which character is speaking (property `speakers`) and includes a small portion of the text (property `snippet`). This text snippet consists of the first 5 word tokens of a speech, three dots to mark an elipsis and the last word token. The snippet can help to identify a certain speech that should be further referenced; the information on the speaking characters can allow for filtering for certain speeches by a character. 

Using the parameter `down` with a value of `-1` allows for retrieving all citable units in a certain segment of the text down to the lowest level of the citation tree. We can use to select a certain structural segment as the starting point (parameter `ref`) or don't include this parameter in the request to get all citable units of the whole play by providing the parameter `down` only.

The following request will fetch all citable units of the play; we don't output them, but count them:

In [59]:
# Get all citeable units of a play and count

resource = "https://staging.dracor.org/id/ger000001"
down = "-1"
request_url = f"{api_base}dts/navigation?resource={resource}&down={down}"
r = requests.get(request_url)
response_data = r.json()

# We only print the number of citeable units included in the play and don't output them
num_citeable_units = len(response_data["member"])
print(f"The API returned information on {num_citeable_units} citeable units.")

The API returned information on 354 citeable units.


#### Using the Navigation endpoint for analysis

We already argued that the `CitationTree` object can be repurposed to get information about the general structuring of a play, e.g. if a play is structured into acts only, or acts and scenes, if a play contains a dramatis personae in the front matter, and so on. The reliability of these information depends on the implementation of the functionality that computes the `CitationTree` based on the combination of TEI-XML elements like `<div>` and certain attributes (for details see the code of the implementation, esp. the xQuery function [`local:generate-citeStructure`](https://github.com/dracor-org/dracor-api/blob/a8ec727c13b3a8d43da298b77ac989572c5c20a4/modules/dts.xqm#L719-L870) which includes a lot hof hard-coded logic).

The endpoints of the regular DraCor API already provide some information about the segmentation of a play. Foremost it is the count based metrics that provide insight into the number of acts and segments a play is divided into (API Feature P37 [play_num_of_segments](https://staging.dracor.org/doc/odd#play_num_of_segments), API Feature P38 [play_num_of_acts](https://staging.dracor.org/doc/odd#play_num_of_acts)). These information is not provided by endpoints that return information on an individual play like `api/v1/corpora/{corpusname}/play/{playname}` or `api/v1/corpora/{corpusname}/play/{playname}/metrics` but needs to be extracted from the “metadata table” of a corpus or from the list of plays included in the response of the endpoint `api/v1/corpora/{corpusname}/metadata`. For the *German Drama Corpus* the data can be retrieved via 

[https://staging.dracor.org/api/v1/corpora/ger/metadata/csv](https://staging.dracor.org/api/v1/corpora/ger/metadata/csv)

The data returned by `api/v1/corpora/{corpusname}/plays/{playname}` includes a JSON array in the field with the key `segments` that lists the individual structural segments that are the basis for generating the co-presence networks. To get this data for the play “Iphigenie auf Tauris” the following request URL can be used:

[https://staging.dracor.org/api/v1/corpora/ger/plays/goethe-iphigenie-auf-tauris](https://staging.dracor.org/api/v1/corpora/ger/plays/goethe-iphigenie-auf-tauris)

By combining the information gathered from requests to both endpoints mentioned above for the given play, a user can find out the following:

* there are "acts" in the play, because the value of the feature `play_num_of_acts` is greater than `0` or not `null`,
* the play is divided into 5 acts,
* there are 20 segments in the play,
* these segments are of the type "scene", and,
* which character(s) are speaking in which of these segments.

With some string analyzing it would be possible to retrieve the information which segment is included in which act because the values of title are constructed following the pattern `{title of act} | {title of scene}` relying on the text content of the element `<tei:head>` in the TEI-XML source.

The same information can be retrieved by the DTS API using the Navigation endpoint. If some client side filtering logic is applied later on it is sufficient to make a single request to the API at

[https://staging.dracor.org/api/v1/dts/navigation?resource=https://staging.dracor.org/id/ger000001&ref=body&down=-1](https://staging.dracor.org/api/v1/dts/navigation?resource=https://staging.dracor.org/id/ger000001&ref=body&down=-1)

The code cells below demonstrate how this information can be retrieved via the regular API endpoints and compare the performance to retrieving it via the DTS Navigation endpoint. 

First, the regular API approach. The operation time can be determined using the cell magic `%%time`. 

In [60]:
%%time

# Regular API approach

# Get the metadata on all plays in the German Drama Corpus. 
# This is a JSON serialization of the Metadata table (https://staging.dracor.org/api/v1/corpora/ger/metadata/csv)

request_url = "https://staging.dracor.org/api/v1/corpora/ger/metadata"
r = requests.get(request_url)
gerdracor_data = r.json()

# we need to filter all plays for a play with the playname "goethe-iphigenie-auf-tauris". 
# The filtering returns a list. We know there is only one play, so we take the first (and only) item.
iphigenie_metadata = list(filter(lambda play: play["name"] == "goethe-iphigenie-auf-tauris", gerdracor_data))[0]

CPU times: user 33.2 ms, sys: 7.13 ms, total: 40.4 ms
Wall time: 40.1 s


In [61]:
#iphigenie_metadata
print(f"Number of acts: {iphigenie_metadata["numOfActs"]}")
print(f"Number of segments: {iphigenie_metadata["numOfSegments"]}")

Number of acts: 5
Number of segments: 20


There is already some overhead of retrieving the information. For a reasonably big corpus as the *German Drama Corpus*, we need to fetch the data of all included plays (which takes some time, in this case almost 40 seconds) and then use some sort of filtering mechanism to just get the data on the single play.

As has been said, the endpoint `corpora/{corpusname}/plays/{playname}` (see [API Documentation](https://staging.dracor.org/doc/api#/public/play-info)) includes a JSON array in the field `segments` that lists the individual structural segments that are the basis for generating the co-presence networks.

The following code cell retrieves this information for the play "Goethe: Iphigenie auf Tauris" via the regular API:

In [62]:
%%time

request_url = "https://staging.dracor.org/api/v1/corpora/ger/plays/goethe-iphigenie-auf-tauris"
r = requests.get(request_url)
api_response_data = r.json()
api_segments = api_response_data["segments"]

CPU times: user 11 ms, sys: 3.37 ms, total: 14.3 ms
Wall time: 208 ms


In [63]:
# Output the first five segments included
api_segments[0:4]

[{'type': 'scene',
  'title': 'Erster Aufzug | Erster Auftritt',
  'speakers': ['iphigenie'],
  'number': 1},
 {'type': 'scene',
  'title': 'Erster Aufzug | Zweiter Auftritt',
  'speakers': ['arkas', 'iphigenie'],
  'number': 2},
 {'type': 'scene',
  'title': 'Erster Aufzug | Dritter Auftritt',
  'speakers': ['iphigenie', 'thoas'],
  'number': 3},
 {'type': 'scene',
  'title': 'Erster Aufzug | Vierter Auftritt',
  'speakers': ['iphigenie'],
  'number': 4}]

In [64]:
# Test if the number of segments as returned by the metadata endpoint is the same as the count of the segments included in the response above
assert len(api_segments) == iphigenie_metadata["numOfSegments"], "The numbers of segments returned by the API do not match."

print(f"There are {len(api_segments)} segments in the play.")

There are 20 segments in the play.


The same information can be retrieved by the DTS API using the Navigation endpoint, as is demonstrated in the following cells. If some client side filtering logic is applied later on it is sufficient to make a single request to the API:

In [65]:
%%time
# Get all citeable units in the body of a play 
resource = "https://staging.dracor.org/id/ger000001"
ref = "body"
down = "-1"
request_url = f"{api_base}dts/navigation?resource={resource}&ref={ref}&down={down}"
r = requests.get(request_url)
response_data = r.json()
# We only print the number of citeable units included in the play and don't output them
citeable_units = response_data["member"]

CPU times: user 14.5 ms, sys: 4.13 ms, total: 18.6 ms
Wall time: 1.15 s


In [66]:
# display the first 5 citeable units
citeable_units[0:4]

[{'identifier': 'body/div[1]',
  '@type': 'CitableUnit',
  'dublinCore': {'title': 'Erster Aufzug'},
  'level': 2,
  'parent': 'body',
  'citeType': 'act'},
 {'identifier': 'body/div[1]/div[1]',
  '@type': 'CitableUnit',
  'dublinCore': {'title': 'Erster Auftritt'},
  'level': 3,
  'parent': 'body/div[1]',
  'citeType': 'scene'},
 {'identifier': 'body/div[1]/div[1]/sp[1]',
  '@type': 'CitableUnit',
  'extensions': {'@context': 'https://raw.githubusercontent.com/dracor-org/dracor-ontology/refs/heads/main/json-ld-contexts/dts-extension-context.json',
   'speakers': ['iphigenie'],
   'snippet': 'Heraus in eure Schatten, rege [...] Tode!'},
  'level': 4,
  'parent': 'body/div[1]/div[1]',
  'citeType': 'speech'},
 {'identifier': 'body/div[1]/div[2]',
  '@type': 'CitableUnit',
  'dublinCore': {'title': 'Zweiter Auftritt'},
  'level': 3,
  'parent': 'body/div[1]',
  'citeType': 'scene'}]

To get the acts and segments (in this case "scenes") we need two lambda functions that could be used in filtering:

In [67]:
act_filter = lambda citeable_unit: citeable_unit["citeType"] == "act"
scene_filter = lambda citeable_unit: citeable_unit["citeType"] == "scene"

dts_acts = list(filter(act_filter, citeable_units))
dts_scenes = list(filter(scene_filter, citeable_units))

print(f"There are {len(dts_acts)} acts and {len(dts_scenes)} scenes (segments).")

There are 5 acts and 20 scenes (segments).


We can check if the data returned from the regular API endpoints and the DTS API matches:

In [68]:
# check if these are the same values
# This should not throw an assertion error
assert len(dts_acts) == iphigenie_metadata["numOfActs"], "The number of acts does not match."
assert len(dts_scenes) == iphigenie_metadata["numOfSegments"], "The number of segments does not match."

The following code cells display the first scene retrieved from the regular API endpoint and the citeable unit retrieved from the DTS Navigation endpoint:

In [69]:
# First Segment via regular API
api_segments[0]

{'type': 'scene',
 'title': 'Erster Aufzug | Erster Auftritt',
 'speakers': ['iphigenie'],
 'number': 1}

In [70]:
# Same segment as citeable unit
dts_scenes[0]

{'identifier': 'body/div[1]/div[1]',
 '@type': 'CitableUnit',
 'dublinCore': {'title': 'Erster Auftritt'},
 'level': 3,
 'parent': 'body/div[1]',
 'citeType': 'scene'}

The data included in the segment returned by regular the API can also be retrieved via the DTS endpoint: The value of the property `number` (see API Feature [segment_number](https://staging.dracor.org/doc/odd#segment_number)) can is the position of the element in the list `dts_scenes` that was extracted from the Navigation response. We can write that information into the dictionary (as a property of the extensions object), if necessary (see `segment_number`):

In [71]:
for index, item in enumerate(dts_scenes):
    # we need to create the extensions object in each scene
    item["extensions"] = {}
    item["extensions"]["segment_number"] = index + 1

In [72]:
dts_scenes[0]

{'identifier': 'body/div[1]/div[1]',
 '@type': 'CitableUnit',
 'dublinCore': {'title': 'Erster Auftritt'},
 'level': 3,
 'parent': 'body/div[1]',
 'citeType': 'scene',
 'extensions': {'segment_number': 1}}

In the following code cells the speaker information is retrieved and added. We use a scene with multiple speakers as the example:

In [73]:
# Use a scene with several characters as the next example:
api_segments[18]

{'type': 'scene',
 'title': 'Fünfter Aufzug | Fünfter Auftritt',
 'speakers': ['pylades', 'arkas', 'thoas', 'orest'],
 'number': 19}

In [74]:
# the same data in the DTS example:
dts_scenes[18]

{'identifier': 'body/div[5]/div[5]',
 '@type': 'CitableUnit',
 'dublinCore': {'title': 'Fünfter Auftritt'},
 'level': 3,
 'parent': 'body/div[5]',
 'citeType': 'scene',
 'extensions': {'segment_number': 19}}

The speakers must be aggregated from the individual character identifiers listed in the `speakers` property of each citable unit representing a speech act. We can either use all citable units already downloaded, or, as is demonstrated in the next cell, use the DTS API to get this information in separate API calls (which, obviously, is slower). The following cell demonstrates this for the 19th segment ("Fünfter Aufzug | Fünfter Auftritt" [Fifth act | Fifth scene]) with the identifier `body/div[5]/div[5]` (used as value of the parameter `ref`):

In [75]:
# Get all citeable units in scene 5 of act 5
resource = "https://staging.dracor.org/id/ger000001"
ref = "body/div[5]/div[5]"
down = "1" # one level down are the speech acts and stage directions
request_url = f"{api_base}dts/navigation?resource={resource}&ref={ref}&down={down}"
r = requests.get(request_url)
response_data = r.json()

dts_distinct_speakers = []
for item in response_data["member"]:
    if item["citeType"] == "speech":
       for speaker in item["extensions"]["speakers"]:
           if speaker not in dts_distinct_speakers:
               dts_distinct_speakers.append(speaker)

In [76]:
dts_distinct_speakers

['pylades', 'arkas', 'thoas', 'orest']

In [77]:
# This should not throw an assertion error:
assert dts_distinct_speakers == api_segments[18]["speakers"], "The speakers don't match."

The data on the speakers that can be retrieved via the DTS Navigation endpoint matches the information that is returned by the regular DraCor API endpoint. The last information that is included in the response of the `corpora/{corpusname}/plays/{playname}` endpoint is the constructed title. In case of the segment used as an example above it is `'Fünfter Aufzug | Fünfter Auftritt'`. 
This value can be constructed when evaluating the title information included in the Dublin Core `MetadataObject` of the current citable unit and combined with the title of the parent citable unit. Because we have the information on the parent present, it is not difficult to retrieve this data via the DTS API. 

This is done in the following code cells:

In [78]:
scene_title = dts_scenes[18]["dublinCore"]["title"]

print(f"The title of the citable unit is: {scene_title}")
print(f"The ID of the parent citable unit is: {dts_scenes[18]["parent"]}")

# Get the parent:
resource = "https://staging.dracor.org/id/ger000001"
# Use the identifier of the parent in the field `parent`
ref = dts_scenes[18]["parent"]
# parameter down is not needed, because we intend to retrieve the object linked to by ref from the Navigation object
request_url = f"{api_base}dts/navigation?resource={resource}&ref={ref}"
r = requests.get(request_url)
response_data = r.json()

parent = response_data["ref"]
parent_title = parent["dublinCore"]["title"]
print(f"The title of the parent is: {parent_title}")

# Construct the title as is returned in the response of the DraCor API corpora/{corpusname}/plays/{playname} endpoint
constructed_segment_title = f"{parent_title} | {scene_title}"

# Test if the titles match
# This should not trow an assertion error!
assert constructed_segment_title == api_segments[18]["title"], "Titles do not match."

print(f"The constructed title is: {constructed_segment_title}")

The title of the citable unit is: Fünfter Auftritt
The ID of the parent citable unit is: body/div[5]
The title of the parent is: Fünfter Aufzug
The constructed title is: Fünfter Aufzug | Fünfter Auftritt


The examples above demonstrated that it is possible to retrieve the information on the structuring of a play returned by various endpoints of the DraCor API using the DTS Navigation endpoint. The functionalities of the endpoints complement each other, and, while the results are the same, the means of getting them differ greatly in performance and amount of necessary client-side post-processing. 

Depending on the endpoint and the strategy choosen different perfomances are possible. In general, fetching data from the API is more resource intensive and, in some cases, takes a long time. For example, when a user wants to get the infomation on the number of acts and scenes of a single play only, getting this information from the `corpora/{corpusname}` endpoint took almost 40 seconds. This is because the endpoint returns data on the whole corpus. In contrast, fetching the data from the DTS Navigation endpoint took only 1.5 seconds, but produces an overhead client side because the values need to be computed afterwards.

In other cases, retrieving the same information from the DTS Navigation endpoint was realized by issuing multiple calls to the API, e.g. when retrieving and producing the same information on the segments of a play that is aleady returned ready to use from the `corpora/{corpusname}/plays/{playname}` endpoint. The same segment information can also be generated client-side based on a single API call, i.e. [https://staging.dracor.org/api/v1/dts/navigation?resource=https://staging.dracor.org/id/ger000001&down=-1](https://staging.dracor.org/api/v1/dts/navigation?resource=https://staging.dracor.org/id/ger000001&down=-1), but makes some data wrangling necessary:

In [79]:
%%time

# Get all citable units:
resource = "https://staging.dracor.org/id/ger000001"
down = "-1" 
request_url = f"{api_base}dts/navigation?resource={resource}&down={down}"
print(request_url)
r = requests.get(request_url)
response_data = r.json()
citable_units = response_data["member"]

# Set up a list that will hold the constructed dictionaries
# that resemble the data returned by the regular API
api_like_segments = []

citable_unit_counter = 0

for citable_unit in citable_units:
    # Do this for scenes!
    if citable_unit["citeType"] == "scene":
        citable_unit_counter += 1
        # initialize empty dictionary = segment
        segment = {}
        # set the property type
        segment["type"] = "scene"
        # add the property number
        segment["number"] = citable_unit_counter

        # There is a bug in the DTS API, which I still have to fix, see
        # https://github.com/dracor-org/dracor-api/issues/298
        # When there is only one element in member, it is acutally not a list
        # construct the title by getting the parent
        parent_citable_unit = list(filter(lambda item: item["identifier"] == citable_unit["parent"] , citable_units))[0]
        parent_title = parent_citable_unit["dublinCore"]["title"]
        #print(parent_title)
        self_title = citable_unit["dublinCore"]["title"]
        segment_title = f"{parent_title} | {self_title}"

        segment["title"] = segment_title

        # get the speakers
        # first compile a list of all children speeches:
        children_speeches = list(filter(lambda item: (item["parent"] == citable_unit["identifier"] and item["citeType"] == "speech") , citable_units))
        distinct_speakers = []

        for child in children_speeches:
            for speaker in child["extensions"]["speakers"]:
                if speaker not in distinct_speakers:
                    distinct_speakers.append(speaker)

        segment["speakers"] = distinct_speakers

        api_like_segments.append(segment)

act_filter = lambda citable_unit: citable_unit["citeType"] == "act"
scene_filter = lambda citable_unit: citable_unit["citeType"] == "scene"

dts_acts = list(filter(act_filter, citable_units))
dts_scenes = list(filter(scene_filter, citable_units))

print(f"There are {len(dts_acts)} acts and {len(dts_scenes)} scenes (segments).")

https://staging.dracor.org/api/v1/dts/navigation?resource=https://staging.dracor.org/id/ger000001&down=-1
There are 5 acts and 20 scenes (segments).
CPU times: user 10.2 ms, sys: 3.34 ms, total: 13.5 ms
Wall time: 1.14 s


In [80]:
# Output some of the constructed regular API-like segments
api_like_segments[0:3]

[{'type': 'scene',
  'number': 1,
  'title': 'Erster Aufzug | Erster Auftritt',
  'speakers': ['iphigenie']},
 {'type': 'scene',
  'number': 2,
  'title': 'Erster Aufzug | Zweiter Auftritt',
  'speakers': ['arkas', 'iphigenie']},
 {'type': 'scene',
  'number': 3,
  'title': 'Erster Aufzug | Dritter Auftritt',
  'speakers': ['iphigenie', 'thoas']}]

Using DTS generating the segment and the count-based measures takes 1.35 seconds altogether and includes only a single API call to the Navigation endpoint. Relying on the regular API endpoints, retrieving information on the number of acts as well as the individual segment information involves two API calls, of which one alone takes approx. 40 seconds.

Because the data returned Navigation endpoint include rich information on a play's structure, it can be used to produce information on the internal structure of individual parts of a play. The following code snippets demonstrate this:

In [81]:
print("How many scenes are there in the third act?")

# How many scenes are there in the third act (ref=body/div[3])
# Get all citeable units:
resource = "https://staging.dracor.org/id/ger000001"
ref = "body/div[3]"
down = "1" 
request_url = f"{api_base}dts/navigation?resource={resource}&ref={ref}&down={down}"
r = requests.get(request_url)
response_data = r.json()
# these are probably the scenes already, but in theory, there could be a stage direction as well
citable_units = response_data["member"]

scene_filter = lambda citable_unit: citable_unit["citeType"] == "scene"
scenes = list(filter(scene_filter, citable_units))

print(f"There are {len(scenes)} scenes.")

How many scenes are there in the third act?
There are 3 scenes.


In [82]:
print("How many stage directions are there in the third act?")

# How many scenes are there in the third act (ref=body/div[3])
# Get all citable units:
resource = "https://staging.dracor.org/id/ger000001"
ref = "body/div[3]"
down = "-1" 
request_url = f"{api_base}dts/navigation?resource={resource}&ref={ref}&down={down}"
r = requests.get(request_url)
response_data = r.json()
# these are probably the scenes already, but in theory, there could be a stage direction as well
citable_units = response_data["member"]

stage_direction_filter = lambda citable_unit: citable_unit["citeType"] == "stage_direction"
stage_directions = list(filter(stage_direction_filter, citable_units))

print(f"There are {len(stage_directions)} stage directions.")

How many stage directions are there in the third act?
There are 8 stage directions.


In [83]:
# an example of a stage direction
stage_directions[0]

{'identifier': 'body/div[3]/div[1]/stage[1]',
 '@type': 'CitableUnit',
 'extensions': {'@context': 'https://raw.githubusercontent.com/dracor-org/dracor-ontology/refs/heads/main/json-ld-contexts/dts-extension-context.json',
  'snippet': 'Iphigenie. Orest.'},
 'level': 4,
 'parent': 'body/div[3]/div[1]',
 'citeType': 'stage_direction'}

The stage directions that are included inside the spoken text (xPath: `//sp[.//stage]`) are not adressable via the DraCor DTS implementation. 

See the TEI example below including two types of stage directions: a separate stage direction including the names of the characters (`<stage>Iphigenie. Pylades.</stage>`), followed by a speech act with an "inline" stage direction, that can't be addressed (`<stage>Sie nimmt ihm die Ketten ab.</stage>`):

```xml
<stage>Iphigenie. Pylades.</stage>
<sp who="#iphigenie">
  <speaker>IPHIGENIE.</speaker>
  <lg>
    <l>Woher du seist und kommst, o Fremdling, sprich!</l>
    <l>Mir scheint es, daß ich eher einem Griechen</l>
    <l>Als einem Skythen dich vergleichen soll.</l>
  </lg>
  <stage>Sie nimmt ihm die Ketten ab.</stage>
  <lg>
    <l>Gefährlich ist die Freiheit, die ich gebe;</l>
    <l>Die Götter wenden ab, was euch bedroht!</l>
  </lg>
</sp>
```

Unsing the DTS Navigation endpoint at https://staging.dracor.org/api/v1/dts/navigation?resource=https://staging.dracor.org/id/ger000001&down=-1 and searching (Strg+F) for the string `Woher du seist und kommst` allows to find the `snippet` of citeable unit with the identifier `body/div[2]/div[2]/sp[1]` (https://staging.dracor.org/api/v1/dts/navigation?resource=https://staging.dracor.org/id/ger000001&ref=body/div[2]/div[2]/sp[1]). 

In [84]:
# We can also code that "searches" by filtering on the snippet; we re-use the citable units (citable_units)
snippet_search_string = "Woher du seist und kommst"

# Get all citable units of a play via the Navigation endpoint
resource = "https://staging.dracor.org/id/ger000001"
ref = "body"
down = "-1" 
request_url = f"{api_base}dts/navigation?resource={resource}&ref={ref}&down={down}"
r = requests.get(request_url)
response_data = r.json()
# these are probably the scenes already, but in theory, there could be a stage direction as well
citable_units = response_data["member"]

# only speech acts have a snippet, so we need to filter here, otherwise we would have to handle key errors
speech_filter = lambda citable_unit: citable_unit['citeType'] == 'speech'
speech_acts = list(filter(speech_filter, citable_units))

# filter for the search string
snippet_filter = lambda citable_unit: snippet_search_string in citable_unit["extensions"]["snippet"]
results = list(filter(snippet_filter, speech_acts))

#display the results
results

[{'identifier': 'body/div[2]/div[2]/sp[1]',
  '@type': 'CitableUnit',
  'extensions': {'@context': 'https://raw.githubusercontent.com/dracor-org/dracor-ontology/refs/heads/main/json-ld-contexts/dts-extension-context.json',
   'speakers': ['iphigenie'],
   'snippet': 'Woher du seist und kommst, [...] bedroht!'},
  'level': 4,
  'parent': 'body/div[2]/div[2]',
  'citeType': 'speech'}]

In contrast to the DraCor DTS API implementation, the regular endpoint `corpora/{corpusname}/plays/{playname}/stage-directions-with-speakers` (see [API Documentation](https://staging.dracor.org/doc/api#/public/play-stage-directions-with-speakers)) includes the nested stage directions as well.

In [85]:
# Get the text data from the endpoint corpora/{corpusname}/plays/{playname}/stage-directions-with-speakers
request_url = "https://staging.dracor.org/api/v1/corpora/ger/plays/goethe-iphigenie-auf-tauris/stage-directions-with-speakers"
r = requests.get(request_url)
data = r.text
# split the text into individual stage directions
with_speakers_stage_directions = data.split("\n")

# iterate over the stage directions and return a result: either the first string is contained in the stage direction, or
# the text of the stage direction ist the second string
# in a more general function, this would not be hardcoded, but this is for demonstration purposes only:

for stage_direction in with_speakers_stage_directions:
    if "Sie nimmt ihm die Ketten ab" in stage_direction or "Iphigenie. Pylades." == stage_direction:
        print(f"Found: {stage_direction}")

Found: Iphigenie. Pylades.
Found: IPHIGENIE.  Sie nimmt ihm die Ketten ab.
Found: Iphigenie. Pylades.


In the examples discussed, we demonstrated that the DTS Navigation endpoint is capable of providing detailed information about the structure of a play, allowing for fine-grained insights into its organization. However, in its current implementation, this functionality is limited, as it can only address citable units down to a certain hierarchical level. This limitation means that more granular structural units, such as nested stage directions or individual verse lines within spoken text, are not yet addressable through the Navigation endpoint.

To fully leverage the potential of the DTS Navigation endpoint for the analysis of Drama, there is a need to extend its capabilities to include also deeper nested structural units. By making elements like individual verse lines and nested stage directions addressable, researchers would gain the ability to analyze and navigate plays at an even more detailed level. 

We have also seen that there is a need for client-side processing, such as assembling the speakers of a structural unit (like a scene or act) by fetching or extracting the speakers from individual speech acts. This process would likely require adding information about speaking characters to structural units that currently do not include a `speaker` property in the `extensions` metadata object. By enhancing the DraCor-specific metadata to include this information, researchers would be better equipped to analyze and navigate plays at a more detailed level, without the need of much post-processing.

#### Citing plays with DTS

In the preceeding sections, we explored how the DTS Navigation endpoint can be effectively used for the structural analysis of a play, providing detailed insights into its organization and hierarchical components. However, beyond its analytical utility, the DTS Navigation endpoint offers an additional functionality that is central to the study of drama: the possibility to cite a specific part of a play.

One of core ideas behind DTS is to enhance the citability of digital texts, a feature particularly valuable in the realm of Digital Drama research. Unlike prose, where citations in the literature often rely on page numbers, drama research benefits from referencing the text's structural divisions. This allows for more precise citations, normally referencing a specific act, a scene within, and, if available the lines within a play. The fine-grained citation capability of the DraCor DTS implementation aligns with the traditional methods of referencing dramatic texts and thus, could support more nuanced scholarly analysis in Computational Literary Studies.

DraCor has already significantly advanced the citation of drama by enabling references to entire plays using unique identifiers (URIs). For example, the play "Iphigenie auf Tauris" can be cited using the URI `https://dracor.org/id/ger000001` because the DraCor identifier, the Play ID (Feature P2 [play_id](https://staging.dracor.org/doc/odd#play_id)) is universally unique across all DraCor corpora. Additionally, DraCor has introduced unique identifiers for characters within a single play, which, when combined with the play's identifier, create potentially universally unique identifiers as well. Although these character identifiers are not yet resolvable via the `/id/{id}` endpoint (see API [Documentation](https://staging.dracor.org/doc/api#/public/resolve-id)), they represent a step toward more precise and standardized citation practices in drama research.

The following examples illustrate how the DTS Navigation endpoint facilitates the citation of dramatic texts, showcasing its application in various research contexts. By leveraging the structural divisions inherent in dramatic works, researchers can cite specific elements with greater accuracy, enhancing the depth and clarity of their analyses.

The following examples are taken from the english translation of Manfred Pfister's important work "Das Drama" ("The Theory and Analysis of Drama", engl. translation by John Halliday), which has been foundational for the structuralist approach to drama studies and has inspired research in the Computational Literary Studies.

In the chapter on "dramatic irony" Pfister refers to a scene from the play *Shakespeare: Julius Caesar*:

> Thus, one of the most famous examples of irony in all drama - Antony's spoken refrain 'Brutus is an honourable man' in his funeral oration for the central figure in **Shakespeare's Julius Caesar (III, ii)** - has nothing to do with dramatic irony in the strict sense since it is a verbal utterance that the speaker intended to be ironic, and whose irony is intended to be, and indeed does become, obvious to the fictional receivers.

The reference "Shakespeare's Julius Caesar (III, ii)" in the text refers to the play [Julius Caesar](https://staging.dracor.org/shake/julius-caesar), which is contained in DraCor as the play with the identifier `shake000030`, although in a different edition stemming from the Folger Digital Library (for the integration of this corpus into DraCor and possible use cases in combining APIs see also also [CLS INFRA Deliverable D7.1](https://zenodo.org/records/7664964), chapter 6.1 “Relating APIs using OpenAPI: The Example of the Folger Shakespeare API”). 

In the reference the roman capital letters "III" stand for the third act, the small roman letters "ii" mean the second scene of the play. This scene of the play can be adressed via the DTS Navigation endpoint setting the value of the parameter `ref` to `body/div[3]/div[2]`, which results in the URL:

[https://staging.dracor.org/api/v1/dts/navigation?resource=https://staging.dracor.org/id/shake000030&ref=body/div[3]/div[2]](https://staging.dracor.org/api/v1/dts/navigation?resource=https://staging.dracor.org/id/shake000030&ref=body/div[3]/div[2])

The larger part of the speech of Anthony is starting with the Citable Unit with the identifier `body/div[3]/div[2]/sp[31]`, in which the phrase "Brutus is an honourable man" is frequently repeated. This citable unit can be addressed as such:

[https://staging.dracor.org/api/v1/dts/navigation?resource=https://staging.dracor.org/id/shake000030&ref=body/div[3]/div[2]/sp[31]](https://staging.dracor.org/api/v1/dts/navigation?resource=https://staging.dracor.org/id/shake000030&ref=body/div[3]/div[2]/sp[31])

In [86]:
# to get the text, we need to use the Document endpoint, to display the TEI snippet
request_url = "https://staging.dracor.org/api/v1/dts/document?resource=https://staging.dracor.org/id/shake000030&ref=body/div[3]/div[2]/sp[31]"
r = requests.get(request_url)
antony_speech = r.text

In [87]:
print(antony_speech)

<TEI xmlns="http://www.tei-c.org/ns/1.0">
    <dts:wrapper xmlns:dts="https://w3id.org/dts/api#">
        <sp xml:id="sp-1559" who="#Antony_JC">
            <speaker xml:id="spk-1559">ANTONY </speaker>
            <l xml:id="ftln-1559" n="3.2.82">Friends, Romans, countrymen, lend me your ears. </l>
            <l xml:id="ftln-1560" n="3.2.83">I come to bury Caesar, not to praise him. </l>
            <l xml:id="ftln-1561" n="3.2.84">The evil that men do lives after them; </l>
            <l xml:id="ftln-1562" n="3.2.85">The good is oft interrèd with their bones. </l>
            <l xml:id="ftln-1563" n="3.2.86">So let it be with Caesar. The noble Brutus </l>
            <l xml:id="ftln-1564" n="3.2.87">Hath told you Caesar was ambitious. </l>
            <l xml:id="ftln-1565" n="3.2.88">If it were so, it was a grievous fault, </l>
            <l xml:id="ftln-1566" n="3.2.89">And grievously hath Caesar answered it. </l>
            <l xml:id="ftln-1567" n="3.2.90">Here, under leave of B

In the example above, Pfister referred to a single scene without pointing to a certain speech act in particular, but there are also cases, in which there is a reference to a certain portion text below the scene level in another Shakespearean play:

>  [I]n the sphere of verbal communication there are speeches that have scarcely any novelty value for the fictional listener on stage, but which serve to clarify certain relationships for the audience [...] An example of this is Prospero's report on Ariel's past in **The Tempest (I,ii,250—93)**; this contains no information that could possibly be unfamiliar to Ariel, but it is new, and thus important, to the audience. (Pfister 2000: 40)

Obviously, the reference in the text refers to *Shakespeare: The Tempest* which is [contained in DraCor](https://staging.dracor.org/shake/the-tempest) as the play with the identifier `shake000001`. As in the example from Julius Ceasar, the reference `I,ii` refers to act and scene. In a request to the DTS Navigation endpoint setting the value of the parameter `ref` to have to be set to `body/div[1]/div[2]` for retrieving information on the given citable unit:

[https://staging.dracor.org/api/v1/dts/navigation?resource=https://staging.dracor.org/id/shake000001&ref=body/div[1]/div[2]](https://staging.dracor.org/api/v1/dts/navigation?resource=https://staging.dracor.org/id/shake000001&ref=body/div[1]/div[2])

To list all speeches in this section, we can use the parameter `down` with a value of `1` to get the speeches included in the member property:

[https://staging.dracor.org/api/v1/dts/navigation?resource=https://staging.dracor.org/id/shake000001&ref=body/div[1]/div[2]&down=1](https://staging.dracor.org/api/v1/dts/navigation?resource=https://staging.dracor.org/id/shake000001&ref=body/div[1]/div[2]&down=1)

Beyond that, really identifying the the actual part of the scene, Pfister has in mind, is tricky, because the lines referenced with the numbers`250—93` are not addressable with the DraCor DTS implementation.

Below is another example that does not include a bibliographic reference, but includes a literal citation of a parts of the play "The Cherry Orchard" by Anto Chekhov (cf. Pfister 2000: 15–16):

> Stage-directions are not restricted to the secondary text, however; they may also be found, implicitly, in the primary text - as the following extract from Chekhov's Cherry Orchard demonstrates:

> LOPAKHIN: What's the matter, Dooniasha?
>
> DOONIASHA: My hands are trembling. I feel as if I'm going to faint.
>
> LOPAKHIN: You're too refined and sensitive, Dooniasha. You dressyourself up like a lady, and you do your hair like one too.

The source of this citation is included in endnote 7 "A. Chekhov (1954) 334. [...]" which is according to the bibliography "Chekhov, A. The Cherry Orchard, in: Plays, transl. and with introd. by E. Feu (Harmondsworth, 1954)."

While there is no translation of the play Cherry Orchard by Anton Chekhov in DraCor, let alone the cited edition, we can still locate the [original Russian play](https://staging.dracor.org/rus/chekhov-vishnevyi-sad). It has the unique DraCor identifier `rus000059` and the URI `https://dracor.org/id/rus000059` (to resolve it on staging.dracor.org use `https://staging.dracor.org/id/rus000059`). 

All citeable units of the play can be accessed via the Navigation endpoint at: 

[https://staging.dracor.org/api/v1/dts/navigation?resource=https://staging.dracor.org/id/rus000059&ref=body&down=-1](https://staging.dracor.org/api/v1/dts/navigation?resource=https://staging.dracor.org/id/rus000059&ref=body&down=-1)

Based on the snippets and some guessing the cited passage can be found: The first speech in the Russian original has the fragment identifier `body/div[1]/div[1]/sp[7]` and thus can be retrieved from the URL: 

[https://staging.dracor.org/api/v1/dts/navigation?resource=https://staging.dracor.org/id/rus000059&ref=body/div[1]/div[1]/sp[7]](https://staging.dracor.org/api/v1/dts/navigation?resource=https://staging.dracor.org/id/rus000059&ref=body/div[1]/div[1]/sp[7]) 

Using the parameters `start` (value: `body/div[1]/div[1]/sp[7]`) and `end` (value `body/div[1]/div[1]/sp[9]`) all three speeches can be cited as a range with the following URL:

[https://staging.dracor.org/api/v1/dts/navigation?resource=https://staging.dracor.org/id/rus000059&start=body/div[1]/div[1]/sp[7]&end=body/div[1]/div[1]/sp[9]](https://staging.dracor.org/api/v1/dts/navigation?resource=https://staging.dracor.org/id/rus000059&start=body/div[1]/div[1]/sp[7]&end=body/div[1]/div[1]/sp[9])

In [88]:
# print the TEI-XML text
request_url = "https://staging.dracor.org/api/v1/dts/document?resource=https://staging.dracor.org/id/rus000059&start=body/div[1]/div[1]/sp[7]&end=body/div[1]/div[1]/sp[9]"
r = requests.get(request_url)
print(r.text)

<TEI xmlns="http://www.tei-c.org/ns/1.0">
    <dts:wrapper xmlns:dts="https://w3id.org/dts/api#">
        <sp who="#Lopahin">
            <speaker>Лопахин.</speaker>
            <p>Что ты, Дуняша, такая...</p>
          </sp>
          <sp who="#Dunjasha">
            <speaker>Дуняша.</speaker>
            <p>Руки трясутся. Я в обморок упаду.</p>
          </sp>
          <sp who="#Lopahin">
            <speaker>Лопахин.</speaker>
            <p>Очень уж ты нежная, Дуняша. И одеваешься, как барышня, и прическа тоже. Так нельзя.
              Надо себя помнить.</p>
          </sp>
    </dts:wrapper>
</TEI>


The three examples above already show that there are several possibilities of how traditional scholarship cites and quotes the text of plays and that some of the functionality implemented allows to construct de-referencable URIs for the cited segments. 

Still, the resulting URIs are relatively long and not very "nice" to include in actualy studies, an option would be to device a shorter notation that could be then expanded into the full URI, e.g. `dracor:shake000030#body/div[3]/div[2]`– still quite long, but this would contain enough information to convert it into a full URI – 
[https://staging.dracor.org/api/v1/dts/navigation?resource=https://staging.dracor.org/id/shake000030&ref=body/div[3]/div[2]](https://staging.dracor.org/api/v1/dts/navigation?resource=https://staging.dracor.org/id/shake000030&ref=body/div[3]/div[2]). 

Locating a specific passage to cite can be quite cumbersome. Users must first examine the full text, such as in DraCor's front end in the "Full Text" tab, and guess which part of the text will appear in the snippet (5 first word tokens of a speech followed by an ellipsis and the last one). They then need to list all citable units with their snippets using the parameter `down=-1` and perform a full-text search (Ctrl+F) in the browser to locate the desired citable unit. After identifying the citable unit, they must copy the value of the `identifier` field and manually construct the link for the Navigation endpoint. Implementing a graphical user interface (GUI) would significantly enhance this process, making it more intuitive and user-friendly.

### DraCor DTS Document Endpoint `api/v1/dts/document`

The DTS Document endpoint is used to get a textual representation of a single Resource or parts thereof. The endpoint accepts the following query parameters:

* **`resource`**: This parameter is used to provide the unique identifier (URI) of the Resource of which a textual representation is requested. In the case of DraCor it is the HTTP URI of a document, e.g.  It is the same value that would be used in a request to the Navigation endpoint and also as the value of the parameter `id` in case of the Collection endpoint. In requests to the Document endpoint providing this parameter is mandatory. 
* **`ref`**: This parameter is used to supply an identifier of a certain segment of a text (a Citeable Unit). The Document endpoint uses the same identifiers as the Navigation endpoint (see detailed description of the DraCor Fragment Identifiers there)
* **`start`** and **`end`**: To request a textual representation of a section containing spanning across several structural units of the text, e.g. the first 5 scenes of the second act, the parameters `start` and `end` include the identifiers of the Citeable Units that define the boundaries of a sequence.
* **`tree`**: The DTS Specification forsees an optional parameter to provide the identifier of a Citation Tree that is not the default. This parameter is not implemented for DraCor because there is only one Citation Tree available.
* **`mediaType`**: This parameter controls in which format the textual representation of a Resource is returned. Per default the endpoint returns TEI-XML. 

The DraCor DTS implementation provides TEI-XML in all cases, but also a plaintext representation requested with `text/plain` as the value of the `mediaType` parameter. Available formats are listed when requesting a Resource via the Collection or Navigation endpoints in the field `mediaTypes`.

The plaintext representation is currently not available for a range of citeable units requested with the parameters `start` and `end`.

If a fragment is requested using the parameters `ref` or `start` and `end` the segment of the TEI-XML file is wrapped into a DTS custom element `<dts:wrapper>` (in the DTS namespace `https://w3id.org/dts/api#`) as shown in the following example:

[https://staging.dracor.org/api/v1/dts/document?resource=https://staging.dracor.org/id/ger000243&ref=body/div[19]/sp[3]](https://staging.dracor.org/api/v1/dts/document?resource=https://staging.dracor.org/id/ger000243&ref=body/div[19]/sp[3])

In [89]:
resource = "https://staging.dracor.org/id/ger000243" # Goethe: Faust (goethe-faust-eine-tragoedie)
ref = "body/div[19]/sp[3]"
request_url = f"https://staging.dracor.org/api/v1/dts/document?resource={resource}&ref={ref}"
r = requests.get(request_url)
tei_response = r.text
print(tei_response)

<TEI xmlns="http://www.tei-c.org/ns/1.0">
    <dts:wrapper xmlns:dts="https://w3id.org/dts/api#">
        <sp who="#gretchen">
          <speaker>MARGARETE.</speaker>
          <lg>
            <l>Nun sag, wie hast du's mit der Religion?</l>
            <l>Du bist ein herzlich guter Mann,</l>
            <l>Allein ich glaub', du hältst nicht viel davon.</l>
          </lg>
        </sp>
    </dts:wrapper>
</TEI>


The same textual fragment can be retrieved as plaintext when setting the parameter `mediaType` to the value `text/plain`:

[https://staging.dracor.org/api/v1/dts/document?resource=https://staging.dracor.org/id/ger000243&ref=body/div[19]/sp[3]&mediaType=text/plain](https://staging.dracor.org/api/v1/dts/document?resource=https://staging.dracor.org/id/ger000243&ref=body/div[19]/sp[3]&mediaType=text/plain)

In [90]:
resource = "https://staging.dracor.org/id/ger000243" # Goethe: Faust (goethe-faust-eine-tragoedie)
ref = "body/div[19]/sp[3]"
media_type = "text/plain"
request_url = f"https://staging.dracor.org/api/v1/dts/document?resource={resource}&ref={ref}&mediaType={media_type}"
r = requests.get(request_url)
text_response = r.text
print(text_response)

MARGARETE. Nun sag, wie hast du's mit der Religion? Du bist ein herzlich guter Mann, Allein ich glaub', du hältst nicht viel davon.


The parameters `start` and `end` can be used to request the text of a range of citable units:

[https://staging.dracor.org/api/v1/dts/document?resource=https://staging.dracor.org/id/ger000001&start=body/div[4]/div[2]/sp[7]&end=body/div[4]/div[2]/sp[14]](https://staging.dracor.org/api/v1/dts/document?resource=https://staging.dracor.org/id/ger000001&start=body/div[4]/div[2]/sp[7]&end=body/div[4]/div[2]/sp[14])

In [91]:
resource = "https://staging.dracor.org/id/ger000001"
start = "body/div[4]/div[2]/sp[7]"
end = "body/div[4]/div[2]/sp[14]"
request_url = f"https://staging.dracor.org/api/v1/dts/document?resource={resource}&start={start}&end={end}"
r = requests.get(request_url)
tei_response = r.text
print(tei_response)

<TEI xmlns="http://www.tei-c.org/ns/1.0">
    <dts:wrapper xmlns:dts="https://w3id.org/dts/api#">
        <sp who="#arkas">
            <speaker>ARKAS.</speaker>
            <lg>
              <l>Ich melde dieses neue Hindernis</l>
              <l>Dem Könige geschwind; beginne du</l>
              <l>Das heil'ge Werk nicht eh', bis er's erlaubt.</l>
            </lg>
          </sp>
          <sp who="#iphigenie">
            <speaker>IPHIGENIE.</speaker>
            <l>Dies ist allein der Priestrin überlassen.</l>
          </sp>
          <sp who="#arkas">
            <speaker>ARKAS.</speaker>
            <l>Solch seltnen Fall soll auch der König wissen.</l>
          </sp>
          <sp who="#iphigenie">
            <speaker>IPHIGENIE.</speaker>
            <l>Sein Rat wie sein Befehl verändert nichts.</l>
          </sp>
          <sp who="#arkas">
            <speaker>ARKAS.</speaker>
            <l>Oft wird der Mächtige zum Schein gefragt.</l>
          </sp>
          <sp who="

The DTS Document endpoint provides a link (key `Link`) back to the Resource via the Collection endpoint in the HTTP header:

In [92]:
# Get the HTTP headers of the response object when using requests

r.headers

{'Server': 'nginx/1.18.0 (Ubuntu)', 'Date': 'Tue, 01 Jul 2025 12:29:03 GMT', 'Content-Type': 'application/xml; charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Link': '<https://staging.dracor.org/api/v1/dts/collection?id=https://staging.dracor.org/id/ger000001>; rel="collection"', 'Vary': 'Accept-Encoding, User-Agent', 'Content-Encoding': 'gzip', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Methods': 'GET, POST, OPTIONS, HEAD', 'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Headers': 'Authorization, Origin, X-Requested-With, Content-Type, Accept, DNT'}

In [93]:
# Print the value of the Link header
print(r.headers["Link"])

<https://staging.dracor.org/api/v1/dts/collection?id=https://staging.dracor.org/id/ger000001>; rel="collection"


## Evaluating Data Retrieval Capabilities: DraCor API vs. DraCor DTS API 

In practical use cases the endpoints Collection, Navigation and Document are used together. 

The experiments conducted in this section of the notebook aim to compare the functionality and capabilities of the regular DraCor API with the newly implemented DTS API. The focus is on determining whether the DTS API can replicate the data retrieval capabilities of the DraCor API, particularly in terms of accessing and manipulating textual data. The experiments involve practical use cases that demonstrate how the Navigation and Document endpoints of the DTS API can be utilized to achieve similar outcomes to those provided by the DraCor API.

The "regular" DraCor API provides several endpoints that allow to get a subset of the text:

* `corpora/{corpusname}/plays/{playname}/spoken-text`: Spoken text of a play excluding stage directions (and Speaker labels), see [API Documentation](https://staging.dracor.org/doc/api#/public/play-spoken-text)
* `corpora/{corpusname}/plays/{playname}/spoken-text-by-character`: Spoken text for each character of a play, see [API Documentation](https://staging.dracor.org/doc/api#/public/play-spoken-text-by-character)
* `corpora/{corpusname}/plays/{playname}/stage-directions`: Stage directions of a play, see [API Doucmentation](https://staging.dracor.org/doc/api#/public/play-stage-directions)
* `copora/{corpusname}/plays/{playname}/stage-directions-with-speakers`, see [API Documentation](https://staging.dracor.org/doc/api#/public/play-stage-directions-with-speakers) In the result of this endpoint the stage directions that are nested into a `<sp>` element an can thus be attributed to a character are prepended with the speaker label

The functionality of these endpoints can to some extend also realized with the DTS API, but there are some caveats as will be shown in the examples. In almost all cases additional post-processing of the results is necessary, but overall it can be shown, that based on the DTS API it is possible to retrieve the same results as can be retrieved via the regular API endpoints. 

In general it is necessary to query the Navigation endpoint to get the structural information and then use the identifiers included in the citable units to query the Document endpoint to retrieve a text representation (TEI-XML or plaintext). 

It can be necessary to retrieve the TEI-XML and further parse it, e.g. to extract lines of spoken text, which are not available as citable units at the moment (but will be added with further releases of the DTS API). 

For a quick overview it might be sufficient to just use the regular DraCor API endpoints, but if there is a certain interest in the textual contents of sub-structures of a play, this can only be done with the DraCor DTS API.


### Getting the Spoken Text

As an example, again, the play Goethe: Iphigenie auf Tauris (`ger000001`) from the *German Drama Corpus* is used. The following cell retrieves all citable units in this play via the DTS Navigation endpoint:

In [94]:
# We use only all the citable units contained in the body 
# (use params ref and down)
resource = "https://staging.dracor.org/id/ger000001"
ref = "body"
down = "-1"
request_url = f"{api_base}dts/navigation?resource={resource}&ref={ref}&down={down}"
r = requests.get(request_url)
response_data = r.json()
citeable_units = response_data["member"]

Citable units that represent speech acts have `speech` as the value of the property `citeType`. We can write a lambda function that is used to filter for speech acts:

In [95]:
# Speech filter as a lambda function

speech_filter = lambda citeable_unit: citeable_unit['citeType'] == 'speech'

In [96]:
# Filter for speeches using the lambda function  above

speech_acts = list(filter(speech_filter, citeable_units))

In [97]:
# Output the first element in the list, i.e. the first speech of the play

speech_acts[0]

{'identifier': 'body/div[1]/div[1]/sp[1]',
 '@type': 'CitableUnit',
 'extensions': {'@context': 'https://raw.githubusercontent.com/dracor-org/dracor-ontology/refs/heads/main/json-ld-contexts/dts-extension-context.json',
  'speakers': ['iphigenie'],
  'snippet': 'Heraus in eure Schatten, rege [...] Tode!'},
 'level': 4,
 'parent': 'body/div[1]/div[1]',
 'citeType': 'speech'}

The filtered list can be used to count the speeches in the play.

In [98]:
print(f"There are {len(speech_acts)} speeches in the play.")

There are 311 speeches in the play.


There seems to be no option to get the number of speech acts in a play from the DraCor API; there is no such feature. See also: https://staging.dracor.org/doc/odd#section-play-features

While the DTS API provides structural data on the speeches and does not include the full text in the Navigation endpoint response, the DraCor API only provides the spoken text using the endpoint `corpora/{corpusname}/plays/{playname}/spoken-text`. The following cell retrieves the spoken text of the play:

In [99]:
r = requests.get("https://staging.dracor.org/api/v1/corpora/ger/plays/goethe-iphigenie-auf-tauris/spoken-text")
text_via_api = r.text
# split on newline character
text_via_api_lines = text_via_api.split("\n")
#text_via_api_lines[0]
print(f"The text of the play retrieved by the DraCor API spoken text endpoint includes {len(text_via_api_lines)} lines of text.")

The text of the play retrieved by the DraCor API spoken text endpoint includes 2203 lines of text.


These are obviously not the verse lines, but also don't match the number of speeches returned by the DTS API. We need to further investigate at this point.

Another approach is to retrieve the XML file representing the play from the endpoint `corpora/{corpusname}/plays/{playname}/tei`. The following cells retrieve and parse the XML using the package `etree`. Based on the parsed XML tree the `<tei:sp>` elements can be identified with an xPath-expression and can be counted. 

In [100]:
# define the namespaces, otherwise xPath expressions won't work
namespaces = {
    'tei': 'http://www.tei-c.org/ns/1.0',
    'dts': 'https://w3id.org/dts/api#'
}

r = requests.get("https://staging.dracor.org/api/v1/corpora/ger/plays/goethe-iphigenie-auf-tauris/tei")
# use etree to parse the XML
xml = etree.fromstring(r.text)
# Get all tei:sp elements with an xPath 
sp_elements = xml.xpath("//tei:sp", namespaces=namespaces)
# Count the <sp> elements
num_sp_elements = len(sp_elements)
print(f"In the TEI-XML file retrieved from the DraCor API there are {num_sp_elements} <sp> elements.")

In the TEI-XML file retrieved from the DraCor API there are 311 <sp> elements.


The number of `<sp>` elements and the number of citable units that are of the `citeType` `speech` should be the same. This is tested in the next code cell.

In [101]:
# check if the number is the same
# There should be no assertion error when running this cell

assert len(speech_acts) == num_sp_elements, "The number of citable units of type 'speech' does not match the number of <sp> in the xml."   

### Comparing the Spoken text (DTS, DraCor API)

To compare if the spoken text returned by the regular DraCor API and the text available via the DTS Document endpoint we need to generate a "text" that combines the textual contents of all citable units that are representing speeches. The textual content for each speech is fetched from the Document endpoint in the following code cell. There are several API calls necessary therefore running the cell takes some time (we are talking about minutes here...).

In [102]:
%%time

# Generate the text that can be retrieved via DTS 
# similar to the one returned by the API /corpora/{corpusname}/plays/{playname}/spoken-text

resource = "https://staging.dracor.org/id/ger000001"
dts_text = []

# Use the already downloaded citable units that have been identified as speech acts
for citable_unit in speech_acts:
    # we have to query the document endpoint here to get the actual text
    request_url = f"{api_base}dts/document?resource={resource}&ref={citable_unit["identifier"]}"
    r = requests.get(request_url)
    tei = etree.fromstring(r.text)
    text_nodes = tei.xpath("//tei:l/text()|tei:p/text()", namespaces=namespaces)
    for text_node in text_nodes:
        dts_text.append(text_node)

CPU times: user 3.26 s, sys: 771 ms, total: 4.03 s
Wall time: 5min 13s


In this case the program had to send 311 requests to the API to fetch each individual citable unit and get the TEI representation via the DTS Document endpoint. Getting the text this way makes only sense for evaluation purposes and must not be integrated in a front end or any other application.

After the text has been assembled, it can be output as demonstrated in the following cell:

In [103]:
# First 10 lines of the text
dts_text[0:9]

['Heraus in eure Schatten, rege Wipfel',
 "Des alten, heil'gen, dichtbelaubten Haines,",
 'Wie in der Göttin stilles Heiligtum',
 "Tret' ich noch jetzt mit schauderndem Gefühl,",
 'Als wenn ich sie zum erstenmal beträte,',
 'Und es gewöhnt sich nicht mein Geist hierher.',
 'So manches Jahr bewahrt mich hier verborgen',
 'Ein hoher Wille, dem ich mich ergebe;',
 'Doch immer bin ich, wie im ersten, fremd.']

Now the texts retrieved via the regular DraCor API endpoint and the text assembled based on the citable units via the DTS endpoints Navigation and Document can be compared:

In [104]:
# Compare to the first 10 lines of the text retrieved via the DraCor API (corpora/.../tei) that was spit by newline character
text_via_api_lines[0:9]

['Heraus in eure Schatten, rege Wipfel',
 "Des alten, heil'gen, dichtbelaubten Haines,",
 'Wie in der Göttin stilles Heiligtum',
 "Tret' ich noch jetzt mit schauderndem Gefühl,",
 'Als wenn ich sie zum erstenmal beträte,',
 'Und es gewöhnt sich nicht mein Geist hierher.',
 'So manches Jahr bewahrt mich hier verborgen',
 'Ein hoher Wille, dem ich mich ergebe;',
 'Doch immer bin ich, wie im ersten, fremd.']

Although the first 10 lines as shown in the two code cells above are similar, when comparing the whole texts, they are not identical, as is revealed by the test in the following cell:

In [105]:
# The lists don't have the same length, why?
# assert len(dts_text) == len(text_via_api_lines), "Not the same length"

print(f"Number of lines in the text retrived via the regular DraCor API: {len(text_via_api_lines)}")
print(f"Number of lines in the text retrieved via the DTS API based on citeable units: {len(dts_text)}")

Number of lines in the text retrived via the regular DraCor API: 2203
Number of lines in the text retrieved via the DTS API based on citeable units: 2254


The numbers don't add up because there is some cleaning necessary which can be demonstrated by a typical problem that involves additional whitespace characters:

In [106]:
# There is some cleaning necessary for sure
# Inspected the first 100 lines
# dts_text[0:99]
# The problem might be this whitespace as can be seen here:

dts_text[18:21]

['Nach seines Vaters Hallen, wo die Sonne',
 '\n                    ',
 ' Zuerst den Himmel vor ihm aufschloß, wo']

There is a blank line included in comparison to the output of the API, where there is no blank line:

In [107]:
# Get the same portion of the text as returned by the regular DraCor API

text_via_api_lines[18:21]

['Nach seines Vaters Hallen, wo die Sonne',
 'Zuerst den Himmel vor ihm aufschloß, wo',
 'Sich Mitgeborne spielend fest und fester']

The code in the following cell cleans up the text lines bases on the DTS responses and removes the unnecessary blank lines:

In [108]:
import re

# Regular expression to match strings with only whitespace and newline characters
regex_pattern = re.compile(r"^\s*$")

# Counter for items that match the pattern
counter = 0

# Loop over the items
for index, value in enumerate(dts_text):
    if regex_pattern.match(value):
        #print(line)
        counter += 1
        dts_text.pop(index)

# Output the counter: how many lines have been removed
counter

52

There are only 52 of these lines containing only whitespace. Still, the numbers don't match up, as can be seen in the next cell:

In [109]:
#assert len(text_via_api_lines) == ( len(dts_text) - counter ), "Still some discrepancies"

In [110]:
#dts_text[18:21]
# The lists don't have the same length, why?
# assert len(dts_text) == len(text_via_api_lines), "Not the same length"
print(f"Number of lines in the text retrived via the regular DraCor API: {len(text_via_api_lines)}")
print(f"Number of lines in the text retrieved via the DTS API based on citeable units: {len(dts_text)}")

Number of lines in the text retrived via the regular DraCor API: 2203
Number of lines in the text retrieved via the DTS API based on citeable units: 2202


The following code identifies differences in the individual text lines:

In [111]:
#Determine the maximum length the two lists of lines
max_length = max(len(dts_text), len(text_via_api_lines))

# Compare the lists and identify differences
differences = []
for i in range(max_length):
    # Get the elements from both lists, or None if the index is out of range
    elem1 = dts_text[i] if i < len(dts_text) else None
    elem2 = text_via_api_lines[i] if i < len(text_via_api_lines) else None

    # Compare the elements
    if elem1 != elem2:
        differences.append((i, elem1, elem2))

# Output the differences
#differences
print(f"There are {len(differences)} text lines that are different.")

There are 53 text lines that are different.


The next cell outputs a typical case in which the text of the lines is different because of whitespace:

In [112]:
# Output the first item of the list containing the differing lines

differences[0]

(19,
 ' Zuerst den Himmel vor ihm aufschloß, wo',
 'Zuerst den Himmel vor ihm aufschloß, wo')

The last item in the list also shows a problem:

In [113]:
# in the last one there is also a problem; in the text returned via the api there is an empty line, 
# thus the 1 difference in count
differences[-1:]

[(2202, None, '')]

Most of the differnces in the texts is due to whitespace. A quick way to check if the text is otherwise identical is to strip the whitespace characters altogether and just compare the other character data:

In [114]:
# We see that the differences are probably due to whitespace
# if we reduce all whitespace the texts should be the same
# in the first step all lines are joined to a single string
joined_dts_text = "".join(dts_text)
joined_api_text_lines = "".join(text_via_api_lines)
print(f"Length of joined DTS text in characters: {len(joined_dts_text)}")
print(f"Length of API text in characters: {len(joined_api_text_lines)}")
# There is still whitespace in it

Length of joined DTS text in characters: 85952
Length of API text in characters: 85900


In [115]:
# Then a regular expression is used to remove all whitespace
cleaned_joined_dts_text = re.sub(r"\s+", "", joined_dts_text)
cleaned_joined_api_text = re.sub(r"\s+", "", joined_api_text_lines)

print(f"Length of DTS text (whitespace removed) in characters: {len(cleaned_joined_dts_text)}")
print(f"Length of API text (whitespace removed) in characters: {len(cleaned_joined_api_text)}")

Length of DTS text (whitespace removed) in characters: 73551
Length of API text (whitespace removed) in characters: 73551


In [116]:
# This assertion should not fail
assert len(cleaned_joined_dts_text) == len(cleaned_joined_api_text), "The texts are not the same length"

We demonstrated that after some programming intervention and treating whitespace it is possible to retrieve the same text from the DTS API as could be retrieved from the `/corpora/{corpusname}/plays/{playname}/spoken-text` endpoint of the DraCor API. 

The API implements filtering options via the paramerts `gender`, `reation` and `role` e.g. it is possible to filter for spoken text by female characters only.

### Getting the Spoken Text by Character

Using the DTS endpoint it is possible to retrieve the spoken text of individual characters. This can be achieved by filtering the citable units representing the speech acts. For example, we can filter for all speech acts by the character *Iphigenie* (identifier: `iphigenie`):

In [117]:
# we use 'speech_acts' as basis; quick check if it is still the data of Iphigenie auf Tauris
assert len(speech_acts) == 311, "There have been changes to the variable containing the speech acts of the play Iphigenie auf Tauris. Should redownload."

In [118]:
# Output a single speech act

speech_acts[0]

{'identifier': 'body/div[1]/div[1]/sp[1]',
 '@type': 'CitableUnit',
 'extensions': {'@context': 'https://raw.githubusercontent.com/dracor-org/dracor-ontology/refs/heads/main/json-ld-contexts/dts-extension-context.json',
  'speakers': ['iphigenie'],
  'snippet': 'Heraus in eure Schatten, rege [...] Tode!'},
 'level': 4,
 'parent': 'body/div[1]/div[1]',
 'citeType': 'speech'}

The code in the next cell filters the the citable units representing speeches by character:

In [119]:
# Speech filter as a lambda function

character_filter = lambda citeable_unit: "iphigenie" in citeable_unit["extensions"]["speakers"]
# Filter for speeches using the lamdda function
iphigenie_speech_acts = list(filter(character_filter, speech_acts))

In [120]:
# Output the number of speech acts by Iphigenie

len(iphigenie_speech_acts)

134

In [121]:
# Output a single speech act by the character Iphigenie

iphigenie_speech_acts[0]

{'identifier': 'body/div[1]/div[1]/sp[1]',
 '@type': 'CitableUnit',
 'extensions': {'@context': 'https://raw.githubusercontent.com/dracor-org/dracor-ontology/refs/heads/main/json-ld-contexts/dts-extension-context.json',
  'speakers': ['iphigenie'],
  'snippet': 'Heraus in eure Schatten, rege [...] Tode!'},
 'level': 4,
 'parent': 'body/div[1]/div[1]',
 'citeType': 'speech'}

The DraCor API endpoint `corpora/{corpusname}/plays/{playname}/spoken-text-by-character` returns a JSON array of all characters represented by character objects that include the spoken text in the field `text`, e.g. we can get the spoken text of the character *Iphigenie* via this endpoint of the regular DraCor API as well:

[https://staging.dracor.org/api/v1/corpora/ger/plays/goethe-iphigenie-auf-tauris/spoken-text-by-character](https://staging.dracor.org/api/v1/corpora/ger/plays/goethe-iphigenie-auf-tauris/spoken-text-by-character)

In [122]:
r = requests.get("https://staging.dracor.org/api/v1/corpora/ger/plays/goethe-iphigenie-auf-tauris/spoken-text-by-character")
character_texts = r.json()

The JSON array that is returned by the API contains the spoken text of the play grouped by character as objects. If we want to get the text by a single character only, we need to identify the object that represents the text lines spoken by a single character, i.e. the list needs to be filtered:

In [123]:
# Filter for the object with the id value of iphigenie

iphigenie_text_object_via_api = list(filter(lambda character: character["id"] == "iphigenie", character_texts))[0]

In [124]:
#Just assign the text from the dictionary iphigenie_text_object_via_api
# to a new variable 

iphigenie_text_via_api = iphigenie_text_object_via_api["text"]

In [125]:
print(f"When retrieved via the corpora/.../spoken-text-by-character endpoint the character Iphigenie has {len(iphigenie_text_via_api)} lines of text.")

When retrieved via the corpora/.../spoken-text-by-character endpoint the character Iphigenie has 992 lines of text.


Again, from the DTS endpoint we can retrieve the individual speeches a citable units. The DraCor API endpoint returns the text in spoken text lines and does not include any additional structuring. To compare both text representations, we again get the text for the citable units from the DTS Document endpoint and strip the whitespace; in case of the text retrieved from the `corpora/.../spoken-text-by-character` endpoint we join the lines and then strip the whitespace. This way the two texts should be identical. The whole process takes some time (approx. 2mins) because several requests to the API are necessary.

In [126]:
%%time

# Generate the text that can be retrieved via DTS similar 
# to the one returned by the API /corpora/{corpusname}/plays/{playname}/spoken-text

resource = "https://staging.dracor.org/id/ger000001"
dts_iphigenie_text = []

# Use the already downloaded citable units that have been identified as speech acts
# use Iphigenie's speech acts here:

for citable_unit in iphigenie_speech_acts:
    # we have to query the document endpoint here to get the actual text
    request_url = f"{api_base}dts/document?resource={resource}&ref={citable_unit["identifier"]}"
    r = requests.get(request_url)
    tei = etree.fromstring(r.text)
    text_nodes = tei.xpath("//tei:l/text()|tei:p/text()", namespaces=namespaces)
    for text_node in text_nodes:
        dts_iphigenie_text.append(text_node)

CPU times: user 1.48 s, sys: 338 ms, total: 1.82 s
Wall time: 2min 12s


In [127]:
# there again will be the problem with the whitespace character lines so initially the numbers don't match

len(dts_iphigenie_text)

1014

In [128]:
# A typical whitespace problem

dts_iphigenie_text[18:21]

['Nach seines Vaters Hallen, wo die Sonne',
 '\n                    ',
 ' Zuerst den Himmel vor ihm aufschloß, wo']

In [129]:
# Clean the text: Join the lines, remove whitespace

dts_iphigenie_text_joined = "".join(dts_iphigenie_text)
dts_iphigenie_text_cleaned = re.sub(r"\s+", "", dts_iphigenie_text_joined)

iphigenie_text_via_api_joined = "".join(iphigenie_text_via_api)
iphigenie_text_via_api_cleaned = re.sub(r"\s+", "", iphigenie_text_via_api_joined)


assert len(iphigenie_text_via_api_cleaned) == len(dts_iphigenie_text_cleaned), "Texts are not the same length."

print(f"Length of joined DTS text in characters: {len(dts_iphigenie_text_cleaned)}")
print(f"Length of API text in characters: {len(iphigenie_text_via_api_cleaned)}")

assert iphigenie_text_via_api_cleaned == dts_iphigenie_text_cleaned, "Texts are not identical."


Length of joined DTS text in characters: 32729
Length of API text in characters: 32729


The experiment shows that the text of a certain character that can be retrieved via the DTS API is (after some whitespace handling) identical to the text returned by the `/corpora/{corpusname}/plays/{playname}/spoken-text-by-character` endpoint. 

In [130]:
# list the keys of the object providing information on the spoken text of
list(iphigenie_text_object_via_api.keys())

['label', 'id', 'sex', 'text', 'isGroup', 'roles']

In the case of the DraCor API spoken-text-by-character endpoint, apart from the spoken text of a character (`text`) there is formation on the identifier of a character (`id`, API Feature Ch1 [character_id](https://staging.dracor.org/doc/odd#character_id)), the character label (`name`, API Feature Ch2 [character_name](https://staging.dracor.org/doc/odd#character_name)), if the character is a group character (`isGroup`, API Feature Ch3 [character_is_group](https://staging.dracor.org/doc/odd#character_is_group)), the assigned gender of the character (`gender`, API Feature Ch4 [character_gender](https://staging.dracor.org/doc/odd#character_gender)) (**Attention**: "sex/gender" is changed in the version 1.1 release of the API and version 1 of the DraCor schema) and it roles (`roles`, API Feature Ch14 [character_role](https://staging.dracor.org/doc/odd#character_role)). Similar information on the character is not included in any response of the DTS endpoint.

While the endpoint of the regular API provides richer information on the character speaking (see list of field keys the previous code cell), the DTS API allows to retrieve citeable units for a given segment of the text. 

### Granular access to the spoken text by character

The following code cells show an example of how the spoken text by the character *Iphigenie* in a certain act and scene can be retrieved:

In [131]:
# Get the citable units of the fifth act

resource = "https://staging.dracor.org/id/ger000001"
ref = "body/div[5]"
down = "-1"
request_url = f"{api_base}dts/navigation?resource={resource}&ref={ref}&down={down}"
r = requests.get(request_url)
response_data = r.json()
fifth_act_citable_units = response_data["member"]

We can check if it is the fifth acts (we guessed the value of the identifier `ref`) by looking a the object liked via the property `ref`, in this case the `dc:title` in the Dublin Core `Metadata` object which is `"Fünfter Aufzug"` (equivalent to "Fifth Act").

In [132]:
# check on the ref if it is really the fifth act
response_data["ref"]

{'identifier': 'body/div[5]',
 '@type': 'CitableUnit',
 'dublinCore': {'title': 'Fünfter Aufzug'},
 'level': 2,
 'parent': 'body',
 'citeType': 'act'}

In [133]:
# How many citable units are there in the fifth act

print("How many citable units are in the fifth act?")
len(fifth_act_citable_units)

How many citable units are in the fifth act?


85

Based on this citable units we can identify scenes by evaluating the propety `citeType`. This is also something that can't be done with the regular DraCor API.

In [134]:
# How many scenes
# This is also something that can't be done with the regular DraCor API

scene_filter = lambda citeable_unit: citeable_unit["citeType"] == "scene"
scenes_fifth_act = list(filter(scene_filter, fifth_act_citable_units))
len(scenes_fifth_act)
print(f"There are {len(scenes_fifth_act)} scenes in the fifth act.")

There are 6 scenes in the fifth act.


Likewise, we can count the speech acts in a given act:

In [135]:
# How many of these are speech acts
# we need to filter here for citeType = "speech"
# Speech filter as a lambda function
speech_filter = lambda citable_unit: citable_unit["citeType"] == "speech"
speeches_fifth_act =  list(filter(speech_filter, fifth_act_citable_units))
print(f"There are {len(speeches_fifth_act)} speech acts in the fifth act.")

There are 65 speech acts in the fifth act.


We can find out, how many speech acts are attributed to a single character in a given segment of the play:

In [136]:
# filter for the speech acts (speeches_fifth_act) of the character iphigenie in the fifth act 
# Speech filter as a lambda function

iphigenie_filter = lambda citeable_unit: "iphigenie" in citeable_unit["extensions"]["speakers"]
iphigenie_speech_acts = list(filter(iphigenie_filter, speeches_fifth_act))
print(f"Iphigenie has {len(iphigenie_speech_acts)} speech acts in the fifth act.")

Iphigenie has 26 speech acts in the fifth act.


With the help of DTS we can also answer questions like "What is the last speech act of a character?"

In [137]:
# We can get the last speech act of the character Iphigenie

iphigenie_speech_acts[-1]

{'identifier': 'body/div[5]/div[6]/sp[12]',
 '@type': 'CitableUnit',
 'extensions': {'@context': 'https://raw.githubusercontent.com/dracor-org/dracor-ontology/refs/heads/main/json-ld-contexts/dts-extension-context.json',
  'speakers': ['iphigenie'],
  'snippet': 'Nicht so, mein König! Ohne [...] Rechte.'},
 'level': 4,
 'parent': 'body/div[5]/div[6]',
 'citeType': 'speech'}

To compile the actual text spoken by the character *Iphigenie* in the fifth act we need to use the Document endpoint:

In [138]:
%%time

# Generate the text that can be retrieved via DTS Document endpoint

resource = "https://staging.dracor.org/id/ger000001"
iphigenie_text = []
# Use the already downloaded citable units that have been identified as speech acts by iphigenie of the fifth act (iphigenie_speech_acts)
for citeable_unit in iphigenie_speech_acts:
    # we have to query the document endpoint here to get the actual text
    request_url = f"{api_base}dts/document?resource={resource}&ref={citeable_unit["identifier"]}"
    r = requests.get(request_url)
    tei = etree.fromstring(r.text)
    text_nodes = tei.xpath("//tei:l/text()|tei:p/text()", namespaces=namespaces)
    for text_node in text_nodes:
        iphigenie_text.append(text_node)

CPU times: user 210 ms, sys: 51.3 ms, total: 261 ms
Wall time: 25.9 s


Again, some cleaning of whitespace is necessary. The following code cell removes the whitespace only lines and outputs how many lines have been removed:

In [139]:
# Iphigenie speaks approx. 240 lines of text, probably there are lines that contain only whitespace characters which have to be stripped
#len(iphigenie_text)
# Remove the newline characters only lines (again, using regex to identify these lines)

regex_pattern = re.compile(r"^\s*$")

# Counter for items that match the pattern
counter = 0

# Loop over the items
for index, value in enumerate(iphigenie_text):
    if regex_pattern.match(value):
        #print(line)
        counter += 1
        iphigenie_text.pop(index)

# Output the counter
print(f"{counter} lines were removed.")

6 lines were removed.


In [140]:
# There were 6 lines with only whitespace
print(f"The character Iphigenie speaks {len(iphigenie_text)} lines of text in the fifth act.")

The character Iphigenie speaks 234 lines of text in the fifth act.


In [141]:
# Return the text similar to the spoken-text-by-character endpoint
# To do this, we need to join the lines into a single text. The lines should be separated by newline
iphigenie_plain_text = "\n".join(iphigenie_text)

We can output the spoken text by the character Iphigenie in the fifth act:

In [142]:
# We just print the first 1000 characters of the text
print(f"{iphigenie_plain_text[:999]} [...]")

Du forderst mich! Was bringt dich zu uns her?
Ich hab' an Arkas alles klar erzählt.
Die Göttin gibt dir Frist zur Überlegung.
Wenn dir das Herz zum grausamen Entschluß
Verhärtet ist, so solltest du nicht kommen!
Ein König, der Unmenschliches verlangt,
Findt Diener gnug, die gegen Gnad' und Lohn
Den halben Fluch der Tat begierig fassen;
Doch seine Gegenwart bleibt unbefleckt.
Er sinnt den Tod in einer schweren Wolke
Und seine Boten bringen flammendes
Verderben auf des Armen Haupt hinab;
Er aber schwebt durch seine Höhen ruhig,
Ein unerreichter Gott, im Sturme fort.
Nicht Priesterin! nur Agamemnons Tochter.
Der Unbekannten Wort verehrtest du,
Der Fürstin willst du rasch gebieten? Nein!
Von Jugend auf hab' ich gelernt gehorchen,
Erst meinen Eltern und dann einer Gottheit,
Und folgsam fühlt' ich immer meine Seele
Am schönsten frei; allein dem harten Worte,
Dem rauhen Ausspruch eines Mannes mich
Zu fügen, lernt' ich weder dort noch hier.
Wir fassen ein Gesetz begierig an,
Das unsrer Leiden 

In a similar flexible way DTS can be used to retrieve information about the speeches of individual character in certain structural segments of the text and allow for retrieving this text. 

This can not be done with the regular DraCor API endpoints because the endpoints that return a certain subtext of the play are not aware of the play's overall structural division. This structural information can not be retrieved by any other endpoints of the regular DraCor API in such a way that it can be re-connected to the text. The only option with the regular API would be to ultimatively rely on the "bare" TEI-XML.

### Granular access to stage directions

What is possible for speech acts, can also be done for stage directions. The following code briefly demonstrates how to retrieve stage directions of the fifth act:

In [143]:
# Lambda function that can be used to filter the already downloaded citable units 
# in the fifth act (fifth_act_citable_units)
stage_direction_filter = lambda citable_unit: citable_unit["citeType"] == "stage_direction"
fifth_act_stage_directions = list(filter(stage_direction_filter, fifth_act_citable_units))

print(f"There are {len(fifth_act_stage_directions)} stage directions in the fifth act.")

There are 14 stage directions in the fifth act.


In [144]:
# an example of a citable unit representing a stage direction
fifth_act_stage_directions[4]

{'identifier': 'body/div[5]/div[4]/stage[1]',
 '@type': 'CitableUnit',
 'extensions': {'@context': 'https://raw.githubusercontent.com/dracor-org/dracor-ontology/refs/heads/main/json-ld-contexts/dts-extension-context.json',
  'snippet': 'Orest gewaffnet. Die Vorigen.'},
 'level': 4,
 'parent': 'body/div[5]/div[4]',
 'citeType': 'stage_direction'}

### Conclusion

Despite the additional complexity, the experiments demonstrated that the DTS API can be used to retrieve the same text as the DraCor API, albeit with the need for post-processing to handle whitespace and newline characters. The major advantage of the DTS API is that it supports a more granular access to the text that takes the structuring of the play into account. 

For example, the experiments demonstrated how to retrieve the spoken text of the character *Iphigenie* of Goethe's play *Iphigenie auf Tauris* using the DTS API resulting in the same text as returned by the regular API. The DTS-based approach involved filtering the citable units for speech acts (value of  the property `citeType` is `speech`) attributed to *Iphigenie* (`iphigenie` included in the array of the field `speakers`) and then querying the Document endpoint to retrieve the actual text. On the level of the whole play the regular API endpoint shows a better performance, because using the DTS API normally involves multiple calls to the Document endpoint to retrieve the text of a single citable unit. The DTS API also does not provide the additional character information that the DraCor API offers, such as gender and roles. 

The experiments also explored the retrieval of stage directions using both APIs. The DraCor API provides endpoints that return stage directions, including those nested within `<sp>` elements. The DTS API, on the other hand, allows for the retrieval of stage directions by filtering citable units representing stage directions (value of the property `citeType` is `stage_direction`).

The experiments demonstrated that the DTS API can retrieve the same stage directions as the DraCor API, with the added flexibility of filtering by specific structural segments of the text.

One of the significant advantages of the DTS API is its ability to provide structural information about the text. This information can be used to retrieve text representations for specific segments of the text, such as acts and scenes. The experiments demonstrated how to retrieve the spoken text of the character *Iphigenie* in a specific act and scene using the DTS API.
This level of flexibility is not possible with the DraCor API, which does not provide endpoints that return structural information in a way that can be reconnected to the text. The only option with the DraCor API is to rely on the returned TEI-XML, which requires additional parsing and processing.

The experiments conducted in this notebook highlight the strengths and limitations of both the DraCor API and the DTS API. While the DraCor API provides straightforward endpoints for retrieving specific subsets of textual data, the DTS API offers a more flexible and granular approach to accessing and manipulating textual data. This flexibility comes at the cost of additional complexity, decreased performance, and the need for post-processing, but it enables more precise and customized data retrieval.

The DTS API's ability to provide structural information and retrieve text representations for specific segments of the text makes it an additional powerful tool for researchers and developers working with dramatic corpora. However, the lack of additional character information and the need for multiple API requests are limitations that need to be addressed in future developments.

## References

Almas, Bridget, Hugh Cayless, Thibault Clérice, Vincent Jolivet, Pietro Maria Liuzzo, Jonathan
Robie, Matteo Romanello, and Ian Scott. “Distributed Text Services (DTS): A Community-
Built API to Publish and Consume Text Collections as Linked Data.” Journal of the Text
Encoding Initiative (2023). [https://doi.org/10.4000/jtei.4352](https://doi.org/10.4000/jtei.4352). 

Börner, Ingo, and Peer Trilcke. „CLS INFRA D7.1 On Programmable Corpora“. Zenodo, 2023.
[https://doi.org/10.5281/ZENODO.7664964](https://doi.org/10.5281/ZENODO.7664964).

Börner, Ingo, and Peer Trilcke. „CLS INFRA D7.4 Report on the Implementation of Programmable Corpora“. CLS INFRA, 2025.
[https://doi.org/10.5281/zenodo.15301341](https://doi.org/10.5281/zenodo.15301341). 

Cayless, Hugh, Thibault Clérice, Robie Jonathan, Ian Scott, and Bridget Almas: “Distributed Text Services Specifications” (Version 1-alpha) [Computer software], 2024. https://github.com/distributed-text-services/specifications.

Pfister, Manfred. “The Theory and Analysis of Drama” (translated by John Halliday). Cambridge: Cambridge University Press, 2000.