# Perspectives on Text
## _Synthesizing knowledge through markup_
#### Elli Bleeker, Bram Buitendijk, Ronald Haentjens Dekker, Astrid Kulsdom
#### _R&D Digital Infrastructure - KNAW_

### Markup
 Expectations, challenges  
 Requirements
### Method
   Implementation: TAGML  
   Workflow    
### Conclusion
### Discussion

# What is text?

A multilayered, non-linear object containing information which is at times ordered, partially ordered, or unordered

# Markup challenges




<div class="quote">"Most texts are made for a special use in a specific context for a limited group of people"</div><br/><div class="source">(Hillesund 2005)</div>

<div class="quote">"A transcription at its most basic is mediated or subject to mediation"</div> <br/><div class="source">(Shillingsburg 2014)</div>

Logical structure vs. document structure 

# Examples 

- Overlapping structures  
- Discontinuous elements  
- Non-linear elements

# Overlapping structures
<img src="images/Selection-21v.png">
<img src="images/Selection-22v.png">

# Discontinuous elements
<img width="300" height="300" src="images/order.jpg">



# Non-linear structures

```
<s>through shrieks of <sic>slaugter</sic><corr>slaughter</corr></s>
```

<img align="left" width="300" height="300" src="images/order1a.png">
<img align="right" width="300" height="300" src="images/order1b.png">
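A minimal Python sketch of the same idea, assuming (for illustration only) that a non-linear passage is stored as a list of segments in which a dictionary marks a point where the text diverges into alternative readings. The function name `linearize` and the data layout are hypothetical and are not part of TEI or TAGML.

```python
def linearize(segments, reading):
    """Join the segments into one string, picking one branch at each divergence."""
    parts = []
    for segment in segments:
        if isinstance(segment, dict):
            # A dict is a divergence: follow only the requested branch.
            parts.append(segment[reading])
        else:
            # A plain string is shared by every reading.
            parts.append(segment)
    return "".join(parts)


verse = ["through shrieks of ", {"sic": "slaugter", "corr": "slaughter"}]

print(linearize(verse, "sic"))   # through shrieks of slaugter
print(linearize(verse, "corr"))  # through shrieks of slaughter
```

Both readings share the same opening words; only the branch taken at the divergence differs, which is what makes the passage non-linear rather than simply a sequence.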

# Summary

1. Overlap
2. Discontinuity
3. Non-linearity
4. Compatible  
    a. Interoperable  
    b. Reusable
    
"Natural" or idiomatic: the model needs to be close to our understanding of text

# Method
A "shared conception of digital editing" (Hajo 2010) is abandoned in favor of idiosyncratic approaches  


Editors apply markup from a certain perspective on the text: they reify their notions and assumptions.

"Analytical perspectives on text" (Renear _et al_. 1993):

- dramatic:
- prosodic:
- syntactic: 
- etc...

A strictly formal observation of a text, and with it a strictly formal description, is a **physical impossibility** (cf. Erwin Panofsky)

However, markup is not only applied to describe (an interpretation of) a text. In addition to an intellectual, abstract intention, it is also applied with a practical intention: to ensure that a transcription can be processed by software.

So, in summary we can say the following of markup:

# Markup allows us to ...
### ... formally describe our interpretation of text 

### ... create transcriptions that can be processed by software

### ... reify our assumptions about the best ways to describe textual features

Markup is a powerful technology. But:


# With great power comes great responsibility.


To give just one simple yet important example: it is not just a matter of identifying and tagging textual features. One also has to take into account how these features may be processed and addressed in later stages. Often, the more complexity you add to your model, the more complex the algorithmic operations needed to process it.

Most digital scholarly editors have been trained to work with the TEI model, and with good reason. The TEI has been established over several decades by a large community of textual scholars, and therefore embodies an accumulation of scholarly knowledge. Still, it is worthwhile to remain critical and to continue to probe existing models.

The affordances and limitations of a textual model influence our understanding of text (cf. Sahle 2013)

So what happens if we step outside the framework we've all come to know, and start all over again? What if we're no longer compelled to think in terms of monohierarchical structures when modeling text, and instead take as our point of departure a model that provides *native* support for multiple hierarchies, without complicated hacks and workarounds? How would we then mark up a text?

Over the past months we've been working on a markup language for an advanced model of text. The technological possibilities offered by this model also prompt novel intellectual questions, some of which I'll discuss today.

# TAGML
Markup language of Text-as-Graph (TAG) model  

Follows the definition of text as a _non-linear and multilayered information object_.

A TAGML file can have multiple **layers**.  

A layer is a set of markup nodes. A layer is hierarchical.
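What "hierarchical" means for a layer can be sketched in a few lines of Python. The sketch assumes, purely for illustration, that each markup node is reduced to a name plus the half-open range of text units it covers; the class `Markup` and the function `is_hierarchical` are hypothetical names and not part of the TAG data model. A set of markup is hierarchical when any two ranges are either nested or disjoint.

```python
from itertools import combinations
from typing import NamedTuple


class Markup(NamedTuple):
    name: str
    start: int  # index of the first text unit covered
    end: int    # index one past the last text unit covered


def is_hierarchical(layer):
    """Return True if every pair of markup ranges is either nested or disjoint."""
    for a, b in combinations(layer, 2):
        disjoint = a.end <= b.start or b.end <= a.start
        nested = (a.start <= b.start and b.end <= a.end) or (
            b.start <= a.start and a.end <= b.end
        )
        if not (disjoint or nested):
            return False  # a and b overlap without one containing the other
    return True


# Text units: 0 is the speaker line, 1 to 4 are the four verse lines.
poetic = [Markup("poem", 0, 5), Markup("speaker", 0, 1), Markup("sp", 1, 5)]
material = [Markup("page", 0, 3), Markup("page", 3, 5)]

print(is_hierarchical(poetic))             # True
print(is_hierarchical(material))           # True
print(is_hierarchical(poetic + material))  # False
```

Each perspective is a hierarchy on its own; only when both are forced into one set does the speech overlap with the first page. Keeping them in separate layers avoids that conflict.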

How does that help us? Although all editing efforts need to start with a formal description of the method and the underlying model of text, editors will rarely stick to one of the analytical perspectives outlined by Allen Renear _et al_. Let's go back to the overlap-example I showed previously. Imagine transcribing the poetic structure of the text on this document fragment.

<img src="images/Selection-21v.png">
<img src="images/Selection-22v.png">

```
[tagml>
  [poem>
    [speaker who="springs">2d. Voice from the Springs<speaker]
    [sp>
      [l>Thunderbolts had parched our water<l]
      [l>We had been stained with bitter blood<l]
      [l>And had ran mute 'mid shrieks of slaughter<l]
      [l>Thro' a city & a solitude!<l]
    <sp]
  <poem]
<tagml]
```

```
[tagml>
    [page n="21v">
    ...
      [p>
        [line>2d. Voice from the Springs<line]
        [line>Thunderbolts had parched our water<line]
        [line>We had been stained with bitter blood<line]
      <p]
    <page]
    [page n="22v">
      [p>
        [line>And had ran mute 'mid shrieks of <|[sic>slaugter<sic]|[corr>slaughter<corr]|><line]
        [line>Thro' a city & a solitude!<line]
      <p]
     ...
    <page]
<tagml]
```
        

Let's take a closer look at that last transcription. One could argue that the paragraph isn't really "closed"; it just needs to be closed to avoid overlap with the page element. If that weren't necessary, the transcription might look like this:

(The following transcription has been stripped of most tags for readability.)

```
[page>  
    [p>
      [line>2d. Voice from the Springs<line]
      [line>Thunderbolts had parched our water<line]
      [line>We had been stained with bitter blood<line]
    <page]
    [page>
      [line>And had ran mute 'mid shrieks of slaugter<line]
      [line>Thro' a city and a multitude!<line]
    <p]
<page]
```


The moment structures overlap, the user can create a layer. A layer can be created locally. The layers may be given any name; in this example they are simply referred to as layer A and layer B.

```
[page|+A>  
    [p|+B>
      [line>2d. Voice from the Springs<line]
      [line>Thunderbolts had parched our water<line]
      [line>We had been stained with bitter blood<line]
    <page|A]
    [page|A>
      [line>And had ran mute 'mid shrieks of slaughter<line]
      [line>Thro' a city and a multitude!<line]
    <p|B]
<page|A]
```
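Why the layer suffixes resolve the overlap can be sketched in Python. The sketch below is not a TAGML parser: it only recognises the simplified tags used in these slides, assumes that tags without a suffix belong to a single layer called `default`, and uses a hypothetical function name, `check_layers`. It keeps one stack per layer, so nesting is checked within a layer but not across layers.

```python
import re

# Matches an open tag such as [page|+A> or [line>, or a close tag such as <page|A] or <line].
TAG = re.compile(r"\[(\w+)(?:\|\+?(\w+))?>|<(\w+)(?:\|(\w+))?\]")


def check_layers(source):
    """Return a list of nesting errors, checking each layer with its own stack."""
    stacks = {}
    errors = []
    for match in TAG.finditer(source):
        open_name, open_layer, close_name, close_layer = match.groups()
        if open_name is not None:
            # Open tag: push it onto the stack of its layer.
            stacks.setdefault(open_layer or "default", []).append(open_name)
            continue
        layer = close_layer or "default"
        stack = stacks.setdefault(layer, [])
        if stack and stack[-1] == close_name:
            stack.pop()
        else:
            errors.append(
                f"<{close_name}] does not close the innermost open tag in layer {layer}"
            )
    for layer, stack in stacks.items():
        errors.extend(f"[{name}> is never closed in layer {layer}" for name in stack)
    return errors


FLAT = "[page>[p>[line>a<line]<page][page>[line>b<line]<p]<page]"
LAYERED = "[page|+A>[p|+B>[line>a<line]<page|A][page|A>[line>b<line]<p|B]<page|A]"

print(check_layers(LAYERED))  # []
print(check_layers(FLAT))     # a non-empty list of nesting errors
```

With a single stack, the page cannot close while the paragraph is still open. With one stack per layer, the page and the paragraph never meet, so both can open and close wherever the document requires.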

Managing TAGML files with multiple layers is done in a repository called _Alexandria_, which stores the TAGML files.
The workflow is similar to that of Git.

Let's return to the examples I just showed, and let's imagine that the markup is added not by one, but by two editors. We'll name them A and B, or to make it more realistic, Astrid and Bram.

## Astrid
```
[page>
[p>
[line>2d. Voice from the Springs<line]
[line>Thrice three hundred thousand years<line]
[line>We had been stained with bitter blood<line]
<p]
<page]
[page>
[p>
[line>And had ran mute 'mid shrieks of slaughter<line]
[line>Thro' a city and a multitude<line]
<p]
<page]
```

## Bram
```
[page|+A>
[p|+B>
[l>2d. Voice from the Springs<l]
[l>Thrice three hundred thousand years<l]
[l>We had been stained with bitter blood<l]
<page|A]
[page|A>
[l>And had ran mute 'mid shrieks of slaughter<l]
[l>Thro' a city & a multitude<l]
<p|B]
<page|A]
```


Both TAGML transcriptions are merged in Alexandria. Usually, users would not check out the "master file", but if they did, it would look something like this:

# Astrid + Bram

```
[page|+A>
[p|+B>
[p|+C>
[line>[l>2nd. Voice from the Springs.<l]<line]
[line>[l>Thrice three hundred thousand years<l]<line]
[line>[l>We had been stained with bitter blood<l]<line]
<p|C]
<page|A]
[page|A>
[p|C>
[line>[l>And had ran mute 'mid shrieks of slaughter<l]<line]
[line>[l>Thro' a city and a multitude<l]<line]
<p|B]
<p|C]
<page|A]
```

Clearly, managing multiple transcriptions with multiple layers requires careful documentation. If we go back to the statement that adding markup is "making explicit what is implicit", we can say that this explicitness exists on several levels: not only within the _text_, but also in the form of metadata and additional documentation files.

# Conclusion

### Text is a multilayered, non-linear object
### The information can be ordered, partially ordered, or unordered 

### TAGML allows for the formal description of textual features

# Discussion

How do we handle the merge of TAG files? Do we consider changes in markup as additions or replacements?

`[l>` to `[line>`
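The two options can be made concrete with a small Python sketch. It does not describe how Alexandria merges files; it assumes, for the sake of discussion, that markup is reduced to a name and the range of text units it covers, and the names `Markup`, `merge_additive`, and `merge_replacing` are hypothetical.

```python
from typing import NamedTuple


class Markup(NamedTuple):
    name: str
    start: int  # index of the first text unit covered
    end: int    # index one past the last text unit covered


def merge_additive(base, incoming):
    """Treat changes as additions: keep every markup from both perspectives."""
    return set(base) | set(incoming)


def merge_replacing(base, incoming):
    """Treat changes as replacements: incoming markup wins on an identical range."""
    replaced_ranges = {(m.start, m.end) for m in incoming}
    kept = {m for m in base if (m.start, m.end) not in replaced_ranges}
    return kept | set(incoming)


base = {Markup("p", 0, 3), Markup("l", 0, 1)}
incoming = {Markup("line", 0, 1)}

print(sorted(merge_additive(base, incoming)))
print(sorted(merge_replacing(base, incoming)))
```

Under the additive policy the first text unit carries both `l` and `line`; under the replacing policy only `line` survives. Which of the two is preferable is exactly the question raised above.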

Is the source text part of a perspective or not? In other words, is a perspective only the markup or also the source text? 

<img src="images/Selection-22v.png">

View material:  
`<|[sic>slaugter<sic]|[corr>slaughter<corr]|>`

View poetic:  
`[rhyme>slaughter<rhyme]`

# Extra slides

```
[page|+A>
    [p|+B>
    [p|+C>
        [line>2d. Voice from the Springs<line]
        [line>Thrice three hundred thousand years<line]
        [line>We had been stained with bitter blood<line]
    <p|C]
<page|A]
[page|A>
    [p|C>
        [line>And had ran mute 'mid shrieks of slaughter<line]
        [line>Thro' a city and a multitude<line]
    <p|B]
    <p|C]
<page|A]
```

# References

- Alexandria. https://github.com/HuygensING/alexandria-markup ; Information about installing and using the Alexandria command line app is available at links on the TAG portal at https://github.com/HuygensING/TAG.
- Gengnagel, T. 2015. "Marking Up Iconography: Scholarly Editions Beyond Text," in: parergon, 06/11/2015, https://parergon.hypotheses.org/40.
- Haentjens Dekker, R. & Birnbaum, D.J. 2017. "It’s more than just overlap: Text As Graph". In _Proceedings of Balisage: The Markup Conference 2017. Balisage Series on Markup Technologies_, vol. 19. doi:10.4242/BalisageVol19.Dekker01. https://www.balisage.net/Proceedings/vol19/html/Dekker01/BalisageVol19-Dekker01.html
- Hajo, C. M. 2010. "The sustainability of the scholarly edition in a digital world". In _Proceedings of the International Symposium on XML for the Long Haul: Issues in the Long-term Preservation of XML_. Balisage Series on Markup Technologies, vol. 6. doi:10.4242/BalisageVol6.Hajo01.
- Hillesund, T. 2005. "Digital Text Cycles: From Medieval Manuscripts to Modern Markup". In _Journal of Digital Information_ 6:1. https://journals.tdl.org/jodi/index.php/jodi/article/view/62/65.
- Panofsky, E. 1932/1964. "Zum Problem der Beschreibung und Inhaltsdeutung von Werken der bildenden Kunst" in _Ikonographie und Ikonologie: Theorien, Entwicklung, Probleme (Bildende Kunst als Zeichensystem; vol. 1)_, ed. by Ekkehard Kaemmerling, Köln 1979, pp.185-206.
- Renear, A. H., Mylonas, E., & Durand, D. 1993. "Refining our notion of what text really is: The problem of overlapping hierarchies". https://www.ideals.illinois.edu/bitstream/handle/2142/9407/RefiningOurNotion.pdf?sequence=2&isAllowed=y
- Sahle, P. 2013. _Digitale Editionsformen-Teil 3: Textbegriffe Und Recodierung_. Norderstedt: Books on Demand. http://kups.ub.uni-koeln.de/5353/
- Shelley, P. B. "Prometheus Unbound, Act I", in The Shelley-Godwin Archive, MS. Shelley e. 1, 21v. Retrieved from http://shelleygodwinarchive.org/sc/oxford/prometheus_unbound/act/i/#/p7 
- Shillingsburg, P. 2014. "From physical to digital textuality: Loss and gain in literary projects". In _CEA Critic_ 76:2, pp.158-168.