Browser support

IE / Edge     Firefox            Chrome             Safari
IE11, Edge    last 2 versions    last 2 versions    last 2 versions

Demo

https://argimenes.github.io/standoff-properties-editor/

Introduction

This project is a standoff property editor module built in JavaScript and HTML that can be dropped into any web application. It aims to provide the basic features of a standard text editor along with the ability to create user-defined annotations that can overlap freely. The editor can import or export data in a JSON format and makes no assumptions about the back-end data store. All annotations are user-defined and passed in as a configuration object. Various event hooks are provided to enable your application code to manage things such as Linked Open Data (LOD) lookups.

In the demo example LOD identifiers are requested via a simple JavaScript prompt, but the continuation-style callbacks allow you to look up IDs asynchronously. (In my own Digital Humanities project, 'The Codex', lookups are accomplished with modal windows that allow you to either search the backend Neo4j graph database, or quickly create entities inline and pass them back to the editor.)

Standoff properties

A standoff property is a property that 'stands off' from the text it refers to; that is, it is stored apart from the text source. There is a fundamental separation between the raw (or unformatted) text and the properties that stand off from it and describe the styling and annotations that apply to the text. A property in this context is a data structure that represents a range of characters in the text along with information relevant to the annotation, such as the annotation type. Annotations can be of any type, e.g. stylistic (italics; bold; underline), semantic (line; paragraph; page), or linked entities (database record; URL). The standoff properties editor (SPE) does not mandate any annotations, but some typical ones are provided in the demo example.
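
To make the separation concrete, here is a minimal sketch of a raw text with its annotations held in a separate array. The field names type, startIndex and endIndex follow the IStandoffProperty interface listed under Getting Started; the sample text, the helper textOf, and the assumption that endIndex is inclusive are illustrative only and are not taken from the editor's code.

```javascript
// Raw text and its annotations are stored separately.
var doc = {
   text: "Leonardo painted the Mona Lisa.",
   properties: [
      { type: "agent", startIndex: 0, endIndex: 7 },
      { type: "italics", startIndex: 21, endIndex: 29 }
   ]
};

// Hypothetical helper: returns the characters a property refers to.
// Assumes endIndex is inclusive (an assumption, not confirmed by the source).
function textOf(doc, property) {
   return doc.text.substring(property.startIndex, property.endIndex + 1);
}

doc.properties.forEach(function (property) {
   console.log(property.type + ": " + textOf(doc, property));
});
```

The text itself contains no markup at all; everything known about "Leonardo" or "Mona Lisa" lives in the properties array.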

Solving the overlap problem

The fundamental limitation of XML and HTML is that semantics are encoded in a tree structure. While descriptive XML schemas have been developed to encode digital humanities texts, XML does not cope well with overlapping annotations. This is because the tree structure of XML mandates that overlapping annotations (two or more annotations that cover part of the same text sequence) cross one or more branches of that tree structure. The result can be complex and ambiguous XML. By representing annotations as discrete, unstructured properties that refer to text with character indexes, annotations can freely overlap as they do not have to conform to a tree structure. An attempt to reconcile the two approaches led to the creation of the standoff markup format, which encodes annotations as XML elements that define the annotation as the joining of branches in the XML tree. However, such markup generates even less readable XML annotations and can run into problems when the source text changes. These problems disappear, however, if the text and its annotation properties are kept entirely separate. The text, then, is stored in a raw or unformatted state, annotated by a collection of discrete standoff properties.
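
As an illustration, consider bold on "quick brown fox" and italics on "brown fox jumps". Written inline this would need something like <b>quick <i>brown fox</b> jumps</i>, which is not well-formed XML. The sketch below stores the same two annotations as independent index ranges. The helper typesAt and the inclusive-index convention are assumptions made for this example.

```javascript
// Two annotations that overlap on "brown fox": no nesting is required.
var text = "The quick brown fox jumps";
var properties = [
   { type: "bold", startIndex: 4, endIndex: 18 },     // "quick brown fox"
   { type: "italics", startIndex: 10, endIndex: 24 }  // "brown fox jumps"
];

// Hypothetical helper: lists the annotation types covering a character index.
// Assumes inclusive start and end indexes.
function typesAt(properties, index) {
   return properties
      .filter(function (p) { return p.startIndex <= index && index <= p.endIndex; })
      .map(function (p) { return p.type; });
}

console.log(typesAt(properties, 5));   // inside "quick"
console.log(typesAt(properties, 12));  // inside "brown"
console.log(typesAt(properties, 22));  // inside "jumps"
```

Because each property is just a pair of indexes, neither annotation has to know that the other exists.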

Static vs dynamic

The technical challenge posed by standoff properties is that they require indexes to link annotations to the words in the text, which suggests that the text cannot be changed without breaking the annotations. However, by using a linked list-style structure composed of SPANs it is possible to create properties that reference characters by pointers, allowing text to be freely inserted or deleted without breaking the annotations. Indexes are only calculated at the end of the session, when the annotated text is to be exported (and presumably saved). Some special handling is required for deletions at property boundaries, but otherwise everything is managed through DOM pointers.
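
The idea can be sketched without a DOM. In the following example plain objects stand in for the character SPANs, a property holds references to its start and end nodes (the names startNode and endNode follow the IProperty interface), and indexes are computed only on export. The function names createNodes, nodeAt, insertBefore and exportDocument are hypothetical, and this is a simplified model under those assumptions rather than the editor's actual implementation.

```javascript
// Each character is a node in a doubly linked list (the editor uses SPANs;
// plain objects are used here so the sketch runs without a DOM).
function createNodes(text) {
   var head = null, prev = null;
   text.split("").forEach(function (ch) {
      var node = { ch: ch, prev: prev, next: null };
      if (prev) { prev.next = node; } else { head = node; }
      prev = node;
   });
   return head;
}

// Returns the node at a given index by walking the list.
function nodeAt(head, index) {
   var node = head;
   while (index-- > 0 && node) { node = node.next; }
   return node;
}

// Inserts a character before the given node and returns the (possibly new) head.
function insertBefore(head, node, ch) {
   var created = { ch: ch, prev: node.prev, next: node };
   if (node.prev) { node.prev.next = created; } else { head = created; }
   node.prev = created;
   return head;
}

// Indexes are only computed on export, by walking the list once.
function exportDocument(head, properties) {
   var text = "", index = 0, positions = new Map();
   for (var node = head; node; node = node.next) {
      positions.set(node, index++);
      text += node.ch;
   }
   return {
      text: text,
      properties: properties.map(function (p) {
         return {
            type: p.type,
            startIndex: positions.get(p.startNode),
            endIndex: positions.get(p.endNode)
         };
      })
   };
}

var head = createNodes("big cat");
// The property points at the nodes for "cat", not at numeric offsets.
var property = { type: "italics", startNode: nodeAt(head, 4), endNode: nodeAt(head, 6) };
// Insert "very " at the start of the text; the property itself is untouched.
"very ".split("").reverse().forEach(function (ch) {
   head = insertBefore(head, head, ch);
});
var exported = exportDocument(head, [property]);
console.log(exported.text, exported.properties[0]);
```

The property still covers "cat" after the insertion, because its node references never changed; only the exported indexes differ from what they would have been before the edit.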

Currently the main speed limitation is the browser's rendering phase when loading very large texts (e.g., hundreds of pages), but there are options for dealing with this.

Layers

As there is no defined limit on the number or types of annotations that can be added to a text, there is the chance that texts may become visually cluttered with annotations. To address this, there is an option to assign a user-defined 'layer' to an annotation for the purpose of grouping them. Layers can be shown and hidden at will, thus reducing clutter. This is particularly helpful when it comes to computer-generated annotations, such as syntactic or semantic annotations created by an NLP library or other text analysis tools.
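
A rough sketch of the grouping idea follows. It assumes each annotation carries a layer name in a field called layer; that field name, the sample data, and the helpers setVisibility and visibleAnnotations are hypothetical and only illustrate the concept, not the editor's internals.

```javascript
// Hypothetical annotations, each tagged with a user-defined layer name.
var annotations = [
   { type: "italics", layer: "manual" },
   { type: "agent", layer: "manual" },
   { type: "lemma", layer: "nlp" },
   { type: "syntax", layer: "nlp" }
];

// Tracks which layers are currently hidden.
var hiddenLayers = new Set();

// Shows or hides every annotation assigned to the given layer.
function setVisibility(layer, visible) {
   if (visible) { hiddenLayers.delete(layer); } else { hiddenLayers.add(layer); }
}

// Returns only the annotations whose layer is currently visible.
function visibleAnnotations(annotations) {
   return annotations.filter(function (a) { return !hiddenLayers.has(a.layer); });
}

// Hide the machine-generated layer to reduce clutter.
setVisibility("nlp", false);
console.log(visibleAnnotations(annotations).map(function (a) { return a.type; }));
```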

Annotate the world

As suggested above, there is no reason annotations should be limited to those manually entered by users. Algorithms and libraries could also generate annotations automatically, such as the following:

Lemmas
Syntax trees
Named-entity recognition
Textual variants (versioning)
Translations (inter-text standoff properties)

Anything that can be mined or derived from a text can be stored as a standoff property.

Text as a graph

While standoff properties can be stored in any format, storing them as LOD entities in a graph database vastly increases their potential. For example, if you were searching for all references to a person, you would find not only the texts but also the exact character positions in each text. If you expanded your query from a person like Leonardo da Vinci, say, to all artists, you could see every instance in which an artist is mentioned in any text. Queries could also be combined across annotation types. For example, if you had the syntax tree of a text you could find every occurrence of a term within a given syntactical unit. The more annotation types you record, the more text mining options become available.

Features

I use the term Standoff Property Document (SPD) to refer to the combination of the raw text and its standoff properties.

Separation of annotations from text source
Text is dynamically editable without corrupting annotations
Annotations can be overlaid freely
Supports zero-point annotations (ZPA), like word-breaking hyphens that need to be marked up from a manuscript but which shouldn't be stored in the text itself
Documents are exported and imported as JSON
Annotations can be grouped into layers, to manage visual complexity
The standoff property data model can easily be extended by the user in their application code through various event hooks, to store as much or as little data as they need
The editor makes no assumption about where LOD data is stored, and can be easily extended to query any data source

In addition:

Annotations can themselves be annotated with SPDs
SPDs can be annotated with other SPDs (e.g., footnotes, margin notes)
As suggested above, the SPD can be considered a 'first class citizen' capable of infinite recursion

The editor

The demo editor contains four sections. The only mandatory one is the editor panel.

Annotation buttons -- used to apply an annotation to the selected text
Editor panel -- where the text and annotations are entered
Monitor -- lists the annotations at the cursor position, along with tools for managing them
Bind/unbind panel -- the input/output section for the JSON data

Configuration

All annotations are configured in a JavaScript object passed into the SPEEDy constructor; there are no default or built-in annotations. Along with the type of an annotation (its name), the editor requires annotations to have a format and a topology.

Format

The format value ('decorate' or 'overlay') refers to how styles are to be applied to annotated text. The editor distinguishes between CSS styles that decorate an HTMLElement and CSS styles that overlay an HTMLElement. Because there is a limit of a single underline (border-bottom) per HTMLElement, multiple lines are rendered in the editor by stacking SPANs inside the character SPAN, allowing several lines to overlay a character.

Topology

Topology refers to whether the annotation is one-dimensional (a text range) or zero-dimensional (a point). The usual concept of an annotation is that it applies to a range of text, but SPEEDy also handles annotations that refer to a position in the text stream between characters (a point). These point annotations can be used to represent things like footnotes or margin notes, which need to be located in the text but not represented in the text stream, or to represent characters in the original medium (such as hyphens in a manuscript) that are not required in the text stream.

Getting Started

The SPEEDy configuration object

HTML

<div class="speedy">
   <div id="editor" class="editor"></div>
   <div id="monitor" class="monitor"></div>
</div>

JavaScript

var configuration = {
   container: document.getElementById("editor"),
   monitor: document.getElementById("monitor"),
   propertyType: {
      italics: {
         type: "italics",
         shortcut: "i",
         format: "decorate",
         className: "italics"
      },
      agent: {
         type: "agent",
         shortcut: "a",
         format: "overlay",
         className: "agent",
         labelRenderer: function(property) {
            var label = "agent";
            if (property.text) {
               label += " [ " + property.text + " ]";
            }
            return "<span class='agent'>" + label + "</span>";
         },
         propertyValueSelector: function(property, process) {
            // Values are retrieved through whatever mechanism is required -- whether through a simple dialog box
            // or through an AJAX call to a database in a modal search window -- and passed to the process callback.
            var value = prompt("Agent GUID", property.value);
            process(value);
         }
      }
   }
}
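
The propertyValueSelector above obtains its value synchronously from a prompt. Because the value is handed back through the process callback, the same hook can also wait for an asynchronous lookup. The following sketch assumes that property.text holds the annotated text; lookupAgent, its made-up record data, and createAsyncSelector are hypothetical names introduced for this example.

```javascript
// Hypothetical asynchronous lookup: calls back with (error, guid) after a delay.
// In a real application this might be an AJAX request or a modal search window.
function lookupAgent(name, callback) {
   var records = { "Leonardo": "a1b2" };  // made-up data for illustration
   setTimeout(function () {
      callback(null, records[name] || null);
   }, 10);
}

// Builds a propertyValueSelector that defers to an asynchronous lookup.
function createAsyncSelector(lookup) {
   return function (property, process) {
      lookup(property.text, function (error, value) {
         if (error || !value) {
            return; // nothing found: process is simply not called in this sketch
         }
         process(value); // hand the value back once it is available
      });
   };
}

var selector = createAsyncSelector(lookupAgent);
selector({ text: "Leonardo" }, function (value) {
   console.log("Agent GUID:", value);
});
```

A function built this way could be assigned to propertyValueSelector in place of the prompt-based version, since it takes the same (property, process) arguments.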

// Property type interface
interface IPropertyType {
   type: string;        // Annotation name
   className: string;   // CSS class name representing the annotation styling
   shortcut: string;    // Single character CONTROL-key shortcut
   format: string;      // Options: "overlay"; "decorate"
   propertyValueSelector: (property: IProperty, process: (value: string) => void) => void;
   labelRenderer: (property: IProperty) => string;
   styleRenderer: (spans: HTMLSpanElement[], property: IProperty) => string;
   unstyleRenderer: (spans: HTMLSpanElement[], property: IProperty) => string;
   attributes: IAttributeHandlers;
   zeroPoint: {
      className: string;
      offerConversion: (property: IProperty) => boolean;
      selector: (property: IProperty, process: (value: string) => void) => void;
   }
}

// Interfaces
interface IProperty {
   value: string;
   text: string;
   attributes: IAttributes;
   startNode: HTMLSpanElement;
   endNode: HTMLSpanElement;
   guid?: IGuid;
   userGuid?: IGuid;
   isDeleted: boolean;
   // Methods
   toNode: () => IStandoffProperty;
}
interface IStandoffProperty {
   type: string;
   text?: string;
   value?: string;
   attributes: IAttributes;
   startIndex?: number;   // integer
   endIndex?: number;     // integer
   isDeleted: boolean;
   guid?: IGuid;
   userGuid?: IGuid;
}
interface IAttributeHandlers {
   "key": IAttributeHandler;
}
interface IAttributeHandler {
   renderer: (property: IProperty) => string;
   selector: (property: IProperty, process: (value: string) => void) => void;
}
interface IAttributes {
   "key": IAttribute;
}
interface IAttribute {
   "key": string;
}
interface IGuid {} // 32 character UUID string

Hooks

onPropertyCreated: function (prop: Property, data: JSON);
onPropertyChanged: function (prop: Property);
onPropertyDeleted: function (prop: Property);
onPropertyUnbound: function (data: JSON, prop: Property);
commentManager: (prop: Property);

Methods

bind: { text: string; properties: IStandoffProperty[]; }
setLayerVisibility: (layer: string, selected: boolean);
modeClicked: (mode: string);
unbind: function () => IStandoffPropertyEditor;
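
Since bind takes a text together with an array of standoff properties, it can be useful to check such a payload before passing it in. The sketch below is one possible check, not part of the editor: findInvalidProperties is a hypothetical helper, the sample data is made up, indexes are assumed to be inclusive, and properties without indexes are skipped rather than judged.

```javascript
// Hypothetical helper: returns the properties whose indexes do not fit the text.
// Assumes inclusive indexes; properties without indexes are skipped.
function findInvalidProperties(data) {
   return data.properties.filter(function (p) {
      if (p.startIndex == null || p.endIndex == null) { return false; }
      return p.startIndex < 0 ||
         p.endIndex >= data.text.length ||
         p.startIndex > p.endIndex;
   });
}

var data = {
   text: "Hello world",
   properties: [
      { type: "italics", startIndex: 0, endIndex: 4 },
      { type: "agent", startIndex: 6, endIndex: 20 }   // runs past the end of the text
   ]
};

console.log(findInvalidProperties(data).map(function (p) { return p.type; }));
```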

Demo page from the Deutsches Textarchiv (DTA)

http://www.deutschestextarchiv.de/book/view/patzig_msgermfol841842_1828/?hl=welcher;p=11

Related publications

Neill, I. and Kuczera, A. (2019). The Codex – an Atlas of Relations. ZfdG - Zeitschrift für digitale Geisteswissenschaften. Sonderband 4. http://www.zfdg.de/sb004_008
Neill, I. and Schmidt, D. (2020). SPEEDy: A practical editor for texts annotated with standoff properties. GraphSDE2019: Workshop on Scholarly Digital Editions, Graph Data-Models and Semantic Web Technologies. Draft available at https://www.researchgate.net/publication/340006851
