# Alexandria, a manual.

This is a detailed, step-by-step guide to using _Alexandria_ for digital text editing.

## 1. What is _Alexandria_?

### Introduction

Briefly put, _Alexandria_ is a text repository system in which you can store and edit documents. It is the reference implementation of TAG (Text-as-Graph), a flexible graph data model for text. Here, the term "reference implementation" means that _Alexandria_ implements all the properties of the TAG model. So, by working with _Alexandria_ you'll learn more about TAG. And vice versa: if you want to experiment with the flexibility of the TAG model, _Alexandria_ is where you start.

You'll quickly find out that _Alexandria_ is not just any text repository. It builds on the idea that a text can be studied from various perspectives. For example, you can model a text according to a textual perspective or a documentary perspective; you can focus on adding linguistic annotations to a text, or rather on information about the structure of its verses. In _Alexandria_, you can store those different perspectives on the same text. What is more, _Alexandria_ is built along the principles of a distributed architecture. It is a command line tool, and it can be integrated into your editor of choice.

By bringing together a wide range of information about a text in a structured and distributed way, _Alexandria_ facilitates the exchange and reuse of scholarly data. In short, _Alexandria_ is a powerful modeling instrument for digital textual research.

### Can I use it?

Yes, you can. Below we explain in detail how _Alexandria_ works and what you need to operate it. Keep in mind that both the TAG data model and the _Alexandria_ implementation are under development. This means that by using _Alexandria_ you will make a valuable contribution to the development process. We therefore encourage you to try it out and share your thoughts.

### Why should I use it?

If you enjoy experimenting with data models and advanced text analysis, _Alexandria_ is the tool for you. If you're used to working with XML, it is highly enlightening to work with a data model in which you can easily model overlapping structures, discontinuous elements, and nonlinear text without having to resort to workarounds. 

## 2. Okay, tell me more...

### What you need to know

#### Perspectives, layers and views.

Structuring information has both a conceptual and a technical side. The conceptual side concerns the theory, the analysis, or the research objective that forms the motivation for the markup. The technical side, then, refers to the specific markup tags you use to express that concept or theory.

Let's first get our definitions straight. We've mentioned that researchers have different perspectives on a text. These have also been described as "orientations to text" and include (but are definitely not limited to) a material, linguistic, or genetic perspective.

A perspective implies a certain structure of text. For example, if you're interested in the material aspects of a text, you may structure it as _page > section > paragraph > sentence_, which says something like: a page has one or more sections, which have one or more paragraphs, which have one or more sentences. If you'd rather study the linguistic aspects, you may follow the structure _phrase > sentence > word > morpheme_. You can think of this as adding layers of information to a text.

**Layers** are a crucial concept in _Alexandria_. Briefly put, a layer classifies a set of markup elements. The elements may be grouped because, together, they represent a specific research perspective, or because they are added by a specific user. A layer is hierarchically structured, and a document may contain multiple layers. Markup elements can be in more than one layer. For example: the material structure and the linguistic structure outlined above could form a material layer and a linguistic layer, with the element _sentence_ being part of both layers. 
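To make the idea of shared membership concrete, here is a toy Python model in which each markup element simply records the set of layers it belongs to. It illustrates the concept only and is not Alexandria's internal data structure; the element and layer names are taken from the example above, and the helper names are our own.

```python
# Toy model: each markup element maps to the set of layers it belongs to.
# Illustration only; Alexandria does not necessarily store layers this way.
elements = {
    "page": {"material"},
    "section": {"material"},
    "paragraph": {"material"},
    "sentence": {"material", "linguistic"},  # shared by both layers
    "phrase": {"linguistic"},
    "word": {"linguistic"},
    "morpheme": {"linguistic"},
}


def layer_members(layer):
    """Return the names of all elements that belong to `layer`, sorted."""
    return sorted(name for name, layers in elements.items() if layer in layers)


def shared_elements(layer_a, layer_b):
    """Return the names of the elements that belong to both layers, sorted."""
    return sorted(
        name
        for name, layers in elements.items()
        if layer_a in layers and layer_b in layers
    )
```

Here `shared_elements("material", "linguistic")` returns `['sentence']`, the one element the two layers have in common.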

A **view** in _Alexandria_ is defined as a selection of markup tags and layers. You, the user, can define your own view(s) by indicating which markup elements or which markup layer(s) you do and don't want to see. The difference between layers and views may be confusing at first: if you create a view called "linguistic" that shows only the "linguistic" layer, they are synonymous. But you can also define a linguistic view by selecting a number of markup elements from the linguistic layer, combined with some other elements. It's up to you. The main objective of views is to increase the readability of a large file with lots of markup. 

### Workflow

Take a look at the following workflow.

<img src="images/workflow-alexandria.png"/>

In the upper row, you see several user actions, from creating a transcription to defining a view.  

The second row contains the commands associated with these user actions (`alexandria register-document`, `alexandria define-view`, etc.); the third row shows a visual representation of what these actions produce (a document is created, a view is defined, etc.).

The last row symbolises the local repository that you create on your machine when you work with Alexandria. The repo contains all your transcriptions (document instances) and views.  

If you've worked with [git](https://www.atlassian.com/git/tutorials) or [GitHub](https://github.com/), this workflow may seem familiar to you: there's a similar process of initialising a workspace, uploading files, and checking out files. In contrast to git, though, the Alexandria workflow doesn't have a remote repository yet. You work on your local machine, so make sure you keep your own backups of your files.

### Markup language

We've developed a markup language called TAGML that allows you to express these and other textual features in a native way. So, if you want to make optimal use of TAG's graph data model for text and the layer-functionality of _Alexandria_, you can transcribe your text in TAGML.

#### Models
A markup language is basically a serialisation of a model. For example, the XML sentence `<root><s>The sun's not yellow</s></root>` is a serialisation of the OHCO model behind it:

<img src="images/ohco-sun.png" width="300">

So the two tags `<s>` and `</s>` together represent one `s` node in the OHCO tree.  

Perhaps we'd like to indicate with markup that the word `sun's` is actually a contraction of the words `sun` and `is`, for example by tagging them with `expan`. As we create a new element node and push down two text nodes one level, we rearrange the document hierarchy: 

<img src="images/ohco-sun2.png" width="400">
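If you want to inspect such a tree yourself, Python's standard `xml.etree.ElementTree` module can parse both serialisations. The helper `text_depths` below is our own illustration: it lists each text node together with the depth of the element that contains it, which shows the expanded text moving one level down. Note that this sketch keeps `sun is` as a single text node rather than two.

```python
import xml.etree.ElementTree as ET

# The XML serialisations before and after tagging the expansion.
before = "<root><s>The sun's not yellow</s></root>"
after = "<root><s>The <expan>sun is</expan> not yellow</s></root>"


def text_depths(xml):
    """Return (text, depth) pairs for every non-empty text node, in document order.

    Depth counts element nodes from the root (root = 1), so a text node's
    depth is the depth of the element it sits in.
    """
    result = []

    def walk(elem, depth):
        if elem.text and elem.text.strip():
            result.append((elem.text, depth))
        for child in elem:
            walk(child, depth + 1)
            # Text after a child's end tag belongs to the parent element.
            if child.tail and child.tail.strip():
                result.append((child.tail, depth))

    walk(ET.fromstring(xml), 1)
    return result
```

`text_depths(before)` returns `[("The sun's not yellow", 2)]`, while `text_depths(after)` returns `[('The ', 2), ('sun is', 3), (' not yellow', 2)]`.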

This may be basic knowledge to you, but when you are encoding a text it helps to think about the underlying model. TAG's graph model requires a new markup language with which you can tag complex textual occurrences: different coexisting structures, discontinuous elements, etc. 

There are at least three important points on which TAGML diverges from XML:

- it has asymmetrical tags: `[markup> some text <markup]`;
- you can use it to express not just strings and markup, but other data types as well, like numbers or Boolean values;
- you can nest annotations.
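The first point is what makes overlap expressible: a closing tag names the element it closes, so it does not have to close the most recently opened one. The toy Python checker below illustrates the difference between XML-style and TAGML-style tag matching. It is a hypothetical sketch, not a TAGML parser: it only recognises plain `[name>` or `[name|layers>` open tags and `<name]` close tags, and it does not handle annotations or any other TAGML feature.

```python
import re

# Toy pattern: "[name>" or "[name|layers>" opens an element,
# "<name]" or "<name|layers]" closes it. Not the real TAGML grammar.
TAG = re.compile(r"\[(\w+)(?:\|[^>]*)?>|<(\w+)(?:\|[^\]]*)?\]")

# The overlapping expan/phr sentence from the Layers subsection below.
OVERLAP = "[s> The [expan>sun [phr>is<expan] not<phr] yellow <s]"


def tags(tagml):
    """Yield ("open", name) or ("close", name) for each tag, in document order."""
    for m in TAG.finditer(tagml):
        if m.group(1):
            yield ("open", m.group(1))
        else:
            yield ("close", m.group(2))


def nests_properly(tagml):
    """XML-style check: every close tag must match the most recently opened tag."""
    stack = []
    for kind, name in tags(tagml):
        if kind == "open":
            stack.append(name)
        elif not stack or stack.pop() != name:
            return False
    return not stack


def balanced(tagml):
    """TAGML-style check: every opened name is closed by name, in any order."""
    still_open = []
    for kind, name in tags(tagml):
        if kind == "open":
            still_open.append(name)
        elif name in still_open:
            still_open.remove(name)
        else:
            return False
    return not still_open
```

For `OVERLAP`, `balanced` returns `True` while `nests_properly` returns `False`: every element is closed, but not in strict nesting order, which XML does not allow.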

You can find detailed information about the TAGML syntax in its [documentation](https://github.com/HuygensING/TAG/tree/master/TAGML) and in a Balisage article on the [topic](http://www.balisage.net/Proceedings/vol21/html/HaentjensDekker01/BalisageVol21-HaentjensDekker01.html#d7633e16).

#### Syntax highlighter

To facilitate transcribing in TAGML, we developed a syntax highlighter for the Sublime Text editor. You can find a link to a download page, with additional information about how to install it, [here](https://github.com/HuygensING/TAG/blob/develop/TAGML/syntax-hilite.README.md).

#### Layers

In TAGML you can easily indicate to which layer(s) a markup element belongs. 

Let's return to the example phrase _The sun's not yellow_, now expressed in TAGML:

    [root>
        [s> The sun's not yellow <s]
    <root]

You can tag the expansion of `sun's` using similar markup as in XML:

    [root>
        [s> The [expan>sun is<expan] not yellow <s]
    <root]

Let's say you also want to add some linguistic information about the negation construction `is not`:
    
    [root>
        [s> The [expan>sun [phr>is<expan] not<phr] yellow <s]
    <root]

However, this results in two overlapping elements, `expan` and `phr`. To resolve this overlap, you create two layers, each with a unique layer ID: "L1" for the annotation about the expansion and "L2" for the linguistic annotation. You start a new layer with the `+` sign; markup elements can be in more than one layer:

    [root|+L1, +L2>
        [s|L1,L2> The [expan|L1>sun [phr|L2>is<expan] not<phr] yellow<s]
    <root]

Because each layer is hierarchically ordered, you have now created two trees that share some markup and text nodes: 

<img src="images/demo.png" width="400">

Here, the L1 tree is visualised in red and the L2 tree in blue. You can see that the word `is` has two parents: the markup node `expan` and the markup node `phr`. The overlap no longer causes a conflict.
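The two trees can be mimicked with a small Python sketch. Here each markup element of the layered example is encoded, hypothetically, as its set of layers plus the span of words it covers, and `parent_in_layer` looks up a word's innermost parent within one layer. This is a toy lookup on a hand-made encoding, not how Alexandria stores the graph.

```python
# The example sentence, split into text tokens (the contraction expanded).
tokens = ["The", "sun", "is", "not", "yellow"]

# Hypothetical encoding of the layered TAGML example:
# markup name -> (layers it belongs to, token span as [start, end)).
# Entries are listed outer-to-inner, which the tie-break below relies on.
markup = {
    "root": ({"L1", "L2"}, (0, 5)),
    "s": ({"L1", "L2"}, (0, 5)),
    "expan": ({"L1"}, (1, 3)),  # covers "sun is"
    "phr": ({"L2"}, (2, 4)),    # covers "is not"
}


def parent_in_layer(i, layer):
    """Return the innermost markup in `layer` that contains token i, or None."""
    candidates = [
        # Narrower spans win; on equal width, the later (inner) entry wins.
        (end - start, -order, name)
        for order, (name, (layers, (start, end))) in enumerate(markup.items())
        if layer in layers and start <= i < end
    ]
    return min(candidates)[2] if candidates else None
```

For the word `is` (index 2), `parent_in_layer(2, "L1")` returns `'expan'` and `parent_in_layer(2, "L2")` returns `'phr'`: one text node, two parents.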

If you want, you can now define two views on this TAGML document: one view that shows only the L1 layer and one view that shows only the L2 layer.

This is a very basic example. When you actually start transcribing a text in TAGML, the number of tags and layers in a document grows quickly, which makes the document harder for humans to read. All the more reason for having views!

## 3. I want to use it!

Great! Let's get to work.

### Command line

Alexandria is a command line tool. In practice this means that it doesn't have a graphical user interface: you run Alexandria from your command line (sometimes also called the shell, the terminal, or the command prompt) and interact with it by typing instructions and then hitting the Enter key. Not just any instructions, of course: the command line is very particular about how and what you tell it.

If you've never worked with the command line before, you may want to take a look at this [tutorial](https://learncodethehardway.org/unix/).

### Download and install

You can download an up-to-date version of Alexandria [here](https://cdn.huygens.knaw.nl/alexandria/alexandria-app.zip).

This downloads a zip file called `alexandria-app`; unpack it in a directory of your choice.

The unpacked folder contains three subfolders: a `lib` folder with the fat jar, a `bin` folder with the `alexandria` scripts for Linux and Windows, and an `example` folder. You can ignore the first two folders, but the `example` folder comes in handy if you want some templates for the views.

Now that you've downloaded Alexandria, you can start it from anywhere on your machine (this assumes the `bin` folder is on your system `PATH`).

Keep in mind, though, the importance of good "filesystem hygiene". In short: create a dedicated folder in a place that you can easily reach and make sure the folder is [properly named](http://nbviewer.jupyter.org/github/DiXiT-eu/collatex-tutorial/blob/master/unit1/Command_line.ipynb#File-naming-conventions). We advise you to create a separate folder for Alexandria, named something like `alexandria-test`. You can create it anywhere you like, but an appropriate place is your home directory. Within the `alexandria-test` folder, you can create a subfolder `transcriptions` to store your own transcriptions and a subfolder `views` to store any views you're going to define. If you subsequently initialise Alexandria inside the `alexandria-test` folder, you can be sure to have all your files in one place and easy to reach.
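If you prefer to script this setup, the following Python sketch creates the two subfolders with the standard `pathlib` module. The function name `make_workspace` is our own; the folder names follow the advice above.

```python
from pathlib import Path


def make_workspace(base):
    """Create base/transcriptions and base/views; existing folders are left untouched."""
    base = Path(base)
    for sub in ("transcriptions", "views"):
        # parents=True also creates `base` itself; exist_ok=True makes reruns harmless.
        (base / sub).mkdir(parents=True, exist_ok=True)
    return base
```

For example, `make_workspace(Path.home() / "alexandria-test")` creates the suggested folders in your home directory; running it twice does no harm.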

Once you feel confident about the command line and have installed Alexandria, you can start by creating a transcription.

### Create a transcription

For the purpose of testing, we have created a number of simple TAGML files. The transcriptions are all based on the first two paragraphs of one typescript page (CRLR_GR_MS1H16d_1r) of Gustave Roud's _Requiem_ (courtesy of the [Centre de recherches sur les lettres romandes](https://www.unil.ch/crlr/home/menuinst/projets-de-recherche/gustave-roud-oeuvres-completes.html), Université de Lausanne). 

Navigate to the `alexandria-test` folder on your local machine and create a subfolder called `transcriptions`, if you haven't done so already. You can download the [example TAGML files](https://github.com/HuygensING/alexandria-markup-server/tree/tutorial/transcriptions) into the `transcriptions` folder.

The `transcriptions` folder contains the following files:

- roud_ts_test1.tagml
- roud_ts_test2.tagml
- roud_ts_test3.tagml

Take a close look at the file "roud_ts_test1.tagml" to see if you understand the tagging and the model. It is a transcription of the first two paragraphs on the page. The structure of the text is _excerpt > paragraphs > sentences_. There are no layers.

Now take a look at "roud_ts_test2.tagml". This transcription follows the same structure, but here we also tagged the deletions and additions. A deletion followed by an addition results in a [nonlinear structure](http://www.balisage.net/Proceedings/vol21/html/HaentjensDekker01/BalisageVol21-HaentjensDekker01.html#order_of_textual_content), which TAGML represents with [branches](http://www.balisage.net/Proceedings/vol21/html/HaentjensDekker01/BalisageVol21-HaentjensDekker01.html#syntax_nonlinearity). Make sure you understand how this works.

### Initialise _Alexandria_

If you haven't done so already, navigate to your `alexandria-test/transcriptions` directory on the command line: 

    $ cd alexandria-test/transcriptions

Prepare the directory for usage by initialising _Alexandria_:

    $ alexandria init

You're all set!

#### Interacting with _Alexandria_

Keep in mind that every command you give to _Alexandria_ starts with `alexandria` and is followed by what you want _Alexandria_ to do: register a new document, export a png file, define a new view, etc.

Most commands take one or two parameters. The parameter `-f` specifies a file (a TAGML transcription or a JSON view definition), `-n` the name under which you register a document or view, `-d` a registered document, and `-v` a registered view. The order of the parameters is insignificant.

### Register a document in _Alexandria_

The next step is to register "roud_ts_test1.tagml" as a document in the _Alexandria_ repository you just initialised on your machine. To make it easy for yourself, choose a convenient name for the document in line with the [naming conventions](http://nbviewer.jupyter.org/github/DiXiT-eu/collatex-tutorial/blob/master/unit1/Command_line.ipynb#File-naming-conventions).

    $ alexandria register-document -n roud1 -f roud_ts_test1.tagml

You have now registered the TAGML file `roud_ts_test1.tagml` under the name of `roud1`.
Let's repeat that for the second transcription:

    $ alexandria register-document -n roud2 -f roud_ts_test2.tagml

You have now registered the TAGML file `roud_ts_test2.tagml` under the name of `roud2`.
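With many transcriptions, typing each command becomes tedious. The Python sketch below builds the same `alexandria register-document -n ... -f ...` commands shown above and runs them with the standard `subprocess` module. It assumes that the `alexandria` script is on your `PATH` and that you run it from the folder where you ran `alexandria init`; the helper names are hypothetical.

```python
import subprocess


def register_cmd(name, tagml_path):
    """Argument list for `alexandria register-document -n NAME -f FILE`."""
    return ["alexandria", "register-document", "-n", name, "-f", tagml_path]


def register_all(documents):
    """Register every name -> TAGML file pair; stop at the first failing command."""
    for name, path in documents.items():
        # check=True raises CalledProcessError if the command exits with an error.
        subprocess.run(register_cmd(name, path), check=True)


# The names and files used in this section.
DOCUMENTS = {
    "roud1": "roud_ts_test1.tagml",
    "roud2": "roud_ts_test2.tagml",
}
```

Calling `register_all(DOCUMENTS)` would register both transcriptions in one go.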

At any time, you can check which documents and views are registered in _Alexandria_:

    $ alexandria info

_Alexandria_ will report its version and list your two registered documents, "roud1" and "roud2":

```
alexandria version 2.1-SNAPSHOT
build date: 2018-11-01T21:38:13Z

documents:
  roud1 (created:Wed Nov 14 10:40:00 CET 2018, modified:Wed Nov 14 10:40:01 CET 2018)
  roud2 (created:Wed Nov 14 10:40:22 CET 2018, modified:Wed Nov 14 10:40:22 CET 2018)

no views
```
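If you script your workflow, you may want to read the registered document names from this output. The Python function below is a sketch that assumes the exact layout shown above (a `documents:` line followed by indented entries and a blank line); that layout is not a documented interface and may change between versions.

```python
def registered_documents(info_output):
    """Return the document names listed under 'documents:' in `alexandria info` output.

    Assumes entries of the form '  NAME (created:..., modified:...)',
    terminated by a blank line, as in the sample output above.
    """
    names = []
    in_documents = False
    for line in info_output.splitlines():
        if line.strip() == "documents:":
            in_documents = True
        elif in_documents:
            if not line.strip():
                break  # a blank line ends the document list
            names.append(line.strip().split(" (", 1)[0])
    return names


# Example (assumes `alexandria` is on your PATH):
# import subprocess
# out = subprocess.run(["alexandria", "info"], capture_output=True, text=True).stdout
# print(registered_documents(out))
```

Applied to the sample output above, the function returns `['roud1', 'roud2']`.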


You have now completed the third step of the workflow (see the diagram above) and successfully checked in two documents. These are rather simple documents, with just one structure. Let's play around with TAGML's ability to express more than one structure by using the layer functionality.

### Add layers

A classic example of overlapping structures in markup is the material perspective versus the textual perspective. 

Open the file "roud_ts_test1.tagml" again and take a look at the typescript facsimile below.

<img src="images/CRLR_GR_MS1H16d_1r_1.png" width="400">

The TAGML file contains tags that represent the textual structure (_p > s_); we are going to add tags to represent the material structure (_page > line_). To avoid the conflicting overlap between elements, we create two layers. We'll give the layer with the textual markup the layer ID "T" and the layer with the material information the layer ID "M". In principle, you're free to choose any layer ID, but it's best to make it short, logical, and consistent.

You can start a layer anywhere in the TAGML file, but for now we create a new element called `layerdef` directly under the root `TAGML`, in which we define the two layers "T" and "M":

```
[TAGML>
[layerdef|+M,+T>
[! some text here !]
<layerdef]
<TAGML]
```

Keep in mind that a markup element in the text can be in one layer, in both, or in none. For more information about layers and the TAGML syntax, see the [Balisage article](http://www.balisage.net/Proceedings/vol21/html/HaentjensDekker01/BalisageVol21-HaentjensDekker01.html#overlapping_selfoverlapping_markup). The result of your tagging should look like the file "roud_ts_test3.tagml" in the `transcriptions` directory.

Once you're done transcribing, register the document in _Alexandria_:

    $ alexandria register-document -n roud3 -f roud_ts_test3.tagml

As a test, run `alexandria info` again. You now have three registered documents and no views.

### Define views on the text

We use the file "roud_ts_test3.tagml" as a starting point for the view definitions. Conceptually, views are selections of layers or markup elements. You specify the markup and layers you want to include in a view in a JSON file.

#### Markup

First, we create a view that includes only the markup element `s`.

In your Sublime Text editor (or another editor of your choice), create a new file and enter the following code:

```json
{"includeMarkup": ["s"]}
```

Save the file as "view-s-markup.json" in the `views` subfolder of your `alexandria-test` directory (create the subfolder if it doesn't exist yet).

Open a new JSON file to create another view that includes the markup elements `page` and `line`:

```json
{"includeMarkup": ["page", "line"]}
```

Save the JSON file as "view-page-line-markup.json" in the `views` directory.

#### Layers

Let's create another view that includes all markup with the layer ID "T".

Open a new JSON file and enter the following:

```json
{"includeLayer":["T"]}
```

Save the file as "view-layer-T.json" in the `views` directory.

Repeat for the layer ID "M".
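Instead of saving each file by hand, you could also generate the view files with a short Python sketch. It writes each definition with `json.dumps`, so every file is guaranteed to contain well-formed JSON. The file name `view-layer-M.json` for the "M" view is our own choice, as is the helper name `write_views`.

```python
import json
from pathlib import Path

# The view definitions from this section, keyed by file name.
VIEWS = {
    "view-s-markup.json": {"includeMarkup": ["s"]},
    "view-page-line-markup.json": {"includeMarkup": ["page", "line"]},
    "view-layer-T.json": {"includeLayer": ["T"]},
    "view-layer-M.json": {"includeLayer": ["M"]},  # file name assumed
}


def write_views(views_dir):
    """Write each view definition as a single JSON object; return the paths written."""
    views_dir = Path(views_dir)
    views_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for filename, definition in VIEWS.items():
        path = views_dir / filename
        path.write_text(json.dumps(definition) + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

For example, `write_views(Path.home() / "alexandria-test" / "views")` writes the four files into the `views` folder suggested earlier.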

#### Layers and Markup

You can also define views that include a layer and exclude specific markup, by combining both keys in a single JSON object:

```json
{"includeLayer": ["M"], "excludeMarkup": ["excerpt", "p"]}
```

Save this file as "view-incl-excl.json" in the `views` directory.
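How Alexandria combines these keys is best checked against its own output. As a mental model only, the Python sketch below assumes that an element is shown when it is in an included layer or included by name (or when the view includes nothing at all), unless it is excluded by name. The element list is a hypothetical stand-in for "roud_ts_test3.tagml", not its actual content.

```python
import json

# Hypothetical stand-in for the markup of roud_ts_test3.tagml:
# (markup name, set of layer IDs it belongs to), in document order.
ELEMENTS = [
    ("excerpt", {"M", "T"}),
    ("page", {"M"}),
    ("line", {"M"}),
    ("p", {"M", "T"}),
    ("s", {"T"}),
]


def apply_view(elements, view):
    """Return the markup names shown under `view`, in document order.

    Assumed semantics (not Alexandria's documented behaviour): an element is
    shown if it is in an included layer or included by name, or if the view
    includes nothing at all; excludeMarkup then removes elements by name.
    """
    include_layers = set(view.get("includeLayer", []))
    include_markup = set(view.get("includeMarkup", []))
    exclude_markup = set(view.get("excludeMarkup", []))
    include_all = not include_layers and not include_markup
    shown = []
    for name, layers in elements:
        included = include_all or name in include_markup or bool(layers & include_layers)
        if included and name not in exclude_markup:
            shown.append(name)
    return shown


# The layer-plus-exclusion view, parsed from a single JSON object.
VIEW_INCL_EXCL = json.loads('{"includeLayer": ["M"], "excludeMarkup": ["excerpt", "p"]}')
```

Under these assumptions, `apply_view(ELEMENTS, VIEW_INCL_EXCL)` returns `['page', 'line']`.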

You can define as many views on a document as you want.

### Register views in _Alexandria_

Let's register the view files we just created in _Alexandria_:

    $ alexandria define-view -n s-markup -f ../views/view-s-markup.json

You have now registered a view that can later be referred to by the name `s-markup`.

Repeat this step for the other views. Then check with `alexandria info` that all views are registered:

    $ alexandria info

### Check out a view

You can use the views you have just defined to check out a document:

    $ alexandria checkout -d roud3 -v s-markup

This generates a TAGML file from document "roud3" containing only text and `s` markup. The generated file is called "roud3-s-markup.tagml" and placed in the folder where you've initialised _Alexandria_ (most probably the `transcriptions` folder). Open the TAGML file in your Sublime Text editor.

You can repeat this step for the other views registered in _Alexandria_. 
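To check out a document under every view you registered, a loop like the Python sketch below would do. It reuses the `alexandria checkout -d ... -v ...` command shown above and assumes that generated files follow the `<document>-<view>.tagml` pattern of the "roud3-s-markup.tagml" example; that pattern is inferred from a single example. The helper names are hypothetical, and `alexandria` must be on your `PATH`.

```python
import subprocess
from pathlib import Path


def checkout_cmd(document, view):
    """Argument list for `alexandria checkout -d DOCUMENT -v VIEW`."""
    return ["alexandria", "checkout", "-d", document, "-v", view]


def expected_output(document, view, workdir="."):
    """Assumed location of the generated file: <workdir>/<document>-<view>.tagml."""
    return Path(workdir) / f"{document}-{view}.tagml"


def checkout_all(document, views, workdir="."):
    """Check out `document` under each view and return the expected file paths."""
    paths = []
    for view in views:
        # Run inside the folder where `alexandria init` was executed.
        subprocess.run(checkout_cmd(document, view), check=True, cwd=workdir)
        paths.append(expected_output(document, view, workdir))
    return paths
```

For example, `checkout_all("roud3", ["s-markup"])` should produce "roud3-s-markup.tagml" in the current folder.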

### Edit generated views

If you want, you can edit the file "roud3-s-markup.tagml". You can add some new markup or text. 

With the command  

    $ alexandria diff roud3-s-markup.tagml

you can see the changes that you made to this file. If you don't like the changes, you can revert them:

    $ alexandria revert roud3-s-markup.tagml
    
Otherwise, you can register the changed file in _Alexandria_ as a new document:

    $ alexandria register-document -n roud3-s-markup -f roud3-s-markup.tagml

### Export different data formats

You can also export your TAGML file to different data formats:

- svg
- png
- dot
- xml
- tagml

The command is `alexandria export-` followed by the format and the name of the document you want to export:

    $ alexandria export-xml -d roud1

This generates an XML file from the document "roud1", which you registered from "roud_ts_test1.tagml".

Note that the DOT, SVG, and PNG exports generate a MultiColored Tree, which, especially for a file with large amounts of markup and text, can be rather demanding for human readers.
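To export one document in all five formats, you can loop over the `export-<format>` commands. The Python sketch below only builds and runs the commands described in this section; it assumes `alexandria` is on your `PATH`, and the helper names are hypothetical.

```python
import subprocess

# The export formats listed in this section.
FORMATS = ["svg", "png", "dot", "xml", "tagml"]


def export_cmd(fmt, document):
    """Argument list for `alexandria export-FORMAT -d DOCUMENT`."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported export format: {fmt}")
    return ["alexandria", f"export-{fmt}", "-d", document]


def export_all(document, formats=FORMATS):
    """Run one export command per format; stop at the first failure."""
    for fmt in formats:
        subprocess.run(export_cmd(fmt, document), check=True)
```

`export_all("roud1")` would export the first transcription in every format.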


### Overview of commands

    alexandria init
    
    alexandria register-document -n <name doc> -f <path to TAGML file>
    
    alexandria info
    
    alexandria define-view -n <name view> -f <path to JSON file>
    
    alexandria checkout -d <name doc> -v <name view>
    
    alexandria diff <name file>
    
    alexandria revert <name file>
    
    alexandria export-<format> -d <name doc>
    
    alexandria import -d <name doc> -f <name file>
    
    alexandria -h