# Alexandria, a manual.

### A detailed, step-by-step guide to using _Alexandria_ for digital text editing. 

### Before you begin

#### Background knowledge

- make sure you are comfortable working on the [command line](http://nbviewer.jupyter.org/github/DiXiT-eu/collatex-tutorial/blob/master/unit1/Command_line.ipynb) 

- get familiar with [Jupyter Notebooks](http://nbviewer.jupyter.org/github/DiXiT-eu/collatex-tutorial/blob/master/unit1/Jupyter_notebook.ipynb)

#### Installation requirements

- check if you have [installed the alexandria package](link to package). **Important!** Make sure this notebook is running in the `alexandria-markup-server` folder. 

- install [Sublime Text 3](https://github.com/HuygensING/TAG/blob/develop/TAGML/syntax-hilite.README.md). In this tutorial we work with a number of example TAGML transcriptions (in the `/transcriptions` folder). We recommend you open them in Sublime Text 3, a cross-platform editor that has syntax highlighting for TAGML. It makes your work a lot easier. 

## 1. What is _Alexandria_?

### Introduction

Briefly put, _Alexandria_ is a text repository system in which you can store and edit documents. It is the reference implementation of TAG (Text-as-Graph), a flexible graph data model for text. Here, the term "reference implementation" means that _Alexandria_ implements all the properties of the TAG model. So by working with _Alexandria_ you'll learn more about TAG, and vice versa: if you want to experiment with the flexibility of the TAG model, _Alexandria_ is where you start. 

You'll quickly find out that _Alexandria_ is not just any text repository. It builds on the idea that a text can be studied from various perspectives. For example, you can model a text from a textual or a documentary perspective; you can focus on adding linguistic annotations to a text, or rather on information about the structure of its verses. In _Alexandria_, you can store those different perspectives on the same text. What is more, _Alexandria_ is built on the principles of a distributed architecture. It is a command line tool, and it can be integrated into your editor of choice.

By bringing together a wide range of information about a text in a structured and distributed way, _Alexandria_ facilitates the exchange and reuse of scholarly data. In short, _Alexandria_ is a powerful modeling instrument for digital textual research.

### Can I use it?

Yes, you can. Below we explain in detail how _Alexandria_ works and what you need to operate it. Keep in mind that both the TAG data model and the _Alexandria_ implementation are under development. This means that by using _Alexandria_ you will make a valuable contribution to the development process. We therefore encourage you to try it out and share your thoughts. 

### Why should I use it?

If you enjoy experimenting with data models and advanced text analysis, _Alexandria_ is the tool for you. If you're used to working with XML, it is highly enlightening to work with a data model in which you can easily model overlapping structures, discontinuous elements, and nonlinear text without having to resort to workarounds. 

## 2. Okay, tell me more...

### What you need to know

#### Perspectives, layers and views.

Structuring information has both a conceptual and technical side. The conceptual side concerns the theory, the analysis, or the research objective that forms the motivation for the markup. The technical side, then, refers to the specific markup tags you use to express that concept or theory. 

Let's first get our definitions straight. We've mentioned that researchers have different perspectives on a text. These have also been described as "orientations to text" and include (but are definitely not limited to) a material, linguistic, or genetic perspective. 

A perspective implies a certain structure of text. For example, if you're interested in the material aspects of a text, you may structure it as _page > section > paragraph > sentence_, which says something like: a page has one or more sections, which have one or more paragraphs, which have one or more sentences. If you'd rather study the linguistic aspects, you may follow the structure _phrase > sentence > word > morpheme_. You can think of this as adding layers of information to a text. 

**Layers** are a crucial concept in _Alexandria_. Briefly put, a layer classifies a set of markup elements. The elements may be grouped because, together, they represent a specific research perspective, or because they are added by a specific user. A layer is hierarchically structured, and a document may contain multiple layers. Markup elements can be in more than one layer. For example: the material structure and the linguistic structure outlined above could form a material layer and a linguistic layer, with the element _sentence_ being part of both layers. **TODO add image MCT of the two structures above as illustration**
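To make the idea of a shared element concrete, here is a toy model of the two structures above, assuming a Python notebook kernel. It is purely illustrative and not how _Alexandria_ stores layers: each layer is a dictionary that maps an element to its parent, so `sentence` appears in both layers with a different parent in each. `LAYERS`, `layers_of`, and `path_to_root` are hypothetical names.

```python
# Toy model for illustration only: each layer is a hierarchy stored as
# element -> parent element (None marks the root of that layer).
LAYERS = {
    "material": {"page": None, "section": "page", "paragraph": "section", "sentence": "paragraph"},
    "linguistic": {"phrase": None, "sentence": "phrase", "word": "sentence", "morpheme": "word"},
}


def layers_of(element):
    """Return the names of all layers that contain the element."""
    return [name for name, tree in LAYERS.items() if element in tree]


def path_to_root(element, layer):
    """Follow parent links within one layer, from the element up to that layer's root."""
    tree = LAYERS[layer]
    path = [element]
    while tree[path[-1]] is not None:
        path.append(tree[path[-1]])
    return path


# "sentence" belongs to both layers, with a different parent in each.
for layer in layers_of("sentence"):
    print(layer, ":", " > ".join(reversed(path_to_root("sentence", layer))))
```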

A **view** in _Alexandria_ is defined as a selection of markup tags and layers. You, the user, can define your own view(s) by indicating which markup elements or which markup layer(s) you do and don't want to see. The main objective of views is to increase the readability of a large file with lots of markup.  

The difference between layers and views may be confusing at first: if you create a view called "linguistic" that shows only the "linguistic" layer, they are synonymous. But you can also define a linguistic view by selecting a number of markup elements from the linguistic layer, combined with some other elements. It's up to you. 

In conclusion: a perspective is your orientation to the text. It is related to your expertise, your interpretation of the text, and the objective of your study. It's your motive for making a transcription. A layer is a technique for dealing with (self)overlapping structures in your transcription or for identifying the markup added by users in a collaborative editing process. A view, finally, is a user-defined set of markup and/or layers. If this is all too abstract for you, don't worry. It should become clear once we start working with _Alexandria_.

### Workflow

Take a look at the following workflow.

<img src="images/workflow-alexandria.png"/>

In the upper row, reading from left to right, you see several user actions, from creating a transcription to defining a view.  

The second row contains the commands associated with these user actions (`$ alexandria register document`, `$ alexandria define view`, etc.); the third row shows a visual representation of what these actions produce (a document is created, a view is defined, etc.).  

The last row symbolises the local repository that you create on your machine when you work with Alexandria. The repo contains all your transcriptions (document instances) and views.  

If you've worked with [git](https://www.atlassian.com/git/tutorials) or [GitHub](https://github.com/), this workflow may seem familiar to you: there's a similar process of initialising a workspace, uploading files, and checking out files. In contrast to git, though, the Alexandria workflow doesn't have a remote repository yet. You work on your local machine, so make sure to save your files appropriately. 

### Markup language

We've developed a markup language called TAGML that allows you to express these and other textual features in a native way. So, if you want to make optimal use of TAG's graph data model for text and the layer-functionality of _Alexandria_, you can transcribe your text in TAGML.

#### Data models
A markup language is basically a serialisation of a model. In other words, this XML sentence `<root><s>The sun's not yellow</s></root>` is a serialisation of the OHCO model behind it: 

<img src="images/ohco-sun.png" width="300" height="300">

So the two tags `<s>` and `</s>` together represent one `s` node in the OHCO tree.  

Perhaps we'd like to indicate with markup that the word `sun's` is actually a contraction of the words `sun` and `is`, for example by tagging them with `expan`. As we create a new element node and push down two text nodes one level, we rearrange the document hierarchy: 

<img src="images/ohco-sun2.png" width="400" height="400">
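To see the rearranged hierarchy for yourself, here is a small sketch using Python's standard `xml.etree.ElementTree` module, assuming a Python notebook kernel. The helper `outline` is our own illustration, not part of _Alexandria_: it lists element names and non-blank text nodes, indented by depth, so you can compare the tree before and after adding `expan`.

```python
import xml.etree.ElementTree as ET


def outline(element, depth=0):
    """Return the tree as indented lines: element names, plus non-blank text nodes
    (an element's own text and the tail text that follows each child)."""
    indent = "  " * depth
    lines = [indent + element.tag]
    if element.text and element.text.strip():
        lines.append(indent + "  " + repr(element.text))
    for child in element:
        lines.extend(outline(child, depth + 1))
        if child.tail and child.tail.strip():
            lines.append(indent + "  " + repr(child.tail))
    return lines


before = ET.fromstring("<root><s>The sun's not yellow</s></root>")
after = ET.fromstring("<root><s>The <expan>sun is</expan> not yellow</s></root>")

print("\n".join(outline(before)))
print()
print("\n".join(outline(after)))
```

In the second outline, the text `sun is` sits one level deeper than the rest of the sentence, below the new `expan` element node.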

This may be basic knowledge to you, but when you are encoding a text it helps to think about the underlying model. TAG's graph model requires a new markup language with which you can tag complex textual occurrences: different coexisting structures, discontinuous elements, etc. 

There are at least three important points on which TAGML diverges from XML: 

- it has asymmetrical tags: `[markup> some text <markup]`;
- you can use it to express not just strings and markup, but other data types as well, like numbers or Boolean values;
- you can nest annotations.
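The first point, asymmetrical tags, is easy to see in a small sketch. The tokenizer below is our own simplification, assuming a Python notebook kernel, and not the TAGML grammar used by _Alexandria_: it only recognises `[name>` and `<name]`, skips any `|...` suffix, and ignores annotations, comments, and other TAGML constructs. `TAG` and `tokens` are hypothetical names.

```python
import re

# Simplified, hypothetical pattern for TAGML's two asymmetrical tag shapes:
# an open tag "[name>" and a close tag "<name]" (an optional "|..." suffix is skipped).
TAG = re.compile(r"\[(?P<open>\w+)(?:\|[^>]*)?>|<(?P<close>\w+)(?:\|[^\]]*)?\]")


def tokens(tagml):
    """Split TAGML into ("open", name), ("close", name) and ("text", str) tokens.
    Surrounding whitespace of text tokens is stripped to keep the output short."""
    result = []
    position = 0
    for match in TAG.finditer(tagml):
        text = tagml[position:match.start()].strip()
        if text:
            result.append(("text", text))
        if match.group("open"):
            result.append(("open", match.group("open")))
        else:
            result.append(("close", match.group("close")))
        position = match.end()
    tail = tagml[position:].strip()
    if tail:
        result.append(("text", tail))
    return result


print(tokens("[s> The sun's not yellow <s]"))
```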

You can find detailed information about the TAGML syntax in its [documentation](https://github.com/HuygensING/TAG/tree/master/TAGML) and in a [Balisage article](http://www.balisage.net/Proceedings/vol21/html/HaentjensDekker01/BalisageVol21-HaentjensDekker01.html#d7633e16) on the topic.

#### Layers

In TAGML you can easily indicate to which layer(s) a markup element belongs. 

Let's return to the example phrase _The sun's not yellow_, now expressed in TAGML:

    [root>
        [s> The sun's not yellow <s]
    <root]

You can tag the expansion of `sun's` using similar markup as in XML:

    [root>
        [s> The [expan>sun is<expan] not yellow <s]
    <root]

Let's say you also want to add some linguistic information about the negation construction `is not`:
    
    [root>
        [s> The [expan>sun [phr>is<expan] not<phr] yellow <s]
    <root]

However, this results in two overlapping elements `expan` and `phr`. To avoid this overlap, you create two layers. You give each layer a unique layer ID: "L1" for the annotation about the expansion; "L2" for the linguistic annotation. You start a new layer with the `+` sign; markup elements can be in more than one layer:

    [root|+L1, +L2>
        [s|L1,L2> The [expan|L1>sun [phr|L2>is<expan] not<phr] yellow<s]
    <root]

Because each layer is hierarchically ordered, you have now created two trees that share some markup and text nodes: 

<img src="images/demo.png" width="400" height="400">

Here, the L1 tree is visualised in red and the L2 tree in blue. You can see that the word `is` has two parents: the markup node `expan` and the markup node `phr`. The overlap between `expan` and `phr` no longer causes a conflict.
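Put differently, the layered TAGML above is well nested within each layer, even though `expan` and `phr` overlap. The sketch below checks exactly that property. It is a hypothetical helper, not _Alexandria_'s parser: it uses a simplified regular expression for tags, assumes a close tag belongs to the layers of the most recent open tag with the same name, and puts markup without a layer suffix in a single default layer.

```python
import re

# Simplified, hypothetical tag pattern: "[name>" or "[name|layers>" opens,
# "<name]" closes (any "|..." suffix on a close tag is ignored).
TAG = re.compile(r"\[(?P<open>\w+)(?:\|(?P<layers>[^>]*))?>|<(?P<close>\w+)(?:\|[^\]]*)?\]")

DEFAULT_LAYER = ""  # stands for markup that has no layer suffix


def nests_per_layer(tagml):
    """Return True if, within every layer, tags close in reverse order of opening."""
    stacks = {}       # layer ID -> stack of currently open element names
    open_layers = {}  # element name -> layers of its most recent open tag
    for match in TAG.finditer(tagml):
        if match.group("open"):
            name, spec = match.group("open"), match.group("layers")
            # "+L1, +L2" -> ["L1", "L2"]; the "+" only marks where a layer starts
            layers = [part.strip().lstrip("+") for part in spec.split(",")] if spec else [DEFAULT_LAYER]
            open_layers[name] = layers
            for layer in layers:
                stacks.setdefault(layer, []).append(name)
        else:
            name = match.group("close")
            for layer in open_layers.get(name, [DEFAULT_LAYER]):
                stack = stacks.get(layer, [])
                if not stack or stack[-1] != name:
                    return False  # overlap within this layer
                stack.pop()
    return all(not stack for stack in stacks.values())


flat = "[root>[s> The [expan>sun [phr>is<expan] not<phr] yellow <s]<root]"
layered = "[root|+L1, +L2>[s|L1,L2> The [expan|L1>sun [phr|L2>is<expan] not<phr] yellow<s]<root]"
print(nests_per_layer(flat))     # expan and phr overlap in the single default layer
print(nests_per_layer(layered))  # each layer on its own is a proper tree
```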

If you want, you can now define two views on this TAGML document: one view that shows only the L1 layer and one view that shows only the L2 layer.

This is a very basic example. When you actually start transcribing a text in TAGML, the number of tags and layers in a document grows quickly, which can seriously hamper its human legibility. All the more reason for having views!  

## 3. I want to use it!

Great! Let's get to work.

### What you need to know

#### Command line

_Alexandria_ is a command line tool. In practice this means that it doesn't have a graphical user interface: you run _Alexandria_ from your command line (sometimes also called the shell, the terminal, or the command prompt) and interact with it by typing instructions and then pressing the Enter key. Not just any instructions, of course: the command line is very particular about how and what you tell it. 

You can, to some extent, execute shell commands from a Jupyter Notebook. For this to work, it's important that the notebook is running within the `alexandria` directory. **TODO check name of directory**

#### Interacting with _Alexandria_ 

Keep in mind that every command you give to _Alexandria_ starts with `alexandria` and is followed by what you want _Alexandria_ to do: add a new document, export a png file, commit a new view, etc.

These commands are much the same as the commands you give to `git`. Accordingly, the editorial workflow of _Alexandria_ is similar to the `git` workflow. You don't have to be familiar with `git` in order to work with _Alexandria_, but it helps to understand the difference between commands like `add` and `commit`. If you want to know more about it there's an abundance of tutorials online, for example this [cheat sheet](https://gist.github.com/hofmannsven/6814451).

### 3.1. Create a transcription

**TODO: check with Elena whether we may use Roud**

For the purpose of testing, we have created a number of simple TAGML files. The transcriptions are all based on one typescript (CRLR_GR_MS1H16d_1r) of Gustave Roud's _Requiem_ (courtesy of the [Centre de recherches sur les lettres romandes](https://www.unil.ch/crlr/home/menuinst/projets-de-recherche/gustave-roud-oeuvres-completes.html), Université de Lausanne). 

Let's take a look at the contents of the `transcriptions` folder:

In [None]:
! ls transcriptions/

The folder contains three short TAGML transcriptions of the first two paragraphs of a typescript from Roud's _Requiem_. You can open one of them in Sublime Text to become more familiar with the text and the tagging:

In [None]:
! open -a "Sublime Text" transcriptions/roud_ts_test1.tagml

### 3.2. Initialise _Alexandria_

First, prepare your working directory by initialising _Alexandria_:

In [None]:
! alexandria init

### 3.3. Register a document in _Alexandria_

The next step is to register "roud_ts_test1.tagml" as a document in the _Alexandria_ repository you just initialised on your machine. To make things easy for yourself, choose a convenient name for the document in line with the [naming conventions](http://nbviewer.jupyter.org/github/DiXiT-eu/collatex-tutorial/blob/master/unit1/Command_line.ipynb#File-naming-conventions).

In [None]:
! alexandria add transcriptions/roud_ts_test1.tagml

You have now added the TAGML file "roud_ts_test1.tagml".
Let's repeat that for the second transcription:

In [None]:
! alexandria add transcriptions/roud_ts_test2.tagml

You have now added the TAGML file "roud_ts_test2.tagml".

At all times, you can check which documents and views are registered in _Alexandria_:

In [None]:
! alexandria info

_Alexandria_ will inform you about its version and that you have added two documents.

Adding files to _Alexandria_ means you have registered them with your local repository, but mind you: the files are not yet _committed_ to it. Let's do that now: 

In [None]:
! alexandria commit roud_ts_test1.tagml
! alexandria commit roud_ts_test2.tagml

**NOTE** `commit` like this? Or rather as in `git`: `! alexandria commit -m "<commit message>"`

When you commit a TAGML file to your local _Alexandria_ repository, it is automatically parsed as well. So, if you made any syntax mistakes in your transcription, _Alexandria_ will abort the commit and inform you of the syntax error.

**NOTE** After correcting the error, does the user need to `add` the TAGML file again, or can she directly go to the commit?

You have now completed the third step of the workflow (see the diagram above) and successfully added and committed two TAGML documents. These are rather simple documents, each with just one document structure. In the following sections we'll play around with TAGML's ability to express more than one structure by using layers.

### 3.4. Add layers

A classic example of overlapping structures in markup is the material perspective versus the textual perspective. 

Open the file "roud_ts_test1.tagml" again and take a look at the typescript facsimile below.

<img src="images/CRLR_GR_MS1H16d_1r_1.png" width="400" height="400">

The TAGML file contains tags that represent the textual structure (a _paragraph_ element with one or more _sentence_ element children); we also want to represent the material structure (a _page_ element with one or more _line_ element children). 

To avoid the conflicting overlap between _sentence_ and _line_ elements, we've created two layers. The layer with the textual markup has the layer identifier "T"; the layer with the material information has the layer identifier "M". You are free to choose any layer ID, but it's best to keep it short, logical, and consistent. 

You can start a layer anywhere in the TAGML file, but for clarity we've created a new element called `layerdef` directly under the root element `TAGML`, in which we define the two layers "T" and "M":

```
[TAGML>
    [layerdef|+M,+T>

    [! some text here !]

    <layerdef]
<TAGML]
```

Keep in mind that a markup element in the text can be in either one layer, in both, or in none. For more information about the layers and the TAGML syntax, see the [documentation](http://www.balisage.net/Proceedings/vol21/html/HaentjensDekker01/BalisageVol21-HaentjensDekker01.html#overlapping_selfoverlapping_markup). 

Check out the file "roud_ts_test3.tagml" in the `transcriptions/` directory:

In [None]:
! open -a "Sublime Text" transcriptions/roud_ts_test3.tagml

In [None]:
! alexandria add transcriptions/roud_ts_test3.tagml

In [None]:
! alexandria commit roud_ts_test3.tagml

**NOTE** Same question here: `! alexandria commit -m "<commit message>"` or rather `! alexandria commit <filename>`?

You have now registered three documents in your local _Alexandria_ repository. Check this by running:

In [None]:
! alexandria info

### 3.5. Define views on the text

Conceptually, a view is a selection of layers or markup elements. In a JSON file, you specify the markup and layers you want to include in the view. We use the file "roud_ts_test3.tagml" as a starting point for the view definitions. 

#### 3.5.1. Markup

First, we create a view that includes only the markup element `s`.

Make a new JSON file in the `views/` directory:

In [None]:
! touch views/view-s-markup.json

Open the file in your Sublime Text editor:

In [None]:
! open -a "Sublime Text" views/view-s-markup.json

# or an editor of your choice:
# ! open -a "<name editor>" views/view-s-markup.json

Enter the following lines of code in the JSON file:

```json
{"includeMarkup": ["s"]}
```

Save and close the file. 

Open a new JSON file to create another view that includes the markup elements `page` and `line`:

In [None]:
! touch views/view-page-line-markup.json
! open -a "Sublime Text" views/view-page-line-markup.json

Enter the following line and save the file:

```json
{"includeMarkup": ["page", "line"]}
```

#### 3.5.2. Layers

Let's create another view that includes all markup with the layer ID "T".

Open a new JSON file:

In [None]:
! touch views/view-lT.json
! open -a "Sublime Text" views/view-lT.json

Enter the following code:

```json
{"includeLayer":["T"]}
```

Repeat for the layer ID "M".

In [None]:
! touch views/view-lM.json
! open -a "Sublime Text" views/view-lM.json

Enter the following code:

```json
{"includeLayer":["M"]}
```

Save and close the file.
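Instead of creating each file with `touch` and an editor, you can also write the view definitions from a notebook cell. This is a minimal sketch, assuming a Python kernel: it writes the four definitions from this section as JSON files, using the file names from the cells above. `VIEWS` and `write_views` are our own hypothetical names, not part of _Alexandria_.

```python
import json
from pathlib import Path

# The view definitions from this section, keyed by the file names used above.
VIEWS = {
    "view-s-markup.json": {"includeMarkup": ["s"]},
    "view-page-line-markup.json": {"includeMarkup": ["page", "line"]},
    "view-lT.json": {"includeLayer": ["T"]},
    "view-lM.json": {"includeLayer": ["M"]},
}


def write_views(directory):
    """Write every definition in VIEWS as a JSON file in `directory`
    (creating it if needed, overwriting existing files) and return the paths."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for file_name, definition in VIEWS.items():
        path = directory / file_name
        path.write_text(json.dumps(definition) + "\n", encoding="utf-8")
        paths.append(path)
    return paths


for path in write_views("views"):
    print("wrote", path)
```

Note that `write_views` overwrites files with the same names, so run it before you start editing the views by hand, not after.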

#### 3.5.3. Layers and Markup

You can also define views that include a layer and exclude markup:

In [None]:
! touch views/view-incl-excl.json
! open -a "Sublime Text" views/view-incl-excl.json

Enter the following code and save the file:

```json
{"includeLayer":["M"], "excludeMarkup":["excerpt", "p"]}
```

You can define as many views on a document as you want.
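This tutorial does not spell out how _Alexandria_ combines `includeLayer`, `includeMarkup`, and `excludeMarkup`, so the following is only a sketch of one plausible reading: a view keeps the markup it names plus every element in the layers it names (or all markup if it names neither), and then drops the excluded markup. The element-to-layer assignment in `MARKUP` is a hypothetical stand-in for "roud_ts_test3.tagml", loosely based on section 3.4 (textual markup in "T", material markup in "M"); `visible_markup` is likewise our own name.

```python
# Hypothetical element -> layer assignment for "roud_ts_test3.tagml", following
# section 3.4: textual markup in layer "T", material markup in layer "M".
MARKUP = {
    "p": {"T"},
    "s": {"T"},
    "page": {"M"},
    "line": {"M"},
    "excerpt": set(),  # assumed here to belong to no layer
}


def visible_markup(view, markup=MARKUP):
    """Return the element names a view keeps, under assumed semantics:
    keep the named markup plus every element in a named layer (or all markup
    if the view names neither), then drop the excluded markup."""
    include_markup = set(view.get("includeMarkup", []))
    include_layers = set(view.get("includeLayer", []))
    if include_markup or include_layers:
        kept = {name for name, layers in markup.items()
                if name in include_markup or layers & include_layers}
    else:
        kept = set(markup)
    return kept - set(view.get("excludeMarkup", []))


for view in ({"includeMarkup": ["s"]},
             {"includeLayer": ["M"]},
             {"includeLayer": ["T"], "excludeMarkup": ["p"]}):
    print(view, "->", sorted(visible_markup(view)))
```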

### 3.6. Register views in _Alexandria_

Let's register the views we just created in our local repository. They should all be ready, so let's check with _Alexandria_ by running `alexandria info` again: 

In [None]:
! alexandria info

**NOTE** what does this command give? A list of the files that are new and/or changed? 

In [None]:
! ls views/

_Alexandria_ automatically treats files with the `.json` extension as views.

In [None]:
! alexandria add views/view-s-markup.json

The name of the view is derived from the name of the file, so "view-s-markup.json" will become `view-s-markup`.

Repeat this step for the other views. Check with `alexandria info` if everything works:

In [None]:
! alexandria info
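If you have many view files, you could register them in one go from the notebook. The sketch below is a convenience of our own, not an _Alexandria_ feature, and assumes a Python kernel: it builds one `alexandria add <file>` command per `.json` file in `views/` (the same command used above), derives the view name by dropping the `.json` extension as described above, and only calls the `alexandria` executable through `subprocess` if it is on your PATH. `add_view_commands` and `add_all_views` are hypothetical names.

```python
import shutil
import subprocess
from pathlib import Path


def add_view_commands(directory="views"):
    """Return (view name, command) pairs: one `alexandria add <file>` per .json file.
    The view name is the file name without its .json extension."""
    return [(path.stem, ["alexandria", "add", str(path)])
            for path in sorted(Path(directory).glob("*.json"))]


def add_all_views(directory="views"):
    """Run every command from add_view_commands, if alexandria is on the PATH."""
    if shutil.which("alexandria") is None:
        raise RuntimeError("the alexandria command was not found on the PATH")
    for name, command in add_view_commands(directory):
        print("adding view", name)
        subprocess.run(command, check=True)


# Preview the commands first; call add_all_views() to actually run them.
for name, command in add_view_commands():
    print(name, "<-", " ".join(command))
```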

### 3.7. Checkout a view

You can use the views you have just defined to check out a document:

In [None]:
! alexandria checkout -d roud3 -v s-markup

This generates a TAGML file from document "roud3" containing only text and `s` markup. The generated file is called "roud3-s-markup.tagml" and placed in the folder where you've initialised _Alexandria_. Open the TAGML file in your Sublime Text editor:

In [None]:
! open -a "Sublime Text" roud3-s-markup.tagml

You can repeat these steps for the other views you have created. Play around with it for a while.

### 3.8. Edit generated views

Open the file "roud3-s-markup.tagml" again:

In [None]:
! open -a "Sublime Text" roud3-s-markup.tagml

To be clear: you are now at the sixth step of the workflow. This means you have checked out a document from the repository using a certain view. If you want, you can edit the file by adding or removing markup or text. When you have finished editing and saved your changes, you can compare the edited file with the one you checked out:

In [None]:
! alexandria diff roud3-s-markup.tagml

If you don't like the changes, you can revert them:

In [None]:
! alexandria revert roud3-s-markup.tagml

Otherwise, you can commit the changed file to _Alexandria_:

In [None]:
! alexandria register-document -n roud3-s-markup -f transcriptions/roud3-s-markup.tagml

### 3.9. Export different data formats

You can also export your TAGML file to different data formats:

- svg
- png
- dot
- xml
- tagml

The command is `alexandria export-` followed by the format and the name of the document you want to export. For example:

In [None]:
! alexandria export-xml -d roud1

This generates an XML file of the "roud_ts_test1.tagml" file. 
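Because the export commands follow a single pattern, you could also generate all formats in a loop. This sketch assumes that every format in the list above is exported with `alexandria export-<format> -d <document>`, exactly like the `export-xml` example; only `export-xml` is shown in this tutorial, so check the other command names against your version of _Alexandria_. `FORMATS`, `export_commands`, and `export_all` are hypothetical names.

```python
import shutil
import subprocess

FORMATS = ["svg", "png", "dot", "xml", "tagml"]


def export_commands(document, formats=FORMATS):
    """Build one `alexandria export-<format> -d <document>` command per format
    (assumed to follow the same pattern as the export-xml example)."""
    return [["alexandria", f"export-{fmt}", "-d", document] for fmt in formats]


def export_all(document):
    """Run every export command, if alexandria is on the PATH."""
    if shutil.which("alexandria") is None:
        raise RuntimeError("the alexandria command was not found on the PATH")
    for command in export_commands(document):
        subprocess.run(command, check=True)


# Preview the commands; call export_all("roud1") to actually run them.
for command in export_commands("roud1"):
    print(" ".join(command))
```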

Note that the DOT, SVG, and PNG exports produce a MultiColored Tree (MCT), which can be rather demanding for human readers, especially for a file with large amounts of markup and text.

**TODO: check if the export to XML works for roud3 (which has two layers)**

## 4. Overview of commands

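The commands used in this tutorial:

- `alexandria init`: initialise _Alexandria_ in your working directory.
- `alexandria add <file>`: register a TAGML transcription or a JSON view definition.
- `alexandria commit <file>`: commit a registered file to your local repository; TAGML files are parsed on commit.
- `alexandria info`: show the _Alexandria_ version and the registered documents and views.
- `alexandria checkout -d <document> -v <view>`: generate a TAGML file of a document containing only what the view selects.
- `alexandria diff <file>`: compare an edited file with the version you checked out.
- `alexandria revert <file>`: undo your changes to a checked-out file.
- `alexandria register-document -n <name> -f <file>`: register a (changed) TAGML file as a document under the given name.
- `alexandria export-<format> -d <document>`: export a document to one of the formats listed in section 3.9.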