<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc" style="margin-top: 1em;"><ul class="toc-item"><li><span><a href="#Start-up" data-toc-modified-id="Start-up-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Start up</a></span></li></ul></div>

<img align="left" src="images/P005381-obverse-photo.png" width="15%"/>
<img align="left" src="images/P005381-obverse-lineart-annot.png" width="15%"/>
<img align="right" src="images/P005381-reverse-photo.png" width="15%"/>
<img align="right" src="images/P005381-reverse-lineart.png" width="15%"/>

<p>
```
&P005381 = MSVO 3, 70
```
</p>
<p>
<img src="images/P005381-obverse-atf.png" width="40%"/>
<img src="images/P005381-reverse-atf.png" width="40%"/>
</p>

<img align="right" src="images/tf-small.png"/>


# Collation

We want to gain insight into the co-occurrence of signs on tablets in the 
[Uruk III/IV](http://cdli.ox.ac.uk/wiki/doku.php?id=proto-cuneiform)
corpus (4000-3100 BC).
These tablets have a poor archival context: they come from rubbish pits, and may have been brought there
from places other than where they were excavated.

To learn more about their chronology and context, we need to study the evolution of
the signs on the tablets. Collation is one of the prerequisites for doing so.

We have downloaded the transcriptions from the 
**Cuneiform Digital Library Initiative**
[CDLI](https://cdli.ucla.edu),
and converted them to
[Text-Fabric](https://github.com/Dans-labs/text-fabric).
Read more about the details of the conversion in the
[checks](checks.ipynb) notebook.
For an introduction to Text-Fabric, follow the
[start](start.ipynb) tutorial.

A handy feature reference is in the [docs](https://github.com/Dans-labs/Nino-cunei/blob/master/docs/transcription.md).

The tutorial ended with a first exercise in collation, where we collated pairs of signs
that co-occur on tablets and used an unsophisticated distance measure.

We repeat that exercise, and proceed to refine the collation method step by step.
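
As a reminder of what collating co-occurring sign pairs amounts to, here is a minimal sketch on invented toy data.
The tablet names and their contents are hypothetical, and the plain pair count used here is only an assumption for illustration;
it is not necessarily the distance measure used in the tutorial.

```python
import collections
import itertools

# Hypothetical toy data: tablet name -> signs occurring on that tablet
toy_tablets = {
    'tabletA': ['GAL', 'SANGA', 'EN'],
    'tabletB': ['GAL', 'EN'],
    'tabletC': ['SANGA', 'UDU'],
}


def count_pairs(tablets):
    """Count, for each unordered pair of signs, on how many tablets both occur."""
    counts = collections.Counter()
    for signs in tablets.values():
        # sorted(set(...)) makes every pair appear once per tablet, in a fixed order
        for pair in itertools.combinations(sorted(set(signs)), 2):
            counts[pair] += 1
    return counts


pair_counts = count_pairs(toy_tablets)
for (pair, n) in pair_counts.most_common():
    print(pair, n)
```

Pairs that share many tablets can then be treated as "close"; the refinements in this notebook concern how that notion of closeness is defined.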

# Authors

J. Cale Johnson and Dirk Roorda (see the 
[README](https://github.com/Dans-labs/Nino-cunei)
of this repository).

## Start up

We import the Python modules we need.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import sys, os, collections
from tf.fabric import Fabric

We set up our working locations on the file system.

In [3]:
REPO = '~/github/Dans-labs/Nino-cunei'
SOURCE = 'uruk'
VERSION = '0.1'
CORPUS = f'{REPO}/tf/{SOURCE}/{VERSION}'
SOURCE_DIR = os.path.expanduser(f'{REPO}/sources/cdli')
PROGRAM_DIR = os.path.expanduser(f'{REPO}/programs')
TEMP_DIR = os.path.expanduser(f'{REPO}/_temp')
REPORT_DIR = os.path.expanduser(f'{REPO}/reports')
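
Note that `CORPUS` is built from the unexpanded `REPO`, so it still starts with `~`, whereas the other four paths go through `os.path.expanduser`.
The load log further down shows an absolute path, which suggests that Text-Fabric expands the `~` itself; that is an inference from the log, not something verified here.
A minimal sketch of the difference (the lowercase variable names are illustrative only):

```python
import os

REPO = '~/github/Dans-labs/Nino-cunei'

# Same shape as CORPUS above: the leading ~ is kept as-is
corpus = f'{REPO}/tf/uruk/0.1'

# What the *_DIR variables get: ~ replaced by the user's home directory
expanded = os.path.expanduser(corpus)

print(corpus)
print(expanded)
```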

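We add the programs directory to the module search path, so that we can import the `Cunei` class from the `cunei` module.

In [4]:
sys.path.append(PROGRAM_DIR)
from cunei import Cunei

We create the temporary and report directories if they do not exist already.

In [5]:
for cdir in (TEMP_DIR, REPORT_DIR):
    os.makedirs(cdir, exist_ok=True)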

In [6]:
TF = Fabric(locations=[CORPUS], modules=[''], silent=False)

This is Text-Fabric 3.2.1
Api reference : https://github.com/Dans-labs/text-fabric/wiki/Api
Tutorial      : https://github.com/Dans-labs/text-fabric/blob/master/docs/tutorial.ipynb
Example data  : https://github.com/Dans-labs/text-fabric-data

32 features found and 0 ignored


We let Text-Fabric load most of the data features of the corpus.

Here is the full
[documentation](https://github.com/Dans-labs/Nino-cunei/blob/master/docs/transcription.md)
of the features.
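
The feature list passed to `TF.load` in the next cell is a whitespace-separated string.
To see what "most" means, the sketch below splits that same string and compares the number of names with the 32 features reported in the start-up message above.
It only counts names; it does not check that each name is actually among the 32 features found.

```python
# The same feature names as in the load call, copied verbatim
requested = '''
    grapheme prime repeat
    variant variantOuter
    modifier modifierInner modifierFirst
    damage uncertain remarkable written
    period name type identifier catalogId
    number fullNumber origNumber badNumbering
    crossref text
    srcLn srcLnNum
    op sub comments'''.split()

FEATURES_FOUND = 32  # taken from the "32 features found" message

print(f'{len(requested)} features requested out of {FEATURES_FOUND} found')
```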

In [7]:
api = TF.load('''
    grapheme prime repeat
    variant variantOuter
    modifier modifierInner modifierFirst
    damage uncertain remarkable written
    period name type identifier catalogId
    number fullNumber origNumber badNumbering
    crossref text
    srcLn srcLnNum
    op sub comments''')
api.makeAvailableIn(globals())
CUNEI = Cunei(api)

  0.00s loading features ...
   |     0.00s B catalogId            from /Users/dirk/github/Dans-labs/Nino-cunei/tf/uruk/0.1
   |     0.02s B fullNumber           from /Users/dirk/github/Dans-labs/Nino-cunei/tf/uruk/0.1
   |     0.02s B number               from /Users/dirk/github/Dans-labs/Nino-cunei/tf/uruk/0.1
   |     0.06s B grapheme             from /Users/dirk/github/Dans-labs/Nino-cunei/tf/uruk/0.1
   |     0.05s B srcLn                from /Users/dirk/github/Dans-labs/Nino-cunei/tf/uruk/0.1
   |     0.03s B srcLnNum             from /Users/dirk/github/Dans-labs/Nino-cunei/tf/uruk/0.1
   |     0.00s B prime                from /Users/dirk/github/Dans-labs/Nino-cunei/tf/uruk/0.1
   |     0.01s B repeat               from /Users/dirk/github/Dans-labs/Nino-cunei/tf/uruk/0.1
   |     0.01s B variant              from /Users/dirk/github/Dans-labs/Nino-cunei/tf/uruk/0.1
   |     0.00s B variantOuter         from /Users/dirk/github/Dans-labs/Nino-cunei/tf/uruk/0.1
   |     0.00s B modi