
# Checks
Various checks on the correctness of the transformation from ASCII transcriptions to a Text-Fabric data set.

We run *grep* commands on the source files, and we traverse nodes in Text-Fabric to collect the same information.

Then we compare the two sets of results.
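The comparison step can be pictured with a small sketch. This is not the `Compare` class from `utils`, whose internals are not shown here. The function name `compare_results` and the sample lines are hypothetical, and stripping trailing whitespace before comparing is an assumption of this sketch.

```python
from collections import Counter


def compare_results(tf_lines, grep_lines):
    """Compare two collections of result lines as multisets.

    Returns (only_in_tf, only_in_grep): sorted lists of the lines that
    occur more often on one side than on the other.
    """
    # Assumption of this sketch: trailing whitespace is not significant.
    tf_counts = Counter(line.rstrip() for line in tf_lines)
    grep_counts = Counter(line.rstrip() for line in grep_lines)
    only_in_tf = sorted((tf_counts - grep_counts).elements())
    only_in_grep = sorted((grep_counts - tf_counts).elements())
    return only_in_tf, only_in_grep


# Hypothetical result lines, for illustration only.
tf_side = ['uruk-iii 1: 1.a. A', 'uruk-iii 2: 1.b. B ']
grep_side = ['uruk-iii 2: 1.b. B', 'uruk-iii 3: 2.a. C']

only_tf, only_grep = compare_results(tf_side, grep_side)
print('only in TF:  ', only_tf)
print('only in GREP:', only_grep)
```

Using multisets rather than plain sets means that a line that occurs twice on one side and once on the other still shows up as a difference.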

In [1]:
import sys, os, collections, re
from glob import glob
from tf.fabric import Fabric
from utils import Compare

In [2]:
REPO = '~/github/Dans-labs/nino-cunei'
SOURCE = 'uruk'
VERSION = '0.1'
CORPUS = f'{REPO}/tf/{SOURCE}/{VERSION}'

SOURCE_DIR = os.path.expanduser(f'{REPO}/sources/cdli')

TEMP_DIR = os.path.expanduser(f'{REPO}/_temp')

In [3]:
TF = Fabric(locations=[CORPUS], modules=[''], silent=False)
COMP = Compare(SOURCE_DIR)

This is Text-Fabric 3.1.5
Api reference : https://github.com/Dans-labs/text-fabric/wiki/Api
Tutorial      : https://github.com/Dans-labs/text-fabric/blob/master/docs/tutorial.ipynb
Example data  : https://github.com/Dans-labs/text-fabric-data

24 features found and 0 ignored


In [4]:
api = TF.load('''
    grapheme prime variant modifier
    damage uncertain remarkable written
    name number catalogId period
    srcLn srcLnNum
    op comments
''')
api.makeAvailableIn(globals())

  0.00s loading features ...
   |     0.00s B catalogId            from /Users/dirk/github/Dans-labs/nino-cunei/tf/uruk/0.1
   |     0.01s B number               from /Users/dirk/github/Dans-labs/nino-cunei/tf/uruk/0.1
   |     0.05s B grapheme             from /Users/dirk/github/Dans-labs/nino-cunei/tf/uruk/0.1
   |     0.04s B srcLn                from /Users/dirk/github/Dans-labs/nino-cunei/tf/uruk/0.1
   |     0.02s B srcLnNum             from /Users/dirk/github/Dans-labs/nino-cunei/tf/uruk/0.1
   |     0.00s B prime                from /Users/dirk/github/Dans-labs/nino-cunei/tf/uruk/0.1
   |     0.01s B variant              from /Users/dirk/github/Dans-labs/nino-cunei/tf/uruk/0.1
   |     0.00s B modifier             from /Users/dirk/github/Dans-labs/nino-cunei/tf/uruk/0.1
   |     0.01s B damage               from /Users/dirk/github/Dans-labs/nino-cunei/tf/uruk/0.1
   |     0.00s B uncertain            from /Users/dirk/github/Dans-labs/nino-cunei/tf/uruk/0.1
   |     0.00s B rema

## Tablets
We check whether we have the same set of tablet numbers.
In TF, the tablet number is stored in the feature `catalogId`.

--to be done --
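A possible shape for this check, as a sketch only. It assumes that each tablet in the source files starts with a header line such as `&P000001 = ...`, and that `catalogId` holds the `P` number. The names `TABLET_RE` and `tablet_ids_from_source` and all sample data are hypothetical.

```python
import re

# Assumption: a tablet header line looks like "&P000001 = ...".
TABLET_RE = re.compile(r'^&(P\d+)')


def tablet_ids_from_source(lines):
    """Collect the catalog ids found on tablet header lines."""
    ids = set()
    for line in lines:
        match = TABLET_RE.match(line)
        if match:
            ids.add(match.group(1))
    return ids


# Hypothetical source lines, for illustration only.
source_lines = [
    '&P000001 = hypothetical tablet',
    '@obverse',
    '1.a. 1(N01) , GAN2',
    '&P000002 = another hypothetical tablet',
]

# Hypothetical ids; in the notebook they would come from Text-Fabric,
# e.g. {F.catalogId.v(t) for t in F.otype.s('tablet')}.
tf_ids = {'P000001', 'P000003'}

src_ids = tablet_ids_from_source(source_lines)
missing_in_tf = src_ids - tf_ids
missing_in_source = tf_ids - src_ids
print('in source, not in TF:', sorted(missing_in_tf))
print('in TF, not in source:', sorted(missing_in_source))
```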

# Graphemes

## Primes

In [5]:
F.prime.freqList()

((1, 9),)

So there are only 9 graphemes with a prime. Which are they?

Now let us check the primes with grep, directly in the source files.
We look for lines that start with a (hierarchical) line number followed by whitespace,
and that later contain a single or double prime, but not one inside a grapheme name, such as `GA'AR`.
The pattern excludes the latter by requiring that the prime is not followed by an `A`.
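To see what the pattern accepts and rejects, here is a small sketch that applies it with Python's `re` module. Whether `COMP.checkSanity` uses `re` or an external grep is not shown here, so `re` is a stand-in. The first three sample lines are taken from the source lines reported further down. The last three are hypothetical.

```python
import re

# The same pattern that is passed to COMP.checkSanity, as a raw string.
PRIME_RE = re.compile(r'''^[a-zA-Z0-9.']+\s+.*['"][^A]''')

samples = [
    '3.b. 3(N41) 1(N24")# , [TAR~a] ',          # double prime on a numeral
    "1.b. 1(N30c') , ",                          # single prime on a numeral
    "5.a'. 5(N03) 2(N40) 1(N24') 1(N30~a) , ",   # primes in line number and numeral
    "1.a. 1(N01) , GA'AR",                       # hypothetical: prime inside a grapheme name
    "2.a'. 1(N01) , GAN2",                       # hypothetical: prime only in the line number
    '&P000001 = hypothetical header',            # hypothetical: not a numbered line
]

matches = [bool(PRIME_RE.search(s)) for s in samples]
for sample, matched in zip(samples, matches):
    print('match   ' if matched else 'no match', sample)
```

Only the first three lines match. A prime in the line number alone does not count, because the pattern looks for the prime after the whitespace that follows the line number.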

In [6]:
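def getPrimes():
    # Yield one line per grapheme with a prime:
    # the period, the source line number and the source line.
    for n in F.prime.s(1):
        # the tablet and the case that contain this grapheme
        t = L.u(n, otype='tablet')[0]
        case = L.u(n, otype='case')[0]
        yield f'{F.period.v(t)} {F.srcLnNum.v(case)}: {F.srcLn.v(case)}'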

In [7]:
COMP.checkSanity(
    r'''^[a-zA-Z0-9.']+\s+.*['"][^A]''',
    getPrimes,
)

Number of results: TF 9; GREP 9
DIFFERENT
----
TF
----

uruk-iii 48967: 3.b. 3(N41) 1(N24")# , [TAR~a] 
uruk-iii 49069: 1.b. [1(N40)?] 1(N24') , 
uruk-iii 49071: 2.b. 1(N40) 1(N24') , 
uruk-iii 49073: 3.b. 1(N40) 1(N24')# , 
uruk-iii 49075: 4.b. [1(N40) 1(N24')] , 
uruk-iii 49391: 1. [...] 1(N24')# , 
uruk-iii 54446: 1.b. 1(N30c') , 
uruk-iii 55938: 5.a'. 5(N03) 2(N40) 1(N24') 1(N30~a) , 
uruk-iii 55939: 5.b'. 1(N03) 4(N40) 1(N24') DUB~a 1(N30~a) , 
----
GREP
----

uruk-iii 48967: 3.b. 3(N41) 1(N24")# , [TAR~a] 
uruk-iii 49069: 1.b. [1(N40)?] 1(N24') , 
uruk-iii 49071: 2.b. 1(N40) 1(N24') , 
uruk-iii 49073: 3.b. 1(N40) 1(N24')# , 
uruk-iii 49075: 4.b. [1(N40) 1(N24')] , 
uruk-iii 49391: 1. [...] 1(N24')# , 
uruk-iii 54446: 1.b. 1(N30c') , 
uruk-iii 55938: 5.a'. 5(N03) 2(N40) 1(N24') 1(N30~a) , 
uruk-iii 55939: 5.b'. 1(N03) 4(N40) 1(N24') DUB~a 1(N30~a) , 
