
<img src="images/ninologo.png" align="right" width="100"/>
<img src="images/tf.png" align="right" width="100"/>


# Numbers

## Start up

We import the Python modules we need.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import os, sys, collections
from IPython.display import display, Markdown

from tf.extra.cunei import Cunei

In [3]:
LOC = ('~/github', 'Nino-cunei/uruk', 'numbers')
A = Cunei(*LOC)
A.api.makeAvailableIn(globals())

Found 2095 ideograph linearts
Found 2724 tablet linearts
Found 5495 tablet photos


**Documentation:** <a target="_blank" href="https://github.com/Nino-cunei/uruk/blob/master/docs/about.md" title="provenance of this corpus">Uruk IV-III (v1.0)</a> <a target="_blank" href="https://github.com/Nino-cunei/uruk/blob/master/docs/transcription.md" title="feature documentation">Feature docs</a> <a target="_blank" href="https://github.com/Dans-labs/text-fabric/wiki/Cunei" title="cunei api documentation">Cunei API</a> <a target="_blank" href="https://github.com/Dans-labs/text-fabric/wiki/api" title="text-fabric-api">Text-Fabric API</a>


This notebook online:
<a target="_blank" href="http://nbviewer.jupyter.org/github/Nino-cunei/tutorials/blob/master/bits-and-pieces/numbers.ipynb">NBViewer</a>
<a target="_blank" href="https://github.com/Nino-cunei/tutorials/blob/master/bits-and-pieces/numbers.ipynb">GitHub</a>


In [4]:
def dm(markdown): display(Markdown(markdown))

# Using Text-Fabric Search

Text-Fabric has a [search](https://github.com/Dans-labs/text-fabric/wiki/Api#search) facility that saves you a lot of programming. Let's get all the shinPP numerals on the faces of a few selected tablets.

In [5]:
pNums = '''
    P005381
    P005447
    P005448
'''.strip().split()

pNumPat = '|'.join(pNums)

In [6]:
shinPP = dict(
    N41=0.2,
    N04=1,
    N19=6,
    N46=60,
    N36=180,
    N49=1800,
)

shinPPPat = '|'.join(shinPP)

We query for shinPP numerals on the faces of selected tablets.
The result of the query is a list of tuples `(t, f, s)` consisting of
a tablet node, a face node and a sign node, which is a shinPP numeral.

In [7]:
query = f'''
tablet catalogId={pNumPat}
    face
        sign type=numeral grapheme={shinPPPat}
'''
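To see what the search template looks like once the patterns are filled in, the string building can be reproduced without Text-Fabric. This is only a sketch of the f-string expansion: it repeats the values from the cells above and does not run a search.

```python
# Stand-alone reconstruction of the template; no Text-Fabric needed.
pNums = ['P005381', 'P005447', 'P005448']
shinPP = dict(N41=0.2, N04=1, N19=6, N46=60, N36=180, N49=1800)

# Joining a dict iterates over its keys, in insertion order.
pNumPat = '|'.join(pNums)
shinPPPat = '|'.join(shinPP)

query = f'''
tablet catalogId={pNumPat}
    face
        sign type=numeral grapheme={shinPPPat}
'''
print(query)
```

The `|` separates alternative values for a feature, and the indentation expresses that the sign is contained in the face, which is contained in the tablet.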

In [8]:
results = list(S.search(query))
len(results)

20

We have found 20 numerals.
We group the results by tablet and by face.

In [9]:
numerals = {}
for (tablet, face, sign) in results:
    numerals.setdefault(tablet, {}).setdefault(face, []).append(sign)
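The nested `setdefault` idiom can be tried out on plain tuples. The following is a minimal sketch with made-up integers standing in for tablet, face and sign nodes; the node numbers are hypothetical.

```python
# Hypothetical result tuples (tablet, face, sign); the integers stand in for nodes.
results = [(1, 10, 100), (1, 10, 101), (1, 11, 102), (2, 20, 200)]

numerals = {}
for (tablet, face, sign) in results:
    # Create the inner dict and list on first use, then append the sign.
    numerals.setdefault(tablet, {}).setdefault(face, []).append(sign)

print(numerals)
```

Each tablet maps to a dict of faces, and each face maps to the list of its signs in the order in which the results delivered them.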

We show the tablets, the shinPP numerals per face, and we add up the numerals per face.

In [10]:
for (tablet, faces) in numerals.items():
    dm('---\n')
    display(A.lineart(tablet, withCaption="top", width="200"))
    for (face, signs) in faces.items():
        dm(f'### {F.type.v(face)}')
        distinctSigns = {}
        for s in signs:
            distinctSigns.setdefault(A.atfFromSign(s), []).append(s)
        display(A.lineart(distinctSigns))
        total = 0
        for (signAtf, sameSigns) in distinctSigns.items():
            # all signs with the same signAtf share the same grapheme and repeat
            value = 0
            for s in sameSigns:
                value += F.repeat.v(s) * shinPP[F.grapheme.v(s)]
            total += value
            amount = len(sameSigns)
            shinPPval = shinPP[F.grapheme.v(sameSigns[0])]
            repeat = F.repeat.v(sameSigns[0])
            print(f'{amount} x {signAtf} = {amount} x {repeat} x {shinPPval} = {value}')
        dm(f'**total** = **{total}**')

---


### obverse

1 x 9(N19) = 1 x 9 x 6 = 54
1 x 4(N04) = 1 x 4 x 1 = 4
2 x 2(N19) = 2 x 2 x 6 = 24
1 x 2(N04) = 1 x 2 x 1 = 2


**total** = **84**

### reverse

2 x 1(N46) = 2 x 1 x 60 = 120
2 x 2(N19) = 2 x 2 x 6 = 24
1 x 4(N19) = 1 x 4 x 6 = 24


**total** = **168**

---


### obverse

1 x 2(N04) = 1 x 2 x 1 = 2
1 x 1(N46) = 1 x 1 x 60 = 60
1 x 9(N19) = 1 x 9 x 6 = 54


**total** = **116**

### reverse

1 x 1(N36) = 1 x 1 x 180 = 180
1 x 1(N19) = 1 x 1 x 6 = 6


**total** = **186**

---


### obverse

1 x 3(N04) = 1 x 3 x 1 = 3
1 x 2(N04) = 1 x 2 x 1 = 2
1 x 3(N19) = 1 x 3 x 6 = 18


**total** = **23**

### reverse

1 x 3(N19) = 1 x 3 x 6 = 18
1 x 5(N04) = 1 x 5 x 1 = 5


**total** = **23**
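The arithmetic behind each total is repeat times unit value, summed over all signs on a face. As a check, here is a small sketch that recomputes the first obverse total from the printed lines above; `faceTotal` is a hypothetical helper, not part of the Cunei API.

```python
shinPP = dict(N41=0.2, N04=1, N19=6, N46=60, N36=180, N49=1800)

# (repeat, grapheme) pairs read off the first obverse in the output above:
# 1 x 9(N19), 1 x 4(N04), 2 x 2(N19), 1 x 2(N04)
signs = [(9, 'N19'), (4, 'N04'), (2, 'N19'), (2, 'N19'), (2, 'N04')]


def faceTotal(signs, system):
    """Sum repeat x unit value over all signs on a face (hypothetical helper)."""
    return sum(repeat * system[grapheme] for (repeat, grapheme) in signs)


print(faceTotal(signs, shinPP))  # 54 + 4 + 12 + 12 + 2 = 84
```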

# Frequency of Quads

We count the number of quads, according to their ATF string.

In [11]:
quadFreqs = collections.Counter()
for q in F.otype.s('quad'):
    quadFreqs[A.atfFromQuad(q)] += 1

In [12]:
for qAtf in quadFreqs:
    if 'GISZ' in qAtf:
        print(f'{quadFreqs[qAtf]:>4} x {qAtf}')

  15 x |GISZxSZU2~b|
  42 x |GISZxSZU2~a|
  26 x |GISZ.TE|
   1 x |GIxGISZ@t|
   4 x |(GISZx(DIN.DIN))~a|
   3 x |NINDA2xGISZ|
   1 x |(GISZx(DIN.DIN))~c|
   7 x |DUG~bxGISZ|
   3 x |GISZ3~a&GISZ3~a|
   1 x |GISZ+SZU2~b|
   1 x |BU~axGISZ@t|
   1 x |GA2~a1xGISZ@t|
   1 x |(GISZx(DIN.DIN))~b|
   1 x |(GI&GI)xGISZ@t|


Frequency of a particular quad:

In [13]:
pQuad = 'GI4~a'
print(f'The frequency of {pQuad} is {quadFreqs[pQuad]}')

The frequency of GI4~a is 0


Not what you expected? Let's broaden the question.

In [14]:
for qAtf in quadFreqs:
    if 'GI4' in qAtf:
        print(f'{quadFreqs[qAtf]:>4} x {qAtf}')

   1 x |GI4~a&GI4~a|
   1 x |GI4~axA|
   1 x |GI4~b&GI4~b|


Ah, `GI4~a` is a single sign, not a complex quad. Quads are by definition complex.

That's why you want signs and quads in one table. Let's do it.

In [15]:
quadSignFreqs = collections.Counter()
quadSignTypes = {'quad', 'sign'}

for n in N():
    nType = F.otype.v(n)
    if nType not in quadSignTypes:
        continue
    atf = A.atfFromQuad(n) if nType == 'quad' else A.atfFromSign(n)
    quadSignFreqs[atf] += 1
for qsAtf in quadSignFreqs:
    if 'GISZ' in qsAtf or 'GI4' in qsAtf:
        print(f'{quadSignFreqs[qsAtf]:>4} x {qsAtf}')

  34 x GI4~a
  15 x |GISZxSZU2~b|
 478 x GISZ
  45 x GISZIMMAR~b1
  42 x |GISZxSZU2~a|
  26 x |GISZ.TE|
  41 x GISZ@t
   1 x |GIxGISZ@t|
  54 x GISZ3~b
  10 x GI4~b
   2 x GISZIMMAR~b3
  10 x GISZGAL
   4 x |(GISZx(DIN.DIN))~a|
   9 x GISZIMMAR~a2
   3 x |NINDA2xGISZ|
   1 x |(GISZx(DIN.DIN))~c|
   7 x |DUG~bxGISZ|
   3 x |GISZ3~a&GISZ3~a|
  20 x GISZ3~a
   6 x GISZIMMAR~a1
   1 x |GISZ+SZU2~b|
   2 x GISZIMMAR~a3
   1 x |GI4~a&GI4~a|
   1 x |BU~axGISZ@t|
   1 x |GI4~axA|
   1 x GISZIMMAR~b2
   1 x |GA2~a1xGISZ@t|
   1 x |GI4~b&GI4~b|
   1 x |(GISZx(DIN.DIN))~b|
   1 x |(GI&GI)xGISZ@t|


In [18]:
pQuad = 'GI4~a'
print(f'The frequency of {pQuad} is {quadSignFreqs[pQuad]}')

The frequency of GI4~a is 34
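The zero we got earlier is ordinary `collections.Counter` behaviour: looking up a key that was never counted returns 0 rather than raising an error. A minimal sketch with a toy counter (the entries are illustrative, not a corpus extract):

```python
import collections

# Toy sample: quads are wrapped in |...|, plain signs are not.
freqs = collections.Counter({'|GI4~a&GI4~a|': 1, '|GI4~axA|': 1, 'GISZ': 478})

# Exact lookup of a missing key gives 0 instead of raising KeyError.
print(freqs['GI4~a'])

# A substring scan finds the complex quads that contain the sign.
matches = sorted(atf for atf in freqs if 'GI4~a' in atf)
print(matches)
```

So an exact lookup only succeeds when the key is spelled exactly as the ATF string that was counted, including the surrounding `|` for quads.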


# Write frequencies

You can make an output directory by hand, but we do it programmatically.

In [16]:
reportDir = 'reports'
os.makedirs(reportDir, exist_ok=True)

In [17]:
def writeFreqs(fileName, data, dataName):
    print(f'There are {len(data)} {dataName}s')

    for (sortName, sortKey) in (
        ('alpha', lambda x: (x[0], -x[1])),
        ('freq', lambda x: (-x[1], x[0])),
    ):
        with open(f'{reportDir}/{fileName}-{sortName}.txt', 'w') as fh:
            for (item, freq) in sorted(data.items(), key=sortKey):
                if item != '':
                    fh.write(f'{freq:>5} x {item}\n')

In [18]:
writeFreqs('quad-signs', quadSignFreqs, 'quad/sign')

There are 2219 quad/signs
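The two sort keys in `writeFreqs` decide the order of the two files. Here is a small sketch of what they do, on toy frequencies that are not taken from the corpus:

```python
import collections

# Toy frequencies, not taken from the corpus.
data = collections.Counter({'GISZ': 5, 'AN': 2, 'BU~a': 5, 'X': 1})

# Alphabetical by item; the frequency only breaks ties between equal items.
byAlpha = sorted(data.items(), key=lambda x: (x[0], -x[1]))

# Descending by frequency; items with equal frequency are ordered alphabetically.
byFreq = sorted(data.items(), key=lambda x: (-x[1], x[0]))

print(byAlpha)
print(byFreq)
```

Negating the frequency is what turns the default ascending sort into a descending one for that component of the key.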


# Grabbing subcases

How can we quickly grab cases at a certain level?
There is a short answer and a long answer.

Here is the short answer: a new `A`-utility function that does it for you.

In [20]:
subcases = A.casesByLevel(2, terminal=True)
len(subcases)

2595

A basic check: which numbers do we get?

In [21]:
caseNumbers = collections.Counter()
for s in subcases:
    caseNumbers[F.number.v(s)] += 1
caseNumbers

Counter({'1b1': 298,
         '1b2': 294,
         '2b1': 200,
         '2b2': 194,
         '2b3': 59,
         '2b4': 25,
         '2a1': 69,
         '2a2': 64,
         '3b1': 89,
         '3b2': 83,
         '4b1': 56,
         '4b2': 56,
         '4b3': 14,
         '4b4': 6,
         '1b3': 93,
         '3a1': 42,
         '3a2': 42,
         '1b4': 36,
         '5b1': 31,
         '5c1': 2,
         '5c2': 2,
         '1c1': 15,
         '1c2': 16,
         '1b5': 18,
         '1b6': 9,
         '4a4': 2,
         '4a1': 18,
         '4a2': 18,
         '4a3': 4,
         '2c1': 14,
         '2c2': 14,
         '5b2': 30,
         '6b2': 21,
         '6b1': 21,
         '1aB1': 1,
         '1aB2': 1,
         '7b1': 12,
         '7b2': 12,
         '1a2': 103,
         '1a1': 105,
         '1a3': 17,
         '5a1': 14,
         '5a2': 14,
         '11b1': 4,
         '11b2': 4,
         '1a4': 3,
         '1a5': 1,
         '3b3': 24,
         '3c1': 7,
         '3c2': 7,
    

This is what happens under the hood.

The next query picks every line that has a case as a direct child, where that case in turn has a subcase as a direct child.

This is what the `sub` edge is for. Plain containment between cases is very liberal: cases contain not only their subcases, but also their sub-subcases, and so on.
Using `sub` brings us close to what we want.

In [29]:
query = '''
l:line
    w1:case
        w2:case
l -sub> w1
w1 -sub> w2
'''

In [30]:
results = list(S.search(query))
len(results)

2719

We may still have too many results. Subcases may themselves contain cases, and then they do not correspond to a single line in the
transcription.
If you want only subcases that are terminal cases, you have to filter the results with an extra line of code.

A case is terminal if it has a feature `terminal` (with value `1`).

In [31]:
subcases = [subcase for (line, case, subcase) in results if F.terminal.v(subcase)]
len(subcases)

2595
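The filter keeps the last member of each result tuple when the feature lookup gives a truthy value. The sketch below imitates that with a plain dict in place of `F.terminal.v`, assuming that a node without the feature yields `None`; the node numbers are hypothetical.

```python
# Hypothetical (line, case, subcase) results; integers stand in for nodes.
results = [(1, 2, 3), (1, 2, 4), (1, 5, 6)]

# Stand-in for the `terminal` feature: only nodes 3 and 6 carry it.
terminal = {3: 1, 6: 1}

# Keep a subcase only if its terminal value is truthy (None is dropped).
subcases = [sub for (line, case, sub) in results if terminal.get(sub)]
print(subcases)
```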

Let's just check whether the subcases have reasonable line numbers.

In [32]:
caseNumbers = collections.Counter()
for subcase in subcases:
    cn = F.number.v(subcase)
    if cn:
        caseNumbers[cn] += 1

In [33]:
caseNumbers

Counter({'1b1': 298,
         '1b2': 294,
         '1b3': 93,
         '2b1': 200,
         '2b2': 194,
         '2b3': 59,
         '3b1': 89,
         '3b2': 83,
         '4b1': 56,
         '4b2': 56,
         '4b3': 14,
         '4b4': 6,
         '4b5': 2,
         '4b6': 1,
         '4b7': 1,
         '2a1': 69,
         '2a2': 64,
         '1a2': 103,
         '1a1': 105,
         '2b4': 25,
         '2b5': 8,
         '6b1': 21,
         '6b2': 21,
         '1b4': 36,
         '3b3': 24,
         '5b1': 31,
         '5b2': 30,
         '3a1': 42,
         '3a2': 42,
         '3a3': 8,
         '2a3': 12,
         '2a4': 3,
         '2a5': 2,
         '1a3': 17,
         '6b3': 7,
         '7b1': 12,
         '7b2': 12,
         '8b1': 7,
         '8b2': 7,
         '8b3': 1,
         '2b6': 6,
         '2b7': 5,
         '2b8': 2,
         '3c1': 7,
         '3c2': 7,
         '5a2': 14,
         '5a1': 14,
         '4a3': 4,
         '4a4': 2,
         '4a1': 18,
         '4a2

In order to get cases at deeper levels, we need to compose a query that is dependent on
the given depth. That is exactly what the `A.casesByLevel` function does.
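To illustrate what a depth-dependent query could look like, here is a sketch that generates the template text for a given level. It follows the shape of the query shown above; `caseQuery` is a hypothetical helper and not the actual implementation of `A.casesByLevel`.

```python
def caseQuery(level):
    """Build a search template for cases nested `level` deep below a line.

    Hypothetical helper: it only mirrors the hand-written query above.
    """
    if level < 1:
        raise ValueError('level must be at least 1')
    lines = ['l:line']
    # One nested, named case atom per level, indented one step further each time.
    for i in range(1, level + 1):
        lines.append('    ' * i + f'w{i}:case')
    # The line is the direct parent of the first case ...
    lines.append('l -sub> w1')
    # ... and every case is the direct parent of the next one.
    for i in range(1, level):
        lines.append(f'w{i} -sub> w{i + 1}')
    return '\n'.join(lines) + '\n'


print(caseQuery(2))
```

For level 2 this reproduces the query we wrote by hand; the nodes of interest are then the last member of each result tuple.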

# Displaying subcases

We show the ATF of nested subcases, together with their location in the corpus.

We pick a deep level of cases, in order to make an inventory of the signs involved.

In [36]:
sublevel4Nodes = A.casesByLevel(4, terminal=True)

for node in sublevel4Nodes:
    (pNum, column, lineNum) = T.sectionFromNode(node)
    srcLn = F.srcLn.v(node)
    print(f'{pNum}:{column}:{lineNum} = {srcLn}')

P002694:reverse:1:3 = 3.b2B1. , X [...] 
P002694:reverse:1:3 = 3.b2B2. , X [...] 
P002694:reverse:1:3 = 3.b1B1. , [...] 
P325234:reverse:1:1 = 1.c5a1. 8(N01) , GAR U2~a 
P325234:reverse:1:1 = 1.c5a2. |SZU2.E2~b| 
P006036:obverse:1:2 = 2.b1B1. , (EN~a# PAP~a#)a 
P006036:obverse:1:2 = 2.b1B2. , (3(N57) GAN2)a 
P006036:obverse:1:2 = 2.b2B1. , (EN~a |SZU2.E2~b|)a 
P006036:obverse:1:2 = 2.b2B2. , (BU~a SZU)a 
P006036:obverse:1:2 = 2.b2B3. , (SAL BU~a)a 
P006036:obverse:1:2 = 2.b2B4. , (EN~a HI KASZ~c)a 
P218054:reverse:1:1 = 1.a1A1. [...] 5(N01)# , [...] UDU~a#? 
P218054:reverse:1:1 = 1.a1A2. [...] 7(N01)# , MASZ2 
P006160:obverse:1:2 = 2.b2B1. , ZATU836 BU~a SZE~a 
P006160:obverse:1:2 = 2.b2B2. , U8 LAGAB~a 
P005322:reverse:1:1 = 1.b2B1. 3(N20) 1(N05) 1(N42~a) , 
P005322:reverse:1:1 = 1.b2B2. 3(N18) 1(N03) 1(N40) , 
P005322:reverse:1:1 = 1.b4B1. 2(N05)? 2(N42~a)? , 1(N57)? SZE~a 
P005322:reverse:1:1 = 1.b4B2. 1(N03) 3(N40) , 
P005322:reverse:1:1 = 1.b1B1. 1(N20) 1(N42~a) 1(N25) , 
P005322:

We collect all sign nodes in these cases into a list.

In [37]:
sublevel4Signs = []
for c4 in sublevel4Nodes:
    for sign in L.d(c4, otype='sign'):
        sublevel4Signs.append(sign)

We count the signs occurring in these cases by their full ATF representation.

In [39]:
signs4count = collections.Counter()
for s in sublevel4Signs:
    signs4count[A.atfFromSign(s)] += 1

We print the counts, first sorted by ATF representation.

In [40]:
for (sign, amount) in sorted(signs4count.items()):
    print(f'{amount:>4} x {sign}')

  47 x ...
  16 x 1(N01)
   1 x 1(N02)
   2 x 1(N03)
   2 x 1(N05)
   1 x 1(N14)
   1 x 1(N18)
   1 x 1(N20)
   1 x 1(N24~a)
   1 x 1(N25)
   2 x 1(N40)
   2 x 1(N42~a)
  15 x 1(N57)
   5 x 2(N01)
   1 x 2(N05)
   3 x 2(N14)
   1 x 2(N34)
   1 x 2(N40)
   2 x 2(N42~a)
   1 x 2(N57)
   2 x 3(N01)
   1 x 3(N18)
   2 x 3(N20)
   1 x 3(N40)
   3 x 3(N57)
   4 x 4(N01)
   1 x 4(N18)
   1 x 5(N01)
   1 x 5(N03)
   2 x 6(N01)
   2 x 7(N01)
   1 x 7(N57)
   1 x 8(N01)
   1 x AB~a
   1 x ADAB
   2 x ALAN~b
   2 x AMA~a
   6 x AN
   1 x ANSZE~e
   2 x BA
   2 x BAD
   5 x BU~a
   2 x DI
   2 x DILMUN
   1 x DIM~a
   1 x DU
   1 x DU6~a
   2 x DUB~a
   1 x DUG~c
   1 x DUR2
   4 x E2~a
   2 x E2~b
  11 x EN~a
   1 x ERIN
   1 x GA2~a1
   1 x GAL~a
   1 x GAN2
   1 x GAN~c
   2 x GAR
   1 x GESZTU~a
   2 x GI
   1 x GI4~a
   2 x GI6
   1 x GU
   3 x HI
   1 x HI@g~a
   1 x IG~b
   1 x KASZ~c
   1 x KI
   1 x KISZIK~a
   1 x KU3~a
   2 x KUR~a
   1 x LAGAB~a
   1 x LAL3~a
   1 x MA
   3 x MASZ2
   

And now the same list, sorted by frequency.

In [41]:
for (sign, amount) in sorted(
    signs4count.items(),
    key=lambda x: (-x[1], x[0]),
):
    print(f'{amount:>4} x {sign}')

  47 x ...
  16 x 1(N01)
  15 x 1(N57)
  11 x EN~a
  11 x X
   6 x AN
   5 x 2(N01)
   5 x BU~a
   5 x SAL
   5 x SZU
   5 x U4
   4 x 4(N01)
   4 x E2~a
   4 x NUN~a
   4 x SZE3
   3 x 2(N14)
   3 x 3(N57)
   3 x HI
   3 x MASZ2
   3 x PAP~a
   3 x SUHUR
   3 x SZA
   3 x TUR
   3 x U8
   2 x 1(N03)
   2 x 1(N05)
   2 x 1(N40)
   2 x 1(N42~a)
   2 x 2(N42~a)
   2 x 3(N01)
   2 x 3(N20)
   2 x 6(N01)
   2 x 7(N01)
   2 x ALAN~b
   2 x AMA~a
   2 x BA
   2 x BAD
   2 x DI
   2 x DILMUN
   2 x DUB~a
   2 x E2~b
   2 x GAR
   2 x GI
   2 x GI6
   2 x KUR~a
   2 x MUSZ3~a
   2 x NAB
   2 x NAGA~a
   2 x NIN
   2 x SZE~a
   2 x SZU2
   2 x TAK4~a
   2 x UDUNITA~a
   1 x 1(N02)
   1 x 1(N14)
   1 x 1(N18)
   1 x 1(N20)
   1 x 1(N24~a)
   1 x 1(N25)
   1 x 2(N05)
   1 x 2(N34)
   1 x 2(N40)
   1 x 2(N57)
   1 x 3(N18)
   1 x 3(N40)
   1 x 4(N18)
   1 x 5(N01)
   1 x 5(N03)
   1 x 7(N57)
   1 x 8(N01)
   1 x AB~a
   1 x ADAB
   1 x ANSZE~e
   1 x DIM~a
   1 x DU
   1 x DU6~a
   1 x DUG~c
   1 

# Number systems per tablet
The cells below, written earlier, classify tablets by the number systems that occur on them.

Specification of the Shin systems: just the bare minimum of info.

In [5]:
numberSystems = dict(
    shinP = (40, 3, 18, 24, 45),
    shinPP = (4,19, 36, 41, 46, 49),
    shinS = (25, 27, 28, 42, 5, 20, 47, 37),
)

We turn the numbers into numeral graphemes:

In [6]:
systems = {}

for (shin, numbers) in numberSystems.items():
    systems[shin] = {f'N{n:>02}' for n in numbers}

Reality check

In [7]:
systems

{'shinP': {'N03', 'N18', 'N24', 'N40', 'N45'},
 'shinPP': {'N04', 'N19', 'N36', 'N41', 'N46', 'N49'},
 'shinS': {'N05', 'N20', 'N25', 'N27', 'N28', 'N37', 'N42', 'N47'}}
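The format specification `>02` is what turns `4` into `04`. A short sketch of that step in isolation, using the shinPP numbers from above:

```python
numbers = (4, 19, 36, 41, 46, 49)

# `>02` right-aligns the number in a field of width 2 and pads with zeros.
graphemes = {f'N{n:>02}' for n in numbers}
print(sorted(graphemes))

# The more common spelling `02d` gives the same strings for these values.
sameGraphemes = {f'N{n:02d}' for n in numbers}
print(graphemes == sameGraphemes)
```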

We also want the opposite: given a numeral, which system does it belong to?

In [8]:
numeralMap = {}

for (shin, numerals) in systems.items():
    for n in numerals:
        if n in numeralMap:
            dm(f'**warning:** Numeral {n} in {shin} was already in {numeralMap[n]}')
        numeralMap[n] = shin

numeralMap

{'N03': 'shinP',
 'N04': 'shinPP',
 'N05': 'shinS',
 'N18': 'shinP',
 'N19': 'shinPP',
 'N20': 'shinS',
 'N24': 'shinP',
 'N25': 'shinS',
 'N27': 'shinS',
 'N28': 'shinS',
 'N36': 'shinPP',
 'N37': 'shinS',
 'N40': 'shinP',
 'N41': 'shinPP',
 'N42': 'shinS',
 'N45': 'shinP',
 'N46': 'shinPP',
 'N47': 'shinS',
 'N49': 'shinPP'}
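No warning was shown, so no numeral occurs in more than one system. To see what the check does when there is an overlap, here is a sketch with two hypothetical systems that share a numeral; the names `sysA` and `sysB` are made up.

```python
# Hypothetical systems that overlap on N02.
systems = {'sysA': {'N01', 'N02'}, 'sysB': {'N02', 'N03'}}

numeralMap = {}
clashes = []
for (name, graphemes) in sorted(systems.items()):
    for g in sorted(graphemes):
        if g in numeralMap:
            # Record the numeral, its earlier system and the new system.
            clashes.append((g, numeralMap[g], name))
        numeralMap[g] = name  # the later system overwrites the earlier one

print(numeralMap)
print(clashes)
```

With overlapping systems the inverse mapping is no longer well defined, which is why the notebook warns instead of silently overwriting.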

Exercise:

For each tablet, add three properties: hasShinP, hasShinPP, hasShinS.
They will be True if and only if the tablet has a numeral in that category.
Even better, instead of True or False, we let them record how many numerals in that set they have. 

In [9]:
tabletNumerics = collections.defaultdict(collections.Counter)

for tablet in F.otype.s('tablet'):
    pNum = F.catalogId.v(tablet)
    for sign in L.d(tablet, otype='sign'):
        if F.type.v(sign) == 'numeral':
            numeral = F.grapheme.v(sign)
            system = numeralMap.get(numeral, None)
            if system is not None:
                tabletNumerics[pNum][system] += 1
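A `defaultdict` of `Counter`s needs no initialisation per tablet or per system, and systems that never occur on a tablet simply count as 0. A minimal sketch with hypothetical tablet names:

```python
import collections

tabletNumerics = collections.defaultdict(collections.Counter)

# Hypothetical (tablet, system) observations.
for (pNum, system) in [('P1', 'shinP'), ('P1', 'shinP'), ('P2', 'shinS')]:
    tabletNumerics[pNum][system] += 1

print(tabletNumerics['P1']['shinP'])  # counted twice
print(tabletNumerics['P1']['shinS'])  # never counted, so 0
```

Note that tablets without any numeral from these systems never get an entry at all, so they will be absent from a report built on this dictionary.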

Now we write a tab-separated file to the report directory, so that you can work with the data in Excel.

We show the first few lines in the notebook.

In [10]:
filePath = f'{A.reportDir}/tabletNumerics.tsv'
lines = []
systemNames = sorted(systems)
fieldNames = "\t".join(systemNames)
for pNum in sorted(tabletNumerics):
    data = tabletNumerics[pNum]
    values = "\t".join(str(data[s]) for s in systemNames)
    lines.append(f'{pNum}\t{values}\n')
with open(filePath, 'w') as fh:
    fh.write(f'tablet\t{fieldNames}\n')
    fh.write(''.join(lines))

print(''.join(lines[0:10]))

P000148	0	0	1
P000245	2	0	0
P000266	0	1	0
P000308	2	0	0
P000434	2	0	0
P000511	1	0	0
P000517	1	0	0
P000550	2	0	0
P000734	2	0	0
P000735	2	0	1
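The same layout can also be produced with the standard `csv` module, which takes care of the separators. This is an alternative sketch, not what the cell above does; it writes to an in-memory buffer and uses two rows copied from the output above.

```python
import csv
import io

systemNames = ['shinP', 'shinPP', 'shinS']
# Two rows from the output above, as sparse counts per system.
rows = {'P000148': {'shinS': 1}, 'P000245': {'shinP': 2}}

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter='\t', lineterminator='\n')
writer.writerow(['tablet'] + systemNames)
for pNum in sorted(rows):
    # Missing systems are written as 0.
    writer.writerow([pNum] + [rows[pNum].get(s, 0) for s in systemNames])

text = buffer.getvalue()
print(text)
```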

