<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc" style="margin-top: 1em;"><ul class="toc-item"><li><span><a href="#MSVO-3,-70" data-toc-modified-id="MSVO-3,-70-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>MSVO 3, 70</a></span></li><li><span><a href="#Text-Fabric" data-toc-modified-id="Text-Fabric-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Text-Fabric</a></span></li><li><span><a href="#Installing-Text-Fabric" data-toc-modified-id="Installing-Text-Fabric-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Installing Text-Fabric</a></span><ul class="toc-item"><li><span><a href="#Prerequisites" data-toc-modified-id="Prerequisites-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Prerequisites</a></span></li><li><span><a href="#TF-itself" data-toc-modified-id="TF-itself-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>TF itself</a></span></li></ul></li><li><span><a href="#Pulling-up-a-tablet-and-its-transliteration-using-a-p-number" data-toc-modified-id="Pulling-up-a-tablet-and-its-transliteration-using-a-p-number-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Pulling up a tablet and its transliteration using a p-number</a></span></li><li><span><a href="#Non-numerical-quads" data-toc-modified-id="Non-numerical-quads-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Non-numerical quads</a></span></li><li><span><a href="#Generating-a-list-of-sign-frequency-and-saving-it-as-a-separate-file" data-toc-modified-id="Generating-a-list-of-sign-frequency-and-saving-it-as-a-separate-file-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Generating a list of sign frequency and saving it as a separate file</a></span></li></ul></div>

# Primer 1

This notebook is meant for those with little or no familiarity with 
[Text-Fabric](https://github.com/Dans-labs/text-fabric) and will focus on several basic tasks, including calling up an individual proto-cuneiform tablet using a p-number, the coding of complex proto-cuneiform signs using what we will call "quads" and the identification of one of the numeral systems, and a quick look at the frequency of a few sign clusters. Each primer, including this one, will focus on a single tablet and explore three or four analytical possibilities. In this primer we look at MSVO 3, 70, which has the p-number P005381 at CDLI.

## MSVO 3, 70

The proto-cuneiform tablet known as MSVO 3, 70, is held in the British Museum, where it has the museum number BM 140852. The tablet dates to the Uruk III period, ca. 3200-3000 BCE, and is slated for publication in the third volume of Materialien zu den frühen Schriftzeugnissen des Vorderen Orients (MSVO). Up to now it has only appeared as a photo in Frühe Schrift (Nissen, Damerow and Englund 1990), p. 38.

We'll show the lineart for this tablet and its ATF transcription in a moment, including a link to this tablet on CDLI.

## Text-Fabric

Text-Fabric (TF) is a model for textual data with annotations that is optimized for efficient data analysis. As we will begin to see at the end of this primer, when we check the totals on the reverse of our primer tablet, Text-Fabric also facilitates the creation of new, derived data, which can be added to the original data.

Working with TF is a bit like buying from IKEA. You get all the bits and pieces in a box, and then you assemble it yourself. TF decomposes any dataset into its components, nicely stacked, with every component uniquely labeled. And then we use short reusable bits of code to do specific things. TF is based on a model proposed by [Doedens](http://books.google.nl/books?id=9ggOBRz1dO4C) that focuses on the essential properties of texts, such as sequence and embedding. For a description of how Text-Fabric has been used for work on the Hebrew Bible, see Dirk Roorda's article [The Hebrew Bible as Data: Laboratory - Sharing - Experiences](https://doi.org/10.5334/bbi.18).

Once data is transformed into Text-Fabric, it can also be used to build rich online interfaces for specific groups of ancient texts. For the Hebrew Bible, have a look at [SHEBANQ](https://shebanq.ancient-data.org/hebrew/text).

The best environment for using Text-Fabric is in a [Jupyter Notebook](http://jupyter.readthedocs.io/en/latest/). This primer is in a Jupyter Notebook: the snippets of code can only be executed if you have installed Python 3, Jupyter Notebook, and Text-Fabric on your own computer.

## Installing Text-Fabric

### Prerequisites

You need to have Python on your system. Many systems have it out of the box,
but alas, that is often Python 2 and we need at least Python 3.6.

Install it from [python.org](https://www.python.org) or from Anaconda.
If you got it from python.org, you also have to install [Jupyter](http://jupyter.readthedocs.io/en/latest/).

### TF itself

If you have installed Python with the help of Anaconda, run

```
pip install text-fabric
```

If you have installed Python from [python.org](https://www.python.org), run

```
sudo -H pip3 install text-fabric
```

###### Execute: If all this is done, the following cells can be executed.

In [1]:
import os, sys, collections
from IPython.display import display
from tf.extra.cunei import Cunei

In [2]:
LOC = ('~/github', 'Nino-cunei/uruk', 'primer1')
A = Cunei(*LOC)
A.api.makeAvailableIn(globals())

Found 2095 ideograph linearts
Found 2724 tablet linearts
Found 5495 tablet photos


**Documentation:** <a target="_blank" href="https://github.com/Nino-cunei/uruk/blob/master/docs/about.md" title="provenance of this corpus">Uruk IV-III (v1.0)</a> <a target="_blank" href="https://github.com/Nino-cunei/uruk/blob/master/docs/transcription.md" title="feature documentation">Feature docs</a> <a target="_blank" href="https://github.com/Dans-labs/text-fabric/wiki/Cunei" title="cunei api documentation">Cunei API</a> <a target="_blank" href="https://github.com/Dans-labs/text-fabric/wiki/api" title="text-fabric-api">Text-Fabric API 3.4.12</a> <a target="_blank" href="https://github.com/Dans-labs/text-fabric/wiki/api#search-template-introduction" title="Search Templates Introduction and Reference">Search Reference</a>


This notebook online:
<a target="_blank" href="http://nbviewer.jupyter.org/github/Nino-cunei/tutorials/blob/master/bits-and-pieces/primer1.ipynb">NBViewer</a>
<a target="_blank" href="https://github.com/Nino-cunei/tutorials/blob/master/bits-and-pieces/primer1.ipynb">GitHub</a>


## Pulling up a tablet and its transliteration using a p-number

Each cuneiform tablet has a unique "p-number" and we can use this p-number in Text-Fabric to bring up any images and the transliteration of a tablet, here P005381.

There is a "node" in Text-Fabric for this tablet. How do we find it and display the transliteration?

* We *search* for the tablet by means of a template;
* we use functions `A.lineart()` and `A.getSource()` to bring up the lineart and transliterations of tablets.

In [3]:
pNum = 'P005381'
query = f'''
tablet catalogId={pNum}
'''
results = A.search(query)

1 result


`results` is a list of "records".

Here we have only one result: `results[0]`.

Each result record is a tuple of the nodes mentioned in the template.

Here we mentioned only a single thing: `tablet`.
So we find the node of the matched tablet as the first member of the result record.

Hence the result tablet node is `results[0][0]`.

In [4]:
tablet = results[0][0]
A.lineart(tablet, width=300)
A.getSource(tablet)

['&P005381 = MSVO 3, 70',
 '#atf: lang qpc ',
 '@obverse ',
 '@column 1 ',
 '1.a. 2(N14) , SZE~a SAL TUR3~a NUN~a ',
 '1.b. 3(N19) , |GISZ.TE| ',
 '2. 1(N14) , NAR NUN~a SIG7 ',
 '3. 2(N04)# , PIRIG~b1 SIG7 URI3~a NUN~a ',
 '@column 2 ',
 '1. 3(N04) , |GISZ.TE| GAR |SZU2.((HI+1(N57))+(HI+1(N57)))| GI4~a ',
 '2. , GU7 AZ SI4~f ',
 '@reverse ',
 '@column 1 ',
 '1. 3(N14) , SZE~a ',
 '2. 3(N19) 5(N04) , ',
 '3. , GU7 ',
 '@column 2 ',
 '1. , AZ SI4~f ']

Now we want to view the numerals on the tablet.

In [5]:
query = f'''
tablet catalogId={pNum}
  sign type=numeral
'''
results = A.search(query)

10 results


It is easy to show them at a glance:

In [6]:
A.show(results)


##### Tablet 1


Or we can show them in a table.

In [7]:
A.table(results)

n | tablet | sign
--- | --- | ---
1|<a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&ObjectID=P005381" title="to CDLI main page for this tablet">tablet P005381</a>|2(N14) <a target="_blank" href="https://cdli.ucla.edu/tools/SignLists/protocuneiform/archsigns/2(N14).jpg" title="to higher resolution lineart on CDLI"><img src="cdli-imagery/lineart-2(n14).jpg" style="display: inline;max-width: 2em; max-height: 4em;"  /></a>
2|<a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&ObjectID=P005381" title="to CDLI main page for this tablet">tablet P005381</a>|1(N14) <a target="_blank" href="https://cdli.ucla.edu/tools/SignLists/protocuneiform/archsigns/1(N14).jpg" title="to higher resolution lineart on CDLI"><img src="cdli-imagery/lineart-1(n14).jpg" style="display: inline;max-width: 2em; max-height: 4em;"  /></a>
3|<a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&ObjectID=P005381" title="to CDLI main page for this tablet">tablet P005381</a>|2(N04) <a target="_blank" href="https://cdli.ucla.edu/tools/SignLists/protocuneiform/archsigns/2(N04).jpg" title="to higher resolution lineart on CDLI"><img src="cdli-imagery/lineart-2(n04).jpg" style="display: inline;max-width: 2em; max-height: 4em;"  /></a>
4|<a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&ObjectID=P005381" title="to CDLI main page for this tablet">tablet P005381</a>|3(N04) <a target="_blank" href="https://cdli.ucla.edu/tools/SignLists/protocuneiform/archsigns/3(N04).jpg" title="to higher resolution lineart on CDLI"><img src="cdli-imagery/lineart-3(n04).jpg" style="display: inline;max-width: 2em; max-height: 4em;"  /></a>
5|<a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&ObjectID=P005381" title="to CDLI main page for this tablet">tablet P005381</a>|1(N57) <a target="_blank" href="https://cdli.ucla.edu/tools/SignLists/protocuneiform/archsigns/1(N57).jpg" title="to higher resolution lineart on CDLI"><img src="cdli-imagery/lineart-1(n57).jpg" style="display: inline;max-width: 2em; max-height: 4em;"  /></a>
6|<a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&ObjectID=P005381" title="to CDLI main page for this tablet">tablet P005381</a>|1(N57) <a target="_blank" href="https://cdli.ucla.edu/tools/SignLists/protocuneiform/archsigns/1(N57).jpg" title="to higher resolution lineart on CDLI"><img src="cdli-imagery/lineart-1(n57).jpg" style="display: inline;max-width: 2em; max-height: 4em;"  /></a>
7|<a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&ObjectID=P005381" title="to CDLI main page for this tablet">tablet P005381</a>|3(N14) <a target="_blank" href="https://cdli.ucla.edu/tools/SignLists/protocuneiform/archsigns/3(N14).jpg" title="to higher resolution lineart on CDLI"><img src="cdli-imagery/lineart-3(n14).jpg" style="display: inline;max-width: 2em; max-height: 4em;"  /></a>
8|<a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&ObjectID=P005381" title="to CDLI main page for this tablet">tablet P005381</a>|3(N19) <a target="_blank" href="https://cdli.ucla.edu/tools/SignLists/protocuneiform/archsigns/3(N19).jpg" title="to higher resolution lineart on CDLI"><img src="cdli-imagery/lineart-3(n19).jpg" style="display: inline;max-width: 2em; max-height: 4em;"  /></a>
9|<a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&ObjectID=P005381" title="to CDLI main page for this tablet">tablet P005381</a>|5(N04) <a target="_blank" href="https://cdli.ucla.edu/tools/SignLists/protocuneiform/archsigns/5(N04).jpg" title="to higher resolution lineart on CDLI"><img src="cdli-imagery/lineart-5(n04).jpg" style="display: inline;max-width: 2em; max-height: 4em;"  /></a>
10|<a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&ObjectID=P005381" title="to CDLI main page for this tablet">tablet P005381</a>|3(N19) <a target="_blank" href="https://cdli.ucla.edu/tools/SignLists/protocuneiform/archsigns/3(N19).jpg" title="to higher resolution lineart on CDLI"><img src="cdli-imagery/lineart-3(n19).jpg" style="display: inline;max-width: 2em; max-height: 4em;"  /></a>

There are a few different types of numerals here, but we are just going to look at the numbers belonging to the "shin prime prime" system, abbreviated here as "shinPP," which regularly adds two narrow horizontal wedges to each number. N04, which is the basic unit in this system, appears in the third, fourth and ninth rows of the preceding table: in the third row it is repeated twice, in the fourth three times and, unsurprisingly, in the ninth, which is the total on the reverse, five times. (N19, which is the next bundling unit in the same system, also occurs in the text.)

In [8]:
shinPP = dict(
    N41 = 0.2,
    N04 = 1,
    N19 = 6,
    N46 = 60,
    N36 = 180,
    N49 = 1800,
)
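
As a quick check on these values, the following sketch works out how many of each unit make up one of the next larger unit. It only uses the numbers from the dictionary above; the names `units` and `factors` are ours, and 0.2 is written as the fraction 1/5 so that the ratios stay exact.

```python
from fractions import Fraction

# Values copied from the shinPP dictionary above (0.2 written as 1/5).
units = [
    ('N41', Fraction(1, 5)),
    ('N04', Fraction(1)),
    ('N19', Fraction(6)),
    ('N46', Fraction(60)),
    ('N36', Fraction(180)),
    ('N49', Fraction(1800)),
]

# How many of each unit make up one of the next larger unit?
factors = {}
for ((small, smallValue), (large, largeValue)) in zip(units, units[1:]):
    factors[(small, large)] = largeValue / smallValue
    print(f'{factors[(small, large)]} x {small} = 1 x {large}')
```

According to these values, six N04 make one N19, which is why the scribe can write a total of 23 as 3(N19) 5(N04).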

First, let's see if we can locate one of the occurrences of shinPP numerals, namely the set of 3(N04) in the first case of the second column on the obverse, using Text-Fabric.

In [9]:
query = f'''
tablet catalogId={pNum}
  face type=obverse
    column number=2
      line number=1
        =: sign
'''
results = A.search(query)
A.table(results)

1 result


n | tablet | face | column | line | sign
--- | --- | --- | --- | --- | ---
1|<a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&ObjectID=P005381" title="to CDLI main page for this tablet">tablet P005381</a>|face obverse|column 2|line 1|3(N04) <a target="_blank" href="https://cdli.ucla.edu/tools/SignLists/protocuneiform/archsigns/3(N04).jpg" title="to higher resolution lineart on CDLI"><img src="cdli-imagery/lineart-3(n04).jpg" style="display: inline;max-width: 2em; max-height: 4em;"  /></a>

Note the `=:` in `=: sign`. This is a device to require that the sign starts at the same position
as the `line` above it. Effectively, we ask for the first sign of the line.

Now the result records are tuples `(tablet, face, column, line, sign)`, so if we want
the sign part of the first result, we have to say `results[0][4]` (Python counts from 0).
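
The shape of such a record can be tried out without Text-Fabric. In this small sketch the list `demoResults` is a stand-in for the search output; only the node number 106602 occurs in this primer, the other node numbers are made up for illustration.

```python
# Stand-in for the search output: one record per match, one node per
# line of the template. Only 106602 is a real node number from this
# primer; the others are invented.
demoResults = [
    (1001, 1002, 1003, 1004, 106602),
]

# Access by position, as described in the text:
signByIndex = demoResults[0][4]

# The same, but unpacked into names that mirror the template:
(tabletNode, faceNode, columnNode, lineNode, signNode) = demoResults[0]

print(signByIndex == signNode)
```

Unpacking into names is a matter of taste; it can make longer templates easier to follow than counting positions.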

In [10]:
num = results[0][4]
A.pretty(num, withNodes=True)

This number is the "node" in Text-Fabric that corresponds to the first sign in the first case of column 2. It is like a bar-code for that position in the entire corpus. Now let's make sure that this node, viz. 106602, is actually a numeral. To do this we check that the feature `type` of the node 106602 has the value `numeral`. And then we can use `A.atfFromSign()` to extract the transliteration.

In [11]:
print(F.type.v(num) == 'numeral')
print(A.atfFromSign(num))

True
3(N04)


Let's get the name of the numeral, viz. N04, and the number of times that it occurs. This amounts to splitting apart "3" and "(N04)" but since we are calling features in Text-Fabric rather than trying to pull elements out of the transliteration, we do not need to tweak the information.

In [12]:
grapheme = F.grapheme.v(num)
print(grapheme)

iteration = F.repeat.v(num)
print(iteration)

N04
3
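
For comparison, here is a sketch of what we would have to do if we only had the transliteration string and no features. The pattern and the helper `splitNumeral` are our own illustration, not part of Text-Fabric, and they assume a simple numeral of the form repeat, then grapheme in brackets; numerals such as `N(N57)` are not covered.

```python
import re

# Assumption: a simple numeral in ATF looks like <repeat>(<grapheme>),
# possibly followed by flags such as "#".
numeralPat = re.compile(r'^(\d+)\((N\d+)\)')

def splitNumeral(atf):
    """Return (grapheme, repeat) for an ATF numeral, or None if it does not match."""
    match = numeralPat.match(atf)
    if match is None:
        return None
    return (match.group(2), int(match.group(1)))

print(splitNumeral('3(N04)'))
print(splitNumeral('2(N04)#'))
print(splitNumeral('GAR'))
```

With the features `grapheme` and `repeat` none of this pattern matching is needed, which is the point made above.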


Now we can replace "N04" with its value, using the shinPP dictionary above, and then multiply this value by the number of iterations to arrive at the value of the numeral as a whole. Since each occurrence of the numeral N04 has a value of 1, three occurrences of it should have a value of 3.

In [13]:
valueFromDict = shinPP.get(grapheme)

value = iteration * valueFromDict
print(value)

3


Just to make sure that we are calculating these values correctly, let's try it again with a numeral whose value is not 1. There is a nice example in case 1b in column 1 on the obverse, where we have 3 occurrences of N19, each of which has a value of 6, so we expect the total value of 3(N19) to be 18.

In [14]:
query = f'''
tablet catalogId={pNum}
  face type=obverse
    column number=1
      case number=1b
        =: sign
'''
results = A.search(query)
A.table(results)

1 result


n | tablet | face | column | case | sign
--- | --- | --- | --- | --- | ---
1|<a target="_blank" href="https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&ObjectID=P005381" title="to CDLI main page for this tablet">tablet P005381</a>|face obverse|column 1|case 1b|3(N19) <a target="_blank" href="https://cdli.ucla.edu/tools/SignLists/protocuneiform/archsigns/3(N19).jpg" title="to higher resolution lineart on CDLI"><img src="cdli-imagery/lineart-3(n19).jpg" style="display: inline;max-width: 2em; max-height: 4em;"  /></a>

In [15]:
sign = results[0][4]
grapheme = F.grapheme.v(sign)
iteration = F.repeat.v(sign)

valueFromDict = shinPP.get(grapheme, 0)

value = iteration * valueFromDict
print(value)

18


The next step is to walk through the shinPP numerals on the obverse and add up their values, then do the same for the reverse, and see whether the two totals agree. We expect the 3(N19) and 5(N04) on the reverse to add up to 23, viz. 18 + 5 = 23, and the 3(N19), 2(N04) and 3(N04) on the obverse to give the same total, viz. 18 + 2 + 3 = 23.

In [16]:
shinPPpat = '|'.join(shinPP)
query = f'''
tablet catalogId={pNum}
  face
    sign grapheme={shinPPpat}
'''
results = A.search(query)
A.show(results)

5 results



##### Tablet 1


In [17]:
sums = collections.Counter()

for (tablet, face, num) in results:
    grapheme = F.grapheme.v(num)
    iteration = F.repeat.v(num)
    valueFromDict = shinPP[grapheme]
    value = iteration * valueFromDict
    sums[F.type.v(face)] += value

for faceType in sums:
    print(f'{faceType}: {sums[faceType]}')

obverse: 23
reverse: 23


It adds up!
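
The same bookkeeping can be cross-checked on the plain transliteration, without any nodes. This is only a sketch: the lines are copied from the output of `A.getSource()` above, the names `atfLines` and `textSums` are ours, and the regular expression assumes that every numeral is written as repeat, then grapheme in brackets.

```python
import collections
import re

shinPPValues = dict(N41=0.2, N04=1, N19=6, N46=60, N36=180, N49=1800)

# Transliteration lines copied from the output of A.getSource() above.
atfLines = [
    '@obverse ',
    '@column 1 ',
    '1.a. 2(N14) , SZE~a SAL TUR3~a NUN~a ',
    '1.b. 3(N19) , |GISZ.TE| ',
    '2. 1(N14) , NAR NUN~a SIG7 ',
    '3. 2(N04)# , PIRIG~b1 SIG7 URI3~a NUN~a ',
    '@column 2 ',
    '1. 3(N04) , |GISZ.TE| GAR |SZU2.((HI+1(N57))+(HI+1(N57)))| GI4~a ',
    '2. , GU7 AZ SI4~f ',
    '@reverse ',
    '@column 1 ',
    '1. 3(N14) , SZE~a ',
    '2. 3(N19) 5(N04) , ',
    '3. , GU7 ',
    '@column 2 ',
    '1. , AZ SI4~f ',
]

textSums = collections.Counter()
face = None
for line in atfLines:
    if line.startswith('@'):
        # remember the current face; column markers are skipped
        if line.strip() in ('@obverse', '@reverse'):
            face = line.strip()[1:]
        continue
    # collect every <repeat>(<grapheme>) and keep the shinPP ones
    for (repeat, grapheme) in re.findall(r'(\d+)\((N\d+)\)', line):
        if grapheme in shinPPValues:
            textSums[face] += int(repeat) * shinPPValues[grapheme]

print(dict(textSums))
```

Both faces again come out at 23. The numerals N14 and N57 are skipped because they are not in the dictionary.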

## Non-numerical quads

Now that we have identified the numeral system in the first case of column 2 on the obverse, let's also see what we can find out about the non-numeral signs in the same case. 

We use the term "quad" to refer to all orthographic elements that occupy the space of a single proto-cuneiform sign on the surface of the tablet. This includes both an individual proto-cuneiform sign operating on its own as well as combinations of signs that occupy the same space. One of the most elaborate quads in the proto-cuneiform corpus is the following:

```
|SZU2.((HI+1(N57))+(HI+1(N57)))|
```

This quad has two sub-quads `SZU2`, `(HI+1(N57))+(HI+1(N57))`, and the second sub-quad also consists of two sub-quads `HI+1(N57)` and `HI+1(N57)`; both of these sub-quads are, in turn, composed of two further sub-quads `HI` and `1(N57)`.
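
The nesting just described can be made visible with a few lines of plain Python. This is a rough sketch that works on the transliteration string only, not on the Text-Fabric nodes; the helpers `stripOuter`, `splitTop` and `parseQuad` are hypothetical names of our own, and the containment operator `x` is left out for brevity.

```python
OPERATORS = '.+&'  # assumption: "x" is not handled in this sketch

def stripOuter(s):
    """Remove one pair of brackets if it encloses the whole string."""
    if not (s.startswith('(') and s.endswith(')')):
        return s
    depth = 0
    for (i, ch) in enumerate(s):
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth == 0 and i < len(s) - 1:
                return s  # the first bracket closes before the end
    return s[1:-1]

def splitTop(s):
    """Split s at operators that are not inside brackets."""
    (parts, depth, start) = ([], 0, 0)
    for (i, ch) in enumerate(s):
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
        elif ch in OPERATORS and depth == 0:
            parts.append(s[start:i])
            start = i + 1
    parts.append(s[start:])
    return parts

def parseQuad(atf):
    """Return a nested list for a quad, or the string itself for a single sign."""
    s = stripOuter(atf.strip('|'))
    parts = splitTop(s)
    if len(parts) == 1:
        return s
    return [parseQuad(p) for p in parts]

tree = parseQuad('|SZU2.((HI+1(N57))+(HI+1(N57)))|')
print(tree)
```

The result is `['SZU2', [['HI', '1(N57)'], ['HI', '1(N57)']]]`: two sub-quads, the second of which has two sub-quads that each split into `HI` and `1(N57)`.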

First we need to pick this super-quad out of the rest of the line. This is how we get the transliteration of the entire line:

In [18]:
query = f'''
tablet catalogId={pNum}
  face type=obverse
    column number=2
      line number=1
'''
results = A.search(query)
line = results[0][3]
A.pretty(line, withNodes=True)

1 result


We can just read off the node of the biggest quad.

In [19]:
bigQuad = 143015

Now that we have identified the "bigQuad," we can also ask Text-Fabric to show us what it looks like. 

In [20]:
A.lineart(bigQuad)

This extremely complex quad, viz. |SZU2.((HI+1(N57))+(HI+1(N57)))|, is a hapax legomenon, meaning that it only occurs here. There are three other non-numeral quads in this line, namely |GISZ.TE|, GAR and GI4~a, so let's see how frequent these three are in the proto-cuneiform corpus. We can do this sign by sign using the function `F.grapheme.s()`.

In [21]:
GISZTEs = F.grapheme.s('|GISZ.TE|')
print(f'|GISZ.TE| {len(GISZTEs)} times')

GARs = F.grapheme.s('GAR')
print(f'GAR = {len(GARs)} times')

GI4s = F.grapheme.s('GI4')
print(f'GI4 = {len(GI4s)} times')

|GISZ.TE| 0 times
GAR = 412 times
GI4 = 44 times


There are two problems here that we need to resolve in order to get good numbers: we have to get Text-Fabric to count |GISZ.TE| as a single unit, even though it is composed of two distinct graphemes, and we have to ask it to recognize and count the "a" variant of "GI4". In order to count the number of quads that consist of GISZ and TE, namely |GISZ.TE|, it is convenient to make a frequency index for all quads.

We walk through all the quads, pick up the ATF of each one, and count the frequencies of the ATF representations.

In [22]:
quadFreqs = collections.Counter()
for q in F.otype.s('quad'):
    quadFreqs[A.atfFromQuad(q)] += 1

With this in hand, we can quickly count how many quads there are that have both signs `GISZ` and `TE` in them.

Added bonus: we shall also see whether there are quads with both of these signs but composed with other operators and signs as well.

In [23]:
for qAtf in quadFreqs:
    if 'GISZ' in qAtf and 'TE' in qAtf:
        print(f'{qAtf} ={quadFreqs[qAtf]:>4} times')

|GISZ.TE| =  26 times


And we can also look at the set of quads in which GISZ co-occurs with another sign, and likewise, the set of quads in which TE co-occurs with another sign.

In [24]:
for qAtf in quadFreqs:
    if 'GISZ' in qAtf:
        print(f'{quadFreqs[qAtf]:>4} x {qAtf}')

  15 x |GISZxSZU2~b|
  42 x |GISZxSZU2~a|
  26 x |GISZ.TE|
   1 x |GIxGISZ@t|
   4 x |(GISZx(DIN.DIN))~a|
   3 x |NINDA2xGISZ|
   1 x |(GISZx(DIN.DIN))~c|
   7 x |DUG~bxGISZ|
   3 x |GISZ3~a&GISZ3~a|
   1 x |GISZ+SZU2~b|
   1 x |BU~axGISZ@t|
   1 x |GA2~a1xGISZ@t|
   1 x |(GISZx(DIN.DIN))~b|
   1 x |(GI&GI)xGISZ@t|


In [25]:
for qAtf in quadFreqs:
    if 'TE' in qAtf:
        print(f'{quadFreqs[qAtf]:>4} x {qAtf}')

  26 x |GISZ.TE|
  10 x |LAGAB~b.TE|


Most of the time, however, when we are interested in particular sign frequencies, we want to cast a wide net and get the frequency of any possibly related sign or quad. The best way to do this is to check the ATF of any sign or quad that might be relevant and add up the number of its occurrences in the corpus. The following script checks both signs and quads and casts the net widely. It looks for the frequency of the same three signs/quads, namely GAR, GI4~a and |GISZ.TE|.

In [26]:
quadSignFreqs = collections.Counter()
quadSignTypes = {'quad', 'sign'}

for n in N():
    nType = F.otype.v(n)
    if nType not in quadSignTypes:
        continue
    atf = A.atfFromQuad(n) if nType == 'quad' else A.atfFromSign(n)
    quadSignFreqs[atf] += 1

We now have a frequency index for all signs and quads in their ATF representation.
Note that if a sign is part of a bigger quad, its occurrence there will be counted as an occurrence of the sign.

In [27]:
selectedAtfs = []

for qsAtf in quadSignFreqs:
    if 'GAR' in qsAtf or 'GI4~a' in qsAtf or '|GISZ.TE|' in qsAtf:
        selectedAtfs.append(qsAtf)
        print(f'{quadSignFreqs[qsAtf]:>4} x {qsAtf}')

  34 x GI4~a
  17 x |SILA3~axGARA2~a|
  73 x GARA2~a
  91 x NAGAR~b
 401 x GAR
  56 x LAGAR~a
  20 x |ZATU651xGAR|
   3 x |NINDA2xGAR|
  99 x NAGAR~a
   1 x LAGAR~a@r
  26 x |GISZ.TE|
   3 x GAR@g~b
  22 x ESZGAR
  18 x AGAR2
   6 x |4(N57).GAR|
   8 x |5(N57).GAR|
   4 x |ZATU737xGAR|
   1 x |MAR~bxGAR|
   3 x GAR@g~c
   2 x |3(N57).GAR|
   1 x |SILA3~a+GARA2~a|
   1 x |7(N57).GAR|
   1 x |GAN~cx(4(N57).GAR)|
   4 x |6(N57).GAR|
   2 x |SZE~a.GAR|
   1 x GAR~b
   4 x LAGAR~c
   5 x LAGAR~b1
   2 x GARA2~b
   1 x |GI4~a&GI4~a|
   1 x |N(N57).GAR|
   1 x |GI4~axA|
  29 x GAR3
   4 x GAR@g~a
   3 x LAGAR~b2


Let's draw all these quads.

In [28]:
for sAtf in selectedAtfs:
    A.lineart(sAtf, width='5em', height='5em', withCaption='right')

Besides our three targets, 34 occurrences of GI4~a, 401 of GAR and 26 of |GISZ.TE|:

    34 x GI4~a
    401 x GAR
    26 x |GISZ.TE|

it has also pulled in a number of quads that include either GAR or GI4~a, among others:

    20 x |ZATU651xGAR|
    3 x |NINDA2xGAR|
    6 x |4(N57).GAR|
 
    1 x |GI4~a&GI4~a|
    1 x |GI4~axA|

There are also other signs that only resemble GAR in transliteration, such as LAGAR or GARA2, but as long as we know what we are looking for this type of broader frequency count can be quite useful.
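
If the look-alikes get in the way, one option is to compare whole sign names instead of substrings. The following is a sketch only: `sampleFreqs` holds a few entries copied from the output above, `signNames` is a hypothetical helper of ours, and it assumes that sign names consist of capital letters and digits while operators, variants and modifiers do not.

```python
import collections
import re

# A few entries copied from the frequency output above.
sampleFreqs = collections.Counter({
    'GAR': 401,
    'LAGAR~a': 56,
    'GARA2~a': 73,
    'NAGAR~b': 91,
    'GAR3': 29,
    '|ZATU651xGAR|': 20,
    '|4(N57).GAR|': 6,
})

def signNames(atf):
    """Sign names in an ATF string, without variants (~a) or modifiers (@g)."""
    # Assumption: names are made of capitals and digits only.
    return re.findall(r'[A-Z][A-Z0-9]*', atf)

exactGAR = {
    atf: freq
    for (atf, freq) in sampleFreqs.items()
    if 'GAR' in signNames(atf)
}
for (atf, freq) in exactGAR.items():
    print(f'{freq:>4} x {atf}')
```

On this sample only GAR itself and the two quads that contain it are kept; LAGAR~a, GARA2~a, NAGAR~b and GAR3 drop out.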

## Generating a list of sign frequency and saving it as a separate file

First, we are going to count the number of distinct signs in the corpus, look at the top hits in the list and finally save the full list to a separate file. Then we will do the same for the quads, and then lastly we are going to combine these two lists and save them as a single frequency list for both signs and quads.

In [29]:
fullGraphemes = collections.Counter()

for n in F.otype.s('sign'):
    grapheme = F.grapheme.v(n)
    if grapheme == '' or grapheme == '…':
        continue
    fullGrapheme = A.atfFromSign(n)
    fullGraphemes[fullGrapheme] += 1
    
len(fullGraphemes)

1477

So there are 1477 distinct proto-cuneiform signs in the corpus. The following snippet of code will show us the first 20 signs on that list.

In [30]:
for (value, frequency) in sorted(
    fullGraphemes.items(), 
    key=lambda x: (-x[1], x[0]),
)[0:20]:
    print(f'{frequency:>5} x {value}')

12983 x 1(N01)
 6870 x X
 3080 x 2(N01)
 2584 x 1(N14)
 1830 x EN~a
 1598 x 3(N01)
 1357 x 2(N14)
 1294 x 5(N01)
 1294 x SZE~a
 1164 x GAL~a
 1117 x 4(N01)
 1022 x U4
 1020 x AN
  999 x 1(N34)
  876 x SAL
  851 x PAP~a
  849 x GI
  791 x 3(N14)
  789 x 1(N57)
  781 x BA


Now we are going to write the full set of sign frequency results to two files in your `_temp` directory, within this repo. The two files are called:

* `grapheme-alpha.txt`, an alphabetic list of graphemes, along with the frequency of each sign, and
* `grapheme-freq.txt`, which runs from the most frequent to the least.

In [31]:
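def writeFreqs(fileName, data, dataName):
    print(f'There are {len(data)} {dataName}s')

    # write the list twice: sorted by name and sorted by falling frequency
    for (sortName, sortKey) in (
        ('alpha', lambda x: (x[0], -x[1])),
        ('freq', lambda x: (-x[1], x[0])),
    ):
        with open(f'{A.tempDir}/{fileName}-{sortName}.txt', 'w') as fh:
            # data is a Counter, so we sort its (item, frequency) pairs
            for (item, freq) in sorted(data.items(), key=sortKey):
                if item != '':
                    fh.write(f'{freq:>5} x {item}\n')

writeFreqs('grapheme', fullGraphemes, 'grapheme')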

Now let's go through some of the same steps for quads rather than individual signs, and then export a single frequency list for both signs and quads.

In [32]:
quadFreqs = collections.Counter()
for q in F.otype.s('quad'):
    quadFreqs[A.atfFromQuad(q)] += 1
print(len(quadFreqs))

740


So there are 740 distinct quads in the corpus, and now we ask for the twenty most frequently attested quads.

In [33]:
for (value, frequency) in sorted(
    quadFreqs.items(), 
    key=lambda x: (-x[1], x[0]),
)[0:20]:
    print(f'{frequency:>5} x {value}')

  159 x |U4x1(N57)|
  128 x |DUG~bxX|
  112 x |SZE~a.NAM2|
   86 x |GI&GI|
   85 x |GA~a.ZATU753|
   84 x |1(N58).BAD~a|
   78 x |NI~a.RU|
   68 x |DUG~cx1(N57)|
   57 x |U4x2(N57)|
   52 x |BU~a+DU6~a|
   50 x |SAL.KUR~a|
   43 x |U4.1(N08)|
   43 x |U4x3(N57)|
   42 x |BAD&BAD|
   42 x |GISZxSZU2~a|
   39 x |1(N57).SZUBUR|
   38 x |E2~ax1(N57)@t|
   37 x |SZU2.EN~a|
   36 x |U4x1(N01)|
   35 x |3(N57).PIRIG~b1|


And for the final task in this primer, we ask Text-Fabric to export a frequency list of both signs and quads in a separate file.

In [34]:
reportDir = 'reports'
os.makedirs(reportDir, exist_ok=True)

In [35]:
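def writeFreqs(fileName, data, dataName):
    print(f'There are {len(data)} {dataName}s')

    # write the list twice: sorted by name and sorted by falling frequency
    for (sortName, sortKey) in (
        ('alpha', lambda x: (x[0], -x[1])),
        ('freq', lambda x: (-x[1], x[0])),
    ):
        with open(f'{reportDir}/{fileName}-{sortName}.txt', 'w') as fh:
            for (item, freq) in sorted(data.items(), key=sortKey):
                if item != '':
                    fh.write(f'{freq:>5} x {item}\n')

# the combined index of signs and quads built earlier in this primer
writeFreqs('quad-signs', quadSignFreqs, 'sign/quad')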

This shows up as a pair of files named "quad-signs-alpha.txt" and "quad-signs-freq.txt" and if we copy a few pieces of the quad-signs-freq.txt file here, they look something like this:

    29413 x ...
    12983 x 1(N01)
    6870 x X
    3080 x 2(N01)
    2584 x 1(N14)
    1830 x EN~a
    1598 x 3(N01)
    1357 x 2(N14)
    1294 x 5(N01)
    1294 x SZE~a
    1164 x GAL~a

Only much farther down the list do we see signs and quads interspersed; here are the signs/quads around 88 occurrences:

    88 x NIMGIR
    88 x NIM~a
    88 x SUG5
    86 x EN~b
    86 x NAMESZDA
    86 x |GI&GI|
    85 x GU
    85 x |GA~a.ZATU753|
    84 x BAD~a
    84 x NA2~a
    84 x ZATU651
    84 x |1(N58).BAD~a|
    83 x ZATU759
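
The two sort orders can be tried out on a handful of entries without touching the corpus. This sketch reuses the sort keys from `writeFreqs`, but writes to a throw-away directory and reads the files back; `demoFreqs`, `sortKeys` and `written` are names of our own, and the four entries are copied from the listing above.

```python
import collections
import os
import tempfile

# A handful of entries copied from the listing above.
demoFreqs = collections.Counter({
    '|GI&GI|': 86,
    'EN~b': 86,
    'GU': 85,
    'BAD~a': 84,
})

sortKeys = {
    'alpha': lambda x: (x[0], -x[1]),  # by name, then by falling frequency
    'freq': lambda x: (-x[1], x[0]),   # by falling frequency, then by name
}

written = {}
with tempfile.TemporaryDirectory() as demoDir:
    for (sortName, sortKey) in sortKeys.items():
        path = os.path.join(demoDir, f'demo-{sortName}.txt')
        with open(path, 'w', encoding='utf8') as fh:
            for (item, freq) in sorted(demoFreqs.items(), key=sortKey):
                fh.write(f'{freq:>5} x {item}\n')
        # read the file back so that we can inspect what was written
        with open(path, encoding='utf8') as fh:
            written[sortName] = fh.read().splitlines()

for (sortName, lines) in written.items():
    print(sortName, lines)
```

Because the character `|` sorts after the capital letters, quads end up after the plain signs among entries with the same frequency, just as `|GI&GI|` follows `EN~b` in the listing above.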