For the main tutorial go to [start](../start.ipynb)

---

# Determinatives

Determinatives play a role in finding nouns.

Here we do some basic checks.

In [1]:
import collections

from tf.app import use

In [3]:
A = use('oldbabylonian:clone', checkout="clone", hoist=globals())
# A = use('oldbabylonian', hoist=globals())

Using TF-app in ~/github/annotation/app-oldbabylonian/code:
	repo clone offline under ~/github (local github)
Using data in ~/github/Nino-cunei/oldbabylonian/tf/1.0.5:
	repo clone offline under ~/github (local github)
   |     0.00s Dataset without structure sections in otext:no structure functions in the T-API


# Overview of the determinatives

Which readings occur as determinative, and how often?

The following cell gets all signs that are a determinative.

In [4]:
dets = F.det.s(1)
len(dets)

6796

*In words:* for the feature `det`, give all nodes that have value 1.

From the feature docs (there is a link to them after the incantation above), we know that the determinatives are exactly those signs with `det=1`.

We are going to count them by reading, and for each reading we want to list one or two examples.

So we make a mapping from readings to sign nodes that have that reading.

Theoretically, we could have something like `{AN}`, i.e. not a *reading* but a *grapheme*.

Let's check whether there are determinatives that do not have a reading.

In [5]:
noReading = collections.Counter()
for s in dets:
  if not F.reading.v(s):
    noReading[F.grapheme.v(s)] += 1
len(noReading)

0

No: every determinative has a reading, so we do not have to reckon with determinatives without one.

In [6]:
readings = collections.defaultdict(list)

for s in dets:
  readings[F.reading.v(s)].append(s)
  
len(readings)

40

We have a limited set of distinct determinatives. Let's print them by frequency.
Remember that the keys of the `readings` dictionary are concrete readings, and the value for each key is the list of nodes
that have that reading. These are all sign nodes for which `det=1`.

We are going to sort a dictionary based on frequencies.

That kind of sorting occurs quite often.

So we define a few *key* functions for that.

A key function is a function that maps a value to another value.
If we pass a key function to `sorted()`, the values will be sorted by their key values, i.e. the values assigned to
them by the key function.

If we have a `collections.Counter()` that maps things to amounts, here is a suitable key function:

In [7]:
def freqKey(x):
  return (-x[1], x[0])

Explanation: `x` is an item of a counter, i.e. a pair `(thing, amount)`. So `x[0]` is the thing and `x[1]` is the amount.

We want things with a high amount on top, so we should sort on `-x[1]`. If things have the same amount, we want to sort on the thing
itself: `x[0]`. We achieve this by sorting on the pair `(-x[1], x[0])`.
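
To see the effect on a small scale, here is a minimal sketch that applies the same key function to a made-up counter. The readings and counts in `toyCounts` are hypothetical and chosen only to show the tie-breaking; they are not taken from the corpus.

```python
import collections


def freqKey(x):
  # x is a (thing, amount) pair: high amounts first, ties broken by the thing
  return (-x[1], x[0])


# hypothetical counts, for illustration only
toyCounts = collections.Counter({'ki': 3, 'd': 5, 'an': 3, 'sar': 1})

ordered = sorted(toyCounts.items(), key=freqKey)
for (thing, amount) in ordered:
  print(f'{thing:<5} {amount:>2}')
```

Note that `an` and `ki` have the same amount, so they end up in alphabetical order.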

Sometimes we want to sort a dictionary that maps things to sets or lists of nodes, and we want the things that map to large sets first.

We do this with a slight adaptation of the `freqKey()` function:

In [8]:
def freqKeyL(x):
  return (-len(x[1]), x[0])

Because `x[1]` is now a set or list, we need to take its length.
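
A quick sketch with toy data shows that the same key works whether the values are lists or sets. The dictionary `toyReadings` and its node numbers are invented for this illustration.

```python
def freqKeyL(x):
  # x is a (thing, collection) pair: large collections first, ties broken by the thing
  return (-len(x[1]), x[0])


# hypothetical mapping from readings to node collections (a mix of lists and sets)
toyReadings = {
  'ki': [10, 11],
  'd': {1, 2, 3},
  'an': [7, 8],
  'sar': {20},
}

orderedL = [reading for (reading, nodes) in sorted(toyReadings.items(), key=freqKeyL)]
print(orderedL)
```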

In [9]:
for (reading, signNodes) in sorted(
  readings.items(),
  key=freqKeyL
):
  print(f'{reading:<10} {len(signNodes):>4}')

d          4078
disz       1204
ki          884
gesz        271
sar          84
mi2          46
gi           39
lu2          33
tug2         31
iri          28
na4          17
kusz         12
uruda        11
urudu         7
duru5         5
ku6           4
muszen        4
u2            4
gar           3
szim          3
a             2
am            2
an            2
ap            2
dug           2
id2           2
p             2
sza3          2
ar            1
asz           1
at            1
gisz          1
i7            1
ir            1
iti           1
ku            1
la            1
munus         1
ti            1
uzu           1


Now we show the first example in the corpus for each reading.

If we have tuples of nodes, we can display them with `A.table()`.

So let's make a tuple of nodes: for each example, we take the sign node and its encompassing line node.

In [10]:
examples = []

for (reading, signNodes) in sorted(
  readings.items(),
  key=freqKeyL,
):
  exampleSign = signNodes[0]
  exampleLine = L.u(exampleSign, otype='line')[0]
  examples.append((exampleLine, exampleSign))

A.table(examples)

n,p,line,sign
1,P509373 obverse:1,[a-na] _{d}suen_-i-[din-nam],_{d}
2,P509373 obverse:6,{disz}sze-ep-_{d}suen a2-gal2 [dumu] um-mi-a-mesz_,{disz}
3,P509373 obverse:10,_a-sza3 a-gar3_ na-ag-[ma-lum] _uru_ x x x{ki},{ki}
4,P389958 obverse:7',_5(disz) ma-na {gesz}zu2 geszimmar_,{gesz}
5,P510537 obverse:5,[x x (x)] SU{sar} sza qa2-ti {disz}mi#-[x x x (x) (x)],{sar}
6,P510559 obverse:7,{mi2}be-le-su#-nu# a-ha#-ti#,{mi2}
7,P510569 obverse:15,asz-szum# {gi}ku-ru-pe2-e asz-szum ha nu x,{gi}
8,P510589 obverse:3,um-ma a-na-ku-ma _{lu2}na-asz#-[bar_],_{lu2}
9,P510618 reverse:13,_{tug2}bar-si-ig_ ta AH? ti i ma,_{tug2}
10,P510700 obverse:5,i-na {iri}dag-la-a _gu4 hi-a_-ni ka-su2-ma,{iri}


It seems that determinatives only occur at the start or at the end of words. Let's check that.

Search is handy here: we look for a determinative sign between the first and the last sign of a word.

In [11]:
query = '''
word
  =: sign det#
  < sign det
  < sign det#
  :=
'''

A little bit of explanation:

We look for a word with three signs in it: one at the start (`=:`), one further on (`<`), and one still further on (`<`) that
is also at the end (`:=`).

The signs at the start and at the end should *not* be determinatives (`det#`), and the middle sign should be a determinative (`det`).
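
For readers who prefer to see the condition spelled out procedurally, here is a rough pure-Python paraphrase of what the template asks for. It does not use the search engine: a word is represented as a hypothetical list of `(reading, isDet)` pairs, and the helper name `hasInnerDet` is invented for this illustration.

```python
def hasInnerDet(word):
  """True if the word starts and ends with a non-determinative sign
  and has a determinative sign somewhere in between."""
  if len(word) < 3:
    return False
  (firstReading, firstIsDet) = word[0]
  (lastReading, lastIsDet) = word[-1]
  if firstIsDet or lastIsDet:
    return False
  return any(isDet for (reading, isDet) in word[1:-1])


# toy words, modelled loosely on forms such as na-bi-{d}suen and {d}utu
toyInner = [('na', False), ('bi', False), ('d', True), ('suen', False)]
toyStart = [('d', True), ('utu', False)]

print(hasInnerDet(toyInner))
print(hasInnerDet(toyStart))
```

The helper only tests whether such a configuration exists. The search, by contrast, delivers one result for every combination of word, first sign, inner determinative and last sign, so a word with two inner determinatives shows up twice.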

In [12]:
results = A.search(query)

  0.52s 913 results


So determinatives do occur inside words, and quite often. Let's show some:

In [13]:
A.table(results, end=10)

n,p,word,sign,sign.1,sign.2
1,P509376 obverse:9,na-bi-{d}suen,na-,{d},suen
2,P509376 reverse:10,na-bi-{d}suen,na-,{d},suen
3,P509377 obverse:8,kin-{d}inanna_,kin-,{d},inanna_
4,P510526 obverse:3,geme2-{d}utu-ma,geme2-,{d},ma
5,P510527 obverse:11,ip-qu2-{d}na-bi-um,ip-,{d},um
6,P510527 reverse:8,e-tel-pi4-{d}na-bi-um,e-,{d},um
7,P510530 reverse:15,ARAD#-{d}ul-masz-szi-tum,ARAD#-,{d},tum
8,P510530 reverse:18,ARAD-{d}ul-masz-szi-tum,ARAD-,{d},tum
9,P510530 reverse:19,ARAD-{d}ul-masz-szi-tum,ARAD-,{d},tum
10,P510531 obverse:5,ip-qu2-{d}na#-bi-um,ip-,{d},um


The question arises: can a word be divided into parts by several determinatives inside it?

We adjust the query:

In [14]:
query2 = '''
word
  =: sign det#
  < sign det
  <: sign det#
  < sign det
  < sign det#
  :=
'''

In [15]:
results2 = A.search(query2)

  0.87s 2 results


In [16]:
A.table(results2)

n,p,word,sign,sign.1,sign.2,sign.3,sign.4
1,P365956 obverse:4,kar-{d}utu{ki}-mesz_,kar-,{d},utu,{ki}-,mesz_
2,P313345 obverse:2,_dumu-{id2}UD.KIB.NUN{ki}_-ma,_dumu-,{id2},UD.,{ki}_-,ma


This seems to be really exceptional.

# Words starting or ending with specific determinatives

We want to single out words that start with a determinative of a certain class and words that end with a determinative of a certain class.

We define those classes by means of a few variables, and then construct a query with them.

In [17]:
detStart = '''
  d
  disz
  gesz
  gi
  mi2
  na4
  iri
  lu2
  tug2
  u2
'''.strip().split()

detEnd = '''
  ki
  sar
  muszen
  ku6
'''.strip().split()

print(f'detStart={str(detStart)}')
print(f'detEnd={str(detEnd)}')

detStart=['d', 'disz', 'gesz', 'gi', 'mi2', 'na4', 'iri', 'lu2', 'tug2', 'u2']
detEnd=['ki', 'sar', 'muszen', 'ku6']


In order to use this in a query, we need a `|`-separated string of those values.

In [18]:
detStartCriterion = '|'.join(detStart)
detEndCriterion = '|'.join(detEnd)

print(f'detStartCriterion={detStartCriterion}')
print(f'detEndCriterion={detEndCriterion}')

detStartCriterion=d|disz|gesz|gi|mi2|na4|iri|lu2|tug2|u2
detEndCriterion=ki|sar|muszen|ku6


In [19]:
queryStart = f'''
word
  =: sign det reading={detStartCriterion}
'''

queryEnd = f'''
word
  := sign det reading={detEndCriterion}
'''

In [20]:
resultsStart = A.search(queryStart)
resultsEnd = A.search(queryEnd)

  0.21s 4342 results
  0.18s 890 results


The results are pairs of a word and a sign (the determinative).

Let's just look at the words. Are the words found by the first query distinct from the words found by the second query?

We count the number of words in the intersection.

In [21]:
wordsStart = {w for (w, s) in resultsStart}
wordsEnd = {w for (w, s) in resultsEnd}
wordsBoth = wordsStart & wordsEnd
len(wordsBoth)

31

Just for expository reasons, we are going to show the words with determinatives at both ends in two ways:

1. By picking the result sets and combining them
2. By writing a modified query.

### Combine result sets

We need to get the word nodes of the intersection and also their first and last signs.

In [22]:
resultsBoth1 = []
for w in sorted(wordsBoth):
  signs = L.d(w, otype='sign')
  first = signs[0]
  last = signs[-1]
  resultsBoth1.append((w, first, last))
A.table(resultsBoth1)

n,p,word,sign,sign.1
1,P510644 reverse:5,{disz}dumu-zimbir{ki},{disz},{ki}
2,P510701 obverse:10,{d}nin-iri?{ki},{d},{ki}
3,P510744 obverse:4,{iri#}ra-pi2-qum{ki},{iri#},{ki}
4,P413253 reverse:11',{iri}eresz2{ki},{iri},{ki}
5,P365956 obverse:9,{iri}kisz{ki},{iri},{ki}
6,P313376 obverse:4,{disz}szi#-ma#-at-_uri2{ki#}_,{disz},{ki#}_
7,P313393 obverse:5,{iri}za-mi-ri-i{ki},{iri},{ki}
8,P313407 obverse:5,{iri}za-mi-ri-i{ki},{iri},{ki}
9,P345556 reverse:10,{iri}a-ta-szum#{ki},{iri},{ki}
10,P345558 obverse:9,{iri}sza3-gu4{ki}_,{iri},{ki}_


### Modified query

We show these cases by means of a third query:

In [23]:
queryBoth = f'''
word
  =: sign det reading={detStartCriterion}
  sign det reading={detEndCriterion}
  :=
'''

Note that we do not say:

```python
queryBoth = f'''
word
  =: sign det reading={detStartCriterion}
  := sign det reading={detEndCriterion}
'''
```

That would mean that the second sign ends at the same place as the first sign.

But we want the second sign to end at the same place as the word.

The way to achieve that is to place the `:=` after the second sign, on a line of its own.

The lonely relation symbol on such a last line holds between the preceding atom (the second sign here)
and the latest embedder (the word here).
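
The difference between the two formulations can be sketched with plain slot arithmetic. This is only an illustration under a simplifying assumption: each node is represented as a `(firstSlot, lastSlot)` pair, and `endsAtSameSlot` is a hypothetical stand-in for the `:=` relation, not a function of the search engine.

```python
def endsAtSameSlot(a, b):
  # each node is represented here as a (firstSlot, lastSlot) pair
  return a[1] == b[1]


# hypothetical word occupying slots 1..4, with a determinative sign at each end
toyWord = (1, 4)
toyFirstSign = (1, 1)
toyLastSign = (4, 4)

# the rejected formulation relates the second sign to the first sign
signVersusSign = endsAtSameSlot(toyFirstSign, toyLastSign)

# the intended formulation relates the second sign to the word
signVersusWord = endsAtSameSlot(toyLastSign, toyWord)

print(signVersusSign, signVersusWord)
```

Two different signs never end at the same slot, so the rejected formulation cannot describe a word with a determinative at each end.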

In [24]:
resultsBoth2 = A.search(queryBoth)

  0.34s 31 results


In [25]:
resultsBoth1 == resultsBoth2

True

Equal results.

# Noun compilation

We want to compile a list of nouns found in this way:
we take all words that have one of the specified start determinatives at the start or one of the specified end determinatives at the end (or both).

From all those words, we strip the start and/or end determinatives.
What remains, we store in a dictionary, with the form as key, and the word nodes that exhibit that form as value.

When we compute the form, we pick the basic info of a sign, not the full ATF with flags and brackets.

Looking at the
[feature documentation, section Text-Formats](https://github.com/Nino-cunei/oldbabylonian/blob/master/docs//transcription.md#text-formats), 
we choose `text-orig-rich` for our representation.

In [26]:
nouns = collections.defaultdict(list)

for (w, s) in resultsStart + resultsEnd:
  signs = L.d(w, otype='sign')
  isDetStart = F.det.v(signs[0])
  isDetEnd = F.det.v(signs[-1])
  # strip determiners
  nounSigns = signs[(1 if isDetStart else 0):(-1 if isDetEnd else len(signs))]
  
  noun = T.text(nounSigns, fmt='text-orig-rich').strip()
  
  nouns[noun].append(w)
  
print(f'{len(nouns)} distinct plain words')  

1604 distinct plain words


We show a frequency list of the top 20 plain words:

In [27]:
showLimit = 20

for (i, (noun, ws)) in enumerate(sorted(
  nouns.items(),
  key=freqKeyL,
)[0:showLimit]):
  print(f'{i + 1:<2} {noun:<20} {len(ws):>4} x')

1  utu                   776 x
2  marduk                626 x
3  babila₂⁼              180 x
4  zimbir⁼               154 x
5  suen-i-din-nam        114 x
6  ma₂                    47 x
7  utu-ha-zi-ir           41 x
8  iri⁼                   39 x
9  kiri₆                  39 x
10 x                      36 x
11 larsa⁼                 31 x
12 ban₂                   29 x
13 inanna                 27 x
14 ka₂-dingir-ra⁼         24 x
15 d⁼suen-i-din-nam       23 x
16 na-bi-um-at-pa-lam     23 x
17 kiš⁼                   21 x
18 suen-i-di₂-nam         20 x
19 marduk-na-ṣi-ir        19 x
20 marduk-mu-ša-lim-ma    17 x


Note noun 15: it starts with a determinative! Why hasn't it been stripped?

We'll have a closer look and make a table of examples-with-context of the top 20 plain words.

We recompute the signs to be stripped away and color them red and green.
The remaining parts of the words we color yellow.

In [28]:
highlights = {}
nounExamples = []

for (noun, ws) in sorted(
  nouns.items(),
  key=freqKeyL,
)[0:showLimit]:
  w = ws[0]
  
  # get to the signs and study the determiners
  
  signs = L.d(w, otype='sign')
  isDetStart = F.det.v(signs[0])
  isDetEnd = F.det.v(signs[-1])
  nounSigns = signs[(1 if isDetStart else 0):(-1 if isDetEnd else len(signs))]
  
  # add nodes for highlighting
  
  if isDetStart:
    highlights[signs[0]] = 'red'
  if isDetEnd:
    highlights[signs[-1]] = 'lightgreen'
  for ns in nounSigns:
    highlights[ns] = 'yellow'
    
  # end highlighting

  line = L.u(w, otype='line')[0]
  nounExamples.append((line, w))

In [29]:
A.table(nounExamples, highlights=highlights)

n,p,line,word
1,P509373 obverse:4,_{d}utu_ u3 _{d}[marduk]_ a-na da-ri-a-[tim],_{d}utu_
2,P509373 obverse:4,_{d}utu_ u3 _{d}[marduk]_ a-na da-ri-a-[tim],_{d}[marduk]_
3,P509376 obverse:7,{disz}na-bi-{d}suen a-na babila2{ki} i-li-a-am#-ma,babila2{ki}
4,P345569 obverse:6,i-na _{iri}zimbir{ki}_ ni-in-na-am-ru,_{iri}zimbir{ki}_
5,P509373 obverse:1,[a-na] _{d}suen_-i-[din-nam],_{d}suen_-i-[din-nam]
6,P510546 obverse:10,_1(disz) {gesz}ma2_ u3 _1(u) erin2_-am,{gesz}ma2_
7,P509373 obverse:11,sza _{d}utu_-ha-zi-[ir] isz-tu _mu 7(disz) kam_ id-di-nu-szum,_{d}utu_-ha-zi-[ir]
8,P510534 obverse:11,i-nu-ma _{gesz}apin hi-a_ sza _iri{ki}_ [(x) (x)],_iri{ki}_
9,P510762 obverse:11,u3 _{gesz}kiri6_ e-tel-pi4-{d}marduk,_{gesz}kiri6_
10,P481192 obverse:6',szu-lum-ka ma-har {d}[x u3] {d}[x lu da-ri],{d}[x


Looking again at noun 15, we see that it starts with *two* determinatives, and only the first one got stripped.
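
One possible remedy is to strip *all* leading and trailing determinatives instead of at most one at each end. The following is a minimal sketch on toy data, not the method used in this notebook: the helper `stripDets`, the sign list `toySigns` and the positions in `toyDets` are all hypothetical.

```python
def stripDets(signs, isDet):
  """Return signs without any leading or trailing determinatives.

  signs: a sequence of sign identifiers
  isDet: a function that tells whether a sign is a determinative
  """
  start = 0
  end = len(signs)
  # skip every determinative at the start
  while start < end and isDet(signs[start]):
    start += 1
  # skip every determinative at the end
  while end > start and isDet(signs[end - 1]):
    end -= 1
  return signs[start:end]


# toy word: two determinatives in front, one at the end (hypothetical data)
toySigns = ['disz', 'd', 'suen', 'i', 'din', 'nam', 'ki']
toyDets = {0, 1, 6}  # positions of the determinatives in toySigns

positions = list(range(len(toySigns)))
kept = stripDets(positions, lambda p: p in toyDets)
strippedForm = [toySigns[p] for p in kept]
print(strippedForm)
```

In the notebook itself one could presumably pass the sign nodes of a word together with `F.det.v` as the `isDet` function, but that variant has not been run here.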