For the main tutorial go to [start](../start.ipynb)

---

# Between `um-ma` and `-ma`

What happens between `um-ma` and `-ma` can help to identify proper nouns.

More precisely: we are looking for single words that immediately follow the sign sequence `um-ma`
and that themselves end in `-ma`.

In [1]:
import collections

from tf.app import use

In [2]:
A = use("Nino-cunei/oldbabylonian", hoist=globals())

This is Text-Fabric 9.3.1
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

67 features found and 0 ignored


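The following query captures this intention: find words that directly follow `um-ma` and end in `-ma`.

See [basic relations](https://annotation.github.io/text-fabric/tf/about/searchusage.html#relational-operators)
for the meaning of `<:` and `:=`.
You find them under **slot comparison**.
In short: `<:` requires the left node to end immediately before the right node starts,
and `:=` requires both nodes to end at the same slot.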

In [3]:
query = """
line
   sign reading=um
   <: sign reading=ma
   <: word
     := sign reading=ma
"""
results = sorted(S.search(query))
print(f"{len(results)} results")

1472 results
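
The two relations used in the query can be illustrated without Text-Fabric. The following is only a toy sketch: the `Span` class, the helper functions, and the slot numbers are assumptions made up for this illustration and are not part of the Text-Fabric API. A node is reduced to the first and last slot it occupies.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """A node reduced to the first and last slot it occupies (toy model)."""

    first: int
    last: int


def adjacent_before(left: Span, right: Span) -> bool:
    """Toy version of `<:`: left ends on the slot just before right starts."""
    return left.last + 1 == right.first


def same_end(left: Span, right: Span) -> bool:
    """Toy version of `:=`: both nodes end on the same slot."""
    return left.last == right.last


# Hypothetical slot numbers for a line reading "um-ma šu-ma"
um = Span(1, 1)
ma1 = Span(2, 2)
word = Span(3, 4)  # the word šu-ma occupies slots 3 and 4
ma2 = Span(4, 4)  # the final sign of that word

# The same chain of conditions as in the query template
matches = (
    adjacent_before(um, ma1)
    and adjacent_before(ma1, word)
    and same_end(word, ma2)
)
print(matches)
```

In this toy model the chain holds only when `um`, `ma`, and the word are contiguous and the last `ma` is the final sign of the word, which is the pattern the query is meant to express.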


In [4]:
A.table(results, start=1000, end=1010, fmt="layout-orig-rich")

n,p,line,sign,sign.1,word,sign.2
1000,P386007 obverse:6,um-ma šu-u₂-ma,um-,ma,šu-u₂-ma,ma
1001,P386008 obverse:3,um-ma ha-am-mu-ra-pi₂-ma,um-,ma,ha-am-mu-ra-pi₂-ma,ma
1002,P386009 obverse:3,um-ma ha-am-mu-ra-pi₂-ma,um-,ma,ha-am-mu-ra-pi₂-ma,ma
1003,P386009 obverse:7,um-ma šu-ma,um-,ma,šu-ma,ma
1004,P386010 obverse:3,um-ma ha-am-mu-ra-pi₂-ma,um-,ma,ha-am-mu-ra-pi₂-ma,ma
1005,P386010 obverse:6,um-ma šu-nu-ma,um-,ma,šu-nu-ma,ma
1006,P386011 obverse:3,um-ma ha-am-mu-ra-pi₂-ma,um-,ma,ha-am-mu-ra-pi₂-ma,ma
1007,P386011 obverse:4,dišdnanna-tum ki-a-am iq-bi-a-am um-ma šu-ma,um-,ma,šu-ma,ma
1008,P386012 obverse:3,um-ma ha-am-mu-ra-pi₂-ma,um-,ma,ha-am-mu-ra-pi₂-ma,ma
1009,P386012 obverse:4,aš-šum ša ta-aš-pu-ra-am um-ma at-ta-ma,um-,ma,at-ta-ma,ma


In [5]:
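# Count each distinct word, with its final `ma` sign stripped off
introNouns = collections.Counter()

for (line, um, ma1, word, ma2) in results:
    # all signs of the word except the last one (the trailing `ma`)
    strippedWord = L.d(word, otype="sign")[:-1]
    introNouns[T.text(strippedWord, fmt="text-orig-rich")] += 1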

len(introNouns)

538

In [6]:
for (proper, amount) in sorted(
    introNouns.items(),
    key=lambda x: (-x[1], x[0]),
)[0:100]:
    print(f"{proper:<30} {amount:>4} x")

ha-am-mu-ra-pi₂-                126 x
šu-                              86 x
šu-u₂-                           86 x
at-ta-                           62 x
a-na-ku-                         61 x
at-ta-a-                         51 x
a-na-ku-u₂-                      39 x
šu-nu-                           28 x
a-hu-um-                         22 x
a-bi-e-šu-uh-                    17 x
d⁼marduk-mu-ša-lim-              17 x
at-ti-                           15 x
lu₂-igi-sa₆-                     13 x
ṣi-li₂-d⁼utu-                    13 x
am-mi-ṣa-du-qa₂-                 12 x
sa-am-su-i-lu-na-                12 x
d⁼utu-na-ṣi-ir-                  11 x
d⁼iškur-ra-bi-                   10 x
d⁼marduk-na-ṣi-ir-               10 x
d⁼suen-i-din-nam-                10 x
at-tu-nu-                         9 x
ši-                               9 x
d⁼na-bi-um-na-ṣi-ir-              8 x
a-wi-il-dingir-                   7 x
d⁼na-bi-um-ma-lik-                7 x
e-tel-pi₄-d⁼marduk-               7 x
gi-mil-d⁼mar

The same exercise, now based on cuneiform Unicode:

In [7]:
introNounsU = collections.Counter()

for (line, um, ma1, word, ma2) in results:
    strippedWord = L.d(word, otype="sign")[:-1]
    introNounsU[T.text(strippedWord, fmt="text-orig-unicode")] += 1

len(introNounsU)

528

Fewer words. Presumably, some words that differ in their transliterated reading are identical in cuneiform Unicode.
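
A minimal sketch of how this can happen, in plain Python. The glyph mapping is read off the outputs in this notebook, where `bi` and `pi₂` are both rendered with the same glyph; the words `a-bi` and `a-pi₂` and the helper `to_glyphs` are made up for the illustration.

```python
import collections

# Glyphs as they appear in the outputs of this notebook;
# `bi` and `pi₂` are written with one and the same glyph.
BI = "𒁉"
GLYPH = {"a": "𒀀", "bi": BI, "pi₂": BI}


def to_glyphs(word: str) -> str:
    """Render a hyphen-separated transliteration as a glyph string."""
    return "".join(GLYPH[sign] for sign in word.split("-"))


# Made-up words, not taken from the corpus
words = ["a-bi", "a-pi₂", "a-bi"]

by_reading = collections.Counter(words)
by_glyph = collections.Counter(to_glyphs(w) for w in words)

# Two distinct readings collapse into a single glyph string
print(len(by_reading), len(by_glyph))
```

Whenever two readings share a glyph, counting by Unicode merges entries that counting by reading keeps apart, so the Unicode count can only be equal or lower.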

In [8]:
for (proper, amount) in sorted(
    introNounsU.items(),
    key=lambda x: (-x[1], x[0]),
)[0:10]:
    print(f"{proper:<30} {amount:>4} x")

𒄩𒄠𒈬𒊏𒁉                           126 x
𒋗                                86 x
𒋗𒌑                               86 x
𒀜𒋫                               62 x
𒀀𒈾𒆪                              61 x
𒀜𒋫𒀀                              51 x
𒀀𒈾𒆪𒌑                             39 x
𒋗𒉡                               28 x
𒀀𒄷𒌝                              22 x
𒀀𒁉𒂊𒋗𒄴                            17 x


But these glyphs are displayed in the wrong shapes: we need the Santakku font.

Instead of counting the word strings, we collect the word nodes, so that we can display the words themselves later:

In [9]:
# Map the Unicode form of each word to the set of word nodes that have that form
introNounsU = collections.defaultdict(set)

for (line, um, ma1, word, ma2) in results:
    introNounsU[F.symu.v(word)].add(word)

len(introNounsU)

528
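
The difference between counting strings and collecting nodes can be sketched with toy data. The node numbers and forms below are made up for the illustration; only the grouping and sorting idiom mirrors the cells in this notebook.

```python
import collections

# Hypothetical (node, form) pairs; the node numbers are invented
occurrences = [(101, "šu-ma"), (205, "at-ta-ma"), (317, "šu-ma"), (42, "šu-ma")]

# Group the nodes by form instead of merely counting the forms
nodes_by_form = collections.defaultdict(set)
for node, form in occurrences:
    nodes_by_form[form].add(node)

# Most frequent form first; ties are broken by the form itself
ranking = sorted(nodes_by_form.items(), key=lambda x: (-len(x[1]), x[0]))

for form, nodes in ranking:
    # the size of the set is the frequency; the lowest node is a representative
    print(form, len(nodes), min(nodes))
```

Keeping the nodes costs little and leaves the frequency available as the size of each set, while a representative node can still be passed to a display function.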

In [10]:
fmtr = "layout-orig-rich"
fmtu = "layout-orig-unicode"

html = []
html.append("<table>")

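# The ten most frequent forms; ties are broken by the Unicode string
for (proper, words) in sorted(
    introNounsU.items(),
    key=lambda x: (-len(x[1]), x[0]),
)[0:10]:
    # use the lowest-numbered word node as the representative of its group
    firstWord = sorted(words)[0]
    amount = len(words)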
    html.append(
        f"""
<tr>
  <td>{A.plain(firstWord, fmt=fmtr, withPassage=False, _asString=True)}</td>
  <td>{A.plain(firstWord, fmt=fmtu, withPassage=False, _asString=True)}</td>
  <td>{amount:>4}</td>
</tr>
"""
    )

html.append("</table>")

A.dh("".join(html))

transliteration,cuneiform,amount
ha-am-mu-ra-pi₂-ma,𒄩𒄠𒈬𒊏𒁉𒈠,126
šu-ma,𒋗𒈠,86
šu-u₂-ma,𒋗𒌑𒈠,86
at-ta-ma,𒀜𒋫𒈠,62
a-na-ku-ma,𒀀𒈾𒆪𒈠,61
at-ta-a-ma,𒀜𒋫𒀀𒈠,51
a-na-ku-u₂-ma,𒀀𒈾𒆪𒌑𒈠,39
šu-nu-ma,𒋗𒉡𒈠,28
a-hu-um-ma,𒀀𒄷𒌝𒈠,22
a-bi-e-šu-uh-ma,𒀀𒁉𒂊𒋗𒄴𒈠,17
