# Investigation into plene spelling in OB corpus

Plene spelling adds an extra vowel sign after a syllable, mostly to mark a long vowel, but it is used for other reasons as well.

In [52]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [53]:
import collections

from tf.app import use

In [54]:
A = use('oldbabylonian:local', checkout='local', hoist=globals())

Using TF-app in C:\Users\marti/text-fabric-data/annotation/app-oldbabylonian/code:
	rv0.2=#4bb2530bfb94dc93601f8b3df7722cb0e5df7a43 offline under ~/text-fabric-data (local release)
Using data in C:\Users\marti/text-fabric-data/Nino-cunei/oldbabylonian/tf/1.0.4:
	rv1.4=#43c36d148794e3feeb3dd39e105ce6a4df79c467 offline under ~/text-fabric-data (local release)


In this notebook we will be taking a look at initial plene writing, from here on IPW, as defined and discussed in Kouwenberg 2003. IPW occurs mainly in the present of the G and both present and preterite of the D of the primae-aleph verbs. 

The difference in spelling between IPW and 'standard' spelling is expressed in cuneiform orthography like this:

| plene | nonplene |
|------|------|
|  v-vC| v-Cv or vC- |
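
The `v-vC` pattern in the table can be checked on plain transliteration strings, without the corpus. The sketch below is only an illustration: the helper `is_initial_plene` is a hypothetical name, and it assumes ASCII transliteration with hyphen-separated signs, the four vowels `a e i u`, and an optional single-digit sign index (as in `u2`).

```python
import re

VOWELS = "aeiu"


def is_initial_plene(word: str) -> bool:
    """Return True if a hyphen-separated transliteration starts with v-vC.

    Hypothetical helper for illustration; it is not part of Text-Fabric.
    """
    signs = word.split("-")
    if len(signs) < 2:
        return False
    first, second = signs[0], signs[1]
    # First sign: a bare vowel, optionally followed by a sign index (e.g. u2).
    match = re.fullmatch(rf"([{VOWELS}])[0-9]?", first)
    if match is None:
        return False
    vowel = match.group(1)
    # Second sign: the same vowel plus exactly one consonant, optionally indexed.
    return re.fullmatch(rf"{vowel}[^{VOWELS}0-9][0-9]?", second) is not None


for w in ["a-al-la-kam", "i-il-la-ak", "u2-ul", "i-la-ak", "al-la-kam"]:
    print(f"{w:<15} {is_initial_plene(w)}")
```

The first three words start with a bare vowel sign that is repeated at the start of the next sign, so they count as plene; `i-la-ak` (`v-Cv`) and `al-la-kam` (`vC-`) correspond to the two nonplene shapes in the table.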

First let us find all occurrences of word-initial v-vC in our corpus:

In [119]:
query = '''
word
/with/
  =: sign reading=a
  <: sign reading~^a[^aeiu][2-4]?$
/or/
  =: sign reading=e
  <: sign reading~^e[^aeiu][2-4]?$
/or/
  =: sign reading=i
  <: sign reading~^i[^aeiu][2-4]?$
/or/
  =: sign reading~^u[2-4]?$
  <: sign reading~^u[^aeiu][2-4]?$
/-/
'''

In [120]:
results = A.search(query)


  0.54s 1122 results


Quite a few results, but while these are all occurrences of IPW, we do not need all of them.

As any Assyriologist will know, many of these occurrences with the initial structure `v-vC` will be `u2-ul`. This writing is an orthographic convention, probably with an underlying phonetic form /ul/ rather than /ūl/.

So let us run the query again, but this time we take out the word `u2-ul` to end up with a more manageable dataset:

In [121]:
query = '''
word
/with/
  =: sign reading=a
  <: sign reading~^a[^aeiu][2-4]?$
/or/
  =: sign reading=e
  <: sign reading~^e[^aeiu][2-4]?$
/or/
  =: sign reading=i
  <: sign reading~^i[^aeiu][2-4]?$
/or/
  =: sign reading~^u[2-4]?$
  <: sign reading~^u[^aeiu][2-4]?$
/-/
/without/
  =: sign reading=u2
  <: sign reading=ul
  :=
/-/
'''

In [122]:
results = A.search(query)

  0.49s 326 results


Great! That cut down the results considerably. Before we continue, let us also take a look at the number of distinct words rather than just the number of occurrences:

In [123]:
pleneWords1 = collections.Counter()

for (w,) in results:
    # Tally each distinct spelling of the matched word
    pleneWords1[T.text(w, fmt='text-orig-rich')] += 1

len(pleneWords1)

194

Far fewer words than occurrences, as expected. Let's take a look at the 20 most frequent words:

In [124]:
for (proper, amount) in sorted(
  pleneWords1.items(),
  key=lambda x: (-x[1], x[0]),
)[0:20]:
  print(f'{proper:<30} {amount:>4}x')

u₄-um                            24x
a-ah                             10x
a-al-la-kam                      10x
i-il-la-kam                       8x
a-ah-ka                           7x
a-ah-ka                           7x
a-an-nam                          6x
i-in-ka                           6x
a-an-na                           5x
a-al                              4x
e-em                              4x
i-il-la-ak                        4x
i-il-la-kam-ma                    4x
i-il-la-ku                        4x
i-ip-pu-šu                        4x
a-ad-da-a                         3x
a-al-la-ak                        3x
a-al-la-ka-ak-kum                 3x
a-ap-pa-al-ka                     3x
i-in-ki                           3x
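
The tally and ranking used above can be reproduced on a toy list, without the corpus. The sketch below uses a made-up list of occurrences (not corpus data) to show the difference between occurrences and distinct words, and how the sort key orders by descending count and then alphabetically.

```python
from collections import Counter

# Made-up occurrences for illustration only; these are not corpus counts.
tokens = [
    "a-al-la-kam",
    "i-il-la-ak",
    "a-al-la-kam",
    "a-ah",
    "i-il-la-ak",
    "a-al-la-kam",
]

# One entry per distinct word form, with its number of occurrences.
tally = Counter(tokens)

# Highest count first; ties are broken alphabetically by word form.
ranked = sorted(tally.items(), key=lambda item: (-item[1], item[0]))

print(f"{len(tokens)} occurrences, {len(tally)} distinct words")
for form, count in ranked:
    print(f"{form:<30} {count:>4}x")
```

`Counter.most_common()` would also sort by count, but it keeps equally frequent items in first-seen order, so the explicit sort key is what gives the alphabetical tie-break.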


Besides IPW in verbs, our results also show IPW in nouns; we will need to deal with these as well.