<img align="right" src="images/tf.png" width="128"/>
<img align="right" src="images/ninologo.png" width="128"/>
<img align="right" src="images/dans.png" width="128"/>

Old Babylonian in the browser:

```
text-fabric oldbabylonian
```

In [1]:
import sys, os, collections
from tf.app import use

# Incantation

In [None]:
A = use('oldbabylonian', hoist=globals())
silentOff()

# Counting

In [None]:
indent(reset=True)
info('Counting nodes ...')

i = 0
for n in N(): i += 1

info('{} nodes'.format(i))

# Node types

In [None]:
F.otype.slotType

In [None]:
F.otype.maxSlot

In [None]:
F.otype.maxNode

In [None]:
F.otype.all

In [None]:
C.levels.data

In [None]:
for (typ, av, start, end) in C.levels.data:
  print(f'{end - start + 1:>7} {typ}s')

# Feature statistics

## repeats

In [None]:
F.repeat.freqList()

## type (cluster)

In [None]:
F.type.freqList('cluster')

## type (sign)

In [None]:
F.type.freqList('sign')

## flags

In [None]:
F.flags.freqList()

# Word matters

## Top 20 frequent words

In [None]:
for (w, amount) in F.sym.freqList('word')[0:20]:
  print(f'{amount:>5} {w}')

## Hapaxes

In [None]:
hapaxes1 = sorted(x for (x, amount) in F.sym.freqList('word') if amount == 1)
len(hapaxes1)

In [None]:
for w in [w for (w, amount) in F.sym.freqList('word') if amount == 1][0:20]:
  print(f'"{w}"')

### Small occurrence base

The occurrence base of a word is the set of documents in which it occurs.

In [None]:
occurrenceBase = collections.defaultdict(set)

for w in F.otype.s('word'):
  pNum = T.sectionFromNode(w)[0]
  occurrenceBase[F.sym.v(w)].add(pNum)
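
To see the idea in isolation, here is a minimal sketch on toy data that does not need the corpus to be loaded. The document names and word forms in `toy_docs` are made up for illustration; only the `defaultdict(set)` pattern mirrors the cell above.

```python
import collections

# Hypothetical toy corpus: document name -> word forms in reading order.
# These names and words are invented; they are not taken from the real data.
toy_docs = {
  'P001': ['a-na', 'um-ma', 'a-na'],
  'P002': ['um-ma', 'szum-ma'],
  'P003': ['a-na', 'um-ma'],
}

# Map every word form to the set of documents it occurs in.
toy_base = collections.defaultdict(set)
for (doc, words) in toy_docs.items():
  for w in words:
    toy_base[w].add(doc)

for w in sorted(toy_base):
  print(f'{w:<8} {sorted(toy_base[w])}')
```

Because the values are sets, a word that occurs several times in one document still contributes that document only once.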

An overview of how many words have an occurrence base of each size:

In [None]:
occurrenceSize = collections.Counter()

for (w, pNums) in occurrenceBase.items():
  occurrenceSize[len(pNums)] += 1
  
occurrenceSize = sorted(
  occurrenceSize.items(),
  key=lambda x: (-x[1], x[0]),
)

for (size, amount) in occurrenceSize[0:10]:
  print(f'tablets {size:>4} : {amount:>5} words')
print('...')
for (size, amount) in occurrenceSize[-10:]:
  print(f'tablets {size:>4} : {amount:>5} words')

Let's call a word *private* if its occurrence base is a single document.

In [None]:
privates = {w for (w, base) in occurrenceBase.items() if len(base) == 1}
len(privates)

### Peculiarity of tablets

As a final exercise with words, let's make a list of all documents, and show their

* total number of distinct words
* number of private words
* the percentage of private words: a measure of the peculiarity of the document
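
The three measures above can be sketched on toy data first, without the corpus. Everything in `toy_docs` is hypothetical, and the sketch counts distinct word forms per document, which is an assumption about how the measures are meant. Unlike the real listing, it also keeps documents that have no private words.

```python
import collections

# Hypothetical toy corpus: document name -> word forms (invented data).
toy_docs = {
  'P001': ['a-na', 'um-ma', 'be-li2'],
  'P002': ['um-ma', 'szum-ma', 'um-ma'],
  'P003': ['a-na', 'um-ma'],
}

# Occurrence base: word form -> set of documents containing it.
toy_base = collections.defaultdict(set)
for (doc, words) in toy_docs.items():
  for w in words:
    toy_base[w].add(doc)

# Private words occur in exactly one document.
toy_privates = {w for (w, base) in toy_base.items() if len(base) == 1}

toy_rows = []
for (doc, words) in toy_docs.items():
  distinct = set(words)
  own = distinct & toy_privates
  pct = 100 * len(own) / len(distinct)
  toy_rows.append((doc, len(distinct), len(own), pct))

# Most peculiar first, then larger documents, then by name.
toy_rows.sort(key=lambda e: (-e[3], -e[1], e[0]))

for row in toy_rows:
  print('{:<6} {:>4} {:>4} {:>5.1f}%'.format(*row))
```

The sort key is the same shape as the one used for the real document list: descending percentage, then descending size, then name.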

In [None]:
docList = []

empty = set()
ordinary = set()

for d in F.otype.s('document'):
  pNum = T.documentName(d)
  words = {F.sym.v(w) for w in L.d(d, otype='word')}
  a = len(words)
  if not a:
    empty.add(pNum)
    continue
  o = len({w for w in words if w in privates})
  if not o:
    ordinary.add(pNum)
    continue
  p = 100 * o / a
  docList.append((pNum, a, o, p))

docList = sorted(docList, key=lambda e: (-e[3], -e[1], e[0]))

print(f'Found {len(empty):>4} empty documents')
print(f'Found {len(ordinary):>4} ordinary documents (i.e. without private words)')

In [None]:
print('{:<20}{:>5}{:>5}{:>5}\n{}'.format(
    'document', '#all', '#own', '%own',
    '-'*35,
))

for x in docList[0:20]:
  print('{:<20} {:>4} {:>4} {:>4.1f}%'.format(*x))
print('...')
for x in docList[-20:]:
  print('{:<20} {:>4} {:>4} {:>4.1f}%'.format(*x))

---

next: [search](search.ipynb)

---

full posTag and pos notebooks on
[annotation/tutorials/oldbabylonian/cookbook](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/oldbabylonian/cookbook)

full tutorial on
[annotation/tutorials/oldbabylonian](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/oldbabylonian)