<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc" style="margin-top: 1em;"><ul class="toc-item"><li><span><a href="#Landscape" data-toc-modified-id="Landscape-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Landscape</a></span></li><li><span><a href="#Starting-points" data-toc-modified-id="Starting-points-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Starting points</a></span><ul class="toc-item"><li><span><a href="#Single-things" data-toc-modified-id="Single-things-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Single things</a></span></li><li><span><a href="#Sets" data-toc-modified-id="Sets-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Sets</a></span><ul class="toc-item"><li><span><a href="#Walk-all-nodes" data-toc-modified-id="Walk-all-nodes-2.2.1"><span class="toc-item-num">2.2.1&nbsp;&nbsp;</span>Walk all nodes</a></span></li></ul></li></ul></li><li><span><a href="#Navigation" data-toc-modified-id="Navigation-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Navigation</a></span></li></ul></div>

<img align="right" src="images/ninologo.png" width="200"/>
<img align="right" src="images/dans.png" width="100"/>
<img align="right" src="images/tf.png" width="100"/>

# Steps

The transliterations in the Uruk corpus are a kind of landscape.
In this notebook we take our first steps around.

## Landscape

The transcriptions of the tablets in their TF form are organized in a model of nodes, edges and features.

The things such as tablets, faces, columns, cases, and, at the most basic level, signs, are numbered.
The signs correspond to the numbers 1 ... 100,000+, in the same order as they occur in the corpus.
All other things are built from signs. They have higher numbers.

In TF, we call these numbers *nodes*. 
Like a barcode, this number gives access to a whole bunch of
information about the corresponding object.

For example, cases have a property (in TF we call it a *feature*) called `fullNumber`. 
It contains the hierarchical number found at the start of
the numbered lines in the transcription.

If the node (barcode) for a case is `n`, we can find its hierarchical number by saying

```
F.fullNumber.v(n)
```

In words, it reads as:

* `F`: I want to look up a `F`eature
* `fullNumber`: the name of the feature
* `.v`: I want the value of that feature
* `(n)`: for the given node `n`

Seen in this way, the data is like a gigantic spreadsheet of some 300,000 rows (the nodes),
and a few dozen columns (the features).

There is a bit more to it, since the nodes can be grouped together in ways we will see later on.
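
The spreadsheet picture can be made concrete with a toy model.
The sketch below is not Text-Fabric code: the classes `ToyFeature` and `ToyFeatures`, the name `F_toy`,
and all node numbers and values are invented for illustration.
Only the call shape `F.fullNumber.v(n)` is borrowed from the text above.

```python
# Toy model of "nodes are rows, features are columns" (an illustration, not the TF implementation).
class ToyFeature:
    """A single column: maps node numbers to values."""

    def __init__(self, data):
        self._data = dict(data)

    def v(self, n):
        """Return the value for node n, or None if node n has no value in this column."""
        return self._data.get(n)


class ToyFeatures:
    """Holds all columns; each feature is reachable as an attribute."""

    def __init__(self, **features):
        for name, data in features.items():
            setattr(self, name, ToyFeature(data))


# Hypothetical node numbers and values, invented for this example.
F_toy = ToyFeatures(
    otype={1: 'sign', 2: 'sign', 100: 'case', 101: 'case'},
    fullNumber={100: '1.a', 101: '1.b'},
)

print(F_toy.fullNumber.v(101))  # the fullNumber of (toy) case node 101
print(F_toy.fullNumber.v(1))    # signs carry no fullNumber in this toy data
```

Not every cell of the spreadsheet is filled: a feature only has values for the nodes it applies to.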

The complete reference information is in the
[Feature docs](https://github.com/Nino-cunei/uruk/blob/master/docs/transcription.md).

# Incantation

We start the notebook with the familiar incantation.

In [1]:
import sys, os
LOC = ('~/github', 'Nino-cunei/uruk', 'start')
sys.path.append(os.path.expanduser(f'{LOC[0]}/{LOC[1]}/programs'))
from cunei import Cunei
CN = Cunei(*LOC)
CN.api.makeAvailableIn(globals())

Found 2095 ideograph linearts
Found 2724 tablet linearts
Found 5495 tablet photos


**Documentation:** <a target="_blank" href="https://github.com/Nino-cunei/uruk/blob/master/docs/about.md" title="{provenance of this corpus}">Uruk IV-III (v1.0)</a> <a target="_blank" href="https://github.com/Nino-cunei/uruk/blob/master/docs/transcription.md" title="{source} feature documentation">Feature docs</a> <a target="_blank" href="https://github.com/Nino-cunei/uruk/blob/master/docs/cunei.md" title="cunei api documentation">Cunei API</a> <a target="_blank" href="https://github.com/Dans-labs/text-fabric/wiki/api" title="text-fabric-api">Text-Fabric API</a>


This notebook online:
<a target="_blank" href="http://nbviewer.jupyter.org/github/Nino-cunei/tutorials/blob/master/start.ipynb">NBViewer</a>
<a target="_blank" href="https://github.com/Nino-cunei/tutorials/blob/master/start.ipynb">GitHub</a>


## Starting points

We need a place to begin. That could be a single tablet, or case, or a set of signs.

### Single things

We start by looking up a tablet by its *P-number*.

In [3]:
pNum = 'P005381'
tablet = T.nodeFromSection((pNum,))
tablet

148166

**Explanation**

We have imposed a division into sections on the Uruk corpus.
There are three levels:
* tablets;
* columns;
* lines.

With `T` we get access to section functions.

If we identify a section, by specifying its tablet, column number, and line number,
`T` will give us back the node (barcode) of that section.

If we specify just a P-number, we get the node of the corresponding tablet.

If we specify a P-number and a column number, we get the node of the corresponding column.

If we, additionally, specify a line number, we get the node of the line.
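
The way a section tuple of increasing length leads to a tablet, column or line node can be sketched with a plain dictionary.
This is only an illustration of the idea, not how `T` works internally:
the table `SECTIONS`, the function `node_from_section` and all P-numbers and node numbers in it are made up.

```python
# Toy section index: tuples of 1, 2 or 3 components map to (invented) node numbers.
SECTIONS = {
    ('P000001',): 1000,                    # tablet
    ('P000001', 'obverse:1'): 1100,        # column on that tablet
    ('P000001', 'obverse:1', '1'): 1110,   # line in that column
    ('P000001', 'obverse:1', '2'): 1120,
}


def node_from_section(section):
    """Return the node for a (tablet[, column[, line]]) tuple, or None if there is no such section."""
    return SECTIONS.get(section)


print(node_from_section(('P000001',)))                     # tablet node
print(node_from_section(('P000001', 'obverse:1')))         # column node
print(node_from_section(('P000001', 'obverse:1', '2')))    # line node
print(node_from_section(('P999999',)))                     # unknown tablet
```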

**Warning**

The expression `(pNum, )` is the Python way of denoting a tuple with one element.
Without the awkward comma the brackets are just grouping brackets, not tuple brackets.
So if you say

```
tablet = T.nodeFromSection((pNum))
```

things go horribly wrong.
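
The difference is plain Python and can be checked without Text-Fabric.
The variable names `one_tuple` and `just_str` are only for this illustration.

```python
pNum = 'P005381'

one_tuple = (pNum,)  # trailing comma: a tuple with one element
just_str = (pNum)    # no comma: the brackets only group, so this is still a string

print(type(one_tuple).__name__, len(one_tuple))  # a tuple holding one item
print(type(just_str).__name__, len(just_str))    # a string of seven characters
print(one_tuple == just_str)                     # a tuple never equals a string
```

A function that expects a tuple of section components receives a bare string in the second case,
which is not what was intended.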

A node is just a number, which is not very informative to us humans, so here is the transcription of the tablet.

In [4]:
CN.getSource(tablet)

['&P005381 = MSVO 3, 70',
 '#atf: lang qpc ',
 '@obverse ',
 '@column 1 ',
 '1.a. 2(N14) , SZE~a SAL TUR3~a NUN~a ',
 '1.b. 3(N19) , |GISZ.TE| ',
 '2. 1(N14) , NAR NUN~a SIG7 ',
 '3. 2(N04)# , PIRIG~b1 SIG7 URI3~a NUN~a ',
 '@column 2 ',
 '1. 3(N04) , |GISZ.TE| GAR |SZU2.((HI+1(N57))+(HI+1(N57)))| GI4~a ',
 '2. , GU7 AZ SI4~f ',
 '@reverse ',
 '@column 1 ',
 '1. 3(N14) , SZE~a ',
 '2. 3(N19) 5(N04) , ',
 '3. , GU7 ',
 '@column 2 ',
 '1. , AZ SI4~f ']

And, to be even more hands-on, we show the lineart:

In [6]:
CN.lineart(tablet, width=200)

Now let's check out the columns and lines.

In [5]:
column = T.nodeFromSection((pNum, 'obverse:1'))
CN.getSource(column)

['@column 1 ',
 '1.a. 2(N14) , SZE~a SAL TUR3~a NUN~a ',
 '1.b. 3(N19) , |GISZ.TE| ',
 '2. 1(N14) , NAR NUN~a SIG7 ',
 '3. 2(N04)# , PIRIG~b1 SIG7 URI3~a NUN~a ']

Note that you have to include the *face* name in the column number!

Now lines:

In [6]:
line = T.nodeFromSection((pNum, 'obverse:1', '1'))
print('\n'.join(CN.getSource(line, lineNumbers=True)))

85116: 1.a. 2(N14) , SZE~a SAL TUR3~a NUN~a 
85117: 1.b. 3(N19) , |GISZ.TE| 


Note that we have printed the lines not as a Python list, but as a string,
where we have joined the lines in the list with newlines.

We also wanted to see the line numbers in the source files.
These source files are also in the data repo, e.g.
[uruk-iii](https://github.com/Nino-cunei/uruk/blob/master/sources/cdli/transcriptions/1.0/uruk-iii.txt).

We want to go one step further. We want to get the node corresponding to
individual lines in the transliterations.
These correspond to cases which are themselves not divided into cases
(*terminal* cases).

Text-Fabric itself only knows three section levels, so we cannot use `T.nodeFromSection()`
for this.
Text-Fabric is a generic package, which has been used for various other
corpora, such as the Hebrew Bible. It does not know anything of (proto)cuneiform data.

But on top of Text-Fabric we are using a bunch of dedicated cuneiform functions, and
one of them mimics `T.nodeFromSection`:

In [7]:
case = CN.nodeFromCase((pNum, 'obverse:1', '1.b'))
print('\n'.join(CN.getSource(case, lineNumbers=True)))

85117 1.b. 3(N19) , |GISZ.TE| 


### Sets

Many times we want to start with whole sets.
For example all composite signs, also known as *quads*:

In [8]:
quads = F.otype.s('quad')
len(quads)

3794

This reads as:

* `F` give me the features
* `otype` I want the feature that gives the type of nodes
* `s('quad')` I want the nodes whose `otype` value is `'quad'`
  i.e. the nodes that *support* `otype`-value `'quad'`
  
As we see, there are nearly 4000 of them.

Later, we'll see where they are.
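
The idea behind `.s()` is an inverse lookup: instead of going from a node to its value, we go from a value to all nodes that have it.
Here is a minimal sketch of that idea in plain Python.
The data, the dictionary `support` and the function `s` are invented for this example and say nothing about how Text-Fabric stores its features.

```python
from collections import defaultdict

# Hypothetical otype values for a handful of made-up nodes.
otype_values = {1: 'sign', 2: 'sign', 3: 'sign', 50: 'quad', 51: 'quad', 90: 'case'}

# Build the inverse index: value -> nodes that carry that value.
collected = defaultdict(list)
for node, value in otype_values.items():
    collected[value].append(node)
support = {value: tuple(sorted(nodes)) for value, nodes in collected.items()}


def s(value):
    """Return the nodes that support the given value (an empty tuple if there are none)."""
    return support.get(value, ())


quads_toy = s('quad')
print(quads_toy, len(quads_toy))
print(s('tablet'))  # no node has this value in the toy data
```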

In [9]:
primes = F.prime.s(2)
len(primes)

1

In the same manner, we want to see all things with a double prime.
There is only one.
We pick up a bit of additional information, but later we'll see where it is.

In [10]:
for n in primes:
    print(n, F.otype.v(n))

56360 sign


It is the sign with node (barcode) 56360.

#### Walk all nodes

If we want to go over all nodes, in a sensible order, we do it like this:

In [11]:
count = 0

for n in N():
    count += 1

count

304199

Here we show the first 20 nodes with their type:

In [12]:
limit = 20
for (i, n) in enumerate(N()):
    if i >= limit: break
    print(f'{n:>6} {F.otype.v(n)}')

143889 tablet
159709 comment
     1 sign
150253 face
170799 column
217575 line
253417 case
184822 cluster
     2 sign
     3 sign
     4 sign
170800 column
217576 line
253418 case
     5 sign
     6 sign
     7 sign
184823 cluster
     8 sign
143890 tablet


As you see, the order is not the numerical order of the nodes.
Things that start earlier in the corpus come first, and if several things start at the same
position, the bigger things come first.
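
The ordering rule just described can be sketched as a sort key.
This is a simplification under stated assumptions: every toy node covers a set of sign positions,
and ties beyond "same start, same size" are not considered, whereas the real walk order of Text-Fabric may involve more detail.
The names `nodes`, `order_key` and `walk` and all numbers are made up.

```python
# node -> (type, set of sign positions covered); all numbers are invented.
nodes = {
    1: ('sign', {1}),
    2: ('sign', {2}),
    3: ('sign', {3}),
    4: ('sign', {4}),
    10: ('tablet', {1, 2, 3, 4}),
    20: ('case', {1, 2}),
    21: ('case', {3, 4}),
}


def order_key(n):
    """Earlier start first; at the same start, the node covering more signs first."""
    slots = nodes[n][1]
    return (min(slots), -len(slots))


walk = sorted(nodes, key=order_key)
for n in walk:
    print(f'{n:>3} {nodes[n][0]}')
```

In this toy corpus the tablet comes before its first case, which comes before its first sign, even though the sign has the lowest node number.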

## Navigation

After our starting points, we would like to visit the neighbourhood.
We want to go from nodes to the ones in which they lie embedded, and back.
We want to go to the next node on the same level and back.

We do that with `L.` functions.

* `L.d()` goes "down": from embedder to embeddee;
* `L.u()` goes "up": from embeddee to embedder;
* `L.p()` goes "previous": to the first left sibling;
* `L.n()` goes "next": to the first right sibling.
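
To get a feeling for these four directions, here is a toy version in plain Python.
It assumes that every node covers a contiguous range of sign positions.
The functions `up`, `down`, `prev_sibling` and `next_sibling`, the table `NODES` and all numbers are hypothetical;
the real `L.` functions may behave differently in details such as the order of their results.

```python
# node -> (type, (first sign position, last sign position)); all numbers are invented.
NODES = {
    1: ('sign', (1, 1)),
    2: ('sign', (2, 2)),
    3: ('sign', (3, 3)),
    4: ('sign', (4, 4)),
    10: ('tablet', (1, 4)),
    20: ('case', (1, 2)),
    21: ('case', (3, 4)),
}


def _embeds(outer, inner):
    """True if the span of outer contains the span of inner (and they are different nodes)."""
    (o_start, o_end), (i_start, i_end) = NODES[outer][1], NODES[inner][1]
    return outer != inner and o_start <= i_start and i_end <= o_end


def up(n, otype=None):
    """All nodes that embed n, optionally restricted to one type."""
    return [m for m in sorted(NODES)
            if _embeds(m, n) and (otype is None or NODES[m][0] == otype)]


def down(n, otype=None):
    """All nodes embedded in n, optionally restricted to one type."""
    return [m for m in sorted(NODES)
            if _embeds(n, m) and (otype is None or NODES[m][0] == otype)]


def next_sibling(n):
    """Nodes of the same type that start right after n ends."""
    kind, (_, end) = NODES[n]
    return [m for m in sorted(NODES) if NODES[m][0] == kind and NODES[m][1][0] == end + 1]


def prev_sibling(n):
    """Nodes of the same type that end right before n starts."""
    kind, (start, _) = NODES[n]
    return [m for m in sorted(NODES) if NODES[m][0] == kind and NODES[m][1][1] == start - 1]


print(up(1))                # everything that embeds sign 1
print(up(1, otype='case'))  # only the embedding case
print(down(10, otype='sign'))
print(next_sibling(20), prev_sibling(20))
```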

Above, we collected some "interesting" nodes, but we did not yet have a way to find out where
they were.

Now we have.

Remember the double prime?

In [13]:
caseDouble = L.u(primes[0], otype='case')[0]
print('\n'.join(CN.getSource(caseDouble, lineNumbers=True)))

51246 3.b. 3(N41) 1(N24'')# , [TAR~a] 


So we can go to the source, to the exact line number!

We can also show the whole tablet:

In [14]:
tabletDouble = L.u(primes[0], otype='tablet')[0]
print('\n'.join(CN.getSource(tabletDouble, lineNumbers=True)))

51220 &P411604 = CUSAS 21, 074
51221: #atf: lang qpc 
51223: @obverse 
51224: @column 1 
51225: 1. 1(N01) , AN NUNUZ~a1 ZATU788# 
51226: 2. 1(N01) 1(N39~a) MUD NA~a# BU~a 
51227: 3. 1(N01) 1(N39~a) U4# KU6~a A 
51228: 4. 1(N01) 1(N39~a) AB~a SZE~a 
51229: 5. 3(N01) , BAHAR2~b 
51230: 6. 2(N01) [...] 
51231: 7. 2(N01) [...] MUD# [NA~a] BU~a# 
51232: @column 2 
51233: 1.a. 2(N01) 1(N57) , NIN 
51234: 1.b. 1(N39~a) TAR~a 
51235: 2.a. 4(N01) , ZATU694 KU6~a KISAL~b1 
51236: 2.b. 2(N39~a) TAR~a 
51237: 3. 1(N39~a) AB~a SZUBUR BULUG3 SZE~a 
51238: 4. , GIBIL GU7 
51239: 5. 2(N04) 1(N41) , U4 
51240: 6. 1(N19) , NAM~a NA2~a 
51241: @column 3 
51242: 1.a. 3(N01) , 3(N57) [...] 
51243: 1.b. 2(N39~a) 1(N24) , [TAR~a] 
51244: 2. 1(N19) , AB~a 
51245: 3.a. 1(N19) 1(N04) , AN NUNUZ~a1# ZATU788 X 
51246: 3.b. 3(N41) 1(N24'')# , [TAR~a] 
51247: 4. , GU7 
51248: @reverse 
51249: $ blank space 


The `L.u()` function takes a node as starting point and looks up all nodes that embed it.
You can restrict those to nodes of a certain type, as we did by `otype='case'`.
It yields a list of nodes, so if you want a single embedder, you have to select one,
as we did by `[0]`.
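
Selecting with `[0]` assumes that there is at least one embedder of the requested type.
In plain Python, indexing an empty sequence raises an `IndexError`, so a small guard can be convenient.
The helper `first_or_none` below is not part of Text-Fabric or of the cuneiform functions; it is a hypothetical convenience, shown with made-up node numbers.

```python
def first_or_none(nodes):
    """Return the first node of a sequence, or None if the sequence is empty."""
    return nodes[0] if nodes else None


embedders = [900, 950]  # invented result of an "up" lookup
no_embedders = []       # what a lookup without results might look like

print(first_or_none(embedders))
print(first_or_none(no_embedders))
```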

Let us do the same with the first 10 *quads* (composite signs).

For each such quad we assemble the following pieces of information:

* the P-number of the tablet
* the transcription line number
* a representation of the quad
* the list of signs of which the quad is composed.

In [15]:
for q in quads[0:10]:
    t = L.u(q, otype='tablet')[0]
    c = L.u(q, otype='case')[0]
    lineNum = F.srcLnNum.v(c)
    caseNum = CN.caseFromNode(c)[2]

    pNum = T.sectionFromNode(t)[0]
    qRep = CN.atfFromQuad(q)
    signs = L.d(q, otype='sign')
    signReps = ' , '.join([CN.atfFromSign(s) for s in signs])
    print(f'{lineNum:>5} {pNum} {caseNum:<5} {qRep:<15} with {signReps}')

   27 P006428 5     |DUG~bx1(N57)|  with DUG~b , 1(N57)
   66 P448702 1'    |U4x1(N01)|     with U4 , 1(N01)
   80 P448703 1     |U4.1(N08)|     with U4 , 1(N08)
   81 P448703 2     |U4.1(N08)|     with U4 , 1(N08)
   82 P448703 3     |U4.1(N08)|     with U4 , 1(N08)
   82 P448703 3     |GI&GI|         with GI , GI
   83 P448703 4     |U4.1(N08)|     with U4 , 1(N08)
   84 P448703 5     |U4.1(N08)|     with U4 , 1(N08)
  142 P482083 2a'   |U4x3(N01)|     with U4 , 3(N01)
  161 P499393 2     |LAGAB~bxX|     with LAGAB~b , X


Admittedly, this was a bit advanced. We used things we haven't explained yet.

* there is also a `CN.caseFromNode()`: it gives you section headings
  if you give it a node (the exact opposite of `CN.nodeFromCase()`).
* likewise, `T.sectionFromNode()` is opposite to `T.nodeFromSection()`.
* we have functions to generate ATF transliterations for nodes, especially for
  quads and signs: 
  * `CN.atfFromQuad(n)` gives you the transliteration of the
    *quad* identified by node (barcode) `n`;
  * `CN.atfFromSign(n)` likewise for *sign*s.

With our mastery of starting points and navigation,
we really do not have to see the actual node numbers (barcodes) anymore.

We'll see less and less of them, but they are the invisible glue that
holds the whole corpus together.

# See also

[jumps](jumps.ipynb)

Because there are more ways to travel ...

# Next

[signs](signs.ipynb)

*Back to the basics ...*

All chapters:
[start](start.ipynb)
[imagery](imagery.ipynb)
[steps](steps.ipynb)
[signs](signs.ipynb)
[quads](quads.ipynb)
[jumps](jumps.ipynb)