# Node structure

We check whether nodes of certain types are neatly stacked, meaning that
if a node `N` of that type is interrupted by another node `M` of the same type,
`M` ends strictly before `N` ends.

If this is the case for a node type, we say that the nodes of that type *stack*.
If it is not the case for a certain node `N`, we say that `N` does not stack.
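
To make the definition concrete, here is a minimal sketch on toy data, independent of Text-Fabric. The function name `non_stacking` and the representation of a node as a tuple of slots are illustrative assumptions. The sketch also assumes one particular reading of "interrupted": `M` interrupts `N` when `M` starts after `N` starts and occupies at least one slot in a gap of `N`.

```python
def non_stacking(nodes):
    """Return the names of the nodes that do not stack.

    `nodes` maps a node name to the sorted tuple of slots it occupies.
    Assumption: M interrupts N when M starts after N starts and
    occupies at least one slot in a gap of N.
    """
    result = set()
    for (n, n_slots) in nodes.items():
        (n_first, n_last) = (n_slots[0], n_slots[-1])
        # slots between the outer boundaries of N that N does not occupy
        gaps = set(range(n_first, n_last + 1)) - set(n_slots)
        for (m, m_slots) in nodes.items():
            # skip N itself and nodes that do not start after N
            if m == n or m_slots[0] <= n_first:
                continue
            # M sits in a gap of N but does not end strictly before N
            if gaps & set(m_slots) and m_slots[-1] >= n_last:
                result.add(n)
    return result


# two interleaved nodes: M interrupts N but ends after N
interleaved = {"N": (1, 3), "M": (2, 4)}
# M and P both lie in gaps of N and end before N ends
nested = {"N": (1, 3, 5, 7), "M": (2,), "P": (4, 6)}

print(non_stacking(interleaved))  # {'N'}
print(non_stacking(nested))       # set()
```

Under this reading, only the interrupted node `N` of the first arrangement fails to stack.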

We inspect the BHSA and see which of its node types do not stack.

In [9]:
import os
import collections

from tf.app import use

In [2]:
A = use('bhsa:clone', checkout='clone', hoist=globals())

In [3]:
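def getAcross(nodeType):
    """Report how many gapped nodes of nodeType do not stack."""
    starts = {}  # first slot -> nodes that start there
    ends = {}  # last slot -> nodes that end there
    gapped = []  # nodes whose slots are not consecutive

    for n in F.otype.s(nodeType):
        slots = E.oslots.s(n)
        (first, last) = (slots[0], slots[-1])
        starts.setdefault(first, set()).add(n)
        ends.setdefault(last, set()).add(n)
        # a node has a gap if it spans more slots than it occupies
        if last - first + 1 > len(slots):
            gapped.append(n)

    across = set()

    for n in gapped:
        slots = E.oslots.s(n)
        starters = set()
        enders = set()
        # walk the slots of n, collecting the other nodes that start or end there
        for s in slots:
            if s in starts:
                for m in starts[s]:
                    if m != n:
                        starters.add(m)
            if s in ends:
                for m in ends[s]:
                    if m != n:
                        enders.add(m)
            # some other node has started but has not ended yet
            if not starters <= enders:
                across.add(n)

    print(f"{nodeType:<20}: {len(gapped):>5} gapped nodes of which {len(across):>5} do not stack")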

In [4]:
for nodeType in F.otype.all:
    if nodeType == 'word':
        continue
    getAcross(nodeType)

book                :     0 gapped nodes of which     0 do not stack
chapter             :     0 gapped nodes of which     0 do not stack
lex                 :  6154 gapped nodes of which     0 do not stack
verse               :     0 gapped nodes of which     0 do not stack
half_verse          :     0 gapped nodes of which     0 do not stack
sentence            :   703 gapped nodes of which     0 do not stack
sentence_atom       :     0 gapped nodes of which     0 do not stack
clause              :  2449 gapped nodes of which     0 do not stack
clause_atom         :     0 gapped nodes of which     0 do not stack
phrase              :   670 gapped nodes of which     0 do not stack
phrase_atom         :     0 gapped nodes of which     0 do not stack
subphrase           :     0 gapped nodes of which     0 do not stack


# Outcome

In the BHSA, all node types *stack*.

## False conjecture (i)

We can characterize all outer and inner boundaries of nodes by specifying, for each slot, two boolean properties: start-of-node and end-of-node, which say whether that slot is the start of a node and whether it is the end of a node.

Why this is wrong: if we do not tell which node starts or ends, we have too little information.

Example:

`N1 M1 N2 M2`

These are two nodes `N` and `M`, each consisting of two parts, interleaved.
Consider the first word of `N2`, where `N2` has more than one word. No node starts there and no node ends there, so start-of-node and end-of-node are both `False`, and we have no way of seeing the alternation between `N` and `M` at this point.

Intuitively, we see that this example does not stack, and that might cause the trouble.
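
The failure can be checked on toy data. The sketch below is an illustration only: the function name `start_end_flags` is hypothetical, and it assumes that every part of `N` and `M` is two slots long, so that the first word of `N2` is not also its last word.

```python
def start_end_flags(nodes, max_slot):
    """For each slot, give (start-of-node, end-of-node) as booleans."""
    starts = {slots[0] for slots in nodes.values()}
    ends = {slots[-1] for slots in nodes.values()}
    return {s: (s in starts, s in ends) for s in range(1, max_slot + 1)}


# N1 M1 N2 M2, assuming every part is two slots long
nodes = {"N": (1, 2, 5, 6), "M": (3, 4, 7, 8)}
flags = start_end_flags(nodes, 8)

print(flags[5])  # first word of N2: (False, False)
```

Slot 5 is where `N` resumes, but the two flags cannot distinguish it from a slot in the middle of an uninterrupted node.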

## False conjecture (ii)

If a node type *stacks*, the previous conjecture holds true.

But there is a counterexample:

`N1 M N2 P1 N3 P2 N4`

Consider the first word of `N3`. No node starts or ends here, and yet the example is stacked: `M` and `P` both end before `N` ends.

# Method

In order to characterize inner and outer node boundaries, specify for each slot and each node that has a boundary at that slot:

* the node number of the node
* the kind of boundary: start/end/resume/suspend
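
As an illustration of this method, here is a minimal sketch on toy data, independent of Text-Fabric. The function name `slot_boundaries`, the spelled-out kind labels, and the assumption that every part is two slots long are illustrative choices, not part of the BHSA.

```python
import collections


def slot_boundaries(nodes):
    """Map each slot to the list of (kind, node) boundaries at that slot.

    Kinds: 'start' and 'end' for outer boundaries,
    'suspend' and 'resume' for the two sides of a gap.
    """
    boundaries = collections.defaultdict(list)
    for (n, slots) in nodes.items():
        boundaries[slots[0]].append(("start", n))
        boundaries[slots[-1]].append(("end", n))
        # a jump between consecutive slots of the same node marks a gap
        for (prev, cur) in zip(slots, slots[1:]):
            if cur != prev + 1:
                boundaries[prev].append(("suspend", n))
                boundaries[cur].append(("resume", n))
    return dict(boundaries)


# N1 M N2 P1 N3 P2 N4, assuming every part is two slots long
nodes = {
    "N": (1, 2, 5, 6, 9, 10, 13, 14),
    "M": (3, 4),
    "P": (7, 8, 11, 12),
}
bounds = slot_boundaries(nodes)

print(bounds[9])  # first word of N3: [('resume', 'N')]
```

With the node and the kind of boundary recorded, the first word of `N3` is no longer indistinguishable from an ordinary slot.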

In [12]:
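def writeBoundaries(outFile):
    # slot -> list of (kind, node), where kind is one of:
    # B = start, E = end, b = resume after a gap, e = suspend before a gap
    boundaries = collections.defaultdict(list)

    maxSlot = F.otype.maxSlot

    # a tuple, so that the order of boundaries per slot is the same on every run
    nodeTypes = ('clause', 'phrase')
    for nodeType in nodeTypes:
        for n in F.otype.s(nodeType):
            slots = E.oslots.s(n)
            (first, last) = (slots[0], slots[-1])
            boundaries[first].append(('B', n))
            boundaries[last].append(('E', n))
            for (i, s) in enumerate(slots):
                # the previous slot of n is not adjacent: n resumes here
                if i > 0 and slots[i - 1] != s - 1:
                    boundaries[s].append(('b', n))
                # the next slot of n is not adjacent: n is suspended here
                if i < len(slots) - 1 and slots[i + 1] != s + 1:
                    boundaries[s].append(('e', n))
    # one line per slot: the slot number, then its boundaries joined by "-"
    with open(outFile, 'w') as fh:
        for s in range(1, maxSlot + 1):
            bounds = '-'.join(f"{kind}{n}" for (kind, n) in boundaries[s])
            fh.write(f"{s},{bounds}\n")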

In [13]:
outFile = os.path.expanduser('~/Downloads/bhsa-boundaries.csv')
writeBoundaries(outFile)
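
For reading the file back, a line can be split into its slot number and its boundaries. The following is a minimal sketch: the function name `parse_boundary_line` is hypothetical and the node numbers in the example lines are made up for illustration, not taken from the BHSA.

```python
def parse_boundary_line(line):
    """Parse one line such as '5,B12-e34' into (5, [('B', 12), ('e', 34)])."""
    (slot, bounds) = line.rstrip("\n").split(",", 1)
    # each item is a one-letter kind (B, E, b, e) followed by a node number
    items = [(b[0], int(b[1:])) for b in bounds.split("-")] if bounds else []
    return (int(slot), items)


# made-up node numbers, for illustration only
print(parse_boundary_line("5,B12-e34\n"))  # (5, [('B', 12), ('e', 34)])
print(parse_boundary_line("6,\n"))         # (6, [])
```

A slot without any boundary yields an empty list, which matches the empty second field that `writeBoundaries` writes for such slots.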