# Composing Time Constructions

The current method for isolating phrase heads ([here](https://nbviewer.jupyter.org/github/ETCBC/heads/blob/master/phrase_heads.ipynb)) requires strenuous and inelegant processing of BHSA subphrase relations. The subphrases are not always consistently encoded and suffer from numerous exceptional cases. As a result, the method is rather convoluted.

This notebook will explore the possibility of disconnecting semantic head analysis from the ETCBC subphrase encoding. 

A "semantic" head is the primary content word of a phrase, following Croft's "Primary Information Bearing Unit":

> **The noun and the verb are the PRIMARY INFORMATION-BEARING UNITS (PIBUs) of the phrase and clause respectively. In common parlance, they are the content words. PIBUs have major informational content that functional elements such as articles and [auxiliaries] do not have. (Croft, *Radical Construction Grammar*, 2001, 258; see also Shead, *Radical Frame Semantics and Biblical Hebrew*, 104)**

> **A (semantic) head is the profile equivalent that is the primary information-bearing unit, that is, the most contentful item that most closely profiles the same kind of thing that the whole constituent profiles. (ibid., 259)**

Croft also provides an additional criterion to "profile equivalence":

> **If the criterion of profile equivalence produces two candidates for headhood, the less schematic meaning is the PIBU; that is, the PIBU is the one with the narrower extension, in the formal semantic sense of that term (ibid., 259)**

## Inquiry

Can we isolate semantic phrase heads in BHSA using only the phrase_atom and phrase limits? Asking this question means taking the phrase_atom/phrase boundaries for granted, even though the validity of BHSA phrase boundaries still needs to be tested empirically. For now, the exercise of isolating semantic phrase heads can be seen as a first step towards reproducible phrase boundaries.

## Construction-Specific Heads and Roles

A semantic head is the central idea of the phrase and is construction-dependent. In Biblical Hebrew, it could be said that the majority of semantic heads are those words which do not stand in a "genitive" or appositional relationship to another word. But this is not always the case! For instance, in the case of the quantifier כל, the head of the quantified phrase is usually in the genitive position ("all of"). And there are other cases as well. Thus, one of the efforts in this project is to define headship on a construction-by-construction basis. A head is modeled as a semantic role in the noun-phrase. It is the central idea, which is somehow modified or specified by the words and phrases which surround it.

In [1]:
import sys
import collections
import pickle
import random
import re
import copy
import numpy as np
import networkx as nx
from datetime import datetime
import matplotlib.pyplot as plt
from Levenshtein import distance as lev_dist
from pprint import pprint

# local packages
from textfabric.load import load_tf
from locations import semvector

# load semantic vectors
with open(semvector, 'rb') as infile: 
    semdist = pickle.load(infile)

# load and configure Text-Fabric
TF, api, A = load_tf()
F, E, T, L = api.F, api.E, api.T, api.L
A.displaySetup(condenseType='phrase', withNodes=True, extraFeatures='st')

This is Text-Fabric 7.8.12
Api reference : https://annotation.github.io/text-fabric/Api/Fabric/

119 features found and 6 ignored
  0.00s loading features ...
   |     0.00s No structure info in otext, the structure part of the T-API cannot be used
  5.40s All features loaded/computed - for details use loadLog()


# Machinery

We could use some machinery to do the hard work of looking in and around a node. In the older approach we used TF search templates. But these are not very efficient at scale, and they are always bound by the limits of the query language. I take another approach here: a set of classes that specify locations and directions within a specified context.

In [2]:
from positions import Positions, PositionsTF, Walker, Dummy

## `Positions(TF)`

The `Positions` class enables concise access to adjacent nodes within a given context. This allows us to write algorithms with query-like efficiency and with all of the power of Python.

This class is instantiated on a word node and can provide contextual look-up data for a given word. For example, given a phrase containing the following word nodes:

> (189681, 189682, **189683**, 189684, 189685, 189686) <br>

representing the following phrase (space separated for clarity):

> ב שׁנת **שׁלשׁים** ו שׁמנה שׁנה

Taking the bolded node, `189683`, as our `source` word, we instantiate the class with the node, the string `'phrase_atom'` (the context we want to search within), and an instance of Text-Fabric (`tf`):

In [3]:
      #    source node    context  TF instance  
      #         |            |       |
P = PositionsTF(189683, 'phrase_atom', A).get

If we want to obtain the word adjacent one space forward, we simply ask `P` for `1`, which gives us the next word in the phrase.

In [4]:
P(1)

189684

If we try to ask for 4 words forward, we go beyond the bounds of the phrase. But `P` handles this by returning nothing:

In [5]:
P(4)

To look back one word, we simply give a negative value:

In [6]:
P(-1)

189682

Finally, `P` can be used to quickly call features on these words. For instance, in order to get the lexeme of the word two words ahead of `189683`:

In [7]:
P(2,'lex')

'CMNH/'

And if we want to get a number of features, we can just add other features to the arguments. The result is a feature set:

In [8]:
P(2, 'lex', 'nu')

{'CMNH/', 'sg'}

`P` can also handle features on the source node itself by giving a positionality of `0`:

In [9]:
P(0, 'lex')

'CLC/'

### `Positions` also exists in a non-TF version

When the non-tf version of `Positions` is provided any iterable, it can perform the same functions.

In [10]:
test_ps = ['The', 'good', 'dog', 'jumped.']

P = Positions('good', test_ps).get

In [11]:
P(1)

'dog'

Positions can perform a function on the result with an option `do`. In the example below, the word two words ahead is found and an upper-case function is called on the string.

In [12]:
P(2, do=lambda w: w.upper())

'JUMPED.'

The non-tf version of `Positions` makes it possible to do positionality searches with any ordered list of Python objects that represent linguistic units.
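To make the idea concrete, here is a minimal sketch of how an offset-based lookup over a plain list might work. `SimplePositions` is a hypothetical stand-in written for illustration; it is not the `Positions` class imported above, and it assumes the source item occurs only once in the context. Feature lookups such as `P(2, 'lex')` are left out.

```python
class SimplePositions:
    """Hypothetical stand-in for a positions lookup over any ordered sequence."""

    def __init__(self, source, context):
        self.context = list(context)
        self.index = self.context.index(source)  # position of the source item

    def get(self, offset, do=None):
        """Return the item `offset` steps from the source, or None if out of bounds."""
        target = self.index + offset
        if not 0 <= target < len(self.context):
            return None
        item = self.context[target]
        return do(item) if do else item


P_sketch = SimplePositions('good', ['The', 'good', 'dog', 'jumped.']).get
print(P_sketch(1))                          # dog
print(P_sketch(-1))                         # The
print(P_sketch(2, do=lambda w: w.upper()))  # JUMPED.
print(P_sketch(3))                          # None
```

The explicit bounds check matters: without it, a negative target index would silently wrap around to the end of the list instead of returning nothing.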

## `Walker`

`Walker` performs a similar function to `Positions`, except it is agnostic about exact positions, walking either `ahead` or `back` from the source to a target node in the context. A function must be supplied that returns `True` on the target node.

We instantiate the `Walker` using the same source and context as above.

In [13]:
source = 189683
# get words inside source's phrase_atom
positions = L.d(
    L.u(189683,'phrase_atom')[0], 'word'
)

Wk = Walker(source, positions)

`Walker` is demonstrated below with the same word. A simple `lambda` function is used to test the lexical set (`ls`) feature. In the example below, we find the first word ahead of `189683` that is a cardinal number:

In [14]:
Wk.ahead(lambda w: F.ls.v(w) == 'card')

189685

If no valid match is found, `None` is returned:

In [15]:
Wk.ahead(lambda w: F.ls.v(w) == 'BOOGABOOGA')

Another example wherein we walk backwards to the preposition:

In [16]:
Wk.back(lambda w: F.sp.v(w) == 'prep')

189681

We can also specify that the walk should be interrupted under certain conditions with a `stop` function. In this case we walk forward to the next cardinal number, but the walk is interrupted when the `stop` function detects a conjunction.

In [17]:
Wk.ahead(lambda w: F.ls.v(w) == 'card',
         stop=lambda w: F.sp.v(w) == 'conj')

We can also specify the opposite with a `go` function argument, which defines the nodes that are allowed to intervene between `source` and `target`. Below we specify that *only* a conjunction should intervene.

In [18]:
Wk.ahead(lambda w: F.ls.v(w) == 'card',
         go=lambda w: F.sp.v(w) == 'conj')

189685

The `go` and `stop` functions can be as permissive or strict as desired.

Finally, we can tell `Walker` that the output of the validation function should be returned instead of the node itself with the optional argument `output=True`:

In [19]:
val_funct = lambda w: F.ls.v(w) if F.ls.v(w)=='card' else None

Wk.ahead(val_funct, output=True)

'card'

This ability is useful for certain tests.

Like `Positions`, `Walker` can be used in non-TF contexts:

In [20]:
test_ps = ['The', 'bad', 'cat', 'swatted.']

Wk_notf = Walker('bad', test_ps)

In [21]:
Wk_notf.ahead(lambda w: w.startswith('sw'))

'swatted.'

### Returning All Results along Path

`Walker` can also return all results along the path by toggling `every=True`:

In [22]:
Wk_notf.ahead(lambda w: type(w)==str, every=True)

['cat', 'swatted.']
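The walking behaviour described in this section can be sketched over a plain list. `SimpleWalker` below is a hypothetical illustration, not the imported `Walker`. It assumes that the target test runs first on each item, and that a non-target item ends the walk when `stop` fires or when `go` is given and rejects it. The English glosses stand in for the words of the phrase used above.

```python
class SimpleWalker:
    """Hypothetical walker over an ordered sequence (not the notebook's Walker)."""

    def __init__(self, source, context):
        self.context = list(context)
        self.index = self.context.index(source)  # position of the source item

    def ahead(self, val, **kwargs):
        return self._walk(self.context[self.index + 1:], val, **kwargs)

    def back(self, val, **kwargs):
        return self._walk(reversed(self.context[:self.index]), val, **kwargs)

    def _walk(self, path, val, stop=None, go=None, output=False, every=False):
        found = []
        for item in path:
            result = val(item)
            if result:
                hit = result if output else item  # output=True keeps val's result
                if not every:
                    return hit
                found.append(hit)
            elif (stop and stop(item)) or (go and not go(item)):
                break  # walk interrupted by a non-target item
        return found if every else None


words = ['in', 'year', 'thirty', 'and', 'eight', 'years']
Wk_sketch = SimpleWalker('thirty', words)
is_num = lambda w: w in {'thirty', 'eight'}
is_conj = lambda w: w == 'and'

print(Wk_sketch.ahead(is_num))                # eight
print(Wk_sketch.ahead(is_num, stop=is_conj))  # None
print(Wk_sketch.ahead(is_num, go=is_conj))    # eight
print(Wk_sketch.back(lambda w: w == 'in'))    # in
```

Treating `stop` and `go` as two views of the same check (what may not intervene versus what may) keeps the loop to a single branch.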

## `Dummy`

When writing conditions and logic, we want an object that passively receives `NoneType`s or zero `int`s without throwing errors. Such an object should also return `None` to reflect its `False` value. `Dummy` provides such functionality. `Dummy` can receive the same arguments, kwargs, and function calls as a `Positions` or `Walker` object. But it returns absolutely nothing. Ouch.

In [23]:
D = Dummy(None, 'phrase_atom', A)

The function call below returns `None`:

In [24]:
D.get(1)

As does this:

In [25]:
D.get(1, 'lex')

And even this:

In [26]:
D.ahead(1)

`D` is essentially a soulless void that consumes whatever you throw at it and gives nothing in return.

For safe calls on a `Positions` or `Walker` object, build it through a function that returns a `Dummy` for null nodes:

In [27]:
def getPos(node, context, tf):
    """A function to get Positions safely."""
    if node:
        return PositionsTF(node, context, tf)
    else:
        return Dummy() # <- give dummy on empty node

So:

In [28]:
P = getPos(None, 'phrase_atom', A)
P.get(1)

Or:

In [29]:
P = getPos(1, 'phrase_atom', A)
P.get(1)

2
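The behaviour shown for `Dummy` is a form of the null-object pattern. The sketch below shows one way such an object could be written. `NullObject` is a hypothetical name, and the actual `Dummy` implementation may differ.

```python
class NullObject:
    """Hypothetical null object: accepts anything, returns None, and is falsy."""

    def __init__(self, *args, **kwargs):
        pass  # arguments are accepted and ignored

    def __bool__(self):
        return False  # lets `if obj:` checks treat it like None

    def __getattr__(self, name):
        # only called for attributes that are not defined, e.g. get/ahead/back
        def swallow(*args, **kwargs):
            return None
        return swallow


N = NullObject(None, 'phrase_atom')
print(N.get(1, 'lex'))  # None
print(N.ahead(1))       # None
print(bool(N))          # False
```

Because every method call is swallowed, chained conditions can be written without first checking whether a neighbouring node exists.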

# Need for Semantic Data

The accurate processing of word connections depends on fuller semantic data than BHSA provides. Future semantic data could be stored in a similar way to word sets (`wsets`). 

For example, in the two phrases

> (Exod 25:39) ככר זהב טהור <br>
> (2 Sam 24:24) בכסף שקלים חמשׁים

we see that זהב and כסף, despite standing in two different positions next to two different words, both indicate a kind of "composed of" semantic concept: "round gold" (i.e. round composed of gold) and "silver shekels" (shekels composed of silver). To process these kinds of links, we need a list of nouns that often function as "material." But this is only the beginning. Many other words will have specific semantic values that motivate their syntactic behavior. Such a scope lies outside the bounds of this author's current project on Hebrew time phrases.
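As a rough illustration of what such semantic data could look like, the sketch below stores lexemes in named sets and looks up the classes a lexeme belongs to. The `semsets` dictionary and the `semantic_classes` function are hypothetical and do not exist in BHSA or in this project. The lexemes are given in ETCBC transliteration.

```python
# Hypothetical semantic word sets, keyed by class name.
# ZHB/ = זהב (gold), KSP/ = כסף (silver)
semsets = {
    'material': {'ZHB/', 'KSP/'},
}

def semantic_classes(lex, semsets=semsets):
    """Return the names of all semantic sets that contain a lexeme."""
    return {name for name, lexemes in semsets.items() if lex in lexemes}

print(semantic_classes('ZHB/'))  # {'material'}
print(semantic_classes('KKR/'))  # set()
```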

## A Compromise: Time Phrases

Since constructing these semantic classes is very time-consuming, I want to start with a smaller set of cases. For now I will focus on parsing connections within time phrases, which are the subject of my ongoing PhD project.

In [30]:
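def disjoint(ph):
    """Return True if the phrase's words are not contiguous (i.e. it has gaps)."""
    words = L.d(ph, 'word')
    # word nodes are consecutive integers, so a gap shows up
    # as a difference greater than 1 between neighbours
    for this_w, next_w in zip(words, words[1:]):
        if next_w - this_w > 1:
            return True
    return False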

In [31]:
alltimes = [
    ph for ph in F.otype.s('timephrase') 
]
    
timephrases = [ph for ph in alltimes if not disjoint(ph)]

print(f'{len(timephrases)} phrases ready')

3864 phrases ready


## Search & Display Functions

The functions below allow for fast searching and displaying of queries using a `Construction` object, described in the next section.

In [32]:
from cx_analysis.search import SearchCX

In [33]:
cx_show = SearchCX(A)
pretty, prettyconds, showcx, search = (
    cx_show.pretty, cx_show.prettyconds, 
    cx_show.showcx, cx_show.search
)

## Construction Classes

* `Construction` - an object that represents a linguistic construction; the class records roles and the words that occupy them, and has methods for accessing and retrieving data on embedded roles and other constructions
* `CXbuilder` - matches conditions to build `Construction` objects; populates them with requisite data
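The builder classes in this notebook describe each candidate construction as a dictionary with a `conds` mapping from a readable label to a boolean. How the builder evaluates these is not shown here. The sketch below assumes that the first specification whose conditions all hold is the one that matches; `first_match` and `failed_conds` are hypothetical helpers.

```python
def first_match(*specs):
    """Return the first spec whose conditions are all True, else an empty dict."""
    for spec in specs:
        if all(spec['conds'].values()):
            return spec
    return {}

def failed_conds(spec):
    """List the labels of conditions that did not hold (useful for debugging)."""
    return [label for label, ok in spec['conds'].items() if not ok]


sp, has_next = 'art', True  # toy values for a single word
prep_spec = {'name': 'prep_ph',
             'conds': {'sp == prep': sp == 'prep', 'has next word': has_next}}
defi_spec = {'name': 'defi_ph',
             'conds': {'sp == art': sp == 'art', 'has next word': has_next}}

match = first_match(prep_spec, defi_spec)
print(match['name'])            # defi_ph
print(failed_conds(prep_spec))  # ['sp == prep']
```

Keeping the label next to each boolean means a failed match can report which condition was responsible.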

In [34]:
from cx_analysis.cx import Construction
from cx_analysis.build import CXbuilder, CXbuilderTF

## Word Constructions

The `wordConstructions` builder class recognizes word semantic classes and types based on provided criteria.

In [35]:
from cx_analysis.grammar import wordConstructions

## Subphrase Constructions

The `SPConstructions` class prepares subphrase constructions.

In [66]:
class SPConstructions(CXbuilderTF):
    """Class for building time phrase constructions."""
    
    def __init__(self, wordcxs, semdist, tf, **kwargs):
        
        """Initialize with Constructions attribs/methods."""
        CXbuilderTF.__init__(self, tf, **kwargs)
        
        self.words = wordcxs
        self.sdist = semdist
        # get maximum semantic distance value from vector space
        self.max_dist = max((
            semdist[lex1][lex2] for lex1, lexs in semdist.items()
                for lex2 in lexs
        ))
        
        # map cx searches for full analyses
        self.cxs = (
            self.defi,
            self.card_chain,
            self.demon,
            self.adjv,
            self.advb,
            self.attrib,
            self.geni,
            self.numb,
            self.prep,
            self.appo,
            self.appo_name,
        )
        
        self.dripbucket = (
            self.wordphrase,
        )
        
        self.kind = 'subphrase'
        
        # submit these cxs to cx in set 
        self.yieldsto = {
            'card_chain': {'numb_ph'},
            'word_cx': {self.kind},
        }
        
    def word(self, w):
        """Safely get word CX"""
        return self.words.get(w, Construction())
        
    def wordphrase(self, w):
        """A phrase construction for one word.
        
        Returns first matching word cx for a word.
        """
        return self.word(w)
        
    def getindex(self, indexable, index, default=None):
        """Safely get an index on an item"""
        try:
            return indexable[index]
        except IndexError:
            return default
        
    def defi(self, w):
        """Matches a definite construction."""
        
        P = self.getP(w)
        
        return self.test( 
            {
                'element': w,
                'name': 'defi_ph',
                'kind': self.kind,
                'roles': {'art': self.word(w), 'head': self.word(P(1))},
                'conds': {

                    f'F.sp.v({w}) == art':
                        self.F.sp.v(w) == 'art',

                    'bool(P(1))':
                        bool(P(1))
                }
            }
        )
    
    def prep(self, w):
        """Matches a preposition with a modified element."""
                
        P = self.getP(w)
        Wk =  self.getWk(w)
        F = self.F
        
        return self.test(
            {
                'element': w,
                'name': 'prep_ph',
                'kind': self.kind,
                'roles': {'prep':self.word(w), 'head':self.word(P(1))},
                'conds': {

                    f'({w}).name == prep':
                        self.word(w).name == 'prep',

                    f'F.prs.v({w}) == absent':
                        self.F.prs.v(w) == 'absent',
                    
                    'bool(P(1))':
                        bool(P(1)),
                }
            },
            {
                'element': w,
                'name': 'prep_ph',
                'pattern': 'suffix',
                'kind': self.kind,
                'roles': {'prep': self.word(w), 'head': self.word(w)},
                'conds': {
                    
                    f'({w}).name == prep':
                        self.word(w).name == 'prep',
                    
                    'F.prs.v(w) not in {absent, NA}':
                        F.prs.v(w) not in {'absent', 'NA'},
                }
                
            },
            {
                'element': w,
                'name': 'prep_ph',
                'pattern': 'prep...on',
                'kind': self.kind,
                'roles': {'prep': self.word(w), 'head': self.word(w)},
                'conds': {
                    f'{F.lex.v(w)} in lexset':
                        F.lex.v(w) in {'M<L/', 'HL>H'},
                    f'Wk.back(({w}).name == prep)':
                        bool(Wk.back(lambda n: self.word(n).name=='prep'))
                }
                
            }
        )
        
    def geni(self, w):
        """Queries for "genitive" relations on a word."""
        
        P = self.getP(w)
        word = self.word
        
        return self.test(
            {
                'element': w,
                'name': 'geni_ph',
                'kind': self.kind,
                'roles': {'geni': self.word(w), 'head': self.word(P(-1))},
                'conds': {

                    'P(-1, st) == c': 
                        P(-1,'st') == 'c',

                    'P(-1).name not in {qquant,card}':
                        word(P(-1)).name not in {'qquant','card'},
                    
                    'P(-1).name != prep':
                        word(P(-1)).name != 'prep',
                }
            }
        )

    def advb(self, w):
        """Match an adverb and its mod."""
        
        P = self.getP(w)
        word = self.word
        
        return self.test(
           {
                'element': w,
                'name': 'advb_ph',
                'kind': self.kind,
                'roles': {'advb': word(w), 'head': word(P(1))},
                'conds': {
                    f'F.sp.v({w}) == advb':
                        self.F.sp.v(w) == 'advb',
                    'P(-1,sp) != art':
                        P(-1,'sp') != 'art',
                    'bool(P(1))':
                        bool(P(1)),
                    'P(1,sp) != conj': # ensure not a nominal use
                        P(1,'sp') != 'conj',
                    'P(-1).name != prep': # ensure not nominal
                        word(P(-1)).name != 'prep',
                    f'F.lex.v({self.F.lex.v(w)}) not in noadvb_set':
                        self.F.lex.v(w) not in {'JWMM'},
                }
            }
        )
    
    def adjv(self, w):
        """Matches a word serving as an adjective."""
        
        P = self.getP(w)
        F = self.F
        word = self.word
        name = 'adjv_ph'
        
        # check for recursive adjective matches 
        a2match = self.adjv(P(-1)) if P(-1) else Construction()
        a2match_head = int(a2match.getrole('head', 0))
        
        common = {
            
            'w.name not in {qquant,card}':
                word(w).name not in {'qquant','card'},
            
            'P(-1).name == cont':
                word(P(-1)).name == 'cont',
                        
            'P(-1, st) & {NA, a}': 
                P(-1,'st') in {'NA', 'a'},   
            
            'P(-1).name != quant':
                word(P(-1)).name != 'quant',
            
            'P(-1).name != prep':
                word(P(-1)).name != 'prep',
        }
                
        tests = (
            
            {
                'element': w,
                'name': name,
                'kind': self.kind,
                'pattern': 'adjv (1x)',
                'roles': {'adjv':word(w), 'head': word(P(-1))},
                'conds': dict(common, **{
                    'F.sp.v(w) in {adjv, verb}':
                        F.sp.v(w) in {'adjv', 'verb'},
                })
            },
            {
                'element': w,
                'name': name,
                'kind': self.kind,
                'pattern': 'adjv (2x)',
                'roles': {'adjv': word(w), 'head': word(a2match_head)},
                'conds': dict(common, **{
                    
                    'F.sp.v(w) in {adjv, verb}':
                        F.sp.v(w) in {'adjv', 'verb'},
                    
                     'self.adjv(P(-1)) and target != P(0)':
                        bool(a2match) and a2match_head != P(0)
                })
            }
        )

        return self.test(*tests)
     
    def attrib(self, w):
        """Identify elements in an attrib construction.
        
        In Hebrew this construction typically consists of four slots:
            > ה + A + ה + B
        Attrib identifies each of these elements and labels them.
        A is assumed to be the head, or modified, element and B
        is assumed to be an adjectival element.
        """
                
        # CX consists of two constituent cxs
        # start walk from head of first match
        P = self.getP(w)
        defi1 = self.defi(w)
        d1head = int(defi1.getrole('head', 0))
        Wk = self.getWk(d1head)

        # walk to next valid defi match
        # and allow adjectives to intervene:
        defi2 = Wk.ahead(
            lambda n: self.defi(n),
            go=lambda n: self.F.sp.v(n)=='adjv',
            output=True
        ) if Wk else Construction()
        defi2 = defi2 or Construction()

        # check for single_defi (only two cases)
        defi_p1 = self.defi(P(1))
        
        return self.test(
            {
                'element': w,
                'name': 'attrib_ph',
                'pattern': 'double_defi',
                'kind': self.kind,
                'roles': {'head': defi1, 'attrib': defi2},
                'conds': {
                    'bool(defi1)':
                        bool(defi1),
                    'bool(defi2)':
                        bool(defi2), 
                }
            },
            {
                'element': w,
                'name': 'attrib_ph',
                'pattern': 'single_defi',
                'kind': self.kind,
                'roles': {'head': self.word(w), 'attrib': defi_p1},
                'conds': {
                    'name(w) == cont':
                        self.word(w).name == 'cont',
                    'F.st.v(w) == a':
                        self.F.st.v(w) == 'a',
                    'P(-1,lex) != H':
                        P(-1,'lex') != 'H',
                    'bool(defi_p1)':
                        bool(defi_p1),
                }
            }
        )
        
    def numb(self, w):
        """Defines numerical relations with a non-quant word.
        
        Often but not always indicates quantification as other
        semantic relations are possible.
        """

        P = self.getP(w)
        Wk = self.getWk(w)
        word = self.word
        is_nom = (
            lambda n: word(n).name == 'cont'
        )
        
        # for the quant ahead check
        # should stop at a preposition or another quantifier
        stop_ahead = (
            lambda n: (word(n).name == 'prep'
                or word(n).name in {'card', 'qquant'} and word(n).name != word(w).name)
        )
        
        behind_nom = Wk.back(is_nom, stop=lambda n: not is_nom(n)) 
        
        return self.test(
        
            {
                'element': w,
                'name': 'numb_ph',
                'kind': self.kind,
                'pattern': 'numbered forward',
                'roles': {'numb': word(w), 'head': word(P(1))},
                'conds': {
                    
                    'w.name in {qquant,card}':
                     word(w).name in {'qquant', 'card'},
                    
                    'bool(P(1))':
                        bool(P(1)),
                    
                    'P(1,sp) != conj':
                        P(1,'sp') != 'conj',
                    
                    'P(1).name not in {qquant,card,prep}':
                        word(P(1)).name not in {'qquant','card','prep'},
        
                    'P(-1,sp) != art':
                        P(-1,'sp') != 'art',
                },
            },  
            {
                'element': w,
                'name': 'numb_ph',
                'kind': self.kind,
                'pattern': 'numbered backward',
                'roles': {'numb': word(w), 'head': word(behind_nom)},
                'conds': {
                    
                    'w.name in {qquant,card}':
                        word(w).name in {'qquant','card'},
                    
                    'not Wk.ahead(is_nominal)':
                        not Wk.ahead(is_nom, stop=stop_ahead),
                    
                    'bool(Wk.back(is_nominal))':
                        bool(behind_nom),
                    
                    'F.st.v(behind_nom) in {a, NA}':
                        self.F.st.v(behind_nom) in {'a', 'NA'},
                }
            }
        )
        
    def card_chain(self, w):
        """Defines cardinal number chain constructions"""
        
        P = self.getP(w)
        F = self.F
        word = self.word
        
        return self.test(
            {
                'element': w,
                'name': 'card_chain',
                'kind': self.kind,
                'pattern': 'adjacent',
                'roles': {'card':word(w), 'head':word(P(-1))},
                'conds': {
                    
                    'F.ls.v(w) == card':
                        F.ls.v(w) == 'card',
                    'P(-1,ls) == card':
                        P(-1,'ls') == 'card',                    
                }
            },
            {
                'element': w,
                'name': 'card_chain',
                'kind': self.kind,
                'pattern': 'conjunctive',
                'roles': {'card': word(w), 'head': word(P(-2)), 'conj': word(P(-1))},
                'conds': {
                    'F.ls.v(w) == card':
                        F.ls.v(w) == 'card',
                    'P(-1,lex) == W':
                        P(-1,'lex') == 'W',
                    'P(-2,ls) == card':
                        P(-2,'ls') == 'card',   
                }
            }
        )
    
    def demon(self, w):
        """Defines an adjacent demonstrative construction."""
        
        P = self.getP(w)
        word = self.word
        F = self.F
        name = 'demon_ph'
        
        return self.test(
            {
                'element': w,
                'name': name,
                'kind': self.kind,
                'pattern': 'adjacent forward',
                'roles': {'demon': word(w), 'head': word(P(1))},
                'conds': {
                    'prde in {F.pdp.v(w), F.sp.v(w)}':
                        'prde' in {F.pdp.v(w), F.sp.v(w)},
                    
                    'P(-1,sp) != art': # ensure not part of attrib pattern
                        P(-1,'sp') != 'art',
                    
                    'P(-1).name != prep':
                        word(P(-1)).name != 'prep',
                    
                    'bool(P(1))':
                        bool(P(1)),
                    
                    'P(1).name == cont':
                        word(P(1)).name == 'cont',
                }
            },
            {
                'element': w,
                'name': name,
                'kind': self.kind,
                'pattern': 'adjacent back',
                'roles': {'demon':word(w), 'head':word(P(-1))},
                'conds': {
                    'prde in {F.pdp.v(w), F.sp.v(w)}':
                        'prde' in {F.pdp.v(w), F.sp.v(w)},
                    
                    'P(-1).name not in {prep,qquant,card}':
                        word(P(-1)).name not in {'prep','qquant','card'},
                    
                    'P(-1,sp) == subs':
                        P(-1,'sp') == 'subs',
                }
            }
        )
    
    def get_distance(self, w1, w2, default=None):
        """Retrieve semantic distance between two word nodes."""
        default = default or self.max_dist
        lex1, lex2 = self.F.lex.v(w1), self.F.lex.v(w2)
        return self.sdist.get(lex1,{}).get(lex2, default)
        
    def set_appo_yield(self, w1, w2, name, 
                       threshold=1, default={}):
        """Determine how to yield an apposition CX
        
        Some words in apposition should pass on their 
        adjectival modifications to the whole phrase.
        Whether or not to do so must be determined semantically.
        That can be done by setting a threshold for semantic
        similarity based on a semantic vector space.
        """ 
        default = default or self.yieldsto
        dist = self.get_distance(w1, w2)
        if dist < threshold:
            return {name:{'numb_ph', 'attrib_ph'}}
        else:
            return default

        
    def appo(self, w):
        """Looks for non-definite appositional constructions"""
        name = 'appo'
        P = self.getP(w)
        F = self.F
        wd = self.word
        dist = self.get_distance(w, P(-1))
        
        return self.test(
            {
                'element': w,
                'name': name,
                'kind': self.kind,
                'roles': {'head': wd(P(-1)), 'appo': wd(w)},
                'yieldsto': self.set_appo_yield(w, P(-1), name),
                'conds': {
                    
                    'name(w) == cont':
                        wd(w).name == 'cont',
                    
                    'not adjv(w)':
                        not self.adjv(w),
                    'not advb(w)':
                        not self.advb(w),
                    
                    'name(P-1) == cont':
                        wd(P(-1)).name == 'cont',
                    
                    'st(P-1) == a':
                        F.st.v(P(-1)) == 'a',
                    
                    f'yielded on basis of dist {round(dist, 2)}':
                        dist < self.max_dist
                    
                }
            }
        )
    
    def appo_name(self, w):
        """Match an apposition of name"""
    
        name = 'appo_role'
        F = self.F
        Wk = self.getWk(w)
        wd = self.word
        
        # get word back with only intervention of article
        bk = Wk.back(
            lambda n: wd(n).name == 'name',
            go=lambda n: wd(n).name == 'art'
        )
        
        return self.test(
        
            {
                'element': w,
                'name': name,
                'kind': self.kind,
                'roles': {'role': wd(w), 'head': wd(bk)},
                'yieldsto': self.set_appo_yield(w, bk, name),
                'conds': {
                    
                    'name(w) == cont':
                        wd(w).name == 'cont',
                    
                    'name(back) == name':
                        wd(bk).name == 'name',
                    
                    'bool(back)':
                        bool(bk),
                    
                    'F.st.v(back) == a':
                        F.st.v(bk) == 'a',
                    
                    f'F.nu.v({w}) == F.nu.v({bk})':
                        F.nu.v(w) == F.nu.v(bk),
                    
                    # NB:
                    # rule below reveals the need to be able to say
                    # what head_slot should be; i.e., the lexeme should
                    # be semantically consistent with the ID of the proper name
                    # of a person, head_slot should ~ person, etc.
                    # but for now I'll use a work-around solution
                    'F.lex.v(w) not in timeword set':
                        F.lex.v(w) not in {'CNH/'}
                },
            }
        )
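The distance lookup and threshold logic used by `get_distance` and `set_appo_yield` can be tried in isolation. The sketch below uses a toy nested dictionary with invented distance values. `lookup_distance` and `appo_yield` are simplified, hypothetical versions of the methods above, without any Text-Fabric dependency.

```python
def lookup_distance(sdist, lex1, lex2, default):
    """Look up a pairwise distance in a nested dict, falling back to a default."""
    return sdist.get(lex1, {}).get(lex2, default)

def appo_yield(dist, name, threshold=1.0, default=None):
    """Pass modifiers on to the whole apposition only for semantically close pairs."""
    if dist < threshold:
        return {name: {'numb_ph', 'attrib_ph'}}
    return default or {}


# toy distances; the values are invented for illustration
toy_sdist = {
    'JWM/': {'LJLH/': 0.4, 'MLK/': 1.3},
    'LJLH/': {'JWM/': 0.4},
}
# unknown pairs are treated as maximally distant
max_dist = max(d for row in toy_sdist.values() for d in row.values())

close = lookup_distance(toy_sdist, 'JWM/', 'LJLH/', max_dist)
unknown = lookup_distance(toy_sdist, 'CNH/', 'JWM/', max_dist)

print(close, unknown)               # 0.4 1.3
print(appo_yield(unknown, 'appo'))  # {}
```

Falling back to the maximum distance is a conservative choice: a pair missing from the vector space never counts as similar.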

### Load Constructions

In [41]:
words = wordConstructions(A) # word CX builder

# analyze all matches; return as dict
start = datetime.now()
print('Beginning word construction analysis...')
wordcxs = words.cxdict(
    s for tp in timephrases
        for s in L.d(tp,'word')
)
print(f'\t{datetime.now() - start} COMPLETE \t[ {len(wordcxs)} ] words loaded')

Beginning word construction analysis...
	0:00:07.151596 COMPLETE 	[ 12887 ] words loaded


In [64]:
# time phrase CX builder
spc = SPConstructions(wordcxs, semdist, A)

### TO FIX:

In [39]:
# pretty(1447386)

NB: L> is marked as the object of the preposition

<hr>

### Small Tests

In [40]:
# pretty(1448320)

In [41]:
# test_small = spc.apposition(375509)
# showcx(test_small, conds=True)

### Stretch Tests

In [67]:
# On deck: apposition of name
# check appo_appo yielding: 1450375
# check appo_attrib yielding: 1450375

# On deck: adjectival preposition
# check performance: 1448556

# resume: 
# old: 1450212

test = spc.analyzestretch(L.d(1450333, 'word'), debug=True)

for res in test:
    showcx(res, conds=True)

rawcxs found: [CX prep_ph (385736, 385737), CX geni_ph (385737, 385738), CX prep_ph (385740, 385741), CX geni_ph (385741, 385742), CX card_chain (385742, 385743, 385744), CX prep_ph (385745, 385746), CX defi_ph (385747, 385748), CX appo_role (385746, 385748), CX appo (385748, 385749), CX numb_ph (385749, 385750), CX card_chain (385750, 385751), CX conj (385739,)]...
cxs clustered into: [[CX prep_ph (385736, 385737), CX geni_ph (385737, 385738)], [CX prep_ph (385740, 385741), CX geni_ph (385741, 385742), CX card_chain (385742, 385743, 385744)], [CX prep_ph (385745, 385746), CX appo_role (385746, 385748), CX defi_ph (385747, 385748), CX appo (385748, 385749), CX numb_ph (385749, 385750), CX card_chain (385750, 385751)], [CX conj (385739,)]]...
Beginning weaveCX method...

Received cxlist [CX prep_ph (385736, 385737), CX geni_ph (385737, 385738)]
Beginning analysis with CX prep_ph (385736, 385737)
	comparing CX prep_ph (385736, 385737) with CX geni_ph (385737, 385738)
	intersect is at CX 

{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'geni_ph',
                'geni': {'__cx__': 'card', 'head': 385738},
                'head': {'__cx__': 'cont', 'head': 385737}},
    'prep': {'__cx__': 'prep', 'head': 385736}}

-- CX prep_ph (385736, 385737, 385738) --
pattern: prep_ph
(385736).name == prep                                    True
F.prs.v(385736) == absent                                True
bool(P(1))                                               True

pattern: suffix
(385736).name == prep                                    True
F.prs.v(w) not in {absent, NA}                          False

pattern: prep...on
MN in lexset                                            False
Wk.back((385736).name == prep)                          False

-- CX prep (385736,) --
pattern: ETCBC pdp
F.pdp.v(w) == prep                                       True

pattern: ETCBC ppre words
F.ls.v(w) == ppre                                       False
F.lex.v(w) != DRK/                         

{'__cx__': 'conj', 'head': 385739}

-- CX conj (385739,) --
pattern: conj
bool(F.pdp.v(385739))                                    True



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'geni_ph',
                'geni': {   '__cx__': 'card_chain',
                            'card': {'__cx__': 'card', 'head': 385744},
                            'conj': {'__cx__': 'conj', 'head': 385743},
                            'head': {'__cx__': 'card', 'head': 385742}},
                'head': {'__cx__': 'cont', 'head': 385741}},
    'prep': {'__cx__': 'prep', 'head': 385740}}

-- CX prep_ph (385740, 385741, 385742, 385743, 385744) --
pattern: prep_ph
(385740).name == prep                                    True
F.prs.v(385740) == absent                                True
bool(P(1))                                               True

pattern: suffix
(385740).name == prep                                    True
F.prs.v(w) not in {absent, NA}                          False

pattern: prep...on
<D in lexset                                            False
Wk.back((385740).name == prep)                           True

-- CX prep (

{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'appo_role',
                'head': {'__cx__': 'name', 'head': 385746},
                'role': {   '__cx__': 'defi_ph',
                            'art': {'__cx__': 'art', 'head': 385747},
                            'head': {   '__cx__': 'appo',
                                        'appo': {   '__cx__': 'numb_ph',
                                                    'head': {   '__cx__': 'cont',
                                                                'head': 385749},
                                                    'numb': {   '__cx__': 'card_chain',
                                                                'card': {   '__cx__': 'card',
                                                                            'head': 385751},
                                                                'head': {   '__cx__': 'card',
                                                                            'head': 385750}}},

### Pattern Searches

In [88]:
words = [w for ph in timephrases for w in L.d(ph, 'word')]

# results = search(words, spc.appo_name, pattern='', show=200, shuffle=False)

results = search(words, spc.appo, pattern='', show=0, shuffle=False)

beginning search
	0 found (0/12887)
	5 found (1000/12887)
	11 found (2000/12887)
	12 found (3000/12887)
	18 found (4000/12887)
	29 found (5000/12887)
	32 found (6000/12887)
	38 found (7000/12887)
	42 found (8000/12887)
	42 found (9000/12887)
	42 found (10000/12887)
	49 found (11000/12887)
	57 found (12000/12887)
done at 0:00:33.875077
58 matches found...


### Stretch Tests on Results

In [89]:
elements = sorted(set(L.u(res.element, 'timephrase')[0] for res in results))

for el in elements:
    
    stretch = L.d(el, 'word')
    test = spc.analyzestretch(stretch)
    
    for res in test:
        showcx(res)

{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'numb_ph',
                'head': {   '__cx__': 'appo',
                            'appo': {'__cx__': 'cont', 'head': 3108},
                            'head': {'__cx__': 'cont', 'head': 3107}},
                'numb': {'__cx__': 'card', 'head': 3109}},
    'prep': {'__cx__': 'prep', 'head': 3106}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'appo',
                'appo': {'__cx__': 'cont', 'head': 16129},
                'head': {'__cx__': 'cont', 'head': 16128}},
    'prep': {'__cx__': 'prep', 'head': 16127}}



{   '__cx__': 'appo',
    'appo': {'__cx__': 'cont', 'head': 21639},
    'head': {'__cx__': 'cont', 'head': 21638}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'prep_ph',
                'head': {   '__cx__': 'appo',
                            'appo': {'__cx__': 'cont', 'head': 22357},
                            'head': {'__cx__': 'cont', 'head': 22356}},
                'prep': {'__cx__': 'prep', 'head': 22355}},
    'prep': {'__cx__': 'prep', 'head': 22354}}



{   '__cx__': 'appo',
    'appo': {'__cx__': 'cont', 'head': 30964},
    'head': {'__cx__': 'cont', 'head': 30963}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'defi_ph',
                'art': {'__cx__': 'art', 'head': 33384},
                'head': {   '__cx__': 'appo',
                            'appo': {'__cx__': 'cont', 'head': 33386},
                            'head': {'__cx__': 'cont', 'head': 33385}}},
    'prep': {'__cx__': 'prep', 'head': 33383}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'appo',
                'appo': {'__cx__': 'cont', 'head': 35963},
                'head': {'__cx__': 'cont', 'head': 35962}},
    'prep': {'__cx__': 'prep', 'head': 35961}}



{   '__cx__': 'appo',
    'appo': {'__cx__': 'cont', 'head': 37487},
    'head': {'__cx__': 'cont', 'head': 37486}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'appo',
                'appo': {'__cx__': 'cont', 'head': 38478},
                'head': {'__cx__': 'cont', 'head': 38477}},
    'prep': {'__cx__': 'prep', 'head': 38476}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'appo',
                'appo': {'__cx__': 'cont', 'head': 40444},
                'head': {'__cx__': 'cont', 'head': 40443}},
    'prep': {'__cx__': 'prep', 'head': 40442}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'appo',
                'appo': {'__cx__': 'cont', 'head': 40574},
                'head': {'__cx__': 'cont', 'head': 40573}},
    'prep': {'__cx__': 'prep', 'head': 40572}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'appo',
                'appo': {'__cx__': 'cont', 'head': 96159},
                'head': {'__cx__': 'cont', 'head': 96158}},
    'prep': {'__cx__': 'prep', 'head': 96157}}



{   '__cx__': 'appo',
    'appo': {'__cx__': 'cont', 'head': 101795},
    'head': {'__cx__': 'cont', 'head': 101794}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'appo',
                'appo': {'__cx__': 'cont', 'head': 103950},
                'head': {'__cx__': 'cont', 'head': 103949}},
    'prep': {'__cx__': 'prep', 'head': 103948}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'appo',
                'appo': {'__cx__': 'cont', 'head': 104027},
                'head': {'__cx__': 'cont', 'head': 104026}},
    'prep': {'__cx__': 'prep', 'head': 104025}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'appo',
                'appo': {'__cx__': 'cont', 'head': 114041},
                'head': {'__cx__': 'cont', 'head': 114040}},
    'prep': {'__cx__': 'prep', 'head': 114039}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'appo',
                'appo': {'__cx__': 'cont', 'head': 114808},
                'head': {'__cx__': 'cont', 'head': 114807}},
    'prep': {'__cx__': 'prep', 'head': 114806}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'appo',
                'appo': {'__cx__': 'cont', 'head': 124246},
                'head': {'__cx__': 'cont', 'head': 124245}},
    'prep': {'__cx__': 'prep', 'head': 124244}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'appo',
                'appo': {'__cx__': 'cont', 'head': 135366},
                'head': {'__cx__': 'cont', 'head': 135365}},
    'prep': {'__cx__': 'prep', 'head': 135364}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'appo',
                'appo': {'__cx__': 'cont', 'head': 141373},
                'head': {'__cx__': 'cont', 'head': 141372}},
    'prep': {'__cx__': 'prep', 'head': 141371}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'appo',
                'appo': {'__cx__': 'cont', 'head': 141592},
                'head': {'__cx__': 'cont', 'head': 141591}},
    'prep': {'__cx__': 'prep', 'head': 141590}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'appo',
                'appo': {'__cx__': 'cont', 'head': 142458},
                'head': {'__cx__': 'cont', 'head': 142457}},
    'prep': {'__cx__': 'prep', 'head': 142456}}



{   '__cx__': 'appo',
    'appo': {'__cx__': 'cont', 'head': 143463},
    'head': {'__cx__': 'cont', 'head': 143462}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'defi_ph',
                'art': {'__cx__': 'art', 'head': 145881},
                'head': {   '__cx__': 'appo',
                            'appo': {'__cx__': 'cont', 'head': 145883},
                            'head': {'__cx__': 'cont', 'head': 145882}}},
    'prep': {'__cx__': 'prep', 'head': 145880}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'appo',
                'appo': {'__cx__': 'cont', 'head': 146483},
                'head': {'__cx__': 'cont', 'head': 146482}},
    'prep': {'__cx__': 'prep', 'head': 146481}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'appo',
                'appo': {'__cx__': 'cont', 'head': 148785},
                'head': {'__cx__': 'cont', 'head': 148784}},
    'prep': {'__cx__': 'prep', 'head': 148783}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'appo',
                'appo': {'__cx__': 'cont', 'head': 152997},
                'head': {'__cx__': 'cont', 'head': 152996}},
    'prep': {'__cx__': 'prep', 'head': 152995}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'defi_ph',
                'art': {'__cx__': 'art', 'head': 153680},
                'head': {   '__cx__': 'attrib_ph',
                            'attrib': {   '__cx__': 'defi_ph',
                                          'art': {   '__cx__': 'art',
                                                     'head': 153683},
                                          'head': {   '__cx__': 'ordn',
                                                      'head': 153684}},
                            'head': {   '__cx__': 'appo',
                                        'appo': {   '__cx__': 'cont',
                                                    'head': 153682},
                                        'head': {   '__cx__': 'cont',
                                                    'head': 153681}}}},
    'prep': {'__cx__': 'prep', 'head': 153679}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'appo',
                'appo': {'__cx__': 'cont', 'head': 154481},
                'head': {'__cx__': 'cont', 'head': 154480}},
    'prep': {'__cx__': 'prep', 'head': 154479}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'appo',
                'appo': {'__cx__': 'cont', 'head': 167641},
                'head': {'__cx__': 'cont', 'head': 167640}},
    'prep': {'__cx__': 'prep', 'head': 167639}}



{   '__cx__': 'appo',
    'appo': {'__cx__': 'cont', 'head': 168716},
    'head': {'__cx__': 'cont', 'head': 168715}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'geni_ph',
                'geni': {'__cx__': 'name', 'head': 173714},
                'head': {'__cx__': 'cont', 'head': 173713}},
    'prep': {'__cx__': 'prep', 'head': 173712}}



{   '__cx__': 'numb_ph',
    'head': {   '__cx__': 'appo',
                'appo': {'__cx__': 'cont', 'head': 173717},
                'head': {'__cx__': 'cont', 'head': 173716}},
    'numb': {'__cx__': 'card', 'head': 173715}}



{   '__cx__': 'prep_ph',
    'head': {'__cx__': 'cont', 'head': 173719},
    'prep': {'__cx__': 'prep', 'head': 173718}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'defi_ph',
                'art': {'__cx__': 'art', 'head': 191433},
                'head': {   '__cx__': 'appo',
                            'appo': {'__cx__': 'cont', 'head': 191435},
                            'head': {'__cx__': 'cont', 'head': 191434}}},
    'prep': {'__cx__': 'prep', 'head': 191432}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'defi_ph',
                'art': {'__cx__': 'art', 'head': 192028},
                'head': {   '__cx__': 'appo',
                            'appo': {'__cx__': 'cont', 'head': 192030},
                            'head': {'__cx__': 'cont', 'head': 192029}}},
    'prep': {'__cx__': 'prep', 'head': 192027}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'defi_ph',
                'art': {'__cx__': 'art', 'head': 198928},
                'head': {   '__cx__': 'appo',
                            'appo': {'__cx__': 'cont', 'head': 198930},
                            'head': {'__cx__': 'cont', 'head': 198929}}},
    'prep': {'__cx__': 'prep', 'head': 198927}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'defi_ph',
                'art': {'__cx__': 'art', 'head': 199466},
                'head': {   '__cx__': 'appo',
                            'appo': {'__cx__': 'cont', 'head': 199468},
                            'head': {'__cx__': 'cont', 'head': 199467}}},
    'prep': {'__cx__': 'prep', 'head': 199465}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'defi_ph',
                'art': {'__cx__': 'art', 'head': 201268},
                'head': {   '__cx__': 'appo',
                            'appo': {'__cx__': 'cont', 'head': 201270},
                            'head': {'__cx__': 'cont', 'head': 201269}}},
    'prep': {'__cx__': 'prep', 'head': 201267}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'appo',
                'appo': {'__cx__': 'cont', 'head': 203186},
                'head': {'__cx__': 'cont', 'head': 203185}},
    'prep': {'__cx__': 'prep', 'head': 203184}}



{   '__cx__': 'appo',
    'appo': {'__cx__': 'cont', 'head': 231928},
    'head': {'__cx__': 'cont', 'head': 231927}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'appo',
                'appo': {'__cx__': 'cont', 'head': 242368},
                'head': {'__cx__': 'cont', 'head': 242367}},
    'prep': {'__cx__': 'prep', 'head': 242366}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'prep_ph',
                'head': {   '__cx__': 'appo',
                            'appo': {'__cx__': 'cont', 'head': 249113},
                            'head': {'__cx__': 'cont', 'head': 249112}},
                'prep': {'__cx__': 'prep', 'head': 249111}},
    'prep': {'__cx__': 'prep', 'head': 249110}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'prep_ph',
                'head': {   '__cx__': 'appo',
                            'appo': {'__cx__': 'cont', 'head': 249329},
                            'head': {'__cx__': 'cont', 'head': 249328}},
                'prep': {'__cx__': 'prep', 'head': 249327}},
    'prep': {'__cx__': 'prep', 'head': 249326}}



{   '__cx__': 'appo',
    'appo': {'__cx__': 'cont', 'head': 320456},
    'head': {'__cx__': 'cont', 'head': 320455}}



{   '__cx__': 'appo',
    'appo': {'__cx__': 'cont', 'head': 321446},
    'head': {'__cx__': 'cont', 'head': 321445}}



{   '__cx__': 'prep_ph',
    'head': {'__cx__': 'cont', 'head': 348660},
    'prep': {'__cx__': 'prep', 'head': 348659}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'appo',
                'appo': {'__cx__': 'cont', 'head': 348663},
                'head': {'__cx__': 'cont', 'head': 348662}},
    'prep': {'__cx__': 'prep', 'head': 348661}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'geni_ph',
                'geni': {'__cx__': 'cont', 'head': 348666},
                'head': {'__cx__': 'cont', 'head': 348665}},
    'prep': {'__cx__': 'prep', 'head': 348664}}



{'__cx__': 'conj', 'head': 348667}



{'__cx__': 'cont', 'head': 348668}



{   '__cx__': 'appo',
    'appo': {'__cx__': 'cont', 'head': 349114},
    'head': {'__cx__': 'cont', 'head': 349113}}



{   '__cx__': 'appo',
    'appo': {'__cx__': 'cont', 'head': 349155},
    'head': {'__cx__': 'cont', 'head': 349154}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'appo',
                'appo': {'__cx__': 'cont', 'head': 355633},
                'head': {'__cx__': 'cont', 'head': 355632}},
    'prep': {'__cx__': 'prep', 'head': 355631}}



{   '__cx__': 'appo',
    'appo': {'__cx__': 'cont', 'head': 356489},
    'head': {'__cx__': 'cont', 'head': 356488}}



{   '__cx__': 'numb_ph',
    'head': {   '__cx__': 'appo',
                'appo': {'__cx__': 'cont', 'head': 367503},
                'head': {'__cx__': 'cont', 'head': 367502}},
    'numb': {'__cx__': 'card', 'head': 367501}}



{'__cx__': 'conj', 'head': 367504}



{'__cx__': 'cont', 'head': 367505}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'numb_ph',
                'head': {   '__cx__': 'appo',
                            'appo': {'__cx__': 'cont', 'head': 375509},
                            'head': {'__cx__': 'cont', 'head': 375508}},
                'numb': {   '__cx__': 'card_chain',
                            'card': {   '__cx__': 'card_chain',
                                        'card': {   '__cx__': 'card',
                                                    'head': 375513},
                                        'head': {   '__cx__': 'card',
                                                    'head': 375512}},
                            'conj': {'__cx__': 'conj', 'head': 375511},
                            'head': {'__cx__': 'card', 'head': 375510}}},
    'prep': {'__cx__': 'prep', 'head': 375507}}



{   '__cx__': 'numb_ph',
    'head': {   '__cx__': 'appo',
                'appo': {'__cx__': 'cont', 'head': 376481},
                'head': {'__cx__': 'cont', 'head': 376480}},
    'numb': {'__cx__': 'card', 'head': 376479}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'prep_ph',
                'head': {   '__cx__': 'defi_ph',
                            'art': {'__cx__': 'art', 'head': 377198},
                            'head': {   '__cx__': 'appo',
                                        'appo': {   '__cx__': 'cont',
                                                    'head': 377200},
                                        'head': {   '__cx__': 'cont',
                                                    'head': 377199}}},
                'prep': {'__cx__': 'prep', 'head': 377197}},
    'prep': {'__cx__': 'prep', 'head': 377196}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'appo',
                'appo': {'__cx__': 'cont', 'head': 378046},
                'head': {'__cx__': 'cont', 'head': 378045}},
    'prep': {'__cx__': 'prep', 'head': 378044}}



{'__cx__': 'conj', 'head': 378047}



{'__cx__': 'qquant', 'head': 378048}



{   '__cx__': 'defi_ph',
    'art': {'__cx__': 'art', 'head': 383556},
    'head': {   '__cx__': 'appo',
                'appo': {'__cx__': 'cont', 'head': 383558},
                'head': {'__cx__': 'cont', 'head': 383557}}}



{'__cx__': 'conj', 'head': 383559}



{'__cx__': 'cont', 'head': 383560}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'geni_ph',
                'geni': {'__cx__': 'card', 'head': 385738},
                'head': {'__cx__': 'cont', 'head': 385737}},
    'prep': {'__cx__': 'prep', 'head': 385736}}



{'__cx__': 'conj', 'head': 385739}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'geni_ph',
                'geni': {   '__cx__': 'card_chain',
                            'card': {'__cx__': 'card', 'head': 385744},
                            'conj': {'__cx__': 'conj', 'head': 385743},
                            'head': {'__cx__': 'card', 'head': 385742}},
                'head': {'__cx__': 'cont', 'head': 385741}},
    'prep': {'__cx__': 'prep', 'head': 385740}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'appo_role',
                'head': {'__cx__': 'name', 'head': 385746},
                'role': {   '__cx__': 'defi_ph',
                            'art': {'__cx__': 'art', 'head': 385747},
                            'head': {   '__cx__': 'numb_ph',
                                        'head': {   '__cx__': 'appo',
                                                    'appo': {   '__cx__': 'cont',
                                                                'head': 385749},
                                                    'head': {   '__cx__': 'cont',
                                                                'head': 385748}},
                                        'numb': {   '__cx__': 'card_chain',
                                                    'card': {   '__cx__': 'card',
                                                                'head': 385751},
                                                    'head': 

{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'adjv_ph',
                'adjv': {   '__cx__': 'appo',
                            'appo': {'__cx__': 'cont', 'head': 389048},
                            'head': {'__cx__': 'cont', 'head': 389047}},
                'head': {'__cx__': 'cont', 'head': 389046}},
    'prep': {'__cx__': 'prep', 'head': 389045}}



{   '__cx__': 'prep_ph',
    'head': {'__cx__': 'cont', 'head': 389050},
    'prep': {'__cx__': 'prep', 'head': 389049}}



{   '__cx__': 'prep_ph',
    'head': {   '__cx__': 'defi_ph',
                'art': {'__cx__': 'art', 'head': 410975},
                'head': {   '__cx__': 'appo',
                            'appo': {   '__cx__': 'geni_ph',
                                        'geni': {   '__cx__': 'geni_ph',
                                                    'geni': {   '__cx__': 'name',
                                                                'head': 410979},
                                                    'head': {   '__cx__': 'cont',
                                                                'head': 410978}},
                                        'head': {   '__cx__': 'cont',
                                                    'head': 410977}},
                            'head': {'__cx__': 'cont', 'head': 410976}}},
    'prep': {'__cx__': 'prep', 'head': 410974}}



### Testing on Random Phrases

In [49]:
shuff = [k for k in timephrases
            if len(L.d(k,'word')) > 4]
random.shuffle(shuff)

In [50]:
# for phrase in shuff[:25]:
    
#     print('analyzing', phrase)
#     elements = L.d(phrase,'word')
    
#     try:
#         cxs = tpc.analyzestretch(elements)
#         if cxs:
#             for cx in cxs:
#                 showcx(cx, refslots=elements)
#         else:
#             showcx(Construction(), refslots=elements)
    
#     except:
#         sys.stderr.write(f'\nFAIL...running with debug...\n')
#         pretty(phrase)
#         tpc.analyzestretch(elements, debug=True)
#         raise Exception('...debug complete...')

### Testing on All Timephrases

In [51]:
phrase2cxs = collections.defaultdict(list)
nocxs = []

# time it
start = datetime.now()

print(f'{datetime.now()-start} beginning analysis...')

for i, phrase in enumerate(timephrases):
     
    # analyze all known relas
    elements = L.d(phrase,'word')
    
    # analyze with debug exceptions
    try:
        cxs = spc.analyzestretch(elements)
    except Exception:
        sys.stderr.write(f'\nFAIL...running with debug...\n')
        pretty(phrase)
        spc.analyzestretch(elements, debug=True)
        raise Exception('...debug complete...')

    # save those phrases that have no matching constructions
    if not cxs:
        nocxs.append(phrase)
    else:
        phrase2cxs[phrase] = cxs
        
    # report status
    if i % 500 == 0 and i:
        print(f'\t{datetime.now()-start}\tdone with iter {i}/{len(timephrases)}')
        
print(f'{datetime.now()-start}\tCOMPLETE')
print('-'*20)
print(f'{len(phrase2cxs)} phrases matched with Constructions...')
print(f'{len(nocxs)} phrases not yet matched with Constructions...')

0:00:00.000057 beginning analysis...
	0:00:13.813020	done with iter 500/3864
	0:00:25.902582	done with iter 1000/3864
	0:00:37.354247	done with iter 1500/3864
	0:00:49.642961	done with iter 2000/3864
	0:01:04.888651	done with iter 2500/3864
	0:01:21.649249	done with iter 3000/3864
	0:01:31.794406	done with iter 3500/3864
0:01:43.645283	COMPLETE
--------------------
3864 phrases matched with Constructions...
0 phrases not yet matched with Constructions...


## Closing Gaps

### Identify Gaps

Find timephrases that contain words not covered by any construction, other than waw conjunctions.

In [49]:
gapped = []
tested = []

for ph, cxs in phrase2cxs.items():
    
    tested.append(ph)
    
    ph_slots = set(L.d(ph, 'word'))
    cx_slots = set(
        s for cx in cxs
            for s in cx.slots
    )
    
    if ph_slots.difference(cx_slots):
        gapped.append(cxs)
        
print(f'{len(gapped)} gapped phrases logged...')

0 gapped phrases logged...


In [50]:
for gp in gapped[:25]:
    for cx in gp:
        showcx(cx)

## Connecting Constructions

This section develops a CXbuilder that connects all constructions in a complete phrase.


### Ambiguity with Coordinate CXs

Considerable ambiguity is present in several coordinate constructions:

**`A B and C`**<br>
Given that A, B, and C are nominal words, is their relationship `A // B // C` or `A+B // C`? In other words: **what is the relationship of two adjacent nominal words in a list?** Is B a descriptor of A, or is it an independent element?

**`A of B and C`**<br>
Is it `(A of B) // (C)` or `(A of (B // C))`?

Or even:

**`A of B C and D`**<br>
This pattern combines elements from both ambiguous cases.
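
These ambiguities amount to asking which constituent on the right edge of the preceding structure a coordinated element attaches to. The following is a minimal sketch under stated assumptions: the tuple encoding `(relation, left, right)` is a toy stand-in for the notebook's `Construction` objects, and `right_edge` and `coordinate` are hypothetical helpers.

```python
def right_edge(tree):
    """List every constituent on the right edge of a toy parse, outermost first.

    A parse is either a word (str) or a tuple (relation, left, right).
    Toy encoding for illustration only.
    """
    edge = [tree]
    while isinstance(tree, tuple):
        tree = tree[2]  # descend into the rightmost child
        edge.append(tree)
    return edge

def coordinate(tree, target, new):
    """Return a copy of `tree` with `new` coordinated ('//') to `target`.

    Only the right edge is searched, since that is where a following
    coordinate can attach.
    """
    if tree == target:
        return ('//', tree, new)
    if isinstance(tree, tuple):
        rel, left, right = tree
        return (rel, left, coordinate(right, target, new))
    return tree

# "A of B and C": C may attach to the whole genitive or to B alone
a_of_b = ('of', 'A', 'B')
for candidate in right_edge(a_of_b):
    print(coordinate(a_of_b, candidate, 'C'))
```

Each printed tuple is one possible reading of `A of B and C`; a disambiguation method then has to rank the candidates.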

### Method

To address these ambiguities we will apply a battery of disambiguation attempts. At the core of these attempts is a [Semantic Vector Space](https://en.wikipedia.org/wiki/Vector_space_model), which is able to quantify the semantic distance between two words based on their contextual uses throughout the Hebrew Bible.

The working hypothesis of this method is:
> Words in coordination with each other will be more semantically similar (i.e., will have the least distance in the vector space) than other candidates in the phrase.

Semantic similarity in a vector space is not the only criterion, however. Phrase structure is another aspect of semantic closeness: for instance, whether two candidates share the same phrase type takes priority over their semantic similarity.
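
The hypothesis can be sketched with plain cosine distance. This is a minimal illustration, not the notebook's `semdist` object: the vectors, the English labels, and the helper names `cosine_distance` and `closest_candidate` are invented for the example.

```python
import math

def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1 - dot / (norm_u * norm_v)

def closest_candidate(origin, candidates, vectors):
    """Pick the candidate with the least distance to `origin`."""
    return min(
        candidates,
        key=lambda cand: cosine_distance(vectors[origin], vectors[cand]),
    )

# invented co-occurrence counts, for illustration only
vectors = {
    'day':   [4.0, 1.0, 0.0],
    'night': [3.0, 2.0, 0.0],
    'king':  [0.0, 1.0, 5.0],
}

# which earlier word does a coordinated 'night' most plausibly pair with?
best = closest_candidate('night', ['day', 'king'], vectors)
print(best)
```

With these toy vectors, `night` is far closer to `day` than to `king`, so `day` would be selected as its coordinate partner.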

In [51]:
class CXbuilderPH(CXbuilder):
    """Build complete phrase constructions."""
    
    def __init__(self, phrase2cxs, semdists, tf):
        CXbuilder.__init__(self)
        
        # set up tf methods
        self.tf = tf
        self.F, self.T, self.L = tf.api.F, tf.api.T, tf.api.L
        
        # map cx to phrase node for context retrieval
        self.cx2phrase = {
            cx:ph 
                for ph in phrase2cxs
                    for cx in phrase2cxs[ph]
        }
        
        self.phrase2cxs = phrase2cxs
        self.semdists = semdists
        
        self.cxs = (        
            self.plus_prep,
            self.adjacent
        )
        self.dripbucket = (
            self.cxph,
        )
        
        self.kind = 'phrase'
        
    def cxph(self, cx):
        """Dripbucket function that returns cx as is."""
        return cx
        
    def get_context(self, cx):
        """Get context for a given cx."""
        phrase = self.cx2phrase.get(cx, None)
        if phrase:
            return self.phrase2cxs[phrase]
        else:
            return tuple()
        
    def getP(self, cx):
        """Index positions on phrase context"""
        positions = self.get_context(cx)
        if positions:
            return Positions(
                cx, positions, default=Construction()
            ).get
        else:
            return Dummy

    def getWk(self, cx):
        """Index walks on phrase context"""
        positions = self.get_context(cx)
        if positions:
            return Walker(cx, positions)
        else:
            return Dummy()
    
    def getindex(
        self, indexable, index, 
        default=Construction()
    ):
        """Safe index on iterables w/out IndexErrors."""
        try:
            return indexable[index]
        except Exception:
            return default
    
    def getname(self, cx):
        """Get a cx name"""
        return cx.name
    
    def getkind(self, cx):
        """Get a cx kind."""
        return cx.kind
    
    def getsuccrole(self, cx, role, index=-1):
        """Get a cx role from a list of successive roles.
        
        e.g.
        [big_head, medium_head, small_head][-1] == small_head
        """
        cands = list(cx.getsuccroles(role))
        try:
            return cands[index]
        except IndexError:
            return Construction()
    
    def string_plus(self, cx, plus=1):
        """Stringifies a CX + N-slots for Levenshtein tests."""
        
        # get all slots in the context for plussing
        allslots = sorted(set(
            s for scx in self.get_context(cx)
                for s in scx.slots
        ))
        
        # get plus slots
        P = (Positions(self.getindex(cx.slots, -1), allslots).get
                 if cx.slots and allslots else Dummy)
        plusses = []
        for i in range(1, plus+1): # collect each of the next N slots
            plusses.append(P(i,-1)) # -1 for null slots (== empty string in T.text)
        plusses = [p for p in plusses if isinstance(p, int)]
        
        # format the text string for Levenshtein testing
        ptxt = self.T.text(
            cx.slots + tuple(plusses),
            fmt='text-orig-plain'
        ) if cx.slots else ''
        
        return ptxt

    
    def coord(self, cx):
        """A coordinate construction.
        
        In order to match a coordinate cx, we need to determine
        which item in the previous phrase this cx belongs with. 
        This is done using a semantic vector space, which can
        quantify the approximate semantic distance between the
        heads of this cx and a candidate cx.
        
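        Criteria utilized in validating a coordinate cx between
        an origin cx and a candidate cx are the following,
        in order of priority:
            1. name equality between the two cxs
            2. shortest semantic distance between their heads
            3. longest candidate (most slots)
            4. shortest Levenshtein distance between their strings
            5. closest candidate slot to the origin cx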
        """
        
        F, T = self.F, self.T
        P = self.getP(cx)
        semdist = self.semdists
        Wk = self.getWk(cx)
                         
        # get all top-level cxs behind this one that match in name
        cx_behinds = Wk.back(
            lambda c: c.name == cx.name,
            every=True,
            stop=lambda c: (
                c.name == 'conj' and (c != P(-1))
            )
        )
        
        # if top level phrases produce no results,
        # use subphrases instead
        if not cx_behinds:
            topcontext = self.get_context(cx)
            
            # gather all valid subphrase candidates
            subcontext = []
            for topcx in topcontext:
                for subcx in topcx.subgraph():
                    if isinstance(subcx, int): # skip TF slots
                        continue
                    # NB: `and` binds tighter than `or`
                    if (
                        subcx in topcontext
                        or (subcx.name != 'conj' and subcx not in cx)
                    ):
                        subcontext.append(subcx)        
            
            # walk the new candidates
            Wk2 = Walker(cx, subcontext)
            cx_behinds = Wk2.back(
                lambda c: c.name != 'conj', 
                default=[P(-2)],
                every=True,
                stop=lambda c: (
                    c.name == 'conj' and (c != P(-1))
                )
            )
        
        # map each back-cx to its last slot to make sure
        # every candidate is the last item in its phrase
        # check is made in next series of lines
        cx2last = {
            cxb:self.getindex(sorted(cxb.slots), -1, 0)
                for cxb in cx_behinds
        }
        
        # find coordinate candidate subphrases that stand
        # at the end of the phrase
        cx_subphrases = []
        
        for cx_back in cx_behinds:
            for cxsp in cx_back.subgraph():
                if isinstance(cxsp, int): # skip TF slots
                    continue
                elif (
                    cx2last[cx_back] in cxsp.slots
                    and cxsp.getrole('head')
                ):
                    cx_subphrases.append(cxsp)
        
        # get subphrase heads for semantic tests
        cx2heads = [
            (cxsp, self.getsuccrole(cxsp,'head'))
                for cxsp in cx_behinds
        ]

        # get head of this cx
        head1 = self.getsuccrole(cx,'head')     
        head1lex = F.lex.v(head1)
        
        # sort on a set of priorities
        # the default sort behavior is used (least to greatest)
        # thus when a bigger value should be more important, 
        # a negative is added to the number
        stringp = self.string_plus
        
        # arrange candidates by priority
        cxpriority = []
        for cxsp, headsp in cx2heads:
            name_eq = 0 if cxsp.name == cx.name else 1
            semantic_dist = semdist.get(
                head1lex,{}
            ).get(F.lex.v(headsp), np.inf)
            size = -len(cxsp.slots)
            levenshtein = lev_dist(stringp(cx), stringp(cxsp))
            slot_dist = -next(iter(cxsp.slots), 0)
            heads = (head1, headsp) # for reporting purposes only
            
            cxpriority.append((
                name_eq,
                semantic_dist,
                size,
                levenshtein,
                slot_dist,
                heads,
                cxsp
            ))
            
        # make the sorting; the last two fields (heads, cx) are
        # carried for reporting and are excluded from the sort key
        cxpriority = sorted(cxpriority, key=lambda k: k[:-2])
        
        # select the first priority candidate
        cand = next(iter(cxpriority), (0,0,Construction()))
        
        # add data for conds report / debugging
        data = collections.defaultdict(str)
        for namescore,sdist,leng,ldist,lslot,heads,cxp in cxpriority:
            # name equality
            data['namescore'] += f'\n\t{cxp} namescore: {namescore}'
            # semantic distance
            data['semdists'] += (
                f'\n\t{round(sdist, 2)}, {F.lex.v(heads[0])} ~ {F.lex.v(heads[1])}, {cxp}'
            )
            # size of cx
            data['size'] += f'\n\t{cxp} length: {abs(leng)}'
            
            # Levenshtein distance
            data['ldist'] += f'\n\t{cxp} dist: {ldist}'
            
            # dist of last slot
            data['lslot'] += f'\n\t{cxp} last slot: {abs(lslot)}'
    
        
        return self.test(
            {
                'element': cx,
                'name': 'coord',
                'kind': self.kind,
                'roles': {'part2':cx, 'conj': P(-1), 'part1': cand[-1]},
                'conds': {
                    'P(-1).name == conj':
                        P(-1).name == 'conj',
                    'bool(cand)':
                        bool(cand[-1]),
                    f'name matches {data["namescore"]}\n':
                        bool(cxpriority),
                    f'is shortest sem. distance of {data["semdists"]}\n':
                        bool(cxpriority),
                    f'is longest length of: {data["size"]}\n':
                        bool(cxpriority),
                    f'is shortest Levenshtein distance: {data["ldist"]}\n':
                        bool(cxpriority),
                    f'is closest last slot of: {data["lslot"]}\n':
                        bool(cxpriority)
                }
            }
        )
    
    def appo_name(self, cx):
        """Apposition of name"""
        
        F = self.F
        P = self.getP(cx)
        geti = self.getindex
        
        # get head and first slot of construction
        cxhead = self.getsuccrole(cx, 'head') # a tf integer
        headcx = next(iter(cx.graph.pred[cxhead])) # a CX
        first_slot = cx.slots[0] # for tests
        
        # get very last embedded cx in P(-1)
        back = P(-1)
        name = geti(back.slots, -1)
        try:
            namecx = next(iter(back.graph.pred[name])) # a CX
        except KeyError:
            namecx = Construction()
        
        return self.test(
        
            {
                'element': cx,
                'name': 'appo_name',
                'kind': self.kind,
                'roles': {'name': cx, 'head':namecx},
                'conds': {
                    
                    'cx(head).name == cont':
                        headcx.name == 'cont',
                    
                    'cx.name not in {prep_ph}':
                        cx.name not in {'prep_ph'},
                    
                    'bool(P(-1))':
                        bool(P(-1)),
                    
                    'backcx.name == name':
                        namecx.name == 'name',
                    
                    f'F.nu.v({cxhead}) == F.nu.v({name})':
                        F.nu.v(cxhead) == F.nu.v(name),
                    
                    'cxhead == first_slot or first_slot==art':
                        (
                            cxhead == first_slot
                            or self.F.sp.v(first_slot) == 'art'
                        ),
                    
                    # NB:
                    # the rule below reveals the need to specify what
                    # cxhead should be; i.e., the lexeme should be
                    # semantically consistent with the ID of the proper name:
                    # if person, cxhead should ~ person, etc.
                    # For now a work-around solution is used.
                    'F.lex.v(cxhead) not in timeword set':
                        F.lex.v(cxhead) not in {'CNH/'}
                }
            }
        )
    
    def adjacent(self, cx):
        """Find adjacent CXs"""
        
        P = self.getP(cx)
        
        return self.test(
            {
                'element': cx,
                'name': 'adjacent',
                'kind': self.kind,
                'roles': {'phrase1':cx, 'phrase2':P(1)},
                'conds': {
                    'cx.name != conj':
                        cx.name != 'conj',
                    'P(1).name != prep':
                        P(1).name != 'prep',
                    'bool(P(1))':
                        bool(P(1)),
                    f'name({P(1).name}) not in (conj, prep_ph)':
                        P(1).name not in {'conj','prep_ph'},
                    'not appo_name(P(1))':
                        not (self.appo_name(P(1)) if P(1) else False),
                    'not appo_name(cx)':
                        not self.appo_name(cx),
                }
            }
        
        )
    
    def plus_prep(self, cx):
        """Find phrase+prep CXs"""
        
        P = self.getP(cx)
                
        return self.test(
            {
                'element': cx,
                'name': '+prep',
                'kind': self.kind,
                'roles': {'+prep': cx, 'head': P(-1)},
                'conds': {
                    'cx.name == prep_ph':
                        cx.name == 'prep_ph',
                    'bool(P(-1))':
                        bool(P(-1)),
                    'P(-1).name != conj':
                        P(-1).name != 'conj',
                }
            }
        )
    
cxp = CXbuilderPH(phrase2cxs, semdist, A)
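
The ranking step in `coord` relies on Python's lexicographic tuple ordering: earlier fields dominate later ones, and a field where a larger value should win is negated. The following is a minimal, self-contained sketch of that idea only. The candidate names, lexemes, and distance values are invented for illustration, and `rank_candidates` is a hypothetical helper rather than part of `CXbuilderPH`.

```python
import math

# Hypothetical semantic distances between head lexemes (invented values).
semdist = {
    'JWM/': {'LJLH/': 0.2, 'CNH/': 0.6},
}

# Hypothetical candidates: (name, head lexeme, number of slots).
candidates = [
    ('cand_a', 'CNH/', 3),
    ('cand_b', 'LJLH/', 1),
    ('cand_c', 'LJLH/', 2),
    ('cand_d', 'XXX/', 5),  # lexeme missing from the distance table
]


def rank_candidates(origin_lex, candidates, semdist):
    """Rank candidates by (semantic distance, size), best first."""
    ranked = []
    for name, lex, size in candidates:
        # unknown lexeme pairs fall back to infinity, so they sort last
        dist = semdist.get(origin_lex, {}).get(lex, math.inf)
        # negate size so that larger candidates sort earlier on ties
        ranked.append((dist, -size, name))
    return sorted(ranked)


ranking = rank_candidates('JWM/', candidates, semdist)
best = ranking[0][-1]
print(best)  # cand_c
```

Here `cand_b` and `cand_c` tie on semantic distance, so the negated size decides in favour of the longer candidate, which mirrors how the later fields in the `coord` priority tuple only matter when the earlier ones are equal.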

## Tests

In [55]:
# A.show(A.search('''

# timephrase
#     word pdp=subs ls#card|prpe lex#KL/|JWM/ st=a

#     <: word lex=JWM/
# ''')[:10])

In [136]:
# the following phrases contain cases that still
# need to be fixed for the coordinate cx; some should
# actually be done in the previous cx builder at subphrase level

to_fix = [
    1450039, # coord, add adjacent advb cx with JWM
    1450075, # coord, add adjacent advb cx with >Z
    1450647, # coord, consider prioritizing Levenshtein over size
    
]
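
The note on 1450647 concerns the order of fields in the priority tuple. Because tuples sort field by field, swapping two fields is enough to change which criterion breaks a tie. The sketch below illustrates this under stated assumptions: `levenshtein` is a plain dynamic-programming stand-in for `lev_dist` (whose implementation is not shown in this section), and the strings and sizes are invented.

```python
def levenshtein(a, b):
    """Edit distance between two strings (classic two-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution
            ))
        prev = curr
    return prev[-1]


origin = 'W JWM'  # invented string for the origin cx

# (label, text, size) for two hypothetical candidates
cands = [
    ('long', 'B H LJLH H ZH', 4),
    ('short', 'H JWM', 2),
]

# size first (larger wins), then Levenshtein distance
size_first = sorted(
    cands, key=lambda c: (-c[2], levenshtein(origin, c[1]))
)
# Levenshtein distance first (smaller wins), then size
lev_first = sorted(
    cands, key=lambda c: (levenshtein(origin, c[1]), -c[2])
)

print(size_first[0][0])  # long
print(lev_first[0][0])   # short
```

With size ranked first the longer candidate wins regardless of string similarity; with Levenshtein distance ranked first the candidate that differs from the origin by a single character wins.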

### Test Small

In [162]:
# testph = phrase2cxs[1450540]
# testph

In [161]:
# test = cxp.appo_name(testph[-1])

# showcx(test, conds=True)

### Pattern Matches

<hr>

#### TOTEST

- 1450333: from apposition to proper name

<hr>

In [139]:
def filt_gaps(cx):
    """Isolate cxs with gaps"""
    timephrase = L.u(next(iter(cx.slots)),'phrase')[0]
    # phrase slots not covered by the cx indicate a gap
    return bool(set(L.d(timephrase,'word')) - set(cx.slots))
    
def filt(cx):
    """Find phrases of 3 words containing specific lexemes"""
    timephrase = L.u(next(iter(cx.slots)),'phrase')[0]
    phrasewords = L.d(timephrase, 'word')
    return (
        {'JWM/', 'LJLH/'}.issubset(F.lex.v(w) for w in phrasewords)
        and len(phrasewords) == 3
    )
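
Both filters depend on the Text-Fabric `L` and `F` objects, so they need a loaded BHSA corpus to run. As an illustration only, the gap test in `filt_gaps` reduces to a set difference between the slots of the enclosing phrase and the slots covered by the construction. The sketch below uses plain integers in place of BHSA slots, and `has_gaps` is a hypothetical stand-in, not a function from this notebook.

```python
def has_gaps(phrase_slots, cx_slots):
    """True if the phrase contains slots the cx does not cover."""
    return bool(set(phrase_slots) - set(cx_slots))


phrase = (10, 11, 12, 13)  # invented slot numbers

print(has_gaps(phrase, (10, 11, 12, 13)))  # False
print(has_gaps(phrase, (10, 11, 13)))      # True
```

Converting both arguments to sets keeps the check independent of whether the slots arrive as tuples, lists, or sets.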

In [59]:
# elements = [
#     cx for ph in list(phrase2cxs.values())
#         for cx in ph
# ]

# test_search(
#     elements, 
#     cxp.adjacent, 
#     pattern='', 
#     shuffle=False,
#     #select=lambda c: filt(c),
#     extraFeatures='lex st',
#     show=150
# )

## Stretch Tests

Testing across a whole phrase.

In [53]:
# test = cxp.analyzestretch(phrase2cxs[1449168], debug=True)
# for res in test:
#     showcx(res, conds=False)