# Going Subphraseless

The current method for isolating phrase heads ([here](https://nbviewer.jupyter.org/github/ETCBC/heads/blob/master/phrase_heads.ipynb)) requires strenuous processing of BHSA subphrase relations. The subphrases are not always consistently encoded and suffer from numerous exceptional cases. As a result, the method is rather convoluted and inelegant.

This notebook will explore the possibility of disconnecting semantic head analysis from the ETCBC subphrase encoding. 

A "semantic" head is the primary content word of a phrase, following Croft's "Primary Information Bearing Unit":

> **The noun and the verb are the PRIMARY INFORMATION_BEARING UNITS (PIBUs) of the phrase and clause respectively. In common parlance, they are the content words. PIBUs have major informational content that functional elements such as articles and [auxiliaries] do not have. (Croft, *Radical Construction Grammar*, 2001, 258; see also Shead, *Radical Frame Semantics and Biblical Hebrew*, 104)**

> **A (semantic) head is the profile equivalent that is the primary information-bearing unit, that is, the most contentful item that most closely profiles the same kind of thing that the whole constituent profiles. (ibid., 259)**

Croft also provides an additional criterion to "profile equivalence":

> **If the criterion of profile equivalence produces two candidates for headhood, the less schematic meaning is the PIBU; that is, the PIBU is the one with the narrower extension, in the formal semantic sense of that term (ibid., 259)**

## Inquiry

Can we isolate semantic phrase heads in BHSA using only the phrase_atom and phrase limits? This question means that we take the phrase_atom/phrase boundaries for granted. Empirically, the validity of BHSA phrase boundaries still needs to be tested. But for now, the exercise of isolating semantic phrase heads can be seen as a first step towards reproducible phrase boundaries.

## Basic Concepts

A semantic head will most often stand in a syntactically independent position. For Hebrew nominal phrases, that essentially means a word which is not preceded by a construct and which is semantically central, i.e. not in an attributive slot (e.g. H + noun + H + ATTRIBUTIVE) or an adjectival slot (e.g. noun + noun as in אישׁ טוב).

Quantifier expressions present unique cases, which may be syntactically independent but semantically secondary. These are expressed through specialized lexical items such as cardinal numbers and qualitative quantifiers (e.g. "כל" and "חצי").

Another complication is the use of nouns as prepositional items. Such uses can be seen with words like פני "face" such as לפני "in front," and even words like ראשׁ as in ראשׁ החדשׁ "beginning of the month." 

Other expressions of quantity, quality, and function provide similar complexities. These cases have to be specified in advance.
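
The positional part of this idea can be sketched on toy data. The word records, the `quant` flag, and the helper `candidate_heads` below are all hypothetical and only illustrate the "not preceded by a construct" criterion; attributive and adjectival slots are not handled here.

```python
# A small subset of nominal parts of speech, for illustration only
NOMINAL_SP = {'subs', 'nmpr', 'adjv'}


def candidate_heads(words):
    """Return ids of words that could serve as a semantic head.

    A word qualifies if it is nominal, is not a quantifier, and is not
    preceded by a word in the construct state ('c').
    """
    heads = []
    for i, word in enumerate(words):
        prev = words[i - 1] if i > 0 else None
        if word['sp'] not in NOMINAL_SP or word['quant']:
            continue
        if prev is not None and prev['st'] == 'c':
            continue
        heads.append(word['id'])
    return heads


# toy phrase: preposition + noun (construct) + noun (absolute)
words = [
    {'id': 1, 'sp': 'prep', 'st': 'NA', 'quant': False},
    {'id': 2, 'sp': 'subs', 'st': 'c', 'quant': False},
    {'id': 3, 'sp': 'subs', 'st': 'a', 'quant': False},
]
print(candidate_heads(words))  # [2]
```

Only the construct noun survives as a candidate, since the final noun is governed by the construct before it.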

### Ambiguity

Considerable ambiguity is present in several cases:

**`A B and C`**<br>
Given A, B, C == nominal words, is their relationship `A // B // C` or `A+B // C`? In other words: **what is the relationship of two adjacent nominal words given a list?** Is B a descriptor of A or is it an independent element? 

**`A of B and C`**<br>
Is it `(A of B) // (C)` or `(A of (B // C))`?

Or even:

**`A of B C and D`**<br>
This pattern combines elements from both ambiguous cases.

To address these ambiguities we will apply a battery of disambiguation attempts. Some of those attempts will draw from corpus data, i.e. do we ever see `B and C` with the conjunction explicitly elsewhere in the corpus? Or do we ever see `A of C` explicitly in the corpus? Accents may also play a role: do we see a conjunctive or disjunctive accent between `B C`? 
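
A corpus-based check of this kind might look roughly like the sketch below. The pair dictionaries are toy data with an assumed shape (lexeme mapped to a set of attested partners), and `choose_bracketing` is a hypothetical helper. Treating an attested `A and C` as evidence for `(A of B) // C` is my own assumption for the sake of the example.

```python
def attested(pairs, left, right):
    """Return True if the pairing left/right occurs in a pair dict."""
    return right in pairs.get(left, set())


def choose_bracketing(a, b, c, cons_pairs, conj_pairs):
    """Pick a reading for `A of B and C` from attested pairings.

    Returns 'ambiguous' when the evidence favours neither reading.
    (b is kept for readability; only a and c are consulted here.)
    """
    wide = attested(cons_pairs, a, c)    # "A of C" seen: A of (B // C)
    narrow = attested(conj_pairs, a, c)  # "A and C" seen: (A of B) // C
    if wide and not narrow:
        return '(A of (B // C))'
    if narrow and not wide:
        return '((A of B) // C)'
    return 'ambiguous'


# toy data, not drawn from the corpus
cons_pairs = {'A': {'B', 'C'}}
conj_pairs = {'B': {'C'}}
print(choose_bracketing('A', 'B', 'C', cons_pairs, conj_pairs))
```

A fuller version would also weigh accent types and would need a tie-breaking policy for the ambiguous case.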

## Prerequisites

A number of pre-defined word sets are needed for processing quantification and ambiguous adjacency. These sets are made available in the form of `wsets`, a dictionary containing word sets that are calculated in the `wordsets` directory of this repository. The following wordsets have been defined:

* nominals – a set of word nodes with parts of speech and participles that have the potential to function as nominalized elements. The selected parts of speech are quite permissive: `{'subs', 'nmpr', 'adjv', 'advb', 'prde', 'prps', 'prin', 'inrg'}`. Since parts of speech are not taken as universal linguistic categories but only summaries of language-specific word tendencies (cf. Croft, *Radical Construction Grammar*, 2001), we consider that almost any part of speech can be used in a nominal pattern (or construction). There are some upper limits to this assumption, though. For instance, we exclude conjunctions, articles, prepositions, and negators. 
* prepositions – a word set consisting of words with a part of speech category of `prep`, a lexical set (`ls`) feature of `ppre` ("potential preposition"), as well as a select group of nouns like פני "face" which have been processed for prepositionality. 
* quantifiers - consists of word nodes that are cardinal numbers or qualitative quantifiers such as כל.
* mword – mapping from a word to its phonological word group ("masoretic word"); joins words on maqqeph and ø space
* accent_type – a mapping from a word to its accent type: conjunctive or disjunctive
* conj_pairs – a dict of observed conjunction pairings of lexemes in the corpus: `A & B`
* cons_pairs – a dict of observed construct pairings of lexemes in the corpus: `A of B`
* mom – mapping from word node to its mother word node for a specified relationship: `mom[A]['coord'] = B`
* kid – opposite of mom; mapping from word to its children nodes for a relationship: `kid[A]['cons'] = B`
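
To make the last two mappings concrete, here is a minimal sketch of how `mom` and `kid` might relate to each other. The node numbers are invented, and storing children as lists is an assumption on my part; the real mappings may be shaped differently.

```python
import collections

# mom[word][relation] = mother word (invented node numbers)
mom = collections.defaultdict(dict)
mom[2]['coord'] = 1
mom[3]['cons'] = 1

# kid is the inverse: kid[mother][relation] = list of child words
kid = collections.defaultdict(lambda: collections.defaultdict(list))
for child, relations in mom.items():
    for relation, mother in relations.items():
        kid[mother][relation].append(child)

print(dict(kid[1]))  # {'coord': [2], 'cons': [3]}
```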

**Let's get started**. We load the necessary functions and BHSA data (straight from source).

In [1]:
import collections
import pickle
import random
import re
from IPython.display import display, HTML
from datetime import datetime
from pprint import pprint
from tf.app import use
wsets = pickle.load(open('wordsets/wsets.pickle', 'rb'))
A = use('bhsa', hoist=globals(), silent=True)
A.displaySetup(condenseType='phrase', withNodes=True, extraFeatures='st')

   |     0.00s No structure info in otext, the structure part of the T-API cannot be used


### Wordsets

In [2]:
wsets.keys()

dict_keys(['noms', 'preps', 'quants', 'accent_type', 'mwords', 'conj_pairs', 'cons_pairs'])

In [3]:
list(wsets['cons_pairs'].keys())[:10]

['>JC/', 'KL/', 'BN/', 'TPF[', 'MZBX/', 'B>R/', '<T/', 'XRB/', 'RWX/', 'MLK/']

In [4]:
#wsets['conj_pairs']['>JC/']

# Machinery

We could use some machinery to do the hard work of looking in and around a node. In the older approach we used TF search templates. But these are not very efficient at scale, and they are always bound by the limits of the query language. I take another approach here: a set of classes that specify locations and directions within a specified context.

In [5]:
from wordsets.langtools import Positions, Walker

## Positions class

The `Positions` class enables concise access to adjacent nodes within a given context. This allows us to write algorithms with query-like efficiency with all of the power of Python. 

This class is instantiated on a word node and can provide contextual look-up data for a given word. For example, given a phrase containing the following word nodes:

> (189681, 189682, **189683**, 189684, 189685, 189686) <br>

representing the following phrase (space separated for clarity):

> ב שׁנת **שׁלשׁים** ו שׁמנה שׁנה

Given that the bolded node, `189683` is our `source` word, we instantiate the class, feeding in the node, the "phrase_atom" string (which is the context we want to search within), and an instance of Text-Fabric (`tf`):

In [6]:
      #    source node    context  TF instance  
      #         |            |       |
P = Positions(189683, 'phrase_atom', A).get

If we want to obtain the word adjacent one space forward, we simply ask `P` for `1`, which gives us the next word in the phrase.

In [7]:
P(1)

189684

If we try to ask for 4 words forward, we go beyond the bounds of the phrase. But `P` handles this by returning nothing:

In [8]:
P(4)

To look back one word, we simply give a negative value:

In [9]:
P(-1)

189682

Finally, `P` can be used to quickly call features on these words. For instance, to get the lexeme of the word two positions ahead of `189683`:

In [10]:
P(2,'lex')

'CMNH/'

And if we want to get a number of features, we can just add other features to the arguments. The result is a feature set:

In [11]:
P(2, 'lex', 'nu')

{'CMNH/', 'sg'}

`P` can also handle features on the source node itself by giving a positionality of `0`:

In [12]:
P(0, 'lex')

'CLC/'
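
For readers without Text-Fabric at hand, the look-up behaviour shown above can be imitated with a toy stand-in. `ToyPositions` is hypothetical and is not the implementation in `wordsets.langtools`; it takes the context as a plain list of nodes, and the feature table holds only the lexeme values shown in the outputs above.

```python
class ToyPositions:
    """Look up neighbours of a source node inside a fixed context."""

    def __init__(self, source, context_nodes, features):
        self.nodes = list(context_nodes)  # ordered word nodes of the context
        self.features = features          # {feature name: {node: value}}
        self.index = self.nodes.index(source)

    def get(self, offset, *feats):
        """Return the node at `offset`, or its feature value(s)."""
        target = self.index + offset
        if not 0 <= target < len(self.nodes):
            return None                   # outside the context
        node = self.nodes[target]
        if not feats:
            return node
        values = [self.features[f].get(node) for f in feats]
        return values[0] if len(values) == 1 else set(values)


nodes = [189681, 189682, 189683, 189684, 189685, 189686]
features = {'lex': {189683: 'CLC/', 189685: 'CMNH/'}}
toyP = ToyPositions(189683, nodes, features).get

print(toyP(1), toyP(4), toyP(2, 'lex'))
```

The bounds check is what lets a request such as `toyP(4)` return nothing instead of raising an error.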

## Walker class

`Walker` performs a similar function to `Positions`, except that it is agnostic about exact positions: it walks either `ahead` or `back` from the source until it reaches a target node in the context. A function must be supplied that returns `True` on the target node.

We instantiate the `Walker` using the same source and context as above.

In [13]:
      #  source node    context  TF instance  
      #       |            |       |
Wk = Walker(189683, 'phrase_atom', A)

`Walker` is demonstrated below with the same word. A simple `lambda` function tests the lexical set (`ls`) feature. In the example below, we find the first word ahead of `189683` that is a cardinal number:

In [14]:
Wk.ahead(lambda w: F.ls.v(w) == 'card')

189685

An alternative demonstrates the `None` returned on the lack of a valid match.

In [15]:
Wk.ahead(lambda w: F.ls.v(w) == 'BOOGABOOGA')

Another example wherein we walk backwards to the preposition:

In [16]:
Wk.back(lambda w: F.sp.v(w) == 'prep')

189681

We can also specify that the walk should be interrupted under certain conditions with a `stop` function. In this case we walk forward to the next cardinal number, but the walk is interrupted when the `stop` function detects a conjunction.

In [17]:
Wk.ahead(lambda w: F.ls.v(w) == 'card',
         stop=lambda w: F.sp.v(w) == 'conj')

We can also specify the opposite with a `go` function argument, which defines the nodes that are allowed to intervene between `source` and `target`. Below we specify that *only* a conjunction should intervene.

In [18]:
Wk.ahead(lambda w: F.ls.v(w) == 'card',
         go=lambda w: F.sp.v(w) == 'conj')

189685

The `go` and `stop` functions can be as permissive or strict as desired.
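
The walking logic can be sketched without Text-Fabric as follows. `ToyWalker` is a hypothetical stand-in, not the class in `wordsets.langtools`. The order in which `target`, `stop`, and `go` are checked is my assumption, and the feature tables contain only the values implied by the outputs above.

```python
class ToyWalker:
    """Walk from a source node to the first node accepted by `target`."""

    def __init__(self, source, context_nodes):
        self.nodes = list(context_nodes)
        self.index = self.nodes.index(source)

    def _walk(self, path, target, go=None, stop=None):
        for node in path:
            if target(node):
                return node
            if stop is not None and stop(node):
                return None   # walk interrupted
            if go is not None and not go(node):
                return None   # intervening node not permitted
        return None           # ran out of context

    def ahead(self, target, **kwargs):
        return self._walk(self.nodes[self.index + 1:], target, **kwargs)

    def back(self, target, **kwargs):
        return self._walk(reversed(self.nodes[:self.index]), target, **kwargs)


nodes = [189681, 189682, 189683, 189684, 189685, 189686]
sp = {189681: 'prep', 189684: 'conj'}
ls = {189685: 'card'}
toyWk = ToyWalker(189683, nodes)

print(toyWk.ahead(lambda w: ls.get(w) == 'card',
                  go=lambda w: sp.get(w) == 'conj'))
```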

# Need for Semantic Data

The accurate processing of word connections depends on fuller semantic data than BHSA provides. Future semantic data could be stored in a similar way to word sets (`wsets`). 

For example, in the two phrases

> (Exod 25:39) ככר זהב טהור <br>
> (2 Sam 24:24) בכסף שקלים חמשׁים

we see that זהב and כסף, despite standing in two different positions with two different words, indicate a kind of "composed of" semantic concept: "round gold" (i.e. round composed of gold) and "silver shekels" (shekels composed of silver). To process these kinds of links, we need a list of nouns that often function as "material." But this is only the beginning. Many other words will have specific semantic values that motivate their syntactic behavior. Such a scope lies outside the bounds of this author's current project on Hebrew time phrases.

## A Compromise: Time Phrases

Since constructing these semantic classes is very time consuming, I want to start with a smaller set of cases. For now I will focus on parsing connections within time phrases, since these are the subject of my ongoing PhD project. 

In [19]:
timephrases = [ph for ph in F.otype.s('phrase_atom') 
                   if F.function.v(L.u(ph, 'phrase')[0]) == 'Time'
                   and len(L.d(ph, 'word')) > 1
                   and F.language.v(L.d(ph, 'word')[0]) == 'Hebrew'
            ]

print(f'{len(timephrases)} phrases ready')

3766 phrases ready


## Search & Display Functions

The functions below allow for fast searching and displaying of queries using the `Positions` class. The searches rely on the `Constructions` classes, described further below.

In [20]:
def prettyconds(rulesets):
    '''
    Iterate through an explain dict for a rela
    and print out all of checked conditions.
    '''
    
    for ruleset in rulesets:
        name, src, tgt = ruleset['name'], ruleset['src'], ruleset['tgt']
        print(name)
        print(f'{src} -> {tgt}')
        
        for cond, value in ruleset['cnd'].items():
            print('{:<30} {:>30}'.format(cond, str(value)))
        
        print()
        
def showmatch(match):
    
    '''
    Displays a match from a Grammar test.
    '''
    
    hit, conds = match['match'], match['conds']
    
    if not hit:
        print('NO MATCHES')
        print('-'*20)
        A.pretty(L.u(conds[0]['src'], 'phrase_atom')[0], extraFeatures='sp st', withNodes=True)
        prettyconds(conds)
        return None

    name, src, tgt = hit['name'], hit['src'], hit['tgt']

    phrase = L.u(src, 'phrase_atom')[0]

    highlights = {src:'pink',
                  tgt:'lightgreen'}

    A.pretty(phrase, withNodes=True, extraFeatures='sp st', highlights=highlights)
    prettyconds(conds)
    display(HTML('<hr>'))
        
def test_search(relastr, show=10, end=None, name='', phrases=None):
    '''
    Searches phrases with the specified relation 
    and prints out their descriptive explanation.
    '''
    
    start = datetime.now()
    print('beginning search')
    
    # build a convenient test set of words
    phrases = phrases or [ph for ph in F.otype.s('phrase_atom') 
                   if F.typ.v(ph) in {'NP', 'PP'}
                   and len(L.d(ph, 'word')) > 1
              ]
    words = [w for ph in phrases for w in L.d(ph, 'word')]
    
    # random shuffle to get good diversity of examples
    random.shuffle(words)
    
    # set up grammar (NounsConstructions is defined further below)
    G = NounsConstructions(wsets, A)
    
    matches = []
    append = matches.append
    
    # iterate and find matches on words
    for i,w in enumerate(words):

        # update every 5000 iterations
        if i%5000 == 0:
            print(f'\t{len(matches)} found ({i}/{len(words)})')
        
        # run construction search; cxs maps relation labels to test methods
        test = G.cxs[relastr](w)
        
        # save results
        if test['match']:
            if not name:
                append(test)
            elif test['match']['name'] == name:
                append(test)
            
        # stop at end
        if len(matches) == end:
            break
        
        
    # display
    print('done at', datetime.now() - start)
    print(len(matches), 'matches found...')
    print('showing', show)
    
    for match in matches[:show]:
        showmatch(match)

## Constructions classes

While `Positions` provides concise access to context, a `Constructions` class contains a series of functions which each test a set of conditions. The conditions are formed by testing word `Positions`. An example of a `Constructions` class is provided below.
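
The core selection rule of these classes is that the last rule set whose conditions all hold wins. It can be seen in isolation in the short sketch below, which mirrors the logic of the `evaluate` method on made-up rule sets; the names and node numbers are purely illustrative.

```python
def evaluate(rulesets):
    """Return the last ruleset whose conditions all hold, else {}."""
    passing = [
        rs for rs in rulesets
        if all(rs['cnd'].values()) and rs['src'] and rs['tgt']
    ]
    return passing[-1] if passing else {}


rulesets = (
    {'name': 'simple', 'src': 2, 'tgt': 1,
     'cnd': {'is nominal': True}},
    {'name': 'complex', 'src': 2, 'tgt': 3,
     'cnd': {'is nominal': True, 'has article': True}},
    {'name': 'failing', 'src': 2, 'tgt': 4,
     'cnd': {'is quantifier': False}},
)
print(evaluate(rulesets)['name'])  # complex
```

Ordering rule sets from simple to complex therefore lets a more specific construction override a general one when both apply.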

In [21]:
class Bunch(object):
    """Stores variables for shorthand access"""
    def __init__(self, vardict):
        """Initialize variables object with dict."""
        for k,v in vardict.items():
            setattr(self, k, v)

class Constructions(object):
    """Identifies constructions around a node."""
    
    def __init__(self, semsets, tf, **kwargs):
        """Initialize Constructions object.
        
        Arguments:
            semsets: A dictionary containing semantic
                sets. Key should be the name of the set.
                Value is a set of TF nodes.
            tf: An instance of Text-Fabric.
            
        **kwargs:
            context: the context that contains the node in 
            which to run the attribute tests.
        """
        self.tf = tf
        self.F, self.T, self.L = tf.api.F, tf.api.T, tf.api.L
        self.context = kwargs.get('context', 'phrase_atom')
        self.semsets = Bunch(semsets)
        self.cxs = {}
    
    def getP(self, n):
        """Get Positions object for a TF node."""
        if not n:
            return lambda n, *args: None
        return Positions(n, self.context, self.tf).get
    
    def getWk(self, n):
        """Get Walker object for a TF word node."""
        if not n:
            return lambda n, *args: None
        return Walker(n, self.context, self.tf)
    
    def evaluate(self, condtuple):
        """Apply test to a construction attribute set.
        
        The last-matching case will be returned in
        order to default to more complex cases.
        
        Arguments:
            condtuple: a tuple containing dictionaries with
                defined attributes. Keys are "src", "tgt", 
                and "cnd", where "cnd" holds string:boolean
                pairings in which the string describes the boolean.
        
        Returns:
            results of the test as a dict
        """
        
        # find cases where all cnds == True
        test = [
            attribset for attribset in condtuple
                if all(attribset['cnd'].values())
                    and attribset['tgt']
                    and attribset['src']
        ]
        
        # return last test or empty dict
        if test:
            return test[-1]
        else:
            return {}
        
    def test(self, *conds):
        """Evaluates conditions on a node with tester method."""
        case =  {'match': {}, 'conds':None}
        if not conds:
            return case
        case['conds'] = conds
        case['match'] = self.evaluate(conds)
        return case
        
    def everymatch(self, n):
        """Runs analysis for all constructions with a node.
        
        Returns as dict with test:result as key:value.
        """
        results = {}
        for name, funct in self.cxs.items():
            result = funct(n)
            if result['match']:
                results[name] = result['match']
        return results

In [27]:
class NounsConstructions(Constructions):
    """Class for defining noun constructions."""
    
    def __init__(self, wsets, tf):
        
        """Initialize with Constructions attribs/methods."""
        super().__init__(wsets, tf)
        
        # map cx searches to labels
        self.cxs = {
            'defi': self.defi,
            'card_chain': self.card_chain,
            'adjv': self.adjv,
            'advb': self.advb,
            'attrib': self.attrib,
            'geni': self.geni,
            'numb': self.numb,
            'prep': self.prep,
         }
 
    def defi(self, w):
        """Matches a definite art. word to its mod."""
        
        P = self.getP(w)
        
        return self.test( 
            {
                'name': 'definite',
                'src': w,
                'tgt': P(1),
                'cnd': {

                    'F.sp.v(w) == art':
                        self.F.sp.v(w) == 'art',

                    'bool(P(1))':
                        bool(P(1))
                }
            }
        )
    
    def prep(self, w):
        """Matches a preposition with a modified element."""
                
        P = self.getP(w)
        
        return self.test(
            {
                'name': 'prep',
                'src': w,
                'tgt': P(1),
                'cnd': {

                    'w in preps':
                        w in self.semsets.preps,

                    'F.prs.v(w) == absent':
                        self.F.prs.v(w) == 'absent',
                    
                    'bool(P(1))':
                        bool(P(1)),
                }
            }
        )
        
    def geni(self, w):
        """Queries for "genitive" relations on a word."""
        
        P = self.getP(w)
        sm = self.semsets
        
        return self.test(
            {
                'name': 'geni',
                'src': P(0),
                'tgt': P(-1),
                'cnd': {

                    'P(-1, st) == c': 
                        P(-1,'st') == 'c',

                    'P(-1) not in quants|preps': 
                        P(-1) not in sm.quants|sm.preps,
                }
            }
        )

    def advb(self, w):
        """Match an adverb and its mod."""
        
        P = self.getP(w)
        
        return self.test(
           {
                'name': 'adverb',
                'src': w,
                'tgt': P(1),
                'cnd': {
                    'F.sp.v(w) == advb':
                        self.F.sp.v(w) == 'advb',
                    'P(1) in noms':
                        P(1) in self.semsets.noms,
                }
            }
        )
    
    def adjv(self, w):
        """Matches a word serving as an adjective."""
        
        if not w:
            return self.test()
        
        P = self.getP(w)
        F = self.F
        sm = self.semsets
        
        # check for recursive adjective matches 
        a2match = self.adjv(P(-1))
        
        common = {
            'P(-1) in noms':
                P(-1) in sm.noms,
            
            'P(-1, st) & {NA, a}': 
                P(-1,'st') in {'NA', 'a'},   
            
            'P(-1) not in {quants|preps}':
                P(-1) not in sm.quants|sm.preps,
        }
                
        tests = (
            
            {
                'name': 'adjv',
                'src': w,
                'tgt': P(-1),
                'cnd': dict(common, **{
                    'F.sp.v(w) in {adjv, verb}':
                        F.sp.v(w) in {'adjv', 'verb'},
                })
            },
            {
                'name': 'adjv adjv',
                'src': P(0),
                # the previous match's target lives under its 'match' key
                'tgt': a2match['match'].get('tgt'),
                'cnd': dict(common, **{
                    
                    'P(0,sp) in {adjv, verb}':
                        P(0,'sp') in {'adjv', 'verb'},
                    
                     'self.adjv(P(-1)) and target != P(0)':
                        bool(a2match['match']) and a2match['match'].get('tgt') != P(0)
                })
            }
        )

        return self.test(*tests)
     
    def attrib(self, w):
        """Identify elements in an attrib construction.
        
        In Hebrew this construction typically consists of four slots:
            > ה + A + ה + B
        Attrib identifies each of these elements and labels them.
        A is assumed to be the head, or modified, element and B
        is assumed to be an adjectival element.
        """
        
        P = self.getP(w)
        sm = self.semsets
        
        return self.test(
            {
                'name': 'adjv',
                'src': w,
                'tgt': P(-2),
                'cnd': {
                    'F.sp.v(w) not in {prep, art, conj}':
                        self.F.sp.v(w) not in {'prep', 'art', 'conj'},
                                        
                    'w not in preps':
                       w not in sm.preps,
                    
                    'P(-1,sp) == art':
                        P(-1,'sp') == 'art',
                    
                    'P(-2) in noms':
                        P(-2) in sm.noms,
                    
                    'P(-2) not in quants':
                        P(-2) not in sm.quants,
                    
                    'P(-2,st) in {NA, a}':
                        P(-2,'st') in {'NA', 'a'},
                    
                    'P(-2,sp) != advb':
                        P(-2,'sp') != 'advb',
                }
            }
        )
        
    def numb(self, w):
        """Defines numerical relations with a non-quant word
        
        Often but not always indicates quantification as other
        semantic relations are possible.
        """

        P = self.getP(w)
        Wk = self.getWk(w)
        sm = self.semsets
        is_nom = (lambda n: n in sm.noms and n not in sm.quants)
        behind_nom = Wk.back(is_nom, go=lambda n: self.F.sp.v(n)=='art') 

        return self.test(
        
            {
                'name': 'numbered forward',
                'src': w,
                'tgt': P(1),
                'cnd': {
                    'w in quants':
                        w in sm.quants,
                    
                    'P(1,sp) != conj':
                       P(1,'sp') != 'conj',
                    
                    'P(1) not in quants':
                        P(1) not in sm.quants,
                    
                    'bool(P(1))':
                        bool(P(1))
                },
            },  
            {
                'name': 'numbered backward',
                'src': w,
                'tgt': behind_nom,
                'cnd': {
                    
                    'w in quants':
                        w in sm.quants,
                    
                    'not Wk.ahead(is_nominal)':
                        not Wk.ahead(is_nom),
                    
                    'bool(Wk.back(is_nominal))':
                        bool(behind_nom)
                }
            }
        )
        
    def card_chain(self, w):
        """Defines cardinal number chain constructions"""
        
        Wk = self.getWk(w)
        F = self.F
        is_card = (lambda n: F.ls.v(n) == 'card')
        back_card = Wk.back(is_card, go=lambda n: F.sp.v(n)=='conj')
        
        return self.test(
        
            {
                'name': 'card connective',
                'src': w,
                'tgt': back_card,
                'cnd': {
                    
                    'F.ls.v(w) == card':
                        self.F.ls.v(w) == 'card',
                    
                    'bool(Wk.back(is_card, go=conj))':
                        bool(back_card),                    
                }
            }
        )

In [23]:
G = NounsConstructions(wsets, A)

In [28]:
showmatch(G.numb(189685))

numbered forward
189685 -> 189686
w in quants                                              True
P(1,sp) != conj                                          True
P(1) not in quants                                       True
bool(P(1))                                               True

numbered backward
189685 -> None
w in quants                                              True
not Wk.ahead(is_nominal)                                False
bool(Wk.back(is_nominal))                               False



## Testing

In [29]:
#test_search('cons', name='', show=100, end=50, phrases=timephrases)

## Generate Chunks

For every word in the timephrase set that possesses a dependent relation, generate a chunk object with that word as its head. Also, generate prepositional chunks.

In [30]:
relas = {}

nomless = []

# set up Grammar class
G = NounsConstructions(wsets, A)

# time it
start = datetime.now()

print(f'{datetime.now()-start} beginning analysis...')

for i, phrase in enumerate(timephrases):
        
    # analyze all known relas
    for w in L.d(phrase, 'word'):
        analysis = G.everymatch(w)
        if analysis:
            relas[w] = analysis
        
    # report status
    if i % 500 == 0 and i:
        print(f'\t{datetime.now()-start}\tdone with iter {i}/{len(timephrases)}')
        
print(f'{datetime.now()-start}\tCOMPLETE')

0:00:00.000031 beginning analysis...
	0:00:16.336917	done with iter 500/3766
	0:00:31.616019	done with iter 1000/3766
	0:00:47.654292	done with iter 1500/3766
	0:01:03.530865	done with iter 2000/3766
	0:01:21.658287	done with iter 2500/3766
	0:01:36.076163	done with iter 3000/3766
	0:01:53.143686	done with iter 3500/3766
0:02:01.730632	COMPLETE


In [31]:
kid2mom = collections.defaultdict(list)

for word, rels in relas.items():
    for rela, data in rels.items():
        kid2mom[word].append((data['tgt'], rela))
        
mom2kids = collections.defaultdict(list)

for kid, rels in kid2mom.items():
    for mom, rela in rels:
        mom2kids[mom].append((kid, rela))
        
# change to regular dict to prevent accidental insertions during testing
kid2moms = dict(kid2mom)
mom2kids = dict(mom2kids)

### Tree Building Algorithm

In [32]:
def getmoms(root, covered=None):
    """Retrieve all moms connected in a tree.
    
    TODO: explain
    """
    
    # avoid sharing one mutable default set between calls
    if covered is None:
        covered = set()
    
    if root in covered:
        return None
    
    # start if mom
    if root in mom2kids:
        covered.add(root)
        yield root
    
        # find other moms
        for kid, rela in mom2kids[root]:

            # yield kid as mom
            if kid in mom2kids: 
                yield from getmoms(kid, covered)

            # yield adjacent moms
            else:
                for mom, rel in kid2moms[kid]:
                    if mom != root:
                        yield from getmoms(mom, covered)
                        
    # find mom to start
    else:
        mom = sorted(kid2moms[root])[0][0]
        yield from getmoms(mom, covered)

def sortrelas(rels, order):
    """Sort relations based on a list."""
    sort = sorted((order.index(rel),)+(node, rel, mom) for node,rel,mom in rels)
    return [(node,rel,mom) for pos, node, rel,mom in sort]

def getlinks(root):
    """Get all relation links in prioritized order."""

    # sorted list of relations based on global priority
    order = ['defi', 'card_chain', 'adjv', 'numb', 'geni', 'prep']
    links = []
    covered = set()
    
    for mom in getmoms(root, covered):
        links.extend([(kid, rela, mom) for kid, rela in mom2kids[mom]])
        
    return sortrelas(links, order)

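def buildtree(links, tree=None, leaves=None):
    """Construct a tree from a list of sorted kid/mom links.
    
    TODO: Explain
    
    TODO: Arguments:
    """
    # avoid sharing mutable defaults between calls
    tree = [] if tree is None else tree
    leaves = set() if leaves is None else leaves
    
    # stop when finished
    if not links:
        return tree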
    
    # get next links
    kid, rel, mom = links.pop(0)
    
    # initialize tree
    if not tree:
        tree = [('head', [mom]), (rel, [kid])]
        leaves |= {mom, kid}
        return buildtree(links, tree=tree, leaves=leaves)
        
    # populate tree
    else:
        # add kid rela
        if kid in leaves:
            kidtree = [('head', [mom]), (rel, tree)]
            leaves |= {kid, mom}
            return buildtree(links, tree=kidtree, leaves=leaves)
            
        # add head rela
        elif mom in leaves:
            momtree = [('head', tree), (rel, [kid])]
            leaves |= {kid, mom}
            return buildtree(links, tree=momtree, leaves=leaves)
        
def gettree(root):
    """Retrieves a tree when given a root.
    
    Returns tree and leaves. Leaves are useful
    for keeping track of already-covered words.
    """
    links = getlinks(root)
    tree = []
    leaves = set()
    return {'tree': buildtree(links, tree=tree, leaves=leaves), 'leaves':leaves}

# def phrasetrees(phrase):
#     """Return subphrase trees from a phrase."""
#     trees = []
#     covered = set()
    
#     for w in L.d(phrase, 'word'):
        
#         # skip covered words
#         if w in covered or w not in mom2kids or w not in kid2moms:
#             continue
            
#         treedata = gettree(w)
#         covered |= treedata['leaves']
#         trees.append(treedata['tree'])
        
#     return trees

In [36]:
tree = gettree(189681)['tree']

pprint(tree, indent=4)

[   (   'head',
        [   ('head', [189682]),
            (   'geni',
                [   ('head', [189686]),
                    (   'numb',
                        [('head', [189683]), ('card_chain', [189685])])])]),
    ('prep', [189681])]
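
Given a nested tree like the one printed above, a head word can be read off by following the `'head'` branches downward. The helper `semantic_head` below is a hypothetical sketch. It assumes that each level is a list of `(relation, children)` tuples and that a word node appears as a bare integer.

```python
def semantic_head(tree):
    """Follow 'head' branches down to a single word node."""
    for rel, children in tree:
        if rel == 'head':
            first = children[0]
            # an int is a word node; anything else is a nested tree
            if isinstance(first, int):
                return first
            return semantic_head(children)
    return None


tree = [
    ('head', [
        ('head', [189682]),
        ('geni', [
            ('head', [189686]),
            ('numb', [('head', [189683]), ('card_chain', [189685])]),
        ]),
    ]),
    ('prep', [189681]),
]
print(semantic_head(tree))  # 189682
```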


In [34]:
A.pretty(653659)

In [35]:
A.pretty(760754)

## Testing Phrase Trees

Visualizing identified trees

In [351]:
# hebshow = '''
# <span style="font-family:Times New Roman; font-size:20pt">
#     {}
# </span>
# '''

# def showchunk():

# display(HTML(hebshow.format('יהשה')))

In [153]:
shuff = [ph for ph in timephrases 
            if len(L.d(ph, 'word')) > 5
        ]

random.shuffle(shuff)

In [165]:
for phrase in shuff[:100]:

    A.pretty(phrase, withNodes=True, extraFeatures='st')
    
    for tree in buildtrees(phrase):
        print(tree)
        print('-'*20)
    print()
    display(HTML('<hr>'))

[('head', [229711]), ('prep', [229710])]
--------------------
[('head', [229712]), ('defi', [('head', [229711]), ('prep', [229710])])]
--------------------
[('head', [229714]), ('prep', [229713])]
--------------------
[('head', [229715]), ('defi', [('head', [229714]), ('prep', [229713])])]
--------------------



[('head', [389940]), ('prep', [389939]), ('geni', [389941])]
--------------------



[('head', [99859]), ('prep', [99858]), ('geni', [('head', [99861]), ('defi', [99860])])]
--------------------
[('head', [99864]), ('prep', [99863]), ('geni', [99865])]
--------------------



[('head', [204403]), ('prep', [204402]), ('geni', [('head', [204407]), ('quant', [('head', [204404]), ('card_chain', [204406])])])]
--------------------



[('head', [3800]), ('prep', [3799]), ('card_chain', [('head', [3804]), ('quant', [('head', [3802]), ('card_chain', [3803])])])]
--------------------



[('head', [236534]), ('prep', [236533])]
--------------------
[('head', [236537]), ('prep', [236536])]
--------------------
[('head', [236538]), ('defi', [('head', [236537]), ('prep', [236536])]), ('adjv', [('head', [236540]), ('defi', [236539])])]
--------------------



[('head', [5349]), ('quant', [('head', [5347]), ('card_chain', [5348])])]
--------------------
[('head', [5352]), ('quant', [5351])]
--------------------



[('head', [48203]), ('prep', [48202])]
--------------------
[('head', [48204]), ('defi', [('head', [48203]), ('prep', [48202])])]
--------------------
[('head', [48207]), ('prep', [48206])]
--------------------
[('head', [48208]), ('defi', [('head', [48207]), ('prep', [48206])])]
--------------------



[('head', [204797]), ('prep', [204796]), ('geni', [('head', [204801]), ('quant', [('head', [204798]), ('card_chain', [204800])])])]
--------------------



[('head', [221160]), ('prep', [221159])]
--------------------
[('head', [221161]), ('defi', [('head', [221160]), ('prep', [221159])])]
--------------------
[('head', [221164]), ('prep', [221163])]
--------------------
[('head', [221165]), ('defi', [('head', [221164]), ('prep', [221163])])]
--------------------



[('head', [295004]), ('prep', [295003])]
--------------------
[('head', [295005]), ('defi', [('head', [295004]), ('prep', [295003])]), ('adjv', [('head', [295007]), ('defi', [295006])])]
--------------------
[('head', [295010]), ('prep', [295009])]
--------------------
[('head', [295011]), ('defi', [('head', [295010]), ('prep', [295009])]), ('adjv', [('head', [295013]), ('defi', [295012])])]
--------------------



[('head', [2472]), ('quant', [('head', [2469]), ('card_chain', [2471])])]
--------------------
[('head', [2475]), ('quant', [2474])]
--------------------



[('head', [55005]), ('prep', [55004])]
--------------------
[('head', [55006]), ('defi', [('head', [55005]), ('prep', [55004])])]
--------------------
[('head', [55008]), ('prep', [55007])]
--------------------
[('head', [55009]), ('defi', [('head', [55008]), ('prep', [55007])])]
--------------------



[('head', [29570]), ('prep', [29569])]
--------------------
[('head', [29571]), ('defi', [('head', [29570]), ('prep', [29569])]), ('adjv', [('head', [29573]), ('defi', [29572])]), ('quant', [('head', [29573]), ('defi', [29572])])]
--------------------
[('head', [29575]), ('defi', [29574])]
--------------------



[('head', [309109]), ('prep', [309108])]
--------------------
[('head', [309110]), ('defi', [('head', [309109]), ('prep', [309108])])]
--------------------
[('head', [309113]), ('prep', [309112])]
--------------------
[('head', [309114]), ('defi', [('head', [309113]), ('prep', [309112])])]
--------------------



[('head', [410875]), ('prep', [410874])]
--------------------
[('head', [410876]), ('defi', [('head', [410875]), ('prep', [410874])])]
--------------------
[('head', [410879]), ('prep', [410878])]
--------------------
[('head', [410880]), ('defi', [('head', [410879]), ('prep', [410878])])]
--------------------
[('head', [410883]), ('prep', [410882])]
--------------------
[('head', [410884]), ('defi', [('head', [410883]), ('prep', [410882])])]
--------------------



[('head', [4533]), ('quant', [('head', [4531]), ('card_chain', [4532])])]
--------------------
[('head', [4536]), ('quant', [4535])]
--------------------



[('head', [276598]), ('prep', [276597])]
--------------------
[('head', [276599]), ('prep', [('head', [276598]), ('prep', [276597])])]
--------------------
[('head', [276600]), ('defi', [('head', [276599]), ('prep', [('head', [276598]), ('prep', [276597])])]), ('adjv', [('head', [276602]), ('defi', [276601])])]
--------------------



[('head', [2374]), ('quant', [('head', [2371]), ('card_chain', [2373])])]
--------------------
[('head', [2377]), ('quant', [2376])]
--------------------



[('head', [406611]), ('prep', [406610]), ('geni', [('head', [406613]), ('defi', [406612]), ('adjv', [('head', [406615]), ('defi', [406614])])])]
--------------------



[('head', [76399]), ('prep', [76398]), ('geni', [76400])]
--------------------
[('head', [76403]), ('prep', [76402])]
--------------------
[('head', [76406]), ('prep', [76405])]
--------------------
[('head', [76407]), ('prep', [('head', [76406]), ('prep', [76405])])]
--------------------



[('head', [362245]), ('prep', [362244])]
--------------------
[('head', [362246]), ('defi', [('head', [362245]), ('prep', [362244])])]
--------------------
[('head', [362249]), ('prep', [362248])]
--------------------
[('head', [362250]), ('defi', [('head', [362249]), ('prep', [362248])])]
--------------------



[('head', [221153]), ('prep', [221152])]
--------------------
[('head', [221154]), ('defi', [('head', [221153]), ('prep', [221152])])]
--------------------
[('head', [221156]), ('prep', [221155])]
--------------------
[('head', [221157]), ('defi', [('head', [221156]), ('prep', [221155])])]
--------------------



[('head', [112119]), ('prep', [112118])]
--------------------
[('head', [112120]), ('prep', [('head', [112119]), ('prep', [112118])])]
--------------------
[('head', [112121]), ('defi', [('head', [112120]), ('prep', [('head', [112119]), ('prep', [112118])])]), ('adjv', [('head', [112123]), ('defi', [112122])])]
--------------------



[('head', [307154]), ('prep', [307153])]
--------------------
[('head', [307155]), ('prep', [('head', [307154]), ('prep', [307153])])]
--------------------
[('head', [307156]), ('defi', [('head', [307155]), ('prep', [('head', [307154]), ('prep', [307153])])]), ('adjv', [('head', [307158]), ('defi', [307157])])]
--------------------



[('head', [391071]), ('prep', [391070])]
--------------------
[('head', [391072]), ('defi', [('head', [391071]), ('prep', [391070])]), ('adjv', [('head', [391074]), ('defi', [391073])])]
--------------------



[('head', [66437]), ('prep', [66436])]
--------------------
[('head', [66438]), ('prep', [('head', [66437]), ('prep', [66436])])]
--------------------
[('head', [66439]), ('defi', [('head', [66438]), ('prep', [('head', [66437]), ('prep', [66436])])]), ('adjv', [('head', [66441]), ('defi', [66440])])]
--------------------



[('head', [190915]), ('prep', [190914])]
--------------------
[('head', [190916]), ('defi', [('head', [190915]), ('prep', [190914])])]
--------------------
[('head', [190919]), ('prep', [190918])]
--------------------
[('head', [190920]), ('defi', [('head', [190919]), ('prep', [190918])])]
--------------------



[('head', [2490]), ('quant', [('head', [2487]), ('card_chain', [2489])])]
--------------------
[('head', [2494]), ('quant', [('head', [2492]), ('card_chain', [2493])])]
--------------------



[('head', [9184]), ('prep', [9183])]
--------------------
[('head', [9185]), ('defi', [('head', [9184]), ('prep', [9183])]), ('adjv', [('head', [9187]), ('defi', [9186])])]
--------------------



[('head', [51870]), ('prep', [51869]), ('geni', [('head', [51872]), ('defi', [51871]), ('adjv', [('head', [51874]), ('defi', [51873])])])]
--------------------



[('head', [2218]), ('quant', [2217])]
--------------------
[('head', [2222]), ('quant', [('head', [2220]), ('card_chain', [2221])])]
--------------------



[('head', [265373]), ('prep', [265372])]
--------------------
[('head', [265374]), ('prep', [('head', [265373]), ('prep', [265372])])]
--------------------
[('head', [265375]), ('defi', [('head', [265374]), ('prep', [('head', [265373]), ('prep', [265372])])]), ('adjv', [('head', [265377]), ('defi', [265376])])]
--------------------



[('head', [154162]), ('prep', [154161]), ('geni', [('head', [154164]), ('defi', [154163]), ('adjv', [('head', [154166]), ('defi', [154165])])])]
--------------------



[('head', [7776]), ('prep', [7775])]
--------------------
[('head', [7777]), ('prep', [('head', [7776]), ('prep', [7775])])]
--------------------
[('head', [7778]), ('defi', [('head', [7777]), ('prep', [('head', [7776]), ('prep', [7775])])]), ('adjv', [('head', [7780]), ('defi', [7779])])]
--------------------



[('head', [400149]), ('prep', [400148])]
--------------------
[('head', [400150]), ('defi', [('head', [400149]), ('prep', [400148])])]
--------------------
[('head', [400153]), ('prep', [400152])]
--------------------
[('head', [400154]), ('defi', [('head', [400153]), ('prep', [400152])])]
--------------------



[('head', [385737]), ('prep', [385736]), ('geni', [385738])]
--------------------
[('head', [385741]), ('prep', [385740]), ('geni', [('head', [385742]), ('card_chain', [385744])]), ('quant', [('head', [385742]), ('card_chain', [385744])])]
--------------------



[('head', [99005]), ('prep', [99004])]
--------------------
[('head', [99006]), ('quant', [('head', [99005]), ('prep', [99004])])]
--------------------
[('head', [99007]), ('defi', [('head', [99006]), ('quant', [('head', [99005]), ('prep', [99004])])])]
--------------------
[('head', [99010]), ('prep', [99009])]
--------------------
[('head', [99011]), ('quant', [('head', [99010]), ('prep', [99009])])]
--------------------
[('head', [99012]), ('defi', [('head', [99011]), ('quant', [('head', [99010]), ('prep', [99009])])])]
--------------------



[('head', [189682]), ('prep', [189681]), ('geni', [('head', [189686]), ('quant', [('head', [189683]), ('card_chain', [189685])])])]
--------------------



[('head', [66408]), ('prep', [66407])]
--------------------
[('head', [66409]), ('defi', [('head', [66408]), ('prep', [66407])]), ('adjv', [('head', [66411]), ('defi', [66410]), ('adjv', [('head', [66413]), ('defi', [66412])])])]
--------------------



[('head', [35747]), ('prep', [35746])]
--------------------
[('head', [35748]), ('prep', [('head', [35747]), ('prep', [35746])])]
--------------------
[('head', [35749]), ('defi', [('head', [35748]), ('prep', [('head', [35747]), ('prep', [35746])])]), ('adjv', [('head', [35751]), ('defi', [35750])])]
--------------------



[('head', [92377]), ('prep', [92376]), ('geni', [('head', [92379]), ('defi', [92378]), ('adjv', [('head', [92381]), ('defi', [92380])])])]
--------------------



[('head', [289309]), ('prep', [289308])]
--------------------
[('head', [289310]), ('defi', [('head', [289309]), ('prep', [289308])])]
--------------------
[('head', [289313]), ('prep', [289312])]
--------------------
[('head', [289314]), ('defi', [('head', [289313]), ('prep', [289312])])]
--------------------



[('head', [214400]), ('prep', [214399])]
--------------------
[('head', [214404]), ('quant', [('head', [214401]), ('prep', [('head', [214400]), ('prep', [214399])]), ('card_chain', [214403])])]
--------------------



[('head', [290928]), ('prep', [290927]), ('geni', [290929])]
--------------------



[('head', [217901]), ('prep', [217900]), ('geni', [217902])]
--------------------
[('head', [217904]), ('adjv', [217905])]
--------------------



[('head', [242417]), ('prep', [242416]), ('geni', [242418])]
--------------------
[('head', [242421]), ('prep', [242420]), ('geni', [242422])]
--------------------



[('head', [35099]), ('prep', [35098]), ('geni', [('head', [35101]), ('defi', [35100]), ('card_chain', [35103])]), ('quant', [('head', [35101]), ('defi', [35100]), ('card_chain', [35103])])]
--------------------



[('head', [289445]), ('prep', [289444])]
--------------------
[('head', [289446]), ('defi', [('head', [289445]), ('prep', [289444])])]
--------------------
[('head', [289448]), ('prep', [289447])]
--------------------
[('head', [289449]), ('defi', [('head', [289448]), ('prep', [289447])])]
--------------------



[('head', [2559]), ('quant', [('head', [2556]), ('card_chain', [2558])])]
--------------------
[('head', [2563]), ('quant', [('head', [2561]), ('card_chain', [2562])])]
--------------------



[('head', [294983]), ('prep', [294982]), ('geni', [294984])]
--------------------
[('head', [294987]), ('prep', [294986])]
--------------------



[('head', [423081]), ('prep', [423080])]
--------------------
[('head', [423082]), ('defi', [('head', [423081]), ('prep', [423080])])]
--------------------
[('head', [423085]), ('defi', [423084]), ('adjv', [('head', [423087]), ('defi', [423086])])]
--------------------



[('head', [45629]), ('prep', [45628])]
--------------------
[('head', [45630]), ('defi', [('head', [45629]), ('prep', [45628])])]
--------------------
[('head', [45632]), ('prep', [45631])]
--------------------
[('head', [45633]), ('defi', [('head', [45632]), ('prep', [45631])])]
--------------------



[('head', [2343]), ('quant', [2342])]
--------------------
[('head', [2347]), ('quant', [('head', [2345]), ('card_chain', [2346])])]
--------------------



[('head', [204579]), ('prep', [204578]), ('geni', [('head', [204583]), ('quant', [('head', [204580]), ('card_chain', [204582])])])]
--------------------



[('head', [2521]), ('quant', [('head', [2518]), ('card_chain', [2520])])]
--------------------
[('head', [2524]), ('quant', [2523])]
--------------------



[('head', [262233]), ('prep', [262232])]
--------------------
[('head', [262234]), ('defi', [('head', [262233]), ('prep', [262232])]), ('adjv', [('head', [262236]), ('defi', [262235])])]
--------------------
[('head', [262239]), ('prep', [262238])]
--------------------
[('head', [262240]), ('defi', [('head', [262239]), ('prep', [262238])]), ('adjv', [('head', [262242]), ('defi', [262241])])]
--------------------



[('head', [420261]), ('prep', [420260])]
--------------------
[('head', [420262]), ('defi', [('head', [420261]), ('prep', [420260])]), ('adjv', [('head', [420264]), ('defi', [420263])])]
--------------------
[('head', [420267]), ('defi', [420266])]
--------------------



[('head', [385367]), ('prep', [385366])]
--------------------
[('head', [385368]), ('defi', [('head', [385367]), ('prep', [385366])]), ('adjv', [('head', [385370]), ('defi', [385369])])]
--------------------



[('head', [288841]), ('prep', [288840])]
--------------------
[('head', [288842]), ('defi', [('head', [288841]), ('prep', [288840])])]
--------------------
[('head', [288845]), ('prep', [288844])]
--------------------
[('head', [288846]), ('defi', [('head', [288845]), ('prep', [288844])])]
--------------------
[('head', [288849]), ('prep', [288848])]
--------------------
[('head', [288850]), ('defi', [('head', [288849]), ('prep', [288848])])]
--------------------



[('head', [289396]), ('prep', [289395])]
--------------------
[('head', [289397]), ('defi', [('head', [289396]), ('prep', [289395])])]
--------------------
[('head', [289399]), ('prep', [289398])]
--------------------
[('head', [289400]), ('defi', [('head', [289399]), ('prep', [289398])])]
--------------------



[('head', [189547]), ('prep', [189546]), ('geni', [('head', [189551]), ('quant', [('head', [189548]), ('card_chain', [189550])])])]
--------------------



[('head', [49255]), ('prep', [49254])]
--------------------
[('head', [49256]), ('defi', [('head', [49255]), ('prep', [49254])])]
--------------------
[('head', [49258]), ('prep', [49257])]
--------------------
[('head', [49259]), ('defi', [('head', [49258]), ('prep', [49257])])]
--------------------



[('head', [4522]), ('quant', [('head', [4520]), ('card_chain', [4521])])]
--------------------
[('head', [4525]), ('quant', [4524])]
--------------------



[('head', [159825]), ('prep', [159824])]
--------------------
[('head', [159826]), ('defi', [('head', [159825]), ('prep', [159824])])]
--------------------
[('head', [159829]), ('prep', [159828])]
--------------------
[('head', [159830]), ('defi', [('head', [159829]), ('prep', [159828])])]
--------------------



[('head', [82437]), ('prep', [82436])]
--------------------
[('head', [82438]), ('defi', [('head', [82437]), ('prep', [82436])]), ('adjv', [('head', [82440]), ('defi', [82439])])]
--------------------
[('head', [82443]), ('prep', [82442])]
--------------------
[('head', [82444]), ('defi', [('head', [82443]), ('prep', [82442])]), ('adjv', [('head', [82446]), ('defi', [82445])])]
--------------------



[('head', [203272]), ('prep', [203271]), ('geni', [('head', [203276]), ('quant', [('head', [203273]), ('card_chain', [203275])])])]
--------------------



[('head', [175784]), ('prep', [175783])]
--------------------
[('head', [175785]), ('defi', [('head', [175784]), ('prep', [175783])])]
--------------------
[('head', [175788]), ('prep', [175787]), ('geni', [175789])]
--------------------



[('head', [385925]), ('prep', [385924])]
--------------------
[('head', [385926]), ('defi', [('head', [385925]), ('prep', [385924])]), ('adjv', [('head', [385928]), ('defi', [385927])])]
--------------------



[('head', [89241]), ('prep', [89240])]
--------------------
[('head', [89242]), ('defi', [('head', [89241]), ('prep', [89240])]), ('adjv', [('head', [89244]), ('defi', [89243])])]
--------------------
[('head', [89247]), ('prep', [89246])]
--------------------
[('head', [89248]), ('defi', [('head', [89247]), ('prep', [89246])]), ('adjv', [('head', [89250]), ('defi', [89249])])]
--------------------



[('head', [66479]), ('prep', [66478])]
--------------------
[('head', [66480]), ('prep', [('head', [66479]), ('prep', [66478])])]
--------------------
[('head', [66481]), ('defi', [('head', [66480]), ('prep', [('head', [66479]), ('prep', [66478])])]), ('adjv', [('head', [66483]), ('defi', [66482])])]
--------------------



[('head', [128341]), ('quant', [128340]), ('geni', [128342])]
--------------------
[('head', [128345]), ('quant', [128344]), ('geni', [('head', [128347]), ('defi', [128346])])]
--------------------



[('head', [37804]), ('prep', [37803])]
--------------------
[('head', [37805]), ('defi', [('head', [37804]), ('prep', [37803])])]
--------------------
[('head', [37807]), ('prep', [37806])]
--------------------
[('head', [37808]), ('defi', [('head', [37807]), ('prep', [37806])])]
--------------------



[('head', [388951]), ('prep', [388950])]
--------------------
[('head', [388952]), ('defi', [('head', [388951]), ('prep', [388950])])]
--------------------
[('head', [388955]), ('prep', [388954]), ('geni', [388956])]
--------------------



[('head', [5242]), ('quant', [5241])]
--------------------
[('head', [5246]), ('quant', [('head', [5244]), ('card_chain', [5245])])]
--------------------



[('head', [115713]), ('prep', [115712])]
--------------------
[('head', [115714]), ('defi', [('head', [115713]), ('prep', [115712])]), ('adjv', [('head', [115716]), ('defi', [115715])])]
--------------------



[('head', [189358]), ('prep', [189357]), ('geni', [('head', [189362]), ('quant', [('head', [189359]), ('card_chain', [189361])])])]
--------------------



[('head', [175607]), ('prep', [175606])]
--------------------
[('head', [175608]), ('prep', [('head', [175607]), ('prep', [175606])])]
--------------------
[('head', [175609]), ('quant', [('head', [175608]), ('prep', [('head', [175607]), ('prep', [175606])])])]
--------------------
[('head', [175612]), ('quant', [175611])]
--------------------



[('head', [233334]), ('quant', [233333])]
--------------------
[('head', [233335]), ('defi', [('head', [233334]), ('quant', [233333])])]
--------------------
[('head', [233338]), ('quant', [233337])]
--------------------
[('head', [233339]), ('defi', [('head', [233338]), ('quant', [233337])])]
--------------------



[('head', [330611]), ('prep', [330610])]
--------------------
[('head', [330612]), ('defi', [('head', [330611]), ('prep', [330610])])]
--------------------
[('head', [330615]), ('prep', [330614])]
--------------------
[('head', [330616]), ('defi', [('head', [330615]), ('prep', [330614])])]
--------------------



[('head', [119020]), ('prep', [119019])]
--------------------
[('head', [119021]), ('prep', [('head', [119020]), ('prep', [119019])])]
--------------------
[('head', [119022]), ('defi', [('head', [119021]), ('prep', [('head', [119020]), ('prep', [119019])])]), ('adjv', [('head', [119024]), ('defi', [119023])])]
--------------------



[('head', [2260]), ('quant', [('head', [2258]), ('card_chain', [2259])])]
--------------------
[('head', [2264]), ('quant', [('head', [2262]), ('card_chain', [2263])])]
--------------------



[('head', [33011]), ('prep', [33010])]
--------------------
[('head', [33012]), ('defi', [('head', [33011]), ('prep', [33010])]), ('adjv', [('head', [33014]), ('defi', [33013])])]
--------------------



[('head', [289148]), ('prep', [289147])]
--------------------
[('head', [289149]), ('defi', [('head', [289148]), ('prep', [289147])])]
--------------------
[('head', [289152]), ('prep', [289151])]
--------------------
[('head', [289153]), ('defi', [('head', [289152]), ('prep', [289151])])]
--------------------



[('head', [413244]), ('prep', [413243])]
--------------------
[('head', [413245]), ('defi', [('head', [413244]), ('prep', [413243])])]
--------------------
[('head', [413247]), ('prep', [413246])]
--------------------
[('head', [413248]), ('defi', [('head', [413247]), ('prep', [413246])])]
--------------------



[('head', [66306]), ('prep', [66305])]
--------------------
[('head', [66307]), ('prep', [('head', [66306]), ('prep', [66305])])]
--------------------
[('head', [66308]), ('defi', [('head', [66307]), ('prep', [('head', [66306]), ('prep', [66305])])]), ('adjv', [('head', [66310]), ('defi', [66309])])]
--------------------



[('head', [306789]), ('prep', [306788])]
--------------------
[('head', [306790]), ('defi', [('head', [306789]), ('prep', [306788])])]
--------------------
[('head', [306793]), ('prep', [306792])]
--------------------
[('head', [306794]), ('defi', [('head', [306793]), ('prep', [306792])])]
--------------------



[('head', [153392]), ('quant', [153391])]
--------------------
[('head', [153393]), ('defi', [('head', [153392]), ('quant', [153391])]), ('adjv', [('head', [153395]), ('defi', [153394])])]
--------------------
[('head', [153398]), ('quant', [153397])]
--------------------
[('head', [153399]), ('defi', [('head', [153398]), ('quant', [153397])])]
--------------------



[('head', [35589]), ('prep', [35588])]
--------------------
[('head', [35590]), ('prep', [('head', [35589]), ('prep', [35588])])]
--------------------
[('head', [35591]), ('defi', [('head', [35590]), ('prep', [('head', [35589]), ('prep', [35588])])]), ('adjv', [('head', [35593]), ('defi', [35592])])]
--------------------



[('head', [317367]), ('prep', [317366])]
--------------------
[('head', [317368]), ('defi', [('head', [317367]), ('prep', [317366])])]
--------------------
[('head', [317371]), ('prep', [317370])]
--------------------
[('head', [317372]), ('defi', [('head', [317371]), ('prep', [317370])])]
--------------------



[('head', [407320]), ('prep', [407319])]
--------------------
[('head', [407321]), ('defi', [('head', [407320]), ('prep', [407319])])]
--------------------
[('head', [407324]), ('prep', [407323])]
--------------------
[('head', [407325]), ('defi', [('head', [407324]), ('prep', [407323])])]
--------------------
[('head', [407328]), ('prep', [407327]), ('geni', [407329])]
--------------------



[('head', [153980]), ('prep', [153979]), ('geni', [('head', [153982]), ('defi', [153981]), ('adjv', [('head', [153984]), ('defi', [153983])])])]
--------------------



[('head', [358208]), ('prep', [358207]), ('geni', [358209])]
--------------------
[('head', [358212]), ('prep', [358211]), ('geni', [('head', [358213]), ('geni', [358214])])]
--------------------



[('head', [180817]), ('prep', [180816])]
--------------------
[('head', [180818]), ('defi', [('head', [180817]), ('prep', [180816])]), ('adjv', [('head', [180820]), ('defi', [180819]), ('card_chain', [180821])]), ('quant', [('head', [180820]), ('defi', [180819]), ('card_chain', [180821])])]
--------------------



[('head', [368351]), ('prep', [368350])]
--------------------
[('head', [368352]), ('defi', [('head', [368351]), ('prep', [368350])]), ('adjv', [('head', [368354]), ('defi', [368353])])]
--------------------



[('head', [228900]), ('prep', [228899])]
--------------------
[('head', [228903]), ('prep', [228902])]
--------------------
[('head', [228904]), ('prep', [('head', [228903]), ('prep', [228902])])]
--------------------



[('head', [7742]), ('prep', [7741])]
--------------------
[('head', [7743]), ('prep', [('head', [7742]), ('prep', [7741])])]
--------------------
[('head', [7744]), ('defi', [('head', [7743]), ('prep', [('head', [7742]), ('prep', [7741])])]), ('adjv', [('head', [7746]), ('defi', [7745])])]
--------------------



[('head', [3606]), ('prep', [3605])]
--------------------
[('head', [3610]), ('quant', [('head', [3607]), ('prep', [('head', [3606]), ('prep', [3605])]), ('card_chain', [3609])])]
--------------------



[('head', [284967]), ('prep', [284966])]
--------------------
[('head', [284968]), ('prep', [('head', [284967]), ('prep', [284966])])]
--------------------
[('head', [284969]), ('defi', [('head', [284968]), ('prep', [('head', [284967]), ('prep', [284966])])]), ('adjv', [('head', [284971]), ('defi', [284970])])]
--------------------



[('head', [202672]), ('prep', [202671]), ('geni', [('head', [202676]), ('quant', [('head', [202673]), ('card_chain', [202675])])])]
--------------------
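
Several trees in the output above attach an identical subtree under two different relations, for example under both `adjv` and `quant`, or under both `geni` and `quant`. Whether that is intended is not settled here, but such cases could be flagged automatically for review. The following is a minimal sketch: `duplicated_subtrees` is a hypothetical helper, and it only compares children at the top level of a tree.

```python
def duplicated_subtrees(tree):
    """Return pairs of relations whose child lists are identical."""
    seen = {}    # repr of a child list -> first relation that carried it
    dupes = []
    for rel, children in tree:
        key = repr(children)
        if key in seen:
            dupes.append((seen[key], rel))
        else:
            seen[key] = rel
    return dupes


# tree copied from the output above
tree = [
    ('head', [29571]),
    ('defi', [('head', [29570]), ('prep', [29569])]),
    ('adjv', [('head', [29573]), ('defi', [29572])]),
    ('quant', [('head', [29573]), ('defi', [29572])]),
]

print(duplicated_subtrees(tree))
```

Running a check like this over every tree from `buildtrees` would give a list of phrases to inspect by hand.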



<hr>

# Old / Scratch Code

In [356]:
## !! CODE TO BE DISCARDED !!

# def findmom(w, relastr):
#     """Find specified mom in a tree."""
#     for rel,node in kid2mom.get(w, []):
#         if rel == relastr:
#             return node
        
# def findkid(w, relastr):
#     """Find any of specified kids in a tree."""
#     for rel,node in mom2kids.get(w, []):
#         if rel == relastr:
#             return node
    
# def findtree(mom, covered, dolinks=True):
#     """Construct a tree using primitive relations.
    
#     Arguments:
#         mom: the starting point of the analysis,
#             a node with relations stored in dict 
#             mom2kids
#         covered: a set of nodes that have already
#             been analyzed. The algorithm consumes
#             relations linearly from start of phrase
#             to end.
#         dolinks: whether to process dependent links
#             that stand at the front of an element.
#     """
#     # link together dependent chains
#     dep_links = {
#         'card_chain': {
#             'link':'card_chain',
#             'tgt': 'quant'
#         }
#     }
    
#     covered.add(mom)
        
#     tree = [('head', [mom])] 
                
#     for rela, kid in mom2kids[mom]:
        
#         # handle cases where a fronted element is dependent
#         # on a mother connected via a chain of elements it governs
#         if rela in dep_links and dolinks:
            
#             link = dep_links[rela]['link']
#             target = dep_links[rela]['tgt']            
#             thisnode = kid
            
#             # iterate down through links until finding target
#             while thisnode:
                
#                 # stop if target found and do tree from target
#                 get_tgt = findmom(thisnode, target)
#                 if get_tgt:
#                     covered.add(get_tgt)
#                     tree = [('head', [get_tgt]), ('quant', findtree(mom, covered, dolinks=False))]
#                     break
                
#                 # continue if link continued
#                 thisnode = findkid(thisnode, link)
                
#             # stop at this kid if search was success
#             if thisnode:
#                 continue
        
#         # store simple kid/relas
#         if len(kid2mom[kid]) == 1 and kid not in mom2kids:
#             tree.append((rela, [kid]))
            
#         # retrieve kid's own tree
#         elif kid in mom2kids:                
#             kidtree = findtree(kid, covered)
#             tree.append((rela, kidtree))
            
#         # retrieve kid's mom's tree for elements not yet covered
#         else:
#             for rela2, mom2 in kid2mom[kid]:
#                 if mom2 != mom and mom2 not in covered:
#                     dephrase = findtree(mom2, covered)
#                     tree.append((rela, dephrase))
                    
#                 elif mom2 != mom:
#                     tree.append((rela, [kid]))
#     return tree

# def yieldtrees(phrase):
#     """Yield subphrase trees using findtree.
    
#     Iterates through words in phrase and calls
#     findtree on those that have dependency data
#     stored in mom2kids dict.
    
#     Yields:
#         trees as lists
#     """
    
#     # track words that have been analyzed
#     covered = set()
    
#     for w in L.d(phrase, 'word'):
        
#         # avoid circular analyses
#         if w in covered:
#             continue
            
#         # analyze from mom 
#         if w in mom2kids:
#             yield findtree(w, covered)

# def findmom(w, findfunct):
#     """Find specified mom in a tree."""
#     for rel,node in kid2mom.get(w, []):
#         if findfunct(rel,node):
#             return (rel,node)
        
# def findkid(w, findfunct):
#     """Find any of specified kids in a tree."""
#     for rel,node in mom2kids.get(w, []):
#         if findfunct(rel,node):
#             return (rel,node)
    
# # def findtree(root, tree=[]):
    
#     # get all moms w/ their relations
#     # sort relations
#     # build blocks from relations 