<a href="http://laf-fabric.readthedocs.org/en/latest/" target="_blank"><img align="left" src="images/laf-fabric-small.png"/></a>
<a href="http://www.persistent-identifier.nl/?identifier=urn%3Anbn%3Anl%3Aui%3A13-048i-71" target="_blank"><img align="left" src="images/DANS-small.png"/></a>
<a href="http://www.godgeleerdheid.vu.nl/etcbc" target="_blank"><img align="right" src="images/VU-ETCBC-small.png"/></a>
<a href="https://www.academic-bible.com/en/online-bibles/biblia-hebraica-stuttgartensia-bhs/read-the-bible-text/" target="_blank"><img align="right" src="files/images/DBG-small.png"/></a>

# Verbal valence

*Verbal valence* is a kind of signature of a verb, not unlike overloading in programming languages.
The meaning of a verb depends on the number and kind of its complements, i.e. the linguistic entities that act as arguments for the semantic function of the verb.

We will use a set of flowcharts to specify and compute the sense of a verb in specific contexts, depending on its verbal valence. The flowcharts have been composed by Janet Dyk. Although they are not difficult to understand, it takes a good deal of ingenuity to apply them to all the real-world situations that we encounter in our corpus.


# Authors

This notebook is being written by [Dirk Roorda](mailto:dirk.roorda@dans.knaw.nl) following the ideas of 
[Janet Dyk](mailto:j.w.dyk@vu.nl). Janet's ideas have been published in various ways; see the references below.
They can be summarized as a set of flowcharts. Each flowchart describes a set of rules for choosing between
the senses of a specific verb, based on the constituents in each context where it occurs.
Dirk's role is to turn those ideas into a working program based on the ETCBC data.

# About

This is a [Jupyter](http://jupyter.org) notebook. It contains a working program to carry out the computations
that we need for making use of verbal valence patterns.
You can download this notebook and run it on your computer, provided you have
[LAF-Fabric](http://laf-fabric.readthedocs.org/en/latest/texts/welcome.html) installed.

There is not only code in this notebook, but also extensive documentation, and a description of how to view
the results on 
[SHEBANQ](https://shebanq.ancient-data.org) as a set of *Notes*.
See the end of the notebook for precise links.

# Status

**Last modified: 2016-07-07**

This notebook is not yet finished. 
It turns out that the ETCBC data at present does not contain all bits and pieces that are needed to follow
the rules in Janet's flowcharts. It is difficult to find all direct objects, especially implied ones.
And there are many cases where the database encodes a phrase as a complement, where the flowchart expects it to be a direct object.

We have set up a workflow for correcting and enriching the ETCBC data. See the
[corr_enrich notebook](corr_enrich.ipynb).
There we take care that all relevant phrases get their proper *function*
labels. And we analyse those phrases and assign new properties to them, based on certain heuristics.

This flowchart notebook takes those new properties as input for determining the valencies of verbs.
 
# More about flowcharts

Here is an original flowchart by Janet, the one for NTN (*give*).

<img src="images/FlowChartNTN-orig.pdf"/>

In order to run the flowcharts, preliminary work has to be done. 
We have to 

* identify direct objects;
* divide them into principal and secondary ones if there are multiple;
* identify complements;
* divide them into locatives, indirect objects, and other complements;
* detect relativa and offer them as potential direct objects;
* detect phrases starting with MN (*from*) and offer them as potential direct objects.

These are exactly the things that we outsource to the 
[corr_enrich notebook](corr_enrich.ipynb).


# Generic flowchart

The generic flowchart rules can be read off this diagram.

<img src="images/Valence-Generic.pdf"/>

In fact, this part of the flowchart requires the most programming effort.

# Specific flowcharts

Using the generic flowchart, we state the rules for individual verbs, which can be expressed as simple
multiple choice lists. Far below in this notebook, these rules will be applied to all clauses.

As an example, this is a simplified flowchart for NTN in diagram form as we will implement it below.

<img src="images/Valence-NTN.pdf"/>

# Flowchart logic

Here is the bare logic of the flowcharts for the individual verbs.

The ``senses`` data structure is a dictionary keyed by verb lexemes. 
For each verb, the entry is in turn keyed by *sense labels*: codes for the number of direct objects and the nature of the complements that are present in the context.
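As a small illustration of how such a sense label could be derived, the sketch below follows the branching used later in this notebook's `flowchart` function. The first character encodes the number of direct objects (`0`, `1`, or `2` for two or more), the second the kind of complement (`0` none, `i` indirect object, `l` locative, `c` other). The function name `sense_label_for` is ours and exists only for this illustration.

```python
def sense_label_for(n_dos, n_inds, n_locs, n_cpls):
    """Derive a sense label from counts of direct objects and complements.

    Illustrative helper (not part of the notebook); it mirrors the
    branching in the flowchart function further down.
    """
    # Two or more direct objects: the complement kind is ignored.
    if n_dos >= 2:
        return '2'
    ndo = '0' if n_dos == 0 else '1'
    if n_inds + n_locs + n_cpls == 0:
        kcp = '0'   # no complements at all
    elif n_inds:
        kcp = 'i'   # indirect objects take precedence
    elif n_locs:
        kcp = 'l'   # then locatives
    else:
        kcp = 'c'   # only unclassified complements remain
    return ndo + kcp


print(sense_label_for(1, 1, 0, 0))  # one direct object plus an indirect object
print(sense_label_for(0, 0, 0, 1))  # no direct object, one unclassified complement
print(sense_label_for(2, 0, 1, 0))  # two direct objects
```

Note the precedence: a clause with both an indirect object and a locative is labelled with `i`, because indirect objects are tested first.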

Behind each sense label there is information about the meaning of the verb in such a context.
The meaning consists of 2 or 3 pieces of information.

The important part is the second one, the *sense template*, which consists of a gloss augmented with placeholders for the direct objects and complements.

* **{verb}** the verb occurrence in question
* **{pdo}** principal direct object
* **{sdos}** secondary direct objects
* **{inds}** indirect objects
* **{locs}** locatives
* **{cpls}** complements, not marked as either indirect object or locative

In case there are multiple entities, the algorithm returns them chunked as phrases/clauses.

Apart from the template, there is also a *status* and an optional *account*. 

The status is ``!`` in normal cases, ``?`` in dubious cases, and ``-`` in erroneous cases.
In SHEBANQ these statuses are translated into colors of the notes (blue/orange/red).

The account contains information about the grounds on which the algorithm has arrived at its conclusions.

* **{ilc}** the outcome of the heuristic that distinguishes locatives from indirect objects

A typical case is ``NTN[`` sense ``0c``. This verb prefers indirect objects and not locatives.
So when the context has a complement that fails to be classified beforehand as either locative or indirect object, this is the moment that we finally decide it is an indirect object after all.
But this is risky, so we give it status ``?`` and we tell the user that we have decided to change ``C`` into ``I`` for this complement.

Likewise, sense ``0l`` is not expected to occur. When we encounter it, we conclude that our heuristic for choosing between ``L`` and ``I`` has failed here, and we overrule that decision and change ``L`` to ``I``.
We tell the user that here we have encountered an error.
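To make the structure concrete, here is a minimal sketch of how one block of a senses specification, in the format used in this notebook, could be parsed and how a sense template is then filled in. It mirrors the parsing done further down, but `parse_block`, the shortened `SPEC`, and the bracketed glosses passed to `format` are illustrative assumptions.

```python
# A shortened specification block: verb lexeme on the first line,
# then one line per sense as  label:status: template :: account
SPEC = '''
NTN
10:!: produce; yield; give {pdo}
1i:!: give {pdo} to {inds}
1c:?: make {pdo} (to be (as)/to become/to do) {cpls} :: {cpls} taken as extra direct object besides {pdo}
'''


def parse_block(block):
    """Turn one block into (verb, {label: (status, [template, account...])})."""
    lines = block.strip().split('\n')
    verb = lines[0] + '['          # verb lexemes end in '[' in the ETCBC data
    table = {}
    for line in lines[1:]:
        # Split on the first two colons only; the rest may contain '::'.
        (label, status, rest) = line.split(':', 2)
        parts = [part.strip() for part in rest.split('::')]
        table[label.strip()] = (status.strip(), parts)
    return (verb, table)


(verb, table) = parse_block(SPEC)

(status, parts) = table['1i']
print(verb, status, parts[0].format(pdo='[bread]', inds='[him]'))

(status, parts) = table['1c']
print(verb, status, parts[0].format(pdo='[him]', cpls='[king]'))
print('account:', parts[1].format(pdo='[him]', cpls='[king]'))
```

The first part of each entry is always the sense template; an account is present only when the line contains `::`.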

In [9]:
senses_spec = '''
<FH
00:!: act; take action
0i:?: act; take action for {inds} :: {inds} taken as benefactive adjunct
0l:?: act; take action at {locs} :: {locs} taken as locative adjunct
0c:?: do; make; perform; observe {cpls} :: {cpls} taken as direct object
10:!: do; make; perform; observe {pdo}
1i:?: do; make; perform; observe {pdo} for {inds} :: {inds} taken as benefactive adjunct
1l:?: do; make; perform; observe {pdo} at {locs} :: {locs} taken as locative adjunct
1c:?: make {pdo} to be {cpls} :: {cpls} taken as extra direct object besides {pdo}
2 :!: make {pdo} to be {sdos}

BR>
00:-: !not encountered!
0i:?: create for {inds} :: {inds} taken as benefactive adjunct
0l:?: create at {locs} :: {locs} taken as locative adjunct
0c:?: create {cpls} :: {cpls} taken as direct object
10:!: create {pdo}
1i:?: create {pdo} for {inds} :: {inds} taken as benefactive adjunct
1l:?: create {pdo} at {locs} :: {locs} taken as locative adjunct
1c:?: create {pdo} to be {cpls} :: {cpls} taken as extra direct object besides {pdo}
2 :!: create {pdo} to be {sdos}

CJT
00:-: !not encountered!
0i:-: !not encountered!
0l:-: !not encountered!
0c:?: install; set up; put in place {cpls} :: {cpls} taken as direct object
10:!: install; set up; put in place {pdo}
1i:?: place {pdo} for the benefit of {inds} :: {inds} taken as benefactive adjunct
1l:!: place {pdo} ... {locs}
1c:?: make {pdo} to be {cpls} :: {cpls} taken as extra direct object besides {pdo}
2 :!: make {pdo} to be {sdos}

DBQ
00:-: !not encountered!
0i:?: cling; cleave; adhere to {inds} :: {inds} taken as locative
0l:!: cling; cleave; adhere after/to {locs}
0c:?: cling; cleave; adhere to {cpls} :: {cpls} taken as locative
10:-: !not encountered! :: Should {verb} be hiphil? :: ?
1i:-: !not encountered! :: Should {verb} be hiphil? :: ?
1l:-: !not encountered! :: Should {verb} be hiphil? :: ?
1c:-: !not encountered! :: Should {verb} be hiphil? :: ?
2 :-: !not encountered! :: Should {verb} be hiphil? :: ?

FJM
00:!: prepare; put in place; make ready
0i:?: prepare; put in place; make ready for {inds} :: {inds} taken as benefactive adjunct
0l:!: make ready; prepare {locs} (specific meaning depending on preposition)
0c:?: prepare; put in place; institute {cpls} :: {cpls} taken as direct object
10:!: prepare; put in place; institute {pdo}
1i:?: prepare; put in place; institute {pdo} for {inds} :: {inds} taken as benefactive adjunct
1l:!: put; place {pdo} ... {locs} (specific meaning depending on preposition)
1c:?: make {pdo} (to be (as)/to become/to do) {cpls} :: {cpls} taken as extra direct object besides {pdo}
2 :!: make {pdo} (to be (as)/to become/to do) {sdos}

NTN
00:!: (act of) producing; yielding; giving (in itself)
0i:!: produce for; yield for; give to {inds}
0l:-: !not encountered!
0c:?: produce; yield; give {cpls} :: {cpls} taken as extra direct object besides {pdo}
10:!: produce; yield; give {pdo}
1i:!: give {pdo} to {inds}
1l:!: place {pdo} ... {locs}
1c:?: make {pdo} (to be (as)/to become/to do) {cpls} :: {cpls} taken as extra direct object besides {pdo}
2 :!: make {pdo} (to be (as)/to become/to do) {sdos}

QR>
00:!: shout; call; invoke
0i:!: call; summon {inds}
0l:?: call at {locs} :: {locs} taken as locative adjunct.
0c:?: call {cpls} (content) :: {cpls} taken as direct object
10:!: call; summon {pdo} (content or addressee)
1i:!: summon {pdo} for {inds}
1l:!: call out {pdo} before {locs}
1c:?: call {pdo} (to be named) {cpls} :: {cpls} taken as extra direct object besides {pdo}
2 :!: call {pdo} (to be named) {sdos}

ZQN
00:!: be old
0i:?: be old for {inds} :: {inds} taken as benefactive adjunct
0l:?: be old in {locs} :: {locs} taken as locative adjunct
0c:?: be old ... {cpls} :: {cpls} taken as adjunct
10:-: !not encountered!
1i:-: !not encountered!
1l:-: !not encountered!
1c:-: !not encountered!
2 :-: !not encountered!
'''

# Results

See the results on SHEBANQ.

The complete set of results is in the note set 
[valence](https://shebanq.ancient-data.org/hebrew/note?version=4b&id=Mnx2YWxlbmNl&tp=txt_tb1).
You can find it on the Notes page in SHEBANQ:

<img src="images/valnotes.png"/>

By checking the other note sets you *mute* them, so they do not show up among the lines.

In order to see a note set, click on its name. You then go to pages with all verses that have a note of this set attached. 

<img src="images/notesview.png"/>

In order to see the actual notes, click the comment cloud icons. If you click the upper left one, notes are fetched for all verses on the page.

<img src="images/withnotes.png"/>

You can also export the notes as csv, or view them in a chart.

The *valence* set has the following subsets:

* Unresolved results: [val_nb](https://shebanq.ancient-data.org/hebrew/note?version=4b&id=Mnx2YWxfbmI_&tp=txt_tb1);
* Uncertain results: [val_wrn](https://shebanq.ancient-data.org/hebrew/note?version=4b&id=Mnx2YWxfd3Ju&tp=txt_tb1);
* Erroneous results: [val_err](https://shebanq.ancient-data.org/hebrew/note?version=4b&id=Mnx2YWxfZXJy&tp=txt_tb1);
* Promotion candidates: [val_prom](https://shebanq.ancient-data.org/hebrew/note?version=4b&id=Mnx2YWxfcHJvbQ__&tp=txt_tb1).

So if you follow the *valence* link you see them all, but you can also focus on the problematic cases.

And if you are logged in, you can add remarks in free text. Just start typing in one of the new note boxes.
Hint: use the keyword **val_note** for your manual notes to valence, then other users can see all relevant information about valence together.

By clicking on the status symbol you can cycle through different display styles and colors for your note.
Do not forget to save when you are done!

See also the SHEBANQ help on notes:
[general](https://shebanq.ancient-data.org/help#notes),
[notes view](https://shebanq.ancient-data.org/help#notes_style),
[working with notes](https://shebanq.ancient-data.org/help#working_with_notes).

If you have a solid contribution to make, e.g. the outcome of an algorithm, consider
[bulk uploading notes](https://shebanq.ancient-data.org/help#bulk_uploading_notes).


# References

(Janet Dyk, Reinoud Oosting and Oliver Glanz, 2014) 
Analysing Valence Patterns in Biblical Hebrew: Theoretical Questions and Analytic Frameworks.
*J. of Northwest Semitic Languages, vol. 40 (2014), no. 1, pp. 43-62*.
[pdf abstract](http://academic.sun.ac.za/jnsl/Volumes/JNSL%2040%201%20abstracts%20and%20bookreview.pdf)
[pdf fulltext (author's copy with deviant page numbering)](https://shebanq.ancient-data.org/static/docs/methods/2014_Dyk_jnsl.pdf)

(Janet Dyk 2014)
Deportation or Forgiveness in Hosea 1.6? Verb Valence Patterns and Translation Proposals.
*The Bible Translator 2014, Vol. 65(3) 235–279*.
[pdf](http://tbt.sagepub.com/content/65/3/235.full.pdf?ijkey=VK2CEHvVrvSGA5B&keytype=finite)

(Janet Dyk 2014)
Traces of Valence Shift in Classical Hebrew.
In: *Discourse, Dialogue, and Debate in the Bible: Essays in Honour of Frank Polak*.
Ed. Athalya Brenner-Idan.
*Sheffield Phoenix Press, 48–65*.
[book behind pay-wall](http://www.sheffieldphoenix.com/showbook.asp?bkid=273)

# Firing up the engines

In [4]:
import sys
import collections

import laf
from laf.fabric import LafFabric
from etcbc.preprocess import prepare
fabric = LafFabric()

  0.00s This is LAF-Fabric 4.6.2
API reference: http://laf-fabric.readthedocs.org/en/latest/texts/API-reference.html
Feature doc: https://shebanq.ancient-data.org/static/docs/featuredoc/texts/welcome.html



# Loading the feature data

In [8]:
version = '4b'
API = fabric.load('etcbc{}'.format(version), 'lcomplements', 'valence', {
    "xmlids": {"node": False, "edge": False},
    "features": ('''
        oid otype monads
        function rela
        g_word_utf8 trailer_utf8
        lex prs uvf sp ls vs vt nametype det gloss
        book chapter verse label number
        function s_manual f_correction
        valence grammatical lexical semantic
    ''',
    '''
        mother
    '''),
    "prepare": prepare,
    "primary": False,
}, verbose='NORMAL')
exec(fabric.localnames.format(var='fabric'))

  0.00s LOADING API: please wait ... 
  0.00s USING main  DATA COMPILED AT: 2015-11-02T15-08-56
  0.00s USING annox DATA COMPILED AT: 2016-07-07T13-59-43
  7.94s LOGFILE=/Users/dirk/laf/laf-fabric-output/etcbc4b/valence/__log__valence.txt
  7.94s INFO: LOADING PREPARED data: please wait ... 
  7.94s prep prep: G.node_sort
  8.06s prep prep: G.node_sort_inv
  8.67s prep prep: L.node_up
    13s prep prep: L.node_down
    19s prep prep: V.verses
    19s prep prep: V.books_la
    19s ETCBC reference: http://laf-fabric.readthedocs.org/en/latest/texts/ETCBC-reference.html
    21s INFO: LOADED PREPARED data
    21s INFO: DATA LOADED FROM SOURCE etcbc4b AND ANNOX lexicon FOR TASK valence AT 2016-07-07T14-00-17


In [6]:
msg('Finding direct objects and determining the principal one')
directobjects = {}
principal_dos = {}
promoted_dos = {}
 
'''
    _promoted_direct_object
    principal_direct_object
    direct_object
    indirect_object
    copula
    copula+subject
    predication
    predication+subject
    predication+_promoted_direct_object
    predication+principal_direct_object
    predication+direct_object
    predication+indirect_object
'''

gf_direct_object = {
    '_promoted_direct_object',
    'principal_direct_object',
    'direct_object',
    'predication+_promoted_direct_object',
    'predication+principal_direct_object',
    'predication+direct_object',
}
gf_promoted_do = {
    '_promoted_direct_object',
    'predication+_promoted_direct_object',
}
gf_principal_do = {
    'principal_direct_object',
    'predication+principal_direct_object',
}

for c in F.s('clause'):    
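    for p in L.d('phrase', c):
        gf = F.grammatical.v(p)  # grammatical function label (one of the values listed above)
        pf = F.function.v(p)     # original ETCBC function label, needed by the Cmpl branch below
        if gf in gf_direct_object:
            directobjects.setdefault(c, set()).add(p)
        if gf in gf_promoted_do:
            promoted_dos.setdefault(c, set()).add(p)
        if gf in gf_principal_do:
            principal_dos.setdefault(c, set()).add(p)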

            
        elif pf == 'Cmpl':
            pwords = L.d('word', p)
            w1 = pwords[0]
            w1l = F.lex.v(w1)
            w2l = F.lex.v(pwords[1]) if len(pwords) > 1 else None
            if w1l == 'MN':
                dobjects_c.setdefault('p_MN_'+pf, set()).add(p)
                nobjects_c += 'M'
                dobjects_set_c.add(p)
            if w1l in cmpl_as_obj_preps and F.prs.v(w1) in no_prs and not (w1l == 'L' and w2l in body_parts):
                prom.append(p)
    cobjects[nobjects_c] += 1
    nprom = len(prom)
    if nprom:
        cmobjects[nprom] += 1
        promotions[c] = prom

    # find clause objects
    for ac in L.d('clause', L.u('sentence', c)):
        cr = F.rela.v(ac)
        if cr in {'Objc'} and list(C.mother.v(ac))[0] == c:
            # register the object clause itself, not the last phrase of the loop above
            dobjects.setdefault('c_'+cr, set()).add(ac)
            nobjects += 1
            dobjects_set.add(ac)
    mobjects[nobjects] += 1

    # order the objects in the natural ordering
    dobjects_order = sorted(dobjects_set, key=NK)

    # compute the principal object
    principal_object = None

    for x in [1]:
        # just one object 
        if nobjects == 1:
            theobject = list(dobjects_set)[0]
            if F.otype.v(theobject) == 'phrase': principal_object = theobject
            break
        # rule 1: suffixes
        principal_candidates = dobjects.get('p_PreO', set()) | dobjects.get('p_PtcO', set())
        if len(principal_candidates) != 0:
            principal_object = sorted(principal_candidates, key=NK)[0]
            break
        principal_candidates = dobjects.get('p_Objc', set())
        if len(principal_candidates) != 0:
            if len(principal_candidates) == 1:
                principal_object = sorted(principal_candidates, key=NK)[0]
                break
            objects_marked = set()
            objects_unmarked = set()
            for p in principal_candidates:
                if is_marked(p):
                    objects_marked.add(p)
                else:
                    objects_unmarked.add(p)
            if len(objects_marked) != 0:
                principal_object = sorted(objects_marked, key=NK)[0]
                break
            if len(objects_unmarked) != 0:
                principal_object = sorted(objects_unmarked, key=NK)[0]
                break            
    if principal_object is not None:
        primdirectobjects[c] = principal_object

    if len(dobjects_set): directobjects[c] = dobjects_set
    if len(dobjects_set_c): directobjects_c[c] = dobjects_set_c

msg('Done') 

for (label, n) in sorted(mobjects.items(), key=lambda y: -y[0]):
    print('{:<40}: {:>5}'.format('Clauses with {:>2} objects'.format(label), n))
for (label, n) in sorted(cobjects.items(), key=lambda y: (len(y[0]), y)):
    print('{:<40}: {:>5}'.format('Clauses with {:>2} implied objects'.format(label), n))
for (label, n) in sorted(cmobjects.items(), key=lambda y: -y[0]):
    print('{:<40}: {:>5}'.format('Clauses with {:>2} complements as objects'.format(label), n))

print('{:<40}: {:>5}'.format('Clauses with a principal object', len(primdirectobjects)))
print('{:<40}: {:>5}'.format('Clauses with a direct object', len(directobjects)))
print('{:<40}: {:>5}'.format('Clauses with an implied object', len(directobjects_c)))
print('{:<40}: {:>5}'.format('Clauses with a complement as object', sum(cmobjects.values())))
print('{:<40}: {:>5}'.format('Total number of clauses', len(clause_verb)))


    18s Finding direct objects and determining the primary one
    19s Done


Clauses with  3 objects                 :     3
Clauses with  2 objects                 :  1126
Clauses with  1 objects                 : 26124
Clauses with  0 objects                 : 42171
Clauses with    implied objects         : 61120
Clauses with  M implied objects         :  3360
Clauses with  R implied objects         :  4574
Clauses with MM implied objects         :     3
Clauses with RM implied objects         :   366
Clauses with RMM implied objects        :     1
Clauses with  2 complements as objects  :    32
Clauses with  1 complements as objects  :  3909
Clauses with a primary object           : 27244
Clauses with a direct object            : 27253
Clauses with an implied object          :  8304
Clauses with a complement as object     :  3941
Total number of clauses                 : 69424


# Complements: Indirect object or Locative?

The ETCBC database has no feature that marks indirect objects.
We will use computation to determine whether a complement is an indirect object or a locative.
This computation is just an approximation.

## Cues for a locative complement

* ``# loc lexemes`` how many distinct lexemes with a locative meaning occur in the complement (given by a fixed list)
* ``# topo`` how many lexemes with nametype = ``topo`` occur in the complement (nametype is a feature of the lexicon)
* ``# prep_b`` whether the preposition ``B`` occurs in the complement (1 if it does, else 0, since it is counted among the distinct lexemes)
* ``# h_loc`` how many H-locales are carried on words in the complement
* ``body_part`` is 2 if the phrase starts with the preposition ``L`` followed by a body part, else 0
* ``locativity`` ($loc$) a crude measure of the locativity of the complement, just the sum of ``# loc lexemes``, ``# topo``, ``# prep_b``, ``# h_loc`` and ``body_part``.

## Cues for an indirect object
* ``# prep_l`` how many occurrences of the preposition ``L`` or ``>L`` with a pronominal suffix on it occur in the complement
* ``# L prop`` how many occurrences of ``L`` or ``>L`` plus proper name or person reference word occur in the complement
* ``indirect object`` ($ind$) a crude indicator of whether the complement is an indirect object, just the sum of ``# prep_l`` and ``# L prop`` 
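The two scores can be tried out on toy data. The sketch below is a simplified stand-in for the computation in the cell further down: words are plain dictionaries instead of LAF-Fabric nodes, the gentilic test on the `ls` feature is left out, and the lexeme lists (`LOCATIVE_LEXEMES`, `BODY_PARTS`, `PERSONAL_LEXEMES`) and the `NO_PRS` values are small illustrative assumptions, not the lists actually used.

```python
# Illustrative stand-ins; the real lists are defined elsewhere in the notebook.
LOCATIVE_LEXEMES = {'BJT/', 'MQWM/'}
BODY_PARTS = {'PNH/', 'JD/'}
PERSONAL_LEXEMES = {'>JC/', '>B/'}
IOBJ_PREPS = {'L', '>L'}
NO_PRS = {'absent', 'n/a'}     # assumed values meaning "no pronominal suffix"


def locativity(words):
    """Sum of the locative cues for one complement (a list of word dicts)."""
    lexemes = [w['lex'] for w in words]
    lexeme_set = set(lexemes)
    loc_lexemes = len(LOCATIVE_LEXEMES & lexeme_set)
    topo = sum(1 for w in words if w.get('nametype') == 'topo')
    prep_b = 1 if 'B' in lexeme_set else 0
    h_loc = sum(1 for w in words if w.get('uvf') == 'H')
    starts_with_l_body = (
        len(lexemes) > 1 and lexemes[0] == 'L' and lexemes[1] in BODY_PARTS
    )
    body_part = 2 if starts_with_l_body else 0
    return loc_lexemes + topo + prep_b + h_loc + body_part


def indirectness(words):
    """Sum of the indirect-object cues for one complement."""
    prep_l = sum(
        1 for w in words
        if w['lex'] in IOBJ_PREPS and w.get('prs', 'absent') not in NO_PRS
    )
    l_prop = 0
    for (w, nxt) in zip(words, words[1:]):
        if w['lex'] not in IOBJ_PREPS:
            continue
        is_person_name = nxt.get('sp') == 'nmpr' and nxt.get('nametype') == 'pers'
        if nxt['lex'] in PERSONAL_LEXEMES or is_person_name:
            l_prop += 1
    return prep_l + l_prop


in_the_house = [{'lex': 'B'}, {'lex': 'H'}, {'lex': 'BJT/'}]
to_him = [{'lex': 'L', 'prs': 'W'}]
to_david = [{'lex': 'L'}, {'lex': 'DWD==/', 'sp': 'nmpr', 'nametype': 'pers'}]

for phrase in (in_the_house, to_him, to_david):
    print(locativity(phrase), indirectness(phrase))
```

With these toy inputs the first phrase only collects locative cues and the other two only indirect-object cues.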

## The decision

We take a decision as follows.
The outcome is $L$ (complement is *locative*) or $I$ (complement is *indirect object*) or $C$ (complement is neither *locative* nor *indirect object*)

(1) $ loc > 0 \wedge ind = 0 \Rightarrow L $

(2) $ loc = 0 \wedge ind > 0 \Rightarrow I $

(3) $ loc > 0 \wedge ind > 0 \wedge\ loc - 1 > ind \Rightarrow L$

(4) $ loc > 0 \wedge ind > 0 \wedge\ loc + 1 < ind \Rightarrow I$

(5) $ loc > 0 \wedge ind > 0 \wedge |ind - loc| \le 1 \Rightarrow C$

In words:

* if there are positive signals for L or I and none for the other, we choose the one for which there are positive signals;
* if there are positive signals for both L and I, we follow the majority count, but only if the difference is at least two;
* in all other cases we leave it at C: not necessarily locative and not necessarily indirect object.
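The decision rules can be written as one small function. This is a sketch that follows the same order of tests as the cell below; the name `complement_kind` is ours.

```python
def complement_kind(loc, ind):
    """Classify a complement as 'L', 'I' or 'C' from its two scores."""
    if loc == 0 and ind > 0:
        return 'I'              # rule (2): only indirect-object signals
    if loc > 0 and ind == 0:
        return 'L'              # rule (1): only locative signals
    if loc > ind + 1:
        return 'L'              # rule (3): locative wins by at least two
    if loc < ind - 1:
        return 'I'              # rule (4): indirect object wins by at least two
    return 'C'                  # rule (5), and also the case loc = ind = 0


for (loc, ind) in [(2, 0), (0, 1), (3, 1), (1, 3), (2, 1), (0, 0)]:
    print(loc, ind, complement_kind(loc, ind))
```

A complement without any signal at all (`loc = 0` and `ind = 0`) is not covered by rules (1) to (5) explicitly; it falls into the "all other cases" clause and stays `C`.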

In [7]:
msg('Determining kind of complements')

complements = collections.defaultdict(lambda: collections.defaultdict(lambda: []))
complementk = {}
kcomplements = collections.Counter()

nphrases = 0
ncomplements = 0

for c in clause_verb:
    for p in L.d('phrase', c):
        nphrases += 1
        pf = F.function.v(p)
        if pf != 'Cmpl': continue
        ncomplements += 1
        words = L.d('word', p)
        lexemes = [F.lex.v(w) for w in words]
        lexeme_set = set(lexemes)

        # measuring locativity
        lex_locativity = len(locative_lexemes & lexeme_set)
        prep_b = len([x for x in lexeme_set if x == 'B'])
        topo = len([x for x in words if F.nametype.v(x) == 'topo'])
        h_loc = len([x for x in words if F.uvf.v(x) == 'H'])
        body_part = 0
        if len(words) > 1 and F.lex.v(words[0]) == 'L' and F.lex.v(words[1]) in body_parts:
            body_part = 2
        loca = lex_locativity + topo + prep_b + h_loc + body_part

        # measuring indirect object
        prep_l = len([x for x in words if F.lex.v(x) in cmpl_as_iobj_preps and F.prs.v(x) not in no_prs])
        prep_lpr = 0
        lwn = len(words)
        for (n, wn) in enumerate(words):
            if F.lex.v(wn) in cmpl_as_iobj_preps:
                if n+1 < lwn:
                    nextw = words[n+1]
                    if F.lex.v(nextw) in personal_lexemes or F.ls.v(nextw) == 'gntl' or (
                        F.sp.v(nextw) == 'nmpr' and F.nametype.v(nextw) == 'pers'):
                        prep_lpr += 1                        
        indi = prep_l + prep_lpr

        # the verdict
        ckind = 'C'
        if loca == 0 and indi > 0: ckind = 'I'
        elif loca > 0 and indi == 0: ckind = 'L'
        elif loca > indi + 1: ckind = 'L'
        elif loca < indi - 1: ckind = 'I'
        complementk[p] = (loca, indi, ckind)
        kcomplements[ckind] += 1
        complements[c][ckind].append(p)

msg('Done')
for (label, n) in sorted(kcomplements.items(), key=lambda y: -y[1]):
    print('Phrases of kind {:<2}: {:>6}'.format(label, n))
print('Total complements : {:>6}'.format(ncomplements))
print('Total phrases     : {:>6}'.format(nphrases))

    24s Determinig kind of complements
    26s Done


Phrases of kind L :  11331
Phrases of kind C :   9671
Phrases of kind I :   7428
Total complements :  28430
Total phrases     : 212951


# Applying the flowchart

We can now apply the flowchart in a straightforward manner.

We output the results as a stand-alone comma-separated file, with the columns specified in the code below.
This file can be imported into a spreadsheet to check the results.

We also provide a tab-separated file (written with a ``.csv`` extension) that can be imported directly into SHEBANQ as a set of notes, so that the reader can check results within SHEBANQ. This has the benefit that the full context is available, and that the data view can be called up easily to inspect the coding situation for each particular instance.
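The cells below write their rows with plain string formatting. As a hedged alternative, Python's standard `csv` module quotes any field that itself contains the delimiter, which matters as soon as a gloss or sense text contains a comma. The sketch writes to in-memory buffers; the reduced column list and the row values are made up for illustration.

```python
import csv
import io

# Reduced, illustrative column list and a made-up result row.
fields = ['book', 'chapter', 'verse', 'lex', 'status', 'sense']
row = ['Genesis', 1, 1, 'NTN[', 'wrn', 'give [bread, wine] to [him]']

# Comma-separated results file: the sense text contains a comma,
# so csv.writer wraps that field in double quotes.
results = io.StringIO()
writer = csv.writer(results)
writer.writerow(fields)
writer.writerow(row)
print(results.getvalue())

# Tab-separated notes file: same idea with a different delimiter.
notes = io.StringIO()
note_writer = csv.writer(notes, delimiter='\t', lineterminator='\n')
note_writer.writerow(['version', 'book', 'chapter', 'verse', 'keywords', 'ntext'])
note_writer.writerow(['4b', 'Genesis', 1, 1, 'valence val_wrn', row[-1]])
print(notes.getvalue())
```

Reading the buffers back with `csv.reader` and the same delimiter restores the original fields, including the embedded comma. To write real files instead, open them with `newline=''` as the `csv` documentation recommends.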

In [8]:
status_rep = {
    '*': 'note',
    '!': 'good',
    '?': 'warning',
    '-': 'error',
}
stat_rep = {
    '*': 'NB',
    '!': '',
    '?': 'wrn',
    '-': 'err',
}

def reptext(label, phrases, num=False, txt=False, gl=False): 
    if phrases is None: return ''
    label_rep = '{}='.format(label) if label else ''
    phrases_rep = []
    for p in sorted(phrases, key=NK):
        ptext = '[{}|'.format(F.number.v(p) if num else '')
        if txt:
            ptext += (''.join('{}{}'.format(
                F.g_word_utf8.v(w),
                F.trailer_utf8.v(w),
            ) for w in L.d('word',p ))).replace('\n','')
        if gl:
            wtexts = []
            for w in L.d('word',p ):
                g = F.gloss.v(w).replace('<object marker>','&')
                prs = F.prs.v(w)
                prs_g = pronominal_suffix.get(prs, (None, None))[1]
                uvf = F.uvf.v(w)
                wtext = ''
                if uvf == 'H': wtext += 'toward '
                wtext += g
                wtext += ('~'+prs_g) if prs_g is not None else ''
                wtexts.append(wtext)
            ptext += ' '.join(wtexts)
        ptext += ']'
        phrases_rep.append(ptext)
    return ' '.join(phrases_rep)

def ilc_info(inds, locs, cpls):
    pinfos = []
    for p in set(inds) | set(locs) | set(cpls):
        (loca, indi, ckind) = complementk[p]
        pinfos.append('[{}| L={} I={} => {}]'.format(F.number.v(p), loca, indi, ckind))
    return ' '.join(pinfos)

def flowchart(lex, verb, dos, pdo, sdos, inds, locs, cpls):
    sense_label = None
    n_dos = len(dos)
    n_pdo = len(pdo)
    n_sdos = len(sdos)
    n_inds = len(inds)
    n_locs = len(locs)
    n_cpls = len(cpls)
    na_cpls = n_inds + n_locs + n_cpls
    ndo = ''
    kcp = ''

    if n_dos == 0: ndo = '0'
    elif n_dos == 1: ndo = '1'
    else: ndo = '2'
    
    if na_cpls == 0: kcp = '0'
    elif n_inds: kcp = 'i'
    elif n_locs: kcp = 'l'
    else: kcp = 'c'
    sense_label = ndo+kcp if ndo != '2' else '2'
    
    # look up the sense; the fallback has the same shape as a real entry: (status, [template, ...])
    sinfo = senses.get(lex, {}).get(
        sense_label, ('-', ['no sense {} given for {}'.format(sense_label, lex)]),
    )
    status = sinfo[0]
    sense_fmt = sinfo[1][0]
    action_fmt = sinfo[1][1] if len(sinfo[1]) >= 2 else ''
    action_stat = sinfo[1][2] if len(sinfo[1]) >= 3 else status

    verb_rep = reptext('', verb, num=True, gl=True)
    pdo_rep = reptext('', pdo, num=True, gl=True)
    sdos_rep = reptext('', sdos, num=True, gl=True)
    inds_rep = reptext('', inds, num=True, gl=True)
    locs_rep = reptext('', locs, num=True, gl=True)
    cpls_rep = reptext('', cpls, num=True, gl=True)
    ilc_rep = ''
    if na_cpls: ilc_rep = ilc_info(inds, locs, cpls)
    
    sense_txt = sense_fmt.format(
        verb=verb_rep, pdo=pdo_rep, sdos=sdos_rep, inds=inds_rep, locs=locs_rep, cpls=cpls_rep, ilc=ilc_rep,
    )
    action_txt = action_fmt.format(
        verb=verb_rep, pdo=pdo_rep, sdos=sdos_rep, inds=inds_rep, locs=locs_rep, cpls=cpls_rep, ilc=ilc_rep,
    )

    return (sense_label, status, sense_txt, action_txt, action_stat)

fields = '''
    book
    chapter
    verse
    sentence#
    clause#
    lex
    status
    sense_label
    sense
    action_status
    action
    #do
    #pdo
    #sdos
    #inds
    #locs
    #cpls
    text
'''.strip().split()

sfields = '''
    version
    book
    chapter
    verse
    clause_atom
    is_shared
    is_published
    status
    keywords
    ntext
'''.strip().split()

fields_fmt = ('{},' * (len(fields) - 1)) + '{}\n' 
sfields_fmt = ('{}\t' * (len(sfields) - 1)) + '{}\n' 

# Running the flowchart

The next cell finally performs all the flowchart computations for all verbs in all contexts.

In [11]:
msg('Applying the flowchart')

outcome_sta = collections.Counter()
outcome_lab = collections.Counter()
outcome_sta_l = collections.defaultdict(lambda: collections.Counter())
outcome_lab_l = collections.defaultdict(lambda: collections.Counter())

OUTBASE = 'files'
of = open('{}/{}'.format(OUTBASE, 'valence_results.csv'), 'w')
ofs = open('{}/{}'.format(OUTBASE, 'valence_notes.csv'), 'w')
of.write('{}\n'.format(','.join(fields)))
ofs.write('{}\n'.format('\t'.join(sfields)))

senses = {}
senses_blocks = senses_spec.strip().split('\n\n')
for b in senses_blocks:
    lines = b.split('\n')
    verb = lines[0]+'['
    sense_parts = [l.split(':', 2) for l in lines[1:]]
    senses[verb] = dict(
        (x[0].strip(), (x[1].strip(), [y.strip() for y in x[2].strip().split('::')])) for x in sense_parts
    )

nnotes = collections.Counter()

for lex in senses:
    if lex not in verb_clause:
        msg('No verb {} in corpus'.format(lex))
        continue
    for (c,v) in verb_clause[lex]:
        if F.vs.v(v) != 'qal': continue
    
        book = F.book.v(L.u('book', v))
        chapter = F.chapter.v(L.u('chapter', v))
        verse = F.verse.v(L.u('verse', v))
        sentence_n = F.number.v(L.u('sentence', v))
        clause_n = F.number.v(c)
        clause_atom_n = F.number.v(L.u('clause_atom', v))
        
        verb = [L.u('phrase', v)]
        dos = directobjects.get(c, set())
        pdo = primdirectobjects.get(c, None)
        pdo = set() if pdo is None else {pdo}
        sdos = sorted(dos - pdo)
        dos_can = directobjects_c.get(c, set())
        dos_can_cpls = {p for p in dos_can if F.function.v(p) == 'Cmpl'}
        dos_c = set()
        pdo_c = set()
        sdos_c = set()
        if len(dos_can):
            dos_can_lst = sorted(dos_can, key=NK)
            dos_c = dos | dos_can
            pdo_c = {dos_can_lst[0]}
            sdos_c = sorted(dos | set(dos_can_lst[1:]), key=NK)

        inds = complements.get(c, {}).get('I', [])
        locs = complements.get(c, {}).get('L', [])
        cpls = complements.get(c, {}).get('C', [])

        (sense_label, status, sense_txt, action_txt, action_stat) = flowchart(
            lex, verb, dos, pdo, sdos, inds, locs, cpls,
        )
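        if dos_c:
            # second run: count the candidates as direct objects
            # and leave the candidate Cmpl phrases out of the complement lists
            inds_c = [p for p in inds if p not in dos_can_cpls]
            locs_c = [p for p in locs if p not in dos_can_cpls]
            cpls_c = [p for p in cpls if p not in dos_can_cpls]
            (sense_label_c, status_c, sense_txt_c, action_txt_c, action_stat_c) = flowchart(
                lex, verb, dos_c, pdo_c, sdos_c, inds_c, locs_c, cpls_c)
            if status == '-' and status_c != '-':
                # only the second run ended with a status other than '-': adopt its outcome
                status = status_c
                sense_label = sense_label_c
                sense_txt = sense_txt_c
                action_txt = action_txt_c
                action_stat = action_stat_c
            elif status != '-' and status_c == '-':
                pass # only the first run ended with a status other than '-': keep its outcome
            elif status != '-' and status_c != '-':
                # both runs ended with a status other than '-': flag the clause with '*'
                # and report both outcomes; action_stat stays that of the first run
                status = '*'
                sense_label = sense_label+'|'+sense_label_c
                sense_txt = '(A) '+sense_txt+' (B) '+sense_txt_c
                action_txt = '(A) '+action_txt+' (B) '+action_txt_c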

        outcome_sta[status] += 1
        outcome_sta_l[lex][status] += 1
        outcome_lab[sense_label] += 1
        outcome_lab_l[lex][sense_label] += 1
        text = reptext('', L.d('phrase', c), num=True, txt=True)

        of.write(fields_fmt.format(
            book,
            chapter,
            verse,
            sentence_n,
            clause_n,
            lex,
            stat_rep[status],
            sense_label,
            sense_txt,
            action_stat,
            action_txt,
            len(dos),
            len(pdo),
            len(sdos),
            len(inds),
            len(locs),
            len(cpls),
            text,
        ))
        ofs.write(sfields_fmt.format(
            version,
            book,
            chapter,
            verse,
            clause_atom_n,
            'T',
            '',
            status,
            'valence'+(' val_{}'.format(stat_rep[status]) if status != '!' else ''),
            '_{sl}_ [{nm}|{vb}] {st}'.format(
                nm=F.number.v(L.u('phrase', v)),
                vb=F.g_word_utf8.v(v),
                st=sense_txt,
                sl=sense_label,
            ),
        ))
        nnotes['valence'] += 1
        if action_txt != '':
            ofs.write(sfields_fmt.format(
                version,
                book,
                chapter,
                verse,
                clause_atom_n,
                'T',
                '',
                action_stat,
                'valence'+(' val_{}'.format(stat_rep[status]) if status != '!' else ''),
                action_txt,
            ))
            nnotes['action'] += 1
            
# generate notes for the promotion candidates
            
for c in promotions:
    w1 = L.d('word', c)[0]
    book = F.book.v(L.u('book', w1))
    chapter = F.chapter.v(L.u('chapter', w1))
    verse = F.verse.v(L.u('verse', w1))
    clause_atom_n = F.number.v(L.u('clause_atom', w1))
    ps = reptext('', promotions[c], num=True, gl=True)
        
    ofs.write(sfields_fmt.format(
        version,
        book,
        chapter,
        verse,
        clause_atom_n,
        'T',
        '',
        '?',
        'valence val_prom',
        'Consider C => DO for {}'.format(ps),
    ))
    nnotes['prom'] += 1
of.close()
ofs.close()
msg('Done')

msg('Computed {} clauses with flowchart'.format(sum(outcome_sta.values())))
msg('Added notes for {} clauses with complement promotion candidates'.format(len(promotions)))
ntot = 0
for (lab, n) in sorted(nnotes.items()):
    ntot += n
    print('{:<10} notes: {}'.format(lab, n))
print('{:<10} notes: {}'.format('Total', ntot))

for lex in [''] + sorted(senses):
    print('All lexemes with flowchart specification' if lex == '' else lex)
    src_sta = outcome_sta if lex == '' else outcome_sta_l.get(lex, {})
    src_lab = outcome_lab if lex == '' else outcome_lab_l.get(lex, {})
    tot = 0
    for (x, n) in sorted(src_sta.items()):
        tot += n
        print('     Status   {:<7}: {:>4} clauses'.format(status_rep[x], n))
    print('     All status      : {:>4} clauses'.format(tot))
    tot = 0
    for (x, n) in sorted(src_lab.items()):
        tot += n
        print('     Sense    {:<7}: {:>4} clauses'.format(x, n))
    print('     All senses     : {:>4} clauses'.format(tot))
    print(' ')
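
The cell above runs the flowchart twice for clauses that have candidate direct objects and then merges the two outcomes. The merging step can be sketched in isolation as follows. This is a minimal sketch, not part of the notebook: the function name `combine_outcomes` and its parameters `plain` and `promoted` are hypothetical, and each outcome is assumed to be a tuple in the order returned by `flowchart` above.

```python
def combine_outcomes(plain, promoted):
    """Merge the outcomes of two flowchart runs on the same clause.

    Hypothetical helper: both arguments are assumed to be tuples
    (sense_label, status, sense_txt, action_txt, action_stat).
    """
    (label, status, txt, act_txt, act_stat) = plain
    (label_c, status_c, txt_c, act_txt_c, act_stat_c) = promoted

    if status == '-' and status_c != '-':
        # only the second run ended with a status other than '-'
        return promoted
    if status != '-' and status_c != '-':
        # both runs ended with a status other than '-':
        # flag with '*' and report both, keeping the first action status
        return (
            label + '|' + label_c,
            '*',
            '(A) ' + txt + ' (B) ' + txt_c,
            '(A) ' + act_txt + ' (B) ' + act_txt_c,
            act_stat,
        )
    # the second run ended with '-' (or both did): keep the first outcome
    return plain


# usage with made-up outcomes
merged = combine_outcomes(
    ('00', '+', 'sense A', '', ''),
    ('10', '+', 'sense B', '', ''),
)
print(merged[0], merged[1])
```

Combined labels such as `00|10` in the sense counts of the output come from the case where both runs deliver a sense.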

 3m 21s Applying the flowchart
 3m 22s Done
 3m 22s Computed 5788 clauses with flowchart
 3m 22s Added notes for 3941 clauses with complement promotion candidates


action     notes: 2064
prom       notes: 3941
valence    notes: 5788
Total      notes: 11793
All lexemes with flowchart specification
     Status   good   : 3682 clauses
     Status   note   :  727 clauses
     Status   error  :   45 clauses
     All status      : 5788 clauses
     Sense    00     :  543 clauses
     Sense    00|10  :  265 clauses
     Sense    0c     :  231 clauses
     Sense    0c|10  :   11 clauses
     Sense    0c|1c  :   72 clauses
     Sense    0c|2   :    3 clauses
     Sense    0i     :  309 clauses
     Sense    0i|1i  :  173 clauses
     Sense    0i|2   :    2 clauses
     Sense    0l     :  179 clauses
     Sense    0l|1l  :   53 clauses
     Sense    10     : 1401 clauses
     Sense    10|2   :   55 clauses
     Sense    1c     :  639 clauses
     Sense    1c|10  :    1 clauses
     Sense    1c|2   :   32 clauses
     Sense    1i     :  717 clauses
     Sense    1i|2   :   30 clauses
     Sense    1l     :  651 clauses
     Sense    1l|2   :   21 clauses
  