<a href="http://laf-fabric.readthedocs.org/en/latest/" target="_blank"><img align="left" src="images/laf-fabric-small.png"/></a>
<a href="http://www.persistent-identifier.nl/?identifier=urn%3Anbn%3Anl%3Aui%3A13-048i-71" target="_blank"><img align="left" src="images/DANS-small.png"/></a>
<a href="http://www.godgeleerdheid.vu.nl/etcbc" target="_blank"><img align="right" src="images/VU-ETCBC-small.png"/></a>
<a href="https://www.academic-bible.com/en/online-bibles/biblia-hebraica-stuttgartensia-bhs/read-the-bible-text/" target="_blank"><img align="right" src="files/images/DBG-small.png"/></a>

# Verbal valence

*Verbal valence* is a kind of signature of a verb, not unlike overloading in programming languages.
The meaning of a verb depends on the number and kind of its complements, i.e. the linguistic entities that act as arguments for the semantic function of the verb.

We will use a set of flowcharts to specify and compute the sense of a verb in specific contexts depending on the verbal valence. The flowcharts have been composed by Janet Dyk. Although they are not difficult to understand, it takes a good deal of ingenuity to apply them in all the real world situations that we encounter in our corpus.


# Authors

This notebook is being written by [Dirk Roorda](mailto:dirk.roorda@dans.knaw.nl) following the ideas of
[Janet Dyk](mailto:j.w.dyk@vu.nl). Janet's ideas have been published in various ways, see the references below.
They can be summarized as a set of flowcharts. Each flowchart describes a set of rules for choosing between
the senses of a specific verb based on the constituents in each context where it occurs.
The role of Dirk is to turn those ideas into a working program based on the ETCBC data.

# About

This is a [Jupyter](http://jupyter.org) notebook. It contains a working program to carry out the computations
that we need for making use of verbal valence patterns.
You can download this notebook and run it on your computer, provided you have
[LAF-Fabric](http://laf-fabric.readthedocs.org/en/latest/texts/welcome.html) installed.

There is not only code in this notebook, but also extensive documentation, and a description of how to view
the results on
[SHEBANQ](https://shebanq.ancient-data.org) as a set of *Notes*.
See the end of the notebook for precise links.

# Status

**Last modified: 2016-07-07**

This notebook is not yet finished. 
It turns out that the ETCBC data at present does not contain all bits and pieces that are needed to follow
the rules in Janet's flowcharts. It is difficult to find all direct objects, especially implied ones.
And there are many cases where the database encodes a phrase as a complement, whereas the flowchart expects it to be a direct object.

We have set up a workflow for correcting and enriching the ETCBC data. See the
[corr_enrich notebook](corr_enrich.ipynb).
There we take care that all relevant phrases get their proper *function*
labels. And we analyse those phrases and assign new properties to them, based on certain heuristics.

This flowchart notebook takes those new properties as input for determining the valencies of verbs.
 
# More about flowcharts

Here is an original flowchart by Janet, the one for NTN (*give*).

<img src="images/FlowChartNTN-orig.pdf"/>

In order to run the flowcharts, preliminary work has to be done. 
We have to 

* identify direct objects;
* divide them into principal and secondary ones if there are multiple;
* identify complements;
* divide them into locatives, indirect objects, and other complements;
* detect relativa and offer them as potential direct objects;
* detect phrases starting with MN (*from*) and offer them as potential direct objects.

These are exactly the things that we outsource to the 
[corr_enrich notebook](corr_enrich.ipynb).


# Generic flowchart

The generic flowchart rules can be read off this diagram.

<img src="images/Valence-Generic.pdf"/>

In fact, this part of the flowchart requires the most programming effort.

# Specific flowcharts

Using the generic flowchart, we state the rules for individual verbs, which can be expressed as simple
multiple choice lists. Far below in this notebook, these rules will be applied to all clauses.

As an example, this is a simplified flowchart for NTN in diagram form as we will implement it below.

<img src="images/Valence-NTN.pdf"/>

# Flowchart logic

Here is the bare logic of the flowcharts for the individual verbs.

The ``senses`` data structure is a dictionary keyed by verb lexemes.
For each verb, the value is itself a dictionary keyed by *sense labels*. A sense label is a code for the number of direct objects and the nature of the complements that are present in the context.
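
As a minimal sketch of how such a sense label can be derived: the first character encodes the number of direct objects, the second the kind of complement that is present, and two or more direct objects collapse into the single label ``2``. The function name `sense_label` and its integer parameters are illustrative assumptions, not names taken from the notebook.

```python
def sense_label(n_dos, n_inds, n_locs, n_cpls):
    """Return a sense label such as '00', '1i', '0c' or '2'.

    Hypothetical helper for illustration. The arguments are the numbers of
    direct objects, indirect objects, locatives and other complements.
    """
    if n_dos >= 2:
        return '2'        # with two direct objects the complements play no role
    ndo = str(n_dos)      # '0' or '1'
    if n_inds + n_locs + n_cpls == 0:
        kcp = '0'         # no complement at all
    elif n_inds:
        kcp = 'i'         # an indirect object takes precedence
    elif n_locs:
        kcp = 'l'         # then a locative
    else:
        kcp = 'c'         # only unclassified complements are left
    return ndo + kcp

print(sense_label(1, 1, 0, 0))   # 1i
print(sense_label(0, 0, 0, 2))   # 0c
print(sense_label(3, 0, 1, 0))   # 2
```

Note the precedence: when a clause has both an indirect object and a locative, the label only records the indirect object.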

Behind each sense label there is information about the meaning of the verb in such a context.
The meaning consists of 2 or 3 pieces of information.

The important part is the second one, the *sense template*, which consists of a gloss augmented with placeholders for the direct objects and complements.

* **{verb}** the verb occurrence in question
* **{dos}** direct objects
* **{pdos}** principal direct objects
* **{sdos}** secondary direct objects
* **{inds}** indirect objects
* **{locs}** locatives
* **{cpls}** complements, not marked as either indirect object or locative

In case there are multiple entities, the algorithm returns them chunked as phrases/clauses.
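
To illustrate how a sense template is turned into text, here is a small sketch that fills the placeholders with Python's `str.format`. The bracketed renderings of the constituents are hand-written, made-up values; in the notebook they are generated from the phrases of the clause.

```python
# Sense template for NTN[ with one direct object and an indirect object (label 1i)
template = 'give {dos} to {inds}'

# Made-up renderings of the constituents, for illustration only.
# Placeholders that do not occur in the template are simply ignored.
filled = template.format(
    verb='[1|give]',
    dos='[3|bread]',
    pdos='',
    sdos='',
    inds='[2|to~him]',
    locs='',
    cpls='',
)
print(filled)   # give [3|bread] to [2|to~him]
```

Passing all placeholders on every call keeps the calling code uniform: each template picks out only the constituents it mentions.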

Apart from the template, there is also a *status* and an optional *account*. 

The status is ``!`` in normal cases, ``?`` in dubious cases, and ``-`` in erroneous cases.
In SHEBANQ these statuses are translated into colors of the notes (blue/orange/red).

The account contains information about the grounds on which the algorithm has arrived at its conclusions.

A typical case is ``NTN[`` sense ``0c``. This verb prefers indirect objects over locatives.
So when the context has a complement that has not been classified beforehand as either locative or indirect object, this is the moment that we finally decide it is an indirect object after all.
But this is risky, so we give it status ``?`` and we tell the user that we have decided to change ``C`` into ``I`` for this complement.

Likewise, sense ``0l`` is not expected to occur. When we encounter it, we conclude that our heuristic for choosing between ``L`` and ``I`` has failed here, and we overrule that decision and change ``L`` to ``I``.
We tell the user that here we have encountered an error.
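
The specification format used in the next cell can be parsed along the following lines. This is a sketch under the assumption that blocks are separated by blank lines, that the first line of a block is the verb lexeme without its final ``[``, and that each further line reads `label:status: template :: account :: account status`, with the last two parts optional. The names `parse_senses` and `demo_spec` are illustrative.

```python
def parse_senses(spec):
    """Parse a senses specification into {verb: {label: (status, parts)}}.

    `parts` is a list: [sense template, optional account, optional account status].
    """
    senses = {}
    for block in spec.strip().split('\n\n'):
        lines = block.split('\n')
        verb = lines[0] + '['            # lexemes of verbs end in '['
        entries = {}
        for line in lines[1:]:
            # split off label and status; the rest may itself contain colons
            label, status, rest = line.split(':', 2)
            entries[label.strip()] = (
                status.strip(),
                [part.strip() for part in rest.split('::')],
            )
        senses[verb] = entries
    return senses

demo_spec = '''
NTN
0i:!: produce for; yield for; give to {inds}
0l:-: !not encountered!
1i:!: give {dos} to {inds}
'''

demo = parse_senses(demo_spec)
status, parts = demo['NTN[']['0l']
print(status, parts)
```

For the unexpected sense ``0l`` of ``NTN[`` this yields status ``-`` and a single part, the text ``!not encountered!``.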

In [1]:
senses_spec = '''
<FH
00:!: act; take action
0i:?: act; take action for {inds} :: {inds} taken as benefactive adjunct
0l:?: act; take action at {locs} :: {locs} taken as locative adjunct
0c:?: do; make; perform; observe {cpls} :: {cpls} taken as direct object
10:!: do; make; perform; observe {dos}
1i:?: do; make; perform; observe {dos} for {inds} :: {inds} taken as benefactive adjunct
1l:?: do; make; perform; observe {dos} at {locs} :: {locs} taken as locative adjunct
1c:?: make {dos} to be {cpls} :: {cpls} taken as extra direct object besides {dos}
2 :!: make {pdos} to be {sdos}

BR>
00:-: !not encountered!
0i:?: create for {inds} :: {inds} taken as benefactive adjunct
0l:?: create at {locs} :: {locs} taken as locative adjunct
0c:?: create {cpls} :: {cpls} taken as direct object
10:!: create {dos}
1i:?: create {dos} for {inds} :: {inds} taken as benefactive adjunct
1l:?: create {dos} at {locs} :: {locs} taken as locative adjunct
1c:?: create {dos} to be {cpls} :: {cpls} taken as extra direct object besides {dos}
2 :!: create {pdos} to be {sdos}

CJT
00:-: !not encountered!
0i:-: !not encountered!
0l:-: !not encountered!
0c:?: install; set up; put in place {cpls} :: {cpls} taken as direct object
10:!: install; set up; put in place {dos}
1i:?: place {dos} for the benefit of {inds} :: {inds} taken as benefactive adjunct
1l:!: place {dos} ... {locs}
1c:?: make {dos} to be {cpls} :: {cpls} taken as extra direct object besides {dos}
2 :!: make {pdos} to be {sdos}

DBQ
00:-: !not encountered!
0i:?: cling; cleave; adhere to {inds} :: {inds} taken as locative
0l:!: cling; cleave; adhere after/to {locs}
0c:?: cling; cleave; adhere to {cpls} :: {cpls} taken as locative
10:-: !not encountered! :: Should {verb} be hiphil? :: ?
1i:-: !not encountered! :: Should {verb} be hiphil? :: ?
1l:-: !not encountered! :: Should {verb} be hiphil? :: ?
1c:-: !not encountered! :: Should {verb} be hiphil? :: ?
2 :-: !not encountered! :: Should {verb} be hiphil? :: ?

FJM
00:!: prepare; put in place; make ready
0i:?: prepare; put in place; make ready for {inds} :: {inds} taken as benefactive adjunct
0l:!: make ready; prepare {locs} (specific meaning depending on preposition)
0c:?: prepare; put in place; institute {pdos} :: {cpls} taken as extra direct object besides {pdos}
10:!: prepare; put in place; institute {dos}
1i:?: prepare; put in place; institute {dos} for {inds} :: {inds} taken as benefactive adjunct
1l:!: put; place {dos} ... {locs} (specific meaning depending on preposition)
1c:?: make {dos} (to be (as)/to become/to do) {cpls} :: {cpls} taken as extra direct object besides {dos}
2 :!: make {pdos} (to be (as)/to become/to do) {sdos}

NTN
00:!: (act of) producing; yielding; giving (in itself)
0i:!: produce for; yield for; give to {inds}
0l:-: !not encountered!
0c:?: produce; yield; give {cpls} :: {cpls} taken as extra direct object besides {pdos}
10:!: produce; yield; give {dos}
1i:!: give {dos} to {inds}
1l:!: place {dos} ... {locs}
1c:?: make {dos} (to be (as)/to become/to do) {cpls} :: {cpls} taken as extra direct object besides {dos}
2 :!: make {pdos} (to be (as)/to become/to do) {sdos}

QR>
00:!: shout; call; invoke
0i:!: call; summon {inds}
0l:?: call at {locs} :: {locs} taken as locative adjunct.
0c:?: call {cpls} (content) :: {cpls} taken as direct object
10:!: call; summon {dos} (content or addressee)
1i:!: summon {dos} for {inds}
1l:!: call out {dos} before {locs}
1c:?: call {dos} (to be named) {cpls} :: {cpls} taken as extra direct object besides {dos}
2 :!: call {pdos} (to be named) {sdos}

ZQN
00:!: be old
0i:?: be old for {inds} :: {inds} taken as benefactive adjunct
0l:?: be old in {locs} :: {locs} taken as locative adjunct
0c:?: be old ... {cpls} :: {cpls} taken as adjunct
10:-: !not encountered!
1i:-: !not encountered!
1l:-: !not encountered!
1c:-: !not encountered!
2 :-: !not encountered!
'''

# Results

See the results on SHEBANQ.

The complete set of results is in the note set 
[valence](https://shebanq.ancient-data.org/hebrew/note?version=4b&id=Mnx2YWxlbmNl&tp=txt_tb1).
You can find it on the Notes page in SHEBANQ:

<img src="images/valnotes.png"/>

By checking the other note sets you *mute* them, so they do not show up among the lines.

In order to see a note set, click on its name. You then go to pages with all verses that have a note of this set attached. 

<img src="images/notesview.png"/>

In order to see the actual notes, click the comment cloud icons. If you click the upper left one, notes are fetched for all verses on the page.

<img src="images/withnotes.png"/>

You can also export the notes as csv, or view them in a chart.

The *valence* set has the following subsets:

* Unresolved results: [val_nb](https://shebanq.ancient-data.org/hebrew/note?version=4b&id=Mnx2YWxfbmI_&tp=txt_tb1);
* Uncertain results: [val_wrn](https://shebanq.ancient-data.org/hebrew/note?version=4b&id=Mnx2YWxfd3Ju&tp=txt_tb1);
* Erroneous results: [val_err](https://shebanq.ancient-data.org/hebrew/note?version=4b&id=Mnx2YWxfZXJy&tp=txt_tb1);
* Promotion candidates: [val_prom](https://shebanq.ancient-data.org/hebrew/note?version=4b&id=Mnx2YWxfcHJvbQ__&tp=txt_tb1).

So if you follow the *valence* link you see them all, but you can also focus on the problematic cases.

And if you are logged in, you can add remarks in free text. Just start typing in one of the new note boxes.
Hint: use the keyword **val_note** for your manual notes to valence, then other users can see all relevant information about valence together.

By clicking on the status symbol you can cycle through different display styles and colors for your note.
Do not forget to save when you are done!

See also the SHEBANQ help on notes:
[general](https://shebanq.ancient-data.org/help#notes),
[notes view](https://shebanq.ancient-data.org/help#notes_style),
[working with notes](https://shebanq.ancient-data.org/help#working_with_notes).

If you have a solid contribution to make, e.g. the outcome of an algorithm, consider
[bulk uploading notes](https://shebanq.ancient-data.org/help#bulk_uploading_notes).


# References

(Janet Dyk, Reinoud Oosting and Oliver Glanz, 2014) 
Analysing Valence Patterns in Biblical Hebrew: Theoretical Questions and Analytic Frameworks.
*J. of Northwest Semitic Languages, vol. 40 (2014), no. 1, pp. 43-62*.
[pdf abstract](http://academic.sun.ac.za/jnsl/Volumes/JNSL%2040%201%20abstracts%20and%20bookreview.pdf)
[pdf fulltext (author's copy with deviant page numbering)](https://shebanq.ancient-data.org/static/docs/methods/2014_Dyk_jnsl.pdf)

(Janet Dyk 2014)
Deportation or Forgiveness in Hosea 1.6? Verb Valence Patterns and Translation Proposals.
*The Bible Translator 2014, Vol. 65(3) 235–279*.
[pdf](http://tbt.sagepub.com/content/65/3/235.full.pdf?ijkey=VK2CEHvVrvSGA5B&keytype=finite)

(Janet Dyk 2014)
Traces of Valence Shift in Classical Hebrew.
In: *Discourse, Dialogue, and Debate in the Bible: Essays in Honour of Frank Polak*.
Ed. Athalya Brenner-Idan.
*Sheffield Phoenix Press, 48–65*.
[book behind pay-wall](http://www.sheffieldphoenix.com/showbook.asp?bkid=273)

# Firing up the engines

In [2]:
import sys, os
import collections

import laf
from laf.fabric import LafFabric
from etcbc.preprocess import prepare
fabric = LafFabric()

  0.00s This is LAF-Fabric 4.7.2
API reference: http://laf-fabric.readthedocs.org/en/latest/texts/API-reference.html
Feature doc: https://shebanq.ancient-data.org/static/docs/featuredoc/texts/welcome.html



# Loading the feature data

In [3]:
version = '4b'
API = fabric.load('etcbc{}'.format(version), 'lexicon,complements', 'valence', {
    "xmlids": {"node": False, "edge": False},
    "features": ('''
        oid otype monads
        JanetDyk:ft.function rela
        g_word_utf8 trailer_utf8
        lex prs uvf sp ls vs vt nametype det gloss
        book chapter verse label number
        s_manual f_correction
        valence predication grammatical original lexical semantic
    ''',
    '''
        mother
    '''),
    "prepare": prepare,
    "primary": False,
}, verbose='DETAIL')
exec(fabric.localnames.format(var='fabric'))

  0.00s LOADING API: please wait ... 
  0.00s DETAIL: COMPILING m: etcbc4b: UP TO DATE
  0.00s USING main: etcbc4b DATA COMPILED AT: 2015-11-02T15-08-56
  0.01s DETAIL: COMPILING a: complements: UP TO DATE
  0.01s USING annox: complements DATA COMPILED AT: 2016-08-25T12-59-29
  0.01s DETAIL: COMPILING a: lexicon: UP TO DATE
  0.01s USING annox: lexicon DATA COMPILED AT: 2016-07-08T14-32-54
  0.02s DETAIL: load main: G.node_anchor_min
  0.17s DETAIL: load main: G.node_anchor_max
  0.27s DETAIL: load main: G.node_sort
  0.38s DETAIL: load main: G.node_sort_inv
  0.88s DETAIL: load main: G.edges_from
  1.01s DETAIL: load main: G.edges_to
  1.15s DETAIL: load main: F.etcbc4_db_monads [node] 
  2.05s DETAIL: load main: F.etcbc4_db_oid [node] 
  2.91s DETAIL: load main: F.etcbc4_db_otype [node] 
  3.69s DETAIL: load main: F.etcbc4_ft_det [node] 
  3.98s DETAIL: load main: F.etcbc4_ft_g_word_utf8 [node] 
  4.35s DETAIL: load main: F.etcbc4_ft_lex [node] 
  4.73s DETAIL: load main: F.etcbc4_ft

# Locations

In [4]:
home_dir = os.path.expanduser('~').replace('\\', '/')
base_dir = '{}/Dropbox/SYNVAR'.format(home_dir)
result_dir = '{}/results'.format(base_dir)

# Indicators

Here we specify by what features we recognize key constituents.
We use predominantly features that come from the correction/enrichment workflow.
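
As a rough sketch of how such indicator sets are used: a phrase is assigned a role when the value of the relevant feature is a member of the corresponding set. The function `roles` and the upper-case constant names are illustrative assumptions. The feature values are the ones listed in the next cell.

```python
GF_DIRECT_OBJECT = {'principal_direct_object', 'direct_object'}
GF_INDIRECT_OBJECT = {'indirect_object'}
GF_COMPLEMENT = {'*'}
SF_LOCATIVE = {'location'}
VF_LOCATIVE = {'complement', 'adjunct'}

def roles(grammatical, semantic, valence):
    """Return the set of roles of a phrase, given three of its feature values."""
    found = set()
    if grammatical in GF_DIRECT_OBJECT:
        found.add('direct object')
    if grammatical in GF_INDIRECT_OBJECT:
        found.add('indirect object')
    if grammatical in GF_COMPLEMENT:
        found.add('complement')
    # a locative needs a semantic indicator and a valence indicator together
    if semantic in SF_LOCATIVE and valence in VF_LOCATIVE:
        found.add('locative')
    return found

print(roles('indirect_object', '', 'complement'))
print(roles('*', 'location', 'complement'))
```

The roles are not mutually exclusive: in this sketch the second phrase counts both as a complement and as a locative.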

In [8]:
# pf_... : predication feature
# gf_... : grammatical feature
# vf_... : valence feature
# sf_... : semantic feature
# of_... : original feature

pf_predicate = {
    'regular',
    'copula',
}
gf_direct_object = {
    'principal_direct_object',
    'direct_object',
}
gf_principal_do = {
    'principal_direct_object',
}
gf_indirect_object = {
    'indirect_object',
}
gf_complement = {
    '*',
}
sf_locative = {
    'location',
}
vf_locative = {
    'complement',
    'adjunct',
}

to_be = set('''
    HJH[ HWH[
'''.strip().split())

verbal_stems = set('''
    qal
'''.strip().split())

pronominal_suffix = {
    'W': ('p3-sg-m', 'him'),
    'K': ('p2-sg-m', 'you:m'),
    'J': ('p1-sg-', 'me'),
    'M': ('p3-pl-m', 'them:mm'),
    'H': ('p3-sg-f', 'her'),
    'HM': ('p3-pl-m', 'them:mm'),
    'KM': ('p2-pl-m', 'you:mm'),
    'NW': ('p1-pl-', 'us'),
    'HW': ('p3-sg-m', 'him'),
    'NJ': ('p1-sg-', 'me'),
    'K=': ('p2-sg-f', 'you:f'),
    'HN': ('p3-pl-f', 'them:ff'),
    'MW': ('p3-pl-m', 'them:mm'),
    'N': ('p3-pl-f', 'them:ff'),
    'KN': ('p2-pl-f', 'you:ff'),
}

# Making a verb-clause index

We generate an index which gives for each verb lexeme a list of clauses that have that lexeme as the main verb.
In the index we store the clause node together with the word node(s) that carries the main verb(s).

Clauses may have multiple verbs. In many cases it is 'HJH[' (or 'HWH[') plus another verb.
In those cases, it is the other verb that is the main verb.

Yet, there are also clauses with more than one main verb.
In those cases, we treat each verb separately as a main verb of one and the same clause.
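
The selection rule can be sketched on plain lists of lexemes. This is a simplification: the notebook works with word nodes, and the name `main_verbs` is an illustrative assumption.

```python
TO_BE = {'HJH[', 'HWH['}

def main_verbs(lexemes):
    """Pick the main verbs from the verb lexemes found in one clause.

    A verb 'to be' only counts as main verb when it is the sole verb.
    """
    if len(lexemes) == 1:
        return list(lexemes)
    return [lex for lex in lexemes if lex not in TO_BE]

print(main_verbs(['HJH[']))           # ['HJH[']
print(main_verbs(['HJH[', 'NTN[']))   # ['NTN[']
print(main_verbs(['HLK[', 'XSR[']))   # ['HLK[', 'XSR[']
```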

In [6]:
msg('Making the verb-clause index')
nclauses = 0
multiple = []
verb_clause = collections.defaultdict(list)
clause_verb = collections.OrderedDict()

for c in F.otype.s('clause'):
    nclauses += 1
    the_verbs = []
    for p in L.d('phrase', c):
        pf = F.predication.v(p)
        if pf in pf_predicate:
            for w in L.d('word', p):
                if F.sp.v(w) == 'verb': the_verbs.append(w)
    if len(the_verbs):
        real_verbs = []
        keep_to_be = len(the_verbs) == 1
        for v in the_verbs:
            vl = F.lex.v(v)
            if keep_to_be or (vl not in to_be): real_verbs.append(v)
        if len(real_verbs) > 1: multiple.append('{} {}:{}#{}_{} {}'.format(
            F.book.v(L.u('book', v)),
            F.chapter.v(L.u('chapter', v)),
            F.verse.v(L.u('verse', v)),
            F.number.v(L.u('sentence', v)),
            F.number.v(c),
            ' '.join(F.lex.v(x) for x in real_verbs),
        ))
        for v in real_verbs:
            vl = F.lex.v(v)
            verb_clause[vl].append((c,v))
        if len(real_verbs):
            clause_verb[c] = tuple(real_verbs)
msg('Done')
print('There are {} multiple verb clauses of total {} clauses'.format(len(multiple), nclauses))
print('\n'.join(multiple))

    11s Making the verb-clause index
    12s Done


There are 3 multiple verb clauses of total 88011 clauses
Genesis 8:5#9_1 HLK[ XSR[
Sacharia 8:10#25_1 JY>[ BW>[
Chronica_II 15:5#14_1 JY>[ BW>[


# (Indirect) Objects, Locatives

In [9]:
msg('Finding key constituents')
directobjects = {}
principal_dos = {}
secondary_dos = {}
cast_constituents = {}
indirectobjects = {}
locatives = {}
complements = {}

# go through all clauses and collect all types of direct objects
for c in F.otype.s('clause'): 
    # phrase like constituents
    directobjects[c] = set()
    principal_dos[c] = set()
    secondary_dos[c] = set()
    cast_constituents[c] = set()
    indirectobjects[c] = set()
    locatives[c] = set()
    complements[c] = set()
    for p in L.d('phrase', c):
        gf = F.grammatical.v(p)
        of = F.original.v(p)
        sf = F.semantic.v(p)
        vf = F.valence.v(p)
        if gf in gf_direct_object:
            directobjects[c].add(p)
        if gf in gf_principal_do:
            principal_dos[c].add(p)
        if gf in gf_indirect_object:
            indirectobjects[c].add(p)
        if gf in gf_complement:
            complements[c].add(p)
        if sf in sf_locative and vf in vf_locative:
            locatives[c].add(p)
        if of :
            cast_constituents[c].add(p)

    # clause like constituents: look for all Obj clauses dependent on the current clause
    for ac in L.d('clause', L.u('sentence', c)):
        cr = F.rela.v(ac)
        dep = list(C.mother.v(ac))
        if cr in {'Objc'} and len(dep) and dep[0] == c:
            directobjects.setdefault(c, set()).add(ac)

    # compute secondary objects, i.e. direct objects minus the principal one if there is one
    secondary_dos[c] = directobjects[c] - principal_dos[c]

    # order the objects in the natural ordering
    # dobjects_order = sorted(dobjects_set, key=NK)

# NB: the map directobjects has as values sets of nodes.
# These nodes can be phrases or clauses.
msg('Done') 

 1m 05s Finding key constituents
 1m 09s Done


In [10]:
testp = 647761
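# NB: the format string has only two placeholders, so the third argument
# (the text of the phrase) does not show up in the output
print('new function = {}; text={}'.format(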
        F.function.v(testp), F.JanetDyk_ft_function.v(testp), T.words(L.d('word', testp),
)))
print('valence = {}; grammatical = {}; lexical = {}; semantic = {}'.format(
    F.valence.v(testp),
    F.grammatical.v(testp),
    F.lexical.v(testp),
    F.semantic.v(testp),
))
testc = 440568
print('{}: {}'.format(F.otype.v(testc), T.words(L.d('word', testc))))

print(complements[testc])

new function = Adju; text=Adju
valence = adjunct; grammatical = NA; lexical = ; semantic = 
clause: וְיָשֵׂ֥ם לְךָ֖ שָׁלֹֽום׃ ס 

set()


# Overview of quantities

In [11]:
fc = list(F.otype.s('clause'))[0]
cast_constituents[fc]

set()

In [12]:
# Counting constituents
cnt_directobjects = collections.Counter()
cnt_clauseobjects = collections.Counter()
cnt_principal_dos = collections.Counter()
cnt_secondary_dos = collections.Counter()
cnt_indirectobjects = collections.Counter()
cnt_complements = collections.Counter()
cnt_locatives = collections.Counter()
cnt_cast_constituents = collections.Counter()

for (c, xs) in directobjects.items(): 
    cnt_directobjects[len(xs)] += 1
    nco = len({x for x in xs if F.otype.v(x) == 'clause'})
    if nco != 0: cnt_clauseobjects[nco] += 1
for (c, xs) in principal_dos.items(): cnt_principal_dos[len(xs)] += 1
for (c, xs) in secondary_dos.items(): cnt_secondary_dos[len(xs)] += 1
for (c, xs) in indirectobjects.items(): cnt_indirectobjects[len(xs)] += 1
for (c, xs) in complements.items(): cnt_complements[len(xs)] += 1
for (c, xs) in locatives.items(): cnt_locatives[len(xs)] += 1
for (c, xs) in cast_constituents.items(): cnt_cast_constituents[len(xs)] += 1

for (label, cnt_map) in (
        ('direct objects', cnt_directobjects),
        ('clause objects', cnt_clauseobjects),
        ('principal objects', cnt_principal_dos),
        ('secondary objects', cnt_secondary_dos),
        ('indirect objects', cnt_indirectobjects),
        ('complements', cnt_complements),
        ('locatives', cnt_locatives),
        ('cast constituents', cnt_cast_constituents),
    ):
    for n in sorted(cnt_map):
        print('\t {:>5} clauses with {:>2} {}'.format(cnt_map[n], n, label))

	 79528 clauses with  0 direct objects
	  7288 clauses with  1 direct objects
	  1131 clauses with  2 direct objects
	    64 clauses with  3 direct objects
	  1421 clauses with  1 clause objects
	    12 clauses with  2 clause objects
	 86827 clauses with  0 principal objects
	  1184 clauses with  1 principal objects
	 79528 clauses with  0 secondary objects
	  8408 clauses with  1 secondary objects
	    75 clauses with  2 secondary objects
	 86373 clauses with  0 indirect objects
	  1635 clauses with  1 indirect objects
	     3 clauses with  2 indirect objects
	 81625 clauses with  0 complements
	  6160 clauses with  1 complements
	   219 clauses with  2 complements
	     5 clauses with  3 complements
	     2 clauses with  4 complements
	 83699 clauses with  0 locatives
	  4169 clauses with  1 locatives
	   131 clauses with  2 locatives
	    12 clauses with  3 locatives
	 86605 clauses with  0 cast constituents
	  1381 clauses with  1 cast constituents
	    25 clauses with  2 cast cons

# Applying the flowchart

We can now apply the flowchart in a straightforward manner.

We output the results as a stand-alone, semicolon-separated file, with the columns specified in the code below.
This file can be imported into a spreadsheet to check the results.

We also provide a tab-separated file that can be imported directly into SHEBANQ as a set of notes, so that the reader can check results within SHEBANQ. This has the benefit that the full context is available, and that the data view can be called up easily to inspect the coding situation for each particular instance.
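
The code below builds its rows with format strings and hand-placed quotes. As an alternative sketch, not the method this notebook uses, Python's standard `csv` module can write a semicolon-delimited file and take care of the quoting. The column selection and the data row are made up for illustration.

```python
import csv
import io

fields = ['book', 'chapter', 'verse', 'lex', 'status', 'sense_label', 'sense']
# made-up row, for illustration only
row = ['Genesis', 1, 1, 'BR>[', '', '10', 'create [2|& heavens]']

buffer = io.StringIO()   # stands in for a file opened with open(path, 'w', newline='')
writer = csv.writer(
    buffer,
    delimiter=';',
    quoting=csv.QUOTE_NONNUMERIC,   # quote every field that is not a number
    lineterminator='\n',
)
writer.writerow(fields)
writer.writerow(row)
print(buffer.getvalue())
```

The benefit of a writer object is that a separator or quote character inside a sense text cannot break the column structure.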

In [13]:
status_rep = {
    '*': 'note',
    '!': 'good',
    '?': 'warning',
    '-': 'error',
}
stat_rep = {
    '*': 'NB',
    '!': '',
    '?': 'wrn',
    '-': 'err',
}

def reptext(label, phrases, num=False, txt=False, gloss=False, textformat='ec'): 
    if phrases is None: return ''
    label_rep = '{}='.format(label) if label else ''
    phrases_rep = []
    for p in sorted(phrases, key=NK):
        ptext = '[{}|'.format(F.number.v(p) if num else '[')
        if txt:
            #ptext += (''.join('{}{}'.format(
            #    F.g_word_utf8.v(w),
            #    F.trailer_utf8.v(w),
            #) for w in L.d('word',p ))).replace('\n','')
            ptext += T.words(L.d('word', p), fmt=textformat).replace('\n', '.')
        if gloss:
            wtexts = []
            for w in L.d('word',p ):
                g = F.gloss.v(w).replace('<object marker>','&')
                prs = F.prs.v(w)
                prs_g = pronominal_suffix.get(prs, (None, None))[1]
                uvf = F.uvf.v(w)
                wtext = ''
                if uvf == 'H': ptext += 'toward '
                wtext += g
                wtext += ('~'+prs_g) if prs_g is not None else ''
                wtexts.append(wtext)
            ptext += ' '.join(wtexts)
        ptext += ']'
        phrases_rep.append(ptext)
    return ' '.join(phrases_rep)

def flowchart(lex, verb, dos, pdos, sdos, inds, locs, cpls):
    sense_label = None
    n_dos = len(dos)
    n_pdos = len(pdos)
    n_sdos = len(sdos)
    n_inds = len(inds)
    n_locs = len(locs)
    n_cpls = len(cpls)
    na_cpls = n_inds + n_locs + n_cpls
    ndo = ''
    kcp = ''

    if n_dos == 0: ndo = '0'
    elif n_dos == 1: ndo = '1'
    else: ndo = '2'
    
    if na_cpls == 0: kcp = '0'
    elif n_inds: kcp = 'i'
    elif n_locs: kcp = 'l'
    else: kcp = 'c'
    sense_label = ndo+kcp if ndo != '2' else '2'
    
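    # Look up the sense information. The fallback has the same shape as a
    # regular entry, (status, [template, ...]), so the indexing below is safe.
    sinfo = senses.get(lex, {}).get(
        sense_label,
        ('-', ['no sense {} given for {}'.format(sense_label, lex)]),
    )
    status = sinfo[0]
    sense_fmt = sinfo[1][0]
    action_fmt = sinfo[1][1] if len(sinfo[1]) >= 2 else ''
    # the optional third part overrides the status of the action note
    action_stat = sinfo[1][2] if len(sinfo[1]) >= 3 else status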

    verb_rep = reptext('', verb, num=True, gloss=True)
    dos_rep  = reptext('', dos,  num=True, gloss=True)
    pdos_rep = reptext('', pdos, num=True, gloss=True)
    sdos_rep = reptext('', sdos, num=True, gloss=True)
    inds_rep = reptext('', inds, num=True, gloss=True)
    locs_rep = reptext('', locs, num=True, gloss=True)
    cpls_rep = reptext('', cpls, num=True, gloss=True)
    
    sense_txt = sense_fmt.format(
        verb=verb_rep, dos=dos_rep, pdos=pdos_rep, sdos=sdos_rep, inds=inds_rep, locs=locs_rep, cpls=cpls_rep,
    )
    action_txt = action_fmt.format(
        verb=verb_rep, dos=dos_rep, pdos=pdos_rep, sdos=sdos_rep, inds=inds_rep, locs=locs_rep, cpls=cpls_rep,
    )

    return (sense_label, status, sense_txt, action_txt, action_stat)

fields = '''
    book
    chapter
    verse
    sentence#
    clause#
    lex
    status
    sense_label
    sense
    action_status
    action
    #dos
    #pdos
    #sdos
    #inds
    #locs
    #cpls
    text
'''.strip().split()

sfields = '''
    version
    book
    chapter
    verse
    clause_atom
    is_shared
    is_published
    status
    keywords
    ntext
'''.strip().split()

fields_fmt = ('{};' * (len(fields) - 1)) + '{}\n' 
sfields_fmt = ('{}\t' * (len(sfields) - 1)) + '{}\n' 

# Running the flowchart

The next cell finally performs all the flowchart computations for all verbs in all contexts.

In [19]:
msg('Applying the flowchart')

outcome_sta = collections.Counter()
outcome_lab = collections.Counter()
outcome_sta_l = collections.defaultdict(collections.Counter)
outcome_lab_l = collections.defaultdict(collections.Counter)

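# the notes contain Hebrew text, so we fix the encoding explicitly
of = open('{}/{}'.format(result_dir, 'valence_results.csv'), 'w', encoding='utf8')
ofs = open('{}/{}'.format(result_dir, 'valence_notes.csv'), 'w', encoding='utf8')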
of.write('{}\n'.format(';'.join(fields)))
ofs.write('{}\n'.format('\t'.join(sfields)))

# note_keyword_base = 'valence'     # keyword of the published note set
note_keyword_base = 'test_valence'  # keyword in effect for this run

senses = {}
senses_blocks = senses_spec.strip().split('\n\n')
for b in senses_blocks:
    lines = b.split('\n')
    verb = lines[0]+'['
    sense_parts = [l.split(':', 2) for l in lines[1:]]
    senses[verb] = dict(
        (x[0].strip(), (x[1].strip(), [y.strip() for y in x[2].strip().split('::')])) for x in sense_parts
    )

nnotes = collections.Counter()

for lex in verb_clause:
    if lex not in senses:
        msg('No flowchart definition for verb {}'.format(lex))
for lex in senses:
    if lex not in verb_clause:
        msg('No verb {} in enriched corpus'.format(lex))
        continue
    for (c,v) in verb_clause[lex]:
        if F.vs.v(v) not in verbal_stems: continue
    
        book = F.book.v(L.u('book', v))
        chapter = F.chapter.v(L.u('chapter', v))
        verse = F.verse.v(L.u('verse', v))
        sentence_n = F.number.v(L.u('sentence', v))
        clause_n = F.number.v(c)
        clause_atom_n = F.number.v(L.u('clause_atom', v))
        
        verb = [L.u('phrase', v)]
        dos = directobjects[c]
        pdos = principal_dos[c]
        sdos = secondary_dos[c]
        inds = indirectobjects[c]
        locs = locatives[c]
        cpls = complements[c]
        
        (sense_label, status, sense_txt, action_txt, action_stat) = flowchart(
            lex, verb, dos, pdos, sdos, inds, locs, cpls,
        )

        outcome_sta[status] += 1
        outcome_sta_l[lex][status] += 1
        outcome_lab[sense_label] += 1
        outcome_lab_l[lex][sense_label] += 1
        text = reptext('', L.d('phrase', c), num=True, txt=True)

        of.write(fields_fmt.format(
            book,
            chapter,
            verse,
            sentence_n,
            clause_n,
            '"'+lex+'"',
            stat_rep[status],
            '"-'+sense_label+'-"',
            '"'+sense_txt+'"',
            action_stat,
            '"'+action_txt+'"',
            len(dos),
            len(pdos),
            len(sdos),
            len(inds),
            len(locs),
            len(cpls),
            '"'+text+'"',
        ))
        ofs.write(sfields_fmt.format(
            version,
            book,
            chapter,
            verse,
            clause_atom_n,
            'T',
            '',
            status,
            note_keyword_base+(' val_{}'.format(stat_rep[status]) if status != '!' else ''),
            '_{sl}_ [{nm}|{vb}] {st}'.format(
                nm=F.number.v(L.u('phrase', v)),
                vb=F.g_word_utf8.v(v),
                st=sense_txt,
                sl=sense_label,
            ),
        ))
        nnotes[note_keyword_base] += 1
        if action_txt != '':
            ofs.write(sfields_fmt.format(
                version,
                book,
                chapter,
                verse,
                clause_atom_n,
                'T',
                '',
                action_stat,
                note_keyword_base+(' val_{}'.format(stat_rep[status]) if status != '!' else ''),
                action_txt,
            ))
            nnotes['action'] += 1
            
# generate notes for the promotion candidates
            
for c in cast_constituents:
    if len(cast_constituents[c]) == 0: continue
    w1 = L.d('word', c)[0]
    book = F.book.v(L.u('book', w1))
    chapter = F.chapter.v(L.u('chapter', w1))
    verse = F.verse.v(L.u('verse', w1))
    clause_atom_n = F.number.v(L.u('clause_atom', w1))
    for p in cast_constituents[c]:
        ps = reptext('', [p], num=True, gloss=True)
        ofs.write(sfields_fmt.format(
            version,
            book,
            chapter,
            verse,
            clause_atom_n,
            'T',
            '',
            '?',
            note_keyword_base+' val_cast',
            'Cast {}: {} ==> {}'.format(ps, F.original.v(p), F.grammatical.v(p)),
        ))
        nnotes['cast'] += 1
of.close()
ofs.close()
msg('Done')

msg('Computed {} clauses with flowchart'.format(sum(outcome_sta.values())))
msg('Added notes for cast constituents')
ntot = 0
for (lab, n) in sorted(nnotes.items(), key=lambda x: x[0]):
    ntot += n
    print('{:<10} notes: {}'.format(lab, n))
print('{:<10} notes: {}'.format('Total', ntot))

for lex in [''] + sorted(senses):
    print('All lexemes with flowchart specification' if lex == '' else lex)
    src_sta = outcome_sta if lex == '' else outcome_sta_l.get(lex, {})
    src_lab = outcome_lab if lex == '' else outcome_lab_l.get(lex, {})
    tot = 0
    for (x, n) in sorted(src_sta.items()):
        tot += n
        print('     Status   {:<7}: {:>4} clauses'.format(status_rep[x], n))
    print('     All status      : {:>4} clauses'.format(tot))
    tot = 0
    for (x, n) in sorted(src_lab.items()):
        tot += n
        print('     Sense    {:<7}: {:>4} clauses'.format(x, n))
    print('     All senses     : {:>4} clauses'.format(tot))
    print(' ')

15m 57s Applying the flowchart
15m 57s No flowchart definition for verb XRP[
15m 57s No flowchart definition for verb FBR[
15m 57s No flowchart definition for verb BNH[
15m 57s No flowchart definition for verb QWM[
15m 57s No flowchart definition for verb CWB=[
15m 57s No flowchart definition for verb >RH[
15m 57s No flowchart definition for verb >MR[
15m 57s No flowchart definition for verb VBL[
15m 57s No flowchart definition for verb MWT[
15m 57s No flowchart definition for verb NYX[
15m 57s No flowchart definition for verb >KL[
15m 57s No flowchart definition for verb XRB[
15m 57s No flowchart definition for verb <MD[
15m 57s No flowchart definition for verb LHV[
15m 57s No flowchart definition for verb BW>[
15m 57s No flowchart definition for verb C<H[
15m 57s No flowchart definition for verb <MS[
15m 57s No flowchart definition for verb CQL[
15m 57s No flowchart definition for verb JRD[
15m 57s No flowchart definition for verb CLM[
15m 57s No flowchart definition for verb NWX[
15

action     notes: 957
cast       notes: 1431
test_valence notes: 5725
Total      notes: 8113
All lexemes with flowchart specification
     Status   good   : 4706 clauses
     Status   error  :   62 clauses
     All status      : 5725 clauses
     Sense    00     :  748 clauses
     Sense    0c     :  138 clauses
     Sense    0i     :  361 clauses
     Sense    0l     :  217 clauses
     Sense    10     : 1676 clauses
     Sense    1c     :  323 clauses
     Sense    1i     :  540 clauses
     Sense    1l     :  689 clauses
     Sense    2      : 1033 clauses
     All senses     : 5725 clauses
 
<FH[
     Status   good   : 1946 clauses
     All status      : 2468 clauses
     Sense    00     :  609 clauses
     Sense    0c     :   34 clauses
     Sense    0i     :  100 clauses
     Sense    0l     :   73 clauses
     Sense    10     : 1084 clauses
     Sense    1c     :   63 clauses
     Sense    1i     :  151 clauses
     Sense    1l     :  101 clauses
     Sense    2      :  253 clau