<a href="http://laf-fabric.readthedocs.org/en/latest/" target="_blank"><img align="left" src="images/laf-fabric-small.png"/></a>
<a href="http://www.persistent-identifier.nl/?identifier=urn%3Anbn%3Anl%3Aui%3A13-048i-71" target="_blank"><img align="left" src="images/DANS-small.png"/></a>
<a href="http://www.godgeleerdheid.vu.nl/etcbc" target="_blank"><img align="right" src="images/VU-ETCBC-small.png"/></a>
<a href="https://www.academic-bible.com/en/online-bibles/biblia-hebraica-stuttgartensia-bhs/read-the-bible-text/" target="_blank"><img align="right" src="files/images/DBG-small.png"/></a>

# Verbal valence

*Verbal valence* is a kind of signature of a verb, not unlike overloading in programming languages.
The meaning of a verb depends on the number and kind of its complements, i.e. the linguistic entities that act as arguments for the semantic function of the verb.

We will use a set of flowcharts to specify and compute the sense of a verb in specific contexts depending on the verbal valence. The flowcharts have been composed by Janet Dyk. Although they are not difficult to understand, it takes a good deal of ingenuity to apply them in all the real-world situations that we encounter in our corpus.


# Authors

This notebook is being written by [Dirk Roorda](dirk.roorda@dans.knaw.nl) following the ideas of 
[Janet Dyk](j.w.dyk@vu.nl). Janet's ideas have been published in various ways; see the references below.
They can be summarized as a set of flowcharts. Each flowchart describes a set of rules for choosing between
the senses of a specific verb based on the constituents in each context where it occurs.
The role of Dirk is to turn those ideas into a working program based on the ETCBC data.

# About

This is a [Jupyter](http://jupyter.org) notebook. It contains a working program to carry out the computations
that we need for making use of verbal valence patterns.
You can download this notebook and run it on your computer, provided you have
[LAF-Fabric](http://laf-fabric.readthedocs.org/en/latest/texts/welcome.html) installed.

There is not only code in this notebook, but also extensive documentation, and a description of how to view
the results on 
[SHEBANQ](https://shebanq.ancient-data.org) as a set of *Notes*.
See the end of the notebook for precise links.

# Status

**Last modified: 2016-07-07**

This notebook is not yet finished. 
It turns out that the ETCBC data at present does not contain all bits and pieces that are needed to follow
the rules in Janet's flowcharts. It is difficult to find all direct objects, especially implied ones.
And there are many cases where the database encodes a phrase as a complement, where the flowchart expects it to be a direct object.

We have set up a workflow for correcting and enriching the ETCBC data. See the
[corr_enrich notebook](corr_enrich.ipynb).
There we take care that all relevant phrases get their proper *function*
labels. And we analyse those phrases and assign new properties to them, based on certain heuristics.

This flowchart notebook takes those new properties as input for determining the valencies of verbs.
 
# More about flowcharts

Here is an original flowchart by Janet, the one for NTN (*give*).

<img src="images/FlowChartNTN-orig.pdf"/>

In order to run the flowcharts, preliminary work has to be done. 
We have to 

* identify direct objects;
* divide them into principal and secondary ones if there are multiple;
* identify complements;
* divide them into locatives, indirect objects, and other complements;
* detect relativa and offer them as potential direct objects;
* detect phrases starting with MN (*from*) and offer them as potential direct objects.

These are exactly the things that we outsource to the 
[corr_enrich notebook](corr_enrich.ipynb).


# Generic flowchart

The generic flowchart rules can be read off this diagram.

<img src="images/Valence-Generic.pdf"/>

In fact, this part of the flowchart requires the most programming effort.

# Specific flowcharts

Using the generic flowchart, we state the rules for individual verbs, which can be expressed as simple
multiple choice lists. Far below in this notebook, these rules will be applied to all clauses.

As an example, this is a simplified flowchart for NTN in diagram form as we will implement it below.

<img src="images/Valence-NTN.pdf"/>

# Flowchart logic

Here is the bare logic of the flowcharts for the individual verbs.

The ``senses`` data structure is a dictionary keyed by verb lexemes. 
For each verb it is keyed by *sense labels*; a sense label is a code for the presence and nature of the direct objects and complements in the context.

These are the possible sense labels.

The **object** column may contain:

* `n` phrase[type=NP] or phrase[type=PP] and starts with >T
* `l` phrase starting with L (but not indirect object or benefactive)
* `k` phrase starting with K
* `i` clause starting with L and having an infinitive as predicate, but not coded as `rela=Objc`
* `c` clause acting as direct object (other than `i`)
* `d` direct object present, but none of the more specific kinds above
* `-` no object present

The **complement** column may contain:

* `i` indirect object or adjunct benefactive
* `p` locative, fine distinction within these cases dependent on preposition
* `c` other complement
* `-` no complement present
* `.` presence of complement not relevant

object|complement
------|----------
`-`|`-`
`-`|`c`
`-`|`i`
`-`|`p`
`d`|`-`
`d`|`c`
`d`|`i`
`d`|`p`
`n`|`.`
`c`|`.`
`l`|`.`
`k`|`.`
`i`|`.`
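
The way a sense label is assembled from these two columns can be sketched as follows. This is a simplified illustration, not the notebook's own implementation: the function name `sense_label` and the plain dictionary of counts are assumptions made for the example, and the special handling of a principal direct object that occurs without a second object is left out. The priority order of the checks follows the flowchart code further down in this notebook.

```python
def sense_label(counts):
    """Derive a two-character sense label from constituent counts.

    `counts` maps constituent kinds (e.g. 'ndos', 'inds') to the number
    found in a clause; kinds that are missing count as zero.
    """
    def n(kind):
        return counts.get(kind, 0)

    # First character: the kind of object, checked in priority order.
    for kind, char in (('ndos', 'n'), ('cdos', 'c'), ('ldos', 'l'),
                       ('kdos', 'k'), ('idos', 'i'), ('dos', 'd')):
        if n(kind) > 0:
            char1 = char
            break
    else:
        char1 = '-'

    # Second character: complements only matter for 'd' and '-'.
    if char1 in 'nclki':
        char2 = '.'
    elif n('inds') > 0:
        char2 = 'i'
    elif n('cpls') > 0:
        char2 = 'c'
    elif n('locs') > 0:
        char2 = 'p'
    else:
        char2 = '-'
    return char1 + char2


print(sense_label({'dos': 1, 'inds': 1}))   # di
print(sense_label({'pdos': 1, 'ldos': 1}))  # l.
print(sense_label({}))                      # --
```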

Behind each sense label there is information about the meaning of the verb in such a context.
The meaning consists of 2 or 3 pieces of information.

The important part is the second one, the *sense template*, which consists of a gloss augmented with placeholders for the direct objects and complements.

* `{verb}` the verb occurrence in question
* `{dos}` direct object (phrase), used when the clause has a single object
* `{pdos}` principal direct objects (phrase)
* `{kdos}` K-objects (phrase)
* `{ldos}` L-objects (phrase)
* `{ndos}` direct objects (phrase) (none of the above)
* `{idos}` infinitive construct (clause) objects
* `{cdos}` direct objects (clause) (none of the above)
* `{inds}` indirect objects
* `{locs}` locatives
* `{cpls}` complements, not marked as either indirect object or locative

In case there are multiple entities, the algorithm returns them chunked as phrases/clauses.
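
As a small illustration of how such a template is filled in, the sketch below uses Python's `str.format` with a dictionary of placeholder values. The bracketed renderings in `fillers` are hypothetical stand-ins for what the notebook computes from the corpus; only the template itself is taken from the specification of NTN.

```python
template = 'give {dos} to {inds}'

# Hypothetical renderings of the constituents found in one clause.
fillers = {
    'verb': '[1|give]',
    'dos': '[2|& bread]',
    'inds': '[3|to~him]',
}

# Placeholders that the template does not mention are simply ignored.
filled = template.format(**fillers)
print(filled)
```

Passing all placeholder values at once keeps the templates free to mention any subset of them.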

Apart from the template, there is also a *status* and an optional *account*. 

The status is ``!`` in normal cases, ``?`` in dubious cases, and ``-`` in erroneous cases.
In SHEBANQ these statuses are translated into colors of the notes (blue/orange/red).

The account contains information about the grounds on which the algorithm has arrived at its conclusions.

A typical case is ``NTN[`` sense ``0c``. This verb prefers indirect objects to locatives.
So when the context has a complement that fails to be classified beforehand as either locative or indirect object, this is the moment that we finally decide it is an indirect object after all.
But this is risky, so we give it status ``?`` and we tell the user that we have decided to change ``C`` into ``I`` for this complement.

Likewise, sense ``0l`` is not expected to occur. When we encounter it, we conclude that our heuristic for choosing between ``L`` and ``I`` has failed here, and we overrule that decision and change ``L`` to ``I``.
We tell the user that here we have encountered an error.
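
To make the layout of a single rule concrete, here is a minimal sketch that splits one specification line into its label, status, template, and account. The helper name `parse_sense_line` is an assumption made for this example; the separators (`:` for the first two fields, `::` for the rest) are the ones used in the specification below.

```python
def parse_sense_line(line):
    """Split 'label:status: template :: account :: account status'."""
    label, status, rest = line.split(':', 2)
    parts = [part.strip() for part in rest.split('::')]
    template = parts[0]
    account = parts[1] if len(parts) >= 2 else ''
    # Without an explicit third part, the account inherits the status.
    account_status = parts[2] if len(parts) >= 3 else status.strip()
    return label.strip(), status.strip(), template, account, account_status


line = 'dc:?: make {dos} to be {cpls} :: {cpls} taken as extra direct object besides {dos}'
print(parse_sense_line(line))
```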

In [1]:
senses_spec = '''
<FH
--:!: act; take action
-i:?: act; take action for {inds} :: {inds} taken as benefactive adjunct
-p:?: act; take action at {locs} :: {locs} taken as locative adjunct
-c:?: do; make; perform; observe {cpls} :: {cpls} taken as direct object
d-:!: do; make; perform; observe {dos}
di:?: do; make; perform; observe {dos} for {inds} :: {inds} taken as benefactive adjunct
dp:?: do; make; perform; observe {dos} at {locs} :: {locs} taken as locative adjunct
dc:?: make {dos} to be {cpls} :: {cpls} taken as extra direct object besides {dos}
n.:!: make {pdos} to be {ndos}
c.:-: !not defined principal={pdos}, secundary(clause)={cdos}!
l.:!: make {pdos} to become {ldos}
k.:!: make {pdos} to be as {kdos}
i.:-: !not encountered!

BR>
--:-: !not encountered!
-i:?: create for {inds} :: {inds} taken as benefactive adjunct
-p:?: create at {locs} :: {locs} taken as locative adjunct
-c:?: create {cpls} :: {cpls} taken as direct object
d-:!: create {dos}
di:?: create {dos} for {inds} :: {inds} taken as benefactive adjunct
dp:?: create {dos} at {locs} :: {locs} taken as locative adjunct
dc:?: create {dos} to be {cpls} :: {cpls} taken as extra direct object besides {dos}
n.:!: create {pdos} to be {ndos}
c.:-: !not defined principal={pdos}, secundary(clause)={cdos}!
l.:-: !not encountered!
k.:-: !not encountered!
i.:-: !not encountered!

CJT
--:-: !not encountered!
-i:-: !not encountered!
-p:-: !not encountered!
-c:?: install; set up; put in place {cpls} :: {cpls} taken as direct object
d-:!: install; set up; put in place {dos}
di:?: place {dos} for the benefit of {inds} :: {inds} taken as benefactive adjunct
dp:!: place {dos} ... {locs}
dc:?: make {dos} to be {cpls} :: {cpls} taken as extra direct object besides {dos}
n.:!: make {pdos} to be {ndos}
c.:-: !not defined principal={pdos}, secundary(clause)={cdos}!
l.:!: make {pdos} to become {ldos}
k.:!: make {pdos} to be as {kdos}
i.:?: !specific significance!

DBQ
--:-: !not encountered!
-i:?: cling; cleave; adhere to {inds} :: {inds} taken as locative
-p:!: cling; cleave; adhere after/to {locs}
-c:?: cling; cleave; adhere to {cpls} :: {cpls} taken as locative
d-:-: !not encountered! :: Should {verb} be hiphil? :: ?
di:-: !not encountered! :: Should {verb} be hiphil? :: ?
dp:-: !not encountered! :: Should {verb} be hiphil? :: ?
dc:-: !not encountered! :: Should {verb} be hiphil? :: ?
n.:!: make {pdos} to be {ndos}
c.:-: !not defined principal={pdos}, secundary(clause)={cdos}!
l.:!: make {pdos} to become {ldos}
k.:!: make {pdos} to be as {kdos}
i.:?: !specific significance!

FJM
--:!: prepare; put in place; make ready
-i:?: prepare; put in place; make ready for {inds} :: {inds} taken as benefactive adjunct
-p:!: make ready; prepare {locs} (specific meaning depending on preposition)
-c:?: prepare; put in place; institute {pdos} :: {cpls} taken as extra direct object besides {pdos}
d-:!: prepare; put in place; institute {dos}
di:?: prepare; put in place; institute {dos} for {inds} :: {inds} taken as benefactive adjunct
dp:!: put; place {dos} ... {locs} (specific meaning depending on preposition)
dc:?: make {dos} (to be (as)/to become/to do) {cpls} :: {cpls} taken as extra direct object besides {dos}
n.:!: make {pdos} to be {ndos}
c.:-: !not defined principal={pdos}, secundary(clause)={cdos}!
l.:!: make {pdos} to become {ldos}
k.:!: make {pdos} to be as {kdos}
i.:?: be determined to do {idos}

NTN
--:!: (act of) producing; yielding; giving (in itself)
-i:!: produce for; yield for; give to {inds}
-p:-: !not encountered!
-c:?: produce; yield; give {cpls} :: {cpls} taken as extra direct object besides {pdos}
d-:!: produce; yield; give {dos}
di:!: give {dos} to {inds}
dp:!: place {dos} ... {locs}
dc:?: make {dos} (to be (as)/to become/to do) {cpls} :: {cpls} taken as extra direct object besides {dos}
n.:!: make {pdos} to be {ndos}
c.:-: !not defined principal={pdos}, secundary(clause)={cdos}!
l.:!: make {pdos} to become {ldos}
k.:!: make {pdos} to be as {kdos}
i.:!: allow {pdos} to do {idos}

QR>
--:!: shout; call; invoke
-i:!: call; summon {inds}
-p:?: call at {locs} :: {locs} taken as locative adjunct.
-c:?: call {cpls} (content) :: {cpls} taken as direct object
d-:!: call; summon {dos} (content or addressee)
di:!: summon {dos} for {inds}
dp:!: call out {dos} before {locs}
dc:?: call {dos} (to be named) {cpls} :: {cpls} taken as extra direct object besides {dos}
n.:!: call {pdos} (to be named) {ndos}
c.:-: !not defined principal={pdos}, secundary(clause)={cdos}!
l.:!: call {pdos} (to be named) {ldos}
k.:!: call {pdos} (to be named) according to {kdos}
i.:?: !specific significance!

ZQN
--:!: be old
-i:?: be old for {inds} :: {inds} taken as benefactive adjunct
-p:?: be old in {locs} :: {locs} taken as locative adjunct
-c:?: be old ... {cpls} :: {cpls} taken as adjunct
d-:-: !not encountered!
di:-: !not encountered!
dp:-: !not encountered!
dc:-: !not encountered!
n.:!: make {pdos} to be {ndos}
c.:-: !not defined principal={pdos}, secundary(clause)={cdos}!
l.:!: make {pdos} to become {ldos}
k.:!: make {pdos} to be as {kdos}
i.:?: !specific significance!
'''

# Results

See the results on SHEBANQ.

The complete set of results is in the note set 
[valence](https://shebanq.ancient-data.org/hebrew/note?version=4b&id=Mnx2YWxlbmNl&tp=txt_tb1).
You can find it on the Notes page in SHEBANQ:

<img src="images/valnotes.png"/>

By checking the other note sets you *mute* them, so they do not show up among the lines.

In order to see a note set, click on its name. You then go to pages with all verses that have a note of this set attached. 

<img src="images/notesview.png"/>

In order to see the actual notes, click the comment cloud icons. If you click the upper left one, notes are fetched for all verses on the page.

<img src="images/withnotes.png"/>

You can also export the notes as csv, or view them in a chart.

The *valence* set has the following subsets:

* Unresolved results: [val_nb](https://shebanq.ancient-data.org/hebrew/note?version=4b&id=Mnx2YWxfbmI_&tp=txt_tb1);
* Uncertain results: [val_wrn](https://shebanq.ancient-data.org/hebrew/note?version=4b&id=Mnx2YWxfd3Ju&tp=txt_tb1);
* Erroneous results: [val_err](https://shebanq.ancient-data.org/hebrew/note?version=4b&id=Mnx2YWxfZXJy&tp=txt_tb1);
* Promotion candidates: [val_prom](https://shebanq.ancient-data.org/hebrew/note?version=4b&id=Mnx2YWxfcHJvbQ__&tp=txt_tb1).

So if you follow the *valence* link you see them all, but you can also focus on the problematic cases.

And if you are logged in, you can add remarks in free text. Just start typing in one of the new note boxes.
Hint: use the keyword **val_note** for your manual notes on valence, so that other users can see all relevant information about valence together.

By clicking on the status symbol you can cycle through different display styles and colors for your note.
Do not forget to save when you are done!

See also the SHEBANQ help on notes:
[general](https://shebanq.ancient-data.org/help#notes)
[notes view](https://shebanq.ancient-data.org/help#notes_style)
[working with notes](https://shebanq.ancient-data.org/help#working_with_notes)

If you have a solid contribution to make, e.g. the outcome of an algorithm, consider
[bulk uploading notes](https://shebanq.ancient-data.org/help#bulk_uploading_notes).


# References

(Janet Dyk, Reinoud Oosting and Oliver Glanz, 2014) 
Analysing Valence Patterns in Biblical Hebrew: Theoretical Questions and Analytic Frameworks.
*J. of Northwest Semitic Languages, vol. 40 (2014), no. 1, pp. 43-62*.
[pdf abstract](http://academic.sun.ac.za/jnsl/Volumes/JNSL%2040%201%20abstracts%20and%20bookreview.pdf)
[pdf fulltext (author's copy with deviant page numbering)](https://shebanq.ancient-data.org/static/docs/methods/2014_Dyk_jnsl.pdf)

(Janet Dyk 2014)
Deportation or Forgiveness in Hosea 1.6? Verb Valence Patterns and Translation Proposals.
*The Bible Translator 2014, Vol. 65(3) 235–279*.
[pdf](http://tbt.sagepub.com/content/65/3/235.full.pdf?ijkey=VK2CEHvVrvSGA5B&keytype=finite)

(Janet Dyk 2014)
Traces of Valence Shift in Classical Hebrew.
In: *Discourse, Dialogue, and Debate in the Bible: Essays in Honour of Frank Polak*.
Ed. Athalya Brenner-Idan.
*Sheffield Phoenix Press, 48–65*.
[book behind pay-wall](http://www.sheffieldphoenix.com/showbook.asp?bkid=273)

# Firing up the engines

In [2]:
import sys, os
import collections

import laf
from laf.fabric import LafFabric
from etcbc.preprocess import prepare
fabric = LafFabric()

  0.00s This is LAF-Fabric 4.8.3
API reference: http://laf-fabric.readthedocs.org/en/latest/texts/API-reference.html
Feature doc: https://shebanq.ancient-data.org/static/docs/featuredoc/texts/welcome.html



# Loading the feature data

In [3]:
version = '4b'
API = fabric.load('etcbc{}'.format(version), 'lexicon,complements', 'valence', {
    "xmlids": {"node": False, "edge": False},
    "features": ('''
        oid otype monads
        JanetDyk:ft.function rela typ
        g_word_utf8 trailer_utf8
        lex prs uvf sp ls vs vt nametype det gloss
        book chapter verse label number
        s_manual f_correction
        valence predication grammatical original lexical semantic
    ''',
    '''
        mother
    '''),
    "prepare": prepare,
    "primary": False,
}, verbose='DETAIL')
exec(fabric.localnames.format(var='fabric'))

  0.00s LOADING API: please wait ... 
  0.00s DETAIL: COMPILING m: etcbc4b: UP TO DATE
  0.01s USING main: etcbc4b DATA COMPILED AT: 2015-11-02T15-08-56
  0.01s DETAIL: COMPILING a: complements: UP TO DATE
  0.01s USING annox: complements DATA COMPILED AT: 2016-11-11T18-07-56
  0.01s DETAIL: COMPILING a: lexicon: UP TO DATE
  0.01s USING annox: lexicon DATA COMPILED AT: 2016-07-08T14-32-54
  0.03s DETAIL: load main: G.node_anchor_min
  0.09s DETAIL: load main: G.node_anchor_max
  0.16s DETAIL: load main: G.node_sort
  0.21s DETAIL: load main: G.node_sort_inv
  0.61s DETAIL: load main: G.edges_from
  0.67s DETAIL: load main: G.edges_to
  0.73s DETAIL: load main: F.etcbc4_db_monads [node] 
  1.40s DETAIL: load main: F.etcbc4_db_oid [node] 
  2.06s DETAIL: load main: F.etcbc4_db_otype [node] 
  2.67s DETAIL: load main: F.etcbc4_ft_det [node] 
  2.87s DETAIL: load main: F.etcbc4_ft_g_word_utf8 [node] 
  3.15s DETAIL: load main: F.etcbc4_ft_lex [node] 
  3.31s DETAIL: load main: F.etcbc4_ft

# Locations

In [4]:
home_dir = os.path.expanduser('~').replace('\\', '/')
base_dir = '{}/Dropbox/SYNVAR'.format(home_dir)
result_dir = '{}/results'.format(base_dir)

# Indicators

Here we specify by what features we recognize key constituents.
We use predominantly features that come from the correction/enrichment workflow.

In [5]:
# pf_... : predication feature
# gf_... : grammatical feature
# vf_... : valence feature
# sf_... : semantic feature
# of_... : original feature

pf_predicate = {
    'regular',
}
gf_direct_object = {
    'principal_direct_object',
    'NP_direct_object',
    'direct_object',
    'L_object',
    'K_object',
    'infinitive_object',
}
gf_indirect_object = {
    'indirect_object',
}
gf_complement = {
    '*',
}
sf_locative = {
    'location',
}
vf_locative = {
    'complement',
    'adjunct',
}

verbal_stems = set('''
    qal
'''.strip().split())

pronominal_suffix = {
    'W': ('p3-sg-m', 'him'),
    'K': ('p2-sg-m', 'you:m'),
    'J': ('p1-sg-', 'me'),
    'M': ('p3-pl-m', 'them:mm'),
    'H': ('p3-sg-f', 'her'),
    'HM': ('p3-pl-m', 'them:mm'),
    'KM': ('p2-pl-m', 'you:mm'),
    'NW': ('p1-pl-', 'us'),
    'HW': ('p3-sg-m', 'him'),
    'NJ': ('p1-sg-', 'me'),
    'K=': ('p2-sg-f', 'you:f'),
    'HN': ('p3-pl-f', 'them:ff'),
    'MW': ('p3-pl-m', 'them:mm'),
    'N': ('p3-pl-f', 'them:ff'),
    'KN': ('p2-pl-f', 'you:ff'),
}

# Compiling the senses

In [6]:
slabels = '''
--
-c
-i
-p
d-
dc
di
dp
n.
c.
l.
k.
i.
'''.strip().split()

senses = {}
senses_blocks = senses_spec.strip().split('\n\n')
for b in senses_blocks:
    lines = b.split('\n')
    verb = lines[0]
    sense_parts = [l.split(':', 2) for l in lines[1:]]
    senses[verb] = dict(
        (x[0].strip(), (x[1].strip(), [y.strip() for y in x[2].strip().split('::')])) for x in sense_parts
    )
    for slabel in slabels:
        if slabel not in senses[verb]:
            msg('{:<6}: Missing sense label: {}'.format(verb, slabel))
    for slabel in sorted(senses[verb]):
        if slabel not in slabels:
            msg('{:<6}: Unknown sense label: {}'.format(verb, slabel))
inf('Senses for {} verbs:\n\t{}'.format(len(senses), '\n\t'.join(sorted(senses))))

  0.08s Senses for 8 verbs:
	<FH
	BR>
	CJT
	DBQ
	FJM
	NTN
	QR>
	ZQN


# Making a verb-clause index

We generate an index which gives for each verb lexeme a list of clauses that have that lexeme as the main verb.
In the index we store the clause node together with the word node(s) that carries the main verb(s).

Clauses may have multiple verbs. In many cases it is a copula plus another verb.
In those cases, we are interested in the other verb, so we exclude copulas.

Yet, there are also sentences with more than one main verb.
In those cases, we treat both verbs separately as main verb of one and the same clause.
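
The shape of this index can be sketched with toy data. Everything in the example is hypothetical: the tuples in `words` stand in for corpus nodes, and the boolean flag replaces the feature test that the notebook uses to select main verbs.

```python
import collections

# Hypothetical toy data: (word id, lexeme, clause id, is main predicate).
words = [
    (1, 'NTN', 100, True),
    (2, 'HJH', 100, False),   # e.g. a copula: skipped
    (3, 'QR>', 101, True),
    (4, 'NTN', 101, True),    # second main verb in the same clause
]

verb_clause = collections.defaultdict(list)  # lexeme -> [(clause, word), ...]
clause_verb = collections.defaultdict(list)  # clause -> [word, ...]

for word, lex, clause, is_main in words:
    if not is_main:
        continue
    verb_clause[lex].append((clause, word))
    clause_verb[clause].append(word)

print(dict(verb_clause))
print(dict(clause_verb))
```

Clause 101 ends up under two lexemes, which is how a clause with two main verbs gets treated once per verb.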

In [7]:
msg('Making the verb-clause index')
occs = collections.defaultdict(list)   # all verb occurrence nodes per verb lexeme (not filled in this cell)
verb_clause = collections.defaultdict(list)    # per verb lexeme: list of (clause node, verb word node) pairs
clause_verb = collections.OrderedDict() # per clause node: the word nodes of its selected verbs

for w in F.otype.s('word'):
    if F.sp.v(w) != 'verb': continue
    lex = F.lex.v(w).rstrip('[')
    if lex not in senses: continue   
    pf = F.predication.v(L.u('phrase', w))
    if pf in pf_predicate:
        cn = L.u('clause', w)
        clause_verb.setdefault(cn, []).append(w)
        verb_clause[lex].append((cn, w))
msg('Done ({} clauses with a flowchart verb)'.format(len(clause_verb)))

  0.11s Making the verb-clause index
  2.04s Done (6044 clauses with a flowchart verb)


# (Indirect) Objects, Locatives

In [8]:
msg('Finding key constituents')
constituents = collections.defaultdict(lambda: collections.defaultdict(set))
ckinds = '''
    dos pdos ndos kdos ldos idos cdos inds locs cpls
'''.strip().split()

# go through all relevant clauses and collect all types of direct objects
for c in clause_verb:
    these_constituents = collections.defaultdict(set)
    # phrase like constituents
    for p in L.d('phrase', c):
        gf = F.grammatical.v(p)
        of = F.original.v(p)
        sf = F.semantic.v(p)
        vf = F.valence.v(p)
        ckind = None
        if gf in gf_direct_object:
            if gf =='principal_direct_object':
                ckind = 'pdos'
            elif gf == 'NP_direct_object':
                ckind = 'ndos'
            elif gf == 'L_object':
                ckind = 'ldos'
            elif gf == 'K_object':
                ckind = 'kdos'
            else:
                ckind = 'dos'
        elif gf in gf_indirect_object:
            ckind = 'inds'
        elif gf in gf_complement:
            ckind = 'cpls'
        elif sf in sf_locative and vf in vf_locative:
            ckind = 'locs'
        if ckind: these_constituents[ckind].add(p)

    # clause like constituents: only look for object clauses dependent on this clause
    for ac in L.d('clause', L.u('sentence', c)):
        dep = list(C.mother.v(ac))
        if len(dep) and dep[0] == c:
            gf = F.grammatical.v(ac)
            ckind = None
            if gf in gf_direct_object:
                if gf == 'direct_object':
                    ckind = 'cdos'
                elif gf == 'infinitive_object':
                    ckind = 'idos'
            if ckind: these_constituents[ckind].add(ac)
    
    for ckind in these_constituents:
        constituents[c][ckind] |= these_constituents[ckind]

msg('Done') 

  2.09s Finding key constituents
  2.25s Done


In [9]:
testp = 647761
testc = 440568

def showcase(n):
    otype = F.otype.v(n)
    att1 = F.function.v(n) if otype == 'phrase' else F.rela.v(n)
    att2 = F.typ.v(n)
    print('''{}={} ({}-{}) {}{}'''.format(
        n, otype, att1, att2,
        T.words(L.d('word', n), fmt='ec'), 
        T.text(
            book=F.book.v(L.u('book', n)), 
            chapter=int(F.chapter.v(L.u('chapter', n))),
            verse=int(F.verse.v(L.u('verse', n))), 
            fmt='ec', lang='la',
        ),
    ))
    print('valence = {}; grammatical = {}; lexical = {}; semantic = {}\n'.format(
        F.valence.v(n),
        F.grammatical.v(n),
        F.lexical.v(n),
        F.semantic.v(n),
    ))

for n in (testp, testc): showcase(n)

647761=phrase (Adju-PP) LK Numeri 6:26	J#> JHWH PNJW >LJK WJ#M LK #LWM00

valence = adjunct; grammatical = NA; lexical = ; semantic = benefactive

440568=clause (NA-WYq0) WJ#M LK #LWM00
Numeri 6:26	J#> JHWH PNJW >LJK WJ#M LK #LWM00

valence = None; grammatical = None; lexical = None; semantic = None



# Overview of quantities

In [10]:
# Counting constituents

constituents_count = collections.defaultdict(collections.Counter)

for c in constituents:
    for ckind in ckinds:
        n = len(constituents[c][ckind])
        constituents_count[ckind][n] += 1

for ckind in ckinds:
    total = 0
    for (count, n) in sorted(constituents_count[ckind].items(), key=lambda y: -y[0]):
        if count: total += n
        inf('{:>5} clauses with {:>2} {:<10} constituents'.format(n, count, ckind), withtime=False)
    inf('{:>5} clauses with {:>2} {:<10} constituents'.format(total, 'a', ckind), withtime=False)
inf('{:>5} clauses with {:>2} flowchart verb'.format(len(clause_verb), 'a'), withtime=False)

    2 clauses with  2 dos        constituents
 3450 clauses with  1 dos        constituents
 1445 clauses with  0 dos        constituents
 3452 clauses with  a dos        constituents
  544 clauses with  1 pdos       constituents
 4353 clauses with  0 pdos       constituents
  544 clauses with  a pdos       constituents
  338 clauses with  1 ndos       constituents
 4559 clauses with  0 ndos       constituents
  338 clauses with  a ndos       constituents
   33 clauses with  1 kdos       constituents
 4864 clauses with  0 kdos       constituents
   33 clauses with  a kdos       constituents
   20 clauses with  2 ldos       constituents
  622 clauses with  1 ldos       constituents
 4255 clauses with  0 ldos       constituents
  642 clauses with  a ldos       constituents
 4897 clauses with  0 idos       constituents
    0 clauses with  a idos       constituents
    1 clauses with  2 cdos       constituents
  153 clauses with  1 cdos       constituents
 4743 clauses with  0 cdos       c

# Applying the flowchart

We can now apply the flowchart in a straightforward manner.

We output the results as a stand-alone comma-separated file, with the columns specified in the code below.
This file can be used to import into a spreadsheet and check results.
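
As an aside, rows of such a file can also be produced with Python's standard `csv` module, which takes care of quoting. The sketch below is an alternative to the format strings used in this notebook, not a description of them; the column subset and the row values are hypothetical, and the `;` delimiter is chosen to match the delimiter used in the cells below.

```python
import csv
import io

columns = ['book', 'chapter', 'verse', 'lex', 'sense_label', 'sense']
# Hypothetical result row, for illustration only.
row = ['Genesis', 1, 1, 'BR>', 'd-', 'create [2|& heaven]']

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=';', quoting=csv.QUOTE_NONNUMERIC,
                    lineterminator='\n')
writer.writerow(columns)
writer.writerow(row)
print(buffer.getvalue())
```

With `QUOTE_NONNUMERIC`, text fields are quoted and numbers are left bare, so a spreadsheet will not misread a sense label such as `d-`.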

We also provide a comma-separated file that can be imported directly into SHEBANQ as a set of notes, so that the reader can check the results within SHEBANQ. This has the benefit that the full context is available, and the data view can easily be called up to inspect the coding of each particular instance.

In [11]:
status_rep = {
    '*': 'note',
    '!': 'good',
    '?': 'warning',
    '-': 'error',
}
stat_rep = {
    '*': 'NB',
    '!': '',
    '?': 'wrn',
    '-': 'err',
}

def reptext(label, phrases, num=False, txt=False, gloss=False, textformat='ec'): 
    if phrases is None: return ''
    label_rep = '{}='.format(label) if label else ''
    phrases_rep = []
    for p in sorted(phrases, key=NK):
        ptext = '[{}|'.format(F.number.v(p) if num else '[')
        if txt:
            ptext += T.words(L.d('word', p), fmt=textformat).replace('\n', '.')
        if gloss:
            wtexts = []
            for w in L.d('word',p ):
                g = F.gloss.v(w).replace('<object marker>','&')
                prs = F.prs.v(w)
                prs_g = pronominal_suffix.get(prs, (None, None))[1]
                uvf = F.uvf.v(w)
                wtext = ''
                if uvf == 'H': ptext += 'toward '
                wtext += g
                wtext += ('~'+prs_g) if prs_g is not None else ''
                wtexts.append(wtext)
            ptext += ' '.join(wtexts)
        ptext += ']'
        phrases_rep.append(ptext)
    return ' '.join(phrases_rep)

debug_messages = collections.defaultdict(lambda: collections.defaultdict(list))

def flowchart(v, lex, verb, consts):
    sense_label = None
    n_ = collections.defaultdict(lambda: 0)
    for ckind in ckinds: n_[ckind] = len(consts[ckind])
    char1 = None
    char2 = None
    # determine char 1 of the sense label
    if n_['ndos'] > 0: char1 = 'n'
    elif n_['cdos'] > 0: char1 = 'c'
    elif n_['ldos'] > 0: char1 = 'l'
    elif n_['kdos'] > 0: char1 = 'k'
    elif n_['idos'] > 0: char1 = 'i'
    elif n_['pdos'] > 0:
        # in trouble: if there is a principal direct object, there should be an other object as well
        # and the other one should be an NP, object clause, L_object, K_object, or I_object
        # If this happens, it is probably the result of manual correction
        # We warn, and remedy
        msg_rep = '; '.join('{} {}'.format(n_[ckind], ckind) for ckind in ckinds)
        if n_['dos'] > 0:
            # there is an other object (dos should only be used if there is a single object)
            # we'll put the dos in the ndos (which was empty)
            # This could be caused by a manual enrichment sheet that has been generated 
            # before the concept of NP_direct_object had been introduced
            char1 = 'n'
            consts['ndos'] = consts['dos']
            debug_messages[lex]['pdos with dos'].append('{}: {}'.format(T.passage(v), msg_rep))
        else:
            # there is not another object, we treat this as a single object, so as a dos
            char1 = 'd'
            consts['dos'] = consts['pdos']
            debug_messages[lex]['lonely pdos'].append('{}: {}'.format(T.passage(v), msg_rep))
    elif n_['dos'] > 0:
        char1 = 'd'

    else:
        char1 = '-'
    # determine char 2 of the sense label
    if char1 in 'nclki':
        char2 = '.'
    else:
        if n_['inds'] > 0:
            char2 = 'i'
        elif n_['cpls'] > 0:
            char2 = 'c'
        elif n_['locs'] > 0:
            char2 = 'p'
        else:
            char2 = '-'

    sense_label = char1+char2
    
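    # fall back to an error status when the verb or the sense label is unknown;
    # the second member must be a list, like the values compiled into senses
    sinfo = senses.\
        get(lex, {}).\
        get(sense_label, ('-', ['no sense {} given for {}'.format(sense_label, lex)]))
    status = sinfo[0]
    sense_fmt = sinfo[1][0]
    action_fmt = sinfo[1][1] if len(sinfo[1]) >= 2 else ''
    # the optional third part of a rule overrides the status of the account
    action_stat = sinfo[1][2] if len(sinfo[1]) >= 3 else status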

    verb_rep = reptext('', verb, num=True, gloss=True)
    consts_rep = dict((ckind, reptext('', consts[ckind], num=True, gloss=True)) for ckind in consts)
        
    sense_txt = sense_fmt.format(verb=verb_rep, **consts_rep)
    action_txt = action_fmt.format(verb=verb_rep, **consts_rep)

    return (sense_label, status, sense_txt, action_txt, action_stat)

In [12]:
fields = ('''
    book
    chapter
    verse
    sentence#
    clause#
    lex
    status
    sense_label
    sense
    action_status
    action
    '''+(''.join('\n\t#{}'.format(ckind) for ckind in ckinds))+'''
    text
''').strip().split()

sfields = '''
    version
    book
    chapter
    verse
    clause_atom
    is_shared
    is_published
    status
    keywords
    ntext
'''.strip().split()

fields_fmt = ('{};' * (len(fields) - 1)) + '{}\n' 
sfields_fmt = ('{}\t' * (len(sfields) - 1)) + '{}\n' 

# Running the flowchart

The next cell finally performs all the flowchart computations for all verbs in all contexts.

In [13]:
msg('Applying the flowchart')

outcome_sta = collections.Counter()
outcome_lab = collections.Counter()
outcome_sta_l = collections.defaultdict(lambda: collections.Counter())
outcome_lab_l = collections.defaultdict(lambda: collections.Counter())

of = open('{}/{}'.format(result_dir, 'valence_results.csv'), 'w')
ofs = open('{}/{}'.format(result_dir, 'valence_notes.csv'), 'w')
of.write('{}\n'.format(';'.join(fields)))
ofs.write('{}\n'.format('\t'.join(sfields)))

note_keyword_base = 'valence'

nnotes = collections.Counter()

for lex in verb_clause:
    if lex not in senses:
        msg('No flowchart definition for verb {}'.format(lex))
for lex in senses:
    if lex not in verb_clause:
        msg('No verb {} in enriched corpus'.format(lex))
        continue
    for (c,v) in verb_clause[lex]:
        if F.vs.v(v) not in verbal_stems: continue
    
        book = F.book.v(L.u('book', v))
        chapter = F.chapter.v(L.u('chapter', v))
        verse = F.verse.v(L.u('verse', v))
        sentence_n = F.number.v(L.u('sentence', v))
        clause_n = F.number.v(c)
        clause_atom_n = F.number.v(L.u('clause_atom', v))
        
        verb = [L.u('phrase', v)]
        consts = constituents[c]
        # number of constituents of each kind in this clause
        n_ = collections.defaultdict(int)
        for ckind in ckinds:
            n_[ckind] = len(consts[ckind])
        
        (sense_label, status, sense_txt, action_txt, action_stat) = flowchart(v, lex, verb, consts)
        
        outcome_sta[status] += 1
        outcome_sta_l[lex][status] += 1
        outcome_lab[sense_label] += 1
        outcome_lab_l[lex][sense_label] += 1
        text = reptext('', L.d('phrase', c), num=True, txt=True)

        of.write(fields_fmt.format(
            book,
            chapter,
            verse,
            sentence_n,
            clause_n,
            '"'+lex+'"',
            stat_rep[status],
            '"-'+sense_label+'-"',
            '"'+sense_txt+'"',
            action_stat,
            '"'+action_txt+'"',
            *(n_[ckind] for ckind in ckinds),
            '"'+text+'"',
        ))
        ofs.write(sfields_fmt.format(
            version,
            book,
            chapter,
            verse,
            clause_atom_n,
            'T',
            '',
            status,
            note_keyword_base+(' val_{}'.format(stat_rep[status]) if status != '!' else ''),
            '_{sl}_ [{nm}|{vb}] {st}'.format(
                nm=F.number.v(L.u('phrase', v)),
                vb=F.g_word_utf8.v(v),
                st=sense_txt,
                sl=sense_label,
            ),
        ))
        nnotes[note_keyword_base] += 1
        if action_txt != '':
            ofs.write(sfields_fmt.format(
                version,
                book,
                chapter,
                verse,
                clause_atom_n,
                'T',
                '',
                action_stat,
                note_keyword_base+(' val_{}'.format(stat_rep[status]) if status != '!' else ''),
                action_txt,
            ))
            nnotes['action'] += 1
            
of.close()
ofs.close()
msg('Done')

show_limit = 20
for lex in debug_messages:
    msg(lex, withtime=False)
    for kind in debug_messages[lex]:
        msg('\t{}'.format(kind), withtime=False)
        messages = debug_messages[lex][kind]
        lm = len(messages)
        msg('\t\t{}{}'.format(
            '\n\t\t'.join(messages[0:show_limit]),
            '' if lm <= show_limit else '\n\t\tAND {} more'.format(lm-show_limit),
        ), withtime=False)

inf('Computed {} clauses with flowchart'.format(sum(outcome_sta.values())), withtime=False)
ntot = 0
for (lab, n) in sorted(nnotes.items()):
    ntot += n
    print('{:<10} notes: {}'.format(lab, n))
print('{:<10} notes: {}'.format('Total', ntot))

for lex in [''] + sorted(senses):
    print('All lexemes with flowchart specification' if lex == '' else lex)
    # fall back to an empty Counter for lexemes without any computed clause
    src_sta = outcome_sta if lex == '' else outcome_sta_l.get(lex, collections.Counter())
    src_lab = outcome_lab if lex == '' else outcome_lab_l.get(lex, collections.Counter())
    tot = 0
    for (x, n) in src_sta.items():
        tot += n
        print('     Status   {:<7}: {:>4} clauses'.format(status_rep[x], n))
    print('     All status      : {:>4} clauses'.format(tot))
    tot = 0
    for x in slabels:
        n = src_lab[x]
        tot += n
        print('     Sense    {:<7}: {:>4} clauses'.format(x, n))
    print('     All senses      : {:>4} clauses'.format(tot))
    print(' ')

  2.90s Applying the flowchart
  3.66s No verb DBQ in enriched corpus
  3.66s No verb ZQN in enriched corpus
  4.38s Done
FJM
	lonely pdos
		Genesis 4:15: 0 dos; 1 pdos; 0 ndos; 0 kdos; 0 ldos; 0 idos; 0 cdos; 0 inds; 0 locs; 0 cpls
		Genesis 30:41: 0 dos; 1 pdos; 0 ndos; 0 kdos; 0 ldos; 0 idos; 0 cdos; 0 inds; 0 locs; 2 cpls
		Genesis 45:7: 0 dos; 1 pdos; 0 ndos; 0 kdos; 0 ldos; 0 idos; 0 cdos; 0 inds; 0 locs; 1 cpls
		Exodus 1:11: 0 dos; 1 pdos; 0 ndos; 0 kdos; 0 ldos; 0 idos; 0 cdos; 0 inds; 0 locs; 1 cpls
		Exodus 9:5: 0 dos; 1 pdos; 0 ndos; 0 kdos; 0 ldos; 0 idos; 0 cdos; 0 inds; 0 locs; 0 cpls
		Deuteronomy 12:5: 0 dos; 1 pdos; 0 ndos; 0 kdos; 0 ldos; 0 idos; 0 cdos; 0 inds; 0 locs; 1 cpls
		Deuteronomy 22:17: 0 dos; 1 pdos; 0 ndos; 0 kdos; 0 ldos; 0 idos; 0 cdos; 0 inds; 0 locs; 0 cpls
		1_Samuel 8:5: 0 dos; 1 pdos; 0 ndos; 0 kdos; 0 ldos; 0 idos; 0 cdos; 0 inds; 0 locs; 0 cpls
		1_Samuel 8:12: 0 dos; 1 pdos; 0 ndos; 0 kdos; 0 ldos; 0 idos; 0 cdos; 0 inds; 0 locs; 0 cpls
		2_Sam

Computed 5727 clauses with flowchart
action     notes: 1197
valence    notes: 5727
Total      notes: 6924
All lexemes with flowchart specification
     Status   error  :  159 clauses
     Status   good   : 4371 clauses
     All status      : 5727 clauses
     Sense    --     : 1044 clauses
     Sense    -c     :  249 clauses
     Sense    -i     :  259 clauses
     Sense    -p     :   51 clauses
     Sense    d-     : 1587 clauses
     Sense    dc     :  816 clauses
     Sense    di     :  380 clauses
     Sense    dp     :   74 clauses
     Sense    n.     :  491 clauses
     Sense    c.     :  153 clauses
     Sense    l.     :  590 clauses
     Sense    k.     :   33 clauses
     Sense    i.     :    0 clauses
     All senses      : 5727 clauses
 
<FH
     Status   error  :  119 clauses
     Status   good   : 2230 clauses
     All status      : 2468 clauses
     Sense    --     :  907 clauses
     Sense    -c     :    6 clauses
     Sense    -i     :    0 clauses
     Sense    -p   