<img align="right" src="images/tf-small.png"/>

# Verbal valence

*Verbal valence* is a kind of signature of a verb, not unlike overloading in programming languages.
The meaning of a verb depends on the number and kind of its complements, i.e. the linguistic entities that act as arguments for the semantic function of the verb.
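
The overloading analogy can be made concrete with a small sketch. It assumes a hypothetical helper `sense_of_ntn` that receives the constituents as keyword arguments. The notebook itself derives them from annotated phrases and clauses. The glosses are copied from the NTN entry in the `senses_spec` cell further down. Letting an indirect object take precedence over a locative is an assumption of this sketch only.

```python
# Sense templates for NTN (give), copied from the senses_spec cell.
NTN_SENSES = {
    '--': '(act of) producing; yielding; giving (in itself)',
    '-i': 'produce for; yield for; give to {inds}',
    'd-': 'produce; yield; give {dos}',
    'di': 'give {dos} to {inds}',
    'dp': 'place {dos} {locs}',
}


def sense_of_ntn(dos=None, inds=None, locs=None):
    """Pick a sense by the 'signature' of the arguments that are present.

    Hypothetical helper for illustration; not part of the notebook.
    """
    char1 = 'd' if dos else '-'
    # Assumption: an indirect object wins over a locative.
    char2 = 'i' if inds else 'p' if locs else '-'
    template = NTN_SENSES.get(char1 + char2, '!not encountered!')
    return template.format(dos=dos, inds=inds, locs=locs)


print(sense_of_ntn(dos='bread', inds='the man'))
print(sense_of_ntn(dos='stone', locs='on the altar'))
```

The same verb yields *give* or *place* depending on which arguments accompany it, much like an overloaded function selects an implementation by its argument list.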

We will use a set of flowcharts to specify and compute the sense of a verb in specific contexts depending on the verbal valence. The flowcharts have been composed by Janet Dyk. Although they are not difficult to understand, it takes a good deal of ingenuity to apply them in all the real-world situations that we encounter in our corpus.


# Authors

This notebook is being written by [Dirk Roorda](dirk.roorda@dans.knaw.nl) following the ideas of 
[Janet Dyk](j.w.dyk@vu.nl). Janet's ideas have been published in various ways, see the references below.
They can be summarized as a set of flowcharts. Each flowchart describes a set of rules for choosing between
the senses of a specific verb based on the constituents in each context where it occurs.
The role of Dirk is to turn those ideas into a working program based on the ETCBC data.

# About

This is a [Jupyter](http://jupyter.org) notebook. It contains a working program to carry out the computations
that we need for making use of verbal valence patterns.
You can download this notebook and run it on your computer, provided you have
[LAF-Fabric](http://laf-fabric.readthedocs.org/en/latest/texts/welcome.html) installed.

There is not only code in this notebook, but also extensive documentation, and a description of how to view
the results on 
[SHEBANQ](https://shebanq.ancient-data.org) as a set of *Notes*.
See the end of the notebook for precise links.

# Status

**Last modified: 2016-11-15**

This notebook is not yet finished. 
It turns out that the ETCBC data at present does not contain all bits and pieces that are needed to follow
the rules in Janet's flowcharts. It is difficult to find all direct objects, especially implied ones.
And there are many cases where the database encodes a phrase as a complement, where the flowchart expects it to be a direct object.

We have set up a workflow for correcting and enriching the ETCBC data. See the
[corr_enrich notebook](corrEnrich.ipynb).
There we take care that all relevant phrases get their proper *function*
labels. And we analyse those phrases and assign new properties to them, based on certain heuristics.

This flowchart notebook takes those new properties as input for determining the valencies of verbs.
 
# More about flowcharts

Here is an original flowchart by Janet, the one for NTN (*give*).

<img src="images/FlowChartNTN-orig.pdf"/>

In order to run the flowcharts, preliminary work has to be done. 
We have to 

* identify direct objects;
* divide them into principal and secondary ones if there are multiple;
* identify complements;
* divide them into locatives, indirect objects, and other complements;
* detect relativa and offer them as potential direct objects;
* detect phrases starting with MN (*from*) and offer them as potential direct objects.

These are exactly the things that we outsource to the 
[corr_enrich notebook](corr_enrich.ipynb).


# Generic flowchart

The generic flowchart rules can be read off this diagram.

<img src="images/Valence/Valence.001.jpeg"/>

In fact, this part of the flowchart requires the most programming effort.

# Specific flowcharts

Using the generic flowchart, we state the rules for individual verbs, which can be expressed as simple
multiple choice lists. Far below in this notebook, these rules will be applied to all clauses.

As an example, this is a simplified flowchart for NTN in diagram form as we will implement it below.

<img src="images/Valence/Valence.002.jpeg"/>

# Flowchart logic

Here is the bare logic of the flowcharts for the individual verbs.

The ``senses`` data structure is a dictionary keyed by verb lexemes. 
For each verb it is keyed by *sense labels*; a sense label is a code for the presence and nature of the direct objects and complements in the context.

These are the possible sense labels.

The **object** column may contain:

* in case there is a principal direct object and other objects or object-like constituents,
  the first letter indicates the kind of the non-principal object
  * `n` phrase[type=NP] or phrase[type=PP] and starts with >T
  * `c` direct object in the shape of a clause (other than the `i` case below)
  * `l` phrase starting with L (but not indirect object or benefactive)
  * `k` phrase starting with K
  * `i` clause starting with L and having an infinitive as predicate, 
    but not coded as `rela=Objc`
* in case there is a single direct object:
  * `d` (maybe in the shape of a phrase or a clause)
* in case of no objects:
  * `-`

The **complement** column may contain:

* `i` indirect object
* `b` adjunct benefactive
* `p` locative, there is a fine distinction within these cases dependent on preposition
* `c` other complement
* `-` no complement present
* `.` presence of complement not relevant

object|complement
------|----------
`-`|`-`
`-`|`c`
`-`|`i`
`-`|`b`
`-`|`p`
`d`|`-`
`d`|`c`
`d`|`i`
`d`|`b`
`d`|`p`
`n`|`.`
`c`|`.`
`l`|`.`
`k`|`.`
`i`|`.`

Behind each sense label there is information about the meaning of the verb in such a context.
The meaning consists of 2 or 3 pieces of information.

The important part is the second one, the *sense template*, which consists of a gloss augmented with placeholders for the direct objects and complements.

* `{verb}` the verb occurrence in question
* `{pdos}` principal direct objects (phrase)
* `{kdos}` K-objects (phrase)
* `{ldos}` L-objects (phrase)
* `{ndos}` direct objects (phrase) (none of the above)
* `{idos}` infinitive construct (clause) objects
* `{cdos}` direct objects (clause) (none of the above)
* `{inds}` indirect objects
* `{bens}` benefactive adjuncts
* `{locs}` locatives
* `{cpls}` complements, not marked as either indirect object or locative

In case there are multiple entities, the algorithm returns them chunked as phrases/clauses.

Apart from the template, there is also a *status* and an optional *account*. 

The status is ``!`` in normal cases, ``?`` in dubious cases, and ``-`` in erroneous cases.
In SHEBANQ these statuses are translated into colors of the notes (blue/orange/red).

The account contains information about the grounds on which the algorithm has arrived at its conclusions.
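
To make the format concrete, here is a minimal sketch of how one line of the specification splits into label, status, template and account, and how the template is filled. The function name `parse_sense_line` and the filler strings are hypothetical. The notebook does its own parsing in the cell under *Compiling the senses*. The status names are the ones the notebook uses when it reports results.

```python
# Status symbols and their names, as used later in the notebook.
STATUS_NAME = {'!': 'good', '?': 'warning', '-': 'error'}


def parse_sense_line(line):
    """Split 'label:status: template :: account' into its parts.

    Hypothetical helper for illustration only.
    """
    # Only the first two colons separate fields; later ones belong to the text.
    label, status, rest = (part.strip() for part in line.split(':', 2))
    # The template comes first; zero or more account pieces follow after '::'.
    template, *account = [piece.strip() for piece in rest.split('::')]
    return label, status, template, account


line = 'di:?: do; make; perform; observe {dos} for {inds} :: {inds} taken as benefactive adjunct'
label, status, template, account = parse_sense_line(line)

# Made-up fillers; the notebook fills in glossed phrases from the corpus.
rendered = template.format(dos='[the word]', inds='[the man]')
print(label, STATUS_NAME[status], rendered, account, sep=' | ')
```

Splitting on the first two colons only keeps any later colons in the gloss intact, and the `::` separator allows more than one account piece per line.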

In [1]:
senses_spec = '''
<FH
--:!: act; take action
-i:?: act; take action for {inds} :: {inds} taken as benefactive adjunct
-b:!: act; take action (for the benefit of) {bens}
-p:?: act; take action at {locs} :: {locs} taken as locative adjunct
-c:?: do; make; perform; observe {cpls} :: {cpls} taken as direct object
d-:!: do; make; perform; observe {dos}
di:?: do; make; perform; observe {dos} for {inds} :: {inds} taken as benefactive adjunct
db:!: do; make; perform; observe {dos} (for the benefit of) {bens}
dp:?: do; make; perform; observe {dos} at {locs} :: {locs} taken as locative adjunct
dc:?: make {dos} to be {cpls} :: {cpls} taken as extra direct object besides {dos}
n.:!: make {pdos} to be {ndos}
c.:-: !not defined principal={pdos}, secundary(clause)={cdos}!
l.:!: make {pdos} to become {ldos}
k.:!: make {pdos} to be as {kdos}
i.:-: !not encountered!

BR>
--:-: !not encountered!
-i:?: create for {inds} :: {inds} taken as benefactive adjunct
-b:!: create (for the benefit of) {bens}
-p:?: create at {locs} :: {locs} taken as locative adjunct
-c:?: create {cpls} :: {cpls} taken as direct object
d-:!: create {dos}
di:?: create {dos} for {inds} :: {inds} taken as benefactive adjunct
db:!: create {dos} (for the benefit of) {bens}
dp:?: create {dos} at {locs} :: {locs} taken as locative adjunct
dc:?: create {dos} to be {cpls} :: {cpls} taken as extra direct object besides {dos}
n.:!: create {pdos} to be {ndos}
c.:-: !not defined principal={pdos}, secundary(clause)={cdos}!
l.:-: !not encountered!
k.:-: !not encountered!
i.:-: !not encountered!

CJT
--:-: !not encountered!
-i:-: !not encountered!
-b:!: install (for the benefit of) {bens}
-p:-: !not encountered!
-c:?: install; set up; put in place {cpls} :: {cpls} taken as direct object
d-:!: install; set up; put in place {dos}
di:?: place {dos} for the benefit of {inds} :: {inds} taken as benefactive adjunct
db:!: place {dos} (for the benefit of) {bens}
dp:!: place {dos} {locs}
dc:?: make {dos} to be {cpls} :: {cpls} taken as extra direct object besides {dos}
n.:!: make {pdos} to be {ndos}
c.:-: !not defined principal={pdos}, secundary(clause)={cdos}!
l.:!: make {pdos} to become {ldos}
k.:!: make {pdos} to be as {kdos}
i.:?: !specific significance!

DBQ
--:-: !not encountered!
-i:?: cling; cleave; adhere to {inds} :: {inds} taken as locative
-b:-: !not encountered!
-p:!: cling; cleave; adhere after/to {locs}
-c:?: cling; cleave; adhere to {cpls} :: {cpls} taken as locative
d-:-: !not encountered! :: Should {verb} be hiphil? :: ?
di:-: !not encountered! :: Should {verb} be hiphil? :: ?
db:-: !not encountered! :: Should {verb} be hiphil? :: ?
dp:-: !not encountered! :: Should {verb} be hiphil? :: ?
dc:-: !not encountered! :: Should {verb} be hiphil? :: ?
n.:!: make {pdos} to be {ndos}
c.:-: !not defined principal={pdos}, secundary(clause)={cdos}!
l.:!: make {pdos} to become {ldos}
k.:!: make {pdos} to be as {kdos}
i.:?: !specific significance!

FJM
--:!: prepare; put in place; institute
-i:?: prepare; put in place; for {inds} :: {inds} taken as benefactive adjunct
-b:!: prepare; put in place; (for the benefit of) {bens}
-p:!: place {locs}
-c:?: prepare; put in place; institute {pdos} :: {cpls} taken as extra direct object besides {pdos}
d-:!: prepare; put in place; institute {dos}
di:?: prepare; put in place; institute {dos} for {inds} :: {inds} taken as benefactive adjunct
db:!: prepare; put in place; institute {dos} (for the benefit of) {bens}
dp:!: put; place {dos} {locs}
dc:?: make {dos} (to be) {cpls} :: {cpls} taken as extra direct object besides {dos}
n.:!: make {pdos} to be {ndos}
c.:-: set {pdos} to {cdos}
l.:!: make {pdos} to become {ldos}
k.:!: make {pdos} to be as {kdos}
i.:?: be determined to do {idos}

NTN
--:!: (act of) producing; yielding; giving (in itself)
-i:!: produce for; yield for; give to {inds}
-b:-: !not encountered!
-p:-: !not encountered!
-c:?: produce; yield; give {cpls} :: {cpls} taken as extra direct object besides {pdos}
d-:!: produce; yield; give {dos}
di:!: give {dos} to {inds}
db:-: !not encountered!
dp:!: place {dos} {locs}
dc:?: make {dos} (to be (as)/to become/to do) {cpls} :: {cpls} taken as extra direct object besides {dos}
n.:!: make {pdos} to be {ndos}
c.:-: !not defined principal={pdos}, secundary(clause)={cdos}!
l.:!: make {pdos} to become {ldos}
k.:!: make {pdos} to be as {kdos}
i.:!: allow {pdos} to do {idos}

QR>
--:!: shout; call; invoke
-i:!: call; summon {inds}
-b:-: !not encountered!
-p:?: call at {locs} :: {locs} taken as locative adjunct.
-c:?: call {cpls} (content) :: {cpls} taken as direct object
d-:!: call; summon {dos} (content or addressee)
di:!: summon {dos} for {inds}
db:-: !not encountered!
dp:!: call out {dos} before {locs}
dc:?: call {dos} (to be named) {cpls} :: {cpls} taken as extra direct object besides {dos}
n.:!: call {pdos} (to be named) {ndos}
c.:-: !not defined principal={pdos}, secundary(clause)={cdos}!
l.:!: call {pdos} (to be named) {ldos}
k.:!: call {pdos} (to be named) according to {kdos}
i.:?: !specific significance!

ZQN
--:!: be old
-i:?: be old for {inds} :: {inds} taken as benefactive adjunct
-b:!: be old (for the benefit of) {bens}
-p:?: be old in {locs} :: {locs} taken as locative adjunct
-c:?: be old ... {cpls} :: {cpls} taken as adjunct
d-:-: !not encountered!
di:-: !not encountered!
di:-: !not encountered!
db:-: !not encountered!
dc:-: !not encountered!
n.:!: make {pdos} to be {ndos}
c.:-: !not defined principal={pdos}, secundary(clause)={cdos}!
l.:!: make {pdos} to become {ldos}
k.:!: make {pdos} to be as {kdos}
i.:?: !specific significance!
'''

# Results

See the results on SHEBANQ.

The complete set of results is in the note set 
[valence](https://shebanq.ancient-data.org/hebrew/note?version=4b&id=Mnx2YWxlbmNl&tp=txt_tb1).
You can find it on the Notes page in SHEBANQ:

<img src="images/valnotes.png"/>

By checking the other note sets you *mute* them, so they do not show up among the lines.

In order to see a note set, click on its name. You then go to pages with all verses that have a note of this set attached. 

<img src="images/notesview.png"/>

In order to see the actual notes, click the comment cloud icons. If you click the upper left one, notes are fetched for all verses on the page.

<img src="images/withnotes.png"/>

You can also export the notes as csv, or view them in a chart.

The *valence* set has the following subsets:

* Unresolved results: [val_nb](https://shebanq.ancient-data.org/hebrew/note?version=4b&id=Mnx2YWxfbmI_&tp=txt_tb1);
* Uncertain results: [val_wrn](https://shebanq.ancient-data.org/hebrew/note?version=4b&id=Mnx2YWxfd3Ju&tp=txt_tb1);
* Erroneous results: [val_err](https://shebanq.ancient-data.org/hebrew/note?version=4b&id=Mnx2YWxfZXJy&tp=txt_tb1).

So if you follow the *valence* link you see them all, but you can also focus on the problematic cases.

And if you are logged in, you can add remarks in free text. Just start typing in one of the new note boxes.
Hint: use the keyword **val_note** for your manual notes to valence, then other users can see all relevant information about valence together.

By clicking on the status symbol you can cycle through different display styles and colors for your note.
Do not forget to save when you are done!

See also the SHEBANQ help on notes:

* [general](https://shebanq.ancient-data.org/help#notes)
* [notes view](https://shebanq.ancient-data.org/help#notes_style)
* [working with notes](https://shebanq.ancient-data.org/help#working_with_notes)

If you have a solid contribution to make, e.g. the outcome of an algorithm, consider
[bulk uploading notes](https://shebanq.ancient-data.org/help#bulk_uploading_notes).


# References

(Janet Dyk, Reinoud Oosting and Oliver Glanz, 2014) 
Analysing Valence Patterns in Biblical Hebrew: Theoretical Questions and Analytic Frameworks.
*J. of Northwest Semitic Languages, vol. 40 (2014), no. 1, pp. 43-62*.
[pdf abstract](http://academic.sun.ac.za/jnsl/Volumes/JNSL%2040%201%20abstracts%20and%20bookreview.pdf)
[pdf fulltext (author's copy with deviant page numbering)](https://shebanq.ancient-data.org/static/docs/methods/2014_Dyk_jnsl.pdf)

(Janet Dyk, 2014)
Deportation or Forgiveness in Hosea 1.6? Verb Valence Patterns and Translation Proposals.
*The Bible Translator 2014, Vol. 65(3) 235–279*.
[pdf](http://tbt.sagepub.com/content/65/3/235.full.pdf?ijkey=VK2CEHvVrvSGA5B&keytype=finite)

(Janet Dyk, 2014)
Traces of Valence Shift in Classical Hebrew.
In: *Discourse, Dialogue, and Debate in the Bible: Essays in Honour of Frank Polak*.
Ed. Athalya Brenner-Idan.
*Sheffield Phoenix Press, 48–65*.
[book behind pay-wall](http://www.sheffieldphoenix.com/showbook.asp?bkid=273)

# Firing up the engines

In [2]:
import sys, os
import collections
from copy import deepcopy

from tf.fabric import Fabric

In [3]:
source = 'etcbc'
version = '4b'

In [4]:
ETCBC = f'hebrew/{source}{version}'
VALENCE = f'tf/{version}'
TF = Fabric(locations=['~/github/text-fabric-data-legacy', '~/github/valence'], modules=[ETCBC, VALENCE])

This is Text-Fabric 2.3.10
Api reference : https://github.com/ETCBC/text-fabric/wiki/Api
Tutorial      : https://github.com/ETCBC/text-fabric/blob/master/docs/tutorial.ipynb
Data sources  : https://github.com/ETCBC/text-fabric-data
Data docs     : https://etcbc.github.io/text-fabric-data
Shebanq docs  : https://shebanq.ancient-data.org/text
Slack team    : https://shebanq.slack.com/signup
Questions? Ask shebanq@ancient-data.org for an invite to Slack
120 features found and 0 ignored


# Loading the feature data

In [5]:
api = TF.load('''
    function rela typ
    g_word_utf8 trailer_utf8
    lex prs uvf sp pdp ls vs vt nametype gloss
    book chapter verse label number
    s_manual f_correction
    valence predication grammatical original lexical semantic
    mother
''')
api.makeAvailableIn(globals())

  0.00s loading features ...
   |     0.02s B book                 from /Users/dirk/github/text-fabric-data-legacy/hebrew/etcbc4b
   |     0.02s B chapter              from /Users/dirk/github/text-fabric-data-legacy/hebrew/etcbc4b
   |     0.02s B verse                from /Users/dirk/github/text-fabric-data-legacy/hebrew/etcbc4b
   |     0.22s B g_word_utf8          from /Users/dirk/github/text-fabric-data-legacy/hebrew/etcbc4b
   |     0.19s B trailer_utf8         from /Users/dirk/github/text-fabric-data-legacy/hebrew/etcbc4b
   |     0.14s B function             from /Users/dirk/github/text-fabric-data-legacy/hebrew/etcbc4b
   |     0.28s B rela                 from /Users/dirk/github/text-fabric-data-legacy/hebrew/etcbc4b
   |     0.26s B typ                  from /Users/dirk/github/text-fabric-data-legacy/hebrew/etcbc4b
   |     0.15s B lex                  from /Users/dirk/github/text-fabric-data-legacy/hebrew/etcbc4b
   |     0.16s B prs                  from /Users/dirk/github/

# Locations

In [6]:
home_dir = os.path.expanduser('~').replace('\\', '/')
base_dir = '{}/github/workflow'.format(home_dir)
result_dir = '{}/results'.format(base_dir)
result_v_dir = '{}/by_verb'.format(result_dir)
os.makedirs(result_v_dir, exist_ok=True)

# Indicators

Here we specify by which features we recognize key constituents.
We use predominantly features that come from the correction/enrichment workflow.

In [7]:
# pf_... : predication feature
# gf_... : grammatical feature
# vf_... : valence feature
# sf_... : semantic feature
# of_... : original feature

pf_predicate = {
    'regular',
}
gf_direct_object = {
    'principal_direct_object',
    'NP_direct_object',
    'direct_object',
    'L_object',
    'K_object',
    'infinitive_object',
}
gf_indirect_object = {
    'indirect_object',
}
gf_complement = {
    '*',
}
sf_locative = {
    'location',
}
sf_benefactive = {
    'benefactive',
}
vf_locative = {
    'complement',
    'adjunct',
}

verbal_stems = set('''
    qal
'''.strip().split())

# Pronominal suffixes
We collect the information to determine how to render pronominal suffixes on words. 
On verbs, they must be rendered *accusatively*, like `see him`.
But on nouns, they must be rendered *genitively*, like `hand my`.
So we make an inventory of part of speech types and the pronominal suffixes that occur on them.
On that basis we make the translation dictionaries `pronominal_suffix` and `switch_prs`.

Finally, we define a function `get_prs_info` that for each word delivers the pronominal suffix info and gloss,
if there is any, and else `(None, None)`.
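
The lookup can be sketched without Text-Fabric. The sketch below uses only a few entries of the two dictionaries and takes the part of speech and the suffix as plain strings. The names `SWITCH_PRS`, `PRONOMINAL_SUFFIX` and `render_suffix` are hypothetical stand-ins for illustration. The notebook's own definitions are in the cells that follow.

```python
# A few entries only; the notebook defines the full tables.
PRONOMINAL_SUFFIX = {
    'accusative': {'W': ('p3-sg-m', 'him'), 'J': ('p1-sg-', 'me')},
    'genitive': {'W': ('p3-sg-m', 'his'), 'J': ('p1-sg-', 'my')},
}
# The part of speech decides how a suffix is rendered; None means: no suffixes.
SWITCH_PRS = {'verb': 'accusative', 'subs': 'genitive', 'conj': None}


def render_suffix(part_of_speech, suffix):
    """Return (person-number-gender, gloss) or (None, None) if there is no suffix."""
    case = SWITCH_PRS.get(part_of_speech)
    return PRONOMINAL_SUFFIX.get(case, {}).get(suffix, (None, None))


print(render_suffix('verb', 'W'))  # suffix on a verb: rendered accusatively
print(render_suffix('subs', 'W'))  # same suffix on a noun: rendered genitively
```

Chaining two `get` calls with defaults means that both a part of speech without suffixes and a value such as `absent` fall through to `(None, None)`.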

In [8]:
prss = collections.defaultdict(lambda: collections.defaultdict(lambda: 0))
for w in F.otype.s('word'):
    prss[F.sp.v(w)][F.prs.v(w)] += 1
for sp in sorted(prss):
    for prs in sorted(prss[sp]):
        print('{:<5} {:<3} : {:>5}'.format(sp, prs, prss[sp][prs]))

adjv  H   :    16
adjv  HM  :    10
adjv  J   :    25
adjv  K   :    35
adjv  K=  :     3
adjv  KM  :     7
adjv  M   :     8
adjv  MW  :     1
adjv  NW  :     5
adjv  W   :    59
adjv  absent :  9323
advb  n/a :  4550
art   n/a : 30380
conj  n/a : 62723
inrg  K   :     1
inrg  M   :     2
inrg  W   :     5
inrg  absent :  1277
intj  K   :    13
intj  K=  :     7
intj  KM  :     2
intj  M   :    37
intj  NJ  :   181
intj  NW  :     8
intj  W   :     3
intj  absent :  1634
nega  n/a :  6053
nmpr  n/a : 33083
prde  n/a :  2660
prep  H   :  1019
prep  H=  :    36
prep  HJ  :    13
prep  HM  :  1499
prep  HN  :    74
prep  HW  :   174
prep  HWN :    19
prep  J   :  1853
prep  K   :  1634
prep  K=  :   353
prep  KM  :  1181
prep  KN  :     2
prep  KWN :     1
prep  M   :   684
prep  MW  :    68
prep  N   :     3
prep  N>  :     4
prep  NJ  :   105
prep  NW  :   539
prep  W   :  3247
prep  absent : 60765
prin  n/a :  1021
prps  n/a :  5011
subs  H   :  1633
subs  H=  :   108
subs  HJ  :    5

In [9]:
pronominal_suffix = {
    'accusative': {
        'W': ('p3-sg-m', 'him'),
        'K': ('p2-sg-m', 'you:m'),
        'J': ('p1-sg-', 'me'),
        'M': ('p3-pl-m', 'them:mm'),
        'H': ('p3-sg-f', 'her'),
        'HM': ('p3-pl-m', 'them:mm'),
        'KM': ('p2-pl-m', 'you:mm'),
        'NW': ('p1-pl-', 'us'),
        'HW': ('p3-sg-m', 'him'),
        'NJ': ('p1-sg-', 'me'),
        'K=': ('p2-sg-f', 'you:f'),
        'HN': ('p3-pl-f', 'them:ff'),
        'MW': ('p3-pl-m', 'them:mm'),
        'N': ('p3-pl-f', 'them:ff'),
        'KN': ('p2-pl-f', 'you:ff'),
    },
    'genitive' : {
        'W': ('p3-sg-m', 'his'),
        'K': ('p2-sg-m', 'your:m'),
        'J': ('p1-sg-', 'my'),
        'M': ('p3-pl-m', 'their:mm'),
        'H': ('p3-sg-f', 'her'),
        'HM': ('p3-pl-m', 'their:mm'),
        'KM': ('p2-pl-m', 'your:mm'),
        'NW': ('p1-pl-', 'our'),
        'HW': ('p3-sg-m', 'his'),
        'NJ': ('p1-sg-', 'my'),
        'K=': ('p2-sg-f', 'your:f'),
        'HN': ('p3-pl-f', 'their:ff'),
        'MW': ('p3-pl-m', 'their:mm'),
        'N': ('p3-pl-f', 'their:ff'),
        'KN': ('p2-pl-f', 'your:ff'),        
    }
}
switch_prs = dict(
    subs = 'genitive',
    verb = 'accusative',
    prep = 'accusative',
    conj = None,
    nmpr = None,
    art = None,
    adjv = 'genitive',
    nega = None,
    prps = None,
    advb = None,
    prde = None,
    intj = 'accusative',
    inrg = 'genitive',
    prin = None,
)

def get_prs_info(w):
    sp = F.sp.v(w)
    prs = F.prs.v(w)
    switch = switch_prs[sp]
    return pronominal_suffix.get(switch, {}).get(prs, (None, None))

# Compiling the senses

In [10]:
slabels = '''
--
-c
-i
-b
-p
d-
dc
di
db
dp
n.
c.
l.
k.
i.
'''.strip().split()

senses = {}
senses_blocks = senses_spec.strip().split('\n\n')
for b in senses_blocks:
    lines = b.split('\n')
    verb = lines[0]
    sense_parts = [l.split(':', 2) for l in lines[1:]]
    senses[verb] = dict(
        (x[0].strip(), (x[1].strip(), [y.strip() for y in x[2].strip().split('::')])) for x in sense_parts
    )
    for slabel in slabels:
        if slabel not in senses[verb]:
            error('{:<6}: Missing sense label: {}'.format(verb, slabel))
    for slabel in sorted(senses[verb]):
        if slabel not in slabels:
            error('{:<6}: Unknown sense label: {}'.format(verb, slabel))
info('Senses for {} verbs:\n\t{}'.format(len(senses), '\n\t'.join(sorted(senses))))

    46s ZQN   : Missing sense label: dp


    46s Senses for 8 verbs:
	<FH
	BR>
	CJT
	DBQ
	FJM
	NTN
	QR>
	ZQN


# Making a verb-clause index

We generate an index which gives for each verb lexeme a list of clauses that have that lexeme as the main verb.
In the index we store the clause node together with the word node(s) that carry the main verb(s).

Clauses may have multiple verbs. In many cases it is a copula plus another verb.
In those cases, we are interested in the other verb, so we exclude copulas.

Yet, there are also clauses with more than one main verb.
In those cases, we treat both verbs separately as main verb of one and the same clause.
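
The shape of the two indexes can be illustrated on made-up data. In the sketch below, the tuples in `words` and all node numbers are hypothetical stand-ins for what the notebook reads from the corpus, and `flowchart_verbs` stands in for the keys of `senses`.

```python
import collections

# Hypothetical data: (word node, clause node, lexeme, is regular predicate)
words = [
    (1, 100, 'NTN', True),
    (2, 100, 'HJH', True),   # no flowchart for this lexeme: skipped
    (3, 101, 'NTN', True),
    (4, 101, '<FH', True),   # a second main verb in the same clause
    (5, 102, 'QR>', False),  # not a regular predicate: skipped
]
flowchart_verbs = {'NTN', '<FH', 'QR>'}

verb_clause = collections.defaultdict(list)  # lexeme -> [(clause, word), ...]
clause_verb = {}                             # clause -> [word, ...]

for (w, c, lex, regular) in words:
    if lex not in flowchart_verbs or not regular:
        continue
    clause_verb.setdefault(c, []).append(w)
    verb_clause[lex].append((c, w))

print(dict(verb_clause))
print(clause_verb)
```

Clause 101 appears once in `clause_verb` with two verb nodes, so each of its main verbs can be evaluated separately against the same clause.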

In [11]:
info('Making the verb-clause index')
occs = collections.defaultdict(list)   # verb occurrence nodes per verb lexeme (not filled in this cell)
verb_clause = collections.defaultdict(list)    # per verb lexeme: list of (clause node, verb word node) pairs
clause_verb = collections.OrderedDict() # per clause node: list of its verb word nodes, for the selected verbs only

for w in F.otype.s('word'):
    if F.sp.v(w) != 'verb': continue
    lex = F.lex.v(w).rstrip('[')
    if lex not in senses: continue   
    pf = F.predication.v(L.u(w, 'phrase')[0])
    if pf in pf_predicate:
        cn = L.u(w, 'clause')[0]
        clause_verb.setdefault(cn, []).append(w)
        verb_clause[lex].append((cn, w))
info('Done ({} clauses with a flowchart verb)'.format(len(clause_verb)))

    53s Making the verb-clause index
    54s Done (5784 clauses with a flowchart verb)


# (Indirect) Objects, Locatives, Benefactives

In [12]:
info('Finding key constituents')
constituents = collections.defaultdict(lambda: collections.defaultdict(set))
ckinds = '''
    dos pdos ndos kdos ldos idos cdos inds locs cpls bens
'''.strip().split()

# go through all relevant clauses and collect all types of direct objects
for c in clause_verb:
    these_constituents = collections.defaultdict(set)
    # phrase like constituents
    for p in L.d(c, 'phrase'):
        gf = F.grammatical.v(p)
        of = F.original.v(p)
        sf = F.semantic.v(p)
        vf = F.valence.v(p)
        ckind = None
        if gf in gf_direct_object:
            if gf == 'principal_direct_object':
                ckind = 'pdos'
            elif gf == 'NP_direct_object':
                ckind = 'ndos'
            elif gf == 'L_object':
                ckind = 'ldos'
            elif gf == 'K_object':
                ckind = 'kdos'
            else:
                ckind = 'dos'
        elif gf in gf_indirect_object:
            ckind = 'inds'
        elif sf and sf in sf_benefactive:
            ckind = 'bens'
        elif sf in sf_locative and vf in vf_locative:
            ckind = 'locs'
        elif gf in gf_complement:
            ckind = 'cpls'
        if ckind: these_constituents[ckind].add(p)

    # clause like constituents: only look for object clauses dependent on this clause
    for ac in L.d(L.u(c, 'sentence')[0], 'clause'):
        dep = list(E.mother.f(ac))
        if len(dep) and dep[0] == c:
            gf = F.grammatical.v(ac)
            ckind = None
            if gf in gf_direct_object:
                if gf == 'direct_object':
                    ckind = 'cdos'
                elif gf == 'infinitive_object':
                    ckind = 'idos'
            if ckind: these_constituents[ckind].add(ac)
    
    for ckind in these_constituents:
        constituents[c][ckind] |= these_constituents[ckind]

info('Done') 

 3m 54s Finding key constituents
 3m 55s Done


In [13]:
testcases = (
#    426955,
#    427654,
#    428420,
#    429412,
#    429501,
#    429862,
#    431695,
#    431893,
    430372,
)

def showcase(n):
    otype = F.otype.v(n)
    verseNode = L.u(n, 'verse')[0]
    place = T.sectionFromNode(verseNode)
    print('''CASE {}={} ({}-{})\nCLAUSE: {}\nVERSE\n{} {}\nGLOSS {}\n'''.format(
        n, otype, F.rela.v(n), F.typ.v(n),
        T.text(L.d(n, 'word'), fmt='text-trans-plain'),
        '{} {}:{}'.format(*place),
        T.text(L.d(verseNode, 'word'), fmt='text-trans-plain'),
        ' '.join(F.gloss.v(w) for w in L.d(verseNode, 'word'))
    ))
    print('PHRASES\n')
    for p in L.d(n, 'phrase'):
        print('''{} ({}-{}) {} "{}"'''.format(
            p, F.function.v(p), F.typ.v(n),
            T.text(L.d(p, 'word'), fmt='text-trans-plain'),
            ' '.join(F.gloss.v(w) for w in L.d(p, 'word')),
        ))
        print('valence = {}; grammatical = {}; lexical = {}; semantic = {}\n'.format(
            F.valence.v(p),
            F.grammatical.v(p),
            F.lexical.v(p),
            F.semantic.v(p),
        ))
    print('SUBCLAUSES\n')
    for ac in L.d(L.u(n, 'sentence')[0], 'clause'):
        dep = list(E.mother.f(ac))
        if not(len(dep) and dep[0] == n): continue
        print('''{} ({}-{}) {} "{}"'''.format(
            ac, F.rela.v(ac), F.typ.v(ac),
            T.text(L.d(ac, 'word'), fmt='text-trans-plain'),
            ' '.join(F.gloss.v(w) for w in L.d(ac, 'word')),
        ))
        print('valence = {}; grammatical = {}; lexical = {}; semantic = {}\n'.format(
            F.valence.v(ac),
            F.grammatical.v(ac),
            F.lexical.v(ac),
            F.semantic.v(ac),
        ))

    print('CONSTITUENTS')
    for ckind in ckinds:
        print('{:<4}: {}'.format(ckind, ','.join(str(x) for x in sorted(constituents[n][ckind]))))
    print('================\n')

for n in (testcases): showcase(n)

CASE 430372=clause (Cmpl-InfC)
CLAUSE: L <FWT H DBR H ZH 
VERSE
Genesis 34:14 W J>MRW >LJHM L> NWKL L <FWT H DBR H ZH L TT >T >XTNW L >JC >CR LW <RLH KJ XRPH HW> LNW 
GLOSS and say to not be able to make the word the this to give <object marker> sister to man <relative> to foreskin that reproach she to

PHRASES

616668 (Pred-InfC) L <FWT  "to make"
valence = core; grammatical = NA; lexical = ; semantic = 

616669 (Objc-InfC) H DBR H ZH  "the word the this"
valence = complement; grammatical = principal_direct_object; lexical = ; semantic = 

SUBCLAUSES

430373 (Adju-InfC) L TT >T >XTNW L >JC  "to give <object marker> sister to man"
valence = NA; grammatical = infinitive_object; lexical = ; semantic = 

CONSTITUENTS
dos : 
pdos: 616669
ndos: 
kdos: 
ldos: 
idos: 430373
cdos: 
inds: 
locs: 
cpls: 
bens: 



# Overview of quantities

In [14]:
# Counting constituents

constituents_count = collections.defaultdict(collections.Counter)

for c in constituents:
    for ckind in ckinds:
        n = len(constituents[c][ckind])
        constituents_count[ckind][n] += 1

for ckind in ckinds:
    total = 0
    for (count, n) in sorted(constituents_count[ckind].items(), key=lambda y: -y[0]):
        if count: total += n
        info('{:>5} clauses with {:>2} {:<10} constituents'.format(n, count, ckind), tm=False)
    info('{:>5} clauses with {:>2} {:<10} constituent'.format(total, 'a', ckind), tm=False)
info('{:>5} clauses with {:>2} flowchart verb'.format(len(clause_verb), 'a'), tm=False)

 2485 clauses with  1 dos        constituents
 2483 clauses with  0 dos        constituents
 2485 clauses with  a dos        constituent
 1075 clauses with  1 pdos       constituents
 3893 clauses with  0 pdos       constituents
 1075 clauses with  a pdos       constituent
  372 clauses with  1 ndos       constituents
 4596 clauses with  0 ndos       constituents
  372 clauses with  a ndos       constituent
   68 clauses with  1 kdos       constituents
 4900 clauses with  0 kdos       constituents
   68 clauses with  a kdos       constituent
   18 clauses with  2 ldos       constituents
  823 clauses with  1 ldos       constituents
 4127 clauses with  0 ldos       constituents
  841 clauses with  a ldos       constituent
    2 clauses with  2 idos       constituents
  205 clauses with  1 idos       constituents
 4761 clauses with  0 idos       constituents
  207 clauses with  a idos       constituent
    1 clauses with  2 cdos       constituents
  153 clauses with  1 cdos       constit

# Applying the flowchart

We can now apply the flowchart in a straightforward manner.

We output the results as a stand-alone comma-separated file, with the columns specified in the code below.
This file can be imported into a spreadsheet to check the results.

We also provide a comma-separated file that can be imported directly into SHEBANQ as a set of notes, so that the reader can check results within SHEBANQ. This has the benefit that the full context is available, and that the data view can be called up easily to inspect the coding situation for each particular instance.
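
As a rough illustration of the stand-alone file, the sketch below writes result rows with Python's `csv` module. The column names and the row are made up for the example. They are not the columns the notebook writes, and they are not the format that SHEBANQ expects for bulk uploads.

```python
import csv
import io

# Hypothetical columns and a made-up row, for illustration only.
FIELDS = ('book', 'chapter', 'verse', 'lexeme', 'sense_label', 'status', 'sense')
ROWS = [
    ('Genesis', 1, 1, 'VERB', 'd-', 'good', 'do [this, that]'),
]


def to_csv(fields, rows):
    """Return the header and rows as one comma-separated string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerow(fields)
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(FIELDS, ROWS))
```

Using `csv.writer` instead of joining strings by hand means that glosses containing commas or quotes are quoted properly, which matters when the file is opened in a spreadsheet.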

In [15]:
status_rep = {
    '*': 'note',
    '!': 'good',
    '?': 'warning',
    '-': 'error',
}
stat_rep = {
    '*': 'NB',
    '!': '',
    '?': 'wrn',
    '-': 'err',
}

gloss_hacks = {
    'XQ/': 'law/precept',
}

In [16]:
def reptext(label, ckind, v, phrases, num=False, txt=False, gloss=False, textformat='text-trans-plain'): 
    if phrases is None: return ''
    label_rep = '{}='.format(label) if label else ''
    phrases_rep = []
    for p in sorted(phrases, key=sortKey):
        ptext = '[{}|'.format(F.number.v(p) if num else '[')
        if txt:
            ptext += T.text(L.d(p, 'word'), fmt=textformat)
        if gloss:
            words = L.d(p, 'word')
            if ckind == 'ldos' and F.lex.v(words[0]) == 'L': words = words[1:]

            wtexts = []
            for w in words:
                g = gloss_hacks.get(F.lex.v(w), F.gloss.v(w)).replace('<object marker>','&')
                if F.lex.v(w) == 'BJN/' and F.pdp.v(w) == 'prep': g = 'between'
                prs_g = get_prs_info(w)[1]
                uvf = F.uvf.v(w)
                wtext = ''
                if uvf == 'H': ptext += 'toward '
                wtext += g if w != v else '' # we do not have to put in the gloss of the verb in question
                wtext += ('~'+prs_g) if prs_g is not None else ''
                wtexts.append(wtext)
            ptext += ' '.join(wtexts)
        ptext += ']'
        phrases_rep.append(ptext)
    return ' '.join(phrases_rep)
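
The bracket notation that `reptext` produces can be illustrated without the corpus. The following is a simplified sketch: `render_phrase` is a hypothetical helper, and the phrase is given as plain `(gloss, suffix_gloss)` pairs rather than as nodes. The gloss of the verb itself is left empty, as in `reptext`, and a pronominal suffix is attached with `~`.

```python
def render_phrase(number, words):
    """Render one phrase as '[number|gloss gloss~suffix ...]'.

    words is a list of (gloss, suffix_gloss) pairs; suffix_gloss may be None.
    """
    wtexts = []
    for (gloss, suffix_gloss) in words:
        wtext = gloss
        if suffix_gloss is not None:
            wtext += '~' + suffix_gloss
        wtexts.append(wtext)
    return '[{}|{}]'.format(number, ' '.join(wtexts))


# a preposition followed by the verb itself (empty gloss) with a suffix
print(render_phrase(2, [('to', None), ('', 'him')]))           # [2|to ~him]
# a plain two-word phrase
print(render_phrase(3, [('chief', None), ('people', None)]))   # [3|chief people]
```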

In [17]:
debug_messages = collections.defaultdict(lambda: collections.defaultdict(list))

def flowchart(v, lex, verb, consts):
    consts = deepcopy(consts)
    sense_label = None
    n_ = collections.defaultdict(lambda: 0)
    for ckind in ckinds: n_[ckind] = len(consts[ckind])
    char1 = None
    char2 = None
    # determine char 1 of the sense label
    if n_['pdos'] > 0:
        if n_['ndos'] > 0: char1 = 'n'
        elif n_['cdos'] > 0: char1 = 'c'
        elif n_['ldos'] > 0: char1 = 'l'
        elif n_['kdos'] > 0: char1 = 'k'
        elif n_['idos'] > 0: char1 = 'i'
        else:
            # in trouble: if there is a principal direct object, there should be another object as well,
            # and that other one should be an NP, object clause, L_object, K_object, or I_object.
            # If this happens, it is probably the result of manual correction.
            # We warn, and remedy.
            msg_rep = '; '.join('{} {}'.format(n_[ckind], ckind) for ckind in ckinds)
            if n_['dos'] > 0:
                # there is another object (dos should only be used if there is a single object)
                # we'll put the dos in the ndos (which was empty)
                # This could be caused by a manual enrichment sheet that has been generated
                # before the concept of NP_direct_object had been introduced
                char1 = 'n'
                consts['ndos'] = consts['dos']
                debug_messages[lex]['pdos with dos'].append('{}: {}'.format(T.sectionFromNode(v), msg_rep))
            else:
                # there is not another object, we treat this as a single object, so as a dos
                char1 = 'd'
                consts['dos'] = consts['pdos']
                debug_messages[lex]['lonely pdos'].append('{}: {}'.format(T.sectionFromNode(v), msg_rep))
    else:
        if n_['cdos'] > 0:
        # in the case of a single object, the clause objects act as ordinary objects
            char1 = 'd'
            consts['dos'] |= consts['cdos']
        if n_['ndos'] > 0:
        # in the case of a single object, the np_objects act as ordinary objects
            char1 = 'd'
            consts['dos'] |= consts['ndos']

    n_ = collections.defaultdict(lambda: 0)
    for ckind in ckinds: n_[ckind] = len(consts[ckind])

    if n_['pdos'] == 0 and n_['dos'] > 0:
        char1 = 'd'
    if n_['pdos'] == 0 and n_['dos'] == 0:
        char1 = '-'

    # determine char 2 of the sense label
    if char1 in 'nclki':
        char2 = '.'
    else:
        if n_['inds'] > 0:
            char2 = 'i'
        elif n_['bens'] > 0:
            char2 = 'b'
        elif n_['locs'] > 0:
            char2 = 'p'
        elif n_['cpls'] > 0:
            char2 = 'c'
        else:
            char2 = '-'

    sense_label = char1+char2
    
    # the fallback keeps the (status, (sense_fmt, ...)) shape that is unpacked below
    sinfo = senses.\
        get(lex, {}).\
        get(sense_label, ('-', ('no sense {} given for {}'.format(sense_label, lex),)))
    status = sinfo[0]
    sense_fmt = sinfo[1][0]
    action_fmt = sinfo[1][1] if len(sinfo[1]) >= 2 else ''
    action_stat = sinfo[1][2] if len(sinfo[1]) >= 3 else status

    verb_rep = reptext('', '', v, verb, num=True, gloss=True)
    consts_rep = dict((ckind, reptext('', ckind, v, consts[ckind], num=True, gloss=True)) for ckind in consts)
        
    sense_txt = sense_fmt.format(verb=verb_rep, **consts_rep)
    action_txt = action_fmt.format(verb=verb_rep, **consts_rep)

    return (sense_label, status, sense_txt, action_txt, action_stat)
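
The label logic of `flowchart` can be tried out without the corpus. The following is a stand-alone sketch, not the notebook's own code: `sense_label` is a hypothetical helper that takes a plain dictionary of constituent counts instead of node sets, and it folds the repair steps into a single expression.

```python
def sense_label(counts):
    """Return the two-character sense label for a dict of constituent counts."""
    def n(kind):
        return counts.get(kind, 0)

    # character 1: the direct object configuration
    if n('pdos') > 0:
        # a principal object is classified by the kind of its companion object
        for (kind, char) in (
            ('ndos', 'n'), ('cdos', 'c'), ('ldos', 'l'), ('kdos', 'k'), ('idos', 'i'),
        ):
            if n(kind) > 0:
                char1 = char
                break
        else:
            # repair cases: a plain object is taken as NP object,
            # a lonely principal object as a single direct object
            char1 = 'n' if n('dos') > 0 else 'd'
    elif n('dos') + n('cdos') + n('ndos') > 0:
        # without a principal object, clause and NP objects act as ordinary objects
        char1 = 'd'
    else:
        char1 = '-'

    # character 2: the first complement kind that is present,
    # only consulted when there is no double object
    if char1 in 'nclki':
        char2 = '.'
    else:
        for (kind, char) in (('inds', 'i'), ('bens', 'b'), ('locs', 'p'), ('cpls', 'c')):
            if n(kind) > 0:
                char2 = char
                break
        else:
            char2 = '-'
    return char1 + char2


print(sense_label({'pdos': 1, 'ldos': 1}))  # l.
print(sense_label({'dos': 1, 'locs': 2}))   # dp
print(sense_label({}))                      # --
```

The order of the checks matters: when several kinds are present, the first one in each list decides the character.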

In [18]:
fields = ('''
    book
    chapter
    verse
    sentence#
    clause#
    lex
    status
    sense_label
    sense
    action_status
    action
    '''+(''.join('\n\t#{}'.format(ckind) for ckind in ckinds))+'''
    text
''').strip().split()

sfields = '''
    version
    book
    chapter
    verse
    clause_atom
    is_shared
    is_published
    status
    keywords
    ntext
'''.strip().split()

fields_fmt = ('{};' * (len(fields) - 1)) + '{}\n' 
sfields_fmt = ('{}\t' * (len(sfields) - 1)) + '{}\n' 
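
The format strings above build each row by hand, and the main loop below adds the double quotes around text columns itself. A possible alternative, shown here only as a sketch and not as the method of this notebook, is to let Python's `csv` module take care of delimiters and quoting; it also copes with values that themselves contain `;` or `"`. The shortened column list is illustrative.

```python
import csv
import io

columns = ['book', 'chapter', 'verse', 'lex', 'sense']  # shortened, illustrative
row = ['Isaiah', 5, 20, 'FJM', 'make [2|darkness] to become [3|light]']

buffer = io.StringIO()  # stands in for the opened results file
writer = csv.writer(
    buffer,
    delimiter=';',
    quoting=csv.QUOTE_NONNUMERIC,  # quote every field that is not a number
    lineterminator='\n',
)
writer.writerow(columns)
writer.writerow(row)

print(buffer.getvalue())
```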

# Running the flowchart

The next cell finally performs all the flowchart computations for all verbs in all contexts.

In [19]:
error('Applying the flowchart')

outcome_sta = collections.Counter()
outcome_lab = collections.Counter()
outcome_sta_l = collections.defaultdict(lambda: collections.Counter())
outcome_lab_l = collections.defaultdict(lambda: collections.Counter())

# we want an overview of the flowchart decisions per lexeme
# Per lexeme, per sense_label we store the clauses

decisions = collections.defaultdict(lambda: collections.defaultdict(dict))

of = open('{}/{}'.format(result_dir, 'valence_results.csv'), 'w')
ofs = open('{}/{}'.format(result_dir, 'valence_notes.csv'), 'w')
of.write('{}\n'.format(';'.join(fields)))
ofs.write('{}\n'.format('\t'.join(sfields)))

note_keyword_base = 'valence'

nnotes = collections.Counter()

for lex in verb_clause:
    if lex not in senses:
        error('No flowchart definition for verb {}'.format(lex))
for lex in senses:
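    if lex not in verb_clause:
        error('No verb {} in enriched corpus'.format(lex))
        continue
    # open the per-verb file only when there is something to write,
    # so that no file handle is left open for a skipped verb
    this_of = open('{}/{}.csv'.format(result_v_dir, lex), 'w')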
    for (c,v) in verb_clause[lex]:
        if F.vs.v(v) not in verbal_stems: continue
    
        book = F.book.v(L.u(v, 'book')[0])
        chapter = F.chapter.v(L.u(v, 'chapter')[0])
        verse = F.verse.v(L.u(v, 'verse')[0])
        sentence_n = F.number.v(L.u(v, 'sentence')[0])
        clause_n = F.number.v(c)
        clause_atom_n = F.number.v(L.u(v, 'clause_atom')[0])
        
        verb = [L.u(v, 'phrase')[0]]
        consts = constituents[c]
        n_ = collections.defaultdict(lambda: 0)
        for ckind in ckinds: n_[ckind] = len(consts[ckind])
        
        (sense_label, status, sense_txt, action_txt, action_stat) = flowchart(v, lex, verb, consts)
        
        outcome_sta[status] += 1
        outcome_sta_l[lex][status] += 1
        outcome_lab[sense_label] += 1
        outcome_lab_l[lex][sense_label] += 1
        decisions[lex][sense_label][c] = '{} :: {}'.format(sense_txt, action_txt)
        text = reptext('', '', v, L.d(c, 'phrase'), num=True, txt=True)

        txt = fields_fmt.format(
            book,
            chapter,
            verse,
            sentence_n,
            clause_n,
            '"'+lex+'"',
            stat_rep[status],
            '"<'+sense_label+'>"',
            '"'+sense_txt+'"',
            action_stat,
            '"'+action_txt+'"',
            *(n_[ckind] for ckind in ckinds),
            '"'+text+'"',
        )
        of.write(txt)
        this_of.write(txt)
        ofs.write(sfields_fmt.format(
            version,
            book,
            chapter,
            verse,
            clause_atom_n,
            'T',
            '',
            status,
            note_keyword_base+(' val_{}'.format(stat_rep[status]) if status != '!' else ''),
            '_{sl}_ [{nm}|{vb}] {st}'.format(
                nm=F.number.v(L.u(v, 'phrase')[0]),
                vb=F.g_word_utf8.v(v),
                st=sense_txt,
                sl=sense_label,
            ),
        ))
        nnotes[note_keyword_base] += 1
        if action_txt != '':
            ofs.write(sfields_fmt.format(
                version,
                book,
                chapter,
                verse,
                clause_atom_n,
                'T',
                '',
                action_stat,
                note_keyword_base+(' val_{}'.format(stat_rep[status]) if status != '!' else ''),
                action_txt,
            ))
            nnotes['action'] += 1
    this_of.close()
            
of.close()
ofs.close()
error('Done')

show_limit = 20
for lex in debug_messages:
    error(lex, tm=False)
    for kind in debug_messages[lex]:
        error('\t{}'.format(kind), tm=False)
        messages = debug_messages[lex][kind]
        lm = len(messages)
        error('\t\t{}{}'.format(
            '\n\t\t'.join(messages[0:show_limit]),
            '' if lm <= show_limit else '\n\t\tAND {} more'.format(lm-show_limit),
        ), tm=False)

info('Computed {} clauses with flowchart'.format(sum(outcome_sta.values())), tm=False)
ntot = 0
for (lab, n) in sorted(nnotes.items(), key=lambda x: x[0]):
    ntot += n
    print('{:<10} notes: {}'.format(lab, n))
print('{:<10} notes: {}'.format('Total', ntot))

for lex in [''] + sorted(senses):
    print('All lexemes with flowchart specification' if lex == '' else lex)
    src_sta = outcome_sta if lex == '' else outcome_sta_l.get(lex, collections.defaultdict(lambda: 0))
    src_lab = outcome_lab if lex == '' else outcome_lab_l.get(lex, collections.defaultdict(lambda: 0))
    tot = 0
    for (x, n) in src_sta.items():
        tot += n
        print('     Status   {:<7}: {:>4} clauses'.format(status_rep[x], n))
    print('     All status      : {:>4} clauses'.format(tot))
    tot = 0
    for x in slabels:
        n = src_lab[x]
        tot += n
        print('     Sense    {:<7}: {:>4} clauses'.format(x, n))
    print('     All senses      : {:>4} clauses'.format(tot))
    print(' ')

 4m 36s Applying the flowchart
 4m 37s No verb DBQ in enriched corpus
 4m 38s No verb ZQN in enriched corpus
 4m 38s Done


Computed 5476 clauses with flowchart
action     notes: 881
valence    notes: 5476
Total      notes: 6357
All lexemes with flowchart specification
     Status   good   : 4427 clauses
     Status   error  :  135 clauses
     All status      : 5476 clauses
     Sense    --     : 1028 clauses
     Sense    -c     :  144 clauses
     Sense    -i     :  400 clauses
     Sense    -b     :    7 clauses
     Sense    -p     :  246 clauses
     Sense    d-     : 1263 clauses
     Sense    dc     :  266 clauses
     Sense    di     :  447 clauses
     Sense    db     :   11 clauses
     Sense    dp     :  605 clauses
     Sense    n.     :  369 clauses
     Sense    c.     :   12 clauses
     Sense    l.     :  468 clauses
     Sense    k.     :   48 clauses
     Sense    i.     :  162 clauses
     All senses      : 5476 clauses
 
<FH
     Status   good   : 1810 clauses
     Status   error  :   43 clauses
     All status      : 2294 clauses
     Sense    --     :  727 clauses
     Sense    -c    

In [20]:
def show_decision(verbs=None, labels=None, books=None):
    # show all clauses that have a verb in verbs, a sense label in labels, and a book in books;
    # a filter that is None lets everything through
    results = []
    for verb in decisions:
        if verbs is not None and verb not in verbs: continue
        for label in decisions[verb]:
            if labels is not None and label not in labels: continue
            for (c, stxt) in sorted(decisions[verb][label].items()):
                book = T.sectionFromNode(L.u(c, 'book')[0])[0]
                if books is not None and book not in books: continue
                sentence_words = L.d(L.u(c, 'sentence')[0], 'word')
                results.append('{:<7} {:<12} {:<5} {:<2} {}\n\t{}\n\t{}\n'.format(
                    c,
                    '{} {}: {}'.format(*T.sectionFromNode(c)),
                    verb,
                    label,
                    stxt,
                    T.text(sentence_words, fmt='text-trans-plain'),
                    ' '.join(F.gloss.v(w) for w in sentence_words),
                ).replace('<', '&lt;'))
    print('\n'.join(sorted(results)))

In [29]:
show_decision(verbs={'FJM'}, books={'Isaiah'})

467293  Isaiah 3: 7  FJM   n. make [2|~me] to be [3|chief people] :: 
	L> TFJMNJ QYJN &lt;M 
	not put chief people

467456  Isaiah 5: 20 FJM   l. make [2|darkness] to become [3|light] :: 
	HWJ H >MRJM L  R&lt; W L  VWB FMJM XCK L >WR W >WR L XCK FMJM MR L MTWQ W MTWQ L MR 
	alas the say to the evil and to the good put darkness to light and light to darkness put bitter to sweet and sweet to bitter

467458  Isaiah 5: 20 FJM   l. make [2|bitter] to become [3|sweet] :: 
	HWJ H >MRJM L  R&lt; W L  VWB FMJM XCK L >WR W >WR L XCK FMJM MR L MTWQ W MTWQ L MR 
	alas the say to the evil and to the good put darkness to light and light to darkness put bitter to sweet and sweet to bitter

467855  Isaiah 10: 6 FJM   n. make [2|to ~him] to be [3|trampled land] :: 
	W &lt;L &lt;M &lt;BRTJ >YWNW L CLL CLL W L BZ BZ W L FJMW MRMS K XMR XWYWT 
	and upon people anger command to plunder plunder and to spoil spoiling and to put trampled land as clay outside

468059  Isaiah 13: 9 FJM   l. make [2|the earth] t