<a href="http://laf-fabric.readthedocs.org/en/latest/" target="_blank"><img align="left" src="images/laf-fabric-xsmall.png"/></a>
<a href="http://emdros.org" target="_blank"><img align="left" src="files/images/Emdros-xsmall.png"/></a>
<a href="http://www.persistent-identifier.nl/?identifier=urn%3Anbn%3Anl%3Aui%3A13-048i-71" target="_blank"><img align="left" src="images/etcbc4easy-small.png"/></a>
<a href="http://www.godgeleerdheid.vu.nl/etcbc" target="_blank"><img align="right" src="images/VU-ETCBC-xsmall.png"/></a>

# נָתַן and locatives

We look at the occurrences of נָתַן with an object and a complement.
For each occurrence we want to mark whether the complement is a proper indirect object or a locative.

This is part of Janet's project of creating a valence dictionary and using a flowchart to arrive at the meaning of a verb in its context of complements.

First we use an MQL query to get all occurrences of נָתַן with an object and a complement.
Then we apply a few heuristics to detect those cases where the complement is a locative or an indirect object.

The query is also available on SHEBANQ in two versions: one by [Dirk](http://shebanq.ancient-data.org/hebrew/query?id=558) and one by [Janet](http://shebanq.ancient-data.org/hebrew/query?id=560).

In [1]:
import sys
import collections
import subprocess

from lxml import etree

import laf
from laf.fabric import LafFabric
from etcbc.preprocess import prepare
fabric = LafFabric()

  0.00s This is LAF-Fabric 4.5.0
API reference: http://laf-fabric.readthedocs.org/en/latest/texts/API-reference.html
Feature doc: http://shebanq-doc.readthedocs.org/en/latest/texts/welcome.html



In [14]:
version = '4b'
API = fabric.load('etcbc{}'.format(version), 'lexicon', 'ntn', {
    "xmlids": {"node": False, "edge": False},
    "features": ('''
        oid otype monads
        function
        g_word_utf8 trailer_utf8
        lex prs sp nametype
        book chapter verse label number
    ''',''),
    "prepare": prepare,
    "primary": False,
}, verbose='DETAIL')
exec(fabric.localnames.format(var='fabric'))

  0.00s LOADING API: please wait ... 
  0.00s DETAIL: COMPILING m: UP TO DATE
  0.10s INFO: USING DATA COMPILED AT: 2015-05-04T13-46-20
  0.10s DETAIL: COMPILING a: UP TO DATE
  0.10s INFO: USING DATA COMPILED AT: 2015-05-04T14-07-34
  0.11s DETAIL: keep main: G.node_anchor_min
  0.11s DETAIL: keep main: G.node_anchor_max
  0.11s DETAIL: keep main: G.node_sort
  0.12s DETAIL: keep main: G.node_sort_inv
  0.12s DETAIL: keep main: G.edges_from
  0.12s DETAIL: keep main: G.edges_to
  0.12s DETAIL: keep main: F.etcbc4_db_monads [node] 
  0.12s DETAIL: keep main: F.etcbc4_db_oid [node] 
  0.12s DETAIL: keep main: F.etcbc4_db_otype [node] 
  0.12s DETAIL: keep main: F.etcbc4_ft_function [node] 
  0.12s DETAIL: keep main: F.etcbc4_ft_g_word_utf8 [node] 
  0.12s DETAIL: keep main: F.etcbc4_ft_lex [node] 
  0.12s DETAIL: keep main: F.etcbc4_ft_number [node] 
  0.13s DETAIL: keep main: F.etcbc4_ft_prs [node] 
  0.13s DETAIL: keep main: F.etcbc4_ft_sp [node] 
  0.13s DETAIL: keep main: F.etcbc4_l

For each result, we write out a line of information.
Each line starts with ``book``, ``chapter``, and ``verse``; the remaining columns are described below.

* ``order`` in what order the **P**redicate, **O**bject, and **C**omplement have been encountered
* ``verb`` the verb occurrence in vocalised Hebrew
* ``object`` the text of the complete (direct) object in Hebrew
* ``# loc lexemes`` how many distinct lexemes with a locative meaning occur in the complement (given by a fixed list)
* ``# topo`` how many lexemes with nametype = ``topo`` occur in the complement (nametype is a feature of the lexicon)
* ``# prep_b`` how many occurrences of the preposition ``B`` occur in the complement
* ``locativity`` a crude measure of the locativity of the complement, just the sum of ``# loc lexemes``, ``# topo``, and ``# prep_b``
* ``# prep_l`` how many occurrences of the preposition ``L`` with a pronominal suffix on it occur in the complement
* ``# L prop`` how many occurrences of ``L`` plus proper name occur in the complement
* ``indirect object`` a crude indicator of whether the complement is an indirect object, just the sum of ``# prep_l`` and ``# L prop`` 
* ``complement text`` the text of the complete complement as a sequence of transcribed, consonantal lexemes
* ``ca_num`` the number of the clause_atom that contains the predicate with NTN
* ``clause text`` the text of the complete clause
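
To make the two crude scores concrete, here is a minimal sketch that computes them for a toy complement. It does not use LAF-Fabric: each word is a plain dict whose keys mirror the features `lex`, `sp`, `prs`, and `nametype`. The function name `score_complement`, the shortened lexeme list, and the suffix value `'KM'` are assumptions made for illustration only.

```python
# Shortened stand-in for the fixed list of locative lexemes (illustration only).
LOCATIVE_LEXEMES = {'BJT/', 'CMJM/', 'MQWM/'}
# Values of prs that mean "no pronominal suffix".
NO_PRS = {'absent', 'n/a'}

def score_complement(words):
    """Return the locativity and indirect-object scores of a complement.

    words is a list of dicts, one per word, in text order.
    """
    lexemes = [w['lex'] for w in words]
    # distinct locative lexemes, not occurrences
    n_loc = len(LOCATIVE_LEXEMES & set(lexemes))
    n_topo = sum(1 for w in words if w.get('nametype') == 'topo')
    n_prep_b = lexemes.count('B')
    # L carrying a pronominal suffix
    n_prep_l = sum(
        1 for w in words
        if w['lex'] == 'L' and w.get('prs', 'absent') not in NO_PRS
    )
    # L immediately followed by a proper name
    n_l_prop = sum(
        1 for (w, nxt) in zip(words, words[1:])
        if w['lex'] == 'L' and nxt.get('sp') == 'nmpr'
    )
    return {
        'locativity': n_loc + n_topo + n_prep_b,
        'indirect object': n_prep_l + n_l_prop,
    }

# Complement lexemes "B RQJ</ H CMJM/" (one B, one locative lexeme)
in_the_firmament = [
    {'lex': 'B', 'sp': 'prep'},
    {'lex': 'RQJ</', 'sp': 'subs'},
    {'lex': 'H', 'sp': 'art'},
    {'lex': 'CMJM/', 'sp': 'subs'},
]
# A single L with a pronominal suffix (the suffix value is hypothetical)
to_you = [{'lex': 'L', 'sp': 'prep', 'prs': 'KM'}]

print(score_complement(in_the_firmament))
print(score_complement(to_you))
```

The first complement scores on the locative side only, the second on the indirect-object side only. A complement can score on both sides, or on neither.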

In [37]:
locative_lexemes = {
    '>RY/',
    'BJT/',
    'DRK/',
    'HR/',
    'JM/',
    'JRDN/',
    'JRWCLM/',
    'JFR>L/',
    'MDBR/',
    'MW<D/',
    'MZBX/',
    'MYRJM/',
    'MQWM/',
    'SBJB/',
    '<JR/',
    'FDH/',
    'CM',
    'CMJM/',
    'CMC/',
    'C<R/',
}
no_prs = {'absent', 'n/a'}

statclass = {
    'o': 'info',
    '+': 'good',
    '-': 'error',
    '?': 'warning',
    '!': 'special',
    '*': 'note',
}
# inverse lookup: status name -> status symbol
statsym = {name: sym for (sym, name) in statclass.items()}

def cert_status(cert):
    if cert == 0: return 'error'
    elif cert == 1: return 'warning'
    elif cert <= 10: return 'good'
    else: return 'special'

tsvfile = outfile('ntn.csv')
notefile = outfile('ntn-note.csv')
nresults = 0
nclauses = 0
orders = collections.Counter()
certs = collections.Counter()
tsvfile.write('book\tchapter\tverse\torder\tverb\tobject\tloc\tloc\tloc\tloc\tind\tind\tind\tcomplement text\tca_num\tclause text\n')
tsvfile.write('book\tchapter\tverse\torder\tverb\tobject\t# loc lexemes\t# topo\t# prep_b\tlocativity\t# prep_l\t# L prop\tindirect object\tcomplement text\tca_num\tclause text\n')
pclass = collections.Counter()
pclass['LI'] = 0
notefile.write('{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\n'.format(
    'version', 'book', 'chapter', 'verse', 'clause_atom', 'is_shared', 'is_published', 'status', 'keywords', 'ntext',
))
keywords = 'ntn-loca'
is_shared = 'T'
is_published = ''
status = statsym['info']
ntext_fmt = 'locative versus indirect object: L={} I={}; {}'

climit = 900

kws = ''.join(' {} '.format(k) for k in set(keywords.strip().split()))

for clause in F.otype.s('clause'):
    nclauses += 1
    phrases = {}
    order = ''
    verb = None
    for phrase in L.d('phrase', clause):
        pf = F.function.v(phrase)
        if pf in {'Pred', 'Objc', 'Cmpl'}:
            words = L.d('word', phrase)
            if pf not in phrases:
                order += pf[0]
                phrases[pf] = words
            else:
                phrases[pf].extend(words)
    is_ntn = False

    for w in phrases.get('Pred', []):
        if F.sp.v(w) == 'verb' and F.lex.v(w) == 'NTN[':
            is_ntn = True
            verb = w
            break
    if not is_ntn: continue
    nresults += 1    
    orders[order] += 1    

    book = F.book.v(L.u('book', verb))
    chapter = F.chapter.v(L.u('chapter', verb))
    verse = F.verse.v(L.u('verse', verb))    
    clause_atom = F.number.v(L.u('clause_atom', verb))
    
    verb_txt = F.g_word_utf8.v(verb)
    obj_txt = ''.join(F.g_word_utf8.v(x)+F.trailer_utf8.v(x) for x in phrases.get('Objc', []))
    cmpl_txt = ''.join(F.g_word_utf8.v(x)+F.trailer_utf8.v(x) for x in phrases.get('Cmpl', []))
    if len(cmpl_txt) > climit:
        cmpl_txt = cmpl_txt[0:climit]+'...'
    clause_txt = ''.join(F.g_word_utf8.v(x)+F.trailer_utf8.v(x) for x in L.d('word', clause))

    compl_wnodes = phrases.get('Cmpl', [])
    compl_lexemes = [F.lex.v(w) for w in compl_wnodes]
    compl_lset = set(compl_lexemes)
    lex_locativity = len(locative_lexemes & compl_lset)
    prep_b = len([x for x in compl_lexemes if x == 'B'])
    prep_l = len([x for x in compl_wnodes if F.lex.v(x) == 'L' and F.prs.v(x) not in no_prs])
    # count occurrences of L that are immediately followed by a proper name
    prep_lpr = 0
    for (wn, next_wn) in zip(compl_wnodes, compl_wnodes[1:]):
        if F.lex.v(wn) == 'L' and F.sp.v(next_wn) == 'nmpr':
            prep_lpr += 1
    topo = len([x for x in compl_wnodes if F.nametype.v(x) == 'topo'])

    loca = lex_locativity + topo + prep_b
    indi = prep_l + prep_lpr

    this_class = ''
    this_class += 'L' if loca else ''
    this_class += 'I' if indi else ''
    pclass[this_class] += 1
    
    tsvfile.write('{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}'.format(
        book, 
        chapter, 
        verse,
        order,
        verb_txt,
        obj_txt,
        lex_locativity,
        topo,
        prep_b,
        loca,
        prep_l,
        prep_lpr,
        indi,
        ' '.join(compl_lexemes),
        clause_atom,
        clause_txt,
    ).replace('\n', ' ')+'\n')
    
    ntext = ntext_fmt.format(loca, indi, cmpl_txt)
    certainty = abs(loca - indi) * max((loca, indi))
    certs[certainty] += 1
    status = statsym[cert_status(certainty)]
    notefile.write('{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}'.format(
        version, book, chapter, verse, clause_atom, is_shared, is_published, status, kws, ntext,
    ).replace('\n', ' ')+'\n')
    
tsvfile.close()
notefile.close()
for order in sorted(orders):
    print("{:<5}: {:>3} results".format(order, orders[order]))

for cert in sorted(certs):
    print("{:>5} = {:<8}: {:>3} results".format(cert, cert_status(cert), certs[cert]))

for this_class in pclass:
    print("{:<2}: {:>3} results".format(this_class, pclass[this_class]))
print('Total: {:>3} results in {} clauses'.format(nresults, nclauses))



CP   :  22 results
CPO  :  59 results
OCP  :  17 results
OP   :  32 results
OPC  : 139 results
P    :  60 results
PC   : 351 results
PCO  : 372 results
PO   : 200 results
POC  : 364 results
    0 = error   : 790 results
    2 = good    :   1 results
    4 = good    :  57 results
    9 = good    :   9 results
   16 = special :   5 results
   25 = special :   1 results
   36 = special :   2 results
   49 = special :   1 results
  156 = special :   6 results
  169 = special :   5 results
  196 = special :   2 results
  676 = special :   1 results
I : 497 results
  : 781 results
L : 322 results
LI:  16 results
Total: 1616 results in 87900 clauses
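
The certainty numbers in the listing above combine the two scores: the gap between locativity and indirect-objecthood, multiplied by the larger of the two. The sketch below restates that arithmetic outside the loop so it can be tried on a few score pairs. The function name `certainty` is introduced here for illustration; `cert_status` is copied from the cell above.

```python
def certainty(loca, indi):
    """Gap between the two scores times the larger score.

    The result is 0 when the scores are equal, including when both are 0.
    """
    return abs(loca - indi) * max(loca, indi)

def cert_status(cert):
    if cert == 0: return 'error'
    elif cert == 1: return 'warning'
    elif cert <= 10: return 'good'
    else: return 'special'

# (locativity, indirect object) pairs chosen for illustration
for (loca, indi) in [(0, 0), (1, 1), (0, 1), (2, 0), (2, 1), (4, 0)]:
    cert = certainty(loca, indi)
    print('L={} I={} certainty={:>2} status={}'.format(loca, indi, cert, cert_status(cert)))
```

Equal scores always yield `error`, whether the heuristics found nothing or found equally strong evidence on both sides, so the `error` cases are the ones that most need manual inspection.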


In [27]:
!head -n 10 {my_file('ntn.csv')}

book	chapter	verse	order	verb	object	loc	loc	loc	loc	ind	ind	ind	complement text	ca_num	clause text
book	chapter	verse	order	verb	object	# loc lexemes	# topo	# prep_b	locativity	# prep_l	# L prop	indirect object	complement text	ca_num	clause text
Genesis	1	17	POC	יִּתֵּ֥ן	אֹתָ֛ם 	1	0	1	2	0	0	0	B RQJ</ H CMJM/	67	וַיִּתֵּ֥ן אֹתָ֛ם אֱלֹהִ֖ים בִּרְקִ֣יעַ הַשָּׁמָ֑יִם 
Genesis	1	29	PCO	נָתַ֨תִּי	אֶת־כָּל־עֵ֣שֶׂב ׀ וְאֶת־כָּל־הָעֵ֛ץ 	0	0	0	0	1	0	1	L	121	הִנֵּה֩ נָתַ֨תִּי לָכֶ֜ם אֶת־כָּל־עֵ֣שֶׂב ׀ וְאֶת־כָּל־הָעֵ֛ץ 
Genesis	3	6	PC	תִּתֵּ֧ן		0	0	0	0	0	0	0	GM L >JC/	258	וַתִּתֵּ֧ן גַּם־לְאִישָׁ֛הּ עִמָּ֖הּ 
Genesis	3	12	PC	נָתַ֣תָּה		0	0	0	0	0	0	0	<MD/	285	אֲשֶׁ֣ר נָתַ֣תָּה עִמָּדִ֔י 
Genesis	3	12	PC	נָֽתְנָה		0	0	0	0	0	0	0	MN H <Y/	286	הִ֛וא נָֽתְנָה־לִּ֥י מִן־הָעֵ֖ץ 
Genesis	4	12	POC	תֵּת	כֹּחָ֖הּ 	0	0	0	0	1	0	1	L	385	תֵּת־כֹּחָ֖הּ לָ֑ךְ 
Genesis	9	2	CP	נִתָּֽנוּ		0	0	1	1	0	0	0	B JD/	762	בְּיֶדְכֶ֥ם נִתָּֽנוּ׃
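
The result file starts with two header lines: a grouping line (``loc``/``ind``) and the line with the actual column names. A minimal sketch of reading it back with the standard `csv` module follows. It works on an in-memory sample rather than the real file, the Hebrew text columns are replaced by placeholders, and the function name `read_results` is an assumption made for illustration.

```python
import csv
import io

# Stand-in for the top of ntn.csv; the text columns hold placeholders.
sample = (
    "book\tchapter\tverse\torder\tverb\tobject\tloc\tloc\tloc\tloc"
    "\tind\tind\tind\tcomplement text\tca_num\tclause text\n"
    "book\tchapter\tverse\torder\tverb\tobject\t# loc lexemes\t# topo\t# prep_b"
    "\tlocativity\t# prep_l\t# L prop\tindirect object\tcomplement text\tca_num\tclause text\n"
    "Genesis\t1\t17\tPOC\t<verb>\t<object>\t1\t0\t1\t2\t0\t0\t0\tB RQJ</ H CMJM/\t67\t<clause>\n"
    "Genesis\t1\t29\tPCO\t<verb>\t<object>\t0\t0\t0\t0\t1\t0\t1\tL\t121\t<clause>\n"
)

def read_results(handle):
    """Read the tab-separated results into a list of dicts, one per row."""
    rows = csv.reader(handle, delimiter='\t', quoting=csv.QUOTE_NONE)
    next(rows)          # skip the grouping header (loc / ind)
    names = next(rows)  # the second line holds the column names
    return [dict(zip(names, row)) for row in rows]

results = read_results(io.StringIO(sample))
# rows whose complement leans towards a locative reading
locative = [r for r in results if int(r['locativity']) > int(r['indirect object'])]
for r in locative:
    print(r['book'], r['chapter'], r['verse'], r['complement text'])
```

For the real file, pass an open file handle instead of the `io.StringIO` object.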



In [28]:
!head -n 10 {my_file('ntn-note.csv')}

version	book	chapter	verse	clause_atom	is_shared	is_published	status	keywords	ntext
4b	Genesis	1	17	67	T		+	ntn-loca	locative versus indirect object: L=2 I=0; בִּרְקִ֣יעַ הַשָּׁמָ֑יִם 
4b	Genesis	1	29	121	T		?	ntn-loca	locative versus indirect object: L=0 I=1; לָכֶ֜ם 
4b	Genesis	3	6	258	T		-	ntn-loca	locative versus indirect object: L=0 I=0; גַּם־לְאִישָׁ֛הּ 
4b	Genesis	3	12	285	T		-	ntn-loca	locative versus indirect object: L=0 I=0; עִמָּדִ֔י 
4b	Genesis	3	12	286	T		-	ntn-loca	locative versus indirect object: L=0 I=0; מִן־הָעֵ֖ץ 
4b	Genesis	4	12	385	T		?	ntn-loca	locative versus indirect object: L=0 I=1; לָ֑ךְ 
4b	Genesis	9	2	762	T		?	ntn-loca	locative versus indirect object: L=1 I=0; בְּיֶדְכֶ֥ם 
4b	Genesis	9	3	766	T		?	ntn-loca	locative versus indirect object: L=0 I=1; לָכֶ֖ם 
4b	Genesis	9	13	797	T		?	ntn-loca	locative versus indirect object: L=1 I=0; בֶּֽעָנָ֑ן 


Download the result files from my SURFdrive: a [tab-separated file](https://surfdrive.surf.nl/files/public.php?service=files&t=9c80db85478ae91a74cb3408a24a1b14) and a formatted [OpenOffice spreadsheet](https://surfdrive.surf.nl/files/public.php?service=files&t=38df9044eb5ac29a09a9daff81e1f276).

# Generate a manual annotation file

For each note we need:
1. Data version
1. Book
1. Chapter number
1. Verse number
1. Clause atom number
1. Status
1. Message
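
As a rough sketch, one note can be turned into one tab-separated line as shown below. The column order follows the header of `ntn-note.csv` above, which also carries `is_shared`, `is_published`, and `keywords` next to the fields listed here. The helper name `format_note` and the placeholder complement text are assumptions made for illustration.

```python
# Column order taken from the header line of ntn-note.csv.
NOTE_FIELDS = (
    'version', 'book', 'chapter', 'verse', 'clause_atom',
    'is_shared', 'is_published', 'status', 'keywords', 'ntext',
)

def format_note(**fields):
    """Return one tab-separated note line; missing fields become empty."""
    values = [str(fields.get(name, '')) for name in NOTE_FIELDS]
    # keep every note on a single line, as the notebook does
    return '\t'.join(values).replace('\n', ' ') + '\n'

line = format_note(
    version='4b',
    book='Genesis',
    chapter=1,
    verse=17,
    clause_atom=67,
    is_shared='T',
    status='+',
    keywords='ntn-loca',
    ntext='locative versus indirect object: L=2 I=0; <complement text>',
)
print(line, end='')
```

Leaving out `is_published` yields an empty column, which matches the empty value used in the notebook.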