<a href="http://www.persistent-identifier.nl/?identifier=urn%3Anbn%3Anl%3Aui%3A13-048i-71" target="_blank"><img align="left" src="images/etcbc4easy-small.png"/></a>
<a href="http://laf-fabric.readthedocs.org/en/latest/" target="_blank"><img align="left" src="images/laf-fabric-xsmall.png"/></a>
<a href="http://www.godgeleerdheid.vu.nl/etcbc" target="_blank"><img align="right" src="images/VU-ETCBC-xsmall.png"/></a>

# Verbless mothers

Joint work of Martijn Naaijer, Dirk Roorda, and Constantijn Sikkel.

While preparing his contribution to SBL 2015, Martijn Naaijer discovered coding errors in the ETCBC4b database.
Here we collect those instances.

# Specification

There are pairs of mother-daughter clauses, where the daughter is an object clause and the mother has no verbs.
That is strange, because the object clause should function as a direct object in the main clause.
In this notebook we look up all those cases in two ways: 

1. by means of walking programmatically through the data and collecting all instances,
   [see note set in SHEBANQ](https://shebanq.ancient-data.org/hebrew/text?id=Mnx2ZXJibGVzc19tb3RoZXI_&page=1&version=4b&mr=r&qw=n&tp=txt_tb1&tr=hb),
1. by means of an MQL query,
   [see query in SHEBANQ](https://shebanq.ancient-data.org/hebrew/query?version=4b&id=981).

# Results

There are 26 results, and they all seem to involve some sort of coding error.

# Discussion

The two methods for collecting the cases do not give the same result set.
The walk yields 26 results; the query yields 25 of them and misses 1.
The missed case involves a gap in one of the clauses.
So the query should be reformulated to allow mothers and daughters to have a more general
spatial relationship.

This may serve as a warning of how tricky it is to write MQL queries that you can be confident cover all cases.
The walk is much more robust, in that you can carefully collect the parts of the solution in isolation.
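
The discrepancy can be reproduced on toy data. The sketch below is not the notebook's code: the clause ids, word positions and the `mother` field are invented for illustration, and it assumes that the `..` in an MQL sequence only matches when the second clause begins after the first one has ended, which is how we read the two queries used in this notebook.

```python
# Toy clauses: word positions, relation, kind and the id of the mother clause.
# All ids and positions are hypothetical; only 'Objc' and 'NC' are real feature values.
clauses = {
    'm1': {'words': {1, 2, 3}, 'rela': None, 'kind': 'NC', 'mother': None},
    'd1': {'words': {4, 5}, 'rela': 'Objc', 'kind': 'VC', 'mother': 'm1'},
    # m2 has a gap at positions 8-9, and its daughter d2 sits inside that gap
    'm2': {'words': {6, 7, 10, 11}, 'rela': None, 'kind': 'NC', 'mother': None},
    'd2': {'words': {8, 9}, 'rela': 'Objc', 'kind': 'VC', 'mother': 'm2'},
}

def precedes(a, b):
    """True when clause a ends before clause b starts."""
    return max(clauses[a]['words']) < min(clauses[b]['words'])

def walk():
    """Follow the mother link of every object clause, ignoring word positions."""
    return {
        (cd, info['mother'])
        for (cd, info) in clauses.items()
        if info['rela'] == 'Objc'
        and info['mother'] is not None
        and clauses[info['mother']]['kind'] == 'NC'
    }

def sequence_query():
    """Keep only the pairs in which one clause lies entirely before the other."""
    return {(cd, cm) for (cd, cm) in walk() if precedes(cm, cd) or precedes(cd, cm)}

missed = walk() - sequence_query()
print(sorted(missed))
```

In this toy setting the walk finds both pairs, while the sequence-based selection misses the pair whose daughter is embedded in the gap of its mother, because neither clause lies entirely before the other.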

In [17]:
# discrepancies between the two methods
HTML(''.join(results_diff_html))

0,1,2
1,Judices 7:12,כַּחֹ֛ול שֶׁעַל־שְׂפַ֥ת הַיָּ֖ם לָרֹֽב׃


In [18]:
# all results
HTML(''.join(results_html))

0,1,2
1,Genesis 33:13,אֲדֹנִ֤י יֹדֵ֨עַ֙ כִּֽי־הַיְלָדִ֣ים רַכִּ֔ים וְהַצֹּ֥אן וְהַבָּקָ֖ר עָלֹ֣ות עָלָ֑י
2,Genesis 43:27,הֲשָׁלֹ֛ום אֲבִיכֶ֥ם הַזָּקֵ֖ן אֲשֶׁ֣ר אֲמַרְתֶּ֑ם
3,Genesis 43:29,הֲזֶה֙ אֲחִיכֶ֣ם הַקָּטֹ֔ן אֲשֶׁ֥ר אֲמַרְתֶּ֖ם אֵלָ֑י
4,Exodus 5:2,מִ֤י יְהוָה֙ אֲשֶׁ֣ר אֶשְׁמַ֣ע בְּקֹלֹ֔ו לְשַׁלַּ֖ח אֶת־יִשְׂרָאֵ֑ל
5,Exodus 6:26,ה֥וּא אַהֲרֹ֖ן וּמֹשֶׁ֑ה אֲשֶׁ֨ר אָמַ֤ר יְהוָה֙ לָהֶ֔ם
6,Exodus 8:17,וְגַ֥ם הָאֲדָמָ֖ה אֲשֶׁר־הֵ֥ם עָלֶֽיהָ׃
7,Exodus 9:26,רַ֚ק בְּאֶ֣רֶץ גֹּ֔שֶׁן אֲשֶׁר־שָׁ֖ם בְּנֵ֣י יִשְׂרָאֵ֑ל
8,Exodus 16:15,ה֣וּא הַלֶּ֔חֶם אֲשֶׁ֨ר נָתַ֧ן יְהוָ֛ה לָכֶ֖ם לְאָכְלָֽה׃
9,Exodus 16:16,זֶ֤ה הַדָּבָר֙ אֲשֶׁ֣ר צִוָּ֣ה יְהוָ֔ה
10,Exodus 16:32,זֶ֤ה הַדָּבָר֙ אֲשֶׁ֣ר צִוָּ֣ה יְהוָ֔ה


# Firing up the engines

In [1]:
import sys,os
import collections

import laf
from laf.fabric import LafFabric
from etcbc.preprocess import prepare
from etcbc.mql import MQL
fabric = LafFabric()

  0.00s This is LAF-Fabric 4.5.4
API reference: http://laf-fabric.readthedocs.org/en/latest/texts/API-reference.html
Feature doc: https://shebanq.ancient-data.org/static/docs/featuredoc/texts/welcome.html



# Loading the data

In [2]:
source = 'etcbc'
version = '4b'

In [3]:
API = fabric.load(source+version, '--', 'verblessmothers', {
    "xmlids": {"node": False, "edge": False},
    "features": ('''
        otype oid 
        sp typ rela kind
        g_word_utf8 trailer_utf8
        book chapter verse number
    ''','''
        mother
    '''),
    "prepare": prepare,
    "primary": False,
}, verbose='NORMAL')
exec(fabric.localnames.format(var='fabric'))

  0.00s LOADING API: please wait ... 
  0.00s INFO: USING DATA COMPILED AT: 2015-11-02T15-08-56
  5.04s LOGFILE=/Users/dirk/SURFdrive/laf-fabric-output/etcbc4b/verblessmothers/__log__verblessmothers.txt
    15s ETCBC reference: http://laf-fabric.readthedocs.org/en/latest/texts/ETCBC-reference.html
  0.00s LOADING API with EXTRAs: please wait ... 
  0.00s INFO: USING DATA COMPILED AT: 2015-11-02T15-08-56
  0.01s INFO: DATA LOADED FROM SOURCE etcbc4b AND ANNOX -- FOR TASK verblessmothers AT 2015-11-17T14-21-52
  0.00s INFO: DATA LOADED FROM SOURCE etcbc4b AND ANNOX -- FOR TASK verblessmothers AT 2015-11-17T14-21-52


# Check on clause kind

We are going to use the
[kind](https://shebanq.ancient-data.org/shebanq/static/docs/featuredoc/features/comments/kind.html) 
feature on clauses, which is related to the
[typ](https://shebanq.ancient-data.org/shebanq/static/docs/featuredoc/features/comments/typ.html)
feature.
We perform a routine check to see whether the indicated relationship between ``kind`` values and ``typ`` values 
holds for all clauses.

In [4]:
msg("Checking")
kinds = {
    'VC': {
        'InfA','InfC','Ptcp','Way0','WayX','WIm0','WImX','WQt0','WQtX','WxI0',
        'WXIm','WxIX','WxQ0','WXQt','WxQX','WxY0','WXYq','WxYX','WYq0','WYqX',
        'xIm0','XImp','xImX','xQt0','XQtl','xQtX','xYq0','XYqt','xYqX','ZIm0',
        'ZImX','ZQt0','ZQtX','ZYq0','ZYqX',
    },
    'NC': {'AjCl','NmCl'},
    'WP': {'CPen','Ellp','MSyn','Reop','Voct','XPos'},
}

errors = collections.defaultdict(set)
for n in F.otype.s('clause'):
    typ = F.typ.v(n)
    kind = F.kind.v(n)
    if typ not in kinds.get(kind, set()):
        errors[(kind, typ)].add(n)
        
if errors:
    msg("There were errors")
else:
    msg("All typ values have been correctly assigned to kind values")

    15s Checking
    16s All typ values have been correctly assigned to kind values


# Making a mother-daughter clause index

In [5]:
mothers_objc = collections.defaultdict(set)
msg('Looking for mothers ...')
for cd in F.otype.s('clause'):
    if F.rela.v(cd) != 'Objc': continue
    for cm in C.mother.v(cd): mothers_objc[cd].add(cm)
msg('Done')

 1m 18s Looking for mothers ...
 1m 19s Done


# Gathering verbless mothers of object clauses

The first idea is that when a clause does not have a verb, it cannot have a subordinate object clause.
However, there are cases of verbless clauses that can (correctly) have object clauses, such as
clauses where the verb is missing because of ellipsis. There are a few other possible causes as well.

The relevant criterion turns out to be: the clause
[kind](https://shebanq.ancient-data.org/shebanq/static/docs/featuredoc/features/comments/kind.html)
feature is ``NC``.

So, we look at *nominal* mothers rather than *verbless* mothers.
Note that there are nominal mothers that are not verbless.
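
To make the distinction concrete, here is a minimal sketch on toy data. The `Clause` records and their names are hypothetical and not taken from the database; the sketch only assumes that a clause has a `kind` value and that its words carry part-of-speech values such as `verb` and `subs`.

```python
from collections import namedtuple

# Hypothetical clause records: a label, the clause kind, and the parts of speech of its words
Clause = namedtuple('Clause', 'name kind parts_of_speech')

clauses = [
    Clause('participle clause', 'NC', ['subs', 'verb']),     # nominal, yet contains a verb form
    Clause('plain nominal clause', 'NC', ['prps', 'subs']),  # nominal and verbless
    Clause('elliptic clause', 'WP', ['conj', 'subs']),       # verbless, but not nominal
    Clause('verbal clause', 'VC', ['verb', 'subs']),         # neither
]

def is_verbless(c):
    """A clause is verbless when none of its words is a verb."""
    return 'verb' not in c.parts_of_speech

def is_nominal(c):
    """A clause is nominal when its kind is NC."""
    return c.kind == 'NC'

verbless = {c.name for c in clauses if is_verbless(c)}
nominal = {c.name for c in clauses if is_nominal(c)}

print(sorted(nominal - verbless))  # nominal clauses that do contain a verb
print(sorted(verbless - nominal))  # verbless clauses that are not nominal
```

Neither set contains the other in this toy example, which is why the two criteria have to be tracked separately.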

In [6]:
no_mothers = set()
verbless_mothers = set()
nominal_mothers = set()
multiple_sentence = set()

def is_verbless(c): return 'verb' not in {F.sp.v(w) for w in L.d('word', c)}
def is_nominal(c): return F.kind.v(c) == 'NC'

msg('Looking for verbless mothers ...')
for cd in F.otype.s('clause'):
    if F.rela.v(cd) != 'Objc': continue
    if cd not in mothers_objc:
        no_mothers.add(cd)
        continue
    for cm in mothers_objc[cd]:
        if is_verbless(cm): verbless_mothers.add((cd, cm))
        if is_nominal(cm): nominal_mothers.add((cd, cm))
for (cd, cm) in verbless_mothers | nominal_mothers:
    if L.u('sentence', cd) != L.u('sentence', cm):
        multiple_sentence.add((cd, cm))
msg('{} {}; {} {}; {} {}; of which {} {}'.format(
    len(no_mothers), 'objc-clauses without mother',
    len(verbless_mothers), 'objc-clauses with verbless mother',
    len(nominal_mothers), 'objc-clauses with nominal mother',
    len(multiple_sentence), 'in a different sentence',
))
if nominal_mothers <= verbless_mothers:
    msg('All nominal mothers are also verbless')
else:
    nom_with_verb = nominal_mothers - verbless_mothers
    msg('Some nominal mothers are not verbless: {}x'.format(len(nom_with_verb)))
    for (cd, cm) in nom_with_verb:
        words = L.d('word', cm)
        fw = words[0]
        passage = '{} {}:{}'.format(
            F.book.v(L.u('book', fw)),
            F.chapter.v(L.u('chapter', fw)),
            F.verse.v(L.u('verse', fw)),
        )
        text = ''.join('{}{}'.format(F.g_word_utf8.v(w), F.trailer_utf8.v(w)) for w in words)
        print('{} {}\n'.format(passage, text))

 1m 48s Looking for verbless mothers ...
 1m 49s 0 objc-clauses without mother; 52 objc-clauses with verbless mother; 26 objc-clauses with nominal mother; of which 0 in a different sentence
 1m 49s Some nominal mothers are not verbless: 1x


Genesis 33:13 אֲדֹנִ֤י יֹדֵ֨עַ֙ 



# Pretty printing mother-daughter clauses

In [7]:
from IPython.display import HTML, display

In [8]:
css = '''
<style type="text/css">
.m {
  background-color: #ffaaaa;
}
.d {
  background-color: #ccccff;
}
.v {
    font-family: Verdana, Arial, sans-serif;
    font-size: small;
    text-align: right;
    color: #aaaaaa;
    width: 10%;
    direction: ltr;
    border-left: 2px solid #aaaaaa;
    border-right: 2px solid #aaaaaa;
}
.l {
    font-family: Verdana, Arial, sans-serif;
    font-size: medium;
    text-align: center;
}
.t {
    font-family: Ezra SIL, SBL Hebrew, Verdana, sans-serif;
    font-size: x-large;
    line-height: 1.7;
    text-align: right;
    direction: rtl;
    border-left: 2px solid #aaaaaa;
    border-right: 2px solid #aaaaaa;
}
table.t {
    width: 100%;
    direction: rtl;
    border-collapse: collapse;
}
td.t {
    text-align: right;
}
tr.t {
    border-top: 2px solid #aaaaaa;
    border-bottom: 2px solid #aaaaaa;
    border-left: 2px solid #aaaaaa;
    border-right: 2px solid #aaaaaa;
}
</style>
'''
head = '''
<html>
<head>
    <meta http-equiv="Content-Type"
          content="text/html; charset=UTF-8" />
    <title></title>
    {}
</head>
<body>
'''.format(css)
table_head = '''
<table class="t">
'''

table_tail = '''
</table>
'''

tail = '''
</body>
</html>
'''

legend = '''
<p class="l"><span class="m">verbless mother clause</span> <span class="d">daughter object clause</span></p>
'''

def getverse(ca):
    fw = L.d('word', ca)[0]
    return (F.book.v(L.u('book', fw)),
            F.chapter.v(L.u('chapter', fw)),
            F.verse.v(L.u('verse', fw)),
    )

def print_cm(i, cd, cm):
    s = L.u('sentence', cd)
    passages = {getverse(x) for x in (s, cd, cm)}
    dwords = set(L.d('word', cd))
    mwords = set(L.d('word', cm))
    dmwords = dwords | mwords
    htext = []
    hcd = F.number.v(cd)
    hcm = F.number.v(cm)
    for w in L.d('word', s):
        hw = F.g_word_utf8.v(w)
        ht = F.trailer_utf8.v(w)
        if w in dmwords:
            htext.append('<span title="{}" class="{}">{}</span>{}'.format(
                hcd if w in dwords else hcm, 'd' if w in dwords else 'm', hw, ht,
            ))
        else:
            htext.append('{}{}'.format(hw, ht))
    return '<tr class="t"><td class="v">{}</td><td class="v">{}</td><td class="t">{}</td></tr>\n'.format(
        i, 
        ' - '.join('{} {}:{}'.format(*x) for x in sorted(passages, key=lambda x: (x[0], int(x[1]), int(x[2])))),
        ''.join(htext),
    )

In [9]:
h = []
#h.append(head)
h.append(legend)
h.append(table_head)
for (i, (cd, cm)) in enumerate(sorted(nominal_mothers)):
    h.append(print_cm(i+1, cd, cm))
h.append(table_tail)
#h.append(tail)

results_html = h

# Using an MQL query

In [10]:
Q = MQL(API)

In [11]:
mother_obj_query1 = '''
select all objects where
[sentence 
 [clause as clause_mother focus kind = NC]
 ..
 [clause focus mother = clause_mother.self and rela = Objc]
]
'''
mother_obj_query2 = '''
select all objects where
[sentence 
 [clause as clause_daughter focus rela = Objc
 ]
 ..
 [clause focus self = clause_daughter.mother and kind = NC]
]
'''

## Executing the query

In [12]:
sheaf1 = Q.mql(mother_obj_query1)
sheaf2 = Q.mql(mother_obj_query2)

## Collecting the results

In [13]:
nominal_mothers_q = set()
for ((s, ((cm,), (cd,))),) in sheaf1.results(): nominal_mothers_q.add((cd, cm))
for ((s, ((cd,), (cm,))),) in sheaf2.results(): nominal_mothers_q.add((cd, cm))
msg('{} results'.format(len(nominal_mothers_q)))

 2m 31s 25 results


# Examining the differences

In [14]:
print('''
Results by the MQL query that are not delivered by the walk: {}
Results by the walk that are not delivered by the MQL query: {}
'''.format(
    len(nominal_mothers_q - nominal_mothers),
    len(nominal_mothers - nominal_mothers_q),
))


Results by the MQL query that are not delivered by the walk: 0
Results by the walk that are not delivered by the MQL query: 1



## Pretty printing the differences

In [15]:
HTML(css)

In [16]:
h = []
#h.append(head)
h.append(legend)
h.append('''<h1>In MQL query but not in Walk</h1>''')
h.append(table_head)
for (i, (cd, cm)) in enumerate(sorted(nominal_mothers_q - nominal_mothers)):
    h.append(print_cm(i+1, cd, cm))
h.append(table_tail)
h.append('''<h1>In Walk but not in MQL query</h1>''')
h.append(table_head)
for (i, (cd, cm)) in enumerate(sorted(nominal_mothers - nominal_mothers_q)):
    h.append(print_cm(i+1, cd, cm))
h.append(table_tail)
#h.append(tail)

results_diff_html = h

# Generating notes

In [19]:
sfields = '''
    version
    book
    chapter
    verse
    clause_atom
    is_shared
    is_published
    status
    keywords
    ntext
'''.strip().split()

sfields_fmt = ('{}\t' * (len(sfields) - 1)) + '{}\n'

def generate_notes():
    nf = outfile('verbless_mothers.csv')
    nf.write('{}\n'.format('\t'.join(sfields)))
    for (cd, cm) in sorted(verbless_mothers):
        is_nom = (cd, cm) in nominal_mothers
        casd = L.d('clause_atom', cd)
        casm = L.d('clause_atom', cm)
        cald = len(casd)
        calm = len(casm)
        cad = casd[0]
        cam = casm[0]
        cnd = F.number.v(cad)
        cnm = F.number.v(cam)
        if is_nom: # in case of a nominal mother there is probably a coding error
            for (ca, cn, stat, txt) in (
                (cad, cnd, '-',
                 '''{} an object clause to a verbless mother clause. Possibly a coding error. [query](shebanq:?id=981&page=1&version=4b&mr=r&qw=q&tp=txt_p&tr=hb)'''.format(
                    'This is' if cald == 1 else 'This and {} more line{} form'.format(cald-1, '' if cald == 2 else 's'),
                )),
                (cam, cnm, '?',
                 '''{} the verbless mother of an object clause. [query](shebanq:?id=981&page=1&version=4b&mr=r&qw=q&tp=txt_p&tr=hb)'''.format(
                    'This is' if calm == 1 else 'This and {} more line{} form'.format(calm-1, '' if calm == 2 else 's'),
                )),
            ):
                (bk, ch, vs) = getverse(ca)
                nf.write(sfields_fmt.format(
                    version,
                    bk,
                    ch,
                    vs,
                    cn,
                    'T',
                    '',
                    stat,
                    'verbless_mother',
                    txt,
                ))
        else: # in case of a non-nominal mother this is not an indication that the coding is wrong
            for (ca, cn, stat, txt) in (
                (cad, cnd, '+',
                 '''{} an object clause to a verbless but non-nominal mother clause.'''.format(
                    'This is' if cald == 1 else 'This and {} more line{} form'.format(cald-1, '' if cald == 2 else 's'),
                )),
                (cam, cnm, '+',
                 '''{} the verbless but non-nominal mother of an object clause.'''.format(
                    'This is' if calm == 1 else 'This and {} more line{} form'.format(calm-1, '' if calm == 2 else 's'),
                )),
            ):
                (bk, ch, vs) = getverse(ca)
                nf.write(sfields_fmt.format(
                    version,
                    bk,
                    ch,
                    vs,
                    cn,
                    'T',
                    '',
                    stat,
                    'verbless_mother',
                    txt,
                ))
    nf.close()

generate_notes()