This is an exercise based on a question by Stephen Ku:

>I'm searching for direct objects within an `Ellp` clause whose `Pred` is not in the immediately preceding clause.

>My logic is that if there is already a verb in the immediately preceding clause, then any verb found in a clause prior to the immediately preceding clause is probably not for the direct object in the `Ellp`.

> The query below searches for two `Pred`s in two clauses preceding an `Ellp` in which the direct object is found.

>The assumption is that in this scenario, the first `Pred` does not go with the direct object, but this assumption is not always true.

>Is there a way to search more accurately for a `Pred`/`Objc` pair in which the `Pred` and `Objc` can be several clauses apart (even when there is an intervening `Pred`) and the results are always correct? Can `mother` be used somehow for this?

In [None]:
import collections
from tf.fabric import Fabric

In [1]:
query1 = """
sentence
  c1:clause
    phrase function=Pred
      word pdp=verb
  c2:clause
    phrase function=Pred
  c3:clause typ=Ellp
    phrase function=Objc
      word pdp=subs|nmpr|prps|prde|prin
  c1 << c2
  c2 << c3
"""

Let's see what is happening here.

# Load data
We load some features of the
[BHSA](https://github.com/etcbc/bhsa) data.
See the [feature documentation](https://etcbc.github.io/bhsa/features/hebrew/2017/0_home.html) for more info.

In [17]:
BHSA = "BHSA/tf/2017"

TF = Fabric(locations="~/github/etcbc", modules=BHSA)
api = TF.load(
    """
    function
    mother
"""
)
api.makeAvailableIn(globals())

This is Text-Fabric 3.2.2
Api reference : https://github.com/Dans-labs/text-fabric/wiki/Api
Tutorial      : https://github.com/Dans-labs/text-fabric/blob/master/docs/tutorial.ipynb
Example data  : https://github.com/Dans-labs/text-fabric-data

115 features found and 0 ignored
  0.00s loading features ...
   |     0.09s B function             from /Users/dirk/github/etcbc/BHSA/tf/2017
   |     3.38s B mother               from /Users/dirk/github/etcbc/BHSA/tf/2017
   |     0.00s Feature overview: 109 for nodes; 5 for edges; 1 configs; 7 computed
  9.97s All features loaded/computed - for details use loadLog()


In [8]:
results = list(S.search(query1))

In [9]:
len(results)

1410

In [11]:
for r in results[0:10]:
    print(S.glean(r))

sentence[כָּל־קֳבֵ֗ל דִּי֩ מִן־...] clause[כָּל־קֳבֵ֗ל דִּי֩ מִן־...] phrase[שְׁלִ֔יחַ ] שְׁלִ֔יחַ  clause[לְבַקָּרָ֥א עַל־יְה֖וּד וְ...] phrase[לְבַקָּרָ֥א ] clause[וְכֹל֙ כְּסַ֣ף וּדְהַ֔ב ] phrase[כֹל֙ כְּסַ֣ף וּדְהַ֔ב ] דְהַ֔ב 
sentence[כָּל־קֳבֵ֗ל דִּי֩ מִן־...] clause[כָּל־קֳבֵ֗ל דִּי֩ מִן־...] phrase[שְׁלִ֔יחַ ] שְׁלִ֔יחַ  clause[וּלְהֵיבָלָ֖ה כְּסַ֣ף וּ...] phrase[לְהֵיבָלָ֖ה ] clause[וְכֹל֙ כְּסַ֣ף וּדְהַ֔ב ] phrase[כֹל֙ כְּסַ֣ף וּדְהַ֔ב ] דְהַ֔ב 
sentence[כָּל־קֳבֵ֗ל דִּי֩ מִן־...] clause[כָּל־קֳבֵ֗ל דִּי֩ מִן־...] phrase[שְׁלִ֔יחַ ] שְׁלִ֔יחַ  clause[דִּֽי־מַלְכָּ֣א וְיָעֲטֹ֗והִי הִתְנַדַּ֨בוּ֙ ...] phrase[הִתְנַדַּ֨בוּ֙ ] clause[וְכֹל֙ כְּסַ֣ף וּדְהַ֔ב ] phrase[כֹל֙ כְּסַ֣ף וּדְהַ֔ב ] דְהַ֔ב 
sentence[כָּל־קֳבֵ֗ל דִּי֩ מִן־...] clause[כָּל־קֳבֵ֗ל דִּי֩ מִן־...] phrase[שְׁלִ֔יחַ ] שְׁלִ֔יחַ  clause[לְבַקָּרָ֥א עַל־יְה֖וּד וְ...] phrase[לְבַקָּרָ֥א ] clause[וְכֹל֙ כְּסַ֣ף וּדְהַ֔ב ] phrase[כֹל֙ כְּסַ֣ף וּדְהַ֔ב ] כֹל֙ 
sentence[כָּל־קֳבֵ֗ל דִּי֩ מִן־...] clause[כָּל־קֳבֵ֗ל דִּ
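
The first results share the same sentence, the same first clause and the same `Ellp` clause. They differ only in the middle clause or in the word picked from the `Objc` phrase, so the raw count contains many variations on the same `Ellp` clause. A minimal sketch of how the results could be grouped per `Ellp` clause is given here. It assumes that each result tuple follows the order of the query template, as the gleaned output suggests. The name `predsPerEllp` and the toy node numbers are hypothetical.

```python
import collections

# Toy stand-ins for search results; real ones would come from S.search(query1).
# Assumed tuple layout, following the query template:
# (sentence, c1, pred phrase, verb, c2, pred phrase, c3, objc phrase, objc word)
toyResults = [
    (1, 10, 100, 1000, 11, 110, 13, 130, 1300),
    (1, 10, 100, 1000, 12, 120, 13, 130, 1300),
    (1, 10, 100, 1000, 11, 110, 13, 130, 1301),
    (2, 20, 200, 2000, 21, 210, 22, 220, 2200),
]

# Positions of the three clauses in each result tuple (an assumption).
C1, C2, C3 = 1, 4, 6


def predsPerEllp(results):
    """Map each Ellp clause to the set of (c1, c2) clause pairs found before it."""
    grouped = collections.defaultdict(set)
    for r in results:
        grouped[r[C3]].add((r[C1], r[C2]))
    return grouped


grouped = predsPerEllp(toyResults)
for (ellp, pairs) in sorted(grouped.items()):
    print(f"Ellp clause {ellp}: {len(pairs)} preceding clause pair(s)")
```

In the notebook, `predsPerEllp(results)` would show how many distinct `Ellp` clauses are behind the raw result count.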

# Mothers and phrases

Just to make sure: are there mother edges that arrive at an `Objc` phrase or at a `Pred` phrase?
Let's explore the distribution of mother edges.
First we need to see between what node types they occur, and then we see between what
phrase functions they occur.

We start with a generic function that shows the distribution of a data set.
Think of the data set as a mapping from nodes to sets of nodes.
If we have a property of nodes, we want to see how many times nodes with property value 1
are mapped to nodes with property value 2.

The next function takes `data`, a mapping from nodes to sets of nodes, and `dataKey`, a *function* that gives a value for each node.

In [35]:
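def showDist(data, dataKey):
    """Print how often a node with one key value maps to a node with another."""
    # Tally the (key of source node, key of target node) pairs over all edges.
    dataDist = collections.Counter()
    for (p, ms) in data.items():
        for m in ms:
            dataDist[(dataKey(p), dataKey(m))] += 1
    # Most frequent pairs first; ties are broken by the pair itself.
    for (combi, amount) in sorted(
        dataDist.items(),
        key=lambda x: (-x[1], x[0]),
    ):
        print(f'{amount:>3} x {" => ".join(str(s) for s in combi)}')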
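
To see the tallying on a small scale, here is a sketch on toy data that needs no corpus. The helper `pairCounts` is a hypothetical variant of `showDist` that returns the counter instead of printing it. The node numbers and types are made up.

```python
import collections


def pairCounts(data, dataKey):
    """Count (key of source, key of target) pairs, as showDist does, but return them."""
    dist = collections.Counter()
    for (p, ms) in data.items():
        for m in ms:
            dist[(dataKey(p), dataKey(m))] += 1
    return dist


# Toy mapping from nodes to sets of nodes, and a toy node type per node.
toyData = {1: {2, 3}, 4: {2}, 5: set()}
toyType = {1: "clause", 2: "phrase", 3: "clause", 4: "clause", 5: "word"}

# A dict's get method serves as the dataKey function.
dist = pairCounts(toyData, toyType.get)
for (combi, amount) in sorted(dist.items(), key=lambda x: (-x[1], x[0])):
    print(f'{amount:>3} x {" => ".join(combi)}')
```

Node 5 maps to the empty set, so it contributes nothing to the tally.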

First we get the distribution of node types between which mother edges may occur.

In [36]:
allMothers = {}

for n in N():
    allMothers[n] = set(E.mother.f(n))

showDist(allMothers, F.otype.v)

89580 x clause_atom => clause_atom
34883 x subphrase => word
22009 x subphrase => subphrase
13907 x clause => clause
12497 x phrase_atom => phrase_atom
5716 x clause => phrase
1835 x phrase_atom => word
1167 x clause => word
506 x phrase => phrase
 51 x phrase => clause
  8 x phrase => word


Given that there are more than 250,000 phrases, it is clear that the mother relation is only used very sparsely among phrases. We are going to show how many times there are mothers between phrases with specified `function`s.

Probably, we should look at phrase_atoms.

In [37]:
phraseMothers = collections.defaultdict(set)

for p in F.otype.s("phrase_atom"):
    mothers = E.mother.f(p)
    for m in mothers:
        if F.otype.v(m) == "phrase_atom":
            phraseMothers[p].add(m)

In [38]:
len(phraseMothers)

12497

In [39]:
showDist(phraseMothers, F.function.v)

12497 x None => None


Ah, phrase_atoms do not have functions!
We just take the function of the phrase they are contained in.

In [40]:
def getFunction(pa):
    return F.function.v(L.u(pa, otype="phrase")[0])

In [41]:
showDist(phraseMothers, getFunction)

3875 x Subj => Subj
2502 x Objc => Objc
2278 x Cmpl => Cmpl
1598 x PreC => PreC
917 x Adju => Adju
451 x Time => Time
312 x Loca => Loca
244 x Frnt => Frnt
202 x Voct => Voct
 62 x Modi => Modi
 23 x PreO => PreO
 19 x PrAd => PrAd
  8 x PreS => PreS
  4 x PrcS => PrcS
  1 x PtcO => PtcO
  1 x Ques => Ques


That's a pity. Every edge links two phrase_atoms whose phrases have the same function, so it seems that the mother edges between phrase_atoms only link phrase_atoms in the same phrase.
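
Equal functions do not strictly prove that both phrase_atoms sit in the same phrase. A minimal sketch of a direct check follows, shown on toy data. The helper `crossingEdges` and the toy mappings are hypothetical.

```python
def crossingEdges(mothers, containerOf):
    """Return the (daughter, mother) pairs whose ends lie in different containers."""
    return [
        (d, m)
        for (d, ms) in mothers.items()
        for m in sorted(ms)
        if containerOf(d) != containerOf(m)
    ]


# Toy data: node -> set of mothers, and node -> containing phrase.
toyMothers = {1: {2}, 3: {4}, 5: {2}}
toyPhrase = {1: "A", 2: "A", 3: "B", 4: "C", 5: "A"}

crossing = crossingEdges(toyMothers, toyPhrase.get)
print(crossing)
```

In the notebook one could call `crossingEdges(phraseMothers, lambda pa: L.u(pa, otype="phrase")[0])`. An empty list would confirm that no mother edge between phrase_atoms crosses a phrase boundary.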

Let's have a look at the mothers between phrases after all.

In [42]:
phraseMothers = collections.defaultdict(set)

for p in F.otype.s("phrase"):
    mothers = E.mother.f(p)
    for m in mothers:
        if F.otype.v(m) == "phrase":
            phraseMothers[p].add(m)

In [43]:
showDist(phraseMothers, F.function.v)

123 x Subj => Frnt
 72 x PrAd => Objc
 66 x PrAd => Subj
 59 x PrAd => Pred
 32 x PreO => Frnt
 31 x PrAd => PreO
 22 x PreC => Frnt
 20 x Modi => Frnt
 17 x Cmpl => Frnt
 14 x Objc => Frnt
 11 x Pred => Frnt
 10 x IntS => Frnt
  6 x NCoS => Frnt
  4 x PrAd => Cmpl
  3 x PrAd => Rela
  2 x Adju => Frnt
  2 x Loca => Frnt
  2 x PrAd => PreC
  2 x Time => Frnt
  1 x Conj => Frnt
  1 x ExsS => Frnt
  1 x ModS => Frnt
  1 x PrAd => Adju
  1 x PrAd => Conj
  1 x PrAd => IntS
  1 x PrAd => Modi
  1 x Ques => Frnt


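All cases have to do with `Frnt` and `PrAd`: either the mother is a `Frnt` phrase or the daughter is a `PrAd` phrase, and no edge links a `Pred` to an `Objc`. So the mother edge is definitely not helping out with Stephen's original question.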
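
As a sanity check, the distribution printed above can be transcribed and tested mechanically. The sketch below assumes that each printed line reads as daughter function => mother function, which is how `showDist` builds its pairs from `E.mother.f`. The variable names are made up for the illustration.

```python
# (daughter function, mother function) -> count, transcribed from the output above.
phraseDist = {
    ("Subj", "Frnt"): 123, ("PrAd", "Objc"): 72, ("PrAd", "Subj"): 66,
    ("PrAd", "Pred"): 59, ("PreO", "Frnt"): 32, ("PrAd", "PreO"): 31,
    ("PreC", "Frnt"): 22, ("Modi", "Frnt"): 20, ("Cmpl", "Frnt"): 17,
    ("Objc", "Frnt"): 14, ("Pred", "Frnt"): 11, ("IntS", "Frnt"): 10,
    ("NCoS", "Frnt"): 6, ("PrAd", "Cmpl"): 4, ("PrAd", "Rela"): 3,
    ("Adju", "Frnt"): 2, ("Loca", "Frnt"): 2, ("PrAd", "PreC"): 2,
    ("Time", "Frnt"): 2, ("Conj", "Frnt"): 1, ("ExsS", "Frnt"): 1,
    ("ModS", "Frnt"): 1, ("PrAd", "Adju"): 1, ("PrAd", "Conj"): 1,
    ("PrAd", "IntS"): 1, ("PrAd", "Modi"): 1, ("Ques", "Frnt"): 1,
}

# The total should equal the number of phrase => phrase mother edges.
total = sum(phraseDist.values())

# Pairs that have neither a PrAd daughter nor a Frnt mother.
others = [p for p in phraseDist if p[0] != "PrAd" and p[1] != "Frnt"]

# Pairs that link a Pred and an Objc directly, in either direction.
predObjc = [p for p in phraseDist if set(p) == {"Pred", "Objc"}]

print(total, others, predObjc)
```

The total matches the 506 `phrase => phrase` edges counted earlier, and both lists come out empty.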