<img align="right" src="images/tf-small.png" width="128"/>
<img align="right" src="images/etcbc.png"/>
<img align="right" src="images/dans-small.png"/>

You might want to consider the [start](search.ipynb) of this tutorial.

Short introductions to other TF datasets:

* [Dead Sea Scrolls](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/dss.ipynb)
* [Old Babylonian Letters](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/oldbabylonian.ipynb)
* [Quran](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/quran.ipynb)


In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from tf.app import use

In [3]:
VERSION = "2017"

In [4]:
# A = use('bhsa', hoist=globals(), version=VERSION)
A = use("bhsa:clone", checkout="clone", hoist=globals(), version=VERSION)

# Quantifiers

Quantifiers add considerable power to search templates.

Quantifiers consist of full-fledged search templates themselves, and give rise to
auxiliary searches being performed.

In many cases, quantifiers save you from resorting to hand-coding.
That said, they can be exceedingly tricky, so it is advisable to check the results
by hand-coding anyway, until you are perfectly comfortable with them.
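
As a rough mental model, and not a description of how Text-Fabric implements them, the three kinds of quantifier can be read as filters on a candidate node: `/without/` keeps the node if an auxiliary search finds nothing, `/where/ ... /have/` keeps it if every match of the first auxiliary search also satisfies the second, and `/with/ ... /or/` keeps it if at least one alternative finds something. The sketch below mimics this in plain Python; the data and the helper names (`keep_without`, `keep_where_have`, `keep_with_or`) are invented for illustration.

```python
# Toy data: each candidate node has some "inner" items with a part of speech.
# All names and data here are hypothetical; this is not Text-Fabric code.
inner = {
    "n1": ["verb", "verb"],
    "n2": ["verb", "subs"],
    "n3": ["subs"],
    "n4": [],
}


def is_verb(x):
    return x == "verb"


def is_subs(x):
    return x == "subs"


def anything(x):
    return True


def keep_without(node, condition):
    """/without/: keep the node if no inner item satisfies the condition."""
    return not any(condition(x) for x in inner[node])


def keep_where_have(node, antecedent, consequent):
    """/where/ ... /have/: every item matching the antecedent must match the consequent."""
    return all(consequent(x) for x in inner[node] if antecedent(x))


def keep_with_or(node, *alternatives):
    """/with/ ... /or/: at least one alternative must have a match."""
    return any(any(alt(x) for x in inner[node]) for alt in alternatives)


no_verbs = sorted(n for n in inner if keep_without(n, is_verb))
only_verbs = sorted(n for n in inner if keep_where_have(n, anything, is_verb))
verb_or_subs = sorted(n for n in inner if keep_with_or(n, is_verb, is_subs))

print(no_verbs)      # ['n3', 'n4']
print(only_verbs)    # ['n1', 'n4']
print(verb_or_subs)  # ['n1', 'n2', 'n3']
```

Note that the empty node `n4` passes both `/without/` and `/where/ ... /have/`: with nothing inside, there is nothing that can violate the condition.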

# Examples

## Lexemes

It is easy to find the lexemes that occur in a specific book only,
because the `lex` node of such a lexeme is contained in the node of that specific book.

Let's get the lexemes specific to Ezra and then those specific to Nehemiah.

In [5]:
query = """
book book@en=Ezra
    lex
"""
ezLexemes = A.search(query)
ezSet = {r[1] for r in ezLexemes}

query = """
book book@en=Nehemiah
    lex
"""
nhLexemes = A.search(query)
nhSet = {r[1] for r in nhLexemes}

print(f"Total {len(ezSet | nhSet)} lexemes")

  0.01s 199 results
  0.01s 110 results
Total 309 lexemes


What if we want to have the lexemes that occur only in Ezra and Nehemiah?

A lexeme that occurs in both books is contained in neither of them,
so the two queries above have missed it.
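
To see why, recall that a node counts as contained in a book only if all of its slots lie inside that book. The sketch below illustrates this with invented slot numbers; the lexeme names and slot ranges are hypothetical, not BHSA data.

```python
# Hypothetical slot ranges for two books (not real BHSA slot numbers).
books = {
    "Ezra": set(range(1, 11)),       # slots 1..10
    "Nehemiah": set(range(11, 21)),  # slots 11..20
}

# Hypothetical lexemes with the slots of their occurrences.
lexemes = {
    "only_ezra": {2, 5},
    "only_nehemiah": {12},
    "both_books": {3, 15},
    "also_elsewhere": {4, 99},
}


def contained_in(slots, container):
    """A node is contained in a container if all its slots lie inside it."""
    return slots <= container


# What the two separate queries find: lexemes inside one single book.
per_book = {
    lex
    for lex, slots in lexemes.items()
    if any(contained_in(slots, book) for book in books.values())
}

# What we actually want: lexemes inside the two books taken together.
in_union = {
    lex
    for lex, slots in lexemes.items()
    if contained_in(slots, books["Ezra"] | books["Nehemiah"])
}

print(sorted(per_book))             # ['only_ezra', 'only_nehemiah']
print(sorted(in_union - per_book))  # ['both_books']
```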

We have to find a different way. Something like: search for lexemes of which all words occur either in Ezra or in Nehemiah.

With the template constructions you have seen so far, this is impossible to say.

This is where [*quantifiers*](https://annotation.github.io/text-fabric/about/searchusage.html#quantifiers) come in.

## /without/

First we are going to query for these lexemes by means of a `/without/` quantifier.

In [6]:
query = """
lex
/without/
book book@en#Ezra|Nehemiah
  w:word
  w ]] ..
/-/
"""
query1results = A.search(query, shallow=True)

  1.86s 382 results


## /where/

Now the `/without/` quantifier is a bit of a roundabout way to say what you really mean.
We can also employ the more positive `/where/` quantifier.
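
Both formulations select the same lexemes, because "there is no occurrence outside these books" and "every occurrence is inside these books" are logically equivalent. A small sketch with invented data (the lexeme names and their occurrences are hypothetical) checks the equivalence:

```python
# Hypothetical occurrences: lexeme -> books in which its words occur.
occurrences = {
    "lex_a": ["Ezra", "Ezra"],
    "lex_b": ["Ezra", "Nehemiah"],
    "lex_c": ["Nehemiah", "Genesis"],
    "lex_d": ["Genesis"],
}
target = {"Ezra", "Nehemiah"}

# "/without/" style: there is NO occurrence in a book outside the target.
without_style = {
    lex
    for lex, books in occurrences.items()
    if not any(book not in target for book in books)
}

# "/where/ ... /have/" style: EVERY occurrence is in a target book.
where_style = {
    lex
    for lex, books in occurrences.items()
    if all(book in target for book in books)
}

print(sorted(without_style))         # ['lex_a', 'lex_b']
print(without_style == where_style)  # True
```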

In [7]:
query = """
lex
/where/
  w:word
/have/
b:book book@en=Ezra|Nehemiah
w ]] b
/-/
"""
query2results = A.search(query, shallow=True)

  0.76s 382 results


Check by hand coding:

In [8]:
A.silentOff()
A.indent(reset=True)
universe = F.otype.s("lex")
wordsEzNh = set(
    L.d(T.bookNode("Ezra", lang="en"), otype="word")
    + L.d(T.bookNode("Nehemiah", lang="en"), otype="word")
)
handResults = set()
for lex in universe:
    occs = set(L.d(lex, otype="word"))
    if occs <= wordsEzNh:
        handResults.add(lex)
A.info(len(handResults))

  0.13s 382


Looks good, but we are thorough:

In [9]:
print(query1results == handResults)
print(query2results == handResults)

True
True


## Verb phrases

Let's look for clauses in which all `Pred` phrases contain only verbs, and look for the `Subj`
phrases in those clauses.

In [10]:
query = """
clause
/where/
  phrase function=Pred
/have/
  /without/
    word sp#verb
  /-/
/-/
  phrase function=Subj
"""
queryResults = A.search(query)

  1.45s 31399 results


In [11]:
A.show(queryResults, end=5, condenseType="sentence")

Note that the pieces of the template that belong to a quantifier do not correspond to nodes in the result tuples!

Check by hand:

In [12]:
A.indent(reset=True)
handResults = []
for clause in F.otype.s("clause"):
    phrases = L.d(clause, otype="phrase")
    preds = [p for p in phrases if F.function.v(p) == "Pred"]
    good = True
    for pred in preds:
        if any(F.sp.v(w) != "verb" for w in L.d(pred, otype="word")):
            good = False
    if good:
        subjs = [p for p in phrases if F.function.v(p) == "Subj"]
        for subj in subjs:
            handResults.append((clause, subj))
A.info(len(handResults))

  1.07s 31399


In [13]:
queryResults == handResults

True

### Inspection

We can see which templates are being composed in the course of interpreting the quantifier.
We use the good old `S.study()`:

In [14]:
query = """
clause
/where/
  phrase function=Pred
/have/
  /without/
    word sp#verb
  /-/
/-/
  phrase function=Subj
"""
S.study(query)

  0.00s Checking search template ...
  0.00s Setting up search space for 2 objects ...
   |     0.00s "Quantifier on "parent:clause"
   |      |   /where/
   |      |   parent:clause
   |      |     phrase function=Pred
   |      |     0.51s 57070 matching nodes
   |      |   /have/
   |      |   parent:clause
   |      |     phrase function=Pred
   |      |     /without/
   |      |       word sp#verb
   |      |     /-/
   |      |   /-/
   |      |      |   /without/
   |      |      |   parent:phrase function=Pred
   |      |      |     word sp#verb
   |      |      |   /-/
   |      |      |     0.90s 4893 nodes to exclude
   |      |     0.91s reduction from 57070 to 52177 nodes
  0.90s Constraining search space with 1 relations ...
  0.98s 	1 edges thinned
  0.98s Setting up retrieval plan with strategy small_choice_multi ...
  1.20s Ready to deliver results from 104354 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
   |      |     1.21

Observe the stepwise unraveling of the quantifiers, and the auxiliary templates that are distilled
from your original template.
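
The order of evaluation in this log can be mimicked by hand: first collect the `Pred` phrases, then let the inner `/without/` discard those that contain a non-verb, and finally keep the clauses whose `Pred` phrases all survived. The sketch below does this on a tiny invented corpus (all node names are hypothetical, and this is a reading of the log rather than the actual search algorithm) and rechecks the subtraction reported in the log.

```python
# Tiny invented corpus: clause -> list of (phrase, function, parts of speech).
corpus = {
    "c1": [("p1", "Pred", ["verb"]), ("p2", "Subj", ["subs"])],
    "c2": [("p3", "Pred", ["verb", "prep"]), ("p4", "Subj", ["subs"])],
    "c3": [("p5", "Subj", ["subs"])],
}

# Step 1 (/where/ part): all Pred phrases.
preds = {p for phrases in corpus.values() for p, fn, _ in phrases if fn == "Pred"}

# Step 2 (inner /without/): Pred phrases with a non-verb word are excluded.
excluded = {
    p
    for phrases in corpus.values()
    for p, fn, words in phrases
    if fn == "Pred" and any(sp != "verb" for sp in words)
}
surviving = preds - excluded

# Step 3 (outer /where/ ... /have/): keep clauses all of whose Pred phrases survive.
good_clauses = [
    c
    for c, phrases in corpus.items()
    if all(p in surviving for p, fn, _ in phrases if fn == "Pred")
]

print(sorted(surviving))  # ['p1']
print(good_clauses)       # ['c1', 'c3']

# The same subtraction with the numbers from the log above.
assert 57070 - 4893 == 52177
```

Clause `c3` has no `Pred` phrase at all, so it passes the `/where/` condition trivially.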

If you ever get syntax errors, run `S.study()` to find clues.

## Subject at start or at end

We want the clauses that consist of at least two adjacent phrases and have a `Subj` phrase that is either at the beginning or at the end of the clause.

In [15]:
query = """
c:clause
/with/
  =: phrase function=Subj
/or/
  := phrase function=Subj
/-/
  phrase
  <: phrase
"""

queryResults = sorted(A.search(query, shallow=True))

  1.15s 15332 results


Check by hand:

In [16]:
A.indent(reset=True)
handResults = []
for clause in F.otype.s("clause"):
    clauseWords = L.d(clause, otype="word")
    phrases = set(L.d(clause, otype="phrase"))
    if any(
        L.n(p, otype="phrase") and (L.n(p, otype="phrase")[0] in phrases)
        for p in phrases
    ):
        # handResults.append(clause)
        # continue
        subjPhrases = [p for p in phrases if F.function.v(p) == "Subj"]
        if any(L.d(p, otype="word")[0] == clauseWords[0] for p in subjPhrases) or any(
            L.d(p, otype="word")[-1] == clauseWords[-1] for p in subjPhrases
        ):
            handResults.append(clause)
A.info(len(handResults))

  2.16s 15332


This is a nice case where the search template (1.15s) performs better than this particular piece of hand-coding (2.16s).

In [17]:
queryResults == handResults

True

Let's also study this query:

In [18]:
S.study(query)

  0.00s Checking search template ...
  0.00s Setting up search space for 3 objects ...
   |     0.00s "Quantifier on "c:clause"
   |      |   /with/
   |      |   c:clause
   |      |     =: phrase function=Subj
   |      |     0.38s adding 5297 to 0 yields 5297 nodes
   |      |   /or/
   |      |   c:clause
   |      |     := phrase function=Subj
   |      |     0.42s adding 11118 to 5297 yields 15924 nodes
   |      |   /-/
   |     0.42s reduction from 88101 to 15924 nodes
  0.50s Constraining search space with 3 relations ...
  0.96s 	2 edges thinned
  0.96s Setting up retrieval plan with strategy small_choice_multi ...
  0.97s Ready to deliver results from 105062 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
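
The counts in this log also show how the alternatives of `/with/ ... /or/` are combined. The second alternative matches 11118 clauses, but the running total grows by less than that, so some clauses satisfy both alternatives. Reading "adding 11118 to 5297 yields 15924" as a set union, the size of the overlap follows from inclusion-exclusion. The snippet is plain arithmetic on the logged numbers; interpreting the overlap as clauses with a `Subj` phrase at both the start and the end is an inference, not something the log states.

```python
# Numbers copied from the S.study() log above.
subj_at_start = 5297  # clauses matching the first alternative (=:)
subj_at_end = 11118   # clauses matching the second alternative (:=)
either = 15924        # size of the union reported by the log

# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
both = subj_at_start + subj_at_end - either
print(both)  # 491
```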


## Verb-containing phrases

Suppose we want to collect all phrases with the condition that if they
contain a verb, their `function` is `Pred`.

This is a bit theoretical, but it shows two powerful constructs to increase readability
of quantifiers.
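
The condition above is an implication, and an implication holds vacuously when its premise is false: a phrase without any verb satisfies "if it contains a verb, its function is `Pred`" just as well as a `Pred` phrase does. A sketch on invented phrases (the names and feature values are hypothetical):

```python
# Invented phrases: name -> (function, parts of speech of its words).
phrases = {
    "ph1": ("Pred", ["verb"]),
    "ph2": ("Subj", ["subs"]),
    "ph3": ("PreC", ["verb", "subs"]),
    "ph4": ("Pred", ["subs"]),
}


def satisfies(function, words):
    """If the phrase contains a verb, then its function must be Pred."""
    has_verb = "verb" in words
    return (not has_verb) or function == "Pred"


selected = sorted(
    name for name, (function, words) in phrases.items() if satisfies(function, words)
)
print(selected)  # ['ph1', 'ph2', 'ph4']
```

Only `ph3` is rejected: it contains a verb but is not a `Pred` phrase.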

### Unreadable

First we express it without special constructs.

In [19]:
query = """
p:phrase
/where/
  w:word pdp=verb
/have/
q:phrase function=Pred
q = p
/-/
"""
results = A.search(query, shallow=True)

  1.04s 241232 results


We check the query by means of hand-coding:

1. is every result a phrase: either without verbs, or with function Pred?
2. is every phrase without verbs or with function Pred contained in the results?

In [20]:
allPhrases = set(F.otype.s("phrase"))

ok1 = all(
    F.function.v(p) == "Pred" or all(F.pdp.v(w) != "verb" for w in L.d(p, otype="word"))
    for p in results
)
ok2 = all(
    p in results
    for p in allPhrases
    if (
        F.function.v(p) == "Pred"
        or all(F.pdp.v(w) != "verb" for w in L.d(p, otype="word"))
    )
)

print(f"Check 1: {ok1}")
print(f"Check 2: {ok2}")

Check 1: True
Check 2: True


Ok, we are sure that the query does what we think it does.

### Readable

Now let's make it more readable.

In [21]:
query = """
phrase
/where/
  w:word pdp=verb
/have/
.. function=Pred
/-/
"""

In [22]:
results2 = A.search(query, shallow=True)

print(f"Same results as before? {results == results2}")

  0.92s 241232 results
Same results as before? True


Try to see how search gives the name `parent` to the `phrase` atom of the template and how it resolves the name `..`:

In [23]:
S.study(query)

  0.00s Checking search template ...
  0.00s Setting up search space for 1 objects ...
   |     0.00s "Quantifier on "parent:phrase"
   |      |   /where/
   |      |   parent:phrase
   |      |     w:word pdp=verb
   |      |     0.73s 69026 matching nodes
   |      |   /have/
   |      |   parent:phrase
   |      |     w:word pdp=verb
   |      |   parent function=Pred
   |      |   /-/
   |      |     1.01s 57070 matching nodes
   |      |     1.03s 11955 match antecedent but not consequent
   |     1.04s reduction from 253187 to 241232 nodes
  1.04s Constraining search space with 0 relations ...
  1.04s 	0 edges thinned
  1.04s Setting up retrieval plan with strategy small_choice_multi ...
  1.04s Ready to deliver results from 241232 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
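
The log makes the implication logic visible: only the phrases that match the antecedent but not the consequent are removed from the search space. A quick check with the logged numbers confirms that what remains equals the number of results reported by the search above:

```python
# Numbers copied from the S.study() log and the search output above.
search_space = 253187      # phrase nodes before the quantifier is applied
violating = 11955          # match the antecedent but not the consequent
reported_results = 241232  # result count reported by A.search()

remaining = search_space - violating
print(remaining)                      # 241232
print(remaining == reported_results)  # True
```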


# All steps

* **[start](start.ipynb)** your first step in mastering the Bible computationally
* **[display](display.ipynb)** become an expert in creating pretty displays of your text structures
* **[search](search.ipynb)** turbo charge your hand-coding with search templates

---

[advanced](searchAdvanced.ipynb)
[sets](searchSets.ipynb)
[relations](searchRelations.ipynb)
quantifiers

You have come far.

Time to have a look at prior work.

[fromMQL](searchFromMQL.ipynb)
[rough](searchRough.ipynb)
[gaps](searchGaps.ipynb)

---

* **[exportExcel](exportExcel.ipynb)** make tailor-made spreadsheets out of your results
* **[share](share.ipynb)** draw in other people's data and let them use yours
* **[export](export.ipynb)** export your dataset as an Emdros database
* **[annotate](annotate.ipynb)** annotate plain text by means of other tools and import the annotations as TF features
* **[volumes](volumes.ipynb)** work with selected books only
* **[trees](trees.ipynb)** work with the BHSA data as syntax trees

CC-BY Dirk Roorda