<img align="right" src="images/tf-small.png" width="128"/>
<img align="right" src="images/etcbc.png"/>
<img align="right" src="images/dans-small.png"/>

You might want to consider the [start](search.ipynb) of this tutorial.

Short introductions to other TF datasets:

* [Dead Sea Scrolls](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/dss.ipynb)
* [Old Babylonian Letters](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/oldbabylonian.ipynb)
* [Quran](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/quran.ipynb)


In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from tf.app import use

In [3]:
A = use("ETCBC/bhsa", hoist=globals())

This is Text-Fabric 9.3.2
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

122 features found and 0 ignored


# Quantifiers

Quantifiers add considerable power to search templates.

Quantifiers consist of full-fledged search templates themselves, and give rise to
auxiliary searches being performed.

In many cases, quantifiers spare you from resorting to hand-coding.
That said, they can be exceedingly tricky, so it is advisable to check the results
by hand-coding anyway, until you are perfectly comfortable with them.

# Examples

## Lexemes

It is easy to find the lexemes that occur in a specific book only,
because the `lex` node of such a lexeme is contained in the node of that specific book.

Let's get the lexemes specific to Ezra and then those specific to Nehemiah.

In [4]:
query = """
book book@en=Ezra
    lex
"""
ezLexemes = A.search(query)
ezSet = {r[1] for r in ezLexemes}

query = """
book book@en=Nehemiah
    lex
"""
nhLexemes = A.search(query)
nhSet = {r[1] for r in nhLexemes}

print(f"Total {len(ezSet | nhSet)} lexemes")

  0.00s 199 results
  0.00s 110 results
Total 309 lexemes


What if we want the lexemes that occur only in Ezra and Nehemiah?

If such a lexeme occurs in both books, it is not contained in either book,
so the two queries above have missed it.

We have to find a different way. Something like: search for lexemes of which all words occur either in Ezra or in Nehemiah.

With the template constructions you have seen so far, this is impossible to say.

This is where [*quantifiers*](https://annotation.github.io/text-fabric/tf/about/searchusage.html#quantifiers) come in.
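To see why the two per-book queries miss some lexemes, here is a minimal sketch on made-up data. The lexeme names and their occurrence lists are hypothetical and not taken from the BHSA. Each lexeme is modelled as the list of books in which its words occur.

```python
# Toy data (hypothetical): for each lexeme, the book of each of its occurrences.
occurrences = {
    "lexA": ["Ezra", "Ezra"],
    "lexB": ["Nehemiah"],
    "lexC": ["Ezra", "Nehemiah"],
    "lexD": ["Ezra", "Genesis"],
}
target = {"Ezra", "Nehemiah"}

# What the two separate queries find: lexemes that live inside one single book.
per_book = {
    lex
    for lex, books in occurrences.items()
    if len(set(books)) == 1 and books[0] in target
}

# What we actually want: every occurrence lies in Ezra or in Nehemiah.
wanted = {lex for lex, books in occurrences.items() if set(books) <= target}

print(sorted(per_book))           # lexemes found by the per-book approach
print(sorted(wanted - per_book))  # lexemes the per-book approach misses
```

In this toy corpus, `lexC` is spread over both books, so neither book contains it, yet all of its occurrences lie in Ezra or Nehemiah.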

## /without/

First we are going to query for these lexemes by means of a `/without/` quantifier.

In [5]:
query = """
lex
/without/
book book@en#Ezra|Nehemiah
  w:word
  w ]] ..
/-/
"""
query1results = A.search(query, shallow=True)

  0.69s 382 results


## /where/

Now the `/without/` quantifier is a bit of a roundabout way to say what you really mean.
We can also employ the more positive `/where/` quantifier.

In [6]:
query = """
lex
/where/
  w:word
/have/
b:book book@en=Ezra|Nehemiah
w ]] b
/-/
"""
query2results = A.search(query, shallow=True)

  0.22s 382 results


Check by hand coding:

In [7]:
A.silentOff()
A.indent(reset=True)
universe = F.otype.s("lex")
wordsEzNh = set(
    L.d(T.bookNode("Ezra", lang="en"), otype="word")
    + L.d(T.bookNode("Nehemiah", lang="en"), otype="word")
)
handResults = set()
for lex in universe:
    occs = set(L.d(lex, otype="word"))
    if occs <= wordsEzNh:
        handResults.add(lex)
A.info(len(handResults))

  0.07s 382


Looks good, but we are thorough:

In [8]:
print(query1results == handResults)
print(query2results == handResults)

True
True
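The agreement is not a coincidence: "there is no word outside Ezra and Nehemiah" and "every word is inside Ezra or Nehemiah" are logically the same statement. The following sketch illustrates this on hypothetical toy data, with Python's `any` and `all` as stand-ins for the two quantifiers. It is only a model of the logic, not of how Text-Fabric executes the templates.

```python
# Toy data (hypothetical): for each lexeme, the book of each of its occurrences.
occurrences = {
    "lexA": ["Ezra", "Ezra"],
    "lexB": ["Nehemiah"],
    "lexC": ["Ezra", "Nehemiah"],
    "lexD": ["Ezra", "Genesis"],
}
target = {"Ezra", "Nehemiah"}


def without_version(books):
    # /without/: there is no occurrence in a book other than Ezra or Nehemiah
    return not any(book not in target for book in books)


def where_version(books):
    # /where/ ... /have/: every occurrence lies in Ezra or Nehemiah
    return all(book in target for book in books)


result1 = {lex for lex, books in occurrences.items() if without_version(books)}
result2 = {lex for lex, books in occurrences.items() if where_version(books)}

print(result1 == result2)
```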


## Verb phrases

Let's look for clauses in which all `Pred` phrases contain only verbs, and look for `Subj`
phrases in those clauses.

In [9]:
query = """
clause
/where/
  phrase function=Pred
/have/
  /without/
    word sp#verb
  /-/
/-/
  phrase function=Subj
"""
queryResults = A.search(query)

  0.57s 31431 results


In [10]:
A.show(queryResults, end=5, condenseType="sentence")

Note that the pieces of template that belong to a quantifier do not correspond to nodes in the result tuples!

Check by hand:

In [11]:
A.indent(reset=True)
handResults = []
for clause in F.otype.s("clause"):
    phrases = L.d(clause, otype="phrase")
    preds = [p for p in phrases if F.function.v(p) == "Pred"]
    good = True
    for pred in preds:
        if any(F.sp.v(w) != "verb" for w in L.d(pred, otype="word")):
            good = False
    if good:
        subjs = [p for p in phrases if F.function.v(p) == "Subj"]
        for subj in subjs:
            handResults.append((clause, subj))
A.info(len(handResults))

  0.38s 31431


In [12]:
queryResults == handResults

True
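The nested quantifier can be read as: for every `Pred` phrase in the clause, there is no word whose part of speech differs from `verb`. The sketch below mimics that reading on hypothetical toy clauses (plain tuples, not TF nodes). It also shows that only the clause and the `Subj` phrase end up in the result tuples, and that a clause without any `Pred` phrase passes the condition vacuously, just as in the hand-coded check.

```python
# Toy clauses (hypothetical): each phrase is (name, function, parts of speech of its words).
clauses = {
    "c1": [("p1", "Pred", ["verb"]), ("p2", "Subj", ["subs"])],
    "c2": [("p3", "Pred", ["prep", "verb"]), ("p4", "Subj", ["subs"])],
    "c3": [("p5", "Subj", ["subs"]), ("p6", "Objc", ["subs"])],
}

results = []
for clause, phrases in clauses.items():
    preds = [p for p in phrases if p[1] == "Pred"]
    # /where/ Pred phrase /have/ (/without/ a non-verb word)
    all_preds_verbal = all(
        not any(sp != "verb" for sp in words) for (_, _, words) in preds
    )
    if all_preds_verbal:
        # Only the non-quantifier parts of the template contribute to the tuples.
        for name, function, _ in phrases:
            if function == "Subj":
                results.append((clause, name))

print(results)
```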

### Inspection

We can see which templates are being composed in the course of interpreting the quantifier.
We use the good old `S.study()`:

In [13]:
query = """
clause
/where/
  phrase function=Pred
/have/
  /without/
    word sp#verb
  /-/
/-/
  phrase function=Subj
"""
S.study(query)

  0.00s Checking search template ...
  0.00s Setting up search space for 2 objects ...
   |     0.00s "Quantifier on "parent:clause"
   |      |   /where/
   |      |   parent:clause
   |      |     phrase function=Pred
   |      |     0.20s 57070 matching nodes
   |      |   /have/
   |      |   parent:clause
   |      |     phrase function=Pred
   |      |     /without/
   |      |       word sp#verb
   |      |     /-/
   |      |   /-/
   |      |      |   /without/
   |      |      |   parent:phrase function=Pred
   |      |      |     word sp#verb
   |      |      |   /-/
   |      |      |     0.38s 4893 nodes to exclude
   |      |     0.38s reduction from 57070 to 52177 nodes
  0.38s Constraining search space with 1 relations ...
  0.40s 	1 edges thinned
  0.40s Setting up retrieval plan with strategy small_choice_multi ...
  0.48s Ready to deliver results from 104354 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
   |      |     0.49

Observe the stepwise unravelling of the quantifiers, and the auxiliary templates that are distilled
from your original template.

If you ever get syntax errors, run `S.study()` to find clues.

## Subject at start or at end

We want the clauses that contain at least two adjacent phrases and that have a `Subj` phrase which is either at the beginning or at the end of the clause.

In [14]:
query = """
c:clause
/with/
  =: phrase function=Subj
/or/
  := phrase function=Subj
/-/
  phrase
  <: phrase
"""

queryResults = sorted(A.search(query, shallow=True))

  0.39s 15360 results


Check by hand:

In [15]:
A.indent(reset=True)
handResults = []
for clause in F.otype.s("clause"):
    clauseWords = L.d(clause, otype="word")
    phrases = set(L.d(clause, otype="phrase"))
    if any(
        L.n(p, otype="phrase") and (L.n(p, otype="phrase")[0] in phrases)
        for p in phrases
    ):
        # handResults.append(clause)
        # continue
        subjPhrases = [p for p in phrases if F.function.v(p) == "Subj"]
        if any(L.d(p, otype="word")[0] == clauseWords[0] for p in subjPhrases) or any(
            L.d(p, otype="word")[-1] == clauseWords[-1] for p in subjPhrases
        ):
            handResults.append(clause)
A.info(len(handResults))

  0.72s 15360


A nice case where the search template performs better than this particular piece of hand-coding.

In [16]:
queryResults == handResults

True
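The relations in this template can be modelled with slot numbers: `=:` means that two nodes start at the same slot, `:=` that they end at the same slot, and `<:` that one node ends immediately before the other starts. Here is a minimal sketch with hypothetical clauses and phrases, each given as a (first slot, last slot) span; the helper names are made up for this illustration.

```python
# Toy clauses (hypothetical): spans are (first_slot, last_slot).
clauses = {
    "c1": {"span": (1, 6), "phrases": [((1, 2), "Subj"), ((3, 6), "Pred")]},
    "c2": {"span": (7, 12), "phrases": [((7, 8), "Pred"), ((9, 12), "Subj")]},
    "c3": {
        "span": (13, 20),
        "phrases": [((13, 14), "Conj"), ((15, 16), "Subj"), ((17, 20), "Pred")],
    },
    "c4": {"span": (21, 22), "phrases": [((21, 22), "Subj")]},
}


def subj_at_edge(clause):
    first, last = clause["span"]
    subjs = [span for span, function in clause["phrases"] if function == "Subj"]
    at_start = any(span[0] == first for span in subjs)  # models =:
    at_end = any(span[1] == last for span in subjs)     # models :=
    return at_start or at_end


def has_adjacent_phrases(clause):
    spans = [span for span, _ in clause["phrases"]]
    # models <: one phrase ends right before another one starts
    return any(a[1] + 1 == b[0] for a in spans for b in spans)


hits = [
    name
    for name, clause in clauses.items()
    if subj_at_edge(clause) and has_adjacent_phrases(clause)
]
print(hits)
```

In this toy data, `c3` fails because its `Subj` phrase sits in the middle, and `c4` fails because it has only one phrase.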

Let's also study this query:

In [17]:
S.study(query)

  0.00s Checking search template ...
  0.00s Setting up search space for 3 objects ...
   |     0.00s "Quantifier on "c:clause"
   |      |   /with/
   |      |   c:clause
   |      |     =: phrase function=Subj
   |      |     0.15s adding 5311 to 0 yields 5311 nodes
   |      |   /or/
   |      |   c:clause
   |      |     := phrase function=Subj
   |      |     0.15s adding 11126 to 5311 yields 15949 nodes
   |      |   /-/
   |     0.31s reduction from 88131 to 15949 nodes
  0.18s Constraining search space with 3 relations ...
  0.33s 	2 edges thinned
  0.33s Setting up retrieval plan with strategy small_choice_multi ...
  0.33s Ready to deliver results from 105241 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
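The two `adding ... yields ...` lines describe a union of node sets, so the counts need not simply add up. Taking the numbers from the output above, and assuming they all count clause nodes, the size of the overlap follows from inclusion-exclusion. The interpretation in the comments is an assumption, not something the output states.

```python
# Numbers copied from the S.study() output and the earlier search result.
subj_at_start = 5311   # clauses matched by the /with/ alternative
subj_at_end = 11126    # clauses matched by the /or/ alternative
union = 15949          # clauses left after combining both alternatives
final_results = 15360  # results of the full query

# Clauses counted in both alternatives (assumed: a Subj phrase at both edges).
overlap = subj_at_start + subj_at_end - union

# Clauses that pass the quantifier but drop out later
# (assumed: they lack two adjacent phrases).
dropped = union - final_results

print(overlap, dropped)
```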


## Verb-containing phrases

Suppose we want to collect all phrases with the condition that if they
contain a verb, their `function` is `Pred`.

This is a bit theoretical, but it shows two powerful constructs to increase readability
of quantifiers.
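The condition is a logical implication: "contains a verb" implies "has function `Pred`". An implication also holds when its premise is false, so phrases without any verb qualify automatically. A small sketch on hypothetical toy phrases (the names and values are made up):

```python
# Toy phrases (hypothetical): name -> (function, parts of speech of its words).
phrases = {
    "p1": ("Pred", ["verb"]),
    "p2": ("Subj", ["subs"]),
    "p3": ("Objc", ["verb", "subs"]),
    "p4": ("Pred", ["prep", "subs"]),
}


def qualifies(function, parts_of_speech):
    has_verb = "verb" in parts_of_speech
    # "has_verb implies function == Pred" is the same as
    # "not has_verb, or function == Pred"
    return (not has_verb) or function == "Pred"


kept = [name for name, (function, pos) in phrases.items() if qualifies(function, pos)]
print(kept)
```

Only `p3` is rejected here: it contains a verb but is not a `Pred` phrase.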

### Unreadable

First we express it without special constructs.

In [18]:
query = """
p:phrase
/where/
  w:word pdp=verb
/have/
q:phrase function=Pred
q = p
/-/
"""
results = A.search(query, shallow=True)

  0.37s 241249 results


We check the query by means of hand-coding:

1. is every result a phrase: either without verbs, or with function Pred?
2. is every phrase without verbs or with function Pred contained in the results?

In [19]:
allPhrases = set(F.otype.s("phrase"))

ok1 = all(
    F.function.v(p) == "Pred" or all(F.pdp.v(w) != "verb" for w in L.d(p, otype="word"))
    for p in results
)
ok2 = all(
    p in results
    for p in allPhrases
    if (
        F.function.v(p) == "Pred"
        or all(F.pdp.v(w) != "verb" for w in L.d(p, otype="word"))
    )
)

print(f"Check 1: {ok1}")
print(f"Check 2: {ok2}")

Check 1: True
Check 2: True


Ok, we are sure that the query does what we think it does.

### Readable

Now let's make it more readable.

In [20]:
query = """
phrase
/where/
  w:word pdp=verb
/have/
.. function=Pred
/-/
"""

In [21]:
results2 = A.search(query, shallow=True)

print(f"Same results as before? {results == results2}")

  0.37s 241249 results
Same results as before? True


Try to see how search gives the name `parent` to the `phrase` atom of the template and how it resolves the name `..`:

In [22]:
S.study(query)

  0.00s Checking search template ...
  0.00s Setting up search space for 1 objects ...
   |     0.00s "Quantifier on "parent:phrase"
   |      |   /where/
   |      |   parent:phrase
   |      |     w:word pdp=verb
   |      |     0.28s 69025 matching nodes
   |      |   /have/
   |      |   parent:phrase
   |      |     w:word pdp=verb
   |      |   parent function=Pred
   |      |   /-/
   |      |     0.35s 57070 matching nodes
   |      |     0.37s 11954 match antecedent but not consequent
   |     0.66s reduction from 253203 to 241249 nodes
  0.37s Constraining search space with 0 relations ...
  0.37s 	0 edges thinned
  0.37s Setting up retrieval plan with strategy small_choice_multi ...
  0.37s Ready to deliver results from 241249 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
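The counts in this output fit together: the phrases that match the antecedent but not the consequent are the ones removed from the set of all phrases. A quick check with the numbers copied from the output above:

```python
all_phrases = 253203  # phrases before the quantifier is applied
violations = 11954    # phrases with a verb whose function is not Pred
reported = 241249     # phrases left according to the output

remaining = all_phrases - violations
print(remaining == reported)
```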


# All steps

* **[start](start.ipynb)** your first step in mastering the bible computationally
* **[display](display.ipynb)** become an expert in creating pretty displays of your text structures
* **[search](search.ipynb)** turbo charge your hand-coding with search templates

---

[advanced](searchAdvanced.ipynb)
[sets](searchSets.ipynb)
[relations](searchRelations.ipynb)
quantifiers

You have come far.

Time to have a look at prior work.

[from MQL](searchFromMQL.ipynb)
[rough](searchRough.ipynb)
[gaps](searchGaps.ipynb)

---

* **[export Excel](exportExcel.ipynb)** make tailor-made spreadsheets out of your results
* **[share](share.ipynb)** draw in other people's data and let them use yours
* **[export](export.ipynb)** export your dataset as an Emdros database
* **[annotate](annotate.ipynb)** annotate plain text by means of other tools and import the annotations as TF features
* **[map](map.ipynb)** map somebody else's annotations to a new version of the corpus
* **[volumes](volumes.ipynb)** work with selected books only
* **[trees](trees.ipynb)** work with the BHSA data as syntax trees

CC-BY Dirk Roorda