<img align="right" src="images/dans-small.png"/>
<img align="right" src="images/tf-small.png"/>
<img align="right" src="images/etcbc.png"/>

You might want to consider the [start](search.ipynb) of this tutorial.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from tf.fabric import Fabric
from tf.extra.bhsa import Bhsa

In [3]:
VERSION = '2017'
DATABASE = '~/github/etcbc'
BHSA = f'bhsa/tf/{VERSION}'
PARA = f'parallels/tf/{VERSION}'
TF = Fabric(locations=[DATABASE], modules=[BHSA, PARA], silent=True)

In [4]:
api = TF.load('', silent=True)
api.makeAvailableIn(globals())
B = Bhsa(api, 'search', version=VERSION)

**Documentation:** <a target="_blank" href="https://etcbc.github.io/bhsa" title="{provenance of this corpus}">BHSA</a> <a target="_blank" href="https://etcbc.github.io/bhsa/features/hebrew/2017/0_home.html" title="{CORPUS} feature documentation">Feature docs</a> <a target="_blank" href="https://dans-labs.github.io/text-fabric/Api/Bhsa/" title="BHSA API documentation">BHSA API</a> <a target="_blank" href="https://dans-labs.github.io/text-fabric/Api/General/" title="text-fabric-api">Text-Fabric API 4.2.1</a> <a target="_blank" href="https://dans-labs.github.io/text-fabric/Api/General/#search-templates" title="Search Templates Introduction and Reference">Search Reference</a>


This notebook online:
<a target="_blank" href="http://nbviewer.jupyter.org/github/etcbc/bhsa/blob/master/tutorial/search.ipynb">NBViewer</a>
<a target="_blank" href="https://github.com/etcbc/bhsa/blob/master/tutorial/search.ipynb">GitHub</a>


# Quantifiers

# Disclaimer

**This part of search templates is still experimental.**

* **bugs may be discovered**
* **the syntax of quantifiers may change**

Quantifiers add considerable power to search templates.

Quantifiers consist of full-fledged search templates themselves, and give rise to 
auxiliary searches being performed.

Quantifiers often remove the need to resort to hand-coding.
That said, they can be exceedingly tricky, so it is advisable to check the results
by hand-coding anyway, until you are perfectly comfortable with them.

# Examples

## Lexemes

It is easy to find the lexemes that occur in one specific book only,
because the `lex` node of such a lexeme is contained in the node of that book.

Let's get the lexemes specific to Ezra and then those specific to Nehemiah.

In [5]:
query = '''
book book@en=Ezra
    lex
'''
ezLexemes = B.search(query)
ezSet = {r[1] for r in ezLexemes}

query = '''
book book@en=Nehemiah
    lex
'''
nhLexemes = B.search(query)
nhSet = {r[1] for r in nhLexemes}

print(f'Total {len(ezSet | nhSet)} lexemes')

199 results
110 results
Total 309 lexemes


What if we want the lexemes that occur only in Ezra and Nehemiah?

If such a lexeme occurs in both books, its `lex` node is not contained in either book node.
So the two queries above miss it.

We have to find a different way. Something like: search for lexemes of which all words occur either in Ezra or in Nehemiah.

With the template constructions you have seen so far, this is impossible to say.

This is where [*quantifiers*](https://dans-labs.github.io/text-fabric/Api/General/#quantifiers) come in.
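
To make the gap concrete, here is a minimal sketch in plain Python. The lexeme names and the `occurrences` mapping are hypothetical toy data, not taken from the BHSA, and the sketch only mimics the containment logic; it does not call Text-Fabric.

```python
# Hypothetical toy data: lexeme -> set of books in which its words occur.
occurrences = {
    'lexA': {'Ezra'},
    'lexB': {'Nehemiah'},
    'lexC': {'Ezra', 'Nehemiah'},
    'lexD': {'Ezra', 'Genesis'},
}


def contained_in(book):
    """Lexemes all of whose occurrences lie in this single book."""
    return {lex for lex, books in occurrences.items() if books == {book}}


# What the two per-book queries find: lexemes embedded in exactly one book.
per_book = contained_in('Ezra') | contained_in('Nehemiah')

# What we actually want: every occurrence lies in Ezra or in Nehemiah.
target = {'Ezra', 'Nehemiah'}
wanted = {lex for lex, books in occurrences.items() if books <= target}

# The lexemes shared by both books are the ones the per-book queries miss.
missed = wanted - per_book
print(sorted(per_book))
print(sorted(wanted))
print(sorted(missed))
```

In this toy data only `lexC`, which occurs in both books, is missed by the per-book approach.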

## no: end:

First we are going to query for these lexemes by means of a `no:` quantifier.

In [6]:
query = '''
lex
no:
  ^ w:word
  b:book book@en#Ezra|Nehemiah
  w ]] b
end:
'''
indent(reset=True)
query1results = B.search(query, shallow=True)
info('Done')

382 results
  1.61s Done


Note the caret `^`. It serves to indicate the position of zero indentation with respect to the
previous atom (`lex`). The caret itself plus the following white space is used as indentation
relative to `lex`. One caret per quantifier suffices.

## all: have:

Now the `no:` quantifier is a bit of a roundabout way to say what you really mean.
We can also employ the `all: have:` quantifier.

In [7]:
query = '''
lex
all:
  ^ w:word
have:
  b:book book@en=Ezra|Nehemiah
  w ]] b
end:
'''
indent(reset=True)
query2results = B.search(query, shallow=True)
info('Done')

382 results
  1.13s Done
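
The `no:` template and the `all: have:` template express the same condition: "all words lie in Ezra or Nehemiah" is logically the same as "no word lies outside Ezra and Nehemiah". A small plain-Python sketch of that equivalence follows. The data is hypothetical, and the code mimics the logic only; it is not how the Text-Fabric search engine works internally.

```python
# Hypothetical toy data: lexeme -> list of books, one entry per word occurrence.
word_books = {
    'lexA': ['Ezra', 'Ezra'],
    'lexB': ['Ezra', 'Nehemiah'],
    'lexC': ['Nehemiah', 'Genesis'],
}
target = {'Ezra', 'Nehemiah'}

# no: ... end:
# keep the lexeme if NO word lies in a book outside the target books.
by_no = {
    lex for lex, books in word_books.items()
    if not any(book not in target for book in books)
}

# all: ... have: ... end:
# keep the lexeme if ALL of its words lie in one of the target books.
by_all = {
    lex for lex, books in word_books.items()
    if all(book in target for book in books)
}

print(by_no == by_all)
print(sorted(by_all))
```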


Check by hand coding:

In [8]:
indent(reset=True)
universe = F.otype.s('lex')
wordsEzNh = set(
    L.d(T.bookNode('Ezra', lang='en'), otype='word') + 
    L.d(T.bookNode('Nehemiah', lang='en'), otype='word')
)
handResults = set()
for lex in universe:
    occs = set(L.d(lex, otype='word'))
    if occs <= wordsEzNh:
        handResults.add(lex)
info(len(handResults))

  0.22s 382


Looks good, but we are thorough:

In [9]:
print(query1results == handResults)
print(query2results == handResults)

True
True


## Verb phrases

Let's look for clauses in which all `Pred` phrases contain only verbs, and look for `Subj`
phrases in those clauses.

In [10]:
query = '''
clause
all:
  ^ phrase function=Pred
have:
    no:
      ^ word sp#verb
    end:
end:
  phrase function=Subj
'''
indent(reset=True)
queryResults = B.search(query)
info('Done')

B.show(queryResults, end=5)

31399 results
  2.37s Done



(Rendered output of `B.show`: Passage 1 to Passage 5.)

Note that the pieces of template that belong to a quantifier do not correspond to nodes in the result tuples!
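
A rough plain-Python picture of this behaviour, on a hypothetical toy clause structure: the quantifier acts only as a filter on `clause`, while the result tuples are built from the clause and its `Subj` phrase alone. The data and names are made up for illustration, and this is a sketch of the logic, not of the Text-Fabric implementation.

```python
# Hypothetical toy data:
# clause -> list of phrases as (function, [part of speech of each word]).
clauses = {
    'c1': [('Pred', ['verb']), ('Subj', ['subs'])],
    'c2': [('Pred', ['prep', 'verb']), ('Subj', ['subs'])],
    'c3': [('Subj', ['nmpr']), ('Objc', ['subs'])],  # no Pred phrase at all
    'c4': [('Pred', ['verb'])],                      # no Subj phrase
}


def passes_quantifier(phrases):
    """all: Pred phrases  have: no: word with sp#verb."""
    return all(
        not any(sp != 'verb' for sp in words)
        for function, words in phrases
        if function == 'Pred'
    )


results = []
for clause, phrases in clauses.items():
    if not passes_quantifier(phrases):
        continue
    for index, (function, words) in enumerate(phrases):
        if function == 'Subj':
            # Only the clause and the Subj phrase end up in the tuple;
            # the Pred phrases and words checked above leave no trace.
            results.append((clause, index))

print(results)
```

In this sketch a clause without any `Pred` phrase passes the filter vacuously, which matches the hand-coded check, where `good` starts out as `True`.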

Check by hand:

In [11]:
indent(reset=True)
handResults = []
for clause in F.otype.s('clause'):
    phrases = L.d(clause, otype='phrase')
    preds = [p for p in phrases if F.function.v(p) == 'Pred']
    good = True
    for pred in preds:
        if any(F.sp.v(w) != 'verb' for w in L.d(pred, otype='word')):
            good = False
    if good:
        subjs = [p for p in phrases if F.function.v(p) == 'Subj']
        for subj in subjs:
            handResults.append((clause, subj))
info(len(handResults))

  1.37s 31399


In [12]:
queryResults == handResults

True

## Subject at start or at end

We want the clauses that contain at least two adjacent phrases and have a `Subj` phrase that is either at the beginning or at the end of the clause.

In [13]:
query = '''
c:clause
  either:
    ^ =: phrase function=Subj
  or:
    ^ := phrase function=Subj
  end: 
  phrase
  <: phrase
'''

indent(reset=True)
queryResults = sorted(B.search(query, shallow=True))
info('Done')

15332 results
  1.63s Done
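
The `either: or: end:` quantifier keeps a clause if at least one of its alternatives can be satisfied. Here is a minimal plain-Python sketch of that reading, on hypothetical toy data: phrases are modelled as `(function, first slot, last slot)` triples, and the clause is assumed to start and end where its phrases do. None of the names below are Text-Fabric API.

```python
# Hypothetical toy data:
# clause -> list of phrases as (function, first_slot, last_slot).
clauses = {
    'c1': [('Subj', 1, 2), ('Pred', 3, 3)],                      # Subj first
    'c2': [('Pred', 4, 4), ('Objc', 5, 6), ('Subj', 7, 8)],      # Subj last
    'c3': [('Pred', 9, 9), ('Subj', 10, 10), ('Objc', 11, 12)],  # Subj inside
    'c4': [('Subj', 13, 14)],                                    # one phrase
}


def subj_at_start(phrases):
    """Alternative 1: a Subj phrase starts at the first slot of the clause."""
    first = min(start for _, start, _ in phrases)
    return any(f == 'Subj' and start == first for f, start, _ in phrases)


def subj_at_end(phrases):
    """Alternative 2: a Subj phrase ends at the last slot of the clause."""
    last = max(end for _, _, end in phrases)
    return any(f == 'Subj' and end == last for f, _, end in phrases)


def has_adjacent_pair(phrases):
    """Some phrase starts right after another phrase ends."""
    ends = {end for _, _, end in phrases}
    return any(start - 1 in ends for _, start, _ in phrases)


selected = sorted(
    clause for clause, phrases in clauses.items()
    if (subj_at_start(phrases) or subj_at_end(phrases))
    and has_adjacent_pair(phrases)
)
print(selected)
```

The two alternatives are combined with a plain `or`, so a clause that satisfies both is still reported once.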


Check by hand:

In [14]:
indent(reset=True)
handResults = []
for clause in F.otype.s('clause'):
    clauseWords = L.d(clause, otype='word')
    phrases = set(L.d(clause, otype='phrase'))
    # keep only clauses in which some phrase is immediately followed by another phrase of the same clause
    if any(L.n(p, otype='phrase') and (L.n(p, otype='phrase')[0] in phrases) for p in phrases):
        subjPhrases = [p for p in phrases if F.function.v(p) == 'Subj']
        if (
            any(L.d(p, otype='word')[0] == clauseWords[0] for p in subjPhrases)
            or
            any(L.d(p, otype='word')[-1] == clauseWords[-1] for p in subjPhrases)
        ):
            handResults.append(clause)
info(len(handResults))

  2.85s 15332


This is a nice case where the search template (1.63s) performs better than this particular piece of hand-coding (2.85s).

In [15]:
queryResults == handResults

True

# Next

You have mastered the theory.

In practice, there are pitfalls:
[rough edges](searchRough.ipynb)

---

[basic](search.ipynb)
[advanced](searchAdvanced.ipynb)
[relations](searchRelations.ipynb)
quantifiers
[rough](searchRough.ipynb)
[gaps](searchGaps.ipynb)