<img align="right" src="tf-small.png"/>

# Search

*Search* in Text-Fabric is a template-based way of looking for structural patterns in your dataset.

It is inspired by the idea of
[topographic query](http://books.google.nl/books?id=9ggOBRz1dO4C),
as worked out in 
[MQL](https://shebanq.ancient-data.org/shebanq/static/docs/MQL-Query-Guide.pdf)
which has been implemented in 
[Emdros](http://emdros.org).
See also [pitfalls of MQL](https://etcbc.github.io/text-fabric-data/features/hebrew/etcbc4c/0_mql.html).

Within Text-Fabric we have the unique possibility to combine the ease of formulating search templates for
complicated syntactical patterns with the power of programmatically processing the results.

This notebook will show you how to get up and running.

# Before we continue
Search is a big feature in Text-Fabric.
It is also a very recent addition.

##### Caution:
> There might be bugs.

Search is also costly.
Much of the work of implementing search has been dedicated to optimizing performance.
But the search templates are very powerful, and can be very diverse.
I do not pretend to have found strategies that work optimally for all search templates.

That being said, I think search might turn out to be useful in many cases, and I welcome your feedback.

*Dirk Roorda, 2016-12-23*

# Search command

It all starts by saying (just an example):

```
S.study('''
# here comes my search template:

c:clause
    p1:phrase det=und
        =: word sp=verb gn=f nu=pl ps=p3
        <  word sp=subs
        :=
    p2:phrase
    
  p1 < p2  
''')
```

See all ins and outs in the
[search template reference](https://github.com/ETCBC/text-fabric/wiki/Api#search-template-reference).

All search related things use the
[`S` api](https://github.com/ETCBC/text-fabric/wiki/Api#search).

In [30]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [31]:
from tf.fabric import Fabric

In [32]:
ETCBC = 'hebrew/etcbc4c'
TF = Fabric( modules=ETCBC )

This is Text-Fabric 2.0.0
Api reference : https://github.com/ETCBC/text-fabric/wiki/Api
Tutorial      : https://github.com/ETCBC/text-fabric/blob/master/docs/tutorial.ipynb
Data sources  : https://github.com/ETCBC/text-fabric-data
Data docs     : https://etcbc.github.io/text-fabric-data/features/hebrew/etcbc4c/0_overview.html
Shebanq docs  : https://shebanq.ancient-data.org/text
Slack team    : https://shebanq.slack.com/signup
Questions? Ask shebanq@ancient-data.org for an invite to Slack
107 features found and 0 ignored


Let us just *not* load any specific features.

In [33]:
api = TF.load('')
api.makeAvailableIn(globals())

  0.00s loading features ...
   |     0.00s Feature overview: 102 nodes; 4 edges; 1 configs; 7 computeds
  5.61s All features loaded/computed - for details use loadLog()


Here is a simple query to get started.
We are interested in two lexemes, but we would also like to fetch the nodes in their context.

##### Note
> This is not a very good use case, 
because in Text-Fabric it is easy to find context nodes around your nodes of interest.

In [34]:
query = '''
book
  chapter
    verse
      clause
        clause_atom
          phrase
            phrase_atom
              word lex=JC/|>JN/
'''

The next thing to do is to feed it to the search api, which will *study* it.
The syntax will be checked, features loaded, the search space will be set up, narrowed down, 
and the fetching of results will be prepared, but not yet executed.

In [35]:
S.study(query)

  0.00s Checking search template ...
  0.00s loading features ...
   |     0.19s B lex                  from /Users/dirk/github/text-fabric-data/hebrew/etcbc4c
  0.19s All additional features loaded - for details use loadLog()
  0.20s Setting up search space for 8 objects ...
  0.97s Constraining search space with 7 relations ...
  1.02s Setting up retrieval plan ...
  1.04s Ready to deliver results from 5870 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results


Before we rush to the results, let's have a look at the *plan*.

In [36]:
S.showPlan()

  3.26s The results are connected to the original search template as follows:
 0     
 1 R0  book
 2 R1    chapter
 3 R2      verse
 4 R3        clause
 5 R4          clause_atom
 6 R5            phrase
 7 R6              phrase_atom
 8 R7                word lex=JC/|>JN/
 9     


Here you see already what your results will look like.
Each result `r` is a *tuple* of nodes:
```
(R0, R1, R2, R3, R4, R5, R6, R7)
```
that instantiate the objects in your template.

## Excursion
In case you are curious, you can get details about the search space as well:

In [37]:
S.showPlan(details=True)

Search with 8 objects and 7 relations
Results are instantiations of the following objects:
node  0-book                              (    39   choices)
node  1-chapter                           (   416   choices)
node  2-verse                             (   799   choices)
node  3-clause                            (   922   choices)
node  4-clause_atom                       (   922   choices)
node  5-phrase                            (   923   choices)
node  6-phrase_atom                       (   923   choices)
node  7-word                              (   926   choices)
Instantiations are computed along the following relations:
node                      0-book          (    39   choices)
edge  0-book          [[  1-chapter       (    11.7 choices)
edge  1-chapter       [[  2-verse         (     2.4 choices)
edge  2-verse         [[  3-clause        (     1.0 choices)
edge  3-clause        [[  4-clause_atom   (     1.0 choices)
edge  4-clause_atom   [[  5-phrase        (     1.0 choic

The part about the *nodes* shows you how many possible instantiations for each object in your template
have been found.
These are not results yet, because only combinations of instantiations
that satisfy all constraints are results.

The constraints come from the relations between the objects that you specified.
In this case, there are only implicit relations: those of embedding `[[`. 
Later on we'll examine all basic relations.

The part about the *edges* shows you the constraints, and in what order they will be computed
when stitching results together.
In this case the order is exactly the order by which the relations appear in the template,
but that will not always be the case.
Text-Fabric spends some time and ingenuity to find out an optimal *stitch plan*.

Nevertheless, fetching results may take time. 

For some queries, it can take a large amount of time to walk through all results.
Even worse, it may happen that it takes a large amount of time before getting the *first* result.

This has to do with search strategies on the one hand,
and the very likely possibility to encounter *pathological* search patterns,
which have billions of results, mostly unintended.
For example, a simple query that asks for 5 words in the Hebrew Bible without further constraints
will have 425,000 to the power of 5 results.
That is about 10^28 (a one with 28 zeroes),
roughly the number of molecules in a few hundred litres of air.
That may not sound like much, but it is 10,000 times the number of bytes
that can currently be stored on the whole internet.
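
The size of this number is easy to check. The snippet below uses the approximate word count of 425,000 from the text and assumes, as in the example, that each of the 5 word objects may independently be any word.

```python
# Approximate number of words in the Hebrew Bible, as quoted in the text.
WORDS = 425_000

# Five unconstrained word objects: each one can be any word,
# so the number of candidate result tuples is WORDS ** 5.
candidates = WORDS ** 5

print(f'{candidates:.3e}')    # 1.387e+28
print(len(str(candidates)))   # 29 digits
```

So the exact figure is about 1.4 times 10^28.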

Text-Fabric search is not yet done with finding optimal search strategies,
and I hope to refine its arsenal of methods in the future, depending on what you report.

## Back to business
It is always a good idea to get a feel for the amount of results, before you dive into them head-on.

In [38]:
S.count(progress=1, limit=5)

  0.00s Counting results per 1 up to 5 ...
   |     0.00s 1
   |     0.00s 2
   |     0.00s 3
   |     0.00s 4
   |     0.01s 5
  0.01s Done: 5 results


We asked for 5 results in total, with a progress message for every one.
That was a bit conservative.

In [39]:
S.count(progress=100, limit=500)

  0.00s Counting results per 100 up to 500 ...
   |     0.02s 100
   |     0.05s 200
   |     0.07s 300
   |     0.11s 400
   |     0.14s 500
  0.14s Done: 500 results


Still pretty quick. Now we want to count all results.

In [40]:
S.count(progress=200, limit=-1)

  0.00s Counting results per 200 up to  the end of the results ...
   |     0.04s 200
   |     0.07s 400
   |     0.14s 600
   |     0.17s 800
  0.19s Done: 926 results


Now it is time to see something of those results.

In [41]:
S.fetch(amount=10)

((1367552, 1368104, 1428265, 486532, 576266, 781925, 1045061, 299521),
 (1367553, 1368106, 1428282, 486600, 576336, 782114, 1045258, 299826),
 (1367553, 1368107, 1428301, 486694, 576432, 782371, 1045520, 300192),
 (1367553, 1368108, 1428315, 486756, 576495, 782555, 1045709, 300482),
 (1367553, 1368108, 1428310, 486735, 576473, 782501, 1045653, 300389),
 (1367553, 1368109, 1428327, 486823, 576563, 782738, 1045900, 300764),
 (1367553, 1368111, 1428352, 486915, 576657, 782998, 1046167, 301139),
 (1367553, 1368111, 1428351, 486910, 576652, 782986, 1046155, 301122),
 (1367554, 1368113, 1428393, 487091, 576838, 783442, 1046621, 301774),
 (1367554, 1368113, 1428394, 487094, 576841, 783449, 1046628, 301781))

Not very informative.
Just a quick observation: look at the last column.
These are the result nodes for the `word` part in the query, indicated as `R7` by `showPlan()` before.
And indeed, they are all below 425,000, the number of words in the Hebrew Bible.
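
To make such tuples easier to read, you can pair each position with the object it instantiates, following the `R0` to `R7` layout that `showPlan()` printed. The labels below are copied from that plan and the node numbers are the first two results shown above; this is plain Python, not a Text-Fabric function.

```python
# Object types in template order, as listed by S.showPlan() (R0 .. R7).
labels = ['book', 'chapter', 'verse', 'clause',
          'clause_atom', 'phrase', 'phrase_atom', 'word']

# The first two result tuples printed by S.fetch(amount=10) above.
sample = [
    (1367552, 1368104, 1428265, 486532, 576266, 781925, 1045061, 299521),
    (1367553, 1368106, 1428282, 486600, 576336, 782114, 1045258, 299826),
]

for r in sample:
    named = dict(zip(labels, r))   # object type -> node
    print(named['verse'], named['word'])

# The last column holds word nodes, i.e. slots, so all lie below ~425,000.
assert all(r[-1] < 425_000 for r in sample)
```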

Nevertheless, we want to glean a bit more information from them.

In [42]:
for r in S.fetch(amount=10):
    print(S.glean(r))

  Jonah 4:11 clause[אֲשֶׁ֣ר יֶשׁ־בָּ֡הּ ] clause_atom[אֲשֶׁ֣ר יֶשׁ־בָּ֡הּ ] phrase[יֶשׁ־] phrase_atom[יֶשׁ־] 
  Micah 2:1 clause[כִּ֥י יֶשׁ־לְאֵ֖ל יָדָֽם׃ ] clause_atom[כִּ֥י יֶשׁ־לְאֵ֖ל יָדָֽם׃ ] phrase[יֶשׁ־] phrase_atom[יֶשׁ־] 
  Micah 3:7 clause[כִּ֛י אֵ֥ין מַעֲנֵ֖ה אֱלֹהִֽים׃ ] clause_atom[כִּ֛י אֵ֥ין מַעֲנֵ֖ה אֱלֹהִֽים׃ ] phrase[אֵ֥ין ] phrase_atom[אֵ֥ין ] 
  Micah 4:9 clause[הֲמֶ֣לֶךְ אֵֽין־בָּ֗ךְ ] clause_atom[הֲמֶ֣לֶךְ אֵֽין־בָּ֗ךְ ] phrase[אֵֽין־] phrase_atom[אֵֽין־] 
  Micah 4:4 clause[וְאֵ֣ין מַחֲרִ֑יד ] clause_atom[וְאֵ֣ין מַחֲרִ֑יד ] phrase[אֵ֣ין ] phrase_atom[אֵ֣ין ] 
  Micah 5:7 clause[וְאֵ֥ין מַצִּֽיל׃ ] clause_atom[וְאֵ֥ין מַצִּֽיל׃ ] phrase[אֵ֥ין ] phrase_atom[אֵ֥ין ] 
  Micah 7:2 clause[וְיָשָׁ֥ר בָּאָדָ֖ם ...] clause_atom[וְיָשָׁ֥ר בָּאָדָ֖ם ...] phrase[אָ֑יִן ] phrase_atom[אָ֑יִן ] 
  Micah 7:1 clause[אֵין־אֶשְׁכֹּ֣ול ] clause_atom[אֵין־אֶשְׁכֹּ֣ול ] phrase[אֵין־] phrase_atom[אֵין־] 
  Nahum 2:9 clause[וְאֵ֥ין מַפְנֶֽה׃ ] clause_atom[וְאֵ֥ין מַפְנֶֽה׃ ] phrase[אֵ֥

##### Caution
> It is not possible to do `len(S.fetch())`,
because `S.fetch()` is a *generator*, not a list.
It will deliver a result every time it is asked, for as long as there are results,
but it does not know in advance how many there will be.

> Fetching a result can be costly, because due to the constraints, a lot of possibilities
may have to be tried and rejected before the next result is found.

> That is why you often see results coming in at varying speeds when counting them.
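
The difference between a list and a generator can be seen without Text-Fabric. The function below is a hypothetical stand-in for `S.fetch()` that yields tuples lazily; the count of 926 is borrowed from the first query above.

```python
from itertools import islice

def fake_fetch():
    """Hypothetical stand-in for S.fetch(): yields results one at a time."""
    for n in range(926):
        yield (n,)

try:
    len(fake_fetch())
except TypeError as err:
    print('len() fails:', err)

first_ten = list(islice(fake_fetch(), 10))   # take only what you need
total = sum(1 for _ in fake_fetch())         # count by walking through all
print(len(first_ten), total)                 # 10 926
```

A generator can be consumed only once, which is why each line calls `fake_fetch()` afresh. Within Text-Fabric, `S.count()` does the walking-through for you.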

This search template has some pretty tight constraints on one of its objects,
so the amount of data to deal with is pretty limited.

Let us turn to a template where this is not so.

In [45]:
query = '''
# test
# verse book=Genesis chapter=2 verse=25
verse
  clause
                                 
    p1:phrase
        w1:word
        w3:word
        w1 < w3

    p2:phrase
        w2:word
        w1 < w2 
        w3 > w2
    
    p1 < p2   
'''

A couple of remarks.

* some objects have got a name
* there are additional relations specified between named objects
* `<` means: *comes before*, and `>`: *comes after* in the canonical order for nodes,
  which for words means: comes textually before/after, but for other nodes the meaning
  is explained [here](https://github.com/ETCBC/text-fabric/wiki/Api#sorting-nodes)
* later on we describe those relations in more detail

##### Note on order
> Look at the words `w1` and `w3` below phrase `p1`.
Although in the template `w1` comes before `w3`, this is not
translated into a search constraint of the same nature.

> Order between objects in a template is never significant, only embedding is.

Because order is not significant, you have to specify order relations yourself.

It turns out that this is better than the other way around.
In MQL order *is* significant, and it is very difficult to 
search for `w1` and `w2` in any order.

##### Note on gaps
> Look at the phrases `p1` and `p2`.
We do not require that one phrase comes completely before the other, only that they are different.
To prevent duplicate results with `p1` and `p2` interchanged, we even
stipulate that `p1 < p2` in the canonical ordering.
There are many spatial relationships possible between different objects.
In many cases, neither the one comes before the other, nor vice versa.
They can overlap, one can occur in a gap of the other, they can be completely disjoint
and interleaved, etc.
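
A rough way to picture these spatial relationships is to treat each object as the set of slots it occupies. The helper below is hypothetical (it is not part of Text-Fabric) and sorts two slot sets into a few of the cases just mentioned.

```python
def spatial(a, b):
    """Roughly classify how slot set b relates to slot set a.

    Hypothetical helper for illustration; not part of Text-Fabric.
    """
    a, b = set(a), set(b)
    if a & b:
        return 'overlap'
    if max(a) < min(b):
        return 'a before b'
    if max(b) < min(a):
        return 'b before a'
    if min(a) < min(b) and max(b) < max(a):
        return 'b in gap of a'    # b lies inside a's span, sharing no slots
    if min(b) < min(a) and max(a) < max(b):
        return 'a in gap of b'
    return 'interleaved'          # disjoint, but the spans cross

print(spatial({1, 2}, {3, 4}))        # a before b
print(spatial({1, 2, 5, 6}, {3, 4}))  # b in gap of a
print(spatial({1, 3}, {2, 4}))        # interleaved
print(spatial({1, 2}, {2, 3}))        # overlap
```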

In [46]:
S.study(query)

  0.00s Checking search template ...
  0.00s Setting up search space for 7 objects ...
  0.50s Constraining search space with 10 relations ...
  0.55s Setting up retrieval plan ...
  0.61s Ready to deliver results from 1897304 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results


That was quick!
Well, Text-Fabric knows that narrowing down the search space in this case would take ages,
without resulting in a significantly shrunken space.
So it skips doing so for most constraints.

Let us see the plan, with details.

In [47]:
S.showPlan(details=True)

Search with 7 objects and 10 relations
Results are instantiations of the following objects:
node  0-verse                             ( 23213   choices)
node  1-clause                            ( 88000   choices)
node  2-phrase                            (253174   choices)
node  3-word                              (426581   choices)
node  4-word                              (426581   choices)
node  5-phrase                            (253174   choices)
node  6-word                              (426581   choices)
Instantiations are computed along the following relations:
node                      0-verse         ( 23213   choices)
edge  0-verse         [[  1-clause        (     3.8 choices)
edge  1-clause        [[  2-phrase        (     2.6 choices)
edge  2-phrase        [[  3-word          (     1.2 choices)
edge  2-phrase        [[  4-word          (     1.7 choices)
edge  3-word          <   4-word          (213290.5 choices)
edge  1-clause        [[  5-phrase        (     2.7 choi

As you see, we have a hefty search space here.
Let us play with the `count()` function.

In [48]:
S.count(progress=10, limit=100)

  0.00s Counting results per 10 up to 100 ...
   |     0.13s 10
   |     0.14s 20
   |     0.14s 30
   |     0.16s 40
   |     0.16s 50
   |     0.17s 60
   |     0.20s 70
   |     0.20s 80
   |     0.20s 90
   |     0.21s 100
  0.21s Done: 100 results


We can be bolder than this!

In [49]:
S.count(progress=100, limit=1000)

  0.00s Counting results per 100 up to 1000 ...
   |     0.17s 100
   |     0.21s 200
   |     0.21s 300
   |     0.35s 400
   |     0.43s 500
   |     0.44s 600
   |     0.53s 700
   |     0.63s 800
   |     0.76s 900
   |     0.85s 1000
  0.85s Done: 1000 results


Ok, not too bad, but note that it takes a big fraction of a second to get just 100 results.

Now let us go for all of them by the thousand.

In [50]:
S.count(progress=1000, limit=-1)

  0.00s Counting results per 1000 up to  the end of the results ...
   |     0.78s 1000
   |     1.38s 2000
   |     2.16s 3000
   |     2.58s 4000
   |     3.28s 5000
   |     4.44s 6000
   |     6.54s 7000
  7.80s Done: 7512 results


See? This is substantial work.

In [51]:
for r in S.fetch(amount=10):
    print(S.glean(r))

Genesis 2:25 clause[וַיִּֽהְי֤וּ שְׁנֵיהֶם֙ עֲרוּמִּ֔ים הָֽ...] phrase[שְׁנֵיהֶם֙ הָֽאָדָ֖ם וְאִשְׁתֹּ֑ו ]   phrase[עֲרוּמִּ֔ים ] 
Genesis 2:25 clause[וַיִּֽהְי֤וּ שְׁנֵיהֶם֙ עֲרוּמִּ֔ים הָֽ...] phrase[שְׁנֵיהֶם֙ הָֽאָדָ֖ם וְאִשְׁתֹּ֑ו ]   phrase[עֲרוּמִּ֔ים ] 
Genesis 2:25 clause[וַיִּֽהְי֤וּ שְׁנֵיהֶם֙ עֲרוּמִּ֔ים הָֽ...] phrase[שְׁנֵיהֶם֙ הָֽאָדָ֖ם וְאִשְׁתֹּ֑ו ]   phrase[עֲרוּמִּ֔ים ] 
Genesis 2:25 clause[וַיִּֽהְי֤וּ שְׁנֵיהֶם֙ עֲרוּמִּ֔ים הָֽ...] phrase[שְׁנֵיהֶם֙ הָֽאָדָ֖ם וְאִשְׁתֹּ֑ו ]   phrase[עֲרוּמִּ֔ים ] 
Genesis 4:4 clause[וְהֶ֨בֶל הֵבִ֥יא גַם־ה֛וּא ...] phrase[הֶ֨בֶל גַם־ה֛וּא ]   phrase[הֵבִ֥יא ] 
Genesis 4:4 clause[וְהֶ֨בֶל הֵבִ֥יא גַם־ה֛וּא ...] phrase[הֶ֨בֶל גַם־ה֛וּא ]   phrase[הֵבִ֥יא ] 
Genesis 10:21 clause[גַּם־ה֑וּא אֲבִי֙ כָּל־בְּנֵי־...] phrase[גַּם־ה֑וּא אֲחִ֖י יֶ֥פֶת הַ...]   phrase[אֲבִי֙ כָּל־בְּנֵי־עֵ֔בֶר ] 
Genesis 10:21 clause[גַּם־ה֑וּא אֲבִי֙ כָּל־בְּנֵי־...] phrase[גַּם־ה֑וּא אֲחִ֖י יֶ֥פֶת הַ...]   phrase[אֲבִי֙ כָּל־בְּנֵי־עֵ֔בֶר ] 
Genesis 10:21 cl

By the way, here is some code that looks for basically the same phenomenon: a phrase within the
gap of another phrase.
It does not use search, and it gets a bit more focused results, in half the time.

##### Hint
> If you are comfortable with programming, and what you look for is fairly generic,
you may be better off without search, provided you can turn what you know about the
data into an effective procedure within Text-Fabric.

In [52]:
indent(reset=True)
info('Getting gapped phrases')
results = []
for v in F.otype.s('verse'):
    for c in L.d(v, otype='clause'):
        ps = L.d(c, otype='phrase')
        first = {}
        last = {}
        slots = {}
        # make index of phrase boundaries
        for p in ps:
            words = L.d(p, otype='word')
            first[p] = words[0]
            last[p] = words[-1]
            slots[p] = set(words)
        # look for pairs where p2 lies within the span of p1 without sharing slots
        for p1 in ps:
            for p2 in ps:
                if p2 < p1: continue  # skip the mirrored ordering of each pair
                if len(slots[p1] & slots[p2]) != 0: continue  # must be disjoint (also rules out p1 == p2)
                if first[p1] < first[p2] and last[p2] < last[p1]:
                    results.append((v, c, p1, p2, first[p1], first[p2], last[p2], last[p1]))
info('{} results'.format(len(results)))
for r in results[0:10]:
    print(r)

  0.00s Getting gapped phrases
  3.55s 369 results
(1413737, 426799, 605793, 605794, 1159, 1160, 1160, 1164)
(1413765, 426921, 606150, 606151, 1720, 1721, 1721, 1723)
(1413937, 427418, 607746, 607747, 4819, 4821, 4824, 4828)
(1413997, 427601, 608322, 608323, 5803, 5805, 5806, 5809)
(1414001, 427616, 608369, 608370, 5868, 5869, 5870, 5875)
(1414034, 427723, 608705, 608706, 6515, 6521, 6521, 6530)
(1414086, 427917, 609286, 609287, 7431, 7432, 7433, 7437)
(1414143, 428159, 609997, 609998, 8502, 8507, 8507, 8520)
(1414143, 428159, 609997, 609999, 8502, 8508, 8510, 8520)
(1414172, 428286, 610379, 610380, 9127, 9129, 9129, 9133)


But we can use the pretty printing of `glean()` here as well!

In [53]:
for r in results[0:10]:
    print(S.glean(r))

Genesis 2:25 clause[וַיִּֽהְי֤וּ שְׁנֵיהֶם֙ עֲרוּמִּ֔ים הָֽ...] phrase[שְׁנֵיהֶם֙ הָֽאָדָ֖ם וְאִשְׁתֹּ֑ו ] phrase[עֲרוּמִּ֔ים ]    
Genesis 4:4 clause[וְהֶ֨בֶל הֵבִ֥יא גַם־ה֛וּא ...] phrase[הֶ֨בֶל גַם־ה֛וּא ] phrase[הֵבִ֥יא ]    
Genesis 10:21 clause[גַּם־ה֑וּא אֲבִי֙ כָּל־בְּנֵי־...] phrase[גַּם־ה֑וּא אֲחִ֖י יֶ֥פֶת הַ...] phrase[אֲבִי֙ כָּל־בְּנֵי־עֵ֔בֶר ]    
Genesis 12:17 clause[וַיְנַגַּ֨ע יְהוָ֧ה׀ אֶת־פַּרְעֹ֛ה ...] phrase[אֶת־פַּרְעֹ֛ה וְאֶת־בֵּיתֹ֑ו ] phrase[נְגָעִ֥ים גְּדֹלִ֖ים ]    
Genesis 13:1 clause[וַיַּעַל֩ אַבְרָ֨ם מִמִּצְרַ֜יִם ...] phrase[אַבְרָ֨ם ה֠וּא וְאִשְׁתֹּ֧ו וְ...] phrase[מִמִּצְרַ֜יִם ]    
Genesis 14:16 clause[וְגַם֩ אֶת־לֹ֨וט אָחִ֤יו ...] phrase[גַם֩ אֶת־לֹ֨וט אָחִ֤יו וּ...] phrase[הֵשִׁ֔יב ]    
Genesis 17:7 clause[לִהְיֹ֤ות לְךָ֙ לֵֽאלֹהִ֔ים ...] phrase[לְךָ֙ וּֽלְזַרְעֲךָ֖ אַחֲרֶֽיךָ׃ ] phrase[לֵֽאלֹהִ֔ים ]    
Genesis 19:4 clause[וְאַנְשֵׁ֨י הָעִ֜יר אַנְשֵׁ֤י ...] phrase[אַנְשֵׁ֨י הָעִ֜יר אַנְשֵׁ֤י סְדֹם֙ ...] phrase[נָסַ֣בּוּ ]    
Genesis 19:4 clause[ו

# Refine the search template

A second look at our search results reveals that there are multiple results per phrase pair,
because there are in general multiple words in both
phrases that satisfy the condition.
We can make the search stricter, by requiring alignment of the words with the starts and ends of the phrases
they are in.

For this, we employ a convenient device in search templates that we have not explained yet.

Before each atom we may put a relational operator.
The meaning is that this relation holds between the preceding sibling atom and the current one;
if the atom is the first child, so that there is no preceding sibling, the relation holds between the parent and the atom.
If an operator stands all by itself on a line, it means that
this relation holds between the preceding sibling atom and the parent.

These operators are very handy to indicate that there is an order between children of a parent,
and also that a child should start or end where the parent starts or ends.
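
In terms of slots, these operators can be read as comparisons of first and last slots. The sketch below assumes each object is given as a sorted list of slot numbers, and takes `<:` to mean that the last slot of the left object is directly followed by the first slot of the right one. The clause and phrases are hypothetical.

```python
def starts_together(a, b):      # =:
    return a[0] == b[0]

def ends_together(a, b):        # :=
    return a[-1] == b[-1]

def same_boundaries(a, b):      # ::
    return starts_together(a, b) and ends_together(a, b)

def immediately_before(a, b):   # <:  (:> is immediately_before(b, a))
    return a[-1] + 1 == b[0]

# A hypothetical clause of 5 slots split into two adjacent phrases.
clause = [1, 2, 3, 4, 5]
p1 = [1, 2]
p2 = [3, 4, 5]

print(starts_together(p1, clause))   # True: p1 starts where the clause starts
print(immediately_before(p1, p2))    # True: p2 follows p1 directly
print(ends_together(p2, clause))     # True: p2 ends where the clause ends
```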

In [54]:
query = '''
verse
  clause
                                 
    p1:phrase
        =: w1:word
        <  w3:word
        :=

    p2:phrase
        =: w2:word
    
    p1 < p2
    w1 < p2
    w2 < w3
    
'''

The line `=: w1:word` constrains word `w1` to start exactly at the start of its parent, phrase `p1`.
The line `:=` constrains the preceding sibling, word `w3`, to end exactly at the end of its parent, phrase `p1`.
The line `=: w2:word` constrains word `w2` to start exactly at the start of its parent, phrase `p2`.

Given two phrases `p1` and `p2`, the positions of all three words `w1`, `w2`, `w3` are fixed, so for every
pair `p1`, `p2` that satisfies the conditions, there is exactly one result.

In [55]:
S.study(query)
S.showPlan(details=True)
S.count(progress=100, limit=-1)
for r in S.fetch(amount=10):
    print(S.glean(r))

  0.00s Checking search template ...
  0.00s Setting up search space for 7 objects ...
  0.50s Constraining search space with 13 relations ...
  0.55s Setting up retrieval plan ...
  0.68s Ready to deliver results from 1897304 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
Search with 7 objects and 13 relations
Results are instantiations of the following objects:
node  0-verse                             ( 23213   choices)
node  1-clause                            ( 88000   choices)
node  2-phrase                            (253174   choices)
node  3-word                              (426581   choices)
node  4-word                              (426581   choices)
node  5-phrase                            (253174   choices)
node  6-word                              (426581   choices)
Instantiations are computed along the following relations:
node                      0-verse         ( 23213   choices)
edge  0-verse         [[  1-clause        ( 

And here we have exactly the same results. It takes just a little bit more time.

So, search templates might be a convenient means to perform a complicated hunt for patterns.
Even the performance is not far below a hand-written walk through the data.

But beware of complications. Search templates are powerful, but sometimes they define a different
result set than you might think.
Here is an example.

# A tricky example

Suppose we want to count the clauses consisting of exactly two phrases.
The following template should do it: a clause, starting with a phrase, followed by an adjacent phrase,
which terminates the clause.

In [70]:
query = '''
clause
    =: phrase
    <: phrase
    :=
'''

In [71]:
S.study(query)
S.showPlan()
qresults = sorted(r[0] for r in S.fetch())
info(f'Done: found {len(qresults)}')

  0.00s Checking search template ...
  0.00s Setting up search space for 3 objects ...
  0.18s Constraining search space with 5 relations ...
  0.20s Setting up retrieval plan ...
  0.26s Ready to deliver results from 594348 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
  0.26s The results are connected to the original search template as follows:
 0     
 1 R0  clause
 2 R1      =: phrase
 3 R2      <: phrase
 4         :=
 5     
  1.25s Done: found 23415


Let us check this with a piece of hand-written code.

In [59]:
indent(reset=True)
info('counting ...')

cresults = []
for c in F.otype.s('clause'):
    wc = L.d(c, otype='word')
    ps = L.d(c, otype='phrase')
    if len(ps) == 2:
        (fp, lp) = ps
        wf = L.d(fp, otype='word')
        wl = L.d(lp, otype='word')
        if wf[0] == wc[0] and wf[-1] + 1 == wl[0] and wl[-1] == wc[-1]:
            cresults.append(c)
cresults = sorted(cresults)
info(f'Done: found {len(cresults)}')

  0.00s counting ...
  1.53s Done: found 23332


Strange, we end up with fewer cases. What is happening? Let us compare the results.
We look at the first result where both methods diverge.

In [62]:
diff = [x for x in zip(qresults, cresults) if x[0] != x[1]]
print(f'{len(diff)} differences')
print(diff[0])

23053 differences
(427723, 427728)
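
Note that `zip` compares the two sorted lists position by position. After the first divergence the lists are out of step, so nearly every later pair counts as a difference, which most likely explains the high count above. To list the clauses that only the template finds, a multiset difference is more direct. The sketch below uses `collections.Counter` on small hypothetical lists; in the notebook you would pass `qresults` and `cresults`.

```python
from collections import Counter

def only_in(a, b):
    """Items of a not matched by an item of b (multiset difference)."""
    return sorted((Counter(a) - Counter(b)).elements())

# Hypothetical sorted result lists: q has one extra clause (3).
q = [1, 2, 3, 4, 5, 6]
c = [1, 2, 4, 5, 6]

print([x for x in zip(q, c) if x[0] != x[1]])  # [(3, 4), (4, 5), (5, 6)]
print(only_in(q, c))                           # [3]
```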


Let's look at the phrases of 427723:

In [64]:
for p in L.d(diff[0][0], otype='phrase'):
    print(f'Phrase {p} has words {L.d(p, otype="word")}')

Phrase 608704 has words [6514]
Phrase 608705 has words [6515, 6516, 6517, 6518, 6519, 6520, 6522, 6523, 6524, 6525, 6526, 6527, 6528, 6529, 6530]
Phrase 608706 has words [6521]


This clause has three phrases, but the third lies in a gap of the second, so that the clause indeed satisfies the
pattern of two adjacent phrases.
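
We can replay the three checks of the template on the slots printed above. The snippet uses plain Python lists rather than Text-Fabric calls, and assumes that the clause occupies exactly the slots of its three phrases.

```python
# Slots of the three phrases of clause 427723, as printed above.
phrases = {
    608704: [6514],
    608705: [6515, 6516, 6517, 6518, 6519, 6520,
             6522, 6523, 6524, 6525, 6526, 6527, 6528, 6529, 6530],
    608706: [6521],
}
# Assumption: the clause occupies exactly the slots of its phrases.
clause = sorted(s for slots in phrases.values() for s in slots)

p1, p2 = phrases[608704], phrases[608705]
matches_template = (
    p1[0] == clause[0]          # =: first phrase starts the clause
    and p1[-1] + 1 == p2[0]     # <: second phrase directly follows
    and p2[-1] == clause[-1]    # := second phrase ends the clause
)
print(matches_template, len(phrases))   # True 3
```

All three template conditions hold, yet the clause has three phrases, which is why the hand-written `len(ps) == 2` test rejects it.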

Can we adjust the pattern to exclude cases like this? 
At the moment, our search template mechanism is not powerful enough for that.

We can count how often it happens, however. 
We require a third phrase to be present, not equal to one of the first two.

In [65]:
query = '''
clause
    =: p1:phrase
    <: p2:phrase
    :=
    p3:phrase
    p1 # p3
    p2 # p3
'''

In [66]:
S.study(query)
S.showPlan()
rresults = sorted(r[0] for r in S.fetch())
info(f'Done: found {len(rresults)}')

  0.00s Checking search template ...
  0.00s Setting up search space for 4 objects ...
  0.24s Constraining search space with 8 relations ...
  0.27s Setting up retrieval plan ...
  0.34s Ready to deliver results from 847522 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
  0.35s The results are connected to the original search template as follows:
 0     
 1 R0  clause
 2 R1      =: p1:phrase
 3 R2      <: p2:phrase
 4         :=
 5 R3      p3:phrase
 6         p1 # p3
 7         p2 # p3
 8     
  1.45s Done: found 115


But we have to filter this, because per `p1`, `p2` there might be multiple `p3` that satisfy the constraints.
So let's gather the distinct clauses (`rresults` holds the clause node `r[0]` of each result).

In [68]:
len(set(rresults))

83

And this is exactly the difference between
the number of results of the search template and of the hand-written piece of code: 23415 - 23332 = 83.
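
Counting distinct clauses works here because each result starts with its clause node. A more general way to discard results that differ only in an extra object (here `p3`) is to keep a set of the tuple prefixes you care about. The tuples below are hypothetical, shaped like the results of this template: `(clause, p1, p2, p3)`.

```python
# Hypothetical results (clause, p1, p2, p3): clause 10 matches with two p3's.
rows = [
    (10, 101, 102, 103),
    (10, 101, 102, 104),
    (20, 201, 202, 203),
]

# Keep only the part that identifies a match: clause, p1 and p2.
pairs = {r[:3] for r in rows}
print(len(rows), len(pairs))   # 3 2
```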

# Testing basic relations

Basic relations are about the identity and spatial ordering of objects.
Are they the same, do they occupy the same slots, do they overlap, is one embedded in the other,
does one come before the other?

We also have edge features, which specify relationships between nodes.

Although the basic relationships are easy to define, and even easy to implement,
they may be very costly to use. 
When searching, most of them have to be computed very many times.

Some of them have been precomputed and stored in an index, e.g. the embedding relationships.
They can be used without penalty.

Other relations are not suitable for precomputing: most inequality relations are of that kind.
It would require an enormous amount of storage to precompute for each node the set of nodes that
occupy different slots. This type of relation will not be used in narrowing down the search space,
which means that it may take more time to get the results.
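
A quick calculation shows the scale. Take only the 426,581 word nodes reported by `showPlan()` above: any two distinct words occupy different slots, so a precomputed table for such a relation would have to record almost every ordered pair of words.

```python
WORDS = 426_581   # word nodes, from the showPlan() output above

# Every ordered pair of distinct words occupies different slots.
pairs = WORDS * (WORDS - 1)
print(f'{pairs:.2e}')                 # 1.82e+11

# Even at a single bit per pair, the table would be large.
print(f'{pairs / 8 / 1e9:.1f} GB')    # 22.7 GB
```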

We are going to test all of our basic relationships here.

Let us first see what relations we have:

In [29]:
print(S.relationLegend)

                      = left equal to right (as node)
                      # left unequal to right (as node)
                      < left before right (in canonical node ordering)
                      > left after right (in canonical node ordering)
                     == left occupies same slots as right
                     && left has overlapping slots with right
                     ## left and right do not have the same slot set
                     || left and right do not have common slots
                     [[ left embeds right
                     ]] left embedded in right
                     << left completely before right
                     >> left completely after right
                     =: left and right start at the same slot
                     := left and right end at the same slot
                     :: left and right start and end at the same slot
                     <: left immediately before right
                     :> left immediately after right
-di
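
For objects viewed as sets of slots, several of these relations have a direct set-theoretic reading. The sketch below is a hypothetical illustration in plain Python, not Text-Fabric's implementation; `[[` is read here as "the right slot set lies within the left one" (so equal slot sets count as embedding in this sketch), and `<<` as "the last slot of left comes before the first slot of right".

```python
def same_slots(a, b):         # ==
    return a == b

def overlap(a, b):            # &&
    return bool(a & b)

def not_same_slots(a, b):     # ##
    return a != b

def disjoint(a, b):           # ||
    return not (a & b)

def embeds(a, b):             # [[  (assumed reading; ]] is embeds(b, a))
    return b <= a

def completely_before(a, b):  # <<  (>> is completely_before(b, a))
    return max(a) < min(b)

# Objects as sets of slots (hypothetical numbers).
verse = {1, 2, 3, 4}
phrase = {2, 3}
other = {5, 6}

print(embeds(verse, phrase), overlap(verse, phrase), disjoint(phrase, other))
print(completely_before(verse, other), not_same_slots(verse, phrase))
```

Relations such as `=`, `#`, `<` and `>` compare nodes rather than slot sets, so they do not fit this picture.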

# = (equal as node)

The `=` means that both parts are the same node. Left and right are not two things with similar properties,
no, they are one and the same thing.

Useful if the thing you search for is part of two wildly different patterns.

In [24]:
query = '''
v1:verse
  sentence
    clause rela=Objc
      phrase
        word sp=verb gn=f nu=pl
v2:verse
  sentence
    c1:clause
    c2:clause
    c3:clause
    c1 < c2
    c2 < c3
v1 = v2
'''
S.study(query)
S.showPlan()
for r in S.fetch(amount=10):
    print(S.glean(r))

  0.00s Checking search template ...
  0.00s loading features ...
   |     0.15s B gn                   from /Users/dirk/github/text-fabric-data/hebrew/etcbc4c
   |     0.13s B nu                   from /Users/dirk/github/text-fabric-data/hebrew/etcbc4c
   |     0.22s B rela                 from /Users/dirk/github/text-fabric-data/hebrew/etcbc4c
   |     0.13s B sp                   from /Users/dirk/github/text-fabric-data/hebrew/etcbc4c
  0.64s All additional features loaded - for details use loadLog()
  0.64s Setting up search space for 10 objects ...
  1.59s Constraining search space with 11 relations ...
  1.63s Setting up retrieval plan ...
  1.68s Ready to deliver results from 327603 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
  1.68s The results are connected to the original search template as follows:
 0     
 1 R0  v1:verse
 2 R1    sentence
 3 R2      clause rela=Objc
 4 R3        phrase
 5 R4          word sp=verb gn=f nu=pl
 6 R

# # (unequal as node)

`n # m` if `n` and `m` are not the same node.

If you write a template, and you know that the one should come before the other,
consider using `<` or `>`, which will constrain the results better.

We have seen this in action in the search for gapped phrases.
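
The gain is easy to quantify: with `#` every matching pair can show up twice, once in each order, whereas `<` keeps only one of the two orders. Counting both kinds of pairs for a hypothetical clause with 4 phrases:

```python
from itertools import combinations, permutations

phrases = [101, 102, 103, 104]  # hypothetical phrase nodes in canonical order

unequal = list(permutations(phrases, 2))   # what p1 # p2 allows
ordered = list(combinations(phrases, 2))   # what p1 < p2 allows

print(len(unequal), len(ordered))   # 12 6
```

The `&&` example further down uses `s1 # s2`, and its output indeed contains such mirrored pairs.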

# < and > (canonical)

`n < m` if `n` comes before `m` in the
[canonical ordering](https://github.com/ETCBC/text-fabric/wiki/Api#sorting-nodes)
of nodes.

We have seen them in action before.

# == (same slots)

Two objects are extensionally equal if they occupy exactly the same slots.

Quite an expensive relation, as you will see: 30 seconds for 3608 results.

In [25]:
query = '''
v:verse
    s:sentence
v == s
'''
S.study(query)
S.showPlan()
for r in S.fetch(amount=10):
    print(S.glean(r))
S.count(progress=1000, limit=10000)

  0.00s Checking search template ...
  0.00s Setting up search space for 2 objects ...
  0.04s Constraining search space with 2 relations ...
  0.91s Setting up retrieval plan ...
  0.94s Ready to deliver results from 7216 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
  0.94s The results are connected to the original search template as follows:
 0     
 1 R0  v:verse
 2 R1      s:sentence
 3     v == s
 4     
Jeremiah 10:17 sentence[אִסְפִּ֥י מֵאֶ֖רֶץ כִּנְעָתֵ֑ךְ יֹשֶׁ֖בֶת ...]
Jeremiah 10:22 sentence[קֹ֤ול שְׁמוּעָה֙ הִנֵּ֣ה בָאָ֔ה וְ...]
Jeremiah 10:23 sentence[יָדַ֣עְתִּי יְהוָ֔ה כִּ֛י לֹ֥א לָ...]
Jeremiah 11:1 sentence[הַדָּבָר֙ אֲשֶׁ֣ר הָיָ֣ה אֶֽל־...]
Jeremiah 11:17 sentence[וַיהוָ֤ה צְבָאֹות֙ הַנֹּוטֵ֣עַ ...]
Jeremiah 13:3 sentence[וַיְהִ֧י דְבַר־יְהוָ֛ה אֵלַ֖י ...]
Jeremiah 13:8 sentence[וַיְהִ֥י דְבַר־יְהוָ֖ה אֵלַ֥י ...]
Jeremiah 13:10 sentence[הָעָם֩ הַזֶּ֨ה הָ...]
Jeremiah 13:24 sentence[וַאֲפִיצֵ֖ם כְּקַשׁ־עֹובֵ֑ר ...]
Jeremiah 

# && (overlap)

Two objects overlap if and only if they share at least one slot.
This is quite costly to use in some cases.

In [26]:
query = '''
verse
    phrase
      s1:subphrase
      s2:subphrase
      s1 # s2
      s1 && s2
'''
S.study(query)
S.showPlan()
for r in S.fetch(amount=10):
    print(S.glean(r))

  0.00s Checking search template ...
  0.00s Setting up search space for 4 objects ...
  0.13s Constraining search space with 5 relations ...
  0.71s Setting up retrieval plan ...
  1.03s Ready to deliver results from 503971 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
  1.03s The results are connected to the original search template as follows:
 0     
 1 R0  verse
 2 R1      phrase
 3 R2        s1:subphrase
 4 R3        s2:subphrase
 5           s1 # s2
 6           s1 && s2
 7     
Genesis 1:14 phrase[לְאֹתֹת֙ וּלְמֹ֣ועֲדִ֔ים ...] subphrase[לְאֹתֹת֙ ] subphrase[לְאֹתֹת֙ וּלְמֹ֣ועֲדִ֔ים ]
Genesis 1:14 phrase[לְאֹתֹת֙ וּלְמֹ֣ועֲדִ֔ים ...] subphrase[לְאֹתֹת֙ וּלְמֹ֣ועֲדִ֔ים ] subphrase[לְאֹתֹת֙ ]
Genesis 1:14 phrase[לְאֹתֹת֙ וּלְמֹ֣ועֲדִ֔ים ...] subphrase[לְאֹתֹת֙ וּלְמֹ֣ועֲדִ֔ים ] subphrase[לְמֹ֣ועֲדִ֔ים ]
Genesis 1:14 phrase[לְאֹתֹת֙ וּלְמֹ֣ועֲדִ֔ים ...] subphrase[לְמֹ֣ועֲדִ֔ים ] subphrase[לְאֹתֹת֙ וּלְמֹ֣ועֲדִ֔ים ]
Genesis 1:14 phrase[לְא

# ## (not the same slots)

True when the two objects in question do not occupy exactly the same set of slots.
This is a very loose relationship.

It is implemented but not yet tested, and at the moment I do not have a clear use case for it.

# || (disjoint slots)

True when the two objects in question do not share any slots.
This is a rather loose relationship.

It is implemented but not yet tested, and at the moment I do not have a clear use case for it.
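To see how loose these two relations are, here is a sketch that treats `##` as the negation of `==` and `||` as the negation of `&&`, on three kinds of made-up slot-set pairs. The helper names are hypothetical; this is a model of the definitions above, not Text-Fabric code.

```python
def not_same_slots(a, b):
    """`a ## b`: the objects do not occupy exactly the same slots (negation of `==`)."""
    return a != b


def disjoint(a, b):
    """`a || b`: the objects share no slot at all (negation of `&&`)."""
    return a.isdisjoint(b)


# Hypothetical slot sets for three kinds of pairs.
pairs = {
    "identical": ({1, 2}, {1, 2}),
    "embedded": ({1, 2}, {2}),
    "separate": ({1, 2}, {3, 4}),
}

for name, (a, b) in pairs.items():
    print(name, not_same_slots(a, b), disjoint(a, b))
# identical False False
# embedded True False
# separate True True
```

Note that `##` holds for every pair except identical ones, which is why it filters so little.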

# [[ and ]] (embedding)

`n [[ m` if object `n` embeds `m`.

`n ]] m` if object `n` lies embedded in `m`.

These relations are used implicitly in templates when there is indentation:

```
s:sentence
  p:phrase
    w1:word gn=f
    w2:word gn=m
```

implicitly states the following embeddings:

* `s [[ p`
* `p [[ w1`
* `p [[ w2`
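A sketch of the two embedding relations on slot sets, with made-up slots for the indented template above. It assumes embedding means "every slot of the embedded object is also a slot of the embedder" (a non-strict subset); the helper names are hypothetical.

```python
def embeds(n, m):
    """`n [[ m`: every slot of m is also a slot of n (assumed non-strict subset)."""
    return m <= n


def embedded_in(n, m):
    """`n ]] m`: n lies embedded in m, i.e. m embeds n."""
    return embeds(m, n)


# Hypothetical slot sets for the indented template.
s = {1, 2, 3, 4}  # sentence
p = {2, 3}        # phrase
w1 = {2}          # word gn=f
w2 = {3}          # word gn=m

# The indentation implies: the sentence embeds the phrase,
# and the phrase embeds both words.
print(embeds(s, p), embeds(p, w1), embeds(p, w2))  # True True True
print(embedded_in(w1, p))                          # True
print(embeds(p, s))                                # False
```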

We have seen these relations in action.

# << and >> (before and after with slots)

These relations test whether one object comes before or after another, in the sense that the slots
occupied by the one object lie completely before or after the slots occupied by the other object.
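Because "completely before" only depends on the extreme slots, a sketch can compare the last slot of one object with the first slot of the other. This also works for gapped objects whose slots are not contiguous. The slot sets are made up and follow the template in the next cell (`c1 << p`, `c2 >> p`); the helper names are hypothetical.

```python
def before(a, b):
    """`a << b`: every slot of a lies before every slot of b."""
    return max(a) < min(b)


def after(a, b):
    """`a >> b`: every slot of a lies after every slot of b."""
    return min(a) > max(b)


# Hypothetical slot sets: c1 << p and c2 >> p.
c1 = {1, 2, 3}
p = {5}
c2 = {7, 8}

print(before(c1, p), after(c2, p))  # True True
print(before({1, 2}, {2, 3}))       # False: slot 2 is shared
```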

In [27]:
query = '''
verse
  sentence
    c1:clause
    p:phrase
    c2:clause
    c1 << p
    c2 >> p
'''
S.study(query)
S.showPlan()
for r in S.fetch(amount=10):
    print(S.glean(r))

  0.00s Checking search template ...
  0.00s Setting up search space for 5 objects ...
  0.15s Constraining search space with 6 relations ...
  0.16s Setting up retrieval plan ...
  0.18s Ready to deliver results from 515957 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
  0.18s The results are connected to the original search template as follows:
 0     
 1 R0  verse
 2 R1    sentence
 3 R2      c1:clause
 4 R3      p:phrase
 5 R4      c2:clause
 6         c1 << p
 7         c2 >> p
 8     
Genesis 1:11 sentence[תַּֽדְשֵׁ֤א הָאָ֨רֶץ֙ דֶּ֔שֶׁא עֵ֚שֶׂב ...] clause[תַּֽדְשֵׁ֤א הָאָ֨רֶץ֙ דֶּ֔שֶׁא עֵ֚שֶׂב ...] phrase[עֹ֤שֶׂה ] clause[אֲשֶׁ֥ר זַרְעֹו־בֹ֖ו ]
Genesis 1:11 sentence[תַּֽדְשֵׁ֤א הָאָ֨רֶץ֙ דֶּ֔שֶׁא עֵ֚שֶׂב ...] clause[תַּֽדְשֵׁ֤א הָאָ֨רֶץ֙ דֶּ֔שֶׁא עֵ֚שֶׂב ...] phrase[פְּרִי֙ ] clause[אֲשֶׁ֥ר זַרְעֹו־בֹ֖ו ]
Genesis 1:11 sentence[תַּֽדְשֵׁ֤א הָאָ֨רֶץ֙ דֶּ֔שֶׁא עֵ֚שֶׂב ...] clause[תַּֽדְשֵׁ֤א הָאָ֨רֶץ֙ דֶּ֔שֶׁא עֵ֚שֶׂב ...] phrase[לְמִינֹ֔

# =: (start at the same slot)

This relation holds when the left and right hand sides are nodes that have the same first slot.
It serves to enforce that the children of a parent are textually the first things inside that
parent. We have seen it in action before.

# := (end at the same slot)

This relation holds when the left and right hand sides are nodes that have the same last slot.
It serves to enforce that the children of a parent are textually the last things inside that
parent. We have seen it in action before.
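In the slot-set model, `=:` compares first slots and `:=` compares last slots. A minimal sketch with a made-up clause and two made-up phrases; the helper names are hypothetical.

```python
def same_start(a, b):
    """`a =: b`: both objects have the same first slot."""
    return min(a) == min(b)


def same_end(a, b):
    """`a := b`: both objects have the same last slot."""
    return max(a) == max(b)


# Hypothetical slot sets: a clause with a first and a last phrase.
clause = {10, 11, 12, 13}
first_phrase = {10, 11}
last_phrase = {12, 13}

print(same_start(clause, first_phrase), same_end(clause, first_phrase))  # True False
print(same_start(clause, last_phrase), same_end(clause, last_phrase))    # False True
```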

# :: (same start and end slots)

This relation holds when `=:` and `:=` both hold between the left and right hand sides.
It serves to look for parents with a single child, or, more precisely, for parents that are textually spanned by a single child.
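The phrase "textually spanned" matters when objects have gaps. In this sketch (made-up slots, hypothetical helper), a gapped phrase has the same first and last slot as its clause, so `::` holds, even though another phrase sits in its gap. `::` is therefore weaker than `==`.

```python
def same_boundaries(a, b):
    """`a :: b`: both `=:` and `:=` hold, i.e. same first and same last slot."""
    return min(a) == min(b) and max(a) == max(b)


# Hypothetical slot sets.
clause = {1, 2, 3}
p1 = {1, 3}  # gapped phrase spanning the whole clause
p2 = {2}     # phrase lying in the gap of p1

print(same_boundaries(clause, p1))  # True, although p1 does not occupy slot 2
print(same_boundaries(clause, p2))  # False
print(clause == p1)                 # False: the slot sets differ
```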

In [75]:
query = '''
verse
    clause
        :: phrase
'''
S.study(query)
S.showPlan()
S.count(progress=1000, limit=-1)
for r in S.fetch(amount=10):
    print(S.glean(r))

  0.00s Checking search template ...
  0.00s Setting up search space for 3 objects ...
  0.14s Constraining search space with 3 relations ...
  0.16s Setting up retrieval plan ...
  0.21s Ready to deliver results from 364387 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
  0.22s The results are connected to the original search template as follows:
 0     
 1 R0  verse
 2 R1      clause
 3 R2          :: phrase
 4     
  0.00s Counting results per 1000 up to  the end of the results ...
   |     0.13s 1000
   |     0.25s 2000
   |     0.37s 3000
   |     0.45s 4000
   |     0.50s 5000
   |     0.56s 6000
   |     0.60s 7000
   |     0.66s 8000
   |     0.73s 9000
  0.77s Done: 9387 results
Genesis 1:5 clause[יֹ֥ום אֶחָֽד׃ פ ] phrase[יֹ֥ום אֶחָֽד׃ פ ]
Genesis 1:8 clause[יֹ֥ום שֵׁנִֽי׃ פ ] phrase[יֹ֥ום שֵׁנִֽי׃ פ ]
Genesis 1:13 clause[יֹ֥ום שְׁלִישִֽׁי׃ פ ] phrase[יֹ֥ום שְׁלִישִֽׁי׃ פ ]
Genesis 1:16 clause[אֶת־הַמָּאֹ֤ור הַגָּדֹל֙ ...] phrase[אֶת־

As before, there might be extra phrases in such clauses, lying in the gaps of the clause-spanning phrase.

In [76]:
query = '''
verse
    clause
        :: p1:phrase
        p2:phrase
        p1 # p2
'''
S.study(query)
S.showPlan()
S.count(progress=1000, limit=-1)
for r in S.fetch(amount=10):
    print(S.glean(r))

  0.00s Checking search template ...
  0.00s Setting up search space for 4 objects ...
  0.20s Constraining search space with 5 relations ...
  0.21s Setting up retrieval plan ...
  0.24s Ready to deliver results from 617561 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
  0.25s The results are connected to the original search template as follows:
 0     
 1 R0  verse
 2 R1      clause
 3 R2          :: p1:phrase
 4 R3          p2:phrase
 5             p1 # p2
 6     
  0.00s Counting results per 1000 up to  the end of the results ...
  0.67s Done: 82 results
Genesis 10:21 clause[גַּם־ה֑וּא אֲבִי֙ כָּל־בְּנֵי־...] phrase[גַּם־ה֑וּא אֲחִ֖י יֶ֥פֶת הַ...] phrase[אֲבִי֙ כָּל־בְּנֵי־עֵ֔בֶר ]
Genesis 24:24 clause[בַּת־בְּתוּאֵ֖ל אָנֹ֑כִי בֶּן־מִלְכָּ֕ה ] phrase[בַּת־בְּתוּאֵ֖ל בֶּן־מִלְכָּ֕ה ] phrase[אָנֹ֑כִי ]
Genesis 31:16 clause[לָ֥נוּ ה֖וּא וּלְבָנֵ֑ינוּ ] phrase[לָ֥נוּ וּלְבָנֵ֑ינוּ ] phrase[ה֖וּא ]
Genesis 31:53 clause[אֱלֹהֵ֨י אַבְרָהָ֜ם וֵֽא

# <: (adjacent before)

This relation holds when the left hand side ends in the slot that lies immediately before the first slot of the right hand side.
It serves to enforce an ordering between siblings of a parent.

# :> (adjacent after)

This relation holds when the left hand side starts in the slot that lies immediately after the last slot of the right hand side.
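A sketch of the adjacency relations next to plain precedence, assuming slots are consecutive integers so that "immediately before" means "one less". The clause atoms and their slots are made up and the helper names are hypothetical.

```python
def adjacent_before(a, b):
    """`a <: b`: the last slot of a is immediately followed by the first slot of b."""
    return max(a) + 1 == min(b)


def adjacent_after(a, b):
    """`a :> b`: the first slot of a immediately follows the last slot of b."""
    return min(a) == max(b) + 1


def before(a, b):
    """`a << b`: every slot of a lies before every slot of b."""
    return max(a) < min(b)


# Hypothetical slot sets.
atom1 = {1, 2}
atom2 = {4, 5}  # slot 3 lies between atom1 and atom2
atom3 = {3}

print(adjacent_before(atom1, atom2), before(atom1, atom2))  # False True
print(adjacent_before(atom1, atom3), adjacent_after(atom3, atom1))  # True True
```

A gap of even one slot is enough to make `<:` fail while `<<` still holds.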

As an example: are there clauses with two clause atoms that are adjacent, i.e. without a gap between them?

In [77]:
query = '''
verse
    clause
        clause_atom
        <: clause_atom
'''
S.study(query)
S.showPlan()
S.count(progress=1000, limit=-1)
for r in S.fetch(amount=10):
    print(S.glean(r))

  0.00s Checking search template ...
  0.00s Setting up search space for 4 objects ...
  0.08s Constraining search space with 4 relations ...
  0.11s Setting up retrieval plan ...
  0.17s Ready to deliver results from 292337 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
  0.17s The results are connected to the original search template as follows:
 0     
 1 R0  verse
 2 R1      clause
 3 R2          clause_atom
 4 R3          <: clause_atom
 5     
  0.00s Counting results per 1000 up to  the end of the results ...
  0.87s Done: 0 results


Conclusion: clause atoms of the same clause are never adjacent; there is always textual material between them.
If we relax adjacency to plain precedence (`<<`), we do get results:

In [78]:
query = '''
verse
    clause
        clause_atom
        << clause_atom
'''
S.study(query)
S.showPlan()
S.count(progress=1000, limit=-1)
for r in S.fetch(amount=10):
    print(S.glean(r))

  0.00s Checking search template ...
  0.00s Setting up search space for 4 objects ...
  0.08s Constraining search space with 4 relations ...
  0.10s Setting up retrieval plan ...
  0.12s Ready to deliver results from 292337 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
  0.13s The results are connected to the original search template as follows:
 0     
 1 R0  verse
 2 R1      clause
 3 R2          clause_atom
 4 R3          << clause_atom
 5     
  0.00s Counting results per 1000 up to  the end of the results ...
   |     0.35s 1000
   |     0.66s 2000
  0.81s Done: 2587 results
Genesis 1:7 clause[וַיַּבְדֵּ֗ל בֵּ֤ין הַמַּ֨יִם֙ ...] clause_atom[וַיַּבְדֵּ֗ל בֵּ֤ין הַמַּ֨יִם֙ ] clause_atom[וּבֵ֣ין הַמַּ֔יִם ]
Genesis 1:11 clause[תַּֽדְשֵׁ֤א הָאָ֨רֶץ֙ דֶּ֔שֶׁא עֵ֚שֶׂב ...] clause_atom[תַּֽדְשֵׁ֤א הָאָ֨רֶץ֙ דֶּ֔שֶׁא עֵ֚שֶׂב ] clause_atom[עֵ֣ץ פְּרִ֞י ]
Genesis 1:11 clause[עֹ֤שֶׂה פְּרִי֙ לְמִינֹ֔ו עַל־...] clause_atom[עֹ֤שֶׂה פְּרִי֙ לְמִינֹ֔ו

# Queries from SHEBANQ

## Example by Oliver Glanz

[Oliver Glanz: PP with adjective followed by noun](https://shebanq.ancient-data.org/hebrew/query?version=4b&id=547)
```
select all objects where
[phrase FOCUS typ = PP
  [word sp= prep]
  [word sp=adjv]
  [word sp=subs]
]
```
64 results having 251 words.

In [84]:
query = '''
phrase typ=PP
  word sp=prep
  <: word sp=adjv
  <: word sp=subs
'''
S.study(query)
S.showPlan()
S.count(progress=1000, limit=-1)
for r in S.fetch(amount=10):
    print(S.glean(r))

  0.00s Checking search template ...
  0.00s Setting up search space for 4 objects ...
  2.18s Constraining search space with 5 relations ...
  2.25s Setting up retrieval plan ...
  2.25s Ready to deliver results from 256 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
  2.26s The results are connected to the original search template as follows:
 0     
 1 R0  phrase typ=PP
 2 R1    word sp=prep
 3 R2    <: word sp=adjv
 4 R3    <: word sp=subs
 5     
  0.00s Counting results per 1000 up to  the end of the results ...
  0.00s Done: 64 results
phrase[אֵ֖ת רַבַּ֣ת בְּנֵֽי־עַמֹּ֑ון וְ...]   
phrase[לָכֶ֜ם יִרְאֵ֤י שְׁמִי֙ ]   
phrase[עַ֥ל רִגְעֵי־אֶ֑רֶץ ]   
phrase[אֶת־חַלְלֵי־חָ֑רֶב ]   
phrase[לַחֲכַם־לֵ֭ב ]   
phrase[לְמָ֣רֵי נָֽפֶשׁ׃ ]   
phrase[לְיִשְׁרֵי־לֵֽב׃ ]   
phrase[עִם־יְפֵ֥ה עֵינַ֖יִם וְטֹ֣וב ...]   
phrase[עִם־מְלֵ֥א יָמִֽים׃ ]   
phrase[אֶת־חַלְלֵי־חָֽרֶב׃ ]   


The number of results is right. The number of words that SHEBANQ reports
is the number of words in the phrases of the result. Let us count them:

In [85]:
print(sum([len(L.d(r[0], otype='word')) for r in S.fetch()]))

251


## Example by Martijn Naaijer

[Martijn Naaijer: Object clauses with >CR](https://shebanq.ancient-data.org/hebrew/query?version=4b&id=997)

```
Select all objects where 

[clause rela = Objc
   [word focus first lex = '>CR']
]
```

157 results

In [90]:
query = '''
verse
    clause rela=Objc
        =: word lex=>CR
'''
S.study(query)
S.showPlan()
S.count(progress=1000, limit=-1)
for r in sorted(S.fetch(), key=lambda x: C.rank.data[x[0]-1])[0:10]:
    print(S.glean(r))

  0.00s Checking search template ...
  0.00s Setting up search space for 3 objects ...
  0.76s Constraining search space with 3 relations ...
  0.77s Setting up retrieval plan ...
  0.78s Ready to deliver results from 284 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
  0.78s The results are connected to the original search template as follows:
 0     
 1 R0  verse
 2 R1      clause rela=Objc
 3 R2          =: word lex=>CR
 4     
  0.00s Counting results per 1000 up to  the end of the results ...
  0.01s Done: 96 results
Genesis 14:24 clause[אֲשֶׁ֣ר אָֽכְל֣וּ הַנְּעָרִ֔ים ] 
Genesis 18:17 clause[אֲשֶׁ֖ר אֲנִ֥י עֹשֶֽׂה׃ ] 
Genesis 24:3 clause[אֲשֶׁ֨ר לֹֽא־תִקַּ֤ח אִשָּׁה֙ לִ...] 
Genesis 34:11 clause[אֲשֶׁ֥ר תֹּאמְר֛וּ אֵלַ֖י ] 
Genesis 39:23 clause[אֲשֶׁר־ה֥וּא עֹשֶׂ֖ה ] 
Genesis 41:28 clause[אֲשֶׁ֧ר הָאֱלֹהִ֛ים עֹשֶׂ֖ה ] 
Genesis 41:55 clause[אֲשֶׁר־יֹאמַ֥ר לָכֶ֖ם ] 
Genesis 44:5 clause[אֲשֶׁ֥ר עֲשִׂיתֶֽם׃ ] 
Exodus 4:12 clause[אֲשֶׁ֥ר תְּדַ

We have fewer cases: 96 instead of 157.
We are working with version 4c of the ETCBC data, whereas the query on SHEBANQ has been executed against 4b.
Some coding updates between these versions are relevant to this query. For example, Genesis 43:27 is in the results
on SHEBANQ but not here: in 4c the `rela` of its clause is `Attr`, not `Objc`.

In [97]:
query = '''
verse book=Genesis chapter=43 verse=27
    clause
        =: word lex=>CR
'''
S.study(query)
S.showPlan()
S.count(progress=1000, limit=-1)
results = sorted(S.fetch(), key=lambda x: C.rank.data[x[0]-1])
for r in results:
    print(r[1], F.rela.v(r[1]), S.glean(r))

  0.00s Checking search template ...
  0.00s Setting up search space for 3 objects ...
  0.69s Constraining search space with 3 relations ...
  0.78s Setting up retrieval plan ...
  0.78s Ready to deliver results from 3 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
  0.78s The results are connected to the original search template as follows:
 0     
 1 R0  verse book=Genesis chapter=43 verse=27
 2 R1      clause
 3 R2          =: word lex=>CR
 4     
  0.00s Counting results per 1000 up to  the end of the results ...
  0.00s Done: 1 results
431688 Attr Genesis 43:27 clause[אֲשֶׁ֣ר אֲמַרְתֶּ֑ם ] 


## Example by Cody Kingham

[Cody Kingham: MI Hierarchies. p.18n49. First Person Verbs in Narrative](https://shebanq.ancient-data.org/hebrew/query?version=4b&id=1050)

```
SELECT ALL OBJECTS WHERE

[book
   [clause txt = 'N'
      [word FOCUS sp = verb
        [word ps = p1
         ]
      ]
   ]
]
OR
[book
   [clause txt = '?N'
      [word FOCUS sp = verb
        [word ps = p1
         ]
      ]
   ]
]
```

273 results.

In [98]:
query = '''
book
    clause txt=N|?N
        word sp=verb ps=p1
'''
S.study(query)
S.showPlan()
S.count(progress=1000, limit=-1)
for r in sorted(S.fetch(), key=lambda x: C.rank.data[x[0]-1])[0:10]:
    print(S.glean(r))

  0.00s Checking search template ...
  0.00s loading features ...
   |     0.20s B ps                   from /Users/dirk/github/text-fabric-data/hebrew/etcbc4c
   |     0.03s B txt                  from /Users/dirk/github/text-fabric-data/hebrew/etcbc4c
  0.24s All additional features loaded - for details use loadLog()
  0.24s Setting up search space for 3 objects ...
  0.99s Constraining search space with 2 relations ...
  1.01s Setting up retrieval plan ...
  1.02s Ready to deliver results from 557 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
  1.02s The results are connected to the original search template as follows:
 0     
 1 R0  book
 2 R1      clause txt=N|?N
 3 R2          word sp=verb ps=p1
 4     
  0.00s Counting results per 1000 up to  the end of the results ...
  0.04s Done: 273 results
 clause[וְאֶת־יְהֹושׁ֣וּעַ צִוֵּ֔יתִי בָּ...] 
 clause[וָאֶתְחַנַּ֖ן אֶל־יְהוָ֑ה בָּ...] 
 clause[וָאֶשְׁלַ֤ח מַלְאָכִים֙ מִמִּדְבַּ֣ר ...] 
 c

## Example by Reinoud Oosting

[Reinoud Oosting: to go + object marker](https://shebanq.ancient-data.org/hebrew/query?version=4b&id=755)

```
Select all objects
where
 [clause
   [phrase function = Pred OR function = PreC
     [word FOCUS sp = verb AND vs = qal AND lex = "HLK[" ]
         ]
    ..
    [phrase FOCUS
    [word First lex = ">T"]
   ]
 ]
OR
 [clause
    [phrase FOCUS
      [word First lex = ">T" ]
    ]
..
   [phrase function = Pred OR function = PreC
     [word FOCUS sp = verb AND vs = qal AND lex = "HLK["]
   ]
 ]
 ```
 
4 results.

This is a case where we can simplify greatly: unlike MQL, Text-Fabric does not automatically constrain
the order of sibling phrases in a template, so one template with `p1 # p2` covers both orders.
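To see why one unordered template can replace the two ordered MQL branches, here is a toy sketch over made-up phrases with hypothetical flags (`is_pred` for a Pred/PreC phrase with qal HLK[, `has_et` for a phrase starting with >T). MQL's `..` is modelled as "comes later, possibly with a gap". The two ordered branches together give the same pairs as the unordered one here, because in this toy data the phrases do not interleave.

```python
from itertools import permutations

# Hypothetical phrases of one clause: slots plus two feature flags.
phrases = {
    1: {"slots": {1}, "is_pred": False, "has_et": True},
    2: {"slots": {2, 3}, "is_pred": True, "has_et": False},
    3: {"slots": {4}, "is_pred": False, "has_et": True},
}


def candidates():
    """Pairs (p1, p2) of distinct phrases meeting the feature conditions (`p1 # p2`)."""
    return {
        (a, b)
        for a, b in permutations(phrases, 2)
        if phrases[a]["is_pred"] and phrases[b]["has_et"]
    }


def precedes(a, b):
    """MQL-style `a .. b`: a comes before b, possibly with a gap."""
    return max(phrases[a]["slots"]) < min(phrases[b]["slots"])


# Text-Fabric style: one template, no order imposed between the siblings.
unordered = candidates()

# MQL style: two OR branches, one per order.
pred_first = {(a, b) for a, b in candidates() if precedes(a, b)}
et_first = {(a, b) for a, b in candidates() if precedes(b, a)}

print(sorted(unordered))                   # [(2, 1), (2, 3)]
print(unordered == pred_first | et_first)  # True
```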

In [99]:
query = '''
clause
  p1:phrase function=Pred|PreC
    word sp=verb vs=qal lex=HLK[
  p2:phrase
    =: word lex=>T
  p1 # p2
'''

S.study(query)
S.showPlan()
S.count(progress=1000, limit=-1)
for r in sorted(S.fetch(), key=lambda x: C.rank.data[x[0]-1])[0:10]:
    print(S.glean(r))

  0.00s Checking search template ...
  0.00s loading features ...
   |     0.09s B function             from /Users/dirk/github/text-fabric-data/hebrew/etcbc4c
   |     0.20s B vs                   from /Users/dirk/github/text-fabric-data/hebrew/etcbc4c
  0.29s All additional features loaded - for details use loadLog()
  0.29s Setting up search space for 5 objects ...
  1.94s Constraining search space with 6 relations ...
  2.14s Setting up retrieval plan ...
  2.15s Ready to deliver results from 184201 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
  2.16s The results are connected to the original search template as follows:
 0     
 1 R0  clause
 2 R1    p1:phrase function=Pred|PreC
 3 R2      word sp=verb vs=qal lex=HLK[
 4 R3    p2:phrase
 5 R4      =: word lex=>T
 6       p1 # p2
 7     
  0.00s Counting results per 1000 up to  the end of the results ...
  0.02s Done: 4 results
clause[וַנֵּ֡לֶךְ אֵ֣ת כָּל־הַ...] phrase[נֵּ֡לֶךְ ]  phrase[