<img align="right" src="tf-small.png"/>

# Search

Do we need search in TF, like MQL?

Yes. It is convenient to have a more declarative way of obtaining a set of interesting nodes to work with, instead of writing nested loops by hand.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from tf.fabric import Fabric

In [3]:
ETCBC = 'hebrew/etcbc4c'
TF = Fabric(modules=ETCBC)

This is Text-Fabric 1.2.7
Api reference : https://github.com/ETCBC/text-fabric/wiki/Api
Tutorial      : https://github.com/ETCBC/text-fabric/blob/master/docs/tutorial.ipynb
Data sources  : https://github.com/ETCBC/text-fabric-data
Data docs     : https://etcbc.github.io/text-fabric-data/features/hebrew/etcbc4c/0_overview.html
Shebanq docs  : https://shebanq.ancient-data.org/text
Slack team    : https://shebanq.slack.com/signup
Questions? Ask shebanq@ancient-data.org for an invite to Slack
107 features found and 0 ignored


In [4]:
api = TF.load('')
api.makeAvailableIn(globals())

  0.00s loading features ...
   |     0.00s Feature overview: 102 nodes; 4 edges; 1 configs; 6 computeds
  6.03s All features loaded/computed - for details use loadLog()


In [5]:
phrasesGapped = '''
# test
# verse book=Genesis chapter=2 verse=25
clause
                                 
    p1:phrase
        w1:word
        w3:word
        w1 < w3

    p2:phrase
        w2:word
        w1 < w2 
        w2 < w3
    
    p1 # p2   
'''
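To see what this template asks for, here is a brute-force sketch of its conditions in plain Python. It runs on invented toy data, not on the ETCBC corpus. It assumes that each indented atom lies inside the atom above it, and that `<` between two words compares their slot positions. The function `match_gapped_template` and the dictionaries are hypothetical and not part of TF.

```python
from itertools import product


def match_gapped_template(clause_phrases, phrase_words):
    """Yield (c, p1, w1, w3, p2, w2) for every instantiation of the template's conditions."""
    for c, phrases in clause_phrases.items():
        for p1, p2 in product(phrases, repeat=2):
            if p1 == p2:                                # p1 # p2
                continue
            for w1, w3 in product(phrase_words[p1], repeat=2):
                if not w1 < w3:                         # w1 < w3
                    continue
                for w2 in phrase_words[p2]:
                    if w1 < w2 < w3:                    # w1 < w2 and w2 < w3
                        yield (c, p1, w1, w3, p2, w2)


# Hypothetical toy data: clause -> its phrases, phrase -> its words in slot order.
clause_phrases = {1: [10, 11], 2: [12, 13]}
phrase_words = {10: [1, 2, 5], 11: [3, 4], 12: [6, 7], 13: [8]}

results = list(match_gapped_template(clause_phrases, phrase_words))
print(len(results))  # 4
```

A single pair of gapped phrases (10 and 11) yields several results here, one for every choice of `w1`, `w2` and `w3`, because each result is an instantiation of all objects in the template.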

In [6]:
yesh = '''
book
  chapter
    verse
      clause
        clause_atom
          phrase
            phrase_atom
              word lex=JC/|>JN/
'''

In [14]:
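strat = 'small_choice_first'
#strat = 'spread_1_first'
for (i, query) in enumerate([
    yesh,
    phrasesGapped,
]):
    print('\n---------------------QUERY--{}-------------------\n'.format(i))
    S.study(query, strategy=strat)       # check the template, set up and constrain the search space, plan retrieval
    S.showPlan()                         # show the objects, their numbers of choices and the relations
    S.count(progress=1000, limit=10000)  # count the results (progress every 1000, stop at 10000)
    for r in S.fetch(amount=10):         # the first 10 results
        print(S.glean(r))                # each result as readable text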


---------------------QUERY--0-------------------

  0.00s Checking search template ...
  0.00s Setting up search space for 8 objects ...
  0.79s Constraining search space with 7 relations ...
  0.84s Setting up retrieval plan ...
  0.87s Ready to deliver results from 5870 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
Search with 8 objects and 7 relations
Results are instantiations of the following objects:
node  0-book                              (    39   choices)
node  1-chapter                           (   416   choices)
node  2-verse                             (   799   choices)
node  3-clause                            (   922   choices)
node  4-clause_atom                       (   922   choices)
node  5-phrase                            (   923   choices)
node  6-phrase_atom                       (   923   choices)
node  7-word                              (   926   choices)
Instantiations are computed along the following relations

# Testing basic relations

In [24]:
# << slotBefore
query = '''
verse
  sentence
    c:clause
    p:phrase
    c << p
'''
S.study(query)
S.showPlan()
for r in S.fetch(amount=10):
    print(S.glean(r))

  0.00s Checking search template ...
  0.00s Setting up search space for 4 objects ...
  0.15s Constraining search space with 4 relations ...
  0.75s Setting up retrieval plan ...
  1.26s Ready to deliver results from 420759 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
Search with 4 objects and 4 relations
Results are instantiations of the following objects:
node  0-verse                             ( 21802   choices)
node  1-sentence                          ( 62655   choices)
node  2-clause                            ( 83128   choices)
node  3-phrase                            (253174   choices)
Instantiations are computed along the following relations:
node                      0-verse         ( 21802   choices)
edge  0-verse         [[  1-sentence      (     2.9 choices)
edge  1-sentence      [[  2-clause        (     1.6 choices)
edge  1-sentence      [[  3-phrase        (     2.9 choices)
edge  3-phrase        >>  2-clause        ( 457
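The logs above run in stages: set up a search space per object, constrain it with the relations, then set up a retrieval plan. Below is a sketch of what the constraining step could look like. It is not TF's code but a generic fixpoint pruning (in the style of arc consistency) that drops every candidate without a partner on the other side of some relation. Reading `small_choice_first` as "start with the object that has the fewest choices" is an assumption, consistent with the plan above starting at `verse`, the object with the fewest choices. The toy slot data and all function names are invented.

```python
from typing import Callable, Dict, List, Set, Tuple

# A relation between two template objects: (left object, right object, test on two nodes).
Relation = Tuple[str, str, Callable[[int, int], bool]]


def constrain(space: Dict[str, Set[int]], relations: List[Relation]) -> Dict[str, Set[int]]:
    """Repeatedly drop candidates that have no partner on the other side of a relation."""
    space = {obj: set(cands) for obj, cands in space.items()}  # do not modify the input
    changed = True
    while changed:
        changed = False
        for left, right, holds in relations:
            keep_left = {a for a in space[left] if any(holds(a, b) for b in space[right])}
            keep_right = {b for b in space[right] if any(holds(a, b) for a in keep_left)}
            if keep_left != space[left] or keep_right != space[right]:
                space[left], space[right] = keep_left, keep_right
                changed = True
    return space


def small_choice_first(space: Dict[str, Set[int]]) -> List[str]:
    """Order objects by number of candidates, fewest first (assumed meaning of the strategy)."""
    return sorted(space, key=lambda obj: len(space[obj]))


# Hypothetical toy corpus: node -> set of slots it occupies.
slots = {1: {1, 2, 3, 4}, 2: {5, 6},        # verses
         10: {1, 2}, 11: {5, 6},            # clauses
         20: {3, 4}, 21: {1}, 22: {6}}      # phrases


def embeds(a: int, b: int) -> bool:         # a [[ b
    return slots[b] <= slots[a]


def before(a: int, b: int) -> bool:         # a << b
    return max(slots[a]) < min(slots[b])


space = {'verse': {1, 2}, 'clause': {10, 11}, 'phrase': {20, 21, 22}}
relations = [('verse', 'clause', embeds), ('verse', 'phrase', embeds),
             ('clause', 'phrase', before)]

print(small_choice_first(space))    # ['verse', 'clause', 'phrase']
print(constrain(space, relations))  # {'verse': {1}, 'clause': {10}, 'phrase': {20}}
```

Pruning only removes candidates that cannot take part in any result; combining the surviving candidates into result tuples is still the job of the retrieval stage.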

In [8]:
# && overlap
query = '''
verse
    phrase
      s1:subphrase
      s2:subphrase
      s1 # s2
      s1 && s2
'''
S.study(query)
S.showPlan()
for r in S.fetch(amount=10):
    print(S.glean(r))

  0.00s Checking search template ...
  0.00s Setting up search space for 4 objects ...
  0.17s Constraining search space with 5 relations ...


KeyboardInterrupt: 

# Query syntax

## General
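We have these kinds of lines:

* white-space lines (allowed everywhere, always ignored)
* relation line: **name operator name**.
  Indentation and spacing are ignored, but there must be white space around the operator.
* atom line: **indent otype features**. The indent is significant: an atom is meant to be embedded in the nearest preceding atom with a smaller indent (the plans above show this as `[[` relations).
  The otype may be preceded by a name and a colon, as in `p1:phrase`; relation lines refer to atoms by these names.
* feature line: **features**. The indent is not significant. A feature line is only allowed after an atom line or after another feature line.
  Feature lines continue the features of the preceding atom, which is handy when the features take up a lot of space.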
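A minimal sketch of how these line kinds could be told apart, written independently of TF's actual parser. Assumptions: a relation line has exactly three tokens with an operator in the middle, a line whose first token contains `=` is a feature line, atoms without a `name:` prefix get a hypothetical generated name, and an indented atom is related by `[[` to the nearest preceding atom with a smaller indent (matching the `[[` edges in the plans above).

```python
import re

# Operators from the lists below; edge operators look like -name> or <name-.
OPERATORS = {'=', '#', '<', '>', '==', '&&', '##', '||', '[[', ']]', '<<', '>>'}
EDGE_OP = re.compile(r'^(-\w+>|<\w+-)$')


def classify(line):
    """Return 'blank', 'relation', 'feature' or 'atom' for one template line."""
    tokens = line.split()
    if not tokens:
        return 'blank'
    if len(tokens) == 3 and (tokens[1] in OPERATORS or EDGE_OP.match(tokens[1])):
        return 'relation'
    if '=' in tokens[0]:
        return 'feature'
    return 'atom'


def parse(template):
    """Collect atoms and relations; indentation adds implicit [[ relations."""
    atoms = {}        # name -> [otype, feature tokens]
    relations = []    # (left name, operator, right name)
    stack = []        # (indent, name) of the atoms enclosing the current line
    prev = last = None
    for line in template.splitlines():
        kind = classify(line)
        if kind == 'blank':
            continue  # blank lines are ignored and do not change `prev`
        tokens = line.split()
        if kind == 'relation':
            relations.append(tuple(tokens))
        elif kind == 'feature':
            if prev not in ('atom', 'feature'):
                raise ValueError(f'feature line after a {prev} line: {line!r}')
            atoms[last][1].extend(tokens)  # continue the features of the last atom
        else:
            indent = len(line) - len(line.lstrip())
            name, _, otype = tokens[0].rpartition(':')
            name = name or f'{otype}{len(atoms)}'  # hypothetical name for anonymous atoms
            while stack and stack[-1][0] >= indent:
                stack.pop()
            if stack:
                relations.append((stack[-1][1], '[[', name))
            stack.append((indent, name))
            atoms[name] = [otype, tokens[1:]]
            last = name
        prev = kind
    return atoms, relations


query = '''
verse
  sentence
    c:clause
    p:phrase
    c << p
'''
atoms, rels = parse(query)
print(rels)
# [('verse0', '[[', 'sentence1'), ('sentence1', '[[', 'c'), ('sentence1', '[[', 'p'), ('c', '<<', 'p')]
```

For the `<<` query above this yields 4 objects and 4 relations, the same counts that `S.showPlan()` reports for it.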

## Features

A white-space separated list of *key*=*values* pairs.

* There must be no space around the `=`.
* *key* must be the name of a feature that exists in the dataset.
  If it is not yet loaded, it will be loaded.
* *values* must be a `|`-separated list of feature values, without quotes; the feature must have one of these values.
  There must be no spaces around the `|`.
  If you need a space, `|` or `\` in a value, escape it with a `\`.
  Write tabs and newlines as `\t` and `\n`.
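One possible reading of these rules as a parser. It assumes that the listed values are alternatives and that any escaped character other than `t` and `n` stands for itself. The function `parse_features` is hypothetical; values are kept as strings, and whether TF converts numeric values is not covered here.

```python
ESCAPES = {'t': '\t', 'n': '\n'}  # any other escaped character stands for itself


def parse_features(spec):
    """Parse 'key=v1|v2 key2=v3' into {key: set of allowed values}."""
    features = {}
    key, values, buf = None, [], []
    chars = iter(spec + ' ')  # the extra space flushes the last pair
    for ch in chars:
        if ch == '\\':
            nxt = next(chars)  # the escaped character
            buf.append(ESCAPES.get(nxt, nxt))
        elif ch == '=' and key is None:
            key, buf = ''.join(buf), []
        elif ch == '|' and key is not None:
            values.append(''.join(buf))
            buf = []
        elif ch.isspace():
            if key is not None:
                values.append(''.join(buf))
                features[key] = set(values)
            elif buf:  # e.g. a space before '=': 'lex = JC/'
                raise ValueError(f'expected key=values, got {"".join(buf)!r}')
            key, values, buf = None, [], []
        else:
            buf.append(ch)
    return features


print(parse_features(r'gloss=in\ the\ beginning book=Genesis'))
# {'gloss': {'in the beginning'}, 'book': {'Genesis'}}
```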

## Operators

### Node comparison
* `=`: is equal (the same node; a clause and a verse that occupy the same slots are still unequal)
* `#`: is unequal (a different node; a clause and a verse that occupy the same slots are unequal)
* `<` `>`: before and after (in the *canonical ordering*)

### Slot comparison
* `==`: occupy the same slots (identical slot sets)
* `&&`: overlap (the intersection of both slot sets is not empty)
* `##`: occupy different slots (the two slot sets differ as sets, although they may overlap)
* `||`: occupy disjoint slots (no slot occupied by the one is also occupied by the other)
* `[[` `]]`: embeds and contains (slot set inclusion, in both directions)
* `<<` `>>`: before and after with respect to the slots occupied (`<<`: the left one ends before the right one starts; `>>`: the other way round)
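The slot operators are plain set comparisons on the slots each node occupies. The sketch below assumes a hypothetical `slots` mapping from node to slot set, and that `[[` means "left embeds right", which is how the plans above read (`verse [[ sentence`). It also shows the contrast with the node comparisons: two nodes with identical slot sets satisfy `==` but not `=`.

```python
# Slot-set semantics of the slot comparison operators, on plain Python sets.
SLOT_OPS = {
    '==': lambda a, b: a == b,            # identical slot sets
    '&&': lambda a, b: bool(a & b),       # overlap
    '##': lambda a, b: a != b,            # different slot sets (may still overlap)
    '||': lambda a, b: not (a & b),       # disjoint
    '[[': lambda a, b: b <= a,            # left embeds right (assumed direction)
    ']]': lambda a, b: a <= b,            # left is contained in right
    '<<': lambda a, b: max(a) < min(b),   # left ends before right starts
    '>>': lambda a, b: min(a) > max(b),   # left starts after right ends
}


def slot_relation(op, slots, left, right):
    """Apply a slot operator to two nodes, given a node -> slot set mapping."""
    return SLOT_OPS[op](slots[left], slots[right])


# Hypothetical toy nodes; 100 (a verse) and 200 (a clause) occupy the same slots.
slots = {100: frozenset({1, 2, 3}),
         200: frozenset({1, 2, 3}),
         300: frozenset({2, 3}),
         400: frozenset({3, 4}),
         500: frozenset({5, 6})}

print(slot_relation('==', slots, 100, 200), 100 == 200)  # True False
print(slot_relation('##', slots, 300, 400), slot_relation('&&', slots, 300, 400))  # True True
```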

### Edge features
* `-`*name*`>` `<`*name*`-`: connected by the edge feature *name*, in both directions
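A sketch of the edge operators. It assumes that an edge feature is stored as a mapping from a node to the set of nodes it points to, that `-name>` means "left points to right" and that `<name-` means "right points to left". The feature name `link` and its data are invented for illustration.

```python
def edge_relation(op, edges, left, right):
    """Evaluate -name> (left points to right) or <name- (right points to left)."""
    name = op[1:-1]
    if op.startswith('-') and op.endswith('>'):
        return right in edges[name].get(left, set())
    if op.startswith('<') and op.endswith('-'):
        return left in edges[name].get(right, set())
    raise ValueError(f'not an edge operator: {op!r}')


# Hypothetical edge feature "link": node -> set of nodes it points to.
edges = {'link': {11: {10}, 12: {10}}}

print(edge_relation('-link>', edges, 11, 10))  # True
print(edge_relation('<link-', edges, 10, 11))  # True
print(edge_relation('-link>', edges, 10, 11))  # False
```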

In [77]:
info('Getting gapped phrases')
results = []
for c in F.otype.s('clause'):
    ps = L.d(c, 'phrase')                  # the phrases inside clause c
    for p in ps:
        words = L.d(p, 'word')
        (bp, ep) = (words[0], words[-1])   # first and last word of p
        for q in ps:
            if p == q:
                continue
            bq = L.d(q, 'word')[0]         # first word of q
            if bp < bq < ep:               # q starts between the first and last word of p
                results.append((p, q, bp, bq, ep, c))
info('{} results'.format(len(results)))
for r in results[0:10]:
    print(r)
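The same loop, detached from TF so that it runs on invented toy data. Hypothetical dictionaries stand in for `L.d(c, 'phrase')` and `L.d(p, 'word')`, and `itertools.permutations` replaces the explicit `p == q` check.

```python
from itertools import permutations


def gapped_phrases(clause_phrases, phrase_words):
    """Yield (p, q, bp, bq, ep, c) when phrase q starts between the first and last word of p."""
    for c, phrases in clause_phrases.items():
        for p, q in permutations(phrases, 2):  # all ordered pairs with p != q
            words = phrase_words[p]
            bp, ep = words[0], words[-1]
            bq = phrase_words[q][0]
            if bp < bq < ep:
                yield (p, q, bp, bq, ep, c)


# Hypothetical toy data: clause -> its phrases, phrase -> its words in slot order.
clause_phrases = {1: [10, 11], 2: [12, 13]}
phrase_words = {10: [1, 2, 5], 11: [3, 4], 12: [6, 7], 13: [8]}

print(list(gapped_phrases(clause_phrases, phrase_words)))  # [(10, 11, 1, 3, 5, 1)]
```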

    20s Getting gapped phrases
    25s 373 results
(605793, 605794, 1159, 1160, 1164, 426799)
(606150, 606151, 1720, 1721, 1723, 426921)
(607746, 607747, 4819, 4821, 4828, 427418)
(608322, 608323, 5803, 5805, 5809, 427601)
(608369, 608370, 5868, 5869, 5875, 427616)
(608705, 608706, 6515, 6521, 6530, 427723)
(609286, 609287, 7431, 7432, 7437, 427917)
(609997, 609998, 8502, 8507, 8520, 428159)
(609997, 609999, 8502, 8508, 8520, 428159)
(610379, 610380, 9127, 9129, 9133, 428286)
