<img align="right" src="tf-small.png"/>


# From SHEBANQ to Text-Fabric

Maybe you arrived here because you are interested in extending the possibilities of using the ETCBC database, after having reached the limits of what is possible in [SHEBANQ](http://shebanq.ancient-data.org).

Here is a link (back) to the
[description](https://etcbc.github.io/text-fabric-data/features/hebrew/etcbc4c/0_mql.html) of the transition from SHEBANQ to Text-Fabric.

And here is the corresponding query on SHEBANQ: [Yesh](http://shebanq.ancient-data.org/hebrew/query?id=556).

# Introduction to MQL

Coming from Text-Fabric and its notebooks, you might wonder what MQL is.
MQL stands for **Mini Query Language**, a query language optimized for textual resources.
[EMDROS](http://emdros.org) is a text database system written by Ulrik Sandborg-Petersen, based on the PhD thesis of Crist-Jan Doedens: [Text Databases. One Database Model and Several Retrieval Languages](http://books.google.nl/books?id=9ggOBRz1dO4C&dq=editions%3AISBN9051837291&source=gbs_book_other_versions).
The Text-Fabric resource
[ETCBC Hebrew Dataset](https://etcbc.github.io/text-fabric-data/features/hebrew/etcbc4c/0_home.html)
is the result of converting an EMDROS database into TF.

MQL is good for detecting syntactical patterns.
TF is good for programmatically walking through the text and gathering information as you go.

The query language of EMDROS, MQL, is a so-called *topographic* query language, meaning that the query instruction is at the same time a template for the query results.
More formally, there is a correspondence between the structure of the query instruction
and the structure of the query results, and this correspondence holds for both the sequential order and the embedding order.

Put differently, MQL is a convenient language for querying the data for tree fragments
(but not for arbitrary *network* patterns).
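
To make the topographic idea a bit more tangible, here is a minimal sketch in plain Python. It is not how Emdros evaluates MQL: the tuple layout, the toy clause and the `match` function are all assumptions made for illustration, and the matcher is greedy (it takes the first fitting child and does not backtrack). The point is only that the result has the same nesting and the same left-to-right order as the query template.

```python
# Toy data (hypothetical): each node is (type, features, children).
clause = ("clause", {"typ": "NmCl"}, [
    ("phrase", {"function": "Subj"}, [("word", {"lex": "A"}, [])]),
    ("phrase", {"function": "Pred"}, [("word", {"lex": "B"}, [])]),
])

# Query template with the same shape as the result we want back,
# roughly: [clause [phrase function=Pred [word]]]
query = ("clause", {}, [("phrase", {"function": "Pred"}, [("word", {}, [])])])


def match(template, node):
    """Return a result tree shaped like the template, or None if nothing fits."""
    (qtype, qfeats, qkids) = template
    (ntype, nfeats, nkids) = node
    if qtype != ntype or any(nfeats.get(k) != v for (k, v) in qfeats.items()):
        return None
    results = []
    start = 0
    for qkid in qkids:
        # children of the template must be found in the same sequential order
        for i in range(start, len(nkids)):
            r = match(qkid, nkids[i])
            if r is not None:
                results.append(r)
                start = i + 1
                break
        else:
            return None
    return (ntype, nfeats, results)


result = match(query, clause)
print(result)
```

The embedding order of the template (clause, then phrase, then word) is mirrored in the nesting of `result`, which is what "the query is a template for the results" amounts to in this toy setting.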

A specification of MQL can be found at the [Emdros docs page](http://emdros.org/docs.html).

In order to run this notebook, you need to have the ETCBC dataset from
[text-fabric-data](https://github.com/ETCBC/text-fabric-data).

# Mimicking MQL in Text-Fabric

This notebook shows how you can mimic MQL in Text-Fabric.

We translate a simple MQL query into TF processing, and show how to add context to the results.

In [1]:
import sys, collections
from tf.fabric import Fabric

In [2]:
ETCBC = 'hebrew/etcbc4c'
TF = Fabric( modules=ETCBC )

This is Text-Fabric 1.2.5
Api reference : https://github.com/ETCBC/text-fabric/wiki/Api
Tutorial      : https://github.com/ETCBC/text-fabric/blob/master/docs/tutorial.ipynb
Data sources  : https://github.com/ETCBC/text-fabric-data
Data docs     : https://etcbc.github.io/text-fabric-data/features/hebrew/etcbc4c/0_overview.html
Shebanq docs  : https://shebanq.ancient-data.org/text
Slack team    : https://shebanq.slack.com/signup
Questions? Ask shebanq@ancient-data.org for an invite to Slack
105 features found and 0 ignored


In [18]:
api = TF.load('''
    lex 
    typ code function rela det
    oslots
''')
api.makeAvailableIn(globals())

  0.00s loading features ...
   |     0.00s M otext                from /Users/dirk/github/text-fabric-data/hebrew/etcbc4c
  0.06s All features loaded/computed - for details use loadLog()


# The MQL query

Here is the basic query that selects two lexemes.

    [word lex="JC/" or lex=">JN/"]

Let us retrieve the word nodes.

In [6]:
lexemes = { 'JC/', '>JN/'}
nlex = collections.Counter()

info('Gathering occurrences of the lexeme set ...')
words = []
for w in F.otype.s('word'):
    lex = F.lex.v(w)
    if lex in lexemes:
        words.append(w)
        nlex[lex] += 1
info('Found {} occurrences ({})'.format(
    len(words), 
    ', '.join('{} of {}'.format(n, l) for (l,n) in sorted(nlex.items())),
))

19m 26s Gathering occurrences of the lexeme set ...
19m 27s Found 926 occurrences (788 of >JN/, 138 of JC/)
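
The same filter-and-count pattern can be tried without loading the dataset. The sketch below uses a hypothetical list of `(node, lexeme)` pairs as a stand-in for what `F.otype.s('word')` and `F.lex.v(w)` deliver; the node numbers and the extra lexemes are made up.

```python
import collections

# Hypothetical stand-in for the word nodes and their lex values.
toy_words = [(1, 'B'), (2, 'JC/'), (3, '>JN/'), (4, 'JC/'), (5, 'H')]
lexemes = {'JC/', '>JN/'}

# Keep the nodes whose lexeme is in the set, and count per lexeme.
words = [w for (w, lex) in toy_words if lex in lexemes]
nlex = collections.Counter(lex for (w, lex) in toy_words if lex in lexemes)

summary = ', '.join('{} of {}'.format(n, l) for (l, n) in sorted(nlex.items()))
print('Found {} occurrences ({})'.format(len(words), summary))
```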


# Get context

We want to retrieve the following objects from the context of each word,
together with the feature(s) listed in brackets after each object:

    book (name)
      chapter (number)
        verse (number)
          clause (typ)
            clause_atom (code)
              phrase (function)
                phrase_atom (det)
                  word (lex, text)

In [13]:
results = []

for w in words:
    (book, chapter, verse) = T.sectionFromNode(w)
    text = T.text([w])
    print(book, chapter, verse, text, w, L.u(w, 'phrase'))
    results.append((
        book,
        chapter,
        verse,
        F.typ.v(L.u(w, otype='clause')[0]),
        F.code.v(L.u(w, otype='clause_atom')[0]),
        F.function.v(L.u(w, otype='phrase')[0]),
        F.det.v(L.u(w, otype='phrase_atom')[0]),
        F.lex.v(w),
        text,
    ))

#for result in results[0:10]:
#    print(', '.join(str(x) for x in result))

Genesis 2 5 אַ֔יִן  772 ()


IndexError: tuple index out of range
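
The error comes from indexing an empty tuple: for word 772 the lookup of the embedding phrase returned `()`. While investigating why, one way to keep the loop running is to guard the `[0]`. The helper below is a sketch; `first_or_none` and the `toy_up` dictionary are hypothetical names, and the dictionary only imitates the two lookups for word 772 seen in this notebook.

```python
def first_or_none(nodes):
    """Return the first node of a tuple, or None if the tuple is empty."""
    return nodes[0] if nodes else None


# Hypothetical stand-in for L.u(772, otype=...): no phrase, one clause.
toy_up = {(772, 'phrase'): (), (772, 'clause'): (426728,)}

phrase = first_or_none(toy_up[(772, 'phrase')])
clause = first_or_none(toy_up[(772, 'clause')])
print(phrase, clause)
```

With such a guard, a feature lookup would still need to handle `None`, for example by recording an empty value for that column.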

In [15]:
x = L.u(772)
for y in x:
    print(y, F.otype.v(y))

858771 phrase_atom
1437014 lex
514736 clause_atom
426728 clause
1189505 sentence_atom
1125934 sentence
1368569 half_verse
1413717 verse
1367574 chapter
1367534 book


In [16]:
x = L.u(858771)
for y in x:
    print(y, F.otype.v(y))

1437014 lex
514736 clause_atom
426728 clause
1189505 sentence_atom
1125934 sentence
1368569 half_verse
1413717 verse
1367574 chapter
1367534 book


In [21]:
E.oslots.s(858771)

(772,)

In [22]:
E.oslots.s(514736)

(770, 771, 772)

In [24]:
L.d(514736, otype='phrase')

(605576, 605577, 605578)

In [25]:
for p in L.d(514736, otype='phrase'):
    print(p, E.oslots.s(p))

605576 (770,)
605577 (771,)
605578 (772,)


In [28]:
for w in range(1, 12):
    print(L.u(w, otype='phrase'))

(605144,)
(605144,)
()
()
(605147,)
(605147,)
(605147,)
(605147,)
(605147,)
(605147,)
(605147,)


In [34]:
x = (L.d(L.u(1, 'clause_atom')[0], otype='phrase'))
for y in x:
    print(y, E.oslots.s(y))

605144 (1, 2)
605145 (3,)
605146 (4,)
605147 (5, 6, 7, 8, 9, 10, 11)
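
For comparison, here is what plain slot containment would predict for the first eleven words, using only the four phrase slot tuples printed above. This is a sketch, not the way `L.u` is implemented; the function name `embedders` is an assumption.

```python
# Slot tuples of the four phrases in the first clause atom, copied from the output above.
phrase_slots = {
    605144: (1, 2),
    605145: (3,),
    605146: (4,),
    605147: (5, 6, 7, 8, 9, 10, 11),
}


def embedders(word, slots_by_node):
    """Return the nodes whose slots contain the given word slot."""
    return tuple(n for (n, slots) in sorted(slots_by_node.items()) if word in slots)


for w in range(1, 12):
    print(w, embedders(w, phrase_slots))
```

By this reckoning words 3 and 4 sit inside phrases 605145 and 605146, whereas `L.u` returned `()` for them. Both are phrases that consist of a single word, as was the phrase of word 772.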


In [32]:
F.otype.maxSlot

426581

In [40]:
api.C.levUp.data[1]

(1436896,
 605144,
 858318,
 1368502,
 514582,
 426582,
 1413682,
 1189403,
 1125833,
 1367573,
 1367534)

In [46]:
api.C.rank.data[2]

16

In [47]:
i = 0
for n in N():
    i += 1
    print(n, F.otype.v(n))
    if i > 30: break

1367534 book
1367573 chapter
1125833 sentence
1189403 sentence_atom
1413682 verse
426582 clause
514582 clause_atom
1368502 half_verse
858318 phrase_atom
605144 phrase
1436895 lex
1 word
1436896 lex
2 word
1436897 lex
858319 phrase_atom
3 word
605145 phrase
1436898 lex
858320 phrase_atom
4 word
605146 phrase
858321 phrase_atom
1368503 half_verse
605147 phrase
1253742 subphrase
1436899 lex
5 word
1436900 lex
6 word
1436901 lex


In [49]:
api.C.levels.data

(('book', 10937.97435897436, 1367534, 1367572),
 ('chapter', 459.1829924650161, 1367573, 1368501),
 ('lex', 46.18676916414032, 1436895, 1446130),
 ('verse', 18.376814715891957, 1413682, 1436894),
 ('half_verse', 9.441810535635236, 1368502, 1413681),
 ('sentence', 6.710413717162184, 1125833, 1189402),
 ('sentence_atom', 6.6302087380904275, 1189403, 1253741),
 ('clause', 4.847511363636364, 426582, 514581),
 ('clause_atom', 4.71037521256156, 514582, 605143),
 ('phrase', 1.6849321020325942, 605144, 858317),
 ('phrase_atom', 1.5946059099489749, 858318, 1125832),
 ('subphrase', 1.4241071428571428, 1253742, 1367533),
 ('word', 1, 1, 426581))
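
Each entry above reads as (node type, average number of slots per node, first node, last node), sorted by decreasing average; for example, 426581 words over 39 books gives the 10937.97 in the first row. The sketch below rebuilds such a table for a tiny made-up corpus. The `toy_nodes` data and the function `compute_levels` are assumptions for illustration, not Text-Fabric code.

```python
import collections

# Hypothetical toy corpus: node -> (type, slots). Slots are the words 1..6.
toy_nodes = {
    7: ('phrase', (1, 2)),
    8: ('phrase', (3,)),
    9: ('phrase', (4, 5, 6)),
    10: ('clause', (1, 2, 3)),
    11: ('clause', (4, 5, 6)),
    12: ('verse', (1, 2, 3, 4, 5, 6)),
}


def compute_levels(nodes):
    """Return (type, average slots per node, min node, max node), largest first."""
    by_type = collections.defaultdict(list)
    for (n, (otype, slots)) in nodes.items():
        by_type[otype].append((n, len(slots)))
    levels = []
    for (otype, items) in by_type.items():
        ns = [n for (n, size) in items]
        sizes = [size for (n, size) in items]
        levels.append((otype, sum(sizes) / len(sizes), min(ns), max(ns)))
    return tuple(sorted(levels, key=lambda x: -x[1]))


toy_levels = compute_levels(toy_nodes)
for level in toy_levels:
    print(level)
```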

In [51]:
fs = F.otype.s('sentence')[0]
fv = F.otype.s('verse')[0]

In [52]:
print(fs, fv)

1125833 1413682


In [53]:
print(sortNodes([fs, fv]))

[1125833, 1413682]


In [61]:
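maxSlot = F.otype.maxSlot
slotType = 'word'  # the slot type of this dataset; otypeRank needs it for slot nodes
oslots = E.oslots.data
levels = api.C.levels.data
otype = F.otype.data
# rank of a node type = its position in C.levels (0 = the largest objects)
otypeLevels = dict(((x[0], i) for (i, x) in enumerate(levels)))
otypeRank = lambda n: otypeLevels[slotType if n < maxSlot+1 else otype[n-maxSlot-1]]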

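def before(na,nb):
    """Compare two nodes: -1 if na comes first, 1 if nb comes first, 0 if they tie.

    Nodes with identical slot sets are ordered by the rank of their type.
    Otherwise an embedder precedes what it embeds, and failing that,
    the node whose non-shared slots start earlier comes first.
    """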
    if na < maxSlot + 1:
        a = na
        sa = {a}
    else:
        a = na - maxSlot
        sa = set(oslots[a-1])
    if nb < maxSlot + 1:
        b = nb
        sb = {b}
    else:
        b = nb - maxSlot
        sb = set(oslots[b-1])
    oa = otypeRank(na)
    ob = otypeRank(nb)
    print(na, a, sa, oa)
    print(nb, b, sb, ob)
    if sa == sb: return 0 if oa == ob else -1 if oa < ob else 1
    if sa > sb: return -1
    if sa < sb: return 1
    am = min(sa - sb)
    bm = min(sb - sa)
    return -1 if am < bm else 1 if bm < am else None


In [62]:
before(fs, fv)

1125833 699252 {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11} 5
1413682 987101 {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11} 3


1
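
A comparison function of this kind can be turned into a sort key with `functools.cmp_to_key`. The sketch below applies the same three rules as `before` to a handful of made-up nodes; `toy_slots`, `toy_rank` and `compare` are hypothetical names, and the ranks follow the convention of `otypeLevels` (a lower number means a larger object type).

```python
import functools

# Hypothetical toy nodes with their slot sets and type ranks.
toy_slots = {
    7: {1, 2}, 8: {3}, 9: {4, 5, 6},          # phrases
    10: {1, 2, 3}, 11: {4, 5, 6},             # clauses
    12: {1, 2, 3, 4, 5, 6},                   # verse
}
toy_rank = {7: 2, 8: 2, 9: 2, 10: 1, 11: 1, 12: 0}


def compare(na, nb):
    """Same rules as before(): equal slots -> rank, embedder first, else earliest slot."""
    (sa, sb) = (toy_slots[na], toy_slots[nb])
    if sa == sb:
        (ra, rb) = (toy_rank[na], toy_rank[nb])
        return (ra > rb) - (ra < rb)
    if sa > sb:
        return -1
    if sa < sb:
        return 1
    return -1 if min(sa - sb) < min(sb - sa) else 1


ordered = sorted(toy_slots, key=functools.cmp_to_key(compare))
print(ordered)
```

Clause 11 and phrase 9 occupy the same slots, so only the rank decides between them, which is the same situation as the sentence and the verse compared above.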