<img align="right" src="tf-small.png"/>


# From SHEBANQ to Text-Fabric

You may have arrived here because you have reached the limits of what is possible in [SHEBANQ](http://shebanq.ancient-data.org) and want to do more with the ETCBC database.

Here is a link (back) to the
[description](https://etcbc.github.io/text-fabric-data/features/hebrew/etcbc4c/0_mql.html) of the transition from SHEBANQ to Text-Fabric.

And here is the corresponding query on SHEBANQ: [Yesh](http://shebanq.ancient-data.org/hebrew/query?id=556).

# Introduction to MQL

Coming from Text-Fabric and its notebooks, you might wonder what MQL is.
MQL stands for **Mini Query Language**, which is a query language optimized for textual resources.
[EMDROS](http://emdros.org) is a text database system written by Ulrik Sandborg-Petersen, based on the PhD thesis of Crist-Jan Doedens: [Text Databases. One Database Model and Several Retrieval Languages](http://books.google.nl/books?id=9ggOBRz1dO4C&dq=editions%3AISBN9051837291&source=gbs_book_other_versions).
The Text-Fabric resource
[ETCBC Hebrew Dataset](https://etcbc.github.io/text-fabric-data/features/hebrew/etcbc4c/0_home.html)
is the result of converting an EMDROS database into TF.

MQL is good for detecting syntactical patterns.
Text-Fabric is good for programmatically walking through the text 
and gathering information as you go.

The query language of this system, MQL, is a so-called *topographic* query language, meaning that the query instruction is at the same time a template for the query results.
More formally, there is a correspondence between the structure of the query instruction
and the structure of the query results, and this correspondence holds for the sequential order and the embedding order.

Put otherwise, MQL is a convenient language to query the data for tree fragments
(but not for arbitrary *network* patterns).
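
The correspondence between query and result can be illustrated without Emdros. The following is a toy sketch in Python, not MQL and not the way Emdros works internally: objects and templates are both nested tuples, the function `match` and the sample data are made up for illustration, and the result comes back with the same nesting and order as the template.

```python
# Toy illustration of a topographic query; not MQL and not Emdros.
# An object is (otype, features, children).
# A template is (otype, required_features, child_templates).

def match(template, obj):
    """Return a result tree shaped like the template, or None if no match."""
    t_type, t_feats, t_children = template
    o_type, o_feats, o_children = obj
    if t_type != o_type:
        return None
    if any(o_feats.get(k) != v for k, v in t_feats.items()):
        return None
    # Child templates must match children of obj in the same sequential
    # order; in this sketch, gaps between the matches are allowed.
    matched = []
    pos = 0
    for t_child in t_children:
        for i in range(pos, len(o_children)):
            sub = match(t_child, o_children[i])
            if sub is not None:
                matched.append(sub)
                pos = i + 1
                break
        else:
            return None
    return (o_type, o_feats, matched)


# Made-up sample data: one clause containing two phrases.
clause = ('clause', {'typ': 'NmCl'}, [
    ('phrase', {'function': 'Exst'}, [('word', {'lex': 'JC/'}, [])]),
    ('phrase', {'function': 'OTHER'}, [('word', {'lex': 'OTHER'}, [])]),
])

# Template: a clause that embeds a phrase that embeds the word JC/.
template = ('clause', {}, [
    ('phrase', {'function': 'Exst'}, [('word', {'lex': 'JC/'}, [])]),
])

result = match(template, clause)
print(result)
```

The printed result is again a clause containing a phrase containing a word, which is the sense in which the query acts as a template for its results.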

A specification of MQL can be found at the [Emdros docs page](http://emdros.org/docs.html).

In order to run this notebook, you need the ETCBC dataset from
[text-fabric-data](https://github.com/ETCBC/text-fabric-data).

# Mimicking MQL in Text-Fabric

This notebook shows how you can mimic MQL in Text-Fabric.

We translate a simple MQL query into TF processing, and show how to add context to the results.

In [1]:
import sys, collections
from tf.fabric import Fabric

In [2]:
ETCBC = 'hebrew/etcbc4c'
TF = Fabric( modules=ETCBC )

This is Text-Fabric 1.2.6
Api reference : https://github.com/ETCBC/text-fabric/wiki/Api
Tutorial      : https://github.com/ETCBC/text-fabric/blob/master/docs/tutorial.ipynb
Data sources  : https://github.com/ETCBC/text-fabric-data
Data docs     : https://etcbc.github.io/text-fabric-data/features/hebrew/etcbc4c/0_overview.html
Shebanq docs  : https://shebanq.ancient-data.org/text
Slack team    : https://shebanq.slack.com/signup
Questions? Ask shebanq@ancient-data.org for an invite to Slack
105 features found and 0 ignored


In [3]:
api = TF.load('''
    lex 
    typ code function rela det
    oslots
''')
api.makeAvailableIn(globals())

  0.00s loading features ...
   |     0.62s B oslots               from /Users/dirk/github/text-fabric-data/hebrew/etcbc4c
   |     0.00s M otext                from /Users/dirk/github/text-fabric-data/hebrew/etcbc4c
   |     0.17s B lex                  from /Users/dirk/github/text-fabric-data/hebrew/etcbc4c
   |     0.27s B typ                  from /Users/dirk/github/text-fabric-data/hebrew/etcbc4c
   |     0.04s B code                 from /Users/dirk/github/text-fabric-data/hebrew/etcbc4c
   |     0.10s B function             from /Users/dirk/github/text-fabric-data/hebrew/etcbc4c
   |     0.28s B rela                 from /Users/dirk/github/text-fabric-data/hebrew/etcbc4c
   |     0.17s B det                  from /Users/dirk/github/text-fabric-data/hebrew/etcbc4c
  5.94s All features loaded/computed - for details use loadLog()


# The MQL query

Here is the basic query, which selects all words whose lexeme is one of two given lexemes.

    [word lex="JC/" or lex=">JN/"]

Let us retrieve the word nodes.

In [4]:
lexemes = {'JC/', '>JN/'}
nlex = collections.Counter()

info('Gathering occurrences of the lexeme set ...')
words = []
for w in F.otype.s('word'):
    lex = F.lex.v(w)
    if lex in lexemes:
        words.append(w)
        nlex[lex] += 1
info('Found {} occurrences ({})'.format(
    len(words), 
    ', '.join('{} of {}'.format(n, l) for (l,n) in sorted(nlex.items())),
))

  8.91s Gathering occurrences of the lexeme set ...
  9.18s Found 926 occurrences (788 of >JN/, 138 of JC/)
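
The set of lexemes used in the cell above plays the role of the `or` conditions in the MQL query. As a minimal sketch of that correspondence, the query text can be rebuilt from a list of lexemes. The helper `mql_word_query` is hypothetical and not part of Text-Fabric or Emdros; it assumes that the lexemes contain no double quotes or backslashes, so it does no escaping.

```python
def mql_word_query(lexemes):
    """Build an MQL word block that matches any of the given lexemes.

    Hypothetical helper: assumes the lexemes contain no double quotes
    or backslashes, so no escaping is attempted.
    """
    lexemes = list(lexemes)
    if not lexemes:
        raise ValueError('at least one lexeme is needed')
    conditions = ' or '.join('lex="{}"'.format(lex) for lex in lexemes)
    return '[word {}]'.format(conditions)


query = mql_word_query(['JC/', '>JN/'])
print(query)
```

A list is used rather than a set so that the order of the conditions in the query text is predictable.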


# Get context

We want to retrieve the objects that surround each word, together with the feature(s)
listed in brackets after each object type:

    book (name)
      chapter (number)
        verse (number)
          clause (typ)
            clause_atom (code)
              phrase (function)
                phrase_atom (det)
                  word (lex, text)
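
Each of these objects is found by looking upward from the word and taking the first object of the requested type. In the data used here every word is expected to sit inside one object of each type, but an upward lookup that returns an empty sequence would make an index of `[0]` fail. A small sketch of a guard follows; it uses a stand-in function instead of the Text-Fabric API, and `first_or_default`, `fake_up` and the node numbers are hypothetical.

```python
def first_or_default(nodes, default=None):
    """Return the first item of a sequence of nodes, or default if it is empty."""
    return nodes[0] if nodes else default


# Stand-in for an upward lookup: maps (node, type) to a tuple of embedders.
# The node numbers are made up for illustration.
fake_embedders = {(1, 'clause'): (100,), (1, 'phrase'): (200,)}


def fake_up(node, otype):
    """Return the embedders of the given type, or an empty tuple."""
    return fake_embedders.get((node, otype), ())


found = first_or_default(fake_up(1, 'clause'))
missing = first_or_default(fake_up(1, 'phrase_atom'))
print(found, missing)
```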

In [10]:
results = []

for w in words:
    (book, chapter, verse) = T.sectionFromNode(w)
    text = T.text([w])
    results.append((
        book,
        chapter,
        verse,
        F.typ.v(L.u(w, otype='clause')[0]),
        F.code.v(L.u(w, otype='clause_atom')[0]),
        F.function.v(L.u(w, otype='phrase')[0]),
        F.det.v(L.u(w, otype='phrase_atom')[0]),
        F.lex.v(w),
        text,
    ))

for result in results[0:10]:
    print(', '.join(str(r) for r in result))

Genesis, 2, 5, NmCl, 402, Nega, NA, >JN/, אַ֔יִן 
Genesis, 5, 24, NmCl, 407, NCoS, NA, >JN/, אֵינֶ֕נּוּ 
Genesis, 7, 8, AjCl, 10, NCoS, NA, >JN/, אֵינֶ֖נָּה 
Genesis, 11, 30, NmCl, 107, NCop, NA, >JN/, אֵ֥ין 
Genesis, 18, 24, NmCl, 101, Exst, und, JC/, יֵ֛שׁ 
Genesis, 19, 31, NmCl, 402, NCop, NA, >JN/, אֵ֤ין 
Genesis, 20, 7, Ptcp, 663, NCoS, NA, >JN/, אֵֽינְךָ֣ 
Genesis, 20, 11, NmCl, 999, NCop, NA, >JN/, אֵין־
Genesis, 23, 8, NmCl, 999, Exst, und, JC/, יֵ֣שׁ 
Genesis, 24, 23, NmCl, 103, Exst, und, JC/, יֵ֧שׁ 
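
Once the context is available as tuples, it can be summarized with ordinary Python. The sketch below tallies the phrase function per lexeme; to keep it runnable without the dataset, it copies the ten rows printed above (without the text column) into a literal list named `rows`, which is an assumption of this example. In the notebook itself the same tally could be computed over `results`.

```python
import collections

# The first ten result rows shown above, without the text column:
# (book, chapter, verse, typ, code, function, det, lex)
rows = [
    ('Genesis', 2, 5, 'NmCl', 402, 'Nega', 'NA', '>JN/'),
    ('Genesis', 5, 24, 'NmCl', 407, 'NCoS', 'NA', '>JN/'),
    ('Genesis', 7, 8, 'AjCl', 10, 'NCoS', 'NA', '>JN/'),
    ('Genesis', 11, 30, 'NmCl', 107, 'NCop', 'NA', '>JN/'),
    ('Genesis', 18, 24, 'NmCl', 101, 'Exst', 'und', 'JC/'),
    ('Genesis', 19, 31, 'NmCl', 402, 'NCop', 'NA', '>JN/'),
    ('Genesis', 20, 7, 'Ptcp', 663, 'NCoS', 'NA', '>JN/'),
    ('Genesis', 20, 11, 'NmCl', 999, 'NCop', 'NA', '>JN/'),
    ('Genesis', 23, 8, 'NmCl', 999, 'Exst', 'und', 'JC/'),
    ('Genesis', 24, 23, 'NmCl', 103, 'Exst', 'und', 'JC/'),
]

# Count how often each (lexeme, phrase function) pair occurs.
by_lex_function = collections.Counter((row[7], row[5]) for row in rows)

for (lex, function), n in sorted(by_lex_function.items()):
    print('{:<5} {:<5} {}'.format(lex, function, n))
```

These counts describe only the ten sample rows, not the full set of 926 occurrences.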
