# TFQL - a query language inside Text-Fabric?

Let's see whether we can mimic part of MQL inside Text-Fabric.
Maybe it is worthwhile.
If we can get the basic core of MQL working inside TF, that would be nice.

Chances are that this will be sufficient for most purposes, because inside TF we have so many other ways to walk through the data.

The idea is: with the help of TFQL you quickly create an interesting set of nodes, corresponding to a phenomenon of interest. With that set in hand, you can use the rest of TF to do whatever you want: refine, enlarge, grab context, export to CSV, import into R, you name it.

Ok, let's go.
Step by step.

## Load TF

In [81]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [82]:
import sys, collections
from IPython.display import HTML, display_pretty, display_html
from tf.fabric import Fabric

In [83]:
ETCBC = 'hebrew/etcbc4c'
PHONO = 'hebrew/phono'
TF = Fabric(modules=[ETCBC, PHONO])

This is Text-Fabric 1.2.1
Api reference    : https://github.com/ETCBC/text-fabric/wiki/Api
Tutorial         : https://github.com/ETCBC/text-fabric/blob/master/docs/tutorial.ipynb
Data sources     : https://github.com/ETCBC/text-fabric-data
Data feature docs: https://shebanq.ancient-data.org/static/docs/featuredoc/texts/welcome.html
Questions? Ask shebanq@ancient-data.org for an invite to Slack
109 features found and 0 ignored


In [84]:
api = TF.load('''
    sp ps gn nu
''')
api.makeAvailableIn(globals())

  0.00s loading features ...
   |     0.19s B sp                   from /Users/dirk/github/text-fabric-data/hebrew/etcbc4c
   |     0.17s B ps                   from /Users/dirk/github/text-fabric-data/hebrew/etcbc4c
   |     0.14s B gn                   from /Users/dirk/github/text-fabric-data/hebrew/etcbc4c
   |     0.17s B nu                   from /Users/dirk/github/text-fabric-data/hebrew/etcbc4c
  5.93s All features loaded/computed - for details use loadLog()


## Review features

Let us just review the features we are going to use.

In [85]:
for ft in 'sp ps gn nu'.strip().split():
    print(ft)
    for valFreq in Fs(ft).freqList():
        print('\t{:<8}: {:>6}x'.format(*valFreq))

sp
	subs    : 125558x
	verb    :  75450x
	prep    :  73298x
	conj    :  62737x
	nmpr    :  35698x
	art     :  30385x
	adjv    :  10075x
	nega    :   6059x
	prps    :   5035x
	advb    :   4603x
	prde    :   2678x
	intj    :   1912x
	inrg    :   1303x
	prin    :   1026x
ps
	NA      : 347860x
	p3      :  40898x
	unknown :  17204x
	p2      :  12281x
	p1      :   8338x
gn
	NA      : 180152x
	m       : 164191x
	unknown :  45524x
	f       :  36714x
nu
	NA      : 180152x
	sg      : 180092x
	pl      :  54956x
	unknown :   8523x
	du      :   2858x
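The listing above relies on `freqList()`, which, judging from its output, yields `(value, frequency)` pairs with the most frequent value first. The same kind of overview can be sketched with `collections.Counter` on a hypothetical toy feature (a plain dict from node to value, not TF's internal representation). The name `freqList` is reused here for illustration only, and the tie-breaking rule (alphabetical by value) is our own choice, not necessarily what TF does:

```python
from collections import Counter


def freqList(featureValues):
    """Return (value, count) pairs, most frequent first.

    featureValues: dict mapping node -> feature value (hypothetical toy data,
    standing in for what a TF feature holds).
    """
    counts = Counter(featureValues.values())
    # sort by descending count; break ties by value so the order is deterministic
    return sorted(counts.items(), key=lambda vc: (-vc[1], vc[0]))


# hypothetical toy feature: part of speech for six word nodes
sp = {1: 'verb', 2: 'subs', 3: 'subs', 4: 'prep', 5: 'subs', 6: 'verb'}
for valFreq in freqList(sp):
    print('\t{:<8}: {:>6}x'.format(*valFreq))
```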


# Atoms

An atomic query simply asks for the nodes of a certain type whose features satisfy certain conditions.
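In other words, an atom is a filter: keep a node if its type matches and every listed feature has exactly the required value. Here is a minimal sketch of that filter on a hypothetical toy corpus; the `NODES` dict and the `atom()` helper are made up for illustration, whereas the TF version below works on `F` and `Fs` instead:

```python
# Hypothetical toy corpus: node -> (otype, features).
# Not TF's data model, just enough to show what an atom query does.
NODES = {
    1: ('word', dict(sp='verb', ps='p2', gn='f', nu='pl')),
    2: ('word', dict(sp='subs', gn='m', nu='sg')),
    3: ('word', dict(sp='verb', ps='p2', gn='m', nu='pl')),
    4: ('clause', {}),
    5: ('word', dict(sp='verb', ps='p2', gn='f', nu='pl')),
}


def atom(otype, conditions):
    """Nodes of type otype whose features equal all the given values."""
    return [
        n for (n, (t, feats)) in sorted(NODES.items())
        # a missing feature yields None, which never equals a required value
        if t == otype and all(feats.get(f) == v for (f, v) in conditions.items())
    ]


print(atom('word', dict(sp='verb', ps='p2', gn='f', nu='pl')))  # [1, 5]
```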

In [86]:
atomQuery = ('word', dict(sp='verb', ps='p2', gn='f', nu='pl'))

And now we write a function that interprets and runs an atom query.

In [89]:
def tfqlAtom(aQ):
    (otype, features) = aQ
    featureList = sorted(features.items())
    results = []
    info('querying ...')
    for n in F.otype.s(otype):
        ok = True
        for (f, v) in featureList:
            if Fs(f).v(n) != v:
                ok = False
                break
        if ok: results.append(n)
    info('{} results'.format(len(results)))
    return results

Let's run it.

In [90]:
resultNodes = tfqlAtom(atomQuery)

 1m 11s querying ...
 1m 12s 49 results


Let's view the first 10 results (with a bit of context).

In [91]:
plain = []
html = []

for n in resultNodes[0:10]:
    vn = L.u(n, otype='verse')[0]
    plain.append('{} {} in:\n\t{} {}\n'.format(
        n, T.text([n]),
        '{} {}:{}'.format(*T.sectionFromNode(n)),
        T.text(L.d(vn, otype='word')),
    ))
    html.append('''
<div class="r">
<p>{} <span class="h">{}</span>in:</p>
<p><span class="l">{}</span> <span class="h">{}</span></p>
</div>
'''.format(
        n, T.text([n]),
        '{} {}:{}'.format(*T.sectionFromNode(n)),
        T.text(L.d(vn, otype='word')),
    ))
print(''.join(plain))

2052 שְׁמַ֣עַן  in:
	Genesis 4:23 וַיֹּ֨אמֶר לֶ֜מֶךְ לְנָשָׁ֗יו עָדָ֤ה וְצִלָּה֙ שְׁמַ֣עַן קֹולִ֔י נְשֵׁ֣י לֶ֔מֶךְ הַאְזֵ֖נָּה אִמְרָתִ֑י כִּ֣י אִ֤ישׁ הָרַ֨גְתִּי֙ לְפִצְעִ֔י וְיֶ֖לֶד לְחַבֻּרָתִֽי׃ 
2056 הַאְזֵ֖נָּה  in:
	Genesis 4:23 וַיֹּ֨אמֶר לֶ֜מֶךְ לְנָשָׁ֗יו עָדָ֤ה וְצִלָּה֙ שְׁמַ֣עַן קֹולִ֔י נְשֵׁ֣י לֶ֔מֶךְ הַאְזֵ֖נָּה אִמְרָתִ֑י כִּ֣י אִ֤ישׁ הָרַ֨גְתִּי֙ לְפִצְעִ֔י וְיֶ֖לֶד לְחַבֻּרָתִֽי׃ 
16470 יְדַעְתֶּ֑ן  in:
	Genesis 31:6 וְאַתֵּ֖נָה יְדַעְתֶּ֑ן כִּ֚י בְּכָל־כֹּחִ֔י עָבַ֖דְתִּי אֶת־אֲבִיכֶֽן׃ 
28981 רְאִיתֶ֖ן  in:
	Exodus 1:16 וַיֹּ֗אמֶר בְּיַלֶּדְכֶן֙ אֶת־הָֽעִבְרִיֹּ֔ות וּרְאִיתֶ֖ן עַל־הָאָבְנָ֑יִם אִם־בֵּ֥ן הוּא֙ וַהֲמִתֶּ֣ן אֹתֹ֔ו וְאִם־בַּ֥ת הִ֖יא וָחָֽיָה׃ 
28989 הֲמִתֶּ֣ן  in:
	Exodus 1:16 וַיֹּ֗אמֶר בְּיַלֶּדְכֶן֙ אֶת־הָֽעִבְרִיֹּ֔ות וּרְאִיתֶ֖ן עַל־הָאָבְנָ֑יִם אִם־בֵּ֥ן הוּא֙ וַהֲמִתֶּ֣ן אֹתֹ֔ו וְאִם־בַּ֥ת הִ֖יא וָחָֽיָה׃ 
29029 עֲשִׂיתֶ֖ן  in:
	Exodus 1:18 וַיִּקְרָ֤א מֶֽלֶךְ־מִצְרַ֨יִם֙ לַֽמְיַלְּדֹ֔ת וַיֹּ֣אמֶר לָהֶ֔ן מַדּ֥וּעַ עֲשִׂיתֶ֖ן הַדָּבָ֣ר הַזֶּ֑ה וַת
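The context lookup goes up from a word to the verse that contains it (`L.u(n, otype='verse')`) and back down to all the words of that verse (`L.d(vn, otype='word')`). Assuming a simplified model in which every object is just a contiguous range of slots (real TF objects need not be contiguous), both directions reduce to interval containment. The names `WORDS`, `OBJECTS`, `up()` and `down()` below are hypothetical:

```python
# Hypothetical toy model: slots are words 1..10,
# every other object is stored as (otype, first slot, last slot).
WORDS = {n: 'w{}'.format(n) for n in range(1, 11)}
OBJECTS = {
    101: ('verse', 1, 6),
    102: ('verse', 7, 10),
    201: ('clause', 1, 3),
    202: ('clause', 4, 6),
}


def up(slot, otype):
    """Objects of type otype that contain the given slot (cf. L.u)."""
    return [o for (o, (t, first, last)) in sorted(OBJECTS.items())
            if t == otype and first <= slot <= last]


def down(obj):
    """Slots (words) contained in obj (cf. L.d(..., otype='word'))."""
    (_, first, last) = OBJECTS[obj]
    return list(range(first, last + 1))


verse = up(5, 'verse')[0]
print(verse, ' '.join(WORDS[w] for w in down(verse)))  # 101 w1 w2 w3 w4 w5 w6
```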

Now a bit prettier.

In [92]:
HTML('''
<style type="text/css">
.r {
    border: 2px solid #cccccc;
}
.h {
    font-family: Ezra SIL, SBL Hebrew, Verdana, sans-serif;
    font-size: x-large;
    line-height: 2;
    text-align: right;
    direction: rtl;
}
.e {
    font-family: Menlo, Courier New, Courier, monospace;
    font-size: medium;
    line-height: 1.2;
    text-align: left;
    direction: ltr;
}
.p {
    font-family: Verdana, Arial, sans-serif;
    font-size: large;
    line-height: 1.5;
    text-align: left;
    direction: ltr;
}
.l {
    font-family: Verdana, Arial, sans-serif;
    font-size: small;
    text-align: right;
    float: right;
    color: #aaaaaa;
    direction: ltr;
}
</style>
''')

In [93]:
HTML(''.join(html))

# Moving on

The code below does not work yet.
The query execution is recursive in two dimensions:

* queries can be embedded in a parent query
* queries can be part of a sequence of queries
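Independently of the unfinished code further down, the two kinds of recursion can be made concrete with a self-contained sketch over a hypothetical toy corpus (not TF data). It uses the same query shapes as that code: an atom is `(otype, features)`, an embedded query is `(otype, features, subquery, ...)`, and a sequence is a list of queries whose members are atoms or embedded queries. The semantics are assumptions for this sketch only: embedded members must lie inside the enclosing object, and the members of a sequence must follow each other without gaps. Every result is spelled out as a flat tuple of nodes; `CORPUS`, `atoms()`, `evalQuery()` and `evalSeq()` are illustrative names:

```python
# Hypothetical toy corpus: node -> (otype, features, first slot, last slot).
# Words are slots 1..6; a word occupies exactly its own slot.
CORPUS = {
    1: ('word', dict(sp='conj'), 1, 1),
    2: ('word', dict(sp='verb'), 2, 2),
    3: ('word', dict(sp='subs'), 3, 3),
    4: ('word', dict(sp='verb'), 4, 4),
    5: ('word', dict(sp='subs'), 5, 5),
    6: ('word', dict(sp='subs'), 6, 6),
    11: ('clause', {}, 1, 3),
    12: ('clause', {}, 4, 6),
}


def atoms(otype, feats, lo, hi):
    """Nodes of otype with matching features lying inside slots lo..hi."""
    return [n for (n, (t, f, a, b)) in sorted(CORPUS.items())
            if t == otype and lo <= a and b <= hi
            and all(f.get(k) == v for (k, v) in feats.items())]


def evalQuery(q, lo, hi):
    """All flat results of query q inside slots lo..hi, as tuples of nodes."""
    if isinstance(q, list):                   # sequence of queries
        return evalSeq(q, lo, hi)
    (otype, feats, *inner) = q                # atom or embedded query
    results = []
    for n in atoms(otype, feats, lo, hi):
        (_, _, a, b) = CORPUS[n]
        if not inner:
            results.append((n,))
        else:                                 # recurse into the embedding dimension
            results.extend((n,) + r for r in evalSeq(inner, a, b))
    return results


def evalSeq(qs, lo, hi):
    """Results of a sequence whose members occupy consecutive slot ranges."""
    if not qs:
        return [()]
    results = []
    for first in evalQuery(qs[0], lo, hi):
        end = CORPUS[first[0]][3]             # last slot of the first member
        for rest in evalSeq(qs[1:], end + 1, hi):   # recurse along the sequence
            if rest and CORPUS[rest[0]][2] != end + 1:
                continue                      # next member must start right after
            results.append(first + rest)
    return results


q = ('clause', {}, ('word', dict(sp='verb')), ('word', dict(sp='subs')))
print(evalQuery(q, 1, 6))  # [(11, 2, 3), (12, 4, 5)]
```

Because every level multiplies the number of choices, the list of flat results can grow very quickly.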

Combining the results of the component queries into the results of the outer query is potentially very expensive.
Ulrik has invented the concept of `sheaf`, a compact representation of all the results.

So, we have to build a sheaf!
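In Emdros, a sheaf is a list of straws, a straw is a list of matched objects, and a matched object is an object together with the (possibly empty) sheaf of what matched inside it. A minimal sketch with plain lists and tuples follows; the representation and the function names `countResults()` and `expand()` are illustrative, not an existing TF or Emdros API. It shows how the number of results can be computed without spelling them out, while expanding them multiplies out all combinations:

```python
from itertools import product

# Sketch of a sheaf (hypothetical representation):
#   sheaf          = list of straws           (alternative matches)
#   straw          = list of matched objects  (one per member of a sequence)
#   matched object = (node, inner sheaf)      (inner sheaf is [] for an atom)


def countResults(sheaf):
    """Number of flat results, computed without expanding the sheaf."""
    total = 0
    for straw in sheaf:
        n = 1
        for (node, inner) in straw:
            n *= (countResults(inner) if inner else 1)
        total += n
    return total


def expand(sheaf):
    """All flat results as tuples of nodes (can be very many)."""
    results = []
    for straw in sheaf:
        perObject = [
            [(node,) + r for r in expand(inner)] if inner else [(node,)]
            for (node, inner) in straw
        ]
        for combo in product(*perObject):
            results.append(tuple(x for part in combo for x in part))
    return results


# hypothetical sheaf: object 900 contains consecutive objects 31 and 32;
# inside 31 either word 1 or 2 matches, inside 32 word 5, 6 or 7 matches
sheaf = [
    [(900, [[(31, [[(1, [])], [(2, [])]]),
             (32, [[(5, [])], [(6, [])], [(7, [])]])]])],
]
print(countResults(sheaf))  # 6
print(expand(sheaf)[0])     # (900, 31, 1, 32, 5)
```

The sheaf stores 2 + 3 alternatives, while the expanded form needs 2 × 3 tuples; with deeper nesting this gap widens rapidly.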

The code below shows a bit of the recursion that is going on, but does not yet attempt to build a sheaf.

In [79]:
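def slotExtent(n):
    # first and last slot (word) of node n; a word node is its own single slot
    extent = (n,) if F.otype.v(n) == 'word' else L.d(n, otype='word')
    return (extent[0], extent[-1])

def tfqlAtom(aQ, under=None, within=None):
    # aQ:     (otype, {feature: value, ...})
    # under:  optional set of nodes; only keep nodes embedded in one of them
    # within: optional (minSlot, maxSlot); only keep nodes inside that slot range
    (otype, features) = aQ
    featureList = sorted(features.items())
    results = []
    for n in F.otype.s(otype):
        # L.u(n) is a tuple, so test it against the set instead of using &
        if under is not None and under.isdisjoint(L.u(n)): continue
        if within is not None:
            (minSlot, maxSlot) = slotExtent(n)
            if minSlot < within[0] or maxSlot > within[1]: continue
        if all(Fs(f).v(n) == v for (f, v) in featureList):
            results.append(n)
    return results

def tfqlSeq(q, under=None, within=None):
    # run every member of the sequence and return one result list per member;
    # the order and adjacency of the members are not checked yet
    return [tfqlInner(qi, under=under, within=within) for qi in q]

def tfqlInner(q, under=None, within=None):
    if isinstance(q[0], str):
        if len(q) == 2:         # atomQuery: (otype, features)
            return tfqlAtom(q, under=under, within=within)
        # embedded list of queries: (otype, features, subquery, subquery, ...)
        outerResults = tfqlAtom(q[0:2], under=under, within=within)
        innerResults = tfqlSeq(q[2:], under=set(outerResults), within=within)
        results = []
        for o in outerResults:
            (minSlot, maxSlot) = slotExtent(o)
            # crude stand-in for building a sheaf: keep o if every member of the
            # inner sequence has at least one result inside o's slot range
            if all(
                any(minSlot <= slotExtent(m)[0] and slotExtent(m)[1] <= maxSlot
                    for m in memberResults)
                for memberResults in innerResults
            ):
                results.append(o)
        return results
    else:                       # sequence of queries
        return tfqlSeq(q, under=under, within=within)

def tfql(q):
    info('querying ...')
    results = tfqlInner(q)
    info('{} results'.format(len(results)))
    return results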
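Checking, for every outer node, which inner results fall inside its slot range can become quadratic if done naively. Assuming each node is reduced to its `(first slot, last slot)` pair, as the code above does with `extent[0]` and `extent[-1]`, one standard way to cut this down is to sort the inner ranges once and use binary search (Python's `bisect` module) per outer range. This is a sketch of that idea only: `containedIn()` is a hypothetical helper, and it ignores gaps inside discontinuous objects:

```python
from bisect import bisect_left, bisect_right


def containedIn(outerRanges, innerRanges):
    """Map every outer (first, last) slot range to the inner ranges inside it.

    Inner ranges are sorted once by first slot; per outer range, binary search
    narrows the candidates to those that start inside it, and only those are
    checked for their last slot.
    """
    inner = sorted(innerRanges)
    starts = [first for (first, _) in inner]
    result = {}
    for (lo, hi) in outerRanges:
        i = bisect_left(starts, lo)    # first inner range starting at or after lo
        j = bisect_right(starts, hi)   # past the last inner range starting at or before hi
        result[(lo, hi)] = [r for r in inner[i:j] if r[1] <= hi]
    return result


print(containedIn([(1, 3), (4, 6)], [(2, 2), (5, 5), (3, 4), (6, 6)]))
# {(1, 3): [(2, 2)], (4, 6): [(5, 5), (6, 6)]}
```

The inner range `(3, 4)` starts inside `(1, 3)` but ends outside it, so it is rejected by the final check.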