# tfQuery

Do we need a query language in TF, like MQL?

Yes, it is convenient to have a more declarative way of getting a set of interesting nodes to work with. But should it be MQL?

Experience shows that MQL may give you a very good first try, 
until you realize that you may not have queried for all cases.
You forgot to query for some elements in a different order.
You have not reckoned with gaps.
And the query does not return interesting context along with the results.

Look in SHEBANQ for examples of how unwieldy MQL queries may become.

Here is a good example:

[Oliver Glanz: 101_paradigm_strong-verb](https://shebanq.ancient-data.org/hebrew/query?id=1316)

Also, I wonder: do we want a new language? Suppose we make a TFQL: then we need to define a syntax,
write a parser for it, refine the syntax, update the parser, etc.
It will become a cumbersome straitjacket.

In our case, we do not have the requirement that non-coders should be able to use TFQL in a stand-alone manner.

On the contrary, TFQL should live in a programming environment, and we can take advantage of that.

Here are some initial thoughts for **tfQuery**, a query *mechanism* inside TF.

* tfQuery defines queries as data structures in Python, more precisely: as a graph
* it does not matter how you build up a query: tfQuery processes the value of the data structure
  that you pass to it. tfQuery never sees the surface syntax
* a query is a graph representation where the nodes are things like
  
  `('phrase', dict(det='und'))`
  
  or
  
  `('word', dict(sp='verb', gn='f', ps='p3'))`

* the edges specify relations between the nodes, like: *is contained in*, *follows*,
  *precedes*
  
In MQL you also specify a graph, by means of a template, but this template forces you to *overspecify*: the template often implies more constraints than you really want.

So how do we specify edges? As constraints, like the ones in the example below (and note that we are now writing executable code!).

Let us formulate a query for

* clauses that are object clauses
* containing two phrases (both undetermined)
* one of which contains a verb in the third person feminine
* and the other phrase contains a feminine, plural noun

In MQL:

```
[clause rela='Objc'
    [phrase det='und'
        [word sp='verb' AND gn='f' AND ps='p3']
    ]
    [phrase det='und'
        [word sp='subs' AND gn='f' AND nu='pl']
    ]
]
```

In tfQuery:

```python
c = ('clause', dict(rela='Objc'))
p1 = ('phrase', dict(det='und'))
p2 = ('phrase', dict(det='und'))
w1 = ('word', dict(sp='verb', gn='f', ps='p3'))
w2 = ('word', dict(sp='subs', gn='f', nu='pl'))
```

```python
nodes = [c, p1, p2, w1, w2]
edges = [
    (c, [p1, p2]),
    (p1, [w1]),
    (p2, [w2]),
    (p1, p2),
]

query = (nodes, edges)
```

An edge like `(x, [y,z])` means that `y` and `z` are embedded in `x`, but does not mean
that `y` comes before `z`.

An edge like `(x, y)` means that `x` comes before `y`.
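
The two edge shapes can be told apart mechanically. Here is a minimal sketch, assuming the tuple-and-list representation used above; `split_edges` is a hypothetical helper name, not part of TF. It expands every edge into either *embedding* pairs or *ordering* pairs.

```python
# Query from the example above, repeated so that this sketch runs on its own.
c = ('clause', dict(rela='Objc'))
p1 = ('phrase', dict(det='und'))
p2 = ('phrase', dict(det='und'))
w1 = ('word', dict(sp='verb', gn='f', ps='p3'))
w2 = ('word', dict(sp='subs', gn='f', nu='pl'))
edges = [(c, [p1, p2]), (p1, [w1]), (p2, [w2]), (p1, p2)]


def split_edges(edges):
    """Hypothetical helper: expand edges into (parent, child) and (first, second) pairs."""
    embeds, precedes = [], []
    for source, target in edges:
        if isinstance(target, list):
            # (x, [y, z]): every listed node is embedded in x, in no particular order
            embeds.extend((source, child) for child in target)
        else:
            # (x, y): x comes before y
            precedes.append((source, target))
    return embeds, precedes


embeds, precedes = split_edges(edges)
```

A query node is a tuple and an embedding target is a list, so a type check is enough to distinguish the two kinds of edge.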

## Increased flexibility

Note that it is very easy to remove the `(p1, p2)` condition, which states that the first
phrase comes before the second one.

If we wanted to do that in MQL, the query would become:

```
[clause rela='Objc'
    [phrase det='und'
        [word sp='verb' AND gn='f' AND ps='p3']
    ]
    [phrase det='und'
        [word sp='subs' AND gn='f' AND nu='pl']
    ]
    OR
    [phrase det='und'
        [word sp='subs' AND gn='f' AND nu='pl']
    ]
    [phrase det='und'
        [word sp='verb' AND gn='f' AND ps='p3']
    ]
]
```

This quickly gets out of hand, see e.g.
[Dirk Roorda: Object clauses of verbless mothers](https://shebanq.ancient-data.org/hebrew/query?id=984) and accompanying
[notebook](https://shebanq.ancient-data.org/shebanq/static/docs/tools/shebanq/VerblessMothers.html)
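
For comparison, here is a minimal sketch of the same change in the tfQuery representation. The helper `is_order_edge` is a hypothetical name. Note one assumption that matters: `p1` and `p2` are equal by value, so the edge has to be found by object identity (`is`), not by `==`.

```python
# Query from the example above, repeated so that this sketch runs on its own.
c = ('clause', dict(rela='Objc'))
p1 = ('phrase', dict(det='und'))
p2 = ('phrase', dict(det='und'))
w1 = ('word', dict(sp='verb', gn='f', ps='p3'))
w2 = ('word', dict(sp='subs', gn='f', nu='pl'))
nodes = [c, p1, p2, w1, w2]
edges = [(c, [p1, p2]), (p1, [w1]), (p2, [w2]), (p1, p2)]


def is_order_edge(edge, first, second):
    """Hypothetical helper: is this the edge 'first comes before second'?"""
    # Identity comparison, because p1 == p2 holds by value.
    return edge[0] is first and edge[1] is second


# Drop the ordering constraint; the three embedding edges are kept as they are.
unordered_edges = [e for e in edges if not is_order_edge(e, p1, p2)]
unordered_query = (nodes, unordered_edges)
```

The rest of the query stays untouched, whereas the MQL template has to be duplicated.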

## Implementation

How could we implement this search efficiently?
First of all: the search should yield a list of *instantiations* of the query nodes.

First idea:

* build for each query node (which corresponds to a local feature condition on an object) the set
  of objects in the data that satisfy the condition.
  A single walk over all objects could construct these sets in one go
* Then work through all edges, where every edge is an instruction to weed out non-results
  from the sets obtained earlier
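
The two steps above can be sketched on a toy corpus. Everything about the data here is an assumption made for illustration: the `corpus` dict, the idea that every object occupies a set of word slots, and the definitions of *embedded* (slot subset) and *before* (all slots earlier). It is not how TF stores its data, and it only prunes the candidate sets; it does not yet assemble instantiations.

```python
# Toy corpus (hypothetical): object id -> (otype, features, occupied word slots)
corpus = {
    1: ('clause', dict(rela='Objc'), {1, 2, 3, 4}),
    2: ('phrase', dict(det='und'), {1, 2}),
    3: ('phrase', dict(det='und'), {3, 4}),
    4: ('word', dict(sp='verb', gn='f', ps='p3'), {1}),
    5: ('word', dict(sp='subs', gn='f', nu='pl'), {3}),
    6: ('phrase', dict(det='und'), {5}),  # not inside any object clause
    7: ('clause', dict(rela='Attr'), {5}),
}

c = ('clause', dict(rela='Objc'))
p1 = ('phrase', dict(det='und'))
p2 = ('phrase', dict(det='und'))
w1 = ('word', dict(sp='verb', gn='f', ps='p3'))
w2 = ('word', dict(sp='subs', gn='f', nu='pl'))
nodes = [c, p1, p2, w1, w2]
edges = [(c, [p1, p2]), (p1, [w1]), (p2, [w2]), (p1, p2)]


def matches(obj, qnode):
    """Local feature condition: same object type and all queried features equal."""
    otype, features, _ = obj
    qtype, qfeatures = qnode
    return otype == qtype and all(features.get(k) == v for k, v in qfeatures.items())


# Step 1: one walk over all objects fills a candidate set per query node.
candidates = [set() for _ in nodes]
for oid, obj in corpus.items():
    for i, qnode in enumerate(nodes):
        if matches(obj, qnode):
            candidates[i].add(oid)


def slots(oid):
    return corpus[oid][2]


def embedded(inner, outer):
    return slots(inner) <= slots(outer)


def before(a, b):
    return max(slots(a)) < min(slots(b))


def prune(i, j, rel):
    """Keep only candidates of query nodes i and j that occur in some rel-pair."""
    pairs = [(a, b) for a in candidates[i] for b in candidates[j]
             if a != b and rel(a, b)]
    candidates[i] = {a for a, _ in pairs}
    candidates[j] = {b for _, b in pairs}


# Query nodes hold dicts (unhashable) and p1 == p2, so look them up by identity.
pos = {id(n): i for i, n in enumerate(nodes)}

# Step 2: every edge weeds out non-results from the candidate sets.
for source, target in edges:
    if isinstance(target, list):
        for child in target:
            prune(pos[id(child)], pos[id(source)], embedded)
    else:
        prune(pos[id(source)], pos[id(target)], before)

print(candidates)
```

On this toy data each query node ends up with exactly one candidate. In general a single pass over the edges may leave candidates that do not take part in any complete result, so a further step is needed to combine the sets into instantiations.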
  
This is far from the whole story, but I have to go shopping now.

The stuff below is not meaningful yet.

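```python
def makeNode(otype, features):
    # Freeze the feature dict into (name, value) pairs so that the node is hashable
    return (otype, tuple(features.items()))


def makeEdge(node, nodes):
    if type(nodes) is list:
        frozenNodes = tuple(nodes)
    elif type(nodes) is set:
        frozenNodes = frozenset(nodes)
    else:
        frozenNodes = nodes  # assumption: a single node is passed through as is
    return (node, frozenNodes)  # assumption: the original cell stops before a return
```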