<img align="right" src="images/ninologo.png" width="150"/>
<img align="right" src="images/tf-small.png" width="125"/>
<img align="right" src="images/dans.png" width="150"/>

# Search

Search is essential to get around in the corpus, and it is convenient as well.
Although the whole point of Text-Fabric is to move around in the corpus programmatically,
we show that
[template-based search](https://annotation.github.io/text-fabric/Use/Search/#search-templates)
makes everything a lot more convenient.

Along with showing how search works, we also point to pretty ways to display your search results.
The good news is that `search` and `pretty` work well together. 

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import sys, os
import collections
from IPython.display import display, Markdown
from tf.app import use

In [3]:
# A = use('uruk', hoist=globals())
A = use('uruk:clone', checkout="clone", hoist=globals())

Using TF-app in /Users/dirk/github/annotation/app-uruk/code:
	repo clone offline under ~/github (local github)
Using data in /Users/dirk/github/Nino-cunei/uruk/tf/uruk/1.0:
	repo clone offline under ~/github (local github)
   |     0.00s No structure info in otext, the structure part of the T-API cannot be used


Using data in /Users/dirk/github/Nino-cunei/uruk/sources/cdli/images:
	repo clone offline under ~/github (local github)
Found 2095 ideograph linearts
Found 2724 tablet linearts
Found 5495 tablet photos


# The basics

Here is a very simple query: we look for tablets containing a numeral sign.

In [15]:
query = '''
tablet
  sign type=numeral
'''

results = A.search(query)

  0.28s 38122 results


We can display the results in a table (here are the first 5):

In [16]:
A.table(results, end=5, condenseType="line")

n,p,tablet,sign
1,P006427,tablet P006427,3(N14)
2,P006428,tablet P006428,3(N14)
3,P006428,tablet P006428,1(N14)
4,P006428,tablet P006428,1(N01)
5,P006428,tablet P006428,1(N57)


We can combine all results that are on the same line:

In [17]:
A.table(results, condensed=True, condenseType='line', end=5)

n,p,line,sign,sign.1,sign.2
1,P006427,1. 3(N14) X SANGA~a [...]...,3(N14),,
2,P006428,2. 3(N14) X,3(N14),,
3,P006428,3. 1(N14) SUHUR,1(N14),,
4,P006428,5. 1(N01) |DUG~bx1(N57)| DUG~b 1(N57),1(N01),1(N57),
5,P448701,1. 1(N46) 2(N19) 4(N41),2(N19),4(N41),1(N46)


And we can show them inside the face they occur in:

In [18]:
A.show(results, condenseType='face', end=2)

The feature *type* is displayed because it occurs in the query.
We can make the display a bit more compact by suppressing those features:

In [19]:
A.show(results, condenseType='face', end=2, showFeatures=False)

## Finding a tablet

Suppose we have the *p-number* of a tablet.
How do we find that tablet?
Recall from the feature docs that the p-numbers are stored in the feature
`catalogId`, so we can write a *search template*.

In [20]:
t = F.otype.s('tablet')[0]
A.plain(t)

In [21]:
query = '''
tablet catalogId=P005381
'''
results = A.search(query)
A.table(results)

  0.01s 1 result


n,p,tablet
1,P005381,tablet P005381


The function `A.table()` gives you a tabular overview of the results,
with a link to the tablet on CDLI.

But we can also get more information by using `A.show()`:

In [22]:
A.show(results)

Several things to note here:

* if you want to see the tablet on CDLI, you can click on the tablet header;
* the display matches the layout on the tablet:
  * faces and columns are delineated with red lines;
  * lines and cases are delineated with blue lines;
  * cases and subcases alternate their direction of division between horizontal and vertical:
    lines are divided horizontally into cases, cases are divided vertically into subcases,
    and subcases in turn are divided horizontally into subsubcases, etc.;
  * quads and signs are delineated with grey lines;
  * clusters are delineated with brown lines (see further on);
  * lineart is given for top-level signs and quads; signs and quads that are part of a bigger quad
    do not get lineart.
    
It is possible to switch off the lineart.

## More info in the results
You can show the line numbers that correspond to the ATF source files as well.
Let us also switch off the lineart.

In [23]:
query = '''
tablet catalogId=P005381
'''
results = A.search(query)
A.table(results, lineNumbers=True)
A.show(results, lineNumbers=True, lineart=False)

  0.01s 1 result


n,p,tablet
1,P005381,"@&P005381 = MSVO 3, 70 tablet P005381"


There is a big quad in `obverse:2 line 1`. We want to call up the lineart for it separately.
First step: make the nodes visible.

In [24]:
query = '''
tablet catalogId=P005381
'''
results = A.search(query)
A.table(results, withNodes=True)
A.show(results, withNodes=True, lineart=False)

  0.01s 1 result


n,p,tablet
1,P005381,tablet P005381


We read off the node number of that quad and fetch the lineart.

In [25]:
A.lineart(143015)

## Search templates
Let's highlight all numerals on the tablet.

We prefer our results to be condensed per tablet for the next few shows.

We make that the temporary default:

In [26]:
A.displaySetup(condensed=True)

In [28]:
query = '''
tablet catalogId=P005381
  sign type=numeral
'''
results = A.search(query)
A.show(results, showFeatures=False)

  0.18s 10 results


We can do the same for multiple tablets. But now we highlight the undivided lines,
just for variation.

In [29]:
query = f'''
tablet catalogId=P003581|P000311
  line terminal
'''
results = A.search(query)

  0.05s 11 results


In [30]:
A.table(results, lineart=False)

n,p,tablet,line,line.1,line.2,line.3,line.4,line.5
1,P000311,tablet P000311,P000311 1. [1(N01)]1(N01) [...]... IR~a,P000311 2. [1(N01)]1(N01) ERIM2,P000311 1. 1(N01) NIMGIR SIG7,P000311 2. 1(N01) U2~b NAGA~a MUSZEN ZATU647 BA,P000311 3. 1(N01) IM~a [...]...,P000311 1. [N]N [...]...
2,P003581,tablet P003581,P003581 1. 5(N01) U2~a [...]...,P003581 2. 1(N01) X [...]...,P003581 2. 1(N14) [...]... SUHUR [...]...,P003581 1. 5(N14) 1(N01) [...]... U2~a,P003581 2. |GI&GI| GI GI GU7,


In [31]:
A.show(results, lineart=False, condenseType="tablet")

In another chapter of this tutorial, [steps](steps.ipynb), we encountered a grapheme with a double prime.
There is only one, and we showed the tablet on which it occurs, without highlighting the grapheme in question.
Now we can do the highlight:

In [32]:
results = A.search('''
sign prime=2
''')

  0.15s 1 result


In [33]:
A.show(results, showFeatures=False)

## Search for spatial patterns
A few words on the construction of search templates.

The idea is that you mimic the things you are looking for
in your search template.
Embedded things are mimicked by indentation.

Let's search for a line with a case in it that is not further divided,
in which there is a numeral and an ideograph.

Here is our first attempt, and we show the first tablet only.
Note that you can have comments in a search template.
Lines that start with `%` are ignored.

In [34]:
query = '''
line
  case terminal=1
% order is not important
    sign type=ideograph
    sign type=numeral
'''
results = A.search(query)

  0.43s 10673 results


First a glance at the first 3 items in tabular view.

In [35]:
A.table(results, end=3, lineart=False)

n,p,tablet,sign,sign.1,sign.2,line,sign.3,case,sign.4,sign.5,case.1,sign.6,sign.7,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,Unnamed: 18,Unnamed: 19,Unnamed: 20
1,P448702,tablet P448702,3(N01),KASZ~a,GI,P448702 2. [N]N 2(N14) 3(N01) KASZ~b NUN~a 3(N01) KASZ~a GI,3(N01),2b. 3(N01) KASZ~a GI,N,2(N14),2a. [N]N 2(N14) 3(N01) KASZ~b NUN~a,KASZ~b,NUN~a,,,,,,,
2,P471695,tablet P471695,2b1. (3(N57) PAP~a)a3(N57) PAP~a,1(N01),ISZ~a,1a. 1(N01) ISZ~a,3(N01),APIN~a,3(N57),UR4~a,P471695 1. 3(N01) APIN~a 3(N57) UR4~a (EN~a DU ZATU759)aEN~a DU ZATU759 (BAN~b KASZ~c)aBAN~b KASZ~c (KI@n SAG)aKI@n SAG,P471695 2. 1(N14) 2(N01) [...]... (3(N57) PAP~a)a3(N57) PAP~a (SZU KI X)aSZU KI X (EN~a AN EZINU~d)aEN~a AN EZINU~d (IDIGNA [...])aIDIGNA [...]...,P471695 1. 1(N01) ISZ~a (PAP~a GIR3~c)aPAP~a GIR3~c,3(N57),PAP~a,1a. 3(N01) APIN~a 3(N57) UR4~a,,,,
3,P482083,tablet P482083,1(N14),SZE~a,N,TAR~a,3(N01),SZE~a,KASZ~b,U4,3(N01),1(N42~a),1(N25),TAR~a,1a. [...]... 1(N14) [...]... SZE~a,1b. [N]N TAR~a,2a. 3(N01) SZE~a KASZ~b |U4x3(N01)| U4 3(N01),2b. 1(N42~a) 1(N25) TAR~a,P482083 1. [...]... 1(N14) [...]... SZE~a [N]N TAR~a,P482083 2. 3(N01) SZE~a KASZ~b |U4x3(N01)| U4 3(N01) 1(N42~a) 1(N25) TAR~a


Ah, we were still in condensed mode.

For this query the table is clearer in normal mode, so we tell it not to condense.

In [36]:
A.table(results, condensed=False, end=20, lineart=False)

n,p,line,case,sign,sign.1
1,P448702,2. [N]N 2(N14) 3(N01) KASZ~b NUN~a 3(N01) KASZ~a GI,2a. [N]N 2(N14) 3(N01) KASZ~b NUN~a,KASZ~b,N
2,P448702,2. [N]N 2(N14) 3(N01) KASZ~b NUN~a 3(N01) KASZ~a GI,2a. [N]N 2(N14) 3(N01) KASZ~b NUN~a,KASZ~b,2(N14)
3,P448702,2. [N]N 2(N14) 3(N01) KASZ~b NUN~a 3(N01) KASZ~a GI,2a. [N]N 2(N14) 3(N01) KASZ~b NUN~a,KASZ~b,3(N01)
4,P448702,2. [N]N 2(N14) 3(N01) KASZ~b NUN~a 3(N01) KASZ~a GI,2a. [N]N 2(N14) 3(N01) KASZ~b NUN~a,NUN~a,N
5,P448702,2. [N]N 2(N14) 3(N01) KASZ~b NUN~a 3(N01) KASZ~a GI,2a. [N]N 2(N14) 3(N01) KASZ~b NUN~a,NUN~a,2(N14)
6,P448702,2. [N]N 2(N14) 3(N01) KASZ~b NUN~a 3(N01) KASZ~a GI,2a. [N]N 2(N14) 3(N01) KASZ~b NUN~a,NUN~a,3(N01)
7,P448702,2. [N]N 2(N14) 3(N01) KASZ~b NUN~a 3(N01) KASZ~a GI,2b. 3(N01) KASZ~a GI,KASZ~a,3(N01)
8,P448702,2. [N]N 2(N14) 3(N01) KASZ~b NUN~a 3(N01) KASZ~a GI,2b. 3(N01) KASZ~a GI,GI,3(N01)
9,P471695,1. 3(N01) APIN~a 3(N57) UR4~a (EN~a DU ZATU759)aEN~a DU ZATU759 (BAN~b KASZ~c)aBAN~b KASZ~c (KI@n SAG)aKI@n SAG,1a. 3(N01) APIN~a 3(N57) UR4~a,APIN~a,3(N01)
10,P471695,1. 3(N01) APIN~a 3(N57) UR4~a (EN~a DU ZATU759)aEN~a DU ZATU759 (BAN~b KASZ~c)aBAN~b KASZ~c (KI@n SAG)aKI@n SAG,1a. 3(N01) APIN~a 3(N57) UR4~a,APIN~a,3(N57)


Now the results on the first tablet, condensed by line.

In [37]:
A.show(results, end=1, condenseType="line")

The order between the two signs is not defined by the template,
even though the template line with the ideograph
precedes the template line with the numeral.
Results may have the numeral and the ideograph in any order.

In fact, the highlights above represent multiple results.
If a case has, say, 2 numerals and 3 ideographs, there are 6 possible
pairs.
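
The count comes from a simple rule. Two sibling template lines with no relation between them yield every combination of their matches, i.e. a Cartesian product. The following is a plain-Python sketch, not a call into Text-Fabric. It uses the sign readings of case `2a` on tablet P448702 (the case examined further on) as hypothetical input.

```python
from itertools import product

# Hypothetical miniature of case 2a on tablet P448702: (reading, type) per sign.
case2a = [
    ("N", "numeral"),
    ("2(N14)", "numeral"),
    ("3(N01)", "numeral"),
    ("KASZ~b", "ideograph"),
    ("NUN~a", "ideograph"),
]

ideographs = [s for (s, t) in case2a if t == "ideograph"]
numerals = [s for (s, t) in case2a if t == "numeral"]

# Unrelated sibling template lines: every ideograph pairs with every numeral.
pairs = list(product(ideographs, numerals))
print(len(pairs))  # 6
```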

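Since we made condensed mode the default above (with `A.displaySetup(condensed=True)`), results are shown *condensed*.
That means that results are grouped per tablet (or per the unit given by `condenseType`), and within each group
everything that occurs in some result is highlighted.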

It is also possible to see the uncondensed results.
That gives you an exact picture of each real result constellation.
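
One way to picture condensing is to group the result tuples by their container and take the union of all nodes to highlight. The sketch below assumes that model and uses made-up node numbers. Text-Fabric's actual implementation may differ in its details, for instance in how it finds the container for a given `condenseType`.

```python
from collections import defaultdict

# Hypothetical result tuples (tablet, case, ideograph, numeral) with made-up node numbers.
results = [
    (1, 10, 101, 104),
    (1, 10, 101, 105),
    (1, 10, 102, 104),
    (2, 20, 201, 203),
]

# Group by container (member 0 here) and collect every other node for highlighting.
condensed = defaultdict(set)
for result in results:
    container = result[0]
    condensed[container].update(result[1:])

for container, highlights in sorted(condensed.items()):
    print(container, sorted(highlights))
```

Uncondensed display keeps the four tuples apart. Condensed display shows two containers, and the individual pairings are no longer visible.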

In order to illustrate the difference, we focus on one tablet and one case.
This case has 3 numerals and 2 ideographs, so we expect 6 results.

In [38]:
query = '''
tablet catalogId=P448702
  line
    case terminal=1 number=2a
      sign type=ideograph
      sign type=numeral
'''
results = A.search(query)

  0.38s 6 results


We show them condensed (the current default), so we expect 1 line with all ideographs and numerals in case `2a` highlighted.

In [39]:
A.show(results, lineart=False, condenseType="line")

Now the same results in uncondensed mode. Expect 6 times the same line with
different highlighted pairs of signs.

Note that we can apply different highlight colors to different parts of the result.
The signs in the pair are members 4 and 5.

The members that we do not map will not be highlighted.
The members that we map to the empty string will be highlighted with the default color.

**NB:** Choose your colors from the
[CSS specification](https://developer.mozilla.org/en-US/docs/Web/CSS/color_value).

In [40]:
A.show(results, condensed=False, colorMap={3: '', 4: 'cyan', 5: 'magenta'}, lineart=False, condenseType="line", showFeatures=False)

Color mapping works best for uncondensed results. If you condense results, some nodes may occupy
different positions in different results. It is unpredictable which color will be used 
for such nodes:

In [41]:
A.show(results, condensed=True, colorMap={3: '', 4: 'cyan', 5: 'magenta'}, lineart=False, condenseType="line", showFeatures=False)

You can enforce order.
We modify the template a little to state a
relational condition, namely that the ideograph follows the numeral.

In [42]:
query = '''
tablet catalogId=P448702
  line
    case terminal=1 number=2a
      sign type=ideograph
      > sign type=numeral
'''
results = A.search(query)
A.table(results, condensed=False, lineart=False)

  0.37s 6 results


n,p,tablet,line,case,sign,sign.1
1,P448702,tablet P448702,P448702 2. [N]N 2(N14) 3(N01) KASZ~b NUN~a 3(N01) KASZ~a GI,2a. [N]N 2(N14) 3(N01) KASZ~b NUN~a,KASZ~b,N
2,P448702,tablet P448702,P448702 2. [N]N 2(N14) 3(N01) KASZ~b NUN~a 3(N01) KASZ~a GI,2a. [N]N 2(N14) 3(N01) KASZ~b NUN~a,KASZ~b,2(N14)
3,P448702,tablet P448702,P448702 2. [N]N 2(N14) 3(N01) KASZ~b NUN~a 3(N01) KASZ~a GI,2a. [N]N 2(N14) 3(N01) KASZ~b NUN~a,KASZ~b,3(N01)
4,P448702,tablet P448702,P448702 2. [N]N 2(N14) 3(N01) KASZ~b NUN~a 3(N01) KASZ~a GI,2a. [N]N 2(N14) 3(N01) KASZ~b NUN~a,NUN~a,N
5,P448702,tablet P448702,P448702 2. [N]N 2(N14) 3(N01) KASZ~b NUN~a 3(N01) KASZ~a GI,2a. [N]N 2(N14) 3(N01) KASZ~b NUN~a,NUN~a,2(N14)
6,P448702,tablet P448702,P448702 2. [N]N 2(N14) 3(N01) KASZ~b NUN~a 3(N01) KASZ~a GI,2a. [N]N 2(N14) 3(N01) KASZ~b NUN~a,NUN~a,3(N01)


Still six results.
No wonder: the case has three numerals in a row, followed by two ideographs.

Do you want the ideograph and the numeral to be *adjacent* as well?
We only have to add 1 character to the template to make it happen.

In [43]:
query = '''
tablet catalogId=P448702
  line
    case terminal=1 number=2a
      sign type=ideograph
      :> sign type=numeral
'''
results = A.search(query)

  0.36s 1 result


In [44]:
A.table(results, condensed=False, lineart=False)

n,p,tablet,line,case,sign,sign.1
1,P448702,tablet P448702,P448702 2. [N]N 2(N14) 3(N01) KASZ~b NUN~a 3(N01) KASZ~a GI,2a. [N]N 2(N14) 3(N01) KASZ~b NUN~a,KASZ~b,3(N01)
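
The difference between `>` and `:>` can be mimicked on slot positions. The sketch below assumes that each sign of case `2a` occupies exactly one slot, numbered 1 to 5 here purely for illustration. For such single-slot signs, canonical node order coincides with slot order.

```python
# Hypothetical slot positions for the signs of case 2a, in tablet order.
numerals = {"N": 1, "2(N14)": 2, "3(N01)": 3}
ideographs = {"KASZ~b": 4, "NUN~a": 5}

# ideograph > numeral: the ideograph comes after the numeral.
after = [
    (i, n)
    for (i, pi) in ideographs.items()
    for (n, pn) in numerals.items()
    if pi > pn
]

# ideograph :> numeral: the ideograph comes immediately after the numeral.
adjacent = [
    (i, n)
    for (i, pi) in ideographs.items()
    for (n, pn) in numerals.items()
    if pi == pn + 1
]

print(len(after), adjacent)
```

This reproduces the two outcomes above: all six pairs satisfy "after", and only `KASZ~b` following `3(N01)` is adjacent.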


In [46]:
A.show(results, condensed=False, colorMap={4: 'cyan', 5: 'magenta'}, lineart=False, condenseType="line", showFeatures=False)

By now it pays off to study the possibilities of
[search templates](https://annotation.github.io/text-fabric/Use/Search/#search-templates).

If you want a reminder of all possible spatial relationships between nodes, you can call it up
here in your notebook:

In [47]:
S.relationsLegend()

                      = left equal to right (as node)
                      # left unequal to right (as node)
                      < left before right (in canonical node ordering)
                      > left after right (in canonical node ordering)
                     == left occupies same slots as right
                     && left has overlapping slots with right
                     ## left and right do not have the same slot set
                     || left and right do not have common slots
                     [[ left embeds right
                     ]] left embedded in right
                     << left completely before right
                     >> left completely after right
                     =: left and right start at the same slot
                     := left and right end at the same slot
                     :: left and right start and end at the same slot
                     <: left immediately before right
                     :> left immediately after right
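
The legend is terse. Below is a simplified plain-Python reading of a few of its entries. It models each node as the set of slots it occupies, which is a hypothetical stand-in and not Text-Fabric's own implementation.

```python
# A node is modelled as the set of slot numbers it occupies.
def same_slots(a: set, b: set) -> bool:          # ==
    return a == b

def overlap(a: set, b: set) -> bool:             # &&
    return bool(a & b)

def disjoint(a: set, b: set) -> bool:            # ||
    return not (a & b)

def completely_before(a: set, b: set) -> bool:   # <<
    return max(a) < min(b)

def immediately_before(a: set, b: set) -> bool:  # <:
    return max(a) + 1 == min(b)

line = {1, 2, 3, 4, 5}
case_a = {1, 2, 3}
case_b = {4, 5}

print(overlap(line, case_a), disjoint(case_a, case_b))
print(completely_before(case_a, case_b), immediately_before(case_a, case_b))
```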
   

## Comparisons in templates: cases

Cases have a feature `depth` which indicates their nesting depth within a line.
It is not the depth *of* that case, but the depth *at* which that case occurs.
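
The case numbers in the query results further on (such as `2a` at depth 1, `1b1A` at depth 3 and `1b1B1` at depth 4) suggest a rule. After the line number, each nesting level adds one run of letters or digits. The sketch below only illustrates that observation. It assumes the numbering always alternates that way, and it is not how the `depth` feature is actually computed.

```python
import re

def case_depth(number: str) -> int:
    """Estimate the depth at which a case occurs from its case number.

    Assumption: the number starts with the line number and then alternates
    between runs of letters and runs of digits, one run per nesting level.
    """
    runs = re.findall(r"\d+|[A-Za-z]+", number)
    return len(runs) - 1

for number in ("2a", "1b1A", "1b1B1"):
    print(number, case_depth(number))
```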

Comparison queries are handy to select cases of a certain minimum or maximum depth.

We'll work a lot with `condensed=False` and `lineart=False`, so let's make those the defaults:

In [48]:
A.displaySetup(condensed=False, lineart=False)

In [49]:
query = '''
case depth=3
'''
results = A.search(query)
A.table(results, end=20)

  0.01s 254 results


n,p,case
1,P003357,1b1A. EN~a ZATU759 DU
2,P003357,1b1B. 3(N57) SU~a
3,P003537,4b1A. 3(N57) X SZA U4 [...]... X
4,P003537,4b1B. X X
5,P003537,4b2A. 2(N57) GAN~b SZU [...]...
6,P003537,4b2B. X [...]...
7,P003589,3b2A. |GA~a.ZATU753| GA~a ZATU753
8,P003589,3b2B. MUD [...]...
9,P003822,1a2A. [...]... [...]...
10,P003822,1a2B. [...]... PAP~a SU~a


Are there deeper cases?

In [50]:
query = '''
case depth>3
'''
results = A.search(query)
A.table(results, end=20)

  0.01s 119 results


n,p,case
1,P004735,1b1B1. (NAB DI |BU~a+DU6~a|)aNAB DI |BU~a+DU6~a| BU~a DU6~a
2,P004735,1b1B2. (ZI~a#? AN)aZI~a AN
3,P004735,1b1B3. (ANSZE~e 7(N57) DUR2 DU)aANSZE~e 7(N57) DUR2 DU
4,P004735,1b1B4. (LAL3~a#? GAR IG~b)aLAL3~a GAR IG~b
5,P004735,2b2B1. (GI6 KISZIK~a# URI3~a)aGI6 KISZIK~a URI3~a
6,P004735,2b2B2. ([...])a[...]...
7,P218054,1a1A1. [...]... 5(N01) [...]... UDU~a
8,P218054,1a1A2. [...]... 7(N01) MASZ2
9,P325754,1c2b1. 1(N01) [...]...
10,P325754,1c2b2. 1(N14) 7(N01) TUR


Still deeper?

In [51]:
query = '''
case depth>4
'''
results = A.search(query)
A.table(results, end=20)

  0.01s 0 results


As a check: the cases with depth 4 should be exactly the cases with depth > 3:

In [52]:
query = '''
case depth=4
'''
results = A.search(query)
A.table(results, end=20)
tc4 = len(results)

  0.01s 119 results


n,p,case
1,P004735,1b1B1. (NAB DI |BU~a+DU6~a|)aNAB DI |BU~a+DU6~a| BU~a DU6~a
2,P004735,1b1B2. (ZI~a#? AN)aZI~a AN
3,P004735,1b1B3. (ANSZE~e 7(N57) DUR2 DU)aANSZE~e 7(N57) DUR2 DU
4,P004735,1b1B4. (LAL3~a#? GAR IG~b)aLAL3~a GAR IG~b
5,P004735,2b2B1. (GI6 KISZIK~a# URI3~a)aGI6 KISZIK~a URI3~a
6,P004735,2b2B2. ([...])a[...]...
7,P218054,1a1A1. [...]... 5(N01) [...]... UDU~a
8,P218054,1a1A2. [...]... 7(N01) MASZ2
9,P325754,1c2b1. 1(N01) [...]...
10,P325754,1c2b2. 1(N14) 7(N01) TUR


Terminal cases at depth 1 are top-level divisions of lines that are not themselves divided further.

In [53]:
query = '''
case depth=1 terminal
'''
results = A.search(query)
A.table(results, end=20)
tc1 = len(results)

  0.03s 5468 results


n,p,case
1,P448702,2a. [N]N 2(N14) 3(N01) KASZ~b NUN~a
2,P448702,2b. 3(N01) KASZ~a GI
3,P471695,1a. 3(N01) APIN~a 3(N57) UR4~a
4,P471695,2a. 1(N14) 2(N01) [...]...
5,P471695,1a. 1(N01) ISZ~a
6,P482083,1a. [...]... 1(N14) [...]... SZE~a
7,P482083,1b. [N]N TAR~a
8,P482083,2a. 3(N01) SZE~a KASZ~b |U4x3(N01)| U4 3(N01)
9,P482083,2b. 1(N42~a) 1(N25) TAR~a
10,P006438,2a. KU6~a BU~a


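Now let us select the terminal cases at depth 1 and at depth 4 together.
Since there are no cases deeper than 4, every case at depth 4 is terminal, so `tc4` also counts the terminal cases at depth 4.
The two sets are disjoint, so the amounts should add up.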
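
A value list such as `depth=1|4` acts as a membership test on the feature value. The sketch below mirrors the count check on hypothetical case records of the form `(depth, terminal)`. It models a bare `terminal` in the template as "the feature is set", represented here as `True`.

```python
# Hypothetical case records: (depth, terminal)
cases = [(1, True), (1, False), (2, True), (4, True), (4, True), (3, True)]

def matches(depth, terminal, allowed=frozenset({1, 4})):
    # depth=1|4: the value must be one of the alternatives; terminal: must be set
    return depth in allowed and terminal

tc1 = sum(1 for (d, t) in cases if d == 1 and t)
tc4 = sum(1 for (d, t) in cases if d == 4)  # all depth-4 cases are terminal here
tc14 = sum(1 for (d, t) in cases if matches(d, t))
print(f"{tc1} + {tc4} = {tc1 + tc4} = {tc14}")
```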

In [54]:
query = '''
case depth=1|4 terminal
'''
results = A.search(query)
A.table(results, end=20)
tc14 = len(results)
print(f'{tc1} + {tc4} = {tc1 + tc4} = {tc14}')

  0.03s 5587 results


n,p,case
1,P448702,2a. [N]N 2(N14) 3(N01) KASZ~b NUN~a
2,P448702,2b. 3(N01) KASZ~a GI
3,P471695,1a. 3(N01) APIN~a 3(N57) UR4~a
4,P471695,2a. 1(N14) 2(N01) [...]...
5,P471695,1a. 1(N01) ISZ~a
6,P482083,1a. [...]... 1(N14) [...]... SZE~a
7,P482083,1b. [N]N TAR~a
8,P482083,2a. 3(N01) SZE~a KASZ~b |U4x3(N01)| U4 3(N01)
9,P482083,2b. 1(N42~a) 1(N25) TAR~a
10,P006438,2a. KU6~a BU~a


5468 + 119 = 5587 = 5587


## Relational patterns: quads

Quads are compositions of signs by means of *operators*, such as `.` and `x`.
The operators are coded as an *edge* feature with values. The `op`-edges are between the signs/quads that are combined,
and the values of the `op` edges are the names of the operators in question.
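
An edge feature with values can be pictured as a mapping from node pairs to values. The sketch below uses made-up nodes and computes a frequency list of the values with `collections.Counter`. It is similar in spirit to `E.op.freqList()`, but it does not use the Text-Fabric API.

```python
from collections import Counter

# Hypothetical op edges: (from node, to node) -> operator name.
op_edges = {
    (1, 2): "x",
    (3, 4): "x",
    (5, 6): ".",
    (7, 8): "&",
    (9, 10): "x",
    (11, 12): "+",
}

# Frequency list of edge values, most frequent first.
freq_list = Counter(op_edges.values()).most_common()
for op, freq in freq_list:
    print(f"{op} : {freq:>5}x")
```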

Which operators do we have?

In [55]:
for (op, freq) in E.op.freqList():
    print(f'{op} : {freq:>5}x')

x :  2346x
. :  1042x
& :   222x
+ :   200x


Between how many sign pairs do we have an operator?

In [56]:
query = '''
sign
-op> sign
'''
results = A.search(query)

  0.21s 3642 results


Let's specifically ask for the `x` operator:

In [57]:
query = '''
sign
-op=x> sign
'''
results = A.search(query)

  0.21s 2238 results


Less than expected?

We must not forget the combinations between quads and between quads and signs.

We write a function that counts all pairs of signs/quads connected by a specific operator, for each combination of node types.

This is a fine illustration of how you can use programming to compose search templates,
instead of writing them out yourself.
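
The template-building part can be tried on its own without running any search. The helper `make_templates` below is hypothetical and only produces the template strings, one per pair of node types. The template lines are kept unindented because indentation in a search template expresses embedding.

```python
from itertools import product

def make_templates(op: str, types=("sign", "quad")) -> list:
    """Build one search template per pair of node types (hypothetical helper)."""
    return [f"{t1}\n-op{op}> {t2}\n" for (t1, t2) in product(types, types)]

for template in make_templates("=x"):
    print(template)
```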

In [58]:
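def getCombi(op):
    # count the pairs connected by `op`, for every combination of sign and quad
    types = ('sign', 'quad')
    allResults = []
    for type1 in types:
        for type2 in types:
            # template lines start at column 0: indentation would express embedding
            query = f'''
{type1}
-op{op}> {type2}
'''
            results = A.search(query, silent=True)
            print(f'{len(results):>5} {type1} {op} {type2}')
            allResults += results
    # grand total over all type combinations
    print(f'{len(allResults):>5} {op}')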

Now we can count all combinations with `x`:

In [59]:
getCombi('=x')

 2238 sign =x sign
  105 sign =x quad
    3 quad =x sign
    0 quad =x quad
 2346 =x


In [60]:
getCombi('=.')

  985 sign =. sign
   43 sign =. quad
   14 quad =. sign
    0 quad =. quad
 1042 =.


In [61]:
getCombi('=&')

  220 sign =& sign
    1 sign =& quad
    0 quad =& sign
    1 quad =& quad
  222 =&


In [62]:
getCombi('=+')

  199 sign =+ sign
    0 sign =+ quad
    0 quad =+ sign
    1 quad =+ quad
  200 =+


In exact agreement with the results of `E.op.freqList()` above.
But we are more flexible!

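We can ask for more operators at the same time, either by listing alternatives with `|`, or by using a regular expression with `~` instead of `=`.
In the second example below, `[^a-z]` matches any operator that is not a lowercase letter, i.e. every operator except `x`.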

In [63]:
getCombi('=x|+')

 2437 sign =x|+ sign
  105 sign =x|+ quad
    3 quad =x|+ sign
    1 quad =x|+ quad
 2546 =x|+


In [64]:
getCombi('~[^a-z]')

 1404 sign ~[^a-z] sign
   44 sign ~[^a-z] quad
   14 quad ~[^a-z] sign
    2 quad ~[^a-z] quad
 1464 ~[^a-z]
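
As a cross-check, the regular-expression count can be reproduced from the frequency list with Python's `re` module. This assumes that `~` in a template behaves like `re.search`, i.e. it matches anywhere in the value.

```python
import re

# Operator frequencies as reported by E.op.freqList() above.
freqs = {"x": 2346, ".": 1042, "&": 222, "+": 200}

# [^a-z] selects operators containing a character that is not a lowercase letter.
pattern = re.compile(r"[^a-z]")
selected = {op: n for (op, n) in freqs.items() if pattern.search(op)}

print(selected, sum(selected.values()))
```

The total, 1042 + 222 + 200 = 1464, agrees with the count found by `getCombi('~[^a-z]')`.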


Finally, we zoom in on the rare cases where the operator `x` is used between a quad and a sign.
We want to see the lines where they occur.

In [65]:
query = '''
line
  quad
  -op=x> sign
'''
results = A.search(query)
A.show(results, withNodes=True, lineart=True, condenseType="line")

  0.13s 3 results


Hint: if you want to see where these lines come from, hover over the line indicator, or click on it.

Alternatively, you can set the condense type to tablet.
And note that we have set the base type to `quad`, so that the pretty display does not unravel the quads.

In [66]:
A.show(results, withNodes=True, lineart=True, condenseType="tablet", baseType="quad")

## Regular expressions in templates
We can use regular expressions in our search templates.

### Digits in graphemes
We search for ideograph signs whose graphemes contain digits.
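
The condition `grapheme~[0-9]` holds when a digit occurs somewhere in the grapheme. Here is a small stand-alone illustration with Python's `re`. The grapheme list is hypothetical: it includes a few values that occur in the results below, plus two without digits.

```python
import re

graphemes = ["U4", "ZATU759", "GIR3", "UR4", "KASZ", "SZE"]

has_digit = re.compile(r"[0-9]")
print([g for g in graphemes if has_digit.search(g)])
```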

In [67]:
A.displaySetup(condensed=True)

In [68]:
query = '''
sign type=ideograph grapheme~[0-9]
'''
results = A.search(query)
A.table(results, withNodes=True, end=5)

  0.27s 14558 results


n,p,tablet,sign,sign.1,sign.2,sign.3,sign.4
1,P448702,tablet P448702,U4 75,U4 76,,,
2,P448703,tablet P448703,U4 97,U4 100,U4 87,U4 90,U4 93
3,P471695,tablet P471695,ZATU759 114,GIR3~c 140,UR4~a 111,,
4,P482082,tablet P482082,ZATU694~c 155,,,,
5,P482083,tablet P482083,U4 169,,,,


We can add a bit more context easily:

In [69]:
query = '''
tablet
  face
    column
      line
        sign type=ideograph grapheme~[0-9]
'''
results = A.search(query)
A.table(results, condensed=False, end=20)

  0.41s 14558 results


n,p,tablet,face,column,line,sign
1,P448702,tablet P448702,face obverse,P448702 column 2,P448702 1. U4 |U4x1(N01)| U4 1(N01) SAG SUKUD@h NA,U4
2,P448702,tablet P448702,face obverse,P448702 column 2,P448702 1. U4 |U4x1(N01)| U4 1(N01) SAG SUKUD@h NA,U4
3,P448703,tablet P448703,face obverse,P448703 column 1,P448703 1. |U4.1(N08)| U4 1(N08) X,U4
4,P448703,tablet P448703,face obverse,P448703 column 1,P448703 2. |U4.1(N08)| U4 1(N08) GI,U4
5,P448703,tablet P448703,face obverse,P448703 column 1,P448703 3. |U4.1(N08)| U4 1(N08) |GI&GI| GI GI,U4
6,P448703,tablet P448703,face obverse,P448703 column 1,P448703 4. |U4.1(N08)| U4 1(N08) X,U4
7,P448703,tablet P448703,face obverse,P448703 column 1,P448703 5. |U4.1(N08)| U4 1(N08) X,U4
8,P471695,tablet P471695,face obverse,P471695 column 1,P471695 1. 3(N01) APIN~a 3(N57) UR4~a (EN~a DU ZATU759)aEN~a DU ZATU759 (BAN~b KASZ~c)aBAN~b KASZ~c (KI@n SAG)aKI@n SAG,UR4~a
9,P471695,tablet P471695,face obverse,P471695 column 1,P471695 1. 3(N01) APIN~a 3(N57) UR4~a (EN~a DU ZATU759)aEN~a DU ZATU759 (BAN~b KASZ~c)aBAN~b KASZ~c (KI@n SAG)aKI@n SAG,ZATU759
10,P471695,tablet P471695,face obverse,P471695 column 2,P471695 1. 1(N01) ISZ~a (PAP~a GIR3~c)aPAP~a GIR3~c,GIR3~c


### Pit numbers

The feature `excavation` gives you the number of the pit where a tablet was found.
The syntax of pit numbers is a bit involved; here are a few possible values:

```
W 20497
W 20335,3
W 19948,10
W 20493,26
W 17890,b
W 17729,o
W 15920,b5
W 17729,aq
W 19548,a + W 19548,b
W 17729,cn + W 17729,eq
W 14337,a + W 14337,b + W 14337,c + W 14337,d + W 14337,e
Ashm 1928-445b
```

Let's assume we are interested in `SZITA~a1` signs occurring in cases of depth 1.
The following query finds them all:

In [70]:
query = '''
tablet
  case depth=1
    sign grapheme=SZITA variant=a1
'''
results = A.search(query)

  0.18s 78 results


Now we want to organize them by excavation number:

In [71]:
signPerPit = {}

for (tablet, case, sign) in sorted(results):
    pit = F.excavation.v(tablet) or 'no pit information'
    signPerPit.setdefault(pit, []).append(sign)

for pit in sorted(signPerPit):
    print(f'{pit:<30} {len(signPerPit[pit]):>2}')

Ashm 1926,562                   1
Ashm 1926,567                   1
Ashm 1926,569                  13
Ashm 1926,695+737+741           6
Ashm 1926,716+732               1
Ashm 1926,739                   1
W 14731,z                       1
W 14777,c                       4
W 15776,i                       1
W 15785,a2                      1
W 15833,a01 + W 15833,aa04      1
W 15897,b5                      1
W 15897,c26                     1
W 20274,001                     1
W 20274,043                     1
W 20274,095                     2
W 20274,119                     1
W 20327,01                      1
W 20327,03                      1
W 20511,01                      1
W 20511,02                      6
W 21157                         1
W 21194                         1
W 21733,1                       3
W 22100,01                      4
W 22100,03                      5
W 22101,1                       1
W 23950                         1
W 23973,01                      1
W 24033,05    

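We can restrict results to those on tablets found in certain pits by constraining the search template.
If we are interested in pit `20274`, we can use a regular expression that matches all 4 detailed pit numbers
based on `20274`: with `=` the whole value must be equal to what we specify, whereas with `~` it is enough that the regular expression matches somewhere in the value.
So, we do not say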

```
excavation=20274
```
but 

```
excavation~20274
```
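
The difference between `=` and `~` can be mimicked with string equality versus `re.search`. The pit numbers below are taken from the list above. The sketch assumes that `~` matches anywhere in the value, which is consistent with the results below.

```python
import re

pits = ["W 20274,001", "W 20274,095", "W 21157", "Ashm 1926,569"]

equal = [p for p in pits if p == "20274"]              # like excavation=20274
contains = [p for p in pits if re.search("20274", p)]  # like excavation~20274
with_w = [p for p in pits if re.search("W", p)]        # like excavation~W

print(equal, contains, with_w)
```

Note that `~20274` would also accept any other pit number that merely contains those digits.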

In [72]:
query = '''
tablet excavation~20274
  case depth=1
    sign grapheme=SZITA variant=a1
'''
results = A.search(query)
A.table(results, condensed=False, lineart=False)

  0.19s 5 results


n,p,tablet,case,sign
1,P003617,tablet P003617,2b. SZITA~a1 BU~a,SZITA~a1
2,P003499,tablet P003499,2a. GAL~a SZITA~a1,SZITA~a1
3,P003541,tablet P003541,1b. GESZTU~b SZITA~a1 ZATU686~a,SZITA~a1
4,P003593,tablet P003593,2a. [...]... GADA~a SZITA~a1 X,SZITA~a1
5,P003593,tablet P003593,3b. GESZTU~b SZITA~a1 ZATU686~a,SZITA~a1


Or if we want to restrict ourselves to pit numbers with a `W`, we can say:

In [73]:
query = '''
tablet excavation~W
  case depth=1
    sign grapheme=SZITA variant=a1
'''
results = A.search(query)

  0.19s 42 results


## Quantifiers in templates

So far we have seen only very positive templates.
They express what you want to see in the result.

It is also possible to state conditions about what you do not want to see in the results.

### Tablets without case divisions

Let's find all tablets in which all lines are undivided, i.e. lines without cases.

In [74]:
query = '''
tablet
/without/
  case
/-/
'''

The expression

```
/without/
template
/-/
```

is a [quantifier](https://annotation.github.io/text-fabric/Use/Search/#quantifiers).

It poses a condition on the preceding line in the template, in this case the `tablet`.
And the condition is that the template

```
tablet
  case
```

does not have results.
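
In plain Python terms, `/without/` keeps an outer node only if the inner template finds nothing for it. Below is a minimal sketch on a hypothetical miniature corpus; the p-numbers are made up.

```python
# Hypothetical miniature corpus: tablet -> list of its cases.
tablets = {
    "P000001": [],
    "P000002": ["1a", "1b"],
    "P000003": [],
}

# tablet /without/ case /-/ : keep tablets for which "tablet\n  case" has no result.
without_cases = [t for (t, cases) in tablets.items() if not cases]
print(without_cases)
```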

In [75]:
results = A.search(query)

  0.03s 5384 results


In [76]:
A.show(results, end=2)

Now let's find cases without numerals.

In [77]:
query = '''
case
/without/
  sign type=numeral
/-/
'''
results = A.search(query)

  0.21s 2833 results


We show a few.

In [78]:
A.show(results, end=2)

Now we can use this to get something more sophisticated: tablets that have no numerals in their cases, so that numerals can only occur on undivided lines.

We want tablets that do have cases, just no cases with numerals.

In [79]:
query = '''
tablet
/where/
  case
/have/
  /without/
    sign type=numeral
  /-/
/-/
/with/
  case
/-/
'''
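
The template above combines an "every case satisfies ..." condition (`/where/ ... /have/`) with an existence condition (`/with/`). Below is a rough sketch of that logic on hypothetical data. It also shows why the `/with/` part is needed: `all()` over a tablet without cases is vacuously true.

```python
# Hypothetical miniature corpus: tablet -> {case: sign types in that case}
tablets = {
    "T1": {},                                          # no cases at all
    "T2": {"1a": ["ideograph"], "1b": ["ideograph"]},  # cases without numerals
    "T3": {"1a": ["numeral", "ideograph"]},            # a case with a numeral
}

def no_numeral(types):
    return "numeral" not in types

# /where/ case /have/ /without/ numeral: every case lacks numerals;
# /with/ case: and the tablet has at least one case.
selected = [
    t
    for (t, cases) in tablets.items()
    if all(no_numeral(types) for types in cases.values()) and cases
]
print(selected)
```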

In [80]:
results = A.search(query) 

  0.02s 53 results


In [81]:
A.show(results, end=2)

Can we find such tablets that do have numerals on their undivided lines?

We show here a way to use the results of one query in another one: 
*custom sets*.

We put the set of tablets with cases but without numerals in cases in a set called `cntablet`.

We run the query again, but now in shallow mode, so that the result is a set.

By the way: read more about custom sets and shallow mode in the description of
[`A.search()`](https://annotation.github.io/text-fabric/Api/General/#search-api).

In [82]:
results = A.search(query, shallow=True)
customSets = dict(cntablet=results)

  0.02s 53 results


Now we can perform a very simple query for numerals on this set: we want tablets with numerals.
By restricting ourselves to this set, we know that these numerals must occur on undivided lines.
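
Using a custom set as a node type can be thought of as a membership filter on the nodes that match it. Below is a minimal sketch with made-up node numbers. It only illustrates the effect, not how `A.search()` handles `sets` internally.

```python
# Hypothetical node numbers.
cntablet = {7, 9}  # tablets found by the shallow query
numeral_hits = [(7, 101), (8, 102), (9, 103), (9, 104)]  # (tablet, numeral sign)

# Keep only the hits whose tablet belongs to the custom set.
restricted = [(t, s) for (t, s) in numeral_hits if t in cntablet]
print(restricted)
```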

In [83]:
query = '''
cntablet
  sign type=numeral
'''
results = A.search(query, sets=customSets)

  0.17s 160 results


In [85]:
A.show(results, end=2, showFeatures=False)

We could have found these results with a single query as well.
Judge for yourself which method causes the least friction.

In [86]:
query = '''
tablet
/without/
  case
    sign type=numeral
/-/
/with/
  case
/-/
  sign type=numeral
'''
results = A.search(query)
A.show(results, end=2, showFeatures=False)

  0.16s 160 results


## Search and hand-coding

Now we want to find all the ShinPP numerals.

In [87]:
shinPP = dict(
    N41=0.2,
    N04=1,
    N19=6,
    N46=60,
    N36=180,
    N49=1800,
)

shinPPPat = '|'.join(shinPP)

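We make use of the fact that we can construct our template programmatically: `shinPPPat` is `N41|N04|N19|N46|N36|N49`, a list of alternative values for `grapheme`.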

In [89]:
query = f'''
tablet
  sign grapheme={shinPPPat}
'''
results = A.search(query)
A.table(results, end=20, lineart=True)

  0.17s 1018 results


n,p,tablet,sign,sign.1,sign.2,sign.3,sign.4,sign.5,sign.6,sign.7
1,P448701,tablet P448701,2(N19),4(N41),1(N46),,,,,
2,P006005,tablet P006005,1(N04),,,,,,,
3,P002329,tablet P002329,2(N19),,,,,,,
4,P002342,tablet P002342,1(N36),2(N19),,,,,,
5,P002344,tablet P002344,1(N04),,,,,,,
6,P002398,tablet P002398,1(N04),,,,,,,
7,P002622,tablet P002622,5(N19),1(N46),4(N19),,,,,
8,P002626,tablet P002626,1(N41),,,,,,,
9,P003330,tablet P003330,3(N46),2(N49),5(N19),2(N04),1(N41),,,
10,P003357,tablet P003357,1(N04),,,,,,,


Let's see a few tablets in more detail:

In [91]:
A.show(results, end=5, showFeatures=False)

### A tablet calculator

Rather than displaying search results, you can also *process* them in your program.

Search results come as tuples of nodes that correspond directly to the elements
of your search template.

We query for shinPP numerals on the faces of tablets.
The result of the query is a list of tuples `(t, f, s)` consisting of
a tablet node, a face node and a node for a sign of a shinPP numeral.

#### Rationale
This task will require a higher level of programming skills and a deeper knowledge of how
Python works.
We include it in this tutorial to get the message across that Text-Fabric is not
a black box that shields you from your data. Everything you handle in Text-Fabric is 
open to further programming and processing of your own design and choosing.

#### Data collection

In [92]:
query = f'''
tablet
    face
        sign type=numeral grapheme={shinPPPat}
'''
results = A.search(query)

  0.18s 1018 results


We are going to put all these numerals in buckets: for each face on each tablet a separate bucket.

In [93]:
numerals = {}
pNums = {}
for (tablet, face, sign) in results:
    pNums[F.catalogId.v(tablet)] = tablet
    numerals.setdefault(tablet, {}).setdefault(face, []).append(sign)
print(f'{len(pNums)} tablets')
print('\n'.join(list(pNums)[0:10]))
print('...')

235 tablets
P448701
P006005
P002329
P002342
P002344
P002398
P002622
P002626
P003330
P003357
...


#### The calculator
We define a function that, given a tablet by its P-number, adds up its shinPP numerals face by face.
We also show the line art and a pretty transcription.

The function is a bit involved.
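Without the display calls, the core of the calculator consists of grouping and multiplying. The sketch below mirrors that logic outside Text-Fabric, under several assumptions:

* Signs are given as `(repeat, grapheme)` pairs rather than nodes.
* The helpers `atf` and `calcFace` are hypothetical. The real function uses `A.atfFromSign` and the `repeat` and `grapheme` features instead.
* `shinPP` is an excerpt with the values shown in the output.
* The input only reproduces the counts reported for the obverse of the third tablet under "Calculate ad lib". It does not reproduce the actual order of the signs.

```python
from collections import defaultdict

# Hypothetical excerpt of shinPP (grapheme -> value), as seen in the calculator output
shinPP = {"N04": 1, "N19": 6, "N36": 180, "N41": 0.2, "N46": 60}


def atf(repeat, grapheme):
    """Hypothetical helper: render a numeral as ATF, e.g. (5, "N36") -> "5(N36)"."""
    return f"{repeat}({grapheme})"


def calcFace(signs):
    """Hypothetical helper: add up numerals given as (repeat, grapheme) pairs."""
    distinctSigns = defaultdict(list)  # ATF string -> all occurrences of that numeral
    for (repeat, grapheme) in signs:
        distinctSigns[atf(repeat, grapheme)].append((repeat, grapheme))
    lines = []
    total = 0
    for (signAtf, sameSigns) in distinctSigns.items():
        (repeat, grapheme) = sameSigns[0]  # all occurrences are identical
        amount = len(sameSigns)
        value = amount * repeat * shinPP[grapheme]
        total += value
        lines.append(f"{amount} x {signAtf} = {amount} x {repeat} x {shinPP[grapheme]} = {value}")
    return (lines, total)


# Counts matching the obverse of the third tablet in the "Calculate ad lib" output
obverse = (
    [(5, "N36")] + [(1, "N46")] * 4 + [(1, "N36")] * 2
    + [(2, "N46"), (1, "N04"), (1, "N19")] + [(2, "N36")] * 2 + [(2, "N19")]
)
(lines, total) = calcFace(obverse)
print("\n".join(lines))
print(f"total = {total}")
```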

In [96]:
# we generate Markdown strings and send them to the notebook formatter

def dm(x): display(Markdown(x))

def calcTablet(pNum): # pNum identifies the tablet in question
    # show a horizontal line in Markdown
    dm('---\n')     
    tablet = pNums.get(pNum, None)  # look up the node for this p-number
    if tablet is None:
        dm(f'**no results for {pNum}**')
        return                      # not found: the tablet has no shinPP numerals, so quit
    
    A.lineart(tablet, withCaption="top", width="200")   # show lineart
    faces = numerals[tablet]                    # get the buckets for the faces
    mySigns = []
    for (face, signs) in faces.items():         # work per face 
        mySigns.extend(signs)
        dm(f'##### {F.type.v(face)}')           # show the name of the face
        distinctSigns = {}                      # collect the distinct numerals
        for s in signs:
            distinctSigns.setdefault(A.atfFromSign(s), []).append(s)
        A.lineart(distinctSigns)      # display the list of signs
        total = 0                               # start adding up
        for (signAtf, sameSigns) in distinctSigns.items():   # sameSigns: all occurrences of one numeral
            value = 0
            for s in sameSigns:
                value += F.repeat.v(s) * shinPP[F.grapheme.v(s)]
            total += value
            amount = len(sameSigns)             # we report our calculation
            shinPPval = shinPP[F.grapheme.v(sameSigns[0])]
            repeat = F.repeat.v(sameSigns[0])
            print(f'{amount} x {signAtf} = {amount} x {repeat} x {shinPPval} = {value}')
        dm(f'**total** = **{total}**')
    A.prettyTuple([tablet] + mySigns, 1, showFeatures=False) # show pretty transcription

#### Calculate once

In [97]:
calcTablet('P006377')

---


##### obverse

1 x 1(N46) = 1 x 1 x 60 = 60
1 x 5(N19) = 1 x 5 x 6 = 30
4 x 3(N04) = 4 x 3 x 1 = 12
2 x 1(N41) = 2 x 1 x 0.2 = 0.4
8 x 1(N19) = 8 x 1 x 6 = 48
2 x 3(N19) = 2 x 3 x 6 = 36
5 x 1(N04) = 5 x 1 x 1 = 5
3 x 2(N04) = 3 x 2 x 1 = 6
3 x 2(N19) = 3 x 2 x 6 = 36
1 x 2(N41) = 1 x 2 x 0.2 = 0.4
2 x 4(N04) = 2 x 4 x 1 = 8
1 x 3(N41) = 1 x 3 x 0.2 = 0.6000000000000001
1 x 4(N19) = 1 x 4 x 6 = 24


**total** = **266.4**

##### reverse

1 x 1(N36) = 1 x 1 x 180 = 180
1 x 1(N46) = 1 x 1 x 60 = 60
1 x 8(N19) = 1 x 8 x 6 = 48
1 x 5(N04) = 1 x 5 x 1 = 5
1 x 3(N41) = 1 x 3 x 0.2 = 0.6000000000000001


**total** = **293.6**
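The `0.6000000000000001` lines are floating-point noise. N41 counts as 0.2, which has no exact binary representation, so `3 x 0.2` comes out slightly off. The totals still print as expected here. If exact sums matter, one option is to store the values as `fractions.Fraction`. This is our suggestion and not what the calculator above does. Here is a minimal sketch for the reverse of P006377, using the values shown in the output.

```python
from fractions import Fraction

print(3 * 0.2)             # binary floating point: 0.6000000000000001
print(3 * Fraction(1, 5))  # exact: 3/5

# Excerpt of the shinPP values as exact fractions (N41 is one fifth)
shinPPExact = {
    "N04": Fraction(1),
    "N19": Fraction(6),
    "N36": Fraction(180),
    "N41": Fraction(1, 5),
    "N46": Fraction(60),
}

# Reverse of P006377: (amount, repeat, grapheme) for each distinct numeral
reverse = [(1, 1, "N36"), (1, 1, "N46"), (1, 8, "N19"), (1, 5, "N04"), (1, 3, "N41")]

total = sum(amount * repeat * shinPPExact[grapheme] for (amount, repeat, grapheme) in reverse)
print(total, float(total))  # 1468/5 293.6
```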

#### Calculate ad lib
Now for the first 5 tablets, ordered by P-number.

In [98]:
for pNum in sorted(pNums)[0:5]:   # the first 5 P-numbers in sorted order
    calcTablet(pNum)

---


##### obverse

1 x 1(N04) = 1 x 1 x 1 = 1


**total** = **1**

---


##### obverse

2 x 1(N36) = 2 x 1 x 180 = 360


**total** = **360**

##### reverse

1 x 3(N36) = 1 x 3 x 180 = 540


**total** = **540**

---


##### obverse

1 x 5(N36) = 1 x 5 x 180 = 900
4 x 1(N46) = 4 x 1 x 60 = 240
2 x 1(N36) = 2 x 1 x 180 = 360
1 x 2(N46) = 1 x 2 x 60 = 120
1 x 1(N04) = 1 x 1 x 1 = 1
1 x 1(N19) = 1 x 1 x 6 = 6
2 x 2(N36) = 2 x 2 x 180 = 720
1 x 2(N19) = 1 x 2 x 6 = 12


**total** = **2359**

---


##### obverse

1 x 1(N04) = 1 x 1 x 1 = 1


**total** = **1**

---


##### obverse

1 x 1(N36) = 1 x 1 x 180 = 180


**total** = **180**

## More ...

Search can do much more than we have shown in this chapter.
Often it is the quickest way to focus on a phenomenon, quicker than hand coding all the logic
to retrieve your patterns.

That said, it is not a matter of either-or. You can use coding to craft your templates,
and you can use coding to process your results.

It's an explosive mix. A later chapter in this tutorial shows
even more [cases](cases.ipynb).

Have another look at
[the manual](https://annotation.github.io/text-fabric/Use/Search/).

# Next

[signs](signs.ipynb)

*Back to the basics ...*

All chapters:
[start](start.ipynb)
[imagery](imagery.ipynb)
[steps](steps.ipynb)
[search](search.ipynb)
[signs](signs.ipynb)
[quads](quads.ipynb)
[jumps](jumps.ipynb)
[cases](cases.ipynb)