<img align="right" src="images/tf.png" width="128"/>
<img align="right" src="images/logo.png" width="128"/>
<img align="right" src="images/etcbc.png" width="128"/>
<img align="right" src="images/dans.png" width="128"/>

---

To get started: consult [start](start.ipynb)

---

# Search Introduction

*Search* in Text-Fabric is a template-based way of looking for structural patterns in your dataset.

It is inspired by the idea of
[topographic query](http://books.google.nl/books?id=9ggOBRz1dO4C).

Within Text-Fabric we have the unique possibility to combine the ease of formulating search templates for
complicated patterns with the power of programmatically processing the results.

This notebook will show you how to get up and running.

## Alternative for hand-coding

Search is a powerful feature for a wide range of purposes.

Quite a bit of the implementation work has been dedicated to optimizing performance.
Yet I do not pretend to have found optimal strategies for all 
possible search templates.
Some search tasks may turn out to be somewhat costly or even very costly.

That being said, I think search might turn out helpful in many cases,
especially by reducing the amount of hand-coding needed to work with special subsets of your data.

## Easy command

Searching is as simple as this (just an example):

```python
results = A.search(template)
A.show(results)
```

See all ins and outs in the
[search template docs](https://annotation.github.io/text-fabric/Use/Search/#search-templates).

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from tf.app import use

In [18]:
A = use('dss:hot', hoist=globals())

	connecting to online GitHub repo annotation/app-dss ... connected
	code/app.py...downloaded
	code/config.py...downloaded
	code/static...directory
		code/static/display.css...downloaded
		code/static/logo.png...downloaded
	OK
Using TF-app in /Users/dirk/text-fabric-data/annotation/app-dss/code:
	#fa0361f04d87c2314095bdefc4a66645d6e008e8 (latest commit)
	connecting to online GitHub repo etcbc/dss ... connected
Using data in /Users/dirk/text-fabric-data/etcbc/dss/tf/0.3:
	#62c1b150a7265916e48727b2286f192443d968fc (latest commit)


# Basic search command

We start with the most simple form of issuing a query.

Let's look for verbs in the `hitpael` that contain an uncertain sign.

All work involved in searching takes place under the hood.

In [19]:
query = '''
word vs=hitpael
  sign unc
'''
results = A.search(query)
A.table(results, end=10)

  1.39s 399 results


n,p,word,sign
1,CD 12:23,מתהלכים,מ
2,CD 12:23,מתהלכים,כ
3,CD 12:23,מתהלכים,י
4,CD 12:23,מתהלכים,ם
5,CD 15:11,יתפתה,ת
6,CD 19:4,יתהלכו,י
7,1QS 7:24,יתערב,י
8,1QS 7:24,יתערב,ת
9,1QSa 1:11,התיצב,ת
10,1QSb 4:2,התערב,ה


We have multiple uncertain signs per word, 
and for each sign we see the whole word repeated.

We can condense our results to words:

In [20]:
A.table(results, end=10, condensed=True, condenseType='word')

n,p,word,sign,sign.1,sign.2,sign.3
1,CD 12:23,מתהלכים,כ,י,ם,מ
2,CD 15:11,יתפתה,ת,,,
3,CD 19:4,יתהלכו,י,,,
4,1QS 7:24,יתערב,י,ת,,
5,1QSa 1:11,התיצב,ת,,,
6,1QSb 4:2,התערב,ה,,,
7,1QM 13:12,יתהלכו,ו,,,
8,1QM 15:15,מתעתדים,ת,,,
9,1QM 16:6,התקרב,ר,,,
10,1QM 16:13,מתקרבים,י,,,


We can show them in rich layout as well:

In [21]:
A.table(results, end=10, condensed=True, condenseType='word', fmt='layout-orig-full')

n,p,word,sign,sign.1,sign.2,sign.3
1,CD 12:23,מתהלכים,כ,י,ם,מ
2,CD 15:11,יתפתה,ת,,,
3,CD 19:4,יתהלכו,י,,,
4,1QS 7:24,יתערב,י,ת,,
5,1QSa 1:11,התיצב,ת,,,
6,1QSb 4:2,התערב,ה,,,
7,1QM 13:12,יתהלכו,ו,,,
8,1QM 15:15,מתעתדים,ת,,,
9,1QM 16:6,התקרב,ר,,,
10,1QM 16:13,מתקרבים,י,,,


Note that we can choose start and/or end points in the results list.

In [22]:
A.table(results, start=100, end=110, condensed=True, condenseType='word', fmt='layout-orig-full')

n,p,word,sign,sign.1,sign.2,sign.3,sign.4
100,4Q405 f23i:11,יתכונו,ו,י,,,
101,4Q405 f25:3,מתהלכים,מ,ת,ה,כ,ם
102,4Q415 f12:4,התשׁשׁו #,שׁ,ו,,,
103,4Q417 f1i:12,התהלכ׳ו,ו,,,,
104,4Q417 f1i:23,תתחזק,ח,ז,ת,,
105,4Q417 f2i:8,תתהלך,ת,ה,,,
106,4Q417 f19:4,יתהלך,ך,י,ת,ה,
107,4Q418 f2+2a_c:4,יתר,ר,,,,
108,4Q418 f9+9a_c:9,תתהלך,ת,ל,ך,ת,
109,4Q418 f9+9a_c:10,תתהלך,ת,ת,ה,,


We can show the results more fully with `show()`.

In [23]:
A.show(results, fmt='layout-orig-full', start=1, end=3)

# Condense results

There are two fundamentally different ways of presenting the results: condensed and uncondensed.

In **uncondensed** view, all results are listed individually.
You can keep track of which parts belong to which results.
The display can become unwieldy.

This is the default view, because it is the most direct and logical answer to your query.

In **condensed** view all nodes of all results are grouped in containers first (e.g. lines), and then presented 
container by container.
You lose the information of which parts belong to which result.
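
To make the difference concrete, here is a minimal pure-Python sketch of the grouping that condensed view implies. It does not call Text-Fabric: the node numbers are made up, and `container_of` is a hypothetical lookup standing in for whatever mechanism finds the enclosing line of a node.

```python
from collections import defaultdict

# Toy results: each tuple is (line, word), using made-up node numbers.
results = [
    (100, 11),
    (100, 12),
    (100, 13),
    (200, 21),
]

# Hypothetical containment: words 11-13 sit in line 100, word 21 in line 200.
# A line counts as its own container in this toy data.
container_of = {100: 100, 11: 100, 12: 100, 13: 100, 200: 200, 21: 200}


def condense(results, container_of):
    """Group all nodes of all results by their container."""
    grouped = defaultdict(set)
    for result in results:
        for node in result:
            grouped[container_of[node]].add(node)
    return dict(grouped)


condensed = condense(results, container_of)
print(len(results), "uncondensed results")
print(len(condensed), "condensed containers")
```

After grouping, each container only knows the set of nodes to highlight. Which node came from which result tuple is no longer recorded, which is exactly the information that condensed view gives up.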

As an example of the difference, we look for all proper nouns.

In [39]:
query = '''
line scroll=1Q1
  word sp=subs cl=prp
'''

Note that you can have comments in a search template. Comment lines start with a `%`.
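
As a small illustration, here is the same template with two comment lines added. The helper `effective_lines` is purely illustrative and not a Text-Fabric function; it only shows which lines remain once blank lines and lines starting with `%` are set aside.

```python
# A template with comment lines; each comment starts with `%` in column 0.
query = '''
% proper nouns in the lines of scroll 1Q1
line scroll=1Q1
% the word must be a proper noun
  word sp=subs cl=prp
'''


def effective_lines(template):
    """Return the template lines that are neither blank nor comments.

    Illustrative helper only, not part of Text-Fabric.
    """
    return [
        line
        for line in template.splitlines()
        if line.strip() and not line.startswith('%')
    ]


print(effective_lines(query))
```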

In [40]:
results = A.search(query)
A.table(results, end=100)

  0.55s 20 results


n,p,line,word
1,1Q1 f2:3,יהוה אלהים לאשׁה מה זאת עשׁית ותאמר האשׁה הנחשׁ השׁיא׳ני ואכל ׃ ויאמר יהוה,יהוה
2,1Q1 f2:3,יהוה אלהים לאשׁה מה זאת עשׁית ותאמר האשׁה הנחשׁ השׁיא׳ני ואכל ׃ ויאמר יהוה,יהוה
3,1Q1 f3:1,בקרני׳ו וילך אברהם ויקח את האיל ויעל׳הו לעלה תחת בנ׳ו ׃,אברהם
4,1Q1 f3:2,ויקרא אברהם את שׁם המקום ההוא יהוה יראה אשׁר יאמר היום בהר,אברהם
5,1Q1 f3:2,ויקרא אברהם את שׁם המקום ההוא יהוה יראה אשׁר יאמר היום בהר,יהוה
6,1Q1 f3:3,יהוה יראה ׃ ויקרא מלאך יהוה אל אברהם שׁנית מן השׁמים ׃,יהוה
7,1Q1 f3:3,יהוה יראה ׃ ויקרא מלאך יהוה אל אברהם שׁנית מן השׁמים ׃,יהוה
8,1Q1 f3:3,יהוה יראה ׃ ויקרא מלאך יהוה אל אברהם שׁנית מן השׁמים ׃,אברהם
9,1Q1 f4:1,ε ויקם שׁדה עפרון אשׁר,עפרון
10,1Q1 f4:2,במכפלה אשׁר לפני ממרא השׁדה והמערה אשׁר ב׳ו וכול העץ,מכפלה


We zoom in on line 3 of fragment `f3`:

In [41]:
query = '''
line scroll=1Q1 fragment=f3 line=3
  word sp=subs cl=prp
'''

In [43]:
results = A.search(query)
A.table(results)

  0.56s 3 results


n,p,line,word
1,1Q1 f3:3,יהוה יראה ׃ ויקרא מלאך יהוה אל אברהם שׁנית מן השׁמים ׃,יהוה
2,1Q1 f3:3,יהוה יראה ׃ ויקרא מלאך יהוה אל אברהם שׁנית מן השׁמים ׃,יהוה
3,1Q1 f3:3,יהוה יראה ׃ ויקרא מלאך יהוה אל אברהם שׁנית מן השׁמים ׃,אברהם


Let's expand the results display:

In [44]:
A.show(results)

As you see, the results are listed per result tuple, even if they all occur in the same line.
This way you can keep track of what exactly belongs to each result.

Now in condensed mode:

In [45]:
A.show(results, condensed=True)

This line has 3 results, and all of them are highlighted in the same line display.

We can modify the container in which we see our results.

By default, it is `line`, but we can make it `fragment` as well:

In [46]:
A.show(results, condensed=True, condenseType='fragment')

We now see the display of the whole fragment, with the line containing the proper names highlighted
and the proper names themselves highlighted as well.

# Custom highlighting

Let us make a new search where we look for three different things in the same line.

We can apply different highlight colors to different parts of the result.
Each result tuple consists of the line followed by the three signs, and `colorMap` assigns colors to positions in that tuple.
The members that we do not map will not be highlighted.
The members that we map to the empty string will be highlighted with the default color.

**NB:** Choose your colors from the
[CSS specification](https://developer.mozilla.org/en-US/docs/Web/CSS/color_value).

In [13]:
query = '''
line
  sign missing=1
  sign question=1
  sign damage=1
'''

In [14]:
results = A.search(query)
A.table(results, end=10)

  0.56s 776 results


n,p,line,sign,sign.1,sign.2
1,P510530 obverse:5,{disz}[{d}x]-ra#?-bi il-qe2-a-am-ma,[{d},ra#?-,ra#?-
2,P510530 obverse:5,{disz}[{d}x]-ra#?-bi il-qe2-a-am-ma,x]-,ra#?-,ra#?-
3,P510530 reverse:11,[szu-bu-lim?] sza# ta-asz-pu-ra-am,[szu-,lim?],sza#
4,P510530 reverse:11,[szu-bu-lim?] sza# ta-asz-pu-ra-am,bu-,lim?],sza#
5,P510530 reverse:11,[szu-bu-lim?] sza# ta-asz-pu-ra-am,lim?],lim?],sza#
6,P510534 reverse:15',li#-iq#-bi-ma da-ba-ab#-szu li#?-in-na-me-[er],[er],li#?-,li#-
7,P510534 reverse:15',li#-iq#-bi-ma da-ba-ab#-szu li#?-in-na-me-[er],[er],li#?-,iq#-
8,P510534 reverse:15',li#-iq#-bi-ma da-ba-ab#-szu li#?-in-na-me-[er],[er],li#?-,ab#-
9,P510534 reverse:15',li#-iq#-bi-ma da-ba-ab#-szu li#?-in-na-me-[er],[er],li#?-,li#?-
10,P510534 reverse:20',li#-il#-qe2#?-[szu-ma?],[szu-,qe2#?-,li#-


In [15]:
A.table(results, end=10, colorMap={0: '', 2: 'cyan', 3: 'magenta', 4: 'lightsalmon'})

n,p,line,sign,sign.1,sign.2
1,P510530 obverse:5,{disz}[{d}x]-ra#?-bi il-qe2-a-am-ma,[{d},ra#?-,ra#?-
2,P510530 obverse:5,{disz}[{d}x]-ra#?-bi il-qe2-a-am-ma,x]-,ra#?-,ra#?-
3,P510530 reverse:11,[szu-bu-lim?] sza# ta-asz-pu-ra-am,[szu-,lim?],sza#
4,P510530 reverse:11,[szu-bu-lim?] sza# ta-asz-pu-ra-am,bu-,lim?],sza#
5,P510530 reverse:11,[szu-bu-lim?] sza# ta-asz-pu-ra-am,lim?],lim?],sza#
6,P510534 reverse:15',li#-iq#-bi-ma da-ba-ab#-szu li#?-in-na-me-[er],[er],li#?-,li#-
7,P510534 reverse:15',li#-iq#-bi-ma da-ba-ab#-szu li#?-in-na-me-[er],[er],li#?-,iq#-
8,P510534 reverse:15',li#-iq#-bi-ma da-ba-ab#-szu li#?-in-na-me-[er],[er],li#?-,ab#-
9,P510534 reverse:15',li#-iq#-bi-ma da-ba-ab#-szu li#?-in-na-me-[er],[er],li#?-,li#?-
10,P510534 reverse:20',li#-il#-qe2#?-[szu-ma?],[szu-,qe2#?-,li#-


In [16]:
A.show(results, end=10, colorMap={0: '', 2: 'cyan', 3: 'magenta', 4: 'lightsalmon'})

Color mapping works best for uncondensed results. If you condense results, some nodes may occupy
different positions in different results. It is unpredictable which color will be used 
for such nodes:

In [17]:
A.show(results, condensed=True, end=10, colorMap={0: '', 2: 'cyan', 3: 'magenta', 4: 'lightsalmon'})
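
The ambiguity can be reproduced with a toy example that does not use Text-Fabric at all. The node numbers are made up, and `position_colors` is a hypothetical mapping that counts tuple members from 0; that numbering is an assumption of this sketch and says nothing about how `colorMap` itself numbers positions.

```python
# Toy results: (line, sign_a, sign_b) with made-up node numbers.
# Sign 7 is the second member of one result and the third member of the other.
results = [
    (500, 7, 8),
    (500, 6, 7),
]

# Hypothetical position-to-color mapping, counting tuple members from 0.
position_colors = {1: 'cyan', 2: 'magenta'}


def colors_per_node(results, position_colors):
    """Collect every color a node would receive across all results."""
    seen = {}
    for result in results:
        for position, node in enumerate(result):
            color = position_colors.get(position)
            if color is not None:
                seen.setdefault(node, set()).add(color)
    return seen


seen = colors_per_node(results, position_colors)
ambiguous = sorted(node for node, colors in seen.items() if len(colors) > 1)
print(ambiguous)
```

Once both results are merged into one line display, sign 7 can only be drawn in one color, and nothing in the data says which of the two candidates should win.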

You can specify to what container you want to condense. By default, everything is condensed to lines.

Let's change that to faces.
Note that the `end` parameter counts the number of faces now.

In [18]:
A.show(results, end=2, condensed=True, condenseType='face', colorMap={0: '', 2: 'cyan', 3: 'magenta', 4: 'lightsalmon'})

# Constraining order
You can stipulate an order on the things in your template.
You only have to put a relational operator between them.
Say we want only results where the damaged sign follows the missing sign.

In [19]:
query = '''
line
  sign question=1
  sign missing=1
  < sign damage=1
'''

In [20]:
results = A.search(query)
A.table(results, end=10)

  0.56s 372 results


n,p,line,sign,sign.1,sign.2
1,P510530 obverse:5,{disz}[{d}x]-ra#?-bi il-qe2-a-am-ma,ra#?-,[{d},ra#?-
2,P510530 obverse:5,{disz}[{d}x]-ra#?-bi il-qe2-a-am-ma,ra#?-,x]-,ra#?-
3,P510530 reverse:11,[szu-bu-lim?] sza# ta-asz-pu-ra-am,lim?],[szu-,sza#
4,P510530 reverse:11,[szu-bu-lim?] sza# ta-asz-pu-ra-am,lim?],bu-,sza#
5,P510530 reverse:11,[szu-bu-lim?] sza# ta-asz-pu-ra-am,lim?],lim?],sza#
6,P510550 obverse:9,_ha-za-nu-um{sar}_ i-ka-am#? [x] x-al#-la-x-ma,am#?,[x],al#-
7,P510565 reverse:4,[a-na?] _e2_ {d}utu szu-ri-ba-am#,na?],[a-,am#
8,P510565 reverse:4,[a-na?] _e2_ {d}utu szu-ri-ba-am#,na?],na?],am#
9,P510567 reverse:2,[...] ia#? _sag-us2_,ia#?,[...],ia#?
10,P510586 obverse:11,[{disz}]i-na-e2-ul-masz-numun# a-na#? [ma-ah-ri-ka?],na#?,[{disz}],numun#


We can also require the things to be adjacent.

In [21]:
query = '''
line
  sign question=1
  sign missing=1
  <: sign damage=1
'''

In [22]:
results = A.search(query)
A.table(results, end=10)
A.show(results, end=10, colorMap={0: '', 2: 'cyan', 3: 'magenta', 4: 'lightsalmon'})

  0.55s 84 results


n,p,line,sign,sign.1,sign.2
1,P510530 obverse:5,{disz}[{d}x]-ra#?-bi il-qe2-a-am-ma,ra#?-,x]-,ra#?-
2,P510530 reverse:11,[szu-bu-lim?] sza# ta-asz-pu-ra-am,lim?],lim?],sza#
3,P510567 reverse:2,[...] ia#? _sag-us2_,ia#?,[...],ia#?
4,P510586 obverse:27,[asz-szum ha]-bil2#?-we?-du-um sza asz-pu-ra#-[ak-kum],bil2#?-,ha]-,bil2#?-
5,P510586 obverse:27,[asz-szum ha]-bil2#?-we?-du-um sza asz-pu-ra#-[ak-kum],we?-,ha]-,bil2#?-
6,P510588 left:1':2,[i?]-na# zimbir{ki#},[i?]-,[i?]-,na#
7,P510597 obverse:12,[a?-na] babila2#[{ki}],[a?-,na],babila2#
8,P510657 envelope - obverse:1,[tup]-pi#? a-hi-i3-li2-ia,pi#?,[tup]-,pi#?
9,P510660 reverse:7,"[t,e4?]-em# _a-sza3_ ma-la a-di i-na-an-na","[t,e4?]-","[t,e4?]-",em#
10,P510662 reverse:1,[1(disz)?] _tug2#?_ a-la-qe2-am,[1(disz)?],[1(disz)?],_tug2#?_


Finally, we make the three things fully adjacent in fixed order:

In [23]:
query = '''
line
  sign question=1
  <: sign missing=1
  <: sign damage=1
'''

In [24]:
results = A.search(query)
A.table(results, end=10)
A.show(results, end=10, colorMap={0: '', 2: 'cyan', 3: 'magenta', 4: 'lightsalmon'})

  0.55s 7 results


n,p,line,sign,sign.1,sign.2
1,P510597 obverse:12,[a?-na] babila2#[{ki}],[a?-,na],babila2#
2,P292756 obverse:21,[sza? (...)] it#-ti-ia u2-ul in-nam-ru-ma,[sza?,(...)],it#-
3,P313393 obverse:3,um-ma ni?-[...]-ma#,ni?-,[...]-,ma#
4,P386470 reverse:15',[u2?-sze]-et#-bu-u2-[szu?],[u2?-,sze]-,et#-
5,P305755 reverse:3,u3 KA#?-[x]-UM#? a-hu-u2-szu qa2-ta-ti-szu-ma,KA#?-,[x]-,UM#?
6,P307155 obverse:11,mi-im#-ma# sza ta#-GA?-[x]-BU#? [x],GA?-,[x]-,BU#?
7,P308015 reverse:2,"{disz}UD?-[x] {ki#}-li-ib-lu-ut,",UD?-,[x],{ki#}-
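
The way each extra constraint narrows the result list can be imitated in plain Python. This is a sketch under a simplifying assumption: signs are numbered consecutively in text order, so "before" becomes a smaller number and "adjacent" becomes a difference of one. The tuples and the helpers `before` and `adjacent_before` are made up for illustration and are not Text-Fabric functions.

```python
# Toy candidates: (line, question_sign, missing_sign, damaged_sign).
# Signs are numbered in text order; all numbers are made up.
candidates = [
    (900, 3, 1, 2),  # missing (1) directly before damaged (2)
    (900, 3, 1, 5),  # missing (1) before damaged (5), but not adjacent
    (900, 3, 4, 2),  # damaged (2) precedes missing (4)
    (900, 3, 4, 5),  # question (3), missing (4), damaged (5) form a run
]


def before(a, b):
    """Sketch of `<`: a comes somewhere before b."""
    return a < b


def adjacent_before(a, b):
    """Sketch of `<:`: b follows a immediately."""
    return b == a + 1


# Only the missing and damaged signs are constrained.
ordered = [c for c in candidates if before(c[2], c[3])]
adjacent = [c for c in candidates if adjacent_before(c[2], c[3])]

# All three signs must follow each other directly.
chained = [
    c
    for c in candidates
    if adjacent_before(c[1], c[2]) and adjacent_before(c[2], c[3])
]

print(len(ordered), len(adjacent), len(chained))
```

Each stricter relation keeps a subset of the previous list, mirroring the shrinking result counts in the three searches above.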


# Custom feature display

We would like to see the original atf and the flags for signs.
The way to do that is to call `A.displaySetup(extraFeatures=...)` first.

We concentrate on one specific result.

In [25]:
A.displaySetup(extraFeatures='atf flags')

In [26]:
A.show(results, start=4, end=4, colorMap={0: '', 2: 'cyan', 3: 'magenta', 4: 'lightsalmon'})

The features without meaningful values have been left out. We can change that by passing a set of values
that we consider not meaningful. The default set is:

```python
{None, 'NA', 'none', 'unknown'}
```

In [27]:
A.displaySetup(noneValues=set())
A.show(results, start=4, end=4, colorMap={0: '', 2: 'cyan', 3: 'magenta', 4: 'lightsalmon'})

This makes clear that it is convenient to keep `None` in the `noneValues`:

In [28]:
A.displaySetup(noneValues={None})
A.show(results, start=4, end=4, colorMap={0: '', 2: 'cyan', 3: 'magenta', 4: 'lightsalmon'})

We can even choose to suppress other values, e.g. the value 1.

That will remove all the features such as `question`, `missing`.

In [29]:
A.displaySetup(noneValues={None, 'NA', 'unknown', 1})
A.show(results, start=4, end=4, colorMap={0: '', 2: 'cyan', 3: 'magenta', 4: 'lightsalmon'})
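
The effect of the different `noneValues` settings can be mimicked with a plain dictionary filter. The feature values below are invented for a single imaginary sign, and `visible_features` is an illustrative helper, not part of Text-Fabric.

```python
# Made-up feature values for a single sign node.
features = {
    'atf': 'na#',
    'flags': '#',
    'damage': 1,
    'question': None,
    'type': 'unknown',
}


def visible_features(features, none_values):
    """Keep only the features whose value is not in `none_values`."""
    return {
        name: value
        for name, value in features.items()
        if value not in none_values
    }


default_none = {None, 'NA', 'none', 'unknown'}

shown_default = visible_features(features, default_none)
shown_all = visible_features(features, set())
shown_no_ones = visible_features(features, {None, 'NA', 'unknown', 1})

print(sorted(shown_default))
print(sorted(shown_all))
print(sorted(shown_no_ones))
```

With an empty set every feature is shown, including those without a value; adding `1` to the set hides flag-like features such as `damage`.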

In the rest of the notebook we stick to our normal setup, so we reset the extra features.

In [30]:
A.displayReset()
A.show(results, start=4, end=4, colorMap={0: '', 2: 'cyan', 3: 'magenta', 4: 'lightsalmon'})

# Features from queries

In earlier displays we saw the *types* of signs, because the query mentioned it.

Suppose we want to display the type also here, then we can modify the query by mentioning the feature `type`.

But we do not want to impose extra limitations, so we say `type*`, meaning: no conditions on type whatsoever.

In [31]:
query = '''
line
  sign question=1 type*
  <: sign missing=1
  <: sign damage=1
'''

In [32]:
results = A.search(query)
A.show(results, start=4, end=4, colorMap={0: '', 2: 'cyan', 3: 'magenta', 4: 'lightsalmon'})

  0.56s 7 results


# Show your own tuples

So far we have `show()`n the results of searches.
But you can also construct your own tuples and show them.

Whereas you can use search to get a pretty good approximation of what you want, most of the time
you do not arrive precisely at your destination.

Here is an example where we use search to come close, and then work our way to produce the end result.

## More missing than damaged

We look for lines that have more missing signs than damaged signs.

In our search templates we cannot formulate that a feature has different values on two nodes in the template.
We could spell out all possible combinations of values and make a search template for each of them, 
but that is needlessly complex.

Let's first use search to find all lines containing missing and damaged signs.

In [33]:
query = '''
line
  sign missing
  sign damage
'''
results = A.search(query)

  0.32s 9900 results


Now the hand coding begins. We are going to extract the tuples we want.

In [34]:
# per line: (set of missing signs, set of damaged signs)
lines = {}
for (l, m, d) in results:
  lines.setdefault(l, (set(), set()))
  lines[l][0].add(m)
  lines[l][1].add(d)
print(f'{len(lines)} lines')

3031 lines


Now we have all lines with both missing and damaged signs, without duplicates.

For each line we have a set with its missing signs and one with its damaged signs.

We filter in order to retain the lines with more missing than damaged signs.
We put all missing signs in one big set and all damaged signs in one big set.

In [35]:
answer = []
missing = set()
damage = set()

for (l, (m, d)) in lines.items():
  if len(m) > len(d):
    answer.append((l, *m, *d))
    missing |= m
    damage |= d
len(answer)

1345

In [36]:
answer[0]

(230894, 955, 956, 954)

We are going to make a dictionary of highlights: one color for the missing signs and one for the damaged signs.

In [37]:
highlights = {}
colorM = 'lightsalmon'
colorD = 'mediumaquamarine'
for s in missing:
  highlights[s] = colorM
for s in damage:
  highlights[s] = colorD

And now we can show them:

In [38]:
A.table(answer, start=1, end=10, highlights=highlights)

n,p,line,sign,sign.1,sign.2,sign.3,sign.4,sign.5,sign.6,sign.7,sign.8
1,P509377 reverse:2,_sze_ u3 _ku3-babbar_ ad-di-na-ak#-[kum-ma],[kum-,ma],ak#-,,,,,,
2,P481192 obverse:10',_a-sza3_ u2-ul e-ri#-[isz ...],[isz,...],ri#-,,,,,,
3,P481192 obverse:13',[x (x)] x-a-tum it-ta#-za-az-za i-mu-ut-ta,[x,(x)],ta#-,,,,,,
4,P481192 obverse:14',[...] x tup-pu# [x x] a-na# ma#-har a-[wi-le-e],[...],[x,x],[wi-,le-,e],na#,ma#-,pu#
5,P481192 reverse:13',[x (x)] x-ma a-na ma-ah-ri#-ka# il-li-kam x x [(x) (x)],(x)],[(x),[x,(x)],ri#-,ka#,,,
6,P389256 obverse:4',uz-ni-ia li#-[x ...],[x,...],li#-,,,,,,
7,P389256 left:2,[...] i-di-in# [...],[...],[...],in#,,,,,,
8,P510527 reverse:1,[a-wi]-lu#-u2 i-na mu-uh2-hi ip-qu2-i3-li2-szu _di-ku5_,[a-,wi]-,lu#-,,,,,,
9,P510528 reverse:1,[x x] x x ra# [...] [tup]-pi2#-im# lu x [x x],[x,x],[...],[tup]-,[x,x],pi2#-,im#,ra#
10,P510530 obverse:3,"[um-ma {d}na-bi]-um#-na-s,i-ir-ma",[um-,ma,{d},na-,bi]-,um#-,,,


As you see, you have total control.

---

All chapters:

* **[start](start.ipynb)** become an expert in creating pretty displays of your text structures
* **[display](display.ipynb)** become an expert in creating pretty displays of your text structures
* **search** turbo charge your hand-coding with search templates
* **[exportExcel](exportExcel.ipynb)** make tailor-made spreadsheets out of your results
* **[share](share.ipynb)** draw in other people's data and let them use yours
* **[similarLines](similarLines.ipynb)** spot the similarities between lines

---

See the [cookbook](cookbook) for recipes for small, concrete tasks.