<img align="right" src="images/tf-small.png" width="128"/>
<img align="right" src="images/etcbc.png"/>
<img align="right" src="images/dans-small.png"/>

You might want to consider the [start](start.ipynb) of this tutorial.

Short introductions to other TF datasets:

* [Dead Sea Scrolls](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/dss.ipynb),
* [Old Babylonian Letters](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/oldbabylonian.ipynb), or the
* [Quran](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/quran.ipynb)


# Search Introduction

*Search* in Text-Fabric is a template-based way of looking for structural patterns in your dataset.

Within Text-Fabric we have the unique possibility to combine the ease of formulating search templates for
complicated syntactical patterns with the power of programmatically processing the results.

This notebook will show you how to get up and running.

## Alternative to hand-coding

Search is a powerful feature for a wide range of purposes.

Quite a bit of the implementation work has been dedicated to optimizing performance.
Yet I do not pretend to have found optimal strategies for all
possible search templates.
Some search tasks may turn out to be somewhat costly or even very costly.

That being said, I think search might turn out to be helpful in many cases,
especially by reducing the amount of hand-coding needed to work with special subsets of your data.

## Easy command

Search is as simple as saying (just an example):

```python
results = A.search(template)
A.show(results)
```

See all the ins and outs in the
[search template docs](https://annotation.github.io/text-fabric/tf/about/searchusage.html).

In [1]:
%load_ext autoreload
%autoreload 2

# Incantation

The ins and outs of installing Text-Fabric, getting the corpus, and initializing a notebook are
explained in the [start tutorial](start.ipynb).

In [2]:
from tf.app import use

In [3]:
# A = use('dhammapada', hoist=globals())
A = use("dhammapada:clone", checkout="clone", hoist=globals())

This is Text-Fabric 9.1.12
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

16 features found and 0 ignored


# Basic search command

We start with the simplest form of issuing a query.
Let's search for the word `Māro` in the Pali text.
We also want to show the clauses in which it occurs.

But first: how do you type that `ā`? To be honest: I don't know either.

Text-Fabric has a handy function to give you a palette of all the non-ASCII characters in the corpus:

In [4]:
A.specialCharacters()

Now, if you click on a letter, it is stored on your clipboard, ready to paste.
To help you remember where you clicked last, the letter becomes yellow.
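
If no palette is at hand, plain Python can also list the non-ASCII characters of a string together with their code points, so that you can type them as escapes. This is a minimal sketch using only the standard library; the helper name `non_ascii_chars` is made up for this illustration and is not part of Text-Fabric.

```python
import unicodedata


def non_ascii_chars(text):
    """Return the sorted, distinct non-ASCII characters in text."""
    # Normalize first, so that letter + combining mark count as one character
    composed = unicodedata.normalize("NFC", text)
    return sorted({ch for ch in composed if ord(ch) > 127})


sample = "taṃ ve pasahatī Māro"

for ch in non_ascii_chars(sample):
    # Show the character, its code point, and its official Unicode name
    print(ch, f"U+{ord(ch):04X}", unicodedata.name(ch))

# The escape form lets you type the word on any keyboard
maro = "M\u0101ro"
```

With the escape in hand, a template line such as `word pali=Māro` can be built from `maro` without pasting the character.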

In [5]:
query = """
clause
  word pali=Māro
"""
results = A.search(query)

  0.01s 5 results


We have the results. We only need to display them. Here they are in a table:

In [6]:
A.table(results)

n,p,clause,word
1,1 7,subhānupassiṃ viharantaṃ indriyesu asaṃvutaṃ bhojanamhi câmattaññuṃ kusītaṃ hīnavīriyaṃ taṃ ve pasahatī Māro vāto rukkhaṃ va dubbalaṃ.,Māro
2,1 8,asubhānupassiṃ viharantaṃ indriyesu susaṃvutaṃ bhojanamhi ca mattaññuṃ saddhaṃ āraddhavīriyaṃ taṃ [ve] na-ppasahatī Māro vāto selaṃ va pabbataṃ.,Māro
3,4 57,tesaṃ sampannasīlānaṃ appamādavihārinaṃ sammadaññāvimuttānaṃ Māro maggaṃ na vindati.,Māro
4,8 105,n' eva devo na gandhabbo na Māro saha Brahmunā jitaṃ apajitaṃ kayrā tathārūpassa jantuno.,Māro
5,24 337,taṃ vo vadāmi bhaddaṃ vo yāvant' ettha samāgatā taṇhāya mūlaṃ khanatha usīrattho va bīraṇaṃ mā vo naḷaṃ va soto va Māro bhañji punappunaṃ.,Māro


The hyperlinks in the `p` column point to the Tipitaka site, to the stanza most relevant to each individual result.
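
Each result is a tuple of nodes, one per line of the template, in template order: here a clause node followed by a word node. The sketch below shows how such tuples can be unpacked in ordinary Python. The node numbers are invented for illustration; in the notebook the list comes from `A.search(query)`.

```python
# Invented (clause, word) node pairs standing in for real search results
results = [
    (1001, 57),
    (1002, 71),
    (1010, 412),
]

# Unpack each result according to the two lines of the template
for clause_node, word_node in results:
    print(f"clause {clause_node} contains word {word_node}")

# Collect the distinct clauses that have at least one hit
clauses = sorted({clause_node for clause_node, _ in results})
```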

Here is the first one in a pretty display:

In [7]:
A.show(results, end=1)

We can also stop unraveling structure at the clause level:

In [8]:
A.show(results, end=2, baseTypes={"clause"})

# Condense results

There are two fundamentally different ways of presenting the results: condensed and uncondensed.

In **uncondensed** view, all results are listed individually.
You can keep track of which parts belong to which results.
The display can become unwieldy.

This is the default view, because it is the most straightforward, logical answer to your query.

In **condensed** view all nodes of all results are grouped in containers first (e.g. stanzas), and then presented
container by container.
You lose the information about which parts belong to which result.
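
The idea of condensing can be sketched in a few lines of plain Python: collect the nodes of all results per container and drop the result boundaries. This only illustrates the concept, not Text-Fabric's actual implementation; the node numbers and the `stanza_of` lookup are invented.

```python
from collections import defaultdict

# Invented (clause, word) results; several results share one clause
results = [
    (2001, 11),
    (2001, 13),
    (2001, 15),
    (2002, 19),
]

# Invented lookup from clause node to the stanza node that contains it
stanza_of = {2001: 3001, 2002: 3002}


def condense(results, container_of):
    """Group the nodes of all results by the container of their first member."""
    grouped = defaultdict(set)
    for result in results:
        container = container_of[result[0]]
        grouped[container].update(result)
    # One entry per container, nodes sorted, result boundaries gone
    return {c: sorted(nodes) for c, nodes in grouped.items()}


condensed = condense(results, stanza_of)
print(len(results), "results condensed into", len(condensed), "containers")
```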

Here is an example of the difference.

In [9]:
query = """
clause
  word pali=maṃ
"""

results = A.search(query)

  0.01s 7 results


In [10]:
A.table(results)

n,p,clause,word
1,1 3,"""akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me"",",maṃ
2,1 3,"""akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me"",",maṃ
3,1 3,"""akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me"",",maṃ
4,1 4,"""akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me"",",maṃ
5,1 4,"""akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me"",",maṃ
6,1 4,"""akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me"",",maṃ
7,26 414,yo' maṃ palipathaṃ duggaṃ saṃsāraṃ moham accagā tiṇṇo pāragato jhāyī anejo akathaṃkathī anupādāya nibbuto tam -,maṃ


There are multiple occurrences of `maṃ` in the clauses.

Now in condensed mode:

In [12]:
A.table(results, condensed=True)

n,p,stanza,clause,word,word.1,word.2
1,1 3,,"""akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me"",",maṃ,maṃ,maṃ
2,1 4,,"""akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me"",",maṃ,maṃ,maṃ
3,26 414,,yo' maṃ palipathaṃ duggaṃ saṃsāraṃ moham accagā tiṇṇo pāragato jhāyī anejo akathaṃkathī anupādāya nibbuto tam -,maṃ,,


Much more compact.

And here is a pretty display of the first 6 hits:

In [13]:
A.show(results, end=2, condensed=True)

We can make it more compact by condensing into *clauses* instead of *stanzas*:

In [14]:
A.show(results, end=2, condensed=True, condenseType="clause")

# Custom highlighting

We can apply different highlight colors to different parts of the result.
The two words are members 2 and 3 of the result tuples; the clause is member 1.
The members that we do not map will not be highlighted.
The members that we map to the empty string will be highlighted with the default color.

**NB:** Choose your colors from the
[CSS specification](https://developer.mozilla.org/en-US/docs/Web/CSS/color_value).
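
How a color map relates to result tuples can be sketched as follows. Positions count from 1, unmapped positions get no highlight, and an empty string stands for the default color. The function `highlights_for` and the `DEFAULT` marker are invented for this illustration and are not Text-Fabric API.

```python
DEFAULT = "default"  # invented stand-in for the default highlight color


def highlights_for(result, color_map):
    """Map each node in a result tuple to a color, by 1-based position."""
    highlights = {}
    for position, node in enumerate(result, start=1):
        if position not in color_map:
            continue  # unmapped members are not highlighted
        color = color_map[position]
        highlights[node] = color if color else DEFAULT
    return highlights


# Invented (clause, word, word) result and a partial color map
result = (2001, 11, 12)
print(highlights_for(result, {1: "", 3: "magenta"}))
```

In this toy run the clause gets the default color, the second word gets magenta, and the first word stays unhighlighted because position 2 is not in the map.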

In [15]:
query = """
clause
  word pali=maṃ
  word pali=avadhi
"""

results = A.search(query)

  0.02s 6 results


In [17]:
A.table(results, condensed=False, colorMap={1: "", 2: "cyan", 3: "magenta"})

n,p,clause,word,word.1
1,1 3,"""akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me"",",maṃ,avadhi
2,1 3,"""akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me"",",maṃ,avadhi
3,1 3,"""akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me"",",maṃ,avadhi
4,1 4,"""akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me"",",maṃ,avadhi
5,1 4,"""akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me"",",maṃ,avadhi
6,1 4,"""akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me"",",maṃ,avadhi


Or with more glory:

In [21]:
A.show(results, end=2, condensed=False, condenseType="sentence", colorMap={1: "", 2: "cyan", 3: "magenta"})

Color mapping works best for uncondensed results. If you condense results, some nodes may occupy
different positions in different results. It is unpredictable which color will be used
for such nodes:

In [22]:
A.show(results, end=1, condensed=True, condenseType="sentence", colorMap={1: "", 2: "cyan", 3: "magenta"})

# Constraining order

You can stipulate an order on the words in your template.
You only have to put a relational operator between them.
Say we want only results where `maṃ` follows `avadhi`.

In [23]:
A.specialCharacters()

In [24]:
query = """
clause
  word pali=maṃ
  > word pali=avadhi
"""

results = A.search(query)

  0.02s 4 results


In [26]:
A.table(results, colorMap={1: "", 2: "cyan", 3: "magenta"})

n,p,clause,word,word.1
1,1 3,"""akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me"",",maṃ,avadhi
2,1 3,"""akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me"",",maṃ,avadhi
3,1 4,"""akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me"",",maṃ,avadhi
4,1 4,"""akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me"",",maṃ,avadhi


We can also require the words to be adjacent.

In [27]:
query = """
clause
  word pali=maṃ
  :> word pali=avadhi
"""

results = A.search(query)

  0.02s 2 results


In [28]:
A.table(results, colorMap={1: "", 2: "cyan", 3: "magenta"})

n,p,clause,word,word.1
1,1 3,"""akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me"",",maṃ,avadhi
2,1 4,"""akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me"",",maṃ,avadhi
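
The result counts of the last three searches can be checked by hand. The sketch below numbers the words of one clause by position and applies the two constraints as plain comparisons: "after" as a larger position, "immediately after" as a position exactly one higher. It is a toy model of the two operators, not Text-Fabric's search engine.

```python
clause = "akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me".split()

# 1-based positions of each target word within the clause
mam = [i for i, w in enumerate(clause, start=1) if w == "maṃ"]
avadhi = [i for i, w in enumerate(clause, start=1) if w == "avadhi"]

# Every combination of the two words (no order constraint)
pairs = [(m, a) for m in mam for a in avadhi]

# maṃ somewhere after avadhi, as with `>` in the template
after = [(m, a) for m, a in pairs if m > a]

# maṃ directly after avadhi, as with `:>` in the template
adjacent = [(m, a) for m, a in pairs if m == a + 1]

print(len(pairs), len(after), len(adjacent))  # 3 2 1
```

The clause occurs in two stanzas, which accounts for the 6, 4, and 2 results reported by the searches.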


# Custom feature display

We would like to see the frequency.
The way to do that is to perform a display setup first.
By the way, we can also include the highlight colors in the display setup.

In [29]:
A.displaySetup(
    extraFeatures="freq_occ", colorMap={2: "lightsalmon", 3: "mediumaquamarine"}
)

In [31]:
A.show(results, condensed=False, condenseType="sentence")

Now we completely reset the display customization.

In [None]:
A.displayReset()

As you see, you have total control.

# All steps

* **[start](start.ipynb)** your first step in mastering the Dhammapada computationally
* **search** turbocharge your hand-coding with search templates

CC-BY Dirk Roorda