# Basic Python and Text-Fabric

By Christian Højgaard Jensen (chj@dbi.edu)

*Adapted from Martijn Naaijer (https://github.com/MartijnNaaijer/Shebanq_Course_Files/blob/master/Introduction_to_text_fabric.ipynb)*

Welcome to this course on text-fabric. This course will teach you how to extract data from the Eep Talstra Centre for Bible and Computer (ETCBC) database using the Python package text-fabric. You do not need to have any Python knowledge to do this course, because you will learn Python and text-fabric at the same time.   

Text-fabric was developed by Dirk Roorda and Wido van Peursen as part of the SHEBANQ project. In this project the [SHEBANQ website](https://shebanq.ancient-data.org/) was developed. On this website you can inspect the text of the Hebrew Bible and query it with the features of the ETCBC database, using the Mini Query Language (MQL). Text-fabric serves as a research tool to make datasets that can be analyzed further. The whole ETCBC database is archived at the website of [Data Archiving and Networked Services (DANS)](https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:62732), from where it can be downloaded.

In this course you will first learn some basic Python, and then move to text-fabric as soon as possible.

## About Python

We start with some basic things you need to know about the Python language. It includes basic data types, data structures and data flow control. When you have finished this notebook, you should know the following basic Python things:

* different data-types (string, integer, float, boolean, list, tuple, dictionary, set)
* slicing strings, string concatenation, lower() and upper()
* print()
* control flow tools (if-statements (if, elif, else), for-loop, range(), while-loop, write your own functions)
* use import
* export data to text files and csv files

Python has three main versions: Python 1, Python 2 and Python 3. This notebook works with the newest version, Python 3. Many programmers still use Python 2, because not every library they want to use is compatible with Python 3. If you want to know more about different versions of Python, look [here](https://wiki.python.org/moin/Python2orPython3).

You can find a lot of good information about Python on the internet. A valuable source is the Python documentation: https://docs.python.org/3/

This introduction to Python is by no means a complete Python course. If you want to know more about Python it is recommended to follow a complete course. Good online courses can be found here:
https://www.codecademy.com/learn/python

On Coursera you can find many different Python courses. Some are more general, others are focused on data science.
https://www.coursera.org/courses?languages=en&query=python

## Basic Python, data types, print

We start the course with some basic Python:

In [None]:
print("Hello world!")

This statement prints the text "Hello world!". print() is a function, and it prints what is inside the parentheses. We call this the argument. The part between the quotes is called a string and it consists of a number of characters, including a space and an exclamation mark. The function print() can have more arguments.

In [None]:
print("Hello world!", "How are you?")

You can add comments after #. It is strongly recommended to add comments to your code, because it clarifies the code for new readers, including yourself two months after you have written it.

In [None]:
print("Hello world!", "How are you?") #This print() function has two arguments

The string is not the only data type in Python. There are also other data types, such as the integer. In the following cell we assign the value 5 to the variable a. 5 is an integer. Integers are whole numbers.

In [None]:
a = 5 # 5 is an integer.

Python stores the data type of an object in memory. You can find out the data type of an object with the function type().

In [None]:
print(type(a))

In [None]:
print(type("Hello world!"))

The real numbers are a separate data type, called float. In the following cell numeric values are assigned to variable names with the names b, c and d. Assignment takes place with the = sign.

In [None]:
b = 5. # this is a float, note the decimal '.'.
c = 2.3
d = float(5)

Look [here](https://www.quora.com/What-is-the-difference-between-floats-and-integers) for an explanation of the difference between integers and floats.

In [None]:
print(type(b))
print(type(c))
print(type(d))

In [None]:
c / 3

We can use the variable name to do calculations.

In [None]:
print(b * 5) # multiplication
print(b ** 5) # power

It is important to know what the type of an object is. Let's have a look at the difference between a string and an integer.

What happens if we do '5' + '5' ?

In [None]:
'5' + '5' # This is called string concatenation. Strings are 'glued' together.

The same happens if we do the following.

In [None]:
'5' * 2

This differs from the addition of two integers.

In [None]:
5 + 5

If you try to add a string to an integer, Python raises a TypeError.

In [None]:
'5' + 5

You can convert the string '5' into an integer with int().

In [None]:
num_str = '5'
int(num_str) + 5
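
The conversion also works in other directions. The sketch below (the variable names are illustrative and not part of the original notebook) converts the same string to an integer and to a float, and converts a number back to a string with str():

```python
num_str = '5'

as_int = int(num_str)      # string -> integer
as_float = float(num_str)  # string -> float
as_text = str(as_int + 5)  # integer -> string

print(as_int + 5)     # integer addition
print(as_float + 5)   # float addition
print(as_text + '5')  # string concatenation again
```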

Another important data type is the boolean. A boolean variable can have two values: True and False.

In [None]:
bool_var = True
type(bool_var)

## Conditions, if statements


Notice the difference between an assignment (=) and 'is equal to' (==). 

In [None]:
4 == 5 # this is evaluated to False

!= is 'is not equal to' .

In [None]:
4 != 5

There are also > (greater than), < (less than), >= (greater than or equal to), and <= (less than or equal to).

In [None]:
4 <= 5

In [None]:
4 >= 5

The if-statement checks if a certain condition is evaluated to True.

In [None]:
a = 0

if a == 0:
    print('Hello,')

In [None]:
a += 1
print(a)

In [None]:
if a != 0: 
    print('Eep')

The if statement ends with a colon. In Python it is required that the line after the colon is indented (with a tab or 4 spaces). Later you will see the colon also after for and while statements and in a function header, which starts with def.

To the if statement you can add zero or more elif's and an optional else.

In [None]:
b = 12

if b == 0:
    print('b is 0') #what is printed here is one string
elif b < 10:
    print('b is greater than 0 and smaller than 10')
elif b < 15:
    print('b is greater than 9 and smaller than 15')
else:
    print('b is greater than 14')

## While loops

With while you keep looping as long as a specific condition evaluates to True.

In [None]:
number = 10

while number > 0:
    print(number)
    number = number - 1

The example above can be condensed by replacing "number = number - 1" with "number -= 1". Similarly, Python has +=, \*= and /=.

In [None]:
number = 10

while number > 0:
    print(number)
    number -= 1

In [None]:
number2 = 2

while number2 < 25:
    print(number2)
    number2 += 2
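
A while loop is most useful when you do not know in advance how many rounds are needed. As a small illustration (the variable names are made up for this example), the loop below counts how often a value must be doubled before it exceeds 1000:

```python
value = 1
doublings = 0

while value <= 1000:
    value *= 2       # same as value = value * 2
    doublings += 1

print(doublings, value)
```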

## Writing functions

You use a function for operations that have to be done more than once in a program. The advantage of using functions is that you need to write the piece of code in the function only once, which keeps your code concise. Every time you want that piece of code to be executed, you call the function.

The structure of a function is as follows:


    def functionName(arg1, arg2, ...):
        """
        This is the docstring, here you explain what the function does. Use triple quotes.
        It can cover more than one line.
        """
        function body
        ...
        ...
        return(certain_object)

    
If a function name consists of more than one word, as in functionName, this notebook uses camel case, which means that the first letter of every word after the first is capitalized. Note that the Python style guide, known as PEP8 (https://www.python.org/dev/peps/pep-0008/), recommends lowercase words separated by underscores for function names, for example function_name, so you will meet both styles. Many things can be done in more than one way in Python. Often, one of those ways is considered the "Pythonic way", which gives you clean and efficient code. Many of these conventions are described in PEP8.

In [None]:
def cubicCalculator(num):
    """calculates cube of a number"""
    cub_num = num**3
    return(cub_num)

We call the function.

In [None]:
print(cubicCalculator(4))
print(cubicCalculator(6))
print(cubicCalculator(2))

If you want to work further with the result of a function, you assign its value to a new variable.

In [None]:
new_var = cubicCalculator(10)

In [None]:
new_var

A function often has more than one argument. An argument can have a default value.

In [None]:
def addition(num_a, num_b = 5):
    num_c = num_a + num_b
    return(num_c)

In the next cell the function is called. The value 10 in the function call corresponds with num_a in the function definition, and the value 12 in the function call corresponds with num_b.

In [None]:
print(addition(10, 12))

Now we give the function call only one argument. The second argument gets the default value 5 in this case.

In [None]:
print(addition(10))
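
Arguments can also be passed by name. The sketch below repeats the addition function from above so that it runs on its own, and calls it with keyword arguments. With keywords, the order of the arguments no longer matters:

```python
def addition(num_a, num_b=5):
    """Return the sum of num_a and num_b (num_b defaults to 5)."""
    return num_a + num_b

print(addition(num_a=10, num_b=12))
print(addition(num_b=12, num_a=10))  # same result, different order
print(addition(num_a=10))            # num_b falls back to its default
```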

## About text-fabric

Now we start using text-fabric. If you want to work with it offline, you need to install it on your computer first. For the installation, look at: https://github.com/ETCBC/text-fabric/wiki/Api.

It is important to know the features of the ETCBC database to be able to work with text-fabric. If you want to know more about the features, check the feature documentation on the Shebanq website:
https://shebanq.ancient-data.org/shebanq/static/docs/featuredoc/features/comments/0_overview.html

## Importing and loading text-fabric

Every time you make a notebook in which you want to use text-fabric, you start with the following code cells, maybe with some slight modifications.

First import some modules

In [1]:
import sys, os

Text-fabric wakes up!

In [9]:
from tf.app import use
B = use('bhsa', hoist=globals())

Using etcbc/bhsa/tf - c r1.4 in C:\Users\Ejer/text-fabric-data
Using etcbc/phono/tf - c r1.1 in C:\Users\Ejer/text-fabric-data
Using etcbc/parallels/tf - c r1.1 in C:\Users\Ejer/text-fabric-data
Cannot determine the name of this notebook
Work around: call me with a self-chosen name: name='xxx'


**Documentation:** [BHSA](https://etcbc.github.io/bhsa) [Character table](https://dans-labs.github.io/text-fabric/Writing/Hebrew) [Feature docs](https://etcbc.github.io/bhsa/features/hebrew/c/0_home.html) [bhsa API](https://dans-labs.github.io/text-fabric/Api/Bhsa/) [Text-Fabric API 7.0.1](https://dans-labs.github.io/text-fabric/Api/General/) [Search Reference](https://dans-labs.github.io/text-fabric/Api/General/#search-templates)

## For loops and looping over the nodes in the ETCBC database.

We are ready to work with text-fabric now. To select words, clauses or other linguistic units with specific characteristics in text-fabric, you loop over all the objects in the ETCBC database and check for each object whether it has the desired properties. You can do this with a for loop, which loops over a sequence.

We can make a numeric sequence with range() and loop over it with a for loop.

The for statement ends with a colon, and the line following the colon must be indented, just as with the if and while statements and the function header that you have already seen.

In [None]:
for n in range(15):
    print(n)

Look carefully at what range does in the following examples.

In [None]:
for number in range(4, 15): # with two arguments we have range(start, stop)
    print(number)

In [None]:
for number in range(4, 15, 3): # with three arguments we have range(start, stop, step)
    print(number)

Note that the arguments of range() are integers.
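
To inspect a range without a loop, you can convert it to a list with list(). This small sketch shows that the stop value itself is never included and that the step may be negative:

```python
print(list(range(5)))          # starts at 0, stops before 5
print(list(range(4, 15, 3)))   # 15 is not reached
print(list(range(10, 0, -2)))  # counting down with a negative step
```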

#### Data model

In text-fabric the ETCBC-database is constructed as a graph database. A graph consists of nodes, which are linked to each other with edges. The objects in the database are textual objects. These can be words, phrases, clauses, etc. The edges between the nodes indicate that certain objects are included in other objects (for instance, the words in a clause), but the edges can also indicate other linguistic relationships. If you want to know more about the data model, which is recommended, read the [text-fabric documentation on the data model](https://github.com/ETCBC/text-fabric/wiki/Data-model).

We are ready now to do the first search in the ETCBC database. We want to find out what the names of the biblical books are.

In [None]:
for node in N(): # N() is a so-called generator. With this line of code you loop over all the nodes in the database.
    if F.otype.v(node) == 'book':
        print(F.book.v(node))

What has happened here? In the first line the for loop loops over all the objects (nodes) in the database.
In the second line it is checked for each node whether its object type is 'book'. It is crucial to understand this line, especially the part F.otype.v(node). F is the class of all node features, otype stands for object type, and v returns the value of the feature for a specific node. In other words, F.otype.v(node) returns the object type of the node, and if F.otype.v(node) == 'book': checks whether this value is equal to 'book'. If this condition evaluates to True, the third line prints the value of the feature book for this node, which is the name of the book.
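
The same pattern can be tried without the database. The sketch below is a toy stand-in, not the Text-Fabric API: `toy_nodes`, `toy_otype` and `toy_book` are hypothetical names invented for this illustration, and the four nodes are made up. It only shows the logic of looping over nodes and testing a feature value:

```python
# Hypothetical miniature "database": node number -> feature values.
toy_nodes = {
    1: {'otype': 'word'},
    2: {'otype': 'word'},
    3: {'otype': 'book', 'book': 'Genesis'},
    4: {'otype': 'book', 'book': 'Exodus'},
}

def toy_otype(node):
    """Stand-in for F.otype.v(node): return the object type of a node."""
    return toy_nodes[node]['otype']

def toy_book(node):
    """Stand-in for F.book.v(node): return the book name of a book node."""
    return toy_nodes[node]['book']

book_count = 0
for node in toy_nodes:               # loop over all nodes
    if toy_otype(node) == 'book':    # keep only the book nodes
        print(toy_book(node))
        book_count += 1

print(book_count)
```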

## Lists and retrieving booknames and lexemes

In the example above, the script prints the names of the books, but it does not remember them, so it is impossible to use them later in the program. If we want Python to remember the book names, we can store them in a list.

A list is a sequence of elements. You can recognize it by its square brackets. Here we initialize an empty list, which we call a_list.

In [None]:
a_list = []

In [None]:
a_list

You can add elements to the list with .append() .

In [None]:
a_list.append(40)
print(a_list)

We add a string to a_list.

In [None]:
a_list.append('ETCBC')
print(a_list)

A list can also be populated manually.

In [None]:
unsorted_list = [3, 2, 1, 5]

print(unsorted_list)

You can sort a list with sorted().

In [None]:
print(sorted(unsorted_list))

And you can sort it in reversed order with the argument reverse.

In [None]:
print(sorted(unsorted_list, reverse = True))

We use the same script as above, which produces the names of the biblical books, and store them in a list.

In [None]:
book_list = [] # an empty list is initialized.

for node in N():
    if F.otype.v(node) == 'book':
        book_list.append(F.book.v(node)) # every book name is appended to the list book_list
        
print(book_list) # book_list can be accessed later.

You can check if a certain element is in a list.

In [None]:
'Genesis' in book_list

In [None]:
'Genesis' not in book_list

In [None]:
'Henoch' in book_list

If we want to know the number of elements in a list, we use len().

In [None]:
print(len(book_list))

In the following cell we modify the code a little bit. Instead of selecting books, words are selected, and their lexemes are retrieved with the feature lex. Each lexeme is stored in the list lex_list. The objects in the database are only abstract entities, and we can get information from them by using features that are related to the object type. lex is a feature that is characteristic of the word object.

In [None]:
lex_list = []

for node in N():
    if F.otype.v(node) == 'word':
        lexeme = F.lex.v(node)
        lex_list.append(lexeme)

What is the length of this list?

In [None]:
print(len(lex_list)) # this is a long list!

We want to retrieve the first element in this list. You can do this by using an index with []. The first element in a list is retrieved with index 0, because Python is zero-based.

In [None]:
print(lex_list[0])

Now we would like to find out what the first ten elements of the list are. The stop index of a slice is not included, so we need 0:10.

In [None]:
print(lex_list[0:10])

You may recognize the first ten words in the book of Genesis.

What is the lexeme of the last word in the Hebrew Bible? The last element in a list is retrieved with index -1.

In [None]:
print(lex_list[-1])

And the last ten elements? There is nothing after the colon, which means that the slice runs from the tenth-to-last element until the last element.

In [None]:
print(lex_list[-10:])

We can choose any range that we want, of course.

In [None]:
print(lex_list[100000:100055]) # do you see in which biblical book this fragment can be found?
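
Because the stop index of a slice is not included, `[0:9]` gives nine elements and `[0:10]` gives ten. A short list makes this easy to check (the list below is invented for the illustration):

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l']

print(len(letters[0:9]))   # nine elements: indices 0 up to and including 8
print(len(letters[0:10]))  # ten elements: indices 0 up to and including 9
print(letters[:3])         # an omitted start means "from the beginning"
print(letters[-3:])        # an omitted stop means "until the end"
print(letters[2:5])        # indices 2, 3 and 4
```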

## Some details about lists and strings

A Python list can contain elements of different data types.

In [None]:
varied_list = [True, 5, 5.0, 'Hebrew']

print(type(varied_list[0]))
print(type(varied_list[1]))
print(type(varied_list[2]))
print(type(varied_list[3]))

The following does not work: varied_list has four elements, so the highest index is 3.

In [None]:
print(type(varied_list[4]))
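
Python reports an IndexError for such a call. A simple guard, sketched here with a hypothetical variable `position`, is to compare the index with len() before using it:

```python
varied_list = [True, 5, 5.0, 'Hebrew']
position = 4

if position < len(varied_list):
    print(type(varied_list[position]))
else:
    print('Index', position, 'is out of range; the last index is', len(varied_list) - 1)
```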

The elements of a list can also be lists.

In [None]:
list_of_lists = [[1, 2], [7, 4], [5, 6]] # this is a list of lists

print(list_of_lists[1])

We use a double index to access the individual integers.

In [None]:
print(list_of_lists[0][1])

A list comprehension is a fast and clean way to create a list.

In [None]:
another_list = [number**2 for number in range(12,20)]
print(another_list)
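
A list comprehension can also filter its elements with an if clause. As a sketch (the names `even_squares` and `even_squares_loop` are chosen for this example), the following keeps only the squares of the even numbers in the same range and compares the result with the equivalent for loop:

```python
# number % 2 is the remainder after division by 2; it is 0 for even numbers
even_squares = [number**2 for number in range(12, 20) if number % 2 == 0]
print(even_squares)

# The equivalent for loop, written out in full
even_squares_loop = []
for number in range(12, 20):
    if number % 2 == 0:
        even_squares_loop.append(number**2)

print(even_squares == even_squares_loop)
```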

You can find the minimum and maximum values in a list with the functions min() and max().

In [None]:
print(min(another_list))
print(max(another_list))

It can be very useful to retrieve the position of a certain value in a list. You do that with .index().

In [None]:
highest_value = max(another_list)

pos_of_highest = another_list.index(highest_value)

print(pos_of_highest)

In [None]:
print(another_list[pos_of_highest])

Here is an example of a list comprehension with strings. Look at what lower() and upper() do.

In [None]:
books_list = ['Genesis', 'Exodus', 'Leviticus']

In [None]:
book_list_lower = [book.lower() for book in books_list]
print(book_list_lower)

In [None]:
book_list_upper = [book.upper() for book in books_list]
print(book_list_upper)

Often you need to make slices of a string.

In [None]:
book_string = 'Genesis'

If you want to retrieve the first letter of a string, you use the index 0.

In [None]:
print(book_string[0])

And if you need the first three letters, you use the index 0:3.

In [None]:
print(book_string[0:3])

If you want to know the last letter, the index is -1.

In [None]:
print(book_string[-1])

And finally, if you want to know the last three letters, you use -3: .

In [None]:
print(book_string[-3:])
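
Slicing combines well with upper() and string concatenation. As an illustration (the abbreviation scheme and the names below are invented for this example), the following builds a short label from each book name:

```python
books = ['Genesis', 'Exodus', 'Leviticus']

for book in books:
    abbreviation = book[:3].upper()  # first three letters, in capitals
    ending = book[-3:]               # last three letters
    print(abbreviation + ' ends in ' + ending)

labels = [book[:3].upper() for book in books]
print(labels)
```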

## Dictionaries and counting object types

Which object types are there in the ETCBC database and how many of each of them? This is a problem that is slightly more difficult. We will walk through all the nodes again, but we have to keep track of all of them at the same time. To be able to do this, we introduce a new data type, the dictionary.

A dictionary is a structure which contains key-value pairs. You can recognize a dictionary by the curly brackets. A dictionary is initialized as follows.

In [None]:
geo_dict = {'Netherlands': 'Amsterdam', 'Germany': 'Berlin', 'Belgium': 'Brussels', 'Italy': 'Rome'}

The geo_dict contains four keys, 'Netherlands', 'Germany', 'Belgium', and 'Italy', and four values. Between key and value you see a colon, and the key:value pairs are separated by commas. How many elements does this dictionary contain?

In [None]:
print(len(geo_dict))

We can retrieve the value of a specific key as follows.

In [None]:
print(geo_dict['Germany']) # returns the value of the key 'Germany'.

You can add new key:value pairs to a dictionary.

In [None]:
geo_dict['USA'] = 'Washington, D.C.'
print(geo_dict)

You can iterate over the keys of the dictionary:

In [None]:
for country in geo_dict:
    print(country)

... and over the values of the dictionary:

In [None]:
for country in geo_dict:
    print(country, geo_dict[country])

A specific key can only occur once in a dictionary: its keys are unique. What happens if we add an existing key with a new value?

In [None]:
geo_dict['USA'] = 'Amsterdam'
print(geo_dict)

In [None]:
geo_dict['Netherlands'] = ['Amsterdam','Den Haag']
print(geo_dict['Netherlands'][1])

You see that the old value is overwritten. The second cell also shows that a value can itself be a list.

A dictionary is not sorted. Since Python 3.7 it keeps its keys in the order in which they were inserted, but if you want to get the capitals in another order, you need an alternative solution, for instance an iteration over an ordered structure like a list.

In [None]:
for country in ['Netherlands', 'Germany', 'Belgium', 'Italy']:
    print(geo_dict[country]) # returns capitals in alphabetical order
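
If you need the keys and the values together, the dictionary method items() yields them as pairs, and sorted() puts the keys in alphabetical order. A minimal sketch with its own small dictionary (`capitals_dict` is a name chosen for this example):

```python
capitals_dict = {'Netherlands': 'Amsterdam', 'Germany': 'Berlin', 'Belgium': 'Brussels'}

for country, capital in sorted(capitals_dict.items()):
    print(country, capital)

sorted_countries = sorted(capitals_dict)  # sorting a dictionary sorts its keys
print(sorted_countries)
```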

You can use a dictionary to count elements.

In [6]:
letter_list = ['a', 'b', 'c', 'a', 'b', 'c', 'b', 'b', 'b']

In [7]:
let_count_dict = {}

for letter in letter_list:
    if letter in let_count_dict:
        let_count_dict[letter] += 1
    else:
        let_count_dict[letter] = 1
        
print(let_count_dict)

{'a': 2, 'b': 5, 'c': 2}


You can make this code slightly shorter by using the defaultdict() from the collections library. A library is a package which contains a number of functions. You get access to these functions with the import statement. Many libraries need to be downloaded and installed before you can use them, but collections is part of the Python standard library, so it is always available.

In [8]:
import collections

In [9]:
let_default_dict = collections.defaultdict(int)
for letter in letter_list:
    let_default_dict[letter] += 1
    
print(let_default_dict)

defaultdict(<class 'int'>, {'a': 2, 'b': 5, 'c': 2})


Or you can make the code even shorter with Counter(), also from the collections library.

In [10]:
let_counter_dict = collections.Counter(letter_list)

print(let_counter_dict)

Counter({'b': 5, 'a': 2, 'c': 2})


You can also use Counter() to count the characters in a string.

In [11]:
random_string = 'ajskddhcjcjfnfn  djddllk;xlkkdjn'

let_counter_dict2 = collections.Counter(random_string)

print(let_counter_dict2)

Counter({'d': 6, 'j': 5, 'k': 4, 'n': 3, 'l': 3, 'c': 2, 'f': 2, ' ': 2, 'a': 1, 's': 1, 'h': 1, ';': 1, 'x': 1})


When a Counter is printed, its elements are shown from most to least frequent, but the Counter itself is not sorted: iterating over it follows the order in which the keys were first seen. If you want the elements sorted by count, you can use `most_common()`, a method of the Counter object:

In [12]:
let_counter_dict2.most_common()

[('d', 6),
 ('j', 5),
 ('k', 4),
 ('n', 3),
 ('l', 3),
 ('c', 2),
 ('f', 2),
 (' ', 2),
 ('a', 1),
 ('s', 1),
 ('h', 1),
 (';', 1),
 ('x', 1)]

Using an index you can retrieve the most common letter together with its count:

In [15]:
let_counter_dict2.most_common()[0]

('d', 6)
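
most_common() also accepts a number, so that only the top entries are returned. A small sketch with the letter list from above, repeated here so that the cell runs on its own (`letter_counter`, `top_letter` and `top_count` are names chosen for this example):

```python
import collections

letter_list = ['a', 'b', 'c', 'a', 'b', 'c', 'b', 'b', 'b']
letter_counter = collections.Counter(letter_list)

top_letter, top_count = letter_counter.most_common(1)[0]  # unpack the (letter, count) pair
print(top_letter, top_count)
print(letter_counter['z'])  # a Counter returns 0 for keys it has never seen
```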

Now we return to text-fabric. Which object types are there in the ETCBC database and how many are there of each? In the following cell they are counted.
The total number of objects is called n. The different types of objects are counted in the dictionary called 'object_types'. This is a defaultdict() from the collections module. Using the defaultdict instead of an ordinary dictionary has the advantage that new keys in the dictionary do not need to be initialized explicitly.

In [16]:
n = 0
object_types = collections.defaultdict(int)

for node in N():
    n += 1
    object_types[F.otype.v(node)] += 1

print(n, object_types)

1446799 defaultdict(<class 'int'>, {'book': 39, 'chapter': 929, 'verse': 23213, 'sentence': 63727, 'sentence_atom': 64525, 'clause': 88121, 'clause_atom': 90688, 'half_verse': 45180, 'phrase': 253207, 'phrase_atom': 267541, 'lex': 9233, 'word': 426584, 'subphrase': 113812})


With "for node in N():" you walk through all the nodes (or objects) in the database. This script will add 1 to the variable n every time it sees a new node. The next line of code is a bit more complex. Of every node, it asks what kind of object it is using the feature "otype". Note that this feature does not need to be loaded separately; it is available as soon as text-fabric has been loaded.

When object_types is initialized it is still an empty dictionary. Once it encounters the first node, it checks the object-type, adds that object-type to the dictionary and adds 1 to its initial value 0. In F.otype.v(), F is the class of object features, which is followed by the name of the feature. This is followed by v, which stands for the value of the feature. The values for the feature otype are word, clause, sentence, and so on.

Compare print() with pprint() from the pprint module. It prints a dictionary in a clear way in alphabetical order.

In [17]:
import pprint as pp

In [18]:
pp.pprint(object_types)

defaultdict(<class 'int'>,
            {'book': 39,
             'chapter': 929,
             'clause': 88121,
             'clause_atom': 90688,
             'half_verse': 45180,
             'lex': 9233,
             'phrase': 253207,
             'phrase_atom': 267541,
             'sentence': 63727,
             'sentence_atom': 64525,
             'subphrase': 113812,
             'verse': 23213,
             'word': 426584})


The result is clear, the database contains 39 books, 929 chapters, and so on.

If we had not used the defaultdict(), the script would have looked like this:

In [19]:
n = 0
object_types = {}

for node in N():
    n += 1
    if F.otype.v(node) in object_types:
        object_types[F.otype.v(node)] += 1
    else:
        object_types[F.otype.v(node)] = 1
        # the object-type has to be initialized

print(n, object_types)

1446799 {'book': 39, 'chapter': 929, 'verse': 23213, 'sentence': 63727, 'sentence_atom': 64525, 'clause': 88121, 'clause_atom': 90688, 'half_verse': 45180, 'phrase': 253207, 'phrase_atom': 267541, 'lex': 9233, 'word': 426584, 'subphrase': 113812}


There is a more efficient way of walking through the nodes. In general you do not need information from all the objects, but only from one specific object-type, for instance words. If this is the case, you do the following:

In [25]:
word_count = 0

for word in F.otype.s('word'): # now you walk through the word nodes only. You can do this with all the object types.
    word_count += 1
    
print(word_count)

426584


In previous examples you have seen that dictionaries are useful if you want to count things, for instance object types. It becomes a bit more difficult if you want to count the object types in each book of the Hebrew Bible. How many words, clauses, sentences, and so on can be found in each book? We solve this programming problem by using nested dictionaries.

In [43]:
obj_per_book_dict = collections.defaultdict(lambda: collections.defaultdict(int)) #we do not discuss the syntax of lambda here

for node in N():
    where = T.sectionFromNode(node)
    book = where[0] # now we know the book
    obj_type = F.otype.v(node) #now we know the object type
    obj_per_book_dict[book][obj_type] += 1

pp.pprint(obj_per_book_dict)

defaultdict(<function <lambda> at 0x0000029D250818C8>,
            {'1_Chronicles': defaultdict(<class 'int'>,
                                         {'book': 1,
                                          'chapter': 29,
                                          'clause': 2514,
                                          'clause_atom': 2577,
                                          'half_verse': 1609,
                                          'lex': 515,
                                          'phrase': 7189,
                                          'phrase_atom': 8336,
                                          'sentence': 1856,
                                          'sentence_atom': 1891,
                                          'subphrase': 5828,
                                          'verse': 943,
                                          'word': 15564}),
             '1_Kings': defaultdict(<class 'int'>,
                                    {'book': 1,
                     

For information about lambda, read the [Python documentation](https://docs.python.org/3/tutorial/controlflow.html).
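
The nested dictionary can be tried on a small scale without the database. In this sketch the (book, object type) pairs are invented for the illustration, and `toy_observations` and `toy_counts` are hypothetical names; only the counting technique is the same as above:

```python
import collections

# Hypothetical observations: one (book, object type) pair per node
toy_observations = [
    ('Genesis', 'word'), ('Genesis', 'word'), ('Genesis', 'clause'),
    ('Exodus', 'word'), ('Exodus', 'clause'), ('Exodus', 'clause'),
]

# Outer dictionary: book -> inner dictionary; inner dictionary: object type -> count
toy_counts = collections.defaultdict(lambda: collections.defaultdict(int))

for book, obj_type in toy_observations:
    toy_counts[book][obj_type] += 1

print(toy_counts['Genesis']['word'])
print(toy_counts['Exodus']['clause'])
print(dict(toy_counts['Genesis']))
```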

In the example we used the function `T.sectionFromNode`, which is a very handy function to find the section that a particular node belongs to. The function returns a tuple (which we will get to know later) that can be further accessed using indexing.

In [30]:
print(T.sectionFromNode(1)) #printing entire tuple

print(T.sectionFromNode(1)[0]) #printing book

('Genesis', 1, 1)
Genesis


In [40]:
ref = ('Deuteronomium',20,1)

node = T.nodeFromSection(ref)

F.otype.v(node)


## TF-Search

This part of the notebook is adapted from Dirk Roorda's [tutorial](https://nbviewer.jupyter.org/github/etcbc/bhsa/blob/master/tutorial/search.ipynb).

The Search API was loaded together with the rest of text-fabric in the call to `use()` above, so we can start right away with a simple query:

In [17]:
query = '''
book book=Samuel_I
  clause
    word sp=nmpr
'''
results = B.search(query)
B.table(results, start=1, end=10)

  0.78s 1868 results


n | book | clause | word
--- | --- | --- | ---
1|<span class="trb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Samuel_I&chapter=1&verse=1&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="1_Samuel" sec="1_Samuel">1_Samuel</a></span>|<span class="hb">וַיְהִי֩ אִ֨ישׁ אֶחָ֜ד מִן־הָרָמָתַ֛יִם צֹופִ֖ים מֵהַ֣ר אֶפְרָ֑יִם </span>|<span class="hb">אֶפְרָ֑יִם </span>
2|<span class="trb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Samuel_I&chapter=1&verse=1&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="1_Samuel" sec="1_Samuel">1_Samuel</a></span>|<span class="hb">וּשְׁמֹ֡ו אֶ֠לְקָנָה בֶּן־יְרֹחָ֧ם בֶּן־אֱלִיה֛וּא בֶּן־תֹּ֥חוּ בֶן־צ֖וּף אֶפְרָתִֽי׃ </span>|<span class="hb">אֶ֠לְקָנָה </span>
3|<span class="trb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Samuel_I&chapter=1&verse=1&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="1_Samuel" sec="1_Samuel">1_Samuel</a></span>|<span class="hb">וּשְׁמֹ֡ו אֶ֠לְקָנָה בֶּן־יְרֹחָ֧ם בֶּן־אֱלִיה֛וּא בֶּן־תֹּ֥חוּ בֶן־צ֖וּף אֶפְרָתִֽי׃ </span>|<span class="hb">יְרֹחָ֧ם </span>
4|<span class="trb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Samuel_I&chapter=1&verse=1&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="1_Samuel" sec="1_Samuel">1_Samuel</a></span>|<span class="hb">וּשְׁמֹ֡ו אֶ֠לְקָנָה בֶּן־יְרֹחָ֧ם בֶּן־אֱלִיה֛וּא בֶּן־תֹּ֥חוּ בֶן־צ֖וּף אֶפְרָתִֽי׃ </span>|<span class="hb">אֱלִיה֛וּא </span>
5|<span class="trb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Samuel_I&chapter=1&verse=1&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="1_Samuel" sec="1_Samuel">1_Samuel</a></span>|<span class="hb">וּשְׁמֹ֡ו אֶ֠לְקָנָה בֶּן־יְרֹחָ֧ם בֶּן־אֱלִיה֛וּא בֶּן־תֹּ֥חוּ בֶן־צ֖וּף אֶפְרָתִֽי׃ </span>|<span class="hb">תֹּ֥חוּ </span>
6|<span class="trb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Samuel_I&chapter=1&verse=1&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="1_Samuel" sec="1_Samuel">1_Samuel</a></span>|<span class="hb">וּשְׁמֹ֡ו אֶ֠לְקָנָה בֶּן־יְרֹחָ֧ם בֶּן־אֱלִיה֛וּא בֶּן־תֹּ֥חוּ בֶן־צ֖וּף אֶפְרָתִֽי׃ </span>|<span class="hb">צ֖וּף </span>
7|<span class="trb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Samuel_I&chapter=1&verse=1&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="1_Samuel" sec="1_Samuel">1_Samuel</a></span>|<span class="hb">שֵׁ֤ם אַחַת֙ חַנָּ֔ה </span>|<span class="hb">חַנָּ֔ה </span>
8|<span class="trb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Samuel_I&chapter=1&verse=1&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="1_Samuel" sec="1_Samuel">1_Samuel</a></span>|<span class="hb">וְשֵׁ֥ם הַשֵּׁנִ֖ית פְּנִנָּ֑ה </span>|<span class="hb">פְּנִנָּ֑ה </span>
9|<span class="trb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Samuel_I&chapter=1&verse=1&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="1_Samuel" sec="1_Samuel">1_Samuel</a></span>|<span class="hb">לִפְנִנָּה֙ יְלָדִ֔ים </span>|<span class="hb">פְנִנָּה֙ </span>
10|<span class="trb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Samuel_I&chapter=1&verse=1&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="1_Samuel" sec="1_Samuel">1_Samuel</a></span>|<span class="hb">וּלְחַנָּ֖ה אֵ֥ין יְלָדִֽים׃ </span>|<span class="hb">חַנָּ֖ה </span>

In [13]:
results

[(426592, 453942, 141547),
 (426592, 453943, 141550),
 (426592, 453943, 141552),
 (426592, 453943, 141554),
 (426592, 453943, 141556),
 (426592, 453943, 141558),
 (426592, 453945, 141566),
 (426592, 453946, 141571),
 (426592, 453948, 141575),
 (426592, 453949, 141579),
 (426592, 453952, 141599),
 (426592, 453952, 141602),
 (426592, 453953, 141607),
 (426592, 453953, 141608),
 (426592, 453953, 141610),
 (426592, 453953, 141613),
 (426592, 453955, 141620),
 (426592, 453956, 141624),
 (426592, 453957, 141635),
 (426592, 453958, 141642),
 (426592, 453959, 141645),
 (426592, 453962, 141658),
 (426592, 453964, 141672),
 (426592, 453968, 141683),
 (426592, 453969, 141685),
 (426592, 453974, 141706),
 (426592, 453975, 141710),
 (426592, 453977, 141715),
 (426592, 453977, 141725),
 (426592, 453979, 141733),
 (426592, 453983, 141742),
 (426592, 453988, 141766),
 (426592, 453992, 141784),
 (426592, 453993, 141786),
 (426592, 453994, 141791),
 (426592, 453998, 141805),
 (426592, 453999, 141811),
 

The query is run by `B.search`, which stores the results in a list of tuples. Another function, `B.table`, renders the results in tabular format.
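
Because `results` is an ordinary Python list of tuples, it can be processed with plain Python. The following minimal sketch uses the first five tuples printed above as sample data and counts how many matching words fall in each clause. The names `sample` and `words_per_clause` are illustrative only and are not part of text-fabric.

```python
from collections import Counter

# First five result tuples copied from the output above: (book, clause, word)
sample = [
    (426592, 453942, 141547),
    (426592, 453943, 141550),
    (426592, 453943, 141552),
    (426592, 453943, 141554),
    (426592, 453943, 141556),
]

# Count how many results share the same clause node (second member)
words_per_clause = Counter(clause for (book, clause, word) in sample)

for clause, count in words_per_clause.items():
    print(clause, count)
```

With the full `results` list in place of `sample`, the same loop would give a count for every clause in the book.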

In the first table, the hyperlink is placed on the book name, which allows us to jump to the book in SHEBANQ. However, it is more convenient if a more specific column, such as the clause, is hyperlinked, as below.

Also note that `B.table()` accepts several arguments, including the results of the query and the start and end points in the result list.

In [20]:
B.table(results, start=100, end=105, linked=2)

n | book | clause | word
--- | --- | --- | ---
100|<span class="trb">1_Samuel</span>|<span class="hb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Samuel_I&chapter=2&verse=26&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="1_Samuel 2:26" sec="1_Samuel 2:26">וָטֹ֑וב גַּ֚ם עִם־יְהוָ֔ה וְגַ֖ם עִם־אֲנָשִֽׁים׃ ס </a></span>|<span class="hb">יְהוָ֔ה </span>
101|<span class="trb">1_Samuel</span>|<span class="hb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Samuel_I&chapter=2&verse=27&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="1_Samuel 2:27" sec="1_Samuel 2:27">וַיָּבֹ֥א אִישׁ־אֱלֹהִ֖ים אֶל־עֵלִ֑י </a></span>|<span class="hb">עֵלִ֑י </span>
102|<span class="trb">1_Samuel</span>|<span class="hb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Samuel_I&chapter=2&verse=27&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="1_Samuel 2:27" sec="1_Samuel 2:27">כֹּ֚ה אָמַ֣ר יְהוָ֔ה </a></span>|<span class="hb">יְהוָ֔ה </span>
103|<span class="trb">1_Samuel</span>|<span class="hb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Samuel_I&chapter=2&verse=27&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="1_Samuel 2:27" sec="1_Samuel 2:27">בִּֽהְיֹותָ֥ם בְּמִצְרַ֖יִם לְבֵ֥ית פַּרְעֹֽה׃ </a></span>|<span class="hb">מִצְרַ֖יִם </span>
104|<span class="trb">1_Samuel</span>|<span class="hb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Samuel_I&chapter=2&verse=28&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="1_Samuel 2:28" sec="1_Samuel 2:28">וּבָחֹ֣ר אֹ֠תֹו מִכָּל־שִׁבְטֵ֨י יִשְׂרָאֵ֥ל לִי֙ לְכֹהֵ֔ן </a></span>|<span class="hb">יִשְׂרָאֵ֥ל </span>
105|<span class="trb">1_Samuel</span>|<span class="hb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Samuel_I&chapter=2&verse=28&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="1_Samuel 2:28" sec="1_Samuel 2:28">וָֽאֶתְּנָה֙ לְבֵ֣ית אָבִ֔יךָ אֶת־כָּל־אִשֵּׁ֖י בְּנֵ֥י יִשְׂרָאֵֽל׃ </a></span>|<span class="hb">יִשְׂרָאֵֽל׃ </span>
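
Judging from the row numbers in the table above, `start` and `end` count from 1 and include both endpoints. A minimal sketch of that windowing follows. The helper `window` and the list `fake_results` are hypothetical and only mimic the numbering; they are not how text-fabric is implemented.

```python
def window(results, start=1, end=None):
    """Return (n, result) pairs for the 1-based, inclusive range start..end."""
    if end is None:
        end = len(results)
    last = min(end, len(results))
    return [(n, results[n - 1]) for n in range(start, last + 1)]


# Made-up stand-in for a result list with 200 entries
fake_results = [("r%d" % i,) for i in range(1, 201)]

rows = window(fake_results, start=100, end=105)
print([n for n, _ in rows])
```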

We can show the results more fully with `show()`:

In [21]:
B.show(results, end = 3)



**verse** *1*





**verse** *2*





**verse** *3*



### Condensation

The results in Verse 1 are condensed, that is, several results appear in the same visualization. In some cases, it is useful to show the results uncondensed:

In [52]:
B.show(results, condensed = False, start = 2, end = 6)



**book** *2*





**book** *3*





**book** *4*





**book** *5*





**book** *6*



By default, everything is condensed to verses, but this setting can be changed with `condenseType`:

In [26]:
B.show(results, condensed=False, condenseType='clause', start = 2, end = 6)



**book** *2*





**book** *3*





**book** *4*





**book** *5*





**book** *6*
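
Conceptually, condensing groups the result tuples by a containing object, so that all results within the same container are shown in one display. The sketch below illustrates the idea in plain Python. It is not text-fabric's implementation, and the lookup `verse_of_clause` is made up for illustration, since the real verse membership comes from the corpus.

```python
from collections import defaultdict

# Three result tuples (book, clause, word) taken from the `results` output above
sample = [
    (426592, 453942, 141547),
    (426592, 453943, 141550),
    (426592, 453943, 141552),
]

# Hypothetical lookup from clause node to a verse label (invented values)
verse_of_clause = {453942: "verse A", 453943: "verse A"}

condensed = defaultdict(set)
for book, clause, word in sample:
    # All nodes of results in the same verse end up in one set
    condensed[verse_of_clause[clause]].update((clause, word))

print(len(sample), "results ->", len(condensed), "condensed display(s)")
```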



### Custom highlighting

The highlighting of the results can be customized to distinguish results.

First, let us try another example. We want to query for subject and predicate phrases within the same clause in Genesis:

In [32]:
query = '''
book book=Genesis
  clause
    phrase function=Subj
    phrase function=Pred
'''

Note that comments can be added to a query; a comment line starts with `#`.

In [35]:
results = B.search(query)
B.table(results, linked=3, end=10)

  0.99s 1501 results


n | book | clause | phrase | phrase
--- | --- | --- | --- | ---
1|<span class="trb">Genesis</span>|<span class="hb">בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃ </span>|<span class="hb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Genesis&chapter=1&verse=1&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="Genesis 1:1" sec="Genesis 1:1">אֱלֹהִ֑ים </a></span>|<span class="hb">בָּרָ֣א </span>
2|<span class="trb">Genesis</span>|<span class="hb">וְהָאָ֗רֶץ הָיְתָ֥ה תֹ֨הוּ֙ וָבֹ֔הוּ </span>|<span class="hb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Genesis&chapter=1&verse=2&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="Genesis 1:2" sec="Genesis 1:2">הָאָ֗רֶץ </a></span>|<span class="hb">הָיְתָ֥ה </span>
3|<span class="trb">Genesis</span>|<span class="hb">וַיֹּ֥אמֶר אֱלֹהִ֖ים </span>|<span class="hb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Genesis&chapter=1&verse=3&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="Genesis 1:3" sec="Genesis 1:3">אֱלֹהִ֖ים </a></span>|<span class="hb">יֹּ֥אמֶר </span>
4|<span class="trb">Genesis</span>|<span class="hb">יְהִ֣י אֹ֑ור </span>|<span class="hb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Genesis&chapter=1&verse=3&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="Genesis 1:3" sec="Genesis 1:3">אֹ֑ור </a></span>|<span class="hb">יְהִ֣י </span>
5|<span class="trb">Genesis</span>|<span class="hb">וַֽיְהִי־אֹֽור׃ </span>|<span class="hb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Genesis&chapter=1&verse=3&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="Genesis 1:3" sec="Genesis 1:3">אֹֽור׃ </a></span>|<span class="hb">יְהִי־</span>
6|<span class="trb">Genesis</span>|<span class="hb">וַיַּ֧רְא אֱלֹהִ֛ים אֶת־הָאֹ֖ור </span>|<span class="hb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Genesis&chapter=1&verse=4&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="Genesis 1:4" sec="Genesis 1:4">אֱלֹהִ֛ים </a></span>|<span class="hb">יַּ֧רְא </span>
7|<span class="trb">Genesis</span>|<span class="hb">וַיַּבְדֵּ֣ל אֱלֹהִ֔ים בֵּ֥ין הָאֹ֖ור וּבֵ֥ין הַחֹֽשֶׁךְ׃ </span>|<span class="hb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Genesis&chapter=1&verse=4&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="Genesis 1:4" sec="Genesis 1:4">אֱלֹהִ֔ים </a></span>|<span class="hb">יַּבְדֵּ֣ל </span>
8|<span class="trb">Genesis</span>|<span class="hb">וַיִּקְרָ֨א אֱלֹהִ֤ים׀ לָאֹור֙ יֹ֔ום </span>|<span class="hb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Genesis&chapter=1&verse=5&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="Genesis 1:5" sec="Genesis 1:5">אֱלֹהִ֤ים׀ </a></span>|<span class="hb">יִּקְרָ֨א </span>
9|<span class="trb">Genesis</span>|<span class="hb">וַֽיְהִי־עֶ֥רֶב </span>|<span class="hb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Genesis&chapter=1&verse=5&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="Genesis 1:5" sec="Genesis 1:5">עֶ֥רֶב </a></span>|<span class="hb">יְהִי־</span>
10|<span class="trb">Genesis</span>|<span class="hb">וַֽיְהִי־בֹ֖קֶר </span>|<span class="hb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Genesis&chapter=1&verse=5&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="Genesis 1:5" sec="Genesis 1:5">בֹ֖קֶר </a></span>|<span class="hb">יְהִי־</span>

In [34]:
B.show(results, end=5)



**verse** *1*





**verse** *2*





**verse** *3*





**verse** *4*





**verse** *5*



We want to distinguish the results by uncondensing the visualization and highlighting the two phrases in different colors. For this we use the parameter `colorMap`.

The results in question are members 3 and 4 of the result tuples (`phrase` and `phrase`). Members that we do not map will not be highlighted. Members that we map to the empty string will be highlighted with the default color.

Choose colors from the [CSS specification](https://developer.mozilla.org/en-US/docs/Web/CSS/color_value).

In [36]:
B.show(results, condensed=False, colorMap={3: 'cyan', 4: 'magenta'}, end=3)



**book** *1*





**book** *2*





**book** *3*
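
The keys of `colorMap` appear to be 1-based positions in the result tuple, judging from the calls in this notebook. The following sketch shows which node ends up with which color for a single result. The helper `highlights_for` is hypothetical and not part of text-fabric. The tuple is the first result of this query, as printed in the next cell.

```python
def highlights_for(result, color_map):
    """Map the node at each 1-based position in color_map to its color."""
    return {result[pos - 1]: color for pos, color in color_map.items()}


# First result tuple of the Genesis query: (book, clause, phrase, phrase)
first = (426585, 427553, 651544, 651543)

colors = highlights_for(first, {3: "cyan", 4: "magenta"})
print(colors)
```

The book and clause nodes are absent from the mapping, which matches the statement that unmapped members are not highlighted.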



In [37]:
results

[(426585, 427553, 651544, 651543),
 (426585, 427554, 651547, 651548),
 (426585, 427557, 651559, 651558),
 (426585, 427558, 651561, 651560),
 (426585, 427559, 651564, 651563),
 (426585, 427560, 651567, 651566),
 (426585, 427562, 651573, 651572),
 (426585, 427563, 651577, 651576),
 (426585, 427565, 651586, 651585),
 (426585, 427566, 651589, 651588),
 (426585, 427568, 651593, 651592),
 (426585, 427569, 651595, 651594),
 (426585, 427571, 651603, 651602),
 (426585, 427576, 651617, 651616),
 (426585, 427577, 651622, 651621),
 (426585, 427578, 651625, 651624),
 (426585, 427580, 651629, 651628),
 (426585, 427581, 651631, 651630),
 (426585, 427582, 651635, 651634),
 (426585, 427584, 651641, 651640),
 (426585, 427586, 651650, 651649),
 (426585, 427588, 651655, 651654),
 (426585, 427589, 651657, 651656),
 (426585, 427594, 651673, 651672),
 (426585, 427598, 651686, 651685),
 (426585, 427600, 651691, 651690),
 (426585, 427601, 651694, 651693),
 (426585, 427603, 651698, 651697),
 (426585, 427604, 65

In [41]:
file = 'pred_subj_Genesis.csv'

with open(file, 'w') as f:
    header = 'book,clause,subj,pred'
    f.write(header)
    for line in results:
        f.write('''\n{},{},{},{}'''.format(line[0], line[1], line[2], line[3]))
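
The same export can also be sketched with Python's standard `csv` module, which takes care of separators and line endings. The sample rows are the first two result tuples shown above. The temporary directory is only an assumption made so that the sketch does not overwrite an existing file.

```python
import csv
import os
import tempfile

# Two result tuples copied from the output above: (book, clause, subj, pred)
sample = [
    (426585, 427553, 651544, 651543),
    (426585, 427554, 651547, 651548),
]

# Write into a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "pred_subj_Genesis.csv")

with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["book", "clause", "subj", "pred"])  # header row
    writer.writerows(sample)  # one row per result tuple

# Read the file back to check what was written
with open(path, newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))

print(rows[0], len(rows) - 1, "data rows")
```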

### Multiple values

It is easy to require several values to be true of an object:

In [None]:
query = '''
word lex=>B/
'''
results = B.search(query)

In [None]:
query = '''
word lex=>B/ language=Hebrew nu=pl
'''
results = B.search(query)

Or we use `#` to stipulate that the objects may not have a certain value:

In [None]:
query = '''
word lex=>B/ language=Hebrew nu#pl
'''
results = B.search(query)

We can use | to allow for alternative values:

In [None]:
query = '''
chapter book=Genesis chapter=1|2|3
  word lex=>B/
'''
results = B.search(query)

We can check existence of a certain feature by skipping the `=value` part:

In [None]:
query = '''
word lex=HW> qere
'''
results = B.search(query)
B.show(results)

In the query above, we search for all occurrences of the masculine personal pronoun (lexeme `HW>`) that have a value for the feature `qere`. If we want all occurrences without a qere value, we can add `#` to the feature name:

In [None]:
query = '''
word lex=HW> qere#
'''
results = B.search(query)
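
To make the logic of these feature conditions concrete, here is a minimal sketch in plain Python. The function `matches`, the constraint notation and the sample words are all made up for illustration; this is not how text-fabric evaluates queries, and the way text-fabric treats missing values under `#` is not modelled.

```python
def matches(word, constraints):
    """Check a word (a dict of features) against simple constraints.

    constraints maps a feature name to ("=", values), ("#", values),
    ("has", None) or ("lacks", None).
    """
    for feature, (op, values) in constraints.items():
        value = word.get(feature)
        if op == "=" and value not in values:
            return False
        if op == "#" and value in values:
            return False
        if op == "has" and value is None:
            return False
        if op == "lacks" and value is not None:
            return False
    return True


# Made-up words; the feature values are for illustration only
words = [
    {"lex": ">B/", "language": "Hebrew", "nu": "pl"},
    {"lex": ">B/", "language": "Hebrew", "nu": "sg"},
    {"lex": "HW>", "language": "Hebrew", "nu": "sg", "qere": "x"},
]

# lex=>B/ nu=pl
plural = [w for w in words if matches(w, {"lex": ("=", {">B/"}), "nu": ("=", {"pl"})})]
# lex=>B/ nu#pl
not_plural = [w for w in words if matches(w, {"lex": ("=", {">B/"}), "nu": ("#", {"pl"})})]
# nu=sg|pl (alternative values)
sg_or_pl = [w for w in words if matches(w, {"nu": ("=", {"sg", "pl"})})]
# qere (feature exists)
with_qere = [w for w in words if matches(w, {"qere": ("has", None)})]

print(len(plural), len(not_plural), len(sg_or_pl), len(with_qere))
```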

### Constraining order

So far, the word order of the results has been ignored. If we want to constrain the word order, we can use the operator `<`.

In this case, the second word will need to follow the first word.

In [None]:
query = '''
book book=Genesis
  chapter chapter=1
    verse verse=1
      clause
        word nu=sg
        < word nu=pl
'''

In [None]:
results = B.search(query)
B.table(results)

The words can also be constrained to be adjacent:

In [None]:
query = '''
book book=Genesis
  chapter chapter=1
    verse verse=1
      sentence
        word nu=sg
        <: word nu=pl
'''

In [None]:
results = B.search(query)
B.show(results)
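
The difference between `<` (somewhere after) and `<:` (immediately after) can be sketched with plain slot numbers. The clause below is made up for illustration, and the code only mimics the two conditions; it is not a text-fabric query.

```python
from itertools import product

# Made-up clause: slot number -> grammatical number of the word in that slot
clause = {1: "sg", 2: "sg", 3: "pl", 4: "sg", 5: "pl"}

singular = [slot for slot, nu in clause.items() if nu == "sg"]
plural = [slot for slot, nu in clause.items() if nu == "pl"]

# `<`: the plural word comes somewhere after the singular word
before = [(s, p) for s, p in product(singular, plural) if s < p]

# `<:`: the plural word comes immediately after the singular word
adjacent = [(s, p) for s, p in product(singular, plural) if s + 1 == p]

print(before)
print(adjacent)
```

Every adjacent pair is also an ordered pair, so `<:` returns a subset of what `<` returns.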

## Creating your own tuples

So far, we have passed the `results` variable, which is in fact a list of tuples, directly to `B.show`. However, it is rather simple to build our own tuples and so decide which values are displayed.

We want to explore this feature by searching for all subjects and predicates that do not correspond in number:

In [59]:
query = '''
clause
    phrase function=Subj
        =: word nu=sg|pl
        :=
    phrase function=Pred|PreO
        word sp=verb
             nu=sg|pl
'''
results = B.search(query)

  3.32s 10638 results


Note that `=:` stipulates that the selected object (here a word) must start at the same slot as its parent, in this case the beginning of the phrase, while `:=` stipulates that it must end at the last slot of the parent. Using both `=:` and `:=`, we ensure that the word is both the first and the last word of the phrase, that is, the subject consists of only one word.

In [60]:
B.table(results, end = 10)

n | clause | phrase | word | phrase | word
--- | --- | --- | --- | --- | ---
1|<span class="hb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Genesis&chapter=1&verse=1&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="Genesis 1:1">בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃ </a></span>|<span class="hb">אֱלֹהִ֑ים </span>|<span class="hb">אֱלֹהִ֑ים </span>|<span class="hb">בָּרָ֣א </span>|<span class="hb">בָּרָ֣א </span>
2|<span class="hb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Genesis&chapter=1&verse=3&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="Genesis 1:3">וַיֹּ֥אמֶר אֱלֹהִ֖ים </a></span>|<span class="hb">אֱלֹהִ֖ים </span>|<span class="hb">אֱלֹהִ֖ים </span>|<span class="hb">יֹּ֥אמֶר </span>|<span class="hb">יֹּ֥אמֶר </span>
3|<span class="hb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Genesis&chapter=1&verse=3&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="Genesis 1:3">יְהִ֣י אֹ֑ור </a></span>|<span class="hb">אֹ֑ור </span>|<span class="hb">אֹ֑ור </span>|<span class="hb">יְהִ֣י </span>|<span class="hb">יְהִ֣י </span>
4|<span class="hb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Genesis&chapter=1&verse=3&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="Genesis 1:3">וַֽיְהִי־אֹֽור׃ </a></span>|<span class="hb">אֹֽור׃ </span>|<span class="hb">אֹֽור׃ </span>|<span class="hb">יְהִי־</span>|<span class="hb">יְהִי־</span>
5|<span class="hb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Genesis&chapter=1&verse=4&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="Genesis 1:4">וַיַּ֧רְא אֱלֹהִ֛ים אֶת־הָאֹ֖ור </a></span>|<span class="hb">אֱלֹהִ֛ים </span>|<span class="hb">אֱלֹהִ֛ים </span>|<span class="hb">יַּ֧רְא </span>|<span class="hb">יַּ֧רְא </span>
6|<span class="hb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Genesis&chapter=1&verse=4&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="Genesis 1:4">וַיַּבְדֵּ֣ל אֱלֹהִ֔ים בֵּ֥ין הָאֹ֖ור וּבֵ֥ין הַחֹֽשֶׁךְ׃ </a></span>|<span class="hb">אֱלֹהִ֔ים </span>|<span class="hb">אֱלֹהִ֔ים </span>|<span class="hb">יַּבְדֵּ֣ל </span>|<span class="hb">יַּבְדֵּ֣ל </span>
7|<span class="hb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Genesis&chapter=1&verse=5&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="Genesis 1:5">וַיִּקְרָ֨א אֱלֹהִ֤ים׀ לָאֹור֙ יֹ֔ום </a></span>|<span class="hb">אֱלֹהִ֤ים׀ </span>|<span class="hb">אֱלֹהִ֤ים׀ </span>|<span class="hb">יִּקְרָ֨א </span>|<span class="hb">יִּקְרָ֨א </span>
8|<span class="hb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Genesis&chapter=1&verse=5&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="Genesis 1:5">וַֽיְהִי־עֶ֥רֶב </a></span>|<span class="hb">עֶ֥רֶב </span>|<span class="hb">עֶ֥רֶב </span>|<span class="hb">יְהִי־</span>|<span class="hb">יְהִי־</span>
9|<span class="hb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Genesis&chapter=1&verse=5&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="Genesis 1:5">וַֽיְהִי־בֹ֖קֶר </a></span>|<span class="hb">בֹ֖קֶר </span>|<span class="hb">בֹ֖קֶר </span>|<span class="hb">יְהִי־</span>|<span class="hb">יְהִי־</span>
10|<span class="hb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Genesis&chapter=1&verse=6&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="Genesis 1:6">וַיֹּ֣אמֶר אֱלֹהִ֔ים </a></span>|<span class="hb">אֱלֹהִ֔ים </span>|<span class="hb">אֱלֹהִ֔ים </span>|<span class="hb">יֹּ֣אמֶר </span>|<span class="hb">יֹּ֣אמֶר </span>

The table shows all results with a subject and a predicate in either singular or plural. We want to display only the results in which subject and predicate disagree in number. Therefore, we need to create our own tuples from the results.

This is what the result tuples look like:

In [9]:
results[:10]

[(427553, 651544, 4, 651543, 3),
 (427557, 651559, 34, 651558, 33),
 (427558, 651561, 36, 651560, 35),
 (427559, 651564, 39, 651563, 38),
 (427560, 651567, 42, 651566, 41),
 (427562, 651573, 50, 651572, 49),
 (427563, 651577, 60, 651576, 59),
 (427565, 651586, 73, 651585, 72),
 (427566, 651589, 76, 651588, 75),
 (427568, 651593, 81, 651592, 80)]

In [70]:
wantedResults = []

for r in results:
    subj = r[2]
    pred = r[4]
    if F.nu.v(subj) != F.nu.v(pred):
        result_tuple = (subj, pred)
        wantedResults.append(result_tuple)
        
len(wantedResults)

469

In [61]:
wantedResults = tuple(
    (subj, pred)
    for (clause, phraseS, subj, phraseP, pred) in results
    if F.nu.v(subj) != F.nu.v(pred)
)
print('Number of filtered results', len(wantedResults))

Number of filtered results 469


In [62]:
B.show(wantedResults, end = 3)



**verse** *1*





**verse** *2*





**verse** *3*



The results show up nicely but we want to display gender as well and have different colors for subject and predicate:

In [None]:
B.prettySetup(features = 'gn')
B.show(wantedResults, colorMap={1:'lightsalmon', 2:'mediumaquamarine'}, end = 3)

### Relationships

We have already looked briefly at relationship operators such as `:=` and `=:`. Let's take a closer look.

#### > and < Canonical order

We have already seen examples of this.

In [None]:
query = '''
clause
  p1:phrase function=Subj
     word lex=JHWH/
  p2:phrase function=Pred
  
p1 < p2
'''
results = B.search(query)

In [None]:
B.show(results, end = 3)

#### == Same slots

By using == we stipulate that the objects must occupy the same slots. In the following example we want to find all clauses that have the same extension as phrases, in other words, the clause must be one phrase.

The operation takes a while, so please be patient.

In [None]:
query = '''
c:clause
  p:phrase
c == p
'''
results = B.search(query)
B.show(results, end = 1)

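The opposite query would use `##` to enforce that the two objects do not occupy exactly the same slots. The operator `||` is stricter: it requires that the two objects have no slots in common at all.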
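
In terms of slot sets, these comparisons can be sketched as follows. The operator meanings are taken from the text-fabric search documentation (`==` same slots, `##` different slots, `||` disjoint slots); the function names and the slot numbers are made up for illustration.

```python
def same_slots(a, b):
    """Corresponds to `==`: both objects occupy exactly the same slots."""
    return set(a) == set(b)


def different_slots(a, b):
    """Corresponds to `##`: the slot sets are not identical."""
    return set(a) != set(b)


def disjoint_slots(a, b):
    """Corresponds to `||`: the objects share no slot at all."""
    return set(a).isdisjoint(b)


clause = {10, 11, 12}
phrase_whole = {10, 11, 12}   # a phrase that fills the whole clause
phrase_part = {10, 11}        # a phrase that fills part of the clause
phrase_elsewhere = {20, 21}   # a phrase outside the clause

print(same_slots(clause, phrase_whole))
print(different_slots(clause, phrase_part), disjoint_slots(clause, phrase_part))
print(disjoint_slots(clause, phrase_elsewhere))
```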

#### << and >> positioning

We can use these operators to test whether one object lies completely before (`<<`) or completely after (`>>`) another, slot-wise:

In [None]:
query = '''
clause
  p1:phrase function=Subj
    word lex=JHWH/
  p2:phrase function=Pred

p1 << p2
'''
results = B.search(query)
B.table(results, end=1)
B.show(results, colorMap={3:'cyan',4:'orange'}, end=3)

Or the opposite:

In [None]:
query = '''
clause
  p1:phrase function=Subj
    word lex=JHWH/
  p2:phrase function=Pred

p1 >> p2
'''
results = B.search(query)
B.show(results, colorMap={3:'cyan',4:'orange'}, end=3)
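
Slot-wise, "completely before" means that the last slot of one object precedes the first slot of the other. A minimal sketch with made-up slot sets follows; the function names are hypothetical.

```python
def completely_before(a, b):
    """Corresponds to `a << b`: every slot of a precedes every slot of b."""
    return max(a) < min(b)


def completely_after(a, b):
    """Corresponds to `a >> b`: every slot of a follows every slot of b."""
    return min(a) > max(b)


subj = {3, 4}         # made-up slots of a subject phrase
pred = {7}            # made-up slots of a predicate phrase
overlapping = {4, 5}  # an object that shares slot 4 with subj

print(completely_before(subj, pred), completely_after(subj, pred))
print(completely_before(subj, overlapping), completely_after(subj, overlapping))
```

Objects that overlap satisfy neither condition.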

#### =: and := Same start or end slots

We have already seen examples of this. These relational operators stipulate that the object must start at the first slot of its parent (`=:`) or end at its last slot (`:=`).

In [None]:
query = '''
phrase function=Subj
  =: word
  :=
'''
results = B.search(query)
B.show(results, end=1)

If we want to enforce that the object occupies both the first and the last slot of the parent, we can use `::`:

In [None]:
query = '''
phrase function=Subj
  :: word
'''
results = B.search(query)
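
Expressed with slot sets, the three boundary conditions compare the first and last slots of two objects. The sketch below uses made-up slot numbers and hypothetical function names.

```python
def same_start(a, b):
    """Corresponds to `=:`: both objects start at the same slot."""
    return min(a) == min(b)


def same_end(a, b):
    """Corresponds to `:=`: both objects end at the same slot."""
    return max(a) == max(b)


def same_boundaries(a, b):
    """Corresponds to `::`: same start slot and same end slot."""
    return same_start(a, b) and same_end(a, b)


phrase = {5, 6, 7}       # made-up phrase of three words
first_word = {5}
last_word = {7}
short_phrase = {9}       # made-up phrase consisting of one word
only_word = {9}

print(same_start(phrase, first_word), same_end(phrase, first_word))
print(same_boundaries(phrase, last_word), same_boundaries(short_phrase, only_word))
```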

Let's try another example where we look for phrases whose first and last slots coincide with the first and last slots of their parent clause, but only if the phrase does not occupy exactly the same slots as the clause:

In [None]:
query = '''
c:clause
  :: p:phrase
  
c ## p
'''
results = B.search(query)
B.show(results, end = 3)

This is also a nice example of how to find gaps within phrases.

#### <: and :> Adjacent before and after

These operators are used to enforce that two siblings of a parent are adjacent, either before or after.

Let's take an example with two adjacent phrases:

In [None]:
query = '''
clause
  p1:phrase function=Subj
  p2:phrase function=Pred
  
p1 <: p2
'''
results = B.search(query)
B.show(results, end = 3)
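
Adjacency can be sketched the same way: the last slot of the first object must be immediately followed by the first slot of the second. The slot sets and function names below are made up for illustration.

```python
def adjacent_before(a, b):
    """Corresponds to `a <: b`: a ends right before b starts."""
    return max(a) + 1 == min(b)


def adjacent_after(a, b):
    """Corresponds to `a :> b`: a starts right after b ends."""
    return adjacent_before(b, a)


subj = {3, 4}     # made-up subject phrase
pred = {5}        # predicate phrase directly after the subject
far_pred = {8}    # predicate phrase with other material in between

print(adjacent_before(subj, pred), adjacent_after(pred, subj))
print(adjacent_before(subj, far_pred))
```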

#### Nearness

The relations with `:` require that the relevant slots of the left-hand and right-hand objects are either equal or adjacent.

We can relax this requirement by inserting a number `k` in the relationship operator. `k` denotes the maximum number of slots by which the two positions may differ.

In the following example we design the query to find all instances in which a phrase has a distance of 0 slots to the end of the clause:

In [None]:
query = '''
chapter book=Genesis chapter=1
  clause
    :0= phrase
'''

results = B.search(query)
B.show(results, end=3)

The example above gives the same results as when using an operator without specifying the k-distance:

In [None]:
query = '''
chapter book=Genesis chapter=1
  clause
    := phrase
'''

results = B.search(query)

We can add more freedom and look for phrases within 1 or 0 slots distance to the end of the clause:

In [None]:
query = '''
chapter book=Genesis chapter=1
  clause
    :1= phrase
'''

results = B.search(query)

Or k=2, this time measured from the start of the clause (`=2:`):

In [None]:
query = '''
chapter book=Genesis chapter=1
  clause
    =2: phrase
'''

results = B.search(query)

In [None]:
B.table(results, start=1, end=10, linked=2)
B.show(results, condensed=False, start=1, end=4, colorMap={2: 'yellow', 3: 'cyan'})
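
The effect of `k` can be sketched as a tolerance on the difference between two boundary slots. The function names and slot numbers are made up; the sketch assumes that `:k=` compares end slots and `=k:` compares start slots, as in the text-fabric search documentation.

```python
def same_end_within(a, b, k=0):
    """Corresponds to `:k=`: the end slots differ by at most k."""
    return abs(max(a) - max(b)) <= k


def same_start_within(a, b, k=0):
    """Corresponds to `=k:`: the start slots differ by at most k."""
    return abs(min(a) - min(b)) <= k


clause = set(range(1, 11))   # made-up clause occupying slots 1..10
phrase_a = {9, 10}           # ends exactly at the clause end
phrase_b = {8, 9}            # ends one slot before the clause end
phrase_c = {3, 4}            # starts two slots after the clause start

print(same_end_within(clause, phrase_a, 0), same_end_within(clause, phrase_b, 0))
print(same_end_within(clause, phrase_b, 1), same_start_within(clause, phrase_c, 2))
```

Raising `k` can only add results: everything found with `k=0` is also found with `k=1`.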

### Gaps

We usually imagine the textual units of a text in neat sequential order. However, on a functional level, pieces of phrases or clauses (called phrase atoms and clause atoms) may be connected in ways that embed words, phrases or clauses that do not functionally belong to the phrase or clause in question. There are numerous cases of embedded clauses and phrases. In practice, this means that a clause may begin at word node 20 and end at word node 30 without occupying all slots within that span: an embedded clause may occupy slots 23-25. This phenomenon is called a gap.
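
The example in the paragraph above can be written out with slot sets: a gap is any slot inside the span of an object that the object itself does not occupy. The helper `gaps` is hypothetical and only illustrates the definition.

```python
def gaps(slots):
    """Return the slots inside the span of `slots` that are not occupied."""
    slots = set(slots)
    span = set(range(min(slots), max(slots) + 1))
    return sorted(span - slots)


# A clause running from slot 20 to slot 30, interrupted by slots 23-25
clause = set(range(20, 31)) - {23, 24, 25}

print(gaps(clause))
print(gaps({1, 2, 3}))
```

An object without gaps returns an empty list.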

In TF-Search we can find a gap using the following code:

In [17]:
query = '''
p:phrase
  wPreGap:word lex=L
  wLast:word
  :=

wGap:word
wPreGap <: wGap
wGap < wLast
p || wGap
'''
results = B.search(query)

  1.12s 13 results


In our query, we stipulate that a phrase must contain a word with the lexeme `L` and another word at the final slot of the phrase, indicated with `:=`. As a second step, we require that the first word comes immediately before the gap word (`wGap`), and that `wGap` occurs before the final word of the phrase. Finally, we make sure that `wGap` does not occupy any slot of the phrase (`p || wGap`).

In [18]:
B.table(results, end = 3)
B.show(results, colorMap={2:'yellow',4:'red'},end = 13)

n | phrase | word | word | word
--- | --- | --- | --- | ---
1|<span class="hb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Genesis&chapter=17&verse=7&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="Genesis 17:7">לְךָ֙ וּֽלְזַרְעֲךָ֖ אַחֲרֶֽיךָ׃ </a></span>|<span class="hb">לְךָ֙ </span>|<span class="hb">אַחֲרֶֽיךָ׃ </span>|<span class="hb">לֵֽ</span>
2|<span class="hb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Genesis&chapter=28&verse=4&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="Genesis 28:4">לְךָ֙ לְךָ֖ וּלְזַרְעֲךָ֣ אִתָּ֑ךְ </a></span>|<span class="hb">לְךָ֙ </span>|<span class="hb">אִתָּ֑ךְ </span>|<span class="hb">אֶת־</span>
3|<span class="hb"><a href="https://shebanq.ancient-data.org/hebrew/text?book=Genesis&chapter=31&verse=16&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt" title="Genesis 31:16">לָ֥נוּ וּלְבָנֵ֑ינוּ </a></span>|<span class="hb">לָ֥נוּ </span>|<span class="hb">בָנֵ֑ינוּ </span>|<span class="hb">ה֖וּא </span>



**verse** *1*





**verse** *2*





**verse** *3*





**verse** *4*





**verse** *5*





**verse** *6*





**verse** *7*





**verse** *8*





**verse** *9*





**verse** *10*





**verse** *11*





**verse** *12*





**verse** *13*

