# Introduction to Text-Fabric: the Hebrew Bible and the DSS

by Martijn Naaijer, September 2019

We are going to work with the ETCBC database using [Text-Fabric](https://annotation.github.io/text-fabric/) (or TF). TF is a Python package for storing, querying and analyzing annotated textual data. The TF project started a few years ago with the Hebrew Bible, but in the meantime it has come to cover a whole range of [texts in various languages](https://annotation.github.io/text-fabric/About/Corpora/). It is a community-driven project, so if you want, you can contribute to it yourself!

In this course we work in the cloud, which here means that all computations are done on a Google server. If you prefer, you can also install TF on your own machine: install [Anaconda](https://www.anaconda.com/distribution/#download-section), open a Jupyter Notebook, and you can start coding.

With TF you can query the data in two ways. The first one is the pure Python approach, for which you write scripts in the Python programming language. This language is not difficult to learn, but you need a lot of practice to become fluent in it.
The second approach is called Search. Search is a template-based query language. In this tutorial we will use both approaches.

In [None]:
import collections

You can install the Text-Fabric software with the following shell command. If you work offline, you need to do this only once; in the cloud (Google's Colab), you have to install it at the start of every session.

In [None]:
!pip install text-fabric

Now we have the TF software, but we still need to download the data and activate the system. We do this with the TF incantation.

In the output of the incantation you find some important links, such as [the feature documentation](https://etcbc.github.io/bhsa/features/0_home/) and [the Search documentation](https://annotation.github.io/text-fabric/Use/Search/).

In [None]:
from tf.app import use
A = use('bhsa', hoist=globals())

In the incantation "bhsa" stands for Biblia Hebraica Stuttgartensia Amstelodamensis, which is the electronic ETCBC edition of the Masoretic Text (MT), based on the fourth edition of the Biblia Hebraica Stuttgartensia (BHS).

## Search BHSA

Let's start with Search! First we formulate a simple Search query in which we look for all words with the verbal stem nifal (`vs=nif`).

In [None]:
query = '''
word vs=nif
'''

results = A.search(query)
A.table(results, end=10)

We would like to see the whole clause. The next query switches to verbs in the qal and asks for the clause that contains them.

In [None]:
# Note: inside the template, the line with "word" starts with two spaces!
query = '''
clause
  word vs=qal
'''

results = A.search(query)
A.table(results, end=10)
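
In a Search template, indentation expresses embedding: an indented line describes an object that is contained in the object on the less indented line above it. As a rough illustration, the sketch below builds such a template from a list of levels. The helper `build_template` is hypothetical and not part of TF; it only produces a string that could be passed to `A.search`.

```python
def build_template(levels, indent="  "):
    """Return a Search-style template in which each level is nested in the previous one.

    `levels` is a list of template lines such as "clause" or "word vs=qal".
    This helper is hypothetical and only illustrates the indentation.
    """
    lines = [indent * depth + level for depth, level in enumerate(levels)]
    return "\n" + "\n".join(lines) + "\n"


query = build_template(["clause", "word vs=qal"])
print(query)
```

Every extra level adds two spaces, so a word inside a clause inside a book would start with four spaces.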

And we want more conditions to be satisfied, for example that the verb is also plural.

In [None]:
query = '''
clause
  word vs=qal nu=pl
'''

results = A.search(query)
A.table(results, end=10)

And the clause has to occur in the book of Exodus.

In [None]:
query = '''
book book=Exodus
  clause
    word vs=qal nu=pl
'''

results = A.search(query)
A.table(results, end=10)

And of course the qal verb occurs earlier in the clause than the name of Moses.

In [None]:
query = '''
book book=Exodus
  clause
    word vs=qal 
    < word lex=MCH=/
'''

results = A.search(query)
A.table(results, end=10)

You can also search for text in the ETCBC transcription. Sometimes it is handy to search for specific lexemes (feature: lex). You can find all lexemes in ETCBC transcription on [Shebanq](https://shebanq.ancient-data.org/hebrew/words).

In [None]:
query = '''
clause
  word lex=W
  <: word lex=QR>[
'''

results = A.search(query)
A.table(results, end=10)

But you can also search for a concrete consonantal representation, with the feature g_cons.

In [None]:
query = '''
clause
  word g_cons=W
  <: word g_cons=JQR>W
'''

results = A.search(query)
A.table(results, end=10)
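
The queries above use two relational operators: `<` (the first object comes before the second) and `<:` (the first object comes immediately before the second). The sketch below mimics these two relations in plain Python, assuming that each object is represented by the list of slot (word) numbers it occupies. The function names and slot numbers are made up for illustration; TF evaluates these relations internally.

```python
def before(left_slots, right_slots):
    """Mimic `<`: every slot of the left object precedes every slot of the right one."""
    return max(left_slots) < min(right_slots)


def adjacent_before(left_slots, right_slots):
    """Mimic `<:`: the right object starts directly after the left object ends."""
    return max(left_slots) + 1 == min(right_slots)


# Hypothetical slot numbers for three words in one clause.
word_a, word_b, word_c = [10], [11], [14]

print(before(word_a, word_c))           # True
print(adjacent_before(word_a, word_b))  # True
print(adjacent_before(word_a, word_c))  # False
```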

If you want to have the "raw" results, use `S.search`. Each result is a tuple of node numbers, one for every line in the template.

In [None]:
for result in S.search(query):
    print(result)
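
Because raw results are plain tuples of node numbers, they can be processed with ordinary Python. The sketch below uses hand-typed, hypothetical node numbers instead of a real call to `S.search`, and assumes a template with three lines (clause, first word, second word).

```python
# Hypothetical raw results: each tuple holds the node numbers of
# (clause, first word, second word), in the order of the template lines.
raw_results = [(427559, 27, 28), (427569, 63, 64), (427575, 90, 91)]

# Unpack each tuple and keep only the second word of every result.
second_words = [second for clause, first, second in raw_results]
print(second_words)
```

With TF loaded, such node numbers could then be fed to feature lookups like `F.lex.v(node)`.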

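## BHSA with Python

Now we take the pure Python approach. The script below loops over all words and prints the passage of every word with the lexeme `HLK[` (הלך).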

In [None]:
for w in F.otype.s("word"):
  if F.lex.v(w) == "HLK[":
    print(T.sectionFromNode(w))

It is nice to know where we can find the word הלך, but if we want to know more about these cases, we need to adapt the script a little bit. For instance, we want to know more about the morphology of the verb:

In [None]:
for w in F.otype.s("word"):
  if F.lex.v(w) == "HLK[":
    print(T.sectionFromNode(w), F.vt.v(w), F.vs.v(w), F.gn.v(w), F.nu.v(w))

In [None]:
for w in F.otype.s("word"):
    
  if F.lex.v(w) == "HLK[":
    
    clause = L.u(w, "clause")
    
    text_whole_clause = T.text(clause)
    print(T.sectionFromNode(w), F.vt.v(w), F.vs.v(w), F.gn.v(w), F.nu.v(w), text_whole_clause)

And of course, it can be important to export the data for further analysis.

In [None]:
hlk_dict = {}

for w in F.otype.s("word"):
  if F.lex.v(w) == "HLK[":
    clause = L.u(w, "clause")
    bo, ch, ve = T.sectionFromNode(w)
    text_whole_clause = T.text(clause)
    hlk_list = [bo, ch, ve, F.vt.v(w), F.vs.v(w), F.gn.v(w), F.nu.v(w), text_whole_clause]
    hlk_dict[w] = hlk_list

In [None]:
import pandas as pd

hlk_df = pd.DataFrame(hlk_dict).T
hlk_df

In [None]:
hlk_df.to_csv("query_results.csv", index=False)
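
The table above is exported without column names. As an alternative sketch, the standard library `csv` module can write the same kind of rows with a header. The column names and the rows below are illustrative placeholders typed by hand, not real query output, and the example writes to an in-memory buffer so that it runs anywhere.

```python
import csv
import io

# Hypothetical column names for the eight values stored per word.
header = ["book", "chapter", "verse", "vt", "vs", "gn", "nu", "clause_text"]

# Placeholder rows in the same layout as the values of hlk_dict.
rows = [
    ["Genesis", 2, 14, "ptca", "qal", "m", "sg", "placeholder clause text"],
    ["Genesis", 3, 8, "ptca", "hit", "m", "sg", "placeholder clause text"],
]

# Replace the buffer by open("query_results.csv", "w", newline="") to write a file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)

print(buffer.getvalue())
```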

In Colab you can then download the file to your own machine (uncomment the lines below):

In [None]:
#from google.colab import files
#files.download("query_results.csv")

In [None]:
query = '''
book book=Samuel_I
  clause
    word sp=nmpr
'''
results = A.search(query)
A.table(results, end=10)

## The Dead Sea Scrolls

Now, we move to the DSS module. First the data are downloaded and loaded using the incantation, just like we did with the BHSA.

In [None]:
from tf.app import use
A = use('dss', hoist=globals())

Which object types do we have in this module?

In [None]:
F.otype.all

In [None]:
object_count_dict = collections.defaultdict(int)

for node in N.walk():  # in TF 8 and later; older versions use N()
  object_count_dict[F.otype.v(node)] += 1
  
print(object_count_dict)
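
The same kind of tally can be made with `collections.Counter`, which also sorts the counts. The sketch below uses a short, hypothetical list of object types instead of the real nodes; with TF loaded, the list could be replaced by the `F.otype.v(node)` values from the loop above.

```python
from collections import Counter

# Hypothetical object types of a handful of nodes.
otypes = ["sign", "sign", "sign", "word", "word", "line", "sign", "word"]

object_counts = Counter(otypes)

# most_common() lists the object types from most to least frequent.
for otype, count in object_counts.most_common():
    print(otype, count)
```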

What are the names of the scrolls?

In [None]:
for scr in F.otype.s('scroll'):
    scroll_name = T.scrollName(scr)
    print(scroll_name)

In [None]:
for scr in F.otype.s('scroll'):
    scroll_name = T.scrollName(scr)
    
    if scroll_name == '1QS':
        words = L.d(scr, 'word')
        
        for w in words:
            print(w, F.lex.v(w), F.lexe.v(w), F.lexo.v(w), F.glex.v(w), F.glexe.v(w), F.glexo.v(w), F.biblical.v(w))


Let's explore one scroll: 1QIsaa, the [Great Isaiah Scroll](https://en.wikipedia.org/wiki/Isaiah_Scroll)!

In [None]:
for scr in F.otype.s('scroll'):
    scroll = T.scrollName(scr)
    
    if scroll == "1Qisaa":
        lines = L.d(scr, 'line')

You can compare the results below with a [picture](http://dss.collections.imj.org.il/isaiah#1:1) of the manuscript.

In [None]:
for l in lines[0:10]:
    A.plain(l)  # note that 1:1, 1:2, etc. refer to column:line in the manuscript

It is obvious that this is the book of Isaiah, but where in the book of Isaiah (chapter, verse) can we find these lines?

In [None]:
words = L.d(lines[0], 'word')
for word in words:
    print(F.book.v(word), F.chapter.v(word), F.verse.v(word), F.biblical.v(word))

In [None]:
for scr in F.otype.s('scroll'):
    if T.scrollName(scr) == '1QpHab':
        words = L.d(scr, 'word')
        
        print("Pesher Habakkuk contains", len(words), "words.")

        for word in words:
            print(F.biblical.v(word))

In [None]:
for l in lines[0:10]:
  A.plain(l, withPassage=False)

In [None]:
for l in lines[0:10]:
  A.plain(l, fmt='text-source-extra')

In [None]:
for l in lines[0:10]:
  A.plain(l, fmt='text-trans-extra')

In [None]:
for l in lines[0:10]:
  A.plain(l, fmt='layout-orig-full')

In [None]:
A.pretty(lines[0], fmt='layout-orig-full', withNodes=False, lineNumbers=False)

In [None]:
examplefragment = ('1Qisaa', '1')
f = T.nodeFromSection(examplefragment)
lines = L.d(f, otype='line')
words = L.d(f, otype='word')
signs = L.d(f, otype='sign')

Let's make some Search queries!

In [None]:
query = '''
line biblical=2

'''
results = A.search(query)
A.table(results, end=10)

In [None]:
query = '''
line 
  word vs=nifal nu=s
    sign unc
'''
results = A.search(query)
A.table(results, end=10)


In [None]:
query = '''
line 
  word book=Gen chapter=1

'''
results = A.search(query)
A.table(results)

In [None]:
query = '''
line
  word gn=m gn2=f

'''
results = A.search(query)
A.table(results)

In [None]:
query = '''
line
  sign cor=3

'''
results = A.search(query)
A.table(results)

In [None]:
query = '''
line
  word gn=m vt=ptca
  < word vs=qal

'''
results = A.search(query)
A.table(results)

## Biblical scrolls in the DSS module

Now we are ready to see where biblical texts occur in the DSS. Suppose we want to find all the places where a biblical book, say Deuteronomy, occurs in the DSS, and we want to compare those texts with the MT. How do we do that?

Our strategy is as follows:

1. We use the incantation for both the DSS and the BHSA, but we store the objects F, L, and T of the DSS under new names. This prevents them from being overwritten when the BHSA is loaded.
2. Then we iterate over all the scrolls. In the scrolls we iterate over all lines, and in each line, we iterate over each word.
3. For each word in a line we check if it is marked with the chosen biblical book.
4. If so, we retrieve which chapter and verse(s) (based on the BHSA) can be found in this line.
5. Then we retrieve the text of the verse(s) in the MT and the DSS line and print them.
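
Steps 2 to 5 can be sketched without TF on a tiny, made-up corpus. In the sketch below every scroll is a list of lines and every word is a hypothetical `(book, chapter, verse, text)` tuple; the names and values are placeholders and only show the control flow, not the real data model.

```python
# Hypothetical miniature corpus: scroll name -> lines -> words,
# where each word is a (book, chapter, verse, text) tuple.
scrolls = {
    "scrollA": [
        [("Deut", 5, 1, "w1"), ("Deut", 5, 1, "w2"), ("Deut", 5, 2, "w3")],
        [("Deut", 6, 4, "w4"), ("Deut", 6, 4, "w5")],
    ],
    "scrollB": [
        [(None, None, None, "w6"), (None, None, None, "w7")],
    ],
}

target_book, target_chapter = "Deut", 5

matches = []
for scroll_name, lines in scrolls.items():  # step 2: scrolls and lines
    for line in lines:
        # step 3: does the line contain words of the chosen book?
        if target_book not in {book for book, _, _, _ in line}:
            continue
        # step 4: which chapters and verses occur in this line?
        chapters = {ch for book, ch, _, _ in line if book == target_book}
        verses = sorted({v for book, _, v, _ in line if book == target_book})
        # step 5: keep the line text together with its references
        if target_chapter in chapters:
            line_text = " ".join(text for _, _, _, text in line)
            matches.append((scroll_name, target_chapter, verses, line_text))

for match in matches:
    print(match)
```

Only the first line of the hypothetical `scrollA` is reported, because it is the only line with words from chapter 5 of the chosen book.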

In [None]:
# STEP 1

from tf.app import use
A = use('dss', hoist=globals())

In [None]:
Fdss = F
Ldss = L
Tdss = T

In [None]:
from tf.app import use
A = use('bhsa', hoist=globals())

In [None]:
book_dict = {'Genesis':      'Gen',
             'Exodus':       'Ex',
             'Leviticus':    'Lev',
             'Numbers':      'Num',
             'Deuteronomy':  'Deut',
             'Joshua':       'Josh',
             'Judges':       'Judg',
             '1_Samuel':     '1Sam',
             '2_Samuel':     '2Sam',
             '1_Kings':      '1Kgs',
             '2_Kings':      '2Kgs',
             'Isaiah':       'Is',
             'Jeremiah':     'Jer',
             'Ezekiel':      'Ezek',
             'Hosea':        'Hos',
             'Joel':         'Joel',
             'Amos':         'Amos',
             'Obadiah':      'Obad',
             'Jonah':        'Jonah',
             'Micah':        'Mic',
             'Nahum':        'Nah',
             'Habakkuk':     'Hab',
             'Zephaniah':    'Zeph',
             'Haggai':       'Hag',
             'Zechariah':    'Zech',
             'Malachi':      'Mal',
             'Psalms':       'Ps',
             'Job':          'Job',
             'Proverbs':     'Prov',
             'Ruth':         'Ruth',
             'Song_of_songs':'Song',
             'Ecclesiastes': 'Eccl',
             'Lamentations': 'Lam',
             'Daniel':       'Dan',
             'Ezra':         'Ezra',
             '2_Chronicles': '2Chr'
            }
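
The mapping goes from BHSA book names to DSS abbreviations. If the opposite lookup is needed, the dictionary can be inverted. The sketch below repeats a small excerpt of the mapping so that it runs on its own; `dss_to_mt` is a name introduced here for illustration.

```python
# A small excerpt of the mapping, repeated here so that the example runs on its own.
book_dict_excerpt = {"Genesis": "Gen", "Deuteronomy": "Deut", "Habakkuk": "Hab"}

# Invert the mapping: DSS abbreviation -> BHSA book name.
dss_to_mt = {dss: mt for mt, dss in book_dict_excerpt.items()}

print(dss_to_mt["Hab"])
```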

In [None]:
# CHOOSE A BOOK FROM THE LEFT COLUMN IN THE book_dict AND CHOOSE A CHAPTER
MT_BOOK = 'Habakkuk'
CHAPTER = 2

DSS_BOOK = book_dict[MT_BOOK]

# STEP 2: iterate over all scrolls, and over all lines in each scroll
for scr in Fdss.otype.s('scroll'):
    lines = Ldss.d(scr, 'line')
    for line in lines:
        words = Ldss.d(line, 'word')
        biblical_book_per_word = [Fdss.book.v(w) for w in words]

        # STEP 3: does the line contain words from the chosen book?
        if DSS_BOOK in biblical_book_per_word:
            scr_name = Tdss.scrollName(scr)

            # STEP 4: which chapter(s) are found in this line?
            chapter = set([Fdss.chapter.v(w) for w in words])

            # STEP 5: print the MT verse(s) next to the DSS line
            try:
                # Chapter or verse values that are missing or not plain numbers
                # raise an exception here and are reported in the except branch.
                verses = set([int(Fdss.verse.v(w)) for w in words])
                ch = int(list(chapter)[0])
                if ch == CHAPTER:
                    for verse in sorted(verses):
                        section = (MT_BOOK, ch, verse)
                        mt_verse = T.nodeFromSection(section)
                        print(section)
                        mt_verse_words = L.d(mt_verse, 'word')
                        mt_text = ' '.join([F.g_cons_utf8.v(mt_w) for mt_w in mt_verse_words])
                        print('BHSA', MT_BOOK, ch, verse, mt_text)
                        print(scr_name, DSS_BOOK, ch, verse, Tdss.text(words))
                        print('\n')
            except Exception:
                print(scr_name, list(chapter)[0])