# Standard queries to explore the Samaritan Pentateuch 

This dataset of the Samaritan Pentateuch (SP) follows the same conventions as the [ETCBC dataset of the Masoretic Text (MT)](https://github.com/ETCBC/bhsa). We also use the same [Text-Fabric Python package](https://annotation.github.io/text-fabric/tf/index.html) to build and query the dataset. For a general introduction, we therefore recommend the Text-Fabric tutorial developed for the ETCBC database of the MT: https://nbviewer.org/github/ETCBC/bhsa/blob/master/tutorial/start.ipynb.

This notebook introduces a few standard queries for exploring the lexical, morphological, and syntactic features of the SP dataset.

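```python
from tf.app import use

# Load the SP corpus; Text-Fabric fetches the data on first use.
# hoist=globals() makes API names such as F, L, T and E available as globals.
A = use('dt-ucph/sp', hoist=globals())
```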

| Node type | Number of nodes | Slots per node (average) | Coverage (%) |
| --- | --- | --- | --- |
| book | 5 | 79878.4 | 100 |
| chapter | 187 | 2135.79 | 100 |
| verse | 5841 | 68.38 | 100 |
| phrase | 66407 | 6.01 | 100 |
| phrase_atom | 69984 | 5.71 | 100 |
| word | 114889 | 3.48 | 100 |
| sign | 399392 | 1.0 | 100 |
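In this overview the slot type is `sign` (each sign node covers exactly one slot), the slots-per-node column is an average, and the coverage column gives the percentage of slots covered by nodes of that type. As a quick sanity check that uses only the numbers in the table, multiplying the node count by the average number of slots per node should recover the total of 399,392 slots, up to the rounding of the averages. The names below are our own, chosen for illustration:

```python
# Numbers copied from the corpus overview table:
# node type -> (number of nodes, average slots per node)
OVERVIEW = {
    "book": (5, 79878.4),
    "chapter": (187, 2135.79),
    "verse": (5841, 68.38),
    "phrase": (66407, 6.01),
    "phrase_atom": (69984, 5.71),
    "word": (114889, 3.48),
    "sign": (399392, 1.0),
}

TOTAL_SLOTS = 399392  # number of sign nodes, i.e. the slots


def implied_total(node_type: str) -> float:
    """Estimate the total slot count from one row: nodes x average slots per node."""
    nodes, avg = OVERVIEW[node_type]
    return nodes * avg


for node_type in OVERVIEW:
    estimate = implied_total(node_type)
    # Relative deviation, caused by the averages being rounded to two decimals.
    deviation = abs(estimate - TOTAL_SLOTS) / TOTAL_SLOTS
    print(f"{node_type:12} {estimate:12.1f} {deviation:.4%}")
```

All rows land within a fraction of a percent of the total, which is what one expects if every node type covers 100% of the slots.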


## 1. Lexemes

The lexeme is a word-level feature. As far as possible, we use the same lexicon as the ETCBC database of the MT, which can be browsed on SHEBANQ: https://shebanq.ancient-data.org/hebrew/words.
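Lexemes are written in the ETCBC transliteration of the Hebrew consonants, so >LHJM/ stands for אלהים. The sketch below renders such a lexeme string in Hebrew script. The consonant table is the ETCBC transliteration as we understand it; treating every other character (such as `/`, `[` or `=`) as a lexeme marker to be dropped is an assumption of this sketch, and `lex_to_hebrew` is a hypothetical helper, not part of Text-Fabric:

```python
# ETCBC consonant transliteration -> Hebrew letters.
# F (sin) and C (shin) are both rendered as plain ש here, without diacritic dots.
ETCBC_TO_HEBREW = {
    ">": "א", "B": "ב", "G": "ג", "D": "ד", "H": "ה", "W": "ו",
    "Z": "ז", "X": "ח", "V": "ט", "J": "י", "K": "כ", "L": "ל",
    "M": "מ", "N": "נ", "S": "ס", "<": "ע", "P": "פ", "Y": "צ",
    "Q": "ק", "R": "ר", "F": "ש", "C": "ש", "T": "ת",
}
# Final letter forms, used for the last letter of the lexeme.
FINAL_FORMS = {"K": "ך", "M": "ם", "N": "ן", "P": "ף", "Y": "ץ"}


def lex_to_hebrew(lex: str) -> str:
    """Render an ETCBC lexeme such as '>LHJM/' in Hebrew script.

    Assumption: any character outside the consonant table is a lexeme marker
    and is dropped.
    """
    letters = [ch for ch in lex if ch in ETCBC_TO_HEBREW]
    hebrew = []
    for i, ch in enumerate(letters):
        is_last = i == len(letters) - 1
        hebrew.append(FINAL_FORMS[ch] if is_last and ch in FINAL_FORMS else ETCBC_TO_HEBREW[ch])
    return "".join(hebrew)


print(lex_to_hebrew(">LHJM/"))  # אלהים
print(lex_to_hebrew("RWX/"))    # רוח
```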

The following query searches for all words with the lexeme >LHJM/ (אלהים):

```python
query1 = '''
word lex=>LHJM/
'''
results1 = A.search(query1)
A.show(results1)
```

```
0.08s 858 results
```
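`A.search` returns a list of tuples of node numbers, with one node per line of the template, so each result of query 1 is a 1-tuple holding a word node. To see how the 858 hits are distributed over the five books, one can tally the book of the first node of each tuple. The helper below is a hypothetical sketch: it takes a `section_of` callable, and in the notebook one would pass Text-Fabric's `T.sectionFromNode`, which returns a (book, chapter, verse) tuple for a node. The toy nodes and book names are invented:

```python
from collections import Counter
from typing import Callable, Iterable, Tuple

Section = Tuple[str, int, int]  # (book, chapter, verse)


def hits_per_book(results: Iterable[Tuple[int, ...]],
                  section_of: Callable[[int], Section]) -> Counter:
    """Count search results per book, using the first node of each result tuple."""
    counts: Counter = Counter()
    for result in results:
        book = section_of(result[0])[0]
        counts[book] += 1
    return counts


# Toy stand-in for T.sectionFromNode, with invented word nodes and sections.
toy_sections = {101: ("Genesis", 1, 1), 102: ("Genesis", 1, 2), 205: ("Exodus", 3, 4)}
toy_hits = [(101,), (102,), (205,)]
print(hits_per_book(toy_hits, toy_sections.__getitem__))

# In the notebook: hits_per_book(results1, T.sectionFromNode)
```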


You can also search for two or more words within the same larger unit, e.g., the lexemes >LHJM/ (אלהים) and RWX/ (רוח) within the same verse:

```python
query2 = '''
verse
 word lex=>LHJM/
 word lex=RWX/
'''
results2 = A.search(query2)
A.show(results2)
```

```
0.17s 12 results
```
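Each of the 12 results is a tuple (verse, word, word). Because Text-Fabric search returns every combination of nodes that satisfies the template, a verse that contains >LHJM/ twice and RWX/ once yields two results, so the number of results can exceed the number of distinct verses. The toy model below mimics that behaviour with invented node numbers and a hypothetical `cooccurrences` function; on the real results, `len({r[0] for r in results2})` counts the distinct verses:

```python
from itertools import product
from typing import Dict, List, Tuple

# Invented toy corpus: verse node -> list of (word node, lexeme).
TOY_VERSES: Dict[int, List[Tuple[int, str]]] = {
    10: [(1, ">LHJM/"), (2, "RWX/"), (3, ">LHJM/")],
    11: [(4, ">LHJM/")],
    12: [(5, "RWX/"), (6, ">LHJM/")],
}


def cooccurrences(verses: Dict[int, List[Tuple[int, str]]],
                  lex_a: str, lex_b: str) -> List[Tuple[int, int, int]]:
    """Return all (container, word_a, word_b) tuples, like an unordered two-word template."""
    results = []
    for verse, words in verses.items():
        hits_a = [w for w, lex in words if lex == lex_a]
        hits_b = [w for w, lex in words if lex == lex_b]
        # Every combination of matching words counts as a separate result.
        results.extend((verse, a, b) for a, b in product(hits_a, hits_b))
    return results


toy_pairs = cooccurrences(TOY_VERSES, ">LHJM/", "RWX/")
print(len(toy_pairs))                 # number of result tuples
print(len({r[0] for r in toy_pairs}))  # number of distinct verses
```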


## 2. Morphology

The SP dataset is annotated with morphological features at the word level. The following query searches for *qatal* (perfective) verbs in the third person feminine singular:

```python
query3 = '''
word vt=perf ps=p3 gn=f nu=sg
'''
results3 = A.search(query3)
A.show(results3)
```

```
0.09s 312 results
```
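When comparing several person, gender, and number combinations, it can be convenient to build the template string from a dictionary of feature conditions. The helper `word_template` below is hypothetical, not part of Text-Fabric; it only assembles the same `feature=value` syntax used in query 3. The values in the commented loop (m/f, sg/pl) follow the ETCBC conventions, but the full value sets should be checked against the feature documentation:

```python
from typing import Mapping


def word_template(conditions: Mapping[str, str], otype: str = "word") -> str:
    """Build a one-line Text-Fabric search template such as 'word vt=perf ps=p3'.

    Hypothetical helper: it only joins 'feature=value' pairs after the node type.
    """
    parts = [otype] + [f"{feature}={value}" for feature, value in conditions.items()]
    return " ".join(parts)


template = word_template({"vt": "perf", "ps": "p3", "gn": "f", "nu": "sg"})
print(template)  # word vt=perf ps=p3 gn=f nu=sg

# Sketch of a loop over gender and number (run inside the notebook, where A exists):
# for gn in ("m", "f"):
#     for nu in ("sg", "pl"):
#         res = A.search(word_template({"vt": "perf", "ps": "p3", "gn": gn, "nu": nu}))
```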


It is also possible to search for the actual morphemes, e.g., the pronominal suffix morpheme +M (the + marks a pronominal suffix):

```python
query4 = '''
word g_prs=+M
'''
results4 = A.search(query4)
A.show(results4)
```

```
0.08s 1059 results
```


## 3. Syntax

Words are embedded in syntactic structures. The SP dataset marks phrase and phrase-atom boundaries, so you can search for co-occurring words not only within the same verse but also within the same phrase or phrase atom. The following example searches for phrase atoms that contain both >LHJM/ (אלהים) and RWX/ (רוח) (compare with query 2 above):

```python
query5 = '''
phrase_atom
 word lex=>LHJM/
 word lex=RWX/
'''
results5 = A.search(query5)
A.show(results5)
```

```
0.18s 7 results
```
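Comparing query 5 with query 2 can reveal the verses that contain both lexemes without the two co-occurring in any single phrase atom. The function below is a hypothetical sketch. It assumes that every phrase atom lies inside one verse, and it takes a `verse_of` callable; in the notebook one could use Text-Fabric's `L.u`, which returns the nodes that embed a given node. The toy data are invented:

```python
from typing import Callable, Iterable, Set, Tuple


def verses_without_atom_cooccurrence(
    verse_results: Iterable[Tuple[int, ...]],
    atom_results: Iterable[Tuple[int, ...]],
    verse_of: Callable[[int], int],
) -> Set[int]:
    """Verses from verse_results that contain none of the phrase atoms in atom_results."""
    verses_any = {r[0] for r in verse_results}            # first node is the verse
    verses_atom = {verse_of(r[0]) for r in atom_results}  # first node is the phrase atom
    return verses_any - verses_atom


# Invented toy data: verses 10 and 12 contain both lexemes;
# phrase atom 100, which contains both, lies in verse 10.
toy_verse_results = [(10, 1, 2), (10, 3, 2), (12, 6, 5)]
toy_atom_results = [(100, 1, 2)]
print(verses_without_atom_cooccurrence(toy_verse_results, toy_atom_results, {100: 10}.__getitem__))

# In the notebook (assumption: phrase atoms never cross verse boundaries,
# otherwise L.u(...) may be empty and [0] fails):
# verses_without_atom_cooccurrence(results2, results5, lambda n: L.u(n, otype='verse')[0])
```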


It is also possible to search for phrases that consist of more than one phrase atom, for instance discontinuous phrases that are interrupted by intervening text. The following example searches for phrases that contain at least two phrase atoms, one preceding the other:

```python
query6 = '''
verse
 phrase
  phrase_atom
  < phrase_atom
'''
results6 = A.search(query6)
A.show(results6)
```

```
0.23s 5544 results
```
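Two remarks on query 6, sketched in plain Python with invented data. First, the template matches every pair of distinct phrase atoms within a phrase (the earlier atom first), so a phrase with k atoms yields k(k-1)/2 results rather than one; since the phrase is the second node of each tuple, `len({r[1] for r in results6})` counts the distinct phrases. Second, a multi-atom phrase is not necessarily discontinuous: a phrase is discontinuous when its slots do not form one unbroken run. In the notebook the slots of a phrase `p` can be obtained with Text-Fabric's `E.oslots.s(p)`; the helper names below are hypothetical:

```python
from itertools import combinations
from typing import Iterable, List, Tuple


def atom_pairs(atoms: List[int]) -> List[Tuple[int, int]]:
    """All pairs of distinct atoms; each pair is one 'phrase_atom < phrase_atom' match."""
    return list(combinations(atoms, 2))


def is_discontinuous(slots: Iterable[int]) -> bool:
    """True if the slots do not form a single run of consecutive slot numbers."""
    ordered = sorted(set(slots))
    return bool(ordered) and ordered[-1] - ordered[0] + 1 != len(ordered)


# A phrase with three (invented) phrase atoms gives three results, not one.
print(len(atom_pairs([501, 502, 503])))

print(is_discontinuous([1, 2, 3, 4]))  # contiguous phrase
print(is_discontinuous([1, 2, 7, 8]))  # slots 3-6 belong to intervening text

# In the notebook (assumption): count the discontinuous phrases among the results
# sum(is_discontinuous(E.oslots.s(p)) for p in {r[1] for r in results6})
```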
