# Export BH Reference System Features


This notebook exports five new features that enable a student or researcher of Biblical Hebrew (BH) to study its reference system. Like other languages, BH uses a range of linguistic devices to establish text coherence and to make clear to the reader or listener who or what is being referred to. These devices, together with their Person, Gender and Number (PGN) properties where present, include:

* Personal pronouns, `prps`; 
* Demonstrative pronouns, `prde`; 
* The `verb` and its suffix `prs` (if the verb has one); 
* The suffix `prs` of a word; 
* `Nouns` that can function as pronouns. I leave these out of consideration, for now. 

The BHSA dataset (see: [BHSA](https://github.com/ETCBC/bhsa)), which this notebook accesses through Text-Fabric, already provides the features named above.

However, these features are sometimes available only in a transliterated or full BH form (`prps`, `prde`, `prs`). In addition, several other features must be combined to determine the PGN property of pronouns, verbs, and suffixes. For example, retrieving the PGN of a verb requires five separate features: `pdp`, `nu`, `ps`, `gn` and `prs`.

Therefore, to simplify research and programming, this notebook exports five new features: `pgn_prde`, `pgn_prps`, `pgn_prs`, `pgn_verb`, and `pgn_verb_prs` (i.e. verb+suffix).
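To illustrate why a single readable label helps, here is a minimal sketch of combining separate person, gender and number values into one label of the form `P3Mpl`. The function `compose_pgn` is hypothetical, and the value sets (`p1` to `p3`, `m`/`f`, `sg`/`du`/`pl`) are assumptions about how BHSA encodes `ps`, `gn` and `nu`. The notebook itself relies on `converse_pgn` and `pgn_dict` from `participant_helpers`, whose implementation is not shown here and may differ.

```python
# Hypothetical helper: not the notebook's converse_pgn/pgn_dict.
PERSON = {'p1': 'P1', 'p2': 'P2', 'p3': 'P3'}
GENDER = {'m': 'M', 'f': 'F'}
NUMBER = {'sg', 'du', 'pl'}

def compose_pgn(ps, gn, nu):
    """Join person, gender and number into a label such as 'P3Mpl'.

    Values outside the assumed sets (e.g. 'unknown', 'NA') are skipped.
    """
    person = PERSON.get(ps, '')
    gender = GENDER.get(gn, '')
    number = nu if nu in NUMBER else ''
    return person + gender + number

print(compose_pgn('p3', 'm', 'pl'))
print(compose_pgn('p1', 'unknown', 'sg'))
```

With a label like this stored as one feature, a query needs a single string comparison instead of three separate feature checks.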

In part 2 of this notebook I will demonstrate some of the things you can do with the new features.

## First import some modules and utils

In [3]:
import sys, os, re, pickle, csv
import collections, difflib
from collections import *

from IPython.display import HTML, display_pretty, display_html
%matplotlib inline
from random import random

from pprint import pprint
from tf.fabric import Fabric
from tf.transcription import Transcription
from participant_helpers import * 

## Pull in the data

In [4]:
DATABASE = ['~/github/etcbc/bhsa', '~/github/bh-reference-system']
BHSA = 'tf/c'
TF = Fabric(locations=DATABASE, modules=BHSA, silent=False )

This is Text-Fabric 4.0.3
Api reference : https://dans-labs.github.io/text-fabric/Api/General/
Tutorial      : https://github.com/Dans-labs/text-fabric/blob/master/docs/tutorial.ipynb
Example data  : https://github.com/Dans-labs/text-fabric-data

114 features found and 0 ignored


In [5]:
api = TF.load('''
    otype
    lex book chapter verse
    nu ps gn prs ls
    function sp typ pdp language
''', silent=True)

api.makeAvailableIn(globals())

## 1. Export personal pronouns as readable TF Feature 

In [22]:
def export_prps():
    
    readable_prps = {}
    
    meta = {'': {'created_by': 'Christiaan Erwich',
                 'coreData': 'BHSA',
                 'coreVersion': 'c'
                },
            'pgn_prps' : {'source': 'see the notebooks at https://github.com/cmerwich/bh-reference-system',
                      'valueType': 'str',
                      'edgeValues': False}
           }

    for word in F.otype.s('word'):
        pdp = F.pdp.v(word) # phrase dependent part-of-speech
        lex = F.lex.v(word) # lexeme

        # if the word is a personal pronoun
        if pdp == 'prps':
            if lex in prps_dict:
                readable_prps[word] = prps_dict[lex][0]

    new_nodes = {'pgn_prps': readable_prps}

    saveTF = Fabric('tf/c')
    saveTF.save(nodeFeatures=new_nodes, edgeFeatures={}, metaData=meta)

export_prps()

This is Text-Fabric 3.2.5
Api reference : https://github.com/Dans-labs/text-fabric/wiki/Api
Tutorial      : https://github.com/Dans-labs/text-fabric/blob/master/docs/tutorial.ipynb
Example data  : https://github.com/Dans-labs/text-fabric-data

0 features found and 0 ignored


  0.00s Warp feature "otype" not found in

  0.00s Warp feature "oslots" not found in



  0.00s Warp feature "otext" not found. Working without Text-API

  0.00s Exporting 1 node and 0 edge and 0 config features to tf/c:
   |     0.02s T pgn_prps             to tf/c
  0.02s Exported 1 node features and 0 edge features and 0 config features to tf/c
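The core of `export_prps` is a lookup from lexeme to readable label. The sketch below mimics that step on stand-in data, without Text-Fabric. The contents of `prps_dict_example` and the word records are illustrative assumptions; the real `prps_dict` comes from `participant_helpers` and is not shown in this notebook. Only the shape (a lexeme mapped to a sequence whose first item is the label) is taken from the code above.

```python
# Illustrative stand-in for prps_dict: lexeme -> (readable label, ...).
prps_dict_example = {
    '>NJ': ('P1sg',),
    'HW>': ('P3Msg',),
    'HJ>': ('P3Fsg',),
}

# Stand-in for the word nodes: node number -> (pdp, lex).
words = {
    1: ('prps', 'HW>'),
    2: ('verb', '>MR['),
    3: ('prps', '>NJ'),
    4: ('prps', 'XYZ'),  # a lexeme missing from the dictionary is skipped
}

def label_pronouns(words, lookup):
    """Return {node: label} for personal pronouns with a known lexeme."""
    labels = {}
    for node, (pdp, lex) in words.items():
        if pdp == 'prps' and lex in lookup:
            labels[node] = lookup[lex][0]
    return labels

print(label_pronouns(words, prps_dict_example))
```

The resulting dictionary has the same node-to-string shape as the one the export function passes to `save` under `nodeFeatures`.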


#### Check if the new feature loads properly

In [23]:
TF.load('pgn_prps', add=True)
loadLog()

  0.00s loading features ...
   |     0.05s T pgn_prps             from /Users/Christiaan/github/bh-reference-system/tf/c
  0.06s All additional features loaded - for details use loadLog()
   |     0.00s M otext                from /Users/Christiaan/github/etcbc/bhsa/tf/c
   |     0.05s B otype                from /Users/Christiaan/github/etcbc/bhsa/tf/c
   |     0.84s B oslots               from /Users/Christiaan/github/etcbc/bhsa/tf/c
   |     0.00s M otext                from /Users/Christiaan/github/etcbc/bhsa/tf/c
   |     0.01s B book                 from /Users/Christiaan/github/etcbc/bhsa/tf/c
   |     0.01s B chapter              from /Users/Christiaan/github/etcbc/bhsa/tf/c
   |     0.02s B verse                from /Users/Christiaan/github/etcbc/bhsa/tf/c
   |     0.19s B g_cons               from /Users/Christiaan/github/etcbc/bhsa/tf/c
   |     0.28s B g_cons_utf8          from /Users/Christiaan/github/etcbc/bhsa/tf/c
   |     0.20s B g_lex                from /Users/Chris

## 2. Export demonstrative pronouns as readable TF Feature 

In [None]:
def export_prde():
    
    readable_prde = {}
    
    meta = {'': {'created_by': 'Christiaan Erwich',
                 'coreData': 'BHSA',
                 'coreVersion': 'c'
                },
            'pgn_prde' : {'source': 'see the notebooks at https://github.com/cmerwich/bh-reference-system',
                      'valueType': 'str',
                      'edgeValues': False}
           }

    for word in F.otype.s('word'):
        pdp = F.pdp.v(word) # phrase dependent part-of-speech
        lex = F.lex.v(word) # lexeme

        # if the word is a demonstrative pronoun
        if pdp == 'prde':
            if lex in prde_dict:
                readable_prde[word] = prde_dict[lex][0]

    new_nodes = {'pgn_prde': readable_prde}

    saveTF = Fabric('tf/c')
    saveTF.save(nodeFeatures=new_nodes, edgeFeatures={}, metaData=meta)

export_prde()
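The export functions in this notebook repeat the same metadata dictionary, differing only in the feature name. As a possible simplification, that dictionary could be built by a small helper. `make_meta` is a hypothetical name, not part of Text-Fabric or of this notebook; the keys and values are copied from the cells above.

```python
SOURCE = 'see the notebooks at https://github.com/cmerwich/bh-reference-system'

def make_meta(feature_name):
    """Build the metadata dictionary for one exported node feature."""
    return {
        # The empty key holds metadata shared by all features.
        '': {
            'created_by': 'Christiaan Erwich',
            'coreData': 'BHSA',
            'coreVersion': 'c',
        },
        # Metadata specific to the exported feature.
        feature_name: {
            'source': SOURCE,
            'valueType': 'str',
            'edgeValues': False,
        },
    }

meta = make_meta('pgn_prde')
print(sorted(meta))
```

Each export function could then call `make_meta` with its own feature name instead of spelling out the dictionary again.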

#### Check if the new feature loads properly

In [24]:
TF.load('pgn_prde', add=True)
loadLog()

  0.00s loading features ...
   |     0.01s B pgn_prde             from /Users/Christiaan/github/bh-reference-system/tf/c
  0.03s All additional features loaded - for details use loadLog()
   |     0.00s M otext                from /Users/Christiaan/github/etcbc/bhsa/tf/c
   |     0.01s B pgn_prde             from /Users/Christiaan/github/bh-reference-system/tf/c


## 3. Export PGN of the verb

In [13]:
def export_verb_pgn():
    
    readable_verb_pgn = {}
    
    meta = {'': {'created_by': 'Christiaan Erwich',
                 'coreData': 'BHSA',
                 'coreVersion': 'c'
                },
            'pgn_verb' : {'source': 'see the notebooks at https://github.com/cmerwich/bh-reference-system',
                      'valueType': 'str',
                      'edgeValues': False}
           }

    for word in F.otype.s('word'):
        pdp = F.pdp.v(word) # phrase dependent part-of-speech

        # if the word is a verb
        if pdp == 'verb':
            readable_verb_pgn[word] = pgn_dict[converse_pgn(F, word)][0]
       
    #pprint(readable_verb_pgn)
    new_nodes = {'pgn_verb': readable_verb_pgn}

    saveTF = Fabric('tf/c')
    saveTF.save(nodeFeatures=new_nodes, edgeFeatures={}, metaData=meta)

export_verb_pgn()

This is Text-Fabric 3.2.5
Api reference : https://github.com/Dans-labs/text-fabric/wiki/Api
Tutorial      : https://github.com/Dans-labs/text-fabric/blob/master/docs/tutorial.ipynb
Example data  : https://github.com/Dans-labs/text-fabric-data

0 features found and 0 ignored


  0.00s Warp feature "otype" not found in

  0.00s Warp feature "oslots" not found in



  0.01s Warp feature "otext" not found. Working without Text-API

  0.00s Exporting 1 node and 0 edge and 0 config features to tf/c:
   |     0.20s T pgn_verb             to tf/c
  0.20s Exported 1 node features and 0 edge features and 0 config features to tf/c


In [14]:
# Check if the entries in readable_verb_pgn are correct 

print(T.text(range(569,570)), T.sectionFromNode(569))

כִבְשֻׁ֑הָ  ('Genesis', 1, 28)


#### Check if the new feature loads properly

In [20]:
TF.load('pgn_verb', add=True)

  0.00s loading features ...
   |     0.37s T pgn_verb             from /Users/Christiaan/github/bh-reference-system/tf/c
  0.39s All additional features loaded - for details use loadLog()


#### Nota bene

The `pgn_verb` feature gives the PGN property of the verb and its suffix in one string, e.g. 'P2Mpl P3Fsg_o'. To retrieve the PGN of either the verb or the suffix, use `str.split()`. The `_o` or `_s` ending indicates whether the suffix is an object suffix or a subject suffix.

If the verb has no suffix, `pgn_verb` gives only the PGN of the verb, e.g. 'P3Mpl'.
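A minimal sketch of that splitting step, covering values with and without a suffix. `split_pgn` is a hypothetical helper name, and it assumes that a suffix part, when present, always ends in `_o` or `_s` as in the examples given here.

```python
def split_pgn(value):
    """Split a label into (verb_pgn, suffix_pgn, suffix_role).

    suffix_pgn and suffix_role are None when the verb has no suffix.
    Assumes the suffix part, if present, ends in '_o' or '_s'.
    """
    parts = value.split()
    verb_pgn = parts[0]
    if len(parts) == 1:
        return verb_pgn, None, None
    # 'P3Fsg_o' -> ('P3Fsg', '_', 'o')
    suffix_pgn, _, role = parts[1].rpartition('_')
    return verb_pgn, suffix_pgn, role

print(split_pgn('P2Mpl P3Fsg_o'))
print(split_pgn('P3Mpl'))
```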

## 4. Export PGN of the verb and its suffix

In [11]:
def export_verbprs_pgn():
    
    readable_verbprs_pgn = {}
    
    meta = {'': {'created_by': 'Christiaan Erwich',
                 'coreData': 'BHSA',
                 'coreVersion': 'c'
                },
            'pgn_verb_prs' : {'source': 'see the notebooks at https://github.com/cmerwich/bh-reference-system',
                      'valueType': 'str',
                      'edgeValues': False}
           }

    for word in F.otype.s('word'):
        prs = F.prs.v(word) # pronominal suffix -consonantal-transliterated
        pdp = F.pdp.v(word) # phrase dependent part-of-speech
       
        phrase = L.u(word, 'phrase')[0]
        
        # if the word is a verb with a pronominal suffix, also mark the suffix as object (_o) or subject (_s)
        if pdp == 'verb' and prs not in {'n/a', 'absent'}:
            if F.function.v(phrase) == 'PreO':
                readable_verbprs_pgn[word] = pgn_dict[converse_pgn(F, word)][0] + ' ' + suffix_dict[prs][0] + '_o'
            elif F.function.v(phrase) == 'PreS':
                readable_verbprs_pgn[word] = pgn_dict[converse_pgn(F, word)][0] + ' ' + suffix_dict[prs][0] + '_s'
            elif F.function.v(phrase) == 'PtcO':
                readable_verbprs_pgn[word] = pgn_dict[converse_pgn(F, word)][0] + ' ' + suffix_dict[prs][0] + '_o'

    #pprint(readable_verbprs_pgn)
    new_nodes = {'pgn_verb_prs': readable_verbprs_pgn}

    saveTF = Fabric('tf/c')
    saveTF.save(nodeFeatures=new_nodes, edgeFeatures={}, metaData=meta)

export_verbprs_pgn()

This is Text-Fabric 3.2.5
Api reference : https://github.com/Dans-labs/text-fabric/wiki/Api
Tutorial      : https://github.com/Dans-labs/text-fabric/blob/master/docs/tutorial.ipynb
Example data  : https://github.com/Dans-labs/text-fabric-data

0 features found and 0 ignored


  0.00s Warp feature "otype" not found in

  0.00s Warp feature "oslots" not found in



  0.00s Warp feature "otext" not found. Working without Text-API

  0.00s Exporting 1 node and 0 edge and 0 config features to tf/c:
   |     0.03s T pgn_verb_prs         to tf/c
  0.03s Exported 1 node features and 0 edge features and 0 config features to tf/c
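The three branches in `export_verbprs_pgn` differ only in the marker appended to the suffix label. A table-driven variant makes that mapping explicit. The sketch below is an alternative formulation, not the notebook's code: `SUFFIX_ROLE` and `verb_suffix_label` are hypothetical names, and the label arguments stand in for the values that `pgn_dict` and `suffix_dict` would supply.

```python
# Phrase function -> marker, taken from the branches of export_verbprs_pgn.
SUFFIX_ROLE = {'PreO': '_o', 'PreS': '_s', 'PtcO': '_o'}

def verb_suffix_label(verb_pgn, suffix_pgn, phrase_function):
    """Return 'verb_pgn suffix_pgn_marker', or None for other phrase functions."""
    marker = SUFFIX_ROLE.get(phrase_function)
    if marker is None:
        return None
    return verb_pgn + ' ' + suffix_pgn + marker

print(verb_suffix_label('P2Mpl', 'P3Fsg', 'PreO'))
print(verb_suffix_label('P3Msg', 'P1sg', 'Pred'))
```

As in the original branches, a verb with a suffix in a phrase whose function is not listed receives no value.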


In [21]:
TF.load('pgn_verb_prs', add=True)

  0.00s loading features ...
   |     0.05s T pgn_verb_prs         from /Users/Christiaan/github/bh-reference-system/tf/c
  0.06s All additional features loaded - for details use loadLog()


#### Nota bene

The `pgn_verb_prs` feature gives the PGN property of the verb and its suffix in one string, e.g. 'P2Mpl P3Fsg_o'. To retrieve the PGN of either the verb or the suffix, use `str.split()`. The `_o` or `_s` ending indicates whether the suffix is an object suffix or a subject suffix.
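Once a single label has been isolated, it can be broken down further into person, gender, number and suffix role. The sketch below uses a regular expression for this. The label grammar it encodes (an optional gender letter, `du` as a possible number, an optional `_o`/`_s` ending) is an assumption extrapolated from examples such as 'P2Mpl' and 'P3Fsg_o'; `parse_pgn` is a hypothetical helper, and the labels actually produced by `pgn_dict` and `suffix_dict` may need a different pattern.

```python
import re

# Assumed grammar: P<person>[<gender>]<number>, optionally followed by _o or _s.
PGN_PATTERN = re.compile(
    r'^P(?P<person>[123])(?P<gender>[MF])?(?P<number>sg|du|pl)(?:_(?P<role>[os]))?$'
)

def parse_pgn(label):
    """Return a dict of the label's components, or None if it does not match."""
    match = PGN_PATTERN.match(label)
    return match.groupdict() if match else None

print(parse_pgn('P3Fsg_o'))
print(parse_pgn('P2Mpl'))
```

Components that are absent from a label, such as the role of a verb label, come back as `None`.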

## 5. Export PGN of the suffix

In [75]:
def export_prs():
    
    readable_prs = {}
    
    meta = {'': {'created_by': 'Christiaan Erwich',
                 'coreData': 'BHSA',
                 'coreVersion': 'c'
                },
            'pgn_prs' : {'source': 'see the notebooks at https://github.com/cmerwich/bh-reference-system',
                      'valueType': 'str',
                      'edgeValues': False}
           }

    for word in F.otype.s('word'):
        prs = F.prs.v(word) # pronominal suffix -consonantal-transliterated

        # if the word has a suffix
        if prs not in {'n/a', 'absent'}:
            readable_prs[word] = suffix_dict[prs][0]
    
    new_nodes = {'pgn_prs': readable_prs}
    
    saveTF = Fabric('tf/c')
    saveTF.save(nodeFeatures=new_nodes, edgeFeatures={}, metaData=meta)

export_prs()

This is Text-Fabric 3.2.2
Api reference : https://github.com/Dans-labs/text-fabric/wiki/Api
Tutorial      : https://github.com/Dans-labs/text-fabric/blob/master/docs/tutorial.ipynb
Example data  : https://github.com/Dans-labs/text-fabric-data

0 features found and 0 ignored


  0.00s Warp feature "otype" not found in

  0.01s Warp feature "oslots" not found in



  0.01s Warp feature "otext" not found. Working without Text-API

  0.00s Exporting 1 node and 0 edge and 0 config features to tf/c:
   |     0.12s T pgn_prs              to tf/c
  0.12s Exported 1 node features and 0 edge features and 0 config features to tf/c
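`export_prs` indexes `suffix_dict` directly, so if `suffix_dict` is a plain dictionary, a suffix value missing from it would raise a `KeyError`. A defensive variant of the filter could skip such values instead. This is a sketch on stand-in data: `label_suffix` is a hypothetical helper, and `suffix_dict_example` holds a few illustrative entries that are assumptions, not the contents of the real `suffix_dict` from `participant_helpers`.

```python
# Values that mean "no suffix"; None covers nodes without a value at all.
NO_SUFFIX = {'n/a', 'absent', None}

# Illustrative stand-in for suffix_dict: prs value -> (readable label, ...).
suffix_dict_example = {
    'W': ('P3Msg',),
    'H': ('P3Fsg',),
    'KM': ('P2Mpl',),
}

def label_suffix(prs, lookup):
    """Return the readable label for a suffix, or None if there is none."""
    if prs in NO_SUFFIX:
        return None
    entry = lookup.get(prs)  # no KeyError for unknown suffix values
    return entry[0] if entry else None

print(label_suffix('W', suffix_dict_example))
print(label_suffix('absent', suffix_dict_example))
```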


#### Check if the new feature loads properly

In [17]:
TF.load('pgn_prs', add=True)

  0.00s loading features ...


   |     0.00s Feature "pgn_verb_prs" not available in
   |   /Users/Christiaan/github/etcbc/bhsa/tf/c
   |   	/Users/Christiaan/github/bh-reference-system/tf/c
  0.01s Not all features could be loaded/computed


   |     0.00s M otext                from /Users/Christiaan/github/etcbc/bhsa/tf/c
