# Simple Data Export

In this notebook we export a series of .txt files containing data from the ETCBC Hebrew database. We export narrative texts only, drawn from two general classes of texts: those traditionally labeled "Early Biblical Hebrew" and those considered "Late Biblical Hebrew".

We will export here the following type(s) of data:
1. [phrase constituent](https://etcbc.github.io/text-fabric-data/features/hebrew/etcbc4c/pdp) functions (words, also known as "part of speech") per clause in late/early Biblical Hebrew sources.


The data is accessed using [Text-Fabric](https://github.com/ETCBC/text-fabric), a Python package made specially for accessing corpora like the ETCBC Hebrew database.

## Load Text-Fabric and ETCBC Syntactic Data

In [1]:
import collections
from tf.fabric import Fabric # for Text-Fabric

In [2]:
# instantiate Text-Fabric (TF) objects

TF = Fabric(modules='hebrew/etcbc4c') # load ETCBC Hebrew database

This is Text-Fabric 2.3.7
Api reference : https://github.com/ETCBC/text-fabric/wiki/Api
Tutorial      : https://github.com/ETCBC/text-fabric/blob/master/docs/tutorial.ipynb
Data sources  : https://github.com/ETCBC/text-fabric-data
Data docs     : https://etcbc.github.io/text-fabric-data
Shebanq docs  : https://shebanq.ancient-data.org/text
Slack team    : https://shebanq.slack.com/signup
Questions? Ask shebanq@ancient-data.org for an invite to Slack
108 features found and 0 ignored


In [3]:
# load features for linguistic objects (i.e. clauses, phrases, words) from the database

# features loaded in a string, space separated
api = TF.load('''
              book chapter verse
              typ pdp function
              domain
              ''')

# TF classes are globalized for easier use
api.makeAvailableIn(globals())

  0.00s loading features ...
   |     0.10s B book                 from /mnt/shared/text-fabric-data/hebrew/etcbc4c
   |     0.10s B chapter              from /mnt/shared/text-fabric-data/hebrew/etcbc4c
   |     0.12s B verse                from /mnt/shared/text-fabric-data/hebrew/etcbc4c
   |     0.34s B typ                  from /mnt/shared/text-fabric-data/hebrew/etcbc4c
   |     0.26s B pdp                  from /mnt/shared/text-fabric-data/hebrew/etcbc4c
   |     0.22s B function             from /mnt/shared/text-fabric-data/hebrew/etcbc4c
   |     0.13s B domain               from /mnt/shared/text-fabric-data/hebrew/etcbc4c
   |     0.08s Feature overview: 102 for nodes; 5 for edges; 1 configs; 7 computed
    15s All features loaded/computed - for details use loadLog()


## Gather, Arrange, and Export Data

ETCBC data is stored in a graph structure in which linguistic objects are nodes with corresponding features. TF uses the node's integer to look up the requested feature with a function: `F.feature.v(node_number)`. Various other functions iterate through the nodes; you can explore them more thoroughly in the tutorial [here](https://github.com/codykingham/tfNotebooks/blob/master/timeSpans/Text_Fabric_Tutorial.ipynb) or [here](https://github.com/ETCBC/text-fabric/blob/master/docs/tutorial.ipynb). There are also edge relationships between some nodes (such as clause relations, which represent the discourse structure of the text).
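
The lookup pattern `F.feature.v(node_number)` can be pictured with a small stand-in. This is only a sketch: the `Feature` class, the node numbers, and the stored values below are invented for illustration and are not Text-Fabric's own implementation.

```python
class Feature:
    """Hypothetical stand-in for a node feature (not the real Text-Fabric class)."""

    def __init__(self, values):
        self._values = values  # maps node number -> feature value

    def v(self, node):
        # return the value stored for this node, or None if the node has none
        return self._values.get(node)


# invented node numbers and values, for illustration only
function = Feature({101: 'Conj', 102: 'Pred', 103: 'Subj'})

print(function.v(102))  # Pred
print(function.v(999))  # None
```

In the notebook itself, the same call shape is used with the real objects, for example `F.domain.v(clause)`.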


### Functions for Data Export

In [8]:
early_hebrew = {'Genesis', 'Exodus', 'Leviticus', 
                'Deuteronomy', 'Joshua', 'Judges',
                '1_Samuel', '2_Samuel', '1_Kings',
                '2_Kings'}

late_hebrew = {'Esther', 'Ezra', 'Nehemiah',
               '1_Chronicles', '2_Chronicles'}

def get_data(feature, obtype):
    
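    '''
    Returns a dictionary with the linguistic date tag ('EBH' or 'LBH') as key
    and a list as value. Each list item is a space-separated string of the
    feature values (e.g. phrase functions) found in one narrative clause.
    Requires the TF feature object (e.g. F.function) and the ETCBC object
    type (e.g. 'phrase').
    '''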
    
    function_data = collections.defaultdict(list)

    for book in F.otype.s('book'):

        if F.book.v(book) in early_hebrew:
            book_tag = 'EBH'
        elif F.book.v(book) in late_hebrew:
            book_tag = 'LBH'
        else: # skip irrelevant books
            continue

        # get all clauses in the book; clauses must belong to the narrative domain ('N')
        book_clauses = [clause for clause in L.d(book, otype='clause')
                           if F.domain.v(clause) in {'N'}
                       ]

        # add phrase data per clause
        for clause in book_clauses:

            # format data for all phrases in the clause
            phrase_functions = [feature.v(obj) for obj in L.d(clause, otype=obtype)]
            phrase_funct_str = ' '.join(phrase_functions)

            function_data[book_tag].append(phrase_funct_str) # save data
            
    return function_data
     
    
def export_dated_files(data_dict, file_name):
    
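    '''
    Exports one plain-text file per linguistic date, one clause per line.
    file_name must contain a '{}' placeholder, which is filled with the
    date tag. The target directory must already exist.
    '''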
    
    for linguistic_date, linguistic_data in data_dict.items():

        filename = file_name.format(linguistic_date)

        with open(filename, 'w') as outfile:

            for phrase in linguistic_data:
                outfile.write(phrase+'\n')
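
Because `export_dated_files` opens its output path directly, a missing target directory raises a `FileNotFoundError`. One way to guard against that is sketched below. The helper name `ensure_parent_dir` is hypothetical and not part of the notebook, and the demonstration writes into a temporary directory rather than the real output folder.

```python
import os
import tempfile


def ensure_parent_dir(path):
    """Create the directory that will hold `path` if it does not exist yet."""
    parent = os.path.dirname(path)
    if parent:  # an empty string means the current directory
        os.makedirs(parent, exist_ok=True)


# demonstrate inside a throwaway temporary directory
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'phrase_functions', 'phrase_functions_EBH.txt')
    ensure_parent_dir(target)

    with open(target, 'w') as outfile:
        outfile.write('Conj Pred Subj\n')

    created = os.path.isfile(target)

print(created)  # True
```

Calling such a helper on `filename` just before `open(filename, 'w')` would let the export run without creating the folder by hand.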

### Clause Constituents (phrases and their functions)

In [10]:
# apply function
phrase_function_data = get_data(F.function, 'phrase')
phrase_function_data['EBH'][0] # sample of data

'Conj Pred Subj'

In [11]:
# export file
export_dated_files(phrase_function_data, 'phrase_functions/phrase_functions_{}.txt')
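
As a quick sanity check on an exported file, the patterns can be read back and counted. The sketch below assumes the one-clause-per-line format written above. The sample lines are an invented stand-in for a real exported file (only `Conj Pred Subj` comes from the sample output earlier).

```python
import collections
import io

# invented stand-in for an exported file; a real run would open one of the
# files written by export_dated_files instead
sample_file = io.StringIO('Conj Pred Subj\nConj Pred\nConj Pred Subj\n')

# count each clause pattern, ignoring blank lines
pattern_counts = collections.Counter(
    line.strip() for line in sample_file if line.strip()
)

# list patterns from most to least frequent
for pattern, count in pattern_counts.most_common():
    print(count, pattern)
```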