<div style="text-align:center; font-size: 120%">
<h1>Text-Fabric Tutorial</h1>

<table>
<tr>
<td>
<img src="images/tf.png" 
style="width:250px; height:150px;"
>
</td>
<td>
<img src="images/vuEtcbc.png"
style="width:315px; height:150px;"
>
</td>
</tr>
</table>
<p style='clear:both'>In this project, we use the [Text-Fabric](https://github.com/ETCBC/text-fabric) Python package combined with the Biblical Hebrew data from the [Eep Talstra Centre for Bible and Computer](http://www.wi.th.vu.nl). This notebook covers the basic set-up and introduces the Text-Fabric API.</p>

<p>The API information below is important for understanding Time_Spans.ipynb.</p>
</div>

In [12]:
import os

## Installation and Set-Up

First you need to [install the text-fabric package](https://github.com/ETCBC/text-fabric/wiki#install). Uncomment and run the bash script below to install it. If you need `sudo`, run the command directly in a terminal instead.

In [13]:
#%%bash
#pip install text-fabric

Next we need to download the data. Specify the directory where you would like the data:

In [14]:
data_dir = '/Users/Cody/Desktop' # specify your directory here

Now we initiate the download. Uncomment and run the bash script below:

In [15]:
#%%bash -s "$data_dir"
#cd $1
#git clone https://github.com/ETCBC/text-fabric-data

We're ready to access and process the Hebrew data in Text-Fabric!

First we need to get the processing object, `Fabric`, from the tf.fabric module:

In [16]:
from tf.fabric import Fabric

We instantiate the `Fabric` object and pass it the `locations` and `modules` keyword arguments. `locations` tells the processor where the data is. `modules` tells it which language and database module to load. Both values are strings containing directory paths; `modules` is given relative to `locations`.

In [17]:
tf_data = os.path.join(data_dir, 'text-fabric-data') # get the tf data dir

text_fabric = Fabric(locations=tf_data,         # instantiate processor
                     modules='Hebrew/etcbc4c')  # path within TF data directory

This is Text-Fabric 2.3.0
Api reference : https://github.com/ETCBC/text-fabric/wiki/Api
Tutorial      : https://github.com/ETCBC/text-fabric/blob/master/docs/tutorial.ipynb
Data sources  : https://github.com/ETCBC/text-fabric-data
Data docs     : https://etcbc.github.io/text-fabric-data
Shebanq docs  : https://shebanq.ancient-data.org/text
Slack team    : https://shebanq.slack.com/signup
Questions? Ask shebanq@ancient-data.org for an invite to Slack
108 features found and 0 ignored


*You should see "108 features found and 0 ignored" above and no <span style='color: red;'>red</span> error messages.*
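
If the feature count is lower than expected, a wrong path is one likely cause. The sketch below is a minimal sanity check using only the standard library. `tf_module_dir` is a hypothetical helper, not part of Text-Fabric, and the `.tf` file extension is an assumption about how the data repository stores its features.

```python
import os
import tempfile

def tf_module_dir(data_dir, module, ext=".tf"):
    """Return (path, n_files) for a module directory inside data_dir.

    Hypothetical helper for illustration; raises if the directory is missing.
    """
    path = os.path.join(data_dir, *module.split("/"))
    if not os.path.isdir(path):
        raise FileNotFoundError("No such TF module directory: " + path)
    # Count files that look like feature files (extension is an assumption).
    n_files = sum(1 for name in os.listdir(path) if name.endswith(ext))
    return path, n_files

# Demo on a throwaway directory so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    module_path = os.path.join(tmp, "Hebrew", "etcbc4c")
    os.makedirs(module_path)
    for feature in ("book", "chapter", "verse"):
        open(os.path.join(module_path, feature + ".tf"), "w").close()
    found_path, found_count = tf_module_dir(tmp, "Hebrew/etcbc4c")
    print(found_count, "feature files found")
```

In this notebook the equivalent call would be `tf_module_dir(tf_data, 'Hebrew/etcbc4c')`. The number of files on disk need not match the feature count that Text-Fabric reports.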

<hr>

## Text-Fabric Data Format

Text-Fabric uses [its own data format](https://github.com/ETCBC/text-fabric/wiki/Data-model), which is a set of plain-text files containing rows of data (loaded as strings) for each linguistic object in the database. Each consecutive row of data corresponds to a node number, which in turn corresponds to a linguistic object (e.g. a clause, phrase, or word):

*ex:*<br>
&nbsp;&nbsp;&nbsp;&nbsp;*prep*<br>
&nbsp;&nbsp;&nbsp;&nbsp;*subs*

In the TF data file, "prep" corresponds to node 1 since it is the first row; it also corresponds to the first word in the database. "subs" corresponds to node 2, etc.

Word-level nodes also function as the "slot" or the most atomic linguistic object in the ETCBC database. Words are numbered consecutively up until the last word in the database. After the last word, the node count resumes for linguistic objects that contain the words:

*ex:*<br>
&nbsp;&nbsp;&nbsp;&nbsp;*426582&nbsp;&nbsp; 1-11*<br>
&nbsp;&nbsp;&nbsp;&nbsp;*12-18*

Here we have the first two clauses in a TF data file. The first row sets the node count to 426582 (the first node number after the last word in the database) and states that this node contains slots (nodes/words) 1-11; this is the first clause in the database. The next row has no explicit node number, so the count continues with node 426583, which contains slots/words 12-18. Text-Fabric only needs the first node number to set the count as it reads the data rows.
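
The implicit numbering described above can be sketched in a few lines of plain Python. This is not Text-Fabric's own loader: `assign_nodes` and `expand_slots` are hypothetical helpers, and both the tab separator between node number and value and the comma-separated form of slot specifications are assumptions made for the illustration.

```python
def assign_nodes(rows, sep="\t"):
    """Map each data row to its node number.

    A row may start with an explicit node number followed by `sep`;
    otherwise it takes the previous node number plus one.
    """
    nodes = {}
    current = 0
    for row in rows:
        head, found, rest = row.partition(sep)
        if found and head.isdigit():
            current = int(head)   # explicit node number resets the count
            value = rest
        else:
            current += 1          # implicit: continue from the previous row
            value = row
        nodes[current] = value
    return nodes

def expand_slots(spec):
    """Expand a slot specification such as '1-11' or '1-3,5' into slot numbers."""
    slots = []
    for part in spec.split(","):
        start, _, end = part.partition("-")
        slots.extend(range(int(start), int(end or start) + 1))
    return slots

word_rows = ["prep", "subs"]
clause_rows = ["426582\t1-11", "12-18"]

print(assign_nodes(word_rows))
print({n: expand_slots(s) for n, s in assign_nodes(clause_rows).items()})
```

The word rows receive nodes 1 and 2, while the clause rows receive nodes 426582 and 426583 with their slot ranges expanded.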

The Text-Fabric format was built specifically for simple and efficient text processing with Python. It is the successor to [LAF-Fabric](https://github.com/ETCBC/laf-fabric), which used the ISO-standard XML LAF format. Use of LAF-Fabric during research led to the desire for a more "stripped down" format that removes the bureaucracy and loads data efficiently for processing with Python or R.

<hr>

## Loading TF Data

We have already told TF where the data files are located. Now we need to tell it which data to load into memory. Data on linguistic objects are called "features" in TF. Features are loaded by calling the `load` method on the `text_fabric` object. We assign the result to a variable so we can access those features.

The features to load are passed as a single string argument, with the feature names separated by whitespace. The first time features are loaded, the program takes a bit longer because it compiles and compresses the data.

All available features for the ETCBC Hebrew Bible are listed in the [TF feature documentation](https://etcbc.github.io/text-fabric-data/features/hebrew/etcbc4c/0_overview.html).

In [18]:
tf = text_fabric.load('''
                      book chapter verse 
                      function pdp vt
                      lex lex_utf8 g_word_utf8
                      mother tab
                      ''')

  0.00s loading features ...
   |     0.01s B book                 from /Users/Cody/Desktop/text-fabric-data/Hebrew/etcbc4c
   |     0.00s B chapter              from /Users/Cody/Desktop/text-fabric-data/Hebrew/etcbc4c
   |     0.01s B verse                from /Users/Cody/Desktop/text-fabric-data/Hebrew/etcbc4c
   |     0.20s B g_word_utf8          from /Users/Cody/Desktop/text-fabric-data/Hebrew/etcbc4c
   |     0.17s B lex_utf8             from /Users/Cody/Desktop/text-fabric-data/Hebrew/etcbc4c
   |     0.07s B function             from /Users/Cody/Desktop/text-fabric-data/Hebrew/etcbc4c
   |     0.22s B pdp                  from /Users/Cody/Desktop/text-fabric-data/Hebrew/etcbc4c
   |     0.13s B vt                   from /Users/Cody/Desktop/text-fabric-data/Hebrew/etcbc4c
   |     0.13s B lex                  from /Users/Cody/Desktop/text-fabric-data/Hebrew/etcbc4c
   |     0.21s B mother               from /Users/Cody/Desktop/text-fabric-data/Hebrew/etcbc4c
   |     0.03s B tab 
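
Since the argument is one whitespace-separated string, a longer feature selection can also be kept as a Python list and joined on demand. The sketch below only builds the string and checks that it names the same features as the triple-quoted version; the commented-out `load` call is left inactive so the example runs without Text-Fabric.

```python
features = [
    "book", "chapter", "verse",
    "function", "pdp", "vt",
    "lex", "lex_utf8", "g_word_utf8",
    "mother", "tab",
]
feature_string = " ".join(features)

# The triple-quoted form used in the cell above, for comparison.
original = '''
           book chapter verse
           function pdp vt
           lex lex_utf8 g_word_utf8
           mother tab
           '''

print(feature_string)
print(original.split() == features)
# tf = text_fabric.load(feature_string)  # same feature names, one per list item
```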

We also call the method `makeAvailableIn`, which places the API objects in the global namespace. Their names are limited to single letters, so there is little danger of overwriting them. Without this call, accessing data would require something like `tf.F.otype.s('word')`, which becomes cumbersome. With it, we can simply write `F.otype.s('word')`.

In [19]:
tf.makeAvailableIn(globals())
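
The idea of globalizing can be pictured as copying a few attributes into a namespace dictionary. The toy class below is a hypothetical stand-in, not Text-Fabric's implementation: `ToyApi` and `make_available_in` are invented names, and `F`, `L` and `T` are plain strings here.

```python
class ToyApi:
    """Stand-in for the loaded API object; F, L and T are plain strings here."""

    def __init__(self):
        self.F = "feature access"
        self.L = "layer access"
        self.T = "text access"

    def make_available_in(self, scope):
        """Copy the single-letter members into the given namespace dict."""
        for name in ("F", "L", "T"):
            scope[name] = getattr(self, name)

api = ToyApi()
namespace = {}   # pass globals() instead to mimic the notebook call
api.make_available_in(namespace)
print(sorted(namespace))
```

Passing `globals()` as the scope is what makes bare names such as `F` usable in later cells. It also means any existing variable with one of those names would be replaced.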

<hr>

## TF API, Basics

For the full API, see the [Text-Fabric Documentation](https://github.com/ETCBC/text-fabric/wiki).

We iterate through TF data with generator objects, call features with feature objects, and move up / down between container and contained with a layer object:

### `F.otype.s('object')`  
**a generator that iterates through all specified objects in the database:**

In [20]:
word_generator = F.otype.s('word')

all_words = list(word_generator) # expand to list
word_count = len(all_words)

print('{} words in the ETCBC Hebrew database...'.format(word_count))

426581 words in the ETCBC Hebrew database...


The generator returns the node numbers:

In [21]:
print('First five word nodes in database: ')
print(all_words[:5])
print()
print('Last five word nodes in database: ')
print(all_words[-5:])

First five word nodes in database: 
[1, 2, 3, 4, 5]

Last five word nodes in database: 
[426577, 426578, 426579, 426580, 426581]
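
Expanding the generator to a list keeps every node number in memory, and a generator can only be consumed once. When only a count or the first few nodes are needed, lazy idioms are an alternative. The sketch uses `word_nodes`, a hypothetical stand-in that yields the same node numbers as the output above, so it runs without Text-Fabric.

```python
from itertools import islice

def word_nodes(last=426581):
    """Hypothetical stand-in for F.otype.s('word'): yields 1..last in order."""
    yield from range(1, last + 1)

first_five = list(islice(word_nodes(), 5))   # stops after five items
total = sum(1 for _ in word_nodes())         # counts without building a list

gen = word_nodes(last=3)
consumed_once = list(gen)
consumed_twice = list(gen)                   # the generator is now exhausted

print(first_five, total, consumed_once, consumed_twice)
```

The second `list(gen)` comes back empty, which is why the notebook stores the expanded list in `all_words` before slicing it twice.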


### `F.feature.v(node)`
**return the value of a feature for a given node:**

In [22]:
second_word = all_words[1]

F.g_word_utf8.v(second_word)

'רֵאשִׁ֖ית'

Linguistic data is also available:

In [23]:
F.pdp.v(second_word)   # phrase-dependent part of speech

'subs'

### `L.u(node, otype='object')`
**find an embedding linguistic object:**

In this case, we return the clause node that `second_word` is contained within...

In [24]:
clause_node_tuple = L.u(second_word, otype='clause')

clause_node_tuple

(426582,)

`L.u` returns a tuple so that it can handle a node that is embedded in more than one object. In the ETCBC dataset, however, words are never assigned more than once, so the tuple is usually indexed to pull out the single node number:

In [25]:
clause_node = clause_node_tuple[0]

### `L.d(node, otype='object')`
Now we go in the reverse direction, a "layer down." Using the clause node, we look up every phrase node contained within it...

In [26]:
phrase_nodes = L.d(clause_node, otype='phrase')

phrase_nodes

(605144, 605145, 605146, 605147)

We can also call features on phrases: 

In [27]:
for phrase in phrase_nodes:
    print(F.function.v(phrase))

Time
Pred
Subj
Objc


And of course, we can use `L.d` to move all the way back down to the word level.

In [28]:
clause_words = L.d(clause_node, otype='word')

clause_words

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
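
The up and down movements can be modelled as containment tests on slot ranges. The toy below is a sketch of that idea only, not how Text-Fabric stores or computes embedding. `CONTAINERS`, `span`, `up` and `down` are hypothetical names, and the phrase slot ranges are assumed for the illustration (the clause ranges and node numbers are the ones shown in this notebook).

```python
# Hypothetical container nodes: node -> (object type, first slot, last slot).
CONTAINERS = {
    426582: ("clause", 1, 11),
    426583: ("clause", 12, 18),
    605144: ("phrase", 1, 2),    # phrase ranges are assumed for this sketch
    605145: ("phrase", 3, 3),
    605146: ("phrase", 4, 4),
    605147: ("phrase", 5, 11),
}

def span(node):
    """Return (first, last) slot of a node; a word node spans only itself."""
    if node in CONTAINERS:
        _, first, last = CONTAINERS[node]
        return first, last
    return node, node

def up(node, otype):
    """Nodes of type `otype` whose slot range fully contains `node`."""
    first, last = span(node)
    return tuple(
        n for n, (kind, lo, hi) in sorted(CONTAINERS.items())
        if kind == otype and lo <= first and last <= hi and n != node
    )

def down(node, otype):
    """Nodes of type `otype` that lie fully inside `node`."""
    first, last = span(node)
    if otype == "word":
        return tuple(range(first, last + 1))
    return tuple(
        n for n, (kind, lo, hi) in sorted(CONTAINERS.items())
        if kind == otype and first <= lo and hi <= last and n != node
    )

print(up(2, "clause"))
print(down(426582, "phrase"))
print(down(605144, "word"))
```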

### `T.text(word_nodes)`
**return a plain text representation for multi-word linguistic objects**

Now we take `clause_words`, a list of word nodes, and pass it to `T.text`:

In [32]:
T.text(clause_words)

'בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃ '

"In the beginning, God created the sky and the land..."

The output is the UTF-8 plain text of the entire clause. `T.text` takes care of the spacing, since some words in Hebrew are prefixed directly to the following word.
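
The point about prefixed words can be illustrated without Text-Fabric. If each word carries its own trailing material (a space, or nothing when the next word attaches directly), the running text is the concatenation of word plus trailer. The word forms and trailers below are hypothetical ASCII placeholders, and `plain_text` is a sketch of the idea rather than the way `T.text` is implemented.

```python
# (word form, trailer) pairs; an empty trailer means the next word attaches directly.
words = [
    ("be", ""),        # preposition prefixed to the following noun
    ("reshit", " "),
    ("bara", " "),
    ("elohim", " "),
]

def plain_text(pairs):
    """Concatenate each word with its trailer to rebuild running text."""
    return "".join(form + trailer for form, trailer in pairs)

print(repr(plain_text(words)))
```

Joining the forms with a single space instead would wrongly separate the prefixed preposition from its noun.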

## Conclusions

These are all of the TF objects and methods utilized in [Time_Spans.ipynb]() for the time spans project. However, there are many more methods available in the [Text-Fabric Documentation](https://github.com/ETCBC/text-fabric/wiki).