<img align="right" src="tf-small.png"/>

# Tutorial
## SBLGNT and Text-Fabric
This tutorial introduces basic queries on the SBL Greek New Testament dataset using [Text-Fabric](https://github.com/ETCBC/text-fabric).<br>
It assumes at least a basic familiarity with the [data model](https://github.com/ETCBC/text-fabric/wiki/Data-model).<br>
For documentation on Text-Fabric, see the [Text-Fabric Wiki](https://github.com/ETCBC/text-fabric/wiki).

## Table of Contents

* [Loading Text-Fabric](#Loading-Text-Fabric)    
    * &nbsp;[instantiate text-fabric](#instantiate-text-fabric)
    * &nbsp;[load sblgnt features](#load-sblgnt-features)
* [Intro to Nodes, Objects, and Features](#Intro-to-Nodes,-Objects,-and-Features)
    * &nbsp;[what is a node?](#what-is-a-node?)
    * &nbsp;[what is an object?](#what-is-an-object?)
    * &nbsp;[what is a feature?](#what-is-a-feature?)
* [Access Object Nodes](#Access-Object-Nodes)
    * &nbsp;[access nodes](#access-nodes)
    * &nbsp;[count all object types](#count-all-object-types)
    * &nbsp;[count features and values](#count-features-and-values)
* [Example Query: WordOrder](#Example-Query:-WordOrder)

<hr>

In [15]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [16]:
import collections

## Loading Text-Fabric

Import the Fabric module from text-fabric:

In [17]:
from tf.fabric import Fabric

### instantiate text-fabric
Load the module with its path in the `text-fabric-data` directory.

In [18]:
TF = Fabric(modules='greek/sblgnt')

This is Text-Fabric 2.1.2
Api reference : https://github.com/ETCBC/text-fabric/wiki/Api
Tutorial      : https://github.com/ETCBC/text-fabric/blob/master/docs/tutorial.ipynb
Data sources  : https://github.com/ETCBC/text-fabric-data
Data docs     : https://etcbc.github.io/text-fabric-data/features/hebrew/etcbc4c/0_overview.html
Shebanq docs  : https://shebanq.ancient-data.org/text
Slack team    : https://shebanq.slack.com/signup
Questions? Ask shebanq@ancient-data.org for an invite to Slack
35 features found and 0 ignored


### load sblgnt features

Select which features to load from the data. The available features are listed in the [sblgnt features documentation](https://etcbc.github.io/text-fabric-data/features/greek/sblgnt/0_home.html). Features unique to Text-Fabric are lower-case, while features native to sblgnt begin with an upper-case letter.

Features are loaded with the load method on the Fabric object. The method takes a string argument with all of the features. Features in the load string may be space or new-line separated.
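
Because the load string is plain whitespace-separated text, it can be assembled from a Python list, which is convenient when the feature selection changes often. The sketch below uses only standard Python and does not call Text-Fabric; the name `wanted_features` is illustrative, and the resulting string would be passed to `TF.load()` in the same way as the literal string in the next cell.

```python
# Illustrative only: build a whitespace-separated feature string from a list.
wanted_features = [
    'Cat', 'Gender', 'Tense',
    'Unicode', 'UnicodeLemma', 'Mood',
    'book', 'chapter', 'verse',
    'otype', 'function', 'psp',
    'freq_occ', 'freq_lex',
]

load_string = ' '.join(wanted_features)

# Splitting on any whitespace recovers the same names, whether the
# separators were spaces or newlines.
multiline = '''
    Cat Gender Tense
    book chapter verse
'''
print(load_string)
print(multiline.split())
```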

In [19]:
api = TF.load('''
                Cat Gender Tense
                Unicode UnicodeLemma Mood
                book chapter verse
                otype function psp
                freq_occ freq_lex
              ''')

api.makeAvailableIn(globals()) # optional, but without it every call must be prefixed with api. (e.g. api.F, api.L)

  0.00s loading features ...
   |     0.01s B otype                from /Users/Cody/github/text-fabric-data/greek/sblgnt
   |     0.00s B book                 from /Users/Cody/github/text-fabric-data/greek/sblgnt
   |     0.00s B chapter              from /Users/Cody/github/text-fabric-data/greek/sblgnt
   |     0.00s B verse                from /Users/Cody/github/text-fabric-data/greek/sblgnt
   |     0.09s B Unicode              from /Users/Cody/github/text-fabric-data/greek/sblgnt
   |     0.07s B UnicodeLemma         from /Users/Cody/github/text-fabric-data/greek/sblgnt
   |     0.10s B Cat                  from /Users/Cody/github/text-fabric-data/greek/sblgnt
   |     0.03s B Gender               from /Users/Cody/github/text-fabric-data/greek/sblgnt
   |     0.01s B Tense                from /Users/Cody/github/text-fabric-data/greek/sblgnt
   |     0.01s B Mood                 from /Users/Cody/github/text-fabric-data/greek/sblgnt
   |     0.07s B function             from /Users/C

<hr>

## Intro to Nodes, Objects, and Features

TF uses nodes, objects, and features as pointers to the data.

### what is a node?

A node is an arbitrary integer that TF uses to look up data. Every datapoint in TF has its own unique node. We supply a node number to a TF Python object and get a value in return.

In [20]:
example_node = 137795

# What kind of data does example_node represent? 
# We can find out by supplying the node number to the otype feature object:

F.otype.v(example_node)

'book'

Which book does example_node represent? We can find out by supplying it to another feature object:

In [21]:
F.book.v(example_node)  # the book feature returns the book's name

'matthew'

Let's try something else with this node. We'll supply example_node to a different kind of feature object...

In [22]:
print(F.Gender.v(example_node))

None


What happened here? Book nodes can't have gender features. But word nodes can:

In [23]:
word_node = 1231
print('word_node gender:', F.Gender.v(word_node))
print('word_node unicode:', F.Unicode.v(word_node))

word_node gender: Feminine
word_node unicode: ἔρημον


This is because different nodes represent different kinds of linguistic **objects**.

### what is an object?

Up to this point we've used the term 'object' in the usual Python sense. From now on, the term has *no* relation to programming objects. Rather, in the TF data model, words, phrases, and clauses are defined as (linguistic) objects; likewise, sections such as books, chapters, and verses are objects. For more information about how objects are defined, see the [data model documentation](https://github.com/ETCBC/text-fabric/wiki/Data-model). Every object has a `type`. As in the example above, some nodes are book objects, others are word objects, [and more](https://etcbc.github.io/text-fabric-data/features/hebrew/etcbc4c/otype).

### what is a feature?

Features are named properties that provide information about an object. `book`, `Gender`, `Tense`, and `function` are all examples of features that can be looked up for nodes of a corresponding object type. See the [sblgnt features documentation](https://etcbc.github.io/text-fabric-data/features/greek/sblgnt/0_home.html) for a reference to all of the features.
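
One way to picture the relationship between nodes, object types, and features is a set of dictionaries keyed by node number. The sketch below is a toy model in plain Python, not Text-Fabric's actual implementation; the class `ToyFeature` is made up for illustration, and it only holds the two nodes and the values shown in the examples above.

```python
class ToyFeature:
    """A toy stand-in for a feature: maps node numbers to values."""

    def __init__(self, values):
        self._values = dict(values)

    def v(self, node):
        # Like the lookups above, return None when the node has no value.
        return self._values.get(node)


# Values taken from the examples earlier in this section.
otype = ToyFeature({1231: 'word', 137795: 'book'})
book = ToyFeature({137795: 'matthew'})
gender = ToyFeature({1231: 'Feminine'})

for node in (1231, 137795):
    print(node, otype.v(node), gender.v(node))

print(book.v(137795))
```

The point of the model is that a node number means nothing by itself: which features return a value depends on the type of object the node stands for.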

<hr>

## Access Object Nodes

### access nodes
We've seen what we can do with nodes. But how do we get the nodes we want? 

#### iterate through all nodes with [node generator](https://github.com/ETCBC/text-fabric/wiki/Api#walking-through-nodes) &nbsp;&nbsp;&nbsp;&nbsp;   `N():` 

In [24]:
node_count = 0 

for node in N():
    node_count += 1
    
print('total nodes: ', node_count)

total nodes:  428430


#### iterate through certain object type nodes with [feature otype](https://github.com/ETCBC/text-fabric/wiki/Api#node-features) &nbsp;&nbsp;&nbsp;&nbsp;`F.otype.s()` 

In [25]:
book_count = 0

for book_node in F.otype.s('book'):
    book_count += 1
    
print('total books nodes: ', book_count)

total books nodes:  27


#### access embedd[ed/ing] objects with "level up", "level down" [locality](https://github.com/ETCBC/text-fabric/wiki/Api#locality) &nbsp;&nbsp;&nbsp;&nbsp; `L.u()` / `L.d()`

TF preserves embedding relationships between object types. For example, phrases are embedded in clauses. See the [datamodel discussion on levels](https://github.com/ETCBC/text-fabric/wiki/Api#locality-and-levels) to understand how this is encoded. The TF term for these relationships is 'levels.'

In [30]:
from random import Random
randomizer = Random()

highest_word_node = F.otype.maxSlot
random_word = randomizer.randint(1, highest_word_node)

random_word

134334

In [31]:
# the book lookup returns a tuple containing the embedding book node:
L.u(random_word,'book')

(137821,)

In [32]:
# let's see all the above information for random_word
level_up = (
            F.Unicode.v(random_word),
            F.book.v(
                        L.u(random_word, otype='book')[0]
                     ),
    
            str(F.chapter.v(
                        L.u(random_word, otype='chapter')[0]
                     )),

            str(F.verse.v(
                        L.u(random_word, otype = 'verse')[0]
                     )),
            'phraseFunction: ' + F.function.v(
                                L.u(random_word, otype='phrase')[0]
                             ))

', '.join(level_up)

'ἐξέχεεν, revelation, 16, 3, phraseFunction: vp'
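
The idea behind `L.u()` and `L.d()` can be sketched with a toy model in which every non-word object covers a range of word positions. This is an illustration in plain Python under that simplifying assumption, not Text-Fabric's implementation; the node numbers, ranges, and the helper names `toy_up` and `toy_down` are invented.

```python
# Toy data: each object node maps to (object type, first word, last word).
# The words themselves are nodes 1..6.
toy_objects = {
    101: ('verse', 1, 3),
    102: ('verse', 4, 6),
    201: ('chapter', 1, 6),
}


def toy_up(word, otype):
    """Return a tuple of nodes of the given type that contain the word."""
    return tuple(
        node for node, (kind, first, last) in sorted(toy_objects.items())
        if kind == otype and first <= word <= last
    )


def toy_down(node):
    """Return the word nodes contained in the given object node."""
    _kind, first, last = toy_objects[node]
    return list(range(first, last + 1))


print(toy_up(5, 'verse'))    # the verse containing word 5
print(toy_up(5, 'chapter'))  # the chapter containing word 5
print(toy_down(101))         # the words inside verse 101
```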

#### access section objects with [Text](https://github.com/ETCBC/text-fabric/wiki/Api#text) &nbsp;&nbsp;&nbsp;&nbsp; `T.nodeFromSection()` / `T.sectionFromNode()`

In [33]:
john316 = ('John',3,16)  # req. a tuple; verse/chapter optional
john316_node = T.nodeFromSection(john316)

john316_node

422594

The Text API can also go the other way and return section information for a given node (**`T.sectionFromNode`**). It also provides a function, **`T.text()`**, that formats a list of word nodes as UTF-8 text.

In the example below we do three things:
1. Gather all of the word nodes in John 3:16 with an `L.d()` call (this returns a list).
2. Feed the word nodes to **`T.text()`**, which requires an iterable of word nodes as an argument.
3. Print the formatted text, and reverse the previous cell's step by recovering the section data from `john316_node` with `T.sectionFromNode()`.

In [34]:
john316_words = L.d(john316_node, otype='word')

print(T.text(john316_words), T.sectionFromNode(john316_node))

γὰρ Οὕτως ἠγάπησεν ὁ θεὸς τὸν κόσμον ὥστε τὸν υἱὸν τὸν μονογενῆ ἔδωκεν, ἵνα πᾶς ὁ πιστεύων εἰς αὐτὸν μὴ ἀπόληται ἀλλὰ ἔχῃ ζωὴν αἰώνιον.  ('John', 3, 16)
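
Conceptually, `T.nodeFromSection()` and `T.sectionFromNode()` are inverse lookups. A minimal sketch of that idea with two plain dictionaries follows. It is not how Text-Fabric stores sections, and the only real value in it is the John 3:16 verse node shown above; the other entry is a made-up placeholder.

```python
# Forward lookup: section tuple -> node number.
node_from_section = {
    ('John', 3, 16): 422594,   # value taken from the cell above
    ('John', 3, 17): 900001,   # hypothetical placeholder, not a real node
}

# Build the inverse lookup: node number -> section tuple.
section_from_node = {node: section for section, node in node_from_section.items()}

node = node_from_section[('John', 3, 16)]
print(node, section_from_node[node])
```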


### count all object types

In [35]:
all_objects = F.otype.all # just a tuple of object types (not nodes!) in sblgnt
print(all_objects)
print(len(all_objects), 'object types in sblgnt')

('book', 'chapter', 'verse', 'sentence', 'clause', 'clause_atom', 'phrase', 'conjunction', 'wordx', 'word')
10 object types in sblgnt


In [36]:
# how many instances of each object type?

object_counts = collections.Counter() # we use a counter to number the instances

for obj_type in all_objects:
    for otype_node in F.otype.s(obj_type): # F.otype.s() to iterate through the given otype nodes
        object_counts[obj_type] += 1

for otype, count in sorted(object_counts.items(), key = lambda k: k[1]):
    print('{:<15}{:>15}'.format(otype, count))

book                        27
conjunction                172
chapter                    260
wordx                      879
verse                     7939
sentence                  8014
clause                   54800
clause_atom              75967
word                    137794
phrase                  142578


### count features and values

The `freqList()` method of a feature returns each value of that feature together with its count, most frequent first.

In [37]:
F.Gender.freqList()

(('Masculine', 41418), ('Feminine', 18750), ('Neuter', 13813))
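
The output above has the same shape as a frequency table built with `collections.Counter`. The sketch below reproduces that shape on made-up values; it illustrates the idea only and is not the implementation of `freqList()`.

```python
import collections

# Made-up gender values for a handful of words.
toy_genders = ['Masculine', 'Feminine', 'Masculine', 'Neuter', 'Masculine', 'Feminine']

# most_common() yields (value, count) pairs, highest count first.
freq_list = tuple(collections.Counter(toy_genders).most_common())
print(freq_list)
```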

Use &nbsp; [`Fall()`](https://github.com/ETCBC/text-fabric/wiki/Api#node-features) to see all loaded features

In [38]:
Fall()

['Cat',
 'Gender',
 'Mood',
 'Tense',
 'Unicode',
 'UnicodeLemma',
 'book',
 'chapter',
 'freq_lex',
 'freq_occ',
 'function',
 'otype',
 'psp',
 'verse']

In [39]:
select_features = {'category' : F.Cat, 
                   'gender' : F.Gender, 
                   'tense' : F.Tense, 
                   'mood' : F.Mood,
                   'partOfSpeech' : F.psp,}

for feature,TFObject in select_features.items():
    
    counts = '\n'.join(list('{:10}{:>15}'.format(value, count) for value, count in TFObject.freqList()))
    print('{:>15}\n{:>15}'.format(feature,'-'*25))
    print(counts, '\n\n')

       category
-------------------------
np                  86102
CL                  54800
vp                  28339
noun                28277
verb                28112
V                   25142
det                 19806
ADV                 19523
S                   19180
conj                18422
pron                16132
pp                  11434
prep                11039
O                   10931
adjp                 9651
adj                  8906
advp                 6535
adv                  6314
P                    3685
IO                   2666
VC                   2590
ptcl                 1043
nump                  517
num                   477
intj                  317
O2                    264 


         gender
-------------------------
Masculine           41418
Feminine            18750
Neuter              13813 


          tense
-------------------------
Aorist              11596
Present             11552
Imperfect            1679
Future               1624
Perfect   

<hr>

## Example Query: WordOrder

Word order is notoriously tricky in Greek. Can we find any tendencies throughout the different NT books?

For this search, we will look for clauses in which both a subject and a finite verb are present and measure which one comes first. The results will be presented on a book-by-book basis. We'll need to gather several pieces of information for each instance:

* the `clause_atom` nodes inside each clause, together with their `Cat` feature (the clause-level function)
* the clause atoms whose `Cat` value is `'V'` (verb) or `'S'` (subject)

*Caution: this query encounters some quirky issues with the way clauses are structured in the dataset.*

In [40]:
wordOrderCounts = collections.defaultdict(collections.Counter)

for book in F.otype.s('book'):
    book_clauses = L.d(book, otype = 'clause')
    for clause in book_clauses:
        ordered_elements = ''
        for ca in L.d(clause, otype = 'clause_atom'):
            if F.Cat.v(ca) in {'V','S'}:
                ordered_elements += F.Cat.v(ca)
        if ordered_elements in {'VS','SV'}:  # the clause structuring produces quite a lot of superfluous results
            wordOrderCounts[F.book.v(book)][ordered_elements] += 1
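
The tallying logic in the cell above can be tried out without the dataset. The sketch below runs the same steps on hand-made clauses, each given as a book name plus the sequence of `Cat` labels of its parts; all of the data and the names `toy_clauses` and `toy_counts` are invented for illustration.

```python
import collections

# Each toy clause: (book name, Cat labels of its parts in text order).
toy_clauses = [
    ('matthew', ['S', 'V', 'O']),
    ('matthew', ['V', 'S']),
    ('matthew', ['V', 'O']),        # no subject: ignored
    ('mark', ['ADV', 'S', 'V']),
    ('mark', ['S', 'V', 'S']),      # yields 'SVS': ignored
]

toy_counts = collections.defaultdict(collections.Counter)

for book, labels in toy_clauses:
    # Keep only subject and verb labels, preserving their order.
    ordered_elements = ''.join(label for label in labels if label in {'V', 'S'})
    if ordered_elements in {'VS', 'SV'}:
        toy_counts[book][ordered_elements] += 1

for book, counts in toy_counts.items():
    # A Counter returns 0 for a missing key, so both columns are always safe.
    print('{:>15}{:>9}{:>5}'.format(book, counts['SV'], counts['VS']))
```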

In [41]:
print('{:>15}{:>9}{:>5}'.format('Book','SV','VS'))
print('-'*30)
for book in (F.book.v(b) for b in F.otype.s('book')):
    counts = wordOrderCounts[book]  # a Counter: a missing 'SV' or 'VS' key counts as 0
    print('{:>15}{:>9}{:>5}'.format(book, counts['SV'], counts['VS']))

           Book       SV   VS
------------------------------
        matthew      672  395
           mark      423  243
           luke      657  440
           john      819  561
           acts      508  427
         romans      294  111
   1corinthians      387  134
   2corinthians      164   74
      galatians       69   47
      ephesians       36   22
    philippians       45    7
     colossians       29   13
 1thessalonians       39   15
 2thessalonians       15   15
       1timothy       41    9
       2timothy       30   22
          titus       21    7
       philemon        7    2
        hebrews      131   86
          james       68   33
         1peter       48   22
         2peter       35    8
          1john      110   42
          2john        8    1
          3john        9    5
           jude        4    3
     revelation      318  178
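
To compare books of very different lengths, the raw counts can be turned into proportions. The sketch below does this for three rows copied from the table above; the helper name `sv_share` is illustrative.

```python
# (SV, VS) counts copied from the table above.
table_rows = {
    'matthew': (672, 395),
    'acts': (508, 427),
    'philippians': (45, 7),
}


def sv_share(sv, vs):
    """Fraction of counted clauses in which the subject precedes the verb."""
    return sv / (sv + vs)


for book, (sv, vs) in table_rows.items():
    print('{:>15}{:>8.1%}'.format(book, sv_share(sv, vs)))
```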
