<a href="http://laf-fabric.readthedocs.org/en/latest/" target="_blank"><img align="left" src="images/laf-fabric-small.png"/></a>
<a href="https://shebanq.ancient-data.org" target="_blank"><img align="left" src="images/shebanq_logo_small.png"/></a>
<a href="http://dx.doi.org/10.17026/dans-z6y-skyh" target="_blank"><img align="left" src="images/DANS-logo_small.png"/></a>
<a href="https://www.dbg.de/index.php?L=1" target="_blank"><img align="right" src="images/DBG-small.png"/></a>
<a href="http://www.godgeleerdheid.vu.nl/etcbc" target="_blank"><img align="right" src="images/VU-ETCBC-small.png"/></a>

# Workshop 2016-03-14

# 1. Data model

## 1.1 ETCBC data in the Emdros model

See the [otype](https://shebanq.ancient-data.org/shebanq/static/docs/featuredoc/features/comments/otype.html) feature.

And then the [overview](https://shebanq.ancient-data.org/shebanq/static/docs/featuredoc/features/comments/0_overview.html) of features.

[SHEBANQ](https://shebanq.ancient-data.org) is your friend, especially the **Help** there.

## 1.2 Let's go LAF

<a href="http://laf-fabric.readthedocs.org/en/latest/" target="_blank"><img align="left" src="images/LAF.pdf"/></a>

## 1.3 Let's forget LAF

**LAF-Fabric**
* has read all the LAF-XML
* has built a data structure (graph)
* has saved the data structure on disk
* will load the relevant parts for you quickly
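
The "compile once, load what you need" idea in the list above can be sketched in a few lines. This is only a toy illustration and not LAF-Fabric's actual storage format: the feature values, the one-file-per-feature layout, and the helper names `save_features` and `load_features` are all assumptions made for the example.

```python
import os
import pickle
import tempfile

# Hypothetical toy "compiled" data: one dict per feature, keyed by node number.
compiled = {
    'otype': {0: 'word', 1: 'word', 605133: 'phrase'},
    'sp': {0: 'prep', 1: 'subs'},
}

def save_features(directory, features):
    """Write every feature to its own file, so each can be loaded separately."""
    for name, data in features.items():
        with open(os.path.join(directory, name + '.pickle'), 'wb') as fh:
            pickle.dump(data, fh)

def load_features(directory, names):
    """Load only the requested features from disk."""
    loaded = {}
    for name in names:
        with open(os.path.join(directory, name + '.pickle'), 'rb') as fh:
            loaded[name] = pickle.load(fh)
    return loaded

with tempfile.TemporaryDirectory() as cache_dir:
    save_features(cache_dir, compiled)            # done once, after "compiling"
    api = load_features(cache_dir, ['otype'])     # later: load just one feature

print(sorted(api))
print(api['otype'][605133])
```

Only `otype` ends up in memory here; the `sp` file stays on disk until somebody asks for it.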

# 2. API

We'll start the API, but first we have to import the necessary modules.
`sys`, `collections` and `re` are not required by LAF-Fabric, but they will come in handy later.

In [1]:
import sys, collections, re

from laf.fabric import LafFabric
from etcbc.preprocess import prepare
fabric = LafFabric()

  0.00s This is LAF-Fabric 4.5.21
API reference: http://laf-fabric.readthedocs.org/en/latest/texts/API-reference.html
Feature doc: https://shebanq.ancient-data.org/static/docs/featuredoc/texts/welcome.html



`LafFabric` is a class offered by the `laf.fabric` module. We have created just one object of that class and stored it in the variable `fabric`.

Note the links to the documentation.

LAF-Fabric can work with several data sources and versions.

In [2]:
source='etcbc'
version='4b'

## 2.1 Loading data

The `load` method takes your data requirements and keeps exactly what you need in memory.

In [3]:
API=fabric.load(source+version, 'lexicon', 'workshop', {
    "xmlids": {"node": False, "edge": False},
    "features": ('''
        otype
        lex
        sp gloss
        chapter verse
    ''',''),
    "prepare": prepare,
    "primary": False,
})
exec(fabric.localnames.format(var='fabric'))

  0.00s LOADING API: please wait ... 
  0.00s USING main  DATA COMPILED AT: 2015-11-02T15-08-56
  0.00s USING annox DATA COMPILED AT: 2016-01-27T19-01-17
  2.49s LOGFILE=/Users/dirk/laf-fabric-output/etcbc4b/workshop/__log__workshop.txt
  2.49s INFO: LOADING PREPARED data: please wait ... 
  2.49s prep prep: G.node_sort
  2.60s prep prep: G.node_sort_inv
  3.12s prep prep: L.node_up
  6.63s prep prep: L.node_down
    12s prep prep: V.verses
    12s prep prep: V.books_la
    12s ETCBC reference: http://laf-fabric.readthedocs.org/en/latest/texts/ETCBC-reference.html
    14s INFO: LOADED PREPARED data
    14s INFO: DATA LOADED FROM SOURCE etcbc4b AND ANNOX lexicon FOR TASK workshop AT 2016-03-14T13-12-54
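
The `exec(fabric.localnames.format(var='fabric'))` line in the cell above is what makes short names such as `NN`, `F`, `msg`, `T` and `L` available in the notebook. Judging only from how it is used here, `localnames` looks like a string template of assignment statements; that is an assumption, and the sketch below shows the general pattern with a hypothetical `ToyFabric` class and a made-up template rather than the real LAF-Fabric code.

```python
class ToyFabric:
    """Hypothetical stand-in for a fabric object that holds an `api` dict."""
    def __init__(self):
        self.api = {
            'msg': lambda text: 'MSG: ' + text,
            'NN': lambda: iter(range(3)),
        }

# A made-up template of assignment statements; {var} becomes the variable name.
toy_localnames = "msg = {var}.api['msg']\nNN = {var}.api['NN']\n"

toy = ToyFabric()
namespace = {'toy': toy}
# In a notebook one would exec into the global namespace; an explicit
# dict is used here so that the effect is easy to inspect.
exec(toy_localnames.format(var='toy'), namespace)

print(namespace['msg']('hello'))
print(list(namespace['NN']()))
```

After the `exec`, the names `msg` and `NN` live in `namespace`, just as the short API names live in the notebook's globals after the real call.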


## 2.2 ETCBC additions

The `laf` modules know nothing about Hebrew data, nor about ETCBC data features.

The `etcbc` modules bring in specific knowledge about how the ETCBC data has been modeled in LAF. They
* know that *sentences* contain *clauses*, which contain *phrases*
* can sort all nodes into a logical order
* can find parts and wholes
* can print text content
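
To make "parts and wholes" a bit more concrete, here is a toy model in which every node is described by the set of word numbers it covers. The node names, the data, and the helpers `parts_of` and `wholes_of` are hypothetical; this is not how the `etcbc` modules compute containment, just a sketch of the idea.

```python
# Hypothetical toy data: each node maps to the set of word numbers it covers.
slots = {
    'verse_1': set(range(0, 11)),
    'phrase_1': {0, 1},
    'phrase_2': {2},
    'phrase_3': {3},
    'phrase_x': {11, 12},   # belongs to a later verse
}

def parts_of(whole, candidates):
    """Return the candidates whose words all lie inside `whole`, in text order."""
    inside = [n for n in candidates if n != whole and slots[n] <= slots[whole]]
    return sorted(inside, key=lambda n: min(slots[n]))

def wholes_of(part, candidates):
    """Return the candidates that contain all words of `part`."""
    return [n for n in candidates if n != part and slots[part] <= slots[n]]

print(parts_of('verse_1', slots))
print(wholes_of('phrase_2', slots))
```

Comparing sets for every pair of nodes is far too slow for 1.4 million nodes, which is presumably why the preparation step builds lookup data such as `L.node_up` and `L.node_down` in advance.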

# 3. Tasks

## 3.1 Exploration

### 3.1.1 Show the first 20 nodes

In [4]:
i = 0
for n in NN():
    print('{} {}'.format(n, F.otype.v(n)))
    i += 1
    if i >= 20: break

1367497 book
1367536 chapter
1413645 verse
1125793 sentence
1189379 sentence_atom
426568 clause
514579 clause_atom
1368465 half_verse
605133 phrase
858294 phrase_atom
0 word
1 word
605134 phrase
858295 phrase_atom
2 word
605135 phrase
858296 phrase_atom
3 word
1368466 half_verse
605136 phrase
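
A manual counter with `break` works fine, but `itertools.islice` does the same job in one step. The sketch below uses a hypothetical stand-in generator `toy_nodes` instead of `NN()`, so that it runs without any data loaded; the node numbers are taken from the output above.

```python
from itertools import islice

def toy_nodes():
    """Hypothetical stand-in for NN(): yields (node, object type) pairs."""
    sample = [
        (1367497, 'book'),
        (1367536, 'chapter'),
        (1413645, 'verse'),
        (0, 'word'),
        (1, 'word'),
    ]
    for pair in sample:
        yield pair

# Take only the first three items, without walking through the rest.
first_three = list(islice(toy_nodes(), 3))
for node, otype in first_three:
    print('{} {}'.format(node, otype))
```

With the data loaded, the equivalent would be along the lines of `for n in islice(NN(), 20): print(n, F.otype.v(n))`, assuming `NN()` returns an ordinary iterable.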


### 3.1.2 Count all nodes

In [5]:
msg('Counting')

i = 0
for n in NN(): i += 1
print(i)

msg('Done. {} nodes'.format(i))

 9m 11s Counting
 9m 12s Done. 1436858 nodes


1436858
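
If all you want is the count, a generator expression inside `sum` avoids the explicit counter variable. Again `toy_nodes` is a hypothetical stand-in for `NN()`, so the example is runnable on its own.

```python
def toy_nodes():
    """Hypothetical stand-in for NN(): yields five node numbers."""
    yield from [1367497, 1367536, 1413645, 0, 1]

# Count the items of an iterator without storing them in a list.
total = sum(1 for _ in toy_nodes())
print(total)
```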


### 3.1.3 Count the nodes per object type

In [6]:
msg('Counting per object type')

counts = collections.Counter()
for n in NN(): counts[F.otype.v(n)] += 1

msg('Done. {} distinct object types'.format(len(counts)))

12m 54s Counting per object type
12m 56s Done. 12 distinct object types


But how many nodes per object type?

In [7]:
for tp in counts: print('{} has {} nodes'.format(tp, counts[tp]))

chapter has 929 nodes
subphrase has 113764 nodes
sentence has 63586 nodes
sentence_atom has 64354 nodes
word has 426568 nodes
half_verse has 45180 nodes
clause_atom has 90554 nodes
verse has 23213 nodes
phrase_atom has 267499 nodes
clause has 88011 nodes
book has 39 nodes
phrase has 253161 nodes


Now a prettier version, sorted from most to least numerous:

In [9]:
for (tp, n) in sorted(counts.items(), key=lambda x: (-x[1], x[0])):
    print('{:<20}: {:>7} nodes'.format(tp, n))

word                :  426568 nodes
phrase_atom         :  267499 nodes
phrase              :  253161 nodes
subphrase           :  113764 nodes
clause_atom         :   90554 nodes
clause              :   88011 nodes
sentence_atom       :   64354 nodes
sentence            :   63586 nodes
half_verse          :   45180 nodes
verse               :   23213 nodes
chapter             :     929 nodes
book                :      39 nodes
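
`collections.Counter` can also do the ranking itself with `most_common()`. The sketch below rebuilds the counter from the numbers printed above (under the name `type_counts`, chosen here so it does not clash with the notebook's `counts`), so it runs without the data, and it checks that the per-type counts add up to the total number of nodes.

```python
import collections

# The per-type counts as printed in the output above.
type_counts = collections.Counter({
    'chapter': 929,
    'subphrase': 113764,
    'sentence': 63586,
    'sentence_atom': 64354,
    'word': 426568,
    'half_verse': 45180,
    'clause_atom': 90554,
    'verse': 23213,
    'phrase_atom': 267499,
    'clause': 88011,
    'book': 39,
    'phrase': 253161,
})

# most_common() returns (type, count) pairs, highest count first.
ranked = type_counts.most_common()
for tp, n in ranked:
    print('{:<20}: {:>7} nodes'.format(tp, n))

total = sum(type_counts.values())
print('{:<20}: {:>7} nodes'.format('TOTAL', total))
```

One difference with the `sorted` call above: when two types have the same count, `most_common()` keeps them in insertion order, whereas the explicit key `(-x[1], x[0])` orders them by name. There are no ties in this data, so both give the same ranking.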


## 3.2 Find a passage

In [10]:
# for convenience, we use Swahili Bible book names
my_node = T.node_of(T.book_node('Mwanzo', lang='sw'), 1, 1)

print(my_node)

1413645


Now get the word nodes of that passage.

In [11]:
my_words = L.d('word', my_node)
print(my_words)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


## 3.3 Print text

In [12]:
print(T.words(my_words))

בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃



We can also get it in other formats:

In [13]:
print(T.words(my_words, fmt='hc'))

בראשית ברא אלהימ את השמימ ואת הארצ׃



Now get it in all available formats.

In [14]:
T.formats()

OrderedDict([('hp',
              ('hebrew primary',
               <function etcbc.text.Text.__init__.<locals>.<lambda>>)),
             ('ha',
              ('hebrew accent',
               <function etcbc.text.Text.__init__.<locals>.<lambda>>)),
             ('hv',
              ('hebrew vowel',
               <function etcbc.text.Text.__init__.<locals>.<lambda>>)),
             ('hc',
              ('hebrew cons',
               <function etcbc.text.Text.__init__.<locals>.<lambda>>)),
             ('ea',
              ('trans accent',
               <function etcbc.text.Text.__init__.<locals>.<lambda>>)),
             ('ev',
              ('trans vowel',
               <function etcbc.text.Text.__init__.<locals>.<lambda>>)),
             ('ec',
              ('trans cons',
               <function etcbc.text.Text.__init__.<locals>.<lambda>>)),
             ('pf',
              ('phono full',
               <function etcbc.text.Text.__init__.<locals>.<lambda>>)),
             ('ps',

In [15]:
for f in T.formats(): print('{}'.format(T.words(my_words, fmt=f)))

בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃

בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃

בְּרֵאשִׁית בָּרָא אֱלֹהִים אֵת הַשָּׁמַיִם וְאֵת הָאָרֶץ׃

בראשית ברא אלהימ את השמימ ואת הארצ׃

B.:R;>CI73JT B.@R@74> >:ELOHI92Jm >;71T HAC.@MA73JIm W:>;71T H@>@95REy00

B.:R;>CIJT B.@R@> >:ELOHIJm >;T HAC.@MAJIm W:>;T H@>@REy00

BR>#JT BR> >LHJM >T H#MJM W>T H>RY00

bᵊrēšˌîṯ bārˈā ʔᵉlōhˈîm ʔˌēṯ haššāmˌayim wᵊʔˌēṯ hāʔˈāreṣ .

brēšîṯ bårå ʔlōhîm ʔēṯ haššåmayim wʔēṯ håʔåreṣ .



With a bit more info:

In [17]:
for (f, (desc, method)) in T.formats().items(): print('{}={} {}'.format(f, desc, T.words(my_words, fmt=f)))

hp=hebrew primary בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃

ha=hebrew accent בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃

hv=hebrew vowel בְּרֵאשִׁית בָּרָא אֱלֹהִים אֵת הַשָּׁמַיִם וְאֵת הָאָרֶץ׃

hc=hebrew cons בראשית ברא אלהימ את השמימ ואת הארצ׃

ea=trans accent B.:R;>CI73JT B.@R@74> >:ELOHI92Jm >;71T HAC.@MA73JIm W:>;71T H@>@95REy00

ev=trans vowel B.:R;>CIJT B.@R@> >:ELOHIJm >;T HAC.@MAJIm W:>;T H@>@REy00

ec=trans cons BR>#JT BR> >LHJM >T H#MJM W>T H>RY00

pf=phono full bᵊrēšˌîṯ bārˈā ʔᵉlōhˈîm ʔˌēṯ haššāmˌayim wᵊʔˌēṯ hāʔˈāreṣ .

ps=phono simple brēšîṯ bårå ʔlōhîm ʔēṯ haššåmayim wʔēṯ håʔåreṣ .
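
The Hebrew formats above differ mainly in which diacritics they keep. As a generic Unicode illustration (not the way the `etcbc` text module produces its formats), accents and vowel points can be stripped by dropping all nonspacing marks. The helper name `strip_marks` is made up for this sketch, and the sample word is written with escape codes to keep the code points explicit.

```python
import unicodedata

def strip_marks(text):
    """Remove all nonspacing marks (Unicode category Mn) from the text."""
    return ''.join(ch for ch in text if unicodedata.category(ch) != 'Mn')

# The first word of Genesis 1:1 with vowel points, written as escapes:
# bet, sheva, dagesh, resh, tsere, alef, shin, shin dot, hiriq, yod, tav
pointed = '\u05d1\u05b0\u05bc\u05e8\u05b5\u05d0\u05e9\u05c1\u05b4\u05d9\u05ea'

bare = strip_marks(pointed)
print(bare)
print(len(pointed), len(bare))
```

Note that the `hc` output above also writes word-final letters in their non-final form (for example in the third word), which this sketch does not attempt to reproduce.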



# 4. Advanced tasks

## 4.1 Adding annotations

See notebook valence/flow_corr.

## 4.2 R

See notebooks shebanq/r.