<img align="right" src="tf-small.png"/>

# SHEBANQ from ETCBC

This notebook assembles the data from the ETCBC that is needed
to feed the website [SHEBANQ](https://shebanq.ancient-data.org).

All data is delivered through GitHub repositories.
Before the pipeline starts, these repos must be pulled.

This notebook calls a series of other notebooks, some of which
reside in other GitHub repos.
Before these notebooks can be run, they must be converted to Python
programs. They will then be called as such, with parameters injected as local variables.
One of these parameters is `SCRIPT=True`, so that a notebook can adapt
its actions to the fact that it is running as part of the pipeline.
These notebooks can also be run interactively, and then you can add extra actions
that are not relevant to the pipeline conversion, such as testing, experimenting, and visualizing.
Take care to wrap such non-essential actions in contexts where
`SCRIPT` is `False`.
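
The following is a minimal sketch of this mechanism, not the actual code of the `pipeline` module. It assumes that a converted notebook is plain Python source that can be executed in a namespace holding the injected parameters. The names `NOTEBOOK_SOURCE` and `run_as_script` are hypothetical and serve only as illustration.

```python
# Hypothetical stand-in for a notebook that has been converted to a Python program.
NOTEBOOK_SOURCE = '''
if 'SCRIPT' not in locals():
    SCRIPT = False  # interactive default when nothing was injected

result = VERSION.upper()  # essential work: always runs

if not SCRIPT:
    # non-essential extras: only in interactive mode
    extras = 'interactive checks ran'
'''


def run_as_script(source, **params):
    """Execute converted notebook code with parameters injected as variables."""
    namespace = dict(params)
    exec(compile(source, '<notebook>', 'exec'), namespace)
    return namespace


# Pipeline mode: SCRIPT is injected, so the extras are skipped.
script_ns = run_as_script(NOTEBOOK_SOURCE, SCRIPT=True, VERSION='c')

# Interactive mode: SCRIPT is absent, so the notebook sets it to False itself.
interactive_ns = run_as_script(NOTEBOOK_SOURCE, VERSION='c')
```

In this sketch the guard `if not SCRIPT:` is the "context" in which testing, experimenting, and visualizing code would live.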

This notebook itself can also be run in script mode.

## Pipeline

### Core data

The core data is delivered by the ETCBC as `bhsa.mql.bz2` in the `source` directory
of the GitHub repo [bhsa](https://github.com/ETCBC/bhsa).

This data will be converted by `tfFromMQL` in the `programs` directory.

The result of this action will be an updated TF resource in the repo's
`tf/core` directory.
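
As a rough illustration of the first step, a bz2-compressed text file such as `bhsa.mql.bz2` can be read line by line with the standard library, without unpacking it to disk first. This sketch does not reproduce what `tfFromMQL` does. The helper `read_first_lines` is hypothetical, and the demo file contains made-up stand-in lines, not ETCBC data.

```python
import bz2
import os
import tempfile


def read_first_lines(path, max_lines=3):
    """Return the first few lines of a bz2-compressed text file."""
    lines = []
    with bz2.open(path, mode='rt', encoding='utf8') as handle:
        for line in handle:
            lines.append(line.rstrip('\n'))
            if len(lines) >= max_lines:
                break
    return lines


# Demo on a tiny stand-in file; the content is invented for this example.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.mql.bz2')
with bz2.open(demo_path, mode='wt', encoding='utf8') as handle:
    handle.write('first\nsecond\nthird\nfourth\n')

first_lines = read_first_lines(demo_path)
```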

### Statistics

The notebook `addStats` in the same *bhsa* repo will add statistical
features to the core dataset: `freq_occ`, `freq_lex`, `rank_occ`, `rank_lex`.
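
The feature names suggest frequencies and ranks, counted per occurrence form and per lexeme. The sketch below shows one way such values could be derived from a list of items. It is an assumption about the general idea, not the code of `addStats`; the function `frequency_and_rank` and the sample data are hypothetical, and the tie-handling rule (equal frequency gives equal rank) is a choice made for this example only.

```python
import collections


def frequency_and_rank(items):
    """Map each distinct item to (frequency, rank); rank 1 is the most frequent.

    Items with equal frequency share the same rank (an assumption of this sketch).
    """
    freq = collections.Counter(items)
    rank = {}
    previous_count = None
    current_rank = 0
    for position, (item, count) in enumerate(freq.most_common(), start=1):
        if count != previous_count:
            current_rank = position
            previous_count = count
        rank[item] = current_rank
    return {item: (freq[item], rank[item]) for item in freq}


# Made-up lexeme identifiers, one per word occurrence.
lexemes = ['W', 'H', 'W', 'B', 'H', 'W']
stats = frequency_and_rank(lexemes)
```

Applying the same function once to occurrence forms and once to lexemes would give two frequency features and two rank features.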

In [1]:
import os
import sys
import collections
from pipeline import runPipeline
from tf.fabric import Fabric

# Config

In [2]:
CORE_NAME = 'bhsa'
CORE_MODULE = 'core'

if 'SCRIPT' not in locals():
    # SCRIPT was not injected, so this is an interactive run: set defaults here.
    SCRIPT = False
    DEFAULT_CORE_NAME = CORE_NAME
    DEFAULT_VERSION = 'c'

In [3]:
pipeline = dict(
    defaults = dict(
        CORE_NAME=CORE_NAME,
        VERSION=DEFAULT_VERSION,
        CORE_MODULE=CORE_MODULE,
    ),
    versions={
        '4': dict(),
        '4b': dict(),
        'c': dict(),
        'd': dict(),
        '2017': dict(),
    },
    repoOrder = '''
        bhsa
        phono
        parallels
        valence
    ''',
    repoConfig = dict(
        bhsa=(
            dict(
                task='tfFromMQL',
            ),
            dict(
                task='lexicon',
                omit={'4', '4b'},
            ),
            dict(
                task='paragraphs',
                omit={'4', '4b', 'c'},
            ),
            dict(
                task='ketivQere',
                omit={'4', '4b'},
            ),
            dict(
                task='addStats',
                omit={'4', '4b'},
            ),
        ),
        phono=(
            dict(
                task='phono',
                omit={'4', '4b'},
            ),
        ),
        parallels=(
            dict(
                task='crossref',
                omit={'4', '4b', 'c'},
            ),
        ),
        valence=(
            dict(
                task='flowchart',
                omit={'4', '4b', 'c'},
            ),
        ),
    ),
)
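
How `runPipeline` interprets this configuration is not shown here. One plausible reading is that the repos are visited in `repoOrder`, and that within each repo every task runs unless the requested version occurs in its `omit` set. The sketch below implements that reading on a reduced configuration; `tasks_for_version` and `demo_config` are hypothetical names for illustration only.

```python
def tasks_for_version(config, version):
    """List (repo, task) pairs to run for a version, honouring `omit`."""
    selected = []
    for repo in config['repoOrder'].split():
        for step in config['repoConfig'].get(repo, ()):
            if version in step.get('omit', set()):
                continue  # this task is switched off for this version
            selected.append((repo, step['task']))
    return selected


# Reduced copy of the configuration above, for demonstration only.
demo_config = dict(
    repoOrder='''
        bhsa
        phono
    ''',
    repoConfig=dict(
        bhsa=(
            dict(task='tfFromMQL'),
            dict(task='paragraphs', omit={'4', '4b', 'c'}),
            dict(task='addStats', omit={'4', '4b'}),
        ),
        phono=(
            dict(task='phono', omit={'4', '4b'}),
        ),
    ),
)

tasks_c = tasks_for_version(demo_config, 'c')
tasks_4 = tasks_for_version(demo_config, '4')
```

Under this reading, version `c` skips `paragraphs` but still runs `addStats` and `phono`, while version `4` runs only `tfFromMQL`.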

# Run the pipeline

In [4]:
good = runPipeline(pipeline, version='c', force=True)


##############################################################################################
#                                                                                            #
#       0.00s Make version [c]                                                               #
#                                                                                            #
##############################################################################################


**********************************************************************************************
*                                                                                            *
*       0.00s Make repo [bhsa]                                                               *
*                                                                                            *
**********************************************************************************************


---------------------------------------------

|       1.36s 			feature mother_object_type (str) =def= clause : node
|       1.36s 			feature dist_unit (str) =def= clause_atoms : node
|       1.36s 		otype half_verse
|       1.36s 			feature label (str) =def=  : node
|       1.36s 		otype verse
|       1.36s 			feature verse (int) =def= 0 : node
|       1.36s 			feature chapter (int) =def= 0 : node
|       1.36s 			feature label (str) =def=  : node
|       1.36s 			feature book (str) =def= Genesis : node
|       1.36s 		otype phrase_atom
|       1.36s 			feature number (int) =def= 0 : node
|       1.36s 			feature dist (int) =def= 0 : node
|       1.36s 			feature distributional_parent (str) =def= 0 : edge
|       1.36s 			feature mother (str) =def= 0 : edge
|       1.36s 			feature functional_parent (str) =def= 0 : edge
|       1.36s 			feature det (str) =def= NA : node
|       1.36s 			feature typ (str) =def= VP : node
|       1.36s 			feature rela (str) =def= NA : node
|       1.37s 			feature dist_unit (str) =def= clause_atoms 

   |     0.00s T book@zh              to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     0.08s T chapter              to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     0.25s T code                 to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     1.08s T det                  to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     1.23s T dist                 to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     1.09s T dist_unit            to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     0.22s T domain               to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     0.49s T function             to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     0.79s T g_cons               to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     0.86s T g_cons_utf8          to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     0.79s T g_lex                to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     0.83s T g_lex_utf8           to /Users/dirk/github/etcbc/bhsa/_temp

|      4m 18s dist                      ... no changes
|      4m 18s dist_unit                 ... no changes
|      4m 19s distributional_parent     ... no changes
|      4m 20s domain                    ... no changes
|      4m 20s function                  ... no changes
|      4m 20s functional_parent         ... no changes
|      4m 21s g_cons                    ... no changes
|      4m 21s g_cons_utf8               ... no changes
|      4m 22s g_lex                     ... no changes
|      4m 22s g_lex_utf8                ... no changes
|      4m 23s g_nme                     ... no changes
|      4m 23s g_nme_utf8                ... no changes
|      4m 23s g_pfm                     ... no changes
|      4m 24s g_pfm_utf8                ... no changes
|      4m 24s g_prs                     ... no changes
|      4m 24s g_prs_utf8                ... no changes
|      4m 25s g_uvf                     ... no changes
|      4m 25s g_uvf_utf8                ... no changes
|      4m 

95 features found and 0 ignored
  0.00s loading features ...
   |     1.27s T otype                from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |       11s T oslots               from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.09s T book                 from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.05s T chapter              from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.05s T verse                from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     1.57s T g_cons               from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     1.76s T g_cons_utf8          from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     1.82s T g_lex                from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     1.75s T g_lex_utf8           from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     1.66s T g_word               from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     1.66s T g_word_utf8          from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     1.42s T lex      

   |     1.62s T sp                   from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     1.43s T st                   from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.18s T tab                  from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.29s T txt                  from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     2.62s T typ                  from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     1.51s T uvf                  from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     1.33s T vbe                  from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     1.45s T vbs                  from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     1.45s T vs                   from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     1.46s T vt                   from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.00s Feature overview: 90 for nodes; 4 for edges; 1 configs; 7 computed
 1m 13s All features loaded/computed - for details use loadLog()
.......................

..............................................................................................
.      7m 43s Various tweaks in features                                                     .
..............................................................................................
..............................................................................................
.      7m 44s Update the otype, oslots and otext features                                    .
..............................................................................................
|      7m 46s Features that have new or modified data
|      7m 46s 	gloss
|      7m 46s 	language
|      7m 46s 	lex
|      7m 46s 	lex0
|      7m 46s 	lex_utf8
|      7m 46s 	ls
|      7m 46s 	nametype
|      7m 46s 	otype
|      7m 46s 	root
|      7m 46s 	sp
|      7m 46s 	voc_lex
|      7m 46s 	voc_lex_utf8
|      7m 46s 	oslots
|      7m 46s Check voc_lex_utf8: בְּ רֵאשִׁית ברא אֱלֹהִים אֵת הַ שָׁמַיִם וְ אֶרֶץ
|      7m

99 features found and 0 ignored
  0.00s loading features ...
   |     1.37s T otype                from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |       11s T oslots               from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     1.51s T lex0                 from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     1.72s T lex_utf8             from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |      |     1.48s C __levels__           from otype, oslots
   |      |       19s C __order__            from otype, oslots, __levels__
   |      |     0.98s C __rank__             from otype, __order__
   |      |       34s C __levUp__            from otype, oslots, __rank__
   |      |       12s C __levDown__          from otype, __levUp__, __rank__
   |      |     4.09s C __boundary__         from otype, oslots, __rank__
   |     0.00s M otext                from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |      |     0.12s C __sections__         from otype, oslots, otext, __levUp__, __levels_

   |     0.14s B g_word               from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.09s B trailer_utf8         from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.02s B label                from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.00s Feature overview: 94 for nodes; 4 for edges; 1 configs; 7 computed
  4.69s All features loaded/computed - for details use loadLog()
|      9m 38s Mapping between verse labels and verse nodes
|      9m 38s 23213 verses
..............................................................................................
.      9m 38s Parsing Ketiv-Qere data                                                        .
..............................................................................................
|      9m 38s 	Read 1892 ketiv-qere annotations
|      9m 39s 	Parsed 1892 ketiv-qere annotations
|      9m 39s 	All verses entries found in index
|      9m 39s 	All ketivs found in the text
|      9m 39s 	All ketivs found in the dat

101 features found and 0 ignored
  0.00s loading features ...
   |     0.15s B g_cons               from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.13s B language             from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.13s B lex                  from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.00s Feature overview: 96 for nodes; 4 for edges; 1 configs; 7 computed
  4.81s All features loaded/computed - for details use loadLog()
|      9m 48s Counting occurrences
|      9m 50s Making statistical features
..............................................................................................
.      9m 52s Write statistical features as TF                                               .
..............................................................................................
   |     0.69s T freq_lex             to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     0.66s T freq_occ             to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     0.6

   |     0.26s B sp                   from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.28s B vs                   from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.29s B vt                   from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.17s B gn                   from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.22s B nu                   from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.26s B ps                   from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.17s B st                   from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.25s B uvf                  from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.24s B prs                  from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.14s B g_prs                from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.21s B pfm                  from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.25s B vbs                  from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |