<img align="right" src="tf-small.png"/>

# SHEBANQ from ETCBC

This notebook assembles the data from the ETCBC that is needed
to feed the website [SHEBANQ](https://shebanq.ancient-data.org).

All data is delivered through GitHub repositories.
Before the pipeline starts, these repos must be pulled.

This notebook calls a series of other notebooks, some of which
reside in other GitHub repos.
Before these notebooks can be run, they must be converted to Python
programs. They are then called as such, with parameters injected as local variables.
One of these parameters is `SCRIPT=True`, so that
a notebook can adapt its actions to the fact that it is running as part of the pipeline.
These notebooks can also be run interactively, in which case you can add extra actions that are not relevant to the pipeline, such as testing, experimenting, and visualizing.
Take care to wrap such non-essential actions in contexts where
`SCRIPT=False`.
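A minimal sketch of this convention follows. The actual call mechanism lives in `pipeline.py` and is not shown here, so the use of `exec`, the runner `runProgram`, and the toy program source are all assumptions made for illustration. The sketch shows a converted notebook that supplies its own defaults when nothing is injected and skips non-essential work when `SCRIPT` is true.

```python
# Toy stand-in for a notebook that has been converted to a Python program.
# (Hypothetical content, for illustration only.)
programSource = '''
if 'SCRIPT' not in locals():
    # Interactive run: nothing was injected, so set defaults here.
    SCRIPT = False
    VERSION = 'c'

# Essential work: always runs.
message = 'processing version ' + VERSION

# Non-essential work: interactive runs only.
ranExtras = False
if not SCRIPT:
    ranExtras = True
'''


def runProgram(source, **params):
    """Hypothetical runner: execute source with params pre-set as variables."""
    namespace = dict(params)
    # With a single namespace, locals() inside the program is this dict,
    # so the injected parameters are visible to the SCRIPT check.
    exec(source, namespace)
    return namespace


asPipeline = runProgram(programSource, SCRIPT=True, VERSION='2017')
asInteractive = runProgram(programSource)
```

In the pipeline-style call the injected `VERSION` is used and the extras are skipped; in the interactive-style call the program falls back to its own defaults and runs the extras.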

This notebook itself can also be run in script mode.

## Pipeline

### Core data

The core data is delivered by the ETCBC as `bhsa.mql.bz2` in
the GitHub repo [bhsa](https://github.com/ETCBC/bhsa), in the directory `source`.

This data will be converted by `tfFromMQL` in the `programs` directory.

The result of this action is an updated TF resource in the repo's
`tf/core` directory.

### Statistics

The notebook `addStats` in the same *bhsa* repo will add statistical
features to the core dataset: `freq_occ`, `freq_lex`, `rank_occ`, `rank_lex`.
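As a rough illustration of what frequency and rank features might contain, the sketch below counts how often each lexeme occurs in a toy word list and ranks the lexemes by descending frequency. The data, the function name `frequencyAndRank`, and the tie rule (equal frequencies share a rank) are assumptions; the `addStats` notebook may define its ranks differently.

```python
import collections


def frequencyAndRank(values):
    """Return (frequency, rank) dicts; rank 1 is the most frequent value."""
    freq = collections.Counter(values)
    rank = {}
    previousCount = None
    currentRank = 0
    # Sort by descending count, then by value to make the order deterministic.
    ordered = sorted(freq.items(), key=lambda item: (-item[1], item[0]))
    for position, (value, count) in enumerate(ordered, start=1):
        if count != previousCount:
            # A new, lower count starts a new rank at the current position.
            currentRank = position
            previousCount = count
        rank[value] = currentRank
    return dict(freq), rank


# Toy lexeme sequence (not real corpus data).
lexemes = ['W', 'H', 'W', 'B', 'H', 'W', '>T']
freq, rank = frequencyAndRank(lexemes)
```

Here `W` occurs three times and gets rank 1, while the two lexemes that occur once share the same rank.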

In [1]:
import os
import sys
import collections
from pipeline import runPipeline
from tf.fabric import Fabric

# Config

In [2]:
CORE_NAME = 'bhsa'
CORE_MODULE = 'core'

if 'SCRIPT' not in locals(): 
    SCRIPT = False
    DEFAULT_CORE_NAME = CORE_NAME
    DEFAULT_VERSION = 'c'

In [3]:
pipeline = dict(
    defaults = dict(
        CORE_NAME=CORE_NAME,
        VERSION=DEFAULT_VERSION,
        CORE_MODULE=CORE_MODULE,
    ),
    versions={
        '4': dict(),
        '4b': dict(),
        'c': dict(),
        'd': dict(),
        '2017': dict(),
    },
    repoOrder = '''
        bhsa
        phono
        parallels
        valence
    ''',
    repoConfig = dict(
        bhsa=(
            dict(
                task='tfFromMQL',
            ),
            dict(
                task='lexicon',
                omit={'4', '4b'},
            ),
            dict(
                task='paragraphs',
                omit={'4', '4b', 'c'},
            ),
            dict(
                task='ketivQere',
                omit={'4', '4b'},
            ),
            dict(
                task='addStats',
                omit={'4', '4b'},
            ),
        ),
        phono=(
            dict(
                task='phono',
                omit={'4', '4b'},
            ),
        ),
        parallels=(
            dict(
                task='crossref',
                omit={'4', '4b', 'c'},
            ),
        ),
        valence=(
            dict(
                task='flowchart',
                omit={'4', '4b', 'c'},
            ),
        ),
    ),
)
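One plausible reading of this configuration is that repos are processed in `repoOrder`, and that each task is skipped for the versions listed in its `omit` set. The sketch below spells out that reading on a shortened copy of the configuration. The helper `tasksFor` is hypothetical and is not part of `pipeline.py`, whose `runPipeline` may behave differently.

```python
def tasksFor(config, version):
    """Hypothetical helper: list the (repo, task) pairs that apply to a version."""
    plan = []
    for repo in config['repoOrder'].strip().split():
        for taskInfo in config['repoConfig'].get(repo, ()):
            # Assumption: a task is skipped when the version is in its omit set.
            if version in taskInfo.get('omit', set()):
                continue
            plan.append((repo, taskInfo['task']))
    return plan


# Shortened copy of the configuration, for illustration only.
sample = dict(
    repoOrder='''
        bhsa
        phono
    ''',
    repoConfig=dict(
        bhsa=(
            dict(task='tfFromMQL'),
            dict(task='paragraphs', omit={'4', '4b', 'c'}),
        ),
        phono=(
            dict(task='phono', omit={'4', '4b'}),
        ),
    ),
)

planC = tasksFor(sample, 'c')
plan4 = tasksFor(sample, '4')
```

Under this reading, version `c` runs `tfFromMQL` and `phono` but skips `paragraphs`, while version `4` runs only `tfFromMQL`.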

# Run the pipeline

In [4]:
good = runPipeline(pipeline, version='c', force=True)


##############################################################################################
#                                                                                            #
#       0.00s Make version [c]                                                               #
#                                                                                            #
##############################################################################################


**********************************************************************************************
*                                                                                            *
*       0.00s Make repo [bhsa]                                                               *
*                                                                                            *
**********************************************************************************************


---------------------------------------------

|         13s 	line   2000000
|         17s 		objects in word
|         19s 	line   3000000
|         25s 	line   4000000
|         31s 	line   5000000
|         32s 		objects in word
|         37s 	line   6000000
|         43s 	line   7000000
|         48s 		objects in word
|         49s 	line   8000000
|         55s 	line   9000000
|      1m 01s 	line  10000000
|      1m 03s 		objects in word
|      1m 07s 	line  11000000
|      1m 13s 	line  12000000
|      1m 18s 	line  13000000
|      1m 18s 		objects in word
|      1m 24s 	line  14000000
|      1m 30s 	line  15000000
|      1m 34s 		objects in word
|      1m 36s 	line  16000000
|      1m 42s 	line  17000000
|      1m 48s 	line  18000000
|      1m 50s 		objects in word
|      1m 54s 	line  19000000
|      2m 00s 	line  20000000
|      2m 05s 		objects in word
|      2m 06s 	line  21000000
|      2m 12s 	line  22000000
|      2m 13s 		objects in clause_atom
|      2m 16s 		objects in clause_atom
|      2m 17s 	line  23000000
|     

   |     0.68s T g_nme                to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     0.71s T g_nme_utf8           to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     0.74s T g_pfm                to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     0.77s T g_pfm_utf8           to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     0.81s T g_prs                to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     0.70s T g_prs_utf8           to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     0.73s T g_uvf                to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     0.67s T g_uvf_utf8           to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     0.65s T g_vbe                to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     0.63s T g_vbe_utf8           to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     0.63s T g_vbs                to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     0.66s T g_vbs_utf8           to /Users/dirk/github/etcbc/bhsa/_temp

|      4m 33s g_vbs_utf8                ... no changes
|      4m 34s g_word                    ... no changes
|      4m 34s g_word_utf8               ... no changes
|      4m 34s gn                        ... no changes
|      4m 35s is_root                   ... no changes
|      4m 35s kind                      ... no changes
|      4m 35s label                     ... no changes
|      4m 35s language                  ... differencesafter the metadata
|      4m 35s 	line      2 OLD -->hbo<--
|      4m 35s 	line      2 NEW -->Hebrew<--
|      4m 35s 	line      3 OLD -->hbo<--
|      4m 35s 	line      3 NEW -->Hebrew<--
|      4m 35s 	line      4 OLD -->hbo<--
|      4m 35s 	line      4 NEW -->Hebrew<--
|      4m 35s 	line      5 OLD -->hbo<--
|      4m 35s 	line      5 NEW -->Hebrew<--

|      4m 35s lex                       ... differencesafter the metadata
|      4m 36s 	line 426583 OLD -->1436895	B<--
|      4m 36s 	line 426583 NEW --><empty><--
|      4m 36s 	line 426584 OLD -->

   |     0.00s M otext                from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |      |     0.14s C __sections__         from otype, oslots, otext, __levUp__, __levels__, book, chapter, verse
   |     0.00s T book@am              from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.00s T book@ar              from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.00s T book@bn              from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.00s T book@da              from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.01s T book@de              from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.00s T book@el              from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.00s T book@en              from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.01s T book@es              from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.00s T book@fa              from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.00s T book@fr              from /Users/dirk

|      7m 27s START lexicon (CORE_MODULE=core, CORE_NAME=bhsa, VERSION=c)
|      7m 27s 	Destination /Users/dirk/github/etcbc/bhsa/tf/c/core/.tf/lex0.tfx does not exist
|      7m 27s New text formats
|      7m 27s fmt:lex-trans-plain            = "{lex0} "
..............................................................................................
.      7m 27s Load the existing TF dataset                                                   .
..............................................................................................
This is Text-Fabric 2.3.12
Api reference : https://github.com/ETCBC/text-fabric/wiki/Api
Tutorial      : https://github.com/ETCBC/text-fabric/blob/master/docs/tutorial.ipynb
Data sources  : https://github.com/ETCBC/text-fabric-data
Data docs     : https://etcbc.github.io/text-fabric-data
Shebanq docs  : https://shebanq.ancient-data.org/text
Slack team    : https://shebanq.slack.com/signup
Questions? Ask shebanq@ancient-data.org for an invite to Slack
95 

   |     0.89s T ls                   to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     0.01s T nametype             to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     0.67s T otype                to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     0.00s T root                 to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     0.68s T sp                   to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     0.02s T voc_lex              to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     0.03s T voc_lex_utf8         to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     4.37s T oslots               to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     0.00s M otext                to /Users/dirk/github/etcbc/bhsa/_temp/c/core
..............................................................................................
.      8m 04s Check differences with previous version                                        .
......................................................

   |     0.04s T gloss                from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.00s Feature overview: 94 for nodes; 4 for edges; 1 configs; 7 computed
 1m 33s All features loaded/computed - for details use loadLog()
|      9m 40s new format lex-trans-plain (using lex0): TC</ TC</ TC</ TC</ TC</ TC</ TC</ TC</ TC</ TC</ TC</ 
|      9m 40s lex_utf8 feature              : ב ראשׁית ברא אלהים את ה שׁמים ו את ה ארץ
|      9m 40s language feature              : hbo hbo hbo hbo hbo hbo hbo hbo hbo hbo hbo
..............................................................................................
.      9m 40s Lexeme info for the first verse                                                .
..............................................................................................
|      9m 40s 	hbo - B - 15542x
|      9m 40s 		gloss           = in
|      9m 40s 		ls              = None
|      9m 40s 		nametype        = None
|      9m 40s 		root            = None
|      9m 4

   |     0.00s M otext                to /Users/dirk/github/etcbc/bhsa/_temp/c/core
..............................................................................................
.      9m 46s Check differences with previous version                                        .
..............................................................................................
|      9m 46s 	2 features to add
|      9m 46s 		qere_trailer
|      9m 46s 		qere_trailer_utf8
|      9m 46s 	no features to delete
|      9m 46s 	3 features in common
|      9m 46s otext                     ... differences
|      9m 46s 	line      5 OLD -->@dateWritten=2017-09-28T15:08:39Z<--
|      9m 46s 	line      5 NEW -->@dateWritten=2017-09-28T15:12:21Z<--
|      9m 46s 	line     12 OLD -->@fmt:text-orig-full={g_word_utf8}{traile ...<--
|      9m 46s 	line     12 NEW -->@fmt:text-orig-full={qere_utf8/g_word_ut ...<--
|      9m 46s 	line     13 OLD -->@fmt:text-orig-plain={g_cons_utf8}{trail ...<--
|      9m 46s 	lin

   |     0.71s T freq_lex             to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     0.68s T freq_occ             to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     0.70s T rank_lex             to /Users/dirk/github/etcbc/bhsa/_temp/c/core
   |     0.71s T rank_occ             to /Users/dirk/github/etcbc/bhsa/_temp/c/core
..............................................................................................
.     10m 04s Check differences with previous version                                        .
..............................................................................................
|     10m 04s 	4 features to add
|     10m 04s 		freq_lex
|     10m 04s 		freq_occ
|     10m 04s 		rank_lex
|     10m 04s 		rank_occ
|     10m 04s 	no features to delete
|     10m 04s 	0 features in common
|     10m 04s Done
..............................................................................................
.     10m 04s Deliver features to /Users/dirk/github/etcbc/

   |     0.18s B pfm                  from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.15s B vbs                  from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.17s B vbe                  from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.52s B language             from /Users/dirk/github/etcbc/bhsa/tf/c/core
   |     0.00s Feature overview: 100 for nodes; 4 for edges; 1 configs; 7 computed
  9.79s All features loaded/computed - for details use loadLog()
|     10m 28s 	Looking for non-verb qamets
|     10m 31s 	4060 lexemes and 13458 unique occurrences
|     10m 31s 	Filtering lexemes with varied occurrences
|     10m 31s 	161 interesting lexemes with 1705 unique occurrences
|     10m 31s 	Guessing between gadol and qatan
	JM/: Override for syllable 1: ā becomes o
	BJT/: Override for syllable 1: o becomes ā
	JWMM: Override for syllable 2:  becomes ā
	JHWNTN/: Override for syllable 2:  becomes ā
	JRB<M/: No override needed for syllable 1 which is ā
|     10m 31s 	10

UnboundLocalError: local variable 'good' referenced before assignment