![pipeline](pictures/pictures.002.png)

# Text-Fabric from ETCBC

This notebook assembles the data from the ETCBC that is needed
to compile its datasets in Text-Fabric format on GitHub.
Ultimately, the data for the website [SHEBANQ](https://shebanq.ancient-data.org) will be
derived from these TF sources.

## Pipeline
This is **pipe 1** of the pipeline from ETCBC data to the website SHEBANQ.

A run of this pipe produces a data *version*.
It should be run whenever there are new or updated data sources present that affect the output data.
Since all input data is delivered in a GitHub repo, we have excellent machinery
for versioning.

The pipe works by executing a series of programs contained in GitHub repositories.
For each repository in the pipe, a series of notebooks is executed.
See [script mode](https://github.com/ETCBC/pipeline/blob/master/README.md#operation) for 
details on how we call notebooks.

All this is specified in the configuration below.

### Core data

The core data is delivered by the ETCBC as `bhsa.mql.bz2` in
the `source` directory of the GitHub repo [bhsa](https://github.com/ETCBC/bhsa).

This data will be converted by `tfFromMQL` in the `programs` directory.

The result of this conversion is an updated TF resource in the
`tf/core` directory of the same repo.
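
The compressed MQL source can be inspected without unpacking it first, which is a quick sanity check before a long conversion run. The following is a minimal sketch using only the Python standard library; `head_of_bz2` is a hypothetical helper, not part of `tfFromMQL`, and the path you pass in depends on where your clone of the `bhsa` repo lives.

```python
import bz2
from itertools import islice


def head_of_bz2(path, n=5):
    """Return the first n lines of a bz2-compressed text file.

    Hypothetical helper for illustration; it streams the file,
    so only the requested lines are decompressed.
    """
    with bz2.open(path, "rt", encoding="utf8") as fh:
        return [line.rstrip("\n") for line in islice(fh, n)]
```

Calling `head_of_bz2(path_to_mql, 10)` on your local copy of `bhsa.mql.bz2` should show the opening MQL statements of the dump.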

### Additional data

Researchers have contributed to the dataset,
but not all of that data is in the core.
Such data typically lives in the repository where the research was
carried out and where the data is documented.

Before the pipe starts, these repos must be pulled.
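
Pulling the repos can be scripted. The sketch below is one possible way to do it and is not part of the pipeline itself: `pull_commands` and `pull_all` are hypothetical helpers, and the base directory `~/github/etcbc` is an assumption taken from the paths that appear in the log output further down. By default it only prints the commands.

```python
import subprocess
from pathlib import Path


def pull_commands(repos, base):
    """Build one `git pull` command per repository clone under `base`."""
    return [
        ["git", "-C", str(Path(base) / repo), "pull", "--ff-only"]
        for repo in repos
    ]


def pull_all(repos, base, dry_run=True):
    """Print the commands (dry run) or execute them, stopping on failure."""
    for cmd in pull_commands(repos, base):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


# Assumed location of the clones; adjust to your own setup.
BASE = Path("~/github/etcbc").expanduser()
pull_all(["bhsa", "phono", "valence", "parallels"], BASE, dry_run=True)
```

Pass `dry_run=False` to actually run `git pull` in each clone.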

In [1]:
import os, sys, collections
from pipeline import runPipeline
from tf.fabric import Fabric

# Config

In [2]:
CORE_NAME = 'bhsa'
CORE_MODULE = 'core'

if 'SCRIPT' not in locals(): 
    SCRIPT = False
    DEFAULT_CORE_NAME = CORE_NAME
    DEFAULT_VERSION = 'c'
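
The guard on `SCRIPT` lets the same cell behave differently when the notebook is run interactively and when it is called in script mode. The sketch below illustrates the idiom in isolation. It assumes that script mode executes the code with `SCRIPT` and the other parameters already defined; the helper `run_cell` is hypothetical and only mimics that with `exec`.

```python
CELL = """
if 'SCRIPT' not in locals():
    SCRIPT = False
    VERSION = 'c'
"""


def run_cell(source, **preset):
    """Execute `source` in a namespace that already holds `preset`.

    Hypothetical stand-in for a caller that injects parameters
    before running the notebook code.
    """
    scope = dict(preset)
    exec(source, scope)
    return scope


# Interactive use: nothing is preset, so the defaults are filled in.
interactive = run_cell(CELL)
print(interactive["SCRIPT"], interactive["VERSION"])

# Script-like use: the caller's values are left untouched.
scripted = run_cell(CELL, SCRIPT=True, VERSION="2016")
print(scripted["SCRIPT"], scripted["VERSION"])
```

The defaults are assigned only when `SCRIPT` is absent, so a caller that sets `SCRIPT` must also supply every variable the guarded block would otherwise define.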

In [3]:
pipeline = dict(
    defaults = dict(
        CORE_NAME=CORE_NAME,
        VERSION=DEFAULT_VERSION,
        CORE_MODULE=CORE_MODULE,
    ),
    versions={
        '4': dict(),
        '4b': dict(),
        'c': dict(),
        '2016': dict(),
        '2017': dict(),
    },
    repoOrder = '''
        bhsa
        phono
        valence
        parallels
    ''',
    repoConfig = dict(
        bhsa=(
            dict(
                task='tfFromMQL',
            ),
            dict(
                task='lexicon',
                omit={},
            ),
            dict(
                task='paragraphs',
                omit={'4', '4b'},
            ),
            dict(
                task='ketivQere',
                omit={'4', '4b'},
            ),
            dict(
                task='addStats',
                omit={'4', '4b'},
            ),
        ),
        phono=(
            dict(
                task='phono',
                omit={'4', '4b'},
            ),
        ),
        valence=(
            dict(
                task='enrich',
                omit={},
            ),
            dict(
                task='flowchart',
                omit={},
            ),
        ),
        parallels=(
            dict(
                task='parallels',
                omit={},
                params=dict(
                    FORCE_MATRIX=False,
                ),
            ),
        ),
    ),
)
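
To see what this configuration implies for a given version, the `omit` sets can be read as "skip this task for these versions". The sketch below works that out for a reduced copy of the configuration. It reflects one reading of the structure above: `tasks_for_version` is a hypothetical helper, and the real `runPipeline` may apply further logic.

```python
def tasks_for_version(pipeline, version):
    """List the (repo, task) pairs that are not omitted for `version`."""
    selected = []
    for repo in pipeline["repoOrder"].split():
        for step in pipeline["repoConfig"].get(repo, ()):
            # A missing or empty `omit` means the task runs for every version.
            if version in step.get("omit", ()):
                continue
            selected.append((repo, step["task"]))
    return selected


# Reduced copy of the configuration, for illustration only.
example = dict(
    repoOrder="""
        bhsa
        phono
    """,
    repoConfig=dict(
        bhsa=(
            dict(task="tfFromMQL"),
            dict(task="lexicon", omit={}),
            dict(task="paragraphs", omit={"4", "4b"}),
        ),
        phono=(
            dict(task="phono", omit={"4", "4b"}),
        ),
    ),
)

print(tasks_for_version(example, "4b"))
print(tasks_for_version(example, "2016"))
```

Note that `omit={}` is an empty dict rather than an empty set; the membership test behaves the same for both.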

# Run the pipeline

In [4]:
good = runPipeline(pipeline, version='2016', force=False)


##############################################################################################
#                                                                                            #
#       0.00s Make version [2016]                                                            #
#                                                                                            #
##############################################################################################


**********************************************************************************************
*                                                                                            *
*       0.00s Make repo [bhsa]                                                               *
*                                                                                            *
**********************************************************************************************


---------------------------------------------

|       1.37s 			feature book (str) =def= Genesis : node
|       1.37s 		otype phrase_atom
|       1.37s 			feature number (int) =def= 0 : node
|       1.37s 			feature dist (int) =def= 0 : node
|       1.37s 			feature distributional_parent (str) =def= 0 : edge
|       1.37s 			feature mother (str) =def= 0 : edge
|       1.37s 			feature functional_parent (str) =def= 0 : edge
|       1.37s 			feature det (str) =def= NA : node
|       1.37s 			feature typ (str) =def= VP : node
|       1.37s 			feature rela (str) =def= NA : node
|       1.37s 			feature dist_unit (str) =def= clause_atoms : node
|       1.37s 		objects in word
|       8.02s 	line   1000000
|         14s 	line   2000000
|         17s 		objects in word
|         19s 	line   3000000
|         24s 	line   4000000
|         30s 	line   5000000
|         31s 		objects in word
|         36s 	line   6000000
|         42s 	line   7000000
|         47s 		objects in word
|         49s 	line   8000000
|         55s 	line   9000000
|

   |     1.30s T dist                 to /Users/dirk/github/etcbc/bhsa/_temp/2016/core
   |     1.11s T dist_unit            to /Users/dirk/github/etcbc/bhsa/_temp/2016/core
   |     0.14s T domain               to /Users/dirk/github/etcbc/bhsa/_temp/2016/core
   |     0.46s T function             to /Users/dirk/github/etcbc/bhsa/_temp/2016/core
   |     0.76s T g_cons               to /Users/dirk/github/etcbc/bhsa/_temp/2016/core
   |     0.83s T g_cons_utf8          to /Users/dirk/github/etcbc/bhsa/_temp/2016/core
   |     0.77s T g_lex                to /Users/dirk/github/etcbc/bhsa/_temp/2016/core
   |     0.90s T g_lex_utf8           to /Users/dirk/github/etcbc/bhsa/_temp/2016/core
   |     0.72s T g_nme                to /Users/dirk/github/etcbc/bhsa/_temp/2016/core
   |     0.74s T g_nme_utf8           to /Users/dirk/github/etcbc/bhsa/_temp/2016/core
   |     0.69s T g_pfm                to /Users/dirk/github/etcbc/bhsa/_temp/2016/core
   |     0.69s T g_pfm_utf8           to /U

|      4m 20s dist_unit                 ... no changes
|      4m 21s distributional_parent     ... no changes
|      4m 21s domain                    ... no changes
|      4m 21s function                  ... no changes
|      4m 22s functional_parent         ... no changes
|      4m 23s g_cons                    ... no changes
|      4m 23s g_cons_utf8               ... no changes
|      4m 23s g_lex                     ... no changes
|      4m 24s g_lex_utf8                ... no changes
|      4m 24s g_nme                     ... no changes
|      4m 25s g_nme_utf8                ... no changes
|      4m 25s g_pfm                     ... no changes
|      4m 25s g_pfm_utf8                ... no changes
|      4m 26s g_prs                     ... no changes
|      4m 26s g_prs_utf8                ... no changes
|      4m 26s g_uvf                     ... no changes
|      4m 27s g_uvf_utf8                ... no changes
|      4m 27s g_vbe                     ... no changes
|      4m 

95 features found and 0 ignored
  0.00s loading features ...
   |     1.34s T otype                from /Users/dirk/github/etcbc/bhsa/tf/2016/core
   |       10s T oslots               from /Users/dirk/github/etcbc/bhsa/tf/2016/core
   |     0.09s T book                 from /Users/dirk/github/etcbc/bhsa/tf/2016/core
   |     0.05s T chapter              from /Users/dirk/github/etcbc/bhsa/tf/2016/core
   |     0.05s T verse                from /Users/dirk/github/etcbc/bhsa/tf/2016/core
   |     1.45s T g_cons               from /Users/dirk/github/etcbc/bhsa/tf/2016/core
   |     1.61s T g_cons_utf8          from /Users/dirk/github/etcbc/bhsa/tf/2016/core
   |     1.51s T g_lex                from /Users/dirk/github/etcbc/bhsa/tf/2016/core
   |     1.61s T g_lex_utf8           from /Users/dirk/github/etcbc/bhsa/tf/2016/core
   |     1.52s T g_word               from /Users/dirk/github/etcbc/bhsa/tf/2016/core
   |     1.63s T g_word_utf8          from /Users/dirk/github/etcbc/bhsa/tf/201

   |     0.65s T qere                 from /Users/dirk/github/etcbc/bhsa/tf/2016/core
   |     0.66s T qere_utf8            from /Users/dirk/github/etcbc/bhsa/tf/2016/core
   |     2.51s T rela                 from /Users/dirk/github/etcbc/bhsa/tf/2016/core
   |     1.46s T sp                   from /Users/dirk/github/etcbc/bhsa/tf/2016/core
   |     1.42s T st                   from /Users/dirk/github/etcbc/bhsa/tf/2016/core
   |     0.18s T tab                  from /Users/dirk/github/etcbc/bhsa/tf/2016/core
   |     0.28s T txt                  from /Users/dirk/github/etcbc/bhsa/tf/2016/core
   |     2.64s T typ                  from /Users/dirk/github/etcbc/bhsa/tf/2016/core
   |     1.49s T uvf                  from /Users/dirk/github/etcbc/bhsa/tf/2016/core
   |     1.38s T vbe                  from /Users/dirk/github/etcbc/bhsa/tf/2016/core
   |     1.55s T vbs                  from /Users/dirk/github/etcbc/bhsa/tf/2016/core
   |     1.48s T vs                   from /Users/dirk

..............................................................................................
.      7m 44s Various tweaks in features                                                     .
..............................................................................................
..............................................................................................
.      7m 45s Update the otype, oslots and otext features                                    .
..............................................................................................
|      7m 48s Features that have new or modified data
|      7m 48s 	gloss
|      7m 48s 	language
|      7m 48s 	lex
|      7m 48s 	lex0
|      7m 48s 	lex_utf8
|      7m 48s 	ls
|      7m 48s 	nametype
|      7m 48s 	otype
|      7m 48s 	root
|      7m 48s 	sp
|      7m 48s 	voc_lex
|      7m 48s 	voc_lex_utf8
|      7m 48s 	oslots
|      7m 48s Check voc_lex_utf8: בְּ רֵאשִׁית ברא אֱלֹהִים אֵת הַ שָׁמַיִם וְ אֶרֶץ
|      7m

99 features found and 0 ignored
  0.00s loading features ...
   |     0.88s T otype                from /Users/dirk/github/etcbc/bhsa/tf/2016/core
   |     9.90s T oslots               from /Users/dirk/github/etcbc/bhsa/tf/2016/core
   |     1.49s T lex0                 from /Users/dirk/github/etcbc/bhsa/tf/2016/core
   |     1.62s T lex_utf8             from /Users/dirk/github/etcbc/bhsa/tf/2016/core
   |      |     1.27s C __levels__           from otype, oslots
   |      |       19s C __order__            from otype, oslots, __levels__
   |      |     0.92s C __rank__             from otype, __order__
   |      |       33s C __levUp__            from otype, oslots, __rank__
   |      |       12s C __levDown__          from otype, __levUp__, __rank__
   |      |     4.22s C __boundary__         from otype, oslots, __rank__
   |     0.00s M otext                from /Users/dirk/github/etcbc/bhsa/tf/2016/core
   |      |     0.11s C __sections__         from otype, oslots, otext, __lev

|      9m 38s 	Read 90562 paragraph annotations
|      9m 38s 	OK: All label/line entries found in index
|      9m 38s Prepare TF paragraph features
..............................................................................................
.      9m 38s write new/changed features to TF ...                                           .
..............................................................................................
   |     0.14s T instruction          to /Users/dirk/github/etcbc/bhsa/_temp/2016/core
   |     0.16s T pargr                to /Users/dirk/github/etcbc/bhsa/_temp/2016/core
..............................................................................................
.      9m 39s Check differences with previous version                                        .
..............................................................................................
|      9m 39s 	2 features to add
|      9m 39s 		instruction
|      9m 39s 		pargr
|      9m 39s 	no featur

   |     0.00s T qere_trailer_utf8    to /Users/dirk/github/etcbc/bhsa/_temp/2016/core
   |     0.01s T qere_utf8            to /Users/dirk/github/etcbc/bhsa/_temp/2016/core
   |     0.00s M otext                to /Users/dirk/github/etcbc/bhsa/_temp/2016/core
..............................................................................................
.      9m 49s Check differences with previous version                                        .
..............................................................................................
|      9m 49s 	2 features to add
|      9m 49s 		qere_trailer
|      9m 49s 		qere_trailer_utf8
|      9m 49s 	no features to delete
|      9m 49s 	3 features in common
|      9m 49s otext                     ... differences
|      9m 49s 	line      5 OLD -->@dateWritten=2017-09-30T15:07:26Z<--
|      9m 49s 	line      5 NEW -->@dateWritten=2017-09-30T15:11:05Z<--
|      9m 49s 	line     12 OLD -->@fmt:text-orig-full={g_word_utf8}{traile ...<--
|    

   |     0.74s T freq_lex             to /Users/dirk/github/etcbc/bhsa/_temp/2016/core
   |     0.71s T freq_occ             to /Users/dirk/github/etcbc/bhsa/_temp/2016/core
   |     0.73s T rank_lex             to /Users/dirk/github/etcbc/bhsa/_temp/2016/core
   |     0.72s T rank_occ             to /Users/dirk/github/etcbc/bhsa/_temp/2016/core
..............................................................................................
.     10m 07s Check differences with previous version                                        .
..............................................................................................
|     10m 07s 	4 features to add
|     10m 07s 		freq_lex
|     10m 07s 		freq_occ
|     10m 07s 		rank_lex
|     10m 07s 		rank_occ
|     10m 07s 	no features to delete
|     10m 07s 	0 features in common
|     10m 07s Done
..............................................................................................
.     10m 07s Deliver features to /Users/dirk/g

|     10m 44s START parallels (CORE_MODULE=core, CORE_NAME=bhsa, FORCE_MATRIX=False, VERSION=2016)
|     10m 45s 	Destination /Users/dirk/github/etcbc/parallels/tf/2016/parallels/.tf/crossref.tfx exists
|     10m 45s SUCCESS parallels

----------------------------------------------------------------------------------------------
-     10m 45s SUCCES [parallels/parallels]                                                   -
----------------------------------------------------------------------------------------------


**********************************************************************************************
*                                                                                            *
*     10m 45s SUCCES [parallels]                                                             *
*                                                                                            *
**********************************************************************************************


