![pipeline](pictures/pictures.003.png)

# SHEBANQ from Text-Fabric

This notebook assembles data from the relevant GitHub repositories of the ETCBC
and selects the data needed for the website
[SHEBANQ](https://shebanq.ancient-data.org).


## Pipeline
This is **pipe 2** of the pipeline from ETCBC data to the website SHEBANQ.

A run of this pipe produces SHEBANQ data for a chosen *version*.
It should be run whenever new or updated data sources affect the output data.
Since all input data is delivered in GitHub repositories, we have excellent machinery
for working with versions.
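
The log further down reports "Work to do because the tf in bhsa is recently compiled", which suggests that the pipe compares timestamps to decide whether a run is needed. The actual test is not shown in this notebook, so the following is only a minimal sketch of such a check. The helper `needs_rebuild` and its arguments are hypothetical names, not part of the pipeline.

```python
from pathlib import Path


def needs_rebuild(source_dir, target_file):
    """Return True if target_file is missing or older than any file in source_dir.

    Hypothetical helper: it illustrates a timestamp comparison only,
    the pipeline's real freshness test may work differently.
    """
    target = Path(target_file)
    if not target.exists():
        # Nothing has been produced yet, so there is work to do.
        return True
    target_time = target.stat().st_mtime
    # Work to do as soon as one input file is newer than the output.
    return any(
        path.stat().st_mtime > target_time
        for path in Path(source_dir).rglob('*')
        if path.is_file()
    )
```

A check like this skips the expensive export when nothing has changed, while an explicit override can still force a fresh run.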

The configuration below specifies which directories the pipe accesses for each version.

### Core data
The core data resides in the `tf` directory
of the GitHub repo [bhsa](https://github.com/ETCBC/bhsa).

This data is converted by the notebook `coreData` in the `programs` directory of that repo.

The result is an updated TF resource in the repo's
`tf` directory, under the chosen *version*.

### Additional data

The pipe will try to load any Text-Fabric data features found in the `tf` subdirectories
of the designated additional repos.
Within each `tf` directory it descends one level further, into the subdirectory named after the chosen *version*.
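
The directory layout described above can be sketched as follows. This is an illustration only: the helper `tf_locations` is a hypothetical name, and the base directory `~/github/etcbc` is an assumption modelled on the paths that appear in the log output further down.

```python
from pathlib import Path


def tf_locations(base, repos, version):
    """List the versioned tf directory of each repo: <base>/<repo>/tf/<version>.

    Hypothetical helper, not part of the pipeline module.
    """
    return [Path(base) / repo / 'tf' / version for repo in repos]


# Assumed base directory and the repos named in the configuration below.
locations = tf_locations(
    '~/github/etcbc',
    ['bhsa', 'phono', 'valence', 'parallels'],
    'c',
)
for location in locations:
    print(location.as_posix())
```

Because the version is the last path component, several versions of the same features can live side by side in one repo.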

### Resulting data
The resulting data will be delivered in the `shebanq` subdirectory of the core repo `bhsa`,
in a subdirectory named after the chosen *version*.

The resulting data consists of three parts:

* One big MQL file, containing the core data plus **all** additions: `bhsa-xx.mql`.
  It will be compressed with bzip2.
* **not yet implemented** 
  A subdirectory `mysql` with database tables, containing everything SHEBANQ needs to construct its pages.
* **not yet implemented**
  A subdirectory `annotations`, containing bulk-uploadable annotation sets that SHEBANQ can show in notes view,
  between the clause atoms of the text.
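
The first item says that the MQL file will be bzipped. How the pipeline does this is not shown here. The sketch below uses Python's standard `bz2` module as one possible way, and the function `bzip_file` is a hypothetical name.

```python
import bz2
import shutil
from pathlib import Path


def bzip_file(source, remove_original=False):
    """Compress source to <source>.bz2 and return the path of the new file.

    Hypothetical helper: the pipeline may use a different tool for this step.
    """
    source = Path(source)
    target = source.with_name(source.name + '.bz2')
    # Stream the data in chunks, so a large MQL file is never held in memory at once.
    with open(source, 'rb') as plain, bz2.open(target, 'wb') as packed:
        shutil.copyfileobj(plain, packed)
    if remove_original:
        source.unlink()
    return target
```

Streaming matters here, since the word data alone is written in batches of roughly 40MB according to the log.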

In [1]:
import os
import sys
import collections
from pipeline import webPipeline
from tf.fabric import Fabric

# Config

In [2]:
# SCRIPT stays False unless it was already defined before this cell runs
if 'SCRIPT' not in locals():
    SCRIPT = False

In [3]:
pipeline = dict(
    repoOrder = '''
        bhsa
        phono
        valence
        parallels
    ''',
)
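
The `repoOrder` value is a multi-line string with one repo name per line. How `webPipeline` reads it is not shown in this notebook. A plausible reading, given here only as a sketch, is to split the string on whitespace. Treating the first entry as the core repo is likewise an assumption, based on `bhsa` being described as the core repo above.

```python
# Same shape as the configuration above, under a different name
# so that the real `pipeline` variable is left untouched.
config = dict(
    repoOrder='''
        bhsa
        phono
        valence
        parallels
    ''',
)

# split() without arguments splits on any whitespace, including line breaks,
# and ignores the indentation and the empty first and last lines.
repos = config['repoOrder'].split()

# Assumption: the first repo is the core repo, the rest supply additional data.
core_repo, additional_repos = repos[0], repos[1:]

print(core_repo)         # bhsa
print(additional_repos)  # ['phono', 'valence', 'parallels']
```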

In [4]:
good = webPipeline(pipeline, version='c', force=False)


##############################################################################################
#                                                                                            #
#       0.00s Aggregate MLQ for version c                                                    #
#                                                                                            #
##############################################################################################

|       0.00s 	Work to do because the tf in bhsa is recently compiled
|       0.00s 		/Users/dirk/github/etcbc/bhsa/tf/c/.tf

##############################################################################################
#                                                                                            #
#       0.00s Using TF to make an MQL export                                                 #
#                                                                                            #
#################

   |     0.00s feature "book@am" => "book_am"


   |     0.00s M book@ar              from /Users/dirk/github/etcbc/bhsa/tf/c


   |     0.00s feature "book@ar" => "book_ar"


   |     0.00s M book@bn              from /Users/dirk/github/etcbc/bhsa/tf/c


   |     0.00s feature "book@bn" => "book_bn"


   |     0.00s M book@da              from /Users/dirk/github/etcbc/bhsa/tf/c


   |     0.00s feature "book@da" => "book_da"


   |     0.00s M book@de              from /Users/dirk/github/etcbc/bhsa/tf/c


   |     0.00s feature "book@de" => "book_de"


   |     0.00s M book@el              from /Users/dirk/github/etcbc/bhsa/tf/c


   |     0.00s feature "book@el" => "book_el"


   |     0.00s M book@en              from /Users/dirk/github/etcbc/bhsa/tf/c


   |     0.00s feature "book@en" => "book_en"


   |     0.00s M book@es              from /Users/dirk/github/etcbc/bhsa/tf/c


   |     0.00s feature "book@es" => "book_es"


   |     0.00s M book@fa              from /Users/dirk/github/etcbc/bhsa/tf/c


   |     0.00s feature "book@fa" => "book_fa"


   |     0.00s M book@fr              from /Users/dirk/github/etcbc/bhsa/tf/c


   |     0.00s feature "book@fr" => "book_fr"


   |     0.00s M book@he              from /Users/dirk/github/etcbc/bhsa/tf/c


   |     0.00s feature "book@he" => "book_he"


   |     0.00s M book@hi              from /Users/dirk/github/etcbc/bhsa/tf/c


   |     0.00s feature "book@hi" => "book_hi"


   |     0.00s M book@id              from /Users/dirk/github/etcbc/bhsa/tf/c


   |     0.00s feature "book@id" => "book_id"


   |     0.00s M book@ja              from /Users/dirk/github/etcbc/bhsa/tf/c


   |     0.00s feature "book@ja" => "book_ja"


   |     0.00s M book@ko              from /Users/dirk/github/etcbc/bhsa/tf/c


   |     0.00s feature "book@ko" => "book_ko"


   |     0.00s M book@la              from /Users/dirk/github/etcbc/bhsa/tf/c


   |     0.00s feature "book@la" => "book_la"


   |     0.00s M book@nl              from /Users/dirk/github/etcbc/bhsa/tf/c


   |     0.00s feature "book@nl" => "book_nl"


   |     0.00s M book@pa              from /Users/dirk/github/etcbc/bhsa/tf/c


   |     0.00s feature "book@pa" => "book_pa"


   |     0.00s M book@pt              from /Users/dirk/github/etcbc/bhsa/tf/c


   |     0.00s feature "book@pt" => "book_pt"


   |     0.00s M book@ru              from /Users/dirk/github/etcbc/bhsa/tf/c


   |     0.00s feature "book@ru" => "book_ru"


   |     0.00s M book@sw              from /Users/dirk/github/etcbc/bhsa/tf/c


   |     0.00s feature "book@sw" => "book_sw"


   |     0.00s M book@syc             from /Users/dirk/github/etcbc/bhsa/tf/c


   |     0.00s feature "book@syc" => "book_syc"


   |     0.00s M book@tr              from /Users/dirk/github/etcbc/bhsa/tf/c


   |     0.00s feature "book@tr" => "book_tr"


   |     0.00s M book@ur              from /Users/dirk/github/etcbc/bhsa/tf/c


   |     0.00s feature "book@ur" => "book_ur"


   |     0.00s M book@yo              from /Users/dirk/github/etcbc/bhsa/tf/c


   |     0.00s feature "book@yo" => "book_yo"


   |     0.00s M book@zh              from /Users/dirk/github/etcbc/bhsa/tf/c


   |     0.00s feature "book@zh" => "book_zh"


   |     0.00s M cfunction            from /Users/dirk/github/etcbc/valence/tf/c
   |     0.00s M chapter              from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s M code                 from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s M crossref             from /Users/dirk/github/etcbc/parallels/tf/c
   |     0.00s M det                  from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s M dist                 from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s M dist_unit            from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s M distributional_parent from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s M domain               from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s M f_correction         from /Users/dirk/github/etcbc/valence/tf/c
   |     0.00s M freq_lex             from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s M freq_occ             from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s M function             from /Users/di

   |     0.00s B book@fr              from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s B book@he              from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s B book@hi              from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s B book@id              from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s B book@ja              from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s B book@ko              from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s B book@la              from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s B book@nl              from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s B book@pa              from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s B book@pt              from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s B book@ru              from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s B book@sw              from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s B book@syc             from /Users/dirk/github/et

   |     0.15s B vs                   from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.16s B vt                   from /Users/dirk/github/etcbc/bhsa/tf/c
    14s Mapping 117 features onto 13 object types
    21s Writing 117 features as data in 13 object types
   |     0.00s word data ...
   |      |     4.72s batch of size               40.2MB with   50000 of   50000 words
   |      |     9.02s batch of size               40.2MB with   50000 of  100000 words
   |      |       14s batch of size               40.4MB with   50000 of  150000 words
   |      |       18s batch of size               40.4MB with   50000 of  200000 words
   |      |       23s batch of size               40.6MB with   50000 of  250000 words
   |      |       28s batch of size               40.6MB with   50000 of  300000 words
   |      |       32s batch of size               40.8MB with   50000 of  350000 words
   |      |       36s batch of size               40.5MB with   50000 of  400000 words
   |      |  