![pipeline](pictures/pictures.003.png)

# SHEBANQ from Text-Fabric

This notebook assembles data from relevant GitHub repositories of the ETCBC.
It selects the data that is needed for the website
[SHEBANQ](https://shebanq.ancient-data.org).


## Pipeline
This is **pipe 2** of the pipeline from ETCBC data to the website SHEBANQ.

A run of this pipe produces SHEBANQ data for a chosen *version*.
Rerun it whenever new or updated data sources affect the output data.
Since all input data is delivered in GitHub repositories, we have excellent machinery to
work with versioning.
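
The rule for when to rerun the pipe can be sketched as a timestamp comparison. The helper below is hypothetical: `needs_rebuild` is not part of the `pipeline` module, and this notebook does not show how `webPipeline` actually decides. It only illustrates the idea of treating the output as stale when it is missing or older than one of its input files.

```python
from pathlib import Path
from typing import Iterable


def needs_rebuild(output: Path, inputs: Iterable[Path]) -> bool:
    """Return True if `output` is missing or older than any existing input.

    Hypothetical helper for illustration; not taken from the pipeline module.
    """
    if not output.exists():
        # Nothing has been produced yet, so there is work to do.
        return True
    out_mtime = output.stat().st_mtime
    # Inputs that do not exist are ignored rather than treated as errors.
    return any(p.stat().st_mtime > out_mtime for p in inputs if p.exists())
```

A caller would pass the expected output file and the feature files that feed into it, and skip the expensive export when the function returns `False`.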

Which directories the pipe should access for which version is specified in the configuration below.

### Core data
The core data resides in the `tf` directory of
the GitHub repo [BHSA](https://github.com/ETCBC/bhsa).

This data is converted by the notebook `coreData` in the `programs` directory of that repo.

The result is an updated Text-Fabric resource in the repo's
`tf` directory, under the chosen *version*.

### Additional data

The pipe will try to load any Text-Fabric data features found in the `tf` subdirectories
of the designated additional repos.
Within each `tf` directory it descends one level deeper, into the subdirectory named after the chosen *version*.
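
As a rough illustration of this lookup, the sketch below builds the candidate directories for a list of repos and a version, following the layout `<base>/<repo>/tf/<version>`. The function name `tf_locations` and the base directory are assumptions made for this example; the actual resolution happens inside the `pipeline` module and may differ.

```python
import os
from pathlib import Path
from typing import List


def tf_locations(repo_order: str, version: str, base: Path) -> List[Path]:
    """List the versioned `tf` directory of each repo, in the given order.

    Hypothetical helper: `repo_order` is a whitespace-separated string of
    repo names, similar to the `repoOrder` setting in the configuration.
    """
    repos = repo_order.strip().split()
    return [base / repo / "tf" / version for repo in repos]


REPO_ORDER = '''
    bhsa
    phono
    valence
    parallels
'''

# Assumption: all repos are cloned side by side under ~/github/etcbc.
BASE = Path(os.path.expanduser("~/github/etcbc"))

for loc in tf_locations(REPO_ORDER, "2016", BASE):
    status = "found" if loc.is_dir() else "missing"
    print(f"{loc}  ({status})")
```

Keeping the repo order explicit makes it easy to see which repo comes first when several directories are searched.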

### Resulting data
The resulting data is delivered in the `shebanq` subdirectory of the core repo `bhsa`,
in the subdirectory for the chosen *version*.

The resulting data consists of three parts:

* One big MQL file, containing the core data plus **all** additions: `bhsa-xx.mql`.
  It is compressed with bzip2.
* **Not yet implemented:**
  a subdirectory `mysql` with database tables, containing everything SHEBANQ needs to construct its pages.
* **Not yet implemented:**
  a subdirectory `annotations`, containing bulk-uploadable annotation sets that SHEBANQ can show in notes view,
  between the clause atoms of the text.
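
The compression step for the MQL file can be sketched with Python's standard `bz2` module. This is only an illustration: the notebook does not say whether the pipeline uses this module or an external `bzip2` command, and `compress_mql` is a hypothetical name.

```python
import bz2
import shutil
from pathlib import Path


def compress_mql(mql_path: Path) -> Path:
    """Write a bzip2-compressed copy of `mql_path` next to it.

    Hypothetical helper; returns the path of the new `.bz2` file.
    The uncompressed file is left in place.
    """
    bz_path = mql_path.with_name(mql_path.name + ".bz2")
    # Stream the data in chunks so a large export is never held in memory.
    with open(mql_path, "rb") as src, bz2.open(bz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return bz_path
```

Streaming matters here because the export is written in batches of tens of megabytes, so the complete file is large.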

In [1]:
import os, sys, collections
from pipeline import webPipeline, importLocal, copyServer
from tf.fabric import Fabric

# Config

In [2]:
if 'SCRIPT' not in locals(): 
    SCRIPT = False
    VERSIONS = ['2016']

In [4]:
pipeline = dict(
    repoOrder = '''
        bhsa
        phono
        valence
        parallels
    ''',
)
user = 'dirkr'
server = 'clarin11.dans.knaw.nl'
remoteDir = '/home/dirkr/shebanq-install'

In [5]:
good = webPipeline(pipeline, versions=VERSIONS, force=True, kinds={'mql'})


##############################################################################################
#                                                                                            #
#       0.00s Aggregate MQL for version 2016                                                 #
#                                                                                            #
##############################################################################################

|       0.00s 	Work to do because /Users/dirk/github/etcbc/bhsa/shebanq/2016/shebanq_etcbc2016.mql.bz2 does not exist

##############################################################################################
#                                                                                            #
#       0.00s Using TF to make an MQL export                                                 #
#                                                                                            #
########################

   |     0.01s feature "book@am" => "book_am"
   |     0.00s M book@ar              from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s feature "book@ar" => "book_ar"
   |     0.00s M book@bn              from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s feature "book@bn" => "book_bn"
   |     0.00s M book@da              from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s feature "book@da" => "book_da"
   |     0.00s M book@de              from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s feature "book@de" => "book_de"
   |     0.00s M book@el              from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s feature "book@el" => "book_el"
   |     0.00s M book@en              from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s feature "book@en" => "book_en"
   |     0.00s M book@es              from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s feature "book@es" => "book_es"
   |     0.00s M book@fa              from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s feature "book@fa" => "book_fa"
   |     0.00s M book@fr              from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s feature "book@fr" => "book_fr"
   |     0.00s M book@he              from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s feature "book@he" => "book_he"
   |     0.00s M book@hi              from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.01s feature "book@hi" => "book_hi"
   |     0.00s M book@id              from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s feature "book@id" => "book_id"
   |     0.00s M book@ja              from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s feature "book@ja" => "book_ja"
   |     0.00s M book@ko              from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s feature "book@ko" => "book_ko"
   |     0.00s M book@la              from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s feature "book@la" => "book_la"
   |     0.00s M book@nl              from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s feature "book@nl" => "book_nl"
   |     0.00s M book@pa              from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.01s feature "book@pa" => "book_pa"
   |     0.00s M book@pt              from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.01s feature "book@pt" => "book_pt"
   |     0.00s M book@ru              from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s feature "book@ru" => "book_ru"
   |     0.00s M book@sw              from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s feature "book@sw" => "book_sw"
   |     0.00s M book@syc             from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s feature "book@syc" => "book_syc"
   |     0.00s M book@tr              from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s feature "book@tr" => "book_tr"
   |     0.00s M book@ur              from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s feature "book@ur" => "book_ur"
   |     0.00s M book@yo              from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s feature "book@yo" => "book_yo"
   |     0.00s M book@zh              from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s feature "book@zh" => "book_zh"


   |     0.00s M cfunction            from /Users/dirk/github/etcbc/valence/tf/2016
   |     0.00s M chapter              from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s M code                 from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s M crossref             from /Users/dirk/github/etcbc/parallels/tf/2016
   |     0.00s M crossrefLCS          from /Users/dirk/github/etcbc/parallels/tf/2016
   |     0.00s M crossrefSET          from /Users/dirk/github/etcbc/parallels/tf/2016
   |     0.00s M det                  from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s M dist                 from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s M dist_unit            from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s M distributional_parent from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s M domain               from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s M f_correction         from /Users/dirk/github/etcbc/valence/tf/2016
   |  

   |     0.00s feature "omap@4b-2016" => "omap_4b_2016"


   |     0.00s M original             from /Users/dirk/github/etcbc/valence/tf/2016
   |     0.00s M otext@phono          from /Users/dirk/github/etcbc/phono/tf/2016
   |     0.00s M pargr                from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s M pdp                  from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s M pfm                  from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s M phono                from /Users/dirk/github/etcbc/phono/tf/2016
   |     0.00s M phono_trailer        from /Users/dirk/github/etcbc/phono/tf/2016
   |     0.00s M predication          from /Users/dirk/github/etcbc/valence/tf/2016
   |     0.00s M prs                  from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s M prs_gn               from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s M prs_nu               from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s M prs_ps               from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.00s M ps

   |     0.07s B g_vbs_utf8           from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.14s B g_word               from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.20s B g_word_utf8          from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.01s B gloss                from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.10s B gn                   from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.28s B grammatical          from /Users/dirk/github/etcbc/valence/tf/2016
   |     0.07s B instruction          from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.03s B is_root              from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.03s B kind                 from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.01s B label                from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.11s B language             from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.12s B lex                  from /Users/dirk/github/etcbc/bhsa/tf/2016
   |     0.13s B lex0    

	uvf            :    6 values, 1 not a name, e.g. «>»
	vbe            :   19 values, 6 not a name, e.g. «»
	vbs            :   11 values, 3 not a name, e.g. «>»
   |     2.28s Writing an all-in-one enum with  236 values
    16s Mapping 120 features onto 13 object types
    22s Writing 120 features as data in 13 object types
   |     0.00s word data ...
   |      |     6.12s batch of size               40.0MB with   50000 of   50000 words
   |      |       11s batch of size               40.0MB with   50000 of  100000 words
   |      |       16s batch of size               40.2MB with   50000 of  150000 words
   |      |       21s batch of size               40.2MB with   50000 of  200000 words
   |      |       25s batch of size               40.4MB with   50000 of  250000 words
   |      |       30s batch of size               40.4MB with   50000 of  300000 words
   |      |       35s batch of size               40.6MB with   50000 of  350000 words
   |      |       39s batch of size 

In [6]:
#good = True
if good:
    good = importLocal(pipeline, versions=['2017'], kinds={'mql'})


##############################################################################################
#                                                                                            #
#      3m 45s Import MQL db for version 2017 locally                                         #
#                                                                                            #
##############################################################################################

|      6m 47s Dropping indices on word_objects...!

|      6m 47s Creating indices on word_objects...!

|      6m 47s Dropping indices on subphrase_objects...!

|      6m 47s Creating indices on subphrase_objects...!

|      6m 47s Dropping indices on phrase_atom_objects...!

|      6m 47s Creating indices on phrase_atom_objects...!

|      6m 47s Dropping indices on phrase_objects...!

|      6m 47s Creating indices on phrase_objects...!

|      6m 47s Dropping indices on clause_atom_objects...!

|      6m 47s Creati

In [6]:
#good = True
if good:
    good = copyServer(pipeline, user, server, remoteDir, versions=['2017', 'c'], kinds={'mql'})


##############################################################################################
#                                                                                            #
#     11m 03s Sending MQL database for version 2017 to server                                #
#                                                                                            #
##############################################################################################

|     11m 03s 	shebanq_etcbc2017.mql.bz2
|     11m 03s 	scp /Users/dirk/github/etcbc/bhsa/shebanq/2017/shebanq_etcbc2017.mql.bz2 dirkr@clarin11.dans.knaw.nl:/home/dirkr/shebanq-install/shebanq_etcbc2017.mql.bz2
|     11m 26s 	done

##############################################################################################
#                                                                                            #
#     11m 26s Sending MQL database for version c to server                                   #
#     