![pipeline](pictures/pictures.002.png)

# Text-Fabric from ETCBC

This notebook assembles the ETCBC data needed
to compile its datasets in Text-Fabric format on Github.
Ultimately, the data for the website [SHEBANQ](https://shebanq.ancient-data.org) will be
derived from these TF sources.

## Pipeline
This is **pipe 1** of the pipeline from ETCBC data to the website SHEBANQ.

A run of this pipe produces a data *version*.
It should be run whenever new or updated data sources affect the output data.
Since all input data is delivered in Github repositories, Git gives us excellent machinery
for versioning.

The pipe works by executing a series of programs contained in Github repositories.
For each repository in the pipe, a series of notebooks is executed.
See [script mode](https://github.com/ETCBC/pipeline/blob/master/README.md#operation) for 
details on how we call notebooks.

All this is specified in the configuration below.

### Core data

The core data is delivered by the ETCBC as `bhsa.mql.bz2` in 
the Github repo [bhsa](https://github.com/ETCBC/bhsa) in directory `source`.

This data will be converted by `tfFromMQL` in the `programs` directory.

The result of this action will be an updated TF resource in its 
`tf/core` directory.
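To take a quick look at the raw source before conversion, you can stream the compressed MQL dump with Python's standard `bz2` module without decompressing it to disk. This is a minimal sketch and not part of `tfFromMQL`. The helper name `head_mql` is hypothetical, and the UTF-8 encoding is an assumption about the dump.

```python
import bz2
from itertools import islice


def head_mql(path, n=5):
    """Return the first n lines of a bzip2-compressed MQL file.

    Hypothetical helper. The file is streamed, so only the lines actually
    read are decompressed. UTF-8 encoding is assumed.
    """
    with bz2.open(path, mode="rt", encoding="utf-8") as fh:
        return [line.rstrip("\n") for line in islice(fh, n)]
```

For example, running `head_mql("source/bhsa.mql.bz2")` from the root of a local `bhsa` clone would return the first five lines. The exact location of the file per version may differ.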

### Additional data

Researchers have contributed to the dataset,
but not all of that data is in the core.
Such contributions typically live in the repository where the research was
carried out and where the data is documented.

Before the pipe starts, these repos must be pulled.
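Below is a minimal sketch of pulling these repos in one go. It assumes they are all cloned side by side under a single base directory, which may not match your setup. The repo names come from `repoOrder` in the configuration. `pull_commands` and `pull_all` are hypothetical helpers, and `dry_run` defaults to `True`, so nothing is executed unless you opt in.

```python
import subprocess
from pathlib import Path

# Repo names as listed in repoOrder; assumed to be cloned under one base dir.
REPOS = ["bhsa", "phono", "valence", "parallels"]


def pull_commands(base, repos):
    """Build one `git -C <base>/<repo> pull` command per repository."""
    return [["git", "-C", str(Path(base) / repo), "pull"] for repo in repos]


def pull_all(base, repos, dry_run=True):
    """Pull every repo; with dry_run=True, only return the commands."""
    cmds = pull_commands(base, repos)
    if not dry_run:
        for cmd in cmds:
            # check=True raises CalledProcessError as soon as a pull fails
            subprocess.run(cmd, check=True)
    return cmds
```

To actually run the pulls, call `pull_all(base, REPOS, dry_run=False)`.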

```python
import collections
import os
import sys
from pipeline import runPipeline
from tf.fabric import Fabric
```

# Config

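```python
CORE_NAME = 'bhsa'

# When this notebook runs interactively, SCRIPT is not yet defined:
# set it, together with the defaults, here.
if 'SCRIPT' not in locals():
    SCRIPT = False
    DEFAULT_CORE_NAME = CORE_NAME
    DEFAULT_VERSION = 'c'
```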
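The `locals()` check lets the same cell work in both interactive and script use. If `SCRIPT` is already defined, the defaults are left alone. The sketch below replays the cell with `exec` in fresh namespaces to show both branches. The preset dictionary stands in for whatever the caller defines in script mode. It is an assumption for illustration, not the pipeline's actual mechanism.

```python
# The config cell, verbatim, as a string so it can be run in fresh namespaces.
CONFIG_CELL = """
CORE_NAME = 'bhsa'

if 'SCRIPT' not in locals():
    SCRIPT = False
    DEFAULT_CORE_NAME = CORE_NAME
    DEFAULT_VERSION = 'c'
"""


def run_config_cell(preset=None):
    """Execute the config cell in a namespace pre-seeded with `preset`.

    With a single namespace dict, exec() uses it as both globals and
    locals, so locals() inside the cell sees the preset variables.
    """
    namespace = dict(preset or {})
    exec(CONFIG_CELL, namespace)
    return namespace
```

Run without a preset, the cell sets `SCRIPT = False` and `DEFAULT_VERSION = 'c'`. Given a hypothetical preset such as `{"SCRIPT": True, "DEFAULT_VERSION": "2017"}`, it keeps the injected version and defines no defaults of its own.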

```python
pipeline = dict(
    defaults=dict(
        CORE_NAME=CORE_NAME,
        VERSION=DEFAULT_VERSION,
    ),
    versions={
        '4': dict(),
        '4b': dict(),
        'c': dict(),
        '2016': dict(),
        '2017': dict(),
    },
    # repos are processed in this order
    repoOrder='''
        bhsa
        phono
        valence
        parallels
    ''',
    # per repo: the tasks (notebooks) to run;
    # `omit` holds the versions for which a task is skipped
    repoConfig=dict(
        bhsa=(
            dict(
                task='coreData',
            ),
            dict(
                task='bookNames',
                omit=set(),  # `{}` would be an empty dict, not an empty set
            ),
            dict(
                task='lexicon',
                omit=set(),
            ),
            dict(
                task='paragraphs',
                omit={'4', '4b'},
            ),
            dict(
                task='ketivQere',
                omit={'4', '4b'},
            ),
            dict(
                task='stats',
                omit={'4', '4b'},
            ),
        ),
        phono=(
            dict(
                task='phono',
                omit={'4', '4b'},
            ),
        ),
        valence=(
            dict(
                task='enrich',
                omit=set(),
            ),
            dict(
                task='flowchart',
                omit=set(),
            ),
        ),
        parallels=(
            dict(
                task='parallels',
                omit=set(),
                params=dict(
                    FORCE_MATRIX=False,
                ),
            ),
        ),
    ),
)
```
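To see how the `omit` sets play out per version, you can use a small helper that walks `repoOrder` and filters each repo's tasks. This is a hypothetical illustration of how the configuration reads, not the implementation of `runPipeline`. The name `planned_tasks` is made up.

```python
def planned_tasks(pipeline, version):
    """List the (repo, task) pairs that would run for `version`.

    Repos are visited in `repoOrder`; within a repo, tasks keep their
    listed order. A task is skipped when `version` is in its `omit`
    collection, and a task without `omit` (like coreData) always runs.
    """
    plan = []
    for repo in pipeline["repoOrder"].split():
        for spec in pipeline["repoConfig"].get(repo, ()):
            if version in spec.get("omit", ()):
                continue
            plan.append((repo, spec["task"]))
    return plan
```

Applied to the configuration above, `planned_tasks(pipeline, '4')` yields six tasks: `coreData`, `bookNames` and `lexicon` for `bhsa`, both `valence` tasks, and `parallels`. The `phono` task and the remaining `bhsa` tasks are skipped. For version `'c'`, all ten tasks are listed.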

# Run the pipeline

```python
good = runPipeline(pipeline, version='c', force=False)
```


```
##############################################################################################
#                                                                                            #
#       0.00s Make version [c]                                                               #
#                                                                                            #
##############################################################################################


**********************************************************************************************
*                                                                                            *
*       0.00s Make repo [bhsa]                                                               *
*                                                                                            *
**********************************************************************************************


---------------------------------------------

|       9.00s START enrich (CORE_NAME=bhsa, VERSION=c)
|       9.01s 	Destination /Users/dirk/github/etcbc/valence/tf/c/.tf/valence.tfx exists
True False
|       9.01s SUCCESS enrich

----------------------------------------------------------------------------------------------
-       9.01s SUCCES [valence/enrich]                                                        -
----------------------------------------------------------------------------------------------


----------------------------------------------------------------------------------------------
-       9.02s Run notebook [valence/flowchart]                                               -
----------------------------------------------------------------------------------------------

|         10s START flowchart (CORE_NAME=bhsa, VERSION=c)
|         10s 	Destination /Users/dirk/github/etcbc/valence/tf/c/.tf/sense.tfx exists
|         10s SUCCESS flowchart

------------------------------------------------------------------
```