<img align="right" src="tf-small.png"/>

# Text-Fabric from ETCBC

This notebook assembles the data from the ETCBC that is needed
to compile its datasets in Text-Fabric format on Github.
Ultimately, the data for the website [SHEBANQ](https://shebanq.ancient-data.org) will be
derived from these TF sources.

## Pipeline

A run of the pipeline produces a data *version*.
Run it whenever new or updated data sources are present that affect the output data.
Since all input data is delivered in Github repositories, we can rely on Git
to keep track of versions.
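Whether a source is "new or updated" can be decided in a make-like fashion. The following is a minimal sketch that assumes the decision compares file modification times. `needs_rebuild` is a hypothetical helper; its `force` flag only mirrors the `force` argument passed to `runPipeline` further down, and the actual logic there may differ:

```python
from pathlib import Path
from typing import Iterable


def needs_rebuild(destination: Path, sources: Iterable[Path], force: bool = False) -> bool:
    """Return True if `destination` is missing or older than any of `sources`.

    Hypothetical helper: a make-style staleness check based on mtimes.
    """
    if force:
        return True  # caller insists on rebuilding
    if not destination.exists():
        return True  # nothing built yet
    dest_mtime = destination.stat().st_mtime
    # Any source modified after the destination makes the destination stale.
    return any(src.stat().st_mtime > dest_mtime for src in sources)
```

Comparing modification times is cheap. However, Git does not preserve mtimes on clone or checkout, so a content hash of the sources would be a more robust criterion.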

The pipeline works by executing a series of programs contained in Github repositories:
for each repository in the pipeline, a series of notebooks is executed.
See [script mode](https://github.com/ETCBC/pipeline/blob/master/README.md#operation) for
details on how we call notebooks.

All this is specified in the configuration below.

### Core data

The core data is delivered by the ETCBC as `bhsa.mql.bz2` in
the `source` directory of the Github repo [bhsa](https://github.com/ETCBC/bhsa).

This data is converted by `tfFromMQL` in the `programs` directory.

The result of this conversion is an updated TF resource in the repo's
`tf/core` directory.
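`tfFromMQL` starts from this compressed MQL dump. The following is a minimal sketch of reading it, not the actual code of `tfFromMQL`. Python's standard `bz2` module can decompress the file on the fly, line by line, without writing an uncompressed copy to disk. Both the UTF-8 encoding and the local path `~/github/etcbc/bhsa/source/bhsa.mql.bz2` are assumptions:

```python
import bz2
from pathlib import Path
from typing import Iterator


def iter_mql_lines(path: Path) -> Iterator[str]:
    """Yield the decompressed lines of a bz2-compressed MQL file, one at a time."""
    # Mode "rt" decompresses on the fly and decodes to text (encoding assumed UTF-8).
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            yield line.rstrip("\n")


if __name__ == "__main__":
    # Hypothetical location of a local clone of the bhsa repo.
    source = Path.home() / "github" / "etcbc" / "bhsa" / "source" / "bhsa.mql.bz2"
    if source.exists():
        for i, line in enumerate(iter_mql_lines(source)):
            print(line)
            if i >= 4:  # show only the first five lines
                break
```

Streaming keeps memory use flat regardless of the size of the dump, which matters for a file covering the whole Hebrew Bible.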

### Additional data

Researchers have contributed additional data to the dataset,
but not all of it is in the core.
It typically lives in the repository where the research was
carried out and where the data is documented.

Before the pipeline starts, these repos must be pulled.
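Pulling can be scripted. This is a minimal sketch that assumes two things: all repositories are cloned side by side under `~/github/etcbc`, the layout suggested by the paths in the log output below, and a plain `git pull` is enough. `pull_commands` and `pull_all` are hypothetical helpers, not part of the `pipeline` package:

```python
import subprocess
from pathlib import Path
from typing import List


def pull_commands(repos: List[str], base: Path) -> List[List[str]]:
    """Build one `git -C <repo dir> pull` command per repository."""
    return [["git", "-C", str(base / repo), "pull"] for repo in repos]


def pull_all(repos: List[str], base: Path, dry_run: bool = True) -> None:
    """Pull every repository; with dry_run, only print the commands.

    check=True makes subprocess.run raise on the first failing pull.
    """
    for cmd in pull_commands(repos, base):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical layout: all repos cloned under ~/github/etcbc
    repos = "bhsa phono valence parallels".split()
    pull_all(repos, Path.home() / "github" / "etcbc", dry_run=True)
```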

For each of those repositories,
this notebook calls a series of other notebooks.

Before these notebooks can be run, they must be converted to Python
programs. They are then called as such, with parameters injected as local variables.
One of these parameters is `SCRIPT=True`, so that
a notebook can adapt its actions to the fact that it is part of the pipeline.
These notebooks can also be run interactively; then you can add extra actions that are not relevant to the pipeline conversion, such as testing, experimenting, and visualizing.
Take care to wrap such non-essential code in a condition that only holds when
`SCRIPT=False`, so that it is skipped in script mode.
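The pipeline's own conversion is described under script mode, linked above. The following stdlib-only sketch illustrates the idea but is not the pipeline's code. It collects the code cells of the `.ipynb` JSON file into one program and `exec`s it in a fresh namespace that already contains the injected parameters. Because that single namespace serves as both globals and locals, the test `'SCRIPT' not in locals()` in the config cell below sees an injected `SCRIPT`. `notebook_to_source` and `run_notebook` are hypothetical names:

```python
import json
from pathlib import Path
from typing import Any, Dict


def notebook_to_source(path: Path) -> str:
    """Concatenate the code cells of an .ipynb file into one Python program."""
    nb = json.loads(path.read_text(encoding="utf-8"))
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        src = cell.get("source", "")
        # nbformat v4 stores source either as one string or as a list of lines.
        chunks.append(src if isinstance(src, str) else "".join(src))
    return "\n\n".join(chunks)


def run_notebook(path: Path, **params: Any) -> Dict[str, Any]:
    """Execute a notebook as a script, injecting params (e.g. SCRIPT=True) as variables."""
    namespace: Dict[str, Any] = dict(params)
    exec(compile(notebook_to_source(path), str(path), "exec"), namespace)
    return namespace  # lets the caller inspect what the notebook defined
```

IPython magics such as `%matplotlib inline` are not valid Python. A real converter such as `jupyter nbconvert --to script` handles them, but this sketch does not.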

This notebook itself can also be run in script mode.

In [1]:
import os
import sys
import collections
from pipeline import runPipeline
from tf.fabric import Fabric

# Config

In [2]:
CORE_NAME = 'bhsa'
CORE_MODULE = 'core'

if 'SCRIPT' not in locals(): 
    SCRIPT = False
    DEFAULT_CORE_NAME = CORE_NAME
    DEFAULT_VERSION = 'c'

In [3]:
pipeline = dict(
    defaults=dict(
        CORE_NAME=CORE_NAME,
        VERSION=DEFAULT_VERSION,
        CORE_MODULE=CORE_MODULE,
    ),
    versions={
        '4': dict(),
        '4b': dict(),
        'c': dict(),
        'd': dict(),
        '2017': dict(),
    },
    repoOrder='''
        bhsa
        phono
        valence
        parallels
    ''',
    repoConfig=dict(
        bhsa=(
            dict(
                task='tfFromMQL',
            ),
            dict(
                task='lexicon',
                omit={},
            ),
            dict(
                task='paragraphs',
                omit={'4', '4b'},
            ),
            dict(
                task='ketivQere',
                omit={'4', '4b'},
            ),
            dict(
                task='addStats',
                omit={'4', '4b'},
            ),
        ),
        phono=(
            dict(
                task='phono',
                omit={'4', '4b'},
            ),
        ),
        valence=(
            dict(
                task='enrich',
                omit={},
            ),
            dict(
                task='flowchart',
                omit={},
            ),
        ),
        parallels=(
            dict(
                task='parallels',
                omit={},
                params=dict(
                    FORCE_MATRIX=False,
                ),
            ),
        ),
    ),
)
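The following sketch shows one way this configuration might be interpreted. `plan` is a hypothetical helper, not `runPipeline` itself, and it rests on four assumptions:

- `repoOrder` is split on whitespace.
- A task is skipped when the version appears in its `omit` set.
- The variables injected into a task start from `defaults`, which the version's entry in `versions` overrides, then `VERSION` itself, then the task's `params`.
- The trimmed `example` config reuses names from the configuration above but is illustrative only.

The variables printed for `parallels` in the log output below are consistent with this reading.

```python
from typing import Any, Dict, List, Tuple


def plan(pipeline: Dict[str, Any], version: str) -> List[Tuple[str, str, Dict[str, Any]]]:
    """List (repo, task, injected variables) in execution order for one version."""
    if version not in pipeline["versions"]:
        raise ValueError(f"unknown version {version!r}")
    # Defaults, then per-version overrides, then the version itself.
    base = {**pipeline["defaults"], **pipeline["versions"][version], "VERSION": version}
    steps = []
    for repo in pipeline["repoOrder"].split():
        for step in pipeline["repoConfig"].get(repo, ()):
            if version in step.get("omit", set()):
                continue  # this task is skipped for this version
            steps.append((repo, step["task"], {**base, **step.get("params", {})}))
    return steps


# Trimmed, illustrative copy of the configuration above.
example = dict(
    defaults=dict(CORE_NAME="bhsa", VERSION="c", CORE_MODULE="core"),
    versions={"4": dict(), "c": dict()},
    repoOrder="bhsa phono parallels",
    repoConfig=dict(
        bhsa=(dict(task="tfFromMQL"), dict(task="paragraphs", omit={"4", "4b"})),
        phono=(dict(task="phono", omit={"4", "4b"}),),
        parallels=(dict(task="parallels", omit={}, params=dict(FORCE_MATRIX=False)),),
    ),
)

for repo, task, params in plan(example, "4"):
    print(repo, task, params)
```

Note that `omit={}` in the configuration above is an empty dict rather than an empty set; for the membership test `version in omit` this makes no difference.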

# Run the pipeline

In [4]:
good = runPipeline(pipeline, version='4', force=False)


##############################################################################################
#                                                                                            #
#       0.00s Make version [4]                                                               #
#                                                                                            #
##############################################################################################


**********************************************************************************************
*                                                                                            *
*       0.00s Make repo [bhsa]                                                               *
*                                                                                            *
**********************************************************************************************


---------------------------------------------

|         12s START parallels (CORE_MODULE=core, CORE_NAME=bhsa, FORCE_MATRIX=False, VERSION=4)
|         12s 	Destination /Users/dirk/github/etcbc/parallels/tf/4/parallels/.tf/crossref.tfx does not exist
..............................................................................................
.         12s Load the existing TF dataset                                                   .
..............................................................................................
This is Text-Fabric 2.3.15
Api reference : https://github.com/ETCBC/text-fabric/wiki/Api
Tutorial      : https://github.com/ETCBC/text-fabric/blob/master/docs/tutorial.ipynb
Data sources  : https://github.com/ETCBC/text-fabric-data
Data docs     : https://etcbc.github.io/text-fabric-data
Shebanq docs  : https://shebanq.ancient-data.org/text
Slack team    : https://shebanq.slack.com/signup
Questions? Ask shebanq@ancient-data.org for an invite to Slack
103 features found and 0 ignored
  0.00s loading featur