<img align="right" src="images/dans-small.png"/>
<img align="right" src="images/tf-small.png"/>
<img align="right" src="images/etcbc.png"/>


# Booknames (multilingual)

This notebook adds multilingual book names to a 
[BHSA](https://github.com/ETCBC/bhsa) dataset in
[Text-Fabric](https://github.com/Dans-labs/text-fabric)
format.

## Discussion

We add the features
`book@`*iso*
where *iso* is an
[ISO-639](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes)
language code, usually the two-letter one.
The source file `blang.py` contains the names of the books of the Bible
in around 25 languages (most major languages are covered).
Most of this data has been gleaned from Wikipedia.

We assume that the dataset has the `book` feature present, holding *Latin* book names.

This program works for all datasets and versions that have this feature with the
intended meaning.
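
The code further down reads two names from `blang.py`: `bookLangs` and `bookNames`. A minimal sketch of the shape it appears to expect is given here, assuming `bookLangs` maps a language code to a pair (English name, native name) and `bookNames` maps a language code to a sequence of book names. The entries are a hypothetical excerpt cut down to the first book only; the real file covers all books and many more languages.

```python
# Hypothetical excerpt: the shape that the notebook code expects from blang.py.
# Each list is cut down to the first book (Genesis) for illustration.
bookLangs = {
    'en': ('english', 'English'),
    'da': ('danish', 'Dansk'),
}
bookNames = {
    'en': ['Genesis'],
    'da': ['1.Mosebog'],
}

# Feature names are formed as book@<iso>, one per language code.
featureNames = sorted('book@{}'.format(code) for code in bookLangs)
print(featureNames)
```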

In [12]:
import os, sys, re, collections
import utils
from tf.fabric import Fabric
from blang import bookLangs, bookNames

# Pipeline
See [operation](https://github.com/ETCBC/pipeline/blob/master/README.md#operation) 
for how to run this script in the pipeline.

In [13]:
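if 'SCRIPT' not in locals():
    # Defaults for interactive (notebook) use;
    # when run as a script, these names are expected to be set already
    SCRIPT = False
    FORCE = True
    CORE_NAME = 'bhsa'
    VERSION = 'c'

def stop(good=False):
    # Exit only when run as a script: status 0 on success, 1 on failure
    if SCRIPT:
        sys.exit(0 if good else 1)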

# Setting up the context: source file and target directories

The conversion is executed in an environment of directories, so that sources, temp files and
results are in convenient places and do not have to be shifted around.

In [14]:
repoBase = os.path.expanduser('~/github/etcbc')
thisRepo = '{}/{}'.format(repoBase, CORE_NAME)

thisTemp = '{}/_temp/{}'.format(thisRepo, VERSION)
thisTempTf = '{}/tf'.format(thisTemp)

thisTf = '{}/tf/{}'.format(thisRepo, VERSION)
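
To make the resulting layout concrete, the same path logic can be wrapped in a small function and printed. This is only a sketch: `buildPaths` is a hypothetical helper that is not part of the pipeline, and the base directory `/data/etcbc` is an arbitrary example value.

```python
import os

def buildPaths(coreName, version, repoBase='~/github/etcbc'):
    # Hypothetical helper: mirrors the path construction of the cell above.
    base = os.path.expanduser(repoBase)
    thisRepo = '{}/{}'.format(base, coreName)
    thisTemp = '{}/_temp/{}'.format(thisRepo, version)
    return {
        'thisRepo': thisRepo,
        'thisTempTf': '{}/tf'.format(thisTemp),
        'thisTf': '{}/tf/{}'.format(thisRepo, version),
    }

# Example base directory, chosen only for illustration.
paths = buildPaths('bhsa', 'c', repoBase='/data/etcbc')
for (name, path) in sorted(paths.items()):
    print('{:<10} = {}'.format(name, path))
```

New features are first written under `thisTempTf` and only later copied to `thisTf`, so a failed run does not touch the delivered data.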

# Collect

We collect the book names and build the metadata for the new features.

In [15]:
utils.caption(4, 'Book names')

metaData = {
    '': dict(
        dataset='BHSA',
        version=VERSION,
        datasetName='Biblia Hebraica Stuttgartensia Amstelodamensis',
        author='Eep Talstra Centre for Bible and Computer',
        provenance='book names from wikipedia and other sources',
        encoders='Dirk Roorda (TF)',
        website='https://shebanq.ancient-data.org',
        email='shebanq@ancient-data.org',
    ),
}

for (langCode, (langEnglish, langName)) in bookLangs.items():
    metaData['book@{}'.format(langCode)] = {
        'valueType': 'str',
        'language': langName,
        'languageCode': langCode,
        'languageEnglish': langEnglish,
    }

newFeatures = sorted(m for m in metaData if m != '')
newFeaturesStr = ' '.join(newFeatures)

utils.caption(0, '{} languages ...'.format(len(newFeatures)))

..............................................................................................
.     10m 29s Book names                                                                     .
..............................................................................................
|     10m 29s 26 languages ...


# Test

Check whether this conversion is needed in the first place.
The check is performed only when the notebook runs as a script.

In [16]:
if SCRIPT:
    (good, work) = utils.mustRun(None, '{}/.tf/{}.tfx'.format(thisTf, newFeatures[0]), force=FORCE)
    if not good: stop(good=False)
    if not work: stop(good=True)
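
`utils.mustRun` belongs to the pipeline's own `utils` module and is not shown here. Judging only from how its result is used above, it returns a pair: `good` (no error occurred) and `work` (there is something to do). The sketch below is an assumption about how such a check could work, not the actual implementation; the name `mustRunSketch` and the rule of comparing modification times are hypothetical.

```python
import os

def mustRunSketch(source, dest, force=False):
    # Hypothetical stand-in for utils.mustRun; returns (good, work).
    # A source that is given but missing counts as an error.
    if source is not None and not os.path.exists(source):
        return (False, False)
    # Forced runs and missing destinations always require work.
    if force or not os.path.exists(dest):
        return (True, True)
    # Otherwise work is needed only if the source is newer than the destination.
    if source is not None and os.path.getmtime(source) > os.path.getmtime(dest):
        return (True, True)
    return (True, False)
```

With `source=None`, as in the cell above, the decision rests only on `force` and on whether the destination file already exists.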

# Load existing data

In [17]:
utils.caption(4, 'Loading relevant features')

TF = Fabric(locations=thisTf, modules=[''])
api = TF.load('book')
api.makeAvailableIn(globals())

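nodeFeatures = {}
nodeFeatures['book@la'] = {}

# The existing `book` feature holds the Latin names: copy them to book@la
bookNodes = []
for b in F.otype.s('book'):
    bookNodes.append(b)
    nodeFeatures['book@la'][b] = F.book.v(b)

# Pair each book node with its name, per language;
# this assumes every name list follows the order of the book nodes
for (langCode, langBookNames) in bookNames.items():
    nodeFeatures['book@{}'.format(langCode)] = dict(zip(bookNodes, langBookNames))
utils.caption(0, '{} book name features created'.format(len(nodeFeatures)))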

..............................................................................................
.     10m 29s Loading relevant features                                                      .
..............................................................................................
This is Text-Fabric 3.0.2
Api reference : https://github.com/Dans-labs/text-fabric/wiki/Api
Tutorial      : https://github.com/Dans-labs/text-fabric/blob/master/docs/tutorial.ipynb
Example data  : https://github.com/Dans-labs/text-fabric-data

69 features found and 0 ignored
  0.00s loading features ...
   |     0.01s B book                 from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s Feature overview: 64 for nodes; 4 for edges; 1 configs; 7 computed
  4.14s All features loaded/computed - for details use loadLog()
|     10m 33s 26 book name features created
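
The loading cell pairs book nodes with names by means of `zip`, which stops silently at the end of the shorter sequence. A language with too few names would therefore leave some books without a name, and no error would be raised. A minimal sketch of a check that could run before the pairing is given here; `mismatchedLanguages` and the toy data are hypothetical and not part of the notebook.

```python
def mismatchedLanguages(bookNodes, bookNames):
    # Return the language codes whose name list does not have
    # exactly one name per book node.
    return sorted(
        code for (code, names) in bookNames.items()
        if len(names) != len(bookNodes)
    )

# Toy data for illustration only: three book nodes, two made-up languages.
toyNodes = [101, 102, 103]
toyNames = {
    'xx': ['A', 'B', 'C'],
    'yy': ['A', 'B'],
}
problems = mismatchedLanguages(toyNodes, toyNames)
print(problems)
```

In the notebook, the call would be `mismatchedLanguages(bookNodes, bookNames)`, and a non-empty result would be a reason to stop before saving.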


# Write new features

In [18]:
utils.caption(4, 'Write book name features as TF')
TF = Fabric(locations=thisTempTf, silent=True)
TF.save(nodeFeatures=nodeFeatures, edgeFeatures={}, metaData=metaData)

..............................................................................................
.     10m 33s Write book name features as TF                                                 .
..............................................................................................
   |     0.00s T book@am              to /Users/dirk/github/etcbc/bhsa/_temp/c/tf
   |     0.00s T book@ar              to /Users/dirk/github/etcbc/bhsa/_temp/c/tf
   |     0.00s T book@bn              to /Users/dirk/github/etcbc/bhsa/_temp/c/tf
   |     0.00s T book@da              to /Users/dirk/github/etcbc/bhsa/_temp/c/tf
   |     0.00s T book@de              to /Users/dirk/github/etcbc/bhsa/_temp/c/tf
   |     0.00s T book@el              to /Users/dirk/github/etcbc/bhsa/_temp/c/tf
   |     0.00s T book@en              to /Users/dirk/github/etcbc/bhsa/_temp/c/tf
   |     0.00s T book@es              to /Users/dirk/github/etcbc/bhsa/_temp/c/tf
   |     0.00s T book@fa              to /Users/dirk/github

# Diffs

Check differences with previous versions.

In [19]:
utils.checkDiffs(thisTempTf, thisTf, only=set(newFeatures))

..............................................................................................
.     10m 33s Check differences with previous version                                        .
..............................................................................................
|     10m 33s 	26 features to add
|     10m 33s 		book@am
|     10m 33s 		book@ar
|     10m 33s 		book@bn
|     10m 33s 		book@da
|     10m 33s 		book@de
|     10m 33s 		book@el
|     10m 33s 		book@en
|     10m 33s 		book@es
|     10m 33s 		book@fa
|     10m 33s 		book@fr
|     10m 33s 		book@he
|     10m 33s 		book@hi
|     10m 33s 		book@id
|     10m 33s 		book@ja
|     10m 33s 		book@ko
|     10m 33s 		book@la
|     10m 33s 		book@nl
|     10m 33s 		book@pa
|     10m 33s 		book@pt
|     10m 33s 		book@ru
|     10m 33s 		book@sw
|     10m 33s 		book@syc
|     10m 33s 		book@tr
|     10m 33s 		book@ur
|     10m 33s 		book@yo
|     10m 33s 		book@zh
|     10m 33s 	no features to delete
|     10m 33s 	0 fe

# Deliver 

Copy the new Text-Fabric features from the temporary location where they have been created to their final destination.

In [20]:
utils.deliverFeatures(thisTempTf, thisTf, newFeatures)

..............................................................................................
.     10m 33s Deliver features to /Users/dirk/github/etcbc/bhsa/tf/c                         .
..............................................................................................
|     10m 33s 	book@am
|     10m 33s 	book@ar
|     10m 33s 	book@bn
|     10m 33s 	book@da
|     10m 33s 	book@de
|     10m 33s 	book@el
|     10m 33s 	book@en
|     10m 33s 	book@es
|     10m 33s 	book@fa
|     10m 33s 	book@fr
|     10m 33s 	book@he
|     10m 33s 	book@hi
|     10m 33s 	book@id
|     10m 33s 	book@ja
|     10m 33s 	book@ko
|     10m 33s 	book@la
|     10m 33s 	book@nl
|     10m 33s 	book@pa
|     10m 33s 	book@pt
|     10m 33s 	book@ru
|     10m 33s 	book@sw
|     10m 33s 	book@syc
|     10m 33s 	book@tr
|     10m 33s 	book@ur
|     10m 33s 	book@yo
|     10m 33s 	book@zh


# Compile TF

In [21]:
utils.caption(4, 'Load and compile the new TF features')

TF = Fabric(locations=thisTf, modules=[''])
api = TF.load('')
api.makeAvailableIn(globals())

..............................................................................................
.     10m 33s Load and compile the new TF features                                           .
..............................................................................................
This is Text-Fabric 3.0.2
Api reference : https://github.com/Dans-labs/text-fabric/wiki/Api
Tutorial      : https://github.com/Dans-labs/text-fabric/blob/master/docs/tutorial.ipynb
Example data  : https://github.com/Dans-labs/text-fabric-data

95 features found and 0 ignored
  0.00s loading features ...
   |     0.00s T book@am              from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s T book@ar              from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s T book@bn              from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s T book@da              from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s T book@de              from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s T boo

# Examples

In [22]:
utils.caption(4, 'Genesis in all languages')
genesisNode = F.otype.s('book')[0]

for (lang, langInfo) in sorted(T.languages.items()):
    language = langInfo['language']
    langEng = langInfo['languageEnglish']
    book = T.sectionFromNode(genesisNode, lang=lang)[0]
    utils.caption(0, '{:<2} = {:<20} Genesis is {:<20} in {:<20}'.format(lang, langEng, book, language))

utils.caption(0, 'Done')

..............................................................................................
.     10m 39s Genesis in all languages                                                       .
..............................................................................................
|     10m 39s am = amharic              Genesis is ኦሪት_ዘፍጥረት            in ኣማርኛ                
|     10m 39s ar = arabic               Genesis is تكوين                in العَرَبِية          
|     10m 39s bn = bengali              Genesis is আদিপুস্তক            in বাংলা               
|     10m 39s da = danish               Genesis is 1.Mosebog            in Dansk               
|     10m 39s de = german               Genesis is Genesis              in Deutsch             
|     10m 39s el = greek                Genesis is Γένεση               in Ελληνικά            
|     10m 39s en = english              Genesis is Genesis              in English             
|     10m 39s es = spanish              Gen