<img align="right" src="tf-small.png"/>

# Booknames (multilingual)

This notebook adds multilingual book names to a
[BHSA](https://github.com/ETCBC/bhsa) dataset in
[Text-Fabric](https://github.com/ETCBC/text-fabric)
format.

## Discussion

We add the features
`book@`*iso*
where *iso* is a
[two-letter ISO-639](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes)
language code (the run below also includes the three-letter code `syc`).
We use a source file `blang.py` that contains the names of the books of the Bible
in a range of languages (26 in the run below; most widely spoken languages are covered).
These names have been gleaned mostly from Wikipedia.

We assume that the dataset already has the `book` feature, holding *Latin* book names.

This program works for every dataset and version in which the `book` feature
has that meaning.
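
The shape of the data in `blang.py` can be inferred from how the cells below use it: `bookLangs` maps a language code to an (English name, native name) pair, and `bookNames` maps a language code to a sequence of book names. The following is a minimal sketch under that assumption. The two sample entries are taken from the output at the end of this notebook, and `feature_name` is a hypothetical helper that is not part of `blang.py`.

```python
# Hypothetical miniature of blang.py: the real file covers more
# languages and all books, not just the first one.
bookLangs = {
    'en': ('english', 'English'),
    'el': ('greek', 'Ελληνικά'),
}
bookNames = {
    'en': ('Genesis',),
    'el': ('Γένεση',),
}


def feature_name(lang_code):
    """Build the TF feature name for a language code, e.g. 'book@en'."""
    return 'book@{}'.format(lang_code)


features = sorted(feature_name(code) for code in bookLangs)
print(features)
```

Keeping the language code in the feature name means that each language ends up in its own feature, so a language can be added or dropped without touching the others.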

In [1]:
import os
import sys
import re
import collections
import utils
from tf.fabric import Fabric
from blang import bookLangs, bookNames

# Pipeline
See [operation](https://github.com/ETCBC/pipeline/blob/master/README.md#operation) 
for how to run this script in the pipeline.

In [2]:
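# The pipeline is expected to define SCRIPT and the other settings beforehand;
# the defaults below apply when the notebook is run interactively.
if 'SCRIPT' not in locals():
    SCRIPT = False
    FORCE = True
    CORE_NAME = 'bhsa'
    VERSION = 'c'

def stop(good=False):
    # Exit only in script mode: status 0 on success, 1 on failure.
    if SCRIPT:
        sys.exit(0 if good else 1)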

# Setting up the context: source file and target directories

The conversion is executed in an environment of directories, so that sources, temp files and
results are in convenient places and do not have to be shifted around.

In [3]:
repoBase = os.path.expanduser('~/github/etcbc')
thisRepo = '{}/{}'.format(repoBase, CORE_NAME)

thisTemp = '{}/_temp/{}'.format(thisRepo, VERSION)
thisTempTf = '{}/tf'.format(thisTemp)

thisTf = '{}/tf/{}'.format(thisRepo, VERSION)
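
To make the resulting layout easier to see, here is a small sketch that builds the same four directories with `os.path.join` and prints them. The function `layout` and the base path `/home/me/github/etcbc` are hypothetical and serve only as illustration.

```python
import os


def layout(repo_base, core_name, version):
    """Return the directories used by the conversion, keyed by role."""
    this_repo = os.path.join(repo_base, core_name)
    this_temp = os.path.join(this_repo, '_temp', version)
    return {
        'repo': this_repo,
        'temp': this_temp,
        'tempTf': os.path.join(this_temp, 'tf'),       # freshly generated features
        'tf': os.path.join(this_repo, 'tf', version),  # delivered features
    }


dirs = layout('/home/me/github/etcbc', 'bhsa', 'c')
for role, path in dirs.items():
    print('{:<7} {}'.format(role, path))
```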

# Collect

We collect the languages and build the metadata of a `book@`*iso* feature for each of them.

In [5]:
utils.caption(4, 'Book names')

metaData = {}

for (langCode, (langEnglish, langName)) in bookLangs.items():
    metaData['book@{}'.format(langCode)] = {
        'valueType': 'str',
        'language': langName,
        'languageCode': langCode,
        'languageEnglish': langEnglish,
    }

newFeatures = sorted(metaData)
newFeaturesStr = ' '.join(newFeatures)

utils.caption(0, '{} languages ...'.format(len(newFeatures)))

..............................................................................................
.      2m 08s Book names                                                                     .
..............................................................................................
|      2m 08s 26 languages ...


# Test

Check whether this conversion is needed in the first place.
This check is performed only when the notebook runs as a script.

In [6]:
if SCRIPT:
    (good, work) = utils.mustRun(None, '{}/.tf/{}.tfx'.format(thisTf, newFeatures[0]), force=FORCE)
    if not good: stop(good=False)
    if not work: stop(good=True)
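
The implementation of `utils.mustRun` is not shown in this notebook. As an illustration of the general idea only, the sketch below decides whether a target file has to be rebuilt by looking at its existence and modification time. The function `needs_rebuild` is hypothetical; unlike `utils.mustRun` it returns a single boolean rather than a `(good, work)` pair.

```python
import os
import tempfile


def needs_rebuild(src, dst, force=False):
    """Hypothetical freshness check, not the real utils.mustRun.

    Returns True when dst is missing, when force is set,
    or when src exists and is newer than dst.
    """
    if force or not os.path.exists(dst):
        return True
    if src is None or not os.path.exists(src):
        return False
    return os.path.getmtime(src) > os.path.getmtime(dst)


with tempfile.TemporaryDirectory() as tmp:
    dst = os.path.join(tmp, 'book@am.tfx')
    missing = needs_rebuild(None, dst)             # target does not exist yet
    open(dst, 'w').close()
    fresh = needs_rebuild(None, dst)               # target exists, nothing to compare
    forced = needs_rebuild(None, dst, force=True)  # force overrides the check

print(missing, fresh, forced)
```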

# Load existing data

In [9]:
utils.caption(4, 'Loading relevant features')

TF = Fabric(locations=thisTf, modules=[''])
api = TF.load('book')
api.makeAvailableIn(globals())

nodeFeatures = {}
nodeFeatures['book@la'] = {}

bookNodes = []
for b in F.otype.s('book'):
    bookNodes.append(b)
    nodeFeatures['book@la'][b] = F.book.v(b)

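# Pair the names of each language with the book nodes;
# this relies on the names being listed in the same order as the nodes.
for (langCode, langBookNames) in bookNames.items():
    nodeFeatures['book@{}'.format(langCode)] = dict(zip(bookNodes, langBookNames))
utils.caption(0, '{} book name features created'.format(len(nodeFeatures)))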

..............................................................................................
.      4m 11s Loading relevant features                                                      .
..............................................................................................
This is Text-Fabric 2.3.15
Api reference : https://github.com/ETCBC/text-fabric/wiki/Api
Tutorial      : https://github.com/ETCBC/text-fabric/blob/master/docs/tutorial.ipynb
Data sources  : https://github.com/ETCBC/text-fabric-data

69 features found and 0 ignored
  0.00s loading features ...
   |     0.01s B book                 from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s Feature overview: 64 for nodes; 4 for edges; 1 configs; 7 computed
  5.08s All features loaded/computed - for details use loadLog()
|      4m 16s 26 book name features created
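
The cell above pairs names with nodes by means of `zip`, which silently stops at the shorter of its two inputs. A language with a missing book name would therefore go unnoticed. The sketch below shows one possible guard; `pair_names` is a hypothetical helper and the node numbers are made up, since real book nodes come from `F.otype.s('book')`.

```python
def pair_names(book_nodes, names):
    """Map each book node to its name, refusing mismatched lengths."""
    if len(book_nodes) != len(names):
        raise ValueError(
            'expected {} names, got {}'.format(len(book_nodes), len(names))
        )
    return dict(zip(book_nodes, names))


# Hypothetical node numbers, for illustration only.
nodes = [101, 102]
mapping = pair_names(nodes, ['Genesis', 'Exodus'])
print(mapping)
```

With such a check, an incomplete list of names raises an error at conversion time instead of producing a feature that lacks values for the last books.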


# Write new features

In [10]:
utils.caption(4, 'Write book name features as TF')
TF = Fabric(locations=thisTempTf, silent=True)
TF.save(nodeFeatures=nodeFeatures, edgeFeatures={}, metaData=metaData)

..............................................................................................
.      4m 34s Write book name features as TF                                                 .
..............................................................................................
   |     0.00s T book@am              to /Users/dirk/github/etcbc/bhsa/_temp/c/tf
   |     0.00s T book@ar              to /Users/dirk/github/etcbc/bhsa/_temp/c/tf
   |     0.00s T book@bn              to /Users/dirk/github/etcbc/bhsa/_temp/c/tf
   |     0.00s T book@da              to /Users/dirk/github/etcbc/bhsa/_temp/c/tf
   |     0.00s T book@de              to /Users/dirk/github/etcbc/bhsa/_temp/c/tf
   |     0.00s T book@el              to /Users/dirk/github/etcbc/bhsa/_temp/c/tf
   |     0.00s T book@en              to /Users/dirk/github/etcbc/bhsa/_temp/c/tf
   |     0.00s T book@es              to /Users/dirk/github/etcbc/bhsa/_temp/c/tf
   |     0.00s T book@fa              to /Users/dirk/github

# Diffs

Check differences with the previous version.

In [11]:
utils.checkDiffs(thisTempTf, thisTf, only=set(newFeatures))

..............................................................................................
.      4m 40s Check differences with previous version                                        .
..............................................................................................
|      4m 40s 	26 features to add
|      4m 40s 		book@am
|      4m 40s 		book@ar
|      4m 40s 		book@bn
|      4m 40s 		book@da
|      4m 40s 		book@de
|      4m 40s 		book@el
|      4m 40s 		book@en
|      4m 40s 		book@es
|      4m 40s 		book@fa
|      4m 40s 		book@fr
|      4m 40s 		book@he
|      4m 40s 		book@hi
|      4m 40s 		book@id
|      4m 40s 		book@ja
|      4m 40s 		book@ko
|      4m 40s 		book@la
|      4m 40s 		book@nl
|      4m 40s 		book@pa
|      4m 40s 		book@pt
|      4m 40s 		book@ru
|      4m 40s 		book@sw
|      4m 40s 		book@syc
|      4m 40s 		book@tr
|      4m 40s 		book@ur
|      4m 40s 		book@yo
|      4m 40s 		book@zh
|      4m 40s 	no features to delete
|      4m 40s 	0 fe
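
The report above distinguishes features to add from features to delete. How `utils.checkDiffs` computes this is not shown here; the sketch below illustrates the idea with plain set arithmetic on feature names. The function `diff_features` is hypothetical, and its `only` parameter is merely assumed to play the same role as the `only` argument in the call above.

```python
def diff_features(new_names, existing_names, only=None):
    """Classify feature names as to add, to delete, or present on both sides."""
    new, old = set(new_names), set(existing_names)
    if only is not None:
        # Restrict the comparison to the features of interest.
        new &= set(only)
        old &= set(only)
    return {
        'add': sorted(new - old),
        'delete': sorted(old - new),
        'common': sorted(new & old),
    }


result = diff_features(
    ['book@am', 'book@ar'],  # freshly generated
    ['book', 'book@ar'],     # already delivered
    only={'book@am', 'book@ar'},
)
print(result)
```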

# Deliver 

Copy the new TF features from the temporary location where they have been created to their final destination.

In [12]:
utils.deliverFeatures(thisTempTf, thisTf, newFeatures)

..............................................................................................
.      4m 46s Deliver features to /Users/dirk/github/etcbc/bhsa/tf/c                         .
..............................................................................................
|      4m 46s 	book@am
|      4m 46s 	book@ar
|      4m 46s 	book@bn
|      4m 46s 	book@da
|      4m 46s 	book@de
|      4m 46s 	book@el
|      4m 46s 	book@en
|      4m 46s 	book@es
|      4m 46s 	book@fa
|      4m 46s 	book@fr
|      4m 46s 	book@he
|      4m 46s 	book@hi
|      4m 46s 	book@id
|      4m 46s 	book@ja
|      4m 46s 	book@ko
|      4m 46s 	book@la
|      4m 46s 	book@nl
|      4m 46s 	book@pa
|      4m 46s 	book@pt
|      4m 46s 	book@ru
|      4m 46s 	book@sw
|      4m 46s 	book@syc
|      4m 46s 	book@tr
|      4m 46s 	book@ur
|      4m 46s 	book@yo
|      4m 46s 	book@zh
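
Text-Fabric stores each feature in a file named after the feature with the extension `.tf`, so delivery amounts to copying files. The sketch below is an illustration under that assumption, not the code of `utils.deliverFeatures`; the function `deliver` is hypothetical and works on a throwaway directory.

```python
import os
import shutil
import tempfile


def deliver(src_dir, dst_dir, features):
    """Copy <feature>.tf files from src_dir to dst_dir; return the copied names."""
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for feature in features:
        src = os.path.join(src_dir, feature + '.tf')
        if os.path.exists(src):
            shutil.copy(src, os.path.join(dst_dir, feature + '.tf'))
            copied.append(feature)
    return copied


with tempfile.TemporaryDirectory() as tmp:
    src_dir = os.path.join(tmp, '_temp', 'tf')
    dst_dir = os.path.join(tmp, 'tf')
    os.makedirs(src_dir)
    with open(os.path.join(src_dir, 'book@en.tf'), 'w') as fh:
        fh.write('placeholder\n')
    # 'book@nl' has no source file here, so it is skipped.
    copied = deliver(src_dir, dst_dir, ['book@en', 'book@nl'])
    delivered = sorted(os.listdir(dst_dir))

print(copied, delivered)
```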


# Compile TF

In [14]:
utils.caption(4, 'Load and compile the new TF features')

TF = Fabric(locations=thisTf, modules=[''])
api = TF.load('')
api.makeAvailableIn(globals())

..............................................................................................
.      5m 08s Load and compile the new TF features                                           .
..............................................................................................
This is Text-Fabric 2.3.15
Api reference : https://github.com/ETCBC/text-fabric/wiki/Api
Tutorial      : https://github.com/ETCBC/text-fabric/blob/master/docs/tutorial.ipynb
Data sources  : https://github.com/ETCBC/text-fabric-data

95 features found and 0 ignored
  0.00s loading features ...
   |     0.00s T book@am              from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s T book@ar              from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s T book@bn              from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s T book@da              from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s T book@de              from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s T book@el       

# Examples

In [22]:
utils.caption(4, 'Genesis in all languages')
genesisNode = F.otype.s('book')[0]

for (lang, langInfo) in sorted(T.languages.items()):
    language = langInfo['language']
    langEng = langInfo['languageEnglish']
    book = T.sectionFromNode(genesisNode, lang=lang)[0]
    utils.caption(0, '{:<2} = {:<20} Genesis is {:<20} in {:<20}'.format(lang, langEng, book, language))

utils.caption(0, 'Done')

..............................................................................................
.     18m 44s Genesis in all languages                                                       .
..............................................................................................
|     18m 44s am = amharic              Genesis is ኦሪት_ዘፍጥረት            in ኣማርኛ                
|     18m 44s ar = arabic               Genesis is تكوين                in العَرَبِية          
|     18m 44s bn = bengali              Genesis is আদিপুস্তক            in বাংলা               
|     18m 44s da = danish               Genesis is 1.Mosebog            in Dansk               
|     18m 44s de = german               Genesis is Genesis              in Deutsch             
|     18m 44s el = greek                Genesis is Γένεση               in Ελληνικά            
|     18m 44s en = english              Genesis is Genesis              in English             
|     18m 44s es = spanish              Gen