<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc" style="margin-top: 1em;"><ul class="toc-item"><li><span><a href="#Discussion" data-toc-modified-id="Discussion-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Discussion</a></span></li></ul></div>

<img align="right" src="images/dans-small.png"/>
<img align="right" src="images/tf-small.png"/>
<img align="right" src="images/etcbc.png"/>


# Book names (multilingual)

This notebook adds multilingual book names to a
[BHSA](https://github.com/ETCBC/bhsa) dataset in
[Text-Fabric](https://github.com/Dans-labs/text-fabric)
format.

## Discussion

We add the features
`book@`*ISO*
where *ISO* is the
[ISO-639](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes)
code of a language, two letters in most cases (for example `en`), three letters in a few (for example `syc`).
We use a source file `blang.py` that contains the names of the books of the Bible
in these languages. Most major languages are covered; the run shown in this notebook reports 26 languages, Latin included.
This data has been gleaned mostly from Wikipedia.

We assume that the dataset has the `book` feature present, holding *Latin* book names.

This program works for all datasets and versions that have this feature with the
intended meaning.
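
The naming convention can be illustrated with two small helpers. This is only a sketch: `book_feature` and `language_of` are hypothetical names introduced here for illustration; they are not part of Text-Fabric or of this notebook.

```python
from typing import Optional

PREFIX = "book@"


def book_feature(lang_code: str) -> str:
    """Return the feature name that holds book names in the given language."""
    return f"{PREFIX}{lang_code}"


def language_of(feature: str) -> Optional[str]:
    """Return the language code of a book@ISO feature, or None for other features."""
    if feature.startswith(PREFIX) and len(feature) > len(PREFIX):
        return feature[len(PREFIX):]
    return None


print(book_feature("nl"))       # book@nl
print(language_of("book@syc"))  # syc
print(language_of("book"))      # None
```

The plain `book` feature has no language code, which is why `language_of` returns `None` for it.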

In [3]:
import os
import sys
import utils
import yaml
from tf.fabric import Fabric
from blang import bookLangs, bookNames

# Pipeline
See [operation](https://github.com/ETCBC/pipeline/blob/master/README.md#operation)
for how to run this script in the pipeline.

In [4]:
if "SCRIPT" not in locals():
    SCRIPT = False
    FORCE = True
    CORE_NAME = "bhsa"
    VERSION = "2021"


def stop(good=False):
    if SCRIPT:
        sys.exit(0 if good else 1)

# Setting up the context: source file and target directories

The conversion is executed in an environment of directories, so that sources, temp files and
results are in convenient places and do not have to be shifted around.
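
As a sketch of this directory layout, the same paths can be derived with `pathlib`. The helper `layout` and the keys of the dictionary it returns are hypothetical names used only for this illustration; the directory structure itself is the one built in the cell that follows.

```python
from pathlib import Path


def layout(repo_base, core_name, version):
    """Derive the repo, temp and delivery directories for one dataset version."""
    repo = Path(repo_base).expanduser() / core_name
    temp = repo / "_temp" / version
    return {
        "repo": repo,                  # the dataset repository
        "temp": temp,                  # scratch space for this version
        "temp_tf": temp / "tf",        # freshly generated features land here
        "tf": repo / "tf" / version,   # final location of the features
    }


for name, path in layout("~/github/etcbc", "bhsa", "2021").items():
    print(f"{name:<8} {path}")
```

Keeping generated features in a temp directory first makes it possible to compare them with the delivered ones before anything is overwritten.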

In [5]:
repoBase = os.path.expanduser("~/github/etcbc")
thisRepo = "{}/{}".format(repoBase, CORE_NAME)

thisTemp = "{}/_temp/{}".format(thisRepo, VERSION)
thisTempTf = "{}/tf".format(thisTemp)

thisTf = "{}/tf/{}".format(thisRepo, VERSION)

# Collect

We collect the book names and build the metadata for each `book@` feature.

In [8]:
utils.caption(4, "Book names")

genericMetaPath = f"{thisRepo}/yaml/generic.yaml"
with open(genericMetaPath) as fh:
    genericMeta = yaml.load(fh, Loader=yaml.FullLoader)
    genericMeta["version"] = VERSION

metaData = {"": genericMeta}

for (langCode, (langEnglish, langName)) in bookLangs.items():
    metaData["book@{}".format(langCode)] = dict(
        valueType="str",
        language=langName,
        languageCode=langCode,
        languageEnglish=langEnglish,
        provenance="book names from wikipedia and other sources",
        encoders="Dirk Roorda (TF)",
        description=f"✅ book name in {langEnglish} ({langName})",
    )

newFeatures = sorted(m for m in metaData if m != "")
newFeaturesStr = " ".join(newFeatures)

utils.caption(0, "{} languages ...".format(len(newFeatures)))

..............................................................................................
.       0.00s Book names                                                                     .
..............................................................................................
|       0.00s 26 languages ...


# Test

Check whether this conversion is needed in the first place.
This check is only performed when the notebook is run as a script.
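
How `utils.mustRun` reaches its decision is not shown in this notebook. A common way to implement such a check is to compare file modification times. The sketch below illustrates that approach under assumed semantics; `needs_rebuild` is a hypothetical function and may well differ from what `utils.mustRun` actually does.

```python
import os
import tempfile


def needs_rebuild(source, target, force=False):
    """Decide whether `target` has to be (re)built.

    Assumed semantics, not necessarily those of utils.mustRun:
    - force=True always rebuilds;
    - a missing target always rebuilds;
    - with no source to compare against, an existing target is kept;
    - otherwise rebuild only if the source is newer than the target.
    """
    if force or not os.path.exists(target):
        return True
    if source is None:
        return False
    return os.path.getmtime(source) > os.path.getmtime(target)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "book@am.tfx")
    print(needs_rebuild(None, target))              # True: target does not exist yet
    open(target, "w").close()
    print(needs_rebuild(None, target))              # False: target exists, no source given
    print(needs_rebuild(None, target, force=True))  # True: forced
```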

In [10]:
if SCRIPT:
    (good, work) = utils.mustRun(
        None, "{}/.tf/{}.tfx".format(thisTf, newFeatures[0]), force=FORCE
    )
    if not good:
        stop(good=False)
    if not work:
        stop(good=True)

# Load existing data

In [11]:
utils.caption(4, "Loading relevant features")

TF = Fabric(locations=thisTf, modules=[""])
api = TF.load("book")
api.makeAvailableIn(globals())

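# Feature data: feature name -> {book node -> book name}
nodeFeatures = {}
nodeFeatures["book@la"] = {}

# The existing `book` feature holds the Latin names; copy it as book@la
bookNodes = []
for b in F.otype.s("book"):
    bookNodes.append(b)
    nodeFeatures["book@la"][b] = F.book.v(b)

# The names per language are paired with the book nodes by position
for (langCode, langBookNames) in bookNames.items():
    nodeFeatures["book@{}".format(langCode)] = dict(zip(bookNodes, langBookNames))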
utils.caption(0, "{} book name features created".format(len(nodeFeatures)))

..............................................................................................
.      1m 24s Loading relevant features                                                      .
..............................................................................................
This is Text-Fabric 9.1.6
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

75 features found and 0 ignored
  0.00s loading features ...
   |     0.00s Dataset without structure sections in otext:no structure functions in the T-API
    11s All features loaded/computed - for details use TF.isLoaded()
|      1m 35s 26 book name features created
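
The cell above pairs book nodes with names by position using `zip`, which silently stops at the shorter of its inputs. A defensive variant could check the lengths first. The following is a minimal sketch, not the notebook's own code: `names_by_node` is a hypothetical helper, and the node numbers in the example are made up.

```python
def names_by_node(book_nodes, names, lang_code):
    """Map each book node to its name, refusing lists of unequal length."""
    if len(book_nodes) != len(names):
        raise ValueError(
            f"book@{lang_code}: {len(names)} names for {len(book_nodes)} book nodes"
        )
    return dict(zip(book_nodes, names))


# Made-up node numbers, for illustration only
print(names_by_node([101, 102], ["Genesis", "Exodus"], "en"))
# {101: 'Genesis', 102: 'Exodus'}
```

With such a check, a language whose list in `blang.py` misses a book would raise an error instead of leaving the last books without a name.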


# Write new features

In [12]:
utils.caption(4, "Write book name features as TF")
TF = Fabric(locations=thisTempTf, silent=True)
TF.save(nodeFeatures=nodeFeatures, edgeFeatures={}, metaData=metaData)

..............................................................................................
.      1m 39s Write book name features as TF                                                 .
..............................................................................................


True

# Diffs

Check differences with previous versions.
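
The kind of comparison reported below can be sketched as follows. This is an illustration under assumptions: `diff_features` is a hypothetical function that compares feature files byte by byte, and it is not claimed to match what `utils.checkDiffs` does. It relies only on the fact that Text-Fabric stores each feature in a file named after the feature, with extension `.tf`. The file contents in the example are placeholders, not valid feature files.

```python
import tempfile
from pathlib import Path


def diff_features(new_dir, old_dir, only):
    """Classify the features in `only` as to add, to delete, changed or same."""
    new_dir, old_dir = Path(new_dir), Path(old_dir)
    report = {"add": [], "delete": [], "changed": [], "same": []}
    for feature in sorted(only):
        new_file = new_dir / f"{feature}.tf"
        old_file = old_dir / f"{feature}.tf"
        if new_file.exists() and not old_file.exists():
            report["add"].append(feature)
        elif old_file.exists() and not new_file.exists():
            report["delete"].append(feature)
        elif new_file.exists() and old_file.exists():
            same = new_file.read_bytes() == old_file.read_bytes()
            report["same" if same else "changed"].append(feature)
    return report


with tempfile.TemporaryDirectory() as tmp:
    new_dir, old_dir = Path(tmp, "new"), Path(tmp, "old")
    new_dir.mkdir()
    old_dir.mkdir()
    # Placeholder contents, not real TF data
    (new_dir / "book@en.tf").write_text("Genesis\n", encoding="utf8")
    (new_dir / "book@nl.tf").write_text("Genesis\n", encoding="utf8")
    (old_dir / "book@nl.tf").write_text("Genesis\n", encoding="utf8")
    print(diff_features(new_dir, old_dir, {"book@en", "book@nl"}))
```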

In [13]:
utils.checkDiffs(thisTempTf, thisTf, only=set(newFeatures))

..............................................................................................
.      2m 24s Check differences with previous version                                        .
..............................................................................................
|      2m 24s 	26 features to add
|      2m 24s 		book@am
|      2m 24s 		book@ar
|      2m 24s 		book@bn
|      2m 24s 		book@da
|      2m 24s 		book@de
|      2m 24s 		book@el
|      2m 24s 		book@en
|      2m 24s 		book@es
|      2m 24s 		book@fa
|      2m 24s 		book@fr
|      2m 24s 		book@he
|      2m 24s 		book@hi
|      2m 24s 		book@id
|      2m 24s 		book@ja
|      2m 24s 		book@ko
|      2m 24s 		book@la
|      2m 24s 		book@nl
|      2m 24s 		book@pa
|      2m 24s 		book@pt
|      2m 24s 		book@ru
|      2m 24s 		book@sw
|      2m 24s 		book@syc
|      2m 24s 		book@tr
|      2m 24s 		book@ur
|      2m 24s 		book@yo
|      2m 24s 		book@zh
|      2m 24s 	no features to delete
|      2m 24s 	0 fe

# Deliver

Copy the new Text-Fabric features from the temporary location where they have been created to their final destination.
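
A delivery step of this kind can be as simple as copying one `.tf` file per feature. The sketch below assumes exactly that; `deliver` is a hypothetical function, and `utils.deliverFeatures` may do more than this (its implementation is not shown here). The file content in the example is a placeholder.

```python
import shutil
import tempfile
from pathlib import Path


def deliver(src_dir, dst_dir, features):
    """Copy the .tf file of each feature from src_dir to dst_dir."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    delivered = []
    for feature in features:
        src = src_dir / f"{feature}.tf"
        shutil.copy2(src, dst_dir / src.name)  # copy2 also preserves timestamps
        delivered.append(feature)
    return delivered


with tempfile.TemporaryDirectory() as tmp:
    temp_tf = Path(tmp, "_temp", "tf")
    final_tf = Path(tmp, "tf")
    temp_tf.mkdir(parents=True)
    # Placeholder content, not real TF data
    (temp_tf / "book@en.tf").write_text("Genesis\n", encoding="utf8")
    print(deliver(temp_tf, final_tf, ["book@en"]))
    print((final_tf / "book@en.tf").exists())
```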

In [14]:
utils.deliverFeatures(thisTempTf, thisTf, newFeatures)

..............................................................................................
.      2m 26s Deliver features to /Users/werk/github/etcbc/bhsa/tf/2021                      .
..............................................................................................
|      2m 26s 	book@am
|      2m 26s 	book@ar
|      2m 26s 	book@bn
|      2m 26s 	book@da
|      2m 26s 	book@de
|      2m 26s 	book@el
|      2m 26s 	book@en
|      2m 26s 	book@es
|      2m 26s 	book@fa
|      2m 26s 	book@fr
|      2m 26s 	book@he
|      2m 26s 	book@hi
|      2m 26s 	book@id
|      2m 26s 	book@ja
|      2m 26s 	book@ko
|      2m 26s 	book@la
|      2m 26s 	book@nl
|      2m 26s 	book@pa
|      2m 26s 	book@pt
|      2m 26s 	book@ru
|      2m 26s 	book@sw
|      2m 26s 	book@syc
|      2m 26s 	book@tr
|      2m 26s 	book@ur
|      2m 26s 	book@yo
|      2m 26s 	book@zh


# Compile TF

In [15]:
utils.caption(4, "Load and compile the new TF features")

TF = Fabric(locations=thisTf, modules=[""])
api = TF.load("")
api.makeAvailableIn(globals())

..............................................................................................
.      2m 30s Load and compile the new TF features                                           .
..............................................................................................
This is Text-Fabric 9.1.6
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

101 features found and 0 ignored
  0.00s loading features ...
   |     0.00s Dataset without structure sections in otext:no structure functions in the T-API
   |     0.00s T book@el              from ~/github/etcbc/bhsa/tf/2021
   |     0.00s T book@la              from ~/github/etcbc/bhsa/tf/2021
   |     0.00s T book@de              from ~/github/etcbc/bhsa/tf/2021
   |     0.00s T book@pt              from ~/github/etcbc/bhsa/tf/2021
   |     0.00s T book@ar              from ~/github/etcbc/bhsa/tf/2021
   |     0.00s T book@fr              from ~/github/etcbc/bhsa/tf/2021
   |     0.00s T book@fa     

[('Computed',
  'computed-data',
  ('C Computed', 'Call AllComputeds', 'Cs ComputedString')),
 ('Features', 'edge-features', ('E Edge', 'Eall AllEdges', 'Es EdgeString')),
 ('Fabric', 'loading', ('TF',)),
 ('Locality', 'locality', ('L Locality',)),
 ('Nodes', 'navigating-nodes', ('N Nodes',)),
 ('Features',
  'node-features',
  ('F Feature', 'Fall AllFeatures', 'Fs FeatureString')),
 ('Search', 'search', ('S Search',)),
 ('Text', 'text', ('T Text',))]

# Examples

In [16]:
utils.caption(4, "Genesis in all languages")
genesisNode = F.otype.s("book")[0]

for (lang, langInfo) in sorted(T.languages.items()):
    language = langInfo["language"]
    langEng = langInfo["languageEnglish"]
    book = T.sectionFromNode(genesisNode, lang=lang)[0]
    utils.caption(
        0,
        "{:<2} = {:<20} Genesis is {:<20} in {:<20}".format(
            lang, langEng, book, language
        ),
    )

utils.caption(0, "Done")

..............................................................................................
.      3m 07s Genesis in all languages                                                       .
..............................................................................................
|      3m 07s    = default              Genesis is Genesis              in default             
|      3m 07s am = amharic              Genesis is ኦሪት_ዘፍጥረት            in ኣማርኛ                
|      3m 07s ar = arabic               Genesis is تكوين                in العَرَبِية          
|      3m 07s bn = bengali              Genesis is আদিপুস্তক            in বাংলা               
|      3m 07s da = danish               Genesis is 1.Mosebog            in Dansk               
|      3m 07s de = german               Genesis is Genesis              in Deutsch             
|      3m 07s el = greek                Genesis is Γένεση               in Ελληνικά            
|      3m 07s en = english              Gen