<img align="right" src="images/dans-small.png"/>
<img align="right" src="images/tf-small.png"/>
<img align="right" src="images/etcbc.png"/>


# Voyant

[Voyant](http://voyant-tools.org/docs/#!/guide/start)
is an online suite of corpus analysis tools.

To experiment with it, we export the Hebrew Bible as a corpus in several ways,
each of which can be uploaded to Voyant.

In [1]:
import os

from tf.fabric import Fabric


# Load data
We load the
[BHSA](https://github.com/etcbc/bhsa) data.
See the [feature documentation](https://etcbc.github.io/bhsa/features/hebrew/2017/0_home.html) for more info.

In [27]:
VERSION = "2017"
BHSA = f"BHSA/tf/{VERSION}"
PHONO = f"phono/tf/{VERSION}"
CORPUS_BASE = "_temp/corpora"

In [2]:
TF = Fabric(locations="~/github/etcbc", modules=[BHSA, PHONO])
# No features are requested explicitly; the cells below only use what Text-Fabric loads by default
api = TF.load("")
api.makeAvailableIn(globals())

This is Text-Fabric 3.1.1
Api reference : https://github.com/Dans-labs/text-fabric/wiki/Api
Tutorial      : https://github.com/Dans-labs/text-fabric/blob/master/docs/tutorial.ipynb
Example data  : https://github.com/Dans-labs/text-fabric-data

118 features found and 0 ignored
  0.00s loading features ...
   |     0.00s Feature overview: 111 for nodes; 5 for edges; 2 configs; 7 computed
  4.85s All features loaded/computed - for details use loadLog()


We produce one corpus per combination of text format and granularity.
The text format must be one of the strings in `T.formats` (see next cell);
the granularity must be one of `book`, `chapter`, `verse`.

In [5]:
T.formats

{'lex-orig-full',
 'lex-orig-plain',
 'lex-trans-full',
 'lex-trans-plain',
 'text-orig-full',
 'text-orig-full-ketiv',
 'text-orig-plain',
 'text-phono-full',
 'text-trans-full',
 'text-trans-full-ketiv',
 'text-trans-plain'}
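
Both constraints can be checked before any files are written. The following is a minimal sketch, not part of the original notebook: `checkCorpusArgs` is a hypothetical helper, and the set of formats is passed in explicitly (in the notebook it would be `T.formats`) so that the snippet runs without Text-Fabric.

```python
GRANULARITIES = ("book", "chapter", "verse")


def checkCorpusArgs(fmt, granularity, formats):
    """Raise ValueError if fmt or granularity is not supported.

    Hypothetical helper; `formats` stands in for T.formats.
    """
    if fmt not in formats:
        raise ValueError(f"unknown text format: {fmt!r}")
    if granularity not in GRANULARITIES:
        raise ValueError(f"unknown granularity: {granularity!r}")


# A few of the formats listed above, standing in for T.formats
sampleFormats = {"text-orig-full", "text-phono-full", "text-trans-plain"}
checkCorpusArgs("text-orig-full", "chapter", sampleFormats)  # passes silently
```

Failing early this way avoids creating an empty corpus directory for a misspelled format name.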

In [31]:
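# Number of section parts (book, chapter, verse) that identify a document at each granularity
levels = dict(book=1, chapter=2, verse=3)


def makeCorpus(fmt, granularity):
    # One directory per text format and granularity, one file per document
    corpusDir = f"{CORPUS_BASE}/{fmt}/by_{granularity}"
    os.makedirs(corpusDir, exist_ok=True)
    for doc in F.otype.s(granularity):
        # Keep only the section parts that identify this document
        section = T.sectionFromNode(doc)[0 : levels[granularity]]
        fileName = f'{"-".join(str(s) for s in section)}.txt'
        # Set the encoding explicitly: the platform default may not handle Hebrew or phonetic text
        with open(f"{corpusDir}/{fileName}", "w", encoding="utf8") as fh:
            # Write one sentence per line
            for s in L.d(doc, otype="sentence"):
                fh.write(T.text(L.d(s, otype="word"), fmt=fmt))
                fh.write("\n")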

In [28]:
makeCorpus("text-orig-full", "chapter")

In [29]:
makeCorpus("text-phono-full", "chapter")

In [32]:
makeCorpus("text-phono-full", "book")

In [33]:
makeCorpus("text-orig-full", "book")
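
Each call above leaves a directory of plain-text files under `_temp/corpora`. For uploading, it can be convenient to bundle such a directory into a single archive. A minimal sketch, assuming that Voyant accepts a zip archive of text files as an upload; `zipCorpus` is a hypothetical helper that is not in the original notebook.

```python
import os
import shutil


def zipCorpus(corpusBase, fmt, granularity):
    """Zip one corpus directory and return the path of the archive.

    Hypothetical helper; it assumes the directory layout used by makeCorpus.
    """
    corpusDir = f"{corpusBase}/{fmt}/by_{granularity}"
    if not os.path.isdir(corpusDir):
        raise FileNotFoundError(corpusDir)
    # The archive is written next to the format directories, not inside corpusDir;
    # make_archive appends ".zip" to the base name itself
    archiveBase = f"{corpusBase}/{fmt}-by_{granularity}"
    return shutil.make_archive(archiveBase, "zip", root_dir=corpusDir)
```

In the notebook this could be called as `zipCorpus(CORPUS_BASE, "text-orig-full", "book")` after the corresponding `makeCorpus` call.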

In [56]:
jb = T.nodeFromSection(("Jeremiah",))

In [58]:
print(len(L.d(jb, otype="word")))

29736


In [38]:
[T.sectionFromNode(b) for b in F.otype.s("book")]

[('Genesis', 1, 1),
 ('Exodus', 1, 1),
 ('Leviticus', 1, 1),
 ('Numbers', 1, 1),
 ('Deuteronomy', 1, 1),
 ('Joshua', 1, 1),
 ('Judges', 1, 1),
 ('1_Samuel', 1, 1),
 ('2_Samuel', 1, 1),
 ('1_Kings', 1, 1),
 ('2_Kings', 1, 1),
 ('Isaiah', 1, 1),
 ('Jeremiah', 1, 1),
 ('Ezekiel', 1, 1),
 ('Hosea', 1, 1),
 ('Joel', 1, 1),
 ('Amos', 1, 1),
 ('Obadiah', 1, 1),
 ('Jonah', 1, 1),
 ('Micah', 1, 1),
 ('Nahum', 1, 1),
 ('Habakkuk', 1, 1),
 ('Zephaniah', 1, 1),
 ('Haggai', 1, 1),
 ('Zechariah', 1, 1),
 ('Malachi', 1, 1),
 ('Psalms', 1, 1),
 ('Job', 1, 1),
 ('Proverbs', 1, 1),
 ('Ruth', 1, 1),
 ('Song_of_songs', 1, 1),
 ('Ecclesiastes', 1, 1),
 ('Lamentations', 1, 1),
 ('Esther', 1, 1),
 ('Daniel', 1, 1),
 ('Ezra', 1, 1),
 ('Nehemiah', 1, 1),
 ('1_Chronicles', 1, 1),
 ('2_Chronicles', 1, 1)]
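
The listing above shows that `T.sectionFromNode` returns a full (book, chapter, verse) triple even for a book node, pointing at the first verse of that book. This is why `makeCorpus` truncates the tuple before building a file name. The sketch below isolates that step so that it runs without Text-Fabric; `corpusFileName` is a hypothetical helper name, while the logic is taken from `makeCorpus`.

```python
levels = dict(book=1, chapter=2, verse=3)


def corpusFileName(section, granularity):
    """Build the file name for a section tuple, as makeCorpus does."""
    parts = section[0 : levels[granularity]]
    return "-".join(str(p) for p in parts) + ".txt"


section = ("Genesis", 1, 1)  # the tuple listed above for the Genesis book node
print(corpusFileName(section, "book"))     # Genesis.txt
print(corpusFileName(section, "chapter"))  # Genesis-1.txt
print(corpusFileName(section, "verse"))    # Genesis-1-1.txt
```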