In [1]:
%load_ext autoreload
%autoreload 2

# Lowfat to TF

We use the machinery of Text-Fabric combined with some custom code to convert
the lowfat XML of the Greek New Testament into TF.

# Set up

We gather all prerequisites.

In [2]:
from tf.convert.xml import XML
from lowfat import convertTaskCustom
from tf.app import use

The custom code is in `lowfat.py`, here in this directory.

It consists of two functions that replace default functions in
[xmlCustom](https://annotation.github.io/text-fabric/tf/convert/xmlCustom.html),
which is part of TF.

So you only have to focus on the bits that actually touch the lowfat XML.

We pass the function `convertTaskCustom()`, defined in `lowfat.py`, to the XML converter.

We also specify the way we want to see some attributes in the report files:

* keyword attributes: we want to see an inventory of all words that occur in such attributes
* trim attributes: we do not want to see the values of these attributes
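
To make the difference concrete, here is a small sketch in plain Python that does not use TF at all. The sample XML, the function name `inventory`, and the `(trimmed)` placeholder are assumptions made for illustration; the report files that the converter actually writes may be organised differently.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical sample, invented for this sketch (not copied from the lowfat files).
SAMPLE = """
<sentence>
  <wg class="cl np" rule="NpAppos">
    <w class="noun" case="nominative" lemma="λόγος">λόγος</w>
    <w class="verb" mood="indicative" lemma="εἰμί">ἦν</w>
    <w class="noun" case="nominative" lemma="θεός">θεός</w>
  </wg>
</sentence>
""".strip()


def inventory(xmlText, keywordAtts, trimAtts):
    """Summarise attribute usage per attribute name.

    Keyword attributes: every whitespace-separated word is counted.
    Trim attributes: only the number of occurrences is kept, not the values.
    Other attributes: whole values are counted.
    """
    report = defaultdict(Counter)
    for elem in ET.fromstring(xmlText).iter():
        for (att, value) in elem.attrib.items():
            if att in keywordAtts:
                report[att].update(value.split())
            elif att in trimAtts:
                report[att]["(trimmed)"] += 1
            else:
                report[att][value] += 1
    return report


result = inventory(SAMPLE, {"class", "case", "mood"}, {"lemma", "rule"})
for (att, counts) in sorted(result.items()):
    print(att, dict(counts))
```

The keyword attributes end up with a compact inventory of their vocabulary, while attributes such as `lemma`, which have very many distinct values, only contribute a count.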

In [3]:
keywordAtts = set(
    """
    case
    class
    number
    gender
    mood
    person
    role
    tense
    type
    voice
""".strip().split()
)

trimAtts = set(
    """
    domain
    frame
    gloss
    id
    lemma
    ln
    morph
    normalized
    ref
    referent
    rule
    strong
    subjref
    unicode
""".strip().split()
)

renameAtts = dict(Rule="crule")

We rename `Rule` to `crule` because we do not want both a `Rule` and a `rule` feature in our dataset:
their names differ only by case, which can clash on case-insensitive file systems.
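
TF stores each feature in its own file, named after the feature, so two names that differ only by case can end up pointing to the same file. The sketch below shows one way to spot such clashes before converting. The helper names `caseClashes` and `applyRenames` are hypothetical and not part of TF.

```python
from collections import defaultdict


def caseClashes(names):
    """Return the groups of names that are equal when case is ignored."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return {key: sorted(group) for (key, group) in groups.items() if len(group) > 1}


def applyRenames(names, renameAtts):
    """Replace each name that has an entry in renameAtts, keep the others."""
    return [renameAtts.get(name, name) for name in names]


atts = ["Rule", "rule", "lemma"]

before = caseClashes(atts)
after = caseClashes(applyRenames(atts, dict(Rule="crule")))

print(before)
print(after)
```

Before renaming, `Rule` and `rule` are reported as one clashing group; after applying the same mapping as `renameAtts`, no clashes remain.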

In [21]:
X = XML(
    convertTaskCustom=convertTaskCustom,
    keywordAtts=keywordAtts,
    trimAtts=trimAtts,
    renameAtts=renameAtts,
    verbose=1,
    xml=0,
    tf="0.3.1",
)

Working in repository ETCBC/nestle1904 in backend github
XML data version is 2022-11-01 (most recent)
TF data version is 0.3.1 (explicit exising)


Now we can run tasks.
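
The output above reports that the XML version `2022-11-01` was picked as the most recent one. As a rough illustration of what "most recent" can mean for version directories named by date, here is a minimal sketch. It assumes that versions are ISO dates, which sort chronologically as plain strings; the function `mostRecent` and the extra version names are hypothetical, and TF's own selection logic may differ.

```python
def mostRecent(versions):
    """Pick the last version in sorted order, or None if there are no versions.

    Assumption: version names are ISO dates (YYYY-MM-DD), so that
    string order coincides with chronological order.
    """
    ordered = sorted(versions)
    return ordered[-1] if ordered else None


# Only 2022-11-01 occurs in this notebook; the other names are made up.
available = ["2021-05-30", "2022-11-01", "2022-01-15"]
latest = mostRecent(available)
print(latest)
```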

# Check

First we check the input:

In [22]:
X.task(check=True)

XML to TF checking: ~/github/ETCBC/nestle1904/xml/2022-11-01 => ~/github/ETCBC/nestle1904/report/2022-11-01
Start folder gnt:
  27 27-revelation.xml                                 
End   folder gnt

151 info line(s) written to ~/github/ETCBC/nestle1904/report/2022-11-01/elements.txt
0 error(s) in 0 file(s) written to ~/github/ETCBC/nestle1904/report/2022-11-01/errors.txt
7 tags of which 0 with multiple namespaces written to ~/github/ETCBC/nestle1904/report/2022-11-01/namespaces.txt


True

# Convert

Here we generate the actual TF data.

In [28]:
X.task(convert=True)

XML to TF converting: ~/github/ETCBC/nestle1904/xml/2022-11-01 => ~/github/ETCBC/nestle1904/tf/0.3.1
  0.00s Not all of the warp features otype and oslots are present in
~/github/ETCBC/nestle1904/tf/0.3.1
  0.00s Only the Feature and Edge APIs will be enabled
  0.00s Warp feature "otext" not found. Working without Text-API

  0.00s Importing data from walking through the source ...
   |     0.00s Preparing metadata... 
   |     0.00s No structure nodes will be set up
   |   SECTION   TYPES:    book, chapter, verse
   |   SECTION   FEATURES: book, chapter, verse
   |   STRUCTURE TYPES:    
   |   STRUCTURE FEATURES: 
   |   TEXT      FEATURES:
   |      |   text-orig-full       after, text
   |     0.00s OK
   |     0.00s Following director... 
  27 27-revelation.xml                                 
source reading done
   |     4.07s "edge" actions: 0
   |     4.07s "feature" actions: 260889
   |     4.07s "node" actions: 131121
   |     4.07s "resume" actions: 0
   |     4.07s "slot" a

True

# Load

The best way to check that the TF is valid is to load it.

In [29]:
X.task(load=True)

   |     0.12s T otype                from ~/github/ETCBC/nestle1904/tf/0.3.1
   |     1.42s T oslots               from ~/github/ETCBC/nestle1904/tf/0.3.1
   |     0.25s T verse                from ~/github/ETCBC/nestle1904/tf/0.3.1
   |     0.28s T book                 from ~/github/ETCBC/nestle1904/tf/0.3.1
   |     0.34s T text                 from ~/github/ETCBC/nestle1904/tf/0.3.1
   |     0.24s T chapter              from ~/github/ETCBC/nestle1904/tf/0.3.1
   |     0.27s T after                from ~/github/ETCBC/nestle1904/tf/0.3.1
   |      |     0.03s C __levels__           from otype, oslots, otext
   |      |     1.15s C __order__            from otype, oslots, __levels__
   |      |     0.04s C __rank__             from otype, __order__
   |      |     2.26s C __levUp__            from otype, oslots, __rank__
   |      |     1.26s C __levDown__          from otype, __levUp__, __rank__
   |      |     0.04s C __characters__       from otext
   |      |     0.61s C __boundar

True

# App creation

We create the config file that turns the dataset into a TF app.

In [33]:
X.task(app=True)

App updating ...
	~/github/ETCBC/nestle1904/app/static/logo.png (already exists, not overwritten)
	~/github/ETCBC/nestle1904/app/static/display.css (no custom info, older orginal exists)
	~/github/ETCBC/nestle1904/app/config.yaml (generated with custom info)
	~/github/ETCBC/nestle1904/app/app.py (deleted)
Done


True

# Test

We test a bit of the resulting dataset right here.

In [34]:
A = use("ETCBC/nestle1904:clone", checkout="clone", hoist=globals())

**Locating corpus resources ...**

| Name | # of nodes | # slots/node | % coverage |
|---|---|---|---|
| book | 27 | 5102.93 | 100 |
| chapter | 260 | 529.92 | 100 |
| error | 1 | 34.0 | 0 |
| verse | 7944 | 17.34 | 100 |
| sentence | 8011 | 17.2 | 100 |
| wg | 114878 | 7.6 | 633 |
| w | 137779 | 1.0 | 100 |
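
The coverage column can be reproduced approximately from the other two columns. The sketch below assumes that coverage is the number of nodes times the average number of slots per node, divided by the total number of slots (the `w` nodes); that formula is our reading of the table, not something stated in the output. Because the averages in the table are rounded, the results can be off by a percent or so.

```python
# (type, number of nodes, average slots per node), copied from the table above
rows = [
    ("book", 27, 5102.93),
    ("chapter", 260, 529.92),
    ("error", 1, 34.0),
    ("verse", 7944, 17.34),
    ("sentence", 8011, 17.2),
    ("wg", 114878, 7.6),
    ("w", 137779, 1.0),
]

# Every w node occupies exactly one slot, so this is the total number of slots.
totalSlots = 137779


def coverage(nNodes, slotsPerNode, total):
    """Approximate percentage of slots covered by all nodes of one type."""
    return 100 * nNodes * slotsPerNode / total


approx = {name: coverage(n, avg, totalSlots) for (name, n, avg) in rows}

for (name, pct) in approx.items():
    print(f"{name:<10} {pct:8.1f}")
```

Under this reading, a coverage far above 100 for `wg` means that the same words are counted several times, which suggests that word groups nest or overlap. The single `error` node covers so few slots that its coverage rounds to 0.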


In [35]:
s2 = F.otype.s("sentence")[1]
A.pretty(s2, withNodes=True, standardFeatures=True)

# Browse

We are ready to browse the data.
If you run this notebook, the next cell opens a browser window with the TF browser
on the Greek New Testament.

In [36]:
X.task(browse=True)

This is Text-Fabric 11.4.11
Starting new kernel listening on 17116
Loading data for ETCBC/nestle1904. Please wait ...
Setting up TF kernel for ETCBC/nestle1904  
**Locating corpus resources ...**
Using app in ~/github/ETCBC/nestle1904/app:
	repo clone offline under ~/github (local github)
Using data in ~/github/ETCBC/nestle1904/tf/0.3.1:
	repo clone offline under ~/github (local github)
TF setup done.
Starting new webserver listening on 27116


 * Running on http://localhost:27116
Press CTRL+C to quit


Opening ETCBC/nestle1904 in browser
Press <Ctrl+C> to stop the TF browser


127.0.0.1 - - [10/May/2023 16:03:54] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [10/May/2023 16:03:54] "GET /server/static/highlight.css HTTP/1.1" 304 -
127.0.0.1 - - [10/May/2023 16:03:54] "GET /server/static/fonts.css HTTP/1.1" 304 -
127.0.0.1 - - [10/May/2023 16:03:54] "GET /server/static/index.css HTTP/1.1" 304 -
127.0.0.1 - - [10/May/2023 16:03:54] "GET /server/static/display.css HTTP/1.1" 304 -
127.0.0.1 - - [10/May/2023 16:03:54] "GET /server/static/base.css HTTP/1.1" 304 -
127.0.0.1 - - [10/May/2023 16:03:54] "GET /server/static/fontawesome.css HTTP/1.1" 304 -
127.0.0.1 - - [10/May/2023 16:03:54] "GET /server/static/tf3.0.js HTTP/1.1" 304 -
127.0.0.1 - - [10/May/2023 16:03:54] "GET /server/static/jquery.js HTTP/1.1" 304 -
127.0.0.1 - - [10/May/2023 16:03:54] "GET /server/static/fonts/fa-regular-400.woff2 HTTP/1.1" 304 -
127.0.0.1 - - [10/May/2023 16:03:54] "GET /server/static/fonts/fa-solid-900.woff

Kernel listening at port 17116

TF web server has stopped
TF kernel has stopped


keyboard interrupt!


True

# Terminate

You can stop the browser by interrupting the kernel: press `i` twice while the notebook is in command mode.

# Create zip

It is time to commit and push the repo to GitHub now:

```
git add --all .
git commit -m "new data version"
git push origin master
```

Then go over to GitHub and create a new release there.

After that, fetch the new tags from GitHub with:

```
git pull --tags
```

Then we are ready to create a zip file for publishing the dataset in a release on GitHub,
so that users can get it easily.

In [37]:
A.zipAll()

Data to be zipped:
	OK       app                      (v0.3.1 41dd47)     : ~/github/ETCBC/nestle1904/app
	OK       main data                (v0.3.1 41dd47)     : ~/github/ETCBC/nestle1904/tf/0.3.1
Writing zip file ...
Result: ~/Downloads/github/ETCBC/nestle1904/complete.zip


# Fetch

We now test whether users can use this dataset in the normal way.

Run this after you have attached the `complete.zip` file that we created earlier to the latest release on GitHub.

In [38]:
A = use("ETCBC/nestle1904:latest")

**Locating corpus resources ...**

   |     0.12s T otype                from ~/text-fabric-data/github/ETCBC/nestle1904/tf/0.3.1
   |     1.47s T oslots               from ~/text-fabric-data/github/ETCBC/nestle1904/tf/0.3.1
   |     0.26s T verse                from ~/text-fabric-data/github/ETCBC/nestle1904/tf/0.3.1
   |     0.27s T book                 from ~/text-fabric-data/github/ETCBC/nestle1904/tf/0.3.1
   |     0.34s T text                 from ~/text-fabric-data/github/ETCBC/nestle1904/tf/0.3.1
   |     0.24s T chapter              from ~/text-fabric-data/github/ETCBC/nestle1904/tf/0.3.1
   |     0.27s T after                from ~/text-fabric-data/github/ETCBC/nestle1904/tf/0.3.1
   |      |     0.03s C __levels__           from otype, oslots, otext
   |      |     1.16s C __order__            from otype, oslots, __levels__
   |      |     0.05s C __rank__             from otype, __order__
   |      |     2.29s C __levUp__            from otype, oslots, __rank__
   |      |     1.26s C __levDown__          fr

| Name | # of nodes | # slots/node | % coverage |
|---|---|---|---|
| book | 27 | 5102.93 | 100 |
| chapter | 260 | 529.92 | 100 |
| error | 1 | 34.0 | 0 |
| verse | 7944 | 17.34 | 100 |
| sentence | 8011 | 17.2 | 100 |
| wg | 114878 | 7.6 | 633 |
| w | 137779 | 1.0 | 100 |


Indeed, downloading and installing went without hassle.

Now save this notebook, commit and push the repo again to publish this very notebook.

```
git add --all .
git commit -m "maker notebook updated"
git push origin master
```