<img align="right" src="images/tf-small.png" width="128"/>
<img align="right" src="images/etcbc.png"/>
<img align="right" src="images/dans-small.png"/>

You might want to consider the [start](start.ipynb) of this tutorial.

Short introductions to other TF datasets:

* [Dead Sea Scrolls](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/dss.ipynb),
* [Old Babylonian Letters](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/oldbabylonian.ipynb), or the
* [Qur'an](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/quran.ipynb)


# Export

Text-Fabric is not a world to stay in forever.
When you go to other worlds, you can travel with the corpus data in your backpack.

Here we show two destinations (and one of them is also an origin):
Pandas and Emdros.

Before we go there, we load the corpus.

In [1]:
%load_ext autoreload
%autoreload 2

# Incantation

The ins and outs of installing Text-Fabric, getting the corpus, and initializing a notebook are
explained in the [start tutorial](start.ipynb).

In [2]:
from tf.app import use

In [3]:
A = use("etcbc/bhsa", hoist=globals())

**Locating corpus resources ...**

| Name | # of nodes | # slots/node | % coverage |
|---|---|---|---|
| book | 39 | 10938.21 | 100 |
| chapter | 929 | 459.19 | 100 |
| lex | 9230 | 46.22 | 100 |
| verse | 23213 | 18.38 | 100 |
| half_verse | 45179 | 9.44 | 100 |
| sentence | 63717 | 6.7 | 100 |
| sentence_atom | 64514 | 6.61 | 100 |
| clause | 88131 | 4.84 | 100 |
| clause_atom | 90704 | 4.7 | 100 |
| phrase | 253203 | 1.68 | 100 |


# Pandas

The first journey is to 
[Pandas](https://pandas.pydata.org).

We convert the data to a dataframe, via a tab-separated text file.

The nodes are exported as rows; they correspond to text objects such as word, phrase, clause, sentence, verse, chapter, book and a few others.

The BHSA features become the columns, so each row tells what values the features have for the corresponding node.

The edges corresponding to the BHSA features *mother*, *functional_parent*, *distributional_parent* are
exported as extra columns. For each row, such a column indicates the target of a corresponding outgoing edge.

We also write the data that says which objects are contained in which.
To each row we add the following columns:

* for each node type, except `word`, there is a column with that node type as name;
  the value in that column is the node of this type that contains the row node (if any).
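
To make the containment columns concrete, here is a minimal sketch on a toy table. The rows and node numbers are invented for illustration, and the column names `nd`, `otype` and `verse` are assumed to follow the description above; the helper `words_by_verse` is hypothetical and not part of Text-Fabric.

```python
from collections import defaultdict

# Toy rows standing in for the exported table; the node numbers are invented.
# The `verse` column holds the verse node that contains the row node (if any).
rows = [
    {"nd": 1, "otype": "word", "verse": 101},
    {"nd": 2, "otype": "word", "verse": 101},
    {"nd": 3, "otype": "word", "verse": 102},
    {"nd": 101, "otype": "verse", "verse": None},
    {"nd": 102, "otype": "verse", "verse": None},
]


def words_by_verse(rows):
    """Group word nodes under the verse node that contains them."""
    grouped = defaultdict(list)
    for row in rows:
        if row["otype"] == "word" and row["verse"] is not None:
            grouped[row["verse"]].append(row["nd"])
    return dict(grouped)


print(words_by_verse(rows))
```

Because every row carries its containers, a question such as "which words are in this verse" becomes a plain filter or group-by on one column, with no tree traversal needed.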

Extra data such as lexicon (including frequency and rank features), phonetic transcription, and ketiv-qere are also included.

While exporting the data to Pandas format, the program
composes the big table and saves it as a tab-delimited file.
This is stored in a temporary directory (not visible on GitHub).

This temporary file can also be read by R, but we proceed with Pandas.
Pandas offers functions in the same spirit as R, but is more Pythonic and, for many tasks, faster.

In [4]:
A.exportPandas()

  0.00s Create tsv file ...
   |     3.85s   5%   72342 nodes written
   |     7.65s  10%  144684 nodes written
   |       12s  15%  217026 nodes written
   |       15s  20%  289368 nodes written
   |       19s  25%  361710 nodes written
   |       23s  30%  434052 nodes written
   |       27s  35%  506394 nodes written
   |       31s  40%  578736 nodes written
   |       35s  45%  651078 nodes written
   |       39s  50%  723420 nodes written
   |       43s  55%  795762 nodes written
   |       47s  60%  868104 nodes written
   |       50s  65%  940446 nodes written
   |       54s  70% 1012788 nodes written
   |       58s  75% 1085130 nodes written
   |    1m 02s  80% 1157472 nodes written
   |    1m 06s  85% 1229814 nodes written
   |    1m 10s  90% 1302156 nodes written
   |    1m 14s  95% 1374498 nodes written
   |    1m 18s 100% 1446831 nodes written and done
 1m 18s TSV file is ~/text-fabric-data/github/etcbc/bhsa/_temp/data-2021.tsv
 1m 18s Columns 72:
 1m 18s 	nd
 1m 18s 	otype

## How to use the Pandas file

See
[pandas](pandas.ipynb)
for a tutorial on how to work with the BHSA as a dataframe.
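
Before loading the whole table into Pandas (for instance with `pandas.read_csv(path, sep="\t")`), it can be useful to check the file cheaply. The sketch below streams a tab-separated file with the standard library and counts rows per node type. The function name `count_node_types` is made up for this example, and it assumes that the file has a header row containing an `otype` column and that fields are not quoted.

```python
import csv
from collections import Counter


def count_node_types(tsv_path):
    """Stream a tab-separated export and count rows per value of `otype`."""
    counts = Counter()
    with open(tsv_path, newline="", encoding="utf8") as handle:
        # Assumption: plain tab-separated fields without quoting.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            counts[row["otype"]] += 1
    return counts
```

To try it on the real export, pass the path reported by `A.exportPandas()`, expanded with `os.path.expanduser()` if it starts with `~`.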

We collect a few pieces of data that will come in handy.

Here is the first verse node:

In [5]:
F.otype.s("verse")[0]

1414389

# MQL

The next journey is to MQL, a text-database format not unlike SQL, supported by the Emdros software.

[EMDROS](http://emdros.org), written by Ulrik Petersen,
is a text database system with the powerful *topographic* query language MQL.
The ideas are based on a model devised by Christ-Jan Doedens in
[Text Databases: One Database Model and Several Retrieval Languages](https://books.google.nl/books?id=9ggOBRz1dO4C).

Text-Fabric's model of slots, nodes and edges is a fairly straightforward translation of the models of Christ-Jan Doedens and Ulrik Petersen.

[SHEBANQ](https://shebanq.ancient-data.org) uses EMDROS to let users execute and save MQL queries against the Hebrew Text Database of the ETCBC.

So it is both logical and convenient to be able to work with a Text-Fabric resource through MQL.

If you have obtained an MQL dataset somehow, you can turn it into a Text-Fabric data set with `importMQL()`,
which we will not show here.

And if you want to export a Text-Fabric data set to MQL, that is also possible.

After loading the data (here with `use()`, or otherwise with a `Fabric(modules=...)` call), you can call `exportMQL()` to save all features of the
indicated modules into a big MQL dump, which can be imported by an EMDROS database.

In [4]:
A.exportMQL("mybhsa", exportDir="~/Downloads/mql")

  0.00s Checking features of dataset mybhsa


   |     1.75s feature "book@am" => "book_am"
   |     1.75s feature "book@ar" => "book_ar"
   |     1.75s feature "book@bn" => "book_bn"
   |     1.75s feature "book@da" => "book_da"
   |     1.75s feature "book@de" => "book_de"
   |     1.75s feature "book@el" => "book_el"
   |     1.75s feature "book@en" => "book_en"
   |     1.75s feature "book@es" => "book_es"
   |     1.75s feature "book@fa" => "book_fa"
   |     1.75s feature "book@fr" => "book_fr"
   |     1.75s feature "book@he" => "book_he"
   |     1.75s feature "book@hi" => "book_hi"
   |     1.75s feature "book@id" => "book_id"
   |     1.75s feature "book@ja" => "book_ja"
   |     1.75s feature "book@ko" => "book_ko"
   |     1.75s feature "book@la" => "book_la"
   |     1.75s feature "book@nl" => "book_nl"
   |     1.75s feature "book@pa" => "book_pa"
   |     1.75s feature "book@pt" => "book_pt"
   |     1.75s feature "book@ru" => "book_ru"
   |     1.75s feature "book@sw" => "book_sw"
   |     1.76s feature "book@syc" 

  0.02s 118 features to export to MQL ...
  0.02s Loading 118 features
  1.88s Writing enumerations
	book_am        :   39 values, 39 not a name, e.g. «መኃልየ_መኃልይ_ዘሰሎሞን»
	book_ar        :   39 values, 39 not a name, e.g. «1_اخبار»
	book_bn        :   39 values, 39 not a name, e.g. «আদিপুস্তক»
	book_da        :   39 values, 13 not a name, e.g. «1.Kongebog»
	book_de        :   39 values, 7 not a name, e.g. «1_Chronik»
	book_el        :   39 values, 39 not a name, e.g. «Άσμα_Ασμάτων»
	book_en        :   39 values, 6 not a name, e.g. «1_Chronicles»
	book_es        :   39 values, 22 not a name, e.g. «1_Crónicas»
	book_fa        :   39 values, 39 not a name, e.g. «استر»
	book_fr        :   39 values, 19 not a name, e.g. «1_Chroniques»
	book_he        :   39 values, 39 not a name, e.g. «איוב»
	book_hi        :   39 values, 39 not a name, e.g. «1_इतिहास»
	book_id        :   39 values, 7 not a name, e.g. «1_Raja-raja»
	book_ja        :   39 values, 39 not a name, e.g. «アモス書»
	book_ko        :   

Now you have a file `~/Downloads/mql/mybhsa.mql` of 530 MB.
You can import it into an Emdros database, after removing any earlier version of that database, by saying:

    cd ~/Downloads/mql
    rm -f mybhsa
    mql -b 3 < mybhsa.mql

The result is an SQLite3 database `mybhsa` in the same directory (168 MB).
You can run a query against it by creating a text file `test.mql` with these contents:

    select all objects where
    [lex gloss ~ 'make'
        [word FOCUS]
    ]

And then say

    mql -b 3 -d mybhsa test.mql

You will see raw query results: all word occurrences that belong to lexemes with `make` in their gloss.

The output is not very pretty, and you will probably want to use a more visual Emdros tool to run such queries.
You see a lot of node numbers, but the good thing is that you can look those node numbers up in Text-Fabric.
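
Looking up a node number could be done along the following lines. This is a sketch, not part of the export itself: `describe_node` is a hypothetical helper, and `F` and `T` are assumed to be the API handles that `use(..., hoist=globals())` puts in the notebook's namespace.

```python
def describe_node(node, F, T):
    """Return the type, section reference and text of a Text-Fabric node.

    F and T are assumed to be the Text-Fabric feature and text APIs,
    as hoisted into the global namespace by `use(..., hoist=globals())`.
    """
    return {
        "node": node,
        "otype": F.otype.v(node),            # node type, e.g. word or verse
        "section": T.sectionFromNode(node),  # section reference of the node
        "text": T.text(node),                # text covered by the node
    }
```

In the notebook you could then call, for example, `describe_node(1414389, F, T)` for the first verse node found earlier.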

# All steps

* **[start](start.ipynb)** your first step in mastering the bible computationally
* **[display](display.ipynb)** become an expert in creating pretty displays of your text structures
* **[search](search.ipynb)** turbo charge your hand-coding with search templates
* **[exportExcel](exportExcel.ipynb)** make tailor-made spreadsheets out of your results
* **[share](share.ipynb)** draw in other people's data and let them use yours
* **export** export your dataset to Pandas or as an Emdros database
* **[annotate](annotate.ipynb)** annotate plain text by means of other tools and import the annotations as TF features
* **[map](map.ipynb)** map somebody else's annotations to a new version of the corpus
* **[volumes](volumes.ipynb)** work with selected books only
* **[trees](trees.ipynb)** work with the BHSA data as syntax trees

CC-BY Dirk Roorda