<img align="right" src="images/tf.png" width="128"/>
<img align="right" src="images/uu-small.png" width="128"/>
<img align="right" src="images/dans.png" width="128"/>

# Tutorial

This notebook gets you started with using
[Text-Fabric](https://annotation.github.io/text-fabric/) for coding in the Quran.

Chances are that a bit of reading about the underlying
[data model](https://annotation.github.io/text-fabric/Model/Data-Model/)
helps you to follow the exercises below, and vice versa.

## Installing Text-Fabric

### Python

You need to have Python on your system. Most systems have it out of the box,
but alas, that is Python 2 and we need at least Python **3.6**.

Install it from [python.org](https://www.python.org) or from
[Anaconda](https://www.anaconda.com/download).
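
To check the version requirement from within Python, something like the following sketch may help. The helper name `python_is_recent_enough` is made up for this illustration; only `sys.version_info` comes from the standard library.

```python
import sys

# Minimum version stated in this tutorial.
MINIMUM = (3, 6)


def python_is_recent_enough(version=None, minimum=MINIMUM):
    """Return True if `version` (default: the running interpreter) is at least `minimum`."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); tuples compare element by element.
    return tuple(version[:2]) >= tuple(minimum)


print(sys.version.split()[0], "OK" if python_is_recent_enough() else "too old")
```

If this reports "too old", install a newer Python before continuing.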

### TF itself

```
pip3 install text-fabric
```

### Jupyter notebook

You need [Jupyter](http://jupyter.org).

If it is not already installed:

```
pip3 install jupyter
```

## Tip
If you start computing with this tutorial, first copy its parent directory to somewhere else,
outside your `quran` directory.
If you pull changes from the `quran` repository later, your work will not be overwritten.
Where you put your tutorial directory is up to you.
It will work from any directory.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import os, collections

In [3]:
from tf.app import use

## Quran data

Text-Fabric will fetch a standard set of features for you from the newest GitHub release binaries.

The data will be stored in the `text-fabric-data` directory in your home directory.

# Load Features
The data of the corpus is organized in features.
They are *columns* of data.
Think of the text as a gigantic spreadsheet, where row 1 corresponds to the
first word, row 2 to the second word, and so on, for all 100,000+ words.

The letters of each word are a column `form` in that spreadsheet.

The corpus contains ca. 30 columns, not only for the words, but also for 
textual objects, such as *suras*, *ayas*, and *word groups*.

Instead of putting that information in one big table, the data is organized in separate columns.
We call those columns **features**.
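
The idea of "one column per feature" can be pictured with plain dictionaries. This is only a toy sketch: the values are invented, and `ToyFeature` is a hypothetical class, not part of Text-Fabric. Its methods `v()` and `s()` are named after the calls used later in this tutorial (`F.pos.v(w)`, `F.otype.s('word')`).

```python
# Toy data, NOT taken from the Quran corpus: each feature is one column,
# stored as its own mapping from row number (node) to value.
features = {
    "form": {1: "aaa", 2: "bbb", 3: "ccc"},
    "pos": {1: "noun", 2: "verb", 3: "noun"},
}


class ToyFeature:
    """One column of the spreadsheet (hypothetical helper)."""

    def __init__(self, data):
        self.data = data

    def v(self, node):
        # Value of this feature for one node (None if the cell is empty).
        return self.data.get(node)

    def s(self, value):
        # All nodes that carry the given value, in ascending node order.
        return tuple(n for n in sorted(self.data) if self.data[n] == value)


pos = ToyFeature(features["pos"])
print(pos.v(2))
print(pos.s("noun"))
```

Keeping the columns separate means a column can be loaded, or left out, without touching the others.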

In [4]:
A = use('quran', hoist=globals(), check=True)

	downloading latest annotation/app-quran
	from https://api.github.com/repos/annotation/app-quran/zipball ...
	unzipping ...
	saving annotation/app-quran commit f528c149daf596e80c79710ed3fc391f33519af4
	saved annotation/app-quran commit f528c149daf596e80c79710ed3fc391f33519af4
Using annotation/app-quran commit f528c149daf596e80c79710ed3fc391f33519af4 (=latest)
  in /Users/dirk/text-fabric-data/__apps__/quran
No new data release available online.
Using q-ran/quran/tf - 0.2 rv0.3 (=latest) in /Users/dirk/text-fabric-data.
   |      |     0.06s C __levels__           from otype, oslots, otext
   |      |     1.45s C __order__            from otype, oslots, __levels__
   |      |     0.08s C __rank__             from otype, __order__
   |      |     1.82s C __levUp__            from otype, oslots, __rank__
   |      |     0.46s C __levDown__          from otype, __levUp__, __rank__
   |      |     0.34s C __boundary__         from otype, oslots, __rank__
   |      |     0.02s C __sections__

**Documentation:** <a target="_blank" href="https://github.com/q-ran/quran/blob/master/docs" title="provenance of Quran">QURAN</a> <a target="_blank" href="https://annotation.github.io/text-fabric/Writing/Arabic" title="('Arabic characters and transcriptions',)">Character table</a> <a target="_blank" href="https://github.com/q-ran/quran/blob/master/docs/features-0.2.md#features.md" title="QURAN feature documentation">Feature docs</a> <a target="_blank" href="https://github.com/annotation/app-quran" title="quran API documentation">quran API</a> <a target="_blank" href="https://annotation.github.io/text-fabric/Api/Fabric/" title="text-fabric-api">Text-Fabric API 7.3.10</a> <a target="_blank" href="https://annotation.github.io/text-fabric/Use/Search/" title="Search Templates Introduction and Reference">Search Reference</a>

## API

At this point it is helpful to take a quick glance at the Text-Fabric API documentation
(see the links under **Documentation** above).

The most essential thing for now is that we can use `F` to access the data in the features
we've loaded.
But there is more, such as `N`, which helps us to walk over the text, as we will see in a minute.

# Counting

In order to get acquainted with the data, we start with the simple task of counting.

## Count all nodes
We use the 
[`N()` generator](https://github.com/annotation/text-fabric/wiki/Api#walking-through-nodes)
to walk through the nodes.

We compared the corpus to a gigantic spreadsheet, where the rows correspond to the words.
In Text-Fabric, we call the rows `slots`, because they are the textual positions that can be filled with words.

We also mentioned that there are more textual objects.
They are the suras, ayas, and word groups, among others.
They also correspond to rows in the big spreadsheet.

In Text-Fabric we call all these rows *nodes*, and the `N()` generator
carries us through those nodes in the textual order.

Just one extra thing: the `info` statements generate timed messages.
If you use them instead of `print` you'll get a sense of the amount of time that 
the various processing steps typically need.

In [5]:
indent(reset=True)
info('Counting nodes ...')

i = 0
for n in N(): i += 1

info('{} nodes'.format(i))

  0.00s Counting nodes ...
  0.04s 218282 nodes
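
To get a feel for what timed messages do, here is a minimal stand-in written in plain Python. `ToyTimer` is a hypothetical class for illustration only; it does not reproduce how Text-Fabric implements `info` and `indent`.

```python
import time


class ToyTimer:
    """Hypothetical stand-in for timed messages; not Text-Fabric's implementation."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Restart the clock, comparable in spirit to indent(reset=True).
        self.start = time.perf_counter()

    def info(self, message):
        # Prefix the message with the seconds elapsed since the last reset.
        elapsed = time.perf_counter() - self.start
        line = f"{elapsed:6.2f}s {message}"
        print(line)
        return line


timer = ToyTimer()
timer.info("Counting ...")
total = sum(1 for _ in range(200_000))
timer.info(f"{total} items")
```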


## What are those nodes?
Every node has a type, like word, or aya, or sura.
We know that we have well over 100,000 words, plus nodes of other types.
But what exactly are they?

Text-Fabric has two special features, `otype` and `oslots`, that must occur in every Text-Fabric data set.
`otype` tells you for each node its type, and you can ask for the number of `slot`s in the text.

Here we go!

In [6]:
F.otype.slotType

'word'

In [7]:
F.otype.maxSlot

128219

In [8]:
F.otype.maxNode

218282

In [9]:
F.otype.all

('manzil',
 'sajda',
 'juz',
 'sura',
 'hizb',
 'ruku',
 'page',
 'aya',
 'lex',
 'group',
 'word')

In [10]:
C.levels.data

(('manzil', 18317.0, 217101, 217107),
 ('sajda', 6043.066666666667, 218268, 218282),
 ('juz', 4273.966666666666, 216831, 216860),
 ('sura', 1124.7280701754387, 211885, 211998),
 ('hizb', 534.2458333333333, 216861, 217100),
 ('ruku', 230.60971223021582, 217108, 217663),
 ('page', 212.28311258278146, 217664, 218267),
 ('aya', 20.56109685695959, 205649, 211884),
 ('lex', 15.440397350993377, 211999, 216830),
 ('group', 1.6559557788425525, 128220, 205648),
 ('word', 1, 1, 128219))

This is interesting: above you see all the textual objects, with the average size of their objects,
the node where they start, and the node where they end.
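
The start and end nodes also tell us how many nodes each type has. The sketch below copies the rows from the output above and assumes that the nodes of one type occupy a consecutive range of numbers. The results agree with the counts printed in the next section.

```python
# Rows copied from the C.levels.data output above:
# (node type, average number of slots per node, first node, last node)
levels = (
    ("manzil", 18317.0, 217101, 217107),
    ("sajda", 6043.066666666667, 218268, 218282),
    ("juz", 4273.966666666666, 216831, 216860),
    ("sura", 1124.7280701754387, 211885, 211998),
    ("hizb", 534.2458333333333, 216861, 217100),
    ("ruku", 230.60971223021582, 217108, 217663),
    ("page", 212.28311258278146, 217664, 218267),
    ("aya", 20.56109685695959, 205649, 211884),
    ("lex", 15.440397350993377, 211999, 216830),
    ("group", 1.6559557788425525, 128220, 205648),
    ("word", 1, 1, 128219),
)

# Assumption: node numbers within a type are consecutive,
# so last - first + 1 is the number of nodes of that type.
counts = {otype: last - first + 1 for (otype, avg, first, last) in levels}

for otype, n in counts.items():
    print(f"{n:>7} {otype}")
print(f"{sum(counts.values()):>7} nodes in total")

# Average size times count gives the number of slots covered by a type.
sura_avg = levels[3][1]
slots_in_suras = round(sura_avg * counts["sura"])
print(slots_in_suras)
```

The suras together cover every slot, which is why the last number equals `F.otype.maxSlot`.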

## Count individual object types
This is an intuitive way to count the number of nodes in each type.
Note in passing, how we use the `indent` in conjunction with `info` to produce neat timed 
and indented progress messages.

In [11]:
indent(reset=True)
info('counting objects ...')

for otype in F.otype.all:
    i = 0
    indent(level=1, reset=True)

    for n in F.otype.s(otype): i+=1

    info('{:>7} {}s'.format(i, otype))

indent(level=0)
info('Done')

  0.00s counting objects ...
   |     0.00s       7 manzils
   |     0.00s      15 sajdas
   |     0.00s      30 juzs
   |     0.00s     114 suras
   |     0.00s     240 hizbs
   |     0.00s     556 rukus
   |     0.00s     604 pages
   |     0.00s    6236 ayas
   |     0.00s    4832 lexs
   |     0.01s   77429 groups
   |     0.02s  128219 words
  0.04s Done


# Viewing textual objects

We use the A API (the extra power) to peek into the corpus.

Let's inspect some words.

In [12]:
wordShow = (1000, 10000, 100000)
for word in wordShow:
  A.pretty(word)

# Feature statistics

`F`
gives access to all features.
Every feature has a method
`freqList()`
to generate a frequency list of its values, higher frequencies first.
Here are the parts of speech:

In [13]:
F.pos.freqList()

(('pronoun', 29319),
 ('noun', 29049),
 ('verb', 19356),
 ('particle', 13511),
 ('preposition', 13006),
 ('conjunction', 10134),
 ('determiner', 8377),
 ('adjective', 1961),
 ('adverb', 1835),
 ('prefix', 1641),
 ('initials', 30))
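
What `freqList()` delivers can be mimicked with a `collections.Counter`. This is a sketch on invented values, not the library's code; in particular, the alphabetical tie-breaking is an assumption.

```python
import collections

# Toy values, not taken from the corpus.
pos_of_node = {1: "noun", 2: "verb", 3: "noun", 4: "particle", 5: "verb", 6: "noun"}


def freq_list(feature_data):
    """Return (value, count) pairs, most frequent first."""
    counter = collections.Counter(feature_data.values())
    # Assumption: ties are broken alphabetically by value.
    return tuple(sorted(counter.items(), key=lambda x: (-x[1], x[0])))


print(freq_list(pos_of_node))
```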

# Lexeme matters

## Top 10 frequent verbs

If we count the frequency of words, we usually mean the frequency of their
corresponding roots or lexemes.

Let's start with roots.

In [14]:
verbs = collections.Counter()
indent(reset=True)
info('Collecting data')

for w in F.otype.s('word'):
    if F.pos.v(w) != 'verb': continue
    verbs[F.root.v(w)] +=1

info('Done')
print(''.join(
    '{}: {}\n'.format(verb, cnt)
    for (verb, cnt) in sorted(verbs.items(), key=lambda x: (-x[1], x[0]))[0:10]
))       

  0.00s Collecting data
  0.07s Done
qwl: 1620
kwn: 1358
Amn: 558
Aty: 535
Elm: 425
jEl: 340
rAy: 315
kfr: 304
jyA: 278
Eml: 276



Now the same with lexemes.
There are several methods for working with lexemes.

### Method 1: counting words

In [15]:
verbs = collections.Counter()
indent(reset=True)
info('Collecting data')

for w in F.otype.s('word'):
    if F.pos.v(w) != 'verb': continue
    verbs[F.lemma.v(w)] +=1

info('Done')
print(''.join(
    '{}: {}\n'.format(verb, cnt)
    for (verb, cnt) in sorted(verbs.items(), key=lambda x: (-x[1], x[0]))[0:10]
))       

  0.00s Collecting data
  0.07s Done
qaAla: 1618
kaAna: 1358
'aAmana: 537
Ealima: 382
jaEala: 340
kafara: 289
jaA^'a: 278
Eamila: 276
A^taY: 271
ra'aA: 271



## Lexeme distribution

Let's do a bit more fancy lexeme stuff.

### Hapaxes

A hapax can be found by inspecting lexemes and seeing how many word nodes they are linked to.
If that number is one, we have a hapax.

We print the first 10 hapaxes.

In [16]:
indent(reset=True)

hapax = []
lexIndex = collections.defaultdict(list)

for n in F.otype.s('word'):
    lexIndex[F.lemma.v(n)].append(n)
    
hapax = dict((lex, occs) for (lex, occs) in lexIndex.items() if len(occs) == 1)
    
info('{} hapaxes found'.format(len(hapax)))

for h in sorted(hapax)[0:10]:
    print(f'\t{h}')

  0.07s 1994 hapaxes found
	$aAkilat
	$aAni}
	$aAriko
	$aAwiro
	$aTo_#
	$a`Ti}
	$a`mixa`t
	$a`xiSap
	$afatayon
	$agafa


If we want more info on the hapaxes, we get that by means of their *nodes*.
The lexIndex dictionary stores the occurrences of a lexeme as a list of nodes.

Let's get the part of speech and the Arabic form of those 10 hapaxes.

In [17]:
for h in sorted(hapax)[0:10]:
    node = hapax[h][0]
    print(f'\t{F.pos.v(node):<12} {F.unicode.v(node)}')

	noun         شَاكِلَتِ
	noun         شَانِئَ
	verb         شَارِكْ
	verb         شَاوِرْ
	noun         شَطْـَٔ
	noun         شَٰطِئِ
	adjective    شَٰمِخَٰتٍ
	noun         شَٰخِصَةٌ
	noun         شَفَتَيْنِ
	verb         شَغَفَ


### Small occurrence base

The occurrence base of a lexeme is the set of suras in which it occurs.
Let's look for lexemes that occur in a single sura.

Oh yes, we have already found the hapaxes, we will skip them here.

In [18]:
indent(reset=True)
info('Finding single sura lexemes')

lexSuraIndex = {}

for (lex, occs) in lexIndex.items():
    lexSuraIndex[lex] = set(L.u(n, otype='sura')[0] for n in occs)
    
singleSura = [(lex, occs) for (lex, occs) in lexIndex.items() if len(lexSuraIndex.get(lex, [])) == 1]
singleSuraWithoutHapax = [(lex, occs) for (lex, occs) in singleSura if len(occs) != 1]

info('{} single sura lexemes found'.format(len(singleSura)))

for data in (singleSura, singleSuraWithoutHapax):
  print('=====================================')
  for (lex, occs) in sorted(data[0:10]):
      print('{:<15} ({}x) first {:>5} last {:>5}'.format(
          lex,
          len(occs),
          '{}:{}'.format(*T.sectionFromNode(occs[0])),
          '{}:{}'.format(*T.sectionFromNode(occs[-1])),
      ))

  0.00s Finding single sura lexemes
  0.89s 2228 single sura lexemes found
>aZolama        (1x) first  2:20 last  2:20
Ha*ar           (2x) first  2:19 last 2:243
Say~ib          (1x) first  2:19 last  2:19
baEuwDap        (1x) first  2:26 last  2:26
magoDuwb        (1x) first   1:7 last   1:7
nuqad~isu       (1x) first  2:30 last  2:30
rabiHat         (1x) first  2:16 last  2:16
vamarap         (1x) first  2:25 last  2:25
yasofiku        (2x) first  2:30 last  2:84
{sotawoqada     (1x) first  2:17 last  2:17
$aTor           (5x) first 2:144 last 2:150
Ha*ar           (2x) first  2:19 last 2:243
Hayov2          (2x) first 2:144 last 2:150
Hur~            (2x) first 2:178 last 2:178
Sibogap         (2x) first 2:138 last 2:138
baqarap         (4x) first  2:67 last  2:71
huwd2           (3x) first 2:111 last 2:140
taTaw~aEa       (2x) first 2:158 last 2:184
yasofiku        (2x) first  2:30 last  2:84
yataEal~amu     (2x) first 2:102 last 2:102


### Confined to suras

As a final exercise with lexemes, let's make a list of all suras, and show their total number of lexemes and
the number of lexemes that occur exclusively in that sura.

In [19]:
indent(reset=True)
info('Making sura-lexeme index')

allSura = collections.defaultdict(set)
allLex = set()

for s in F.otype.s('sura'):
    for w in L.d(s, 'word'):
        l = F.lemma.v(w)
        allSura[s].add(l)
        allLex.add(l)

info('Found {} lexemes'.format(len(allLex)))

  0.00s Making sura-lexeme index
  0.10s Found 4833 lexemes


In [20]:
indent(reset=True)
info('Finding single sura lexemes')

lexSuraIndex = {}

for (lex, occs) in lexIndex.items():
    lexSuraIndex[lex] = set(L.u(n, otype='sura')[0] for n in occs)

singleSuraLex = collections.defaultdict(set)
for (lex, suras) in lexSuraIndex.items():
    if len(suras) == 1:
        singleSuraLex[list(suras)[0]].add(lex)

singleSura = {sura: len(lexs) for (sura, lexs) in singleSuraLex.items()}

info('found {} single sura lexemes'.format(sum(singleSura.values())))

  0.00s Finding single sura lexemes
  0.88s found 2228 single sura lexemes


In [21]:
print('{:<30} {:>4} {:>4} {:>4} {:>5}\n{}'.format(
    'sura name', 'sura', '#all', '#own', '%own',
    '-'*51,
))
suraList = []

for s in F.otype.s('sura'):
    suraName = Fs('name@en').v(s)
    sura = T.suraName(s)
    a = len(allSura[s])
    o = singleSura.get(s, 0)
    p = 100 * o / a
    suraList.append((suraName, sura, a, o, p))

for x in sorted(suraList, key=lambda e: (-e[4], -e[2], e[1])):
    print('{:<30} {:>4} {:>4} {:>4} {:>4.1f}%'.format(*x))

sura name                      sura #all #own  %own
---------------------------------------------------
Abundance                       108    9    4 44.4%
Quraysh                         106   16    5 31.2%
The Dawn                        113   17    5 29.4%
The Chargers                    100   32    9 28.1%
Sincerity                       112    9    2 22.2%
The Traducer                    104   28    6 21.4%
The Palm Fibre                  111   21    4 19.0%
The Overwhelming                 88   69   13 18.8%
The Beneficent                   55  142   26 18.3%
The Overthrowing                 81   77   14 18.2%
The Morning Star                 86   44    8 18.2%
The Elephant                    105   22    4 18.2%
The Sun                          91   45    8 17.8%
Defrauding                       83   96   17 17.7%
The Inevitable                   56  206   36 17.5%
The City                         90   63   11 17.5%
The Calamity                    101   24    4 16.7%
Those who dr

## For all section types

What we did for suras, we can also do for the other section types.

We generalize the task into a function that accepts the kind of section as a parameter.
Then we call that function for all our section types.

In [22]:
def lexBase(section):
  # make indices
  lexemesPerSection = {}
  sectionsPerLexeme = {}
  for s in F.otype.s(section):
    for w in L.d(s, otype='word'):
      lex = F.lemma.v(w)
      lexemesPerSection.setdefault(s, set()).add(lex)
      sectionsPerLexeme.setdefault(lex, set()).add(s)
    
  print('{:<10} {:>4} {:>4} {:>5}\n{}'.format(
      section, '#all', '#own', '%own',
      '-' * 26,
  ))
  sectionList = []

  for s in F.otype.s(section):
      n = F.number.v(s)
      myLexes = lexemesPerSection[s]
      a = len(myLexes)
      o = len([lex for lex in myLexes if len(sectionsPerLexeme[lex]) == 1])
      p = 100 * o / a
      sectionList.append((n, a, o, p))

  for x in sorted(sectionList, key=lambda e: (-e[3], -e[1], e[0])):
      print('{:<10} {:>4} {:>4} {:>4.1f}%'.format(*x))
  print('=' * 26)

First we check with the suras:

In [23]:
lexBase('sura')

sura       #all #own  %own
--------------------------
108           9    4 44.4%
106          16    5 31.2%
113          17    5 29.4%
100          32    9 28.1%
112           9    2 22.2%
104          28    6 21.4%
111          21    4 19.0%
88           69   13 18.8%
55          142   26 18.3%
81           77   14 18.2%
86           44    8 18.2%
105          22    4 18.2%
91           45    8 17.8%
83           96   17 17.7%
56          206   36 17.5%
90           63   11 17.5%
101          24    4 16.7%
79          127   21 16.5%
80          103   17 16.5%
75          104   17 16.3%
93           31    5 16.1%
89           94   15 16.0%
77          108   17 15.7%
19          360   52 14.4%
69          157   22 14.0%
9           638   89 13.9%
18          552   71 12.9%
53          188   24 12.8%
2          1137  145 12.8%
12          512   65 12.7%
114          16    2 12.5%
68          171   21 12.3%
74          155   19 12.3%
54          188   22 11.7%
73          129   14 10.9%
7

This is the same list as before, but without the sura names.
Now the other sections.

In [24]:
for section in ('manzil', 'sajda', 'juz', 'ruku','hizb', 'page'):
  lexBase(section)

manzil     #all #own  %own
--------------------------
7          2120  685 32.3%
4          1907  415 21.8%
1          1694  302 17.8%
2          1773  316 17.8%
5          1580  235 14.9%
3          1493  222 14.9%
6          1516  215 14.2%
sajda      #all #own  %own
--------------------------
13         1722  510 29.6%
15          257   56 21.8%
14          410   88 21.5%
1          1534  329 21.4%
4           694  117 16.9%
5           876  137 15.6%
10         1173  178 15.2%
12         1285  193 15.0%
7           824  102 12.4%
3           765   94 12.3%
6           388   45 11.6%
9           972  104 10.7%
11          800   74  9.2%
2           747   69  9.2%
8           567   52  9.2%
juz        #all #own  %own
--------------------------
30          840  233 27.7%
29          916  168 18.3%
27          823  141 17.1%
16          776  123 15.9%
15          790  101 12.8%
10          629   70 11.1%
23          782   86 11.0%
26          773   85 11.0%
17          749   79 10.5%
1

# Layer API
We travel upwards and downwards, forwards and backwards through the nodes.
The Layer-API (`L`) provides functions: `u()` for going up, and `d()` for going down,
`n()` for going to next nodes and `p()` for going to previous nodes.

These directions are indirect notions: nodes are just numbers, but by means of the
`oslots` feature they are linked to slots. One node *contains* another node if the one is linked to a set of slots that contains the set of slots that the other is linked to.
And one node is next or previous to another if its slots follow or precede the slots of the other one.

`L.u(node)` **Up** is going to nodes that embed `node`.

`L.d(node)` **Down** is the opposite direction, to those that are contained in `node`.

`L.n(node)` **Next** are the next *adjacent* nodes, i.e. nodes whose first slot comes immediately after the last slot of `node`.

`L.p(node)` **Previous** are the previous *adjacent* nodes, i.e. nodes whose last slot comes immediately before the first slot of `node`.

All these functions yield nodes of all possible otypes.
By passing an optional parameter, you can restrict the results to nodes of that type.

The results are ordered according to the order of things in the text.

The functions always return a tuple, even if there is just one node in the result.
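
The four directions can be sketched on a toy corpus using nothing but slot sets. All node numbers below are made up, and the functions `up`, `down`, `nxt` and `prev` are hypothetical stand-ins for `L.u`, `L.d`, `L.n` and `L.p`. The sketch returns results in ascending node order, which is a simplification of the ordering Text-Fabric uses.

```python
# Toy corpus with made-up node numbers: six words, two ayas, one sura.
otype = {n: "word" for n in range(1, 7)}
otype.update({7: "aya", 8: "aya", 9: "sura"})

# oslots: the set of slots (word positions) each node is linked to.
oslots = {n: {n} for n in range(1, 7)}
oslots.update({7: {1, 2, 3}, 8: {4, 5, 6}, 9: {1, 2, 3, 4, 5, 6}})


def _select(nodes, kind):
    # Optional restriction to one node type; always return a tuple.
    return tuple(n for n in nodes if kind is None or otype[n] == kind)


def up(node, kind=None):
    # Nodes whose slots include all slots of `node`.
    return _select((m for m in sorted(oslots) if m != node and oslots[node] <= oslots[m]), kind)


def down(node, kind=None):
    # Nodes whose slots all lie within the slots of `node`.
    return _select((m for m in sorted(oslots) if m != node and oslots[m] <= oslots[node]), kind)


def nxt(node, kind=None):
    # Nodes whose first slot comes immediately after the last slot of `node`.
    after = max(oslots[node]) + 1
    return _select((m for m in sorted(oslots) if min(oslots[m]) == after), kind)


def prev(node, kind=None):
    # Nodes whose last slot comes immediately before the first slot of `node`.
    before = min(oslots[node]) - 1
    return _select((m for m in sorted(oslots) if max(oslots[m]) == before), kind)


print(up(1), down(7, "word"), nxt(7), prev(8))
```

Note that even `up(1, "sura")` returns a tuple, so the single sura is reached with `up(1, "sura")[0]`.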

## Going up
We go from the first word to the sura that contains it.
Note the `[0]` at the end. You expect one sura, yet `L` returns a tuple. 
To get the only element of that tuple, you need to do that `[0]`.

If you are like me, you keep forgetting it, and that will lead to weird error messages later on.

In [25]:
firstSura = L.u(1, otype='sura')[0]
print(firstSura)

211885


And let's see all the containing objects of word 3:

In [26]:
w = 3
for otype in F.otype.all:
    if otype == F.otype.slotType: continue
    up = L.u(w, otype=otype)
    upNode = 'x' if len(up) == 0 else up[0]
    print('word {} is contained in {} {}'.format(w, otype, upNode))

word 3 is contained in manzil 217101
word 3 is contained in sajda x
word 3 is contained in juz 216831
word 3 is contained in sura 211885
word 3 is contained in hizb 216861
word 3 is contained in ruku 217108
word 3 is contained in page 217664
word 3 is contained in aya 205649
word 3 is contained in lex 212000
word 3 is contained in group 128221


## Going next
Let's go to the next nodes of the first sura.

In [27]:
afterFirstSura = L.n(firstSura)
for n in afterFirstSura:
    print('{:>7}: {:<13} first slot={:<6}, last slot={:<6}'.format(
        n, F.otype.v(n),
        E.oslots.s(n)[0],
        E.oslots.s(n)[-1],
    ))
secondSura = L.n(firstSura, otype='sura')[0]

     49: word          first slot=49    , last slot=49    
 128249: group         first slot=49    , last slot=49    
 205656: aya           first slot=49    , last slot=49    
 217665: page          first slot=49    , last slot=112   
 217109: ruku          first slot=49    , last slot=149   
 211886: sura          first slot=49    , last slot=10291 


## Going previous

And let's see what is right before the second sura.

In [28]:
for n in L.p(secondSura):
    print('{:>7}: {:<13} first slot={:<6}, last slot={:<6}'.format(
        n, F.otype.v(n),
        E.oslots.s(n)[0],
        E.oslots.s(n)[-1],
    ))

 211885: sura          first slot=1     , last slot=48    
 217108: ruku          first slot=1     , last slot=48    
 217664: page          first slot=1     , last slot=48    
 205655: aya           first slot=34    , last slot=48    
 128248: group         first slot=47    , last slot=48    
     48: word          first slot=48    , last slot=48    


## Going down

We go to the ayas of the second sura, and just count them.

In [29]:
ayas = L.d(secondSura, otype='aya')
print(len(ayas))

286


## The first aya
We pick the first aya and the first word, and explore what is above and below them.

In [30]:
for n in [1, L.u(1, otype='aya')[0]]:
    indent(level=0)
    info('Node {}'.format(n), tm=False)
    indent(level=1)
    info('UP', tm=False)
    indent(level=2)
    info('\n'.join(['{:<15} {}'.format(u, F.otype.v(u)) for u in L.u(n)]), tm=False)
    indent(level=1)
    info('DOWN', tm=False)
    indent(level=2)
    info('\n'.join(['{:<15} {}'.format(u, F.otype.v(u)) for u in L.d(n)]), tm=False)
indent(level=0)
info('Done', tm=False)

Node 1
   |   UP
   |      |   128220          group
   |      |   205649          aya
   |      |   217664          page
   |      |   217108          ruku
   |      |   211885          sura
   |      |   216861          hizb
   |      |   216831          juz
   |      |   217101          manzil
   |   DOWN
   |      |   
Node 205649
   |   UP
   |      |   217664          page
   |      |   217108          ruku
   |      |   211885          sura
   |      |   216861          hizb
   |      |   216831          juz
   |      |   217101          manzil
   |   DOWN
   |      |   128220          group
   |      |   1               word
   |      |   2               word
   |      |   128221          group
   |      |   3               word
   |      |   128222          group
   |      |   4               word
   |      |   5               word
   |      |   128223          group
   |      |   6               word
   |      |   7               word
Done


# Text API

So far, we have mainly seen nodes and their numbers, and the names of node types.
You would almost forget that we are dealing with text.
So let's try to see some text.

In the same way as `F` gives access to feature data,
`T` gives access to the text.
That is also feature data, but you can tell Text-Fabric which features are specifically
carrying the text, and in return Text-Fabric offers you
a Text API: `T`.

## Formats
Arabic text can be represented in a number of ways:

* in transliteration, or in Arabic characters,
* showing the actual text or only the lexemes, or roots.

If you wonder where the information about text formats is stored: 
not in the program text-fabric, but in the data set.
It has a feature `otext`, which specifies the formats and which features
must be used to produce them. `otext` is the third special feature in a TF data set,
next to `otype` and `oslots`. 
It is an optional feature. 
If it is absent, there will be no `T` API.
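
The idea that a format names the features used to produce text can be sketched as follows. The templates are written as Python format strings purely for illustration; the real `otext` feature has its own syntax, which is not reproduced here. The lemma and root strings are borrowed from this tutorial's output for the first aya, and treating them as four units is a simplification.

```python
# Four toy units with two features each.
units = [
    {"lemma": "{som", "root": "smw"},
    {"lemma": "{ll~ah", "root": "Alh"},
    {"lemma": "r~aHoma`n", "root": "rHm"},
    {"lemma": "r~aHiym", "root": "rHm"},
]

# Hypothetical templates: each format says which feature to print per unit.
formats = {
    "lex-trans-full": "{lemma}",
    "root-trans-full": "{root}",
}


def render(fmt, items):
    """Fill the template of `fmt` for every unit and join with spaces."""
    template = formats[fmt]
    return " ".join(template.format(**item) for item in items)


for fmt in sorted(formats):
    print(f"{fmt}:\n\t{render(fmt, units)}")
```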

Here is a list of all available formats in this data set.

In [31]:
sorted(T.formats)

['lex-trans-full', 'root-trans-full', 'text-orig-full', 'text-trans-full']

## Using the formats

We can also make pretty displays in other formats:

In [32]:
for word in wordShow:
  A.pretty(word, fmt='text-trans-full')

Now let's use those formats to print out the first aya of the Quran.

In [33]:
a1 = F.otype.s('aya')[0]

for fmt in sorted(T.formats):
    print('{}:\n\t{}'.format(fmt, T.text(a1, fmt=fmt, descend=True)))

lex-trans-full:
	{som {ll~ah r~aHoma`n r~aHiym
root-trans-full:
	smw Alh rHm rHm
text-orig-full:
	بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ
text-trans-full:
	bisomi {ll~ahi {lr~aHoma`ni {lr~aHiymi


If we do not specify a format, the **default** format is used (`text-orig-full`).

In [34]:
print(T.text(a1))

بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ


## Whole text in all formats in less than a second
Part of the pleasure of working with computers is that they can crunch massive amounts of data.
The text of the Quran is a piece of cake.

It takes less than a second to have that cake and eat it. 
In nearly a handful of formats.

In [35]:
indent(reset=True)
info('writing plain text of whole Quran in all formats')

text = collections.defaultdict(list)

for a in F.otype.s('aya'):
    words = L.d(a, 'word')
    for fmt in sorted(T.formats):
        text[fmt].append(T.text(words, fmt=fmt))

info('done {} formats'.format(len(text)))

for fmt in sorted(text):
    print('{}\n{}\n'.format(fmt, '\n'.join(text[fmt][0:5])))

  0.00s writing plain text of whole Quran in all formats
  0.67s done 4 formats
lex-trans-full
{som {ll~ah r~aHoma`n r~aHiym
Hamod {ll~ah rab~ Ea`lamiyn
r~aHoma`n r~aHiym
ma`lik yawom diyn
<iy~aA Eabada <iy~aA {sotaEiynu

root-trans-full
smw Alh rHm rHm
Hmd Alh rbb Elm
rHm rHm
mlk ywm dyn
 Ebd  Ewn

text-orig-full
بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ
ٱلْحَمْدُ لِلَّهِ رَبِّ ٱلْعَٰلَمِينَ
ٱلرَّحْمَٰنِ ٱلرَّحِيمِ
مَٰلِكِ يَوْمِ ٱلدِّينِ
إِيَّاكَ نَعْبُدُ وَإِيَّاكَ نَسْتَعِينُ

text-trans-full
bisomi {ll~ahi {lr~aHoma`ni {lr~aHiymi
{loHamodu lil~ahi rab~i {loEa`lamiyna
{lr~aHoma`ni {lr~aHiymi
ma`liki yawomi {ld~iyni
<iy~aAka naEobudu wa<iy~aAka nasotaEiynu



### The full plain text
We write a few formats to file, in your `Downloads` folder.

In [36]:
orig = 'text-orig-full'
trans = 'text-trans-full'
for fmt in (orig, trans):
    with open(os.path.expanduser(f'~/Downloads/Quran-{fmt}.txt'), 'w') as f:
        f.write('\n'.join(text[fmt]))

In [37]:
!head -n 20 ~/Downloads/Quran-{orig}.txt

بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ
ٱلْحَمْدُ لِلَّهِ رَبِّ ٱلْعَٰلَمِينَ
ٱلرَّحْمَٰنِ ٱلرَّحِيمِ
مَٰلِكِ يَوْمِ ٱلدِّينِ
إِيَّاكَ نَعْبُدُ وَإِيَّاكَ نَسْتَعِينُ
ٱهْدِنَا ٱلصِّرَٰطَ ٱلْمُسْتَقِيمَ
صِرَٰطَ ٱلَّذِينَ أَنْعَمْتَ عَلَيْهِمْ غَيْرِ ٱلْمَغْضُوبِ عَلَيْهِمْ وَلَا ٱلضَّآلِّينَ
الٓمٓ
ذَٰلِكَ ٱلْكِتَٰبُ لَا رَيْبَ فِيهِ هُدًى لِّلْمُتَّقِينَ
ٱلَّذِينَ يُؤْمِنُونَ بِٱلْغَيْبِ وَيُقِيمُونَ ٱلصَّلَوٰةَ وَمِمَّا رَزَقْنَٰهُمْ يُنفِقُونَ
وَٱلَّذِينَ يُؤْمِنُونَ بِمَآ أُنزِلَ إِلَيْكَ وَمَآ أُنزِلَ مِن قَبْلِكَ وَبِٱلْءَاخِرَةِ هُمْ يُوقِنُونَ
أُو۟لَٰٓئِكَ عَلَىٰ هُدًى مِّن رَّبِّهِمْ وَأُو۟لَٰٓئِكَ هُمُ ٱلْمُفْلِحُونَ
إِنَّ ٱلَّذِينَ كَفَرُوا۟ سَوَآءٌ عَلَيْهِمْ ءَأَنذَرْتَهُمْ أَمْ لَمْ تُنذِرْهُمْ لَا يُؤْمِنُونَ
خَتَمَ ٱللَّهُ عَلَىٰ قُلُوبِهِمْ وَعَلَىٰ سَمْعِهِمْ وَعَلَىٰٓ أَبْصَٰرِهِمْ غِشَٰوَةٌ وَلَهُمْ عَذَابٌ عَظِيمٌ
وَمِنَ ٱلنَّاسِ مَن يَقُولُ ءَامَنَّا بِٱللَّهِ وَبِٱلْيَوْمِ ٱلْءَاخِرِ وَمَا هُم بِمُؤْمِنِينَ
يُخَٰدِعُونَ ٱللَّهَ وَٱلَّذِينَ ءَامَنُوا۟ وَمَا يَخْدَعُو

In [38]:
!head -n 20 ~/Downloads/Quran-{trans}.txt

bisomi {ll~ahi {lr~aHoma`ni {lr~aHiymi
{loHamodu lil~ahi rab~i {loEa`lamiyna
{lr~aHoma`ni {lr~aHiymi
ma`liki yawomi {ld~iyni
<iy~aAka naEobudu wa<iy~aAka nasotaEiynu
{hodinaA {lS~ira`Ta {lomusotaqiyma
Sira`Ta {l~a*iyna >anoEamota Ealayohimo gayori {lomagoDuwbi Ealayohimo walaA {lD~aA^l~iyna
Al^m^
*a`lika {lokita`bu laA rayoba fiyhi hudFY l~ilomut~aqiyna
{l~a*iyna yu&ominuwna bi{logayobi wayuqiymuwna {lS~alaw`pa wamim~aA razaqona`humo yunfiquwna
wa{l~a*iyna yu&ominuwna bimaA^ >unzila <ilayoka wamaA^ >unzila min qabolika wabi{lo'aAxirapi humo yuwqinuwna
>uw@la`^}ika EalaY` hudFY m~in r~ab~ihimo wa>uw@la`^}ika humu {lomufoliHuwna
<in~a {l~a*iyna kafaruwA@ sawaA^'N Ealayohimo 'a>an*arotahumo >amo lamo tun*irohumo laA yu&ominuwna
xatama {ll~ahu EalaY` quluwbihimo waEalaY` samoEihimo waEalaY`^ >aboSa`rihimo gi$a`wapN walahumo Ea*aAbN EaZiymN
wamina {ln~aAsi man yaquwlu 'aAman~aA bi{ll~ahi wabi{loyawomi {lo'aAxiri wamaA hum bimu&ominiyna
yuxa`diEuwna {ll~aha wa{l~a*iyna 'aAmanuwA@ wamaA yaxod

## Sections

A section is a sura or an aya.
Knowledge of sections is not baked into Text-Fabric. 
The config feature `otext.tf` may specify two or three section levels, and tell
what the corresponding node types and features are.

From that knowledge it can construct mappings from nodes to sections, e.g. from aya
nodes to tuples of the form:

    (sura number, aya number)
   
Here are examples of getting the section that corresponds to a node and vice versa.

**NB:** `sectionFromNode` always delivers a verse specification, either from the
first slot belonging to that node, or, if `lastSlot`, from the last slot
belonging to that node.

The other sectional units in the Quran, `manzil`, `sajda`, `juz`, `ruku`, `hizb`, `page`,
are not associated with special Text-Fabric functions in this data set, although we could have
chosen to use two or three of them instead of sura and aya.

In [39]:
for x in (
    ('sura, aya of first word',   T.sectionFromNode(1)                           ),
    ('node of 1:1',               T.nodeFromSection((1, 1))                      ),
    ('node of 2:1',               T.nodeFromSection((2, 1))                      ),
    ('node of sura 1',            T.nodeFromSection((1,))                        ),
    ('section of sura node',      T.sectionFromNode(211890)                      ),
    ('section of aya node',       T.sectionFromNode(210000)                      ),
    ('section of juz node',       T.sectionFromNode(216850)                      ),
    ('idem, now last word',       T.sectionFromNode(216850, lastSlot=True)       ),
): print('{:<30} {}'.format(*x))

sura, aya of first word        (1, 1)
node of 1:1                    205649
node of 2:1                    205656
node of sura 1                 211885
section of sura node           (6,)
section of aya node            (43, 27)
section of juz node            (27, 56)
idem, now last word            (29, 45)
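
The two directions of this mapping can be pictured as a pair of dictionaries. The sketch below uses three pairs copied from the output above. The functions `node_from_section` and `section_from_node` are hypothetical stand-ins: unlike `T.sectionFromNode`, the toy version only knows section nodes themselves and returns `None` for anything else.

```python
# Pairs copied from the output above; a real corpus has one entry per section.
node_of_section = {
    (1,): 211885,     # sura 1
    (1, 1): 205649,   # sura 1, aya 1
    (2, 1): 205656,   # sura 2, aya 1
}

# The reverse direction is the same table turned around.
section_of_node = {node: section for (section, node) in node_of_section.items()}


def node_from_section(section):
    # Look up the node of a (sura,) or (sura, aya) tuple.
    return node_of_section.get(tuple(section))


def section_from_node(node):
    # Look up the section tuple of a sura or aya node.
    return section_of_node.get(node)


print(node_from_section((2, 1)))
print(section_from_node(211885))
```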


# Translations

This data source contains English (by Arberry) and Dutch (by Leemhuis) translations of the Quran.
They are stored in the features `translation@en` and `translation@nl` for aya nodes.

Let's get the translations of sura 107, together with the Arabic original.

The translation features are not loaded by default, we load them first.

In [40]:
TF.load('translation@en translation@nl', add=True)

sura = 107

suraNode = T.suraNode(sura)
print(F.name.v(suraNode))

for ayaNode in L.d(suraNode, otype='aya'):
  print(f'{F.number.v(ayaNode)}')
  print(T.text(ayaNode))
  print(Fs('translation@en').v(ayaNode))
  print(Fs('translation@nl').v(ayaNode))

  0.00s loading features ...
  0.00s All additional features loaded - for details use loadLog()
الماعون
1
أَرَءَيْتَ ٱلَّذِى يُكَذِّبُ بِٱلدِّينِ
Hast thou seen him who cries lies to the Doom?
Heb jij hem gezien die de godsdienst loochent?
2
فَذَٰلِكَ ٱلَّذِى يَدُعُّ ٱلْيَتِيمَ
That is he who repulses the orphan
Dat is hij die de wees wegduwt
3
وَلَا يَحُضُّ عَلَىٰ طَعَامِ ٱلْمِسْكِينِ
and urges not the feeding of the needy.
en die er niet op aandringt de behoeftige voedsel te geven.
4
فَوَيْلٌ لِّلْمُصَلِّينَ
So woe to those that pray
En wee hen die de salaat bidden
5
ٱلَّذِينَ هُمْ عَن صَلَاتِهِمْ سَاهُونَ
and are heedless of their prayers,
die hun salaat veronachtzamen,
6
ٱلَّذِينَ هُمْ يُرَآءُونَ
to those who make display
die vertoon willen maken
7
وَيَمْنَعُونَ ٱلْمَاعُونَ
and refuse charity.
en die de hulpverlening weigeren.


# Next steps

* **[display](display.ipynb)** become an expert in creating pretty displays of your text structures
* **[search](search.ipynb)** turbo charge your hand-coding with search templates
* **[exportExcel](exportExcel.ipynb)** make tailor-made spreadsheets out of your results
* **[share](share.ipynb)** draw in other people's data and let them use yours
* **[similarAyas](similarAyas.ipynb)** spot the similarities between lines
* **[rings](rings.ipynb)** ring structures in sura 2