<img align="right" src="images/tf.png"/>
<img align="right" src="images/etcbc.png"/>
<img align="right" src="images/logo.png"/>

---

To get started: consult [start](start.ipynb)

---

# Export to Excel

In a notebook, you can run searches, view the results in a tabular display, and zoom in on items with
pretty displays.

But there are times when you want to take your results outside Text-Fabric, outside a notebook, outside Python, and just
work with them in other programs, such as Excel.

You want to do that not only with query results, but with all kinds of lists of tuples of nodes.

There is a function for that, `A.export()`, and here we show what it can do.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import os
from tf.app import use

In [3]:
A = use("missieven", hoist=globals())
# A = use("missieven:latest", checkout="latest", hoist=globals())
# A = use('missieven:clone', checkout="clone", hoist=globals())

# Inspect the contents of a file
We write a function that can peek into a file on your system and show the first few lines.
We'll use it to inspect the exported files that we are going to produce.

In [4]:
EXPORT_FILE = os.path.expanduser('~/Downloads/results.tsv')
UPTO = 10

def checkout():
    with open(EXPORT_FILE, encoding='utf_16') as fh:
        for (i, line) in enumerate(fh):
            if i >= UPTO:
                break
            print(line.rstrip('\n'))

# Encoding

Our exported `.tsv` files open in Excel without hassle, even if they contain non-Latin characters.
That is because TF writes such files in an
encoding that works well with Excel: `utf_16_le`.
You can open them in Excel directly; no conversion is needed before or after opening these files.

Should you want to process these files by means of a (Python) program, 
take care to read them with encoding `utf_16`.
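
As a minimal sketch of reading such a file in Python, the snippet below uses the standard `csv` module. It does not touch a real export: it first writes a small stand-in file (the path and its rows are assumptions made for illustration) and then reads it back with encoding `utf_16`.

```python
import csv
import os
import tempfile

# Hypothetical stand-in for an exported file: a tiny tab-separated table
# written as UTF-16 (the rows are illustrative, not real export data).
sample_path = os.path.join(tempfile.mkdtemp(), "sample.tsv")
with open(sample_path, "w", encoding="utf_16", newline="") as fh:
    fh.write("R\tNODE1\tTYPE1\tTEXT1\n")
    fh.write("1\t674\tword\tsecuideerden1\n")

# Read it back: encoding="utf_16" consumes the byte order mark,
# csv.reader splits on tabs, and QUOTE_NONE treats quote characters
# as ordinary text.
with open(sample_path, encoding="utf_16", newline="") as fh:
    rows = list(csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE))

for row in rows:
    print(row)
```

To read a real export, point the path at the exported file instead of the stand-in.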

# Example query

We first run a query in order to export the results.

In [5]:
query = '''
volume n=1
  line
    word transo~([0-9][A-Za-z])|([a-zA-Z][0-9])
'''
results = A.search(query)

  5.36s 683 results


# Bare export

You can export the table of results to Excel.

The following command writes a tab-separated file `results.tsv` to your downloads directory.

You can specify arguments `toDir=directory` and `toFile=file name` to write to a different file.
If the directory does not exist, it will be created.

We stick to the default, however.

In [6]:
A.export(results)

Check out the contents:

In [7]:
checkout()

R	S1	S2	S3	NODE1	TYPE1	n1	NODE2	TYPE2	TEXT2	NODE3	TYPE3	TEXT3	transo3
1	1	5	11	5572953	volume	1	5054836	line	’t geene sij gedaen hebben ende alsoo de soldaeten secuideerden1 	674	word	secuideerden1 	secuideerden1
2	1	6	6	5572953	volume	1	5054873	line	hebben daeromme den 14en deser goet gevonden 14 soldaeten onder ’t commandement	1067	word	14en 	14en
3	1	8	4	5572953	volume	1	5054931	line	Mijn Heeren, tsedert het vertreck van ons schip den Leeuw met Pijlen1 	1637	word	Pijlen1 	Pijlen1
4	1	9	5	5572953	volume	1	5054963	line	accorderen ende dat door haer onredelijcke proceduren. Sij hebben op ten 24en meye	2030	word	24en 	24en
5	1	10	5	5572953	volume	1	5054989	line	bij hem hebbende dertich duysent realen van 8en, veerthien soldaten	2305	word	8en, 	8en
6	1	21	6	5572953	volume	1	5055279	line	Wij hebben aldaer1 toegebracht omtrent vier maenden in ’t af f breken	5285	word	aldaer1 	aldaer1
7	1	22	5	5572953	volume	1	5055313	line	Opten 13en january 1613 arriveerden weder met ons schip Banda voor	5

You see the following columns:

* **R** the sequence number of the result tuple in the result list
* **S1 S2 S3** the section as volume, page, line number, in separate columns
* **NODEi TYPEi** the node and its type, for each node **i** in the result tuple
* **TEXTi** the full text of node **i**, if the node type admits a concise text representation
* **n1 transo3** the values of the features **n** and **transo**, since our query mentions `n` on node 1 and `transo` on node 3
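
The naming scheme of the columns can be sketched in a few lines of Python. The snippet takes the header line of the export shown above and groups the column names by the position of the node they belong to. The variable names are made up for illustration, and the grouping assumes that feature names do not themselves end in a digit.

```python
import re
from collections import defaultdict

# Header line copied from the export shown above.
header = (
    "R\tS1\tS2\tS3\tNODE1\tTYPE1\tn1\tNODE2\tTYPE2\tTEXT2"
    "\tNODE3\tTYPE3\tTEXT3\ttranso3"
)

# Section columns are S1, S2, ...; any other name ending in digits
# belongs to the node at that position in the result tuple.
section_re = re.compile(r"^S[0-9]+$")
node_re = re.compile(r"^(.*?)([0-9]+)$")

section_columns = []
columns_by_node = defaultdict(list)

for name in header.split("\t"):
    if name == "R":
        continue  # sequence number of the result, not tied to a node
    if section_re.match(name):
        section_columns.append(name)
        continue
    match = node_re.match(name)
    if match:
        columns_by_node[int(match.group(2))].append(match.group(1))

print(section_columns)
for position, names in sorted(columns_by_node.items()):
    print(position, names)
```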

# Poorer exports

If you do not need the full text of the lines, you can leave them out by specifying a smaller *condense type*.

The export function provides text for all nodes whose type is not too big.
What is too big is determined by the condense type.

In this corpus, the default condense type is `line`. Node types bigger than `line` will not get text.

Now, if we change the condenseType to something smaller than line, e.g. `word`, the line text will be suppressed.

In [8]:
A.export(results, condenseType='word')

  7.46s ERROR in table(): illegal node type in "condenseType=word"
  7.46s Legal values are: cell, folio, head, letter, line, page, para, remark, row, subhead, table, volume


''

Oops, `word` is the slot type, so we cannot use it as a condense type.
Let's take `cell`.

In [9]:
A.export(results, condenseType='cell')
checkout()

R	S1	S2	S3	NODE1	TYPE1	n1	NODE2	TYPE2	NODE3	TYPE3	TEXT3	transo3
1	1	5	11	5572953	volume	1	5054836	line	674	word	secuideerden1 	secuideerden1
2	1	6	6	5572953	volume	1	5054873	line	1067	word	14en 	14en
3	1	8	4	5572953	volume	1	5054931	line	1637	word	Pijlen1 	Pijlen1
4	1	9	5	5572953	volume	1	5054963	line	2030	word	24en 	24en
5	1	10	5	5572953	volume	1	5054989	line	2305	word	8en, 	8en
6	1	21	6	5572953	volume	1	5055279	line	5285	word	aldaer1 	aldaer1
7	1	22	5	5572953	volume	1	5055313	line	5659	word	13en 	13en
8	1	22	19	5572953	volume	1	5055327	line	5813	word	7en 	7en
9	1	22	33	5572953	volume	1	5055341	line	5986	word	8en 	8en


## Additional features

If we want to export additional features, we just have to mention them.
In order to do so and not change the result set, put a `*` behind the feature.

The `*` means: *always true, no matter what's in the feature, even if there is nothing in there*.
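
The effect of the `*` can be pictured with a toy model in plain Python. This is not how Text-Fabric implements search: the word list and the helper `keep` are hypothetical, and one footnote text is shortened. The second case models a bare feature name, which in a search template only matches when the feature has a value.

```python
# Toy data: each word has a transcription and possibly a footnote
# (None stands for "no value").
words = [
    {"transo": "secuideerden1", "fnote": "2. Secuideerden, ..."},
    {"transo": "14en", "fnote": None},
    {"transo": "Pijlen1", "fnote": "2. Kort na 31 maart 1612."},
]


def keep(word, feature, always_true):
    """Hypothetical stand-in for a feature condition in a search template."""
    if always_true:
        # Models `feature*`: the condition holds whatever the value is.
        return True
    # Models a bare `feature`: the condition holds only if there is a value.
    return word[feature] is not None


with_star = [w["transo"] for w in words if keep(w, "fnote", always_true=True)]
without_star = [w["transo"] for w in words if keep(w, "fnote", always_true=False)]

print(with_star)
print(without_star)
```

In this model the starred condition keeps every word, so it adds a column without shrinking the result set.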

Let's ask for the footnotes on the words.

In [10]:
query = '''
volume n=1
  line
    word transo~([0-9][A-Za-z])|([a-zA-Z][0-9]) fnote
'''
results = A.search(query)

  2.91s 49 results


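This time we get fewer results: 49 instead of 683.
The query mentions `fnote` without a `*`, so it keeps only the words that actually have a footnote;
with `fnote*` the result set would have stayed the same.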

We do the export again and peek at the results.

In [11]:
A.export(results, condenseType='cell')
checkout()

R	S1	S2	S3	NODE1	TYPE1	n1	NODE2	TYPE2	NODE3	TYPE3	TEXT3	fnote3	transo3
1	1	5	11	5572953	volume	1	5054836	line	674	word	secuideerden1 	2. Secuideerden, waarschijnlijk : seduceerden, in verleiding brachten.	secuideerden1
2	1	8	4	5572953	volume	1	5054931	line	1637	word	Pijlen1 	2. Kort na 31 maart 1612.	Pijlen1
3	1	21	6	5572953	volume	1	5055279	line	5285	word	aldaer1 	2. NL het eiland Batjan.	aldaer1
4	1	23	5	5572953	volume	1	5055353	line	6128	word	Gapi1 	2. Ketjil Gapi; mij onbekende Temataanse grote; ketjil is de titel, die personen van
vorstelijken bloede voeren, Gapi is een oude naam voor de Banggai-archipel. Itsie is blijkbaar
de naam van den djogugu, zoals de Nederlanders die verstonden.	Gapi1
5	1	25	10	5572953	volume	1	5055432	line	7044	word	Engelandt1 	2. Vermoedelijk is hier bedoeld de gezant der Staten-Generaal in Engeland, sir Noel
de Caron.	Engelandt1
6	1	35	21	5572953	volume	1	5055708	line	9957	word	Gilolespunct1 	2. De vorst van Djailolo was vanouds een der vier Molukse koni

As you see, you have an extra column **fnote3** with footnote text in it, if a word has a footnote.

This gives you a lot of control over the generation of spreadsheets.

# Not from queries

You can also export lists of node tuples that are not obtained by a query.

We are interested in words in the original letters that contain a mixture of letters and digits.

Let's collect them.

We collect those words as a list of 1-tuples that contain the corresponding node for that word.

In [12]:
import re

In [13]:
mixedRe = re.compile(
    r"""
    (?:
        [a-z][0-9]
    )
    |
    (?:
        [0-9][a-z]
    )""",
    re.I | re.X,
)
tuples = tuple((w,) for w in F.otype.s("word") if mixedRe.search(F.transo.v(w) or ""))
print(tuples[0:10])
len(tuples)

((674,), (1067,), (1637,), (2030,), (2305,), (5285,), (5659,), (5813,), (5986,), (6128,))


7883
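
To see what the pattern accepts without loading the corpus, here is a small self-contained sketch that applies the same regular expression to a handful of strings. The sample list is chosen for illustration: the first three strings occur as words in the export above, the last two are plain strings without a letter-digit mixture.

```python
import re

# Same pattern as above: a letter next to a digit, in either order.
mixed_re = re.compile(
    r"""
    (?:
        [a-z][0-9]
    )
    |
    (?:
        [0-9][a-z]
    )""",
    re.I | re.X,
)

# Illustrative samples, not read from the corpus.
samples = ["secuideerden1", "14en", "Pijlen1", "hebben", "1613"]

flags = [bool(mixed_re.search(sample)) for sample in samples]
for sample, flag in zip(samples, flags):
    print(sample, flag)
```

In the cell above, the `or ""` plays a similar protective role: it lets the search run on an empty string when a word has no `transo` value.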

Let's do a bare export:

In [14]:
A.export(tuples)
checkout()

R	S1	S2	S3	NODE1	TYPE1	TEXT1	n1
1	1	5	11	674	word	secuideerden1 	
2	1	6	6	1067	word	14en 	
3	1	8	4	1637	word	Pijlen1 	
4	1	9	5	2030	word	24en 	
5	1	10	5	2305	word	8en, 	
6	1	21	6	5285	word	aldaer1 	
7	1	22	5	5659	word	13en 	
8	1	22	19	5813	word	7en 	
9	1	22	33	5986	word	8en 	


---

# Contents

* **[start](start.ipynb)** start computing with this corpus
* **[search](search.ipynb)** turbo charge your hand-coding with search templates
* **[compute](compute.ipynb)** sink down a level and compute it yourself
* **exportExcel** make tailor-made spreadsheets out of your results
* **[annotate](annotate.ipynb)** export text, annotate with BRAT, import annotations
* **[share](share.ipynb)** draw in other people's data and let them use yours

CC-BY Dirk Roorda