<img align="right" src="images/tf.png" width="200"/>
<img align="right" src="images/huc.png" width="200"/>
<img align="right" src="images/logo.png" width="200"/>

---

To get started: consult [start](start.ipynb)

---

# Export to Excel

In a notebook, you can run searches, view the results in a tabular display, and zoom in on individual items with
pretty displays.

But there are times when you want to take your results outside Text-Fabric, outside a notebook, outside Python, and just
work with them in other programs, such as Excel.

You may want to do that not only with query results, but with any list of tuples of nodes.

There is a function for that, `A.export()`, and here we show what it can do.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import os
import re
from tf.app import use

In [3]:
A = use("clariah/wp6-missieven", hoist=globals())

This is Text-Fabric 9.4.3
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

44 features found and 0 ignored


# Inspect the contents of a file
We write a function that can peek into a file on your system and show the first few lines.
We'll use it to inspect the exported files that we are going to produce.

In [4]:
EXPORT_FILE = os.path.expanduser("~/Downloads/results.tsv")
UPTO = 10


def checkout():
    """Print the first UPTO lines of the exported file."""
    with open(EXPORT_FILE, encoding="utf_16") as fh:
        for (i, line) in enumerate(fh):
            if i >= UPTO:
                break
            print(line.rstrip("\n"))

# Encoding

Our exported `.tsv` files open in Excel without hassle, even if they contain non-Latin characters.
That is because TF writes such files in an
encoding that works well with Excel: `utf_16_le`.
You can just open them in Excel; there is no need for conversion before or after opening these files.

Should you want to process these files by means of a (Python) program,
take care to read them with encoding `utf_16`.
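
A minimal sketch of such processing, using only the standard library. It does not call Text-Fabric: the small file written here is a stand-in for a real export, and the helper name `read_tsv` and the sample rows are illustrative assumptions.

```python
import csv
import os
import tempfile


def read_tsv(path, encoding="utf_16"):
    """Return all rows of a tab-separated file as lists of strings."""
    # newline="" leaves line-ending handling to the csv module
    with open(path, encoding=encoding, newline="") as fh:
        # QUOTE_NONE: quote characters in the text are ordinary data
        return list(csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE))


# Stand-in for an exported file (assumption: not produced by A.export())
samplePath = os.path.join(tempfile.mkdtemp(), "sample.tsv")
with open(samplePath, "w", encoding="utf_16", newline="") as fh:
    fh.write("R\tNODE1\tTEXT1\n1\t148\t3folio’s. \n")

rows = read_tsv(samplePath)
header, first = rows[0], rows[1]
print(header)
print(first)
```

Reading with `QUOTE_NONE` is a deliberate choice in this sketch: the text columns may contain quote characters that should be treated as plain data.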

# Example query

We first run a query in order to export the results.

In [5]:
query = """
volume n=1
  line
    word transr~([0-9][A-Za-z])|([a-zA-Z][0-9])
"""
results = A.search(query)

  2.64s 84 results


# Bare export

You can export the table of results to Excel.

The following command writes a tab-separated file `results.tsv` to your downloads directory.

You can specify arguments `toDir=directory` and `toFile=file name` to write to a different file.
If the directory does not exist, it will be created.

We stick to the default, however.

In [6]:
A.export(results)

Check out the contents:

In [7]:
checkout()

R	S1	S2	S3	NODE1	TYPE1	n1	NODE2	TYPE2	TEXT2	NODE3	TYPE3	TEXT3	transr3
1	1	3	7	6638970	volume	1	6018779	line	961, 1folio. 	65	word	1folio. 	1folio
2	1	3	3	6638970	volume	1	6018785	line	961, copie, 5folio's. 	93	word	5folio'	5folio
3	1	3	3	6638970	volume	1	6018791	line	961, 1folio. 	117	word	1folio. 	1folio
4	1	4	2	6638970	volume	1	6018797	line	961, 3folio’s. 	148	word	3folio’s. 	3folio’s
5	1	7	5	6638970	volume	1	6018946	line	961, 2folio’s, 2 exemplaren, 1 origineel, 1 copie. 	1807	word	2folio’s, 	2folio’s
6	1	8	14	6638970	volume	1	6018989	line	961, 7folio’s. 	2283	word	7folio’s. 	7folio’s
7	1	16	3	6638970	volume	1	6019343	line	961, 2folio's. 	6367	word	2folio'	2folio
8	1	16	3	6638970	volume	1	6019349	line	961, 3folio's. 	6390	word	3folio'	3folio
9	1	17	3	6638970	volume	1	6019357	line	961, 1folio. 	6433	word	1folio. 	1folio


You see the following columns:

* **R** the sequence number of the result tuple in the result list
* **S1 S2 S3** the section as volume, page, line number, in separate columns
* **NODEi TYPEi** the node and its type, for each node **i** in the result tuple
* **TEXTi** the full text of node **i**, if the node type admits a concise text representation
* **transr3** the value of feature **transr**, since our query mentions the feature `transr` on node 3
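
The naming scheme in this list can be unpacked mechanically. The following sketch, which is not part of Text-Fabric, groups the per-node columns of the header line by their position in the result tuple. The function name `group_columns` is hypothetical, and the sketch assumes that no feature name ends in a digit and that no feature is called `S`.

```python
import re

# Header line as printed for the bare export of the example query
HEADER = (
    "R\tS1\tS2\tS3\tNODE1\tTYPE1\tn1\tNODE2\tTYPE2\tTEXT2"
    "\tNODE3\tTYPE3\tTEXT3\ttransr3"
)


def group_columns(header_line):
    """Map each node position to the column labels that belong to it."""
    groups = {}
    for name in header_line.split("\t"):
        # Skip the fixed leading columns: R and the section columns S1, S2, ...
        if name == "R" or re.fullmatch(r"S\d+", name):
            continue
        match = re.fullmatch(r"(.+?)(\d+)", name)
        if match is None:
            continue
        label, position = match.group(1), int(match.group(2))
        groups.setdefault(position, []).append(label)
    return groups


groups = group_columns(HEADER)
for position in sorted(groups):
    print(position, groups[position])
```

For this header, node 1 gets `NODE`, `TYPE` and `n`, node 2 gets `NODE`, `TYPE` and `TEXT`, and node 3 gets `NODE`, `TYPE`, `TEXT` and `transr`.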

# Poorer exports

If you do not need the full text of the lines, you can leave them out by specifying a smaller *condense type*.

The export function provides text for all nodes whose type is not too big.
What is too big is determined by the condense type.

In this corpus, the default condense type is `line`. Node types bigger than lines will not get text.

Now, if we change the `condenseType` to something smaller than `line`, e.g. `word`, the line text will be suppressed.

In [8]:
A.export(results, condenseType="word")

  2.66s ERROR in table(): illegal node type in "condenseType=word"
  2.66s Legal values are: cell, folio, head, letter, line, note, page, para, remark, row, subhead, table, volume


''

Oops, `word` is the slot type, so we cannot use it as a condense type.
Let's take `cell` instead.

In [9]:
A.export(results, condenseType="cell")
checkout()

R	S1	S2	S3	NODE1	TYPE1	n1	NODE2	TYPE2	NODE3	TYPE3	TEXT3	transr3
1	1	3	7	6638970	volume	1	6018779	line	65	word	1folio. 	1folio
2	1	3	3	6638970	volume	1	6018785	line	93	word	5folio'	5folio
3	1	3	3	6638970	volume	1	6018791	line	117	word	1folio. 	1folio
4	1	4	2	6638970	volume	1	6018797	line	148	word	3folio’s. 	3folio’s
5	1	7	5	6638970	volume	1	6018946	line	1807	word	2folio’s, 	2folio’s
6	1	8	14	6638970	volume	1	6018989	line	2283	word	7folio’s. 	7folio’s
7	1	16	3	6638970	volume	1	6019343	line	6367	word	2folio'	2folio
8	1	16	3	6638970	volume	1	6019349	line	6390	word	3folio'	3folio
9	1	17	3	6638970	volume	1	6019357	line	6433	word	1folio. 	1folio


## Additional features

If we want to export additional features, we just have to mention them in the query.
In order to do so without changing the result set, put a `*` after the feature name.

The `*` means: *always true, no matter what's in the feature, even if there is nothing in there*.

Let's now ask for editorial words, and also retrieve whether they are folio references or not.

In [10]:
query = """
volume n=1
  line
    word transr~([0-9][A-Za-z])|([a-zA-Z][0-9]) isfolio*
"""
results = A.search(query)

  4.63s 84 results


The same number of results as before: 84.

We do the export again and peek at the results.

In [11]:
A.export(results, condenseType="cell")
checkout()

R	S1	S2	S3	NODE1	TYPE1	n1	NODE2	TYPE2	NODE3	TYPE3	TEXT3	isfolio3	transr3
1	1	3	7	6638970	volume	1	6018779	line	65	word	1folio. 	1	1folio
2	1	3	3	6638970	volume	1	6018785	line	93	word	5folio'	1	5folio
3	1	3	3	6638970	volume	1	6018791	line	117	word	1folio. 	1	1folio
4	1	4	2	6638970	volume	1	6018797	line	148	word	3folio’s. 	1	3folio’s
5	1	7	5	6638970	volume	1	6018946	line	1807	word	2folio’s, 	1	2folio’s
6	1	8	14	6638970	volume	1	6018989	line	2283	word	7folio’s. 	1	7folio’s
7	1	16	3	6638970	volume	1	6019343	line	6367	word	2folio'	1	2folio
8	1	16	3	6638970	volume	1	6019349	line	6390	word	3folio'	1	3folio
9	1	17	3	6638970	volume	1	6019357	line	6433	word	1folio. 	1	1folio


As you see, there is an extra column, **isfolio3**.

This gives you a lot of control over the generation of spreadsheets.

# Not from queries

You can also export lists of node tuples that are not obtained by a query.

We are interested in words in the original letters that contain a mixture of letters and digits.

Let's collect them as a sequence of 1-tuples, each containing the node of one such word.

In [12]:
mixedRe = re.compile(
    r"""
    (?:
        [a-z][0-9]
    )
    |
    (?:
        [0-9][a-z]
    )""",
    re.I | re.X,
)
tuples = tuple((w,) for w in F.otype.s("word") if mixedRe.search(F.transo.v(w) or ""))
print(tuples[0:10])
len(tuples)

((828,), (1270,), (2295,), (2819,), (3381,), (6908,), (7393,), (7591,), (7764,), (7954,))


7887
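
To see what the pattern accepts, here is a small check that needs no corpus. It rebuilds the same regular expression under the local name `mixed_re` and applies it to a handful of strings. The sample strings are chosen for illustration only.

```python
import re

# Same pattern as mixedRe, with the two alternatives commented
mixed_re = re.compile(
    r"""
    (?: [a-z][0-9] )   # a letter directly followed by a digit
    |
    (?: [0-9][a-z] )   # a digit directly followed by a letter
    """,
    re.I | re.X,
)

samples = ["secuideerden1", "14en", "8en,", "folio", "1661"]

# True where letters and digits occur side by side
matches = {sample: bool(mixed_re.search(sample)) for sample in samples}
for sample, hit in matches.items():
    print(f"{sample!r}: {hit}")
```

Strings made up of only letters or only digits do not match; a string matches as soon as a letter and a digit stand next to each other, in either order.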

Let's do a bare export:

In [13]:
A.export(tuples)
checkout()

R	S1	S2	S3	NODE1	TYPE1	TEXT1	n1
1	1	5	12	828	word	secuideerden1 	
2	1	6	7	1270	word	14en 	
3	1	8	15	2295	word	Pijlen1 	
4	1	9	8	2819	word	24en 	
5	1	10	5	3381	word	8en, 	
6	1	21	8	6908	word	aldaer1 	
7	1	22	5	7393	word	13en 	
8	1	22	23	7591	word	7en 	
9	1	22	37	7764	word	8en 	


---

# Contents

* **[start](start.ipynb)** start computing with this corpus
* **[search](search.ipynb)** turbo charge your hand-coding with search templates
* **[compute](compute.ipynb)** sink down a level and compute it yourself
* **exportExcel** make tailor-made spreadsheets out of your results
* **[annotate](annotate.ipynb)** export text, annotate with BRAT, import annotations
* **[share](share.ipynb)** draw in other people's data and let them use yours
* **[entities](entities.ipynb)** use results of third-party NER (named entity recognition)
* **[porting](porting.ipynb)** port features made against an older version to a newer version
* **[volumes](volumes.ipynb)** work with selected volumes only

CC-BY Dirk Roorda