# Convert from TEI to TF to WATM

We convert Suriano TEI to TF and then to WATM.

This notebook offers three ways to run the conversion. They all produce the same outcome,
but they differ in how much detail of the conversion they expose.

These are the levels:

* **Express**: one single command on the command line for the complete conversion;
* **Step by step**: one command for each main step of the conversion;
* **Debugging**: all the commands directly in Python, the intermediate data remains in memory and can be inspected.

# Production or development

Most steps are unaffected by the production/development setting.

In the first steps of the pipeline (*ingest* and *scan processing*) we prepare both the dev and the prod data.

The intermediate steps (*from DOCX to TEI*, *from TEI to TF*, *mark named entities*) are identical for prod and dev.

Only for the later steps (*convert TF to WATM*, *generate IIIF manifests*, *deploy to k8s*) is there a distinction between prod and dev.

For these steps we have commented out the line that does the dev version.

# Requirements

* zsh as command line shell (as in macOS);
* access to suitable k8s clusters, streamlined by the 
  [k-suite](https://code.huc.knaw.nl/tt/smart-k8s/-/blob/main/docs/k-suite.md);
* [Pandoc](https://pandoc.org);
* [Imagemagick](https://imagemagick.org);
* [Python](https://www.python.org) (3.12 or higher) with additional pip-installable modules:
  * text-fabric
  * docx2python
  * openpyxl

# Declare the version

Always set the version before running any cell in this notebook!

In [1]:
VERSION = "0.7.1"
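
The later steps work on an entity-enriched version whose name is the version followed by `e` (for example `0.7.1e`). Below is a minimal sketch of a sanity check you could run before anything else. The helper names `check_version` and `entity_version`, and the assumed `x.y.z` shape of a version, are illustrative assumptions and not part of `make.py`.

```python
import re

# Assumption: versions look like "0.7.1" (three dot-separated numbers).
VERSION_RE = re.compile(r"\d+\.\d+\.\d+")


def check_version(version):
    """Return the version unchanged, or raise if it does not look like x.y.z."""
    if not VERSION_RE.fullmatch(version):
        raise ValueError(f"unexpected version string: {version!r}")
    return version


def entity_version(version):
    """Name of the version that holds the entity-enriched data (hypothetical helper)."""
    return check_version(version) + "e"


print(entity_version("0.7.1"))
```

Failing early on a malformed version avoids writing results into an unintended directory.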

# Express: One shot

Here is the express, mindless way to convert the corpus.
If something goes wrong, you can follow the step-by-step section or the debugging section.

In [None]:
%%time

!python make.py all $VERSION
# !python make.py all $VERSION --no-prod

# Still on the command line: Step by step

Here we do the main steps of the conversion.

Every step is a separate run of a Python program.
When a step completes, everything needed to run the next step has been saved to disk
as result files and report files.

If the results of earlier steps are present, you can run just the following step.
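
Because every step leaves its results on disk, the sequence can be resumed at any step. The sketch below lists the `make.py` invocations used in this section and runs them from a chosen step onwards. The functions `build_commands` and `run_steps` are hypothetical helpers written for this illustration. The arguments are copied from the cells below, and the `--no-prod` flag is only added to the three steps that distinguish prod from dev.

```python
import subprocess


def build_commands(version, prod=True):
    """List the make.py invocations of the step-by-step section, in order."""
    ent = f"{version}e"  # entity-enriched version, as in "$VERSION"e
    flag = [] if prod else ["--no-prod"]
    steps = [
        ["ingest", "-"],
        ["scans"],
        ["docx2tei", version],
        ["tei2tf", "-", version],
        ["ner", version],
        ["watm", ent] + flag,
        ["iiif", ent] + flag,
        ["deploy"] + flag,
    ]
    return [["python", "make.py"] + step for step in steps]


def run_steps(commands, start=1, runner=subprocess.run):
    """Run the commands from step number `start` on, stopping at the first failure."""
    for cmd in commands[start - 1:]:
        # check=True makes subprocess.run raise CalledProcessError on a non-zero exit
        runner(cmd, check=True)
```

For example, `run_steps(build_commands("0.7.1"), start=4)` would resume at the TEI to TF step, provided the results of steps 1 to 3 are already on disk.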

## Step 1: Scan ingest

In [9]:
%%time

!python make.py ingest -

Ingest scans ...
CPU times: user 9.15 ms, sys: 7.67 ms, total: 16.8 ms
Wall time: 1.29 s


## Step 2: Scan processing

In [10]:
%%time

!python make.py scans

Process scans ...
CPU times: user 6.13 ms, sys: 7.51 ms, total: 13.6 ms
Wall time: 615 ms


## Step 3: From DOCX to TEI

In [11]:
%%time

!python make.py docx2tei $VERSION

DOCX ==> TEI files ...
DOCX => simple TEI per filza ...
DOCX => headers ...
Collecting transcribers ...
Collecting page scans ...
Collecting excel metadata ...
simple TEI per filza => enriched TEI per letter ...
CPU times: user 12.8 ms, sys: 9.18 ms, total: 22 ms
Wall time: 1.66 s


## Step 4: From TEI to TF

In [15]:
%%time

!python make.py tei2tf - $VERSION

TEI => TF ...
	Validating TEI ...
	Converting TEI ...
	Loading TF ...
CPU times: user 322 ms, sys: 119 ms, total: 441 ms
Wall time: 1min 9s


## Step 5: Mark named entities

In [16]:
%%time

!python make.py ner $VERSION

Annotate named entities ...
	Loading TF  ...
5 rows with a duplicate name:
  r305: William, Count of Nassau-Siegen
  r359: William of Orange
  r361: Maurice of Nassau
  r506: Henry II, Duke of Lorraine
  r645: Guillaume III de Melun
1 row without a name:
	e.g.: 365
149 rows without triggers:
	e.g.: 6, 8, 16, 21, 24, 26, 29, 30, 32, 34
Clash: Gugliemo di Nassau: r9 vs r50
Clash: Nicolò Perez: r75 vs r571
Clash: conte di Frusten: r148 vs r149
Clash: conte di Wanderlip: r150 vs r151
Clash: colonello Sciombergh: r253 vs r352
Clash: colonnello Sciombergh: r253 vs r352
Clash: signor di Rocalaura: r386 vs r459
Clash: colonel Rocalaura: r386 vs r459
Clash: monsignor di Rocalaura: r386 vs r459
Clash: colonello Rocalaura: r386 vs r459
	491 entities targeted with 6063 occurrences. See ~/gitlab.huc.knaw.nl/suriano/letters/_temp/ner/0.4.3/.people.0.6/hits.tsv
	Loading TF with entities ...
CPU times: user 306 ms, sys: 107 ms, total: 413 ms
Wall time: 1min 3s


## Step 6: Convert TF to WATM

In [3]:
%%time

!python make.py watm "$VERSION"e
#!python make.py watm "$VERSION"e --no-prod

TF => WATM ...
	Loading TF ...
	Making WATM for version 0.4.3e
	Writing WATM ...
	Testing WATM ...
	OK - whether all tests passed
CPU times: user 45.9 ms, sys: 20.7 ms, total: 66.7 ms
Wall time: 9.3 s


## Step 7: Generate IIIF manifests

### Development

In [4]:
%%time

!python make.py iiif "$VERSION"e
#!python make.py iiif "$VERSION"e --no-prod

Generate IIIF manifests ...
CPU times: user 13.7 ms, sys: 13.1 ms, total: 26.8 ms
Wall time: 2.24 s


## Step 8: Deploy to k8s and TeamText VM

In [None]:
%%time

!python make.py deploy
#!python make.py deploy --no-prod

# In debugging mode

Now we dig a bit deeper and do all the steps while keeping the program in memory.
This makes it possible to inspect all intermediate results.

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
from tf.app import use
from tf.convert.tei import TEI
from tf.convert.addnlp import NLPipeline
from tf.convert.watm import WATM
from tf.convert.iiif import IIIF
from tf.advanced.helpers import dm
from tf.core.files import fileCopy, fileExists

from processscans import Scans
from processdocs import TeiFromDocx, normalizeChars
from processhelpers import nerMeta, NER_NAME, NERIN_FILE, NEROUT_FILE, SOURCEBASE, PAGESEQ_JSON
from processhelpers import NER_NAME_M, NERIN_FILE_M, NEROUT_FILE_M

## Step 0: Initialization

**N.B.** Check the VERSION variable here!

In [4]:
VERSION

'0.7.1'

## Step 1: Scan ingest

In [5]:
SC = Scans(silent=False, force=False)

In [10]:
SC.ingest(dry=False)

	Already ingested covers. Remove ~/gitlab.huc.knaw.nl/suriano/letters/scans/covers or pass --force to ingest again
	Already ingested pages. Remove ~/gitlab.huc.knaw.nl/suriano/letters/scans/pages or pass --force to ingest again


## Step 2: Scan processing

In [11]:
SC = Scans(silent=False, force=False)
SC.process()

Already present: sizes file originals (covers)
Already present: sizes file originals (pages)
Already present: thumbnails (covers)
Already present: sizes file thumbnails (covers)
Already present: thumbnails (pages)
Already present: sizes file thumbnails (pages)


## Step 3: From DOCX to TEI

You might need to do

```
pip install docx2python
```

In [6]:
TFD = TeiFromDocx(silent=False)

In [13]:
TFD.task("pandoc")

DOCX => simple TEI per filza ...
	02.docx ... uptodate
	03.docx ... uptodate
	04.docx ... uptodate
	05.docx ... uptodate
	06.docx ... uptodate
	07.docx ... uptodate
	08.docx ... uptodate
	09.docx ... uptodate
	10.docx ... uptodate
	11.docx ... uptodate
	12.docx ... uptodate


In [14]:
TFD.task("headers")

DOCX => headers ...
	02.docx
	03.docx
	04.docx
	05.docx
	06.docx
	07.docx
	08.docx
	09.docx
	10.docx
	11.docx
	12.docx
	OK: All headers are OK
Angelo              : 1181 pages in 11 filzas
Cristina            : 1034 pages in 10 filzas
Federica            : 684 pages in  7 filzas
Filippo             : 1084 pages in  9 filzas
Flavia              : 1282 pages in 11 filzas
Giorgia             : 934 pages in  9 filzas
Renzo               :  56 pages in  1 filza 
Ruben               : 1162 pages in 10 filzas
Vera                : 966 pages in 10 filzas
Vera, Federica      : 210 pages in  1 filza 


In [15]:
TFD.task("tei")

Collecting transcribers ...
Collecting page scans ...
  0 x error
8766 x good
Collecting excel metadata ...
	found metadata for 725 letters
simple TEI per filza => enriched TEI per letter ...
	02.xml
	03.xml
	04.xml
	05.xml
	06.xml
	07.xml
	08.xml
	09.xml
	10.xml
	11.xml
	12.xml
Translated italian editorial phrases (219 x 14158)
Metadata in summary file corresponds to transcribed letters
Pages with    transcription and    scan:       8766
Pages with    transcription and missing scan:     0
Pages with    transcription and no scan:          0
Pages with no transcription and    scan:          0
See ~/gitlab.huc.knaw.nl/suriano/letters/datasource/transcriptions/report/scantrans.tsv


## Step 4: From TEI to TF

### Check the validity of the TEI

In [7]:
Tei = TEI(verbose=-1, sourceBase=SOURCEBASE, tei="", tf=VERSION)

In [17]:
Tei.task(check=True, verbose=1, validate=True)

TEI to TF checking: ~/gitlab.huc.knaw.nl/suriano/letters/datasource/tei => ~/gitlab.huc.knaw.nl/suriano/letters/datasource/report
Processing instructions are ignored
XML validation will be performed
Analysing ~/github/annotation/text-fabric/tf/tools/tei/tei_all.xsd
	round   1: 232 changes
INFO: Needs dcr.xsd (exists)
Analysing ~/github/annotation/text-fabric/tf/tools/tei/tei_all.xsd
	round   1: 232 changes
Analysing ~/gitlab.huc.knaw.nl/suriano/letters/datasource/schema/suriano.xsd
	round   1:  68 changes
137 identical override(s)
  1 changing override(s)
	metamark mixed ==> pure
Section model I
	Start folder 02:
		   1 suriano                                001.xml                                           
		   2 suriano                                002.xml                                           
		   3 suriano                                003.xml                                           
		   4 suriano                                004.xml                                   

True

### Convert the data

In [18]:
Tei.good = True
Tei.task(convert=True, verbose=0)

Line model II with ln nodes for lines between lb elements
Page model II with page nodes for pages started by pb elements  keeping the pb elements
Section model I
Processing instructions are ignored
Analysing ~/github/annotation/text-fabric/tf/tools/tei/tei_all.xsd
	round   1: 232 changes
Analysing ~/github/annotation/text-fabric/tf/tools/tei/tei_all.xsd
	round   1: 232 changes
Analysing ~/gitlab.huc.knaw.nl/suriano/letters/datasource/schema/suriano.xsd
	round   1:  68 changes
137 identical override(s)
  1 changing override(s)
	metamark mixed ==> pure
  0.00s Importing data from walking through the source ...
   |     0.00s Preparing metadata... 
   |     0.00s OK
   |     0.00s Following director... 
	Start folder 02:
		   1 suriano                                001.xml                                           
		   2 suriano                                002.xml                                           
		   3 suriano                                003.xml                         

True

### Configure a TF app

The TF app has configuration settings, a bit of custom code, and documentation.

Most of it will be generated now, but there are ways to keep custom additions intact.

In [19]:
Tei.task(app=True)

True

### Use the new dataset

The final proof that the conversion has worked is to load the data.
The first time the data is loaded, several checks and pre-computations are performed.
After that, loading will be much quicker.

In [8]:
A = use(f"{Tei.org}/{Tei.repo}:clone", backend=Tei.backend, checkout="clone", silent="verbose", hoist=globals())

**Locating corpus resources ...**

This is Text-Fabric 12.5.4
37 features found and 0 ignored
  0.28s Dataset without structure sections in otext:no structure functions in the T-API
  1.01s All features loaded / computed - for details use TF.isLoaded()
  0.06s All additional features loaded - for details use TF.isLoaded()


Name,# of nodes,# slots / node,% coverage
folder,11,157692.09,100
file,725,2392.57,100
body,725,2202.1,92
text,725,2202.1,92
div,4148,736.35,176
table,243,217.58,3
teiHeader,725,190.47,8
page,8764,157.79,80
correspDesc,725,118.49,5
sourceDesc,725,56.9,2


## Step 5: Mark named entities

First stage: we use the human-crafted triggers as is.

In the separate notebook [nerCorrect](nerCorrect.ipynb) you can then collect
additional spelling variants of the triggers.

Some of these variants are not fit to be used as triggers. You have to make a list of the variants that are not valid triggers
and store them in the file `novariant.txt`, one per line.

Then you can merge the new valid variants automatically with the human-crafted triggers, and
the result is a new spreadsheet, whose name starts with the name of the original spreadsheet and ends with `-merged`.

All this happens in the nerCorrect notebook.
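
The filtering idea can be sketched as follows. This is an illustration only, not the code of the nerCorrect notebook: the functions `read_novariants` and `merge_triggers` are hypothetical, the sample data is invented, and the case-insensitive comparison is an assumption that mirrors the `caseSensitive=False` setting used below.

```python
def read_novariants(text):
    """Parse the content of a novariant file: one rejected variant per line."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def merge_triggers(triggers, variants, rejected):
    """Add accepted variants to the triggers, keeping order and dropping duplicates."""
    merged = list(triggers)
    seen = {t.lower() for t in triggers}
    for variant in variants:
        key = variant.lower()
        if key in rejected or key in seen:
            continue  # rejected by hand, or already present
        merged.append(variant)
        seen.add(key)
    return merged


# Illustrative data only, not taken from the real spreadsheet.
triggers = ["colonello Sciombergh"]
variants = ["colonnello Sciombergh", "colonello", "Colonello Sciombergh"]
rejected = read_novariants("colonello\n\n")

print(merge_triggers(triggers, variants, rejected))
```

Here the bare title `colonello` is rejected because it is listed as a non-variant, and `Colonello Sciombergh` is dropped because it only differs in case from an existing trigger.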

Second stage: use the merged spreadsheet.

If we detect the presence of the merged spreadsheet, we do the second stage. If not, we do the first stage.

In [77]:
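if fileExists(NERIN_FILE_M):
    # Stage 2: the merged spreadsheet exists, so we work with it
    nerStage = 2
    nerName = NER_NAME_M
    nerInFile = NERIN_FILE_M
    nerOutFile = NEROUT_FILE_M
    # copy the original, human-crafted spreadsheet as well
    fileCopy(NERIN_FILE, NEROUT_FILE)
else:
    # Stage 1: only the human-crafted spreadsheet is available
    nerStage = 1
    nerName = NER_NAME
    nerInFile = NERIN_FILE
    nerOutFile = NEROUT_FILE

# copy the spreadsheet of the chosen stage
fileCopy(nerInFile, nerOutFile)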
print(f"Stage {nerStage}: working with sheet {nerName}")

Stage 2: working with sheet persons-merged


In [78]:
NE = A.makeNer(normalizeChars=normalizeChars, caseSensitive=False, silent=False)

In [84]:
NE.setTask(f".{nerName}", force=True)

Annotation set 🧾 persons-merged has 12487 annotations
SHEET data: computing from scratch ...
--------------
Reading sheets
--------------
Sheet with 843 rows and 18 columns


1 row with a duplicate name:
  r520: () Henry II, Duke of Lorraine also in r21
2 rows with scope mistakes:
  r325: not a valid location: ADD RANGE
  r597: not a valid location: ADD RANGE
9 rows without triggers:
	e.g.: 127, 214, 232, 233, 234, 424, 452, 463, 487



-------------------
Checking scopes ...
-------------------

--
()
--


Ambi: 'colonello broch': 
  Sir William Brog: 45
  William Brog: 819
Ambi: 'colonello gre': 
  Paulus Grenu: 215
  Sir Andrew Gray: 654
Ambi: 'conte di candal': 
  Henri de Nogaret de la Valette: 523
  Henry de Nogaret de La Valette: 779
Ambi: 'conte di solms': 
  Count Frederick of Solms-Rödelheim: 357
  Johann Albrecht I, Count of Solms-Braunfels: 484
Ambi: 'conti di solms': 
  Count Frederick of Solms-Rödelheim: 357
  Johann Albrecht I, Count of Solms-Braunfels: 484
Ambi: 'gugliemo di nassau': 
  William, Count of Nassau-Siegen: 8
  William Louis, Count of Nassau-Dillenburg: 47
Ambi: 'monsignor di betuna': 
  Syrius de Béthune: 139
  Philippe de Béthune: 271
Ambi: 're di polonia': 
  Ladislaus IV of Poland: 211
  Sigismund III Vasa: 497
Ambi: 'signor marioni': 
  Pier Antonio Mariano: 481
  Pier Antonio Marioni: 600
Ambiguous triggers: 9 x



--
02
--

--
03
--

--
04
--

--
05
--

--
06
--

--
07
--

-------------
08.001-08.089
-------------

---------
08.090-08
---------

---------
09-09.021
---------

---------
09.022-09
---------

---------
10-10.017
---------

-------------
10.018-10.020
-------------

---------
10.021-10
---------

------------
11-11.026.22
------------

---------
11.026.23
---------

------------
11.026.24-11
------------

------------
12-12.037.27
------------

---------
12.037.28
---------

----------------
12.037.29-12.041
----------------

-------------
12.042-12.044
-------------

---------
12.045-12
---------
  0.00s Looking up occurrences of many candidates ...
0
duchessa => {('duchessa', 'di', 'sassonia')}
1
duchessa => {('vedova', 'duchessa', 'di', 'sassonia')}
2
3
4
5
6
7
8
9
bStrings=('nel', 'discorso', 'venendosi', 'a', 'proposito', 'della', 'paga', 'che', 'si', 'dava', 'dalla', 'lega', 'del', 'signor', 'conte', ';', 'disse', 'liberamente', 'sua', 'eccellenza', 'non', 'vego', ',', 'che',
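
The `Ambi:` lines in the output above report triggers that are claimed by more than one row of the spreadsheet. A minimal sketch of such a check is given here. It is not how Text-Fabric computes it: `ambiguous_triggers` is a hypothetical function, and the sample rows are illustrative (the row numbers echo the report, the trigger `Sigismondo` is invented).

```python
from collections import defaultdict


def ambiguous_triggers(rows):
    """Map each trigger used by more than one row to the sorted row numbers."""
    by_trigger = defaultdict(set)
    for row, triggers in rows.items():
        for trigger in triggers:
            # compare case-insensitively, as an assumption
            by_trigger[trigger.lower()].add(row)
    return {t: sorted(r) for t, r in by_trigger.items() if len(r) > 1}


# Illustrative rows: row number -> triggers
rows = {
    211: ["re di Polonia"],
    497: ["re di Polonia", "Sigismondo"],
    600: ["signor Marioni"],
}
print(ambiguous_triggers(rows))
```

An ambiguous trigger is typically resolved either by restricting the rows to different scopes or by merging rows that describe the same person.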

In [80]:
NE.reportHits(showNoHits=True)

No slot is covered by more than one trigger


Triggers without hits: 153x:


Looking up 153 triggers in 2 passes over the corpus .

duchessa di bransvich (): 	12@032:44 x 1
durante di prigné (): 	03@018:11 x 1
signor conte gioan (02-05): 	06@034:160 x 1
ambasciator de francia (03): 	12@030:25 x 1
ambasciatore di franza (03): 	02@012:15 x 1, 08@059:18 x 1
ambasciatori francia (03): 	04@036:9 x 1
conte di olach (03): 	12@026:145 x 1
signori francesco (03): 	10@088:8 x 1
ambasciator inglese straordinario (04): 	08@067:14 x 1
don jan (04): 	09@023:43 x 1
maximiliano (04): 	10@081:93 x 1
signor di brederad (04): 	11@016:41 x 1
figliolo maggiore (08): 	04@016:7 x 1, 11@100:23 x 1
la sua sola (08): 	03@038:21 x 1, 04@052:12 x 1
la sua spesa (08): 	09@034:41 x 1
elettore di brandenburgh (08-12): 	03@030:61 x 2
marchese di brandemburg (08-12): 	02@005:8 x 1, 02@012:16 x 1, 06@031:35 x 1
marchese elettore di brandemburg (08-12): 	03@029:47 x 1
eccellentissimo signor general (08.001-08.089): 	03@018:7 x 1, 04@060:7 x 1, 04@071:7 x 1, 04@075:7 x 1, 05@003:12 x 1, 05@009:13 x 1, 05@042:14 x 1, 05@048:20 x 1, 11@066:13 x 1, 12@0

.

jan janson vander linden (): 	07@018:50 x 1
eccellentissimo signor general o (08.001-08.089): 	05@048:20 x 1
a suo figliolo (11.026.23): 	04@028:8 x 1
al figliolo (11.026.23): 	05@049:36 x 1, 11@039:9 x 1, 12@010:18 x 1



Entities targeted:            780
Triggers searched for:       2088
Triggers without hits:        153
 - completely covered:        119
 - missing hits:               34
Triggers with hits:          1935
Total hits:                 12487

All hits in report file:      ~/gitlab.huc.knaw.nl/suriano/letters/_temp/ner/0.7.1/.persons-merged/hits.tsv
Triggers by slot in file:     ~/gitlab.huc.knaw.nl/suriano/letters/_temp/ner/0.7.1/.persons-merged/triggerBySlot.tsv
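
The counts in this summary can be cross-checked with a little arithmetic. The sketch below assumes, from the layout of the report, that the triggers without hits are split into the "completely covered" and "missing hits" groups. `check_tally` is a hypothetical helper.

```python
def check_tally(searched, with_hits, without_hits, covered, missing):
    """Return a list of inconsistencies between the reported counts (empty if none)."""
    problems = []
    if with_hits + without_hits != searched:
        problems.append("with hits + without hits != searched")
    if covered + missing != without_hits:
        problems.append("covered + missing != without hits")
    return problems


# Numbers copied from the report above
print(check_tally(searched=2088, with_hits=1935, without_hits=153, covered=119, missing=34))
```

An empty list means the totals add up: 1935 + 153 = 2088 and 119 + 34 = 153.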


In [81]:
NE.triggerInterference()

Looking up 1507 interferences in 8 passes over the corpus ........
409 conflicting trigger pairs with 2414 conflicts
----------
same row (285 pairs)
----------
3            ()          : «hertocson»
3            ()          : «dirich hertocson»
              occurrences:  08@010:49 x 1, 08@010:70 x 2
22           ()          : «stotemburgh»
22           ()          : «guglielmo di oldenbernevelt signor di stotemburgh»
              occurrences:  11@099:34 x 1
22           ()          : «stotenburgh»
22           ()          : «stotenburgh l ’ altro figliolo minore di bernvelt»
              occurrences:  12@007:11 x 1
42           ()          : «doublet»
42           ()          : «commissario doublet»
              occurrences:  04@072:13 x 1
49           ()          : «trombol»
49           ()          : «agente d ’ inghilterra trombol»
              occurrences:  10@037:12 x 1
53           ()          : «quast»
53           ()          : «capitan quast»
              occurrences:  0

In [21]:
if nerStage == 1:
    NE.makeSheetOfSingleTokens()
    NE.setTask(f".{NER_NAME}-single", caseSensitive=False, force=True)
    NE.reportHits(showNoHits=True)
    NE.setTask(f".{NER_NAME}", caseSensitive=False)

In [80]:
nerMeta(*NE.getMeta(), silent=False)

In [51]:
NE.bakeEntities()

Entity consolidation for 9786 entity occurrences into version 0.7.1e
  9786 entity occurrences
   736 distinct entities
  0.00s Creating a dataset with entity nodes ...
  0.00s preparing and checking ...
  0.00s Feature overview: 34 for nodes; 2 for edges; 1 configs; 9 computed
   |     2.78s done
   |   Delete types: t                   : keep:   shift  nodes       1-1734392 to         1-1734392
   |   Delete types: author              : keep:   shift  nodes 1734393-1735117 to   1734393-1735117
   |   Delete types: bibl                : keep:   shift  nodes 1735118-1735842 to   1735118-1735842
   |   Delete types: biblScope           : keep:   shift  nodes 1735843-1736567 to   1735843-1736567
   |   Delete types: body                : keep:   shift  nodes 1736568-1737292 to   1736568-1737292
   |   Delete types: cell                : keep:   shift  nodes 1737293-1751627 to   1737293-1751627
   |   Delete types: chunk               : keep:   shift  nodes 1751628-1798599 to   1751628-17

True

We load the new data:

In [52]:
A = use(f"{Tei.org}/{Tei.repo}:clone", backend=Tei.backend, checkout="clone", silent="verbose", hoist=globals())

**Locating corpus resources ...**

This is Text-Fabric 12.5.4
40 features found and 0 ignored
   |     0.57s T otype                from ~/gitlab.huc.knaw.nl/suriano/letters/tf/0.7.1e
   |     6.34s T oslots               from ~/gitlab.huc.knaw.nl/suriano/letters/tf/0.7.1e
  6.91s Dataset without structure sections in otext:no structure functions in the T-API
   |     4.20s T str                  from ~/gitlab.huc.knaw.nl/suriano/letters/tf/0.7.1e
   |     3.50s T after                from ~/gitlab.huc.knaw.nl/suriano/letters/tf/0.7.1e
   |     0.09s T chunk                from ~/gitlab.huc.knaw.nl/suriano/letters/tf/0.7.1e
   |     0.00s T file                 from ~/gitlab.huc.knaw.nl/suriano/letters/tf/0.7.1e
   |     0.00s T folder               from ~/gitlab.huc.knaw.nl/suriano/letters/tf/0.7.1e
   |      |     0.13s C __levels__           from otype, oslots, otext
   |      |       17s C __order__            from otype, oslots, __levels__
   |      |     0.44s C __rank__             from otype, __order__
   |     

Name,# of nodes,# slots / node,% coverage
folder,11,157672.0,100
file,725,2392.26,100
body,725,2201.79,92
text,725,2201.79,92
div,4148,736.25,176
table,243,217.58,3
teiHeader,725,190.47,8
page,8766,157.74,80
correspDesc,725,118.49,5
sourceDesc,725,56.9,2


## Step 6: Convert TF to WATM

N.B. For docs click the WATM link in the output cell.

In [28]:
WA = WATM(A, "tei", skipMeta=False, prod=True)
# WA = WATM(A, "tei", skipMeta=False, prod=False)
WA.makeText()
WA.makeAnno()
WA.writeAll()
WA.testAll()

textRepoLevel is section level 'folder'


[WATM exporter documentation](https://annotation.github.io/text-fabric/tf/convert/watm.html)

	Writing WATM ...
Writing production data to ~/gitlab.huc.knaw.nl/suriano/letters/watm/0.7.0e-026/prod
Text file    0:    44334 segments to ~/gitlab.huc.knaw.nl/suriano/letters/watm/0.7.0e-026/prod/text-0.tsv
Text file    1:   117699 segments to ~/gitlab.huc.knaw.nl/suriano/letters/watm/0.7.0e-026/prod/text-1.tsv
Text file    2:   147380 segments to ~/gitlab.huc.knaw.nl/suriano/letters/watm/0.7.0e-026/prod/text-2.tsv
Text file    3:   109234 segments to ~/gitlab.huc.knaw.nl/suriano/letters/watm/0.7.0e-026/prod/text-3.tsv
Text file    4:   154253 segments to ~/gitlab.huc.knaw.nl/suriano/letters/watm/0.7.0e-026/prod/text-4.tsv
Text file   10:   131948 segments to ~/gitlab.huc.knaw.nl/suriano/letters/watm/0.7.0e-026/prod/text-10.tsv
Text files all:  1734533 segments to 11 files
Anno file    1:   400000 annotations written to ~/gitlab.huc.knaw.nl/suriano/letters/watm/0.7.0e-026/prod/anno-1.tsv
Anno file    2:   400000 annotations written to ~/gitlab.huc.knaw.nl/suriano/letters/watm/0.7.0e-

## Step 7: Generate IIIF manifests

In [29]:
II = IIIF(Tei.teiVersion, A, PAGESEQ_JSON, prod=True, silent=False)
# II = IIIF(Tei.teiVersion, A, PAGESEQ_JSON, prod=False, silent=False)
II.manifests()

Maximum dimensions: W = 8504 H = 5976
Average dimensions: W = 4138 H = 4869
Average deviation:  W = 1038 H =  660
Maximum dimensions: W = 5600 H = 5786
Average dimensions: W = 3253 H = 4359
Average deviation:  W =  477 H =  607
Collections:
   02 with  262 pages
   03 with  660 pages
   04 with  806 pages
   05 with  688 pages
   06 with  684 pages
   07 with  628 pages
   08 with  988 pages
   09 with  946 pages
   10 with 1062 pages
   11 with 1116 pages
   12 with  928 pages
IIIF manifests generated in ~/gitlab.huc.knaw.nl/suriano/letters/static/prod/manifests


## Step 8: Deploy to k8s and TeamText VM

**N.B.** You need access to the k8s cluster and to the TeamText VM.

That means:

* the LDAP of the relevant k8s clusters knows you;
* you have an ssh key-based login on the TeamText VPN;
* you work inside the firewall.

In [30]:
!./provision.sh watm

k-suite enabled
Context "k8s-10-26-2-0" modified.

Quick access to iiif-suriano : type khelp for an overview of commands.

WATM export version: 0.7.0e-026
anno-1.tsv                                    100%   17MB   6.5MB/s   00:02    
anno-3.tsv                                    100% 9659KB   7.6MB/s   00:01    
anno-2.tsv                                    100%   14MB   5.4MB/s   00:02    
anno2node.tsv                                 100% 5620KB   6.8MB/s   00:00    
text-6.tsv                                    100% 1227KB   2.8MB/s   00:00    
text-7.tsv                                    100% 1193KB   6.2MB/s   00:00    
text-5.tsv                                    100% 1014KB   3.7MB/s   00:00    
text-4.tsv                                    100%  910KB   2.0MB/s   00:00    
text-0.tsv                                    100%  267KB   1.1MB/s   00:00    
text-1.tsv                                    100%  704KB   6.5MB/s   00:00    
text-3.tsv                                   

In [31]:
!./provision.sh files

k-suite enabled
Context "k8s-10-26-2-0" modified.

Quick access to iiif-suriano : type khelp for an overview of commands.

copying to pod: prod/covers.html
copying to pod: prod/logo
copying to pod: prod/manifests
copying to pod: both/metadata


In [None]:
!./provision.sh prod images

## Step 9: Test the images

* [covers](https://data.suriano.huygens.knaw.nl/files/covers.html)

* [02.json](https://data.suriano.huygens.knaw.nl/files/manifests/02.json)

* [page 02_071r](https://data.suriano.huygens.knaw.nl/iiif/3/pages%2F02_071r.jpg/full/max/0/default.jpg)
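
The page link above follows the IIIF Image API URL pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`, where the slash inside the identifier is percent-encoded. Here is a small sketch for building such links for other pages. The function `image_url` is a hypothetical helper, and the assumption that every page lives under `pages/` with a `.jpg` extension is taken from the single example link.

```python
from urllib.parse import quote

IIIF_BASE = "https://data.suriano.huygens.knaw.nl/iiif/3"


def image_url(page, region="full", size="max", rotation=0, quality="default", fmt="jpg"):
    """Build an IIIF image URL for a page such as "02_071r"."""
    # safe="" makes quote() encode the "/" in the identifier as %2F
    identifier = quote(f"pages/{page}.jpg", safe="")
    return f"{IIIF_BASE}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"


print(image_url("02_071r"))
```

Passing a different `size`, for example `"200,"` for a width of 200 pixels, gives a quick way to spot-check many pages without downloading full-size images.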