## Parsing REPUBLIC hOCR Files

The scans of the printed RSG volumes have the following characteristics:

- all scans:
  - have two pages per scan
  - have up to 4 columns per scan, 2 per page
  - are around 4800 pixels wide: the left page runs up to roughly pixel 2400 and the right page starts at roughly pixel 2400
- scans of index pages
  - have no page numbers
- scans of resolution pages
  - have page numbers (left-side page is even, right-side page is odd)
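
The layout above gives a simple rule for assigning a column to a page side. The following is a minimal sketch, not the project's own code: `page_side` and `PAGE_SPLIT_X` are hypothetical names, and the 2400-pixel boundary is the rough value from the list above.

```python
PAGE_SPLIT_X = 2400  # rough boundary between left and right page, in pixels (assumed from the notes)


def page_side(column_left, column_right):
    """Return 'even' for the left-side page and 'odd' for the right-side page.

    The horizontal midpoint of the column is compared with the page boundary,
    so a column that slightly overlaps the gutter is still assigned to one page.
    """
    midpoint = (column_left + column_right) / 2
    return "even" if midpoint < PAGE_SPLIT_X else "odd"


print(page_side(250, 1250))   # a column on the left-side page
print(page_side(2500, 3500))  # a column on the right-side page
```

Using the midpoint rather than the left edge is a design choice: it keeps the result stable when a column is cut a little too wide.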
  
### Columns

The scans are normalized such that the columns are straight. The text width should be around 1000 pixels. Some columns are not cut out properly, resulting in columns that are either too narrow (some of the column text is missing) or too wide (the hOCR output contains partial text from two columns).
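
A width check along these lines could flag improperly cut columns. This is only a sketch: `check_column_width` is a hypothetical helper, and the 25% tolerance is an assumed value that would need tuning on real scans.

```python
EXPECTED_WIDTH = 1000  # approximate text width in pixels (from the notes above)
TOLERANCE = 0.25       # assumption: accept widths within 25% of the expected width


def check_column_width(width):
    """Classify a column width as 'too_narrow', 'too_wide' or 'ok'."""
    if width < EXPECTED_WIDTH * (1 - TOLERANCE):
        return "too_narrow"  # part of the column text is probably missing
    if width > EXPECTED_WIDTH * (1 + TOLERANCE):
        return "too_wide"    # probably contains text from a neighbouring column
    return "ok"


for width in (600, 1000, 1600):
    print(width, check_column_width(width))
```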

### Index pages

- start of entry:
  - the line starts at the left edge of the column (continuation lines are indented)
- end of entry:
  - the line may end before the right edge of the text column
  - the line ends with one or more page numbers
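
These two cues can be sketched as code. The helpers below are hypothetical (`get_page_refs` and `line_role` are not taken from `parse_republic_hocr_files`), and the 20-pixel indent margin is an assumption. The role names mirror the `start`, `continue` and `_stop` labels that appear in the cell output further down.

```python
import re
from typing import List

# A single page-number token such as "260." or "713"
PAGE_NUMBER = re.compile(r"^(\d+)[.,;:]?$")


def get_page_refs(line_text: str) -> List[int]:
    """Collect the page numbers that close a line, reading tokens from the right."""
    refs = []
    for token in reversed(line_text.split()):
        match = PAGE_NUMBER.match(token)
        if match is None:
            break  # stop at the first token that is not a page number
        refs.append(int(match.group(1)))
    return refs[::-1]


def line_role(line_left: int, entry_left: int, line_text: str, indent_margin: int = 20) -> str:
    """Label a line as start/continue, with a '_stop' suffix if it ends in page numbers.

    entry_left is the left position of non-indented lines in the column;
    indent_margin is an assumed threshold for calling a line indented.
    """
    role = "start" if line_left - entry_left < indent_margin else "continue"
    return role + "_stop" if get_page_refs(line_text) else role


print(get_page_refs("98. 713."))
print(line_role(172, 116, "Heere Gallas.   260."))
```

Reading tokens from the right means a stray character after the number (for example OCR noise at the end of the line) leaves the line without page references, which is the conservative choice.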


### Resolution pages

- header:
  - near the top of the page (less than 350 pixels from the top)
  - page has header with:
    - even numbered pages: date page_number year
    - odd numbered pages: year page_number date
  - columns have half of page header, e.g.:
    - even numbered pages: 
      - first column: date left aligned and part of page_number right aligned
      - second column: part of page_number left aligned and year right aligned
    - odd numbered pages: 
      - first column: year left aligned and part of page_number right aligned
      - second column: part of page_number left aligned and date right aligned
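
The header layout above can be written down as a small lookup. This is a sketch with hypothetical names (`is_header_line`, `expected_header_parts`); it only encodes the rules listed here and does not parse any text.

```python
HEADER_MAX_TOP = 350  # header lines lie less than 350 pixels from the top (from the notes above)


def is_header_line(line_top):
    """True if a line is close enough to the top of the page to be a header."""
    return line_top < HEADER_MAX_TOP


def expected_header_parts(page_number, column_on_page):
    """Return (left-aligned part, right-aligned part) expected in a column header.

    column_on_page is 0 for the first column of the page and 1 for the second.
    Even pages read 'date page_number year', odd pages 'year page_number date'.
    """
    is_even = page_number % 2 == 0
    first_part = "date" if is_even else "year"
    last_part = "year" if is_even else "date"
    if column_on_page == 0:
        return (first_part, "page_number")
    return ("page_number", last_part)


print(expected_header_parts(112, 0))
print(expected_header_parts(113, 1))
```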
      
### Viewer

- page viewer: https://images.huygens.knaw.nl/assets/argos/index.html
- list of page URLs: https://images.huygens.knaw.nl/api/argos



In [1]:
%reload_ext autoreload
%autoreload 2

In [13]:
import json
import os
import re
from collections import defaultdict
from parse_hocr_files import make_hocr_page
from parse_republic_hocr_files import get_files, get_page_types, count_page_ref_lines, get_index_entry_lines
from elasticsearch import Elasticsearch

# Example hOCR column file: ../hocr/NL-HaNA_1.01.02_3780_0016.jpg-0-251-98--0.40.hocr



### Reading column scans for a single volume

1. get scan file info
    - scan number, page number, page side, column number, slant, page
2. iterate over pages
    - create hocr_page
    - determine page type: index, resolution, other
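
Step 1 can be illustrated with a filename parser. This is a sketch, not the implementation of `get_files`: `parse_scan_filename` is hypothetical, and the meaning of the numeric fields is guessed from the example filename in the import cell (scan number, column number, two integers assumed to be pixel offsets, and a decimal assumed to be the slant).

```python
import re

FILENAME_PATTERN = re.compile(
    r"_(?P<scan_num>\d+)\.jpg"   # scan number, e.g. 0016
    r"-(?P<column_num>\d+)"      # column number within the scan
    r"-(?P<x>\d+)-(?P<y>\d+)"    # two integers, assumed to be pixel offsets
    r"-(?P<slant>-?\d+\.\d+)"    # decimal value, assumed to be the slant
    r"\.hocr$"
)


def parse_scan_filename(filename):
    """Extract scan info from a column filename; raise ValueError if it does not match."""
    match = FILENAME_PATTERN.search(filename)
    if match is None:
        raise ValueError("unexpected filename format: " + filename)
    return {
        "scan_num": int(match.group("scan_num")),
        "column_num": int(match.group("column_num")),
        "x": int(match.group("x")),
        "y": int(match.group("y")),
        "slant": float(match.group("slant")),
    }


info = parse_scan_filename("NL-HaNA_1.01.02_3780_0016.jpg-0-251-98--0.40.hocr")
print(info)
```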
    

In [11]:
data_dir = "../../../Data/Projects/REPUBLIC/hocr/1725/"

scan_files = get_files(data_dir)
scan_files.sort(key = lambda x: x["scan_num_column_num"])
print("Number of scan files:", len(scan_files))
#print(json.dumps(scan_files[0:11], indent=2))

scan_columns = defaultdict(list)
for scan_file in scan_files:
    # group the column files by the scan they belong to
    scan_columns[scan_file["scan_num"]].append(scan_file)


hocr_page = make_hocr_page(
    scan_files[10]["filepath"], scan_files[10]["scan_num"],
    remove_line_numbers=False, remove_tiny_words=True, tiny_word_width=15,
)
#print(scan_files[10]["filepath"])
#hocr_page.lines

Number of scan files: 1887


### Determining Page Type

We want to parse index pages differently from resolution pages and filter out non-text pages and pages where the columns are not properly identified.
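
One way to picture such a decision is a cascade of checks on a few column features. The sketch below is purely illustrative and is not how `get_page_types` works: the function name, the features and every threshold are assumptions. Only the four categories (improperly cut, other, index, resolution) come from this notebook.

```python
MIN_WIDTH, MAX_WIDTH = 750, 1250   # assumed acceptable range around the ~1000 pixel text width
MIN_TEXT_LINES = 10                # assumed minimum number of lines for a text page
MIN_PAGE_REF_FRACTION = 0.2        # assumed share of lines ending in page numbers on index pages


def classify_column(width, num_lines, num_page_ref_lines, has_resolution_header):
    """Assign one of: 'improperly_cut', 'other', 'resolution', 'index'."""
    if not MIN_WIDTH <= width <= MAX_WIDTH:
        return "improperly_cut"
    if num_lines < MIN_TEXT_LINES:
        return "other"  # too little text, e.g. a title page or blank page
    if has_resolution_header:
        return "resolution"  # header with date, page number and year
    if num_page_ref_lines / num_lines >= MIN_PAGE_REF_FRACTION:
        return "index"  # many lines end with page references
    return "other"


print(classify_column(1000, 60, 25, False))
```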


In [8]:


index_scans, resolution_scans = get_page_types(scan_files, max_scan_num=70)
print("index_scans:", index_scans)
print("resolution_scans:", resolution_scans)



scan: 4 	scan_id: scan-4-even-0
	COLUMN IMPROPERLY CUT
scan: 4 	scan_id: scan-4-odd-1
	COLUMN IMPROPERLY CUT
scan: 5 	scan_id: scan-5-even-0
\OTHER PAGE
scan: 5 	scan_id: scan-5-odd-1
\OTHER PAGE
scan: 6 	scan_id: scan-6-even-0
	COLUMN IMPROPERLY CUT
scan: 6 	scan_id: scan-6-even-1
	INDEX PAGE
scan: 6 	scan_id: scan-6-odd-2
	INDEX PAGE
scan: 6 	scan_id: scan-6-odd-3
	INDEX PAGE
scan: 7 	scan_id: scan-7-even-0
	INDEX PAGE
scan: 7 	scan_id: scan-7-even-1
	INDEX PAGE
scan: 7 	scan_id: scan-7-odd-2
	INDEX PAGE
scan: 7 	scan_id: scan-7-odd-3
	INDEX PAGE
scan: 8 	scan_id: scan-8-even-0
	INDEX PAGE
scan: 8 	scan_id: scan-8-even-1
	INDEX PAGE
scan: 8 	scan_id: scan-8-odd-2
	INDEX PAGE
scan: 8 	scan_id: scan-8-odd-3
	INDEX PAGE
scan: 9 	scan_id: scan-9-even-0
	INDEX PAGE
scan: 9 	scan_id: scan-9-even-1
	COLUMN IMPROPERLY CUT
scan: 9 	scan_id: scan-9-odd-2
	INDEX PAGE
scan: 9 	scan_id: scan-9-odd-3
	INDEX PAGE
scan: 10 	scan_id: scan-10-even-0
	INDEX PAGE
scan: 10 	scan_id: scan-10-odd-2
	INDEX 

	RESOLUTION PAGE
scan: 53 	scan_id: scan-53-odd-3
	RESOLUTION PAGE
scan: 54 	scan_id: scan-54-even-0
\OTHER PAGE
scan: 54 	scan_id: scan-54-even-1
	RESOLUTION PAGE
scan: 54 	scan_id: scan-54-odd-2
\OTHER PAGE
scan: 54 	scan_id: scan-54-odd-3
	RESOLUTION PAGE
scan: 55 	scan_id: scan-55-even-0
\OTHER PAGE
scan: 55 	scan_id: scan-55-even-1
\OTHER PAGE
scan: 55 	scan_id: scan-55-odd-2
\OTHER PAGE
scan: 55 	scan_id: scan-55-odd-3
	RESOLUTION PAGE
scan: 56 	scan_id: scan-56-even-0
	RESOLUTION PAGE
scan: 56 	scan_id: scan-56-even-1
	RESOLUTION PAGE
scan: 56 	scan_id: scan-56-odd-2
	RESOLUTION PAGE
scan: 56 	scan_id: scan-56-odd-3
	RESOLUTION PAGE
scan: 57 	scan_id: scan-57-even-0
	RESOLUTION PAGE
scan: 57 	scan_id: scan-57-odd-2
\OTHER PAGE
scan: 57 	scan_id: scan-57-odd-3
\OTHER PAGE
scan: 58 	scan_id: scan-58-even-0
	RESOLUTION PAGE
scan: 58 	scan_id: scan-58-even-1
\OTHER PAGE
scan: 58 	scan_id: scan-58-odd-2
	RESOLUTION PAGE
scan: 59 	scan_id: scan-59-even-0
	RESOLUTION PAGE
scan: 59 	sca

In [None]:
print(sorted(index_scans))

print(sorted(resolution_scans))

### Parsing and Preprocessing Index Pages

- filter tiny and huge text elements (i.e. those deviating from the average character/word width and height)
- extract page lines that are part of the main text body containing index entries
- insert and clean up repetition symbols in index entries
    - determine length of repetition symbol
    - identify and replace mis-recognized repetition symbols
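
The first and third steps can be sketched as follows. Both helpers are hypothetical and their thresholds are assumptions. Words are represented as plain dicts with `text`, `width` and `height` keys, which is not necessarily how `make_hocr_page` stores them, and the median is used instead of the average to be less sensitive to outliers.

```python
import re
from statistics import median

REPETITION_SYMBOL = "——"
# A leading run of dash-like characters, as produced by OCR for the repetition symbol
LEADING_DASHES = re.compile(r"^\s*[—–_\-]+\s*")


def filter_words(words, min_ratio=0.25, max_ratio=4.0):
    """Drop words whose height deviates strongly from the median word height."""
    if not words:
        return []
    median_height = median(word["height"] for word in words)
    return [
        word for word in words
        if min_ratio * median_height <= word["height"] <= max_ratio * median_height
    ]


def normalize_repetition_symbol(line_text):
    """Replace a mis-recognized leading repetition symbol with a canonical one."""
    return LEADING_DASHES.sub(REPETITION_SYMBOL + " ", line_text, count=1).rstrip()


words = [
    {"text": "Heere", "width": 100, "height": 30},
    {"text": ".", "width": 3, "height": 2},
    {"text": "Gallas", "width": 120, "height": 32},
]
print([word["text"] for word in filter_words(words)])
print(normalize_repetition_symbol("    —_— bericht op de Memorie"))
```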


In [16]:

hocr_index_pages = []

for scan_file in scan_files:
    scan_id = scan_file["scan_num_column_num"]
    # keep only columns classified as index and not also as resolution
    if scan_id not in index_scans or scan_id in resolution_scans:
        continue
    column_id = "{}-{}".format(scan_file["scan_num"], scan_file["scan_column"])
    hocr_index_page = make_hocr_page(
        scan_file["filepath"], column_id,
        remove_line_numbers=False, remove_tiny_words=True, tiny_word_width=6,
    )
    hocr_index_page.scan_info = scan_file
    # count page references on the page just created, not on hocr_page from an earlier cell
    hocr_index_page.scan_info["num_page_ref_lines"] = count_page_ref_lines(hocr_index_page)
    hocr_index_pages.append(hocr_index_page)



first_line = {}
index_header_found = {}
index_entry_lines = defaultdict(list)

for hocr_index_page in hocr_index_pages:
    lines = get_index_entry_lines(hocr_index_page)
    index_entry_lines[hocr_index_page.scan_info["scan_num_column_num"]] = lines
    first_line[hocr_index_page.scan_info["scan_num_column_num"]] = lines[0]





6.1
	IS INDEX HEADER: 0 241 	## D           E           X.
6.2
	IS INDEX HEADER: 0 223 	##                        I           N           l
6.3
	IS INDEX HEADER: 0 204 	##           E           Xx.
7.0
	IS IN TOP MARGIN
	IS INDEX HEADER: 1 240 	##                      I           N.         l
7.1
	IS INDEX HEADER: 0 246 	##            E           Xx.
7.2
	IS INDEX HEADER: 0 230 	##                        Ï           N
7.3
	IS IN TOP MARGIN
	IS INDEX HEADER: 1 231 	## )           Ë           x7
8.0
	IS INDEX HEADER: 0 278 	##                         I           N
8.1
	IS INDEX HEADER: 0 271 	## D           È           X.
8.2
	IS IN TOP MARGIN
	IS INDEX HEADER: 1 254 	##               id           N           |
8.3
	IS INDEX HEADER: 0 260 	##            E           p.
9.0
	IS INDEX HEADER: 0 251 	##                      I           N           I
9.2
	IS INDEX HEADER: 0 225 	##    *                     Í           N           I
9.3
	IS INDEX HEADER: 0 206 	## D           E           Xx.
1

In [29]:
from parse_republic_hocr_files import index_lemmata
avg_left = 0
lemma_index = defaultdict(list)
curr_lemma = None

    
for scan_id in index_entry_lines:
    print(scan_id, len(index_entry_lines[scan_id]))
    index_lemmata(scan_id, index_entry_lines[scan_id], lemma_index, curr_lemma)


6.1 61
0 116 -20 start 	      ——  Scheepen tot  de  Equipagie   gereedt.
	PAGE_REFS: [185]
1 172 40 continue_stop 	         185.
	PAGE_REFS: [213]
2 115 -22 start_stop 	      —— (Gy/en aange[telt tot Capiteyn.  213.
3 115 -19 start 	      —— Commijie wegens Hollandt  voor den’
	PAGE_REFS: [260]
4 162 25 continue_stop 	         Heere Gallas.   260.
5 113 -29 start 	      —  confideratien  op  het  kleyn  Zegel.
	PAGE_REFS: [275]
6 166 31 continue_stop 	         275.
7 113 -27 start 	      —— Carrier  aangefteldt  tot  Contrerolleur
8 162 22 continue 	         van  de  Convoyen  en  Licenten  te  Venlo,
	PAGE_REFS: [320]
9 165 26 continue_stop 	         320.
10 112 -27 start 	      —— Commiie wegens Hollandt voor den
	PAGE_REFS: [367]
11 159 21 continue_stop 	         Heere van Leyden.   367.
12 109 -28 start 	      ——— gepermitteert  het  Schip  Overy(el te
	PAGE_REFS: [383]
13 158 23 continue_stop 	         verkoopen.  383.
14 110 -24 start 	      —_— bericht op de Memorie van den Mar-

16 108 6 continue 	      doeren hare pretenfien ‘ende grieven tot het
17 111 14 continue 	       beffen van  de Convoyen  en Licentemop de
18 111 18 continue 	       Middelgronden en aan den Abt, te exami-
	PAGE_REFS: [178]
19 110 16 continue_stop 	       ueeren.   178.
20 62 -33 start 	    ——— Commifie wegens  Utrecht woor den
	PAGE_REFS: [236]
21 112 18 continue_stop 	       Heer de Leeuw.  236.
22 63 -26 start 	    ——  Burmania  aangefteldt 101 Gapiteyn.
	PAGE_REFS: [258]
23 115 24 continue_stop 	       258.
24 65 -21 start 	    —— Commmijfie wegens Vrieslandt voor Pe-
	PAGE_REFS: [76]
25 105 19 continue_stop 	      gelin van Claarbergen.   76.
26 66 -21 start 	    —— item  wegens Vrieslandt voor Bìncke
	PAGE_REFS: [280]
27 121 28 continue_stop 	       Symons.   280.
28 66 -27 start 	    — item wegens Frieslandt voor Rengers.
29 65 -29 start 	    —— te berichten op de klaghtenvan Mey-
30 118 22 continue 	       aertshagen, nopenie bet aanhouden van vier
31 117 20 continue 	       ho

30 123 18 continue 	       Neufted en [oo nodi:h is na Keulen te gaan.
	PAGE_REFS: [209]
31 122 12 continue_stop 	       209.
32 77 -29 start 	     —— bericht  van het gepaffverde in confes
33 122 15 continue 	       rentie over de differenten tu(fchen de Chhr-
34 117 9 continue 	       furften  van  Mentz  cn  van  de  Paltz.
	PAGE_REFS: [422]
35 127 18 continue_stop 	       422.
36 79 -20 start 	     _—— Wegers ;  refolatie  wegens het werck
	PAGE_REFS: [591]
37 128 27 continue_stop 	       van Polen genomen en welgevallen.   591.
38 83 -8 start 	     ——  gepermitteert  aam  arme  Pa(agiers
39 130 42 continue 	       bonderdt  ‘vyftigh  guldens  uyt  te  keeren.
	PAGE_REFS: [633]
40 31 -46 start_stop 	        —— 633.
41 96 24 continue 	      er    gepermitteert den vouw te declareeren
	PAGE_REFS: [859]
42 31 -30 start_stop 	        —— over den Hertogh van Aofta.   859.
43 87 32 continue 	     ——  hondert  vyftigh  guldens  toegeleght.
	PAGE_REFS: [60]
44 31 -20 start_stop 	 —— 60.
45

17 27 -28 start 	  — verkoght aan Minure , mits betalen-
18 72 17 continue 	     de vier duy[ent drie hondert vyf en [eventigh
19 71 10 continue 	     guldens Hollandts.  66.              i
20 31 -26 start 	  —— tg berichten op het ver[oek van Thie-
	PAGE_REFS: [93]
21 76 19 continue_stop 	     en.  93.
22 47 -16 start 	   —— berichs  dien  aangaande  en refolutie.
	PAGE_REFS: [133]
23 85 21 continue_stop 	     133.
24 30 -31 start 	   —— Ouitantie waar by  blijckt ,  dat de
25 81 15 continue 	     vier duyfent drie hondert vijf en [eventigh
26 75 8 continue 	     guldens  aan den Ontfanger  Hogendorp fijn
	PAGE_REFS: [184]
27 82 12 continue_stop 	     betaalt.   184.
28 44 -26 start 	   —_       gelaft aan de Capittelen aldaar te of-
29 81 7 continue 	     donneeren, dat wanneer yemandt gencegen is
30 82 9 continue 	     Prebende  te defigneeren  daar  van kenniffe
31 71 0 start 	     /al moeten geven aan den Prooft ,  en  aan
	PAGE_REFS: [639]
32 84 12 continue_stop 	     de Rolle va

	PAGE_REFS: [309]
55 80 22 continue_stop 	    den Grifier vander Haar.   309.
	PAGE_REFS: [337]
56 33 -31 start_stop 	  —— bericht te examineren.   337.
57 33 -30 start 	  —— Refident Peflers te vernemen na den
58 81 14 continue 	    rangh tulfeben den Advocaat Fifcaal en Pro-
59 83 18 continue 	     cureur  Generaal met  den Grifjier van den
	PAGE_REFS: [469]
60 85 22 continue_stop 	     Raadt van Brabandt tot Bruffel.   469.
14.1 62
22 Gap between line start and first word: 37 129 BIT
32 Gap between line start and first word: 71 167 ne rapport en duyfent guldens toegeleght.
44 Gap between line start and first word: 0 107 examineeren. 516.
0 84 -19 start 	     —_——— item om negentien kiflen met Meubi-
	PAGE_REFS: [701]
1 132 32 continue_stop 	        len in te voeren.  701.
	PAGE_REFS: [775]
2 85 -20 start_stop 	     — om conferentie.  775.
3 84 -18 start 	     —— Pafport om  drie Paarden na Enge-
	PAGE_REFS: [879]
4 132 27 continue_stop 	        landt te mogen uytvoeren:  879.
5 85 -

17 76 20 continue_stop 	     mogen omjlaan.   12.
18 29 -32 start 	  ———  item  aan  Regenten  van  Caftelre,
19 85 28 continue 	     ——
20 34 -23 start 	   —— Prince van Hornes  nader beright op
21 80 15 continue 	     bet contra beright van la Motte ,’s Landts
	PAGE_REFS: [15]
22 79 21 continue_stop 	     Advocaten te adviferen.   15.
23 34 -30 start 	   —— Waltheni klaghte over feytelijckbeden,
24 83 25 continue 	     ver[oeckende Sauvegarde, 9c. de Raadt vat
	PAGE_REFS: [17]
25 86 23 continue_stop 	     Brabandt te advijeren.   17.
26 12 -46 start 	  a ——  Regenten van Moergeftel executoriaal
27 83 24 continue 	     verleent.  21,                      .
28 35 -29 start 	   —— Regenten  van  Vught  en  Crömvoirt
	PAGE_REFS: [24]
29 79 20 continue_stop 	     gepermittcert fe mogen omflaan.   24.
30 38 -21 start 	   —— Requefte van  het  meerder getal def
31 85 23 continue 	     Magiftraat als antidotaal ter Griffie te fê-
	PAGE_REFS: [29]
32 77 14 continue_stop 	     poneren.   29.
3

54 7 -27 start_stop 	       —— ven en Zeelft.  778.
55 69 35 continue 	    —— Regenten  van Valkenswaart om re-
56 7 -23 start 	       —— mijie‚ de. Raadt van Staate te advijeeren.
	PAGE_REFS: [778]
57 7 -26 start_stop 	       —— 778.
58 67 39 continue 	    — item  op  het  ver[oeck  van Regenten
	PAGE_REFS: [781]
59 7 -25 start_stop 	      —— van Oirfchot om remijie.‚ 781.
60 35 11 continue 	   —— 7     - diem
18.4 61
2 Gap between line start and first word: 1 89 vifeeren. 879.
25 Gap between line start and first word: 11 95  Neter[el remijjie verleent. 914.
34 Gap between line start and first word: 0 17 ig
44 Gap between line start and first word: 41 97 fie, de Raadt van Staate te advifteren.
47 Gap between line start and first word: 0 109 van Mierlo om remiie. 924.
	PAGE_REFS: [0]
0 20 -46 start_stop 	  ——  Regenten van Aarlerixtel om te 0:
1 83 14 continue 	     gen omflaan ‚  de Raadt van Staate te ads
	PAGE_REFS: [879]
2 89 24 continue_stop 	     vifeeren.  879.
3 50 -18 start 	 

42 103 -31 start 	      ——— Pafport om eenigh Geweer uyt en in
	PAGE_REFS: [435]
43 150 22 continue_stop 	         te mogen voeren.  435.
44 101 -25 start 	      —— foegefonden  haar  Hoogh Mog. Refo-
45 147 37 continue 	        Intie  tegens  Heydens  ende  Landtloopers.
	PAGE_REFS: [4903]
46 151 47 continue_stop 	         4903.
	PAGE_REFS: [650]
47 98 6 continue_stop 	      — drie maanden verlof.   650.
48 95 9 continue 	      de  Hondt ,  Hambeaumaackfter  declaratie.
	PAGE_REFS: [163]
49 0 -74 start_stop 	 —— 163.
50 93 25 continue 	      Aeio,  Pa/[port om na Italien te reyfèn en
	PAGE_REFS: [82]
51 0 -51 start_stop 	        —— twee ducatons tot reysgeldt.   82.
52 90 40 continue 	     van Hoorn aangefteldt  tot Commis  Generaal
	PAGE_REFS: [658]
53 0 -39 start_stop 	        —— in Zeelandt.   658.
54 87 48 continue 	     Hoornbeeck gelaf twee hondert vyfrigh gul-
	PAGE_REFS: [181]
55 0 -29 start_stop 	        —— dens uyt te keeren aan Silder.   181.
56 85 53 continue 	     — item 

36 123 21 continue 	       de grieven van den Staat wegens de  Com-
	PAGE_REFS: [924]
37 121 14 continue_stop 	       mercie.  924.
38 85 -20 start 	     —— Memorie  ordre  om  te handelen we
39 116 11 continue 	       gens  de  Compagnie  van  Commercie  in de
40 120 10 continue 	       Oo/tenrijck{che Nederlanden, en baar Hoogh
41 121 12 continue 	       Mog. dehorterende van  de acceffie  aan het
	PAGE_REFS: [928]
42 116 13 continue_stop 	       Trattaat van Hanover.  928.
43 70 -30 start 	     ——  pouvoir  om  te  handelen  wegens de
44 118 18 continue 	       Vaart  uyt  de Ooftenrijck[ihe  Nederlanden
	PAGE_REFS: [941]
45 115 22 continue_stop 	       op de Indiën, te examineeren.   941.
	PAGE_REFS: [945]
46 66 -25 start_stop 	    — rapport dien aangaande.  945.
47 66 -19 start 	    —— Pa/port  om eenige Goederen na En-
	PAGE_REFS: [955]
48 109 19 continue_stop 	      gelandt te mogen uytvoeren.  955.
49 64 -24 start 	    —— nader Memorie wegens de Ooftenrijk-
	PAGE_REFS: [959]
50

33 115 22 continue 	       we te maken  Steenwegh  tuijchen  Doornick
34 111 12 continue 	       ende. Cortrijck van den Tol vry mogen we-
	PAGE_REFS: [257]
35 106 4 continue_stop 	      fie   257;
36 69 -33 start 	    —— Raadt  van Staate  ver[oght voor fic-
37 119 21 continue 	       ninge te. doen ,  ten  eynde alle tijdt op bet
38 118 19 continue 	       LKa/[teel wan Doornick voor eenige maanden
39 116 22 continue 	       voorraat . van  Granen  en  Levensmiddelen
	PAGE_REFS: [267]
40 74 -22 start_stop 	     …  voor het Garnifoen  ese   Zijn,  267.
41 58 -44 start 	    ie        Prins Willem  ‘van Heffen [es maan=
	PAGE_REFS: [275]
42 121 24 continue_stop 	       den verlof.  275.
43 72 -26 start 	     — te overleggen of niet veranderinge smoct
44 121 23 continue 	       werden gemaakt op den  tijdt  van de Gou-
	PAGE_REFS: [275]
45 121 22 continue_stop 	       verneurs en Commandeurs.   275.
46 75 -31 start 	     ——— Pasport.  op  de Monteeringe voor het
47 125 24 continue 	     

21 100 25 continue_stop 	      gaande devoir te doen.   596.
22 56 -23 start 	    ——  Ritzins redenen waarom op de Pagt-
23 102 20 continue 	      verhuyringe  niet  na  Aurick  hadt  kunnen
	PAGE_REFS: [597]
24 98 21 continue_stop 	      gaan.   597.
25 53 -29 start 	    ——— Brieven van recredentie  voor Gede-
26 95 17 continue 	      Duteerden van de ordinaris Adminiftratoren.
	PAGE_REFS: [197]
27 82 -2 start_stop 	     197.
28 56 -28 start 	    —— Brieven van recredentie  voor Hefe-
	PAGE_REFS: [597]
29 102 18 continue_stop 	      nek.   597.
30 65 -19 start 	     —— Ferf ,  notificeerende  het  overlijden
31 104 18 continue 	      van fijn Broeder ,  met  refcriptie van con-
32 103 20 continue 	      doleantie  in  civile  termen  beantwoordt.
	PAGE_REFS: [619]
33 102 25 continue_stop 	      619.
34 54 -18 start 	    —— Regenten van Embden om baar Hoogh
	PAGE_REFS: [629]
35 108 33 continue_stop 	      Mog. interce(fie en refolatie.  629.
36 55 -15 start 	    —___   klaghten aan den

18 0 -42 start 	      —— Eagelandt te mogen uytvoeren.  95%.
19 55 8 continue 	    ———— voor Finch om twintich Kiftente mo-
	PAGE_REFS: [965]
20 99 63 continue_stop 	      gen invoeren.   965.
21 55 19 continue 	    Paftre, Weduwe Baftie, twee hondert guldens
	PAGE_REFS: [462]
22 0 -38 start_stop 	      —— toegeleght.  462.
23 58 20 continue 	    Patenten vgor een Compagnie van't Reciment
24 0 -32 start 	      —— van Thouars  om  van Lillo  te  gaan naar
	PAGE_REFS: [3907]
25 0 -27 start_stop 	      —— Heufden.   3907.
26 77 56 continue 	     —— voor  de  Compagnie  van  Haarfma
27 0 -32 start 	      —— DI   te  gaan  van  Borculo  naar  Breda.
	PAGE_REFS: [18]
28 0 -34 start_stop 	        —— 18.
29 59 14 continue 	    ——  voor de Compagnig  van Wetter om
30 0 -51 start 	      —— uyt  de  ee     ua  Groningen te
	PAGE_REFS: [318]
31 97 42 continue_stop 	      gaan.   318.
32 78 18 continue 	     —— voor  een  Compagnie van het Regi-
	PAGE_REFS: [376, 465, 916]
33 100 29 continue_stop 	

17 40 -31 start 	  —— wegens achterftallen  tot de Fortificas
18 87 15 continue 	    tien, Provincien ver[oght hare quotes te four-
	PAGE_REFS: [129]
19 89 17 continue_stop 	     neeren.   129.
20 42 -32 start 	  ——— raakende de trattementen van de Com
21 88 15 continue 	     mifen en Clerquen van het Comptoir  Gene-
	PAGE_REFS: [130]
22 92 13 continue_stop 	     neraal, te  examineren:   130.
23 44 -36 start 	  ——  ver[oght Ordonnantie te  depefcheeren
24 91 16 continue 	     tot betalinge van penfioen voor  twee jaaren
25 91 10 continue 	     voor Vourftanders van de Gereformeerden tot
26 94 13 continue 	     Geemen  tot onderhoudt van Predikant ende
	PAGE_REFS: [133]
27 95 13 continue_stop 	     Schoolmeefter.   133.
28 45 -37 start 	  —— te advifeeren op het verfoeck van Re=
29 89 7 continue 	     genten van Oofterwyck om interpretatie van
30 95 12 continue 	     de Refolutie wegens  aftchafjinge van kleyne
	PAGE_REFS: [133]
31 96 12 continue_stop 	     Specien en het Hooftgeldt.  

33 126 20 continue_stop 	       van Eer(el.   680.
34 80 -33 start 	     ——  item  op  de  Regnefte van  Regenten
	PAGE_REFS: [688]
35 129 21 continue_stop 	       van Biervliet om te mogen omjlaan.  688.
36 44 -63 start 	   ——   verzoght  Ordonnantie  van  hondert
37 140 32 continue 	        guldens  te  depecheeren  voor  Witkam.
	PAGE_REFS: [689]
38 141 33 continue_stop 	        689.
39 103 -6 start 	      —— te advifteren  op  bet  verfoerk  van
	PAGE_REFS: [654]
40 130 22 continue_stop 	       Vailjant om pardon wegens defèrtie.  654.
41 83 -34 start 	     —— iten op bet  verfoeck  van  Regenten
	PAGE_REFS: [695]
42 129 18 continue_stop 	       van Dieèn om remijie.  695.
43 83 -27 start 	     ——  Regenten van groot  en kleyn Baaf-
44 119 11 continue 	       ande gepermitteert  te  mogen  omflaan met
	PAGE_REFS: [695]
45 130 23 continue_stop 	       executoriaal.  695.
46 84 -17 start 	     ———  Commijfie  voor  den Heer Geertftma
	PAGE_REFS: [706]
47 131 41 continue_stop 	       

18 22 -11 start 	  —— donnantie op het Jubjeët  der Navigatie  der
19 22 -11 start 	  —— vreemde  Natien  op  Sweeden en Finlandt.
	PAGE_REFS: [241]
20 26 -7 start_stop 	  —— 241.
21 120 87 continue 	       Bie     Memorie wegens de Rigafche [chuldt-
	PAGE_REFS: [319]
22 22 -11 start_stop 	  —— vorderingh.   319.
23 22 -13 start 	        —— — geapprobeert [ijn devoiren wegens een
24 22 -13 start 	  —— Schip met Steenen tn plaats van Kouper ge-
25 22 -13 start 	  —— laaden , dat men by gelegentheyt tut [hade
26 22 -2 start 	  —— A Aftradeurs  meende  te laaten  finken.
27 44 20 continue 	   —— c8.
28 22 -2 start 	       —— —— devoir te doen, ten cynde Haack aan
	PAGE_REFS: [413]
29 22 -2 start_stop 	 —— fijn pretenfien geraakt.   413.
30 22 -2 start 	        —— —— wegens [eckere  Booswighten in Dee-
31 22 -2 start 	  —— nemarcken gearrefteert,  den Secretaris Op-
32 22 0 start 	  —— dorp devoir te doen, ten cynde aan den Ko-
33 22 -6 start 	  —— wingh van Sweeden mogen werden overgele-


37 113 29 continue_stop 	       teert;  61.
HAS LEMMA: Taats van Amerongen wier maanaen verlof.
LEMMA: Taats van Amerongen wier maanaen verlof
38 64 -20 start 	    Taats van  Amerongen  wier maanaen  verlof.
	PAGE_REFS: [98, 713]
39 111 22 continue_stop 	       98. 713.
HAS LEMMA: Tamminga wegens Stadt en Lande gecommit-
40 62 -26 start 	    Tamminga wegens Stadt en Lande gecommit-
	PAGE_REFS: [339]
41 109 27 continue_stop 	      teert in des Generaliteyts Reekenkamer..  339.
42 60 -21 start 	    —— wegens Stadt en Lande  ter Genera-
	PAGE_REFS: [411]
43 107 21 continue_stop 	      lteyt gecommitteert.   411.
	PAGE_REFS: [693]
HAS LEMMA: Tanneur om afiftentie , afceweefen. 693.
LEMMA: Tanneur om afiftentie 
44 59 -20 start_stop 	    Tanneur om afiftentie, afceweefen.   693.
HAS LEMMA: Tarouca , zapport , nopende de Memorie we-
LEMMA: Tarouca 
45 58 -26 start 	    Tarouca , zapport , nopende de Memorie we-
	PAGE_REFS: [314, 686]
46 102 24 continue_stop 	      gens  bet  Schip  Sonneftei

IndexError: list index out of range

In [None]:
for lemma in lemma_index:
    print("Lemma:", lemma)
    print("\tPages:", ", ".join([str(page_ref) for page_ref in lemma_index[lemma]]))
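
The `LEMMA:` lines in the cell output suggest that a lemma is the part of an entry's first line that precedes the first comma or full stop. A minimal sketch under that assumption follows. `extract_lemma` is a hypothetical helper, how `index_lemmata` actually derives lemmata is not shown in this notebook, and abbreviations containing a full stop would be cut short by this rule.

```python
import re
from collections import defaultdict


def extract_lemma(line_text):
    """Return the text before the first comma or full stop, with whitespace normalized."""
    lemma = re.split(r"[,.]", line_text, maxsplit=1)[0]
    return " ".join(lemma.split())


# Build a small lemma index from one entry line and its page reference
lemma_index = defaultdict(list)
lemma = extract_lemma("Tanneur om afiftentie, afceweefen.   693.")
lemma_index[lemma].append(693)

for lemma, page_refs in lemma_index.items():
    print("Lemma:", lemma)
    print("\tPages:", ", ".join(str(page_ref) for page_ref in page_refs))
```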


<span>
<img src="../../../Data/Projects/REPUBLIC/images/1725/NL-HaNA_1.01.02_3780_0007.jpg-0-245-75--0.00.png" alt="Drawing" style="width: 400px; display: inline;"/>
<img src="../../../Data/Projects/REPUBLIC/images/1725/NL-HaNA_1.01.02_3780_0007.jpg-1-1226-75-0.60.png" alt="Drawing" style="width: 400px; display: inline;"/>
</span>

