## Section III 

Prodigy 

Exercise I: Basic NER with pre-trained spaCy models [source](https://towardsdatascience.com/named-entity-recognition-with-nltk-and-spacy-8c4a7d88e7da)

In [127]:
import spacy
from spacy import displacy
from collections import Counter
import en_core_web_sm
nlp = en_core_web_sm.load()
spec = {"tei":"http://www.tei-c.org/ns/1.0"}
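`spec` maps the prefix `tei` to the TEI namespace. A lookup only uses that mapping when the path actually contains `tei:`, and the same rule applies to `lxml`'s `findall`. The minimal sketch below shows the difference with the standard library's `xml.etree.ElementTree`. The TEI snippet is a hand-made assumption, not a Perseus file.

```python
import xml.etree.ElementTree as ET

spec = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical TEI snippet that declares the TEI default namespace.
TEI_DOC = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><p>From <name type="place">Lubeck</name></p></body></text>
</TEI>"""

root = ET.fromstring(TEI_DOC)

# Without a prefix, the path only matches un-namespaced <name> elements: 0 hits.
print(len(root.findall(".//name[@type='place']", namespaces=spec)))

# With the tei: prefix, the mapping in `spec` is applied: one hit.
print([el.text for el in root.findall(".//tei:name[@type='place']", namespaces=spec)])
```

If a TEI file declares the TEI namespace, a query such as `.//name[@type='place']` needs the `tei:` prefix to find anything.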


In [133]:
doc = nlp('Hosted by Utrecht University, the 2019 iteration of the Digital Humanities (DH) conference, the annual international conference of the Alliance of Digital Humanities Organizations, will take place in the medieval city of Utrecht, one of the oldest cities in the Netherlands. The city’s rapid modernization and growth has inspired the conference’s guiding theme, complexity.')
#for ent in doc.ents:
#    print(ent.text, ent.label_)
displacy.render(doc, style="ent")


## Problem
This works very well for many 20th- and 21st-century texts. But what about early modern English?

In [134]:
doc = nlp('ITEM because that the kings most deare Uncle, the king of Denmarke, Norway & Sweveland, as the same our soveraigne Lord the king of his intimation hath understood, considering the manifold & great losses, perils, hurts and damage which have late happened aswell to him and his, as to other foraines and strangers, and also friends and speciall subjects of our said soveraigne Lord the king of his Realme of England, by ye going in, entring & passage of such forain & strange persons into his realme of Norwey & other dominions, streits, territories, jurisdictions & places subdued and subject to him, specially into his Iles of Fynmarke, and elswhere, aswell in their persons as their things and goods')
#for ent in doc.ents:
#    print(ent.text, ent.label_)
displacy.render(doc, style="ent")


## TEI to spaCy patterns 

PATTERNS.JSONL
```json
{"label": "JOB_TITLE", "pattern": [{"lower": "engineering"}, {"lower": "manager"}]}
{"label": "JOB_TITLE", "pattern": [{"orth": "CEO"}]}
```
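Each pattern is a list of token descriptions, and a match is a run of consecutive tokens that satisfy them in order: `lower` compares the lowercased token text, while `orth` compares the exact text. The sketch below imitates that idea in plain Python so it runs without spaCy. It is a simplified stand-in, not spaCy's `Matcher` or Prodigy's pattern loader: it assumes whitespace tokenization and supports only the `lower` and `orth` keys.

```python
import json

# The two example patterns from PATTERNS.JSONL above.
PATTERNS_JSONL = """\
{"label": "JOB_TITLE", "pattern": [{"lower": "engineering"}, {"lower": "manager"}]}
{"label": "JOB_TITLE", "pattern": [{"orth": "CEO"}]}
"""

def token_matches(token, spec):
    """Check one token against one pattern entry (only 'lower' and 'orth' keys)."""
    if "lower" in spec and token.lower() != spec["lower"]:
        return False
    if "orth" in spec and token != spec["orth"]:
        return False
    return True

def find_matches(text, patterns):
    """Return (label, matched_text) for every pattern hit in a whitespace-tokenized text."""
    tokens = text.split()  # simplification: spaCy's tokenizer also splits off punctuation
    hits = []
    for entry in patterns:
        pattern = entry["pattern"]
        n = len(pattern)
        for start in range(len(tokens) - n + 1):
            window = tokens[start:start + n]
            if all(token_matches(t, s) for t, s in zip(window, pattern)):
                hits.append((entry["label"], " ".join(window)))
    return hits

patterns = [json.loads(line) for line in PATTERNS_JSONL.splitlines() if line.strip()]
print(find_matches("Our new Engineering Manager reports to the CEO", patterns))
```

Because the second pattern uses `orth`, "CEO" matches but "ceo" does not, whereas the `lower` entries match "Engineering Manager" in any capitalization.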

Exercise 1) Improve results for a specific task

Training a new category from TEI training data

Example 2) Add a new category from a list of examples

Create a corpus of texts for the project, one JSON object per line:

MY_DATA.JSONL
{"text": "Pinterest Hires Its First Head of Diversity"}
{"text": "Airbnb and Others Set Terms for Employees to Cash Out"}
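A file in this layout can be produced with the standard `json` module. The sketch below writes one `{"text": ...}` object per line and reads it back. The lowercase filename `my_data.jsonl` and the helper names are assumptions for illustration.

```python
import json

# The two headlines from the MY_DATA.JSONL example above.
texts = [
    "Pinterest Hires Its First Head of Diversity",
    "Airbnb and Others Set Terms for Employees to Cash Out",
]

def write_jsonl(path, texts):
    """Write one {"text": ...} object per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for text in texts:
            # json.dumps escapes quotes and backslashes; ensure_ascii=False keeps accents readable
            f.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")

def read_jsonl(path):
    """Parse every non-empty line back into a dict (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

write_jsonl("my_data.jsonl", texts)
print(read_jsonl("my_data.jsonl"))
```

Building each line with `json.dumps` rather than string concatenation keeps the file valid even when a text contains quotation marks.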

## Here we're going to download a TEI file from Perseus
We're going to extract a list of all the place names from the text to create a patterns file.
We'll also extract the raw text to create a set of training documents. 

In [137]:
from urllib.request import urlopen
from lxml import etree

def tei_loader(url):
    tei = urlopen(url).read()
    return etree.XML(tei)
    


Here we are going to download the table of contents and create a list of the 937 parts of the document. We will then download each part, extract the place names, and add them to a `places` list.
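The table of contents is a list of `<chunk>` elements, and each element's `ref` attribute holds a URL-encoded Perseus identifier. The sketch below runs the same `//chunk[@ref]` lookup on a small hand-made sample, using the standard library's `xml.etree.ElementTree` instead of `lxml` so that it works offline. The sample document is an assumption modeled on the real file, not a copy of it.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

# Hypothetical, trimmed sample in the shape of the Perseus table of contents.
SAMPLE_TOC = """<contents ref="Perseus:text:1999.03.0070" lang="en">
  <chunk pos="1" type="narrative" n="1" ref="Perseus%3Atext%3A1999.03.0070%3Anarrative%3D1"/>
  <chunk pos="2" type="narrative" n="2" ref="Perseus%3Atext%3A1999.03.0070%3Anarrative%3D2"/>
</contents>"""

CHUNK_URL = "http://www.perseus.tufts.edu/hopper/xmlchunk?doc="

def chunk_urls(toc_root):
    """Collect every <chunk> that has a ref attribute and build its xmlchunk URL."""
    return [CHUNK_URL + chunk.get("ref") for chunk in toc_root.findall(".//chunk[@ref]")]

urls = chunk_urls(ET.fromstring(SAMPLE_TOC))
print(len(urls))
print(unquote(urls[0].split("doc=")[1]))  # the decoded identifier, for readability
```

The `ref` values are kept URL-encoded when building the URL, which matches how the notebook concatenates them onto the `xmlchunk?doc=` address. `unquote` is only used to make an identifier readable.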

In [166]:
table_of_contents_url = "http://www.perseus.tufts.edu/hopper/xmltoc?doc=Perseus%3Atext%3A1999.03.0070%3Anarrative%3D1"
table_of_contents_xml = tei_loader(table_of_contents_url)


chunks = table_of_contents_xml.xpath("//chunk[@ref]")
refs = [chunk.get('ref') for chunk in chunks] 
# an example ref 'Perseus%3Atext%3A1999.03.0070%3Anarrative%3D6'


places = []

for ref in refs:
    
    url = 'http://www.perseus.tufts.edu/hopper/xmlchunk?doc=' + ref
    try:
        tei = tei_loader(url)

        #get all <name type='place'> tags
        for place in tei.findall(".//name[@type='place']", namespaces=spec):
            places.append(place.text.replace('\n',''))
    except Exception as e:
        #print(e)
        pass


14187
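The loop above reads `place.text`, which is `None` when a `<name>` element starts with a child element, and which holds only the text before that child otherwise. On `None`, `.replace` raises `AttributeError`. The bare `except` then silently skips every remaining place in that chunk. Separately, `.replace('\n', '')` can glue together words that were split across lines. The sketch below is a more defensive variant on a hand-made sample chunk (hypothetical content). It uses `itertext()` to collect all nested text and collapses whitespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample chunk with a plain name, a name broken across lines,
# a name wrapped in a child element, and a person name that should be ignored.
SAMPLE_CHUNK = """<body>
  <p>Sailing from <name type="place">Norwich</name> to the
  <name reg="Saona" type="place">Isle of
     Saona</name>, then to <name type="place"><hi>Exceter</hi></name>.
  <name type="pers">Charles</name></p>
</body>"""

def place_names(root):
    """Return the whitespace-normalized text of every <name type='place'>."""
    names = []
    for el in root.iterfind(".//name[@type='place']"):
        # itertext() also yields text inside child elements, where el.text
        # would be None or only the first fragment
        text = " ".join("".join(el.itertext()).split())
        if text:  # skip empty elements instead of failing on them
            names.append(text)
    return names

root = ET.fromstring(SAMPLE_CHUNK)
print(place_names(root))
```

In the notebook, the same function could be applied to each downloaded chunk, with `places.extend(place_names(tei))` replacing the inner loop.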


In [165]:
print('number of documents: ',len(refs))
print('number of places found: ',len(set(places)))
places[4]

number of documents:  937
number of places found:  2279


'Towne of Northberne'
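`len(set(places))` counts distinct strings only. `Counter`, which the first cell already imports, also shows how often each name occurs. This helps spot the most frequent places and rare variant spellings before they become patterns. The list below is a hypothetical sample. In the notebook you would pass the real `places` list instead.

```python
from collections import Counter

# Hypothetical sample standing in for the notebook's `places` list.
sample_places = ["Norwich", "Exceter", "Norwich", "Isle of Saona", "Norwich", "Exceter"]

counts = Counter(sample_places)
print(len(sample_places), len(counts))  # every mention vs. distinct strings
print(counts.most_common(2))            # most frequent names first
```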

## Here we create a patterns.jsonl file. Examples of pattern files can be found [here](https://github.com/explosion/prodigy-recipes/tree/master/example-patterns)

```json
{"label": "GPE", "pattern": [{"lower": "república"}, {"lower": "de"}, {"lower": "angola"}]}
```

In [195]:
import json

with open('patterns.jsonl', 'w', encoding='utf-8') as f:
    for place in set(places):
        tokens = place.split()
        if not tokens:
            continue  # skip empty names, which would otherwise raise IndexError
        # json.dumps escapes quotes and backslashes, so each line stays valid JSON
        pattern = '[' + ','.join(json.dumps({"lower": t.lower()}, ensure_ascii=False) for t in tokens) + ']'
        row = '{"label": "PLACE", "pattern": ' + pattern + '}\n'
        f.write(row)

In [196]:
with open('patterns.jsonl','r') as f:
    print(f.read())

{"label": "PLACE", "pattern": [{"lower": "island"},{"lower": "of"},{"lower": "s."},{"lower": "christopher"}]}
{"label": "PLACE", "pattern": [{"lower": "river"},{"lower": "de"},{"lower": "sestos"}]}
{"label": "PLACE", "pattern": [{"lower": "st."},{"lower": "ives"}]}
{"label": "PLACE", "pattern": [{"lower": "isle"},{"lower": "of"},{"lower": "silly"}]}
{"label": "PLACE", "pattern": [{"lower": "saint"},{"lower": "thomas"},{"lower": "iland"}]}
{"label": "PLACE", "pattern": [{"lower": "zetland"}]}
{"label": "PLACE", "pattern": [{"lower": "westerne"},{"lower": "islands"}]}
{"label": "PLACE", "pattern": [{"lower": "solovetsky"}]}
{"label": "PLACE", "pattern": [{"lower": "balsara"}]}
{"label": "PLACE", "pattern": [{"lower": "lubeck"}]}
{"label": "PLACE", "pattern": [{"lower": "viterbium"}]}
{"label": "PLACE", "pattern": [{"lower": "corsica"}]}
{"label": "PLACE", "pattern": [{"lower": "exceter"}]}
{"label": "PLACE", "pattern": [{"lower": "mount"},{"lower": "of"},{"lower": "the"},{"lower": "islan
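A single malformed line can break loading of a patterns file. For example, a place name containing a quotation mark will do this if the JSON was built by string concatenation. The sketch below parses every line and reports problems with their line numbers. The demo filename and its three lines, including the broken "cape "la" hague" entry, are hypothetical.

```python
import json

def check_patterns(path):
    """Return (line_number, message) for every line that is not a usable pattern."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for n, line in enumerate(f, start=1):
            if not line.strip():
                continue
            try:
                entry = json.loads(line)
            except json.JSONDecodeError as err:
                problems.append((n, str(err)))
                continue
            if not entry.get("label") or not entry.get("pattern"):
                problems.append((n, "missing label or empty pattern"))
    return problems

# Hypothetical demo file: one good line, one with broken quoting, one empty pattern.
with open("patterns_check_demo.jsonl", "w", encoding="utf-8") as f:
    f.write('{"label": "PLACE", "pattern": [{"lower": "exceter"}]}\n')
    f.write('{"label": "PLACE", "pattern": [{"lower": "cape "la" hague"}]}\n')
    f.write('{"label": "PLACE", "pattern": []}\n')

problems = check_patterns("patterns_check_demo.jsonl")
print(problems)
```

Running `check_patterns('patterns.jsonl')` on the real file should return an empty list.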

In [171]:
#Now to extract the full text
txts = []
for ref in refs:
    
    url = 'http://www.perseus.tufts.edu/hopper/xmlchunk?doc=' + ref
    try:
        tei = tei_loader(url)

        new_txt = []
        for body in tei.iter('body'):
            new_txt.append(''.join(body.itertext()).strip().replace('\n',''))
            txts.append(''.join(new_txt))
        
    except Exception as e:
        #print(e)
        pass

full_text = [txt.replace('        ',' ').replace('   ',' ').replace('  ',' ') for txt in txts]

with open('principal_navigations.txt','w') as f:
    f.write(str(full_text))

In [173]:
with open('principal_navigations.txt','r') as f:
    print(f.read()[:500])

['A branch of a Statute made in the eight yeere of Henry the sixt, for the trade to Norwey, Sweveland, Den marke, and Fynmarke. ITEM because that the kings most deare Uncle, the kingof Denmarke, Norway & Sweveland, as the same oursoveraigne Lord the king of his intimation hath understood, considering the manifold & great losses, perils,hurts and damage which have late happened aswell tohim and his, as to other foraines and strangers, and alsofriends and speciall subjects of our said soveraigne L
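The output above shows two problems. First, words that were separated only by a line break in the XML run together ("kingof", "oursoveraigne", "tohim", "alsofriends"), because `.replace('\n','')` deletes newlines instead of turning them into spaces. Second, `str(full_text)` writes the Python list's repr, so the file is one long line that starts with `['`. Prodigy reads a `.txt` source one line per example, so this produces a single giant example. The sketch below normalizes whitespace and writes one `{"text": ...}` object per line, the same layout as MY_DATA.JSONL. The output filename `principal_navigations.jsonl`, the helper names and the sample chunk text are assumptions.

```python
import json

def clean(text):
    """Turn newlines and tabs into spaces and collapse runs of whitespace."""
    return " ".join(text.split())

def write_texts(path, texts):
    """Write one {"text": ...} object per line, skipping chunks that are empty after cleaning."""
    with open(path, "w", encoding="utf-8") as f:
        for text in texts:
            text = clean(text)
            if text:
                f.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")

# Hypothetical raw chunk texts, with line breaks as they might appear in the XML.
raw = [
    "the kings most deare Uncle, the king\nof Denmarke, Norway & Sweveland,\n    as the same our\nsoveraigne Lord",
    "   \n  ",
]
write_texts("principal_navigations.jsonl", raw)

with open("principal_navigations.jsonl", encoding="utf-8") as f:
    print(f.read())
```

To use this in the notebook, append `''.join(body.itertext())` to `txts` without the `.replace('\n','')`, call `write_texts` on the result, and pass the `.jsonl` file to `ner.teach` in place of the `.txt` file.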


## With the patterns and text files created, we can now work with Prodigy!

In [107]:
!prodigy dataset historic_places "A dataset for British historic places" --author Andy


  ✨  Successfully added 'historic_places' to database SQLite.



## We need a training text. First, we'll train on a set from the original corpus. We'll then try it on a comparable document from the same period.

In [197]:
!prodigy ner.teach historic_places en_core_web_sm principal_navigations.txt --label PLACE --patterns patterns.jsonl

Using 1 labels: PLACE

  ✨  Starting the web server at http://localhost:8080 ...
  Open the app in your browser and start annotating!

Segmentation fault (core dumped)


In [37]:
pattern = [{'LOWER': 'tool'}, {'LOWER': 'for'}]

(The stored output of this cell, truncated in the original export, is a token-by-token dump of the raw Perseus XML running to more than 54,000 tokens. Markup such as `<chunk ...>`, `<head lang="en">` and `<name type="place">` is tokenized as ordinary text, and the `name-start`/`name-end` and `head-start`/`head-end` annotations on the tokens do not line up with the actual names and headings.)