In [4]:
import sys
from pathlib import Path

In [5]:
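# Make the project root importable so that `src.*` resolves.
module_path = str(Path.cwd().parent.parent)
# sys.path holds strings, so compare the string form, not the Path object.
if module_path not in sys.path:
    sys.path.append(module_path)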

In [34]:
import spacy
from spacy.tokens import Doc

In [7]:
from src.loader import TextLoader
from src.model import DatasetType

In [8]:
loader = TextLoader(dataset_type=DatasetType.V1_WITH_PREDICTIONSTRING)

In [2]:
nlp = spacy.load("spacy_out/model-best/")

Evaluation results for the custom spaCy NER model (DS labels the first word of a discourse, DE the last):
```
================================== Results ==================================

TOK     -    
NER P   63.22
NER R   73.08
NER F   67.79
SPEED   12164


=============================== NER (per type) ===============================

         P       R       F
DS   63.22   66.78   64.95
DE   63.23   79.40   70.40
```
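As a quick sanity check on the table above, the F column should be the harmonic mean of P and R. The sketch below recomputes it from the reported values; the helper name `f_score` is made up for illustration, and because the inputs are already rounded, agreement in the last digit is not guaranteed in general.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R) pairs copied from the results table above.
reported = {
    "overall": (63.22, 73.08),
    "DS": (63.22, 66.78),
    "DE": (63.23, 79.40),
}

for name, (p, r) in reported.items():
    print(name, round(f_score(p, r), 2))
```

For these three rows the recomputed values agree with the reported F scores to two decimals.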

In [3]:
nlp.pipeline

[('transformer',
  <spacy_transformers.pipeline_component.Transformer at 0x7fb2341fcb20>),
 ('ner', <spacy.pipeline.ner.EntityRecognizer at 0x7fb23423b820>)]

In [27]:
text = loader.load_random_text(purify_discourses=True, purify_text=True)

In [29]:
text.id

'BBB6E0302CFE'

In [28]:
print(text)

I am going to write an essay on why you should join the Seagoing Cowboys program. You should join the Seagoing Cowboys because it gives you a lot of opportunities to see different places. You can go visit some of the most exciting places in the world. Also if you like animals it gives an opportunity to take care of them. If you have never gone over seas then there is your opportunity. You travel all over the world go over the seas and oceans. You get to adventure the seas oceans and the world. Also if you are a young Seagoing Cowboy and you still need to attend school the you can learn about the countries and seas by exploring them. Like I said earlier if you still need to attend school you can learn different languages because you are exploring the world. So by learning diffenrent languages it can open up many opportunities for you. But it doesn't need to be all about learning. You can play games on the way and see many different sites that's the whole world would want to see. Then wh

In [56]:
for disc in text.discourses:
    print(disc)

--- 1619807167166 (82 -> 187 | 16 -> 33) - Position ---
You should join the Seagoing Cowboys because it gives you a lot of opportunities to see different places.
-------------------------------------------------------
--- 1619807098083 (188 -> 251 | 34 -> 46) - Claim ---
You can go visit some of the most exciting places in the world.
-----------------------------------------------------
--- 1619807106639 (252 -> 322 | 47 -> 60) - Claim ---
Also if you like animals it gives an opportunity to take care of them.
-----------------------------------------------------
--- 1619807143615 (323 -> 640 | 61 -> 121) - Evidence ---
If you have never gone over seas then there is your opportunity. You travel all over the world go over the seas and oceans. You get to adventure the seas oceans and the world. Also if you are a young Seagoing Cowboy and you still need to attend school the you can learn about the countries and seas by exploring them.
-------------------------------------------------------

In [57]:
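DS_token = "B-DS"
DE_token = "B-DE"

# Map word index -> IOB tag: the first word of each discourse is a
# discourse start (DS), the last word a discourse end (DE).
# A single-word discourse has one index for both; it ends up tagged DE
# instead of shifting the DS/DE alternation for every later discourse.
boundary_tags = {}
for disc in text.discourses:
    boundary_tags[disc.predictionstring[0]] = DS_token
    boundary_tags[disc.predictionstring[-1]] = DE_token

# One tag per word; every word that is not a boundary gets "O".
ents = [boundary_tags.get(ind, "O") for ind in range(len(text.words))]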


In [58]:
manual_doc = Doc(nlp.vocab, text.words, ents=ents)


In [59]:
# Doc.ents yields Span objects, one per tagged boundary word.
for ent in manual_doc.ents:
    print(ent.text, ent.label_, ent.start_char, ent.end_char)

You DS 82 85
places. DE 180 187
You DS 188 191
world. DE 245 251
Also DS 252 256
them. DE 317 322
If DS 323 325
them. DE 635 640
Like DS 641 645
learning. DE 882 891
You DS 892 895
see. DE 988 992
Then DS 993 997
us. DE 1123 1126
that DS 1141 1145
adventures. DE 1204 1215
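The character offsets above come from how `Doc` is built from a word list: when `spaces` is not given, spaCy treats every word as being followed by a single space. The sketch below reproduces that offset arithmetic in plain Python on the opening words of the essay. It assumes `text.words` is a whitespace split of the purified text (which is not shown in this notebook), and `joined_offsets` is a hypothetical helper, not part of the project.

```python
def joined_offsets(words):
    """Return (word, start_char, end_char) for words joined by single spaces."""
    offsets = []
    pos = 0
    for word in words:
        offsets.append((word, pos, pos + len(word)))
        pos += len(word) + 1  # +1 for the single separating space
    return offsets


# Opening of the essay printed earlier in the notebook.
opening = (
    "I am going to write an essay on why you should join "
    "the Seagoing Cowboys program. You should join"
)
offsets = joined_offsets(opening.split())

print(offsets[15])  # ('program.', 73, 81)
print(offsets[16])  # ('You', 82, 85)
```

The `You` span at 82 to 85 matches the first DS entity above. Whitespace-split words keep their trailing punctuation, which is why the gold DE spans end on the period.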


In [60]:
doc = nlp(text.text)

In [61]:
# Same printout for the model's predictions on the raw text.
for ent in doc.ents:
    print(ent.text, ent.label_, ent.start_char, ent.end_char)

I DS 0 1
program DE 73 80
You DS 82 85
places DE 180 186
You DS 188 191
world DE 245 250
them DE 317 321
If DS 323 325
them DE 635 639
Like DS 641 645
world DE 760 765
you DE 841 844
learning DE 882 890
You DS 892 895
us DE 1123 1125
In DS 1127 1129
that DS 1141 1145
adventures DE 1204 1214
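Comparing the two printouts by eye, the predicted DE spans end one character earlier than the manual ones (`places` 180 to 186 versus `places.` 180 to 187), because the model's tokenizer splits off the period. One rough way to line the two up is to match on label and start offset only. The sketch below does that for the first few entries of each printout; `match_boundaries` is a hypothetical helper, and this is not the scoring spaCy uses for the results table.

```python
def match_boundaries(gold, pred):
    """Match boundaries on (label, start_char), ignoring the end offset."""
    gold_keys = {(label, start) for label, start, _ in gold}
    pred_keys = {(label, start) for label, start, _ in pred}
    tp = len(gold_keys & pred_keys)
    precision = tp / len(pred_keys) if pred_keys else 0.0
    recall = tp / len(gold_keys) if gold_keys else 0.0
    return tp, precision, recall


# (label, start_char, end_char) copied from the first lines of the
# manual and predicted printouts above.
gold_sample = [("DS", 82, 85), ("DE", 180, 187), ("DS", 188, 191), ("DE", 245, 251)]
pred_sample = [
    ("DS", 0, 1), ("DE", 73, 80), ("DS", 82, 85),
    ("DE", 180, 186), ("DS", 188, 191), ("DE", 245, 250),
]

tp, precision, recall = match_boundaries(gold_sample, pred_sample)
print(tp, round(precision, 2), recall)
```

On this sample every gold boundary is recovered, while the two extra predictions (`I` and `program`) fall in the opening sentence, which has no gold discourse in the manual printout.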


In [62]:
spacy.displacy.render(manual_doc, style="ent", jupyter=True)


In [63]:
spacy.displacy.render(doc, style="ent", jupyter=True)