In [1]:
# Import libraries
from pprint import pprint
import spacy
from iuextract.extract import label_ius, segment_ius
from iuextract.iu_utils import doc2iu_segs, doc2iu_str
SPACY_MODEL = "en_core_web_lg"  # any other installed spaCy model can be used instead
nlp = spacy.load(SPACY_MODEL)

In [2]:
text = "My dog, Chippy, just won its first grooming competition."

# Parsing raw text

You can parse raw text and obtain the segments as a labelled string by calling `segment_ius` with `mode='str'`:

In [3]:
segs = segment_ius(text, mode='str')
print(segs)

D1|My dog, 
2|Chippy, 
D1|just won its first grooming competition.


If you need to run this command multiple times, pass the already loaded spaCy model as an object through the `spacy_model` argument.
This avoids loading the same model into RAM more than once.

In [4]:
segs = segment_ius(text, mode='str', spacy_model=nlp)
print(segs)

D1|My dog, 
2|Chippy, 
D1|just won its first grooming competition.


You can also obtain the segments as a Python object with `mode='obj'`:

In [5]:
segs = segment_ius(text, mode='obj', spacy_model=nlp)
segs

({'0-2-R1,R3': [My, dog, ,, just, won, its, first, grooming, competition, .],
  '0-1-R3.2,R3': [Chippy, ,]},
 {'0-2-R1,R3'})
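
The returned tuple can be post-processed with ordinary Python. The sketch below mimics the structure printed above, using plain strings in place of spaCy tokens so that it runs without IUExtract. It assumes, based only on this output, that the first element maps IU labels to token lists and that the second element is the set of labels of discontinuous IUs. `summarize_ius` is a hypothetical helper, not part of the library.

```python
# Stand-in for the value printed above; plain strings replace spaCy tokens.
ius = {
    "0-2-R1,R3": ["My", "dog", ",", "just", "won", "its", "first",
                  "grooming", "competition", "."],
    "0-1-R3.2,R3": ["Chippy", ","],
}
# Assumption: this set holds the labels of the discontinuous IUs.
disc_labels = {"0-2-R1,R3"}


def summarize_ius(ius, disc_labels):
    """Return (label, token count, is_discontinuous) for every IU."""
    return [(label, len(tokens), label in disc_labels)
            for label, tokens in ius.items()]


for label, n_tokens, is_disc in summarize_ius(ius, disc_labels):
    kind = "discontinuous" if is_disc else "continuous"
    print(f"{label}: {n_tokens} tokens ({kind})")
```

With the real return value, the same loop should work unchanged, since only the length of each token list and set membership are used.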

# Adding IU labels to a spaCy doc
The best way to use IUExtract is to add the IU labels to an existing spaCy `Doc` object.
This can be done with the `label_ius` function.

In [6]:
parsed_text = nlp(text)
label_ius(parsed_text)
print(doc2iu_str(parsed_text))

D1|My dog, 
2|Chippy, 
D1|just won its first grooming competition.


You can also print the units one per row, with the pieces of each discontinuous IU joined into a single row:

In [7]:
segs = doc2iu_segs(parsed_text, gold=False)
print('\n'.join(segs))

My dog, just won its first grooming competition.
Chippy,
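
The relation between the labelled string output and these joined rows can be sketched in plain Python. The snippet below does not call IUExtract: it copies the labelled output shown earlier into a string literal and assumes, based only on that output, that each line has the form `label|text`, that a leading `D` marks a piece of a discontinuous IU, and that the remaining digits are the IU index. `join_labelled_segments` is a hypothetical helper, not part of the library.

```python
# Labelled output copied from the earlier cells (not produced by this snippet).
labelled = (
    "D1|My dog, \n"
    "2|Chippy, \n"
    "D1|just won its first grooming competition.\n"
)


def join_labelled_segments(labelled_str):
    """Group 'label|text' lines by IU index, in order of first appearance.

    Assumption: a leading 'D' flags a discontinuous IU and the remaining
    digits are the IU index. This is inferred from the printed output only.
    """
    units = {}
    for line in labelled_str.splitlines():
        if not line.strip():
            continue  # skip blank lines
        label, _, piece = line.partition("|")
        index = label.lstrip("D")
        units.setdefault(index, []).append(piece)
    # Concatenate the pieces of each unit and trim surrounding whitespace.
    return ["".join(pieces).strip() for pieces in units.values()]


rows = join_labelled_segments(labelled)
print("\n".join(rows))
```

For the example sentence this prints the same two rows as the cell above. In practice `doc2iu_segs` is the better choice, since it works on the labelled `Doc` directly instead of re-parsing a string.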


Look at `data.py` and `gold.py` for ideas on how to import data and gold-standard human annotations. For more utilities, check `iu_utils.py`.