In [1]:
import sys
sys.path.insert(0, "..")

with open("./discharge_summary.txt") as f:
    text = f.read()

In [2]:
import spacy
import medspacy

# Overview

medspaCy now supports spaCy SpanGroups in most of its components. SpanGroups are a relatively new addition to spaCy that allow grouping arbitrary (including overlapping!) spans.
This notebook shows how to write results to a SpanGroup and how other components can read and modify the spans stored there.
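To make the difference concrete, here is a minimal sketch using plain spaCy 3.x (no medspaCy components). The sentence, the `problems` group key, and the hand-built spans are illustrative only. A span group holds two overlapping spans without complaint, while assigning the same spans to `doc.ents` is rejected.

```python
import spacy
from spacy.tokens import Span

demo_nlp = spacy.blank("en")
demo_doc = demo_nlp("Patient reports severe abdominal pain.")
# Tokens: Patient(0) reports(1) severe(2) abdominal(3) pain(4) .(5)

# Two overlapping spans covering "severe abdominal pain" and "abdominal pain"
long_span = Span(demo_doc, 2, 5, label="PROBLEM")
short_span = Span(demo_doc, 3, 5, label="PROBLEM")

# A span group accepts overlapping spans; assigning a list creates the SpanGroup
demo_doc.spans["problems"] = [long_span, short_span]
print([span.text for span in demo_doc.spans["problems"]])

# doc.ents requires non-overlapping spans, so the same assignment raises ValueError
overlap_rejected = False
try:
    demo_doc.ents = [long_span, short_span]
except ValueError:
    overlap_rejected = True
print("doc.ents rejected the overlap:", overlap_rejected)
```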

By default, medspaCy does not write into span groups and instead continues to use `doc.ents` for all output.

**NOTE:** spaCy's SpanGroups are a relatively new feature and do not currently have `displacy` support, which powers our visualizations. As a result, `visualize_ent` and `visualize_dep` do not work on span groups at this time.

## TargetMatcher

In [3]:
from medspacy.ner import TargetRule

In [4]:
target_rules = [
    TargetRule(literal="abdominal pain", category="PROBLEM"),
    TargetRule("stroke", "PROBLEM"),
    TargetRule("hemicolectomy", "TREATMENT"),
    TargetRule("Hydrochlorothiazide", "TREATMENT"),
    TargetRule("colon cancer", "PROBLEM"),
    TargetRule("metastasis", "PROBLEM"),
]

In [5]:
nlp = spacy.blank("en")

In [6]:
matcher = nlp.add_pipe("medspacy_target_matcher")

In [7]:
matcher.add(target_rules)

In [8]:
doc = nlp(text)

In [9]:
doc.ents

(Hydrochlorothiazide,
 Abdominal pain,
 stroke,
 abdominal pain,
 metastasis,
 Colon cancer,
 hemicolectomy,
 stroke,
 abdominal pain,
 abdominal pain)

The spans end up in `doc.ents` because the optional parameter `result_type` defaults to `"ents"`. Initializing the `TargetMatcher` with `result_type="group"` places all the resulting spans in a span group instead. By default, the span group is named `medspacy_spans`, but this can be overridden by passing the parameter `span_group_name="other_name"`.

In [10]:
nlp = spacy.blank("en")

In [11]:
matcher = nlp.add_pipe("medspacy_target_matcher", config={"result_type":"group"})

In [12]:
matcher.add(target_rules)

In [13]:
doc = nlp(text)

In [14]:
doc.ents

()

In [15]:
doc.spans["medspacy_spans"]

[Hydrochlorothiazide, Abdominal pain, stroke, abdominal pain, metastasis, Colon cancer, hemicolectomy, stroke, abdominal pain, abdominal pain]
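If a later step only reads `doc.ents` (for example, the `displacy`-based visualizers mentioned in the note above), one possible workaround is to copy the group into `doc.ents` after resolving overlaps. This sketch uses spaCy's `spacy.util.filter_spans`; the doc and the hand-built spans are hypothetical stand-ins for matcher output.

```python
import spacy
from spacy.tokens import Span
from spacy.util import filter_spans

demo_nlp = spacy.blank("en")
demo_doc = demo_nlp("History of metastatic colon cancer.")
# Hypothetical group contents: "metastatic colon cancer" overlaps "colon cancer"
demo_doc.spans["medspacy_spans"] = [
    Span(demo_doc, 2, 5, label="PROBLEM"),
    Span(demo_doc, 3, 5, label="PROBLEM"),
]

# filter_spans keeps the longest span from each overlapping set
# (ties go to the span that starts first); the span group itself is unchanged
demo_doc.ents = filter_spans(demo_doc.spans["medspacy_spans"])
print(demo_doc.ents)
print(len(demo_doc.spans["medspacy_spans"]))
```

Note that this discards the shorter overlapping spans from `doc.ents`, which is exactly the information a span group is designed to keep.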

## QuickUMLS

In [16]:
# TODO: add quickumls span group intro

## ConText

`ConText` does not produce entities by itself, so it does not need to place results in a SpanGroup. It does, however, need to know where entities from other sources, such as the `TargetMatcher` or `QuickUMLS`, are stored so that it can modify them correctly. Passing `input_span_type="group"` tells ConText to read its targets from the span group instead of `doc.ents`.

In [17]:
nlp = spacy.blank("en")
nlp.add_pipe("medspacy_pyrush")
matcher = nlp.add_pipe("medspacy_target_matcher", config={"result_type": "group"})
matcher.add(target_rules)

In [18]:
context = nlp.add_pipe("medspacy_context", config={"input_span_type": "group"})

In [19]:
doc = nlp(text)

In [20]:
doc.spans["medspacy_spans"]

[Hydrochlorothiazide, Abdominal pain, stroke, abdominal pain, metastasis, Colon cancer, hemicolectomy, stroke, abdominal pain, abdominal pain]

Iterating over the spans in this group and printing whether each one is negated, along with a window of three tokens on either side, shows that ConText is correctly setting attributes on the spans in the SpanGroup.

In [21]:
for span in doc.spans["medspacy_spans"]:
    print(span, span._.is_negated, span._.window(3), sep="  |  ")
    print()

Hydrochlorothiazide  |  False  |  Allergies:
Hydrochlorothiazide

Attending:[**First Name3

Abdominal pain  |  False  |  Complaint:
Abdominal pain

Major Surgical

stroke  |  False  |  and a recent stroke affecting her


abdominal pain  |  False  |  2 days of abdominal pain. Imaging shows

metastasis  |  True  |  no evidence of metastasis. She is

Colon cancer  |  False  |  
1. Colon cancer dx'd in [

hemicolectomy  |  False  |  , tx'd with hemicolectomy, XRT,

stroke  |  False  |  
Mother with stroke at age 82

abdominal pain  |  False  |  vomiting,
abdominal pain, shortness of

abdominal pain  |  False  |  of breath, abdominal pain or any




## Sectionizer

Like ConText, the sectionizer does not produce entities that get placed in SpanGroups. If you want the sectionizer to modify attributes of existing entities, though, it can read spans from the specified group when initialized with `input_span_type="group"`.

In [22]:
nlp = spacy.blank("en")
nlp.add_pipe("medspacy_pyrush")
matcher = nlp.add_pipe("medspacy_target_matcher", config={"result_type": "group"})
matcher.add(target_rules)

In [23]:
sectionizer = nlp.add_pipe("medspacy_sectionizer", config={"input_span_type": "group"})

In [24]:
doc = nlp(text)

Iterating over the predicted spans shows that the sectionizer correctly assigns the second `stroke` to the `family_history` section and sets `is_family` to `True`, even though `stroke` was placed in the SpanGroup rather than `doc.ents`.

In [25]:
for span in doc.spans["medspacy_spans"]:
    print(span, span._.is_family, span._.section_category, sep="  |  ")
    print()

Hydrochlorothiazide  |  False  |  allergy

Abdominal pain  |  False  |  chief_complaint

stroke  |  False  |  history_of_present_illness

abdominal pain  |  False  |  history_of_present_illness

metastasis  |  False  |  history_of_present_illness

Colon cancer  |  False  |  past_medical_history

hemicolectomy  |  False  |  past_medical_history

stroke  |  True  |  family_history

abdominal pain  |  False  |  patient_instructions

abdominal pain  |  False  |  patient_instructions



## Postprocessor

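The postprocessor can also operate on SpanGroups. Here it is added to the pipeline from the sectionizer example, with a rule that removes every span in the `patient_instructions` section, so the two `abdominal pain` spans from that section should no longer appear in the group.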

In [26]:
from medspacy.postprocess import Postprocessor, PostprocessingRule, PostprocessingPattern
from medspacy.postprocess import postprocessing_functions

In [27]:
postprocessor = nlp.add_pipe("medspacy_postprocessor", config={"input_span_type":"group"})

In [28]:
postprocess_rules = [
    # Instantiate our rule
    PostprocessingRule(
        # Pass in a list of patterns
        patterns=[
            # The pattern checks whether the entity's section is "patient_instructions"
            PostprocessingPattern(condition=lambda ent: ent._.section_category, success_value="patient_instructions"),
        ],
        # If all patterns are True, this entity will be removed.
        action=postprocessing_functions.remove_ent,
        description="Remove any entities from the instructions section.",
    ),
]

In [29]:
postprocessor.add(postprocess_rules)

In [30]:
doc = nlp(text)

In [31]:
for span in doc.spans["medspacy_spans"]:
    print(span, span._.section_category, sep="  |  ")
    print()

Hydrochlorothiazide  |  allergy

Abdominal pain  |  chief_complaint

stroke  |  history_of_present_illness

abdominal pain  |  history_of_present_illness

metastasis  |  history_of_present_illness

Colon cancer  |  past_medical_history

hemicolectomy  |  past_medical_history

stroke  |  family_history

