In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import sys
sys.path.insert(0, "..")

In [3]:
from medspacy.visualization import visualize_ent, visualize_dep
from helpers import ENT_COLORS

In [4]:
import warnings
warnings.filterwarnings("ignore")

# 2. Attribute Detection
After extracting entities, our next step is to assign attributes to them that describe their contextual meaning.

There are 3 main ways that we will set attributes:
- Using the `ConText` algorithm
- Identifying which section of a clinical note an entity occurs in
- Postprocessing (which will be covered in the next section)

In [5]:
from rehoused_nlp import build_nlp

In [7]:
%%capture
nlp = build_nlp()

for pipe in ('medspacy_postprocessor', 'document_classifier'):
    nlp.remove_pipe(pipe)

In [8]:
nlp.pipe_names

['tok2vec',
 'tagger',
 'parser',
 'attribute_ruler',
 'lemmatizer',
 'medspacy_concept_tagger',
 'medspacy_target_matcher',
 'medspacy_context',
 'medspacy_sectionizer']

## ConText
Clinical text often contains mentions of concepts whose meaning depends on their context in the text. For example, a mention may refer to the past or to a patient's goals, or it may be explicitly negated. Here are some examples involving housing status:

- "He does *not have* **stable housing**"
- "The patient was *previously* **homeless** but does *not have* any current **housing problems**."
- "Patient goals: She *would like to find* **an apartment**"

In all of these instances, we need to use the contextual clues around the entity to assert attributes like negation, experiencer, and uncertainty.

One method for this is the [ConText algorithm](https://www.sciencedirect.com/science/article/pii/S1532046409000744). ConText links target entities, such as problems, with semantic modifiers like the italicized phrases above. The medspaCy implementation of ConText is found in `medspacy.context`. More examples and documentation are available in [medspaCy's code](https://github.com/medspacy/medspacy).
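To make the idea of linking modifiers to targets concrete, here is a toy, pure-Python sketch of the core ConText step. It is not medspaCy's implementation: the `Target` class, the `MODIFIERS` lexicon, and both functions are hypothetical, matching is done on exact lowercased token sequences, and every modifier is assumed to scope forward over the rest of the text.

```python
from dataclasses import dataclass, field

@dataclass
class Target:
    """Hypothetical stand-in for an extracted entity."""
    text: str
    start: int  # token index where the target begins
    modifiers: list = field(default_factory=list)

# Hypothetical modifier lexicon: lowercased token sequence -> ConText category
MODIFIERS = {
    ("not", "have"): "NEGATED_EXISTENCE",
    ("previously",): "HISTORICAL",
    ("would", "like", "to", "find"): "HYPOTHETICAL",
}

def find_modifiers(tokens):
    """Return (start, end, category) for every modifier phrase found in tokens."""
    lowered = [t.lower() for t in tokens]
    hits = []
    for phrase, category in MODIFIERS.items():
        n = len(phrase)
        for i in range(len(lowered) - n + 1):
            if tuple(lowered[i:i + n]) == phrase:
                hits.append((i, i + n, category))
    return hits

def apply_context(tokens, targets):
    """Link each modifier to every target that starts after it (forward scope only)."""
    for _, end, category in find_modifiers(tokens):
        for target in targets:
            if target.start >= end:
                target.modifiers.append(category)
    return targets

tokens = "He does not have stable housing".split()
targets = apply_context(tokens, [Target("stable housing", start=4)])
print(targets[0].modifiers)  # ['NEGATED_EXISTENCE']
```

Because the scope here runs to the end of the text, running this on "The patient was previously homeless but does not have any current housing problems" would wrongly mark **housing problems** as historical too. Real ConText limits a modifier's scope to its sentence and can use rules with a `TERMINATE` direction (for example, a conjunction like "but") to cut that scope short.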

This section will demonstrate how ReHouSED uses ConText to handle different scenarios involving housing-related concepts. Let's start by looking at the first example shown above:

**"He does not have stable housing"**

If you only look at the entities being extracted, this sentence might appear to document stable housing. However, the context around the entity shows that it is being **negated**, and it is very important for our task to distinguish these scenarios.

**ConText** connects linguistic phrases like **"not have"**, called **modifiers**, with the entities in the sentences, also called **targets**. We can visualize the relationship between these concepts using the `visualize_dep` function:

In [9]:
text = "He does not have stable housing"
doc = nlp(text)

In [10]:
visualize_dep(doc)
visualize_ent(doc, colors=ENT_COLORS)

We can also see an entity's modifiers:

In [11]:
ent = doc.ents[0]

In [12]:
ent._.modifiers

(<ConTextModifier> [not have, NEGATED_EXISTENCE],)

When ConText links a negation phrase with an entity, it sets the entity's `is_negated` attribute to `True`:

In [13]:
ent._.is_negated

True
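Conceptually, this step is a lookup from a modifier's category to the boolean attribute it sets. The sketch below illustrates that idea in plain Python; the table, the function name, and especially the `IGNORE` category name are assumptions, not ReHouSED's actual configuration.

```python
# Hypothetical mapping from ConText categories to entity attributes.
CATEGORY_TO_ATTRIBUTE = {
    "NEGATED_EXISTENCE": "is_negated",
    "HISTORICAL": "is_historical",
    "HYPOTHETICAL": "is_hypothetical",
    "IGNORE": "is_ignored",  # assumed category name
}

def attributes_from_modifiers(categories):
    """Start every attribute at False, then set the ones implied by the modifiers."""
    attrs = {name: False for name in CATEGORY_TO_ATTRIBUTE.values()}
    for category in categories:
        name = CATEGORY_TO_ATTRIBUTE.get(category)
        if name is not None:  # categories without a mapped attribute are skipped
            attrs[name] = True
    return attrs

print(attributes_from_modifiers(["NEGATED_EXISTENCE"]))
# {'is_negated': True, 'is_historical': False, 'is_hypothetical': False, 'is_ignored': False}
```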

Some other important attributes are:
- `is_hypothetical`
- `is_historical`
- `is_ignored`

In [14]:
doc = nlp("He would like a house of his own.")
print("is_hypothetical:", doc.ents[0]._.is_hypothetical)
visualize_dep(doc)


is_hypothetical: True


In [15]:
doc = nlp("She was homeless in the past.")
print("is_historical:", doc.ents[0]._.is_historical)
visualize_dep(doc)


is_historical: True


In [16]:
doc = nlp("Do you live in an apartment?")
print("is_ignored:", doc.ents[0]._.is_ignored)
visualize_dep(doc)


is_ignored: True


This logic is implemented in the `ConTextComponent` pipeline component, where we can also see the rules defining modifiers (similar to the `TargetRules` in the previous notebook):

In [18]:
context = nlp.get_pipe("medspacy_context")
context

<medspacy.context.context_component.ConTextComponent at 0x7f999358b3d0>

In [19]:
context.rules[:10]

[ConTextRule(literal='absence of', category='NEGATED_EXISTENCE', pattern=None, direction='FORWARD'),
 ConTextRule(literal='adequate to rule out', category='NEGATED_EXISTENCE', pattern=[{'LOWER': {'IN': ['adequate', 'sufficient']}}, {'LOWER': 'to'}, {'LOWER': 'rule'}, {'LOWER': {'IN': ['him', 'her', 'them', 'patient', 'pt']}, 'OP': '?'}, {'LOWER': 'out'}, {'LOWER': {'IN': ['against', 'for']}, 'OP': '?'}], direction='FORWARD'),
 ConTextRule(literal='adequate to rule the patient out', category='NEGATED_EXISTENCE', pattern=[{'LOWER': {'IN': ['adequate', 'sufficient']}}, {'LOWER': 'to'}, {'LOWER': 'rule'}, {'LOWER': 'the'}, {'LOWER': {'IN': ['patient', 'pt']}}, {'LOWER': 'out'}, {'LOWER': {'IN': ['against', 'for']}, 'OP': '?'}], direction='FORWARD'),
 ConTextRule(literal='any other', category='NEGATED_EXISTENCE', pattern=None, direction='FORWARD'),
 ConTextRule(literal='apart from', category='NEGATED_EXISTENCE', pattern=[{'LOWER': 'apart'}, {'LOWER': {'IN': ['for', 'from']}}], direction='TE
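Each rule's `direction` controls which side of the modifier its scope extends to. The hypothetical function below sketches that logic for a single sentence using token offsets: `FORWARD` covers tokens after the modifier, `BACKWARD` tokens before it, `BIDIRECTIONAL` both, and any terminator (assumed here to be the offset of a match from a `TERMINATE` rule) cuts the scope at that point. This is a simplification under those assumptions, not medspaCy's code.

```python
def modifier_scope(sent_len, start, end, direction, terminators=()):
    """Return the (scope_start, scope_end) token span of a modifier at [start, end).

    sent_len: number of tokens in the sentence.
    terminators: token offsets of terminating phrases that cut the scope short.
    """
    scope_start, scope_end = 0, sent_len
    if direction == "FORWARD":
        scope_start = end
    elif direction == "BACKWARD":
        scope_end = start
    elif direction != "BIDIRECTIONAL":
        raise ValueError(f"unsupported direction: {direction}")
    for t in terminators:
        # A terminator after the modifier ends its forward reach.
        if end <= t < scope_end:
            scope_end = t
        # A terminator before the modifier ends its backward reach.
        if scope_start <= t < start:
            scope_start = t + 1
    return scope_start, scope_end

# "The patient was previously homeless but does not have any current housing problems"
#  tokens 0..12, with "but" at token 5 acting as a terminator
print(modifier_scope(13, 3, 4, "FORWARD", terminators=(5,)))  # (4, 5)
print(modifier_scope(13, 7, 9, "FORWARD", terminators=(5,)))  # (9, 13)
```

With "but" at token 5 as a terminator, "previously" only reaches **homeless** (token 4), while "not have" reaches **housing problems** (tokens 11 and 12).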

## Section detection

In addition to the sentence around an entity, the location of a concept in a clinical note can indicate certain attributes. For example, the **Past Medical History** and **Problem List** sections often list problems a patient has had in the past that are not necessarily still active. In the context of housing, sections like **Housing Status** clearly document a patient's current housing situation, while **Patient Goals** might tell us that they are working to find stable housing.

Consider the example below. When we visualize this document, the section headers will be highlighted in gray. (Note that when they overlap with a ConText modifier or entity they will be visualized twice.)

In [20]:
text = """
History of present illness: Veteran is here to discuss his housing situation.

PmHX: 
- Pneumonia
- Afib
- Homelessness

Housing Situation: staying in a shelter.

Patient Goals: Stable housing

Assessment/Plan: The patient will continue to work towards finding stable housing.
"""

In [21]:
doc = nlp(text)
visualize_ent(doc, colors=ENT_COLORS)

Here is an interpretation of the note:
- In the **History of Present Illness (HPI)**, the author sets up the reason for this visit. In this case, they will be discussing housing, which shows that this note will be relevant to our task, although the phrase **"housing situation"** doesn't give any actual information about what their housing status is
- In the **Past Medical History**, we see **"Homelessness"**. However, this just means that the patient was homeless at one point in the past, not necessarily now
- The **Housing Situation** section gives us clear and concise information: the patient is currently housed in a shelter
- The **Goals** section implies that the patient would like to work towards getting stable housing (which also might imply that they do not have it now)
- Finally, the **Assessment/Plan (A/P)** section summarizes the visit and next steps, indicating that the patient is going to work towards stable housing

For our system to make use of this structural information, we look at the following attributes:
- Similar to ConText, attributes like `is_hypothetical` and `is_historical` are set when they occur in certain sections, like **Goals** and **Past Medical History**
- We can also access the specific section an entity occurred in with the `section_category` attribute. This will be useful when we try to infer a document classification
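One way to picture how section membership feeds these attributes is a small table from section category to implied attributes, merged with whatever ConText has already set. The table and function below are a hypothetical sketch based only on the two sections discussed above.

```python
# Hypothetical section -> implied attributes table (categories as printed below).
SECTION_ATTRIBUTES = {
    "past_medical_history": {"is_historical": True},
    "patient_goals": {"is_hypothetical": True},
}

def apply_section_attributes(attrs, section_category):
    """Return a copy of attrs with any section-implied attributes set to True."""
    updated = dict(attrs)  # leave the caller's dict untouched
    updated.update(SECTION_ATTRIBUTES.get(section_category, {}))
    return updated

base = {"is_historical": False, "is_hypothetical": False}
print(apply_section_attributes(base, "past_medical_history"))
# {'is_historical': True, 'is_hypothetical': False}
```

Sections that are not in the table, such as `housing_status`, leave the attributes unchanged.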

In [22]:
for ent in doc.ents:
    print(ent)
    print(ent._.section_category)
    print("is_historical:", ent._.is_historical)
    print("is_hypothetical:", ent._.is_hypothetical)
    print()

housing situation
history_of_present_illness
is_historical: False
is_hypothetical: False

Homelessness
past_medical_history
is_historical: True
is_hypothetical: False

Housing Situation
housing_status
is_historical: False
is_hypothetical: False

shelter
housing_status
is_historical: False
is_hypothetical: False

Stable housing
patient_goals
is_historical: False
is_hypothetical: True

Plan:
observation_and_plan
is_historical: False
is_hypothetical: False

stable housing
observation_and_plan
is_historical: False
is_hypothetical: True



Section detection is implemented in the `Sectionizer` component:

In [23]:
sectionizer = nlp.get_pipe("medspacy_sectionizer")
sectionizer

<medspacy.section_detection.sectionizer.Sectionizer at 0x7f999360c700>

In [24]:
sectionizer.rules[:10]

[SectionRule(literal="ADDENDUM:", category="addendum", pattern=None, on_match=None, parents=[], parent_required=False),
 SectionRule(literal="Addendum:", category="addendum", pattern=None, on_match=None, parents=[], parent_required=False),
 SectionRule(literal="ALLERGIC REACTIONS:", category="allergy", pattern=None, on_match=None, parents=[], parent_required=False),
 SectionRule(literal="ALLERGIES:", category="allergy", pattern=None, on_match=None, parents=[], parent_required=False),
 SectionRule(literal="Allergies", category="allergies", pattern=None, on_match=None, parents=[], parent_required=False),
 SectionRule(literal="CC:", category="chief_complaint", pattern=None, on_match=None, parents=[], parent_required=False),
 SectionRule(literal="CHIEF COMPLAINT:", category="chief_complaint", pattern=None, on_match=None, parents=[], parent_required=False),
 SectionRule(literal="Chief Complaint:", category="chief_complaint", pattern=None, on_match=None, parents=[], parent_required=False),
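As a rough illustration of what rules like these do, here is a hypothetical regex-based splitter in plain Python. It maps header literals to categories (the header table is invented for the example note above), matches headers case-insensitively anywhere in the text, and assigns everything up to the next header to that section. The real `Sectionizer` also supports token patterns and parent sections, so treat this only as a sketch.

```python
import re

# Hypothetical header table: literal header -> section category
SECTION_RULES = {
    "History of present illness:": "history_of_present_illness",
    "PmHX:": "past_medical_history",
    "Housing Situation:": "housing_status",
    "Patient Goals:": "patient_goals",
    "Assessment/Plan:": "observation_and_plan",
}

def split_sections(text):
    """Return (category, body) pairs; text before the first header has category None."""
    lookup = {header.lower(): category for header, category in SECTION_RULES.items()}
    # Try longer headers first so a shorter header cannot shadow a longer one.
    headers = sorted(SECTION_RULES, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(h) for h in headers), re.IGNORECASE)
    sections = []
    category, body_start = None, 0
    for match in pattern.finditer(text):
        body = text[body_start:match.start()].strip()
        if category is not None or body:  # skip an empty preamble
            sections.append((category, body))
        category = lookup[match.group(0).lower()]
        body_start = match.end()
    sections.append((category, text[body_start:].strip()))
    return sections

note = """History of present illness: Veteran is here to discuss his housing situation.
PmHX:
- Homelessness
Housing Situation: staying in a shelter.
Patient Goals: Stable housing
"""
for category, body in split_sections(note):
    print(category, "|", body)
```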
 

## `is_asserted`
Another important attribute, which isn't set explicitly by any rule, is `is_asserted`. It returns `True` only if all of the other primary attributes are `False`. This essentially means that an entity can be taken at face value: you don't need to look at specific attributes or context to understand its semantics.
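A derived attribute like this fits naturally as a computed property rather than a stored flag. The sketch below uses a plain dataclass as a hypothetical stand-in for an entity, and which attributes count as "primary" is an assumption. In spaCy itself, a computed attribute of this kind would typically be registered with `Span.set_extension(..., getter=...)`.

```python
from dataclasses import dataclass

@dataclass
class EntityAttributes:
    """Hypothetical stand-in for an entity's ConText/section attributes."""
    is_negated: bool = False
    is_historical: bool = False
    is_hypothetical: bool = False
    is_ignored: bool = False

    @property
    def is_asserted(self) -> bool:
        # Asserted only when none of the (assumed) primary attributes is set.
        return not (self.is_negated or self.is_historical
                    or self.is_hypothetical or self.is_ignored)

print(EntityAttributes().is_asserted)                 # True
print(EntityAttributes(is_negated=True).is_asserted)  # False
```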

In [25]:
doc = nlp("He is living in stable housing.")
print(doc.ents[0])
print("is_asserted:", doc.ents[0]._.is_asserted)

living in stable housing
is_asserted: True


In [26]:
doc = nlp("He is not living in stable housing.")
print(doc.ents[0])
print("is_asserted:", doc.ents[0]._.is_asserted)

living in stable housing
is_asserted: False
