# Identify Conflicting Context Rules

In this example, spaCy is used with the en_ner_bc5cdr_md model (trained on the BC5CDR corpus) to extract named entities from a string of text taken from a sample medical document. In the real world, this document could be much longer. The medspacy_context pipeline component then adds context tags to the identified entities to give them additional meaning.


First, the string is processed using the pretrained model and the default context rules. Additional context modifiers are then applied to demonstrate how to find conflicting rules that may produce unexpected or undesired results.

## Part 1 - Install the pretrained model and dependencies.
### Install Python Dependencies

In [None]:
!python -m pip install pandas
!python -m pip install spacy
!python -m pip install medspacy

### Install Pretrained Model

In [None]:
!python -m pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_ner_bc5cdr_md-0.5.1.tar.gz

## Part 2 - Load the document and process with pretrained model + context pipeline.

The medical_doc variable in this example holds a single sentence that would normally be part of a larger document; this keeps the demonstration simple. Named entities are extracted with the en_ner_bc5cdr_md model, and the medspacy_context pipeline component tags those entities with context modifiers.

### Load Document Sentence

In [3]:
medical_doc = "REASON FOR NEUROLOGICAL CONSULTATION,  Muscle twitching, progressive dizziness, clumsiness, progressive pain syndrome, and gait disturbance."

### Load spaCy and medspaCy + Context Pipeline

In [4]:
import spacy 
import medspacy
from medspacy.context import ConText, ConTextRule
from medspacy.visualization import visualize_ent, visualize_dep

nlp = spacy.load("en_ner_bc5cdr_md")
nlp.add_pipe("medspacy_context")

<medspacy.context.context.ConText at 0x7f78d22dca10>

In [5]:
doc = nlp(medical_doc)
visualize_ent(doc)

## Part 3 - Create a new context rule.

In the previous example, none of the named entities had a context modifier applied. The next step shows how to create context rules. context_rule2 is included to demonstrate that it will not be attached to any entity, because it is cancelled by context_rule1.


After running the last cell of this section, a context modifier is detected and our entities are tagged appropriately.

### Modify Context Rules

In [6]:
context = ConText(nlp, rules=None)
context_rule1 = ConTextRule("REASON FOR", "PSEUDO_MODIFIER_0", direction="FORWARD", pattern="reason for")
context_rule2 = ConTextRule("REASON FOR", "PSEUDO_MODIFIER_1", direction="FORWARD", pattern="reason") # Note the first matching rule is used
context_rule3 = ConTextRule("PROGRESSIVE", "PSEUDO_MODIFIER_2", direction="BIDIRECTIONAL", pattern=[{"LOWER": "progressive"}], max_targets=1, terminated_by={"PUNC"})
context.add([context_rule1, context_rule2, context_rule3])
context.rules

doc = nlp(medical_doc)
context(doc)

###########################################################

from IPython.display import display, HTML

display(HTML('<h3>Entity Visualization</h3>Note: PSEUDO_MODIFIER_1 does not appear because it was cancelled by PSEUDO_MODIFIER_0.<hr>'))
visualize_ent(doc)

display(HTML('<h3>Dependency Visualization</h3><hr>'))
visualize_dep(doc)

display(HTML('<h3>Modifiers</h3>Note that both "reason" rules are valid; however, the first overrides the second, so of the two only "PSEUDO_MODIFIER_0" is applied to each entity.<hr>'))
import pandas as pd
pd_mods = pd.DataFrame([(x.text, [(y.rule.category, y.rule.direction, y.rule.pattern) for y in x._.modifiers]) for x in doc.ents], columns=["Ent", "Active_Rule"])
pd_mods = pd_mods.set_index("Ent")
pd_mods


| Ent | Active_Rule |
| --- | --- |
| Muscle twitching | [(PSEUDO_MODIFIER_0, FORWARD, reason for)] |
| dizziness | [(PSEUDO_MODIFIER_0, FORWARD, reason for), (PS... |
| clumsiness | [(PSEUDO_MODIFIER_0, FORWARD, reason for)] |
| pain syndrome | [(PSEUDO_MODIFIER_0, FORWARD, reason for), (PS... |


## Part 4 - Find conflicting context rules.

In the example above, PSEUDO_MODIFIER_1 is missing while PSEUDO_MODIFIER_0 is present. In this part, each context rule is extracted and run on its own. Every rule that matches in isolation is saved to a list and compared with the "active" rules that remain when all context rules are used together.

This is useful to identify when a rule may be valid, but was cancelled by another rule.

In this case, "reason for" matches the pattern in PSEUDO_MODIFIER_0, so the overlapping word "reason" is not also tagged as PSEUDO_MODIFIER_1.
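The interaction between the two "reason" rules can be pictured with a small, library-free sketch. It is not medspaCy's implementation: `find_matches` and `prune_overlaps` are hypothetical helpers, and the tie-breaking policy (prefer the longer span, then the rule added first) is an assumption chosen for illustration. For this sentence, that policy and the "first matching rule" reading give the same outcome.

```python
# Toy illustration only: hypothetical helpers, not medspaCy internals.
RULES = [
    ("PSEUDO_MODIFIER_0", ["reason", "for"]),
    ("PSEUDO_MODIFIER_1", ["reason"]),
]

def find_matches(tokens, rules):
    """Return (category, start, end) for every place a rule's tokens occur."""
    matches = []
    for category, pattern in rules:
        n = len(pattern)
        for start in range(len(tokens) - n + 1):
            if tokens[start:start + n] == pattern:
                matches.append((category, start, start + n))
    return matches

def prune_overlaps(matches):
    """Keep one match per overlapping group.

    Assumed policy: longest span first, then earliest in rule order.
    """
    kept = []
    # sorted() is stable, so equal-length matches keep their rule order.
    for match in sorted(matches, key=lambda m: m[1] - m[2]):
        _, start, end = match
        if all(end <= k_start or start >= k_end for _, k_start, k_end in kept):
            kept.append(match)
    return kept

tokens = "reason for neurological consultation".split()
all_matches = find_matches(tokens, RULES)
kept = prune_overlaps(all_matches)
cancelled = [m for m in all_matches if m not in kept]

print("kept:", kept)
print("cancelled:", cancelled)
```

Both rules match when considered alone, but only one survives once overlapping spans are resolved. This gap between "matches in isolation" and "active in the full rule set" is what the cells in this part look for.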

### Find All Potential Context Modifications for Each Entity

In [8]:
# Find All Context Modifications For Each Entity

mod_ents = []

for rule in context.rules:
    itr_context = ConText(nlp, rules=None)
    itr_context.add([rule])    
    
    doc = nlp(medical_doc)
    itr_context(doc)    
    
    mod_ents.extend([(x.text, [(y.rule.category, y.rule.direction, y.rule.pattern) for y in x._.modifiers]) for x in doc.ents])

pd_all_mods = pd.DataFrame(mod_ents, columns=["Ent", "All_Rule"])

In [9]:
# Group And Flatten Ent Modifications

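def flatten_nested_list(items):
    # Flatten a list of lists into one list, keeping each rule tuple only once
    results = []
    for item in items:
        for rule in item:
            if rule not in results:
                results.append(rule)
    return results

# Collect every rule that matched each entity when the rules were run one at a time
pd_all_mods_grouped = pd_all_mods.groupby('Ent').agg(lambda x: flatten_nested_list(list(x)))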


# Join the active rules with all potential rules for each entity
pd_mods_joined = pd_mods.join(pd_all_mods_grouped)


# Find rules that matched in isolation but are not active
def find_overridden(all_rules, active_rules):
    # Build a new list instead of mutating the DataFrame cell in place
    return [rule for rule in all_rules if rule not in active_rules]

pd_mods_joined['All_Rule'] = pd_mods_joined.apply(lambda x: find_overridden(x['All_Rule'], x['Active_Rule']), axis=1)
pd_mods_joined = pd_mods_joined.rename(columns={"All_Rule": "Cancelled_Rule"}, errors="raise")
pd_mods_joined

| Ent | Active_Rule | Cancelled_Rule |
| --- | --- | --- |
| Muscle twitching | [(PSEUDO_MODIFIER_0, FORWARD, reason for)] | [(PSEUDO_MODIFIER_1, FORWARD, reason)] |
| dizziness | [(PSEUDO_MODIFIER_0, FORWARD, reason for), (PS... | [(PSEUDO_MODIFIER_1, FORWARD, reason)] |
| clumsiness | [(PSEUDO_MODIFIER_0, FORWARD, reason for)] | [(PSEUDO_MODIFIER_1, FORWARD, reason)] |
| pain syndrome | [(PSEUDO_MODIFIER_0, FORWARD, reason for), (PS... | [(PSEUDO_MODIFIER_1, FORWARD, reason)] |
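
The same comparison can be expressed without pandas, which may be easier to drop into a unit test. The sketch below is an illustration rather than part of the notebook: `find_cancelled` is a hypothetical helper, and the input dictionaries are typed in by hand from the two fully displayed rows of the output above instead of being computed by medspaCy.

```python
# Hypothetical helper; inputs are hand-copied from the displayed rows,
# not produced by running medspaCy.
def find_cancelled(all_rules, active_rules):
    """Map each entity to the rules that matched alone but are not active."""
    return {
        ent: [rule for rule in candidates if rule not in active_rules.get(ent, [])]
        for ent, candidates in all_rules.items()
    }

rule0 = ("PSEUDO_MODIFIER_0", "FORWARD", "reason for")
rule1 = ("PSEUDO_MODIFIER_1", "FORWARD", "reason")

# Rules that matched when each rule was run on its own
all_rules = {"Muscle twitching": [rule0, rule1], "clumsiness": [rule0, rule1]}
# Rules that remained when all rules were run together
active_rules = {"Muscle twitching": [rule0], "clumsiness": [rule0]}

cancelled = find_cancelled(all_rules, active_rules)
for ent, rules in cancelled.items():
    print(ent, "->", rules)
```

Returning a new dictionary leaves both inputs untouched, so the "all" and "active" collections can still be inspected after the comparison.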
