In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import os, sys, glob
sys.path.append("..")

# Knowledge Representation and Customization
## Overview
To process multiple types of clinical notes, we designed this pneumonia NLP system to be flexible and easy to modify. We also wanted it to generalize across healthcare institutions (first the VA, then the University of Utah). This notebook covers how knowledge is represented in the pipeline and how additional resources can be added to adapt it to a new institution.

In [3]:
from medspacy_pna import build_nlp
from medspacy_pna.display import create_html
from medspacy.visualization import visualize_ent, visualize_dep
from IPython.display import HTML
import json

In [4]:
RESOURCES_DIR = "../medspacy_pna/resources/"

The `build_nlp` function takes a domain name (here `"emergency"`) and loads that domain's default rules into the pipeline. Each rule-based component exposes its rules through a `rules` attribute. Let's print the first rule in each component:

In [5]:
%%capture
nlp_emergency = build_nlp("emergency")

In [6]:
for name, pipe in nlp_emergency.pipeline:
    if "medspacy" in name:
        print(name)
        print(pipe.rules[0])
        print()

medspacy_concept_tagger
TargetRule(literal="pneumonia", category="PNEUMONIA", pattern=pneumonias?|pna, attributes=None, on_match=None)

medspacy_target_matcher
TargetRule(literal="pneumonia", category="PNEUMONIA", pattern=[{'_': {'concept_tag': {'IN': ['FOCAL', 'INFILTRATE', 'COVID']}}, 'OP': '*'}, {'_': {'concept_tag': 'PNEUMONIA'}, 'OP': '+'}], attributes=None, on_match=None)

medspacy_context
ConTextRule(literal='absence of', category='NEGATED_EXISTENCE', pattern=None, direction='FORWARD')

medspacy_sectionizer
SectionRule(literal="ADDENDUM:", category="addendum", pattern=None, on_match=None, parents=[], parent_required=False)

medspacy_postprocessor
PostprocessingRule: None - Disambiguate between 'impression' meaning imaging and A/P



The rules for each pipeline are stored under:
```
- medspacy_pna/resources
    - clinical/
        - *.json
        - ... 
    - common/
    - configs/
    - discharge/
    - emergency/
    - radiology/
```

Most rules are stored as JSON files. The exception is PostprocessingRules, which are stored as Python modules and imported directly. The `discharge`, `emergency`, and `radiology` subdirectories contain rules that are specific to each of these note types. The `common` rules are used in all three pipelines, while `clinical` rules are used for emergency and discharge notes but not radiology.

The files under `configs` specify, for each pipeline, the paths of the rule files loaded by each component:

In [7]:
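# Sort the paths: glob returns files in arbitrary order, and a later cell indexes into this list
resource_fps = sorted(glob.glob(os.path.join(RESOURCES_DIR, "configs", "*.json")))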
print(resource_fps)

['../medspacy_pna/resources/configs\\discharge.json', '../medspacy_pna/resources/configs\\emergency.json', '../medspacy_pna/resources/configs\\radiology.json']


In [8]:
with open(resource_fps[0]) as f:
    config = json.load(f)
config

{'domain': 'discharge',
 'resources': [{'concept_tagger': ['common/anatomy_concept_tag_rules.json',
    'common/concept_tag_rules.json'],
   'target_matcher': ['common/target_rules.json',
    'emergency/emergency_target_rules.json',
    'discharge/discharge_target_rules.json'],
   'context': ['common/context_rules.json',
    'discharge/discharge_context_rules.json',
    'clinical/clinical_context_rules.json',
    'common/anatomy_descriptor_modifier_rules.json'],
   'sectionizer': ['clinical/clinical_section_rules.json',
    'discharge/discharge_section_rules.json']}]}
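
The paths in a config are relative to the resources directory. As a minimal sketch of how they could be resolved to full paths, the snippet below uses a hypothetical helper, `resolve_rule_paths`, which is not part of `medspacy_pna`. It assumes only the config structure printed above: a `resources` list of dicts that map component names to relative file paths.

```python
import os


def resolve_rule_paths(config, resources_dir):
    """Map each component name to the full paths of its rule files.

    Hypothetical helper for illustration; not part of medspacy_pna.
    """
    resolved = {}
    # "resources" is a list of dicts keyed by component name
    for component_map in config["resources"]:
        for component, rel_paths in component_map.items():
            resolved.setdefault(component, []).extend(
                os.path.join(resources_dir, rel_path) for rel_path in rel_paths
            )
    return resolved


# Abbreviated copy of the discharge config shown above
example_config = {
    "domain": "discharge",
    "resources": [
        {
            "concept_tagger": [
                "common/anatomy_concept_tag_rules.json",
                "common/concept_tag_rules.json",
            ],
            "sectionizer": [
                "clinical/clinical_section_rules.json",
                "discharge/discharge_section_rules.json",
            ],
        }
    ],
}

paths = resolve_rule_paths(example_config, "../medspacy_pna/resources/")
for component, fps in paths.items():
    print(component, len(fps))
```

Grouping the full paths by component makes it easy to check which files a given pipeline would read before building it.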

In [9]:
with open(os.path.join(RESOURCES_DIR, config["resources"][0]["sectionizer"][0])) as f:
    print(f.read(500))

{
    "section_rules": [
        {
            "parents": [],
            "category": "addendum",
            "parent_required": false,
            "literal": "ADDENDUM:"
        },
        {
            "parents": [],
            "category": "addendum",
            "parent_required": false,
            "literal": "Addendum:"
        },
       {
            "parents": [],
            "category": "addendum",
            "parent_required": false,
            "literal": "<<ADDENDUM>>:",
        "me


## Customizing
When moving to a different institution, you may encounter different note structures and documentation styles. By adding rules, you can adapt to these differences while still leveraging the rules and logic implemented in the default system.

The code below adds resources specific to the University of Utah, which are included under `resources/utah_resources`. These additional rules consist mainly of new section rules that fit the University of Utah's Epic EHR. New rules can be added by imitating this structure and adding them as shown below.

In [10]:
from medspacy_pna.util import add_additional_resources

In [11]:
utah_resources_dir = os.path.join(RESOURCES_DIR, "utah_resources")

In [12]:
sectionizer = nlp_emergency.get_pipe("medspacy_sectionizer")
print(len(sectionizer.rules))
print(sectionizer.rules[-1])

225
SectionRule(literal="REASON FOR ADMISSION (H&P):", category="history_of_present_illness", pattern=None, on_match=None, parents=[], parent_required=False)


In [13]:
%%capture
add_additional_resources(nlp_emergency, domain="emergency", 
                         resources_dir=utah_resources_dir)

In [14]:
print(len(sectionizer.rules))
print(sectionizer.rules[-1:])

246
[SectionRule(literal="Resolved problems:", category="diagnoses", pattern=None, on_match=None, parents=['discharge_diagnoses'], parent_required=False)]
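
To author section rules for another site, one option is to write a JSON file with the same shape as the section rule files shown earlier (a top-level `section_rules` list whose entries carry `literal`, `category`, `parents`, and `parent_required`). The sketch below is illustrative only: the literal, the category, and the file name are hypothetical, and the directory layout that `add_additional_resources` expects is not shown in this notebook, so the file is written to a temporary directory rather than to a real resources folder.

```python
import json
import os
import tempfile

# Hypothetical rule for an institution-specific section header; the literal
# and category are illustrative and not taken from the shipped resources.
new_section_rules = {
    "section_rules": [
        {
            "literal": "Hospital Course:",
            "category": "hospital_course",
            "parents": [],
            "parent_required": False,
        },
    ]
}

# Write to a temporary directory. Where the file must live for
# add_additional_resources to find it is an assumption left open here.
out_dir = tempfile.mkdtemp()
out_fp = os.path.join(out_dir, "my_site_section_rules.json")
with open(out_fp, "w") as f:
    json.dump(new_section_rules, f, indent=4)

# Read the file back to confirm it is valid JSON with the expected keys
with open(out_fp) as f:
    loaded = json.load(f)
print(loaded["section_rules"][0]["literal"])
```

Mirroring the layout of `resources/utah_resources` is probably the safest way to place such a file, since that directory is known to load with `add_additional_resources`.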
