In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import sys
sys.path.insert(0, "..")

In [3]:
from rehoused_nlp import build_nlp, calculate_rehoused, visualize_doc_classification
from medspacy.visualization import visualize_ent, visualize_dep

from helpers import ENT_COLORS

In [4]:
import warnings
warnings.filterwarnings("ignore")

In [5]:
%%capture
nlp = build_nlp()

# Appendix. Customizing ReHouSED NLP
Like any clinical NLP system, this model's performance will vary considerably with your data and your specific task. The system implemented in this package approximates the one used in the manuscript, but it was modified to be more general and to remove references to VA-specific documentation practices. If you apply it to a new dataset, you will need to adapt it to your EHR, to the language used in your clinical documents, and to any changes in the definitions.

## Resource files
Most of the system's logic lives in the `resources` directory: `rehoused_nlp/resources/*`. Each file contains rules for one of the components described in the notebooks. Most are `.py` files, although many rules can also be stored as `.json` files (the exceptions are `postprocessing` rules and rules that use more advanced callback functions). Each file in the `target_rules` subfolder contains rules for a different entity class.
```
- rehoused_nlp/
    - resources/
        - target_rules/
            - doubling_up.py
            - evidence_of_homelessness.py
            - ...
        - callbacks.py
        - concept_tag_rules.py
        - context_rules.py
        - postprocess_rules.py
        - preprocess_rules.py
        - section_rules.py
    - ...
```

We didn't discuss `preprocess_rules` or `callbacks` in these notebooks, but the medspaCy repo contains examples and documentation.

## Loading rules
The helper function `rehoused_nlp.utils.build_nlp()` handles instantiating the NLP pipeline and adding rules, but you can always load a model and add rules manually yourself (again, see medspaCy for more examples).

## Adding rules programmatically
The best way to customize rules is to edit or create resource files like the ones listed above. But you can also add them directly to pipeline components. Each of the examples below will show how to add a rule to one of the components we discussed.

### TargetMatcher

In [7]:
from medspacy.target_matcher import TargetRule

target_matcher = nlp.get_pipe("medspacy_target_matcher")
# Add a phrase for a specific homelessness shelter
rule = TargetRule("SLC Downtown Shelter for the Homeless", "TEMPORARY_HOUSING")
target_matcher.add([rule])

visualize_ent(nlp("He is staying at SLC Downtown Shelter for the Homeless."), colors=ENT_COLORS)

### ConText

In [9]:
from medspacy.context import ConTextRule

context = nlp.get_pipe("medspacy_context")
# Add a phrase for matching dates and considering them historical
rule = ConTextRule("in Xxx 20xx", "HISTORICAL", direction="BIDIRECTIONAL",
                  pattern=[
                      {"LOWER": "in"},
                      {"OP": "?"},
                      {"LOWER": {"REGEX": r"20[01]\d$"}}
                  ]
                  )
context.add([rule])

visualize_dep(nlp("He was homeless in September 2016."))
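The last token pattern in the rule above decides which years trigger the `HISTORICAL` modifier. The sketch below checks that expression against a few strings using only the standard library. It assumes the token-level `REGEX` check behaves like `re.search` on the lowercased token text; `token_matches` is a hypothetical helper written for this illustration and is not part of medspaCy or spaCy.

```python
import re

# The same expression used in the final token pattern of the ConText rule.
YEAR_PATTERN = re.compile(r"20[01]\d$")


def token_matches(token_text: str) -> bool:
    """Approximate the token-level REGEX check (assumed to act like re.search)."""
    return YEAR_PATTERN.search(token_text.lower()) is not None


# Print which candidate tokens the expression accepts.
for text in ["2016", "2009", "2023", "1999", "12016"]:
    print(text, token_matches(text))
```

Under that assumption, the expression accepts years from 2000 through 2019 but not 2020 or later, and because it has no leading `^` anchor it also accepts a longer digit string that merely ends in such a year. If your notes mention more recent dates, you may want to widen the character class and anchor the start of the expression.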

### Section detection

In [11]:
from medspacy.section_detection import SectionRule

sectionizer = nlp.get_pipe("medspacy_sectionizer")
# Add a specific note header
rule = SectionRule("Previous medical information:", "past_medical_history")
sectionizer.add([rule])

visualize_ent(nlp("Previous medical information: Homelessness"), colors=ENT_COLORS)

### Postprocessing

In [12]:
from medspacy.postprocess import PostprocessingRule, PostprocessingPattern
from rehoused_nlp.resources.postprocess_rules import set_ignored


postprocessor = nlp.get_pipe("medspacy_postprocessor")
postprocessor.debug = True
# Add a rule to ignore mentions of entities in the "patient_instructions" section
text = "Discharge instructions: Learn more about resources for stable housing."

print("Before:")
visualize_doc_classification(nlp(text))

rule = PostprocessingRule(
    patterns=[
        PostprocessingPattern(lambda ent: ent._.section_category == "patient_instructions")
    ],
    action=set_ignored,
    action_args=(True,)
)

postprocessor.add([rule])
print("After:")
visualize_doc_classification(nlp(text), colors=ENT_COLORS)

Before:
stable housing



After:
stable housing
Passed: PostprocessingRule: None - None on ent: stable housing Discharge instructions: Learn more about resources for stable housing.



## Saving rules
As mentioned above, some rules have to be saved as `.py` files. But `TargetRule`, `ConTextRule`, and `SectionRule` objects can be saved as JSON files, which is convenient for sharing rules with systems other than medspaCy:

In [13]:
rules = target_matcher.rules[:5]

In [14]:
TargetRule.to_json(rules, "./example_target_rules.json")

In [15]:
import json
with open("./example_target_rules.json") as f:
    print(json.load(f))

{'target_rules': [{'literal': 'homeless', 'category': 'EVIDENCE_OF_HOMELESSNESS', 'pattern': [{'LOWER': {'REGEX': 'homeless'}}]}, {'literal': 'chronic homelessness', 'category': 'EVIDENCE_OF_HOMELESSNESS', 'pattern': [{'LOWER': {'REGEX': '^chronic'}}, {'LOWER': {'REGEX': '^homeless'}}]}, {'literal': 'literally homeless', 'category': 'EVIDENCE_OF_HOMELESSNESS'}, {'literal': 'homeless veteran', 'category': 'EVIDENCE_OF_HOMELESSNESS'}, {'literal': 'sleep in <HOMELESS_LOCATION>', 'category': 'EVIDENCE_OF_HOMELESSNESS', 'pattern': [{'_': {'concept_tag': 'RESIDES'}, 'OP': '+'}, {'OP': '?'}, {'_': {'concept_tag': 'HOMELESS_LOCATION'}}]}]}
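Because the saved file is plain JSON, it can be inspected without medspaCy. The sketch below uses only the standard library to count rules per category and to list the rules that have a literal but no token pattern. The `sample` dictionary is a hypothetical stand-in that mirrors the structure printed above, and `summarize_rules` is an illustrative helper, not part of the package.

```python
import json
from collections import Counter

# Hypothetical sample mirroring the structure of the saved rule file.
sample = {
    "target_rules": [
        {
            "literal": "homeless",
            "category": "EVIDENCE_OF_HOMELESSNESS",
            "pattern": [{"LOWER": {"REGEX": "homeless"}}],
        },
        {"literal": "literally homeless", "category": "EVIDENCE_OF_HOMELESSNESS"},
        {
            "literal": "SLC Downtown Shelter for the Homeless",
            "category": "TEMPORARY_HOUSING",
        },
    ]
}


def summarize_rules(data):
    """Count rules per category and list literals that have no token pattern."""
    rules = data["target_rules"]
    counts = Counter(rule["category"] for rule in rules)
    literal_only = [rule["literal"] for rule in rules if "pattern" not in rule]
    return counts, literal_only


# Round-trip through a JSON string to mimic reading the file from disk.
counts, literal_only = summarize_rules(json.loads(json.dumps(sample)))
print(dict(counts))
print(literal_only)
```

To run this against the real file, replace the round-trip with `json.load` on the opened `example_target_rules.json`.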
