In [1]:
%load_ext autoreload
%autoreload 2

In [26]:
import warnings
warnings.filterwarnings("ignore")

In [2]:
import sys
sys.path.insert(0, "..")

# Background
These notebooks give a brief overview of the NLP system described in "ReHouSED: A Novel Measurement of Veteran Housing Stability Using Natural Language Processing" by Chapman et al. 

This manuscript describes a methodology for classifying Veteran housing stability using information in clinical texts. The NLP system is implemented in Python using [medspaCy](https://github.com/medspacy/medspacy). This package medspaCy requires spaCy 2.2.5 and is not currently compatible with spaCy 3. 

This methodology produces a classification at two levels: **document level** and **patient level**. The document-level measure assigns an individual clinical text to one of three discrete classes: **"STABLY_HOUSED"**, **"UNSTABLY_HOUSED"**, or **"UNKNOWN"**. The patient-level measure aggregates multiple clinical documents for a patient within a given time window (e.g., 30 days) and calculates the proportion of notes classified as "STABLY_HOUSED" out of all notes classified as either "STABLY_HOUSED" or "UNSTABLY_HOUSED". This patient-level score is termed **Relative Housing Stability in Electronic Documentation** or ***ReHouSED***.
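
As a rough illustration of the patient-level calculation described above, the sketch below computes the proportion from a list of document classifications for one patient-time window. The function name `rehoused_score` is hypothetical, and returning `None` when a window has no "STABLY_HOUSED" or "UNSTABLY_HOUSED" notes is an assumption of this example, not something specified by the paper.

```python
from collections import Counter
from typing import Iterable, Optional


def rehoused_score(classifications: Iterable[str]) -> Optional[float]:
    """Proportion of STABLY_HOUSED notes among STABLY_HOUSED + UNSTABLY_HOUSED notes.

    UNKNOWN notes are ignored. Returning None for an empty denominator
    is an assumption made for this sketch.
    """
    counts = Counter(classifications)
    stable = counts["STABLY_HOUSED"]
    unstable = counts["UNSTABLY_HOUSED"]
    denominator = stable + unstable
    if denominator == 0:
        return None
    return stable / denominator


# One patient-time window with four notes: 2 stable, 1 unstable, 1 unknown
window = ["STABLY_HOUSED", "UNSTABLY_HOUSED", "UNKNOWN", "STABLY_HOUSED"]
print(rehoused_score(window))  # 2 of the 3 counted notes are stable
```

Because "UNKNOWN" notes are left out of both the numerator and the denominator, a window containing only "UNKNOWN" notes has no defined score in this sketch.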

The overall process is shown in the diagram below. In summary:

- A set of patient documents containing keywords related to housing are taken and split into 30-day time intervals
- Each document is processed by a rule-based NLP system which assigns a single document classification to each document
- All documents in each patient-time window are aggregated to compute a ReHouSED score

![rehoused_process_flow](../images/rehoused_process_overview.png)
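
The windowing and aggregation steps summarized above can be sketched in plain Python. This is only an illustration under stated assumptions: the function names, the `(patient_id, note_date, classification)` tuple layout, and the choice to anchor windows at a single `start_date` are all hypothetical and may differ from how the study actually defined its intervals.

```python
from collections import defaultdict
from datetime import date

WINDOW_DAYS = 30  # window length from the text; the anchoring below is an assumption


def window_index(note_date, start_date, window_days=WINDOW_DAYS):
    """Zero-based index of the window that contains note_date."""
    return (note_date - start_date).days // window_days


def scores_by_window(notes, start_date):
    """Aggregate (patient_id, note_date, classification) tuples into one
    score per (patient_id, window_index). UNKNOWN notes are skipped, so
    windows that contain only UNKNOWN notes are absent from the result."""
    counts = defaultdict(lambda: {"STABLY_HOUSED": 0, "UNSTABLY_HOUSED": 0})
    for patient_id, note_date, label in notes:
        if label not in ("STABLY_HOUSED", "UNSTABLY_HOUSED"):
            continue
        counts[(patient_id, window_index(note_date, start_date))][label] += 1
    return {
        key: c["STABLY_HOUSED"] / (c["STABLY_HOUSED"] + c["UNSTABLY_HOUSED"])
        for key, c in counts.items()
    }


# Made-up example notes for a single hypothetical patient "A"
notes = [
    ("A", date(2020, 1, 2), "UNSTABLY_HOUSED"),
    ("A", date(2020, 1, 20), "STABLY_HOUSED"),
    ("A", date(2020, 1, 25), "UNKNOWN"),
    ("A", date(2020, 2, 15), "STABLY_HOUSED"),
]
scores = scores_by_window(notes, start_date=date(2020, 1, 1))
print(scores)
```

In the real pipeline the classification in each tuple would come from the rule-based NLP system rather than being typed in by hand.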

# Using the model
These notebooks walk through the main components of the NLP system to demonstrate the underlying logic of the model. A general caveat is that the rules used here are quite complex, messy, and specific to the dataset. I've removed most rules which are highly specific to the data used in VA, so to use this in another setting you'll need to add rules that match the EHR or clinical workflows seen in your dataset. These notebooks will show how to customize the system for your dataset.

Additional resources which may be helpful can be found in the notebooks for [medspaCy](https://github.com/medspacy/medspacy) or [spaCy's documentation](https://spacy.io). This notebook assumes some knowledge of spaCy pipelines and common workflows.

## Quickstart
Before going through each specific step, the code below shows a quick way to load the model and process some example texts.

If you've installed `rehoused` as a package, you can import the helper function `build_nlp`, which loads the default model used in the paper. We'll also import a few other functions for visualizing the processed documents.

In [3]:
from rehoused_nlp import build_nlp, visualize_doc_classification
from medspacy.visualization import visualize_ent, visualize_dep

from helpers import ENT_COLORS # Colors for visualization

In [4]:
%%capture
nlp = build_nlp()

In [5]:
nlp

<spacy.lang.en.English at 0x7fb3716f3130>

By default, these components are loaded:

In [6]:
nlp.pipe_names

['tok2vec',
 'tagger',
 'parser',
 'attribute_ruler',
 'lemmatizer',
 'medspacy_concept_tagger',
 'medspacy_target_matcher',
 'medspacy_context',
 'medspacy_sectionizer',
 'medspacy_postprocessor',
 'document_classifier']

Let's process a single document which mentions housing and see how the NLP handles it:

In [7]:
text = "The patient was evicted from her apartment last night and is now homeless."

In [8]:
doc = nlp(text)

We can see below that two entities were extracted: **"evicted from her apartment"** and **"homeless"**. Based on these entities, the document was assigned a document classification of **"UNSTABLY_HOUSED"**.

In [9]:
visualize_doc_classification(doc, colors=ENT_COLORS)

In [10]:
doc._.document_classification

'UNSTABLY_HOUSED'

In [11]:
doc.ents

(evicted from her apartment, homeless)

In [12]:
visualize_doc_classification(doc, colors=ENT_COLORS)

Now let's look at a different sentence. Here, two entities of class **"EVIDENCE_OF_HOUSING"** are extracted and a document classification of **"STABLY_HOUSED"** is assigned.

In [13]:
text = "She signed her lease and is doing well in her new apartment."

In [14]:
doc = nlp(text)

In [15]:
doc._.document_classification

'STABLY_HOUSED'

In [16]:
doc.ents

(her lease, her new apartment.)

In [18]:
visualize_doc_classification(doc, colors=ENT_COLORS)

Let's look at one final example. In this case, the document is classified as **"UNKNOWN"**. Although homelessness is mentioned, the mention is considered *historical* because it is modified by a **"HISTORICAL"** phrase ("past medical history"). Attributes like this will be explained in a future notebook.

In [19]:
text = "Pt has a past medical history of Homelessness."
doc = nlp(text)

In [20]:
doc._.document_classification

'UNKNOWN'

In [21]:
doc.ents

(Homelessness,)

In [22]:
visualize_doc_classification(doc, colors=ENT_COLORS)

## Additional examples
This final section will take a list of short texts, process them, sort them by classification, and then visualize them.

In [23]:
texts = [
    "The veteran is doing well in her new apartment.",
    "He has paid his rent.",
    "He signed a lease",
    "Veteran slept on the streets.",
    "The patient is currently literally homeless.",
    "Spent last night at the Mission.",
    "Got a bed at a shelter downtown.",
    "He stayed at his mother's house",
    "Cannot pay the upcoming rent",
    "Got an eviction notice.",
    "Patient with a history of homelessness",
    "Are you in a house, apartment, or room?",
    "Here to discuss his housing situation.",
    "She lives in an apartment building",
    "The patient is not currently homeless"
]
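
One way to carry out the process-and-sort step described above is sketched here. The helper `group_by_classification` and the `LABEL_ORDER` list are hypothetical names introduced for this example. The helper takes any callable that maps a text to a label, so in this notebook you would pass a small function wrapping `nlp`, as shown in the trailing comments.

```python
from collections import defaultdict

# Display order for the three document classes (an arbitrary choice for this sketch)
LABEL_ORDER = ["STABLY_HOUSED", "UNSTABLY_HOUSED", "UNKNOWN"]


def group_by_classification(texts, classify):
    """Group texts by the label returned from classify(text).

    Labels are returned in LABEL_ORDER; any unexpected label goes last.
    """
    groups = defaultdict(list)
    for text in texts:
        groups[classify(text)].append(text)
    ordered = sorted(
        groups,
        key=lambda label: LABEL_ORDER.index(label)
        if label in LABEL_ORDER
        else len(LABEL_ORDER),
    )
    return {label: groups[label] for label in ordered}


# Usage in this notebook (requires the `nlp` object and `texts` defined above):
# groups = group_by_classification(
#     texts, lambda t: nlp(t)._.document_classification
# )
# for label, members in groups.items():
#     for member in members:
#         visualize_doc_classification(nlp(member), colors=ENT_COLORS)
```

Passing the classifier in as a callable keeps the grouping logic independent of the model, which makes it easy to try out with a simple stand-in function first.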

In [24]:
visualize_dep(doc)



In [27]:
from rehoused_nlp import build_nlp, visualize_doc_classification

nlp = build_nlp()

text = """
History of present illness: The patient was evicted from her apartment two months ago. 
Since then she has lived in a shelter while looking for an apartment.

Past medical history:
1. Pneumonia
2. Afib
3. Homelessness

Housing Status: Stably Housed

Assessment/Plan: The patient was accepted to an apartment and signed the lease last week. 
"""

doc = nlp(text)

visualize_doc_classification(doc)