# Simple Clinical Natural Language Processing with pyConTextNLP

In this notebook we introduce the basics of pyConTextNLP, a simple Python tool that we have used extensively for processing clinical text, including radiology and psychiatry reports.

pyConTextNLP is built around the concept of **targets** and **modifiers**: the target is the concept we are interested in identifying (like a cough or a pulmonary embolism); a modifier is a concept that changes the target in some sense (e.g. historical, severity, certainty, negation).

pyConTextNLP relies on [regular expressions](RegularExpressions.ipynb) to identify concepts (both targets and modifiers) within a sentence and then uses simple lexical rules to assign relationships between the identified targets and modifiers. Internally, pyConTextNLP uses graphs. Targets and modifiers are nodes in the graph and relationships between modifiers and targets are edges in the graph.
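Here is a minimal sketch in plain Python that makes the graph idea concrete. It is not pyConTextNLP's API. The `Node` and `SentenceGraph` classes, the `LEXICON` entries, and `build_graph` are hypothetical names used only for this illustration. The sketch implements a single, simplified "forward" rule, which links a modifier to every target that appears after it in the sentence. pyConTextNLP's own rules are richer than this.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Node:
    category: str  # label such as "CRITICAL_FINDING"
    text: str      # the matched span of text
    start: int     # character offset where the match begins

@dataclass
class SentenceGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (modifier, target) pairs

# Hypothetical lexicon: (category, regex, role, rule). Not pyConTextNLP's format.
LEXICON = [
    ("CRITICAL_FINDING", r"\bpulmonary embolism\b", "target", ""),
    ("DEFINITE_NEGATED_EXISTENCE", r"\bno\b", "modifier", "forward"),
]

def build_graph(sentence):
    graph = SentenceGraph()
    targets, modifiers = [], []
    # Step 1: regular expressions turn matches into nodes.
    for category, pattern, role, rule in LEXICON:
        for m in re.finditer(pattern, sentence, re.IGNORECASE):
            node = Node(category, m.group(0), m.start())
            graph.nodes.append(node)
            if role == "target":
                targets.append(node)
            else:
                modifiers.append((node, rule))
    # Step 2: a simplified lexical rule turns modifier/target pairs into edges.
    for mod, rule in modifiers:
        for tgt in targets:
            if rule == "forward" and tgt.start > mod.start:
                graph.edges.append((mod, tgt))
    return graph

g = build_graph("There is no evidence of pulmonary embolism.")
for mod, tgt in g.edges:
    print(mod.category, "->", tgt.category)
# DEFINITE_NEGATED_EXISTENCE -> CRITICAL_FINDING
```

If the modifier follows the target, as in "Pulmonary embolism, no.", both nodes are still created, but the forward rule adds no edge.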

## Specifying targets, modifiers, and rules

pyConTextNLP uses a four-tuple to represent concepts. Within the program, concepts are collected in an instance of the ``itemData`` class, and each item in it consists of the following four attributes:

1. A **literal** (e.g. "pulmonary embolism", "no definite evidence of"): This is a linguistic representation of the target or modifier we want to identify.
1. A **category** (e.g. "CRITICAL_FINDING", "PROBABLE_EXISTENCE"): This is the label we want applied to the literal when we see it in text.
1. A **regular expression** that defines how to identify the literal concept. If no regular expression is specified, one is built directly from the literal by wrapping it with word boundaries (e.g. r"""\bpulmonary embolism\b""").
1. A **rule** that defines how the concept works in the sentence (e.g. a negation term that looks **forward** in the sentence). This only applies to modifiers.
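Python's `re` module is enough to sketch the fallback from an empty regular expression to the word-bounded literal. `Item` and `compile_item` are hypothetical names used only here. The case-insensitive flag is an assumption of this sketch, not a documented pyConTextNLP behavior.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for one concept: the four attributes listed above.
Item = namedtuple("Item", ["literal", "category", "regex", "rule"])

def compile_item(item):
    # If no regex is given, wrap the literal in word boundaries.
    pattern = item.regex if item.regex else r"\b" + item.literal + r"\b"
    # Case-insensitive matching is an assumption of this sketch.
    return re.compile(pattern, re.IGNORECASE)

pe = Item("pulmonary embolism", "CRITICAL_FINDING", "", "")
# An explicit regex overrides the literal-based default.
pe_custom = Item("pulmonary embolism", "CRITICAL_FINDING",
                 r"\bpulmonary\s+embol(ism|i)\b", "")
# A modifier carries a rule; targets leave it empty.
neg = Item("no", "DEFINITE_NEGATED_EXISTENCE", "", "forward")

print(compile_item(pe).pattern)  # \bpulmonary embolism\b
```

The word boundaries cut both ways. The default pattern does not match inside "pulmonary embolisms", so plural or variant forms need their own literal or a custom regular expression like the one in `pe_custom`.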

```python
!pip install pycontextnlp
```

```python
import pyConTextNLP.pyConTextGraph as pyConText
import pyConTextNLP.itemData as itemData
```

# The task: Identify patients with pulmonary embolism from radiology reports
## Step 1: How is the concept of pulmonary embolism represented in the reports?

Fill in the list below with the literals you want to use.

```python
# Each entry is [literal, category, regex, rule]; an empty regex is built from the literal.
mytargets = itemData.itemData()
mytargets.extend([["pulmonary embolism", "CRITICAL_FINDING", "", ""],
                  ["pneumonia", "CRITICAL_FINDING", "", ""]])
```

```python
print(mytargets)
```

```
itemData: 2 items [pulmonary embolism, pneumonia, ]
```
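Before you settle on the target list, it can help to check which phrasings your literals actually catch. The sketch below uses plain `re` with the same word-boundary idea as the default regular expressions. `matching_literals` is a hypothetical helper, and the extra literals are suggestions only. The sentences were written for this example and are not taken from real reports. Escaping the literal and matching case-insensitively are choices made for this sketch.

```python
import re

# Candidate literals for the target concept (extend as needed).
literals = ["pulmonary embolism", "pulmonary emboli", "PE"]

# Invented example sentences, not real report text.
sentences = [
    "No evidence of pulmonary embolism.",
    "Filling defects consistent with pulmonary emboli.",
    "Findings are compatible with acute PE.",
    "Right lower lobe pneumonia.",
]

def matching_literals(sentence, literals):
    """Return the literals whose word-bounded pattern occurs in the sentence."""
    return [lit for lit in literals
            if re.search(r"\b" + re.escape(lit) + r"\b", sentence, re.IGNORECASE)]

for s in sentences:
    print(matching_literals(s, literals), "<-", s)
```

Each variant is caught by exactly one literal. With only "pulmonary embolism" in the list, the second and third sentences would be missed. In real text, matching a short abbreviation such as "PE" case-insensitively may produce false positives. One way to tighten it is to supply a custom regular expression as the third element of that entry.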


```python
!pip install radnlp
```

```
Collecting radnlp
  Downloading radnlp-0.2.0.8-py2.py3-none-any.whl
Installing collected packages: radnlp
Successfully installed radnlp-0.2.0.8
```


## Sentence Splitting

pyConTextNLP operates at the *sentence* level, so the first step is to split our document into individual sentences. pyConTextNLP comes with a simple sentence splitter class.

```python
import pyConTextNLP.helpers as helpers

splitter = helpers.sentenceSplitter()
splitter.splitSentences("This is Dr. Chapman's first sentence. This is the 2.0 sentence.")
```

```
["This is Dr. Chapman's first sentence.", 'This is the 2.0 sentence.']
```
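A dedicated splitter is needed because naively splitting on every period followed by whitespace breaks the example sentence at "Dr.". The sketch below contrasts that naive approach with a simple abbreviation-aware one. It is only an illustration and does not reflect how `helpers.sentenceSplitter` is implemented. The `ABBREVIATIONS` set is a hypothetical, deliberately tiny list.

```python
import re

# Hypothetical, tiny abbreviation list; real splitters use much larger ones.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "i.e."}

def naive_split(text):
    # Split wherever a period is followed by whitespace.
    return re.split(r"(?<=\.)\s+", text)

def split_sentences(text):
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        # A token ending in "." closes a sentence unless it is a known abbreviation.
        # "2.0" does not end in ".", so decimals never close a sentence.
        if token.endswith(".") and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # keep trailing text that has no final period
        sentences.append(" ".join(current))
    return sentences

text = "This is Dr. Chapman's first sentence. This is the 2.0 sentence."
print(naive_split(text))      # ['This is Dr.', "Chapman's first sentence.", 'This is the 2.0 sentence.']
print(split_sentences(text))  # ["This is Dr. Chapman's first sentence.", 'This is the 2.0 sentence.']
```

Even this small improvement depends on a hand-maintained abbreviation list. That maintenance burden is one reason to reuse a splitter from an established NLP toolkit.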

However, sentence splitting is a common NLP task, so most full-fledged NLP toolkits provide their own sentence splitters. We usually rely on the sentence splitter that is part of the [TextBlob](https://textblob.readthedocs.io/en/dev/) package, which in turn relies on the Natural Language Toolkit ([NLTK](http://www.nltk.org/)). Before proceeding, we therefore need to download some NLTK resources with the following command.

```python
!python -m textblob.download_corpora
```

```
[nltk_data] Downloading package brown to /home/jovyan/nltk_data...
[nltk_data]   Unzipping corpora/brown.zip.
[nltk_data] Downloading package punkt to /home/jovyan/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package wordnet to /home/jovyan/nltk_data...
[nltk_data]   Unzipping corpora/wordnet.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/jovyan/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
[nltk_data] Downloading package conll2000 to /home/jovyan/nltk_data...
[nltk_data]   Unzipping corpora/conll2000.zip.
[nltk_data] Downloading package movie_reviews to
[nltk_data]     /home/jovyan/nltk_data...
[nltk_data]   Unzipping corpora/movie_reviews.zip.
Finished.
```
