# How does pyConText work?

Before we explain its processing mechanism, let's learn a few important concepts.



## 1. The information model

The information model is an abstraction and representation of concepts (a formal definition can be found at [Terminology for Policy-Based Management](https://tools.ietf.org/html/rfc3198)). In pyConText, we set up a simple information model to represent the concepts we are looking for. It has two components: targets and modifiers.

* A **target** is the component of the information model that describes the core information of the concept. For instance, *"breast cancer"* in "brother- breast CA."

* A **modifier** is the component that describes a certain property of a target. For instance, *"brother"* in "brother- breast CA."
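
To make the two components concrete, here is a minimal sketch of how a target and its modifiers could be held in Python. The `Target` and `Modifier` classes, their fields, and the `"FAMILY"` category value are hypothetical illustrations, not pyConText's own classes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Modifier:
    """A context clue describing one property of a target (hypothetical class)."""
    text: str       # the literal clue found in the text, e.g. "brother"
    category: str   # the property it expresses, e.g. "FAMILY"


@dataclass
class Target:
    """The core concept we are looking for (hypothetical class)."""
    text: str       # the literal mention in the text, e.g. "breast CA"
    concept: str    # the normalized concept, e.g. "breast cancer"
    modifiers: List[Modifier] = field(default_factory=list)


# "brother- breast CA." represented as one target plus one modifier
mention = Target(text="breast CA", concept="breast cancer")
mention.modifiers.append(Modifier(text="brother", category="FAMILY"))

print(mention.concept, [m.category for m in mention.modifiers])
```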


**Question**: Why don't we represent this concept simply as *"brother breast CA"* without separating the target and the modifier?



## 2. Three types of modifiers in pyConText

* **Negation**: whether a target is negated or not, e.g. "no *masses*".
* **Historical**: whether the concept is historical (e.g., "a remote history of *diverticulitis* in the 70s"), present (e.g., "found by EMS at scene *unresponsive*"), or hypothetical (e.g., "if the *pain* exacerbated").  
    Note: a physician's "present" differs from the everyday sense of the word: it refers to the current episode, even when it is described in the past tense, as in the EMS example.
* **Nonpatient**: whether the concept refers to the patient or not, e.g. "Sister with *breast cancer*."
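
One way to picture the three modifier types is as three independent dimensions of a target, each with a default value that a modifier can override. The sketch below assumes a hypothetical mapping from rule categories to modifier types; the `HYPOTHETICAL` category and the `describe` helper are illustrations, not part of pyConText.

```python
from enum import Enum
from typing import Dict, List


class ModifierType(Enum):
    """The three kinds of modifiers described above."""
    NEGATION = "negation"
    HISTORICAL = "historical"
    NONPATIENT = "nonpatient"


# Hypothetical mapping from rule categories to the modifier type each expresses.
CATEGORY_TO_TYPE = {
    "DEFINITE_NEGATED_EXISTENCE": ModifierType.NEGATION,
    "HISTORICAL": ModifierType.HISTORICAL,
    "HYPOTHETICAL": ModifierType.HISTORICAL,
    "FAMILY": ModifierType.NONPATIENT,
}


def describe(target: str, categories: List[str]) -> Dict[str, object]:
    """Summarize a target along the three dimensions.

    Defaults assume an affirmed, present mention that refers to the patient;
    each matched category overrides the dimension it belongs to."""
    summary: Dict[str, object] = {
        "target": target,
        "negated": False,
        "temporality": "present",
        "experiencer": "patient",
    }
    for cat in categories:
        kind = CATEGORY_TO_TYPE.get(cat)
        if kind is ModifierType.NEGATION:
            summary["negated"] = True
        elif kind is ModifierType.HISTORICAL:
            summary["temporality"] = cat.lower()  # "historical" or "hypothetical"
        elif kind is ModifierType.NONPATIENT:
            summary["experiencer"] = "nonpatient"
    return summary


print(describe("breast cancer", ["FAMILY"]))
```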






## 3. A typical pyConText rule
The pyConText rule file can be found at [KB/fam_bc_modifiers.yml](KB/fam_bc_modifiers.yml).

A typical pyConText rule has four elements. For instance:
![a screenshot of modifier rule file in yml format](img/snapshot2.png)
    
The four elements are:

1) The lexicon (e.g. "can be ruled out")  
2) The type (e.g. "DEFINITE_NEGATED_EXISTENCE")  
3) The regular expression (optional) used to capture the lexicon in the text. If no regular expression is provided, one is generated directly from the lexicon itself.  
4) The direction in which the modifier operates in the sentence. Current valid values are: "forward", the item can modify objects following it in the sentence; "backward", the item can modify objects preceding it in the sentence; or "bidirectional", the item can modify objects both preceding and following it in the sentence. 
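
The four elements map naturally onto a small record. The sketch below is a hypothetical `ContextRule` class (not pyConText's API) that stores the lexicon, type, direction, and optional regular expression, and falls back to a literal whole-word pattern built with `re.escape` when no regular expression is given. The directions chosen for the two example rules are assumptions.

```python
import re
from dataclasses import dataclass
from typing import Optional

VALID_DIRECTIONS = {"forward", "backward", "bidirectional"}


@dataclass
class ContextRule:
    """Hypothetical container for the four elements of a modifier rule."""
    lexicon: str                  # 1) e.g. "can be ruled out"
    category: str                 # 2) the type, e.g. "DEFINITE_NEGATED_EXISTENCE"
    direction: str                # 4) "forward", "backward" or "bidirectional"
    regex: Optional[str] = None   # 3) optional; generated from the lexicon if missing

    def __post_init__(self) -> None:
        if self.direction not in VALID_DIRECTIONS:
            raise ValueError(f"invalid direction: {self.direction!r}")

    def pattern(self) -> "re.Pattern":
        # Without an explicit regex, match the lexicon literally as whole words, ignoring case.
        source = self.regex or r"\b" + re.escape(self.lexicon) + r"\b"
        return re.compile(source, re.IGNORECASE)


# Directions below are assumed for illustration.
negation = ContextRule("can be ruled out", "DEFINITE_NEGATED_EXISTENCE", "backward")
historical = ContextRule("x years ago", "HISTORICAL", "bidirectional", regex=r"\b\d+ years ago")

print(negation.pattern().search("breast cancer can be ruled out").group())
print(historical.pattern().search("diagnosed 20 years ago").group())
```

Keeping the regular expression optional means simple phrases need no regex knowledge, while patterns such as "x years ago" can still be expressed when a literal match is not enough.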



## 4. How does pyConText work? A simple explanation

pyConText first *locates* a target term and then *looks around* it to see whether any context clue matches a lexicon entry in the pyConText rules. If one does, pyConText marks the clue with the context type of that rule.
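
The locate-then-look-around idea can be sketched in a few lines. This is a deliberately simplified illustration, not pyConText's implementation: the `RULES` list and `mark_context` function are hypothetical, and the sketch ignores sentence splitting, scope terminators, and overlapping matches. It keeps a clue only if the clue's direction points toward the target.

```python
import re
from typing import List, Tuple

# Hypothetical rules: (regular expression, category, direction)
RULES = [
    (r"\bcan be ruled out\b", "DEFINITE_NEGATED_EXISTENCE", "backward"),
    (r"\bno\b", "DEFINITE_NEGATED_EXISTENCE", "forward"),
    (r"\bsister\b", "FAMILY", "forward"),
]


def mark_context(sentence: str, target: str) -> List[Tuple[str, str]]:
    """Locate `target`, then look around it for rule matches whose direction
    points at the target. Returns (clue text, category) pairs."""
    t = re.search(re.escape(target), sentence, re.IGNORECASE)
    if t is None:
        return []  # no target, so nothing to modify
    found = []
    for regex, category, direction in RULES:
        for m in re.finditer(regex, sentence, re.IGNORECASE):
            before = m.end() <= t.start()  # clue precedes the target
            after = m.start() >= t.end()   # clue follows the target
            if (before and direction in ("forward", "bidirectional")) or \
               (after and direction in ("backward", "bidirectional")):
                found.append((m.group(), category))
    return found


print(mark_context("Sister with breast cancer.", "breast cancer"))
print(mark_context("breast cancer can be ruled out", "breast cancer"))
```

Note that in "breast cancer, no sister" neither clue is kept, because both are forward rules that appear after the target.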

### 4.1 Negation example:

Let's use the above rule as the example:

![an example visualization of pyConText](img/snapshot7.png)

As you can see, "can be ruled out" is identified and linked to the target "breast cancer." The label "dne" is formed from the first letter of each word in "DEFINITE_NEGATED_EXISTENCE."
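
The short labels in the visualization can be reproduced with a simple rule. The `short_label` function below is a guess that is consistent with both labels shown in this notebook ("dne" here and "his" for HISTORICAL in the next example); the actual logic in the `visual` module may differ. The sketch also shows the literal lexicon being found with a whole-word regular expression.

```python
import re


def short_label(category: str) -> str:
    """Hypothetical abbreviation rule: initials for multi-word categories,
    the first three letters for single-word ones."""
    words = category.lower().split("_")
    if len(words) > 1:
        return "".join(w[0] for w in words)  # DEFINITE_NEGATED_EXISTENCE -> dne
    return words[0][:3]                     # HISTORICAL -> his


sentence = "The breast cancer can be ruled out."
match = re.search(r"\bcan be ruled out\b", sentence, re.IGNORECASE)
print(match.group(), "->", short_label("DEFINITE_NEGATED_EXISTENCE"))
```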


### 4.2 Historical example

Here is an example rule to identify historical context:

![an example visualization of pyConText](img/snapshot9.png)

This rule uses a simple regular expression <span style="color:darkred">'\b\d+ years ago'</span> to capture clues of the form 'x years ago', where 'x' is any number written in digits. For example, '20 years ago' is identified as shown below:



![an example visualization of pyConText](img/snapshot8.png)

The label 'his' is the first three letters of "HISTORICAL."
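
You can try the same regular expression directly with Python's `re` module to see what it does and does not capture. The example sentences below are made up for illustration.

```python
import re

# The clue pattern from the HISTORICAL rule above, as a Python raw string.
YEARS_AGO = re.compile(r"\b\d+ years ago")

examples = [
    "Mother diagnosed with breast cancer 20 years ago.",
    "Breast cancer was treated 5 years ago.",
    "Breast cancer diagnosed several years ago.",
    "Breast cancer diagnosed a year ago.",
]
for text in examples:
    m = YEARS_AGO.search(text)
    print(repr(m.group()) if m else None, "<-", text)
```

The last two sentences are not matched: the pattern requires digits, so spelled-out quantities ("several", "a") and the singular "year" are missed. Catching them would require additional or broader rules.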

### 4.3 Nonpatient example

By default, any concept mentioned in clinical text refers to the patient unless we find a nonpatient context clue. For this task, we are targeting family history, so we need context rules that identify family-related context. For example:

![an example visualization of pyConText](img/snapshot10.png)

When executing pyConText, the word "sister" is picked up as the "FAMILY" context for the target term "breast cancer":

![an example visualization of pyConText](img/snapshot3.png)
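
The "patient unless proven otherwise" default can be sketched as follows. The family term list and the `experiencer` function are hypothetical; the notebook's real rules are in `KB/fam_bc_modifiers.yml`. For simplicity, this sketch treats every family clue as bidirectional within the sentence.

```python
import re
from typing import Optional

# Hypothetical family lexicon combined into one whole-word, case-insensitive pattern.
FAMILY_TERMS = ["mother", "father", "sister", "brother", "aunt", "grandmother"]
FAMILY_RE = re.compile(r"\b(?:" + "|".join(FAMILY_TERMS) + r")\b", re.IGNORECASE)


def experiencer(sentence: str, target: str = "breast cancer") -> Optional[str]:
    """Return None if the target is absent; otherwise default to PATIENT and
    switch to FAMILY only when a family clue appears in the same sentence."""
    if re.search(re.escape(target), sentence, re.IGNORECASE) is None:
        return None
    return "FAMILY" if FAMILY_RE.search(sentence) else "PATIENT"


print(experiencer("Sister with breast cancer."))
print(experiencer("Patient has breast cancer."))
```

Because this sketch ignores direction, a sentence such as "Sister is healthy; patient has breast cancer." would be wrongly labeled FAMILY. Rule directions exist to reduce errors of this kind.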


### 4.4 Read more:

The actual mechanism is much more complicated than this simple explanation. More detailed information can be found in this paper:

> Chapman WW, Hillert D, Velupillai S, Kvist M, Skeppstedt M, Chapman BE, et al. [Extending the NegEx lexicon for multiple languages](https://www.ncbi.nlm.nih.gov/pubmed/23920642). In: Proceedings of the 14th world congress on medical & health informatics (MEDINFO); 2013. p. 677–681.

## 5. pyConText Playground
Feel free to make up some examples and try them yourself to see what pyConText produces. Here is a playground for you. The cell below sets everything up; don't worry about what's inside for now, we will explain it later.

```python
# download the NLTK "punkt" tokenizer models
import nltk
nltk.download('punkt')
```

```python
# ignore everything inside here, we will explain later
from DocumentClassifier import DocumentClassifier
from visual import Vis, view_pycontext_output

pos_doc_type = 'FAM_BREAST_CA_DOC'
TARGETS_FILE_PATH = 'KB/fam_bc_targets.yml'
MODIFIERS_FILE_PATH = 'KB/fam_bc_modifiers.yml'
FEATURE_INFERENCER_FILE_PATH = 'KB/fam_bc_featurer_inferences.csv'
DOC_INFERENCER_FILE_PATH = 'KB/fam_bc_doc_inferences.csv'

vis = Vis(context_file=MODIFIERS_FILE_PATH)
classifier = DocumentClassifier(TARGETS_FILE_PATH, MODIFIERS_FILE_PATH,
                                FEATURE_INFERENCER_FILE_PATH, DOC_INFERENCER_FILE_PATH,
                                {pos_doc_type})
# clear saved predictions, just in case files/regular expressions have been updated
classifier.reset_saved_predictions()
```

#### Try different input strings and see what happens

```python
# Your input string: make sure it includes the target term 'breast cancer'.
# (Named input_str rather than str, so Python's built-in str is not shadowed.)
input_str = '''mother does not only have breast cancer'''
print(classifier.predict(input_str))
view_pycontext_output(classifier.get_last_doc_markups(), vis)
```

If you get an error complaining about:
```bash
'textblob' not installed.
```

uncomment and run the following code:

```python
#!pip install -U textblob
#!python -m textblob.download_corpora
```

If you get an error complaining about:
```bash
Resource 'tokenizers/punkt/PY3/english.pickle' not found.
```

uncomment and run the following code:

```python
#!python -m textblob.download_corpora
```

<br/><hr/>This material was presented as part of the Foundations of Healthcare Informatics Course, Fall 2017, BMI, University of Utah. It is revised from the <a href="https://github.com/UUDeCART/decart_rule_based_nlp">material</a> of the DeCART Summer Program (Data, Exploration, Computation, and Analytics Real-world Training for the Health Sciences) at the University of Utah in 2017. <br/><br/>Original presenters: Dr. Wendy Chapman, Jianlin Shi and Kelly Peterson.<br/>
Revised by: Jianlin Shi and Dr. Wendy Chapman<br/>
<img align="left" src="https://wiki.creativecommons.org/images/1/10/Cc.org_cc_by_license.jpg" alt="Except where otherwise noted, this website is licensed under a Creative Commons Attribution 3.0 Unported License.">

