# Demonstration of Basic Sentence Markup with pyConTextNLP, Part 2.
## An ever-so-slightly more complex sentence

### Let's use a slightly more complex sentence that will illustrate pruning.

In [None]:
import pyConTextNLP.pyConText as pyConText
import pyConTextNLP.itemData as itemData
import networkx as nx

### Sentences

These example reports are taken (with modification) from the [MIMIC2 demo data set](https://physionet.org/mimic2/), a publicly available database of de-identified medical records for deceased individuals.

In [None]:
reports = [
    """IMPRESSION: Evaluation limited by lack of IV contrast; however, no evidence of
      bowel obstruction or mass identified within the abdomen or pelvis. Non-specific interstitial opacities and bronchiectasis seen at the right
     base, suggestive of post-inflammatory changes.""",
    """IMPRESSION: Evidence of early pulmonary vascular congestion and interstitial edema. Probable scarring at the medial aspect of the right lung base, with no
     definite consolidation."""
    ,
    """IMPRESSION:
     
     1.  2.0 cm cyst of the right renal lower pole.  Otherwise, normal appearance
     of the right kidney with patent vasculature and no sonographic evidence of
     renal artery stenosis.
     2.  Surgically absent left kidney.""",
    """IMPRESSION:  No pneumothorax.""",
    """IMPRESSION: No definite pneumothorax""",
    """IMPRESSION:  New opacity at the left lower lobe consistent with pneumonia."""
]

### Read the ``itemData`` definitions

In [None]:
modifiers = itemData.get_items(
    "https://raw.githubusercontent.com/chapmanbe/pyConTextNLP/master/KB/lexical_kb_05042016.yml")
targets = itemData.get_items(
    "https://raw.githubusercontent.com/chapmanbe/pyConTextNLP/master/KB/utah_crit.yml")


### We're going to start with one of our simpler sentences

In [None]:
reports[4]

### Marking up a sentence

We start by creating an instance of the ``ConTextMarkup`` class, which is a subclass of the NetworkX ``DiGraph`` class. The concepts we identify are stored as nodes of the graph, and the relationships between them are stored as edges.

In [None]:
markup = pyConText.ConTextMarkup()

In [None]:
markup.setRawText(reports[4].lower())
print(markup)
print(len(markup.getRawText()))

markup.cleanText()
print(markup)
print(len(markup.getText()))

#### Identify concepts in the sentence


In [None]:
markup.markItems(modifiers, mode="modifier")
markup.markItems(targets, mode="target")
for node in markup.nodes(data=True):
    print(node)


(<id> 256833253737566050220546835725615337803 </id> <phrase> no </phrase> <category> ['definite_negated_existence'] </category> , {'category': 'modifier'})
(<id> 256833892316555915191107839689855045963 </id> <phrase> no definite </phrase> <category> ['definite_negated_existence'] </category> , {'category': 'modifier'})
(<id> 256826997881853923908450449495296807243 </id> <phrase> definite </phrase> <category> ['definite_existence'] </category> , {'category': 'modifier'})
(<id> 256849716557454889207255398223055655243 </id> <phrase> pneumothorax </phrase> <category> ['pneumothorax'] </category> , {'category': 'target'})


#### What does our initial markup look like?

* We've identified four concepts in the sentence: 
    1. "no"
    1. "no definite"
    1. "definite"
    1. "pneumothorax"
* Here "no" and "definite" are not true concepts in the sentence; each is a subset of the concept "no definite"
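
To see how overlapping marks like these can arise, here is a minimal sketch that uses only the standard library. It is not how ``markItems`` is implemented; the ``LEXICON`` list and the ``mark_items`` function are hypothetical stand-ins for the knowledge-base files and the library call. Each lexical item is matched independently, so a short literal such as "no" is found even where it is part of a longer literal.

```python
import re

# Hypothetical mini-lexicon (literal, category, kind); NOT the real KB files.
LEXICON = [
    ("no", "definite_negated_existence", "modifier"),
    ("no definite", "definite_negated_existence", "modifier"),
    ("definite", "definite_existence", "modifier"),
    ("pneumothorax", "pneumothorax", "target"),
]


def mark_items(text, lexicon):
    """Return one (start, end, phrase, category, kind) tuple per match."""
    marks = []
    for literal, category, kind in lexicon:
        # Match the literal only at word boundaries.
        pattern = r"\b" + re.escape(literal) + r"\b"
        for m in re.finditer(pattern, text):
            marks.append((m.start(), m.end(), m.group(), category, kind))
    # Sort by start offset, then end offset.
    return sorted(marks)


text = "impression: no definite pneumothorax"
marks = mark_items(text, LEXICON)
for mark in marks:
    print(mark)
```

The character offsets make the overlap explicit: the spans of "no" and "definite" both fall inside the span of "no definite".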

#### Prune Marks

After identifying concepts, we prune concepts that are a subset of another identified concept.

In [None]:
markup.pruneMarks()
for node in markup.nodes(data=True):
    print(node)

#### What is the effect of ``pruneMarks``?

We've correctly dropped ``no`` and ``definite`` as identified concepts, since both lie within the span of ``no definite``.
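
The idea behind pruning can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the ``prune_marks`` function and the tuple layout are hypothetical, the character offsets are written by hand for this sentence, and comparing marks only against others of the same kind (modifier or target) is an assumption.

```python
def prune_marks(marks):
    """Drop any mark whose span lies inside a longer mark of the same kind.

    Each mark is a (start, end, phrase, kind) tuple (hypothetical layout).
    """
    kept = []
    for start, end, phrase, kind in marks:
        contained = any(
            kind == other_kind
            and other_start <= start
            and end <= other_end
            and (other_end - other_start) > (end - start)
            for other_start, other_end, _, other_kind in marks
        )
        if not contained:
            kept.append((start, end, phrase, kind))
    return kept


# Offsets into "impression: no definite pneumothorax"
marks = [
    (12, 14, "no", "modifier"),
    (12, 23, "no definite", "modifier"),
    (15, 23, "definite", "modifier"),
    (24, 36, "pneumothorax", "target"),
]
pruned = prune_marks(marks)
for mark in pruned:
    print(mark)
```

Requiring the enclosing span to be strictly longer means two marks with identical spans never eliminate each other.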

#### Apply modifiers

We now call the ``applyModifiers`` method of the ConTextMarkup object to identify any relationships between the nodes.

In [None]:
markup.applyModifiers()
for edge in markup.edges():
    print(edge)
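
As a rough mental model of what this step produces, the sketch below links each modifier to the targets that fall within its direction of influence. It is a deliberate simplification and not the ``applyModifiers`` algorithm: the ``apply_modifiers`` function, the tuple layout, and the direction labels are assumptions made for illustration, and it ignores anything else the library may take into account when deciding scope.

```python
def apply_modifiers(modifiers, targets):
    """Return (modifier_phrase, target_phrase) edges.

    modifiers: (start, end, phrase, direction) tuples, where direction is
               "forward", "backward", or "bidirectional" (assumed labels).
    targets:   (start, end, phrase) tuples.
    """
    edges = []
    for m_start, m_end, m_phrase, direction in modifiers:
        for t_start, t_end, t_phrase in targets:
            # A forward modifier reaches targets that begin after it ends.
            forward_ok = (
                direction in ("forward", "bidirectional") and t_start >= m_end
            )
            # A backward modifier reaches targets that end before it begins.
            backward_ok = (
                direction in ("backward", "bidirectional") and t_end <= m_start
            )
            if forward_ok or backward_ok:
                edges.append((m_phrase, t_phrase))
    return edges


# Offsets into "impression: no definite pneumothorax"
modifiers = [(12, 23, "no definite", "forward")]
targets = [(24, 36, "pneumothorax")]
edges = apply_modifiers(modifiers, targets)
for edge in edges:
    print(edge)
```

Each edge runs from a modifier to the target it modifies, which mirrors the directed-graph representation used by the markup object.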

### Here is a notebook for [Multisentence Documents](./MultiSentenceDocuments.ipynb)