# Overview
In these notebooks, we'll process an example clinical document with medSpaCy. First, we'll perform preprocessing and sentence segmentation. Next, we'll extract entities using rules and assert attributes such as negation and the section in which each entity occurred. We'll then put all of the pieces together to process the entire document. Finally, we'll look at an alternative pipeline that uses a pre-trained statistical model, rather than rules, to extract target entities.

In this first notebook, we'll introduce the medSpaCy library and show how to load a medSpaCy pipeline. In the following notebooks, we'll walk through each of the pipeline steps in more detail and apply a fully built pipeline to clinical text.

These notebooks give a high-level overview of each component; the individual packages typically contain more complete examples and documentation.

**Disclaimer**: many of the subpackages are in beta, just like medSpaCy!

# Notebooks
- [1-Introduction](1-Introduction.ipynb)
- [2-Preprocessing_and_Sentence_Splitting](2-Preprocessing_and_Sentence_Splitting.ipynb)
- [3-Information-Extraction](3-Information-Extraction.ipynb)
- [4-Full-Pipeline](4-Full-Pipeline.ipynb)
- [5-Using-Pretrained-Models](5-Using-Pretrained-Models.ipynb)

# Loading a medSpaCy model
A medSpaCy model consists of a **base spaCy model** with **medSpaCy components added** to the pipeline. There are two primary ways that we can create a medSpaCy model:

1. Load a full pipeline using `medspacy.load()`
2. Add specific components to an existing model

## 1. Load a full medSpaCy pipeline
We can load a complete pipeline using the `medspacy.load()` function. By default, this will build off of spaCy's **en_core_web_sm** model and will include:
- `Preprocessor` for destructive preprocessing
- `Tagger` for part-of-speech tagging (from **en_core_web_sm**)
- `Parser` for dependency parsing (from **en_core_web_sm**)
- `PyRuSHSentencizer` for sentence splitting
- `TargetMatcher` for extended rule-based matching
- `Sectionizer` for section detection
- `ConText` for contextual analysis and attribute detection
- `Postprocessor` for additional business logic and custom rules

In [6]:
import medspacy

In [4]:
nlp = medspacy.load()

In [5]:
nlp.pipe_names

['sentencizer',
 'tagger',
 'parser',
 'target_matcher',
 'sectionizer',
 'context',
 'postprocessor']

### Default rules
When available, components added by `medspacy.load()` include default rules. `sentencizer`, `context`, and `sectionizer` will all contain default rules:

In [7]:
context = nlp.get_pipe("context")

In [8]:
context.item_data[:10]

[ConTextItem(literal='absence of', category='NEGATED_EXISTENCE', pattern=None, rule='FORWARD'),
 ConTextItem(literal='adequate to rule out', category='NEGATED_EXISTENCE', pattern=[{'LOWER': {'IN': ['adequate', 'sufficient']}}, {'LOWER': 'to'}, {'LOWER': 'rule'}, {'LOWER': {'IN': ['him', 'her', 'them', 'patient', 'pt']}, 'OP': '?'}, {'LOWER': 'out'}, {'LOWER': {'IN': ['against', 'for']}, 'OP': '?'}], rule='FORWARD'),
 ConTextItem(literal='adequate to rule the patient out', category='NEGATED_EXISTENCE', pattern=[{'LOWER': {'IN': ['adequate', 'sufficient']}}, {'LOWER': 'to'}, {'LOWER': 'rule'}, {'LOWER': 'the'}, {'LOWER': {'IN': ['patient', 'pt']}}, {'LOWER': 'out'}, {'LOWER': {'IN': ['against', 'for']}, 'OP': '?'}], rule='FORWARD'),
 ConTextItem(literal='any other', category='NEGATED_EXISTENCE', pattern=None, rule='FORWARD'),
 ConTextItem(literal='apart from', category='NEGATED_EXISTENCE', pattern=[{'LOWER': 'apart'}, {'LOWER': {'IN': ['for', 'from']}}], rule='TERMINATE'),
 ConTextItem(l

In [9]:
sectionizer = nlp.get_pipe("sectionizer")

In [10]:
sectionizer.patterns[:10]

[{'section_title': 'addendum', 'pattern': 'ADDENDUM:'},
 {'section_title': 'addendum', 'pattern': 'Addendum:'},
 {'section_title': 'allergies', 'pattern': 'ALLERGIC REACTIONS:'},
 {'section_title': 'allergies', 'pattern': 'ALLERGIES:'},
 {'section_title': 'chief_complaint', 'pattern': 'CC:'},
 {'section_title': 'chief_complaint', 'pattern': 'CHIEF COMPLAINT:'},
 {'section_title': 'chief_complaint', 'pattern': 'Chief Complaint:'},
 {'section_title': 'comments', 'pattern': 'COMMENTS:'},
 {'section_title': 'diagnoses', 'pattern': 'ADMISSION DIAGNOSES:'},
 {'section_title': 'diagnoses', 'pattern': 'DIAGNOSES:'}]
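Because each default section rule shown above is a plain dictionary, the list can be summarized with ordinary Python. The following is a minimal sketch that groups the literal patterns by `section_title`. It hard-codes the ten rules printed above as `sample_patterns` (an illustrative name) so that it runs without medSpaCy installed, and `group_by_title` is a hypothetical helper, not part of the library.

```python
from collections import defaultdict

# The ten default rules shown in the output above, copied verbatim.
sample_patterns = [
    {"section_title": "addendum", "pattern": "ADDENDUM:"},
    {"section_title": "addendum", "pattern": "Addendum:"},
    {"section_title": "allergies", "pattern": "ALLERGIC REACTIONS:"},
    {"section_title": "allergies", "pattern": "ALLERGIES:"},
    {"section_title": "chief_complaint", "pattern": "CC:"},
    {"section_title": "chief_complaint", "pattern": "CHIEF COMPLAINT:"},
    {"section_title": "chief_complaint", "pattern": "Chief Complaint:"},
    {"section_title": "comments", "pattern": "COMMENTS:"},
    {"section_title": "diagnoses", "pattern": "ADMISSION DIAGNOSES:"},
    {"section_title": "diagnoses", "pattern": "DIAGNOSES:"},
]


def group_by_title(patterns):
    """Map each section title to the list of literal patterns that trigger it."""
    grouped = defaultdict(list)
    for rule in patterns:
        grouped[rule["section_title"]].append(rule["pattern"])
    return dict(grouped)


grouped = group_by_title(sample_patterns)
for title, literals in grouped.items():
    print(f"{title}: {literals}")
```

With a loaded pipeline, the same helper could be applied to `sectionizer.patterns`, assuming every entry has the dictionary shape shown in the output above.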

In [11]:
sentencizer = nlp.get_pipe("sentencizer")

You can also set `load_rules` to `False` so that the components are all blank (other than PyRuSH, which requires rules to be instantiated).
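As a minimal sketch of that option, the call below passes `load_rules=False` to `medspacy.load()`, the same argument used in the German example later in this notebook. The wrapper function `load_blank_pipeline` is a hypothetical name introduced here only so that the import happens when the function is called rather than when it is defined.

```python
def load_blank_pipeline():
    """Return a medSpaCy pipeline whose rule-based components start without default rules."""
    # Imported lazily so that defining this helper does not itself require medSpaCy.
    import medspacy

    # load_rules=False skips the default rules. Per the text above, PyRuSH is the
    # exception, because it cannot be instantiated without rules.
    return medspacy.load(load_rules=False)
```

Calling `nlp_blank = load_blank_pipeline()` and then inspecting `nlp_blank.get_pipe("context").item_data` should show no default rules if the components behave as described; that is an expectation rather than verified output.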

### Using specific models
If you have other models installed, in English or another language, you can load one by passing its name as the `model` argument. For example, to load a German model, first download it and then pass its name to `medspacy.load()`:

`python -m spacy download de_core_news_sm`

```python
de = medspacy.load("de_core_news_sm", load_rules=False)
```

### Specifying components
You can choose which components to include or exclude with the `enable` and `disable` arguments:

In [17]:
nlp_sentencizer_only = medspacy.load(enable=["sentencizer"])
nlp_sentencizer_only.pipe_names

['sentencizer']

In [18]:
nlp_no_pos_dep = medspacy.load(disable=["tagger", "parser"])
nlp_no_pos_dep.pipe_names

['sentencizer', 'target_matcher', 'sectionizer', 'context', 'postprocessor']
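The filtering behavior seen in the two outputs above can be mimicked in plain Python. The sketch below is not medSpaCy's implementation: `select_components` and `DEFAULT_PIPE_NAMES` are hypothetical names, the default order is copied from the `nlp.pipe_names` output earlier in this notebook, and rejecting a call that passes both arguments is an assumption made only to keep the example unambiguous.

```python
# Default component order, copied from the nlp.pipe_names output earlier in this notebook.
DEFAULT_PIPE_NAMES = [
    "sentencizer",
    "tagger",
    "parser",
    "target_matcher",
    "sectionizer",
    "context",
    "postprocessor",
]


def select_components(enable=None, disable=None, defaults=DEFAULT_PIPE_NAMES):
    """Return the component names that survive an enable/disable filter."""
    if enable is not None and disable is not None:
        # Assumption of this sketch, not documented medSpaCy behavior.
        raise ValueError("Pass either enable or disable, not both.")
    if enable is not None:
        wanted = set(enable)
        # Keep only the requested names, preserving pipeline order.
        return [name for name in defaults if name in wanted]
    unwanted = set(disable or [])
    # Keep everything except the excluded names.
    return [name for name in defaults if name not in unwanted]


print(select_components(enable=["sentencizer"]))
print(select_components(disable=["tagger", "parser"]))
```

The two printed lists match the `pipe_names` outputs shown above for the corresponding `medspacy.load()` calls.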

## 2. Add specific components to an existing model
You can also import specific classes from medSpaCy, instantiate them yourself, and add them to an existing model. We'll show more examples of how to do this in future notebooks.

In [20]:
import spacy

In [21]:
en = spacy.load("en_core_web_sm")

In [22]:
from medspacy.context import ConTextComponent

In [None]:
context = ConTextComponent(en)

In [23]:
en.add_pipe(context)

In [24]:
en.pipe_names

['tagger', 'parser', 'ner', 'context']

# Demo Data
For data, we will use this example text derived from the [MIMIC-II](https://mimic.physionet.org/) critical care dataset:

In [1]:
with open("./discharge_summary.txt") as f:
    text = f.read()

In [2]:
print(text[:500])

Admission Date:  [**2573-5-30**]              Discharge Date:   [**2573-7-1**]

Date of Birth:  [**2498-8-19**]             Sex:   F

Service: SURGERY

Allergies:
Hydrochlorothiazide

Attending:[**First Name3 (LF) 1893**]
Chief Complaint:
Abdominal pain

Major Surgical or Invasive Procedure:
PICC line [**6-25**]
ERCP w/ sphincterotomy [**5-31**]


History of Present Illness:
74y female with type 2 dm and a recent stroke affecting her
speech, who presents with 2 days of abdominal pain. Imaging sh
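The bracketed `[**...**]` spans in the printed text are MIMIC's de-identification placeholders, which stand in for dates, names, and other protected information. As a rough sketch of how they could be located before any medSpaCy processing, the snippet below uses a regular expression on a few lines copied from the output above. The names `sample` and `PLACEHOLDER` are illustrative, and this is not the preprocessing approach used in the later notebooks.

```python
import re

# A few lines of the demo text, copied from the printed output above.
sample = (
    "Admission Date:  [**2573-5-30**]              Discharge Date:   [**2573-7-1**]\n"
    "Attending:[**First Name3 (LF) 1893**]\n"
    "PICC line [**6-25**]\n"
)

# Non-greedy match for everything between "[**" and "**]".
PLACEHOLDER = re.compile(r"\[\*\*(.*?)\*\*\]")

placeholders = PLACEHOLDER.findall(sample)
print(placeholders)
```

The non-greedy `.*?` matters here: a greedy match would span from the first `[**` on a line to the last `**]`, merging the admission and discharge dates into one result.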
