![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

# **ContextualParserModel**

This notebook covers the parameters and usage of the `ContextualParserModel` annotator.

**📖 Learning Objectives:**

1. Understand how to use `ContextualParserModel`.

2. Become comfortable using the different parameters of the annotator.

3. Train a `ContextualParserApproach` annotator, save the resulting model, and reload it later with `ContextualParserModel`.


**🔗 Helpful Links:**

- Documentation : [ContextualParserModel](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators#contextualparser)

- Python Docs : [ContextualParserModel](https://nlp.johnsnowlabs.com/licensed/api/python/reference/autosummary/sparknlp_jsl/annotator/context/contextual_parser/index.html#sparknlp_jsl.annotator.context.contextual_parser.ContextualParserModel)

- Scala Docs : [ContextualParserModel](https://nlp.johnsnowlabs.com/licensed/api/com/johnsnowlabs/nlp/annotators/context/ContextualParserModel.html)

- For extended examples of usage, see the [Spark NLP Workshop repository](https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/healthcare-nlp/09.0.Contextual_Parser_Rule_Based_NER.ipynb).

## **📜 Background**


The `ContextualParser` annotator extracts entities from text by pattern matching. It offers more functionality than its open-source counterpart, `EntityRuler`, by letting users customize specific characteristics of the matching.

It allows setting regex rules for full and partial matches, a dictionary with normalizing options, and context parameters to take into account specific conditions such as token distances.

The `ContextualParserApproach` annotator reads the patterns defined in a JSON rules file, optionally together with a TSV/CSV dictionary, and produces a new `ContextualParserModel`.

## **🎬 Colab Setup**

In [1]:
!pip install -q johnsnowlabs

In [None]:
from google.colab import files
print('Please Upload your John Snow Labs License using the button below')
license_keys = files.upload()

In [None]:
from johnsnowlabs import nlp

nlp.install()

In [None]:
from johnsnowlabs import nlp, medical

spark = nlp.start()

## **🖨️ Input/Output Annotation Types**

- Input: `DOCUMENT`, `TOKEN`

- Output: `CHUNK`

## **🔎 Parameters**


The parameters below are shared with `ContextualParserApproach`, so they can be used in the same way.

- `inputCols`: The names of the columns containing the input annotations. Accepts either a single string or an array of column names.
- `outputCol`: The name of the generated output column (annotation type `CHUNK`). Only one column can be specified.
- `caseSensitive`: Whether matching is case sensitive.
- `prefixAndSuffixMatch`: Whether to match both prefix and suffix to annotate the match.
- `optionalContextRules`: When set to true, it will output a regex match regardless of context matches.
- `shortestContextMatch`: When set to true, it will stop finding matches when prefix/suffix data is found in the text.

All parameters can be set with the corresponding camel-case setter method, for example `.setInputCols()`.
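
The naming rule can be illustrated in plain Python. The helper `setter_name` below is hypothetical (it is not part of the library) and only shows how each parameter name listed above maps to its setter.

```python
def setter_name(param: str) -> str:
    """Map a camel-case parameter name to its setter name,
    e.g. caseSensitive -> setCaseSensitive."""
    return "set" + param[0].upper() + param[1:]


# Parameter names taken from the list above.
params = [
    "inputCols",
    "outputCol",
    "caseSensitive",
    "prefixAndSuffixMatch",
    "optionalContextRules",
    "shortestContextMatch",
]

for name in params:
    print(f"{name:22} -> .{setter_name(name)}()")
```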

## Build a ContextualParser pipeline using `ContextualParserApproach` and save a `ContextualParserModel` 

Let's build a pipeline with `ContextualParserApproach`, then save the fitted model so that it can be loaded with `ContextualParserModel`.

In [5]:
# Create a dictionary to detect cities
cities = """City\nNew York\nGotham City\nSan Antonio\nSalt Lake City"""

with open('cities.tsv', 'w') as f:
    f.write(cities)

# Check what dictionary looks like
!cat cities.tsv

City
New York
Gotham City
San Antonio
Salt Lake City
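
`!cat` is a notebook shell escape and will not run in a plain Python script. As a portable alternative, the sketch below writes the same dictionary with `pathlib` and reads it back, separating the header line from the entries. The variable names are illustrative.

```python
from pathlib import Path

# Same content as above: a header line followed by one city per line.
cities = "City\nNew York\nGotham City\nSan Antonio\nSalt Lake City"

dictionary_path = Path("cities.tsv")
dictionary_path.write_text(cities, encoding="utf-8")

# Read the file back and split it into the header and the entries.
header, *entries = dictionary_path.read_text(encoding="utf-8").splitlines()

print("header :", header)
print("entries:", entries)
```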

In [6]:
# Create JSON file
context_rules = {
  "entity": "City",
  "ruleScope": "document", 
  "matchScope":"sub-token",
  "completeMatchRegex": "false",
  "regex": "([A-Z]{1}[a-z]+ [A-Z]{1}[a-z]+)" # Find two consecutive words in title case.
} 

import json
with open('context_rules.json', 'w') as f:
    json.dump(context_rules, f)
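
Before handing a rule to the annotator, it can help to check what the regex alone picks up. The sketch below uses Python's `re` module on an illustrative sentence. The annotator itself runs on the JVM, so this is only a sanity check, although a pattern this simple should behave the same way in both regex engines. The example text and the variable names are assumptions made for the illustration.

```python
import json
import re

context_rules = {
    "entity": "City",
    "ruleScope": "document",
    "matchScope": "sub-token",
    "completeMatchRegex": "false",
    "regex": "([A-Z]{1}[a-z]+ [A-Z]{1}[a-z]+)",
}

# Round-trip through JSON to confirm the rule serializes cleanly.
loaded_rules = json.loads(json.dumps(context_rules))

# Apply only the regex to an illustrative sentence.
pattern = re.compile(loaded_rules["regex"])
text = "Peter Parker lives in New York. They met at salt lake city."
candidates = pattern.findall(text)

print(candidates)
# ['Peter Parker', 'New York']
```

On its own, the regex also matches person names such as `Peter Parker`. In the pipeline output later in this notebook, only names that are present in the dictionary appear.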

In [7]:
# Build pipeline
document_assembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentence_detector = nlp.SentenceDetector() \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer() \
    .setInputCols(["sentence"]) \
    .setOutputCol("token")

contextual_parser = medical.ContextualParserApproach() \
    .setInputCols(["sentence", "token"])\
    .setOutputCol("entity")\
    .setJsonPath("context_rules.json")\
    .setCaseSensitive(True)\
    .setDictionary('cities.tsv', options={"orientation":"vertical"})

chunk_converter = medical.ChunkConverter() \
    .setInputCols(["entity"]) \
    .setOutputCol("ner_chunk")

pipeline = nlp.Pipeline(stages=[
        document_assembler, 
        sentence_detector,
        tokenizer,
        contextual_parser,
        chunk_converter,
        ])

In [8]:
# Fit on an empty DataFrame (the rules need no training data) and wrap the result in a LightPipeline
empty_data = spark.createDataFrame([[""]]).toDF("text")

model = pipeline.fit(empty_data)

light_model = nlp.LightPipeline(model)

In [9]:
sample_text = "Peter Parker is a nice guy and lives in New York. Bruce Wayne is also a nice guy and lives in San Antonio and Gotham City. They met at salt lake city."

In [10]:
# Annotate the sample text
annotations = light_model.fullAnnotate(sample_text)[0]

[item.result for item in annotations["entity"]]

['New York', 'San Antonio', 'Gotham City']
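
The output contains neither `Peter Parker` nor the lowercase `salt lake city`. The plain-Python sketch below approximates a dictionary lookup with a case-sensitivity switch to show why. It is not the annotator's actual algorithm, and `match_dictionary` is a hypothetical helper written for this illustration.

```python
import re


def match_dictionary(text, entries, case_sensitive=True):
    """Return the dictionary entries found in text, in order of appearance."""
    flags = 0 if case_sensitive else re.IGNORECASE
    hits = []
    for entry in entries:
        for match in re.finditer(re.escape(entry), text, flags):
            hits.append((match.start(), match.group()))
    # Sort by start offset so results follow the order of the text.
    return [matched for _, matched in sorted(hits)]


entries = ["New York", "Gotham City", "San Antonio", "Salt Lake City"]
sample_text = (
    "Peter Parker is a nice guy and lives in New York. "
    "Bruce Wayne is also a nice guy and lives in San Antonio and Gotham City. "
    "They met at salt lake city."
)

print(match_dictionary(sample_text, entries, case_sensitive=True))
# ['New York', 'San Antonio', 'Gotham City']

print(match_dictionary(sample_text, entries, case_sensitive=False))
# ['New York', 'San Antonio', 'Gotham City', 'salt lake city']
```

With case-sensitive matching, the sketch reproduces the list above. How the annotator itself behaves with `.setCaseSensitive(False)` is best confirmed by rerunning the pipeline.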

In [11]:
visualiser = nlp.viz.NerVisualizer()

visualiser.display(annotations, label_col='ner_chunk', document_col='document')

Now let's save the fitted parser stage, the `ContextualParserModel` that `ContextualParserApproach` produced when the pipeline was fitted.

In [12]:
model.stages[-2].write().overwrite().save('models/custom_parser_model')

Now reuse the saved model in another pipeline by loading it with `ContextualParserModel.load()`.

In [13]:
custom_parser = medical.ContextualParserModel.load('models/custom_parser_model')
# The input/output columns are restored from the saved model; uncomment to override them:
# custom_parser.setInputCols(["sentence", "token"]).setOutputCol("entity")

parser_pipeline = nlp.Pipeline(stages=[
        document_assembler, 
        sentence_detector,
        tokenizer,
        custom_parser, # load saved model
        chunk_converter,
        ])

# Fit on an empty DataFrame and wrap the result in a LightPipeline
empty_data = spark.createDataFrame([[""]]).toDF("text")

parser_model = parser_pipeline.fit(empty_data)

light_parser_model = nlp.LightPipeline(parser_model)

In [14]:
annotations_parser = light_parser_model.fullAnnotate(sample_text)[0]
visualiser.display(annotations_parser, label_col='ner_chunk', document_col='document')


We get the same result as with the previous model. Saving the model also saved all of its parameters and settings, along with the JSON rules and the dictionary, so with `ContextualParserModel` we only needed to load it.

In [15]:
[item.result for item in annotations_parser["entity"]]

['New York', 'San Antonio', 'Gotham City']
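
Each annotation carries more than its `result`. The sketch below summarizes annotations as `(begin, end, result)` tuples and compares two runs. It uses a stand-in `FakeAnnotation` named tuple so that it runs without Spark. Both the stand-in and the `summarize` helper are hypothetical, and the offsets assume Spark NLP's convention of an inclusive `end` index.

```python
from collections import namedtuple

# Stand-in for a Spark NLP annotation, limited to the fields used here.
FakeAnnotation = namedtuple("FakeAnnotation", ["begin", "end", "result"])


def summarize(annotations):
    """Reduce annotations to comparable (begin, end, result) tuples."""
    return [(a.begin, a.end, a.result) for a in annotations]


text = "Peter Parker is a nice guy and lives in New York."
chunk = "New York"
begin = text.index(chunk)
end = begin + len(chunk) - 1  # inclusive end offset (assumed convention)

first_run = [FakeAnnotation(begin, end, chunk)]
second_run = [FakeAnnotation(begin, end, chunk)]

# Two runs agree when their summaries are identical.
runs_match = summarize(first_run) == summarize(second_run)
print(runs_match, summarize(first_run))
```

In this notebook, the same helper could be applied to `annotations["entity"]` and `annotations_parser["entity"]` to confirm that the original and the reloaded model agree on offsets as well as on text.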