![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/healthcare-nlp/32.0.Contextual_Assertion.ipynb)

# 📜Contextual Assertion

This notebook covers the parameters and usage of the `ContextualAssertion` annotator.

**📖 Learning Objectives:**

1. Understand how to use `ContextualAssertion`.

2. Become comfortable using the different parameters of the annotator.

3. Build a pretrained pipeline using the `ContextualAssertion` annotator.




## **📜 Background**



This model identifies contextual cues within text data, such as negation, uncertainty, and assertion. It is used
for clinical assertion detection, among other tasks. It annotates text chunks with assertions based on configurable rules,
prefix and suffix patterns, and exception patterns.
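
As a rough illustration of the idea (not the annotator's actual implementation), the sketch below checks whether a cue word appears before or after a chunk in a sentence. The function name `toy_assert` and the keyword lists are hypothetical and chosen only for this example.

```python
import re

def toy_assert(sentence, chunk, prefix_keywords, suffix_keywords):
    """Hypothetical helper: True if a cue occurs before (prefix) or after (suffix) the chunk."""
    start = sentence.lower().find(chunk.lower())
    if start < 0:
        return False  # chunk not present in the sentence
    before = sentence[:start].lower()
    after = sentence[start + len(chunk):].lower()

    def has(words, text):
        # Whole-word match for each keyword
        return any(re.search(r"\b" + re.escape(w.lower()) + r"\b", text) for w in words)

    return has(prefix_keywords, before) or has(suffix_keywords, after)

prefixes, suffixes = ["no", "denies"], ["unlikely"]
print(toy_assert("Patient denies nausea at this time.", "nausea", prefixes, suffixes))  # True
print(toy_assert("Alcoholism unlikely.", "Alcoholism", prefixes, suffixes))             # True
print(toy_assert("Patient has headache and fever.", "fever", prefixes, suffixes))       # False
```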



## **🎬 Colab Setup**

In [None]:
# Install the johnsnowlabs library to access Spark-OCR and Spark-NLP for Healthcare, Finance, and Legal.
! pip install -q johnsnowlabs

In [None]:
from google.colab import files
print('Please Upload your John Snow Labs License using the button below')
license_keys = files.upload()

In [None]:
from johnsnowlabs import nlp, medical

# After uploading your license, run this to install all licensed Python wheels and pre-download the jars for the Spark session JVM
nlp.settings.enforce_versions=True
nlp.install(refresh_install=True)

In [4]:
from johnsnowlabs import nlp, medical#, visual
import pandas as pd

# Automatically load license data and start a session with all jars user has access to
spark = nlp.start()
spark

👌 Detected license file /content/spark_nlp_for_healthcare_spark_ocr_541_mine.json
👌 Launched cpu optimized session with with: 🚀Spark-NLP==5.4.0, 💊Spark-Healthcare==5.4.0, running on ⚡ PySpark==3.4.0


## Healthcare NLP for Data Scientists Course

If you are not familiar with the components in this notebook, you can check the [Healthcare NLP for Data Scientists Udemy Course](https://www.udemy.com/course/healthcare-nlp-for-data-scientists/) and the [MOOC Notebooks](https://github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/Spark_NLP_Udemy_MOOC/Healthcare_NLP) for each component.

## Pretrained Contextual Assertion Models





<center><b>CONTEXTUAL ASSERTION MODELS</b></center>

| index | model |
|------:|:------|
| 1 | [contextual_assertion_someone_else](https://nlp.johnsnowlabs.com/2024/06/26/contextual_assertion_someone_else_en.html) |
| 2 | [contextual_assertion_past](https://nlp.johnsnowlabs.com/2024/07/04/contextual_assertion_past_en.html) |
| 3 | [contextual_assertion_absent](https://nlp.johnsnowlabs.com/2024/07/03/contextual_assertion_absent_en.html) |


**You can find all these models and more in the [NLP Models Hub](https://nlp.johnsnowlabs.com/models?q=Chunk+Mapping&edition=Spark+NLP+for+Healthcare)**



## **🖨️ Input/Output Annotation Types**

- Input: `SENTENCE`, `TOKEN`, `CHUNK`

- Output: `ASSERTION`

## **🔎 Parameters**

**Parameters**:

- `inputCols`: Input annotations.
- `caseSensitive`: Whether matching is case sensitive. Defaults to `False`.
- `prefixAndSuffixMatch`: Whether to match both prefix and suffix to annotate the hit.
- `prefixKeywords`: Prefix keywords to match.
- `suffixKeywords`: Suffix keywords to match.
- `exceptionKeywords`: Exception keywords not to match.
- `prefixRegexPatterns`: Prefix regex patterns to match.
- `suffixRegexPatterns`: Suffix regex patterns to match.
- `exceptionRegexPatterns`: Exception regex patterns not to match.
- `scopeWindow`: The scope window of the assertion expression.
- `assertion`: Assertion to match.
- `includeChunkToScope`: Whether to include the chunk in the scope when matching values.
- `scopeWindowDelimiter`: Delimiters used to limit the scope window.
- `confidenceCalculationDirection`: The direction for calculating assertion confidence (left, right, or both; default is left).

## Build a ContextualAssertion pipeline

In [6]:
document_assembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentence_detector = nlp.SentenceDetector() \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer() \
    .setInputCols(["sentence"]) \
    .setOutputCol("token")

word_embeddings = nlp.WordEmbeddingsModel \
    .pretrained("embeddings_clinical", "en", "clinical/models") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")

clinical_ner = medical.NerModel \
    .pretrained("ner_clinical", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

ner_converter = medical.NerConverterInternal() \
    .setInputCols(["sentence", "token", "ner"]) \
    .setOutputCol("ner_chunk")

contextual_assertion = medical.ContextualAssertion() \
    .setInputCols(["sentence", "token", "ner_chunk"]) \
    .setOutputCol("assertion") \
    .setPrefixKeywords(["no", "not"]) \
    .setSuffixKeywords(["unlikely","negative"]) \
    .setPrefixRegexPatterns(["\\b(no|without|denies|never|none|free of|not include)\\b"]) \
    .setSuffixRegexPatterns(["\\b(free of|negative for|absence of|not|rule out)\\b"]) \
    .setExceptionKeywords(["without"]) \
    .setExceptionRegexPatterns(["\\b(not clearly)\\b"]) \
    .addPrefixKeywords(["negative for","negative"]) \
    .addSuffixKeywords(["absent","neither"]) \
    .setCaseSensitive(False) \
    .setPrefixAndSuffixMatch(False) \
    .setAssertion("absent") \
    .setScopeWindow([2, 2])\
    .setIncludeChunkToScope(True)

flattener = medical.Flattener() \
    .setInputCols("assertion") \
    .setExplodeSelectedFields({"assertion":["metadata.ner_chunk as ner_chunk",
                                            "begin as begin",
                                            "end as end",
                                            "metadata.ner_label as ner_label",
                                            "result as assertion"]})

pipeline = nlp.Pipeline(
    stages=[
        document_assembler,
        sentence_detector,
        tokenizer,
        word_embeddings,
        clinical_ner,
        ner_converter,
        contextual_assertion,
        flattener])

empty_data = spark.createDataFrame([[""]]).toDF("text")

model = pipeline.fit(empty_data)



embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
ner_clinical download started this may take some time.
[OK!]


In [7]:
text = """Patient resting in bed. Patient given azithromycin without any difficulty. Patient has audible wheezing, states chest tightness.
     No evidence of hypertension. Patient denies nausea at this time. zofran declined. Patient is also having intermittent sweating
     associated with pneumonia. Patient refused pain but tylenol still given. Neither substance abuse nor alcohol use however cocaine
     once used in the last year. Alcoholism unlikely. Patient has headache and fever. Patient is not diabetic. Not clearly of diarrhea.
     Lab reports confirm lymphocytopenia. Cardaic rhythm is Sinus bradycardia. Patient also has a history of cardiac injury.
     No kidney injury reported. No abnormal rashes or ulcers. Patient might not have liver disease. Confirmed absence of hemoptysis.
     Although patient has severe pneumonia and fever, test reports are negative for COVID-19 infection. COVID-19 viral infection absent.
    """
data = spark.createDataFrame([[text]]).toDF("text")

In [8]:
result = model.transform(data)
result.show(truncate=False)

+------------------+-----+---+---------+---------+
|ner_chunk         |begin|end|ner_label|assertion|
+------------------+-----+---+---------+---------+
|nausea            |178  |183|PROBLEM  |absent   |
|Alcoholism        |428  |437|PROBLEM  |absent   |
|diabetic          |496  |503|PROBLEM  |absent   |
|kidney injury     |664  |676|PROBLEM  |absent   |
|abnormal rashes   |691  |705|PROBLEM  |absent   |
|liver disease     |741  |753|PROBLEM  |absent   |
|COVID-19 infection|873  |890|PROBLEM  |absent   |
|viral infection   |902  |916|PROBLEM  |absent   |
+------------------+-----+---+---------+---------+



## 🔬 Parameters Usage

  **setPrefixKeywords:** Prefix keywords to match

  **setSuffixKeywords:** Suffix keywords to match

  **setPrefixRegexPatterns:** Prefix regex patterns to match
  
  **setSuffixRegexPatterns:** Suffix regex pattern to match




To configure assertions using specific keywords and regex patterns, you can use the methods `setPrefixKeywords`, `setSuffixKeywords`, `setPrefixRegexPatterns`, and `setSuffixRegexPatterns`.

In [9]:
contextual_assertion = medical.ContextualAssertion() \
    .setInputCols(["sentence", "token", "ner_chunk"]) \
    .setOutputCol("assertion") \
    .setPrefixKeywords(["no", "not"]) \
    .setSuffixKeywords(["unlikely","negative"]) \
    .setPrefixRegexPatterns(["\\b(no|without|denies|never|none|free of|not include)\\b"]) \
    .setSuffixRegexPatterns(["\\b(free of|negative for|absence of|not|rule out)\\b"]) \
    .setAssertion("absent")

flattener = medical.Flattener() \
    .setInputCols("assertion") \
    .setExplodeSelectedFields({"assertion":["metadata.ner_chunk as ner_chunk",
                                            "begin as begin",
                                            "end as end",
                                            "metadata.ner_label as ner_label",
                                            "result as assertion"]})

pipeline = nlp.Pipeline(
    stages=[
        document_assembler,
        sentence_detector,
        tokenizer,
        word_embeddings,
        clinical_ner,
        ner_converter,
        contextual_assertion,
        flattener])

empty_data = spark.createDataFrame([[""]]).toDF("text")

model = pipeline.fit(empty_data)

result = model.transform(data)
result.show(truncate=False)

+----------------+-----+---+---------+---------+
|ner_chunk       |begin|end|ner_label|assertion|
+----------------+-----+---+---------+---------+
|any difficulty  |59   |72 |PROBLEM  |absent   |
|hypertension    |149  |160|PROBLEM  |absent   |
|nausea          |178  |183|PROBLEM  |absent   |
|Alcoholism      |428  |437|PROBLEM  |absent   |
|diabetic        |496  |503|PROBLEM  |absent   |
|kidney injury   |664  |676|PROBLEM  |absent   |
|abnormal rashes |691  |705|PROBLEM  |absent   |
|ulcers          |710  |715|PROBLEM  |absent   |
|liver disease   |741  |753|PROBLEM  |absent   |
|severe pneumonia|815  |830|PROBLEM  |absent   |
|fever           |836  |840|PROBLEM  |absent   |
+----------------+-----+---+---------+---------+
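
The `\b` anchors in the regex patterns above restrict matches to whole words. The following sketch uses Python's `re` module to show the effect on the prefix pattern from the cell above. The annotator evaluates its patterns on the JVM, so treating Python's `re` as equivalent is an assumption that should hold for simple ASCII patterns like this one. The `re.IGNORECASE` flag is also an assumption, meant to mimic case-insensitive matching, and `prefix_hits` is a hypothetical helper.

```python
import re

# Same prefix pattern as in the pipeline cell, written as a Python raw string
prefix_pattern = re.compile(
    r"\b(no|without|denies|never|none|free of|not include)\b",
    re.IGNORECASE,  # assumption: mimics case-insensitive matching
)

def prefix_hits(text):
    """Hypothetical helper: list every whole-word cue found in the text."""
    return [m.group(0) for m in prefix_pattern.finditer(text)]

print(prefix_hits("No evidence of"))                          # ['No']
print(prefix_hits("Patient given azithromycin without any"))  # ['without']
print(prefix_hits("Patient has normal notes"))                # []
```

The last call returns nothing because `no` inside `normal` and `notes` is not followed by a word boundary.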



  **setExceptionKeywords :** Exception keywords not to match
  
  **setExceptionRegexPatterns :** Exception regex pattern not to match





If you want to exclude certain keywords and patterns from being detected, use `setExceptionKeywords` and `setExceptionRegexPatterns`.

In [10]:
contextual_assertion = medical.ContextualAssertion() \
    .setInputCols(["sentence", "token", "ner_chunk"]) \
    .setOutputCol("assertion") \
    .setExceptionKeywords(["without"]) \
    .setExceptionRegexPatterns(["\\b(not clearly)\\b"])

flattener = medical.Flattener() \
    .setInputCols("assertion") \
    .setExplodeSelectedFields({"assertion":["metadata.ner_chunk as ner_chunk",
                                            "begin as begin",
                                            "end as end",
                                            "metadata.ner_label as ner_label",
                                            "result as assertion"]})


pipeline = nlp.Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    word_embeddings,
    clinical_ner,
    ner_converter,
    contextual_assertion,
    flattener])

empty_data = spark.createDataFrame([[""]]).toDF("text")

model = pipeline.fit(empty_data)

result = model.transform(data)
result.show(truncate=False)

+------------------+-----+---+---------+---------+
|ner_chunk         |begin|end|ner_label|assertion|
+------------------+-----+---+---------+---------+
|hypertension      |149  |160|PROBLEM  |absent   |
|nausea            |178  |183|PROBLEM  |absent   |
|zofran            |199  |204|TREATMENT|absent   |
|pain              |309  |312|PROBLEM  |absent   |
|tylenol           |318  |324|TREATMENT|absent   |
|Alcoholism        |428  |437|PROBLEM  |absent   |
|diabetic          |496  |503|PROBLEM  |absent   |
|kidney injury     |664  |676|PROBLEM  |absent   |
|abnormal rashes   |691  |705|PROBLEM  |absent   |
|ulcers            |710  |715|PROBLEM  |absent   |
|liver disease     |741  |753|PROBLEM  |absent   |
|hemoptysis        |777  |786|PROBLEM  |absent   |
|COVID-19 infection|873  |890|PROBLEM  |absent   |
|viral infection   |902  |916|PROBLEM  |absent   |
+------------------+-----+---+---------+---------+
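
In the output above, `any difficulty` (from "without any difficulty") is not annotated, while it is annotated in runs of this notebook that do not list `without` as an exception keyword. One way to picture this is that an exception hit in the scope vetoes the match. The sketch below is a toy model under that assumption, with hypothetical function names and keyword lists; the annotator's internal logic may differ.

```python
import re

def contains_any(text, phrases):
    """Return True if any phrase occurs in text as whole words (case-insensitive)."""
    return any(
        re.search(r"\b" + re.escape(p) + r"\b", text, re.IGNORECASE)
        for p in phrases
    )

def toy_prefix_match(text_before_chunk, prefix_keywords, exception_keywords=()):
    """Hypothetical rule: an exception hit suppresses any prefix match."""
    if contains_any(text_before_chunk, exception_keywords):
        return False
    return contains_any(text_before_chunk, prefix_keywords)

before = "Patient given azithromycin without"
print(toy_prefix_match(before, ["no", "without"]))               # True
print(toy_prefix_match(before, ["no", "without"], ["without"]))  # False
```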



**addPrefixKeywords:** Adds the prefix keywords to match

**addSuffixKeywords:** Adds the suffix keywords to match

To extend the existing keywords instead of replacing them (for example, the defaults or those of a pretrained model), use the `addPrefixKeywords` and `addSuffixKeywords` methods.
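
The difference between replacing and extending a keyword list can be pictured with a tiny stand-in class. `ToyKeywordList` and its default values are hypothetical and only mirror the naming of the `set...` and `add...` methods; they are not the library's classes or its real defaults.

```python
class ToyKeywordList:
    """Hypothetical stand-in for a keyword parameter; not the library's class."""

    def __init__(self, defaults):
        self.values = list(defaults)

    def set(self, values):
        # Replace whatever was there before
        self.values = list(values)
        return self

    def add(self, values):
        # Keep the existing keywords and append the new ones
        self.values = self.values + list(values)
        return self

defaults = ["no", "not"]  # made-up defaults for illustration
print(ToyKeywordList(defaults).set(["negative for"]).values)  # ['negative for']
print(ToyKeywordList(defaults).add(["negative for"]).values)  # ['no', 'not', 'negative for']
```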




In [11]:
contextual_assertion = medical.ContextualAssertion() \
    .setInputCols(["sentence", "token", "ner_chunk"]) \
    .setOutputCol("assertion") \
    .addPrefixKeywords(["negative for","negative"]) \
    .addSuffixKeywords(["absent","neither"])

flattener = medical.Flattener() \
    .setInputCols("assertion") \
    .setExplodeSelectedFields({"assertion":["metadata.ner_chunk as ner_chunk",
                                            "begin as begin",
                                            "end as end",
                                            "metadata.ner_label as ner_label",
                                            "result as assertion"]})


pipeline = nlp.Pipeline(
    stages=[
      document_assembler,
      sentence_detector,
      tokenizer,
      word_embeddings,
      clinical_ner,
      ner_converter,
      contextual_assertion,
      flattener])

empty_data = spark.createDataFrame([[""]]).toDF("text")

model = pipeline.fit(empty_data)

result = model.transform(data)
result.show(truncate=False)

+------------------+-----+---+---------+---------+
|ner_chunk         |begin|end|ner_label|assertion|
+------------------+-----+---+---------+---------+
|any difficulty    |59   |72 |PROBLEM  |absent   |
|hypertension      |149  |160|PROBLEM  |absent   |
|nausea            |178  |183|PROBLEM  |absent   |
|zofran            |199  |204|TREATMENT|absent   |
|pain              |309  |312|PROBLEM  |absent   |
|tylenol           |318  |324|TREATMENT|absent   |
|Alcoholism        |428  |437|PROBLEM  |absent   |
|diabetic          |496  |503|PROBLEM  |absent   |
|kidney injury     |664  |676|PROBLEM  |absent   |
|abnormal rashes   |691  |705|PROBLEM  |absent   |
|ulcers            |710  |715|PROBLEM  |absent   |
|liver disease     |741  |753|PROBLEM  |absent   |
|hemoptysis        |777  |786|PROBLEM  |absent   |
|COVID-19 infection|873  |890|PROBLEM  |absent   |
|viral infection   |902  |916|PROBLEM  |absent   |
+------------------+-----+---+---------+---------+



**setScopeWindow** : The scope window of the assertion expression

**setIncludeChunkToScope** : Whether to include chunk to scope when matching values

The scope window specifies the number of tokens before and after the chunk within which keywords and regex patterns are searched; set it with the `setScopeWindow` method. By default, the search covers the entire sentence.

To include the chunk itself in the search for keywords and regex patterns, use the `setIncludeChunkToScope` method.
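
A small sketch of how a token window around a chunk could be computed, assuming non-negative window sizes and a chunk given by token indices. The helper `scope_tokens` and the whitespace tokenization are hypothetical simplifications, not the annotator's implementation.

```python
def scope_tokens(tokens, chunk_start, chunk_end, window=None, include_chunk=False):
    """Hypothetical helper. The chunk occupies tokens[chunk_start:chunk_end] (end exclusive).

    window=None searches the whole sentence; window=(before, after) limits the
    number of tokens considered on each side of the chunk.
    """
    if window is None:
        left, right = tokens[:chunk_start], tokens[chunk_end:]
    else:
        before, after = window
        left = tokens[max(0, chunk_start - before):chunk_start]
        right = tokens[chunk_end:chunk_end + after]
    middle = tokens[chunk_start:chunk_end] if include_chunk else []
    return left, middle, right

tokens = ("Although patient has severe pneumonia and fever , test reports "
          "are negative for COVID-19 infection .").split()

# "severe pneumonia" is tokens[3:5]; a [2, 2] window no longer reaches "negative"
print(scope_tokens(tokens, 3, 5, window=(2, 2)))
# (['patient', 'has'], [], ['and', 'fever'])

# "COVID-19 infection" is tokens[13:15]; "negative for" sits inside the window
print(scope_tokens(tokens, 13, 15, window=(2, 2), include_chunk=True))
# (['negative', 'for'], ['COVID-19', 'infection'], ['.'])
```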


In [12]:
contextual_assertion = medical.ContextualAssertion()\
            .setInputCols("sentence", "token", "ner_chunk") \
            .setOutputCol("assertion") \
            .setScopeWindow([2, 2])\
            .setIncludeChunkToScope(True)

flattener = medical.Flattener() \
    .setInputCols("assertion") \
    .setExplodeSelectedFields({"assertion":["metadata.ner_chunk as ner_chunk",
                                            "begin as begin",
                                            "end as end",
                                            "metadata.ner_label as ner_label",
                                            "result as assertion"]})


pipeline = nlp.Pipeline(stages=[
            document_assembler,
            sentence_detector,
            tokenizer,
            word_embeddings,
            clinical_ner,
            ner_converter,
            contextual_assertion,
            flattener
        ])

empty_data = spark.createDataFrame([[""]]).toDF("text")

model = pipeline.fit(empty_data)

result = model.transform(data)
result.show(truncate=False)


+------------------+-----+---+---------+---------+
|ner_chunk         |begin|end|ner_label|assertion|
+------------------+-----+---+---------+---------+
|any difficulty    |59   |72 |PROBLEM  |absent   |
|nausea            |178  |183|PROBLEM  |absent   |
|zofran            |199  |204|TREATMENT|absent   |
|pain              |309  |312|PROBLEM  |absent   |
|Alcoholism        |428  |437|PROBLEM  |absent   |
|diabetic          |496  |503|PROBLEM  |absent   |
|kidney injury     |664  |676|PROBLEM  |absent   |
|abnormal rashes   |691  |705|PROBLEM  |absent   |
|liver disease     |741  |753|PROBLEM  |absent   |
|hemoptysis        |777  |786|PROBLEM  |absent   |
|COVID-19 infection|873  |890|PROBLEM  |absent   |
|viral infection   |902  |916|PROBLEM  |absent   |
+------------------+-----+---+---------+---------+



**setPrefixAndSuffixMatch** : Whether to match both prefix and suffix to annotate the hit





To annotate a chunk only when both a prefix match and a suffix match are found, set `setPrefixAndSuffixMatch` to `True`.
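
Read literally, the parameter switches the decision from "prefix or suffix" to "prefix and suffix". A toy truth table under that reading (the function `toy_decision` is hypothetical and not part of the library):

```python
def toy_decision(prefix_hit, suffix_hit, prefix_and_suffix_match=False):
    """Hypothetical rule: require both hits only when the flag is on."""
    if prefix_and_suffix_match:
        return prefix_hit and suffix_hit
    return prefix_hit or suffix_hit

for prefix_hit in (False, True):
    for suffix_hit in (False, True):
        print(prefix_hit, suffix_hit,
              toy_decision(prefix_hit, suffix_hit),
              toy_decision(prefix_hit, suffix_hit, prefix_and_suffix_match=True))
```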

In [13]:
contextual_assertion = medical.ContextualAssertion()\
            .setInputCols("sentence", "token", "ner_chunk") \
            .setOutputCol("assertion") \
            .setPrefixAndSuffixMatch(True)


pipeline = nlp.Pipeline(stages=[
            document_assembler,
            sentence_detector,
            tokenizer,
            word_embeddings,
            clinical_ner,
            ner_converter,
            contextual_assertion

        ])

empty_data = spark.createDataFrame([[""]]).toDF("text")

model = pipeline.fit(empty_data)

result = model.transform(data)
result.show(truncate=False)



**setCaseSensitive**


If case sensitivity is important for the keywords being searched, you can enable this by setting `setCaseSensitive` to `True`.

In [14]:
contextual_assertion = medical.ContextualAssertion()\
    .setInputCols("sentence", "token", "ner_chunk") \
    .setOutputCol("assertion") \
    .setCaseSensitive(True)

flattener = medical.Flattener() \
    .setInputCols("assertion") \
    .setExplodeSelectedFields({"assertion":["metadata.ner_chunk as ner_chunk",
                                            "begin as begin",
                                            "end as end",
                                            "metadata.ner_label as ner_label",
                                            "result as assertion"]})


pipeline = nlp.Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    word_embeddings,
    clinical_ner,
    ner_converter,
    contextual_assertion,
    flattener
])

empty_data = spark.createDataFrame([[""]]).toDF("text")

model = pipeline.fit(empty_data)

result = model.transform(data)
result.show(truncate=False)


+------------------+-----+---+---------+---------+
|ner_chunk         |begin|end|ner_label|assertion|
+------------------+-----+---+---------+---------+
|any difficulty    |59   |72 |PROBLEM  |absent   |
|nausea            |178  |183|PROBLEM  |absent   |
|zofran            |199  |204|TREATMENT|absent   |
|pain              |309  |312|PROBLEM  |absent   |
|tylenol           |318  |324|TREATMENT|absent   |
|Alcoholism        |428  |437|PROBLEM  |absent   |
|diabetic          |496  |503|PROBLEM  |absent   |
|liver disease     |741  |753|PROBLEM  |absent   |
|hemoptysis        |777  |786|PROBLEM  |absent   |
|COVID-19 infection|873  |890|PROBLEM  |absent   |
|viral infection   |902  |916|PROBLEM  |absent   |
+------------------+-----+---+---------+---------+
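
In the output above, `kidney injury` and `abnormal rashes` are not annotated, although both appear in sentences that start with a capitalized `No` and are annotated in other runs in this notebook. That is consistent with a lowercase cue no longer matching a capitalized word. The sketch below shows the effect with Python's `re`; the helper `keyword_in` and the lowercase keyword `no` are assumptions for illustration.

```python
import re

def keyword_in(text, keyword, case_sensitive=False):
    """Hypothetical helper: whole-word keyword search with optional case sensitivity."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.search(r"\b" + re.escape(keyword) + r"\b", text, flags) is not None

sentence = "No kidney injury reported."
print(keyword_in(sentence, "no"))                       # True
print(keyword_in(sentence, "no", case_sensitive=True))  # False
print(keyword_in("Patient is not diabetic.", "not", case_sensitive=True))  # True
```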



**setScopeWindowDelimiters**


Delimiters are specific characters or keywords that mark boundaries in the text and limit the scope window. You can set delimiters with the `setScopeWindowDelimiters()` method.
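
One way to picture delimiters is as cut points: the scope on each side of the chunk stops at the nearest delimiter token. The sketch below is a toy model under that assumption; `limit_scope` and the whitespace tokenization are hypothetical.

```python
def limit_scope(left_tokens, right_tokens, delimiters):
    """Hypothetical helper: trim the scope at the delimiter nearest to the chunk."""
    delims = {d.lower() for d in delimiters}

    # Left side: keep only the tokens after the last delimiter
    cut = 0
    for i, tok in enumerate(left_tokens):
        if tok.lower() in delims:
            cut = i + 1
    left = left_tokens[cut:]

    # Right side: keep tokens up to (not including) the first delimiter
    right = []
    for tok in right_tokens:
        if tok.lower() in delims:
            break
        right.append(tok)
    return left, right

tokens = ("Although patient has severe pneumonia and fever , test reports "
          "are negative for COVID-19 infection .").split()

# Chunk "fever" is tokens[6]; "and" and "," fence it off on both sides
print(limit_scope(tokens[:6], tokens[7:], [",", "and"]))
# ([], [])

# Chunk "COVID-19 infection" is tokens[13:15]; the scope starts after the comma
print(limit_scope(tokens[:13], tokens[15:], [",", "and"]))
# (['test', 'reports', 'are', 'negative', 'for'], ['.'])
```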

In [15]:
contextual_assertion = medical.ContextualAssertion()\
  .setInputCols("sentence", "token", "ner_chunk") \
  .setOutputCol("assertion") \
  .setScopeWindowDelimiters([",","and"])

flattener = medical.Flattener() \
    .setInputCols("assertion") \
    .setExplodeSelectedFields({"assertion":["metadata.ner_chunk as ner_chunk",
                                            "begin as begin",
                                            "end as end",
                                            "metadata.ner_label as ner_label",
                                            "result as assertion"]})


pipeline = nlp.Pipeline(
stages=[
      document_assembler,
      sentence_detector,
      tokenizer,
      word_embeddings,
      clinical_ner,
      ner_converter,
      contextual_assertion,
      flattener
  ])

empty_data = spark.createDataFrame([[""]]).toDF("text")

model = pipeline.fit(empty_data)

result = model.transform(data)
result.show(truncate=False)


+------------------+-----+---+---------+---------+
|ner_chunk         |begin|end|ner_label|assertion|
+------------------+-----+---+---------+---------+
|any difficulty    |59   |72 |PROBLEM  |absent   |
|hypertension      |149  |160|PROBLEM  |absent   |
|nausea            |178  |183|PROBLEM  |absent   |
|zofran            |199  |204|TREATMENT|absent   |
|pain              |309  |312|PROBLEM  |absent   |
|tylenol           |318  |324|TREATMENT|absent   |
|Alcoholism        |428  |437|PROBLEM  |absent   |
|diabetic          |496  |503|PROBLEM  |absent   |
|kidney injury     |664  |676|PROBLEM  |absent   |
|abnormal rashes   |691  |705|PROBLEM  |absent   |
|ulcers            |710  |715|PROBLEM  |absent   |
|liver disease     |741  |753|PROBLEM  |absent   |
|hemoptysis        |777  |786|PROBLEM  |absent   |
|COVID-19 infection|873  |890|PROBLEM  |absent   |
|viral infection   |902  |916|PROBLEM  |absent   |
+------------------+-----+---+---------+---------+



**setConfidenceCalculationDirection**

The confidence starts at 1 and is reduced as the distance grows, where "distance" is how far the matched regex or keyword is from the chunk. The `setConfidenceCalculationDirection` method sets the direction used for calculating assertion confidence (`left`, `right`, or `both`; the default is `left`).

In [17]:
# Default ConfidenceCalculationDirection: left
contextual_assertion = medical.ContextualAssertion()\
            .setInputCols("sentence", "token", "ner_chunk") \
            .setOutputCol("assertion")



flattener = medical.Flattener() \
    .setInputCols("assertion") \
    .setExplodeSelectedFields({"assertion":["metadata.ner_chunk as ner_chunk",
                                            "begin as begin",
                                            "end as end",
                                            "metadata.confidence as confidence",
                                            "result as result"]})

pipeline = nlp.Pipeline(
    stages=[
          document_assembler,
          sentence_detector,
          tokenizer,
          word_embeddings,
          clinical_ner,
          ner_converter,
          contextual_assertion,
          flattener
      ])

empty_data = spark.createDataFrame([[""]]).toDF("text")

model = pipeline.fit(empty_data)

result = model.transform(data)

result.show(truncate=False)

+------------------+-----+---+----------+------+
|ner_chunk         |begin|end|confidence|result|
+------------------+-----+---+----------+------+
|any difficulty    |59   |72 |0.9802    |absent|
|hypertension      |149  |160|0.7711    |absent|
|nausea            |178  |183|0.9802    |absent|
|zofran            |199  |204|0.0       |absent|
|pain              |309  |312|0.9802    |absent|
|tylenol           |318  |324|0.8187    |absent|
|Alcoholism        |428  |437|0.0       |absent|
|diabetic          |496  |503|0.9802    |absent|
|kidney injury     |664  |676|0.9802    |absent|
|abnormal rashes   |691  |705|0.9802    |absent|
|ulcers            |710  |715|0.6703    |absent|
|liver disease     |741  |753|0.8869    |absent|
|hemoptysis        |777  |786|0.9802    |absent|
|COVID-19 infection|873  |890|0.9802    |absent|
|viral infection   |902  |916|0.0       |absent|
+------------------+-----+---+----------+------+



In [19]:
# ConfidenceCalculationDirection both
contextual_assertion = medical.ContextualAssertion()\
            .setInputCols("sentence", "token", "ner_chunk") \
            .setOutputCol("assertion") \
            .setConfidenceCalculationDirection("both")


flattener = medical.Flattener() \
    .setInputCols("assertion") \
    .setExplodeSelectedFields({"assertion":["metadata.ner_chunk as ner_chunk",
                                            "begin as begin",
                                            "end as end",
                                            "metadata.confidence as confidence",
                                            "result as result"]})

pipeline = nlp.Pipeline(
    stages=[
          document_assembler,
          sentence_detector,
          tokenizer,
          word_embeddings,
          clinical_ner,
          ner_converter,
          contextual_assertion,
          flattener
      ])

empty_data = spark.createDataFrame([[""]]).toDF("text")

model = pipeline.fit(empty_data)

result = model.transform(data)

result.show(truncate=False)

+------------------+-----+---+----------+------+
|ner_chunk         |begin|end|confidence|result|
+------------------+-----+---+----------+------+
|any difficulty    |59   |72 |0.9802    |absent|
|hypertension      |149  |160|0.7711    |absent|
|nausea            |178  |183|0.9802    |absent|
|zofran            |199  |204|0.9802    |absent|
|pain              |309  |312|0.9802    |absent|
|tylenol           |318  |324|0.8187    |absent|
|Alcoholism        |428  |437|0.9802    |absent|
|diabetic          |496  |503|0.9802    |absent|
|kidney injury     |664  |676|0.9802    |absent|
|abnormal rashes   |691  |705|0.9802    |absent|
|ulcers            |710  |715|0.6703    |absent|
|liver disease     |741  |753|0.8869    |absent|
|hemoptysis        |777  |786|0.9802    |absent|
|COVID-19 infection|873  |890|0.9802    |absent|
|viral infection   |902  |916|0.9802    |absent|
+------------------+-----+---+----------+------+



In [20]:
# ConfidenceCalculationDirection right
contextual_assertion = medical.ContextualAssertion()\
            .setInputCols("sentence", "token", "ner_chunk") \
            .setOutputCol("assertion") \
            .setConfidenceCalculationDirection("right")


flattener = medical.Flattener() \
    .setInputCols("assertion") \
    .setExplodeSelectedFields({"assertion":["metadata.ner_chunk as ner_chunk",
                                            "begin as begin",
                                            "end as end",
                                            "metadata.confidence as confidence",
                                            "result as result"]})

pipeline = nlp.Pipeline(
    stages=[
          document_assembler,
          sentence_detector,
          tokenizer,
          word_embeddings,
          clinical_ner,
          ner_converter,
          contextual_assertion,
          flattener
      ])

empty_data = spark.createDataFrame([[""]]).toDF("text")

model = pipeline.fit(empty_data)

result = model.transform(data)

result.show(truncate=False)

+------------------+-----+---+----------+------+
|ner_chunk         |begin|end|confidence|result|
+------------------+-----+---+----------+------+
|any difficulty    |59   |72 |0.0       |absent|
|hypertension      |149  |160|0.0       |absent|
|nausea            |178  |183|0.0       |absent|
|zofran            |199  |204|0.9802    |absent|
|pain              |309  |312|0.0       |absent|
|tylenol           |318  |324|0.0       |absent|
|Alcoholism        |428  |437|0.9802    |absent|
|diabetic          |496  |503|0.0       |absent|
|kidney injury     |664  |676|0.0       |absent|
|abnormal rashes   |691  |705|0.0       |absent|
|ulcers            |710  |715|0.0       |absent|
|liver disease     |741  |753|0.0       |absent|
|hemoptysis        |777  |786|0.0       |absent|
|COVID-19 infection|873  |890|0.0       |absent|
|viral infection   |902  |916|0.9802    |absent|
+------------------+-----+---+----------+------+
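
Comparing the three outputs above, the `both` confidence of every chunk equals the larger of its `left` and `right` values. The short check below uses the numbers copied from those tables; it describes this particular run only and is not a documented guarantee of the annotator.

```python
# (chunk, left, right, both) confidences copied from the three outputs above
rows = [
    ("any difficulty",     0.9802, 0.0,    0.9802),
    ("hypertension",       0.7711, 0.0,    0.7711),
    ("nausea",             0.9802, 0.0,    0.9802),
    ("zofran",             0.0,    0.9802, 0.9802),
    ("pain",               0.9802, 0.0,    0.9802),
    ("tylenol",            0.8187, 0.0,    0.8187),
    ("Alcoholism",         0.0,    0.9802, 0.9802),
    ("diabetic",           0.9802, 0.0,    0.9802),
    ("kidney injury",      0.9802, 0.0,    0.9802),
    ("abnormal rashes",    0.9802, 0.0,    0.9802),
    ("ulcers",             0.6703, 0.0,    0.6703),
    ("liver disease",      0.8869, 0.0,    0.8869),
    ("hemoptysis",         0.9802, 0.0,    0.9802),
    ("COVID-19 infection", 0.9802, 0.0,    0.9802),
    ("viral infection",    0.0,    0.9802, 0.9802),
]

# Chunks where "both" differs from max(left, right)
mismatches = [chunk for chunk, left, right, both in rows if both != max(left, right)]
print(mismatches)  # []
```

Chunks with a confidence of 0.0 in one direction (for example `zofran` with `left`) are those whose cue lies on the other side of the chunk.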

