![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/Spark_NLP_Udemy_MOOC/Healthcare_NLP/Mapper2Chunk.ipynb)

# **📜 Mapper2Chunk**

This notebook will cover the different parameters and usages of `Mapper2Chunk`.

**📖 Learning Objectives:**

1. Understand how this annotator converts `LABELED_DEPENDENCY` type annotations into `CHUNK` type.

2. Learn how to create a new chunk-type column compatible with annotators that use chunk type as input.

3. Customize your chunk-type annotations by using the different parameters of the annotator.

**🔗 Helpful Links:**

- Reference Documentation: [Mapper2Chunk](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators#mapper2chunk)



## **🎬 Colab Setup**

In [None]:
# Install the johnsnowlabs library to access Spark-NLP for Healthcare
! pip install -q johnsnowlabs

In [None]:
from google.colab import files
print('Please upload your John Snow Labs license using the button below')
license_keys = files.upload()

In [None]:
from johnsnowlabs import nlp, medical

# After uploading your license, run this to install all licensed Python wheels and pre-download the JARs for the Spark session JVM
nlp.settings.enforce_versions=False
nlp.install(refresh_install=True)

👌 Detected license file /content/license_keys.json
🚨 Outdated Medical Secrets in license file. Version=5.4.0.PR but should be Version=5.4.0
🚨 Outdated OCR Secrets in license file. Version=5.3.2 but should be Version=5.4.0
📋 Stored John Snow Labs License in /root/.johnsnowlabs/licenses/license_number_0_for_Spark-Healthcare_Spark-OCR.json
👷 Setting up  John Snow Labs home in /root/.johnsnowlabs, this might take a few minutes.
Downloading 🐍+🚀 Python Library spark_nlp-5.4.0-py2.py3-none-any.whl
Downloading 🐍+💊 Python Library spark_nlp_jsl-5.4.0-py3-none-any.whl
Downloading 🫘+🚀 Java Library spark-nlp-assembly-5.4.0.jar
Downloading 🫘+💊 Java Library spark-nlp-jsl-5.4.0.jar
🙆 JSL Home setup in /root/.johnsnowlabs
Running "/usr/bin/python3 -m pip install https://pypi.johnsnowlabs.com/[LIB_SECRET]/spark-nlp-jsl/spark_nlp_jsl-5.4.0-py3-none-any.whl --force-reinstall"
Installed 1 products:
💊 Spark-Healthcare==5.4.0 installed! ✅ Heal the planet with NLP! 


In [None]:
# Automatically load the license data and start a session with all JARs the user has access to
spark = nlp.start()

👌 Detected license file /content/license_keys.json
👌 Launched cpu optimized session with with: 🚀Spark-NLP==5.4.0, 💊Spark-Healthcare==5.4.0, running on ⚡ PySpark==3.4.0


In [None]:
spark

## **🖨️ Input/Output Annotation Types**

- Input: `LABELED_DEPENDENCY`

- Output: `CHUNK`
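
To make the type change concrete, here is a plain-Python sketch that does not use Spark NLP at all. It models an annotation as a dictionary with only `annotatorType` and `result` keys, which is a simplification of the real annotation schema, and relabels it. `to_chunk` is a hypothetical helper written for this illustration, not a library function, and it says nothing about how the annotator handles other fields such as metadata.

```python
# Conceptual sketch only: plain dictionaries stand in for Spark NLP annotations.
# `to_chunk` is a hypothetical helper, not part of the library.

def to_chunk(annotation):
    """Return a copy of the annotation relabeled as a chunk."""
    converted = dict(annotation)          # copy so the input stays untouched
    converted["annotatorType"] = "chunk"  # only the type label changes here
    return converted


relation = {"annotatorType": "labeled_dependency", "result": "bactericidal"}
chunk = to_chunk(relation)

print(relation["annotatorType"], "->", chunk["annotatorType"])
print(chunk["result"])
```

The point of the relabeling is compatibility: downstream annotators that declare `CHUNK` as their input type can consume the converted column.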

## **🔎 Parameters**

- `setFilterNoneValues`: Whether to drop annotations whose result is `NONE` from the output (default: `False`).
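
The effect of this flag can be illustrated without a Spark session. The sketch below is plain Python, not the annotator's implementation: `filter_none_values` is a hypothetical helper, the sample values are illustrative, and it assumes that an unmapped entry is represented by the literal string `NONE` and that, with the flag off, every value passes through unchanged.

```python
# Conceptual sketch only: mimics the filtering idea on a plain list of strings.
# `filter_none_values` is a hypothetical helper, not a Spark NLP function.

def filter_none_values(results, enabled=False):
    """Drop 'NONE' entries when enabled; otherwise return the values as-is."""
    if not enabled:
        return list(results)
    return [value for value in results if value != "NONE"]


mapped = ["bactericidal", "antiemetic", "NONE", "NONE"]

kept_all = filter_none_values(mapped)                # flag off (assumed default behavior)
filtered = filter_none_values(mapped, enabled=True)  # flag on

print(kept_all)
print(filtered)
```

Filtering is useful when the chunk column feeds another annotator, since a `NONE` placeholder carries no mapped value to work with.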


### Pipeline

In [None]:
text = "Patient resting in bed. Patient given azithromycin without any difficulty. Patient denies nausea at this time. Zofran declined. Patient is also having intermittent sweating"
data = spark.createDataFrame([[text]]).toDF("text")

In [None]:
documentAssembler = nlp.DocumentAssembler() \
  .setInputCol("text") \
  .setOutputCol("document")

sentenceDetector = nlp.SentenceDetector() \
  .setInputCols(["document"]) \
  .setOutputCol("sentence")

tokenizer = nlp.Tokenizer() \
  .setInputCols(["sentence"]) \
  .setOutputCol("token")

word_embeddings = nlp.WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models") \
  .setInputCols(["sentence", "token"]) \
  .setOutputCol("embeddings")

clinical_ner = medical.NerModel.pretrained("ner_jsl", "en", "clinical/models") \
  .setInputCols(["sentence", "token", "embeddings"]) \
  .setOutputCol("ner")

ner_converter = medical.NerConverterInternal() \
  .setInputCols(["sentence", "token", "ner"]) \
  .setOutputCol("ner_chunk")

chunkMapper = medical.ChunkMapperModel.pretrained("drug_action_treatment_mapper", "en", "clinical/models") \
  .setInputCols(["ner_chunk"]) \
  .setOutputCol("relations") \
  .setRels(["action"])

pipeline = nlp.Pipeline(
    stages=[
      documentAssembler,
      sentenceDetector,
      tokenizer,
      word_embeddings,
      clinical_ner,
      ner_converter,
      chunkMapper
])

embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
ner_jsl download started this may take some time.
[OK!]
drug_action_treatment_mapper download started this may take some time.
[OK!]


In [None]:
result = pipeline.fit(data).transform(data)

In [None]:
result.selectExpr("relations.result", "relations.annotatorType").show(truncate=False)

+-------------------------------------------------------+----------------------------------------------------------------------------------------------------+
|result                                                 |annotatorType                                                                                       |
+-------------------------------------------------------+----------------------------------------------------------------------------------------------------+
|[bactericidal, antiemetic, anti-abstinence, NONE, NONE]|[labeled_dependency, labeled_dependency, labeled_dependency, labeled_dependency, labeled_dependency]|
+-------------------------------------------------------+----------------------------------------------------------------------------------------------------+



### `setFilterNoneValues`

In [None]:
mapper2chunk = medical.Mapper2Chunk() \
  .setInputCols(["relations"]) \
  .setOutputCol("chunk") \
  .setFilterNoneValues(True)

pipeline = nlp.Pipeline(
    stages=[
      documentAssembler,
      sentenceDetector,
      tokenizer,
      word_embeddings,
      clinical_ner,
      ner_converter,
      chunkMapper,
      mapper2chunk
])

In [None]:
result = pipeline.fit(data).transform(data)

In [None]:
result.selectExpr("chunk.result", "chunk.annotatorType").show(truncate=False)

+-------------------------------------------+---------------------+
|result                                     |annotatorType        |
+-------------------------------------------+---------------------+
|[bactericidal, antiemetic, anti-abstinence]|[chunk, chunk, chunk]|
+-------------------------------------------+---------------------+

