![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

# **DrugNormalizer**

This notebook covers the parameters and usage of the `DrugNormalizer` annotator. The annotator normalizes raw drug mentions in clinical text: it expands abbreviations, rewrites dosages in a standardized form, removes unwanted characters according to the selected policy, and can optionally lowercase the result.

**📖 Learning Objectives:**

1. Understand how the annotator normalizes drug mentions in clinical text, removing unwanted characters according to specific rules and optionally lowercasing the text.

2. Become comfortable using the different parameters of the annotator.


**🔗 Helpful Links:**

- Documentation : [DrugNormalizer](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators#drugnormalizer)

- Python Docs : [DrugNormalizer](https://nlp.johnsnowlabs.com/licensed/api/python/reference/autosummary/sparknlp_jsl/annotator/normalizer/drug_normalizer/index.html)

- Scala Docs : [DrugNormalizer](https://nlp.johnsnowlabs.com/licensed/api/com/johnsnowlabs/nlp/annotators/DrugNormalizer.html)

- For extended examples of usage, see the [Spark NLP Workshop repository](https://github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/tutorials/Certification_Trainings/Healthcare).

## **🎬 Colab Setup**

In [None]:
!pip install -q johnsnowlabs

In [None]:
from google.colab import files
print('Please Upload your John Snow Labs License using the button below')
license_keys = files.upload()

In [None]:
from johnsnowlabs import nlp, medical

# After uploading your license, run this to install all licensed Python wheels
# and pre-download the Jars for the Spark Session JVM
# nlp.settings.enforce_versions = False
nlp.install(refresh_install=True)

In [4]:
from johnsnowlabs import nlp, medical
import pyspark.sql.functions as F
import pandas as pd

spark = nlp.start()

👌 Detected license file /content/5.1.1.json
👌 Launched cpu optimized session with with: 🚀Spark-NLP==5.1.1, 💊Spark-Healthcare==5.1.1, running on ⚡ PySpark==3.1.2


## **🖨️ Input/Output Annotation Types**

- Input: `DOCUMENT`

- Output: `DOCUMENT`

## **🔎 Parameters**


- `lowercase`: (boolean) whether to convert strings to lowercase. Default is False.

- `policy`: (str) rule to remove patterns from text.  Valid policy values are:
  + **"all"**,
  + **"abbreviations"**,
  + **"dosages"**


## **Data Preparation**

In [5]:
# Sample data
data_to_normalize = spark.createDataFrame([
            ("A", "Sodium Chloride/Potassium Chloride 13bag", "Sodium Chloride / Potassium Chloride 13 bag"),
            ("B", "interferon alfa-2b 10 million unit ( 1 ml ) injec", "interferon alfa - 2b 10000000 unt ( 1 ml ) injection"),
            ("C", "aspirin 10 meq/ 5 ml oral sol", "aspirin 2 meq/ml oral solution")
        ]).toDF("cuid", "text", "target_normalized_text")

data_to_normalize.show(truncate=100)

+----+-------------------------------------------------+----------------------------------------------------+
|cuid|                                             text|                              target_normalized_text|
+----+-------------------------------------------------+----------------------------------------------------+
|   A|         Sodium Chloride/Potassium Chloride 13bag|         Sodium Chloride / Potassium Chloride 13 bag|
|   B|interferon alfa-2b 10 million unit ( 1 ml ) injec|interferon alfa - 2b 10000000 unt ( 1 ml ) injection|
|   C|                    aspirin 10 meq/ 5 ml oral sol|                      aspirin 2 meq/ml oral solution|
+----+-------------------------------------------------+----------------------------------------------------+



### `setPolicy()`

Sets the policy that determines which patterns are normalized in the text.

Possible values are “all”, “abbreviations”, or “dosages”. <br/>
- “abbreviations” replaces abbreviations with their full form (for example, “oral sol” becomes “oral solution”).
- “dosages” rewrites dosages in a standardized form (for example, “10 million unit” becomes “10000000 unt” in the sample data above).
- “all” applies both the abbreviation and the dosage rules.
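To make the difference between the three policies concrete, here is a minimal pure-Python sketch. It is a toy illustration only and not how `DrugNormalizer` is implemented: the lookup tables `ABBREVIATIONS` and `MAGNITUDES` and the function `toy_normalize` are hypothetical names invented for this example, and the real annotator applies a much larger set of rules (including spacing and unit rewrites that this sketch does not attempt).

```python
import re

# Hypothetical lookup tables, for illustration only. The real annotator's
# rules are internal to the library and are far more extensive.
ABBREVIATIONS = {"sol": "solution", "injec": "injection"}
MAGNITUDES = {"million": 1_000_000, "thousand": 1_000}


def expand_abbreviations(text):
    """Replace whole-word abbreviations with their full form."""
    pattern = r"\b(" + "|".join(map(re.escape, ABBREVIATIONS)) + r")\b"
    return re.sub(pattern, lambda m: ABBREVIATIONS[m.group(1)], text)


def standardize_dosages(text):
    """Rewrite quantities such as '10 million' as plain integers."""
    pattern = r"\b(\d+) (" + "|".join(map(re.escape, MAGNITUDES)) + r")\b"
    return re.sub(
        pattern,
        lambda m: str(int(m.group(1)) * MAGNITUDES[m.group(2)]),
        text,
    )


def toy_normalize(text, policy="all", lowercase=False):
    """Apply the toy rules selected by `policy`, then optionally lowercase."""
    if policy not in {"all", "abbreviations", "dosages"}:
        raise ValueError(f"Unknown policy: {policy!r}")
    if policy in {"all", "abbreviations"}:
        text = expand_abbreviations(text)
    if policy in {"all", "dosages"}:
        text = standardize_dosages(text)
    return text.lower() if lowercase else text


sample = "interferon alfa-2b 10 million unit ( 1 ml ) injec"
for policy in ("abbreviations", "dosages", "all"):
    print(f"{policy:>13}: {toy_normalize(sample, policy)}")
```

Each policy switches on one family of rewrite rules, and "all" simply applies both families in sequence. The annotator's actual output for the same input differs in detail, as the cells in this section show.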

In [10]:
# Annotator that transforms a text column from dataframe into normalized text (with all policy)

document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

drug_normalizer = medical.DrugNormalizer() \
    .setInputCols("document") \
    .setOutputCol("document_normalized") \
    .setPolicy("all")

drug_normalizer_pipeline = nlp.Pipeline(stages=[
    document_assembler,
    drug_normalizer
    ])

ds = drug_normalizer_pipeline.fit(data_to_normalize).transform(data_to_normalize)

In [11]:
ds = ds.selectExpr("document", "target_normalized_text", "explode(document_normalized.result) as all_normalized_text")
ds.show(truncate = False)

+-------------------------------------------------------------------------------------------+----------------------------------------------------+----------------------------------------------------+
|document                                                                                   |target_normalized_text                              |all_normalized_text                                 |
+-------------------------------------------------------------------------------------------+----------------------------------------------------+----------------------------------------------------+
|[{document, 0, 39, Sodium Chloride/Potassium Chloride 13bag, {sentence -> 0}, []}]         |Sodium Chloride / Potassium Chloride 13 bag         |Sodium Chloride / Potassium Chloride 13 bag         |
|[{document, 0, 48, interferon alfa-2b 10 million unit ( 1 ml ) injec, {sentence -> 0}, []}]|interferon alfa - 2b 10000000 unt ( 1 ml ) injection|interferon alfa - 2b 10000000 unt ( 1 ml ) injection|
|[{document, 0, 28, aspirin 10 meq/ 5 ml oral sol, {sentence -> 0}, []}]                    |aspirin 2 meq/ml oral solution                      |aspirin 2 meq/ml oral solution                      |
+-------------------------------------------------------------------------------------------+----------------------------------------------------+----------------------------------------------------+


In [12]:
# Annotator that transforms a text column from dataframe into normalized text (with abbreviations only policy)

drug_normalizer_abb = medical.DrugNormalizer() \
    .setInputCols("document") \
    .setOutputCol("document_normalized_abbreviations") \
    .setPolicy("abbreviations")

ds = drug_normalizer_abb.transform(ds)

In [13]:
ds = ds.selectExpr("document", "target_normalized_text", "all_normalized_text", "explode(document_normalized_abbreviations.result) as abbr_normalized_text")
ds.select("target_normalized_text", "all_normalized_text", "abbr_normalized_text").show(truncate=1000)

+----------------------------------------------------+----------------------------------------------------+-----------------------------------------------------+
|                              target_normalized_text|                                 all_normalized_text|                                 abbr_normalized_text|
+----------------------------------------------------+----------------------------------------------------+-----------------------------------------------------+
|         Sodium Chloride / Potassium Chloride 13 bag|         Sodium Chloride / Potassium Chloride 13 bag|             Sodium Chloride/Potassium Chloride 13bag|
|interferon alfa - 2b 10000000 unt ( 1 ml ) injection|interferon alfa - 2b 10000000 unt ( 1 ml ) injection|interferon alfa-2b 10 million unit ( 1 ml ) injection|
|                      aspirin 2 meq/ml oral solution|                      aspirin 2 meq/ml oral solution|                   aspirin 10 meq/ 5 ml oral solution|
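+----------------------------------------------------+----------------------------------------------------+-----------------------------------------------------+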

In [14]:
# Transform a text column from dataframe into normalized text (with dosages only policy)

drug_normalizer_dos = medical.DrugNormalizer() \
    .setInputCols("document") \
    .setOutputCol("document_normalized_dosages") \
    .setPolicy("dosages")

ds = drug_normalizer_dos.transform(ds)

In [15]:
ds.selectExpr("target_normalized_text", "all_normalized_text", "explode(document_normalized_dosages.result) as dos_normalized_text").show(truncate=1000)

+----------------------------------------------------+----------------------------------------------------+------------------------------------------------+
|                              target_normalized_text|                                 all_normalized_text|                             dos_normalized_text|
+----------------------------------------------------+----------------------------------------------------+------------------------------------------------+
|         Sodium Chloride / Potassium Chloride 13 bag|         Sodium Chloride / Potassium Chloride 13 bag|     Sodium Chloride / Potassium Chloride 13 bag|
|interferon alfa - 2b 10000000 unt ( 1 ml ) injection|interferon alfa - 2b 10000000 unt ( 1 ml ) injection|interferon alfa - 2b 10000000 unt ( 1 ml ) injec|
|                      aspirin 2 meq/ml oral solution|                      aspirin 2 meq/ml oral solution|                       aspirin 2 meq/ml oral sol|
+----------------------------------------------------+----------------------------------------------------+------------------------------------------------+
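As a quick sanity check on the three tables above, the following sketch counts how many rows of each policy's output exactly match `target_normalized_text`. It needs no Spark session: the strings are copied by hand from the printed tables, and the helper `exact_matches` is a name introduced here for illustration.

```python
# Strings copied from the printed tables above (rows A, B, C).
target = [
    "Sodium Chloride / Potassium Chloride 13 bag",
    "interferon alfa - 2b 10000000 unt ( 1 ml ) injection",
    "aspirin 2 meq/ml oral solution",
]

outputs = {
    "all": [
        "Sodium Chloride / Potassium Chloride 13 bag",
        "interferon alfa - 2b 10000000 unt ( 1 ml ) injection",
        "aspirin 2 meq/ml oral solution",
    ],
    "abbreviations": [
        "Sodium Chloride/Potassium Chloride 13bag",
        "interferon alfa-2b 10 million unit ( 1 ml ) injection",
        "aspirin 10 meq/ 5 ml oral solution",
    ],
    "dosages": [
        "Sodium Chloride / Potassium Chloride 13 bag",
        "interferon alfa - 2b 10000000 unt ( 1 ml ) injec",
        "aspirin 2 meq/ml oral sol",
    ],
}


def exact_matches(predicted, expected):
    """Count positions where the predicted string equals the expected one."""
    return sum(p == e for p, e in zip(predicted, expected))


match_counts = {
    policy: exact_matches(rows, target) for policy, rows in outputs.items()
}
print(match_counts)  # {'all': 3, 'abbreviations': 0, 'dosages': 1}
```

On this sample, only the "all" policy reproduces every target string. The "dosages" policy matches row A, which contains no abbreviation to expand, and the "abbreviations" policy matches none because every target also requires dosage or spacing normalization.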

### `setLowercase()`

Sets whether to convert strings to lowercase (default: `False`).

In [16]:
# Annotator that transforms a text column from dataframe into normalized, lowercased text (with all policy)

document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

drug_normalizer = medical.DrugNormalizer() \
    .setInputCols("document") \
    .setOutputCol("document_normalized") \
    .setPolicy("all")\
    .setLowercase(True)

drug_normalizer_pipeline = nlp.Pipeline(stages=[
    document_assembler,
    drug_normalizer
    ])

ds = drug_normalizer_pipeline.fit(data_to_normalize).transform(data_to_normalize)

In [17]:
ds = ds.selectExpr("document", "target_normalized_text", "explode(document_normalized.result) as all_normalized_text")
ds.show(truncate = False)

+-------------------------------------------------------------------------------------------+----------------------------------------------------+----------------------------------------------------+
|document                                                                                   |target_normalized_text                              |all_normalized_text                                 |
+-------------------------------------------------------------------------------------------+----------------------------------------------------+----------------------------------------------------+
|[{document, 0, 39, Sodium Chloride/Potassium Chloride 13bag, {sentence -> 0}, []}]         |Sodium Chloride / Potassium Chloride 13 bag         |sodium chloride / potassium chloride 13 bag         |
|[{document, 0, 48, interferon alfa-2b 10 million unit ( 1 ml ) injec, {sentence -> 0}, []}]|interferon alfa - 2b 10000000 unt ( 1 ml ) injection|interferon alfa - 2b 10000000 unt ( 1 ml ) injection|
|[{document, 0, 28, aspirin 10 meq/ 5 ml oral sol, {sentence -> 0}, []}]                    |aspirin 2 meq/ml oral solution                      |aspirin 2 meq/ml oral solution                      |
+-------------------------------------------------------------------------------------------+----------------------------------------------------+----------------------------------------------------+
