![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare_jsl/NORMALIZED_SECTION_HEADER_MAPPER.ipynb)

# **`normalized_section_header_mapper` model**

# **Colab Setup**

In [None]:
# Install the johnsnowlabs library to access Spark-OCR and Spark-NLP for Healthcare, Finance, and Legal.
! pip install -q johnsnowlabs

In [None]:
from google.colab import files
print("Please upload your John Snow Labs license using the button below")
license_keys = files.upload()

In [None]:
from johnsnowlabs import *

# After uploading your license, run this to install all licensed Python wheels and pre-download the JARs for the Spark session JVM.
# Restart your notebook afterwards for the changes to take effect.

jsl.install()

## Start Session

In [None]:
from johnsnowlabs import *
# Automatically load the license data and start a session with all JARs the user has access to
spark = jsl.start()

# **🔎 About the model**

### 📌 **normalized_section_header_mapper**

This pretrained model normalizes section headers in clinical notes. It returns two levels of normalization, `level_1` and `level_2`.
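As a rough illustration of what two levels of normalization look like, the sketch below models the mapper as a plain lookup table. The `SECTION_MAP` entries, the `normalize_header` helper, the case-insensitive matching, and the `"NONE"` fallback are all assumptions made for this example; the pretrained model's actual mappings and matching behavior may differ.

```python
# Hypothetical two-level lookup; entries are illustrative, not the model's actual table.
SECTION_MAP = {
    "ADMISSION DIAGNOSIS": {"level_1": "DIAGNOSIS", "level_2": "ADMISSION DIAGNOSIS"},
    "HOSPITAL COURSE": {"level_1": "COURSE TYPE", "level_2": "HOSPITAL COURSE"},
}


def normalize_header(header, rel="level_1"):
    """Return the normalized header for the requested level, or "NONE" if unmapped."""
    entry = SECTION_MAP.get(header.strip().upper())
    return entry[rel] if entry else "NONE"


print(normalize_header("Admission Diagnosis"))             # coarse level
print(normalize_header("Admission Diagnosis", "level_2"))  # finer level
print(normalize_header("GENERAL HISTORY AND"))             # not in the table
```

In this toy version, `level_1` plays the role of the coarser category and `level_2` the more specific one, which is one plausible reading of the two levels named above.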

# **🔎 Define the Spark NLP pipeline**

In [None]:
document_assembler = nlp.DocumentAssembler()\
      .setInputCol('text')\
      .setOutputCol('document')

sentence_detector = nlp.SentenceDetector()\
      .setInputCols(["document"])\
      .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
      .setInputCols("sentence")\
      .setOutputCol("token")

embeddings = nlp.WordEmbeddingsModel.pretrained("embeddings_clinical", "en","clinical/models")\
      .setInputCols(["sentence", "token"])\
      .setOutputCol("word_embeddings")

clinical_ner = medical.NerModel.pretrained("ner_jsl_slim", "en", "clinical/models")\
      .setInputCols(["sentence","token", "word_embeddings"])\
      .setOutputCol("ner")

ner_converter = nlp.NerConverter()\
      .setInputCols(["sentence", "token", "ner"])\
      .setOutputCol("ner_chunk")\
      .setWhiteList(["Header"])

chunkerMapper = medical.ChunkMapperModel.pretrained("normalized_section_header_mapper", "en", "clinical/models")\
      .setInputCols("ner_chunk")\
      .setOutputCol("mappings")\
      .setRel("level_1")  # or "level_2"

# Explicit import, in case Pipeline is not already in scope from the star import above
from pyspark.ml import Pipeline

pipeline = Pipeline().setStages([
    document_assembler,
    sentence_detector,
    tokenizer,
    embeddings,
    clinical_ner,
    ner_converter,
    chunkerMapper])



embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
ner_jsl_slim download started this may take some time.
[OK!]
normalized_section_header_mapper download started this may take some time.
[OK!]


# **🔎 Sample Text**

In [None]:
sample_text = """ADMISSION DIAGNOSIS Upper respiratory illness with apnea, possible pertussis.
DISCHARGE DIAGNOSIS Upper respiratory illness with apnea, possible pertussis.
PATIENT HISTORY This is a one plus-month-old female with respiratory symptoms for approximately a week prior to admission.  This involved cough, post-tussive emesis, questionable fever, but only 99.7.  Their usual doctor prescribed amoxicillin over the phone.  The coughing persisted and worsened.  She went to the ER, where sats were normal at baseline, but dropped into the 80s with coughing spells.  They did witness some apnea.  They gave some Rocephin, did some labs, and the patient was transferred to hospital.
GENERAL HISTORY AND PHYSICAL On admission. there was some nasal discharge. Remainder of the HEENT was normal.Had few rhonchi,  No retractions,  No significant coughing or apnea during the admission physical. abdomen  benign.  
RADIOGRAPHIC STUDIES She had a CBC done Garberville, which showed a white count of 12.4, with a differential of 10 segs, 82 lymphs, 8 monos, hemoglobin of 15, hematocrit 42, platelets 296,000, and a normal BMP.  An x-ray was done and I do not have an official interpretation, but to the admitting physician, Dr. X it showed no significant infiltrate.  Well at hospital, she had a rapid influenza swab done, which was negative.  She had a rapid RSV done, which is still not in the chart, but I believe I was told that it was negative.  She also had a pertussis PCR swab done and a pertussis culture done, neither of which has result in the chart.  I do know that the pertussis culture proved to be negative.
HOSPITAL COURSE The baby was afebrile.  Required no oxygen in the hospital.  Actually fed reasonably well.  Did have one episode of coughing with slight emesis.  Appeared basically quite well between episodes.  Had no apnea witnessed and after overnight observation, the parents were anxious to go home.  The patient was started on Zithromax in the hospital.
DISCHARGE CONDITION The patient was in stable condition and good condition on exam at the time and was discharged home on Zithromax to be followed up in the office within a week.
DISCHARGE INSTRUCTIONS Include usual diet and to follow up within a week, but certainly sooner if the coughing is worse and there is cyanosis or apnea again."""


df = spark.createDataFrame([[sample_text]]).toDF('text')
df.show(truncate = False)


# **🔎 Run the pipeline**

In [None]:
result = pipeline.fit(df).transform(df)

In [None]:
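# Explicit import, in case F is not already in scope from the star import above
import pyspark.sql.functions as F

result.select(F.explode(F.arrays_zip(result.ner_chunk.result,
                                     result.mappings.result)).alias("col"))\
      .select(F.expr("col['0']").alias("ner_chunk"),
              F.expr("col['1']").alias("normalized_headers")).show(truncate=False)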

+----------------------+-----------------------------+
|ner_chunk             |normalized_headers           |
+----------------------+-----------------------------+
|ADMISSION DIAGNOSIS   |DIAGNOSIS                    |
|DISCHARGE DIAGNOSIS   |ADMISSION DIAGNOSIS          |
|PATIENT HISTORY       |DIAGNOSIS                    |
|GENERAL HISTORY AND   |DISCHARGE DIAGNOSIS          |
|RADIOGRAPHIC STUDIES  |HISTORY                      |
|HOSPITAL COURSE       |EXPOSURE HISTORY             |
|DISCHARGE CONDITION   |NONE                         |
|DISCHARGE INSTRUCTIONS|LABORATORY AND RADIOLOGY DATA|
|null                  |MAGNETIC RESONANCE IMAGING   |
|null                  |COURSE TYPE                  |
|null                  |HOSPITAL COURSE              |
|null                  |DISCHARGE RELATED            |
|null                  |DISCHARGE CONDITION          |
|null                  |DISCHARGE RELATED            |
|null                  |DISCHARGE INSTRUCTIONS       |
+----------------------+-----------------------------+
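
In the output above, `ner_chunk.result` holds 8 headers while `mappings.result` holds 15 values. `arrays_zip` pairs the two arrays by position and pads the shorter one with null, so a row does not necessarily pair a header with its own mapping. The plain-Python sketch below reproduces the effect on a small example and shows one possible alternative: grouping each mapped value under the chunk it came from. The `(source, value)` pairs are hypothetical, and whether the mapper output exposes the source chunk for each value is an assumption to check against the annotation metadata.

```python
from collections import defaultdict
from itertools import zip_longest

chunks = ["ADMISSION DIAGNOSIS", "DISCHARGE DIAGNOSIS"]

# Assumed mapper output: each value tagged with the chunk it came from (hypothetical pairing).
mappings = [
    ("ADMISSION DIAGNOSIS", "DIAGNOSIS"),
    ("ADMISSION DIAGNOSIS", "ADMISSION DIAGNOSIS"),
    ("DISCHARGE DIAGNOSIS", "DIAGNOSIS"),
    ("DISCHARGE DIAGNOSIS", "DISCHARGE DIAGNOSIS"),
]

# Positional pairing, as arrays_zip does: the shorter list is padded with None.
positional = list(zip_longest(chunks, [value for _, value in mappings]))

# Pairing by source chunk keeps every value attached to the header it belongs to.
by_chunk = defaultdict(list)
for source, value in mappings:
    by_chunk[source].append(value)
by_chunk = dict(by_chunk)

for row in positional:
    print(row)
print(by_chunk)
```

The second positional row pairs `DISCHARGE DIAGNOSIS` with `ADMISSION DIAGNOSIS`, the same pattern seen in the second row of the table, while the grouped version keeps both values for each header together.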