![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare/NER_BERT_TOKEN_CLASSIFIER.ipynb)

To run this notebook yourself, you need to upload your license keys. Open the file explorer on the left side of the screen and upload your license file (`license_keys.json`) to the folder that opens, then run the cell below. Note that the cell reads `/content/spark_nlp_for_healthcare.json`, so either give the uploaded file that name or adjust the path in the cell. Otherwise, you can look at the example outputs at the bottom of the notebook.
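
Before running the setup cells, it can help to confirm that the license file contains every key the notebook reads later. The following is a minimal sketch, assuming the file is a flat JSON object. The helper name `missing_license_keys` is hypothetical, the key list is collected from the cells below, and the sample values are placeholders.

```python
import json

# Keys that the setup cells in this notebook read from the license file.
REQUIRED_KEYS = [
    "SPARK_NLP_LICENSE",
    "SECRET",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "PUBLIC_VERSION",
    "JSL_VERSION",
]

def missing_license_keys(license_keys):
    """Return the required keys that are absent or empty (hypothetical helper)."""
    return [k for k in REQUIRED_KEYS if not license_keys.get(k)]

# Illustrative stand-in for the uploaded file's contents; values are placeholders.
sample = json.loads('{"SECRET": "xxx", "PUBLIC_VERSION": "3.2.3"}')
print(missing_license_keys(sample))
```

An empty list means all expected keys are present; anything else names the entries to add before continuing.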

## Colab Setup

In [None]:
%%capture
import os
import json

with open(r'/content/spark_nlp_for_healthcare.json') as f:
    license_keys = json.load(f)
    
locals().update(license_keys)

for k,v in license_keys.items(): 
    %set_env $k=$v


In [None]:
license_keys['PUBLIC_VERSION']

'3.2.3'

In [None]:
%%capture

!wget https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/jsl_colab_setup.sh
!bash jsl_colab_setup.sh

In [None]:
import os 

os.environ["SPARK_NLP_LICENSE"] = SPARK_NLP_LICENSE    
os.environ["SECRET"] = SECRET
os.environ["AWS_ACCESS_KEY_ID"] = AWS_ACCESS_KEY_ID     
os.environ["AWS_SECRET_ACCESS_KEY"] = AWS_SECRET_ACCESS_KEY

In [None]:
%%capture
! pip install --upgrade spark-nlp==$PUBLIC_VERSION findspark
! pip install --upgrade spark-nlp-jsl==$JSL_VERSION  --extra-index-url https://pypi.johnsnowlabs.com/$SECRET

In [None]:
!pip install --ignore-installed spark-nlp-display

Collecting spark-nlp-display
  Downloading spark_nlp_display-1.8-py3-none-any.whl (95 kB)
Collecting svgwrite==1.4
  Downloading svgwrite-1.4-py3-none-any.whl (66 kB)
Collecting ipython
  Downloading ipython-7.28.0-py3-none-any.whl (788 kB)
Collecting numpy
  Downloading numpy-1.21.2-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (15.7 MB)
Collecting pandas
  Downloading pandas-1.3.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.3 MB)
Collecting spark-nlp
  Using cached spark_nlp-3.2.3-py2.py3-none-any.whl (118 kB)
Collecting traitlets>=4.2
  Downloading traitlets-5.1.0-py3-none-any.whl (101 kB)

In [None]:
# Install java
! sudo apt-get install -y openjdk-8-jdk-headless -qq > /dev/null
! java -version


openjdk version "1.8.0_292"
OpenJDK Runtime Environment (build 1.8.0_292-8u292-b10-0ubuntu1~18.04-b10)
OpenJDK 64-Bit Server VM (build 25.292-b10, mixed mode)


In [None]:
os.environ['JAVA_HOME'] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ['PATH'] = os.environ['JAVA_HOME'] + "/bin:" + os.environ['PATH']

In [None]:
import sparknlp

print (f"sparknlp version: {sparknlp.version()}")
import pandas as pd
import json
import os
from pyspark.ml import Pipeline
from pyspark.sql import SparkSession

from sparknlp.annotator import *

from sparknlp.base import *
import pyspark.sql.functions as F

# import internal package
import sparknlp_jsl
from sparknlp_jsl.annotator import *

# Report the licensed package's own version (the original printed sparknlp.version() here)
print (f"sparknlp-jsl version: {sparknlp_jsl.version()}")

spark = sparknlp_jsl.start(license_keys['SECRET'])

sparknlp version: 3.2.3
sparknlp-jsl version: 3.2.3


## Select the models

Select the NER model. Available models: **'bert_token_classifier_ner_jsl', 'bert_token_classifier_ner_drugs', 'bert_token_classifier_ner_deid'**

For more details: https://github.com/JohnSnowLabs/spark-nlp-models#pretrained-models---spark-nlp-for-healthcare

In [None]:
## Generating Example Files ##
text_list_jsl = [
             """The patient is a 30-year-old female with a long history of insulin dependent diabetes, type 2; coronary artery disease; chronic renal insufficiency; peripheral vascular disease, also secondary to diabetes; who was originally admitted to an outside hospital for what appeared to be acute paraplegia, lower extremities. She did receive a course of Bactrim for 14 days for UTI. Evidently, at some point in time, the patient was noted to develop a pressure-type wound on the sole of her left foot and left great toe. She was also noted to have a large sacral wound; this is in a similar location with her previous laminectomy, and this continues to receive daily care. The patient was transferred secondary to inability to participate in full physical and occupational therapy and continue medical management of her diabetes, the sacral decubitus, left foot pressure wound, and associated complications of diabetes. She is given Fragmin 5000 units subcutaneously daily, Xenaderm to wounds topically b.i.d., Lantus 40 units subcutaneously at bedtime, OxyContin 30 mg p.o. q.12 h., folic acid 1 mg daily, levothyroxine 0.1 mg p.o. daily, Prevacid 30 mg daily, Avandia 4 mg daily, Norvasc 10 mg daily, Lexapro 20 mg daily, aspirin 81 mg daily, Senna 2 tablets p.o. q.a.m., Neurontin 400 mg p.o. t.i.d., Percocet 5/325 mg 2 tablets q.4 h. p.r.n., magnesium citrate 1 bottle p.o. p.r.n., sliding scale coverage insulin, Wellbutrin 100 mg p.o. daily, and Bactrim DS b.i.d.""",
             """The patient is a 40-year-old white male who presents with a chief complaint of "chest pain". The patient is diabetic and has a prior history of coronary artery disease. The patient presents today stating that his chest pain started yesterday evening and has been somewhat intermittent. He has been advised Aspirin 81 milligrams QDay. Humulin N. insulin 50 units in a.m. HCTZ 50 mg QDay. Nitroglycerin 1/150 sublingually PRN chest pain.""",
             """Mr. ABC is a 60-year-old gentleman who had a markedly abnormal stress test earlier today in my office with severe chest pain after 5 minutes of exercise on the standard Bruce with horizontal ST depressions and moderate apical ischemia on stress imaging only. He required 3 sublingual nitroglycerin in total (please see also admission history and physical for full details). 

The patient underwent cardiac catheterization with myself today which showed mild-to-moderate left main distal disease of 30%, moderate proximal LAD with a severe mid-LAD lesion of 99%, and a mid-left circumflex lesion of 80% with normal LV function and some mild luminal irregularities in the right coronary artery with some moderate stenosis seen in the mid to distal right PDA.

I discussed these results with the patient, and he had been relating to me that he was having rest anginal symptoms, as well as nocturnal anginal symptoms, and especially given the severity of the mid left anterior descending lesion, with a markedly abnormal stress test, I felt he was best suited for transfer for PCI. I discussed the case with Dr. X at Medical Center who has kindly accepted the patient in transfer.

CONDITION ON TRANSFER: Stable but guarded. The patient is pain-free at this time.

MEDICATIONS ON TRANSFER:
1. Aspirin 325 mg once a day.
2. Metoprolol 50 mg once a day, but we have had to hold it because of relative bradycardia which he apparently has a history of.
3. Nexium 40 mg once a day.
4. Zocor 40 mg once a day, and there is a fasting lipid profile pending at the time of this dictation. I see that his LDL was 136 on May 3, 2002.
5. Plavix 600 mg p.o. x1 which I am giving him tonight.""",
            """HISTORY OF PRESENT ILLNESS: The patient is a 68-year-old Korean gentleman with a history of coronary artery disease, hypertension, diabetes and stage III CKD with a creatinine of 1.8 in May 2006 corresponding with the GFR of 40-41 mL/min. The patient had blood work done at Dr. XYZ's office on June 01, 2006, which revealed an elevation in his creatinine up to 2.3. He was asked to come in to see a nephrologist for further evaluation. I am therefore asked by Dr. XYZ to see this patient in consultation for evaluation of acute on chronic kidney failure. The patient states that he was actually taking up to 12 to 13 pills of Chinese herbs and dietary supplements for the past year. He only stopped about two or three weeks ago. He also states that TriCor was added about one or two months ago but he is not sure of the date. He has not had an ultrasound but has been diagnosed with prostatic hypertrophy by his primary care doctor and placed on Flomax. He states that his urinary dribbling and weak stream had not improved since doing this. For the past couple of weeks, he has had dizziness in the morning. This is then associated with low glucose. However the patient's blood glucose this morning was 123 and he still was dizzy. This was worse on standing. He states that he has been checking his blood pressure regularly at home because he has felt so bad and that he has gotten under 100/60 on several occasions. His pulses remained in the 60s.

ALLERGIES: None.

MEDICATIONS: Imdur 20 mg two to three times daily, nitroglycerin p.r.n., insulin 70/30 40/45 units daily, Zetia 10 mg daily, ? Triglide 50 mg daily, Prevacid 30 mg daily, Plavix 75 mg daily, potassium 10 mEq daily, Lasix 60 mg daily, folate 1 mg b.i.d., Niaspan 500 mg daily, atenolol 50 mg daily, enalapril 10 mg b.i.d., glyburide 10 mg b.i.d., Xanax 0.25 mg b.i.d., aspirin 325 mg daily, Tylenol p.r.n., Zantac 150 mg b.i.d., Crestor 5 mg daily, TriCor 145 mg daily, Digitek 0.125 mg daily, Celexa 20 mg daily, and Flomax 0.4 mg daily.""",
            """HPI: A 69-year-old white female with a history of metastatic breast cancer, depression, anxiety, recent UTI, and obstructive uropathy, admitted to the ABCD Hospital on February 6, 2007, for lightheadedness, weakness, and shortness of breath. The patient was consulted by Psychiatry for anxiety. I know this patient from a previous consult. During this recent admission, the patient has experienced anxiety and had a panic attack yesterday with "syncopal episodes." She was given Ativan 0.25 mg on a p.r.n. basis with relief after one to two hours. The patient was seen by Abc, MD, and Def, Ph.D. The laboratories were reviewed and were positive for UTI, and anemia is also present. The TSH level was within normal limits. She previously responded well to trazodone for depression, poor appetite, and decreased sleep and anxiety. A low dose of Klonopin was also helpful for sedation.

PAST MEDICAL HISTORY: Metastatic breast cancer to bone. The patient also has a history of hypertension, hypothyroidism, recurrent UTI secondary to obstruction of left ureteropelvic junction, cholelithiasis, chronic renal insufficiency, Port-A-Cath placement, and hydronephrosis.

PAST PSYCHIATRIC HISTORY: The patient has a history of depression and anxiety. She was taking Remeron 15 mg q.h.s., Ambien 5 mg q.h.s. on a p.r.n. basis, Ativan 0.25 mg every 6 hours on a p.r.n. basis, and Klonopin 0.25 mg at night while she was at home.""",

"""The patient is a 21-day-old Caucasian male here for 2 days of congestion - mom has been suctioning yellow discharge from the patient's nares, plus she has noticed some mild problems with his breathing while feeding (but negative for any perioral cyanosis or retractions). One day ago, mom also noticed a tactile temperature and gave the patient Tylenol. Baby also has had some decreased p.o. intake. His normal breast-feeding is down from 20 minutes q.2h. to 5 to 10 minutes secondary to his respiratory congestion. He sleeps well, but has been more tired and has been fussy over the past 2 days. The parents noticed no improvement with albuterol treatments given in the ER. His urine output has also decreased; normally he has 8 to 10 wet and 5 dirty diapers per 24 hours, now he has down to 4 wet diapers per 24 hours. Mom denies any diarrhea. His bowel movements are yellow colored and soft in nature."""
]

In [None]:
## Generating Example Files ##
text_list_deid = [
    """HISTORY OF PRESENT ILLNESS: Mr. Smith is a 60-year-old white male veteran with multiple comorbidities, who has a history of bladder cancer diagnosed approximately two years ago by the VA Hospital. He underwent a resection there. He was to be admitted to the Day Hospital for cystectomy. He was seen in Urology Clinic and Radiology Clinic on 02/04/2003.

HOSPITAL COURSE: Mr. Smith presented to the Day Hospital in anticipation for Urology surgery. On evaluation, EKG, echocardiogram was abnormal, a Cardiology consult was obtained. A cardiac adenosine stress MRI was then proceeded, same was positive for inducible ischemia, mild-to-moderate inferolateral subendocardial infarction with peri-infarct ischemia. In addition, inducible ischemia seen in the inferior lateral septum. Mr. Smith underwent a left heart catheterization, which revealed two vessel coronary artery disease. The RCA, proximal was 95% stenosed and the distal 80% stenosed. The mid LAD was 85% stenosed and the distal LAD was 85% stenosed. There was four Multi-Link Vision bare metal stents placed to decrease all four lesions to 0%. Following intervention, Mr. Smith was admitted to 7 Ardmore Tower under Cardiology Service under the direction of Dr. Hart. Mr. Smith had a noncomplicated post-intervention hospital course. He was stable for discharge home on 02/07/2003 with instructions to take Plavix daily for one month and Urology is aware of the same.""",

    """HISTORY OF PRESENT ILLNESS: The patient is a 39-year-old woman with polymyositis/dermatomyositis on methotrexate once a week. The patient has also been on high-dose prednisone for an urticarial rash. The patient was admitted because of persistent high fevers without a clear-cut source of infection. She had been having temperatures of up to 103 for 8-10 days. She had been seen at Alta View Emergency Department a week prior to admission. A workup there including chest x-ray, blood cultures, and a transthoracic echocardiogram had all remained nondiagnostic, and were normal. Her chest x-ray on that occasion was normal. After the patient was seen in the office on August 10, she persisted with high fevers and was admitted on August 11 to Cottonwood Hospital. Studies done at Cottonwood: CT scan of the chest, abdomen, and pelvis. Results: CT chest showed mild bibasilar pleural-based interstitial changes. These were localized to mid and lower lung zones. The process was not diffuse. There was no ground glass change. CT abdomen and pelvis was normal. Infectious disease consultation was obtained. Dr. XYZ saw the patient. He ordered serologies for CMV including a CMV blood PCR. Next serologies for EBV, Legionella, Chlamydia, Mycoplasma, Coccidioides, and cryptococcal antigen, and a PPD. The CMV serology came back positive for IgM. The IgG was negative. The CMV blood PCR was positive, as well. Other serologies and her PPD stayed negative. Blood cultures stayed negative.

In view of the positive CMV, PCR, and the changes in her CAT scan, the patient was taken for a bronchoscopy. BAL and transbronchial biopsies were performed. The transbronchial biopsies did not show any evidence of pneumocystis, fungal infection, AFB. There was some nonspecific interstitial fibrosis, which was minimal. I spoke with the pathologist, Dr. XYZ and immunopathology was done to look for CMV. The patient had 3 nucleoli on the biopsy specimens that stained positive and were consistent with CMV infection. The patient was started on ganciclovir once her CMV serologies had come back positive. No other antibiotic therapy was prescribed. Next, the patient's methotrexate was held.

A chest x-ray prior to discharge showed some bibasilar disease, showing interstitial infiltrates. The patient was given ibuprofen and acetaminophen during her hospitalization, and her fever resolved with these measures.

On the BAL fluid cell count, the patient only had 5 WBCs and 5 RBCs on the differential. It showed 43% neutrophils, 45% lymphocytes.

Discussions were held with Dr. Grant, Dr. Roberts, her rheumatologist, and with pathology.

DISCHARGE DIAGNOSES:
1. Disseminated CMV infection with possible CMV pneumonitis.
2. Polymyositis on immunosuppressive therapy (methotrexate and prednisone).

DISCHARGE MEDICATIONS:
1. The patient is going to go on ganciclovir 275 mg IV q.12 h. for approximately 3 weeks.
2. Advair 100/50, 1 puff b.i.d.
3. Ibuprofen p.r.n. and Tylenol p.r.n. for fever, and will continue her folic acid.
4. The patient will not restart for methotrexate for now.

She is supposed to follow up with me on August 22, 2007 at 1:45 p.m. She is also supposed to see Dr. Grant in 2 weeks, and Dr. Gomez in 2-3 weeks. She also has an appointment to see an ophthalmologist in about 10 days' time. This was a prolonged discharge, more than 30 minutes were spent on discharging this patient.""",

    """This is a 48-year-old black male with stage IV chronic kidney disease, likely secondary to HIV nephropathy who presents to clinic for followup having missed prior clinic appointments. He was last seen in this clinic on 05/29/2007 by Dr. Monroe. This is the first time that I have met the patient. The patient's history of renal insufficiency dates back to 06/2006 when he was hospitalized for an HIV-associated complication. He is unclear of the exact reason for his hospitalization at that time, but he was diagnosed with renal insufficiency and was followed in our Renal Clinic for approximately one year. He had a baseline creatinine during that time of between 3.2 to 3.3. When he was initially diagnosed with renal insufficiency, he had been noncompliant with his HAART regimen. Since that time, he has been very compliant with treatment for his HIV and is seeing Dr. Jones in our Infectious Disease Clinic. He is currently on three-drug antiretroviral therapy. His last CD4 count in 03/2008 was 350. He has had no HIV complications since he was last seen in our clinic. The patient is also followed by Dr. Rogers at the outpatient VA Clinic, here in Springfield, although he has not seen her in approximately one year. The patient has an AV fistula that was placed in late 2006. The latest blood work that I have is from 06/11/2008 and shows a serum creatinine of 3.8, which represents a GFR of 22 and a potassium of 5.9. These laboratories were drawn by his infectious disease doctor and the results prompted their recommendation for him to return to our clinic for further evaluation. The only complaint that the patient has at this time is some difficulty sleeping. He was given Ambien by his primary care doctor, but this has not helped significantly with his difficulty sleeping. He says that he has trouble getting to sleep. The Ambien will allow him to sleep for about two hours, and then he is awake again. He is tired during the day, but is not taking any daytime naps. 
He has no history of excessive snoring or apneic periods. He has no history of falling asleep at work or while driving. He has never had a formal sleep study. He does continue to work in sales at a local butcher shop.""",

    """The baby is an ex-32 weeks small for gestational age infant with birth weight 5 lbs 9 oz. Baby was born at Ridgeland Hospital at 1:33pm on 07/14/2006. Mother is a 20-year-old gravida 1, para 0 female who received prenatal care. Prenatal course was complicated by low amniotic fluid index and hypertension. She was evaluated for evolving preeclampsia and had a C-section secondary to the nonreassuring fetal status. Baby delivered operatively, Apgar scores were 8 and 9 initially taken to level 2 satellite nursery and arrangements were to transfer to Children's Hospital. The infant was transferred to Children's Hospital for higher level of care, stayed at Children's Hospital for approximately 2 weeks, and was transferred back to ABCD where he stayed until he was discharged on 08/16/2006.""",

    """The patient is a 61-year-old white female status post right total knee replacement secondary to degenerative joint disease performed by Dr. Anderson at Lakeview Hospital on 08/21/2007. The patient was transfused with 2 units of autologous blood postoperatively. She received DVT prophylaxis with a combination of Coumadin, Lovenox, SCD boots, and TED stockings. The remainder of her postoperative course was uneventful. She was discharged on 08/24/2007 from Lakeview Hospital and admitted to the transitional care unit at Norman Services for evaluation and rehabilitation. The patient reports that her last bowel movement was on 08/24/2007 just prior to her discharge from Lakeview Hospital. She denies any urological symptoms such as dysuria, incomplete bladder emptying or other voiding difficulties. She reports having some right knee pain, which is most intense at a "certain position." The patient is unable to elaborate on which "certain position" causes her the most discomfort.""",
]


In [None]:
text_list_drug = [
    "The human KCNJ9 (Kir 3.3, GIRK3) is a member of the G-protein-activated inwardly rectifying potassium (GIRK) channel family. Here we describe the genomicorganization of the KCNJ9 locus on chromosome 1q21-23 as a candidate gene forType II diabetes mellitus in the Pima Indian population. The gene spansapproximately 7.6 kb and contains one noncoding and two coding exons separated byapproximately 2.2 and approximately 2.6 kb introns, respectively. We identified14 single nucleotide polymorphisms (SNPs), including one that predicts aVal366Ala substitution, and an 8 base-pair (bp) insertion/deletion. Ourexpression studies revealed the presence of the transcript in various humantissues including pancreas, and two major insulin-responsive tissues: fat andskeletal muscle. The characterization of the KCNJ9 gene should facilitate furtherstudies on the function of the KCNJ9 protein and allow evaluation of thepotential role of the locus in Type II diabetes.BACKGROUND: At present, it is one of the most important issues for the treatment of breast cancer to develop the standard therapy for patients previously treated with anthracyclines and taxanes. With the objective of determining the usefulnessof vinorelbine monotherapy in patients with advanced or recurrent breast cancerafter standard therapy, we evaluated the efficacy and safety of vinorelbine inpatients previously treated with anthracyclines and taxanes.",
    "The doctor prescribed aspirin 2 meq/ ml oral solution for the patient’s fever and headache.",
    "The usual dose of paracetamol is 1 or 2 500 mg tablet at a time, with a maximum recommended dose of 4 g paracetamol per day.",
    "He was put on Adalimumab 97.70 mg because the treatment with mesalazine was not effective and his condition was worse since last admission.",
    "The dose of oral rehydration salts for this pediatric patient was 0.5 oral solution given the number and texture of the stools."
]

In [None]:
model_list = ['bert_token_classifier_ner_jsl','bert_token_classifier_ner_drugs','bert_token_classifier_ner_deid']
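
Each model has its own list of example texts, and the cells that follow pick the list with an if/elif chain. One alternative is a lookup table keyed by model name. The sketch below uses short placeholder lists in place of `text_list_jsl`, `text_list_drug`, and `text_list_deid`; the names `examples_by_model`, `examples_for`, and the `demo_*` lists are hypothetical.

```python
# Placeholder texts; in the notebook these would be the full example lists.
demo_jsl = ["jsl example"]
demo_drug = ["drug example 1", "drug example 2"]
demo_deid = ["deid example"]

# Hypothetical lookup table: model name -> its example texts.
examples_by_model = {
    "bert_token_classifier_ner_jsl": demo_jsl,
    "bert_token_classifier_ner_drugs": demo_drug,
    "bert_token_classifier_ner_deid": demo_deid,
}

def examples_for(model_name):
    """Return the example texts for a model, failing loudly on unknown names."""
    try:
        return examples_by_model[model_name]
    except KeyError:
        raise ValueError(f"No example texts registered for {model_name!r}")

for name in examples_by_model:
    print(name, len(examples_for(name)))
```

With a table like this, adding a fourth model means adding one dictionary entry rather than another branch in every cell.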


In [None]:
# Creating input folders

for MODEL_NAME in model_list:
  INPUT_FILE_PATH='/content/NER_BERT_TOKEN_CLASSIFIER_/inputs/'+MODEL_NAME+'/'

  # Recreate the folder; on a first run rm reports that it does not exist yet
  !rm -r $INPUT_FILE_PATH
  !mkdir -p $INPUT_FILE_PATH

  # Pick the example texts that belong to this model
  if MODEL_NAME == 'bert_token_classifier_ner_drugs':
    examples = text_list_drug
  elif MODEL_NAME == 'bert_token_classifier_ner_jsl':
    examples = text_list_jsl
  elif MODEL_NAME == 'bert_token_classifier_ner_deid':
    examples = text_list_deid

  # Each file starts with a one-line preview, followed by the full text
  for i, v in enumerate(examples):
    with open(os.path.join(INPUT_FILE_PATH,'Example'+str(i+1)+'.txt'), 'w', encoding="utf8") as f:
      f.write(v[:min(len(v)-10, 100)]+'... \n'+v)


rm: cannot remove '/content/NER_BERT_TOKEN_CLASSIFIER_/inputs/bert_token_classifier_ner_jsl/': No such file or directory
rm: cannot remove '/content/NER_BERT_TOKEN_CLASSIFIER_/inputs/bert_token_classifier_ner_drugs/': No such file or directory
rm: cannot remove '/content/NER_BERT_TOKEN_CLASSIFIER_/inputs/bert_token_classifier_ner_deid/': No such file or directory
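
The input-file cell above writes a short preview on the first line of each file and the full text after it, and a later cell reads the text back by dropping that first line. The round trip can be sketched as follows. The helper names `write_example` and `read_example` are hypothetical, and replacing newlines in the preview is an addition that is not in the original cell; it keeps the preview on a single line even when the first 100 characters of a text contain a line break.

```python
import os
import tempfile

def write_example(path, text, preview_len=100):
    """Write a one-line preview followed by the full text."""
    # Same slice as the notebook cell; newlines are replaced so the preview
    # always occupies exactly one line (an addition, not in the original).
    preview = text[:min(len(text) - 10, preview_len)].replace("\n", " ")
    with open(path, "w", encoding="utf8") as f:
        f.write(preview + "... \n" + text)

def read_example(path):
    """Read the file back and drop the preview line."""
    with open(path, "r", encoding="utf8") as f:
        return "".join(f.readlines()[1:])

text = "First line of a note.\nSecond line with more detail about the patient."
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Example1.txt")
    write_example(path, text)
    restored = read_example(path)

print(restored == text)
```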


In [None]:
# Creating output folders

for MODEL_NAME in model_list:
  OUTPUT_FILE_PATH='/content/NER_BERT_TOKEN_CLASSIFIER_/outputs/'+MODEL_NAME+'/'

  # Recreate the folder; on a first run rm reports that it does not exist yet
  !rm -r $OUTPUT_FILE_PATH
  !mkdir -p $OUTPUT_FILE_PATH

rm: cannot remove '/content/NER_BERT_TOKEN_CLASSIFIER_/outputs/bert_token_classifier_ner_jsl/': No such file or directory
rm: cannot remove '/content/NER_BERT_TOKEN_CLASSIFIER_/outputs/bert_token_classifier_ner_drugs/': No such file or directory
rm: cannot remove '/content/NER_BERT_TOKEN_CLASSIFIER_/outputs/bert_token_classifier_ner_deid/': No such file or directory


### Creating JSON files and visualizing the results of all the models

In [None]:
from sparknlp_display import NerVisualizer
for MODEL_NAME in model_list:
    INPUT_FILE_PATH='/content/NER_BERT_TOKEN_CLASSIFIER_/inputs/'+MODEL_NAME+'/'
    OUTPUT_FILE_PATH='/content/NER_BERT_TOKEN_CLASSIFIER_/outputs/'+MODEL_NAME+'/'
    text_list = []
    file_list=sorted(os.listdir(INPUT_FILE_PATH))
    file_paths =sorted([ os.path.join(INPUT_FILE_PATH, pth) for pth in file_list]) 
    for fpath in file_paths:
        # Skip the first line, which is only a preview of the text
        with open(fpath, 'r', encoding="utf8") as f:
            txt = ''.join(f.readlines()[1:])
        text_list.append(txt)


    documentAssembler = DocumentAssembler()\
      .setInputCol("text")\
      .setOutputCol("document")

    sentenceDetector = SentenceDetectorDLModel.pretrained() \
          .setInputCols(["document"]) \
          .setOutputCol("sentence") 

    tokenizer = Tokenizer()\
      .setInputCols("sentence")\
      .setOutputCol("token")
      
    tokenClassifier = BertForTokenClassification.pretrained( MODEL_NAME, "en", 'clinical/models')\
      .setInputCols("sentence","token")\
      .setOutputCol("ner")\
      .setCaseSensitive(True)

    ner_converter = NerConverter()\
      .setInputCols(["sentence","token","ner"])\
      .setOutputCol("ner_chunk")

    pipeline =  Pipeline(stages=[documentAssembler,sentenceDetector, tokenizer, tokenClassifier, ner_converter])
    pipelineModel = pipeline.fit(spark.createDataFrame([['']]).toDF("text"))
    
    # text_list holds this model's example texts, loaded from INPUT_FILE_PATH above,
    # in the same order as file_list (which names the JSON outputs below)
    df = spark.createDataFrame(pd.DataFrame({"text": text_list}))

    result = pipelineModel.transform(df)
    result.select(F.explode(F.arrays_zip('ner_chunk.result', 'ner_chunk.metadata')).alias("cols")) \
          .select(F.expr("cols['0']").alias("chunk"),
                  F.expr("cols['1']['entity']").alias("ner_label"))\
          .show(truncate=False)

    NerVisualizer().display(
        result = result.collect()[0],
        label_col = 'ner_chunk',
        document_col = 'document'
    )
    result = result.toPandas()
 
    for i in result.index:
        result[['ner_chunk']].iloc[i].to_json(
            os.path.join(OUTPUT_FILE_PATH, file_list[i].split('.')[0]+".json"))

sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[OK!]
bert_token_classifier_ner_jsl download started this may take some time.
Approximate size to download 385.8 MB
[OK!]
+---------------------------+----------------------------+
|chunk                      |ner_label                   |
+---------------------------+----------------------------+
|30-year-old                |Age                         |
|female                     |Gender                      |
|insulin dependent          |Diabetes                    |
|diabetes                   |Diabetes                    |
|type 2                     |Diabetes                    |
|coronary artery disease    |Heart_Disease               |
|chronic renal insufficiency|Kidney_Disease              |
|peripheral vascular disease|Disease_Syndrome_Disorder   |
|diabetes                   |Diabetes                    |
|admitted                   |Admission_Discharge         |
|hospital 

sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[OK!]
bert_token_classifier_ner_drugs download started this may take some time.
Approximate size to download 385.3 MB
[OK!]
+-----------------+---------+
|chunk            |ner_label|
+-----------------+---------+
|potassium        |DrugChem |
|nucleotide       |DrugChem |
|anthracyclines   |DrugChem |
|taxanes          |DrugChem |
|vinorelbine      |DrugChem |
|vinorelbine      |DrugChem |
|anthracyclines   |DrugChem |
|taxanes          |DrugChem |
|aspirin          |DrugChem |
|paracetamol      |DrugChem |
|paracetamol      |DrugChem |
|mesalazine       |DrugChem |
|rehydration salts|DrugChem |
+-----------------+---------+



sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[OK!]
bert_token_classifier_ner_deid download started this may take some time.
Approximate size to download 385.4 MB
[OK!]
+-------------------+---------+
|chunk              |ner_label|
+-------------------+---------+
|Smith              |PATIENT  |
|60-year-old        |AGE      |
|VA Hospital        |HOSPITAL |
|Day Hospital       |HOSPITAL |
|02/04/2003         |DATE     |
|Smith              |PATIENT  |
|Day Hospital       |HOSPITAL |
|Smith              |PATIENT  |
|Smith              |PATIENT  |
|7 Ardmore Tower    |STREET   |
|Hart               |DOCTOR   |
|Smith              |PATIENT  |
|02/07/2003         |DATE     |
|Alta View          |HOSPITAL |
|August 10          |DATE     |
|August 11          |DATE     |
|Cottonwood Hospital|HOSPITAL |
|Cottonwood         |HOSPITAL |
|XYZ                |DOCTOR   |
|XYZ                |DOCTOR   |
+-------------------+---------+
only sho
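
To summarize a table like the ones above, the (chunk, label) pairs can be tallied per label. The sketch below does this in plain Python with the 20 rows shown for `bert_token_classifier_ner_deid`, copied by hand. On the Spark side, a `groupBy` on the label column followed by `count()` would presumably give the same totals without collecting the rows.

```python
from collections import Counter

# (chunk, ner_label) pairs copied from the bert_token_classifier_ner_deid table above.
rows = [
    ("Smith", "PATIENT"), ("60-year-old", "AGE"), ("VA Hospital", "HOSPITAL"),
    ("Day Hospital", "HOSPITAL"), ("02/04/2003", "DATE"), ("Smith", "PATIENT"),
    ("Day Hospital", "HOSPITAL"), ("Smith", "PATIENT"), ("Smith", "PATIENT"),
    ("7 Ardmore Tower", "STREET"), ("Hart", "DOCTOR"), ("Smith", "PATIENT"),
    ("02/07/2003", "DATE"), ("Alta View", "HOSPITAL"), ("August 10", "DATE"),
    ("August 11", "DATE"), ("Cottonwood Hospital", "HOSPITAL"),
    ("Cottonwood", "HOSPITAL"), ("XYZ", "DOCTOR"), ("XYZ", "DOCTOR"),
]

# Count how many chunks were assigned to each label.
label_counts = Counter(label for _, label in rows)

for label, count in label_counts.most_common():
    print(label, count)
```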

## Running the Streamlit App

In [None]:
%cd /content/
get_ipython().system_raw('./ngrok http 8501 &')
! curl -s http://localhost:4040/api/tunnels | python3 -c \
    "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"


/content
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.6/json/__init__.py", line 299, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
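
The `JSONDecodeError` above typically means that `curl` returned an empty body, which happens when nothing is listening on port 4040 yet, for example because ngrok has not finished starting or is not present in `/content`. A more tolerant approach is to poll the local API and only read the URL once a tunnel is reported. The sketch below assumes the same endpoint and JSON shape as the cell above; the helper names `parse_public_url` and `wait_for_public_url` and the sample URL are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request

NGROK_API = "http://localhost:4040/api/tunnels"  # same URL as the curl call above

def parse_public_url(payload):
    """Return the first tunnel's public_url, or None if the payload is unusable."""
    try:
        return json.loads(payload)["tunnels"][0]["public_url"]
    except (ValueError, KeyError, IndexError, TypeError):
        # Empty body, invalid JSON, or no tunnel registered yet.
        return None

def wait_for_public_url(retries=10, delay=1.0):
    """Poll the local ngrok API until a tunnel is reported (hypothetical helper)."""
    for _ in range(retries):
        try:
            with urllib.request.urlopen(NGROK_API, timeout=2) as resp:
                url = parse_public_url(resp.read().decode("utf8"))
        except (urllib.error.URLError, OSError):
            url = None  # nothing is listening on port 4040 yet
        if url:
            return url
        time.sleep(delay)
    return None

# The parser on an empty body and on a made-up payload of the expected shape.
print(parse_public_url(""))
print(parse_public_url('{"tunnels": [{"public_url": "https://example.ngrok.io"}]}'))
```

In the notebook, `wait_for_public_url()` would be called after the cell that starts ngrok; a `None` result after all retries suggests that ngrok itself did not start.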


In [None]:
%cd /content/streamlit-demo-apps/healthcare/NER_POSOLOGY
! streamlit run streamlit_app.py

In [None]:
!ps aux | grep "streamli"

In [None]:
# 2389 is the PID from an earlier run; replace it with the PID reported by the ps command above
!kill -9 2389
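
Because the PID changes on every run, it can be extracted from the `ps aux` listing instead of being typed in by hand. The sketch below parses such a listing. The helper name `pids_matching` is hypothetical and the sample listing is made up for illustration; real output may contain additional matching lines, such as the shell that runs the pipeline.

```python
def pids_matching(ps_output, pattern):
    """Return PIDs from `ps aux` text whose command contains `pattern` (hypothetical helper)."""
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        # Columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        command = fields[10]
        # Skip the grep process itself, which also mentions the pattern.
        if pattern in command and not command.startswith("grep"):
            pids.append(int(fields[1]))
    return pids

# Made-up `ps aux` output for illustration only.
sample = """USER  PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root  2389  1.0  2.0 100 200 ?   Sl   10:00 0:05 /usr/bin/python3 /usr/local/bin/streamlit run streamlit_app.py
root  2412  0.0  0.0 10  20  ?   S    10:01 0:00 grep streamli
"""
print(pids_matching(sample, "streamli"))
```

In the notebook, the listing could come from `subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout`, and each returned PID could then be passed to `kill`.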