
# The use of [pre-trained] Language Models in NLP of clinical records

This notebook gives a brief showcase of pre-trained language models, specifically BERT, from the general-purpose and biomedical domains, applied to the downstream task of Named Entity Recognition (NER) for information extraction.

In [24]:
from IPython.display import HTML, display

def set_css():
  display(HTML('''
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  '''))
get_ipython().events.register('pre_run_cell', set_css)


In [None]:
from google.colab import files
uploaded = files.upload()

Saving ClinNotes.csv to ClinNotes (1).csv


In [26]:
import io
import random
import pandas as pd
df = pd.read_csv(io.BytesIO(uploaded['ClinNotes.csv']))

def sample_record():
  '''
  Randomly sample one record from the DataFrame, print its category
  and return the text of its note.
  '''

  index = random.randrange(len(df))  # random row position
  print('Sample record from: ' + df['category'].iloc[index])
  return df['notes'].iloc[index]

**BioBERT** model fine-tuned for the Named Entity Recognition task on the BC5CDR-diseases and NCBI-diseases corpora, along with selected PubTator annotations from the LitCOVID dataset



In [None]:
%%capture
!pip install transformers

In [None]:
from transformers import AutoTokenizer, pipeline, AutoModelForTokenClassification
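
# Two candidate BioBERT checkpoints from the Hugging Face Hub; only `ncbi` is loaded below
BC5 = "datummd/NCBI_BC5CDR_disease"
ncbi = 'fidukm34/biobert_v1.1_pubmed-finetuned-ner-finetuned-ner'

tokenizer = AutoTokenizer.from_pretrained(ncbi)
model = AutoModelForTokenClassification.from_pretrained(ncbi)

# Token-classification pipeline: returns one prediction per (sub)word token
nerpipeline = pipeline('ner', model=model, tokenizer=tokenizer)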
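
Without aggregation, the `ner` pipeline returns one prediction per word piece, so a multi-word disease mention comes back as several small fragments. The following is a minimal sketch, not part of the original notebook, of merging those fragments into the `start`/`end`/`label` spans that displaCy renders. The helper name `merge_token_predictions`, the `Disease` label and the toy predictions are assumptions for illustration; the label names of the actual checkpoint may differ.

```python
def merge_token_predictions(tokens):
    """Merge token-level NER predictions into displaCy-style entity spans.

    `tokens` is a list of dicts with 'entity', 'word', 'start' and 'end'
    keys, the shape returned by the `ner` pipeline without aggregation.
    """
    spans = []
    for tok in tokens:
        entity = tok['entity']
        # Drop a leading B-/I- prefix if the model uses a BIO tagging scheme
        label = entity[2:] if entity[:2] in ('B-', 'I-') else entity
        continues = bool(
            spans
            and spans[-1]['label'] == label
            and tok['start'] - spans[-1]['end'] <= 1  # adjacent or one space apart
            and (tok['word'].startswith('##') or not entity.startswith('B-'))
        )
        if continues:
            spans[-1]['end'] = tok['end']  # extend the current span
        else:
            spans.append({'start': tok['start'], 'end': tok['end'], 'label': label})
    return spans


# Hand-written toy predictions (hypothetical labels, not real model output)
text = 'severe pulmonary hypertension and diabetes'
tokens = [
    {'entity': 'B-Disease', 'word': 'pulmonary', 'start': 7, 'end': 16},
    {'entity': 'I-Disease', 'word': 'hyper', 'start': 17, 'end': 22},
    {'entity': 'I-Disease', 'word': '##tension', 'start': 22, 'end': 29},
    {'entity': 'B-Disease', 'word': 'diabetes', 'start': 34, 'end': 42},
]
spans = merge_token_predictions(tokens)
print([text[s['start']:s['end']] for s in spans])
```

The transformers pipeline can also do a similar grouping itself through its `aggregation_strategy` argument (for example `'simple'`); in that case the label is returned under the `entity_group` key rather than `entity`.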

In [58]:
%%capture
!pip uninstall spacy -y
!pip install scispacy

In [None]:
%%capture 
!python -m spacy download en_core_web_lg

In [64]:
import spacy
from spacy import displacy

nlp = spacy.load('en_core_web_lg')


# text = '''HISTORY OF PRESENT ILLNESS: , I was kindly asked to see Ms. ABC by Dr. X for cardiology consultation regarding preoperative evaluation for right hip surgery.  She is a patient with a history of coronary artery disease status post bypass surgery in 1971 who tripped over her oxygen last p.m. she states and fell.  She suffered a right hip fracture and is being considered for right hip replacement.  The patient denies any recent angina, but has noted more prominent shortness of breath.,Past cardiac history is significant for coronary artery disease status post bypass surgery, she states in 1971, I believe it was single vessel.  She has had stress test done in our office on September 10, 2008, which shows evidence of a small apical infarct, no area of ischemia, and compared to study of December of 2005, there is no significant change.  She had a transthoracic echocardiogram done in our office on August 29, 2008, which showed normal left ventricular size and systolic function, dilated right ventricle with septal flattening of the left ventricle consistent with right ventricular pressure overload, left atrial enlargement, severe tricuspid regurgitation with estimated PA systolic pressure between 75-80 mmHg consistent with severe pulmonary hypertension, structurally normal aortic and mitral valve.  She also has had some presumed atrial arrhythmias that have not been sustained.  She follows with Dr. Y my partner at Cardiology Associates.,PAST MEDICAL HISTORY:  ,Other medical history includes severe COPD and she is oxygen dependent, severe pulmonary hypertension, diabetes, abdominal aortic aneurysm, hypertension, dyslipidemia.  Last ultrasound of her abdominal aorta done June 12, 2009 states that it was fusiform, infrarenal shaped aneurysm of the distal abdominal aorta measuring 3.4 cm unchanged from prior study on June 11, 2008.,MEDICATIONS:,  As an outpatient:,1.  Lanoxin 0.125 mg, 1/2 tablet once a day.,2.  Tramadol 50 mg p.o. q.i.d. as needed.,3.  
# Verapamil 240 mg once a day.,4.  Bumex 2 mg once a day.,5.  ProAir HFA.,6.  Atrovent nebs b.i.d.,7.  Pulmicort nebs b.i.d.,8.  Nasacort 55 mcg, 2 sprays daily.,9.  Quinine sulfate 325 mg p.o. q.h.s. p.r.n.,10.  Meclizine 12.5 mg p.o. t.i.d. p.r.n.,11.  Aldactone 25 mg p.o. daily.,12.  Theo-24 200 mg p.o., 2 in the morning.,13.  Zocor 40 mg once a day.,14.  Vitamin D 400 units twice daily.,15.  Levoxyl 125 mcg once a day.,16.  Trazodone 50 mg p.o. q.h.s. p.r.n.,17.  Janumet 50/500, 1 tablet p.o. b.i.d.,ALLERGIES: , To medications are listed as:,1.  LEVAQUIN.,2.  AZITHROMYCIN.,3.  ADHESIVE TAPE.,4.  BETA BLOCKERS.  When I talked to the patient about the BETA BLOCKER, she states that they made her more short of breath in the past.,She denies shrimp, seafood or dye allergy.,FAMILY HISTORY:  ,Significant for heart problems she states in her mother and father.,SOCIAL HISTORY:  ,She used to smoke cigarettes and smoked from the age of 14 to 43 and quit at the time of her bypass surgery.  She does not drink alcohol nor use illicit drugs.  She lives alone and is widowed.  She is a retired custodian at University.  Of note, she is accompanied with her verbal consent by her daughter and grandson at the bedside.,REVIEW OF SYSTEMS:  ,Unable to obtain as the patient is somnolent from her pain medication, but she is alert and able to answer my direct questions.,PHYSICAL EXAM: , Height 5'2", weight 160 pounds, temperature is 99.5 degrees ranging up to 101.6, blood pressure 137/67 to 142/75, pulse 92, respiratory rate 16, O2 saturation 93-89%.  On general exam, she is an elderly, chronically ill appearing woman in no acute distress.  She is able to lie flat, she does have pain if she moves.  HEENT shows the cranium is normocephalic, atraumatic.  She has dry mucosal membranes.  Neck veins are not distended.  There are no carotid bruits.  Visible skin is warm and she appears pale.
# Affect appropriate and she is somnolent from her pain medications, but arouses easily and answers my direct questions appropriately.  Lungs are clear to auscultation anteriorly, no wheezes.  Cardiac exam S1, S2 regular rate, soft holosystolic murmur heard over the tricuspid region.  No rub nor gallop.  PMI is nondisplaced, unable to appreciate RV heave.  Abdomen soft, mildly distended, appears benign.  Extremities with trivial peripheral edema.  Pulses grossly intact.  She has quite a bit of pain at the right hip fracture.,DIAGNOSTIC/LABORATORY DATA:  ,Sodium 135, potassium 4.7, chloride 99, bicarbonate 33, BUN 22, creatinine 1.3, glucose 149, troponin was 0.01 followed by 0.04.  Theophylline level 16.6 on January 23, 2009.  TSH 0.86 on March 10, 2009.  INR 1.06.  White blood cell count 9.5, hematocrit 35, platelet count 160.,EKG done July 16, 2009 at 7:31:15, shows sinus rhythm, which showed PR interval of about 118 milliseconds, nonspecific T wave changes.  When compared to EKG done July 15, 2009 at 1948, previously there more frequent PVCs seen.  This ECG appears similar to the ones she has had done previously in our office including on June 11, 2009, although the T wave changes are a bit more prominent, which is a nonspecific finding.,IMPRESSION: , She is an 81-year-old woman with severe O2 requiring chronic obstructive pulmonary disease with evidence of right heart overload, as well as known coronary artery disease status post single-valve bypass in 1971 suffering a right hip fracture for whom a right hip replacement is being considered.  I have had a long discussion with the patient, as well as her daughter and grandson at the bedside today.  There are no clear absolute cardiac contraindications that I can see.  Of note at the time of this dictation a chest x-ray report is pending.  With that being said, however, she is extremely high risk more from a pulmonary than cardiac standpoint.
# We did also however review that untreated hip fractures themselves have very high morbidity and mortality incidences.  The patient is deciding on surgery and is clearly aware that she is very high risk for proposed surgery, as well as if she were to not pursue surgery.,PLAN/RECOMMENDATIONS:,1.  The patient is going to decide on surgery.  If she does have the right hip surgery, I would recommend overnight observation in the intensive care unit.,2.  Optimize pulmonary function and pursue aggressive DVT prophylaxis.,3.  Continue digoxin and verapamil.  Again, the patient describes clear INTOLERANCE TO BETA BLOCKERS by her history.'''
record = sample_record()
doc = nlp(record)

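# Group sentences into chunks of under 1000 characters (presumably to stay
# within BERT's 512-token input limit), keeping every sentence
chunks = ['']
for sent in doc.sents:
  if chunks[-1] and len(chunks[-1]) + len(sent.text) >= 1000:
    chunks.append('')
  chunks[-1] = (chunks[-1] + ' ' + sent.text).strip()

for s in filter(None, chunks):
  ner = nerpipeline(s)  # run the model once per chunk
  for ent in ner:
    ent['label'] = ent['entity']  # displaCy expects a 'label' key
  displacy.render({'text':s, 'ents':ner}, style="ent", manual=True, jupyter=True)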


Sample record from: Cardiovascular / Pulmonary
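
The cell above feeds the model one group of sentences at a time rather than the whole note. Below is a minimal sketch, not part of the original notebook, that isolates this chunking step as a generator so that every sentence ends up in exactly one chunk and the last, partially filled chunk is also emitted. The name `chunk_sentences` and the `max_chars` parameter are assumptions for illustration.

```python
def chunk_sentences(sentences, max_chars=1000):
    """Yield chunks of space-joined sentences.

    Each chunk is at most `max_chars` characters long, unless a single
    sentence on its own already exceeds that limit.
    """
    chunk = ''
    for sentence in sentences:
        candidate = f'{chunk} {sentence}'.strip()
        if chunk and len(candidate) > max_chars:
            yield chunk       # current chunk is full: emit it
            chunk = sentence  # and start a new one with this sentence
        else:
            chunk = candidate
    if chunk:
        yield chunk           # emit the final, partially filled chunk


# Toy input with a tiny limit so the behaviour is easy to follow
chunks = list(chunk_sentences(['aaaa', 'bbbb', 'cccc'], max_chars=9))
print(chunks)
```

In the notebook this could be used as `for s in chunk_sentences(sent.text for sent in doc.sents):`, with the model call and the displaCy rendering inside the loop body.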


**OntoNotes English NER** in Flair

This is Flair's large 18-class NER model for English.

In [None]:
%%capture
!pip install flair

In [None]:
from flair.data import Sentence
from flair.models import SequenceTagger

# load tagger
tagger = SequenceTagger.load("flair/ner-english-ontonotes-large")

2021-09-12 21:50:14,975 loading file /root/.flair/models/ner-english-ontonotes-large/2da6c2cdd76e59113033adf670340bfd820f0301ae2e39204d67ba2dc276cc28.ec1bdb304b6c66111532c3b1fc6e522460ae73f1901848a4d0362cdf9760edb1


In [47]:
ner

In [49]:
record = sample_record()
doc = nlp(record)

# Group sentences into chunks of under 1000 characters, keeping every sentence
chunks = ['']
for sent in doc.sents:
  if chunks[-1] and len(chunks[-1]) + len(sent.text) >= 1000:
    chunks.append('')
  chunks[-1] = (chunks[-1] + ' ' + sent.text).strip()

for s in filter(None, chunks):
  # make example sentence
  sentence = Sentence(s)

  # predict NER tags
  tagger.predict(sentence)

  ents = [{'label':str(span.labels[0].value),'start':span.start_pos,'end':span.end_pos} for span in sentence.get_spans('ner')]
  displacy.render({'text':s, 'ents':ents}, style="ent", manual=True, jupyter=True)
    


Sample record from: Cardiovascular / Pulmonary


**scispaCy** (`en_ner_bionlp13cg_md`, an NER model trained on the BioNLP13CG corpus)

In [60]:
%%capture 
!pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.4.0/en_ner_bionlp13cg_md-0.4.0.tar.gz

In [59]:
# Import the scispaCy NER model package installed above
import en_ner_bionlp13cg_md

# Load the pipeline (this replaces the general-purpose `nlp` loaded earlier)
nlp = en_ner_bionlp13cg_md.load()
# doc = nlp(text)
# displacy_image = displacy.render(doc, jupyter = True, style = "ent")


In [63]:
# import scispacy
# import spacy

# nlp = spacy.load("en_ner_bionlp13cg_md")
record = sample_record()

doc = nlp(record)

displacy.render(doc, style='ent', jupyter=True)


Sample record from: Neurology
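
The three renderings above are compared by eye. As a rough complement, here is a minimal sketch, not part of the original notebook, that tallies entity labels so the output of the different models can be compared as counts. It accepts either the displaCy-style dicts built for the BioBERT and Flair cells or spaCy spans such as `doc.ents`, which expose the label as `label_`. The helper name `label_counts` and the toy entities are assumptions for illustration, not model output.

```python
from collections import Counter
from types import SimpleNamespace


def label_counts(ents):
    """Count entity labels in displaCy-style dicts or spaCy-like spans."""
    labels = []
    for ent in ents:
        if isinstance(ent, dict):
            labels.append(ent['label'])   # {'start': ..., 'end': ..., 'label': ...}
        else:
            labels.append(ent.label_)     # spaCy Span objects expose `label_`
    return Counter(labels)


# Hand-written toy entities (hypothetical, for illustration only)
dict_ents = [
    {'start': 0, 'end': 4, 'label': 'DATE'},
    {'start': 10, 'end': 12, 'label': 'CARDINAL'},
    {'start': 20, 'end': 24, 'label': 'DATE'},
]
span_like_ents = [SimpleNamespace(label_='ORGAN'), SimpleNamespace(label_='ORGAN')]

print(label_counts(dict_ents))
print(label_counts(span_like_ents))
```

For the scispaCy cell this would be called as `label_counts(doc.ents)`; for the other two models the per-chunk counters can be summed, since `Counter` objects support addition.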
