# Transformers
* Transformers are a neural network architecture well suited to sequence-to-sequence (encoder-decoder) tasks.
* What sets them apart from earlier models is the attention mechanism.
* Attention replaces recurrence: instead of processing tokens one at a time, the model weighs every part of the sequence and focuses on the most relevant parts.
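
To make the idea of attention concrete, here is a minimal sketch of scaled dot-product attention in plain Python. It is only an illustration: real transformer layers also apply learned projections to the queries, keys, and values and use several attention heads, which are left out here. The function names and the toy vectors are assumptions made for this example.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy self-attention: three 2-dimensional token vectors attend to each other
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
```

Each row of `weights` says how much one token attends to every other token, which is the "focus" described above.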

`What was used before transformers`
* Word embeddings, such as Word2vec and GloVe
* Recurrent neural networks (RNN)
* Context-based embeddings, such as ELMo

`Types of Transformers`
* Autoregressive = GPT, GPT-2, XLNet, CTRL
* Autoencoding = BERT, ALBERT, RoBERTa, DistilBERT
* Sequence-to-sequence models = BART, Pegasus, T5

`Central Principles`

The central principles of most transformer models are:
 * Models can be pretrained
 * Learning can be transferred to smaller tasks
 * They can be fine-tuned for specific tasks
 * They have longer memory than recurrent models

`Transformer Models for Clinical Tasks`

Some transformer models are trained specifically on clinical and biomedical texts:
* Bio_ClinicalBERT
* Clinical-Longformer
* CORe clinical diagnosis prediction
* Clini-dialog_sum-T5
* BioBERT
* PubMedBERT
* Clinical-BigBird
* BERT-ClinicalQA

`Practical Uses of Transformers in Clinical Tasks`
* Treatment outcome prediction
* Translation of clinical notes into other languages
* Summarization of clinical notes
* Detection of clinical entities

The sample data are snippets from case reports in Clinical Practice and Cases in Emergency Medicine (https://escholarship.org/uc/uciem_cpcem). The transformer models come from the Hugging Face Python library.

In [1]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline, AutoModelForTokenClassification
from pprint import pprint
import torch
import pandas as pd



In [2]:
# Load the pre-trained tokenizer and classification model for CORe clinical diagnosis prediction
tokenizer = AutoTokenizer.from_pretrained("bvanaken/CORe-clinical-diagnosis-prediction")
model = AutoModelForSequenceClassification.from_pretrained("bvanaken/CORe-clinical-diagnosis-prediction")

In [3]:
# Saving the sample data
input = " A 58-year-old male presents to the emergency department with headache, hand numbness, and phantosmia."

In [4]:
# Tokenize the input text and run it through the model
tokenized_input = tokenizer(input, return_tensors="pt")
output = model(**tokenized_input)

In [5]:
# Now, I get the predicted labels by applying a sigmoid to the logits and mapping every index
# above the 0.3 threshold to its label using the ID to label configuration
predictions = torch.sigmoid(output.logits)
predicted_labels = [model.config.id2label[_id] for _id in (predictions > 0.3).nonzero()[:, 1].tolist()]
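
The thresholding step above can be sketched without loading the model. This is a minimal illustration in plain Python: the logits and the `id2label` mapping are made-up values, and `labels_above_threshold` is a hypothetical helper, not part of the transformers library. Because the task is multi-label, each logit goes through its own sigmoid (rather than a softmax across labels), so any number of labels can pass the 0.3 threshold.

```python
import math

def sigmoid(x):
    """Map a raw logit to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def labels_above_threshold(logits, id2label, threshold=0.3):
    """Return the label of every logit whose sigmoid exceeds the threshold."""
    probabilities = [sigmoid(x) for x in logits]
    return [id2label[i] for i, p in enumerate(probabilities) if p > threshold]

# Hypothetical values for illustration only
logits = [-9.2949, 0.5, -1.0]
id2label = {0: "label_a", 1: "label_b", 2: "label_c"}
selected = labels_above_threshold(logits, id2label)
```

Lowering the threshold returns more labels at the cost of more false positives; 0.3 is the value used in this notebook.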

In [6]:
output

SequenceClassifierOutput(loss=None, logits=tensor([[ -9.2949,  -9.9079, -10.6348,  ..., -11.0518,  -6.9629,  -9.2351]],
       grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

In [7]:
# On its own, this shows only the per-label probabilities, not the label names
predictions

tensor([[9.1887e-05, 4.9779e-05, 2.4062e-05,  ..., 1.5858e-05, 9.4548e-04,
         9.7545e-05]], grad_fn=<SigmoidBackward0>)

In [8]:
# When I run this code, it shows what the model thinks the clinical diagnosis is based on the
# history in the input
predicted_labels

# For example, this 58-year-old male presents to the emergency department with headache, hand numbness, and phantosmia.
# Based on that history, the model suggests hemorrhage or hypertension.

['272',
 '401',
 '4019',
 'complication',
 'essential',
 'hemorrhage',
 'hypertension',
 'mention',
 'status',
 'unspecified',
 'use',
 'without']
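
The returned list mixes numeric strings with description words, which makes it harder to read. A small sketch for separating the two is shown below. It assumes the purely numeric strings are diagnosis codes (they look like ICD-9 codes, but that is an assumption not confirmed in this notebook), and `split_codes_and_words` is a hypothetical helper written for this example.

```python
def split_codes_and_words(labels):
    """Separate purely numeric labels from descriptive words."""
    codes = [label for label in labels if label.isdigit()]
    words = [label for label in labels if not label.isdigit()]
    return codes, words

# Labels copied from the output above
labels = ['272', '401', '4019', 'complication', 'essential', 'hemorrhage',
          'hypertension', 'mention', 'status', 'unspecified', 'use', 'without']
codes, words = split_codes_and_words(labels)
```

Codes that contain letters would end up in `words` with this simple check, so the rule may need adjusting for other inputs.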

In [9]:
# I will now repeat the above code for input2
input2 = "We present a case of a 19-year-old female presenting with intermittent chest pain, palpitations, and weakness present for two months. The patient had previously been evaluated at our emergency department one week earlier."

In [10]:
tokenized_input2 = tokenizer(input2, return_tensors="pt")
output2 = model(**tokenized_input2)

In [11]:
predictions2 = torch.sigmoid(output2.logits)
predicted_labels2 = [model.config.id2label[_id] for _id in (predictions2 > 0.3).nonzero()[:, 1].tolist()]

In [12]:
# The clinical diagnosis prediction model predicts anemia, congestive heart disorder,
# and possibly a condition related to an artery.
predicted_labels2

['285',
 '2859',
 '424',
 '428',
 '4280',
 '780',
 'anemia',
 'artery',
 'congestive',
 'disorders',
 'heart',
 'specified',
 'unspecified']

In [13]:
input3 = "A 60-year-old female presented with sudden onset visual disturbance in her right eye"

In [14]:
tokenized_input3 = tokenizer(input3, return_tensors="pt")
output3 = model(**tokenized_input3)

In [15]:
predictions3 = torch.sigmoid(output3.logits)
predicted_labels3 = [model.config.id2label[_id] for _id in (predictions3 > 0.3).nonzero()[:, 1].tolist()]

In [16]:
predicted_labels3

['affecting', 'cerebral', 'disorder', 'infarction', 'type', 'unspecified']

In [17]:
input4 = """We report the case of a 57-year-old man who presented to the ED with difficulty voiding. A urinary catheter was placed. 
The patient had severe post-obstructive diuresis. He developed hematuria and became hypotensive."""

In [18]:
tokenized_input4 = tokenizer(input4, return_tensors="pt")
output4 = model(**tokenized_input4)

In [19]:
predictions4 = torch.sigmoid(output4.logits)
predicted_labels4 = [model.config.id2label[_id] for _id in (predictions4 > 0.3).nonzero()[:, 1].tolist()]

In [20]:
predicted_labels4

['276',
 '2762',
 '2767',
 '458',
 '4582',
 '584',
 '5845',
 '5849',
 '590',
 '591',
 '592',
 '5920',
 '593',
 '599',
 '5990',
 '788',
 '7882',
 'acidosis',
 'acute',
 'calculus',
 'failure',
 'hydronephrosis',
 'hyperpotassemia',
 'hypotension',
 'iatrogenic',
 'infection',
 'kidney',
 'lesion',
 'necrosis',
 'obstruction',
 'pyelonephritis',
 'retention',
 'site',
 'specified',
 'tract',
 'tubular',
 'unspecified',
 'urinary',
 'urine',
 'without']

### Transformer models can also be used for clinical named entity recognition

In [21]:
# Instantiate the pipeline
# This model is pre-trained to recognize entities such as problem, treatment, and test
ner_pipe = pipeline('ner', model='samrawal/bert-base-uncased_clinical-ner')

In [22]:
sample_text = """Background: Hypoglycemia is uncommon in people who are not being treated for diabetes mellitus and, when present, the differential diagnosis is broad. 
Artifactual hypoglycemia describes discrepancy between low capillary and normal plasma glucose levels regardless of symptoms and should be considered in patients with Raynaud’s phenomenon.
Case Presentation: A 46-year-old female patient with a history of a sleeve gastrectomy started complaining about episodes of lipothymias preceded by sweating, nausea, and dizziness. 
During one of these episodes, a capillary blood glucose was obtained with a value of 24 mg/dl. She had multiple emergency admissions with low-capillary glycemia. 
An exhaustive investigation for possible causes of hypoglycemia was made for 18 months. 
The 72h fasting test was negative for hypoglycemia. A Raynaud’s phenomenon was identified during one appointment.
Conclusion: Artifactual hypoglycemia has been described in various conditions including Raynaud’s phenomenon, peripheral arterial disease, Eisenmenger syndrome, acrocyanosis, or hypothermia. 
With this case report, we want to reinforce the importance of being aware of this diagnosis to prevent anxiety, unnecessary treatment, and diagnostic tests."""

In [23]:
pprint(sample_text)

('Background: Hypoglycemia is uncommon in people who are not being treated for '
 'diabetes mellitus and, when present, the differential diagnosis is broad. \n'
 'Artifactual hypoglycemia describes discrepancy between low capillary and '
 'normal plasma glucose levels regardless of symptoms and should be considered '
 'in patients with Raynaud’s phenomenon.\n'
 'Case Presentation: A 46-year-old female patient with a history of a sleeve '
 'gastrectomy started complaining about episodes of lipothymias preceded by '
 'sweating, nausea, and dizziness. \n'
 'During one of these episodes, a capillary blood glucose was obtained with a '
 'value of 24 mg/dl. She had multiple emergency admissions with low-capillary '
 'glycemia. \n'
 'An exhaustive investigation for possible causes of hypoglycemia was made for '
 '18 months. \n'
 'The 72h fasting test was negative for hypoglycemia. A Raynaud’s phenomenon '
 'was identified during one appointment.\n'
 'Conclusion: Artifactual hypoglycemia has b

In [24]:
# Pass the sample text into the NER pipeline and save the output
entities_list = ner_pipe(sample_text)

In [25]:
type(entities_list)

list

In [26]:
# For a better view of the data, I will save it as a pandas dataframe 
df = pd.DataFrame(entities_list)

In [27]:
df.head(50)
# In the output, hypoglycemia takes up the first six rows and is labeled as a problem,
# which is correct.

# The word is broken into pieces (marked with ##) because the tokenizer's vocabulary does not
# contain the whole word, so it is represented as several subword tokens.

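| | entity | score | index | word | start | end |
|---|---|---|---|---|---|---|
| 0 | B-problem | 0.996357 | 3 | h | 12 | 13 |
| 1 | I-problem | 0.565359 | 4 | ##yp | 13 | 15 |
| 2 | I-problem | 0.995098 | 5 | ##og | 15 | 17 |
| 3 | I-problem | 0.995741 | 6 | ##ly | 17 | 19 |
| 4 | I-problem | 0.990938 | 7 | ##ce | 19 | 21 |
| 5 | I-problem | 0.994446 | 8 | ##mia | 21 | 24 |
| 6 | B-problem | 0.996699 | 19 | diabetes | 77 | 85 |
| 7 | I-problem | 0.997851 | 20 | mel | 86 | 89 |
| 8 | I-problem | 0.992689 | 21 | ##lit | 89 | 92 |
| 9 | I-problem | 0.998373 | 22 | ##us | 92 | 94 |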
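
The token-level rows can be merged back into whole entities. The sketch below does this in plain Python using the B-/I- tags and the `##` subword marker. `merge_wordpieces` is a hypothetical helper written for this example, and it assumes every I- row continues the entity opened by the most recent B- row.

```python
def merge_wordpieces(rows):
    """Merge token-level NER rows into whole entities using B-/I- tags."""
    entities = []
    for row in rows:
        prefix, _, label = row["entity"].partition("-")
        piece = row["word"]
        is_subword = piece.startswith("##")
        text = piece[2:] if is_subword else piece
        if prefix == "B" or not entities:
            # A B- tag opens a new entity
            entities.append({"label": label, "text": text,
                             "start": row["start"], "end": row["end"]})
            continue
        current = entities[-1]
        # Subword pieces are glued on; whole words are joined with a space
        current["text"] += text if is_subword else " " + text
        current["end"] = row["end"]
    return entities

# Rows copied from the table above (entity, word, start, end)
rows = [
    {"entity": "B-problem", "word": "h", "start": 12, "end": 13},
    {"entity": "I-problem", "word": "##yp", "start": 13, "end": 15},
    {"entity": "I-problem", "word": "##og", "start": 15, "end": 17},
    {"entity": "I-problem", "word": "##ly", "start": 17, "end": 19},
    {"entity": "I-problem", "word": "##ce", "start": 19, "end": 21},
    {"entity": "I-problem", "word": "##mia", "start": 21, "end": 24},
    {"entity": "B-problem", "word": "diabetes", "start": 77, "end": 85},
    {"entity": "I-problem", "word": "mel", "start": 86, "end": 89},
    {"entity": "I-problem", "word": "##lit", "start": 89, "end": 92},
    {"entity": "I-problem", "word": "##us", "start": 92, "end": 94},
]
merged = merge_wordpieces(rows)
```

With these rows, the ten tokens collapse into two entities, one per B- tag.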


In [28]:
# Another transformer model, biomedical-ner-all, can detect up to 107 English biomedical entity types,
# such as disease or disorder, diagnostic procedure, qualitative concept, therapeutic procedure,
# sign or symptom, and biological structure.
tokenizer = AutoTokenizer.from_pretrained("d4data/biomedical-ner-all")
model = AutoModelForTokenClassification.from_pretrained("d4data/biomedical-ner-all")

In [29]:
pipe = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
pipe(sample_text)

# The model predicted hypoglycemia as a disorder, 46-year-old as an age, and gastrectomy as a therapeutic procedure.

[{'entity_group': 'Disease_disorder',
  'score': 0.95556676,
  'word': 'h',
  'start': 12,
  'end': 13},
 {'entity_group': 'Disease_disorder',
  'score': 0.98589087,
  'word': '##yp',
  'start': 13,
  'end': 15},
 {'entity_group': 'Disease_disorder',
  'score': 0.9024655,
  'word': '##oglycemia',
  'start': 15,
  'end': 24},
 {'entity_group': 'Disease_disorder',
  'score': 0.9986807,
  'word': 'diabetes mellitus',
  'start': 77,
  'end': 94},
 {'entity_group': 'Detailed_description',
  'score': 0.99984884,
  'word': 'artifact',
  'start': 152,
  'end': 160},
 {'entity_group': 'Disease_disorder',
  'score': 0.9197547,
  'word': 'h',
  'start': 164,
  'end': 165},
 {'entity_group': 'Disease_disorder',
  'score': 0.9275444,
  'word': '##ypoglycemia',
  'start': 165,
  'end': 176},
 {'entity_group': 'Detailed_description',
  'score': 0.9120741,
  'word': 'low cap',
  'start': 207,
  'end': 214},
 {'entity_group': 'Qualitative_concept',
  'score': 0.9098876,
  'word': 'normal',
  'start': 2
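
Even with `aggregation_strategy="simple"`, the output above still splits "hypoglycemia" into several groups whose character offsets touch. One way to clean this up, sketched below in plain Python, is to merge neighbouring groups that share a label and have adjacent offsets, then read the text back from the original string. `merge_adjacent_spans` is a hypothetical helper for this example, and keeping the lowest score of the merged pieces is an arbitrary, conservative choice.

```python
def merge_adjacent_spans(text, entities):
    """Merge same-label groups whose offsets touch, then slice the source text."""
    merged = []
    for ent in entities:
        last = merged[-1] if merged else None
        if last and last["entity_group"] == ent["entity_group"] and last["end"] == ent["start"]:
            # Extend the previous span instead of starting a new one
            last["end"] = ent["end"]
            last["score"] = min(last["score"], ent["score"])
        else:
            merged.append({"entity_group": ent["entity_group"], "score": ent["score"],
                           "start": ent["start"], "end": ent["end"]})
    for span in merged:
        span["text"] = text[span["start"]:span["end"]]
    return merged

# First sentence of the sample text and the first four groups from the output above
text = ("Background: Hypoglycemia is uncommon in people who are not being treated for "
        "diabetes mellitus and, when present, the differential diagnosis is broad.")
entities = [
    {"entity_group": "Disease_disorder", "score": 0.95556676, "word": "h", "start": 12, "end": 13},
    {"entity_group": "Disease_disorder", "score": 0.98589087, "word": "##yp", "start": 13, "end": 15},
    {"entity_group": "Disease_disorder", "score": 0.9024655, "word": "##oglycemia", "start": 15, "end": 24},
    {"entity_group": "Disease_disorder", "score": 0.9986807, "word": "diabetes mellitus", "start": 77, "end": 94},
]
spans = merge_adjacent_spans(text, entities)
```

Slicing the original string also restores the original casing, which the uncased tokenizer output does not preserve.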

### Clinical Word prediction using transformers
* One use case is text prediction in electronic health record systems.

In [30]:
# Make the pipeline and specify the model of choice 
unmasker = pipeline('fill-mask', model='microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext')


In [31]:
# Apply the mask to the sentence below 
unmasker("A 58-year-old male presents to the emergency department with headache, hand numbness, and [MASK]")

# Most of the predicted tokens are plausible in a real clinical setting; the exception is the period.

[{'score': 0.19383329153060913,
  'token': 24009,
  'token_str': 'dizziness',
  'sequence': 'a 58 - year - old male presents to the emergency department with headache, hand numbness, and dizziness'},
 {'score': 0.13504955172538757,
  'token': 13954,
  'token_str': 'vomiting',
  'sequence': 'a 58 - year - old male presents to the emergency department with headache, hand numbness, and vomiting'},
 {'score': 0.09129738807678223,
  'token': 12175,
  'token_str': 'weakness',
  'sequence': 'a 58 - year - old male presents to the emergency department with headache, hand numbness, and weakness'},
 {'score': 0.07852588593959808,
  'token': 18,
  'token_str': '.',
  'sequence': 'a 58 - year - old male presents to the emergency department with headache, hand numbness, and.'},
 {'score': 0.07658973336219788,
  'token': 13759,
  'token_str': 'nausea',
  'sequence': 'a 58 - year - old male presents to the emergency department with headache, hand numbness, and nausea'}]
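
For a text-prediction feature, suggestions such as a bare period are probably not useful. The sketch below filters fill-mask results down to alphabetic tokens in plain Python. `keep_word_predictions` is a hypothetical helper for this example, and the scores are the values from the output above, rounded to four decimals.

```python
def keep_word_predictions(predictions, min_score=0.0):
    """Keep predictions whose token is alphabetic and scores at least min_score."""
    return [p for p in predictions
            if p["token_str"].isalpha() and p["score"] >= min_score]

# Tokens and rounded scores taken from the output above
predictions = [
    {"score": 0.1938, "token_str": "dizziness"},
    {"score": 0.1350, "token_str": "vomiting"},
    {"score": 0.0913, "token_str": "weakness"},
    {"score": 0.0785, "token_str": "."},
    {"score": 0.0766, "token_str": "nausea"},
]
suggestions = keep_word_predictions(predictions)
```

The `isalpha` check also drops numbers and hyphenated tokens, so it may be too strict for fields such as doses or durations.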

In [32]:
unmasker("We present a case of a 19-year-old female presenting with intermittent chest pain, palpitations, and weakness present for [MASK]. The patient had previously been evaluated at our emergency department one week earlier.")

[{'score': 0.10617193579673767,
  'token': 2739,
  'token_str': 'years',
  'sequence': 'we present a case of a 19 - year - old female presenting with intermittent chest pain, palpitations, and weakness present for years. the patient had previously been evaluated at our emergency department one week earlier.'},
 {'score': 0.0987035483121872,
  'token': 4391,
  'token_str': 'evaluation',
  'sequence': 'we present a case of a 19 - year - old female presenting with intermittent chest pain, palpitations, and weakness present for evaluation. the patient had previously been evaluated at our emergency department one week earlier.'},
 {'score': 0.0760069414973259,
  'token': 3221,
  'token_str': 'months',
  'sequence': 'we present a case of a 19 - year - old female presenting with intermittent chest pain, palpitations, and weakness present for months. the patient had previously been evaluated at our emergency department one week earlier.'},
 {'score': 0.05585772544145584,
  'token': 9087,
  'to

One limitation of this model is its limited ability to handle multiple masked tokens in a single input.