# BioBERT

In [1]:
#!pip install transformers

## Fill-Mask: BERT vs. BioBERT

In [2]:
masked_text = "The doctor prescribed antibiotics to treat the patient's [MASK] infection."

In [3]:
from transformers import pipeline

In [4]:
# Baseline: general-domain BERT
mask_filler = pipeline("fill-mask", model="google-bert/bert-base-cased")
mask_filler(masked_text, top_k=3)

Some weights of the model checkpoint at google-bert/bert-base-cased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'bert.pooler.dense.bias', 'cls.seq_relationship.weight', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'score': 0.06669342517852783,
  'token': 13306,
  'token_str': 'chronic',
  'sequence': "The doctor prescribed antibiotics to treat the patient's chronic infection."},
 {'score': 0.06559422612190247,
  'token': 19560,
  'token_str': 'bacterial',
  'sequence': "The doctor prescribed antibiotics to treat the patient's bacterial infection."},
 {'score': 0.055493537336587906,
  'token': 3472,
  'token_str': 'stomach',
  'sequence': "The doctor prescribed antibiotics to treat the patient's stomach infection."}]

In [5]:
# Same prompt with BioBERT, which was further pre-trained on biomedical text
mask_filler = pipeline("fill-mask", model="dmis-lab/biobert-base-cased-v1.2")
mask_filler(masked_text, top_k=3)

Some weights of the model checkpoint at dmis-lab/biobert-base-cased-v1.2 were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'bert.pooler.dense.bias', 'cls.seq_relationship.weight', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'score': 0.14005273580551147,
  'token': 9622,
  'token_str': 'HIV',
  'sequence': "the doctor prescribed antibiotics to treat the patient's HIV infection."},
 {'score': 0.0697694942355156,
  'token': 2241,
  'token_str': 'skin',
  'sequence': "the doctor prescribed antibiotics to treat the patient's skin infection."},
 {'score': 0.054565902799367905,
  'token': 19192,
  'token_str': 'respiratory',
  'sequence': "the doctor prescribed antibiotics to treat the patient's respiratory infection."}]

## NER

In [6]:
text = '''
A 48 year-old female presented with vaginal bleeding and abnormal Pap smears. 
Upon diagnosis of invasive non-keratinizing SCC of the cervix, she underwent a 
radical hysterectomy with salpingo-oophorectomy which demonstrated positive 
spread to the pelvic lymph nodes and the parametrium. Pathological examination 
revealed that the tumour also extensively involved the lower uterine segment.
'''

In [7]:
# Use a pipeline as a high-level helper
from transformers import pipeline

ner_pipe = pipeline("ner", model="jordyvl/biobert-base-cased-v1.2_ncbi_disease-sm-first-ner")

# Group token-level predictions into entity spans
results = ner_pipe(text, aggregation_strategy="simple")
for entity in results:
    print(entity)

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


{'entity_group': 'Disease', 'score': 0.9995035, 'word': 'v', 'start': 37, 'end': 38}
{'entity_group': 'Disease', 'score': 0.99942905, 'word': '##agi', 'start': 38, 'end': 41}
{'entity_group': 'Disease', 'score': 0.9990461, 'word': '##nal bleeding', 'start': 41, 'end': 53}
{'entity_group': 'Disease', 'score': 0.99946207, 'word': 't', 'start': 335, 'end': 336}
{'entity_group': 'Disease', 'score': 0.9993754, 'word': '##umour', 'start': 336, 'end': 341}
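
The output above still splits single mentions into word-piece fragments ('v', '##agi', '##nal bleeding'), even with `aggregation_strategy="simple"`. One possible workaround is to post-process the results by merging same-label spans whose character offsets touch, and then re-reading the surface text from the original string. The helper below is a minimal sketch under that assumption: `merge_adjacent_entities` is a hypothetical name, not part of `transformers`, and taking the minimum score of the merged fragments is an arbitrary conservative choice. The sample data is copied from the first three rows printed above.

```python
def merge_adjacent_entities(entities, source_text):
    """Merge same-label entity spans that touch or overlap in source_text.

    Hypothetical helper (not a transformers API). Each entity is a dict with
    'entity_group', 'score', 'start' and 'end' keys, as printed above.
    """
    merged = []
    for ent in sorted(entities, key=lambda e: e["start"]):
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev["entity_group"] == ent["entity_group"]
            and ent["start"] <= prev["end"]  # contiguous or overlapping span
        ):
            prev["end"] = max(prev["end"], ent["end"])
            # Assumption: keep the lowest fragment score as the span score
            prev["score"] = min(prev["score"], ent["score"])
        else:
            merged.append(dict(ent))  # copy so the input list is not mutated
    # Recover the surface form from the original text instead of word pieces
    for ent in merged:
        ent["word"] = source_text[ent["start"]:ent["end"]]
    return merged


# Sample data: first sentence of the note and the first three fragments above
sample_text = "\nA 48 year-old female presented with vaginal bleeding and abnormal Pap smears."
sample_entities = [
    {"entity_group": "Disease", "score": 0.9995035, "word": "v", "start": 37, "end": 38},
    {"entity_group": "Disease", "score": 0.99942905, "word": "##agi", "start": 38, "end": 41},
    {"entity_group": "Disease", "score": 0.9990461, "word": "##nal bleeding", "start": 41, "end": 53},
]

merged_sample = merge_adjacent_entities(sample_entities, sample_text)
for ent in merged_sample:
    print(ent)
```

On the sample, the three fragments collapse into one `Disease` span covering characters 37 to 53, "vaginal bleeding". In the notebook itself the same call would be `merge_adjacent_entities(results, text)`.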
