# Clinical Named Entity Recognition (CNER):
* Medical NLP task
* Extracts named entities in clinical narratives
* Analyzes and categorizes important concepts into predefined categories

How is CNER different from NER?

The use of synonyms is common in clinical narratives. For example, cardiac arrest, cardiac infarction, and heart attack may be used interchangeably in notes.

* Variability in describing the same entities
* Nested entities
* Domain-specific training data and expert annotation
(If I am training a custom clinical named entity recognition model, the annotators need to be domain experts, because the domain is so specific.)
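
One way to picture the variability problem is a lookup that maps several surface forms to a single concept label. The sketch below is purely illustrative: the `SURFACE_TO_CONCEPT` table and the `normalize_mention` helper are hypothetical names, and the grouping simply mirrors the example in the text rather than a curated clinical vocabulary.

```python
# Hypothetical lookup table: surface forms seen in notes -> one concept label.
# The grouping mirrors the example in the text; it is not a clinical vocabulary.
SURFACE_TO_CONCEPT = {
    "cardiac arrest": "HEART_ATTACK",
    "cardiac infarction": "HEART_ATTACK",
    "heart attack": "HEART_ATTACK",
}


def normalize_mention(mention):
    """Return the concept label for a mention, or None if it is unknown."""
    key = " ".join(mention.lower().split())  # lowercase and collapse whitespace
    return SURFACE_TO_CONCEPT.get(key)


mentions = ["Heart attack", "cardiac  infarction", "Cardiac Arrest", "chest pain"]
normalized = [normalize_mention(m) for m in mentions]
print(normalized)
```

A fixed table like this only covers the variants someone thought to list, which is part of why trained models and expert-annotated data are needed in this domain.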

Clinical Named Entities:

Depending on the use case, there are many clinical named entities that I can extract, using either pre-trained or custom models.

Some of these are:
* Medication
* Disease
* Symptoms
* Adverse drug reaction
* Anatomy
* Duration
* Gene
* Specialty
* Lab tests
* Route of administration 

Machine Learning Algorithms:
* The ML algorithms used for clinical named entity recognition can be supervised or unsupervised
* Supervised = Training data is labeled 
* Unsupervised = Training data is not labeled 
* The supervised approach is more prominent in real life.
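
For the supervised case, "labeled" usually means that each training text comes with the character span and class of every entity. Below is a minimal sketch of what one such example could look like, using a (start, end, label) span convention. The sentence, the labels, and the `make_example` helper are made up for illustration.

```python
def make_example(text, mentions):
    """Build one labeled example from (substring, label) pairs.

    Offsets are located with str.find, so each substring is assumed to
    occur in the text; only its first occurrence is labeled.
    """
    spans = []
    for substring, label in mentions:
        start = text.find(substring)
        if start == -1:
            raise ValueError(f"{substring!r} not found in text")
        spans.append((start, start + len(substring), label))
    return {"text": text, "entities": sorted(spans)}


labeled = make_example(
    "Patient reports nausea after starting metformin.",
    [("nausea", "SYMPTOM"), ("metformin", "MEDICATION")],
)
# An unlabeled example is just the raw text, with no annotations attached
unlabeled = {"text": "Patient reports dizziness on standing."}

print(labeled["entities"])
```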

### Central Principles

Key things to remember about clinical named entity recognition are:
* It is a token classification task, where I am assigning tokens to different entity classes
* In my context, the entities are clinical or biomedical.
* It supports clinical information collection, mining, and retrieval.
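
To make the token classification framing concrete, the sketch below turns entity spans into one tag per token using the common BIO scheme (B = beginning of an entity, I = inside, O = outside). It assumes whitespace tokenization and made-up labels; `bio_tags` is a hypothetical helper, not part of any library.

```python
import re


def bio_tags(text, spans):
    """Assign a BIO tag to each whitespace-delimited token.

    spans: list of (start_char, end_char, label), assumed non-overlapping.
    A token is tagged with a span's label if it lies fully inside that span.
    """
    tagged = []
    for match in re.finditer(r"\S+", text):
        tag = "O"
        for start, end, label in spans:
            if match.start() >= start and match.end() <= end:
                prefix = "B" if match.start() == start else "I"
                tag = f"{prefix}-{label}"
                break
        tagged.append((match.group(), tag))
    return tagged


text = "History of diabetes mellitus with nausea"
spans = [(11, 28, "DISEASE"), (34, 40, "SYMPTOM")]
tags = bio_tags(text, spans)
for token, tag in tags:
    print(f"{token:10} {tag}")
```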

### Practical Uses

* Extraction of entities such as genes and biomarkers.
* Entity recognition for mapping into medical knowledge bases
* Pattern analysis in electronic health records
* Detection of medication names
* Entity extraction as preprocessing for other tasks

The sample text is a snippet from a scientific case report on artifactual hypoglycemia.

Pre-Trained models used are from the scispaCy library:
* pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_scibert-0.5.1.tar.gz
* pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_ner_jnlpba_md-0.5.1.tar.gz
* pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_ner_bc5cdr_md-0.5.1.tar.gz
* pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_ner_bionlp13cg_md-0.5.1.tar.gz
* pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_md-0.5.1.tar.gz

In [1]:
import scispacy
import spacy
from spacy import displacy
import en_core_sci_scibert
import en_ner_jnlpba_md
import en_ner_bc5cdr_md
import en_ner_bionlp13cg_md
from pprint import pprint
import pandas as pd


In [2]:
# Saving my sample data into a variable called case_report 
case_report = """Background: Hypoglycemia is uncommon in people who are not being treated for diabetes mellitus and, when present, the differential diagnosis is broad. 
Artifactual hypoglycemia describes discrepancy between low capillary and normal plasma glucose levels regardless of symptoms and should be considered in patients with Raynaud’s phenomenon.
Case Presentation: A 46-year-old female patient with a history of a sleeve gastrectomy started complaining about episodes of lipothymias preceded by sweating, nausea, and dizziness. 
During one of these episodes, a capillary blood glucose was obtained with a value of 24 mg/dl. She had multiple emergency admissions with low-capillary glycemia. 
An exhaustive investigation for possible causes of hypoglycemia was made for 18 months. 
The 72h fasting test was negative for hypoglycemia. A Raynaud’s phenomenon was identified during one appointment.
Conclusion: Artifactual hypoglycemia has been described in various conditions including Raynaud’s phenomenon, peripheral arterial disease, Eisenmenger syndrome, acrocyanosis, or hypothermia. 
With this case report, we want to reinforce the importance of being aware of this diagnosis to prevent anxiety, unnecessary treatment, and diagnostic tests."""

In [3]:
# Since I will be experimenting with up to 4 pre-trained models, I will write a function that
# takes the model, the document as a string, and the color options.

def display_entities(model, document, color_options):
    """
    Render the entities found in a document and print each unique (text, label) pair

    Parameters:
        model (module): A pre-trained model package from scispaCy
        document (str): Document to be processed
        color_options (dict): Entity labels and colors to be rendered in the display image

    Returns: tuple (None, None), because both the rendering call and pprint
        display their output directly and return None
    """

    nlp = model.load()
    doc = nlp(document)
    # With jupyter=True, displacy.render shows the markup inline and returns None
    displacy_image = displacy.render(doc, jupyter=True, style='ent', options=color_options)
    # pprint prints the unique (text, label) pairs and also returns None
    entity_and_label = pprint({(ent.text, ent.label_) for ent in doc.ents})

    return displacy_image, entity_and_label

In [4]:
# The visualization requires me to set the colors for the entity labels.
# The label is written in upper case to match what the model emits (ENTITY).
colors = {"ENTITY": "yellow"}
color_options = {"ents": ["ENTITY"], "colors": colors}
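
Because the same two-key options dictionary is rebuilt for each model, a small helper can derive it from the color mapping alone. This is a sketch under the assumption that every label to be highlighted has a color assigned; `make_color_options` is a hypothetical name. The `ents` and `colors` keys are the ones the cells in this notebook already pass to `displacy.render`.

```python
def make_color_options(colors):
    """Build displaCy-style options from a {label: color} mapping.

    Labels are upper-cased because the model outputs shown in this
    notebook use upper-case labels such as ENTITY or DISEASE.
    """
    upper = {label.upper(): color for label, color in colors.items()}
    return {"ents": list(upper), "colors": upper}


color_options = make_color_options({"entity": "yellow"})
print(color_options)
```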

In [5]:
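# Call the display_entities function and pass in the desired model,
# the sample data, and the color options.

# * en_core_sci_scibert is a general-purpose biomedical pipeline; scispaCy documents it
#   as having a vocabulary of roughly 785k terms, and it tags every span it finds as ENTITY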

display_entities(en_core_sci_scibert, case_report, color_options)

# Looking at the output, I can see words that may be of clinical significance,
# such as: Hypoglycemia, diabetes mellitus, plasma glucose



{('Artifactual', 'ENTITY'),
 ('Eisenmenger syndrome', 'ENTITY'),
 ('Hypoglycemia', 'ENTITY'),
 ('Raynaud’s phenomenon', 'ENTITY'),
 ('acrocyanosis', 'ENTITY'),
 ('anxiety', 'ENTITY'),
 ('appointment', 'ENTITY'),
 ('capillary blood glucose', 'ENTITY'),
 ('case report', 'ENTITY'),
 ('complaining', 'ENTITY'),
 ('conditions', 'ENTITY'),
 ('diabetes mellitus', 'ENTITY'),
 ('diagnosis', 'ENTITY'),
 ('diagnostic tests', 'ENTITY'),
 ('differential diagnosis', 'ENTITY'),
 ('discrepancy', 'ENTITY'),
 ('dizziness', 'ENTITY'),
 ('emergency admissions', 'ENTITY'),
 ('episodes', 'ENTITY'),
 ('exhaustive', 'ENTITY'),
 ('fasting test', 'ENTITY'),
 ('female', 'ENTITY'),
 ('history', 'ENTITY'),
 ('hypoglycemia', 'ENTITY'),
 ('hypothermia', 'ENTITY'),
 ('identified', 'ENTITY'),
 ('investigation', 'ENTITY'),
 ('lipothymias', 'ENTITY'),
 ('low capillary', 'ENTITY'),
 ('low-capillary glycemia', 'ENTITY'),
 ('months', 'ENTITY'),
 ('nausea', 'ENTITY'),
 ('negative', 'ENTITY'),
 ('patient', 'ENTITY'),
 ('patie

(None, None)

In [6]:
# For the second model I will assign the color options based on the entity labels it
# is documented to detect.

colors = {"DNA": "yellow", "CELL_TYPE": "green", "CELL_LINE": "red", "RNA": "brown", "PROTEIN": "pink"}
color_options = {"ents": ["DNA", "CELL_TYPE", "CELL_LINE", "RNA", "PROTEIN"], "colors": colors}

In [7]:
# The model used in this code is going to find entities such as DNA, cell type and protein
display_entities(en_ner_jnlpba_md, case_report, color_options)

# * This returns no highlighted entities, which means the model found no DNA, RNA, cell, or protein mentions in the sample text.



set()


(None, None)

In [8]:
colors = {"DISEASE": "yellow", "CHEMICAL": "red"}
color_options = {"ents": ["DISEASE", "CHEMICAL"], "colors": colors}

In [9]:
# This model tries to recognize entities that are related to disease or chemical
display_entities(en_ner_bc5cdr_md, case_report, color_options)

{('Eisenmenger syndrome', 'DISEASE'),
 ('Hypoglycemia', 'DISEASE'),
 ('acrocyanosis', 'DISEASE'),
 ('anxiety', 'DISEASE'),
 ('capillary blood glucose', 'DISEASE'),
 ('diabetes mellitus', 'DISEASE'),
 ('dizziness', 'DISEASE'),
 ('glucose', 'CHEMICAL'),
 ('hypoglycemia', 'DISEASE'),
 ('hypothermia', 'DISEASE'),
 ('lipothymias', 'DISEASE'),
 ('nausea', 'DISEASE'),
 ('peripheral arterial disease', 'DISEASE')}


(None, None)
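
The printed set is easier to scan once it is grouped by label. Below is a small sketch using only the standard library; the pairs are copied from the `en_ner_bc5cdr_md` output above, and `group_by_label` is a hypothetical helper. Lower-casing the text merges variants such as 'Hypoglycemia' and 'hypoglycemia'.

```python
from collections import defaultdict

# (text, label) pairs copied from the en_ner_bc5cdr_md output above
bc5cdr_entities = {
    ("Eisenmenger syndrome", "DISEASE"),
    ("Hypoglycemia", "DISEASE"),
    ("acrocyanosis", "DISEASE"),
    ("anxiety", "DISEASE"),
    ("capillary blood glucose", "DISEASE"),
    ("diabetes mellitus", "DISEASE"),
    ("dizziness", "DISEASE"),
    ("glucose", "CHEMICAL"),
    ("hypoglycemia", "DISEASE"),
    ("hypothermia", "DISEASE"),
    ("lipothymias", "DISEASE"),
    ("nausea", "DISEASE"),
    ("peripheral arterial disease", "DISEASE"),
}


def group_by_label(entities):
    """Return {label: sorted list of unique lower-cased entity texts}."""
    grouped = defaultdict(set)
    for text, label in entities:
        grouped[label].add(text.lower())
    return {label: sorted(texts) for label, texts in grouped.items()}


grouped = group_by_label(bc5cdr_entities)
for label, texts in sorted(grouped.items()):
    print(label, len(texts), texts)
```

Grouping this way also makes questionable predictions stand out, for example 'capillary blood glucose' sitting under DISEASE.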

In [12]:
# Keys match the labels the model emits (see the output below); values are CSS color names
colors = {
    "AMINO_ACID": "yellow",
    "ANATOMICAL_SYSTEM": "green",
    "CANCER": "red",
    "CELL": "brown",
    "CELLULAR_COMPONENT": "pink",
    "DEVELOPING_ANATOMICAL_STRUCTURE": "blue",
    "GENE_OR_GENE_PRODUCT": "orange",
    "IMMATERIAL_ANATOMICAL_ENTITY": "lightblue",
    "MULTI_TISSUE_STRUCTURE": "lightgreen",
    "ORGAN": "purple",
    "ORGANISM": "grey",
    "ORGANISM_SUBDIVISION": "cyan",
    "ORGANISM_SUBSTANCE": "magenta",
    "PATHOLOGICAL_FORMATION": "plum",
    "SIMPLE_CHEMICAL": "maroon",
    "TISSUE": "khaki"
}

# Highlight every label that has a color assigned
color_options = {"ents": list(colors), "colors": colors}

In [13]:
# This model detects anatomy- and organism-related entities, such as anatomical systems, organs, tissues, and organisms.
display_entities(en_ner_bionlp13cg_md, case_report, color_options)

# In the output below, 'people' is detected as ORGANISM, 'capillary' as TISSUE, and 'plasma glucose' as ORGANISM_SUBSTANCE.

{('capillary', 'TISSUE'),
 ('capillary blood glucose', 'TISSUE'),
 ('patient', 'ORGANISM'),
 ('patients', 'ORGANISM'),
 ('people', 'ORGANISM'),
 ('peripheral arterial', 'MULTI_TISSUE_STRUCTURE'),
 ('plasma glucose', 'ORGANISM_SUBSTANCE')}


(None, None)

In [14]:
# To assess only the entities and labels without the visualization, I tweaked the original
# display function

def entities_and_label_extractor(model, document):
    """
    Extract the unique named entities and their labels from a document

    Parameters:
        model (module): A pre-trained model package from scispaCy
        document (str): Document to be processed

    Returns: set of (entity text, entity label) tuples
    """

    nlp = model.load()
    doc = nlp(document)
    entity_and_label = {(ent.text, ent.label_) for ent in doc.ents}

    return entity_and_label

In [15]:
bionlp_ner = entities_and_label_extractor(en_ner_bionlp13cg_md, case_report)

In [16]:
type(bionlp_ner)

set

In [17]:
bionlp_ner

{('capillary', 'TISSUE'),
 ('capillary blood glucose', 'TISSUE'),
 ('patient', 'ORGANISM'),
 ('patients', 'ORGANISM'),
 ('people', 'ORGANISM'),
 ('peripheral arterial', 'MULTI_TISSUE_STRUCTURE'),
 ('plasma glucose', 'ORGANISM_SUBSTANCE')}

In [18]:
# Save the returned entities and labels in a dataframe
bionlp_entities_df = pd.DataFrame(bionlp_ner, columns=['Entity', 'Label'])

# include a column with constant value of NER model
bionlp_entities_df['NER_model'] = 'bionlp13cg'
bionlp_entities_df

|   | Entity | Label | NER_model |
|---|---|---|---|
| 0 | peripheral arterial | MULTI_TISSUE_STRUCTURE | bionlp13cg |
| 1 | capillary blood glucose | TISSUE | bionlp13cg |
| 2 | plasma glucose | ORGANISM_SUBSTANCE | bionlp13cg |
| 3 | patients | ORGANISM | bionlp13cg |
| 4 | capillary | TISSUE | bionlp13cg |
| 5 | patient | ORGANISM | bionlp13cg |
| 6 | people | ORGANISM | bionlp13cg |
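
With results from more than one model in hand, it can be useful to see where they overlap or disagree. The sketch below is one way to do that with plain Python. The two sets are copied from the outputs shown earlier in this notebook, and `labels_per_entity` is a hypothetical helper.

```python
from collections import defaultdict

# (text, label) pairs copied from the en_ner_bc5cdr_md output
bc5cdr = {
    ("Eisenmenger syndrome", "DISEASE"),
    ("Hypoglycemia", "DISEASE"),
    ("acrocyanosis", "DISEASE"),
    ("anxiety", "DISEASE"),
    ("capillary blood glucose", "DISEASE"),
    ("diabetes mellitus", "DISEASE"),
    ("dizziness", "DISEASE"),
    ("glucose", "CHEMICAL"),
    ("hypoglycemia", "DISEASE"),
    ("hypothermia", "DISEASE"),
    ("lipothymias", "DISEASE"),
    ("nausea", "DISEASE"),
    ("peripheral arterial disease", "DISEASE"),
}

# (text, label) pairs copied from the en_ner_bionlp13cg_md output
bionlp13cg = {
    ("capillary", "TISSUE"),
    ("capillary blood glucose", "TISSUE"),
    ("patient", "ORGANISM"),
    ("patients", "ORGANISM"),
    ("people", "ORGANISM"),
    ("peripheral arterial", "MULTI_TISSUE_STRUCTURE"),
    ("plasma glucose", "ORGANISM_SUBSTANCE"),
}


def labels_per_entity(results):
    """Map each entity text to the {model name: label} pairs that tagged it."""
    merged = defaultdict(dict)
    for model_name, entities in results.items():
        for text, label in entities:
            merged[text][model_name] = label
    return dict(merged)


merged = labels_per_entity({"bc5cdr": bc5cdr, "bionlp13cg": bionlp13cg})
# Keep only the spans that more than one model tagged
shared = {text: labels for text, labels in merged.items() if len(labels) > 1}
print(shared)
```

In this sample, 'capillary blood glucose' is the only span that both models tag, and they label it differently (DISEASE versus TISSUE), which is a reminder to check pre-trained labels against the text before relying on them.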


Working with different named entity recognition models and exploring the range of entities they cover has improved my intuition for replicating similar tasks on my own clinical dataset.