
In [1]:
%%capture
!pip install scispacy
!pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.3.0/en_core_sci_lg-0.3.0.tar.gz
!pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.3.0/en_ner_craft_md-0.3.0.tar.gz
!pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.3.0/en_ner_jnlpba_md-0.3.0.tar.gz
!pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.3.0/en_ner_bc5cdr_md-0.3.0.tar.gz
!pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.3.0/en_ner_bionlp13cg_md-0.3.0.tar.gz

In [2]:
import scispacy
import spacy
from spacy import displacy
from collections import Counter, OrderedDict
import en_core_sci_lg
import en_ner_craft_md
import en_ner_jnlpba_md
import en_ner_bc5cdr_md
import en_ner_bionlp13cg_md

from pprint import pprint

In [3]:
def entities_and_labels(model, document):
    """Render the named entities in a document and return them with their labels.

    Parameters:
        model (module): A pretrained model package from spaCy (https://spacy.io/models)
            or ScispaCy (https://allenai.github.io/scispacy/)
        document (str): Document to be processed

    Returns:
        tuple: (list of unique (entity text, entity label) pairs, displaCy return value).
            With jupyter=True, displacy.render draws inline and returns None.
    """
    nlp = model.load()  # loads the pipeline on every call
    doc = nlp(document)
    # Draw the highlighted entities inline in the notebook.
    displacy_image = displacy.render(doc, style='ent', jupyter=True)
    # A set removes repeated mentions of the same entity.
    entity_and_label = list({(ent.text, ent.label_) for ent in doc.ents})
    return entity_and_label, displacy_image
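
The function above deduplicates with `set`, so the returned list comes back in arbitrary order rather than in order of appearance in the note. If document order matters, one option is to deduplicate with `dict.fromkeys`, which keeps the first occurrence of each pair. This is a minimal sketch on plain tuples: `unique_in_order` is a hypothetical helper name (not part of spaCy or ScispaCy), and the sample pairs are made up to mimic `(ent.text, ent.label_)` values.

```python
def unique_in_order(pairs):
    """Return the pairs with duplicates removed, keeping first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(pairs))


# Illustrative (text, label) pairs in the order they might appear in a document.
pairs = [
    ("fever", "DISEASE"),
    ("moxifloxacin", "CHEMICAL"),
    ("fever", "DISEASE"),
    ("meropenem", "CHEMICAL"),
    ("moxifloxacin", "CHEMICAL"),
]

print(unique_in_order(pairs))
```

Inside the notebook function, the same idea would presumably apply to the generator over `doc.ents` in place of the `set` call.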

Link to the original publication: https://www.journalmc.org/index.php/Jmc/article/view/3566/2886

In [4]:
test_clinical_note = """ A 75-year-old male visited JA Toride Medical Center with complaints of fever (37.6 °C) and general fatigue before a hemodialysis session.
 Maintenance hemodialysis was undertaken thrice a week for the past 2 years at the same hospital. The patient was a permanent non-smoker, but he had a long history of poorly controlled 
 type 2 diabetes mellitus resulting in end-stage kidney failure (ESKD). According to the blood examination 1 week before this episode, hemoglobin A1c (HbA1c) was 8.7% and glycoalbumin was 35.6%. 
 Moreover, a pacemaker had been implanted for treatment of complete atrioventricular block since 2006. He did not get vaccinated for influenza in the last year.

On visit, he had no apparent respiratory symptoms with normal breath sound, and there were no remarkable physical findings except for mild leg edema. However, images of plain chest X-ray (Fig. 1) and chest computed tomography (CT) (Fig. 2) revealed a consolidative shadow in the S6 region of the right lung, which suggested bacterial pneumonia, rather than pneumonia due to COVID-19. As Tables 1 and 2 show, the blood examination showed normal count and fraction of leukocytes and slight elevation in C-reactive protein (CRP). Because the patient had oliguric, urinary antigens of Streptococcus and Legionella species were not be able to be tested. Since the hospital accepted COVID-19 patients at that time, a nasal swab sampling was carried out for a reverse transcription-PCR (RT-PCR) test for excluding SARS-CoV-2 infection.
The RT-PCR was conducted through the established protocol [5] at Ibaraki Prefectural Institute of Public Health (Ibaraki, Japan).
Then he had been followed up at the outpatient with oral administration of 400 mg of moxifloxacin (MFLX) daily until day 10. But his condition had not been improved with this treatment alone.
 To rule out COVID-19, additional RT-PCR tests were done at the institute described above on days 5, 8, 18, and at another laboratory, SRL Inc. (Tokyo, Japan) on day 10 (Table 2). 
 But their results were all negative. Simultaneous blood culture study also revealed no growth of infectious microorganisms.

On day 10, since the initial consolidation became blurred, and appeared like ground glass opacification (GGO), and similar shadows were extensively found in the contralateral lung in the chest CT scans 
(Fig. 3), the patient was admitted to the hospital with a probable diagnosis of COVID-19. He had no contact history with other COVID-19 patients treated in this hospital. Meanwhile, 
he was a city council member, and usually worked with a variety of people. Thus, his actual exposure timing to SARS-CoV-2 was not been specified. His son, only a family member living together, 
showed a negative PCR result of SARS-CoV-2 RNA, although his serum antibodies against SARS-CoV-2 had not been examined.After obtaining the informed consent from the patient according to the
approval by the ethical committee of JA Toride Medical Center on the administration of ciclesonide and favipiravir against COVID-19 cases, inhalation therapy of 200 µg of ciclesonide, 
twice a day was started on his admission, and 3.6 g/day of favipiravir was initially administered on day 12, then 1.6 g/day was continued until day 25 (Fig. 4).
To treat possible concomitant bacterial pneumonia, 500 mg of meropenem (MEPM) had been simultaneously administered between day 12 and day 23. 
Saturation of percutaneous oxygen (SpO2) of the patient had been kept above 95% (95-100%) with 1 - 2 L/min of oxygen inhalation through a nasal cannula (Fig.4). As Table 2 indicates, immunoglobulin M (IgM) antibodies against SARS-CoV-2, initially negative on day 1, were detected from his serum sample with an assay kit (One Stop COVID19 IgM/IgG Antibody Test, Kyokuto Pharmaceutical Industrial Co., Tokyo, Japan) on day 10. IgG antibodies also turned positive on day 15. IgM antibodies against SARS-CoV-2 were also ascertained positive with another assay kit (Immunochromatographic Test Kit, Kurabo Industries Co, LTD, Osaka, Japan) on day 21.
His inflammatory signs, such as his body temperatures and serum CRP levels had been transiently suppressed during days 22 and 25, but these signs had elevated again from day 26. MEPM, 0.5 g/day,
 was resumed, and 40 mg/day of methyl prednisolone was added to prevent a possible cytokine storm caused by COVID-19, because pulmonary infiltrative shadows were rather improved (Figs. 1, 2), 
 both of serum interleukin 6 (IL-6) and D-dimer levels elevated (Table 2), and the blood culture test showed no bacterial growth (Table 2). 
Antifungal agents were not required, although elevation in serum β-D-glucan level was noted in his recovery phase. With these treatments, his inflammatory signs were reduced, 
and his general condition recovered around 35 days after the onset (Fig. 4)."""

In [5]:
entities_and_labels(en_core_sci_lg, test_clinical_note)

([('improved', 'ENTITY'),
  ('MEPM', 'ENTITY'),
  ('hemoglobin A1c', 'ENTITY'),
  ('shadows', 'ENTITY'),
  ('immunoglobulin M', 'ENTITY'),
  ('days', 'ENTITY'),
  ('leukocytes', 'ENTITY'),
  ('diagnosis', 'ENTITY'),
  ('approval', 'ENTITY'),
  ('elevated', 'ENTITY'),
  ('people', 'ENTITY'),
  ('oliguric', 'ENTITY'),
  ('episode', 'ENTITY'),
  ('blood culture test', 'ENTITY'),
  ('Kurabo Industries Co', 'ENTITY'),
  ('right lung', 'ENTITY'),
  ('serum β-D-glucan', 'ENTITY'),
  ('onset', 'ENTITY'),
  ('admission', 'ENTITY'),
  ('inflammatory signs', 'ENTITY'),
  ('urinary antigens', 'ENTITY'),
  ('normal breath sound', 'ENTITY'),
  ('tested', 'ENTITY'),
  ('Osaka', 'ENTITY'),
  ('protocol', 'ENTITY'),
  ('hospital', 'ENTITY'),
  ('concomitant', 'ENTITY'),
  ('Maintenance', 'ENTITY'),
  ('COVID-19', 'ENTITY'),
  ('timing', 'ENTITY'),
  ('infectious microorganisms', 'ENTITY'),
  ('resumed', 'ENTITY'),
  ('ground glass opacification', 'ENTITY'),
  ('PCR', 'ENTITY'),
  ('physical findings', 

In [6]:
def entities_and_labels_no_display(model, document):
    """Return the named entities in a document with their labels, without rendering.

    Parameters:
        model (module): A pretrained model package from spaCy (https://spacy.io/models)
            or ScispaCy (https://allenai.github.io/scispacy/)
        document (str): Document to be processed

    Returns:
        list: Unique (entity text, entity label) pairs
    """
    nlp = model.load()
    doc = nlp(document)
    # A set removes repeated mentions of the same entity.
    entity_and_label = list({(ent.text, ent.label_) for ent in doc.ents})
    return entity_and_label

In [7]:
entities_and_labels_no_display(en_ner_craft_md,test_clinical_note)

[('RNA', 'SO'),
 ('serum interleukin 6', 'GGP'),
 ('bacterial', 'TAXON'),
 ('moxifloxacin', 'CHEBI'),
 ('CRP', 'GGP'),
 ('microorganisms', 'GO'),
 ('meropenem', 'CHEBI'),
 ('C-reactive protein', 'GGP'),
 ('MEPM', 'CHEBI'),
 ('oxygen', 'CHEBI'),
 ('Legionella species', 'TAXON'),
 ('ciclesonide', 'CHEBI'),
 ('leukocytes', 'CL'),
 ('D-dimer', 'GGP'),
 ('hemoglobin', 'CHEBI'),
 ('mellitus', 'CHEBI'),
 ('His', 'CHEBI'),
 ('Stop COVID19', 'SO'),
 ('methyl', 'CHEBI'),
 ('antibodies', 'GO'),
 ('Antibody', 'GO')]
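
A flat list of pairs like the one above is easier to scan when grouped by label. The following is a minimal sketch, assuming the input is a list of `(text, label)` tuples as returned by `entities_and_labels_no_display`; `group_by_label` is a hypothetical helper name, and `craft_pairs` holds only a few pairs copied from the output above.

```python
from collections import defaultdict


def group_by_label(entity_label_pairs):
    """Map each label to the sorted list of entity texts carrying it."""
    grouped = defaultdict(list)
    for text, label in entity_label_pairs:
        grouped[label].append(text)
    return {label: sorted(texts) for label, texts in grouped.items()}


# A few pairs copied from the en_ner_craft_md output above.
craft_pairs = [
    ("RNA", "SO"),
    ("bacterial", "TAXON"),
    ("moxifloxacin", "CHEBI"),
    ("CRP", "GGP"),
    ("meropenem", "CHEBI"),
    ("Legionella species", "TAXON"),
    ("D-dimer", "GGP"),
]

for label, texts in sorted(group_by_label(craft_pairs).items()):
    print(label, texts)
```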

In [8]:
entities_and_labels_no_display(en_ner_jnlpba_md,test_clinical_note)

[('IgM antibodies', 'PROTEIN'),
 ('LTD', 'PROTEIN'),
 ('serum antibodies', 'PROTEIN'),
 ('Stop COVID19 IgM/IgG Antibody', 'PROTEIN'),
 ('IL-6', 'PROTEIN'),
 ('D-dimer', 'PROTEIN'),
 ('CRP', 'PROTEIN'),
 ('leukocytes', 'CELL_TYPE'),
 ('C-reactive protein', 'PROTEIN'),
 ('SARS-CoV-2', 'PROTEIN'),
 ('SRL Inc.', 'PROTEIN'),
 ('immunoglobulin M', 'PROTEIN'),
 ('IgG antibodies', 'PROTEIN'),
 ('COVID-19', 'PROTEIN'),
 ('cytokine', 'PROTEIN'),
 ('S6 region', 'PROTEIN'),
 ('serum interleukin 6', 'PROTEIN'),
 ('His', 'PROTEIN'),
 ('SARS-CoV-2 RNA', 'RNA')]
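
Several spans are tagged by both `en_ner_craft_md` and `en_ner_jnlpba_md`, each with its own label scheme. For example, the pronoun `His` is tagged `CHEBI` by one and `PROTEIN` by the other, which looks like a false positive in both. A side-by-side view makes such overlaps easier to review. The sketch below assumes each model's output is a list of `(text, label)` tuples; `labels_by_entity` is a hypothetical helper name, and the sample data is a small subset of the two outputs above. One limitation: if a single model assigns two labels to the same text, only the last one is kept.

```python
def labels_by_entity(outputs):
    """Collect, for each entity text, the label each model assigned to it.

    outputs: dict mapping a model name to its list of (text, label) pairs.
    """
    table = {}
    for model_name, pairs in outputs.items():
        for text, label in pairs:
            table.setdefault(text, {})[model_name] = label
    return table


# Subsets of the CRAFT and JNLPBA outputs shown above.
outputs = {
    "craft": [("CRP", "GGP"), ("leukocytes", "CL"),
              ("His", "CHEBI"), ("oxygen", "CHEBI")],
    "jnlpba": [("CRP", "PROTEIN"), ("leukocytes", "CELL_TYPE"),
               ("His", "PROTEIN"), ("IL-6", "PROTEIN")],
}

table = labels_by_entity(outputs)
# Keep only the spans that more than one model tagged.
shared = {text: labels for text, labels in table.items() if len(labels) > 1}
for text, labels in shared.items():
    print(text, labels)
```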

In [9]:
entities_and_labels_no_display(en_ner_bc5cdr_md,test_clinical_note)

[('pneumonia', 'DISEASE'),
 ('JA Toride', 'CHEMICAL'),
 ('diabetes mellitus', 'DISEASE'),
 ('ciclesonide', 'CHEMICAL'),
 ('oliguric', 'DISEASE'),
 ('end-stage kidney failure', 'DISEASE'),
 ('IL-6', 'CHEMICAL'),
 ('methyl prednisolone', 'CHEMICAL'),
 ('favipiravir', 'CHEMICAL'),
 ('infection', 'DISEASE'),
 ('infectious microorganisms', 'DISEASE'),
 ('fever', 'DISEASE'),
 ('atrioventricular block', 'DISEASE'),
 ('MFLX', 'CHEMICAL'),
 ('meropenem', 'CHEMICAL'),
 ('pulmonary infiltrative', 'DISEASE'),
 ('ESKD', 'DISEASE'),
 ('SRL', 'CHEMICAL'),
 ('MEPM', 'CHEMICAL'),
 ('oxygen', 'CHEMICAL'),
 ('edema', 'DISEASE'),
 ('LTD', 'DISEASE'),
 ('fatigue', 'DISEASE'),
 ('bacterial pneumonia', 'DISEASE'),
 ('blurred', 'DISEASE'),
 ('moxifloxacin', 'CHEMICAL')]
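
The notebook imports `Counter` but does not use it. One natural use is a per-label tally of a model's output. This is a minimal sketch, assuming a list of `(text, label)` tuples; `label_counts` is a hypothetical helper name, and `bc5cdr_pairs` is a subset of the output above. The tally counts whatever the model emitted, so false positives such as `JA Toride` (a hospital name tagged `CHEMICAL`) and `LTD` (tagged `DISEASE`) inflate the numbers.

```python
from collections import Counter


def label_counts(entity_label_pairs):
    """Count how many unique entities were assigned each label."""
    return Counter(label for _, label in entity_label_pairs)


# A few pairs copied from the en_ner_bc5cdr_md output above.
bc5cdr_pairs = [
    ("pneumonia", "DISEASE"),
    ("JA Toride", "CHEMICAL"),
    ("diabetes mellitus", "DISEASE"),
    ("ciclesonide", "CHEMICAL"),
    ("fever", "DISEASE"),
    ("LTD", "DISEASE"),
]

counts = label_counts(bc5cdr_pairs)
for label, n in counts.most_common():
    print(label, n)
```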

In [10]:
entities_and_labels_no_display(en_ner_bionlp13cg_md,test_clinical_note)

[('immunoglobulin M', 'GENE_OR_GENE_PRODUCT'),
 ('COVID-19', 'GENE_OR_GENE_PRODUCT'),
 ('SpO2', 'SIMPLE_CHEMICAL'),
 ('LTD', 'SIMPLE_CHEMICAL'),
 ('serum interleukin 6', 'ORGANISM_SUBSTANCE'),
 ('hemoglobin A1c', 'GENE_OR_GENE_PRODUCT'),
 ('Fig. 3', 'GENE_OR_GENE_PRODUCT'),
 ('body', 'ORGANISM_SUBDIVISION'),
 ('MEPM', 'SIMPLE_CHEMICAL'),
 ('oxygen', 'SIMPLE_CHEMICAL'),
 ('IL-6', 'GENE_OR_GENE_PRODUCT'),
 ('nasal swab', 'ORGAN'),
 ('SRL', 'SIMPLE_CHEMICAL'),
 ('meropenem', 'SIMPLE_CHEMICAL'),
 ('CRP', 'GENE_OR_GENE_PRODUCT'),
 ('D-dimer', 'GENE_OR_GENE_PRODUCT'),
 ('atrioventricular', 'MULTI_TISSUE_STRUCTURE'),
 ('GGO', 'MULTI_TISSUE_STRUCTURE'),
 ('[5]', 'SIMPLE_CHEMICAL'),
 ('COVID-19 patients', 'CANCER'),
 ('kidney', 'ORGAN'),
 ('patients', 'ORGANISM'),
 ('serum CRP', 'ORGANISM_SUBSTANCE'),
 ('people', 'ORGANISM'),
 ('moxifloxacin', 'SIMPLE_CHEMICAL'),
 ('COVID-19', 'SIMPLE_CHEMICAL'),
 ('JA Toride', 'SIMPLE_CHEMICAL'),
 ('serum', 'ORGANISM_SUBSTANCE'),
 ('ciclesonide', 'SIMPLE_CHEMI