# NER with Flair

Following the tutorial: https://github.com/flairNLP/flair/blob/master/resources/docs/HUNFLAIR.md

In [60]:
from flair.data import Sentence
from flair.nn import Classifier
from flair.tokenization import SciSpacyTokenizer
import spacy

In [61]:
file_path = "C:/Data_Science/McArdle_Knowledge_Graph/data/33240189.txt"
with open(file_path, "r") as file:
    file_contents = file.read()

In [62]:
print(file_contents)

McArdle Disease vs. Stiff-Person Syndrome: A Case Report Highlighting the Similarities Between Two Rare and Distinct Disorders. McArdle disease is a rare autosomal recessive disorder of muscle glycogen metabolism that presents with pain and fatigue during exercise. Stiff-Person Syndrome is an autoimmune-related neurologic process characterized by fluctuating muscle rigidity and spasm. Reported is a 41-year-old male who presented to the emergency department due to sudden-onset weakness and chest pain while moving his refrigerator at home. Cardiac workup was non-contributory, but a creatine kinase level > 6,000 warranted a muscle biopsy. The biopsy pathology report was misinterpreted to be diagnostic for McArdle disease given the clinical presentation. After 4 years of treatment without symptomatic improvement, a gradual transition of symptoms from pain alone to pain with stiffness was noted. A positive glutamic acid decarboxylase antibody test resulted in a change of diagnosis to Stiff-

In [63]:
sentence = Sentence(file_contents, use_tokenizer=SciSpacyTokenizer())

In [64]:
# load biomedical tagger
tagger = Classifier.load("hunflair")

# tag sentence
tagger.predict(sentence)

2023-08-24 13:30:05,289 SequenceTagger predicts: Dictionary with 8 tags: <unk>, O, B-Disease, E-Disease, I-Disease, S-Disease, <START>, <STOP>
2023-08-24 13:30:07,295 SequenceTagger predicts: Dictionary with 8 tags: <unk>, O, S-Gene, B-Gene, I-Gene, E-Gene, <START>, <STOP>
2023-08-24 13:30:09,323 SequenceTagger predicts: Dictionary with 8 tags: <unk>, O, S-Species, B-Species, I-Species, E-Species, <START>, <STOP>
2023-08-24 13:30:11,239 SequenceTagger predicts: Dictionary with 8 tags: <unk>, O, S-CellLine, B-CellLine, I-CellLine, E-CellLine, <START>, <STOP>
2023-08-24 13:30:13,325 SequenceTagger predicts: Dictionary with 8 tags: <unk>, O, S-Chemical, B-Chemical, I-Chemical, E-Chemical, <START>, <STOP>
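
Each of the five tag dictionaries logged above uses the BIOES scheme: `B-`, `I-` and `E-` mark the beginning, inside and end of a multi-token entity, `S-` marks a single-token entity, and `O` marks tokens outside any entity. As a rough illustration of how such a tag sequence maps back to spans, here is a minimal sketch in plain Python. The function `bioes_to_spans` and the toy tag sequence are hypothetical and written by hand for this example; they are not Flair's internal decoder or actual model output.

```python
def bioes_to_spans(tags):
    """Convert a list of BIOES tags into (start, end, label) spans (end exclusive)."""
    spans = []
    start = None  # index where the currently open multi-token span began
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if prefix == "S":
            # single-token entity
            spans.append((i, i + 1, label))
            start = None
        elif prefix == "B":
            # open a new multi-token entity
            start = i
        elif prefix == "E" and start is not None:
            # close the open entity
            spans.append((start, i + 1, label))
            start = None
        elif prefix != "I":
            # "O" (or any unexpected tag) discards an unfinished span
            start = None
    return spans


# Hand-written toy example (assumption: not produced by the tagger)
tokens = ["McArdle", "disease", "presents", "with", "pain"]
tags = ["B-Disease", "E-Disease", "O", "O", "S-Disease"]

for start, end, label in bioes_to_spans(tags):
    print(" ".join(tokens[start:end]), label)
```

The `(start, end)` pairs follow the same half-open convention as the `Span[0:2]` notation that Flair prints for its predictions.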


In [102]:
dp = []       # Span objects (the labeled data points)
labels = []   # predicted entity type of each span
ents = []     # surface text of each span

for entity in sentence.get_labels():
    dp.append(entity.data_point)
    labels.append(entity.value)
    # join the token texts of the span to recover the entity string
    ents.append(' '.join(token.text for token in entity.data_point))

print(dp)
#print(labels)
#print(ents)

[Span[0:2]: "McArdle Disease" → Disease (0.9855), Span[3:7]: "Stiff-Person Syndrome" → Disease (0.9657), Span[21:23]: "McArdle disease" → Disease (0.9875), Span[26:33]: "autosomal recessive disorder of muscle glycogen metabolism" → Disease (0.7885), Span[36:37]: "pain" → Disease (0.9524), Span[38:39]: "fatigue" → Disease (0.9707), Span[42:46]: "Stiff-Person Syndrome" → Disease (0.9665), Span[48:53]: "autoimmune-related neurologic process" → Disease (0.6707), Span[56:58]: "muscle rigidity" → Disease (0.9423), Span[59:60]: "spasm" → Disease (0.9929), Span[79:80]: "weakness" → Disease (0.7689), Span[81:83]: "chest pain" → Disease (0.9717), Span[99:101]: "creatine kinase" → Gene (0.71), Span[99:100]: "creatine" → Chemical (0.9681), Span[119:121]: "McArdle disease" → Disease (0.9603), Span[141:142]: "pain" → Disease (0.9683), Span[144:145]: "pain" → Disease (0.961), Span[146:147]: "stiffness" → Disease (0.8947), Span[152:155]: "glutamic acid decarboxylase" → Gene (0.8885), Span[152:154]: "g
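
The printed spans carry a label and a confidence score, and some overlap (for example "creatine kinase" as Gene and "creatine" as Chemical). One possible way to tidy such predictions is to drop low-confidence ones and group the remaining unique entity strings by label. The sketch below works on a few `(text, label, score)` triples copied by hand from the output above; the helper `group_by_label` and the 0.8 threshold are arbitrary assumptions for illustration. In a live session the triples could presumably be built from `entity.data_point.text`, `entity.value` and `entity.score`.

```python
from collections import defaultdict

# (text, label, score) triples copied from the printed spans above
predictions = [
    ("McArdle Disease", "Disease", 0.9855),
    ("Stiff-Person Syndrome", "Disease", 0.9657),
    ("McArdle disease", "Disease", 0.9875),
    ("autoimmune-related neurologic process", "Disease", 0.6707),
    ("weakness", "Disease", 0.7689),
    ("creatine kinase", "Gene", 0.71),
    ("creatine", "Chemical", 0.9681),
    ("glutamic acid decarboxylase", "Gene", 0.8885),
]


def group_by_label(preds, min_score=0.8):
    """Group unique, lower-cased entity strings by label, keeping scores >= min_score."""
    grouped = defaultdict(set)
    for text, label, score in preds:
        if score >= min_score:
            grouped[label].add(text.lower())
    return {label: sorted(texts) for label, texts in grouped.items()}


print(group_by_label(predictions))
```

Lower-casing merges "McArdle Disease" and "McArdle disease" into one entry, which may be useful before adding entities as nodes to a knowledge graph.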

In [59]:
for entity in sentence.get_labels():
    print(entity.to_dict())

In [54]:
from flair.splitter import SciSpacySentenceSplitter

# initialize the sentence splitter
splitter = SciSpacySentenceSplitter()

# split text into a list of Sentence objects
sentences = splitter.split(file_contents)

# you can apply the HunFlair tagger directly to this list
tagger.predict(sentences)



In [55]:
# use a separate loop variable so the `sentence` object tagged earlier is not overwritten
for sent in sentences:
    print(sent.to_tagged_string())

Sentence[21]: "McArdle Disease vs. Stiff-Person Syndrome: A Case Report Highlighting the Similarities Between Two Rare and Distinct Disorders." → ["McArdle Disease"/Disease, "Stiff-Person Syndrome"/Disease]
Sentence[21]: "McArdle disease is a rare autosomal recessive disorder of muscle glycogen metabolism that presents with pain and fatigue during exercise." → ["McArdle disease"/Disease, "autosomal recessive disorder of muscle glycogen metabolism"/Disease, "pain"/Disease, "fatigue"/Disease]
Sentence[19]: "Stiff-Person Syndrome is an autoimmune-related neurologic process characterized by fluctuating muscle rigidity and spasm." → ["Stiff-Person Syndrome"/Disease, "muscle rigidity"/Disease, "spasm"/Disease]
Sentence[29]: "Reported is a 41-year-old male who presented to the emergency department due to sudden-onset weakness and chest pain while moving his refrigerator at home." → ["weakness"/Disease, "chest pain"/Disease]
Sentence[19]: "Cardiac workup was non-contributory, but a creatine ki

# NER with SciSpaCy

In [26]:
nlp = spacy.load("en_ner_bionlp13cg_md")  # or "en_core_sci_sm"



In [30]:
doc = nlp(file_contents)

In [31]:
print(doc.ents)

(McArdle Disease, Stiff-Person Syndrome, Case Report, Similarities, Rare, Disorders, McArdle disease, autosomal recessive disorder, muscle glycogen metabolism, pain, fatigue, exercise, Stiff-Person Syndrome, autoimmune-related neurologic process, muscle rigidity, spasm, male, emergency department, sudden-onset weakness, chest pain, moving, refrigerator, home, Cardiac workup, non-contributory, creatine kinase, level, muscle biopsy, biopsy pathology report, diagnostic, McArdle disease, clinical presentation, years, treatment, symptomatic, improvement, transition, symptoms, pain, pain, stiffness, positive glutamic acid decarboxylase antibody test, diagnosis, Stiff-Person Syndrome, similarities, rare, disease processes, necessity, history taking, maintenance, knowledge, complex, pathology reports)


In [35]:
# Print named entity labels
for ent in doc.ents:
    print(ent.text, ent.label_)

McArdle Disease ENTITY
Stiff-Person Syndrome ENTITY
Case Report ENTITY
Similarities ENTITY
Rare ENTITY
Disorders ENTITY
McArdle disease ENTITY
autosomal recessive disorder ENTITY
muscle glycogen metabolism ENTITY
pain ENTITY
fatigue ENTITY
exercise ENTITY
Stiff-Person Syndrome ENTITY
autoimmune-related neurologic process ENTITY
muscle rigidity ENTITY
spasm ENTITY
male ENTITY
emergency department ENTITY
sudden-onset weakness ENTITY
chest pain ENTITY
moving ENTITY
refrigerator ENTITY
home ENTITY
Cardiac workup ENTITY
non-contributory ENTITY
creatine kinase ENTITY
level ENTITY
muscle biopsy ENTITY
biopsy pathology report ENTITY
diagnostic ENTITY
McArdle disease ENTITY
clinical presentation ENTITY
years ENTITY
treatment ENTITY
symptomatic ENTITY
improvement ENTITY
transition ENTITY
symptoms ENTITY
pain ENTITY
pain ENTITY
stiffness ENTITY
positive glutamic acid decarboxylase antibody test ENTITY
diagnosis ENTITY
Stiff-Person Syndrome ENTITY
similarities ENTITY
rare ENTITY
disease processes 

Every span in the output above carries only the generic `ENTITY` label. SciSpaCy also provides specialized NER models that assign specific entity labels; however, those labels do not cover as much as the Flair named entities.
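
To make the comparison between the two tools more concrete, the entity strings each one returned can be compared as sets. The sketch below uses a hand-copied subset of the entities printed in the Flair and SciSpaCy outputs above; the variable names and the lower-casing step are assumptions for illustration, and exact string matching is only a crude proxy for agreement.

```python
# Subset of entity strings copied from the Flair output above
flair_entities = {
    "McArdle Disease", "Stiff-Person Syndrome", "pain", "fatigue",
    "autosomal recessive disorder of muscle glycogen metabolism",
    "muscle rigidity", "spasm", "weakness", "chest pain",
    "creatine kinase", "stiffness", "glutamic acid decarboxylase",
}

# Subset of entity strings copied from the SciSpaCy output above
scispacy_entities = {
    "McArdle Disease", "Stiff-Person Syndrome", "Case Report", "pain",
    "autosomal recessive disorder", "muscle glycogen metabolism",
    "fatigue", "exercise", "muscle rigidity", "spasm",
    "sudden-onset weakness", "chest pain", "refrigerator",
    "creatine kinase", "stiffness",
    "positive glutamic acid decarboxylase antibody test",
}


def normalize(entities):
    """Lower-case entity strings so that case differences do not count as mismatches."""
    return {e.lower() for e in entities}


flair_norm = normalize(flair_entities)
scispacy_norm = normalize(scispacy_entities)

shared = flair_norm & scispacy_norm         # found by both tools
only_flair = flair_norm - scispacy_norm     # found only by Flair
only_scispacy = scispacy_norm - flair_norm  # found only by SciSpaCy

print("shared:", sorted(shared))
print("only Flair:", sorted(only_flair))
print("only SciSpaCy:", sorted(only_scispacy))
```

Strings such as "weakness" and "sudden-onset weakness" land in different buckets here because the two tools draw span boundaries differently, so a token-overlap comparison might be a better fit for a closer analysis.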