<a href="https://colab.research.google.com/github/Alfred9/Natural-Language-Processing-Projects/blob/main/Named%20Entity%20Recognition/Named_Entity_Recognition_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

GLiNER is a Named Entity Recognition (NER) model capable of identifying any entity type using a bidirectional transformer encoder (BERT-like). It provides a practical alternative to traditional NER models, which are limited to predefined entities, and Large Language Models (LLMs) that, despite their flexibility, are costly and large for resource-constrained scenarios.

In [1]:
!pip install gliner
!pip install -q gradio



In [2]:
from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_base")

text = """
Cristiano Ronaldo dos Santos Aveiro (Portuguese pronunciation: [kɾiʃˈtjɐnu ʁɔˈnaldu]; born 5 February 1985) is a Portuguese professional footballer who plays as a forward for and captains both Saudi Pro League club Al Nassr and the Portugal national team. Widely regarded as one of the greatest players of all time, Ronaldo has won five Ballon d'Or awards,[note 3] a record three UEFA Men's Player of the Year Awards, and four European Golden Shoes, the most by a European player. He has won 33 trophies in his career, including seven league titles, five UEFA Champions Leagues, the UEFA European Championship and the UEFA Nations League. Ronaldo holds the records for most appearances (183), goals (140) and assists (42) in the Champions League, goals in the European Championship (14), international goals (128) and international appearances (205). He is one of the few players to have made over 1,200 professional career appearances, the most by an outfield player, and has scored over 850 official senior career goals for club and country, making him the top goalscorer of all time.
"""

labels = ["person", "award", "date", "competitions", "teams"]

entities = model.predict_entities(text, labels, threshold=0.5)

for entity in entities:
    print(entity["text"], "=>", entity["label"])

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Cristiano Ronaldo dos Santos Aveiro => person
5 February 1985 => date
Al Nassr => teams
Portugal national team => teams
Ballon d'Or => award
UEFA Men's Player of the Year Awards => award
European Golden Shoes => award
UEFA Champions Leagues => competitions
UEFA European Championship => competitions
UEFA Nations League => competitions
Champions League => competitions
European Championship => competitions


In [3]:
text = """The patient was admitted for a right-sided pleural effusion for thoracentesis on Monday by Dr. X. Her Coumadin was placed on hold.
         which was treated with pericardial window in an outside hospital, at that time she was also found to have mesenteric pain and thrombosis, is now anticoagulated.
         Her pericardial fluid was accumulated and she was seen by Dr. Y. At that time, she was recommended for pericardectomy, which was performed by Dr. Z.
         Review of her CT scan from March 2006 prior to her pericardectomy, already shows bilateral plural effusions. The patient improved clinically after the pericardectomy with resolution of her symptoms.
         Recently, she was readmitted to the hospital with chest pain and found to have bilateral pleural effusion, the right greater than the left. CT of the chest also revealed a large mediastinal lymph node.
         We reviewed the pathology obtained from the pericardectomy in March 2006, which was diagnostic of mesothelioma. At this time, chest tube placement for drainage of the fluid occurred and thoracoscopy with fluid biopsies, which were performed, which revealed epithelioid malignant mesothelioma.
         The patient was then stained with a PET CT, which showed extensive uptake in the chest, bilateral pleural pericardial effusions, and lymphadenopathy. She also had acidic fluid, pectoral and intramammary lymph nodes and uptake in L4 with SUV of 4. This was consistent with stage III disease
         Her repeat echocardiogram showed an ejection fraction of 45% to 49%. She was transferred to Oncology service and started on chemotherapy on September 1, 2007 with cisplatin 75 mg/centimeter squared equaling 109 mg IV piggyback over 2 hours on September 1, 2007, Alimta 500 mg/ centimeter squared equaling 730 mg IV piggyback over 10 minutes.
         This was all initiated after a Port-A-Cath was placed. The chemotherapy was well tolerated and the patient was discharged the following day after discontinuing IV fluid and IV. Her Port-A-Cath was packed with heparin according to protocol.
DISCHARGE MEDICATIONS:
Zofran, Phenergan, Coumadin, and Lovenox, and Vicodin"""

labels = ["patient_age","test", "doctor", "admission_date","date" "symptom", "drug", "problem", "bodypart"]

entities = model.predict_entities(text, labels, threshold=0.5)

for entity in entities:
    print(entity["text"], "=>", entity["label"])

Monday => admission_date
Dr. X => doctor
Coumadin => drug
mesenteric pain => problem
thrombosis => problem
Dr. Y => doctor
Dr. Z => doctor
CT scan => test
March 2006 => admission_date
chest pain => problem
CT => test
chest => bodypart
mediastinal lymph node => bodypart
March 2006 => admission_date
chest => bodypart
epithelioid malignant mesothelioma => problem
PET CT => test
chest => bodypart
lymphadenopathy => problem
pectoral => bodypart
intramammary lymph nodes => bodypart
L4 => bodypart
echocardiogram => test
September 1, 2007 => admission_date
cisplatin => drug
Alimta => drug
heparin => drug
Zofran => drug
Phenergan => drug
Coumadin => drug
Lovenox => drug
Vicodin => drug


In [4]:
text_1= """A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, and associated with an acute hepatitis, presented with a one-week history of polyuria, poor appetite, and vomiting.
She was on metformin, glipizide, and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG. She had been on dapagliflozin for six months at the time of presentation.
Physical examination on presentation was significant for dry oral mucosa ; significantly , her abdominal examination was benign with no tenderness, guarding, or rigidity. Pertinent laboratory findings on admission were: serum glucose 111 mg/dl,  creatinine 0.4 mg/dL, triglycerides 508 mg/dL, total cholesterol 122 mg/dL, and venous pH 7.27.
"""

labels_1 = ["patient_age","gender", "test", "doctor", "admission_date","date", "symptoms", "drug", "problem", "bodypart", "disease", "result", "location", "procedure"]

entities = model.predict_entities(text_1, labels_1, threshold=0.5)

for entity in entities:
    print(entity["text"], "=>", entity["label"])

28-year-old => patient_age
female => gender
gestational diabetes mellitus => disease
type two diabetes mellitus => disease
HTG-induced pancreatitis => disease
acute hepatitis => disease
polyuria => symptoms
poor appetite => symptoms
vomiting => symptoms
metformin => drug
glipizide => drug
dapagliflozin => drug
atorvastatin => drug
gemfibrozil => drug
dapagliflozin => drug
Physical examination => test
oral mucosa => bodypart
abdominal examination => test
tenderness => symptoms
guarding => symptoms
rigidity => symptoms
laboratory findings => test
serum glucose => test
creatinine => test
triglycerides => result
total cholesterol => test
venous pH => test


In [None]:
import gradio as gr

def highlight_entities(text):
    # Load the GLiNER model
    model = GLiNER.from_pretrained("urchade/gliner_base")

    # Define the labels and their corresponding colors
    labels = {
        "patient_age": "blue",
        "gender": "green",
        "test": "orange",
        "doctor": "red",
        "admission_date": "purple",
        "date": "yellow",
        "symptoms": "cyan",
        "drug": "magenta",
        "problem": "grey",
        "bodypart": "black",
        "disease": "brown"
    }

    # Predict entities
    entities = model.predict_entities(text, list(labels.keys()))

    # Sort entities by start position in descending order
    entities.sort(key=lambda x: x["start"], reverse=True)

    # Initialize highlighted text
    highlighted_text = text

    # Add HTML markup for each entity
    for entity in entities:
        highlighted_text = highlighted_text[:entity["start"]] + \
                           f"<mark style='background-color:{labels[entity['label']]}'>{entity['text']}</mark>" + \
                           f" <span style='color:{labels[entity['label']]}'>[{entity['label']}]</span> " + \
                           highlighted_text[entity["end"]:]

    return highlighted_text

iface = gr.Interface(fn=highlight_entities, inputs="text", outputs="html", title=" Biomedical NER Highlighting App", description="Input text and see named entities highlighted with labels.")
iface.launch(share= True, debug = True)


Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
Running on public URL: https://d80cd022875c6b0fde.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
