<a href="https://colab.research.google.com/github/Alfred9/Natural-Language-Processing/blob/main/Named%20Entity%20Recognition/Named_Entity_Recognition_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Named Entity Recognition (NER)
Named Entity Recognition (NER) is a sub-task of information extraction in Natural Language Processing (NLP) that locates named entities in text and classifies them into predefined categories such as person names, organizations, locations, medical codes, time expressions, quantities, and monetary values.

In [1]:
!pip install gliner
!pip install -q gradio

Collecting gliner
  Downloading gliner-0.1.6-py3-none-any.whl (25 kB)
Collecting huggingface-hub>=0.21.4 (from gliner)
  Downloading huggingface_hub-0.22.2-py3-none-any.whl (388 kB)
Collecting flair==0.13.1 (from gliner)
  Downloading flair-0.13.1-py3-none-any.whl (388 kB)
Collecting seqeval (from gliner)
  Downloading seqeval-1.2.2.tar.gz (43 kB)
  Preparing metadata (setup.py) ... done
Collecting boto3>=1.20.27 (from flair==0.13.1->gliner)
  Downloading boto3-1.34.79-py3-none-any.whl (139 kB)

### Load the Model
GLiNER is a Named Entity Recognition (NER) model that can identify arbitrary entity types, supplied as labels at inference time, using a bidirectional transformer encoder (BERT-like). It offers a practical alternative to traditional NER models, which are limited to a fixed set of predefined entity types, and to Large Language Models (LLMs), which are flexible but too large and costly for resource-constrained scenarios.

In [2]:
from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_base")


### Business Application

In [7]:
from gliner import GLiNER

Bns_text="""Nvidia's best days are probably behind it — and the stock's meteoric rise amid the artificial intelligence craze will probably start to stumble this year, according to several Wall Street analysts.
           That bearishness comes amid a stellar year so far for Nvidia, with the Jensen Huang-led firm crushing earnings estimates quarter after quarter. The company is now worth more than Alphabet and Amazon, and it just dethroned Tesla as the top stock pick among retail investors.
           But the chipmaker's monster-sized gains could soon come to an end, according to Gil Luria, analyst at DA Davidson. He's calling for as much as a 20% slide in Nvidia stock by the end of the year — joining a handful of other strategists who are skeptical of Nvidia's dizzying stock market valuation.
           Nvidia is unlikely to keep up its rapid pace of growth, as companies investing in AI are bound to tap out eventually, Luria said, speaking recently to BNN Bloomberg. In a note, he assigned a "hold" rating to the stock and a price target of $620, the lowest estimate on Wall Street. """

Bns_labels = ['Company_name', 'Stock_symbol', 'Revenue', 'Profit_margin', 'Market_capitalization', 'CEO_name', 'Merger_acquisition', 'Earnings_report', 'Dividend_yield', 'Share_price']

entities = model.predict_entities(Bns_text, Bns_labels, threshold=0.5)

for entity in entities:
    print(entity["text"], "=>", entity["label"])

Nvidia => Company_name
Nvidia => Company_name
Jensen Huang-led => CEO_name
Alphabet => Company_name
Amazon => Company_name
Tesla => Company_name
Gil Luria => CEO_name
DA Davidson => Company_name
Nvidia => Company_name
Nvidia => Company_name
Nvidia => Company_name
BNN Bloomberg => Company_name
$620 => Share_price
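
The output above lists the same company several times, once per mention. As a minimal sketch, the mentions can be collapsed into counts per (text, label) pair. The helper name `count_mentions` is hypothetical, and the sample list is copied from the printed output above so the snippet runs without the model; in the notebook you would pass the `entities` list returned by `model.predict_entities` instead.

```python
from collections import Counter

def count_mentions(entities):
    """Count how often each (text, label) pair occurs in a list of entity dicts."""
    return Counter((entity["text"], entity["label"]) for entity in entities)

# Sample input copied from the printed output above (text and label only).
sample_entities = [
    {"text": "Nvidia", "label": "Company_name"},
    {"text": "Nvidia", "label": "Company_name"},
    {"text": "Jensen Huang-led", "label": "CEO_name"},
    {"text": "Alphabet", "label": "Company_name"},
    {"text": "Amazon", "label": "Company_name"},
    {"text": "Tesla", "label": "Company_name"},
    {"text": "Gil Luria", "label": "CEO_name"},
    {"text": "DA Davidson", "label": "Company_name"},
    {"text": "Nvidia", "label": "Company_name"},
    {"text": "Nvidia", "label": "Company_name"},
    {"text": "Nvidia", "label": "Company_name"},
    {"text": "BNN Bloomberg", "label": "Company_name"},
    {"text": "$620", "label": "Share_price"},
]

mention_counts = count_mentions(sample_entities)

# Print the most frequently mentioned entities first
for (text, label), n in mention_counts.most_common():
    print(f"{text} => {label} (x{n})")
```

Counting by (text, label) rather than by text alone keeps apart cases where the same string is tagged with different labels.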


### Court Documentation Application

In [4]:
text = """
A New York judge ordered Donald Trump on Friday to pay $355 million in penalties, finding that the former president lied about his wealth for years in a sweeping civil fraud verdict that pierces his billionaire image but stops short of putting his real estate empire out of business.
Judge Arthur Engoron’s decision after a trial in New York Attorney General Letitia James’ lawsuit punishes Trump, his company and executives, including his two eldest sons, for scheming to dupe banks, insurers and others by inflating his wealth on financial statements. It forces a shakeup at the top of his Trump Organization, putting the company under court supervision and curtailing how it does business.
The decision is a staggering setback for the Republican presidential front-runner, the latest and costliest consequence of his recent legal troubles. The magnitude of the verdict on top of penalties in other cases could dramatically dent Trump’s financial resources and damage his identity as a savvy businessman who parlayed his fame as a real estate developer into reality TV stardom and the presidency. He has vowed to appeal and won’t have to pay immediately."""

labels = ['Case_name','Defendant_name','Plaintiff_name','Judge_name','Court_name','Legal_document','Verdict','Attorney_name','Lawsuit','Jurisdiction']

entities = model.predict_entities(text, labels, threshold=0.5)

for entity in entities:
    print(entity["text"], "=>", entity["label"])

New York => Jurisdiction
civil fraud verdict => Verdict
Judge Arthur Engoron => Attorney_name
New York => Jurisdiction
financial statements => Legal_document
verdict => Verdict
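
In the output above, "Judge Arthur Engoron" is tagged `Attorney_name` even though `Judge_name` is among the labels. One possible mitigation, sketched below, is to run prediction with a permissive global threshold and then apply stricter cut-offs to the labels that misfire. This assumes each entity dict carries a `score` key alongside `text` and `label`; the helper name `filter_by_label_threshold` is hypothetical, and the scores in the sample are invented for illustration, not model output.

```python
def filter_by_label_threshold(entities, thresholds, default=0.5):
    """Keep entities whose score meets the threshold configured for their label.

    Labels without an entry in `thresholds` fall back to `default`.
    """
    return [
        entity
        for entity in entities
        if entity["score"] >= thresholds.get(entity["label"], default)
    ]

# Hypothetical scores, invented for illustration; they are NOT model output.
sample_entities = [
    {"text": "New York", "label": "Jurisdiction", "score": 0.90},
    {"text": "Judge Arthur Engoron", "label": "Attorney_name", "score": 0.55},
    {"text": "financial statements", "label": "Legal_document", "score": 0.60},
]

# Require more confidence for Attorney_name than for the other labels
strict = filter_by_label_threshold(sample_entities, {"Attorney_name": 0.8})

for entity in strict:
    print(entity["text"], "=>", entity["label"])
```

Filtering removes doubtful predictions but does not relabel them, so whether it helps depends on how the real scores are distributed.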


### Biomedical Application

In [5]:

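from gliner import GLiNER

# Note: this loads the same general-purpose checkpoint used in the earlier
# sections ("urchade/gliner_base"), not a biomedical-specific model.
Bio_model = GLiNER.from_pretrained("urchade/gliner_base")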

text_1= """A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, and associated with an acute hepatitis, presented with a one-week history of polyuria, poor appetite, and vomiting.
She was on metformin, glipizide, and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG. She had been on dapagliflozin for six months at the time of presentation.
Physical examination on presentation was significant for dry oral mucosa ; significantly , her abdominal examination was benign with no tenderness, guarding, or rigidity. Pertinent laboratory findings on admission were: serum glucose 111 mg/dl,  creatinine 0.4 mg/dL, triglycerides 508 mg/dL, total cholesterol 122 mg/dL, and venous pH 7.27.
"""

med_labels = ["patient_age", "gender", "test", "doctor", "admission_date", "date", "symptoms", "drug", "problem", "bodypart", "disease", "result", "location", "procedure"]

entities = Bio_model.predict_entities(text_1, med_labels, threshold=0.5)

for entity in entities:
    print(entity["text"], "=>", entity["label"])

28-year-old => patient_age
female => patient_age
gestational diabetes mellitus => disease
type two diabetes mellitus => disease
HTG-induced pancreatitis => disease
acute hepatitis => disease
polyuria, poor appetite => symptoms
vomiting => symptoms
metformin, glipizide => drug
dapagliflozin => drug
atorvastatin and gemfibrozil => drug
dapagliflozin => drug
Physical examination => procedure
oral mucosa => bodypart
abdominal examination => test
tenderness => symptoms
guarding => symptoms
rigidity => symptoms
laboratory findings => procedure
serum glucose => result
creatinine => result
triglycerides => test
total cholesterol => result
venous pH => procedure
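
A flat list of spans like the one above is often easier to use once it is grouped by label, for example to build a per-patient record. The sketch below shows one way to do that; the helper name `group_by_label` is hypothetical, and the sample is a subset copied from the printed output above so the snippet runs without the model.

```python
from collections import defaultdict

def group_by_label(entities):
    """Group entity texts by label, keeping first-seen order and dropping repeats."""
    grouped = defaultdict(list)
    for entity in entities:
        texts = grouped[entity["label"]]
        if entity["text"] not in texts:
            texts.append(entity["text"])
    return dict(grouped)

# Subset of the printed output above (text and label only).
sample_entities = [
    {"text": "28-year-old", "label": "patient_age"},
    {"text": "gestational diabetes mellitus", "label": "disease"},
    {"text": "type two diabetes mellitus", "label": "disease"},
    {"text": "vomiting", "label": "symptoms"},
    {"text": "dapagliflozin", "label": "drug"},
    {"text": "atorvastatin and gemfibrozil", "label": "drug"},
    {"text": "dapagliflozin", "label": "drug"},
]

record = group_by_label(sample_entities)

for label, texts in record.items():
    print(f"{label}: {texts}")
```

In the notebook, passing the `entities` list returned by `Bio_model.predict_entities` in place of `sample_entities` should give the grouped view of the full output.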


In [6]:
import gradio as gr
from gliner import GLiNER

# Load the biomedical GLiNER model once, rather than on every request
bio_ner_model = GLiNER.from_pretrained("urchade/gliner_large_bio-v0.1")

# Entity labels and the colour used to highlight each one
LABEL_COLORS = {
    "patient_age": "blue",
    "gender": "green",
    "test": "orange",
    "doctor": "red",
    "admission_date": "purple",
    "date": "yellow",
    "symptoms": "cyan",
    "drug": "magenta",
    "problem": "grey",
    "bodypart": "black",
    "disease": "brown",
}

def highlight_entities(text):
    # Predict entities for the labels defined above
    entities = bio_ner_model.predict_entities(text, list(LABEL_COLORS.keys()))

    # Process entities from last to first so that inserting markup does not
    # shift the character offsets of the entities still to be handled
    entities.sort(key=lambda x: x["start"], reverse=True)

    highlighted_text = text

    # Wrap each entity in a coloured <mark> and append its label
    for entity in entities:
        color = LABEL_COLORS[entity["label"]]
        highlighted_text = (
            highlighted_text[:entity["start"]]
            + f"<mark style='background-color:{color}'>{entity['text']}</mark>"
            + f" <span style='color:{color}'>[{entity['label']}]</span> "
            + highlighted_text[entity["end"]:]
        )

    return highlighted_text

iface = gr.Interface(
    fn=highlight_entities,
    inputs="text",
    outputs="html",
    title="Biomedical NER Highlighting App",
    description="Input text and see named entities highlighted with labels.",
)
iface.launch(share=True, debug=True)
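
The highlighting function above splices markup directly into the raw input, so a character such as `<` in the text (common in clinical notes, e.g. "pH < 7.3") would be interpreted as HTML. The sketch below is one possible variant that escapes the text while building the output from left to right. The name `render_highlights` and the sample input are hypothetical; the entity dicts are assumed to carry `start`, `end`, and `label` keys, as used in the cell above.

```python
import html

def render_highlights(text, entities, colors):
    """Return HTML with each entity span wrapped in <mark>, escaping all raw text."""
    pieces = []
    cursor = 0  # position in `text` up to which output has been emitted
    for entity in sorted(entities, key=lambda e: e["start"]):
        if entity["start"] < cursor:
            continue  # skip spans that overlap one already rendered
        color = colors.get(entity["label"], "lightgrey")
        label = html.escape(entity["label"])
        # Escaped plain text between the previous entity and this one
        pieces.append(html.escape(text[cursor:entity["start"]]))
        # The entity itself, escaped and wrapped in a coloured <mark>
        span = html.escape(text[entity["start"]:entity["end"]])
        pieces.append(f"<mark style='background-color:{color}'>{span}</mark>")
        pieces.append(f" <span style='color:{color}'>[{label}]</span> ")
        cursor = entity["end"]
    # Escaped remainder after the last entity
    pieces.append(html.escape(text[cursor:]))
    return "".join(pieces)

# Hypothetical input; the offsets are computed rather than predicted by a model.
sample_text = "pH < 7.3 after metformin"
drug_start = sample_text.index("metformin")
sample_entities = [
    {"start": drug_start, "end": drug_start + len("metformin"), "label": "drug"}
]

rendered = render_highlights(sample_text, sample_entities, {"drug": "magenta"})
print(rendered)
```

Because the output is assembled from slices of the original text, the character offsets stay valid without sorting in reverse.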




