<a href="https://colab.research.google.com/github/LxYuan0420/nlp/blob/main/notebooks/Inference_with_Open_Type_Zeroshot_NER_GLiNER.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

GLiNER is a Named Entity Recognition (NER) model built on a bidirectional transformer encoder similar to BERT. Unlike traditional NER models, which are restricted to a predefined set of entity types, GLiNER extracts whatever entity types you name at inference time, making it a practical alternative to resource-heavy Large Language Models. It runs efficiently on CPUs, so it stays usable in environments with limited GPU resources. For more information, see the [GLiNER paper](https://arxiv.org/abs/2311.08526) and the [GitHub repository](https://github.com/urchade/GLiNER). Several model versions are available, including multilingual ones, released under either the Apache-2.0 or the cc-by-nc-4.0 license depending on the version.

In [None]:
!pip install gliner

In [2]:
from gliner import GLiNER

# Initialize the GLiNER model with a pretrained version
model = GLiNER.from_pretrained("urchade/gliner_large-v2.1")

# Define the text containing detailed biographical information about Albert Einstein
text = """
Albert Einstein (14 March 1879 – 18 April 1955) was a German-born theoretical \
physicist who developed the theory of relativity, one of the two pillars of modern \
physics (alongside quantum mechanics). His work is also known for its influence \
on the philosophy of science. He is best known for his mass–energy equivalence \
formula E = mc^2, which has been dubbed "the world's most famous equation". \
He received the 1921 Nobel Prize in Physics "for his services to Theoretical Physics, \
and especially for his discovery of the law of the photoelectric effect", a \
pivotal step in the development of quantum theory.
"""

# Specify the labels of interest for entity extraction
labels = ["person", "award", "date", "scientific_concept", "quote"]

# Use the model to predict entities based on the provided text and labels
entities = model.predict_entities(text, labels)

# Print each identified entity along with its associated label
for entity in entities:
    print(entity["text"], "=>", entity["label"])


Albert Einstein => person
14 March 1879 => date
18 April 1955 => date
theory of relativity => scientific_concept
quantum mechanics => scientific_concept
the world's most famous equation => quote
1921 => date
Nobel Prize in Physics => award
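
Since `predict_entities` returns a list of dictionaries, the results are easy to post-process. Below is a minimal sketch that groups the extracted spans by label. The `group_by_label` helper is hypothetical (it is not part of GLiNER), and the sample list is hard-coded to mirror the output above so the snippet runs without loading the model.

```python
from collections import defaultdict


def group_by_label(entities):
    """Map each label to the list of entity texts, keeping prediction order."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity["label"]].append(entity["text"])
    return dict(grouped)


# Hard-coded stand-in for model.predict_entities(text, labels).
# Only the "text" and "label" keys used in the cell above are assumed.
sample_entities = [
    {"text": "Albert Einstein", "label": "person"},
    {"text": "14 March 1879", "label": "date"},
    {"text": "18 April 1955", "label": "date"},
    {"text": "theory of relativity", "label": "scientific_concept"},
    {"text": "quantum mechanics", "label": "scientific_concept"},
    {"text": "the world's most famous equation", "label": "quote"},
    {"text": "1921", "label": "date"},
    {"text": "Nobel Prize in Physics", "label": "award"},
]

grouped = group_by_label(sample_entities)
for label, texts in grouped.items():
    print(label, "=>", texts)
```

In the notebook itself, `group_by_label(entities)` can be called directly on the list returned by the model.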


#### Multilingual example

##### Chinese example

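The results are noticeably weaker here: only the person names and one date range are extracted, and the Nobel Prize span is mislabeled as a date. This is probably because Chinese text is not whitespace-delimited and would need word segmentation first.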

In [4]:
from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_multi")

text = """
阿尔伯特·爱因斯坦（Albert Einstein，1879年3月14日至1955年4月18日）是一位出生于德国的理论物理学家，他发展了相对论，这是现代物理学的两大支柱之一（与量子力学并列）。 他的作品还因其对科学哲学的影响而闻名。 他最著名的是质能等价公式 E = mc^2，该公式被称为“世界上最著名的方程”。 他因“对理论物理学的贡献，特别是发现光电效应定律”而获得 1921 年诺贝尔物理学奖，这是量子理论发展的关键一步。
"""

labels = ["人物", "奖项", "日期", "科学概念", "引用"]

entities = model.predict_entities(text, labels)

for entity in entities:
    print(entity["text"], "=>", entity["label"])


阿尔伯特·爱因斯坦 => 人物
Albert Einstein => 人物
1879年3月14日至1955年4月18日 => 日期
1921 年诺贝尔物理学奖 => 日期
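
One rough way to test the segmentation hypothesis is to put spaces between Chinese characters before calling the model, so that each character can be treated as its own token. The sketch below is only an experiment under stated assumptions: `space_out_cjk` is a hypothetical helper, it covers only the basic CJK Unified Ideographs block, and whether it actually improves GLiNER's predictions has not been verified here. A proper word segmenter may well be a better choice.

```python
import re

# Matches one character from the basic CJK Unified Ideographs block.
# Assumption: this range is enough for the sample sentence above.
CJK_CHAR = re.compile(r"([\u4e00-\u9fff])")


def space_out_cjk(text: str) -> str:
    """Surround each CJK ideograph with spaces, then collapse repeated spaces."""
    spaced = CJK_CHAR.sub(r" \1 ", text)
    return re.sub(r" {2,}", " ", spaced).strip()


sample = "爱因斯坦获得 1921 年诺贝尔物理学奖"
print(space_out_cjk(sample))
```

Note that the spaced-out text changes character offsets, and any extracted spans would contain the inserted spaces, which would need to be removed before comparing them with the original sentence.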


##### Malay example

In [5]:
text = """
Albert Einstein (14 Mac 1879 - 18 April 1955) ialah seorang ahli fizik teori kelahiran Jerman yang membangunkan teori relativiti, salah satu daripada dua tiang fizik moden (bersama mekanik kuantum). Karya beliau juga terkenal dengan pengaruhnya terhadap falsafah sains. Beliau terkenal dengan formula kesetaraan jisim-tenaga E = mc^2, yang telah digelar sebagai "persamaan paling terkenal di dunia". Beliau menerima Hadiah Nobel dalam Fizik 1921 "atas perkhidmatannya kepada Fizik Teori, dan terutamanya untuk penemuannya tentang undang-undang kesan fotoelektrik", satu langkah penting dalam pembangunan teori kuantum.
"""

# English equivalents of the Malay labels below:
# labels = ["person", "award", "date", "scientific_concept", "quote"]
labels = ["orang", "anugerah", "tarikh", "konsep_ilmiah", "petikan"]

entities = model.predict_entities(text, labels)

for entity in entities:
    print(entity["text"], "=>", entity["label"])


Albert Einstein => orang
14 Mac 1879 => tarikh
18 April 1955 => tarikh
fizik teori => konsep_ilmiah
teori relativiti => konsep_ilmiah
fizik moden => konsep_ilmiah
mekanik kuantum => konsep_ilmiah
falsafah sains => konsep_ilmiah
formula kesetaraan jisim-tenaga E = mc^2 => konsep_ilmiah
persamaan paling terkenal di dunia => konsep_ilmiah
Hadiah Nobel dalam Fizik => anugerah
Fizik Teori => konsep_ilmiah
undang-undang kesan fotoelektrik => konsep_ilmiah
teori kuantum => konsep_ilmiah


##### Malay sentence with English labels

In [6]:
text = """
Albert Einstein (14 Mac 1879 - 18 April 1955) ialah seorang ahli fizik teori kelahiran Jerman yang membangunkan teori relativiti, salah satu daripada dua tiang fizik moden (bersama mekanik kuantum). Karya beliau juga terkenal dengan pengaruhnya terhadap falsafah sains. Beliau terkenal dengan formula kesetaraan jisim-tenaga E = mc^2, yang telah digelar sebagai "persamaan paling terkenal di dunia". Beliau menerima Hadiah Nobel dalam Fizik 1921 "atas perkhidmatannya kepada Fizik Teori, dan terutamanya untuk penemuannya tentang undang-undang kesan fotoelektrik", satu langkah penting dalam pembangunan teori kuantum.
"""

# English labels
labels = ["person", "award", "date", "scientific_concept", "quote"]

entities = model.predict_entities(text, labels)

for entity in entities:
    print(entity["text"], "=>", entity["label"])


Albert Einstein => person
14 Mac 1879 => date
18 April 1955 => date
fizik teori => scientific_concept
teori relativiti => scientific_concept
mekanik kuantum => scientific_concept
falsafah sains => scientific_concept
formula kesetaraan jisim-tenaga E = mc^2 => scientific_concept
persamaan paling terkenal di dunia => scientific_concept
Hadiah Nobel dalam Fizik => award
Fizik Teori => scientific_concept
undang-undang kesan fotoelektrik => scientific_concept
teori kuantum => scientific_concept
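
To see how much the label language matters, the two Malay runs can be compared span by span. The sketch below hard-codes the spans printed above (so it runs without the model). `compare_runs` is a hypothetical helper, not a GLiNER function.

```python
# Spans copied from the run with Malay labels.
malay_label_run = [
    "Albert Einstein", "14 Mac 1879", "18 April 1955", "fizik teori",
    "teori relativiti", "fizik moden", "mekanik kuantum", "falsafah sains",
    "formula kesetaraan jisim-tenaga E = mc^2",
    "persamaan paling terkenal di dunia", "Hadiah Nobel dalam Fizik",
    "Fizik Teori", "undang-undang kesan fotoelektrik", "teori kuantum",
]

# Spans copied from the run with English labels.
english_label_run = [
    "Albert Einstein", "14 Mac 1879", "18 April 1955", "fizik teori",
    "teori relativiti", "mekanik kuantum", "falsafah sains",
    "formula kesetaraan jisim-tenaga E = mc^2",
    "persamaan paling terkenal di dunia", "Hadiah Nobel dalam Fizik",
    "Fizik Teori", "undang-undang kesan fotoelektrik", "teori kuantum",
]


def compare_runs(run_a, run_b):
    """Return the spans shared by both runs and those unique to each."""
    spans_a, spans_b = set(run_a), set(run_b)
    return {
        "shared": sorted(spans_a & spans_b),
        "only_a": sorted(spans_a - spans_b),
        "only_b": sorted(spans_b - spans_a),
    }


diff = compare_runs(malay_label_run, english_label_run)
print("Only with Malay labels:", diff["only_a"])
print("Only with English labels:", diff["only_b"])
```

For this sentence the two runs differ only in "fizik moden", which appears with the Malay labels but not with the English ones. In both runs the quoted phrase is tagged as a scientific concept rather than a quote, and the year 1921 is not extracted.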
