In [1]:
!pip install gliner2
!pip install pandas
!pip install matplotlib



In [8]:
from gliner_to_labelstudio import (
    load_gliner_schema_config,
    create_gliner_schema_from_config,
    create_gliner_schema_from_config_file,
    get_schema_metadata
)

SCHEMA_CONFIG_PATH = r'C:\Users\Heike\PycharmProjects\ghentcdh-glinerv2-tutorial\GLiNER2_Latin_tests\gliner_schema_hagiographics.json'

schema_config = load_gliner_schema_config(SCHEMA_CONFIG_PATH)


In [41]:
import json

schema = create_gliner_schema_from_config_file(extractor, SCHEMA_CONFIG_PATH)

text = """
[13] Ferunt quidam, quod post collocatum monasterium ac collectam congregationem atque regulariter instructam, per sex annos peregerit vitam, & regulariter atque irreprehensibiliter vixerit, & ceteris se sequentibus imitabilem ostenderit: quod modo sacratissimis ejus virtutibus declaratum est. Post sex vero annos incorrupta migravit ad Dominum suum, inviolabilem sponsum; cui se voverat, & multis temporibus juncta esse cupiebat. Dicunt quidam, quod in totum vitæ suæ cursum triginta annorum impleverit. Postea quoque præcepit se ferri ad locum, quem in illo tempore ad Sanctos vocabant Apostolos, ibique se jussit sepeliri.
""" #sentence from Label Studio ID 2834

results = extractor.extract(text, schema, threshold=0.1, include_confidence=True, include_spans=True, format_results=False)
print(json.dumps(results, indent=2, ensure_ascii=False))

#human labeled entities:
# - "vir" => divine_entity
# - "duo pueri" => group
# - "sanctæ Virgini" => person
# - "velum" => object

{
  "entities": [
    {
      "person": [
        {
          "text": "Apostolos",
          "confidence": 0.6441665291786194,
          "start": 590,
          "end": 599
        },
        {
          "text": "Dominum suum",
          "confidence": 0.601327121257782,
          "start": 339,
          "end": 351
        }
      ],
      "group": [
        {
          "text": "congregationem",
          "confidence": 0.6025076508522034,
          "start": 67,
          "end": 81
        },
        {
          "text": "Apostolos",
          "confidence": 0.5861753821372986,
          "start": 590,
          "end": 599
        }
      ],
      "institution": [
        {
          "text": "monasterium",
          "confidence": 0.9041874408721924,
          "start": 42,
          "end": 53
        }
      ],
      "place": [
        {
          "text": "locum",
          "confidence": 0.529434084892273,
          "start": 542,
          "end": 547
        }
      ],
      "object": [],
   
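The nested output is hard to scan by eye. Below is a minimal sketch that flattens it into one row per extracted span, sorted by position in the text. It assumes the structure shown above (a top-level `"entities"` list whose items map each label to a list of span dicts). The helper name `flatten_entities` is hypothetical, and the sample is copied from a few of the entries printed above.

```python
def flatten_entities(results):
    """Turn the nested extraction output into a flat list of rows."""
    rows = []
    for block in results.get("entities", []):
        for label, spans in block.items():
            for span in spans:
                rows.append({
                    "label": label,
                    "text": span["text"],
                    "confidence": round(span["confidence"], 2),
                    "start": span["start"],
                    "end": span["end"],
                })
    # Sort by character offset so the rows follow the reading order of the text.
    return sorted(rows, key=lambda row: row["start"])


# Sample copied from the output above (subset of labels only).
sample = {
    "entities": [
        {
            "person": [
                {"text": "Apostolos", "confidence": 0.6441665291786194, "start": 590, "end": 599},
                {"text": "Dominum suum", "confidence": 0.601327121257782, "start": 339, "end": 351},
            ],
            "institution": [
                {"text": "monasterium", "confidence": 0.9041874408721924, "start": 42, "end": 53},
            ],
            "object": [],
        }
    ]
}

rows = flatten_entities(sample)
for row in rows:
    print(f'{row["start"]:>4}-{row["end"]:<4} {row["label"]:<12} {row["text"]!r} ({row["confidence"]})')
```

Since pandas is installed in the first cell, `pd.DataFrame(rows)` would turn the same rows into a table for further filtering.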

# text nr 1
In this first example, GLiNER2 extracts roughly the same entities as the human annotator, with some differences in span boundaries and labels.

The model extracts "vir quidam ignotus" as a person, while the human annotator labeled only "vir", as a divine entity. The annotator's choice is plausible because the divine nature of the "vir" is implied by his "angelico vultu" (angelic face), a common attribute of divine entities in hagiographic texts. The model probably treats this as an attribute of a person rather than as a reference to a divine entity.

The model also extracts "duo pueri" as both a person and a group, while the human annotator labeled it only as a group. This could be because the model recognizes that "duo pueri" refers to two individuals, while the human annotator may have focused on the collective aspect of the phrase. However, the model assigned the group label a higher confidence (0.58) than the person label (0.53), which suggests that it also recognizes the collective aspect.

The model also labels "sanctæ Virgini" as both a person and a divine entity, while the human annotator labeled it only as a person. Additionally, the model labels "cunctis" as a divine entity, which the human annotator did not label at all.
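When the model assigns several labels to the same mention, one simple way to compare it with a single-label human annotation is to keep only the highest-confidence label per mention. The sketch below illustrates that rule on the confidences reported for this sentence. The rule itself and the helper name `top_label_per_mention` are assumptions for illustration, not part of GLiNER2 or of the annotation guidelines.

```python
from collections import defaultdict

# (mention, label, confidence) as reported for the sentence from Label Studio ID 2831
predictions = [
    ("vir quidam ignotus", "person", 0.71),
    ("sanctæ Virgini", "person", 0.69),
    ("duo pueri", "person", 0.53),
    ("duo pueri", "group", 0.58),
    ("velum", "object", 0.72),
    ("sanctæ Virgini", "divine_entity", 0.73),
    ("cunctis", "divine_entity", 0.60),
]


def top_label_per_mention(predictions):
    """Keep only the highest-confidence label for each distinct mention."""
    by_mention = defaultdict(list)
    for mention, label, confidence in predictions:
        by_mention[mention].append((confidence, label))
    # max() compares the confidence first, so the most confident label wins.
    return {mention: max(candidates)[1] for mention, candidates in by_mention.items()}


best = top_label_per_mention(predictions)
for mention, label in best.items():
    print(f"{mention!r} => {label}")
```

Under this rule "duo pueri" resolves to group, which agrees with the human label, while "sanctæ Virgini" resolves to divine_entity, which does not.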

In [None]:
"""
Instante itaque jam septima die, quæ prima feria, atque Dominicus appellatur dies, vir quidam ignotus angelico vultu advenit, quem sequebantur duo pueri, qui sanctæ Virgini, ob violentiæ metum juxta altare positæ, velum sanctæ religionis detulit, atque mox velatam coram cunctis reliquit.
""" #sentence from Label Studio ID 2831

#human labeled entities:
# - "vir" => divine_entity
# - "duo pueri" => group
# - "sanctæ Virgini" => person
# - "velum" => object

#model extracted entities:
# - "vir quidam ignotus" => person (confidence: 0.71)
# - "sanctæ Virgini" => person (confidence: 0.69)
# - "duo pueri" => person (confidence: 0.53)
# - "duo pueri" => group (confidence: 0.58)
# - "velum" => object (confidence: 0.72)
# - "sanctæ Virgini" => divine_entity (confidence: 0.73)
# - "cunctis" => divine_entity (confidence: 0.60)
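The comparison between the two lists can also be made mechanical. The following is a rough sketch, not an established evaluation protocol: it classifies each human label as an exact match, a label mismatch, a partial match, or a miss, and lists mentions only the model found. Because no character offsets are listed for this sentence, it approximates span overlap by shared whitespace-separated tokens, which is an assumption. The function names are hypothetical.

```python
human = [
    ("vir", "divine_entity"),
    ("duo pueri", "group"),
    ("sanctæ Virgini", "person"),
    ("velum", "object"),
]
model = [
    ("vir quidam ignotus", "person", 0.71),
    ("sanctæ Virgini", "person", 0.69),
    ("duo pueri", "person", 0.53),
    ("duo pueri", "group", 0.58),
    ("velum", "object", 0.72),
    ("sanctæ Virgini", "divine_entity", 0.73),
    ("cunctis", "divine_entity", 0.60),
]


def tokens(mention):
    """Whitespace tokens of a mention (stand-in for real character offsets)."""
    return set(mention.split())


def compare(human, model):
    report = {}
    for h_text, h_label in human:
        same_span = [label for text, label, _ in model if text == h_text]
        overlapping = [text for text, _, _ in model
                       if text != h_text and tokens(text) & tokens(h_text)]
        if h_label in same_span:
            status = "exact"
        elif same_span:
            status = "label_mismatch"
        elif overlapping:
            status = "partial"
        else:
            status = "missed"
        extra = sorted(label for label in same_span if label != h_label)
        report[h_text] = {"status": status, "extra_labels": extra}
    # Mentions that share no token with any human-labeled mention.
    human_tokens = set().union(*(tokens(text) for text, _ in human))
    model_only = sorted({text for text, _, _ in model if not tokens(text) & human_tokens})
    return report, model_only


report, model_only = compare(human, model)
for mention, info in report.items():
    print(mention, info)
print("model only:", model_only)
```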


# text nr 2
In this second example, the model extracts more entities than the human annotator, but none of them match the human annotations.

**Important question:** is it possible that F. only wants entities labeled inside text fragments that are marked as 'events'? In this text, only the sentence 'Quod ... decernunt' is labeled as an event, while the rest of the text is not labeled at all. If so, the mismatch would be expected: the model extracts entities from the entire text, while the human annotator only labels entities inside the event sentence.

In [None]:
"""
[11] Nemo tamen dixit, quid quærerent, quidve egissent; sed omnes magno detinebantur terrore, & nesciebant, quid, imminente terrore, facere potuissent. Quod cuncti cernentes atque scientes manibus factum angelicis, non jam persequendam, ut impiam, sed venerandam, ut Sanctam, decernunt. Et prostrati omnes veniam postulant; ut eis sacra virgo & Deo dilecta Glodesindis indulgentiam tribueret; ac sic ei reconciliati sunt, atque veniam impetrarunt. Audiant hoc virgines, quæ sui corporis quantulacumque
""" #sentence from Label Studio ID 2832

#human labeled entities:
# - "cuncti" => group
# - "Sanctam" => person

#model extracted entities:
# - "Glodesindis" => person (confidence: 0.65)
# - "virgines" => person (confidence: 0.56)
# - "virgines" => group (confidence: 0.75)
# - "prostrati" => group (confidence: 0.69)
# - "virgo" => divine_entity (confidence: 0.92)
# - "Deo" => divine_entity (confidence: 0.77)
# - "Glodesindis" => divine_entity (confidence: 0.57)
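The question raised for text nr 2 (whether only entities inside the 'event' sentence should count) can be checked with a small span filter. The sketch below is illustrative only. It assumes the event runs from "Quod cuncti" to "decernunt.", and it locates each mention by its first occurrence in the text because no offsets are listed for this sentence. Both `locate` and `inside` are hypothetical helpers.

```python
text = (
    "[11] Nemo tamen dixit, quid quærerent, quidve egissent; sed omnes magno "
    "detinebantur terrore, & nesciebant, quid, imminente terrore, facere potuissent. "
    "Quod cuncti cernentes atque scientes manibus factum angelicis, non jam "
    "persequendam, ut impiam, sed venerandam, ut Sanctam, decernunt. "
    "Et prostrati omnes veniam postulant; ut eis sacra virgo & Deo dilecta "
    "Glodesindis indulgentiam tribueret; ac sic ei reconciliati sunt, atque veniam "
    "impetrarunt. Audiant hoc virgines, quæ sui corporis quantulacumque"
)


def locate(text, mention):
    """Return (start, end) of the first occurrence of the mention.

    Assumption: the first occurrence is the one that was labeled.
    """
    start = text.find(mention)
    if start < 0:
        raise ValueError(f"{mention!r} not found in text")
    return start, start + len(mention)


def inside(span, window):
    """True if the span lies completely within the window."""
    return window[0] <= span[0] and span[1] <= window[1]


# Assumed event boundaries: the sentence 'Quod ... decernunt.'
event = (text.find("Quod cuncti"), text.find("decernunt.") + len("decernunt."))

human_mentions = ["cuncti", "Sanctam"]
model_mentions = ["Glodesindis", "virgines", "prostrati", "virgo", "Deo"]

human_in_event = [m for m in human_mentions if inside(locate(text, m), event)]
model_in_event = [m for m in model_mentions if inside(locate(text, m), event)]

print("human labels inside event:", human_in_event)
print("model labels inside event:", model_in_event)
```

With these assumed boundaries, both human labels fall inside the event sentence and none of the model's mentions do. That is consistent with the hypothesis above, but it does not confirm it.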