In [1]:
!pip install gliner2
!pip install pandas
!pip install matplotlib



In [24]:
from gliner_to_labelstudio import (
    load_gliner_schema_config,
    create_gliner_schema_from_config,
    create_gliner_schema_from_config_file,
    get_schema_metadata
)

SCHEMA_CONFIG_PATH = r'C:\Users\Heike\PycharmProjects\ghentcdh-glinerv2-tutorial\GLiNER2_Latin_tests\gliner_schema_hagiographics.json'

schema_config = load_gliner_schema_config(SCHEMA_CONFIG_PATH)


In [27]:
import json
from gliner2 import GLiNER2

extractor = GLiNER2.from_pretrained("fastino/gliner2-multi-v1")

schema = create_gliner_schema_from_config_file(extractor, SCHEMA_CONFIG_PATH)

text = """
[7] Denique pater prædictæ sacræ Virginis voluit eam alteri sociare viro in conjugium. Illa hoc totis nisibus respuens; atque appropinquante interea die nuptiarum, in qua tradenda erat beata [Col. 0204B] Virgo sponso corruptibili, sponso integritatis inimico, qui, [post] paululum putredo, vermis, & pulvis erat futurus; hostia jam Domino facta, talem sponsum contemnens, pro Christo patrem parentesque relinquens & sponsum, soli Deo famulari desiderabat. Pater quoque jam dictæ sacræ Virginis volens eam ad Treverensium secum ducere civitatem ad sororem suam, nomine Rotlindam, quæ ibidem illo tempore in sancta conversatione manebat, ut vel cum ea inibi blandis ac dulcibus sermonibus ejus mollificaretur animus, quatinus vel sic alteri eam potuisset dare viro in conjugium.
""" #sentence from Label Studio ID 2834

# A low threshold (0.1) also returns low-confidence candidates for inspection
results = extractor.extract(
    text,
    schema,
    threshold=0.1,
    include_confidence=True,
    include_spans=True,
    format_results=False,
)
print(json.dumps(results, indent=2, ensure_ascii=False))


You are using a model of type extractor to instantiate a model of type . This is not supported for all configurations of models and can yield errors.


🧠 Model Configuration
Encoder model      : microsoft/mdeberta-v3-base
Counting layer     : count_lstm
Token pooling      : first
{
  "entities": [
    {
      "person": [
        {
          "text": "Rotlindam",
          "confidence": 0.880836546421051,
          "start": 569,
          "end": 578
        },
        {
          "text": "Pater",
          "confidence": 0.7074596881866455,
          "start": 457,
          "end": 462
        },
        {
          "text": "pater",
          "confidence": 0.5924856662750244,
          "start": 13,
          "end": 18
        }
      ],
      "group": [
        {
          "text": "Treverensium",
          "confidence": 0.5031154155731201,
          "start": 509,
          "end": 521
        }
      ],
      "institution": [],
      "place": [
        {
          "text": "civitatem",
          "confidence": 0.8329572081565857,
          "start": 535,
          "end": 544
        }
      ],
      "object": [
        {
          "text": 

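
The nested output above is hard to scan when several labels are involved. The following is a minimal sketch, assuming the structure shown in that output: a list under `"entities"` whose items map each label to a list of mentions with `text`, `confidence`, `start` and `end`. `flatten_entities` is a hypothetical helper written for this notebook, not part of gliner2, and the sample values are copied from the output above.

```python
def flatten_entities(results):
    """Turn the nested result dict into a flat list of rows, highest confidence first."""
    rows = []
    for label_map in results.get("entities", []):
        for label, mentions in label_map.items():
            for mention in mentions:
                # Copy the mention fields and add the label it was found under
                rows.append({"label": label, **mention})
    return sorted(rows, key=lambda row: row["confidence"], reverse=True)


# Sample values copied from the (truncated) output above
sample = {
    "entities": [
        {
            "person": [
                {"text": "Rotlindam", "confidence": 0.880836546421051, "start": 569, "end": 578},
                {"text": "Pater", "confidence": 0.7074596881866455, "start": 457, "end": 462},
                {"text": "pater", "confidence": 0.5924856662750244, "start": 13, "end": 18},
            ],
            "group": [
                {"text": "Treverensium", "confidence": 0.5031154155731201, "start": 509, "end": 521},
            ],
            "institution": [],
            "place": [
                {"text": "civitatem", "confidence": 0.8329572081565857, "start": 535, "end": 544},
            ],
        }
    ]
}

rows = flatten_entities(sample)
for row in rows:
    print(f'{row["label"]:<12} {row["text"]:<14} {row["confidence"]:.2f} [{row["start"]}:{row["end"]}]')
```

Since pandas is installed at the top of the notebook, the same rows could also be passed to `pandas.DataFrame(rows)` for sorting and filtering.
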
# text nr 1
Here GLiNER2 extracts roughly the same entities as the human annotator, with a few differences.

The model extracts "vir quidam ignotus" as a person, whereas the human annotator labeled only "vir", and as a divine entity. The annotator's choice is defensible: the divine nature of the "vir" is implied by his "angelico vultu" (angelic face), a common attribute of divine entities in hagiographic texts. The model probably treats this as a description of a person rather than as a reference to a divine entity.

The model also extracts "duo pueri" as both a person and a group, whereas the human annotator labeled it only as a group. The model may recognize that "duo pueri" refers to two individuals, while the annotator focused on the collective sense of the phrase. The model does assign the group label a higher confidence (0.58) than the person label (0.53), which suggests that it also picks up on the collective sense.

Finally, the model labels "sanctæ Virgini" as both a person and a divine entity, whereas the human annotator labeled it only as a person. It also labels "cunctis" as a divine entity, which the human annotator did not label at all.

In [None]:
"""
Instante itaque jam septima die, quæ prima feria, atque Dominicus appellatur dies, vir quidam ignotus angelico vultu advenit, quem sequebantur duo pueri, qui sanctæ Virgini, ob violentiæ metum juxta altare positæ, velum sanctæ religionis detulit, atque mox velatam coram cunctis reliquit.
""" #sentence from Label Studio ID 2831

#human labeled entities:
# - "vir" => divine_entity
# - "duo pueri" => group
# - "sanctæ Virgini" => person
# - "velum" => object

#model extracted entities:
# - "vir quidam ignotus" => person (confidence: 0.71)
# - "sanctæ Virgini" => person (confidence: 0.69)
# - "duo pueri" => person (confidence: 0.53)
# - "duo pueri" => group (confidence: 0.58)
# - "velum" => object (confidence: 0.72)
# - "sanctæ Virgini" => divine_entity (confidence: 0.73)
# - "cunctis" => divine_entity (confidence: 0.60)
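
The comparison for this text can also be expressed as set arithmetic. The sketch below is one possible way to score it, assuming strict matching on the pair (text, label); under that assumption "vir" and "vir quidam ignotus" count as different mentions. The variable names are illustrative, and the pairs are copied from the two lists for this text.

```python
# (text, label) pairs from the human annotation
human = {
    ("vir", "divine_entity"),
    ("duo pueri", "group"),
    ("sanctæ Virgini", "person"),
    ("velum", "object"),
}

# (text, label) pairs extracted by the model
model = {
    ("vir quidam ignotus", "person"),
    ("sanctæ Virgini", "person"),
    ("duo pueri", "person"),
    ("duo pueri", "group"),
    ("velum", "object"),
    ("sanctæ Virgini", "divine_entity"),
    ("cunctis", "divine_entity"),
}

exact = human & model    # same text and same label
missed = human - model   # human labels the model did not reproduce
extra = model - human    # model labels without a human counterpart

precision = len(exact) / len(model)
recall = len(exact) / len(human)

print("exact :", sorted(exact))
print("missed:", sorted(missed))
print("extra :", sorted(extra))
print(f"precision={precision:.2f} recall={recall:.2f}")
```

With these lists, three of the four human labels are matched exactly (recall 0.75) and three of the seven model predictions have a human counterpart (precision about 0.43). A more lenient comparison based on overlapping character spans would treat "vir" and "vir quidam ignotus" as the same mention.
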


# text nr 2
In this second example, the model extracts more entities than the human annotator, but none of them match the human annotations.

**Important question:** is it possible that F. only wants labels inside text fragments marked as 'events'? In this text, only the sentence 'Quod ... decernunt' is labeled as an event; the rest of the text carries no labels at all. If so, the mismatch is expected: the model extracts entities from the entire text, while the human annotator labels only entities inside the event sentence.
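
One way to probe this hypothesis is to keep only the mentions whose character span falls inside the event sentence. The following is a rough sketch under several assumptions: the event boundaries are located with `str.find` on an excerpt of the text, the mention offsets are recomputed the same way (first occurrence only) instead of being taken from the model output, and `locate` and `entities_within` are hypothetical helpers.

```python
def locate(text, mention):
    """Return a span dict for the first occurrence of `mention` in `text`."""
    start = text.find(mention)
    if start == -1:
        raise ValueError(f"{mention!r} not found in text")
    return {"text": mention, "start": start, "end": start + len(mention)}


def entities_within(entities, span_start, span_end):
    """Keep entities that lie fully inside the half-open span [span_start, span_end)."""
    return [e for e in entities if e["start"] >= span_start and e["end"] <= span_end]


# Excerpt of text nr 2, starting at the event sentence 'Quod ... decernunt'
text = (
    "Quod cuncti cernentes atque scientes manibus factum angelicis, "
    "non jam persequendam, ut impiam, sed venerandam, ut Sanctam, decernunt. "
    "Et prostrati omnes veniam postulant; ut eis sacra virgo & Deo dilecta "
    "Glodesindis indulgentiam tribueret; ac sic ei reconciliati sunt, "
    "atque veniam impetrarunt. Audiant hoc virgines"
)

event_start = text.find("Quod")
event_end = text.find("decernunt.") + len("decernunt.")

human_mentions = [locate(text, m) for m in ["cuncti", "Sanctam"]]
model_mentions = [
    locate(text, m)
    for m in ["Glodesindis", "virgines", "prostrati", "virgo", "Deo"]
]

human_inside = entities_within(human_mentions, event_start, event_end)
model_inside = entities_within(model_mentions, event_start, event_end)

print("human inside event:", [e["text"] for e in human_inside])
print("model inside event:", [e["text"] for e in model_inside])
```

On this excerpt both human labels fall inside the event sentence and none of the model's mentions do, which is consistent with the hypothesis but does not prove it. In practice the event span and the mention offsets would come from the Label Studio export and from `include_spans=True` respectively.
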

In [None]:
"""
[11] Nemo tamen dixit, quid quærerent, quidve egissent; sed omnes magno detinebantur terrore, & nesciebant, quid, imminente terrore, facere potuissent. Quod cuncti cernentes atque scientes manibus factum angelicis, non jam persequendam, ut impiam, sed venerandam, ut Sanctam, decernunt. Et prostrati omnes veniam postulant; ut eis sacra virgo & Deo dilecta Glodesindis indulgentiam tribueret; ac sic ei reconciliati sunt, atque veniam impetrarunt. Audiant hoc virgines, quæ sui corporis quantulacumque
""" #sentence from Label Studio ID 2832

#human labeled entities:
# cuncti => group
# Sanctam => person

#model extracted entities:
#"Glodesindis" ⇒ person (confidence: 0.65)
#"virgines" ⇒ person (confidence: 0.56)
#"virgines" ⇒ group (confidence: 0.75)
#"prostrati" ⇒ group (confidence: 0.69)
#"virgo" ⇒ divine_entity (confidence: 0.92)
#"Deo" ⇒ divine_entity (confidence: 0.77)
#"Glodesindis" ⇒ divine_entity (confidence: 0.57)

# text nr 3
- Words like 'ei' and 'illa' are difficult to label as 'person'.
- Why is 'ad praedictam amitam suam Rotlindam' labeled as an event within an event?

In [None]:
"""
[13] Ferunt quidam, quod post collocatum monasterium ac collectam congregationem atque regulariter instructam, per sex annos peregerit vitam, & regulariter atque irreprehensibiliter vixerit, & ceteris se sequentibus imitabilem ostenderit: quod modo sacratissimis ejus virtutibus declaratum est. Post sex vero annos incorrupta migravit ad Dominum suum, inviolabilem sponsum; cui se voverat, & multis temporibus juncta esse cupiebat. Dicunt quidam, quod in totum vitæ suæ cursum triginta annorum impleverit. Postea quoque præcepit se ferri ad locum, quem in illo tempore ad Sanctos vocabant Apostolos, ibique se jussit sepeliri.
""" #sentence from Label Studio ID 2834


#model extracted entities:
#- "Apostolos" => person (confidence: 0.64)
#- "Dominum suum" => person (confidence: 0.60)
#- "congregationem" => group (confidence: 0.60)
#- "Apostolos" => group (confidence: 0.59)
#- "monasterium" => institution (confidence: 0.90)
#- "locum" => place (confidence: 0.53)