# GLiREL Notebook: Relationship Extraction on Label Studio Annotations

This notebook demonstrates how to use the **GLiREL model** for relationship extraction (RE) on texts that have been annotated with entities in **Label Studio**.

## Workflow Overview

1. **Load Text & Annotations**: Read the original text and entity annotations from Label Studio JSON export
2. **Prepare GLiREL Input**: Convert Label Studio annotations to GLiREL-compatible format
3. **Relationship Extraction**: Use GLiREL to identify and classify relationships between entities
4. **Analyze Results**: Display and export extracted relationships

## Table of Contents

**Setup & Data Loading**
- [Installation](#installation) - Install dependencies
- [Load Example Data](#load-data) - Read text and Label Studio annotations
- [Data Exploration](#explore-data) - Understand the structure

**Data Preparation**
- [Convert LS to GLiREL Format](#convert-format) - Prepare input for GLiREL model

**Relationship Extraction**
- [Extract Relations](#extract-relations) - Run GLiREL on prepared data

---

## 1. Installation {#installation}

Install required packages for relationship extraction with GLiREL:


In [4]:
!pip install gliner

Collecting transformers>=4.57.3
  Downloading transformers-5.1.0-py3-none-any.whl (10.3 MB)
     ---------------------------------------- 0.0/10.3 MB ? eta -:--:--
     --- ------------------------------------ 0.8/10.3 MB 25.3 MB/s eta 0:00:01
     -------- ------------------------------- 2.2/10.3 MB 23.6 MB/s eta 0:00:01
     ------------ --------------------------- 3.3/10.3 MB 26.5 MB/s eta 0:00:01
     ----------------- ---------------------- 4.4/10.3 MB 23.5 MB/s eta 0:00:01
     --------------------- ------------------ 5.6/10.3 MB 25.6 MB/s eta 0:00:01
     -------------------------- ------------- 6.9/10.3 MB 25.7 MB/s eta 0:00:01
     ------------------------------- -------- 8.1/10.3 MB 24.5 MB/s eta 0:00:01
     ------------------------------------ --- 9.3/10.3 MB 25.9 MB/s eta 0:00:01
     --------------------------------------  10.3/10.3 MB 25.1 MB/s eta 0:00:01
     --------------------------------------- 10.3/10.3 MB 23.3 MB/s eta 0:00:00
Installing collected packages: trans


[notice] A new release of pip is available: 23.0.1 -> 26.0
[notice] To update, run: python.exe -m pip install --upgrade pip


## 2. preprocessing input data for GLiREL {#load-data}

If you want to use your own data, you have to make sure that the data is in the correct format. The following code snippet shows how to two conversions:
1. Convert GLiNER2 output to GLiREL input format
2. Convert Label Studio annotations to GLiREL input format

### (1) GLiNER2 output -> GLiREL input {#convert-gliner-to-glirel}
While the GLiNER2 model gives an output that is similar to the GLiREL input format, some adjustments are needed to make it fully compatible. Specifically, we need to convert character-based entity spans to word-based spans.

This code will convert the character spans from the GLiNER2 output to word spans suitable for GLiREL:

In [None]:
import json
import re
from pathlib import Path


def get_word_positions(text: str) -> list:
    """
    Get character positions of each whitespace-separated word.
    Returns: List of tuples: [(word, start_char, end_char), ...]
    """
    words = []
    for match in re.finditer(r'\S+', text):
        words.append((match.group(), match.start(), match.end()))
    return words


def char_span_to_word_span(text: str, start_char: int, end_char: int) -> tuple:
    """
    Convert character span to word span indices (0-indexed, exclusive end).
    """
    words = get_word_positions(text)

    word_start = None
    word_end = None

    for i, (word, w_start, w_end) in enumerate(words):
        # Find first word that overlaps with the character span
        if word_start is None and w_end > start_char and w_start < end_char:
            word_start = i + 1  # inclusive start
        # Find last word that overlaps with the character span
        if w_start < end_char and w_end > start_char:
            word_end = i + 1  # exclusive end

    return word_start, word_end


def add_word_spans_to_labelstudio(data: list) -> list:
    """
    Add word_start and word_end to all entity annotations in Label Studio format.

    Args:
        data: List of Label Studio tasks

    Returns:
        Same structure with word_start and word_end added to each entity value
    """
    for task in data:
        text = task.get("data", {}).get("text", "")
        if not text:
            continue

        # Process annotations
        for annotation in task.get("annotations", []):
            for result in annotation.get("result", []):
                if result.get("type") == "labels":
                    value = result.get("value", {})
                    start_char = value.get("start")
                    end_char = value.get("end")

                    if start_char is not None and end_char is not None:
                        word_start, word_end = char_span_to_word_span(text, start_char, end_char)
                        value["word_start"] = word_start
                        value["word_end"] = word_end

        # Process predictions
        for prediction in task.get("predictions", []):
            for result in prediction.get("result", []):
                if result.get("type") == "labels":
                    value = result.get("value", {})
                    start_char = value.get("start")
                    end_char = value.get("end")

                    if start_char is not None and end_char is not None:
                        word_start, word_end = char_span_to_word_span(text, start_char, end_char)
                        value["word_start"] = word_start
                        value["word_end"] = word_end

    return data


# === MAIN: Load, convert, and save ===

input_file = Path("../labelstudio_batch_20260203_150043.json")
output_file = Path("../Relationship Extraction/gliren_input") / f"{input_file.stem}_with_word_spans.json"

# Load Label Studio JSON
print(f"Loading: {input_file}")
with open(input_file, 'r', encoding='utf-8') as f:
    data = json.load(f)

print(f"Loaded {len(data)} tasks")

# Add word spans
data_with_word_spans = add_word_spans_to_labelstudio(data)

# Count how many entities were processed
total_entities = 0
for task in data_with_word_spans:
    for annotation in task.get("annotations", []):
        total_entities += sum(1 for r in annotation.get("result", []) if r.get("type") == "labels")
    for prediction in task.get("predictions", []):
        total_entities += sum(1 for r in prediction.get("result", []) if r.get("type") == "labels")

print(f"Processed {total_entities} entity annotations")

# Save output
with open(output_file, 'w', encoding='utf-8') as f:
    json.dump(data_with_word_spans, f, indent=2, ensure_ascii=False)

print(f"✓ Saved to: {output_file}")

# Show sample of first entity with word spans
if data_with_word_spans:
    task = data_with_word_spans[0]
    text = task.get("data", {}).get("text", "")
    words = get_word_positions(text)

    # Find first entity to show
    sample = None
    for annotation in task.get("annotations", []):
        for result in annotation.get("result", []):
            if result.get("type") == "labels":
                sample = result
                break
        if sample:
            break
    if not sample:
        for prediction in task.get("predictions", []):
            for result in prediction.get("result", []):
                if result.get("type") == "labels":
                    sample = result
                    break
            if sample:
                break

    if sample:
        v = sample["value"]
        print(f"\n=== Sample entity ===")
        print(f"Text: '{v.get('text')}'")
        print(f"Label: {v.get('labels')}")
        print(f"Char span: [{v.get('start')}:{v.get('end')}]")
        print(f"Word span: [{v.get('word_start')}:{v.get('word_end')}]")

        ws, we = v.get('word_start'), v.get('word_end')
        if ws is not None and we is not None:
            word_texts = [words[i][0] for i in range(ws, min(we, len(words)))]
            print(f"Words: {word_texts}")

### (2) Label Studio output -> GLiREL input {#convert-LS-to-GLiREL}

Transform Label Studio entity annotations into GLiREL-compatible input format. For this, make sure that you export your Label Studio annotations in a **JSON-MIN** format.

This code will read the Label Studio JSON export, convert character spans to word spans, and save the result in a GLiREL-compatible format:


In [33]:
import json
import re
from pathlib import Path

def get_word_positions(text: str) -> list:
    """Get character positions of each whitespace-separated word."""
    words = []
    for match in re.finditer(r'\S+', text):
        words.append((match.group(), match.start(), match.end()))
    return words


def char_span_to_word_span(text: str, start_char: int, end_char: int) -> tuple:
    """Convert character span to word span indices (0-indexed, exclusive end)."""
    words = get_word_positions(text)

    word_start = None
    word_end = None

    for i, (word, w_start, w_end) in enumerate(words):
        if word_start is None and w_end > start_char and w_start < end_char:
            word_start = i+1  # inclusive start
        if w_start < end_char and w_end > start_char:
            word_end = i + 1

    return word_start, word_end


def add_word_spans_simple_format(data: list) -> list:
    """
    Add word_start and word_end to annotations in simple format:
    [{"text": "...", "id": 123, "label": [{"start": 0, "end": 10, "text": "...", "labels": [...]}]}]
    """
    for item in data:
        text = item.get("text", "")
        if not text:
            continue

        for label_item in item.get("label", []):
            start_char = label_item.get("start")
            end_char = label_item.get("end")

            if start_char is not None and end_char is not None:
                word_start, word_end = char_span_to_word_span(text, start_char, end_char)
                label_item["word_start"] = word_start
                label_item["word_end"] = word_end

    return data


# === MAIN: Load, convert, and save ===

input_file = Path("../Relationship Extraction/example_LS_output.json")
output_file = Path("../Relationship Extraction/gliren_input") / f"{input_file.stem}_LS_with_word_spans.json"


# Load JSON
print(f"Loading: {input_file}")
with open(input_file, 'r', encoding='utf-8') as f:
    data = json.load(f)

print(f"Loaded {len(data)} documents")

# Add word spans
data_with_word_spans = add_word_spans_simple_format(data)

# Count entities
total_entities = sum(len(item.get("label", [])) for item in data_with_word_spans)
print(f"Processed {total_entities} entity annotations")

# Save output
with open(output_file, 'w', encoding='utf-8') as f:
    json.dump(data_with_word_spans, f, indent=2, ensure_ascii=False)

print(f"✓ Saved to: {output_file}")

# Show sample
if data_with_word_spans and data_with_word_spans[0].get("label"):
    item = data_with_word_spans[0]
    text = item["text"]
    words = get_word_positions(text)
    label = item["label"][0]

    print(f"\n=== Sample entity ===")
    print(f"Text: '{label.get('text')}'")
    print(f"Label: {label.get('labels')}")
    print(f"Char span: [{label.get('start')}:{label.get('end')}]")
    print(f"Word span: [{label.get('word_start')}:{label.get('word_end')}]")

    ws, we = label.get('word_start'), label.get('word_end')
    if ws is not None and we is not None:
        word_texts = [words[i][0] for i in range(ws, min(we, len(words)))]
        print(f"Words: {word_texts}")

Loading: ..\Relationship Extraction\example_LS_output.json
Loaded 1 documents
Processed 79 entity annotations
✓ Saved to: ..\Relationship Extraction\gliren_input\example_LS_output_LS_with_word_spans.json

=== Sample entity ===
Text: '4 avril 1909'
Label: ['DATE']
Char span: [1767:1779]
Word span: [283:285]
Words: ['avril', '1909.']


---


# 3. Relation Extraction with GLiREL

Now that we have all the entities extracted and formatted, we can proceed to run the GLiREL model to identify relationships between these entities.

### GLiREL labels
To extract the relationships, GLiREL first has to know what types of relationships to look for. Therefore, you have to define the possible head and/or tail entity types and the possible relationship types in the schema file. The schema file should be formatted as follows:
```json
{
  "glirel_labels": {
    "RELATION_NAME_1": {
      "allowed_head": ["ENTITY_TYPE_A"],
      "allowed_tail": ["ENTITY_TYPE_B"]
    },
    "RELATION_NAME_2": {
      "allowed_head": ["ENTITY_TYPE_X", "ENTITY_TYPE_Y"],
      "allowed_tail": ["ENTITY_TYPE_Z"]
    },
    "RELATION_NAME_3": {
      "allowed_head": ["ENTITY_TYPE"],
      "allowed_tail": ["ENTITY_TYPE"]
    },
    "no relation": {}
  }
}
```
The 'gliren_input' folder contains a relations schema file 'gliren_schema_relations.json'. These relations will be used in the following steps.

### installing requirements
To avoid a version mismatch, please install the following package versions:

```json
transformers==4.52.4
huggingface-hub==0.36.1
tokenizers==0.21.4
```


In [5]:
#installing specific package versions
!pip install transformers==4.52.4
!pip install huggingface-hub==0.36.1
!pip install tokenizers==0.21.4

!pip install spacy
!pip install glirel
!pip install loguru

Collecting transformers==4.52.4
  Using cached transformers-4.52.4-py3-none-any.whl (10.5 MB)
Collecting huggingface-hub<1.0,>=0.30.0
  Using cached huggingface_hub-0.36.1-py3-none-any.whl (566 kB)
Collecting tokenizers<0.22,>=0.21
  Using cached tokenizers-0.21.4-cp39-abi3-win_amd64.whl (2.5 MB)
Installing collected packages: huggingface-hub, tokenizers, transformers
  Attempting uninstall: huggingface-hub
    Found existing installation: huggingface_hub 1.4.0
    Uninstalling huggingface_hub-1.4.0:
      Successfully uninstalled huggingface_hub-1.4.0
  Attempting uninstall: tokenizers
    Found existing installation: tokenizers 0.22.2
    Uninstalling tokenizers-0.22.2:
      Successfully uninstalled tokenizers-0.22.2
  Attempting uninstall: transformers
    Found existing installation: transformers 5.1.0
    Uninstalling transformers-5.1.0:
      Successfully uninstalled transformers-5.1.0
Successfully installed huggingface-hub-0.36.1 tokenizers-0.21.4 transformers-4.52.4


ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gliner 0.2.24 requires transformers>=4.57.3, but you have transformers 4.52.4 which is incompatible.

[notice] A new release of pip is available: 23.0.1 -> 26.0
[notice] To update, run: python.exe -m pip install --upgrade pip





[notice] A new release of pip is available: 23.0.1 -> 26.0
[notice] To update, run: python.exe -m pip install --upgrade pip





[notice] A new release of pip is available: 23.0.1 -> 26.0
[notice] To update, run: python.exe -m pip install --upgrade pip





[notice] A new release of pip is available: 23.0.1 -> 26.0
[notice] To update, run: python.exe -m pip install --upgrade pip





[notice] A new release of pip is available: 23.0.1 -> 26.0
[notice] To update, run: python.exe -m pip install --upgrade pip





[notice] A new release of pip is available: 23.0.1 -> 26.0
[notice] To update, run: python.exe -m pip install --upgrade pip


### test with GLiREL #1

In [3]:
from glirel import GLiREL
import spacy

model = GLiREL.from_pretrained("jackboyla/glirel-large-v0")

nlp = spacy.load('en_core_web_sm')

text = 'Derren Nesbitt had a history of being cast in "Doctor Who", having played villainous warlord Tegana in the 1964 First Doctor serial "Marco Polo".'
doc = nlp(text)
tokens = [token.text for token in doc]

labels = ['country of origin', 'licensed to broadcast to', 'father', 'followed by', 'characters']

ner = [[26, 27, 'PERSON', 'Marco Polo'], [22, 23, 'Q2989412', 'First Doctor']] # 'type' is not used -- it can be any string!

relations = model.predict_relations(tokens, labels, threshold=0.0, ner=ner, top_k=1)

print('Number of relations:', len(relations))

sorted_data_desc = sorted(relations, key=lambda x: x['score'], reverse=True)

print("\nDescending Order by Score:")

for item in sorted_data_desc:

    print(f"{item['head_text']} --> {item['label']} --> {item['tail_text']} | score: {item['score']}")

#expected output:
"""
Number of relations: 2

Descending Order by Score:
['First', 'Doctor'] --> followed by --> ['Marco', 'Polo'] | score: 0.0028011146932840347
['Marco', 'Polo'] --> followed by --> ['First', 'Doctor'] | score: 0.0027413994539529085
"""



  from .autonotebook import tqdm as notebook_tqdm


Number of relations: 2

Descending Order by Score:
['First', 'Doctor'] --> followed by --> ['Marco', 'Polo'] | score: 0.0028011146932840347
['Marco', 'Polo'] --> followed by --> ['First', 'Doctor'] | score: 0.0027413994539529085


### test with Spacy

In [117]:
import spacy
from glirel import GLiREL

# Load a blank spaCy model or an existing one
nlp = spacy.load('en_core_web_sm')

# Add the GLiREL component to the pipeline
nlp.add_pipe("glirel", after="ner")

# Now you can use the pipeline with the GLiREL component
text = "Apple Inc. was founded by Steve Jobs, Steve Wozniak, and Ronald Wayne in April 1976. The company is headquartered in Cupertino, California."

labels = {"glirel_labels": {
    'co-founder': {"allowed_head": ["PERSON"], "allowed_tail": ["ORG"]},
    'country of origin': {"allowed_head": ["PERSON", "ORG"], "allowed_tail": ["LOC", "GPE"]},
    'licensed to broadcast to': {"allowed_head": ["ORG"]},
    'no relation': {},
    'parent': {"allowed_head": ["PERSON"], "allowed_tail": ["PERSON"]},
    'followed by': {"allowed_head": ["PERSON", "ORG"], "allowed_tail": ["PERSON", "ORG"]},
    'located in or next to body of water': {"allowed_head": ["LOC", "GPE", "FAC"], "allowed_tail": ["LOC", "GPE"]},
    'spouse': {"allowed_head": ["PERSON"], "allowed_tail": ["PERSON"]},
    'child': {"allowed_head": ["PERSON"], "allowed_tail": ["PERSON"]},
    'founder': {"allowed_head": ["PERSON"], "allowed_tail": ["ORG"]},
    'headquartered in': {"allowed_head": ["ORG"], "allowed_tail": ["LOC", "GPE", "FAC"]},
    'acquired by': {"allowed_head": ["ORG"], "allowed_tail": ["ORG", "PERSON"]},
    'subsidiary of': {"allowed_head": ["ORG"], "allowed_tail": ["ORG", "PERSON"]},
    }
}

# Add the labels to the pipeline at inference time
docs = list( nlp.pipe([(text, labels)], as_tuples=True) )
relations = docs[0][0]._.relations

print('Number of relations:', len(relations))

sorted_data_desc = sorted(relations, key=lambda x: x['score'], reverse=True)
print("\nDescending Order by Score:")
for item in sorted_data_desc:
    print(f"{item['head_text']} --> {item['label']} --> {item['tail_text']} | score: {item['score']}")


Number of relations: 5

Descending Order by Score:
['Steve', 'Wozniak'] --> founder --> ['Apple', 'Inc.'] | score: 0.8068649768829346
['Steve', 'Jobs'] --> founder --> ['Apple', 'Inc.'] | score: 0.8051494359970093
['Ronald', 'Wayne'] --> founder --> ['Apple', 'Inc.'] | score: 0.7925519943237305
['Apple', 'Inc.'] --> headquartered in --> ['California'] | score: 0.7537093758583069
['Apple', 'Inc.'] --> headquartered in --> ['Cupertino'] | score: 0.7475748062133789


### test with GLiREL #3: no SpaCy used

In [121]:
from glirel import GLiREL
import spacy

model = GLiREL.from_pretrained("jackboyla/glirel-large-v0")

nlp = spacy.load('en_core_web_sm')

text = 'Derren Nesbitt had a history of being cast in "Doctor Who", having played villainous warlord Tegana in the 1964 First Doctor serial "Marco Polo".'
doc = nlp(text)
tokens = [token.text for token in doc]

labels = ['country of origin', 'licensed to broadcast to', 'father', 'followed by', 'characters']

ner = [[26, 27, 'PERSON', 'Marco Polo'], [22, 23, 'Q2989412', 'First Doctor']] # 'type' is not used -- it can be any string!

relations = model.predict_relations(tokens, labels, threshold=0.0, ner=ner, top_k=1)

print(relations)

print('Number of relations:', len(relations))



sorted_data_desc = sorted(relations, key=lambda x: x['score'], reverse=True)
print("\nDescending Order by Score:")
for item in sorted_data_desc:
    print(f"{item['head_text']} --> {item['label']} --> {item['tail_text']} | score: {item['score']}")

[{'head_pos': [22, 24], 'tail_pos': [26, 28], 'head_text': ['First', 'Doctor'], 'tail_text': ['Marco', 'Polo'], 'label': 'followed by', 'score': 0.0028011146932840347}, {'head_pos': [26, 28], 'tail_pos': [22, 24], 'head_text': ['Marco', 'Polo'], 'tail_text': ['First', 'Doctor'], 'label': 'followed by', 'score': 0.0027413994539529085}]
Number of relations: 2

Descending Order by Score:
['First', 'Doctor'] --> followed by --> ['Marco', 'Polo'] | score: 0.0028011146932840347
['Marco', 'Polo'] --> followed by --> ['First', 'Doctor'] | score: 0.0027413994539529085


In [169]:
from glirel import GLiREL

#model = GLiREL.from_pretrained("jackboyla/glirel-large-v0")

tokens = ["L'assemblée", "désigne", "comme", "scrutateurs", "MM.", "Louis", "Cricquillon,", "administrateur", "délégué", "du", "Comptoir", "commercial", "anversois," , "demeurant", "à", "Anvers,", "67,", "avenue", "des", "Arts,", "et", "Ferdinand", "De", "Bruyn,", "candidat-notaire,", "demeurant", "à", "Schooten."]

# text = "L'assemblée désigne comme scrutateurs MM. Louis Cricquillon, administrateur délégué du Comptoir commercial anversois, demeurant à Anvers, 67, avenue des Arts, et Ferdinand De Bruyn, candidat-notaire, demeurant à Schooten."


labels = {
    "has occupation": {
      "allowed_head": ["PERSON"],
      "allowed_tail": ["OCCUPATION"]
    },
    "resides at": {
      "allowed_head": ["PERSON"],
      "allowed_tail": ["ADDRESS", "CITY"]
  },
}

# Keep original labels dict for constraint checking
labels_and_constraints = labels
# Extract just the relation names for the model
relation_names = list(labels.keys())

ner = [
    [5, 6, "PERSON", "Louis Cricquillon,"],
    [7, 8, "OCCUPATION", "administrateur délégué"],
    [15, 15, "CITY", "Anvers"],
    [20, 24, "ADDRESS", "67 avenue des Arts"],
    [21, 23, "PERSON", "Ferdinand De Bruyn,"],
    [24, 24, "OCCUPATION", "candidat-notaire,"],
    [27, 27, "CITY", "Schooten."]
]

from glirel.modules.utils import constrain_relations_by_entity_type
from types import SimpleNamespace

relations = model.predict_relations(tokens, relation_names, threshold=0.0, ner=ner, top_k=-1)

# Create a mapping of (start, end) -> entity_label from the NER list
# This maps token positions to their entity types for validation
ner_lookup = {(start, end): entity_label for start, end, entity_label, _ in ner}
print(ner_lookup)

# Extract labels_and_constraints from the labels dict (which is still a dict, not a list)
labels_and_constraints = labels

# Filter relations to only include those that match allowed entity types
filtered_relations = []
for item in relations:
    # Get the relation label
    rel_label = item['label']

    # Skip if this relation type is not in constraints
    if rel_label not in labels_and_constraints:
        continue

    # Get the allowed head and tail entity types for this relation
    allowed_head_types = labels_and_constraints[rel_label].get("allowed_head", [])
    allowed_tail_types = labels_and_constraints[rel_label].get("allowed_tail", [])

    # Extract head and tail positions from the relation
    # head_pos and tail_pos are [start, end] in inclusive format
    head_pos = tuple(item['head_pos'])
    tail_pos = tuple(item['tail_pos'])

    # Look up the entity types in the NER dictionary
    head_entity_type = ner_lookup.get(head_pos)
    tail_entity_type = ner_lookup.get(tail_pos)

    # Only include this relation if both head and tail match allowed types
    if head_entity_type and tail_entity_type:
        if head_entity_type in allowed_head_types and tail_entity_type in allowed_tail_types:
            filtered_relations.append(item)

# Display filtered relations
print(f"Total relations found: {len(relations)}")
print(f"Relations matching constraints: {len(filtered_relations)}\n")

for item in filtered_relations:
    print(f"{item['head_text']} --> {item['label']} --> {item['tail_text']} | score: {item['score']}")





{(5, 6): 'PERSON', (7, 8): 'OCCUPATION', (15, 15): 'CITY', (20, 24): 'ADDRESS', (21, 23): 'PERSON', (24, 24): 'OCCUPATION', (27, 27): 'CITY'}
(5, 7)
(5, 7)
(5, 7)
(5, 7)
(5, 7)
(5, 7)
(5, 7)
(5, 7)
(5, 7)
(5, 7)
(5, 7)
(5, 7)
(7, 9)
(7, 9)
(7, 9)
(7, 9)
(7, 9)
(7, 9)
(7, 9)
(7, 9)
(7, 9)
(7, 9)
(7, 9)
(7, 9)
(15, 16)
(15, 16)
(15, 16)
(15, 16)
(15, 16)
(15, 16)
(15, 16)
(15, 16)
(15, 16)
(15, 16)
(15, 16)
(15, 16)
(20, 25)
(20, 25)
(20, 25)
(20, 25)
(20, 25)
(20, 25)
(20, 25)
(20, 25)
(20, 25)
(20, 25)
(20, 25)
(20, 25)
(21, 24)
(21, 24)
(21, 24)
(21, 24)
(21, 24)
(21, 24)
(21, 24)
(21, 24)
(21, 24)
(21, 24)
(21, 24)
(21, 24)
(24, 25)
(24, 25)
(24, 25)
(24, 25)
(24, 25)
(24, 25)
(24, 25)
(24, 25)
(24, 25)
(24, 25)
(24, 25)
(24, 25)
(27, 28)
(27, 28)
(27, 28)
(27, 28)
(27, 28)
(27, 28)
(27, 28)
(27, 28)
(27, 28)
(27, 28)
(27, 28)
(27, 28)
Total relations found: 84
Relations matching constraints: 0



In [105]:
# Real-world example
from glirel.modules.utils import constrain_relations_by_entity_type


text = ["Apple", "Inc", ".", "was", "founded", "by", "Steve", "Jobs", ",", "Steve", "Wozniak", ",", "and", "Ronald", "Wayne", "in", "April", "1976", ".", "The", "company", "is", "headquartered", "in", "Cupertino",",", "California","."]

ner = [
    [0, 1, "ORG", "Apple Inc"],
    [6, 7, "PERSON", "Steve Jobs"],
    [9, 10, "PERSON", "Steve Wozniak"],
    [13, 14, "PERSON", "Ronald Wayne"],
    [25, 25, "GPE", "Cupertino"],
    [27, 27, "GPE", "California"]
]

# ----------------------------------------------------------
# Convert NER list → spaCy‑free "entity spans"
# ----------------------------------------------------------

entities = []

for start, end, label, ent_text in ner:
    entities.append({
        "start": start,
        "end": end + 1,     # end is inclusive in your NER, exclusive in spans
        "label": label,
        "text": ent_text,
        "tokens": text[start:end+1]
    })

print(entities)

# text = "Jack Dorsey's father, Tim Dorsey, is a licensed pilot. Jack met his wife Sarah Paulson in New York in 2003. They have one son, Edward."

labels = {"glirel_labels": {
    'co-founder': {"allowed_head": ["PERSON"], "allowed_tail": ["ORG"]},
    'country of origin': {"allowed_head": ["PERSON", "ORG"], "allowed_tail": ["LOC", "GPE"]},
    'licensed to broadcast to': {"allowed_head": ["ORG"]},
    'no relation': {},
    'parent': {"allowed_head": ["PERSON"], "allowed_tail": ["PERSON"]},
    'followed by': {"allowed_head": ["PERSON", "ORG"], "allowed_tail": ["PERSON", "ORG"]},
    'located in or next to body of water': {"allowed_head": ["LOC", "GPE", "FAC"], "allowed_tail": ["LOC", "GPE"]},
    'spouse': {"allowed_head": ["PERSON"], "allowed_tail": ["PERSON"]},
    'child': {"allowed_head": ["PERSON"], "allowed_tail": ["PERSON"]},
    'founder': {"allowed_head": ["PERSON"], "allowed_tail": ["ORG"]},
    'founded on date': {"allowed_head": ["ORG"], "allowed_tail": ["DATE"]},
    'headquartered in': {"allowed_head": ["ORG"], "allowed_tail": ["LOC", "GPE", "FAC"]},
    'acquired by': {"allowed_head": ["ORG"], "allowed_tail": ["ORG", "PERSON"]},
    'subsidiary of': {"allowed_head": ["ORG"], "allowed_tail": ["ORG", "PERSON"]},
    }
}


def predict_and_show(text, labels):
    text = text
    print(f"Text: {text}")

    tokens = text

    # NOTE: the end index should be inclusive
    print(f"Entities detected: {ner}")

    labels_and_constraints = None
    if isinstance(labels, dict):
        labels = labels["glirel_labels"]
        labels_and_constraints = labels
        labels = list(labels.keys())

    relations = model.predict_relations(tokens, labels, threshold=0.0, ner=ner, top_k=1)

    if isinstance(labels_and_constraints, dict):
        print('Constraining relations by entity type')
        # The constraint util expects spaCy-like ents (objects with .start, .end, .label_)
        # Convert the raw ner list ([start, end_inclusive, label, text]) into simple objects
        from types import SimpleNamespace
        ents_for_constraints = [SimpleNamespace(start=s, end=e+1, label_=lab) for s, e, lab, _ in ner]

        relations = constrain_relations_by_entity_type(ents_for_constraints, labels_and_constraints, relations)

    print('Number of relations:', len(relations))

    sorted_data_desc = sorted(relations, key=lambda x: x['score'], reverse=True)
    print("\nDescending Order by Score:")
    for item in sorted_data_desc:
        print(f"{item['head_text']} --> {item['label']} --> {item['tail_text']} | score: {item['score']}")

predict_and_show(text, labels)

[{'start': 0, 'end': 2, 'label': 'ORG', 'text': 'Apple Inc', 'tokens': ['Apple', 'Inc']}, {'start': 6, 'end': 8, 'label': 'PERSON', 'text': 'Steve Jobs', 'tokens': ['Steve', 'Jobs']}, {'start': 9, 'end': 11, 'label': 'PERSON', 'text': 'Steve Wozniak', 'tokens': ['Steve', 'Wozniak']}, {'start': 13, 'end': 15, 'label': 'PERSON', 'text': 'Ronald Wayne', 'tokens': ['Ronald', 'Wayne']}, {'start': 25, 'end': 26, 'label': 'GPE', 'text': 'Cupertino', 'tokens': [',']}, {'start': 27, 'end': 28, 'label': 'GPE', 'text': 'California', 'tokens': ['.']}]
Text: ['Apple', 'Inc', '.', 'was', 'founded', 'by', 'Steve', 'Jobs', ',', 'Steve', 'Wozniak', ',', 'and', 'Ronald', 'Wayne', 'in', 'April', '1976', '.', 'The', 'company', 'is', 'headquartered', 'in', 'Cupertino', ',', 'California', '.']
Entities detected: [[0, 1, 'ORG', 'Apple Inc'], [6, 7, 'PERSON', 'Steve Jobs'], [9, 10, 'PERSON', 'Steve Wozniak'], [13, 14, 'PERSON', 'Ronald Wayne'], [25, 25, 'GPE', 'Cupertino'], [27, 27, 'GPE', 'California']]
Cons

In [104]:
# Real-world example
from glirel.modules.utils import constrain_relations_by_entity_type


text = ["L'assemblée", "désigne", "comme", "notaire", "MM", ".", "Louis", "Cricquillon"]

ner = [
    [6, 7, "PERSON", "Louis Cricquillon"],
    [3, 3, "OCCUPATION", "notaire"],
]

# ----------------------------------------------------------
# Convert NER list → spaCy‑free "entity spans"
# ----------------------------------------------------------

print(entities)

# text = "Jack Dorsey's father, Tim Dorsey, is a licensed pilot. Jack met his wife Sarah Paulson in New York in 2003. They have one son, Edward."

labels = {"glirel_labels": {
    'co-founder': {"allowed_head": ["PERSON"], "allowed_tail": ["ORG"]},
    'country of origin': {"allowed_head": ["PERSON", "ORG"], "allowed_tail": ["LOC", "GPE"]},
    'licensed to broadcast to': {"allowed_head": ["ORG"]},
    'no relation': {},
    'parent': {"allowed_head": ["PERSON"], "allowed_tail": ["PERSON"]},
    'followed by': {"allowed_head": ["PERSON", "ORG"], "allowed_tail": ["PERSON", "ORG"]},
    'has occupation': {"allowed_head": ["PERSON"], "allowed_tail": ["OCCUPATION"]},
    'located in or next to body of water': {"allowed_head": ["LOC", "GPE", "FAC"], "allowed_tail": ["LOC", "GPE"]},
    'spouse': {"allowed_head": ["PERSON"], "allowed_tail": ["PERSON"]},
    'child': {"allowed_head": ["PERSON"], "allowed_tail": ["PERSON"]},
    'founder': {"allowed_head": ["PERSON"], "allowed_tail": ["ORG"]},
    'founded on date': {"allowed_head": ["ORG"], "allowed_tail": ["DATE"]},
    'headquartered in': {"allowed_head": ["ORG"], "allowed_tail": ["LOC", "GPE", "FAC"]},
    'acquired by': {"allowed_head": ["ORG"], "allowed_tail": ["ORG", "PERSON"]},
    'subsidiary of': {"allowed_head": ["ORG"], "allowed_tail": ["ORG", "PERSON"]},
    }
}


def predict_and_show(text, labels):
    text = text
    print(f"Text: {text}")

    tokens = text

    # NOTE: the end index should be inclusive
    print(f"Entities detected: {ner}")

    labels_and_constraints = None
    if isinstance(labels, dict):
        labels = labels["glirel_labels"]
        labels_and_constraints = labels
        labels = list(labels.keys())

    relations = model.predict_relations(tokens, labels, threshold=0.0, ner=ner, top_k=1)

    if isinstance(labels_and_constraints, dict):
        print('Constraining relations by entity type')
        # The constraint util expects spaCy-like ents (objects with .start, .end, .label_)
        # Convert the raw ner list ([start, end_inclusive, label, text]) into simple objects
        from types import SimpleNamespace
        ents_for_constraints = [SimpleNamespace(start=s, end=e+1, label_=lab) for s, e, lab, _ in ner]

        relations = constrain_relations_by_entity_type(ents_for_constraints, labels_and_constraints, relations)

    print('Number of relations:', len(relations))

    sorted_data_desc = sorted(relations, key=lambda x: x['score'], reverse=True)
    print("\nDescending Order by Score:")
    for item in sorted_data_desc:
        print(f"{item['head_text']} --> {item['label']} --> {item['tail_text']} | score: {item['score']}")

predict_and_show(text, labels)

[{'start': 0, 'end': 2, 'label': 'ORG', 'text': 'Apple Inc', 'tokens': ['Apple', 'Inc']}, {'start': 6, 'end': 8, 'label': 'PERSON', 'text': 'Steve Jobs', 'tokens': ['Steve', 'Jobs']}, {'start': 9, 'end': 11, 'label': 'PERSON', 'text': 'Steve Wozniak', 'tokens': ['Steve', 'Wozniak']}, {'start': 13, 'end': 15, 'label': 'PERSON', 'text': 'Ronald Wayne', 'tokens': ['Ronald', 'Wayne']}, {'start': 25, 'end': 26, 'label': 'GPE', 'text': 'Cupertino', 'tokens': [',']}, {'start': 27, 'end': 28, 'label': 'GPE', 'text': 'California', 'tokens': ['.']}]
Text: ["L'assemblée", 'désigne', 'comme', 'notaire', 'MM', '.', 'Louis', 'Cricquillon']
Entities detected: [[6, 7, 'PERSON', 'Louis Cricquillon'], [3, 3, 'OCCUPATION', 'notaire']]
Constraining relations by entity type
Number of relations: 0

Descending Order by Score:


In [109]:
# Real-world example
from glirel.modules.utils import constrain_relations_by_entity_type

text = ["L'assemblée", "désigne", "comme", "scrutateurs", "MM", ".", "Louis", "Cricquillon", ",", "administrateur", "délégué"]


ner = [
    [6, 7, "PERSON", "Louis Cricquillon"],
    [9, 10, "OCCUPATION", "administrateur délégué"],
]

from types import SimpleNamespace

def ner_to_spacy_like(ner):
    return [
        SimpleNamespace(
            start=start,
            end=end + 1,      # spaCy-style exclusive
            label_=label
        )
        for start, end, label, _ in ner
    ]

ents = ner_to_spacy_like(ner)

# text = "Jack Dorsey's father, Tim Dorsey, is a licensed pilot. Jack met his wife Sarah Paulson in New York in 2003. They have one son, Edward."

labels = {"glirel_labels": {
   "HAS_OCCUPATION": {
      "allowed_head": ["PERSON"],
      "allowed_tail": ["OCCUPATION"]
    }
}}


def predict_and_show(text, labels):
    print(f"Text: {text}")

    tokens = text

    # NOTE: the end index should be inclusive
    print(f"Entities detected: {ner}")

    labels_and_constraints = None
    if isinstance(labels, dict):
        labels = labels["glirel_labels"]
        labels_and_constraints = labels
        labels = list(labels.keys())

    relations = model.predict_relations(tokens, labels, threshold=0.0, ner=ner, top_k=1)

    if isinstance(labels_and_constraints, dict):
        print('Constraining relations by entity type')
        # Create a mapping of (start, end) -> entity_label from the NER list
        ner_lookup = {(start, end): entity_label for start, end, entity_label, _ in ner}

        # Filter relations to only include those that match allowed entity types
        filtered_relations = []
        for item in relations:
            rel_label = item['label']

            # Skip if this relation type is not in constraints
            if rel_label not in labels_and_constraints:
                continue

            # Get the allowed head and tail entity types for this relation
            allowed_head_types = labels_and_constraints[rel_label].get("allowed_head", [])
            allowed_tail_types = labels_and_constraints[rel_label].get("allowed_tail", [])

            # Extract head and tail positions from the relation
            head_pos = tuple(item['head_pos'])
            tail_pos = tuple(item['tail_pos'])

            # Look up the entity types in the NER dictionary
            head_entity_type = ner_lookup.get(head_pos)
            tail_entity_type = ner_lookup.get(tail_pos)

            # Only include this relation if both head and tail match allowed types
            if head_entity_type and tail_entity_type:
                if head_entity_type in allowed_head_types and tail_entity_type in allowed_tail_types:
                    filtered_relations.append(item)

        relations = filtered_relations

    print('Number of relations:', len(relations))

    if relations:
        sorted_data_desc = sorted(relations, key=lambda x: x['score'], reverse=True)
        print("\nDescending Order by Score:")
        for item in sorted_data_desc:
            print(f"{item['head_text']} --> {item['label']} --> {item['tail_text']} | score: {item['score']}")
    else:
        print("No relations found.")

predict_and_show(text, labels)


Text: ["L'assemblée", 'désigne', 'comme', 'scrutateurs', 'MM', '.', 'Louis', 'Cricquillon', ',', 'administrateur', 'délégué']
Entities detected: [[6, 7, 'PERSON', 'Louis Cricquillon'], [9, 10, 'OCCUPATION', 'administrateur délégué']]
Constraining relations by entity type
Number of relations: 1

Descending Order by Score:
['Louis', 'Cricquillon'] --> HAS_OCCUPATION --> ['administrateur', 'délégué'] | score: 0.06594593077898026
