# Tagging Formats

NER tagging schemes define how entity spans are encoded as per-token labels. The formats below differ mainly in how they mark entity boundaries.

---

#### 🔹 BIO (Beginning, Inside, Outside)

* The most common tagging scheme.
* `B-` prefix marks the beginning of an entity.
* `I-` prefix continues the same entity.
* `O` marks tokens outside any named entity.

This format helps identify both the presence and boundary of an entity.
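
As a minimal sketch of how BIO tags encode both presence and boundaries, the decoder below turns a tag sequence back into entity spans. The function name `bio_to_spans` is a hypothetical helper written for this illustration, and the sketch assumes every entity starts with `B-`; a stray `I-` tag is simply ignored.

```python
def bio_to_spans(tags):
    """Decode BIO tags into (label, start, end) tuples with an inclusive end."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # An open entity continues only on an "I-" tag of the same type.
        continues = tag.startswith("I-") and tag[2:] == label
        if start is not None and not continues:
            spans.append((label, start, i - 1))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        # A stray "I-" with no open entity is ignored in this sketch.
    if start is not None:
        spans.append((label, start, len(tags) - 1))
    return spans


tokens = ["EU", "rejects", "German", "call", "to", "boycott", "British", "lamb", "."]
tags = ["B-ORG", "O", "B-MISC", "O", "O", "O", "B-MISC", "O", "O"]
for label, start, end in bio_to_spans(tags):
    print(label, tokens[start:end + 1])
```

The `B-` prefix is what lets the decoder separate two adjacent entities of the same type, which a plain inside/outside labeling could not do.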

---

#### 🔹 IOB / IOB2

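* In **IOB** (also called IOB1), an entity normally starts with `I-`; `B-` is used only when an entity immediately follows another entity of the same type, to mark the boundary between them.
* In **IOB2**, every entity must explicitly begin with `B-`. This is the variant usually meant by "BIO", and it is the one used by the `ner_tags` in the dataset below.

The difference matters when converting or evaluating data: tags written in one variant and decoded under the other can be read as malformed or can yield different entity boundaries.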
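
The conversion below is a rough sketch of the difference in practice. It assumes the IOB1 convention, in which an entity may start with `I-` and `B-` appears only between adjacent entities of the same type, and rewrites such tags so that every entity starts with `B-`. The name `iob1_to_iob2` is a hypothetical helper, not part of any library.

```python
def iob1_to_iob2(tags):
    """Rewrite IOB1 tags so that every entity starts with an explicit B- tag."""
    converted = []
    prev_type = None  # entity type of the previous token, or None after "O"
    for tag in tags:
        if tag == "O":
            converted.append(tag)
            prev_type = None
            continue
        prefix, entity_type = tag.split("-", 1)
        # In IOB1 an I- tag opens an entity when the previous token is "O"
        # or belongs to a different type; IOB2 marks that position with B-.
        if prefix == "I" and entity_type != prev_type:
            prefix = "B"
        converted.append(f"{prefix}-{entity_type}")
        prev_type = entity_type
    return converted


iob1 = ["I-PER", "I-PER", "O", "I-LOC", "B-LOC", "I-ORG"]
print(iob1_to_iob2(iob1))
# ['B-PER', 'I-PER', 'O', 'B-LOC', 'B-LOC', 'B-ORG']
```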

---

#### 🔹 IOBES (Inside, Outside, Begin, End, Single)

* A more expressive scheme:

  * `B-`: beginning of multi-token entity
  * `I-`: inside multi-token entity
  * `E-`: end of multi-token entity
  * `S-`: single-token entity
  * `O`: outside any entity

It gives the model explicit supervision at both entity boundaries and is often used with sequence-labeling models, such as those with a CRF output layer.
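
Because IOBES marks both ends of an entity, only certain tag transitions are legal, and constrained decoders can exploit this. The check below is a small sketch of those rules; `is_valid_iobes` is a hypothetical helper written for this illustration.

```python
def is_valid_iobes(tags):
    """Check that a tag sequence obeys the IOBES transition rules."""
    open_type = None  # type of the multi-token entity currently open, if any
    for tag in tags:
        prefix, _, entity_type = tag.partition("-")
        if prefix in ("B", "S", "O"):
            # These tags may only appear when no multi-token entity is open.
            if open_type is not None:
                return False
            if prefix == "B":
                open_type = entity_type
        elif prefix in ("I", "E"):
            # These tags must continue an open entity of the same type.
            if open_type != entity_type:
                return False
            if prefix == "E":
                open_type = None
        else:
            return False  # unknown prefix
    # A sequence must not end in the middle of an entity.
    return open_type is None


print(is_valid_iobes(["S-ORG", "O", "B-PER", "I-PER", "E-PER"]))  # True
print(is_valid_iobes(["B-PER", "O"]))  # False: the entity is never closed
```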

---

#### 🔹 BILOU (Begin, Inside, Last, Outside, Unit)

* BILOU is functionally equivalent to IOBES; only the tag names differ:

  * `B-`, `I-`, `L-` (last), `O`, `U-` (unit/single token)

Like IOBES, it explicitly marks the first and last token of each multi-token entity and gives single-token entities their own tag.
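
Since the two schemes differ only in prefix names, converting between them can be sketched as a prefix rename. The mapping tables and the `rename_prefixes` helper below are hypothetical names introduced for this example.

```python
# Prefix correspondence between the two schemes ("O" is shared).
IOBES_TO_BILOU = {"B": "B", "I": "I", "E": "L", "S": "U"}
BILOU_TO_IOBES = {bilou: iobes for iobes, bilou in IOBES_TO_BILOU.items()}


def rename_prefixes(tags, mapping):
    """Swap each tag's prefix according to `mapping`, leaving "O" untouched."""
    renamed = []
    for tag in tags:
        if tag == "O":
            renamed.append(tag)
        else:
            prefix, entity_type = tag.split("-", 1)
            renamed.append(f"{mapping[prefix]}-{entity_type}")
    return renamed


iobes = ["O", "S-ORG", "B-PER", "I-PER", "E-PER"]
bilou = rename_prefixes(iobes, IOBES_TO_BILOU)
print(bilou)  # ['O', 'U-ORG', 'B-PER', 'I-PER', 'L-PER']
```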

---

#### 🔹 Span-level Annotation

* Instead of token-wise labels, entities are stored as spans:

  * Each span is defined by a start and end index and a label.
* Common in **question-answering style NER** and MRC-based systems.
* Can represent overlapping or nested entities, which a single token-wise tag sequence cannot encode.
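
As a rough illustration, the sketch below stores spans as dictionaries with token-level `start` and `end` indices (end inclusive) and finds spans that overlap. The sentence, its labels, and the `overlapping_pairs` helper are made up for this example and do not come from the dataset.

```python
from itertools import combinations

# Hypothetical example: "England" (LOC) is nested inside "Bank of England" (ORG).
tokens = ["Bank", "of", "England", "raised", "rates"]
spans = [
    {"start": 0, "end": 2, "label": "ORG"},
    {"start": 2, "end": 2, "label": "LOC"},
]


def overlapping_pairs(spans):
    """Return all pairs of spans that share at least one token."""
    pairs = []
    for a, b in combinations(spans, 2):
        # Two inclusive ranges overlap when each starts before the other ends.
        if a["start"] <= b["end"] and b["start"] <= a["end"]:
            pairs.append((a, b))
    return pairs


for a, b in overlapping_pairs(spans):
    text_a = " ".join(tokens[a["start"]:a["end"] + 1])
    text_b = " ".join(tokens[b["start"]:b["end"] + 1])
    print(f"{a['label']} '{text_a}' overlaps {b['label']} '{text_b}'")
```

A single BIO tag sequence would have to pick one label for the token `England`, whereas the span list keeps both.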

In [None]:
!pip install -q transformers torch accelerate scikit-learn

In [None]:
!pip install -U -q datasets

In [4]:
!pip install -q seqeval



### Dataset Overview

We use the **CoNLL-2003** dataset, a standard benchmark for NER, containing annotations for four entity types: `PER`, `LOC`, `ORG`, and `MISC`.

Each data sample includes:

* `tokens`: list of words in the sentence
* `ner_tags`: integer IDs of the BIO (IOB2) entity labels
* Additional metadata: `pos_tags`, `chunk_tags`, etc.

In [None]:
from datasets import load_dataset

# Load the dataset
dataset = load_dataset("conll2003")

# Access a sample
sample = dataset["train"][0]
tokens = sample["tokens"]
ner_ids = sample["ner_tags"]

In [7]:
dataset

DatasetDict({
    train: Dataset({
        features: ['id', 'tokens', 'pos_tags', 'chunk_tags', 'ner_tags'],
        num_rows: 14041
    })
    validation: Dataset({
        features: ['id', 'tokens', 'pos_tags', 'chunk_tags', 'ner_tags'],
        num_rows: 3250
    })
    test: Dataset({
        features: ['id', 'tokens', 'pos_tags', 'chunk_tags', 'ner_tags'],
        num_rows: 3453
    })
})

In [5]:
sample

{'id': '0',
 'tokens': ['EU',
  'rejects',
  'German',
  'call',
  'to',
  'boycott',
  'British',
  'lamb',
  '.'],
 'pos_tags': [22, 42, 16, 21, 35, 37, 16, 21, 7],
 'chunk_tags': [11, 21, 11, 12, 21, 22, 11, 12, 0],
 'ner_tags': [3, 0, 7, 0, 0, 0, 7, 0, 0]}

In [18]:
dataset["train"].features["chunk_tags"].feature.names[0:5]

['O', 'B-ADJP', 'I-ADJP', 'B-ADVP', 'I-ADVP']

In [25]:
dataset["train"].features["pos_tags"].feature.names[5:17]

[')', ',', '.', ':', '``', 'CC', 'CD', 'DT', 'EX', 'FW', 'IN', 'JJ']

In [14]:
dataset["train"].features["ner_tags"].feature.names

['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC', 'B-MISC', 'I-MISC']

In [26]:
# Map IDs to label names
label_list = dataset["train"].features["ner_tags"].feature.names
tags = [label_list[i] for i in ner_ids]

print("📝 Tokens:", tokens)
print("🏷️ BIO Tags:", tags)

📝 Tokens: ['EU', 'rejects', 'German', 'call', 'to', 'boycott', 'British', 'lamb', '.']
🏷️ BIO Tags: ['B-ORG', 'O', 'B-MISC', 'O', 'O', 'O', 'B-MISC', 'O', 'O']


In [57]:
from datasets import load_dataset
from seqeval.metrics.sequence_labeling import get_entities

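# Load the CoNLL-2003 dataset (already loaded above; repeated so this cell runs on its own)
dataset = load_dataset("conll2003")

# Take two sentences (indices 5 and 6); slicing returns a dict of column lists
samples = dataset["train"][5:7]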

In [58]:
print(samples['id'])
print(samples['tokens'])
print(samples['pos_tags'])
print(samples['chunk_tags'])
print(samples['ner_tags'])

['5', '6']
[['"', 'We', 'do', "n't", 'support', 'any', 'such', 'recommendation', 'because', 'we', 'do', "n't", 'see', 'any', 'grounds', 'for', 'it', ',', '"', 'the', 'Commission', "'s", 'chief', 'spokesman', 'Nikolaus', 'van', 'der', 'Pas', 'told', 'a', 'news', 'briefing', '.'], ['He', 'said', 'further', 'scientific', 'study', 'was', 'required', 'and', 'if', 'it', 'was', 'found', 'that', 'action', 'was', 'needed', 'it', 'should', 'be', 'taken', 'by', 'the', 'European', 'Union', '.']]
[[0, 28, 41, 30, 37, 12, 16, 21, 15, 28, 41, 30, 37, 12, 24, 15, 28, 6, 0, 12, 22, 27, 16, 21, 22, 22, 14, 22, 38, 12, 21, 21, 7], [28, 38, 16, 16, 21, 38, 40, 10, 15, 28, 38, 40, 15, 21, 38, 40, 28, 20, 37, 40, 15, 12, 22, 22, 7]]
[[0, 11, 21, 22, 22, 11, 12, 12, 17, 11, 21, 22, 22, 11, 12, 13, 11, 0, 0, 11, 12, 11, 12, 12, 12, 12, 12, 12, 21, 11, 12, 12, 0], [11, 21, 11, 12, 12, 21, 22, 0, 17, 11, 21, 22, 17, 11, 21, 22, 11, 21, 22, 22, 13, 11, 12, 12, 0]]
[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

In [59]:
def iob2_to_entities(tags):
    # Returns a list of (label, start, end) tuples; the end index is inclusive
    return get_entities(tags)

def convert_to_bio(entities, length):
    tags = ["O"] * length
    for label, start, end in entities:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end + 1):
            tags[i] = f"I-{label}"
    return tags

def convert_to_iobes(entities, length):
    tags = ["O"] * length
    for label, start, end in entities:
        if start == end:
            tags[start] = f"S-{label}"
        else:
            tags[start] = f"B-{label}"
            for i in range(start + 1, end):
                tags[i] = f"I-{label}"
            tags[end] = f"E-{label}"
    return tags

def convert_to_bilou(entities, length):
    tags = ["O"] * length
    for label, start, end in entities:
        if start == end:
            tags[start] = f"U-{label}"
        else:
            tags[start] = f"B-{label}"
            for i in range(start + 1, end):
                tags[i] = f"I-{label}"
            tags[end] = f"L-{label}"
    return tags

def convert_to_span_format(tokens, entities):
    return [{
        "text": " ".join(tokens[start:end+1]),
        "start": start,
        "end": end,
        "label": label
    } for label, start, end in entities]
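
The conversions above go from entity spans to tags. As a sanity check it can help to decode in the other direction. The function below is a minimal sketch of an inverse for IOBES tags; `iobes_to_entities` is a hypothetical helper, and it assumes the input sequence is well formed.

```python
def iobes_to_entities(tags):
    """Decode IOBES tags into (label, start, end) tuples with an inclusive end."""
    entities = []
    start = None  # index of the most recent unmatched "B-" tag
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if prefix == "S":
            entities.append((label, i, i))
        elif prefix == "B":
            start = i
        elif prefix == "E" and start is not None:
            entities.append((label, start, i))
            start = None
        # "I-" and "O" need no action because the input is assumed well formed.
    return entities


tags = ["O", "S-ORG", "O", "B-PER", "I-PER", "I-PER", "E-PER", "O"]
print(iobes_to_entities(tags))
# [('ORG', 1, 1), ('PER', 3, 6)]
```

The tuples use the same `(label, start, end)` layout as the conversion functions, so the decoded output can be fed straight back into them for a round-trip comparison.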


In [60]:
# Label feature used to map integer tag IDs to their string names
ner_feature = dataset["train"].features["ner_tags"].feature

# `samples` is a dict of column lists, so iterate over it by index
for idx in range(len(samples["id"])):
    print(f"\n🔹 Sample {idx + 1}")

    # Access the data for the current sample using the index
    sample_tokens = samples["tokens"][idx]
    sample_ner_tags = samples["ner_tags"][idx]

    # Convert the integer tags to string labels for the current sample
    iob2_tags = [ner_feature.int2str(tag) for tag in sample_ner_tags]

    print("Tokens:     ", sample_tokens)
    print("IOB2 Tags:  ", iob2_tags)

    # Convert to entity spans
    entities = iob2_to_entities(iob2_tags)
    length = len(sample_tokens)

    # Tagging formats using the current sample's tokens and entities
    bio_tags   = convert_to_bio(entities, length)
    iobes_tags = convert_to_iobes(entities, length)
    bilou_tags = convert_to_bilou(entities, length)
    span_ann   = convert_to_span_format(sample_tokens, entities)

    print("BIO:        ", bio_tags)
    print("IOBES:      ", iobes_tags)
    print("BILOU:      ", bilou_tags)
    print("SPAN:       ", span_ann)


🔹 Sample 1
Tokens:      ['"', 'We', 'do', "n't", 'support', 'any', 'such', 'recommendation', 'because', 'we', 'do', "n't", 'see', 'any', 'grounds', 'for', 'it', ',', '"', 'the', 'Commission', "'s", 'chief', 'spokesman', 'Nikolaus', 'van', 'der', 'Pas', 'told', 'a', 'news', 'briefing', '.']
IOB2 Tags:   ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-ORG', 'O', 'O', 'O', 'B-PER', 'I-PER', 'I-PER', 'I-PER', 'O', 'O', 'O', 'O', 'O']
BIO:         ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-ORG', 'O', 'O', 'O', 'B-PER', 'I-PER', 'I-PER', 'I-PER', 'O', 'O', 'O', 'O', 'O']
IOBES:       ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'S-ORG', 'O', 'O', 'O', 'B-PER', 'I-PER', 'I-PER', 'E-PER', 'O', 'O', 'O', 'O', 'O']
BILOU:       ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O

### Key Insights

* Each tagging scheme has trade-offs between simplicity, expressiveness, and decoding complexity.
* Converting between schemes enables comparative evaluation and hybrid model development.
* The choice of format can affect model performance, and it matters most for tasks such as multi-span extraction or nested NER, where a single tag per token cannot encode overlapping spans.
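
One way to compare models trained on different schemes is to decode every prediction into `(label, start, end)` spans and score at the entity level, so the tagging format no longer matters. The function below is a simplified sketch of exact-match F1 for a single sentence; `span_f1` is a hypothetical helper and the predicted spans are invented for illustration.

```python
def span_f1(gold, predicted):
    """Exact-match F1 between two collections of (label, start, end) spans."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [("ORG", 20, 20), ("PER", 24, 27)]
# Invented prediction: the PER span stops one token early, so it does not count.
predicted = [("ORG", 20, 20), ("PER", 24, 26)]
print(span_f1(gold, predicted))  # 0.5
```

For corpus-level scores the counts would be accumulated over all sentences before computing precision and recall.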

### Summary
The tagging scheme affects both model performance and decoding logic. BIO is the standard choice, while IOBES and BILOU are often preferred when a model needs precise entity boundaries.