In [None]:
import numpy as np
import pandas as pd

# Fine-Tuning BERT for Named Entity Recognition

This notebook covers fine-tuning a pretrained BERT-family model (DistilBERT, `distilbert-base-cased`) for Named Entity Recognition (NER) on the WNUT17 dataset, which is stored in CoNLL-style files.

NER is a common NLP task that involves identifying and classifying key entities (people, organizations, locations, etc.) in text. It is a building block for downstream applications such as information extraction and question answering.

We will use Hugging Face's DistilBERT implementation and `Trainer` to fine-tune a model to perform NER. The key steps are:

1. Prepare training data and map labels
2. Load pretrained DistilBERT model and tokenizer
3. Define training arguments and trainer
4. Fine-tune model on training data
5. Evaluate on validation data

The trained model can extract named entities from text by encoding the text and applying the model's token classification head.

This provides a simple template for fine-tuning transformer models like BERT for sequence tagging tasks like NER. The same principles can be applied to other datasets and use cases as well.

Let's get started!


In [None]:
def read_conll(filepath):
    sentences = []
    labels = []
    sentence = []
    ner_tags = []

    with open(filepath, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line:
                if sentence:
                    sentences.append(sentence)
                    labels.append(ner_tags)
                    sentence, ner_tags = [], []
                continue

            parts = line.split('\t')
            if len(parts) >= 2:
                token = parts[0].strip()
                tag = parts[-1].strip()

                # Remove HTML junk like </td>
                tag = tag.replace('</td>', '').strip()

                # If the tag contains multiple tags joined by commas, take the first one
                if ',' in tag:
                    tag = tag.split(',')[0].strip()

                sentence.append(token)
                ner_tags.append(tag)

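    # Flush the last sentence if the file does not end with a blank line
    if sentence:
        sentences.append(sentence)
        labels.append(ner_tags)

    return sentences, labels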


In [None]:
train_tokens, train_tags = read_conll("/content/WNUT17/wnut17train.conll")
val_tokens, val_tags  = read_conll("/content/WNUT17/emerging.dev.conll")
test_tokens, test_tags = read_conll("/content/WNUT17/emerging.test.conll")



In [None]:
# Get unique label list
unique_labels = sorted(set(tag for seq in train_tags for tag in seq))
label2id = {label: i for i, label in enumerate(unique_labels)}
id2label = {i: label for label, i in label2id.items()}
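
As a quick illustration of the mapping built above, here is a minimal sketch on a toy tag list. The tags and the `toy_*` names are hypothetical stand-ins and are not read from the WNUT17 files.

```python
# Toy tag sequences standing in for train_tags (hypothetical example data)
toy_tags = [
    ["B-person", "I-person", "O"],
    ["O", "B-location", "O"],
]

# Same construction as in the cell above: sorted unique labels -> integer IDs
toy_labels = sorted(set(tag for seq in toy_tags for tag in seq))
toy_label2id = {label: i for i, label in enumerate(toy_labels)}
toy_id2label = {i: label for label, i in toy_label2id.items()}

print(toy_label2id)
# {'B-location': 0, 'B-person': 1, 'I-person': 2, 'O': 3}
```

Sorting makes the IDs deterministic across runs, but the IDs still depend on which labels appear in the training data, so it is safer for later code to look labels up by name than to hard-code integer IDs.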

In [None]:
# Convert tags to IDs. Integers are passed through unchanged, so re-running
# this cell after the tags have already been converted does not raise a KeyError.
def tags_to_ids(tag_seqs):
    return [
        [tag if isinstance(tag, int) else label2id[tag] for tag in seq]
        for seq in tag_seqs
    ]

train_tags = tags_to_ids(train_tags)
val_tags = tags_to_ids(val_tags)
test_tags = tags_to_ids(test_tags)

In [None]:
# Wrap in Hugging Face dataset
from datasets import Dataset, DatasetDict
dataset = DatasetDict({
    "train": Dataset.from_dict({"tokens": train_tokens, "ner_tags": train_tags}),
    "validation": Dataset.from_dict({"tokens": val_tokens, "ner_tags": val_tags}),
    "test": Dataset.from_dict({"tokens": test_tokens, "ner_tags": test_tags})
})

In [None]:
# Label names
label_names = unique_labels
print(dataset['train'].features)

{'tokens': List(Value('string')), 'ner_tags': List(Value('int64'))}


In [None]:
#  Tokenizer & Label Alignment
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")



In [None]:
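# Map each B-X label ID to its I-X counterpart. The mapping is derived from
# label2id so it stays correct for whatever label set the data produced
# (with sorted labels, all B- tags come before all I- tags).
begin2inside = {
    label2id[label]: label2id["I-" + label[2:]]
    for label in label2id
    if label.startswith("B-") and "I-" + label[2:] in label2id
}

def align_target(labels, word_ids):
    align_labels, last_word = [], None
    for word in word_ids:
        if word is None:
            # Special tokens ([CLS], [SEP]) are ignored by the loss
            label = -100
        elif word != last_word:
            # First sub-token of a word keeps the word's label
            label = labels[word]
        else:
            # Continuation sub-tokens: turn B-X into I-X, keep everything else
            label = labels[word]
            label = begin2inside.get(label, label)
        align_labels.append(label)
        last_word = word
    return align_labels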
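
To see what the alignment produces, here is a small self-contained sketch with a made-up label set and hand-written `word_ids` (a real tokenizer may split the words differently). The names `align_demo`, `label2id_demo`, and `begin2inside_demo` are hypothetical. The sketch derives the B-to-I mapping from the label names, which is one way to keep it consistent with whatever IDs the label set produces.

```python
# Hypothetical label set; IDs follow the sorted order used earlier in the notebook
labels_sorted = ["B-location", "B-person", "I-location", "I-person", "O"]
label2id_demo = {label: i for i, label in enumerate(labels_sorted)}

# Derive the B-X -> I-X mapping from label names instead of hard-coding IDs
begin2inside_demo = {
    label2id_demo[name]: label2id_demo["I-" + name[2:]]
    for name in labels_sorted
    if name.startswith("B-") and "I-" + name[2:] in label2id_demo
}

def align_demo(word_labels, word_ids):
    """Expand word-level label IDs to token-level label IDs."""
    aligned, last_word = [], None
    for word in word_ids:
        if word is None:
            aligned.append(-100)  # special tokens are ignored by the loss
        elif word != last_word:
            aligned.append(word_labels[word])  # first sub-token keeps the word label
        else:
            label = word_labels[word]
            aligned.append(begin2inside_demo.get(label, label))  # continuation: B-X -> I-X
        last_word = word
    return aligned

# "Washington visited": assume the tokenizer yields [CLS] Wash ##ington visited [SEP]
word_labels = [label2id_demo["B-person"], label2id_demo["O"]]
word_ids = [None, 0, 0, 1, None]
print(align_demo(word_labels, word_ids))
# [-100, 1, 3, 4, -100]
```

The second sub-token of "Washington" receives `I-person` (ID 3) rather than a second `B-person`, so the entity still reads as one span after sub-word splitting.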

In [None]:
# Tokenize & align batch
def tokenize_fn(batch):
    tokenized = tokenizer(batch["tokens"], is_split_into_words=True, truncation=True)
    all_labels = batch["ner_tags"]
    aligned_labels = []
    for i, labels in enumerate(all_labels):
        word_ids = tokenized.word_ids(i)
        aligned_labels.append(align_target(labels, word_ids))
    tokenized["labels"] = aligned_labels
    return tokenized

In [None]:
# Apply
tokenized_dataset = dataset.map(tokenize_fn, batched=True, remove_columns=dataset["train"].column_names)


Map:   0%|          | 0/3394 [00:00<?, ? examples/s]

Map:   0%|          | 0/1008 [00:00<?, ? examples/s]

Map:   0%|          | 0/1 [00:00<?, ? examples/s]

In [None]:
#  Data Collator
from transformers import DataCollatorForTokenClassification
data_collator = DataCollatorForTokenClassification(tokenizer)

In [None]:
!pip install seqeval



In [None]:
#!pip install evaluate
from evaluate import load
metric = load("seqeval")


In [None]:

def compute_metrics(eval_pred):
    # eval_pred unpacks to (logits, label IDs) for the whole evaluation set
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    # Drop positions labelled -100 (special tokens and padding) before scoring
    true_labels = [[label_names[l] for l in label if l != -100] for label in labels]
    true_preds = [
        [label_names[p] for (p, l) in zip(pred, label) if l != -100]
        for pred, label in zip(preds, labels)
    ]
    results = metric.compute(predictions=true_preds, references=true_labels)
    return {
        "precision": results["overall_precision"],
        "recall": results["overall_recall"],
        "f1": results["overall_f1"],
        "accuracy": results["overall_accuracy"]
    }
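
The masking step inside `compute_metrics` can be checked in isolation. The sketch below uses made-up logits and a hypothetical three-label set (`demo_names`) for a single padded example; it only reproduces the `argmax` and `-100` filtering, not the `seqeval` call.

```python
import numpy as np

# Hypothetical label names and one example with 4 token positions
demo_names = ["B-person", "I-person", "O"]
logits = np.array([[
    [0.1, 0.2, 3.0],   # [CLS]     (label -100, ignored)
    [2.5, 0.3, 0.1],   # "Ada"
    [0.2, 2.0, 0.4],   # "Lovelace"
    [0.3, 0.1, 0.2],   # [SEP]     (label -100, ignored)
]])
gold = np.array([[-100, 0, 1, -100]])

# Highest-scoring label ID per token
pred_ids = np.argmax(logits, axis=-1)

# Keep only positions whose gold label is not -100, then map IDs to names
true_labels = [[demo_names[l] for l in row if l != -100] for row in gold]
true_preds = [
    [demo_names[p] for p, l in zip(pred_row, gold_row) if l != -100]
    for pred_row, gold_row in zip(pred_ids, gold)
]

print(true_labels)  # [['B-person', 'I-person']]
print(true_preds)   # [['B-person', 'I-person']]
```

Predictions at the ignored positions are discarded even though the model emits logits there, so special tokens and padding never affect the reported scores.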

In [None]:
# Load Model
from transformers import AutoModelForTokenClassification
model = AutoModelForTokenClassification.from_pretrained(
    "distilbert-base-cased",
    num_labels=len(label_names),
    id2label=id2label,
    label2id=label2id
)

Some weights of DistilBertForTokenClassification were not initialized from the model checkpoint at distilbert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [None]:
!pip install --upgrade transformers




In [None]:
# Training
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="fine_tuned_model",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01
)


In [None]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    processing_class=tokenizer,  # `tokenizer=` is deprecated in recent transformers releases
    data_collator=data_collator,
    compute_metrics=compute_metrics
)



In [None]:
trainer.train()
trainer.save_model("fine_tuned_model")



Step,Training Loss
500,0.1957


In [None]:
# Inference Pipeline
from transformers import pipeline

In [None]:
ner = pipeline(
    "token-classification",
    model="fine_tuned_model",
    tokenizer=tokenizer,
    aggregation_strategy="simple"
)

Device set to use cpu


In [None]:
print(ner("Apple Inc. is planning to open a new store in San Francisco, California."))


[{'entity_group': 'corporation', 'score': np.float32(0.55203134), 'word': 'Apple', 'start': 0, 'end': 5}, {'entity_group': 'corporation', 'score': np.float32(0.31063372), 'word': 'Inc', 'start': 6, 'end': 9}, {'entity_group': 'location', 'score': np.float32(0.8150413), 'word': 'San Francisco', 'start': 46, 'end': 59}, {'entity_group': 'location', 'score': np.float32(0.8833993), 'word': 'California', 'start': 61, 'end': 71}]


## Simple Explanation

The input sentence was:

> **"Apple Inc. is planning to open a new store in San Francisco, California."**

The model is saying:

- **"Apple"** is a **corporation** with ~55% confidence.
- **"Inc"** is also a **corporation**, but with lower confidence (~31%); it is returned as a separate entity rather than merged with "Apple".
- **"San Francisco"** is a **location** with ~82% confidence.
- **"California"** is also a **location** with ~88% confidence.
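
In the pipeline output above, "Apple" and "Inc" come back as two separate corporation spans. One possible post-processing heuristic, sketched below, is to merge neighbouring spans of the same type when they are at most one character apart. The helper `merge_adjacent` and its `max_gap` parameter are hypothetical and not part of the `transformers` pipeline; the scores are copied from the run above and rounded to plain floats.

```python
# Pipeline output copied from the run above, with scores rounded to plain floats
entities = [
    {"entity_group": "corporation", "score": 0.552, "word": "Apple", "start": 0, "end": 5},
    {"entity_group": "corporation", "score": 0.311, "word": "Inc", "start": 6, "end": 9},
    {"entity_group": "location", "score": 0.815, "word": "San Francisco", "start": 46, "end": 59},
    {"entity_group": "location", "score": 0.883, "word": "California", "start": 61, "end": 71},
]
text = "Apple Inc. is planning to open a new store in San Francisco, California."

def merge_adjacent(entities, text, max_gap=1):
    """Merge neighbouring spans of the same type separated by at most max_gap characters."""
    merged = []
    for ent in entities:
        prev = merged[-1] if merged else None
        if (prev is not None
                and prev["entity_group"] == ent["entity_group"]
                and ent["start"] - prev["end"] <= max_gap):
            prev["end"] = ent["end"]
            prev["word"] = text[prev["start"]:prev["end"]]
            prev["score"] = min(prev["score"], ent["score"])  # conservative: keep the weaker score
        else:
            merged.append(dict(ent))  # copy so the input list is left untouched
    return merged

for ent in merge_adjacent(entities, text):
    print(ent["entity_group"], repr(ent["word"]), ent["score"])
```

With `max_gap=1`, "Apple" and "Inc" are merged into "Apple Inc", while "San Francisco" and "California" stay separate because the comma and space put them two characters apart. Whether such merging is desirable depends on the application.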


## Conclusion

1. **Successful Fine-Tuning of a Transformer for NER**  
   We fine-tuned a pretrained transformer model (DistilBERT, a distilled BERT variant) for the task of Named Entity Recognition using Hugging Face’s Transformers library. This allowed us to reuse pretrained language representations for entity extraction.

2. **Effective Data Cleaning and Preprocessing**  
   Raw NER data often contains inconsistencies and irregular tag formats. We stripped stray HTML fragments such as `</td>` from the tags and kept the first tag where several were joined by commas, so that every token carries exactly one label.

3. **Accurate Label Alignment with Tokenized Input**  
   One of the key challenges in NER is aligning word-level entity labels with subword tokens. Our alignment logic gives the first subword of each word the word's label, is meant to convert `B-` labels to `I-` labels on continuation subwords, and marks special tokens with `-100` so that the loss ignores them.

4. **Efficient Model Training with Trainer API**  
   The Hugging Face `Trainer` and `TrainingArguments` classes streamlined the training loop, making it easier to manage hyperparameters, evaluation, logging, and checkpointing with minimal code.

5. **Use of Custom Evaluation Metrics**  
   We wired the `seqeval` metric into `compute_metrics` to report entity-level precision, recall, and F1-score, which give a more reliable assessment than token-level accuracy for NER tasks.

6. **Scalable and Reproducible Pipeline**  
   The modular design of our pipeline, from preprocessing to evaluation, means it can be extended to other NER datasets or languages with minimal changes.
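
To make point 5 concrete, the following sketch contrasts token-level accuracy with an entity-level F1 on one made-up sentence. The helpers `bio_spans` and `entity_f1` are hypothetical, simplified stand-ins and not the `seqeval` implementation; in particular, an `I-` tag with no preceding `B-` is treated as outside any entity here, which is an assumption of this sketch.

```python
def bio_spans(tags):
    """Return the set of (type, start, end) spans in a BIO tag sequence (end exclusive)."""
    spans, start, kind = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        inside = tag.startswith("I-") and kind == tag[2:]
        if not inside:
            if kind is not None:
                spans.add((kind, start, i))  # close the span that was open
            if tag.startswith("B-"):
                start, kind = i, tag[2:]     # open a new span
            else:
                start, kind = None, None
    return spans

def entity_f1(gold, pred):
    """F1 over exact span matches (type and boundaries must both agree)."""
    gold_spans, pred_spans = bio_spans(gold), bio_spans(pred)
    tp = len(gold_spans & pred_spans)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_spans)
    recall = tp / len(gold_spans)
    return 2 * precision * recall / (precision + recall)

# Made-up example: the prediction truncates the two-token person entity
gold = ["B-person", "I-person", "O", "B-location"]
pred = ["B-person", "O", "O", "B-location"]

token_accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
print(token_accuracy)         # 0.75
print(entity_f1(gold, pred))  # 0.5
```

Three of four tokens are correct, yet only one of the two gold entities is recovered exactly, which is why the entity-level score is the stricter of the two.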

---

### Key Takeaway  
Fine-tuning transformer models like BERT for NER demonstrates the power of transfer learning in NLP, significantly reducing the effort required to build high-quality entity recognition systems from scratch.
