**Fine-Tuning BERT for Named Entity Recognition (NER)**

Dataset used: CoNLL-2003 (Hugging Face dataset card for "conll2003")

https://huggingface.co/datasets/eriktks/conll2003

Named Entity Recognition (NER) is a natural language processing (NLP) technique that involves identifying and classifying key pieces of information (entities) in text into predefined categories. These entities typically include things like:

People's names (e.g., "Albert Einstein")

Organizations (e.g., "OpenAI")

Locations (e.g., "Paris")

Dates and times (e.g., "August 22, 2025")

Monetary values (e.g., "$100")

Miscellaneous entities like products, events, etc.

The goal of NER is to extract structured information from unstructured text, which can then be used for tasks like information retrieval, question answering, summarization, and more.

For example, given the sentence:
"Apple was founded by Steve Jobs in California."

An NER system might identify:

Apple as an Organization

Steve Jobs as a Person

California as a Location
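
One way to hold such a result as structured data is a list of records with character offsets. The sketch below is illustrative only: the `Entity` class and the hand-written annotations are assumptions made for this example, not the output of a real model.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    text: str
    label: str
    start: int  # character offset where the entity begins
    end: int    # character offset just past the entity


sentence = "Apple was founded by Steve Jobs in California."

# Hand-written annotations standing in for a model's predictions.
annotations = [
    ("Apple", "Organization"),
    ("Steve Jobs", "Person"),
    ("California", "Location"),
]

entities = []
for text, label in annotations:
    start = sentence.index(text)  # first occurrence is enough for this sentence
    entities.append(Entity(text, label, start, start + len(text)))

for e in entities:
    print(f"{e.text!r:14} {e.label:13} [{e.start}:{e.end}]")
```

Storing offsets rather than only the strings makes it possible to highlight or replace each entity in the original text later.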

We will concentrate on four types of named entities: **persons, locations, organizations and names of miscellaneous entities** that do not belong to the previous three groups.

In [None]:
!pip install transformers tokenizers seqeval -q

  Preparing metadata (setup.py) ... done
  Building wheel for seqeval (setup.py) ... done


In [None]:

import numpy as np
from transformers import BertTokenizerFast, DataCollatorForTokenClassification, AutoModelForTokenClassification


In [None]:
pip install datasets==3.6.0

Collecting datasets==3.6.0
  Downloading datasets-3.6.0-py3-none-any.whl.metadata (19 kB)
Downloading datasets-3.6.0-py3-none-any.whl (491 kB)
Installing collected packages: datasets
  Attempting uninstall: datasets
    Found existing installation: datasets 4.0.0
    Uninstalling datasets-4.0.0:
      Successfully uninstalled datasets-4.0.0
Successfully installed datasets-3.6.0


In [None]:
from datasets import load_dataset
conll2003 = load_dataset("conll2003", revision="main")





In [None]:
conll2003

DatasetDict({
    train: Dataset({
        features: ['id', 'tokens', 'pos_tags', 'chunk_tags', 'ner_tags'],
        num_rows: 14041
    })
    validation: Dataset({
        features: ['id', 'tokens', 'pos_tags', 'chunk_tags', 'ner_tags'],
        num_rows: 3250
    })
    test: Dataset({
        features: ['id', 'tokens', 'pos_tags', 'chunk_tags', 'ner_tags'],
        num_rows: 3453
    })
})

In [None]:
type(conll2003)

datasets.dataset_dict.DatasetDict

In [None]:
conll2003['train']

Dataset({
    features: ['id', 'tokens', 'pos_tags', 'chunk_tags', 'ner_tags'],
    num_rows: 14041
})

In [None]:
type(conll2003['train'][0])

dict

In [None]:
conll2003['train'][0]

{'id': '0',
 'tokens': ['EU',
  'rejects',
  'German',
  'call',
  'to',
  'boycott',
  'British',
  'lamb',
  '.'],
 'pos_tags': [22, 42, 16, 21, 35, 37, 16, 21, 7],
 'chunk_tags': [11, 21, 11, 12, 21, 22, 11, 12, 0],
 'ner_tags': [3, 0, 7, 0, 0, 0, 7, 0, 0]}

From the record above, the features we need are 'id', 'tokens' and 'ner_tags'.

Let's look at what the numbers in ner_tags mean.
ner_tags: a list of classification labels (int). Full tagset with indices:
{'O': 0, 'B-PER': 1, 'I-PER': 2, 'B-ORG': 3, 'I-ORG': 4, 'B-LOC': 5, 'I-LOC': 6, 'B-MISC': 7, 'I-MISC': 8}

O (0): Outside. The token is not part of any named entity (for example, articles and pronouns).

B-PER (1): Beginning of a person name.

I-PER (2): Inside a person name (continuation after the beginning).

B-ORG (3): Beginning of an organization name.

I-ORG (4): Inside an organization name.

B-LOC (5): Beginning of a location name.

I-LOC (6): Inside a location name.

B-MISC (7): Beginning of a miscellaneous entity (things that don't fit the other categories).

I-MISC (8): Inside a miscellaneous entity.
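
To make the mapping concrete, the sketch below decodes the integer `ner_tags` of the first training record into tag names. The `label_names` list is typed in by hand from the tagset above, so this runs without loading the dataset.

```python
# Tag names in index order, copied from the tagset listed above.
label_names = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG",
               "B-LOC", "I-LOC", "B-MISC", "I-MISC"]

tokens = ["EU", "rejects", "German", "call", "to", "boycott", "British", "lamb", "."]
ner_tags = [3, 0, 7, 0, 0, 0, 7, 0, 0]

# Look up each integer tag to get its readable name.
decoded = [label_names[t] for t in ner_tags]

for token, tag in zip(tokens, decoded):
    print(f"{token:10} {tag}")
```

Here "EU" is tagged as the start of an organization, while "German" and "British" each start a miscellaneous entity.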

In [None]:
conll2003['train'].features['ner_tags']

Sequence(feature=ClassLabel(names=['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC', 'B-MISC', 'I-MISC'], id=None), length=-1, id=None)

In [None]:
conll2003['train'].description

'The shared task of CoNLL-2003 concerns language-independent named entity recognition. We will concentrate on\nfour types of named entities: persons, locations, organizations and names of miscellaneous entities that do\nnot belong to the previous three groups.\n\nThe CoNLL-2003 shared task data files contain four columns separated by a single space. Each word has been put on\na separate line and there is an empty line after each sentence. The first item on each line is a word, the second\na part-of-speech (POS) tag, the third a syntactic chunk tag and the fourth the named entity tag. The chunk tags\nand the named entity tags have the format I-TYPE which means that the word is inside a phrase of type TYPE. Only\nif two phrases of the same type immediately follow each other, the first word of the second phrase will have tag\nB-TYPE to show that it starts a new phrase. A word with tag O is not part of a phrase. Note the dataset uses IOB2\ntagging scheme, whereas the original dataset uses 

In [None]:
# Load the tokenizer
tokenizer = BertTokenizerFast.from_pretrained('bert-base-cased')


**How the tokenizer works on a sample example**

In [None]:
example_text=conll2003['train'][0]
example_text

{'id': '0',
 'tokens': ['EU',
  'rejects',
  'German',
  'call',
  'to',
  'boycott',
  'British',
  'lamb',
  '.'],
 'pos_tags': [22, 42, 16, 21, 35, 37, 16, 21, 7],
 'chunk_tags': [11, 21, 11, 12, 21, 22, 11, 12, 0],
 'ner_tags': [3, 0, 7, 0, 0, 0, 7, 0, 0]}

In [None]:
# See how the tokenizer handles one sample
example_text=conll2003['train'][0]
encoding=tokenizer(example_text["tokens"],is_split_into_words=True)

In [None]:
encoding

{'input_ids': [101, 7270, 22961, 1528, 1840, 1106, 21423, 1418, 2495, 12913, 119, 102], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}

Understanding the output above


1) input_ids:
These are the token IDs: integers that index specific tokens in the BERT vocabulary.

Example:

101 is the special [CLS] token, which the tokenizer places at the start of every input.

102 is the [SEP] token, which marks the end of the input (or of each segment).

The numbers in between correspond to tokens representing words or subwords.

2) token_type_ids:
Also called segment IDs.

Used to distinguish segments in tasks like question answering, where the input is [CLS] question [SEP] context [SEP].

Since all values are 0, this input is a single segment (just one sentence or piece of text).

3) attention_mask:
Indicates which tokens should be attended to (1) and which are padding tokens (0).

Here, all tokens have 1, meaning there is no padding and the entire sequence is valid.
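
The attention mask only becomes interesting once sequences of different lengths are batched together. The following is a minimal pure-Python sketch of that padding step, not the tokenizer's own implementation. `pad_batch` and `short_seq` are hypothetical names introduced here, and the pad id of 0 is assumed to be the `[PAD]` id of the bert-base-cased vocabulary.

```python
PAD_ID = 0  # assumed id of [PAD] in the bert-base-cased vocabulary


def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad token-id lists to a common length and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# The encoding shown above, plus a hypothetical shorter input: [CLS] EU . [SEP]
long_seq = [101, 7270, 22961, 1528, 1840, 1106, 21423, 1418, 2495, 12913, 119, 102]
short_seq = [101, 7270, 119, 102]

batch = pad_batch([long_seq, short_seq])
print(batch["attention_mask"][1])
```

The shorter sequence gets four 1s followed by eight 0s, so the model can skip the padded positions.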

In [None]:
tokens = tokenizer.convert_ids_to_tokens(encoding['input_ids'])
tokens

['[CLS]',
 'EU',
 'rejects',
 'German',
 'call',
 'to',
 'boycott',
 'British',
 'la',
 '##mb',
 '.',
 '[SEP]']

The word "lamb" is split into two WordPiece tokens, "la" and "##mb". The "##" prefix marks a token that continues the previous token within the same word.
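
As a quick check of the "##" convention, the sketch below glues WordPiece tokens back into whole words. `merge_wordpieces` is a hypothetical helper written for this illustration; it is not part of the tokenizer API.

```python
def merge_wordpieces(tokens):
    """Rebuild whole words from WordPiece tokens, skipping [CLS] and [SEP]."""
    words = []
    for token in tokens:
        if token in ("[CLS]", "[SEP]"):
            continue
        if token.startswith("##") and words:
            words[-1] += token[2:]  # glue the continuation piece onto the previous word
        else:
            words.append(token)
    return words


pieces = ["[CLS]", "EU", "rejects", "German", "call", "to",
          "boycott", "British", "la", "##mb", ".", "[SEP]"]
print(merge_wordpieces(pieces))
```

The twelve tokens collapse back to the nine original words, with "la" and "##mb" rejoined as "lamb".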

In [None]:
encoding.word_ids()

[None, 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, None]

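The example has 9 NER tags but word_ids has 12 entries: [CLS] and [SEP] map to None, and word index 7 appears twice because "lamb" was split into "la" and "##mb". To make the label sequence the same length as the token sequence, we assign the label -100 to [CLS] and [SEP], since they carry no entity information. PyTorch's cross-entropy loss ignores targets equal to -100 by default (ignore_index=-100), so these positions do not affect training.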
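
To show what "ignored" means for the loss, here is a small pure-Python stand-in for a cross-entropy with an ignore index. It is a sketch of the idea, not PyTorch's implementation; `masked_cross_entropy` and the toy logits are made up for this example.

```python
import math

IGNORE_INDEX = -100


def masked_cross_entropy(logits, targets, ignore_index=IGNORE_INDEX):
    """Mean cross-entropy over positions whose target is not ignore_index."""
    losses = []
    for row, target in zip(logits, targets):
        if target == ignore_index:
            continue  # this position contributes nothing to the loss
        log_norm = math.log(sum(math.exp(x) for x in row))
        losses.append(log_norm - row[target])
    return sum(losses) / len(losses)


# Toy scores for 4 token positions and 2 classes; first and last are special tokens.
logits = [[9.0, -9.0], [2.0, 0.0], [0.0, 2.0], [-9.0, 9.0]]
targets = [-100, 0, 1, -100]

loss = masked_cross_entropy(logits, targets)
print(round(loss, 4))
```

Changing the logits at the two ignored positions leaves `loss` unchanged, which is the behaviour we rely on for [CLS] and [SEP].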

In [None]:
example_text

{'id': '0',
 'tokens': ['EU',
  'rejects',
  'German',
  'call',
  'to',
  'boycott',
  'British',
  'lamb',
  '.'],
 'pos_tags': [22, 42, 16, 21, 35, 37, 16, 21, 7],
 'chunk_tags': [11, 21, 11, 12, 21, 22, 11, 12, 0],
 'ner_tags': [3, 0, 7, 0, 0, 0, 7, 0, 0]}

This is the end of the sample example.

In [None]:
def tokenize_and_align_labels(examples, is_all_labels=True):
    tokenized_inputs = tokenizer(
        examples["tokens"],
        truncation=True,
        is_split_into_words=True
    )

    labels = []
    for i, label in enumerate(examples["ner_tags"]):
        word_ids = tokenized_inputs.word_ids(batch_index=i)  # Map tokens to words

        previous_word_idx = None
        label_ids = []
        for word_idx in word_ids:
            if word_idx is None:
                label_ids.append(-100)  # Special tokens get -100 to be ignored
            elif word_idx != previous_word_idx:
                label_ids.append(label[word_idx])
            else:
                if is_all_labels:
                    label_ids.append(label[word_idx])  # Assign same label to subwords
                else:
                    label_ids.append(-100)  # Mask continuation subwords when is_all_labels=False
            previous_word_idx = word_idx
        labels.append(label_ids)
    tokenized_inputs["labels"] = labels
    return tokenized_inputs


In [None]:
tokenize_and_align_labels(conll2003['train'][2:3])

{'input_ids': [[101, 26660, 13329, 12649, 15928, 1820, 118, 4775, 118, 1659, 102]], 'token_type_ids': [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], 'labels': [[-100, 5, 5, 5, 5, 0, 0, 0, 0, 0, -100]]}
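
In the output above, the first word's label 5 (B-LOC) is copied to all four of its subword tokens, so the sequence reads B-LOC four times in a row. A common alternative, which this notebook does not use, is to give continuation subwords the matching I- tag. The sketch below illustrates that variant; `align_with_inside_tags` is a hypothetical helper, and the `word_ids` list is inferred from the labels printed above rather than taken from the tokenizer.

```python
def align_with_inside_tags(word_ids, word_labels):
    """Label the first subword with the word's tag and later subwords with the I- variant."""
    aligned, previous = [], None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(-100)  # special tokens are ignored by the loss
        else:
            label = word_labels[word_id]
            # In this tagset B- tags have odd ids and the matching I- tag is the next id.
            if word_id == previous and label % 2 == 1:
                label += 1
            aligned.append(label)
        previous = word_id
    return aligned


word_ids = [None, 0, 0, 0, 0, 1, 1, 1, 1, 1, None]  # inferred from the labels above
print(align_with_inside_tags(word_ids, [5, 0]))
```

With this variant the first word is labelled B-LOC, I-LOC, I-LOC, I-LOC (5, 6, 6, 6), which keeps the tag sequence a valid IOB2 sequence at the subword level.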

In [None]:
# Apply to the entire dataset
tokenized_datasets = conll2003.map(tokenize_and_align_labels, batched=True)

In [None]:
tokenized_datasets['train'][0]

{'id': '0',
 'tokens': ['EU',
  'rejects',
  'German',
  'call',
  'to',
  'boycott',
  'British',
  'lamb',
  '.'],
 'pos_tags': [22, 42, 16, 21, 35, 37, 16, 21, 7],
 'chunk_tags': [11, 21, 11, 12, 21, 22, 11, 12, 0],
 'ner_tags': [3, 0, 7, 0, 0, 0, 7, 0, 0],
 'input_ids': [101,
  7270,
  22961,
  1528,
  1840,
  1106,
  21423,
  1418,
  2495,
  12913,
  119,
  102],
 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
 'labels': [-100, 3, 0, 7, 0, 0, 0, 7, 0, 0, 0, -100]}

With this, data preprocessing is done.

In [None]:
# Load the pretrained BERT model
from transformers import AutoModelForTokenClassification

# Nine labels, one per tag in the full tagset:
# {'O': 0, 'B-PER': 1, 'I-PER': 2, 'B-ORG': 3, 'I-ORG': 4, 'B-LOC': 5, 'I-LOC': 6, 'B-MISC': 7, 'I-MISC': 8}
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased",
    num_labels=9,
)


Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Next, we fine-tune the model on our downstream task (NER).

In [None]:
!pip install --upgrade transformers


Collecting transformers
  Downloading transformers-4.55.4-py3-none-any.whl.metadata (41 kB)
Downloading transformers-4.55.4-py3-none-any.whl (11.3 MB)
Installing collected packages: transformers
  Attempting uninstall: transformers
    Found existing installation: transformers 4.55.2
    Uninstalling transformers-4.55.2:
      Successfully uninstalled transformers-4.55.2
Successfully installed transformers-4.55.4


In [None]:
from transformers import DataCollatorForTokenClassification

data_collator = DataCollatorForTokenClassification(tokenizer)

In [None]:
pip install evaluate


Collecting evaluate
  Downloading evaluate-0.4.5-py3-none-any.whl.metadata (9.5 kB)
Downloading evaluate-0.4.5-py3-none-any.whl (84 kB)
Installing collected packages: evaluate
Successfully installed evaluate-0.4.5


In [None]:
pip install seqeval

Collecting seqeval
  Downloading seqeval-1.2.2.tar.gz (43 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: seqeval
  Building wheel for seqeval (setup.py) ... done
  Created wheel for seqeval: filename=seqeval-1.2.2-py3-none-any.whl size=16162 sha256=552703943ec10a3a565e524e7d6065da8db58d435173b03fe023824bf1e11bd7
  Stored in directory: /root/.cache/pip/wheels/5f/b8/73/0b2c1a76b701a677653dd79ece07cfabd7457989dbfbdcd8d7
Successfully built seqeval
Installing collected packages: seqeval
Successfully installed seqeval-1.2.2


In [None]:
pip install accelerate -U

Collecting accelerate
  Downloading accelerate-1.10.1-py3-none-any.whl.metadata (19 kB)
Downloading accelerate-1.10.1-py3-none-any.whl (374 kB)
Installing collected packages: accelerate
  Attempting uninstall: accelerate
    Found existing installation: accelerate 1.10.0
    Uninstalling accelerate-1.10.0:
      Successfully uninstalled accelerate-1.10.0
Successfully installed accelerate-1.10.1


In [None]:
import evaluate

metric = evaluate.load("seqeval")


In [None]:
label_list = conll2003["train"].features["ner_tags"].feature.names
label_list

['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC', 'B-MISC', 'I-MISC']

In [None]:
import numpy as np

label_list = conll2003["train"].features["ner_tags"].feature.names

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    # Pick the highest-scoring label id for every token
    predictions = np.argmax(logits, axis=2)

    # Drop positions labelled -100 and map the remaining ids to tag names
    true_predictions = [
        [label_list[p] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(predictions, labels)
    ]
    true_labels = [
        [label_list[l] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(predictions, labels)
    ]

    # seqeval reports entity-level precision, recall and F1, plus token-level accuracy
    results = metric.compute(predictions=true_predictions, references=true_labels)
    return {
        "precision": results["overall_precision"],
        "recall": results["overall_recall"],
        "f1": results["overall_f1"],
        "accuracy": results["overall_accuracy"],
    }
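
The precision, recall and F1 values returned here are computed over whole entities, not individual tokens. The sketch below illustrates that idea in pure Python for a single sentence. It is a simplified stand-in, not seqeval's code: `extract_spans` and `span_f1` are hypothetical helpers, stray I- tags without a preceding B- are simply ignored, and seqeval's own handling of such malformed sequences may differ.

```python
def extract_spans(tags):
    """Return a set of (type, start, end) spans from an IOB2 tag sequence."""
    spans, start, kind = set(), None, None
    for i, tag in enumerate(tags + ["O"]):  # the sentinel closes a trailing span
        continues_span = tag.startswith("I-") and kind == tag[2:]
        if start is not None and not continues_span:
            spans.add((kind, start, i))
            start, kind = None, None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
    return spans


def span_f1(true_tags, pred_tags):
    """Entity-level precision, recall and F1 for one tagged sentence."""
    true_spans, pred_spans = extract_spans(true_tags), extract_spans(pred_tags)
    correct = len(true_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(true_spans) if true_spans else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


true_tags = ["B-PER", "I-PER", "O", "B-LOC"]
pred_tags = ["B-PER", "O", "O", "B-LOC"]  # the person span is cut one token short
print(span_f1(true_tags, pred_tags))
```

Three of the four tokens are tagged correctly, yet only one of the two entities matches exactly, so the entity-level scores are 0.5 each. This is why F1 and token accuracy can diverge noticeably in the training table.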


In [None]:
# Fine-tuning
from transformers import TrainingArguments, Trainer
# Training arguments
training_args = TrainingArguments(
    output_dir="./bert-ner",
    eval_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    report_to=["tensorboard"]
)

In [None]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    processing_class=tokenizer,  # replaces the deprecated tokenizer= argument
    compute_metrics=compute_metrics
)




In [None]:
import wandb


In [None]:
import os

# Disable Weights & Biases logging; training reports to TensorBoard only
os.environ["WANDB_DISABLED"] = "true"

# Alternatively, keep wandb but run it in offline mode
os.environ["WANDB_MODE"] = "offline"

In [None]:
trainer.train()

| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|---|---|
| 1 | 0.0216 | 0.056914 | 0.93347 | 0.937825 | 0.935643 | 0.984356 |
| 2 | 0.0226 | 0.071225 | 0.941956 | 0.937646 | 0.939796 | 0.984724 |
| 3 | 0.0113 | 0.067091 | 0.942253 | 0.947156 | 0.944698 | 0.986092 |


TrainOutput(global_step=2634, training_loss=0.02434608061927326, metrics={'train_runtime': 509.6658, 'train_samples_per_second': 82.648, 'train_steps_per_second': 5.168, 'total_flos': 1050534559887048.0, 'train_loss': 0.02434608061927326, 'epoch': 3.0})
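
The step count and throughput in this output can be reproduced from the dataset size and the training arguments. The arithmetic below is a sanity check, assuming a single device, no gradient accumulation, and that the last partial batch of each epoch is kept.

```python
import math

train_rows = 14041   # size of the train split
batch_size = 16      # per_device_train_batch_size, assuming a single device
epochs = 3           # num_train_epochs
train_runtime = 509.6658  # seconds, from the TrainOutput above

steps_per_epoch = math.ceil(train_rows / batch_size)
total_steps = steps_per_epoch * epochs
samples_per_second = train_rows * epochs / train_runtime

print(steps_per_epoch, total_steps, round(samples_per_second, 3))
```

This gives 878 steps per epoch and 2634 steps in total, matching `global_step`, and roughly 82.6 samples per second, matching `train_samples_per_second`.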

In [None]:
model.save_pretrained('ner_model')

In [None]:
#save tokenizer
tokenizer.save_pretrained('tokenizer')

('tokenizer/tokenizer_config.json',
 'tokenizer/special_tokens_map.json',
 'tokenizer/vocab.txt',
 'tokenizer/added_tokens.json',
 'tokenizer/tokenizer.json')

In [None]:
label_list

['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC', 'B-MISC', 'I-MISC']

In [None]:
id2label={
    str(i):label for i,label in enumerate(label_list)
}

In [None]:
id2label

{'0': 'O',
 '1': 'B-PER',
 '2': 'I-PER',
 '3': 'B-ORG',
 '4': 'I-ORG',
 '5': 'B-LOC',
 '6': 'I-LOC',
 '7': 'B-MISC',
 '8': 'I-MISC'}

In [None]:
label2id={
    label:i for i,label in enumerate(label_list)
}

In [None]:
label2id

{'O': 0,
 'B-PER': 1,
 'I-PER': 2,
 'B-ORG': 3,
 'I-ORG': 4,
 'B-LOC': 5,
 'I-LOC': 6,
 'B-MISC': 7,
 'I-MISC': 8}

In [None]:
import json

# Add the label mappings to the saved model's config
with open("ner_model/config.json") as f:
    config = json.load(f)
config["id2label"] = id2label
config["label2id"] = label2id
with open("ner_model/config.json", "w") as f:
    json.dump(config, f)
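
One detail worth knowing when writing label mappings to a JSON file: JSON object keys are always strings, so integer keys in `id2label` come back as strings after a round trip, while the integer values in `label2id` survive unchanged. The sketch below demonstrates this with the standard `json` module only; it makes no claim about how transformers itself reads the file.

```python
import json

label_list = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG",
              "B-LOC", "I-LOC", "B-MISC", "I-MISC"]
id2label = {i: label for i, label in enumerate(label_list)}
label2id = {label: i for i, label in enumerate(label_list)}

# Round-trip through JSON, as happens when a config file is written and read back.
restored = json.loads(json.dumps({"id2label": id2label, "label2id": label2id}))

print(list(restored["id2label"])[:3])   # keys are now strings
print(restored["label2id"]["B-LOC"])    # values keep their integer type

# Convert the keys back to int if integer lookups are needed.
id2label_int = {int(k): v for k, v in restored["id2label"].items()}
```

If code reads such a file directly, converting the keys back with `int(...)` avoids lookups such as `id2label[5]` failing on string keys.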

In [None]:
# Load the fine-tuned model
model_fine_tuned=AutoModelForTokenClassification.from_pretrained("ner_model")

In [None]:
from transformers import AutoModelForTokenClassification, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("tokenizer")

In [None]:
from transformers import pipeline

# `model` is the in-memory model created with num_labels only, so labels print as LABEL_<n>
ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer)

result = ner_pipeline("Delhi is capital of India .")
print(result)


Device set to use cuda:0


[{'entity': 'LABEL_5', 'score': np.float32(0.9993437), 'index': 1, 'word': 'Delhi', 'start': 0, 'end': 5}, {'entity': 'LABEL_0', 'score': np.float32(0.9998054), 'index': 2, 'word': 'is', 'start': 6, 'end': 8}, {'entity': 'LABEL_0', 'score': np.float32(0.9998412), 'index': 3, 'word': 'capital', 'start': 9, 'end': 16}, {'entity': 'LABEL_0', 'score': np.float32(0.9998115), 'index': 4, 'word': 'of', 'start': 17, 'end': 19}, {'entity': 'LABEL_5', 'score': np.float32(0.9996667), 'index': 5, 'word': 'India', 'start': 20, 'end': 25}, {'entity': 'LABEL_0', 'score': np.float32(0.9996805), 'index': 6, 'word': '.', 'start': 26, 'end': 27}]
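
The output uses generic names such as LABEL_5 because the pipeline was built from `model`, which was created with only `num_labels=9`; the label names were written to `ner_model/config.json` afterwards. Passing `model_fine_tuned` instead would presumably show the tag names directly. As an alternative sketch, the generic names can be mapped back after the fact. `readable_entities` is a hypothetical helper, and `raw` is a trimmed, hand-copied subset of the output above with scores omitted.

```python
label_list = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG",
              "B-LOC", "I-LOC", "B-MISC", "I-MISC"]


def readable_entities(results, labels=label_list, skip="O"):
    """Replace LABEL_<n> with its tag name and drop tokens tagged `skip`."""
    cleaned = []
    for item in results:
        index = int(item["entity"].rsplit("_", 1)[1])  # "LABEL_5" -> 5
        tag = labels[index]
        if tag != skip:
            cleaned.append({"word": item["word"], "entity": tag,
                            "start": item["start"], "end": item["end"]})
    return cleaned


# Trimmed copy of the pipeline output above (scores omitted).
raw = [
    {"entity": "LABEL_5", "word": "Delhi", "start": 0, "end": 5},
    {"entity": "LABEL_0", "word": "is", "start": 6, "end": 8},
    {"entity": "LABEL_5", "word": "India", "start": 20, "end": 25},
]
print(readable_entities(raw))
```

After mapping, "Delhi" and "India" are both reported as B-LOC, and the non-entity token "is" is dropped.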
