This notebook demonstrates a fine-tuned DistilBERT model that identifies mountain names in text.<br>
It loads the trained model, runs predictions on example sentences, and evaluates performance on a test dataset.

Before running the notebook, install the required libraries if they are not already installed:<br>
torch, transformers, datasets, seqeval<br>
You can do this by running the following command in a console: <i>pip install torch transformers datasets seqeval</i>

Importing all necessary libraries

In [1]:
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
from seqeval.metrics import classification_report

Loading model and tokenizer

In [2]:
id2label = {0: "O", 1: "B-MOUNTAIN", 2: "I-MOUNTAIN"}
label2id = {"O": 0, "B-MOUNTAIN": 1, "I-MOUNTAIN": 2}

tokenizer = AutoTokenizer.from_pretrained("./mountain-ner-model")
model = AutoModelForTokenClassification.from_pretrained(
    "./mountain-ner-distilbert",
    id2label=id2label,
    label2id=label2id
)

ner_pipeline = pipeline(
    "ner",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
    device=0,  # Run on the first GPU so that evaluation is much faster
)

Device set to use cuda:0
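
The pipeline above hard-codes `device=0`, which assumes a CUDA GPU is present. Below is a minimal sketch of a fallback for CPU-only machines. `pick_device` is a hypothetical helper and is not part of the original notebook.

```python
def pick_device() -> int:
    """Return 0 (first CUDA GPU) if one is available, otherwise -1 (CPU)."""
    try:
        import torch
    except ImportError:
        # torch is not installed; fall back to the CPU index.
        return -1
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print(f"Using device index {device}")
```

The result could then be passed as `device=pick_device()` when building the pipeline.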


Let's see how the model performs on some basic examples

In [3]:
texts_with_mountains = [
    "I climbed up Hoverla last year.",
    "Mount Everest is the highest mountain in the world",
    "You need to visit Mont Blanc at least once in your life!",
    "After weeks of planning, the team finally reached Mount Aspiring, where the sunrise painted the snow in shades of gold.",
    "The legends told by the locals say that spirits once guarded Kangchenjunga, protecting travelers from avalanches.",
    "Even from miles away, the shadow of Eiger dominates the skyline, warning climbers of its treacherous north face.",
    "From the frozen slopes of Elbrus, you can glimpse Kazbek standing solemnly beyond the valleys, like a silent rival.",
    "Storms rolled between Dhaulagiri and Annapurna, lightning striking the peaks as though the mountains themselves were arguing.",
    "The route connecting Mont Blanc, Gran Paradiso, and Monte Rosa forms a treacherous triangle feared even by seasoned alpinists.",
    "As the expedition passed Everest, Lhotse, and Makalu, the climbers realized how small human ambition seems among giants of stone and snow."
]
texts_without_mountains = [
    "Thick fog descended upon the valley, muffling every sound except the distant rush of a river.",
    "The expedition journal was found years later, half-buried under layers of ice and rock.",
    "I always wanted to visit Paris.",
    "Even the most experienced guides hesitated before crossing that narrow, wind-beaten ridge.",
    "A faint echo of laughter carried through the canyon, blending with the cry of an unseen eagle.",
    "The landscape looked almost lunar — gray dust, scattered stones, and a silence too vast to describe."
]

In [4]:
predictions1 = ner_pipeline(texts_with_mountains, batch_size = 10)
predictions2 = ner_pipeline(texts_without_mountains, batch_size = 6)

In [5]:
print("Predictions of model on sentences with mountains:\n")
for text, entities in zip(texts_with_mountains, predictions1):
    print(text)
    for entity in entities:
        print(entity)
    print()

Predictions of model on sentences with mountains:

I climbed up Hoverla last year.
{'entity_group': 'MOUNTAIN', 'score': 0.9999984, 'word': 'Hoverla', 'start': 13, 'end': 20}

Mount Everest is the highest mountain in the world
{'entity_group': 'MOUNTAIN', 'score': 0.99999726, 'word': 'Mount Everest', 'start': 0, 'end': 13}

You need to visit Mont Blanc at least once in your life!
{'entity_group': 'MOUNTAIN', 'score': 0.99998957, 'word': 'Mont Blanc', 'start': 18, 'end': 28}

After weeks of planning, the team finally reached Mount Aspiring, where the sunrise painted the snow in shades of gold.
{'entity_group': 'MOUNTAIN', 'score': 0.99999774, 'word': 'Mount Aspiring', 'start': 50, 'end': 64}

The legends told by the locals say that spirits once guarded Kangchenjunga, protecting travelers from avalanches.
{'entity_group': 'MOUNTAIN', 'score': 0.9999983, 'word': 'Kangchenjunga', 'start': 61, 'end': 74}

{'entity_group': 'MOUNTAIN', 'score': 0.99999774, 'word': 'Eiger', 'start': 36, 'end':
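
Each prediction carries `start` and `end` character offsets into the input sentence. The sketch below shows how these offsets can be used to recover the matched text, using two predictions copied from the output above. `extract_spans` is a hypothetical helper introduced only for illustration.

```python
def extract_spans(text, entities):
    """Return the substring of `text` covered by each predicted entity."""
    return [text[e["start"]:e["end"]] for e in entities]


# Two predictions copied from the printed output above.
examples = [
    ("I climbed up Hoverla last year.",
     [{"entity_group": "MOUNTAIN", "word": "Hoverla", "start": 13, "end": 20}]),
    ("Mount Everest is the highest mountain in the world",
     [{"entity_group": "MOUNTAIN", "word": "Mount Everest", "start": 0, "end": 13}]),
]

for text, entities in examples:
    print(extract_spans(text, entities))
```

Slicing by offsets returns the text exactly as it appears in the input, which is useful when the `word` field has been normalized by the tokenizer.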

In [6]:
print("Predictions of model on sentences without mountains:\n")
for text, entities in zip(texts_without_mountains, predictions2):
    print(text)
    print(entities)
    print()

Predictions of model on sentences without mountains:

Thick fog descended upon the valley, muffling every sound except the distant rush of a river.
[]

The expedition journal was found years later, half-buried under layers of ice and rock.
[]

I always wanted to visit Paris.
[]

Even the most experienced guides hesitated before crossing that narrow, wind-beaten ridge.
[]

A faint echo of laughter carried through the canyon, blending with the cry of an unseen eagle.
[]

The landscape looked almost lunar — gray dust, scattered stones, and a silence too vast to describe.
[]



Now let's see how the model performs on the test dataset

In [7]:
test_data = load_dataset("json", data_files="test.jsonl")["train"]
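
The evaluation code in this notebook reads a `text` field and an `entities` list whose items have `label`, `start`, and `end` keys. The sketch below shows what one line of `test.jsonl` might look like under that assumption. The sentence, the offsets, and the `MOUNTAIN` label value are made up for illustration and are not taken from the actual dataset.

```python
import json

# Hypothetical record; field names are inferred from how the notebook
# accesses ex["text"], ex["entities"], e["label"], e["start"], e["end"].
line = json.dumps({
    "text": "We camped below Mount Elbrus.",
    "entities": [{"label": "MOUNTAIN", "start": 16, "end": 28}],
})

record = json.loads(line)
for e in record["entities"]:
    # Print the label and the text covered by the character span.
    print(e["label"], record["text"][e["start"]:e["end"]])
```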

In [8]:
def to_entity_list(entities):
    """Convert a list of span dicts into a list of (label, start, end) tuples."""
    return [(e["label"], e["start"], e["end"]) for e in entities]

def to_seqeval_format(entities_batch):
    """Return, for each text, the list of its entity labels (the input passed to seqeval)."""
    # Each text -> list of entity labels; the start and end offsets are dropped
    return [[e[0] for e in to_entity_list(ents)] for ents in entities_batch]

In [9]:
texts = [ex["text"] for ex in test_data]
true_entities = [ex["entities"] for ex in test_data]

predictions = ner_pipeline(texts, batch_size=16)
# For each text, keep only the entity labels (character offsets are not compared here)
y_true = to_seqeval_format(true_entities)
y_pred = [[p["entity_group"] for p in preds] for preds in predictions]

print(classification_report(y_true, y_pred, digits=5))



              precision    recall  f1-score   support

     OUNTAIN    1.00000   1.00000   1.00000     12840

   micro avg    1.00000   1.00000   1.00000     12840
   macro avg    1.00000   1.00000   1.00000     12840
weighted avg    1.00000   1.00000   1.00000     12840
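
The report above is computed from per-text lists of entity labels, so it does not compare where each entity starts and ends. (seqeval expects IOB-style tags such as `B-MOUNTAIN`; the `OUNTAIN` row is probably the result of it treating the leading `M` of the bare label as a tag prefix.) A stricter, span-level check is sketched below. It counts a prediction as correct only if its label, start, and end all match exactly. `span_prf` and the toy data are hypothetical and are not part of the original notebook.

```python
def span_prf(true_batch, pred_batch):
    """Micro-averaged precision, recall, F1 over exact (label, start, end) matches."""
    tp = fp = fn = 0
    for true_spans, pred_spans in zip(true_batch, pred_batch):
        true_set, pred_set = set(true_spans), set(pred_spans)
        tp += len(true_set & pred_set)   # spans found with exact boundaries
        fp += len(pred_set - true_set)   # predicted spans with no exact match
        fn += len(true_set - pred_set)   # gold spans that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data: the second text has one correct span and one with a wrong end offset.
y_true_spans = [[("MOUNTAIN", 13, 20)], [("MOUNTAIN", 0, 13), ("MOUNTAIN", 30, 35)]]
y_pred_spans = [[("MOUNTAIN", 13, 20)], [("MOUNTAIN", 0, 13), ("MOUNTAIN", 30, 36)]]

print(span_prf(y_true_spans, y_pred_spans))
```

With the notebook's variables, the gold spans could come from `to_entity_list(ents)` and the predicted spans from `(p["entity_group"], p["start"], p["end"])`, assuming the gold offsets use the same character indexing as the pipeline output.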



You can also try the model on your own sentence in the next cell

In [10]:
text = "Type your own sentence here"
ner_pipeline(text)

[]

Summary<br>
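- The fine-tuned DistilBERT model recognizes the mountain names in the example sentences whose output is shown above, each with a score above 0.9999.
- It returns no entities for the six example sentences without mountains, including one that names a city (Paris).
- On the test dataset, the classification report shows precision, recall, and F1-score of 1.00000 (support: 12840), which suggests it generalizes well to unseen text.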