# Finetuning a sentence similarity model

[SBERT](https://sbert.net) offers a framework for calculating similarities between sentences (longer texts can be used as well, thanks to the increased context length of newer models). Many excellent models already exist and can be found on the [MTEB](https://huggingface.co/spaces/mteb/leaderboard) leaderboard.

To show how well finetuning works, we deliberately start with an unsuitable model, again using ModernBERT. After finetuning, the results improve considerably: the Spearman correlation on the test split rises from about 0.16 to about 0.79.

# Load the dataset

In [1]:
from datasets import load_dataset

In [2]:
train_dataset = load_dataset("sentence-transformers/all-nli", "pair-score", split="train")

Inspect

In [3]:
train_dataset

Dataset({
    features: ['sentence1', 'sentence2', 'score'],
    num_rows: 942069
})

First row

In [4]:
train_dataset[0]

{'sentence1': 'A person on a horse jumps over a broken down airplane.',
 'sentence2': 'A person is training his horse for a competition.',
 'score': 0.5}

The score looks odd at first glance, so take a closer look at more rows.

In [5]:
import pandas as pd
pd.set_option('display.max_colwidth', None)
df = train_dataset.to_pandas()
df.head()

| | sentence1 | sentence2 | score |
|---|---|---|---|
| 0 | A person on a horse jumps over a broken down airplane. | A person is training his horse for a competition. | 0.5 |
| 1 | A person on a horse jumps over a broken down airplane. | A person is at a diner, ordering an omelette. | 0.0 |
| 2 | A person on a horse jumps over a broken down airplane. | A person is outdoors, on a horse. | 1.0 |
| 3 | Children smiling and waving at camera | They are smiling at their parents | 0.5 |
| 4 | Children smiling and waving at camera | There are children present | 1.0 |


How many different scores are present?

In [6]:
df["score"].value_counts()

score
1.0    314315
0.0    314090
0.5    313664
Name: count, dtype: int64
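The three score values line up with the three NLI classes. The sketch below assumes the mapping contradiction = 0.0, neutral = 0.5, entailment = 1.0, which is consistent with the first rows shown above but is not stated in this notebook. The name `SCORE_TO_LABEL` is made up for illustration; the counts are copied from the `value_counts()` output.

```python
# Assumed mapping from similarity score back to the NLI class it stands for.
SCORE_TO_LABEL = {0.0: "contradiction", 0.5: "neutral", 1.0: "entailment"}

# Counts copied from the value_counts() output above.
score_counts = {1.0: 314315, 0.0: 314090, 0.5: 313664}

total = sum(score_counts.values())
shares = {score: count / total for score, count in score_counts.items()}

# Print one line per score, lowest first.
for score in sorted(score_counts):
    print(f"{score:.1f} ({SCORE_TO_LABEL[score]}): "
          f"{score_counts[score]} rows, {shares[score]:.1%}")
```

The classes are almost perfectly balanced, so a model cannot score well by favouring one class.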

Take a look at the sample!

In [7]:
df.sample(20, random_state=42)

| | sentence1 | sentence2 | score |
|---|---|---|---|
| 762721 | you had to do some work and | you had to do a lot of work for the army. | 0.5 |
| 674347 | The others were grouped together at a little distance away. | There was a group of people at a short distance. | 1.0 |
| 2362 | A small child in blue shirt and blue jeans is running through a wooded path over dry leaves. | A child is running over leaves. | 1.0 |
| 913903 | Wood Quay, on the south bank of the river (downhill from the arch and dominated by the featureless offices of the Dublin Corporation), is the site of the original Viking settlement and its clever recreation, The Viking Adventure (Tuesday Saturday 10am 4:30pm, closed Sunday and Monday; adults IRa4. | The original Viking settlement and its modern-day recreation are located on Wood Quay, downhill from the arch. | 1.0 |
| 204130 | An older man smiling and holding twine. | an older man is going to tie a knot | 0.5 |
| 220223 | A group of people in life jackets standing on a large boulder in the mountains. | A group of people are about to go swimming. | 0.5 |
| 277414 | Two men enjoying a beer together. | The two men are drunk. | 0.5 |
| 788967 | The appeals court rapped the agency for its scare tactics, saying it must base its conclusions on solid facts and a realistic appraisal of the danger rather than on vague fears extrapolated beyond any foreseeable threat. | The court supported the agency and thought they way they dealt with things was friendly. | 0.0 |
| 722422 | Texas yeah i'm in California right now i'm i'm originally from North Carolina though i | I love California more than I love North Carolina. | 0.5 |
| 791949 | The clothes didn't fit. | The dress didn't fit. | 0.5 |


# Create a SBERT model based on ModernBERT

In [8]:
from sentence_transformers import SentenceTransformer

In [9]:
model_name = "answerdotai/ModernBERT-base"
model = SentenceTransformer(model_name, device="cuda")

No sentence-transformers model found with name answerdotai/ModernBERT-base. Creating a new one with mean pooling.
Flash Attention 2 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in ModernBertModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", dtype=torch.float16)`


In [10]:
model.to('cuda')

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
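The module list above shows `pooling_mode_mean_tokens: True`, meaning the sentence embedding is the average of the token embeddings. Below is a minimal sketch of that idea in plain Python, assuming padding tokens are excluded through an attention mask. The function `mean_pool` and the toy vectors are illustrative and are not the library's implementation.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1; padding is ignored."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    n_real = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            n_real += 1
            for i, value in enumerate(vector):
                summed[i] += value
    return [s / n_real for s in summed]

# Two real tokens and one padding token (toy 2-dimensional vectors).
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)  # [2.0, 3.0]
```

A base model that was never trained for this kind of pooling has no reason to produce meaningful averaged vectors, which is one plausible explanation for the weak scores before finetuning.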

We need a loss function that tells us how well our model rates the sentence pairs compared to the ground truth.

In [11]:
from sentence_transformers.losses import CoSENTLoss
loss = CoSENTLoss(model)
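To give an intuition for what CoSENT optimises, here is a minimal plain-Python sketch of a CoSENT-style ranking loss. It only looks at the order of the pairs: whenever a pair with a lower gold score receives a higher cosine similarity than a pair with a higher gold score, the loss grows. The function name `cosent_loss`, the toy numbers, and the `scale=20.0` value are assumptions for illustration; this is not the library's code.

```python
import math

def cosent_loss(cos_sims, gold_scores, scale=20.0):
    """log(1 + sum of exp(scale * (cos_i - cos_j)) over all i, j with gold_i < gold_j)."""
    terms = [0.0]  # exp(0) supplies the "1 +" inside the logarithm
    for i, gold_i in enumerate(gold_scores):
        for j, gold_j in enumerate(gold_scores):
            if gold_i < gold_j:
                terms.append(scale * (cos_sims[i] - cos_sims[j]))
    # Numerically stable log-sum-exp.
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

gold = [0.0, 0.5, 1.0]
well_ordered = [0.1, 0.5, 0.9]   # predicted similarities follow the gold order
badly_ordered = [0.9, 0.5, 0.1]  # predicted similarities reverse the gold order

loss_good = cosent_loss(well_ordered, gold)
loss_bad = cosent_loss(badly_ordered, gold)
print(loss_good, loss_bad)
```

Because only the ordering matters, the coarse 0.0 / 0.5 / 1.0 scores of this dataset are sufficient as a training signal.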

SBERT comes with an integrated training (i.e. finetuning) framework.

In [12]:
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers
args = SentenceTransformerTrainingArguments(
    # Required parameter:
    output_dir="models/ModernSBERT",
    # Optional training parameters:
    num_train_epochs=1,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    fp16=False,
    bf16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # losses that use "in-batch negatives" benefit from no duplicates
)
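To get a feel for what these arguments imply, the sketch below estimates the number of optimiser steps and the learning rate during warmup. It assumes a single GPU, no gradient accumulation, and a linear warmup followed by linear decay; the `NO_DUPLICATES` sampler may change the exact batch count slightly. All variable names and the helper `lr_at` are illustrative.

```python
import math

num_rows = 942069      # size of the training split shown earlier
batch_size = 16        # per_device_train_batch_size
num_epochs = 1         # num_train_epochs
warmup_ratio = 0.1
peak_lr = 2e-5         # learning_rate

steps_per_epoch = math.ceil(num_rows / batch_size)
total_steps = steps_per_epoch * num_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

def lr_at(step):
    """Assumed schedule: linear warmup to peak_lr, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

print(total_steps, warmup_steps)
print(lr_at(warmup_steps // 2), lr_at(warmup_steps), lr_at(total_steps))
```

With tens of thousands of steps for a single epoch, it is clear why the full run does not fit into a live session.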

We need evaluators that measure similarity quality on the held-out splits of the dataset (dev and test).

In [13]:
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator, SimilarityFunction

# Load the dev split of all-nli (https://huggingface.co/datasets/sentence-transformers/all-nli)
eval_dataset = load_dataset("sentence-transformers/all-nli", "pair-score", split="dev")

# Initialize the evaluator
dev_evaluator = EmbeddingSimilarityEvaluator(
    sentences1=eval_dataset["sentence1"],
    sentences2=eval_dataset["sentence2"],
    scores=eval_dataset["score"],
    main_similarity=SimilarityFunction.COSINE,
    name="all-nli-dev",
)
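The evaluator is configured with `SimilarityFunction.COSINE`, so each sentence pair is scored by the cosine of the angle between its two embeddings. A minimal sketch in plain Python, with made-up 3-dimensional vectors standing in for real embeddings (the helper `cosine_similarity` is illustrative):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings", invented for illustration only.
emb_horse = [0.9, 0.1, 0.3]
emb_rider = [0.8, 0.2, 0.4]
emb_omelette = [-0.2, 0.9, -0.1]

sim_close = cosine_similarity(emb_horse, emb_rider)
sim_far = cosine_similarity(emb_horse, emb_omelette)
print(sim_close, sim_far)
```

Cosine similarity ignores vector length and only compares direction, so values range from -1 to 1.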

In [14]:
test_dataset = load_dataset("sentence-transformers/all-nli", "pair-score", split="test")

In [15]:
test_evaluator = EmbeddingSimilarityEvaluator(
    sentences1=test_dataset["sentence1"],
    sentences2=test_dataset["sentence2"],
    scores=test_dataset["score"],
    main_similarity=SimilarityFunction.COSINE,
)

Check how well the model performs on the test dataset before finetuning

In [16]:
test_evaluator(model)

{'pearson_cosine': 0.13668265700537993, 'spearman_cosine': 0.16279784846295933}
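The two numbers are correlations between the predicted cosine similarities and the gold scores: Pearson measures linear agreement, Spearman measures agreement of the rankings. The sketch below implements both in plain Python on made-up numbers, assuming tied values receive their average rank. The helpers `pearson`, `average_ranks`, and `spearman` are illustrative and not the evaluator's code.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up gold scores and predicted cosine similarities.
gold = [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]
predicted = [0.10, 0.30, 0.20, 0.60, 0.70, 0.95]

print(pearson(gold, predicted), spearman(gold, predicted))
```

Since the gold scores take only three distinct values, ties are everywhere, and the rank-based Spearman value is usually the more informative of the two.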

Run the training (takes too long in the live course)

In [17]:
from sentence_transformers import SentenceTransformerTrainer
trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
    evaluator=dev_evaluator,
)
trainer.train()

KeyboardInterrupt: 

In [None]:
model.save_pretrained("ModernSBERT-base-all-nli/final")

Load the saved model and run the evaluator again on the finetuned model

In [18]:
model = SentenceTransformer("./ModernSBERT-base-all-nli/final")

In [19]:
test_evaluator(model)

{'pearson_cosine': 0.7533082839438638, 'spearman_cosine': 0.7905991891313705}