# Fine-Tuning an Embedding Model

The sentence-transformers framework allows nearly any embedding model to
serve as a base for fine-tuning. We can pick an embedding model that was
already trained on a large amount of data and fine-tune it on our own data
or for a specific purpose.

### Supervised

In [1]:
from datasets import load_dataset
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator



In [2]:
# Load the first 50,000 examples of the MNLI training split
train_dataset = load_dataset(
    "glue", "mnli", split="train"
).select(range(50000))

# Drop the index column; it is not an input to the model
train_dataset = train_dataset.remove_columns("idx")

In [3]:
# Create an embedding similarity evaluator on the STS-B validation set
val_sts = load_dataset(
    "glue", "stsb", split="validation"
)
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=val_sts["sentence1"],
    sentences2=val_sts["sentence2"],
    # Rescale the gold similarity scores from [0, 5] to [0, 1]
    scores=[score / 5 for score in val_sts["label"]],
    main_similarity="cosine",
)
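
To make the evaluator's metrics concrete: it compares the cosine similarity of each sentence pair's embeddings against the gold score and reports the Pearson and Spearman correlations (the `pearson_cosine` and `spearman_cosine` keys in the final output). The sketch below computes both correlations in plain Python. The four similarity values and gold scores are invented for illustration, and the helper functions are ours, not part of the library.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical cosine similarities predicted by a model for four pairs
predicted = [0.9, 0.2, 0.6, 0.4]
# Hypothetical gold scores on the 0-5 STS-B scale, rescaled to [0, 1]
gold = [score / 5 for score in [4.5, 0.5, 3.0, 2.5]]

pearson_cosine = pearson(predicted, gold)
spearman_cosine = spearman(predicted, gold)
print(pearson_cosine, spearman_cosine)
```

Spearman only looks at the ordering of the pairs, so it rewards a model that ranks pairs correctly even when its similarity values are not linearly related to the gold scores.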

In [5]:
from sentence_transformers import losses, SentenceTransformer
from sentence_transformers.trainer import SentenceTransformerTrainer
from sentence_transformers.training_args import SentenceTransformerTrainingArguments



In [6]:
# Define the model: start from a pretrained embedding model

embedding_model = SentenceTransformer(
    "sentence-transformers/all-MiniLM-L6-v2"
)


# Define the loss function
train_loss = losses.MultipleNegativesRankingLoss(model=embedding_model)
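
As a rough picture of this loss: for each anchor in a batch, its own positive is the correct answer and every other positive in the batch serves as a negative. The loss is a cross-entropy over scaled cosine similarities. The sketch below is a minimal plain-Python version of that idea, not the library's implementation. It assumes a scale of 20 (the library's documented default) and uses hand-written 2-D vectors in place of real embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]."""
    per_anchor = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over all candidates in the batch
        top = max(logits)
        log_z = top + math.log(sum(math.exp(x - top) for x in logits))
        per_anchor.append(log_z - logits[i])
    return sum(per_anchor) / len(per_anchor)

# Toy "embeddings": each anchor matches its own positive exactly
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = mnr_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])
# Swapping the positives makes every anchor closest to the wrong candidate
swapped_loss = mnr_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])
print(aligned_loss, swapped_loss)
```

Because the negatives come from the rest of the batch, larger batches give each anchor more candidates to discriminate against.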




In [7]:
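# Define the training arguments.
# per_device_train_batch_size is not set, so the default of 8 applies
# (50,000 examples / 8 = 6,250 steps, matching the training output).

args = SentenceTransformerTrainingArguments(
    output_dir="finetuned_embedding_model",
    num_train_epochs=1,
    per_device_eval_batch_size=32,
    warmup_steps=100,
    fp16=True,
    logging_steps=100,
    eval_steps=100,  # only takes effect when step-based evaluation is enabled
    report_to="none",  # disable WandB integration
)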

In [8]:
# Train the model
trainer = SentenceTransformerTrainer(
    model=embedding_model,
    args=args,
    train_dataset=train_dataset,
    loss=train_loss,
    evaluator=evaluator,
)
trainer.train()

Column 'hypothesis' is at index 1, whereas a column with this name is usually expected at index 0. Note that the column order can be important for some losses, e.g. MultipleNegativesRankingLoss will always consider the first column as the anchor and the second as the positive, regardless of the dataset column names. Consider renaming the columns to match the expected order, e.g.:
dataset = dataset.select_columns(['hypothesis', 'entailment', 'contradiction'])


{'loss': 0.0757, 'grad_norm': 5.261423110961914, 'learning_rate': 4.9500000000000004e-05, 'epoch': 0.02}
{'loss': 0.0661, 'grad_norm': 1.385972499847412, 'learning_rate': 4.9195121951219514e-05, 'epoch': 0.03}
{'loss': 0.058, 'grad_norm': 0.7234396934509277, 'learning_rate': 4.8382113821138216e-05, 'epoch': 0.05}
{'loss': 0.0541, 'grad_norm': 0.018010197207331657, 'learning_rate': 4.756910569105692e-05, 'epoch': 0.06}
{'loss': 0.0423, 'grad_norm': 7.422886371612549, 'learning_rate': 4.675609756097561e-05, 'epoch': 0.08}
{'loss': 0.0541, 'grad_norm': 0.20211006700992584, 'learning_rate': 4.594308943089431e-05, 'epoch': 0.1}
{'loss': 0.0478, 'grad_norm': 0.0816231444478035, 'learning_rate': 4.513008130081301e-05, 'epoch': 0.11}
{'loss': 0.0689, 'grad_norm': 0.10752610117197037, 'learning_rate': 4.431707317073171e-05, 'epoch': 0.13}
{'loss': 0.0438, 'grad_norm': 12.70060920715332, 'learning_rate': 4.350406504065041e-05, 'epoch': 0.14}
{'loss': 0.0508, 'grad_norm': 2.2857697010040283, 'lea

TrainOutput(global_step=6250, training_loss=0.04726924160003662, metrics={'train_runtime': 893.9164, 'train_samples_per_second': 55.934, 'train_steps_per_second': 6.992, 'total_flos': 0.0, 'train_loss': 0.04726924160003662, 'epoch': 1.0})
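
The learning rates in the log above are consistent with a linear warmup over the 100 `warmup_steps` followed by a linear decay to zero at step 6,250. The sketch below reproduces the first three logged values. It assumes a peak learning rate of 5e-5 (not set in the arguments, so presumably the library default). It also assumes that the value logged at step k reflects k - 1 scheduler updates; that offset is inferred from the numbers, not stated anywhere in the notebook.

```python
def linear_warmup_decay_lr(step, base_lr=5e-5, warmup_steps=100, total_steps=6250):
    """Learning rate after `step` scheduler updates.

    Rises linearly from 0 to base_lr during warmup, then falls
    linearly back to 0 at total_steps.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)

# Compare against the first three logged learning rates
for logged_step in (100, 200, 300):
    print(logged_step, linear_warmup_decay_lr(logged_step - 1))
```

Under these assumptions the peak of 5e-5 is reached right after the first log line, which is why every logged value sits slightly below it.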

In [9]:
# Evaluate the fine-tuned model on the STS-B validation set
evaluator(embedding_model)

{'pearson_cosine': 0.8263247584053353, 'spearman_cosine': 0.8289087350030249}