## Loss functions
Rather than surveying every available loss function, this section focuses on
two that are widely used and tend to perform well across tasks:

- Cosine similarity loss
- Multiple negatives ranking (MNR) loss

### Cosine similarity loss

The cosine similarity loss is intuitive and easy to use, and it works well
across many different use cases and datasets. It is typically used in semantic
textual similarity tasks, where each pair of texts is assigned a similarity
score and the model is optimized so that the cosine similarity of the two
embeddings matches that score.

Instead of strictly positive or negative pairs of sentences, the loss assumes
pairs of sentences that are similar or dissimilar to a certain degree.

 ![image1.png](<image1.png>)
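To make the objective concrete, here is a minimal pure-Python sketch of the idea: compute the cosine similarity of each embedding pair and take the mean squared error against the target score. The helper names and the toy 2-d vectors are illustrative assumptions, not part of the library, and the actual `CosineSimilarityLoss` operates on batched tensors produced by the model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_similarity_loss(pairs, labels):
    """Mean squared error between cosine similarities and target scores."""
    errors = [(cosine_similarity(u, v) - y) ** 2 for (u, v), y in zip(pairs, labels)]
    return sum(errors) / len(errors)

# Toy "embeddings" (hypothetical values): the first pair points in the same
# direction, the second pair is orthogonal.
pairs = [([1.0, 0.0], [2.0, 0.0]), ([1.0, 0.0], [0.0, 3.0])]

loss_good = cosine_similarity_loss(pairs, [1.0, 0.0])  # labels agree with the geometry
loss_bad = cosine_similarity_loss(pairs, [0.0, 1.0])   # labels contradict the geometry
print(loss_good, loss_bad)
```

When the labels agree with the geometry of the embeddings the loss is zero; when they contradict it the loss is large, which is the signal that pushes the model to move the embeddings.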

In [3]:
# Cosine similarity loss expects one similarity score per pair, so collapse the
# three MNLI labels into two: 1.0 = entailment, 0.0 = neutral or contradiction.
# Requires: pip install datasets

from datasets import Dataset, load_dataset

# Load the first 50,000 MNLI training examples
# Original labels: 0 = entailment, 1 = neutral, 2 = contradiction
train_dataset = load_dataset("glue", "mnli", split="train").select(range(50000))
train_dataset = train_dataset.remove_columns("idx")

# Map the original labels to binary similarity scores
mapping = {2: 0, 1: 0, 0: 1}

train_dataset = Dataset.from_dict({
    "sentence1": train_dataset["premise"],
    "sentence2": train_dataset["hypothesis"],
    "label": [float(mapping[label]) for label in train_dataset["label"]]
})



In [4]:
# For evaluation we use the Semantic Textual Similarity Benchmark (STSB)

from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

# Create an embedding similarity evaluator for the STSB validation split
val_sts= load_dataset("glue", "stsb", split="validation")

evaluator= EmbeddingSimilarityEvaluator(
    sentences1= val_sts["sentence1"],
    sentences2= val_sts["sentence2"],
    # STSB labels range from 0 to 5; divide by 5 to normalize them to [0, 1]
    scores=[score/5 for score in val_sts["label"]],
    main_similarity= "cosine",
)


In [6]:
from sentence_transformers import SentenceTransformer, losses
from sentence_transformers.trainer import SentenceTransformerTrainer
from sentence_transformers.training_args import SentenceTransformerTrainingArguments

#define model
embedding_model= SentenceTransformer("bert-base-uncased")

#loss function
train_loss= losses.CosineSimilarityLoss(model=embedding_model)

# Training arguments. per_device_train_batch_size is left at its default (8),
# which matches the 6,250 steps reported below for 50,000 examples.
args= SentenceTransformerTrainingArguments(
    output_dir="cosineloss_embedding_model",
    num_train_epochs= 1,
    per_device_eval_batch_size=32,
    warmup_steps= 100,
    logging_steps= 100,
    eval_steps= 100,
    deepspeed=None # Explicitly disable deepspeed
)

#train model

trainer= SentenceTransformerTrainer(
    model=embedding_model,
    args= args,
    train_dataset= train_dataset,
    loss= train_loss,
    evaluator= evaluator
)
trainer.train()





Step,Training Loss
100,0.2465
200,0.1928
300,0.1897
400,0.1773
500,0.1772
600,0.1649
700,0.1747
800,0.1682
900,0.1678
1000,0.1752


TrainOutput(global_step=6250, training_loss=0.15996440284729005, metrics={'train_runtime': 1189.4861, 'train_samples_per_second': 42.035, 'train_steps_per_second': 5.254, 'total_flos': 0.0, 'train_loss': 0.15996440284729005, 'epoch': 1.0})

In [7]:
evaluator(embedding_model)

{'pearson_cosine': np.float64(0.7049089473911252),
 'spearman_cosine': np.float64(0.7064982342649344)}
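
The two values reported by the evaluator are correlations between the model's cosine similarities and the human-annotated STSB scores: Pearson measures linear agreement, Spearman measures agreement in ranking. The sketch below shows the difference on made-up numbers; the helper functions and the toy `predicted` and `gold` lists are assumptions for illustration and are not the evaluator's own implementation.

```python
import math

def pearson(x, y):
    """Pearson correlation: linear agreement between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical cosine similarities and normalized gold scores for four pairs
predicted = [0.10, 0.35, 0.40, 0.90]
gold = [0.0, 0.2, 0.6, 1.0]

pearson_score = pearson(predicted, gold)
spearman_score = spearman(predicted, gold)
print(pearson_score, spearman_score)
```

Because the toy predictions order the pairs exactly as the gold scores do, the Spearman correlation is 1.0 even though the Pearson correlation is lower. This is why Spearman is often the preferred headline metric for embedding similarity: only the ranking of pairs matters.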

### Multiple negatives ranking loss

Multiple negatives ranking (MNR) loss, often referred to as InfoNCE or
NTXentLoss, is a loss that uses either positive pairs of sentences or triplets
that contain a pair of positive sentences and an additional unrelated
sentence. This unrelated sentence is called a negative and serves as an example
that should end up far from the positive pair in embedding space.

For example, you might have pairs of question/answer, image/image caption,
paper title/paper abstract, and so on. The great thing about these pairs is
that we can be confident they are hard positive pairs. In MNR loss, negative
pairs are constructed by mixing a positive pair with another positive pair. In
the example of a paper title and abstract, you would generate a negative pair
by combining the title of a paper with a completely different abstract. These
negatives are called in-batch negatives and can also be used to generate the
triplets.


 ![image2.png](<image2.png>)
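
The in-batch negative idea can be sketched in a few lines of pure Python: for each anchor, score it against every positive in the batch, treat the matching positive as the correct class, and apply cross-entropy. The function names and toy vectors are illustrative assumptions. The scale factor of 20 and the use of cosine similarity mirror what I understand to be the defaults of `MultipleNegativesRankingLoss`, but treat those as assumptions as well.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mnr_loss(anchors, positives, scale=20.0):
    """Cross-entropy over scaled similarities.

    positives[i] is the correct match for anchors[i]; every other positive
    in the batch acts as an in-batch negative for that anchor.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine_similarity(anchor, p) for p in positives]
        # Numerically stable log-sum-exp
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy batch of two pairs (hypothetical embeddings)
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]

aligned = mnr_loss(anchors, positives)        # each anchor is closest to its own positive
swapped = mnr_loss(anchors, positives[::-1])  # each anchor is closest to the wrong positive
print(aligned, swapped)
```

The loss is close to zero when every anchor is most similar to its own positive and grows large when an in-batch negative scores higher. With triplets, the explicit negatives simply become additional candidates in the same comparison.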



In [2]:
import random
from tqdm import tqdm
from datasets import Dataset, load_dataset

# Load the first 50,000 MNLI training examples and keep only entailment pairs (label 0)
mnli= load_dataset("glue", "mnli", split= "train").select(range(50000))

mnli= mnli.remove_columns("idx")
mnli= mnli.filter(lambda x: x["label"] == 0)

# Prepare the data and add a soft negative to each pair.
# train_dataset starts as a dictionary holding the anchor, positive, and negative columns.
train_dataset= {"anchor": [], "positive": [], "negative": []}
# Copy into a plain list so that random.shuffle can reorder it in place
soft_negatives= list(mnli["hypothesis"])
random.shuffle(soft_negatives)

for row, soft_negative in tqdm(zip(mnli, soft_negatives)):
  train_dataset["anchor"].append(row["premise"])
  train_dataset["positive"].append(row["hypothesis"])
  train_dataset["negative"].append(soft_negative)

# Convert the dictionary to a Dataset object
train_dataset= Dataset.from_dict(train_dataset)

16875it [00:01, 13895.77it/s]


Keeping only the entailment pairs reduces the dataset from 50,000 to 16,875 examples.

In [3]:
# For evaluation we use the Semantic Textual Similarity Benchmark (STSB)

from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

# Create an embedding similarity evaluator for the STSB validation split
val_sts= load_dataset("glue", "stsb", split="validation")

evaluator= EmbeddingSimilarityEvaluator(
    sentences1= val_sts["sentence1"],
    sentences2= val_sts["sentence2"],
    # STSB labels range from 0 to 5; divide by 5 to normalize them to [0, 1]
    scores=[score/5 for score in val_sts["label"]],
    main_similarity= "cosine",
)

In [4]:
from sentence_transformers import SentenceTransformer, losses
from sentence_transformers.trainer import SentenceTransformerTrainer
from sentence_transformers.training_args import SentenceTransformerTrainingArguments

#define model
embedding_model= SentenceTransformer("bert-base-uncased")

#loss function
train_loss= losses.MultipleNegativesRankingLoss(model=embedding_model)

# Training arguments. per_device_train_batch_size is left at its default (8),
# which matches the 2,110 steps reported below for 16,875 examples.
args= SentenceTransformerTrainingArguments(
    output_dir="mnrloss_embedding_model",
    num_train_epochs= 1,
    per_device_eval_batch_size=32,
    warmup_steps= 100,
    logging_steps= 100,
    eval_steps= 100,
    deepspeed=None # Explicitly disable deepspeed
)

#train model

trainer= SentenceTransformerTrainer(
    model=embedding_model,
    args= args,
    train_dataset= train_dataset,
    loss= train_loss,
    evaluator= evaluator
)
trainer.train()





Step,Training Loss
100,0.1764
200,0.0559
300,0.0358
400,0.0516
500,0.0673
600,0.0339
700,0.0523
800,0.0341
900,0.0488
1000,0.0454


TrainOutput(global_step=2110, training_loss=0.043489859454439714, metrics={'train_runtime': 460.8221, 'train_samples_per_second': 36.619, 'train_steps_per_second': 4.579, 'total_flos': 0.0, 'train_loss': 0.043489859454439714, 'epoch': 1.0})

In [5]:
evaluator(embedding_model)

{'pearson_cosine': np.float64(0.795345735269074),
 'spearman_cosine': np.float64(0.7988216651543605)}

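MNR loss gives noticeably better results on STSB here: a Spearman correlation of about 0.80, compared with about 0.71 for cosine similarity loss, despite training on fewer examples (16,875 versus 50,000).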