In [12]:
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import pytorch_cos_sim
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator, InformationRetrievalEvaluator, NanoBEIREvaluator
import torch
import datasets
import random
# Fall back to eager execution if torch.compile (TorchDynamo) hits an error
torch._dynamo.config.suppress_errors = True

MODEL = 'answerdotai/ModernBERT-base'

In [3]:
model = SentenceTransformer(MODEL)

No sentence-transformers model found with name answerdotai/ModernBERT-base. Creating a new one with mean pooling.
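
The message above means the raw checkpoint has no sentence-embedding head, so the library averages token embeddings to get one vector per sentence. The following is a minimal plain-Python sketch of that idea, not the library's implementation. The function name `mean_pool` and the toy inputs are hypothetical.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1; padding is skipped."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    # Guard against an all-padding input to avoid dividing by zero.
    count = max(count, 1)
    return [value / count for value in total]


# Three token vectors; the last one is padding and must not affect the result.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

Because the base model was never trained so that this average reflects sentence meaning, low scores in the evaluations below are expected.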


In [6]:
# Semantic textual similarity: correlate cosine similarities with gold scores (STS-B validation split)
data = datasets.load_dataset("sentence-transformers/stsb", split='validation')
evaluator = EmbeddingSimilarityEvaluator(sentences1=data['sentence1'], sentences2=data['sentence2'], scores=data['score'])
result = evaluator(model)
result

{'pearson_cosine': 0.5190741868883064, 'spearman_cosine': 0.5566359148742774}
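
To make the two numbers above concrete: each sentence pair gets a cosine similarity between its embeddings, and those similarities are then correlated with the human scores. Below is a small plain-Python sketch of that pipeline on toy data. It assumes the standard definitions of Pearson and Spearman correlation. The helper names (`cosine`, `pearson`, `ranks`, `spearman`) and the toy values are hypothetical, not taken from the library.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(x, y):
    """Pearson correlation: measures linear agreement."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson computed on ranks, so only the ordering matters."""
    return pearson(ranks(x), ranks(y))


# Toy predicted similarities that are ordered correctly but not linearly.
predicted = [0.1, 0.2, 0.3, 0.9]
gold = [1.0, 2.0, 3.0, 4.0]
print(pearson(predicted, gold))   # below 1: the relationship is not linear
print(spearman(predicted, gold))  # about 1: the ordering matches exactly
```

Spearman rewards a correct ordering even when the similarity values are poorly scaled, which is one reason the two scores can differ.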

In [15]:
# Information retrieval evaluation
# Named dataset_names so it does not shadow the imported `datasets` module
dataset_names = ["QuoraRetrieval", "MSMARCO"]
query_prompts = {
    "QuoraRetrieval": "Instruct: Given a question, retrieve questions that are semantically equivalent to the given question\nQuery: ",
    "MSMARCO": "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: "
}
ir_evaluator = NanoBEIREvaluator(dataset_names=dataset_names, query_prompts=query_prompts)
result = ir_evaluator(model)
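
The evaluator reports metrics such as accuracy@k, precision@k, recall@k and MRR@k. As a rough guide to reading them, here is a plain-Python sketch for a single query, assuming the usual definitions. The evaluator averages over all queries, and its implementation may differ in details. The function `metrics_at_k` and the document ids are hypothetical.

```python
def metrics_at_k(ranked_ids, relevant_ids, k):
    """Per-query retrieval metrics over the top-k ranked document ids."""
    top_k = ranked_ids[:k]
    hits = [doc_id for doc_id in top_k if doc_id in relevant_ids]

    # Reciprocal rank of the first relevant document within the top k (0 if none).
    reciprocal_rank = 0.0
    for rank, doc_id in enumerate(top_k, start=1):
        if doc_id in relevant_ids:
            reciprocal_rank = 1.0 / rank
            break

    return {
        "accuracy": 1.0 if hits else 0.0,         # at least one relevant doc in top k
        "precision": len(hits) / k,               # share of the top k that is relevant
        "recall": len(hits) / len(relevant_ids),  # share of relevant docs retrieved
        "reciprocal_rank": reciprocal_rank,
    }


# Toy ranking: the first relevant document ("d2") appears at rank 2.
ranked = ["d7", "d2", "d9", "d4", "d1"]
relevant = {"d2", "d1"}
print(metrics_at_k(ranked, relevant, k=3))
```

With one relevant document per query, accuracy@k and recall@k coincide and precision@k is at most 1/k.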

In [16]:
result

{'NanoQuoraRetrieval_cosine_accuracy@1': 0.04,
 'NanoQuoraRetrieval_cosine_accuracy@3': 0.12,
 'NanoQuoraRetrieval_cosine_accuracy@5': 0.12,
 'NanoQuoraRetrieval_cosine_accuracy@10': 0.16,
 'NanoQuoraRetrieval_cosine_precision@1': 0.04,
 'NanoQuoraRetrieval_cosine_precision@3': 0.04,
 'NanoQuoraRetrieval_cosine_precision@5': 0.024000000000000004,
 'NanoQuoraRetrieval_cosine_precision@10': 0.016,
 'NanoQuoraRetrieval_cosine_recall@1': 0.04,
 'NanoQuoraRetrieval_cosine_recall@3': 0.12,
 'NanoQuoraRetrieval_cosine_recall@5': 0.12,
 'NanoQuoraRetrieval_cosine_recall@10': 0.15,
 'NanoQuoraRetrieval_cosine_ndcg@10': 0.09591463641493617,
 'NanoQuoraRetrieval_cosine_mrr@10': 0.07916666666666666,
 'NanoQuoraRetrieval_cosine_map@100': 0.08583165716368223,
 'NanoMSMARCO_cosine_accuracy@1': 0.0,
 'NanoMSMARCO_cosine_accuracy@3': 0.0,
 'NanoMSMARCO_cosine_accuracy@5': 0.0,
 'NanoMSMARCO_cosine_accuracy@10': 0.0,
 'NanoMSMARCO_cosine_precision@1': 0.0,
 'NanoMSMARCO_cosine_precision@3': 0.0,
 'NanoM