# BEIR Benchmark Evaluation

This notebook evaluates a SentenceBERT model for dense retrieval on a BEIR benchmark dataset.

## Setup and Initialization

### Import Libraries

Import the necessary libraries for data loading, model creation, retrieval, and evaluation.

In [1]:
import json
import os
from datetime import datetime

import pandas as pd
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.models import SentenceBERT
from beir.retrieval.search.dense import DenseRetrievalExactSearch as dres

### Load Configuration

Loads the configuration file that contains the dataset name, model name, and other parameters for the evaluation.

In [2]:
config_file = "configs/beir_benchmark_config.json"

with open(config_file, "r") as f:
    config = json.load(f)
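The configuration file itself is not shown in this notebook. As a rough sketch, the keys below are the ones the later cells read from it. The values are illustrative assumptions, not the settings of any actual run, and `missing_keys` is a hypothetical helper for catching an incomplete config early.

```python
import json

# Keys that the later cells read from the config.
REQUIRED_KEYS = [
    "dataset",
    "datasets_folder",
    "model_name",
    "batch_size",
    "score_function",
    "k_values",
    "output_folder",
]

# Illustrative values only (assumptions, not taken from the notebook).
example_config = {
    "dataset": "scifact",
    "datasets_folder": "datasets",
    "model_name": "all-MiniLM-L6-v2",
    "batch_size": 32,
    "score_function": "cos_sim",
    "k_values": [1, 3, 5, 10, 100],
    "output_folder": "results",
}


def missing_keys(config):
    """Return the required keys that are absent from config (hypothetical helper)."""
    return [key for key in REQUIRED_KEYS if key not in config]


print(json.dumps(example_config, indent=4))
print(missing_keys(example_config))
```

Running a check like this right after loading the file surfaces a missing key before the dataset and model are loaded, rather than as a KeyError several cells later.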

## Evaluation

### Load Dataset

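Loads the test split of the BEIR dataset named in the configuration. The loader returns the corpus, the queries, and the query relevance judgments (qrels).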

In [3]:
dataset = config['dataset']
data_path = config['datasets_folder']
corpus, queries, qrels = GenericDataLoader(data_folder=os.path.join(data_path, dataset)).load(split="test")

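To make the three returned objects concrete, here is a toy sketch of their layout: the corpus maps document IDs to a title and text, the queries map query IDs to query strings, and the qrels map each query ID to its judged documents. The IDs and texts are made up for illustration.

```python
# Toy data in the layout returned by GenericDataLoader.load();
# the IDs and texts are invented for illustration.
corpus = {
    "doc1": {"title": "Dense retrieval", "text": "Queries and documents are embedded as vectors."},
    "doc2": {"title": "Sparse retrieval", "text": "BM25 ranks documents by term overlap."},
}
queries = {"q1": "How does dense retrieval represent documents?"}
qrels = {"q1": {"doc1": 1}}  # query ID -> {document ID: relevance label}

# Basic consistency check: every judged query and document should exist.
for query_id, judgments in qrels.items():
    assert query_id in queries
    for doc_id in judgments:
        assert doc_id in corpus

print(len(corpus), len(queries), len(qrels))
```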

### Create the Embedding Model

Creates a SentenceBERT model for dense retrieval using the specified model name from the configuration.

In [4]:
model_name = config['model_name']
sbert = SentenceBERT(model_name)
retriever = EvaluateRetrieval(dres(sbert, batch_size=config['batch_size']), score_function=config['score_function'])
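The score function decides how a query embedding is compared with each document embedding. The sketch below assumes the configuration uses one of the two names accepted by BEIR's exact dense search, `cos_sim` or `dot`, and shows in plain Python how the two can rank the same documents differently. The vectors are toy values, not real embeddings.

```python
import math


def dot_score(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cos_sim(a, b):
    """Cosine similarity: the dot product of the length-normalized vectors."""
    norm_a = math.sqrt(dot_score(a, a))
    norm_b = math.sqrt(dot_score(b, b))
    return dot_score(a, b) / (norm_a * norm_b)


query = [1.0, 0.0]
aligned_doc = [1.0, 0.0]  # same direction as the query, small norm
long_doc = [3.0, 3.0]     # different direction, large norm

# The dot product rewards the larger norm; cosine similarity ignores it.
print(dot_score(query, aligned_doc), dot_score(query, long_doc))
print(cos_sim(query, aligned_doc), cos_sim(query, long_doc))
```

If the model was trained to produce normalized embeddings, the two functions give the same ranking. Otherwise the choice should generally follow the objective the model was trained with.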

### Retrieve Documents

Retrieves documents for every query in the dataset. The result maps each query ID to the IDs of the retrieved documents and their similarity scores. Evaluation against the relevance judgments happens in the next step.

In [5]:
retrieved = retriever.retrieve(corpus, queries)

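Before computing metrics, it can help to inspect a few results by hand. The sketch below assumes the retrieval output has the shape described above, a dict of query ID to a dict of document ID and score. `top_hits` is a hypothetical helper and the scores are toy values.

```python
def top_hits(retrieved, query_id, n=3):
    """Return the n highest-scoring (doc_id, score) pairs for one query."""
    scores = retrieved[query_id]
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]


# Toy retrieval output: query ID -> {document ID: score}.
retrieved_toy = {"q1": {"doc1": 0.82, "doc2": 0.31, "doc3": 0.57}}

print(top_hits(retrieved_toy, "q1", n=2))
# [('doc1', 0.82), ('doc3', 0.57)]
```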

### Validate the Retrieval

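Evaluates the retrieved results against the query relevance judgments (qrels) at each cutoff in k_values, creates a timestamped output folder for the run, and collects the metrics into a DataFrame with one row per cutoff.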

In [6]:
k_values = config['k_values']
results = retriever.evaluate(qrels, retrieved, k_values=k_values)
output_folder = config["output_folder"]
# Single quotes inside the f-string keep this valid on Python versions before 3.12.
run_name = f"run_beir_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
out_path = os.path.join(output_folder, run_name)
os.makedirs(out_path, exist_ok=True)

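# Build one row per cutoff k and one column per metric.
results_df = pd.DataFrame()
results_df["k"] = k_values

# evaluate() returns one dict per metric, keyed like "NDCG@1", "NDCG@3", ...
for r in results:
    # Strip the "@k" suffix to get the metric name.
    metric = next(iter(r.keys())).split("@")[0]
    # Materialize the values as a list so they align with k_values by position.
    results_df[metric] = list(r.values())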
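To show what the cutoff-based metrics measure, here is a simplified pure-Python sketch of precision@k and recall@k averaged over queries. It is an illustration only, not BEIR's implementation: BEIR computes its metrics with pytrec_eval, which may differ in details such as tie-breaking and the handling of queries without relevant documents. The function name and the toy data are assumptions.

```python
def precision_recall_at_k(qrels, retrieved, k):
    """Average precision@k and recall@k over queries with at least one relevant document."""
    precisions, recalls = [], []
    for query_id, judgments in qrels.items():
        relevant = {doc_id for doc_id, label in judgments.items() if label > 0}
        if not relevant:
            continue  # skip queries without relevant documents (a simplification)
        ranked = sorted(
            retrieved.get(query_id, {}).items(),
            key=lambda item: item[1],
            reverse=True,
        )
        top_k = [doc_id for doc_id, _ in ranked[:k]]
        hits = sum(1 for doc_id in top_k if doc_id in relevant)
        precisions.append(hits / k)
        recalls.append(hits / len(relevant))
    if not precisions:
        return 0.0, 0.0
    return sum(precisions) / len(precisions), sum(recalls) / len(recalls)


# Toy data: two relevant documents, one of them ranked last.
qrels_toy = {"q1": {"doc1": 1, "doc4": 1}}
retrieved_toy = {"q1": {"doc1": 0.9, "doc2": 0.8, "doc4": 0.1}}

print(precision_recall_at_k(qrels_toy, retrieved_toy, k=2))
# (0.5, 0.5)
```

Raising k to 3 in this toy example brings the second relevant document into the cutoff, so recall rises to 1.0 while precision becomes 2/3.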

### Save Results

Saves the evaluation results to a CSV file and the configuration used for the evaluation to a JSON file.

In [7]:
file_path = os.path.join(out_path, "results.csv")
results_df.to_csv(file_path, index=False)

with open(os.path.join(out_path, "used_config.json"), "w") as f:
    json.dump(config, f, indent=4)
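Because each run folder holds both the metrics and the configuration that produced them, a run can be reloaded later for comparison. The sketch below is one possible way to do that with only the standard library. `load_run` is a hypothetical helper, and the demo writes toy files to a temporary folder rather than reading a real run.

```python
import csv
import json
import os
import tempfile


def load_run(run_dir):
    """Read results.csv and used_config.json from one run folder (hypothetical helper)."""
    with open(os.path.join(run_dir, "results.csv"), newline="") as f:
        rows = list(csv.DictReader(f))  # one dict per cutoff k; values are strings
    with open(os.path.join(run_dir, "used_config.json")) as f:
        config = json.load(f)
    return rows, config


# Demo with toy files in a temporary folder (the values are made up).
with tempfile.TemporaryDirectory() as run_dir:
    with open(os.path.join(run_dir, "results.csv"), "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["k", "NDCG"])
        writer.writerow([1, 0.5])
    with open(os.path.join(run_dir, "used_config.json"), "w") as f:
        json.dump({"dataset": "toy"}, f)
    rows, config = load_run(run_dir)

print(rows, config)
```

With pandas available, `pd.read_csv` on the same file returns the metrics with numeric types instead of strings.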