# Retriever Customization

Authors - Aditya Malte, Vinay Raman, Ali Taghibakhshi

## Setup Instructions

This is the second notebook in a two-notebook tutorial.
It runs in the Docker container `nemo:24.01.01`.

From inside the `synthetic-data-retriever-customization` directory, start the container with this command:

`docker run -it --rm --gpus all --ipc=host --network host -v $(pwd):/workspace nvcr.io/nvidia/nemo:24.01.01`

This notebook was tested on a machine with 2x A6000 GPUs.

Download the NeMo Retriever model with the command `ngc registry model download-version "ohlfw0olaadg/ea-participants/nv-embed-qa:4"`. It must be placed in the `files/models` directory. If you do not have NVAIE access, you can instead download a Hugging Face embedding model such as `intfloat/e5-large-unsupervised` and convert it to the NeMo format as follows:
```
python /NeMo/scripts/nlp_language_modeling/convert_bert_hf_to_nemo.py \
       --input_name_or_path "intfloat/e5-large-unsupervised" \
       --output_path /workspace/files/models/my_model.nemo
```

This notebook uses the NeMo Retriever model. If you use another model, or convert an HF model, update the model path (`PATH_TO_NEMO_MODEL`) accordingly.

In [1]:
!rm /opt/NeMo/nemo/collections/nlp/models/information_retrieval/megatron_sbert_model.py
!curl -o /opt/NeMo/nemo/collections/nlp/models/information_retrieval/megatron_sbert_model.py https://raw.githubusercontent.com/NVIDIA/NeMo/main/nemo/collections/nlp/models/information_retrieval/megatron_sbert_model.py
!ln -s /opt/NeMo /NeMo



In [2]:
!pip install ipywidgets
!pip install beir


Restart the kernel after installing the libraries.

## Import libraries and set configuration

In [3]:
import torch
import numpy as np
import json
import math
from tqdm import tqdm
import pandas as pd
from collections import OrderedDict
import os

In [4]:
QA_PAIRS_PATH = "/workspace/files/data/qa_pairs_meta_llama_Llama_2_13b_chat_hf_num_questions_300_BeIR_nfcorpus.csv" 
HARD_NEGATIVE_MINING_MODEL_NAME_OR_PATH = 'intfloat/e5-large-unsupervised'

OUTPUT_DATA_PATH = "/tmp/data/output_data.json"
output_dir_path = os.path.dirname(OUTPUT_DATA_PATH)
# Create the output directory (and any missing parents) if it does not exist yet
os.makedirs(output_dir_path, exist_ok=True)

In [5]:
NUM_DEVICES=2 # number of GPUs to train on
CONFIG_PATH="/NeMo/examples/nlp/information_retrieval/conf/"
CONFIG_NAME="megatron_sbert_config"
PATH_TO_NEMO_MODEL= "/workspace/files/models/NV-Embed-QA-4.nemo" # Path to the .nemo model (change this if you use a different or converted HF model)
DATASET_PATH= OUTPUT_DATA_PATH # Path to json dataset
SAVE_DIR= "/tmp/trained_model/" # where the checkpoint and logs are saved

### Read QA Pairs file

In [6]:
qa_pairs = pd.read_csv(QA_PAIRS_PATH).sample(frac=1).reset_index(drop=True)
qa_pairs

Unnamed: 0,question,positive_chunk,positive_chunk_id,paragraph_id
0,What are the chronic effects of coffee consump...,Coffee and endothelial function: a battle betw...,0,67
1,Can poor blood supply to the intervertebral di...,Symptomatic disc herniation and serum lipid le...,0,16
2,What are the structural diversity and molecula...,An update on bioactive plant lignans.\nLignans...,0,78
3,What are the recent advances in natural and or...,Beyond celery and starter culture: advances in...,0,95
4,What are the factors that contribute to the de...,Constipation and a Low-Fiber Diet are Not Asso...,0,32
...,...,...,...,...
295,How does galactose consumption affect the risk...,Adolescent milk fat and galactose consumption ...,0,18
296,How does dietary cholesterol affect LDL choles...,Maintenance of the LDL cholesterol:HDL cholest...,0,82
297,What are the patterns of monoclonal immunoglob...,Clinical Trials and Observations: Monoclonal g...,0,46
298,How do the prevalence of ideal cardiovascular ...,Status of Cardiovascular Health in US Adults: ...,0,85


## Convert the pandas DataFrame to queries, passages and qrels (positive passage IDs)

In [7]:
passages = OrderedDict()  # passage text -> passage ID
queries = []
positive_passage_ids = []
for _, row in qa_pairs.iterrows():
    queries.append(row["question"])
    positive_passage_str = row["positive_chunk"]
    # Give each new passage the next sequential ID; duplicate passages reuse their existing ID
    if positive_passage_str not in passages:
        passages[positive_passage_str] = len(passages)
    positive_passage_ids.append(passages[positive_passage_str])

In [8]:
len(queries)

300

## Perform Embedding Generation

In [9]:
import math
from tqdm import tqdm
import torch

from nemo.collections.nlp.models.information_retrieval.megatron_sbert_model import MegatronSBertModel
from pytorch_lightning.trainer.trainer import Trainer

# Restore the checkpoint; encode_text() moves the model to the requested device
model = MegatronSBertModel.restore_from(
            PATH_TO_NEMO_MODEL,
            trainer=Trainer()
        ).to("cuda:0")

def encode_text(model, texts, batch_size=1, device="cuda:0"):
    """Embed `texts` in mini-batches of `batch_size`; returns one embedding row per text."""
    with torch.no_grad():
        # Tokenize all texts up front, then slice the tensors into mini-batches
        tokenized_texts = model.tokenize(texts)
        model = model.to(device).eval()  # eval() disables dropout during inference

        input_ids = tokenized_texts["input_ids"].to(device)
        attention_mask = tokenized_texts["attention_mask"].to(device)
        token_type_ids = tokenized_texts["token_type_ids"].to(device)

        num_batches = int(math.ceil(len(texts)/batch_size))

        embeddings = []
        for batch_id in tqdm(range(num_batches)):
            start = batch_size * batch_id
            end = batch_size * (batch_id+1)

            batch_embeddings = model(input_ids[start:end, :], attention_mask[start:end, :], token_type_ids[start:end, :])
            embeddings.append(batch_embeddings)
        # Concatenate the batches along dim 1 and transpose so that each row is one text's embedding
        return torch.cat(embeddings, dim=1).swapaxes(0,1)

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
[NeMo W 2024-03-15 22:59:21 megatron_base_model:1139] The model: MegatronSBertModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-15 22:59:21 megatron_base_model:1139] The model: MegatronSBertModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-15 22:59:21 megatron_base_model:1139] The model: MegatronSBertModel() does not have field.name: gradient_accumulation_fusion in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-15 22:59:21 megatron_base_model:1139] The model: MegatronSBertModel() does not have field.name: tp_comm_overlap in its cfg. Add this key to cfg or config_mapping to make

[NeMo I 2024-03-15 22:59:21 megatron_init:253] Rank 0 has data parallel group : [0, 1]
[NeMo I 2024-03-15 22:59:21 megatron_init:259] Rank 0 has combined group of data parallel and context parallel : [0, 1]
[NeMo I 2024-03-15 22:59:21 megatron_init:264] All data parallel group ranks with context parallel combined: [[0, 1]]
[NeMo I 2024-03-15 22:59:21 megatron_init:267] Ranks 0 has data parallel rank: 0
[NeMo I 2024-03-15 22:59:21 megatron_init:284] Rank 0 has context parallel group: [0]
[NeMo I 2024-03-15 22:59:21 megatron_init:287] All context parallel group ranks: [[0], [1]]
[NeMo I 2024-03-15 22:59:21 megatron_init:288] Ranks 0 has context parallel rank: 0
[NeMo I 2024-03-15 22:59:21 megatron_init:299] Rank 0 has model parallel group: [0]
[NeMo I 2024-03-15 22:59:21 megatron_init:300] All model parallel group ranks: [[0], [1]]
[NeMo I 2024-03-15 22:59:21 megatron_init:310] Rank 0 has tensor model parallel group: [0]
[NeMo I 2024-03-15 22:59:21 megatron_init:314] All tensor model par

[NeMo I 2024-03-15 22:59:21 tokenizer_utils:198] Getting Megatron tokenizer for pretrained model name: intfloat/e5-large-unsupervised, custom vocab file: None, and merges file: None
[NeMo I 2024-03-15 22:59:21 tokenizer_utils:127] Getting HuggingFace AutoTokenizer with pretrained_model_name: intfloat/e5-large-unsupervised, vocab_file: None, merges_files: None, special_tokens_dict: {}, and use_fast: False



[NeMo I 2024-03-15 22:59:22 megatron_base_model:574] Padded vocab_size: 30592, original vocab_size: 30522, dummy tokens: 70.


Random seed set as 42
[NeMo I 2024-03-15 22:59:23 nlp_overrides:1119] Model MegatronSBertModel was successfully restored from /workspace/files/models/NV-Embed-QA-4.nemo.


In [10]:
query_embeddings = encode_text(model, [("query: "+query) for query in queries], device = "cuda:0", batch_size=5)
passage_embeddings = encode_text(model, [("passage: "+passage) for passage in list(passages)],  device = "cuda:0", batch_size=5)

100%|███████████████████| 60/60 [00:01<00:00, 32.25it/s]
100%|███████████████████| 20/20 [00:01<00:00, 10.48it/s]


## Mine Hard Negatives

Hard negative mining selects negative examples that are 'hard': passages the model scores as similar to the query even though they are not its correct answer. Rather than sampling negatives at random, which mostly yields easy negatives, we mine the corpus for these harder examples.

Because such negatives are not obvious to the model, they provide a more useful training signal than random ones.

However, hard negative mining is also more likely to produce false negatives, i.e. passages that actually answer the query. To guard against this, we set a safety `margin`: a passage is kept as a candidate negative only if its score is at most `margin` times the score of the query's positive passage. The margin is a hyperparameter; lower it if too many false negatives are being mined. For instance, a larger corpus is more likely to contain other passages that answer the query, so a lower `margin` value may be more appropriate there.
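As a concrete illustration of the margin rule, the sketch below applies it to a single query's similarity scores. `margin_filter` is a hypothetical helper written for this example; it mirrors the per-query thresholding in `hard_negative_mining` (keep a passage only if it is not the positive and scores at most `margin` times the positive's score), and it assumes the positive score is greater than zero, since a negative score would move the threshold in the wrong direction.

```python
import numpy as np

def margin_filter(scores: np.ndarray, positive_idx: int, margin: float) -> np.ndarray:
    """Return the indices of passages eligible as hard negatives for one query.

    scores: 1-D array of query-passage similarity scores.
    A passage is eligible if it is not the positive and its score is at most
    margin * scores[positive_idx] (assumes that positive score is > 0).
    """
    threshold = margin * scores[positive_idx]
    passage_idxs = np.arange(scores.shape[0])
    eligible = (passage_idxs != positive_idx) & (scores <= threshold)
    return passage_idxs[eligible]

# Toy scores for one query; passage 0 is the positive.
scores = np.array([0.80, 0.79, 0.70, 0.30])
print(margin_filter(scores, positive_idx=0, margin=0.95))  # threshold 0.76 -> [2 3]
print(margin_filter(scores, positive_idx=0, margin=0.85))  # threshold 0.68 -> [3]
```

Passage 1 scores almost as high as the positive, so it is treated as a likely false negative and excluded; lowering `margin` to 0.85 excludes passage 2 as well.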

In [11]:
def hard_negative_mining(
        query_embeddings,
        passage_embeddings,
        batch_size,
        margin,
        num_negs,
        query_positive_paragraph_idxs
):
    hard_negative_idxs = []
    num_batches = int(math.ceil(query_embeddings.shape[0] / batch_size))
    # Split the query embeddings into batches of given batch size
    for current_batch_idx in range(num_batches):
        start = (current_batch_idx)*batch_size
        end = (current_batch_idx+1)*(batch_size)
        batch_query_embeddings = query_embeddings[start:end]
        batch_query_positive_paragraph_idxs = query_positive_paragraph_idxs[start:end]
        
        # Find minimum query-positive_chunk similarity score for each query in a batch
        query_passage_pos_scores = np.matmul(batch_query_embeddings, passage_embeddings.T)

        min_pos_scores = []
        for query_id, row in enumerate(query_passage_pos_scores):
            min_value = float("inf")
            for query_positive_paragraph_idx in query_positive_paragraph_idxs[query_id+start]:
                min_value = min(min_value, row[query_positive_paragraph_idx])
            min_pos_scores.append(min_value)
        min_pos_scores = np.array(min_pos_scores)
            
        # For each query, set the mining threshold to margin * (its minimum positive score)
        mining_thresholds = min_pos_scores*margin
        
        # Mask out each query's positive passage(s) so they cannot be mined as negatives
        for batch_query_idx, positive_paragraph_idxs in enumerate(batch_query_positive_paragraph_idxs):
            query_passage_pos_scores[batch_query_idx][positive_paragraph_idxs] = -float("inf")
        
        # Mask out all passages scoring above the threshold (likely false negatives)
        for row_idx in range(query_passage_pos_scores.shape[0]):
            row = query_passage_pos_scores[row_idx]
            row[row>mining_thresholds[row_idx]] = -float("inf")
            
        # For each query, keep the num_negs highest-scoring passages that survived masking
        for row in query_passage_pos_scores:
            top_k_hard_negative_idxs = np.argpartition(row, -num_negs)[-num_negs:]
            # argpartition returns masked (-inf) entries when fewer than num_negs survive; drop them
            hard_negative_idxs.append([idx for idx in top_k_hard_negative_idxs if np.isfinite(row[idx])])
            
    return hard_negative_idxs

In [12]:
positive_passage_ids_list = [[element] for element in positive_passage_ids]
hard_negative_idxs = hard_negative_mining(query_embeddings=query_embeddings.cpu().numpy(), passage_embeddings=passage_embeddings.cpu().numpy(), query_positive_paragraph_idxs=positive_passage_ids_list,
                    batch_size=32, num_negs=10, margin=0.95)
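Before building the training set, it can be worth sanity-checking the mined negatives: no query should have its own positive among its negatives, and each query should have received `num_negs` distinct negatives (fewer can occur when most passages fall above the margin threshold). `check_hard_negatives` below is a hypothetical helper sketching such a check; it is not part of NeMo or BEIR.

```python
from typing import List, Sequence

def check_hard_negatives(
    hard_negative_idxs: Sequence[Sequence[int]],
    positive_idxs: Sequence[Sequence[int]],
    num_negs: int,
) -> List[int]:
    """Return indices of queries whose mined negatives look suspicious.

    A query is flagged if any mined negative is also one of its positives,
    or if it has fewer than num_negs distinct negatives.
    """
    flagged = []
    for query_idx, (negs, poss) in enumerate(zip(hard_negative_idxs, positive_idxs)):
        neg_set = {int(i) for i in negs}  # int() also normalizes NumPy integer types
        if neg_set & {int(i) for i in poss} or len(neg_set) < num_negs:
            flagged.append(query_idx)
    return flagged
```

In this notebook it could be called as `check_hard_negatives(hard_negative_idxs, positive_passage_ids_list, num_negs=10)`; an empty list means no query was flagged.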

### Construct the training data in the following format
```
[
    {
        "question": "Query",
        "pos_doc": ["Positive"],
        "neg_doc": ["Negative_1", "Negative_2", ..., "Negative_n"]
    },
    {
        // Next data instance
    },
    ...,
    {
        // Subsequent data instance
    }
]
```
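The structure above can be checked programmatically before the file is written. The sketch below validates one record against the three keys shown; `validate_record` and its `min_negs` parameter are hypothetical, and requiring at least as many negatives as `hard_negatives_to_train` (set to 4 in the training command) is an assumption, not a documented NeMo requirement.

```python
from typing import Any, Dict, List

def validate_record(record: Dict[str, Any], min_negs: int = 0) -> List[str]:
    """Return a list of problems found in one training record (empty if it looks valid)."""
    problems = []
    question = record.get("question")
    if not isinstance(question, str) or not question:
        problems.append("'question' must be a non-empty string")
    # Both document fields must be lists of strings
    for key in ("pos_doc", "neg_doc"):
        value = record.get(key)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"'{key}' must be a list of strings")
    pos_doc, neg_doc = record.get("pos_doc"), record.get("neg_doc")
    if isinstance(pos_doc, list) and not pos_doc:
        problems.append("'pos_doc' must contain at least one passage")
    if isinstance(neg_doc, list) and len(neg_doc) < min_negs:
        problems.append(f"'neg_doc' has fewer than {min_negs} passages")
    return problems
```

Once `data` has been built, `[i for i, r in enumerate(data) if validate_record(r, min_negs=4)]` lists the indices of any records that fail these checks.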

In [13]:
# Map passage IDs back to passage text (the inverse of the `passages` dict)
id_to_passage = {passage_id: passage for passage, passage_id in passages.items()}

data = []
for query_id, query in enumerate(queries):
    hard_negative_passages = [id_to_passage[idx] for idx in hard_negative_idxs[query_id]]
    positive_passage = id_to_passage[positive_passage_ids[query_id]]

    datapoint = {
        "question" : query,
        "pos_doc" : [positive_passage],
        "neg_doc" : hard_negative_passages
    }
    data.append(datapoint)

In [14]:
len(data)

300

In [15]:
print(f"Saving data to: {OUTPUT_DATA_PATH}")

Saving data to: /tmp/data/output_data.json


In [16]:
with open(OUTPUT_DATA_PATH, "w") as file:
    json.dump(data, file)

## Training

In [17]:
COMMAND = f"python /opt/NeMo/examples/nlp/information_retrieval/megatron_sbert_finetune.py \
--config-path={CONFIG_PATH} \
--config-name={CONFIG_NAME} \
restore_from_path={PATH_TO_NEMO_MODEL} \
trainer.devices={NUM_DEVICES} \
trainer.val_check_interval=10 \
trainer.max_epochs=1 \
+trainer.num_sanity_val_steps=0 \
model.global_batch_size=8 \
model.micro_batch_size=4 \
model.tokenizer.library=huggingface \
model.tokenizer.type=intfloat/e5-large-unsupervised \
++model.data.data_prefix={DATASET_PATH} \
++model.tokenizer.do_lower_case=False \
++model.data.evaluation_sample_size=50 \
++model.data.hard_negatives_to_train=4 \
++model.data.evaluation_steps=100 \
exp_manager.explicit_log_dir={SAVE_DIR} \
exp_manager.create_wandb_logger=False \
++exp_manager.checkpoint_callback_params.save_best_model=True \
exp_manager.resume_if_exists=False"

print(COMMAND)

python /opt/NeMo/examples/nlp/information_retrieval/megatron_sbert_finetune.py --config-path=/NeMo/examples/nlp/information_retrieval/conf/ --config-name=megatron_sbert_config restore_from_path=/workspace/files/models/NV-Embed-QA-4.nemo trainer.devices=2 trainer.val_check_interval=10 trainer.max_epochs=1 +trainer.num_sanity_val_steps=0 model.global_batch_size=8 model.micro_batch_size=4 model.tokenizer.library=huggingface model.tokenizer.type=intfloat/e5-large-unsupervised ++model.data.data_prefix=/tmp/data/output_data.json ++model.tokenizer.do_lower_case=False ++model.data.evaluation_sample_size=50 ++model.data.hard_negatives_to_train=4 ++model.data.evaluation_steps=100 exp_manager.explicit_log_dir=/tmp/trained_model/ exp_manager.create_wandb_logger=False ++exp_manager.checkpoint_callback_params.save_best_model=True exp_manager.resume_if_exists=False
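The batch-size overrides interact with `NUM_DEVICES`. NeMo's Megatron-based models follow the convention `global_batch_size = micro_batch_size * data_parallel_size * num_micro_batches`, where `num_micro_batches` is the number of gradient-accumulation steps per optimizer step and `data_parallel_size` is the GPU count divided by the tensor- and pipeline-parallel sizes. The helper below is a hypothetical sketch of that arithmetic, not a NeMo API; it assumes no tensor or pipeline parallelism unless those arguments are set.

```python
def num_micro_batches(
    global_batch_size: int,
    micro_batch_size: int,
    num_devices: int,
    num_nodes: int = 1,
    tensor_parallel_size: int = 1,
    pipeline_parallel_size: int = 1,
) -> int:
    """Gradient-accumulation steps implied by
    global_batch_size = micro_batch_size * data_parallel_size * num_micro_batches."""
    world_size = num_devices * num_nodes
    model_parallel_size = tensor_parallel_size * pipeline_parallel_size
    if world_size % model_parallel_size != 0:
        raise ValueError("world size must be divisible by tensor * pipeline parallel size")
    data_parallel_size = world_size // model_parallel_size
    samples_per_micro_step = micro_batch_size * data_parallel_size
    if global_batch_size % samples_per_micro_step != 0:
        raise ValueError(
            f"global_batch_size={global_batch_size} must be divisible by "
            f"micro_batch_size * data_parallel_size = {samples_per_micro_step}"
        )
    return global_batch_size // samples_per_micro_step

print(num_micro_batches(global_batch_size=8, micro_batch_size=4, num_devices=2))  # 1
print(num_micro_batches(global_batch_size=8, micro_batch_size=4, num_devices=1))  # 2
```

With the settings above (2 GPUs, micro-batch 4, global batch 8), no gradient accumulation is needed; on a single GPU the same settings would imply 2 accumulation steps.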


In [None]:
!{COMMAND}

    See https://hydra.cc/docs/next/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
      ret = run_job(
    
[NeMo I 2024-03-15 22:59:40 megatron_sbert_finetune:31] 
    
    ************** Experiment configuration ***********
[NeMo I 2024-03-15 22:59:40 megatron_sbert_finetune:32] 
    name: megatron_bert
    restore_from_path: /workspace/files/models/NV-Embed-QA-4.nemo
    trainer:
      devices: 2
      num_nodes: 1
      accelerator: gpu
      precision: 16
      logger: false
      enable_checkpointing: false
      use_distributed_sampler: false
      max_epochs: 1
      max_steps: 100000
      log_every_n_steps: 10
      val_check_interval: 10
      limit_val_batches: 50
      limit_test_batches: 500
      accumulate_grad_batches: 1
      gradient_clip_val: 1.0
      benchmark: false
      num_sanity_val_steps: 0
    exp_manager:
      explicit_log_dir: /tmp/trained_model/
      exp_dir: null
      name: megatron_bert
      create_wandb_logger: false
      w

## Model Evaluation

In [None]:
from beir import util, LoggingHandler
from beir.retrieval import models
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

import logging
import pathlib, os

#### Just some code to print debug information to stdout
logging.basicConfig(format='%(asctime)s - %(message)s',
                    datefmt='%Y-%m-%d %H:%M:%S',
                    level=logging.INFO,
                    handlers=[LoggingHandler()])
#### /print debug information to stdout

#### Download the nfcorpus dataset and unzip it
dataset = "nfcorpus"
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/{}.zip".format(dataset)
out_dir = os.path.join("/tmp", "datasets")
data_path = util.download_and_unzip(url, out_dir)

#### Provide the data_path where nfcorpus has been downloaded and unzipped
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")


#### Create a wrapper NeMo model for retrieval evaluation on this dataset

In [None]:
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES
from nemo.collections.nlp.models.information_retrieval.megatron_sbert_model import MegatronSBertModel
from pytorch_lightning.trainer.trainer import Trainer
from typing import List, Dict
import numpy as np

import torch
import math
from tqdm import tqdm

class NeMoModel:
    def __init__(self, model_path=None, **kwargs):
        self.model = MegatronSBertModel.restore_from(
            model_path,
            trainer=Trainer())
        self.model = self.model.to("cuda:0").half().eval()  # eval() disables dropout during inference
    
    def encode_text(self, texts, batch_size=1, device="cuda:0"):
        with torch.no_grad():
            tokenized_texts = self.model.tokenize(texts)
            
            input_ids = tokenized_texts["input_ids"].to(device)
            attention_mask = tokenized_texts["attention_mask"].to(device)
            token_type_ids = tokenized_texts["token_type_ids"].to(device)

            num_batches = int(math.ceil(len(texts)/batch_size))

            embeddings = []
            for batch_id in tqdm(range(num_batches)):
                start = batch_size * batch_id
                end = batch_size * (batch_id+1)

                batch_embeddings = self.model(input_ids[start:end, :], attention_mask[start:end, :], token_type_ids[start:end, :])
                embeddings.append(batch_embeddings)
            return torch.cat(embeddings, dim=1).swapaxes(0,1)

    # Encode queries with the "query: " prefix used when mining the training data
    # (returns a torch tensor; BEIR's score functions accept tensors)
    def encode_queries(self, queries: List[str], batch_size: int, **kwargs) -> torch.Tensor:
        queries = [f"query: {query}" for query in queries]
        embeddings = self.encode_text(texts=queries, batch_size=batch_size)
        return embeddings
    
    # Encode documents with the "passage: " prefix. BEIR passes each document as a
    # dict with "title" and "text" keys, so join those fields instead of formatting the dict itself.
    def encode_corpus(self, corpus: List[Dict[str, str]], batch_size: int, **kwargs) -> torch.Tensor:
        corpus = [f"passage: {(doc.get('title', '') + ' ' + doc['text']).strip()}" for doc in corpus]
        embeddings = self.encode_text(texts=corpus, batch_size=batch_size)
        return embeddings
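BEIR's `DenseRetrievalExactSearch` passes `encode_corpus` a list of document dicts with `"title"` and `"text"` keys, so interpolating a dict directly into an f-string would embed its Python repr rather than the document text. The helper below sketches one way to build the prefixed passage strings; `format_corpus_with_prefix` is hypothetical, and joining title and text with a space (similar to what BEIR's Sentence-BERT wrapper does by default) is an assumption about how the training passages were formatted.

```python
from typing import Dict, List

def format_corpus_with_prefix(corpus: List[Dict[str, str]], sep: str = " ") -> List[str]:
    """Convert BEIR-style document dicts into "passage: ..." strings.

    Each document is expected to have a "text" key and optionally a "title" key.
    """
    texts = []
    for doc in corpus:
        body = (doc.get("title", "") + sep + doc.get("text", "")).strip()
        texts.append(f"passage: {body}")
    return texts

docs = [
    {"title": "Lignans", "text": "Lignans are plant compounds."},
    {"title": "", "text": "A document without a title."},
]
print(format_corpus_with_prefix(docs))
# ['passage: Lignans Lignans are plant compounds.', 'passage: A document without a title.']
```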

Our fine-tuned model:

In [None]:
new_model = DRES(NeMoModel(model_path="/tmp/trained_model/checkpoints/megatron_bert.nemo"), batch_size=1)
retriever = EvaluateRetrieval(new_model, score_function="dot") # or "cos_sim" for cosine similarity
results = retriever.retrieve(corpus, queries)

#### Evaluate your model with NDCG@k, MAP@K, Recall@K and Precision@K  where k = [1,3,5,10,100,1000] 
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)

The original model:

In [None]:
# The original model
old_model = DRES(NeMoModel(model_path=PATH_TO_NEMO_MODEL), batch_size=1)
retriever = EvaluateRetrieval(old_model, score_function="dot") # or "cos_sim" for cosine similarity
results = retriever.retrieve(corpus, queries)

#### Evaluate your model with NDCG@k, MAP@K, Recall@K and Precision@K  where k = [1,3,5,10,100,1000] 
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
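Because both evaluation cells assign to the same `ndcg`, `_map`, `recall`, and `precision` names, the fine-tuned model's results are overwritten by the second run. One option is to keep each run's dicts under separate names and diff them with a small helper like the sketch below; `EvaluateRetrieval.evaluate` returns each metric family as a dict keyed by names such as `"NDCG@10"`, while `metric_deltas` itself is hypothetical.

```python
from typing import Dict

def metric_deltas(baseline: Dict[str, float], finetuned: Dict[str, float]) -> Dict[str, float]:
    """Return finetuned - baseline for each metric name present in both dicts."""
    # Rounding hides floating-point noise such as 0.04999999999999999
    return {name: round(finetuned[name] - baseline[name], 5)
            for name in baseline if name in finetuned}
```

For example, after storing the fine-tuned model's `ndcg` as `ndcg_new` and the original model's as `ndcg_old` (hypothetical names), `metric_deltas(ndcg_old, ndcg_new)` gives the per-cutoff NDCG change; positive values mean the fine-tuned model scored higher.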

Comparing the NDCG@k, MAP@k, Recall@k and Precision@k values logged for the two models shows some improvement after fine-tuning. The gains may be larger depending on the domain used for synthetic data generation and on the hyperparameters. Sampling more data for synthetic data generation may also help, although several hyperparameters (for example, those for hard negative mining, and the batch size) may then need to be retuned.