# Retriever Customization - Fine-Tuning & Evaluation (2/2)

Authors - Aditya Malte, Vinay Raman, Ali Taghibakhshi, Dora Li

## Overview
This is part two of a two-part series. 
1. `synthetic_data_generation_nemo.ipynb`:
    - Use an LLM from build.nvidia.com (or deploy your own using NIM!) to create training examples containing generated queries and positive chunks. By default the notebook will use nfcorpus, but you can easily swap in your own data.
    - Implement hard negative mining to find challenging negative examples
    - Save results to a `.jsonl` file 


2. `retriever_customization.ipynb` **(this notebook)**:
    - Use the generated training data in the `.jsonl` file to fine-tune a retriever model using NeMo Framework
    - Evaluate your fine-tuned embedding model against the original using the BeIR benchmark
    

## Setup Instructions

#### NeMo Framework Docker container
This notebook requires the NeMo Framework Docker container. From inside the `synthetic-data-retriever-customization` directory, pull the Docker image and start the container with this command:

`docker run -it --rm --gpus all --ipc=host --network host -v $(pwd):/workspace nvcr.io/nvidia/nemo:24.07`

This notebook was tested on a single L40S GPU with CUDA set up.


#### Download NV-Embed-QA-4 model weights from NGC
Use the command `ngc registry model download-version "ohlfw0olaadg/ea-participants/nv-embed-qa:4"` to download the NeMo Retriever model. It must be downloaded to the directory `files/models`. This notebook uses that model, NeMo Retriever, as its example. If you do not have NVIDIA AI Enterprise (NVAIE) access, you can instead download and convert a Hugging Face embedding model such as `intfloat/e5-large-unsupervised` as follows:
```
/NeMo/scripts/nlp_language_modeling/convert_bert_hf_to_nemo.py \
       --input_name_or_path "intfloat/e5-large-unsupervised" \
       --output_path /workspace/files/models/my_model.nemo
```

If you use another model, or convert a Hugging Face model, make sure to update the model path (`PATH_TO_NEMO_MODEL`) accordingly.

In [1]:
!pip install ipywidgets
!pip install beir


## Import libraries and set configuration

In [20]:
import numpy as np
import json
import pandas as pd
from collections import OrderedDict
import os

In [2]:
# This should be the synthetic dataset generated in Part 1, consisting of the queries, pos_doc, and neg_docs
OUTPUT_DATA_PATH = "/tmp/data/output_data.jsonl"

In [3]:
NUM_DEVICES=1 # number of gpus available for fine-tuning

# Use the default config for BERT Embedding Model
CONFIG_PATH="/opt/NeMo/examples/nlp/information_retrieval/conf/"
CONFIG_NAME="megatron_bert_embedding_config"

PATH_TO_NEMO_MODEL= "/workspace/files/models/NV-Embed-QA-4.nemo" # Path to the .nemo model; change this if you converted a Hugging Face model or use a different one
DATASET_PATH= OUTPUT_DATA_PATH # Path to jsonl dataset
SAVE_DIR= "/tmp/trained_model/" # where the checkpoint and logs are saved

## Training

Run the `megatron_bert_embedding_finetuning.py` script. This script sets up and trains a Megatron-BERT model using NVIDIA NeMo Framework, with configurations managed by Hydra. It loads the pre-trained `.nemo` model from a checkpoint, adjusts settings such as batch size, and sets up parallel processing for multi-GPU training. Finally, it initializes the trainer and starts the training process with the NeMo Framework Megatron Trainer.

Note that `model.global_batch_size = model.micro_batch_size * trainer.devices`, where `trainer.devices` is the number of GPUs. Keep `model.micro_batch_size=4` and set the other parameters accordingly.
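As a quick illustration of the relationship above, the following sketch computes the value to pass as `model.global_batch_size` for a few GPU counts. The helper name `expected_global_batch_size` is hypothetical, and the sketch only covers the formula stated in this notebook.

```python
def expected_global_batch_size(micro_batch_size: int, num_devices: int) -> int:
    """Apply the relationship stated above: global = micro * number of GPUs."""
    return micro_batch_size * num_devices


MICRO_BATCH_SIZE = 4  # value recommended in this notebook

for num_devices in (1, 2, 4, 8):
    global_bs = expected_global_batch_size(MICRO_BATCH_SIZE, num_devices)
    print(f"trainer.devices={num_devices} -> model.global_batch_size={global_bs}")
```

With `NUM_DEVICES=1`, this gives `model.global_batch_size=4`, which matches the training command used in this notebook.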

`model.data.hard_negatives_to_train` should be set to the number of neg_docs corresponding to each query in your synthetic dataset. 
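One way to check this before training is to count the negatives stored with each record. The sketch below is only an illustration: the helper `count_hard_negatives` is hypothetical, and the field name passed as `neg_key` is an assumption that must be adjusted to match the file produced in Part 1. It runs on a small toy file rather than the real dataset.

```python
import json
import os
import tempfile
from collections import Counter


def count_hard_negatives(jsonl_path, neg_key):
    """Return a Counter mapping 'negatives per record' -> 'number of records'."""
    counts = Counter()
    with open(jsonl_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            counts[len(record.get(neg_key, []))] += 1
    return counts


# Toy records standing in for the Part 1 output; the field names are assumptions.
toy_records = [
    {"query": "q1", "pos_doc": "p1", "neg_doc": ["n1", "n2", "n3", "n4", "n5"]},
    {"query": "q2", "pos_doc": "p2", "neg_doc": ["n6", "n7", "n8", "n9", "n10"]},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy.jsonl")
    with open(toy_path, "w", encoding="utf-8") as f:
        for record in toy_records:
            f.write(json.dumps(record) + "\n")
    toy_counts = count_hard_negatives(toy_path, neg_key="neg_doc")

print(toy_counts)
```

If every record reports the same count, that count is the value to use for `model.data.hard_negatives_to_train`. To inspect the real dataset, call the helper on `OUTPUT_DATA_PATH` with the field name used in your file.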

In [4]:
COMMAND = f"python /opt/NeMo/examples/nlp/information_retrieval/megatron_bert_embedding_finetuning.py \
--config-path={CONFIG_PATH} \
--config-name={CONFIG_NAME} \
restore_from_path={PATH_TO_NEMO_MODEL} \
trainer.devices={NUM_DEVICES} \
trainer.val_check_interval=10 \
trainer.max_epochs=1 \
+trainer.num_sanity_val_steps=0 \
trainer.max_steps=100000 \
model.global_batch_size=4 \
model.micro_batch_size=4 \
model.mcore_bert=False \
model.tokenizer.library=huggingface \
model.tokenizer.type=intfloat/e5-large-unsupervised \
model.megatron_legacy=True \
++model.data.data_prefix={DATASET_PATH} \
++model.tokenizer.do_lower_case=False \
++model.data.evaluation_sample_size=50 \
++model.data.hard_negatives_to_train=5 \
++model.data.evaluation_steps=100 \
++model.data.data_train={DATASET_PATH} \
++model.data.num_workers=7 \
exp_manager.explicit_log_dir={SAVE_DIR} \
exp_manager.create_wandb_logger=False \
++exp_manager.checkpoint_callback_params.save_best_model=True \
exp_manager.resume_if_exists=False"

print(COMMAND)

python /opt/NeMo/examples/nlp/information_retrieval/megatron_bert_embedding_finetuning.py --config-path=/opt/NeMo/examples/nlp/information_retrieval/conf/ --config-name=megatron_bert_embedding_config restore_from_path=/workspace/files/models/NV-Embed-QA-4.nemo trainer.devices=1 trainer.val_check_interval=10 trainer.max_epochs=1 +trainer.num_sanity_val_steps=0 trainer.max_steps=100000 model.global_batch_size=4 model.micro_batch_size=4 model.mcore_bert=False model.tokenizer.library=huggingface model.tokenizer.type=intfloat/e5-large-unsupervised model.megatron_legacy=True ++model.data.data_prefix=/tmp/data/output_data.jsonl ++model.tokenizer.do_lower_case=False ++model.data.evaluation_sample_size=50 ++model.data.hard_negatives_to_train=5 ++model.data.evaluation_steps=100 ++model.data.data_train=/tmp/data/output_data.jsonl ++model.data.num_workers=7 exp_manager.explicit_log_dir=/tmp/trained_model/ exp_manager.create_wandb_logger=False ++exp_manager.checkpoint_callback_params.save_best_mode

In [5]:
!{COMMAND}

`zarr` distributed checkpoint backend is deprecated. Please switch to PyTorch Distributed format (`torch_dist`).
    See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
      ret = run_job(
    
[NeMo I 2024-11-15 06:11:34 megatron_bert_embedding_finetuning:31] 
    
    ************** Experiment configuration ***********
[NeMo I 2024-11-15 06:11:34 megatron_bert_embedding_finetuning:32] 
    name: megatron_bert
    restore_from_path: /workspace/files/models/NV-Embed-QA-4.nemo
    trainer:
      devices: 1
      num_nodes: 1
      accelerator: gpu
      precision: 16
      logger: false
      enable_checkpointing: false
      use_distributed_sampler: false
      max_epochs: 1
      max_steps: 100000
      log_every_n_steps: 10
      val_check_interval: 10
      limit_val_batches: 50
      limit_test_batches: 500
      accumulate_grad_batches: 1
      gradient_clip_val: 1.0
      benchmark: false
      num_sanity_val_steps: 0
    exp_manag

[NeMo I 2024-11-15 06:11:35 megatron_bert_embedding_finetuning:47] Loading model from /workspace/files/models/NV-Embed-QA-4.nemo
[NeMo W 2024-11-15 06:11:39 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:11:39 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:11:39 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() does not have field.name: moe_extended_tp in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:11:39 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping t

tokenizer_config.json: 100%|███████████████████| 372/372 [00:00<00:00, 2.31MB/s]
vocab.txt: 100%|█████████████████████████████| 232k/232k [00:00<00:00, 3.69MB/s]
special_tokens_map.json: 100%|█████████████████| 112/112 [00:00<00:00, 1.48MB/s]
tokenizer.json: 100%|████████████████████████| 466k/466k [00:00<00:00, 2.42MB/s]
[NeMo I 2024-11-15 06:11:40 megatron_base_model:595] Padded vocab_size: 30592, original vocab_size: 30522, dummy tokens: 70.
[NeMo W 2024-11-15 06:11:40 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:11:40 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:11:40 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() do

[NeMo I 2024-11-15 06:11:43 nlp_overrides:1346] Model MegatronBertEmbeddingModel was successfully restored from /workspace/files/models/NV-Embed-QA-4.nemo.
[NeMo W 2024-11-15 06:11:43 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/configuration_validator.py:161: You have overridden `MegatronBertEmbeddingModel.configure_sharded_model` which is deprecated. Please override the `configure_model` hook instead. Instantiation with the newer hook will be created on the device right away and have the right data type depending on the precision setting in the Trainer.
    
[NeMo W 2024-11-15 06:11:43 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/configuration_validator.py:143: You are using the `dataloader_iter` step flavor. If you consume the iterator more than once per step, the `batch_idx` argument in any hook that takes it will not match with the batch index of the last batch consumed. This might have unforeseen effect

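If your training completed, you should see a `megatron_bert.nemo` file in the `checkpoints` subdirectory of your `SAVE_DIR` directory (`/tmp/trained_model/checkpoints/megatron_bert.nemo` with the settings above).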

If training failed due to memmap-related errors, delete any `output_data.jsonl.idx*` (index) files that have been generated in the directory containing `OUTPUT_DATA_PATH`. To save memory, NeMo Framework doesn't rebuild index files if they already exist, so stale index files cause errors if you've changed the data itself or any data-related parameters.
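A minimal sketch of that cleanup is shown below. The helper `remove_stale_index_files` is hypothetical, and the index file names in the demo are toy names chosen only to match the `output_data.jsonl.idx*` pattern. The demo runs inside a temporary directory so that nothing real is deleted.

```python
import glob
import os
import tempfile


def remove_stale_index_files(data_path):
    """Delete files named '<data_path>.idx*' and return the removed paths."""
    removed = []
    # glob.escape guards against special characters in the dataset path itself.
    for idx_path in glob.glob(glob.escape(data_path) + ".idx*"):
        os.remove(idx_path)
        removed.append(idx_path)
    return sorted(removed)


with tempfile.TemporaryDirectory() as tmp_dir:
    data_path = os.path.join(tmp_dir, "output_data.jsonl")
    # Create a toy dataset file plus two toy index files next to it.
    for name in ("output_data.jsonl",
                 "output_data.jsonl.idx.npy",
                 "output_data.jsonl.idx.info"):
        open(os.path.join(tmp_dir, name), "w").close()

    removed_names = [os.path.basename(p) for p in remove_stale_index_files(data_path)]
    remaining_names = sorted(os.listdir(tmp_dir))

print("removed:", removed_names)
print("remaining:", remaining_names)
```

To apply it to the real dataset, call `remove_stale_index_files(OUTPUT_DATA_PATH)` before rerunning the training command.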

## Model Evaluation

For this tutorial, we'll use the scifact dataset from BeIR to compare the retrieval accuracy of the original model and the fine-tuned model. For a true apples-to-apples comparison, you should create your own domain-specific evaluation dataset that matches the domain of the synthetic fine-tuning dataset. This evaluation dataset should consist of a corpus, queries, and qrels (query relevance judgments).
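For reference, the three pieces of a BeIR-style evaluation set are plain dictionaries once loaded. The sketch below builds a tiny toy example in that shape and checks it for consistency. The document and query contents are made up, and `find_dangling_qrels` is a hypothetical helper rather than a BeIR function.

```python
# corpus: doc_id -> {"title": ..., "text": ...}
corpus = {
    "doc1": {"title": "Vitamin D", "text": "Vitamin D supports calcium absorption."},
    "doc2": {"title": "Iron", "text": "Iron is needed to form hemoglobin."},
}

# queries: query_id -> query text
queries = {
    "q1": "Which vitamin helps absorb calcium?",
}

# qrels: query_id -> {doc_id: relevance score}
qrels = {
    "q1": {"doc1": 1},
}


def find_dangling_qrels(corpus, queries, qrels):
    """Return (query_id, doc_id) pairs in qrels that point at missing entries."""
    dangling = []
    for query_id, judged_docs in qrels.items():
        for doc_id in judged_docs:
            if query_id not in queries or doc_id not in corpus:
                dangling.append((query_id, doc_id))
    return dangling


print(find_dangling_qrels(corpus, queries, qrels))
```

An empty list means every relevance judgment refers to a query and a document that actually exist, which is worth confirming before running an evaluation on your own data.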

We will use NeMo Framework to restore both the original and fine-tuned models from their respective checkpoints, and the BeIR library to evaluate their retrieval accuracy.

Finally, we'll evaluate the model with NDCG@k, MAP@k, Recall@k, and Precision@k scores. These metrics assess different aspects of retrieval performance. NDCG and MAP focus on the quality of the ranking, with higher values indicating that relevant documents are ranked nearer the top. Recall measures the fraction of relevant documents retrieved within the top k, so it can only increase as k grows. Precision measures the fraction of the top k results that are relevant, with higher precision indicating more relevant results at the top.
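To make these definitions concrete, here is a simplified sketch of Precision@k, Recall@k, and NDCG@k for a single query with binary relevance. It is not the implementation BeIR uses internally; the function names and the toy ranking are assumptions made for illustration.

```python
import math


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k slots that hold a relevant document."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Binary-relevance NDCG: discounted gain divided by the ideal gain."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Toy ranking for one query: two relevant documents, found at ranks 2 and 4.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}

for k in (2, 4):
    print(k, precision_at_k(ranked, relevant, k),
          recall_at_k(ranked, relevant, k),
          round(ndcg_at_k(ranked, relevant, k), 4))
```

Because precision divides by k, it shrinks at large cutoffs when each query has only a few relevant documents, while recall approaches 1.0.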

In [27]:
from beir import util, LoggingHandler
from beir.retrieval import models
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval.evaluation import EvaluateRetrieval

import torch
import math
from tqdm import tqdm
import logging
import pathlib, os

#### Just some code to print debug information to stdout
logging.basicConfig(format='%(asctime)s - %(message)s',
                    datefmt='%Y-%m-%d %H:%M:%S',
                    level=logging.INFO,
                    handlers=[LoggingHandler()])
#### /print debug information to stdout

#### Download scifact.zip dataset and unzip the dataset
dataset = "scifact"
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/{}.zip".format(dataset)
out_dir = os.path.join("/tmp", "datasets")
data_path = util.download_and_unzip(url, out_dir)

#### Provide the data_path where scifact has been downloaded and unzipped
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")


  0%|          | 0/5183 [00:00<?, ?it/s]

#### Create a wrapper NeMo model for retrieval evaluation on this dataset

In [28]:
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES
from nemo.collections.nlp.models.information_retrieval.megatron_bert_embedding_model import MegatronBertEmbeddingModel
from pytorch_lightning.trainer.trainer import Trainer
from typing import List, Dict
import numpy as np

class NeMoModel:
    def __init__(self, model_path=None, override_configs=None, **kwargs):
        cfg = MegatronBertEmbeddingModel.restore_from(model_path, return_config=True)
        if override_configs is not None:
            for k in override_configs:
                cfg[k] = override_configs[k]
        self.model = MegatronBertEmbeddingModel.restore_from(
            model_path,
            trainer=Trainer(),
            override_config_path=cfg)
        self.model = self.model.to("cuda:0").half()
    
    def encode_text(self, texts, batch_size=1, device="cuda:0"):
        with torch.no_grad():
            tokenized_texts = self.model.tokenizer.tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
            
            input_ids = tokenized_texts["input_ids"].to(device)
            attention_mask = tokenized_texts["attention_mask"].to(device)
            token_type_ids = tokenized_texts["token_type_ids"].to(device)

            num_batches = int(math.ceil(len(texts)/batch_size))

            embeddings = []
            for batch_id in tqdm(range(num_batches)):
                start = batch_size * batch_id
                end = batch_size * (batch_id+1)

                batch_embeddings = self.model(input_ids[start:end, :], attention_mask[start:end, :], token_type_ids[start:end, :])
                embeddings.append(batch_embeddings)
            return torch.cat(embeddings, dim=1).swapaxes(0,1)

    # Encode queries with the "query: " prefix (returns query embeddings as a torch tensor)
    def encode_queries(self, queries: List[str], batch_size: int, **kwargs) -> torch.Tensor:
        queries = [f"query: {query}" for query in queries]
        embeddings = self.encode_text(texts=queries, batch_size=batch_size)
        return embeddings
    
    # Encode corpus entries with the "passage: " prefix (returns document embeddings as a torch tensor).
    # Note: each corpus entry is a dict, so the f-string below embeds the dict's string form,
    # keys included; the results shown in this notebook were produced with this behavior.
    def encode_corpus(self, corpus: List[Dict[str, str]], batch_size: int, **kwargs) -> torch.Tensor:
        corpus = [f"passage: {passage}" for passage in corpus]
        embeddings = self.encode_text(texts=corpus, batch_size=batch_size)
        return embeddings
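
BeIR passes each corpus entry to `encode_corpus` as a dictionary with `title` and `text` fields. One possible way to turn such an entry into a prefixed string is sketched below. `format_passage` is a hypothetical helper, not part of BeIR or NeMo, and using it inside the wrapper would change the embeddings and therefore the scores reported in this notebook.

```python
from typing import Dict


def format_passage(doc: Dict[str, str]) -> str:
    """Join title and text into a single 'passage: ...' string."""
    title = doc.get("title", "").strip()
    text = doc.get("text", "").strip()
    body = f"{title} {text}".strip()  # drops the extra space if the title is empty
    return f"passage: {body}"


example_doc = {"title": "Vitamin D", "text": "Supplementation and bone density."}
print(format_passage(example_doc))
```

If you adopt a formatting like this, apply it to both the original and the fine-tuned model so that the comparison stays fair.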

#### Evaluate the Fine-tuned model:

NOTE: There may be a bug in NeMo 24.07 where certain global variables are set by default and must match the passed-in config variables. One example is `global_batch_size=8`. So even though we set `global_batch_size=4` during fine-tuning, we need to manually override it here to successfully restore the model. This does not impact the model performance.

In [29]:
new_model = DRES(NeMoModel(model_path="/tmp/trained_model/checkpoints/megatron_bert.nemo", override_configs={'global_batch_size': 8}), batch_size=1)
retriever = EvaluateRetrieval(new_model, score_function="dot") # or "cos_sim" for cosine similarity
results = retriever.retrieve(corpus, queries)

#### Evaluate your model with NDCG@k, MAP@K, Recall@K and Precision@K  where k = [1,3,5,10,100,1000] 
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
print(ndcg, _map, recall, precision)

I1115 06:52:24.300187 140712728913024 rank_zero.py:64] GPU available: True (cuda), used: True
I1115 06:52:24.300975 140712728913024 rank_zero.py:64] TPU available: False, using: 0 TPU cores
I1115 06:52:24.301555 140712728913024 rank_zero.py:64] HPU available: False, using: 0 HPUs


2024-11-15 06:52:24 - GPU available: True (cuda), used: True
2024-11-15 06:52:24 - TPU available: False, using: 0 TPU cores
2024-11-15 06:52:24 - HPU available: False, using: 0 HPUs


[NeMo W 2024-11-15 06:52:25 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:52:25 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:52:25 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() does not have field.name: moe_extended_tp in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:52:25 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:52:25 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() doe

[NeMo I 2024-11-15 06:52:25 megatron_init:269] Rank 0 has data parallel group : [0]
[NeMo I 2024-11-15 06:52:25 megatron_init:275] Rank 0 has combined group of data parallel and context parallel : [0]
[NeMo I 2024-11-15 06:52:25 megatron_init:280] All data parallel group ranks with context parallel combined: [[0]]
[NeMo I 2024-11-15 06:52:25 megatron_init:283] Ranks 0 has data parallel rank: 0
[NeMo I 2024-11-15 06:52:25 megatron_init:291] Rank 0 has context parallel group: [0]
[NeMo I 2024-11-15 06:52:25 megatron_init:294] All context parallel group ranks: [[0]]
[NeMo I 2024-11-15 06:52:25 megatron_init:295] Ranks 0 has context parallel rank: 0
[NeMo I 2024-11-15 06:52:25 megatron_init:302] Rank 0 has model parallel group: [0]
[NeMo I 2024-11-15 06:52:25 megatron_init:303] All model parallel group ranks: [[0]]
[NeMo I 2024-11-15 06:52:25 megatron_init:312] Rank 0 has tensor model parallel group: [0]
[NeMo I 2024-11-15 06:52:25 megatron_init:316] All tensor model parallel group ranks: 

[NeMo W 2024-11-15 06:52:25 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:52:25 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:52:25 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() does not have field.name: moe_extended_tp in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:52:25 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:52:25 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() doe

[NeMo I 2024-11-15 06:52:25 tokenizer_utils:183] Getting HuggingFace AutoTokenizer with pretrained_model_name: intfloat/e5-large-unsupervised
[NeMo I 2024-11-15 06:52:25 megatron_base_model:595] Padded vocab_size: 30592, original vocab_size: 30522, dummy tokens: 70.


[NeMo W 2024-11-15 06:52:25 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:52:25 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:52:25 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() does not have field.name: moe_extended_tp in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:52:25 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:52:25 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() doe

[NeMo W 2024-11-15 06:52:25 megatron_base_model:568] The model: MegatronBertEmbeddingModel() does not have field.name: persist_layer_norm in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:52:25 megatron_base_model:568] The model: MegatronBertEmbeddingModel() does not have field.name: memory_efficient_layer_norm in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:52:25 megatron_base_model:568] The model: MegatronBertEmbeddingModel() does not have field.name: fp8_margin in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:52:25 megatron_base_model:568] The model: MegatronBertEmbeddingModel() does not have field.name: fp8_interval in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:52:25 megatron_base_model:568] The model: MegatronBertEmbeddingModel() does not have field.name: fp

[NeMo I 2024-11-15 06:52:28 nlp_overrides:1346] Model MegatronBertEmbeddingModel was successfully restored from /tmp/trained_model/checkpoints/megatron_bert.nemo.


100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 300/300 [00:13<00:00, 22.04it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5183/5183 [03:35<00:00, 24.05it/s]


{'NDCG@1': 0.64333, 'NDCG@3': 0.70376, 'NDCG@5': 0.73643, 'NDCG@10': 0.75493, 'NDCG@100': 0.77466, 'NDCG@1000': 0.77817} {'MAP@1': 0.61428, 'MAP@3': 0.6783, 'MAP@5': 0.70104, 'MAP@10': 0.70977, 'MAP@100': 0.71426, 'MAP@1000': 0.71441} {'Recall@1': 0.61428, 'Recall@3': 0.75006, 'Recall@5': 0.82911, 'Recall@10': 0.88256, 'Recall@100': 0.97333, 'Recall@1000': 1.0} {'P@1': 0.64333, 'P@3': 0.27333, 'P@5': 0.18667, 'P@10': 0.09967, 'P@100': 0.01103, 'P@1000': 0.00113}


#### Evaluate the original model: 

In [30]:
# The original model
old_model = DRES(NeMoModel(model_path=PATH_TO_NEMO_MODEL), batch_size=1)
retriever = EvaluateRetrieval(old_model, score_function="dot") # or "cos_sim" for cosine similarity
results = retriever.retrieve(corpus, queries)

#### Evaluate your model with NDCG@k, MAP@K, Recall@K and Precision@K  where k = [1,3,5,10,100,1000] 
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
print(ndcg, _map, recall, precision)

I1115 06:56:54.851454 140712728913024 rank_zero.py:64] GPU available: True (cuda), used: True
I1115 06:56:54.852136 140712728913024 rank_zero.py:64] TPU available: False, using: 0 TPU cores
I1115 06:56:54.852690 140712728913024 rank_zero.py:64] HPU available: False, using: 0 HPUs


2024-11-15 06:56:54 - GPU available: True (cuda), used: True
2024-11-15 06:56:54 - TPU available: False, using: 0 TPU cores
2024-11-15 06:56:54 - HPU available: False, using: 0 HPUs


[NeMo W 2024-11-15 06:56:55 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:56:55 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:56:55 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() does not have field.name: moe_extended_tp in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:56:55 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:56:55 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() doe

[NeMo I 2024-11-15 06:56:55 megatron_init:269] Rank 0 has data parallel group : [0]
[NeMo I 2024-11-15 06:56:55 megatron_init:275] Rank 0 has combined group of data parallel and context parallel : [0]
[NeMo I 2024-11-15 06:56:55 megatron_init:280] All data parallel group ranks with context parallel combined: [[0]]
[NeMo I 2024-11-15 06:56:55 megatron_init:283] Ranks 0 has data parallel rank: 0
[NeMo I 2024-11-15 06:56:55 megatron_init:291] Rank 0 has context parallel group: [0]
[NeMo I 2024-11-15 06:56:55 megatron_init:294] All context parallel group ranks: [[0]]
[NeMo I 2024-11-15 06:56:55 megatron_init:295] Ranks 0 has context parallel rank: 0
[NeMo I 2024-11-15 06:56:55 megatron_init:302] Rank 0 has model parallel group: [0]
[NeMo I 2024-11-15 06:56:55 megatron_init:303] All model parallel group ranks: [[0]]
[NeMo I 2024-11-15 06:56:55 megatron_init:312] Rank 0 has tensor model parallel group: [0]
[NeMo I 2024-11-15 06:56:55 megatron_init:316] All tensor model parallel group ranks: 

[NeMo W 2024-11-15 06:56:55 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:56:55 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:56:55 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() does not have field.name: moe_extended_tp in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:56:55 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:56:55 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() doe

[NeMo I 2024-11-15 06:56:56 tokenizer_utils:216] Getting Megatron tokenizer for pretrained model name: intfloat/e5-large-unsupervised, custom vocab file: None, and merges file: None
[NeMo I 2024-11-15 06:56:56 tokenizer_utils:132] Getting HuggingFace AutoTokenizer with pretrained_model_name: intfloat/e5-large-unsupervised, vocab_file: None, merges_files: None, special_tokens_dict: {}, and use_fast: False
[NeMo I 2024-11-15 06:56:56 megatron_base_model:595] Padded vocab_size: 30592, original vocab_size: 30522, dummy tokens: 70.


[NeMo W 2024-11-15 06:56:56 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:56:56 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:56:56 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() does not have field.name: moe_extended_tp in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:56:56 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:56:56 megatron_base_model:1182] The model: MegatronBertEmbeddingModel() doe

[NeMo W 2024-11-15 06:56:56 megatron_base_model:568] The model: MegatronBertEmbeddingModel() does not have field.name: persist_layer_norm in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:56:56 megatron_base_model:568] The model: MegatronBertEmbeddingModel() does not have field.name: memory_efficient_layer_norm in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:56:56 megatron_base_model:568] The model: MegatronBertEmbeddingModel() does not have field.name: fp8_margin in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:56:56 megatron_base_model:568] The model: MegatronBertEmbeddingModel() does not have field.name: fp8_interval in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-11-15 06:56:56 megatron_base_model:568] The model: MegatronBertEmbeddingModel() does not have field.name: fp

[NeMo I 2024-11-15 06:56:58 nlp_overrides:1346] Model MegatronBertEmbeddingModel was successfully restored from /workspace/files/models/NV-Embed-QA-4.nemo.


100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 300/300 [00:12<00:00, 24.19it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5183/5183 [03:29<00:00, 24.73it/s]


{'NDCG@1': 0.63, 'NDCG@3': 0.69648, 'NDCG@5': 0.72435, 'NDCG@10': 0.74511, 'NDCG@100': 0.76424, 'NDCG@1000': 0.76918} {'MAP@1': 0.59828, 'MAP@3': 0.66946, 'MAP@5': 0.68929, 'MAP@10': 0.69941, 'MAP@100': 0.70364, 'MAP@1000': 0.70387} {'Recall@1': 0.59828, 'Recall@3': 0.74533, 'Recall@5': 0.81444, 'Recall@10': 0.873, 'Recall@100': 0.96333, 'Recall@1000': 1.0} {'P@1': 0.63, 'P@3': 0.27111, 'P@5': 0.182, 'P@10': 0.09867, 'P@100': 0.01093, 'P@1000': 0.00113}


As you can see, the fine-tuned model shows a modest improvement on scifact: for example, NDCG@10 rises from 0.745 to 0.755 and Recall@10 from 0.873 to 0.883. Fine-tuning on a larger amount of proprietary, domain-specific data is likely to make the improvement much more significant. In some initial testing with proprietary corporate data, we've seen around 5-10% accuracy improvement. Your results may vary depending on the other configurations set.
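The two result dictionaries printed above can be compared side by side with a few lines of Python. The sketch below copies the `@10` values from those outputs; `compare_metrics` is a hypothetical helper introduced only for this illustration.

```python
# Values at k=10 copied from the evaluation outputs above.
original = {"NDCG@10": 0.74511, "MAP@10": 0.69941, "Recall@10": 0.873, "P@10": 0.09867}
fine_tuned = {"NDCG@10": 0.75493, "MAP@10": 0.70977, "Recall@10": 0.88256, "P@10": 0.09967}


def compare_metrics(baseline, candidate):
    """Return metric -> (absolute change, relative change in percent)."""
    rows = {}
    for name, base in baseline.items():
        new = candidate[name]
        rows[name] = (new - base, 100.0 * (new - base) / base)
    return rows


comparison = compare_metrics(original, fine_tuned)
for name, (abs_delta, rel_delta) in comparison.items():
    print(f"{name:<10} {original[name]:.5f} -> {fine_tuned[name]:.5f} "
          f"({abs_delta:+.5f}, {rel_delta:+.2f}%)")
```

The same helper can be pointed at the full `ndcg`, `_map`, `recall`, and `precision` dictionaries returned by `retriever.evaluate` if you keep the results of both runs in separate variables.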

**Congratulations!** You've officially created synthetic data and fine-tuned a text embedding model using NeMo Framework!