# Finetuning E5-Large/LLaMA-3.2-1B Model on AllNLI for Embedding Tasks  

## Goal  
While `intfloat/e5-large-v2` and `meta-llama/Llama-3.2-1B` are powerful pretrained models available on Hugging Face, adapting them for **specific downstream tasks** (such as semantic search, document retrieval, clustering, or retrieval-augmented generation) benefits greatly from **domain-specific finetuning**.  

In this tutorial, we will demonstrate how to:  
- Convert the Hugging Face `e5-large-v2` or `Llama-3.2-1B` model into NeMo’s `.nemo` format.  
- Prepare the **AllNLI triplet dataset** in a format compatible with NeMo’s `CustomRetrievalDataModule`.  
- Fine-tune the E5 model to enhance its performance on **embedding-rich tasks** (retrieval, RAG, text similarity, etc.).  

With **triplet training** (query, positive doc, negative doc), the model learns to:  
- Generate **semantically meaningful dense vector representations (embeddings)**.  
- Improve retrieval quality by maximizing similarity between queries and positive docs while minimizing similarity with negatives.  
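
The triplet objective can be illustrated with a small pure-Python sketch. It is not NeMo's loss implementation: the softmax formulation, the `temperature` value, and the toy 2-D vectors are assumptions chosen only to show that the loss drops when the query sits closer to the positive than to the negative.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def triplet_contrastive_loss(query, pos_doc, neg_doc, temperature=0.05):
    """Softmax cross-entropy over the [positive, negative] similarities."""
    s_pos = cosine(query, pos_doc) / temperature
    s_neg = cosine(query, neg_doc) / temperature
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(s_pos, s_neg)
    log_z = m + math.log(math.exp(s_pos - m) + math.exp(s_neg - m))
    return log_z - s_pos

# Toy embeddings (hypothetical values, not model outputs).
query = [1.0, 0.0]
good = triplet_contrastive_loss(query, pos_doc=[0.9, 0.1], neg_doc=[0.0, 1.0])
bad = triplet_contrastive_loss(query, pos_doc=[0.0, 1.0], neg_doc=[0.9, 0.1])
print(f"well-separated: {good:.4f}, swapped: {bad:.4f}")
```

Minimizing this quantity pushes the query embedding toward the positive document and away from the negative one, which is the behavior fine-tuning is meant to reinforce.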

***

## NeMo Tools and Resources  
- [NeMo Framework Documentation](https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html)  
- Hugging Face Model Hub (`intfloat/e5-large-v2` or `meta-llama/Llama-3.2-1B`)  
- NeMo `llm.import_ckpt` utility for checkpoint conversion  
- NeMo `CustomRetrievalDataModule` for embedding retraining  

***

## Software Requirements  
- NVIDIA NeMo Framework (`pip install nemo_toolkit[all]` or use NGC container)  
- Hugging Face CLI & `datasets` library  
- PyTorch >= 2.0, CUDA-enabled environment  

***

## Hardware Requirements  
This playbook has been tested on:  
- **Single GPU** setups (A100, H100) for quick runs  
- **Multi-GPU** (2×A100, 2×H100) for large-scale training  
- NeMo is fully scalable across **multi-node GPU clusters** via `torchrun` or `nemo_run`  

***

## Launching the NeMo Container  

### With Docker  
```bash
docker run \
  --gpus all \
  --shm-size=2g \
  --net=host \
  --ulimit memlock=-1 \
  --rm -it \
  -v ${PWD}:/workspace \
  -w /workspace \
  nvcr.io/nvidia/nemo:25.04 bash
```

Start Jupyter inside the container:  
```bash
jupyter-lab --ip=0.0.0.0 --allow-root \
  --NotebookApp.token="embedding" \
  --port=9989 --notebook-dir=/workspace
```

### With Enroot (alternative)  
```bash
mkdir -p "$PWD/.jupyter_data" "$PWD/.jupyter_runtime" "$PWD/.jupyter_config" \
         "$PWD/.hf_cache" "$PWD/.matplotlib" "$PWD/.triton" "$PWD/.cache" \
         "$PWD/enroot_data" "$PWD/enroot_cache" "$PWD/nemo_home" \
         "$PWD/nemo_cache" "$PWD/nemo_run"

ENROOT_DATA_PATH="$PWD/enroot_data" \
ENROOT_CACHE_PATH="$PWD/enroot_cache" \
enroot start --root \
  --mount "$PWD:/host_pwd" \
  --env NVIDIA_VISIBLE_DEVICES=0,1 \
  --env NVIDIA_DRIVER_CAPABILITIES=all \
  --env JUPYTER_DATA_DIR=/host_pwd/.jupyter_data \
  --env JUPYTER_RUNTIME_DIR=/host_pwd/.jupyter_runtime \
  --env JUPYTER_CONFIG_DIR=/host_pwd/.jupyter_config \
  --env HF_HOME=/host_pwd/.hf_cache \
  --env MPLCONFIGDIR=/host_pwd/.matplotlib \
  --env TRITON_CACHE_DIR=/host_pwd/.triton \
  --env XDG_CACHE_HOME=/host_pwd/.cache \
  --env NEMO_HOME=/host_pwd/PythonNotebook/nemo_home \
  --env NEMO_MODELS_CACHE=/host_pwd/PythonNotebook/nemo_cache \
  --env NEMO_RUN_DIR=/host_pwd/PythonNotebook/nemo_run \
  nemo-25.04 \
  jupyter-lab --ip=0.0.0.0 --allow-root \
              --NotebookApp.token="embedding" \
              --port=9989 \
              --notebook-dir=/host_pwd
```

# Prepare AllNLI triplet data for CustomRetrievalDataModule

This cell:
- Downloads the AllNLI triplet training split via Hugging Face Datasets.
- Transforms each triplet into a record compatible with `CustomRetrievalDataModule`:
  - `query`: anchor sentence
  - `pos_doc`: positive (entailing) sentence
  - `neg_doc`: negative (contradicting/neutral) sentence
- Saves all records to a UTF-8 JSON file, `allnli_triplet.json` (pretty-printed).

Prerequisites:
- `pip install datasets`
- Internet access and sufficient disk space (dataset is large).

Output format (example record):
```json
{
  "query": "A man is playing a guitar on stage.",
  "pos_doc": "Someone is performing music in front of an audience.",
  "neg_doc": "No one is playing any instruments."
}
```
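
Before training, it can help to confirm that every record follows this layout. The sketch below is a hypothetical helper (not part of NeMo) that flags records with a missing or empty `query`, `pos_doc`, or `neg_doc`; the second sample record is invented to trigger a failure.

```python
import json

REQUIRED_KEYS = ("query", "pos_doc", "neg_doc")

def invalid_records(records):
    """Return (index, reason) pairs for records that do not match the triplet layout."""
    problems = []
    for i, rec in enumerate(records):
        for key in REQUIRED_KEYS:
            value = rec.get(key) if isinstance(rec, dict) else None
            if not isinstance(value, str) or not value.strip():
                problems.append((i, f"missing or empty '{key}'"))
    return problems

sample = json.loads("""[
  {"query": "A man is playing a guitar on stage.",
   "pos_doc": "Someone is performing music in front of an audience.",
   "neg_doc": "No one is playing any instruments."},
  {"query": "A dog runs.", "pos_doc": "", "neg_doc": "A cat sleeps."}
]""")
print(invalid_records(sample))
```

The same function can be applied to the full file by loading `allnli_triplet.json` with `json.load` once it has been written.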

Notes:
- To test quickly, you can load a subset: `split='train[:1%]'`.
- If you hit memory limits, consider processing in chunks or using streaming (`IterableDataset`) and writing incrementally.

```python
import json
from datasets import load_dataset

print("Downloading AllNLI triplet dataset (train split)...")
ds = load_dataset('sentence-transformers/all-nli', 'triplet', split='train')
print(ds)  # shows the column names and the number of rows

print("Transforming to CustomRetrievalDataModule-compatible JSON...")

# Rename the dataset columns to the keys the datamodule expects.
records = [
    {
        "query": ex["anchor"],
        "pos_doc": ex["positive"],
        "neg_doc": ex["negative"],
    }
    for ex in ds
]

out_path = "allnli_triplet.json"
with open(out_path, "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)

print(f"Saved {len(records)} triplets to {out_path}")
```
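
If holding every record in memory is a problem, the note about incremental writing can be sketched as follows. The helper name `write_triplets_incrementally` and the three demo examples are hypothetical; the output is the same JSON array layout as above, written one record at a time.

```python
import json
import os
import tempfile

def write_triplets_incrementally(examples, out_path):
    """Stream anchor/positive/negative examples into a JSON array on disk."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("[\n")
        for ex in examples:
            record = {
                "query": ex["anchor"],
                "pos_doc": ex["positive"],
                "neg_doc": ex["negative"],
            }
            if count:
                f.write(",\n")  # separator before every record except the first
            f.write(json.dumps(record, ensure_ascii=False))
            count += 1
        f.write("\n]\n")
    return count

# Demo with a tiny generator standing in for the real dataset.
demo = ({"anchor": f"a{i}", "positive": f"p{i}", "negative": f"n{i}"} for i in range(3))
demo_path = os.path.join(tempfile.mkdtemp(), "allnli_triplet_demo.json")
n = write_triplets_incrementally(demo, demo_path)
print(f"Wrote {n} records to {demo_path}")
```

With the real data, the `examples` argument could be the iterable returned by `load_dataset(..., split='train', streaming=True)`, so only one example is held in memory at a time.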

# Import Hugging Face E5-Large model into NeMo format

This script:
- Uses NVIDIA NeMo's `llm.import_ckpt` utility to **download and convert** the Hugging Face model `intfloat/e5-large-v2`.
- Wraps the E5-Large embedding model in NeMo’s `BertEmbeddingModel` + `BertEmbeddingLargeConfig`.
- Saves the converted checkpoint to a local `.nemo` file (`e5-large-v2.nemo`).

Steps performed:
1. Define working directory as default `NEMO_HOME` and `NEMO_MODELS_CACHE`.
2. Initialize the NeMo model configuration for E5-Large.
3. Import the model from Hugging Face (`hf://intfloat/e5-large-v2`).
4. Persist the converted checkpoint locally as `e5-large-v2.nemo`.
5. Log start/end points for progress visibility.

Prerequisites:
- NeMo container or `pip install nemo_toolkit[all]` (or a recent NeMo build that includes `nemo.collections.llm`).
- Hugging Face access (ensure `huggingface_hub` is installed).
- Adequate GPU/CPU + disk space (~1.5GB checkpoint).

Usage:
```bash
python import_e5_large.py
```

Output:
- A NeMo-compatible checkpoint: `e5-large-v2.nemo`  

```python
%%writefile import_e5_large.py
#!/usr/bin/env python3
"""
import_e5_large.py

Downloads and converts the `intfloat/e5-large-v2` embedding model from Hugging Face
into NeMo `.nemo` format using NeMo's llm.import_ckpt utility.
"""

import os
import logging
from nemo.collections import llm

# Setup logging
logging.basicConfig(level=logging.INFO)

def main():
    # Step 1: Define working directory and environment paths
    cwd = os.getcwd()
    os.environ.setdefault("NEMO_HOME", cwd)
    os.environ["NEMO_MODELS_CACHE"] = cwd

    # Step 2: Create model config for E5-Large embeddings
    model_config = llm.BertEmbeddingModel(llm.BertEmbeddingLargeConfig())

    # Hugging Face source
    hf_source = 'hf://intfloat/e5-large-v2'

    # Step 3: Convert and save model
    output_file = os.path.join(cwd, 'e5-large-v2.nemo')
    logging.info(f"Importing from {hf_source} → {output_file}...")

    llm.import_ckpt(
        model=model_config,
        source=hf_source,
        output_path=output_file,
    )

    # Step 4: Confirm success
    logging.info(f"Done. Checkpoint saved to {os.path.abspath(output_file)}")

if __name__ == '__main__':
    main()
```


Writing import_e5_large.py


# Authenticate with Hugging Face and run NeMo model import  

This cell:  
- Logs into Hugging Face with a personal access token so that private or gated models can be downloaded.  
- Executes the `import_e5_large.py` script using `torchrun`, which sets up the distributed environment (rank, world size) that the script runs in, even on a single GPU.  

Steps:  
1. `huggingface-cli login --token "…" `
   - Provides your token to Hugging Face Hub (so `intfloat/e5-large-v2` can be accessed).  
   - Token is stored locally (usually under `~/.cache/huggingface/token`, or under `$HF_HOME/token` when `HF_HOME` is set).  

2. `torchrun import_e5_large.py`  
   - Launches the import script you created.  
   - Downloads the Hugging Face model weights.  
   - Converts them into NeMo `.nemo` format.  
   - Saves locally as `e5-large-v2.nemo`.  


To avoid hard-coding the token in the notebook, either:  
- Use `!huggingface-cli login` without `--token` and paste the token interactively, or  
- Set the token in an environment variable (e.g., `export HUGGINGFACE_TOKEN=...`) and run `!huggingface-cli login --token "$HUGGINGFACE_TOKEN"`.  
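
The environment-variable option can also be driven from Python. The sketch below only builds the command; the helper name `hf_login_command` is hypothetical, and the variable name `HUGGINGFACE_TOKEN` follows the example above.

```python
import os

def hf_login_command(env=None, var_name="HUGGINGFACE_TOKEN"):
    """Build the `huggingface-cli login` argument list from an environment variable."""
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set {var_name} before running the login step.")
    return ["huggingface-cli", "login", "--token", token]

# Demo with a placeholder value; a real run would rely on os.environ instead.
cmd = hf_login_command(env={"HUGGINGFACE_TOKEN": "hf_placeholder"})
print(cmd[:3])
```

In a real session the command could be executed with `subprocess.run(hf_login_command(), check=True)`, which keeps the token out of the notebook source.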

Output after successful run:  
- `e5-large-v2.nemo` checkpoint will appear in your working directory, ready for use inside NeMo pipelines.  

```python
!huggingface-cli login --token "hf_***************************"
!torchrun import_e5_large.py
```

# Import Hugging Face Llama-3.2-1B model into NeMo format

The same procedure applies to `meta-llama/Llama-3.2-1B`, using NeMo’s `LlamaEmbeddingModel` + `Llama32EmbeddingConfig1B`. The model is gated on Hugging Face, so the login step above is required.

```python
%%writefile import_llama1b.py
#!/usr/bin/env python3
"""
import_llama1b.py

This script downloads the Hugging Face-hosted Llama-3.2-1B model and converts it
into a NeMo embedding model using NVIDIA NeMo's `llm.import_ckpt` utility.

The final `.nemo` checkpoint is saved in the current working directory.
"""

import os
import logging
from nemo.collections import llm

# Setup logging
logging.basicConfig(level=logging.INFO)

def main():
    # Step 1: Define working directory and set required environment variables
    cwd = os.getcwd()
    os.environ.setdefault("NEMO_HOME", cwd)
    os.environ["NEMO_MODELS_CACHE"] = cwd

    # Step 2: Create model config for Llama-3.2-1B embeddings
    model_config = llm.LlamaEmbeddingModel(llm.Llama32EmbeddingConfig1B())

    # Define Hugging Face source
    hf_source = 'hf://meta-llama/Llama-3.2-1B'

    # Step 3: Convert and save model to NeMo format
    output_file = os.path.join(cwd, 'Llama-3.2-1B.nemo')
    logging.info(f"Importing from {hf_source} → {output_file}...")

    llm.import_ckpt(
        model=model_config,
        source=hf_source,
        output_path=output_file,
    )

    # Step 4: Confirm success
    logging.info(f"Done. Checkpoint saved to {os.path.abspath(output_file)}")

if __name__ == '__main__':
    main()
```

Writing import_llama1b.py


```python
# Uncomment to import the Llama-3.2-1B checkpoint:
# !huggingface-cli login --token "hf_*******************************"
# !torchrun import_llama1b.py
```

# Setup environment and training inputs  

This cell:  
- Imports required libraries (`os`, `nemo_run`, and NeMo’s `llm`).  
- Defines key paths for training and experimentation:  
  - `TRAIN_DATA_PATH`: Path to the dataset in **CustomRetrievalDataModule-compatible JSON** (prepared previously from AllNLI triplets).  
  - `PRETRAINED_NEMO_MODEL`: Path to the converted E5-Large NeMo checkpoint (`.nemo` file imported earlier).  
- Sets `NEMO_HOME` to the current working directory — NeMo will use this as its default workspace for logs, caches, and outputs.  

Prerequisites:  
- You should already have run:  
  - The dataset export (`allnli_triplet.json`).  
  - The Hugging Face → NeMo import for E5-Large (`e5-large-v2.nemo`).  
- Ensure `nemo_run` (the NeMo-Run launcher) and `nemo_toolkit` with `nemo.collections.llm` are installed.  

Next steps:  
- Attach this environment setup with your **training or finetuning script** for embedding retrieval tasks.  

```python
import os
import nemo_run as run
from nemo.collections import llm

# Dataset path (exported earlier from the AllNLI triplets)
TRAIN_DATA_PATH = "allnli_triplet.json"

# Pretrained E5 checkpoint (converted earlier using import_e5_large.py)
PRETRAINED_NEMO_MODEL = "e5-large-v2.nemo"

# NeMo working directory
os.environ["NEMO_HOME"] = os.getcwd()
```
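
Since training depends on both artifacts from the earlier steps, a quick pre-flight check can save a failed launch. This is a minimal sketch with a hypothetical helper, `missing_inputs`; it uses `exists()` rather than a file-only check because a converted checkpoint may be stored as a directory.

```python
from pathlib import Path

def missing_inputs(*paths):
    """Return the subset of paths that do not exist on disk."""
    return [str(p) for p in map(Path, paths) if not p.exists()]

missing = missing_inputs("allnli_triplet.json", "e5-large-v2.nemo")
if missing:
    print("Run the earlier steps first; missing:", ", ".join(missing))
else:
    print("All training inputs found.")
```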


# Define Custom DataLoader for Triplet Retrieval  

This helper function builds a **CustomRetrievalDataModule** that feeds triplet-style data (query, positive doc, negative doc) into the E5 embedding model during training.  

Key points:  
- **Inputs:**  
  - `data_path`: Path to JSON dataset (e.g., `allnli_triplet.json`).  
  - `dataset_identifier`: A label for experiment tracking (`'allnli_e5_triplet'`).  
  - `seq_length`: Maximum token length for inputs (default: 512).  
  - `micro_batch_size`: Per-GPU batch size (default: 16).  
  - `global_batch_size`: Total batch size across GPUs (default: 64).  
  - `tokenizer`: Tokenizer object (should match the pretrained `e5-large-v2` checkpoint).  
  - `num_workers`: Dataloader workers for efficient loading (default: 8).  

- **Triplet Mapping:**  
  - `query_key="query"` → input query sentence.  
  - `pos_doc_key="pos_doc"` → positive sentence (entailment).  
  - `neg_doc_key="neg_doc"` → negative sentence (contradiction/neutral).  

- **Return:**  
  - A configured `run.Config` object, wrapping `llm.CustomRetrievalDataModule`, which NeMo can use directly in training pipelines.  

Usage example:  
```python
from nemo.collections.common.tokenizers.huggingface.auto_tokenizer import AutoTokenizer

# Wrap the tokenizer in run.Config so it is built together with the datamodule.
tokenizer = run.Config(AutoTokenizer, pretrained_model_name="intfloat/e5-large-v2")
train_dataloader = get_custom_dataloader(
    data_path=TRAIN_DATA_PATH,
    tokenizer=tokenizer,
)
```

Function definition:  
```python
def get_custom_dataloader(
    data_path,
    dataset_identifier='allnli_e5_triplet',
    seq_length=512,
    micro_batch_size=16,
    global_batch_size=64,
    tokenizer=None,
    num_workers=8
):
    """
    Creates a CustomRetrievalDataModule for triplet training with E5.
    """
    return run.Config(
        llm.CustomRetrievalDataModule,
        data_root=data_path,
        dataset_identifier=dataset_identifier,
        seq_length=seq_length,
        micro_batch_size=micro_batch_size,
        global_batch_size=global_batch_size,
        tokenizer=tokenizer,
        num_workers=num_workers,
        query_key="query",
        pos_doc_key="pos_doc",
        neg_doc_key="neg_doc",
    )
```
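
The relationship between `micro_batch_size` and `global_batch_size` can be checked with a little arithmetic. The sketch below assumes the usual Megatron-style convention, in which the global batch is split across data-parallel ranks and the remainder is covered by gradient accumulation; the helper name is hypothetical.

```python
def grad_accumulation_steps(micro_batch_size, global_batch_size, data_parallel_size=1):
    """Micro-batches each data-parallel rank accumulates per optimizer step."""
    per_step = micro_batch_size * data_parallel_size
    if global_batch_size % per_step != 0:
        raise ValueError(
            f"global_batch_size={global_batch_size} is not divisible by "
            f"micro_batch_size * data_parallel_size = {per_step}"
        )
    return global_batch_size // per_step

# Defaults from get_custom_dataloader on one GPU, then on two GPUs.
print(grad_accumulation_steps(16, 64, data_parallel_size=1))  # 4
print(grad_accumulation_steps(16, 64, data_parallel_size=2))  # 2
```

With the defaults used here, a single GPU therefore processes four micro-batches of 16 triplets before each optimizer update.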


# Fine-tune E5-Large on AllNLI Triplets  

This function defines and launches a **fine-tuning run** for the `e5-large-v2` embedding model on the **AllNLI triplet dataset**. It integrates the dataset, pretrained checkpoint, and NeMo recipe into a runnable training workflow.  

### Workflow:
1. **Pretrained checkpoint**  
   - Loads `e5-large-v2.nemo` converted earlier with `import_e5_large.py`.  

2. **Dataset integration**  
   - Calls `get_custom_dataloader(...)` to wrap the `allnli_triplet.json` dataset in a `CustomRetrievalDataModule`.  

3. **Training recipe**  
   - Loads NeMo’s built-in `e5_340m.finetune_recipe` as a base.  
   - Overrides settings for this run (e.g. model path, dataset, hardware params).  
   - Uses **single-GPU, single-node setup** (modifiable).  

4. **Customized hyperparameters**  
   - Enables **global in-batch negatives** for contrastive retrieval.  
   - Learning rate: `5e-6` (with scheduler min LR = `5e-7`).  
   - Training duration: 100 steps (toy example, extend as needed).  
   - Validation every 10 steps, checking 5 mini-batches each time.  

5. **Execution**  
   - `run.run(recipe, executor=run.LocalExecutor())` launches training locally.  
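
The idea behind in-batch negatives (step 4) is that every other document in the batch serves as a negative for a given query. The sketch below is a single-process illustration with toy vectors, not NeMo's implementation; the "global" variant presumably also draws negatives from other GPUs, and the hard negatives from `neg_doc` would add further score columns, both of which are omitted here.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def in_batch_negatives_loss(queries, pos_docs, scale=1.0):
    """Mean cross-entropy where query i should score highest against document i."""
    total = 0.0
    for i, q in enumerate(queries):
        scores = [scale * dot(q, d) for d in pos_docs]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[i]
    return total / len(queries)

# Toy batch of two queries (hypothetical embeddings).
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = in_batch_negatives_loss(queries, pos_docs=[[1.0, 0.0], [0.0, 1.0]])
shuffled = in_batch_negatives_loss(queries, pos_docs=[[0.0, 1.0], [1.0, 0.0]])
print(f"aligned: {aligned:.4f}, shuffled: {shuffled:.4f}")
```

A larger batch gives each query more negatives to contrast against, which is one reason the global batch size matters for retrieval fine-tuning.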

### Usage:
```python
train_e5_on_allnli(TRAIN_DATA_PATH)
```

This will:  
- Load your pretrained **E5-Large `.nemo` checkpoint**  
- Fine-tune it on **AllNLI triplets**  
- Save outputs/logs inside your current `NEMO_HOME` workspace  

```python
def train_e5_on_allnli(json_file_path):
    """
    Fine-tune the E5 model on the AllNLI triplet dataset.
    """
    pretrained_model_path = os.path.abspath(PRETRAINED_NEMO_MODEL)

    # Create datamodule
    datamodule = get_custom_dataloader(
        data_path=json_file_path,
        dataset_identifier="allnli_e5_triplet"
    )

    # Load recipe
    recipe = llm.recipes.e5_340m.finetune_recipe(
        name="allnli_e5_large_finetune",
        resume_path=pretrained_model_path,
        num_nodes=1,
        num_gpus_per_node=1,
    )

    # Customize recipe params
    recipe.model.config.global_in_batch_negatives = True
    recipe.optim.config.lr = 5e-6
    recipe.optim.lr_scheduler.min_lr = 5e-7
    recipe.trainer.max_steps = 100
    recipe.trainer.val_check_interval = 10
    recipe.trainer.limit_val_batches = 5
    recipe.data = datamodule

    # Run training
    run.run(recipe, executor=run.LocalExecutor())
```
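
To get a feel for the learning-rate settings, the sketch below traces a decay from `5e-6` to `5e-7` over the 100 training steps. It assumes a plain cosine schedule with no warmup; the scheduler configured by the recipe may use a different shape or add warmup steps, so treat the numbers as illustrative only.

```python
import math

def cosine_lr(step, max_steps=100, max_lr=5e-6, min_lr=5e-7):
    """Cosine decay from max_lr at step 0 to min_lr at max_steps (no warmup)."""
    progress = min(max(step / max_steps, 0.0), 1.0)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

for step in (0, 50, 100):
    print(step, f"{cosine_lr(step):.2e}")
```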


# Launch Fine-tuning Run  

This cell executes the full **fine-tuning job** by calling:  

```python
train_e5_on_allnli(TRAIN_DATA_PATH)
```

What happens when run:  
- Loads the **pretrained `e5-large-v2.nemo` checkpoint**.  
- Prepares the **AllNLI triplet dataset** (`allnli_triplet.json`) via the custom dataloader.  
- Builds and configures the **fine-tune recipe** (single GPU, local execution).  
- Starts training with the specified hyperparameters (100 steps, periodic validation).  
- Saves logs, checkpoints, and artifacts inside your `NEMO_HOME` directory.  

⚠️ Note: This configuration is set up for a **short debugging/trial run** (100 steps).  
- Increase `recipe.trainer.max_steps` and adjust validation intervals for full training.  
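- For multi-GPU training on one node, raise `num_gpus_per_node` in the recipe and launch with, for example, `run.LocalExecutor(ntasks_per_node=2, launcher="torchrun")`; for cluster training, use a distributed executor such as `run.SlurmExecutor`.  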


```python
train_e5_on_allnli(TRAIN_DATA_PATH)
```

Log directory is: /root/.nemo_run/experiments/nemo.collections.llm.api.finetune/nemo.collections.llm.api.finetune_1756196008/nemo.collections.llm.api.finetune


Launched app: local_persistent://nemo_run/nemo.collections.llm.api.finetune-zn4250k9j945t


Waiting for job nemo.collections.llm.api.finetune-zn4250k9j945t to finish [log=True]...



Job nemo.collections.llm.api.finetune-zn4250k9j945t finished: SUCCEEDED
