# Part II: LoRA Fine-tuning Using NeMo Customizer

This notebook covers the following:

0. [Prerequisites: Configurations, Health Checks, and Namespaces](#step-0)
1. [Upload Data to NeMo Data Store](#step-1)
2. [LoRA Customization with NeMo Customizer](#step-2)
3. [Running Inference on the Customized Model with NVIDIA NIM](#step-3)

In [1]:
import os
import json
import requests
import random
from openai import OpenAI

<a id="step-0"></a>
## Prerequisites: Configurations, Health Checks, and Namespaces

Before you proceed, make sure that you have completed the first notebook on data preparation to obtain the data files this notebook uses.

### Configure NeMo Microservices Endpoints

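The following code imports the endpoint configuration from `config.py` and prints the endpoints for NeMo Data Store, Entity Store, Customizer, Evaluator, and NIM, along with the namespace and base model used in this tutorial. Confirm that these values match your deployment before you continue.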

In [2]:
from config import *

print(f"Data Store endpoint: {NDS_URL}")
print(f"Entity Store, Customizer, Evaluator endpoint: {NEMO_URL}")
print(f"NIM endpoint: {NIM_URL}")
print(f"Namespace: {NMS_NAMESPACE}")
print(f"Base Model for Customization: {BASE_MODEL}")

Data Store endpoint: http://data-store.test
Entity Store, Customizer, Evaluator endpoint: http://nemo.test
NIM endpoint: http://nim.test
Namespace: xlam-tutorial-ns
Base Model for Customization: meta/llama-3.2-1b-instruct
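The cell above only prints the configuration. To check that each service is reachable, you can send a read-only GET request to each one and look for a `200` response. This is a minimal sketch that reuses endpoints appearing elsewhere in this notebook (the namespace listing APIs of Data Store and Entity Store, and the NIM `/v1/models` endpoint) rather than any dedicated health-check API. `check_services` is a hypothetical helper; it takes the HTTP function as an argument so that you can pass `requests.get`.

```python
from typing import Callable, Dict


def check_services(get: Callable[[str], object], endpoints: Dict[str, str]) -> Dict[str, bool]:
    """Send a GET request to each endpoint and report whether it returned HTTP 200.

    `get` is any callable that takes a URL and returns an object with a
    `status_code` attribute, such as `requests.get`.
    """
    results = {}
    for name, url in endpoints.items():
        try:
            results[name] = get(url).status_code == 200
        except Exception as exc:  # connection errors, timeouts, DNS failures
            print(f"{name}: request to {url} failed: {exc}")
            results[name] = False
    return results


# Usage in this notebook (assumes NDS_URL, NEMO_URL, and NIM_URL from config.py):
# status = check_services(requests.get, {
#     "Data Store": f"{NDS_URL}/v1/datastore/namespaces/",
#     "Entity Store": f"{NEMO_URL}/v1/namespaces/",
#     "NIM": f"{NIM_URL}/v1/models",
# })
# assert all(status.values()), f"Unreachable services: {status}"
```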


### Configure Paths to Prepared Data

The following code sets the paths to the prepared training, validation, and test files and checks that each file exists.

In [3]:
# Path where the data preparation notebook saved the fine-tuning and evaluation data
DATA_ROOT = os.path.join(os.getcwd(), "data")
CUSTOMIZATION_DATA_ROOT = os.path.join(DATA_ROOT, "customization")
VALIDATION_DATA_ROOT = os.path.join(DATA_ROOT, "validation")
EVALUATION_DATA_ROOT = os.path.join(DATA_ROOT, "evaluation")

# Sanity checks
train_fp = f"{CUSTOMIZATION_DATA_ROOT}/training.jsonl"
assert os.path.exists(train_fp), f"The training data at '{train_fp}' does not exist. Please ensure that the data was prepared successfully."

val_fp = f"{VALIDATION_DATA_ROOT}/validation.jsonl"
assert os.path.exists(val_fp), f"The validation data at '{val_fp}' does not exist. Please ensure that the data was prepared successfully."

test_fp = f"{EVALUATION_DATA_ROOT}/xlam-test-single.jsonl"
assert os.path.exists(test_fp), f"The test data at '{test_fp}' does not exist. Please ensure that the data was prepared successfully."
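The assertions above only confirm that the files exist. You can also check that every non-empty line parses as a JSON object with the keys you expect. This sketch assumes each record carries a `messages` list, as the test sample inspected in Step 3 does; adjust `required_keys` if your prepared data uses a different schema. `validate_jsonl` is a hypothetical helper.

```python
import json
from typing import Iterable, List, Tuple


def validate_jsonl(path: str, required_keys: Iterable[str] = ("messages",)) -> Tuple[int, List[str]]:
    """Return (number of valid records, list of problems) for a JSON Lines file."""
    required = tuple(required_keys)
    valid, problems = 0, []
    with open(path, "r", encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # blank lines are ignored
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
                continue
            if not isinstance(record, dict):
                problems.append(f"line {lineno}: expected a JSON object")
                continue
            missing = [k for k in required if k not in record]
            if missing:
                problems.append(f"line {lineno}: missing keys {missing}")
            else:
                valid += 1
    return valid, problems


# Usage in this notebook:
# for fp in (train_fp, val_fp, test_fp):
#     n_ok, issues = validate_jsonl(fp)
#     print(f"{fp}: {n_ok} valid records, {len(issues)} problems")
#     assert not issues, issues[:5]
```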

### Resource Organization Using Namespace

You can use a [namespace](https://developer.nvidia.com/docs/nemo-microservices/manage-entities/namespaces/index.html) to isolate and organize the artifacts in this tutorial.

#### Create Namespace

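Both Data Store and Entity Store use namespaces. The following code creates the tutorial namespace in both services. It also accepts `409` and `422` responses so that rerunning the cell does not fail if the namespace already exists.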

In [4]:
def create_namespaces(entity_host, ds_host, namespace):
    # Create namespace in Entity Store
    entity_store_url = f"{entity_host}/v1/namespaces"
    resp = requests.post(entity_store_url, json={"id": namespace})
    assert resp.status_code in (200, 201, 409, 422), \
        f"Unexpected response from Entity Store during namespace creation: {resp.status_code}"
    print(resp)

    # Create namespace in Data Store
    nds_url = f"{ds_host}/v1/datastore/namespaces"
    resp = requests.post(nds_url, data={"namespace": namespace})
    assert resp.status_code in (200, 201, 409, 422), \
        f"Unexpected response from Data Store during namespace creation: {resp.status_code}"
    print(resp)

create_namespaces(entity_host=NEMO_URL, ds_host=NDS_URL, namespace=NMS_NAMESPACE)

<Response [200]>
<Response [409]>


#### Verify Namespaces

The following code uses the [Data Store API](https://developer.nvidia.com/docs/nemo-microservices/api/datastore.html) and [Entity Store API](https://developer.nvidia.com/docs/nemo-microservices/api/entity-store.html) to retrieve the namespace created in the previous cell.

In [5]:
# Verify Namespace in Data Store
response = requests.get(f"{NDS_URL}/v1/datastore/namespaces/{NMS_NAMESPACE}")
print(f"Status Code: {response.status_code}\nResponse JSON: {response.json()}")

# Verify Namespace in Entity Store
response = requests.get(f"{NEMO_URL}/v1/namespaces/{NMS_NAMESPACE}")
print(f"Status Code: {response.status_code}\nResponse JSON: {response.json()}")

Status Code: 201
Response JSON: {'namespace': 'xlam-tutorial-ns', 'created_at': '2025-04-07T23:37:44Z', 'updated_at': '2025-04-08T07:03:53Z'}
Status Code: 200
Response JSON: {'id': 'xlam-tutorial-ns', 'created_at': '2025-04-08T07:05:07.362008', 'updated_at': '2025-04-08T07:05:07.362012', 'description': None, 'project': None, 'custom_fields': {}, 'ownership': None}


**Tips**:
* To list **all** available namespaces, send GET requests to `{NDS_URL}/v1/datastore/namespaces/` (Data Store) and `{NEMO_URL}/v1/namespaces/` (Entity Store).
* To delete a namespace, send DELETE requests to `{NDS_URL}/v1/datastore/namespaces/{namespace}` and `{NEMO_URL}/v1/namespaces/{namespace}`.
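The list and delete operations described in the tips above can be wrapped in small helpers. This is a sketch that reuses the endpoint paths from the tips, with `NDS_URL` and `NEMO_URL` as the Data Store and Entity Store hosts. The helper names are hypothetical, and the HTTP function is passed in so that the helpers do not depend on a particular client.

```python
from typing import Callable, Dict


def namespace_urls(entity_host: str, ds_host: str, namespace: str = "") -> Dict[str, str]:
    """Build Entity Store and Data Store namespace URLs.

    With an empty `namespace`, the URLs point at the collection (list) endpoints.
    """
    return {
        "entity_store": f"{entity_host}/v1/namespaces/{namespace}",
        "data_store": f"{ds_host}/v1/datastore/namespaces/{namespace}",
    }


def delete_namespace(delete: Callable[[str], object], entity_host: str, ds_host: str,
                     namespace: str) -> Dict[str, int]:
    """Send DELETE requests for `namespace` to both stores and return the status codes."""
    if not namespace:
        # Guard against accidentally sending DELETE to the collection URL
        raise ValueError("namespace must be non-empty")
    urls = namespace_urls(entity_host, ds_host, namespace)
    return {store: delete(url).status_code for store, url in urls.items()}


# Usage in this notebook:
# print(requests.get(namespace_urls(NEMO_URL, NDS_URL)["entity_store"]).json())
# print(delete_namespace(requests.delete, NEMO_URL, NDS_URL, NMS_NAMESPACE))
```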

---
<a id="step-1"></a>
## Step 1: Upload Data to NeMo Data Store

NeMo Data Store supports data management through the Hugging Face `HfApi` client.

**This step does not interact with Hugging Face; it only uses the client library to communicate with NeMo Data Store.** This differs from the previous notebook, where we used the `load_dataset` API to download the xLAM dataset from the Hugging Face Hub.

For more information, see the [documentation](https://developer.nvidia.com/docs/nemo-microservices/manage-entities/tutorials/manage-dataset-files.html#set-up-hugging-face-client).

### 1.1 Create Repository

In [8]:
repo_id = f"{NMS_NAMESPACE}/{DATASET_NAME}"

In [10]:
from huggingface_hub import HfApi

hf_api = HfApi(endpoint=f"{NDS_URL}/v1/hf", token="")

# Create the dataset repository; exist_ok=True lets you rerun this cell safely
hf_api.create_repo(
    repo_id=repo_id,
    repo_type='dataset',
    exist_ok=True,
)

RepoUrl('datasets/xlam-tutorial-ns/xlam-ft-dataset', endpoint='http://data-store.test/v1/hf', repo_type='dataset', repo_id='xlam-tutorial-ns/xlam-ft-dataset')

Creating a dataset programmatically requires two steps: uploading the files to NeMo Data Store and registering the dataset with NeMo Entity Store. For more information, see the [documentation](https://developer.nvidia.com/docs/nemo-microservices/manage-entities/datasets/create-dataset.html#how-to-create-a-dataset).

### 1.2 Upload Dataset Files to NeMo Data Store

In [11]:
hf_api.upload_file(path_or_fileobj=train_fp,
    path_in_repo="training/training.jsonl",
    repo_id=repo_id,
    repo_type='dataset',
)

hf_api.upload_file(path_or_fileobj=val_fp,
    path_in_repo="validation/validation.jsonl",
    repo_id=repo_id,
    repo_type='dataset',
)

hf_api.upload_file(path_or_fileobj=test_fp,
    path_in_repo="testing/xlam-test-single.jsonl",
    repo_id=repo_id,
    repo_type='dataset',
)


CommitInfo(commit_url='', commit_message='Upload testing/xlam-test-single.jsonl with huggingface_hub', commit_description='', oid='f3f1705c7652aaa06823693293c44ce228884a85', pr_url=None, repo_url=RepoUrl('', endpoint='https://huggingface.co', repo_type='model', repo_id=''), pr_revision=None, pr_num=None)
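To confirm the upload, you can compare the files in the repository against the paths you uploaded. `HfApi.list_repo_files` is a standard `huggingface_hub` method; this sketch assumes NeMo Data Store supports it like the other `HfApi` calls in this notebook. The comparison helper `missing_files` is hypothetical.

```python
from typing import Iterable, List


def missing_files(repo_files: Iterable[str], expected: Iterable[str]) -> List[str]:
    """Return the expected repository paths that are absent from repo_files."""
    present = set(repo_files)
    return [path for path in expected if path not in present]


# The path_in_repo values used in the upload cell above
EXPECTED_PATHS = [
    "training/training.jsonl",
    "validation/validation.jsonl",
    "testing/xlam-test-single.jsonl",
]

# Usage in this notebook (assumes Data Store implements the list-files API):
# repo_files = hf_api.list_repo_files(repo_id=repo_id, repo_type="dataset")
# missing = missing_files(repo_files, EXPECTED_PATHS)
# assert not missing, f"Missing from {repo_id}: {missing}"
```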

**Tips**:
* Note the `path_in_repo` argument above. If a subfolder contains more than one file:
    * NeMo Customizer merges all the `.jsonl` files in `training/` and uses them for training.
    * NeMo Customizer merges all the `.jsonl` files in `validation/` and uses them for validation.
* NeMo Data Store generally supports data management through the [HfApi API](https://huggingface.co/docs/huggingface_hub/en/package_reference/hf_api). For example, to delete a repository, run the following code.

  ```python
  hf_api.delete_repo(
      repo_id=repo_id,
      repo_type="dataset",
  )
  ```
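Because NeMo Customizer merges every `.jsonl` file under `training/` (and likewise under `validation/`), you can upload a dataset that is split across several files. A minimal sketch, assuming your shards live in one local directory; `shard_upload_plan` and the shard directory are hypothetical, while `hf_api.upload_file` is the same call used in the upload cell.

```python
import os
from typing import List, Tuple


def shard_upload_plan(local_dir: str, split: str) -> List[Tuple[str, str]]:
    """Map every .jsonl file in local_dir to a path under the `split/` folder.

    Returns (local_path, path_in_repo) pairs, sorted for a deterministic order.
    """
    if split not in ("training", "validation"):
        raise ValueError("split must be 'training' or 'validation'")
    names = sorted(n for n in os.listdir(local_dir) if n.endswith(".jsonl"))
    return [(os.path.join(local_dir, n), f"{split}/{n}") for n in names]


# Usage in this notebook (the shard directory is hypothetical):
# for local_path, repo_path in shard_upload_plan("data/customization/shards", "training"):
#     hf_api.upload_file(path_or_fileobj=local_path, path_in_repo=repo_path,
#                        repo_id=repo_id, repo_type="dataset")
```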

### 1.3 Register the Dataset with NeMo Entity Store

To use a dataset for operations such as evaluation and customization, register it with the `/v1/datasets` endpoint.
After registration, you can refer to the dataset by its namespace and name.

In [12]:
resp = requests.post(
    url=f"{NEMO_URL}/v1/datasets",
    json={
        "name": DATASET_NAME,
        "namespace": NMS_NAMESPACE,
        "description": "Tool calling xLAM dataset in OpenAI ChatCompletions format",
        "files_url": f"hf://datasets/{NMS_NAMESPACE}/{DATASET_NAME}",
        "project": "tool_calling",
    },
)
assert resp.status_code in (200, 201), f"Status Code {resp.status_code} Failed to create dataset {resp.text}"
resp.json()

{'created_at': '2025-04-08T07:05:42.895214',
 'updated_at': '2025-04-08T07:05:42.895217',
 'name': 'xlam-ft-dataset',
 'namespace': 'xlam-tutorial-ns',
 'description': 'Tool calling xLAM dataset in OpenAI ChatCompletions format',
 'format': None,
 'files_url': 'hf://datasets/xlam-tutorial-ns/xlam-ft-dataset',
 'hf_endpoint': None,
 'split': None,
 'limit': None,
 'id': 'dataset-3ozLsXkX7TqvQm9fCuCUGT',
 'project': 'tool_calling',
 'custom_fields': {}}

In [13]:
# Sanity check to validate dataset
res = requests.get(url=f"{NEMO_URL}/v1/datasets/{NMS_NAMESPACE}/{DATASET_NAME}")
assert res.status_code in (200, 201), f"Status Code {res.status_code} Failed to fetch dataset {res.text}"
dataset_obj = res.json()

print("Files URL:", dataset_obj["files_url"])
assert dataset_obj["files_url"] == f"hf://datasets/{repo_id}"

Files URL: hf://datasets/xlam-tutorial-ns/xlam-ft-dataset


---
<a id="step-2"></a>
## Step 2: LoRA Customization with NeMo Customizer

### 2.1 Start the Training Job


Start the training job by sending a POST request to the `/v1/customization/jobs` endpoint.
The following code sets the training parameters and sends the request.

**The training job may take approximately 45 minutes to complete.**
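Before launching the job, you can estimate how many steps it will run. Assuming each step consumes one global batch of `batch_size` records, one epoch takes `ceil(N / batch_size)` steps for `N` training records, or `floor(N / batch_size)` if the final partial batch is dropped. The status response in section 2.2 reports the actual count as `steps_completed` (438 for the 2-epoch run shown there). This is an illustrative calculation under those assumptions, not a description of how NeMo Customizer batches data internally; the helper names are hypothetical.

```python
import math


def steps_per_epoch(num_examples: int, batch_size: int, drop_last: bool = False) -> int:
    """Steps needed to see every example once at a fixed global batch size."""
    if drop_last:
        return num_examples // batch_size  # final partial batch discarded
    return math.ceil(num_examples / batch_size)  # final partial batch kept


def total_steps(num_examples: int, batch_size: int, epochs: int, drop_last: bool = False) -> int:
    """Estimated optimizer steps for the whole run."""
    return epochs * steps_per_epoch(num_examples, batch_size, drop_last)


# Usage in this notebook: count the records in the training file and compare
# the estimate with "steps_completed" in the status response.
# num_train = sum(1 for line in open(train_fp, encoding="utf-8") if line.strip())
# print(total_steps(num_train, batch_size=16, epochs=2))
```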

In [14]:
headers = {"wandb-api-key": WANDB_API_KEY} if WANDB_API_KEY else None

training_params = {
    "name": "llama-3.2-1b-xlam-ft",
    "output_model": f"{NMS_NAMESPACE}/llama-3.2-1b-xlam-run1",
    "config": BASE_MODEL,
    "dataset": {"name": DATASET_NAME, "namespace" : NMS_NAMESPACE},
    "hyperparameters": {
        "training_type": "sft",
        "finetuning_type": "lora",
        "epochs": 2,
        "batch_size": 16,
        "learning_rate": 0.0001,
        "lora": {
            "adapter_dim": 32,
            "adapter_dropout": 0.1
        }
    }
}

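resp = requests.post(f"{NEMO_URL}/v1/customization/jobs", json=training_params, headers=headers)
assert 200 <= resp.status_code < 300, f"Status Code {resp.status_code}: Failed to create customization job. Response: {resp.text}"
customization = resp.json()
customization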

{'id': 'cust-BTkGbfifLfEAjV2THu3tas',
 'created_at': '2025-04-08T07:05:52.236823',
 'updated_at': '2025-04-08T07:05:52.236829',
 'namespace': 'default',
 'dataset': 'xlam-tutorial-ns/xlam-ft-dataset',
 'output_model': 'xlam-tutorial-ns/llama-3.2-1b-xlam-run1@cust-BTkGbfifLfEAjV2THu3tas',
 'config': {'base_model': 'meta/llama-3.2-1b-instruct',
  'precision': 'bf16-mixed',
  'num_gpus': 1,
  'num_nodes': 1,
  'micro_batch_size': 1,
  'tensor_parallel_size': 1,
  'max_seq_length': 4096,
  'prompt_template': '{prompt} {completion}'},
 'hyperparameters': {'finetuning_type': 'lora',
  'training_type': 'sft',
  'batch_size': 16,
  'epochs': 2,
  'learning_rate': 0.0001,
  'lora': {'adapter_dim': 32, 'alpha': 16, 'adapter_dropout': 0.1},
  'sequence_packing_enabled': False},
 'status': 'created',
 'status_details': {'created_at': '2025-04-08T07:05:53.328702',
  'updated_at': '2025-04-08T07:05:53.328702',
  'steps_completed': 0,
  'epochs_completed': 0,
  'percentage_done': 0.0,
  'status_logs'
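In the `lora` settings, `adapter_dim` corresponds to the adapter rank `r`, and the response shows that the job also applied `alpha: 16`, which the request did not set. In the standard LoRA formulation, a frozen weight matrix `W` of shape `d_out x d_in` is augmented with a trainable update `(alpha / r) * B @ A`, where `A` has shape `r x d_in` and `B` has shape `d_out x r`. The sketch below counts the trainable parameters this adds to a single hypothetical 2048 x 2048 projection. It illustrates the standard formulation only and does not claim which layers NeMo Customizer adapts or exactly how it applies `alpha`.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the LoRA factors A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank


def lora_scaling(alpha: float, rank: int) -> float:
    """Factor multiplying B @ A in the standard LoRA formulation."""
    return alpha / rank


# Illustrative layer size only; adapter_dim and alpha come from the job response above.
d_in = d_out = 2048
rank, alpha = 32, 16

frozen = d_in * d_out
trainable = lora_trainable_params(d_in, d_out, rank)
print(f"Frozen weights: {frozen:,}")
print(f"LoRA weights:   {trainable:,} ({trainable / frozen:.2%} of the layer)")
print(f"Scaling alpha / rank = {lora_scaling(alpha, rank)}")
```

Because `r` is much smaller than the layer dimensions, the adapter adds only about 3% of this layer's parameters, which is what keeps LoRA fine-tuning lightweight.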

The following code sets variables for storing the job ID and customized model name.

In [15]:
# Job ID, used to track the job status
JOB_ID = customization["id"]

# Name of the customized model; pass it as the `model` argument in inference requests
CUSTOMIZED_MODEL = customization["output_model"]

**Tips**:
* If you configured the NeMo Customizer microservice with your own [Weights & Biases (WandB)](https://wandb.ai/) API key, you can find the training graphs and logs in your WandB account, "nvidia-nemo-customizer" project. Your run ID is similar to your customization `JOB_ID`.
  
* To cancel a job that you scheduled incorrectly, run the following code.
  
  ```python
  requests.post(f"{NEMO_URL}/v1/customization/jobs/{JOB_ID}/cancel")
  ```

### 2.2 Get Job Status

Get the job status by sending a GET request to the `/v1/customization/jobs/{JOB_ID}/status` endpoint.
The following code sends the request for the job created above and prints the response.

In [17]:
response = requests.get(f"{NEMO_URL}/v1/customization/jobs/{JOB_ID}/status")

assert response.status_code == 200, (
    f"Status Code {response.status_code}: Failed to get job status. Response: {response.text}"
)
print("Response JSON:", json.dumps(response.json(), indent=4))

Response JSON: {
    "created_at": "2025-04-08T07:05:53.328702",
    "updated_at": "2025-04-08T07:26:36.749083",
    "status": "completed",
    "steps_completed": 438,
    "epochs_completed": 2,
    "percentage_done": 100.0,
    "best_epoch": 1,
    "train_loss": 0.052824027836322784,
    "val_loss": 0.047389596700668335,
    "metrics": {
        "keys": [
            "train_loss",
            "val_loss"
        ],
        "metrics": {
            "train_loss": [
                {
                    "value": 1.7395085096359253,
                    "step": 9,
                    "timestamp": "2025-04-08T07:07:50.318152"
                },
                {
                    "value": 0.5097759962081909,
                    "step": 19,
                    "timestamp": "2025-04-08T07:08:21.445975"
                },
                {
                    "value": 0.11785753816366196,
                    "step": 29,
                    "timestamp": "2025-04-08T07:08:45.141126"
           

**IMPORTANT:** Monitor the job status and make sure training has completed before you proceed: in the response, `status` should be `completed` and `percentage_done` should be `100.0`.
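Rather than rerunning the status cell by hand, you can poll the endpoint until the job finishes. This is a minimal sketch: `wait_for_job` is hypothetical, it takes any zero-argument function that returns the status JSON, and it treats `completed` (seen in the output above) plus `failed` and `cancelled` as terminal states. The last two strings are assumptions; check the Customizer API reference for the exact status values in your deployment.

```python
import time
from typing import Callable, Dict

# "completed" appears in the status output above. "failed" and "cancelled"
# are assumed names for the other terminal states; verify them for your deployment.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_job(get_status: Callable[[], Dict], poll_interval: float = 60.0,
                 timeout: float = 3 * 60 * 60) -> Dict:
    """Call get_status() until the job reaches a terminal status, then return that response.

    Raises TimeoutError if the job is still running after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        print(f"status={status.get('status')}, percentage_done={status.get('percentage_done')}")
        if status.get("status") in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still running after {timeout} seconds")
        time.sleep(poll_interval)


# Usage in this notebook:
# def get_status():
#     resp = requests.get(f"{NEMO_URL}/v1/customization/jobs/{JOB_ID}/status")
#     resp.raise_for_status()
#     return resp.json()
#
# final = wait_for_job(get_status)
# assert final["status"] == "completed", final
```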

### 2.3 Validate Availability of Custom Model
The following NeMo Entity Store API should display the model when the training job is complete.
It lists all models in your namespace, sorted with the most recently created first.
For more information about this API, see the [NeMo Entity Store API reference](https://developer.nvidia.com/docs/nemo-microservices/api/entity-store.html).
With the following code, you can find all customized models, including the one trained in the previous cells.
Each entry stores the namespace and name in separate fields; look for the entry whose `namespace` and `name`, joined as `namespace/name`, match your `CUSTOMIZED_MODEL`.

In [18]:
response = requests.get(f"{NEMO_URL}/v1/models", params={"filter[namespace]": NMS_NAMESPACE, "sort" : "-created_at"})

assert response.status_code == 200, f"Status Code {response.status_code}: Request failed. Response: {response.text}"
print("Response JSON:", json.dumps(response.json(), indent=4))

Response JSON: {
    "object": "list",
    "data": [
        {
            "created_at": "2025-04-08T07:05:53.417350",
            "updated_at": "2025-04-08T07:05:53.417354",
            "name": "llama-3.2-1b-xlam-run1@cust-BTkGbfifLfEAjV2THu3tas",
            "namespace": "xlam-tutorial-ns",
            "description": "None",
            "spec": {
                "num_parameters": 1000000000,
                "context_size": 4096,
                "num_virtual_tokens": 0,
                "is_chat": true
            },
            "artifact": {
                "gpu_arch": "Ampere",
                "precision": "bf16-mixed",
                "tensor_parallelism": 1,
                "backend_engine": "nemo",
                "status": "upload_completed",
                "files_url": "hf://xlam-tutorial-ns/llama-3.2-1b-xlam-run1@cust-BTkGbfifLfEAjV2THu3tas"
            },
            "base_model": "meta/llama-3.2-1b-instruct",
            "peft": {
                "finetuning_type": "lora"
  

**Tips**:

* You can also retrieve the custom model directly by its name:
  ```python
  # Retrieve only the custom model, using its full namespace/name reference
  response = requests.get(f"{NEMO_URL}/v1/models/{CUSTOMIZED_MODEL}")

  assert response.status_code == 200, f"Status Code {response.status_code}: Request failed. Response: {response.text}"
  print("Response JSON:", json.dumps(response.json(), indent=4))
  ```
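Because each entry in the model listing stores the namespace and name separately while `CUSTOMIZED_MODEL` joins them as `namespace/name`, a small filter over the response JSON makes the match explicit. `find_model` is a hypothetical helper; it relies only on the `data`, `namespace`, and `name` fields visible in the listing output above.

```python
from typing import Dict, Optional


def find_model(models_response: Dict, model_ref: str) -> Optional[Dict]:
    """Return the model entry whose "namespace/name" equals model_ref, or None."""
    for model in models_response.get("data", []):
        if f"{model.get('namespace')}/{model.get('name')}" == model_ref:
            return model
    return None


# Usage in this notebook, with the response from the model listing cell:
# entry = find_model(response.json(), CUSTOMIZED_MODEL)
# assert entry is not None, f"{CUSTOMIZED_MODEL} is not registered yet"
# print(entry["artifact"]["status"])
```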
  

NVIDIA NIM picks up the LoRA adapters directly from NeMo Entity Store. You can query the NIM `/v1/models` endpoint to confirm that your customized model is available, as shown in the following code.

In [19]:
# Check if the custom LoRA model is hosted by NVIDIA NIM
resp = requests.get(f"{NIM_URL}/v1/models")

models = resp.json().get("data", [])
model_names = [model["id"] for model in models]

assert CUSTOMIZED_MODEL in model_names, \
    f"Model {CUSTOMIZED_MODEL} not found"

---

<a id="step-3"></a>
## Step 3: Sanity Test the Customized Model by Running Sample Inference

Once the model is customized, its adapter is automatically saved in NeMo Entity Store and is ready to be picked up by NVIDIA NIM.
You can test the model by sending a prompt to its NIM endpoint.

First, choose one of the examples from the test set.

### 3.1 Get Test Data Sample

In [20]:
def read_jsonl(file_path):
    """Reads a JSON Lines file and yields parsed JSON objects"""
    with open(file_path, 'r', encoding='utf-8') as file:
        for line in file:
            line = line.strip()  # Remove leading/trailing whitespace
            if not line:
                continue  # Skip empty lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as e:
                print(f"Error decoding JSON: {e}")
                continue


test_data = list(read_jsonl(test_fp))

print(f"There are {len(test_data)} examples in the test set")

There are 713 examples in the test set


In [21]:
# Randomly choose
test_sample = random.choice(test_data)

# Visualize the inputs to the LLM - user query and available tools
test_sample['messages'], test_sample['tools']

([{'role': 'user',
   'content': 'What are the zip codes for New York City in the United States?'}],
 [{'type': 'function',
   'function': {'name': 'zipcode_by_city',
    'description': 'Retrieves the zip code(s) of a given city using the GeoSource API.',
    'parameters': {'type': 'object',
     'properties': {'city': {'description': 'The name of the city for which to retrieve zip code(s). The city must be in the supported countries list (44 countries).',
       'type': 'string',
       'default': 'Brussels'}}}}},
  {'type': 'function',
   'function': {'name': 'place_details_google',
    'description': 'Fetches contact and opening hours information for a place using its Google Place ID.',
    'parameters': {'type': 'object',
     'properties': {'is_id': {'description': 'The Google Place ID of the location to retrieve details for.',
       'type': 'string',
       'default': 'ChIJCxzX9tpwhlQRIFVwULOI2FM'}}}}},
  {'type': 'function',
   'function': {'name': 'get_states',
    'descriptio

### 3.2 Send an Inference Call to NIM

NIM exposes an OpenAI-compatible Chat Completions API, which you can query with the `OpenAI` client library, as shown in the following code.

In [22]:
inference_client = OpenAI(
    base_url=f"{NIM_URL}/v1",
    api_key="None",  # placeholder; the client requires a non-empty API key
)

completion = inference_client.chat.completions.create(
    model=CUSTOMIZED_MODEL,
    messages=test_sample["messages"],
    tools=test_sample["tools"],
    tool_choice="auto",
    temperature=0.1,
    top_p=0.7,
    max_tokens=512,
    stream=False,
)

completion.choices[0].message.tool_calls

[ChatCompletionMessageToolCall(id='chatcmpl-tool-538bbb6949e642a5be7c6240c69f7caf', function=Function(arguments='{"city": "New York City"}', name='zipcode_by_city'), type='function')]

If the fine-tuning job succeeded, the inference result should be comparable to the ground truth:

In [23]:
# The ground truth answer
test_sample['tool_calls']

[{'type': 'function',
  'function': {'name': 'zipcode_by_city',
   'arguments': {'city': 'New York City'}}}]
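The two outputs use different shapes: the NIM response carries `arguments` as a JSON string, while the ground truth stores a dictionary. To compare them programmatically, parse the string first. This is a sketch of an exact-match check on function names and arguments; `tool_calls_match` is hypothetical, and exact matching is a strict criterion (for example, an extra or missing optional argument counts as a mismatch).

```python
import json
from typing import Dict, List, Tuple, Union


def _normalize(name: str, arguments: Union[str, Dict]) -> Tuple[str, str]:
    """Return (name, canonical JSON of the arguments) so that key order does not matter."""
    if isinstance(arguments, str):
        arguments = json.loads(arguments)  # NIM returns arguments as a JSON string
    return name, json.dumps(arguments, sort_keys=True)


def tool_calls_match(predicted: List[Dict], expected: List[Dict]) -> bool:
    """Exact match on function names and arguments, ignoring the order of calls.

    predicted: [{"name": ..., "arguments": "<JSON string>"}, ...]
    expected:  ground-truth items shaped like {"function": {"name": ..., "arguments": {...}}}
    """
    try:
        pred = sorted(_normalize(p["name"], p["arguments"]) for p in predicted)
    except json.JSONDecodeError:
        return False  # malformed arguments from the model count as a mismatch
    gold = sorted(_normalize(e["function"]["name"], e["function"]["arguments"]) for e in expected)
    return pred == gold


# Usage in this notebook:
# predicted = [{"name": tc.function.name, "arguments": tc.function.arguments}
#              for tc in (completion.choices[0].message.tool_calls or [])]
# print(tool_calls_match(predicted, test_sample["tool_calls"]))
```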

### 3.3 Take Note of Your Custom Model Name

Take note of your custom model name, as you will use it to run evaluations in the subsequent notebook.

In [24]:
print(f"Name of your custom model is: {CUSTOMIZED_MODEL}")

Name of your custom model is: xlam-tutorial-ns/llama-3.2-1b-xlam-run1@cust-BTkGbfifLfEAjV2THu3tas
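Instead of copying the name by hand, you can write it to a small file and read it back in the next notebook. The file name `custom_model.json` and both helpers are hypothetical; the evaluation notebook may expect the name in a different form, so adapt this to how you load configuration there.

```python
import json


def save_model_name(model_name: str, path: str = "custom_model.json") -> None:
    """Persist the customized model name so that another notebook can load it."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"customized_model": model_name}, f)


def load_model_name(path: str = "custom_model.json") -> str:
    """Read back the model name written by save_model_name."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)["customized_model"]


# Usage in this notebook:
# save_model_name(CUSTOMIZED_MODEL)
# In the next notebook:
# CUSTOMIZED_MODEL = load_model_name()
```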
