<img src="./images/DLI_Header.png" style="width: 400px;">

# 5.0 NeMo Microservices Deployment: Customization, Evaluation, and Inference Flow

The following diagram illustrates how the NeMo microservices construct a data flywheel in a scenario for model customization and evaluation.

<center><img src="./images-dli/nemo-platform-customization-workflow.png" style="width: 800px;"></center>


## 5.0 Local Lab Setup

### 5.0.1 Global Variables

In [None]:
# get the Kubernetes Controlplane IP
minikube_ip=!minikube ip
minikube_ip=minikube_ip[0]
minikube_ip

# Define endpoints
data_store_url = "http://nemo-datastore.local"
nim_url = "http://llama3-1-8b-instruct.local"
eval_url = "http://nemo-evaluator.local"
entity_store_url = "http://nemo-entity-store.local"
customizer_url = "http://nemo-customizer.local"
nim_internal_endpoint="http://meta-llama3-1-8b-instruct.llama3-1-8b-instruct.svc.cluster.local:8000"

health_check_endpoints = {
    "DataStore": f"{data_store_url}/v1/health",
    "EntityStore": f"{entity_store_url}/v1/health/ready",
    "NIM": f"{nim_url}/v1/health/ready",
    "Evaluator": f"{eval_url}/health",
    "Customizer": f"{customizer_url}/health/ready"
}

### 5.0.2 Global Functions

In [None]:
import json
import subprocess
import time
from pprint import pprint

import requests
import urllib3
from huggingface_hub import HfApi, configure_http_backend
from IPython.display import JSON

# Disable SSL warnings
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

def wait_for_rollouts(deployments, check_interval=5):
    """
    Waits for all specified Kubernetes deployments to be successfully rolled out.

    Parameters:
    - deployments (list of tuples): List of (namespace, deployment_name) pairs.
    - check_interval (int): Time in seconds to wait between rollout status checks.

    Returns:
    - None
    """
    while True:
        all_ready = True  # Flag to track if all deployments are ready

        for namespace, deployment in deployments:
            # Run kubectl rollout status command for the current deployment
            result = subprocess.run(
                ["kubectl", "rollout", "status", f"deployment/{deployment}", "-n", namespace, "--timeout=5s"],
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                text=True
            )

            # Check if the deployment is fully rolled out
            if "successfully rolled out" not in result.stdout:
                all_ready = False  # Mark that at least one deployment is not ready
                print(f"Waiting for {deployment} in {namespace} namespace to be ready...")

        if all_ready:
            print("All deployments are ready!")
            break  # Exit the loop when all deployments are ready

        time.sleep(check_interval)  # Wait before checking again
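
A possible variation on the readiness check above: instead of searching stdout for the phrase "successfully rolled out", the check could rely on the exit code, because `kubectl rollout status` exits with a non-zero code when `--timeout` elapses. The sketch below is illustrative only. The helper name `is_rolled_out` is hypothetical, and the `CompletedProcess` objects are hand-built stand-ins that mimic real `kubectl` runs.

```python
import subprocess

def is_rolled_out(result: subprocess.CompletedProcess) -> bool:
    """Return True when a `kubectl rollout status` call exited successfully."""
    return result.returncode == 0

# Hand-built results that mimic a finished rollout and a timed-out check.
ok = subprocess.CompletedProcess(
    args=["kubectl"], returncode=0,
    stdout='deployment "nemo-datastore" successfully rolled out\n', stderr="",
)
pending = subprocess.CompletedProcess(
    args=["kubectl"], returncode=1,
    stdout="", stderr="error: timed out waiting for the condition\n",
)
print(is_rolled_out(ok), is_rolled_out(pending))
```

Inside `wait_for_rollouts`, the condition `"successfully rolled out" not in result.stdout` could then be swapped for `not is_rolled_out(result)`.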


### 5.0.3 Check the status of Nemo MS deployments

In [None]:
deployments = [
    ("nemo-customizer", "nemo-customizer-api"),
    ("nemo-datastore", "nemo-datastore"),
    ("nemo-entity-store", "nemo-entity-store"),
    ("nemo-evaluator", "nemo-evaluator"),
    ("llama3-1-8b-instruct", "meta-llama3-1-8b-instruct")
]
# Call the function to wait for all deployments to be ready
wait_for_rollouts(deployments)

### 5.0.4 Check health status of Nemo Microservices (Get pods)

Example output 

```
nemo-customizer            nemo-customizer-api-5d9c8f9cb5-4wmvp                              1/1     Running     0               33m
nemo-customizer            nemo-customizer-opentelemetry-collector-5c477756f4-792jc          1/1     Running     0               5h43m
nemo-datastore             nemo-datastore-5d45d56869-v7ttg                                   1/1     Running     0               98m
nemo-entity-store          nemo-entity-store-79b56cc6c5-hvmts                                1/1     Running     0               34m
nemo-evaluator             nemo-evaluator-5564df495f-dtn5x                                   1/1     Running     0               5h44m
nemo-kubernetes-operator   nemo-kubernetes-operator-customizer-controller-manager-8fc8nrqf   2/2     Running     0               35m
```

In [None]:
!kubectl get pods -A | grep nemo

### 5.0.5 Check the health status of each NeMo microservice

The endpoints were defined above. Here we query the health endpoint of each service to confirm that all of them are healthy.

Example output

```json
{'Customizer': {'response': '{"status":"healthy"}', 'status_code': 200},
 'DataStore': {'response': '{\n'
                           '  "status": "pass",\n'
                           '  "description": "Datastore",\n'
                           '  "checks": {\n'
                           '    "cache:ping": [\n'
                           '      {\n'
                           '        "status": "pass",\n'
                           '        "time": "2025-02-18T11:46:18Z"\n'
                           '      }\n'
                           '    ],\n'
                           '    "database:ping": [\n'
                           '      {\n'
                           '        "status": "pass",\n'
                           '        "time": "2025-02-18T11:46:18Z"\n'
                           '      }\n'
                           '    ]\n'
                           '  }\n'
                           '}',
               'status_code': 200},
 'EntityStore': {'response': '{"status":"ready"}', 'status_code': 200},
 'Evaluator': {'response': '{"status":"healthy"}', 'status_code': 200},
 'NIM': {'response': '{"object":"health.response","message":"Service is '
                     'ready."}',
         'status_code': 200}}
```


In [None]:
# Check health status
health_status = {}
for name, url in health_check_endpoints.items():
    try:
        response = requests.get(url, verify=False, timeout=5)
        health_status[name] = {"status_code": response.status_code, "response": response.text}
    except requests.exceptions.RequestException as e:
        health_status[name] = {"error": str(e)}

# Print results
pprint(health_status)
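
To turn the printed dictionary into a quick pass/fail summary, a small helper can list the services that either raised an error or returned a status code other than 200. This is a minimal sketch: the function name `unhealthy_services` and the sample data are hypothetical, and the sample only mirrors the shape of `health_status` built above.

```python
def unhealthy_services(health_status: dict) -> list:
    """Return the names of services that errored or did not answer with HTTP 200."""
    return sorted(
        name
        for name, result in health_status.items()
        if "error" in result or result.get("status_code") != 200
    )

# Sample data in the same shape as `health_status` (values are made up).
sample = {
    "DataStore": {"status_code": 200, "response": '{"status":"pass"}'},
    "Evaluator": {"status_code": 503, "response": ""},
    "NIM": {"error": "connection refused"},
}
print(unhealthy_services(sample))
```

In the notebook, calling `unhealthy_services(health_status)` should return an empty list when every service is ready.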

# 5.1 Fine-Tuning

This section focuses on fine-tuning a model (`llama-3.1-8b-instruct`).

## 5.1.1 Create Global Variables

These variables define key names used in the fine-tuning process:
- ```namespace```: Defines a logical grouping (like a workspace) in which datasets and models are managed.
- ```dataset_name```: Specifies the name of the dataset being created.
- ```project_name```: Represents the project associated with this fine-tuning task.
- ```new_model_name```: Defines the name of the new fine-tuned model.


```HF_ENDPOINT```: The API endpoint for Hugging Face-compatible interactions, built from `data_store_url`. It points to the NeMo Data Store rather than the public Hugging Face Hub.

```HF_TOKEN```: A placeholder for an authentication token, required to interact with the API.


In [None]:
namespace = "default"
dataset_name="test-dataset"
project_name="example-project"
new_model_name="example-model@v2"

#Define the endpoint and token
HF_ENDPOINT = f"{data_store_url}/v1/hf"
HF_TOKEN = "token"

#### 5.1.1.1 Configure HF Backend 

The `backend_factory` function creates an HTTP session using the `requests` library.
Setting `session.verify = False` disables SSL certificate verification.


`hf_api` is an instance of HfApi, which is an API client for interacting with the Hugging Face-compatible backend.


In [None]:
def backend_factory() -> requests.Session:
    session = requests.Session()
    session.verify = False
    return session

configure_http_backend(backend_factory=backend_factory)

hf_api = HfApi(endpoint=HF_ENDPOINT, token=HF_TOKEN)


## 5.1.2 Create a Dataset in Nemo DataStore

This step invokes the NeMo Data Store API to create a dataset repository.

Expected behavior:

- A request is sent to the NeMo Data Store API.
- The dataset `test-dataset` is created in the `default` namespace.
- The dataset gets a `repository_id`, which uniquely identifies it.

Example output

```
RepoUrl('datasets/default/test-dataset', endpoint='http://nemo-datastore.local/v1/hf', repo_type='dataset', repo_id='default/test-dataset')

```

In [None]:
repo_id = f"{namespace}/{dataset_name}"
repo_type = "dataset"

hf_api.create_repo(repo_id, repo_type=repo_type)

### 5.1.2.1 Verify Datasets in DataStore

Expected output: 
```json
[
  {
    "id": "default/test-dataset",
    "name": "test-dataset",
    "created_at": "2025-02-22T23:31:52.000000001Z",
    "last_modified": "2025-02-22T23:31:52.000000001Z"
  }
]
```

In [None]:
!curl $data_store_url/v1/hf/api/datasets | jq

## 5.1.3 Upload the training, testing and validation files into the Dataset

This step involves uploading training, testing, and validation files into the Nemo Datastore. The dataset repository has already been created in the previous step (`5.1.2`), and now we are adding the actual data.

- The training, testing, and validation datasets (typically in `.jsonl` format) are stored in a local directory. Sample datasets have already been created for you in the following folders:
    - Training: `./dataset/training/`
    - Testing:  `./dataset/testing/`
    - Validation: `./dataset/validation/`
- The notebook uploads these files to the Nemo Datastore, making them available for fine-tuning.

### 5.1.3.1 Check the training dataset

```
{
  "prompt": "Who designed the Gold State Coach? Adjacent to the palace is the Royal Mews, also designed by Nash, where the royal carriages, including the Gold State Coach, are housed. This rococo gilt coach, designed by Sir William Chambers in 1760, has painted panels by G. B. Cipriani. It was first used for the State Opening of Parliament by George III in 1762 and has been used by the monarch for every coronation since George IV. It was last used for the Golden Jubilee of Elizabeth II. Also housed in the mews are the coach horses used at royal ceremonial processions. Answer: ",
  "completion": "Sir William Chambers"
}
```

In [None]:
!head -n 2 /dli/task/dataset/training/training.jsonl | jq '.'
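
Before uploading, it can help to confirm that every line of a `.jsonl` file parses as JSON and carries the two fields shown in the sample record (`prompt` and `completion`). The sketch below is an assumption-laden illustration: the function `validate_jsonl_lines` is hypothetical, it works on an in-memory list of lines, and whether NeMo Customizer requires exactly these two keys is inferred only from the sample above.

```python
import json

def validate_jsonl_lines(lines, required_keys=("prompt", "completion")):
    """Return (line_number, problem) tuples; the list is empty when all records look valid."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # Ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "record is not a JSON object"))
            continue
        missing = [key for key in required_keys if key not in record]
        if missing:
            problems.append((number, f"missing keys: {missing}"))
    return problems

# Made-up lines: one valid record, one missing a field, one that is not JSON.
sample_lines = [
    '{"prompt": "Who designed the Gold State Coach? Answer: ", "completion": "Sir William Chambers"}',
    '{"prompt": "No completion here"}',
    "not json",
]
print(validate_jsonl_lines(sample_lines))
```

To check a real file, the lines could be read with `open(path, encoding="utf-8")` and passed to the same function.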

### 5.1.3.2 Upload all the files to training, testing and validation folders

This step uploads the dataset (training, testing, and validation) to the Nemo Datastore repository using the `hf_api.upload_folder()` method.


- `folder_path`: Specifies the local path of the folder to upload (for example, the training dataset folder).
- `repo_id`: Refers to the repository identifier where the dataset is stored in the NeMo Data Store.
- `repo_type`: Defines the type of repository (`dataset` in this case).
- `path_in_repo`: Names the folder inside the repository (for example, `training`) where the uploaded files will be stored.

Sample output:

```
CommitInfo(commit_url='', commit_message='Upload folder using huggingface_hub', commit_description='', oid='4c6b40743a16139133b897bbbd6b172228ccc854', pr_url=None, repo_url=RepoUrl('', endpoint='https://huggingface.co', repo_type='model', repo_id=''), pr_revision=None, pr_num=None)
```

In [None]:
training_data_folder = "/dli/task/dataset/training"  # Path to the folder
testing_data_folder = "/dli/task/dataset/testing"  # Path to the folder
validation_data_folder = "/dli/task/dataset/validation"  # Path to the folder

# Upload the folder
hf_api.upload_folder(
    folder_path=training_data_folder,
    repo_id=repo_id,
    repo_type=repo_type,
    path_in_repo="training"
)

hf_api.upload_folder(
    folder_path=testing_data_folder,
    repo_id=repo_id,
    repo_type=repo_type,
    path_in_repo="testing"
)

CommitInfo = hf_api.upload_folder(
    folder_path=validation_data_folder,
    repo_id=repo_id,
    repo_type=repo_type,
    path_in_repo="validation"
)
CommitInfo

### 5.1.3.3 Check uploaded files

Example Output:

```json
[
  {
    "sha": "25c5ceb2d33e05176b146b0d162cb61c1852fae5",
    "size": 130,
    "path": "testing/testing.jsonl"
  },
  {
    "sha": "25d7e14bb8fafb4a1390a8b4c499093f58480589",
    "size": 131,
    "path": "training/training.jsonl"
  },
  {
    "sha": "64ace1aabee4eb2f9ddf15ddfdebbc8e28a15189",
    "size": 130,
    "path": "validation/validation.jsonl"
  }
]
```

In [None]:
!curl  "$data_store_url/v1/hf/api/datasets/$namespace/$dataset_name/tree/$CommitInfo.oid" | jq

## 5.2.4 Register Dataset in Nemo Entity Store 

After uploading the dataset to the NeMo Data Store, the next step is registering it in the NeMo Entity Store. This ensures that the dataset is officially recognized and can be accessed by other components of the system.


Example output:

```json
{
 'schema_version': '1.0',
 'id': 'dataset-7Sztzr5MuXVNwifSCBNrZk',
 'description': 'This is an example of dataset',
 'type_prefix': None,
 'namespace': 'default',
 'project': 'example-project',
 'created_at': '2025-02-18T13:53:47.921652',
 'updated_at': '2025-02-18T13:53:47.921654',
 'custom_fields': {},
 'ownership': None,
 'name': 'example-dataset',
 'version_id': 'main',
 'version_tags': [],
 'format': None,
 'files_url': 'hf://datasets/default/example-dataset'}
```

### 5.2.4.1 Register Datasets

In [None]:
url = f"{entity_store_url}/v1/datasets"

headers = { 'accept': 'application/json'}

data = {
      "name": dataset_name,
      "namespace": namespace,
      "description": "This is an example of dataset",
      "files_url": f"hf://datasets/{namespace}/{dataset_name}",
      "project": project_name
}

response=requests.request("POST", url, headers=headers, json=data, verify=False)
response_entity = response.json()
response_entity
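
The registration only works as intended when `files_url` points at the same `namespace/dataset_name` repository that was created in the Data Store. One way to keep the two in sync is to derive the payload from a single helper. This is a small sketch: `build_dataset_payload` is a hypothetical name, and the fields are the same ones used in the request above.

```python
def build_dataset_payload(name, namespace, project, description=""):
    """Build the Entity Store registration body, deriving files_url from namespace and name."""
    return {
        "name": name,
        "namespace": namespace,
        "description": description,
        "files_url": f"hf://datasets/{namespace}/{name}",
        "project": project,
    }

payload = build_dataset_payload("test-dataset", "default", "example-project")
print(payload["files_url"])
```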

### 5.2.4.2 Check Datasets in Nemo Entity Store

Example Output
```json
{
  "object": "list",
  "data": [
    {
      "created_at": "2025-02-22T23:57:31.696208",
      "updated_at": "2025-02-22T23:57:31.696209",
      "name": "test-dataset",
      "namespace": "default",
      "description": "This is an example of dataset",
      "files_url": "hf://datasets/default/test-dataset",
      "project": "example-project",
      "custom_fields": {}
    }
  ],
  "pagination": {
    "page": 1,
    "page_size": 10,
    "current_page_size": 1,
    "total_pages": 1,
    "total_results": 1
  },
  "sort": "created_at"
}
```

In [None]:
! curl $entity_store_url/v1/datasets | jq

## 5.2.5 Creating a Customization Job for llama-3.1-8b-instruct Using Nemo Customizer

This step submits a customization job to fine-tune the ```meta/llama-3.1-8b-instruct``` model using Nemo Customizer. The process involves configuring the model, specifying the dataset, setting hyperparameters, and defining training parameters.

1. Prepare the Data for Fine-Tuning
    - `config`: Specifies the base model to fine-tune (meta/llama-3.1-8b-instruct).
    - `dataset`: Defines which dataset to use for training.
        - `name`: The name of the dataset (e.g., "example-dataset").
        - `namespace`: The namespace where the dataset is stored (e.g., "default").

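2. Define Fine-Tuning Hyperparameters (the values below are illustrative; the job submitted in the code cell of this section sets its own `epochs`, `batch_size`, and `learning_rate`)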

```python
"hyperparameters": {
           "training_type": "sft",
           "finetuning_type": "lora",
           "epochs": 10,
           "batch_size": 16,
           "learning_rate": 0.0001,
           "lora": {"adapter_dim": 16}
        },
```
3. Define Model Details
    - `enabled`: "true" → Enables fine-tuning.
    - `finetuning_types`: Specifies "lora" as the fine-tuning method.
    - `max_seq_length`: Maximum sequence length (4096 tokens).
    - `micro_batch_size`: 1 → Number of samples per GPU before accumulating gradients.
    - `model_path`: Path to the base model (/mount/models/llama-3_1-8b-instruct).
    - `num_gpus`: Uses 1 GPU.
    - `num_nodes`: Uses 1 node.
    - `num_parameters`: Model size (8 billion parameters).
    - `precision`: "bf16" (Brain Floating Point 16-bit precision).
    - `tensor_parallel_size`: 1 (no tensor parallelism; the model is not split across GPUs).
    
4. Define output model name
```json
"output_model": new_model_name
```
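
The interplay between `batch_size` (hyperparameters) and `micro_batch_size`, `num_gpus`, `num_nodes`, and `tensor_parallel_size` (model details) can be sketched with the relationship commonly used in data-parallel training: global batch = micro batch × data-parallel replicas × gradient accumulation steps. This is an assumption for illustration. The notebook does not describe how NeMo Customizer schedules batches internally, and the function name below is hypothetical.

```python
def gradient_accumulation_steps(batch_size, micro_batch_size, num_gpus, num_nodes,
                                tensor_parallel_size=1):
    """Micro-batches accumulated per optimizer step, under the assumed relationship."""
    # GPUs that hold a full model replica each (tensor-parallel groups count once).
    data_parallel = (num_gpus * num_nodes) // tensor_parallel_size
    if data_parallel < 1:
        raise ValueError("not enough GPUs for the requested tensor parallel size")
    samples_per_pass = micro_batch_size * data_parallel
    if batch_size % samples_per_pass != 0:
        raise ValueError("batch_size must be a multiple of micro_batch_size * replicas")
    return batch_size // samples_per_pass

# Values taken from the job submitted in this section: batch_size=32, one GPU, one node.
steps = gradient_accumulation_steps(batch_size=32, micro_batch_size=1, num_gpus=1, num_nodes=1)
print(steps)
```

Under this assumption, a single GPU with `micro_batch_size` 1 processes one sample at a time and accumulates gradients until `batch_size` samples have been seen.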

<center><img src="./images-dli/nemo_customizer.png" style="width: 800px;"></center>



In [None]:
url = f"{customizer_url}/v1/customization/jobs"
headers = { 'accept': 'application/json'}

data = {
    "config": "meta/llama-3.1-8b-instruct",
    "dataset": {
        "name": dataset_name,
        "namespace": namespace
    },
    "hyperparameters": {
        "training_type": "sft",
        "finetuning_type": "lora",
        "epochs": 1,
        "batch_size": 32,
        "learning_rate": 0.001,
        "lora": {"adapter_dim": 16}
    },
    "project": project_name,
    "model": {
        "enabled": "true",
        "finetuning_types": ["lora"],
        "max_seq_length": 4096,
        "micro_batch_size": 1,
        "model_path": "/mount/models/llama-3_1-8b-instruct",
        "name": "meta/llama-3.1-8b-instruct",
        "num_gpus": 1,
        "num_nodes": 1,
        "num_parameters": 8000000000,
        "precision": "bf16",
        "tensor_parallel_size": 1
    },
    "ownership": {
        "created_by": "me",
        "access_policies": {
            "arbitrary": "json"
        }
    },
    "output_model": new_model_name
}

response=requests.request("POST", url, headers=headers, json=data, verify=False)
response_customization = response.json()

In [None]:
# check customization response
response_customization 

## 5.2.6  Check Customization Job Status

In this section, we will be querying the status of the customization job submitted earlier.

1. Get the Customization Job ID
    - After submitting the customization job, the response contains the job details, including the job ID.
    - `response_customization["id"]` extracts the job ID from the response.
2. The status endpoint URL is constructed using the `customizer_url` and the customization job ID.
       `http://nemo-customizer.local/v1/customization/jobs/{response_customization_id}/status`
3. After the request is submitted, the response contains details about the current status of the customization job.
    - It might return status information like "queued", "in-progress", or "completed".
    - It may also provide timestamps, error messages, or additional status information.

### 5.2.6.1 Check Customization status

In [None]:
response_customization_id = response_customization["id"]
url = f"{customizer_url}/v1/customization/jobs/{response_customization_id}/status"

headers = { 'accept': 'application/json'}

response=requests.request("GET", url, headers=headers, verify=False)
response= response.json()
JSON(response, expanded=True)

### 5.2.6.2 Check Customization pods

In [None]:
!kubectl get pods -n nemo-customizer

### 5.2.6.3 Check logs of job

In [None]:
customizer_worker_pod_name= ! kubectl get pods -n nemo-customizer | grep training-job-worker-0 | cut -d' ' -f1
!kubectl logs  {customizer_worker_pod_name[0]}  -n nemo-customizer


### 5.2.6.4 Continuously Check Customization Job Status until it completes


The example output below varies with the number of training steps you ran. It includes the training and validation loss.

```
Status: completed, Progress: 100.0%
Final Response:
{'created_at': '2025-02-23T00:07:43.991608',
 'updated_at': '2025-02-23T01:05:01.498712',
 'status': 'completed',
 'steps_completed': 230,
 'epochs_completed': 10,
 'percentage_done': 100.0,
 'best_epoch': 4,
 'train_loss': 0.18880946934223175,
 'val_loss': 0.39688706398010254,
 'metrics': {'keys': ['train_loss', 'val_loss'],
  'metrics': {'train_loss': [{'value': 3.227027177810669,
     'step': 9,
     'timestamp': '2025-02-23T00:12:21.263000'},
    {'value': 2.7157721519470215,
     'step': 19,
     'timestamp': '2025-02-23T00:14:32.493000'},
```

In [None]:
response_customization_id = response_customization["id"]
url = f"{customizer_url}/v1/customization/jobs/{response_customization_id}/status"

headers = {'accept': 'application/json'}

while True:
    response = requests.request("GET", url, headers=headers, verify=False).json()
    
    status = response.get("status")
    percentage_done = response.get("percentage_done", 0.0)
    
    print(f"Status: {status}, Progress: {percentage_done}%")

    if status in ["completed", "failed"]:
        break

    time.sleep(120)  # Wait 120 seconds (2 minutes) before checking again

print("Final Response:")
response
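
The loop above runs until the job reports `completed` or `failed`, so it never ends if the job stalls in another state. A variant with an overall time limit could look like the sketch below. It is not the notebook's implementation: the name `poll_until_done`, the default limits, and the canned responses (including the status value `running`) are hypothetical, and the status call is injected as a function so the example runs without contacting the Customizer.

```python
import time

TERMINAL_STATES = {"completed", "failed"}

def poll_until_done(fetch_status, interval_s=120, timeout_s=3600,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it reports a terminal status or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        response = fetch_status()
        if response.get("status") in TERMINAL_STATES:
            return response
        if clock() >= deadline:
            raise TimeoutError(f"Job still {response.get('status')!r} after {timeout_s} s")
        sleep(interval_s)

# Stand-in for the HTTP call: canned responses instead of a real request.
canned = iter([
    {"status": "running", "percentage_done": 40.0},
    {"status": "completed", "percentage_done": 100.0},
])
final = poll_until_done(lambda: next(canned), interval_s=0, sleep=lambda s: None)
print(final["status"])
```

In the notebook, `fetch_status` could be `lambda: requests.get(url, headers=headers, verify=False).json()`, using the `url` and `headers` defined in the cell above.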

<div class="alert alert-block alert-warning">

The customization job takes about 10 minutes to complete.
</div>

### 5.2.6.5 Check Customization metrics in mlflow
In this section, you will monitor the metrics of the customization job in MLflow, a popular machine learning lifecycle management platform. The key steps are outlined below and shown in the screenshots that follow:

1. Before you can access the customization metrics, the MLflow endpoint needs to be exposed and accessible.
2. Now that the endpoint is available, you can query metrics related to your custom model (in your case, "example-model@v2"). 
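
Besides the MLflow UI, the same loss values appear in the job status response from section 5.2.6.4, nested under `metrics` → `metrics`. The sketch below pulls one metric out as `(step, value)` pairs. The function name `loss_curve` is hypothetical, and the sample reuses two data points from the example output shown earlier, so the structure is assumed from that output rather than from an API specification.

```python
def loss_curve(status_response, key="train_loss"):
    """Return (step, value) pairs for one metric, ordered by step."""
    series = status_response.get("metrics", {}).get("metrics", {}).get(key, [])
    return sorted((point["step"], point["value"]) for point in series)

# Two points copied from the example status output, listed out of order on purpose.
sample_status = {
    "status": "completed",
    "metrics": {
        "keys": ["train_loss", "val_loss"],
        "metrics": {
            "train_loss": [
                {"value": 2.7157721519470215, "step": 19, "timestamp": "2025-02-23T00:14:32.493000"},
                {"value": 3.227027177810669, "step": 9, "timestamp": "2025-02-23T00:12:21.263000"},
            ]
        },
    },
}
curve = loss_curve(sample_status)
print(curve)
```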

<img src="./images-dli/mlflow-model.png" style="width: 435px; float: left">
<img src="./images-dli/mlflow-cust.png" style="width: 500px; float: right">


In [None]:
import subprocess

subprocess.Popen(
    ["kubectl", "-n", "mlflow", "port-forward", "--address", "0.0.0.0", "service/mlflow-tracking", "30090:80"],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
    close_fds=True
)

In [None]:
%%js
const href = window.location.hostname;
let a = document.createElement('a');
let link = document.createTextNode('Open MLFlow UI!');
a.appendChild(link);
a.href = "http://" + href + "/mlflow/";
a.style.color = "navy"
a.target = "_blank"
element.append(a);

## 5.2.7 Check the NeMo Entity Store Models API to list the new LoRA adapter model

This step queries the NeMo Entity Store API to verify that the newly created LoRA adapter model has been registered and is available in the store. The NeMo Entity Store acts as a metadata store where models, datasets, and other ML-related artifacts are tracked.

The model appears with the name `example-model@v2`.

Example output:

```json
{'object': 'list',
 'data': [{'created_at': '2025-02-23T00:07:44.036516',
   'updated_at': '2025-02-23T00:07:44.036518',
   'name': 'example-model@v2',
   'namespace': 'default',
   'description': 'None',
   'spec': {'num_parameters': 8000000000,
    'context_size': 4096,
    'num_virtual_tokens': 0,
    'is_chat': False},
   'artifact': {'gpu_arch': 'Ampere',
    'precision': 'bf16',
    'tensor_parallelism': 1,
    'backend_engine': 'nemo',
    'status': 'created',
    'files_url': 'hf://default/example-model@v2'},
   'base_model': 'meta/llama-3.1-8b-instruct',
   'peft': {'finetuning_type': 'lora'},
   'schema_version': '1.0',
   'project': 'customizer',
   'custom_fields': {}}],
 'pagination': {'page': 1,
  'page_size': 10,
  'current_page_size': 1,
  'total_pages': 1,
  'total_results': 1},
 'sort': 'created_at'}
```


In [None]:
## Nemo Entity Store endpoint

url = f"{entity_store_url}/v1/models"

headers = { 'accept': 'application/json'}

response=requests.request("GET", url, headers=headers, verify=False)
response_entity = response.json()
JSON(response_entity, expanded=True)

## 5.2.8 Check the NIM Models API to list the new LoRA adapter model
This step uses the NIM Models API to check whether the newly fine-tuned LoRA adapter model is registered and available for deployment.

The new LoRA adapter (`example-model@v2`) appears in the NIM API response:

```
{'object': 'list',
 'data': [{'id': 'meta/llama-3.1-8b-instruct',
   'object': 'model',
   'created': 1740273131,
   'owned_by': 'system',
   'root': 'meta/llama-3.1-8b-instruct',
   'parent': None,
   'max_model_len': 131072,
   'permission': [{'id': 'modelperm-37abc8fbdf7c4f93a28965c8b687820d',
     'object': 'model_permission',
     'created': 1740273131,
     'allow_create_engine': False,
     'allow_sampling': True,
     'allow_logprobs': True,
     'allow_search_indices': False,
     'allow_view': True,
     'allow_fine_tuning': False,
     'organization': '*',
     'group': None,
     'is_blocking': False}]},
  {'id': 'example-model@v2',
   'object': 'model',
   'created': 1740273131,
   'owned_by': 'system',
   'root': 'hf://default/example-model@v2',
   'parent': 'meta/llama-3.1-8b-instruct',
   'max_model_len': None,
   'permission': [{'id': 'modelperm-e01b3c55f1664c0bb73862c743ce5257',
     'object': 'model_permission',
     'created': 1740273131,
     'allow_create_engine': False,
     'allow_sampling': True,
     'allow_logprobs': True,
     'allow_search_indices': False,
     'allow_view': True,
     'allow_fine_tuning': False,
     'organization': '*',
     'group': None,
     'is_blocking': False}]}]}
```


In [None]:
## NIM endpoint
url = f"{nim_url}/v1/models"

headers = { 'accept': 'application/json'}

response=requests.request("GET", url, headers=headers, verify=False)
response_nim = response.json()
JSON(response_nim, expanded=True)
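
In the response above, a LoRA adapter is recognizable by its `parent` field, which names the base model. A short helper can list the adapters attached to a given base model. This is a sketch under that reading of the example output. The function name `adapters_for` is hypothetical, and the sample keeps only the fields needed for the check.

```python
def adapters_for(models_response, base_model):
    """Return the ids of models whose parent is the given base model."""
    return [
        model["id"]
        for model in models_response.get("data", [])
        if model.get("parent") == base_model
    ]

# Trimmed-down version of the example NIM response.
sample_models = {
    "object": "list",
    "data": [
        {"id": "meta/llama-3.1-8b-instruct", "parent": None},
        {"id": "example-model@v2", "parent": "meta/llama-3.1-8b-instruct"},
    ],
}
print(adapters_for(sample_models, "meta/llama-3.1-8b-instruct"))
```

In the notebook, `new_model_name in adapters_for(response_nim, "meta/llama-3.1-8b-instruct")` would confirm that the adapter is being served.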


## 5.2.9 Evaluation using NeMo Evaluator
This step describes the NeMo Evaluator workflow, which assesses the performance of models (such as fine-tuned LLMs) by running inference, computing metrics, and storing evaluation results.
A typical NeMo Evaluator workflow looks like the following:

1. (Optional) If you are using a custom dataset for evaluation, upload it to NeMo Data Store before you run an evaluation.
2. Create an evaluation target in NeMo Evaluator.
    - The evaluation target specifies which model is being evaluated.
    - This is typically the fine-tuned LoRA adapter model or another LLM in NeMo Entity Store.

3. Create an evaluation configuration in NeMo Evaluator. This step defines the evaluation parameters, including:
    - Dataset to use for evaluation
    - Evaluation metrics (e.g., BLEU, ROUGE, perplexity)
    - Evaluation mode (e.g., zero-shot, fine-tuned performance)

4. Run an evaluation job by submitting a request to NeMo Evaluator.
    - NeMo Evaluator downloads custom data, if any, from NeMo Data Store.
    - NeMo Evaluator runs inference with NIM for LLMs, Embeddings, and Reranking, depending on the model being evaluated.
    - NeMo Evaluator writes the results, including generations, logs, and metrics to NeMo Data Store.
    - NeMo Evaluator returns the results.
5. Get the results.

<center><img src="./images-dli/nemo_eval.png" style="width: 800px;"></center>


### 5.2.9.1 Create Evaluation Target

- The evaluation target specifies which model is being evaluated.
- This is typically the fine-tuned LoRA adapter model or another LLM in NeMo Entity Store.

Example output: 
```
{'namespace': '-',
 'name': 'eval-target-Py6Yb7aY46T22PktdDRY1V',
 'type': 'model',
 'model': {'api_endpoint': {'url': 'http://meta-llama3-1-8b-instruct.llama3-1-8b-instruct.svc.cluster.local:8000/v1/completions',
   'model_id': 'example-model@v2',
   'api_key': None},
  'cached_outputs': None},
 'retriever': None,
 'rag': None,
 'tags': None,
 'id': 'eval-target-Py6Yb7aY46T22PktdDRY1V'}
```


In [None]:
url = f"{eval_url}/v1/evaluation/targets"
print(url)
headers = { 'accept': 'application/json'}

data = {
      "type": "model",
       "model": {
            "api_endpoint": {
                "url": f"{nim_internal_endpoint}/v1/completions",
                "model_id": new_model_name
            }
        }
}

response=requests.request("POST", url, headers=headers, json=data, verify=False)
response_eval_target = response.json()
JSON(response_eval_target, expanded=True)

### 5.2.9.2 Create Evaluation Config

This step defines the evaluation parameters, including:
- Dataset to use for evaluation
- Evaluation metrics (e.g., BLEU, ROUGE, perplexity)
- Evaluation mode (e.g., zero-shot, fine-tuned performance)

Example output:

```
{'id': 'eval-config-4mg8tWWJuQMHrC9ozHi7Sg',
 'namespace': '-',
 'name': 'eval-config-4mg8tWWJuQMHrC9ozHi7Sg',
 'type': 'similarity_metrics',
 'tags': [],
 'tasks': [{'type': 'default',
   'params': {'tokens_to_generate': 200,
    'temperature': 0.7,
    'top_k': 20,
    'n_samples': -1},
   'dataset': {'files_url': 'nds:default/test-dataset/testing/testing.jsonl'},
   'metrics': [{'name': 'accuracy'},
    {'name': 'bleu'},
    {'name': 'rouge'},
    {'name': 'em'},
    {'name': 'f1'}]}]}
```

In [None]:
url = f"{eval_url}/v1/evaluation/configs"
headers = { 'accept': 'application/json'}

data = {
      "type": "similarity_metrics",
      "tasks": [
         {
            "type": "default",
            "dataset": {
               "files_url": f"nds:{namespace}/{dataset_name}/testing/testing.jsonl"
            },
            "metrics": [
               {
                  "name": "accuracy"
               },
               {
                  "name": "bleu"
               },
               {
                  "name": "rouge"
               },
               {
                  "name": "em"
               },
               {
                  "name": "f1"
               }
            ],
            "params": {
               "tokens_to_generate": 200,
               "temperature": 0.7,
               "top_k": 20,
               "n_samples": -1
            }
         }
      ]
}

response=requests.request("POST", url, headers=headers, json=data, verify=False)
response_eval_config = response.json()
JSON(response_eval_config, expanded=True)
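
To give a feel for two of the metrics requested above, `em` (exact match) and `f1`, the sketch below follows the widely used question-answering convention: both strings are normalized (lowercased, punctuation and articles removed), exact match compares them directly, and F1 is computed over shared tokens. This is an assumption for illustration; NeMo Evaluator's own normalization and scoring may differ, and all function names here are hypothetical.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, reference):
    """1.0 when the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))

def token_f1(prediction, reference):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Sir William Chambers", "sir william chambers."))
print(token_f1("William Chambers", "Sir William Chambers"))
```

A partially correct answer such as "William Chambers" scores 0 on exact match but still earns partial credit on F1, which is why the two metrics are often reported together.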

### 5.2.9.3 Create Evaluation

Now we combine the target and configuration created above to run the evaluation. The evaluation creates a sequence of jobs.

Example output: 

```
{'namespace': '-',
 'name': 'eval-YRmL6ZEGELomrZXCdJ3W7K',
 'tags': None,
 'id': 'eval-YRmL6ZEGELomrZXCdJ3W7K',
 'target': {'namespace': '-',
  'name': 'eval-target-EmThWsr9vTeEdJmYyVduAq',
  'type': 'model',
  'model': {'api_endpoint': {'url': 'http://meta-llama3-1-8b-instruct.llama3-1-8b-instruct.svc.cluster.local:8000/v1/completions',
    'model_id': 'example-model@v2',
    'api_key': None},
   'cached_outputs': None},
  'retriever': None,
  'rag': None,
  'tags': None,
  'id': 'eval-target-EmThWsr9vTeEdJmYyVduAq'},
 'config': {'id': 'eval-config-4mg8tWWJuQMHrC9ozHi7Sg',
  'namespace': '-',
  'name': 'eval-config-4mg8tWWJuQMHrC9ozHi7Sg',
  'type': 'similarity_metrics',
  'tags': [],
  'params': None,
  'tasks': [{'type': 'default',
    'params': {'tokens_to_generate': 200,
     'temperature': 0.7,
     'top_k': 20,
     'n_samples': -1},
    'dataset': {'files_url': 'nds:default/test-dataset/testing/testing.jsonl',
     'format': None},
    'metrics': [{'name': 'accuracy', 'params': None},
     {'name': 'bleu', 'params': None},
     {'name': 'rouge', 'params': None},
     {'name': 'em', 'params': None},
     {'name': 'f1', 'params': None}]}],
  'aggregate_metrics': None},
 'status': 'initializing',
 'created_at': '2025-02-23T01:17:25Z'}
```

In [None]:
url = f"{eval_url}/v1/evaluation/jobs"
headers = { 'accept': 'application/json'}

data = {
      "target_id": response_eval_target["id"],
      "config_id": response_eval_config["id"]
}

response=requests.request("POST", url, headers=headers, json=data, verify=False)
response_eval = response.json()
JSON(response_eval, expanded=True)

## 5.2.10 Check Evaluation Status
After submitting an evaluation job in NeMo Evaluator, we can track its status.
This step checks that the evaluation is running and, once it finishes, retrieves the final results.

1. Once the evaluation request is submitted, we extract the evaluation job ID from the response.
2. We send a GET request to NeMo Evaluator's API to retrieve the job status.

The status looks similar to the following:
```
'status': {'name': 'evaluation',
  'level': 'evaluation',
  'status': 'running',
  'message': None,
  'jobs': [],
  'children': []},
 'created_at': '2025-02-23T01:17:25Z',
 'results': []}
```


In [None]:
evaluation_id = response_eval["id"]

url = f"{eval_url}/v1/evaluation/jobs/-/{evaluation_id}"
headers = { 'accept': 'application/json'}

response=requests.request("GET", url, headers=headers, verify=False)

response_eval_status = response.json()
response_eval_status
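
Judging by the example outputs in this notebook, the `status` field has two shapes: a plain string in the job creation response (`'initializing'`) and a nested object in the status response (`{'status': 'running', ...}`). A small helper can read either shape. This is a sketch based only on those two examples. The name `evaluation_state` is hypothetical, and the samples are trimmed copies of the outputs shown above.

```python
def evaluation_state(job):
    """Return the status string from either response shape seen in this notebook."""
    status = job.get("status")
    if isinstance(status, dict):
        return status.get("status")
    return status

# Trimmed copies of the two example responses.
created = {"id": "eval-YRmL6ZEGELomrZXCdJ3W7K", "status": "initializing"}
running = {
    "status": {"name": "evaluation", "level": "evaluation", "status": "running",
               "message": None, "jobs": [], "children": []},
    "results": [],
}
print(evaluation_state(created), evaluation_state(running))
```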

### 5.2.10.1 Check Evaluation Status in Argo Workflows

Since NeMo Evaluator uses Argo Workflows (a Kubernetes-native workflow orchestrator), we can also check the evaluation status directly in Argo.




In [None]:
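import subprocess

# Forward port 31091 on all local interfaces to the Argo Workflows server (service port 2746).
# Popen starts kubectl in the background, so this cell returns immediately.
subprocess.Popen(
    ["kubectl", "-n", "argoworkflows", "port-forward", "--address", "0.0.0.0", "service/argo-workflows-server", "31091:2746"],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
    close_fds=True
)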
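
Because `subprocess.Popen` returns before the port-forward is established, the UI link may not respond right away. The sketch below waits until the forwarded port accepts TCP connections. `wait_for_port` is a hypothetical helper, not part of the lab; it only confirms that something is listening on the port, not that the Argo Workflows server is healthy.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a process is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For the port-forward above, the call would be `wait_for_port("127.0.0.1", 31091)`.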

In [None]:
%%js
const href = window.location.hostname;
let a = document.createElement('a');
let link = document.createTextNode('Open Argo Workflows UI!');
a.appendChild(link);
a.href = "http://" + href + "/";
a.style.color = "navy"
a.target = "_blank"
element.append(a);

#### Argo Workflows UI

Once you open the Argo Workflows UI:
1. Click the workflows option in the side menu.
2. Clear the namespace selection to see the workflows run by NeMo Evaluator.
    
<img src="./images-dli/argoworkflows-ui.png" style="width: 250px; float: left">
<img src="./images-dli/argoworkflows-ui-2.png" style="width: 700px; float: right">

#### Evaluator Workflow in Argo Workflows
1. Click the `eval-commands-..` workflow.
2. This opens the sequential/DAG graph of the workflow run by NeMo Evaluator.
3. Click any component to see the details and logs for that step.

<img src="./images-dli/eval-workflow.png" style="width: 430px; float: left">
<img src="./images-dli/eval-workflow-logs.png" style="width: 500px; float: right">
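
The same information can be read without the UI by listing the Argo `Workflow` resources with `kubectl`. The sketch below is one way to do that, assuming the notebook environment has `kubectl` access to the cluster (as the earlier cells do). The helper names are hypothetical. `status.phase` is the field Argo Workflows uses to report values such as `Running`, `Succeeded`, or `Failed`.

```python
import json
import subprocess


def summarize_workflows(workflow_list):
    """Return (namespace, name, phase) for each item in a kubectl JSON list."""
    rows = []
    for item in workflow_list.get("items", []):
        metadata = item.get("metadata", {})
        # Workflows that have not started yet may have no status block
        phase = item.get("status", {}).get("phase", "Unknown")
        rows.append((metadata.get("namespace"), metadata.get("name"), phase))
    return rows


def list_workflows():
    """Query every namespace for Argo Workflow resources (requires kubectl access)."""
    result = subprocess.run(
        ["kubectl", "get", "workflows", "--all-namespaces", "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    return summarize_workflows(json.loads(result.stdout))
```

Calling `list_workflows()` searches all namespaces, which mirrors clearing the namespace selection in the UI.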


## 5.2.11 Perform Inference
After training and evaluating the new LoRA-adapted model, the next step is to perform inference with it.
This involves sending a request to the NIM API. Here we compare the responses from the LoRA-adapted model and the original base model.

Note: The model is not fully fine-tuned, because the customization job ran for only 2 epochs.

### 5.2.11.1 Using the New Model

The response should be `1714`, which matches the answer in the testing dataset.

```
{'id': 'cmpl-b9d840aa5340443287d708dacea0151b',
 'object': 'text_completion',
 'created': 1740273275,
 'model': 'example-model@v2',
 'choices': [{'index': 0,
   'text': '1714',
   'logprobs': None,
   'finish_reason': 'stop',
   'stop_reason': 128001,
   'prompt_logprobs': None}],
 'usage': {'prompt_tokens': 57, 'total_tokens': 60, 'completion_tokens': 3}}
```

In [None]:
prompt = "When was the war of Spanish Succession? The decline of Catalan continued in the 16th and 17th centuries. The Catalan defeat in the War of Spanish Succession (1714) initiated a series of measures imposing the use of Spanish in legal documentation. Answer: "
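data = {
  "model": new_model_name,
  "prompt": prompt,
  "temperature": 1.0,
  # top_k=1 keeps only the most likely token at each step, so the output is effectively deterministic
  "nvext": {"top_k": 1,
            "top_p": 0.0
           },
  "max_tokens": 100,
}

headers = {'accept': 'application/json', 'Content-Type': 'application/json'}

# Send the completion request to the NIM endpoint using the customized model name
llm_response = requests.post(f"{nim_url}/v1/completions", headers=headers, json=data, verify=False)
# See LLM response
JSON(llm_response.json(), expanded=True)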

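### 5.2.11.2 Using the Base Foundational Model

The answer here also starts with 1714, but the base model keeps generating: it repeats the same sentence until it reaches the `max_tokens` limit of 100 (`finish_reason` is `length`), whereas the LoRA-adapted model stopped after 3 completion tokens.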

```
{'id': 'cmpl-11b728544c0b4860a00086efe2775990',
 'object': 'text_completion',
 'created': 1740273343,
 'model': 'meta/llama-3.1-8b-instruct',
 'choices': [{'index': 0,
   'text': '1714. The War of the Spanish Succession (1701-1714) was a global conflict that involved many European powers. The War of the Spanish Succession (1701-1714) was a global conflict that involved many European powers. The War of the Spanish Succession (1701-1714) was a global conflict that involved many European powers. The War of the Spanish Succession (1701-1714) was a global conflict that involved many European powers. The',
   'logprobs': None,
   'finish_reason': 'length',
   'stop_reason': None,
   'prompt_logprobs': None}],
 'usage': {'prompt_tokens': 57, 'total_tokens': 157, 'completion_tokens': 100}}

```

In [None]:
prompt = "When was the war of Spanish Succession? The decline of Catalan continued in the 16th and 17th centuries. The Catalan defeat in the War of Spanish Succession (1714) initiated a series of measures imposing the use of Spanish in legal documentation. Answer: "
data = {
  "model": "meta/llama-3.1-8b-instruct",
  "prompt": prompt,
  "temperature": 1.0,
  "nvext": {"top_k": 1,
            "top_p": 0.0
           },
  "max_tokens": 100,
}

headers = {'accept': 'application/json', 'Content-Type': 'application/json'}

# Same request as before, but addressed to the base model instead of the LoRA-adapted one
llm_response = requests.post(f"{nim_url}/v1/completions", headers=headers, json=data, verify=False)
# See LLM response
JSON(llm_response.json(), expanded=True)
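
To compare the two responses side by side, it can help to pull out only the fields that differ. The sketch below uses a hypothetical helper, `summarize_completion`, and relies only on field names visible in the example outputs in this section. The two sample dictionaries are trimmed copies of those outputs, not new results.

```python
def summarize_completion(response_json):
    """Pick out the fields that differ most between the two responses."""
    choice = response_json["choices"][0]
    return {
        "model": response_json["model"],
        "finish_reason": choice["finish_reason"],
        "completion_tokens": response_json["usage"]["completion_tokens"],
        "text_preview": choice["text"][:40],
    }


# Trimmed copies of the two example outputs shown in this section
lora_response = {
    "model": "example-model@v2",
    "choices": [{"text": "1714", "finish_reason": "stop"}],
    "usage": {"prompt_tokens": 57, "total_tokens": 60, "completion_tokens": 3},
}
base_response = {
    "model": "meta/llama-3.1-8b-instruct",
    "choices": [{"text": "1714. The War of the Spanish Succession (1701-1714) was a global conflict",
                 "finish_reason": "length"}],
    "usage": {"prompt_tokens": 57, "total_tokens": 157, "completion_tokens": 100},
}

for response in (lora_response, base_response):
    print(summarize_completion(response))
```

In the notebook itself, both cells assign to `llm_response`, so the first result would need to be saved under a different name before running the second cell.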


---
<h2 style="color:green;">Congratulations!</h2>

You've made it through the fifth notebook. In this notebook, you have:
- Checked the health status of all the NeMo microservices endpoints.
- Run through the E2E fine-tuning pipeline using all the NeMo microservices:
    - Created a dataset store in NeMo Datastore.
    - Added training, test, and validation files to the dataset.
    - Created a customization job via NeMo Customizer on a foundational model.
    - Observed the customization job metrics in MLflow.
    - Created an evaluation job for the fine-tuned/customized model via NeMo Evaluator.
    - Executed inference on both the fine-tuned model and the base foundational model.

Next, you'll learn to run an automated E2E fine-tuning pipeline using Argo Workflows in [06_Fine_Tuning_Automation_Pipelines.ipynb](06_Fine_Tuning_Automation_Pipelines.ipynb)


<a href="https://www.nvidia.com/dli"> <img src="images/DLI_Header.png" alt="Header" style="width: 400px;"/> </a>