<img src="./images/DLI_Header.png" style="width: 400px;">

# 4.0 NeMo Microservices Deployment

<center><img src="./images-dli/nemo-microservices.png" style="width: 800px;"></center>


Let’s take a look at NVIDIA NeMo, the end-to-end, cloud-native solution for building, customizing, and deploying generative AI models. As enterprises increasingly adopt generative AI, they need a robust platform, or foundry, to build and manage these models. That is where NVIDIA NeMo comes into play.

- Data Curation: The NeMo framework simplifies the often complex process of data curation. It effectively extracts, deduplicates, and filters information from massive amounts of unstructured data. This sets the stage for your training process to utilize high-quality, relevant data at scale.

- Optimized Training: NeMo uses GPU resources and memory efficiently across tens of thousands of nodes by combining distributed training with advanced parallelism techniques. Splitting both the model and the training data across devices maximizes throughput and significantly reduces training time.

- Model Customization: Once your foundation models are trained, you can adapt them to a variety of tasks: add functional skills, focus on specific domains, and implement guardrails to prevent inappropriate responses. You can also continuously refine your models with techniques such as reinforcement learning from human feedback. Customization techniques supported by the platform include P-tuning, SFT (Supervised Fine-Tuning), Adapters, RLHF (Reinforcement Learning from Human Feedback), and ALiBi.

- Deployment at Scale: NeMo integrates with NVIDIA Triton Inference Server to accelerate inference, delivering high accuracy, low latency, and high throughput. It supports secure and efficient large-scale deployments and provides guardrails that align with safety and security requirements.



## 4.0 Local Lab Setup

To make this course self-contained, we created a local registry populated with the assets (containers, models, and Helm charts) used in the lab.
To reproduce the lab in your own environment, you only need minimal changes: substitute the locally hosted resources with their NGC-hosted equivalents.


In [None]:
# Get the Kubernetes control plane IP
minikube_ip=!minikube ip
minikube_ip=minikube_ip[0]
minikube_ip
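
If the cluster is not running, `minikube ip` prints an error message instead of an address, and that text would then be interpolated into every image reference below. The following is a minimal sketch of a validated lookup that also works outside IPython. The helper names `parse_ip` and `get_minikube_ip` are hypothetical and not part of the lab.

```python
import ipaddress
import subprocess


def parse_ip(raw_output: str) -> str:
    """Return the first line of `raw_output` if it is a valid IP address."""
    lines = raw_output.strip().splitlines()
    first_line = lines[0].strip() if lines else ""
    # ip_address() raises ValueError for anything that is not an IPv4/IPv6
    # literal, such as an error message from a stopped cluster.
    return str(ipaddress.ip_address(first_line))


def get_minikube_ip() -> str:
    """Run `minikube ip` and validate what it printed."""
    result = subprocess.run(
        ["minikube", "ip"], capture_output=True, text=True, check=True
    )
    return parse_ip(result.stdout)
```

Calling `minikube_ip = get_minikube_ip()` would then fail early with a `ValueError` or `CalledProcessError` rather than producing broken manifests later.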

import subprocess
import time

def wait_for_rollouts(deployments, check_interval=5):
    """
    Waits for all specified Kubernetes deployments to be successfully rolled out.

    Parameters:
    - deployments (list of tuples): List of (namespace, deployment_name) pairs.
    - check_interval (int): Time in seconds to wait between rollout status checks.

    Returns:
    - None
    """
    while True:
        all_ready = True  # Flag to track if all deployments are ready

        for namespace, deployment in deployments:
            # Run kubectl rollout status command for the current deployment
            result = subprocess.run(
                ["kubectl", "rollout", "status", f"deployment/{deployment}", "-n", namespace, "--timeout=5s"],
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                text=True
            )

            # Check if the deployment is fully rolled out
            if "successfully rolled out" not in result.stdout:
                all_ready = False  # Mark that at least one deployment is not ready
                print(f"Waiting for {deployment} in {namespace} namespace to be ready...")

        if all_ready:
            print("All deployments are ready!")
            break  # Exit the loop when all deployments are ready

        time.sleep(check_interval)  # Wait before checking again
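
The helper above polls forever if a deployment never becomes ready. As a hedged alternative, the sketch below adds an overall deadline and relies on the exit code of `kubectl rollout status` instead of matching its output text. The names `rollout_complete` and `wait_for_rollouts_with_deadline`, and the 900-second default, are assumptions made for illustration.

```python
import subprocess
import time


def rollout_complete(namespace: str, deployment: str) -> bool:
    """Return True when `kubectl rollout status` reports a finished rollout."""
    result = subprocess.run(
        ["kubectl", "rollout", "status", f"deployment/{deployment}",
         "-n", namespace, "--timeout=5s"],
        capture_output=True,
        text=True,
    )
    # kubectl exits non-zero if the rollout did not finish within --timeout.
    return result.returncode == 0


def wait_for_rollouts_with_deadline(deployments, is_ready=rollout_complete,
                                    timeout_s=900, check_interval=5,
                                    sleep=time.sleep, clock=time.monotonic):
    """Poll until every (namespace, deployment) pair is ready or time runs out."""
    deadline = clock() + timeout_s
    pending = list(deployments)
    while True:
        # Keep only the deployments that are still not ready.
        pending = [pair for pair in pending if not is_ready(*pair)]
        if not pending:
            print("All deployments are ready!")
            return
        if clock() >= deadline:
            raise TimeoutError(f"Not ready after {timeout_s}s: {pending}")
        sleep(check_interval)
```

The `is_ready`, `sleep`, and `clock` parameters exist only so the loop can be exercised without a cluster. In the lab you would call it with just the list of `(namespace, deployment)` pairs.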


### 4.0.1 Check the status of the cluster

Wait for all the deployments to finish rolling out before moving on. Run the cell below and wait until it prints:

```
All deployments are ready!

```

In [None]:
deployments = [
    ("argocd", "argocd-server"),
    ("argoworkflows", "argo-workflows-server"),
    ("milvus", "milvus-standalone"),
    ("nemo-kubernetes-operator", "nemo-kubernetes-operator-customizer-controller-manager"),
    ("volcano", "volcano-controllers"),
    ("k8s-nim-operator", "k8s-nim-operator"),
]
# Call the function to wait for all deployments to be ready
wait_for_rollouts(deployments)

### 4.0.2 Global Variables

<div class="alert alert-block alert-warning">

Enter your Git username when prompted by the cell below.
</div>

In [None]:
print("Please enter your Git username:")
git_username = input()

# Add assertions to validate the variables
assert git_username and git_username.strip(), "GitHub username cannot be empty"


In [None]:
import os
git_repo_name="llmops-nvidia"
git_base_url="github.com"
applications_base_dir = "llmops-nvidia/applications"
secrets_base_dir = "llmops-nvidia/secrets"
ingress_base_dir = "llmops-nvidia/ingress"

In [None]:
git_repo_url=f"git@{git_base_url}:{git_username}/{git_repo_name}.git" 
git_repo_url_ssh=f"ssh://git@{git_base_url}/{git_username}/{git_repo_name}.git"
commit_name=git_username
commit_user=commit_name  # the git config cells below reference $commit_user
commit_email=f"{git_username}@llmops-nvidia"
print(git_repo_url)
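
A mistyped username only surfaces much later, when ArgoCD fails to clone the repository. The sketch below builds the same two URL forms with a basic format check. The function name `build_repo_urls` is hypothetical, and the username pattern (1 to 39 alphanumeric characters or single hyphens, with no leading or trailing hyphen) is an assumption about GitHub's rules that you should adjust for other Git hosts.

```python
import re

# Assumed GitHub username rule: 1-39 alphanumerics or single hyphens,
# not starting or ending with a hyphen.
_USERNAME_RE = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9]|-(?=[A-Za-z0-9])){0,38}$")


def build_repo_urls(username: str, repo: str = "llmops-nvidia",
                    host: str = "github.com") -> dict:
    """Return the scp-style and ssh:// URLs for the user's repository."""
    username = username.strip()
    if not _USERNAME_RE.match(username):
        raise ValueError(f"{username!r} does not look like a valid Git username")
    return {
        "scp": f"git@{host}:{username}/{repo}.git",
        "ssh": f"ssh://git@{host}/{username}/{repo}.git",
    }
```

The `"scp"` entry corresponds to `git_repo_url` and the `"ssh"` entry to `git_repo_url_ssh` in the cell above.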

## 4.1 Deploy NeMo Data Store

NVIDIA NeMo Data Store simplifies storing and retrieving an ever-growing collection of models and datasets for your applications. It is well integrated into the NVIDIA NeMo ecosystem, so you can easily manage the life cycle of the models and datasets in your projects.

<center><img src="./images-dli/nemo_microservices_datastore.png" style="width: 800px;"></center>


### 4.1.1 Create Ingress YAML file

In [None]:
os.makedirs(f"{ingress_base_dir}/nemo-datastore", exist_ok=True)

In [None]:
datastore_ingress_yaml = f"""
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nemo-datastore-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
    - host: nemo-datastore.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: nemo-datastore
                port:
                  number: 3000
"""
with open(f"{ingress_base_dir}/nemo-datastore/ingress.yaml", "w") as f:
    f.write(datastore_ingress_yaml)
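
The Ingress manifests in this notebook differ only in their name, host, backend service, and port. A small template function, sketched below under the hypothetical name `render_ingress`, is one way to avoid copy-and-paste slips between them. It reproduces the manifest above and is not part of the lab code.

```python
def render_ingress(name: str, host: str, service: str, port: int) -> str:
    """Return an Ingress manifest that routes `host` to `service:port`."""
    return f"""apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {name}
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
    - host: {host}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: {service}
                port:
                  number: {port}
"""


# The Data Store ingress from this section, expressed through the helper.
manifest = render_ingress(
    "nemo-datastore-ingress", "nemo-datastore.local", "nemo-datastore", 3000
)
```

The returned string can be written to `ingress.yaml` exactly as in the cell above.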

### 4.1.2 Create Application YAML file

In [None]:
os.makedirs(f"{applications_base_dir}/nemo-datastore", exist_ok=True)

In [None]:
datastore_application_yaml = f"""
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: nemo-datastore
  namespace: argocd
spec:
  project: default
  sources:
    - chart: nemo-datastore
      repoURL: 'docker-registry.registry.svc.cluster.local/nvidia/nemo-microservices/charts'
      targetRevision: "0.4.1"
      helm:
        releaseName: nemo-datastore
        valuesObject:
          ingress:
            enabled: false
          service:
            http:
              clusterIP: ""
          replicaCount: 1
          image:
            repository: {minikube_ip}:30500/nvidia/nemo-microservices/datastore
            tag: "25.01"
          persistence:
            enabled: true
            claimName: datastore-shared-storage
            size: 70Gi
            storageClass: "standard"
            accessModes:
              - ReadWriteOnce
          external:
            rootUrl: ""
            domain: ""
          postgresql:
            enabled: false
          objectStore:
            enabled: false
            endpoint: "minio-client.minio.svc.cluster.local:9000"
            bucketName: "datastore"
            existingSecret: "existing-minio-auth-secret"
            existingSecretAccessKey: "username"
            existingSecretAccessSecret: "password"
            region: ""
            ssl: false
          jwtSecret:
            value: ""
          serviceAccount:
            create: true
            imagePullSecrets:
            - name: nvcrimagepullsecret
            name: datastore
          strategy:
            type: Recreate
          externalDatabase:
            host: "postgresql.postgresql.svc.cluster.local"
            port: 5432
            user: "nemo-user"
            database: "api-database"
            sslMode: "disable"
            existingSecret: "existing-postgres-auth-secret"
            existingSecretPasswordKey: "postgresql-admin-password"

    - repoURL: '{git_repo_url_ssh}'
      path: secrets/minio
      targetRevision: main
      directory:
        recurse: true 
    - repoURL: '{git_repo_url_ssh}'
      path: secrets/postgres
      targetRevision: main
      directory:
        recurse: true   
    - repoURL: '{git_repo_url_ssh}'
      path: secrets/nvcr
      targetRevision: main
      directory:
        recurse: true  
    - repoURL: '{git_repo_url_ssh}'
      path: ingress/nemo-datastore
      targetRevision: main
      directory:
        recurse: true  
  destination:
    server: 'https://kubernetes.default.svc'
    namespace: nemo-datastore
  syncPolicy:
    syncOptions:
    - Validate=false
    - CreateNamespace=true
    automated:
      prune: true
      selfHeal: true
      allowEmpty: false
"""
with open(f"{applications_base_dir}/nemo-datastore/app.yaml", "w") as f:
    f.write(datastore_application_yaml)
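
Each Application in this notebook combines one Helm chart source with several Git directory sources that share the same shape. As an illustration only, the hypothetical helper `render_git_sources` below generates that repeated fragment from a list of paths. The default indentation of four spaces matches the `sources:` entries above.

```python
def render_git_sources(repo_url: str, paths, revision: str = "main",
                       indent: int = 4) -> str:
    """Return the YAML list items for Git directory sources, one per path."""
    pad = " " * indent
    lines = []
    for path in paths:
        lines += [
            f"{pad}- repoURL: '{repo_url}'",
            f"{pad}  path: {path}",
            f"{pad}  targetRevision: {revision}",
            f"{pad}  directory:",
            f"{pad}    recurse: true",
        ]
    return "\n".join(lines)


# Example with a placeholder repository URL.
fragment = render_git_sources(
    "ssh://git@github.com/example-user/llmops-nvidia.git",
    ["secrets/minio", "secrets/postgres", "secrets/nvcr", "ingress/nemo-datastore"],
)
```

The fragment could be interpolated into the Application f-string after the Helm chart entry.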

### 4.1.3 Commit the added files

In [None]:
!git config --global user.email $commit_email
!git config --global user.name $commit_user
!cd llmops-nvidia/ && git add . && git commit -m "add nemo-datastore" && git push
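
Because the commands above are chained with `&&`, re-running the cell when nothing has changed makes `git commit` fail and skips `git push`, even if an earlier push did not go through. The sketch below commits only when something is staged and pushes in either case. The function name `commit_and_push` and the injectable `run` parameter are assumptions added for illustration.

```python
import subprocess


def commit_and_push(repo_dir: str, message: str, run=subprocess.run) -> bool:
    """Stage everything, commit if anything is staged, then push.

    Returns True if a commit was created, False if there was nothing to commit.
    """
    git = ["git", "-C", repo_dir]
    run(git + ["add", "."], check=True)
    # `git diff --cached --quiet` exits 0 when the index matches HEAD, 1 otherwise.
    staged = run(git + ["diff", "--cached", "--quiet"]).returncode != 0
    if staged:
        run(git + ["commit", "-m", message], check=True)
    # Push either way so commits left over from a failed push are delivered too.
    run(git + ["push"], check=True)
    return staged
```

For this section the call would be `commit_and_push("llmops-nvidia", "add nemo-datastore")`.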

### 4.1.4 Sync via UI
Now that we have committed our code to Git, ArgoCD will usually sync automatically within about 5 minutes. We can also force a sync through either the UI or the CLI.

Here we use the UI.

Click the Sync button on the `argocd-components` application, as shown in the diagram. The `nemo-datastore` application will then appear.

<img src="./images-dli/sync-apps.png" style="width: 435px; float: left">
<img src="./images-dli/nemo-datastore-app.png" style="width: 500px; float: right">
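
For the CLI route mentioned above, a minimal sketch could wrap the `argocd app sync` command. It assumes the `argocd` CLI is installed and already logged in to the lab's ArgoCD server, which this notebook does not set up. The helper names are hypothetical.

```python
import subprocess


def sync_command(app_name: str) -> list:
    """Build the `argocd app sync` command for one application."""
    return ["argocd", "app", "sync", app_name]


def sync_app(app_name: str = "argocd-components") -> None:
    """Trigger a sync; assumes the `argocd` CLI is installed and logged in."""
    subprocess.run(sync_command(app_name), check=True)
```

Calling `sync_app()` would request the same sync as the button on the `argocd-components` application.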


### 4.1.5 Wait for the nemo-datastore to run before we move forward

In [None]:
!kubectl get deployments -n nemo-datastore

In [None]:
deployments = [
    ("nemo-datastore", "nemo-datastore"),
]
# Call the function to wait for nemo-datastore deployment to be ready
wait_for_rollouts(deployments)

## 4.2 Deploy NeMo Entity Store

NeMo Entity Store is a microservice for managing platform-wide entities such as namespaces, models, and datasets within the NeMo microservices platform.


NeMo Entity Store manages the following entities.

- Namespaces: Namespaces are logical groupings of related resources such as models, datasets, and projects, within the NeMo microservices platform. They help organize resources for efficient access and management.
- Models: Models represent machine learning models available for various operations such as inference and fine-tuning. Models include foundational models, fine-tuned models, or those optimized with specific techniques such as LoRA.
- Datasets: Datasets are collections of data for machine learning tasks such as fine-tuning and evaluation.


### 4.2.1 Create Ingress YAML file

In [None]:
os.makedirs(f"{ingress_base_dir}/nemo-entity-store", exist_ok=True)

In [None]:
entity_store_ingress_yaml = f"""
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nemo-entity-store-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
    - host: nemo-entity-store.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: nemo-entity-store
                port:
                  number: 8000
"""
with open(f"{ingress_base_dir}/nemo-entity-store/ingress.yaml", "w") as f:
    f.write(entity_store_ingress_yaml)

### 4.2.2 Create Application YAML file

In [None]:
os.makedirs(f"{applications_base_dir}/nemo-entity-store", exist_ok=True)

In [None]:
entity_store_application_yaml = f"""
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: nemo-entity-store
  namespace: argocd
spec:
  project: default
  sources:
    - chart: nemo-entity-store
      repoURL: 'docker-registry.registry.svc.cluster.local/nvidia/nemo-microservices/charts'
      targetRevision: "0.1.0"
      helm:
        releaseName: nemo-entity-store
        valuesObject:
          imagePullSecrets:
            - name: nvcrimagepullsecret
          image:
            repository: {minikube_ip}:30500/nvidia/nemo-microservices/nemo-entity-store
            tag: "25.01"
          postgresql:
            enabled: false
          appConfig:
            BASE_URL_NIM: http://meta-llama3-1-8b-instruct.llama3-1-8b-instruct.svc.cluster.local:8000
            BASE_URL_DATASTORE: http://nemo-datastore.nemo-datastore.svc.cluster.local:3000/v1/hf
                        
          externalDatabase:
            host: "postgresql.postgresql.svc.cluster.local"
            port: 5432
            user: "nemo-user"
            password: "QuX2+P16SPc7"
            database: "gateway"
            sslMode: "disable"
            existingSecret: "existing-postgres-auth-secret"
            existingSecretPasswordKey: "postgresql-admin-password"
    - repoURL: '{git_repo_url_ssh}'
      path: secrets/postgres
      targetRevision: main
      directory:
        recurse: true   
    - repoURL: '{git_repo_url_ssh}'
      path: secrets/nvcr
      targetRevision: main
      directory:
        recurse: true  
    - repoURL: '{git_repo_url_ssh}'
      path: ingress/nemo-entity-store
      targetRevision: main
      directory:
        recurse: true  
  destination:
    server: 'https://kubernetes.default.svc'
    namespace: nemo-entity-store
  syncPolicy:
    syncOptions:
    - Validate=false
    - CreateNamespace=true
    automated:
      prune: true
      selfHeal: true
      allowEmpty: false
"""
with open(f"{applications_base_dir}/nemo-entity-store/app.yaml", "w") as f:
    f.write(entity_store_application_yaml)

### 4.2.3 Commit the added files

In [None]:
!git config --global user.email $commit_email
!git config --global user.name $commit_user
!cd llmops-nvidia/ && git add . && git commit -m "add nemo-entity-store" && git push

### 4.2.4 Sync via UI
Now that we have committed our code to Git, ArgoCD will usually sync automatically within about 5 minutes. We can also force a sync through either the UI or the CLI.

Here we use the UI.

Click the Sync button on the `argocd-components` application, as shown in the diagram. The `nemo-entity-store` application will then appear.

<img src="./images-dli/sync-apps.png" style="width: 435px; float: left">
<img src="./images-dli/nemo-entity-store-app.png" style="width: 500px; float: right">


### 4.2.5 Wait for the nemo-entity-store to run before we move forward

In [None]:
!kubectl get deployments -n nemo-entity-store

In [None]:
deployments = [
    ("nemo-entity-store", "nemo-entity-store"),
]
# Call the function to wait for the nemo-entity-store deployment to be ready
wait_for_rollouts(deployments)

## 4.3 LLM NIM Deployment

<center><img src="./images-dli/nim_service.png" style="width: 350px;"></center>


In [None]:
os.makedirs(f"llmops-nvidia/k8s-manifests/llama3-1-8b-instruct", exist_ok=True)

### 4.3.1 Create PVC and PV YAML file

In [None]:
nim_pvc_yaml = f"""
kind: PersistentVolume
apiVersion: v1
metadata:
  name: nim-llm-pv
  labels:
    type: local
spec:
  capacity:
    storage: "50Gi"
  storageClassName: manual
  accessModes:
    - "ReadWriteMany"
  persistentVolumeReclaimPolicy: Retain
  claimRef:
    namespace: llama3-1-8b-instruct
    name: nim-llm-pvc
  hostPath:
    path: "/inference-model-data"
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nim-llm-pvc
  labels:
    app.kubernetes.io/name: nim-llm-pvc
spec:
  accessModes:
    - "ReadWriteMany"
  storageClassName: manual
  resources:
    requests:
      storage: "50Gi"
"""
with open("llmops-nvidia/k8s-manifests/llama3-1-8b-instruct/nim-pvc.yaml", "w") as f:
    f.write(nim_pvc_yaml)
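
For the claim to bind to the pre-created volume, the requested size must not exceed the volume's capacity, the requested access modes must be offered by the volume, and the storage class names must match. The sketch below checks those three fields for a pair of simplified dictionaries. It is an illustration under stated assumptions, not the Kubernetes binding algorithm: `parse_quantity` handles only plain numbers with the common binary and decimal suffixes, and `claim_fits` ignores selectors, volume mode, and `claimRef`.

```python
import re

_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60}
_DECIMAL = {"": 1, "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18}
_QUANTITY_RE = re.compile(r"^(\d+(?:\.\d+)?)([A-Za-z]*)$")


def parse_quantity(quantity: str) -> int:
    """Convert a storage quantity such as "50Gi" to a number of bytes."""
    match = _QUANTITY_RE.match(quantity.strip())
    if not match:
        raise ValueError(f"Unsupported quantity: {quantity!r}")
    number, suffix = match.groups()
    factor = _BINARY.get(suffix, _DECIMAL.get(suffix))
    if factor is None:
        raise ValueError(f"Unknown suffix in quantity: {quantity!r}")
    return int(float(number) * factor)


def claim_fits(pv: dict, pvc: dict) -> bool:
    """Check size, access modes, and storage class of a claim against a volume."""
    return (
        parse_quantity(pvc["storage"]) <= parse_quantity(pv["storage"])
        and set(pvc["accessModes"]) <= set(pv["accessModes"])
        and pvc["storageClassName"] == pv["storageClassName"]
    )


# The values used for nim-llm-pv and nim-llm-pvc in the manifest above.
pv = {"storage": "50Gi", "accessModes": ["ReadWriteMany"], "storageClassName": "manual"}
pvc = {"storage": "50Gi", "accessModes": ["ReadWriteMany"], "storageClassName": "manual"}
```

With these values `claim_fits(pv, pvc)` holds. Raising the claim's request above the volume's capacity would leave the claim pending.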

### 4.3.2 Create NIMService YAML file

In [None]:
nim_service_yaml = f"""
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: meta-llama3-1-8b-instruct
spec:
  image:
    repository: "{minikube_ip}:30500/nvidia/nemo-microservices/llama-3.1-8b-instruct"
    tag: "1.5"
    pullPolicy: IfNotPresent
  authSecret: ngc-api
  env:
  - name: NIM_DISABLE_MODEL_DOWNLOAD
    value: "true"
  - name: NIM_IGNORE_MODEL_DOWNLOAD_FAIL 
    value: "true"
  - name: NIM_MANIFEST_PROFILE
    value: "7b8458eb682edb0d2a48b4019b098ba0bfbc4377aadeeaa11b346c63c7adf724"
  - name: NIM_MODEL_PROFILE
    value: "7b8458eb682edb0d2a48b4019b098ba0bfbc4377aadeeaa11b346c63c7adf724"
  - name: NIM_MODEL_NAME
    value: "/model-store"
  - name: NIM_PEFT_SOURCE
    value: http://nemo-entity-store.nemo-entity-store.svc.cluster.local:8000
  - name: NIM_PEFT_REFRESH_INTERVAL
    value: "30"
  storage:
    pvc:
      name: "nim-llm-pvc"
  replicas: 1
  resources:
    limits:
      nvidia.com/gpu: 1
  expose:
    service:
      type: ClusterIP
      port: 8000
"""
with open("llmops-nvidia/k8s-manifests/llama3-1-8b-instruct/nim-service.yaml", "w") as f:
    f.write(nim_service_yaml)

### 4.3.3 Create Ingress YAML file

In [None]:
os.makedirs(f"{ingress_base_dir}/llama3-1-8b-instruct", exist_ok=True)

In [None]:
nim_ingress_yaml = f"""
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: llama3-1-8b-instruct-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
    - host: llama3-1-8b-instruct.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: meta-llama3-1-8b-instruct
                port:
                  number: 8000
"""
with open(f"{ingress_base_dir}/llama3-1-8b-instruct/ingress.yaml", "w") as f:
    f.write(nim_ingress_yaml)

### 4.3.4 Create Application YAML file

In [None]:
os.makedirs(f"{applications_base_dir}/llama3-1-8b-instruct", exist_ok=True)

In [None]:
llm_nim_application_yaml = f"""
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: llama3-1-8b-instruct
  namespace: argocd
spec:
  project: default
  sources:
    - repoURL: '{git_repo_url_ssh}'
      path: secrets/nvcr
      targetRevision: main
      directory:
        recurse: true  
    - repoURL: '{git_repo_url_ssh}'
      path: k8s-manifests/llama3-1-8b-instruct
      targetRevision: main
      directory:
        recurse: true  
    - repoURL: '{git_repo_url_ssh}'
      path: ingress/llama3-1-8b-instruct
      targetRevision: main
      directory:
        recurse: true 
  destination:
    server: 'https://kubernetes.default.svc'
    namespace: llama3-1-8b-instruct
  syncPolicy:
    syncOptions:
    - Validate=false
    - CreateNamespace=true
    automated:
      prune: true
      selfHeal: true
      allowEmpty: false
"""
with open(f"{applications_base_dir}/llama3-1-8b-instruct/app.yaml", "w") as f:
    f.write(llm_nim_application_yaml)

### 4.3.5 Commit the added files

In [None]:
!git config --global user.email $commit_email
!git config --global user.name $commit_user
!cd llmops-nvidia/ && git add . && git commit -m "add llm nim llama3-1-8b-instruct" && git push

### 4.3.6 Sync via UI
Now that we have committed our code to Git, ArgoCD will usually sync automatically within about 5 minutes. We can also force a sync through either the UI or the CLI.

Here we use the UI.

Click the Sync button on the `argocd-components` application, as shown in the diagram. The `llama3-1-8b-instruct` application will then appear.

<img src="./images-dli/sync-apps.png" style="width: 435px; float: left">
<img src="./images-dli/llama3-1-8b-instruct.png" style="width: 500px; float: right">


### 4.3.7 Check applications via UI

[Open ArgoCD!](/argocd/applications)


<center><img src="./images-dli/argocd_nim.png" style="width: 1000px;"></center>


## 4.4 Deploy NeMo Evaluator

NVIDIA NeMo Evaluator is the one-stop shop for evaluating your LLMs as part of the NeMo ecosystem. It enables real-time evaluations of your LLM application through APIs, guiding developers and researchers in refining and optimizing LLMs for better performance and real-world applicability. The NeMo Evaluator APIs can be automated within development pipelines, enabling faster iterations without the need for live data. This makes it cost-effective and well suited to pre-deployment checks and regression testing.

<center><img src="./images-dli/nemo_eval.png" style="width: 800px;"></center>

    
Note: NeMo Evaluator depends on NVIDIA NIM for LLMs and NeMo Data Store.

A typical NeMo Evaluator workflow looks like the following:



1. (Optional) If you are using a custom dataset for evaluation, upload it to NeMo Data Store before you run an evaluation.

2. Create an evaluation target in NeMo Evaluator.

3. Create an evaluation configuration in NeMo Evaluator.

4. Run an evaluation job by submitting a request to NeMo Evaluator.

    - NeMo Evaluator downloads custom data, if any, from NeMo Data Store.

    - NeMo Evaluator runs inference with NIM for LLMs, Embeddings, and Reranking, depending on the model being evaluated.

    - NeMo Evaluator writes the results, including generations, logs, and metrics to NeMo Data Store.

    - NeMo Evaluator returns the results.

5. Get your results.

### 4.4.1 Create Ingress YAML file

In [None]:
os.makedirs(f"{ingress_base_dir}/nemo-evaluator", exist_ok=True)

In [None]:
evaluator_store_ingress_yaml = f"""
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nemo-evaluator-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
    - host: nemo-evaluator.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: nemo-evaluator
                port:
                  number: 7331
"""
with open(f"{ingress_base_dir}/nemo-evaluator/ingress.yaml", "w") as f:
    f.write(evaluator_store_ingress_yaml)

### 4.4.2 Create Application YAML file

In [None]:
os.makedirs(f"{applications_base_dir}/nemo-evaluator", exist_ok=True)

In [None]:
evaluator_application_yaml = f"""
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: nemo-evaluator
  namespace: argocd
spec:
  project: default
  sources:
    - chart: nemo-evaluator
      repoURL: 'docker-registry.registry.svc.cluster.local/nvidia/nemo-microservices/charts'
      targetRevision: "0.1.19"
      helm:
        releaseName: nemo-evaluator
        valuesObject:
          image:
            repository: {minikube_ip}:30500/nvidia/nemo-microservices/evaluation-ms
            tag: "25.01"
          imagePullSecrets:
            - name: nvcrimagepullsecret
          argoWorkflows:
            enabled: false
          external:
            dataStore:
              endpoint: "http://nemo-datastore.nemo-datastore.svc.cluster.local:3000/v1/hf"
            argoWorkflows:
              endpoint: "http://argo-workflows-server.argoworkflows.svc.cluster.local:2746"
            milvus:
              endpoint: "http://milvus.milvus.svc.cluster.local:19530"
            postgres:
              host: "postgresql.postgresql.svc.cluster.local"
              port: 5432
              auth:
                username: "nemo-user"
                database: "evaluation"
                existingSecret: "existing-postgres-auth-secret"
                secretKeys:
                  userPasswordKey: "postgresql-admin-password"
          milvus:
            enabled: false
          postgresql:
            enabled: false
          argoServiceAccount:
            create: true
            name: workflow-executor

    - repoURL: '{git_repo_url_ssh}'
      path: secrets/postgres
      targetRevision: main
      directory:
        recurse: true   
    - repoURL: '{git_repo_url_ssh}'
      path: secrets/nvcr
      targetRevision: main
      directory:
        recurse: true  
    - repoURL: '{git_repo_url_ssh}'
      path: ingress/nemo-evaluator
      targetRevision: main
      directory:
        recurse: true 
  destination:
    server: 'https://kubernetes.default.svc'
    namespace: nemo-evaluator
  syncPolicy:
    syncOptions:
    - Validate=false
    - CreateNamespace=true
    automated:
      prune: true
      selfHeal: true
      allowEmpty: false
"""
with open(f"{applications_base_dir}/nemo-evaluator/app.yaml", "w") as f:
    f.write(evaluator_application_yaml)

### 4.4.3 Commit the added files

In [None]:
!git config --global user.email $commit_email
!git config --global user.name $commit_user
!cd llmops-nvidia/ && git add . && git commit -m "add nemo-evaluator" && git push

### 4.4.4 Sync via UI
Now that we have committed our code to Git, ArgoCD will usually sync automatically within about 5 minutes. We can also force a sync through either the UI or the CLI.

Here we use the UI.

Click the Sync button on the `argocd-components` application, as shown in the diagram. The `nemo-evaluator` application will then appear.

<img src="./images-dli/sync-apps.png" style="width: 435px; float: left">
<img src="./images-dli/nemo-evaluator.png" style="width: 500px; float: right">


### 4.4.5 Check applications via UI

[Open ArgoCD!](/argocd/applications)

<center><img src="./images-dli/argocd_app_evaluator.png" style="width: 1000px;"></center>



## 4.5 Deploy NeMo Customizer

<center><img src="./images-dli/nemo_customizer.png" style="width: 800px;"></center>

NVIDIA NeMo Customizer lets you take state-of-the-art large language models and adapt them to your application's needs. You can balance your compute requirements against your performance needs by choosing among a range of models, model sizes, and fine-tuning techniques. NeMo Customizer creates models that integrate easily with NVIDIA NIM for LLMs.




In [None]:
os.makedirs(f"llmops-nvidia/k8s-manifests/nemo-customizer", exist_ok=True)

### 4.5.1 Create PV YAML file

In [None]:
customizer_pv_yaml = f"""
kind: PersistentVolume
apiVersion: v1
metadata:
  name: finetuning-ms-models-pv
  labels:
    type: local
spec:
  capacity:
    storage: "50Gi"
  storageClassName: manual
  accessModes:
    - "ReadWriteMany"
  persistentVolumeReclaimPolicy: Retain
  claimRef:
    namespace: nemo-customizer
    name: finetuning-ms-models-pvc
  hostPath:
    path: "/finetuning-ms-models"

"""
with open("llmops-nvidia/k8s-manifests/nemo-customizer/pv.yaml", "w") as f:
    f.write(customizer_pv_yaml)

### 4.5.2 Create Ingress YAML file

In [None]:
os.makedirs(f"{ingress_base_dir}/nemo-customizer", exist_ok=True)

In [None]:
customizer_ingress_yaml = f"""
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nemo-customizer-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
    - host: nemo-customizer.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: nemo-customizer-api 
                port:
                  number: 8000
"""
with open(f"{ingress_base_dir}/nemo-customizer/ingress.yaml", "w") as f:
    f.write(customizer_ingress_yaml)

### 4.5.3 Create Application YAML file

In [None]:
os.makedirs(f"{applications_base_dir}/nemo-customizer", exist_ok=True)

In [None]:
customizer_application_yaml = f"""
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: nemo-customizer
  namespace: argocd
spec:
  project: default
  sources:
    - chart: nemo-customizer
      repoURL: 'docker-registry.registry.svc.cluster.local/nvidia/nemo-microservices/charts'
      targetRevision: "0.9.0-alpha.22"
      helm:
        releaseName: nemo-customizer
        valuesObject:
          global:
            imagePullSecrets:
              - name: nvcrimagepullsecret
            storageClass: "standard"

          # This needs only one install per cluster
          volcano:
            enabled: false

          # Configure the PVC for models mount, where we store the parent/base models.
          modelsStorage:
            enabled: true
            storageClassName: "manual"
            size: 50Gi
            accessModes:
              - ReadWriteMany
          # Configure the PVC for a training job's workspace, where we store the checkpoints and fine-tuned model.
          workspaceStorage:
            enabled: true
            storageClassName: "standard"
            size: 50Gi
            accessModes:
              - ReadWriteMany

          postgresql:
            # Tells our Helm chart to leverage the information in externalDatabase instead.
            enabled: false
          image:
            registry: {minikube_ip}:30500
            repository: nvidia/nemo-microservices/customizer
            tag: "25.01"

          customizerConfig:
            nemoOperatorURL: "http://nemo-deployment-management.nemo-deployment-management.svc.cluster.local:8000"
            nemoDataStoreURL: "http://nemo-datastore.nemo-datastore.svc.cluster.local:3000"
            entityStoreURL: "http://nemo-entity-store.nemo-entity-store.svc.cluster.local:8000"
            mlflowURL: "http://mlflow-tracking.mlflow.svc.cluster.local:80"
            models:
              meta/llama-3.1-8b-instruct:
                # dataStoreURL is the internal K8s DNS record for the Data Store service.
                enabled: true
                model_path: llama-3_1-8b-instruct
                finetuning_types:
                  - lora
                num_gpus: 1
                micro_batch_size: 1
                tensor_parallel_size: 1
                max_seq_length: 4096
            training:
              pvc:
                storageClass: "standard"

          nemoDataStoreTools:
            registry: {minikube_ip}:30500
            repository: nvidia/nemo-microservices/nds-v2-huggingface-cli
            tag: "25.01"
            pullSecrets:
            - name: nvcrimagepullsecret

          # Configure the custom volcano scheduling queue
          customizerQueue:
            weight: 5
            capability:
              nvidiaGPU: 4
              mlnxnics: 8

          externalDatabase:
            host: postgresql.postgresql.svc.cluster.local
            port: 5432
            user: nemo-user
            database: finetuning
            existingSecret: "existing-postgres-auth-secret" 
            existingSecretPasswordKey: "postgresql-user-password" 
    - repoURL: '{git_repo_url_ssh}'
      path: k8s-manifests/nemo-customizer
      targetRevision: main
      directory:
        recurse: true  
    - repoURL: '{git_repo_url_ssh}'
      path: secrets/postgres
      targetRevision: main
      directory:
        recurse: true   
    - repoURL: '{git_repo_url_ssh}'
      path: secrets/nvcr
      targetRevision: main
      directory:
        recurse: true  
    - repoURL: '{git_repo_url_ssh}'
      path: ingress/nemo-customizer
      targetRevision: main
      directory:
        recurse: true 
  destination:
    server: 'https://kubernetes.default.svc'
    namespace: nemo-customizer
  syncPolicy:
    syncOptions:
    - Validate=false
    - CreateNamespace=true
    automated:
      prune: true
      selfHeal: true
      allowEmpty: false
"""
with open(f"{applications_base_dir}/nemo-customizer/app.yaml", "w") as f:
    f.write(customizer_application_yaml)

### 4.5.4 Commit the added files

In [None]:
!git config --global user.email $commit_email
!git config --global user.name $commit_user
!cd llmops-nvidia/ && git add . && git commit -m "add nemo-customizer" && git push

### 4.5.5 Sync via UI
Now that we have committed our code to Git, ArgoCD will usually sync automatically within about 5 minutes. We can also force a sync through either the UI or the CLI.

Here we use the UI.

Click the Sync button on the `argocd-components` application, as shown in the diagram. The `nemo-customizer` application will then appear.

<img src="./images-dli/sync-apps.png" style="width: 435px; float: left">
<img src="./images-dli/nemo-customizer-app.png" style="width: 500px; float: right">


### 4.5.6 Check applications via UI

[Open ArgoCD!](/argocd/applications)

<center><img src="./images-dli/argocd_app_customizer.png" style="width: 1000px;"></center>



## 4.6 Add FQDNs of NeMo Microservices for Ingress

### 4.6.1 Check all ingresses

Example output:
```
NAMESPACE              NAME                           CLASS   HOSTS                        ADDRESS        PORTS   AGE
llama3-1-8b-instruct   llama3-1-8b-instruct-ingress   nginx   llama3-1-8b-instruct.local   192.168.49.2   80      9m3s
minio                  minio-ingress                  nginx   minio.local                  192.168.49.2   80      63m
nemo-customizer        nemo-customizer-ingress        nginx   nemo-customizer.local        192.168.49.2   80      76s
nemo-datastore         nemo-datastore-ingress         nginx   nemo-datastore.local         192.168.49.2   80      48m
nemo-entity-store      nemo-entity-store-ingress      nginx   nemo-entity-store.local      192.168.49.2   80      12m
nemo-evaluator         nemo-evaluator-ingress         nginx   nemo-evaluator.local         192.168.49.2   80      3m32s
```

In [None]:
!kubectl get ingress -A

### 4.6.2 Add FQDN to /etc/hosts (workaround for lab)

In [None]:
!echo "$(minikube ip) minio.local" | sudo tee -a /etc/hosts
!echo "$(minikube ip) nemo-datastore.local" | sudo tee -a /etc/hosts
!echo "$(minikube ip) nemo-entity-store.local" | sudo tee -a /etc/hosts
!echo "$(minikube ip) llama3-1-8b-instruct.local" | sudo tee -a /etc/hosts
!echo "$(minikube ip) nemo-evaluator.local" | sudo tee -a /etc/hosts
!echo "$(minikube ip) nemo-customizer.local" | sudo tee -a /etc/hosts
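
Each `tee -a` call above appends a line unconditionally, so re-running the cell leaves duplicate entries in `/etc/hosts`. The sketch below computes only the entries that are still missing. The helper name `missing_host_entries` is hypothetical, and the function deliberately does not remove stale lines: a hostname already mapped to a different IP is simply reported as missing.

```python
def missing_host_entries(hosts_text: str, ip: str, hostnames) -> list:
    """Return the /etc/hosts lines still needed to map `hostnames` to `ip`."""
    mapped = set()
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split the entry on whitespace.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == ip:
            mapped.update(fields[1:])
    return [f"{ip} {name}" for name in hostnames if name not in mapped]
```

In the lab you could read `/etc/hosts`, pass its text together with the minikube IP and the six `.local` names, and append only the returned lines with `sudo tee -a /etc/hosts`.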


### 4.6.3 Check /etc/hosts

In [None]:
!cat /etc/hosts
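
Once the host names resolve, a quick request through the ingress can confirm that a service answers. The sketch below lists the models served by the LLM NIM. It assumes the NIM exposes an OpenAI-compatible `/v1/models` endpoint that returns a `data` list of objects with an `id` field. The helper names `http_get_json` and `list_model_ids` are hypothetical.

```python
import json
import urllib.request


def http_get_json(url: str, timeout: float = 10.0) -> dict:
    """Fetch `url` and decode the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def list_model_ids(base_url: str, fetch=http_get_json) -> list:
    """Return the model IDs reported by an OpenAI-compatible /v1/models endpoint."""
    payload = fetch(f"{base_url.rstrip('/')}/v1/models")
    return [model["id"] for model in payload.get("data", [])]
```

For example, `list_model_ids("http://llama3-1-8b-instruct.local")` would query the ingress host added above. A connection error at this point usually means the NIM pod is not ready yet or the `/etc/hosts` entry is missing.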

---
<h2 style="color:green;">Congratulations!</h2>

You've made it through the fourth notebook. In this notebook, you have:
- Deployed various NeMo microservices as ArgoCD applications

Next, you'll learn to fine-tune a foundation model using NeMo microservices.

Move on to [05_Fine_Tuning_Nemo_MS.ipynb](05_Fine_Tuning_Nemo_MS.ipynb)

<a href="https://www.nvidia.com/dli"> <img src="images/DLI_Header.png" alt="Header" style="width: 400px;"/> </a>