# Download the Models

The first step in this demo is to download the models you will use. You will
download two models:

- A Large Language Model (LLM) to support the agent's decisions as well
  as the chatbot's responses.
- An Embeddings model to embed the user's queries so the agent can retrieve the
  most relevant information from the vector store.

To this end, you will need the following:

- An NGC API Key, to download the Embeddings model.
- A Hugging Face API Key, to download the LLM. The account behind the key needs
  access to the gated `meta-llama/Llama-2-7b-chat-hf` repository.

In [None]:
import base64
import getpass
import subprocess

In [None]:
NGC_API_KEY = getpass.getpass("Enter your NGC API key: ")

In [None]:
HF_API_KEY = getpass.getpass("Enter your Hugging Face API key: ")
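
Every cell that follows shells out to `kubectl`, so it can help to confirm the
tool is on your `PATH` before continuing. The snippet below is a minimal sketch;
the helper name `require_cli` is hypothetical and not part of the original
notebook.

```python
import shutil


def require_cli(name):
    """Return the full path of a command-line tool, or raise if it is not on PATH."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(f"{name!r} was not found on PATH")
    return path


# In the notebook you would call, for example:
#   require_cli("kubectl")
```

Failing early here gives a clearer message than a `FileNotFoundError` raised
from a later `subprocess.run` call.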

Next, you need to create:

- An NGC CLI secret to store the NGC API Key. You will use this secret to
  download the Embeddings model.
- A Hugging Face CLI secret to store the Hugging Face API Key. You will use
  this secret to download the LLM.
- An image pull secret to download the NVIDIA-specific containers you will use
  throughout this tutorial.

In [None]:
ngc_cli_secret = """
apiVersion: v1
kind: Secret
metadata:
  name: ngc-cli-secret
type: Opaque
data:
  NGC_CLI_API_KEY: {0}
""".format(base64.b64encode(NGC_API_KEY.encode()).decode())

with open("ngc-cli-secret.yaml", "w") as f:
    f.write(ngc_cli_secret)

# check=True raises an error if kubectl fails instead of continuing silently.
subprocess.run(["kubectl", "apply", "-f", "ngc-cli-secret.yaml"], check=True)

In [None]:
hf_cli_secret = """apiVersion: v1
kind: Secret
metadata:
  name: hf-cli-secret
type: Opaque
data:
  HF_CLI_API_KEY: {0}
""".format(base64.b64encode(HF_API_KEY.encode()).decode())

with open("hf-cli-secret.yaml", "w") as f:
    f.write(hf_cli_secret)

# check=True raises an error if kubectl fails instead of continuing silently.
subprocess.run(["kubectl", "apply", "-f", "hf-cli-secret.yaml"], check=True)

In [None]:
NVCR_SECRET = """
{{"auths":{{"nvcr.io":{{"username":"$oauthtoken","password":"{0}"}}}}}}
"""

ngc_secret = """apiVersion: v1
kind: Secret
metadata:
  name: ngc-secret
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: {0}
""".format(base64.b64encode(NVCR_SECRET.format(NGC_API_KEY).encode()).decode())

with open("ngc-secret.yaml", "w") as f:
    f.write(ngc_secret)

# check=True raises an error if kubectl fails instead of continuing silently.
subprocess.run(["kubectl", "apply", "-f", "ngc-secret.yaml"], check=True)
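
The three secrets above all rely on the same idea: values under `data` in a
Kubernetes Secret must be base64-encoded, and the image pull secret wraps the
key in a small JSON document first. The sketch below illustrates that encoding
with a round trip. The helper names `b64` and `docker_config_json` and the
placeholder key are assumptions for illustration, not part of the original
notebook.

```python
import base64
import json


def b64(text):
    """Base64-encode a string the way the `data` field of a Secret expects."""
    return base64.b64encode(text.encode()).decode()


def docker_config_json(api_key, registry="nvcr.io"):
    """Build the registry auth document stored under `.dockerconfigjson`."""
    return json.dumps(
        {"auths": {registry: {"username": "$oauthtoken", "password": api_key}}}
    )


example_key = "example-key"  # placeholder, not a real NGC API key
encoded = b64(docker_config_json(example_key))

# Decode again to confirm the payload survives the round trip.
decoded = json.loads(base64.b64decode(encoded))
print(decoded["auths"]["nvcr.io"]["username"])
```

Building the JSON with `json.dumps` avoids the doubled braces that
`str.format` requires and escapes any special characters in the key.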

Next, you will create a PersistentVolumeClaim (PVC) to store the models you
download. The manifest requests 500Gi of `ReadWriteMany` storage from the
`dataplatform` storage class, and the containers in this tutorial mount this PVC.

In [None]:
model_repository = """
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-repository
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 500Gi
  storageClassName: dataplatform
"""

with open("model-repository.yaml", "w") as f:
    f.write(model_repository)

# check=True raises an error if kubectl fails instead of continuing silently.
subprocess.run(["kubectl", "apply", "-f", "model-repository.yaml"], check=True)
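
Before relying on the PVC, you may want to confirm that the cluster has bound
it to a volume. The following is a minimal polling sketch; the helper names
`pvc_phase` and `wait_for_bound` and the timeout values are assumptions. Note
that some storage classes only bind a claim once a Pod mounts it, in which case
the phase stays `Pending` until the download Jobs start.

```python
import subprocess
import time


def pvc_phase(name):
    """Ask kubectl for the current phase of a PVC (for example Pending or Bound)."""
    result = subprocess.run(
        ["kubectl", "get", "pvc", name, "-o", "jsonpath={.status.phase}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def wait_for_bound(name, get_phase=pvc_phase, timeout_s=120, poll_s=5):
    """Poll until the PVC reports Bound; return False if the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if get_phase(name) == "Bound":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


# In the notebook you would call, for example:
#   wait_for_bound("model-repository")
```

The `get_phase` parameter exists only so the polling logic can be exercised
without a cluster.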

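With the PVC in place, create two Jobs that download the models into it: one
uses the NGC CLI to fetch the Embeddings model, and the other uses the Hugging
Face CLI to fetch the LLM.

In [None]: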
ngc_downloader_job = """apiVersion: batch/v1
kind: Job
metadata:
  name: ngc-downloader
spec:
  template:
    spec:
      containers:
      - name: ngc-downloader
        image: nvcr.io/ohlfw0olaadg/ea-participants/ngc-cli:v3.41.2
        command:
        - "/bin/sh"
        - "-c"
        - |
          set -e  # stop at the first failing command so a failed download fails the Job
          echo "Creating directory /mnt/model-repo/model-store"
          mkdir -p /mnt/model-repo/model-store
          echo "Downloading embeddings model..."
          ngc registry model download-version --dest /mnt/model-repo/model-store ohlfw0olaadg/ea-participants/nv-embed-qa:4
          echo "Model downloaded successfully."
        volumeMounts:
        - name: model-volume
          mountPath: /mnt/model-repo
        env:
        - name: NGC_CLI_API_KEY
          valueFrom:
            secretKeyRef:
              name: ngc-cli-secret
              key: NGC_CLI_API_KEY
        - name: NGC_CLI_ORG
          value: "nemo-microservice (ohlfw0olaadg)"
        securityContext:
          runAsUser: 0
      restartPolicy: Never
      imagePullSecrets:
      - name: ngc-secret
      volumes:
      - name: model-volume
        persistentVolumeClaim:
          claimName: model-repository
"""

with open("ngc-downloader-job.yaml", "w") as f:
    f.write(ngc_downloader_job)

# check=True raises an error if kubectl fails instead of continuing silently.
subprocess.run(["kubectl", "apply", "-f", "ngc-downloader-job.yaml"], check=True)

In [None]:
hf_downloader_job = """apiVersion: batch/v1
kind: Job
metadata:
  name: hf-downloader
spec:
  template:
    spec:
      containers:
      - name: hf-downloader
        image: python:3.12.3
        command:
        - "/bin/sh"
        - "-c"
        - |
          set -e  # stop at the first failing command so a failed download fails the Job
          echo "Installing the Hugging Face CLI..."
          pip install -U "huggingface_hub[cli]"
          echo "Creating directory /mnt/model-repo/Llama-2-7b-chat-hf"
          mkdir -p /mnt/model-repo/Llama-2-7b-chat-hf
          echo "Downloading Llama 2 7B Chat from the Hugging Face Hub..."
          huggingface-cli download meta-llama/Llama-2-7b-chat-hf --local-dir /mnt/model-repo/Llama-2-7b-chat-hf
          echo "Model downloaded successfully."
        volumeMounts:
        - name: model-volume
          mountPath: /mnt/model-repo
        env:
        - name: HF_TOKEN
          valueFrom:
            secretKeyRef:
              name: hf-cli-secret
              key: HF_CLI_API_KEY
        securityContext:
          runAsUser: 0
      restartPolicy: Never
      volumes:
      - name: model-volume
        persistentVolumeClaim:
          claimName: model-repository
"""

with open("hf-downloader-job.yaml", "w") as f:
    f.write(hf_downloader_job)

# check=True raises an error if kubectl fails instead of continuing silently.
subprocess.run(["kubectl", "apply", "-f", "hf-downloader-job.yaml"], check=True)
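
`kubectl apply` returns as soon as the Jobs are created, not when the downloads
finish. One way to block until both Jobs complete is `kubectl wait`, sketched
below. The helper names and the `60m` timeout are assumptions; adjust the
timeout to your network speed.

```python
import subprocess


def wait_command(job_name, timeout="60m"):
    """Build a `kubectl wait` command that blocks until the Job completes."""
    return [
        "kubectl", "wait", "--for=condition=complete",
        f"job/{job_name}", f"--timeout={timeout}",
    ]


def logs_command(job_name):
    """Build a `kubectl logs` command for inspecting the Job's output."""
    return ["kubectl", "logs", f"job/{job_name}"]


def wait_for_jobs(job_names, run=subprocess.run, timeout="60m"):
    """Wait for each Job in turn; map each name to True if it completed."""
    results = {}
    for name in job_names:
        completed = run(wait_command(name, timeout))
        results[name] = completed.returncode == 0
    return results


# In the notebook you would call, for example:
#   wait_for_jobs(["ngc-downloader", "hf-downloader"])
#   subprocess.run(logs_command("hf-downloader"))
```

Because this waits only for the `complete` condition, a Job that fails keeps
the call blocked until the timeout expires. Checking the logs is the quickest
way to see why a download did not finish.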