# Deploying NV-Ingest On A Single NVIDIA GPU

This notebook walks through deploying NV-Ingest on a single GPU. It covers configuring GPU sharing via time-slicing with the NVIDIA GPU Operator, then deploying NV-Ingest on a local Kubernetes cluster that is exposed to a single GPU. The following prerequisites are assumed:

- Access To At Least 1 GPU (An A100 80GB GPU is used in this walkthrough)
- Access To A GPU Enabled Kubernetes Cluster
- Access To The `kubectl` And `helm` CLIs
- Access To A Default Storage Class For Dynamic Volume Provisioning - The application uses PVCs that are mounted into its pods
- Access To Single GPU Node Pools - Recommended for this walkthrough. Time-slicing advertises a single GPU as several replicas (slices), but they all map to the same physical GPU, so every application requesting GPU resources is guaranteed to be scheduled onto that one GPU. With more than one GPU in the pool, there is no guarantee that all workloads are scheduled onto the same GPU.
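
To make the idea of replicas concrete, here is a minimal sketch of a time-slicing sharing configuration built in Python and printed as JSON (which is also valid YAML). The replica count of 4 and the helper names `time_slicing_config` and `advertised_gpus` are assumptions for illustration only; this notebook does not prescribe a value, and the layout should be checked against the GPU Operator documentation before use.

```python
import json


def time_slicing_config(replicas: int, resource: str = "nvidia.com/gpu") -> dict:
    """Build a sharing config that advertises each GPU as `replicas` slices."""
    if replicas < 1:
        raise ValueError("replicas must be a positive integer")
    return {
        "version": "v1",
        "sharing": {
            "timeSlicing": {
                "resources": [{"name": resource, "replicas": replicas}],
            }
        },
    }


def advertised_gpus(physical_gpus: int, replicas: int) -> int:
    """Number of nvidia.com/gpu resources the node would advertise."""
    return physical_gpus * replicas


config = time_slicing_config(replicas=4)  # 4 is a hypothetical choice
print(json.dumps(config, indent=2))
print(advertised_gpus(physical_gpus=1, replicas=4))
```

With one physical GPU and four replicas, four pods that each request `nvidia.com/gpu: 1` can be scheduled, and all of them share the same device.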

## Import Dependencies

In [12]:
%%bash

az extension add --name aks-preview
az extension update --name aks-preview



In [None]:
import os

## Set NGC API Key

An API key is needed to pull resources from NGC. We'll set it as an environment variable so that it can be used in shell commands.

In [1]:
!pip install python-dotenv



In [2]:
import os
from dotenv import load_dotenv

# os.environ["NGC_CLI_API_KEY"] = "nvapi-xxxxx"  # variable name used by the helm and kubectl commands later on

# Load environment variables from .env file
load_dotenv()

True

In [3]:

os.environ["REGION"] = "westeurope"
os.environ["RESOURCE_GROUP"] = "rg-aks-pvc"
os.environ["ZONE"] = "2"
os.environ["CPU_COUNT"] = "1"
os.environ["CLUSTER_NAME"] = "akspvc"
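
Every `az` command that follows reads these variables from the environment, so an unset variable silently becomes an empty argument. The sketch below is one way to check for that up front; `REQUIRED_VARS` and `missing_vars` are hypothetical names, not part of the original notebook.

```python
import os

# Variables that the az commands in this notebook expand
REQUIRED_VARS = ["REGION", "RESOURCE_GROUP", "ZONE", "CPU_COUNT", "CLUSTER_NAME"]


def missing_vars(names, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Example with an explicit mapping so the check does not depend on the real environment
example_env = {
    "REGION": "westeurope",
    "RESOURCE_GROUP": "rg-aks-pvc",
    "ZONE": "2",
    "CPU_COUNT": "1",
}
print(missing_vars(REQUIRED_VARS, example_env))  # ['CLUSTER_NAME']
```

Calling `missing_vars(REQUIRED_VARS)` without a mapping checks the live environment of the notebook kernel.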

In [10]:
!  az group create -l $REGION -n $RESOURCE_GROUP


{
  "id": "/subscriptions/b7d41fc8-d35d-41db-92ed-1f7f1d32d4d9/resourceGroups/rg-aks-pvc",
  "location": "westeurope",
  "managedBy": null,
  "name": "rg-aks-pvc",
  "properties": {
    "provisioningState": "Succeeded"
  },
  "tags": null,
  "type": "Microsoft.Resources/resourceGroups"
}


In [None]:
#!  az group delete --resource-group $RESOURCE_GROUP --yes 



In [12]:
! az aks create -g  $RESOURCE_GROUP -n $CLUSTER_NAME --location $REGION --zones $ZONE --node-count $CPU_COUNT --enable-node-public-ip  --node-vm-size Standard_D4s_v3 --ssh-key-value ~/.ssh/id_rsa.pub

The default value of '--node-vm-size' will be changed to 'Dynamically Selected By Azure' from 'Standard_DS2_V2 (Linux), Standard_DS2_V3 (Windows)' in next breaking change release(2.73.0) scheduled for May 2025.
The behavior of this command has been altered by the following extension: aks-preview
The new node pool will enable SSH access, recommended to use '--ssh-access disabled' option to disable SSH access for the node pool to make it more secure.
{
  "aadProfile": null,
  "addonProfiles": null,
  "agentPoolProfiles": [
    {
      "artifactStreamingProfile": null,
      "availabilityZones": [
        "2"
      ],
      "capacityReservationGroupId": null,
      "count": 1,
      "creationData": null,
      "currentOrchestratorVersion": "1.31.7",
      "eTag": "0bdd96f7-e4c0-45e0-8fdd-8faca7a1cdb0",
      "enableAutoScaling": false,
      "enableCustomCaTrust": false,
      "enableEncryptionAtHost": false,
      "enableFips": false,
      "ena

In [13]:
!az aks get-credentials --resource-group $RESOURCE_GROUP --name $CLUSTER_NAME


The behavior of this command has been altered by the following extension: aks-preview
/Users/azeltov/.kube/config has permissions "644".
It should be readable and writable only by its owner.
Merged "akspvc" as current context in /Users/azeltov/.kube/config


In [14]:
!kubectl config get-contexts


CURRENT   NAME      CLUSTER   AUTHINFO                          NAMESPACE
          aksnemo   aksnemo   clusterUser_rg-aks-nemo_aksnemo   
*         akspvc    akspvc    clusterUser_rg-aks-pvc_akspvc     
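
Because the kubeconfig here holds more than one cluster, it is worth confirming that the starred context is the one just created before applying any manifests. A minimal sketch, assuming the tabular output shown above; `current_context` is a hypothetical helper.

```python
def current_context(table: str):
    """Return the NAME of the row marked with '*' in `kubectl config get-contexts` output."""
    for line in table.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "*":
            return fields[1]
    return None


sample = """CURRENT   NAME      CLUSTER   AUTHINFO                          NAMESPACE
          aksnemo   aksnemo   clusterUser_rg-aks-nemo_aksnemo
*         akspvc    akspvc    clusterUser_rg-aks-pvc_akspvc
"""
print(current_context(sample))  # akspvc
```

`kubectl config current-context` prints the same name directly; parsing the table is only useful when the other contexts are of interest too.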


In [15]:
os.environ["STORAGE_ACCOUNT_NAME"] = "stgmodelweights"
os.environ["FILE_SHARE_NAME"] = "huggingface-models"

## Create a storage account (supports Azure Files)


In [16]:
!az storage account create \
  --resource-group $RESOURCE_GROUP \
  --name $STORAGE_ACCOUNT_NAME \
  --sku Standard_LRS \
  --kind StorageV2

{
  "accessTier": "Hot",
  "accountMigrationInProgress": null,
  "allowBlobPublicAccess": false,
  "allowCrossTenantReplication": false,
  "allowSharedKeyAccess": null,
  "allowedCopyScope": null,
  "azureFilesIdentityBasedAuthentication": null,
  "blobRestoreStatus": null,
  "creationTime": "2025-05-02T16:04:16.278312+00:00",
  "customDomain": null,
  "defaultToOAuthAuthentication": null,
  "dnsEndpointType": null,
  "enableExtendedGroups": null,
  "enableHttpsTrafficOnly": true,
  "enableNfsV3": null,
  "encryption": {
    "encryptionIdentity": null,
    "keySource": "Microsoft.Storage",
    "keyVaultProperties": null,
    "requireInfrastructureEncryption": null,
    "services": {
      "blob": {
        "enabled": true,
        "keyType": "Account",
        "lastEnabledTime": "2025-05-02T16:04:16.465809+00:00"
      },
      "file": {
        "enabled": true,
        "keyType": "Account",
        "lastEnabledTime": "2025-05-02T16:04:16.465809+00:00"
      },
      "q

In [None]:
%%bash
# Get the account key
ACCOUNT_KEY=$(az storage account keys list \
  --resource-group $RESOURCE_GROUP \
  --account-name $STORAGE_ACCOUNT_NAME \
  --query '[0].value' -o tsv)

#echo $ACCOUNT_KEY

# Create the file share
az storage share create \
  --name $FILE_SHARE_NAME \
  --account-name $STORAGE_ACCOUNT_NAME \
  --account-key $ACCOUNT_KEY

First, let's update the storage account's network rules so that the AKS cluster can reach it. Setting the default action to `Allow` permits access from all networks, not only from the cluster's virtual network:

In [None]:
!az storage account update --name $STORAGE_ACCOUNT_NAME --resource-group $RESOURCE_GROUP --default-action Allow

In [20]:
!kubectl create namespace nim

namespace/nim created


In [21]:
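%%bash
# Create a secret in the AKS cluster with the storage credentials.
# ACCOUNT_KEY is fetched again here because each %%bash cell runs in its own shell,
# so the variable set in the earlier cell is not visible in this one.
ACCOUNT_KEY=$(az storage account keys list \
  --resource-group $RESOURCE_GROUP \
  --account-name $STORAGE_ACCOUNT_NAME \
  --query '[0].value' -o tsv)

kubectl create secret generic azure-secret \
  --from-literal=azurestorageaccountname=$STORAGE_ACCOUNT_NAME \
  --from-literal=azurestorageaccountkey=$ACCOUNT_KEY \
  -n nim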


secret/azure-secret created


In [22]:
!mkdir -p manifests

In [5]:
%%bash
cat << EOF > manifests/azurefile-pv-pvc.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: hf-models-pv
  labels:
    volume: hf-models
spec:
  capacity:
    storage: 600Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: azurefile
  volumeMode: Filesystem
  azureFile:
    secretName: azure-secret
    shareName: huggingface-models
    readOnly: false

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hf-models-pvc
  namespace: nim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile
  volumeMode: Filesystem
  resources:
    requests:
      storage: 600Gi
  selector:
    matchLabels:
      volume: hf-models
EOF

In [6]:
!kubectl apply -f manifests/azurefile-pv-pvc.yaml



persistentvolume/hf-models-pv created
persistentvolumeclaim/hf-models-pvc created
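
A pre-created PV and PVC only bind when their fields agree. The sketch below checks the fields used in the manifest above (storage class, access modes, capacity, and the label selector) on plain dictionaries. It is a simplified illustration, not the full Kubernetes binding logic; `gib` and `binding_problems` are hypothetical helpers, and only the `Gi` suffix is parsed.

```python
def gib(quantity: str) -> int:
    """Parse a quantity such as '600Gi'; only the Gi suffix is handled in this sketch."""
    if not quantity.endswith("Gi"):
        raise ValueError(f"unsupported quantity: {quantity}")
    return int(quantity[:-2])


def binding_problems(pv: dict, pvc: dict) -> list:
    """Return reasons why the PVC would not bind to the PV (empty list if none found)."""
    problems = []
    pv_spec, pvc_spec = pv["spec"], pvc["spec"]
    if pv_spec.get("storageClassName") != pvc_spec.get("storageClassName"):
        problems.append("storageClassName differs")
    if not set(pvc_spec["accessModes"]) <= set(pv_spec["accessModes"]):
        problems.append("PV does not offer every requested access mode")
    if gib(pv_spec["capacity"]["storage"]) < gib(pvc_spec["resources"]["requests"]["storage"]):
        problems.append("PV capacity is smaller than the PVC request")
    wanted = pvc_spec.get("selector", {}).get("matchLabels", {})
    labels = pv["metadata"].get("labels", {})
    if any(labels.get(key) != value for key, value in wanted.items()):
        problems.append("PV labels do not satisfy the PVC selector")
    return problems


# Values copied from manifests/azurefile-pv-pvc.yaml
pv = {
    "metadata": {"name": "hf-models-pv", "labels": {"volume": "hf-models"}},
    "spec": {
        "capacity": {"storage": "600Gi"},
        "accessModes": ["ReadWriteMany"],
        "storageClassName": "azurefile",
    },
}
pvc = {
    "metadata": {"name": "hf-models-pvc", "namespace": "nim"},
    "spec": {
        "accessModes": ["ReadWriteMany"],
        "storageClassName": "azurefile",
        "resources": {"requests": {"storage": "600Gi"}},
        "selector": {"matchLabels": {"volume": "hf-models"}},
    },
}
print(binding_problems(pv, pvc))  # []
```

If a claim stays `Pending`, a mismatch in one of these fields is a common place to look first.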


In [7]:
%%bash
cat << EOF > manifests/debug-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: debug-hf-pod
  namespace: nim  # must match the namespace of hf-models-pvc
spec:
  containers:
  - name: writer
    image: mcr.microsoft.com/oss/ubuntu/bash:latest
    command: [ "bash", "-c", "--" ]
    args: [ "while true; do sleep 3600; done;" ]
    volumeMounts:
    - name: modelvolume
      mountPath: /mnt/models
  volumes:
  - name: modelvolume
    persistentVolumeClaim:
      claimName: hf-models-pvc
  restartPolicy: Never
EOF

In [8]:
!kubectl apply -f manifests/debug-pod.yaml

pod/debug-hf-pod created


In [9]:
%%bash
cat << EOF > manifests/debug-blob-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pvc-debugger
  namespace: nim
spec:
  containers:
  - name: debug
    image: ubuntu
    command: ["/bin/bash", "-c", "--"]
    args: ["while true; do sleep 30; done;"]
    volumeMounts:
    - name: hf-volume
      mountPath: /mnt/hf
  volumes:
  - name: hf-volume
    persistentVolumeClaim:
      claimName: hf-models-pvc
  restartPolicy: Never
EOF

In [10]:
!kubectl apply -f manifests/debug-blob-pod.yaml

pod/pvc-debugger created


Once it's running, exec into the pod:
```
kubectl exec -n nim -it pvc-debugger -- bash
```
Inside the pod, check the mounted path:
```
df -h /mnt/hf

root@pvc-debugger:/# df -h /mnt/hf
Filesystem                                                  Size  Used Avail Use% Mounted on
//stgmodelweights.file.core.windows.net/huggingface-models  100T     0  100T   0% /mnt/hf

ls -la /mnt/hf

touch /mnt/hf/testfile
```
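
The `df` output above can also be checked programmatically: the filesystem column should be the SMB path of the file share. The sketch below assumes the `<account>.file.core.windows.net` endpoint seen in that output; `expected_source` and `mounted_source` are hypothetical helpers.

```python
def expected_source(account: str, share: str) -> str:
    """SMB source path for an Azure Files share, as it appears in the df output above."""
    return f"//{account}.file.core.windows.net/{share}"


def mounted_source(df_output: str, mount_point: str):
    """Return the Filesystem column of the df row whose mount point matches."""
    for line in df_output.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[-1] == mount_point:
            return fields[0]
    return None


sample = """Filesystem                                                  Size  Used Avail Use% Mounted on
//stgmodelweights.file.core.windows.net/huggingface-models  100T     0  100T   0% /mnt/hf
"""
source = mounted_source(sample, "/mnt/hf")
print(source == expected_source("stgmodelweights", "huggingface-models"))  # True
```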

In [38]:
!az aks stop --resource-group $RESOURCE_GROUP --name $CLUSTER_NAME

The behavior of this command has been altered by the following extension: aks-preview

In [39]:
!az aks start --resource-group $RESOURCE_GROUP --name $CLUSTER_NAME

The behavior of this command has been altered by the following extension: aks-preview

In [41]:
!kubectl get pods -n nim    

No resources found in nim namespace.


In [43]:
!kubectl apply -f manifests/debug-blob-pod.yaml

pod/pvc-debugger created


In [44]:
%%bash
az aks nodepool add --resource-group $RESOURCE_GROUP --cluster-name $CLUSTER_NAME --name gpupool --node-count 1 --skip-gpu-driver-install --node-vm-size standard_nc24ads_a100_v4 --node-osdisk-size 2048 --max-pods 110



{
  "artifactStreamingProfile": null,
  "availabilityZones": null,
  "capacityReservationGroupId": null,
  "count": 1,
  "creationData": null,
  "currentOrchestratorVersion": "1.31.7",
  "eTag": "93ec221e-1f9e-4460-9f45-a4cbcf9a8b1c",
  "enableAutoScaling": false,
  "enableCustomCaTrust": false,
  "enableEncryptionAtHost": false,
  "enableFips": false,
  "enableNodePublicIp": false,
  "enableUltraSsd": false,
  "gatewayProfile": null,
  "gpuInstanceProfile": null,
  "gpuProfile": {
    "driverType": "",
    "installGpuDriver": false
  },
  "hostGroupId": null,
  "id": "/subscriptions/b7d41fc8-d35d-41db-92ed-1f7f1d32d4d9/resourcegroups/rg-aks-pvc/providers/Microsoft.ContainerService/managedClusters/akspvc/agentPools/gpupool",
  "kubeletConfig": null,
  "kubeletDiskType": "OS",
  "linuxOsConfig": null,
  "maxCount": null,
  "maxPods": 110,
  "messageOfTheDay": null,
  "minCount": null,
  "mode": "User",
  "name": "gpupool",
  "networkProfile": {
    "allowedHostPorts": null,
    "applica

## Install NVIDIA GPU Operator

The NVIDIA GPU Operator uses the operator framework within Kubernetes to automate the management of all NVIDIA software components needed to provision GPUs. These components include the NVIDIA drivers (to enable CUDA), the Kubernetes device plugin for GPUs, the NVIDIA Container Toolkit, automatic node labelling using GPU Feature Discovery (GFD), DCGM-based monitoring, and others.

### Add NVIDIA Helm Repository

In [45]:
%%bash
# add nvidia helm repo
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia --pass-credentials && helm repo update


"nvidia" already exists with the same configuration, skipping
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "nvdp" chart repository
...Successfully got an update from the "nvgfd" chart repository
...Successfully got an update from the "milvus" chart repository
...Successfully got an update from the "nvidia" chart repository
Update Complete. ⎈Happy Helming!⎈


### Install NVIDIA GPU Operator

With the Helm repo added, we can install the NVIDIA GPU Operator. The GPU node pool above was created with `--skip-gpu-driver-install`, so the drivers are managed by the operator and the command below uses the chart defaults without any `--set` overrides. In scenarios where the drivers are already installed on the node pool machines and should not be managed by the operator, additional `--set` fields are required.

In [46]:
!helm install --create-namespace --namespace gpu-operator nvidia/gpu-operator --wait --generate-name


NAME: gpu-operator-1746210529
LAST DEPLOYED: Fri May  2 14:28:53 2025
NAMESPACE: gpu-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None


### Verify Installation

In [47]:
!kubectl get pods -n gpu-operator

NAME                                                              READY   STATUS        RESTARTS   AGE
gpu-feature-discovery-crwq6                                       0/1     Terminating   0          10s
gpu-operator-1746210529-node-feature-discovery-gc-6fb49447thx28   1/1     Running       0          24s
gpu-operator-1746210529-node-feature-discovery-master-686cdnrgk   1/1     Running       0          24s
gpu-operator-1746210529-node-feature-discovery-worker-2q7xb       1/1     Running       0          24s
gpu-operator-1746210529-node-feature-discovery-worker-dghg7       1/1     Running       0          24s
gpu-operator-5bcb97589d-sbt5p                                     1/1     Running       0          24s
nvidia-container-toolkit-daemonset-d6tk2                          0/1     Terminating   0          11s
nvidia-dcgm-exporter-f5xj4                                        0/1     Terminating   0          10s
nvidia-device-plugin-daemonset-vflr5                              0/1    
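
Right after installation several operator pods are still cycling, as the `Terminating` rows above show. A minimal sketch for listing the pods that are not yet healthy, assuming the default tabular output of `kubectl get pods`; `unhealthy_pods` is a hypothetical helper.

```python
def unhealthy_pods(table: str) -> list:
    """Names of pods that are neither Completed nor Running with all containers ready."""
    bad = []
    for line in table.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "NAME":
            continue  # skip blank lines and the header row
        name, ready, status = fields[0], fields[1], fields[2]
        if status == "Completed":
            continue
        have, _, want = ready.partition("/")
        if status != "Running" or have != want:
            bad.append(name)
    return bad


sample = """NAME                                       READY   STATUS        RESTARTS   AGE
gpu-feature-discovery-crwq6                0/1     Terminating   0          10s
gpu-operator-5bcb97589d-sbt5p              1/1     Running       0          24s
nvidia-dcgm-exporter-f5xj4                 0/1     Terminating   0          10s
"""
print(unhealthy_pods(sample))  # ['gpu-feature-discovery-crwq6', 'nvidia-dcgm-exporter-f5xj4']
```

An empty list is a reasonable signal that the operator has settled and the node should be advertising its GPU.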

In [4]:
!kubectl describe node

Name:               aks-gpupool-75697254-vmss000000
Roles:              <none>
Labels:             accelerator=nvidia
                    agentpool=gpupool
                    beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=standard_nc24ads_a100_v4
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=westeurope
                    failure-domain.beta.kubernetes.io/zone=0
                    feature.node.kubernetes.io/cpu-cpuid.ADX=true
                    feature.node.kubernetes.io/cpu-cpuid.AESNI=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX=true
                    feature.node.kubernetes.io/cpu-cpuid.AVX2=true
                    feature.node.kubernetes.io/cpu-cpuid.CETSS=true
                    feature.node.kubernetes.io/cpu-cpuid.CLZERO=true
                    feature.node.kubernetes.io/cpu-cpuid.CMPXCHG8=true
                    feature.node.kubernetes.io/cpu-cpuid.FMA

## Confirm Default StorageClass

We need to confirm that a default storage class is available for dynamic volume provisioning. It is used to provision the PVCs that the nv-ingest pods are bound to. At least one storage class should be returned with the `(default)` marker next to it.

In [49]:
%%bash
kubectl get storageclass

NAME                    PROVISIONER          RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
azurefile               file.csi.azure.com   Delete          Immediate              true                   150m
azurefile-csi           file.csi.azure.com   Delete          Immediate              true                   150m
azurefile-csi-premium   file.csi.azure.com   Delete          Immediate              true                   150m
azurefile-premium       file.csi.azure.com   Delete          Immediate              true                   150m
default (default)       disk.csi.azure.com   Delete          WaitForFirstConsumer   true                   150m
managed                 disk.csi.azure.com   Delete          WaitForFirstConsumer   true                   150m
managed-csi             disk.csi.azure.com   Delete          WaitForFirstConsumer   true                   150m
managed-csi-premium     disk.csi.azure.com   Delete          WaitForFirstConsumer   true                 
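
The `(default)` marker can be picked out of this table with a few lines of Python. This is a sketch that assumes the column layout shown above; `default_storage_classes` is a hypothetical helper.

```python
def default_storage_classes(table: str) -> list:
    """Names of storage classes that carry the '(default)' marker in the NAME column."""
    names = []
    for line in table.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "(default)":
            names.append(fields[0])
    return names


sample = """NAME                    PROVISIONER          RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
azurefile               file.csi.azure.com   Delete          Immediate              true                   150m
default (default)       disk.csi.azure.com   Delete          WaitForFirstConsumer   true                   150m
managed                 disk.csi.azure.com   Delete          WaitForFirstConsumer   true                   150m
"""
print(default_storage_classes(sample))  # ['default']
```

An empty result means dynamic provisioning has no default to fall back on; more than one entry means the default is ambiguous.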

## Fetch the NIM LLM Helm Chart

Now that we've configured the NGC API key, we can download the NIM LLM Helm chart from NGC using the following command:

In [5]:
!helm fetch https://helm.ngc.nvidia.com/nim/charts/nim-llm-1.7.0.tgz --username='$oauthtoken' --password=$NGC_CLI_API_KEY

In order to configure and launch an NVIDIA NIM, it is important to configure the secrets we’ll need to pull all the model artifacts directly from NGC. This can be done using your NGC API key:

In [6]:
%%bash
kubectl create secret docker-registry registry-secret --docker-server=nvcr.io --docker-username='$oauthtoken'     --docker-password=$NGC_CLI_API_KEY -n nim
kubectl create secret generic ngc-api --from-literal=NGC_API_KEY=$NGC_CLI_API_KEY -n nim

secret/registry-secret created
secret/ngc-api created


In [31]:
%%bash
# create nim_custom_value.yaml manifest
cat <<EOF > nim_custom_value.yaml
image:
  repository: "nvcr.io/nim/meta/llama-3.1-8b-instruct" # container location
  tag: 1.3.3 # NIM version you want to deploy

model:
  ngcAPISecret: ngc-api  # name of a secret in the cluster that stores the NGC API key under the key NGC_API_KEY

# Disable default persistence since we're using pre-created PV/PVC
persistence:
  enabled: false

imagePullSecrets:
  - name: registry-secret # name of a secret used to pull nvcr.io images

resources:
  limits:
    nvidia.com/gpu: 1

env:
  - name: NIM_CACHE_PATH
    value: /mnt/models


extraVolumeMounts:
  - name: hf-models
    mountPath: /mnt/models

extraVolumes:
  - name: hf-models
    persistentVolumeClaim:
      claimName: hf-models-pvc
EOF
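
The values file ties three things together: the cache path in `env`, the mount in `extraVolumeMounts`, and the claim in `extraVolumes`. The sketch below checks that they line up, using a dictionary that mirrors the layout written above. It is an illustration only; `values_problems` is a hypothetical helper and does not validate against the chart's own schema.

```python
def values_problems(values: dict) -> list:
    """Return inconsistencies between env, extraVolumeMounts and extraVolumes."""
    problems = []
    volume_names = {volume["name"] for volume in values.get("extraVolumes", [])}
    mounts = values.get("extraVolumeMounts", [])
    for mount in mounts:
        if mount["name"] not in volume_names:
            problems.append(f"mount {mount['name']!r} has no matching extraVolumes entry")
    cache_path = next(
        (item["value"] for item in values.get("env", []) if item["name"] == "NIM_CACHE_PATH"),
        None,
    )
    if cache_path is not None and cache_path not in {mount["mountPath"] for mount in mounts}:
        problems.append(f"NIM_CACHE_PATH {cache_path!r} is not a mounted path")
    return problems


# Mirrors the relevant part of nim_custom_value.yaml
values = {
    "env": [{"name": "NIM_CACHE_PATH", "value": "/mnt/models"}],
    "extraVolumeMounts": [{"name": "hf-models", "mountPath": "/mnt/models"}],
    "extraVolumes": [
        {"name": "hf-models", "persistentVolumeClaim": {"claimName": "hf-models-pvc"}}
    ],
}
print(values_problems(values))  # []
```

If the cache path and the mount path drift apart, the model files are downloaded to the container's ephemeral filesystem instead of the file share.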

In [32]:
!helm install my-nim nim-llm-1.7.0.tgz -f nim_custom_value.yaml --namespace nim

NAME: my-nim
LAST DEPLOYED: Mon May  5 10:03:03 2025
NAMESPACE: nim
STATUS: deployed
REVISION: 1
NOTES:
Thank you for installing nim-llm.

**************************************************
| It may take some time for pods to become ready |
| while model files download                     |
**************************************************

Your NIM version is: 1.3.3


In [11]:
!kubectl get pods -n nim

NAME               READY   STATUS             RESTARTS      AGE
my-nim-nim-llm-0   0/1     CrashLoopBackOff   4 (45s ago)   2m31s
pvc-debugger       1/1     Running            0             4m3s


In [12]:
# Describe the pod to check its status and events
!kubectl describe pod my-nim-nim-llm-0 -n nim

Name:             my-nim-nim-llm-0
Namespace:        nim
Priority:         0
Service Account:  default
Node:             aks-gpupool-75697254-vmss000000/10.224.0.5
Start Time:       Mon, 05 May 2025 12:13:19 -0400
Labels:           app.kubernetes.io/instance=my-nim
                  app.kubernetes.io/name=nim-llm
                  apps.kubernetes.io/pod-index=0
                  controller-revision-hash=my-nim-nim-llm-87d7c6ccf
                  statefulset.kubernetes.io/pod-name=my-nim-nim-llm-0
Annotations:      <none>
Status:           Running
IP:               10.244.1.124
IPs:
  IP:           10.244.1.124
Controlled By:  StatefulSet/my-nim-nim-llm
Containers:
  nim-llm:
    Container ID:   containerd://6c7e26c21652d37ce4be965d646bec56644721db5f406f8f79b2f2d96ec51ddc
    Image:          nvcr.io/nim/meta/llama-3.1-8b-instruct:1.3.3
    Image ID:       nvcr.io/nim/meta/llama-3.1-8b-instruct@sha256:088a7f400286291e9c1512e326596f4caedd2ed7aa30b909d78cbd3727b55ee8
    Port:           80

In [36]:
!kubectl logs my-nim-nim-llm-0 -n nim

Error from server (BadRequest): container "nim-llm" in pod "my-nim-nim-llm-0" is waiting to start: ContainerCreating
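
Logs are not available while the container is still in `ContainerCreating`, so the command has to be retried once the container has started. A generic polling loop is one way to avoid re-running the cell by hand. The sketch below is not tied to `kubectl`; `wait_until` is a hypothetical helper, and the sequence of statuses is made up for the demonstration.

```python
import time


def wait_until(check, timeout_s=600.0, interval_s=10.0, sleep=time.sleep, clock=time.monotonic):
    """Call `check` repeatedly until it returns True or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Demonstration with a made-up sequence of pod statuses instead of a real cluster call
statuses = iter(["ContainerCreating", "ContainerCreating", "Running"])
ready = wait_until(
    lambda: next(statuses) == "Running",
    timeout_s=5,
    interval_s=0,
    sleep=lambda seconds: None,  # no real waiting in the demo
)
print(ready)  # True
```

In the notebook, `check` could wrap a call that reads the pod status. `kubectl wait --for=condition=Ready pod/my-nim-nim-llm-0 -n nim` is a command-line alternative that blocks until the pod is ready.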


In [13]:
!kubectl logs my-nim-nim-llm-0 -n nim


== NVIDIA Inference Microservice LLM NIM ==

NVIDIA Inference Microservice LLM NIM Version 1.3.0
Model: meta/llama-3.1-8b-instruct

Container image Copyright (c) 2016-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

The NIM container is governed by the NVIDIA Software License Agreement (found at https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-software-license-agreement) and the Product Specific Terms for AI Products (found at https://www.nvidia.com/en-us/agreements/enterprise-software/product-specific-terms-for-ai-products).

A copy of this license can be found under /opt/nim/LICENSE.

The use of this model is governed by the NVIDIA AI Foundation Models Community License Agreement (https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-ai-foundation-models-community-license-agreement).

ADDITIONAL INFORMATION: Llama 3.1 Community License Agreement, Built with Llama.
Traceback (most recent call last):
  File "/opt/nim/llm/.venv/bin/nim-llm-che

In [14]:
!kubectl get pvc,pv -n nim

NAME                                  STATUS   VOLUME         CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
persistentvolumeclaim/hf-models-pvc   Bound    hf-models-pv   600Gi      RWX            azurefile      <unset>                 8m44s

NAME                            CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM               STORAGECLASS   VOLUMEATTRIBUTESCLASS   REASON   AGE
persistentvolume/hf-models-pv   600Gi      RWX            Retain           Bound    nim/hf-models-pvc   azurefile      <unset>                          8m45s


In [27]:
!helm uninstall my-nim -n nim

release "my-nim" uninstalled


In [None]:
!az aks nodepool stop --resource-group rg-aks-pvc --cluster-name akspvc --name gpupool

In [15]:
!kubectl get pods -n nim

NAME               READY   STATUS    RESTARTS   AGE
my-nim-nim-llm-0   0/1     Pending   0          2m12s


In [16]:
!az aks nodepool start --resource-group rg-aks-pvc --cluster-name akspvc --name gpupool

The behavior of this command has been altered by the following extension: aks-preview
{
  "artifactStreamingProfile": null,
  "availabilityZones": null,
  "capacityReservationGroupId": null,
  "count": 1,
  "creationData": null,
  "currentOrchestratorVersion": "1.31.7",
  "eTag": "c3d804f1-641d-49f7-9e9e-11112a45d34f",
  "enableAutoScaling": false,
  "enableCustomCaTrust": false,
  "enableEncryptionAtHost": false,
  "enableFips": false,
  "enableNodePublicIp": false,
  "enableUltraSsd": false,
  "gatewayProfile": null,
  "gpuInstanceProfile": null,
  "gpuProfile": {
    "driverType": "",
    "installGpuDriver": false
  },
  "hostGroupId": null,
  "id": "/subscriptions/b7d41fc8-d35d-41db-92ed-1f7f1d32d4d9/resourcegroups/rg-aks-pvc/providers/Microsoft.ContainerService/managedClusters/akspvc/agentPools/gpupool",
  "kubeletConfig": null,
  "kubeletDiskType": "OS",
  "linuxOsConfig": null,
  "maxCount": null,
  "maxPods": 110,
  "messageOfTheDay": null,
  "minCount

In [18]:
!kubectl get pods -n nim

NAME               READY   STATUS    RESTARTS   AGE
my-nim-nim-llm-0   1/1     Running   0          26m


In [20]:
!kubectl get svc -n nim

NAME                 TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE
my-nim-nim-llm       ClusterIP   10.0.24.148   <none>        8000/TCP   76m
my-nim-nim-llm-sts   ClusterIP   None          <none>        8000/TCP   76m


In [26]:
# kubectl port-forward svc/my-nim-nim-llm 8000:8000 -n nim

In [25]:
%%bash
curl -X 'POST' \
'http://localhost:8000/v1/chat/completions' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"messages": [
    {
    "content": "You are a polite and respectful chatbot helping people plan a vacation.",
    "role": "system"
    },
    {
    "content": "What should I do for a 4 day vacation in Spain?",
    "role": "user"
    }
],
"model": "meta/llama-3.1-8b-instruct",
"max_tokens": 512,
"top_p": 1,
"n": 1,
"stream": false,
"frequency_penalty": 0.0
}'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2820  100  2459  100   361    255     37  0:00:09  0:00:09 --:--:--   725


{"id":"chat-d11de1e79e0949f695dfd626b9f58221","object":"chat.completion","created":1746468631,"model":"meta/llama-3.1-8b-instruct","choices":[{"index":0,"message":{"role":"assistant","content":"Spain is a wonderful destination! With 4 days, you'll have a great taste of the culture, food, and beauty of this beautiful country. Here are a few itinerary suggestions, depending on your interests:\n\n**Option 1: Explore Madrid**\n\n* Day 1: Arrive in Madrid, visit the Royal Palace, Prado Museum, and stroll through the beautiful Retiro Park.\n* Day 2: Discover the historic center, including the Plaza Mayor, Puerta del Sol, and the lively Malasaña neighborhood.\n* Day 3: Visit the stunning Almudena Cathedral and the nearby Sabatini Gardens. Enjoy a flamenco show in the evening.\n* Day 4: Relax in the pleasant Parque del Oeste or visit the famous Rastro Market.\n\n**Option 2: Discover Barcelona**\n\n* Day 1: Arrive in Barcelona, visit the iconic Sagrada Familia, Park Güell, and the Gothic Quarte
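
The same request can be issued from Python with only the standard library, which makes it easier to pull the reply text out of the JSON. This is a sketch assuming the port-forward to `localhost:8000` mentioned above is active; `build_chat_request`, `send` and `first_reply` are hypothetical helper names, and nothing is sent until `send` is called.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_prompt, system_prompt=None, max_tokens=512):
    """Build a POST request for the /v1/chat/completions endpoint used in the curl call."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    body = {"model": model, "messages": messages, "max_tokens": max_tokens, "stream": False}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "accept": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response (requires the port-forward)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def first_reply(response: dict) -> str:
    """Extract the assistant text from a chat completion response."""
    return response["choices"][0]["message"]["content"]


request = build_chat_request(
    "http://localhost:8000",
    "meta/llama-3.1-8b-instruct",
    "What should I do for a 4 day vacation in Spain?",
    system_prompt="You are a polite and respectful chatbot helping people plan a vacation.",
)
print(request.get_method(), request.full_url)
```

With the port-forward running, `print(first_reply(send(request)))` prints only the assistant's text rather than the full JSON document.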