# Llama 3.1 8B NIM Deployment Guide with AKS PVC Installation

## Overview
This notebook demonstrates how to deploy the Llama 3.1 8B Instruct NIM (NVIDIA Inference Microservice) on Azure Kubernetes Service (AKS) with persistent storage using Azure Files for model weights caching.

## Prerequisites
- Access to at least 1 GPU (Example uses A100 80GB GPU)
- Access to a GPU-enabled Kubernetes cluster
- `kubectl` and `helm` CLI tools installed
- Access to GPU node pools
- NGC API key for accessing NVIDIA containers and models

## Setup Process

### 1. Initial Infrastructure Setup

### 2. Storage Configuration

### 3. Persistent Volume Setup

### 4. GPU Infrastructure

### 5. NIM Deployment Steps
- **Helm Chart Setup**
- **NIM Configuration**
- **Model Deployment**

### 6. Testing and Verification
- **Service Access**
- **Model Testing**

### 7. Cleanup

------

## Prerequisites

Please follow [prerequisites instruction](../../aks/prerequisites/README.md) to get ready for AKS creation.


------

### 1. Initial Infrastructure Setup
- Creates Azure resource group and AKS cluster
- Configures basic node pool with Standard_D4s_v3 VM size
- Sets up cluster credentials and context

#### Set NGC API Key

An API key is needed to pull resources from NGC. Set it in the [.env](.env) file and load it with the `python-dotenv` package. Later cells read the key from the `NGC_CLI_API_KEY` environment variable.

In [None]:
!pip install python-dotenv

In [None]:
import os
from dotenv import load_dotenv

# os.environ["NGC_CLI_API_KEY"] = "nvapi-xxxxx"

# Load environment variables from .env file
load_dotenv()
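
Before continuing, it can help to confirm that the key was actually loaded without printing it in full. The following is a minimal sketch, not part of the original notebook: `masked_env` is a hypothetical helper, and it assumes the key lives in `NGC_CLI_API_KEY`, the variable the later cells read.

```python
import os

def masked_env(name, env=None):
    """Return a masked form of an environment variable, or raise if it is unset."""
    env = os.environ if env is None else env
    value = env.get(name, "")
    if not value:
        raise KeyError(f"{name} is not set; add it to .env or export it first")
    # Reveal at most six characters (and never more than half of the value).
    visible = min(6, len(value) // 2)
    return value[:visible] + "*" * (len(value) - visible)
```

Calling `masked_env("NGC_CLI_API_KEY")` in a cell then shows only a short prefix, so the secret does not end up in the saved notebook output.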

Specify the following parameters:

In [None]:

os.environ["REGION"] = "westeurope"
os.environ["RESOURCE_GROUP"] = "rg-az-akspvc"
os.environ["ZONE"] = "2"
os.environ["CPU_COUNT"] = "1"
os.environ["CLUSTER_NAME"] = "akspvc"
os.environ["STORAGE_ACCOUNT_NAME"] = "stgmodelweights"
os.environ["FILE_SHARE_NAME"] = "huggingface-models"
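
Azure storage account names must be 3 to 24 characters long and contain only lowercase letters and digits, and they must also be globally unique. The sketch below checks only the format rule locally; `is_valid_storage_account_name` is a hypothetical helper, and uniqueness still has to be checked against Azure (for example with `az storage account check-name`).

```python
import re

# Format rule for storage account names: 3-24 characters, lowercase letters and digits.
_STORAGE_ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")

def is_valid_storage_account_name(name):
    """Return True if the name satisfies the length and character rules."""
    return _STORAGE_ACCOUNT_RE.fullmatch(name) is not None
```

With the values above, `stgmodelweights` passes, while a name containing a hyphen, such as the file share name `huggingface-models`, would be rejected as a storage account name.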

Create Azure Resource Group for this Lab

In [None]:
!az group create -l $REGION -n $RESOURCE_GROUP


Create AKS Cluster

In [None]:
!az aks create -g $RESOURCE_GROUP -n $CLUSTER_NAME \
  --location $REGION --zones $ZONE \
  --node-count $CPU_COUNT --enable-node-public-ip \
  --node-vm-size Standard_D4s_v3 \
  --ssh-key-value ~/.ssh/id_rsa.pub

Get Credentials:

In [None]:
!az aks get-credentials --resource-group $RESOURCE_GROUP --name $CLUSTER_NAME


Check that the kubectl context is set up properly:

In [None]:
!kubectl config get-contexts


You should see output like this:

```
CURRENT   NAME     CLUSTER   AUTHINFO                          NAMESPACE
*         akspvc   akspvc    clusterUser_rg-az-akspvc_akspvc   
```
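
The same check can be scripted. This is a minimal sketch that assumes, as in the sample output above, that `az aks get-credentials` names the context after the cluster; `context_is` is a hypothetical helper, while `kubectl config current-context` is a standard kubectl command.

```python
import subprocess

def context_is(expected, current=None):
    """Return True if the active kubectl context equals `expected`.

    `current` can be passed in directly (useful for testing); otherwise it is
    read from `kubectl config current-context`.
    """
    if current is None:
        current = subprocess.run(
            ["kubectl", "config", "current-context"],
            capture_output=True, text=True, check=True,
        ).stdout
    return current.strip() == expected
```

For this lab, `context_is(os.environ["CLUSTER_NAME"])` should return `True` once the credentials have been fetched.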

------

### 2. Storage Configuration
- Creates Azure Storage Account and File Share
- Sets up 600GB persistent volume for Hugging Face models
- Configures storage access and network rules
- Creates Kubernetes secrets for storage credentials

Create a storage account (supports Azure Files)


In [None]:
!az storage account create \
  --resource-group $RESOURCE_GROUP \
  --name $STORAGE_ACCOUNT_NAME \
  --sku Standard_LRS \
  --kind StorageV2

In [None]:
%%bash
# Get the account key
ACCOUNT_KEY=$(az storage account keys list \
  --resource-group "$RESOURCE_GROUP" \
  --account-name "$STORAGE_ACCOUNT_NAME" \
  --query '[0].value' -o tsv)

# Confirm a key was returned without printing the secret into the notebook output
echo "Retrieved account key (${#ACCOUNT_KEY} characters)"

# Create the file share
az storage share create \
  --name "$FILE_SHARE_NAME" \
  --account-name "$STORAGE_ACCOUNT_NAME" \
  --account-key "$ACCOUNT_KEY"

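Next, update the storage account's network rules so that the AKS cluster can reach the file share. Note that `--default-action Allow` permits access from all networks, not only from the cluster's virtual network: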

In [None]:
!az storage account update --name $STORAGE_ACCOUNT_NAME  --resource-group $RESOURCE_GROUP --default-action Allow

Check the Azure portal. You should see the AKS cluster and the Azure storage account, like this:
![](imgs/azureportal.png)

Create a NIM Namespace

In [None]:
!kubectl create namespace nim

Now create a secret named `azure-secret` in your AKS cluster with the storage account credentials. The secret should contain:

- azurestorageaccountname: The name of your storage account (stgmodelweights)
- azurestorageaccountkey: The access key for your storage account


In [None]:
%%bash

kubectl create secret generic azure-secret  \
 --from-literal=azurestorageaccountname=$STORAGE_ACCOUNT_NAME \
 --from-literal=azurestorageaccountkey=$(az storage account keys list --account-name $STORAGE_ACCOUNT_NAME --query '[0].value' -o tsv) \
 -n nim
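
Kubernetes stores Secret values base64-encoded, so a quick way to confirm the secret holds the expected account name is to decode it. The sketch below is illustrative only: `decode_secret` is a hypothetical helper that takes the JSON printed by `kubectl get secret azure-secret -n nim -o json`.

```python
import base64
import json

def decode_secret(secret_json):
    """Decode the base64-encoded `data` map of a Secret given as JSON text."""
    data = json.loads(secret_json).get("data", {})
    return {key: base64.b64decode(value).decode("utf-8") for key, value in data.items()}
```

When using this, compare only the `azurestorageaccountname` entry against `STORAGE_ACCOUNT_NAME` and avoid printing the decoded account key.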

Create the `manifests` folder:

In [None]:
!mkdir -p manifests

------

### 3. Persistent Volume Setup
- Creates PersistentVolume (PV) and PersistentVolumeClaim (PVC)
- Configures ReadWriteMany access mode
- Implements storage class: azurefile
- Deploys debug pod to verify storage functionality

Create the PersistentVolume and PersistentVolumeClaim:

✅ The PersistentVolume (PV):
- Represents the actual Azure file share that you created manually with the CLI.
- Tells Kubernetes: “Here’s a real external volume (an Azure file share) that I want to use. It exists, and here are its name, secret, and access settings.”
- Links to the file share name and, through the secret, to the storage account.

✅ The PersistentVolumeClaim (PVC):
- Is what your pods use to request access to storage.
- Says: “I need a 600Gi volume that’s ReadWriteMany and uses the azurefile storage class.”
- Is bound by Kubernetes to the PV you defined, if the two match.

We create the PVC to store the HF weights so that NIMs can reuse them. This way, a NIM does not have to re-download the HF weights every time the cluster restarts.

In [None]:
%%bash
cat << EOF > manifests/azurefile-pv-pvc.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: hf-models-pv
  labels:
    volume: hf-models
spec:
  capacity:
    storage: 600Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: azurefile
  volumeMode: Filesystem
  azureFile:
    secretName: azure-secret
    shareName: huggingface-models
    readOnly: false

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hf-models-pvc
  namespace: nim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile
  volumeMode: Filesystem
  resources:
    requests:
      storage: 600Gi
  selector:
    matchLabels:
      volume: hf-models
EOF
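
The PVC binds to the PV only when the two agree on storage class, access modes, capacity, and selector labels. The following is a simplified sketch of that matching rule, not the actual Kubernetes binder (which also considers fields such as `volumeMode`); the flattened dictionary shape and the helper names `parse_gi` and `pvc_can_bind` are assumptions made for illustration.

```python
def parse_gi(quantity):
    """Convert a quantity such as '600Gi' to an integer number of GiB (Gi suffix only)."""
    if not quantity.endswith("Gi"):
        raise ValueError(f"unsupported quantity: {quantity}")
    return int(quantity[:-2])

def pvc_can_bind(pv, pvc):
    """Simplified static-binding check between a PV and a PVC."""
    labels = pv.get("labels", {})
    wanted = pvc.get("selector", {})
    return (
        pv["storageClassName"] == pvc["storageClassName"]
        # Every access mode the claim asks for must be offered by the volume.
        and set(pvc["accessModes"]) <= set(pv["accessModes"])
        # The volume must be at least as large as the request.
        and parse_gi(pv["capacity"]) >= parse_gi(pvc["request"])
        # Every label in the claim's selector must be present on the volume.
        and all(labels.get(key) == value for key, value in wanted.items())
    )
```

With the manifest values (`azurefile`, `ReadWriteMany`, `600Gi`, label `volume: hf-models`) the check passes. Raising the request above 600Gi or changing the access mode on one side only would leave the claim in the `Pending` state.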

Apply the PV and PVC manifest:

In [None]:
!kubectl apply -f manifests/azurefile-pv-pvc.yaml



Create the `pvc-debugger` pod manifest to test the PVC:

In [None]:
%%bash
cat << EOF > manifests/debug-blob-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pvc-debugger
  namespace: nim
spec:
  containers:
  - name: debug
    image: ubuntu
    command: ["/bin/bash", "-c", "--"]
    args: ["while true; do sleep 30; done;"]
    volumeMounts:
    - name: hf-volume
      mountPath: /mnt/models
  volumes:
  - name: hf-volume
    persistentVolumeClaim:
      claimName: hf-models-pvc
  restartPolicy: Never
EOF

Deploy the debug pod to verify storage functionality:


In [None]:
!kubectl apply -f manifests/debug-blob-pod.yaml

Verify that `pvc-debugger` is running:

In [None]:
!kubectl get pods -n nim

In [None]:
!kubectl describe pod  pvc-debugger -n nim

Once it’s running, exec into the pod from a terminal:
```
kubectl exec -n nim -it pvc-debugger -- bash
```
Inside the pod, check the mounted path:
```
root@pvc-debugger:/# df -h /mnt/models/
Filesystem                                                  Size  Used Avail Use% Mounted on
//stgmodelweights.file.core.windows.net/huggingface-models  100T     0  100T   0% /mnt/models

root@pvc-debugger:/# ls -la /mnt/models/
total 4
drwxrwxrwx 2 root root    0 May  6 18:36 .
drwxr-xr-x 1 root root 4096 May  6 18:36 ..

root@pvc-debugger:/# touch /mnt/models/testfile

root@pvc-debugger:/# ls -la /mnt/models/
total 4
drwxrwxrwx 2 root root    0 May  6 18:36 .
drwxr-xr-x 1 root root 4096 May  6 18:36 ..
-rwxrwxrwx 1 root root    0 May  6 18:46 testfile

exit
```

In [None]:
!kubectl exec pvc-debugger -n nim -- ls -la /mnt/models

In [None]:
!kubectl get pods -n nim    

------

### 4. GPU Infrastructure
- Adds GPU node pool with A100 GPU (standard_nc24ads_a100_v4)
- Installs NVIDIA GPU Operator via Helm
- Configures GPU drivers and container runtime

In [None]:
%%bash
az aks nodepool add \
  --resource-group $RESOURCE_GROUP \
  --cluster-name $CLUSTER_NAME \
  --name gpupool \
  --node-count 1 \
  --skip-gpu-driver-install \
  --node-vm-size standard_nc24ads_a100_v4 \
  --node-osdisk-size 2048 \
  --max-pods 110

#### Add NVIDIA Helm Repository

In [None]:
%%bash
# add nvidia helm repo
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia --pass-credentials && helm repo update


#### Install NVIDIA GPU Operator

With the Helm repo added, we can install the NVIDIA GPU Operator. The GPU node pool above was created with `--skip-gpu-driver-install`, so the nodes have no preinstalled drivers, and the command below uses the chart defaults, which let the operator manage the drivers. If the drivers are already installed on your node pool machines and should not be managed by the operator, disable that with `--set` fields such as `driver.enabled=false`.

In [None]:
!helm install --create-namespace --namespace gpu-operator nvidia/gpu-operator --wait --generate-name


#### Verify Installation

In [None]:
!kubectl get pods -n gpu-operator
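
Once the operator pods are running, the GPU node should advertise an allocatable `nvidia.com/gpu` resource. As a rough sketch, the helper below (a hypothetical `gpu_allocatable`, not part of the original notebook) extracts that count per node from the JSON printed by `kubectl get nodes -o json`.

```python
import json

def gpu_allocatable(nodes_json):
    """Map node name to its allocatable `nvidia.com/gpu` count (0 if absent)."""
    counts = {}
    for node in json.loads(nodes_json).get("items", []):
        allocatable = node.get("status", {}).get("allocatable", {})
        counts[node["metadata"]["name"]] = int(allocatable.get("nvidia.com/gpu", "0"))
    return counts
```

For this lab, the node in `gpupool` is expected to report 1 and the CPU node 0. If the GPU node still reports 0, the operator has most likely not finished installing the driver and device plugin yet.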

------

### 5. NIM Deployment Steps
- **Helm Chart Setup**
   - Fetches NIM LLM Helm chart from NGC
   - Creates necessary NGC secrets for pulling images
   - Sets up registry secrets for nvcr.io access

- **NIM Configuration**
   - Creates custom values file for Helm deployment
   - Configures model repository and version
   - Sets up volume mounts for model caching
   - Configures GPU resource limits

- **Model Deployment**
   - Installs Llama 3.1 8B Instruct model using Helm
   - Mounts PVC for model weight persistence
   - Configures environment variables for caching

Now that we've configured the NGC API key, we can fetch the NIM LLM Helm chart from NGC using the following command:

In [None]:
!helm fetch https://helm.ngc.nvidia.com/nim/charts/nim-llm-1.7.0.tgz --username='$oauthtoken' --password=$NGC_CLI_API_KEY

In order to configure and launch an NVIDIA NIM, it is important to configure the secrets we’ll need to pull all the model artifacts directly from NGC. This can be done using your NGC API key:

In [None]:
%%bash
kubectl create secret docker-registry registry-secret --docker-server=nvcr.io \
  --docker-username='$oauthtoken' --docker-password="$NGC_CLI_API_KEY" -n nim
kubectl create secret generic ngc-api --from-literal=NGC_API_KEY="$NGC_CLI_API_KEY" -n nim

Create the `nim_custom_value.yaml` manifest:

In [None]:
%%bash
# create nim_custom_value.yaml manifest
cat <<EOF > nim_custom_value.yaml
image:
  repository: "nvcr.io/nim/meta/llama-3.1-8b-instruct" # container location
  tag: 1.3.3 # NIM version you want to deploy

model:
  ngcAPISecret: ngc-api  # name of a secret in the cluster that includes a key named NGC_API_KEY and is an NGC API key

# Disable default persistence since we're using pre-created PV/PVC
persistence:
  enabled: false

imagePullSecrets:
  - name: registry-secret # name of a secret used to pull nvcr.io images

resources:
  limits:
    nvidia.com/gpu: 1

env:
  - name: NIM_CACHE_PATH
    value: /mnt/models


extraVolumeMounts:
  - name: hf-models
    mountPath: /mnt/models

extraVolumes:
  - name: hf-models
    persistentVolumeClaim:
      claimName: hf-models-pvc
EOF

Install the llama-3.1-8b-instruct NIM:

In [None]:
!helm install my-nim nim-llm-1.7.0.tgz -f nim_custom_value.yaml --namespace nim

In [None]:
!kubectl get pods -n nim

Describe the pod to check its status and events


In [None]:
!kubectl describe pod my-nim-nim-llm-0 -n nim

Wait until the pod is up and running:

In [None]:
!kubectl get pods -n nim
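
Instead of re-running the cell by hand, the wait can be scripted. The sketch below polls the pod phase through `kubectl get pod ... -o jsonpath={.status.phase}`; the helper names, the timeout, and the polling interval are assumptions chosen for illustration, and the first start can take a while because the model weights are downloaded to the PVC.

```python
import subprocess
import time

def pod_phase(name, namespace):
    """Read a pod's phase (Pending, Running, ...) through kubectl."""
    result = subprocess.run(
        ["kubectl", "get", "pod", name, "-n", namespace, "-o", "jsonpath={.status.phase}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

def wait_for_phase(get_phase, target="Running", timeout=1800, interval=15, sleep=time.sleep):
    """Poll get_phase() until it returns `target`; return False on timeout."""
    waited = 0
    while True:
        if get_phase() == target:
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

A call such as `wait_for_phase(lambda: pod_phase("my-nim-nim-llm-0", "nim"))` blocks until the pod reports `Running`. Note that `Running` is not the same as ready to serve requests; `kubectl wait --for=condition=Ready pod/my-nim-nim-llm-0 -n nim` checks readiness directly.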

In [None]:
!kubectl logs my-nim-nim-llm-0 -n nim

In [None]:
!kubectl get pvc,pv -n nim

In [None]:
!kubectl get svc -n nim

After the NIM is deployed, your Azure file share should look like this:
![](imgs/azureblobstore.png)

In [2]:
# Run this from a terminal, not from the Jupyter notebook
#!kubectl exec -it my-nim-nim-llm-0 -n nim -- /bin/bash

You can double-check that the artifacts are stored in the PVC like this:
```
(base) azeltov@azeltov-mlt nvingest-aks-timeslice % kubectl exec -it my-nim-nim-llm-0 -n nim -- /bin/bash

nim@my-nim-nim-llm-0:/$ ls /mnt/models/
huggingface/ local_cache/ ngc/         testfile     

nim@my-nim-nim-llm-0:/$ ls /mnt/models/ngc/hub/
models--nim--meta--llama-3_1-8b-instruct/ tmp/

nim@my-nim-nim-llm-0:/$ ls -al /mnt/models/ngc/hub/models--nim--meta--llama-3_1-8b-instruct/
total 0
drwxrwxrwx 2 root nim 0 May  6 18:59 .
drwxrwxrwx 2 root nim 0 May  6 18:59 ..
drwxrwxrwx 2 root nim 0 May  6 18:59 blobs
drwxrwxrwx 2 root nim 0 May  6 18:59 refs
drwxrwxrwx 2 root nim 0 May  6 18:59 snapshots

nim@my-nim-nim-llm-0:/$ ls -al /mnt/models/ngc/hub/models--nim--meta--llama-3_1-8b-instruct/snapshots/hf-8c22764-nim1.3b/
total 21
drwxrwxrwx 2 root nim  0 May  6 18:59 .
drwxrwxrwx 2 root nim  0 May  6 18:59 ..
lrwxrwxrwx 1 root nim 44 May  6 19:02 LICENSE.txt -> ../../blobs/3cd9c71fda5c30fd224140dfec0cd6f3
lrwxrwxrwx 1 root nim 44 May  6 19:02 NOTICE.txt -> ../../blobs/c67fa93728e8b46b192ff4f685802d5e
....
lrwxrwxrwx 1 root nim 44 May  6 19:01 tokenizer_config.json -> ../../blobs/523573f406014bef4ce6d8fec12d218c
lrwxrwxrwx 1 root nim 44 May  6 19:02 tool_use_config.json -> ../../blobs/f08779fe481535c7bac34e5534353ea1
```

In [None]:
# run in terminal, otherwise it will block
#kubectl port-forward svc/my-nim-nim-llm 8000:8000 -n nim

------

### 6. Testing and Verification
- **Service Access**
   - Sets up port forwarding to access the NIM service
   - Exposes service on port 8000


With the port-forward from the previous step running in a terminal, test the NIM deployment:

In [None]:
%%bash
curl -X 'POST' \
'http://localhost:8000/v1/chat/completions' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"messages": [
    {
    "content": "You are a polite and respectful chatbot helping people plan a vacation.",
    "role": "system"
    },
    {
    "content": "What should I do for a 4 day vacation in Spain?",
    "role": "user"
    }
],
"model": "meta/llama-3.1-8b-instruct",
"max_tokens": 512,
"top_p": 1,
"n": 1,
"stream": false,
"frequency_penalty": 0.0
}'
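
The same request can be sent from Python with only the standard library. This is a minimal sketch that mirrors the curl call above; the helper names `build_chat_request` and `send` are hypothetical, and it assumes the port-forward to `localhost:8000` is active.

```python
import json
import urllib.request

def build_chat_request(user_prompt, system_prompt, model="meta/llama-3.1-8b-instruct",
                       base_url="http://localhost:8000", max_tokens=512):
    """Build a POST request for the /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "max_tokens": max_tokens,
        "stream": False,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "accept": "application/json"},
        method="POST",
    )

def send(request, timeout=120):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Passing the result of `build_chat_request(...)` to `send` returns the parsed response. Assuming the OpenAI-style response layout, the reply text should be under `["choices"][0]["message"]["content"]`.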

In [None]:
!curl -v http://localhost:8000/v1/models
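
To check the served model name programmatically, the response body can be parsed. The sketch below assumes the endpoint returns an OpenAI-style list object with a `data` array of entries that each carry an `id`; `model_ids` is a hypothetical helper.

```python
import json

def model_ids(models_response):
    """Extract model ids from the JSON body returned by GET /v1/models."""
    return [entry["id"] for entry in json.loads(models_response).get("data", [])]
```

The id returned here is the value to use in the `model` field of the chat completion request.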

------------

### 7. Cleanup


In [None]:
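# Uncomment one of the following: stop the cluster but keep its resources,
# or delete the whole resource group (cluster, storage account, and file share).
#!az aks stop --resource-group $RESOURCE_GROUP --name $CLUSTER_NAME
#!az group delete --resource-group $RESOURCE_GROUP --yes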
