<div class="alert alert-block alert-success">
Let’s strive to create better notebooks for Blueprints. It doesn’t take much extra effort, and it pays off greatly. This template provides NVIDIA standards so that we can all adopt better habits. There are a few simple rules for writing effective notebooks:

- Name your notebooks intuitively. If notebooks need to be executed in a certain order, use numbers within the title.
- Add clear yet concise explanations of which NVIDIA NIMs are used, what your code does, how it works, what the most important results are, and what conclusions were drawn.
- Use the markdown cells effectively to describe what each code cell is doing. It’s not just the code that speaks; the text around it explains why the code is essential, what the results signify, and why a specific coding approach was taken.

Since there should be a deployment notebook for every blueprint, this notebook serves as a template for best practices. Please make a copy of this notebook and modify the content within the predetermined section headings as appropriate.


# Introduction

> Describe what is achieved within the notebook. This should be very brief but provide enough context to convey the blueprint's goal.
> 
This notebook will deploy the AI virtual assistant for customer service NIM Agent Blueprint. You will install the necessary prerequisites, spin up the NVIDIA NeMo Retriever™ and NVIDIA NIM™ microservices on a single node, and download sample data. Once deployed, you will have a fully functional reference UI as well as sample code that you can use to personalize Q&A responses based on structured and unstructured data, such as order history and product details.


# Getting Started
>[Prerequisites](#Prerequisites)  
>[Spin Up Blueprint](#Spin-Up-Blueprint)  
>[Download Sample Data](#Download-Sample-Data)  
>[Data Ingestion](#Data-Ingestion)  
>[Validate Deployment](#Validate-Deployment)  
>[API Reference](#API-Reference)  
>[Next Steps](#Next-Steps)  
>[Shutting Down Blueprint](#Stopping-Services-and-Cleaning-Up)  
>[Appendix](#Appendix)  
________________________


## Prerequisites

### Clone repository and install software

1. **Clone** <name> Git repository

In [None]:
!git clone ssh://git@gitlab-master.nvidia.com:12051/chat-labs/OpenSource/ai-virtual-assistant.git

2. Install **[Docker](https://docs.docker.com/engine/install/ubuntu/)**

<div class="alert alert-block alert-success">
    <b>Tip:</b> Ensure the Docker Compose plugin version is 2.29.1 or higher. Run <code>docker compose version</code> to confirm. Refer to "Install the Compose plugin" in the Docker documentation for more information.

3. Install **[NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-the-nvidia-container-toolkit)** to configure Docker for GPU-accelerated containers, such as Milvus and NVIDIA NIM.
 If you are using a system deployed with Brev, you can skip this step because Brev systems come with NVIDIA Container Toolkit preinstalled.
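
Before moving on, the Docker Compose version requirement from the tip in step 2 can also be checked from a code cell. The following is a minimal sketch, not part of the blueprint: the helper names are hypothetical, and it assumes that `docker compose version` prints a line containing a `major.minor.patch` version number.

```python
import re
import subprocess

MIN_COMPOSE_VERSION = (2, 29, 1)  # minimum version stated in the tip in step 2


def parse_compose_version(text):
    """Extract (major, minor, patch) from text such as 'Docker Compose version v2.29.1'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in: {text!r}")
    return tuple(int(part) for part in match.groups())


def compose_is_recent_enough(text, minimum=MIN_COMPOSE_VERSION):
    """Return True when the reported version is at least `minimum`."""
    return parse_compose_version(text) >= minimum


def installed_compose_version_text():
    """Run `docker compose version` and return its output (requires Docker)."""
    result = subprocess.run(
        ["docker", "compose", "version"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

With Docker installed, `compose_is_recent_enough(installed_compose_version_text())` should return `True` on a system that meets the requirement.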



<div class="alert alert-block alert-info">
    <b>Note:</b> After installing the toolkit, follow the instructions in the Configure Docker section in the NVIDIA Container Toolkit documentation.

<div class="alert alert-block alert-success">
    <b>Tip:</b> Step 3 is optional because, by default, the blueprint uses the NIM API endpoints hosted on the NVIDIA API Catalog for the LLM, embedding, and reranking models. Once you are familiar with the blueprint, you will most likely want to deploy the NIMs on-prem so that you can customize them for your use case.

### Get API Keys

#### Let's start by logging into the NVIDIA Container Registry. 
 
An NVIDIA NGC API key is required to use this blueprint. It is needed to log in to the NVIDIA container registry, nvcr.io, and to pull the secure container images used in this NVIDIA NIM Blueprint.
Refer to [Generating NGC API Keys](https://docs.nvidia.com/ngc/gpu-cloud/ngc-user-guide/index.html#generating-api-key) in the NVIDIA NGC User Guide for more information.



Authenticate with the NVIDIA Container Registry with the following command:

In [None]:
!docker login nvcr.io
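
Because `docker login` prompts interactively, it can be awkward to run from a notebook cell. As a hedged alternative, the sketch below passes the key through Docker's `--password-stdin` option. The helper names are hypothetical, and the commented usage line assumes your NGC API key is already stored in an `NGC_API_KEY` environment variable. Note that the username is the literal string `$oauthtoken`.

```python
import os
import subprocess

NGC_REGISTRY = "nvcr.io"
NGC_USERNAME = "$oauthtoken"  # literal string, not a shell variable


def build_login_command(registry=NGC_REGISTRY, username=NGC_USERNAME):
    """Build a `docker login` command that reads the password from stdin."""
    return ["docker", "login", registry, "--username", username, "--password-stdin"]


def docker_login(api_key, registry=NGC_REGISTRY):
    """Log in to the registry, sending the key on stdin instead of the command line."""
    if not api_key:
        raise ValueError("NGC API key is empty")
    subprocess.run(build_login_command(registry), input=api_key, text=True, check=True)


# Example (assumes NGC_API_KEY is set in this kernel's environment):
# docker_login(os.environ["NGC_API_KEY"])
```

Sending the key on stdin keeps it out of the command arguments and out of the saved cell source.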

<div class="alert alert-block alert-info">
    <b>Note:</b> Use $oauthtoken (including the leading dollar sign) as the username and your API key as the password. This special username indicates that you will authenticate with an API key and not a user name and password.

#### Next, let's set the NVIDIA API Catalog key. 

This NVIDIA API Catalog key is used to access the cloud-hosted models in the API Catalog.

You can use different model API endpoints with the same API key.

1. Navigate to **[NVIDIA API Catalog](https://build.nvidia.com/explore/discover)**.

2. Select a model, such as llama3-8b-instruct.
   

3. Select an **Input** option. The following example shows a model that offers a Docker option. Not all models offer this option, but all include a “Get API Key” link.

<img src="https://docscontent.nvidia.com/dims4/default/d6307a8/2147483647/strip/true/crop/1920x919+0+0/resize/2880x1378!/format/webp/quality/90/?url=https%3A%2F%2Fk3-prod-nvidia-docs.s3.us-west-2.amazonaws.com%2Fbrightspot%2Fsphinx%2F00000192-bfa6-da2c-a1f2-ffbf41aa0000%2Fnim%2Flarge-language-models%2Flatest%2F_images%2Fbuild_docker_tab.png" />

4. Click **Get API Key**.

<img src="https://docscontent.nvidia.com/dims4/default/c6e2096/2147483647/strip/true/crop/1920x919+0+0/resize/2880x1378!/format/webp/quality/90/?url=https%3A%2F%2Fk3-prod-nvidia-docs.s3.us-west-2.amazonaws.com%2Fbrightspot%2Fsphinx%2F00000192-bfa6-da2c-a1f2-ffbf41aa0000%2Fnim%2Flarge-language-models%2Flatest%2F_images%2Fbuild_get_api_key.png" />

5. Select **Generate Key**.

<img src="https://docscontent.nvidia.com/dims4/default/e7c4057/2147483647/strip/true/crop/1920x919+0+0/resize/2880x1378!/format/webp/quality/90/?url=https%3A%2F%2Fk3-prod-nvidia-docs.s3.us-west-2.amazonaws.com%2Fbrightspot%2Fsphinx%2F00000192-bfa6-da2c-a1f2-ffbf41aa0000%2Fnim%2Flarge-language-models%2Flatest%2F_images%2Fbuild_generate_key.png" />

6. **Copy your key** and store it in a secure place. Do not share it.

<img src="https://docscontent.nvidia.com/dims4/default/4b0710a/2147483647/strip/true/crop/1920x919+0+0/resize/2880x1378!/format/webp/quality/90/?url=https%3A%2F%2Fk3-prod-nvidia-docs.s3.us-west-2.amazonaws.com%2Fbrightspot%2Fsphinx%2F00000192-bfa6-da2c-a1f2-ffbf41aa0000%2Fnim%2Flarge-language-models%2Flatest%2F_images%2Fbuild_copy_key.png" />

<div class="alert alert-block alert-success">
    <b>Tip:</b> The key begins with the letters nvapi-.

7. Export the API key as an environment variable.

In [None]:
import getpass
import os
if not os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    nvidia_api_key = getpass.getpass("Enter your NVIDIA API key: ")
    assert nvidia_api_key.startswith("nvapi-"), f"{nvidia_api_key[:5]}... is not a valid key"
    os.environ["NVIDIA_API_KEY"] = nvidia_api_key

## Spin Up Blueprint
Docker Compose scripts are provided to spin up the microservices on a single node. The Docker Compose YAML file starts the agents as well as the dependent microservices. This may take up to **15 minutes** to complete.


<div class="alert alert-block alert-success">
    <b>Tip:</b> Refer to the deploy/compose/docker-compose.yaml for complete details.

In [None]:
!docker compose -f ai-virtual-assistant-main/deploy/compose/docker-compose.yaml up -d --build

<div class="alert alert-block alert-success">
    <b>Tip:</b> If you would like to monitor progress, refer to https://docs.docker.com/reference/cli/docker/compose/logs/.

<div class="alert alert-block alert-info">
    <b>Note:</b> By default, the blueprint uses the NVIDIA API Catalog hosted endpoints for LLM, embedding and reranking models.

To validate the deployment of the blueprint, execute the following command to ensure the containers are running.

In [None]:
!docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"
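
To check the result programmatically rather than by eye, the output of `docker ps` can be parsed. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, and it relies on Docker reporting running containers with a status that starts with `Up`.

```python
import subprocess


def parse_container_status(ps_output):
    """Map container name -> status from tab-separated 'name<TAB>status' lines."""
    statuses = {}
    for line in ps_output.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        statuses[name.strip()] = status.strip()
    return statuses


def containers_not_up(statuses):
    """Return the sorted names whose status does not start with 'Up'."""
    return sorted(name for name, status in statuses.items() if not status.startswith("Up"))


def current_container_status():
    """Query Docker for all containers, including stopped ones (requires Docker)."""
    result = subprocess.run(
        ["docker", "ps", "--all", "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True, text=True, check=True,
    )
    return parse_container_status(result.stdout)
```

Calling `containers_not_up(current_container_status())` returns an empty list when every container on the host is running. Containers unrelated to the blueprint are included as well, so filter by name if the host runs other workloads.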

The command prints a table listing the ID, name, and status of each running container.

<div class="alert alert-block alert-info">
    <b>Note:</b> The NeMo microservices are not listed because hosted endpoints are being used for the LLM, embedding, and reranking models. Once you are familiar with the blueprint and want to deploy these NIM microservices locally, refer to the Appendix.

## Download Sample Data
This blueprint comes with synthetic sample data representing a typical customer service function, including customer profiles and order histories (structured data). Next, you will download technical product manuals (unstructured data) from the internet into the data/manuals_pdf folder.

In [None]:
# Run this script to download the manuals listed in the specified txt file
! ai-virtual-assistant-main/data/download.sh ai-virtual-assistant-main/data/list_manuals.txt 

Verify the manuals have been downloaded.

In [None]:
! ls ai-virtual-assistant-main/data/manuals_pdf
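
As a complement to `ls`, the check can be scripted so that missing or empty downloads stand out. The following is a minimal sketch: the helper name is hypothetical, and it only inspects file names and sizes, not whether each PDF is valid.

```python
from pathlib import Path


def summarize_manuals(folder):
    """Return (pdf_names, empty_names) for the PDF files found in `folder`."""
    folder = Path(folder)
    if not folder.is_dir():
        raise FileNotFoundError(f"{folder} does not exist; run the download step first")
    pdfs = sorted(p for p in folder.iterdir() if p.suffix.lower() == ".pdf")
    empty = [p.name for p in pdfs if p.stat().st_size == 0]
    return [p.name for p in pdfs], empty
```

For example, `summarize_manuals("ai-virtual-assistant-main/data/manuals_pdf")` returns the list of downloaded PDFs and the subset that are zero bytes long, which usually indicates a failed download.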

## Data Ingestion

Go to the notebooks folder inside the repo and run through the `ingest_data.ipynb` notebook.

This notebook does the following: 
1. Uploads unstructured data (PDF) to Milvus DB. These are the PDFs downloaded above, which contain product information.
2. Uploads structured data (CSV) to Postgres DB. These CSV files contain information about the gear store (e.g. product names, categories, prices) and previous orders (e.g. order ID, order date, return status).

## Validate Deployment
The blueprint includes a reference UI and an AI assistant (developed using the LangGraph framework) that leverages sub-agents to handle queries from both structured and unstructured data sources. Let's make sure the API endpoint and the UI are up and running.

1. Create a new session using the create_session API at `http://<HOST-IP>:8081/docs#/default/create_session_create_session_get`

<div class="alert alert-block alert-info">
    <b>Note:</b> If you are using an environment deployed with Brev, make sure to expose port 8081 on your Brev console. An HTTP URL will be generated for each public port, so open the link and append `/docs#/default/create_session_create_session_get` after the port number.
    

2. To test queries, visit the UI at `http://<HOST-IP>:8090`

Ensure you specify user_id and session_id in their respective fields. Use the session_id from the create_session response and a user_id from order.csv.

<div class="alert alert-block alert-info">
    <b>Note:</b> Again, if you are using an environment deployed with Brev, make sure to expose port 8090 on your Brev console and use the HTTP URL created. 

3. After testing queries, end the session at `http://<HOST-IP>:8081/docs#/default/end_session_end_session_get`

4. Explore the analytics server API at `http://<HOST-IP>:8082/docs#/`

This server offers three APIs:

- `/sessions` - Lists all sessions from the last k hours
- `/session/summary` - Provides summary and sentiment analysis for a given session's conversation
- `/session/conversation` - Offers sentiment analysis for individual queries and responses
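
The same endpoints can be called from a code cell instead of the browser. The sketch below is illustrative only: the helper names are hypothetical, the `/create_session` and `/end_session` paths are inferred from the documentation links in the steps above, and the `hours` query parameter in the commented example is a placeholder. Check the interactive docs pages for the actual parameter names and response fields.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AGENT_PORT = 8081      # agent server port used in the steps above
ANALYTICS_PORT = 8082  # analytics server port used in the steps above


def build_url(host, port, path, params=None):
    """Compose http://host:port/path, appending a query string when params are given."""
    url = f"http://{host}:{port}/{path.lstrip('/')}"
    if params:
        url += "?" + urlencode(params)
    return url


def get_json(url, timeout=30):
    """Issue a GET request and decode the JSON body (requires the blueprint to be running)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example calls (replace <HOST-IP>; the "hours" parameter name is hypothetical):
# session = get_json(build_url("<HOST-IP>", AGENT_PORT, "/create_session"))
# recent = get_json(build_url("<HOST-IP>", ANALYTICS_PORT, "/sessions", {"hours": 2}))
```

Keeping URL construction separate from the request makes it easy to print and inspect a URL before calling it.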

## API Reference

For detailed API references, please refer to the following locations in the Blueprint repository:
- Summary & Conversation APIs:
`./docs/api_references/analytics_server.json`

- Generate API:
`./docs/api_references/agent_server.json`
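
To get a quick overview of these reference files, their operations can be listed from a code cell. This sketch assumes the JSON files follow the OpenAPI layout, with a top-level `paths` object keyed by route and HTTP method, which has not been verified here. The helper names are hypothetical.

```python
import json

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_operations(spec):
    """Return (METHOD, path, summary) tuples from an OpenAPI-style dict, sorted by path."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        for method, details in item.items():
            # Skip non-method keys such as path-level "parameters"
            if method.lower() in HTTP_METHODS:
                operations.append((method.upper(), path, details.get("summary", "")))
    return sorted(operations, key=lambda op: (op[1], op[0]))


def load_spec(filename):
    """Read a JSON specification file from disk."""
    with open(filename, encoding="utf-8") as handle:
        return json.load(handle)
```

For example, `list_operations(load_spec("./docs/api_references/analytics_server.json"))` would list the analytics routes when run from the repository root.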


## Next Steps
Go to the Synthetic Data Generation notebook. This notebook demonstrates how to use the nemotron-4-340b-instruct model to generate the synthetic data used in this blueprint. It uses the NVIDIA Gear Store data as a source of product data, then creates a sample customer set and a realistic order history based on that data.
You can follow a similar process to create your own data, which you can then upload to the knowledge base in the Data Ingestion notebook.

## Stopping Services and Cleaning Up

To shut down the microservices, run the following command:

In [None]:
! docker compose -f ai-virtual-assistant-main/deploy/compose/docker-compose.yaml down

## Appendix 

### Deploy NIM microservices locally

In [None]:
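import os

# Note: `!export VAR=...` runs in a throwaway subshell and does not persist, so the
# variables are set through os.environ instead; later `!` commands inherit them.

# Create model directory to download models from NGC
model_directory = os.path.expanduser("~/.cache/models/")
os.makedirs(model_directory, exist_ok=True)
os.environ["MODEL_DIRECTORY"] = model_directory

# Export your NGC API key; note that it is not the NVIDIA_API_KEY from build.nvidia.com
os.environ["NGC_API_KEY"] = "<ngc-api-key>"
os.environ["USERID"] = f"{os.getuid()}:{os.getgid()}"

# Export the paths where the NIMs are hosted
# LLM server path
os.environ["APP_LLM_SERVERURL"] = "nemollm-inference:8000"
# Embedding server path
os.environ["APP_EMBEDDINGS_SERVERURL"] = "nemollm-embedding:8000"
# Re-ranking model path
os.environ["APP_RANKING_SERVERURL"] = "ranking-ms:8000"

!docker compose -f deploy/compose/docker-compose.yaml --profile local-nim up -d --build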

To validate, execute the following command to ensure the containers are running.

In [None]:
! docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"

The command prints a table listing the ID, name, and status of each running container.

<div class="alert alert-block alert-info">
    <b>Note:</b> By default, GPU IDs 0-3 are for LLM, 4 for the embedding model, and 5 for the reranking model.
    
>To change the GPUs used for NIM deployment, set the following environment variables:

>>**LLM_MS_GPU_ID**: Update this to specify the LLM GPU IDs (e.g., 0,1,2,3).

>>**EMBEDDING_MS_GPU_ID**: Change this to set the embedding GPU ID.
>>
>>**RANKING_MS_GPU_ID**: Modify this to adjust the reranking model GPU ID.
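
The GPU variables above can be set from a code cell before running `docker compose`. The following is a minimal sketch: the helper names are hypothetical, the defaults mirror the note above, and the check that rejects a GPU assigned to two services is a conservative assumption rather than a documented requirement of the blueprint.

```python
import os

# Defaults taken from the note above: GPUs 0-3 for the LLM, 4 for embedding, 5 for reranking.
DEFAULT_GPU_IDS = {
    "LLM_MS_GPU_ID": [0, 1, 2, 3],
    "EMBEDDING_MS_GPU_ID": [4],
    "RANKING_MS_GPU_ID": [5],
}


def gpu_environment(assignments=None):
    """Return env-var strings such as {'LLM_MS_GPU_ID': '0,1,2,3'}, rejecting shared GPUs."""
    merged = {**DEFAULT_GPU_IDS, **(assignments or {})}
    seen = {}
    for variable, gpu_ids in merged.items():
        for gpu_id in gpu_ids:
            if gpu_id in seen:
                raise ValueError(f"GPU {gpu_id} is assigned to both {seen[gpu_id]} and {variable}")
            seen[gpu_id] = variable
    return {variable: ",".join(str(i) for i in ids) for variable, ids in merged.items()}


def apply_gpu_environment(assignments=None):
    """Set the variables in os.environ so later `!docker compose` calls inherit them."""
    os.environ.update(gpu_environment(assignments))
```

For example, `apply_gpu_environment({"LLM_MS_GPU_ID": [0, 1]})` would restrict the LLM to GPUs 0 and 1 while keeping the other defaults.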