# AI Virtual Assistant for Customer Service

## Contents

* [Overview](#Overview)
* [Software Components](#Software-Components)
* [Key Functionality](#Key-Functionality)
* [How it Works](#How-it-works)
* [Key Components](#Key-Components)
* [Prerequisites](#Prerequisites)
* [Deployment (hands-on starts here)](#Deployment)
* [Apply for an NVIDIA NGC Account](#Apply-for-an-NVIDIA-NGC-Account)
* [Docker Compose Check](#Docker-Compose-check)
* [Clone the Repository & Set Up Environment](#Clone-the-Repository-&-Set-Up-Environment)
* [Build the Docker Containers](#Build-the-Docker-containers)
* [Download Data](#Download-data)
* [Ingest Data](#Ingest-data)
* [Exposing the Interface for Testing](#Exposing-the-Interface-for-Testing)


## Overview

This Blueprint showcases an AI virtual assistant built with [NVIDIA NIM microservices](https://build.nvidia.com/nim).

This blueprint is a reference solution for a text-based virtual assistant. Companies want to enhance their customer service operations by integrating knowledge bases into AI assistants, but traditional approaches often fail to deliver responses to complex customer queries that are context-aware, secure, and real-time all at once. The result is longer resolution times, lower customer satisfaction, and potential data exposure. A centralized knowledge base that integrates seamlessly with internal applications and call center tools is vital to improving the customer experience while maintaining data governance.

The AI virtual assistant for customer service NVIDIA AI Blueprint, powered by NVIDIA NeMo Retriever™ and NVIDIA NIM™ microservices together with retrieval-augmented generation (RAG), offers a streamlined way to enhance customer support. It implements context-aware, multi-turn conversations with general and personalized Q&A responses based on structured and unstructured data, such as order history and product details.

This notebook gives you insight into the key components and walks you through the architecture and deployment step by step. Note that this walkthrough covers only the Docker Compose deployment. The [code repository](https://github.com/NVIDIA-AI-Blueprints/ai-virtual-assistant) contains additional information and instructions for other deployment options (e.g., Helm chart deployment).

## Software Components

- NVIDIA NIM microservices
    - Response Generation (Inference)
        - NIM of meta/llama-3.1-70b-instruct
        - NIM of nvidia/nv-embedqa-e5-v5
        - NIM of nvidia/rerank-qa-mistral-4b
    - Synthetic Data Generation for reference
        - NIM of Nemotron4-340B
- Orchestrator Agent - LangGraph based
- Text Retrievers - LangChain
- Structured Data (CSV) Ingestion - Postgres Database
- Unstructured Data (PDF) Ingestion - Milvus Database (Vector GPU-optimized)

Docker Compose scripts are provided which spin up the microservices on a single node. When ready for a larger-scale deployment, you can use the included Helm charts to spin up the necessary microservices. You will use sample Jupyter notebooks with the JupyterLab service to interact with the code directly.

The Blueprint contains sample use-case data for a retail product catalog and customer data with purchase history, but developers can build on this blueprint by customizing the RAG application for their specific use case. A sample customer service agent user interface and an API-based analytics server for conversation summaries and sentiment are also included.

## Key Functionality

- Personalized Responses: Handles structured and unstructured customer queries (e.g., order details, spending history).
- Multi-Turn Dialogue: Offers context-aware, seamless interactions across multiple questions.
- Custom Conversation Style: Adapts text responses to reflect corporate branding and tone.
- Sentiment Analysis: Analyzes real-time customer interactions to gauge sentiment and adjust responses.
- Multi-Session Support: Allows for multiple user sessions with conversation history and summaries.
- Data Privacy: Integrates with on-premises or cloud-hosted knowledge bases to protect sensitive data.
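
Multi-turn dialogue means each request to the LLM carries the recent conversation along with the new question, so follow-ups like "When will it arrive?" can be resolved. Below is a minimal sketch of assembling such a request in the OpenAI-style chat `messages` format (the same format the API test later in this notebook uses). The `build_messages` helper and its `max_turns` window are hypothetical illustrations, not the blueprint's actual history handling.

```python
from typing import Dict, List

Message = Dict[str, str]

def build_messages(system_prompt: str, history: List[Message], user_query: str,
                   max_turns: int = 3) -> List[Message]:
    """Combine a system prompt, the most recent turns, and the new user query.

    One turn = one user message plus one assistant reply, so at most
    2 * max_turns history messages are kept (a hypothetical window size).
    """
    recent = history[-2 * max_turns:] if max_turns > 0 else []
    return ([{"role": "system", "content": system_prompt}]
            + recent
            + [{"role": "user", "content": user_query}])

# Hypothetical conversation: the follow-up "it" only makes sense with the history
history = [
    {"role": "user", "content": "Where is my order 1001?"},
    {"role": "assistant", "content": "Your order shipped yesterday."},
]
msgs = build_messages("You are a helpful retail assistant.", history, "When will it arrive?")
```

Trimming to a fixed window keeps the prompt within the model's context length; summarizing older turns is a common alternative.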

By integrating NVIDIA NIM and RAG, the system empowers developers to build customer support solutions that can provide faster and more accurate support while maintaining data privacy.

## How it works

This blueprint uses a combination of retrieval-augmented generation and large language models to deliver an intelligent, context-aware virtual assistant for customer service. It connects to both structured data (like customer profiles and order histories) and unstructured data (like product manuals and FAQs) so that it can find and present relevant information in real time.

The process works as follows:

- User Query: The customer asks a question in natural language.
- Data Retrieval: The system retrieves relevant data—such as support documents or order details—by embedding and searching through internal databases, product manuals, and FAQs.
- Contextual Reasoning: A large language model uses these retrieved details to generate a helpful, coherent, and contextually appropriate response.
- Additional Capabilities: Tools like sentiment analysis gauge the user's satisfaction, and conversation summaries help supervisors quickly review interactions.
- Continuous Improvement: Feedback collected from interactions can be used to refine the model's accuracy and efficiency over time.

The end result is a virtual assistant that can understand complex questions, find the right information, and provide personalized, human-like responses.
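
The retrieve-then-generate loop described above can be sketched in a few lines. This toy version ranks passages by word overlap instead of using the NeMo Retriever embedding and reranking NIMs, and the sample passages are made up; it only illustrates how retrieved context is placed into the prompt before the LLM is called.

```python
import re
from typing import List, Set

def tokenize(text: str) -> Set[str]:
    """Lowercase word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, documents: List[str], top_k: int = 2) -> List[str]:
    """Return the top_k documents sharing the most words with the query."""
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(query: str, context: List[str]) -> str:
    """Place the retrieved passages ahead of the question to ground the answer."""
    joined = "\n".join(f"- {passage}" for passage in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

# Made-up knowledge-base passages
docs = [
    "You can return an item within 30 days of delivery.",
    "The trail backpack has a 40 liter capacity.",
    "Orders ship within two business days.",
]
query = "How many days do I have to return an item?"
context = retrieve(query, docs, top_k=1)
prompt = build_prompt(query, context)  # this string would be sent to the LLM
```

In the real pipeline, the overlap score is replaced by vector similarity in Milvus followed by reranking, but the shape of the final prompt is the same idea.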

### Key Components

The detailed architecture consists of the following components:

**Sample Data** The blueprint comes with synthetic sample data representing a typical customer service function, including customer profiles, order histories (structured data), and technical product manuals (unstructured data). A notebook is provided to guide users on how to ingest both structured and unstructured data efficiently.

- Structured data: customer profiles and order history
- Unstructured data: product manuals, product catalogs, and FAQs

**AI Agent** This reference solution implements three sub-agents using the open-source LangGraph framework. These sub-agents handle common customer service tasks for the included sample dataset. They rely on the Llama 3.1 70B model and NVIDIA NIM microservices to generate responses, convert natural language into SQL queries, and assess the sentiment of the conversation.

**Structured Data Retriever** Works in tandem with a Postgres database and Vanna.AI to fetch relevant data based on user queries.
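
To make the structured path concrete, here is a sketch of the kind of SQL a text-to-SQL step could produce for a customer question. It uses an in-memory SQLite database in place of Postgres, a subset of the `customer_data` columns created later in this notebook, and made-up rows; the SQL that Vanna.AI actually generates in the blueprint may look different.

```python
import sqlite3

# In-memory SQLite stands in for the blueprint's Postgres database (an assumption
# for illustration); the columns mirror a subset of the customer_data table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer_data (
        customer_id INTEGER, order_id INTEGER, product_name TEXT,
        order_date TEXT, order_status TEXT
    )
""")
conn.executemany(
    "INSERT INTO customer_data VALUES (?, ?, ?, ?, ?)",
    [  # made-up sample rows
        (1001, 1, "Trail Backpack", "2024-05-01", "Delivered"),
        (1001, 2, "Rain Jacket", "2024-06-12", "Shipped"),
        (1002, 3, "Camp Stove", "2024-06-20", "Processing"),
    ],
)

# A query of the kind a text-to-SQL step might produce for
# "What is the status of my most recent order?" asked by customer 1001.
# The customer ID is bound as a parameter rather than pasted into the SQL text.
sql = """
    SELECT product_name, order_status FROM customer_data
    WHERE customer_id = ? ORDER BY order_date DESC LIMIT 1
"""
latest = conn.execute(sql, (1001,)).fetchone()
conn.close()
```

Scoping every generated query to the current customer's ID, as in the `WHERE` clause here, is one way to keep a customer from retrieving another customer's orders.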

**Unstructured Data Retriever** Processes unstructured data (e.g., PDFs, FAQs) by chunking it, creating embeddings using the NeMo Retriever embedding NIM, and storing it in Milvus for fast retrieval.
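
Chunking splits a long document into pieces small enough to embed individually. The sketch below shows fixed-size character windows with overlap, so a sentence cut at a boundary still appears intact in one chunk. The `chunk_text` helper and its default sizes are hypothetical; the blueprint's splitter may work on tokens and use different settings.

```python
from typing import List

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end of the text
            break
    return chunks

chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
```

Each chunk would then be embedded and stored in Milvus alongside its source document.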

**Analytics and Admin Operations** To support operational requirements, the blueprint includes reference code for managing key administrative tasks:

- Storing conversation histories
- Generating conversation summaries
- Conducting sentiment analysis on customer interactions

These features help customer service teams efficiently monitor and evaluate interactions for quality and performance.

**Data Flywheel** The blueprint includes a robust set of APIs, some of which are explicitly designed for feedback collection (identified by 'feedback' in their URLs). These APIs gather the data needed for continuous model improvement, forming a feedback loop or 'data flywheel.' While this data can be used to refine the model's accuracy and cost-effectiveness over time, note that the APIs do not perform model fine-tuning themselves.

**Summary** In summary, this NVIDIA AI Blueprint offers a comprehensive solution for building intelligent, generative AI-powered virtual assistants for customer service, leveraging structured and unstructured data to deliver personalized and efficient support. It includes all necessary tools and guidance to deploy, monitor, and continually improve the solution in real-world environments.

![Blueprint Diagram](https://github.com/NVIDIA-AI-Blueprints/ai-virtual-assistant/raw/main/docs/imgs/IVA-blueprint-diagram-r5.png)

## Prerequisites

### Docker compose

#### System requirements

An Ubuntu 20.04 or 22.04 machine with sudo privileges.

Install the software requirements:
- Install Docker Engine and Docker Compose. Refer to the instructions for Ubuntu.
  - Ensure the Docker Compose plugin version is 2.29.1 or higher. Run `docker compose version` to confirm.
  - Refer to Install the Compose plugin in the Docker documentation for more information.
- To configure Docker for GPU-accelerated containers, install the NVIDIA Container Toolkit.
- Install git.

By default, the provided configurations use GPU-optimized databases such as Milvus.


# Deployment

## Install required modules

Restart the kernel after running the following cell.

In [None]:
!pip install openai

## Apply for an NVIDIA NGC Account
To use the NVIDIA AI Foundation Models, you need an account on [NVIDIA NGC](https://ngc.nvidia.com). NVIDIA GPU Cloud (NGC) is a portal for GPU-accelerated software optimized for deep learning and scientific computing.


---

### Steps
1. Go to the [NVIDIA NGC page](https://ngc.nvidia.com).
2. If you are not logged in, click the "Welcome Guest" icon in the top-right corner of the page and select "Sign In / Sign Up".
3. Log in with your NVIDIA email.
4. Select NV-Developer as your NVIDIA Cloud Account and agree to the terms and conditions.
5. Click the profile icon at the top right and select Setup.
6. Click Generate Personal Key.
7. In the popup window that appears:
   * Enter a name for your personal key.
   * Select an expiration date of 12 months.
   * Select "Cloud Functions" under the "Services Included" field.
8. Copy and save this API key. __IMPORTANT__: Keep it somewhere safe, because it is displayed only once. If you lose your key, you'll have to create a new one.

In [None]:
import getpass
import os

def set_ngc_api_key():
    """Prompt for and set the NVIDIA API key if not already valid."""
    while True:
        key = getpass.getpass("Enter your NVIDIA API key: ")
        if key.startswith("nvapi-"):
            break
        print("Invalid API key. Please try again.")

    return key

# Check if the key is already set and valid
NVIDIA_API_KEY = os.environ.get("NVIDIA_API_KEY", "")

if not NVIDIA_API_KEY.startswith("nvapi-"):
    print("NVIDIA API Key is missing or invalid.")
    NVIDIA_API_KEY = set_ngc_api_key()
else:
    print("NVIDIA API Key is already set.")
    if input("Would you like to enter a different key? (yes/no): ").strip().lower() in ["yes", "y"]:
        NVIDIA_API_KEY = set_ngc_api_key()

# Set all related environment variables
for var in ["NVIDIA_API_KEY", "NGC_CLI_API_KEY", "NGC_API_KEY"]:
    os.environ[var] = NVIDIA_API_KEY

print("API keys configured successfully.")

In [None]:
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=NVIDIA_API_KEY,
)

completion = client.chat.completions.create(
  model="nvdev/meta/llama-3.3-70b-instruct",
  messages=[{"role":"user","content":"Write a limerick about the wonders of GPU computing."}],
  temperature=0.2,
  top_p=0.7,
  max_tokens=1024,
  stream=True
)

for chunk in completion:
  if chunk.choices[0].delta.content is not None:
    print(chunk.choices[0].delta.content, end="")

## Docker Compose check
Ensure the Docker Compose plugin version is 2.29.1 or higher.

In [None]:
# Print the installed Docker Compose version (2.29.1 or higher is required)
!docker compose version
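
The check above relies on reading the version by eye. A minimal sketch of an automated check, assuming the installed plugin supports `docker compose version --short` (which prints only the version number); `parse_version` and `compose_is_recent_enough` are hypothetical helpers.

```python
import re
import subprocess
from typing import Tuple

MIN_COMPOSE_VERSION = (2, 29, 1)

def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the first MAJOR.MINOR.PATCH triple, e.g. from 'v2.29.1-desktop.1'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())

def compose_is_recent_enough(minimum: Tuple[int, ...] = MIN_COMPOSE_VERSION) -> bool:
    """Run `docker compose version --short` and compare it with the minimum."""
    out = subprocess.run(["docker", "compose", "version", "--short"],
                         capture_output=True, text=True, check=True).stdout
    # Tuples compare element by element, so (2, 30, 0) >= (2, 29, 1)
    return parse_version(out) >= minimum
```

`compose_is_recent_enough()` raises `CalledProcessError` if the Compose plugin is missing and `FileNotFoundError` if the `docker` binary is not on the `PATH`.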

## Clone the Repository & Set Up Environment

In [None]:
#  Clone the Repository
!git clone https://github.com/NVIDIA-AI-Blueprints/ai-virtual-assistant

The following cell ensures that the notebook is running inside the `ai-virtual-assistant` directory and changes into it if it is not.

In [None]:
import os

# Change into the cloned repository unless we are already inside it
if os.path.basename(os.getcwd()) != "ai-virtual-assistant":
    os.chdir("ai-virtual-assistant")

os.getcwd()

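Next, log in to the NGC container registry (`nvcr.io`). Piping the key through `--password-stdin` avoids passing it as a command-line argument, which Docker warns is insecure.

In [None]:
!echo "$NGC_API_KEY" | docker login nvcr.io -u '$oauthtoken' --password-stdin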

Let's update our `docker-compose.yaml` to use our [internal NVIDIA inference endpoints](https://nvidia.sharepoint.com/sites/nvbuild/SitePages/Endpoints-for-Internal-Development-Use.aspx).

NVDev internal endpoints are replicas of the public API endpoints on build.nvidia.com: they follow the same OpenAPI specification and work with the same libraries and interfaces.

To save time, we'll download a preconfigured version directly from GitHub.

In [None]:
# Download modified Docker compose from GitHub
!wget -O ./deploy/compose/docker-compose.yaml https://raw.githubusercontent.com/adaveinthelife/nvidian_blueprints/main/docker_compose_files/ai_virtual_assistant/docker_compose.yaml

## Build the Docker containers

Launch the containers in detached mode (`up -d`) with the following cell:

In [None]:
import subprocess

compose_file = "deploy/compose/docker-compose.yaml"
cmd = ["docker", "compose", "-f", compose_file, "up", "-d"]

print("🚀 Starting containers...")

process = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)

if process.returncode == 0:
    print("✅ Docker containers started successfully.")
else:
    print("❌ Failed to start containers.")
    print("stderr:\n", process.stderr)

In [None]:
%%bash
## Ensure the containers are spun up and look healthy
docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"

## Download data

Run the following script to download the manuals listed in `data/list_manuals.txt` into the `data/manuals_pdf` folder.

In [None]:
!./data/download.sh ./data/list_manuals.txt

## Ingest data

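Now it's time to ingest the data. We'll use the Unstructured Data Ingestion APIs to extract, process, and prepare the content we downloaded above. First, set the host IP address and the port of the Unstructured Data Ingestion service. `172.17.0.1` is typically the gateway address of the default Docker bridge network, through which a container can reach ports published on the host.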

In [None]:
IPADDRESS = "172.17.0.1"
UNSTRUCTURED_DATA_PORT = "9093"

### PDF Document Ingestion

Before ingesting PDF documents, we’ll verify that the Unstructured Data Ingestion service is up and running by sending a health check request to its `/health` endpoint.

In [None]:
import requests

url = f'http://{IPADDRESS}:{UNSTRUCTURED_DATA_PORT}/health'
print(url)
headers = {
    'accept': 'application/json'
}

response = requests.get(url, headers=headers)

# Print the response
print(response.status_code)
print(response.json())
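
Right after `docker compose up -d`, the service may still be starting, so a single health request can fail with a connection error. A minimal polling sketch; it uses the standard library's `urllib` instead of `requests` (an arbitrary choice so the helper has no dependencies), and `wait_until_healthy` plus its timeout and interval defaults are hypothetical.

```python
import time
import urllib.request

def wait_until_healthy(url: str, timeout: float = 300.0, interval: float = 5.0) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError, and socket timeouts are all OSError subclasses:
            # the service is not accepting connections yet or returned an error
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_until_healthy(f'http://{IPADDRESS}:{UNSTRUCTURED_DATA_PORT}/health')` blocks until the ingestion service answers, or returns `False` after five minutes.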

#### Ingest PDFs

In this step, we'll upload all the downloaded PDF manuals from the `manuals_pdf` directory, along with individual PDFs such as `FAQ.pdf`, to the Unstructured Data Ingestion API. Each file is sent via a POST request to the `/documents` endpoint, where it's processed and prepared for downstream use.

In [None]:
import requests
import os

# Base URL of the API endpoint
url = f'http://{IPADDRESS}:{UNSTRUCTURED_DATA_PORT}/documents'

# List of PDF file paths to upload (includes directory and individual files)
pdf_files = []

# Add all PDFs from the manuals directory
manuals_dir = './data/manuals_pdf'
pdf_files.extend([
    os.path.join(manuals_dir, f)
    for f in os.listdir(manuals_dir)
    if f.endswith('.pdf')
])

# Add specific individual files
pdf_files.append('./data/FAQ.pdf')

# Upload each PDF
for file_path in pdf_files:
    with open(file_path, 'rb') as file:
        files = {'file': file}
        response = requests.post(url, files=files)
    
    # Print the response from the server
    print(f'Uploaded {file_path}: {response.status_code}')
    print(response.json())


#### Retrieve Uploaded Document List
To verify which PDFs have been successfully uploaded, we send a GET request to the `/documents` endpoint. If the request is successful, the response will include a list of available documents, which is then formatted and displayed for review.

In [None]:
import requests

# URL of the API endpoint
url = f'http://{IPADDRESS}:{UNSTRUCTURED_DATA_PORT}/documents'

# Send the GET request
response = requests.get(url)

# Print the response from the server
print(f'Response Status Code: {response.status_code}')

# Check if the request was successful
if response.status_code == 200:
    data = response.json()
    documents = data.get('documents', [])

    # Format and print the list of documents
    print("Available Documents:")
    for idx, document in enumerate(documents, start=1):
        print(f"{idx}. {document}")
else:
    print(f"Failed to retrieve documents. Status Code: {response.status_code}")

### CSV Document Ingestion
The Unstructured Data Ingestion API accepts `.txt` files but not CSV files directly. To work around this, we'll first load the CSV data and then convert each row into a separate `.txt` file for ingestion.

#### Displaying the CSV data
We’ll begin by loading the `gear-store.csv` file into a DataFrame and displaying the first few rows to get a quick look at the structure. We’ll also print the total number of entries to understand the dataset size before converting it for ingestion.

In [None]:
%%capture output
! pip install pandas
! pip install psycopg2-binary

In [None]:
import pandas as pd
from IPython.display import display

# Read and display the CSV data
df = pd.read_csv('./data/gear-store.csv')
display(df.head())

# Show total number of rows
print(f"Total rows: {len(df)}")

#### Converting and Ingesting CSV Data
Now that we've reviewed the CSV content, we'll convert each row of the product DataFrame into a `.txt` file containing the item's name, category, subcategory, price, and description. Filenames are cleaned and formatted for consistency, and all files are saved to `./data/product` for the next step of the processing pipeline.

In [None]:
import os
import re

# Function to create a valid filename
def create_valid_filename(s):
    # Remove invalid characters and replace spaces with underscores
    s = re.sub(r'[^\w\-_\. ]', '', s)
    return s.replace(' ', '_')

# Create the directory if it doesn't exist
os.makedirs('./data/product', exist_ok=True)

# Iterate through each row in the DataFrame
for index, row in df.iterrows():
    # Create filename using name, category, and subcategory
    filename = f"{create_valid_filename(row['name'])}_{create_valid_filename(row['category'])}_{create_valid_filename(row['subcategory'])}.txt"

    print(f"Creating file {filename}, current index {index}")
    # Full path for the file
    filepath = os.path.join('./data/product', filename)

    # Create the content for the file
    content = f"Name: {row['name']}\n"
    content += f"Category: {row['category']}\n"
    content += f"Subcategory: {row['subcategory']}\n"
    content += f"Price: ${row['price']}\n"
    content += f"Description: {row['description']}\n"

    # Write the content to the file
    with open(filepath, 'w', encoding='utf-8') as file:
        file.write(content)

print(f"Created {len(df)} files in ./data/product")

With the product text files prepared, this function uploads each file to the document ingestion API used by the virtual assistant. It includes a retry mechanism with exponential backoff to handle temporary issues like rate limits or server errors, ensuring reliable ingestion into the assistant’s retrieval system.

In [None]:
import requests
import os
import time

# Retry configuration (added because the API Catalog embedding model is rate limited)
MAX_RETRIES = 5
INITIAL_BACKOFF = 1  # Initial backoff in seconds
# Too Many Requests plus transient server errors; any other failure is not retried
RETRYABLE_STATUS_CODES = {429, 500, 502, 503, 504}

def ingest_file(filepath: str) -> bool:
    """
    Ingest a file into the canonical RAG retriever, retrying with exponential backoff.

    Args:
        filepath: Path to the file to be ingested into the retriever

    Returns:
        bool: True if the file was ingested, False otherwise
    """
    # URL of the API endpoint
    url = f'http://{IPADDRESS}:{UNSTRUCTURED_DATA_PORT}/documents'
    name = os.path.basename(filepath)
    backoff = INITIAL_BACKOFF

    for attempt in range(MAX_RETRIES + 1):
        try:
            # Reopen the file on every attempt so each upload starts from the beginning
            with open(filepath, 'rb') as file:
                response = requests.post(url, files={'file': file})
        except requests.exceptions.RequestException as e:
            print(f"Request failed for {name}: {e}")
            return False

        if response.status_code == 200:
            return True
        if response.status_code not in RETRYABLE_STATUS_CODES:
            print(f"Error {response.status_code} for {name}; not retrying.")
            return False
        if attempt < MAX_RETRIES:
            print(f"Error {response.status_code} for {name}. Retrying after {backoff}s...")
            time.sleep(backoff)
            backoff *= 2  # Exponential backoff

    print(f"Max retries reached for {name}. Giving up.")
    return False
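
With `MAX_RETRIES = 5` and `INITIAL_BACKOFF = 1`, a file that keeps failing is retried after 1, 2, 4, 8, and 16 seconds, i.e. 1 + 2 + 4 + 8 + 16 = 31 s of waiting before `ingest_file` gives up. The sketch below computes that schedule. The optional "full jitter" variant is not part of the code above; it is a common addition that randomizes each delay so parallel workers that fail together do not retry in lockstep. `backoff_delays` is a hypothetical helper.

```python
import random
from typing import List

def backoff_delays(max_retries: int = 5, initial: float = 1.0, jitter: bool = False) -> List[float]:
    """Delays slept before each retry: initial, 2*initial, 4*initial, ..."""
    delays = [initial * 2 ** i for i in range(max_retries)]
    if jitter:
        # Full jitter: sleep a random amount between 0 and the exponential delay
        delays = [random.uniform(0, d) for d in delays]
    return delays

schedule = backoff_delays()
total_wait = sum(schedule)
```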

To speed up the ingestion process, this block uses multithreading to upload product files in parallel. Each file is submitted to the ingestion API using a thread pool, and results are tracked to report how many files were successfully ingested versus those that failed.

In [None]:
import os
from concurrent.futures import ThreadPoolExecutor, as_completed

directory_path = './data/product'
max_workers = 5  # Adjust this based on your system's capabilities and API limits

# Collect every product text file generated earlier
filepaths = [
    os.path.join(directory_path, filename)
    for filename in os.listdir(directory_path)
    if filename.endswith(".txt")
]

successfully_ingested = []
failed_ingestion = []

with ThreadPoolExecutor(max_workers=max_workers) as executor:
    future_to_file = {executor.submit(ingest_file, filepath): filepath for filepath in filepaths}

    for future in as_completed(future_to_file):
        filepath = future_to_file[future]
        try:
            if future.result():
                print(f"Successfully Ingested {os.path.basename(filepath)}")
                successfully_ingested.append(filepath)
            else:
                print(f"Failed to Ingest {os.path.basename(filepath)}")
                failed_ingestion.append(filepath)
        except Exception as e:
            print(f"Exception occurred while ingesting {os.path.basename(filepath)}: {e}")
            # traceback.print_exc()
            failed_ingestion.append(filepath)

print(f"Total files successfully ingested: {len(successfully_ingested)}")
print(f"Total files failed ingestion: {len(failed_ingestion)}")

#### Import Customer Order Data into PostgreSQL
To enable structured querying and analysis of customer interactions, we’ll import a CSV file containing order data into a PostgreSQL database. This script creates a `customer_data` table (if it doesn’t already exist), parses and cleans each row of the CSV, and inserts the data into the table.

The code handles timestamp parsing, removes special characters like ® and ™ from product fields, and gracefully manages optional return-related date fields. Once complete, the data will be stored in a structured format and ready for use in later stages of the workflow.

In [None]:
import csv
import re
import psycopg2
from datetime import datetime

# Database connection parameters
db_params = {
    'dbname': 'customer_data',
    'user': 'postgres',
    'password': 'password',
    'host': IPADDRESS,  # e.g., 'localhost' or the IP address
    'port': '5432'   # e.g., '5432'
}

# CSV file path
csv_file_path = './data/orders.csv'

# Connect to the database
conn = psycopg2.connect(**db_params)
cur = conn.cursor()

# Create the table if it doesn't exist
create_table_query = '''
CREATE TABLE IF NOT EXISTS customer_data (
    customer_id INTEGER NOT NULL,
    order_id INTEGER NOT NULL,
    product_name VARCHAR(255) NOT NULL,
    product_description VARCHAR NOT NULL,
    order_date DATE NOT NULL,
    quantity INTEGER NOT NULL,
    order_amount DECIMAL(10, 2) NOT NULL,
    order_status VARCHAR(50),
    return_status VARCHAR(50),
    return_start_date DATE,
    return_received_date DATE,
    return_completed_date DATE,
    return_reason VARCHAR(255),
    notes TEXT,
    PRIMARY KEY (customer_id, order_id)
);
'''
cur.execute(create_table_query)

# Open the CSV file and insert data
with open(csv_file_path, 'r', newline='', encoding='utf-8') as f:
    reader = csv.reader(f)
    next(reader)  # Skip the header row

    for row in reader:
        # Access columns by index as per the provided structure
        order_id = int(row[1])  # OrderID
        customer_id = int(row[0])  # CID (Customer ID)

        # Parse the order date, which includes a time component
        order_date = datetime.strptime(row[4], "%Y-%m-%dT%H:%M:%S")  # OrderDate with time

        quantity = int(row[5])  # Quantity

        # Handle optional date fields with time parsing
        return_start_date = datetime.strptime(row[9], "%Y-%m-%dT%H:%M:%S") if row[9] else None  # ReturnStartDate
        return_received_date = datetime.strptime(row[10],"%Y-%m-%dT%H:%M:%S") if row[10] else None  # ReturnReceivedDate
        return_completed_date = datetime.strptime(row[11], "%Y-%m-%dT%H:%M:%S") if row[11] else None  # ReturnCompletedDate

        # Clean product name
        product_name = re.sub(r'[®™]', '', row[2])  # ProductName

        product_description = re.sub(r'[®™]', '', row[3])
        # OrderAmount as float
        order_amount = float(row[6].replace(',', ''))

        # Insert the row; ON CONFLICT skips orders that are already present,
        # so re-running this cell does not fail on the primary key
        cur.execute(
            '''
            INSERT INTO customer_data (
                customer_id, order_id, product_name, product_description, order_date, quantity, order_amount,
                order_status, return_status, return_start_date, return_received_date,
                return_completed_date, return_reason, notes
            ) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
            ON CONFLICT (customer_id, order_id) DO NOTHING
            ''',
            (customer_id, order_id, product_name, product_description, order_date, quantity, order_amount,
             row[7],  # OrderStatus
             row[8],  # ReturnStatus
             return_start_date, return_received_date, return_completed_date,
             row[12],  # ReturnReason
             row[13])  # Notes
        )

# Commit the changes and close the connection
conn.commit()
cur.close()
conn.close()

print("CSV Data imported successfully!")

#### Preview Customer Order Records
After importing the CSV data, we can verify the contents of the customer_data table by querying the first few rows. This script connects to the PostgreSQL database, runs a simple SELECT query, and prints out the first five entries along with their column names. It’s a quick way to confirm that the data was ingested and structured correctly.

In [None]:
import psycopg2

# Database connection parameters
db_params = {
    'dbname': 'customer_data',
    'user': 'postgres',
    'password': 'password',
    'host': IPADDRESS,  # e.g., 'localhost' or the IP address
    'port': '5432'   # e.g., '5432'
}

# Connect to the database
conn = psycopg2.connect(**db_params)
cur = conn.cursor()

# Query to select the first 5 rows from the customer_data table
query = 'SELECT * FROM customer_data LIMIT 5;'

# Execute the query
cur.execute(query)

# Fetch the column headers
colnames = [desc[0] for desc in cur.description]

# Fetch the first 5 rows
rows = cur.fetchall()

# Print the headers and the corresponding rows
for i, row in enumerate(rows, start=1):
    print(f"\nRow {i}:")
    for header, value in zip(colnames, row):
        print(f"{header}: {value}")

# Close the connection
cur.close()
conn.close()

At this point, we’ve successfully ingested all of our data — including PDF manuals, CSV product records, and structured customer orders — into the system. Everything is now prepared and indexed for use in our RAG pipeline. Next, we’ll shift to testing the deployment using the Blueprint’s built-in UI so you can interact with your virtual assistant and validate that the data retrieval and response generation are working as expected.

## Exposing the Interface for Testing

The Blueprint comes equipped with a basic UI for testing the deployment. The interface is served on port 3001. To expose the port and try out the assistant, follow the steps below.

First, navigate back to the page of the Launchable instance you created and click the Access menu.


![Access Menu](https://github.com/NVIDIA-AI-Blueprints/ai-virtual-assistant/raw/main/docs/imgs/brev-cli-install.png)


Scroll down until you find "Using Tunnels" section and click on Share a Service button.


![Using Tunnels](https://github.com/NVIDIA-AI-Blueprints/ai-virtual-assistant/raw/main/docs/imgs/brev-tunnels.png)


Enter port 3001, where the UI service listens, and confirm with Done. Then click Edit Access and make the port public:


![Share Access](https://github.com/NVIDIA-AI-Blueprints/ai-virtual-assistant/raw/main/docs/imgs/brev-share-access.png)


Click the generated link and the UI should open in your browser. You can now interact with the assistant and ask it about the ingested data.


![AI Virtual Assistant Interface](https://github.com/NVIDIA-AI-Blueprints/ai-virtual-assistant/raw/main/docs/imgs/ai-virtual-assistant-interface.png)